Row_Number()

from the Artful Common Queries page


ISO SQL defines ROW_NUMBER() OVER as a "windowing" or "analytic" function with an optional PARTITION clause for generating a derived row number column in a resultset. Several RDBMSs—including DB2, Oracle and SQL Server—have long implemented it. It came into MariaDB with version 10.0.0, into MySQL with version 8.0.2.

The whole list of windowing/analytic functions:

CUME_DIST() Cumulative distribution value
DENSE_RANK() Rank of current row in its partition without gaps
FIRST_VALUE() Argument value from first row of window frame
LAG() Argument value from row lagging current row in partition
LAST_VALUE() Argument value from last row of window frame
LEAD() Argument value from row leading current row within partition
NTH_VALUE() Argument value from Nth row of window frame
NTILE() Bucket number of current row in its partition
PERCENT_RANK() Percentage rank value
RANK() Current row rank within its partition, with gaps
ROW_NUMBER() Current row number in its partition

Here is the simplest possible Row_Number() example. Given a table with two columns i and j, generate a resultset that has a derived sequential row_number column taking the values 1,2,3,... for a defined ordering of j which resets to 1 when the value of i changes:

DROP TABLE IF EXISTS test;
CREATE TABLE test(i int,j int);
INSERT INTO test VALUES 
(3,31),(1,11),(4,14),(1,13),(2,21),(1,12),(2,22),(3,32),(2,23),(3,33);

The result must look like this:

+------+------+------------+
| i    | j    | row_number |
+------+------+------------+
|    1 |   11 |          1 |
|    1 |   12 |          2 |
|    1 |   13 |          3 |
|    2 |   21 |          1 |
|    2 |   22 |          2 |
|    2 |   23 |          3 |
|    3 |   31 |          1 |
|    3 |   32 |          2 |
|    3 |   33 |          3 |
|    4 |   14 |          1 |
+------+------+------------+

The SQL Server query that gives the correct answer is:

SELECT
  i, j,
  ROW_NUMBER() OVER ( PARTITION BY i ORDER BY j ) AS row_number
FROM test
ORDER BY i,j;

This works in MariaDB 10.2 and MySQL 8.0.2. Are you stuck if you can't use those versions? No. Here are two ways to get the desired result without Row_Number().

First, with user variables:

SET @iprev=0, @jprev=0;
SELECT i, j, row_number
FROM (
  SELECT j, @jprev := if(@iprev = i, @jprev+1, 1) AS row_number, @iprev := i AS i
  FROM test
  ORDER BY i,j
) AS tmp;

The second method uses a join and aggregation, but is correct only if there are no duplicate values of j

SELECT a.i, a.j, count(*) as row_number
FROM test a
JOIN test b ON a.i=b.i AND a.j >= b.j
GROUP BY a.i, a.j ; 

To simplify generalising these query patterns to multiple ordering columns, use self-documenting column names:

DROP TABLE IF EXISTS test;
CREATE TABLE test(partition int, ord1 int, ord2 int);
INSERT INTO test 
VALUES (3,31,55),(1,11,19),(4,14,23),(2,22,8),(2,22,42),(1,13,56),
       (2,21,77),(2,21,7),(1,12,17),(2,23,92),(3,32,24),(3,33,62);

The user variable solution easily handles multiple ordering columns: just add the ordering column names to the outer query's SELECT list and the inner query's ORDER BY list:

SET @partitionPrev=0, @ordPrev=0;
SELECT partition, ord1, ord2, row_number
FROM (
  SELECT 
    ord1, ord2,
    @ordPrev := If(@partitionPrev = partition, @ordPrev+1, 1) as row_number, 
    @partitionPrev := partition AS partition
  FROM test
  ORDER BY partition, ord1, ord2
) AS tmp;
+-----------+------+------+------------+
| partition | ord1 | ord2 | row_number |
+-----------+------+------+------------+
|         1 |   11 |   19 |          1 |
|         1 |   12 |   17 |          2 |
|         1 |   13 |   56 |          3 |
|         2 |   21 |    7 |          1 |
|         2 |   21 |   77 |          2 |
|         2 |   22 |    8 |          3 |
|         2 |   22 |   42 |          4 |
|         2 |   23 |   92 |          5 |
|         3 |   31 |   55 |          1 |
|         3 |   32 |   24 |          2 |
|         3 |   33 |   62 |          3 |
|         4 |   14 |   23 |          1 |
+-----------+------+------+------------+

In the aggregating solution for multiple columns, the theta join gets more complex as ordering columns are added:

SELECT a.partition, a.ord1, a.ord2, count(*) as row_number
FROM test a
JOIN test b 
  ON a.partition=b.partition 
  AND (a.ord1>b.ord1 OR (a.ord1=b.ord1 AND a.ord2>=b.ord2))
GROUP BY a.partition, a.ord1, a.ord2 ;  


Return to the Artful Common Queries page