Median

from the Artful Common Queries page


Statistically, the median is the middle value--the value that is smaller than that found in half of all remaining rows, and larger than that found in the other half:

SELECT a.hours As Median
FROM BulbLife As a, bulbLife AS b
GROUP BY a.Hours
Having Sum( Sign( 1-Sign(b.Hours-a.Hours ))) = Floor(( Count(*)+1)/2)  ;

This formula works, but it doesn't scale—it's O(n*n)—and it's awkward to use.

So we posted a MySQL implementation of Torben Mogenson's algorithm for calculating the median (http://ndevilla.free.fr/median/median/node20.html). It's said to be amongst the fastest, but it proved too slow. Then Joe Wynne offered an algorithm which looks correct and does scale. Here it is as a MySQL stored procedure:

DROP PROCEDURE IF EXISTS Median;
DELIMITER |
CREATE PROCEDURE Median( tbl CHAR(64), col CHAR(64), OUT res DOUBLE )
BEGIN
  DECLARE arg CHAR(64);
  SET @sql = CONCAT( 'SELECT ((COUNT(*))/2) INTO @c FROM ', tbl );
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
  SET @a = CONVERT(FLOOR(@c), SIGNED);
  IF @a = @c THEN 
    BEGIN
      SET @a = @a-1;
      SET @b = 2;
      SET arg = CONCAT( 'AVG(', col, ')' );
    END;
  ELSE
    BEGIN
      SET @b = 1;
      SET arg = col;
    END;
  END IF;
  SET @sql = CONCAT('SELECT ', arg, ' INTO @res FROM (SELECT ', col, ' FROM ', tbl, 
                    ' ORDER BY ', col, ' LIMIT ?,?) as tmp');
  PREPARE stmt FROM @sql;
  EXECUTE stmt USING @a, @b;
  DROP PREPARE stmt;
  SET res=@res;
END;
|
DELIMITER ;

What we really need is a perfectly general function that scales and can be dropped into queries. As it turns out, we can have two of those three requirements—scalability and drop-in-ability. We can't have generality because MySQL functions can't invoke PREPARE. Still, two of three ain't bad. Stay tuned.

Return to the Artful Common Queries page