## Average top 50% values per group

### from the Artful Common Queries page

 Each row of a `games` table records one game score for a team: ```DROP TABLE IF EXISTS games; CREATE TABLE games(id INT, teamID INT, score INT); INSERT INTO games VALUES (1,1,3),(2,1,4),(3,1,5),(4,1,6),(5,2,6), (6,2,7),(7,2,8),(8,2,7),(9,2,6),(10,2,7); ``` How would we write a query that returns the average of the top 50% of scores per team? The per-team median value is its middle value--lower than the highest 50% and higher than the lowest 50% of values for that team--so a shortcut is to query the team medians, then aggregate on a join that selects per-team scores above the medians. How to find per-team medians? If a resultset has an odd number of rows, at least one row has the true median score. If it has an even number of rows, the median score is an average of two central values. This query adapts Joe Celko's formula in "SQL for Smarties" averaging "low" and "high" medians, substituting `If(...)` syntax for the cumbersome `Case When...` expressions .. ```DROP TABLE IF EXISTS medians; CREATE TABLE medians SELECT p1.teamid, AVG(P1.score) AS median FROM games AS P1, games AS P2 WHERE p1.teamid=p2.teamid GROUP BY p1.teamid HAVING ( SUM(If(P2.score<=P1.score,1,0)) >= (COUNT(*)+1)/2 AND SUM(If(P2.score>=P1.score,1,0)) >= (COUNT(*)/2+1) ) OR ( SUM(If(P2.score>=P1.score,1,0)) >= (COUNT(*)+1)/2 AND SUM(If(P2.score<=P1.score,1,0)) >= (COUNT(*)/2+1) ); +--------+--------+ | teamid | median | +--------+--------+ | 1 | 4.5000 | | 2 | 6.8333 | +--------+--------+ ``` Now join `games` to `medians` accepting only top-half values: ```SELECT g.teamid, AVG(g.score) AS Top50Avg FROM games g JOIN medians m ON g.teamid = m.teamid AND g.score >= m.median GROUP BY g.teamid ORDER BY Top50Avg DESC; +--------+----------+ | teamid | Top50Avg | +--------+----------+ | 2 | 7.2500 | | 1 | 5.5000 | +--------+----------+ DROP TABLE medians; ``` Yes, all the logic can be moved into one query: ```SELECT g.teamid, AVG(g.score) AS Top50Avg FROM games g JOIN ( SELECT p1.teamid, AVG(P1.score) AS median FROM games AS P1, games AS P2 WHERE p1.teamid=p2.teamid GROUP BY p1.teamid HAVING ( SUM(If(P2.score<=P1.score,1,0)) >= (COUNT(*)+1)/2 AND SUM(If(P2.score>=P1.score,1,0)) >= (COUNT(*)/2 + 1) ) OR ( SUM(If(P2.score>=P1.score,1,0)) >= (COUNT(*)+1)/2) AND SUM(If(P2.score<=P1.score,1,0)) >= (COUNT(*)/2+1 ) ) AS m ON g.teamid = m.teamid AND g.score >= m.median GROUP BY g.teamid ORDER BY Top50Avg DESC; ```Last updated 13 Mar 2020