Avoiding repeat aggregation

from the Artful Common Queries page


In a good introductory tutorial on MySQL subqueries, Jeremy Cole developed a triply nested query to retrieve the name, population, head of state, and number of official languages in countries with the most official languages. The query uses two tables from the MySQL world database:

CREATE TABLE country (
  Code char(3) NOT NULL DEFAULT '' PRIMARY KEY,
  Name char(52) NOT NULL DEFAULT '',
  Population int(11) NOT NULL DEFAULT '0',
  HeadOfState char(60) DEFAULT NULL,
  ... other columns ...
);
CREATE TABLE countrylanguage (
  CountryCode char(3) NOT NULL DEFAULT '' PRIMARY KEY,
  Language char(30) NOT NULL DEFAULT '',
  IsOfficial enum('T','F') NOT NULL DEFAULT 'F',
  Percentage float(4,1) NOT NULL DEFAULT '0.0'
);

The query needs to aggregate language counts by country twice: once for all language counts by country, and again to identify countries with the highest number of languages:

SELECT name, population, headofstate, top.num
FROM Country
JOIN ( 
  SELECT countrycode, COUNT(*) AS num
  FROM CountryLanguage
  WHERE isofficial='T'
  GROUP BY countrycode
  HAVING num = (
    SELECT MAX(summary.nr_official_languages)
    FROM  (
      SELECT countrycode, COUNT(*) AS nr_official_languages
      FROM CountryLanguage 
      WHERE isofficial='T' 
      GROUP BY countrycode
    ) AS summary
  )
) as top ON Country.code=top.countrycode;
+--------------+------------+-------------+-----+
| name         | population | headofstate | num |
+--------------+------------+-------------+-----+
| Switzerland  |    7160400 | Adolf Ogi   |   4 |
| South Africa |   40377000 | Thabo Mbeki |   4 |
+--------------+------------+-------------+-----+

This is fine with small tables—the query returns in about 25 msec on a modest modern machine. But if the table being aggregated is very large and if the aggregation is complex, the query will be slow. Substituting a temporary table for the double nesting can improve performance in two ways—(i) the aggregation needs to be done just once, (ii) we can use an exclusion join rather than a HAVING clause:

DROP TABLE IF EXISTS tops;
CREATE TABLE tops ENGINE=MEMORY
  SELECT countrycode, COUNT(*) AS num
  FROM CountryLanguage l1
  WHERE isofficial='T'
  GROUP BY countrycode;

SELECT name,population,headofstate,t3.num
FROM country c
JOIN (
  SELECT t1.countrycode, t1.num
  FROM tops t1
  LEFT JOIN tops t2 ON t1.num < t2.num
  WHERE t2.countrycode IS NULL
) AS t3 ON c.code=t3.countrycode;
+--------------+------------+-------------+-----+
| name         | population | headofstate | num |
+--------------+------------+-------------+-----+
| Switzerland  |    7160400 | Adolf Ogi   |   4 |
| South Africa |   40377000 | Thabo Mbeki |   4 |
+--------------+------------+-------------+-----+
DROP TABLE tops;

You notice we haven't actually used a TEMPORARY table? MySQL allows temporary tables to be referenced just once in a query. We use a MEMORY table as a temp table.

If your MySQL version is at least 8.0.11 or your MariaDB version is at least 10.2, Common Table Expressions (CTEs) are available to you. CTEs offer elegant syntax for temporary tables. Indeed that'll do as a CTEs rule of thumb: if the query uses a temporary table, fold code for creation and population of the temp table into a WITH clause, and you're done. Thus:

WITH tops AS (
  SELECT countrycode, COUNT(*) AS num
  FROM CountryLanguage l1
  WHERE isofficial='T'
  GROUP BY countrycode
)
SELECT name,population,headofstate,t3.num
FROM country c
JOIN (
  SELECT t1.countrycode, t1.num
  FROM tops t1
  LEFT JOIN tops t2 ON t1.num < t2.num
  WHERE t2.countrycode IS NULL
) AS t3 ON c.code=t3.countrycode;
+--------------+------------+-------------+-----+
| name         | population | headofstate | num |
+--------------+------------+-------------+-----+
| Switzerland  |    7160400 | Adolf Ogi   |   4 |
| South Africa |   40377000 | Thabo Mbeki |   4 |
+--------------+------------+-------------+-----+
2 rows in set (0.01 sec)

It's about as fast as the query using a MEMORY table. But more elegant, more modern.

Return to the Artful Common Queries page