Basic aggregation

from the Artful Common Queries page


Aggregates are more popular than any other lookup topic here, by a ratio of more than 2 to 1.

Basic aggregation is the simplest grouping query pattern: for column foo, display the smallest, largest, sum, average or some other statistic of column bar values:

SELECT foo, MIN(bar) AS bar
FROM tbl
GROUP BY foo

Return the highest bar value for each foo, ordering top to bottom by that value:

SELECT foo, MAX(bar) AS Count
FROM tbl
GROUP BY foo
ORDER BY Count DESC;

Ditto for AVG(), COUNT() etc. The pattern easily extends to multiple grouping column expressions.

For aggregating functions like MIN() and MAX() that return a single value, there may be multiple instances of the result. If you wish to see them, put the aggregating query in a subquery and join to it from a direct query on the table:

SELECT a.foo, a.bar
FROM tbl a
JOIN (
  SELECT foo, MAX(bar) AS Count
  FROM tbl
  GROUP BY foo
) b ON a.foo=b.foo AND a.bar=b.count
ORDER BY a.foo, a.bar;

MySQL introduced the SQL extension GROUP_CONCAT(), which makes short work of listing items in groups. For example, given a table of suppliers and the parts they make ...

CREATE TABLE supparts(supID char(2),partID char(2));
INSERT INTO supparts VALUES
('s1','p1'),('s1','p2'),('s1','p3'),('s1','p4'),('s1','p5'),('s1','p6'),
('s2','p1'),('s2','p2'),('s3','p2'),('s4','p2'),('s4','p4'),('s4','p5');

List suppliers for each part:

SELECT partID,GROUP_CONCAT(supID ORDER BY supID) AS Suppliers 
FROM supparts 
GROUP BY partID;
+--------+-------------+
| partID | Suppliers   |
+--------+-------------+
| p1     | s1,s2       |
| p2     | s1,s2,s3,s4 |
| p3     | s1          |
| p4     | s1,s4       |
| p5     | s1,s4       |
| p6     | s1          |
+--------+-------------+

When there are several tables to be joined, the beginner may feel overwhelmed by the complexity of the problem. Suppose you're asked to retrieve the top computer desk salesperson for this schema:

drop table if exists salespersons, orders, orderlines, products;
create table salespersons(salespersonid int,name char(8));
insert into salespersons values(1,'Sam'),(2,'Xavier');
create table orders(orderid int,salespersonid int);
insert into orders values(1,1),(2,1),(3,1),(4,2),(5,2);
create table orderlines(lineid int,orderid int,productid int,qty int);
insert into orderlines values(1,1,1,1),(2,1,1,2),(3,2,2,1),(4,3,1,1),(5,4,1,1),(6,5,2,2);
create table products(productid int,name char(32));
insert into products values(1,'computer desk'),(2,'lamp'),(3,'desk chair');

Two rules of thumb help with problems like this: solve one step at a time, and work from inside out. Here "inside out" means start by building the join list needed to retrieve sales data:

from salespersons s
join orders       o using(salespersonid)
join orderlines   l using(orderid)
join products     p using(productid)  

Test those joins with a query that just lists sales data:

select s.name, p.name, l.qty
from salespersons s
join orders       o using(salespersonid)
join orderlines   l using(orderid)
join products     p using(productid)
+--------+---------------+------+
| name   | name          | qty  |
+--------+---------------+------+
| Sam    | computer desk |    1 |
| Sam    | computer desk |    2 |
| Sam    | lamp          |    1 |
| Sam    | computer desk |    1 |
| Xavier | computer desk |    1 |
| Xavier | lamp          |    2 |
+--------+---------------+------+

Now we just need to filter for 'computer desk' sales, add aggregation, and pick off the top seller:

select s.name, sum(l.qty) as n      -- sum quantities
from salespersons s
join orders       o using(salespersonid)
join orderlines   l using(orderid)
join products     p using(productid)
where p.name='computer desk'    
group by s.name                     -- aggregate by salesperson
order by n desc limit 1;            -- order by descending sum, pick off top value
+------+------+
| name | n    |
+------+------+
| Sam  |    4 |
+------+------+

If columns other than the GROUP BY columns must be retrieved, and if the grouping expression does not have a strictly 1:1 relationship with (ie., are not "functionally dependent" on) those columns, then to avoid returning arbitrary values for those non-grouping columns, you need to either add those columns to the Group By clause, or put the GROUP BY query in a subquery and join that result to the other columns, for example:

SELECT s.partID, s, thiscol, s.thatcol, anothercol, x.Suppliers
FROM supparts s
JOIN (
  SELECT partID,GROUP_CONCAT(supID ORDER BY supID) AS Suppliers 
  FROM supparts 
  GROUP BY partID
) x USING(partID)

If sql_mode includes ONLY_FULL_GROUP_BY, MySQL returns an error for a query that Selects non-aggregated columns that aren't functionally dependent on the grouping columns.

Before version 5.7.5 MySQL, with ONLY_FULL_GROUP_BY set, blindly rejected queries like this with the message that those columns are missing from the GROUP BY expression. Starting in 5.7.5 though, MySQL is better at detecting functional dependencies, and it won't generate such an error if it concludes that the ungrouped columns are functionally dependent on grouoed columns. If it decides they aren't functionally dependent, the message says that. For full discussion of this, see Roland Bouman's excellent discussion at http://rpbouman.blogspot.nl/2014/09/mysql-575-group-by-respects-functional.html

Return to the Artful Common Queries page