What else did buyers of X buy?

from the Artful Common Queries page


We often want to know how certain column values associate with other column values, for example "What else did buyers of x buy?", or "What projects did Sean, Ian and Gerard all work on?"

Start with buyers of x. The table that summarises this information might be a View that encapsulates joins from customers to orders to orderitems to products, perhaps scoped on a recent date range. Here we ignore all such detail. We focus only on the SQL patterns that solve this kind of problem:

DROP TABLE IF EXISTS userpurchases;
CREATE TABLE userpurchases( custID INT UNSIGNED, prodID INT UNSIGNED );
INSERT INTO userpurchases 
VALUES (1,1),(1,2),(2,4),(3,1),(3,2),(4,2),(4,3),(5,1),(5,2),(5,3);
SELECT custID, GROUP_CONCAT(prodID ORDER BY prodID) AS PurchaseList
FROM userpurchases
GROUP BY custID;
+--------+--------------+
| custID | PurchaseList |
+--------+--------------+
|      1 | 1,2          |
|      2 | 4            |
|      3 | 1,2          |
|      4 | 2,3          |
|      5 | 1,2,3        |
+--------+--------------+

One solution begins with a self-join on prodID as often as necessary to get the answer. For example, to list all products bought by customers who'd already bought at least one other product, join userpurchases to itself on matching custIDs and non-matching prodIDs:

SELECT DISTINCT p2.prodid 
FROM userpurchases p1
JOIN userpurchases p2 ON p1.custID = p2.custID AND p1.prodID <> p2.prodID;
+--------+
| prodid |
+--------+
|      1 |
|      2 |
|      3 |
+--------+

To find what else buyers of product 1 bought, copy the above join and group by custID:

SELECT p1.custID,GROUP_CONCAT(p2.prodid) as 'Buyers of #1 Also bought'
FROM userpurchases p1
JOIN userpurchases p2 ON p1.custID=p2.custID AND p1.prodID <> p2.prodID 
WHERE p1.prodID = 1
GROUP BY p1.custID;
+--------+--------------------------+
| custID | Buyers of #1 Also bought |
+--------+--------------------------+
|      1 | 2                        |
|      3 | 2                        |
|      5 | 2,3                      |
+--------+--------------------------+

What customers bought both product 1 and product 2?

SELECT DISTINCT p1.custID
FROM userpurchases p1
JOIN userpurchases p2 ON p1.custID=p2.custID AND p1.prodID=1 AND p2.prodID=2
+--------+
| custID |
+--------+
|      1 |
|      3 |
|      5 |
+--------+

The problem can also be solved with just Counts and a Having clause. Customers who bought products 1 and 2 ...
(i) have custIDs in userpurchases where prodID is in (1,2) and
(ii) have a distinct prodID count >= the number of prodIDs in (1,2), which is 2:

SELECT custID
FROM (
  SELECT custID, COUNT(DISTINCT prodID) AS N
  FROM userpurchases
  WHERE prodID IN(1,2)
  GROUP BY custID
  HAVING N>=2          -- N of (1,2) = 2
) x;

When the criterion group has more than 2 items, that aggregating solution is easier to implement. For example, you track members and their projects, and you often have to query which members participated in all of a specified list of projects:

DROP TABLE IF EXISTS members,project_members;
CREATE TABLE members(emp_id int,emp_name char(5));
INSERT INTO members VALUES (1,'Emp1'),(2,'Emp2'), (3, Emp3'),(4,'Emp4'),(5,'Emp5');
CREATE TABLE project_members (project_id int, emp_id int);
INSERT INTO project_members 
VALUES (1, 1),(1, 2),(1, 3),(2, 1),(2, 2),(3, 1),(4, 3),(5, 1),(5, 2),(5, 3),(5, 4),(6, 1);  
SELECT project_id AS Project, GROUP_CONCAT(emp_id) AS Members
FROM project_members
GROUP BY project_id;
+---------+---------+
| Project | Members |
+---------+---------+
|       1 | 1,2,3   |
|       2 | 1,2     |
|       3 | 1       |
|       4 | 3       |
|       5 | 1,2,3,4 |
|       6 | 1       |
+---------+---------+

What projects did members 1, 2 and 3 work on?

SELECT project_id
FROM (
  SELECT project_id, COUNT(DISTINCT emp_id) AS N
  FROM project_members
  WHERE emp_id IN(1,2,3)
  GROUP BY project_id
  HAVING N>=3                -- N of (1,2,3) = 3
) x;
+------------+
| project_id |
+------------+
|          1 |
|          5 |
+------------+

Here is a variation on the pattern. You have tables people(id, name), membership(id, name) and people_membership(id, membership_id, people_id). You need to find names of all persons and all their memberships where one of the memberships has id=X. It's another two-step:

1. Assemble the people who have a membership with id=X

2. Join from that derived result to people_memberships to get their other memberships, then to people to get their personal info:

SELECT 
  GROUP_CONCAT(m.name ORDER BY m.membership_id SEPARATOR ', ') AS 'Member Name'
  pX.id,
  pX.name
FROM (
  SELECT p.id,p.id,p.name
  FROM people p
  JOIN (
    SELECT id
    FROM people_membership 
    WHERE membership_id=X
  ) pmX ON p.id=pmX.id
) pX
JOIN people_membership pm ON pX.id=pm.id
JOIN membership        m  ON pm.membership_id=m.membership_id
GROUP BY pX.id
ORDER BY pX.name;

Here is another variation on the pattern. You track companies, their agents, and their sales:

drop table if exists companies,agents,sales;
create table companies( id int,name char(5) );
insert into companies values (1,'a ltd'),(2,'b ltd');
create table agents( id int, agentname char(5), companyID int );
insert into agents values (1,'ann',1),(2,'bill',1),(3,'chad',2),(4,'david',2),(5,'ed',3);
create table sales( itemname char(5), agentID int );
insert into sales values('socks',1),('shoes',2),('shirt',3),('tie',4);

You need a report that lists all items sold by a given agent and by all other agents in that agent's company. It's a three-step:

1. Write a query to retrieve agents and their sales.

2. Join the result of [1] with the agents table on company.

3. Scope the result of [2] on a particular agent's name.

SET @agent='ann';
SELECT s.*
FROM agents a
JOIN (
  SELECT x.id,x.agentname,x.companyID,y.itemname    -- sales by agent
  FROM agents x
  JOIN sales y ON x.id=y.agentID
) s ON a.companyID=s.companyID                      -- users by company
WHERE a.agentname=@agent;                           -- scope by user
------+-----------+-----------+----------+
 id   | agentname | companyID | itemname |
------+-----------+-----------+----------+
    1 | ann       |         1 | socks    |
    2 | bill      |         1 | shoes    |
------+-----------+-----------+----------+


Return to the Artful Common Queries page