Common MySQL Queries
Basic aggregation
This is the simplest grouping query pattern. For column foo, display the first (smallest), last (largest) or average value of column bar.
SELECT foo, MIN(bar) AS bar
FROM tbl
GROUP BY foo
To return the highest value, and order top to bottom by that value:
SELECT foo, MAX(bar) AS maxbar
FROM tbl
GROUP BY foo
ORDER BY maxbar DESC;
Ditto for AVG(), COUNT() etc. It is easily extended for multiple grouping column expressions.
Aggregates across multiple joins
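Given a parent table and two child tables, a query which sums values in both child tables, grouping on a parent table column, returns sums that are too large: each child sum is multiplied by the number of matching rows in the other child table. In this example from the MySQL General Discussion list, each child table has two rows for the parent row, so both sums come out exactly twice as large as they should be: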
CREATE TABLE packageItem (
packageItemID INT,
packageItemName CHAR(20),
packageItemPrice DECIMAL(10,2)
);
INSERT INTO packageItem VALUES(1,'Delta Hotel',100.00);
CREATE TABLE packageCredit (
packageCreditID INT,
packageCreditItemID INT,
packageItemType CHAR(10),
packageCreditAmount DECIMAL(10,2)
);
INSERT INTO packageCredit VALUES
(1,1,'Deposit',25.00),
(2,1,'Balance',92.00);
CREATE TABLE packageItemTax (
packageItemTaxID INT,
packageItemTaxItemID INT,
packageItemTaxName CHAR(5),
packageItemTaxAmount DECIMAL(10,2)
);
INSERT INTO packageItemTax VALUES
(1,1,'GST',7.00),
(2,1,'HST',10.00);
The query ...
SELECT
i.packageItemID,
i.packageItemName,
i.packageItemPrice,
SUM(t.packageItemTaxAmount) as Tax,
SUM(c.packageCreditAmount) as Credit
FROM packageItem i
LEFT JOIN packageCredit c ON i.packageItemID=c.packageCreditItemID
LEFT JOIN packageItemTax t ON i.packageItemID=t.packageItemTaxItemID
GROUP BY i.packageItemID;
returns ...
+---------------+-----------------+------------------+-------+--------+
| packageItemID | packageItemName | packageItemPrice | Tax | Credit |
+---------------+-----------------+------------------+-------+--------+
| 1 | Delta Hotel | 100.00 | 34.00 | 234.00 |
+---------------+-----------------+------------------+-------+--------+
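With three child tables, each sum is multiplied again by the number of matching rows in the third table. Why? Because the joins pair every matching packageCredit row with every matching packageItemTax row, and the query aggregates across that product. How then to get the correct results? With correlated subqueries: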
SELECT
i.packageItemID,
SUM(i.packageItemPrice) AS PriceSum,
( SELECT SUM(c.packageCreditAmount)
FROM packageCredit c
WHERE c.packageCreditItemID = i.packageItemID
) AS CreditSum,
( SELECT SUM(t.packageItemTaxAmount)
FROM packageItemTax t
WHERE t.packageItemTaxItemID = i.packageItemID
) AS TaxSum
FROM packageItem i
GROUP BY i.packageItemID;
+---------------+----------+-----------+--------+
| packageItemID | PriceSum | CreditSum | TaxSum |
+---------------+----------+-----------+--------+
| 1 | 100.00 | 117.00 | 17.00 |
+---------------+----------+-----------+--------+
If subqueries are unavailable or too slow, replace them with temp tables.
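Another option, sketched here on the assumption that derived tables (subqueries in the FROM clause) are available, is to aggregate each child table on its own before joining, so that each join contributes at most one row per parent. The aliases c and t and the COALESCE defaults are illustrative choices, not part of the original list example:

```sql
-- Rebuild the sample tables so the sketch runs on its own.
DROP TABLE IF EXISTS packageItem, packageCredit, packageItemTax;
CREATE TABLE packageItem (packageItemID INT, packageItemName CHAR(20), packageItemPrice DECIMAL(10,2));
CREATE TABLE packageCredit (packageCreditID INT, packageCreditItemID INT, packageItemType CHAR(10), packageCreditAmount DECIMAL(10,2));
CREATE TABLE packageItemTax (packageItemTaxID INT, packageItemTaxItemID INT, packageItemTaxName CHAR(5), packageItemTaxAmount DECIMAL(10,2));
INSERT INTO packageItem VALUES (1,'Delta Hotel',100.00);
INSERT INTO packageCredit VALUES (1,1,'Deposit',25.00),(2,1,'Balance',92.00);
INSERT INTO packageItemTax VALUES (1,1,'GST',7.00),(2,1,'HST',10.00);

-- Aggregate each child table separately, then join the one-row-per-item results.
SELECT
  i.packageItemID,
  i.packageItemName,
  i.packageItemPrice,
  COALESCE(t.Tax, 0)    AS Tax,     -- 0 when an item has no tax rows
  COALESCE(c.Credit, 0) AS Credit   -- 0 when an item has no credit rows
FROM packageItem i
LEFT JOIN (
  SELECT packageCreditItemID, SUM(packageCreditAmount) AS Credit
  FROM packageCredit
  GROUP BY packageCreditItemID
) AS c ON c.packageCreditItemID = i.packageItemID
LEFT JOIN (
  SELECT packageItemTaxItemID, SUM(packageItemTaxAmount) AS Tax
  FROM packageItemTax
  GROUP BY packageItemTaxItemID
) AS t ON t.packageItemTaxItemID = i.packageItemID;
-- For the sample rows: Tax = 17.00, Credit = 117.00
```

Because neither derived table can return more than one row per item, the two joins cannot multiply each other's rows.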
Aggregates excluding leaders
You have a table of grouped ranks ...
DROP TABLE IF EXISTS grps,ranks;
CREATE TABLE grps (grp int);
INSERT INTO grps VALUES(1),(2),(3),(4);
CREATE TABLE ranks(grp int,rank int);
INSERT INTO ranks VALUES(1, 4 ),(1, 7 ),(1, 9 ),(2, 2 ),(2, 3 ),(2, 5 ),(2, 6 ),(2, 8 ),(3, 1 ),(4,11 ),(4,12 ),(4,13 );
and you wish to list ranks by group omitting the leading rank in each group. The simplest query for group leaders is ...
SELECT grp, MIN(rank) as top
FROM ranks r2
GROUP BY grp
+------+------+
| grp | top |
+------+------+
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
| 4 | 11 |
+------+------+
The simplest way to get a result that omits these is an exclusion join from the ranks table to the above result:
SELECT r1.grp, r1.rank
FROM ranks r1
LEFT JOIN (
SELECT grp, MIN(rank) as top
FROM ranks r2
GROUP BY grp
) AS r2 ON r1.grp=r2.grp AND r1.rank = r2.top
WHERE r2.grp IS NULL
ORDER BY grp, rank;
+------+------+
| grp | rank |
+------+------+
| 1 | 7 |
| 1 | 9 |
| 2 | 3 |
| 2 | 5 |
| 2 | 6 |
| 2 | 8 |
| 4 | 12 |
| 4 | 13 |
+------+------+
Aggregates of specified size
Find the values of a table column c1 for which there are a specified number of listed values in another column c2.
To get an overview of the values of c2 for each value of c1:
SELECT
c1,
GROUP_CONCAT(c2 ORDER BY c2) AS 'C2 values'
FROM tbl
GROUP BY c1;
To retrieve a list of c1 values for which there exist specific values in another column c2, you need an IN clause specifying the c2 values and a HAVING clause specifying the required number of items in the list ...
SELECT c1
FROM tbl
WHERE c2 IN (1,2,3,4)
GROUP BY c1
HAVING COUNT(c2)=4;
This is easy to generalise to multiple column expressions, and a HAVING clause specifying any number of items from the IN list.
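To list c1 values that have at least one instance of each c2 value, however many duplicates there are, add DISTINCT to the count criterion. (To require exactly one instance of each, test both COUNT(c2) and COUNT(DISTINCT c2) in the HAVING clause.)
SELECT c1
FROM tbl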
WHERE c2 IN (1,2,3,4)
GROUP BY c1
HAVING COUNT(DISTINCT c2)=4;
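A small worked example may make the difference between the count criteria concrete. The table name c1c2 and its rows are made up for illustration:

```sql
DROP TABLE IF EXISTS c1c2;
CREATE TABLE c1c2 (c1 CHAR(1), c2 INT);
INSERT INTO c1c2 VALUES
  ('a',1),('a',2),('a',3),('a',4),          -- one row for each listed value
  ('b',1),('b',1),('b',2),('b',3),          -- four matching rows, but no 4
  ('c',1),('c',2),('c',3),('c',4),('c',4);  -- every value, plus a duplicate

-- Exactly four matching rows: returns a and b
SELECT c1 FROM c1c2 WHERE c2 IN (1,2,3,4)
GROUP BY c1 HAVING COUNT(c2) = 4;

-- All four distinct values present: returns a and c
SELECT c1 FROM c1c2 WHERE c2 IN (1,2,3,4)
GROUP BY c1 HAVING COUNT(DISTINCT c2) = 4;

-- Exactly one row for each of the four values: returns only a
SELECT c1 FROM c1c2 WHERE c2 IN (1,2,3,4)
GROUP BY c1 HAVING COUNT(c2) = 4 AND COUNT(DISTINCT c2) = 4;
```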
Avoiding repeat aggregation
In a good introductory tutorial on MySQL subqueries, Jeremy Cole developed a triply nested query to retrieve the name, population, head of state, and number of official languages in countries with the most official languages. The query uses two tables from the MySQL world database:
CREATE TABLE country (
Code char(3) NOT NULL DEFAULT '' PRIMARY KEY,
Name char(52) NOT NULL DEFAULT '',
Population int(11) NOT NULL DEFAULT '0',
HeadOfState char(60) DEFAULT NULL,
... other columns ...
);
CREATE TABLE countrylanguage (
CountryCode char(3) NOT NULL DEFAULT '',
Language char(30) NOT NULL DEFAULT '',
IsOfficial enum('T','F') NOT NULL DEFAULT 'F',
Percentage float(4,1) NOT NULL DEFAULT '0.0',
PRIMARY KEY (CountryCode, Language)
);
The query needs to aggregate language counts by country twice: once for all language counts by country, and once again to identify countries with the highest number of languages:
SELECT name, population, headofstate, top.num
FROM Country
JOIN (
SELECT countrycode, COUNT(*) AS num
FROM CountryLanguage
WHERE isofficial='T'
GROUP BY countrycode
HAVING num = (
SELECT MAX(summary.nr_official_languages)
FROM (
SELECT countrycode, COUNT(*) AS nr_official_languages
FROM CountryLanguage
WHERE isofficial='T'
GROUP BY countrycode
) AS summary
)
) as top ON Country.code=top.countrycode;
+--------------+------------+-------------+-----+
| name | population | headofstate | num |
+--------------+------------+-------------+-----+
| Switzerland | 7160400 | Adolf Ogi | 4 |
| South Africa | 40377000 | Thabo Mbeki | 4 |
+--------------+------------+-------------+-----+
In addition, one of the nested subqueries is buried in a HAVING clause. This is fine with small tables, but if the table being aggregated is very large and the aggregation is complex, performance may be unsatisfactory. Substituting a temporary table for the double nesting can improve performance in two ways:
- the aggregation needs to be done just once
- we can use an exclusion join, which is usually faster than a HAVING clause, to find countries with the maximum counts:
DROP TABLE IF EXISTS top;
CREATE TABLE top ENGINE=MEMORY
SELECT countrycode, COUNT(*) AS num
FROM CountryLanguage l1
WHERE isofficial='T'
GROUP BY countrycode;
SELECT name,population,headofstate,t3.num
FROM country c
JOIN (
SELECT t1.countrycode, t1.num
FROM top t1
LEFT JOIN top t2 ON t1.num < t2.num
WHERE t2.countrycode IS NULL
) AS t3 ON c.code=t3.countrycode;
+--------------+------------+-------------+-----+
| name | population | headofstate | num |
+--------------+------------+-------------+-----+
| Switzerland | 7160400 | Adolf Ogi | 4 |
| South Africa | 40377000 | Thabo Mbeki | 4 |
+--------------+------------+-------------+-----+
DROP TABLE top;
You notice that we haven't actually used a TEMPORARY table? Indeed we haven't, because of the MySQL limitation that temporary tables cannot be referenced multiple times in a query. Until that's lifted, we get almost as much speed improvement from using a MEMORY table as a temporary table.
Cascading aggregates
When you have parent-child-grandchild tables, eg companies, users, actions, and your query requirement is for per-parent aggregates from the child table and per-child aggregates from the grandchild table, then cascading joins yield spuriously multiplied counts, and correlated subqueries fail because the second correlated subquery cannot find a visible joining column.
One solution is to use derived tables. Assuming ...
CREATE TABLE companies (id int, name char(10));
CREATE TABLE users (id INT,companyid INT);
CREATE TABLE actions (id INT, userid INT, date DATE);
then...
- Join companies & users once to establish a derived company-user table.
- Join them a second time, this time aggregating on users.id to generate user counts per company.
- Join the first derived table to the actions table, aggregating on actions.id to report actions per user per company.
Here is the SQL:
SELECT cu1.cid, cu1.cname, cu1.uid, cu2.uCnt, ua.aCnt
FROM (
SELECT c.id AS cid, c.name AS cname, u1.id AS uid
FROM companies c
INNER JOIN users u1 ON u1.companyid=c.id
) AS cu1
INNER JOIN (
SELECT c.id AS cid, COUNT(u2.id) AS uCnt
FROM companies c
INNER JOIN users u2 ON u2.companyid=c.id
GROUP BY c.id
) AS cu2 ON cu1.cid=cu2.cid
INNER JOIN (
SELECT u3.id AS uid, COUNT(a.id) AS aCnt
FROM users u3
INNER JOIN actions a ON a.userid=u3.id
GROUP BY u3.id
) AS ua ON ua.uid=cu1.uid;
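To see the multiplication that the derived tables avoid, here is a minimal sketch of the naive cascading join. The company names and rows are invented sample data:

```sql
DROP TABLE IF EXISTS companies, users, actions;
CREATE TABLE companies (id INT, name CHAR(10));
CREATE TABLE users (id INT, companyid INT);
CREATE TABLE actions (id INT, userid INT, date DATE);
INSERT INTO companies VALUES (1,'Acme'),(2,'Bolt');
INSERT INTO users VALUES (1,1),(2,1),(3,2);
INSERT INTO actions VALUES
  (1,1,'2008-01-01'),(2,1,'2008-01-02'),(3,2,'2008-01-03'),(4,3,'2008-01-04');

-- Naive cascading join: each user row is repeated once per action,
-- so the user count is inflated by the grandchild rows.
SELECT c.id, COUNT(u.id) AS uCnt, COUNT(a.id) AS aCnt
FROM companies c
JOIN users u ON u.companyid = c.id
JOIN actions a ON a.userid = u.id
GROUP BY c.id
ORDER BY c.id;
-- company 1: uCnt = 3 (it has only 2 users), aCnt = 3
-- company 2: uCnt = 1, aCnt = 1
```

Run against the same rows, the derived-table query above reports uCnt = 2 for company 1, with one result row per user carrying that user's own action count.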
Cross-aggregates
Given the table authorbook(authid INT, bookid INT), what query finds the books whose authors have more than one book in the table?
Even one level of nesting can induce a mild trance. Escape the trance by taking the problem one step at a time. First write the query that finds the authors with multiple books. Then join an outer query to that on authid, and have the outer query select bookid:
SELECT a1.bookid
FROM authorbook a1
INNER JOIN (
SELECT authid,count(bookid)
FROM authorbook a2
GROUP BY authid
HAVING COUNT(bookid)>1
) AS a3 ON a1.authid=a3.authid;
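A minimal run of the same two-step idea, on invented rows. The DISTINCT is an addition of this sketch: without it, a book that has two multi-book authors would be listed twice.

```sql
DROP TABLE IF EXISTS authorbook;
CREATE TABLE authorbook (authid INT, bookid INT);
INSERT INTO authorbook VALUES (1,10),(1,11),(2,11),(3,12);

-- Author 1 has two books (10 and 11); authors 2 and 3 have one each.
SELECT DISTINCT a1.bookid
FROM authorbook a1
INNER JOIN (
  SELECT authid              -- step 1: authors with more than one book
  FROM authorbook
  GROUP BY authid
  HAVING COUNT(bookid) > 1
) AS a3 ON a1.authid = a3.authid
ORDER BY a1.bookid;
-- returns bookid 10 and 11
```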
Group data by datetime periods
To group rows by a time period whose length in minutes divides evenly into 60, use this formula:
GROUP BY ((60/periodMinutes) * HOUR( thistime ) + FLOOR( MINUTE( thistime ) / periodMinutes ))
where thistime is the TIME column and periodMinutes is the period length in minutes. So to group by 15-min periods, write ...
SELECT ...
GROUP BY ( 4 * HOUR( thistime ) + FLOOR( MINUTE( thistime ) / 15 ))
...
The same logic works for months, except that MONTH() counts from 1 rather than 0, so subtract 1 before dividing ...
GROUP BY ((12/periodMonths) * YEAR( thisdate ) + FLOOR( (MONTH( thisdate ) - 1) / periodMonths ))
It could be made to work for weeks with a function that maps the results of WEEK() to the range 1...52.
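When the desired grouping period is a value returned by a MySQL date-time function, matters become simpler: just group by the desired value. Thus to group by weeks within a single year, write ...
SELECT ...
GROUP BY WEEK( datecol )
...
If the data spans more than one year, group by YEARWEEK( datecol ) instead, so that the same week number in different years is kept apart.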
If there is no MySQL date-time function that returns the desired grouping period, you will need to write your own stored function.
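Here is a short sketch of the 15-minute formula at work. The table hits and its rows are hypothetical, and the period_start column, which converts the period number back into a clock time with SEC_TO_TIME(), is an extra added for readability:

```sql
DROP TABLE IF EXISTS hits;
CREATE TABLE hits (thistime TIME);
INSERT INTO hits VALUES
  ('09:02:00'),('09:14:59'),('09:15:00'),('09:44:00'),('10:00:00');

SELECT
  4 * HOUR(thistime) + FLOOR(MINUTE(thistime) / 15) AS period,
  -- period number * 15 minutes * 60 seconds = start of the period
  SEC_TO_TIME((4 * HOUR(thistime) + FLOOR(MINUTE(thistime) / 15)) * 15 * 60) AS period_start,
  COUNT(*) AS n
FROM hits
GROUP BY period, period_start
ORDER BY period;
-- period 36 (09:00:00) n=2, 37 (09:15:00) n=1, 38 (09:30:00) n=1, 40 (10:00:00) n=1
```

Note that the formula numbers periods within a day only. If the column holds DATETIME values covering several days, the date would need to be added to the GROUP BY as well.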
Sales commissions: double aggregation
Employees' sales commission rates increase as sales totals increase, according to specified bands of sales total amounts—like a graduated income tax in reverse. How do we automate sales commission calculations?
DROP TABLE IF EXISTS tblsales, tblcom;
CREATE TABLE tblsales(employeeID int,sales int);
INSERT INTO tblsales VALUES(1,2),(1,5),(1,7),(2,9),(2,15),(2,12);
SELECT * FROM tblsales;
+------------+-------+
| employeeID | sales |
+------------+-------+
| 1 | 2 |
| 1 | 5 |
| 1 | 7 |
| 2 | 9 |
| 2 | 15 |
| 2 | 12 |
+------------+-------+
CREATE TABLE tblcom(
comstart DECIMAL(6,2),
commend DECIMAL(6,2),
comfactor DECIMAL(6,2),
pct INT
);
INSERT INTO tblcom VALUES
(1.00,10.00,0.10,10),(11.00,20.00,0.20,20),(21.00,30.00,0.30,30),(31.00,40.00,0.40,40);
SELECT * FROM tblcom;
+----------+---------+-----------+------+
| comstart | commend | comfactor | pct |
+----------+---------+-----------+------+
| 1.00 | 10.00 | 0.10 | 10 |
| 11.00 | 20.00 | 0.20 | 20 |
| 21.00 | 30.00 | 0.30 | 30 |
| 31.00 | 40.00 | 0.40 | 40 |
+----------+---------+-----------+------+
The first problem is to work out how commission ranges map to sales totals to determine base amounts for calculation of each part-commission. We assume the ranges are inclusive, ie a range that starts at 1 euro is meant to include that first euro:
- if amt < comstart, base amount = 0
- if amt <= commend, base amount = amt-comstart+1
- if amt > commend, base amount = commend - comstart+1
This is a nested IF():
IF(s.amt<c.comstart,0,IF(s.amt<=c.commend,s.amt-c.comstart+1,c.commend-c.comstart+1))
The second problem is how to apply every commission range row to every employee sales sum. That's a CROSS JOIN between aggregated sales and commissions:
SELECT *
FROM (
SELECT employeeID,SUM(sales) AS amt
FROM tblsales
GROUP BY employeeID
) AS s
JOIN tblcom
ORDER BY s.employeeID;
+------------+------+----------+---------+-----------+------+
| employeeID | amt | comstart | commend | comfactor | pct |
+------------+------+----------+---------+-----------+------+
| 1 | 14 | 1.00 | 10.00 | 0.10 | 10 |
| 1 | 14 | 11.00 | 20.00 | 0.20 | 20 |
| 1 | 14 | 21.00 | 30.00 | 0.30 | 30 |
| 1 | 14 | 31.00 | 40.00 | 0.40 | 40 |
| 2 | 36 | 31.00 | 40.00 | 0.40 | 40 |
| 2 | 36 | 1.00 | 10.00 | 0.10 | 10 |
| 2 | 36 | 11.00 | 20.00 | 0.20 | 20 |
| 2 | 36 | 21.00 | 30.00 | 0.30 | 30 |
+------------+------+----------+---------+-----------+------+
Now check how the formula applies on every commission band for every sales total:
SELECT
s.employeeID,s.amt,c.comstart,c.commend,
IF(s.amt<c.comstart,0,
IF(s.amt<=c.commend,s.amt-c.comstart+1,c.commend-c.comstart+1)
) AS base,
c.comfactor AS ComPct,
IF(s.amt<c.comstart,0,
IF(s.amt<=c.commend,s.amt-c.comstart+1,c.commend-c.comstart+1)
) * c.comfactor AS Comm
FROM (
SELECT employeeID,SUM(sales) AS amt
FROM tblsales
GROUP BY employeeID
) AS s
CROSS JOIN tblcom c
ORDER BY s.employeeID,c.comstart;
+------------+------+----------+---------+-------+--------+--------+
| employeeID | amt | comstart | commend | base | ComPct | Comm |
+------------+------+----------+---------+-------+--------+--------+
| 1 | 14 | 1.00 | 10.00 | 10.00 | 0.10 | 1.0000 |
| 1 | 14 | 11.00 | 20.00 | 4.00 | 0.20 | 0.8000 |
| 1 | 14 | 21.00 | 30.00 | 0.00 | 0.30 | 0.0000 |
| 1 | 14 | 31.00 | 40.00 | 0.00 | 0.40 | 0.0000 |
| 2 | 36 | 1.00 | 10.00 | 10.00 | 0.10 | 1.0000 |
| 2 | 36 | 11.00 | 20.00 | 10.00 | 0.20 | 2.0000 |
| 2 | 36 | 21.00 | 30.00 | 10.00 | 0.30 | 3.0000 |
| 2 | 36 | 31.00 | 40.00 | 6.00 | 0.40 | 2.4000 |
+------------+------+----------+---------+-------+--------+--------+
Finally, SUM the formula results to aggregate commissions on aggregated sales:
SELECT
s.employeeID,
s.amt,
SUM(IF(s.amt<c.comstart,0,
IF(s.amt<=c.commend,s.amt-c.comstart+1,c.commend-c.comstart+1)) * c.pct/100
) AS Comm
FROM (
SELECT employeeID,SUM(sales) AS amt
FROM tblsales
GROUP BY employeeID
) AS s
CROSS JOIN tblcom c
GROUP BY s.employeeID, s.amt;
+------------+------+----------+
| employeeID | amt | Comm |
+------------+------+----------+
| 1 | 14 | 1.800000 |
| 2 | 36 | 8.400000 |
+------------+------+----------+
Show only one child row per parent row
Given tables parent(id int not null primary key, etc...) and child (id int not null primary key, pid int not null references parent (id), etc...), how do we write a query that retrieves only one child row per pid even when the child table has multiple matching rows? MySQL permits use of GROUP BY even when the SELECT list specifies no aggregate function (provided the ONLY_FULL_GROUP_BY SQL mode, which is on by default from MySQL 5.7.5, is disabled), so this will work:
select p.id, c.id
from parent p
join child c on p.id=c.pid
group by p.id;
But is it accurate? No, because for each parent it displays whichever c.id value it happens to find first, and that choice is arbitrary. For further discussion see 'Within-group aggregates'.
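If any one child row will do but the choice should be repeatable, one option is to pick it with an aggregate such as MIN(). That the lowest child id is the one wanted is an assumption of this sketch, as are the sample rows:

```sql
DROP TABLE IF EXISTS child, parent;
CREATE TABLE parent (id INT NOT NULL PRIMARY KEY);
CREATE TABLE child (id INT NOT NULL PRIMARY KEY, pid INT NOT NULL);
INSERT INTO parent VALUES (1),(2);
INSERT INTO child VALUES (10,1),(11,1),(12,2);

-- One child per parent, chosen deterministically: the lowest child id.
SELECT p.id AS pid, MIN(c.id) AS cid
FROM parent p
JOIN child c ON c.pid = p.id
GROUP BY p.id
ORDER BY p.id;
-- returns (1,10) and (2,12)
```

Since every selected column is either grouped or aggregated, this form also runs with ONLY_FULL_GROUP_BY enabled.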
Skip repeating values
You want to report all unique values of a column and skip all rows repeating any of these values.
SELECT col
FROM foo
GROUP BY col
Within-group aggregates
You have a products table with columns item, supplier, price:
DROP TABLE IF EXISTS products;
CREATE TABLE products(item int,supplier int,price decimal(6,2));
INSERT INTO products VALUES(1,1,10),(1,2,15),(2,2,20),(2,1,21),(2,2,18);
SELECT * FROM products;
+------+----------+-------+
| item | supplier | price |
+------+----------+-------+
| 1 | 1 | 10.00 |
| 1 | 2 | 15.00 |
| 2 | 2 | 20.00 |
| 2 | 1 | 21.00 |
| 2 | 2 | 18.00 |
+------+----------+-------+
Multiple suppliers offer various prices for the same item. You need to find the supplier with the lowest price for each item.
Your first thought may be to GROUP BY item, but that is not guaranteed to return the correct supplier value for each minimum item price. Grouping by both item and supplier will return more information than you want. Nor can you write WHERE price=MIN(...) because the query engine will evaluate the WHERE clause before it knows the MIN value.
This is the problem of aggregating within aggregates. It is sometimes called the 'groupwise aggregates' problem, but the term 'groupwise' is ambiguous at best, so we think better names for it are subaggregates, inner aggregates, or within-group aggregates.
It's easy to show that the within-group aggregates problem is a form of the problem of returning values from non-grouping columns in an aggregate query. Suppose you write ...
SELECT item,supplier,MIN(price)
FROM products
GROUP BY item;
Will this reliably return the correct supplier per item? No. Unless there is exactly one supplier per item, the supplier value returned will be arbitrary. To retrieve the correct supplier for each item, you need more logic.
The simplest and often best-performing solution to the within-group aggregates problem is an outer self exclusion join...
SELECT p1.item,p1.supplier,p1.price
FROM products AS p1
LEFT JOIN products AS p2 ON p1.item = p2.item AND p1.price > p2.price
WHERE p2.item IS NULL;
...because in the resultset built by joining on left item = right item and left price > right price, the left-sided rows for which there is no smaller right-sided price are precisely the per-item rows with the smallest prices.
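To see why the exclusion join works, it can help to look at the join before the WHERE clause filters it. This sketch rebuilds the products table and lists every row next to each cheaper row for the same item; the column alias cheaper is illustrative:

```sql
DROP TABLE IF EXISTS products;
CREATE TABLE products (item INT, supplier INT, price DECIMAL(6,2));
INSERT INTO products VALUES (1,1,10),(1,2,15),(2,2,20),(2,1,21),(2,2,18);

-- The same join without the WHERE filter.
SELECT p1.item, p1.supplier, p1.price, p2.price AS cheaper
FROM products AS p1
LEFT JOIN products AS p2 ON p1.item = p2.item AND p1.price > p2.price
ORDER BY p1.item, p1.price, p2.price;
-- item 1: (supplier 1, 10.00, NULL), (supplier 2, 15.00, 10.00)
-- item 2: (supplier 2, 18.00, NULL), (supplier 2, 20.00, 18.00),
--         (supplier 1, 21.00, 18.00), (supplier 1, 21.00, 20.00)
```

Only the cheapest row per item finds no partner, so its right-hand columns are NULL, and those are the rows the IS NULL test keeps.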
You can also accomplish this by building a table of aggregated minimum prices. Before MySQL 4.1, it has to be a temporary table:
CREATE TEMPORARY TABLE tmp (
item INT,
minprice DECIMAL(6,2) DEFAULT 0.0
);
LOCK TABLES products READ;
INSERT INTO tmp
SELECT item, MIN(price)
FROM products
GROUP BY item;
to which you then join the products table:
SELECT products.item, supplier, products.price
FROM products
JOIN tmp ON products.item = tmp.item
WHERE products.price=tmp.minprice;
UNLOCK TABLES;
DROP TABLE tmp;
From MySQL 4.1 on, a correlated subquery can take the place of the temporary table. This is the most intuitively obvious syntax for the problem. Often it's also the slowest solution:
SELECT item, supplier, price
FROM products AS p1
WHERE price = (
SELECT MIN(p2.price)
FROM products AS p2
WHERE p1.item = p2.item
);
Another solution, sometimes the fastest of all, is to move the aggregating subquery from the WHERE clause to the FROM clause:
SELECT p.item, p.supplier, p.price
FROM products AS p
JOIN (
SELECT item, MIN(price) AS minprice
FROM products
GROUP BY item
) AS pm ON p.item = pm.item AND p.price = pm.minprice;
Try all solutions to find which is fastest for your version of the problem.
To find more than one value per group, you might think the LIMIT clause would work, but LIMIT is limited in subqueries. See Within-group quotas.
Within-group aggregates with a wrinkle
We have a wages table holding wage rates by waiter and startdate, and a tips table which tracks hours worked and tips received per waiter per day. The requirement is to report wages and concurrent tips per waiter per day.
DROP TABLE IF EXISTS wages,tips;
CREATE TABLE wages( id int, waiter int, start date, rate decimal(6,2));
INSERT INTO wages VALUES
( 1, 4, '2005-01-01', 5.00 ),
( 2, 4, '2005-03-01', 6.00 ),
( 3, 5, '2007-01-05', 7.00 ),
( 4, 5, '2008-03-20', 8.00 ),
( 5, 5, '2008-04-01', 9.00 );
CREATE TABLE tips(
id int,
date date,
waiter int,
hours_worked smallint,
tabs smallint,
tips decimal(6,2)
);
INSERT INTO tips VALUES
( 1, '2008-02-29', 4, 6.50, 21, 65.25 ),
( 2, '2008-03-06', 5, 6.00, 15, 51.75 ),
( 3, '2008-03-21', 4, 2.50, 5, 17.85 ),
( 4, '2008-03-22', 5, 5.25, 10, 39.00 );
SELECT * FROM wages;
+------+--------+------------+------+
| id | waiter | start | rate |
+------+--------+------------+------+
| 1 | 4 | 2005-01-01 | 5.00 |
| 2 | 4 | 2005-03-01 | 6.00 |
| 3 | 5 | 2007-01-05 | 7.00 |
| 4 | 5 | 2008-03-20 | 8.00 |
| 5 | 5 | 2008-04-01 | 9.00 |
+------+--------+------------+------+
SELECT * FROM tips;
+------+------------+--------+--------------+------+-------+
| id | date | waiter | hours_worked | tabs | tips |
+------+------------+--------+--------------+------+-------+
| 1 | 2008-02-29 | 4 | 7 | 21 | 65.25 |
| 2 | 2008-03-06 | 5 | 6 | 15 | 51.75 |
| 3 | 2008-03-21 | 4 | 3 | 5 | 17.85 |
| 4 | 2008-03-22 | 5 | 5 | 10 | 39.00 |
+------+------------+--------+--------------+------+-------+
For the above dataset, the result which correctly matches wages and tips would be:
+------+------------+------+------+-------+------+--------+------+------------+
| tid | Date | Hrs | tabs | tips | wid | waiter | rate | start |
+------+------------+------+------+-------+------+--------+------+------------+
| 1 | 2008-02-29 | 7 | 21 | 65.25 | 2 | 4 | 6.00 | 2005-03-01 |
| 2 | 2008-03-06 | 6 | 15 | 51.75 | 3 | 5 | 7.00 | 2007-01-05 |
| 3 | 2008-03-21 | 3 | 5 | 17.85 | 2 | 4 | 6.00 | 2005-03-01 |
| 4 | 2008-03-22 | 5 | 10 | 39.00 | 4 | 5 | 8.00 | 2008-03-20 |
+------+------------+------+------+-------+------+--------+------+------------+
Why is this different from an ordinary within-groups aggregate? The correct wage rate for a tips row is not the wages row for that waiter with the latest date; it is the wages row having the latest start date on or before the date in the given tips row.
One way to proceed is to build a work table from a join of wages to tips on waiter and wages.start <= tips.date, then exclusion-join that result to itself to remove all rows except those with the latest wage rate per tips row. A two-step:
-- wages-tips join removing later wage changes:
DROP TABLE IF EXISTS tmp;
CREATE TABLE tmp
SELECT
t.id AS tid, t.date AS Date, t.hours_worked AS Hrs,t.tabs,t.tips,
w.id AS wid, w.waiter, w.rate, w.start
FROM tips t
JOIN wages w ON w.waiter=t.waiter AND w.start<=t.date;
-- self-exclusion join to remove obsolete wage rows:
SELECT t1.*
FROM tmp t1
LEFT JOIN tmp t2 ON t1.tid=t2.tid and t1.start<t2.start
WHERE t2.waiter is null
ORDER BY t1.Date;
+------+------------+------+------+-------+------+--------+------+------------+
| tid | Date | Hrs | tabs | tips | wid | waiter | rate | start |
+------+------------+------+------+-------+------+--------+------+------------+
| 1 | 2008-02-29 | 7 | 21 | 65.25 | 2 | 4 | 6.00 | 2005-03-01 |
| 2 | 2008-03-06 | 6 | 15 | 51.75 | 3 | 5 | 7.00 | 2007-01-05 |
| 3 | 2008-03-21 | 3 | 5 | 17.85 | 2 | 4 | 6.00 | 2005-03-01 |
| 4 | 2008-03-22 | 5 | 10 | 39.00 | 4 | 5 | 8.00 | 2008-03-20 |
+------+------------+------+------+-------+------+--------+------+------------+
DROP TABLE tmp;
That's fine, but can we skip the temp table? Yes—by adding the condition wages.start <= tips.date to each side of the exclusion join:
SELECT
t.id AS tid, t.date, t.hours_worked AS Hrs,t.tabs,t.tips,
w.id AS wid, w.waiter, w.rate, w.start
FROM tips t
JOIN wages w ON w.waiter=t.waiter AND w.start <= t.date
LEFT JOIN wages w2 ON w.waiter=w2.waiter AND w2.start<=t.date AND w.start<w2.start
WHERE w2.id IS NULL
ORDER BY t.date;
Much simpler, and it gives the same result as the two-step.
Within-group quotas (Top N per group)
A table has multiple rows per key value, and you need to retrieve, say, the first or earliest two rows per key.
If the groups are fairly small, this can be done efficiently with a self-join and counts. For example the following table (based on a tip by Rudy Limeback) has three small data groups:
DROP TABLE IF EXISTS test;
CREATE TABLE test (
id INT,
entrydate DATE
);
INSERT INTO test VALUES
( 1, '2007-5-01' ),
( 1, '2007-5-02' ),
( 1, '2007-5-03' ),
( 1, '2007-5-04' ),
( 1, '2007-5-05' ),
( 1, '2007-5-06' ),
( 2, '2007-6-01' ),
( 2, '2007-6-02' ),
( 2, '2007-6-03' ),
( 2, '2007-6-04' ),
( 3, '2007-7-01' ),
( 3, '2007-7-02' ),
( 3, '2007-7-03' );
The first two rows per ID are the rows which, for a given ID, have two or fewer rows with the same or earlier dates (the row itself counts as one of them). If we use an inequality join with the COUNT(*) function to find the earlier rows per ID ...
SELECT t1.id, t1.entrydate, COUNT(*) AS earlier
FROM test AS t1
JOIN test AS t2 ON t1.id=t2.id AND t1.entrydate >= t2.entrydate
GROUP BY t1.id, t1.entrydate
+------+------------+---------+
| id | entrydate | earlier |
+------+------------+---------+
| 1 | 2007-05-01 | 1 |
| 1 | 2007-05-02 | 2 |
| 1 | 2007-05-03 | 3 |
| 1 | 2007-05-04 | 4 |
| 1 | 2007-05-05 | 5 |
| 1 | 2007-05-06 | 6 |
| 2 | 2007-06-01 | 1 |
| 2 | 2007-06-02 | 2 |
| 2 | 2007-06-03 | 3 |
| 2 | 2007-06-04 | 4 |
| 3 | 2007-07-01 | 1 |
| 3 | 2007-07-02 | 2 |
| 3 | 2007-07-03 | 3 |
+------+------------+---------+
... then we get our result immediately by removing rows where the 'earlier' count exceeds 2:
SELECT t1.id, t1.entrydate, count(*) AS earlier
FROM test AS t1
JOIN test AS t2 ON t1.id=t2.id AND t1.entrydate >= t2.entrydate
GROUP BY t1.id, t1.entrydate
HAVING earlier <= 2;
+------+------------+---------+
| id | entrydate | earlier |
+------+------------+---------+
| 1 | 2007-05-01 | 1 |
| 1 | 2007-05-02 | 2 |
| 2 | 2007-06-01 | 1 |
| 2 | 2007-06-02 | 2 |
| 3 | 2007-07-01 | 1 |
| 3 | 2007-07-02 | 2 |
+------+------------+---------+
This works beautifully with smallish aggregates. But the query algorithm compares every within-group row to every other within-group row. As the size N of a group increases, execution time increases as N*N. If the query takes one minute for groups of 1,000, it will take 16 minutes for groups of 4,000, and more than four hours for groups of 16,000.
The solution does not scale.
What to do? Forget GROUP BY! Manually assemble the desired query results in a temporary table from simple indexed queries, in this case, two rows per ID:
DROP TEMPORARY TABLE IF EXISTS earliers;
CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE);
INSERT INTO earliers
SELECT id,entrydate FROM test WHERE id=1 ORDER BY entrydate LIMIT 2;
INSERT INTO earliers
SELECT id,entrydate FROM test WHERE id=2 ORDER BY entrydate LIMIT 2;
INSERT INTO earliers
SELECT id,entrydate FROM test WHERE id=3 ORDER BY entrydate LIMIT 2;
You need one INSERT statement per grouping value. To print the result, just query the earliers table:
SELECT * FROM earliers
ORDER BY id, entrydate;
+------+------------+
| id | entrydate |
+------+------------+
| 1 | 2007-05-01 |
| 1 | 2007-05-02 |
| 2 | 2007-06-01 |
| 2 | 2007-06-02 |
| 3 | 2007-07-01 |
| 3 | 2007-07-02 |
+------+------------+
DROP TEMPORARY TABLE earliers;
Most useful reports run again and again. If that's the case for yours, automate it in a stored procedure: using a cursor and a prepared statement, auto-generate an INSERT statement for every grouping value, and return the result:
DROP PROCEDURE IF EXISTS listearliers;
DELIMITER |
CREATE PROCEDURE listearliers()
BEGIN
DECLARE curdone, vid INT DEFAULT 0;
DECLARE idcur CURSOR FOR SELECT DISTINCT id FROM test;
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET curdone = 1;
DROP TEMPORARY TABLE IF EXISTS earliers;
CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE);
SET @sql = 'INSERT INTO earliers SELECT id,entrydate FROM test WHERE id=? ORDER BY entrydate LIMIT 2';
OPEN idcur;
REPEAT
FETCH idcur INTO vid;
IF NOT curdone THEN
BEGIN
SET @vid = vid;
PREPARE stmt FROM @sql;
EXECUTE stmt USING @vid;
DROP PREPARE stmt;
END;
END IF;
UNTIL curdone END REPEAT;
CLOSE idcur;
SELECT * FROM earliers ORDER BY id,entrydate;
DROP TEMPORARY TABLE earliers;
END;
|
DELIMITER ;
CALL listearliers();
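On MySQL 8.0 or later, which added window functions, the per-group quota can also be sketched without a self-join or a stored procedure. This is a different technique from the ones above, offered only as an alternative; the alias numbered and the column rn are illustrative names, and the table is a shortened copy of the test table:

```sql
DROP TABLE IF EXISTS test;
CREATE TABLE test (id INT, entrydate DATE);
INSERT INTO test VALUES
  (1,'2007-05-01'),(1,'2007-05-02'),(1,'2007-05-03'),
  (2,'2007-06-01'),(2,'2007-06-02'),(2,'2007-06-03'),
  (3,'2007-07-01'),(3,'2007-07-02'),(3,'2007-07-03');

SELECT id, entrydate
FROM (
  SELECT id, entrydate,
         -- number the rows of each id from its earliest date onward
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY entrydate) AS rn
  FROM test
) AS numbered
WHERE rn <= 2
ORDER BY id, entrydate;
-- returns the two earliest dates for each id, six rows in all
```

Changing the quota is then a matter of changing the constant in the WHERE clause.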
Average the top 50% values per group
Each row of a games table records one game score for a team:
DROP TABLE IF EXISTS games;
CREATE TABLE games(id INT, teamID INT, score INT);
INSERT INTO games VALUES
(1,1,3),(2,1,4),(3,1,5),(4,1,6),(5,2,6),
(6,2,7),(7,2,8),(8,2,7),(9,2,6),(10,2,7);
How would we write a query that returns the average of the top 50% of scores per team?
The per-team median value is its middle value--lower than the highest 50% and higher than the lowest 50% of values for that team--so a shortcut is to query the team medians, then aggregate on a join that selects per-team scores above the medians.
How to find per-team medians? If a resultset has an odd number of rows, at least one row has the true median score. If it has an even number of rows, the median score is an average of two central values. The following query adapts Joe Celko's formula in "SQL for Smarties" averaging "low" and "high" medians:
DROP TABLE IF EXISTS medians;
CREATE TABLE medians
-- inner query keeps the middle score(s) per team; outer query averages them
SELECT teamid, AVG(score) AS median
FROM (
SELECT p1.teamid, p1.score
FROM games AS p1
JOIN games AS p2 ON p1.teamid=p2.teamid
GROUP BY p1.teamid, p1.score
HAVING (
SUM(CASE WHEN p2.score <= p1.score THEN 1 ELSE 0 END) >= (COUNT(*) + 1) DIV 2
AND
SUM(CASE WHEN p2.score >= p1.score THEN 1 ELSE 0 END) >= COUNT(*) DIV 2 + 1
)
OR (
SUM(CASE WHEN p2.score >= p1.score THEN 1 ELSE 0 END) >= (COUNT(*) + 1) DIV 2
AND
SUM(CASE WHEN p2.score <= p1.score THEN 1 ELSE 0 END) >= COUNT(*) DIV 2 + 1
)
) AS mid
GROUP BY teamid;
SELECT * FROM medians;
+--------+--------+
| teamid | median |
+--------+--------+
| 1 | 4.5000 |
| 2 | 7.0000 |
+--------+--------+
Now join games to medians accepting only top-half values:
SELECT g.teamid, AVG(g.score) AS Top50Avg
FROM games g
JOIN medians m ON g.teamid = m.teamid AND g.score >= m.median
GROUP BY g.teamid
ORDER BY Top50Avg DESC;
+--------+----------+
| teamid | Top50Avg |
+--------+----------+
| 2 | 7.2500 |
| 1 | 5.5000 |
+--------+----------+
DROP TABLE medians;
Yes, all the logic can be moved into one query:
SELECT g.teamid, AVG(g.score) AS Top50Avg
FROM games g
JOIN (
SELECT teamid, AVG(score) AS median
FROM (
SELECT p1.teamid, p1.score
FROM games AS p1
JOIN games AS p2 ON p1.teamid=p2.teamid
GROUP BY p1.teamid, p1.score
HAVING (
SUM(CASE WHEN p2.score <= p1.score THEN 1 ELSE 0 END) >= (COUNT(*) + 1) DIV 2
AND
SUM(CASE WHEN p2.score >= p1.score THEN 1 ELSE 0 END) >= COUNT(*) DIV 2 + 1
)
OR (
SUM(CASE WHEN p2.score >= p1.score THEN 1 ELSE 0 END) >= (COUNT(*) + 1) DIV 2
AND
SUM(CASE WHEN p2.score <= p1.score THEN 1 ELSE 0 END) >= COUNT(*) DIV 2 + 1
)
) AS mid
GROUP BY teamid
) AS m ON g.teamid = m.teamid AND g.score >= m.median
GROUP BY g.teamid
ORDER BY Top50Avg DESC;
Averages from bands of values
To count and average scores in bands of 10, ie 0-9,10-19 etc:
create table scores(score int);
insert into scores values(5),(15),(25),(35);
SELECT 10 * FLOOR( score / 10 ) AS Bottom,
10 * FLOOR( score / 10 ) + 9 AS Top,
Count( score ),
Avg( score )
FROM scores
GROUP BY 10 * FLOOR( score / 10 );
+--------+------+----------------+--------------+
| Bottom | Top | Count( score ) | Avg( score ) |
+--------+------+----------------+--------------+
| 0 | 9 | 1 | 5.0000 |
| 10 | 19 | 1 | 15.0000 |
| 20 | 29 | 1 | 25.0000 |
| 30 | 39 | 1 | 35.0000 |
+--------+------+----------------+--------------+
Count unique values of one column
SELECT col_name, COUNT(*) AS frequency
FROM tbl_name
GROUP by col_name
ORDER BY frequency DESC;
Median
Statistically, the median is the middle value--the value that is smaller than that found in half of all remaining rows, and larger than that found in the other half:
SELECT l1.hours AS Median
FROM BulbLife AS l1, BulbLife AS l2
GROUP BY l1.hours
HAVING SUM(CASE WHEN l2.hours <= l1.hours THEN 1 ELSE 0 END) >= (COUNT(*)+1) DIV 2
AND SUM(CASE WHEN l2.hours >= l1.hours THEN 1 ELSE 0 END) >= (COUNT(*) DIV 2) + 1;
Mode
Statistically, the mode is the most frequently occurring value. Given tables parent(id int) and child(pid int, cid int), where child.pid references parent.id as a foreign key, what query finds the parent.id that occurs most often in child.pid, that is, the mode of child.pid?
SELECT pid, COUNT(*) AS frequency
FROM child
GROUP BY pid
ORDER BY frequency DESC
LIMIT 1;
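LIMIT 1 returns a single row even when several pid values tie for the highest count. A sketch that returns every tied mode, assuming the same child table (the aliases freq and f are made up for the example):

```sql
SELECT pid, COUNT(*) AS frequency
FROM child
GROUP BY pid
HAVING COUNT(*) = (
  -- highest per-pid row count in the table
  SELECT MAX(freq)
  FROM ( SELECT COUNT(*) AS freq FROM child GROUP BY pid ) AS f
);
```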
Rank order
Before version 8.0, MySQL had no counterpart to MSSQL's RANK() function. Without it, how do we display rank order in a MySQL query, for example from a table like this?
CREATE TABLE votes( name CHAR(10), votes INT );
INSERT INTO votes VALUES
('Smith',10),('Jones',15),('White',20),('Black',40),('Green',50),('Brown',20);
The query is a two-step:
1. Join the table to itself on the value to be ranked, handling ties
2. Group and order the result of the self-join on rank:
SELECT v1.name, v1.votes, COUNT(v2.votes) AS `Rank`
FROM votes v1
JOIN votes v2 ON v1.votes < v2.votes OR (v1.votes=v2.votes and v1.name = v2.name)
GROUP BY v1.name, v1.votes
ORDER BY v1.votes DESC, v1.name DESC;
+-------+-------+------+
| name | votes | Rank |
+-------+-------+------+
| Green | 50 | 1 |
| Black | 40 | 2 |
| White | 20 | 3 |
| Brown | 20 | 3 |
| Jones | 15 | 5 |
| Smith | 10 | 6 |
+-------+-------+------+
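On MySQL 8.0 or later, the same ranking can be sketched with the built-in RANK() window function instead of the self-join. This is an alternative technique, not the method shown above. The alias rnk is made up here, because RANK is a reserved word in those versions.

```sql
-- Requires MySQL 8.0+ (window functions)
SELECT name, votes,
       RANK() OVER (ORDER BY votes DESC) AS rnk
FROM votes
ORDER BY votes DESC, name DESC;
-- With the six sample rows: Green 1, Black 2, White 3, Brown 3, Jones 5, Smith 6
```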
Backslashes in data
Backslashes multiply weirdly:
SELECT 'a\b' RLIKE 'a\b';
returns 1, as does...
SELECT 'a\\b' RLIKE 'a\\\\b';
because in a pair of backslashes, the second is not escaped by the first, so to compare two literals you double each backslash in the RLIKE argument. But if you are querying a table for such a string from the MySQL client, this doubling happens twice--once in the client, and once in the database--so to find a column value matching 'a\\b', you need to write...
SELECT `desc` FROM xxx WHERE `desc` RLIKE 'a\\\\\\\\b';
That's eight backslashes to match two!
Compare data in two tables
This query UNIONs queries for matching column names from two tables, and keeps just those rows which occur once in the union. Those are the rows unique to one table or the other. Customise your column list { id, col1, col2, col3 ...} as desired.
SELECT
MIN(TableName) as TableName, id, col1, col2, col3, ...
FROM (
SELECT 'Table a' as TableName, a.id, a.col1, a.col2, a.col3, ...
FROM a
UNION ALL
SELECT 'Table b' as TableName, b.id, b.col1, b.col2, b.col3, ...
FROM b
) AS tmp
GROUP BY id, col1, col2, col3, ...
HAVING COUNT(*) = 1
ORDER BY ID;
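Here is the pattern filled in for a concrete case. It's only a sketch: the two-column definitions of a and b and their rows are made up for illustration.

```sql
-- Hypothetical tables and data
CREATE TABLE a ( id INT, col1 CHAR(5) );
CREATE TABLE b ( id INT, col1 CHAR(5) );
INSERT INTO a VALUES (1,'x'),(2,'y'),(3,'z');
INSERT INTO b VALUES (1,'x'),(2,'q'),(4,'w');

SELECT MIN(TableName) AS TableName, id, col1
FROM (
  SELECT 'Table a' AS TableName, a.id, a.col1 FROM a
  UNION ALL
  SELECT 'Table b' AS TableName, b.id, b.col1 FROM b
) AS tmp
GROUP BY id, col1
HAVING COUNT(*) = 1
ORDER BY id, col1;
-- Table b  2  q
-- Table a  2  y
-- Table a  3  z
-- Table b  4  w
```

Row (1,'x') occurs in both tables, so its group has a count of 2 and is filtered out. A row that differs in any listed column shows up once per table.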
Age in years
Given a birthdate in @dob, here is a simple formula for age in years:
DATE_FORMAT(FROM_DAYS(TO_DAYS(now()) - TO_DAYS(@dob)), '%Y') + 0;
and here is one for age in years to two decimal places (ignoring day of month):
ROUND((((YEAR(now()) - YEAR(@dob)))*12 + (((MONTH(now()) - MONTH(@dob)))))/12, 2)
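Another way to get whole years, sketched here with the built-in TIMESTAMPDIFF() function, which counts only completed units. The birthdate and the fixed comparison dates are made-up values.

```sql
SET @dob = '1970-06-15';
SELECT TIMESTAMPDIFF(YEAR, @dob, CURDATE()) AS age_years;

-- Fixed dates show the birthday boundary:
SELECT TIMESTAMPDIFF(YEAR, '1970-06-15', '2008-06-14') AS day_before_birthday,  -- 37
       TIMESTAMPDIFF(YEAR, '1970-06-15', '2008-06-15') AS on_birthday;          -- 38
```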
Appointments available
Given a clinic of physicians, patients and appointments, how to find an available appointment time for a given physician?
This is a variant of the [Not] Exists query pattern. Though we can write it with subqueries, performance will be crisper with a join. But finding data that is not there requires a join to data which is there. So in addition to tables for appointments, doctors and patients, we need a table of all possible appointment datetimes. Here's a schema illustrating the idea ...
CREATE TABLE a_dt ( -- POSSIBLE APPOINTMENT DATES AND TIMES
d DATE,
t TIME
);
CREATE TABLE a_drs ( -- DOCTORS
did INT -- doctor id
);
CREATE TABLE a_pts ( -- PATIENTS
pid INT
);
CREATE TABLE a_appts ( -- APPOINTMENTS
aid INT, -- appt id
did INT, -- doctor id
pid INT, -- patient id
d DATE,
t TIME
);
Now we can apply the [Not] Exists query pattern. To find free appointment datetimes for a given doctor in a given datetime range, we left join possible appointments to existing appointments on date and time and doctor, add Where conditions for desired appointment datetimes, and finally add a Where condition that the appointment slot be null, i.e. free...
SELECT d.did, a.d, a.t
FROM a_dt AS a
JOIN a_drs AS d
ON d.did = 1 -- the doctor of interest
LEFT JOIN a_appts AS ap
ON ap.d = a.d
AND ap.t = a.t
AND ap.did = d.did
WHERE a.d BETWEEN desired_start_date AND desired_end_date
AND a.t BETWEEN desired_start_time AND desired_end_time
AND ap.aid IS NULL; -- no appointment for this doctor in this slot
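For comparison, the subquery form of the same question might look like this. It's a sketch against the schema above; the doctor id 1 and the date and time literals are placeholder values.

```sql
SELECT a.d, a.t
FROM a_dt AS a
WHERE a.d BETWEEN '2008-01-07' AND '2008-01-11'
  AND a.t BETWEEN '09:00:00' AND '17:00:00'
  AND NOT EXISTS (
    -- any appointment for doctor 1 in this slot rules the slot out
    SELECT 1
    FROM a_appts AS ap
    WHERE ap.d = a.d
      AND ap.t = a.t
      AND ap.did = 1
  );
```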
Count business days between two dates
Given a table named dates with date columns d1,d2, if the two dates are in the same year, the solution is simply the date difference in days minus twice the difference in week numbers (two weekend days per week):
SELECT d1, d2, DATEDIFF(d2, d1) - (WEEK(d2) - WEEK(d1)) * 2 AS BizDays
FROM dates
ORDER BY d1, d2;
For dates that span different years, week numbers won't work. The answer is the number of raw days, minus twice the number of whole weeks (because there are two weekend days/week), minus the number of weekend days in any remainder part-week. This algorithm works when the start and stop dates are themselves business days (but needs refinement to work when passed weekend dates--anybody want to try?):
SET @d1='2007-1-1';
SET @d2='2007-3-31';
SET @dow1 = DAYOFWEEK(@d1);
SET @dow2 = DAYOFWEEK(@d2);
SET @days = DATEDIFF(@d2,@d1);
SET @wknddays = 2 * FLOOR( @days / 7 ) +
IF( @dow1 = 1 AND @dow2 > 1, 1,
IF( @dow1 = 7 AND @dow2 = 1, 1,
IF( @dow1 > 1 AND @dow1 > @dow2, 2,
IF( @dow1 < 7 AND @dow2 = 7, 1, 0 )
)
)
);
SELECT FLOOR(@days - @wkndDays) AS BizDays;
The algorithm is easily encapsulated in a function:
DROP FUNCTION IF EXISTS BizDayDiff;
DELIMITER |
CREATE FUNCTION BizDayDiff( d1 DATE, d2 DATE )
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE dow1, dow2, days, wknddays INT;
SET dow1 = DAYOFWEEK(d1);
SET dow2 = DAYOFWEEK(d2);
SET days = DATEDIFF(d2,d1);
SET wknddays = 2 * FLOOR( days / 7 ) +
IF( dow1 = 1 AND dow2 > 1, 1,
IF( dow1 = 7 AND dow2 = 1, 1,
IF( dow1 > 1 AND dow1 > dow2, 2,
IF( dow1 < 7 AND dow2 = 7, 1, 0 )
)
)
);
RETURN FLOOR(days - wkndDays);
END;
|
DELIMITER ;
To include time in the difference, you would probably adopt the convention of returning a string like N days hh:mm:ss where N is the date difference calculated above, minus one if the time portion of d1 is later than that of d2:
DROP FUNCTION IF EXISTS BizDateTimeDiff;
DELIMITER |
CREATE FUNCTION BizDateTimeDiff( d1 DATETIME, d2 DATETIME )
RETURNS CHAR(30)
DETERMINISTIC
BEGIN
DECLARE dow1, dow2, days, wknddays INT;
DECLARE tdiff CHAR(10);
SET dow1 = DAYOFWEEK(d1);
SET dow2 = DAYOFWEEK(d2);
SET tdiff = TIMEDIFF( TIME(d2), TIME(d1) );
SET days = DATEDIFF(d2,d1);
SET wknddays = 2 * FLOOR( days / 7 ) +
IF( dow1 = 1 AND dow2 > 1, 1,
IF( dow1 = 7 AND dow2 = 1, 1,
IF( dow1 > 1 AND dow1 > dow2, 2,
IF( dow1 < 7 AND dow2 = 7, 1, 0 )
)
)
);
SET days = FLOOR(days - wkndDays) - IF( ASCII(tdiff) = 45, 1, 0 );
SET tdiff = IF( ASCII(tdiff) = 45, TIMEDIFF( '24:00:00', SUBSTRING(tdiff,2)), TIMEDIFF( tdiff, '00:00:00' ));
RETURN CONCAT( days, ' days ', tdiff );
END;
|
DELIMITER ;
SELECT BizDateTimeDiff( '2007-1-1 00:00:00', '2007-3-31 00:00:00' ) AS dtdiff;
+------------------+
| dtdiff |
+------------------+
| 64 days 00:00:00 |
+------------------+
SELECT BizDateTimeDiff( '2007-1-1 11:00:00', '2007-3-31 00:00:00' ) AS dtdiff;
+------------------+
| dtdiff |
+------------------+
| 63 days 13:00:00 |
+------------------+
SELECT BizDateTimeDiff( '2007-1-1 12:00:00', '2007-3-31 13:00:00' ) AS dtdiff;
+------------------+
| dtdiff |
+------------------+
| 64 days 01:00:00 |
+------------------+
SELECT BizDateTimeDiff( '2007-1-1 00:00:00', '2007-3-31 11:00:00' ) AS dtdiff;
+------------------+
| dtdiff |
+------------------+
| 64 days 11:00:00 |
+------------------+
To factor in national and religious holidays, you need a holidays table and a stored routine that subtracts the number of holidays falling on business days between d1 and d2.
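A minimal sketch of the holiday adjustment, written as a plain query rather than a stored routine. The holidays table, its column hd and the two sample dates are assumptions for illustration. Which end of the interval counts as inclusive is a convention you would need to choose; here d1 is excluded and d2 included.

```sql
-- Hypothetical holidays table: one row per holiday date
CREATE TABLE holidays ( hd DATE PRIMARY KEY );
INSERT INTO holidays VALUES ('2007-01-15'), ('2007-02-19');  -- made-up rows, both Mondays

SET @d1 = '2007-01-01', @d2 = '2007-03-31';
SELECT BizDayDiff(@d1, @d2)
       - ( SELECT COUNT(*)
           FROM holidays
           WHERE hd > @d1 AND hd <= @d2
             AND DAYOFWEEK(hd) BETWEEN 2 AND 6   -- ignore holidays that fall on a weekend
         ) AS BizDaysLessHolidays;
-- 64 business days minus 2 weekday holidays = 62
```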
Count Tuesdays between two dates
Date arithmetic is deceptively hard. One way to appreciate the difficulties is to read Chapter 21 in our book. Another is to try to calculate the number of Tuesdays (or another weekday) between two dates. It's not a back-of-the-napkin problem.
An earlier formula we had for this problem sometimes gave incorrect results. As a debugging aid, we wrote a brute force calculator for the problem:
SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS DayCount;
DELIMITER |
CREATE FUNCTION DayCount( d1 DATE, d2 DATE, daynum SMALLINT ) RETURNS INT
BEGIN
DECLARE days INT DEFAULT 0;
IF d1 IS NOT NULL AND d2 IS NOT NULL THEN
WHILE d1 <= d2 DO
BEGIN
IF DAYOFWEEK(d1) = daynum THEN
SET days=days+1;
END IF;
SET d1 = ADDDATE(d1, INTERVAL 1 DAY);
END;
END WHILE;
END IF;
RETURN days;
END;
|
DELIMITER ;
select
daycount('2008-3-16','2008-3-28',7)
AS 'Sats from 2008-3-16 through 2008-3-28';
You would not want to use that function on long date spans in a big table, but it will do for testing.
Now, how to count the number of Tuesdays, say, between two dates? The basic logic is:
1. Count whole weeks between the two dates.
2. If beginning and ending weekdays are the same, then if they're Tuesday, the answer is weeks+1, otherwise it's just weeks.
3. Otherwise, if the beginning weekday <= the ending weekday, then if Tuesday is between them, the answer is weeks+1, otherwise it's just weeks.
4. Otherwise the ending weekday is less than the starting weekday; if Tuesday >= the starting weekday or <= the ending weekday, the answer is weeks+1, otherwise it's just weeks.
For a convenient datasource, we'll use the two date columns orderdate and shippeddate in the orders table of the NorthWind database, and we'll use our brute force function DayCount() to check results:
SET @day = 3;
SELECT
DATE_FORMAT(orderdate,'%y%m%d') AS OrdDt,
DATE_FORMAT(shippeddate,'%y%m%d') AS ShipDt,
LEFT(DAYNAME(orderdate),3) AS D1,
LEFT(DAYNAME(shippeddate),3) AS D2,
@dow1 := DAYOFWEEK(orderdate) AS 'dw1',
@dow2 := DAYOFWEEK(shippeddate) AS 'dw2',
@days := DATEDIFF(shippeddate,orderdate) AS Days,
@wks := FLOOR( @days / 7 ) AS Wks,
FLOOR( IF( @dow1 = @dow2, IF( @day = @dow1, @wks+1, @wks),
IF( @dow1 < @dow2, IF( @day BETWEEN @dow1 AND @dow2, @wks+1, @wks),
IF( @day >= @dow1 OR @day <= @dow2, @wks+1, @wks )
)
)
) AS Res,
DayCount(DATE(orderdate),DATE(shippeddate),@day) AS Chk
FROM orders
HAVING !ISNULL(res-chk) AND res-chk <> 0;
Empty set (0.00 sec)
No errors. We get the same result for @day = 1, 2, 4, 5, 6 and 7.
But the formula is buried in the specifics of one table, so abstract it to a reusable function:
DROP FUNCTION IF EXISTS NamedDaysBetween;
DELIMITER |
CREATE FUNCTION NamedDaysBetween( d1 DATE, d2 DATE, daynum SMALLINT )
RETURNS INT
BEGIN
DECLARE dow1, dow2, wks, days INT;
IF !ISNULL(d1) AND !ISNULL(d2) THEN
SET dow1 = DAYOFWEEK( d1 );
SET dow2 = DAYOFWEEK( d2 );
SET days = DATEDIFF( d2, d1 );
SET wks = FLOOR( days / 7 );
SET days = IF( dow1 = dow2, IF( daynum = dow1, wks+1, wks),
IF( dow1 < dow2, IF( daynum BETWEEN dow1 AND dow2, wks+1, wks),
IF( daynum >= dow1 OR daynum <= dow2, wks+1, wks )
)
);
END IF;
RETURN days;
END;
|
DELIMITER ;
Again check it against lots of date value pairs:
SELECT
NamedDaysBetween(orderdate,shippeddate,3) - DayCount(orderdate,shippeddate,3)
AS diff
FROM orders
HAVING !ISNULL(diff) AND diff <> 0;
Empty set (0.00 sec)
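A spot check on literal dates, using the same span as the DayCount() example earlier. 2008-03-16 is a Sunday and 2008-03-28 a Friday, so the span holds two Tuesdays (the 18th and 25th) and one Saturday (the 22nd):

```sql
SELECT NamedDaysBetween('2008-03-16','2008-03-28',3) AS Tuesdays,   -- 2
       NamedDaysBetween('2008-03-16','2008-03-28',7) AS Saturdays;  -- 1
```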
Date of first Friday of next month
Assuming a calendar table calendar(date DATE) with one row per date through the relevant period...
SET @d = NOW();
SELECT MIN(date) AS 1stFridayOfMonth
FROM calendar
WHERE YEAR(date) = IF( MONTH(@d) = 12, 1+YEAR(@d), YEAR(@d) )
AND MONTH(date) = IF( MONTH(@d) = 12, 1, MONTH(@d) + 1 )
AND WEEKDAY(date)=4;
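Without a calendar table, the same date can be computed arithmetically. A sketch, assuming only built-in date functions: take the first day of next month, then step forward to the next Friday (WEEKDAY() = 4). The names f, x and FirstFridayNextMonth are made up for the example.

```sql
SET @d = '2008-01-15';
SELECT DATE_ADD(f, INTERVAL MOD(11 - WEEKDAY(f), 7) DAY) AS FirstFridayNextMonth
FROM (
  -- f = first day of the month after @d
  SELECT DATE_ADD(LAST_DAY(@d), INTERVAL 1 DAY) AS f
) AS x;
-- 2008-02-01, which is itself a Friday, so zero days are added
```

MOD(11 - WEEKDAY(f), 7) is the number of days from f forward to Friday: 0 when f is a Friday, 4 when it is a Monday, 6 when it is a Saturday.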
Date of Monday in a given week of the year
WEEK(@d, 2) numbers the weeks of the year from 1 through 53, counting weeks as starting on Sunday (mode 7 is the Monday-based counterpart). Here is a way to get the date of the Monday in that week. Because it simply multiplies the week number by 7, it lands on a Monday only in years where 7 January is a Monday, as in 2008:
set @d='2008-1-31';
select makedate( left(yearweek(@d),4),week( @d, 2 ) * 7 ) as 1stdayOfWeek;
+--------------+
| 1stdayOfWeek |
+--------------+
| 2008-01-28 |
+--------------+
set @d='2008-7-15';
select makedate( left(yearweek(@d),4),week( @d, 2 ) * 7 ) as 1stdayOfWeek;
+--------------+
| 1stdayOfWeek |
+--------------+
| 2008-07-14 |
+--------------+
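A sketch that does not depend on how the year begins, assuming weeks start on Monday and the goal is the Monday of the week containing @d: subtract the date's WEEKDAY() value, which is 0 for Monday through 6 for Sunday. The alias MondayOfWeek is made up for the example.

```sql
SET @d = '2008-01-31';   -- a Thursday
SELECT DATE_SUB(@d, INTERVAL WEEKDAY(@d) DAY) AS MondayOfWeek;  -- 2008-01-28

SET @d = '2009-01-15';   -- a Thursday in a year that starts on a Thursday
SELECT DATE_SUB(@d, INTERVAL WEEKDAY(@d) DAY) AS MondayOfWeek;  -- 2009-01-12
```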
Datetime difference
Find the difference between two datetime values in seconds, minutes, hours or days. If dt1 and dt2 are datetime values of the form 'yyyy-mm-dd hh:mm:ss', the number of seconds between dt1 and dt2 is
UNIX_TIMESTAMP( dt2 ) - UNIX_TIMESTAMP( dt1 )
To get the number of minutes divide by 60, for the number of hours divide by 3600, and for the number of days, divide by 3600 * 24.
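A worked example with made-up values. It also shows the built-in TIMESTAMPDIFF() function as an alternative to dividing by hand; TIMESTAMPDIFF() returns whole units and discards any remainder.

```sql
SET @dt1 = '2008-01-01 08:00:00', @dt2 = '2008-01-02 09:30:00';

SELECT UNIX_TIMESTAMP(@dt2) - UNIX_TIMESTAMP(@dt1) AS secs,  -- 91800, provided no clock change
                                                             -- falls between the two values
       TIMESTAMPDIFF(MINUTE, @dt1, @dt2) AS mins,            -- 1530
       TIMESTAMPDIFF(HOUR,   @dt1, @dt2) AS hrs;             -- 25
```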
Find available reservation periods
Given a bookings table where each row specifies one reservation period for one property, find the unbooked periods for a given property:
CREATE TABLE bookings( ID int, propertyID int, startDate date, endDate date );
INSERT INTO bookings VALUES
(1,1,'2007-1-1','2007-1-15'),
(2,1,'2007-1-20','2007-1-31'),
(3,1,'2007-2-10','2007-2-17');
SELECT * FROM bookings;
+------+------------+------------+------------+
| ID | propertyID | startDate | endDate |
+------+------------+------------+------------+
| 1