Sum for time periods

from the Artful Common Queries page


A table tracks attendance at some location:

drop table if exists t;
create table t(interval_id int,start datetime,end datetime, att int);
insert into t values
(1,'2007-01-01 08:00:00','2007-01-01 12:00:00',5 ),
(2,'2007-01-01 13:00:00','2007-01-01 17:00:00',10),
(3,'2007-01-01 10:00:00','2007-01-01 15:00:00',15),
(4,'2007-01-01 14:00:00','2007-03-07 19:00:00',20);
select * from t;
+-------------+---------------------+---------------------+------+
| interval_id | start               | end                 | att  |
+-------------+---------------------+---------------------+------+
|           1 | 2007-01-01 08:00:00 | 2007-01-01 12:00:00 |    5 |
|           2 | 2007-01-01 13:00:00 | 2007-01-01 17:00:00 |   10 |
|           3 | 2007-01-01 10:00:00 | 2007-01-01 15:00:00 |   15 |
|           4 | 2007-01-01 14:00:00 | 2007-03-07 19:00:00 |   20 |
+-------------+---------------------+---------------------+------+

In this table, att is a delta: it tracks entrances. Actual attendance is SUM(att) at any given moment. For example, if the attendance sum is x at a given moment, then after we add a row with att=y and start/end datetimes embracing that moment, attendance will be x+y. So to retrieve total attendance at 01330h on 1 Jan 2007, we write:

SELECT SUM(att) 
FROM t 
WHERE t.start <= '2007-01-01 13:30:00' AND t.end >= '2007-01-01 13:30:00';
+----------+
| SUM(att) |
+----------+
|       25 |
+----------+

Then how would we extract maximum attendance during a given period, for example, maximum attendance between 1300h and 1700h?

SQL does not deal efficiently with time. Some SQL dialects offer time series enhancements to the language; MySQL does not.

And, querying time series data for aggregate statistics gets complicated very quickly.

It gets a bit simpler with a calendar table that has a row for every possible datetime value. For our example, assume a granularity of one hour and a query period of one day. Naturally a real system would require a range of dates and perhaps a finer time granularity:

create table cal(id int,dt datetime);
insert into cal values(1,'2007-1-1 01:00:00');
insert into cal values(2,'2007-1-1 02:00:00');
insert into cal values(3,'2007-1-1 03:00:00');
insert into cal values(4,'2007-1-1 04:00:00');
insert into cal values(5,'2007-1-1 05:00:00');
insert into cal values(6,'2007-1-1 06:00:00');
insert into cal values(7,'2007-1-1 07:00:00');
insert into cal values(8,'2007-1-1 08:00:00');
insert into cal values(9,'2007-1-1 09:00:00');
insert into cal values(10,'2007-1-1 10:00:00');
insert into cal values(11,'2007-1-1 11:00:00');
insert into cal values(12,'2007-1-1 12:00:00');
insert into cal values(13,'2007-1-1 13:00:00');
insert into cal values(14,'2007-1-1 14:00:00');
insert into cal values(15,'2007-1-1 15:00:00');
insert into cal values(16,'2007-1-1 16:00:00');
insert into cal values(17,'2007-1-1 17:00:00');
insert into cal values(18,'2007-1-1 18:00:00');
insert into cal values(19,'2007-1-1 19:00:00');
insert into cal values(20,'2007-1-1 20:00:00');
insert into cal values(21,'2007-1-1 21:00:00');
insert into cal values(22,'2007-1-1 22:00:00');
insert into cal values(23,'2007-1-1 23:00:00');
insert into cal values(24,'2007-1-1 24:00:00');

To accumulate the maximum attendance sum, collect target values for defined periods in an inner query, and sum them from the outer query:

SELECT SUM( att ) 
FROM (
  SELECT
    t.start AS PeriodStart,
    t.end AS PeriodEnd,
    MIN(cal.dt) + INTERVAL 1 HOUR AS CountBegin,
    MAX(cal.dt) AS CountEnd,
    t.att                   
  FROM t
  JOIN cal ON cal.dt >= t.start AND cal.dt < t.end
  GROUP BY PeriodStart, PeriodEnd
  HAVING CountBegin < '2007-01-01 17:00:00' AND CountEnd > '2007-01-01 11:00:00'
) AS periods;
+------------+
| SUM( att ) |
+------------+
|         45 |
+------------+

If the data is more complicated, eg if we also need to track exits, the period logic needs refinement but the principle remains the same.

Here is a variation on the period sums problem posted in a MySQL forum. Multiple daily measurements are taken at variable times, and must be summed for the 24-hour period starting at 4pm each day.

drop table if exists t;
create table t( id smallint, Name char(16), timestamp timestamp(3), Value1 decimal(20,8), Value2 decimal(20,8) );
insert into t values
(1, 'Marco', '2009-05-04 16:01:01.582',    0.0000000, -0.0000400 ),
(2, 'James' , '2009-05-04 23:01:01.582', -0.0001400, 0.0000000 ),
(3, 'Louise', '2009-05-05 00:00:00.833', -0.0001100, -0.0001400 ),
(4, 'Martin', '2009-05-05 01:01:01.582', -0.0000400, -0.0001100 ),
(5, 'Jamie', '2009-05-05 02:01:01.582',    0.0001100, -0.0000400 ),
(6, 'Cecilie', '2009-05-05 03:01:01.582', 0.0000500, 0.0001100 ),
(7, 'Ben', '2009-05-05 15:01:01.582', 0.0000000, 0.0000500 ),
(8, 'Amanda', '2009-05-05 16:01:01.582', -0.0000800, 0.0000000 ),
(9, 'Ida', '2009-05-05 23:01:01.582', 0.0000000, -0.0000800 );

How to identify values as belonging within the desired intervals? A summing period begins at 4pm on a given date, and ends at 15:59:59.999 on the next day, so for a given day, the time boundaries are ...

start: concat( date(timestamp), ' 16:00:00:000' )
stop: concat( date(timestamp)+interval 1 day, ' 15:59:59:999' )

For optimal performance, write a simple query that creates such a "calendar" table so you can efficiently join from it to your data table, eg ...

drop table if exists cal; 
create table cal 
  select distinct 
  concat( date(timestamp), ' 16:00:00:000' ) as tbegin, 
  concat(date(timestamp)+interval 1 day, ' 15:59:59:999' ) as tend 
  from t; 

Here we derive the cal table dynamically.

Once we have a SQL representation of the summing periods, ie as the columns cal.tbegin and cal.tend, all we need do is join from the calendar to the data table on timestamp between cal.tbegin and cal.tend, and Group By tbegin, tend:

select cal.tbegin, cal.tend, sum(vals.value1), sum(vals.value2)
from (
  select distinct
    concat( date(timestamp), ' 16:00:00:000' ) as tbegin, 
    concat(date(timestamp)+interval 1 day, ' 15:59:59:999' ) as tend
  from t
) as cal
join (
  select timestamp,value1,value2
  from t
) as vals on timestamp between cal.tbegin and cal.tend
group by cal.tbegin, cal.tend;

What if the summing periods may be multiple days? That complicates construction of the "calendar". Add a row for a multi-date summing period ...

insert into t values (10, 'Nino', '2009-07-04 17:01:01.582',    0.0005000, -0.0006000 );

The rows that mark summing boundaries are rows with ...

(i) the smallest date > the current row's date, and
(ii) the smallest time value >= 4pm ...

In SQL that's ...

select date(a.timestamp) as startdate, min(b.timestamp) as nextdate
from t as a
left join t as b on date(b.timestamp) > date(a.timestamp) and time(b.timestamp) >= '16:00:00'
group by startdate;
+------------+-------------------------+
| startdate  | nextdate                |
+------------+-------------------------+
| 2009-05-04 | 2009-05-05 16:01:01.582 |
| 2009-05-05 | 2009-07-04 17:01:01.582 |
| 2009-07-04 | NULL                    |
+------------+-------------------------+

Then the summing boundaries are those dates except for the last nextdate, which is necessarily null, and which we need to replace with the date of the following day. So the summing boundaries are ...

select 
  concat( tmp.startdate, ' 16:00:00.000' ) as tbegin,
  concat( ifnull(tmp.nextdate,date(tmp.startdate)+interval 1 day), ' 15:59:59.999' ) as tend
from (
  select date(a.timestamp) as startdate, date( min(b.timestamp) ) as nextdate
  from t as a
  left join t as b on date(b.timestamp) > date(a.timestamp) and time(b.timestamp) >= '16:00:00'
  group by startdate
) as tmp;

... so create a temp calendar table from that query ...

drop table if exists cal;
create table cal
select 
  concat( tmp.startdate, ' 16:00:00.000' ) as tbegin,
  concat( ifnull(tmp.nextdate,date(tmp.startdate)+interval 1 day), ' 15:59:59.999' ) as tend
from (
  select date(a.timestamp) as startdate, date( min(b.timestamp) ) as nextdate
  from t as a
  left join t as b on date(b.timestamp) > date(a.timestamp) and time(b.timestamp) >= '16:00:00'
  group by startdate
) as tmp;

... and the query becomes ..

select cal.tbegin, cal.tend, sum(vals.value1) as val1sum, sum(vals.value2) as val2sum
from cal 
join ( 
  select timestamp,value1,value2 
  from t 
) as vals on timestamp between cal.tbegin and cal.tend 
group by cal.tbegin, cal.tend; 
+-------------------------+-------------------------+------------------+------------------+
| tbegin                  | tend                    | sum(vals.value1) | sum(vals.value2) |
+-------------------------+-------------------------+------------------+------------------+
| 2009-05-04 16:00:00.000 | 2009-05-05 15:59:59.999 |      -0.00013000 |      -0.00017000 |
| 2009-05-05 16:00:00.000 | 2009-07-04 15:59:59.999 |      -0.00008000 |      -0.00008000 |
| 2009-07-04 16:00:00.000 | 2009-07-05 15:59:59.999 |       0.00050000 |      -0.00060000 |
+-------------------------+-------------------------+------------------+------------------+


Return to the Artful Common Queries page