/ SQL

Quick Guide: Building Daily + Running Total Report With SQL

In this article I will show you how to build daily user growth report with cumulative number (running total) using mostly SQL, and filter capabilities of Holistics platform.

Requirements: We want to build a report that:

  • Show daily users and a cumulative user growth
  • Ability to select a date range
  • Ability to filter by some user's conditions (gender, user type, etc)

First, The Dataset & A Simple Daily Count

Assume we work with this simple users table in PosgresSQL (fields like username, email are omitted):

marketplace.users
- id (pk)
- gender (varchar)
- sign_up_date (date)

Let's start by building a simple daily count report with visualization like below. Report viewers have ability to select a date period to see only data for that range.

Here we use Holistics' date filters feature to add 2 date pickers (start date and end date), the user's inputs are then turned into variables {{start_date}} and {{end_date}} and inject into the query.

SELECT
  sign_up_date as "Date",
  count(1) as "Daily Users"
FROM marketplace.users U
WHERE sign_up_date >= {{start_date}}
  AND sign_up_date <= {{end_date}}
GROUP BY 1

Add Cumulative Column

That was simple, now, let's add a new cumulative column tell people how many total users do we have up to a particular day.

We do this by using PostgreSQL's famous Window Functions capability:

SELECT
  sign_up_date as "Date",
  COUNT(1) as "Daily Users",
  SUM(COUNT(1) OVER (ORDER BY sign_up_date)) AS "Cumulative Users" -- accumulating signups
FROM marketplace.users U
WHERE sign_up_date >= {{start_date}}
  AND sign_up_date <= {{end_date}}
GROUP BY 1

Charting this out with dual-axis and dual chart type, we will get a nice chart with
both daily number and cumulative number.

Caveat 1: Handle Days With No Data

There's a problem with the above query: days with no signups are not shown in the result table. This is considered a bug in our query, and needs to be addressed. We do want that particular day to report 0 sign ups, instead of missing out the value completely. Look at the image from step 1, day March 18 is missing from the resultset table.

We fix this by generate a list of all dates between [start_date, end_date] (using PostgreSQL's generate_series function) and use that list as a base to combine with our daily user count data.

Now our SQL query starts to get complicated:

-- date ranges for [start_date, end_date]
WITH date_ranges AS (
  SELECT date_d::date as date_d
  from GENERATE_SERIES({{start_date}}::DATE, {{end_date}}::DATE, '1 day'::INTERVAL) date_d
),
user_counts AS (
  SELECT
    sign_up_date as date_d,
    count(1) as cnt
  FROM marketplace.users U
  WHERE sign_up_date >= {{start_date}}
    AND sign_up_date <= {{end_date}}
  GROUP BY 1
)
SELECT
  D.date_d AS "Date",
  COALESCE(UC.cnt, 0) as "Daily Users",
  SUM(COALESCE(UC.cnt, 0)) OVER (ORDER BY D.date_d)) as "Cumulative Users"
FROM date_ranges D
LEFT JOIN user_counts UC ON UC.date_d = D.date_d

A few notes here:

  • We define a user_counts CTE to return daily user counts
  • We define a date_ranges CTE to return all dates within a date range.
  • We move the accumulating sum outside to the main SELECT

Caveat 2: Calculating Prior Total

Another thing looked wrong in the above image: the accumulating sums only sum for the selected date period, instead of taking into account the total user signups from the beginning of time.

To fix this, we need to perform a one-time calculation of the total users number prior to the start_date specified. We do so with a CTE, and then add that to the running total number, like so:

-- CTE to calculate prior total
WITH sums AS (
  SELECT COUNT(1) as total_prior
  FROM marketplace.users U
  WHERE sign_up_date >= {{start_date}}
),
date_ranges AS (
  SELECT date_d::date as date_d
  from generate_series({{start_date}}::DATE, {{end_date}}::DATE, '1 day'::INTERVAL) date_d
),
user_counts AS (
  SELECT
    sign_up_date as date_d,
    count(1) as cnt
  FROM marketplace.users U
  WHERE sign_up_date >= {{start_date}}
    AND sign_up_date <= {{end_date}}
  GROUP BY 1
)
SELECT
  D.date_d AS "Date",
  COALESCE(UC.cnt, 0) as "Daily Users",
  sums.total_prior + (SUM(COALESCE(UC.cnt, 0)) OVER (ORDER BY D.date_d)) as "Cumulative Users"
FROM date_ranges D
LEFT JOIN user_counts UC ON UC.date_d = D.date_d
CROSS JOIN sums

Adding a few filters like Age, Gender, etc and we get our final result, just like the first GIF image.

Summary

As you saw, in this post we shown you how to do a simple daily count + cumulative report using the power and flexibility of SQL. We address the few points:

  • Calculate running total with window functions
  • Avoid empty-row days with generated date ranges
  • Calculate sums prior to selected date range, and add that to the running total.

You can play around with the real report here.

Try On Your Own Data

If you want to build similar report for your own data, you can do so with Holistics with 3 simple steps:

  1. Sign Up Free Trial
  2. Connect To Your SQL Database (instruction within app)
  3. Build The Report With Above Query

Holistics provides an online interface for data analysts to create automated reports/dashboards to share with your business users through dynamic SQL queries.

Building a data reporting platform for your company? Give us a try!

Huy Nguyen

Huy Nguyen

Original creator and cofounder of Holistics, a data platform for tech companies. Holistics’ customers are tech startups like Grab, Traveloka, ShopBack, 99co, Tech In Asia and alike.

Read More