Cohort Retention Analysis is a powerful thing that most business owners need to look at.
Retention: If first-time user A goes to the store on Week 1, and returns to the store the next week, he is a returned user. If user B also goes to the store on Week 1 and does not return the next week, he's a bounced user, that basically means you lose him as a user (despite all the marketing money you spent to lure him to your store).
Cohort Analysis is a technique to see how variables change in different groups given different starting conditions (a cohort).
In this post, we'll show you how to build the below Cohort Retention:
The above chart tells you a few things:
- We cohort users by their signup month. This is the
- For each cohort, we show the population (how many people) in that cohort in the second column
- An activity is an action user did to the site (this could be simply view the page, or as specific as making a purchase). We denote this as table
We need the following tables and fields:
users: - id - timestamp activities: - user_id - timestamp
Bucketing Users Into Cohort
First we bucket them into different cohort by their sign up month, and store into
-- (user_id, cohort_month), each with cohort_items as ( select date_trunc('month', U.timestamp)::date as cohort_month, id as user_id from users U order by 1, 2 )
After that, we build
-- (user_id, month_number): user X has activity in month number X WITH user_activities as ( select A.user_id, MONTH_DIFF( date_trunc('month', A.timestamp)::date, C.cohort_month ) as month_number from public.activities A left join cohort_items C ON A.user_id = C.user_id group by 1, 2 )
The above would return all the pairs of
(user_id, month_number) that indicates if a user is active in that month after their original signup date.
| user | month_number | | Alex | 0 | | Alex | 1 | | Bob | 0 | | Bob | 2 |
MONTH_DIFF is a user-defined function that takes in 2 dates, and return the number of months between them.
Cohort Size: is simply how many users are in each group
-- (cohort_month, size) with cohort_size as ( select cohort_month, count(1) as num_users from cohort_items group by 1 order by 1 ),
And finally, putting them together with the below:
-- (cohort_month, month_number, cnt) with retention_table as ( select C.cohort_month, A.month_number, count(1) as num_users from user_activities A left join cohort_items C ON A.user_id = C.user_id group by 1, 2 ) -- our final value: (cohort_month, size, month_number, percentage) select B.cohort_month, S.num_users as total_users, B.month_number, B.num_users::float * 100 / S.num_users as percentage from retention_table B left join cohort_size S ON B.cohort_month = S.cohort_month order by 1, 3 where B.cohort_month IS NOT NULL
And you would get:
With Holistics, we can automatically turn them into beautiful cohort table like so:
The Full SQL
Here's the full SQL for your reference. Note: you just need to change the table names and column names inside the
first 2 WITH clauses, and the rest should work as per normal. [View the full SQL (https://gist.github.com/nvquanghuy/bd0fda7b88e5b2fd8e46e047e391d25f)
— — — — — — — — — — — —
Having problems finding a simple and affordable data reporting system for your startups? Check us out at holistics.io.
Subscribe to Holistics Blog - Business Intelligence & Data Engineering
Get the latest posts delivered right to your inbox