Data Lake vs Data Warehouse vs Data Mart

The term "Data Lake", "Data Warehouse" and "Data Mart" are often times used interchangbly. But what are exactly the differences between these things? This post attempts to help explain the similarity, the difference and when to use each.

A high-level comparison of these three constructs is as below:


Data Lake

A data lake is the place where you dump all forms of data generated in various parts of your business: structured data feeds, chat logs, emails, images (of invoices, receipts, checks etc.), and videos. The data collection routines does not filter any information out; data related to canceled, returned, and invalidated transactions will also be captured, for instance.

Why data lake?

A data lake is relevant in two contexts:

1- Your organization is so big and your product does so many functions that there are many possible ways to analyze data to improve the business. Thus, you need a cheap way to store different types of data in large quantities.

Eg. Twitter in the B2C space (They have text (Tweets), Images, Videos, Links, Direct Messages, Live Streams, etc.), and Square (B2B) (Transactions, Returns, Refunds, Customer Signatures, Logon IDs etc.).

2- You don't have a plan for what to do with the data, but you have a strong intent to use it at some point. Thus, you collect data first and analyze later.

Also, the volume is so high that traditional DBs might take hours if not days to run a single query. So, having it in a Massively Parallel Processor (MPP) infrastructure helps you analyze the data comparatively quickly.

Data Warehouse

A data warehouse usually only stores data that's already modeled/structured.

A Data Warehouse is multi-purpose and meant for all different use-cases. It doesn’t take into account the nuances of requirements from a specific business unit or function. As an example, let’s take a Finance Department at a company. They care about a few metrics, such as Profits, Costs, and Revenues to advise management on decisions, and not about others that Marketing & Sales would care about. Even if there are overlaps, the definitions could be different.

Data Mart

While a data-warehouse is a multi-purpose storage for different use cases, a data-mart is a subsection of the data-warehouse, designed and built specifically for a particular department/business function.

Some benefits of using a data-mart:

  • Isolated Security: Since the data-mart only contains data specific to that department, you are assured that no unintended data access (finance data, revenue data) are physically possible.

  • Isolated Performance: Similarly, since each data-mart is only used for particular department, the performance load is well managed and communicated within the department, thus not affecting other analytical workloads.

Huy Nguyen

Huy Nguyen

Original creator and cofounder of Holistics, a data platform for tech companies. Holistics’ customers are tech startups like Grab, Traveloka, ShopBack, 99co, Tech In Asia and alike.

Read More

From SQL Queries To Beautiful Charts

Connect to your database and build beautiful charts with Holistics BI

Learn More

"Holistics is the solution to the increasingly many and complex data requests from the operational teams"

Tang Yee Jie

Senior Data Analyst, Grab