/ Analytics

Combining Data Sources: The Holistics Approach

This post is in response to the article: Combining Data Sources: Approaches and Considerations, aiming to provide the approach and philosophy adopted by Holistics in its suite of Business Intelligence Products.

The philosophy that Holistics has adopted in designing its business intelligence products and features is one that does not store a customer’s source data. From our experience, we have adopted this approach for the following reasons:

  1. Storing data in an intermediate, abstracted cache poses data security concerns
  2. Reporting usage can change quickly, and users should have the flexibility and agility to be in control of their data because they understand business requirements best
  3. Developing a standardized technical solution to optimize intermediate storage, whether in-memory or in a database, may not be suitable for all use cases. This could compromise the experience of a few customers in order to satisfy others’
  4. The cheapest method of data duplication to achieve performance is replicating in a database, where the storage cost is lower compared to in-memory or in a hosted service.

We are firm believers in the creation of a data warehouse that contain data marts catering to different use cases and departments. This article explores the products and features that empower our customers to build and manage their data warehouses.

A typical Customer Scenario

KPIs and Dashboards

Start-ups typically begin their Business Intelligence journey with a few reports covering a snapshot of KPIs presented to founders or heads of departments.

More data requests

Subsequently, these requests start to increase both in terms of frequency and variety, thereby demanding data from multiple different sources, such as Facebook, CRM Software, and internal databases.

Initially, this is satisfied by downloading snippets of information as Excel Files and subsequently creating Charts or Pivot Tables after hours processing in Excel. However, the strategic and scalable solution requires a data warehouse and automated workflows combining information from all the relevant sources.

Holistics Data Preparation: Data Imports

Holistics enable a data analyst to build a data warehouse by bringing together data from multiple types of sources. Data Imports enables the sourcing of data from the following sources:

holistics_data_import

The data from these sources can be populated into your data warehouse, hosted on Cloud databases such as Redshift or on-premise platforms, as intermediary, “staging” Tables. Thus, Holistics does not store your source data and empowers you to manage your data warehouse.

select_data_source

Now, all these sources collect and maintain data in very different ways, ranging from automatically updated Database tables to manually updated rows of data on a Google Spreadsheet on an adhoc basis. Thus, in the “New Import” wizard, one can specify the method of importing the data as per below:

import_type

An example for each case is as below:

1. Full Data Import

Replacing the operational cost data specified by finance in CSV, Excel spreadsheets or Google Sheets. Data may vary from week to week.

2. Append/Insert

Orders or book data that grow each day, and previously imported data does not change

3. Incremental/Upsert

Customer information, such as Address, Phone Numbers, which are subjected to change

Until now, all the features outlined are offered by other products that enable the sourcing of data. Over and above the sourcing capability, Holistics provides additional capabilities, as below.

Remove Columns

Some data points present in Operational systems may not be required in a data warehouse, such as Customer Identifying information that is not required in KPI analytics. With a simple click of a button, exclude columns that are not required in the target table:

remove_column

Derive New Columns

Create new columns based on a mathematical or logical operation on existing columns. The Import wizard lets the user specify a Source Expression and not just a source column, thereby empowering an analyst to do more than plain sourcing.

derive_new_columns

Filter, Optimize, and Take control

Sometimes, the operational system may generate or store a lot of data (in the order of millions or billions of rows). However, query performance or business requirements may dictate that only a subset of data be sourced. Thus, the filter criteria can be specified under “Apply Condition”

post-import-query

Lastly, the “Post Import query” provides the option to execute a SQL query upon a successful data import. One can leverage this DELETE, UPDATE, or INSERT rows into another table, and the query will be event-driven in its execution. In the example above, the “Post Import query” deletes rows that are more than 1 month old to conserve the storage space on a large table.

Scheduling

The Import job can be scheduled with just a few clicks, and Holistics offers many options on the frequency : Daily, Weekly, Monthly etc. Thus, the job can run on a weekend or during low-traffic hours so that the query execution on the operational database doesn’t impact your business. Automate imports from multiple sources, and free yourself from the mundane tasks associated with data refresh.

scheduling-1

Once the schedule is set-up, it appears on the Import Job screen as below:

scheduling_screen

Manage Executions

The Data Imports homepage provides a view into all the jobs executed, including relevant information such as query execution time, status, and logs for failure analysis.

execution_history

In sum, the Data Imports product provides a comprehensive data extraction capability that is over and above connectivity to multiple data sources.

Conclusion

Building a data warehouse provides a strategic capability that can empower your organization. However, various aspects of building and using a data warehouse have been fragmented, and companies typically use different tools for ETL and Reporting that are sometimes incompatible. Holistics, with its end-to-end offering of Data Preparation (ETL) and Data Reporting (Visualization and Reporting) helps your company get more from your data for less.

Do you already have a data warehouse but face issues with processing millions of rows of data in your reports?

Holistics “Data Transforms” product helps you handle the heavy-lifting behind reporting solutions.

From SQL Queries To Beautiful Charts

Connect to your database and build beautiful charts with Holistics BI

Learn More

"Holistics is the solution to the increasingly many and complex data requests from the operational teams"


Tang Yee Jie

Senior Data Analyst, Grab