ETL and Tableau

In Business Intelligence (BI) data becomes information, which results in added value to an enterprise. It helps to steer the organization and to improve processes and effectiveness, thus saving costs and increasing revenue and becoming more competitive. Having said that, this challenge shifts to the IT-department, which is tasked with the job of delivering a lean and mean platform that can service this. How does Tableau fit in this picture and what about To Datawarehouse or not to Datawarehouse?
Real Time data
When using BI we are looking at our data and most important: the conclusions, correlations and exceptions we are pointed out to. How real-time must this information be? This varies over the subject areas we are looking at. Your usual suspect being the Profit & Loss or the Balance Sheet can easily be a day (or maybe even a month) old. Your historical data won’t change, so this needs no updates. What about your sick-leave? Well. This is registered afterwards, so… What about your factory? How much is produced today? Does this need to be real-time? But what about the current state of a machine in a production line? What if it breaks down or gets overheated? This must be real-time. The question is whether a BI-solution is the perfect platform to indicate downtime of a machine. May I suggest a red flashìng light on top of that device?
Not Real Time data
In any BI-solution (near) real-time data is mostly not mandatory. Hence we can take the data source of our Business Intelligence offline and have non real-time data. This will at least refrain from stressing our operational database. Any offline subject area or data warehouse can do the job. When we look at our ERP-system it’s most likely that invoicing and financing and posting processes will run at night. Therefore BI on our revenue and balance sheet and profit & loss will be max 24 hours old. No need to stress our infrastructure to enforce real-time data. Things become different when we look at companies that are so reliant on real-time data, such as stock brokers or trading in commodities. These companies find themselves struggling against minute by minute changes in exchange rates, market prices and long or short position and many more. Traders need to respond by buying or selling on the second and must have real-time insight.
To Data Warehouse or not?
Remind me, please. What are the reasons for a data warehouse? How is this a solution or how is this an ICT-driven provisioning? In larger organizations it’s likely to find a data warehouse for multiple reasons, such as combining data from multiple sources, keeping historical data that can be purged from the originating system and creating snapshots of data over time. If BI is pulling the data from the data warehouse this can never be real time. What is real time anyway? A speed or temperature meter can be considered real time. Traffic information can never be real-time enough. But what about Revenue or Position or whatever data in the (ERP)system. Yes, this is transactional data, but what if the back office employee makes a deal over the phone, goes to have lunch first and afterwards enters this contract into the system (which takes about seven minutes to do) and then hits the save-button. How real-time is that data actually? The argument that a user must see changes made to the ERP-system real-time in a dashboard might help in having confidence in the system but that is not a serious business case from functional perspective.
Multiple sources
When multiple systems and separate data sources are completing the larger picture a merge of data is required in a BI solution. Ideally one single BI application services this. An Extract Transform & Load (ETL) tool can do this merger and create one single version of the truth while uniforming the definition of the information. Especially when the source data is not “clean” an ETL tool can do the generic repeatable work. The trick is to avoid messy and error-prone exercises in Excel.

Image: combining datasources in Tableau Prep
Data governance
Most likely the employees in and around the company that are using any strategic, tactical or operation information from a BI-solution are not necessarily the end-users of the source system. For example: a sales manager inquiring on the sales figures of his customers is not likely to be the same person entering and processing the sales orders. But how to manage and secure this information a way that is GDPR-compliant? Any BI-solution brings information that might be sensitive on the level of privacy and segregation of duties. This requires data governance which translates into a security model.
Data federation
When real-time data is a must-have it will most likely be for actual transactions or exceptions only. Everything that happened in the balance sheet before this month can be determined as historical static data. An ETL can help federating data across historical static and today’s transactions. The safe assumption is also that for historical data does not require a low level granularity of details that current and actual transactions or exceptions need. Therefore ETL can do the preparation of combining the aggregation of old data, with dense data from the past month, with details from this week, with actual data from today. This is what we call data federation.
Tableau
When looking at Tableau we find mainly two types of connecting data: live or extract. This choice can have a big impact on performance. For a small set of data this is not an issue. When larger sets of data come into play we still want to offer good performance. When Tableau extracts data it creates compressed dense hyper-files that are designed for high performance. These hyper-files can for instance be created by Tableau Prep, making this the dedicated ETL-tool to go with Tableau. In Tableau Server we can schedule the refresh of these hyper-files up to once every 15 minutes. For any ERP-system a refresh once a day will suit most requirements. When connecting to multiple source systems, including Excel-files, CRM-systems and HR-systems Tableau Prep can easily combine these into single hyper-files or break these down into multiple hyper-files. In general a rule of thumb is to have a hyper-file per genuine subject area (i.e. Accounts Receivables) containing all necessary measures, facts, dimensions and attributes. Combining data across hyper-files can be achieved by Data Blending in Tableau, or (preferably) by creating a new hyper-file that combines data (i.e. Sales and Purchases).

Image: ETL flow in Tableau Prep
Finally
In general Tableau is best suited working with super-fast and highly compressed hyper-files. This implies ETL for instance by using Tableau Prep. When (near) real-time data is a must these sources can be connected from Tableau using a live data connection. The combination of both is very well feasible. But that must-have is probably only confined to certain subject areas and or functional disciplines within the organization. The absolute requirement for real-time data is something that needs a functional business case of which in my humble opinion most of those can be challenged.

Jelle Huisman
Managing Partner
Real-time or not? Smart BI data management
Not all data needs to be real-time. Tableau blends speed and efficiency with powerful ETL tools for optimal performance and insights.