Anyone who has ever worked on a data warehouse project knows that it can be an excruciating endeavor, taking years to complete and costing millions of dollars. So, it’s no wonder that one of Qlik’s greatest features is bringing multiple and disparate data sources together without the need for a data warehouse. Even where a data warehouse exists, customers often pull directly from data sources to avoid the time lag created by passing data through it.
So, how easy is it to build robust BI applications without a data warehouse?
Typically, a user pulls from one to five different sources to build a single application. Data can come from ERP systems, cloud services, or manually maintained files. The process of bringing these data files together into a single data model is referred to as ETL, or Extract-Transform-Load. Although the Extract and Load steps can pose challenges, the most difficult task is Transform: the unattended scripts that “transform” data into standard formats and/or fill in missing or corrupted data so that it becomes usable for the end user.
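To make the Transform step concrete, here is a minimal Python sketch of the kind of normalization such a script performs: standardizing date formats, repairing corrupted numeric fields, and cleaning join keys. The field names and formats are hypothetical, not from any actual Qlik project.

```python
# Minimal Transform sketch: normalize one raw record so that rows from
# different sources fit a single standard model. Field names are hypothetical.
from datetime import datetime

def transform(row):
    """Normalize one raw record into the standard data model."""
    # Standardize dates that arrive in mixed formats across sources.
    raw_date = row.get("order_date", "")
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"):
        try:
            row["order_date"] = datetime.strptime(raw_date, fmt).date().isoformat()
            break
        except ValueError:
            continue
    # Fill in missing or corrupted numeric fields with a safe default.
    try:
        row["quantity"] = int(row.get("quantity", 0))
    except (TypeError, ValueError):
        row["quantity"] = 0
    # Trim and upper-case keys so joins across sources line up.
    row["customer_id"] = str(row.get("customer_id", "")).strip().upper()
    return row

print(transform({"order_date": "03/15/2024", "quantity": "7", "customer_id": " ab12 "}))
```

In a real project this logic would live in the ETL scripting layer itself; the point is only that Transform is defensive, per-record cleanup that anticipates how each source misbehaves.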
On the other end of the spectrum, Qlik has worked with customers who have over ten times the typical number of data sources, all made to look as if they came from a single source, utilized in a single application. In other words, data warehouse replacement! I was curious to hear firsthand how we have tackled such complex projects, so I talked with Peter Adams, a Qlik Consultant who has worked on over 35 customer projects of all sizes.
Peter told me about his most extreme data integration project. The customer had grown rapidly through acquisitions and now needed an enterprise view of their business. He was tasked with integrating 50+ sources, each with multiple files, resulting in over 300 data files that needed to be ETL’d into a single data model.
Peter shared that for a data integration project of any size, the biggest challenge is always data inconsistencies. In this particular case, the problem was exponentially more difficult because the creation and delivery of data files was not automated, which created the opportunity for the data to change without warning. So, in addition to building an ETL to support disparate data, he had to build in warnings for end users when data had changed to the point that it could no longer be transformed successfully and was therefore unusable until the issue was resolved.
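The idea of flagging a drifted file before it poisons the data model can be sketched as a simple pre-transform validation gate. This is a hedged illustration, not Peter’s actual implementation; the expected columns and the 5% tolerance are hypothetical.

```python
# Hedged sketch: validate an incoming file before transforming it, and
# surface warnings to end users when its structure has drifted.
# Column names and the tolerance threshold are hypothetical.
EXPECTED_COLUMNS = {"order_id", "order_date", "quantity", "customer_id"}

def validate(rows):
    """Return (ok, warnings) for a list of dict records from one source."""
    warnings = []
    if not rows:
        return False, ["file is empty"]
    missing = EXPECTED_COLUMNS - set(rows[0].keys())
    if missing:
        warnings.append(f"missing columns: {sorted(missing)}")
    # If too many rows fail a basic sanity check, the file is unusable
    # until the source is fixed.
    bad = sum(1 for r in rows if not str(r.get("order_id", "")).strip())
    if bad / len(rows) > 0.05:  # hypothetical 5% tolerance
        warnings.append(f"{bad} of {len(rows)} rows have no order_id")
    return not warnings, warnings

ok, msgs = validate([{"order_id": "", "order_date": "2024-01-01"}])
print(ok, msgs)
```

A gate like this lets the load fail loudly, with a message users can act on, instead of silently producing a broken application.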
Another unique challenge Peter faced in this project was replicating ERP “matching” functionality as part of the transformation: matching receipts against order quantities to calculate remaining open order amounts by line item. For this, he had to run the data through multiple scripts to break down single receipt records to correspond to the quantities in the order table.
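The core of that matching logic can be illustrated with a small sketch: a single receipt is split across order line items until its quantity is exhausted, leaving the remaining open amount per line. The record shapes here are hypothetical, and a real ERP match would also handle partial shipments, returns, and units of measure.

```python
# Hedged sketch of receipt-to-order "matching": allocate received
# quantities against order line items to compute remaining open amounts.
# Record shapes are hypothetical.
def open_quantities(order_lines, receipts):
    """Return remaining open quantity per (order_id, line)."""
    remaining = {(o["order_id"], o["line"]): o["qty"] for o in order_lines}
    for r in receipts:
        qty = r["qty"]
        # Apply the receipt to this order's lines in sequence, splitting a
        # single receipt record across lines where needed.
        for key in sorted(k for k in remaining if k[0] == r["order_id"]):
            take = min(qty, remaining[key])
            remaining[key] -= take
            qty -= take
            if qty == 0:
                break
    return remaining

orders = [{"order_id": "PO1", "line": 1, "qty": 10},
          {"order_id": "PO1", "line": 2, "qty": 5}]
receipts = [{"order_id": "PO1", "qty": 12}]  # one receipt spans two lines
print(open_quantities(orders, receipts))
```

Running this in a script pipeline, as the article describes, typically means several passes: one to explode receipts against lines, another to aggregate the remaining open amounts.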
In total, Peter built approximately 100 scripts in the ETL. But what did that do to performance when processing production-level volumes of data? Performance was at the forefront while designing and building the ETL, but Peter’s approach is always to balance performance with ease of maintainability, especially in this case, with the risk of data sources changing unexpectedly. Once Peter had deployed incremental loading on the longest-running portions of the ETL, the load time was within the customer’s desired parameters, while still ensuring they could continue to adapt to changes without our help.
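Incremental loading means each run pulls only the records that are new since the last successful load, then appends them to the previously stored data, rather than reloading everything. A minimal sketch of the pattern, with hypothetical record shapes (in Qlik this is commonly done with QVD files and a stored high-water mark):

```python
# Hedged sketch of incremental loading: pull only records newer than the
# last run's high-water mark, then append them to the stored data.
# Record shapes and timestamps are hypothetical.
def incremental_load(stored, source, last_loaded):
    """Append only source rows modified after last_loaded; return new mark."""
    new_rows = [r for r in source if r["modified"] > last_loaded]
    stored.extend(new_rows)
    # Advance the high-water mark for the next run.
    new_mark = max((r["modified"] for r in new_rows), default=last_loaded)
    return stored, new_mark

stored = [{"id": 1, "modified": "2024-01-01"}]
source = [{"id": 1, "modified": "2024-01-01"},
          {"id": 2, "modified": "2024-02-01"}]
stored, mark = incremental_load(stored, source, "2024-01-01")
print(len(stored), mark)
```

The payoff is that load time scales with the volume of new data, not the total history, which is why applying it to the longest-running portions of an ETL brings the total runtime down so sharply.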
So, was it all worth it?
In the end, although this project took a bit more time than you would consider “typical,” the customer saved 75% to 90% in time and money over building a data warehouse. Qlik Consulting’s experience and skills helped design and build a very complex ETL successfully, not only saving the customer money, but also giving them insights into their data much earlier, which will lead to additional ROI!
To learn more on how Qlik Consulting can help you integrate multiple data sources to achieve even greater insights, visit qlik.com/consulting.