A Government Department
• CASE STUDY 01 •
Using BigQuery, Dataform, and Data Studio to build ETL pipelines for advanced analysis and reporting
Problem
Solution
Outcome
The Problem
The Client contacted us with concerns about their batch and streaming data ingestion, and asked us to devise a plan to make their ingestion service more scalable, more stable and more powerful. They needed real-time dashboards for reporting to their stakeholders, and because the team was scaling up, collaboration was essential for communication and data quality.
The Client had three primary event-based sources of data, each emitting thousands of new rows per hour, which was becoming hard to manage on the Client’s in-house database server. The Department also had static data sitting in various places across the organisation that needed to be ingested in a more automated manner.
According to our Client, the main drawbacks of the existing system were its lack of scalability, its limited processing speed and power, and its inability to generate real-time analytical reports without delay, reports that would let the Client gather valuable business insights to propel the organisation forward.
The Solution
Within the first two weeks, we held several meetings with the Client’s various teams to understand what sort of data was being streamed, how it needed to be processed, where the static data resided and in what format it was stored, and what type of reports the Client wanted us to build.
As the Client’s three primary sources were streaming a tremendous amount of data, setting up a data warehouse and building ETL pipelines around it was the clear first step.
After a thorough review of the Client’s requirements, we decided to build the solution around Google BigQuery.
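As a concrete sketch, each event stream could land in a date-partitioned BigQuery table like the one below. The dataset, table and column names are our own illustrative assumptions, not the Department’s actual schema.

    -- Hypothetical landing table for one of the three event streams.
    -- Partitioning by ingestion date keeps hourly streaming inserts
    -- cheap to query; clustering speeds up per-event-type filters.
    CREATE TABLE IF NOT EXISTS analytics.events_raw (
      event_id    STRING    NOT NULL,
      event_type  STRING,
      payload     JSON,
      ingested_at TIMESTAMP NOT NULL
    )
    PARTITION BY DATE(ingested_at)
    CLUSTER BY event_type;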
We used Google Dataform to strengthen data management, giving the teams data quality checks, cleaner data transformations and version control, while boosting collaboration across the organisation.
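To illustrate, a Dataform model over the raw events might look like the SQLX sketch below; the file path, schema and columns are assumptions for the example, not the real project.

    definitions/staging/events_clean.sqlx

    config {
      type: "incremental",
      schema: "staging",
      description: "Cleaned events built incrementally from the raw landing table"
    }

    SELECT
      event_id,
      event_type,
      ingested_at
    FROM ${ref("events_raw")}
    ${when(incremental(), `WHERE ingested_at > (SELECT MAX(ingested_at) FROM ${self()})`)}

The ref() call records the dependency explicitly, so Dataform can build the whole graph in the correct order and document it for the team.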
Google Data Studio was a natural choice: it is free, integrates tightly with BigQuery, and gave the reporting layer a clear edge.
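In practice, the dashboards connect to thin reporting views rather than raw tables; a hypothetical example of such a view is shown below.

    -- Illustrative reporting view for a Data Studio dashboard;
    -- view and column names are assumptions.
    CREATE OR REPLACE VIEW reporting.daily_event_counts AS
    SELECT
      DATE(ingested_at) AS event_date,
      event_type,
      COUNT(*) AS events
    FROM staging.events_clean
    GROUP BY event_date, event_type;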
The Outcome
Our Client can now combine multiple data sources into a single, consistent store loaded into the data warehouse, and reliable reporting systems were built on top of the stored data without any hassle. The overall data management process became straightforward and precise, which eliminated the Client’s need to hire a team of dedicated data engineers to oversee the entire transformation process.
The team could keep growing its headcount while collaborating on the same SQL code. Communication was now seamless, and analysts had a clear view of everyone’s code in real time.
The time to build reports dropped threefold, thanks to the seamless streaming of data from BigQuery and the data quality assurance provided by Dataform.
Dataform provided a single source of truth for the data, which meant the days of multiple analysts creating multiple tables with conflicting results were over. The Dataform project has a single code base, so all analysts work on the same code. Using Dataform’s Git integration, the team could review changes and push them to the master repository, which fed the pipeline that built the reports.
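For example, once a canonical model such as events_clean exists in the shared project, an analyst’s report model references it rather than rebuilding a private copy; the names here are again illustrative.

    definitions/reporting/stakeholder_summary.sqlx

    config { type: "table", schema: "reporting" }

    SELECT
      event_type,
      COUNT(*) AS total_events
    FROM ${ref("events_clean")}  -- the one shared definition, not a private copy
    GROUP BY event_type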
Dataform used data assertions to identify errors in the pipeline before they were pushed into the reports. By surfacing errors quickly, the team could fix them before they reached the reports, so less time was spent on quality issues and more time on analysing the results.
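As a sketch of what this looks like, Dataform’s built-in assertions are declared in a model’s config block; the specific rules below are illustrative assumptions.

    config {
      type: "incremental",
      schema: "staging",
      assertions: {
        uniqueKey: ["event_id"],                                // no duplicate events
        nonNull: ["event_id", "ingested_at"],                   // required fields present
        rowConditions: ["ingested_at <= CURRENT_TIMESTAMP()"]   // no future timestamps
      }
    }

If any assertion fails, the run is flagged before the downstream reporting tables are rebuilt, which is what kept bad rows out of the dashboards.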