Case Study: Designing a Data Ecosystem and Creating an Automated Data Pipeline

Background

In this case study, we'll explore how we designed a relational database and created an automated data pipeline to migrate and flatten the data to an OLAP server. The data was then used for large querying by outside clients, and SSRS reporting was used for in-house analysis.

Designing the Relational and OLAP Databases

The first step in this project was to design a relational database. The company had a large amount of data spread across various systems and databases, so it was important to identify the entities and relationships involved in the data.

The team broke down the data into smaller, more manageable pieces and identified how they were related to each other. After many hours of deliberation, we then created tables and defined the relationships between them using primary and foreign keys. Many debates where had with regards to indexes and query analyzer outputs. In the end we came up with a model that addressed all concerns

Creating the Automated Data Pipeline

High Level Requirements

  • Large amounts of data (possible millions of rows) could be queried at any time
  • Clients need fast responses when running queries
  • The data needs to be in as "real-time" as possible for analyzing

Once the relational database was designed, the team set up an automated data pipeline to move the data from various sources to an OLAP server. We used SSIS for ETL to extract the data, transform it into a "flattened" format that could be used by the OLAP server, and load it into the destination system.

The automated data pipeline satisfied several requirements by allowing the company to move large amounts of data quickly and efficiently, without the need for manual intervention. This saved time and reduced the risk of errors caused by manual processes.

Using the Data for Large Querying and SSRS Reporting

With the data migrated to the OLAP server, outside clients were able to query the data for large-scale analysis. The company also used SSRS reporting for in-house analysis, allowing users to create reports based on the data stored in the OLAP server.

The insights gained from the data helped drive business decisions and improve overall performance.

Conclusion

Designing a relational database and creating an automated data pipeline can be complex tasks, but they can provide valuable insights and analysis to help drive business decisions. By taking a careful approach to designing the database and setting up an automated data pipeline, companies can save time and reduce errors while gaining valuable insights from their data.

Description

An automated Data pipeline from a relational database to an OLAP Server.