1
Architecture of modern data warehouse
Eugene Polonichko, Data Platform MVP, PASS Chapter Leader
2
Organizers: Natalia Pogorelova, Andriy Pogorelov, Paul Stetsenko
3
Sponsors
4
About me
Eugene Polonichko has over 7 years of experience with SQL Server. He has mainly focused on BI projects (SSAS, SSIS, Power BI, Cognos, Informatica PowerCenter, Pentaho, Tableau). Eugene is a passionate speaker and SQL community volunteer who presents regularly at PASS SQL Saturday events and at local user groups around Ukraine and Europe. He is a PASS Chapter Leader and a Microsoft Data Platform MVP.
5
Agenda
Modern Data Warehouse
  Traditional approach
  Modern data warehouse
  Ten characteristics
Microsoft architecture
  Microsoft Modern Data Warehouse
  Azure Data Factory
  Cosmos DB
  Storage
  Azure Databricks
  Azure DWH
6
Concept of modern data warehouse: Traditional approach
Data Intake → Data Transformation & Storage → Data Consumption & Presentation
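The three stages of the traditional approach can be sketched as a tiny ETL flow. This is only an illustration under stated assumptions: the sample rows, the validation rule, and the function names `intake`, `transform_and_store`, and `consume` are hypothetical, and a plain dict stands in for the warehouse. The defining trait shown here is that data is cleaned and conformed *before* it is stored.

```python
def intake() -> list[dict]:
    # Data Intake: pull raw records from an operational source (hypothetical sample rows)
    return [
        {"region": "EU", "amount": "120.5"},
        {"region": "us", "amount": "80"},
        {"region": "EU", "amount": None},  # incomplete row
    ]


def transform_and_store(rows: list[dict], warehouse: dict[str, float]) -> None:
    # Data Transformation & Storage: validate and conform rows *before* loading (classic ETL)
    for row in rows:
        if row["amount"] is None:
            continue  # rows that fail validation never reach the warehouse
        region = row["region"].upper()
        warehouse[region] = warehouse.get(region, 0.0) + float(row["amount"])


def consume(warehouse: dict[str, float]) -> str:
    # Data Consumption & Presentation: render a simple text report
    return "\n".join(f"{region}: {total:.2f}" for region, total in sorted(warehouse.items()))


if __name__ == "__main__":
    warehouse: dict[str, float] = {}
    transform_and_store(intake(), warehouse)
    print(consume(warehouse))
```

The incomplete row is discarded during transformation, so nothing downstream can recover it. That is the main limitation the modern approach addresses.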
7
Modern data warehouse
8
Modern Data Warehouse: Ingest & Prep → Model & Serve → Visualize, all built on Store
12/2/2019 5:23 AM
© Microsoft Corporation. All rights reserved.
Modern Data Warehouse
Sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
Ingest & Prep: Azure Data Factory (code-free data ingestion from 85+ data integration connectors); Azure Databricks (prep only)
Store: Azure Data Lake Storage (high-performance data lake available in all 54 Azure regions)
Model & Serve: Azure SQL Data Warehouse (up to 14x faster and 94% less expensive than other cloud providers, according to Microsoft)
Visualize: Power BI (a Leader in the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms*)
At the foundation, customers can build a data lake with Azure Data Lake Storage to hold all their data, of every type. To ingest data, they can use Azure Data Factory, whose 85+ data integration connectors enable code-free ETL/ELT from any source. Whether the data lives in on-premises sources, other Azure services, or other cloud services, customers can author, monitor, and manage their big data pipelines in an easy-to-use visual environment. Once the data is ingested, customers can use Azure Databricks notebooks to shape and prepare it, which makes collaboration on data within the organization more streamlined and efficient. With the data stored, ingested, and prepared, customers can load it into Azure SQL Data Warehouse, which Microsoft positions as an industry-leading data warehouse that is up to 14x faster and costs 94% less than other cloud providers' offerings. This lets customers handle petabyte-scale analytics workloads in a cloud data warehouse with high query performance and strong security. Finally, combining SQL Data Warehouse with Power BI lets customers build visualizations on massive amounts of data and make insights available to everyone across the organization.
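The store → ingest → prep → serve flow above can be sketched in plain Python to highlight how it differs from the traditional approach: source data lands in the lake unchanged, and preparation happens afterwards. This is a minimal sketch under stated assumptions. A dict stands in for Azure Data Lake Storage, and the `raw/` and `curated/` prefixes, function names, and sample rows are hypothetical. None of this is the Data Factory or Databricks API.

```python
import json

# A dict stands in for Azure Data Lake Storage; "raw/" and "curated/" are a hypothetical layout
lake: dict[str, str] = {}


def ingest(source: str, records: list[dict]) -> None:
    # Ingest: copy source data into the lake as-is, with no transformation yet
    lake[f"raw/{source}.json"] = json.dumps(records)


def prep(source: str) -> None:
    # Prep: read the raw copy, drop malformed rows, normalize fields, write a curated copy
    rows = json.loads(lake[f"raw/{source}.json"])
    clean = [
        {"product": r["product"].strip().lower(), "qty": int(r["qty"])}
        for r in rows
        if r.get("product") and str(r.get("qty", "")).isdigit()
    ]
    lake[f"curated/{source}.json"] = json.dumps(clean)


def model_and_serve(source: str) -> dict[str, int]:
    # Model & Serve: aggregate curated data into the shape a warehouse table would hold
    totals: dict[str, int] = {}
    for r in json.loads(lake[f"curated/{source}.json"]):
        totals[r["product"]] = totals.get(r["product"], 0) + r["qty"]
    return totals


ingest("sales", [{"product": " Bike ", "qty": "2"}, {"product": "bike", "qty": 3}, {"product": "", "qty": "1"}])
prep("sales")
print(model_and_serve("sales"))  # Visualize: a reporting tool would chart this result
```

Because the raw copy stays in the lake, the prep step can be changed and re-run later without going back to the source systems.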
9
Microsoft Modern Data Warehouse
Use an Apache Kafka cluster in Azure HDInsight to ingest live streaming data from an application.
Use Azure Data Factory to bring all your structured data together in Azure Blob Storage.
Use Azure Databricks to clean, transform, and analyze the streaming data, and combine it with structured data from operational databases or data warehouses.
Apply scalable machine learning and deep learning techniques in Python, R, or Scala, using the built-in notebook experience in Azure Databricks, to derive deeper insights from this data.
Use the native connector between Azure Databricks and Azure SQL Data Warehouse to access and move data at scale.
Build analytical dashboards and embedded reports on top of Azure SQL Data Warehouse to share insights within your organization, and use Azure Analysis Services to serve this data to thousands of users.
Power users can take advantage of the built-in capabilities of Azure Databricks and Azure HDInsight for root cause analysis and raw data analysis.
Move the insights from Azure Databricks to Cosmos DB to make them accessible to real-time apps.
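The step that combines streaming data with structured data from operational databases is essentially an enrichment join: each incoming event is cleaned and then matched against reference data. In Azure Databricks, this would typically be a Spark Structured Streaming job joined with a static table. The plain-Python generator below only illustrates the idea. `CUSTOMERS`, `enrich`, and the sample events are hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical structured reference data, e.g. landed in Blob Storage by Azure Data Factory
CUSTOMERS = {
    "c1": {"name": "Contoso", "segment": "Enterprise"},
    "c2": {"name": "Fabrikam", "segment": "SMB"},
}


def enrich(events: Iterable[dict], customers: dict[str, dict]) -> Iterator[dict]:
    # Clean each streaming event, then join it with the reference data on customer_id
    for event in events:
        if event.get("customer_id") not in customers:
            continue  # unmatched events are dropped here; a real job might route them elsewhere
        yield {**event, **customers[event["customer_id"]]}


stream = [
    {"customer_id": "c1", "clicks": 3},
    {"customer_id": "zz", "clicks": 1},
    {"customer_id": "c2", "clicks": 5},
]
for row in enrich(stream, CUSTOMERS):
    print(row)
```

Because `enrich` is a generator, it processes one event at a time and never needs the whole stream in memory. A streaming engine relies on the same property.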
10
Azure HDInsight Kafka
Azure HDInsight is a managed, full-spectrum, open-source analytics service for enterprises. It is a cloud service that makes it easy, fast, and cost-effective to process massive amounts of data, and it supports a broad range of scenarios, such as extract, transform, and load (ETL), data warehousing, machine learning, and IoT. Apache Kafka is an open-source, distributed streaming platform. It is often used as a message broker because it provides functionality similar to a publish-subscribe message queue.
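The publish-subscribe behavior described above comes from Kafka's log model. A topic is split into append-only partitions, and each consumer group tracks its own read offsets, so several groups can read the same messages independently. The toy class below is a sketch of that model only, not the Kafka client API. `MiniTopic` is a hypothetical name, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner uses.

```python
import zlib


class MiniTopic:
    """Toy model of a Kafka topic: partitioned append-only logs plus per-consumer-group offsets."""

    def __init__(self, partitions: int = 3) -> None:
        self.partitions: list[list[str]] = [[] for _ in range(partitions)]
        # Next offset to read, per (consumer group, partition); groups read independently
        self.offsets: dict[tuple[str, int], int] = {}

    def publish(self, key: str, value: str) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        # crc32 is a stand-in for the murmur2 hash used by Kafka's default partitioner.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p

    def poll(self, group: str) -> list[str]:
        # Return every message this group has not read yet, then advance (commit) its offsets
        out: list[str] = []
        for p, log in enumerate(self.partitions):
            start = self.offsets.get((group, p), 0)
            out.extend(log[start:])
            self.offsets[(group, p)] = len(log)
        return out


topic = MiniTopic()
topic.publish("sensor-1", "21.5")
topic.publish("sensor-2", "19.0")
print(sorted(topic.poll("dashboard")))  # both readings
print(topic.poll("dashboard"))          # nothing new for this group
print(len(topic.poll("archiver")))      # a second group still sees both readings
```

Unlike a classic queue, reading does not delete messages. Each group only moves its own offsets forward.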
11
Azure Data Factory Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data.
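A Data Factory pipeline orchestrates activities that can depend on upstream activities. In the pipeline JSON, an activity lists its `dependsOn` entries with dependency conditions such as Succeeded. The sketch below models only the Succeeded condition: activities run in topological order, and any activity whose upstream did not succeed is skipped. It is a hypothetical illustration, not the Data Factory SDK, and the activity names are made up.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(activities: dict[str, Callable[[], None]],
                 depends_on: dict[str, set[str]]) -> dict[str, str]:
    """Run activities in dependency order; an activity runs only if every upstream succeeded."""
    graph = {name: depends_on.get(name, set()) for name in activities}
    status: dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        if any(status.get(up) != "Succeeded" for up in graph[name]):
            status[name] = "Skipped"  # an upstream failed or was skipped
            continue
        try:
            activities[name]()
            status[name] = "Succeeded"
        except Exception:
            status[name] = "Failed"
    return status


def copy_orders() -> None:
    print("copying orders")


def copy_customers() -> None:
    raise RuntimeError("source unavailable")  # simulate a failed copy


def transform_sales() -> None:
    print("transforming sales")


activities = {"CopyOrders": copy_orders, "CopyCustomers": copy_customers, "TransformSales": transform_sales}
depends_on = {"TransformSales": {"CopyOrders", "CopyCustomers"}}
print(run_pipeline(activities, depends_on))
```

Here the transformation is skipped because one of its two copy steps failed. This is the "automates the movement and transformation" behavior in miniature.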
12
Azure Data Factory integration runtimes: Azure, Self-hosted, Azure-SSIS
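The three runtime types on this slide suit different situations. The function below encodes a simplified rule of thumb for choosing between them. It is a hypothetical helper with deliberately coarse inputs, not part of any Azure API, and real deployments also weigh networking and cost.

```python
def choose_integration_runtime(source_is_private: bool, runs_ssis_packages: bool) -> str:
    """Simplified rule of thumb for picking an Azure Data Factory integration runtime."""
    # Running existing SSIS packages in the cloud requires the Azure-SSIS runtime
    if runs_ssis_packages:
        return "Azure-SSIS"
    # On-premises or private-network sources need a runtime installed inside that network
    if source_is_private:
        return "Self-hosted"
    # Publicly reachable cloud data stores can use the fully managed Azure runtime
    return "Azure"


print(choose_integration_runtime(source_is_private=True, runs_ssis_packages=False))
```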
13
Storage
Azure Blob storage is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. It offers three types of resources:
  The storage account
  A container in the storage account
  A blob in a container
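The account → container → blob hierarchy is visible directly in a blob's address, which takes the form `https://<account>.blob.core.windows.net/<container>/<blob>` in the Azure public cloud. The helper below is a sketch that builds such a URL. The function name and sample names are hypothetical, and the validation patterns paraphrase Azure's naming rules (account names are 3 to 24 lowercase letters or digits; container names are 3 to 63 lowercase letters, digits, or single hyphens), so check the current documentation before relying on them.

```python
import re
from urllib.parse import quote

ACCOUNT_RE = re.compile(r"^[a-z0-9]{3,24}$")
# Starts and ends with a letter or digit, 3-63 characters, no consecutive hyphens
CONTAINER_RE = re.compile(r"^(?!.*--)[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def blob_url(account: str, container: str, blob: str) -> str:
    """Build the public-cloud URL of a blob from its account, container, and blob name."""
    if not ACCOUNT_RE.match(account):
        raise ValueError("account names are 3-24 lowercase letters or digits")
    if not CONTAINER_RE.match(container):
        raise ValueError("invalid container name")
    # quote() keeps '/' by default, so slashes in a blob name read as virtual folders
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob)}"


print(blob_url("mystorage", "raw-data", "logs/2019/app log.txt"))
```

Blob storage has no real directories. The slashes are simply part of the blob name, which is why a data lake layout such as `logs/2019/...` can be expressed this way.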
14
Azure Databricks Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. Designed with the founders of Apache Spark, Databricks is integrated with Azure to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
15
Azure Databricks
16
Azure SQL Data Warehouse
Azure SQL Data Warehouse is a massively parallel processing (MPP), cloud-based, scale-out relational database capable of processing massive volumes of data. It:
  Combines the SQL Server relational database with Azure cloud scale-out capabilities.
  Decouples storage from compute.
  Enables increasing, decreasing, pausing, or resuming compute.
  Integrates across the Azure platform.
  Uses SQL Server Transact-SQL (T-SQL) and tools.
  Complies with various legal and business security requirements, such as SOC and ISO.
17
Architecture of SQL Data Warehouse
Components: Control node, Compute nodes, Azure Storage, Data Movement Service (DMS)
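In SQL Data Warehouse, every table is split into 60 distributions. For a hash-distributed table, the distribution column determines which distribution each row belongs to. The distributions are spread over however many compute nodes the current scale provides, which is how compute can grow or shrink without rewriting the stored data. The sketch below models a `SUM ... GROUP BY` over such a table. Each "node" aggregates its own rows, and the "control node" merges the partial results. It is an illustration only: `zlib.crc32` stands in for the service's internal hash, and the modulo mapping of distributions to nodes is an assumption.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60  # SQL Data Warehouse splits every table into 60 distributions


def distribution_of(key: str) -> int:
    # Hash distribution: the distribution column decides where a row lives
    # (crc32 is a stand-in for the service's own hash function)
    return zlib.crc32(key.encode()) % DISTRIBUTIONS


def node_of(distribution: int, compute_nodes: int) -> int:
    # Assumed mapping of distributions onto the compute nodes available at the current scale
    return distribution % compute_nodes


def mpp_sum(rows: list[tuple[str, int]], compute_nodes: int) -> Counter:
    # Each compute node aggregates only the rows stored in its own distributions...
    partials = [Counter() for _ in range(compute_nodes)]
    for key, amount in rows:
        partials[node_of(distribution_of(key), compute_nodes)][key] += amount
    # ...and the control node merges the partial results into the final answer
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


sales = [("store-1", 10), ("store-2", 5), ("store-1", 7)]
print(mpp_sum(sales, compute_nodes=2))
print(mpp_sum(sales, compute_nodes=6) == mpp_sum(sales, compute_nodes=2))  # same answer at any scale
```

Because this query groups by the distribution column, all rows for a key are already on one node, so nothing has to move between nodes. Grouping by some other column would require the Data Movement Service to shuffle rows first.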
18
Cosmos DB
Azure Cosmos DB is a globally distributed, multi-model database service. It lets you replicate your data across any number of Azure regions and scale throughput independently of storage.
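Global distribution means each region holds a replica, and a client can read from the nearest available region, falling back to another region during an outage. The Cosmos DB SDKs expose this idea through a list of preferred regions. The toy class below sketches only that routing behavior. `GlobalContainer` and the region list are hypothetical, and replication is simplified to a synchronous copy, whereas real Cosmos DB behavior depends on which of its five consistency levels you choose.

```python
class GlobalContainer:
    """Toy model of a globally distributed container: one write region plus read replicas."""

    def __init__(self, write_region: str, read_regions: list[str]) -> None:
        self.write_region = write_region
        self.replicas: dict[str, dict[str, dict]] = {r: {} for r in [write_region, *read_regions]}
        self.online: set[str] = set(self.replicas)

    def upsert(self, item: dict) -> None:
        # Simplification: copy the item to every regional replica immediately
        for replica in self.replicas.values():
            replica[item["id"]] = dict(item)

    def read(self, item_id: str, preferred: list[str]) -> tuple[str, dict]:
        # Serve the read from the first available region in the caller's preference list
        for region in preferred:
            if region in self.online:
                return region, self.replicas[region][item_id]
        raise LookupError("no preferred region is available")


c = GlobalContainer("West Europe", ["East US", "Southeast Asia"])
c.upsert({"id": "42", "status": "shipped"})
print(c.read("42", ["East US", "West Europe"]))  # served from East US
c.online.discard("East US")                      # simulate a regional outage
print(c.read("42", ["East US", "West Europe"]))  # falls back to West Europe
```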
19
Visualization
Power BI is a business analytics service by Microsoft. It aims to provide interactive visualizations and business intelligence capabilities with an interface simple enough for end users to create their own reports and dashboards.
20
Links data-architecture components-of-a-modern-data-warehouse-a-glossary
21
Thank you https://www.linkedin.com/in/eugenepolonichko/