The Modern Data Warehouse and Azure

Presentation transcript:

The Modern Data Warehouse and Azure
Chris Seferlis, Sr CSA @ Microsoft

Who am I?
Former CIO, 20+ years in IT
US Army Veteran
Wife and 2 Girls
Outdoors person (Run, Hike, Cycle, Ski, Fish)
Child Herder (aka Jr Soccer Coach)
Social: @bizdataviz
Video Feed Coming soon: Youtube.com/bizdataviz
Questions: chris@bizdataviz.com

Why We’re Here: Modern Data Warehouse https://azure.microsoft.com/en-us/solutions/architecture/modern-data-warehouse/

And… Streaming & Big Data

And… Advanced Analytics

Decisions… decisions…

Traditional Method

Traditional Data Architecture for BI Programs
[Diagram] Stage labels: Source, Extract & Load, Raw Data Store, Transform, Structure, Semantic Layer, Data Delivery. Numbered markers 1 to 7 appear on the diagram.
Cross-cutting labels: Audit, Balance & Control; Data Governance/Catalog/Dictionary.
Components shown: Source 1 through Source 6, On-Prem SQL Server, API Call, SFTP, SSIS, Azure SQL DB, Azure SQL DB Views, SSAS, Power BI.
Link to traditional data architecture: https://www.delorabradish.com/modeling-for-bi/your-bi-blueprint-road-to-a-successful-bi-implementation
Link to Azure data architecture: https://www.delorabradish.com/modeling-for-bi/data-architecture-for-azure-bi-programs

Why Migrate to Azure?
Flexibility
Scale
Offset Limited Local IT Resources
Event Based File Ingestion
Unstructured Data
Large Data Volumes
Near Real Time Requirements
Data Science Capabilities
Development Time to Production
Support for large audiences
Mobile
Collaboration

Azure Data Architecture for BI Programs
[Diagram] Stage labels: Source, Data Pull or Push, Temporary Store, Raw Data Store, Transform & Load, Standardized Data Store, Enterprise, Multi-file Consolidation To Data Models, Semantic Layer, Delivery, Data Science, Analyze, Visualize.
Sources: Source 1 through Source 7 (Cloud and On-Prem), SFTP, API Calls, Unstructured.
Ingestion: Self-hosted Integration Runtime, Azure Logic App SFTP File Watcher, Azure Data Factory Pipeline Ingestion “Orchestrators”, Azure Function ABS Watcher.
Storage and processing: Azure Blob Storage, Azure Data Lake, Databricks (Spark), PolyBase, T-SQL, Azure SQL DW, Azure SQL DB, Cosmos DB.
Data lake files: Permanent Current File + Deltas (separate New, Update, Delete files); Generate Current Version File; Separate Delta.
Models and delivery: Logical Model + Metadata, Dimensional model, Subject area OLAP Model, AI and ML Tools, Power BI, Dashboards, Workbooks, Reports.
Logging: Azure Logic App & SQL Server Procedure event logging to Cosmos DB or Azure SQL Database; PBI Logs.
Numbered markers 1 to 13 appear on the diagram.
Link to traditional data architecture: https://www.delorabradish.com/modeling-for-bi/your-bi-blueprint-road-to-a-successful-bi-implementation
Link to Azure data architecture: https://www.delorabradish.com/modeling-for-bi/data-architecture-for-azure-bi-programs

Azure Data Architecture ~ Traditional Comparison
[Diagram] The Azure architecture with a “Traditional” row added for comparison: SSIS, SQL DB, Tabular, PBI.
Azure stage labels: Source, Data Pull or Push, Temporary Store, Raw Data Store, Transform & Load, Standardized Data Store, Enterprise, Multi-file Consolidation To Data Models, Semantic Layer, Delivery, Analyze, Visualize.
Azure components: Source 1 through Source 7 (Cloud and On-Prem), SFTP, API Calls, Self-hosted Integration Runtime, Azure Logic App SFTP File Watcher, Azure Data Factory Pipeline Ingestion “Orchestrators”, Azure Blob Storage, Azure Data Lake, Databricks (Spark), PolyBase, T-SQL, Azure SQL DW, Azure SQL DB, Cosmos DB, Dimensional model, Subject area OLAP Model, Power BI, Dashboards, Workbooks, Reports.
Logging: Azure Logic App & SQL Server Procedure event logging to Cosmos DB or Azure SQL Database; PBI Logs.
Numbered markers 1 to 13 appear on the diagram.

Azure Data Architecture ~ Value Add
[Diagram] The full Azure architecture shown against the “Traditional” row (SSIS, SQL DB, Tabular, PBI).
Azure stage labels: Source, Data Pull or Push, Temporary Store, Raw Data Store, Transform & Load, Standardized Data Store, Enterprise, Multi-file Consolidation To Data Models, Semantic Layer, Delivery, Data Science, Analyze, Visualize.
Azure components: Source 1 through Source 7 (Cloud and On-Prem), SFTP, API Calls, Unstructured, Self-hosted Integration Runtime, Azure Logic App SFTP File Watcher, Azure Function ABS Watcher, Azure Data Factory Pipeline Ingestion “Orchestrators”, Azure Blob Storage, Azure Data Lake, Databricks (Spark), PolyBase, T-SQL, Azure SQL DW, Azure SQL DB, Cosmos DB, AAS, Logical Model + Metadata, Dimensional model, Subject area OLAP Model, AI and ML Tools, Power BI, Dashboards, Workbooks, Reports.
Data lake files: Permanent Current File + Deltas (separate New, Update, Delete files); Generate Current Version File; Separate Delta.
Logging: Azure Logic App & SQL Server Procedure event logging to Cosmos DB or Azure SQL Database; PBI Logs.
Numbered markers 1 to 13 appear on the diagram.

Do I need a data lake?

Azure Data Lake Storage Gen2
A “no-compromises” Data Lake: secure, performant, massively scalable Data Lake storage that brings the cost and scale profile of object storage together with the performance and analytics feature set of data lake storage.
SECURE: Support for fine-grained ACLs, protecting data at the file and folder level. Multi-layered protection via at-rest Storage Service encryption and Azure Active Directory integration.
MANAGEABLE: Automated Lifecycle Policy Management. Object-level tiering.
FAST: Atomic file operations mean jobs complete faster. High throughput.
SCALABLE: No limits on data store size. Global footprint (50 regions).
COST EFFECTIVE: Object store pricing levels. File system operations minimize transactions required for job completion.
INTEGRATION READY: Optimized for Spark and Hadoop analytic engines. Tightly integrated with Azure end-to-end analytics solutions.

Convergence of two Storage Services
Azure Blob Storage (General Purpose Object Storage): Global scale, all Azure regions; Full BCDR capabilities; Tiered: Hot/Cool/Archive; Cost Efficient; Large partner ecosystem.
Azure Data Lake Store (Optimized for Big Data analytics): Built for Hadoop; Hierarchical namespace; ACLs, AAD and RBAC; Performance tuned for big data; Very high scale capacity and throughput.
Azure Data Lake Storage Gen2: The best of Blobs and ADLS.
© 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Azure Data Lake Storage Gen2 architecture
[Diagram] APIs: Blob API, ADLS API.
Layer: Hierarchical File System.
Benefit labels: Performance Enhancements; Scale and Cost Effectiveness; Security; Data Governance and Management.
Blob Storage capabilities: Object Tiering and Lifecycle Policy Management; AAD Integration, RBAC, Storage Account Security; HA/DR support through ZRS and RA-GRS.

Data processing

Azure Logic App ~ SFTP Listener: Push from Source
[Diagram] Stages: Source, Data Pull or Push, Temporary Data Store, Raw Data Store. Numbered markers 1 to 3 appear on the diagram.
Flow labels, in slide order: Logic App SFTP File Watcher (SFTP File Added or Changed); Logic App Log Event (SFTP File Found); Azure Database Stored Proc (Log File Found); Logic App Log Event & Call ADF Pipeline; Azure Data Factory SFTP Orchestrator; Azure Blob Storage.
Other components: SFTP, Source 5, Source 6, Event Hub (Send Event), Azure Data Factory.

Azure Data Factory Orchestrator: Scheduled Pull from Source (traditional SSIS)
[Diagram] Stages: Source, Data Pull or Push, Temporary Store, Raw Data Store. Numbered markers 1 to 3 appear on the diagram.
Sources: Source 1 through Source 4 (Cloud and On-Prem), API Calls, Self-hosted Integration Runtime.
Flow labels, in slide order: Azure Database Stored Proc (Get Start Date); Azure Data Factory Triggered Pipeline; Update Run Date; Copy Dataset; Azure Blob Storage.
Logging: Logic App Log Event after every activity; Event Hub Send Event.
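
The three activity labels on this slide (Get Start Date, Copy Dataset, Update Run Date) resemble a watermark-style incremental pull. The sketch below is plain Python, not Azure Data Factory: the in-memory `metadata` dict stands in for the stored-procedure lookup, `source_rows` stands in for a source table, and every name is hypothetical. The ordering of the steps (get, copy, update) is an assumption, since the slide does not state it.

```python
from datetime import datetime

# Hypothetical stand-ins: a metadata store holding the last run date,
# and a source table with a modification timestamp per row.
metadata = {"last_run": datetime(2019, 1, 1)}
source_rows = [
    {"id": 1, "modified": datetime(2018, 12, 31)},
    {"id": 2, "modified": datetime(2019, 1, 2)},
    {"id": 3, "modified": datetime(2019, 1, 5)},
]


def get_start_date(meta):
    """Step 1: read the watermark left by the previous run."""
    return meta["last_run"]


def copy_dataset(rows, start, end):
    """Step 2: copy only rows modified in the window (start, end]."""
    return [r for r in rows if start < r["modified"] <= end]


def update_run_date(meta, end):
    """Step 3: advance the watermark so the next run starts here."""
    meta["last_run"] = end


def run_pipeline(meta, rows, now):
    """Run the three steps in order and return the copied rows."""
    start = get_start_date(meta)
    copied = copy_dataset(rows, start, now)
    update_run_date(meta, now)
    return copied


copied = run_pipeline(metadata, source_rows, datetime(2019, 1, 3))
print([r["id"] for r in copied])
```

Updating the watermark only after the copy step means a failed copy would be retried from the same start date on the next run, which is the usual reason for this ordering.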

Azure Blob Storage ~ Preprocessing
[Diagram] Two cases are labeled: “No Deletes Needed” and “Unapproved Departments Must Delete”.
Stages: Source, Temporary Data Store, Raw Data Store. Markers 1 to 4, 2b, 2c, 5a to 5e and 6 appear on the diagram.
Components: Azure Data Factory, Azure Blob Storage tempContainer, Databricks or ADFgen2 (Delete.py or Pipeline), Cleansed CSV File, Azure Blob Storage finalContainer, Azure Function ABS Watcher, Azure SQL Database, Logic App, Cosmos DB, Data Factory, Data Lake Store, Databricks.
Notes on the slide: “If found”; full or incremental load parameter passed to ADL Orchestrator.
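
The slide names a `Delete.py` step that produces a cleansed CSV file when unapproved departments must be removed. The script itself is not shown, so the following is only a minimal sketch of that kind of filter: the column name `department` and the approved list are assumptions made for illustration, and the code works on in-memory text rather than on blobs.

```python
import csv
import io

# Hypothetical values: the slide does not list department names or columns.
APPROVED_DEPARTMENTS = {"Finance", "Sales"}


def cleanse_csv(raw_text, department_column="department"):
    """Return CSV text with rows from unapproved departments removed."""
    reader = csv.DictReader(io.StringIO(raw_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Keep the row only when its department is on the approved list.
        if row[department_column] in APPROVED_DEPARTMENTS:
            writer.writerow(row)
    return out.getvalue()


raw = "id,department\n1,Finance\n2,Legal\n3,Sales\n"
cleansed = cleanse_csv(raw)
print(cleansed)
```

In the flow on the slide, the raw file would sit in tempContainer and the cleansed result would be written to finalContainer; reading and writing blobs is left out here.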

Azure Data Factory Orchestrator: Scheduled Pull from Source, No Preprocessing Needed
[Diagram] Stages: Source, Temporary Data Store, Raw Data Store. Markers 1 to 4, 5a to 5e and 6 appear on the diagram.
Components: Azure Data Factory, Azure Blob Storage finalContainer, Azure Function ABS Watcher, Azure SQL Database, Logic App, Cosmos DB, Data Factory, Data Lake Store, Databricks.
Notes on the slide: “If found”; full or incremental load parameter passed to ADL Orchestrator.

Wash, Rinse, Repeat…
[Diagram] The same flow, starting from “Some ingestion method” (shown twice, each with markers 1 to 4).
Stages: Temporary Data Store, Raw Data Store. Markers 5a to 5e and 6 appear on the diagram.
Components: Azure Blob Storage finalContainer, Azure Function ABS Watcher, Logic App, Cosmos DB, Data Factory, Data Lake Store, Databricks.
Notes on the slide: “If found”; full or incremental load parameter passed to ADL Orchestrator.

Azure Data Lake Ingestion For all Sources
[Diagram] Stages: Temporary Data Store, Raw Data Store, Transform & Load, Standardized Data Store. Markers 3 to 6 appear on the diagram; marker 4 is labeled ABS File Watcher (Root Container).
Flow labels, in slide order: Azure Function ABS Watcher (ABS File Added or Changed); Logic App Log Event (ABS File Found); Logic App Log Event & Call ADF Pipeline; Azure Data Factory ADL Orchestrator; Azure Data Lake Store.
Data lake files: Generate Current Version File + Separate Delta Files; Current File + Deltas (separate New, Update, Delete files).
Other components: Azure Blob Storage, Event Hub (Send Event), Databricks, Azure Data Lake, Azure Data Factory.
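
The watcher flow on this slide (file added or changed, log the event, call the orchestrator pipeline) can be sketched as a small event handler. This is a plain-Python simulation under stated assumptions: the two lists stand in for the logging store and for pipeline invocations, the container name `root` is a placeholder, and no Azure Functions, Logic Apps or Data Factory API is called.

```python
# Hypothetical stand-ins for the logging store and the pipeline trigger.
event_log = []
pipeline_runs = []


def log_event(name, blob_path):
    """Record an event, as the Logic App logging step would."""
    event_log.append((name, blob_path))


def call_adl_orchestrator(blob_path):
    """Record a pipeline invocation for the given blob."""
    pipeline_runs.append(blob_path)


def on_blob_added_or_changed(blob_path, watched_container="root"):
    """Handle one blob event; ignore blobs outside the watched container."""
    container = blob_path.split("/", 1)[0]
    if container != watched_container:
        return False
    log_event("ABS File Found", blob_path)
    call_adl_orchestrator(blob_path)
    return True


handled = on_blob_added_or_changed("root/sales/2019.csv")
ignored = on_blob_added_or_changed("temp/scratch.csv")
print(handled, ignored, pipeline_runs)
```

Logging before calling the pipeline keeps a record of the file even if the pipeline call later fails.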

Decision Point… ADF vs SPs vs Databricks
ADF: Visual Designer; Great Orchestration; Data Flows for Transformation
SPs: Easy Lift and Shift; Lots of Resources; Standard SQL Code
Databricks: Granular control; Spark Engine; Flexible File Capabilities

Azure Data Factory Orchestrator: One Orchestrator Pipeline For all Sources
[Diagram] Source: Azure Blob.
Pipelines: ADL Orchestrator Pipeline, Ingestion Pipeline, AsIs Pipeline.
PySpark steps: Create row-level checksum; Create delta files; Create AsIs files.
Outputs in Azure Data Lake Store: separate New, Changed & Deleted files; a single “AsIs” current file.
Logging: All ADF Metadata Logging; Logic App Log Event (Success or Failure); Event Hub Send Event.
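
The slide names a row-level checksum followed by separate new, changed and deleted files. The deck uses PySpark for this; the sketch below swaps in plain Python on in-memory rows to show the idea only. The key column `id`, the choice of SHA-256 and the function names are assumptions, not details from the presentation.

```python
import hashlib


def row_checksum(row):
    """Hash all column values of a row in a fixed column order."""
    joined = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def split_deltas(previous, current, key="id"):
    """Compare two versions of a file and return (new, changed, deleted)."""
    prev = {r[key]: row_checksum(r) for r in previous}
    curr = {r[key]: row_checksum(r) for r in current}
    new = [r for r in current if r[key] not in prev]
    changed = [
        r for r in current
        if r[key] in prev and prev[r[key]] != curr[r[key]]
    ]
    deleted = [r for r in previous if r[key] not in curr]
    return new, changed, deleted


previous = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
current = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "B"},
    {"id": 4, "name": "d"},
]
new, changed, deleted = split_deltas(previous, current)
print(new, changed, deleted)
```

Here the `current` list plays the role of the single “AsIs” file, and the three returned lists correspond to the separate delta files.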

Decision Point… ASDB vs ASDW
ASDB (Azure SQL Database): Symmetric Multi-Processing; Transactional Data; Database < 1TB
ASDW (Azure SQL Data Warehouse): Massively Parallel Processing; Analytical Data; Database > 1TB
https://www.blue-granite.com/blog/is-azure-sql-data-warehouse-a-good-fit-updated
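
The rule of thumb on this slide can be written down as a tiny helper. This is only an illustration of the two criteria the slide gives (workload type and a 1 TB size threshold); the function name and the "review further" fallback for mixed cases are assumptions, and a real decision would weigh more factors than these.

```python
def suggest_sql_service(size_tb, workload):
    """Apply the slide's rule of thumb.

    workload must be 'transactional' or 'analytical'.
    """
    if workload not in ("transactional", "analytical"):
        raise ValueError("workload must be 'transactional' or 'analytical'")
    if workload == "analytical" and size_tb > 1:
        return "Azure SQL Data Warehouse"
    if workload == "transactional" and size_tb < 1:
        return "Azure SQL Database"
    # Mixed signals (e.g. small analytical data): the slide gives no rule.
    return "review further"


print(suggest_sql_service(0.2, "transactional"))
print(suggest_sql_service(5, "analytical"))
print(suggest_sql_service(0.5, "analytical"))
```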

Azure Data Warehouse Ingestion For all Sources
[Diagram] Stages: Standardized Data Store (Current File + Deltas: separate New, Update, Delete files); Transform & Load; Enterprise Data Store (3NF Schema: subject area specific integrated Data Hub with historical tracking); Multi-file Consolidation To Data Models (OLAP Schema). Markers 6 to 9 appear on the diagram.
Flow labels, in slide order: Azure Data Lake Store; Azure Data Factory Orchestrator (Execute series of Stored Procedures); Azure SQL Data Warehouse External Tables; Azure SQL Data Warehouse 3NF Tables; Event Hub Send Event; Azure SQL Data Warehouse Logging Tables.
Other components: Azure Data Lake, PolyBase, T-SQL, Azure SQL DB or ADW, Azure Data Factory.

Is that all?

What about… Data Quality? Master Data Management? Data Catalog? Data Glossary?

Cloud Tools (Tool: Purpose)
1. Azure Logic Apps: SFTP “watcher”; Event logging; Blob storage and data lake delete methodologies; Notifications; Automatic emails; Cosmos DB document upload and deletions
2. Azure Function: Azure Blob Storage “listener”
3. Azure Event Hub: event handling
4. Azure Blob Storage: temporary work space
5. Azure Data Factory: Process flow orchestrators; Data copy; QA methodologies

Cloud Tools (continued)
6. Databricks: Data processing and write to Azure Data Lake; Other pre-processing data requirements
7. Azure Data Lake: Delta files (change data capture at the file level); Current “AsIs” files; Data science self-service; Power BI self-service
8. Cosmos DB SQL API: Logging; ELT metadata
9. Azure Key Vault: Supports Dev/QA/Prod Migration

Cloud Tools (continued)
10. Azure SQL Database: ELT metadata
11. Azure SQL Data Warehouse: Both Inmon and Kimball data stores (loosely speaking)
12. Azure Analysis Services: Tabular semantic layer
13. Power BI: Reporting and self-service
14. Azure Data Catalog: Data catalog for reports, sources, etc.
15. Master Data Management
16. Data Quality Tool

Development Tools (Tool: Purpose)
1. Visual Studio Python project: Auto-generate the file-level metadata for complete file ingestion to Azure Data Lake
2. Visual Studio Azure Data Warehouse project: Team Foundation Server source code control for Azure Data Warehouses
3. Visual Studio Logic App Project: Team Foundation Server or Git source code control for Azure Logic Apps
4. Visual Studio Database Project: Team Foundation Server or Git source code control for Azure SQL Databases
5. GitHub or Azure DevOps: Source code control for Azure Data Factory and Databricks