BRK2279 Real-World Data Movement and Orchestration Patterns using Azure Data Factory. Jason Horner, Attunix; Cathrine Wilhelmsen, Inmeta

Presentation transcript:

BRK2279 Real-World Data Movement and Orchestration Patterns using Azure Data Factory Jason Horner, Attunix - @jasonhorner Cathrine Wilhelmsen, Inmeta - @cathrinew

Agenda: Overview; Design Patterns; Preview of…?

Overview of Azure Data Factory

Azure Data Factory (diagram): Sources, Data Warehouse, Analysis, Reporting, ETL / ELT

ETL / ELT with Azure Data Factory: Visual UI (drag and drop); Code support (Python, .NET, ARM); Control flow (loop, branch, if); SSIS execution (lift and shift)

ETL / ELT

ETL - Extract Transform Load

ELT - Extract Load Transform

ETL ELT
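
To make the ETL/ELT contrast concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in target database. The table names and the cents-to-amount conversion are hypothetical; the only point is where the transformation runs: in the integration layer before loading (ETL), or inside the target with SQL after the raw data has landed (ELT).

```python
import sqlite3

# Hypothetical raw source rows: (order_id, amount in cents)
SOURCE_ROWS = [(1, 1250), (2, 990), (3, 40000)]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in the integration layer, then load the finished rows."""
    transformed = [(order_id, cents / 100.0) for order_id, cents in SOURCE_ROWS]
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", transformed)
    conn.commit()


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows unchanged, then transform inside the target with SQL."""
    conn.execute("CREATE TABLE orders_raw (order_id INTEGER, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", SOURCE_ROWS)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, amount_cents / 100.0 AS amount FROM orders_raw"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
```

Both paths end with the same rows; ELT additionally leaves the untransformed data in orders_raw inside the target.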

Azure Data Factory Concepts

Azure Data Factory Concepts: Pipelines, Activities, Triggers, Linked Services, Datasets, Integration Runtime

Azure Data Factory Design Patterns

What are Design Patterns? Reusable solutions for common problems: a description or template; formalized best practices; not finished designs that can be transformed directly into source or machine code.

Why use Design Patterns? Use tested, proven and documented solutions to: speed up development; prevent issues that can cause problems later; improve code readability.

Design Patterns: Truncate and Load, Merge Load, Incremental Load, Bulk Table Transfer

Full Extract: Truncate and Load. Specific use cases: all data needed, but replication is not available; small data sets that change often; no historical requirements. Very simple, but can be considered an antipattern.

Full Extract: Truncate and Load (diagram): Source Table (Source) → Sink Table (Sink)
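
A minimal sketch of truncate and load, assuming a hypothetical sink_table(id, name) and using SQLite through Python's sqlite3 as a stand-in sink. SQLite has no TRUNCATE TABLE, so an unqualified DELETE takes its place; wrapping the delete and the reload in one transaction means a failed reload rolls back instead of leaving the sink empty.

```python
import sqlite3


def truncate_and_load(conn: sqlite3.Connection, source_rows) -> None:
    """Full extract: empty the sink table, then reload every row from the source."""
    with conn:  # commit on success, roll back (restoring the old rows) on error
        conn.execute("DELETE FROM sink_table")  # SQLite has no TRUNCATE TABLE
        conn.executemany("INSERT INTO sink_table (id, name) VALUES (?, ?)", source_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sink_table (id INTEGER PRIMARY KEY, name TEXT)")
truncate_and_load(conn, [(1, "a"), (2, "b")])
truncate_and_load(conn, [(1, "a"), (3, "c")])  # row 2 is gone from the source
rows = conn.execute("SELECT id, name FROM sink_table ORDER BY id").fetchall()
```

Because every run rewrites the whole table, rows deleted at the source disappear from the sink automatically, which is also why no history survives.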

Full Extract: Merge Load. Specific use cases: all data needed, but replication is not available; medium data sets that have few changes; need to minimize churn on the staging tables. Adds complexity, doesn't solve the incremental extract from source.

Full Extract: Merge Load (diagram): Source Table (Source) → Table Type → Stored Procedure → Sink Table (Sink)
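
On the slide, rows reach a stored procedure through a table type, and the procedure merges them into the sink. The sketch below imitates that shape with SQLite: a temporary staging table plays the table type, and SQLite's upsert (INSERT ... ON CONFLICT DO UPDATE, available since SQLite 3.24) plays the MERGE. The DO UPDATE ... WHERE clause skips rows whose values did not change, which is how this version limits churn. Table names are hypothetical, and handling of rows deleted at the source is left out.

```python
import sqlite3


def merge_load(conn: sqlite3.Connection, source_rows) -> None:
    """Full extract, merge load: insert new ids, update only rows whose name changed."""
    with conn:  # commit on success, roll back on error
        # The temp staging table stands in for the SQL Server table type.
        conn.execute("CREATE TEMP TABLE IF NOT EXISTS staging (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("DELETE FROM staging")
        conn.executemany("INSERT INTO staging (id, name) VALUES (?, ?)", source_rows)
        # SQLite upsert in place of MERGE; "WHERE 1" is required by SQLite's parser
        # when the upserted rows come from a SELECT.
        conn.execute(
            """
            INSERT INTO sink_table (id, name)
            SELECT id, name FROM staging WHERE 1
            ON CONFLICT (id) DO UPDATE SET name = excluded.name
            WHERE sink_table.name IS NOT excluded.name
            """
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sink_table (id INTEGER PRIMARY KEY, name TEXT)")
merge_load(conn, [(1, "a"), (2, "b")])
merge_load(conn, [(1, "a"), (2, "B"), (3, "c")])  # row 1 unchanged, row 2 updated, row 3 new
rows = conn.execute("SELECT id, name FROM sink_table ORDER BY id").fetchall()
```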

Incremental Load. Specific use cases: all data needed, including a robust history; large data sets that have many changes; need to minimize churn on the staging tables and load on source systems. Often requires changes to the source system (triggers, added columns, or engine features).

Incremental Load (diagram): Source and Sink, with Source Table, Table Type, Stored Procedure, Change Table (Change Tracking), Current Version, Control Table (High Watermark) and Sink Table
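
A sketch of the control-table (high watermark) variant, assuming an ever-increasing integer key and the hypothetical tables orders(id, amount) in both source and sink plus control_table(table_name, high_watermark) in the sink. Two in-memory SQLite databases stand in for the source system and the sink. In Azure Data Factory this is commonly built as a Lookup for the stored watermark, a Copy activity with a filtered source query, and a final activity that writes the new watermark; that mapping is an illustration, not the presenters' pipeline.

```python
import sqlite3


def incremental_load(src: sqlite3.Connection, dst: sqlite3.Connection, table: str = "orders") -> int:
    """Copy only rows whose id is above the high watermark kept in the sink's control table."""
    row = dst.execute(
        "SELECT high_watermark FROM control_table WHERE table_name = ?", (table,)
    ).fetchone()
    old_watermark = row[0] if row else 0
    # Fix the upper bound first, so rows that arrive during the copy wait for the next run.
    (new_watermark,) = src.execute(
        f"SELECT COALESCE(MAX(id), ?) FROM {table}", (old_watermark,)
    ).fetchone()
    new_rows = src.execute(
        f"SELECT id, amount FROM {table} WHERE id > ? AND id <= ?",
        (old_watermark, new_watermark),
    ).fetchall()
    with dst:  # load the rows and move the watermark in the same transaction
        dst.executemany(f"INSERT INTO {table} (id, amount) VALUES (?, ?)", new_rows)
        dst.execute(
            "INSERT INTO control_table (table_name, high_watermark) VALUES (?, ?) "
            "ON CONFLICT (table_name) DO UPDATE SET high_watermark = excluded.high_watermark",
            (table, new_watermark),
        )
    return len(new_rows)


src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
dst.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
dst.execute("CREATE TABLE control_table (table_name TEXT PRIMARY KEY, high_watermark INTEGER)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
first_run = incremental_load(src, dst)   # copies ids 1 and 2
src.execute("INSERT INTO orders VALUES (3, 30.0)")
second_run = incremental_load(src, dst)  # copies only id 3
```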

Delta Detection: Hash Comparison (Full Extract), High Watermark (Incremental Load), Change Tracking (Incremental Load). Other: column-by-column comparison, triggers, row versioning, modified dates, temporal tables.
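
Hash comparison works on a full extract: hash the non-key columns of every source row and compare them with the hashes of the sink rows that have the same key. A minimal sketch, assuming rows arrive as {key: tuple_of_values} dictionaries (a hypothetical shape); in a real warehouse the sink hash would usually be stored in a column rather than recomputed. SHA-256 over repr(tuple) is one choice among many; repr keeps 1 and "1", or None and "None", from hashing alike.

```python
import hashlib


def row_hash(values: tuple) -> str:
    """Hash a row's non-key values; repr() keeps types and column boundaries distinct."""
    return hashlib.sha256(repr(tuple(values)).encode("utf-8")).hexdigest()


def detect_changes(source: dict, sink: dict):
    """Classify keys from a full extract as inserts, updates or deletes by comparing hashes."""
    src_hashes = {key: row_hash(vals) for key, vals in source.items()}
    snk_hashes = {key: row_hash(vals) for key, vals in sink.items()}
    inserts = sorted(src_hashes.keys() - snk_hashes.keys())
    deletes = sorted(snk_hashes.keys() - src_hashes.keys())
    updates = sorted(
        key for key in src_hashes.keys() & snk_hashes.keys()
        if src_hashes[key] != snk_hashes[key]
    )
    return inserts, updates, deletes


source = {1: ("Ann", "Oslo"), 2: ("Bob", "Bergen"), 4: ("Dan", "Tromso")}
sink = {1: ("Ann", "Oslo"), 2: ("Bob", "Oslo"), 3: ("Cat", "Bodo")}
inserts, updates, deletes = detect_changes(source, sink)
```

Here key 4 is new, key 2 changed city, key 3 no longer exists at the source, and key 1 is left alone.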

Delta Detection: High Watermark. BE WARY of these approaches! Based on an ascending integer or datetime column: store the highest value in a control table, or calculate it with SELECT MAX(<Column>) FROM Table. Based on an ascending date (update or create date). Assumes data is not updated and that the dates are maintained automatically.
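
The SELECT MAX variant derives the watermark from data already loaded instead of from a control table. The sketch below uses a hypothetical source/sink pair in one SQLite database with ISO-8601 text dates (which sort correctly as strings), and it reproduces the warning above: an update that does not bump modified_date is silently missed.

```python
import sqlite3


def rows_after_watermark(conn: sqlite3.Connection):
    """Derive the watermark with SELECT MAX on the sink, then read newer source rows."""
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(modified_date), '') FROM sink"
    ).fetchone()
    return conn.execute(
        "SELECT id, name, modified_date FROM source WHERE modified_date > ? ORDER BY id",
        (watermark,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
for table in ("source", "sink"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, modified_date TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [(1, "a", "2018-09-01T10:00:00"), (2, "b", "2018-09-02T10:00:00")],
)
conn.execute("INSERT INTO sink SELECT * FROM source")  # initial full load
conn.execute("INSERT INTO source VALUES (3, 'c', '2018-09-03T10:00:00')")
conn.execute("UPDATE source SET name = 'B' WHERE id = 2")  # modified_date not bumped
changed = rows_after_watermark(conn)
```

Only row 3 comes back; the rename of row 2 never reaches the sink because its date stayed below the watermark.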

Delta Detection: Change Tracking. Lightweight solution for tracking data changes: Has a row changed? Which rows have been changed? What kind of change was it? Which columns were changed? Only tracks the latest change to a row.
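
SQL Server Change Tracking is enabled on the database and queried with CHANGETABLE(CHANGES ...), which is not shown here. The toy Python class below only imitates the two properties listed above: a version number the loader stores between runs, and one entry per row in which the newest change overwrites older ones. It is a hypothetical model, not SQL Server's exact net-change rules (for example, how an insert followed by an update is reported).

```python
from dataclasses import dataclass, field


@dataclass
class NetChangeTracker:
    """Toy model: one entry per primary key, the newest change replaces older ones."""

    current_version: int = 0
    changes: dict = field(default_factory=dict)  # key -> (version, operation)

    def record(self, key, operation: str) -> None:
        self.current_version += 1
        self.changes[key] = (self.current_version, operation)  # history is overwritten

    def changes_since(self, last_synced_version: int):
        """Rows changed after the version the loader stored on its previous run."""
        return sorted(
            (key, op)
            for key, (version, op) in self.changes.items()
            if version > last_synced_version
        )


tracker = NetChangeTracker()
tracker.record(1, "I")
tracker.record(2, "I")
last_synced = tracker.current_version  # saved (e.g. in a control table) after the first load
tracker.record(2, "U")
tracker.record(2, "U")  # second update to row 2 replaces the first
tracker.record(3, "I")
pending = tracker.changes_since(last_synced)
```

The loader sees row 2 once, with no trace of its intermediate value, which is the "only tracks the latest change" property.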

Bulk Table Transfer. Specific use cases: hundreds to thousands of tables to copy; similar loading patterns for all tables; need to minimize the amount of code in the solution. Adds complexity, requires database tables to manage state.

Bulk Table Transfer (diagram): Source and Sink, with Control Table (List), Source Table, Table Type, Stored Procedure, Sink Table and Log Table
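
The bulk table transfer pattern replaces one pipeline per table with a loop over metadata. A minimal sketch, assuming a hypothetical control_table(table_name, is_enabled) and log_table(table_name, rows_copied, duration_s) in the sink, with SQLite standing in for both ends and truncate and load as the per-table strategy. In Azure Data Factory the same shape is usually a Lookup activity feeding a ForEach activity with a parameterized copy inside. Table names cannot be bound as SQL parameters, so they are interpolated, which is only acceptable because they come from the control table rather than from user input.

```python
import sqlite3
import time


def bulk_table_transfer(src: sqlite3.Connection, dst: sqlite3.Connection) -> list:
    """Look up the enabled tables in the control table, copy each one, log each copy."""
    tables = [
        name
        for (name,) in dst.execute(
            "SELECT table_name FROM control_table WHERE is_enabled = 1 ORDER BY table_name"
        )
    ]
    for table in tables:  # the ForEach step
        started = time.monotonic()
        rows = src.execute(f"SELECT * FROM {table}").fetchall()
        with dst:  # per-table truncate and load plus its log row, committed together
            dst.execute(f"DELETE FROM {table}")
            if rows:
                placeholders = ", ".join("?" * len(rows[0]))
                dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
            dst.execute(
                "INSERT INTO log_table (table_name, rows_copied, duration_s) VALUES (?, ?, ?)",
                (table, len(rows), time.monotonic() - started),
            )
    return tables


src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    db.execute("CREATE TABLE products (id INTEGER, title TEXT)")
src.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
src.execute("INSERT INTO products VALUES (10, 'Widget')")
dst.execute("CREATE TABLE control_table (table_name TEXT, is_enabled INTEGER)")
dst.execute("CREATE TABLE log_table (table_name TEXT, rows_copied INTEGER, duration_s REAL)")
dst.executemany(
    "INSERT INTO control_table VALUES (?, ?)",
    [("customers", 1), ("products", 1), ("orders", 0)],  # orders is switched off
)
copied = bulk_table_transfer(src, dst)
```

Adding a table to the load then means adding a row to control_table, not writing a new pipeline.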

Auditing: Batches. Every ETL process should start by creating a batch. Batches are logical concepts used to tie multi-pipeline load processes together for auditing and logging. A batch is closed when a nightly process is completed (Fail or Success).
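
A minimal sketch of batch handling, assuming a hypothetical batch(batch_id, started_at, ended_at, status) table in SQLite. The batch is opened before any pipeline runs, its id is handed to every pipeline in the run, and it is closed with Fail or Success however the run ends; run_pipelines is a hypothetical placeholder for the real work.

```python
import sqlite3
from datetime import datetime, timezone


def now_utc() -> str:
    return datetime.now(timezone.utc).isoformat()


def open_batch(conn: sqlite3.Connection) -> int:
    """Create the batch row before the load starts; its id ties the pipelines together."""
    with conn:
        cur = conn.execute(
            "INSERT INTO batch (started_at, status) VALUES (?, 'Running')", (now_utc(),)
        )
    return cur.lastrowid


def close_batch(conn: sqlite3.Connection, batch_id: int, succeeded: bool) -> None:
    """Close the batch when the nightly process finishes, whether it failed or succeeded."""
    with conn:
        conn.execute(
            "UPDATE batch SET ended_at = ?, status = ? WHERE batch_id = ?",
            (now_utc(), "Success" if succeeded else "Fail", batch_id),
        )


def run_pipelines(batch_id: int) -> None:
    """Hypothetical placeholder: each pipeline would receive batch_id as a parameter."""


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batch (batch_id INTEGER PRIMARY KEY, started_at TEXT, ended_at TEXT, status TEXT)"
)
batch_id = open_batch(conn)
try:
    run_pipelines(batch_id)
except Exception:
    close_batch(conn, batch_id, succeeded=False)
    raise
else:
    close_batch(conn, batch_id, succeeded=True)
```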

Auditing: Common Columns. CreatedDate - date the row was inserted; CreatedBatchId - batch that inserted the row; ModifiedDate - date the row was updated; ModifiedBatchId - batch that updated the row; IsDeleted - indicates if the record has been removed.
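
A sketch of these audit columns in use, with SQLite as a stand-in: inserts stamp CreatedDate and CreatedBatchId, real changes stamp ModifiedDate and ModifiedBatchId, and removals become a soft delete through IsDeleted rather than a DELETE. The customer table and the function names are hypothetical; the column names are the ones listed above.

```python
import sqlite3

AUDITED_DDL = """
CREATE TABLE customer (
    customer_id     INTEGER PRIMARY KEY,
    name            TEXT,
    CreatedDate     TEXT    NOT NULL,
    CreatedBatchId  INTEGER NOT NULL,
    ModifiedDate    TEXT,
    ModifiedBatchId INTEGER,
    IsDeleted       INTEGER NOT NULL DEFAULT 0
)
"""


def upsert_customer(conn: sqlite3.Connection, customer_id: int, name: str, batch_id: int) -> None:
    """Insert stamps the Created* columns; a real change stamps the Modified* columns."""
    with conn:
        conn.execute(
            """
            INSERT INTO customer (customer_id, name, CreatedDate, CreatedBatchId)
            VALUES (?, ?, datetime('now'), ?)
            ON CONFLICT (customer_id) DO UPDATE SET
                name = excluded.name,
                ModifiedDate = datetime('now'),
                ModifiedBatchId = excluded.CreatedBatchId,  -- the batch id passed in
                IsDeleted = 0
            WHERE customer.name IS NOT excluded.name OR customer.IsDeleted = 1
            """,
            (customer_id, name, batch_id),
        )


def soft_delete_customer(conn: sqlite3.Connection, customer_id: int, batch_id: int) -> None:
    """Flag the row as removed instead of deleting it, keeping its audit trail."""
    with conn:
        conn.execute(
            "UPDATE customer SET IsDeleted = 1, ModifiedDate = datetime('now'), "
            "ModifiedBatchId = ? WHERE customer_id = ? AND IsDeleted = 0",
            (batch_id, customer_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute(AUDITED_DDL)
upsert_customer(conn, 1, "Ann", batch_id=100)
upsert_customer(conn, 1, "Anne", batch_id=101)
soft_delete_customer(conn, 1, batch_id=102)
row = conn.execute("SELECT CreatedBatchId, ModifiedBatchId, IsDeleted FROM customer").fetchone()
```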

Logging: Common Columns. Row Counts - Selected, Inserted, Modified, Ignored; ExecutionTime - Begin, End, Duration; LoadStatus - Fail, Success.
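
A minimal sketch of a log record carrying these columns, filled in by a hypothetical dictionary-based load that counts selected, inserted, modified and ignored rows. A real pipeline would write the record to a log table (for example in the finally block) instead of just returning it.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadLog:
    """One log record per load, with the row counts, timings and status listed above."""

    rows_selected: int = 0
    rows_inserted: int = 0
    rows_modified: int = 0
    rows_ignored: int = 0
    begin: float = 0.0
    end: float = 0.0
    load_status: Optional[str] = None  # "Fail" or "Success"

    @property
    def duration(self) -> float:
        return self.end - self.begin


def load_with_logging(source_rows, sink: dict) -> LoadLog:
    """Hypothetical load: insert new keys, update changed values, ignore identical rows."""
    log = LoadLog(begin=time.monotonic())
    try:
        for key, value in source_rows:
            log.rows_selected += 1
            if key not in sink:
                sink[key] = value
                log.rows_inserted += 1
            elif sink[key] != value:
                sink[key] = value
                log.rows_modified += 1
            else:
                log.rows_ignored += 1
        log.load_status = "Success"
    except Exception:
        log.load_status = "Fail"
        raise
    finally:
        log.end = time.monotonic()  # a real pipeline would persist the log row here
    return log


sink = {1: "a", 2: "b"}
log = load_with_logging([(1, "a"), (2, "B"), (3, "c")], sink)
```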

Demo: Solution Overview Jason Horner

Design Patterns: Key Takeaways. Model your metadata correctly. Make composable, single-purpose pipelines. Leverage parameters and user properties. The Lookup, ForEach, and Get Metadata activities are powerful. Edit the JSON files directly when you hit a wall.

Preview of…?

Azure Data Factory Data Flows

Azure Data Factory Data Flows: ETL / ELT; visual authoring (drag and drop); executes on Azure Databricks; no-code transforms at scale: Join, Split, Aggregate, Lookup, Filter, Sort, Derived Column.

Azure Data Factory Data Flows ETL / ELT

Demo: Azure Data Factory Data Flows Cathrine Wilhelmsen

Thank you! Jason Horner, Attunix: jason@jasonhorner.com, @jasonhorner. Cathrine Wilhelmsen, Inmeta: hi@cathrinew.net, @cathrinew.

11/22/2018 7:58 AM Please evaluate this session. Your feedback is important to us! Please evaluate this session through MyEvaluations on the mobile app or website. Download the app: https://aka.ms/ignite.mobileApp Go to the website: https://myignite.techcommunity.microsoft.com/evaluations © 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.