ETL Patterns in the Cloud with Azure Data Factory


Mark Kromer
Senior Program Manager, Microsoft Azure Data Management
@kromerbigdata

ETL Patterns in the Cloud
Important factors for success
- What is ETL? More than Extract, Transform, Load
  - Scheduling, Monitoring, Maintenance, Source Control, CI/CD, Operationalize
- Platform as a Service (ADF) vs. Infrastructure as a Service (IaaS/SSIS)
  - Self-managed vs. Provider-Managed
- ELT or ETL?
  - The difference is primarily one of highly-parsed semantics
  - However: in the cloud, the common pattern is to stage data in low-cost storage
  - It is not typically performant to process data in-flight, particularly when crossing boundaries (on-prem, vnets, data centers, regions)
- Scale is very important in Cloud ETL
  - Cloud projects assume elastic scale. ETL is not immune to this expectation.
- Flexible Schema is very important in Cloud ETL
  - Assume “Big Data tenets” aka “data chaos”: your data sources will change shape, size and volume. Often!

Cloud ETL Patterns with ADF

Easy-to-use Wizard for Copying Data at Scale
6/1/2019 6:40 AM
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Nightly ETL Data Loads
- Design code-free ETL workflows
- Copy data from on-prem, other clouds and Azure
- Stage data for transformation
- Build visual data transformations
- Schedule triggers for your pipeline execution
- Monitor processes and configure alerts
- All within ADF
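The "schedule triggers" step can be illustrated with a trigger definition. The following is a minimal sketch that builds a daily schedule-trigger document in Python. The trigger name, pipeline name, start time and the 02:00 UTC run hour are all hypothetical, and the property layout is an assumption based on the general shape of ADF schedule-trigger JSON, not something taken from the slides.

```python
import json


def nightly_trigger(trigger_name, pipeline_name, hour_utc=2):
    """Build a schedule-trigger definition that runs one pipeline once a day."""
    return {
        "name": trigger_name,  # hypothetical name
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": "2019-06-01T00:00:00Z",  # hypothetical start
                    "timeZone": "UTC",
                    "schedule": {"hours": [hour_utc], "minutes": [0]},
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,  # hypothetical pipeline
                    },
                    "parameters": {},
                }
            ],
        },
    }


trigger = nightly_trigger("NightlyLoadTrigger", "NightlyEtlPipeline")
print(json.dumps(trigger, indent=2))
```

Keeping the definition as data makes it easy to place under source control alongside the pipelines it starts.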

Build Resilient Data Flows with Schema Drift
Handling of Flexible Schemas
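To make the idea of schema drift concrete, here is a small plain-Python sketch. It is not how ADF Data Flows are implemented; it only illustrates the principle of writing transformation rules against column types rather than fixed column names, so that a column appearing mid-stream does not break the flow. The function names and sample rows are hypothetical.

```python
def trim_strings(row):
    """Rule-based transform: trim every string column, whatever its name."""
    return {
        name: value.strip() if isinstance(value, str) else value
        for name, value in row.items()
    }


def process(rows):
    """Apply the rule to each row and track the union of columns seen."""
    seen_columns = []
    transformed = []
    for row in rows:
        for name in row:
            if name not in seen_columns:
                seen_columns.append(name)
        transformed.append(trim_strings(row))
    # Fill columns that a row did not carry with None so output is rectangular.
    return seen_columns, [{c: r.get(c) for c in seen_columns} for r in transformed]


batch = [
    {"id": 1, "name": " Ann "},
    {"id": 2, "name": "Bob", "region": " EU "},  # a new column drifts in
]
columns, cleaned = process(batch)
print(columns)
print(cleaned)
```

Because the rule never names a column, the drifted `region` column is cleaned and passed through without any change to the code.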

Slowly Changing Dimension Scenario
- Common DW pattern to manage changing attributes to dimension members
- Graphically build code-free SCD ETL pattern to load your data warehouse
- Connect directly to Azure SQL DB and Azure SQL DW
- Use Lookup, Surrogate Key, Derived Column and Select transforms
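The slide does not say which SCD type is meant. As an illustration only, the sketch below assumes a Type 2 dimension and mimics the roles of the listed transforms in plain Python: a lookup on the business key, a surrogate key counter, and derived validity columns. The column names (`sk`, `customer_id`, `city`, `is_current`) are hypothetical.

```python
from datetime import date


def apply_scd2(dimension, incoming, today):
    """Expire changed members and append new versions with fresh surrogate keys."""
    next_key = max((row["sk"] for row in dimension), default=0) + 1
    # Lookup: the current row for each business key.
    current = {row["customer_id"]: row for row in dimension if row["is_current"]}
    for src in incoming:
        existing = current.get(src["customer_id"])
        if existing is not None and existing["city"] == src["city"]:
            continue  # attribute unchanged, nothing to do
        if existing is not None:
            # Close out the old version of the member.
            existing["is_current"] = False
            existing["end_date"] = today
        new_row = {
            "sk": next_key,  # surrogate key
            "customer_id": src["customer_id"],
            "city": src["city"],
            "start_date": today,  # derived columns
            "end_date": None,
            "is_current": True,
        }
        dimension.append(new_row)
        current[src["customer_id"]] = new_row
        next_key += 1
    return dimension


dim = [{"sk": 1, "customer_id": "C1", "city": "Oslo",
        "start_date": date(2019, 1, 1), "end_date": None, "is_current": True}]
updates = [{"customer_id": "C1", "city": "Bergen"},
           {"customer_id": "C2", "city": "Denver"}]
dim = apply_scd2(dim, updates, date(2019, 6, 1))
for row in dim:
    print(row)
```

A Type 1 variant would overwrite `city` in place instead of expiring the old row and appending a new one.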

Load Star Schema DW Scenario
- Classic ETL pattern is easy to build in ADF’s code-free Data Flow visual data transformation environment
- Add Aggregate transforms to produce calculations that you store in your analytical database schema
- Use Join transform to combine data from multiple data sources and data streams inside your data flow
- Land your data in your Lake folders or direct to Azure SQL DW
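For readers who think in code, the Join and Aggregate steps described above correspond roughly to the following plain-Python sketch. It is an analogy, not ADF's execution model, and the sample tables and column names are hypothetical.

```python
from collections import defaultdict

sales = [
    {"product_id": 1, "amount": 10.0},
    {"product_id": 2, "amount": 5.0},
    {"product_id": 1, "amount": 2.5},
]
products = [
    {"product_id": 1, "category": "Bikes"},
    {"product_id": 2, "category": "Helmets"},
]


def join_and_aggregate(facts, dims):
    """Inner-join facts to the dimension on product_id, then sum amount per category."""
    category_by_id = {d["product_id"]: d["category"] for d in dims}
    totals = defaultdict(float)
    for fact in facts:
        category = category_by_id.get(fact["product_id"])
        if category is None:
            continue  # inner join: drop facts with no matching dimension row
        totals[category] += fact["amount"]
    return dict(totals)


totals = join_and_aggregate(sales, products)
print(totals)
```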

Data Lake Data Science Scenario
- ADF supports building visual data transformations against your data directly in Data Lake locations (i.e. Azure Blob Store, Azure Data Lake Store)
- Built-in handling of schema drift for frequent changes in data lake file formats, columns, and data types
- Perform data exploration and data profiling across your data lake in ADF Data Flow with interactive debug data preview

Azure Data Factory Workflow Data Pipelines/Control Flow

Incremental Delta Data Copy
- Conditional execution
- If-Then, Lookup, Execute Pipeline
- Connection Managers
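An incremental (delta) copy is commonly built around a stored watermark. The sketch below shows that control flow in plain Python under stated assumptions: a numeric `modified` column acts as the change stamp, and in-memory lists stand in for the source and sink. The function and field names are hypothetical and this is not an ADF API.

```python
def incremental_copy(source_rows, sink_rows, watermark):
    """Copy rows modified after the stored watermark; return the new watermark."""
    # Lookup: highest modification stamp currently in the source.
    new_watermark = max((r["modified"] for r in source_rows), default=watermark)
    # Conditional execution: skip the copy when nothing has changed.
    if new_watermark <= watermark:
        return watermark, 0
    delta = [r for r in source_rows if watermark < r["modified"] <= new_watermark]
    sink_rows.extend(delta)
    return new_watermark, len(delta)


source = [
    {"id": 1, "modified": 100},
    {"id": 2, "modified": 205},
    {"id": 3, "modified": 310},
]
sink = []
wm, copied = incremental_copy(source, sink, watermark=200)
wm_again, copied_again = incremental_copy(source, sink, watermark=wm)
print(wm, copied, wm_again, copied_again)
```

Persisting the returned watermark only after the copy succeeds keeps a failed run from skipping rows on the next attempt.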

Built-in source control support

Operationalize – Monitor your data pipelines

Use Templates to quickly get started
- Quickly get started with building data integration solutions.
- Avoid building the same workflows repeatedly. Simply instantiate a template.
- Improve developer productivity and reduce development time for repeat processes.

ADF Integration Runtime
- Activity Dispatch/Monitor
- Data Movement
- SSIS Package Execution

Azure Data Factory Service: Command & Control
[Diagram: UX & SDK (Authoring | Monitoring/Mgmt); Azure Data Factory Service in the Azure Cloud (Scheduling | Orchestration | Monitoring); PaaS cloud-hosted Integration Runtime for Cloud Apps, Svcs & Data; installable-agent Integration Runtime for On Premises Apps & Data; Data Flow between them]

Azure Data Factory “Integration Runtime” deployed on premises for transformation and then moved to cloud
[Diagram: on-prem “IR” inside the Customer 1 firewall border, connected to ADF]

SSIS in ADF

Provision SSIS IR in ADF

Deployment via SSMS (on premises, in Azure)
Once connected, you can deploy projects/packages to SSIS PaaS from your local file system/SSIS on premises

Execution via SSMS
You can select some packages to execute on SSIS PaaS

Monitoring via SSMS (on premises, in Azure)
You can see package execution error messages

Execute SSIS Packages in ADF Pipeline

ADF Mapping Data Flows

What is ADF Mapping Data Flow?
- Data Flow is a new feature of Azure Data Factory that allows you to build data transformations in a visual user interface
- Transform Data, At Scale, in the Cloud, Zero-Code
  - Cloud-first, scale-out ELT
  - Code-free dataflow pipelines
  - Serverless scale-out transformation execution engine
- Maximum Productivity for Data Engineers
  - Does NOT require understanding of Spark / Scala / Python / Java
- Resilient Data Transformation Flows
  - Built for big data scenarios with unstructured data requirements
  - Operationalize with Data Factory scheduling, control flow and monitoring

Code-free Data Transformation At Scale
- Does not require understanding of Spark, Big Data Execution Engines, Clusters, Scala, Python …
- Focus on building business logic and data transformation
  - Data cleansing
  - Aggregation
  - Data conversions
  - Data prep
  - Data exploration

Build your logical data flows adding data transformations in a guided experience

Microsoft Azure Data Factory Continues to Extend Data Flow Library with a Rich Set of Transformations and Expression Functions

Debug mode provides row-level context and visible results in inspector pane

Debug Data Flows with Data Preview and Data Sampling

Interactive Expression Builder – Build data transform expressions, not Spark code

Deep Monitoring Introspection of Data Transformations
