Presentation is loading. Please wait.

Presentation is loading. Please wait.

ETL Patterns in the Cloud with Azure Data Factory

Similar presentations


Presentation on theme: "ETL Patterns in the Cloud with Azure Data Factory"— Presentation transcript:

1 ETL Patterns in the Cloud with Azure Data Factory
Mark Kromer Senior Program Manager Microsoft Azure Data Management @kromerbigdata

2 ETL Patterns in the Cloud Important factors for success
What is ETL? More than Extract, Transform, Load Scheduling, Monitoring, Maintenance, Source Control, CI/CD, Operationalize Platform as a Service (ADF) vs. Infrastructure as a Service (IaaS/SSIS) Self-managed vs. Provider-Managed ELT or ETL? Difference is primarily highly-parsed semantics However: In the cloud, common pattern == stage data in low-cost, inexpensive storage Not typically performant to process data in-flight Particularly crossing boundaries (on-prem, vnets, data centers, regions) Scale is very important in Cloud ETL Cloud projects assume elastic scale. ETL is not immune to this expectation. Flexible Schema is very important in Cloud ETL Assume “Big Data tenets” aka “data chaos”: Your data sources will change shape, size and volume. Often!

3 Cloud ETL Patterns with ADF

4 Easy-to-use Wizard for Copying Data at Scale
6/1/2019 6:40 AM Easy-to-use Wizard for Copying Data at Scale © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

5 Nightly ETL Data Loads Codefree
Design code-free ETL workflows Copy data from on- prem, other clouds and Azure Stage data for transformation Build visual data transformations Schedule triggers for your pipeline execution Monitor processes and configure alerts All within ADF

6 Build Resilient Data Flows with Schema Drift Handling of Flexible Schemas

7 Slowly Changing Dimension Scenario
Common DW pattern to manage changing attributes to dimension members Graphically build code-free SCD ETL pattern to load your data warehouse Connect directly to Azure SQL DB and Azure SQL DW Use Lookup, Surrogate Key, Derived Column and Select transforms

8 Load Star Schema DW Scenario
Classic ETL pattern is easy to build in ADF’s code-free Data Flow visual data transformation environment Add Aggregate transforms to produce calculations that you store in your analytical database schema Use Join transform to combine data from multiple data sources and data streams inside your data flow Land your data in your Lake folders or direct to Azure SQL DW

9 Data Lake Data Science Scenario
ADF supports building visual data transformations against your data directly in Data Lake locations (i.e. Azure Blob Store, Azure Data Lake Store) Built-in handling of schema drift for frequent changes in data lake file formats, columns, and data types Perform data exploration and data profiling across your data lake in ADF Data Flow win interactive debug data preview

10 Azure Data Factory Workflow Data Pipelines/Control Flow

11 Conditional execution
6/1/2019 6:40 AM Incremental Delta Data Copy Conditional execution If-Then, Lookup, Execute Pipeline Connection Managers © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

12 Built-in source control support
6/1/2019 6:40 AM Built-in source control support © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

13 Operationalize – Monitor your data pipelines
6/1/2019 6:40 AM Operationalize – Monitor your data pipelines © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

14 Use Templates to quickly get started
Quickly get started with building data integration solutions. Avoid building same workflows repeatedly. Simply instantiate a template. Improve developer productivity along with reducing development time for repeat processes.

15 ADF Integration Runtime Activity Dispatch/Monitor Data Movement SSIS Package Execution

16 Azure Data Factory Service
Command & Control 6/1/2019 6:40 AM Data Flow UX & SDK Authoring | Monitoring/Mgmt Azure Cloud Azure Data Factory Service Scheduling | Orchestration | Monitoring PaaS Cloud Host Integration Runtime Installable Agent Integration Runtime Cloud Apps, Svcs & Data On Premises Apps & Data © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

17 Customer 1 firewall border
Azure Data Factory “Integration Runtime” deployed on premises for transformation and then moved to cloud Customer 1 Customer 1 firewall border “IR” ADF Foo On-prem

18 SSIS in ADF

19 Provision SSIS IR in ADF
6/1/2019 6:40 AM Provision SSIS IR in ADF © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

20 6/1/2019 6:40 AM Deployment via SSMS on premises in Azure Once connected, you can deploy projects/packages to SSIS PaaS from your local file system/SSIS on premises © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

21 You can select some packages to execute on SSIS PaaS
6/1/2019 6:40 AM Execution via SSMS You can select some packages to execute on SSIS PaaS © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

22 You can see package execution error messages
6/1/2019 6:40 AM Monitoring via SSMS on premises in Azure You can see package execution error messages © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

23 Execute SSIS Packaged in ADF Pipeline

24 ADF Mapping Data Flows

25 What is ADF Mapping Data Flow?
6/1/2019 6:40 AM What is ADF Mapping Data Flow? Data Flow is a new feature of Azure Data Factory that allows you to build data transformations in a visual user interface Transform Data, At Scale, in the Cloud, Zero-Code Cloud-first, scale-out ELT Code-free dataflow pipelines Serverless scale-out transformation execution engine Maximum Productivity for Data Engineers Does NOT require understanding of Spark / Scala / Python / Java Resilient Data Transformation Flows Built for big data scenarios with unstructured data requirements Operationalize with Data Factory scheduling, control flow and monitoring © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

26 Code-free Data Transformation At Scale
Does not require understanding of Spark, Big Data Execution Engines, Clusters, Scala, Python … Focus on building business logic and data transformation Data cleansing Aggregation Data conversions Data prep Data exploration … not …

27 Build your logical data flows adding data transformations in a guided experience

28 Microsoft Azure Data Factory Continues to Extend Data Flow Library with a Rich Set of Transformations and Expression Functions

29 Debug mode provides row-level context and visible results in inspector pane

30 Debug Data Flows with Data Preview and Data Sampling

31 Interactive Expression Builder – Build data transform expressions, not Spark code

32 Deep Monitoring Introspection of Data Transformations

33 Sponsors


Download ppt "ETL Patterns in the Cloud with Azure Data Factory"

Similar presentations


Ads by Google