Visual Data Flows – Azure Data Factory v2

Slides:



Advertisements
Similar presentations
Technical BI Project Lifecycle
Advertisements

Best Practices for Data Warehousing. 2 Agenda – Best Practices for DW-BI Best Practices in Data Modeling Best Practices in ETL Best Practices in Reporting.
An Introduction to HDInsight June 27 th,
SSIS – Deep Dive Praveen Srivatsa Director, Asthrasoft Consulting Microsoft Regional Director | MVP.
Andy Roberts Data Architect
Data Warehousing The Easy Way with AWS Redshift
9/24/2017 7:27 AM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
4/19/ :02 AM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
4/18/2018 6:56 AM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
4/18/2018 3:49 PM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
5/9/2018 7:28 AM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS.
Data Platform and Analytics Foundational Training
Working With Azure Batch AI
Using a Gateway to Leverage On-Premises data in Power BI
Microsoft Ignite /11/2018 1:18 AM BRK4017
ADF & SSIS: New Capabilities for Data Integration in the Cloud
Microsoft Ignite /22/2018 3:27 PM BRK2121
Introduction to R Programming with AzureML
Extensible Platform Microsoft Dynamics 365
PowerApps and Microsoft Flow for Business Users
9/6/2018 7:14 PM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS.
IBM DATASTAGE online Training at GoLogica
Add intelligence to Dynamics AX with Cortana Intelligence suite
Exploring Azure Event Grid
A developers guide to Azure SQL Data Warehouse
Microsoft Build /20/2018 5:17 AM © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY,
Modeling and Analytics Features Coming in Analysis Services vNext
Enterprise security for big data solutions on Azure HDInsight
Analytics for Apps: Landing and Loading Data into SQL Data Warehouse
Populating a Data Warehouse
Populating a Data Warehouse
BRK2279 Real-World Data Movement and Orchestration Patterns using Azure Data Factory Jason Horner, Attunix Cathrine Wilhelmsen, Inmeta -
Matt Masson Software Development Engineer Microsoft Corporation
A developers guide to Azure SQL Data Warehouse
Accelerate Your Self-Service Data Analytics
Azure SQL DWH: Tips and Tricks for developers
Microsoft Connect /24/ :05 AM
Data Science 101 To Production
Populating a Data Warehouse
Azure Data Factory + SSIS: Migrating your ETLs to the Cloud
Orchestration and data movement with Azure Data Factory v2
Populating a Data Warehouse
Modern cloud PaaS for mobile apps, web sites, API's and business logic apps
Microsoft Virtual Academy
THR1171 Azure Data Integration: Choosing between SSIS, Azure Data Factory, and Azure Databricks Cathrine Wilhelmsen, | cathrinew.net.
Azure Data Factory + SSIS: Migrating your ETLs to the Cloud
Azure Data Factory + SSIS: Migrating your ETLs to the Cloud
Azure SQL DWH: Tips and Tricks for developers
Power BI with Analysis Services
Common Data Service Data Integrator
Azure SQL DWH: Tips and Tricks for developers
Introduction to Dataflows in Power BI
Orchestration and data movement with Azure Data Factory v2
Introducing Power BI dataflows
Power BI – Introduction to Dataflows
Understanding Azure Data Engineering Options Finding Clarity in a Vast & Changing Landscape Cameron Snapp.
Azure Data Factory + SSIS: Migrating your ETLs to the Cloud
SSIS Data Integration Data Warehouse Acceleration
ETL Patterns in the Cloud with Azure Data Factory
Mark Quirk Head of Technology Developer & Platform Group
Deep Dive Into SSIS in ADF
Data Wrangling for ETL enthusiasts
Michael French Principal Consultant 5/18/2019
Beyond orchestration with Azure Data Factory
SQL Server 2019 Bringing Apache Spark to SQL Server
Get your data flowing with Data Flows! and...umm...dataflows.
Dimension Load Patterns with Azure Data Factory Data Flows
Visual Data Flows – Azure Data Factory v2
Architecture of modern data warehouse
Presentation transcript:

Visual Data Flows – Azure Data Factory v2 Ted Malone Solution Architect – Data and AI Microsoft @tedmalone Visual Data Flows – Azure Data Factory v2

About me Microsoft Solution Architect – Data and AI, Customer Success Unit AI, Machine Learning, Big Data, Advanced Analytics, Data Warehousing, etc. Long-time SQL Server geek (1st version installed on OS/2 1.1) Previous Visual Studio Team System MVP http://aka.ms/tedmalone http://blog.sqltrainer.com

Introduction to Azure Data Factory Mapping Data Flows

What is ADF Mapping Data Flow? 11/21/2019 10:50 AM What is ADF Mapping Data Flow? Data Flow is a new feature of Azure Data Factory that allows you to build data transformations in a visual user interface Transform Data, At Scale, in the Cloud, Zero-Code Cloud-first, scale-out ELT Code-free dataflow pipelines Serverless scale-out transformation execution engine Maximum Productivity for Data Engineers Does NOT require understanding of Spark / Scala / Python / Java Resilient Data Transformation Flows Built for big data scenarios with unstructured data requirements Operationalize with Data Factory scheduling, control flow and monitoring © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Visual Data Flow Key Tenets 11/21/2019 10:50 AM Visual Data Flow Key Tenets Visual “Data Flow Builder” / “Data Mapping” Extensible through scripting and expressions Data Flow can be embedded into ISV / SaaS apps Embed UI Embed Parameterize Data Flows A graphical UI for building data transformation routines on Spark Built for resiliency and operationalized environments © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Code-free Data Transformation At Scale Does not require understanding of Spark, Big Data Execution Engines, Clusters, Scala, Python … Focus on building business logic and data transformation Data cleansing Aggregation Data conversions Data prep Data exploration … not …

ADF Data Flow Workstream Stage Data in Azure (ADLS, Blob, SQL DB/DW) Transform Data in Visual Data Flow Land Data in Azure Staging Area (ADLS, Blob, SQL DB/DW)

Demonstration(s)

ADF Pipeline Execution of a Data Flow Activity Design code-free ETL workflows Copy data from on- prem, other clouds and Azure Stage data for transformation Build visual data transformations Schedule triggers for your pipeline execution Monitor processes and configure alerts All within ADF

Simple Copy Flow ADF Data Flow is a guided construction process Begin by defining the Datasets for your Source and Sink Add Transformations to each node in your data flow Or simply copy from source to sink with no transformation Map columns and fields along the way

Slowly Changing Dimension Scenario Common DW pattern to manage changing attributes to dimension members Graphically build code-free SCD ETL pattern to load your data warehouse Connect directly to Azure SQL DB and Azure SQL DW Use Lookup, Surrogate Key, Derived Column and Select transforms

Load Star Schema DW Scenario Classic ETL pattern is easy to build in ADF’s code-free Data Flow visual data transformation environment Add Aggregate transforms to produce calculations that you store in your analytical database schema Use Join transform to combine data from multiple data sources and data streams inside your data flow Land your data in your Lake folders or direct to Azure SQL DW

Data Lake Data Science Scenario ADF supports building visual data transformations against your data directly in Data Lake locations (i.e. Azure Blob Store, Azure Data Lake Store) Built-in handling of schema drift for frequent changes in data lake file formats, columns, and data types Perform data exploration and data profiling across your data lake in ADF Data Flow win interactive debug data preview

ADF Data Flow Roadmap New Primitive Transformations Data Type Conversion Distinct Rows Fuzzy Lookups Hierarchical Transformations Higher-Order Transformations Data Cleansing Data Quality checks against external providers Flowlets (reusable components) Wrangling Data Flow For data prep and data exploration Misc Fuzzy Matching on Sink Flowlets for component reuse Expression Parameter built-in support New Primitive Transformations Data Type Conversion: Today, users convert types in the Source transformation and with expressions in a derived column. With a data type converter transformation, users will be able to visually configure type conversions in a single transformation Distinct Rows: Deduping and distinct is possible today in ADF Data Flow, but requires use of the Aggregate transformation and self-joins. A single distinct transformation will make this much easier. Fuzzy Lookups: Match rows based upon algorithmic matching patterns We are building in full support for hierarchical data transformations including JSON and CosmosDB support Higher-Order Transformations Data Cleansing: Beyond the basic transformation building blocks, we’ll build larger capabilities around data cleansing Data Quality checks against external providers: We are looking to add capabilities to lookup business entity data against 3rd party providers for data quality projects Flowlets (reusable components) – Data Flow templates Wrangling Data Flow For data prep and data exploration (Power Query online in ADF)

Resources Tutorial Videos: http://aka.ms/dataflowvideos Patterns http://kromerbigdata.com http://mssqldude.wordpress.com http://aka.ms/dataflowpatterns Documentation: https://docs.microsoft.com/en-us/azure/data- factory/concepts-data-flow-overview Expression Language: http://aka.ms/dataflowexpressions

What’s next? Kalen Delaney on Monday - https://bit.ly/2lORt7F Pinal Dave, September 27 - https://bit.ly/2lORdW6 Precon 2020, September 4 – with Brent Ozar SQL Saturday 2020, September 5