Visual Data Flows – Azure Data Factory v2

Presentation transcript:

Visual Data Flows – Azure Data Factory v2
Ted Malone
Solution Architect – Data and AI, Microsoft
@tedmalone


About me
Microsoft Solution Architect – Data and AI, Customer Success Unit
AI, Machine Learning, Big Data, Advanced Analytics, Data Warehousing, etc.
Long-time SQL Server geek (1st version installed on OS/2 1.1)
http://aka.ms/tedmalone
http://blog.sqltrainer.com

Introduction to Azure Data Factory Mapping Data Flows

What is ADF Mapping Data Flow?
Data Flow is a new feature of Azure Data Factory that allows you to build data transformations in a visual user interface
Transform Data, At Scale, in the Cloud, Zero-Code
  Cloud-first, scale-out ELT
  Code-free dataflow pipelines
  Serverless scale-out transformation execution engine
Maximum Productivity for Data Engineers
  Does NOT require understanding of Spark / Scala / Python / Java
Resilient Data Transformation Flows
  Built for big data scenarios with unstructured data requirements
  Operationalize with Data Factory scheduling, control flow and monitoring
11/28/2019 1:04 PM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Visual Data Flow Key Tenets
Visual “Data Flow Builder” / “Data Mapping”
Extensible through scripting and expressions
Data Flow can be embedded into ISV / SaaS apps
  Embed UI Embed Parameterize Data Flows
A graphical UI for building data transformation routines on Spark
Built for resiliency and operationalized environments

Code-free Data Transformation At Scale
Does not require understanding of Spark, Big Data Execution Engines, Clusters, Scala, Python …
Focus on building business logic and data transformation
  Data cleansing
  Aggregation
  Data conversions
  Data prep
  Data exploration
… not …

ADF Data Flow Workstream
Stage Data in Azure (ADLS, Blob, SQL DB/DW)
Transform Data in Visual Data Flow
Land Data in Azure Staging Area (ADLS, Blob, SQL DB/DW)

Data Flow Limited Preview Support & SLAs
Azure SLAs are N/A for preview services (private or public preview) until GA of the service.
Limited Preview Support
  Handled directly with the Azure Engineering team via adfdataflowext@microsoft.com.
  Sign up for the ADF Data Flow service: http://aka.ms/dataflowpreview
  Microsoft Azure must whitelist your subscription ID to turn on the feature for you
Public Preview Support
  Normal Azure customer service channels

Demonstration(s)

ADF Pipeline Execution of a Data Flow Activity
Design code-free ETL workflows
Copy data from on-prem, other clouds and Azure
Stage data for transformation
Build visual data transformations
Schedule triggers for your pipeline execution
Monitor processes and configure alerts
All within ADF

Simple Copy Flow
ADF Data Flow is a guided construction process
Begin by defining the Datasets for your Source and Sink
Add Transformations to each node in your data flow
Or simply copy from source to sink with no transformation
Map columns and fields along the way
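To make the "map columns and fields along the way" step concrete, here is a minimal Python sketch of a copy from source to sink with a column mapping. It only illustrates the idea; it is not what ADF generates or runs. The column names and the `copy_rows` helper are hypothetical.

```python
# Hypothetical column mapping: source field name -> sink field name.
COLUMN_MAP = {"cust_id": "CustomerID", "cust_name": "CustomerName"}


def copy_rows(source_rows, column_map):
    """Copy rows to the sink shape, renaming mapped columns and dropping unmapped ones."""
    return [
        {sink_col: row[src_col] for src_col, sink_col in column_map.items()}
        for row in source_rows
    ]


# Illustrative source data (not from the presentation).
source = [{"cust_id": 1, "cust_name": "Ada", "internal_flag": "x"}]
sink = copy_rows(source, COLUMN_MAP)
print(sink)
```

With an identity mapping this reduces to the plain copy with no transformation.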

Slowly Changing Dimension Scenario
Common DW pattern to manage changing attributes to dimension members
Graphically build code-free SCD ETL pattern to load your data warehouse
Connect directly to Azure SQL DB and Azure SQL DW
Use Lookup, Surrogate Key, Derived Column and Select transforms
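The slide does not say which SCD type the demo builds. As one possible reading, the sketch below shows the logic of a Type 2 load in plain Python, with comments marking roughly where the Lookup, Surrogate Key and Derived Column steps would sit. The function name, column names and sample data are all assumptions made for illustration, and the code is not an ADF artifact.

```python
from datetime import date


def apply_scd2(dimension, incoming, today):
    """Apply a Type 2 slowly changing dimension load to `dimension` in place.

    dimension rows: sk (surrogate key), customer_id (business key), city,
                    valid_from, valid_to, is_current
    incoming rows:  customer_id, city
    """
    # Surrogate Key step: continue numbering after the highest existing key.
    next_sk = max((row["sk"] for row in dimension), default=0) + 1
    # Lookup step: index the current dimension rows by business key.
    current = {row["customer_id"]: row for row in dimension if row["is_current"]}

    for src in incoming:
        match = current.get(src["customer_id"])
        if match is not None and match["city"] == src["city"]:
            continue  # unchanged member, nothing to load
        if match is not None:
            # Changed member: close out the old version.
            match["valid_to"] = today
            match["is_current"] = False
        # Derived Column step: add housekeeping columns to the new version.
        new_row = {
            "sk": next_sk,
            "customer_id": src["customer_id"],
            "city": src["city"],
            "valid_from": today,
            "valid_to": None,
            "is_current": True,
        }
        dimension.append(new_row)
        current[src["customer_id"]] = new_row
        next_sk += 1
    return dimension


# Illustrative data: C1 moves to a new city, C2 is a new member.
dim = [{"sk": 1, "customer_id": "C1", "city": "Oslo",
        "valid_from": date(2019, 1, 1), "valid_to": None, "is_current": True}]
feed = [{"customer_id": "C1", "city": "Bergen"},
        {"customer_id": "C2", "city": "Denver"}]
apply_scd2(dim, feed, date(2019, 11, 28))
```

A Type 1 variant would overwrite `city` on the matched row instead of expiring it and appending a new version.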

Load Star Schema DW Scenario
Classic ETL pattern is easy to build in ADF’s code-free Data Flow visual data transformation environment
Add Aggregate transforms to produce calculations that you store in your analytical database schema
Use Join transform to combine data from multiple data sources and data streams inside your data flow
Land your data in your Lake folders or direct to Azure SQL DW
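As a rough picture of what a Join followed by an Aggregate does to the data, the following Python sketch joins sales rows to a product lookup and sums amounts per category. The table shapes, field names and the `join_and_aggregate` helper are hypothetical; in ADF these steps are configured visually rather than coded.

```python
from collections import defaultdict


def join_and_aggregate(sales, products):
    """Inner-join sales to products on product_id, then sum amount per category."""
    # Join step: index the product stream by its key.
    by_id = {p["product_id"]: p for p in products}
    totals = defaultdict(float)
    for sale in sales:
        product = by_id.get(sale["product_id"])
        if product is None:
            continue  # an inner join drops sales with no matching product
        # Aggregate step: group by category and sum the amount.
        totals[product["category"]] += sale["amount"]
    return dict(totals)


# Illustrative data (not from the presentation).
products = [{"product_id": 1, "category": "Bikes"},
            {"product_id": 2, "category": "Helmets"}]
sales = [{"product_id": 1, "amount": 10.0},
         {"product_id": 1, "amount": 5.0},
         {"product_id": 2, "amount": 2.5},
         {"product_id": 9, "amount": 99.0}]
category_totals = join_and_aggregate(sales, products)
print(category_totals)
```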

Data Lake Data Science Scenario
ADF supports building visual data transformations against your data directly in Data Lake locations (i.e. Azure Blob Store, Azure Data Lake Store)
Built-in handling of schema drift for frequent changes in data lake file formats, columns, and data types
Perform data exploration and data profiling across your data lake in ADF Data Flow with interactive debug data preview
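Schema drift here means that incoming files may gain or lose columns over time. The Python sketch below shows one simple way to tolerate that: take the union of the columns seen and fill the gaps with `None`. It illustrates the concept only and makes no claim about how ADF handles drift internally; the `conform` helper and the sample rows are hypothetical.

```python
def conform(rows):
    """Return (columns, rows) where every row has every column seen so far.

    Columns keep their first-seen order; values missing from a row become None.
    """
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    conformed = [{name: row.get(name) for name in columns} for row in rows]
    return columns, conformed


# Illustrative data: the second record arrives with an extra "email" column.
drifting = [{"id": 1, "name": "a"},
            {"id": 2, "name": "b", "email": "b@example.com"}]
columns, conformed = conform(drifting)
print(columns)
print(conformed)
```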

ADF Data Flow Roadmap
New Primitive Transformations
  Data Type Conversion: Today, users convert types in the Source transformation and with expressions in a Derived Column. With a data type converter transformation, users will be able to visually configure type conversions in a single transformation.
  Distinct Rows: Deduplication and distinct are possible today in ADF Data Flow, but require use of the Aggregate transformation and self-joins. A single distinct transformation will make this much easier.
  Fuzzy Lookups: Match rows based upon algorithmic matching patterns.
  Hierarchical Transformations: We are building in full support for hierarchical data transformations, including JSON and CosmosDB support.
Higher-Order Transformations
  Data Cleansing: Beyond the basic transformation building blocks, we’ll build larger capabilities around data cleansing.
  Data Quality checks against external providers: We are looking to add capabilities to look up business entity data against 3rd party providers for data quality projects.
  Flowlets (reusable components): Data Flow templates
Wrangling Data Flow
  For data prep and data exploration (Power Query online in ADF)
Misc
  Fuzzy Matching on Sink
  Flowlets for component reuse
  Expression Parameter built-in support
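The roadmap notes that distinct rows currently have to be produced through an Aggregate. The idea behind that workaround can be sketched in Python as a group-by on every column that keeps the first row of each group. This shows the concept only, not the ADF expression syntax; the `distinct_rows` helper and the sample rows are hypothetical, and the sketch assumes all column values are hashable.

```python
def distinct_rows(rows):
    """Deduplicate rows by grouping on all columns and keeping the first of each group."""
    groups = {}
    for row in rows:
        # Group key: every (column, value) pair, ordered by column name.
        key = tuple(sorted(row.items()))
        groups.setdefault(key, row)
    # Dicts preserve insertion order, so first-seen order is kept.
    return list(groups.values())


# Illustrative data with one exact duplicate.
raw = [{"id": 1, "city": "Oslo"},
       {"id": 1, "city": "Oslo"},
       {"id": 2, "city": "Oslo"}]
deduped = distinct_rows(raw)
print(deduped)
```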

Resources
Tutorial Videos: http://aka.ms/dataflowvideos
Patterns:
  http://kromerbigdata.com
  http://mssqldude.wordpress.com
  http://aka.ms/dataflowpatterns
Documentation: https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview
Expression Language: http://aka.ms/dataflowexpressions