1
Azure Data Factory v2: What’s new?
2
SQLSat Kyiv Team Denis Reznik Yevhen Nedashkivskyi Oksana Borysenko
Eugene Polonichko Denis Reznik Mykola Pobyivovk Oksana Tkach
3
Sponsor Sessions Starts at 13:00
Don’t miss them, they might be providing some interesting and valuable information!
Congress Hall: DevArt
Conference Hall: Simplement
Room AC: DB Best
Predslava1: Intapp
NULL means no session in that room at that time
4
Sponsors
5
Session will begin very soon :)
Please complete the evaluation form from your pocket after the session. Your feedback will help us to improve future conferences and speakers will appreciate your feedback! Enjoy the conference!
6
About me
Eugene Polonychko, Chapter Pass SQL Server User Group
Over 6 years of experience in software development, mostly focused on data. Have designed and implemented data warehouses using custom coding as well as ETL tools. Experience developing front-end applications, BI reporting and database administration. Have worked with MS SQL, MySQL and other databases.
Social network:
Hello. Thank you for joining me today. My name is Eugene Polonychko. I’m here today to talk to you about Azure Data Factory. But before I begin my presentation, I want to tell you about myself. I’m from Ukraine. I’m a DWH/BI architect and MCP. I have more than 6 years of experience in software development, mostly focused on BI. During my career I’ve designed and implemented BI solutions using the Microsoft BI stack, Oracle and other technologies. You can connect with me on social media; the links to my Twitter and LinkedIn accounts are on the slide. I’ve always been interested in ETL. That’s why I’m going to talk about cloud ETL.
7
What are we going to talk about?
What is Azure Data Factory?
Concepts: Dataset, Pipeline, Linked Services
New in Data Factory v2: Trigger, Control flow, SSIS
I’ve divided my talk into 4 sections. First, I’ll tell you what ADF is. Second, I’ll explain some important concepts of ADF, including datasets, pipelines and linked services. Third, we’ll look at the difference between SSIS and ADF. Finally, I’ll describe monitoring for this technology. I’ll conclude with a Q&A session, and I’ll be glad to answer your questions at the end of the talk.
8
What is Azure Data Factory?
Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data.
Okay, first, I’m going to give you an idea of what Azure Data Factory is. It’s a cloud data integration service. Why did Microsoft create it? Because we needed a tool that helps move data from one cloud data source to another. So Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data. Azure Data Factory itself does not store any data. It lets you create data-driven flows to orchestrate the movement of data between supported data stores and the processing of data using compute services in other regions or in an on-premises environment.
9
What is Azure Data Factory?
Look at this diagram. Using Azure Data Factory, you can create and schedule data-driven workflows that can ingest data from disparate data stores, process and transform the data by using compute services such as Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Azure Machine Learning, and publish output data to data stores such as Azure SQL Data Warehouse for business intelligence (BI) applications to consume. Now that you have an idea of what ADF is, we can move on to the main concepts.
10
Concepts
Dataset: a data source.
Activity: activities define the actions to perform on your data. Each activity takes zero or more datasets as inputs and produces one or more datasets as output.
Pipeline: a grouping of logically related activities. It is used to group activities into a unit that performs a task.
Linked services: a computing environment.
We have four main concepts. First, a dataset is a data source. Second, an activity is an action to perform on your data. Third, a pipeline is a grouping of logically related activities. And finally, linked services are a computing environment or external resource, for example Hive, Machine Learning or a stored procedure. Let’s look at these concepts in more detail.
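The relationships between these four concepts can be sketched as a small data model. This is only an illustration: the class names, fields and sample values below are hypothetical and are not part of any Data Factory SDK, and the sketch assumes that a linked service can describe either a data store connection or a compute environment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkedService:
    """Connection to a data store or compute environment (hypothetical model)."""
    name: str
    kind: str  # illustrative label, e.g. "AzureBlobStorage"


@dataclass
class Dataset:
    """Named view of data that lives behind a linked service."""
    name: str
    linked_service: LinkedService


@dataclass
class Activity:
    """One action on data: zero or more inputs, one or more outputs."""
    name: str
    inputs: List[Dataset] = field(default_factory=list)
    outputs: List[Dataset] = field(default_factory=list)


@dataclass
class Pipeline:
    """Logical grouping of activities that together perform a task."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def datasets(self) -> List[str]:
        """Names of every dataset the pipeline touches, in first-seen order."""
        seen: List[str] = []
        for activity in self.activities:
            for ds in activity.inputs + activity.outputs:
                if ds.name not in seen:
                    seen.append(ds.name)
        return seen


# Sample objects (all names are made up for the example).
blob = LinkedService("BlobStore", "AzureBlobStorage")
sql = LinkedService("Warehouse", "AzureSqlDatabase")
raw = Dataset("RawSales", blob)
clean = Dataset("CleanSales", sql)
copy_sales = Activity("CopySales", inputs=[raw], outputs=[clean])
pipeline = Pipeline("LoadSales", activities=[copy_sales])
print(pipeline.datasets())
```

The pipeline owns the activities, each activity points at its input and output datasets, and each dataset points at the linked service that holds the connection details.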
11
Expressions & Parameters
String functions: concat, substring, replace, indexof etc.
Collection functions: length, union, first, last etc.
Logic functions: equals, less than, greater than, and, or, not etc.
Conversion functions: coalesce, xpath, array, int, string, json etc.
Math functions: add, sub, div, mod, min, max etc.
Date functions: utcnow, addminutes, addhours, format etc.
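To make the function families above more concrete, here is a rough Python sketch of what a few of them do. These are stand-ins written for illustration, not the Data Factory implementations: the real functions are evaluated inside pipeline expressions and operate on expression values (for example, timestamp strings), whereas this sketch assumes plain Python strings and `datetime` objects. The sample date and folder name are arbitrary.

```python
from datetime import datetime, timedelta, timezone


def concat(*parts: str) -> str:
    """Join all arguments into one string."""
    return "".join(parts)


def substring(text: str, start: int, length: int) -> str:
    """Return `length` characters of `text`, starting at index `start`."""
    return text[start:start + length]


def coalesce(*values):
    """Return the first argument that is not None (None if all are)."""
    for value in values:
        if value is not None:
            return value
    return None


def addminutes(timestamp: datetime, minutes: int) -> datetime:
    """Shift a timestamp forward by a number of minutes."""
    return timestamp + timedelta(minutes=minutes)


# Build a date-partitioned folder name from a timestamp string.
start = datetime(2018, 5, 19, 13, 0, tzinfo=timezone.utc)
folder = concat("sales/", substring("2018-05-19T13:00:00Z", 0, 10))
print(folder)
print(coalesce(None, "fallback"))
print(addminutes(start, 90).isoformat())
```

A typical use of such functions in a pipeline is exactly this kind of composition: deriving a path or a time boundary from a parameter or system variable.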
12
System variables
Pipeline scope
Variable Name
@pipeline().DataFactory
@pipeline().Pipeline
@pipeline().RunId
@pipeline().TriggerType
@pipeline().TriggerId
@pipeline().TriggerName
@pipeline().TriggerTime
Trigger scope
Variable Name
@trigger().scheduledTime
@trigger().startTime
13
Development
.NET: Create ADF objects and deploy to ADFv2 using .NET
PowerShell: Create ADF objects and deploy to ADFv2
Edit & PowerShell: Create ADF objects by copy and paste, and deploy JSON artefacts using PowerShell
14
Trigger
15
Types of triggers
Manual execution
Schedule trigger: A trigger that invokes a pipeline on a wall-clock schedule.
Tumbling window trigger: A trigger that operates on a periodic interval, while also retaining state.
16
Schedule trigger
A schedule trigger runs pipelines on a wall-clock schedule. This trigger supports periodic and advanced calendar options. For example, the trigger supports intervals like "weekly" or "Monday at 5:00 PM and Thursday at 9:00 PM."
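As a rough illustration of the "Monday at 5:00 PM and Thursday at 9:00 PM" example, the sketch below computes the next few wall-clock run times for such a calendar. It is not how Data Factory evaluates triggers internally; the `SCHEDULE` table and the `next_runs` helper are hypothetical, time zones are ignored, and the starting date is arbitrary.

```python
from datetime import datetime, timedelta
from typing import List

# (weekday, hour, minute); Monday is 0 in Python's datetime.weekday().
SCHEDULE = [(0, 17, 0), (3, 21, 0)]  # Monday 5:00 PM, Thursday 9:00 PM


def next_runs(after: datetime, count: int) -> List[datetime]:
    """Return the next `count` scheduled times strictly later than `after`."""
    runs: List[datetime] = []
    day = after.replace(hour=0, minute=0, second=0, microsecond=0)
    while len(runs) < count:
        # Sorting keeps several entries on the same day in time order.
        for weekday, hour, minute in sorted(SCHEDULE):
            if day.weekday() == weekday:
                candidate = day.replace(hour=hour, minute=minute)
                if candidate > after:
                    runs.append(candidate)
        day += timedelta(days=1)
    return runs[:count]


# 19 May 2018 was a Saturday, so the first run falls on Monday 21 May.
for run in next_runs(datetime(2018, 5, 19, 12, 0), 3):
    print(run.strftime("%A %Y-%m-%d %H:%M"))
```

The run times depend only on the calendar, which is what "wall-clock schedule" means here: each run is independent of the previous ones.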
17
Tumbling window trigger
Tumbling window triggers are a type of trigger that fires at a periodic time interval from a specified start time, while retaining state. Tumbling windows are a series of fixed-sized, non-overlapping, and contiguous time intervals.
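The phrase "fixed-sized, non-overlapping, and contiguous" can be shown with a short sketch that generates the windows themselves. The `tumbling_windows` helper is hypothetical and the dates are arbitrary; it only illustrates how the intervals are laid out, not how the service schedules or tracks them.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]


def tumbling_windows(start: datetime, size: timedelta, until: datetime) -> List[Window]:
    """Return every complete [window_start, window_end) interval up to `until`."""
    windows: List[Window] = []
    window_start = start
    while window_start + size <= until:
        windows.append((window_start, window_start + size))
        window_start += size  # next window begins exactly where this one ends
    return windows


windows = tumbling_windows(
    start=datetime(2018, 1, 1, 0, 0),
    size=timedelta(hours=1),
    until=datetime(2018, 1, 1, 3, 30),
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Because every window has a fixed start and end, a run can be tied to one specific slice of time, which is what makes it possible to retain state per window.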
18
Control flow
19
Control flow
Filter activity
ForEach activity
Execute Pipeline
Get metadata
If Condition activity
Web activity
Lookup activity
Wait activity
Until activity
20
Branching
On success
On failure
On completion
On skip
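One way to read the four branching options is as a rule that decides whether a downstream activity runs, given the outcome of the activity it depends on. The sketch below is a simplified assumption, not the service's actual evaluation logic: the `should_run` helper and the lowercase status strings are hypothetical, and "on completion" is assumed to mean that the upstream activity finished, whether it succeeded or failed.

```python
def should_run(condition: str, upstream_status: str) -> bool:
    """Decide whether a downstream activity runs, given the upstream outcome.

    condition: "success", "failure", "completion" or "skip"
    upstream_status: "succeeded", "failed" or "skipped"
    """
    if condition == "completion":
        # Assumed meaning: the upstream activity ran to an end, in either state.
        return upstream_status in ("succeeded", "failed")
    required = {"success": "succeeded", "failure": "failed", "skip": "skipped"}
    return required[condition] == upstream_status


# A copy step followed by an error-handling branch and a cleanup branch.
copy_status = "failed"
print("notify on failure:", should_run("failure", copy_status))
print("load on success:", should_run("success", copy_status))
print("cleanup on completion:", should_run("completion", copy_status))
```

With these rules a pipeline can attach an error-handling activity to the failure branch and a cleanup activity to the completion branch of the same upstream step.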
21
DEMO
22
SSIS
23
Integration Runtime
Azure
Self-hosted
Azure-SSIS
24
Integration Runtime
25
Azure integration runtime
An Azure integration runtime is capable of:
Running copy activity between cloud data stores
Dispatching the following transform activities in public network: HDInsight Hive activity, HDInsight Pig activity, HDInsight MapReduce activity, HDInsight Spark activity, HDInsight Streaming activity, Machine Learning Batch Execution activity, Machine Learning Update Resource activities, Stored Procedure activity, Data Lake Analytics U-SQL activity, .NET custom activity, Web activity, Lookup activity, Get Metadata activity.
26
Self-hosted integration runtime
A self-hosted integration runtime is capable of:
Running copy activity between a cloud data store and a data store in a private network
Dispatching the following transform activities against compute resources on-premises or in an Azure Virtual Network:
27
Azure-SSIS Integration Runtime
To lift and shift existing SSIS workloads, you can create an Azure-SSIS IR to natively execute SSIS packages.
28
DEMO
29
Do you have any questions?
30
Sponsors