My Data Wandered Lonely As A Cloud: Azure Data Factory. Julie Smith, SQL Server MVP, Innovative Architects
My Data Flew amongst the Clouds quickly and without errors!
About Me: Julie Smith, IA Ambassador, SQL Server MVP, Datachix.com
INNOVATIVE ARCHITECTS #WorkSomeplaceAwesome
Foundation
Intro: What is Azure Data Factory? Azure Data Factory is a cloud service that orchestrates, manages, and monitors the integration and transformation of structured and unstructured data from on-premises and cloud sources at scale.
Data Integration in the Cloud: between other cloud services and on-prem. Sources, destinations, transformations. Like DTS. Like SSIS.
Where: Portal.azure.com, New > Data + Analytics > Data Factory
Three Main Elements: Linked Services, Datasets, Pipeline Activities
Linked Services: Data Stores, Data Gateways
Data Gateway for On Prem
Data Gateways & ADF: ADF supplies the key. Install the Gateway on each on-prem resource (server, laptop, etc.).
Data Stores: Contain credentials and connection information for sources and destinations. An on-prem Data Store MUST reference a Data Gateway.
After you set up the gateway, set up your linked services. A linked service has to reference a gateway if it connects to an on-prem source or destination.
After you set up the gateway, set up your linked services. When you pick SQL Server (on prem), you HAVE to have a gateway.
Azure Data Stores Don’t Require a Gateway
Types of connections, in context:
Once you have Linked Services: Datasets (tables), Pipelines, Activities
Author and Deploy
Diagram
Datasets (Tables)
JSON: pronounced Jay-Sahn. JavaScript Object Notation.
JSON (JavaScript Object Notation) is built on two structures. First, a collection of name/value pairs, written { }. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array. Second, an ordered list of values, written [ ]. In most languages, this is realized as an array, vector, list, or sequence.
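The two structures can be seen by parsing a small document with Python's standard json module. This is only an illustrative sketch: the sample text and its field names are made up for the example and do not come from the deck.

```python
import json

# Hypothetical sample document, invented for illustration only.
text = '{"name": "Actor", "columns": ["ActorId", "FirstName", "LastName"]}'

doc = json.loads(text)

# A JSON object { } arrives as a Python dict of name/value pairs.
print(type(doc).__name__)

# A JSON array [ ] arrives as a Python list that keeps its order.
print(type(doc["columns"]).__name__)
print(doc["columns"][0])
```

Other languages map the same two shapes onto their own record and array types.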
JSON in ADF:
{
  "name": "OnPremActorSrce",
  "properties": {
    "published": false,
    "type": "SqlServerTable",
    "linkedServiceName": "NorthWindStg",
    "typeProperties": {
      "tableName": "Actor"
    },
    "availability": {
      "frequency": "Day",
      "interval": 1
    },
    "policy": {
      "externalData": {
        "retryInterval": "00:01:00",
        "retryTimeout": "00:10:00",
        "maximumRetry": 3
      }
    }
  }
}
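As a rough sketch, the dataset definition above can be loaded and inspected with plain Python before it is pasted into the portal. The JSON is the one from the slide, with the closing braces supplied so that it parses; the script only reads the document and does not talk to Azure.

```python
import json

# Dataset definition from the slide; closing braces supplied so it parses.
dataset_json = """
{
  "name": "OnPremActorSrce",
  "properties": {
    "published": false,
    "type": "SqlServerTable",
    "linkedServiceName": "NorthWindStg",
    "typeProperties": { "tableName": "Actor" },
    "availability": { "frequency": "Day", "interval": 1 },
    "policy": {
      "externalData": {
        "retryInterval": "00:01:00",
        "retryTimeout": "00:10:00",
        "maximumRetry": 3
      }
    }
  }
}
"""

dataset = json.loads(dataset_json)
props = dataset["properties"]

# Which table, through which linked service.
print(dataset["name"], "->", props["linkedServiceName"],
      props["typeProperties"]["tableName"])

# How often the dataset is expected to be available.
avail = props["availability"]
print("availability:", avail["interval"], avail["frequency"])

# Retry settings for the external (on-prem) data.
print("max retries:", props["policy"]["externalData"]["maximumRetry"])
```

A parse error here is a quick hint that a brace or comma is missing before the definition is deployed.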
JSON specific to ADF us/library/azure/dn aspx
Pipelines and Activities: Copy
Weird Things: There is one Data Factory, so your diagram gets messy. This goes against the SSIS best practice of one package per destination. Scheduling is clumsy: the pipeline and the destination have to be in sync in their availability.
One Downloadable Demo Points
Within the pipeline: Activities
Slices: Each unit of data consumed and produced by an activity run is called a data slice. Each slice has a StartTime and an EndTime, which the pipeline activity can read through ADF system variables (WindowStart and WindowEnd in this query):
"sqlReaderQuery": "$$Text.Format('select * from MyTable where timestampcolumn >= \\'{0:yyyy-MM-dd HH:mm}\\' AND timestampcolumn < \\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)"
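To make the slice idea concrete, here is a small Python sketch that imitates what the query above produces for a run of daily slices. It is not ADF code: the helper names daily_slices and render_query and the July 2015 dates are hypothetical, and the strftime pattern is assumed to be the Python counterpart of yyyy-MM-dd HH:mm.

```python
from datetime import datetime, timedelta


def daily_slices(start, end):
    """Yield (window_start, window_end) pairs, one per day, end-exclusive."""
    current = start
    while current < end:
        yield current, current + timedelta(days=1)
        current += timedelta(days=1)


def render_query(window_start, window_end):
    """Fill the slide's query with one slice's window boundaries."""
    fmt = "%Y-%m-%d %H:%M"  # assumed Python counterpart of yyyy-MM-dd HH:mm
    return (
        "select * from MyTable "
        f"where timestampcolumn >= '{window_start.strftime(fmt)}' "
        f"AND timestampcolumn < '{window_end.strftime(fmt)}'"
    )


# Hypothetical three-day range, chosen only for the example.
slices = list(daily_slices(datetime(2015, 7, 1), datetime(2015, 7, 4)))
queries = [render_query(s, e) for s, e in slices]

for q in queries:
    print(q)
```

Because each window is closed at the start and open at the end, a row with a given timestamp falls into exactly one slice.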
Scripting Reference: as of July 16th, BEWARE
Visual Studio Extension: Azure SDK 2.7 and above for Visual Studio 2013. You get templates. You can reverse engineer. You can connect to your factory and deploy from VS. Came out July 22, 2015.
Customer Case Studies: factory-customer-case-studies/
WHY
Data Management Gateway Configuration Manager. Instructions on use: https://azure.microsoft.com/en-us/documentation/articles/data-factory-move-data-between-onprem-and-cloud/#using-the-data-gateway-step-by-step-walkthrough
For on-prem machines: Load the Gateway on the machine. Then go to the Azure Data Factory and create the Linked Service Gateway there. Get the key from the ADF linked service, then copy and paste it into the final step of the Gateway setup on the on-prem machine. The Gateway is for the entire server, the entire machine. Linked services will use that gateway, and each one (e.g. each SQL database) must be configured separately. Be patient: the refresh rate is slow and can make it seem like it didn’t work when it did.
Data Management Gateway Configuration Manager. Instructions on use: https://azure.microsoft.com/en-us/documentation/articles/data-factory-move-data-between-onprem-and-cloud/#using-the-data-gateway-step-by-step-walkthrough
For dev purposes, on your own machine: use Express Setup. It will take about 10 minutes, but it works. You’ll have the Data Management Gateway on your laptop, bam.
Learning Path: paths/data-factory/
Webinar: Wee Hyong Tok’s webcast
Resources: Reza Rad’s blog