INTELLIGENT DATA SOLUTIONS - WWW.PRAGMATICWORKS.COM

Intro to Data Factory
PASS Cloud Virtual Chapter, March 23, 2015
Steve Hughes, Architect
About the Presenter
Steve Hughes, Architect for Pragmatic Works
Blog: www.dataonwheels.com
Twitter: @dataonwheels
LinkedIn: linkedin.com/in/dataonwheels
Email: shughes@pragmaticworks.com
What is Data Factory?
- Cloud-based, highly scalable data movement and transformation tool
- Built on Azure for integrating all kinds of data
- Still in preview, so it is likely not yet feature complete (e.g., the Machine Learning activity was added in December 2014)
Data Factory Components
- Linked Services: SQL Server database (PaaS, IaaS, on-premises); Azure Storage (Blob, Table)
- Datasets: input/output definitions in JSON, deployed with PowerShell
- Pipelines: activities defined in JSON, deployed with PowerShell (Copy, HDInsight, Azure Machine Learning)
Current Activities Supported
- CopyActivity – copies data from a source to a sink (destination)
- HDInsightActivity – Pig, Hive, and MapReduce transformations
- MLBatchScoringActivity – can be used to score data with the ML batch scoring API
- StoredProcedureActivity – executes stored procedures in an Azure SQL Database
- Custom activities written in C#/.NET
Data for the Demo
- Movies.txt in Azure Blob Storage
- Movies table in Azure SQL Database
Building a Data Factory Pipeline
1. Create the Data Factory
2. Create Linked Services
3. Create input and output tables (datasets)
4. Create the pipeline
5. Set the active period for the pipeline
Step 1 – Create a Data Factory in Windows Azure
Step 2 – Create Linked Services
1. Click the Linked Services tile
2. Add the data stores: first Blob Storage, then the SQL Database

Three data store types are supported: Azure Storage account, Azure SQL Database, and SQL Server.
Data gateways can also be used for on-premises SQL Server sources.
Step 3 – Create Datasets/Tables
A JSON file defines each dataset:
- Structure – column name and type (String, Int, Decimal, Guid, Boolean, Date), e.g. { "name": "ThisName", "type": "String" }
- Location – Azure Table, Azure Blob, or SQL Database
- Availability – the "cadence in which a slice of the table is produced"
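As a rough illustration of how the three sections fit together (my own helper, not part of any Azure tooling), a dataset definition can be assembled programmatically; the names below mirror the demo's input dataset:

```python
import json

def dataset(name, columns, location, frequency="hour", interval=4):
    """Assemble a dataset definition from the three sections described
    above: structure (list of name/type pairs), location, availability."""
    return {
        "name": name,
        "properties": {
            "structure": [{"name": n, "type": t} for n, t in columns],
            "location": location,
            "availability": {"frequency": frequency, "interval": interval},
        },
    }

movies = dataset(
    "MoviesFromBlob",
    [("MovieTitle", "String"), ("Studio", "String"), ("YearReleased", "Int")],
    {"type": "AzureBlobLocation", "folderPath": "data-factory-files/Movies.csv"},
)
print(json.dumps(movies, indent=2))
```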
Step 3 – Input JSON

{
  "name": "MoviesFromBlob",
  "properties": {
    "structure": [
      { "name": "MovieTitle", "type": "String" },
      { "name": "Studio", "type": "String" },
      { "name": "YearReleased", "type": "Int" }
    ],
    "location": {
      "type": "AzureBlobLocation",
      "folderPath": "data-factory-files/Movies.csv",
      "format": { "type": "TextFormat", "columnDelimiter": "," },
      "linkedServiceName": "Shughes Blob Storage"
    },
    "availability": { "frequency": "hour", "interval": 4 }
  }
}

Notes:
- "MoviesFromBlob" is the dataset name
- structure defines the structure of the data in the file
- location defines the file location and format information
- availability sets the cadence to once every 4 hours
Step 3 – Output JSON

{
  "name": "MoviesToSqlDb",
  "properties": {
    "structure": [
      { "name": "MovieName", "type": "String" },
      { "name": "Studio", "type": "String" },
      { "name": "YearReleased", "type": "Int" }
    ],
    "location": {
      "type": "AzureSQLTableLocation",
      "tableName": "Movies",
      "linkedServiceName": "Media Library DB"
    },
    "availability": { "frequency": "hour", "interval": 4 }
  }
}

Notes:
- "MoviesToSqlDb" is the dataset name
- structure defines the table structure; only the fields listed are mapped
- location defines the linked service and the table name
- availability sets the cadence to once every 4 hours
Step 3 – Deploy Datasets
Deployment is done via PowerShell:

PS C:\> New-AzureDataFactoryTable -ResourceGroupName shughes-datafactory -DataFactoryName shughes-datafactory -File c:\data\JSON\MoviesFromBlob.json

PS C:\> New-AzureDataFactoryTable -ResourceGroupName shughes-datafactory -DataFactoryName shughes-datafactory -File c:\data\JSON\MoviesToSqlDb.json
Step 4 – Pipeline JSON

{
  "name": "MoviesPipeline",
  "properties": {
    "description": "Copy data from csv file in Azure storage to Azure SQL database table",
    "activities": [
      {
        "name": "CopyMoviesFromBlobToSqlDb",
        "description": "Add new movies to the Media Library",
        "type": "CopyActivity",
        "inputs": [ { "name": "MoviesFromBlob" } ],
        "outputs": [ { "name": "MoviesToSqlDb" } ],
        "transformation": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "SqlSink" }
        },
        "policy": {
          "concurrency": 1,
          "executionPriorityOrder": "NewestFirst",
          "style": "StartOfInterval",
          "retry": 0,
          "timeout": "01:00:00"
        }
      }
    ]
  }
}

Notes:
- "MoviesPipeline" is the pipeline name; "CopyMoviesFromBlobToSqlDb" is the activity name
- the activity definition specifies a type (CopyActivity) plus inputs and outputs
- the CopyActivity transformation specifies a source and a sink
- a policy is required for SqlSink – concurrency must be set or deployment fails
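The deployment-failure note above can be guarded against before running PowerShell with a small pre-deployment check (a hypothetical helper of my own, not part of the Azure tooling):

```python
def check_sql_sink_policy(pipeline):
    """Return the names of activities that write to a SqlSink but have no
    concurrency setting in their policy (these would fail to deploy)."""
    bad = []
    for act in pipeline["properties"]["activities"]:
        sink = act.get("transformation", {}).get("sink", {})
        if sink.get("type") == "SqlSink":
            if "concurrency" not in act.get("policy", {}):
                bad.append(act["name"])
    return bad

pipeline = {
    "name": "MoviesPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyMoviesFromBlobToSqlDb",
                "type": "CopyActivity",
                "transformation": {"sink": {"type": "SqlSink"}},
                "policy": {"concurrency": 1},
            }
        ]
    },
}
print(check_sql_sink_policy(pipeline))  # [] -> nothing missing
```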
Step 4 – Deploy Pipeline

New-AzureDataFactoryPipeline -ResourceGroupName shughes-datafactory -DataFactoryName shughes-datafactory -File c:\data\JSON\MoviesPipeline.json
Step 4 – Deployed Pipeline
Step 4 – Pipeline Diagram
Step 5 – Set Active Period

Set-AzureDataFactoryPipelineActivePeriod -ResourceGroupName shughes-datafactory -DataFactoryName shughes-datafactory -StartDateTime 2015-01-12 -EndDateTime 2015-01-14 -Name MoviesPipeline

This sets the duration during which data slices will be available to be processed. The frequency itself is set in the dataset parameters.
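A rough sketch of how the active period above combines with the dataset's 4-hour cadence to produce slices (my own illustration of the scheduling concept, not Data Factory code):

```python
from datetime import datetime, timedelta

def slices(start, end, hours):
    """Enumerate the [t, t+hours) windows that cover the active period."""
    out = []
    t = start
    while t < end:
        out.append((t, min(t + timedelta(hours=hours), end)))
        t += timedelta(hours=hours)
    return out

# Active period 2015-01-12 to 2015-01-14, dataset interval of 4 hours
windows = slices(datetime(2015, 1, 12), datetime(2015, 1, 14), 4)
print(len(windows))  # 48 hours / 4-hour interval -> 12 slices
```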
Exploring Blades in the Azure Portal
- Start with the diagram
- Drill to various details in the pipeline
- The latest update added full online design capability
Looking at Monitoring
- Review monitoring information in the Azure portal
Common Use Cases
- Log import for analysis
Resources
- Azure Storage Explorer (codeplex.com)
- azure.microsoft.com – Data Factory
- azure.microsoft.com – Azure PowerShell
Products: Improve the quality, productivity, and performance of your SQL Server and BI solutions.
Services: Speed development through training and rapid development services from Pragmatic Works.
Foundation: Helping those who don't have the means to get into information technology achieve their dreams.

Questions? Contact me at steve@dataonwheels.com or shughes@pragmaticworks.com
Blog: www.dataonwheels.com
Pragmatic Works: www.pragmaticworks.com