Incrementally Moving to the Cloud Using Biml
Scott Currie, Varigence
Agenda
- Azure Data Factory
  - What is Azure Data Factory?
  - Scenarios for using ADF with Biml
  - ADF in Biml
  - Azure Feature Pack for SSIS
- Cloud Data Movement Workflows (in general)
What is Azure Data Factory?
“Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data. Just like a manufacturing factory that runs equipment to take raw materials and transform them into finished goods, Data Factory orchestrates existing services that collect raw data and transform it into ready-to-use information.”
https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/
Ehhhhhhhh…
Think of Azure Data Factory (as it currently stands) as SQL Agent in the cloud.
- Data movement is pretty useful: https://azure.microsoft.com/en-us/documentation/articles/data-factory-data-movement-activities/
- Data transformation? Not so much: https://azure.microsoft.com/en-us/documentation/articles/data-factory-data-transformation-activities/
- The amount of configuration is rather heavy.
- Most of the development must be done by hand in JSON.
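To show why hand-written JSON gets heavy, and why generating it from metadata helps, here is a minimal Python sketch that emits one Copy pipeline per table. It assumes the ADF v1-style pipeline shape (activities with inputs, outputs, typeProperties and scheduler). The dataset names, the reader query, and the date window are hypothetical, and the property names should be checked against the ADF documentation for the service version in use.

```python
import json
from typing import Dict


def copy_pipeline(table: str, start: str, end: str) -> Dict:
    """Build a simplified ADF (v1-style) pipeline with one Copy activity for one table.

    Dataset names (OnPrem_*, Blob_*) are hypothetical placeholders.
    """
    # ADF object names may not contain '.', so schema-qualified names are flattened.
    safe = table.replace(".", "_")
    return {
        "name": f"Copy_{safe}_Pipeline",
        "properties": {
            "activities": [
                {
                    "name": f"Copy_{safe}",
                    "type": "Copy",
                    "inputs": [{"name": f"OnPrem_{safe}"}],
                    "outputs": [{"name": f"Blob_{safe}"}],
                    "typeProperties": {
                        "source": {"type": "SqlSource", "sqlReaderQuery": f"SELECT * FROM {table}"},
                        "sink": {"type": "BlobSink"},
                    },
                    "scheduler": {"frequency": "Day", "interval": 1},
                }
            ],
            "start": start,
            "end": end,
        },
    }


# One JSON document per table, generated from a list instead of typed by hand.
tables = ["dbo.DimDate", "dbo.DimCustomer", "dbo.FactSales"]
pipelines = {
    t: json.dumps(copy_pipeline(t, "2016-01-01T00:00:00Z", "2016-12-31T00:00:00Z"), indent=2)
    for t in tables
}

if __name__ == "__main__":
    print(pipelines["dbo.DimDate"])
```

Adding a table then means adding one entry to the list rather than copying and editing another JSON file.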
Let’s take a look at… Azure Data Factory
Scenarios for Using ADF with Biml
- Equivalent of simple staging for on-premises / cloud hybrid scenarios
- Orchestration of Azure SQL Data Warehouse workflows
- Orchestration and autogeneration of big data workflows
  - Hadoop
  - U-SQL
- Failover and surge strategies
Azure Data Factory in Biml
Let’s take a look at… Azure Data Factory in Biml
Biml Workflow
Azure Feature Pack for SSIS
Connection Managers
- Azure Storage Connection Manager
- Azure Subscription Connection Manager
Tasks
- Azure Blob Upload Task
- Azure Blob Download Task
- Azure HDInsight Hive Task
- Azure HDInsight Pig Task
- Azure HDInsight Create Cluster Task
- Azure HDInsight Delete Cluster Task
Data Flow Components
- Azure Blob Source
- Azure Blob Destination
Azure Blob Enumerator
- Foreach Azure Blob Enumerator
https://msdn.microsoft.com/en-us/library/mt146770.aspx
Cloud Data Movement Workflows (in general)
- Migrating Data Options
- Real World Migration Scenario
- Migrating Data to the Cloud
Migrating Data Options (Azure)
- Load data with Azure Data Factory
  - Move data from on-premises to Azure Storage Blob, then to SQL Data Warehouse
- Load data with PolyBase in SQL Data Warehouse
  - Load data into Azure Storage Blob using AzCopy
  - Load data into SQL Data Warehouse using PolyBase
- Load data with BCP in SQL Data Warehouse
  - Import data into SQL Data Warehouse using BCP
Loading Data
[Diagram: Blob Storage, Data Factory, SQL DWH; Amazon S3, Snowball, AWS Import/Export, Amazon Redshift, bucket with objects]
Technical metadata
Technical metadata (ETL process metadata, back room metadata, transformation metadata) is a representation of the ETL process. It stores the data mappings and transformations from source systems to the data warehouse and is mostly used by data warehouse developers, specialists, and ETL modellers. Most commercial ETL applications provide a metadata repository with an integrated metadata management system to manage the ETL process definition. The definition of technical metadata is usually more complex than that of business metadata, and it sometimes involves multiple dependencies. The technical metadata can be structured in the following way:
- Source Database (or system definition): a source system database, another data warehouse, a file system, etc.
- Target Database: the data warehouse instance.
- Source Tables: one or more tables that are input to calculate the value of the field.
- Source Columns: one or more columns that are input to calculate the value of the field.
- Target Table: the target DW table. The target table and column are always single entries in a metadata repository.
- Target Column: the target DW column.
- Transformation: the descriptive part of a metadata entry. It usually contains a lot of information, so it is important to use a common standard throughout the organisation to keep the data consistent.
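The metadata structure listed above maps naturally onto a record type, which is the kind of input a metadata-driven generator (Biml or otherwise) would iterate over. The sketch below is a hypothetical Python model of one entry; the field names and the sample values are illustrative, not taken from any particular metadata repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TechnicalMetadataEntry:
    """One technical (ETL) metadata entry; field names are illustrative only."""

    source_database: str      # source system database, another DW, file system, etc.
    target_database: str      # the data warehouse instance
    source_tables: List[str]  # one or more input tables
    source_columns: List[str] # one or more input columns
    target_table: str         # always a single target table
    target_column: str        # always a single target column
    transformation: str = ""  # descriptive part of the entry

    def __post_init__(self) -> None:
        # Sources may be many-to-one, but the target side is always single.
        if not self.source_tables or not self.source_columns:
            raise ValueError("at least one source table and one source column are required")
        if not self.target_table or not self.target_column:
            raise ValueError("target table and target column must each be a single, non-empty name")


# Hypothetical example: a fact column computed from two source tables.
entry = TechnicalMetadataEntry(
    source_database="SalesOLTP",
    target_database="SalesDW",
    source_tables=["dbo.Orders", "dbo.OrderLines"],
    source_columns=["Orders.OrderDate", "OrderLines.Quantity"],
    target_table="dbo.FactSales",
    target_column="Quantity",
    transformation="SUM(OrderLines.Quantity) grouped by order date",
)
```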
Load data with PolyBase in SQL Data Warehouse
[Diagram: Blob Storage, PolyBase, SQL DWH]
https://msdn.microsoft.com/en-us/library/mt143171.aspx
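A PolyBase load usually comes down to two statements per table: an external table over the files in Blob Storage, then a CREATE TABLE AS SELECT (CTAS) into SQL Data Warehouse. The Python sketch below generates that T-SQL from table metadata. It assumes the external data source and file format already exist (created with CREATE EXTERNAL DATA SOURCE and CREATE EXTERNAL FILE FORMAT); their names, the column list, and the ROUND_ROBIN distribution are assumptions to adjust.

```python
from typing import List, Tuple


def polybase_load_script(table: str, columns: List[Tuple[str, str]], location: str,
                         data_source: str = "AzureStorage",
                         file_format: str = "TextFileFormat") -> str:
    """Generate T-SQL: external table over Blob Storage files, then a CTAS load.

    table must be schema-qualified ("schema.name"); data_source and file_format
    are hypothetical names of objects assumed to exist already.
    """
    schema, name = table.split(".")
    col_defs = ",\n    ".join(f"[{col}] {sql_type}" for col, sql_type in columns)
    ext_table = f"[{schema}].[{name}_External]"
    return (
        # External table: PolyBase reads the delimited files at `location`.
        f"CREATE EXTERNAL TABLE {ext_table} (\n    {col_defs}\n)\n"
        f"WITH (LOCATION = '{location}', DATA_SOURCE = {data_source}, FILE_FORMAT = {file_format});\n\n"
        # CTAS: materialize the data as a distributed columnstore table.
        f"CREATE TABLE [{schema}].[{name}]\n"
        f"WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)\n"
        f"AS SELECT * FROM {ext_table};\n"
    )


if __name__ == "__main__":
    # Hypothetical columns for the DimDate2 table used in the bcp example.
    print(polybase_load_script("dbo.DimDate2",
                               [("DateId", "int"), ("CalendarQuarter", "tinyint")],
                               "/datedimension/"))
```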
Load data with BCP in SQL Data Warehouse
    bcp DimDate2 in C:\Temp\DimDate2.txt -S <Server Name> -d <Database Name> -U <Username> -P <password> -q -c -t ','
[Diagram: BCP, Data Migration Wizard, SQL DWH]
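The bcp import shown above handles one table; for many tables it helps to build the command from metadata. This Python sketch produces the same arguments (in, -S, -d, -U, -P, -q, -c, -t) as a list. The server, database, and credential values are hypothetical, and running it requires the bcp utility on PATH.

```python
import subprocess
from typing import List


def bcp_import_args(table: str, data_file: str, server: str, database: str,
                    user: str, password: str, delimiter: str = ",") -> List[str]:
    """Build the argument list for a bcp character-mode import.

    -q: quoted identifiers, -c: character data, -t: field terminator.
    """
    return ["bcp", table, "in", data_file,
            "-S", server, "-d", database,
            "-U", user, "-P", password,
            "-q", "-c", "-t", delimiter]


def run_bcp(args: List[str]) -> None:
    # Passing a list (no shell) means the delimiter needs no shell quoting.
    # Raises CalledProcessError if bcp exits with a non-zero code.
    subprocess.run(args, check=True)


if __name__ == "__main__":
    # Hypothetical table list and connection values; prints the commands only.
    for t in ["DimDate2", "DimCustomer"]:
        print(bcp_import_args(t, rf"C:\Temp\{t}.txt", "myserver", "mydb", "loader", "secret"))
```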
Show me the code already!!!
Aren’t you the guy that live-codes in presentations?
I tried that, but it was slow
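The Need for Speed
10 terabytes of data take about 9 to 10 days to transfer over a dedicated 100 Mbps connection even at full line rate (10 × 10^12 bytes × 8 bits ÷ 100 × 10^6 bits/s ≈ 800,000 s ≈ 9.3 days). In practice, protocol overhead and less-than-full utilization push it past 10 days.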
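The arithmetic behind the 10 TB figure fits in a small calculator. It shows that the answer depends on whether "TB" means 10^12 or 2^40 bytes and on how much of the link is actually used; the 80% utilization case is an illustrative assumption, not a measured value.

```python
def transfer_days(num_bytes: float, bits_per_second: float, utilization: float = 1.0) -> float:
    """Days needed to move num_bytes over a link at a sustained utilization in (0, 1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    seconds = num_bytes * 8 / (bits_per_second * utilization)
    return seconds / 86_400


ten_tb = 10 * 10**12   # 10 TB (decimal)
ten_tib = 10 * 2**40   # 10 TiB (binary)
link = 100 * 10**6     # 100 Mbps

print(f"{transfer_days(ten_tb, link):.2f}")       # 9.26
print(f"{transfer_days(ten_tib, link):.2f}")      # 10.18
print(f"{transfer_days(ten_tb, link, 0.8):.2f}")  # 11.57
```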
Real World Example of Moving to the Cloud
Load Data with SSIS (ETL)
[Diagram: STUFF]
Load Data with SSIS (ETL) without STUFF
[Diagram: AdoNet, Postgres, OleDb]
Load Data Pattern without STUFF
[Diagram: UTF-8, 10X, Blob Storage, PolyBase, SQL DWH]
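The pattern writes source rows as UTF-8 delimited text, lands them in Blob Storage, and lets PolyBase load them. The Python sketch below covers only the first step, spreading rows round-robin across several UTF-8 files. That PolyBase benefits from several smaller files is stated here as an assumption to tune, not a documented requirement. Uploading the files is not shown. Because csv.writer quotes fields that contain the delimiter, the external file format would need a matching STRING_DELIMITER.

```python
import csv
import os
from typing import Iterable, List, Sequence


def write_utf8_parts(rows: Iterable[Sequence[object]], out_dir: str, base_name: str,
                     num_files: int) -> List[str]:
    """Spread rows round-robin across num_files UTF-8, comma-delimited text files."""
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"{base_name}_{i:03d}.txt") for i in range(num_files)]
    handles = [open(p, "w", encoding="utf-8", newline="") for p in paths]
    try:
        writers = [csv.writer(h, delimiter=",", lineterminator="\n") for h in handles]
        for i, row in enumerate(rows):
            # Row i goes to file i mod num_files.
            writers[i % num_files].writerow(row)
    finally:
        for h in handles:
            h.close()
    return paths
```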
New Load Data Pattern without STUFF
Questions?
Load Data with Biml Pattern without STUFF
[Diagram: UTF-8, Blob Storage, PolyBase, SQL DWH]