Understanding Azure Data Engineering Options: Finding Clarity in a Vast & Changing Landscape
Cameron Snapp


About Me
csnapp@captechconsulting.com | @SnappSQL
- MCSE- and PMP-certified IT consultant with CapTech since 2006, with over 15 years of Microsoft SQL Server experience
- Computer Science degree from the University of Richmond
- Master's degree in IT Management from the University of Virginia
- Founded my own MLB data analytics company

Topics We’ll Cover
- Why Go to the (Data) Cloud?
- Azure Data Tool Highlights
- Data Migration Considerations
- Understanding Integration Runtimes
- Demoing the SSIS Lift & Shift
- Demoing Data Factory Looped Copy
- Demoing Data Flow
- What’s Next?

Why Go to the (Data) Cloud?
- Reduce operational costs: limit the burden of managing infrastructure
- Increase high availability: ability to solve failover, redundancy, and SLA challenges
- Increase scalability: ability to specify multiple cores per node (scale up) and multiple nodes per cluster (scale out)
- Lift and shift on-premises databases and SSIS to Azure
- Migrate on-premises data to a higher-performing cloud data store
- Capture weblogs or social media activity to a blob store for batch processing
- Ingest a variety of data into a lake for scientific exploration and modeling
- Introduce a new analytic platform for ad-hoc BI
Key Takeaways:
- Business needs and technical capabilities still drive the data management decision
- Hybrid solutions are (seemingly?) the norm
- Price, policies, and processes all need to be (re)evaluated

Azure Data Tool Highlights
- Storage (ABS): massively scalable object store; serves as a file system, a messaging store, and a NoSQL repository in the cloud
- Data Lake Storage (ADLS): Hadoop-compatible, scalable storage for big data analytic workloads
- SQL Database: cloud database as a service, offering most of SQL Server Enterprise and meeting high-availability and SLA needs
- SQL Data Warehouse: elastic cloud database supporting MPP, specialized for analytic queries of relational data at scale
- Cosmos DB: globally distributed, multi-model database with customizable elasticity and throughput scalability
- Data Factory: integration service to orchestrate and automate data movement and transformation, leveraging a variety of languages
- Data Lake Analytics (ADLA): on-demand analytics service for transforming and processing big data with U-SQL
- Databricks: Apache Spark-based analytics platform supporting collaborative data exploration, feature engineering, and modeling on elastic node clusters
- HDInsight: service that deploys and provisions Apache Hadoop clusters, hosted in Azure, for big data analysis, processing, and reporting
- Power BI: business intelligence service for displaying visualizations, performing ad-hoc analysis, and accessing data from a variety of sources without IT integration

Data Migration Considerations
Hive, Spark, Scala, PolyBase, Python, AML, U-SQL, T-SQL
Key Takeaways:
- Existing codebase impacts SQL Server flavor selection
- Truly understand ingest frequency, compute platform, and consumption methods
- Don’t be boxed into a single strategy

A Possible High Level Design

Migration Strategies

Understanding Integration Runtimes
- Database Transaction Units (DTUs) are a major factor
  - The destination's pricing tier throttles performance, so scale it for the load
  - Scale Azure SQL DB to 25-50 DTUs per MB of bandwidth
- SQL DW offers a Massively Parallel Processing (MPP) architecture
  - Best to use PolyBase and leverage T-SQL on the DW
  - Many "distributed" design considerations
- Loads from Hadoop/Data Lake? Consider Spark, Hive
https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime
https://docs.microsoft.com/en-us/azure/data-factory/tutorial-deploy-ssis-packages-azure
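The 25-50 DTUs per MB rule of thumb above is simple arithmetic. The sketch below turns a target load rate into a DTU band and picks the smallest Standard-tier single database that covers the low end of that band. It assumes "bandwidth" means sustained MB/s into Azure SQL DB and hard-codes the Standard-tier DTU sizes of the DTU purchasing model; the function names are hypothetical, and tier sizes should be checked against current Azure documentation.

```python
import math

# Assumption: the slide's "25-50 DTUs per MB of bandwidth" is read as per MB/s
# of sustained load into Azure SQL DB.
# Standard-tier DTU sizes for single databases (DTU purchasing model).
STANDARD_TIERS = [("S0", 10), ("S1", 20), ("S2", 50), ("S3", 100), ("S4", 200),
                  ("S6", 400), ("S7", 800), ("S9", 1600), ("S12", 3000)]

def dtu_range(mb_per_sec, low=25, high=50):
    """Return the (min, max) DTU band suggested by the 25-50 DTUs/MB rule."""
    if mb_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    return math.ceil(mb_per_sec * low), math.ceil(mb_per_sec * high)

def smallest_standard_tier(min_dtus):
    """Return the smallest Standard tier with at least min_dtus, or None if none fits."""
    for name, dtus in STANDARD_TIERS:
        if dtus >= min_dtus:
            return name, dtus
    return None

low_dtus, high_dtus = dtu_range(4)      # a copy running at about 4 MB/s
print(low_dtus, high_dtus)              # 100 200
print(smallest_standard_tier(low_dtus)) # ('S3', 100)
```

If no Standard tier fits, the function returns None, which is the point where Premium tiers or SQL DW (with PolyBase) become the more natural target.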

Engineering Scenarios: Framework Demos
SSIS Lift and Shift
- Project deployment
- Environment and IR configuration
- Data Factory activity pipeline to start the IR, poll, execute the package, pause the IR, and send email
- http://microsoft-ssis.blogspot.com/2018/04/start-and-stop-integration-runtime-in.htm
Traditional Data Warehouse Ingestion
- Basic Data Factory Copy
- Other activities
- Azure DevOps and GitHub integration
Dynamic Capabilities and Reusability
- HTTP JSON load
- Hierarchical pipelines
- Sink to SQL Database with a stored procedure and database table types
- Dynamic column mapping
Data Lake Ingestion
- "Zone" and folder structure patterns
- AdfConfig database
- Metadata-driven solution
- Error handling and restartability
Key Takeaways:
- Web UI (visual programmer), source control, JSON editor, new language syntax
- Data Factory has incredibly reusable components via parameters
- Drive "what to do" business rules in the database and limit DF's "knowledge"
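A metadata-driven looped copy keeps the "what to do" rules in a database and hands them to Data Factory as parameters. The sketch below shows the idea outside ADF: hypothetical config rows (standing in for an AdfConfig table) become one item per entity, each with a lake folder path that follows an assumed raw/source/entity/yyyy/mm/dd zone convention and a column mapping shaped like the Copy activity's TabularTranslator. The table layout, folder convention, and function names are illustrative assumptions, not the presenter's framework.

```python
import json
from datetime import date

# Hypothetical rows from an AdfConfig-style metadata table; the column names
# are illustrative, not a documented schema.
CONFIG_ROWS = [
    {"source_system": "crm", "entity": "customer",
     "columns": {"Id": "CustomerID", "Name": "CustomerName"}},
    {"source_system": "erp", "entity": "invoice",
     "columns": {"InvNo": "InvoiceNumber", "Amt": "Amount"}},
]

def zone_path(zone, source_system, entity, load_date):
    """Build a folder path such as raw/crm/customer/2019/05/18 (assumed convention)."""
    return f"{zone}/{source_system}/{entity}/{load_date:%Y/%m/%d}"

def tabular_translator(column_map):
    """Shape a source->sink column map like a Copy activity TabularTranslator."""
    return {
        "type": "TabularTranslator",
        "mappings": [
            {"source": {"name": src}, "sink": {"name": dst}}
            for src, dst in column_map.items()
        ],
    }

def build_copy_items(rows, load_date, zone="raw"):
    """Return one item per entity: the array a ForEach activity could iterate over."""
    return [
        {
            "entity": row["entity"],
            "sinkFolder": zone_path(zone, row["source_system"], row["entity"], load_date),
            "translator": tabular_translator(row["columns"]),
        }
        for row in rows
    ]

items = build_copy_items(CONFIG_ROWS, date(2019, 5, 18))
print(json.dumps(items[0], indent=2))
```

In a pipeline, a Lookup activity could return rows like these and a ForEach activity could pass each item's folder and mapping to a parameterized Copy activity. How the mapping is bound (for example, through a dynamic expression on the translator) depends on the pipeline design. Adding a new entity then means adding a config row, not editing the pipeline.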

Azure Data Flow Demo

Azure Data Flow – Quick Hits
- Puts the missing T back in ADF as an ETL tool
- Translates JSON to Scala and executes on Databricks
- Rich expression builder for aggregates and derived columns
- Surrogate Key, Exists, Extend are cool
- Can be debugged with data preview (DBX)
- Cannot be triggered natively (must run from a parent pipeline)
- Monitoring interface has data viewing capabilities
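To make the Surrogate Key and Exists transformations concrete, here is a plain-Python analogy of their semantics. Exists, in its "doesn't exist" mode, keeps incoming rows whose key is not already present in a second stream, and Surrogate Key numbers the surviving rows from a chosen start value. This is only an illustration of the behavior; in ADF these transformations are configured visually and run on Spark, and the function names here are hypothetical.

```python
# Plain-Python analogy of two Mapping Data Flow transformations. This is not
# how ADF executes them; it only illustrates what each one does to the rows.

def add_surrogate_key(rows, key_column="sk", start=1):
    """Surrogate Key: append an incrementing integer key to every row."""
    return [{**row, key_column: start + i} for i, row in enumerate(rows)]

def exists(left, right, on, negate=False):
    """Exists: keep left rows whose `on` value is (or, if negate, is not) in right."""
    right_keys = {row[on] for row in right}
    return [row for row in left if (row[on] in right_keys) != negate]

incoming = [{"customer": "A"}, {"customer": "B"}, {"customer": "C"}]
dimension = [{"customer": "B"}]  # rows already loaded

new_rows = exists(incoming, dimension, on="customer", negate=True)  # A and C
keyed = add_surrogate_key(new_rows, start=101)
print(keyed)  # [{'customer': 'A', 'sk': 101}, {'customer': 'C', 'sk': 102}]
```

Chaining the two this way mirrors a simple "insert only new members" dimension load: filter out what already exists, then assign keys to what remains.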

Questions?
What’s Next?
Thank You!
csnapp@captechconsulting.com | @SnappSQL
https://www.captechconsulting.com/services/data-and-analytics

SSIS Lift & Shift Details
- Must use the project deployment model
- Cost is based on SSIS-IR run time, not on activities called or data volumes
- Able to install third-party tools on the SSIS-IR
- Secure channel via HTTPS and TLS; an IPSec VPN or the Azure ExpressRoute option offers additional security
- Azure SQL DB/DW as targets support encryption at rest (TDE)
- SSISDB provides the same logging, reporting, and configurations as on-premises
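Because SSIS-IR cost follows run time rather than activity count or data volume, the arithmetic reduces to hours running × node count × per-node hourly rate. The sketch below compares an always-on IR with one started only for a nightly window, which is the motivation for a start, execute, pause pipeline. The rate is a placeholder, not an Azure price, and the model ignores differences in node size, edition, and licensing.

```python
# Cost sketch for SSIS-IR billing by run time.
# HYPOTHETICAL_RATE_PER_NODE_HOUR is a placeholder, not a published Azure price.
HYPOTHETICAL_RATE_PER_NODE_HOUR = 1.00

def ssis_ir_cost(hours_running, nodes=1, rate=HYPOTHETICAL_RATE_PER_NODE_HOUR):
    """Cost = hours the IR is started x node count x per-node hourly rate."""
    return hours_running * nodes * rate

# Two-node IR over a 30-day month: always on vs. started for 2 hours nightly.
always_on = ssis_ir_cost(24 * 30, nodes=2)
nightly = ssis_ir_cost(2 * 30, nodes=2)
print(always_on, nightly)  # 1440.0 120.0
```

Whatever the real rate is, the ratio between the two scenarios depends only on the hours the IR is left running, which is why pausing it after the package finishes matters more than tuning individual activities.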

Even SQL Database is Confusing…