Understanding Azure Data Engineering Options Finding Clarity in a Vast & Changing Landscape Cameron Snapp.


1 Understanding Azure Data Engineering Options Finding Clarity in a Vast & Changing Landscape
Cameron Snapp

2

3 About Me – csnapp@captechconsulting.com @SnappSQL
MCSE and PMP certified IT Consultant with CapTech since and have over 15 years of Microsoft SQL Server experience
Computer Science degree from the University of Richmond
Master's degree in IT Management from the University of Virginia
Founded my own MLB Data Analytics Company

4 Topics We’ll Cover
Why Go to the (Data) Cloud?
Azure Data Tool Highlights
Data Migration Considerations
Understanding Integration Runtimes
Demoing the SSIS Lift & Shift
Demoing Data Factory Looped Copy
Demoing Data Flow
What’s Next?

5 Why Go to the (Data) Cloud?
Reduce operational costs → limit burden of managing infrastructure
Increase high availability → ability to solve failover, redundancy, and SLA challenges
Increase scalability → ability to specify multiple cores per node (scale up) and multiple nodes per cluster (scale out)
Lift and shift on-premises databases and SSIS to Azure
Migrate on-premises data to a higher-performing cloud data store
Capture weblogs or social media activity to a blob store for batch processing
Ingest a variety of data to a lake for scientific exploration and modeling
Introduce a new analytic platform for ad-hoc BI
Key Takeaways:
Business needs & technical capabilities still drive the data management decision
Hybrid solutions are (seemingly?) the norm
Price, Policies, Processes all need to be (re)evaluated

6 Azure Data Tool Highlights
Storage (ABS) - massively scalable object store; serves as a file system, a messaging store, and a NoSQL repository in the cloud
Data Lake Storage (ADLS) - Hadoop-compatible scalable storage for big data analytic workloads
SQL Database - cloud DB as a service, offering most of SQL Server Enterprise, meeting high availability and SLA needs
SQL Data Warehouse - elastic cloud DB supporting MPP, specialized for analytic queries of relational data at scale
Cosmos DB - globally distributed, multi-model database with customizable elasticity and throughput scalability
Data Factory - integration service to orchestrate and automate data movement and transformation, leveraging a variety of languages
Data Lake Analytics (ADLA) - on-demand analytics service for transforming and processing big data with U-SQL
Databricks - Apache Spark-based analytics platform, supporting collaborative data exploration, feature engineering, and modeling on elastic node clusters
HDInsight - service that deploys/provisions Apache Hadoop clusters, hosted in Azure, for big data analysis, processing, and reporting
Power BI - business intelligence service for displaying visualizations, performing ad-hoc analysis, and accessing data from a variety of sources without IT integration

7 Data Migration Considerations
Hive, Spark, Scala, PolyBase, Python, AML, U-SQL, T-SQL
Key Takeaways:
Existing codebase impacts SQL Server flavor selection
Truly understand ingest frequency, compute platform, consumption methods
Don’t be boxed into a single strategy

8 A Possible High Level Design

9 Migration Strategies

10 Understanding Integration Runtimes
Data Transfer Units a major factor
Scaling destination pricing tier throttles performance
Scale Azure SQLDB to DTUs per MB of bandwidth
SQL DW offers Massively Parallel Processing architecture
Best to use PolyBase, leverage T-SQL on the DW
Many “distributed” design considerations
Loads from Hadoop/Data Lake? Consider Spark, Hive

11 Engineering Scenarios: Framework Demos
SSIS Lift and Shift
  Project Deploy
  Env and IR Config
  Data Factory Activity Pipeline to start IR, poll, exec package, pause IR
Traditional Data Warehouse Ingestion
  Basic Data Factory Copy
  Other Activities
  Azure DevOps, GitHub integration
Dynamic Capabilities and Reusability
  HTTP JSON Load
  Hierarchical Pipelines
  Sink to SQL Database with Stored Proc
  Database Table Types
  Dynamic Column Mapping
Data Lake Ingestion
  “Zone” and Folder Structure Patterns
  AdfConfig Database
  Metadata Driven Solution
  Error Handling, Restartability
Key Takeaways:
Web UI (visual programmer), Source Control, JSON editor, new language syntax
Data Factory has incredibly reusable components via parameters
Drive “what to do” business rules in database and limit DF’s “knowledge”
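The metadata-driven idea above (keep the "what to do" rules in a config database and let one parameterized pipeline loop over them) can be sketched in plain Python. This is only an illustration under stated assumptions, not Data Factory code: CopyTask, copy_table, run_looped_copy, and the table and folder names are all hypothetical stand-ins for config rows and a parameterized copy activity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyTask:
    """One row of a hypothetical config table describing what to copy."""
    source_table: str
    sink_folder: str
    enabled: bool = True


def copy_table(task: CopyTask) -> str:
    """Stand-in for a parameterized copy activity; returns a status message."""
    return f"copied {task.source_table} -> {task.sink_folder}"


def run_looped_copy(config_rows: list[CopyTask]) -> dict[str, str]:
    """Loop over config rows and record a status per table.

    Failures are recorded instead of raised, so a rerun could retry
    only the failed tables.
    """
    results: dict[str, str] = {}
    for task in config_rows:
        if not task.enabled:
            continue  # the rule lives in the config, not in the loop
        try:
            results[task.source_table] = copy_table(task)
        except Exception as exc:
            results[task.source_table] = f"failed: {exc}"
    return results


# Hypothetical config rows; in the pattern above these would come from a database.
config = [
    CopyTask("dbo.Customer", "raw/customer"),
    CopyTask("dbo.Orders", "raw/orders"),
    CopyTask("dbo.Legacy", "raw/legacy", enabled=False),
]
results = run_looped_copy(config)
```

Adding or disabling a table is then a change to a config row, and the loop itself stays untouched.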

12 Azure Data Flow Demo

13 Azure Data Flow – Quick Hits
Puts the missing T back in ADF as an ETL tool
Translates JSON to Scala and executes on Databricks
Rich expression builder for aggregate and derived
Surrogate Key, Exists, Extend are cool
Can be debugged w/ data preview (DBX)
Cannot be natively triggered (pipeline parent)
Monitoring interface has data viewing capabilities
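For readers who have not used these transformations, here is a rough plain-Python sketch of what a surrogate key, an exists filter, and a derived column do to rows of data. It is not Data Flow syntax and not what the service generates; the function names and the sample rows are assumptions made for illustration.

```python
def add_surrogate_key(rows, key_name="sk", start=1):
    """Prepend an incrementing, non-business key to every row."""
    return [{key_name: i, **row} for i, row in enumerate(rows, start=start)]


def exists(left, right, column):
    """Keep rows of `left` whose `column` value also appears in `right`."""
    right_values = {row[column] for row in right}
    return [row for row in left if row[column] in right_values]


def derive(rows, name, fn):
    """Add a derived column computed from each row."""
    return [{**row, name: fn(row)} for row in rows]


# Hypothetical sample data.
orders = [
    {"customer": "A", "qty": 2, "price": 5.0},
    {"customer": "B", "qty": 1, "price": 3.0},
    {"customer": "C", "qty": 4, "price": 1.5},
]
active_customers = [{"customer": "A"}, {"customer": "C"}]

kept = exists(orders, active_customers, "customer")
with_total = derive(kept, "total", lambda r: r["qty"] * r["price"])
output = add_surrogate_key(with_total)
```

Here customer B is dropped by the exists step, and the two remaining rows receive keys 1 and 2 along with a computed total.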

14 Questions? What’s Next? Thank You! csnapp@captechconsulting.com
@SnappSQL

15 SSIS Lift & Shift Details
Must leverage project deployment
Cost based on SSIS-IR run time, not activities called or data volumes
Able to install 3rd party tools here
Secure channel via HTTPS and TLS over IPSec VPN or the Azure ExpressRoute option offer additional security
Azure SQL DB/DW as targets support encryption at rest (TDE)
SSISDB provides same logging/reporting/configurations as on-prem
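Because cost follows SSIS-IR run time, the start, poll, execute, pause sequence from the framework demos is worth seeing as control flow. The sketch below simulates it in plain Python under stated assumptions: FakeIntegrationRuntime is a hypothetical stand-in (not an Azure SDK class), the package name is made up, and the hourly rate is a placeholder rather than a real price.

```python
import time


class FakeIntegrationRuntime:
    """Hypothetical stand-in for an SSIS integration runtime."""

    def __init__(self, polls_until_started: int = 2) -> None:
        self.state = "Stopped"
        self._polls_left = polls_until_started

    def start(self) -> None:
        self.state = "Starting"

    def poll(self) -> str:
        # Each poll moves the fake runtime one step closer to "Started".
        if self.state == "Starting":
            self._polls_left -= 1
            if self._polls_left <= 0:
                self.state = "Started"
        return self.state

    def stop(self) -> None:
        self.state = "Stopped"


def run_package(ir: FakeIntegrationRuntime, package: str, poll_seconds: float = 0.0) -> list[str]:
    """Start the runtime, wait until it is up, run one package, then stop it."""
    steps = ["start"]
    ir.start()
    while ir.poll() != "Started":
        steps.append("poll")
        time.sleep(poll_seconds)
    try:
        steps.append(f"exec {package}")  # a real pipeline would run the package here
    finally:
        ir.stop()  # always stop, since billing follows run time
        steps.append("stop")
    return steps


def runtime_cost(hours_running: float, hourly_rate: float) -> float:
    """Cost depends on how long the runtime is up, not on activities or data volume."""
    return hours_running * hourly_rate


ir = FakeIntegrationRuntime()
steps = run_package(ir, "LoadWarehouse.dtsx")  # hypothetical package name
cost = runtime_cost(hours_running=1.5, hourly_rate=2.0)  # placeholder rate
```

Stopping the runtime in a finally block reflects the cost point above: a failed package should not leave the runtime running.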

16 Even SQL Database is Confusing…

