About Me
csnapp@captechconsulting.com, @SnappSQL
MCSE- and PMP-certified IT consultant with CapTech since 2006, with over 14 years of Microsoft SQL Server experience
Computer Science degree from the University of Richmond
Master's degree in IT Management from the University of Virginia
Founded my own MLB data analytics company
Copyright © 2018 CapTech Ventures, Inc. All rights reserved.

Topics We’ll Cover
Explanation of ETL and ELT Strategies
  Debate the characteristics
  In Azure?
How to Implement an ELT Architecture
The Tactical Benefits:
  Superior Traceability
  Reduced Execution Times
  Extensible Design Fits
Next Steps for Database Developers

Presentation Disclaimer
I’ve chosen to focus on my SQL Server and SSIS experiences, but the concepts still apply to your database platform of choice.
ELTL is assumed to be the same as ELT. The extra L takes place but feels unnecessary.
Sources can be anything, but let’s assume they are tabular.
Targets typically support transactional systems or business intelligence, but let’s talk big data too!

Why the Debate?
Landscape changes are causing the discussion:
  Volume, Variety, Velocity, Veracity, Value
  Analytic tools are changing the game too
  Codebase complexities and delivery times need to be controlled
Key Difference: When and where the transformation step is performed
Key Truth: Data quality is always a concern
Key Takeaway: Business needs and technical capabilities still drive the data management decision
Offer data to analysts and let them link it in their BI tool
Consumers don’t care as long as it’s immediate and correct
Technology or methodology never absolves you of analysis

Background Level Set - ETL
Extract: Data from disparate data sources
Transform: Data in tool, in flight; modifications done in memory
Load: Data to destination structure
Image source: https://tekclasses.com/difference-between-etl-and-elt-process/

Conventional ETL Design
Let’s Debate?
Graphical view of data pipelines
Traditionally batch oriented, parallel and scheduled
Offers functionality not available in the RDBMS
Requires specialized developer skills
Business rules, cleansings, validations, filtering, joins, lookups all programmed into the tool
Focus on a single destination model and work backwards
Leverages a proprietary engine, potentially on a separate server
Performance depends on component configurations, order of operations, and the ratio of memory to data volume

Background Level Set - ELT
Extract: Data from disparate data sources
Load: Data in its original form to a staging area of the target server
Transform: Data with semantic layer and MERGE to destination

Conventional ELT Design
Let’s Debate?
Considered to be a more modern approach
Allows for more real-time processing
Fewer volume and structure concerns
Extract data fast, limiting source system strain
Reuse stage/processing to load multiple target structures
Consolidate the area where business and data quality rules exist
Lays a foundation for a Data Lake, populating multiple destinations
Leverage an RDBMS’s transaction engine to work with data locally
Can necessitate more drive space & CPU, but reduces other hardware needs
Maintaining a large repository is not quite so simple
Image source: https://tekclasses.com/difference-between-etl-and-elt-process/
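The key difference between the two designs, where the transformation step runs, can be shown with a small sketch. This is only an illustration under assumptions: Python's built-in sqlite3 stands in for the target RDBMS, and the table names, sample rows and the "trim and uppercase the country code" rule are all hypothetical. The same rule is applied once in memory before the load (ETL) and once in SQL after a raw load to a stage table (ELT).

```python
import sqlite3

# Hypothetical source rows: (customer_id, raw_country)
SOURCE_ROWS = [(1, " us "), (2, "De"), (3, None)]


def run_etl(conn):
    """ETL: apply the rule in the tool (here, Python memory), then load."""
    conn.execute("CREATE TABLE etl_target (customer_id INTEGER, country TEXT)")
    cleaned = [
        (cid, country.strip().upper())
        for cid, country in SOURCE_ROWS
        if country is not None
    ]
    conn.executemany("INSERT INTO etl_target VALUES (?, ?)", cleaned)
    return conn.execute(
        "SELECT customer_id, country FROM etl_target ORDER BY customer_id"
    ).fetchall()


def run_elt(conn):
    """ELT: load the data as-is into a stage table, then transform with SQL."""
    conn.execute("CREATE TABLE stage_customer (customer_id INTEGER, country TEXT)")
    conn.execute("CREATE TABLE elt_target (customer_id INTEGER, country TEXT)")
    conn.executemany("INSERT INTO stage_customer VALUES (?, ?)", SOURCE_ROWS)
    conn.execute(
        """
        INSERT INTO elt_target (customer_id, country)
        SELECT customer_id, UPPER(TRIM(country))
        FROM stage_customer
        WHERE country IS NOT NULL
        """
    )
    return conn.execute(
        "SELECT customer_id, country FROM elt_target ORDER BY customer_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
etl_result = run_etl(conn)
elt_result = run_elt(conn)
```

Both paths end with the same target rows; what changes is whether the rule lives in the tool or in a query on the target server.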

But What About Azure?
Diagram labels: Integration Runtimes, Data Factory, DF v2, PolyBase, SSIS on VM, AML, T-SQL, Hive, Spark
Database Transaction Units (DTUs) are a major factor
  The destination's pricing tier throttles performance
  Scale Azure SQLDB to 25-50 DTUs per MB of bandwidth
SQL DW offers a Massively Parallel Processing architecture
  Best to use PolyBase and leverage T-SQL on the DW
  Many “distributed” design considerations
Loads from Hadoop/Data Lake? Consider Spark, Hive
https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime
https://docs.microsoft.com/en-us/azure/data-factory/tutorial-deploy-ssis-packages-azure

My Proposed ELT Style Architecture
Stage
  Use SSIS to truncate the stage table and do a 1:1 data flow task
  Simultaneously copy all source data about the transaction that changed
Semantic Views
  SELECT statement which applies all business rules, joins, transformations from the stage area
  Output columns match the format of a single target table
T-SQL Merge
  Source is the view - Destination is the target table - Join on Natural Keys
  Can handle Type 1 or Type 2 deltas
Key Benefits:
  Faster development cycles and execution times
  Decreased cost and complexity of code maintenance
  Flexible to fit many scenarios
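The three layers above (stage, semantic view, merge) can be sketched end to end. This is a minimal illustration, not the presenter's implementation: sqlite3 stands in for SQL Server, DELETE stands in for TRUNCATE, and because SQLite has no MERGE command the merge is written as an UPDATE of matched rows followed by an INSERT of unmatched rows, joined on the natural key. All object names and sample rows are hypothetical, and only a Type 1 (overwrite) delta is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stage: 1:1 copy of the (hypothetical) source table.
    CREATE TABLE stage_customer (customer_code TEXT, first_name TEXT, last_name TEXT);
    -- Target: one dimension table keyed on the natural key.
    CREATE TABLE dim_customer (customer_code TEXT PRIMARY KEY, full_name TEXT);
    -- Semantic view: all business rules live here; output columns match the target.
    CREATE VIEW v_dim_customer AS
        SELECT customer_code,
               TRIM(first_name) || ' ' || TRIM(last_name) AS full_name
        FROM stage_customer;
    """
)


def load_stage(rows):
    """Truncate-and-reload the stage table (DELETE stands in for TRUNCATE)."""
    conn.execute("DELETE FROM stage_customer")
    conn.executemany("INSERT INTO stage_customer VALUES (?, ?, ?)", rows)


def merge_view_into_target():
    """Type 1 merge: update changed rows, insert new ones, join on natural key."""
    conn.execute(
        """
        UPDATE dim_customer
        SET full_name = (SELECT v.full_name FROM v_dim_customer v
                         WHERE v.customer_code = dim_customer.customer_code)
        WHERE EXISTS (SELECT 1 FROM v_dim_customer v
                      WHERE v.customer_code = dim_customer.customer_code
                        AND v.full_name <> dim_customer.full_name)
        """
    )
    conn.execute(
        """
        INSERT INTO dim_customer (customer_code, full_name)
        SELECT v.customer_code, v.full_name
        FROM v_dim_customer v
        WHERE NOT EXISTS (SELECT 1 FROM dim_customer d
                          WHERE d.customer_code = v.customer_code)
        """
    )
    conn.commit()


# First run loads two customers; second run changes one and adds one.
load_stage([("C1", " Ada ", "Lovelace"), ("C2", "Alan", "Turing")])
merge_view_into_target()
load_stage([("C2", "Alan M.", "Turing"), ("C3", "Grace", "Hopper")])
merge_view_into_target()
result = conn.execute(
    "SELECT customer_code, full_name FROM dim_customer ORDER BY customer_code"
).fetchall()
```

Because the view already has the shape of the target table, the merge step contains no business logic of its own, which is what makes it a candidate for generation from metadata.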

Superior Traceability
Stage is a 1:1 copy of the source
  Table names, Column names, Data types
  Process used by multiple data flows
Views contain all the code
  No longer tracing dozens of ETL tool components
  Code is more readable and reviews are faster
Repeatable design patterns
  Limit variability of implementation
  Object Oriented methodology

Reduced Execution Time
Leverage TRUNCATE on stage
Remove usage of blocking components
No waste of transaction log
None of the components below cause wait times or require proper use of caching
Connections are opened/closed quickly
Use the MERGE command to move data locally
Stages execute in parallel
Fast at determining if/what needs to change, and much faster than the OLEDB command or slowly changing dimension components
Source queries are small, fast and leverage indexes
Only stage data about data that needs to be refreshed
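The last two points, small indexed source queries and staging only what needs a refresh, can be sketched as follows. This is an assumption-laden illustration: sqlite3 stands in for both source and target, the tables, the `modified_at` column and the sample dates are hypothetical, and a simple "modified after this date" filter is just one possible way to decide what changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_orders (order_id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT);
    -- Index on the change column so the extract query stays small and fast.
    CREATE INDEX ix_source_orders_modified ON source_orders (modified_at);
    CREATE TABLE stage_orders (order_id INTEGER, amount REAL, modified_at TEXT);
    INSERT INTO source_orders VALUES
        (1, 10.0, '2018-01-01'),
        (2, 20.0, '2018-01-02'),
        (3, 30.0, '2018-01-03');
    """
)


def stage_changes(since):
    """Empty the stage table, then copy only rows modified after `since`."""
    conn.execute("DELETE FROM stage_orders")  # stands in for TRUNCATE TABLE
    conn.execute(
        """
        INSERT INTO stage_orders (order_id, amount, modified_at)
        SELECT order_id, amount, modified_at
        FROM source_orders
        WHERE modified_at > ?
        """,
        (since,),
    )
    conn.commit()
    return conn.execute(
        "SELECT order_id FROM stage_orders ORDER BY order_id"
    ).fetchall()


# Only orders changed after the last successful load are staged.
staged_ids = stage_changes("2018-01-01")
```

The ISO-formatted date strings compare correctly as text here; on SQL Server a proper date or datetime column would be the natural choice.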

Extensible Design Fits
Adaptable to the most common scenarios
  Some difference in “what to stage”
  Order of target loads matters
Code reusability
  Dynamically generate the MERGE command
  Mapping table to translate vernacular between systems
Frameworks Too!
  Use an Execution Log to track all executing processes
  Consistent parameterization and lookup of date ranges
  Re-startability
  Data Validation & Retry Errors process
Target Structure: 3rd Normal Form Transactional, Star Schema Reporting Mart
Load Frequency: One-Time Migration, Ongoing Refresh
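"Dynamically generate the MERGE command" can be sketched as a small string builder. This is one possible shape under stated assumptions, not the presenter's generator: the function name `build_merge`, its parameters and the object names are hypothetical, it emits a Type 1 T-SQL MERGE only, it expects at least one non-key column, and the `<>` change test does not detect changes to or from NULL.

```python
def quote(name):
    """Bracket-quote one T-SQL identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_merge(target, source, key_cols, value_cols):
    """Return a Type 1 T-SQL MERGE statement as a string.

    target and source are (schema, object) tuples; key_cols are the natural
    keys used for the join; value_cols are the remaining columns (non-empty).
    """
    tgt = ".".join(quote(part) for part in target)
    src = ".".join(quote(part) for part in source)
    on = " AND ".join(f"tgt.{quote(c)} = src.{quote(c)}" for c in key_cols)
    # Note: <> treats NULL comparisons as unknown, so NULL changes are missed.
    changed = " OR ".join(f"tgt.{quote(c)} <> src.{quote(c)}" for c in value_cols)
    updates = ", ".join(f"tgt.{quote(c)} = src.{quote(c)}" for c in value_cols)
    all_cols = list(key_cols) + list(value_cols)
    insert_cols = ", ".join(quote(c) for c in all_cols)
    insert_vals = ", ".join(f"src.{quote(c)}" for c in all_cols)
    return (
        f"MERGE INTO {tgt} AS tgt\n"
        f"USING {src} AS src\n"
        f"    ON {on}\n"
        f"WHEN MATCHED AND ({changed}) THEN\n"
        f"    UPDATE SET {updates}\n"
        f"WHEN NOT MATCHED BY TARGET THEN\n"
        f"    INSERT ({insert_cols})\n"
        f"    VALUES ({insert_vals});"
    )


# Hypothetical target table and semantic view.
merge_sql = build_merge(
    target=("dbo", "DimCustomer"),
    source=("dbo", "vDimCustomer"),
    key_cols=["CustomerCode"],
    value_cols=["FullName"],
)
print(merge_sql)
```

In practice the column lists would come from metadata or the mapping table rather than being typed by hand, so one generator can serve every target table.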

Next Steps for Database Developers
Consider
  Scaling back on using complex ETL tools to house ETL logic
  Hardware sizing – more drive space and virtual memory on target server
  Leverage Visual Studio’s Database Projects to manage your objects
  Data Analysts and Source to Target documentation
Implement
  Build queries that replicate existing ETL processes
  Pilot this process against a traditional approach

Questions?