ETL Design - Stage Philip Noakes May 9, 2015.

Presentation transcript:

ETL Design - Stage Philip Noakes May 9, 2015

Who am I?
- Philip Noakes
- Database Developer/Designer, CapTech Consulting
- MCITP in SQL Server BI

Agenda
- Background – ETL and Staging Data
  - ETL Architecture
  - ETL vs ELT
- Data Modeling in Stage
  - Concepts
  - Table Structure
  - Data Flow
- Auxiliary tables in the stage environment
  - Control tables
  - Logging
    - Process Execution
    - Errors
  - Notification/Reporting

ETL Architecture

ETL vs ELT
- ELT: loading raw data to the presentation layer, then performing transformations at the target
- ETL: loading transformed data into the presentation layer

ETL vs ELT
- When to use ELT
  - Traceability to untransformed source data
  - Larger volumes of data
- When to use ETL
  - All other times
“The ETL process can take a long time. If we are processing in stream, we’ll have a connection open to the source system. A long-running process can create problems with database locks and stress the transaction system.” (Reference 1)

Stage Design Built by database developers for database developers!!!

Stage Concepts
- Schemas
  - Source specific
  - Secured
- Denormalized Data
- Data Cleansing
  - Flag bad or unusable data

Table Design - Schemas
- Organization
  - Source System Identification
  - Cleansed vs Raw
- Security Administration
  - Grant access by source system

Table Design - Denormalizing
Flattening Data
- Pulling higher granularity attributes into lower granularity records.
- Pivoting lower granularity data into columns on higher granularity rows.

Table Design - Denormalizing Example: Orders Table

Table Design - Denormalizing Example: Product Categories
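
The two flattening patterns named above can be sketched in a few lines of Python. This is only an illustration, not the tables from the slides: the customer, order line, and phone rows, and the helper names `flatten_down` and `pivot_up`, are hypothetical.

```python
from collections import defaultdict

# Hypothetical source rows; names and values are illustrative only.
customers = {1: {"customer_name": "Acme", "region": "East"}}
order_lines = [
    {"order_id": 100, "customer_id": 1, "product": "Widget", "qty": 2},
    {"order_id": 100, "customer_id": 1, "product": "Gadget", "qty": 1},
]
phones = [
    {"customer_id": 1, "phone_type": "home", "number": "555-0100"},
    {"customer_id": 1, "phone_type": "work", "number": "555-0101"},
]


def flatten_down(detail_rows, parents, key):
    """Copy the parent's attributes onto every detail row that references it."""
    return [{**row, **parents[row[key]]} for row in detail_rows]


def pivot_up(detail_rows, key, name_col, value_col):
    """Turn detail rows into named columns on one row per parent key."""
    pivoted = defaultdict(dict)
    for row in detail_rows:
        column = f"{row[name_col]}_{value_col}"
        pivoted[row[key]][column] = row[value_col]
    return dict(pivoted)


flat_lines = flatten_down(order_lines, customers, "customer_id")
customer_phones = pivot_up(phones, "customer_id", "phone_type", "number")
```

After this runs, each entry in `flat_lines` carries `customer_name` and `region` alongside the order line columns, and `customer_phones` holds one row per customer with `home_number` and `work_number` columns.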

Table Design - Denormalizing
"[...] design staging tables to better suit the target rather than the source. Reasons:
1. ETL is usually a two-step process. Stage then load. If the staging does mild transformations to better suit the target, I need only create one set of load processes. If the DW gets similar data from multiple sources, all I need to do is create new source specific staging processes and let the existing load processes handle the new source.
2. Sources change. I don't want to rewrite ETL processes from end-to-end because of a change in the source.
3. Most of the heavy transformation logic occurs on the load side. With the staging tables closer in structure to the target, the load process code tends to be simpler."
- Nicholas Galemmo, Kimball Group Forums

Table Design - Denormalizing
- Why denormalize in stage?
  - Stand-alone tables
  - Reflect target architecture
  - Utilize keys and indexes on source
- Why not denormalize in stage?
  - Strain on source system
  - Flexibility

Table Design – Persisted Tables
- Provides traceability and reload capabilities without hitting source
- Store more attributes than required in the presentation layer
- Defined retention period
- Track processed records

Data Cleansing
- Identify data scenarios that you don’t want in the target
- Enforce business rules
- Look for duplicates
- Check referential integrity

Data Cleansing
Status/Audit Fields
- Status Code
- Process/Do Not Process
- Error Description

Data Cleansing Cleansed Data Tables
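
A minimal sketch of how the cleansing checks and status/audit fields above might fit together, written in Python for brevity. The column names (`order_id`, `customer_id`, `amount`), the status values, and the specific rules are assumptions made for illustration; the slides do not define them.

```python
def cleanse(rows, valid_customer_ids):
    """Flag each staged row instead of dropping it, so bad data stays traceable."""
    seen_order_ids = set()
    cleansed = []
    for row in rows:
        errors = []
        # Look for duplicates on the (assumed) business key.
        if row["order_id"] in seen_order_ids:
            errors.append("duplicate order_id")
        seen_order_ids.add(row["order_id"])
        # Check referential integrity against known parent keys.
        if row["customer_id"] not in valid_customer_ids:
            errors.append("unknown customer_id")
        # Enforce a simple (assumed) business rule.
        if row["amount"] is None or row["amount"] < 0:
            errors.append("invalid amount")
        cleansed.append({
            **row,
            "status_code": "ERROR" if errors else "OK",
            "process_flag": not errors,  # Process / Do Not Process
            "error_description": "; ".join(errors) or None,
        })
    return cleansed


staged = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 99, "amount": -5.0},
]
result = cleanse(staged, valid_customer_ids={10})
```

Only rows with `process_flag` set to true would move on to the target load; the rest stay in stage with their error description for review.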

Table Design – Data Typing
- Match the source
- Log rejected records (and maybe fail)
Image Source: http://dba.stackexchange.com/questions/6589/ssis-data-flow-error-output-runs-all-the-time

Table Design – Keys, Indexes, Etc…
- Foreign Keys? No!
- Indexing? No
- Primary Keys? Yes
- Not NULLs/Check Constraints
* - Persisted tables

Auxiliary Tables
Set up a Framework!
- Log process execution stats
- Keep track of errors
- Run your system
Image Source: http://parascadd.com/products/rcdetailing/panopliapreprocessor.html

Framework Components and Capabilities
- Control Table
- Incremental Loads
- Package Execution Logging
- System health reporting
- SLA tracking
- Error Tables
- Record Accounting
- Notification

Control Table

Using the Control Table
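
One common way a control table supports incremental loads is by storing a high-water mark per source. The sketch below shows that pattern using Python's built-in `sqlite3` module so it runs anywhere; it is not the control table from the slides. The table names (`etl_control`, `src_orders`, `stg_orders`) and columns are hypothetical, and timestamps are stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3


def make_control_demo_db():
    """Build an in-memory database with hypothetical source, stage, and control tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_orders (order_id INTEGER, modified_at TEXT);
        CREATE TABLE stg_orders (order_id INTEGER, modified_at TEXT);
        CREATE TABLE etl_control (source_name TEXT PRIMARY KEY, last_loaded_at TEXT);
        INSERT INTO src_orders VALUES
            (1, '2015-04-30T12:00:00'),
            (2, '2015-05-02T08:00:00'),
            (3, '2015-05-03T09:30:00');
        INSERT INTO etl_control VALUES ('orders', '2015-05-01T00:00:00');
    """)
    return conn


def incremental_load(conn, source_name):
    """Stage only rows changed since the stored watermark, then advance it."""
    cur = conn.cursor()
    (watermark,) = cur.execute(
        "SELECT last_loaded_at FROM etl_control WHERE source_name = ?",
        (source_name,),
    ).fetchone()
    rows = cur.execute(
        "SELECT order_id, modified_at FROM src_orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (watermark,),
    ).fetchall()
    if rows:
        cur.executemany(
            "INSERT INTO stg_orders (order_id, modified_at) VALUES (?, ?)", rows
        )
        # The newest modified_at just staged becomes the next watermark.
        cur.execute(
            "UPDATE etl_control SET last_loaded_at = ? WHERE source_name = ?",
            (rows[-1][1], source_name),
        )
    conn.commit()
    return len(rows)
```

Calling `incremental_load(conn, "orders")` twice in a row against unchanged source data stages rows only on the first call, because the second call finds nothing newer than the advanced watermark.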

Process Tables

Error Logging Reference 2

Error Logging
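
As a rough illustration of an error table combined with record accounting, the following Python sketch redirects rejected rows to a log instead of failing the whole load, then reports counts that should reconcile. It uses `sqlite3` as a stand-in database; the table names (`stg_product`, `etl_error_log`), the constraints, and the package name are assumptions, not the design from Reference 2.

```python
import sqlite3
from datetime import datetime, timezone


def make_error_demo_db():
    """Build an in-memory stage table with constraints plus a hypothetical error table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stg_product (
            product_id INTEGER PRIMARY KEY,
            price REAL NOT NULL CHECK (price >= 0)
        );
        CREATE TABLE etl_error_log (
            logged_at TEXT,
            package_name TEXT,
            record TEXT,
            error_description TEXT
        );
    """)
    return conn


def load_with_error_log(conn, package_name, rows):
    """Insert rows one at a time; send constraint violations to the error table."""
    loaded = rejected = 0
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO stg_product (product_id, price) VALUES (?, ?)",
                (row["product_id"], row["price"]),
            )
            loaded += 1
        except sqlite3.IntegrityError as exc:
            conn.execute(
                "INSERT INTO etl_error_log VALUES (?, ?, ?, ?)",
                (
                    datetime.now(timezone.utc).isoformat(),
                    package_name,
                    repr(row),
                    str(exc),
                ),
            )
            rejected += 1
    conn.commit()
    # Record accounting: every row read is either loaded or rejected.
    return {"read": len(rows), "loaded": loaded, "rejected": rejected}
```

The returned counts give a simple reconciliation check (`read == loaded + rejected`) that a notification or health report could be built on.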

Summary
- Stage = Exciting
- Maintain security considerations in stage
- Table design can reduce impact on source system
- Stage can decrease the complexity of target load
- Stage can be used for recovery and reload
- Use stage to limit risk of data quality issues

Questions

References
1 – KimballGroup.com Design Tip #99 - http://www.kimballgroup.com/2008/03/design-tip-99-staging-areas-and-etl-tools/
2 – Erik Veerman, Jessica M. Moss, Brian Knight, Jay Hackney. 2008. SQL Server 2008 Integration Services: Problem, Design, Solution