ETL Design and Development Michael A. Fudge, Jr.

Presentation transcript:

IST722 Data Warehousing: ETL Design and Development. Michael A. Fudge, Jr.

Recall: Kimball Lifecycle. Describes an approach for data warehouse projects.

Objective: Outline the ETL design and development process. A “recipe” for ETL.

Before You Begin. Before you begin, you’ll need three things. Physical Design: the star schema implementation in ROLAP, with initial load. Architecture Plan: an understanding of your DW/BI architecture. Source to Target Mapping: part of the detailed design process.

The Plan… How the 34 subsystems map and relate to the 10-step plan, according to Kimball.

Step 1 – Draw the High-Level Plan. This is called a source-to-target map. Sources come from a variety of disparate areas. Targets are dimension and fact tables.

Step 2 – Choose an ETL Tool. Your ETL tool is responsible for moving data from the various sources into the data warehouse. Programming language vs. graphical tool. Programming → flexible, customizable. Graphical → self-documenting, easy for beginners. The best solution is somewhere in the middle.

ETL: Code vs. Tool. Which of these is easier to understand?

Step 3 – Develop Detailed Strategies. Data extraction and archival of extracted data; data quality checks on dimensions and facts; managing changes to dimensions; ensuring the DW and ETL meet system availability requirements; designing a data auditing subsystem; organizing the staging data.

The Role of Staging. Staging stores copies of source extracts. It can be a database or a file system. It can create a history when none exists, and it reduces unnecessary processing of the data source. (Diagram labels: Data Sources, EXTRACT, Staging [File System or Database], LOAD, Data Warehouse; ETL: TRANSFORM [Tooling]; ELT: TRANSFORM [SQL].)
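A minimal sketch of the staging idea, assuming a CSV extract held in memory. The function name `stage_extract` and the `extract_date` column are hypothetical, not from the slides. Stamping each staged row with its extract date is one simple way to build a history that the source system does not keep.

```python
import csv
import io
from datetime import date


def stage_extract(rows, fieldnames, extract_date):
    """Write one source extract to a staging buffer, stamping each row with its extract date."""
    buffer = io.StringIO()
    # "extract_date" is a hypothetical audit column added by the staging step.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames + ["extract_date"])
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "extract_date": extract_date.isoformat()})
    return buffer.getvalue()


# Hypothetical source rows; a real extract would come from a source system.
source_rows = [{"customer_id": "C1", "city": "Syracuse"}]
staged = stage_extract(source_rows, ["customer_id", "city"], date(2015, 1, 1))
```

Later transforms can then read the staged copy instead of querying the source again.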

Step 4 – Drill Down by Target Table. Start drilling down into the detailed source-to-target flow for each target dimension and fact table. Flowcharts and pseudocode are useful for building out your transformation logic. ETL tools allow you to build and document the data flow at the same time.

Step 5 – Populate Dimensions w/ Historic Data. Part of the one-time historic processing step. Start with the simplest dimension table (usually type 1 SCDs). Transformations: combine data from separate sources; convert data (ex. EBCDIC → ASCII); decode production codes (ex. TTT → Track-Type Tractor); verify rollups (ex. Category → Product); ensure a “natural” or “business” key exists for SCDs; assign surrogate keys to the dimension table.
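Two of the transformations above, decoding production codes and assigning surrogate keys, can be sketched as follows. This is an illustration under assumptions: the column names (`product_id`, `type_code`), the code table, and the function `build_dimension` are hypothetical. Only the TTT example comes from the slide.

```python
from itertools import count

# Hypothetical code table; only the TTT entry appears in the slides.
PRODUCT_TYPE_CODES = {"TTT": "Track-Type Tractor"}


def build_dimension(source_rows):
    """Build a type 1 product dimension: one row per natural key, with a new surrogate key."""
    next_key = count(1)  # surrogate keys are meaningless integers: 1, 2, 3, ...
    seen = set()
    dimension = []
    for row in source_rows:
        natural_key = row["product_id"]
        if natural_key in seen:
            continue  # keep the first occurrence of each natural key
        seen.add(natural_key)
        dimension.append({
            "product_key": next(next_key),
            "product_id": natural_key,
            # Decode the production code; unmapped codes become "Unknown".
            "product_type": PRODUCT_TYPE_CODES.get(row["type_code"], "Unknown"),
        })
    return dimension


product_dim = build_dimension([
    {"product_id": "P-100", "type_code": "TTT"},
    {"product_id": "P-200", "type_code": "XYZ"},
    {"product_id": "P-100", "type_code": "TTT"},
])
```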

Step 6 – Perform the Fact Table Historic Load. Part of the one-time historic processing step. Transformations: replace special codes (e.g. -1) with NULL on additive and semi-additive facts; calculate and store complex derived facts (ex. shipping amount is divided among the number of items on the order); pivot rows into columns (ex. account type, amount → checking amount, savings amount); associate with the audit dimension; look up dimension keys using natural/business keys….
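The first three fact transformations above can be sketched in a few lines. All function and column names here are hypothetical. Python's `None` stands in for SQL NULL.

```python
def clean_measure(value, special_codes=(-1,)):
    """Replace a special code such as -1 with None (NULL) so it is not summed as a real value."""
    return None if value in special_codes else value


def allocate_shipping(shipping_amount, item_count):
    """Derived fact: divide the order-level shipping amount evenly among the order's items."""
    return round(shipping_amount / item_count, 2)


def pivot_accounts(rows):
    """Pivot (account_type, amount) rows into one row per customer with one column per type."""
    pivoted = {}
    for row in rows:
        customer = pivoted.setdefault(
            row["customer_id"], {"customer_id": row["customer_id"]}
        )
        customer[f"{row['account_type']}_amount"] = row["amount"]
    return list(pivoted.values())


balances = pivot_accounts([
    {"customer_id": "C1", "account_type": "checking", "amount": 100},
    {"customer_id": "C1", "account_type": "savings", "amount": 250},
])
```

Rounding each share means the allocated amounts may not add up exactly to the original shipping amount. A real load would need a rule for the remainder.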

Example: Surrogate Key Pipeline. Handles SCDs.
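The slide's diagram is not reproduced in this transcript. As a rough illustration only, one common way for a surrogate key lookup to handle a type 2 SCD is to match on the natural key and on the date range in which the dimension row was valid. The dimension rows, column names, and the function `lookup_surrogate_key` below are assumptions, not the author's pipeline.

```python
from datetime import date

# Hypothetical type 2 customer dimension: one row per version of each customer.
customer_dim = [
    {"customer_key": 1, "customer_id": "C1", "city": "Albany",
     "valid_from": date(2014, 1, 1), "valid_to": date(2014, 12, 31)},
    {"customer_key": 2, "customer_id": "C1", "city": "Syracuse",
     "valid_from": date(2015, 1, 1), "valid_to": date(9999, 12, 31)},
]


def lookup_surrogate_key(dimension, natural_key, as_of):
    """Return the surrogate key of the dimension row that was valid on the fact's date."""
    for row in dimension:
        if (row["customer_id"] == natural_key
                and row["valid_from"] <= as_of <= row["valid_to"]):
            return row["customer_key"]
    return None  # lookup failed; the caller decides how to handle it
```

A fact dated 2014-06-01 for customer C1 would resolve to key 1, and a fact dated 2015-03-01 to key 2.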

Step 7 – Dimension Table Incremental Processing. Oftentimes the same logic used in the historic load can be reused. Identify new or changed data by comparing attributes for the same natural key; ETL tools can usually assist with this logic. CDC (Change Data Capture) systems are popular.
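Comparing attributes for the same natural key can be sketched as below. The function `classify_changes` and the sample columns are hypothetical. The sketch assumes the dimension holds one current row per natural key.

```python
def classify_changes(source_rows, dimension_rows, natural_key, tracked_attributes):
    """Split incoming rows into new rows and changed rows by comparing against the dimension."""
    current = {row[natural_key]: row for row in dimension_rows}
    new_rows, changed_rows = [], []
    for row in source_rows:
        existing = current.get(row[natural_key])
        if existing is None:
            new_rows.append(row)  # natural key not yet in the dimension
        elif any(row[attr] != existing[attr] for attr in tracked_attributes):
            changed_rows.append(row)  # same natural key, different attribute values
    return new_rows, changed_rows


dimension = [{"customer_id": "C1", "city": "Albany"},
             {"customer_id": "C2", "city": "Utica"}]
incoming = [{"customer_id": "C1", "city": "Syracuse"},
            {"customer_id": "C2", "city": "Utica"},
            {"customer_id": "C3", "city": "Ithaca"}]
new_rows, changed_rows = classify_changes(incoming, dimension, "customer_id", ["city"])
```

Unchanged rows (C2 here) fall through both lists and need no processing.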

Step 8 – Fact Table Incremental Processing. A complex ETL problem: it can be difficult to determine which facts need to be processed. What happens to a fact when it is reprocessed? What if a dimension key lookup fails? Some ETL tools assist with processing this logic. Degenerate dimensions can be used (ex. transaction number in an order summary), or a combination of dimension keys (ex. StudentKey and ClassKey for grade processing). CDC (Change Data Capture) systems are popular.
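One possible answer to two of the questions above, sketched under assumptions: facts are keyed by a degenerate dimension (a hypothetical `order_number`), so reprocessing a fact overwrites it instead of duplicating it, and a failed dimension lookup falls back to a hypothetical "unknown" member key. Other policies, such as rejecting or suspending the row, are equally valid.

```python
UNKNOWN_PRODUCT_KEY = 0  # assumption: the dimension has an "unknown" row with this key


def load_facts(fact_table, incoming, product_keys):
    """Upsert facts by order number so that reprocessing the same fact is idempotent."""
    for row in incoming:
        fact_table[row["order_number"]] = {
            # A failed lookup maps to the unknown member instead of dropping the fact.
            "product_key": product_keys.get(row["product_id"], UNKNOWN_PRODUCT_KEY),
            "quantity": row["quantity"],
        }
    return fact_table


product_keys = {"P-100": 1}  # natural key -> surrogate key
batch = [{"order_number": "A1", "product_id": "P-100", "quantity": 2},
         {"order_number": "A2", "product_id": "P-999", "quantity": 1}]

facts = {}
load_facts(facts, batch, product_keys)
load_facts(facts, batch, product_keys)  # reprocessing does not create duplicates
```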

CDC: Change Data Capture. Data change events (create, update, delete) are passed to the CDC system. The system acts as a source for the ETL process. (Diagram labels: OLTP Database, Transaction Log, CDC System, ETL Job OR Msg Queue / Service Bus.)
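A minimal sketch of an ETL job consuming change events, assuming each event is a dictionary with hypothetical `op`, `key`, and `data` fields. Real CDC systems define their own event formats.

```python
def apply_change_events(target, events):
    """Apply create, update, and delete events, in order, to a keyed copy of the source table."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("create", "update"):
            target[key] = event["data"]
        elif op == "delete":
            target.pop(key, None)  # ignore deletes for keys we never saw
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


events = [
    {"op": "create", "key": "C1", "data": {"city": "Albany"}},
    {"op": "update", "key": "C1", "data": {"city": "Syracuse"}},
    {"op": "create", "key": "C2", "data": {"city": "Utica"}},
    {"op": "delete", "key": "C2"},
]
replica = apply_change_events({}, events)
```

Because only changed rows arrive as events, the job never has to scan the whole source table to find them.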

Step 9 – Aggregate Table and OLAP Loads. Further processing beyond the ROLAP star schema. Most ROLAP databases exist to feed the MOLAP databases. Tasks: refresh or reprocess MOLAP cubes; indexed or materialized views; aggregate summary tables.
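An aggregate summary table can be sketched with SQLite from the Python standard library. The table and column names (`fact_sales`, `agg_sales_by_month`, a `date_key` in YYYYMMDD form) are hypothetical, and a full drop-and-rebuild is only one refresh strategy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table with an integer date key in YYYYMMDD form.
conn.executescript("""
CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, sales_amount REAL);
INSERT INTO fact_sales VALUES
    (20150101, 1, 10.0),
    (20150102, 1, 5.0),
    (20150201, 1, 7.5);
""")


def refresh_monthly_aggregate(conn):
    """Rebuild the monthly summary table from the detail-level fact table."""
    conn.executescript("""
    DROP TABLE IF EXISTS agg_sales_by_month;
    CREATE TABLE agg_sales_by_month AS
    SELECT date_key / 100 AS month_key,      -- integer division: YYYYMMDD -> YYYYMM
           product_key,
           SUM(sales_amount) AS sales_amount
    FROM fact_sales
    GROUP BY date_key / 100, product_key;
    """)


refresh_monthly_aggregate(conn)
monthly = conn.execute(
    "SELECT month_key, sales_amount FROM agg_sales_by_month ORDER BY month_key"
).fetchall()
```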

Step 10 – ETL System Operation & Automation. Schedule jobs. Catch and log errors and exceptions. Database management tasks: clean up old data, shrink the database, rebuild indexes, update statistics.
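Catching and logging errors can be sketched with Python's standard `logging` module. The runner `run_jobs` and the job names are hypothetical. Scheduling itself is left to an external scheduler.

```python
import logging

logger = logging.getLogger("etl")


def run_jobs(jobs):
    """Run each (name, callable) job in order, logging failures instead of crashing the run."""
    results = {}
    for name, job in jobs:
        try:
            job()
            results[name] = "succeeded"
        except Exception:
            # logger.exception records the message together with the traceback.
            logger.exception("ETL job %s failed", name)
            results[name] = "failed"
    return results


def load_dimensions():
    pass  # placeholder for real work


def load_facts_job():
    raise RuntimeError("dimension key lookup failed")  # simulated failure


results = run_jobs([("load_dimensions", load_dimensions),
                    ("load_facts", load_facts_job)])
```

This sketch keeps going after a failure. A production run would likely stop jobs that depend on the one that failed.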
