Typically data is extracted from multiple sources

Presentation transcript:

Loading the Warehouse
Typically data is extracted from multiple sources
To update the warehouse periodically, we must receive the changes that have occurred and load these into the warehouse
An initial load of the warehouse: ETL
Periodically update the warehouse: Changes (ETL)? Refresh?
March 2004 91.4904 Ron McFadyen

ETL
Extract
  Data is obtained from source systems
Transform
  Data is cleansed and transformed
  Correcting values, detecting incorrect/impossible values, …
  Complex fields broken down, standardization of values, …
  Attribute types may vary in source and target
  Data may be aggregated
Load
  New data is loaded into dimensions and fact tables
  Indexes rebuilt
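
The three ETL steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the field names (customer, state, amount), the STATE_CODES lookup, and the sales_fact table are all hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical source rows; the field names are illustrative only.
SOURCE_ROWS = [
    {"customer": "Smith, Ann", "state": "mb", "amount": "12.50"},
    {"customer": "Jones, Bob", "state": "Manitoba", "amount": "-3.00"},
    {"customer": "Smith, Ann", "state": "MB", "amount": "7.50"},
]

# Assumed standardization table mapping source spellings to one code.
STATE_CODES = {"mb": "MB", "manitoba": "MB"}


def extract():
    """Extract: obtain rows from the source system (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: cleanse, split complex fields, standardize, convert, aggregate."""
    totals = {}
    rejected = []
    for row in rows:
        amount = float(row["amount"])  # text in the source, numeric in the target
        if amount < 0:  # detect an impossible value and set the row aside
            rejected.append(row)
            continue
        # Complex field broken down into last and first name.
        last, first = [part.strip() for part in row["customer"].split(",")]
        state = STATE_CODES[row["state"].lower()]  # standardization of values
        key = (last, first, state)
        totals[key] = totals.get(key, 0.0) + amount  # aggregation
    return totals, rejected


def load(conn, totals):
    """Load: insert the new rows, then build the index after the load."""
    conn.execute(
        "CREATE TABLE sales_fact "
        "(last_name TEXT, first_name TEXT, state TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [(l, f, s, t) for (l, f, s), t in totals.items()],
    )
    conn.execute("CREATE INDEX idx_sales_state ON sales_fact (state)")
    conn.commit()
```

A caller would chain the steps, for example `totals, rejected = transform(extract())` followed by `load(sqlite3.connect(":memory:"), totals)`. Rejected rows are returned rather than dropped so that they can be reviewed.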

Data capture techniques
Synchronous
  Data is obtained as it is created in the source systems, in real time
Asynchronous
  A delay is present between the time the data changes and the time it is captured in the warehouse
Total Refresh
  A complete table in the warehouse is refreshed from its source
Incremental
  Only changes are acquired and loaded into the warehouse
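
The difference between a total refresh and an incremental load can be illustrated with a small sketch. It assumes a hypothetical customer table whose rows carry an updated_at counter, and a warehouse copy named customer_copy; neither appears in the slides. Detecting changes through such a column is one possible approach, and it does not notice deleted rows.

```python
import sqlite3


def make_demo_databases():
    """Create a hypothetical source and warehouse, both in memory."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    source.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)", [(1, "Ann", 1), (2, "Bob", 2)]
    )
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE customer_copy (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    return source, warehouse


def total_refresh(source, warehouse):
    """Total refresh: replace the complete warehouse table from its source."""
    rows = source.execute("SELECT id, name, updated_at FROM customer").fetchall()
    warehouse.execute("DELETE FROM customer_copy")
    warehouse.executemany("INSERT INTO customer_copy VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return len(rows)


def incremental_load(source, warehouse, last_captured):
    """Incremental: acquire only rows changed since the previous capture."""
    rows = source.execute(
        "SELECT id, name, updated_at FROM customer WHERE updated_at > ?",
        (last_captured,),
    ).fetchall()
    # New ids are inserted; existing ids are overwritten with the new values.
    warehouse.executemany(
        "INSERT OR REPLACE INTO customer_copy VALUES (?, ?, ?)", rows
    )
    warehouse.commit()
    return len(rows)
```

The total refresh moves every row on each run, while the incremental load moves only the rows whose updated_at value is greater than the last captured value.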

Data capture techniques
Static capture
  Data is acquired by reading the database or files. Subsets may be acquired by filtering
Application assisted
  The application(s) are modified to write changes out to a file/database
Triggered capture
  Triggers in a DBMS are written to capture changes
Replication
  A DBMS replication feature is used to manage changes
Log capture
  The DBMS logging feature is utilized for capturing changes
File comparison
  A prior copy and the current file are compared to find changes
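
Of the techniques above, file comparison is the easiest to show in code. The sketch below assumes that both copies are CSV text with a header row and a unique key column; the function names read_snapshot and compare_snapshots are hypothetical.

```python
import csv
import io


def read_snapshot(text, key_field):
    """Parse CSV text into a dict that maps the key column to the full row."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key_field]: row for row in reader}


def compare_snapshots(prior, current):
    """Compare a prior copy with the current file and classify the changes."""
    inserted = {k: v for k, v in current.items() if k not in prior}
    deleted = {k: v for k, v in prior.items() if k not in current}
    updated = {
        k: v for k, v in current.items() if k in prior and prior[k] != v
    }
    return inserted, updated, deleted
```

For example, given a prior copy with ids 1, 2, 3 and a current file with ids 1, 2, 4 in which the row for id 2 has changed, the comparison reports id 4 as inserted, id 2 as updated, and id 3 as deleted.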

Data capture process
[Diagram with the labels: source, Extract, Cleanse, Repair, Transform, Load, target, errors]

Star Schema Update
Load Dim 1
Load Dim n
Load Fact table
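
One reason to load the dimensions before the fact table is that each fact row needs the surrogate keys of its dimension rows. A minimal sketch follows, assuming two hypothetical dimensions (product_dim, store_dim) and a sales_fact table; the table and column names are illustrative and not taken from the slides.

```python
import sqlite3

SCHEMA = """
CREATE TABLE product_dim (surrogate_key INTEGER PRIMARY KEY, natural_key TEXT UNIQUE);
CREATE TABLE store_dim (surrogate_key INTEGER PRIMARY KEY, natural_key TEXT UNIQUE);
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim (surrogate_key),
    store_key INTEGER REFERENCES store_dim (surrogate_key),
    quantity INTEGER
);
"""


def load_dimension(conn, table, natural_keys):
    """Insert unseen natural keys and return a natural-to-surrogate key map."""
    for natural_key in natural_keys:
        # Keys that already exist are skipped because natural_key is UNIQUE.
        conn.execute(
            f"INSERT OR IGNORE INTO {table} (natural_key) VALUES (?)",
            (natural_key,),
        )
    return dict(conn.execute(f"SELECT natural_key, surrogate_key FROM {table}"))


def update_star(conn, source_rows):
    """Load every dimension first, then the fact table that references them."""
    product_keys = load_dimension(
        conn, "product_dim", [r["product"] for r in source_rows]
    )
    store_keys = load_dimension(
        conn, "store_dim", [r["store"] for r in source_rows]
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?)",
        [
            (product_keys[r["product"]], store_keys[r["store"]], r["qty"])
            for r in source_rows
        ],
    )
    conn.commit()
```

To try it, create an in-memory database with `conn = sqlite3.connect(":memory:")`, run `conn.executescript(SCHEMA)`, and pass a list of rows with product, store, and qty fields to `update_star`.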

Dimension table surrogate key management
  Figure 16.4 on page 360
Going back in time (page 271)
  Late-arriving fact rows
  Late-arriving dimension rows
  From the source systems we receive data that we should have received some time ago
  Technical issues having to do with
    Dimension records being twin-timestamped for contiguous non-overlapping time intervals
    Placing records in the right partition
  For dimensions, you also need to adjust the facts
  Subtle point about the ordering of surrogate keys
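
For a late-arriving fact row, the twin timestamps on the dimension records make it possible to find the surrogate key that was in effect on the fact's date. The sketch below is an illustration only: the CUSTOMER_DIM rows, the key values, and the half-open interval convention (effective date included, expiry date excluded) are assumptions and not taken from the slides or the textbook.

```python
from datetime import date

# Hypothetical dimension rows:
# (surrogate_key, natural_key, row_effective, row_expiry)
# The intervals for one natural key are contiguous and do not overlap.
CUSTOMER_DIM = [
    (101, "C-7", date(2003, 1, 1), date(2003, 9, 1)),
    (245, "C-7", date(2003, 9, 1), date(9999, 12, 31)),
]


def surrogate_key_for(dim_rows, natural_key, as_of):
    """Return the surrogate key whose interval contains the given date."""
    for surrogate_key, row_natural_key, effective, expiry in dim_rows:
        # Assumed convention: effective is inclusive, expiry is exclusive.
        if row_natural_key == natural_key and effective <= as_of < expiry:
            return surrogate_key
    raise LookupError(f"no dimension row for {natural_key} on {as_of}")
```

A fact dated 15 June 2003 that arrives in March 2004 would be assigned the older key (101 in this made-up data), not the key of the current dimension row.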

Point-in-time balances
See pages 208+, SQL on page 209
The fact table is given on page 210
There is just one date-related attribute: transaction date key
“the date key is a set of integers running from 1 to N with a meaningful, predictable sequence. We assign consecutive integers to the date surrogate key so that we can physically partition a large fact table based on the date.”
“The date dimension is the only dimension whose surrogate keys have any embedded semi-intelligence”
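
The quoted idea of consecutive integer date keys can be sketched as follows. The function names, the start date, and the fixed-size key ranges used as partitions are assumptions made for illustration; the textbook's own SQL is not reproduced here.

```python
from datetime import date, timedelta


def build_date_dimension(start, days):
    """Assign the consecutive integers 1..N to consecutive calendar dates."""
    return [(key, start + timedelta(days=key - 1)) for key in range(1, days + 1)]


def date_key(start, day):
    """Compute the surrogate key of a date directly from its offset."""
    return (day - start).days + 1


def partition_for(key, keys_per_partition):
    """Map a date key to a partition number using fixed-size key ranges."""
    return (key - 1) // keys_per_partition
```

Because the keys follow calendar order, a range of keys corresponds to a range of dates, which is what allows a fact table to be partitioned on the date key.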