

Data Warehouse Chapter 11

Multiple Files Problem
- Added complexity of multiple source files
- Start simple
(Diagram labels: Multiple source files; Logic to detect correct source; Extracted data)

Transforming Data from Multiple Files
(Diagram label: File)

Missing Values Problem
Solution:
- Ignore
- Wait
- Mark rows
- Extract when time-stamped
Example: If NULL then Field = ‘A’
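The "mark rows" option and the `If NULL then Field = 'A'` rule could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `mark_missing`, the sample rows, and the extra `was_missing` flag are hypothetical and do not come from the slides.

```python
def mark_missing(rows, field, marker="A"):
    """Return copies of rows with a missing `field` replaced by `marker`.

    Each output row also gets a hypothetical `was_missing` flag so the
    substitution stays visible downstream.
    """
    marked = []
    for row in rows:
        new_row = dict(row)
        missing = new_row.get(field) is None
        if missing:
            new_row[field] = marker  # If NULL then Field = 'A'
        new_row["was_missing"] = missing
        marked.append(new_row)
    return marked


# Hypothetical sample data: None stands in for a NULL source value.
source_rows = [
    {"id": 1, "grade": "B"},
    {"id": 2, "grade": None},
]
marked_rows = mark_missing(source_rows, "grade")
```

Keeping a flag next to the substituted value is one possible design choice; it lets later steps tell a real 'A' from a filled-in one.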

Duplicate Value Problem
Solution:
- SQL self-join techniques
- RDBMS constraint utilities
  SELECT …
  FROM table_a, table_b
  WHERE table_a.key(+) = table_b.key
  UNION
  SELECT …
  FROM table_a, table_b
  WHERE table_a.key = table_b.key(+)
(Diagram label: ACME Inc)
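A self-join for spotting duplicate values might look like the sketch below. It uses Python's built-in `sqlite3` module rather than Oracle, and the `customer` table, its columns, and the "Bolt Ltd" row are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "ACME Inc"), (2, "Bolt Ltd"), (3, "ACME Inc")],
)

# Self-join: pair each row with any later row that carries the same name.
duplicate_pairs = conn.execute(
    """
    SELECT a.id, b.id, a.name
    FROM customer AS a
    JOIN customer AS b
      ON a.name = b.name AND a.id < b.id
    ORDER BY a.id, b.id
    """
).fetchall()
conn.close()
```

The `a.id < b.id` condition keeps a row from matching itself and reports each duplicate pair once.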

Element Names Problem
Solution:
- CTAS (CREATE TABLE AS SELECT)
- SQL*Loader
(Diagram labels: Customer, Client, Contact, Name; Customer)
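One way to picture the element-name fix is a lookup that renames source fields to a single warehouse name. The sketch below assumes that the diagram maps Customer, Client, Contact, and Name onto Customer; the mapping table, the function name, and the sample records are hypothetical.

```python
# Hypothetical mapping from source-specific names to one warehouse name.
NAME_MAP = {
    "Customer": "Customer",
    "Client": "Customer",
    "Contact": "Customer",
    "Name": "Customer",
}


def standardize_names(record, name_map=NAME_MAP):
    """Rename the keys of one source record; unknown keys pass through."""
    return {name_map.get(key, key): value for key, value in record.items()}


from_billing = standardize_names({"Client": "ACME Inc", "Region": "North"})
from_support = standardize_names({"Contact": "ACME Inc"})
```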

Element Meaning Problem
- Avoid misinterpretation
- Complex solution
- Document meaning in metadata
(Diagram labels: Customer’s name; All customer details; All details except name)

Input Format Problem
- EBCDIC, ASCII
- “123-73”, 12373
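Both input-format cases can be illustrated with a short Python sketch. It assumes the EBCDIC source uses code page 037 (Python's `cp037` codec) and that "123-73" should become the number 12373 by dropping non-digit characters; the function names are hypothetical.

```python
import re


def ebcdic_to_text(raw, codec="cp037"):
    """Decode EBCDIC bytes; cp037 is one common EBCDIC code page."""
    return raw.decode(codec)


def digits_only(text):
    """Strip everything except digits and return an integer."""
    return int(re.sub(r"\D", "", text))


# The bytes below are "123-73" encoded in EBCDIC code page 037.
raw_field = bytes([0xF1, 0xF2, 0xF3, 0x60, 0xF7, 0xF3])
text_field = ebcdic_to_text(raw_field)
numeric_field = digits_only(text_field)
```

A real source may use a different EBCDIC code page, so the codec would need to be confirmed against the source system.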

Referential Integrity Problem
Solution:
- SQL anti-join
- Server constraints
- Dedicated tools
(Diagram residue: Department; Emp, Name, Department; 1099, Smith, Jones, Doe, Harris, 60)
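An anti-join returns the child rows that have no matching parent row. The sketch below runs one in SQLite through Python's `sqlite3` module; the table layout, the employee ids, and the department numbers are assumptions chosen for illustration, since the slide's table did not survive intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (10), (20);
    INSERT INTO employee VALUES
        (1, 'Smith', 10), (2, 'Jones', 20), (3, 'Doe', 10), (4, 'Harris', 60);
    """
)

# Anti-join: keep employees for which no matching department exists.
orphans = conn.execute(
    """
    SELECT e.name, e.dept_id
    FROM employee AS e
    WHERE NOT EXISTS (
        SELECT 1 FROM department AS d WHERE d.dept_id = e.dept_id
    )
    """
).fetchall()
conn.close()
```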

Name and Address Problem
- No unique key
- Missing values
- Personal and commercial names mixed
- Different addresses for same member
- Different names and spelling for same member
- Many names on one line
- One name on two lines

Name and Address Problem
Single-field format:
  Mr.J.Smith, 100 Main St., Bigtown, County Luth,
Multiple-field format:
  Name    Mr.J.Smith
  Street  100 Main St.
  Town    Bigtown
  County  County Luth
  Code    23565
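Moving from the single-field to the multiple-field format amounts to splitting one string into named parts. The sketch below is deliberately naive: it assumes comma-separated parts in a fixed order and an input line that carries the code as its last part. The field names and the function are hypothetical, and real name and address data would need far more robust parsing.

```python
FIELD_NAMES = ("name", "street", "town", "county", "code")


def split_address(single_field):
    """Split a comma-separated address line into named fields.

    Assumes the parts always appear in FIELD_NAMES order; parts that
    are absent at the end of the line come back as None.
    """
    parts = [part.strip() for part in single_field.split(",")]
    while parts and not parts[-1]:
        parts.pop()  # drop empty parts left by trailing commas
    padded = parts + [None] * (len(FIELD_NAMES) - len(parts))
    return dict(zip(FIELD_NAMES, padded))


record = split_address("Mr.J.Smith, 100 Main St., Bigtown, County Luth, 23565")
partial = split_address("Mr.J.Smith, 100 Main St., Bigtown, County Luth,")
```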

Clean and Organize
1. Create atomic values.
2. Standardize formats.
3. Verify data accuracy.
4. Match with other records.
5. Identify private and commercial addresses and inhabitants.
6. Document in metadata.
Requires sophisticated tools and techniques

Merging Data
- Operational transactions do not usually map one-to-one with warehouse data
- Data for the warehouse is merged to provide information for analysis
  Sale    1/2/98 12:00:01  Ham Pizza      $10.00
  Sale    1/2/98 12:00:02  Cheese Pizza   $15.00
  Sale    1/2/98 12:00:02  Anchovy Pizza  $12.00
  Return  1/2/98 12:00:03  Ham Pizza      -$12.00
  Sale    1/2/98 12:00:04  Sausage Pizza  $11.00
Pizza sales/return by day, hour, seconds
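The slide does not state the merge rule, so the sketch below simply assumes a daily roll-up by transaction type to show how several operational rows can collapse into fewer warehouse rows. The function name and the grouping choice are assumptions.

```python
from collections import defaultdict
from decimal import Decimal

# Operational transactions as listed on the slide: (type, date, time, item, amount).
transactions = [
    ("Sale", "1/2/98", "12:00:01", "Ham Pizza", Decimal("10.00")),
    ("Sale", "1/2/98", "12:00:02", "Cheese Pizza", Decimal("15.00")),
    ("Sale", "1/2/98", "12:00:02", "Anchovy Pizza", Decimal("12.00")),
    ("Return", "1/2/98", "12:00:03", "Ham Pizza", Decimal("-12.00")),
    ("Sale", "1/2/98", "12:00:04", "Sausage Pizza", Decimal("11.00")),
]


def merge_by_day(rows):
    """Sum amounts per (date, transaction type)."""
    totals = defaultdict(Decimal)
    for kind, date, _time, _item, amount in rows:
        totals[(date, kind)] += amount
    return dict(totals)


daily_totals = merge_by_day(transactions)
```

`Decimal` is used instead of `float` so that currency amounts add up exactly.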

Merging Data
  Sale    1/2/98 12:00:01  Ham Pizza      $10.00
  Sale    1/2/98 12:00:02  Cheese Pizza   $15.00
  Sale    1/2/98 12:00:02  Anchovy Pizza  $12.00
  Return  1/2/98 12:00:03  Ham Pizza      -$12.00
  Sale    1/2/98 12:00:01  Ham Pizza      $10.00
  Sale    1/2/98 12:00:02  Cheese Pizza   $10.00
  Sale    1/2/98 12:00:04  Sausage Pizza  $11.00

Adding a Date Stamp
- Enables time analysis
- Label loaded data with a date stamp
- Add time to fact and dimension data
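Labeling loaded data with a date stamp can be as simple as adding one column during the load step. The following is a minimal sketch; the column name `load_date`, the function name, and the sample rows are assumptions.

```python
from datetime import date


def add_date_stamp(rows, load_date):
    """Return copies of rows with a `load_date` column added."""
    stamp = load_date.isoformat()
    return [{**row, "load_date": stamp} for row in rows]


# Hypothetical extracted rows and load date.
extracted = [
    {"item_id": 1, "sales_dollars": 10.0},
    {"item_id": 2, "sales_dollars": 15.0},
]
stamped = add_date_stamp(extracted, date(1998, 1, 2))
```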

Adding a Date Stamp
  Sales Fact Table: Item_id, Store_id, Time_key, Sales_dollars, Sales_units
  Store Table: Store_id, District_id, Time_key
  Item_Table: Item_id, Dept_id, Time_key
  Time Table: Week_id, Period_id, Year_id, Time_key
  Product Table: Product_id, Time_key, Product_desc

Adding a Date Stamp
Fact table:
- Add triggers
- Recode applications
- Compare tables
Dimension table
Time representation:
- Point in time
- Time span

Adding Keys to Data
  #1    Sale  1/2/98 12:00:01  Ham Pizza      $10.00
  #2    Sale  1/2/98 12:00:02  Cheese Pizza   $15.00
  #3    Sale  1/2/98 12:00:02  Anchovy Pizza  $12.00
  #4    Sale  1/2/98 12:00:03  Ham Pizza      -$12.00
  #dw1  Sale  1/2/98 12:00:01  Ham Pizza      $10.00
  #dw2  Sale  1/2/98 12:00:02  Cheese Pizza   $10.00
  #dw3  Sale  1/2/98 12:00:04  Sausage Pizza  $11.00
  #5    Sale  1/2/98 12:00:04  Sausage Pizza  $11.00
Data values or artificial keys
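Artificial keys such as #dw1, #dw2, #dw3 can be generated with a simple counter. The sketch below assumes a `dw` prefix followed by a running number starting at 1; the function name is hypothetical, and a production warehouse would more likely rely on a database sequence.

```python
from itertools import count


def assign_warehouse_keys(rows, prefix="dw"):
    """Pair each row with an artificial key such as 'dw1', 'dw2', ..."""
    counter = count(1)
    return [(f"{prefix}{next(counter)}", row) for row in rows]


# Warehouse rows as shown on the slide.
warehouse_rows = [
    ("Sale", "1/2/98", "12:00:01", "Ham Pizza", "$10.00"),
    ("Sale", "1/2/98", "12:00:02", "Cheese Pizza", "$10.00"),
    ("Sale", "1/2/98", "12:00:04", "Sausage Pizza", "$11.00"),
]
keyed = assign_warehouse_keys(warehouse_rows)
```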

Summarizing Data
- During extraction on staging area
- After loading onto the warehouse server
(Diagram labels: Operational databases; Staging area; Warehouse database)

Maintaining Transformation Metadata
- Contains transformation rules, algorithms, and routines
(Diagram labels: Sources, Stages, Rules, Publish, Extract, Transform, Load, Query)

Transformation Timing and Location
Transformation is performed:
- Before load
- In parallel
May be initiated at different points
(Diagram labels: Unlikely, Probable, Possible)

Choosing a Transformation Point
* Workload
* Network bandwidth
* Environment
* Parallel execution
* CPU use
* Load window time
* Disk space
* User information needs

Monitoring and Tracking
Transformations should:
- Be self-documenting
- Provide summary statistics
- Handle process exceptions
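A transformation run that reports summary statistics and handles exceptions could be sketched as follows. The function names, the choice of caught exception types, and the sample `to_cents` transform are assumptions for illustration only.

```python
def run_transformation(rows, transform):
    """Apply `transform` to each row, collecting statistics and rejects."""
    loaded, rejected = [], []
    for row in rows:
        try:
            loaded.append(transform(row))
        except (ValueError, KeyError) as exc:
            # Handle the exception instead of aborting the whole run.
            rejected.append({"row": row, "error": repr(exc)})
    stats = {"read": len(rows), "loaded": len(loaded), "rejected": len(rejected)}
    return loaded, rejected, stats


def to_cents(row):
    """Hypothetical transform: parse a dollar string into integer cents."""
    return {**row, "cents": round(float(row["amount"].lstrip("$")) * 100)}


loaded, rejected, stats = run_transformation(
    [{"amount": "$10.00"}, {"amount": "n/a"}, {"amount": "$11.00"}], to_cents
)
```

Rejected rows are kept together with their error text so they can be reviewed and reprocessed later.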

Designing Transformation Processes
Analysis:
- Sources and target mappings, business rules
- Key users, metadata, grain
Design options: PL/SQL, replication, custom, third-party tools
Design issues:
- Performance
- Size of the staging area
- Exception handling, integrity maintenance

Transformation Tools
- Purchased
- SQL*Loader
- In-house developed

Data Management, Quality, and Auditing Tools
Data management:
- Innovative Systems
- Postalsoft
- Vality Technology
Data quality and auditing:
- Innovative Systems
- Vality Technology

Summary
This lesson discussed the following topics:
- Importance of data quality
- Transformation processes
- Data transformation issues
- Data anomalies
- Name and address management
- Tools