6 Copyright © 2006, Oracle. All rights reserved. The ETL Process: Transforming Data.

Presentation transcript:

The ETL Process: Transforming Data

Objectives
After completing this lesson, you should be able to do the following:
- Define transformation
- Identify possible staging models
- Identify data anomalies and eliminate them
- Explain the importance of quality data
- Describe techniques for transforming data
- Design a transformation process
- List Oracle’s enhanced features and tools that can be used to transform data

Transformation
Transformation eliminates anomalies from operational data:
- Cleans and standardizes
- Presents subject-oriented data
Diagram: Operational systems -> Extract -> Data staging area -> Load -> Warehouse
Transform (in the data staging area):
- Clean up
- Consolidate
- Restructure

Possible Staging Models
- Remote staging model
- Onsite staging model

Remote Staging Model
Two variants:
- Data staging area within the warehouse environment
- Data staging area in its own environment
Diagram (both variants): Operational system -> Extract -> Staging area (Transform) -> Load -> Warehouse

Onsite Staging Model
Data staging area within the operational environment, possibly affecting the operational system
Diagram: Operational system -> Extract -> Staging area (Transform) -> Load -> Warehouse

Data Anomalies
- No unique key
- Data naming and coding anomalies
- Data meaning anomalies between groups
- Spelling and text inconsistencies
Sample customer records (columns: CUSNUM, NAME, ADDRESS), shown as NAME | ADDRESS:
Oracle Limited | 100 N.E. 1st St
Oracle Computing | 15 Main Road, Ft. Lauderdale
Oracle Corp. UK | 15 Main Road, Ft. Lauderdale, FLA
Oracle Corp UK Ltd | 181 North Street, Key West, FLA

Transformation Routines
- Cleaning data
- Eliminating inconsistencies
- Adding elements
- Merging data
- Integrating data
- Transforming data before load

Transforming Data: Problems and Solutions
- Multipart keys
- Multiple local standards
- Multiple files
- Missing values
- Duplicate values
- Element names
- Element meanings
- Input formats
- Referential integrity constraints
- Name and address

Multipart Keys Problem
Multipart keys: one key value packs several elements together:
- Country code
- Sales territory
- Product number
- Salesperson code
Product code = 12 M

Multiple Local Standards Problem
- Multiple local standards
- Tools or filters to preprocess
Examples:
- Units of length: cm, inches (converted to cm)
- Currencies: USD 600, 1,000 GBP, FF 9,990
- Date formats: DD/MM/YY, MM/DD/YY, DD-Mon-YY
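
A preprocessing filter for local standards might look like the following minimal Python sketch. It converts the three date layouts named on the slide to one ISO layout and converts inches to cm. The source-system labels ("uk", "us", "oracle") and the function names are assumptions made for illustration; they do not come from the course material. Currency conversion is left out because the slide gives no exchange rates.

```python
from datetime import datetime

# Hypothetical labels for source systems, mapped to the date layout each one uses.
SOURCE_DATE_FORMATS = {
    "uk": "%d/%m/%y",      # DD/MM/YY
    "us": "%m/%d/%y",      # MM/DD/YY
    "oracle": "%d-%b-%y",  # DD-Mon-YY (English month abbreviations assumed)
}

CM_PER_INCH = 2.54


def to_iso_date(value, source):
    """Parse a date using the layout of its source system; return YYYY-MM-DD."""
    layout = SOURCE_DATE_FORMATS[source]
    return datetime.strptime(value, layout).date().isoformat()


def to_cm(value, unit):
    """Convert a length to centimetres; only 'cm' and 'inches' are handled."""
    if unit == "cm":
        return float(value)
    if unit == "inches":
        return float(value) * CM_PER_INCH
    raise ValueError(f"unknown unit: {unit}")
```

The same text, "03/04/02", yields a different date depending on the declared source, which is why the source system has to travel with the data until this step has run.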

Multiple Files Problem
- Added complexity of multiple source files
- Start simple
Diagram: Multiple source files -> Logic to detect correct source -> Transformed data

Missing Values Problem
Solution:
- Ignore
- Wait
- Mark rows
- Extract when timestamped
Example rule: If NULL, then field = “A”
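
The "If NULL, then field = A" rule combined with the "mark rows" option can be sketched in Python as follows. This is only an illustration: the function name, the `was_missing` flag, and the use of `None` to stand for NULL are assumptions, not part of the course material.

```python
def fill_missing(rows, field, default="A"):
    """Return copies of the rows in which a missing `field` gets a default.

    Each returned row also carries a 'was_missing' flag so that defaulted
    values can still be told apart from genuine ones later on.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy, so the extracted data stays untouched
        was_missing = new_row.get(field) is None
        if was_missing:
            new_row[field] = default
        new_row["was_missing"] = was_missing
        cleaned.append(new_row)
    return cleaned
```

Keeping the flag makes the substitution reversible in analysis: a query can exclude or separately count the rows whose value was filled in.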

Duplicate Values Problem
Solution:
- SQL self-join techniques
- RDBMS constraints

    SELECT ...
    FROM   table_a, table_b
    WHERE  table_a.key (+) = table_b.key
    UNION
    SELECT ...
    FROM   table_a, table_b
    WHERE  table_a.key = table_b.key (+);
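
Outside the database, the same duplicate check can be approximated in a staging script. The Python sketch below groups rows by key to report duplicates and then keeps the first row per key. It is a substitute illustration, not the SQL self-join technique named on the slide, and the function names and the keep-first policy are assumptions.

```python
from collections import defaultdict


def find_duplicates(rows, key):
    """Return {key value: [rows]} for every key value that occurs more than once."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: g for k, g in groups.items() if len(g) > 1}


def drop_duplicates(rows, key):
    """Keep only the first row seen for each key value, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique
```

Reporting the duplicate groups before dropping anything gives the data owner a chance to decide which copy is correct, rather than silently keeping the first.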

Element Names Problem
Solution:
- Common naming conventions
Example element names for the same item: Customer, Client, Contact, Name

Element Meaning Problem
- Avoid misinterpretation
- Complex solution
- Document meaning in metadata
Example: possible readings of the element Customer_detail:
- Customer’s name
- All customer details
- All details except name

Input Format Problem
- Character encodings: ASCII, EBCDIC
- Sample inputs: ACME Co.; ברוכים הבאים; Beer (Pack of 8)

Referential Integrity Problem
Solution:
- SQL antijoin
- Server constraints
- Dedicated tools
Example tables: Department; employees (Emp, Name, Department)
Surviving sample values: 1099, Smith, Jones, Doe, Harris, 60
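
An antijoin returns the child rows whose foreign key has no match in the parent table. The Python sketch below shows the idea on in-memory rows; the function name, the column name used in the usage note, and the department numbers are invented for illustration and are not taken from the slide.

```python
def antijoin(child_rows, parent_keys, fk):
    """Return the child rows whose `fk` value is absent from `parent_keys`."""
    parents = set(parent_keys)  # set membership keeps the check cheap per row
    return [row for row in child_rows if row[fk] not in parents]
```

For example, with departments 10 and 20 defined, an employee row pointing at department 60 would be returned as an orphan, and the load process can then reject it, park it for review, or create the missing parent.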

Name and Address Problem
- Single-field format: Mr. J. Smith, 100 Main St., Bigtown, County Luth
- Multiple-field format:
  Name: Mr. J. Smith
  Street: 100 Main St.
  Town: Bigtown
  Country: County Luth
  Code: 23565
Database 1 (NAME | LOCATION):
DIANNE ZIEFELD | N100
HARRY H. ENFIELD | M300
Database 2 (NAME | LOCATION):
ZIEFELD, DIANNE | 100
ENFIELD, HARRY H | 300
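
A rough Python sketch of the two name-and-address tasks: splitting a single-field address into named fields, and bringing "LAST, FIRST" names into "FIRST LAST" order. It assumes that commas separate the address parts and that the postal code is the last part; the field names and function names are hypothetical. Real name and address cleansing is far more involved than this.

```python
ADDRESS_FIELDS = ("name", "street", "town", "country", "code")


def split_address(single_field, field_names=ADDRESS_FIELDS):
    """Split a comma-separated address into a dict of named fields."""
    parts = [part.strip() for part in single_field.split(",")]
    return dict(zip(field_names, parts))


def standardize_name(name):
    """Turn 'LAST, FIRST' into 'FIRST LAST'; otherwise only tidy whitespace."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        return f"{first} {last}"
    return " ".join(name.split())
```

With this sketch, "ZIEFELD, DIANNE" from Database 2 becomes "DIANNE ZIEFELD" and matches Database 1, whereas "ENFIELD, HARRY H" becomes "HARRY H ENFIELD" and still differs from "HARRY H. ENFIELD" by a full stop. Differences of that kind are what dedicated standardization and matching tools are meant to absorb.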

Name and Address Processing in Oracle Warehouse Builder
Name and address mapping operator supports:
- Parsing
- Standardization
- Postal matching and geocoding

Quality Data: Importance and Benefits
Quality data:
- Key to a successful warehouse implementation
Quality data helps you in:
- Targeting right customers
- Determining buying patterns
- Identifying householders: private and commercial
- Matching customers
- Identifying historical data

Quality: Standards and Improvements
Setting standards:
- Define a quality strategy.
- Decide on optimal data-quality level.
Improving operational data quality:
- Consider modifying rules for operational data.
- Document the sources.
- Create a data stewardship program.
- Design the cleanup process carefully.
- Initial cleanup and refresh routines may differ.

Data Quality Guidelines
Operational data:
- Should not be used directly in the warehouse
- Must be cleaned for each increment
- Is not fixed by modifying applications

Data Quality: Solutions and Management
Solutions:
- COBOL, Java, 4GL
- Specialized tools
- Customized data conversion process:
  - Investigation
  - Conditioning and standardization
  - Integration
Management:
- Take responsibility.
- Resolve problems.
- Appoint a data quality manager.

Transformation Techniques
- Merging data
- Adding a date stamp
- Adding keys to data

Merging Data
- Operational transactions do not usually map one-to-one with warehouse data.
- Data for the warehouse is merged to provide information for analysis.
Pizza sales/returns by day, hour, seconds:
Sale | 1/2/02 12:00:01 | Ham Pizza | $10.00
Sale | 1/2/02 12:00:02 | Cheese Pizza | $15.00
Sale | 1/2/02 12:00:02 | Anchovy Pizza | $12.00
Return | 1/2/02 12:00:03 | Anchovy Pizza | -$12.00
Sale | 1/2/02 12:00:04 | Sausage Pizza | $11.00

Merging Data
Pizza sales/returns by day, hour, seconds (operational data):
Sale | 1/2/02 12:00:01 | Ham Pizza | $10.00
Sale | 1/2/02 12:00:02 | Cheese Pizza | $15.00
Sale | 1/2/02 12:00:02 | Anchovy Pizza | $12.00
Return | 1/2/02 12:00:03 | Anchovy Pizza | -$12.00
Sale | 1/2/02 12:00:04 | Sausage Pizza | $11.00
Pizza sales (merged warehouse data):
Sale | 1/2/02 12:00:01 | Ham Pizza | $10.00
Sale | 1/2/02 12:00:02 | Cheese Pizza | $15.00
Sale | 1/2/02 12:00:04 | Sausage Pizza | $11.00
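
The pizza example can be reproduced with a short Python sketch in which each return cancels one sale of the same product and amount, leaving only the net sales. The matching rule (same product, same absolute amount, first matching sale cancelled) is an assumption chosen for illustration; the slides do not state how returns are paired with sales.

```python
from collections import Counter


def merge_transactions(transactions):
    """Drop each return together with one matching sale; keep the other sales."""
    # Count the returns that still need to be offset, keyed by product and amount.
    pending_returns = Counter(
        (t["product"], -t["amount"]) for t in transactions if t["type"] == "Return"
    )
    merged = []
    for t in transactions:
        if t["type"] != "Sale":
            continue  # returns themselves never reach the warehouse rows
        key = (t["product"], t["amount"])
        if pending_returns[key] > 0:
            pending_returns[key] -= 1  # this sale is cancelled by a return
            continue
        merged.append(t)
    return merged
```

Applied to the five transactions on the slide, the Anchovy Pizza sale and its return cancel out, and the Ham, Cheese, and Sausage sales remain.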

Adding a Date Stamp
Time element can be represented as a:
- Single point in time
- Time span
Add time element to:
- Fact tables
- Dimension data

Adding a Date Stamp: Fact Tables and Dimensions
Channels table: Channel_id, Channel_name, Time_key
Customers table: Cust_id, Cust_first_name, Time_key
Sales: Item_id, Store_id, Time_key, Sales_dollars, Sales_units
Times table: Week_id, Period_id, Year_id, Time_key
Products table: Product_id, Time_key, Product_desc

Adding Keys to Data
Operational rows:
#1 | Sale | 1/2/98 12:00:01 | Ham Pizza | $10.00
#2 | Sale | 1/2/98 12:00:02 | Cheese Pizza | $15.00
#3 | Sale | 1/2/98 12:00:02 | Anchovy Pizza | $12.00
#4 | Return | 1/2/98 12:00:03 | Anchovy Pizza | -$12.00
#5 | Sale | 1/2/98 12:00:04 | Sausage Pizza | $11.00
Warehouse rows:
#dw1 | Sale | 1/2/98 12:00:01 | Ham Pizza | $10.00
#dw2 | Sale | 1/2/98 12:00:02 | Cheese Pizza | $15.00
#dw3 | Sale | 1/2/98 12:00:04 | Sausage Pizza | $11.00
Keys can be data values or artificial keys.
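
Assigning artificial keys such as dw1, dw2, dw3 can be sketched in Python as follows. The `dw_key` column name and the function name are assumptions for illustration; in an Oracle database a sequence would normally generate such values.

```python
from itertools import count


def add_surrogate_keys(rows, prefix="dw", start=1):
    """Return copies of the rows, each with a new artificial 'dw_key' value."""
    counter = count(start)
    return [{"dw_key": f"{prefix}{next(counter)}", **row} for row in rows]
```

The warehouse key is independent of the operational key, so operational row #5 can become dw3 once the cancelled Anchovy Pizza rows have been merged away.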

Summarizing Data
1. During extraction on staging area
2. After loading to the warehouse server
Diagram: Operational databases -> Staging area -> Warehouse database
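
Whichever point is chosen, the summarizing step itself is an aggregation. The Python sketch below totals sales per item and day, as a staging-area script might do before loading. The column names item_id, sales_dollars, and sales_units follow the Sales table shown earlier in the lesson; the `day` column and the function name are assumptions.

```python
from collections import defaultdict


def summarize_by_day(sales):
    """Total sales_dollars and sales_units for each (item_id, day) pair."""
    totals = defaultdict(lambda: {"sales_dollars": 0.0, "sales_units": 0})
    for sale in sales:
        bucket = totals[(sale["item_id"], sale["day"])]
        bucket["sales_dollars"] += sale["sales_dollars"]
        bucket["sales_units"] += sale["sales_units"]
    return dict(totals)
```

Summarizing in the staging area reduces the volume that has to be loaded, while summarizing after the load keeps the detail rows available in the warehouse.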

Maintaining Transformation Metadata
Transformation metadata contains:
- Transformation rules
- Algorithms and routines
Diagram labels: Sources, Extract, Stage, Transform, Rules, Load, Publish, Query

Maintaining Transformation Metadata
- Restructure keys.
- Identify and resolve coding differences.
- Validate data from multiple sources.
- Handle exception rules.
- Identify and resolve format differences.
- Fix referential integrity inconsistencies.
- Identify summary data.

Data Ownership and Responsibilities
Data ownership and responsibilities should be shared by the:
- Operational team
- Data warehouse team
Business benefit is gained with the “work together” approach.

Transformation Timing and Location
Transformation is performed:
- Before load
- In parallel
Can be initiated at different points:
- On the operational platform
- In a separate staging area

Choosing a Transformation Point
- Workload
- Impact on environment
- CPU usage
- Disk space
- Network bandwidth
- Parallel execution
- Load window time
- User information needs

Monitoring and Tracking
Transformations should:
- Be self-documenting
- Provide summary statistics
- Handle process exceptions

Designing Transformation Processes
Analysis:
- Sources and target mappings, business rules
- Key users, metadata, grain
Design options:
- Tools (OWB)
- Custom 3GL programs
- 4GLs such as SQL or PL/SQL
- Replication
Design issues:
- Performance
- Size of the staging area
- Exception handling, integrity maintenance

Transformation Tools
- SQL*Loader
- Oracle Warehouse Builder (OWB) supports:
  - Predefined transformations
  - Custom transformations

Oracle’s Enhanced Features for Transformation
Transformation methods: Multistage transformation
Steps shown:
- Load into staging tables.
- Transform data.
- Validate data.
- Merge into warehouse tables.
Data stores shown: Flat files, Staging table 1, Staging table 2, Data warehouse

Oracle’s Enhanced Features for Transformation
Transformation methods:
- Pipelined transformation
- External tables
Steps shown:
- Transform data.
- Validate data.
- Merge into warehouse tables.
Components shown: Flat files, External table, Table functions, Warehouse tables

Oracle’s Enhanced Features for Transformation
Transformation mechanisms using SQL:
- CREATE TABLE AS SELECT (CTAS)
- UPDATE
- MERGE
- Multitable INSERT
Diagram: Cust -> MERGE -> Customer (existing row updated, new row inserted)

Application of the MERGE Statement in Data Warehousing
An example:

    MERGE INTO customers C
    USING cust_src S
    ON (c.cust_id = s.src_cust_id)
    WHEN MATCHED THEN
      UPDATE SET c.cust_address = s.cust_address
    WHEN NOT MATCHED THEN
      INSERT (cust_id, cust_first_name,…)
      VALUES (src_cust_id, src_first_name,…);
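
The update-or-insert behaviour of MERGE can be mimicked in plain Python to make the two branches explicit. This is only an analogy over in-memory dictionaries, not Oracle code. The column names follow the slide; storing cust_address on insert is an assumption, because the slide elides the rest of the column list.

```python
def merge_customers(customers, cust_src):
    """Apply source rows to `customers` (a dict keyed by cust_id), in place.

    Returns (updated, inserted) row counts.
    """
    updated = inserted = 0
    for src in cust_src:
        cust_id = src["src_cust_id"]
        if cust_id in customers:
            # WHEN MATCHED: only the address is refreshed.
            customers[cust_id]["cust_address"] = src["cust_address"]
            updated += 1
        else:
            # WHEN NOT MATCHED: a new warehouse row is created.
            customers[cust_id] = {
                "cust_id": cust_id,
                "cust_first_name": src["src_first_name"],
                "cust_address": src["cust_address"],  # assumed column
            }
            inserted += 1
    return updated, inserted
```

A single pass over the source decides per row between update and insert, which is the work that would otherwise need a separate UPDATE and INSERT statement.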

Multitable INSERT Statements
Types:
- Unconditional INSERT
- Pivoting INSERT
- Conditional ALL INSERT
- Conditional FIRST INSERT
Diagram: Source table -> Condition -> Target table 1, Target table 2, Target table 3

Advantages of Multitable INSERTs
- Eliminates the need for multiple INSERT…AS SELECT statements to populate multiple tables
- Eliminates the need for a procedure to perform multiple INSERTs using IF…THEN…ELSE syntax
- Significant performance improvement over the preceding two methods due to the elimination of the cost of repeated scans on the source data
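
The single-scan idea behind a conditional multitable INSERT can be illustrated in Python: the source is read once, and each row is routed to every target whose condition holds (ALL) or only to the first one (FIRST). This is an analogy, not Oracle syntax; the function name, the target names, and the conditions in the usage note are made up for the example.

```python
def multitable_insert(source_rows, branches, mode="ALL"):
    """Route rows to targets in one pass over the source.

    `branches` is an ordered list of (target_name, condition) pairs.
    mode="ALL" adds a row to every target whose condition is true;
    mode="FIRST" adds it only to the first such target.
    """
    if mode not in ("ALL", "FIRST"):
        raise ValueError("mode must be 'ALL' or 'FIRST'")
    targets = {name: [] for name, _ in branches}
    for row in source_rows:  # the source is scanned exactly once
        for name, condition in branches:
            if condition(row):
                targets[name].append(row)
                if mode == "FIRST":
                    break
    return targets
```

With branches such as "amount over 100" followed by "amount over 10", a row of 500 lands in both targets under ALL but only in the first under FIRST, so the order of the conditions matters in FIRST mode.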

Oracle’s Enhanced Features for Transformation
Transformation mechanisms:
Using PL/SQL:
- Used for complex transformations
Using table functions. Table functions can:
- Return multiple rows from a function
- Accept results of multiple row SQL subqueries as input
- Take cursors as input
- Be parallelized
- Support incremental pipelining

Advantages of PL/SQL Table Functions
- Table functions “pipeline” the results to the consuming process as soon as they are produced.
- Table functions can return multiple rows during each invocation (pipelining of data).
- Pipelining eliminates the need for buffering the produced rows.
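
Python generators give a rough feel for pipelining: each stage hands a row to the next as soon as it is produced, so nothing is buffered between stages. The sketch below is an analogy only and does not show PL/SQL table functions; the stage names and the `log` list, which records the order of work, are assumptions added for illustration.

```python
def extract(records, log):
    """First stage: yield source records one at a time."""
    for record in records:
        log.append(f"extract {record}")
        yield record


def transform(rows, log):
    """Second stage: consume rows as they arrive and yield transformed rows."""
    for row in rows:
        log.append(f"transform {row}")
        yield row.upper()
```

Asking the pipeline for its first result causes only the first record to be extracted and transformed; the second record is not touched until the consumer asks again.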

Summary
In this lesson, you should have learned how to:
- Define transformation
- Identify possible staging models
- Identify data anomalies and eliminate them
- Explain the importance of quality data
- Describe techniques for transforming data
- Design a transformation process
- Describe Oracle’s enhanced features and tools that can be used to transform data

Practice 6-1: Overview
This practice covers the following topics:
- Identifying the suitable staging model for the RISD data warehouse
- Identifying the problems and the best-suited transformation techniques for the RISD data, based on the given scenario
- Exploring the viewlet-based demonstrations of the ETL features of Oracle Warehouse Builder
