Copyright © Oracle Corporation, 1999. All rights reserved. 10 Building the Warehouse.

Presentation transcript:

Building the Warehouse

10-2 Overview
- Project Management (Methodology, Maintaining Metadata)
- Defining DW Concepts & Terminology
- Planning for a Successful Warehouse
- Analyzing User Query Needs
- Choosing a Computing Architecture
- Modeling the Data Warehouse
- Planning Warehouse Storage
- ETT (Building the Warehouse)
- Meeting a Business Need
- Supporting End User Access
- Managing the Data Warehouse

10-3 Objectives
After completing this lesson, you should be able to do the following:
- Outline the extraction, transformation, and transportation processes for building a data warehouse
- Identify extraction issues
- Explain how to examine data sources
- Identify extraction techniques
- List tools that can be used to extract data from sources

10-4 Extraction/Transformation/Transportation Processes (ETT)
- Extract source data
- Transform/clean data
- Index and summarize
- Load data into WH
- Detect changes
- Refresh data
[Diagram: operational systems feed the warehouse through ETT, which is carried out with programs, tools, and gateways.]
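
The extract, transform, and load steps listed on this slide can be sketched as three small functions. This is only an illustrative outline, not the course's implementation: the row layout, the cleaning rule (trim whitespace, drop rows without a name), and the in-memory "warehouse" list are all assumptions made for the example.

```python
from typing import Iterable

Row = dict[str, str]


def extract(source: Iterable[Row]) -> list[Row]:
    """Extract: copy rows out of the (hypothetical) operational source."""
    return [dict(row) for row in source]


def transform(rows: list[Row]) -> list[Row]:
    """Transform/clean: trim whitespace and drop rows that have no name."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if row.get("name"):
            cleaned.append(row)
    return cleaned


def load(rows: list[Row], warehouse: list[Row]) -> int:
    """Transport (load): append rows to the warehouse and return the count."""
    warehouse.extend(rows)
    return len(rows)


# Hypothetical sample data standing in for an operational system.
operational = [{"id": "1", "name": " Bloggs "}, {"id": "2", "name": ""}]
warehouse: list[Row] = []
loaded = load(transform(extract(operational)), warehouse)
```

Keeping the three steps as separate functions mirrors the slide's separation of concerns and lets each step be tested on its own.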

10-5 ETT Processes
- Must result in data that is relevant, useful, high-quality, accurate, and accessible
- Require a large proportion of warehouse development time and resources
[Diagram: operational systems pass through ETT (clean up, consolidate, restructure) into the warehouse, yielding data that is relevant, useful, quality, accurate, and accessible.]

10-6 Data Staging Area
- The construction site for the warehouse
- Required by most implementations
- Composed of ODS, flat files, or relational server tables
- Frequently configured as multitier staging
[Diagram: operational system, extract, data staging area (transform), transport (load), warehouse.]

10-7 Remote Staging Model
- Data staging area within the warehouse environment
- Data staging area in its own environment, avoiding negative impact on the warehouse environment
[Diagrams: in both variants the operational system sits in the operational environment, and data passes through extract, transform, and transport (load) steps via the data staging area to the warehouse. The staging area sits either inside the warehouse environment or in a separate staging environment.]

10-8 Onsite Staging Model
- Data staging area within the operational environment, possibly affecting the operational system
[Diagram: the operational system and the data staging area share the operational environment; data is extracted, transformed, and transported (loaded) into the warehouse in the warehouse environment.]

10-9 Extracting Data
- Routines developed to select fields from source
- Various data formats
- Rules, audit trails, error correction facilities
[Diagram: operational databases, data mapping, data staging area (transform), warehouse database.]
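
A routine that selects fields from a source and keeps an audit trail might look roughly like the sketch below. The function name `extract_fields`, the record layout, and the audit message wording are hypothetical; the slide names the ideas only.

```python
def extract_fields(records: list[dict], wanted: list[str], audit: list[str]) -> list[dict]:
    """Keep only the wanted fields; reject and log records that lack any of them."""
    selected = []
    for line_no, record in enumerate(records, start=1):
        missing = [name for name in wanted if name not in record]
        if missing:
            # Audit trail entry for a rejected record (rule: all wanted fields required).
            audit.append(f"record {line_no}: rejected, missing {missing}")
            continue
        selected.append({name: record[name] for name in wanted})
    audit.append(f"extracted {len(selected)} of {len(records)} records")
    return selected


# Hypothetical source records; the second one lacks the "name" field.
source_records = [{"id": "1", "name": "Bloggs", "notes": "x"}, {"id": "2"}]
audit_trail: list[str] = []
selected_rows = extract_fields(source_records, ["id", "name"], audit_trail)
```

Rejected records are logged rather than silently dropped, so an error correction step can pick them up later.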

10-10 Source Systems
- Production
- Archive
- Internal
- External

10-11 Production Data
- Operating system platforms
- Hardware platforms
- File systems
- Database systems and vertical applications
Examples shown: IMS, DB2, VSAM, NonStop SQL, Oracle, Sybase, Rdb, SAP, Shared Medical Systems, Dun and Bradstreet Financials, Hogan Financials, Oracle Financials

10-12 Archive Data
- Historical data
- Useful for analysis over long periods of time
- Useful for first-time load
- May require unique transformations
[Diagram: operational databases, warehouse database.]

10-13 Internal Data
- Planning, sales, and marketing organization data
- Maintained by:
  – Spreadsheets (structured)
  – Documents (unstructured)
- Treated like any other source data
[Diagram: spreadsheets from Planning, Marketing, and Accounting feed the warehouse database.]

10-14 External Data
- Information from outside the organization
- Issues of frequency, format, and predictability
- Described and tracked using metadata
Examples shown: Barron's, Dun and Bradstreet, purchased databases, Wall Street Journal, economic forecasts, competitive information, warehousing databases, A.C. Nielsen, IRI, IMS, Walsh America

10-15 Mapping
- Defines which operational attributes to use
- Defines how to transform the attributes for the warehouse
- Defines where the attributes exist in the warehouse
- Mapping tools are available

Example:
File A field   File A value   Staging File One field   Staging File One value
F1             123            Number                   USA123
F2             Bloggs         Name                     Mr. Bloggs
F3             10/12/56       DOB                      10-Dec-56

Metadata (File A to Staging File One): F1 to Number, F2 to Name, F3 to DOB
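
The mapping example on this slide (F1, F2, F3 in File A becoming Number, Name, DOB in Staging File One) can be expressed as a small metadata-driven routine. The field map comes from the slide; the transformation rules are assumptions inferred from its single example row: that "USA" is a fixed country prefix, that "Mr. " is a fixed title, and that the source date is day/month/year.

```python
# Mapping metadata from the slide: File A field -> Staging File One field.
FIELD_MAP = {"F1": "Number", "F2": "Name", "F3": "DOB"}

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def to_staging_date(value: str) -> str:
    """Reformat dd/mm/yy as dd-Mon-yy (assumes day comes first in the source)."""
    day, month, year = value.split("/")
    return f"{int(day):02d}-{MONTHS[int(month) - 1]}-{year}"


# Assumed transformation rules, guessed from the slide's single example row.
TRANSFORMS = {
    "Number": lambda value: "USA" + value,
    "Name": lambda value: "Mr. " + value,
    "DOB": to_staging_date,
}


def map_record(source: dict[str, str]) -> dict[str, str]:
    """Apply the field map and per-field transformations to one source record."""
    staged = {}
    for source_field, value in source.items():
        target_field = FIELD_MAP.get(source_field)
        if target_field is not None:
            staged[target_field] = TRANSFORMS[target_field](value)
    return staged


staged = map_record({"F1": "123", "F2": "Bloggs", "F3": "10/12/56"})
```

Holding the field map and the rules in data structures rather than in code paths is what makes the mapping describable as metadata.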

10-16 Extraction Techniques
- Programs: C, COBOL, PL/SQL
- Gateways: transparent database access
- In-house development is popular
- Tools
  – High initial cost
  – Ongoing automation
  – Data cleanup

10-17 Sources and Targets
[Diagram stages: Sources, ODS, Warehouse, Access. Other labels shown: OLAP, data marts, data analysis, data mining.]

10-18 Designing Extraction Processes
- Analysis:
  – Sources, technologies
  – Data types, quality, owners
- Design options:
  – Manual, custom, gateway, third-party
  – Replication, full, or delta refresh
- Design issues:
  – Batch window, volumes, data currency
  – Automation, skills needed, resources
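
To illustrate the difference between a full refresh and a delta refresh, the sketch below compares two keyed snapshots and reports only what changed. Snapshot comparison is just one possible way to detect a delta, chosen here for brevity; the function name and the sample rows are hypothetical.

```python
def compute_delta(previous: dict[str, dict], current: dict[str, dict]) -> dict:
    """Compare two snapshots keyed by primary key and return only the changes."""
    inserts = {key: row for key, row in current.items() if key not in previous}
    updates = {key: row for key, row in current.items()
               if key in previous and previous[key] != row}
    deletes = sorted(key for key in previous if key not in current)
    return {"inserts": inserts, "updates": updates, "deletes": deletes}


# Hypothetical snapshots of a source table taken at two extraction runs.
previous_snapshot = {
    "1": {"name": "Bloggs"},
    "2": {"name": "Smith"},
    "4": {"name": "Gone"},
}
current_snapshot = {
    "1": {"name": "Bloggs"},
    "2": {"name": "Smyth"},
    "3": {"name": "Jones"},
}
delta = compute_delta(previous_snapshot, current_snapshot)
```

A full refresh would reload all three current rows; the delta carries one insert, one update, and one delete, which matters when the batch window or data volumes are tight.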

10-19 Maintaining Extraction Metadata
- Source location, type, structure
- Access method
- Privilege information
- Temporary storage
- Failure procedures
- Validity checks
- Handlers for missing data
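
One way to picture the extraction metadata items on this slide is as a single record per source. The class below is a sketch under assumptions: the attribute names, the sample path, and the way validity checks and missing-data handlers are applied are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ExtractionMetadata:
    source_location: str
    source_type: str
    structure: list[str]            # ordered field names of the source
    access_method: str
    privileges: str
    temporary_storage: str
    failure_procedure: str
    validity_checks: dict[str, Callable[[str], bool]] = field(default_factory=dict)
    missing_defaults: dict[str, str] = field(default_factory=dict)

    def apply(self, row: dict[str, str]) -> dict[str, str]:
        """Fill missing fields with defaults, then enforce the validity checks."""
        filled = {name: row.get(name) or self.missing_defaults.get(name, "")
                  for name in self.structure}
        for name, check in self.validity_checks.items():
            if not check(filled.get(name, "")):
                raise ValueError(f"invalid {name}: {filled.get(name, '')!r}")
        return filled


# Hypothetical metadata for a customer flat file.
meta = ExtractionMetadata(
    source_location="/data/customers.dat",
    source_type="flat file",
    structure=["id", "name", "country"],
    access_method="file copy",
    privileges="read-only account",
    temporary_storage="staging/customers",
    failure_procedure="retry once, then alert the operator",
    validity_checks={"id": str.isdigit},
    missing_defaults={"country": "UNKNOWN"},
)
checked_row = meta.apply({"id": "42", "name": "Bloggs"})
```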

10-20 Possible ETT Failures
- A missing source file
- A system failure
- Inadequate metadata
- Poor mapping information
- Inadequate storage planning
- A source structural change
- No contingency plan
- Inadequate data validation
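
Two of the failures listed on this slide, a missing source file and a source structural change, can be caught before extraction starts. The pre-flight check below is a minimal sketch that assumes the source is a CSV file with a header row; the function name, file names, and column names are hypothetical.

```python
import csv
import os
import tempfile


def preflight(path: str, expected_columns: list[str]) -> list[str]:
    """Return the problems found before extraction starts (empty list means OK)."""
    if not os.path.exists(path):
        return [f"missing source file: {path}"]
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])
    if header != expected_columns:
        return [f"source structure changed: expected {expected_columns}, found {header}"]
    return []


# Demonstration against a temporary file that stands in for a real source.
with tempfile.TemporaryDirectory() as folder:
    source_path = os.path.join(folder, "customers.csv")
    with open(source_path, "w", newline="") as handle:
        handle.write("id,name\n1,Bloggs\n")
    ok = preflight(source_path, ["id", "name"])
    changed = preflight(source_path, ["id", "name", "dob"])
    missing = preflight(os.path.join(folder, "absent.csv"), ["id", "name"])
```

Returning a list of problems instead of raising lets a scheduler decide whether to retry, skip the source, or follow the contingency plan.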

10-21 Maintaining ETT Quality
- ETT must be:
  – Tested
  – Documented
  – Monitored and reviewed
- Disparate metadata must be coordinated

10-22 Extraction Tools
- Mapping information
- Update metadata
- JCL files
[Screenshot: Map Source Data to Intermediate File Store. Labels shown: Sales and Marketing, Customer Name, Char, Varchar 20, Unique name.]

10-23 Selection Criteria
- Base functionality
- Interface features
- Metadata repository
- Open API
- Metadata access
- Repository utilities
- Input and output processing
- Cleansing, reformatting, and auditing
- References
- Training requirements

10-24 WTI Partner ETT Tools
- Carleton
- Constellar
- Evolutionary Technologies
- Informatica
- Information Builders
- Oracle EDMS, Toolkits, OADW
- Prism Solutions
- Sagent
- Vality Technology

10-25 Summary
This lesson discussed the following topics:
- ETT processes are essential and consume a large proportion of warehouse resources and time
- The extraction process acquires source data
- You may encounter many data sources
- There are many data extraction issues
- ETT tools should be considered

10-26 Practice 10-1 Overview
This practice covers the following topics:
- Answering a series of short questions
- Specifying true or false to a series of statements