Building the Warehouse
Copyright Oracle Corporation, All rights reserved.
Overview
Project Management (Methodology, Maintaining Metadata)
Defining DW Concepts & Terminology
Planning for a Successful Warehouse
Analyzing User Query Needs
Choosing a Computing Architecture
Modeling the Data Warehouse
Planning Warehouse Storage
ETT (Building the Warehouse)
Meeting a Business Need
Supporting End User Access
Managing the Data Warehouse
Objectives
After completing this lesson, you should be able to do the following:
Outline the extraction, transformation, and transportation processes for building a data warehouse
Identify extraction issues
Explain how to examine data sources
Identify extraction techniques
List tools that can be used to extract data from sources
Extraction/Transformation/Transportation Processes (ETT)
Extract source data
Transform/clean data
Index and summarize
Load data into WH
Detect changes
Refresh data
[Figure: ETT moves data from operational systems to the warehouse by means of programs, tools, and gateways]
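The following is a minimal sketch of how the steps listed above fit together. It uses in-memory Python structures in place of real operational systems and a real warehouse, and the order rows are hypothetical. Change detection and refresh are left out here.

```python
from collections import defaultdict

# Hypothetical operational rows: (order_id, region, amount as text).
SOURCE_ROWS = [
    ("1", " east ", "100.50"),
    ("2", "WEST", "20"),
    ("3", "East", "bad"),      # unparseable amount: diverted by the clean step
    ("4", "west", "79.50"),
]

def extract(rows):
    """Extract: read raw records from the operational source."""
    for row in rows:
        yield row

def transform(rows, rejects):
    """Transform/clean: normalize text, convert types, divert bad rows."""
    for order_id, region, amount in rows:
        try:
            yield int(order_id), region.strip().upper(), float(amount)
        except ValueError:
            rejects.append((order_id, region, amount))

def load(rows, warehouse):
    """Load: store cleaned rows in the warehouse, keyed by order_id."""
    for order_id, region, amount in rows:
        warehouse[order_id] = (region, amount)

def summarize(warehouse):
    """Summarize: precompute total amount per region."""
    totals = defaultdict(float)
    for region, amount in warehouse.values():
        totals[region] += amount
    return dict(totals)

warehouse, rejects = {}, []
load(transform(extract(SOURCE_ROWS), rejects), warehouse)
print(summarize(warehouse))  # {'EAST': 100.5, 'WEST': 99.5}
print(rejects)               # [('3', 'East', 'bad')]
```

Because the steps are chained generators, each record flows through extract, transform, and load one at a time rather than being copied in full between stages.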
ETT Processes
Must result in data that is relevant, useful, high-quality, accurate, and accessible
Require a large proportion of warehouse development time and resources
[Figure: ETT cleans up, consolidates, and restructures operational data so that warehouse data is relevant, useful, high-quality, accurate, and accessible]
Data Staging Area
The construction site for the warehouse
Required by most implementations
Composed of an operational data store (ODS), flat files, or relational server tables
Frequently configured as multitier staging
[Figure: operational system, extract, data staging area (transform), transport (load), warehouse]
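One way to picture a staging area built from relational tables is the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a relational server. The table names `stg_customer` and `dw_customer` and the cleaning rules are hypothetical. Raw data lands in the staging table untouched, is cleaned there, and only then moves into the warehouse table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Staging table: untyped text columns, exactly as extracted (hypothetical layout).
cur.execute("CREATE TABLE stg_customer (cust_no TEXT, name TEXT, country TEXT)")
# Warehouse table: typed, constrained columns.
cur.execute(
    "CREATE TABLE dw_customer (cust_no INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT)"
)

# 1. Extract: land source records in the staging area as-is.
raw = [("101", "  Bloggs ", "uk"), ("102", "Smith", "US"), ("", "NoKey", "fr")]
cur.executemany("INSERT INTO stg_customer VALUES (?, ?, ?)", raw)

# 2. Transform inside the staging area: trim names, upper-case countries,
#    and drop rows with no key, without touching the warehouse table.
cur.execute("UPDATE stg_customer SET name = TRIM(name), country = UPPER(country)")
cur.execute("DELETE FROM stg_customer WHERE cust_no = ''")

# 3. Transport (load): move the cleaned rows into the warehouse table.
cur.execute(
    "INSERT INTO dw_customer (cust_no, name, country) "
    "SELECT CAST(cust_no AS INTEGER), name, country FROM stg_customer"
)
conn.commit()

print(cur.execute("SELECT * FROM dw_customer ORDER BY cust_no").fetchall())
# [(101, 'Bloggs', 'UK'), (102, 'Smith', 'US')]
```

In a multitier configuration, step 2 could be split across several staging tables, one per cleaning stage, before the final load.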
Remote Staging Model
Data staging area within the warehouse environment
[Figure: the operational environment holds the operational system; extract, transform, and transport (load) move data into a staging area that sits with the warehouse in the warehouse environment]
Data staging area in its own environment, avoiding negative impact on the warehouse environment
[Figure: operational environment, separate staging environment, and warehouse environment, connected by extract, transform, and transport (load)]
Onsite Staging Model
Data staging area within the operational environment, possibly affecting the operational system
[Figure: the operational system and data staging area share the operational environment; transport (load) moves data into the warehouse environment]
Extracting Data
Routines developed to select fields from source
Various data formats
Rules, audit trails, error correction facilities
[Figure: operational databases, data mapping, data staging area, transform, warehouse database]
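The sketch below shows how extraction routines can select the same fields from sources in different formats. It handles a fixed-width file and a CSV file, applies one rule, and writes every decision to an audit trail. The record layouts and the rule (quantity must be a non-negative integer) are hypothetical, and error correction is not attempted: rejected rows are only logged.

```python
import csv
import io
from itertools import chain

# Hypothetical fixed-width layout for one source: (field, start, end) offsets.
FIXED_LAYOUT = [("cust_no", 0, 5), ("name", 5, 15), ("qty", 15, 19)]

def extract_fixed(lines):
    """Select fields from fixed-width records by column position."""
    for line in lines:
        yield {name: line[start:end].strip() for name, start, end in FIXED_LAYOUT}

def extract_csv(text, wanted=("cust_no", "name", "qty")):
    """Select only the wanted columns from a CSV source that has a header row."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {col: row[col].strip() for col in wanted}

def parse_qty(text):
    """Rule: quantity must be a non-negative integer; return None otherwise."""
    try:
        value = int(text)
    except ValueError:
        return None
    return value if value >= 0 else None

def apply_rules(records, audit):
    """Keep records that pass the rule; log every decision to the audit trail."""
    for rec in records:
        qty = parse_qty(rec["qty"])
        if qty is None:
            audit.append(("rejected", rec["cust_no"], f"bad qty {rec['qty']!r}"))
        else:
            audit.append(("accepted", rec["cust_no"]))
            yield {**rec, "qty": qty}

fixed_source = ["00101" + "Bloggs".ljust(10) + "0012"]
csv_source = "cust_no,name,region,qty\n00102,Smith,EU,7\n00103,Jones,US,-3\n"

audit = []
clean = list(apply_rules(chain(extract_fixed(fixed_source), extract_csv(csv_source)), audit))
print(clean)
# [{'cust_no': '00101', 'name': 'Bloggs', 'qty': 12}, {'cust_no': '00102', 'name': 'Smith', 'qty': 7}]
print(audit)
# [('accepted', '00101'), ('accepted', '00102'), ('rejected', '00103', "bad qty '-3'")]
```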
Source Systems
Production
Archive
Internal
External
Production Data
Operating system platforms
Hardware platforms
File systems
Database systems and vertical applications
Examples: IMS, DB2, VSAM, NonStop SQL, Oracle, Sybase, Rdb, SAP, Shared Medical Systems, Dun and Bradstreet Financials, Hogan Financials, Oracle Financials
Archive Data
Historical data
Useful for analysis over long periods of time
Useful for first-time load
May require unique transformations
[Figure: operational databases, archive data, warehouse database]
Internal Data
Planning, sales, and marketing organization data
Maintained by:
–Spreadsheets (structured)
–Documents (unstructured)
Treated like any other source data
[Figure: planning, marketing, and accounting data feeding the warehouse database]
External Data
Information from outside the organization
Issues of frequency, format, and predictability
Described and tracked using metadata
Examples: Barron's, Dun and Bradstreet, purchased databases, Wall Street Journal, economic forecasts, competitive information, warehousing databases, A.C. Nielsen, IRI, IMS, Walsh America
Mapping
Defines which operational attributes to use
Defines how to transform the attributes for the warehouse
Defines where the attributes exist in the warehouse
Mapping tools are available
Example (metadata mapping File A to Staging File One):
Source field | Source value | Target field | Target value
F1 | 123 | Number | USA123
F2 | Bloggs | Name | Mr. Bloggs
F3 | 10/12/56 | DOB | 10-Dec-56
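The mapping example can be expressed as metadata that a program reads: for each source field, a target field and a transformation rule. In this sketch, the date rule follows the example (DD/MM/YY in, DD-Mon-YY out). The "USA" prefix and the "Mr." title are hypothetical rules chosen only to reproduce the example values. A real mapping would derive them from source data or reference tables.

```python
from datetime import datetime

def to_number(value):
    # Assumption: the target key carries a "USA" prefix (illustrative only).
    return "USA" + value

def to_name(value):
    # Assumption: a title is prepended; a real rule would derive it from source data.
    return "Mr. " + value

def to_dob(value):
    # Source dates are read as DD/MM/YY and written as DD-Mon-YY.
    parsed = datetime.strptime(value, "%d/%m/%y")
    # %y maps 00-68 to 2000-2068; assume a birth date is never in the future.
    if parsed > datetime.now():
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed.strftime("%d-%b-%y")

# Mapping metadata: source field -> (target field, transformation rule).
MAPPING = {
    "F1": ("Number", to_number),
    "F2": ("Name", to_name),
    "F3": ("DOB", to_dob),
}

def apply_mapping(record, mapping=MAPPING):
    """Build a staging record from a source record using the mapping metadata."""
    return {target: rule(record[source]) for source, (target, rule) in mapping.items()}

print(apply_mapping({"F1": "123", "F2": "Bloggs", "F3": "10/12/56"}))
# {'Number': 'USA123', 'Name': 'Mr. Bloggs', 'DOB': '10-Dec-56'}
```

Keeping the mapping in a table rather than in code mirrors the slide's point: the metadata, not the program, defines which attributes are used and where they land.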
Extraction Techniques
Programs: C, COBOL, PL/SQL
Gateways: transparent database access
In-house development is popular
Tools
–High initial cost
–Ongoing automation
–Data cleanup
Sources and Targets
[Figure: sources feed an ODS and the warehouse, which support access through OLAP, data marts, data analysis, and data mining]
Designing Extraction Processes
Analysis:
–Sources, technologies
–Data types, quality, owners
Design options:
–Manual, custom, gateway, third-party
–Replication, full, or delta refresh
Design issues:
–Batch window, volumes, data currency
–Automation, skills needed, resources
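To make the full versus delta refresh option concrete, the sketch below compares two keyed snapshots of a source table. From the comparison it derives inserts, updates, and deletes, then applies only those changes to the target. Snapshot comparison is just one way to detect changes; timestamps, triggers, and log-based capture are common alternatives. The customer rows are hypothetical.

```python
def compute_delta(previous, current):
    """Classify changes between two snapshots keyed by primary key."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {k: v for k, v in current.items() if k in previous and previous[k] != v}
    deletes = [k for k in previous if k not in current]
    return inserts, updates, deletes

def apply_delta(target, inserts, updates, deletes):
    """Delta refresh: touch only the rows that changed."""
    target.update(inserts)
    target.update(updates)
    for k in deletes:
        target.pop(k, None)
    return target

def full_refresh(current):
    """Full refresh: replace the target with a complete copy of the source."""
    return dict(current)

yesterday = {1: ("Bloggs", "UK"), 2: ("Smith", "US"), 3: ("Jones", "FR")}
today = {1: ("Bloggs", "UK"), 2: ("Smith", "CA"), 4: ("Lee", "SG")}

inserts, updates, deletes = compute_delta(yesterday, today)
print(inserts, updates, deletes)  # {4: ('Lee', 'SG')} {2: ('Smith', 'CA')} [3]

warehouse = apply_delta(dict(yesterday), inserts, updates, deletes)
print(warehouse == full_refresh(today))  # True
```

Both approaches produce the same result. The delta refresh writes three rows instead of three full copies, which matters when volumes are large and the batch window is short.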
Maintaining Extraction Metadata
Source location, type, structure
Access method
Privilege information
Temporary storage
Failure procedures
Validity checks
Handlers for missing data
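The items listed above can be recorded per source in a structure that an extraction job consults at run time. This is a minimal sketch using a Python dataclass. The field names, paths, account name, and failure procedure are all hypothetical. Only the validity checks and missing-data handlers are acted on; the other fields are descriptive.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class ExtractionMetadata:
    """Extraction metadata for one source; field names are illustrative."""
    source_location: str
    source_type: str
    structure: List[str]          # expected columns, in order
    access_method: str
    privilege: str                # account used to read the source
    temp_storage: str             # where extracted data is staged
    failure_procedure: str
    validity_checks: Dict[str, Callable[[str], bool]] = field(default_factory=dict)
    missing_defaults: Dict[str, str] = field(default_factory=dict)

    def prepare(self, record: Dict[str, str]) -> Optional[Dict[str, str]]:
        """Fill missing values, then apply validity checks; None means reject."""
        out = {}
        for col in self.structure:
            value = record.get(col) or self.missing_defaults.get(col, "")
            check = self.validity_checks.get(col)
            if check is not None and not check(value):
                return None
            out[col] = value
        return out

meta = ExtractionMetadata(
    source_location="/data/in/customers.csv",   # hypothetical path
    source_type="CSV file",
    structure=["cust_no", "name", "country"],
    access_method="file read",
    privilege="etl_reader",                      # hypothetical account
    temp_storage="/data/stage",                  # hypothetical path
    failure_procedure="retry once, then notify the warehouse administrator",
    validity_checks={"cust_no": str.isdigit},
    missing_defaults={"country": "UNKNOWN"},
)

print(meta.prepare({"cust_no": "101", "name": "Bloggs"}))
# {'cust_no': '101', 'name': 'Bloggs', 'country': 'UNKNOWN'}
print(meta.prepare({"cust_no": "A1", "name": "Smith", "country": "US"}))
# None
```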
Possible ETT Failures
A missing source file
A system failure
Inadequate metadata
Poor mapping information
Inadequate storage planning
A source structural change
No contingency plan
Inadequate data validation
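Two of the failures listed above, a missing source file and a source structural change, can be caught before any data is loaded. The sketch below checks that a CSV source exists and that its header still matches the expected layout. It raises an error that the calling job could answer with its contingency plan. The expected header and file names are hypothetical, and temporary files stand in for real sources.

```python
import csv
import os
import tempfile

EXPECTED_HEADER = ["cust_no", "name", "country"]  # hypothetical source layout

class ExtractError(Exception):
    """Raised when a source fails a pre-extraction check."""

def check_source(path, expected_header=EXPECTED_HEADER):
    """Fail fast on a missing file or a changed source structure."""
    if not os.path.exists(path):
        raise ExtractError(f"missing source file: {path}")
    with open(path, newline="") as f:
        header = next(csv.reader(f), None)
    if header != expected_header:
        raise ExtractError(f"structure changed: expected {expected_header}, got {header}")

results = {}
with tempfile.TemporaryDirectory() as tmp:
    files = {
        "good.csv": "cust_no,name,country\n101,Bloggs,UK\n",
        "changed.csv": "cust_no,full_name,country\n101,Bloggs,UK\n",
    }
    for name, text in files.items():
        with open(os.path.join(tmp, name), "w", newline="") as f:
            f.write(text)
    for name in ("good.csv", "changed.csv", "absent.csv"):
        try:
            check_source(os.path.join(tmp, name))
            results[name] = "ok"
        except ExtractError as exc:
            # A real job would run its contingency plan here instead of loading.
            results[name] = str(exc).split(":")[0]

print(results)
# {'good.csv': 'ok', 'changed.csv': 'structure changed', 'absent.csv': 'missing source file'}
```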
Maintaining ETT Quality
ETT must be:
–Tested
–Documented
–Monitored and reviewed
Disparate metadata must be coordinated
Extraction Tools
[Figure: extraction tool screen "Map Source Data to Intermediate File Store" (Sales and Marketing; Customer Name; Char to Varchar 20; unique name), producing mapping information, metadata updates, and JCL files]
Selection Criteria
Base functionality
Interface features
Metadata repository
Open API
Metadata access
Repository utilities
Input and output processing
Cleansing, reformatting, and auditing
References
Training requirements
WTI Partner ETT Tools
Carleton
Constellar
Evolutionary Technologies
Informatica
Information Builders
Oracle: EDMS, Toolkits, OADW
Prism Solutions
Sagent
Vality Technology
Summary
This lesson discussed the following topics:
ETT processes are essential and consume a large proportion of warehouse resources and time
The extraction process acquires source data
You may encounter many data sources
There are many data extraction issues
ETT tools should be considered
Practice 10-1 Overview
This practice covers the following topics:
Answering a series of short questions
Marking a series of statements as true or false