Chapter 16 Data Warehouse Technology and Management.

McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved.

Outline
Basic concepts and characteristics
Business architectures and applications
Data cube concepts and operators
Relational DBMS features
Populating a data warehouse

Comparison of Processing Environments
Transaction processing
  – Uses operational databases
  – Short-term decisions: fulfill orders, resolve complaints, provide staffing
Decision support processing
  – Uses integrated and summarized data
  – Medium and long-term decisions: capacity planning, store locations, new lines of business

Data Warehouse Definition and Characteristics
A central repository for summarized and integrated data from operational databases and external data sources
Key characteristics
  – Subject-oriented
  – Integrated
  – Time-variant
  – Nonvolatile

Data Comparison

Business Architectures and Applications
Data warehouse projects
Top-down architectures
Bottom-up architecture
Applications and data mining

Data Warehouse Projects
Large efforts with much coordination across departments
Enterprise data model
  – Important artifact of data warehouse project
  – Structure of data model
  – Meta data for data transformation
Top-down vs. bottom-up business architectures

Two Tier Architecture

Three Tier Architecture

Bottom-up Architecture

Applications

Data Mining
Discover significant, implicit patterns
  – Target promotions
  – Change mix and collocation of items
Requires large volumes of transaction data
Important application for data warehouses

Data Cube Concepts and Operators
Basics
Dimension and measure details
Operators

Data Cube Basics
Multidimensional arrangement of data
Users think about decision support data as data cubes
Terminology
  – Dimension: subject label for a row or column
  – Member: value of dimension
  – Measure: quantitative data stored in cells
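
The terminology can be made concrete with a small sketch. This is only an illustration, not the textbook's representation: the dimension names, member values, and measure figures are all hypothetical. A cube is modeled as a dictionary whose keys hold one member per dimension and whose values hold the measures stored in that cell.

```python
# Hypothetical sales cube; the dimensions are (location, product, time).
DIMENSIONS = ("location", "product", "time")

# Each key is one member per dimension; each value holds the measures of that cell.
cube = {
    ("Denver", "Printer", "2003-01-01"): {"sales": 100.0, "units": 4},
    ("Denver", "Scanner", "2003-01-01"): {"sales": 50.0, "units": 1},
    ("Seattle", "Printer", "2003-01-01"): {"sales": 75.0, "units": 3},
    ("Seattle", "Printer", "2003-01-02"): {"sales": 25.0, "units": 1},
}


def members(cube, dimension):
    """Return the sorted member values that appear along one dimension."""
    axis = DIMENSIONS.index(dimension)
    return sorted({key[axis] for key in cube})


def measure(cube, cell, name):
    """Look up one measure in one cell; None marks an empty cell."""
    values = cube.get(cell)
    return None if values is None else values[name]
```

For example, `members(cube, "location")` lists the members of the location dimension, and `measure(cube, ("Seattle", "Scanner", "2003-01-01"), "sales")` returns `None` because that cell holds no data.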

Data Cube Example

Dimension and Measure Details
Dimensions
  – Hierarchies: members can have sub members
  – Sparsity: many cells do not have data
Measures
  – Derived measures
  – Multiple measures in cells
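
Sparsity and derived measures can be sketched in a few lines. The cell values, the member lists, and the `profit` measure are assumptions made up for illustration; the slide does not define a particular sparsity metric, so the density ratio used here is just one simple choice.

```python
# Hypothetical two-dimensional cube (location, product) with two stored measures per cell.
cells = {
    ("Denver", "Printer"): {"sales": 100.0, "cost": 60.0},
    ("Denver", "Scanner"): {"sales": 50.0, "cost": 45.0},
    ("Seattle", "Printer"): {"sales": 75.0, "cost": 50.0},
}
locations = ["Denver", "Seattle", "Boston"]
products = ["Printer", "Scanner"]


def density(cells, *axes):
    """Fraction of possible cells that hold data; a low value means a sparse cube."""
    possible = 1
    for axis in axes:
        possible *= len(axis)
    return len(cells) / possible


def add_profit(cells):
    """Derived measure: profit is computed from the stored sales and cost measures."""
    return {
        key: {**measures, "profit": measures["sales"] - measures["cost"]}
        for key, measures in cells.items()
    }
```

Here only 3 of the 6 possible cells hold data, so `density(cells, locations, products)` is 0.5.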

Time Series Data
Common data type in trend analysis
Reduce dimensionality using time series
Time series properties
  – Data type
  – Start date
  – Calendar
  – Periodicity
  – Conversion

Slice Operator
Focus on a subset of dimensions
Set dimension to specific value: 1/1/2003

Dice Operator
Focus on a subset of member values
Replace dimension with a subset of values
Dice operation often follows a slice operation
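
A minimal sketch of the slice and dice operators, assuming a cube stored as a dictionary keyed by member tuples. The function names `slice_cube` and `dice_cube` and all data values are hypothetical. The slice fixes the time dimension to a single date, and the dice then keeps a subset of locations.

```python
# Hypothetical cube with dimensions (location, product, time) and a single sales measure.
cube = {
    ("Denver", "Printer", "2003-01-01"): 100.0,
    ("Denver", "Scanner", "2003-01-01"): 50.0,
    ("Seattle", "Printer", "2003-01-01"): 75.0,
    ("Seattle", "Printer", "2003-01-02"): 25.0,
}


def slice_cube(cube, axis, member):
    """Slice: fix one dimension to a single member and drop that dimension."""
    return {
        key[:axis] + key[axis + 1:]: value
        for key, value in cube.items()
        if key[axis] == member
    }


def dice_cube(cube, axis, keep):
    """Dice: keep only the cells whose member on the given dimension is in `keep`."""
    return {key: value for key, value in cube.items() if key[axis] in keep}


# Slice on time (axis 2), then dice on location (axis 0) of the two-dimensional result.
one_day = slice_cube(cube, 2, "2003-01-01")
denver_only = dice_cube(one_day, 0, {"Denver"})
```

After the slice, the result has only the location and product dimensions; the dice narrows it further without removing a dimension.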

Other Operators
Operators for hierarchical dimensions
  – Drill-down: add detail to a dimension
  – Roll-up: remove detail from a dimension
  – Recalculate measure values
Pivot: rearrange dimensions

Operator Summary
Operator   | Purpose                                    | Description
Slice      | Focus attention on a subset of dimensions  | Replace a dimension with a single member value or with a summary of its measure values
Dice       | Focus attention on a subset of member values | Replace a dimension with a subset of members
Drill-down | Obtain more detail about a dimension       | Navigate from a more general level to a more specific level
Roll-up    | Summarize details about a dimension        | Navigate from a more specific level to a more general level
Pivot      | Present data in a different order          | Rearrange the dimensions in a data cube
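
Roll-up and pivot can be sketched in the same style. The city-to-state hierarchy and the sales figures are hypothetical. Roll-up replaces each member with its parent and recalculates the measure by summing; pivot only reorders the dimensions. Drill-down is the reverse navigation and needs the detailed cube, so it amounts to returning to the finer-grained data.

```python
from collections import defaultdict

# Hypothetical hierarchy: each city (specific level) maps to a state (general level).
CITY_TO_STATE = {"Denver": "CO", "Boulder": "CO", "Seattle": "WA"}

# Hypothetical cube with dimensions (city, product) and a single sales measure.
cube = {
    ("Denver", "Printer"): 100.0,
    ("Boulder", "Printer"): 40.0,
    ("Seattle", "Printer"): 75.0,
    ("Seattle", "Scanner"): 25.0,
}


def roll_up(cube, axis, parent_of):
    """Roll-up: replace each member with its parent and sum the measure values."""
    totals = defaultdict(float)
    for key, value in cube.items():
        parent_key = key[:axis] + (parent_of[key[axis]],) + key[axis + 1:]
        totals[parent_key] += value
    return dict(totals)


def pivot(cube, order):
    """Pivot: rearrange the dimensions; the measure values are unchanged."""
    return {tuple(key[i] for i in order): value for key, value in cube.items()}


by_state = roll_up(cube, 0, CITY_TO_STATE)   # city level -> state level
product_first = pivot(cube, (1, 0))          # (city, product) -> (product, city)
```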

Relational DBMS Support
Data modeling
Dimension representation
GROUP BY extensions
Materialized views and query rewriting
Storage structures and optimization

Relational Data Modeling
Dimension table: contains member values
Fact table: contains measure values
1-M relationships from dimension to fact tables
Grain: most detailed measure values stored
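
A small star schema can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table and column names follow the SQL examples later in the deck (`Sales`, `Store`, `Time`, `SalesDollar`), but the `SalesNo` key, the column types, and all rows are made up. `Store` and `Time` are dimension tables, `Sales` is the fact table, and each fact row references one row in each dimension (the 1-M relationships).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold member values.
CREATE TABLE Store (StoreId INTEGER PRIMARY KEY, StoreState TEXT, StoreNation TEXT);
CREATE TABLE Time  (TimeNo INTEGER PRIMARY KEY, TimeMonth INTEGER, TimeYear INTEGER);

-- The fact table holds measure values at the grain of one sale per store per day.
CREATE TABLE Sales (
    SalesNo     INTEGER PRIMARY KEY,
    StoreId     INTEGER REFERENCES Store(StoreId),
    TimeNo      INTEGER REFERENCES Time(TimeNo),
    SalesDollar REAL
);

INSERT INTO Store VALUES (1, 'CO', 'USA'), (2, 'WA', 'USA'), (3, 'BC', 'Canada');
INSERT INTO Time  VALUES (1, 1, 2002), (2, 2, 2002), (3, 1, 2003);
INSERT INTO Sales VALUES (1, 1, 1, 100.0), (2, 1, 2, 50.0), (3, 2, 1, 75.0), (4, 3, 3, 20.0);
""")

# Join the fact table to its dimensions and summarize the measure.
rows = conn.execute("""
    SELECT StoreState, TimeYear, SUM(SalesDollar)
    FROM Sales
    JOIN Store ON Sales.StoreId = Store.StoreId
    JOIN Time  ON Sales.TimeNo = Time.TimeNo
    GROUP BY StoreState, TimeYear
    ORDER BY StoreState, TimeYear
""").fetchall()
```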

Star Schema Example

Constellation Schema Example

Snowflake Schema Example

Handling M-N Relationships
Source data may have M-N relationships, not 1-M relationships
Adjust fact or dimension tables for a fixed number of exceptions
More complex solutions to support M-N relationships with a variable number of connections

Dimension Representation
Star schema and variations lack dimension representation
Explicit dimension representation important to data cube operations and optimization
Proprietary extensions for dimension representation
Represent levels, hierarchies, and constraints

Oracle Dimension Representation
Levels: dimension components
Hierarchies: may have multiple hierarchies
Constraints: functional dependency relationships

CREATE DIMENSION Example

    CREATE DIMENSION StoreDim
      LEVEL StoreId IS Store.StoreId
      LEVEL City    IS Store.StoreCity
      LEVEL State   IS Store.StoreState
      LEVEL Zip     IS Store.StoreZip
      LEVEL Nation  IS Store.StoreNation
      LEVEL DivId   IS Division.DivId
      HIERARCHY CityRollup (
        StoreId CHILD OF
        City    CHILD OF
        State   CHILD OF
        Nation )
      HIERARCHY ZipRollup (
        StoreId CHILD OF
        Zip     CHILD OF
        State   CHILD OF
        Nation )
      HIERARCHY DivisionRollup (
        StoreId CHILD OF
        DivId
        JOIN KEY Store.DivId REFERENCES DivId )
      ATTRIBUTE DivId DETERMINES Division.DivName
      ATTRIBUTE DivId DETERMINES Division.DivManager ;

GROUP BY Extensions
ROLLUP operator
CUBE operator
GROUPING SETS operator
Other extensions
  – Ranking
  – Ratios
  – Moving summary values

CUBE Example

    SELECT StoreZip, TimeMonth, SUM(SalesDollar) AS SumSales
    FROM Sales, Store, Time
    WHERE Sales.StoreId = Store.StoreId
      AND Sales.TimeNo = Time.TimeNo
      AND (StoreNation = 'USA' OR StoreNation = 'Canada')
      AND TimeYear = 2002
    GROUP BY CUBE (StoreZip, TimeMonth);

ROLLUP Example

    SELECT TimeMonth, TimeYear, SUM(SalesDollar) AS SumSales
    FROM Sales, Store, Time
    WHERE Sales.StoreId = Store.StoreId
      AND Sales.TimeNo = Time.TimeNo
      AND (StoreNation = 'USA' OR StoreNation = 'Canada')
      AND TimeYear BETWEEN 2002 AND 2003
    GROUP BY ROLLUP (TimeMonth, TimeYear);

GROUPING SETS Example

    SELECT StoreZip, TimeMonth, SUM(SalesDollar) AS SumSales
    FROM Sales, Store, Time
    WHERE Sales.StoreId = Store.StoreId
      AND Sales.TimeNo = Time.TimeNo
      AND (StoreNation = 'USA' OR StoreNation = 'Canada')
      AND TimeYear = 2002
    GROUP BY GROUPING SETS ((StoreZip, TimeMonth), StoreZip, TimeMonth, ());
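
The relationship among the three operators can be sketched by listing the grouping sets each one stands for. CUBE groups by every subset of its columns, while ROLLUP groups by each prefix of its column list down to the empty set (the grand total). The helper names `rollup_sets` and `cube_sets` are hypothetical, and the sketch only enumerates the groupings; it does not compute any aggregates.

```python
from itertools import combinations


def rollup_sets(columns):
    """ROLLUP(c1, ..., cn): each prefix of the column list, longest first, down to ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE(c1, ..., cn): every subset of the column list, largest first."""
    return [
        combo
        for n in range(len(columns), -1, -1)
        for combo in combinations(columns, n)
    ]
```

`cube_sets(["StoreZip", "TimeMonth"])` yields the same four groupings that the GROUPING SETS example spells out explicitly, so that query returns the same groups as the CUBE example. `rollup_sets(["TimeMonth", "TimeYear"])` yields three groupings: month and year together, month alone, and the grand total.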

Variations of the Grouping Operators
Partial cube
Partial rollup
Composite columns
CUBE and ROLLUP inside a GROUPING SETS operation

Materialized Views
Stored view
Periodically refreshed with source data
Usually contain summary data
Fast query response for summary data
Appropriate in query dominant environments

Materialized View Example

    CREATE MATERIALIZED VIEW MV1
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
      ENABLE QUERY REWRITE AS
    SELECT StoreState, TimeYear, SUM(SalesDollar) AS SUMDollar1
    FROM Sales, Store, Time
    WHERE Sales.StoreId = Store.StoreId
      AND Sales.TimeNo = Time.TimeNo
      AND TimeYear > 2000
    GROUP BY StoreState, TimeYear;

Query Rewriting
Substitution process
Materialized view replaces references to fact and dimension tables in a query
Query optimizer must evaluate whether the substitution will improve performance over the original query
More complex than query modification process for traditional views

Query Rewriting Example

    -- Data warehouse query
    SELECT StoreState, TimeYear, SUM(SalesDollar)
    FROM Sales, Store, Time
    WHERE Sales.StoreId = Store.StoreId
      AND Sales.TimeNo = Time.TimeNo
      AND StoreNation IN ('USA','Canada')
      AND TimeYear = 2002
    GROUP BY StoreState, TimeYear;

    -- Query Rewrite: replace Sales and Time tables with MV1
    SELECT DISTINCT MV1.StoreState, TimeYear, SumDollar1
    FROM MV1, Store
    WHERE MV1.StoreState = Store.StoreState
      AND TimeYear = 2002
      AND StoreNation IN ('USA','Canada');
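
The equivalence of the two queries can be checked on a toy database. This sketch uses Python's `sqlite3` module, which has no materialized views or automatic query rewriting, so `MV1` is built as an ordinary summary table and the rewritten query is run by hand. The `SalesNo` key and all rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store (StoreId INTEGER PRIMARY KEY, StoreState TEXT, StoreNation TEXT);
CREATE TABLE Time  (TimeNo INTEGER PRIMARY KEY, TimeMonth INTEGER, TimeYear INTEGER);
CREATE TABLE Sales (SalesNo INTEGER PRIMARY KEY, StoreId INTEGER, TimeNo INTEGER,
                    SalesDollar REAL);

INSERT INTO Store VALUES (1, 'CO', 'USA'), (2, 'CO', 'USA'), (3, 'WA', 'USA'),
                         (4, 'BC', 'Canada'), (5, 'NL', 'Mexico');
INSERT INTO Time  VALUES (1, 1, 2002), (2, 2, 2002), (3, 1, 2003);
INSERT INTO Sales VALUES (1, 1, 1, 100.0), (2, 2, 2, 50.0), (3, 3, 1, 75.0),
                         (4, 4, 2, 20.0), (5, 1, 3, 30.0), (6, 5, 1, 999.0);

-- Stand-in for the materialized view: a summary table built once from the base tables.
CREATE TABLE MV1 AS
SELECT StoreState, TimeYear, SUM(SalesDollar) AS SumDollar1
FROM Sales, Store, Time
WHERE Sales.StoreId = Store.StoreId AND Sales.TimeNo = Time.TimeNo AND TimeYear > 2000
GROUP BY StoreState, TimeYear;
""")

# Original data warehouse query against the fact and dimension tables.
original_rows = conn.execute("""
    SELECT StoreState, TimeYear, SUM(SalesDollar)
    FROM Sales, Store, Time
    WHERE Sales.StoreId = Store.StoreId AND Sales.TimeNo = Time.TimeNo
      AND StoreNation IN ('USA', 'Canada') AND TimeYear = 2002
    GROUP BY StoreState, TimeYear
    ORDER BY StoreState
""").fetchall()

# Rewritten query: MV1 replaces Sales and Time; Store is still needed for StoreNation.
rewritten_rows = conn.execute("""
    SELECT DISTINCT MV1.StoreState, TimeYear, SumDollar1
    FROM MV1, Store
    WHERE MV1.StoreState = Store.StoreState
      AND TimeYear = 2002 AND StoreNation IN ('USA', 'Canada')
    ORDER BY MV1.StoreState
""").fetchall()
```

The `DISTINCT` matters because the join on `StoreState` matches each summary row once per store in that state; two Colorado stores would otherwise duplicate the Colorado row.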

Storage and Optimization Technologies
MOLAP: direct storage and manipulation of data cubes
ROLAP: relational extensions to support multidimensional data
HOLAP: combine MOLAP and ROLAP storage engines

ROLAP Techniques
Bitmap join indexes
Star join optimization
Query rewriting
Summary storage advisors
Parallel query execution

Populating a Data Warehouse
Data sources
Workflow representation
Optimizing the refresh process

Data Sources
Cooperative
Logged
Queryable
Snapshot

Maintenance Workflow

Data Quality Problems
Multiple identifiers
Multiple field names
Different units
Missing values
Orphaned values
Multipurpose fields
Conflicting data
Different update times

ETL Tools
Extraction, Transformation, and Loading
Specification based
Eliminate custom coding
Third party and DBMS based tools

Refresh Optimization

Determining the Refresh Frequency
Maximize net refresh benefit
  – Value of data timeliness
  – Cost of refresh
Satisfy data warehouse and source system constraints
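
One way to read the slide is that net refresh benefit is the value of data timeliness minus the cost of refresh, maximized over the frequencies that the constraints allow. The sketch below follows that reading. The function name, the candidate frequencies, and every value and cost figure are made up for illustration; in practice both quantities would have to be estimated.

```python
def best_refresh_frequency(candidates, timeliness_value, refresh_cost, allowed=lambda f: True):
    """Pick the feasible frequency with the largest net benefit (value minus cost)."""
    feasible = [f for f in candidates if allowed(f)]
    return max(feasible, key=lambda f: timeliness_value[f] - refresh_cost[f])


# Hypothetical figures, keyed by refreshes per week.
value = {1: 40.0, 7: 90.0, 28: 100.0}
cost = {1: 5.0, 7: 30.0, 28: 120.0}

unconstrained = best_refresh_frequency([1, 7, 28], value, cost)

# Hypothetical source system constraint: at most one extraction per week.
constrained = best_refresh_frequency([1, 7, 28], value, cost, allowed=lambda f: f <= 1)
```

With these figures the net benefits are 35, 60, and -20, so daily refresh wins unless the source system constraint rules it out.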

Determining the Level of Historical Integrity
Primarily an issue for dimension updates
Type I: overwrite old values
Type II: version numbers for an unlimited history
Type III: new columns for a limited history
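
The three types can be sketched as three ways of applying the same dimension update, here a store that changes city. The row layout, the `Version` column, and the `Prev` column prefix are assumptions chosen for illustration, not a prescribed design.

```python
def type1_overwrite(row, column, new_value):
    """Type I: overwrite the old value; no history is kept."""
    updated = dict(row)
    updated[column] = new_value
    return updated


def type2_new_version(rows, store_id, column, new_value):
    """Type II: append a new row with the next version number; old versions remain."""
    current = max(
        (r for r in rows if r["StoreId"] == store_id),
        key=lambda r: r["Version"],
    )
    new_row = dict(current)
    new_row[column] = new_value
    new_row["Version"] = current["Version"] + 1
    return rows + [new_row]


def type3_previous_column(row, column, new_value):
    """Type III: keep only the prior value in an extra column (limited history)."""
    updated = dict(row)
    updated["Prev" + column] = row[column]
    updated[column] = new_value
    return updated


# Hypothetical dimension row for one store.
store = {"StoreId": 1, "StoreCity": "Denver", "Version": 1}

after_type1 = type1_overwrite(store, "StoreCity", "Boulder")
after_type2 = type2_new_version([store], 1, "StoreCity", "Boulder")
after_type3 = type3_previous_column(store, "StoreCity", "Boulder")
```

Type II grows the dimension table with every change, while Type III keeps a fixed number of columns and therefore loses all but the most recent prior value.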

Summary
Data warehouse requirements differ from transaction processing.
Architecture choice is important.
Multidimensional data model is intuitive.
Relational representation and storage techniques are significant.
Maintaining a data warehouse is an important, operational problem.