An Analysis of the Publication "An Overview of Data Warehousing and OLAP Technology" by Surajit Chaudhuri and Umeshwar Dayal
Michael Goshey
University of Minnesota, Fall 2006
CSci 8701: Overview of Database Research
Michael Goshey: 9/19/2006
Outline
  1. Introduction
  2. Problem Addressed
  3. Major Contributions
  4. Key Concepts
  5. Validation Methodology
  6. Assumptions
  7. Rewrite
Introduction
  Selected paper: S. Chaudhuri and U. Dayal, An Overview of Data Warehousing and OLAP Technology, SIGMOD Record 26(1): 65-74 (1997).
  Motivation: personal interest
Problem Addressed
  Problem statement
    Survey: organizing the data warehousing space
    Differing requirements between OLTP and OLAP
  Significance
    Growth area
    Reference work establishing consensus on terms, architectures and issues
Major Contributions
  Bridging the gulf between industry and academia
  OLTP vs. OLAP: clarifying the differences
  Concise survey of relevant issues, architectures and tools
  Concrete list of data warehouse design and build steps
Key Concepts
  Data warehouses and data marts
  OLTP, OLAP (ROLAP vs. MOLAP)
  Relational and dimensional data models
  Bitmap indices
  ETL
  Metadata
  Managed query vs. ad hoc environments
  Materialized views
  SQL extensions (cube, rollup, rank, percentile, etc.)
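The cube and rollup extensions listed above can be illustrated with a small sketch. This is not taken from the paper or the slides: the `sales` rows and the helper names `cube` and `rollup` are hypothetical, and the code only mimics in plain Python what the SQL operators compute, using None where SQL would return NULL for a rolled-up column.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum `measure` over every subset of `dims`; None stands for "all values"."""
    totals = defaultdict(int)
    for row in rows:
        for size in range(len(dims) + 1):
            for kept in combinations(dims, size):
                key = tuple(row[d] if d in kept else None for d in dims)
                totals[key] += row[measure]
    return dict(totals)


def rollup(rows, dims, measure):
    """Sum `measure` over each prefix of `dims`, from full detail up to the grand total."""
    totals = defaultdict(int)
    for row in rows:
        for depth in range(len(dims) + 1):
            key = tuple(row[d] if i < depth else None for i, d in enumerate(dims))
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical fact rows, invented for this example.
sales = [
    {"region": "East", "product": "A", "amount": 10},
    {"region": "East", "product": "B", "amount": 20},
    {"region": "West", "product": "A", "amount": 5},
]

cube_totals = cube(sales, ["region", "product"], "amount")
rollup_totals = rollup(sales, ["region", "product"], "amount")

print(cube_totals[(None, "A")])       # total for product A across all regions
print(rollup_totals[("East", None)])  # subtotal for the East region
print(rollup_totals[(None, None)])    # grand total
```

The difference between the two shows up in the keys: the cube contains per-product totals such as `(None, "A")`, while the rollup only aggregates along the region-then-product hierarchy and so has no such entry.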
Data Warehouse, Data Mart
Relational or Dimensional?
Relational or Dimensional? (image)
Bitmap Indices
  customer  age 0-10  age 11-20  age 21-30  age
  Mary      1         0          0          0
  John      0         1          0          0
  Steve     0         0          1          0
  Tom       0         0          0          1
  Lisa      0         0          1          0
  Cardinality: unique values / total rows
  B-tree vs. bitmap: 1% rule, uniqueness
  Boolean algebra directly on indices
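A minimal sketch of the bitmap idea on this slide, assuming each bitmap is stored as a Python integer whose bit i is set when row i holds the indexed value. The function names are hypothetical, and because the label of the fourth age band is cut off in the slide, the code uses the placeholder "band 4" for Tom's row.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int bitmap (bit i = row i matches)."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)


def matching_rows(bitmap, rows):
    """Return the rows whose bit is set in `bitmap`."""
    return [row for i, row in enumerate(rows) if bitmap >> i & 1]


def cardinality_ratio(rows, column):
    """Unique values divided by total rows, as defined on the slide."""
    return len({row[column] for row in rows}) / len(rows)


# Rows follow the slide; "band 4" is a placeholder for the truncated label.
customers = [
    {"name": "Mary", "age_band": "0-10"},
    {"name": "John", "age_band": "11-20"},
    {"name": "Steve", "age_band": "21-30"},
    {"name": "Tom", "age_band": "band 4"},
    {"name": "Lisa", "age_band": "21-30"},
]

age_index = build_bitmap_index(customers, "age_band")

# Boolean algebra directly on the bitmaps: OR two bands, then negate one.
in_11_to_30 = age_index["11-20"] | age_index["21-30"]
all_rows = (1 << len(customers)) - 1
not_21_to_30 = all_rows & ~age_index["21-30"]

names_11_to_30 = [r["name"] for r in matching_rows(in_11_to_30, customers)]
names_not_21_to_30 = [r["name"] for r in matching_rows(not_21_to_30, customers)]
print(names_11_to_30)
print(names_not_21_to_30)
print(cardinality_ratio(customers, "age_band"))
```

The predicates are evaluated with single integer operations before any row is read, which is the point of the "Boolean algebra directly on indices" bullet. A five-row table has a very high cardinality ratio, so it only illustrates the mechanics; the rule of thumb on the slide concerns columns with few distinct values relative to the row count.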
Validation Methodology
  Survey paper goals
  Academic and industry citations
  Referencing tools, vendors
  Case studies
Assumptions
  Read-only environments
  Shortcomings
    (Occasional) transactional commitments
    The data revision problem
Rewrite
  Changes in terminology, tools, vendors
    Fact constellations -> conformed dimensions
    Decision support -> BI
    Vendors and tools in BI, ETL, OLAP
  Multiple user constituencies
  Data history difficulties
    Petabyte databases -> very large warehouses common
    Data expiry challenges
    Slowly changing dimensions
Slowly Changing Dimensions
  Before
    CustomerID  Name          Status
    001         Mary Johnson  Gold
  After: Type 1
    CustomerID  Name          Status
    001         Mary Johnson  Platinum
  After: Type 2
    CustomerID  Name          Status
    001         Mary Johnson  Gold
    001         Mary Johnson  Platinum
  After: Type 3
    CustomerID  Name          Original Status  Current Status  Effective Date
    001         Mary Johnson  Gold             Platinum        10/1/2006
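The three responses shown on this slide can be sketched as small functions over a list of dimension rows. This is an illustration under assumptions, not code from the presentation: the function and field names are hypothetical, the customer is assumed to exist in the table, and the Type 2 version mirrors the slide by simply appending a row (warehouses in practice usually also add a surrogate key or validity dates to tell the versions apart).

```python
def scd_type1(rows, customer_id, new_status):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    out = [dict(r) for r in rows]
    for row in out:
        if row["customer_id"] == customer_id:
            row["status"] = new_status
    return out


def scd_type2(rows, customer_id, new_status):
    """Type 2: keep the old row and append a new row carrying the new value."""
    out = [dict(r) for r in rows]
    latest = [r for r in out if r["customer_id"] == customer_id][-1]
    out.append({**latest, "status": new_status})
    return out


def scd_type3(rows, customer_id, new_status, effective_date):
    """Type 3: one row per customer, with original and current value columns."""
    out = []
    for r in rows:
        changed = r["customer_id"] == customer_id
        out.append({
            "customer_id": r["customer_id"],
            "name": r["name"],
            "original_status": r["status"],
            "current_status": new_status if changed else r["status"],
            "effective_date": effective_date if changed else None,
        })
    return out


# The "Before" table from the slide.
before = [{"customer_id": "001", "name": "Mary Johnson", "status": "Gold"}]

type1 = scd_type1(before, "001", "Platinum")
type2 = scd_type2(before, "001", "Platinum")
type3 = scd_type3(before, "001", "Platinum", "10/1/2006")

for label, table in (("Type 1", type1), ("Type 2", type2), ("Type 3", type3)):
    print(label, table)
```

The trade-off is visible in the results: Type 1 keeps the table small but loses history, Type 2 preserves full history at the cost of extra rows, and Type 3 keeps exactly one prior value by widening the schema.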
Questions?