25th VLDB, Edinburgh, Scotland, September 7-10, 1999 Extending Practical Pre-Aggregation for On-Line Analytical Processing T. B. Pedersen 1,2, C. S. Jensen.

Slides:



Advertisements
Similar presentations
OLAP over Uncertain and Imprecise Data
Advertisements

Dimensional Modeling.
CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
Chapter 10: Designing Databases
An overview of Data Warehousing and OLAP Technology Presented By Manish Desai.
Evaluating XML-Extended OLAP Queries Based on a Physical Algebra Xuepeng Yin and Torben B. Pedersen Department of Computer Science Aalborg University.
An Array-Based Algorithm for Simultaneous Multidimensional Aggregates By Yihong Zhao, Prasad M. Desphande and Jeffrey F. Naughton Presented by Kia Hall.
Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
OLAP Services Business Intelligence Solutions. Agenda Definition of OLAP Types of OLAP Definition of Cube Definition of DMR Differences between Cube and.
Incremental Maintenance for Non-Distributive Aggregate Functions work done at IBM Almaden Research Center Themis Palpanas (U of Toronto) Richard Sidle.
Achieving Adaptivity for OLAP-XML Federations Torben Bach Pedersen Aalborg University Joint work with Dennis Pedersen, TARGIT.
Xyleme A Dynamic Warehouse for XML Data of the Web.
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Chapter 13 The Data Warehouse
DASHBOARDS Dashboard provides the managers with exactly the information they need in the correct format at the correct time. BI systems are the foundation.
Online Analytical Processing (OLAP) Hweichao Lu CS157B-02 Spring 2007.
Hierarchical Dwarfs for the Rollup-Cube Yannis Sismanis Antonios Deligiannakis Yannis Kotidis Nick Roussopoulos.
Logical Database Design Nazife Dimililer. II - Logical Database Design Two stages –Building and validating local logical model –Building and validating.
Data Conversion to a Data warehouse Presented By Sanjay Gunasekaran.
Introduction to Databases Transparencies 1. ©Pearson Education 2009 Objectives Common uses of database systems. Meaning of the term database. Meaning.
Database Design and Introduction to SQL
Week 6 Lecture The Data Warehouse Samuel Conn, Asst. Professor
SELECT LA_code, sum(population)/sum(wardarea) AS density, (SELECT count(*) FROM wards_by_LA WLA2, violent_crime WHERE WLA2.ward_code =violent_crime.ward_code.
Ihr Logo Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang.
Web-Enabled Decision Support Systems
PowerPoint Presentation for Dennis & Haley Wixom, Systems Analysis and Design, 2 nd Edition Copyright 2003 © John Wiley & Sons, Inc. All rights reserved.
1 Cube Computation and Indexes for Data Warehouses CPS Notes 7.
OnLine Analytical Processing (OLAP)
1 Data Warehouses BUAD/American University Data Warehouses.
OLAP & DSS SUPPORT IN DATA WAREHOUSE By - Pooja Sinha Kaushalya Bakde.
1 Relational Databases and SQL. Learning Objectives Understand techniques to model complex accounting phenomena in an E-R diagram Develop E-R diagrams.
Interoperable Visualization Framework towards enhancing mapping and integration of official statistics Haitham Zeidan Palestinian Central.
Designing Aggregations. Performance Fundamentals - Aggregations Pre-calculated summaries of data Intersections of levels from each dimension Tradeoff.
11th SSDBM, Cleveland, Ohio, July 28-30, 1999 Supporting Imprecision in Multidimensional Databases Using Granularities T. B. Pedersen 1,2, C. S. Jensen.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
PMIT-6101 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Modeling Issues for Data Warehouses CMPT 455/826 - Week 7, Day 1 (based on Trujollo) Sept-Dec 2009 – w7d11.
Fox MIS Spring 2011 Data Warehouse Week 8 Introduction of Data Warehouse Multidimensional Analysis: OLAP.
3 Copyright © 2004, Oracle. All rights reserved. Working in the Forms Developer Environment.
1 On-Line Analytic Processing Warehousing Data Cubes.
Implementing Data Cube Construction Using a Cluster Middleware: Algorithms, Implementation Experience, and Performance Ge Yang Ruoming Jin Gagan Agrawal.
Lineage Tracing for General Data Warehouse Transformations Yingwei Cui and Jennifer Widom Computer Science Department, Stanford University Presentation.
Lection №4 Development of the Relational Databases.
Efficient OLAP Operations in Spatial Data Warehouses Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao Department of Computer Science Hong Kong.
Query Optimization For OLAP-XML Federations Dennis Pedersen Karsten Riiis Torben Bach Pedersen Nykredit Center for Database Research Department of Computer.
Offering a Precision- Performance Tradeoff for Aggregation Queries over Replicated Data Paper by Chris Olston, Jennifer Widom Presented by Faizaan Kersi.
1 Cache-Oblivious Query Processing Bingsheng He, Qiong Luo {saven, Department of Computer Science & Engineering Hong Kong University of.
©Silberschatz, Korth and Sudarshan2.1Database System Concepts Chapter 2: Entity-Relationship Model Entity Sets Relationship Sets Mapping Constraints Keys.
The Need for Data Analysis 2 Managers track daily transactions to evaluate how the business is performing Strategies should be developed to meet organizational.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
1 Database Systems, 8 th Edition Star Schema Data modeling technique –Maps multidimensional decision support data into relational database Creates.
Data Warehousing and OLAP Outline u Models & operations u Implementing a warehouse u Future directions.
Intro to MIS – MGS351 Databases and Data Warehouses
On-Line Analytic Processing
Data warehouse and OLAP
Efficient Methods for Data Cube Computation
Chapter 13 The Data Warehouse
Databases and Data Warehouses Chapter 3
Translation of ER-diagram into Relational Schema
Edge computing (1) Content Distribution Networks
Chapter 14 Normalization – Part I Pearson Education © 2009.
Data Warehouse.
Data Transformations targeted at minimizing experimental variance
Chapter 13 The Data Warehouse
Materializing Views With Minimal Size To Answer Queries
Database.
Efficient Aggregation over Objects with Extent
Presentation transcript:

25th VLDB, Edinburgh, Scotland, September 7-10, 1999 Extending Practical Pre-Aggregation for On-Line Analytical Processing T. B. Pedersen 1,2, C. S. Jensen 2, and C. E. Dyreson 2 1 Center for Health Information Services Kommunedata, 2 Nykredit Center for Database Research Department of Computer Science Aalborg University,

25th VLDB, Edinburgh, Scotland, September 7-10, Motivation On-Line Analytical Processing (OLAP) systems are popular. Based on a dimensional view of data. Require fast query response times even for large amounts of data. Pre-aggregation is used to speed up queries. Full pre-aggregation (all combinations of aggregate levels are materialized) is infeasible. Takes up too much space (~200 times the size of the raw data). Takes too long to update when data changes. Practical (or partial) pre-aggregation techniques Store only select combinations of aggregates. Re-use these to compute higher-level results. Provides a good trade-off between storage space/update time and response time.

25th VLDB, Edinburgh, Scotland, September 7-10, Motivation Practical pre-aggregation requires well-behaved hierarchies and fact-dimension relationships. Hierarchies must be strict covering onto Facts and dimensions must be many-to-one have uniform granularity. Often, these requirements are not satisfied in practice. We present techniques and algorithms for applying practical pre-aggregation to general, non-summarizable hierarchies and fact-dimension relationships.

25th VLDB, Edinburgh, Scotland, September 7-10, Patient Case Study Patients are the facts in multidimensional terms. Each patient has zero or more diagnoses (many-to-many). The diagnoses may be registered at any level (Low- level, Family, or Group), i.e., the granularity is varying. A diagnosis may have no children (non-onto) or several parents (non-strict). Each patient has one address, in a city or a rural area (non- covering). Cities are located in counties.

25th VLDB, Edinburgh, Scotland, September 7-10, Talk Overview Motivation Data model context Hierarchy properties Hierarchy transformations Integration in current systems Conclusion and future work

25th VLDB, Edinburgh, Scotland, September 7-10, Data Model - Schema Fact type: Patient Dimension types: Diagnosis and Residence There are no “measures”, all data are dimensions. Category types: Low-level Diagnosis, Diagnosis Family, Diagnosis Group, Address, City, County Top category types: corresponds to ALL of the dimension. Bottom category types: the lowest level in each dimension The category types of a dimension type form a lattice. Patient Diagnosis Group LL Diagnosis Diagnosis Dimension Type Residence Dimension Type T Diagnosis T Residence Address City Diagnosis Family County

25th VLDB, Edinburgh, Scotland, September 7-10, Data Model - Instances Categories: instances of category types, consist of dimension values. Top categories contain only one “T” value. Dimensions = categories + partial order on category values, hierarchy may be non-strict. Facts: instances of fact type with separate identity. Fact-dimension relations: links facts to dimensions, may map to several values of any granularity in each dimension. Multidimensional object (MO) = schema + dimension + facts + fact-dimension relations Diagnosis Dimension Residence Dimension T Cancer PregRel Diabetes Lung Canc. T JimJane John Sydney Outback 1 Central 2 Rural Sydney Diab. Preg. Ins. Dep. Diab. Ins. Dep. Diab. Preg.

25th VLDB, Edinburgh, Scotland, September 7-10, Hierarchy Properties Summarizability requires that the dimension hierarchies are covering, onto, and strict. Non-covering hierarchies occur when links between dimension values “skip” levels. Non-strict hierarchies occur when one lower-level item has several parents. Non-onto (into) hierarchies occur when the height of the hierarchy is varying. These properties cause problems when re-using stored counts of patients to compute new values. Diagnosis Dimension Residence Dimension T Cancer PregRel Diabetes Lung Canc. T Sydney Outback 1 Central 2 Rural Sydney Diab. Preg. Ins. Dep. Diab. Ins. Dep. Diab. Preg. No Low-level Diagnosis ? ? 0 2? 1 0

25th VLDB, Edinburgh, Scotland, September 7-10, Hierarchy Transformations The overall task is to transform non-summarizable hierarchies to summarizable hierarchies automatically. A hierarchy is transformed in three steps: 1) Transform the hierarchy to be covering. 2) Transform the result from 1) to be onto. 3) Transform the result from 2) to be strict. We give an algorithm for each transformation. Each algorithm assumes that the previous algorithm(s) has been applied.

25th VLDB, Edinburgh, Scotland, September 7-10, Non-covering Hierarchies The MakeCovering algorithm starts with the bottom category C (Address). For each parent category P of C (City and County) it looks for parent categories H that are “higher than” P (City < County). The algorithms finds all the links L=(h,c) from H to C that are not covered by going through P. For each link L an intermediate value is inserted into P and linked to h and c. Finally, the algorithm is applied recursively to P (nothing changes in this case). T Sydney Outback 1 Central 2 Rural Sydney Residence Dimension Address City County T Residence P C L City Outback H

25th VLDB, Edinburgh, Scotland, September 7-10, Non-onto Hierarchies The MakeOnto algorithm starts with the top category P (T diagnosis ). For each child category C of P it finds the values n in P with no children (none are found in the first two calls). For each childless n in P, the algorithms inserts a placeholder value c n in C and links it to n. Finally, the algorithm is called recursively on C. Diagnosis Dimension T Cancer PregRel Diabetes Lung Canc. Diab. Preg. Ins. Dep. Diab. Ins. Dep. Diab. Preg. LL Diagnosis Diag. Family Diag. Group T Diagnosis P C P C C P n LL LC

25th VLDB, Edinburgh, Scotland, September 7-10, Non-strict Hierarchies Schema changes shown first Idea: “fuse” sets into one value. The MakeStrict algorithm starts with a child category C. For each parent category P of C, it looks for non-strictness between C and P if P has parents. If non-strictness occurs, a new category N (holding sets of P) is inserted between C and P. Grandparents G of C are linked to the new category. “Unsafe” links are removed. Finally, the algorithm is called recursively on N. LL Diagnosis Diag. Family Diag. Group T Diagnosis Set-of-DF Set-of-DG P G G C CC G P P

25th VLDB, Edinburgh, Scotland, September 7-10, Non-strict Hierarchies Instance changes shown now. The MakeStrict algorithm starts with a child category C. For each parent category P of C, it looks for non-strictness between C and P if P has parents. If non-strictness occurs, new “fused” values (sets of P) are inserted into N. Values in grandparents G of C are linked to the new values. “Unsafe” links are removed. Finally, the algorithm is called recursively on N. Diagnosis Dimension T Cancer PregRel Diabetes Lung Canc. Diab. Preg. Ins. Dep. Diab. Ins. Dep. Diab. Preg. LL LC {LC} {DP,IDD} {C} {PR,D}

25th VLDB, Edinburgh, Scotland, September 7-10, Did We Solve The Problem ? Now we can re-use stored values to correctly compute new higher-level values. Diagnosis Dimension T Cancer PregRel Diabetes Lung Canc. Diab. Preg. Ins. Dep. Diab. Ins. Dep. Diab. Preg. LL LC {LC} {DP,IDD} {C} {PR,D} T Sydney Outback 1 Central 2 Rural Sydney Residence Dimension City Outback

25th VLDB, Edinburgh, Scotland, September 7-10, Integration in Current Systems The transformations should be transparent to the user. Modern OLAP systems have client and server components, communicating using, e.g., OLE DB for OLAP or MD-API. OLAP queries consist of 80% navigation and 20% aggregation queries [Kimball]. Integration is achieved using a Query Handler that sends navigation queries to the original DB and aggregation queries to the transformed DB. Small space overhead (the hierarchies only are stored twice).

25th VLDB, Edinburgh, Scotland, September 7-10, Conclusion and Future Work We have presented algorithms that automatically transform non-summarizable (non-covering, non-onto, non-strict) hierarchies to become summarizable. The algorithms have a practical, low complexity and can be applied incrementally to minimize the cost of updates. The algorithms can be applied to non-summarizable relationships between facts and dimensions. The presented technique can be implemented in current OLAP systems transparently to the user. Future work: Transform only the part of the hierarchy that has been selected for materialization. Take the aggregate function into account, i.e., MAX and MIN are insensitive to duplicates.

25th VLDB, Edinburgh, Scotland, September 7-10, Previous Work Much work has focused on selecting the optimal subset of aggregate levels for practical pre-aggregation, given space and/or time constraints. This work assumes summarizability. Thus, it cannot be applied to general hierarchies and fact-dim. relationships. One paper has shown how to manually, and not transparently to the user, achieve summarizability for non- covering hierarchies. We give algorithms and techniques that automatically achieves summarizability for non-covering, non-onto, and non-strict hierarchies, transparently to the user. The algorithms and techniques can also be applied to non- summarizable relationships between facts and dimensions.