Transbase® Hypercube: A leading-edge ROLAP Engine supporting multidimensional Indexing and Hierarchy Clustering Roland Pieringer Transaction Software GmbH.


Transbase® Hypercube: A leading-edge ROLAP Engine supporting multidimensional Indexing and Hierarchy Clustering Roland Pieringer Transaction Software GmbH Thomas-Dehler-Str München, Germany

Feb BTW 2003 Transbase® Hypercube ©2003 Transaction Software GmbH

Motivation
- Many applications have multidimensional (MD) data
- Multidimensional indexes support retrieval of MD data
- Application field: data warehouses
  - Hierarchically organized dimensions (e.g., year – month – day)
  - Large data volumes
  - Relatively static
  - Mainly retrieval query profile
- MD indexes usually support numeric MD data
- An encoding for hierarchical data is therefore necessary: Multidimensional Hierarchical Clustering (MHC)

Theoretical comparison of range query performance
[Figure: range query performance for the ideal case, a multidimensional index, multiple B-Trees or bitmap indexes, and a compound primary B-Tree]

UB-Tree: basic concepts
- Combination of B+-Tree and Z-curve
  - The Z-curve maps multidimensional points to one-dimensional values (Z-values)
  - Z-values are used as keys in the B*-Tree
  - The Z-curve preserves spatial proximity
  - Symmetric clustering
[Figure: UB-Tree with index part and data part]
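The Z-curve mapping named on this slide can be illustrated with plain bit interleaving. The following is a minimal sketch, not the Transbase implementation; the function name `z_value` and the fixed bit width per coordinate are assumptions made for the example.

```python
def z_value(point, bits):
    """Interleave the bits of all coordinates into one Z-value.

    point: tuple of non-negative integers, one per dimension
    bits:  number of bits used per coordinate (illustrative assumption)
    """
    z = 0
    for bit in range(bits - 1, -1, -1):      # most significant bit first
        for coord in point:                  # one bit from each dimension in turn
            z = (z << 1) | ((coord >> bit) & 1)
    return z


# The four points of the 2x2 block at the origin receive the four
# smallest Z-values, so they end up next to each other in key order.
block = sorted(z_value((x, y), bits=2) for x in range(2) for y in range(2))
print(block)
```

Because neighbouring points tend to share the leading interleaved bits, they sort close together in the one-dimensional key space, which is what lets a B-Tree variant cluster them on the same pages.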

Visualized range queries
[Figure: range queries over a region hierarchy and a time axis. Labels: Germany; Sachsen, Bayern; Freiberg, Leipzig, Dresden, Burgh, München, Passau; Feb 2003, Mar 2003, Jun 2003]

MHC: Non-clustered hierarchy

MHC: Clustered hierarchy

Basic technology of MHC
- MHC: Multidimensional Hierarchical Clustering
- MHC is necessary because
  - Dimensions in warehouses are organized hierarchically
  - Hierarchical restrictions do not form intervals
  - Naive restrictions lead to many point queries instead of one interval on the UB-Tree
- Artificial encoding of hierarchies:
  - Mapping of hierarchy restrictions to range restrictions
  - The mapping is used for physical clustering of the fact table
  - Modification of query algorithms necessary
  - Fast to compute and space efficient
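The idea of mapping a hierarchy restriction to a range restriction can be sketched as follows. This is an illustration under stated assumptions, not the encoding Transbase actually uses: each hierarchy level is assumed to get a fixed number of bits, members are numbered by ordinal within their parent, and the names `encode` and `prefix_interval` are hypothetical.

```python
def encode(path, widths):
    """Concatenate per-level ordinals into one integer (a compound surrogate)."""
    cs = 0
    for ordinal, width in zip(path, widths):
        if not 0 <= ordinal < (1 << width):
            raise ValueError("ordinal does not fit into its level width")
        cs = (cs << width) | ordinal
    return cs


def prefix_interval(prefix, widths):
    """Return the closed surrogate interval covering every path below `prefix`."""
    free_bits = sum(widths[len(prefix):])            # bits of the unrestricted levels
    low = encode(prefix, widths[:len(prefix)]) << free_bits
    high = low | ((1 << free_bits) - 1)
    return low, high


# Assumed widths for the hierarchy year - month - day: 2, 4 and 5 bits.
WIDTHS = (2, 4, 5)
day = encode((1, 9, 14), WIDTHS)              # one concrete day
low, high = prefix_interval((1, 9), WIDTHS)   # restriction on a whole month
print(day, low, high)
```

A restriction on a month thus becomes a single interval of surrogate values rather than one point query per day, and the same surrogate can serve as a numeric coordinate for a multidimensional index.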

Implementation of MHC
- Implementation in the Transbase® DBMS kernel
  - Computation and maintenance of the MHC encoding
  - Integration into DDL and DML
  - Integration into the optimizer
  - Integration into archiving tools
- Transparency to users
  - Physical optimization
  - No extension of the DML

Supported schemata
- Support of star schema and snowflake schema
- Star schemata
  - Conventional complete denormalization of the dimension tables
  - Foreign key relationships between fact table and dimension tables
- Supported snowflake schemata
  - Inner dimension tables denormalized with hierarchy attributes
  - Feature attributes can be normalized
  - Fully supported by the optimizer
  - More efficient than star schemata (knowledge about hierarchical dependency)

Transbase® DDL extension
Dimension table:

    CREATE TABLE dim_segment (
        country_id      INTEGER NOT NULL,
        country_txt     CHAR(*),
        region_id       INTEGER NOT NULL,
        region_txt      CHAR(*),
        micromarket_id  INTEGER NOT NULL,
        micromarket_txt CHAR(*),
        outlet_id       INTEGER NOT NULL,
        outlet_txt      CHAR(*),
        SURROGATE cs_segment COMPOUND (
            country_id     SIBLINGS 16,
            region_id      SIBLINGS 19,
            micromarket_id SIBLINGS 6,
            outlet_id      SIBLINGS 2202),
        PRIMARY KEY (outlet_id)
    )
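The SIBLINGS values in this declaration suggest how much room each hierarchy level needs in the compound surrogate. The sketch below assumes that `SIBLINGS n` states the maximum number of members under one parent and that each level is stored in the smallest number of bits that can hold n distinct ordinals; the slides do not say how Transbase sizes the surrogate, so both points are assumptions, as is the helper name `bits_for`.

```python
def bits_for(siblings):
    """Smallest bit width that can distinguish `siblings` ordinals (0 .. siblings-1)."""
    return max(1, (siblings - 1).bit_length())


# SIBLINGS values taken from the dim_segment declaration.
levels = {"country_id": 16, "region_id": 19, "micromarket_id": 6, "outlet_id": 2202}
widths = {name: bits_for(n) for name, n in levels.items()}
total_bits = sum(widths.values())
print(widths, total_bits)
```

Under these assumptions the whole segment hierarchy fits into 24 bits, which illustrates why the encoding is described as space efficient.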

Transbase® DDL extension (cont.)
Fact table:

    CREATE TABLE fact (
        dseg     INTEGER REFERENCES dim_segment(outlet_id) ON UPDATE CASCADE,
        dprod    INTEGER REFERENCES dim_product(item_id) ON UPDATE CASCADE,
        dtime    INTEGER REFERENCES dim_time(day_id) ON UPDATE CASCADE,
        turnover NUMERIC(10,2),
        …
        SURROGATE cs_seg  FOR dseg,
        SURROGATE cs_prod FOR dprod,
        SURROGATE cs_time FOR dtime,
        PRIMARY HCKEY (cs_seg, cs_prod, cs_time)
    )

DML
- No change of DML statements (SELECT, INSERT, UPDATE, DELETE)
- Conventional star (snowflake) joins (SQL-92 compliant):

    SELECT country, department, category, group, year, quarter, month,
           SUM(price), SUM(turnover)
    FROM customer c, product p, date d, fact f
    WHERE f.custkey = c.customer
      AND f.prodkey = p.item_key
      AND f.datekey = d.day
      AND c.country = 'GERMANY'
      AND c.department = 'SOUTH'
      AND p.category = 'TV'
      AND d.month = '10/2002'
      AND d.year = '2002'
    GROUP BY country, department, category, group, year, quarter, month

Conventional query processing
Standard method (non-clustering indexes):
- Index evaluation of dimension restrictions
- Fact table tuple materialization
- Residual join with dimension tables
- Grouping and aggregating
- Sorting

MHC query processing: Overview
- Abstract execution plan (AEP): better understanding, implementation in operator trees
- Three phases:
  - Interval generation (semi-join)
  - Fact table access
  - Grouping and residual join
- Optimization: hierarchical pre-grouping
  - Minimize residual join operations by grouping before joining

AEP - overview
[Figure: abstract execution plan with the phases Interval Generation and Main Execution Phase. Operator labels: Create Range (on dimension tables D_i, D_j), Fact Table Access (on Fact), Residual Join (with D_k, D_i, ...), Group Select, Having, Order By]

Interval generation
- Mapping of hierarchical restrictions into a number of intervals
- Usage of special hierarchy indexes:
  - DXh index: (h_t, h_(t-1), ..., h_1, cs)
  - Efficient interval computation
- Optimization for feature restrictions:
  - Merging many small intervals into fewer, larger intervals
  - Usage of hierarchical dependency for feature attributes, if supported by the schema (snowflake schemata)
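Merging many small intervals into fewer, larger ones can be sketched with a standard sort-and-sweep over closed integer intervals. This only illustrates the general technique; the slides do not describe the merge strategy used by the optimizer, and `merge_intervals` is a hypothetical name.

```python
def merge_intervals(intervals):
    """Merge closed integer intervals that overlap or are directly adjacent."""
    merged = []
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1] + 1:
            # Touches or overlaps the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], high)
        else:
            merged.append([low, high])
    return [tuple(interval) for interval in merged]


# Two adjacent surrogate ranges collapse into one; the distant one stays separate.
print(merge_intervals([(832, 863), (800, 831), (900, 931)]))
```

Fewer intervals mean fewer query boxes in the subsequent fact table access.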

Fact table access
- The combination of the intervals of all clustering dimensions forms multidimensional query boxes QB_i
- Fact table access with implicit tuple materialization
- Sequential processing of query boxes
- Fast retrieval of result tuples
- Postfiltering can be necessary, depending on the UB-Tree dimensions and restrictions
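How per-dimension intervals combine into query boxes can be sketched as a Cartesian product. This is a simplified model with hypothetical names (`query_boxes`, `in_box`); the actual UB-Tree range query algorithm, which retrieves only the pages intersecting a box, is not shown.

```python
from itertools import product


def query_boxes(intervals_per_dimension):
    """Every combination of one interval per dimension is one query box."""
    return [tuple(combo) for combo in product(*intervals_per_dimension)]


def in_box(point, box):
    """True if each coordinate lies inside the box's interval for that dimension."""
    return all(low <= value <= high for value, (low, high) in zip(point, box))


segment_intervals = [(0, 3), (8, 11)]   # two intervals on the segment surrogate
time_intervals = [(800, 831)]           # one interval on the time surrogate
boxes = query_boxes([segment_intervals, time_intervals])

for box in boxes:                       # boxes are processed one after another
    print(box, in_box((9, 814), box))
```

With two segment intervals and one time interval the sketch yields two boxes, and a fact tuple qualifies if it falls into any of them.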

Standard AEP
[Figure: standard abstract execution plan. Operator labels: Fact Table Access (on Fact), Predicate Evaluation, Residual Join (with D_k, D_i, ...), Group Select, Having, Order By]

Optimization: Hierarchical pre-grouping
Basic concept
- Hierarchy encoding stored in the fact table (compound surrogates)
- Groups for hierarchical GROUP BY attributes are built from the compound surrogates
- Grouping is not exact for non-prefix path grouping
- Drastic reduction of fact table result tuples
- Example (for hierarchy year – month – day): pre-grouping on month turns the fact table result tuples into aggregated tuples before the residual join, a reduction by a factor of 30 (the absolute tuple counts are not preserved in this transcript)
- Post-grouping may be necessary if the pre-grouping is too fine
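Grouping on a prefix of the compound surrogate before the residual join can be sketched as below. The layout (day in the lowest 5 bits of the time surrogate), the sample rows, the month labels and the function names are all illustrative assumptions rather than details taken from the slides.

```python
from collections import defaultdict

DAY_BITS = 5  # assumed width of the day level inside the time surrogate


def pre_group(fact_rows, drop_bits=DAY_BITS):
    """Aggregate turnover per month prefix, using only the fact table surrogate."""
    totals = defaultdict(float)
    for cs_time, turnover in fact_rows:
        totals[cs_time >> drop_bits] += turnover   # strip the day bits
    return dict(totals)


def residual_join(groups, month_names):
    """Join once per pre-grouped tuple instead of once per fact tuple."""
    return {month_names[prefix]: total for prefix, total in groups.items()}


fact_rows = [(800, 10.0), (814, 5.0), (831, 1.0), (832, 2.0)]   # (cs_time, turnover)
groups = pre_group(fact_rows)
result = residual_join(groups, {25: "10/2002", 26: "11/2002"})
print(groups, result)
```

Four fact tuples shrink to two aggregated tuples before the dimension lookup; with roughly 30 days per month this kind of reduction is where a factor of about 30 comes from.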

Hierarchical pre-grouping (cont.)
[Figure: abstract execution plan with hierarchical pre-grouping. Operator labels: Fact Table Access (on Fact), Predicate Evaluation, Residual Join (with D_e1 ... D_ei), Pre-Group, Residual Join (with D_l1 ... D_ln), Post-Group, Having, Order By]

Performance comparison
- Data:
  - Real-world data warehouse of an electronics retailer in Greece
  - 5 dimensions, 49 measures on the fact table
  - 3 years of transactions, i.e., 8.5 million fact table tuples (2.8 GB)
- Environment:
  - Dual-processor Pentium II (400 MHz), 768 MB RAM, Windows 2000
- Queries:
  - 22 query classes with real-world user queries
- Comparisons:
  - MHC versus no multidimensional clustering
  - Conventional grouping versus hierarchical pre-grouping

Perf. comp: MHC – no clustering
[Table: time of fact tuple access in seconds, STAR versus AEP, for three fact table selectivity ranges (FT Sel. %); rows MIN, MAX, MEDIAN, STD-DEV. The range bounds and cell values are not preserved.]

Perf. comp: no pre-grouping – pre-grouping
Comparison of grouping cardinality: no pre-grouping / hierarchical pre-grouping

    FT Sel. %      All        [ ]        [ ]        [ ]
    MIN            3.6                   21.3       46.0
    1. Quartile    245.8      135.1      911.3      816.2
    MEDIAN         1,139.5    531.6      2,270.4    5,938.9
    3. Quartile    4,708.0    1,905.6    9,747.…    ….6
    MAX            …          …          …          ….0

Perf. comp: no pre-grouping – pre-grouping
Speedup of the time of hierarchical pre-grouping

    FT Sel. %      All     [ ]     [ ]     [ ]
    MIN            0.3             0.8     0.6
    1. Quartile    3.0     2.4     3.9     4.6
    MEDIAN         4.4     3.6     5.8     6.6
    3. Quartile    6.5     5.2     7.2     7.8
    MAX            25.5    14.3    25.5    12.6

Summary
- MHC: Multidimensional Hierarchical Clustering
  - Encoding for hierarchy paths in order to support clustering multidimensional indexes
  - Support of star and snowflake schemata
- Full implementation in Transbase®
  - Integration into the query processor (maintenance of compound surrogates)
  - Integration into the optimizer (interval generation, fact table access, hierarchical pre-grouping)
- Significant speedup:
  - Clustering vs. non-clustering organization: factor of 2 to 20
  - Conventional grouping vs. hierarchical pre-grouping: factor of 4 to 7

Questions?
Everything clear? Otherwise contact: Roland Pieringer, Tel: 089/, Transaction Software GmbH