Transbase® Hypercube: A leading-edge ROLAP Engine supporting multidimensional Indexing and Hierarchy Clustering
Roland Pieringer
Transaction Software GmbH
Thomas-Dehler-Str München, Germany

Feb BTW 2003 Transbase® Hypercube ©2003 Transaction Software GmbH

Motivation
- Many applications have multidimensional (MD) data
- Multidimensional indexes support retrieval of MD data
- Application field: data warehouses
  - Hierarchically organized dimensions (e.g., year – month – day)
  - Large data volumes
  - Relatively static
  - Mainly retrieval query profile
- MD indexes usually support numeric MD data
  - Encoding for hierarchical data necessary
  - Multidimensional Hierarchical Clustering (MHC)

Theoretical comparison of range query performance
[Figure: range query performance compared for the ideal case, a multidimensional index, multiple B-Trees or bitmap indexes, and a compound primary B-Tree]

UB-Tree: basic concepts
- Combination of B+-Tree and Z-curve
- The Z-curve maps multidimensional points to one-dimensional values (Z-values)
- Z-values are used as keys in the B*-Tree
- The Z-curve preserves spatial proximity
- Symmetric clustering
[Figure: UB-Tree with index part and data part]
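The Z-curve mapping can be illustrated by bit interleaving. The following is a minimal sketch, assuming non-negative integer coordinates of a fixed bit width; the name `z_value` and the order in which dimensions are interleaved are illustrative choices, not taken from Transbase.

```python
def z_value(point, bits):
    """Interleave the bits of all coordinates into one Z-value (Morton code).

    point: tuple of non-negative integers, each smaller than 2**bits
    bits:  number of bits per coordinate (assumed equal for all dimensions)
    """
    z = 0
    for bit in range(bits - 1, -1, -1):       # most significant bit first
        for coord in point:                   # one bit per dimension per round
            z = (z << 1) | ((coord >> bit) & 1)
    return z


# Cells of a 2 x 2 grid, listed in Z-order.
cells = sorted([(0, 0), (0, 1), (1, 0), (1, 1)], key=lambda p: z_value(p, 1))
```

Because every dimension contributes one bit per round, no dimension dominates the sort order, which is one way to read the "symmetric clustering" property named on the slide.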

Visualized range queries
[Figure: range queries over a geographic hierarchy and a time axis; labels: Germany; Sachsen, Bayern; Freiberg, Leipzig, Dresden, Burgh, München, Passau; Feb 2003, Mar 2003, Jun 2003]

MHC: Non-clustered hierarchy

MHC: Clustered hierarchy

Basic technology of MHC
- MHC: Multidimensional Hierarchical Clustering
- MHC is necessary because
  - Dimensions in warehouses are organized hierarchically
  - Hierarchical restrictions do not yield intervals
  - Naive restrictions lead to many point queries instead of one interval on the UB-Tree
- Artificial encoding of hierarchies
  - Maps hierarchy restrictions to range restrictions
  - The mapping is used for physical clustering of the fact table
  - Modification of query algorithms necessary
  - Fast to compute and space-efficient
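How an encoding can turn a hierarchy restriction into a range restriction is sketched below. This is an illustration under stated assumptions, not the Transbase implementation: each hierarchy level is assumed to get a fixed number of bits, members are numbered from 0 within their parent, and the names `encode_path` and `prefix_range` are hypothetical.

```python
def encode_path(path, widths):
    """Concatenate per-level ordinals into one integer (a compound surrogate)."""
    cs = 0
    for ordinal, width in zip(path, widths):
        if not 0 <= ordinal < (1 << width):
            raise ValueError("ordinal does not fit into its level width")
        cs = (cs << width) | ordinal
    return cs


def prefix_range(prefix, widths):
    """Closed interval of all surrogates that share the given path prefix."""
    free_bits = sum(widths[len(prefix):])          # bits of the unrestricted levels
    low = encode_path(prefix, widths[:len(prefix)]) << free_bits
    high = low | ((1 << free_bits) - 1)
    return low, high


# Assumed widths for year - month - day: 2, 4 and 5 bits.
WIDTHS = (2, 4, 5)
day = encode_path((1, 9, 14), WIDTHS)              # year 1, month 9, day 14
month_low, month_high = prefix_range((1, 9), WIDTHS)
```

A restriction on a month then becomes the single interval `[month_low, month_high]`, which contains every day of that month, rather than one point query per day.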

Implementation of MHC
- Implementation in the Transbase® DBMS kernel
  - Computation and maintenance of the MHC encoding
  - Integration into DDL and DML
  - Integration into the optimizer
  - Integration into archiving tools
- Transparency to users
  - Physical optimization
  - No extension of the DML

Supported schemata
- Support of star schema and snowflake schema
- Star schemata
  - Conventional complete de-normalization of the dimension tables
  - Foreign key relationships between fact table and dimension tables
- Supported snowflake schemata
  - Inner dimension tables de-normalized with hierarchy attributes
  - Feature attributes can be normalized
  - Fully supported by the optimizer
  - More efficient than star schemata (knowledge about hierarchical dependency)

Transbase® DDL extension
Dimension table:

    CREATE TABLE dim_segment (
        country_id      INTEGER NOT NULL,
        country_txt     CHAR(*),
        region_id       INTEGER NOT NULL,
        region_txt      CHAR(*),
        micromarket_id  INTEGER NOT NULL,
        micromarket_txt CHAR(*),
        outlet_id       INTEGER NOT NULL,
        outlet_txt      CHAR(*),
        SURROGATE cs_segment COMPOUND (
            country_id     SIBLINGS 16,
            region_id      SIBLINGS 19,
            micromarket_id SIBLINGS 6,
            outlet_id      SIBLINGS 2202),
        PRIMARY KEY (outlet_id)
    )
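The SIBLINGS values in the DDL above give a feel for the size of a compound surrogate. The sketch below rests on an assumption that the slide does not state: that `SIBLINGS n` bounds the number of members below one parent at that level, so that a level needs just enough bits to number n members. The helper `level_bits` is hypothetical.

```python
def level_bits(max_siblings):
    """Bits needed to number max_siblings members as 0 .. max_siblings - 1."""
    return (max_siblings - 1).bit_length()


# SIBLINGS values copied from the dim_segment DDL.
siblings = {"country_id": 16, "region_id": 19,
            "micromarket_id": 6, "outlet_id": 2202}

widths = {level: level_bits(n) for level, n in siblings.items()}
total_bits = sum(widths.values())
```

Under this assumption the four levels need 4, 5, 3 and 12 bits, so the whole hierarchy path fits into a 24-bit integer.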

Transbase® DDL extension (cont.)
Fact table:

    CREATE TABLE fact (
        dseg  INTEGER REFERENCES dim_segment(outlet_id) ON UPDATE CASCADE,
        dprod INTEGER REFERENCES dim_product(item_id)   ON UPDATE CASCADE,
        dtime INTEGER REFERENCES dim_time(day_id)       ON UPDATE CASCADE,
        turnover NUMERIC(10,2),
        …
        SURROGATE cs_seg  FOR dseg,
        SURROGATE cs_prod FOR dprod,
        SURROGATE cs_time FOR dtime,
        PRIMARY HCKEY (cs_seg, cs_prod, cs_time)
    )

DML
- No change of DML statements (SELECT, INSERT, UPDATE, DELETE)
- Conventional star (snowflake) joins (SQL-92 compliant):

    SELECT country, department, category, group,
           year, quarter, month,
           SUM(price), SUM(turnover)
    FROM customer c, product p, date d, fact f
    WHERE f.custkey = c.customer
      AND f.prodkey = p.item_key
      AND f.datekey = d.day
      AND c.country = 'GERMANY'
      AND c.department = 'SOUTH'
      AND p.category = 'TV'
      AND d.month = '10/2002'
      AND d.year = '2002'
    GROUP BY country, department, category, group,
             year, quarter, month

Conventional query processing
- Standard method (non-clustering indexes):
  - Index evaluation of dimension restrictions
  - Fact table tuple materialization
  - Residual join with dimension tables
  - Grouping and aggregating
  - Sorting

MHC query processing: Overview
- Abstract execution plan (AEP): better understanding, implementation in operator trees
- Three phases:
  - Interval generation (semi-join)
  - Fact table access
  - Grouping and residual join
- Optimization: hierarchical pre-grouping
  - Minimize residual join operations by grouping before joining

AEP: overview
[Figure: abstract execution plan with the phases Interval Generation (Create Range on dimension tables Di, Dj), Main Execution Phase (Fact Table Access on Fact) and Residual Join (with dimension tables Dk, Di, ...), plus the operators Group Select, Having and Order By]

Interval generation
- Mapping of hierarchical restrictions into a number of intervals
- Usage of special hierarchy indexes
  - DX_h index: (h_t, h_t-1, ..., h_1, cs)
  - Efficient interval computation
- Optimization for feature restrictions
  - Merging many small intervals into fewer, larger intervals
  - Usage of hierarchical dependency for feature attributes, if supported by the schema (snowflake schemata)
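Merging many small intervals into fewer, larger ones can be sketched as follows. This is a generic interval merge, not the Transbase algorithm; the `max_gap` parameter is an assumption added for illustration and lets the caller trade fewer intervals for some surrogate values that do not match the restriction.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge closed integer intervals that overlap, touch, or lie within max_gap.

    intervals: iterable of (low, high) pairs on compound surrogate values
    max_gap:   largest number of uncovered values that may be swallowed
    """
    merged = []
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1] + 1 + max_gap:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], high)
        else:
            merged.append([low, high])
    return [tuple(interval) for interval in merged]


exact = merge_intervals([(800, 831), (832, 863), (900, 910)])
coarse = merge_intervals([(800, 831), (832, 863), (900, 910)], max_gap=40)
```

With `max_gap=0` only adjacent or overlapping intervals are joined. A positive gap yields fewer intervals but covers values outside the original restriction, so the retrieved tuples would need to be filtered afterwards.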

Fact table access
- The combination of the intervals of all clustering dimensions forms multidimensional query boxes QB_i
- Fact table access with implicit tuple materialization
- Sequential processing of query boxes
- Fast retrieval of result tuples
- Postfiltering can be necessary, depending on the UB-Tree dimensions and restrictions
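One way to picture the query boxes is as the Cartesian product of the per-dimension interval lists. The sketch below assumes closed integer intervals on compound surrogates; `query_boxes` and `in_box` are hypothetical helper names, and the interval values are made up for illustration.

```python
from itertools import product


def query_boxes(intervals_per_dim):
    """Combine one interval per dimension into multidimensional query boxes."""
    return list(product(*intervals_per_dim))


def in_box(box, point):
    """True if every coordinate lies inside the box's closed interval."""
    return all(low <= x <= high for (low, high), x in zip(box, point))


# Illustrative intervals for three clustering dimensions.
seg_intervals = [(0, 15), (32, 47)]
prod_intervals = [(8, 11)]
time_intervals = [(800, 831)]

boxes = query_boxes([seg_intervals, prod_intervals, time_intervals])
```

Two segment intervals combined with one product and one time interval give two boxes, which would then be processed one after the other against the fact table.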

Standard AEP
[Figure: standard abstract execution plan with the operators Fact Table Access (on Fact), Predicate Evaluation, Residual Join (with dimension tables Dk, Di, ...), Group Select, Having and Order By]

Optimization: Hierarchical pre-grouping
- Basic concept
  - Hierarchy encoding is stored in the fact table (compound surrogates)
  - Groups of hierarchical GROUP BY attributes are built from compound surrogates
  - Grouping is not exact for non-prefix path grouping
  - Drastic reduction of fact table result tuples
- Example (for hierarchy year – month – day)
  - Number of fact table result tuples: [value lost]
  - Pre-grouping (on month): ca. [value lost] (aggregated) tuples
  - Residual join with [value lost] instead of [value lost] tuples
  - Reduction by a factor of 30!
- Post-grouping may be necessary if the pre-grouping is too fine
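Pre-grouping on a hierarchy prefix can be sketched by dropping the low-order bits of the compound surrogate before aggregating. The layout is an assumption made for illustration (year, month and day with 2, 4 and 5 bits, so dropping 5 bits groups by month); `pre_group` is a hypothetical name, and the rows are made-up sample data, not figures from the talk.

```python
from collections import defaultdict


def pre_group(fact_rows, drop_bits):
    """Sum turnover per surrogate prefix, before any join with the dimension.

    fact_rows: iterable of (cs_time, turnover) pairs
    drop_bits: number of low-order surrogate bits to ignore (the finer levels)
    """
    totals = defaultdict(float)
    for cs_time, turnover in fact_rows:
        totals[cs_time >> drop_bits] += turnover
    return dict(totals)


# Three day-level rows; the first two fall into the same month.
rows = [(814, 10.0), (800, 5.0), (835, 2.0)]
by_month = pre_group(rows, drop_bits=5)
```

Only one aggregated tuple per month prefix then has to be joined with the dimension table, instead of one tuple per fact row.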

Hierarchical pre-grouping (cont.)
[Figure: abstract execution plan with hierarchical pre-grouping; operators Fact Table Access (on Fact), Predicate Evaluation, Pre-Group, Residual Join (two occurrences), Post-Group, Having and Order By; dimension tables D_e1 ... D_ei and D_l1 ... D_ln]

Performance comparison
- Data: real-world data warehouse of an electronics retailer in Greece
  - 5 dimensions, 49 measures on the fact table
  - 3 years of transactions, i.e., 8.5 million fact table tuples (2.8 GB)
- Environment
  - 2-processor Pentium II (400 MHz), 768 MB RAM, Windows 2000
- Queries
  - 22 query classes with real-world user queries
- Comparisons
  - MHC versus no multidimensional clustering
  - Conventional grouping versus hierarchical pre-grouping

Perf. comp.: MHC vs. no clustering
[Table: time of fact tuple access in seconds, STAR vs. AEP, for three fact table selectivity ranges (FT Sel. %); rows MIN, MAX, MEDIAN, STD-DEV; range bounds and values not preserved]

Perf. comp.: no pre-grouping vs. pre-grouping
Comparison of grouping cardinality: no pre-grouping / hier. pre-grouping
(? marks digits not preserved; the selectivity range bounds are not preserved)

    FT Sel. %      All        [ ]        [ ]        [ ]
    MIN            3.6                   21.3       46.0
    1. Quartile    245.8      135.1      911.3      816.2
    MEDIAN         1,139.5    531.6      2,270.4    5,938.9
    3. Quartile    4,708.0    1,905.6    9,747.?    ?.6
    MAX            ?          ?          ?          ?.0

Perf. comp.: no pre-grouping vs. pre-grouping
Speedup in time from hierarchical pre-grouping
(the selectivity range bounds are not preserved)

    FT Sel. %      All     [ ]     [ ]     [ ]
    MIN            0.3             0.8     0.6
    1. Quartile    3.0     2.4     3.9     4.6
    MEDIAN         4.4     3.6     5.8     6.6
    3. Quartile    6.5     5.2     7.2     7.8
    MAX            25.5    14.3    25.5    12.6

Summary
- MHC: Multidimensional Hierarchical Clustering
  - Encoding of hierarchy paths in order to support clustering multidimensional indexes
  - Support of star and snowflake schemata
- Full implementation in Transbase®
  - Integration into the query processor (maintenance of compound surrogates)
  - Integration into the optimizer (interval generation, fact table access, hierarchical pre-grouping)
- Significant speedup:
  - Clustering vs. non-clustering organization: factor 2-20
  - Conventional grouping vs. hierarchical pre-grouping: factor 4-7

Questions?
Everything clear? Otherwise contact:
Roland Pieringer
Tel: 089/
Transaction Software GmbH