1
Transbase® Hypercube: A leading-edge ROLAP Engine supporting multidimensional Indexing and Hierarchy Clustering
Roland Pieringer
Transaction Software GmbH
Thomas-Dehler-Str. 18, 81737 München, Germany
www.transaction.de
2
Feb 2003 - 2 - BTW 2003 Transbase® Hypercube ©2003 Transaction Software GmbH www.transaction.de
Motivation
- Many applications have multidimensional (MD) data
- Multidimensional indexes support retrieval of MD data
- Application field: data warehouses
  - Hierarchically organized dimensions (e.g., year – month – day)
  - Large data volumes
  - Relatively static
  - Mainly retrieval query profile
- MD indexes usually support numeric MD data
  - Encoding for hierarchical data necessary
  - Multidimensional Hierarchical Clustering (MHC)
3
Theoretical comparison of range query performance
[Figure: data accessed by a range query in four cases: ideal case; multidimensional index; multiple B-Trees, bitmap indexes; compound primary B-Tree]
4
UB-Tree: basic concepts
- Combination of B+-Tree and Z-curve
- Z-curve is used to map multidimensional points to one-dimensional values (Z-values)
- Z-values are used as keys in B*-Tree
- Z-curve preserves spatial proximity
- Symmetric clustering
[Figure: UB-Tree with index part and data part]
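The mapping from multidimensional points to Z-values can be illustrated with a minimal sketch of bit interleaving. This is a generic Morton-code computation, not the Transbase® implementation; the function name `z_value` and the fixed `bits` parameter are assumptions made for this example.

```python
def z_value(point, bits):
    """Interleave the bits of each coordinate into one Z-value (Morton code).

    point: tuple of non-negative integers, one per dimension.
    bits:  number of bits used per coordinate (an assumption of this sketch).
    """
    z = 0
    dims = len(point)
    for bit in range(bits):                  # least to most significant bit
        for d, coord in enumerate(point):
            if (coord >> bit) & 1:
                # Bit `bit` of dimension `d` lands at position bit*dims + d.
                z |= 1 << (bit * dims + d)
    return z


# Sorting a 4x4 grid by Z-value visits it quadrant by quadrant,
# so cells that are close in 2D tend to be close in the 1D order.
cells = [(x, y) for y in range(4) for x in range(4)]
ordered = sorted(cells, key=lambda p: z_value(p, bits=2))
print(ordered[:4])
```

Because every dimension contributes one bit in turn, no dimension is favored in the resulting order, which is one way to read the "symmetric clustering" property named on the slide.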
5
Visualized range-queries
[Figure: range queries over a location dimension (labels: Germany, Sachsen, Bayern, Freiberg, Leipzig, Dresden, Burgh, München, Passau) and a time dimension (labels: Feb 2003, Mar 2003, Jun 2003)]
6
MHC: Non-clustered hierarchy
7
MHC: Clustered hierarchy
8
Basic technology of MHC
- MHC: Multidimensional Hierarchical Clustering
- MHC necessary because
  - Hierarchical organization of dimensions in warehouses
  - No intervals for hierarchical restrictions
  - Naive restrictions lead to many point queries instead of one interval on UB-Tree
- Artificial encoding of hierarchies:
  - Mapping of hierarchy restrictions to range restrictions
  - Mapping is used for physical clustering of the fact table
  - Modification of query algorithms necessary
  - Fast computation and space efficient
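How an artificial encoding turns a hierarchy restriction into a range restriction can be shown with a small sketch. It assumes that each hierarchy level gets a fixed number of bits and that the per-level surrogates are concatenated from the top level down; the bit widths and the names `encode_path` and `prefix_range` are hypothetical and chosen only for illustration.

```python
def encode_path(path, bits_per_level):
    """Concatenate per-level surrogates (top level first) into one integer."""
    cs = 0
    for surrogate, bits in zip(path, bits_per_level):
        assert 0 <= surrogate < (1 << bits), "surrogate does not fit its level"
        cs = (cs << bits) | surrogate
    return cs


def prefix_range(prefix, bits_per_level):
    """Closed interval [lo, hi] of all encoded paths below a hierarchy prefix."""
    remaining = sum(bits_per_level[len(prefix):])
    lo = encode_path(prefix, bits_per_level) << remaining
    hi = lo | ((1 << remaining) - 1)
    return lo, hi


# Hypothetical widths for year - month - day: 2, 4 and 5 bits.
BITS = (2, 4, 5)
day = encode_path((1, 9, 30), BITS)          # year 1, month 9, day 30
lo, hi = prefix_range((1, 9), BITS)          # restriction on one month
print(day, lo, hi)
```

With such an encoding, a restriction on a month or a year covers one contiguous interval of encoded values instead of one point per day, which is the property the slide asks for.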
9
Implementation of MHC
- Implementation into Transbase® DBMS kernel
  - Computation and maintenance of MHC encoding
  - Integration into DDL and DML
  - Integration into optimizer
  - Integration into archiving tools
- Transparency to users
  - Physical optimization
  - No extension of the DML
10
Supported schemata
- Support of star schema and snowflake schema
- Star schemata
  - Conventional complete de-normalization of the dimension tables
  - Foreign key relationships between fact table and dimension tables
- Supported snowflake schemata
  - Inner dimension tables de-normalized with hierarchy attributes
  - Feature attributes can be normalized
  - Fully supported by optimizer
  - More efficient than star schemata (knowledge about hierarchical dependency)
11
Transbase® DDL extension
Dimension table:

    CREATE TABLE dim_segment (
        country_id      INTEGER NOT NULL,
        country_txt     CHAR(*),
        region_id       INTEGER NOT NULL,
        region_txt      CHAR(*),
        micromarket_id  INTEGER NOT NULL,
        micromarket_txt CHAR(*),
        outlet_id       INTEGER NOT NULL,
        outlet_txt      CHAR(*),
        SURROGATE cs_segment COMPOUND (
            country_id     SIBLINGS 16,
            region_id      SIBLINGS 19,
            micromarket_id SIBLINGS 6,
            outlet_id      SIBLINGS 2202),
        PRIMARY KEY (outlet_id)
    )
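The SIBLINGS values in the DDL above suggest how much space a compound surrogate needs. The following sketch assumes that `SIBLINGS n` states the maximum number of children under one parent at that level and that each level is stored in the smallest number of bits able to number them; both points are assumptions of this example, not statements about the Transbase® storage format.

```python
def bits_for_siblings(siblings):
    """Bits needed to number `siblings` children as 0 .. siblings-1."""
    return max(1, (siblings - 1).bit_length())


# SIBLINGS values taken from the dim_segment definition.
levels = {
    "country_id": 16,
    "region_id": 19,
    "micromarket_id": 6,
    "outlet_id": 2202,
}
widths = {name: bits_for_siblings(n) for name, n in levels.items()}
total = sum(widths.values())
print(widths, total)
```

Under these assumptions the four levels need 4, 5, 3 and 12 bits, so the whole hierarchy path of an outlet fits into 24 bits.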
12
Transbase® DDL extension (cont.)
Fact table:

    CREATE TABLE fact (
        dseg   INTEGER REFERENCES dim_segment(outlet_id) ON UPDATE CASCADE,
        dprod  INTEGER REFERENCES dim_product(item_id) ON UPDATE CASCADE,
        dtime  INTEGER REFERENCES dim_time(day_id) ON UPDATE CASCADE,
        turnover NUMERIC(10,2) …
        SURROGATE cs_seg FOR dseg,
        SURROGATE cs_prod FOR dprod,
        SURROGATE cs_time FOR dtime,
        PRIMARY HCKEY (cs_seg, cs_prod, cs_time)
    )
13
DML
- No change of DML statements (SELECT, INSERT, UPDATE, DELETE)
- Conventional star (snowflake) joins (SQL-92 compliant):

    SELECT country, department, category, group,
           year, quarter, month,
           SUM(price), SUM(turnover)
    FROM customer c, product p, date d, fact f
    WHERE f.custkey = c.customer
      AND f.prodkey = p.item_key
      AND f.datekey = d.day
      AND c.country = 'GERMANY'
      AND c.department = 'SOUTH'
      AND p.category = 'TV'
      AND d.month = '10/2002'
      AND d.year = '2002'
    GROUP BY country, department, category, group,
             year, quarter, month
14
Conventional query processing
Standard method (non-clustering indexes):
1. Index evaluation of dimension restrictions
2. Fact table tuple materialization
3. Residual join with dimension tables
4. Grouping and aggregating
5. Sorting
15
MHC query processing: Overview
- Abstract execution plan (AEP): better understanding, implementation in operator trees
- Three phases:
  1. Interval generation (semi-join)
  2. Fact table access
  3. Grouping and residual join
- Optimizing: hierarchical pre-grouping
  - Minimize residual join operations by grouping before joining
16
AEP - overview
[Figure: operator tree of the abstract execution plan. Labels: Interval Generation, Create Range, Di ... Dj, Main Execution Phase, Fact, Fact Table Access, Residual Join, Dk, Di ..., Group Select, Having, Order By]
17
Interval generation
- Mapping of hierarchical restrictions into a number of intervals
- Usage of special hierarchy indexes:
  - DXh index: (h_t, h_t-1, ..., h_1, cs)
  - Efficient interval computation
- Optimization for feature restrictions:
  - Merging many small intervals into fewer large intervals
  - Usage of hierarchical dependency for feature attributes, if supported by the schema (snowflake schemata)
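Merging many small intervals into fewer large ones can be sketched as a single pass over the sorted intervals. The function `merge_intervals` and its `max_gap` parameter are hypothetical; the slides do not state which merging rule the optimizer applies, so this is only one plausible reading.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge closed integer intervals that overlap, touch, or lie
    within `max_gap` values of each other."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1 + max_gap:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(iv) for iv in merged]


small = [(800, 831), (832, 863), (900, 931)]
print(merge_intervals(small))              # adjacent intervals are joined
print(merge_intervals(small, max_gap=40))  # small gaps are bridged as well
```

Bridging gaps yields fewer intervals but also covers values that do not satisfy the original restriction, so a variant like `max_gap > 0` would need a filter on the retrieved tuples afterwards.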
18
Fact table access
- Combination of intervals of all clustering dimensions forms multidimensional query boxes QB_i
- Fact table access with implicit tuple materialization
- Sequential processing of query boxes
- Fast retrieval of result tuples
- Postfiltering can be necessary depending on the UB-Tree dimensions and restrictions
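Forming query boxes from per-dimension intervals amounts to a Cartesian product. The sketch below uses hypothetical interval values and helper names (`query_boxes`, `in_box`) and ignores how the UB-Tree actually processes each box.

```python
from itertools import product


def query_boxes(intervals_per_dim):
    """Combine one interval list per dimension into multidimensional boxes.

    Each box is a tuple of (lo, hi) pairs, one pair per dimension.
    """
    return [tuple(box) for box in product(*intervals_per_dim)]


def in_box(point, box):
    """True if every coordinate lies inside the box's interval for that dimension."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, box))


# Hypothetical intervals on the three compound surrogates of the fact table.
seg_intervals = [(0, 15), (32, 47)]
prod_intervals = [(8, 11)]
time_intervals = [(800, 831)]

boxes = query_boxes([seg_intervals, prod_intervals, time_intervals])
print(len(boxes), boxes[0])
```

The number of boxes is the product of the interval counts per dimension, which is one reason why reducing the number of intervals during interval generation pays off.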
19
Standard AEP
[Figure: operator tree. Labels: Fact, Fact Table Access, Predicate Evaluation, Residual Join, Dk, Di ..., Group Select, Having, Order By]
20
Optimization: Hierarchical pre-grouping
- Basic concept
  - Hierarchy encoding stored in fact table (compound surrogates)
  - Groups of hierarchical GROUP BY attributes built from compound surrogates
  - Grouping not exact for non-prefix path grouping
  - Drastic reduction of fact table result tuples
- Example (for hierarchy year – month – day):
  - Number of fact table result tuples: 100,000
  - Pre-grouping (on month): ca. 3,000 (aggregated) tuples
  - Residual join with 3,000 instead of 100,000 tuples
  - Reduction by a factor of about 30
- Post-grouping may be necessary if the pre-grouping is too fine
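The idea of grouping on the compound surrogate before joining can be sketched as follows. The example assumes a time surrogate with hypothetical widths of 2, 4 and 5 bits for year, month and day, so that dropping the lowest 5 bits leaves the year-month prefix; the function `pre_group` and the sample rows are illustrative only.

```python
from collections import defaultdict


def pre_group(fact_rows, drop_bits):
    """Aggregate a measure by the surrogate prefix left after
    dropping the `drop_bits` lowest bits (the finer hierarchy levels)."""
    groups = defaultdict(float)
    for cs_time, turnover in fact_rows:
        groups[cs_time >> drop_bits] += turnover
    return dict(groups)


DAY_BITS = 5  # assumed width of the day level

# (cs_time, turnover) pairs as they might come out of the fact table access.
rows = [(800, 10.0), (805, 5.0), (832, 2.5)]
monthly = pre_group(rows, DAY_BITS)
print(monthly)
```

Only the aggregated groups, one per month prefix, then take part in the residual join with the dimension table, instead of one tuple per fact row.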
21
Hierarchical pre-grouping (cont.)
[Figure: operator tree. Labels: Fact, Fact Table Access, Predicate Evaluation, Pre-Group, Residual Join with D_e1 ... D_ei, Residual Join with D_l1 ... D_ln, Post-Group, Having, Order By]
22
Performance comparison
- Data: real-world data warehouse of an electronics retailer in Greece
  - 5 dimensions, 49 measures on fact table
  - 3 years of transactions, i.e., 8.5 million fact table tuples (2.8 GB)
- Environment
  - 2-processor Pentium II (400 MHz), 768 MB RAM, Windows 2000
- Queries
  - 22 query classes with 1,320 real-world user queries
- Comparisons
  - MHC versus no multidimensional clustering
  - Conventional grouping versus hierarchical pre-grouping
23
Perf. comp: MHC – no clustering
Time of fact tuple access in seconds

FT Sel. %   [0.0-0.1]   [0.1-1.0]   [1.0-5.0]
            STAR AEP    STAR AEP    STAR AEP
MIN         0065227411
MAX         3062909121947
MEDIAN      11182847723
STD-DEV     5176334614
24
Perf. comp: no pre-grouping – pre-grouping
Comparison of grouping cardinality: no pre-grouping / hier. pre-grouping

FT Sel. %      All          [0.0 - 0.25]   [0.25 - 1.0]   [1.0 - 10.0]
MIN            3.6          3.6            21.3           46.0
1. Quartile    245.8        135.1          911.3          816.2
MEDIAN         1,139.5      531.6          2,270.4        5,938.9
3. Quartile    4,708.0      1,905.6        9,747.5        25,409.6
MAX            593,280.0    19,340.0       78,384.0       593,280.0
25
Perf. comp: no pre-grouping – pre-grouping
Speedup of the time of hierarchical pre-grouping

FT Sel. %      All     [0.0 - 0.25]   [0.25 - 1.0]   [1.0 - 10.0]
MIN            0.3     0.3            0.8            0.6
1. Quartile    3.0     2.4            3.9            4.6
MEDIAN         4.4     3.6            5.8            6.6
3. Quartile    6.5     5.2            7.2            7.8
MAX            25.5    14.3           25.5           12.6
26
Summary
- MHC: Multidimensional hierarchical clustering
  - Encoding for hierarchy paths, in order to support clustering multidimensional indexes
  - Support of star and snowflake schemata
- Full implementation into Transbase®
  - Integration into the query processor (maintenance of compound surrogates)
  - Integration into the optimizer (interval generation, fact table access, hierarchical pre-grouping)
- Significant speedup of performance:
  - Clustering vs. non-clustering organization: factor of 2 to 20
  - Conventional grouping vs. hierarchical pre-grouping: factor of 4 to 7
27
Questions?
Everything clear? Otherwise contact:
Roland Pieringer
Tel: 089/62709-0
Transaction Software GmbH
pieringer@transaction.de
www.transaction.de