Transbase® Hypercube: A leading-edge ROLAP Engine supporting multidimensional Indexing and Hierarchy Clustering Roland Pieringer Transaction Software GmbH.


1 Transbase® Hypercube: A leading-edge ROLAP Engine supporting multidimensional Indexing and Hierarchy Clustering Roland Pieringer Transaction Software GmbH Thomas-Dehler-Str. 18 81737 München, Germany www.transaction.de

2 Feb 2003 - 2 - BTW 2003 Transbase® Hypercube ©2003 Transaction Software GmbH www.transaction.de
Motivation
Many applications have multidimensional (MD) data
Multidimensional indexes support retrieval of MD data
Application field: data warehouses
  - Hierarchically organized dimensions (e.g., year – month – day)
  - Large data volumes
  - Relatively static
  - Mainly retrieval query profile
MD indexes usually support numeric MD data
Encoding for hierarchical data necessary
  - Multidimensional Hierarchical Clustering (MHC)

3 Theoretical comparison of range query performance
[Figure: range query performance compared for the ideal case, a multidimensional index, multiple B-Trees or bitmap indexes, and a compound primary B-Tree]

4 UB-Tree: basic concepts
Combination of B+-Tree and Z-curve
  - Z-curve is used to map multidimensional points to one-dimensional values (Z-values)
  - Z-values are used as keys in B*-Tree
  - Z-curve preserves spatial proximity
  - Symmetric clustering
[Figure: UB-Tree with index part and data part]
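The mapping from a multidimensional point to a Z-value can be sketched by interleaving the bits of the coordinates. This is a minimal illustration, not the Transbase® implementation: the function name `z_value`, the fixed bit width per coordinate, and the choice to take the first coordinate's bit first are all assumptions made for the example.

```python
def z_value(point, bits):
    """Interleave the bits of each coordinate into one Z-value (Morton code).

    point: tuple of non-negative integers, one per dimension
    bits:  number of bits used per coordinate (assumed equal for all dimensions)
    """
    z = 0
    for bit in range(bits - 1, -1, -1):   # most significant bit first
        for coord in point:               # first dimension contributes first
            z = (z << 1) | ((coord >> bit) & 1)
    return z


# Points that are close in space tend to get nearby Z-values,
# so sorting by Z-value clusters them on neighbouring pages.
points = [(3, 5), (0, 0), (7, 7), (3, 4)]
ordered = sorted(points, key=lambda p: z_value(p, 3))
```

Sorting by the resulting integer gives the one-dimensional key order that a B-Tree variant can store, which is what makes the clustering symmetric across dimensions.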

5 Visualized range-queries
[Figure: range queries over a geography hierarchy (Germany; Sachsen, Bayern; Freiberg, Leipzig, Dresden, Burgh, München, Passau) and a time axis (Feb 2003, Mar 2003, Jun 2003)]

6 MHC: Non-clustered hierarchy

7 MHC: Clustered hierarchy

8 Basic technology of MHC
MHC: Multidimensional Hierarchical Clustering
MHC necessary because
  - Hierarchical organization of dimensions in warehouses
  - No intervals for hierarchical restrictions
  - Naive restrictions lead to many point queries instead of one interval on UB-Tree
Artificial encoding of hierarchies:
  - Mapping of hierarchy restrictions to range restrictions
  - Mapping is used for physical clustering of the fact table
  - Modification of query algorithms necessary
  - Fast computation and space efficient
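How a hierarchy restriction becomes a single range can be sketched as follows. The sketch assumes that a hierarchy path is encoded by concatenating one fixed-width surrogate per level, root level first; the function names, the bit widths, and the surrogate numbers are hypothetical and chosen only for illustration.

```python
def compound_surrogate(path, widths):
    """Concatenate per-level surrogates (root level first) into one integer."""
    cs = 0
    for surrogate, width in zip(path, widths):
        if not 0 <= surrogate < (1 << width):
            raise ValueError("surrogate does not fit its level width")
        cs = (cs << width) | surrogate
    return cs


def prefix_range(prefix, widths):
    """Closed interval [lo, hi] of all compound surrogates below a path prefix."""
    free_bits = sum(widths[len(prefix):])   # bits of the unrestricted lower levels
    base = compound_surrogate(prefix, widths[:len(prefix)]) << free_bits
    return base, base + (1 << free_bits) - 1


# Hypothetical widths for the hierarchy year - month - day.
WIDTHS = (2, 4, 5)

one_day = compound_surrogate((1, 9, 14), WIDTHS)   # a single leaf
one_month = prefix_range((1, 9), WIDTHS)           # restriction on year and month
one_year = prefix_range((1,), WIDTHS)              # restriction on year only
```

With such an encoding, a restriction like "month 9 of year 1" is one interval on the encoded dimension rather than a point query per day.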

9 Implementation of MHC
Implementation into Transbase® DBMS kernel
  - Computation and maintenance of MHC encoding
  - Integration into DDL and DML
  - Integration into optimizer
  - Integration into archiving tools
Transparency to users
  - Physical optimization
  - No extension of the DML

10 Supported schemata
Support of star schema and snowflake schema
Star schemata
  - Conventional complete denormalization of the dimension tables
  - Foreign key relationships between fact table and dimension tables
Supported snowflake schemata
  - Inner dimension tables denormalized with hierarchy attributes
  - Feature attributes can be normalized
  - Fully supported by optimizer
  - More efficient than star schemata (knowledge about hierarchical dependency)

11 Transbase® DDL extension
Dimension Table:
    CREATE TABLE dim_segment (
        country_id      INTEGER NOT NULL,
        country_txt     CHAR(*),
        region_id       INTEGER NOT NULL,
        region_txt      CHAR(*),
        micromarket_id  INTEGER NOT NULL,
        micromarket_txt CHAR(*),
        outlet_id       INTEGER NOT NULL,
        outlet_txt      CHAR(*),
        SURROGATE cs_segment COMPOUND (
            country_id     SIBLINGS 16,
            region_id      SIBLINGS 19,
            micromarket_id SIBLINGS 6,
            outlet_id      SIBLINGS 2202),
        PRIMARY KEY (outlet_id)
    )
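The SIBLINGS values can be read as sizing hints for the compound surrogate. The following is a rough sketch under two assumptions that the slides do not confirm: that `SIBLINGS n` states the maximum number of children under one parent at that level, and that each level is stored in a fixed-width binary field of the smallest sufficient size. The helper name `bits_for_siblings` is hypothetical.

```python
def bits_for_siblings(max_siblings):
    """Smallest number of bits that can number max_siblings children (0-based)."""
    return max(1, (max_siblings - 1).bit_length())


# SIBLINGS values taken from the dim_segment definition.
siblings = {
    "country_id": 16,
    "region_id": 19,
    "micromarket_id": 6,
    "outlet_id": 2202,
}

widths = {level: bits_for_siblings(n) for level, n in siblings.items()}
total_bits = sum(widths.values())   # size of one compound surrogate under these assumptions
```

Under these assumptions the whole four-level path would fit into a single small integer, which fits the "fast computation and space efficient" goal stated for the encoding.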

12 Transbase® DDL extension (cont.)
Fact Table:
    CREATE TABLE fact (
        dseg  INTEGER REFERENCES dim_segment(outlet_id) ON UPDATE CASCADE,
        dprod INTEGER REFERENCES dim_product(item_id) ON UPDATE CASCADE,
        dtime INTEGER REFERENCES dim_time(day_id) ON UPDATE CASCADE,
        turnover NUMERIC(10,2),
        …
        SURROGATE cs_seg FOR dseg,
        SURROGATE cs_prod FOR dprod,
        SURROGATE cs_time FOR dtime,
        PRIMARY HCKEY (cs_seg, cs_prod, cs_time)
    )

13 DML
No change of DML statements (SELECT, INSERT, UPDATE, DELETE)
Conventional star (snowflake) joins (SQL-92 compliant):
    SELECT country, department, category, group, year, quarter, month,
           SUM(price), SUM(turnover)
    FROM customer c, product p, date d, fact f
    WHERE f.custkey = c.customer
      AND f.prodkey = p.item_key
      AND f.datekey = d.day
      AND c.country = 'GERMANY'
      AND c.department = 'SOUTH'
      AND p.category = 'TV'
      AND d.month = '10/2002'
      AND d.year = '2002'
    GROUP BY country, department, category, group, year, quarter, month

14 Conventional query processing
Standard method (non-clustering indexes):
  - Index evaluation of dimension restrictions
  - Fact table tuple materialization
  - Residual join with dimension tables
  - Grouping and aggregating
  - Sorting

15 MHC query processing: Overview
Abstract execution plan (AEP): better understanding, implementation in operator trees
Three phases:
  - Interval generation (semi-join)
  - Fact table access
  - Grouping and residual join
Optimizing: hierarchical pre-grouping
  - Minimize residual join operations by grouping before joining

16 AEP - overview
[Figure: abstract execution plan. Labels: Interval Generation, Create Range, D_i, D_j, ...; Main Execution Phase, Fact, Fact Table Access, Residual Join, D_k, D_i, ..., Group Select, Having, Order By]

17 Interval generation
Mapping of hierarchical restrictions into a number of intervals
Usage of special hierarchy indexes:
  - DXh index: (h_t, h_t-1, ..., h_1, cs)
  - Efficient interval computation
Optimization for feature restrictions:
  - Merging many small intervals into fewer, larger intervals
  - Usage of hierarchical dependency for feature attributes, if supported by the schema (snowflake schemata)
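The merging step can be illustrated with a small interval-coalescing routine. This is a generic sketch, not the optimizer's actual algorithm: the function name and the `max_gap` parameter are assumptions, and merging across a gap is assumed to cost extra post-filtering of the tuples that fall into the gap.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge closed integer intervals that overlap, touch, or lie at most
    max_gap values apart; returns a sorted list of (lo, hi) tuples."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo - merged[-1][1] - 1 <= max_gap:
            # Close enough to the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(interval) for interval in merged]


# Hypothetical compound-surrogate intervals for three restricted hierarchy members.
small = [(832, 863), (800, 831), (900, 910)]
exact = merge_intervals(small)               # only adjacent intervals are joined
coarse = merge_intervals(small, max_gap=40)  # trades precision for fewer intervals
```

Fewer intervals mean fewer range lookups on the fact table, at the price of reading some tuples that do not satisfy the original restriction when a gap is bridged.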

18 Fact table access
Combination of intervals of all clustering dimensions forms multidimensional query boxes QB_i
Fact table access with implicit tuple materialization
Sequential processing of query boxes
Fast retrieval of result tuples
Postfiltering can be necessary depending on the UB-Tree dimensions and restrictions
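Forming query boxes from per-dimension intervals amounts to a Cartesian product. The sketch below assumes each dimension contributes a list of closed (lo, hi) intervals on its encoded values; the function names and the sample numbers are hypothetical.

```python
from itertools import product


def query_boxes(intervals_per_dimension):
    """Combine one interval from every dimension into each query box."""
    return list(product(*intervals_per_dimension))


def in_box(point, box):
    """True if the point lies inside the box in every dimension."""
    return all(lo <= value <= hi for value, (lo, hi) in zip(point, box))


# Hypothetical intervals for three clustering dimensions.
segment = [(800, 831)]
products = [(0, 15), (32, 47)]
time = [(512, 1023)]

boxes = query_boxes([segment, products, time])
```

The number of boxes is the product of the interval counts per dimension, which is one reason to keep the interval lists short before the fact table is accessed.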

19 Standard AEP
[Figure: standard abstract execution plan. Labels: Fact, Fact Table Access, Predicate Evaluation, Residual Join, D_k, D_i, ..., Group Select, Having, Order By]

20 Optimization: Hierarchical pre-grouping
Basic concept
  - Hierarchy encoding stored in fact table (compound surrogates)
  - Groups of hierarchical GROUP BY attributes built from compound surrogates
  - Grouping not exact for non-prefix path grouping
  - Drastic reduction of fact table result tuples
  - Example (for hierarchy year – month – day):
    number of fact table result tuples: 100,000
    pre-grouping (on month): approx. 3,000 (aggregated) tuples
    residual join with 3,000 instead of 100,000 tuples
    reduction by a factor of about 30
  - Post-grouping may be necessary if the pre-grouping is too fine
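Pre-grouping on a hierarchy prefix can be sketched by truncating the compound surrogate before aggregating. The example assumes a year, month, day layout in which the day occupies the lowest 5 bits of the surrogate; the function name, the layout, and the sample rows are hypothetical.

```python
from collections import defaultdict


def pre_group(fact_rows, drop_bits):
    """Sum the measure per compound-surrogate prefix.

    fact_rows: iterable of (compound_surrogate, measure) pairs
    drop_bits: number of low-order bits (finer hierarchy levels) to discard
    """
    groups = defaultdict(float)
    for cs, measure in fact_rows:
        groups[cs >> drop_bits] += measure
    return dict(groups)


# Three fact tuples: two days of one month, one day of the next month.
rows = [(814, 10.0), (815, 5.0), (846, 2.5)]

# Dropping the 5 day bits groups on the (year, month) prefix.
by_month = pre_group(rows, drop_bits=5)
```

Only the aggregated groups then take part in the residual join with the dimension tables, which is where the reduction from 100,000 to roughly 3,000 tuples in the example above pays off.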

21 Hierarchical pre-grouping (cont.)
[Figure: abstract execution plan with pre-grouping. Labels: Fact, Fact Table Access, Predicate Evaluation, Pre-Group, Residual Join, D_e1 ... D_ei, D_l1 ... D_ln, Post-Group, Having, Order By]

22 Performance comparison
Data:
  - Real-world data warehouse of an electronic retailer in Greece
  - 5 dimensions, 49 measures on fact table
  - 3 years of transactions, i.e., 8.5 million fact table tuples (2.8 GB)
Environment
  - 2-processor Pentium II (400 MHz), 768 MB RAM, Windows 2000
Queries
  - 22 query classes with 1,320 real-world user queries
Comparisons
  - MHC versus no multidimensional clustering
  - Conventional grouping versus hierarchical pre-grouping

23 Perf. comp: MHC – no clustering
Time of fact tuple access in seconds

FT Sel. %    [0.0-0.1]      [0.1-1.0]      [1.0-5.0]
             STAR   AEP     STAR   AEP     STAR   AEP
MIN             0     0       65     2      274    11
MAX            30     6      290     9     1219    47
MEDIAN         11     1       82     8      477    23
STD-DEV         5     1       76     3      346    14

24 Perf. comp: no pre-grouping – pre-grouping
Comparison of grouping cardinality: no pre-grouping / hierarchical pre-grouping

FT Sel. %          All   [0.0 - 0.25]   [0.25 - 1.0]   [1.0 - 10.0]
MIN                3.6            3.6           21.3           46.0
1st Quartile     245.8          135.1          911.3          816.2
MEDIAN         1,139.5          531.6        2,270.4        5,938.9
3rd Quartile   4,708.0        1,905.6        9,747.5       25,409.6
MAX          593,280.0       19,340.0       78,384.0      593,280.0

25 Perf. comp: no pre-grouping – pre-grouping
Speedup of the time of hierarchical pre-grouping

FT Sel. %       All   [0.0 - 0.25]   [0.25 - 1.0]   [1.0 - 10.0]
MIN             0.3            0.3            0.8            0.6
1st Quartile    3.0            2.4            3.9            4.6
MEDIAN          4.4            3.6            5.8            6.6
3rd Quartile    6.5            5.2            7.2            7.8
MAX            25.5           14.3           25.5           12.6

26 Summary
MHC: Multidimensional hierarchical clustering
  - Encoding for hierarchy paths, in order to support clustering multidimensional indexes
  - Support of star and snowflake schemata
Full implementation in Transbase®
  - Integration into the query processor (maintenance of compound surrogates)
  - Integration into the optimizer (interval generation, fact table access, hierarchical pre-grouping)
Significant speedup of performance:
  - Clustering vs. non-clustering organization: factor of 2 to 20
  - Conventional grouping vs. hierarchical pre-grouping: factor of 4 to 7

27 Questions?
Everything clear? Otherwise contact:
Roland Pieringer
Transaction Software GmbH
Tel: 089/62709-0
pieringer@transaction.de
www.transaction.de

