Published by Eric Sherman Atkinson. Modified over 9 years ago.
1
CURE for Cubes: Cubing Using a ROLAP Engine. Konstantinos Morfonios, Yannis Ioannidis. University of Athens. VLDB 2006
2
Introduction; Execution Plan; External Partitioning; Storage Format; Experimental Evaluation; Conclusions
4
Introduction
SELECT region, SUM(revenue) FROM SALES WHERE month = 'September' GROUP BY region
Gray On Data-warehousing: CUBE
5
Introduction
SELECT A, B, C, SUM(M) FROM R GROUP BY A, B, C
SELECT A, B, SUM(M) FROM R GROUP BY A, B
SELECT SUM(M) FROM R
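The queries on this slide are three of the 2^3 = 8 group-bys that make up the cube over the dimensions A, B, C. A minimal Python sketch of that idea follows; the toy relation `R` and the helper `cube` are assumptions made up for illustration and are not part of CURE.

```python
from itertools import combinations
from collections import defaultdict

def cube(rows, dims, measure):
    """Return {grouping attributes: {group key: SUM(measure)}} for every subset of dims."""
    result = {}
    for k in range(len(dims), -1, -1):
        for group_by in combinations(dims, k):
            sums = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group_by)
                sums[key] += row[measure]
            result[group_by] = dict(sums)
    return result

# Toy fact table R(A, B, C, M); the values are invented for illustration.
R = [
    {"A": 1, "B": 1, "C": 1, "M": 10},
    {"A": 1, "B": 2, "C": 1, "M": 20},
    {"A": 2, "B": 1, "C": 2, "M": 30},
]

cube_R = cube(R, ("A", "B", "C"), "M")
print(len(cube_R))          # number of group-bys in the cube
print(cube_R[("A", "B")])   # SELECT A, B, SUM(M) FROM R GROUP BY A, B
print(cube_R[()])           # SELECT SUM(M) FROM R
```

Each group-by here rescans `R` from scratch, which is the naive approach; it only shows what the cube contains, not how to build it efficiently.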
6
Introduction. Problems: construction algorithm; storage scheme. Focusing on ROLAP techniques (MVs). Stressed to limits? Complete solution? Unclear (not finished with efficient storage). Unclear (not focused on hierarchies).
7
Introduction. Challenges of hierarchies → CURE. Number of nodes: often … → efficient execution plan. Small domains in the higher levels of dimension hierarchies → new partitioning algorithm. Number of tuples increases → novel storage scheme.
10
Execution Plan. Extend BUC (Bottom-Up-Cube) [BR99]: efficient pipelining; cheap identification of some kinds of redundancy; inherent support for iceberg cubes and holistic functions. Existing “BUC-based” methods: BU-BST [WLFY02] and QC-Tables [LPH02].
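To make the bottom-up recursion and its iceberg pruning concrete, here is a simplified in-memory sketch of BUC as it is commonly described. It is not CURE's extended plan. The toy relation `R`, the function name `buc`, and the use of a tuple count as the iceberg condition (`min_support`) are assumptions for illustration.

```python
from collections import defaultdict

def buc(rows, dims, measure, min_support=1):
    """Bottom-up cubing sketch: returns {(group-by attrs, key values): SUM(measure)}."""
    out = {}

    def recurse(part, start, attrs, key):
        # Aggregate the current partition and emit one cube tuple.
        out[(attrs, key)] = sum(r[measure] for r in part)
        for i in range(start, len(dims)):
            d = dims[i]
            groups = defaultdict(list)
            for r in part:
                groups[r[d]].append(r)
            for value, sub in groups.items():
                # Iceberg pruning: a partition below the threshold is skipped,
                # and so is every finer group-by that would be derived from it.
                if len(sub) >= min_support:
                    recurse(sub, i + 1, attrs + (d,), key + (value,))

    recurse(rows, 0, (), ())
    return out

# Toy fact table R(A, B, C, M); the values are invented for illustration.
R = [
    {"A": 1, "B": 1, "C": 1, "M": 10},
    {"A": 1, "B": 2, "C": 1, "M": 20},
    {"A": 2, "B": 1, "C": 2, "M": 30},
]

full = buc(R, ("A", "B", "C"), "M")                    # complete cube
iceberg = buc(R, ("A", "B", "C"), "M", min_support=2)  # groups with at least 2 tuples
print(len(full), len(iceberg))
```

Because every recursive call works on a partition of its caller's input, a pruned partition cuts off the whole subtree below it, which is why pruning comes cheaply in this family of algorithms.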
11
Execution Plan. Dimensions: A, B, C. [Figure: cube lattice with nodes ABC; AB, AC, BC; A, B, C]
12
Execution Plan. Dimensions: A0 → A1 → A2, B0 → B1, C0
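With hierarchies, a cube node picks at most one level from each dimension, so the number of nodes is the product of (levels + 1) over the dimensions. The sketch below enumerates the nodes for the hierarchies on this slide; the helper name `lattice_nodes` and the use of an empty string for the fully aggregated node are assumptions for illustration.

```python
from itertools import product

# Hierarchy levels per dimension, finest level first, as on the slide.
hierarchies = {
    "A": ["A0", "A1", "A2"],
    "B": ["B0", "B1"],
    "C": ["C0"],
}

def lattice_nodes(hierarchies):
    """Each node takes one level per dimension or omits the dimension (None = ALL)."""
    choices = [levels + [None] for levels in hierarchies.values()]
    nodes = []
    for combo in product(*choices):
        nodes.append("".join(level for level in combo if level is not None))
    return nodes

nodes = lattice_nodes(hierarchies)
print(len(nodes))                    # (3 + 1) * (2 + 1) * (1 + 1)
print(sorted(n for n in nodes if n)) # the named nodes; "" is the grand-total node
```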
13
Execution Plan. Dimensions: A0, A1, A2, B0, B1, C0. [Figure: execution plan over the nodes A0B0C0, A0B1C0, A1B0C0, A1B1C0, A2B0C0, A2B1C0, A0B0, A0B1, A1B0, A1B1, A2B0, A2B1, A0C0, A1C0, A2C0, B0C0, B1C0, A0, A1, A2, B0, B1, C0]
15
Execution Plan. Height: 3. Dimensions: A0, A1, A2, B0, B1, C0. [Figure: execution plan over the 23 nodes from A0B0C0 to A0, A1, A2, B0, B1, C0]
16
Execution Plan. Dimensions: A0 → A1 → A2, B0 → B1, C0. [Figure: execution plan over the nodes A2, A1, A0, B1, B0, C0, A2B1, A2B0, A2C0, A1B1, A1B0, A1C0, A0B1, A0B0, A0C0, B1C0, B0C0, A2B1C0, A2B0C0, A1B1C0, A1B0C0, A0B1C0, A0B0C0]
18
Execution Plan. Height: 6. Dimensions: A0 → A1 → A2, B0 → B1, C0. [Figure: execution plan over the 23 nodes from A2, B1 and C0 to A0B0C0]
19
Execution Plan. Important properties of BUC-based cubing: recursive calls at higher levels tend to be cheaper; the benefit of pruning the recursion early at some node N increases with the number of ancestors of N in the execution plan. Advantage of taller execution plans. [Figures: plan over ABC, AB, AC, BC, A, B, C; plan over ABC, AB, AC, A]
20
Execution Plan. CURE's plan: [Figure: execution plan over the 23 nodes from A2, B1 and C0 to A0B0C0]
23
External Partitioning. [Figure labels: Memory, R]
24
External Partitioning. [Figure: R and two copies of the execution plan over the 23 nodes from A2, B1 and C0 to A0B0C0]
25
External Partitioning. [Figure labels: R, Memory]
26
[Figure labels: R, Memory, Partitions]
27
External Partitioning. [Figure labels: R, Memory, Sound]
28
External Partitioning. For sound partitioning: |Biggest partition| ≤ |M|. In flat datasets this holds in general. In hierarchical datasets…
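The soundness condition can be checked directly once partition sizes are known. The sketch below measures sizes in tuples rather than bytes and uses invented toy data; both are assumptions for illustration, as is the helper name `is_sound`. It shows why a level with a small domain can break the condition.

```python
from collections import Counter

def is_sound(rows, attr, memory_rows):
    """True if every partition of `rows` on `attr` holds at most `memory_rows` tuples."""
    sizes = Counter(row[attr] for row in rows)
    return max(sizes.values(), default=0) <= memory_rows

# Toy data: 1,000 tuples; A0 has 100 distinct values, its ancestor level A2 only 2.
rows = [{"A0": i % 100, "A2": i % 2} for i in range(1000)]

print(is_sound(rows, "A0", memory_rows=50))  # partitions of 10 tuples each
print(is_sound(rows, "A2", memory_rows=50))  # partitions of 500 tuples each
```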
29
External Partitioning. |R| = 500 GB, |M| = 1 GB. A0 (50,000) → A1 (500) → A2 (5). |R|/|M| = 500. [Figure: execution plan over the 23 nodes from A2, B1 and C0 to A0B0C0]
35
External Partitioning. |R| = 500 GB, |M| = 1 GB. A0 (50,000) → A1 (500) → A2 (5). |R|/|M| = 500. [Figure: execution plan with the nodes regrouped: first A2B1, B1, A2, C0, A2C0, B0, B1C0, A2B0, B0C0, A2B1C0, A2B0C0; then A1, A0, A1B1, A1C0, A0B1, A0C0, A1B0, A1B1C0, A0B0, A0B1C0, A1B0C0, A0B0C0]
37
External Partitioning. |R| = 500 GB, |M| = 1 GB. A0 (50,000) → A1 (500) → A2 (5). |R|/|M| = 500. A2B0C0 is |A0|/|A2| times smaller than R: |A2B0C0| ≈ 50 MB. [Figure: execution plan with the nodes regrouped into those without A0 or A1 and those with A0 or A1]
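The numbers on this slide can be reproduced with a few lines of arithmetic. The sketch assumes uniformly distributed data and 1 GB = 1000 MB, neither of which is stated on the slide; the variable names are illustrative.

```python
R_GB = 500   # |R|
M_GB = 1     # |M|
cardinality = {"A0": 50_000, "A1": 500, "A2": 5}

# Minimum number of partitions so that each one fits in memory.
min_partitions = R_GB // M_GB

# Average partition size when R is split on each level (uniform data assumed).
avg_partition_gb = {level: R_GB / n for level, n in cardinality.items()}

# Levels whose average partition fits in memory.
sound_levels = [level for level, size in avg_partition_gb.items() if size <= M_GB]

# Size estimate from the slide: A2B0C0 is |A0|/|A2| times smaller than R.
shrink = cardinality["A0"] // cardinality["A2"]
a2b0c0_mb = R_GB * 1000 / shrink

print(min_partitions, sound_levels, shrink, a2b0c0_mb)
```

Under these assumptions, partitioning on A2 yields five partitions of about 100 GB each, far above |M|, while the node A2B0C0 itself is small enough to fit in memory.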
41
Storage Format. Two types of redundancy: Dimensional Redundancy (DR) and Aggregational Redundancy (AR).
42
Storage Format. Example with a flat cube only, for simplicity. [Figures: flat cube lattice with nodes ABC; AB, AC, BC; A, B, C, and the hierarchical execution plan over the 23 nodes from A2, B1 and C0 to A0B0C0]
43
Storage Format. CUBE with DR; CUBE' without DR. [Figure labels: t, t1, t2, t']
46
Storage Format. CUBE with DR; CUBE' without DR.
48
Storage Format. CUBE with DR; CUBE' without DR. Classify tuples according to AR into: Normal Tuples (NTs), Trivial Tuples (TTs), Common Aggregate Tuples (CATs).
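The slides name the three classes without defining them. The sketch below assumes the following definitions, which should be checked against the paper: a TT is a cube tuple derived from a single source tuple, a CAT is derived from the same set of several source tuples as some other cube tuple (so their aggregates coincide), and an NT is anything else. The relation `R` and the helper `classify` are invented for illustration, and the code materializes the whole cube first, which is meant only to explain the classes.

```python
from itertools import combinations
from collections import Counter, defaultdict

def classify(rows, dims):
    """Label every cube tuple as 'TT', 'CAT' or 'NT' under the assumed definitions."""
    sources = {}  # (group-by attrs, key) -> frozenset of source row ids
    for k in range(len(dims) + 1):
        for attrs in combinations(dims, k):
            groups = defaultdict(set)
            for rid, row in enumerate(rows):
                groups[tuple(row[d] for d in attrs)].add(rid)
            for key, rids in groups.items():
                sources[(attrs, key)] = frozenset(rids)

    # How many cube tuples are produced by each distinct set of source rows.
    shared = Counter(sources.values())

    labels = {}
    for tup, rids in sources.items():
        if len(rids) == 1:
            labels[tup] = "TT"    # derived from a single source tuple
        elif shared[rids] > 1:
            labels[tup] = "CAT"   # same source set as another cube tuple
        else:
            labels[tup] = "NT"
    return labels

# Toy fact table R(A, B, C); the values are invented for illustration.
R = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 1, "C": 2},
]

labels = classify(R, ("A", "B", "C"))
print(Counter(labels.values()))
```

In this toy cube the tuples for A=1, C=1 and (A=1, C=1) all aggregate the same two source rows, so under the assumed definition they are CATs and their aggregate values need to be stored only once.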
49
Storage Format
58
Purpose of the previous example: explanation of the different types of redundancy, not the construction algorithm. Constructing an uncompressed cube and then compressing it would be inefficient. Instead, CURE classifies tuples during construction itself (details in the paper).
61
Experimental Evaluation. Hierarchical datasets: APB-1. Product: Code (6,500) → Class (435) → Group (215) → Family (54) → Line (11) → Division (3). Customer: Store (640) → Retailer (71). Time: Month (17) → Quarter (6) → Year (2). Channel: Base (9). Flat datasets: CovType, Sep85L, Synthetic.
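For a sense of scale, the APB-1 hierarchies listed on this slide determine how many nodes a complete cube has, and an upper bound on the number of distinct tuples at the finest node. This is plain arithmetic on the slide's figures; the variable names are illustrative.

```python
from math import prod

# Number of hierarchy levels per APB-1 dimension, as listed on the slide.
levels = {"Product": 6, "Customer": 2, "Time": 3, "Channel": 1}

# Cardinality of the finest level of each dimension, as listed on the slide.
base_cardinality = {"Product": 6_500, "Customer": 640, "Time": 17, "Channel": 9}

# Each dimension contributes one of its levels or is aggregated away (ALL).
num_nodes = prod(n + 1 for n in levels.values())

# Upper bound on distinct tuples in the finest node (Code, Store, Month, Base).
max_base_cells = prod(base_cardinality.values())

print(num_nodes, max_base_cells)
```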
62
Experimental Evaluation. Two versions of CURE: CURE and CURE+.
63
Experimental Evaluation Less than 3 hours
64
Experimental Evaluation ≈ 6.8 GB
65
Experimental Evaluation
70
Main contribution: CURE. Efficient execution plan; new partitioning algorithm; novel storage scheme. Main advantages of CURE: efficient construction of complete cubes over large datasets with arbitrary hierarchies; cube compression; optimization opportunities for queries and updates; easy implementation.
71
Current and Future Work: study of indexing for queries and updates; comparison with the most prominent MOLAP and tree-based techniques.
72
Questions???
73
Thank you!
74
Storage Format. [Figure labels: Memory Image, Disk Image]
75
Storage Format. [Figure labels: Memory Image, Disk Image; values 45, 65, 100, 110, 150]
76
Storage Format. [Figure labels: Memory Image, Disk Image; value 150]
80
Storage Format. [Figure labels: Memory Image, Disk Image; values 20, 30]
81
Storage Format. [Figure labels: Memory Image, Disk Image; value 30]