CURE for Cubes: Cubing Using a ROLAP Engine. Konstantinos Morfonios, Yannis Ioannidis. University of Athens. VLDB 2006.



2 Outline: Introduction, Execution Plan, External Partitioning, Storage Format, Experimental Evaluation, Conclusions


4 Introduction
SELECT region, sum(revenue) FROM SALES WHERE month = 'September' GROUP BY region
Gray On Data-warehousing: CUBE

5 Introduction
SELECT A, B, C, SUM(M) FROM R GROUP BY A, B, C
SELECT A, B, SUM(M) FROM R GROUP BY A, B
SELECT SUM(M) FROM R
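The CUBE is the union of the GROUP BY queries over every subset of the dimensions, 2^n of them for n flat dimensions. A naive Python sketch, assuming rows are dicts and SUM is the aggregate; `cube` and the toy relation are illustrative. It recomputes every group-by from the raw rows, which is exactly the cost that dedicated cubing algorithms try to avoid.

```python
from itertools import combinations
from collections import defaultdict

def cube(rows, dims, measure):
    """SUM(measure) for every subset of `dims` (the CUBE operator).

    Returns {grouping_columns: {group_key: sum}}.
    """
    result = {}
    # From GROUP BY A, B, C down to the empty grouping (the grand total).
    for k in range(len(dims), -1, -1):
        for group_by in combinations(dims, k):
            sums = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group_by)
                sums[key] += row[measure]
            result[group_by] = dict(sums)
    return result


R = [
    {"A": "a1", "B": "b1", "C": "c1", "M": 10},
    {"A": "a1", "B": "b2", "C": "c1", "M": 20},
    {"A": "a2", "B": "b1", "C": "c2", "M": 5},
]
cb = cube(R, ["A", "B", "C"], "M")
print(len(cb))     # 8 group-bys for 3 dimensions
print(cb[()])      # {(): 35}, i.e. SELECT SUM(M) FROM R
```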

6 Introduction
Problems:
- Construction algorithm
- Storage scheme
Focusing on ROLAP techniques (materialized views, MVs):
- Stressed to limits?
- Complete solution?
Unclear (not finished with efficient storage); unclear (not focused on hierarchies)

7 Introduction
Challenges of hierarchies:
- Number of nodes: often … → Efficient execution plan
- Small domains in the higher levels of dimension hierarchies → New partitioning algorithm
- Number of tuples increases → Novel storage scheme
⇒ CURE


10 Execution Plan
Extend BUC (Bottom-Up Cube) [BR99]:
- Efficient pipelining
- Cheap identification of some kinds of redundancy
- Inherent support for iceberg cubes and holistic functions
Existing “BUC-based” methods: BU-BST [WLFY02] and QC-Tables [LPH02]
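BUC aggregates a partition, then re-partitions it on each remaining dimension and recurses, skipping any partition that falls below the support threshold. A minimal Python sketch of that recursion, assuming a COUNT-based iceberg condition (`minsup`) and SUM as the measure; the real algorithm [BR99] partitions in place (sorting or counting sort) and writes tuples out as it goes, which this dict-based version does not attempt. `buc` and its parameters are illustrative names.

```python
from collections import defaultdict

def buc(rows, dims, measure, minsup=1):
    """BUC-style bottom-up cubing with the iceberg condition COUNT(*) >= minsup.

    Returns {(grouping_dims, group_key): (count, sum_of_measure)}.
    """
    out = {}

    def recurse(part, start, group_dims, key):
        # Output the aggregate of the current partition (one cube tuple).
        out[(group_dims, key)] = (len(part), sum(r[measure] for r in part))
        # Refine by each remaining dimension, in a fixed dimension order.
        for i in range(start, len(dims)):
            d = dims[i]
            partitions = defaultdict(list)
            for r in part:
                partitions[r[d]].append(r)
            for value in sorted(partitions):
                sub = partitions[value]
                # Iceberg pruning: COUNT is anti-monotone, so if this partition is
                # below minsup, every finer group inside it is too; skip the recursion.
                if len(sub) >= minsup:
                    recurse(sub, i + 1, group_dims + (d,), key + (value,))

    if len(rows) >= minsup:
        recurse(rows, 0, (), ())
    return out


R = [
    {"A": "a1", "B": "b1", "C": "c1", "M": 10},
    {"A": "a1", "B": "b2", "C": "c1", "M": 20},
    {"A": "a2", "B": "b1", "C": "c2", "M": 5},
]
full = buc(R, ["A", "B", "C"], "M")               # minsup=1: the complete cube
iceberg = buc(R, ["A", "B", "C"], "M", minsup=2)  # only groups with at least 2 rows
print(len(full), len(iceberg))                    # 18 5
```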

11 Execution Plan Dimensions: A, B, C [Figure: BUC execution plan: ∅ → A, B, C; A → AB, AC; B → BC; AB → ABC]

12 Execution Plan Dimensions: A0 → A1 → A2, B0 → B1, C0
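With hierarchies, a group-by picks at most one level from each dimension, so the number of nodes is the product of (L_i + 1) over the dimensions, where L_i is the number of levels of dimension i, instead of 2^n. For this example that is 4 · 3 · 2 = 24 nodes, against 2^3 = 8 for a flat A, B, C cube. A small Python check; `lattice_nodes` is an illustrative helper.

```python
from itertools import product

# Hierarchies from the slide, finest level first.
HIERARCHIES = {"A": ["A0", "A1", "A2"], "B": ["B0", "B1"], "C": ["C0"]}

def lattice_nodes(hierarchies):
    """Every group-by of a hierarchical cube: at most one level per dimension."""
    # None stands for ALL, i.e. the dimension does not appear in the group-by.
    choices = [levels + [None] for levels in hierarchies.values()]
    return ["".join(lvl for lvl in combo if lvl is not None) or "∅"
            for combo in product(*choices)]

nodes = lattice_nodes(HIERARCHIES)
print(len(nodes))  # 24 = (3 + 1) * (2 + 1) * (1 + 1), versus 2**3 = 8 for flat A, B, C
```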

13 Execution Plan Dimensions: A0, A1, A2, B0, B1, C0 [Figure: execution plan over the 24 nodes of the lattice (∅, A0, …, A2B1C0), with the levels listed as separate dimensions]


15 Execution Plan Height: 3. Dimensions: A0, A1, A2, B0, B1, C0 [Figure: the same plan, annotated with its height]

16 Execution Plan Dimensions: A0 → A1 → A2, B0 → B1, C0 [Figure: execution plan over the same 24 nodes, following the hierarchies]


18 Execution Plan Height: 6. Dimensions: A0 → A1 → A2, B0 → B1, C0 [Figure: the hierarchy-aware plan, annotated with its height]
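One way to reproduce the two heights on slides 15 and 18 (an assumption about how such plans can be built, not CURE's published algorithm): in the first plan each node's parent drops its last grouped dimension, as BUC does for flat dimensions; in the second, the parent coarsens the last grouped dimension by one level and drops it only at its coarsest level, producing chains such as ∅ → A2 → A1 → A0. Both rules give every node a unique parent, so each defines a tree over the 24 nodes. All names are hypothetical.

```python
from itertools import product

LEVELS = [3, 2, 1]  # number of levels of A (A0->A1->A2), B (B0->B1), C (C0)

def all_nodes(levels):
    # Each node picks, per dimension, a level index (0 = finest) or None (= ALL).
    return list(product(*[list(range(n)) + [None] for n in levels]))

def last_grouped(node):
    # Index of the last dimension that is not ALL, or None for the empty node.
    idx = [i for i, lvl in enumerate(node) if lvl is not None]
    return idx[-1] if idx else None

def parent_flat(node, levels):
    # Levels treated as separate dimensions: drop the last grouped dimension.
    # (`levels` is unused; kept so both parent rules share one signature.)
    d = last_grouped(node)
    return node[:d] + (None,) + node[d + 1:]

def parent_hier(node, levels):
    # Hierarchy-aware: coarsen the last grouped dimension by one level,
    # or drop it if it is already at its coarsest level.
    d = last_grouped(node)
    coarser = node[d] + 1 if node[d] + 1 < levels[d] else None
    return node[:d] + (coarser,) + node[d + 1:]

def height(levels, parent):
    # Longest root-to-node path in the plan defined by `parent`.
    def depth(node):
        return 0 if last_grouped(node) is None else 1 + depth(parent(node, levels))
    return max(depth(n) for n in all_nodes(levels))

print(height(LEVELS, parent_flat))  # 3
print(height(LEVELS, parent_hier))  # 6
```

Under this construction the taller plan's height equals the total number of hierarchy levels (3 + 2 + 1), while the flat-style plan's height equals the number of dimensions.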

19 Execution Plan
Important properties of BUC-based cubing:
- Recursive calls at higher levels tend to be cheaper
- The benefits of pruning the recursion early at some node N increase with the number of ancestors of N in the execution plan
⇒ Advantage of taller execution plans
[Figure: the BUC plan for A, B, C, and the subtree rooted at A (A, AB, AC, ABC)]

20 Execution Plan CURE’s Plan: [Figure: CURE’s execution plan over the 24 nodes of the A0 → A1 → A2, B0 → B1, C0 lattice]


23 External Partitioning [Figure: relation R and main memory]

24 External Partitioning [Figure: relation R next to the cube lattice over A0 → A1 → A2, B0 → B1, C0, shown twice]

25 External Partitioning [Figure: relation R and main memory]

26 External Partitioning [Figure: R split into partitions]

27 External Partitioning [Figure: R, memory and a sound partitioning]

28 External Partitioning
For sound partitioning: |Biggest partition| ≤ |M|
- In flat datasets, this generally holds
- In hierarchical datasets…
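Once partition sizes are known, the soundness condition reduces to a single comparison. A minimal sketch, assuming fixed-size tuples (`tuple_bytes`) and value-based partitioning on a single attribute; the helper names and the toy rows are hypothetical. Partitioning on a coarse level with a small domain (A2 below) produces one oversized partition, while the finest level (A0) splits the same rows into pieces that fit.

```python
from collections import Counter

def partition_sizes(rows, key, tuple_bytes):
    """Size in bytes of each partition when R is partitioned on attribute `key`."""
    counts = Counter(row[key] for row in rows)
    return {value: n * tuple_bytes for value, n in counts.items()}

def is_sound(sizes, memory_bytes):
    # Sound partitioning: the biggest partition must fit in memory.
    return max(sizes.values()) <= memory_bytes


rows = [
    {"A0": "p1", "A2": "x"},
    {"A0": "p2", "A2": "x"},
    {"A0": "p3", "A2": "x"},
    {"A0": "p4", "A2": "y"},
]
print(is_sound(partition_sizes(rows, "A2", 100), 200))  # False: partition "x" is 300 bytes
print(is_sound(partition_sizes(rows, "A0", 100), 200))  # True: every partition is 100 bytes
```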

29 External Partitioning |R| = 500 GB, |M| = 1 GB, so |R|/|M| = 500. Hierarchy (domain sizes in parentheses): A0 (50,000) → A1 (500) → A2 (5) [Figure: cube lattice over A0 → A1 → A2, B0 → B1, C0]


35 External Partitioning [Figure: the lattice regrouped: the 12 nodes containing neither A0 nor A1 (∅, A2, B0, B1, C0, A2B0, A2B1, A2C0, B0C0, B1C0, A2B0C0, A2B1C0) on one side, the 12 nodes containing A0 or A1 on the other]


37 External Partitioning Every node containing neither A0 nor A1 can be computed from A2B0C0, which is about |A0|/|A2| = 10,000 times smaller than R ⇒ |A2B0C0| ≈ 50 MB
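The arithmetic behind slides 29-37 can be replayed in a few lines. This sketch assumes 1 GB = 1,000 MB and uniformly distributed values, so each value of a level receives |R| / |domain| of the data; the variable and function names are illustrative.

```python
R_MB = 500_000    # |R| = 500 GB (1 GB taken as 1,000 MB)
M_MB = 1_000      # |M| = 1 GB
DOMAIN = {"A0": 50_000, "A1": 500, "A2": 5}  # distinct values per level of A

partitions_needed = R_MB // M_MB  # |R| / |M| = 500

def avg_partition_mb(level):
    # Average partition size when R is partitioned on one level of A,
    # assuming uniformly distributed values (skew makes the largest partition bigger).
    return R_MB / DOMAIN[level]

# Replacing A0 by A2 in R collapses roughly |A0| / |A2| tuples into one,
# which gives the slide's estimate for A2B0C0.
a2b0c0_mb = R_MB / (DOMAIN["A0"] / DOMAIN["A2"])

print(partitions_needed)  # 500
for level in DOMAIN:
    size = avg_partition_mb(level)
    print(level, size, "fits in M" if size <= M_MB else "exceeds M")
print(a2b0c0_mb)          # 50.0
```

Under these assumptions A2 yields 5 partitions of about 100 GB each, far from sound, and A1 sits exactly at the memory limit, so any skew would push its largest partition over.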



41 Storage Format
Two types of redundancy:
- Dimensional Redundancy (DR)
- Aggregational Redundancy (AR)

42 Storage Format Example with a flat cube only, for simplicity [Figure: the flat lattice over A, B, C shown next to the hierarchical lattice]

43 Storage Format [Figure: CUBE with DR vs. CUBE′ without DR; tuples t, t1, t2 and t′]


48 Storage Format [Figure: CUBE with DR vs. CUBE′ without DR]
Classify tuples according to AR into:
- Normal Tuples (NTs)
- Trivial Tuples (TTs)
- Common Aggregate Tuples (CATs)
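The slides that define these three classes survive only as figures. A minimal sketch under assumed definitions (check them against the paper): a TT is a cube tuple aggregated from exactly one fact tuple, so its aggregates equal that tuple's measures; a CAT is a tuple whose set of source fact tuples also produces a tuple in another node, so those tuples carry identical aggregates that could be stored once; every other tuple is an NT. Like the slides, it uses a flat cube for simplicity, and unlike CURE it computes all groups first and classifies afterwards. `classify` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from itertools import combinations

def classify(rows, dims):
    """Label every cube tuple as TT, CAT or NT (assumed definitions, see text)."""
    # Map each cube tuple (node, group key) to the set of fact-tuple row ids it aggregates.
    sources = {}
    for k in range(len(dims) + 1):
        for node in combinations(dims, k):
            groups = defaultdict(set)
            for rid, row in enumerate(rows):
                groups[tuple(row[d] for d in node)].add(rid)
            for key, rids in groups.items():
                sources[(node, key)] = frozenset(rids)

    # Groups of one node are disjoint, so this counts the tuples sharing a source set.
    nodes_per_set = defaultdict(set)
    for (node, _), rids in sources.items():
        nodes_per_set[rids].add(node)

    labels = {}
    for group, rids in sources.items():
        if len(rids) == 1:
            labels[group] = "TT"   # aggregate of a single fact tuple
        elif len(nodes_per_set[rids]) > 1:
            labels[group] = "CAT"  # same fact tuples, hence same aggregates, in several nodes
        else:
            labels[group] = "NT"
    return labels


R = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a2", "B": "b1", "C": "c2"},
]
labels = classify(R, ["A", "B", "C"])
print(Counter(labels.values()))  # 13 TT, 3 CAT, 2 NT
```

In this toy relation the groups A = a1, C = c1 and (A, C) = (a1, c1) all aggregate the same two fact tuples, which is why they come out as CATs.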

49 Storage Format


58 Purpose of the previous example:
- Explanation of the different types of redundancy
- Not the construction algorithm
Constructing an uncompressed cube and then compressing it would be inefficient. Instead, CURE classifies tuples during construction itself (details in the paper).


61 Experimental Evaluation
Hierarchical datasets: APB-1
- Product: Code (6,500) → Class (435) → Group (215) → Family (54) → Line (11) → Division (3)
- Customer: Store (640) → Retailer (71)
- Time: Month (17) → Quarter (6) → Year (2)
- Channel: Base (9)
Flat datasets: CovType, Sep85L, Synthetic
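The APB-1 hierarchies show how quickly hierarchies inflate a cube. Since each node picks at most one level per dimension, the lattice has (6+1)(2+1)(3+1)(1+1) = 168 nodes, against 2^4 = 16 for four flat dimensions. The sketch below also computes a dense upper bound on the number of cube tuples, assuming every value combination could occur (real data is sparse and each node is further bounded by |R|); the names are illustrative.

```python
from math import prod

# APB-1 hierarchies from the slide: domain size of each level, finest first.
APB1 = {
    "Product": [6_500, 435, 215, 54, 11, 3],  # Code -> Class -> Group -> Family -> Line -> Division
    "Customer": [640, 71],                    # Store -> Retailer
    "Time": [17, 6, 2],                       # Month -> Quarter -> Year
    "Channel": [9],                           # Base
}

# Each node picks one level per dimension or ALL: (levels + 1) choices per dimension.
num_nodes = prod(len(levels) + 1 for levels in APB1.values())

# Dense bound: sum over all nodes of the product of their domain sizes, with ALL
# contributing a factor of 1. The sum of products factors into a product of sums.
dense_bound = prod(sum(levels) + 1 for levels in APB1.values())

print(num_nodes)    # 168
print(dense_bound)  # 1336381280
```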

62 Experimental Evaluation
Two versions of CURE:
- CURE
- CURE+

63 Experimental Evaluation Less than 3 hours

64 Experimental Evaluation ≈ 6.8 GB

65 Experimental Evaluation



70 Conclusions
Main contribution: CURE
- Efficient execution plan
- New partitioning algorithm
- Novel storage scheme
Main advantages of CURE:
- Efficient construction of complete cubes over large datasets with arbitrary hierarchies
- Cube compression
- Optimization opportunities for queries and updates
- Easy implementation

71 Current and Future Work
- Study of indexing for queries and updates
- Comparison with the most prominent MOLAP and tree-based techniques

72 Questions???

73 Thank you!

74-99 Storage Format (backup slides): Memory Image vs. Disk Image [Figure sequence]

