Published by Jeffry McKenzie. Modified over 9 years ago.
1
PrefixCube: Prefix-sharing Condensed Data Cube
Jianlin Feng, Qiong Fang, Hulin Ding
Huazhong Univ. of Sci. & Tech.
fengjl@mail.hust.edu.cn
Nov 12, 2004
2
DOLAP 2004, Jianlin Feng
Outline
• Introduction
• Related Work
• ODM: Ordered Datacube Model
• BST-Condensed Cube
• Prefix-sharing Condensed Cube
• Comparisons
• Conclusions
3
Introduction
• Data Cube (ICDE’96)
  – N-dimensional cube(A1, A2, …, AN)
  – 2^N cuboids, i.e. GROUP-BYs
• The Huge Size Problem
  – When the source relation R is sparse, the size of a cuboid can be close to the size of R.
  – The I/O cost of merely storing the cube result tuples becomes dominant.
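To make the 2^N count and the size problem concrete, here is a minimal Python sketch. The relation `R` and the helper names `cuboids` and `cuboid_size` are made up for illustration and do not come from the talk.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every subset of the dimensions, i.e. every GROUP-BY of the cube."""
    result = []
    for k in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, k))
    return result

def cuboid_size(rows, dim_indexes):
    """Number of distinct groups when grouping rows by the given dimension positions."""
    return len({tuple(row[i] for i in dim_indexes) for row in rows})

# Hypothetical sparse relation R(A, B, C): no value repeats within a dimension.
R = [(1, 10, 100), (2, 20, 200), (3, 30, 300)]

all_cuboids = cuboids(range(3))                      # 2**3 = 8 cuboids
sizes = {c: cuboid_size(R, c) for c in all_cuboids}  # groups per cuboid
total_cube_tuples = sum(sizes.values())
```

With this sparse `R`, every cuboid except the grand total has as many groups as `R` has tuples, so the full cube holds 22 tuples for a 3-tuple relation.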
4
Related Work
• Condensed Cube (ICDE’02)
• Dwarf (SIGMOD’02)
• Quotient Cube (VLDB’02)
• QC-Tree (SIGMOD’03)
• Basic idea: remove redundancies existing among cube tuples.
  – prefix redundancy
  – suffix redundancy
5
Prefix Redundancy
• Given an example cube(A, B, C)
  – Each value of dimension A occurs in 4 cuboids: cuboid(A), (AB), (AC) and (ABC)
  – It possibly occurs many times in each cuboid except cuboid(A)
• Inter-cuboid and intra-cuboid prefix redundancy
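The counting argument on this slide can be checked on a small example. The sketch below uses a made-up relation `R(A, B, C)` and a hypothetical helper `group_keys`; it counts how often the value `a1` is stored in each cuboid that contains dimension A.

```python
from itertools import combinations

# Hypothetical base relation R(A, B, C); measures are omitted.
R = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a2", "b1", "c1")]

def group_keys(rows, dim_indexes):
    """Distinct group keys of the cuboid defined by the given dimension positions."""
    return {tuple(row[i] for i in dim_indexes) for row in rows}

# Cuboids that contain dimension A (position 0): (A), (AB), (AC), (ABC).
cuboids_with_a = [c for k in range(1, 4) for c in combinations(range(3), k) if 0 in c]

# Number of cube tuples in each of those cuboids whose A value is "a1".
occurrences = {
    c: sum(1 for key in group_keys(R, c) if key[0] == "a1")
    for c in cuboids_with_a
}
```

Here `a1` appears once in cuboid(A) but two or three times in each of the other cuboids (intra-cuboid redundancy), and in all four cuboids overall (inter-cuboid redundancy).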
6
Suffix Redundancy
• Occurs when cube tuples belonging to different cuboids are actually aggregated from the same group of base relation tuples.
• An extreme case
  – Let the source relation R have only one single tuple r(a1, a2, …, an, m);
  – 2^n cube tuples can be condensed into one physical tuple: (a1, a2, …, an, V), where V = aggr(r);
  – together with some information indicating that it is a representative tuple.
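The extreme case can be reproduced with a naive full-cube computation. This is only a sketch: `full_cube_sum` is a hypothetical helper, SUM is assumed as the aggregate, and `"*"` stands for ALL.

```python
from itertools import product

def full_cube_sum(rows, n_dims):
    """Naive full cube with SUM as the aggregate; "*" stands for ALL."""
    cube = {}
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        # Each mask decides which dimensions are kept and which become ALL.
        for mask in product((False, True), repeat=n_dims):
            key = tuple(d if keep else "*" for d, keep in zip(dims, mask))
            cube[key] = cube.get(key, 0) + measure
    return cube

# R holds one single tuple r(a1, a2, a3, m) with m = 7.
r = ("a1", "a2", "a3", 7)
cube = full_cube_sum([r], 3)

# All 2**3 cube tuples carry the same aggregate, so one physical tuple suffices.
distinct_aggregates = set(cube.values())
condensed = (r[:3], 7)
```

All eight cube tuples are aggregated from the same base tuple, which is why a single representative tuple plus a marker is enough to restore them.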
7
Thinking…
• Condensed cube
  – It condenses the cube tuples that are aggregated from one single base tuple into one physical tuple in order to reduce the cube’s size.
• Dwarf
  – Besides suffix coalescing, i.e. multi-base-tuple condensing, it also realizes full prefix-sharing so as to reduce the cube size very effectively.
8
Motivation
• How can the condensed cube’s size be reduced further while taking into account the characteristics of the queries we intend to answer, namely range queries?
• Augment BST condensing with the removal of intra-cuboid prefix redundancy!
9
Ordered Datacube Model
• Value ALL (or *) is encoded as 0.
• A dimension D and its cardinality C
  – each dimension value is mapped one-to-one to an integer between 1 and C inclusive.
• N dimensions form an N-dimensional space.
• The origin O(0, 0, …, 0) represents the grand total.
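A minimal sketch of this encoding follows. The helper names, the sample dimension values, and the choice of sorted order for assigning codes are all assumptions made for illustration; the slide only requires a one-to-one mapping onto 1..C with 0 reserved for ALL.

```python
ALL = 0  # code reserved for the value ALL (*)

def build_encoder(values):
    """Map each distinct value of one dimension to an integer in 1..C (sorted order assumed)."""
    return {value: code for code, value in enumerate(sorted(set(values)), start=1)}

def encode_point(point, encoders):
    """Encode one cube tuple's dimension values; None stands for ALL."""
    return tuple(
        ALL if value is None else encoder[value]
        for value, encoder in zip(point, encoders)
    )

# Hypothetical dimensions: city (cardinality 3) and year (cardinality 2).
city_codes = build_encoder(["Wuhan", "Beijing", "Shanghai", "Wuhan"])
year_codes = build_encoder([2003, 2004])
encoders = [city_codes, year_codes]

base_point = encode_point(("Wuhan", 2004), encoders)   # a point with no ALL
city_total = encode_point(("Wuhan", None), encoders)   # year rolled up to ALL
grand_total = encode_point((None, None), encoders)     # the origin O(0, 0)
```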
10
Ordered Datacube Model
• Under ODM, a range query against a data cube can be reduced to a sub-query against only one particular cuboid in the cube, or to a union of such sub-queries.
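One possible reading of this reduction is sketched below. It assumes a range query is given as one inclusive `(lo, hi)` pair of ODM codes per dimension, and that a range touching 0 (ALL) must be split because 0 and the codes 1..C belong to different cuboids. Both function names are hypothetical, and the talk itself may define the reduction differently.

```python
from itertools import product

def split_range_query(ranges):
    """Split per-dimension (lo, hi) code ranges into single-cuboid sub-queries.

    A range that starts at 0 (ALL) is split into the point (0, 0) and the rest (1, hi).
    """
    per_dim = []
    for lo, hi in ranges:
        parts = []
        if lo == 0:
            parts.append((0, 0))
            if hi >= 1:
                parts.append((1, hi))
        else:
            parts.append((lo, hi))
        per_dim.append(parts)
    return [tuple(combo) for combo in product(*per_dim)]

def target_cuboid(sub_query):
    """Positions of the dimensions that are not ALL; they identify the cuboid."""
    return tuple(i for i, (lo, hi) in enumerate(sub_query) if lo >= 1)

# Range on A only, B fixed to ALL: a single sub-query against cuboid(A).
single = split_range_query([(2, 5), (0, 0)])

# A ranges over ALL and codes 1..3: a union of two sub-queries.
union = split_range_query([(0, 3), (1, 4)])
union_cuboids = [target_cuboid(q) for q in union]
```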
11
BST-Condensed Cube
• Base Single Tuple (BST)
  – t1 is a BST on SD {A} and {B}
  – t2 is a BST on SD {B}
• A unique minimal BST-condensed cube, MinCube, is obtained when every BST is fully exploited with all of its SDs.
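The slide's example table is not part of this transcript, so the sketch below uses a made-up relation. It relies on the definition from the Condensed Cube work (ICDE’02): a tuple is a BST on a dimension set SD if it is the only tuple in its partition when the base relation is partitioned on SD. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def single_dimension_sets(rows, n_dims):
    """For each row, list the dimension sets SD on which it is alone in its partition."""
    result = {row: [] for row in rows}
    for k in range(1, n_dims + 1):
        for sd in combinations(range(n_dims), k):
            # Partition the relation on sd and count the tuples per partition.
            counts = Counter(tuple(row[i] for i in sd) for row in rows)
            for row in rows:
                if counts[tuple(row[i] for i in sd)] == 1:
                    result[row].append(sd)
    return result

# Hypothetical relation R(A, B) with distinct rows; measures are omitted.
t1, t2, t3 = (1, 1), (2, 2), (2, 3)
sds = single_dimension_sets([t1, t2, t3], 2)
```

In this relation `t1` is a BST on {A} and on {B}, while `t2` is a BST on {B} but not on {A}, because it shares its A value with `t3`. A BST on SD is also a BST on every superset of SD, which is why (A, B) shows up for all three rows.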
12
BU-BST Condensed Cube
• BottomUpBST algorithm (ICDE’02)
• Each BST corresponds to only one SD.
• Compared with MinCube, it is easier to compute and easier to restore normal cube tuples from.
Note: BST condensing is a special kind of prefix-sharing! A group of cube tuples sharing a prefix is represented by one BST!
13
A BU-BST Condensed Cube Example
Note:
• Intra-cuboid prefix redundancy: ct3 and ct4
• Inter-cuboid prefix redundancy: ct2, ct3 and ct5
14
Prefix-sharing Condensed Cube: PrefixCube
PrefixCube = BST condensing + intra-cuboid prefix-sharing
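The transcript does not show PrefixCube's physical layout, so the following is only an illustrative sketch of what intra-cuboid prefix-sharing saves: the tuples of one cuboid are stored in a nested dictionary (a prefix tree) so that a shared leading value is stored once. The data and helper names are made up.

```python
def build_prefix_tree(tuples):
    """Store the tuples of ONE cuboid so that shared dimension prefixes are kept once."""
    root = {}
    for dims, measure in tuples:
        node = root
        for value in dims[:-1]:
            node = node.setdefault(value, {})
        node[dims[-1]] = measure  # the last dimension value points to the aggregate
    return root

def count_cells(node):
    """Number of dimension values physically stored in the tree."""
    if not isinstance(node, dict):
        return 0  # a measure, not a dimension value
    return sum(1 + count_cells(child) for child in node.values())

# Hypothetical tuples of cuboid(AB) under ODM codes: ((a, b), aggregate).
cuboid_ab = [((1, 1), 10), ((1, 2), 20), ((1, 3), 5), ((2, 1), 7)]

flat_cells = sum(len(dims) for dims, _ in cuboid_ab)  # every tuple stored in full
tree = build_prefix_tree(cuboid_ab)
shared_cells = count_cells(tree)                      # prefix a=1 stored only once
```

In this example the flat layout stores 8 dimension values and the prefix tree stores 6, because the three tuples with A = 1 share that prefix.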
15
A PrefixCube Example
16
Corresponding Dwarf
17
PrefixCube vs. Dwarf

                             PrefixCube        Dwarf
Prefix-sharing               Intra-cuboid      Inter- and intra-cuboid
Suffix coalescing            BST condensing    Multi-tuple condensing
Compression ratio            Lower             Higher
Saving extra value ALL?      No                Yes
Tuple clustered by cuboid?   Yes               No

PrefixCube does not aim at blindly maximizing the compression ratio; it is intended to strike a good compromise among cube size reduction ratio, restoring and updating costs, and query characteristics!
18
Effectiveness of Size Reduction
• Datasets
  – synthetic datasets with uniform distribution
  – # of tuples: 1,000,000
Figure panels: (a) Cardinality = 100; (b) Cardinality = 1000
19
Effectiveness of Size Reduction
• PrefixBUC
  – Full Cube (computed by BUC)
  – Prefix-sharing
20
Impact of Data Density
• Datasets
  – Uniform distribution
  – # of dimensions: 6
  – Cardinality of dimensions: 100
  – # of tuples: ranges from 1,000 to 1,000,000
21
Impact of Data Skewness
• Datasets
  – Zipf distribution
  – # of tuples: 1,000,000
  – Cardinality of dimensions: ranges from 1,000 down to 500 in steps of 100
  – Zipf factor: ranges from 0 to 0.8 in steps of 0.2
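The talk does not say which generator produced these datasets. The sketch below is one common way to draw skewed dimension values, assuming the weight of the value with rank k is proportional to 1 / k^θ, so that a Zipf factor of 0 gives a uniform distribution. Function names and parameters are hypothetical.

```python
import random

def zipf_weights(cardinality, theta):
    """Unnormalized weight 1 / rank**theta for ranks 1..cardinality."""
    return [1.0 / (rank ** theta) for rank in range(1, cardinality + 1)]

def generate_tuples(n_tuples, cardinalities, theta, seed=0):
    """Draw each dimension independently with Zipf-like skew; values are codes 1..C."""
    rng = random.Random(seed)
    columns = [
        rng.choices(range(1, c + 1), weights=zipf_weights(c, theta), k=n_tuples)
        for c in cardinalities
    ]
    return list(zip(*columns))

# Small sample: 1,000 tuples over two dimensions with cardinalities 1,000 and 900.
rows = generate_tuples(1000, [1000, 900], theta=0.8)
```

Scaling `n_tuples` to 1,000,000 and sweeping `theta` from 0 to 0.8 would mirror the setup listed on the slide, under the assumptions above.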
22
Real-world Dataset
• Datasets
  – Weather datasets
  – # of tuples: 1,015,367
23
Conclusion
• A new cube structure, PrefixCube, was proposed by augmenting BU-BST condensing with intra-cuboid prefix-sharing.
  – It can greatly reduce the data cube’s size compared with the BU-BST condensed cube.
  – It can also reduce the impact of data skew on BU-BST condensing.
  – It achieves a fairly stable size reduction on both dense and sparse datasets.
24
The End
Thank you! Any questions?