PrefixCube: Prefix-sharing Condensed Data Cube. Jianlin Feng, Qiong Fang, Hulin Ding. Huazhong Univ. of Sci. & Tech. Nov 12, 2004.


1 PrefixCube: Prefix-sharing Condensed Data Cube
Jianlin Feng, Qiong Fang, Hulin Ding
Huazhong Univ. of Sci. & Tech.
fengjl@mail.hust.edu.cn
Nov 12, 2004

2 Outline (DOLAP 2004, Jianlin Feng)
• Introduction
• Related Work
• ODM: Ordered Datacube Model
• BST-Condensed Cube
• Prefix-sharing Condensed Cube
• Comparisons
• Conclusions

3 Introduction
• Data Cube (ICDE’96)
  – N-dimensional cube(A1, A2, …, AN)
  – 2^N cuboids, i.e. GROUP-BYs
• The Huge Size Problem
  – When the base relation R is sparse, the size of a cuboid is possibly close to the size of R.
  – The I/O cost of merely storing the cube result tuples becomes dominant.
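Note: the 2^N count can be checked by enumerating every subset of the dimension list. The sketch below is only an illustration; the function name `cuboids` and the dimension names are assumptions, not part of the slides.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every subset of the dimension list, i.e. every GROUP-BY of the cube."""
    result = []
    for k in range(len(dimensions) + 1):
        # combinations() yields each k-dimension subset exactly once
        result.extend(combinations(dimensions, k))
    return result

all_cuboids = cuboids(["A", "B", "C"])
print(len(all_cuboids))  # 8, i.e. 2**3
print(all_cuboids)
```

The empty tuple stands for the grand total (no GROUP-BY attribute), and the full tuple ("A", "B", "C") for the base cuboid.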

4 Related Work
• Condensed Cube (ICDE’02)
• Dwarf (SIGMOD’02)
• Quotient Cube (VLDB’02)
• QC-Tree (SIGMOD’03)
• Basic idea: remove redundancies existing among cube tuples.
  – prefix redundancy
  – suffix redundancy

5 Prefix Redundancy
• Given an example cube(A, B, C)
  – Each value of dimension A occurs in 4 cuboids: cuboid(A), (AB), (AC) and (ABC)
  – Possibly many times in each cuboid except cuboid(A)
• Inter-cuboid and intra-cuboid prefix redundancy
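Note: a small sketch of how often one value of A is repeated, both across cuboids and inside a cuboid. The three-row relation `ROWS` and all helper names are made up for illustration; they do not come from the slides.

```python
from itertools import combinations

DIMS = ["A", "B", "C"]
# Hypothetical base relation (dimension values only, measure omitted)
ROWS = [("a1", "b1", "c1"), ("a1", "b2", "c1"), ("a2", "b1", "c2")]

def cuboids_with(dim, dims=DIMS):
    """All cuboids whose GROUP-BY list contains the given dimension."""
    out = []
    for k in range(1, len(dims) + 1):
        for c in combinations(dims, k):
            if dim in c:
                out.append(c)
    return out

def group_keys(cuboid, rows=ROWS, dims=DIMS):
    """Distinct group keys (one per cube tuple) of a cuboid."""
    idx = [dims.index(d) for d in cuboid]
    return sorted({tuple(r[i] for i in idx) for r in rows})

def occurrences(value, cuboid):
    """How many cube tuples of the cuboid start with the given A value."""
    return sum(1 for key in group_keys(cuboid) if key[0] == value)

for c in cuboids_with("A"):
    print(c, occurrences("a1", c))
```

With this data, a1 appears once in cuboid(A) and cuboid(AC) but twice in cuboid(AB) and cuboid(ABC): the repeats inside one cuboid are intra-cuboid redundancy, the repeats across the four cuboids are inter-cuboid redundancy.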

6 Suffix Redundancy
• Occurs when cube tuples belonging to different cuboids are actually aggregated from the same group of base relation tuples.
• An extreme case
  – Let the source relation R have only one single tuple r(a1, a2, …, an, m);
  – 2^n cube tuples can be condensed into one physical tuple: (a1, a2, …, an, V), where V = aggr(r);
  – together with some information indicating that it is a representative tuple.
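Note: the extreme case can be reproduced with a naive full-cube computation. This is a sketch under assumptions: SUM is chosen as the aggregate, "*" marks ALL, and `full_cube` is a hypothetical helper, not the algorithm used in the talk.

```python
from itertools import combinations

ALL = "*"

def full_cube(rows, n_dims):
    """Naive SUM cube: rows are tuples of n_dims dimension values plus one measure."""
    cube = {}
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            for row in rows:
                # Dimensions not in the GROUP-BY list are replaced by ALL
                key = tuple(row[i] if i in kept else ALL for i in range(n_dims))
                cube[key] = cube.get(key, 0) + row[n_dims]
    return cube

# A relation with one single tuple r(a1, a2, a3, m) with m = 10
cube = full_cube([("a1", "a2", "a3", 10)], 3)
print(len(cube))           # 8 cube tuples (2**3)
print(set(cube.values()))  # {10}: every cube tuple carries the same aggregate
```

Since all 8 cube tuples are aggregated from the same base tuple, a single physical tuple plus a marker is enough to restore them.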

7 Thinking…
• Condensed cube
  – It condenses those cube tuples that are aggregated from one single base tuple into one physical tuple in order to reduce the cube’s size.
• Dwarf
  – Besides suffix coalescing, i.e. multi-base-tuple condensing, it also realizes full prefix-sharing so as to achieve a high size-reduction ratio.

8 Motivation
• How can the condensed cube’s size be reduced further while taking into account the kind of query we intend to answer, namely range queries?
• Augment BST condensing with the removal of intra-cuboid prefix redundancy!

9 Ordered Datacube Model
• Value ALL (or *) is encoded as 0.
• A dimension D and its cardinality C
  – each dimension value is mapped one-to-one to an integer between 1 and C inclusive.
• N dimensions form an N-dimensional space.
• The origin O(0, 0, …, 0) represents the grand total.
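Note: a minimal sketch of the encoding. The slides only require a one-to-one mapping onto 1..C; assigning codes in sorted order, the helper name `build_encoder` and the sample values are assumptions made here for illustration.

```python
ALL = 0  # the code reserved for ALL (*)

def build_encoder(values):
    """Map each distinct dimension value to an integer in 1..C (C = cardinality)."""
    ordered = sorted(set(values))
    return {v: i for i, v in enumerate(ordered, start=1)}

# Two hypothetical dimensions
city = build_encoder(["Wuhan", "Beijing", "Shanghai"])
year = build_encoder([2003, 2004])

# The cube tuple (Wuhan, ALL) becomes a point in the 2-dimensional space
point = (city["Wuhan"], ALL)
grand_total = (ALL, ALL)  # the origin O(0, 0)

print(city)         # {'Beijing': 1, 'Shanghai': 2, 'Wuhan': 3}
print(point)        # (3, 0)
print(grand_total)  # (0, 0)
```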

10 Ordered Datacube Model
• Under ODM, a range query against a data cube can actually be reduced to a sub-query against only one particular cuboid in the cube, or to a union of such sub-queries.
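Note: one possible reading of this reduction, sketched under assumptions. A query is taken to be one inclusive range of ODM codes per dimension; a range that starts at 0 covers both ALL and real values, so it is split at 0, and each resulting combination touches exactly one cuboid. The function `split_by_cuboid` and this query format are hypothetical and not taken from the slides.

```python
from itertools import product

def split_by_cuboid(ranges, dims):
    """Split a range query into sub-queries that each hit a single cuboid.

    ranges: one inclusive (lo, hi) pair of ODM codes per dimension, 0 <= lo <= hi.
    Returns a list of (cuboid, sub_ranges) pairs.
    """
    per_dim = []
    for lo, hi in ranges:
        parts = []
        if lo == 0:
            parts.append((0, 0))       # the ALL part of the range
            if hi >= 1:
                parts.append((1, hi))  # the part over real dimension values
        else:
            parts.append((lo, hi))
        per_dim.append(parts)

    subqueries = []
    for combo in product(*per_dim):
        # A dimension belongs to the cuboid when its sub-range excludes ALL
        cuboid = tuple(d for d, (lo, hi) in zip(dims, combo) if lo != 0)
        subqueries.append((cuboid, list(combo)))
    return subqueries

dims = ["A", "B"]
print(split_by_cuboid([(2, 5), (0, 0)], dims))  # one sub-query, on cuboid (A)
print(split_by_cuboid([(0, 3), (4, 4)], dims))  # union of cuboid (B) and cuboid (A, B)
```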

11 BST-Condensed Cube
• Base Single Tuple (BST)
  – t1 is a BST on SD {A} and {B}
  – t2 is a BST on SD {B}
• A unique minimal BST-condensed cube, MinCube, is obtained when every BST is fully exploited with all of its SDs.
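Note: a sketch of BST detection, assuming the usual definition from the condensed-cube work: a base tuple is a BST on a dimension set SD if it is the only tuple in its partition when the relation is partitioned on SD. The example table of the slide is not in the transcript, so the relation below is made up so that its first two rows behave like t1 and t2 above.

```python
from collections import Counter

def single_dimension_sets(rows, dims, candidate_sds):
    """For each row index, list the candidate SDs on which that row is a BST."""
    result = {i: [] for i in range(len(rows))}
    for sd in candidate_sds:
        idx = [dims.index(d) for d in sd]
        # Size of every partition when the relation is partitioned on sd
        counts = Counter(tuple(r[i] for i in idx) for r in rows)
        for n, r in enumerate(rows):
            if counts[tuple(r[i] for i in idx)] == 1:
                result[n].append(sd)
    return result

dims = ["A", "B"]
rows = [
    ("a1", "b1", 10),  # t1: alone on A and alone on B
    ("a2", "b2", 20),  # t2: shares a2 with t3, alone on B
    ("a2", "b3", 30),  # t3
]
bst = single_dimension_sets(rows, dims, [("A",), ("B",)])
print(bst[0])  # [('A',), ('B',)]
print(bst[1])  # [('B',)]
```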

12 BU-BST Condensed Cube
• BottomUpBST algorithm (ICDE’02)
• Each BST corresponds to only one SD.
• Compared with MinCube, it is easier to compute, and normal cube tuples are easier to restore from the condensed cube.
Note: BST condensing is a special kind of prefix-sharing! A group of cube tuples sharing a prefix is represented by one BST!

13 A BU-BST Condensed Cube Example
Note:
  – Intra-cuboid prefix redundancy: ct3 and ct4
  – Inter-cuboid prefix redundancy: ct2, ct3 and ct5

14 Prefix-sharing Condensed Cube - PrefixCube
PrefixCube = BST condensing + intra-cuboid prefix-sharing
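Note: a rough sketch of what intra-cuboid prefix-sharing saves. It counts dimension cells when the tuples of one cuboid are stored flat versus when every shared prefix is stored once (as in a trie). This is not the authors' storage layout; the tuples and function names are assumptions for illustration.

```python
def flat_cells(tuples):
    """Cells stored when every cube tuple keeps all of its dimension values."""
    return sum(len(t) for t in tuples)

def prefix_shared_cells(tuples):
    """Cells stored when each distinct prefix is kept once (one trie node per prefix)."""
    prefixes = set()
    for t in tuples:
        for k in range(1, len(t) + 1):
            prefixes.add(t[:k])
    return len(prefixes)

# Hypothetical ODM-encoded tuples of cuboid (A, B); three of them share the prefix A = 1
cuboid_ab = [(1, 1), (1, 2), (1, 3), (2, 1)]
print(flat_cells(cuboid_ab))           # 8
print(prefix_shared_cells(cuboid_ab))  # 6
```

The sharing stays inside one cuboid, so tuples remain clustered by cuboid.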

15 A PrefixCube Example

16 Corresponding Dwarf

17 PrefixCube vs. Dwarf

                              PrefixCube       Dwarf
Prefix-sharing                Intra-cuboid     Inter- and intra-cuboid
Suffix coalescing             BST condensing   Multi-tuple condensing
Compression ratio             Lower            Higher
Saving extra value ALL?       No               Yes
Tuples clustered by cuboid?   Yes              No

PrefixCube does not blindly pursue the highest compression ratio; it is intended to strike a good compromise among size-reduction ratio, restoring and updating costs, and query characteristics!

18 Effectiveness of Size Reduction
• Datasets
  – synthetic datasets with uniform distribution
  – # of tuples: 1,000,000
[Figure panels: (a) Cardinality = 100; (b) Cardinality = 1000]

19 Effectiveness of Size Reduction
• PrefixBUC
  – Full Cube (computed by BUC)
  – Prefix-sharing

20 Impact of Data Density
• Datasets
  – Uniform distribution
  – # of dimensions: 6
  – Cardinality of dimensions: 100
  – # of tuples: ranges from 1,000 to 1,000,000

21 Impact of Data Skewness
• Datasets
  – Zipf distribution
  – # of tuples: 1,000,000
  – Cardinality of dimensions: ranges from 1,000 down to 500 in steps of 100
  – Zipf factor: ranges from 0 to 0.8 in steps of 0.2

22 Real-world Dataset
• Datasets
  – Weather datasets
  – # of tuples: 1,015,367

23 Conclusion
• A new cube structure, PrefixCube, was proposed by augmenting BU-BST condensing with intra-cuboid prefix-sharing.
  – It can greatly reduce the data cube’s size compared with the BU-BST condensed cube.
  – It can also reduce the impact of data skew on BU-BST condensing.
  – It achieves a fairly stable size reduction on both dense and sparse datasets.

24 The End
Thank you! Any questions?

