1 Fast Computation of Sparse Datacubes Vicky :: Cao Hui Ping Sherman :: Chow Sze Ming CTH :: Chong Tsz Ho Ronald :: Woo Lok Yan Ken :: Yiu Man Lung.



2 Content
- Introduction
- Existing Methods
- Proposed Method: Partitioned-Cube
- Memory-Cube
- Experiment
- Conclusion

3 Introduction
- Datacube queries compute aggregates over database relations at a variety of granularities.
- Example: CUBE BY Product, Country, Date with aggregation function Sum(Sales).
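To make the definition concrete, here is a minimal Python sketch of what a CUBE BY query returns, assuming the relation is held as a list of dicts and SUM is the aggregate. It naively computes one group-by per subset of the CUBE BY attributes (2^k of them for k attributes) and writes "ALL" for attributes that are aggregated away. The `naive_cube` function and the sample sales rows are hypothetical illustrations, not the method presented in this talk.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # marker for an attribute that has been aggregated away

def naive_cube(rows, attrs, measure):
    """Return {key: SUM(measure)} for every group-by over every subset of attrs."""
    cube = {}
    for r in range(len(attrs) + 1):
        for kept in combinations(attrs, r):  # one cuboid per subset of attrs
            sums = defaultdict(int)
            for row in rows:
                key = tuple(row[a] if a in kept else ALL for a in attrs)
                sums[key] += row[measure]
            cube.update(sums)
    return cube

# Hypothetical sales relation: CUBE BY Product, Country, Date; Sum(Sales)
sales = [
    {"Product": "pen", "Country": "US", "Date": "d1", "Sales": 10},
    {"Product": "pen", "Country": "HK", "Date": "d1", "Sales": 6},
    {"Product": "ink", "Country": "US", "Date": "d2", "Sales": 5},
]
cube = naive_cube(sales, ["Product", "Country", "Date"], "Sales")
print(cube[("pen", ALL, ALL)])  # 16: pen sales over all countries and dates
```

Each extra CUBE BY attribute doubles the number of group-bys, which is why the cost of computing a datacube grows quickly with k.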

4 Sparseness
- A relation is sparse when its cardinality is a small fraction of the size of the cross product of its attribute domains.
- We focus on sparse relations, for which efficient datacube computation is important.
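The sparseness condition is a simple ratio. As a hedged sketch, the hypothetical `density` helper below divides the number of distinct attribute-value combinations by the product of the domain sizes, assuming each domain is the set of values that actually occur in the relation (a larger declared domain would make the ratio smaller).

```python
from math import prod

def density(rows, attrs):
    """Occupied cells divided by the size of the cross product of the attribute domains."""
    cells = {tuple(row[a] for a in attrs) for row in rows}
    # Assumption: each domain is the set of values that actually occur.
    domain_sizes = [len({row[a] for row in rows}) for a in attrs]
    return len(cells) / prod(domain_sizes)

rows = [
    {"Product": "pen", "Country": "US", "Date": "d1"},
    {"Product": "pen", "Country": "HK", "Date": "d1"},
    {"Product": "ink", "Country": "US", "Date": "d2"},
]
print(density(rows, ["Product", "Country", "Date"]))  # 3 occupied cells out of 2*2*2 = 8
```

For example, a hypothetical relation of 10^6 tuples over four attributes with 1,000 values each fills at most 10^6 / 10^12 = 10^-6 of the cross product.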

5 Problem
- CUBE BY attributes with large domains
- A large number of CUBE BY attributes
- Existing methods are not efficient in this setting
We need something new: Partitioned-Cube

6 Existing Methods
PIPESORT
- Optimizes the overall cost by evaluating each path
- Performs poorly when the relation is sparse
- The number of sorts is at least C(k, ⌈k/2⌉) for k CUBE BY attributes
- Large I/O cost for huge cuboids
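The lower bound on the number of sorts follows from the shape of the cuboid lattice: every PIPESORT pipeline is computed from one sort order and forms a chain of cuboids, a chain contains at most one cuboid with exactly ⌈k/2⌉ attributes, and there are C(k, ⌈k/2⌉) such cuboids. The hypothetical helper below simply tabulates that bound next to the 2^k cuboids of the cube.

```python
from math import comb

def pipesort_min_sorts(k):
    # A chain of cuboids (one sort order) holds at most one cuboid per lattice level,
    # so covering the middle level of C(k, ceil(k/2)) cuboids needs that many sorts.
    return comb(k, (k + 1) // 2)

for k in range(2, 11, 2):
    print(k, 2 ** k, pipesort_min_sorts(k))  # k, number of cuboids, minimum sorts
```

The bound grows roughly like 2^k / sqrt(k), so for sparse relations whose cuboids do not fit in memory, each of these sorts can cost a full external sort.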

7 OVERLAP
- Minimizes disk access by overlapping the computation of cuboids
- But the I/O cost is at least quadratic in k, even with memory-sized partitions
- Classifies the cuboids into "Partition" and "SortRun" states
- The I/O cost depends on the partition size and the number of sorted runs

8 Array-Based Algorithms
- Partition the data and hold the fragments in memory as arrays; data compression may be applied
- Allow direct access to the memory cells
- For sparse data, the array fragments may not fit into memory, so a more costly data structure is required

9 Partitioned-Cube
- Partitions a large relation into fragments that fit in memory
- Follows the recursive structure of datacubes
- A sub-datacube is obtained by fixing each possible value of a CUBE BY attribute

10 Partitioned-Cube (cont.)
Algorithm Partitioned-Cube(R, {B1, …, Bm}, A, G)
  R: a set of tuples
  {B1, …, Bm}: CUBE BY attributes
  A: attribute to be aggregated
  G: aggregate function
  Output (F, D): F holds the finest-granularity datacube tuples, D the remaining tuples
  Step 1: if (R fits in memory) then return Memory-Cube(R, {B1, …, Bm}, A, G)
  Step 2: scan R and partition it on an attribute Bj in {B1, …, Bm} into fragments R1, …, Rn
  Step 3: for (i = 1 to n) let (Fi, Di) = Partitioned-Cube(Ri, {B1, …, Bm}, A, G)
  Step 4: let F = union of the Fi's
  Step 5: let (F', D') = Partitioned-Cube(F, {B1, …, Bm} - {Bj}, A, G)
  Step 6: let D = union of F', D' and the Di's
  Step 7: return (F, D)
Example relation R:
  Country  Year  Sales
  US       2000  10
  US       2001  5
  US       2000  8
  US       2002  6
  HK       2000  6
  HK       2001  8
  HK       2001  7
  HK       2002  7
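The algorithm can be sketched in Python as follows. This is a hedged, simplified sketch, not the authors' implementation: `MEMORY_LIMIT` is a hypothetical tuple count standing in for "fits in memory", Memory-Cube is replaced by a naive in-memory aggregation over every required cuboid (rather than the path-based Memory-Cube of slides 17 to 20), Bj is assumed to be the first attribute not yet fixed, and fragments are formed by distinct values of Bj. The extra `fixed` parameter is also an assumption: it makes explicit what the worked example shows, namely that the sub-datacube of a fragment Ri only contains tuples in which Bj is not ALL (D1 holds just US ALL 29). G must be distributive, such as SUM, because Step 5 re-aggregates the values already stored in F.

```python
from collections import defaultdict
from itertools import combinations

MEMORY_LIMIT = 4  # hypothetical: max number of tuples that "fit in memory"

def memory_cube(rows, attrs, fixed, measure, agg):
    """Naive stand-in for Memory-Cube.

    Aggregates every cuboid over `attrs` that keeps all `fixed` attributes.
    Keys are tuples of (attribute, value) pairs; a missing attribute means ALL.
    Returns (F, D): F is the finest cuboid, D holds the coarser ones.
    """
    free = [a for a in attrs if a not in fixed]
    F, D = {}, {}
    for r in range(len(free), -1, -1):
        for extra in combinations(free, r):
            kept = [a for a in attrs if a in fixed or a in extra]
            groups = defaultdict(list)
            for row in rows:
                groups[tuple((a, row[a]) for a in kept)].append(row[measure])
            target = F if r == len(free) else D
            for key, values in groups.items():
                target[key] = agg(values)
    return F, D

def partitioned_cube(rows, attrs, measure, agg, fixed=()):
    """Sketch of Partitioned-Cube(R, {B1..Bm}, A, G); agg must be distributive (e.g. sum)."""
    free = [a for a in attrs if a not in fixed]
    # Step 1: the fragment fits in memory (or nothing is left to split on).
    if len(rows) <= MEMORY_LIMIT or not free:
        return memory_cube(rows, attrs, fixed, measure, agg)
    # Step 2: partition on B_j (assumption: the first attribute not yet fixed).
    bj = free[0]
    parts = defaultdict(list)
    for row in rows:
        parts[row[bj]].append(row)
    F, D = {}, {}
    # Steps 3-4: sub-datacube of each fragment R_i, keeping B_j (never ALL).
    for part in parts.values():
        Fi, Di = partitioned_cube(part, attrs, measure, agg, tuple(fixed) + (bj,))
        F.update(Fi)
        D.update(Di)
    # Step 5: tuples with B_j = ALL come from re-cubing F without B_j.
    rest = [a for a in attrs if a != bj]
    f_rows = [dict(key, **{measure: value}) for key, value in F.items()]
    F2, D2 = partitioned_cube(f_rows, rest, measure, agg, fixed)
    # Step 6: D = F' U D' U D_1 U ... U D_n.
    D.update(F2)
    D.update(D2)
    return F, D  # Step 7

# The relation R from the slide: CUBE BY Country, Year; Sum(Sales)
R = [
    {"Country": c, "Year": y, "Sales": s}
    for c, y, s in [("US", 2000, 10), ("US", 2001, 5), ("US", 2000, 8), ("US", 2002, 6),
                    ("HK", 2000, 6), ("HK", 2001, 8), ("HK", 2001, 7), ("HK", 2002, 7)]
]
F, D = partitioned_cube(R, ["Country", "Year"], "Sales", sum)
print(D[(("Country", "US"),)], D[(("Year", 2001),)], D[()])  # 29 20 57
```

Run on the relation R above, this reproduces the walk-through on slides 11 to 15: F holds the six (Country, Year) totals, and D holds US ALL 29, HK ALL 28, the three ALL-Country year totals and ALL ALL 57.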

11 Partitioned-Cube (cont.)
STEP 1: Partition the large relation into fragments that fit in memory (here, on Country)
R:
  Country  Year  Sales
  US       2000  10
  US       2001  5
  US       2000  8
  US       2002  6
  HK       2000  6
  HK       2001  8
  HK       2001  7
  HK       2002  7
R1:
  Country  Year  Sales
  US       2000  10
  US       2001  5
  US       2000  8
  US       2002  6
R2:
  Country  Year  Sales
  HK       2000  6
  HK       2001  8
  HK       2001  7
  HK       2002  7

12 Partitioned-Cube (cont.)
STEP 2: Compute the tuples of the corresponding sub-datacube from R1
R1:
  Country  Year  Sales
  US       2000  10
  US       2001  5
  US       2000  8
  US       2002  6
F1:
  Country  Year  Sales
  US       2000  18
  US       2001  5
  US       2002  6
D1:
  Country  Year  Sales
  US       ALL   29

13 Partitioned-Cube (cont.)
STEP 3: In the same way, compute F2 and D2 from R2
R2:
  Country  Year  Sales
  HK       2000  6
  HK       2001  8
  HK       2001  7
  HK       2002  7
F2:
  Country  Year  Sales
  HK       2000  6
  HK       2001  15
  HK       2002  7
D2:
  Country  Year  Sales
  HK       ALL   28

14 Partitioned-Cube (cont.)
Step 4: F = F1 ∪ F2
F:
  Country  Year  Sales
  US       2000  18
  US       2001  5
  US       2002  6
  HK       2000  6
  HK       2001  15
  HK       2002  7
Step 5: Call the function recursively on F, without the Country attribute, to get F' and D'
F':
  Country  Year  Sales
  ALL      2000  24
  ALL      2001  20
  ALL      2002  13
D':
  Country  Year  Sales
  ALL      ALL   57

15 Partitioned-Cube (cont.)
Step 6: D = F' ∪ D' ∪ D1 ∪ D2
Step 7: return (F, D)
F:
  Country  Year  Sales
  US       2000  18
  US       2001  5
  US       2002  6
  HK       2000  6
  HK       2001  15
  HK       2002  7
F':
  Country  Year  Sales
  ALL      2000  24
  ALL      2001  20
  ALL      2002  13
D':
  Country  Year  Sales
  ALL      ALL   57
D1:
  Country  Year  Sales
  US       ALL   29
D2:
  Country  Year  Sales
  HK       ALL   28

16 Partitioned-Cube (cont.)
- When there are more than two CUBE BY attributes, STEP 2 is itself executed recursively (figure: R1, F1 and D1, as on slide 12)

17 Memory-Cube
- Performs the complex CUBE computation over each fragment independently
- Minimizes the total number of paths needed to cover the search lattice
- Shares sort work
- Computes the tuples of the corresponding sub-datacube
- Computes the datacube tuples with the value ALL for the attributes

18 Memory-Cube
Minimize the total number of paths in the search lattice (∅ denotes the cuboid with every attribute ALL):
  G(1): D → ∅
  G(2): CD → C → ∅;  D
  G(3): BCD → BC → B → ∅;  BD → D;  CD → C
  G(4): ABCD → ABC → AB → A → ∅;  ABD → AD → D;  ACD → AC → C;  BCD → BC → B;  BD;  CD
G(4) has 6 = C(4, 2) paths.
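The G(k) listings above follow a recursive pattern that can be sketched in Python with a hypothetical `lattice_paths` helper, using single-letter attribute names as on the slide: each path of G(k-1) yields one path with the new attribute added to every node and the path's last node appended, and, when the path has more than one node, a second path made of all its nodes except the last. This is an illustration inferred from the slide's lists, not a transcription of the paper's procedure; it reproduces the six paths of G(4), and the path count matches C(k, ⌊k/2⌋).

```python
from math import comb

def lattice_paths(attrs):
    """Cover all subsets of `attrs` with chains, following the G(k) pattern on the slide.

    Each path is a list of frozensets, from the largest cuboid down to the smallest.
    """
    if len(attrs) == 1:
        return [[frozenset(attrs), frozenset()]]  # G(1): D -> {}
    first, rest = attrs[0], attrs[1:]
    paths = []
    for p in lattice_paths(rest):
        # Add the new attribute to every node of p, then finish with p's last node.
        paths.append([node | {first} for node in p] + [p[-1]])
        # The remaining nodes of p (all but its last) form a path of their own.
        if len(p) > 1:
            paths.append(p[:-1])
    return paths

for p in lattice_paths("ABCD"):
    print(" -> ".join("".join(sorted(node)) or "{}" for node in p))
print(len(lattice_paths("ABCD")), comb(4, 2))  # 6 6
```

The paths are printed in a different order from the slide, but they are the same six chains.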

19 Memory-Cube
- Share sort work
- Reordering the sequence of sorts can improve performance
- The result of sorting on a shorter attribute sequence can be reused when sorting on a longer one
- E.g. S6 = CD, S3 = CAD: after sorting for S6, the entire relation does not have to be re-sorted for S3; only each block of tuples that shares a C value needs to be sorted independently in AD order.
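The CD/CAD example can be sketched in Python. The hypothetical `refine_sort` helper assumes its input is already sorted on a prefix of the target order (here C, obtained from S6 = CD) and then sorts each block of equal C values in AD order, instead of re-sorting the whole relation; the relation over A, C and D is made up for illustration.

```python
import random
from itertools import groupby, product

def refine_sort(rows, prefix, suffix):
    """Turn rows sorted on `prefix` into rows sorted on prefix + suffix.

    Only each block of rows sharing a prefix value is sorted, not the whole relation.
    """
    out = []
    for _, block in groupby(rows, key=lambda r: tuple(r[a] for a in prefix)):
        out.extend(sorted(block, key=lambda r: tuple(r[a] for a in suffix)))
    return out

# Hypothetical relation over attributes A, C, D, in random order.
rows = [dict(zip("ACD", t)) for t in product(range(3), repeat=3)]
random.Random(0).shuffle(rows)

s6 = sorted(rows, key=lambda r: (r["C"], r["D"]))  # sort order S6 = CD
s3 = refine_sort(s6, ["C"], ["A", "D"])             # reuse S6 to get S3 = CAD
print([(r["C"], r["A"], r["D"]) for r in s3][:4])   # [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0)]
```

Sorting many small blocks is cheaper than one large sort, and the blocks can be processed independently.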

20 Memory-Cube
- Sort the in-memory relation according to the attributes of a path
- Like PIPESORT, make a single scan through the sorted data
- Compute the aggregates for all cuboids on the path during that scan
- Output the datacube by combining the results for these small fragments
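A single scan can compute every cuboid on a path when those cuboids are prefixes of one sort order, in the spirit of PIPESORT's pipelines. The sketch below assumes SUM as the aggregate and uses a hypothetical `pipeline_aggregate` helper: after sorting on (Country, Year), it keeps one running total per prefix length and emits the group for prefix length j as soon as one of the first j attributes changes, producing the (Country, Year), (Country) and all-ALL cuboids of the example relation in one pass.

```python
def pipeline_aggregate(rows, order, measure):
    """Sort once on `order`, then compute SUM for every prefix cuboid of that order in one scan.

    out[j] lists (key, sum) for the cuboid grouped by the first j attributes of `order`.
    """
    rows = sorted(rows, key=lambda r: tuple(r[a] for a in order))
    k = len(order)
    out = {j: [] for j in range(k + 1)}
    acc = [0] * (k + 1)  # running sum of the current group at each prefix length
    prev = None
    for r in rows:
        key = tuple(r[a] for a in order)
        if prev is not None:
            # First position where the sort key changes; longer prefix groups are complete.
            d = next((i for i in range(k) if key[i] != prev[i]), k)
            for j in range(k, d, -1):
                out[j].append((prev[:j], acc[j]))
                acc[j] = 0
        prev = key
        for j in range(k + 1):
            acc[j] += r[measure]
    if prev is not None:  # flush the last group at every level
        for j in range(k, -1, -1):
            out[j].append((prev[:j], acc[j]))
    return out

# The relation R from slide 10
R = [
    {"Country": c, "Year": y, "Sales": s}
    for c, y, s in [("US", 2000, 10), ("US", 2001, 5), ("US", 2000, 8), ("US", 2002, 6),
                    ("HK", 2000, 6), ("HK", 2001, 8), ("HK", 2001, 7), ("HK", 2002, 7)]
]
out = pipeline_aggregate(R, ["Country", "Year"], "Sales")
print(out[1])  # [(('HK',), 28), (('US',), 29)]
```

Only one group per level is open at any time, so the extra memory needed beyond the sorted relation is proportional to the length of the path.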

21 Solution Analysis
- I/O cost is linear in k
- CPU cost (in-memory sorts) is exponential in k
- In practice, the CPU cost should be dominated by the I/O time

22 Experiment
- CPU time vs. number of tuples
- CPU time is exponential in the number of CUBE BY attributes

23 Experiment
- CPU time, I/O time and CPU usage (%) vs. number of CUBE BY attributes
- CPU usage (%) drops for large numbers of CUBE BY attributes

24 Experiment
- Sharing sort work
- CPU time is dominated by I/O time

25 Conclusion
- Partitioned-Cube computes datacubes over large sparse relations quickly
- Minimizes the number of sort orders
- Shows the advantages of sharing sort orders in datacube computation
- First solution with LINEAR I/O cost

26 Reference
- Kenneth A. Ross, Divesh Srivastava. Fast Computation of Sparse Datacubes. VLDB 1997: 116-125.

27 Q & A Section

