
Parallel Datacube Construction: Algorithms, Theoretical Analysis, and Experimental Evaluation. Ruoming Jin, Ge Yang, Gagan Agrawal. The Ohio State University.


1 Parallel Datacube Construction: Algorithms, Theoretical Analysis, and Experimental Evaluation. Ruoming Jin, Ge Yang, Gagan Agrawal. The Ohio State University.

2 Motivation
Datasets for off-line processing are becoming larger.
–A system that stores such datasets and supports analysis on them is a data warehouse.
Frequent queries on data warehouses require aggregation along one or more dimensions.
–Data cube construction performs all aggregations in advance to enable fast responses to these queries.
Data cube construction is a compute- and data-intensive problem.
–Memory requirements become the bottleneck for sequential algorithms.
Construct data cubes in parallel in cluster environments!

3 Outline
Issues in sequential / parallel data cube construction
Aggregation tree
Sequential algorithm and properties
Parallel algorithm using aggregation tree
Data partitioning problem
Experimental validation
Summary

4 Data Cube – Definition
Data cube construction involves computing aggregates for all values across all possible subsets of the dimensions.
If the original dataset is n-dimensional, data cube construction computes and stores C(n, m) m-dimensional arrays for each m.
Three-dimensional data cube construction involves computing the arrays AB, AC, BC, A, B, and C, and a scalar value "all".
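The definition above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's algorithm: the helper name `build_cube` and the tiny 2x2x2 input are my own choices, and each aggregate is computed directly from the raw data rather than from a minimal parent.

```python
# Sketch: enumerate all 2^n aggregates of a small dense array.
# Dimension names A, B, C follow the slide's three-dimensional example.
from itertools import combinations
import numpy as np

def build_cube(data, dim_names):
    """Return {kept-dimension names: aggregate} for every subset of dims."""
    n = data.ndim
    cube = {}
    for m in range(n + 1):
        for kept in combinations(range(n), m):
            # Sum out every dimension that is not kept.
            dropped = tuple(d for d in range(n) if d not in kept)
            key = "".join(dim_names[d] for d in kept) or "all"
            cube[key] = data.sum(axis=dropped)
    return cube

data = np.arange(8).reshape(2, 2, 2)   # tiny 2x2x2 array over A x B x C
cube = build_cube(data, "ABC")
# 2^3 = 8 aggregates: ABC, AB, AC, BC, A, B, C, and the scalar "all"
```

For n = 3 this yields C(3, 2) = 3 two-dimensional arrays, C(3, 1) = 3 one-dimensional arrays, the original array, and the scalar, matching the slide.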

5 Main Issues
Cache and memory reuse
–Each portion of the parent array is read only once to compute its children; the corresponding portions of each child should be updated simultaneously.
Using minimal parents
–If a child has more than one parent, it is computed from the minimal parent, i.e., the parent that requires the least computation.
–Choose a spanning tree that uses minimal parents.
Memory management
–Write an output array back to disk once no remaining child is computed from it.
Communication volume
–Appropriately partition along one or more dimensions to guarantee minimal communication volume.

6 Aggregation Tree
Given the set X = {1, 2, …, n} and a prefix tree P(n), the corresponding aggregation tree A(n) is constructed by complementing every node in P(n) with respect to X.
[Figure: power set lattice, prefix tree P(n), and aggregation tree A(n)]
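The complementation step can be sketched directly from this definition. The parent-to-children dictionary representation below is my own choice, not the paper's; nodes are sorted tuples of dimension indices.

```python
# Sketch: build the prefix tree P(n) over X = {1, ..., n}, then derive the
# aggregation tree A(n) by complementing each node with respect to X.

def prefix_tree(n):
    """Each node is a sorted tuple; a child extends its parent
    with one element larger than the parent's maximum."""
    tree = {}
    def expand(node):
        start = node[-1] + 1 if node else 1
        children = [node + (i,) for i in range(start, n + 1)]
        tree[node] = children
        for child in children:
            expand(child)
    expand(())
    return tree

def aggregation_tree(n):
    """Complement every prefix-tree node with respect to X = {1, ..., n}."""
    X = set(range(1, n + 1))
    comp = lambda node: tuple(sorted(X - set(node)))
    return {comp(node): [comp(c) for c in children]
            for node, children in prefix_tree(n).items()}

A = aggregation_tree(3)
# Root of A(3) is (1, 2, 3); its children are (2, 3), (1, 3), (1, 2)
```

The root of A(n) is the full set X (the complement of the empty prefix root), and its children drop one dimension each, as in the figure.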

7 Sequential Cube Construction
Use the aggregation tree.
Simply do a right-to-left depth-first traversal.
Compute all children of a node in the tree simultaneously.
Write back an array when it no longer needs to be expanded.
[Figure: aggregation tree for a three-dimensional array D1D2D3 with |D1| >= |D2| >= |D3|; nodes D1D2D3, D2D3, D1D3, D1D2, D3, D2, D1, all]
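The traversal above can be sketched as follows. This is a simplification for clarity: each child is produced by a separate summation pass over its parent, rather than the single-scan simultaneous update of all children that the slides describe, and the write-back step is omitted.

```python
# Sketch: right-to-left depth-first traversal of the aggregation tree.
# A node is the tuple of dimensions it keeps; its children drop one
# remaining dimension d larger than any dimension dropped so far.
import numpy as np

def construct(data, kept, dropped_max, results):
    """Record `data` (the aggregate keeping dims `kept`), then recurse."""
    results[kept] = data
    # Right-to-left: visit the child dropping the largest dimension first.
    for d in reversed([d for d in kept if d > dropped_max]):
        axis = kept.index(d)
        child = tuple(x for x in kept if x != d)
        construct(data.sum(axis=axis), child, d, results)

data = np.arange(8).reshape(2, 2, 2)   # dims D1, D2, D3, labeled 1..3
results = {}
construct(data, (1, 2, 3), 0, results)
# results holds all 2^3 = 8 aggregates; results[()] is the scalar "all"
```

Each aggregate is computed from a parent one level up in the tree, so intermediate storage is bounded by the arrays on the path being expanded.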

8 Key Property of this Algorithm
The total memory requirement for storing intermediate results is bounded by the sum of the sizes of the arrays at the first level of the tree.
This is the minimum for any algorithm that
–does not write back parts of an array, and
–does maximal cache and memory reuse.
This forms the basis for using the same data structure in the parallel algorithm.

9 Parallel Algorithm – Main Ideas
Each processor computes a portion of each child at the first level.
Lead processors have the final results after interprocessor communication.
If an output array is not used to compute other children, write it back; otherwise compute its children on the lead processors.

10 Example
Assumptions
–8 processors
–Each of the three dimensions is partitioned in half
Initially
–Each processor computes partial results for each of D1D2, D1D3, and D2D3.
[Figure: aggregation tree for the three-dimensional array D1D2D3 with |D1| >= |D2| >= |D3|]

11 Example (cont.)
Lead processors for D1D2 are (l1, l2, 0); each receives from (l1, l2, 1):
(0, 0, 0) ← (0, 0, 1)
(0, 1, 0) ← (0, 1, 1)
(1, 0, 0) ← (1, 0, 1)
(1, 1, 0) ← (1, 1, 1)
Write back D1D2 on the lead processors.

12 Example (cont.)
Lead processors for D1D3 are (l1, 0, l3); each receives from (l1, 1, l3):
(0, 0, 0) ← (0, 1, 0)
(0, 0, 1) ← (0, 1, 1)
(1, 0, 0) ← (1, 1, 0)
(1, 0, 1) ← (1, 1, 1)
Compute D1 from D1D3 on the lead processors; write back D1D3 on the lead processors.
Lead processors for D1 are (l1, 0, 0); each receives from (l1, 0, 1):
(0, 0, 0) ← (0, 0, 1)
(1, 0, 0) ← (1, 0, 1)
Write back D1 on the lead processors.
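The lead-processor pattern in this example can be sketched generically. Identify each of the 8 processors by one bit per dimension; lead processors for an aggregate hold 0 in every dropped dimension. The pairing rule below is inferred from the example: it lists the pairs needed to form an aggregate directly from per-processor partial results, whereas the staged algorithm in the slides reuses aggregation already done at earlier tree levels.

```python
# Sketch: (lead, sender) communication pairs on a 2x2x2 processor grid
# for the aggregate that keeps the (0-based) dimensions in `kept`.
from itertools import product

def comm_pairs(kept, n=3):
    """A lead processor has bit 0 in every dropped dimension; it receives
    from every other processor that matches it on the kept dimensions."""
    dropped = [d for d in range(n) if d not in kept]
    pairs = []
    for p in product((0, 1), repeat=n):
        if all(p[d] == 0 for d in dropped):          # p is a lead processor
            for q in product((0, 1), repeat=n):
                if q != p and all(q[d] == p[d] for d in kept):
                    pairs.append((p, q))
    return pairs

# Lead processors for D1D2 (kept dims 0 and 1) pair up along dimension 3:
pairs = comm_pairs((0, 1))
```

For D1D2 this reproduces the four pairs on the slide: (0,0,0)←(0,0,1), (0,1,0)←(0,1,1), (1,0,0)←(1,0,1), (1,1,0)←(1,1,1).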

13 Analysis of the Algorithm
The algorithm still achieves the minimal memory bound of any parallel algorithm that does not write back portions of an array and does maximal cache and memory reuse.
We can compute the total communication volume.

14 Issues for Optimization
Two issues in minimizing communication volume:
–Instantiation of the tree: how do we choose the ordering of the dimensions?
–Partitioning of the original array: how do we choose k_i, i = 1, …, n?
Our work gives concrete theoretical results for both.

15 How to Order Dimensions
Theorem: The communication volume is minimized by an ordering of the dimensions such that |D1| >= |D2| >= … >= |Dn|.
The same ordering ensures that each child is computed from its minimal parent.
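The minimal-parent half of the claim is easy to illustrate: among all parents of a child (supersets with one extra dimension), the minimal parent is the one with the smallest array, hence the least data to scan. A small sketch with illustrative dimension sizes (the sizes and helper name are assumptions, not the paper's):

```python
# Sketch: pick the minimal parent of a child in the data cube lattice.
from math import prod

sizes = {"D1": 8, "D2": 4, "D3": 2}        # illustrative, |D1| >= |D2| >= |D3|
dims = list(sizes)

def minimal_parent(child):
    """child: tuple of kept dimension names; a parent adds one dropped dim."""
    candidates = [tuple(sorted(child + (d,), key=dims.index))
                  for d in dims if d not in child]
    return min(candidates, key=lambda p: prod(sizes[d] for d in p))

# D3 can be computed from D1D3 (size 16) or D2D3 (size 8): D2D3 is minimal.
assert minimal_parent(("D3",)) == ("D2", "D3")
```

With sizes sorted in decreasing order, these minimal parents coincide with the parent edges of the aggregation tree; for instance D1's minimal parent is D1D3, which is exactly where the example on slide 12 computes it.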

16 Partitioning of the Original Array
The total number of possibilities to consider could be very large.
A simple algorithm produces a provably optimal partitioning.
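To make "very large" concrete: with p = 2^k processors and n dimensions, a partitioning assigns 2^(k_i) processors to dimension i with k_1 + … + k_n = k, giving C(k + n - 1, n - 1) candidates. The enumeration below just counts that search space; the paper's simple algorithm is precisely what avoids searching it exhaustively.

```python
# Sketch: enumerate candidate partitionings (k_1, ..., k_n) of the
# processor exponent k across n dimensions, to show how the space grows.
from itertools import product

def partitionings(k, n):
    """All (k_1, ..., k_n) with nonnegative entries summing to k."""
    return [c for c in product(range(k + 1), repeat=n) if sum(c) == k]

# 8 processors (k = 3) over 3 dimensions: C(5, 2) = 10 candidates
# 128 processors (k = 7) over 6 dimensions: C(12, 5) = 792 candidates
```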

17 Impact of Data Distribution
Experimental results match the theoretical results.

18 Impact of Data Distribution on Large Datasets
16^6 dataset, 8 processors.
Good speedup even on the 16^6 dataset; a higher-dimensional dataset decreases the speedup.

19 Impact of Data Distribution (cont.)
32^6 dataset, 8 processors. 256^4 dataset, 8 processors.
Experimental results match the theoretical results.

20 Related Work
Goil et al. did the initial work on parallelizing data cube construction.
Dehne et al. focused on a shared-disk model where all processors access data from a common set of disks; they did not consider the memory requirement issue either.
Our work includes concrete results on minimizing memory requirements and communication volume, and it focuses on a shared-nothing model, which is more commonly used.

21 Summary of Results
A new data structure with optimal memory requirements.
A parallel algorithm based on this data structure.
The same ordering of dimensions minimizes both computation and communication.
A simple algorithm gives the partitioning with the lowest communication volume.
Experimental results validate the theoretical results.

