Slide 1: Using Tiling to Scale Parallel Datacube Implementation
Ruoming Jin, Karthik Vaidyanathan, Ge Yang, Gagan Agrawal
The Ohio State University
Slide 2: Introduction to Data Cube Construction
Data cube construction involves computing aggregates over all possible subsets of dimensions.
If the original dataset is n-dimensional, data cube construction computes and stores the $\binom{n}{m}$ m-dimensional arrays, for every m from 0 to n.
Example: three-dimensional data cube construction computes the arrays AB, AC, BC, A, B, C, and a scalar value "all".
Part I
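As an illustration of the three-dimensional example above, here is a minimal sketch, assuming NumPy and a small dense array (the algorithms in this work operate on sparse, disk-resident data):

```python
import numpy as np

# Hypothetical dense 3-D measure array indexed by dimensions A, B, C.
ABC = np.random.rand(4, 3, 2)

# Aggregating one dimension away yields the three 2-D arrays.
AB = ABC.sum(axis=2)   # aggregate over C
AC = ABC.sum(axis=1)   # aggregate over B
BC = ABC.sum(axis=0)   # aggregate over A

# Aggregating further yields the 1-D arrays and the scalar "all".
A = AB.sum(axis=1)     # computed from AB, reusing an existing result
B = AB.sum(axis=0)
C = AC.sum(axis=0)
all_ = A.sum()
```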
Slide 3: Motivation
Datasets for off-line processing are becoming larger.
– A system that stores such datasets and allows analysis on them is a data warehouse.
Frequent queries on data warehouses require aggregation along one or more dimensions.
– Data cube construction performs all aggregations in advance to enable fast responses to all such queries.
Data cube construction is a compute- and data-intensive problem.
– Memory requirements become the bottleneck for sequential algorithms.
Goal: construct data cubes in parallel in cluster environments!
Part I
Slide 4: Our Earlier Work
– Parallel algorithms for small-dimensional cases and use of a cluster middleware (CCGRID 2002, FGCS 2003)
– Parallel algorithms and theoretical results (ICPP 2003, HiPC 2003)
– Evaluating parallel algorithms (IPDPS 2003)
Slide 5: Using Tiling
One important issue is the memory requirement for intermediate results:
– From a sparse m-dimensional array, we compute m dense (m-1)-dimensional arrays.
Tiling can help scale sequential and parallel datacube algorithms.
Two important issues:
– Algorithms for using tiling
– How to tile so as to incur minimum overhead
Slide 6: Outline
– Main issues and data structures
– Parallel algorithms without tiling
– Tiling for sequential datacube construction
– Theoretical analysis
– Tiling for parallel datacube construction
– Experimental evaluation
Slide 7: Related Work
Goil et al. did the initial work on parallelizing data cube construction.
Dehne et al. focused on a shared-disk model in which all processors access data from a common set of disks.
Neither considered the memory requirement issue.
In contrast, our work gives concrete results on minimizing memory requirements and communication volume, and it focuses on the shared-nothing model, which is more commonly used.
Part I
Slide 8: Main Issues
Cache and memory reuse
– Each portion of the parent array is read only once to compute its children; the corresponding portions of each child should be updated simultaneously.
Using minimal parents
– If a child has more than one parent, it is computed from its minimal parent, the one requiring the least computation (see the sketch below).
Memory management
– Write an output array back to disk if no child is computed from it.
– Manage the available main memory effectively.
Communication volume
– Appropriately partition along one or more dimensions to guarantee minimal communication volume.
Part I
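A minimal sketch of the minimal-parent rule, assuming the cost of computing a child is proportional to the size of the parent that is scanned (the dimension sizes here are hypothetical):

```python
from math import prod

# Hypothetical dimension sizes; an array is identified by the frozenset of
# dimension indices it contains.
sizes = {0: 1024, 1: 256, 2: 64}

def parents(child, n):
    """All arrays with one more dimension than `child` in an n-D cube."""
    return [child | {d} for d in range(n) if d not in child]

def minimal_parent(child, n):
    """Pick the parent with the fewest cells, i.e. the cheapest to scan."""
    return min(parents(child, n), key=lambda p: prod(sizes[d] for d in p))

# The 1-D array over dimension 0 should be computed from parent {0, 2}
# (1024 * 64 cells) rather than {0, 1} (1024 * 256 cells).
print(minimal_parent(frozenset({0}), 3))   # -> frozenset({0, 2})
```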
Slide 9: Aggregation Tree
Given a set X = {1, 2, ..., n} and a prefix tree P(n), the corresponding aggregation tree A(n) is constructed by complementing every node in P(n) with respect to X.
[Figure: prefix lattice, prefix tree, and aggregation tree]
Part III
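A minimal sketch of this construction, assuming dimensions are numbered 1..n and the standard prefix-tree rule that a node's children extend it only with elements larger than its maximum:

```python
from itertools import combinations

def prefix_tree(n):
    """Prefix tree P(n): the children of a node S extend S with a single
    element larger than every element already in S."""
    nodes = [frozenset(c) for m in range(n + 1)
             for c in combinations(range(1, n + 1), m)]
    return {S: [S | {d} for d in range(max(S, default=0) + 1, n + 1)]
            for S in nodes}

def aggregation_tree(n):
    """A(n): complement every node of P(n) with respect to X = {1, ..., n}."""
    X = frozenset(range(1, n + 1))
    return {X - S: [X - c for c in kids]
            for S, kids in prefix_tree(n).items()}

# The root of A(3) is the full array {1, 2, 3}; leaves correspond to "all".
for node, kids in aggregation_tree(3).items():
    print(sorted(node), "->", [sorted(k) for k in kids])
```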
Slide 10: Theoretical Results
For data cube construction using the aggregation tree:
– The total memory requirement for holding the results is bounded.
– The total communication volume is bounded.
– All arrays are guaranteed to be computed from their minimal parents.
– A procedure exists for partitioning the input dataset so as to minimize interprocessor communication.
Part III
Slide 11: Level One Parallel Algorithm
Main ideas:
– Each processor computes a portion of each child at the first level.
– Lead processors hold the final results after interprocessor communication.
– If an output is not used to compute other children, write it back; otherwise, compute its children on the lead processors.
Part III
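A minimal sketch of how a lead processor can be identified, assuming processors are labeled by tuples of per-dimension coordinates; the convention of zeroing the coordinates of aggregated dimensions matches the example on the next slides:

```python
def lead_processor(proc, aggregated_dims):
    """The lead processor for a first-level child is obtained by zeroing a
    processor's coordinates along every dimension that is aggregated away."""
    return tuple(0 if d in aggregated_dims else c for d, c in enumerate(proc))

# Example from slide 13: for the child D1D2 (dimension D3, index 2, is
# aggregated away), processor (1, 0, 1) sends its partial result to (1, 0, 0).
print(lead_processor((1, 0, 1), {2}))   # -> (1, 0, 0)
```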
Slide 12: Example
Assumptions:
– 8 processors
– Each of the three dimensions is partitioned in half
Initially, each processor computes partial results for each of D1D2, D1D3, and D2D3.
[Figure: three-dimensional aggregation tree for the array D1D2D3 with |D1| ≥ |D2| ≥ |D3|; nodes D2D3, D1D3, D1D2, D3, D2, D1, all]
Part III
Slide 13: Example (cont.)
Lead processors for D1D2 are (l1, l2, 0); each processor (l1, l2, 1) sends its portion to (l1, l2, 0):
(0, 0, 1) → (0, 0, 0); (0, 1, 1) → (0, 1, 0); (1, 0, 1) → (1, 0, 0); (1, 1, 1) → (1, 1, 0)
Write back D1D2 on the lead processors.
[Figure: aggregation tree for D1D2D3, as on the previous slide]
Part III
Slide 14: Example (cont.)
Lead processors for D1D3 are (l1, 0, l3); each processor (l1, 1, l3) sends its portion to (l1, 0, l3):
(0, 1, 0) → (0, 0, 0); (0, 1, 1) → (0, 0, 1); (1, 1, 0) → (1, 0, 0); (1, 1, 1) → (1, 0, 1)
Compute D1 from D1D3 on the lead processors; write back D1D3 on the lead processors.
Lead processors for D1 are (l1, 0, 0); each processor (l1, 0, 1) sends its portion to (l1, 0, 0):
(0, 0, 1) → (0, 0, 0); (1, 0, 1) → (1, 0, 0)
Write back D1 on the lead processors.
[Figure: aggregation tree for D1D2D3, as on slide 12]
Part III
Slide 15: Tiling-based Approach
Motivation:
– Parallel machines are not always available.
– The memory of an individual computer is limited.
Tiling-based approaches:
– Sequential: tile along dimensions on one processor.
– Parallel: partition among processors, and on each processor tile along dimensions.
Part IV
Slide 16: Sequential Tiling-based Algorithm
Main idea: a portion of a node in the aggregation tree is expandable (can be used to compute its children) once enough tiles covering that portion have been processed.
Main mechanism: each tile is given a label.
Example: for the three-dimensional array D1D2D3 with |D1| ≥ |D2| ≥ |D3|, tiling along D2 and D3 gives 4 tiles with labels (0, l2, l3):
Tile 0 – (0, 0, 0)
Tile 1 – (0, 0, 1)
Tile 2 – (0, 1, 0)
Tile 3 – (0, 1, 1)
Part IV
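A minimal sketch of tile labeling, assuming each dimension i is split into tiles_per_dim[i] tiles (an entry of 1 means the dimension is not tiled):

```python
from itertools import product

def tile_labels(tiles_per_dim):
    """Enumerate tile labels in processing order; the (0, l2, l3) labels on
    this slide correspond to tiles_per_dim = (1, 2, 2)."""
    return list(product(*(range(t) for t in tiles_per_dim)))

print(tile_labels((1, 2, 2)))
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
```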
Slide 17: Example
For the three-dimensional array D1D2D3 with |D1| ≥ |D2| ≥ |D3|, tiled along D2 and D3 (each portion of D2D3 is completed within a single tile, while portions of D1D3 and D1D2 accumulate across tiles):
– Tile (0, 0, 0) done: portion 0 of D1D2 and portion 0 of D1D3 are partially computed.
– Tile (0, 0, 1) done: portion 0 of D1D2 is complete: merge & expand.
– Tile (0, 1, 0) done: portion 0 of D1D3 is complete: merge & expand.
– Tile (0, 1, 1) done: portion 1 of D1D2 and portion 1 of D1D3 are complete: merge & expand.
[Figure: aggregation tree annotated with the portions of D2D3, D1D3, and D1D2 produced after each tile]
Part IV
Slide 18: Tiling Overhead
The tiling-based algorithm requires writing back and re-reading portions of the result arrays, so we want to tile so as to minimize this overhead.
Tile dimension D_i into 2^{k_i} tiles. The total tiling overhead (the volume of results written back and re-read) can then be computed as
$$\text{Tiling overhead} = \sum_{i=1}^{n} (2^{k_i} - 1) \prod_{j \neq i} |D_j|$$
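A small worked computation using the formula above (the sizes are hypothetical; it also illustrates the later observation that tiling two comparable dimensions can beat tiling one dimension twice):

```python
from math import prod

def tiling_overhead(sizes, k):
    """Volume written back and re-read when dimension i is split into
    2**k[i] tiles, per the formula above."""
    return sum((2 ** k[i] - 1) * prod(sizes) // sizes[i]
               for i in range(len(sizes)))

sizes = [1024, 1024, 256]                  # hypothetical |D1|, |D2|, |D3|
print(tiling_overhead(sizes, [2, 0, 0]))   # 4 tiles along D1: 786432
print(tiling_overhead(sizes, [1, 1, 0]))   # 2 tiles along D1 and D2: 524288
```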
Slide 19: Minimizing Tiling Overhead
– Tile the largest dimension first, and update its effective size.
– Keep choosing the currently largest dimension until the memory requirement drops below the available memory.
A greedy sketch of this procedure follows.
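A greedy sketch following the slide's description; memory_needed is a caller-supplied estimate of the memory required at the effective (per-tile) dimension sizes, and all names here are illustrative:

```python
from math import prod

def choose_tiling(sizes, memory_needed, memory_available):
    """Greedily pick tiling parameters k (dimension i gets 2**k[i] tiles):
    repeatedly halve the effective size of the currently largest dimension
    until the estimated memory requirement fits in available memory."""
    eff = list(sizes)
    k = [0] * len(sizes)
    while memory_needed(eff) > memory_available:
        i = max(range(len(eff)), key=lambda j: eff[j])
        eff[i] //= 2          # halving the effective size ...
        k[i] += 1             # ... doubles the tile count along D_i
    return k

# Hypothetical estimate: memory for one tile's first-level result arrays.
need = lambda e: sum(prod(e) // e[i] for i in range(len(e)))
print(choose_tiling([1024, 1024, 256], need, 300_000))   # -> [2, 2, 0]
```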
Slide 20: Parallel Tiling-based Algorithm
Assumptions:
– Three-dimensional partition (0 1 1 1)
– Two-dimensional tiling (0 0 1 1)
Solution:
– Apply the tiling-based approach to first-level nodes only.
– Apply the Level One Parallel Algorithm to all other nodes.
[Figure: four-dimensional aggregation tree for D1D2D3D4 with |D1| ≥ |D2| ≥ |D3| ≥ |D4|; first level D1D2D3, D1D2D4, D1D3D4, D2D3D4, then the two- and one-dimensional arrays, and all]
Part IV
Slide 21: Choosing Tiling Parameters
Tiling overhead is unavoidable, but tiling along multiple dimensions can reduce it.
Part IV
Slide 22: Parallel Tiling-based Algorithm Results
The algorithm for choosing tiling parameters to reduce tiling overhead remains effective in parallel environments.
Part IV
Slide 24: Conclusions
– Tiling can help scale parallel datacube construction.
– Our work provides both algorithms and analytical results.