
1 Using Tiling to Scale Parallel Datacube Implementation Ruoming Jin Karthik Vaidyanathan Ge Yang Gagan Agrawal The Ohio State University

2 Introduction to Data Cube Construction – Data cube construction involves computing aggregates for all values across all possible subsets of dimensions. – If the original dataset is n-dimensional, data cube construction involves computing and storing C(n, m) m-dimensional arrays for every m between 0 and n. Three-dimensional data cube construction involves computing the arrays AB, AC, BC, A, B, C and a scalar value all. Part I
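To make this concrete, here is a minimal illustrative sketch (not the authors' implementation): a numpy loop that materializes every group-by of a small dense three-dimensional array, using sum as the aggregate.

```python
import itertools
import numpy as np

# Toy dense three-dimensional input over dimensions (D1, D2, D3).
data = np.random.rand(4, 3, 2)
dims = (0, 1, 2)

# One aggregate per subset of dimensions: sum out the complementary axes.
cube = {}
for m in range(len(dims) + 1):
    for kept in itertools.combinations(dims, m):
        dropped = tuple(d for d in dims if d not in kept)
        cube[kept] = data.sum(axis=dropped) if dropped else data.copy()

# cube[(0, 1)] is the "AB" (D1D2) array; cube[()] is the scalar "all".
print(cube[(0, 1)].shape, float(cube[()]))
```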

3 Motivation Datasets for off-line processing are becoming larger. –A system that stores such datasets and supports analysis on them is a data warehouse. Frequent queries on data warehouses require aggregation along one or more dimensions. –Data cube construction performs all aggregations in advance to facilitate fast responses to all queries. Data cube construction is a compute- and data-intensive problem. –Memory requirements become the bottleneck for sequential algorithms. Part I Construct data cubes in parallel in cluster environments!

4 Our Earlier Work Parallel Algorithms for Small Dimensional Cases and Use of a Cluster Middleware (CCGRID 2002, FGCS 2003) Parallel algorithms and theoretical results (ICPP 2003, HiPC 2003) Evaluating parallel algorithms (IPDPS 2003)

5 Using Tiling One important issue: memory requirements for intermediate results –From a sparse m-dimensional array, we compute m dense (m-1)-dimensional arrays Tiling can help scale sequential and parallel datacube algorithms Two important issues: –Algorithms for using tiling –How to tile so as to incur minimum overhead

6 Outline Main Issues and Data Structures Parallel algorithms without tiling Tiling for Sequential Datacube construction Theoretical analysis Tiling for Parallel Datacube construction Experimental evaluation

7 Related Work Goil et al. did the initial work on parallelizing data cube construction. Dehne et al. focused on a shared-disk model where all processors access data from a common set of disks. Neither effort considered the memory requirements of intermediate results. Our work focuses on the shared-nothing model, which is more commonly used, and includes concrete results on minimizing memory requirements and communication volume. Part I

8 Main Issues Cache and Memory Reuse –Each portion of the parent array is read only once to compute its children. Corresponding portions of each child should be updated simultaneously. Using Minimal Parents –If a child has more than one parent, it is computed from its minimal parent, the one that requires the least computation to obtain the child (see the sketch below). Memory Management –Write the output array back to disk if no child is computed from it. –Manage the available main memory effectively. Communication Volume –Partition appropriately along one or more dimensions to guarantee minimal communication volume. Part I
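The minimal-parent rule can be illustrated with a small, hypothetical helper (the actual choice in the paper is fixed by the aggregation tree introduced on the next slide): among the parents that contain one extra dimension, pick the one with the fewest cells, since scanning a smaller parent requires less computation.

```python
from math import prod

def minimal_parent(child, dims, sizes):
    """Hypothetical helper: return the parent of a group-by (one extra dimension)
    with the fewest cells, i.e. the cheapest parent to scan.
    child: tuple of dimension ids, dims: all dimension ids, sizes: id -> |D_i|."""
    parents = [tuple(sorted(child + (d,))) for d in dims if d not in child]
    return min(parents, key=lambda p: prod(sizes[d] for d in p))

# Example: with |D1| = 100, |D2| = 200, |D3| = 400, the cheapest parent of D3
# is D1D3 (100 x 400 cells) rather than D2D3 (200 x 400 cells).
print(minimal_parent((3,), (1, 2, 3), {1: 100, 2: 200, 3: 400}))
```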

9 Aggregation Tree Given a set X = {1, 2, …, n} and a prefix tree P(n), the corresponding aggregation tree A(n) is constructed by complementing every node in P(n) with respect to X. [Figure: prefix lattice, prefix tree and aggregation tree] Part III
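A small sketch of this construction (an illustration, assuming the usual prefix-tree convention in which a node's children extend it with larger dimension indices): build P(n) and relabel every node with its complement.

```python
from itertools import combinations

def prefix_tree(n):
    """Prefix tree P(n): node S has children S + (j,) for every j > max(S)."""
    X = list(range(1, n + 1))
    nodes = [s for m in range(n + 1) for s in combinations(X, m)]
    return {s: [s + (j,) for j in X if not s or j > s[-1]] for s in nodes}

def aggregation_tree(n):
    """A(n): every node of P(n) is relabelled by its complement with respect to X."""
    X = set(range(1, n + 1))
    comp = lambda s: tuple(sorted(X - set(s)))
    return {comp(s): [comp(c) for c in kids] for s, kids in prefix_tree(n).items()}

# For n = 3 the root (1, 2, 3) = D1D2D3 has children D2D3, D1D3 and D1D2,
# and D1D3 has the single child D1, matching the example on the later slides.
tree = aggregation_tree(3)
print(tree[(1, 2, 3)])   # [(2, 3), (1, 3), (1, 2)]
print(tree[(1, 3)])      # [(1,)]
```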

10 Theoretical Results For data cube construction using the aggregation tree –The total memory requirement for holding the results is bounded. –The total communication volume is bounded. –It is guaranteed that all arrays are computed from their minimal parents. –A procedure for partitioning the input dataset exists that minimizes interprocessor communication. Part III

11 Level One Parallel Algorithm Main ideas Each processor computes a portion of each child at the first level. Lead processors have the final results after interprocessor communication. If the output is not used to compute other children, write it back; otherwise compute children on lead processors. Part III

12 Example Assumptions –8 processors –Each of the three dimensions is partitioned in half Initially –Each processor computes partial results for each of D1D2, D1D3 and D2D3 [Figure: aggregation tree for the three-dimensional array D1D2D3 with |D1| ≤ |D2| ≤ |D3|] Part III

13 Example (cont.) Lead processors for D1D2 are those of the form (l1, l2, 0); each combines the partial results held by (l1, l2, 0) and (l1, l2, 1): (0, 0, 0) from (0, 0, 0) and (0, 0, 1); (0, 1, 0) from (0, 1, 0) and (0, 1, 1); (1, 0, 0) from (1, 0, 0) and (1, 0, 1); (1, 1, 0) from (1, 1, 0) and (1, 1, 1). Write back D1D2 on lead processors. [Figure: aggregation tree for the three-dimensional array D1D2D3 with |D1| ≤ |D2| ≤ |D3|] Part III

14 Example (cont.) Lead processors for D1D3 are those of the form (l1, 0, l3); each combines the partial results held by (l1, 0, l3) and (l1, 1, l3): (0, 0, 0) from (0, 0, 0) and (0, 1, 0); (0, 0, 1) from (0, 0, 1) and (0, 1, 1); (1, 0, 0) from (1, 0, 0) and (1, 1, 0); (1, 0, 1) from (1, 0, 1) and (1, 1, 1). Compute D1 from D1D3 on lead processors; write back D1D3 on lead processors. Lead processors for D1 are those of the form (l1, 0, 0); each combines (l1, 0, 0) and (l1, 0, 1): (0, 0, 0) from (0, 0, 0) and (0, 0, 1); (1, 0, 0) from (1, 0, 0) and (1, 0, 1). Write back D1 on lead processors. [Figure: aggregation tree for the three-dimensional array D1D2D3 with |D1| ≤ |D2| ≤ |D3|] Part III
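The first step of this example (computing D1D2 on its lead processors) can be sketched with mpi4py; this is a hypothetical, simplified illustration and not the authors' middleware-based implementation. It assumes 8 ranks arranged as a 2x2x2 grid, sum as the aggregate, and made-up chunk sizes.

```python
# Run with: mpiexec -n 8 python level_one_d1d2.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
l1, l2, l3 = rank // 4, (rank // 2) % 2, rank % 2   # position in the 2x2x2 grid

# Each processor holds its chunk of the three-dimensional array D1D2D3.
local_chunk = np.random.rand(8, 8, 8)

# Local partial result for D1D2: aggregate away the locally held part of D3.
partial_d1d2 = local_chunk.sum(axis=2)

# Pair up the two processors that share (l1, l2); the one with l3 == 0 leads.
pair = comm.Split(color=l1 * 2 + l2, key=l3)
d1d2 = np.empty_like(partial_d1d2) if l3 == 0 else None
pair.Reduce(partial_d1d2, d1d2, op=MPI.SUM, root=0)

if l3 == 0:
    # Lead processor (l1, l2, 0) now owns its portion of D1D2; write it back.
    np.save(f"d1d2_part_{l1}{l2}.npy", d1d2)
```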

15 Tiling-based Approach Motivation –Parallel machines are not always available –The memory of an individual machine is limited Tiling-based Approaches –Sequential: Tile along dimensions on one processor –Parallel: Partition among processors and, on each processor, tile along dimensions Part IV

16 Sequential Tiling-based Algorithm Main Idea A portion of a node in the aggregation tree is expandable (can be used to compute its children) once enough of the tiles contributing to that portion have been processed. Main Mechanism Each tile is given a label. For the three-dimensional array D1D2D3 with |D1| ≤ |D2| ≤ |D3|, tiled along D2 and D3 into 4 tiles, each tile gets a label (0, l2, l3): Tile 0 – (0, 0, 0), Tile 1 – (0, 0, 1), Tile 2 – (0, 1, 0), Tile 3 – (0, 1, 1). [Figure: aggregation tree for D1D2D3] Part IV

17 Example For the three-dimensional array D1D2D3 (|D1| ≤ |D2| ≤ |D3|) tiled along D2 and D3, the tiles are processed in the order (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1). As each tile is processed, the corresponding portions of D2D3, D1D3 and D1D2 are updated; once all tiles contributing to a portion have been processed, that portion is merged and, if the node has children, expanded (for example, portion 0 of D1D3 becomes expandable after tiles (0, 0, 0) and (0, 1, 0)). [Figure: snapshots of D2D3, D1D3 and D1D2 after each of the four tiles] Part IV
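A minimal numpy sketch of this sequential tiling pass (an illustration with made-up sizes and sum as the aggregate, not the paper's implementation): each tile updates the portions of D1D2, D1D3 and D2D3 it touches, and a portion is reported as complete once every tile contributing to it has been processed.

```python
import numpy as np

# Toy three-dimensional array D1 x D2 x D3, tiled along D2 and D3 (2 tiles each).
n1, n2, n3 = 4, 6, 8
data = np.random.rand(n1, n2, n3)
h2, h3 = n2 // 2, n3 // 2

d1d2 = np.zeros((n1, n2))
d1d3 = np.zeros((n1, n3))
d2d3 = np.zeros((n2, n3))

for l2 in range(2):              # process tiles labelled (0, l2, l3)
    for l3 in range(2):
        tile = data[:, l2 * h2:(l2 + 1) * h2, l3 * h3:(l3 + 1) * h3]
        # Update the portions of each first-level child touched by this tile.
        d1d2[:, l2 * h2:(l2 + 1) * h2] += tile.sum(axis=2)
        d1d3[:, l3 * h3:(l3 + 1) * h3] += tile.sum(axis=1)
        d2d3[l2 * h2:(l2 + 1) * h2, l3 * h3:(l3 + 1) * h3] = tile.sum(axis=0)
        if l3 == 1:
            # Both D3 tiles for this l2 have been seen: portion l2 of D1D2 is
            # complete and could now be merged and written back to disk.
            print(f"portion {l2} of D1D2 is complete")
```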

18 Tiling Overhead The tiling-based algorithm requires writing back and rereading portions of intermediate results, so we want to choose the tiling so as to minimize this overhead. If the dimension Di is tiled into 2^ki tiles, the total tiling overhead can be computed as a function of the tile counts 2^ki and the sizes of the arrays that must be written back and reread.

19 Minimizing Tiling Overhead Tile the largest dimension first and update its effective size. Keep choosing the currently largest dimension until the memory requirements fall below the available memory.
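A small sketch of this greedy selection (the memory estimate below is a simple stand-in, not the paper's cost model): repeatedly halve the effective size of the currently largest dimension until the estimated requirement fits in the available memory.

```python
from math import prod

def first_level_memory(eff_sizes):
    """Stand-in estimate: cells needed for the first-level children of one tile."""
    total = prod(eff_sizes)
    return sum(total // s for s in eff_sizes)

def choose_tiling(dim_sizes, memory_limit, estimate=first_level_memory):
    """Greedy tiling-parameter selection: dimension i is cut into 2**k[i] tiles."""
    k = [0] * len(dim_sizes)
    eff = list(dim_sizes)                    # effective (per-tile) sizes
    while estimate(eff) > memory_limit:
        i = max(range(len(eff)), key=lambda j: eff[j])   # currently largest dim
        k[i] += 1
        eff[i] = (eff[i] + 1) // 2
    return k

# Example: with |D1| = 256, |D2| = 512, |D3| = 1024 and room for 2**19 cells of
# intermediate results, only the largest dimension D3 gets tiled (into 2 tiles).
print(choose_tiling([256, 512, 1024], 1 << 19))   # -> [0, 0, 1]
```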

20 Parallel Tiling-based Algorithm Assumptions –Three-dimensional partition –Two-dimensional tiling Solutions –Apply the tiling-based approach to the first-level nodes only –Apply the Level One Parallel Algorithm to the other nodes [Figure: four-dimensional aggregation tree with |D1| ≤ |D2| ≤ |D3| ≤ |D4|] Part IV

21 Choosing Tiling Parameters Tiling introduces overhead. Tiling along multiple dimensions can reduce the tiling overhead. Part IV

22 Parallel Tiling-based Algorithm Results The algorithm for choosing tiling parameters to reduce tiling overhead remains effective in parallel environments! Part IV


24 Conclusions Tiling can help scale sequential and parallel datacube construction. Our work contributes both the algorithms and the accompanying analytical results.