Ex-MATE: Data-Intensive Computing with Large Reduction Objects and Its Application to Graph Mining
Wei Jiang and Gagan Agrawal
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
Background (I)
- Map-Reduce
  - Simple API: map and reduce
  - Easy to write parallel programs
  - Fault-tolerant for large-scale data centers
  - Performance? Always a concern for the HPC community
- Generalized Reduction
  - First proposed in FREERIDE, developed at Ohio State
  - Shares a similar processing structure with Map-Reduce
  - The key difference lies in a programmer-managed reduction object
  - Better performance?
Map-Reduce Execution
Comparing Processing Structures
- The reduction object represents the intermediate state of the execution
- The reduce function is commutative and associative
- Sorting and grouping overheads are eliminated by the reduction function/object, as sketched below
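As a concrete illustration, here is a minimal C++ sketch of the generalized reduction loop; the names (ReductionObject, reduce, process) are hypothetical, not the actual FREERIDE/MATE API. Each input element is folded directly into the reduction object, so no intermediate (key, value) pairs are ever sorted or grouped.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct ReductionObject {
    std::vector<double> state;   // intermediate state of the execution
};

// User-defined accumulation; must be commutative and associative.
void reduce(ReductionObject& ro, std::size_t key, double value) {
    ro.state[key] += value;      // e.g., a sum-style reduction
}

// Process one chunk of input: update the reduction object in place,
// with no shuffle, sort, or grouping phase.
void process(ReductionObject& ro,
             const std::vector<std::pair<std::size_t, double>>& chunk) {
    for (const auto& [key, value] : chunk)
        reduce(ro, key, value);
}
```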
Our Previous Work
- A comparative study between FREERIDE and Hadoop:
  - FREERIDE outperformed Hadoop by factors of 5 to 10
  - Possible reasons: Java vs. C++? HDFS overheads? Inefficiency of Hadoop? API differences?
- Developed MATE (Map-Reduce system with an AlternaTE API) on top of Phoenix from Stanford
  - Adopted Generalized Reduction
  - Focused on API differences
  - MATE improved on Phoenix by an average of 50%
  - Avoids the large set of intermediate pairs between Map and Reduce
  - Reduces memory requirements
Extending MATE
- Main limitations of the original MATE:
  - Only works on a single multi-core machine
  - Datasets must reside in memory
  - Assumes the reduction object MUST fit in memory
- This paper extends MATE to address these limitations
  - Focus on graph mining: an emerging class of applications
  - These require large reduction objects as well as large-scale datasets
  - E.g., PageRank could have an 8 GB reduction object!
- Support for managing arbitrary-sized reduction objects
  - Also reads disk-resident input data
- Evaluated Ex-MATE against PEGASUS
  - PEGASUS: a Hadoop-based graph mining system
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
System Design and Implementation
- System design of Ex-MATE
  - Execution overview
  - Support for distributed environments
- System APIs in Ex-MATE
  - One set provided by the runtime: operations on reduction objects
  - Another set defined or customized by the users: reduction, combination, etc. (see the sketch below)
- Runtime in Ex-MATE
  - Data partitioning
  - Task scheduling
  - Other low-level details
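A minimal sketch of how the two API sets might look, assuming a class-based C++ interface; the identifiers here are illustrative, not the actual Ex-MATE signatures.

```cpp
#include <cstddef>
#include <vector>

// Provided by the runtime: storage for and access to the reduction object.
class RuntimeRO {
    std::vector<double> slots_;
public:
    explicit RuntimeRO(std::size_t n) : slots_(n, 0.0) {}
    double& slot(std::size_t key) { return slots_[key]; }   // locate an R.O. element
    const double& slot(std::size_t key) const { return slots_[key]; }
    std::size_t size() const { return slots_.size(); }
};

// Defined or customized by the user.
class UserOps {
public:
    // reduction(): fold one input element into the reduction object.
    virtual void reduction(RuntimeRO& ro, std::size_t key, double value) const = 0;
    // combination(): merge reduction objects produced by different nodes/threads.
    virtual void combination(RuntimeRO& dst, const RuntimeRO& src) const = 0;
    virtual ~UserOps() = default;
};
```

The runtime owns the reduction object's storage and low-level details; the user supplies only the application-specific reduction and combination logic.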
Ex-MATE Runtime Overview
- Basic one-stage execution
Implementation Considerations
- Support for processing very large datasets
  - Partitioning function: partition and distribute the data to a number of nodes
  - Splitting function: use the multi-core CPU on each node
- Management of a large reduction object (R.O.): reduce disk I/O!
  - Outputs (the R.O.) are updated in a demand-driven way
  - Partition the reduction object into splits
  - Inputs are reorganized based on data access patterns
  - Reuse an R.O. split as much as possible while it is in memory
- Example: Matrix-Vector Multiplication (next slide)
A MV-Multiplication Example
[Figure: the input matrix is partitioned into blocks labeled (1, 1), (2, 1), (1, 2), etc.; blocks in the same block-row are combined with the corresponding input-vector splits to update one output-vector (R.O.) split at a time]
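A sketch of the blocked computation under the stated strategy: the output vector (the reduction object) is partitioned into splits, matrix blocks are grouped by the output split they update, and each split stays in memory while all of its blocks are processed. The code is illustrative, not the actual Ex-MATE implementation.

```cpp
#include <cstddef>
#include <vector>

// One block of the partitioned input matrix (dense for simplicity).
struct Block {
    std::size_t row_split;               // which output (R.O.) split it updates
    std::size_t col_split;               // which input-vector split it reads
    std::vector<std::vector<double>> m;  // the block's values
};

void multiply(const std::vector<Block>& blocks,
              const std::vector<std::vector<double>>& v_in,  // input-vector splits
              std::vector<std::vector<double>>& v_out)       // output (R.O.) splits
{
    for (std::size_t r = 0; r < v_out.size(); ++r) {
        std::vector<double>& out = v_out[r];  // load one R.O. split into memory
        for (const Block& b : blocks) {
            if (b.row_split != r) continue;   // inputs reorganized by access pattern
            const std::vector<double>& in = v_in[b.col_split];
            for (std::size_t i = 0; i < out.size(); ++i)
                for (std::size_t j = 0; j < in.size(); ++j)
                    out[i] += b.m[i][j] * in[j];  // demand-driven update of the split
        }
        // here the split would be written back to disk before loading the next
    }
}
```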
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
GIM-V for Graph Mining (I)
- Generalized Iterative Matrix-Vector Multiplication (GIM-V)
  - First proposed at CMU
  - Similar to common MV multiplication
- MV multiplication: v'(i) = Σ_j m(i, j) × v(j)
- The three operations in GIM-V:
  - combine2: combines m(i, j) and v(j); does not have to be a multiplication
  - combineAll: combines the n partial results for element i; does not have to be the sum
  - assign: the previous value of v(i) is updated by the new value v(new)
- In standard MV multiplication these are multiplication, sum, and assignment, respectively
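A sketch of the GIM-V abstraction as a generic C++ interface, together with one sequential, dense iteration; the class and function names are assumptions for illustration, since PEGASUS and Ex-MATE each realize these operations inside their own frameworks.

```cpp
#include <cstddef>
#include <vector>

// The three GIM-V operations as a generic interface.
template <typename M, typename V>
struct GIMV {
    virtual V combine2(const M& m_ij, const V& v_j) const = 0;       // not necessarily a multiplication
    virtual V combineAll(const std::vector<V>& partials) const = 0;  // not necessarily the sum
    virtual V assign(const V& v_old, const V& v_new) const = 0;      // how v(i) is updated
    virtual ~GIMV() = default;
};

// One sequential GIM-V iteration over a dense matrix.
template <typename M, typename V>
std::vector<V> iterate(const GIMV<M, V>& op,
                       const std::vector<std::vector<M>>& mat,
                       const std::vector<V>& v)
{
    std::vector<V> result(v.size());
    for (std::size_t i = 0; i < mat.size(); ++i) {
        std::vector<V> partials;
        for (std::size_t j = 0; j < v.size(); ++j)
            partials.push_back(op.combine2(mat[i][j], v[j]));  // n partial results
        result[i] = op.assign(v[i], op.combineAll(partials));  // fold and update v(i)
    }
    return result;
}
```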
GIM-V for Graph Mining (II)
- A set of graph mining applications fit into GIM-V:
  - PageRank, Diameter Estimation, Finding Connected Components, Random Walk with Restart, etc.
- Parallelization of GIM-V:
  - Using Map-Reduce in PEGASUS: a two-stage algorithm, i.e., two consecutive map-reduce jobs
  - Using Generalized Reduction in Ex-MATE: a one-stage algorithm with simpler code
GIM-V Example: PageRank
- PageRank is used by Google to calculate the relative importance of web pages
- Direct implementation in GIM-V: v(j) is the ranking value of page j
- The three customized operations (following PEGASUS, with damping factor c and n nodes):
  - combine2(m(i, j), v(j)) = c × m(i, j) × v(j) (multiplication)
  - combineAll(x(1), ..., x(n)) = (1 - c)/n + Σ_j x(j) (sum)
  - assign(v(i), v(new)) = v(new) (assignment)
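Expressed against the hypothetical GIMV interface sketched earlier, PageRank might look like this (illustrative code, not the paper's implementation):

```cpp
#include <cstddef>
#include <vector>

// PageRank via the GIMV interface above; c is the damping factor
// (typically 0.85) and n the number of nodes.
struct PageRank : GIMV<double, double> {
    double c;
    std::size_t n;
    PageRank(double c_, std::size_t n_) : c(c_), n(n_) {}

    double combine2(const double& m_ij, const double& v_j) const override {
        return c * m_ij * v_j;                        // damped multiplication
    }
    double combineAll(const std::vector<double>& xs) const override {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return (1.0 - c) / n + sum;                   // random-jump term plus the sum
    }
    double assign(const double&, const double& v_new) const override {
        return v_new;                                 // plain assignment
    }
};
```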
GIM-V: Other Algorithms
- Diameter Estimation: HADI is an algorithm to estimate the diameter of a given graph
  - The three customized operations: combine2 is a multiplication, combineAll is a bitwise-OR, and assign ORs the new value into the previous one
- Finding Connected Components: HCC is a new algorithm to find the connected components of large graphs
  - The three customized operations: combine2 is a multiplication, combineAll takes the minimal partial result, and assign keeps the minimum of the old and new values
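For example, HCC can be written against the same hypothetical GIMV interface: each vertex carries a component id and repeatedly adopts the minimum id among its neighbors. The "no edge" sentinel is an adaptation needed by the dense interface of this sketch; PEGASUS visits sparse edges only.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// HCC via the GIMV interface: v(i) holds the smallest vertex id
// reachable so far, i.e., the component id.
struct HCC : GIMV<int, long> {
    long combine2(const int& m_ij, const long& v_j) const override {
        return m_ij != 0 ? v_j                               // neighbor's component id
                         : std::numeric_limits<long>::max(); // no edge: never wins the min
    }
    long combineAll(const std::vector<long>& xs) const override {
        return *std::min_element(xs.begin(), xs.end());      // minimal partial result
    }
    long assign(const long& v_old, const long& v_new) const override {
        return std::min(v_old, v_new);                       // keep the smaller id
    }
};
```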
Parallelization of GIM-V (I)
- Using Map-Reduce: Stage I
- Map: route M(i, j) and V(j) to reducer j (keyed by column j)
Parallelization of GIM-V (II)
- Using Map-Reduce: Stage I (cont.)
- Reduce: compute combine2(M(i, j), V(j)) and emit it keyed by i, i.e., route it to reducer i of the next job
Parallelization of GIM-V (III)
- Using Map-Reduce: Stage II
- Map: the identity; pass each partial result through, still keyed by i
Parallelization of GIM-V (IV)
- Using Map-Reduce: Stage II (cont.)
- Reduce: apply combineAll to the n partial results for element i, then assign the result as the new V(i) (see the toy simulation below)
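To make the data movement explicit, here is a runnable toy simulation of the two-stage algorithm, with the two shuffles simulated by std::map; it reuses the hypothetical GIMV interface from earlier, and none of the names come from the PEGASUS code.

```cpp
#include <map>
#include <tuple>
#include <vector>

using Triple = std::tuple<long, long, double>;   // (i, j, M(i, j))

std::vector<double> gimv_two_stage(const GIMV<double, double>& op,
                                   const std::vector<Triple>& matrix,
                                   const std::vector<double>& v)
{
    // Stage I map: key every matrix element by its column j
    // (V(j) is joined by index lookup here instead of being shuffled alongside).
    std::map<long, std::vector<Triple>> by_col;
    for (const Triple& t : matrix)
        by_col[std::get<1>(t)].push_back(t);

    // Stage I reduce: combine2 with V(j); emit partial results keyed by row i.
    std::map<long, std::vector<double>> by_row;  // the second (simulated) shuffle
    for (const auto& [j, entries] : by_col)
        for (const Triple& t : entries)
            by_row[std::get<0>(t)].push_back(op.combine2(std::get<2>(t), v[j]));

    // Stage II map is the identity; Stage II reduce applies combineAll and assign.
    std::vector<double> result(v);               // untouched elements keep their old value
    for (const auto& [i, partials] : by_row)
        result[i] = op.assign(v[i], op.combineAll(partials));
    return result;
}
```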
Parallelization of GIM-V (V)
- Using Generalized Reduction in Ex-MATE:
- Reduction: for each input element M(i, j), fold combine2(M(i, j), V(j)) into slot i of the reduction object
Parallelization of GIM-V (VI)
- Using Generalized Reduction in Ex-MATE:
- Finalize: once all updates are folded in, apply assign to produce the new V(i) from the accumulated value (sketched below)
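A sketch of the one-stage version under the same toy setup: each matrix element folds directly into the reduction object, assuming a combineAll that can be applied as a running fold (a sum here, as in PageRank; HCC would fold with min instead). Illustrative only, not the Ex-MATE code.

```cpp
#include <cstddef>
#include <vector>

std::vector<double> gimv_one_stage(const GIMV<double, double>& op,
                                   const std::vector<Triple>& matrix,
                                   const std::vector<double>& v)
{
    std::vector<double> ro(v.size(), 0.0);           // the reduction object
    for (const Triple& t : matrix) {                 // reduction: one pass over the input
        auto [i, j, m] = t;
        ro[i] += op.combine2(m, v[j]);               // fold the partial result in place
    }                                                // (+= assumes a sum-style combineAll)
    std::vector<double> result(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)       // finalize: the fold already summed,
        result[i] = op.assign(v[i], op.combineAll({ro[i]}));  // so just apply assign
    return result;
}
```

Because the partial results are folded into the reduction object as they are produced, no intermediate pairs are ever emitted or shuffled, which is where the one-stage algorithm saves over the two map-reduce jobs.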
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
Experiment Design
- Applications: three graph mining algorithms: PageRank, Diameter Estimation, and Finding Connected Components
- Evaluation:
  - Performance comparison with PEGASUS (which provides a naïve version and an optimized version)
  - Speedups with an increasing number of nodes
  - Scalability with an increasing dataset size
- Experimental platform: a cluster of multi-core CPU machines, using up to 128 cores (16 nodes)
Results: Graph Mining (I)
- PageRank: 16 GB dataset; a graph of 256 million nodes and 1 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; 10.0x speedup]
Results: Graph Mining (II)
- HADI: 16 GB dataset; a graph of 256 million nodes and 1 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; 11.0x speedup]
Results: Graph Mining (III)
- HCC: 16 GB dataset; a graph of 256 million nodes and 1 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; 9.0x speedup]
Scalability: Graph Mining (IV)
- HCC: 8 GB dataset; a graph of 256 million nodes and 0.5 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; speedups of 1.7x and 1.9x]
Scalability: Graph Mining (V)
- HCC: 32 GB dataset; a graph of 256 million nodes and 2 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; speedups of 1.9x and 2.7x]
Scalability: Graph Mining (VI)
- HCC: 64 GB dataset; a graph of 256 million nodes and 4 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; speedups of 1.9x and 2.8x]
Observations
- Performance trends are similar for all three applications
  - Consistent with the fact that all three are implemented using the GIM-V method
- Ex-MATE outperforms PEGASUS significantly for all three graph mining algorithms
- Reasonable speedups across the different datasets
- Better scalability for larger datasets with an increasing number of nodes
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
Related Work: Academia
- Evaluation of Map-Reduce-like models in various parallel programming environments:
  - Phoenix-rebirth for large-scale multi-core machines
  - Mars for a single GPU
  - MITHRA for GPGPUs in heterogeneous platforms
  - The recent IDAV for GPU clusters
- Improvements to the Map-Reduce API:
  - Integrating pre-fetching and pre-shuffling into Hadoop
  - Supporting online queries
  - Enforcing less restrictive synchronization semantics between Map and Reduce
Related Work: Industry
- Google's Pregel system:
  - Map-Reduce may not be well suited to graph operations
  - Pregel was proposed to target graph processing
  - Open-source version: the HAMA project in Apache
- Variants of Map-Reduce:
  - Dryad/DryadLINQ from Microsoft
  - Sawzall from Google
  - Pig/Map-Reduce-Merge from Yahoo!
  - Hive from Facebook
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
Conclusion
- Ex-MATE supports the management of reduction objects of arbitrary sizes
  - Deals with disk-resident reduction objects
- Outperforms both the naïve and optimized PEGASUS implementations for all three graph mining applications, with simpler code
- Offers a promising alternative for developing efficient data-intensive applications
- Uses GIM-V for parallelizing graph mining
Thank You, and Acknowledgments
Questions and comments
Wei Jiang -
Gagan Agrawal -
This project was supported by: