Ex-MATE: Data-Intensive Computing with Large Reduction Objects and Its Application to Graph Mining
Wei Jiang and Gagan Agrawal
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
Background (I)
- Map-Reduce
  - Simple API: map and reduce
  - Easy to write parallel programs
  - Fault-tolerant for large-scale data centers
  - Performance? Always a concern for the HPC community
- Generalized Reduction
  - First proposed in FREERIDE, developed at Ohio State
  - Shares a similar processing structure with Map-Reduce
  - The key difference lies in a programmer-managed reduction object
  - Better performance?
Map-Reduce Execution
Comparing Processing Structures
- The reduction object represents the intermediate state of the execution
- The reduce function is commutative and associative
- Sorting and grouping overheads are eliminated by the reduction function/object, as sketched below
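As a concrete illustration, here is a minimal C++ sketch of the generalized reduction loop; the names (ReductionObject, reduce, process) are hypothetical, not the actual FREERIDE/MATE API. Each input element is folded directly into the reduction object, so no intermediate (key, value) pairs are ever sorted or grouped.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct ReductionObject {
    std::vector<double> state;   // intermediate state of the execution
};

// User-defined accumulation; must be commutative and associative.
void reduce(ReductionObject& ro, std::size_t key, double value) {
    ro.state[key] += value;      // e.g., a sum-style reduction
}

// Process one chunk of input: update the reduction object in place,
// with no shuffle, sort, or grouping phase.
void process(ReductionObject& ro,
             const std::vector<std::pair<std::size_t, double>>& chunk) {
    for (const auto& [key, value] : chunk)
        reduce(ro, key, value);
}
```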
Our Previous Work
- A comparative study between FREERIDE and Hadoop:
  - FREERIDE outperformed Hadoop by factors of 5 to 10
  - Possible reasons: Java vs. C++? HDFS overheads? Inefficiency of Hadoop? API differences?
- Developed MATE (Map-Reduce system with an AlternaTE API) on top of Phoenix from Stanford
  - Adopted Generalized Reduction
  - Focused on API differences
  - MATE improved on Phoenix by an average of 50%
  - Avoids the large set of intermediate pairs between Map and Reduce
  - Reduces memory requirements
Extending MATE
- Main limitations of the original MATE:
  - Only works on a single multi-core machine
  - Datasets must reside in memory
  - Assumes the reduction object MUST fit in memory
- This paper extends MATE to address these limitations
  - Focus on graph mining: an emerging class of applications
  - These require large reduction objects as well as large-scale datasets
  - E.g., PageRank could have an 8 GB reduction object!
- Support for managing arbitrary-sized reduction objects
  - Also reads disk-resident input data
- Evaluated Ex-MATE against PEGASUS
  - PEGASUS: a Hadoop-based graph mining system
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
System Design and Implementation
- System design of Ex-MATE
  - Execution overview
  - Support for distributed environments
- System APIs in Ex-MATE
  - One set provided by the runtime: operations on reduction objects
  - Another set defined or customized by the users: reduction, combination, etc. (see the sketch below)
- Runtime in Ex-MATE
  - Data partitioning
  - Task scheduling
  - Other low-level details
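A minimal sketch of how the two API sets might look, assuming a class-based C++ interface; the identifiers here are illustrative, not the actual Ex-MATE signatures.

```cpp
#include <cstddef>
#include <vector>

// Provided by the runtime: storage for and access to the reduction object.
class RuntimeRO {
    std::vector<double> slots_;
public:
    explicit RuntimeRO(std::size_t n) : slots_(n, 0.0) {}
    double& slot(std::size_t key) { return slots_[key]; }   // locate an R.O. element
    const double& slot(std::size_t key) const { return slots_[key]; }
    std::size_t size() const { return slots_.size(); }
};

// Defined or customized by the user.
class UserOps {
public:
    // reduction(): fold one input element into the reduction object.
    virtual void reduction(RuntimeRO& ro, std::size_t key, double value) const = 0;
    // combination(): merge reduction objects produced by different nodes/threads.
    virtual void combination(RuntimeRO& dst, const RuntimeRO& src) const = 0;
    virtual ~UserOps() = default;
};
```

The runtime owns the reduction object's storage and low-level details; the user supplies only the application-specific reduction and combination logic.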
Ex-MATE Runtime Overview
- Basic one-stage execution
Implementation Considerations
- Support for processing very large datasets
  - Partitioning function: partition and distribute the data to a number of nodes
  - Splitting function: use the multi-core CPU on each node
- Management of a large reduction object (R.O.): reduce disk I/O!
  - Outputs (the R.O.) are updated in a demand-driven way
  - Partition the reduction object into splits
  - Inputs are reorganized based on data access patterns
  - Reuse an R.O. split as much as possible while it is in memory
- Example: Matrix-Vector Multiplication (next slide)
A MV-Multiplication Example
[Figure: the input matrix is partitioned into blocks labeled (1, 1), (2, 1), (1, 2), etc.; blocks in the same block-row are combined with the corresponding input-vector splits to update one output-vector (R.O.) split at a time]
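A sketch of the blocked computation under the stated strategy: the output vector (the reduction object) is partitioned into splits, matrix blocks are grouped by the output split they update, and each split stays in memory while all of its blocks are processed. The code is illustrative, not the actual Ex-MATE implementation.

```cpp
#include <cstddef>
#include <vector>

// One block of the partitioned input matrix (dense for simplicity).
struct Block {
    std::size_t row_split;               // which output (R.O.) split it updates
    std::size_t col_split;               // which input-vector split it reads
    std::vector<std::vector<double>> m;  // the block's values
};

void multiply(const std::vector<Block>& blocks,
              const std::vector<std::vector<double>>& v_in,  // input-vector splits
              std::vector<std::vector<double>>& v_out)       // output (R.O.) splits
{
    for (std::size_t r = 0; r < v_out.size(); ++r) {
        std::vector<double>& out = v_out[r];  // load one R.O. split into memory
        for (const Block& b : blocks) {
            if (b.row_split != r) continue;   // inputs reorganized by access pattern
            const std::vector<double>& in = v_in[b.col_split];
            for (std::size_t i = 0; i < out.size(); ++i)
                for (std::size_t j = 0; j < in.size(); ++j)
                    out[i] += b.m[i][j] * in[j];  // demand-driven update of the split
        }
        // here the split would be written back to disk before loading the next
    }
}
```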
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
GIM-V for Graph Mining (I)
- Generalized Iterative Matrix-Vector Multiplication (GIM-V)
  - First proposed at CMU
  - Similar to common MV multiplication
- MV multiplication: v'(i) = Σ_j m(i, j) × v(j)
- The three operations in GIM-V:
  - combine2: combines m(i, j) and v(j); does not have to be a multiplication
  - combineAll: combines the n partial results for element i; does not have to be the sum
  - assign: the previous value of v(i) is updated by the new value v(new)
- In standard MV multiplication these are multiplication, sum, and assignment, respectively
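A sketch of the GIM-V abstraction as a generic C++ interface, together with one sequential, dense iteration; the class and function names are assumptions for illustration, since PEGASUS and Ex-MATE each realize these operations inside their own frameworks.

```cpp
#include <cstddef>
#include <vector>

// The three GIM-V operations as a generic interface.
template <typename M, typename V>
struct GIMV {
    virtual V combine2(const M& m_ij, const V& v_j) const = 0;       // not necessarily a multiplication
    virtual V combineAll(const std::vector<V>& partials) const = 0;  // not necessarily the sum
    virtual V assign(const V& v_old, const V& v_new) const = 0;      // how v(i) is updated
    virtual ~GIMV() = default;
};

// One sequential GIM-V iteration over a dense matrix.
template <typename M, typename V>
std::vector<V> iterate(const GIMV<M, V>& op,
                       const std::vector<std::vector<M>>& mat,
                       const std::vector<V>& v)
{
    std::vector<V> result(v.size());
    for (std::size_t i = 0; i < mat.size(); ++i) {
        std::vector<V> partials;
        for (std::size_t j = 0; j < v.size(); ++j)
            partials.push_back(op.combine2(mat[i][j], v[j]));  // n partial results
        result[i] = op.assign(v[i], op.combineAll(partials));  // fold and update v(i)
    }
    return result;
}
```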
GIM-V for Graph Mining (II)
- A set of graph mining applications fit into GIM-V:
  - PageRank, Diameter Estimation, Finding Connected Components, Random Walk with Restart, etc.
- Parallelization of GIM-V:
  - Using Map-Reduce in PEGASUS: a two-stage algorithm, i.e., two consecutive map-reduce jobs
  - Using Generalized Reduction in Ex-MATE: a one-stage algorithm with simpler code
GIM-V Example: PageRank
- PageRank is used by Google to calculate the relative importance of web pages
- Direct implementation in GIM-V: v(j) is the ranking value of page j
- The three customized operations (following PEGASUS, with damping factor c and n nodes):
  - combine2(m(i, j), v(j)) = c × m(i, j) × v(j) (multiplication)
  - combineAll(x(1), ..., x(n)) = (1 - c)/n + Σ_j x(j) (sum)
  - assign(v(i), v(new)) = v(new) (assignment)
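Expressed against the hypothetical GIMV interface sketched earlier, PageRank might look like this (illustrative code, not the paper's implementation):

```cpp
#include <cstddef>
#include <vector>

// PageRank via the GIMV interface above; c is the damping factor
// (typically 0.85) and n the number of nodes.
struct PageRank : GIMV<double, double> {
    double c;
    std::size_t n;
    PageRank(double c_, std::size_t n_) : c(c_), n(n_) {}

    double combine2(const double& m_ij, const double& v_j) const override {
        return c * m_ij * v_j;                        // damped multiplication
    }
    double combineAll(const std::vector<double>& xs) const override {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return (1.0 - c) / n + sum;                   // random-jump term plus the sum
    }
    double assign(const double&, const double& v_new) const override {
        return v_new;                                 // plain assignment
    }
};
```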
GIM-V: Other Algorithms
- Diameter Estimation: HADI is an algorithm to estimate the diameter of a given graph
  - The three customized operations: combine2 is a multiplication, combineAll is a bitwise-OR, and assign ORs the new value into the previous one
- Finding Connected Components: HCC is a new algorithm to find the connected components of large graphs
  - The three customized operations: combine2 is a multiplication, combineAll takes the minimal partial result, and assign keeps the minimum of the old and new values
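For example, HCC can be written against the same hypothetical GIMV interface: each vertex carries a component id and repeatedly adopts the minimum id among its neighbors. The "no edge" sentinel is an adaptation needed by the dense interface of this sketch; PEGASUS visits sparse edges only.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// HCC via the GIMV interface: v(i) holds the smallest vertex id
// reachable so far, i.e., the component id.
struct HCC : GIMV<int, long> {
    long combine2(const int& m_ij, const long& v_j) const override {
        return m_ij != 0 ? v_j                               // neighbor's component id
                         : std::numeric_limits<long>::max(); // no edge: never wins the min
    }
    long combineAll(const std::vector<long>& xs) const override {
        return *std::min_element(xs.begin(), xs.end());      // minimal partial result
    }
    long assign(const long& v_old, const long& v_new) const override {
        return std::min(v_old, v_new);                       // keep the smaller id
    }
};
```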
Parallelization of GIM-V (I)
- Using Map-Reduce: Stage I
- Map: route M(i, j) and V(j) to reducer j (keyed by column j)
Parallelization of GIM-V (II)
- Using Map-Reduce: Stage I (cont.)
- Reduce: compute combine2(M(i, j), V(j)) and emit it keyed by i, i.e., route it to reducer i of the next job
Parallelization of GIM-V (III)
- Using Map-Reduce: Stage II
- Map: the identity; pass each partial result through, still keyed by i
Parallelization of GIM-V (IV)
- Using Map-Reduce: Stage II (cont.)
- Reduce: apply combineAll to the n partial results for element i, then assign the result as the new V(i) (see the toy simulation below)
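To make the data movement explicit, here is a runnable toy simulation of the two-stage algorithm, with the two shuffles simulated by std::map; it reuses the hypothetical GIMV interface from earlier, and none of the names come from the PEGASUS code.

```cpp
#include <map>
#include <tuple>
#include <vector>

using Triple = std::tuple<long, long, double>;   // (i, j, M(i, j))

std::vector<double> gimv_two_stage(const GIMV<double, double>& op,
                                   const std::vector<Triple>& matrix,
                                   const std::vector<double>& v)
{
    // Stage I map: key every matrix element by its column j
    // (V(j) is joined by index lookup here instead of being shuffled alongside).
    std::map<long, std::vector<Triple>> by_col;
    for (const Triple& t : matrix)
        by_col[std::get<1>(t)].push_back(t);

    // Stage I reduce: combine2 with V(j); emit partial results keyed by row i.
    std::map<long, std::vector<double>> by_row;  // the second (simulated) shuffle
    for (const auto& [j, entries] : by_col)
        for (const Triple& t : entries)
            by_row[std::get<0>(t)].push_back(op.combine2(std::get<2>(t), v[j]));

    // Stage II map is the identity; Stage II reduce applies combineAll and assign.
    std::vector<double> result(v);               // untouched elements keep their old value
    for (const auto& [i, partials] : by_row)
        result[i] = op.assign(v[i], op.combineAll(partials));
    return result;
}
```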
Parallelization of GIM-V (V)
- Using Generalized Reduction in Ex-MATE:
- Reduction: for each input element M(i, j), fold combine2(M(i, j), V(j)) into slot i of the reduction object
Parallelization of GIM-V (VI)
- Using Generalized Reduction in Ex-MATE:
- Finalize: once all updates are folded in, apply assign to produce the new V(i) from the accumulated value (sketched below)
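A sketch of the one-stage version under the same toy setup: each matrix element folds directly into the reduction object, assuming a combineAll that can be applied as a running fold (a sum here, as in PageRank; HCC would fold with min instead). Illustrative only, not the Ex-MATE code.

```cpp
#include <cstddef>
#include <vector>

std::vector<double> gimv_one_stage(const GIMV<double, double>& op,
                                   const std::vector<Triple>& matrix,
                                   const std::vector<double>& v)
{
    std::vector<double> ro(v.size(), 0.0);           // the reduction object
    for (const Triple& t : matrix) {                 // reduction: one pass over the input
        auto [i, j, m] = t;
        ro[i] += op.combine2(m, v[j]);               // fold the partial result in place
    }                                                // (+= assumes a sum-style combineAll)
    std::vector<double> result(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)       // finalize: the fold already summed,
        result[i] = op.assign(v[i], op.combineAll({ro[i]}));  // so just apply assign
    return result;
}
```

Because the partial results are folded into the reduction object as they are produced, no intermediate pairs are ever emitted or shuffled, which is where the one-stage algorithm saves over the two map-reduce jobs.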
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
Experiment Design
- Applications: three graph mining algorithms: PageRank, Diameter Estimation, and Finding Connected Components
- Evaluation:
  - Performance comparison with PEGASUS (which provides a naïve version and an optimized version)
  - Speedups with an increasing number of nodes
  - Scalability with an increasing dataset size
- Experimental platform: a cluster of multi-core CPU machines, using up to 128 cores (16 nodes)
Results: Graph Mining (I)
- PageRank: 16 GB dataset; a graph of 256 million nodes and 1 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; 10.0x speedup]
Results: Graph Mining (II)
- HADI: 16 GB dataset; a graph of 256 million nodes and 1 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; 11.0x speedup]
Results: Graph Mining (III)
- HCC: 16 GB dataset; a graph of 256 million nodes and 1 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; 9.0x speedup]
Scalability: Graph Mining (IV)
- HCC: 8 GB dataset; a graph of 256 million nodes and 0.5 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; speedups of 1.7x and 1.9x]
Scalability: Graph Mining (V)
- HCC: 32 GB dataset; a graph of 256 million nodes and 2 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; speedups of 1.9x and 2.7x]
Scalability: Graph Mining (VI)
- HCC: 64 GB dataset; a graph of 256 million nodes and 4 billion edges
- [Figure: avg. time per iteration (min) vs. number of nodes; speedups of 1.9x and 2.8x]
Observations
- Performance trends are similar for all three applications
  - Consistent with the fact that all three are implemented using the GIM-V method
- Ex-MATE outperforms PEGASUS significantly for all three graph mining algorithms
- Reasonable speedups across the different datasets
- Better scalability for larger datasets with an increasing number of nodes
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
Related Work: Academia
- Evaluation of Map-Reduce-like models in various parallel programming environments:
  - Phoenix-rebirth for large-scale multi-core machines
  - Mars for a single GPU
  - MITHRA for GPGPUs in heterogeneous platforms
  - The recent IDAV for GPU clusters
- Improvements to the Map-Reduce API:
  - Integrating pre-fetching and pre-shuffling into Hadoop
  - Supporting online queries
  - Enforcing less restrictive synchronization semantics between Map and Reduce
Related Work: Industry
- Google's Pregel system:
  - Map-Reduce may not be well suited to graph operations
  - Pregel was proposed to target graph processing
  - Open-source version: the HAMA project in Apache
- Variants of Map-Reduce:
  - Dryad/DryadLINQ from Microsoft
  - Sawzall from Google
  - Pig/Map-Reduce-Merge from Yahoo!
  - Hive from Facebook
Outline
- Background
- System Design of Ex-MATE
- Parallel Graph Mining with Ex-MATE
- Experiments
- Related Work
- Conclusion
Conclusion
- Ex-MATE supports the management of reduction objects of arbitrary sizes
  - Deals with disk-resident reduction objects
- Outperforms both the naïve and optimized PEGASUS implementations for all three graph mining applications, with simpler code
- Offers a promising alternative for developing efficient data-intensive applications
- Uses GIM-V for parallelizing graph mining
Thank You, and Acknowledgments
Questions and comments
Wei Jiang -
Gagan Agrawal -
This project was supported by: