Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System
Seunghwa Kang and David A. Bader

Presentation transcript:


Various Complex Networks
- Friendship network
- Citation network
- Web-link graph
- Collaboration network
=> Need to extract graphs from large volumes of raw data.
=> Extracted graphs are highly irregular.

A Challenge Problem
Extracting a subgraph from a larger graph.
- The input graph: an R-MAT* graph (undirected, unweighted) with billions of vertices and 275 billion edges (7.4 TB in text format).
- Extract subnetworks that cover 10%, 5%, and 2% of the vertices.
Finding a single-pair shortest path (for up to 30 pairs).
[Figure: R-MAT recursively partitions the adjacency matrix into quadrants with probabilities a=0.55, b=0.1, c=0.1, d=0.25. Source: Seokhee Hong]
* D. Chakrabarti, Y. Zhan, and C. Faloutsos, "R-MAT: A recursive model for graph mining," SIAM Int'l Conf. on Data Mining (SDM), 2004.
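
The R-MAT model referenced above draws each edge by repeatedly descending into one quadrant of the adjacency matrix, chosen with the probabilities shown in the figure. A minimal sketch of that sampling procedure (an illustration only, not the generator used in the paper):

```python
import random

def rmat_edge(scale, a=0.55, b=0.10, c=0.10, d=0.25):
    """Sample one edge of a 2**scale-vertex R-MAT graph by recursively
    choosing a quadrant of the adjacency matrix at each of `scale` levels."""
    src, dst = 0, 0
    for _ in range(scale):
        src, dst = 2 * src, 2 * dst
        r = random.random()
        if r < a:                # top-left quadrant: both halves low
            pass
        elif r < a + b:          # top-right quadrant: high destination
            dst += 1
        elif r < a + b + c:      # bottom-left quadrant: high source
            src += 1
        else:                    # bottom-right quadrant: both halves high
            src, dst = src + 1, dst + 1
    return src, dst

# A few edges of a small (2^20-vertex) R-MAT graph:
edges = [rmat_edge(scale=20) for _ in range(10)]
```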

Presentation Outline
- Present the hybrid system.
- Solve the problem using three different systems: a MapReduce cluster, a highly multithreaded system, and the hybrid system.
- Show the effectiveness of the hybrid system by
  - Algorithm-level analyses
  - System-level analyses
  - Experimental results

Highlights
Theory-level analysis:
- A MapReduce cluster: graph extraction, W_MapReduce(n) ≈ Θ(T*(n)); shortest path, W_MapReduce(n) > Θ(T*(n)).
- A highly multithreaded system: work optimal.
- A hybrid system of the two: effective if |T_hmt − T_MapReduce| > n / BW_inter.
System-level analysis:
- A MapReduce cluster: bisection bandwidth and disk I/O overhead.
- A highly multithreaded system: limited aggregate computing power, disk capacity, and I/O bandwidth.
- A hybrid system of the two: BW_inter is important.
Experiments:
- A MapReduce cluster: five orders of magnitude slower than the highly multithreaded system in finding a shortest path.
- A highly multithreaded system: incapable of storing the input graph.
- A hybrid system of the two: efficient in solving the challenge problem.

A Hybrid System to Address the Distinct Computational Challenges
[Figure: the MapReduce cluster performs step 1, graph extraction; the highly multithreaded system serves step 2, graph analysis queries.]

The MapReduce Programming Model
- Scans the entire input data in the map phase.
- # MapReduce iterations = the depth of the directed acyclic graph (DAG) of the MapReduce computation.
[Figure: input data flows through map, sort, and reduce to produce intermediate data, sorted intermediate data, and output data; chaining three such iterations gives a DAG of depth 3.]
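
As a toy illustration of the model on this slide, the sketch below runs one map-sort-reduce iteration in memory; a computation whose DAG has depth k calls it k times. This is a minimal sketch of the programming model, not any real framework's API:

```python
from itertools import groupby
from operator import itemgetter

def map_reduce(records, mapper, reducer):
    """One MapReduce iteration: map every input record, sort the
    intermediate (key, value) pairs by key, then reduce each key group."""
    intermediate = [kv for rec in records for kv in mapper(rec)]
    intermediate.sort(key=itemgetter(0))                    # the sort phase
    return [reducer(key, [v for _, v in group])
            for key, group in groupby(intermediate, key=itemgetter(0))]

# Depth-1 example: vertex degree counting over an edge list.
edges = [(0, 1), (0, 2), (1, 2)]
degrees = map_reduce(edges,
                     mapper=lambda e: [(e[0], 1), (e[1], 1)],
                     reducer=lambda v, ones: (v, sum(ones)))
print(degrees)  # [(0, 2), (1, 2), (2, 2)]
```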

Evaluating the Efficiency of MapReduce Algorithms
W_MapReduce = Σ_{i=1..k} O(n_i · (1 + f_i · (1 + r_i)) + p_r · Sort(n_i · f_i / p_r))
- k: # MapReduce iterations.
- n_i: the input data size for the i-th iteration.
- f_i: map output size / map input size.
- r_i: reduce output size / reduce input size.
- p_r: # reducers.
Extracting a subgraph:
- k = 1 and f_i << 1 => W_MapReduce(n) ≈ Θ(T*(n)), where T*(n) is the time complexity of the best sequential algorithm.
Finding a single-pair shortest path:
- k = ⌈d/2⌉ (d: the distance between the pair) and f_i ≈ 1 => W_MapReduce(n) > Θ(T*(n)).
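
The work formula above can be evaluated directly; the sketch below models Sort(m) as m·log2(m) and plugs in the slide's parameter settings for the two problems (the concrete n, d, f, and p_r values are illustrative assumptions):

```python
import math

def w_mapreduce(n, k, f, r, p_r):
    """W = sum over k iterations of n_i*(1 + f*(1 + r)) + p_r*Sort(n_i*f/p_r),
    with Sort(m) modeled as m*log2(m) and f, r held constant across iterations."""
    total, n_i = 0.0, float(n)
    for _ in range(k):
        per_reducer = n_i * f / p_r
        total += n_i * (1 + f * (1 + r)) + p_r * per_reducer * math.log2(max(per_reducer, 2))
        n_i = n_i * f * r            # reduce output feeds the next iteration
    return total

n = 1e9
extraction = w_mapreduce(n, k=1, f=0.01, r=1.0, p_r=32)        # k = 1, f << 1
d = 20                                                          # illustrative s-t distance
shortest   = w_mapreduce(n, k=math.ceil(d / 2), f=1.0, r=1.0, p_r=32)
print(extraction / n, shortest / n)   # extraction stays near O(n); the search does not
```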

A single-pair shortest path
[Figure: a single-pair shortest path search, illustrated for the MapReduce approach.]

Bisection Bandwidth Requirements for a MapReduce Cluster
- The shuffle phase, which requires inter-node communication, can be overlapped with the map phase. If T_map > T_shuffle, T_shuffle does not affect the overall execution time.
- T_map scales trivially. To scale T_shuffle linearly, bisection bandwidth also needs to scale in proportion to the number of nodes. Yet the cost of linearly scaling bisection bandwidth increases super-linearly.
- If f << 1, sub-linear scaling of T_shuffle does not increase the overall execution time. If f ≈ 1, it does.
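
A back-of-the-envelope reading of the argument above: with map and shuffle overlapped, an iteration takes max(T_map, T_shuffle), and T_shuffle = n·f / BW_bisection. The rates below are made-up numbers purely to show the two regimes:

```python
def iteration_time(n_bytes, f, map_rate, bisection_bw):
    """With the shuffle overlapped with the map phase, an iteration takes
    max(T_map, T_shuffle); the shuffle is hidden only while T_map >= T_shuffle."""
    t_map = n_bytes / map_rate
    t_shuffle = n_bytes * f / bisection_bw
    return max(t_map, t_shuffle)

# f << 1 (graph extraction): shuffle hidden even on modest bisection bandwidth.
print(iteration_time(1e12, f=0.01, map_rate=1e10, bisection_bw=1e9))  # 100 s, map-bound
# f ~= 1 (shortest path): shuffle dominates unless bisection bandwidth scales.
print(iteration_time(1e12, f=1.0,  map_rate=1e10, bisection_bw=1e9))  # 1000 s, shuffle-bound
```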

Disk I/O Overhead
- Disk I/O overhead is unavoidable if the size of the data overflows the main memory capacity.
- Raw data can be very large, but extracted graphs are much smaller. For example, the Facebook network: 400 million users × 130 friends per user => less than 256 GB using a sparse representation (worked out below).
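
The Facebook estimate above works out as follows, assuming an adjacency-list representation with 4-byte vertex IDs (the byte width is an assumption; the slide only states the bound):

```python
users = 400e6          # vertices
avg_friends = 130      # average degree
bytes_per_id = 4       # 32-bit vertex IDs (assumption)

# One adjacency-list entry per (user, friend) pair.
entries = users * avg_friends                 # 5.2e10 entries
size_gib = entries * bytes_per_id / 2**30
print(f"~{size_gib:.0f} GiB")                 # ~194 GiB, under the 256 GB bound
```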

A Highly Multithreaded System with the Shared-Memory Programming Model
- Provides a random access mechanism. In SMPs, non-contiguous accesses are expensive.* Multithreading tolerates memory access latency.+
- There is a work-optimal parallel algorithm to find a single-pair shortest path.
[Images: Cray XMT (source: Cray) and Sun Fire T2000 "Niagara" (source: Sun Microsystems).]
* D. R. Helman and J. JaJa, "Prefix computations on symmetric multiprocessors," J. of Parallel and Distributed Computing, 61(2), 2001.
+ D. A. Bader, V. Kanade, and K. Madduri, "SWARM: A parallel programming framework for multi-core processors," Workshop on Multithreaded Architectures and Applications (MTAAP), 2007.
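
On an unweighted graph, the single-pair shortest path the slide refers to reduces to breadth-first search. Below is a minimal sequential sketch of that search (the paper's multithreaded, shared-memory implementation is not reproduced here):

```python
from collections import deque

def shortest_path_length(adj, source, target):
    """BFS distance from source to target in an unweighted graph; adj maps
    each vertex to an iterable of neighbors. Returns None if unreachable."""
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == target:
                    return dist[v]
                queue.append(v)
    return None

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(shortest_path_length(adj, 0, 3))  # 2
```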

A single-pair shortest path
[Figure: a single-pair shortest path search, illustrated for the multithreaded approach.]

Low Latency, High Bisection Bandwidth Interconnection Network
- Latency increases as the size of a system increases. A larger number of threads and additional parallelism are required as latency increases.
- The network cost to linearly scale bisection bandwidth increases super-linearly, but it is not too expensive for a small number of nodes.
- These factors limit the size of the system and reveal its limitations in extracting a subgraph from a very large graph.

The Time Complexity of an Algorithm on the Hybrid System
T_hybrid = Σ_{i=1..k} min(T_i,MapReduce + Δ, T_i,hmt + Δ)
- k: # steps.
- T_i,MapReduce and T_i,hmt: the time complexities of the i-th step on a MapReduce cluster and on a highly multithreaded system, respectively.
- Δ = (n_i / BW_inter) × δ(i−1, i).
- n_i: the input data size for the i-th step.
- BW_inter: the bandwidth between a MapReduce cluster and a highly multithreaded system.
- δ(i−1, i): 0 if the platforms selected for the (i−1)-th and i-th steps are the same; 1 otherwise.
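
One way to read the formula above is as a greedy per-step platform choice in which the transfer cost Δ is charged only when the selected platform changes. A sketch under that reading (the step times, sizes, and bandwidth below are illustrative, not the paper's measurements):

```python
def t_hybrid(steps, bw_inter):
    """steps: list of (t_mapreduce, t_hmt, n_bytes) per step. Each step runs
    on the platform that is cheaper once the transfer cost n_i / BW_inter is
    charged on a platform switch (initial data placement is ignored)."""
    total, prev = 0.0, None
    for t_mr, t_hmt, n in steps:
        delta = n / bw_inter
        cost_mr = t_mr + (delta if prev not in (None, "mr") else 0.0)
        cost_hmt = t_hmt + (delta if prev not in (None, "hmt") else 0.0)
        if cost_mr <= cost_hmt:
            total, prev = total + cost_mr, "mr"
        else:
            total, prev = total + cost_hmt, "hmt"
    return total

# Step 1 (extraction) favors MapReduce; step 2 (queries) favors the
# multithreaded system even after paying to move the extracted subgraph.
print(t_hybrid([(9e4, 5e5, 7.4e12), (3.7e5, 3.0, 2e11)], bw_inter=1e9))
```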

Test Platforms
A MapReduce cluster:
- 4 nodes.
- 4 dual-core 2.4 GHz Opteron processors and 8 GB main memory per node.
- Disks of 1 TB each.
A highly multithreaded system:
- A single-socket UltraSPARC T2 1.2 GHz processor (8 cores, 64 threads).
- 32 GB main memory.
- 2 disks (145 GB per disk).
A hybrid system of the two.
[Image: Sun Fire T2000 "Niagara". Source: Sun Microsystems]

A subgraph that covers 10% of the input graph
Once the subgraph is loaded into memory, the hybrid system analyzes the subgraph five orders of magnitude faster than the MapReduce cluster (103 hours vs. 2.6 seconds).
[Table: execution times on the MapReduce cluster vs. the hybrid system for subgraph extraction, memory loading, and finding a shortest path (for 30 pairs).]

Subgraphs that cover 5% (left) and 2% (right) of the input graph
[Tables: execution times on the MapReduce cluster vs. the hybrid system for subgraph extraction, memory loading, and finding a shortest path (for 30 pairs), for the 5% and 2% subgraphs.]

Conclusions
- We identified the key computational challenges in large-scale complex network analysis problems.
- Our hybrid system effectively addresses these challenges by using the right tool in the right place, in a synergistic way.
- Our work showcases a holistic approach to solving real-world challenges.

Acknowledgment of Support