PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. Joseph E. Gonzalez, Yucheng Low, Haijie Gu, and Danny Bickson, Carnegie Mellon University; Carlos Guestrin, University of Washington

Current State
1. Many MLDM (machine learning and data mining) problems are represented as graphs
2. Graph-structured computation is important
3. Graphs are big
4. Current systems provide graph-parallel computation: Pregel and GraphLab

Solution 1: Pregel's Vertex Program model

Solution 2: GraphLab's Shared Distributed Graph

The Problem: Natural Graphs
Many graphs have a skewed degree distribution, so a single high-degree vertex can span several machines. [Slide figure: one vertex whose edges reach from Machine 1 through Machine 3]

What is a Natural Graph?
A graph whose degree distribution follows a power law, P(d) ∝ d^(−α): most vertices have few neighbors, while a few hubs have very many.
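One way to see this skew, as a synthetic sketch rather than data from the paper (assumed parameters: α = 2, 100,000 vertices, degrees drawn by inverse-CDF sampling from a truncated power law), is to measure how many edge endpoints the top 1% of vertices account for:

```python
import random

# Hypothetical sketch: sample vertex degrees from a truncated power law
# P(d) ~ d^(-alpha) with alpha = 2, a stand-in for a real natural graph.
def sample_powerlaw_degrees(n, alpha=2.0, d_min=1, d_max=10**6, seed=0):
    rng = random.Random(seed)
    degrees = []
    for _ in range(n):
        u = 1.0 - rng.random()                   # u in (0, 1]
        d = d_min / u ** (1.0 / (alpha - 1.0))   # inverse-CDF of a Pareto tail
        degrees.append(min(int(d), d_max))       # truncate at d_max
    return degrees

degrees = sorted(sample_powerlaw_degrees(100_000), reverse=True)
top_share = sum(degrees[:1000]) / sum(degrees)   # share held by top 1%
print(f"top 1% of vertices touch {100 * top_share:.0f}% of edge endpoints")
```

With α near 2, a handful of hub vertices dominate the edge set, which is exactly the regime where the partitioning strategies below diverge.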

GraphLab and Pregel on Natural Graphs
- Work imbalance
- Random partitioning
- Storage is linear in degree
- Expensive communication

The Solution
Pregel: edge cut, replicate edges.
PowerGraph: vertex cut, replicate vertices.
PowerGraph parallelizes the vertex program across all machines that hold that vertex.

Balanced p-way Vertex Cut
Idea: distribute edges evenly across machines while minimizing vertex replication.

Distributing Edges: Random
Idea: randomly assign edges to machines. Why is this better than Pregel?
Theorem: for a given edge cut with g ghosts, any vertex cut along the same partition boundary has strictly fewer than g mirrors.
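As an illustrative sketch (not PowerGraph's actual code), random edge assignment takes only a few lines: hash each edge to a machine and count how many machines each vertex ends up mirrored on. The star graph below is a hypothetical worst case for edge cuts:

```python
from collections import defaultdict

# Sketch of a random vertex cut: each edge is hashed to one of p machines,
# and a vertex is mirrored on every machine that holds one of its edges.
def random_vertex_cut(edges, p):
    mirrors = defaultdict(set)          # vertex -> machines holding a mirror
    for u, v in edges:
        m = hash((u, v)) % p            # pseudo-random edge assignment
        mirrors[u].add(m)
        mirrors[v].add(m)
    # Replication factor: average number of mirrors per vertex.
    return sum(len(s) for s in mirrors.values()) / len(mirrors)

# A star graph: one hub with 1000 neighbors. An edge cut must ghost the hub
# on many machines; the vertex cut mirrors it at most p times.
edges = [(0, i) for i in range(1, 1001)]
print("replication factor on 8 machines:", random_vertex_cut(edges, 8))
```

The hub is mirrored on at most 8 machines while every leaf gets a single copy, so the average stays close to 1 even though the hub's degree is huge.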

Distributing Edges: Greedy
- Further minimizes vertex replication
- Idea: place each new edge on the machine that minimizes vertex replication
- Two greedy variants: Coordinated and Oblivious
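A minimal sketch of the greedy idea above, closest to the oblivious variant (my own tie-breaking by machine load; the paper's rule also enforces balance bounds, which this sketch omits):

```python
from collections import defaultdict

# Greedy edge placement sketch: prefer a machine that already holds both
# endpoints (no new mirror), then one that holds either (one new mirror),
# else fall back to the least-loaded machine (two new mirrors).
def greedy_place(edges, p):
    placed = defaultdict(set)     # vertex -> machines already holding it
    load = [0] * p                # edges assigned to each machine
    for u, v in edges:
        both = placed[u] & placed[v]
        either = placed[u] | placed[v]
        if both:
            m = min(both, key=lambda i: load[i])
        elif either:
            m = min(either, key=lambda i: load[i])
        else:
            m = min(range(p), key=lambda i: load[i])
        placed[u].add(m)
        placed[v].add(m)
        load[m] += 1
    return placed, load

placed, load = greedy_place([(0, i) for i in range(1, 101)], 4)
rep = sum(len(s) for s in placed.values()) / len(placed)
print(f"replication factor {rep:.2f}, loads {load}")
```

On this star graph the sketch reaches a replication factor of 1.0 but piles every edge onto one machine, which illustrates the replication-versus-balance trade-off that the coordinated and oblivious variants manage with explicit balance constraints.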

Edge Distribution

Implementations
- Synchronous (as in Pregel)
- Asynchronous
- Asynchronous and serializable (as in GraphLab)

Discussion: Edge Placement and Run Time

Discussion: GAS Decomposition
- Gather: collect information about surrounding vertices
- Apply: the vertex updates its value based on the gathered data
- Scatter: the vertex shares its new value with its neighbors
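To make the decomposition concrete, here is a hedged Python sketch of PageRank written in the GAS pattern (an illustration only, not PowerGraph's actual C++ API; the single-machine loop stands in for the distributed engine, and the scatter phase is implicit because the next iteration simply re-reads the new ranks):

```python
DAMPING = 0.85

def gather(neighbor_rank, neighbor_out_degree):
    # Gather: each in-neighbor contributes rank / out_degree.
    return neighbor_rank / neighbor_out_degree

def apply_(gathered_sum):
    # Apply: combine the gathered sum into the vertex's new rank.
    return (1.0 - DAMPING) + DAMPING * gathered_sum

def pagerank(edges, n, iters=20):
    out_deg = [0] * n
    for u, _ in edges:
        out_deg[u] += 1
    rank = [1.0] * n
    for _ in range(iters):
        acc = [0.0] * n
        for u, v in edges:                  # gather along every in-edge of v
            acc[v] += gather(rank[u], out_deg[u])
        rank = [apply_(a) for a in acc]     # apply at every vertex
    return rank

# On a 3-cycle every vertex keeps rank 1.0 (up to float rounding).
print(pagerank([(0, 1), (1, 2), (2, 0)], 3))
```

In PowerGraph the gather sum for a high-degree vertex is computed in parallel on every machine holding one of its mirrors and then combined, which is what makes the decomposition pay off on natural graphs.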

What About Alpha?
PowerGraph is a general solution for natural graphs. Could we do better if the power-law exponent α is always around 2?

Fully Characterizing Natural Graphs
Conclusions:
- Out-degree grows over time, changing the value of alpha
- Graph diameters often decrease as a graph grows
What does this mean for PowerGraph when graphs are constantly changing?

Takeaways
- The vertex-cut implementation allows greater parallelization of vertex programs and fewer replicas (mirrors) than an edge cut
- The GAS decomposition is not fundamental to PowerGraph's implementation