Pregel: A System for Large-Scale Graph Processing
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski (Google, Inc.), SIGMOD ’10
Presented by Dong Chang, 15 Mar 2013

Outline: Introduction, Computation Model, Writing a Pregel Program, System Implementation, Experiments, Conclusion & Future Work

Outline: Introduction, Computation Model, Writing a Pregel Program, System Implementation, Experiments, Conclusion & Future Work

Introduction (1/2)

Introduction (2/2)
- Many practical computing problems concern large graphs
  – Large graph data: web graph, transportation routes, citation relationships, social networks
  – Graph algorithms: PageRank, shortest path, connected components, clustering techniques
- MapReduce is ill-suited for graph processing
  – Many iterations are needed for parallel graph processing
  – Materializations of intermediate results at every MapReduce iteration harm performance

MapReduce Execution
- Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits.
- The input splits can be processed in parallel by different machines.
- Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a hash function: hash(key) mod R (see the snippet below)
  – R and the partitioning function are specified by the programmer.
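As a small illustration of the partitioning rule (not Google's code; the string key type and the parameter R are arbitrary stand-ins):

    #include <cstddef>
    #include <functional>
    #include <string>

    // Illustrative only: assigns an intermediate key to one of R reduce
    // partitions, mirroring the "hash(key) mod R" rule described above.
    int ReducePartition(const std::string& key, int R) {
      std::size_t h = std::hash<std::string>{}(key);
      return static_cast<int>(h % static_cast<std::size_t>(R));
    }
    // e.g., ReducePartition("the", 10) always yields the same value in [0, 10),
    // so every occurrence of a key is routed to the same reduce task.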

MapReduce Execution

Data Flow
- Input and final output are stored on a distributed file system
  – The scheduler tries to schedule map tasks "close" to the physical storage location of the input data
- Intermediate results are stored on the local file system of map and reduce workers
- Output can be input to another MapReduce task

MapReduce Execution

MapReduce Parallel Execution

Outline: Introduction, Computation Model, Writing a Pregel Program, System Implementation, Experiments, Conclusion & Future Work

Computation Model (1/3)
[figure: Input → Supersteps (a sequence of iterations) → Output]

Computation Model (2/3)
- "Think like a vertex"
- Inspired by Valiant's Bulk Synchronous Parallel model (1990)

Computation Model (3/3)
- Superstep: the vertices compute in parallel (a sketch follows after this slide)
  – Each vertex:
    · Receives messages sent in the previous superstep
    · Executes the same user-defined function
    · Modifies its value or that of its outgoing edges
    · Sends messages to other vertices (to be received in the next superstep)
    · Mutates the topology of the graph
    · Votes to halt if it has no further work to do
  – Termination condition:
    · All vertices are simultaneously inactive
    · There are no messages in transit
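A minimal single-process sketch of this superstep loop, assuming an invented in-memory graph and a max-value vertex program (none of these names come from Pregel itself). It shows messages being delivered in the following superstep, vote-to-halt with reactivation on message arrival, and the termination condition above:

    // Toy simulation of the superstep model; real Pregel distributes the
    // vertices across workers and sends messages over the network.
    #include <algorithm>
    #include <cstdio>
    #include <map>
    #include <vector>

    struct Vertex {
      int value;
      std::vector<int> out_edges;  // indices of out-neighbors
      bool halted;                 // true once the vertex has voted to halt
    };

    int main() {
      // Directed ring of 4 vertices propagating the maximum value.
      std::vector<Vertex> g = {{3, {1}, false}, {6, {2}, false},
                               {2, {3}, false}, {1, {0}, false}};
      std::map<int, std::vector<int>> inbox;  // messages delivered this superstep

      for (int superstep = 0; ; ++superstep) {
        // Termination condition from the slide: every vertex is inactive
        // (has voted to halt) and no messages are in transit.
        bool work_left = !inbox.empty();
        for (const Vertex& v : g) if (!v.halted) { work_left = true; break; }
        if (!work_left) break;

        std::map<int, std::vector<int>> next_inbox;  // delivered next superstep
        for (int v = 0; v < (int)g.size(); ++v) {
          auto it = inbox.find(v);
          bool has_msgs = (it != inbox.end());
          if (g[v].halted && !has_msgs) continue;  // stays inactive
          g[v].halted = false;                     // a message reactivates it

          // "Compute": adopt the largest value received; if it changed
          // (or this is superstep 0), tell the out-neighbors.
          bool changed = false;
          if (has_msgs) {
            int best = *std::max_element(it->second.begin(), it->second.end());
            if (best > g[v].value) { g[v].value = best; changed = true; }
          }
          if (superstep == 0 || changed)
            for (int nbr : g[v].out_edges) next_inbox[nbr].push_back(g[v].value);

          g[v].halted = true;  // vote to halt until a new message arrives
        }
        inbox = std::move(next_inbox);
      }

      for (int v = 0; v < (int)g.size(); ++v)
        std::printf("vertex %d -> %d\n", v, g[v].value);  // all converge to 6
      return 0;
    }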

An Example

Example: SSSP – Parallel BFS in Pregel
[figure sequence over several slides: the source vertex starts at distance 0 and every other vertex at ∞; in each superstep, vertices that learned a shorter distance send updates to their neighbors, until no distances change]

Differences from MapReduce
- Graph algorithms can be written as a series of chained MapReduce invocations
- Pregel
  – Keeps vertices & edges on the machine that performs the computation
  – Uses network transfers only for messages
- MapReduce
  – Passes the entire state of the graph from one stage to the next
  – Needs to coordinate the steps of a chained MapReduce

Outline: Introduction, Computation Model, Writing a Pregel Program, System Implementation, Experiments, Conclusion & Future Work

C++ API
- Writing a Pregel program
  – Subclassing the predefined Vertex class
[code figure with callouts "Override this!", "in msgs", and "out msg"; a sketch follows below]
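The Vertex class shown on the slide follows the paper's C++ API, roughly as below. This is an interface sketch only: MessageIterator, OutEdgeIterator, string, and int64 are types supplied by Pregel's runtime and are not defined here, so it will not compile standalone. The slide's callouts map to Compute() ("Override this!"), the incoming-message iterator ("in msgs"), and SendMessageTo() ("out msg").

    template <typename VertexValue, typename EdgeValue, typename MessageValue>
    class Vertex {
     public:
      // "Override this!" — the user-defined per-superstep function.
      virtual void Compute(MessageIterator* msgs) = 0;

      const string& vertex_id() const;
      int64 superstep() const;

      const VertexValue& GetValue();
      VertexValue* MutableValue();
      OutEdgeIterator GetOutEdgeIterator();

      // "out msg": deliver a message to dest_vertex in the next superstep.
      void SendMessageTo(const string& dest_vertex, const MessageValue& message);
      void VoteToHalt();
    };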

Example: Vertex Class for SSSP
[code figure; a sketch follows below]
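The slide's code figure is not reproduced in the transcript; the shortest-paths vertex from the paper looks approximately like this (again a sketch against the Pregel runtime types shown above, with IsSource() and INF assumed to be defined elsewhere in the program):

    class ShortestPathVertex : public Vertex<int, int, int> {
      void Compute(MessageIterator* msgs) {
        // Tentative distance: 0 for the source, best incoming message otherwise.
        int mindist = IsSource(vertex_id()) ? 0 : INF;
        for (; !msgs->Done(); msgs->Next())
          mindist = min(mindist, msgs->Value());

        // If a shorter path was found, record it and relax all outgoing edges.
        if (mindist < GetValue()) {
          *MutableValue() = mindist;
          OutEdgeIterator iter = GetOutEdgeIterator();
          for (; !iter.Done(); iter.Next())
            SendMessageTo(iter.Target(), mindist + iter.GetValue());
        }
        VoteToHalt();  // reactivated only if a neighbor sends a shorter distance
      }
    };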

Outline: Introduction, Computation Model, Writing a Pregel Program, System Implementation, Experiments, Conclusion & Future Work

MapReduce Coordination
- Master data structures
  – Task status: (idle, in-progress, completed)
  – Idle tasks get scheduled as workers become available
  – When a map task completes, it sends the master the location and sizes of its R intermediate files, one for each reducer
  – Master pushes this info to reducers
- Master pings workers periodically to detect failures

MapReduce Failures
- Map worker failure
  – Map tasks completed or in-progress at the worker are reset to idle
  – Reduce workers are notified when a task is rescheduled on another worker
- Reduce worker failure
  – Only in-progress tasks are reset to idle
- Master failure
  – The MapReduce task is aborted and the client is notified

System Architecture
- The Pregel system also uses the master/worker model
  – Master
    · Maintains workers
    · Recovers from worker faults
    · Provides a Web-UI monitoring tool for job progress
  – Worker
    · Processes its task
    · Communicates with the other workers
- Persistent data is stored as files on a distributed storage system (such as GFS or Bigtable)
- Temporary data is stored on local disk

Execution of a Pregel Program
1. Many copies of the program begin executing on a cluster of machines
2. The master assigns a partition of the input to each worker
   – Each worker loads the vertices and marks them as active
3. The master instructs each worker to perform a superstep
   – Each worker loops through its active vertices and computes for each vertex
   – Messages are sent asynchronously, but are delivered before the end of the superstep
   – This step is repeated as long as any vertices are active, or any messages are in transit (a coordination sketch follows below)
4. After the computation halts, the master may instruct each worker to save its portion of the graph
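A toy sketch of the master-side coordination implied by steps 2–4, with hypothetical in-process worker stubs standing in for the RPCs the real master would issue:

    #include <cstdio>
    #include <vector>

    // Stand-in for a remote worker; in Pregel these calls would be RPCs.
    struct WorkerStub {
      void LoadPartition(int partition_id) { (void)partition_id; }  // load vertices, mark active
      // Runs one superstep and reports how many of its vertices are still
      // active and how many messages it sent for the next superstep.
      void RunSuperstep(long long* active, long long* sent) { *active = 0; *sent = 0; }
      void SavePartition() { /* write this worker's portion of the output */ }
    };

    int main() {
      std::vector<WorkerStub> workers(4);
      for (int i = 0; i < (int)workers.size(); ++i) workers[i].LoadPartition(i);

      for (int superstep = 0; ; ++superstep) {
        long long total_active = 0, total_sent = 0;
        for (WorkerStub& w : workers) {
          long long active = 0, sent = 0;
          w.RunSuperstep(&active, &sent);  // barrier: wait for every worker
          total_active += active;
          total_sent += sent;
        }
        std::printf("superstep %d: %lld active, %lld messages\n",
                    superstep, total_active, total_sent);
        if (total_active == 0 && total_sent == 0) break;  // nothing left to do
      }
      for (WorkerStub& w : workers) w.SavePartition();
      return 0;
    }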

Fault Tolerance
- Checkpointing
  – The master periodically instructs the workers to save the state of their partitions to persistent storage (e.g., vertex values, edge values, incoming messages)
- Failure detection
  – Using regular "ping" messages
- Recovery
  – The master reassigns graph partitions to the currently available workers
  – The workers all reload their partition state from the most recent available checkpoint

Outline: Introduction, Computation Model, Writing a Pregel Program, System Implementation, Experiments, Conclusion & Future Work

Experiments
- Environment
  – H/W: a cluster of 300 multicore commodity PCs
  – Data: binary trees, log-normal random graphs (general graphs)
- Naïve SSSP implementation
  – The weight of all edges = 1
  – No checkpointing

Experiments
- SSSP – 1 billion vertex binary tree: varying # of worker tasks

Experiments
- SSSP – binary trees: varying graph sizes on 800 worker tasks

Experiments
- SSSP – random graphs: varying graph sizes on 800 worker tasks

Outline: Introduction, Computation Model, Writing a Pregel Program, System Implementation, Experiments, Conclusion & Future Work

Conclusion & Future Work
- Pregel is a scalable and fault-tolerant platform with an API that is sufficiently flexible to express arbitrary graph algorithms
- Future work
  – Relaxing the synchronicity of the model, so as not to wait for slower workers at inter-superstep barriers
  – Assigning vertices to machines to minimize inter-machine communication
  – Handling dense graphs in which most vertices send messages to most other vertices

Thank You!