Pregel: A System for Large-Scale Graph Processing

Presentation transcript:

Pregel: A System for Large-Scale Graph Processing
Presented by Dylan Davis
Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski (Google, Inc.)

Overview
- What is a graph?
- Graph Problems
- The Purpose of Pregel
- Model of Computation
- C++ API
- Implementation
- Applications
- Experiments

What is a graph?
A graph G = (V, E) consists of a set of vertices V and a set of edges E. Example: a binary tree.

Graph Problems
- Network routing
- Social network connections

The Purpose of Pregel
Google wanted to run internet-related graph algorithms, such as PageRank, efficiently, so it designed Pregel for these tasks. Pregel is a scalable, general-purpose system for implementing graph algorithms in a distributed environment. It focuses on parallelism and on “thinking like a vertex”.

Model of Computation

Model of Computation (Vertex)
[Figure: a vertex with its vertex ID and vertex value; each outgoing edge carries an edge value and the ID of the vertex it points to.]

Model of Computation (Superstep)
[Figure: execution time divided into Superstep 0, Superstep 1, and Superstep 2; Compute() runs in each superstep.]

Model of Computation (Vertex Actions)
A vertex, identified by its vertex ID and holding a vertex value, can:
- Modify its values
- Receive messages sent in the previous superstep
- Send messages
- Request topology changes
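The vertex actions above can be illustrated with a small single-process simulation. This is a minimal sketch in Python, not the Pregel C++ API: the function name `run_max_propagation` and its data layout are assumptions made for illustration. Each vertex keeps the largest value it has seen, forwards it when it changes, and goes inactive until a new message arrives.

```python
from collections import defaultdict

def run_max_propagation(values, edges, max_supersteps=50):
    """Simulate vertex-centric supersteps in one process (illustrative only).

    values: dict vertex_id -> int (the vertex value)
    edges:  dict vertex_id -> list of destination vertex ids
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)
    active = set(values)          # every vertex starts active
    inbox = defaultdict(list)     # messages sent in the previous superstep
    superstep = 0
    while superstep < max_supersteps and (active or inbox):
        outbox = defaultdict(list)
        # A halted vertex is reactivated when a message arrives for it.
        for v in active | set(inbox):
            changed = False
            for message in inbox.get(v, []):
                if message > values[v]:
                    values[v] = message      # modify own value
                    changed = True
            if superstep == 0 or changed:
                for dst in edges.get(v, []):
                    outbox[dst].append(values[v])   # send messages
        active = set()            # every vertex votes to halt after computing
        inbox = outbox            # delivered at the next superstep
        superstep += 1
    return values, superstep
```

For example, on a four-vertex ring with values 3, 6, 2, 1, every vertex ends up holding 6, and the loop stops once no messages are in flight.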

Model of Computation (State Machine)

C++ API

C++ API (Message Passing)
[Figure: a message buffer holding messages, each with a destination vertex ID and a message value.]

C++ API (Combiners & Aggregators) Combiner Aggregator
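The slide names the two concepts without detail. As a rough illustration, sketched in Python rather than in Pregel's C++ API, a combiner merges several messages bound for the same vertex into one, and an aggregator reduces values contributed by the vertices into a single global value. The function names `combine_min` and `aggregate_sum`, and the choice of min and sum as the operations, are assumptions for this example.

```python
def combine_min(outgoing):
    """Combiner sketch: keep only the smallest message per destination.

    outgoing: list of (destination_vertex_id, message_value) pairs.
    Returns a dict destination_vertex_id -> combined message value.
    """
    combined = {}
    for dst, value in outgoing:
        combined[dst] = min(value, combined.get(dst, value))
    return combined

def aggregate_sum(contributions):
    """Aggregator sketch: reduce one value per vertex to a global total.

    contributions: dict vertex_id -> value provided during a superstep.
    """
    total = 0
    for value in contributions.values():
        total += value
    return total
```

A min combiner suits algorithms such as shortest paths, where a vertex only needs the smallest incoming distance, so fewer messages have to be sent and buffered.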

C++ API (Topology Mutations)
[Figure: a vertex V changing across a superstep.]

C++ API (Input and Output)

Implementation

Implementation (Basic Architecture)

Implementation (Program Execution)
Flow:
1. Copy the user program: one master copy and many worker copies.
2. The master assigns graph partitions to workers.
3. The master takes the user input data and assigns it to workers, which load the vertex data.
4. Supersteps run (Compute() and message sending).
5. Save the output.
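The slide does not say how vertices are mapped to partitions. The Pregel paper describes a default of hash(ID) mod N, where N is the number of partitions. The sketch below assumes integer vertex IDs and uses the ID itself as the hash; the function name `assign_partitions` is hypothetical.

```python
def assign_partitions(vertex_ids, num_partitions):
    """Group vertex IDs into partitions by ID modulo the partition count.

    Assumes non-negative integer vertex IDs, so the ID serves as its own hash.
    Returns a dict partition_index -> list of vertex IDs.
    """
    partitions = {p: [] for p in range(num_partitions)}
    for vertex_id in vertex_ids:
        partitions[vertex_id % num_partitions].append(vertex_id)
    return partitions
```

Because the mapping depends only on the vertex ID, any worker can work out which partition owns a destination vertex without asking the master.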

Implementation (Fault Tolerance)
[Figure: Checkpoint: each worker calls Save(). Recover: after a worker fails (X), the remaining workers call Recompute().]
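The checkpoint-and-recompute idea can be sketched in a few lines of Python. This is a simplified, single-process illustration under stated assumptions: state is saved every `checkpoint_every` supersteps, one failure is simulated at superstep `fail_at`, and recovery rolls back to the last saved state and redoes the lost supersteps. The function and parameter names are hypothetical and do not come from Pregel.

```python
import copy

def run_with_checkpoints(state, step, num_supersteps, checkpoint_every, fail_at=None):
    """Run `step` once per superstep, checkpointing periodically.

    fail_at: superstep at which a single simulated failure occurs, or None.
    Returns (final state, number of step executions including redone ones).
    """
    checkpoint = (0, copy.deepcopy(state))    # (superstep, saved state)
    superstep = 0
    executed = 0
    failed_once = False
    while superstep < num_supersteps:
        if superstep % checkpoint_every == 0:
            checkpoint = (superstep, copy.deepcopy(state))   # Save()
        if superstep == fail_at and not failed_once:
            failed_once = True
            superstep, saved = checkpoint     # roll back to the last checkpoint
            state = copy.deepcopy(saved)      # supersteps after it are redone
            continue
        state = step(state)
        executed += 1
        superstep += 1
    return state, executed
```

The final state is the same with or without the failure; the cost of the failure shows up only as extra step executions, which is the trade-off between checkpoint frequency and recovery time.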

Implementation (Worker) Worker

Implementation (Master)
[Figure: the master holds a list of workers and the partitions assigned to them.]

Applications

Applications (Shortest Path)
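Single-source shortest paths fits the vertex-centric model naturally: each vertex takes the minimum of its incoming distance messages, and if that improves its current value it stores it and sends updated distances along its outgoing edges. The following is a minimal single-process sketch in Python, not the C++ code from the presentation; the name `pregel_sssp` and the dictionary-based graph layout are assumptions, and edge weights are assumed to be non-negative.

```python
import math
from collections import defaultdict

def pregel_sssp(edges, source):
    """Vertex-centric single-source shortest paths (illustrative simulation).

    edges: dict vertex_id -> list of (destination_id, edge_weight) pairs.
    Returns a dict vertex_id -> shortest distance from source (inf if unreachable).
    """
    vertices = set(edges)
    for out_edges in edges.values():
        vertices.update(dst for dst, _ in out_edges)
    dist = {v: math.inf for v in vertices}
    inbox = {v: [] for v in vertices}     # superstep 0: every vertex runs once
    while inbox:                          # stop when no messages are in flight
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            candidate = 0 if v == source else math.inf
            if messages:
                candidate = min(candidate, min(messages))
            if candidate < dist[v]:
                dist[v] = candidate
                for dst, weight in edges.get(v, []):
                    outbox[dst].append(candidate + weight)
            # The vertex then votes to halt; a later message reactivates it.
        inbox = dict(outbox)
    return dist
```

A vertex only sends messages when its distance improves, so the computation quiets down by itself once every reachable vertex holds its final distance.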

Experiments

Experiments (Description)
- Measure the execution time of Pregel running the Single-Source Shortest Path algorithm.
- Use a cluster of 300 multicore commodity PCs.
- Run Pregel on binary tree graphs and on a more realistic, randomly distributed graph.
- Results exclude initialization, graph generation, and result verification times.
- Failure recovery is not included, which reduces overhead.

Conclusion
Pregel is a model suitable for large-scale graph computing, with a production-quality, scalable, and fault-tolerant implementation. Programs are expressed as a sequence of iterations. In each iteration, a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges. This implementation is flexible enough to express a broad set of algorithms.