Distributed Systems CS 15-440
Pregel & GraphLab
Lecture 14, October 30, 2017
Mohammad Hammoud

Today…
Last Session: Hadoop (Cont'd)
Today's Session: Pregel & GraphLab
Announcements:
PS4 is due on Wed, November 1st, by midnight
P3 is due on Nov 12th by midnight

Distributed Analytics Frameworks: Pregel
Introduction & Execution Model
Programming Model
Architectural & Scheduling Models

Google's Pregel
MapReduce is a good fit for a wide array of large-scale applications, but it is ill-suited for graph processing. Pregel is a large-scale "graph-parallel" distributed analytics framework.
Some characteristics:
In-memory across iterations (or super-steps)
High scalability
Automatic fault-tolerance
Flexibility in expressing graph algorithms
Message-passing programming model
Tree-style, master-slave architecture
Synchronous execution
Pregel is inspired by Valiant's Bulk Synchronous Parallel (BSP) model.

The BSP Model
[Figure: computation proceeds as a sequence of super-steps (Super-Step 1, 2, 3, ...); within each super-step, CPUs 1-3 process their data in parallel, and a global barrier separates consecutive super-steps.]

Google's Pregel: A Bird's-Eye View
The input to Pregel is a directed graph, which can be stored on a distributed storage layer (e.g., GFS or Bigtable).
The input graph is partitioned (e.g., using hash partitioning) and distributed across cluster machines.
Execution is pursued in super-steps, and the final output can be stored again in a distributed storage layer.
[Figure: a master machine distributes a dataset, stored as HDFS blocks, across the cluster; the output is written back to HDFS.]
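To make the partitioning step concrete, here is a minimal sketch of modulo-hash partitioning of vertex IDs across workers; the function name, vertex IDs, and worker count are made up for illustration and are not Pregel's actual code.

```python
# Minimal sketch of hash partitioning a graph's vertices across workers (illustrative only).
def partition(vertex_ids, num_workers):
    """Assign each vertex to a worker by hashing its ID."""
    assignment = {}
    for v in vertex_ids:
        assignment[v] = hash(v) % num_workers
    return assignment

# Example: six vertices spread over three workers
print(partition(range(6), num_workers=3))
```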

Distributed Analytics Frameworks: Pregel
Introduction & Execution Model
Programming Model
Architectural & Scheduling Models

Workflow and the Programming Model
A user-defined function (say F) is executed at each vertex (say V).
F can read messages sent to V in super-step S - 1 and send messages to other vertices, which will receive them at super-step S + 1.
F can modify the state of V and its outgoing edges.
F can alter the topology of the graph.
Messages in F are "explicitly" sent/received by programmers; hence, Pregel employs a message-passing programming model.
[Figure: the vertical structure of a super-step: machines perform local computations, communicate, and then synchronize at a barrier.]
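As a rough illustration of this model, the sketch below shows what such a user-defined function F might look like as a compute method on a vertex object. The Vertex class and its methods are hypothetical stand-ins, not Pregel's real C++ API; a concrete compute appears with the Find-Max example a couple of slides below.

```python
# Illustrative skeleton of a Pregel-style vertex program (not Pregel's actual API).
class Vertex:
    def __init__(self, vertex_id, value, out_edges):
        self.id = vertex_id
        self.value = value            # state of V; F may read and modify it
        self.out_edges = out_edges    # IDs of out-neighbors; F may also modify these
        self.active = True
        self.superstep = 0            # S; would be advanced by the engine
        self.outbox = []              # messages to be delivered in super-step S + 1

    def send_message(self, target_id, msg):
        self.outbox.append((target_id, msg))

    def vote_to_halt(self):
        self.active = False

    def compute(self, messages):
        """The user-defined F: receives the messages sent to V in super-step S - 1."""
        raise NotImplementedError     # supplied by a concrete algorithm (e.g., Find Max)
```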

When Does a Pregel Program Terminate?
A Pregel program works as follows:
At super-step 0, every vertex is active.
ONLY active vertices in any super-step perform computations.
A vertex deactivates itself by voting to halt; it subsequently enters an inactive state.
A vertex can return to an active state if it receives an external message.
A Pregel program terminates when:
All vertices are inactive
And there are no messages in transit
[Figure: the vertex state machine: an active vertex becomes inactive by voting to halt, and becomes active again when a message is received.]
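A minimal sketch of these activation and termination rules, reusing the hypothetical Vertex class above:

```python
# Sketch of vertex reactivation and the global termination test (illustrative only).
def deliver(vertex, messages):
    """An inactive vertex returns to the active state when it receives a message."""
    if messages:
        vertex.active = True
    return messages

def program_terminated(vertices, messages_in_transit):
    """A Pregel program terminates only when every vertex is inactive
    AND no messages remain in transit."""
    return all(not v.active for v in vertices) and len(messages_in_transit) == 0
```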

Example: Find Max Value
[Figure: four vertices with initial values 3, 6, 2, and 1 exchange their values over three super-steps (S1-S3); blue arrows are messages, and blue vertices have voted to halt. In each super-step a vertex adopts the largest value it has received; after super-step 3 every vertex holds the maximum value, 6, and all vertices have voted to halt.]
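The example could be expressed against the hypothetical Vertex skeleton sketched earlier, roughly as follows (an illustration, not the paper's reference code):

```python
# Find-Max as a Pregel-style vertex program, built on the hypothetical Vertex class above.
class MaxVertex(Vertex):
    def compute(self, messages):
        old = self.value
        if messages:
            self.value = max(self.value, max(messages))
        if self.superstep == 0 or self.value > old:
            # First super-step, or our value grew: tell every out-neighbor
            for target_id in self.out_edges:
                self.send_message(target_id, self.value)
        else:
            self.vote_to_halt()   # nothing changed; go inactive until a message arrives
```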

Distributed Analytics Frameworks: Pregel
Introduction & Execution Model
Programming Model
Architectural & Scheduling Models

The Architectural and Scheduling Models
Pregel assumes a tree-style network topology and a master-slave architecture.
[Figure: a core switch connects rack switches; Workers 1-5 and the Master hang off the rack switches. The master pushes work (i.e., partitions) to all workers, and workers send completion signals back.]
When the master receives the completion signal from every worker in super-step S, it starts super-step S + 1.
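A single-process caricature of this push-based coordination might look as follows; the Master and Worker objects and their methods (load, run_superstep, wait_for_completion_signals, all_vertices_halted) are assumptions made purely for illustration:

```python
# Caricature of Pregel's push-based master-worker protocol (illustrative assumptions only).
def run_pregel(master, workers):
    for worker, partition in zip(workers, master.partitions):
        worker.load(partition)                 # master pushes work (partitions) to all workers

    superstep = 0
    while not master.all_vertices_halted():
        for worker in workers:
            worker.run_superstep(superstep)    # local computation + message sending
        master.wait_for_completion_signals()   # barrier: one signal per worker, per super-step
        superstep += 1                         # only then does super-step S + 1 begin
```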

Google's Pregel: Summary
Programming Model: Message-Passing
Execution Model: Synchronous
Architectural Model: Master-Slave
Scheduling Model: Push-Based
Suitable Applications: Strongly-Connected Applications

MapReduce vs. Pregel
Programming Model: Shared-Based (MapReduce) vs. Message-Passing (Pregel)
Execution Model: Synchronous for both
Architectural Model: Master-Slave for both
Scheduling Model: Pull-Based (MapReduce) vs. Push-Based (Pregel)
Suitable Applications: Loosely-Connected applications (MapReduce) vs. Strongly-Connected applications (Pregel)

The GraphLab Analytics Engine
Motivation & Definition
Input, Output & Components
The Architectural Model
The Programming Model
The Computation Model

Motivation for GraphLab
There is exponential growth in the scale of Machine Learning and Data Mining (MLDM) algorithms.
Designing, implementing, and testing MLDM algorithms at large scale is challenging due to:
Synchronization
Deadlocks
Scheduling
Distributed state management
Fault-tolerance
Interest in analytics engines that can execute MLDM algorithms automatically and efficiently is increasing:
MapReduce is inefficient with iterative jobs (common in MLDM algorithms)
Pregel cannot run asynchronous problems (common in MLDM algorithms)

What is GraphLab?
GraphLab is a large-scale graph-parallel distributed analytics engine.
Some characteristics:
In-memory (opposite to MapReduce and similar to Pregel)
High scalability
Automatic fault-tolerance
Flexibility in expressing arbitrary graph algorithms (more flexible than Pregel)
Shared-based abstraction (opposite to Pregel but similar to MapReduce)
Peer-to-peer architecture (dissimilar to Pregel and MapReduce)
Asynchronous execution (dissimilar to Pregel and MapReduce)

The GraphLab Analytics Engine
Motivation & Definition
Input, Output & Components
The Architectural Model
The Programming Model
The Computation Model

Input, Graph Flow, and Output
GraphLab assumes input problems modeled as graphs, in which vertices represent computations and edges encode data dependences or communication.
It adopts two phases: the initialization phase and the execution phase.
[Figure: in the initialization phase, a (MapReduce-based) graph builder parses and partitions raw graph data from a distributed file system into an atom collection plus an atom index, and writes them back to the distributed file system. In the execution phase, GraphLab (GL) engine instances on the cluster load the atom files, construct the index, handle monitoring and atom placement, and communicate over TCP RPC.]

Components of the GraphLab Engine: The Data-Graph
The GraphLab engine incorporates three main parts:
The data-graph, which represents the user program state at a cluster machine.
[Figure: a data-graph with data attached to every vertex and edge.]
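One simple way to picture the data-graph is a container that attaches arbitrary user data to every vertex and directed edge; the DataGraph class below is a hypothetical sketch (not GraphLab's C++ types) and also defines the scope used on the next slides.

```python
# Hypothetical sketch of a data-graph: user data on vertices and directed edges.
class DataGraph:
    def __init__(self):
        self.vertex_data = {}       # vertex id -> arbitrary user data
        self.edge_data = {}         # (source id, target id) -> arbitrary user data
        self.out_neighbors = {}     # vertex id -> set of target ids
        self.in_neighbors = {}      # vertex id -> set of source ids

    def add_edge(self, u, v, data=None):
        self.edge_data[(u, v)] = data
        self.out_neighbors.setdefault(u, set()).add(v)
        self.in_neighbors.setdefault(v, set()).add(u)
        self.vertex_data.setdefault(u, None)
        self.vertex_data.setdefault(v, None)

    def scope(self, v):
        """The scope S_v: data on v, on its adjacent edges, and on its adjacent vertices."""
        nbrs = self.out_neighbors.get(v, set()) | self.in_neighbors.get(v, set())
        return {
            "vertex": self.vertex_data[v],
            "edges": {e: d for e, d in self.edge_data.items() if v in e},
            "neighbors": {u: self.vertex_data[u] for u in nbrs},
        }
```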

Components of the GraphLab Engine: The Update Function
The GraphLab engine incorporates three main parts:
The update function, which involves two main sub-functions:
2.1- Altering data within the scope of a vertex
2.2- Scheduling future update functions at neighboring vertices
The scope of a vertex v (i.e., Sv) is the data stored in v and in all of v's adjacent edges and vertices.
[Figure: the GraphLab execution engine repeatedly takes a scheduled vertex v and applies the update function to it.]
[Figure: multiple CPUs pull vertices from a shared scheduler and apply update functions to their scopes, possibly scheduling neighboring vertices in turn.]
The process repeats until the scheduler is empty (see the sketch below).
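Putting these pieces together, a hedged sketch of the engine loop could look as follows; update is any user function over a vertex and the hypothetical DataGraph above, and this is not GraphLab's real execution engine:

```python
from collections import deque

# Simplified sketch of an update-function-driven engine loop (illustrative only).
def run_engine(graph, update, initial_vertices):
    """Repeatedly apply the user-defined update function to scheduled vertices
    until the scheduler is empty."""
    scheduler = deque(initial_vertices)
    scheduled = set(initial_vertices)

    while scheduler:                      # the process repeats until the scheduler is empty
        v = scheduler.popleft()
        scheduled.discard(v)
        # The update may alter data within v's scope and return neighbors to reschedule
        to_reschedule = update(v, graph)
        for u in to_reschedule:
            if u not in scheduled:
                scheduled.add(u)
                scheduler.append(u)
```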

Components of the GraphLab Engine: The Sync Operation
The GraphLab engine incorporates three main parts:
The sync operation, which maintains global statistics describing data stored in the data-graph.
Global values maintained by the sync operation can be written by all update functions across the cluster machines.
The sync operation is similar to Pregel's aggregators.
A mutual exclusion mechanism is applied by the sync operation to avoid write-write conflicts.
For scalability reasons, the sync operation is not enabled by default.
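Loosely, the sync operation can be pictured as a lock-protected fold over the data-graph; the sketch below is an assumption-heavy simplification of that idea (real GraphLab syncs are specified via user-supplied fold/apply functions and run periodically in the background):

```python
import threading

# Simplified sketch of a sync operation maintaining one global statistic (illustrative only).
class SyncOperation:
    def __init__(self, initial, fold):
        self.value = initial
        self.fold = fold                  # e.g., lambda acc, vdata: acc + vdata
        self._lock = threading.Lock()     # mutual exclusion against concurrent writers

    def run(self, graph):
        """Fold over all vertex data in the data-graph, then publish the result."""
        acc = self.value
        for vdata in graph.vertex_data.values():
            acc = self.fold(acc, vdata)
        with self._lock:                  # avoid write-write conflicts on the global value
            self.value = acc

    def read(self):
        with self._lock:
            return self.value
```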

The GraphLab Analytics Engine
Motivation & Definition
Input, Output & Components
The Architectural Model
The Programming Model
The Computation Model

The Architectural Model
GraphLab adopts a peer-to-peer architecture:
All engine instances are symmetric.
Engine instances communicate using the Remote Procedure Call (RPC) protocol over TCP/IP.
The first triggered engine has an additional responsibility: it acts as the monitoring/master engine.
Advantages: highly scalable; precludes centralized bottlenecks and single points of failure.
Main disadvantage: complexity.

The GraphLab Analytics Engine
Motivation & Definition
Input, Output & Components
The Architectural Model
The Programming Model
The Computation Model

The Programming Model
GraphLab offers a shared-based programming model.
It allows scopes to overlap and vertices to read/write from/to their scopes.

Consistency Models in GraphLab
GraphLab guarantees sequential consistency: it provides the same result as a sequential execution of the computational steps.
It offers user-defined consistency models:
Full Consistency
Edge Consistency
Vertex Consistency
[Figure: for a vertex v, the full-consistency scope covers v, its adjacent edges, and its adjacent vertices; the edge-consistency scope shrinks to v and its adjacent edges; the vertex-consistency scope covers only v.]

Consistency Models in GraphLab
[Figure: read/write extents for a chain of five vertices, with data D1-D5 on the vertices and D1↔2 ... D4↔5 on the edges. Under the full consistency model, an update at a vertex may read and write the vertex data, the adjacent edge data, and the adjacent vertex data. Under the edge consistency model, it may read and write the vertex and adjacent edge data but only read adjacent vertex data. Under the vertex consistency model, it may write only its own vertex data.]
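If one assumes (as the locking engine described later does) a read-write lock per vertex, the three models roughly correspond to acquiring different lock sets before an update runs; the helper below is a simplified sketch of that mapping, not GraphLab's actual lock manager:

```python
# Simplified mapping from consistency model to lock sets for one update at vertex v.
def locks_for(model, v, neighbors):
    """Return (write-locked, read-locked) vertex sets for an update at v.
    A real engine would acquire them in a canonical order to avoid deadlock."""
    if model == "full":      # read/write v, its adjacent edges, and its adjacent vertices
        return set(neighbors) | {v}, set()
    if model == "edge":      # write v and its adjacent edges; only read adjacent vertices
        return {v}, set(neighbors)
    if model == "vertex":    # write only v's own data
        return {v}, set()
    raise ValueError("unknown consistency model: " + model)
```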

The GraphLab Analytics Engine
Motivation & Definition
Input, Output & Components
The Architectural Model
The Programming Model
The Computation Model

The Computation Model
GraphLab employs an asynchronous computation model. It offers two asynchronous engines: the chromatic engine and the locking engine.
The chromatic engine executes vertices partially asynchronously:
It applies vertex coloring (i.e., no two adjacent vertices share the same color).
All vertices with the same color are executed before proceeding to a different color.
The locking engine executes vertices fully asynchronously:
Data on vertices and edges are susceptible to corruption.
It applies a permission-based distributed mutual exclusion mechanism to avoid read-write and write-write hazards.
Asynchronous systems avoid waiting at pre-determined points for the completion of a certain task/vertex. This allows MLDM algorithms to converge faster (e.g., linear systems, belief propagation, expectation maximization, and stochastic optimization algorithms converge faster on asynchronous systems).
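A rough sketch of the chromatic engine's idea, using the hypothetical DataGraph from earlier; the greedy coloring is only illustrative, and a real engine would run each color group in parallel:

```python
# Illustrative sketch of a chromatic engine: color the graph, then execute color by color.
def greedy_color(graph):
    """Assign colors so that no two adjacent vertices share a color."""
    color = {}
    for v in graph.vertex_data:
        nbrs = graph.out_neighbors.get(v, set()) | graph.in_neighbors.get(v, set())
        used = {color[u] for u in nbrs if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def chromatic_engine(graph, update, rounds=1):
    """Same-colored vertices are never adjacent, so (under edge consistency) their
    updates cannot conflict; each color group finishes before the next one starts."""
    color = greedy_color(graph)
    for _ in range(rounds):
        for c in sorted(set(color.values())):
            group = [v for v, col in color.items() if col == c]
            for v in group:           # a real engine runs this inner loop in parallel
                update(v, graph)
```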

How Does GraphLab Compare to MapReduce and Pregel?

GraphLab vs. Pregel vs. MapReduce
Programming Model: Shared-Based (Hadoop MapReduce), Message-Passing (Pregel), Shared-Memory (GraphLab)
Computation Model: Synchronous (Hadoop MapReduce and Pregel), Asynchronous (GraphLab)
Parallelism Model: Data-Parallel (Hadoop MapReduce), Graph-Parallel (Pregel and GraphLab)
Architectural Model: Master-Slave (Hadoop MapReduce and Pregel), Peer-to-Peer (GraphLab)
Task/Vertex Scheduling Model: Pull-Based (Hadoop MapReduce), Push-Based (Pregel and GraphLab)
Application Suitability: Loosely-Connected/Embarrassingly Parallel applications (Hadoop MapReduce), Strongly-Connected applications (Pregel), Strongly-Connected applications, more precisely MLDM apps (GraphLab)

Next Week… Caching

PageRank
PageRank is a link analysis algorithm. The rank value indicates the importance of a particular web page.
A hyperlink to a page counts as a vote of support.
A page that is linked to by many pages with high PageRank receives a high rank itself.
PageRank is a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page. A PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank.

PageRank (Cont'd)
Iterate: R[i] = α/N + (1 - α) * Σ_{j links to i} R[j] / L[j]
Where:
α is the random reset probability
N is the total number of pages
L[j] is the number of links on page j
[Figure: an example web graph with six pages, 1 through 6.]
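A quick worked instance of this update, with hypothetical numbers (α = 0.15, N = 6, and a page i whose two in-neighbors have ranks 0.3 and 0.6 and link counts 3 and 2):

```python
# Worked example of one PageRank update for a single page i (hypothetical inputs).
alpha, N = 0.15, 6
in_neighbors = [(0.3, 3), (0.6, 2)]   # (R[j], L[j]) for each page j linking to i

R_i = alpha / N + (1 - alpha) * sum(R_j / L_j for R_j, L_j in in_neighbors)
print(round(R_i, 4))   # 0.15/6 + 0.85 * (0.1 + 0.3) = 0.365
```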

PageRank Example in GraphLab
The PageRank algorithm is defined as a per-vertex operation working on the scope of the vertex:

pagerank(i, scope) {
  // Get neighborhood data from the scope
  (R[i], W_ji, R[j]) ← scope;
  // Update the vertex data
  R[i] ← α/N + (1 - α) * Σ_j W_ji × R[j];
  // Reschedule neighbors if needed
  if R[i] changes then reschedule_neighbors_of(i);
}

This is dynamic computation: neighbors are rescheduled only when R[i] actually changes, so work concentrates on the parts of the graph that have not yet converged.
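A runnable Python sketch of the same per-vertex operation, reusing the hypothetical DataGraph and run_engine sketches from earlier (illustrative only; real GraphLab code is C++ and scope-based):

```python
# Dynamic PageRank as an update function over the hypothetical DataGraph (illustrative only).
ALPHA, TOL = 0.15, 1e-4

def pagerank_update(i, graph):
    """Recompute R[i] from its in-neighbors' ranks; reschedule out-neighbors if R[i] moved."""
    N = len(graph.vertex_data)
    incoming = graph.in_neighbors.get(i, set())
    new_rank = ALPHA / N + (1 - ALPHA) * sum(
        graph.vertex_data[j] / max(1, len(graph.out_neighbors.get(j, set())))
        for j in incoming
    )
    old_rank = graph.vertex_data[i]
    graph.vertex_data[i] = new_rank
    # Dynamic computation: only wake the neighbors if the change was significant
    if abs(new_rank - old_rank) > TOL:
        return graph.out_neighbors.get(i, set())
    return set()
```

With every rank initialized to 1/N, calling run_engine(graph, pagerank_update, list(graph.vertex_data)) would keep refining ranks until no vertex changes by more than TOL.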