Data Structures and Algorithms in Parallel Computing

Data Structures and Algorithms in Parallel Computing Lecture 5

BSP (Bulk Synchronous Parallel)
- Processors + network + synchronization
- Superstep
  - Concurrent parallel computation
  - Message exchanges between processors
  - Barrier synchronization: all processors reaching this point wait for the rest

Supersteps
- A BSP algorithm is a sequence of supersteps
- Computation superstep
  - Many small steps
  - Example: floating point operations (addition, subtraction, etc.)
- Communication superstep
  - Communication operations, each transmitting a data word
  - Example: transfer a real number between 2 processors
- In theory we distinguish between the 2 types of supersteps
- In practice we assume a single superstep
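
The superstep structure can be illustrated with a small sequential simulation. This is only a sketch: the function name `bsp_sum` and the choice of summing numbers are illustrative assumptions, and the "processors" are simulated by a loop rather than run in parallel. Each simulated processor computes a partial sum, sends one data word to processor 0, and the barrier is modelled by making the messages visible only after every processor has finished.

```python
def bsp_sum(chunks):
    """Sum all numbers with two BSP supersteps over len(chunks) simulated processors."""
    p = len(chunks)

    # Superstep 1: local computation followed by communication.
    outbox = [[] for _ in range(p)]
    for pid in range(p):                 # conceptually, these iterations run in parallel
        partial = sum(chunks[pid])       # computation: many small arithmetic steps
        outbox[0].append(partial)        # communication: one data word sent to processor 0

    # Barrier: messages become visible only after every processor is done.
    inbox = outbox

    # Superstep 2: processor 0 combines the partial sums it received.
    return sum(inbox[0])
```

For example, `bsp_sum([[1, 2], [3], [4, 5, 6]])` splits the work across three simulated processors.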

Some applications
- PageRank
- Single Source Shortest Path (SSSP)
- Connected Components

PageRank
- Analysis algorithm to determine the importance of a document
- Based on the number of references to it and the importance of the source documents
- Named after Larry Page

PageRank

PageRank
Source: Wikipedia

Solving PageRank
- System of linear equations
- Iterative loop till convergence
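
The iterative loop can be sketched as follows. The damping factor of 0.85, the convergence tolerance, and the even redistribution of rank from vertices without out-links are assumptions of this sketch (they are common choices, not values taken from the slides). The name `pagerank` and the dictionary-based graph representation are likewise illustrative.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Iterate the PageRank update until the ranks stop changing.

    out_links maps every vertex to the list of vertices it links to.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: rank held by vertices without out-links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:                  # converged
            break
    return rank
```

The ranks form a probability distribution, so they should sum to 1 up to floating point error.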

PageRank in Pregel
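
A vertex-centric formulation can be simulated in plain Python as below. This is a sketch, not Pregel's or Giraph's API: the function name `pregel_pagerank`, the fixed number of supersteps, and the damping factor of 0.85 are assumptions, and the sketch assumes every vertex has at least one out-link. In each superstep a vertex sums its incoming messages, updates its value, and sends its value divided by its out-degree to its neighbors.

```python
def pregel_pagerank(out_links, supersteps=30, damping=0.85):
    """Simulate vertex-centric PageRank; every vertex needs at least one out-link."""
    n = len(out_links)
    value = {v: 1.0 / n for v in out_links}
    inbox = {v: [] for v in out_links}
    for step in range(supersteps + 1):
        outbox = {v: [] for v in out_links}
        for v in out_links:              # conceptually, one compute() call per vertex
            if step >= 1:
                value[v] = (1 - damping) / n + damping * sum(inbox[v])
            if step < supersteps:
                share = value[v] / len(out_links[v])
                for w in out_links[v]:
                    outbox[w].append(share)
            # otherwise the vertex sends nothing, i.e. it votes to halt
        inbox = outbox                   # barrier: messages are delivered next superstep
    return value
```

Messages sent in one superstep are only read in the next one, which is what the swap of `inbox` and `outbox` models.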

Experimental results
- On Apache Giraph
- Taken from http://muratbuffalo.blogspot.ro/2015/09/one-trillion-edges-graph-processing-at.html

SSSP
- Find the shortest path between a single source vertex and every other vertex in the graph
- Dijkstra’s algorithm for sequential computations

Sequential SSSP
Source: Wikipedia
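
A minimal sketch of Dijkstra's algorithm using a binary heap is given below. The graph representation (a dictionary mapping each vertex to a list of `(neighbor, weight)` pairs) and the name `dijkstra` are assumptions of this sketch; edge weights must be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                     # stale heap entry, a shorter path was found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Vertices that cannot be reached from the source do not appear in the returned dictionary.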

SSSP in Pregel
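
A vertex-centric SSSP can be simulated as follows. This is a sketch rather than Pregel's actual API: the name `pregel_sssp` and the bootstrap message of distance 0 delivered to the source are assumptions. Every vertex must appear as a key of `graph`, and edge weights are assumed non-negative. A vertex that receives a shorter distance updates its value and propagates it; the computation ends when no messages are in flight.

```python
import math


def pregel_sssp(graph, source):
    """graph maps each vertex to a list of (neighbor, weight) pairs."""
    dist = {v: math.inf for v in graph}
    inbox = {v: [] for v in graph}
    inbox[source].append(0)              # hypothetical bootstrap message for the source
    while any(inbox.values()):           # stop when no vertex has pending messages
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            if not messages:
                continue                 # vertex stays halted in this superstep
            candidate = min(messages)
            if candidate < dist[v]:
                dist[v] = candidate
                for w, weight in graph[v]:
                    outbox[w].append(candidate + weight)
        inbox = outbox                   # barrier between supersteps
    return dist
```

Unreachable vertices keep the distance `math.inf`.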

Experimental results
- Binary trees

Connected components (recap)
- Label 2 vertices with the same label iff there is a path between the two
- Sequentially, this can be achieved by depth-first or breadth-first search
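
The sequential approach can be sketched with breadth-first search. The name `connected_components` and the choice of using the first visited vertex of each component as its label are assumptions of this sketch; `adj` is assumed to describe an undirected graph with every vertex present as a key.

```python
from collections import deque


def connected_components(adj):
    """Label every vertex with the first-visited vertex of its component."""
    label = {}
    for start in adj:
        if start in label:
            continue                     # already reached from an earlier start
        label[start] = start
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in label:
                    label[v] = start
                    queue.append(v)
    return label
```

Two vertices end up with the same label exactly when one is reachable from the other.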

CC in Pregel
- Use graph contraction
- Algorithm
  - Each vertex starts with a label
  - Each vertex sends its label to all neighbors
  - Each vertex replaces its label with the minimum (maximum) of its own label and the values it receives from its neighbors
  - The algorithm stops when convergence is achieved
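
The label propagation steps above can be simulated as below, using the minimum variant. This is a sketch, not Pregel's API: the name `pregel_cc` and the use of the vertex ID as the initial label are assumptions, and `adj` is assumed to be a symmetric adjacency list of an undirected graph. A vertex only sends messages again when its label has changed, so convergence corresponds to a superstep in which no messages are produced.

```python
def pregel_cc(adj):
    """Propagate minimum labels until no label changes."""
    label = {v: v for v in adj}          # assumption: initial label is the vertex ID
    # Superstep 0: every vertex sends its label to all of its neighbors.
    inbox = {v: [label[u] for u in adj[v]] for v in adj}
    while any(inbox.values()):
        outbox = {v: [] for v in adj}
        for v, messages in inbox.items():
            if messages and min(messages) < label[v]:
                label[v] = min(messages)
                for u in adj[v]:
                    outbox[u].append(label[v])
        inbox = outbox                   # barrier between supersteps
    return label
```

The number of supersteps grows with the diameter of the largest component, since a label travels one edge per superstep.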

Experimental results

Apache Giraph
- Pregel is proprietary
- Giraph is an open source Pregel implementation
- Runs on standard Hadoop
- Computation is executed in memory
- Can be a job in a pipeline (MapReduce)
- Uses ZooKeeper for synchronization

Building an application
- Create a custom vertex by extending BasicVertex
- Create a custom input format
  - Adjacency list where each line looks like: vertexID neighborID1 neighborID2 …
  - Extend the TextVertexInputFormat
- Create a custom output format
  - Extend the TextVertexOutputFormat
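
To make the adjacency list line format concrete, the sketch below parses such lines in plain Python. It is not Giraph code and does not use TextVertexInputFormat; the function name `parse_adjacency_list` and the whitespace-separated layout are assumptions used only to illustrate the input format.

```python
def parse_adjacency_list(lines):
    """Parse lines of the form 'vertexID neighborID1 neighborID2 ...'."""
    adj = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue                     # skip blank lines
        vertex, neighbors = fields[0], fields[1:]
        adj[vertex] = neighbors
    return adj
```

IDs are kept as strings here; a real input format would convert them to the vertex ID type of the application.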

What’s next?
- Vertex centric vs. subgraph centric
- Load balancing
- ...
- Importance of partitioning and graph type
- ...