Data Structures and Algorithms in Parallel Computing

Data Structures and Algorithms in Parallel Computing Lecture 6

BSP
- Processors + network + synchronization
- Superstep
  - Concurrent parallel computation
  - Message exchanges between processors
  - Barrier synchronization: all processors reaching this point wait for the rest

Supersteps
- A BSP algorithm is a sequence of supersteps
- Computation superstep
  - Many small steps
  - Example: floating point operations (addition, subtraction, etc.)
- Communication superstep
  - Communication operations, each transmitting a data word
  - Example: transfer a real number between 2 processors
- In theory we distinguish between the 2 types of supersteps
- In practice we assume a single superstep
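A minimal sketch of the superstep structure, assuming Python threads stand in for BSP processors and `threading.Barrier` stands in for the barrier synchronization. The function name `bsp_sum` and the choice of a summation task are illustrative assumptions, not part of the lecture.

```python
import threading

def bsp_sum(values, num_procs=4):
    """Sum `values` with a two-superstep BSP-style program (threads as processors)."""
    chunks = [values[p::num_procs] for p in range(num_procs)]
    messages = [None] * num_procs          # one message slot per sending processor
    result = []
    barrier = threading.Barrier(num_procs)

    def processor(pid):
        # Superstep 1: local computation, then communication to processor 0.
        partial = sum(chunks[pid])
        messages[pid] = partial
        barrier.wait()                     # every processor waits here for the rest
        # Superstep 2: processor 0 combines the messages it received.
        if pid == 0:
            result.append(sum(messages))

    threads = [threading.Thread(target=processor, args=(p,)) for p in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Calling `bsp_sum(list(range(101)))` runs both supersteps on four threads. No processor reads `messages` before the barrier, which is the property the barrier is there to guarantee.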

Vertex centric model
- Simple distributed programming model
- Algorithms are expressed by “thinking like a vertex”
- A vertex contains information about itself and its outgoing edges
- Computation is expressed at vertex level
- Vertex executions take place in parallel and are interleaved with synchronized message exchanges
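To make “thinking like a vertex” concrete, here is a small single-machine simulation, assuming a Pregel-like loop in which each vertex keeps the largest value it has seen and forwards it only when it changes. The function name `vertex_centric_max` and the dictionary-based graph layout are assumptions made for this sketch.

```python
def vertex_centric_max(values, edges):
    """values: vertex -> number; edges: vertex -> list of out-neighbors.
    Returns (final vertex values, number of supersteps executed)."""
    values = dict(values)
    # Superstep 0: every vertex sends its value along its outgoing edges.
    inbox = {v: [] for v in values}
    for v in values:
        for n in edges.get(v, []):
            inbox[n].append(values[v])
    supersteps = 1
    # Later supersteps: a vertex acts only if a message raises its value.
    while any(inbox.values()):
        outbox = {v: [] for v in values}
        for v, msgs in inbox.items():
            if msgs and max(msgs) > values[v]:
                values[v] = max(msgs)
                for n in edges.get(v, []):
                    outbox[n].append(values[v])
        inbox = outbox                      # swapping mailboxes plays the role of the barrier
        supersteps += 1
    return values, supersteps
```

On a path a-b-c-d with values 3, 6, 2, 1 (edges in both directions), every vertex ends with 6. The number of supersteps grows with the distance the maximum has to travel.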

Disadvantages
- Costly messaging due to vertex logic
- Porting shared memory algorithms to vertex centric ones may not be trivial
- Programming logic is decoupled from the data layout on disk, which leads to IO penalties

Thinking like a graph
- Subgraph centric abstraction
- Express computation at subgraph instead of vertex level
- Information flows freely inside the subgraph
- Messages are sent only across subgraphs

Subgraph centric model
- Graph is k-partitioned
- A subgraph is a connected component (a weakly connected component for directed graphs)
- Two subgraphs do not share vertices
- A partition can store one or more subgraphs
- Partitions are distributed
- A subgraph is a meta-vertex; remote edges connect subgraphs together
- Each subgraph is an independent unit of computation

Subgraph centric programming
- User logic operates on a subgraph as an independent unit of computation
- Execution follows a BSP model
- Resource allocation
  - Single partition → single machine
  - Single subgraph → single CPU
- Data loading
  - The complete partition is loaded into memory before computation
  - Subgraph tasks keep subgraphs in memory within the task scope
- [Figure: Sub-graph-Task 1, Sub-graph-Task 2, Sub-graph-Task 3]
- Subgraph: a weakly connected component identified within a graph partition. If two subgraphs in the same partition are connected by an edge, they become one subgraph by definition.

Advantages
- Messages exchanged
  - Direct access to the entire subgraph
  - Messages are sent only across partitions
  - Subgraphs are disconnected
  - Pregel has aggregators, but they operate after messages are sent
- Number of supersteps
  - Depending on the algorithm, the required supersteps can be reduced
  - Limited synchronization overhead
  - Reduces the skew in execution due to unbalanced partitions
- Reuse of single-machine algorithms
  - Direct reuse of shared memory graph algorithms

GoFFish
- https://github.com/usc-cloud/goffish
- Subgraph programming abstraction: Gopher
  - Flexibility of reusing well-known shared memory graph abstractions
  - Leverages subgraph concurrency within a multicore machine and distributed scaling using BSP
- Efficient distributed storage: GoFS
  - Write once, read many approach
- Method naming similar to that of Pregel

Find max example
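As an illustration, finding the maximum under the subgraph centric model might look like the sketch below. It assumes each subgraph first computes its local maximum with direct access to all of its vertices and then exchanges messages only over remote edges. The names `subgraph_centric_max`, `subgraph_of` and `remote_edges` are hypothetical and do not come from the GoFFish API.

```python
def subgraph_centric_max(values, subgraph_of, remote_edges):
    """values: vertex -> number; subgraph_of: vertex -> subgraph id;
    remote_edges: list of (source subgraph, destination subgraph) pairs.
    Returns (subgraph id -> max value seen, number of supersteps executed)."""
    # Superstep 0: each subgraph scans its own vertices in memory, no messages needed.
    local_max = {}
    for v, x in values.items():
        s = subgraph_of[v]
        local_max[s] = max(local_max.get(s, x), x)
    inbox = {s: [] for s in local_max}
    for src, dst in remote_edges:
        inbox[dst].append(local_max[src])
    supersteps = 1
    # Later supersteps: only subgraphs whose maximum changed send messages.
    while any(inbox.values()):
        changed = {s for s, msgs in inbox.items() if msgs and max(msgs) > local_max[s]}
        for s in changed:
            local_max[s] = max(inbox[s])
        outbox = {s: [] for s in local_max}
        for src, dst in remote_edges:
            if src in changed:
                outbox[dst].append(local_max[src])
        inbox = outbox
        supersteps += 1
    return local_max, supersteps
```

With four vertices holding 3, 6, 2, 1 split into two subgraphs of two vertices each, the run finishes in 3 supersteps and sends messages only over the two remote edges.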

Subgraph centric SSSP
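One possible shape for subgraph centric single-source shortest paths is sketched below. It assumes that in every superstep each subgraph runs Dijkstra over its local edges, seeded by the distances it received, and then sends improved distances across remote edges. The function name `subgraph_sssp` and the edge-list input are assumptions. A single priority queue is shared by all subgraphs for brevity, which gives the same result because local edges never cross a subgraph boundary.

```python
import heapq
import math

def subgraph_sssp(edges, subgraph_of, source):
    """edges: list of directed (u, v, weight); subgraph_of: vertex -> subgraph id.
    Returns (vertex -> distance from source, number of supersteps executed)."""
    dist = {v: math.inf for v in subgraph_of}
    local = {v: [] for v in subgraph_of}    # edges that stay inside a subgraph
    remote = {v: [] for v in subgraph_of}   # edges that cross subgraphs
    for u, v, w in edges:
        (local if subgraph_of[u] == subgraph_of[v] else remote)[u].append((v, w))
    inbox = {source: 0}                     # vertex -> best candidate distance received
    supersteps = 0
    while inbox:
        # Seed the local phase with messages that improve a known distance.
        heap = []
        for v, d in inbox.items():
            if d < dist[v]:
                dist[v] = d
                heap.append((d, v))
        heapq.heapify(heap)
        # Computation: Dijkstra restricted to local edges.
        updated = set()
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue                    # stale queue entry
            updated.add(u)
            for v, w in local[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        # Communication: send improved distances over remote edges only.
        outbox = {}
        for u in updated:
            for v, w in remote[u]:
                cand = dist[u] + w
                if cand < dist[v] and cand < outbox.get(v, math.inf):
                    outbox[v] = cand
        inbox = outbox                      # barrier between supersteps
        supersteps += 1
    return dist, supersteps
```

The number of supersteps depends on how many subgraph boundaries a shortest path crosses, not on the number of vertices along it.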

PageRank case study
- Same number of supersteps to converge as the vertex centric approach
- Alternative is to use BlockRank
  - Assumes some websites to be highly interconnected, like subgraphs
  - Calculates PageRank for vertices by treating blocks (subgraphs) independently (1 superstep)
  - Ranks each block based on its relative importance (1 superstep)
  - Normalizes the vertex PageRank with the block rank to use as initial value before running the classic PageRank (n supersteps)
- Costlier first superstep but faster convergence in the last n supersteps

Efficiency

Execution time skew
- For the 1st superstep in PageRank

Subgraph centric w/o the abstraction
- Louvain community detection
  - Given a graph, check where there is a “natural division” of vertices
  - Is edge cut realistic enough?
- Modularity based community detection
  - m_c: number of edges inside community c
  - d_c: sum of degrees of vertices inside community c
  - M: total number of edges
  - C: set of all communities in graph
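The quantities listed above are usually combined as Q = Σ over c in C of [ m_c / M − (d_c / (2M))² ]. That this is the exact form intended on the slide is an assumption, since only the symbol definitions are given. The sketch below computes it for an undirected, unweighted edge list. The function name `modularity` and the input layout are illustrative.

```python
def modularity(edges, community_of):
    """edges: list of undirected (u, v) pairs; community_of: vertex -> community id.
    Returns Q = sum over communities c of m_c / M - (d_c / (2 M)) ** 2."""
    M = len(edges)                  # total number of edges
    m = {}                          # m_c: edges with both endpoints inside c
    d = {}                          # d_c: sum of degrees of the vertices in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        d[cu] = d.get(cu, 0) + 1
        d[cv] = d.get(cv, 0) + 1
        if cu == cv:
            m[cu] = m.get(cu, 0) + 1
    # Communities without any incident edge contribute nothing to the sum.
    return sum(m.get(c, 0) / M - (d[c] / (2 * M)) ** 2 for c in d)
```

For two triangles joined by a single edge, splitting the vertices into the two triangles gives Q = 5/14, while putting every vertex into one community gives Q = 0.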

Louvain sequential algorithm
- A greedy modularity maximization approach
- Two main steps repeated iteratively:

    while (improvement) {
        mod = detect-communities()
        if ((mod - prev_mod) > T)
            improvement = true
        else
            improvement = false
        prev_mod = mod
        collapse-graph()
    }

Louvain algorithm (2)
- Scan through all the nodes in a given order
- Each node adopts the community of the neighbor whose joining gives the maximum positive increase in modularity
- This process is repeated iteratively until a local maximum of modularity is reached
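The scan described above can be sketched as follows. For clarity this version recomputes the full modularity for every candidate move, whereas practical Louvain implementations use an incremental gain formula, so it is an unoptimized illustration. The names `modularity` and `local_moving` and the small tolerance used to break ties are assumptions.

```python
def modularity(edges, community_of):
    """Q = sum over communities c of m_c / M - (d_c / (2 M)) ** 2."""
    M, m, d = len(edges), {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        d[cu] = d.get(cu, 0) + 1
        d[cv] = d.get(cv, 0) + 1
        if cu == cv:
            m[cu] = m.get(cu, 0) + 1
    return sum(m.get(c, 0) / M - (d[c] / (2 * M)) ** 2 for c in d)

def local_moving(nodes, edges):
    """Greedy first phase: returns (vertex -> community id, modularity reached)."""
    community_of = {v: v for v in nodes}          # every node starts alone
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    best = modularity(edges, community_of)
    improved = True
    while improved:                                # repeat scans until no node moves
        improved = False
        for v in nodes:                            # scan nodes in the given order
            home = community_of[v]
            best_c = home
            candidates = {community_of[n] for n in neighbors[v]} - {home}
            for c in sorted(candidates):
                community_of[v] = c                # tentatively join the neighbor's community
                q = modularity(edges, community_of)
                if q > best + 1e-12:               # accept only a strictly positive gain
                    best, best_c = q, c
            community_of[v] = best_c
            improved = improved or best_c != home
    return community_of, best
```

On two triangles joined by one edge, the scan settles on the two triangles as communities after a few passes.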

Louvain algorithm (3)
- A new network is built by collapsing communities into single nodes
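A minimal sketch of the collapse step, assuming an unweighted undirected edge list as input and a weighted edge map as output, with edges inside a community becoming a self-loop on the new node. The function name `collapse_graph` is illustrative.

```python
from collections import Counter

def collapse_graph(edges, community_of):
    """edges: list of undirected (u, v) pairs; community_of: vertex -> community id.
    Returns a Counter mapping (c1, c2) with c1 <= c2 to the number of original edges."""
    weights = Counter()
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        # Normalise the pair so that (a, b) and (b, a) count as the same edge.
        weights[(min(cu, cv), max(cu, cv))] += 1
    return weights
```

Later iterations would operate on the weighted graph this returns, so a full implementation would also need to accept edge weights as input.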

Observations
- First iteration is the costliest iteration
  - 79.53% of total time on average
- Graph is reduced to a much smaller graph after the first iteration
- First iteration community structures are smaller
  - Small number of vertices

Analysis
- Community graphs
- 6x improvement over sequential
- No degradation in result quality

Parallel Louvain
- Partition the graph into k subgraphs
- Run the first Louvain iteration (the costliest) in parallel
- Merge and run the sequential Louvain
- [Figure: PMETIS partitioning; Louvain on partitions p0 … pn-1 (iteration 1); iterations 2 to N]
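The first two steps can be sketched as below, under several assumptions: the partition is supplied as a precomputed `part_of` mapping (standing in for PMETIS), edges that cross partitions are ignored during the first iteration, and a thread pool stands in for distributed workers (in CPython this shows the structure rather than a real speedup). All function names are hypothetical, and the local phase recomputes modularity naively.

```python
from concurrent.futures import ThreadPoolExecutor

def modularity(edges, community_of):
    """Q = sum over communities c of m_c / M - (d_c / (2 M)) ** 2."""
    M, m, d = len(edges), {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        d[cu] = d.get(cu, 0) + 1
        d[cv] = d.get(cv, 0) + 1
        if cu == cv:
            m[cu] = m.get(cu, 0) + 1
    return sum(m.get(c, 0) / M - (d[c] / (2 * M)) ** 2 for c in d)

def first_iteration(nodes, edges):
    """Greedy local moving on one partition; returns vertex -> community id."""
    community_of = {v: v for v in nodes}
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    best = modularity(edges, community_of)
    improved = True
    while improved:
        improved = False
        for v in nodes:
            home = community_of[v]
            best_c = home
            for c in sorted({community_of[n] for n in neighbors[v]} - {home}):
                community_of[v] = c
                q = modularity(edges, community_of)
                if q > best + 1e-12:
                    best, best_c = q, c
            community_of[v] = best_c
            improved = improved or best_c != home
    return community_of

def parallel_first_iteration(nodes, edges, part_of, k):
    """Runs the first Louvain iteration per partition and merges the labels."""
    parts = []
    for p in range(k):
        p_nodes = [v for v in nodes if part_of[v] == p]
        # Keep only edges with both endpoints in partition p.
        p_edges = [(u, v) for u, v in edges if part_of[u] == p and part_of[v] == p]
        parts.append((p_nodes, p_edges))
    with ThreadPoolExecutor(max_workers=k) as pool:
        results = list(pool.map(lambda part: first_iteration(*part), parts))
    merged = {}
    for p, communities in enumerate(results):
        for v, c in communities.items():
            merged[v] = (p, c)      # prefix with the partition id to keep labels unique
    return merged
```

The merged labels would then be used to collapse the full graph, including the cross-partition edges that were set aside, before continuing with the sequential algorithm.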

What’s next?
- Load balancing
- Parallel sorting
- Importance of partitioning and graph type
- Parallel computational geometry
- Parallel numerical algorithms
- …