Parallel Graph Algorithms


Parallel Graph Algorithms Sathish Vadhiyar

Graph Traversal
- Graph search plays an important role in analyzing large data sets.
- Relationships between data objects are represented in the form of graphs.
- Breadth-first search is used to find a shortest path or sets of paths.

Level-synchronized algorithm
- Proceeds level by level, starting with the source vertex.
- Level of a vertex: its graph distance from the source.
- Also called the frontier-based algorithm.
- The parallel processes process a level and synchronize at the end of the level before moving to the next level: the Bulk Synchronous Parallelism (BSP) model.
- How to decompose the graph (vertices, edges and adjacency matrix) among processors?
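The level-by-level idea can be sketched serially as follows. This is a minimal illustration, not the deck's own code: the function name `level_synchronous_bfs`, the dictionary-based adjacency list and the example graph are assumptions made for the sketch. In a parallel run, the loop over the frontier would be split among processes, with a barrier where the frontier is swapped.

```python
def level_synchronous_bfs(adj, source):
    """Return {vertex: level} for every vertex reachable from source.

    adj is an adjacency list: {vertex: [neighbors]} (assumed layout).
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        # In a parallel run, this loop is what gets divided among processes.
        for u in frontier:
            for v in adj[u]:
                if v not in level:
                    level[v] = depth + 1
                    next_frontier.append(v)
        # End of the level: a parallel version would synchronize here (BSP style).
        frontier = next_frontier
        depth += 1
    return level


# Small illustrative graph (hypothetical data).
example_adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
levels = level_synchronous_bfs(example_adj, 0)
```

Each pass of the `while` loop handles exactly one level, so the number of synchronization points equals the number of levels reached from the source.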

Distributed BFS with 1D Partitioning
- Each vertex and the edges emanating from it are owned by one processor.
- This is a 1-D partitioning of the adjacency matrix.
- The edges emanating from vertex v form its edge list: the list of vertex indices of the nonzero entries in row v of the adjacency matrix A.

1-D Partitioning
- At each level, each processor owns a set F, the frontier vertices owned by that processor.
- The edge lists of the vertices in F are merged to form a set of neighboring vertices, N.
- Some vertices of N are owned by the same processor, while others are owned by other processors.
- Messages are sent to those other processors to add these vertices to their frontier sets for the next level.
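A single-process simulation may make the 1-D scheme easier to follow. It is a sketch under stated assumptions: the block mapping in `owner`, the function name `bfs_1d` and the `outbox` structure that stands in for message passing are all hypothetical, and a real implementation would use a message-passing library and keep only the locally owned part of `level` on each processor.

```python
def owner(v, n, p):
    """Processor that owns vertex v under a simple block distribution (assumed mapping)."""
    block = (n + p - 1) // p
    return v // block


def bfs_1d(adj, source, p):
    """Simulate level-synchronous BFS on p processors with 1-D partitioning.

    adj is a list of edge lists, indexed by vertex number.
    Returns a list of levels (None for unreachable vertices).
    """
    n = len(adj)
    level = [None] * n  # conceptually, each processor stores only its own entries
    frontiers = [set() for _ in range(p)]  # F on each processor
    frontiers[owner(source, n, p)].add(source)
    level[source] = 0
    depth = 0
    while any(frontiers):
        # Each processor merges the edge lists of its frontier into N,
        # bucketed by the owner of each neighbor.
        outbox = [[set() for _ in range(p)] for _ in range(p)]
        for rank in range(p):
            for u in frontiers[rank]:
                for v in adj[u]:
                    outbox[rank][owner(v, n, p)].add(v)
        # "Message exchange": every owner collects the vertices sent to it and
        # keeps the ones it has not seen as its frontier for the next level.
        depth += 1
        frontiers = [set() for _ in range(p)]
        for dest in range(p):
            for src in range(p):
                for v in outbox[src][dest]:
                    if level[v] is None:
                        level[v] = depth
                        frontiers[dest].add(v)
    return level


levels_1d = bfs_1d([[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]], source=0, p=2)
```

The entries `outbox[rank][dest]` with `dest != rank` correspond to the messages sent between processors; the entries with `dest == rank` are the neighbors that stay local.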

Lvs(v): level of v, i.e., its graph distance from the source vertex vs

BFS on GPUs

BFS on GPUs
- One GPU thread per vertex.
- In each iteration, each vertex looks at its entry in the frontier array.
- If the entry is true, the vertex visits its neighbors and marks the unvisited ones as frontier vertices for the next iteration.
- Severe load imbalance among the threads.
- Scope for improvement.
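The one-thread-per-vertex scheme can be imitated in plain Python to show where the load imbalance comes from. This is only a serial stand-in for a GPU kernel: the function name `vertex_parallel_bfs`, the `work` counter and the star-shaped example graph are assumptions for the sketch, and the inner loop over `tid` plays the role of the threads launched in one iteration.

```python
def vertex_parallel_bfs(adj, source):
    """Simulate BFS with one 'thread' per vertex and a boolean frontier array.

    Returns (cost, work): cost[v] is the BFS level of v (-1 if unreachable),
    work[v] is the number of edges scanned by the thread assigned to v.
    """
    n = len(adj)
    frontier = [False] * n
    cost = [-1] * n
    work = [0] * n
    frontier[source] = True
    cost[source] = 0
    while any(frontier):
        next_frontier = [False] * n
        for tid in range(n):  # one simulated GPU thread per vertex
            if frontier[tid]:  # threads whose entry is false do nothing
                for nbr in adj[tid]:
                    work[tid] += 1
                    if cost[nbr] == -1:
                        cost[nbr] = cost[tid] + 1
                        next_frontier[nbr] = True
        frontier = next_frontier
    return cost, work


# Star graph: vertex 0 is connected to four leaves (hypothetical data).
star = [[1, 2, 3, 4], [0], [0], [0], [0]]
cost, work = vertex_parallel_bfs(star, 0)
```

On the star graph, the thread for vertex 0 scans four edges while every other thread scans one, which is the kind of imbalance that grows with skewed degree distributions.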

Depth First Search (DFS)

Parallel Depth First Search
- Easy to parallelize: the left subtree can be searched in parallel with the right subtree.
- Statically assign a node to a processor; the whole subtree rooted at that node can then be searched independently.
- This can lead to load imbalance, which increases with the number of processors (more later).

Maintaining Search Space
- Each processor searches the space depth-first.
- Unexplored states are saved on a stack; each processor maintains its own local stack.
- Initially, the entire search space is assigned to one processor.
- The stack is then divided and distributed to the processors.
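A rough simulation of local stacks and stack splitting is given below. Everything named here is an assumption for illustration: the binary-tree search space in `children`, the "donate every other entry" rule in `split_stack`, and the policy that an idle processor takes work from the processor with the largest stack. The slides do not prescribe a particular splitting rule.

```python
def children(node, depth_limit=3):
    """Hypothetical search space: a binary tree whose nodes are tuples of 0/1 choices."""
    if len(node) >= depth_limit:
        return []
    return [node + (0,), node + (1,)]


def split_stack(stack):
    """Split a stack of unexplored states: returns (kept, donated).

    Alternate entries are donated; this is one simple rule among many.
    """
    donated = stack[0::2]
    kept = stack[1::2]
    return kept, donated


def parallel_dfs(root, p):
    """Simulate p processors doing DFS with local stacks, in round-robin steps.

    Returns the number of states expanded by each processor.
    """
    stacks = [[] for _ in range(p)]
    stacks[0].append(root)  # the entire search space starts on processor 0
    expanded = [0] * p
    while any(stacks):
        for rank in range(p):
            if not stacks[rank]:
                # Idle processor: ask the processor with the largest stack for work.
                donor = max(range(p), key=lambda r: len(stacks[r]))
                if len(stacks[donor]) > 1:
                    stacks[donor], stacks[rank] = split_stack(stacks[donor])
            if stacks[rank]:
                node = stacks[rank].pop()  # depth-first: take the newest state
                expanded[rank] += 1
                stacks[rank].extend(children(node))
    return expanded


expanded = parallel_dfs((), 4)
```

The total number of expanded states is the same however the stacks are split; only the distribution of work among processors changes.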

Termination Detection
- As the processors work independently, how will they know when to terminate the program?
- Dijkstra's Token Termination Detection Algorithm:
  - Based on passing a token around a logical ring.
  - P0 initiates the token when it becomes idle.
  - A processor holds the token until it has completed its work, and then passes it to the next processor.
  - When P0 receives the token again, all processors have completed.
- However, a processor may get more work after becoming idle due to dynamic load balancing (more later).
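The simple ring scheme can be simulated in discrete time steps. The sketch below assumes that work is never transferred between processors, which is exactly the case the last point warns does not hold under dynamic load balancing. The function name `ring_termination`, the unit-of-work model and the one-hop-per-step timing are hypothetical choices.

```python
def ring_termination(work):
    """Simulate token passing on a logical ring with static work.

    work[i] is the number of time steps of work left on processor i.
    Returns the time step at which the token arrives back at P0.
    """
    work = list(work)
    p = len(work)
    holder = None  # no token exists until P0 becomes idle
    step = 0
    while True:
        if holder is None and work[0] == 0:
            holder = 0  # P0 is idle and creates the token
        if holder is not None and work[holder] == 0:
            # The holder has finished its work: pass the token to the next processor.
            holder = (holder + 1) % p
            if holder == 0:
                return step  # token is back at P0: every processor has completed
        # Every busy processor does one unit of work per time step.
        work = [max(0, w - 1) for w in work]
        step += 1


detected_at = ring_termination([2, 5, 1, 0])
```

Because a processor releases the token only once it is idle, the token cannot complete the ring before the slowest processor finishes, as long as no new work appears afterwards.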

Tree Based Termination Detection
- Uses weights.
- Initially, processor 0 has weight 1.
- When a processor transfers work to another processor, it keeps half of its weight and gives the other half to the receiver.
- When a processor finishes, it returns its weight to the processor it received work from.
- Termination is detected when processor 0 has weight 1 again.
- Goes with the DFS algorithm; no separate communication steps are needed.
- Figure 11.10
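The weight bookkeeping can be sketched with exact fractions so that repeated halving never loses precision. The class name `WeightedTermination` and its methods are hypothetical. As a simplifying assumption, a finished processor sends its weight straight back to processor 0; a tree-based scheme would route it back through the processors that handed out the work.

```python
from fractions import Fraction


class WeightedTermination:
    """Weight-based termination detection, simulated in one process."""

    def __init__(self, num_procs):
        self.weight = [Fraction(0) for _ in range(num_procs)]
        self.weight[0] = Fraction(1)  # initially processor 0 holds all the weight
        self.busy = [False] * num_procs
        self.busy[0] = True

    def transfer_work(self, donor, receiver):
        """Donor gives away part of its work together with half of its weight."""
        half = self.weight[donor] / 2
        self.weight[donor] -= half
        self.weight[receiver] += half
        self.busy[receiver] = True

    def finish(self, proc):
        """Processor proc has run out of work and returns its weight (assumed: to P0)."""
        self.busy[proc] = False
        if proc != 0:
            self.weight[0] += self.weight[proc]
            self.weight[proc] = Fraction(0)

    def terminated(self):
        """Processor 0 detects termination once it is idle and holds weight 1 again."""
        return not self.busy[0] and self.weight[0] == 1


t = WeightedTermination(3)
t.transfer_work(0, 1)  # weights: 1/2, 1/2, 0
t.transfer_work(1, 2)  # weights: 1/2, 1/4, 1/4
t.finish(0)
before_all_done = t.terminated()
t.finish(2)
t.finish(1)
after_all_done = t.terminated()
```

The weights always sum to 1, so processor 0 can hold weight 1 only when no other processor holds any, which means no other processor still has work.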

Graph Partitioning

Graph Partitioning
- For many parallel graph algorithms, the graph has to be divided into multiple partitions, and each processor takes care of one partition.
- Criteria:
  - The partitions must be balanced (uniform computations).
  - The edge cuts between partitions must be minimal (minimizing communications).
- Some methods:
  - BFS: build a BFS tree and descend down the tree until the cumulative number of nodes equals the desired partition size.
  - Mostly: multi-level partitioning based on coarsening and refinement (a bit advanced).
  - Another popular method: Kernighan-Lin.
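The BFS-based method and the edge-cut criterion can be illustrated with a short sketch. The function names `bfs_partition` and `edge_cut`, the choice of start vertex and the path-graph example are assumptions; multi-level partitioning and Kernighan-Lin are not shown.

```python
from collections import deque


def bfs_partition(adj, start, target_size):
    """Grow one partition by BFS from start until it holds target_size vertices."""
    part = []
    seen = {start}
    queue = deque([start])
    while queue and len(part) < target_size:
        u = queue.popleft()
        part.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return set(part)


def edge_cut(adj, part):
    """Number of edges with exactly one endpoint inside part."""
    return sum(1 for u in part for v in adj[u] if v not in part)


# Path graph 0-1-2-3-4-5 (hypothetical data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
first_part = bfs_partition(path, start=0, target_size=3)
cut = edge_cut(path, first_part)
```

Balance is controlled by `target_size`, while the edge cut depends on where the BFS happens to stop, which is why refinement-based methods are usually preferred.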

Parallel partitioning
- Can use a divide and conquer strategy.
- A master node creates two partitions.
- It keeps one for itself and gives the other partition to another processor.
- The two processors partition further, and so on.
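The divide and conquer pattern can be sketched as a recursion over groups of processors. The helper `parallel_partition` is hypothetical, and the split used here simply cuts a vertex list in proportion to the group sizes without looking at edges; it is a placeholder for a real bisection method. In an actual parallel run the two recursive calls would execute at the same time on different processors.

```python
def parallel_partition(vertices, procs):
    """Assign vertices to processors by recursive bisection.

    procs[0] acts as the master of its group: it splits the vertices,
    keeps one part for its own half of the group and hands the other part
    to the first processor of the other half.
    Returns {processor: list of vertices}.
    """
    if len(procs) == 1:
        return {procs[0]: vertices}
    half = (len(procs) + 1) // 2
    left_procs, right_procs = procs[:half], procs[half:]
    # Placeholder bisection: cut the list in proportion to the group sizes.
    cut = len(vertices) * len(left_procs) // len(procs)
    assignment = parallel_partition(vertices[:cut], left_procs)
    assignment.update(parallel_partition(vertices[cut:], right_procs))
    return assignment


assignment = parallel_partition(list(range(8)), [0, 1, 2, 3])
```

With p processors the recursion has about log2(p) levels, and the number of processors busy partitioning doubles at each level.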

Graph Coloring

Graph Coloring Problem
- Given G(A) = (V, E).
- σ: V → {1, 2, …, s} is an s-coloring of G if σ(i) ≠ σ(j) for every edge (i, j) in E.
- The minimum possible value of s is the chromatic number of G.
- The graph coloring problem is to color the nodes using the chromatic number of colors.
- NP-hard problem (its decision version is NP-complete).
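The definition of an s-coloring translates directly into a check, sketched below together with a sequential greedy heuristic. The function names and the 4-cycle example are assumptions, and the greedy heuristic is a swapped-in illustration rather than a method from the slides: it always produces a valid coloring but is not guaranteed to use the chromatic number of colors.

```python
def is_s_coloring(edges, sigma, s):
    """Check that sigma maps vertices to {1, ..., s} and no edge joins equal colors."""
    in_range = all(1 <= color <= s for color in sigma.values())
    proper = all(sigma[i] != sigma[j] for i, j in edges)
    return in_range and proper


def greedy_coloring(vertices, edges):
    """Give each vertex, in the given order, the smallest color unused by its neighbors."""
    adj = {v: set() for v in vertices}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    sigma = {}
    for v in vertices:
        used = {sigma[w] for w in adj[v] if w in sigma}
        color = 1
        while color in used:
            color += 1
        sigma[v] = color
    return sigma


# 4-cycle a-b-c-d-a (hypothetical data).
cycle_vertices = ["a", "b", "c", "d"]
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
sigma = greedy_coloring(cycle_vertices, cycle_edges)
```

Checking a proposed coloring is cheap; the difficulty of the problem lies in finding one that uses the minimum number of colors.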

Parallel Graph Coloring – General algorithm

Parallel Graph Coloring – Finding Maximal Independent Sets – Luby (1986)

  I = ∅
  V’ = V
  G’ = G
  while G’ ≠ empty
      choose an independent set I’ in G’
      I = I ∪ I’
      X = I’ ∪ N(I’)        (N(I’): vertices adjacent to I’)
      V’ = V’ \ X
      G’ = G(V’)
  end

- For choosing the independent set I’ (Monte Carlo heuristic):
  - For each vertex v in V’, determine a distinct random number p(v).
  - v is in I’ iff p(v) > p(w) for every w in adj(v).
- Color each MIS a different color.
- Disadvantage: each new choice of random numbers requires a global synchronization of the processors.
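A serial sketch of the maximal-independent-set approach is given below. The function names, the seeded random generator and the 5-cycle example are assumptions; the per-vertex comparison inside each round is what would run in parallel, with a synchronization after every fresh draw of random numbers. Floating-point draws stand in for the distinct random numbers; if two neighbors ever tie, neither is chosen in that round.

```python
import random


def luby_independent_set(vertices, adj, rng):
    """Build a maximal independent set of the subgraph induced by vertices.

    adj maps each vertex to the set of its neighbors.
    """
    remaining = set(vertices)  # V'
    mis = set()  # I
    while remaining:
        p = {v: rng.random() for v in remaining}
        # v joins I' iff its number beats the numbers of all remaining neighbors.
        chosen = {
            v for v in remaining
            if all(p[v] > p[w] for w in adj[v] if w in remaining)
        }
        mis |= chosen
        removed = set(chosen)  # X = I' together with N(I')
        for v in chosen:
            removed |= adj[v] & remaining
        remaining -= removed
    return mis


def mis_coloring(adj, seed=0):
    """Color a graph by repeatedly extracting an MIS from the uncolored vertices."""
    rng = random.Random(seed)
    uncolored = set(adj)
    color = {}
    current = 0
    while uncolored:
        current += 1
        for v in luby_independent_set(uncolored, adj, rng):
            color[v] = current
        uncolored -= set(v for v in color if color[v] == current)
    return color


# 5-cycle (hypothetical data).
cycle5 = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
coloring = mis_coloring(cycle5)
```

Every color class is an independent set by construction, so the result is always a valid coloring; the number of colors used depends on the random draws.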

Parallel Graph Coloring – Gebremedhin and Manne (2000) Pseudo-Coloring

Sources/References
- Paper: A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L. Yoo et al. SC 2005.
- Paper: Accelerating large graph algorithms on the GPU using CUDA. Harish and Narayanan. HiPC 2007.
- M. Luby. A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing 15(4):1036-1054 (1986).
- A.H. Gebremedhin, F. Manne. Scalable parallel graph coloring algorithms. Concurrency: Practice and Experience 12 (2000) 1131-1146.