Published by Dwayne Cain. Modified over 6 years ago.
1
Data Structures and Algorithms in Parallel Computing
Lecture 5
2
BSP
Processors + network + synchronization
Superstep
  Concurrent parallel computation
  Message exchanges between processors
  Barrier synchronization
    All processors reaching this point wait for the rest
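The three phases of a superstep can be illustrated with a minimal sketch that uses Python threads as stand-ins for BSP processors. The names `NUM_PROCS`, `mailboxes` and `worker` are hypothetical, and the ring-shaped message pattern is only an assumption chosen to keep the example small.

```python
import threading

NUM_PROCS = 4                                  # hypothetical number of processors
barrier = threading.Barrier(NUM_PROCS)         # barrier synchronization
mailboxes = [[] for _ in range(NUM_PROCS)]     # one incoming message list per processor
lock = threading.Lock()                        # protects the shared mailboxes
results = [0] * NUM_PROCS

def worker(pid):
    # Phase 1: concurrent local computation.
    local = pid + 1
    # Phase 2: message exchange (send the local value to the right neighbour).
    dest = (pid + 1) % NUM_PROCS
    with lock:
        mailboxes[dest].append(local)
    # Phase 3: every processor waits here until all the others have arrived.
    barrier.wait()
    # Next superstep: messages sent before the barrier are now safe to read.
    results[pid] = local + sum(mailboxes[pid])

threads = [threading.Thread(target=worker, args=(p,)) for p in range(NUM_PROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Without the barrier, a processor could read its mailbox before its neighbour has written to it. The barrier is what makes the result independent of thread scheduling.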
3
Supersteps
A BSP algorithm is a sequence of supersteps
Computation superstep
  Many small steps
  Example: floating point operations (addition, subtraction, etc.)
Communication superstep
  Communication operations, each transmitting a data word
  Example: transfer a real number between 2 processors
In theory we distinguish between the 2 types of supersteps
In practice we assume a single superstep that combines computation and communication
4
Some applications
PageRank
Single Source Shortest Path (SSSP)
Connected Components
5
PageRank
Analysis algorithm that determines the importance of a document
Based on the number of references to it and the importance of the source documents
Named after Larry Page
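The dependence on both the number of references and the importance of the sources is commonly written as the recurrence below. This is the standard textbook formulation, not one recovered from the slides. The symbols are assumptions introduced here: $d$ is the damping factor (often set to 0.85), $N$ the number of documents, $M(p_i)$ the set of documents linking to $p_i$, and $L(p_j)$ the number of outgoing links of $p_j$.

```latex
PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}
```

Each source $p_j$ splits its own rank evenly among its outgoing links, so a reference from an important document with few links counts for more than one from an obscure document with many.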
6
PageRank
7
PageRank
Source: Wikipedia
8
Solving PageRank
System of linear equations
Iterative loop until convergence
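A minimal sketch of such an iterative loop, assuming the common damping-factor formulation of PageRank. The function name `pagerank`, the parameters `damping`, `tol` and `max_iter`, and the three-vertex example graph are illustrative assumptions, not taken from the slides.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Iterate the PageRank update until the ranks stop changing."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}          # start from a uniform distribution
    for _ in range(max_iter):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A vertex without out-links spreads its rank over all vertices.
            targets = out_links[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        # Convergence test: total absolute change between two iterations.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical example graph: vertex -> list of vertices it links to.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
print(ranks)
```

The loop is a fixed-point iteration on the linear system. It avoids building or inverting the full matrix, which matters for graphs with very many vertices.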
9
PageRank in Pregel
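The vertex-centric formulation can be sketched as a sequential simulation of supersteps. This is not Pregel's actual API (Pregel exposes a C++ vertex class). The function `pregel_pagerank`, the fixed count of 30 supersteps, and the example graph are assumptions for illustration, and the sketch assumes every vertex has at least one outgoing edge.

```python
def pregel_pagerank(out_links, num_supersteps=30, damping=0.85):
    """Simulate vertex-centric PageRank: one loop iteration is one superstep."""
    n = len(out_links)
    value = {v: 1.0 / n for v in out_links}
    inbox = {v: [] for v in out_links}
    for superstep in range(num_supersteps + 1):
        outbox = {v: [] for v in out_links}
        for v in out_links:                      # conceptually runs in parallel
            if superstep >= 1:
                # Combine the contributions received from in-neighbours.
                value[v] = (1.0 - damping) / n + damping * sum(inbox[v])
            if superstep < num_supersteps:
                # Send an equal share of the current value along each out-edge.
                for t in out_links[v]:
                    outbox[t].append(value[v] / len(out_links[v]))
            # Otherwise the vertex sends nothing, which models voting to halt.
        inbox = outbox                           # barrier: deliver all messages
    return value

# Hypothetical example graph: vertex -> list of out-neighbours.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
values = pregel_pagerank(graph)
print(values)
```

Each vertex only touches its own value and its incoming messages, so the inner loop over vertices could be distributed across workers without changing the result.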
10
Experimental results
On Apache Giraph
Taken from
11
SSSP
Find the shortest path between a single source vertex and every other vertex in the graph
Dijkstra’s algorithm for sequential computations
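For reference, a compact sequential version of Dijkstra's algorithm using a binary heap. The graph representation (a dict mapping each vertex to a list of `(neighbor, weight)` pairs) and the example graph are assumptions, and edge weights are assumed to be non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]                 # (tentative distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                     # stale heap entry, already improved
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd             # relax edge (u, v)
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical weighted, directed example graph.
graph = {
    "s": [("a", 4), ("b", 1)],
    "b": [("a", 2), ("c", 5)],
    "a": [("c", 1)],
    "c": [],
}
distances = dijkstra(graph, "s")
print(distances)
```

The algorithm settles vertices one at a time in order of distance, which is exactly what makes it hard to parallelize directly.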
12
Sequential SSSP
Source: Wikipedia
13
SSSP in Pregel
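A sequential simulation of a vertex-centric SSSP, in the style usually described for Pregel. Each vertex keeps the smallest distance seen so far and forwards improved distances along its out-edges. The function `pregel_sssp` and the example graph are illustrative assumptions, not Pregel's or Giraph's API.

```python
import math

def pregel_sssp(graph, source):
    """Simulate vertex-centric SSSP: one loop iteration is one superstep."""
    dist = {v: math.inf for v in graph}
    inbox = {v: [] for v in graph}
    while True:
        outbox = {v: [] for v in graph}
        sent = False
        for v in graph:                          # conceptually runs in parallel
            # Best candidate distance: 0 for the source, else the best message.
            candidate = 0 if v == source else math.inf
            candidate = min([candidate] + inbox[v])
            if candidate < dist[v]:
                dist[v] = candidate
                # Propagate the improvement to all out-neighbours.
                for t, w in graph[v]:
                    outbox[t].append(candidate + w)
                    sent = True
            # A vertex with no improvement sends nothing (it votes to halt).
        if not sent:
            return dist                          # no messages in flight: done
        inbox = outbox                           # barrier: deliver all messages

# Hypothetical weighted, directed example graph: vertex -> [(neighbor, weight)].
graph = {
    "s": [("a", 4), ("b", 1)],
    "b": [("a", 2), ("c", 5)],
    "a": [("c", 1)],
    "c": [],
}
distances = pregel_sssp(graph, "s")
print(distances)
```

Unlike Dijkstra's algorithm, a vertex may be updated several times (here `a` first gets distance 4, then 3). The extra work is the price paid for letting all vertices proceed in parallel.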
14
Experimental results
Binary trees
15
Connected components (recap)
Give 2 vertices the same label if and only if there is a path between them
Sequentially this can be achieved by depth first or breadth first search
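A minimal sequential sketch using breadth first search. The adjacency-dict representation and the example graph are assumptions, and the graph is assumed to be undirected (each edge listed in both directions).

```python
from collections import deque

def connected_components(adj):
    """Label every vertex with the first-visited vertex of its component."""
    label = {}
    for start in adj:
        if start in label:
            continue                     # already reached from an earlier start
        label[start] = start
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in label:
                    label[v] = start     # same component as start
                    queue.append(v)
    return label

# Hypothetical undirected example graph with three components.
adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
labels = connected_components(adj)
print(labels)
```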
16
CC in Pregel
Use graph contraction
Algorithm
  Each vertex starts with a label
  Each vertex sends its label to all neighbors
  Each vertex replaces its label with the minimum (or maximum) of its own label and the values it receives from neighbors
  Algorithm stops when convergence is achieved, i.e. when no label changes in a superstep
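The steps above can be sketched as a sequential simulation of supersteps, using the minimum-label variant. The function `pregel_cc` and the example graph are illustrative assumptions. Vertex IDs serve as initial labels, and the graph is assumed to be undirected.

```python
def pregel_cc(adj):
    """Simulate min-label propagation: one loop iteration is one superstep."""
    label = {v: v for v in adj}          # each vertex starts with its own ID
    changed = set(adj)                   # in the first superstep everyone sends
    while changed:
        inbox = {v: [] for v in adj}
        for v in changed:
            # Only vertices whose label changed need to notify their neighbours.
            for t in adj[v]:
                inbox[t].append(label[v])
        # Barrier: all messages are delivered before labels are updated.
        changed = set()
        for v in adj:
            if inbox[v]:
                smallest = min(inbox[v])
                if smallest < label[v]:
                    label[v] = smallest
                    changed.add(v)
    return label                         # converged: no label changed

# Hypothetical undirected example graph with three components.
adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
labels = pregel_cc(adj)
print(labels)
```

The number of supersteps grows with the diameter of the largest component, since the smallest label advances one hop per superstep.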
17
Experimental results
18
Apache Giraph
Pregel is proprietary
Giraph is an open source Pregel implementation
  Runs on standard Hadoop
  Computation is executed in memory
  Can be a job in a pipeline (MapReduce)
  Uses ZooKeeper for synchronization
19
Building an application
Create a custom vertex by extending BasicVertex
Create a custom input format
  Adjacency list where each line looks like: vertexID neighborID1 neighborID2 …
  Extend the TextVertexInputFormat
Create a custom output format
  Extend the TextVertexOutputFormat
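To make the adjacency-list input format concrete, here is a small Python sketch that parses lines of the form `vertexID neighborID1 neighborID2 …`. It only illustrates the file layout and is not Giraph code (Giraph input formats are written in Java). The function names are hypothetical, and the sketch assumes whitespace-separated integer IDs.

```python
def parse_adjacency_line(line):
    """Parse 'vertexID neighborID1 neighborID2 ...' into (vertex, neighbors)."""
    fields = line.split()
    if not fields:
        return None                      # skip blank lines
    vertex_id, *neighbors = (int(f) for f in fields)
    return vertex_id, neighbors

def load_graph(lines):
    """Build a vertex -> neighbor-list dict from adjacency-list lines."""
    graph = {}
    for line in lines:
        parsed = parse_adjacency_line(line)
        if parsed is not None:
            vertex_id, neighbors = parsed
            graph[vertex_id] = neighbors
    return graph

# Hypothetical input: one vertex per line, followed by its neighbours.
graph = load_graph(["1 2 3", "2 1", "", "3 1"])
print(graph)
```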
20
What’s next?
Vertex centric vs. subgraph centric
  Load balancing
  ...
Importance of partitioning and graph type
...