Data Structures and Algorithms in Parallel Computing Lecture 5
BSP (Bulk Synchronous Parallel): processors + network + synchronization. A superstep consists of concurrent parallel computation, message exchanges between processors, and barrier synchronization (all processors reaching this point wait for the rest).
Supersteps: a BSP algorithm is a sequence of supersteps. A computation superstep consists of many small steps, for example floating point operations (addition, subtraction, etc.). A communication superstep consists of communication operations, each transmitting a data word, for example the transfer of a real number between 2 processors. In theory we distinguish between the 2 types of supersteps; in practice we assume a single superstep containing both computation and communication.
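The superstep structure can be pictured with a small sequential simulation. This is only a sketch under stated assumptions: the names run_bsp, compute and sum_to_root are hypothetical, processors are simulated by a loop, and the barrier is modelled by making messages visible only in the next superstep.

```python
def run_bsp(num_procs, compute, max_supersteps=100):
    """Simulate BSP: compute(pid, step, state, inbox) -> (new_state, [(dest, word), ...])."""
    inboxes = [[] for _ in range(num_procs)]
    states = [None] * num_procs
    for step in range(max_supersteps):
        outboxes = [[] for _ in range(num_procs)]
        any_message = False
        # Computation phase: each processor sees only its own state and inbox.
        for pid in range(num_procs):
            states[pid], outgoing = compute(pid, step, states[pid], inboxes[pid])
            # Communication phase: each message carries one data word.
            for dest, word in outgoing:
                outboxes[dest].append(word)
                any_message = True
        # Barrier: messages sent in this superstep are delivered in the next one.
        inboxes = outboxes
        if not any_message:
            break
    return states


def sum_to_root(pid, step, state, inbox):
    """Hypothetical example: processor pid holds pid + 1; processor 0 collects the sum."""
    if step == 0:
        return pid + 1, [(0, pid + 1)]
    if pid == 0 and step == 1:
        return sum(inbox), []
    return state, []


result = run_bsp(4, sum_to_root)
print(result[0])
```

Here superstep 0 is mostly communication (every processor sends one word to processor 0) and superstep 1 is pure computation on processor 0, which mirrors the theoretical distinction between the two kinds of supersteps.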
Some applications: PageRank, Single Source Shortest Path (SSSP), Connected Components.
PageRank: an analysis algorithm to determine the importance of a document, based on the number of references to it and the importance of the source documents. Named after Larry Page.
PageRank
PageRank (source: Wikipedia)
Solving PageRank: a system of linear equations, solved by an iterative loop until convergence.
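The iterative loop can be sketched as follows. This is a minimal sketch, not the method of any particular system: the damping factor 0.85, the tolerance, the even redistribution of rank from vertices without outgoing links, and the function name pagerank are all assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Iterate the PageRank update until the ranks stop changing.

    out_links maps each vertex to the list of vertices it links to.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by vertices with no outgoing links is spread evenly (assumption).
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        # Convergence test: total absolute change over all vertices.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph C receives links from both A and B, so it ends up ranked above B, which is referenced only by A.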
PageRank in Pregel
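The code shown on this slide is not reproduced in the text. As a stand-in, here is a hedged sketch of a vertex-centric PageRank in the Pregel style, simulated sequentially in Python. It is not Pregel's actual API: the function name pregel_pagerank, the damping factor 0.85, the fixed count of 30 supersteps, and the requirement that every vertex has at least one outgoing edge are assumptions.

```python
def pregel_pagerank(out_links, supersteps=30, damping=0.85):
    """Vertex-centric PageRank: each vertex only reads its inbox and writes messages.

    Assumes every vertex has at least one outgoing edge.
    """
    n = len(out_links)
    value = {v: 1.0 / n for v in out_links}
    inbox = {v: [] for v in out_links}
    for step in range(supersteps + 1):
        outbox = {v: [] for v in out_links}
        # Conceptually, the body of this loop runs on all vertices in parallel.
        for v, neighbors in out_links.items():
            if step >= 1:
                # Combine the contributions received from in-neighbors.
                value[v] = (1.0 - damping) / n + damping * sum(inbox[v])
            if step < supersteps:
                # Send an equal share of the current value along each out-edge.
                for w in neighbors:
                    outbox[w].append(value[v] / len(neighbors))
            # Otherwise the vertex sends nothing, which corresponds to voting to halt.
        # Barrier: messages become visible in the next superstep.
        inbox = outbox
    return value


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
values = pregel_pagerank(graph)
print(max(values, key=values.get))
```

The difference from a global iterative loop is that no vertex ever reads another vertex's value directly; all information travels through messages delivered at the barrier.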
Experimental results on Apache Giraph, taken from http://muratbuffalo.blogspot.ro/2015/09/one-trillion-edges-graph-processing-at.html
SSSP: find the shortest path between a single source vertex and every other vertex in the graph. Dijkstra’s algorithm is used for sequential computations.
Sequential SSSP (source: Wikipedia)
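For reference, a compact sketch of the sequential approach using Dijkstra's algorithm with a binary heap. The graph representation (a dictionary mapping each vertex to a list of (neighbor, weight) pairs) and the example graph are assumptions; edge weights must be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every vertex.

    graph maps each vertex to a list of (neighbor, weight) pairs, weights >= 0.
    """
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry: a shorter path to u was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


graph = {
    "s": [("a", 4), ("b", 1)],
    "a": [("t", 5)],
    "b": [("a", 2)],
    "t": [],
}
print(dijkstra(graph, "s"))
```

The algorithm processes vertices one at a time in order of distance, which is what makes it inherently sequential.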
SSSP in Pregel
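The code shown on this slide is not reproduced in the text. A hedged sketch of how SSSP is typically expressed in a vertex-centric model, simulated sequentially in Python: a vertex that receives a smaller candidate distance updates its value and sends new candidates to its neighbors; a vertex with no incoming messages stays inactive. The function name pregel_sssp and the graph representation are assumptions, and edge weights are assumed non-negative.

```python
def pregel_sssp(graph, source):
    """Vertex-centric SSSP; returns (distances, number of supersteps).

    graph maps each vertex to a list of (neighbor, weight) pairs, weights >= 0.
    """
    dist = {v: float("inf") for v in graph}
    inbox = {v: [] for v in graph}
    inbox[source] = [0]  # the source starts with a candidate distance of 0
    supersteps = 0
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        # Conceptually, all vertices with messages run in parallel.
        for v, messages in inbox.items():
            if not messages:
                continue  # halted vertex: woken only by an incoming message
            best = min(messages)
            if best < dist[v]:
                dist[v] = best
                for w, weight in graph[v]:
                    outbox[w].append(best + weight)
        inbox = outbox  # barrier
        supersteps += 1
    return dist, supersteps


graph = {
    "s": [("a", 4), ("b", 1)],
    "a": [("t", 5)],
    "b": [("a", 2)],
    "t": [],
}
distances, steps = pregel_sssp(graph, "s")
print(distances, steps)
```

Unlike Dijkstra's algorithm, a vertex may be updated more than once (here a is first set to 4, then to 3), which trades extra work for parallelism.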
Experimental results: binary trees
Connected components (recap): label 2 vertices with the same label iff there is a path between the two. Sequentially, this can be achieved by depth-first or breadth-first search.
CC in Pregel: use graph contraction. Algorithm: (1) each vertex starts with a label; (2) each vertex sends its label to all neighbors; (3) each vertex replaces its label with the minimum (or maximum) of its own label and the values it receives from neighbors; (4) the algorithm stops when convergence is achieved, i.e. when no label changes.
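The steps above can be sketched as a sequential simulation of minimum-label propagation. Assumptions: the graph is undirected and given as an adjacency dictionary, vertex IDs double as initial labels, and the function name connected_components is hypothetical. For simplicity every vertex sends its label in every superstep; a real vertex-centric system would typically send only when the label changed.

```python
def connected_components(adj):
    """Label every vertex with the smallest vertex ID in its component.

    adj maps each vertex to the list of its neighbors (undirected graph).
    """
    label = {v: v for v in adj}  # each vertex starts with its own ID as label
    changed = True
    while changed:
        # Each vertex sends its current label to all neighbors.
        inbox = {v: [] for v in adj}
        for v, neighbors in adj.items():
            for w in neighbors:
                inbox[w].append(label[v])
        # Each vertex keeps the minimum of its own label and the received ones.
        changed = False
        for v, messages in inbox.items():
            smallest = min(messages, default=label[v])
            if smallest < label[v]:
                label[v] = smallest
                changed = True
    return label


adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
print(connected_components(adj))
```

The number of supersteps grows with the diameter of the largest component, since a label can travel only one edge per superstep.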
Experimental results
Apache Giraph: Pregel is proprietary; Giraph is an open source Pregel implementation. It runs on standard Hadoop, executes the computation in memory, can be a job in a pipeline (MapReduce), and uses ZooKeeper for synchronization.
Building an application: (1) create a custom vertex by extending BasicVertex; (2) create a custom input format by extending TextVertexInputFormat, for an adjacency list where each line looks like vertexID neighborID1 neighborID2 …; (3) create a custom output format by extending TextVertexOutputFormat.
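To make the adjacency-list format concrete, here is a small Python sketch of the parsing logic that a custom input format would need to perform for each line. It is not Giraph code and does not use the Giraph API; the function names, the integer vertex IDs, and the whitespace separator are assumptions.

```python
def parse_adjacency_line(line):
    """Parse one line of the form 'vertexID neighborID1 neighborID2 ...'."""
    tokens = line.split()
    vertex_id = int(tokens[0])
    neighbors = [int(t) for t in tokens[1:]]
    return vertex_id, neighbors


def load_graph(lines):
    """Build an adjacency dictionary from an iterable of text lines."""
    graph = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        vertex_id, neighbors = parse_adjacency_line(line)
        graph[vertex_id] = neighbors
    return graph


text = """1 2 3
2 1
3 1
4
"""
print(load_graph(text.splitlines()))
```

A vertex with no neighbors, such as vertex 4 above, is still listed on its own line so that it exists in the graph.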
What’s next? Vertex centric vs. subgraph centric; load balancing ...; importance of partitioning and graph type ...