Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Structures and Algorithms in Parallel Computing

Similar presentations


Presentation on theme: "Data Structures and Algorithms in Parallel Computing"— Presentation transcript:

1 Data Structures and Algorithms in Parallel Computing
Lecture 5

2 BSP Processors + network + synchronization Superstep
Concurrent parallel computation Message exchanges between processors Barrier synchronization All processors reaching this point wait for the rest

3 Supersteps A BSP algorithm is a sequence of supersteps
Computation superstep Many small steps Example: floating point operations (addition, subtraction, etc.) Communication superstep Communication operations each transmitting a data word Example: transfer a real number between 2 processors In theory we distinguish between the 2 types of supersteps In practice we assume a single superstep

4 Some applications Pagerank Single Source Shortest Path (SSSP)
Connected Components

5 Pagerank Analysis algorithm to determine the importance of a document
Based on the number of references to it and the importance of the source documents Named after Larry Page

6 Pagerank

7 Pagerank Source: wikipedia

8 Solving Pagerank System of linear equations
Iterative loop till convergence

9 Pagerank in Pregel

10 Experimental results On Apache Giraph
Taken from

11 SSSP Find shortest path between a single source vertex and every other vertex in the graph Dijsktra’s algorithm for sequential computations

12 Sequential SSSP Source: wikipedia

13 SSSP in Pregel

14 Experimental results Binary trees

15 Connected components (recap)
Label 2 vertices with same label iff there is a path between the two Sequentially it can be achieved by depth first or breadth first search

16 CC in Pregel Use graph contraction Algorithm
Each vertex starts with a label Each vertex sends its label to all neighbors Each vertex replaces its label with the minimum (maximum) value it receives from neighbors Algorithm stops when convergence is achieved

17 Experimental results

18 Apache Giraph Pregel is proprietary
Giraph is an open source Pregel implementation Runs on standard Hadoop Computation is executed in memory Can be a job in a pipeline (MapReduce) Uses Zookeeper for synchronization

19 Building an application
Create a custom vertex by extending BasicVertex Create a custom input format Adjacency list where each line looks like vertexID neighborID1 neighborID2 … Extend the TextVertexInputFormat Create a custom output format Extend the TextVertexOutputFormat

20 What’s next? Vertex centric vs. subgraph centric Load balancing ...
Importance of partitioning and graph type ...


Download ppt "Data Structures and Algorithms in Parallel Computing"

Similar presentations


Ads by Google