Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Structures and Algorithms in Parallel Computing

Similar presentations


Presentation on theme: "Data Structures and Algorithms in Parallel Computing"— Presentation transcript:

1 Data Structures and Algorithms in Parallel Computing
Lecture 6

2 BSP Processors + network + synchronization Superstep
Concurrent parallel computation Message exchanges between processors Barrier synchronization All processors reaching this point wait for the rest

3 Supersteps A BSP algorithm is a sequence of supersteps
Computation superstep Many small steps Example: floating point operations (addition, subtraction, etc.) Communication superstep Communication operations each transmitting a data word Example: transfer a real number between 2 processors In theory we distinguish between the 2 types of supersteps In practice we assume a single superstep

4 Vertex centric model Simple distributed programming model
Algorithms are expressed by “thinking like a vertex” A vertex contains information about itself and the outgoing edges Computation is expressed at vertex level Vertex execution take place in parallel and are interleaved with synchronized message exchanges

5 Disadvantages Costly messaging due to vertex logic
Porting shared memory algorithms to vertex centric ones may not be trivial Decoupled programming logic from data layout on disk IO penalties

6 Thinking like a graph Subgraph centric abstraction
Express computation at subgraph instead of vertex level Information flows freely inside the subgraph Messages are sent only across subgraphs

7 Subgraph centric model
Graph is k-partitioned Subgraph is a connected component or a weakly connected component for directed graphs Two subgraphs do not share vertices A partition can store one or more subgraphs Partitions are distributed Subgraph is a meta-vertex Remote edges connect them together Each subgraph is an independent unit of computation

8 Subgraph centric programming
User logic operates on a sub graph as an independent unit of computation Execution follows a BSP model Resource allocation Single Partition → Single Machine Single Sub-graph → Single CPU Data loading Complete partition is loaded on to the memory before computation Sub graph tasks keeps sub-graphs in memory within the task scope Sub-graph-Task 1 Sub-graph-Task 2 Sub-graph-Task 3 Sub-graph – Weakly connected component identified within a graph partition. So if two sub graphs in the same partition have a connected edge they become one by definition

9 Advantages Messages exchanged Number of supersteps
Direct access to entire subgraph Messages sent only across partitions Subgraphs are disconnected Pregel has aggregators but they operate after messages are sent Number of supersteps Depending on algorithm the required supersteps can be reduced Limited synchronization overhead Reduces the skew in execution due to unbalanced partitions Reuse of single-machine algorithms Direct reuse of shared memory graph algorithms

10 GoFFish https://github.com/usc-cloud/goffish
Subgraph programming abstraction Gopher Flexibility of reusing well-known shared memory graph abstractions Leverages subgraph concurrency within a multicore machine and distributed scaling using BSP Efficient distributed storage GoFS Write once read many approach Method naming similar to that of Pregel

11 Find max example

12 Find max example

13 Subgraph centric SSSP

14 Pagerank case study Same number of supersteps to converge as the vertex centric approach Alternative is to use blockrank Assumes some websites to be highly interconnected Like subgraphs Calculates pagerank for vertices by treating blocks (subgraphs) independently (1 SS) Ranks each block based on its relative importance (1 SS) Normalizes the vertex pagerank with the block rank to use as initial value before running the classic pagerank (n SS) Costlier first superstep but faster convergence in the last n supersteps

15 Efficiency

16 Execution time skew For the 1st superstep in Pagerank

17 Subgraph centric w/o the abstraction
Louvain community detection Given graph check where there is a “natural division” of vertices Is edge cut realistic enough? Modularity based community detection mc – number of edges inside community c dc – sum of degrees of vertices inside community c M – total number of edges C – set of all communities in graph

18 Louvain sequential algorithm
A greedy modularity maximization approach. Two main steps repeated iteratively while(improvement) { mod = detect-communities() if( (mod – prev_mod) > T) improvement = true else improvement = false prev_mod = mod collapse-graph() }

19 Louvain algorithm (2) Scan through all the nodes in a given order
Nodes adopts its neighbors community joining which gives a maximum +ve increase in modularity This processed repeated iteratively until local maximum modularity is reached Applicable Notes

20 Louvain algorithm (3) New network is built collapsing communities into single nodes Applicable Notes

21 Observations First iteration is the costliest iteration
79.53% of total time on average Graph reduced to much smaller graph after the first iteration First iteration community structures are smaller Small number of vertices

22 Analysis Community graphs 6x times improvement over sequential
No degradation in result quality

23 Parallel Louvain Partition graph in k subgraphs
Run first Louvain iteration (the costliest) in parallel Merge and run the sequential Louvain PMETIS Partitioning Louvainp0 Louvainpn-1 Iteration 1 Iteration 2 to N

24 What’s next? Load balancing Parallel sorting
Importance of partitioning and graph type Parallel sorting Parallel computational geometry Parallel numerical algorithms


Download ppt "Data Structures and Algorithms in Parallel Computing"

Similar presentations


Ads by Google