Data Structures and Algorithms in Parallel Computing Lecture 6
BSP
  Processors + network + synchronization
  Superstep
    Concurrent parallel computation
    Message exchanges between processors
    Barrier synchronization: all processors reaching this point wait for the rest
Supersteps
  A BSP algorithm is a sequence of supersteps
  Computation superstep
    Many small steps
    Example: floating point operations (addition, subtraction, etc.)
  Communication superstep
    Communication operations, each transmitting a data word
    Example: transfer a real number between 2 processors
  In theory we distinguish between the 2 types of supersteps
  In practice we assume a single superstep that combines computation and communication
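A minimal sketch of the superstep structure, assuming Python threads stand in for BSP processors and `threading.Barrier` stands in for the barrier synchronization. The function name `bsp_global_sum`, the inbox lists, and the all-to-all exchange are illustrative choices, not part of the lecture.

```python
import threading

def bsp_global_sum(chunks):
    """Every simulated processor ends up holding the sum of all chunks."""
    p = len(chunks)
    barrier = threading.Barrier(p)            # all p processors must arrive
    inboxes = [[] for _ in range(p)]          # one message queue per processor
    locks = [threading.Lock() for _ in range(p)]
    results = [None] * p

    def processor(pid):
        # Superstep 1, computation: many small local operations.
        partial = sum(chunks[pid])
        # Superstep 1, communication: send one data word to every processor.
        for dest in range(p):
            with locks[dest]:
                inboxes[dest].append(partial)
        # Barrier: wait until every processor has finished sending.
        barrier.wait()
        # Superstep 2, computation: combine the received partial sums.
        results[pid] = sum(inboxes[pid])

    threads = [threading.Thread(target=processor, args=(pid,)) for pid in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Without the barrier a processor could read its inbox before all partial sums have arrived, which is exactly the race the superstep boundary rules out.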
Vertex centric model
  Simple distributed programming model
  Algorithms are expressed by “thinking like a vertex”
  A vertex contains information about itself and its outgoing edges
  Computation is expressed at vertex level
  Vertex executions take place in parallel and are interleaved with synchronized message exchanges
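A minimal single-process sketch of "thinking like a vertex", assuming a Pregel-style loop in which every vertex keeps the largest value it has seen and forwards it along its outgoing edges only when that value changes. The function name `vertex_centric_max` and the dictionary-based graph layout are assumptions made for illustration.

```python
def vertex_centric_max(out_edges, values):
    """out_edges: {vertex: [neighbor, ...]}, values: {vertex: number}.
    Returns the final value per vertex and the number of supersteps executed."""
    values = dict(values)
    inbox = {v: [] for v in out_edges}
    superstep = 0
    while True:
        outbox = {v: [] for v in out_edges}
        sent = False
        for v in out_edges:                   # conceptually executed in parallel
            new_value = max([values[v]] + inbox[v])
            changed = new_value > values[v]
            values[v] = new_value
            if superstep == 0 or changed:     # otherwise the vertex votes to halt
                for w in out_edges[v]:
                    outbox[w].append(new_value)
                    sent = True
        superstep += 1
        if not sent:                          # no messages in flight: terminate
            break
        inbox = outbox                        # barrier: deliver all messages
    return values, superstep
```

On a four-vertex path a-b-c-d with values 3, 6, 2, 1 this sketch needs 4 supersteps, because the maximum travels one edge per superstep.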
Disadvantages
  Costly messaging due to vertex logic
  Porting shared memory algorithms to vertex centric ones may not be trivial
  Decoupled programming logic from data layout on disk
    IO penalties
Thinking like a graph
  Subgraph centric abstraction
  Express computation at subgraph instead of vertex level
  Information flows freely inside the subgraph
  Messages are sent only across subgraphs
Subgraph centric model
  Graph is k-partitioned
  Subgraph is a connected component, or a weakly connected component for directed graphs
  Two subgraphs do not share vertices
  A partition can store one or more subgraphs
  Partitions are distributed
  Subgraph is a meta-vertex
    Remote edges connect them together
  Each subgraph is an independent unit of computation
Subgraph centric programming
  User logic operates on a subgraph as an independent unit of computation
  Execution follows a BSP model
  Resource allocation
    Single partition → single machine
    Single subgraph → single CPU
  Data loading
    The complete partition is loaded into memory before computation
    Subgraph tasks keep their subgraphs in memory within the task scope
  Figure labels: Sub-graph-Task 1, Sub-graph-Task 2, Sub-graph-Task 3
  Subgraph: a weakly connected component identified within a graph partition. If two subgraphs in the same partition are connected by an edge, they become one subgraph by definition
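The subgraph definition above can be sketched as follows: a union-find pass over the edges local to one partition, ignoring edge direction, yields the weakly connected components, and edges with only one endpoint in the partition are kept as remote edges. The function name `subgraphs_in_partition` and the edge-list input are assumptions for illustration and are not the GoFFish API.

```python
def subgraphs_in_partition(vertices, edges):
    """vertices: vertices stored in this partition.
    edges: directed (u, v) pairs of the whole graph.
    Returns (list of vertex sets, list of remote edges)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]     # path halving
            v = parent[v]
        return v

    remote_edges = []
    for u, w in edges:
        if u in parent and w in parent:
            parent[find(u)] = find(w)         # local edge: merge, direction ignored
        elif u in parent or w in parent:
            remote_edges.append((u, w))       # crosses the partition boundary

    components = {}
    for v in vertices:
        components.setdefault(find(v), set()).add(v)
    return list(components.values()), remote_edges
```

Adding a local edge between two components merges them into a single subgraph, which is the "become one by definition" rule.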
Advantages
  Messages exchanged
    Direct access to entire subgraph
    Messages sent only across partitions
    Subgraphs are disconnected
    Pregel has aggregators but they operate after messages are sent
  Number of supersteps
    Depending on the algorithm, the required supersteps can be reduced
  Limited synchronization overhead
    Reduces the skew in execution due to unbalanced partitions
  Reuse of single-machine algorithms
    Direct reuse of shared memory graph algorithms
GoFFish
  https://github.com/usc-cloud/goffish
  Subgraph programming abstraction: Gopher
    Flexibility of reusing well-known shared memory graph abstractions
    Leverages subgraph concurrency within a multicore machine and distributed scaling using BSP
  Efficient distributed storage: GoFS
    Write once read many approach
  Method naming similar to that of Pregel
Find max example
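A minimal sketch of find max in the subgraph centric style, assuming each subgraph first scans all of its own vertices as a shared-memory step and then exchanges messages only with subgraphs reachable over remote edges. The function name `subgraph_centric_max` and the input layout are illustrative assumptions, not Gopher's actual interface.

```python
def subgraph_centric_max(subgraph_values, remote_neighbors):
    """subgraph_values: {subgraph_id: [vertex values]}.
    remote_neighbors: {subgraph_id: [ids of subgraphs across remote edges]}.
    Returns the maximum known to each subgraph and the number of supersteps."""
    current = {}
    inbox = {s: [] for s in subgraph_values}
    superstep = 0
    while True:
        outbox = {s: [] for s in subgraph_values}
        sent = False
        for s, vals in subgraph_values.items():   # conceptually in parallel
            if superstep == 0:
                best = max(vals)                  # local scan, no messages needed
                changed = True
            else:
                best = max([current[s]] + inbox[s])
                changed = best > current[s]
            current[s] = best
            if changed:                           # otherwise vote to halt
                for t in remote_neighbors[s]:
                    outbox[t].append(best)
                    sent = True
        superstep += 1
        if not sent:
            break
        inbox = outbox                            # barrier: deliver messages
    return current, superstep
```

For a four-vertex path with values 3, 6, 2, 1 split into the subgraphs {3, 6} and {2, 1}, the sketch terminates after 3 supersteps and exchanges one message per remote edge direction instead of one per vertex edge.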
Subgraph centric SSSP
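A minimal sketch of single-source shortest paths in the subgraph centric style, assuming each subgraph runs Dijkstra's algorithm locally in every superstep and sends tentative distances only along remote edges. The function name `subgraph_sssp`, the `owner` map, and the adjacency layout are assumptions for illustration; edge weights are assumed non-negative.

```python
import heapq

def subgraph_sssp(subgraphs, owner, source):
    """subgraphs: {sid: {u: [(v, weight), ...]}} with every vertex as a key.
    owner: {vertex: sid}. An edge whose target lives elsewhere is remote."""
    inf = float("inf")
    dist = {u: inf for adj in subgraphs.values() for u in adj}
    inbox = {sid: [] for sid in subgraphs}
    inbox[owner[source]].append((source, 0))
    supersteps = 0
    while any(inbox.values()):
        outbox = {sid: [] for sid in subgraphs}
        for sid, adj in subgraphs.items():        # conceptually in parallel
            heap = []
            for v, d in inbox[sid]:               # apply improvements from messages
                if d < dist[v]:
                    dist[v] = d
                    heap.append((d, v))
            heapq.heapify(heap)
            remote = {}                           # best candidate per remote vertex
            while heap:                           # local Dijkstra inside the subgraph
                d, u = heapq.heappop(heap)
                if d > dist[u]:
                    continue                      # stale heap entry
                for v, w in adj[u]:
                    nd = d + w
                    if owner[v] == sid:
                        if nd < dist[v]:
                            dist[v] = nd
                            heapq.heappush(heap, (nd, v))
                    elif nd < remote.get(v, inf):
                        remote[v] = nd
            for v, nd in remote.items():          # one message per remote vertex
                outbox[owner[v]].append((v, nd))
        inbox = outbox                            # barrier
        supersteps += 1
    return dist, supersteps
```

The number of supersteps in this sketch depends on how many times a shortest path crosses a subgraph boundary rather than on the number of edges along it.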
PageRank case study
  Same number of supersteps to converge as the vertex centric approach
  Alternative is to use BlockRank
    Assumes some websites to be highly interconnected, like subgraphs
    Calculates PageRank for vertices by treating blocks (subgraphs) independently (1 superstep)
    Ranks each block based on its relative importance (1 superstep)
    Normalizes the vertex PageRank with the block rank to use as the initial value before running the classic PageRank (n supersteps)
    Costlier first superstep but faster convergence in the last n supersteps
Efficiency
Execution time skew
  For the 1st superstep in PageRank
Subgraph centric w/o the abstraction
  Louvain community detection
    Given a graph, check where there is a “natural division” of vertices
    Is edge cut realistic enough?
  Modularity based community detection
    m_c: number of edges inside community c
    d_c: sum of degrees of vertices inside community c
    M: total number of edges
    C: set of all communities in graph
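The symbols above match the standard modularity definition, Q = sum over c in C of [ m_c / M - (d_c / (2M))^2 ], which is assumed here because the formula itself did not survive on the slide. A minimal sketch that evaluates it for an undirected, unweighted edge list with at least one edge; the function name `modularity` and the input layout are illustrative.

```python
def modularity(edges, community):
    """edges: list of undirected (u, v) pairs, each listed once.
    community: {vertex: community label}."""
    M = len(edges)                    # total number of edges
    m = {}                            # m_c: edges with both endpoints in c
    d = {}                            # d_c: sum of degrees of vertices in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        d[cu] = d.get(cu, 0) + 1      # each endpoint adds one to its degree sum
        d[cv] = d.get(cv, 0) + 1
        if cu == cv:
            m[cu] = m.get(cu, 0) + 1
    return sum(m.get(c, 0) / M - (d[c] / (2 * M)) ** 2 for c in d)
```

For two triangles joined by a single edge, placing each triangle in its own community gives Q = 5/14, while placing all six vertices in one community gives Q = 0.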
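Louvain sequential algorithm
  A greedy modularity maximization approach
  Two main steps repeated iteratively
    improvement = true
    prev_mod = -infinity
    while (improvement) {
      mod = detect-communities()           // step 1: local moves, returns the new modularity
      improvement = (mod - prev_mod) > T   // T: minimum modularity gain to continue
      prev_mod = mod
      collapse-graph()                     // step 2: communities become single nodes
    }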
Louvain algorithm (2)
  Scan through all the nodes in a given order
  Each node adopts the community of the neighbor that gives the maximum positive increase in modularity
  This process is repeated iteratively until a local maximum of modularity is reached
Louvain algorithm (3)
  A new network is built by collapsing communities into single nodes
Observations
  First iteration is the costliest iteration
    79.53% of total time on average
  Graph is reduced to a much smaller graph after the first iteration
  First-iteration community structures are smaller
    Small number of vertices
Analysis
  Community graphs
  6x improvement over sequential
  No degradation in result quality
Parallel Louvain
  Partition the graph into k subgraphs
  Run the first Louvain iteration (the costliest) in parallel
  Merge and run the sequential Louvain
  Figure: PMETIS partitioning, then Louvain on partitions p0 … pn-1 (iteration 1), then iterations 2 to N
What’s next?
  Load balancing
  Parallel sorting
  Importance of partitioning and graph type
  Parallel computational geometry
  Parallel numerical algorithms
  …