Da Yan, James Cheng, Yi Lu, Wilfred Ng
Presented by: Nafisa Anzum

Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs

Overview
  Introduction
  Background
  Proposed Solution
  System Overview
  Applications
  Conclusion

Distributed Graph Systems
  Increasing need to process massive graphs
  Distributed graph computing systems, e.g. Pregel, GraphLab, Giraph, GPS
  Popularity of the vertex-centric model
    "Think like a vertex" philosophy
    More natural and easier algorithm design and implementation

Performance Bottlenecks
  Real-world large graphs have very different characteristics
    Skewed degree distribution (e.g. power-law graphs, social networks, web graphs)
    High density (e.g. social networks, mobile phone networks)
    Large diameter (e.g. road networks, terrain graphs)
  These characteristics create bottlenecks for vertex-centric parallelism
    Skewed workload distribution
    Heavy message passing
    Impractically many rounds of computation

Existing Systems
  Pregel
    Vertex-centric model
    Efficient, scalable, and fault-tolerant implementation on a cluster of machines
    Suffers from the performance bottlenecks above
  Vertex placement
    Minimizes the number of cross-worker edges
    Workers hold approximately the same number of vertices
    Requires extensive processing but yields limited gain
  GRACE
    Single-machine environment
    Vertex-centric model with a scheduler
    Not very expressive; has a different focus
  Giraph++
    Graph-centric programming model
    Does not support block-level communication
    Incurs serialization cost
    Expensive graph partitioning
  GraphLab
    Asynchronous execution
    Decreases the workload for some algorithms
    Extra cost due to blocking/unblocking
    Less expressive than Pregel

Blogel
  A block-centric graph processing framework
  A block is a connected subgraph of the graph
  Messages are exchanged among blocks
  Eliminates the three bottlenecks caused by real-world graphs

Hash-Min Algorithm for Finding Connected Components (CCs)
  Vertex-centric model
    First superstep: each vertex v sets min(v) = id(v), broadcasts min(v) to all of its neighbors, and votes to halt
    Each later superstep: each vertex v receives messages from its neighbors; let min* be the smallest ID received. If min* < min(v), v sets min(v) = min* and broadcasts min* to its neighbors
    At the end of each superstep, all vertices vote to halt
    On convergence, min(v) = cc(v)
  Block-centric model
    The graph is a set of blocks; each block B has an ID id(B)
    The vertices in a block are connected
    Each block maintains a field min(B) and broadcasts the smallest block ID it has seen
    On convergence, all vertices v with the same min(block(v)) belong to the same CC
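The vertex-centric Hash-Min steps can be sketched as a single-process simulation. This is only an illustration, not Blogel or Pregel code: the `hash_min` function, the dictionary-based adjacency format, and the `inbox` message buffers are all assumptions made for the example.

```python
def hash_min(adj):
    """Simulate Hash-Min in synchronous supersteps on an undirected graph.

    adj maps each vertex ID to a list of neighbor IDs (hypothetical format).
    Returns (min_id, supersteps); min_id[v] is the smallest ID in v's component.
    """
    # Superstep 1: every vertex starts with its own ID and broadcasts it.
    min_id = {v: v for v in adj}
    inbox = {v: [] for v in adj}
    for v in adj:
        for u in adj[v]:
            inbox[u].append(min_id[v])
    supersteps = 1

    # Later supersteps: a vertex only works if it received messages.
    while any(inbox.values()):
        next_inbox = {v: [] for v in adj}
        for v, msgs in inbox.items():
            if msgs and min(msgs) < min_id[v]:
                min_id[v] = min(msgs)
                for u in adj[v]:
                    next_inbox[u].append(min_id[v])
        inbox = next_inbox
        supersteps += 1
    return min_id, supersteps


# A path 1-2-3-4-5 plus a separate edge 7-8.
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4], 7: [8], 8: [7]}
labels, rounds = hash_min(graph)
```

On the path component the smallest ID has to travel one hop per superstep, which is why the number of supersteps grows with the graph diameter in the vertex-centric model.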

Why the Block-Centric Model?
  Skewed degree distribution graphs
    Vertex-centric model: skewed workload among workers
    Block-centric model: balanced workload; the neighbors of high-degree vertices are in the same block, so no message passing is needed
  High density graphs
    Vertex-centric model: heavy message passing
    Block-centric model: the neighbors of many vertices are in the same block, so no messages need to be passed
  Large diameter graphs
    Vertex-centric model: many rounds of computation
    Block-centric model: reduced computation; messages are propagated in much larger units (blocks)

Performance of Hash-Min

System Overview
  Implemented in C++ as a group of header files
  Data is stored in HDFS
  MPI is used for communication
  Blogel operates in three computing modes
    B-mode
    V-mode
    VB-mode

Blogel Framework
  Supports three types of jobs:
    Vertex-centric graph computing (worker: V-worker)
    Graph partitioning (worker: partitioner)
    Block-centric graph computing (worker: B-worker)

Partitioners: Graph Voronoi Diagram (GVD) Partitioner
  GVD:
    Given an undirected, unweighted graph G = (V, E)
    and a set of source vertices s1, s2, ..., sk ∈ V
    The GVD is a partition of V: {VC(s1), VC(s2), ..., VC(sk)}, where v is in VC(si) only if si is closer to v than the other sources
  Implemented by multi-source BFS
    Superstep 1: each source s sets block(s) = s and broadcasts it to its neighbors; for every other vertex v, block(v) is unassigned
    Superstep i (i > 1): if block(v) is unassigned, v sets block(v) to an arbitrary source it received and broadcasts block(v) to its neighbors; otherwise v votes to halt
    On convergence, block(v) = si for each v in VC(si)
    Total messages exchanged: O(|E|)
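The multi-source BFS can be sketched as a superstep simulation. This is a minimal single-process illustration, not Blogel's partitioner: the `gvd_partition` name and the adjacency format are assumptions, and "arbitrary source received" is resolved here by taking the first message.

```python
def gvd_partition(adj, sources):
    """Voronoi-partition an undirected, unweighted graph by multi-source BFS.

    adj maps vertex -> list of neighbors; sources is a list of source vertices.
    Returns block, where block[v] is the source whose cell v joined
    (None if v is unreachable from every source).
    """
    block = {v: None for v in adj}
    inbox = {v: [] for v in adj}

    # Superstep 1: each source labels itself and broadcasts its ID.
    for s in sources:
        block[s] = s
    for s in sources:
        for u in adj[s]:
            inbox[u].append(s)

    # Superstep i > 1: unassigned vertices adopt a received source, rebroadcast.
    while any(inbox.values()):
        next_inbox = {v: [] for v in adj}
        for v, msgs in inbox.items():
            if msgs and block[v] is None:
                block[v] = msgs[0]  # "arbitrary" choice: first message received
                for u in adj[v]:
                    next_inbox[u].append(block[v])
        inbox = next_inbox
    return block


# A path 0-1-2-3-4-5-6 with sources at both ends.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
cells = gvd_partition(path, sources=[0, 6])
```

Each vertex broadcasts at most once, when it is first assigned, so the total number of messages is bounded by the sum of the degrees, which matches the O(|E|) bound.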

Partitioners: Graph Voronoi Diagram (GVD) Partitioner
  Each vertex v samples itself as a source with probability psamp
  Multi-source BFS is performed to partition the graph
  If the size of a block is larger than bmax
    Unassign every vertex v in that block
    Increase psamp by a factor of f
    Sample again with the increased psamp
    Partition again
  Stop conditions
    A(i)/A(i-1) > γ, where γ < 1
    psamp > pmax
  To avoid running too many supersteps
    Multi-source BFS halts in superstep δmax, a user-specified parameter
  Some vertices may still be unassigned
    Run the Hash-Min algorithm to assign them to blocks
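The resampling loop can be sketched as follows. This is a simplified, sequential stand-in under stated assumptions: the function names are hypothetical, a queue-based BFS replaces the distributed supersteps, f is assumed to be greater than 1, and the A(i)/A(i-1) and δmax stop conditions as well as the final Hash-Min pass are left out.

```python
import random
from collections import Counter, deque


def multi_source_bfs(adj, sources, block):
    """Grow cells from the sources into currently unassigned vertices only."""
    queue = deque()
    for s in sources:
        block[s] = s
        queue.append(s)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if block[u] is None:
                block[u] = block[v]
                queue.append(u)


def gvd_sample_partition(adj, p_samp, p_max, b_max, f, rng):
    """Repeat sampling + BFS, dissolving blocks larger than b_max (assumes f > 1)."""
    block = {v: None for v in adj}
    while True:
        unassigned = [v for v in adj if block[v] is None]
        sources = [v for v in unassigned if rng.random() < p_samp]
        multi_source_bfs(adj, sources, block)

        # Unassign every vertex of an oversized block.
        sizes = Counter(b for b in block.values() if b is not None)
        oversized = {b for b, n in sizes.items() if n > b_max}
        for v in adj:
            if block[v] in oversized:
                block[v] = None

        if all(b is not None for b in block.values()):
            break
        p_samp *= f            # sample more densely in the next round
        if p_samp > p_max:     # stop condition psamp > pmax
            break
    return block               # may still contain None (unassigned) vertices


ring = {i: [(i - 1) % 60, (i + 1) % 60] for i in range(60)}
blocks = gvd_sample_partition(ring, p_samp=0.05, p_max=1.0, b_max=10, f=2.0,
                              rng=random.Random(0))
```

Blocks that survive a round are never touched again, so every assigned block stays within bmax; only the vertices of dissolved blocks are re-partitioned with the higher sampling rate.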

Performance of GVD Partitioners

Partitioners: 2D Partitioner
  For spatial networks, vertices are associated with coordinates (x, y)
  Each vertex v has an additional field (x, y)
  Vertex-centric job:
    Each worker samples a subset of its vertices with probability psamp and sends it to the master
    The master partitions the sample into nx slots by x-coordinate
    The master partitions the sample into ny slots by y-coordinate
    Vertices are assigned to superblocks according to these slots
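One way to picture the slot computation is the sketch below. It is an assumption-laden illustration rather than the presented implementation: the helper names are hypothetical, slots are cut at equal-count boundaries of the sample, and the y boundaries are computed separately inside each x slot.

```python
from bisect import bisect_right


def split_points(values, n_slots):
    """Return n_slots - 1 boundaries cutting the values into equal-count slots."""
    values = sorted(values)
    return [values[len(values) * k // n_slots] for k in range(1, n_slots)]


def build_grid(sample, nx, ny):
    """Master side: x boundaries first, then y boundaries inside each x slot."""
    x_cuts = split_points([x for x, _ in sample], nx)
    columns = [[] for _ in range(nx)]
    for x, y in sample:
        columns[bisect_right(x_cuts, x)].append(y)
    y_cuts = [split_points(col, ny) if col else [] for col in columns]
    return x_cuts, y_cuts


def superblock(point, x_cuts, y_cuts, ny):
    """Worker side: map a vertex coordinate to a superblock ID."""
    x, y = point
    ix = bisect_right(x_cuts, x)
    iy = bisect_right(y_cuts[ix], y)
    return ix * ny + iy


# A 4 x 4 lattice of sampled coordinates, split into 2 x 2 superblocks.
sample = [(x, y) for x in range(4) for y in range(4)]
x_cuts, y_cuts = build_grid(sample, nx=2, ny=2)
corner_ids = [superblock(p, x_cuts, y_cuts, 2) for p in [(0, 0), (0, 3), (3, 0), (3, 3)]]
```

Only the sample travels to the master; every worker can then classify its own vertices locally once it knows the boundaries.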

Partitioners: 2D Partitioner
  A superblock may not be connected
  Block-centric job:
    Workers run BFS over their superblocks to break them into connected blocks
    A unique ID is assigned to each block
      Each worker sends its number of blocks to the master
      The master computes a prefix sum sumj for each worker wj by adding up the numbers of blocks found by the previous workers wk, for all k < j
      The master sends sumj to each worker wj, which can use it as the offset for its block IDs
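The prefix-sum step can be illustrated with a few lines. The function names and the convention that each worker numbers its local blocks from 0 are assumptions made for this sketch.

```python
def block_id_offsets(blocks_per_worker):
    """Master side: sum_j = total number of blocks found by workers before w_j."""
    offsets, total = [], 0
    for count in blocks_per_worker:
        offsets.append(total)
        total += count
    return offsets


def global_block_ids(blocks_per_worker):
    """Worker side: shift local block indices 0..n_j-1 by the offset sum_j."""
    offsets = block_id_offsets(blocks_per_worker)
    return [[off + i for i in range(n)]
            for off, n in zip(offsets, blocks_per_worker)]


# Three workers that found 3, 1 and 4 blocks respectively.
offsets = block_id_offsets([3, 1, 4])
ids = global_block_ids([3, 1, 4])
```

Because each worker's ID range starts where the previous worker's range ends, block IDs are unique across the cluster without any further coordination.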

Performance of 2D Partitioners

Single-Source Shortest Path
  Vertex-centric algorithm
    Initially, only the source s is active, with dist(s) = 0; for every other vertex v ∈ V, dist(v) = ∞. s sends (s, dist(s) + ℓ(s, u)) to each out-neighbor u, where ℓ(s, u) is the length of edge (s, u)
    In superstep i (i > 1), a vertex v receives messages (w, d(w)) from its in-neighbors w; let (w*, d(w*)) be the message with the smallest distance
    If d(w*) < dist(v), v updates (prev(v), dist(v)) = (w*, d(w*)) and sends (v, dist(v) + ℓ(v, u)) to each out-neighbor u
    Finally, v votes to halt
  Block-centric algorithm
    Operates in VB-mode
    In each superstep, V-compute() is executed first; a vertex v votes to halt only if d(w*) >= dist(v), otherwise it updates and remains active
    B-compute() is then executed
      Each block B collects all its active vertices v in a priority queue Q
      B runs Dijkstra's algorithm, repeatedly taking a vertex from Q and updating its neighbors in block B
    Saves a significant amount of computation
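The block-level step can be sketched as Dijkstra's algorithm restricted to one block and seeded with the block's active vertices. The function name, the `(neighbor, length)` adjacency format, and the returned set of improved vertices are assumptions for illustration; cross-block messaging is not shown.

```python
import heapq
import math


def b_compute_sssp(block_adj, dist, active):
    """Run Dijkstra inside one block, seeded with the block's active vertices.

    block_adj maps vertex -> list of (neighbor, length) for edges inside the block.
    dist holds the current tentative distances and is updated in place.
    Returns the set of vertices whose distance improved.
    """
    queue = [(dist[v], v) for v in active]
    heapq.heapify(queue)
    improved = set()
    while queue:
        d, v = heapq.heappop(queue)
        if d > dist[v]:
            continue  # stale entry: a shorter distance was already recorded
        for u, length in block_adj.get(v, []):
            if d + length < dist[u]:
                dist[u] = d + length
                improved.add(u)
                heapq.heappush(queue, (dist[u], u))
    return improved


block_adj = {"a": [("b", 1), ("c", 5)], "b": [("c", 1)], "c": []}
dist = {"a": 0, "b": math.inf, "c": math.inf}
changed = b_compute_sssp(block_adj, dist, active={"a"})
```

A distance that would take one superstep per hop in the vertex-centric model is propagated through the whole block within a single superstep here, which is where the saving comes from.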

Performance of Single-Source Shortest Path

Reachability
  Vertex-centric algorithm
    Set tag(s) = 10 and tag(t) = 01; for every other vertex v, tag(v) = 00
    Superstep 1: s sends tag(s) to all its out-neighbors; t sends tag(t) to all its in-neighbors
    Superstep i (i > 1): each vertex v computes tag*, the bitwise OR of all messages it receives and its own tag, and sets tag(v) = tag*
      If tag* = 11, v calls terminate()
      If tag* = 10 (01), v sends tag* to its out-neighbors (in-neighbors)
  Block-centric algorithm
    Operates in VB-mode
    In each superstep, V-compute() is executed first
    B-compute() is then called
      Collects all its active vertices with tag 10 (01) into a queue Qs (Qt)
      If a vertex with tag 11 is found, B-terminate() is called
      Otherwise performs a forward BFS using Qs and a backward BFS using Qt
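The vertex-centric tag propagation can be simulated as below. This is a sketch under assumptions: the `reachable` function and adjacency format are hypothetical, tags are stored as 2-bit integers, an early `return True` stands in for terminate(), and a vertex propagates only the first time it is tagged.

```python
def reachable(out_adj, s, t):
    """Return True if t is reachable from s, using bidirectional tag propagation.

    out_adj maps every vertex to a list of its out-neighbors.
    Tags are 2-bit ints: 0b10 = reached from s, 0b01 = can reach t.
    """
    in_adj = {v: [] for v in out_adj}
    for v, outs in out_adj.items():
        for u in outs:
            in_adj[u].append(v)

    tag = {v: 0b00 for v in out_adj}
    tag[s] |= 0b10
    tag[t] |= 0b01
    if tag[s] == 0b11:  # s and t are the same vertex
        return True

    # Superstep 1: s sends its tag forward, t sends its tag backward.
    inbox = {v: [] for v in out_adj}
    for u in out_adj[s]:
        inbox[u].append(0b10)
    for u in in_adj[t]:
        inbox[u].append(0b01)

    while any(inbox.values()):
        next_inbox = {v: [] for v in out_adj}
        for v, msgs in inbox.items():
            if not msgs:
                continue
            new_tag = tag[v]
            for m in msgs:
                new_tag |= m
            if new_tag == 0b11:
                return True  # stands in for terminate()
            if tag[v] == 0b00:  # first time v is reached: propagate once
                tag[v] = new_tag
                targets = out_adj[v] if new_tag == 0b10 else in_adj[v]
                for u in targets:
                    next_inbox[u].append(new_tag)
        inbox = next_inbox
    return False


# Edges: 4 -> 1 -> 2 -> 3
g = {1: [2], 2: [3], 3: [], 4: [1]}
answers = [reachable(g, 1, 3), reachable(g, 3, 1), reachable(g, 4, 3), reachable(g, 1, 4)]
```

The forward wave from s and the backward wave from t meet at some vertex exactly when a path from s to t exists, so the search can stop as soon as any vertex sees both bits.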

Performance of Reachability

PageRank
  Pregel's PageRank algorithm
    Superstep 1: each vertex v initializes pr(v) = 1/|V| and sends pr(v)/(number of out-neighbors) to each out-neighbor
    Superstep i (i > 1): each vertex v sums up the received pr values (sum), computes pr(v) = 0.15/|V| + 0.85 * sum, and distributes pr(v) evenly to its out-neighbors
  PageRank loss
    The total PageRank value should be 1 (15% held evenly by the vertices, 85% propagated along the edges)
    A sink vertex (no out-neighbors) does not propagate its value, so that value is lost
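The loss at sink vertices is easy to observe in a small simulation. The function name and graph format are assumptions for this sketch, and the update uses the standard formula pr(v) = 0.15/|V| + 0.85 * sum.

```python
def pregel_pagerank(out_adj, supersteps):
    """Simulate Pregel-style PageRank for a fixed number of supersteps.

    out_adj maps every vertex to a list of its out-neighbors.
    """
    n = len(out_adj)
    pr = {v: 1.0 / n for v in out_adj}  # superstep 1: initialization
    for _ in range(supersteps - 1):
        received = {v: 0.0 for v in out_adj}
        for v, outs in out_adj.items():
            for u in outs:
                received[u] += pr[v] / len(outs)  # share pr(v) evenly
        pr = {v: 0.15 / n + 0.85 * received[v] for v in out_adj}
    return pr


cycle = {0: [1], 1: [2], 2: [0]}      # no sink: total mass is preserved
chain = {0: [1], 1: [2], 2: []}       # vertex 2 is a sink: mass leaks away
total_cycle = sum(pregel_pagerank(cycle, 10).values())
total_chain = sum(pregel_pagerank(chain, 10).values())
```

On the cycle the values keep summing to 1, while on the chain the sum drops below 1 because the sink's share is never forwarded.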

PageRank
  Vertex-centric algorithm
    An aggregator-based solution to PageRank loss
    In compute(), if v has no out-neighbors, it adds pr(v) to an aggregator: agg = ∑ pr(v) over all such vertices
    Each vertex then updates pr(v) = 0.15/|V| + 0.85 * (sum + agg/|V|)
    Stop condition: |pr_i(v) - pr_{i-1}(v)| <= ε/|V|
  Block-centric algorithm
    All vertices with the same host name are placed in one block
    The first job runs in B-mode
      In block_init(), each block B computes the local PageRank lpr(v) of each vertex v in B
      Block B constructs its block-level neighbors from the neighbors of the vertices v in V(B)
      B-compute() computes the PageRank br(B) of each block B
      The block rank is distributed to out-neighbors
    The second job runs in V-mode
      Initializes pr(v) = lpr(v) * br(block(v))
      Performs standard PageRank on G
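The aggregator-based variant can be sketched as follows. The function name, the `max_supersteps` safety cap, and applying the stop condition to every vertex are assumptions for this illustration; a real system would compute `agg` with a distributed aggregator rather than a local sum.

```python
def pagerank_with_sink_aggregator(out_adj, eps=1e-6, max_supersteps=100):
    """PageRank where the mass held by sink vertices is redistributed evenly."""
    n = len(out_adj)
    pr = {v: 1.0 / n for v in out_adj}
    for _ in range(max_supersteps):
        # Aggregator: total PageRank currently held by vertices with no out-neighbors.
        agg = sum(pr[v] for v, outs in out_adj.items() if not outs)
        received = {v: 0.0 for v in out_adj}
        for v, outs in out_adj.items():
            for u in outs:
                received[u] += pr[v] / len(outs)
        new_pr = {v: 0.15 / n + 0.85 * (received[v] + agg / n) for v in out_adj}
        # Stop once no vertex changed by more than eps / |V| (assumed: all vertices).
        converged = all(abs(new_pr[v] - pr[v]) <= eps / n for v in out_adj)
        pr = new_pr
        if converged:
            break
    return pr


chain = {0: [1], 1: [2], 2: []}  # vertex 2 is a sink
ranks = pagerank_with_sink_aggregator(chain)
total = sum(ranks.values())
```

Summing the update over all vertices gives 0.15 + 0.85 * (mass sent along edges + mass held by sinks), so the total stays at 1 in every superstep.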

Performance of PageRank

Conclusion
  A block-centric framework, Blogel, is presented
  Blogel addresses the performance bottlenecks faced by vertex-centric models
  Blogel can execute vertex-centric algorithms, block-centric algorithms, and hybrid algorithms
  Blogel is reported to be significantly faster than the existing graph computing systems it was compared with
