PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, and Danny Bickson, Carnegie Mellon University; Carlos Guestrin, University of Washington
Current State
1. Many machine learning and data mining (MLDM) problems are represented as graphs
2. Graph-structured computation is important
3. Graphs are big
4. Current systems provide graph-parallel computation
   - Pregel
   - GraphLab
Solution 1: Pregel Vertex Program
Solution 2: GraphLab Shared Distributed Graph
Problem
Many graphs have a skewed degree distribution
Issue: Natural Graphs
[Figure labels: Machine 1, Machine 3]
What is a Natural Graph
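For reference, the skew in a natural graph is usually described by a power-law degree distribution. The sketch below states that relationship. The symbols P(d) and d are notation chosen here, and alpha is the exponent that later slides refer to.

```latex
% Probability that a vertex has degree d in a power-law (natural) graph.
% \alpha > 0 controls the skew: a smaller \alpha means a denser graph
% with more high-degree vertices.
P(d) \propto d^{-\alpha}, \qquad \alpha \approx 2 \text{ for many natural graphs}
```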
GraphLab and Pregel on Natural Graphs
- Work imbalance
- Random partitioning
- Storage is linear in degree
- Expensive communication
Solution
- Pregel: Edge Cut, Replicate Edges
- PowerGraph: Vertex Cut, Replicate Vertices
Parallelize the vertex program across all machines that hold that vertex
Balanced P-way Vertex Cut
Idea: Distribute edges while minimizing vertex replication
Distributing Edges: Random
Idea: Randomly assign edges to machines
- Why is this better than Pregel?
Theorem: For a given edge cut with g ghosts, any vertex cut along the same partition boundary has fewer than g mirrors.
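A minimal sketch of random edge placement, assuming an in-memory edge list and a single process standing in for the cluster. The names `random_vertex_cut` and `replication_factor` are hypothetical helpers for illustration, not PowerGraph APIs. The replication factor here is the average number of machines that hold a copy of each vertex.

```python
import random
from collections import defaultdict


def random_vertex_cut(edges, num_machines, seed=0):
    """Assign each edge to a uniformly random machine.

    Returns (edge_owner, replicas), where replicas[v] is the set of
    machines that hold at least one edge touching vertex v.
    """
    rng = random.Random(seed)
    edge_owner = {}
    replicas = defaultdict(set)
    for u, v in edges:
        m = rng.randrange(num_machines)
        edge_owner[(u, v)] = m
        # A vertex is replicated on every machine that owns one of its edges.
        replicas[u].add(m)
        replicas[v].add(m)
    return edge_owner, replicas


def replication_factor(replicas):
    """Average number of machines spanned per vertex."""
    return sum(len(ms) for ms in replicas.values()) / len(replicas)


# A small star graph: vertex 0 is a high-degree hub, as in a natural graph.
star = [(0, leaf) for leaf in range(1, 101)]
owner, reps = random_vertex_cut(star, num_machines=4)
```

In this star example each leaf lives on exactly one machine, and only the hub is replicated, on at most 4 machines, so the hub's edges are spread across machines instead of being stored on a single one.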
Distributing Edges: Greedy
- Further minimize replication of vertices
- Idea: Place the next edge on the machine that minimizes vertex replication
- Greedy approaches
  - Coordinated
  - Oblivious
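The greedy idea can be sketched as follows. This is a simplified single-process illustration, not the PowerGraph implementation: `greedy_vertex_cut` is a hypothetical name, the tie-breaking by machine load is an assumption, and the sketch omits the explicit load-balance constraint that a balanced vertex cut requires. Because one process sees every placement, it is closer in spirit to the coordinated variant; an oblivious variant would run the same rule independently on each machine using only local state.

```python
from collections import Counter, defaultdict


def greedy_vertex_cut(edges, num_machines):
    """Place edges one at a time, preferring machines that already hold
    an endpoint, and breaking ties by current machine load."""
    load = [0] * num_machines
    replicas = defaultdict(set)   # vertex -> machines holding a replica
    remaining = Counter()         # vertex -> number of edges not yet placed
    for u, v in edges:
        remaining[u] += 1
        remaining[v] += 1

    placement = {}
    for u, v in edges:
        a_u, a_v = replicas[u], replicas[v]
        if a_u & a_v:
            # Both endpoints already share a machine: no new replica needed.
            candidates = a_u & a_v
        elif a_u and a_v:
            # Disjoint placements: use the endpoint with more edges left.
            candidates = a_u if remaining[u] >= remaining[v] else a_v
        elif a_u or a_v:
            # Only one endpoint placed so far: reuse its machines.
            candidates = a_u | a_v
        else:
            # Neither endpoint placed: any machine is allowed.
            candidates = range(num_machines)
        m = min(candidates, key=lambda i: (load[i], i))
        placement[(u, v)] = m
        load[m] += 1
        replicas[u].add(m)
        replicas[v].add(m)
        remaining[u] -= 1
        remaining[v] -= 1
    return placement, replicas, load


# A triangle plus one disconnected edge, placed on 2 machines.
placement, replicas, load = greedy_vertex_cut(
    [(0, 1), (1, 2), (0, 2), (3, 4)], num_machines=2
)
```

Here the triangle ends up entirely on one machine, so none of its vertices is replicated, while the disconnected edge goes to the less loaded machine.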
Edge Distribution
Implementations
- Synchronous (Pregel)
- Asynchronous
- Asynchronous and Serializable (GraphLab)
Discussion: Edge Placement and Run Time
Discussion: GAS Decomposition
- Gather: Vertex collects information about its surrounding vertices
- Apply: Vertex updates its value based on the gathered data
- Scatter: Vertex shares its new value with its neighbors
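To make the three phases concrete, here is a minimal single-machine sketch of PageRank written in gather / apply / scatter form. The function name `pagerank_gas`, the dictionary-based graph representation, and the synchronous loop are assumptions for illustration; in a distributed setting the gather would run in parallel on every machine holding a replica of the vertex.

```python
def pagerank_gas(in_neighbors, out_degree, iterations=20, damping=0.85):
    """Synchronous PageRank as explicit gather / apply / scatter steps.

    in_neighbors[v] lists the vertices with an edge into v;
    out_degree[u] is the number of edges leaving u.
    """
    rank = {v: 1.0 for v in in_neighbors}
    for _ in range(iterations):
        new_rank = {}
        for v, sources in in_neighbors.items():
            # Gather: sum the contributions arriving over v's in-edges.
            acc = sum(rank[u] / out_degree[u] for u in sources)
            # Apply: update v's value from the gathered sum.
            new_rank[v] = (1.0 - damping) + damping * acc
        # Scatter: publish the new values so neighbors read them next round.
        rank = new_rank
    return rank


# A directed 3-cycle: 0 -> 1 -> 2 -> 0.
cycle_in = {0: [2], 1: [0], 2: [1]}
cycle_out = {0: 1, 1: 1, 2: 1}
ranks = pagerank_gas(cycle_in, cycle_out)
```

Because the gather is a sum, it can be computed as partial sums on each machine and combined afterwards, which is what lets a vertex cut split the work of a high-degree vertex.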
What About Alpha?
PowerGraph is a solution for natural graphs
Can we do better if alpha is always around 2?
Fully Characterizing Natural Graphs
Conclusions:
- Out-degree grows over time, changing the value of alpha
- Vertex diameters often decrease as a graph grows
What does this mean when graphs are constantly changing in PowerGraph?
Takeaways
- The vertex-cut implementation allows for greater parallelization of vertex programs and reduced vertex replication (fewer mirrors)
- GAS decomposition is not fundamental to PowerGraph's implementation