Balanced Graph Edge Partition
ACM KDD 2014
Florian Bourse (ENS), Marc Lelarge (INRIA-ENS), Milan Vojnovic (Microsoft Research)
Balanced Graph Partition
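Different Variants
Vertex partition or edge partition, each with or without aggregation:
VP: vertex partition, no aggregation
VPA: vertex partition, aggregation
EP: edge partition, no aggregation
EPA: edge partition, aggregation
[Figure: grid of the four variants with the annotations "traditional", "?", "?", "?" and "PowerGraph [OSDI 2012]"; the cell each annotation belongs to is not recoverable.]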
Questions
What are the performance benefits of using balanced edge partition as opposed to the more traditional balanced vertex partition?
What are practical algorithms for balanced edge partition without aggregation, and what are their theoretical guarantees?
What are good streaming heuristics for balanced edge partition?
Costs: Cuts and Loads
Master vertex assignment
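The cost definitions on this slide did not survive extraction. As a rough illustration only, the sketch below computes two quantities commonly used for an edge partition: the load of each cluster (number of edges assigned to it) and a replication-based cut cost (for every vertex, the number of clusters holding one of its edges, minus one). Both definitions and the function name `edge_partition_costs` are assumptions made for this example, not the slide's formulas.

```python
from collections import defaultdict

def edge_partition_costs(assignment, k):
    """Return (loads, cut) for an edge partition.

    assignment: dict mapping an edge (u, v) to a cluster index in range(k).
    loads[i] is the number of edges in cluster i (assumed load definition).
    cut is the number of extra vertex copies (assumed cut definition).
    """
    loads = [0] * k
    clusters_of = defaultdict(set)  # vertex -> clusters holding one of its edges
    for (u, v), c in assignment.items():
        loads[c] += 1
        clusters_of[u].add(c)
        clusters_of[v].add(c)
    # Each vertex needs one copy per cluster it appears in; count copies beyond the first.
    cut = sum(len(cs) - 1 for cs in clusters_of.values())
    return loads, cut

assignment = {("a", "b"): 0, ("b", "c"): 1, ("a", "c"): 0}
loads, cut = edge_partition_costs(assignment, k=2)
print(loads, cut)
```

In this toy triangle, vertices b and c each appear in both clusters, so each contributes one extra copy.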
Expected Costs of Random Assignments
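The formulas on this slide are missing from the extracted text. The sketch below gives two standard calculations for uniform random assignment to k clusters, offered as an illustration and not as a reconstruction of the slide: the probability that an edge is cut when its two end vertices are placed independently, and the expected number of clusters that hold at least one edge of a vertex with a given degree when edges are placed independently. The function names are hypothetical.

```python
def expected_cut_fraction_vertex(k):
    """Probability that an edge is cut when both end vertices are assigned
    independently and uniformly at random to one of k clusters."""
    return 1.0 - 1.0 / k

def expected_replicas_edge(degree, k):
    """Expected number of distinct clusters holding at least one edge of a
    vertex with the given degree, when each edge is assigned independently
    and uniformly at random to one of k clusters."""
    # A fixed cluster receives none of the `degree` edges with probability
    # (1 - 1/k) ** degree; sum the complement over the k clusters.
    return k * (1.0 - (1.0 - 1.0 / k) ** degree)

for d in (1, 10, 100):
    print(d, expected_replicas_edge(d, k=16))
```

Under these assumptions a low-degree vertex is replicated in roughly as many clusters as it has edges, while a high-degree vertex approaches one copy per cluster.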
Random Assignment Comparison
Approximation Guarantees
Approximation Guarantees (cont’d)
Streaming Heuristics
Online assignment of vertices or edges as they are observed in an input stream.
Irrevocable assignments: reassignments are expensive in web-scale systems (consistency of distributed state).
Use local graph knowledge (neighbourhood sets).
Scalable: one pass through the vertices or edges.
Previously proposed streaming heuristic: PowerGraph [OSDI 2012].
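PowerGraph Streaming Heuristic
Prioritizes assigning each edge to clusters that already contain its end vertices, which makes it prone to large load imbalance.
Assignment rules (the conditions and candidate sets shown as formulas on the slide are missing): place e to [...]; place e to a least loaded cluster.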
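A minimal sketch of a heuristic in this spirit is given below. It is a simplification written for illustration, not PowerGraph's actual implementation: it assumes the candidate clusters are those containing both end vertices, else those containing either one, else all clusters, and it picks a least loaded candidate with ties broken by lowest index. The name `powergraph_style_greedy` is hypothetical.

```python
def powergraph_style_greedy(edges, k):
    """One-pass, irrevocable edge assignment that prefers clusters already
    holding an end vertex of the edge (simplified, assumed rules)."""
    loads = [0] * k
    placed = {}        # vertex -> set of clusters that already contain it
    assignment = []
    for u, v in edges:
        a_u = placed.setdefault(u, set())
        a_v = placed.setdefault(v, set())
        # Prefer clusters with both end vertices, then either one, then any cluster.
        candidates = (a_u & a_v) or (a_u | a_v) or set(range(k))
        c = min(candidates, key=lambda i: (loads[i], i))  # least loaded candidate
        loads[c] += 1
        a_u.add(c)
        a_v.add(c)
        assignment.append(c)
    return assignment, loads

# A 4-cycle streamed edge by edge into k = 2 clusters.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
pg_assignment, pg_loads = powergraph_style_greedy(cycle, k=2)
print(pg_assignment, pg_loads)
```

On this connected stream every edge shares a vertex with an earlier edge, so all four edges follow the first one into the same cluster, which illustrates the load imbalance noted above.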
Greedy: Least Incremental Cost
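Only the title of this slide survived, so the cost function is unknown. The sketch below shows one way a least-incremental-cost greedy could look, under an assumed cost: the number of new vertex copies the edge would create in a cluster, plus a weight `lam` times that cluster's current load. The cost function, the parameter `lam`, and the function name are all assumptions made for this example.

```python
def least_incremental_cost(edges, k, lam=1.0):
    """One-pass edge assignment: place each edge in the cluster where the
    assumed incremental cost (new vertex copies + lam * load) is smallest."""
    loads = [0] * k
    members = [set() for _ in range(k)]  # vertices present in each cluster
    assignment = []
    for u, v in edges:
        # Incremental cost of cluster i; ties go to the lowest cluster index.
        c = min(
            range(k),
            key=lambda i: (len({u, v} - members[i]) + lam * loads[i], i),
        )
        loads[c] += 1
        members[c].update((u, v))
        assignment.append(c)
    return assignment, loads

# The same 4-cycle stream, k = 2 clusters.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
lic_assignment, lic_loads = least_incremental_cost(cycle, k=2)
print(lic_assignment, lic_loads)
```

With `lam=1.0` the load term eventually outweighs the benefit of co-locating end vertices, so the stream is split evenly; setting `lam=0` removes the balancing term and sends every edge of this stream to the first cluster.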
Experimental Evaluation
Performance of Random Assignment. Graph: Amazon
Streaming Heuristics. Graph: Amazon
Performance of Random Assignment (cont’d). Graph: Youtube
Concluding Remarks
Streaming Heuristics