Fast, Exact Graph Diameter Computation with Vertex Programming
Corey Pennycuff and Tim Weninger
SIGKDD Workshop on High Performance Graph Mining, August 10, 2015
Vertex-Centric Computing for Large Scale Graph Analytics
Dijkstra’s Single Source Shortest Path [Figure: example graph with vertices A through G and a distance table with columns A to G and a row for source A]
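For reference, the sequential baseline named on this slide can be sketched as follows. This is a minimal textbook version of Dijkstra's algorithm using a binary heap. The 7-vertex graph, the `EDGES` list, and the `dijkstra` helper are illustrative assumptions, not the graph or code from the slide.

```python
import heapq

def dijkstra(graph, source):
    """Return shortest-path distances from source to every reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already settled
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical undirected graph with unit edge weights (assumed, not from the slide).
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "G")]
graph = {v: [] for v in "ABCDEFG"}
for u, v in EDGES:
    graph[u].append((v, 1))
    graph[v].append((u, 1))

distances = dijkstra(graph, "A")
```

The heap holds every vertex on the frontier, which is what makes this approach hard to distribute once the graph no longer fits on one machine.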
Medium Graphs: 4 million nodes, 200 million edges
Bigger Graphs Solution – Hadoop [Figure: pipeline of data, mappers, shuffle and sort, reducers, result; label: DISK]
Graph Diameter: HADI, Reverse Cuthill-McKee, Random BFS
Bulk Synchronous Parallel (BSP): created in 1990 by Les Valiant and Bill McColl at Oxford [Figure: data passes through Superstep 0, Superstep 1, Superstep 2, Superstep 3 to result, with a barrier after each superstep; labels: Data kept in memory, DISK]
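The superstep and barrier structure can be sketched as a small single-process loop. This is a toy simulation under stated assumptions, not a real BSP framework: `run_bsp`, `propagate_max`, `NEIGHBORS`, and the sample values are all hypothetical names and data chosen for illustration. Messages sent in one superstep are only delivered after the barrier, at the start of the next.

```python
def run_bsp(neighbors, values, compute):
    """Run supersteps until no vertex sends a message; return the superstep count."""
    inbox = {v: [] for v in neighbors}
    superstep = 0
    while True:
        outbox = {v: [] for v in neighbors}
        sent = 0
        for v in neighbors:
            # Every vertex runs in superstep 0; afterwards only vertices with mail run.
            if superstep > 0 and not inbox[v]:
                continue
            values[v], outgoing = compute(v, values[v], inbox[v], superstep)
            for dest, msg in outgoing:
                outbox[dest].append(msg)
                sent += 1
        inbox = outbox  # barrier: messages become visible only in the next superstep
        superstep += 1
        if sent == 0:
            return superstep

# Hypothetical path graph A - B - C - D with arbitrary starting values.
NEIGHBORS = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

def propagate_max(v, value, messages, superstep):
    """Vertex program: adopt the largest value seen and forward it when it changes."""
    new_value = max([value] + messages)
    if superstep == 0 or new_value > value:
        return new_value, [(n, new_value) for n in NEIGHBORS[v]]
    return new_value, []  # nothing new: stay silent, which acts as a vote to halt

values = {"A": 3, "B": 6, "C": 2, "D": 1}
total_supersteps = run_bsp(NEIGHBORS, values, propagate_max)
```

All state lives in the `values` and `inbox` dictionaries between supersteps, mirroring the "data kept in memory" label on the slide.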
Graph Analytics with BSP: requires the programmer to “think like a vertex” [Figure: example graph with vertices A through F, …]
The Vertex. Each vertex can: receive messages from the previous superstep; modify its value/datum; send messages
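BSP Single Source Shortest Path
compute(MessageIterator* msgs){
  bool changed = false;
  // keep the smallest distance received in this superstep
  foreach(msg : msgs){
    if(msg < datum){ datum = msg; changed = true; }
  }
  if(changed){
    // a shorter path was found: offer it to every out-neighbor
    foreach(edge : GetOutEdgeIterator()){
      sendMessageTo(edge.dest, datum + edge.weight);
    }
  }else{
    voteToHalt();
  }
}
[Figure: example graph with vertices A through G]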
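The vertex program on this slide can be tried out with a small single-process simulation. The sketch below mirrors the `compute()` logic (take the minimum incoming distance, forward it if it improved, otherwise halt). The driver loop, the `bsp_sssp` name, and the 7-vertex unit-weight graph are assumptions made for illustration; a real framework would supply the messaging and the barrier.

```python
INF = float("inf")

def bsp_sssp(graph, source):
    """Simulate BSP single-source shortest path; return (distances, supersteps run)."""
    dist = {v: INF for v in graph}
    inbox = {v: [] for v in graph}
    inbox[source] = [0]  # the source is seeded with distance 0
    supersteps = 0
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, msgs in inbox.items():
            if not msgs:
                continue  # halted vertex with no mail stays asleep
            best = min(msgs)
            if best < dist[v]:
                dist[v] = best  # datum = msg; changed = true
                for dest, weight in graph[v]:
                    outbox[dest].append(best + weight)  # sendMessageTo(...)
            # otherwise nothing improved: the vertex votes to halt
        inbox = outbox  # barrier between supersteps
        supersteps += 1
    return dist, supersteps

# Hypothetical undirected graph with unit edge weights (assumed, not from the slide).
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "G")]
graph = {v: [] for v in "ABCDEFG"}
for u, v in EDGES:
    graph[u].append((v, 1))
    graph[v].append((u, 1))

dist, supersteps = bsp_sssp(graph, "A")
```

The run ends when a superstep produces no messages, which is the simulated equivalent of every vertex having voted to halt.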
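Dijkstra’s Single Source Shortest Path, Superstep 0 [Figure: master; example graph with vertices A through G, source A at distance 0; distance table with columns A to G, row A: 0]
Dijkstra’s Single Source Shortest Path, Superstep 1 [Figure: example graph with vertices A through G, source A at distance 0; distance table with columns A to G, row A: 0 1 1 2]
Dijkstra’s Single Source Shortest Path, Superstep 2 [Figure: example graph with vertices A through G, source A at distance 0; distance table with columns A to G, row A: 0 1 1 2]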
Supersteps - 1 = Node Eccentricity [Figure: example graph with vertices A through G, source A at distance 0; distance table with columns A to G, row A: 0 1 1 2]
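The relationship on this slide can be checked on a small example. The sketch below assumes supersteps are numbered from 0, that "Supersteps" means the index of the final superstep (the one in which every remaining message is rejected), and that the graph is unweighted, connected, and has at least two vertices. The graph and the helper names `last_superstep` and `bfs_eccentricity` are hypothetical.

```python
from collections import deque

def last_superstep(graph, source):
    """Simulate unweighted BSP SSSP and return the index of the final superstep."""
    dist = {v: float("inf") for v in graph}
    inbox = {source: [0]}
    superstep = -1
    while inbox:
        superstep += 1
        outbox = {}
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < dist[v]:
                dist[v] = best
                for n in graph[v]:
                    outbox.setdefault(n, []).append(best + 1)
        inbox = outbox  # barrier
    return superstep

def bfs_eccentricity(graph, source):
    """Reference answer: greatest BFS depth reachable from source."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for n in graph[u]:
            if n not in depth:
                depth[n] = depth[u] + 1
                queue.append(n)
    return max(depth.values())

# Hypothetical undirected, unweighted graph (assumed, not from the slide).
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "G")]
graph = {v: [] for v in "ABCDEFG"}
for u, v in EDGES:
    graph[u].append(v)
    graph[v].append(u)

ecc_from_supersteps = {v: last_superstep(graph, v) - 1 for v in graph}
```

The farthest vertex learns its distance in superstep e, where e is the eccentricity, then sends one more round of messages that are all rejected in superstep e + 1, which is where the "- 1" comes from under this numbering.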
Diameter Measurement [Figure: the example graph with vertices A through G, repeated seven times]
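One way to extend the eccentricity idea to the whole graph is to start a search from every vertex in the same run and let each message carry its source. The sketch below is an illustration under assumptions, not necessarily the authors' implementation: `bsp_diameter` and the sample graph are hypothetical, and the graph is taken to be unweighted, connected, and to have at least two vertices.

```python
def bsp_diameter(graph):
    """Run all single-source searches at once; diameter = final superstep index - 1."""
    seen = {v: set() for v in graph}   # sources whose wavefront has reached v
    inbox = {v: {v} for v in graph}    # every vertex seeds itself as a source
    superstep = -1
    while inbox:
        superstep += 1
        outbox = {}
        for v, sources in inbox.items():
            new = sources - seen[v]
            if new:
                seen[v] |= new
                # Forward only newly discovered sources to all neighbors.
                for n in graph[v]:
                    outbox.setdefault(n, set()).update(new)
            # a vertex with no new sources sends nothing (vote to halt)
        inbox = outbox  # barrier
    return superstep - 1

# Hypothetical undirected, unweighted graph (assumed, not from the slide).
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "G")]
graph = {v: [] for v in "ABCDEFG"}
for u, v in EDGES:
    graph[u].append(v)
    graph[v].append(u)

diameter = bsp_diameter(graph)
```

Because all wavefronts advance one hop per superstep, the run lasts as long as the longest shortest path, so the maximum eccentricity falls out of the superstep count without comparing per-source results.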
Limitations: must be synchronous; designed for unweighted graphs
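A small weighted example suggests why superstep counting is tied to unweighted graphs: supersteps advance one hop at a time, so the count tracks hop distance rather than weighted distance. The triangle graph, its weights, and the `last_superstep_weighted` helper are hypothetical and chosen only to show the mismatch.

```python
def last_superstep_weighted(graph, source):
    """Simulate weighted BSP SSSP; return (index of final superstep, distances)."""
    dist = {v: float("inf") for v in graph}
    inbox = {source: [0]}
    superstep = -1
    while inbox:
        superstep += 1
        outbox = {}
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < dist[v]:
                dist[v] = best
                for n, w in graph[v]:
                    outbox.setdefault(n, []).append(best + w)
        inbox = outbox  # barrier
    return superstep, dist

# Hypothetical weighted triangle: the direct edge A-B costs more than going via C.
graph = {
    "A": [("B", 10), ("C", 2)],
    "B": [("A", 10), ("C", 3)],
    "C": [("A", 2), ("B", 3)],
}

final_superstep, dist = last_superstep_weighted(graph, "A")
weighted_eccentricity = max(dist.values())
```

Here the distances are still correct, but the final superstep index minus one no longer matches the weighted eccentricity of A.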
Performance Results ER-Graphs (p=32%)
Performance Results SF-Graphs (k=3)
Performance Results Real World Graphs
Thank you