Synthesizing Concurrent Graph Data Structures: a Case Study
Roman Manevich, Rashid Kaleem, Keshav Pingali
University of Texas at Austin
vision
  Problem: how to utilize parallel hardware
  Programming model for parallel applications
  High-level language for parallelism
    Program in terms of sequential semantics
    Choose tuning parameters for better performance
    Decouple semantics from implementation
  Compiler synthesizes parallel code
  Correctness guarantees
    Avoids usual pitfalls: deadlocks, data races, etc.
    For any value of tuning parameters
this talk
  Parallelizing graph algorithms
  Implementing concurrent graph data structures
  Relational algebra
  Relation decomposition and tiling
  Autograph generates Java code
  Linearizability
  Speculation support: abstract locks + undos
context
  Graph algorithms are ubiquitous
    Computational biology
    Social networks
    Computer graphics
organization
  Speculative parallelism background
    Speculative parallelization via Galois
    Data structures for speculative parallelism
  Autograph
    Specifying relational data structures
    Optimizations
  Empirical evaluation
    Outperforms library data structures by up to 2x
minimum spanning tree problem
  [Figure: example graph on nodes a through g]
Boruvka’s algorithm
  Build MST bottom-up:
    repeat {
      pick arbitrary node ‘a’
      merge with lightest neighbor ‘lt’
      add edge ‘a-lt’ to MST
    } until graph is a single node
  [Figure: example graph on nodes a through g; merging a with its lightest neighbor lt leaves the nodes d, (a,c), b, e, f, g]
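The loop on this slide can be sketched sequentially in Python. This is a minimal illustration, not the Galois implementation: it assumes a connected, undirected graph, represents each (possibly merged) node by a dictionary of its lightest edges to other nodes, and uses the six-node weighted graph from the relational-representation slide as input. The names `boruvka_mst`, `NODES` and `EDGES` are made up for the example.

```python
NODES = ["a", "b", "c", "d", "e", "f"]
EDGES = [("a", "b", 5), ("a", "c", 2), ("b", "d", 4),
         ("c", "d", 7), ("d", "e", 1), ("e", "f", 6)]

def boruvka_mst(nodes, weighted_edges):
    """Return MST edges of a connected, undirected graph by repeated merging."""
    # adj[u][v] = (weight, original_edge): lightest edge between groups u and v.
    adj = {n: {} for n in nodes}
    for u, v, w in weighted_edges:
        adj[u][v] = (w, (u, v, w))
        adj[v][u] = (w, (u, v, w))
    mst = []
    while len(adj) > 1:                       # until graph is a single node
        a = next(iter(adj))                   # pick arbitrary node 'a'
        lt = min(adj[a], key=lambda n: adj[a][n][0])   # lightest neighbor 'lt'
        mst.append(adj[a][lt][1])             # add edge 'a-lt' to MST
        # Merge lt into a: redirect lt's edges to a, keeping the lighter
        # edge whenever a and lt share a neighbor.
        for n, entry in adj.pop(lt).items():
            del adj[n][lt]
            if n == a:
                continue
            if n not in adj[a] or entry[0] < adj[a][n][0]:
                adj[a][n] = entry
                adj[n][a] = entry
    return mst

mst = boruvka_mst(NODES, EDGES)
```

Each pass reads and rewrites only `a`, `lt` and the neighbors of `lt`, which is the locality that the following slides exploit for parallel execution.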
parallelism in Boruvka
  [Figure: example graph on nodes a through g]
non-conflicting iterations
  [Figure: two iterations working on disjoint parts of the graph; after both merges the graph has nodes d, (a,c), b, e, (f,g)]
conflicting iterations
  [Figure: example graph on nodes a through g, with iterations whose neighborhoods overlap]
Amorphous data-parallelism
  Algorithm = repeated application of operator to graph
    Active node: node where computation is needed
    Activity: application of operator to active node
    Neighborhood: sub-graph read/written to perform activity
  Unordered algorithms: active nodes can be processed in any order
  Parallel execution of activities, subject to neighborhood constraints
  Neighborhoods unknown at compile time
    Use speculation
  [Figure: graph with activities i1, i2, i3]
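To make the neighborhood constraint concrete, here is a small Python sketch for the Boruvka operator. It assumes that an activity at node `a` touches `a`, its lightest neighbor `lt`, and the neighbors of `lt` (whose edges are redirected by the merge); the talk does not spell out the exact neighborhood, so treat that definition as an assumption. The graph is the six-node example from the relational-representation slide plus a hypothetical node `g` attached to `f` with weight 3.

```python
# Assumed example graph; node g and the edge f-g (weight 3) are hypothetical.
UNDIRECTED = [("a", "b", 5), ("a", "c", 2), ("b", "d", 4), ("c", "d", 7),
              ("d", "e", 1), ("e", "f", 6), ("f", "g", 3)]

def adjacency(undirected_edges):
    """Build a symmetric adjacency map: adj[u][v] = weight."""
    adj = {}
    for u, v, w in undirected_edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def neighborhood(adj, a):
    """Nodes read or written by the activity at a (assumed definition)."""
    lt = min(adj[a], key=adj[a].get)      # lightest neighbor of a
    return {a, lt} | set(adj[lt])

def conflicts(adj, a1, a2):
    """Two activities conflict when their neighborhoods overlap."""
    return not neighborhood(adj, a1).isdisjoint(neighborhood(adj, a2))

ADJ = adjacency(UNDIRECTED)
```

Because `lt` depends on the current edge weights, the neighborhood is only known once the activity runs, which is why the slide falls back on speculation rather than static scheduling.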
optimistic parallelization in Galois
  Programming model
    Client code has sequential semantics
    Library of concurrent data structures
  Parallel execution model
    Thread-level speculation (TLS)
    Activities executed speculatively
  Conflict detection
    Each node has an associated exclusive lock
    Graph operations acquire locks on accessed nodes
    Lock owned by another thread → conflict → iteration rollback
  [Figure: graph with activities i1, i2, i3]
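The conflict-detection rule can be sketched as a lock table that maps each node to the iteration owning it. This is a single-threaded simulation written for illustration: a real runtime would take ownership with an atomic compare-and-set, and the names `NodeLocks`, `Conflict` and `try_iteration` are hypothetical rather than part of the Galois API.

```python
class Conflict(Exception):
    """Raised when a node is already owned by another iteration."""

class NodeLocks:
    def __init__(self):
        self.owner = {}                       # node -> owning iteration id

    def acquire(self, node, iteration):
        # Take the lock if it is free; re-acquiring one's own lock is a no-op.
        holder = self.owner.setdefault(node, iteration)
        if holder != iteration:
            raise Conflict(f"{node} is owned by {holder}")

    def release_all(self, iteration):
        for node in [n for n, o in self.owner.items() if o == iteration]:
            del self.owner[node]

def try_iteration(locks, iteration, touched_nodes):
    """Lock every accessed node; on conflict, release and report a rollback."""
    try:
        for node in touched_nodes:
            locks.acquire(node, iteration)
        return True                           # iteration may proceed
    except Conflict:
        locks.release_all(iteration)          # roll back: give up all locks
        return False

locks = NodeLocks()
results = [try_iteration(locks, "i1", ["a", "c", "d"]),
           try_iteration(locks, "i2", ["g", "f", "e"]),
           try_iteration(locks, "i3", ["e", "d"])]
```

Here `i1` and `i2` touch disjoint nodes and both succeed, while `i3` hits a node owned by `i2` and is rolled back without keeping any lock.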
concurrent data structure contract
  Linearizability [Herlihy & Wing TOPLAS’90]
    Method calls should appear to execute atomically
    Synchronization w.r.t. concrete data structure
  Support speculation [Pingali et al. PLDI’07] [Herlihy & Koskinen PPoPP’08]
    Methods acquire abstract locks
    Synchronization w.r.t. abstract data type
    Methods should register undo actions for rollback
  (Data-race freedom)
  (Deadlock freedom)
  (Non-blocking methods)
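The "register undo actions" part of the contract can be illustrated with a set whose mutating methods log their own inverse. This is a minimal sketch with invented names (`UndoLog`, `SpeculativeSet`); it leaves out locking entirely and only shows how a rollback restores the abstract state.

```python
class UndoLog:
    """Per-iteration log of inverse actions, replayed in reverse on rollback."""
    def __init__(self):
        self.actions = []

    def register(self, action):
        self.actions.append(action)

    def rollback(self):
        while self.actions:
            self.actions.pop()()              # undo the most recent change first

    def commit(self):
        self.actions.clear()                  # changes become permanent

class SpeculativeSet:
    def __init__(self, items=()):
        self.items = set(items)

    def add(self, x, log):
        if x not in self.items:
            self.items.add(x)
            log.register(lambda: self.items.discard(x))   # inverse of add

    def remove(self, x, log):
        if x in self.items:
            self.items.remove(x)
            log.register(lambda: self.items.add(x))       # inverse of remove

s = SpeculativeSet("abc")
log = UndoLog()
s.remove("a", log)
s.add("z", log)
after_speculation = set(s.items)
log.rollback()
```

Methods that change nothing register nothing, so a rollback replays exactly the changes the iteration actually made.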
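library graph data structure
  set of nodes:
  [Figure: one linked list per thread id (0, 1, 2, 3), each starting at a dummy cell and chained by next pointers, together holding nodes a through f; nodes carry an in_flag, shown as in_flag=1]
  Boruvka only removes nodes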
customized graph data structure
  set of nodes:
  [Figure: a single linked list of nodes a through f chained by next pointers, with in_flag=1; remove(d) changes the in_flag of d from 1 to 0]
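A rough Python rendering of the customized set: since Boruvka only removes nodes, the set can be one linked list built up front, with removal reduced to clearing a flag. The class and field names below are illustrative, the `index` dictionary is a convenience added for the example, and the sketch ignores the atomicity a concurrent version would need.

```python
class Cell:
    def __init__(self, value):
        self.value = value
        self.in_flag = 1          # 1 = member of the set, 0 = removed
        self.next = None

class RemoveOnlyLinkedSet:
    """Set built once; afterwards elements are only ever removed."""
    def __init__(self, values):
        self.head = None
        self.index = {}           # value -> cell (lookup helper for the example)
        for v in reversed(list(values)):
            cell = Cell(v)
            cell.next = self.head
            self.head = cell
            self.index[v] = cell

    def remove(self, value):
        cell = self.index[value]
        was_member = cell.in_flag == 1
        cell.in_flag = 0          # logical deletion; the list is not relinked
        return was_member

    def __iter__(self):
        cell = self.head
        while cell is not None:
            if cell.in_flag:
                yield cell.value
            cell = cell.next

nodes = RemoveOnlyLinkedSet("abcdef")
first = nodes.remove("d")
second = nodes.remove("d")
```

Removal never touches `next` pointers, so it does not interfere with traversals of the list; that is the kind of workload-specific shortcut a general-purpose library set cannot assume.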
high-level spec at a glance
  semantics
    Structure
      nodes : rel(node)
      edges : rel(src, dst, wt)
      FD {src, dst} → {wt}
      FK src → node
      FK dst → node
    Methods
      edgeExists : contains(src, dst)
      removeNode : remove(node)
      findMin : map(src, out dst, out wt) {
        if (wt < minWeight) { lt = dst; minWeight = wt; }
      }
      ... other methods ...
  implementation
    Decomposition
      nodes : Set(node)
      edges : List(
        edgesOut : Map(src, succ : Map(dst, wt))
        edgesIn : Map(dst, pred : Set(src))
      )
    Tiling
      edges tile ListTile
      nodes tile AttArrLinkedSet
      edgesOut tile AttMap
      edgesIn tile AttMap
      succ tile DualArrayMap
      pred tile ArraySet
specifying a graph for Boruvka
  Structure
    nodes : rel(node)
    edges : rel(src, dst, wt)
    FD {src, dst} → {wt}
    FK src → node
    FK dst → node
relational representation of graph
  nodes
    node
    a
    b
    c
    d
    e
    f
  edges
    src  dst  wt
    a    b    5
    a    c    2
    b    d    4
    c    d    7
    d    e    1
    e    f    6
    b    a    5
    c    a    2
    d    b    4
    d    c    7
    e    d    1
    f    e    6
  [Figure: the corresponding weighted graph on nodes a through f]
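The two relations and their constraints can be written down directly as Python sets of tuples. This is a sketch of the semantics only: the checker names `satisfies_fd` and `satisfies_fk` are made up, and Autograph's own handling of the constraints is not shown in the slides.

```python
NODES = {"a", "b", "c", "d", "e", "f"}
EDGES = {("a", "b", 5), ("a", "c", 2), ("b", "d", 4), ("c", "d", 7),
         ("d", "e", 1), ("e", "f", 6), ("b", "a", 5), ("c", "a", 2),
         ("d", "b", 4), ("d", "c", 7), ("e", "d", 1), ("f", "e", 6)}

def satisfies_fd(edges):
    """FD {src, dst} -> {wt}: a (src, dst) pair determines a single weight."""
    seen = {}
    for src, dst, wt in edges:
        if seen.setdefault((src, dst), wt) != wt:
            return False
    return True

def satisfies_fk(nodes, edges):
    """FK src -> node and FK dst -> node: both endpoints must be nodes."""
    return all(src in nodes and dst in nodes for src, dst, _ in edges)
```

For instance, adding `("a", "b", 9)` would violate the functional dependency, and adding `("a", "z", 1)` would violate the foreign key on `dst`.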
specifying methods
  Methods
    edgeExists : contains(src, dst)
    removeNode : remove(node)
    findMin : map(src, out dst, out wt) {
      if (wt < minWeight) { lt = dst; minWeight = wt; }
    }
    ... other methods ...
  Can we implement efficiently?
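Read literally against the flat `edges` relation, the three methods look like the Python below. This is an interpretation rather than generated code: in particular, having `remove_node` also delete incident edges is an assumption made here to keep the foreign keys valid.

```python
UNDIRECTED = [("a", "b", 5), ("a", "c", 2), ("b", "d", 4),
              ("c", "d", 7), ("d", "e", 1), ("e", "f", 6)]
nodes = {"a", "b", "c", "d", "e", "f"}
# Each undirected edge is stored in both directions, as in the edges table.
edges = {(s, d, w) for u, v, w in UNDIRECTED for s, d in ((u, v), (v, u))}

def edge_exists(edges, src, dst):
    """contains(src, dst): scan the relation for a matching tuple."""
    return any(s == src and d == dst for s, d, _ in edges)

def find_min(edges, src):
    """map(src, out dst, out wt): visit tuples with this src, keep the lightest."""
    lt, min_weight = None, float("inf")
    for s, dst, wt in edges:
        if s == src and wt < min_weight:
            lt, min_weight = dst, wt
    return lt, min_weight

def remove_node(nodes, edges, node):
    """remove(node); incident edges are dropped too (assumed FK behavior)."""
    nodes.discard(node)
    edges.difference_update({e for e in edges if node in (e[0], e[1])})

lightest_from_a = find_min(edges, "a")
had_edge = edge_exists(edges, "b", "d")
remove_node(nodes, edges, "d")
```

Every call scans the whole relation, which is the inefficiency behind the question on the slide.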
decomposing relations
  Decomposition
    nodes : Set
      node
    edges : List
      edgesOut : Map
        src → succ : Map
          dst → wt
      edgesIn : Map
        dst → pred : Set
          src
decomposed representation
  Decomposition
    nodes : Set(node)
    edges : List(
      edgesOut : Map(src, succ : Map(dst, wt))
      edgesIn : Map(dst, pred : Set(src))
    )
  nodes: a, b, c, d, e, f
  edgesOut
    src  succ (dst wt)
    a    b 5, c 2
    b    d 4, a 5
    c    d 7, a 2
    d    e 1, b 4, c 7
    e    f 6, d 1
    f    e 6
  edgesIn
    dst  pred (src)
    a    b, c
    b    a, d
    c    a, d
    d    b, c, e
    e    d, f
    f    e
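The decomposition can be mimicked with nested Python dictionaries and sets: `edges_out` indexes successors by `src`, and `edges_in` indexes predecessors by `dst`. The function names are illustrative, and plain `dict` and `set` stand in for whatever containers the tiling step would choose.

```python
UNDIRECTED = [("a", "b", 5), ("a", "c", 2), ("b", "d", 4),
              ("c", "d", 7), ("d", "e", 1), ("e", "f", 6)]
EDGES = {(s, d, w) for u, v, w in UNDIRECTED for s, d in ((u, v), (v, u))}

def decompose(nodes, edges):
    """Split the flat relations into nodes, edgesOut and edgesIn."""
    edges_out, edges_in = {}, {}
    for src, dst, wt in edges:
        edges_out.setdefault(src, {})[dst] = wt      # src -> succ : dst -> wt
        edges_in.setdefault(dst, set()).add(src)     # dst -> pred : {src}
    return set(nodes), edges_out, edges_in

def find_min(edges_out, src):
    """One lookup by src, then a scan of that node's successors only."""
    succ = edges_out.get(src, {})
    if not succ:
        return None, float("inf")
    lt = min(succ, key=succ.get)
    return lt, succ[lt]

def remove_node(nodes, edges_out, edges_in, node):
    """Remove a node and keep both edge indexes consistent."""
    nodes.discard(node)
    for dst in edges_out.pop(node, {}):
        edges_in.get(dst, set()).discard(node)
    for src in edges_in.pop(node, set()):
        edges_out.get(src, {}).pop(node, None)

nodes, edges_out, edges_in = decompose("abcdef", EDGES)
succ_of_d = dict(edges_out["d"])
pred_of_d = set(edges_in["d"])
lightest_from_a = find_min(edges_out, "a")
remove_node(nodes, edges_out, edges_in, "d")
```

`edges_in` is what lets `remove_node` find the edges pointing at the removed node without scanning every successor map.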
findMin(a)
  [Figure: animation over the decomposed representation (nodes, edgesOut, edgesIn) tracing the evaluation of findMin(a)]
findMin(a): abstract locks
  [Figure: the decomposed representation (nodes, edgesOut, edgesIn), marking the abstract locks acquired by findMin(a)]
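One plausible reading of this slide, sketched in Python: `findMin(a)` takes an abstract lock on `a` and on every successor it visits, keyed by the node value rather than by any concrete map entry. Exactly which locks the generated code acquires is not recoverable from the figure, so the lock set below is an assumption, and the lock table is a single-threaded stand-in with hypothetical names.

```python
class Conflict(Exception):
    """Raised when an abstract lock is held by another iteration."""

def acquire(owner, key, iteration):
    # owner maps an abstract key (here a node value) to the holding iteration.
    if owner.setdefault(key, iteration) != iteration:
        raise Conflict(key)

def find_min_locked(edges_out, owner, iteration, src):
    """findMin that locks src and each visited successor (assumed lock set)."""
    acquire(owner, src, iteration)
    lt, min_weight = None, float("inf")
    for dst, wt in edges_out.get(src, {}).items():
        acquire(owner, dst, iteration)
        if wt < min_weight:
            lt, min_weight = dst, wt
    return lt, min_weight

EDGES_OUT = {"a": {"b": 5, "c": 2}, "b": {"d": 4, "a": 5},
             "c": {"d": 7, "a": 2}, "d": {"e": 1, "b": 4, "c": 7},
             "e": {"f": 6, "d": 1}, "f": {"e": 6}}
owner = {}
result = find_min_locked(EDGES_OUT, owner, "i1", "a")
try:
    find_min_locked(EDGES_OUT, owner, "i2", "b")
    second_conflicted = False
except Conflict:
    second_conflicted = True
```

The locks name abstract nodes, so the same conflict is detected no matter which concrete containers back `edgesOut` and `succ`.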
“tiles”: concretizing sub-relations
  Tiling
    edges tile ListTile
    nodes tile AttArrLinkedSet
    edgesOut tile AttMap
    edgesIn tile AttMap
    succ tile DualArrayMap
    pred tile ArraySet
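The idea of tiling, choosing a concrete container per sub-relation without changing method semantics, can be sketched in Python with two interchangeable map tiles. Both classes are hypothetical: `ParallelArrayMapTile` is only a guess at the flavor of an array-backed map and should not be read as the actual `DualArrayMap`.

```python
class DictMapTile:
    """Map tile backed by a hash table."""
    def __init__(self):
        self._d = {}
    def put(self, key, value):
        self._d[key] = value
    def items(self):
        return list(self._d.items())

class ParallelArrayMapTile:
    """Map tile backed by two parallel lists; suited to very small maps."""
    def __init__(self):
        self._keys, self._vals = [], []
    def put(self, key, value):
        if key in self._keys:
            self._vals[self._keys.index(key)] = value
        else:
            self._keys.append(key)
            self._vals.append(value)
    def items(self):
        return list(zip(self._keys, self._vals))

def build_edges_out(edges, succ_tile):
    """Build edgesOut, using the given tile class for every succ map."""
    edges_out = {}
    for src, dst, wt in edges:
        if src not in edges_out:
            edges_out[src] = succ_tile()
        edges_out[src].put(dst, wt)
    return edges_out

def find_min(edges_out, src):
    lt, min_weight = None, float("inf")
    for dst, wt in edges_out[src].items():
        if wt < min_weight:
            lt, min_weight = dst, wt
    return lt, min_weight

EDGES = [("a", "b", 5), ("a", "c", 2), ("b", "d", 4), ("b", "a", 5),
         ("c", "d", 7), ("c", "a", 2)]
answers = [find_min(build_edges_out(EDGES, tile), "a")
           for tile in (DictMapTile, ParallelArrayMapTile)]
```

`find_min` is written once against the tile interface, so swapping the tile changes only the performance profile, not the result.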
nodes tile AttArrLinkedSet
  [Figure: the nodes sub-relation (a through f) stored as one linked list per thread id (0, 1, 2, 3), each starting at a dummy cell and chained by next pointers; nodes carry an in_flag, shown as in_flag=1]
nodes tile AttLinkedSet
  [Figure: the nodes sub-relation (a through f) stored as a single linked list chained by next pointers, with in_flag=1]
optimizations
  Customizing tiles
  Customize nodes set for concurrent deletions
  Customize successor/predecessor maps for primitive types
  Customize map operations
  Inlining
  Selecting relevant attributes
  Handling auxiliary state
  Loop fusion for read-only operations
organization
  Speculative parallelism background
    Speculative parallelization of graph algorithms
    Data structures for speculative parallelism
  Autograph
    Specifying relational data structures
    Optimizations
  Empirical evaluation
  Related work + conclusion
experiments
  Specified graph data structures
  Used Autograph to generate Java code
  Compared
    Generated data structures
    Library data structures (from Galois)
    Hand-written parallel benchmarks
  Show relative effect of different optimizations
Boruvka: running times comparison
  [Figure: running-time charts]
Boruvka: effect of optimizations
  [Figure: chart]
Delaunay mesh refinement: times
  [Figure: chart]
Single-source shortest path: times
  [Figure: chart]
writing graph applications yesterday
  [Figure: software stack]
    Graph Application
    Concurrent Data Structure Library: Morph Graph, LC Graph, Set, …, Map
    Galois Runtime
  Roles: expert programmer, concurrency expert, Joe programmer
  + Correct
  ? Efficient (non-customizable)
writing graph applications today
  [Figure: software stack]
    Graph Application
    Data structure specification → Autograph → Data structure implementation
    Galois Runtime
  Roles: Joe programmer, Joe++ programmer
  + Correct
  + Customizable
  + Speedup over library data structures
Grazie!
  Download Galois from
  Expect Autograph in next Galois release