
1 Program Analysis and Synthesis of Parallel Systems Roman Manevich, Ben-Gurion University

2 Three papers 1. A Shape Analysis for Optimizing Parallel Graph Programs [POPL’11] 2. Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA’12] 3. Parameterized Verification of Transactional Memories [PLDI’10]

3 What’s the connection? A Shape Analysis for Optimizing Parallel Graph Programs [POPL’11] Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA’12] Parameterized Verification of Transactional Memories [PLDI’10] From analysis to language design Creates opportunities for more optimizations. Requires other analyses Similarities between abstract domains


5 A Shape Analysis for Optimizing Parallel Graph Programs Dimitrios Prountzos 1 Keshav Pingali 1,2 Roman Manevich 2 Kathryn S. McKinley 1 1: Department of Computer Science, The University of Texas at Austin 2: Institute for Computational Engineering and Sciences, The University of Texas at Austin

6 Motivation 6 Graph algorithms are ubiquitous Goal: Compiler analysis for optimization of parallel graph algorithms Computational biology Social Networks Computer Graphics

7 Minimum Spanning Tree Problem [figure: example weighted graph with nodes a–g]

8 [figure: the example weighted graph with nodes a–g]

9 Boruvka’s Minimum Spanning Tree Algorithm Build MST bottom-up: repeat { pick arbitrary node ‘a’; merge with lightest neighbor ‘lt’; add edge ‘a-lt’ to MST } until graph is a single node [figure: node a merged with its lightest neighbor into a,c]
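The merge-based pseudocode above is equivalent to the classic component-based formulation of Boruvka’s algorithm. Below is a minimal sequential Java sketch of that formulation using a union-find structure; the class and method names are illustrative and not part of the Galois library:

```java
import java.util.*;

class BoruvkaSketch {
    // Union-find with path halving.
    static int find(int[] parent, int x) {
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    }

    /** MST weight of a connected graph with n nodes; edges are {u, v, w}. */
    static int mstWeight(int n, int[][] edges) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        int components = n, total = 0;
        while (components > 1) {
            int[] cheapest = new int[n];               // cheapest edge per component root
            Arrays.fill(cheapest, -1);
            for (int i = 0; i < edges.length; i++) {
                int ru = find(parent, edges[i][0]), rv = find(parent, edges[i][1]);
                if (ru == rv) continue;                // edge inside one component
                if (cheapest[ru] < 0 || edges[i][2] < edges[cheapest[ru]][2]) cheapest[ru] = i;
                if (cheapest[rv] < 0 || edges[i][2] < edges[cheapest[rv]][2]) cheapest[rv] = i;
            }
            for (int c = 0; c < n; c++) {
                if (cheapest[c] < 0) continue;
                int[] e = edges[cheapest[c]];
                int ru = find(parent, e[0]), rv = find(parent, e[1]);
                if (ru == rv) continue;                // already merged this round
                parent[ru] = rv;                       // merge: the ‘a’–‘lt’ edge joins the MST
                total += e[2];
                components--;
            }
        }
        return total;
    }
}
```

With distinct edge weights each round merges components along their lightest outgoing edges, mirroring the per-node merges on the slide.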

10 Parallelism in Boruvka [figure: example weighted graph] Build MST bottom-up: repeat { pick arbitrary node ‘a’; merge with lightest neighbor ‘lt’; add edge ‘a-lt’ to MST } until graph is a single node

11 Non-conflicting iterations [figure: two disjoint neighborhoods in the example graph] Build MST bottom-up: repeat { pick arbitrary node ‘a’; merge with lightest neighbor ‘lt’; add edge ‘a-lt’ to MST } until graph is a single node

12 Non-conflicting iterations Build MST bottom-up: repeat { pick arbitrary node ‘a’; merge with lightest neighbor ‘lt’; add edge ‘a-lt’ to MST } until graph is a single node [figure: the graph after both merges, with merged nodes a,c and f,g]

13 Conflicting iterations [figure: two overlapping neighborhoods in the example graph] Build MST bottom-up: repeat { pick arbitrary node ‘a’; merge with lightest neighbor ‘lt’; add edge ‘a-lt’ to MST } until graph is a single node

14 Parallelism in Boruvka Algorithm = repeated application of operator to graph – Active node: node where computation is needed – Activity: application of operator to active node – Neighborhood: sub-graph read/written to perform activity – Unordered algorithms: active nodes can be processed in any order Amorphous data-parallelism – parallel execution of activities, subject to neighborhood constraints Neighborhoods are functions of runtime values – parallelism cannot be uncovered at compile time in general

15 Optimistic parallelization in Galois Programming model – Client code has sequential semantics – Library of concurrent data structures Parallel execution model – Thread-level speculation (TLS) – Activities executed speculatively Conflict detection – Each node/edge has an associated exclusive lock – Graph operations acquire locks on read/written nodes/edges – Lock owned by another thread ⇒ conflict ⇒ iteration rolled back – All locks released at the end Two main overheads – Locking – Undo actions

16 Generic optimization structure Program → Program Analyzer → Annotated Program → Program Transformer → Optimized Program

17 Overheads (I): locking Optimizations – Redundant locking elimination – Lock removal for iteration-private data – Lock removal for lock domination ACQ(P): set of definitely acquired locks per program point P Given method call M at P: Locks(M) ⊆ ACQ(P) ⇒ redundant locking
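ACQ(P) is must-information, so it is naturally computed by a forward dataflow analysis whose join is set intersection. The following toy Java sketch (illustrative names, not the paper’s implementation) computes definitely-acquired locksets over a block-level CFG:

```java
import java.util.*;

class LocksetAnalysis {
    /**
     * Toy must-lockset analysis. Block i acquires the locks in acq.get(i);
     * preds.get(i) lists its CFG predecessors (block 0 is the entry).
     * ACQ-in(i) = intersection of the predecessors' ACQ-out sets (must-info);
     * ACQ-out(i) = ACQ-in(i) ∪ acq(i). Iterates to a fixpoint.
     */
    static List<Set<String>> acqOut(List<Set<String>> acq, List<List<Integer>> preds) {
        int n = acq.size();
        Set<String> all = new HashSet<>();
        for (Set<String> s : acq) all.addAll(s);
        List<Set<String>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) out.add(new HashSet<>(all)); // initialize to top
        out.set(0, new HashSet<>(acq.get(0)));                   // entry block
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 1; i < n; i++) {
                Set<String> in = new HashSet<>(all);
                for (int p : preds.get(i)) in.retainAll(out.get(p)); // meet = intersection
                in.addAll(acq.get(i));
                if (!in.equals(out.get(i))) { out.set(i, in); changed = true; }
            }
        }
        return out;
    }
}
```

Intersection at joins is what makes the analysis sound for lock elision: a lock is only treated as held if it is held on every path reaching P.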

18 Overheads (II): undo actions Lockset grows → lockset stable → failsafe foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a); } Program point P is failsafe if: ∀Q: Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)

19 Lockset analysis [Boruvka client code as on slide 18] Redundant locking: Locks(M) ⊆ ACQ(P) Undo elimination: ∀Q: Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P) Need to compute ACQ(P): runtime overhead

20 The optimization technically Each graph method m(arg1,…,argk, flag) takes an optimization-level flag: flag=LOCK – acquire locks; flag=UNDO – log undo (backup) data; flag=LOCK_UNDO – (default) acquire locks and log undo; flag=NONE – no extra work Example: Edge e = g.getEdge(lt, n, NONE)

21 Analysis challenges The usual suspects: – Unbounded memory ⇒ undecidability – Aliasing, destructive updates Specific challenges: – Complex ADTs: unstructured graphs – Heap objects are locked – Adapt abstraction to ADTs We use Abstract Interpretation [CC’77] – Balance precision and realistic performance

22 Shape analysis overview HashMap-Graph Tree-based Set ………… Graph { @rep nodes @rep edges … } Graph Spec Concrete ADT Implementations in Galois library Predicate Discovery Shape Analysis Boruvka.java Optimized Boruvka.java Set { @rep cont … } Set Spec ADT Specifications

23 ADT specification Graph { @rep set nodes @rep set edges Set<Node> neighbors(Node n); } Graph Spec ... Set<Node> S1 = g.neighbors(n); ... Boruvka.java Abstract ADT state by virtual set fields @locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src) @op( nghbrs = n.rev(src).dst + n.rev(dst).src, ret = new Set<Node>(cont=nghbrs) ) Assumption: implementation satisfies spec

24 Modeling ADTs Graph { @rep set nodes @rep set edges @locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src) @op( nghbrs = n.rev(src).dst + n.rev(dst).src, ret = new Set<Node>(cont=nghbrs) ) Set<Node> neighbors(Node n); } [figure: graph with nodes a, b, c linked by src/dst edge objects] Graph Spec

25 Modeling ADTs [figure: abstract state with nodes/edges sets and virtual fields cont, ret, nghbrs] Graph Spec Graph { @rep set nodes @rep set edges @locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src) @op( nghbrs = n.rev(src).dst + n.rev(dst).src, ret = new Set<Node>(cont=nghbrs) ) Set<Node> neighbors(Node n); }

26 Abstraction scheme [figure: sets S1, S2 with locked contents, abstracted to (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont)] Parameterized by set of LockPaths: L(Path) ≡ ∀o. o ∊ Path ⇒ Locked(o) – Tracks subset of must-be-locked objects Abstract domain elements have the form: Aliasing-configs → 2^LockPaths

27 Joining abstract states Aliasing is crucial for precision May-be-locked does not enable our optimizations #Aliasing-configs: small constant (≤ 6)

28 Example invariant in Boruvka [Boruvka client code as on slide 18] The immediate neighbors of a and lt are locked: ( a ≠ lt ) ∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst)) ∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src) ∧ L(lt) ∧ L(lt.rev(dst)) ∧ L(lt.rev(src)) ∧ L(lt.rev(dst).src) ∧ L(lt.rev(src).dst) …

29 Heuristics for finding LockPaths Hierarchy Summarization (HS) – x.( fld )* – Type hierarchy graph acyclic ⇒ bounded number of paths – Preflow-Push: L(S.cont) ∧ L(S.cont.nd) Nodes in set S and their data are locked [figure: Set S –cont→ Node –nd→ NodeData]

30 Footprint graph heuristic Footprint Graphs (FG)[Calcagno et al. SAS’07] – All acyclic paths from arguments of ADT method to locked objects – x.( fld | rev(fld) )* – Delaunay Mesh Refinement: L(S.cont) ∧ L(S.cont.rev(src)) ∧ L(S.cont.rev(dst)) ∧ L(S.cont.rev(src).dst) ∧ L(S.cont.rev(dst).src) – Nodes in set S and all of their immediate neighbors are locked Composition of HS, FG – Preflow-Push: L(a.rev(src).ed) 30 FG HS

31 Experimental evaluation Implement on top of TVLA – Encode abstraction by 3-Valued Shape Analysis [SRW TOPLAS’02] Evaluation on 4 Lonestar Java benchmarks Inferred all available optimizations # abstract states practically linear in program size Benchmark – Analysis Time (sec): Boruvka MST – 6; Preflow-Push Maxflow – 7; Survey Propagation – 12; Delaunay Mesh Refinement – 16

32 Impact of optimizations for 8 threads 8-core Intel Xeon @ 3.00 GHz

33 Note 1 How to map the abstract domain presented so far to TVLA? – Example invariant: (x≠y ∧ L(y.nd)) ∨ (x=y ∧ L(x.nd)) – Unary abstraction predicate x(v) for pointer x – Unary non-abstraction predicate L[x.p] for pointer x and path p – Use partial join – Resulting abstraction similar to the one shown

34 Note 2 How to come up with an abstraction for similar problems? 1. Start by constructing a manual proof (Hoare logic) 2. Examine resulting invariants and generalize into a language of formulas May need to be further specialized for a given program – interesting problem (machine learning/refinement) – How to get sound transformers?

35 Note 3 How did we avoid considering all interleavings? Proved non-interference side theorem

36 Elixir : A System for Synthesizing Concurrent Graph Programs Dimitrios Prountzos 1 Roman Manevich 2 Keshav Pingali 1 1. The University of Texas at Austin 2. Ben-Gurion University of the Negev

37 Goal Allow programmer to easily implement correct and efficient parallel graph algorithms Graph algorithms are ubiquitous Social network analysis, Computer graphics, Machine learning, … Difficult to parallelize due to their irregular nature Best algorithm and implementation usually – Platform dependent – Input dependent Need to easily experiment with different solutions Focus: Fixed graph structure Only change labels on nodes and edges Each activity touches a fixed number of nodes

38 Example: Single-Source Shortest-Path Problem formulation – Compute shortest distance from source node S to every other node Many algorithms – Bellman-Ford (1957) – Dijkstra (1959) – Chaotic relaxation (Miranker 1969) – Delta-stepping (Meyer et al. 1998) Common structure – Each node has label dist with known shortest distance from S Key operation – relax-edge(u,v): if dist(A) + W_AC < dist(C) then dist(C) = dist(A) + W_AC [figure: example graph; relaxing edge (A,C) with weight W_AC = 3]

39 Dijkstra’s algorithm Scheduling of relaxations: use a priority queue of nodes, ordered by label dist Iterate over nodes u in priority order On each step: relax all neighbors v of u – apply relax-edge to all (u,v) [figure: example graph with final distances]
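The scheduling just described can be sketched in Java with a standard binary-heap priority queue; stale queue entries are skipped instead of being decreased in place. This is an illustrative sequential sketch, not the Galois implementation:

```java
import java.util.*;

class DijkstraSketch {
    /** Shortest distances from src; edges are {u, v, w} with w >= 0. */
    static int[] shortest(int n, int[][] edges, int src) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(new int[]{e[1], e[2]});
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        // Priority queue of {node, tentative dist}, ordered by dist.
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
        pq.add(new int[]{src, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (top[1] > dist[u]) continue;          // stale entry: a shorter path was found
            for (int[] e : adj.get(u)) {             // relax-edge(u, v) for all neighbors v
                int v = e[0], w = e[1];
                if (dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    pq.add(new int[]{v, dist[v]});
                }
            }
        }
        return dist;
    }
}
```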

40 Chaotic relaxation Scheduling of relaxations: use an unordered set of edges Iterate over edges (u,v) in any order On each step: – apply relax-edge to edge (u,v) [figure: example graph; worklist (S,A) (B,C) (C,D) (C,E)]
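Chaotic relaxation can be sketched as a worklist algorithm that processes edges in arbitrary order until no relaxation applies; with non-negative weights, any processing order reaches the same fixpoint. An illustrative sequential Java sketch:

```java
import java.util.*;

class ChaoticRelaxation {
    static final int INF = Integer.MAX_VALUE / 2;

    /** Relax edges {u, v, w} in arbitrary order until fixpoint. */
    static int[] relaxToFixpoint(int n, int[][] edges, int src) {
        int[] dist = new int[n];
        Arrays.fill(dist, INF);
        dist[src] = 0;
        // Unordered worklist: FIFO here, but any processing order converges
        // to the same fixpoint (weights assumed non-negative).
        Deque<int[]> wl = new ArrayDeque<>(Arrays.asList(edges));
        while (!wl.isEmpty()) {
            int[] e = wl.poll();
            int u = e[0], v = e[1], w = e[2];
            if (dist[u] + w < dist[v]) {             // apply relax-edge(u, v)
                dist[v] = dist[u] + w;
                // the update may enable relaxations of v's outgoing edges
                for (int[] e2 : edges) if (e2[0] == v) wl.add(e2);
            }
        }
        return dist;
    }
}
```

The priority-free worklist is what makes the schedule "chaotic": correctness does not depend on the order, only performance does.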

41 Insights behind Elixir What should be done How it should be done Unordered/Ordered algorithms Operator Delta : activity Parallel Graph Algorithm Operators Schedule Order activity processing Identify new activities Identify new activities Static Schedule Static Schedule Dynamic Schedule “TAO of parallelism” PLDI 2011

42 Insights behind Elixir Parallel Graph Algorithm = Operators + Schedule (order activity processing, identify new activities; static schedule / dynamic schedule) Dijkstra-style Algorithm: q = new PrQueue q.enqueue(SRC) while (!q.empty) { a = q.dequeue for each e = (a,b,w) { if dist(a) + w < dist(b) { dist(b) = dist(a) + w q.enqueue(b) } } }

43 Contributions Language – Operators/Schedule separation – Allows exploration of implementation space Operator Delta Inference – Precise Delta required for efficient fixpoint computations Automatic Parallelization – Inserts synchronization to atomically execute operators – Avoids data-races / deadlocks – Specializes parallelization based on scheduling constraints Parallel Graph Algorithm Operators Schedule Order activity processing Identify new activities Static Schedule Dynamic Schedule Synchronization

44 SSSP in Elixir Graph [ nodes(node : Node, dist : int) edges(src : Node, dst : Node, wt : int) ] relax = [ nodes(node a, dist ad) nodes(node b, dist bd) edges(src a, dst b, wt w) bd > ad + w ] ➔ [ bd = ad + w ] sssp = iterate relax ≫ schedule Graph type Operator Fixpoint Statement 44

45 Operators Graph [ nodes(node : Node, dist : int) edges(src : Node, dst : Node, wt : int) ] relax = [ nodes(node a, dist ad) nodes(node b, dist bd) edges(src a, dst b, wt w) bd > ad + w ] ➔ [ bd = ad + w ] sssp = iterate relax ≫ schedule Redex pattern Guard Update [figure: redex before (bd) and after (ad+w) applying relax] Cautious by construction – easy to generalize
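The operator’s separation into a redex pattern, a guard, and an update can be mirrored directly in code. A minimal Java sketch (illustrative names; the real Elixir compiler generates C++ from the spec):

```java
class RelaxOperator {
    /** An edge redex: node labels ad, bd and edge weight w. */
    static final class Redex {
        final int ad, bd, w;
        Redex(int ad, int bd, int w) { this.ad = ad; this.bd = bd; this.w = w; }
    }

    // Guard and update mirror the Elixir operator: bd > ad + w  ➔  bd = ad + w.
    static boolean guard(Redex r) { return r.bd > r.ad + r.w; }
    static Redex update(Redex r) { return new Redex(r.ad, r.ad + r.w, r.w); }

    /** Cautious application: read the whole redex, test the guard, only then write. */
    static Redex apply(Redex r) { return guard(r) ? update(r) : r; }
}
```

Reading the full neighborhood before any write is what "cautious by construction" refers to: it lets the synthesizer acquire all locks up front.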

46 Fixpoint statement Graph [ nodes(node : Node, dist : int) edges(src : Node, dst : Node, wt : int) ] relax = [ nodes(node a, dist ad) nodes(node b, dist bd) edges(src a, dst b, wt w) bd > ad + w ] ➔ [ bd = ad + w ] sssp = iterate relax ≫ schedule Apply operator until fixpoint Scheduling expression 46

47 Scheduling examples Graph [ nodes(node : Node, dist : int) edges(src : Node, dst : Node, wt : int) ] relax = [ nodes(node a, dist ad) nodes(node b, dist bd) edges(src a, dst b, wt w) bd > ad + w ] ➔ [ bd = ad + w ] sssp = iterate relax ≫ schedule Locality-enhanced label-correcting: group b ≫ unroll 2 ≫ approx metric ad Dijkstra-style: metric ad ≫ group b q = new PrQueue q.enqueue(SRC) while (!q.empty) { a = q.dequeue for each e = (a,b,w) { if dist(a) + w < dist(b) { dist(b) = dist(a) + w q.enqueue(b) } } }

48 Operator Delta Inference Parallel Graph Algorithm Operators Schedule Order activity processing Identify new activities Static Schedule Static Schedule Dynamic Schedule

49 Identifying the delta of an operator [figure: after relax1 updates edge (a,b), which edges may newly satisfy the guard?]

50 Delta Inference Example [figure: relax1 applied to edge (a,b) with weight w1; a second incoming edge (c,b) with weight w2] Query Program: assume (da + w1 < db) assume ¬(dc + w2 < db) db_post = da + w1 assert ¬(dc + w2 < db_post) SMT Solver proves the query ⇒ (c,b) does not become active
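Since the query above ranges over integer distances and weights, its validity can be sanity-checked by brute force over a small range, standing in for the SMT solver. An illustrative Java sketch:

```java
class DeltaQueryCheck {
    /**
     * Brute-force check (small range, standing in for the SMT solver) of:
     * assume da + w1 < db; assume !(dc + w2 < db); db_post = da + w1;
     * assert !(dc + w2 < db_post).
     */
    static boolean holds() {
        for (int da = -4; da <= 4; da++)
            for (int db = -4; db <= 4; db++)
                for (int dc = -4; dc <= 4; dc++)
                    for (int w1 = 0; w1 <= 4; w1++)
                        for (int w2 = 0; w2 <= 4; w2++) {
                            if (!(da + w1 < db)) continue;       // relax1 fired on (a,b)
                            if (dc + w2 < db) continue;          // (c,b) was not active before
                            int dbPost = da + w1;                // effect of relax1
                            if (dc + w2 < dbPost) return false;  // (c,b) would become active
                        }
        return true;                                             // query valid on this range
    }
}
```

The reasoning matches the slide: db only decreases, and dc + w2 was already at least db, so the second incoming edge cannot become active.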

51 Delta inference example – active [figure: relax1 applied to edge (a,b) with weight w1; an outgoing edge (b,c) with weight w2] Query Program: assume (da + w1 < db) assume ¬(db + w2 < dc) db_post = da + w1 assert ¬(db_post + w2 < dc) SMT Solver Apply relax on all outgoing edges (b,c) such that: dc > db + w2 and c ≠ a

52 Influence patterns [figure: the ways a rewritten edge (a,b) can overlap a potential redex (c,d): b=c, a=c, a=d, b=d, and their combinations]

53 System architecture Algorithm Spec → Elixir (synthesize code, insert synchronization) → C++ Program → Galois/OpenMP Parallel Runtime (parallel thread-pool, graph implementations, worklist implementations)

54 Experiments Explored dimensions: Grouping – statically group multiple instances of operator; Unrolling – statically unroll operator applications by factor K; Dynamic scheduler – choose different policy/implementation for the dynamic worklist... Compare against hand-written parallel implementations

55 SSSP results 24 core Intel Xeon @ 2 GHz USA Florida Road Network (1 M nodes, 2.7 M Edges) Group + Unroll improve locality Implementation Variant

56 Breadth-First search results Scale-Free Graph 1 M nodes, 8 M edges USA road network 24 M nodes, 58 M edges

57 Conclusion Graph algorithm = Operators + Schedule – Elixir language: imperative operators + declarative schedule Allows exploring implementation space Automated reasoning for efficiently computing fixpoints Correct-by-construction parallelization Performance competitive with hand-parallelized code

58 Parameterized Verification of Software Transactional Memories Michael Emmi, Rupak Majumdar, Roman Manevich

59 Motivation Transactional memories [Herlihy ’93] – Programmer writes code with coarse-grained atomic blocks – Transaction manager takes care of conflicts, providing the illusion of sequential execution Strict serializability – correctness criterion – Formalizes “illusion of sequential execution” Parameterized verification – Formal proof for given implementation – For every number of threads – For every number of memory objects – For every number and length of transactions

60 STM terminology Statements: reads, writes, commit, abort Transaction: reads and writes of variables followed by commit (committing transaction) or abort (aborting transaction) Word: interleaved sequence of transactions of different threads Conflict: two statements conflict if – One is a read of variable X and other is a commit of a transaction that writes to X – Both are commits of transactions that write to X 60

61 Safety property: strict serializability There is a serialization for the committing threads such that order of conflicts is preserved Order of non-overlapping transactions remains the same 61

62 Safety property: strict serializability There is a serialization for the committing threads such that order of conflicts is preserved Order of non-overlapping transactions remains the same Example word: (rd X t1), (rd Y t2), (wr X t2), (commit t2), (commit t1) => Can be serialized to : (rd X t1), (commit t1), (rd Y t2), (wr X t2), (commit t2) conflict 62

63 Main results First automatic verification of strict serializability for transactional memories – TPL, DSTM, TL2 New proof technique: – Template-based invisible invariant generation – Abstract checking algorithm to check inductive invariants 63 Challenging – requires reasoning on both universal and existential properties

64 Outline Strict serializability verification approach Automating the proof Experiments Conclusion Related work 64

65 Proof roadmap 1 Goal: prove model M is strictly serializable 1. Given a strict-serializability reference model RSS, reduce checking strict serializability to checking that M refines RSS 2. Reduce refinement to checking safety – Safety property SIM: whenever M can execute a statement, so can RSS – Check SIM on product system M × RSS

66 Proof roadmap 2 3. Model STMs M and RSS in first-order logic – TM models use set data structures and typestate bits 4. Check safety by generating a strong enough candidate inductive invariant and checking inductiveness – Use observations on structure of transactional memories – Use sound first-order reasoning

67 Reference strict-serializability model Guerraoui, Singh, Henzinger, Jobstmann [PLDI’08] RSS: most liberal specification of a strictly serializable system – Allows the largest language of strictly-serializable executions M is strictly serializable iff every word of M is also a word of RSS – Language(M) ⊆ Language(RSS) – M refines RSS

68 Modeling transactional memories M_n,k = (predicates, actions) – Predicate: ranked relation symbol p(t), q(t,v),… – Binary predicates used for sets, so instead of rs(t,v) I’ll write v ∈ rs(t) – Action: a(t,v) = if pre(a) then p’(v)=…, q’(u,v)=… Universe = set of k thread individuals and n memory individuals State S = a valuation to the predicates

69 Reference model (RSS) predicates Typestates: – RSS.finished(t), RSS.started(t), RSS.pending(t), RSS.invalid(t) Read/write sets – RSS.rs(t,v), RSS.ws(t,v) Prohibited read/write sets – RSS.prs(t,v), RSS.pws(t,v) Weak-predecessor – RSS.wp(t1,t2)

70 DSTM predicates Typestates: – DSTM.finished(t), DSTM.validated(t), DSTM.invalid(t), DSTM.aborted(t) Read/own sets – DSTM.rs(t,v), DSTM.os(t,v) 70

71 RSS commit(t) action if ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t) then ∀t1,t2. RSS.wp’(t1,t2) ⇔ t1 ≠ t ∧ t2 ≠ t ∧ (RSS.wp(t1,t2) ∨ (RSS.wp(t,t2) ∧ (RSS.wp(t1,t) ∨ ∃v. v ∈ RSS.ws(t1) ∧ v ∈ RSS.ws(t)))) … (wp’ is the post-state predicate, wp the current-state predicate; the if-condition is the action precondition of the executing thread; the existential captures a write-write conflict)

72 DSTM commit(t) action if DSTM.validated(t) then ∀t1. DSTM.validated’(t1) ⇔ t1 ≠ t ∧ DSTM.validated(t1) ∧ ¬∃v. v ∈ DSTM.rs(t1) ∧ v ∈ DSTM.os(t1) … (the existential captures a read-own conflict)

73 FOTS states and execution v rd v t1 DSTM t1 t2 73 memory location individual thread individual state S1

74 FOTS states and execution v rd v t1 DSTM.rs predicate evaluation DSTM.rs(t1,v)=1 DSTM t1 t2 DSTM.started 74 predicate evaluation DSTM.started(t1)=1 state S2

75 FOTS states and execution v wr v t2 DSTM.rs DSTM t1 t2 DSTM.ws DSTM.started 75 state S3

76 Product system The product of two systems: A × B Predicates = A.predicates ∪ B.predicates Actions: commit(t) = { if (A.pre ∧ B.pre) then … } rd(t,v) = { if (A.pre ∧ B.pre) then … } … M refines RSS iff on every execution SIM holds: ∀action a. M.pre(a) ⇒ RSS.pre(a)

77 Checking DSTM refines RSS The only precondition in RSS is for commit(t) We need to check that SIM = ∀t. DSTM.validated(t) ⇒ ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t) holds for DSTM × RSS in all reachable states Proof rule: DSTM × RSS ⊨ SIM implies DSTM refines RSS But how do we check this safety property?

78 Checking safety by invisible invariants How do we prove that property ψ holds for all reachable states of system M? Pnueli, Ruah, Zuck [TACAS’01] Come up with inductive invariant φ that contains the reachable states of M and strengthens ψ: I1: Initial ⇒ φ I2: φ ∧ transition ⇒ φ’ I3: φ ⇒ ψ together imply M ⊨ ψ

79 Strict serializability proof rule Proof roadmap: 1. Divine candidate invariant φ 2. Prove I1, I2, I3 I1: Initial ⇒ φ I2: φ ∧ transition ⇒ φ’ I3: φ ⇒ SIM together imply DSTM × RSS ⊨ SIM

80 Two challenges Proof roadmap: 1. Divine candidate invariant φ 2. Prove I1, I2, I3 But how do we find a candidate φ? infinite space of possibilities Given candidate φ, how do we check the proof rule? checking A ⇒ B is undecidable for first-order logic I1: Initial ⇒ φ I2: φ ∧ transition ⇒ φ’ I3: φ ⇒ SIM together imply DSTM × RSS ⊨ SIM

81 Our solution Proof roadmap: 1. Divine candidate invariant φ 2. Prove I1, I2, I3 But how do we find a candidate φ? use templates and iterative weakening Given candidate φ, how do we check the proof rule? use abstract checking I1: Initial ⇒ φ I2: φ ∧ transition ⇒ φ’ I3: φ ⇒ SIM together imply DSTM × RSS ⊨ SIM Utilize insights on transactional memory implementations

82 Invariant for DSTM × RSS P1: ∀t,t1. RSS.wp(t,t1) ⇒ RSS.invalid(t) ∨ RSS.pending(t) ∨ ∃v. v ∈ RSS.ws(t1) ∧ v ∈ RSS.ws(t) P2: ∀t,v. v ∈ RSS.rs(t) ⇒ DSTM.aborted(t) ∨ v ∈ DSTM.rs(t) P3: ∀t,v. v ∈ RSS.ws(t) ⇒ DSTM.aborted(t) ∨ v ∈ DSTM.os(t) P4: ∀t. DSTM.validated(t) ⇒ ¬RSS.wp(t,t) P5: ∀t. DSTM.validated(t) ⇒ ¬RSS.invalid(t) P6: ∀t. DSTM.validated(t) ⇒ ¬RSS.pending(t) Inductive invariant involving only RSS – can use for all future proofs

83 Templates for DSTM × RSS P1: ∀t,t1. φ1(t,t1) ⇒ φ2(t) ∨ φ3(t) ∨ ∃v. v ∈ φ4(t1) ∧ v ∈ φ5(t) P2: ∀t,v. v ∈ φ1(t) ⇒ φ2(t) ∨ v ∈ φ3(t) P3: ∀t,v. v ∈ φ1(t) ⇒ φ2(t) ∨ v ∈ φ3(t) P4: ∀t. φ1(t) ⇒ φ2(t,t) P5: ∀t. φ1(t) ⇒ φ2(t) P6: ∀t. φ1(t) ⇒ φ2(t)

84 Templates for DSTM × RSS ∀t,t1. φ1(t,t1) ⇒ φ2(t) ∨ φ3(t) ∨ ∃v. v ∈ φ4(t1) ∧ v ∈ φ5(t) ∀t,v. v ∈ φ1(t) ⇒ φ2(t) ∨ v ∈ φ3(t) ∀t. φ1(t) ⇒ φ2(t,t) Why templates? Makes invariant separable Controls complexity of invariants Adding templates enables refinement

85 Mining candidate invariants Use predefined set of templates to specify structure of candidate invariants – ∀t,v. φ1 ∨ φ2 ∨ φ3 – φ1, φ2, φ3 are predicates of M or their negations – Existential formulas capturing 1-level conflicts: ∃v. v ∈ φ4(t1) ∧ v ∈ φ5(t2) Mine candidate invariants from a concrete execution

86 Iterative invariant weakening Initial candidate invariant C0 = P1 ∧ P2 ∧ … ∧ Pk Try to prove I2: φ ∧ transition ⇒ φ’ C1 = { Pi | C0 ∧ transition ⇒ Pi’, for Pi ∈ C0 } If C1 = C0 then we have an inductive invariant Otherwise, compute C2 = { Pi | C1 ∧ transition ⇒ Pi’, for Pi ∈ C1 } Repeat until either – found an inductive invariant: check I3: Ck ⇒ SIM – reached top {} – the trivial inductive invariant
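The weakening loop above is Houdini-style: repeatedly drop conjuncts that are not preserved by the transition relation until the remaining set is inductive. A generic Java sketch, with the inductiveness check abstracted as an oracle (illustrative, not the paper’s implementation):

```java
import java.util.*;
import java.util.function.BiPredicate;

class HoudiniStyleWeakening {
    /**
     * Iteratively drop candidate conjuncts that are not preserved by the
     * transition relation. preserved.test(current, p) should answer: does the
     * conjunction of `current` imply p after one transition step?
     * Returns the greatest inductive subset of the candidates.
     */
    static <P> Set<P> weaken(Set<P> candidates, BiPredicate<Set<P>, P> preserved) {
        Set<P> current = new HashSet<>(candidates);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Iterator<P> it = current.iterator(); it.hasNext(); ) {
                P p = it.next();
                if (!preserved.test(current, p)) {   // p not re-established: drop it
                    it.remove();
                    changed = true;
                }
            }
        }
        return current;   // may reach {} – the trivial invariant "true"
    }
}
```

Dropping a conjunct only weakens the candidate, so the loop terminates at the greatest fixpoint regardless of iteration order.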

87 Weakening illustration [figure: candidate sets shrinking from {P1, P2, P3} to an inductive subset]

88 Abstract proof rule Concrete: I1: Initial ⇒ φ I2: φ ∧ transition ⇒ φ’ I3: φ ⇒ SIM Abstract: I1: α(Initial) ⇒ φ (formula abstraction) I2: abs_transition(α(φ)) ⇒ φ’ (abstract transformer) I3: α(φ) ⇒ SIM (approximate entailment) Either rule implies DSTM × RSS ⊨ SIM

89 Conclusion Novel invariant generation using templates – extends applicability of invisible invariants Abstract domain and reasoning to check invariants without state explosion Proved strict-serializability for TPL, DSTM, TL2 – BLAST and TVLA failed 89

90 Verification results propertyTPLDSTMTL2RSS Bound for invariant gen. (2,1) No. cubes81843447296 Bounded time481023 Invariant mining time 6132657 #templates28 #candidates22539719 #proved22306814 #minimal485- avg, time per invariant 3.720.33643.4 avg. abs. size31.7256.91.19k2.86k Total time3.5m54.3m129.3m30.9m   90

91 Insights on transactional memories Transition relation is symmetric – thread identifiers not used: p’(t,v) ⇔ … t1 … t2 Executing thread t interacts only with an arbitrary thread or a conflict-adjacent thread Arbitrary thread: ∃v. v ∈ TL2.rs(t1) ∧ v ∈ TL2.ws(t1) Conflict-adjacent: ∃v. v ∈ DSTM.rs(t1) ∧ v ∈ DSTM.ws(t)

92 Conflict adjacency [figure: thread t’s read set overlaps t2’s write set (read-write conflict); t3’s write set overlaps t2’s write set (write-write conflict)] ∃v. v ∈ rs(t) ∧ v ∈ DSTM.ws(t2) ∃v. v ∈ ws(t1) ∧ v ∈ DSTM.ws(t2)

93 Conflict adjacency [figure: a chain of conflict-adjacent threads t, t2, t3, …]

94 Related work Reduction theorems – Guerraoui et al. [PLDI’08, CONCUR’08] Manually-supplied invariants – fixed number of threads and variables + PVS – Cohen et al. [FMCAD’07] Predicate abstraction + shape analysis – SLAM, BLAST, TVLA

95 Related work Invisible invariants – Arons et al. [CAV’01], Pnueli et al. [TACAS’01] Templates – for arithmetic constraints Indexed predicate abstraction – Lahiri et al. [TOCL’07] Thread quantification – Berdine et al. [CAV’08], Segalov et al. [APLAS’09]

96 Thank You! 96

