1
Program Analysis and Synthesis of Parallel Systems
Roman Manevich, Ben-Gurion University
2
Three papers
1. A Shape Analysis for Optimizing Parallel Graph Programs [POPL'11]
2. Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA'12]
3. Parameterized Verification of Transactional Memories [PLDI'10]
3
What's the connection?
A Shape Analysis for Optimizing Parallel Graph Programs [POPL'11]
Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA'12]
Parameterized Verification of Transactional Memories [PLDI'10]
– From analysis to language design: creates opportunities for more optimizations; requires other analyses
– Similarities between the abstract domains
5
A Shape Analysis for Optimizing Parallel Graph Programs
Dimitrios Prountzos 1, Keshav Pingali 1,2, Roman Manevich 2, Kathryn S. McKinley 1
1: Department of Computer Science, The University of Texas at Austin
2: Institute for Computational Engineering and Sciences, The University of Texas at Austin
6
Motivation
Graph algorithms are ubiquitous: computational biology, social networks, computer graphics
Goal: compiler analysis for optimization of parallel graph algorithms
7
Minimum Spanning Tree Problem
[figure: example weighted graph]
9
Boruvka's Minimum Spanning Tree Algorithm
Build MST bottom-up:
repeat {
  pick arbitrary node 'a'
  merge with lightest neighbor 'lt'
  add edge 'a-lt' to MST
} until graph is a single node
[figure: one merge step on the example graph]
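The merge loop above can be sketched as a sequential Java program. This is an illustrative textbook-style Boruvka (a component array stands in for Galois' node merging; the example graph below is hypothetical, since the deck's edge list is not recoverable from the scrape):

```java
import java.util.*;

class Boruvka {
    // edges[i] = {u, v, w}; returns total MST weight by repeatedly
    // merging each super-node with its lightest neighbor
    static int mstWeight(int n, int[][] edges) {
        int[] comp = new int[n];                 // comp[v] = current super-node of v
        for (int v = 0; v < n; v++) comp[v] = v;
        int total = 0, comps = n;
        while (comps > 1) {
            int[] best = new int[n];             // index of lightest outgoing edge per super-node
            Arrays.fill(best, -1);
            for (int i = 0; i < edges.length; i++) {
                int cu = comp[edges[i][0]], cv = comp[edges[i][1]];
                if (cu == cv) continue;          // internal edge of a super-node: skip
                if (best[cu] == -1 || edges[i][2] < edges[best[cu]][2]) best[cu] = i;
                if (best[cv] == -1 || edges[i][2] < edges[best[cv]][2]) best[cv] = i;
            }
            for (int c = 0; c < n; c++) {
                if (best[c] == -1) continue;
                int[] e = edges[best[c]];
                int a = comp[e[0]], b = comp[e[1]];
                if (a == b) continue;            // both endpoints already merged this round
                total += e[2];                   // add edge 'a-lt' to the MST
                for (int v = 0; v < n; v++) if (comp[v] == b) comp[v] = a;  // merge
                comps--;
            }
        }
        return total;
    }
    public static void main(String[] args) {
        int[][] g = {{0,1,1},{1,2,2},{2,3,3},{0,3,4},{0,2,5}};
        System.out.println(mstWeight(4, g));  // prints 6 (MST edges 1+2+3)
    }
}
```

As in the slide's pseudocode, correctness of the merging order relies on edge weights being distinct (ties would need a tie-breaking rule).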
10
Parallelism in Boruvka
[figure: example graph]
11
Non-conflicting iterations
[figure: example graph]
12
Non-conflicting iterations
[figure: example graph after merges]
13
Conflicting iterations
[figure: example graph]
14
Parallelism in Boruvka
Algorithm = repeated application of an operator to the graph
– Active node: node where computation is needed
– Activity: application of the operator to an active node
– Neighborhood: sub-graph read/written to perform the activity
– Unordered algorithms: active nodes can be processed in any order
Amorphous data-parallelism
– Parallel execution of activities, subject to neighborhood constraints
Neighborhoods are functions of runtime values
– Parallelism cannot be uncovered at compile time in general
15
Optimistic parallelization in Galois
Programming model
– Client code has sequential semantics
– Library of concurrent data structures
Parallel execution model
– Thread-level speculation (TLS)
– Activities executed speculatively
Conflict detection
– Each node/edge has an associated exclusive lock
– Graph operations acquire locks on read/written nodes/edges
– Lock owned by another thread ⇒ conflict ⇒ iteration rolled back
– All locks released at the end
Two main overheads
– Locking
– Undo actions
16
Generic optimization structure
Program → Program Analyzer → Annotated Program → Program Transformer → Optimized Program
17
Overheads (I): locking
Optimizations
– Redundant locking elimination
– Lock removal for iteration-private data
– Lock removal for lock domination
ACQ(P): set of definitely acquired locks per program point P
Given method call M at P: Locks(M) ⊆ ACQ(P) ⇒ redundant locking
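ACQ(P) is must-information, so it intersects facts over CFG predecessors. A minimal sketch of the idea (names and representation are illustrative, not the paper's analyzer; initializing to the empty set under-approximates ACQ, which is sound for removing locks, and is exact on the acyclic CFG below):

```java
import java.util.*;

class AcqAnalysis {
    // preds[p] = predecessor program points of p; gen[p] = locks acquired at p
    static List<Set<String>> acq(int[][] preds, List<Set<String>> gen) {
        int n = preds.length;
        List<Set<String>> out = new ArrayList<>();
        for (int p = 0; p < n; p++) out.add(new HashSet<String>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int p = 0; p < n; p++) {
                Set<String> in = null;
                for (int q : preds[p]) {          // must-info: intersect over predecessors
                    if (in == null) in = new HashSet<String>(out.get(q));
                    else in.retainAll(out.get(q));
                }
                if (in == null) in = new HashSet<String>();
                in.addAll(gen.get(p));            // locks acquired at p itself
                if (!in.equals(out.get(p))) { out.set(p, in); changed = true; }
            }
        }
        return out;
    }
    public static void main(String[] args) {
        // 0 -> 1 -> 3 and 0 -> 2 -> 3; "a" is acquired on both branches,
        // so "a" is definitely held at point 3 but "b" is not
        int[][] preds = {{}, {0}, {0}, {1, 2}};
        List<Set<String>> gen = Arrays.asList(
            new HashSet<String>(),
            new HashSet<String>(Arrays.asList("a")),
            new HashSet<String>(Arrays.asList("a", "b")),
            new HashSet<String>());
        System.out.println(acq(preds, gen).get(3));  // prints [a]
    }
}
```

A call M at point 3 with Locks(M) = {a} would then lock redundantly, matching the rule above.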
18
Overheads (II): undo actions
Phases along the loop body: lockset grows → lockset stable → failsafe
foreach (Node a : wl) {
  Set<Node> aNghbrs = g.neighbors(a);
  Node lt = null;
  for (Node n : aNghbrs) {
    minW,lt = minWeightEdge((a,lt), (a,n));
  }
  g.removeEdge(a, lt);
  Set<Node> ltNghbrs = g.neighbors(lt);
  for (Node n : ltNghbrs) {
    Edge e = g.getEdge(lt, n);
    Weight w = g.getEdgeData(e);
    Edge an = g.getEdge(a, n);
    if (an != null) {
      Weight wan = g.getEdgeData(an);
      if (wan.compareTo(w) < 0) w = wan;
      g.setEdgeData(an, w);
    } else {
      g.addEdge(a, n, w);
    }
  }
  g.removeNode(lt);
  mst.add(minW);
  wl.add(a);
}
Program point P is failsafe if: ∀Q: Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
19
Lockset analysis (over the Boruvka loop shown above)
– Redundant locking: Locks(M) ⊆ ACQ(P)
– Undo elimination: ∀Q: Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
– Need to compute ACQ(P) statically: computing it at runtime would itself add overhead
20
The optimization, technically
Each graph method m(arg1, …, argk, flag) takes an optimization-level flag:
– flag=LOCK: acquire locks
– flag=UNDO: log undo (backup) data
– flag=LOCK_UNDO (default): acquire locks and log undo data
– flag=NONE: no extra work
Example: Edge e = g.getEdge(lt, n, NONE)
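The flag plumbing can be pictured as a small enum; the constant names mirror the slide, but the API below is illustrative, not Galois' actual one:

```java
// Illustrative encoding of the per-call optimization level
class GraphCall {
    enum Flag { LOCK, UNDO, LOCK_UNDO, NONE }

    static boolean acquiresLocks(Flag f) {
        return f == Flag.LOCK || f == Flag.LOCK_UNDO;
    }
    static boolean logsUndo(Flag f) {
        return f == Flag.UNDO || f == Flag.LOCK_UNDO;
    }
    public static void main(String[] args) {
        System.out.println(acquiresLocks(Flag.NONE));  // prints false: no extra work
        System.out.println(logsUndo(Flag.LOCK));       // prints false: lock only, no undo log
    }
}
```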
21
Analysis challenges
The usual suspects:
– Unbounded memory ⇒ undecidability
– Aliasing, destructive updates
Specific challenges:
– Complex ADTs: unstructured graphs
– Heap objects are locked
– Adapt the abstraction to ADTs
We use abstract interpretation [CC'77]
– Balance precision and realistic performance
22
Shape analysis overview
ADT specifications: Graph spec { @rep nodes, @rep edges, … }, Set spec { @rep cont, … }
Concrete ADT implementations in the Galois library: HashMap-based Graph, tree-based Set, …
Pipeline: Boruvka.java + specs → Predicate Discovery → Shape Analysis → Optimized Boruvka.java
23
ADT specification
Graph {
  @rep set<Node> nodes
  @rep set<Edge> edges
  @locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)
  @op( nghbrs = n.rev(src).dst + n.rev(dst).src,
       ret = new Set<Node>(cont=nghbrs) )
  Set<Node> neighbors(Node n);
}
Client code (Boruvka.java): Set<Node> S1 = g.neighbors(n);
Abstract the ADT state by virtual set fields
Assumption: the implementation satisfies the spec
24
Modeling ADTs
[figure: concrete graph nodes a, b, c connected via src/dst edges]
(same Graph spec as on the previous slide)
25
Modeling ADTs
[figure: abstract state with nodes, edges, and the virtual cont, ret, nghbrs fields]
(same Graph spec as above)
26
Abstraction scheme
Parameterized by a set of LockPaths: L(Path) ≡ ∀o. o ∈ Path ⇒ Locked(o)
– Tracks a subset of must-be-locked objects
Abstract domain elements have the form: Aliasing-configs → 2^LockPaths
Example: (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont)
27
Joining abstract states
Aliasing is crucial for precision
– May-be-locked does not enable our optimizations
#Aliasing-configs: small constant (≤ 6)
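One way to read the join on this slide: two states with the same aliasing configuration meet by intersecting their must-locked LockPaths (must-information shrinks at joins), while configurations present on only one side are kept. A sketch under that reading (string-keyed maps are illustrative, not the TVLA encoding):

```java
import java.util.*;

class LockDomain {
    // element: aliasing configuration -> set of must-locked LockPaths
    static Map<String, Set<String>> join(Map<String, Set<String>> a,
                                         Map<String, Set<String>> b) {
        Map<String, Set<String>> r = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : a.entrySet()) {
            Set<String> paths = new HashSet<String>(e.getValue());
            if (b.containsKey(e.getKey()))
                paths.retainAll(b.get(e.getKey()));   // must-info: intersect
            r.put(e.getKey(), paths);
        }
        for (Map.Entry<String, Set<String>> e : b.entrySet())
            r.putIfAbsent(e.getKey(), new HashSet<String>(e.getValue()));
        return r;
    }
    public static void main(String[] args) {
        Map<String, Set<String>> s1 =
            Map.of("S1!=S2", new HashSet<String>(Arrays.asList("S1.cont", "S2.cont")));
        Map<String, Set<String>> s2 =
            Map.of("S1!=S2", new HashSet<String>(Arrays.asList("S1.cont")));
        // only S1.cont is locked on both branches
        System.out.println(join(s1, s2));  // prints {S1!=S2=[S1.cont]}
    }
}
```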
28
Example invariant in Boruvka (for the loop shown earlier)
At this point the immediate neighbors of a and lt are locked:
( a ≠ lt ) ∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst)) ∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src)
∧ L(lt) ∧ L(lt.rev(dst)) ∧ L(lt.rev(src)) ∧ L(lt.rev(dst).src) ∧ L(lt.rev(src).dst) ∧ …
29
Heuristics for finding LockPaths
Hierarchy Summarization (HS)
– x.(fld)*
– Type hierarchy graph is acyclic ⇒ bounded number of paths
– Preflow-Push: L(S.cont) ∧ L(S.cont.nd)
  Nodes in set S and their data are locked (Set S –cont→ Node –nd→ NodeData)
30
Footprint graph heuristic
Footprint Graphs (FG) [Calcagno et al. SAS'07]
– All acyclic paths from arguments of the ADT method to locked objects
– x.(fld | rev(fld))*
– Delaunay Mesh Refinement: L(S.cont) ∧ L(S.cont.rev(src)) ∧ L(S.cont.rev(dst)) ∧ L(S.cont.rev(src).dst) ∧ L(S.cont.rev(dst).src)
– Nodes in set S and all of their immediate neighbors are locked
Composition of HS and FG
– Preflow-Push: L(a.rev(src).ed)
31
Experimental evaluation
Implemented on top of TVLA
– Encode the abstraction by 3-valued shape analysis [SRW TOPLAS'02]
Evaluation on 4 Lonestar Java benchmarks
– Inferred all available optimizations
– # abstract states practically linear in program size
Benchmark                  Analysis time (sec)
Boruvka MST                6
Preflow-Push Maxflow       7
Survey Propagation         12
Delaunay Mesh Refinement   16
32
Impact of optimizations for 8 threads 8-core Intel Xeon @ 3.00 GHz
33
Note 1
How to map the abstract domain presented so far to TVLA?
– Example invariant: (x≠y ∧ L(y.nd)) ∨ (x=y ∧ L(x.nd))
– Unary abstraction predicate x(v) for each pointer x
– Unary non-abstraction predicate L[x.p] for pointer x and path p
– Use partial join
– The resulting abstraction is similar to the one shown
34
Note 2
How to come up with an abstraction for similar problems?
1. Start by constructing a manual proof (Hoare logic)
2. Examine the resulting invariants and generalize them into a language of formulas
– May need to be further specialized for a given program – an interesting problem (machine learning/refinement)
– How to get sound transformers?
35
Note 3
How did we avoid considering all interleavings?
– Proved a non-interference side theorem
36
Elixir: A System for Synthesizing Concurrent Graph Programs
Dimitrios Prountzos 1, Roman Manevich 2, Keshav Pingali 1
1. The University of Texas at Austin
2. Ben-Gurion University of the Negev
37
Goal
Allow the programmer to easily implement correct and efficient parallel graph algorithms
– Graph algorithms are ubiquitous: social network analysis, computer graphics, machine learning, …
– Difficult to parallelize due to their irregular nature
– The best algorithm and implementation are usually platform dependent and input dependent
– Need to easily experiment with different solutions
Focus:
– Fixed graph structure; only labels on nodes and edges change
– Each activity touches a fixed number of nodes
38
Example: Single-Source Shortest-Path (SSSP)
Problem formulation
– Compute the shortest distance from a source node S to every other node
Many algorithms
– Bellman-Ford (1957)
– Dijkstra (1959)
– Chaotic relaxation (Miranker 1969)
– Delta-stepping (Meyer et al. 1998)
Common structure
– Each node has a label dist holding the currently known shortest distance from S
Key operation: relax-edge(u,v)
  if dist(A) + W(A,C) < dist(C)
    dist(C) = dist(A) + W(A,C)
39
Dijkstra's algorithm
Scheduling of relaxations:
– Use a priority queue of nodes, ordered by label dist
– Iterate over nodes u in priority order
– On each step: relax all neighbors v of u, i.e., apply relax-edge to all (u,v)
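The priority-queue schedule above can be sketched directly in Java (a small hypothetical graph stands in for the deck's example, whose edge list is not recoverable; lazy deletion handles stale queue entries):

```java
import java.util.*;

class Dijkstra {
    // edges[i] = {u, v, w}, directed; returns dist[] from src
    static int[] sssp(int n, int[][] edges, int src) {
        List<int[]>[] adj = new List[n];
        for (int v = 0; v < n; v++) adj[v] = new ArrayList<>();
        for (int[] e : edges) adj[e[0]].add(new int[]{e[1], e[2]});
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>((x, y) -> x[1] - y[1]);  // by dist
        pq.add(new int[]{src, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (top[1] > dist[u]) continue;          // stale entry: skip
            for (int[] e : adj[u]) {                 // relax-edge(u, v) for all out-edges
                if (dist[u] + e[1] < dist[e[0]]) {
                    dist[e[0]] = dist[u] + e[1];
                    pq.add(new int[]{e[0], dist[e[0]]});
                }
            }
        }
        return dist;
    }
    public static void main(String[] args) {
        int[][] g = {{0,1,2},{0,2,5},{1,2,1},{2,3,2},{1,3,7}};
        System.out.println(Arrays.toString(sssp(4, g, 0)));  // prints [0, 2, 3, 5]
    }
}
```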
40
Chaotic relaxation
Scheduling of relaxations:
– Use an unordered set of edges
– Iterate over edges (u,v) in any order
– On each step: apply relax-edge to edge (u,v)
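Because relax-edge only ever lowers labels, iterating the edges in any order until no edge is active converges to the same distances as Dijkstra; a sketch on the same hypothetical graph:

```java
import java.util.*;

class Chaotic {
    static int[] sssp(int n, int[][] edges, int src) {
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        boolean changed = true;
        while (changed) {                 // fixpoint: edge order does not matter
            changed = false;
            for (int[] e : edges) {       // relax-edge(u, v)
                if (dist[e[0]] != Integer.MAX_VALUE && dist[e[0]] + e[2] < dist[e[1]]) {
                    dist[e[1]] = dist[e[0]] + e[2];
                    changed = true;
                }
            }
        }
        return dist;
    }
    public static void main(String[] args) {
        int[][] g = {{0,1,2},{0,2,5},{1,2,1},{2,3,2},{1,3,7}};
        System.out.println(Arrays.toString(sssp(4, g, 0)));  // prints [0, 2, 3, 5]
    }
}
```

The schedule changes only how much redundant work is done, not the final labels; this is exactly the operator/schedule split Elixir exploits.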
41
Insights behind Elixir
Parallel graph algorithm = Operators (what should be done) + Schedule (how it should be done)
– Schedule: order activity processing; identify new activities (static and dynamic scheduling)
– Unordered/ordered algorithms; operator delta ⇒ new activities
"TAO of parallelism" [PLDI 2011]
42
Insights behind Elixir
Dijkstra-style algorithm:
q = new PrQueue
q.enqueue(SRC)
while (!q.empty) {
  a = q.dequeue
  for each e = (a,b,w) {
    if dist(a) + w < dist(b) {
      dist(b) = dist(a) + w
      q.enqueue(b)
    }
  }
}
43
Contributions
Language
– Operators/schedule separation
– Allows exploration of the implementation space
Operator delta inference
– A precise delta is required for efficient fixpoint computations
Automatic parallelization
– Inserts synchronization to atomically execute operators
– Avoids data races/deadlocks
– Specializes parallelization based on scheduling constraints
44
SSSP in Elixir
Graph [ nodes(node : Node, dist : int)
        edges(src : Node, dst : Node, wt : int) ]        // Graph type
relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)
          bd > ad + w ] ➔ [ bd = ad + w ]                // Operator
sssp = iterate relax ≫ schedule                          // Fixpoint statement
45
Operators
relax = [ redex pattern: nodes(node a, dist ad), nodes(node b, dist bd), edges(src a, dst b, wt w)
          guard: bd > ad + w ] ➔ [ update: bd = ad + w ]
Cautious by construction – easy to generalize
46
Fixpoint statement
sssp = iterate relax ≫ schedule
– Apply the operator until fixpoint
– ≫ is followed by a scheduling expression
47
Scheduling examples
– Locality-enhanced label-correcting: group b ≫ unroll 2 ≫ approx metric ad
– Dijkstra-style: metric ad ≫ group b
(the Dijkstra-style schedule yields the priority-queue pseudocode shown earlier)
48
Operator Delta Inference
49
Identifying the delta of an operator
[figure: after applying relax to edge (a,b), which other edges become active?]
50
Delta inference example
Query program:
assume (da + w1 < db)
assume ¬(dc + w2 < db)
db_post = da + w1
assert ¬(dc + w2 < db_post)
The SMT solver proves the assertion: edge (c,b) does not become active
51
Delta inference example – active edge
Query program:
assume (da + w1 < db)
assume ¬(db + w2 < dc)
db_post = da + w1
assert ¬(db_post + w2 < dc)
The assertion can fail, so apply relax on all outgoing edges (b,c) such that dc > db + w2 and c ≠ a
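The two queries above can be sanity-checked without an SMT solver by brute-forcing small integer models (purely illustrative; Elixir uses a real solver, and brute force over a bound is of course not a validity proof):

```java
class DeltaCheck {
    // first query: da+w1<db ∧ ¬(dc+w2<db) implies ¬(dc+w2 < da+w1)
    static boolean query1Valid(int bound) {
        for (int da = 0; da < bound; da++) for (int db = 0; db < bound; db++)
        for (int dc = 0; dc < bound; dc++) for (int w1 = 0; w1 < bound; w1++)
        for (int w2 = 0; w2 < bound; w2++) {
            int dbPost = da + w1;
            if (da + w1 < db && !(dc + w2 < db) && dc + w2 < dbPost)
                return false;   // counterexample found
        }
        return true;
    }
    // second query: da+w1<db ∧ ¬(db+w2<dc) implies ¬(da+w1+w2 < dc)
    static boolean query2Valid(int bound) {
        for (int da = 0; da < bound; da++) for (int db = 0; db < bound; db++)
        for (int dc = 0; dc < bound; dc++) for (int w1 = 0; w1 < bound; w1++)
        for (int w2 = 0; w2 < bound; w2++) {
            int dbPost = da + w1;
            if (da + w1 < db && !(db + w2 < dc) && dbPost + w2 < dc)
                return false;   // counterexample found
        }
        return true;
    }
    public static void main(String[] args) {
        System.out.println(query1Valid(6));  // prints true:  (c,b) never becomes active
        System.out.println(query2Valid(6));  // prints false: (b,c) can become active
    }
}
```

The first query is in fact valid for all integers: dc + w2 ≥ db > da + w1 = db_post, so the relaxed value of b can never re-activate (c,b).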
52
Influence patterns
[figure: the ways two operator instances can overlap: b=c, a=c, a=d, b=d, …]
53
System architecture
Algorithm spec → Elixir (synthesize code, insert synchronization) → C++ program
Runtime: Galois/OpenMP parallel runtime, parallel thread pool, graph implementations, worklist implementations
54
Experiments
Explored dimensions:
– Grouping: statically group multiple instances of the operator
– Unrolling: statically unroll operator applications by a factor K
– Dynamic scheduler: choose different policy/implementation for the dynamic worklist
Compared against hand-written parallel implementations
55
SSSP results
24-core Intel Xeon @ 2 GHz; USA Florida road network (1M nodes, 2.7M edges)
Group + unroll improve locality
56
Breadth-first search results
– Scale-free graph: 1M nodes, 8M edges
– USA road network: 24M nodes, 58M edges
57
Conclusion
Graph algorithm = operators + schedule
– Elixir language: imperative operators + declarative schedule
– Allows exploring the implementation space
Automated reasoning for efficiently computing fixpoints
Correct-by-construction parallelization
Performance competitive with hand-parallelized code
58
Parameterized Verification of Software Transactional Memories
Michael Emmi, Rupak Majumdar, Roman Manevich
59
Motivation
Transactional memories [Herlihy '93]
– Programmer writes code with coarse-grained atomic blocks
– Transaction manager takes care of conflicts, providing the illusion of sequential execution
Strict serializability – the correctness criterion
– Formalizes the "illusion of sequential execution"
Parameterized verification
– Formal proof for a given implementation
– For every number of threads
– For every number of memory objects
– For every number and length of transactions
60
STM terminology
Statements: reads, writes, commit, abort
Transaction: reads and writes of variables followed by a commit (committing transaction) or an abort (aborting transaction)
Word: interleaved sequence of transactions of different threads
Conflict: two statements conflict if
– one is a read of variable X and the other is a commit of a transaction that writes to X, or
– both are commits of transactions that write to X
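The conflict definition above can be encoded directly; a transaction is summarized here by just its write set and commit status (illustrative types, not the paper's model):

```java
import java.util.*;

class Conflicts {
    // a read of x conflicts with the commit of a transaction that wrote x
    static boolean readCommitConflict(String x, Set<String> writerWs, boolean committed) {
        return committed && writerWs.contains(x);
    }
    // two commits conflict if both transactions wrote some common variable
    static boolean commitCommitConflict(Set<String> ws1, Set<String> ws2) {
        for (String v : ws1) if (ws2.contains(v)) return true;
        return false;
    }
    public static void main(String[] args) {
        Set<String> t2Writes = new HashSet<String>(Arrays.asList("X"));
        // matches the example word used later: (rd X t1) vs (commit t2) with X in ws(t2)
        System.out.println(readCommitConflict("X", t2Writes, true));   // prints true
        System.out.println(commitCommitConflict(t2Writes,
            new HashSet<String>(Arrays.asList("Y"))));                  // prints false
    }
}
```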
61
Safety property: strict serializability
– There is a serialization of the committing threads such that the order of conflicts is preserved
– The order of non-overlapping transactions remains the same
62
Safety property: strict serializability (example)
Example word: (rd X t1), (rd Y t2), (wr X t2), (commit t2), (commit t1)
– conflict: (rd X t1) with (commit t2), since t2 writes X
⇒ can be serialized to: (rd X t1), (commit t1), (rd Y t2), (wr X t2), (commit t2)
63
Main results
First automatic verification of strict serializability for transactional memories
– TPL, DSTM, TL2
New proof technique:
– Template-based invisible invariant generation
– Abstract checking algorithm to check inductive invariants
Challenging – requires reasoning about both universal and existential properties
64
Outline
– Strict serializability verification approach
– Automating the proof
– Experiments
– Conclusion
– Related work
65
Proof roadmap 1
Goal: prove that model M is strictly serializable
1. Given a strict-serializability reference model RSS, reduce checking strict serializability to checking that M refines RSS
2. Reduce refinement to checking safety
– Safety property SIM: whenever M can execute a statement, so can RSS
– Check SIM on the product system M × RSS
66
Proof roadmap 2
3. Model the STMs M and RSS in first-order logic
– TM models use set data structures and typestate bits
4. Check safety by generating a strong enough candidate inductive invariant and checking inductiveness
– Use observations on the structure of transactional memories
– Use sound first-order reasoning
67
Reference strict-serializability model
Guerraoui, Singh, Henzinger, Jobstmann [PLDI'08]
RSS: the most liberal specification of a strictly serializable system
– Allows the largest language of strictly serializable executions
M is strictly serializable iff every word of M is also a word of RSS
– Language(M) ⊆ Language(RSS)
– M refines RSS
68
Modeling transactional memories
M(n,k) = (predicates, actions)
– Predicate: ranked relation symbol p(t), q(t,v), …
– Binary predicates are used as sets, so instead of rs(t,v) I'll write v ∈ rs(t)
– Action: a(t,v) = if pre(a) then p′(v)=…, q′(u,v)=…
Universe = set of k thread individuals and n memory individuals
State S = a valuation of the predicates
69
Reference model (RSS) predicates
Typestates: RSS.finished(t), RSS.started(t), RSS.pending(t), RSS.invalid(t)
Read/write sets: RSS.rs(t,v), RSS.ws(t,v)
Prohibited read/write sets: RSS.prs(t,v), RSS.pws(t,v)
Weak-predecessor: RSS.wp(t1,t2)
70
DSTM predicates
Typestates: DSTM.finished(t), DSTM.validated(t), DSTM.invalid(t), DSTM.aborted(t)
Read/own sets: DSTM.rs(t,v), DSTM.os(t,v)
71
RSS commit(t) action
if ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t) then                      // action precondition; t is the executing thread
  ∀t1,t2. RSS.wp′(t1,t2) ⇔ t1 ≠ t ∧ t2 ≠ t ∧
    (RSS.wp(t1,t2) ∨ (RSS.wp(t,t2) ∧ (RSS.wp(t1,t) ∨ ∃v. v ∈ RSS.ws(t1) ∧ v ∈ RSS.ws(t))))   // write-write conflict
  …
(primed predicates refer to the post-state, unprimed ones to the current state)
72
DSTM commit(t) action
if DSTM.validated(t) then
  ∀t1. DSTM.validated′(t1) ⇔ t1 ≠ t ∧ DSTM.validated(t1) ∧ ¬∃v. v ∈ DSTM.rs(t1) ∧ v ∈ DSTM.os(t)   // read-own conflict
  …
73
FOTS states and execution
[figure: state S1 – thread individuals t1, t2 and a memory-location individual v; t1 is about to execute rd v]
74
FOTS states and execution
[figure: state S2 – after the read, predicate evaluations DSTM.started(t1)=1 and DSTM.rs(t1,v)=1]
75
FOTS states and execution
[figure: state S3 – after t2 executes wr v, DSTM.ws now relates t2 and v]
76
Product system
The product of two systems A × B:
– Predicates = A.predicates ∪ B.predicates
– Actions: commit(t) = { if (A.pre ∧ B.pre) then … }, rd(t,v) = { if (A.pre ∧ B.pre) then … }, …
M refines RSS iff on every execution SIM holds:
– for every action a: M.pre(a) ⇒ RSS.pre(a)
77
Checking that DSTM refines RSS
The only precondition in RSS is for commit(t)
We need to check that SIM = ∀t. DSTM.validated(t) ⇒ ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t) holds in all reachable states of DSTM × RSS
Proof rule: DSTM × RSS ⊨ SIM implies DSTM refines RSS
But how do we check this safety property?
78
Checking safety by invisible invariants
How do we prove that a property holds for all reachable states of system M?
Pnueli, Ruah, Zuck [TACAS'01]
Come up with an inductive invariant φ that contains the reachable states of M and strengthens SIM:
– I1: Initial ⇒ φ
– I2: φ ∧ transition ⇒ φ′
– I3: φ ⇒ SIM
then M ⊨ SIM
79
Strict serializability proof rule
I1: Initial ⇒ φ
I2: φ ∧ transition ⇒ φ′
I3: φ ⇒ SIM
⇒ DSTM × RSS ⊨ SIM
Proof roadmap:
1. Divine a candidate invariant φ
2. Prove I1, I2, I3
80
Two challenges
– How do we find a candidate φ? There is an infinite space of possibilities.
– Given a candidate, how do we check the proof rule? Checking A ⇒ B is undecidable for first-order logic.
81
Our solution
– Find a candidate φ using templates and iterative weakening
– Check the proof rule using abstract checking
Both utilize insights about transactional memory implementations
82
Invariant for DSTM × RSS
P1: ∀t,t1. RSS.wp(t,t1) ∧ ¬RSS.invalid(t) ∧ ¬RSS.pending(t) ⇒ ∃v. v ∈ RSS.ws(t1) ∧ v ∈ RSS.ws(t)
P2: ∀t,v. v ∈ RSS.rs(t) ∧ ¬DSTM.aborted(t) ⇒ v ∈ DSTM.rs(t)
P3: ∀t,v. v ∈ RSS.ws(t) ∧ ¬DSTM.aborted(t) ⇒ v ∈ DSTM.os(t)
P4: ∀t. DSTM.validated(t) ⇒ ¬RSS.wp(t,t)
P5: ∀t. DSTM.validated(t) ⇒ ¬RSS.invalid(t)
P6: ∀t. DSTM.validated(t) ⇒ ¬RSS.pending(t)
P1 is an inductive invariant involving only RSS – it can be reused in all future proofs
83
Templates for DSTM × RSS
P1: ∀t,t1. ℓ1(t,t1) ∧ ℓ2(t) ∧ ℓ3(t) ⇒ ∃v. v ∈ ℓ4(t1) ∧ v ∈ ℓ5(t)
P2: ∀t,v. v ∈ ℓ1(t) ∧ ℓ2(t) ⇒ v ∈ ℓ3(t)
P3: ∀t,v. v ∈ ℓ1(t) ∧ ℓ2(t) ⇒ v ∈ ℓ3(t)
P4: ∀t. ℓ1(t) ⇒ ℓ2(t,t)
P5: ∀t. ℓ1(t) ⇒ ℓ2(t)
P6: ∀t. ℓ1(t) ⇒ ℓ2(t)
84
Templates for DSTM × RSS
∀t,t1. ℓ1(t,t1) ∧ ℓ2(t) ∧ ℓ3(t) ⇒ ∃v. v ∈ ℓ4(t1) ∧ v ∈ ℓ5(t)
∀t,v. v ∈ ℓ1(t) ∧ ℓ2(t) ⇒ v ∈ ℓ3(t)
∀t. ℓ1(t) ⇒ ℓ2(t,t)
Why templates?
– Makes the invariant separable
– Controls the complexity of invariants
– Adding templates enables refinement
85
Mining candidate invariants
Use a predefined set of templates to specify the structure of candidate invariants
– ∀t,v. ℓ1 ∧ ℓ2 ⇒ ℓ3
– ℓ1, ℓ2, ℓ3 are predicates of M or their negations
– Existential formulas capture 1-level conflicts: ∃v. v ∈ ℓ4(t1) ∧ v ∈ ℓ5(t2)
Mine candidate invariants from a concrete execution
86
Iterative invariant weakening
Initial candidate invariant C0 = P1 ∧ P2 ∧ … ∧ Pk
Try to prove I2: φ ∧ transition ⇒ φ′
C1 = { Pi ∈ C0 | C0 ∧ transition ⇒ Pi′ }
If C1 = C0 then we have an inductive invariant
Otherwise, compute C2 = { Pi ∈ C1 | C1 ∧ transition ⇒ Pi′ }
Repeat until either
– an inductive invariant is found – then check I3: Ck ⇒ SIM
– we reach the empty conjunction – the trivial inductive invariant
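The weakening loop above can be sketched Houdini-style: repeatedly drop every candidate that is not preserved by the transition under the current conjunction, until a fixpoint. The preservation check is abstracted here as a callback (in the paper it is the abstract checker; this toy instance is illustrative):

```java
import java.util.*;
import java.util.function.BiPredicate;

class Weakening {
    // preserved.test(current, p) stands for: current ∧ transition ⇒ p′
    static Set<String> weaken(Set<String> candidates,
                              BiPredicate<Set<String>, String> preserved) {
        Set<String> cur = new HashSet<String>(candidates);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Iterator<String> it = cur.iterator(); it.hasNext(); ) {
                String p = it.next();
                if (!preserved.test(cur, p)) { it.remove(); changed = true; }
            }
        }
        return cur;   // may shrink all the way to the empty (trivial) invariant
    }
    public static void main(String[] args) {
        // toy instance: P1 is always preserved, P2 only while P1 remains, P3 never
        BiPredicate<Set<String>, String> pres =
            (cur, p) -> p.equals("P1") || (p.equals("P2") && cur.contains("P1"));
        System.out.println(
            weaken(new HashSet<String>(Arrays.asList("P1", "P2", "P3")), pres));
    }
}
```

The fixpoint here is {P1, P2}: dropping P3 does not invalidate P2, since P1 survives every round.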
87
Weakening illustration
[figure: candidates P1, P2, P3 shrinking to an inductive subset]
88
Abstract proof rule
I1: α(Initial) ⊑ φ                        // formula abstraction
I2: abs_transition(α(γ(φ))) ⊑ φ′          // abstract transformer
I3: γ(φ) ⇒ SIM                            // approximate entailment
⇒ DSTM × RSS ⊨ SIM
89
Conclusion
Novel invariant generation using templates
– Extends the applicability of invisible invariants
Abstract domain and reasoning to check invariants without state explosion
Proved strict serializability for TPL, DSTM, TL2
– BLAST and TVLA failed on these
90
Verification results propertyTPLDSTMTL2RSS Bound for invariant gen. (2,1) No. cubes81843447296 Bounded time481023 Invariant mining time 6132657 #templates28 #candidates22539719 #proved22306814 #minimal485- avg, time per invariant 3.720.33643.4 avg. abs. size31.7256.91.19k2.86k Total time3.5m54.3m129.3m30.9m 90
91
Insights on transactional memories
The transition relation is symmetric – thread identifiers are not used:
  p′(t,v) = … t1 … t2 …
The executing thread t interacts only with an arbitrary thread or with a conflict-adjacent thread
– Arbitrary thread: ∃v. v ∈ TL2.rs(t1) ∨ v ∈ TL2.ws(t1)
– Conflict-adjacent: ∃v. v ∈ DSTM.rs(t1) ∧ v ∈ DSTM.ws(t)
92
Conflict adjacency
– Read-write conflict: ∃v. v ∈ rs(t) ∧ v ∈ DSTM.ws(t2)
– Write-write conflict: ∃v. v ∈ ws(t1) ∧ v ∈ DSTM.ws(t2)
[figure: read/write sets of threads t, t2, t3 overlapping on v1, v2]
93
Conflict adjacency
[figure: chain of conflict-adjacent threads t, t2, t3, …]
94
Related work
– Reduction theorems: Guerraoui et al. [PLDI'08, CONCUR'08]
– Manually supplied invariants (fixed number of threads and variables + PVS): Cohen et al. [FMCAD'07]
– Predicate abstraction + shape analysis: SLAM, BLAST, TVLA
95
Related work
– Invisible invariants: Arons et al. [CAV'01], Pnueli et al. [TACAS'01]
– Templates for arithmetic constraints
– Indexed predicate abstraction: Lahiri et al. [TOCL'07]
– Thread quantification: Berdine et al. [CAV'08], Segalov et al. [APLAS'09]
96
Thank You! 96