Shape Analysis for Fine-Grained Concurrency using Thread Quantification

Josh Berdine, Microsoft Research

Joint work with: Tal Lev-Ami, Roman Manevich, Mooly Sagiv (Tel Aviv), Ganesan Ramalingam (MSR India)
Non-blocking stack [Treiber, 86]

    void push(Stack *S, data_type v) {
          Node *t;
    [1]   Node *x = alloc(sizeof(Node));
    [2]   x->d = v;
    [3]   do {
    [4]     t = S->Top;
    [5]     x->n = t;
    [6]   } while (!CAS(&S->Top, t, x));
    [7] }

    data_type pop(Stack *S) {
          Node *t, *s;
          data_type r;
    [8]   do {
    [9]     t = S->Top;
    [10]    if (t == NULL)
    [11]      return EMPTY;
    [12]    s = t->n;
    [13]    r = t->d;
    [14]  } while (!CAS(&S->Top, t, s));
    [15]  return r;
    [16] }

CAS(&S->Top, t, x), executed atomically:
    if (S->Top == t) { S->Top = x; evaluate to true; } else evaluate to false;

Properties of the program
–benign data races
–unbounded number of threads
Questions to answer
–t points to valid memory?
–list remains acyclic?
–Stack linearizable?
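The listing above leaves alloc, CAS, and EMPTY abstract. A minimal sketch of how it could be written with C11 atomics follows, assuming int data, a hypothetical EMPTY sentinel of -1, and a hypothetical stack_init helper; none of these choices come from the talk. The sketch does not reclaim popped nodes, so it sidesteps rather than solves the memory-safety question raised on the slide.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define EMPTY (-1)   /* hypothetical sentinel; the slide leaves EMPTY abstract */

typedef int data_type;   /* assumption: the slide does not fix data_type */

typedef struct Node {
    data_type d;
    struct Node *n;
} Node;

typedef struct Stack {
    _Atomic(Node *) Top;
} Stack;

/* CAS as described on the slide: atomically, if *loc == expected then
 * store desired and return true; otherwise return false. */
static bool cas(_Atomic(Node *) *loc, Node *expected, Node *desired) {
    /* atomic_compare_exchange_strong overwrites its `expected` argument on
     * failure; `expected` is a by-value copy here, so the caller's t is
     * left untouched. */
    return atomic_compare_exchange_strong(loc, &expected, desired);
}

/* Hypothetical helper: start with an empty stack. */
void stack_init(Stack *S) { atomic_init(&S->Top, NULL); }

void push(Stack *S, data_type v) {
    Node *x = malloc(sizeof *x);
    if (x == NULL) abort();
    x->d = v;
    Node *t;
    do {
        t = atomic_load(&S->Top);
        x->n = t;
    } while (!cas(&S->Top, t, x));
}

data_type pop(Stack *S) {
    Node *t, *s;
    data_type r;
    do {
        t = atomic_load(&S->Top);
        if (t == NULL)
            return EMPTY;
        s = t->n;
        r = t->d;
    } while (!cas(&S->Top, t, s));
    /* Node t is deliberately not freed: safe reclamation is out of scope. */
    return r;
}
```

Called from a single thread, push(&S, 4); push(&S, 7); followed by two pops yields 7 and then 4; the retry loops only matter once several threads race on Top.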
Linearizability [Herlihy and Wing, TOPLAS'90]

Linearizable data structure
–Concurrent operations allowed to be interleaved
–Operations appear to execute atomically
    External observer gets the illusion that each operation takes effect instantaneously at some point between its invocation and its response
    Order of operations of same thread preserved
–Sequential specification defines legal sequential executions

[Figure: timelines for a concurrent LIFO (last in, first out) stack, with threads T1 and T2 executing push(4), push(7), and pop():4 over time.]
Non-linearizable pairs stack

    void push2(Stack *S, data_type v1, data_type v2) {
      push(S, v1);
      push(S, v2);
    }

    void pop2(Stack *S, data_type *v1, data_type *v2) {
      *v2 = pop(S);
      *v1 = pop(S);
    }

[Figure: timeline with push2(4,5), push2(7,8), and pop2():8,5. The result 8,5 corresponds to an illegal sequential execution.]
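One interleaving that produces the result 8,5 can be replayed deterministically. The sketch below is an illustration under assumptions: it uses a plain array stack because each push and pop is treated as a single atomic step, and replay_interleaving and is_legal_sequential_result are hypothetical helpers, not part of the talk.

```c
#include <stdbool.h>

typedef int data_type;

/* A plain sequential stack is enough here: the interleaving is replayed
 * one atomic push/pop at a time. */
typedef struct Stack {
    data_type items[16];
    int size;
} Stack;

static void push(Stack *S, data_type v) { S->items[S->size++] = v; }
static data_type pop(Stack *S)          { return S->items[--S->size]; }

static void pop2(Stack *S, data_type *v1, data_type *v2) {
    *v2 = pop(S);
    *v1 = pop(S);
}

/* Replays one interleaving of T1: push2(4,5) and T2: push2(7,8),
 * followed by pop2(). Hypothetical helper for illustration. */
void replay_interleaving(data_type *v1, data_type *v2) {
    Stack S = { .size = 0 };
    push(&S, 4);   /* T1: first push of push2(4,5)  */
    push(&S, 7);   /* T2: first push of push2(7,8)  */
    push(&S, 8);   /* T2: second push of push2(7,8) */
    push(&S, 5);   /* T1: second push of push2(4,5) */
    pop2(&S, v1, v2);
}

/* If the two push2 calls ran one after the other, pop2 could only return
 * the pair pushed last: (4,5) or (7,8). */
bool is_legal_sequential_result(data_type v1, data_type v2) {
    return (v1 == 4 && v2 == 5) || (v1 == 7 && v2 == 8);
}
```

After replay_interleaving(&a, &b), a is 8 and b is 5, which is_legal_sequential_result rejects. Each individual push and pop is atomic, yet the composed operations are not linearizable.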
Outline

Motivation + what is linearizability
Universally quantified shape abstractions
Checking linearizability
Case studies
Concurrent heaps [Yahav, POPL01]

Heaps contain both threads and objects

[Figure: heap with two thread objects (pc=6 and pc=2) and list objects. Legend: thread object with program counter; thread-local variable (x, t); global variable (Top); list field (n); list object.]
Concurrent heaps [Yahav, POPL01]

Heaps contain both threads and objects
–Logical structure, or
–Formula in subset of $FO^{TC}$ [Yorsh et al., TOCL07]

[Figure: heap with threads $tr_1$ (pc=6) and $tr_2$ (pc=2) and objects $v_1$, $v_2$, $v_3$.]

$pc(tr_1)=6 \land pc(tr_2)=2 \land \exists v_1,v_2,v_3.\ Top(v_1) \land x(tr_1,v_2) \land t(tr_1,v_1) \land x(tr_2,v_3) \land n(v_2,v_1) \land \dots$
Unbounded concurrent heaps

    void push(Stack *S, data_type v) {
          Node *t;
    [1]   Node *x = alloc(sizeof(Node));
    [2]   x->d = v;
    [3]   do {
    [4]     t = S->Top;
    [5]     x->n = t;
    [6]   } while (!CAS(&S->Top, t, x));
    [7] }

Unbounded parallel composition: push(Top,?) || ... || push(Top,?)

[Figure: one list reachable from Top, shared by many threads at pc=1, pc=2, pc=5, and pc=6, each with its own x and t.]
Thread-relative subheaps

Each subheap
–Presents a view of heap relative to one thread
–Can be instantiated 0 or more times

[Figure: thread-relative subheaps for threads at pc=1, pc=2, pc=5, and pc=6.]
Bounded thread-relative subheaps

Each subheap
–Presents a view of heap relative to one thread
–Can be instantiated 0 or more times
–Bounded by finitary abstraction

[Figure: bounded thread-relative subheaps for threads at pc=1, pc=2, pc=4, and pc=6.]
Concurrent heap

[Figure: heap with threads $tr_1$ (pc=6) and $tr_2$ (pc=2) and objects $v_1$, $v_2$, $v_3$.]

$pc(tr_1)=6 \land pc(tr_2)=2 \land \exists v_1,v_2,v_3.\ Top(v_1) \land x(tr_1,v_2) \land t(tr_1,v_1) \land x(tr_2,v_3) \land n(v_2,v_1) \land \dots$
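Universally quantified local heaps

[Figure: two local heaps, one for pc=6 and one for pc=2, each drawn relative to the symbolic thread $t$.]

$\forall t.\ \big(pc(t)=6 \land \exists v_1,v_2.\ Top(v_1) \land x(t,v_2) \land t(t,v_1) \land n(v_2,v_1) \land \dots\big) \lor \big(pc(t)=2 \land \exists v_1,v_3.\ Top(v_1) \land x(t,v_3) \land \dots\big)$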
Meaning of quantified invariant

$\forall t.\ \big(pc(t)=6 \land \exists v_1,v_2.\ Top(v_1) \land x(t,v_2) \land t(t,v_1) \land n(v_2,v_1) \land \dots\big) \lor \big(pc(t)=2 \land \exists v_1,v_3.\ Top(v_1) \land x(t,v_3) \land \dots\big)$

Information maintained
–(dis)equalities between local variables of each thread and global variables
–Objects reachable from global variables
Information lost
–(dis)equalities between local variables of different threads
–Number of threads

[Figure: concrete heaps with varying numbers of threads (multiplicities ×m, ×n) alongside the local heaps for pc=6 and pc=2.]
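One way to read the quantified invariant is as a per-thread check on a concrete state. The sketch below is an illustration only: it covers just the two local-heap shapes shown for pc=6 and pc=2 and ignores the elided conjuncts, and the Thread record, local_heap_ok, and invariant_holds are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Node { struct Node *n; } Node;

/* Hypothetical concrete thread record: program counter plus the two
 * thread-local variables used by push. */
typedef struct Thread {
    int pc;
    Node *x;
    Node *t;
} Thread;

/* Checks one thread against the two local-heap shapes shown on the slide. */
static bool local_heap_ok(const Thread *th, const Node *Top) {
    if (th->pc == 6)   /* Top(v1), x(t,v2), t(t,v1), n(v2,v1) */
        return Top != NULL && th->t == Top &&
               th->x != NULL && th->x->n == Top;
    if (th->pc == 2)   /* Top(v1), x(t,v3) */
        return Top != NULL && th->x != NULL;
    return false;      /* this sketch has no disjunct for other pcs */
}

/* Universal quantification over threads: every thread is checked on its
 * own against the shared Top, so the thread count plays no role. */
bool invariant_holds(const Thread *threads, size_t count, const Node *Top) {
    for (size_t i = 0; i < count; i++)
        if (!local_heap_ok(&threads[i], Top))
            return false;
    return true;
}
```

The check never compares the local variables of two different threads and never looks at count beyond iterating, which mirrors the two items listed under "Information lost".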
Verification of fixed linearization points [Amit et al., CAV07]

Compare each concurrent execution to a specific sequential execution
Show that every (terminating) concurrent operation returns the same result as its sequential counterpart

[Figure: a concurrent execution and a sequential execution, with each operation's linearization point marked and results compared; below, a conjoined execution in which results are compared.]
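The comparison can be illustrated with a runtime analogue. The talk's analysis works statically over abstract states; the sketch below instead instruments a single-threaded run, keeping an array stack as the sequential specification, updating it at the linearization point (the successful CAS), and comparing results. The Conjoined struct, conj_push, conj_pop, and the choice of linearization point for an empty pop are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define EMPTY   (-1)   /* hypothetical sentinel */
#define SEQ_CAP 64     /* hypothetical capacity of the reference stack */

typedef int data_type;
typedef struct Node { data_type d; struct Node *n; } Node;

typedef struct Conjoined {
    Node *Top;                 /* state of the implementation under test */
    data_type seq[SEQ_CAP];    /* sequential specification: array stack  */
    int seq_size;
    bool mismatch;             /* set when a result differs from the spec */
} Conjoined;

/* Stand-in for CAS; this sketch is single-threaded, so no atomics. */
static bool cas(Node **loc, Node *expected, Node *desired) {
    if (*loc != expected) return false;
    *loc = desired;
    return true;
}

void conj_push(Conjoined *c, data_type v) {
    Node *x = malloc(sizeof *x);
    if (x == NULL || c->seq_size == SEQ_CAP) abort();
    x->d = v;
    Node *t;
    do {
        t = c->Top;
        x->n = t;
    } while (!cas(&c->Top, t, x));
    /* Linearization point: the successful CAS. Run the sequential push. */
    c->seq[c->seq_size++] = v;
}

data_type conj_pop(Conjoined *c) {
    Node *t, *s;
    data_type r;
    do {
        t = c->Top;
        if (t == NULL) {
            /* Assumed linearization point for an empty pop: the read of Top. */
            if (c->seq_size != 0) c->mismatch = true;
            return EMPTY;
        }
        s = t->n;
        r = t->d;
    } while (!cas(&c->Top, t, s));
    /* Linearization point: the successful CAS. Run the sequential pop
     * and compare its result with the implementation's. */
    data_type expected = (c->seq_size > 0) ? c->seq[--c->seq_size] : EMPTY;
    if (r != expected) c->mismatch = true;
    free(t);   /* safe only because this sketch is single-threaded */
    return r;
}
```

In a real concurrent setting the reference update would have to happen atomically with the CAS; here that holds trivially because only one thread runs.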
Conjoined execution for push

    void push(Stack *S, data_type v) {
          Node *t;
    [1]   Node *x = alloc(sizeof(Node));
    [2]   x->d = v;
    [3]   do {
    [4]     t = S->Top;
    [5]     x->n = t;
    [6]   } while (!CAS(&S->Top, t, x));   /* on CAS */
    [7] }

CAS(&S->Top, t, x), executed atomically:
    if (S->Top == t) { S->Top = x; evaluate to true; } else evaluate to false;

Figure labels
–concurrent state, sequential view, isomorphism relation
–conjoined state, duo-object
–delta object: tracks differences between concurrent and sequential execution per thread

[Figure: conjoined states of one thread running push, at pc=1, pc=2 (x), …, pc=5 (x, t), pc=6 (x, t, n), and pc=7.]
Run operation sequentially

[Figure: starting from the conjoined state at pc=7 of push, the operation is replayed step by step (x, t, n) on the sequential view.]

Check results: concurrent and sequential stacks are correlated
Observations used

Unbounded number of heap objects
–Number of delta objects created per thread is bounded
–Objects in recursive data structures bounded by existing shape abstractions
Delta objects always referenced by local or global variables
–Captured by a single thread's view of the heap
Threads mutate data structures near global access points
–Can precisely model success/failure of CAS without looking deep into heap
Losing most inter-thread correlations is ok
–Fine-grained programs must protect themselves from interference
Case studies

Verified programs                                     #states    time (sec.)
Non-blocking stack [Treiber 1986]
Two-lock queue [Michael & Scott, PODC 1996]           3,
Non-blocking queue [Doherty & Groves, FORTE 2004]     10,
Related work

[Gotsman et al., PLDI07]
–Thread-modular shape analysis for coarse-grained concurrency
[Vafeiadis et al., 06, 07, 08]
–Linearizability for an unbounded number of threads with rely-guarantee & separation logic
Conclusion

Strengths
–Parametric shape abstraction for an unbounded number of threads
–Verifies linearizability of fine-grained concurrent implementations
–Tunable scalability via thread-modular aspects
–Tunable precision via abstract semantics using multiple instantiations of invariants
Limitations / Future work
–Fixed, specified linearization points
–Setting the framework's knobs optimally can be difficult and requires understanding the program
–Only as good as the underlying heap abstraction
–Does not prove encapsulation of the data structure
–May want to prove more than linearizability
An unbounded state

    void push(Stack *S, data_type v) {
          Node *t;
    [1]   Node *x = alloc(sizeof(Node));
    [2]   x->d = v;
    [3]   do {
    [4]     t = S->Top;
    [5]     x->n = t;
    [6]   } while (!CAS(&S->Top, t, x));   /* on CAS */
    [7] }

[Figure: threads at pc=1, pc=2, pc=4, and pc=6 sharing the list reachable from Top.]

unbounded number of delta objects
Bounded local states

[Figure: local states for pc=1, pc=2 (x), pc=4 (x, t), and pc=6 (x, t, n), each with the list reachable from Top.]

number of delta objects per local heap bounded
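Loss of non-aliasing information

$\forall t.\ pc(t)=6 \land \exists v_1,v_2.\ Top(v_1) \land x(t,v_2) \land t(t,v_1) \land n(v_2,v_1) \land \dots$

[Figure: the local heap for pc=6 and a concrete heap with two threads at pc=6 whose local variables alias: unwanted aliasing.]

consider x->n=t
Remedy: record non-aliasing information explicitly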
Adding non-aliasing information

$\forall t.\ pc(t)=6 \land \exists v_1,v_2.\ Top(v_1) \land x(t,v_2) \land t(t,v_1) \land n(v_2,v_1) \land Private(v_1) \land Private(v_2) \land \dots$

[Figure: local heap for pc=6 with objects marked P (private).]

Private: referenced by exactly one thread
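Adding non-aliasing information

$\forall t.\ pc(t)=6 \land \exists v_1,v_2.\ Top(v_1) \land x(t,v_2) \land t(t,v_1) \land n(v_2,v_1) \land Private(v_1) \land Private(v_2) \land \dots$

[Figure: local heap for pc=6 and a concrete heap with two threads at pc=6, with objects marked P (private).]

Operation on private objects invisible to other threads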
Recap

Add universal quantification on top of finitary heap abstractions
–Handle unbounded number of threads
Local heaps can overlap
–Handle fine-grained concurrency
Strengthen local heaps by Private predicate
–Private objects cannot be affected by actions of other threads
Missing: transformers (see paper)
Shape analysis with delta abstraction for unbounded threads

Tracks bounded differences between concurrent and sequential execution per thread
–Abstracts two heaps together
–Handles unbounded number of threads
Abstracts correlations between threads
–Thread-modular characteristics
Linearization points for Treiber's stack

    void push(Stack *S, data_type v) {
          Node *t;
    [1]   Node *x = alloc(sizeof(Node));
    [2]   x->d = v;
    [3]   do {
    [4]     t = S->Top;
    [5]     x->n = t;
    [6]   } while (!CAS(&S->Top, t, x));   /* linearization point: on CAS */
    [7] }

    data_type pop(Stack *S) {
          Node *t, *s;
          data_type r;
    [8]   do {
    [9]     t = S->Top;
    [10]    if (t == NULL)
    [11]      return EMPTY;
    [12]    s = t->n;
    [13]    r = t->d;
    [14]  } while (!CAS(&S->Top, t, s));   /* linearization point: on CAS */
    [15]  return r;
    [16] }
What's missing from the talk?

Generic technique for lifting abstract domains with universal quantifiers
Abstract transformers
–Thread instantiation
Combining universal quantification with heap decomposition
Can you handle mutex?

Yes with Canonical Abstraction
    $\forall t_1.\ \{ \dots\ \forall t_2.\ \dots \}$
Not with Boolean Heaps
–Only one level of quantification
Requirements from base domain

Support free variables (u,v,w)
Support join and meet operations
Constructing the correlation relation

Incrementally constructed during execution
Nodes allocated by matching push operations are correlated
Correlated nodes have equal data values
–Show that matching pops return data values of correlated nodes
Fixed linearization points

Every operation has a (user-specified) fixed linearization point
–Statement at which the operation appears to take effect
Show that these linearization points are correct for every concurrent execution
User may specify
–Several (alternative) linearization points
–Certain types of conditional linearization points, e.g., successful CAS operations
Stack's most-general client

    void client(Stack *S) {
      do {
        if (?)                 /* ? : nondeterministic choice */
          push(S, rand());
        else
          pop(S);
      } while (1);
    }
Main results

New parametric shape analysis
–Universally quantified shape abstractions
    Extra level of quantification over shape abstraction
–Fine-grained concurrency
–Unbounded number of threads
–Thread-modular aspects
Sound transformers
Application
–Checking linearizability of concurrent data structures