Inherent limitations facilitate design and verification of concurrent programs
Hagit Attiya, Technion
Concurrent Programs
Core challenge is synchronization
–Correct synchronization is hard to get right
–Efficient synchronization is even harder
Ad-hoc vs. principled, manual vs. automatic
EXAMPLE I: VERIFYING LOCKING PROTOCOLS
Work with Ramalingam and Rinetzky (POPL 2010)
The Goal: Sequential Reductions
Verify concurrent data structures
–Pre-execution static analysis
–E.g., a linked list with hand-over-hand locking: no memory leaks, shape (it's a list), serializability
Find sequential reductions
–Consider only sequential executions
–But conclude that properties hold in all executions
Back-of-envelope estimate of gain
Static analysis of a linked-list algorithm [Amit, Rinetzky, Reps, Sagiv, Yahav, CAV 2007]
–Verifies, e.g., memory safety, sortedness, pointed-to by a variable, heap sharing

Configuration                Time    Memory
One thread (sequential)      10s     3.6MB
Two threads (interleaved)    ~4h     886MB
Three threads (interleaved)  > 8h    ----
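Serializability [Papadimitriou 79]
An interleaved execution is serializable if it looks the same (~), to each thread locally, as a complete non-interleaved execution
[Figure: operations of an interleaved execution matched (~) to those of a complete non-interleaved execution]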
Serializability assists verification
Concurrent code M
–Π = all executions of M
–cni-Π = complete non-interleaved executions of M (a small subset of Π)
–φ = a property local to the threads
If M is serializable, then Π ⊨ φ ⟺ cni-Π ⊨ φ
Easily derived from [Papadimitriou 79]
How do we know that M is serializable without considering all executions?
E.g., from only complete non-interleaved executions
If M is serializable, then Π ⊨ φ ⟺ cni-Π ⊨ φ
Special (and common) case: Disciplined programming with locks
Guard access to data with locks
–Lock() acquires the lock
–Unlock() releases the lock
–Only one process holds the lock at any time
Follow a locking protocol that guarantees conflict serializability
–E.g., two-phase locking (2PL) or tree locking (TL)
Two-phase locking [Papadimitriou 79]
A lock-acquire (growing) phase is followed by a lock-release (shrinking) phase
No lock is acquired after some lock is released
[Figure: example with threads t1 and t2 and a node H]
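The two-phase rule can be checked mechanically on one thread's sequence of lock events. A minimal Python sketch, assuming a hypothetical trace format of (op, lock) pairs that is not part of the talk:

```python
def follows_2pl(events):
    """Return True if one thread's lock events obey two-phase locking.

    `events` is a list of (op, lock) pairs with op in {"acquire", "release"};
    this trace format is an assumption made for illustration.
    """
    shrinking = False  # becomes True after the first release
    for op, _lock in events:
        if op == "release":
            shrinking = True
        elif op == "acquire" and shrinking:
            return False  # an acquire after some release breaks 2PL
    return True


# Hypothetical traces: the first grows then shrinks, the second does not.
ok_trace = [("acquire", "A"), ("acquire", "B"), ("release", "A"), ("release", "B")]
bad_trace = [("acquire", "B"), ("release", "B"), ("acquire", "A")]
```

Under these assumptions, `follows_2pl(ok_trace)` holds and `follows_2pl(bad_trace)` does not.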
Tree (hand-over-hand) locking [Kedem & Silberschatz 76] [Samadi 76] [Bayer & Schkolnick 77]
Except for the first lock, acquire a lock only while holding the lock on its parent
No lock is reacquired after being released
[Figure: example with threads t1 and t2 and a node H]
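The two tree-locking rules can also be checked on a single thread's lock events. A minimal Python sketch, assuming a hypothetical (op, node) trace format and a `parent` map describing the tree; neither comes from the talk:

```python
def follows_tree_locking(events, parent):
    """Check one thread's lock events against the tree-locking rules.

    `events` is a list of (op, node) pairs with op in {"acquire", "release"};
    `parent` maps each node to its parent (the root maps to None).
    Both formats are assumptions made for illustration.
    """
    held = set()      # locks currently held
    released = set()  # locks released at some point
    first = True      # the first acquire may target any node
    for op, node in events:
        if op == "acquire":
            if node in released:
                return False  # no lock is reacquired after release
            if not first and parent.get(node) not in held:
                return False  # must hold the lock on the parent
            held.add(node)
            first = False
        else:
            held.discard(node)
            released.add(node)
    return True


# Hypothetical list H -> n1 -> n2, viewed as a degenerate tree.
parent = {"H": None, "n1": "H", "n2": "n1"}
hand_over_hand = [("acquire", "H"), ("acquire", "n1"), ("release", "H"),
                  ("acquire", "n2"), ("release", "n1"), ("release", "n2")]
skips_parent = [("acquire", "H"), ("release", "H"), ("acquire", "n1")]
```

The `hand_over_hand` trace is accepted even though it acquires n2 after releasing H, so it is tree-locked without being two-phase locked.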
void p() {
  acquire(B);
  B = 0;
  release(B);
  int b = B;          // reads B after releasing its lock
  if (b) acquire(A);  // an acquire after a release
}

void q() {
  acquire(B);
  B = 1;
  release(B);
}
Not two-phase locked, but only in interleaved executions
Yes!
–for databases
–a concurrency control monitor ensures that M follows the locking policy at run-time, so M is serializable
No!
–for static analysis
–no central monitor
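The claim that p() and q() break 2PL only in interleaved executions can be confirmed by brute force. A minimal Python sketch, assuming B is initially 0 and encoding the two methods in a hypothetical instruction format; the explorer is an illustration, not the analysis from the talk:

```python
# p() and q() from the slide, in a hypothetical instruction encoding.
P = [("acquire", "B"), ("write", "B", 0), ("release", "B"),
     ("read", "B", "b"), ("acquire_if", "b", "A")]
Q = [("acquire", "B"), ("write", "B", 1), ("release", "B")]


def violates_2pl(programs, interleaved):
    """Search executions for a thread that acquires a lock after a release.

    With interleaved=False a thread, once started, runs to completion before
    another thread may take a step (complete non-interleaved executions).
    """
    n = len(programs)

    def search(pcs, shared, owner, local, released, running):
        for t in range(n):
            if pcs[t] == len(programs[t]):
                continue  # thread t has finished
            if not interleaved and running is not None and running != t:
                continue  # another thread is in the middle of its run
            ins = programs[t][pcs[t]]
            op, args = ins[0], ins[1:]
            sh, ow, lo, rel = dict(shared), dict(owner), dict(local), set(released)
            if op == "acquire_if":  # args = (local variable, lock)
                op, args = ("acquire", args[1:]) if lo[(t, args[0])] else ("skip", ())
            if op == "acquire":
                if ow.get(args[0]) is not None:
                    continue  # lock is held, so t is blocked
                if t in rel:
                    return True  # acquire after a release: 2PL violated
                ow[args[0]] = t
            elif op == "release":
                ow[args[0]] = None
                rel.add(t)
            elif op == "write":
                sh[args[0]] = args[1]
            elif op == "read":
                lo[(t, args[1])] = sh[args[0]]
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            done = new_pcs[t] == len(programs[t])
            if search(new_pcs, sh, ow, lo, rel, None if done else t):
                return True
        return False

    # Assumption: B starts at 0 (the slide does not give an initial value).
    return search((0,) * n, {"B": 0}, {}, {}, set(), None)
```

Under these assumptions, `violates_2pl([P, Q], interleaved=True)` finds the schedule in which q() runs between p()'s release and its read of B, while `violates_2pl([P, Q], interleaved=False)` finds no violation.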
Our Goal
Statically verify that M follows a locking policy
Applies to local conflict-serializable locking protocols
–Depend only on a thread's local variables & the global variables locked by it
–E.g., two-phase locking, tree locking, (dynamic) DAG locking…
–But not protocols that rely on a concurrency control monitor!
Thread-local properties
A thread-owned view contains the values of the thread's local variables & the global variables locked by it
A property φ is thread-local if it
–Can be expressed in terms of thread-owned views
–Is prefix closed
A thread-local property of an execution holds in every execution indistinguishable from it
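One plausible reading of a thread-owned view is a projection of the global state onto what the thread owns. A minimal Python sketch; the state layout (`local_vars`, `global_vars`, `lock_owner`) is a hypothetical encoding chosen for illustration:

```python
def thread_owned_view(thread, local_vars, global_vars, lock_owner):
    """Project a global state onto what `thread` owns.

    local_vars:  {thread: {name: value}}
    global_vars: {name: value}
    lock_owner:  {global name: owning thread or None}
    These shapes are assumptions made for illustration.
    """
    owned_globals = {g: v for g, v in global_vars.items()
                     if lock_owner.get(g) == thread}
    return {"locals": dict(local_vars.get(thread, {})), "globals": owned_globals}


# Two hypothetical states that differ only in Y, which t1 has not locked.
locals_ = {"t1": {"b": 0}, "t2": {}}
owner = {"X": "t1", "Y": "t2"}
view_a = thread_owned_view("t1", locals_, {"X": 1, "Y": 2}, owner)
view_b = thread_owned_view("t1", locals_, {"X": 1, "Y": 7}, owner)
```

Here `view_a == view_b`, so a property expressed over t1's view cannot tell the two states apart.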
Our contribution: Easy step
ni-Π: non-interleaved executions of M
For any local conflict-serializable locking policy LP: Π ⊨ LP ⟺ ni-Π ⊨ LP
–Two-phase locking
–Tree locking
–Dynamic tree locking
–Dynamic DAG locking
For any thread-local property φ: Π ⊨ φ ⟺ ni-Π ⊨ φ
[Figure: a non-interleaved execution]
Reduction to non-interleaved executions: Proof idea
σ (t,e) is the shortest execution that does not follow LP
Its prefix σ follows LP, which guarantees conflict-serializability
So there is a non-interleaved execution σ_ni equivalent to σ
Then σ_ni (t,e) is a non-interleaved execution similar to σ (t,e), where LP is violated
Ni-reduction: Proof sketch
There is a ni-execution σ_ni that is equivalent to σ
Then σ_ni (t,e) is a ni-execution where LP is violated
Ni-reduction: Proof sketch
There is a ni-execution σ_ni with the same conflicts as in σ
–t can execute e also after σ_ni
Write σ_ni = σ_1 σ_t σ_2, where σ_t is the sub-execution by thread t
–t can execute e also after σ_1 σ_t
σ_1 σ_t (t,e) is a ni-execution, so by the assumption on ni-executions it follows the locking protocol
Since σ_1 σ_t (t,e) and σ (t,e) are conflict equivalent, σ (t,e) follows the locking protocol
Further reduction
acni-Π: almost-complete non-interleaved executions of M
For any local conflict-serializable (LCS) locking policy LP: Π ⊨ LP ⟺ acni-Π ⊨ LP
[Figure: an almost-complete non-interleaved execution]
Reduction to non-interleaved executions: A complication
Need to argue about termination

int X = 0, Y = 0;

void p() {
  acquire(Y);
  int y = Y;
  release(Y);
  if (y != 0) {
    acquire(X);   // observes Y == 1 & violates 2PL
    X = 3;
    release(X);
  }
}

void q() {
  if (random(5) == 3) {
    acquire(Y);
    Y = 1;        // Y is set to 1 & the method enters an infinite loop
    release(Y);
    while (true) nop;
  }
}
The violation in p() shows up only after q() has set Y and stopped terminating, so no complete non-interleaved execution contains it
Reduction to non-interleaved executions: Termination
Can use sequential reduction to verify termination
For any terminating local conflict-serializable locking policy LP: Π ⊨ LP ⟺ acni-Π ⊨ LP
Acni-reduction: Proof ideas
Start from a ni-execution (rely on the previous ni-reduction to get there)
Create its equivalent completion, if possible
–Does not access variables accessed by later threads
Not always possible, e.g., t1:lock(v), t1:lock(u), t2:lock(u)
[Figure: variables u and v]
Implications for static analysis
Pessimistic analysis (over-approximate)
–Analyze a module from every possible state
Semi-optimistic analysis (ni-reduction)
–Analyze a module only from states that occur after a sequence of modules ran one after the other (not to completion)
Optimistic analysis (precise; acni-reduction)
–Analyze a module only from states that occur after a sequence of modules ran to completion (one after the other)
Initial analysis results
Shape analysis of hand-over-hand lists

Method             Time     Memory
Our method         3.5s     4.0MB
TVLA prior         596.1s   90.3MB
Separation logic*  0.4s     0.2MB

*Does not verify sortedness of list and fails to verify linearizability in some cases

Shape analysis of hand-over-hand trees (for the first time)

Method             Time     Memory
Our method         124.6s   90.6MB
What's next?
Extend to shared (read) locks
Extend to software transactional memory
–aborted transactions
–non-locking, non-conflict-based serializability (e.g., using timestamps)
Combine with other reductions [Guerraoui, Henzinger, Jobstmann, Singh]
EXAMPLE II: REQUIRED MEMORY ORDERINGS
Work with Guerraoui, Hendler, Kuznetsov, Michael and Vechev (POPL 2011)
Relaxed memory models
Out-of-order execution of memory accesses, to compensate for slow writes
Optimize by issuing a read before a write that precedes it in program order, if they access different locations
Reordering may lead to inconsistency
Read-after-write (RAW) Reordering
Process P: Write(X,1); Read(Y)
Process Q: Write(Y,1); Read(X)
[Figure: timelines of P and Q with operations W(X,1), R(Y), W(Y,1), R(X)]
Avoiding out-of-order: Read-after-write (RAW) Fence
Process P: Write(X,1); FENCE; Read(Y)
Process Q: Write(Y,1); FENCE; Read(X)
[Figure: timelines of P and Q with operations W(X,1), R(Y), W(Y,1), R(X)]
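The effect of the fence on P and Q can be seen by enumerating every execution of a small machine. A minimal Python sketch, assuming a simplified per-process FIFO store buffer (a TSO-like model chosen for illustration, not a model named in the talk) and initial values X = Y = 0:

```python
P = [("write", "X", 1), ("read", "Y")]
Q = [("write", "Y", 1), ("read", "X")]
P_FENCED = [("write", "X", 1), ("fence",), ("read", "Y")]
Q_FENCED = [("write", "Y", 1), ("fence",), ("read", "X")]


def outcomes(programs):
    """Return all possible tuples of read results, one read per process.

    Assumed model: a write enters the writer's FIFO store buffer and reaches
    memory at an arbitrary later point; a read sees the reader's own buffered
    value if any, else memory; a fence waits until the buffer is empty.
    """
    results = set()
    n = len(programs)

    def search(pcs, memory, buffers, reads):
        progressed = False
        for t in range(n):
            # Option 1: flush the oldest buffered write of process t.
            if buffers[t]:
                var, val = buffers[t][0]
                mem = dict(memory)
                mem[var] = val
                bufs = buffers[:t] + (buffers[t][1:],) + buffers[t + 1:]
                search(pcs, mem, bufs, reads)
                progressed = True
            # Option 2: execute the next instruction of process t.
            if pcs[t] != len(programs[t]):
                ins = programs[t][pcs[t]]
                bufs, rds = buffers, reads
                if ins[0] == "fence" and buffers[t]:
                    continue  # fence is blocked until the buffer drains
                if ins[0] == "write":
                    entry = (ins[1], ins[2])
                    bufs = buffers[:t] + (buffers[t] + (entry,),) + buffers[t + 1:]
                elif ins[0] == "read":
                    own = [v for (x, v) in buffers[t] if x == ins[1]]
                    val = own[-1] if own else memory[ins[1]]
                    rds = reads[:t] + (val,) + reads[t + 1:]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                search(new_pcs, memory, bufs, rds)
                progressed = True
        if not progressed:
            results.add(reads)  # all processes done and all buffers empty

    search((0,) * n, {"X": 0, "Y": 0}, ((),) * n, (None,) * n)
    return results
```

In this model, `outcomes([P, Q])` contains (0, 0), the inconsistent result in which both reads miss the other process's write, while `outcomes([P_FENCED, Q_FENCED])` does not.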
Avoiding out-of-order: Atomic Operations
Atomic operations: atomic-write-after-read (AWAR)
–E.g., CAS, TAS, Fetch&Add,…
atomic{ read(Y) … write(X,1) }
RAW fences / AWAR are ~60 slower than (remote) memory accesses
Our result
Any concurrent program in a certain class must use RAW/AWARs
Which programs?
Concurrent data types:
–queues, counters, hash tables, trees,…
–Non-commutative operations
–Linearizable solo-terminating implementations
Mutual exclusion
Non-commutative operations
Operation A is non-commutative if there is an operation B where: A influences B and B influences A
Example: Queue
enq(v) adds v to the end of the queue
deq() dequeues the item at the head of the queue
Q.deq():1; Q.deq():2 vs. Q.deq():2; Q.deq():1
–deq() operations influence each other
Q.enq(3):ok; Q.deq():1 vs. Q.deq():1; Q.enq(3):ok
–enq() is not non-commutative
[Figure: states of queue Q]
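The queue example can be replayed against a sequential specification. A minimal Python sketch, assuming a simplified reading of "influences" (running A first changes B's response from a given state); the helper names and the tuple encoding of operations are hypothetical:

```python
def apply_op(state, op):
    """Apply op to an immutable queue state; return (new_state, response)."""
    if op[0] == "enq":
        return state + (op[1],), "ok"
    if not state:
        return state, None  # deq on an empty queue
    return state[1:], state[0]


def influences(a, b, state):
    """True if running `a` first changes the response of `b` from `state`."""
    _, resp_alone = apply_op(state, b)
    after_a, _ = apply_op(state, a)
    _, resp_after = apply_op(after_a, b)
    return resp_alone != resp_after


def mutually_influence(a, b, state):
    """True if a influences b and b influences a from the same state."""
    return influences(a, b, state) and influences(b, a, state)


DEQ = ("deq",)
ENQ3 = ("enq", 3)
queue = (1, 2)  # head is 1, as in the slide
```

From `queue`, `mutually_influence(DEQ, DEQ, queue)` holds, while `influences(DEQ, ENQ3, state)` is false for any state because enq always responds ok; this is why enq has no partner that makes it non-commutative in this sketch.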
Proof Intuition: Writing
If an operation does not write, it does not influence anyone
It would be commutative
[Figure: two deq operations with no shared write, each returning 1; they do not influence each other]
Proof Intuition: Read
If an operation does not read, it is not influenced by anyone
It would be commutative
[Figure: two deq operations with no shared read, each returning 1; they do not influence each other]
Proof Intuition: RAW
[Figure: two deq operations, each returning 1, with a write W and no RAW; linearization]
Mutual exclusion (Mutex)
Two processes do not hold the lock at the same time
(Deadlock-freedom) If a process calls Lock() then some process acquires the lock
Two Lock() operations influence each other!
Every successful lock acquire incurs a RAW fence / AWAR
Who should care?
Concurrent programmers: when is it futile to avoid expensive synchronization
Hardware designers: motivation to lower cost of specific synchronization constructs
API designers: choice of API affects synchronization
Verification engineers: declare incorrect when synchronization is missing
"…although I hope that these shortcomings will be addressed, I hasten to add that they are insignificant compared to the huge step forward that this paper represents…." -- Linux Weekly News, Jan 26, 2011
What else?
Weaker operations? E.g., idempotent work stealing
Tight lower bounds?
Other patterns
–Read-after-read, write-after-write, barriers
And beyond…
The cost of verifying adherence to a locking policy
(Semi-)automatic insertion of lock acquire / release commands or fences
Thank you!