Bernd Fischer RW714: SAT/SMT-Based Bounded Model Checking of Software
Concurrency verification Writing concurrent programs is DIFFICULT: programmers have to guarantee –correctness of the sequential execution of each individual process –under nondeterministic interference from other processes (schedules) rare schedules result in errors that are difficult to find, reproduce, and repair –testers can spend weeks chasing a single bug ⇒ huge productivity problem (figure: processes P1 … PN communicating via a shared communication mechanism)
Concurrency verification Spot the error... void *threadA(void *arg) { lock(&mutex); x++; if (x == 1) lock(&lock); unlock(&mutex); lock(&mutex); x--; if (x == 0) unlock(&lock); unlock(&mutex); } void *threadB(void *arg) { lock(&mutex); y++; if (y == 1) lock(&lock); unlock(&mutex); lock(&mutex); y--; if (y == 0) unlock(&lock); unlock(&mutex); } (CS1) (CS2) Deadlock: e.g., threadB completes CS1 and holds lock; threadA then blocks on lock(&lock) while holding mutex, and threadB in turn blocks on lock(&mutex) in CS2
Concurrency verification What happens here...??? int n=0; //shared variable int P(void) { int tmp, i=1; while (i<=10) { tmp = n; n = tmp + 1; i++; } } int main (void) { id1 = thread_create(P); id2 = thread_create(P); join( id1 ); join( id2 ); assert(n == 20); } Which values can n actually have?
Concurrency verification What happens here...??? int n=0; //shared variable int P(void) { int tmp, i=1; while (i<=10) { tmp = n; n = tmp + 1; i++; } } int main (void) { id1 = thread_create(P); id2 = thread_create(P); // join( id1 ); // join( id2 ); assert(n >= 10 && n <= 20); }
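For reference, a compilable pthreads version of the first variant (with the joins) is sketched below; the slide's thread_create/join correspond to pthread_create/pthread_join:

#include <pthread.h>
#include <assert.h>

int n = 0;                        /* shared variable */

void *P(void *arg) {
    for (int i = 1; i <= 10; i++) {
        int tmp = n;              /* read and write are separate steps...      */
        n = tmp + 1;              /* ...so another thread's update can be lost */
    }
    return NULL;
}

int main(void) {
    pthread_t id1, id2;
    pthread_create(&id1, NULL, P, NULL);
    pthread_create(&id2, NULL, P, NULL);
    pthread_join(id1, NULL);
    pthread_join(id2, NULL);
    assert(n == 20);              /* can fail: lost updates leave n below 20 */
    return 0;
}

With the joins in place n can never exceed 20, but interleavings in which both threads read the same value of n lose updates and leave n below 20.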
Concurrency errors There are two main kinds of concurrency errors: progress errors: deadlock, starvation,... –typically caused by wrong synchronization –requires modeling of synchronization primitives (e.g., mutex locking / unlocking) –requires modeling of (global) error condition safety errors: assertion violation,... –typically caused by data races (i.e., unsynchronized access to shared data) –requires modeling of synchronization primitives –can be checked locally ⇒ focus here on safety errors
Shared memory concurrent programs Concurrent programming styles: communication via message passing –“truly” parallel distributed systems –multiple computations advancing simultaneously communication via shared memory –multi-threaded programs –only one thread active at any given time (conceptually), but active thread can be changed at any given time active == uncontested access to shared memory can be single-core or multi-core ⇒ focus here on multi-threaded, shared memory programs
Multi-threaded programs typical C-implementation: pthreads formed of individual sequential programs (threads) –can be created and destroyed on the fly –typically for BMC: assume upper bound –each possibly with loops and recursive function calls –each with local variables each thread can read and write shared variables –assume sequential consistency: writes are immediately visible to all other threads –weak memory models can be modeled execution is interleaving of thread executions –only valid for sequential consistency
Round-robin scheduling context: segment of a run of an active thread t_i context switch: change of active thread from t_i to t_k –global state is passed on to t_k –context switch back to t_i resumes at old local state (incl. pc) round: formed of one context of each thread round-robin schedule: same order of threads in each round can simulate all schedules by round-robin schedules (figure: a run alternating between threads t1, t2, t3, shown as a sequence of (local, shared) state pairs with context switches)
Context-bounded analysis Important observation: most concurrency errors are shallow, i.e., require only few context switches ⇒ limit the search space by bounding the number of context switches / rounds
Context-bounded analysis But not all concurrency errors are shallow... int i=1, j=1; void * t1(void* arg) { for(int k=0; k<5; k++) i+=j; pthread_exit(NULL); } void * t2(void* arg) { for(int k=0; k<5; k++) j+=i; pthread_exit(NULL); } int main(int argc, char **argv) { pthread_t id1, id2; pthread_create(&id1, NULL, t1, NULL); pthread_create(&id2, NULL, t2, NULL); assert(i < 144 && j < 144); return 0; } (under a perfectly alternating schedule the additions make i and j follow the Fibonacci sequence, so the assertion can only fail after many context switches)
Concurrency verification approaches Explicit schedule exploration (ESBMC) –lazy exploration –schedule recording Partial order methods (CBMC) Sequentialization –KISS –Lal / Reps (eager sequentialization) –Lazy CSeq –memory unwinding
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); program counter: 0 mutexes: m1=0 m2=0 globals: val1=0 val2=0 locals: t1=0 t2=0 Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program state; (value of program counter and program variables) val1 and val2 should be updated synchronously Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); interleaving #1: 1 program counter: 1 mutexes: m1=1 m2=0 globals: val1=0 val2=0 locals: t1=0 t2=0 Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 2 mutexes: m1=1 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 interleaving #1: 1-2 Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 3 mutexes: m1=0 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 interleaving #1: Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 7 mutexes: m1=1 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 CS1 interleaving #1: 1-2-3–7 Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 8 mutexes: m1=1 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 CS1 interleaving #1: 1-2-3–7-8 Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 11 mutexes: m1=1 m2=0 globals: val1=1 val2=0 locals: t1=1 t2=0 CS1 interleaving #1: 1-2-3– Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 12 mutexes: m1=0 m2=0 globals: val1=1 val2=0 locals: t1=1 t2=0 CS1 interleaving #1: 1-2-3– Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 4 mutexes: m1=0 m2=1 globals: val1=1 val2=0 locals: t1=1 t2=0 CS1 CS2 interleaving #1: 1-2-3– –4 Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 5 mutexes: m1=0 m2=1 globals: val1=1 val2=2 locals: t1=1 t2=0 CS1 CS2 interleaving #1: 1-2-3– –4-5 Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 6 mutexes: m1=0 m2=0 globals: val1=1 val2=2 locals: t1=1 t2=0 CS1 CS2 interleaving #1: 1-2-3– –4-5-6 Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 13 mutexes: m1=0 m2=1 globals: val1=1 val2=2 locals: t1=1 t2=0 CS1 CS2 CS3 interleaving #1: 1-2-3– –4-5-6–13 Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 14 mutexes: m1=1 m2=1 globals: val1=1 val2=2 locals: t1=1 t2=2 CS1 CS2 CS3 interleaving #1: 1-2-3– –4-5-6–13-14 Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 15 mutexes: m1=1 m2=0 globals: val1=1 val2=2 locals: t1=1 t2=2 CS1 CS2 CS3 interleaving #1: 1-2-3– –4-5-6– Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 16 mutexes: m1=1 m2=0 globals: val1=1 val2=2 locals: t1=1 t2=2 CS1 CS2 CS3 interleaving #1: 1-2-3– –4-5-6– interleaving completed, so call single-threaded BMC QF formula is unsatisfiable, i.e., assertion holds...so try next interleaving Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); program counter: 0 mutexes: m1=0 m2=0 globals: val1=0 val2=0 locals: t1=0 t2=0 Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); interleaving #2: Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 3 mutexes: m1=0 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 interleaving #2: Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 7 mutexes: m1=1 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 CS1 interleaving #2: 1-2-3–7 Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 16 mutexes: m1=0 m2=0 globals: val1=1 val2=0 locals: t1=1 t2=0 CS1 interleaving #2: 1-2-3– Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 4 mutexes: m1=0 m2=1 globals: val1=1 val2=0 locals: t1=1 t2=0 CS1 Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); CS2 interleaving #2: 1-2-3– –4 Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 6 mutexes: m1=0 m2=0 globals: val1=1 val2=1 locals: t1=1 t2=0 CS1 Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); CS2 interleaving #2: 1-2-3– –4-5-6 interleaving completed, so call single-threaded BMC (again) QF formula is satisfiable, i.e., assertion fails...so found a bug for a specific interleaving Lazy exploration of interleavings
Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving... combines symbolic model checking: on each individual interleaving explicit state model checking: explore all interleavings Lazy exploration of interleavings
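The overall generate-and-test loop can be pictured roughly as follows (schematic C-style pseudocode only; names such as next_interleaving and symbolic_execute are illustrative placeholders, not the tool's actual API):

/* explicit-state part: enumerate interleavings via the reachability tree */
while (has_unexplored_interleaving(rt)) {
    path = next_interleaving(rt);        /* one complete interleaving       */
    vc = symbolic_execute(path);         /* symbolic part: build QF formula */
    if (smt_is_satisfiable(vc))          /* assertion violated on this path */
        return report_counterexample(path);
}
return VERIFICATION_SUCCESSFUL;          /* within the given bounds */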
Lazy exploration of interleavings – Reachability Tree execution paths 0 : t main,0, val1=0, val2=0, m1=0, m2=0,… 1 : t twoStage,1, val1=0, val2=0, m1=1, m2=0,… 2 : t twoStage,2, val1=1, val2=0, m1=1, m2=0,… initial state global and local variables active thread, context bound CS1 expansion rules in paper CS2 interleaving completed, so call single-threaded BMC
execution paths blocked execution paths (eliminated) 0 : t main,0, val1=0, val2=0, m1=0, m2=0,… 1 : t twoStage,1, val1=0, val2=0, m1=1, m2=0,… 2 : t twoStage,2, val1=1, val2=0, m1=1, m2=0,… 3 : t reader,2, val1=0, val2=0, m1=1, m2=0,… initial state global and local variables active thread, context bound CS1 CS2 backtrack to last unexpanded node and continue symbolic execution can statically determine that path is blocked (encoded in instrumented mutex-op) Lazy exploration of interleavings – Reachability Tree
execution paths blocked execution paths (eliminated) 0 : t main,0, val1=0, val2=0, m1=0, m2=0,… 1 : t twoStage,1, val1=0, val2=0, m1=1, m2=0,… 4 : t reader,1, val1=0, val2=0, m1=1, m2=0,… 2 : t twoStage,2, val1=1, val2=0, m1=1, m2=0,… 3 : t reader,2, val1=0, val2=0, m1=1, m2=0,… 5 : t twoStage,2, val1=0, val2=0, m1=1, m2=0,… 6 : t reader,2, val1=0, val2=0, m1=1, m2=0,… initial state global and local variables active thread, context bound CS1 CS2 Lazy exploration of interleavings – Reachability Tree
use a reachability tree (RT) to describe reachable states of a multi-threaded program each node in the RT is a tuple ν_i = (A_i, C_i, s_i, l_i^j, G_i^j) for a given time step i, where: –A_i represents the currently active thread –C_i represents the context switch number –s_i represents the current state –l_i^j represents the current location of thread j –G_i^j represents the control flow guards accumulated in thread j along the path from ν_0 to ν_i expand the RT by executing symbolically each instruction of the multi-threaded program Exploring the Reachability Tree
R1 (assign): If I is an assignment, we execute I, which generates s_{i+1}. We add a new node ν_{i+1} as a child of ν_i ν_i has been fully expanded if I is within an atomic block; or I contains no global variable; or the upper bound of context switches (C_i = C) is reached if ν_i is not fully expanded, for each thread j ≠ A_i that is enabled in s_{i+1}, we additionally create a new child node Expansion Rules of the RT
R2 (skip): If I is a skip-statement, we increment the location of the current thread and continue with it; we explore no context switches here R3 (unconditional goto): If I is an unconditional goto-statement with target l, we set the location of the current thread to l and continue with it; we explore no context switches here Expansion Rules of the RT
R4 (conditional goto): If I is a conditional goto-statement with test c and target l, we create two child nodes ν′ and ν″. for ν′, we assume that c is true and proceed with the target instruction of the jump for ν″, we add ¬c to the guards and continue with the next instruction in the current thread prune one of the nodes if the condition can be determined statically Expansion Rules of the RT
R5 (assume): If I is an assume-statement with argument c, we proceed similar to R1. we continue with the unchanged state s_i but add c to all guards, as described in R4 if c evaluates to false under the accumulated guards, we prune the execution path R6 (assert): If I is an assert-statement with argument c, we proceed similar to R1. we continue with the unchanged state s_i but add c to all guards, as described in R4 we generate a verification condition to check the validity of c Expansion Rules of the RT
R7 (start_thread): If I is a start_thread instruction, we add the indicated thread to the set of active threads: the new thread starts at its initial location l_0 and with the guards of the currently active thread R8 (join_thread): If I is a join_thread instruction with argument Id, we add a child node that is only expanded once the joined thread Id has exited Expansion Rules of the RT
System Architecture: the C/C++ source is scanned, parsed, and type-checked into an IRep tree and converted into multi-threaded goto programs; the symbolic execution engine (BMC components reused/extended from the CBMC model checker) generates verification conditions for the properties (deadlock, atomicity and order violations, etc.) via QF formula generation; satisfiability is checked using an SMT solver
System Architecture (with scheduler): as before, the C/C++ source is scanned, parsed, and type-checked into an IRep tree and converted into multi-threaded goto programs; a scheduler guides the symbolic execution; verification conditions for the properties (deadlock, atomicity and order violations, etc.) are generated via QF formula generation and checked for satisfiability using an SMT solver; the scheduler stops the generate-and-test loop if there is an error; BMC components are reused/extended from the CBMC model checker
Lazy approach is naïve but useful bugs usually manifest within few context switches [Qadeer&Rehof'05] bound the number of context switches allowed per thread –number of executions: O(n^c) exploit which transitions are enabled in a given state –reduces the number of executions keep in memory only the parent nodes of unexplored paths each formula corresponds to one possible interleaving only, so its size is relatively small... but can suffer performance degradation: in particular for correct programs, where we need to invoke the SMT solver once for each possible execution path
explore reachability tree in same way as lazy approach... but call SMT solver only once add a schedule guard ts_i for each context switch block i (0 < ts_i ≤ #threads) –record in which order the scheduler has executed the program –SMT solver determines the order in which threads are simulated add schedule guards only to effective statements (assignments and assertions) –record effective context switches (ECS) –ECS block: sequence of program statements that are executed with no intervening ECS Idea: systematically encode all possible interleavings into one formula Schedule Recording
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: reader-ECS: ECS block Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: 1 twoStage-ECS: (1,1) reader-ECS: ts 1 == 1 the guarded statement can only be executed if ECS block 1 is scheduled for thread 1 (twoStage) each program statement is prefixed by a schedule guard ts_i == j, where: i is the ECS block number j is the thread identifier Schedule Recording – Interleaving #1
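As a simplified illustration of this guarding (my own sketch of the basic idea, not the tool's actual encoding; nondet_uint() stands in for a symbolic choice left to the SMT solver), the first few guarded statements of the example look roughly like this:

unsigned ts1 = nondet_uint();   /* which thread executes ECS block 1? */
unsigned ts2 = nondet_uint();
unsigned ts3 = nondet_uint();
unsigned ts4 = nondet_uint();

if (ts1 == 1) lock(&m1);        /* twoStage, statement 1 */
if (ts2 == 1) val1 = 1;         /* twoStage, statement 2 */
if (ts3 == 1) unlock(&m1);      /* twoStage, statement 3 */
if (ts4 == 2) lock(&m1);        /* reader,   statement 7 */
/* the solver's assignment to ts1, ts2, ... selects which interleaving is simulated */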
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: 1-2 twoStage-ECS: (1,1)-(2,2) reader-ECS: ts 1 == 1 ts 2 == 1 Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3) reader-ECS: ts 1 == 1 ts 3 == 1 ts 2 == 1 Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3) reader-ECS: (7,4) CS ts 4 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3) reader-ECS: (7,4)-(8,5) CS ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3) reader-ECS: (7,4)-(8,5)-(11,6) CS ts 4 == 2 ts 6 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7) CS ts 7 == 2 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 6 == 2 Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7) ts 8 == 1 CS ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 7 == 2 ts 6 == 2 Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7) CS ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 9 == 1 ts 7 == 2 ts 6 == 2 Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9)-(6,10) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7) ts 10 == 1 CS ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 9 == 1 ts 7 == 2 ts 6 == 2 Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9)-(6,10) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7)-(13,11) CS ts 9 == 1 ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 10 == 1 ts 11 == 2 ts 7 == 2 ts 6 == 2 CS Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9)-(6,10) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7)-(13,11)-(14,12) ts 12 == 2 CS ts 9 == 1 ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 10 == 1 CS ts 11 == 2 ts 7 == 2 ts 6 == 2 Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9)-(6,10) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7)-(13,11)-(14,12)-(15,13) ts 13 == 2 CS ts 9 == 1 ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 10 == 1 CS ts 12 == 2 ts 11 == 2 ts 7 == 2 ts 6 == 2 Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9)-(6,10) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7)-(13,11)-(14,12)-(15,13)-(16,14) ts 14 == 2 CS ts 9 == 1 ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 10 == 1 CS ts 13 == 2 ts 12 == 2 ts 11 == 2 ts 7 == 2 ts 6 == 2 interleaving completed, so build constraints for interleaving (but do not call SMT solver) Schedule Recording – Interleaving #1
Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,3)-(3,4)-(4,12)-(5,13)-(6,14) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7)-(13,8)-(14,9)-(15,10)-(16,11) CS ts 11 == 2 ts 10 == 2 ts 9 == 2 ts 13 == 1 ts 12 == 1 ts 7 == 2 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 6 == 2 ts 14 == 1 ts 8 == 2 Schedule Recording – Interleaving #2
(figure: tree of guarded execution paths with context switches CS1, CS2 marked; each node carries the thread identifiers, the accumulated schedule guards, e.g. ts1==1 or ts1==2, and the guarded program statement, e.g. lock(m1), val1=1, unlock(m1)) SMT solver instantiates ts to evaluate all possible interleavings If the guard of the parent node is false then the guard of the child node is false as well Schedule Recording: Execution Paths
systematically explore the thread interleavings as before, but: –add schedule guards to record in which order the scheduler has executed the program –encode all execution paths into one formula bound the number of context switches exploit which transitions are enabled in a given state with growing numbers of threads and context switches the formula grows very quickly and can easily blow up the solver: there is a clear trade-off between usage of time and memory resources Observations about the schedule recording approach
Sequentialization Observation: Building verification tools for full-fledged concurrent languages is difficult and expensive... … but scalable verification techniques exist for sequential languages –Abstraction techniques –SAT/SMT techniques (i.e., bounded model checking) ⇒ How can we leverage these?
Sequentialization ⇒ How can we leverage these? Sequentialization: replace control non-determinism by data non-determinism P′ simulates all computations (within certain bounds) of P source-to-source transformation: T1 ∥ T2 ↝ T′1 ; T′2 ⇒ reuse existing tools (largely) unchanged ⇒ easy to target multiple back-ends ⇒ easy to experiment with different approaches convert concurrent programs into sequential programs such that reachability is preserved
KISS: Keep It Simple and Sequential [Qadeer-Wu, PLDI'04] Under-approximation (subset of interleavings) Thread creation → function call –at context-switches either: the active thread is terminated or a not yet scheduled thread is started (by calling its main function) –when a thread is terminated either: the thread that has called it is resumed (if any) or a not yet scheduled thread is started A first sequentialization: KISS
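One ingredient of KISS can be sketched as follows (a deliberately simplified illustration of the thread-creation rule only, not Qadeer & Wu's actual transformation; nondet_bool() stands in for the checker's nondeterministic choice):

extern int nondet_bool(void);              /* nondeterministic choice (assumed) */

/* thread creation becomes an ordinary, nondeterministically taken call:
   the new thread either runs here, to completion, suspending its creator,
   or it is never scheduled in this particular simulation */
void kiss_create(void *(*body)(void *), void *arg) {
    if (nondet_bool())
        body(arg);
}

Because the new thread runs inside an ordinary call, its creator is automatically suspended and resumed afterwards, which is what keeps the simulation sequential.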
Schedule 1: 1. start T1 2. start T2 3. terminate T2 4. start T3 5. terminate T3 6. resume T1 Schedule 2: 1. start T1 2. start T2 3. start T3 4. terminate T3 5. resume T2 6. terminate T2 7. resume T1 Schedule 3: 1. start T1 2. start T2 3. terminate T2 4. resume T1 5. start T3 6. terminate T3 7. resume T1 KISS schedules
LR sequentialization first sequentialization for a bounded number of context switches (but a finite number of threads) [Lal-Reps, CAV'08]
LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2, …, a k 2.Execute T 1 to completion Computes -local states l 1,.., l k, and -global states b 1, …, b k (l 1,b 1 ) T1T1 (l 1,a 2 ) (l 2,b 2 ) (l 2,a 3 ) (l 3,b 3 ) T2T2 T3T3
LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 (l 1,b 1 ) T1T1 (l 1,a 2 ) (l 2,b 2 ) (l 2,a 3 ) (l 3,b 3 ) T2T2 T3T3 b1b1 b2b2 b3b3
LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 We can dismiss locals of T 1 T1T1 a2a2 a3a3 T2T2 T3T3 b1b1 b2b2 b3b3
LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 4.Execute T 2 to completion Dismiss locals of T 2 Dismiss b 1,…,b k T1T1 a2a2 a3a3 T2T2 T3T3 b1b1 b2b2 b3b3 (l 1 ’,c 1 ) c3c3 (l 1 ’,b 2 ) (l 0 ’,b 1 ) (l 2 ’,c 2 ) (l 2 ’,b 3 )
LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 4.Execute T 2 to completion T1T1 a2a2 a3a3 T2T2 T3T3 b1b1 b2b2 b3b3 c1c1 c2c2 c3c3
LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 4.Execute T 2 to completion 5.Pass c 1,…,c k to T 3 6.Execute T 3 to completion Dismiss locals of T 3 Dismiss c 1,…,c k T1T1 a2a2 a3a3 T2T2 T3T3 b1b1 b2b2 b3b3 c1c1 c2c2 c3c3 d3d3 (l 0 ’’, c 1 ) (l 1 ’,d 1 ) (l 1 ’,c 2 ) (l 1 ’,d 2 ) (l 1 ’,c 3 )
LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 4.Execute T 2 to completion 5.Pass c 1,…,c k to T 3 6.Execute T 3 to completion T1T1 a2a2 a3a3 T2T2 T3T3 b1b1 b2b2 b3b3 c1c1 c2c2 c3c3 d1d1 d2d2 d3d3
LR sequentialization - schema Sequential program (k-rounds) 1. Guess a_2,…,a_k 2. Execute T1 to completion 3. Pass b_1,…,b_k to T2 4. Execute T2 to completion 5. Pass c_1,…,c_k to T3 6. Execute T3 to completion 7. The simulated computation is valid iff d_i = a_{i+1} for all i ∈ [1,k-1] (figure: rounds of T1, T2, T3 with guessed round-entry states a_i and computed round-exit states b_i, c_i, d_i)
LR sequentialization - schema Sequential program (k-rounds) 1. Guess a_2,…,a_k 2. Execute T1 to completion 3. Pass b_1,…,b_k to T2 4. Execute T2 to completion 5. Pass c_1,…,c_k to T3 6. Execute T3 to completion 7. The simulated computation is valid iff d_i = a_{i+1} for all i ∈ [1,k-1] 8. Report an error if a bug occurs during the simulation (figure: rounds of T1, T2, T3 with guessed round-entry states a_i and computed round-exit states b_i, c_i, d_i)
LR sequentialization - implementation considers only round-robin schedules with k rounds T0T0 T1T1 TnTn...
LR sequentialization - implementation considers only round-robin schedules with k rounds –thread → function, run to completion T0T0 T1T1 TnTn...
LR sequentialization - implementation considers only round-robin schedules with k rounds –thread → function, run to completion global memory copy for each round –scalar → array T0T0 S 2,0 S 0,0 S 1,0 S k,0 T1T1 S 2,1 S 0,1 S 1,1 S k,1 TnTn S 2,n S 0,n S 1,n S k,n...
LR sequentialization - implementation considers only round-robin schedules with k rounds –thread → function, run to completion global memory copy for each round –scalar → array context switch → round counter++ T0T0 S 2,0 S 0,0 S 1,0 S k,0 T1T1 S 2,1 S 0,1 S 1,1 S k,1 TnTn S 2,n S 0,n S 1,n S k,n...
LR sequentialization - implementation considers only round-robin schedules with k rounds –thread → function, run to completion global memory copy for each round –scalar → array context switch → round counter++ first thread starts with non-deterministic memory contents –other threads continue with content left by predecessor T0T0 T1T1 S 2,1 S 0,1 S 1,1 S k,1 TnTn... S 2,0 S 0,0 S 1,0 S k,0 S 2,n S 0,n S 1,n S k,n
LR sequentialization - implementation considers only round-robin schedules with k rounds –thread → function, run to completion global memory copy for each round –scalar → array context switch → round counter++ first thread starts with non-deterministic memory contents –other threads continue with content left by predecessor checker prunes away inconsistent simulations –assume( S k+1,0 == S k,n ); –requires second set of memory copies –errors can only be checked at end of simulation requires explicit error checks T0T0 T1T1 S 2,1 S 0,1 S 1,1 S k,1 TnTn... S 2,0 S 0,0 S 1,0 S k,0 S 2,n S 0,n S 1,n S k,n
LR sequentialization - implementation // original program: //shared vars type_g1 g1; type_g2 g2; … //thread functions t(){ type_x1 x1; type_x2 x2; … stmt_1; stmt_2; … } … main(){ … } // transformed program: //shared vars type_g1 g1[K]; type_g2 g2[K]; … uint round=0; bool ret=0; //aux vars // context-switch simulation cs() { unsigned int j; j = nondet(); assume(round+j < K); round += j; if (round==K-1 && nondet()) ret=1; } //thread functions t(){ type_x1 x1; type_x2 x2; … cs(); if (ret) return; stmt_1[round]; cs(); if (ret) return; stmt_2[round]; … } … main_thread(){ … } main(){ … } //next slide
LR sequentialization - implementation main(){ type_g1 _g1[K]; type_g2 _g2[K]; … // first thread starts with non-deterministic memory contents for (i=1; i<K; i++){ _g1[i] = g1[i] = nondet(); _g2[i] = g2[i] = nondet(); … } // thread simulations t[0] = main_thread; round_born[0] = 0; is_created[0] = 1; for (i=0; i<N; i++){ if(is_created[i]){ ret=0; round = round_born[i]; t[i](); } } // consistency check for (i=0; i<K-1; i++){ assume(_g1[i+1] == g1[i]); assume(_g2[i+1] == g2[i]); … } // error detection assert(err == 0); }
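To make the per-statement rewriting concrete, here is a sketch of how one thread statement over a shared variable g1 looks after the transformation (the concrete statement is my own example; cs(), round, ret, and err are the auxiliaries introduced on the previous slides):

void t(void) {
    /* original: g1 = g1 + 1; */
    cs(); if (ret) return;             /* possibly jump ahead a few rounds     */
    g1[round] = g1[round] + 1;         /* operate on this round's memory copy  */

    /* original: assert(g1 > 0); */
    cs(); if (ret) return;
    if (!(g1[round] > 0)) err = 1;     /* errors are only recorded here and
                                          checked after the consistency check  */
}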
LR sequentialization - implementations Corral (SMT-based analysis for Boogie programs) –[ Lal–Qadeer–Lahiri, CAV'12 ] –[ Lal–Qadeer, FSE'14 ] CSeq (code-to-code translation for C + pthreads) –[ Fischer–Inverso–Parlato, ASE'13 ] Rek (for real-time embedded software systems) –[ Chaki–Gurfinkel–Strichman, FMCAD'11 ] Storm (implementation for C programs) –[ Lahiri–Qadeer–Rakamaric, CAV'09 ] –[ Rakamaric, ICSE'10 ]
LR sequentialization is “eager” (figure: T1 is executed against guessed round-entry states a_2, a_3) a_2 is guessed a_2 may be unreachable EAGER [Lal, Reps CAV'08]
LR sequentialization does not preserve assertions void thread1() { while (blocked); x = x/y; if (x%2==1) ERROR; } void thread2() { x=12; y=2; //unblock thread1 blocked=false; } // shared variables bool blocked=true; int x=0, y=0; Inv: y != 0 holds at the division in the concurrent program (thread1 passes the while-loop only after thread2 has set y=2 and blocked=false), but the eager simulation may execute thread1 under guessed values such as blocked=false, x=13, y=0
A lazy transformation is desirable A lazy sequential program explores only reachable states of the concurrent program Why is it desirable? In model-checking it can drastically reduce the explored state-space Better invariants for deductive verification / abstract interpretation We now illustrate a lazy transformation: [La Torre-Madhusudan-Parlato, CAV’09]
Lazy transformation: main idea Execute T 1 Context-switch: store s 1 and abort Execute T 2 from s 1 store s 2 and abort (l 1,s 1 ) (l’ 1,s 1 ) (l’ 2,s 2 ) T1T1 (l 0,s 0 ) T2T2 store s 1 & abort store s 2 & abort
Lazy transformation: main idea Re-execute T 1 till it reaches s 1 It may reach a different local state, but that is still correct! Restart from global s 2 and compute s 3 (l 1,s 1 ) (l’ 1,s 1 ) (l’ 2,s 2 ) T1T1 (l 0,s 0 ) T2T2 store s 1 & abort store s 2 & abort (l’’ 1,s 1 ) store s 3 & abort (l’’ 1,s 2 )
Lazy transformation: main idea Switch to T 2 Execute till it reaches s 2 Continue computation from global s 3 (l 1,s 1 ) (l’ 1,s 1 ) (l’ 2,s 2 ) T1T1 (l 0,s 0 ) T2T2 store s 1 & abort store s 2 & abort (l’’ 1,s 1 ) store s 3 & abort (l’’’ 1,s 2 ) (l’’ 1,s 2 ) (l’’’ 1,s 3 )
Lazy transformation: main idea T1T1 T2T2 store s 1 store s 2 store s 3 store s 4 store s 5 end s1s1 s2s2 s3s3 s4s4 s1s1 s2s2 s3s3 s4s4 s5s5
Lazy transformation: features Explores only reachable states Preserves invariants across the translation Tracks local state of one thread at any time Tracks values of shared variables at context switches (s 1, s 2, …, s k ) Requires recomputation of local states
Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } main driver a global pc for each thread thread locals thread global
Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } main driver for each round for each thread T i simulate T i
Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc k ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;... E XE.. M: CS(M); stmt M; main driver F i ()
Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc i ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;... E XE.. M: CS(M); stmt M; main driver F i ()... context-switch resume mechanism
Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc i ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;... EXECUTE... M: CS(M); stmt M; main driver F i ()... Context-switch simulation: #define CS(j) if (*) { pc i =j; return; }
Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc i ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;.. E XE... M: CS(M); stmt M; main driver F i ()...
Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0; pc 1 =0;... pc N =0; local 0 ; local 1 ;... local k ; main() { for (r=0; r<R; r++) for (k=0; k<N; k++) // simulate T k F k (); } switch(pc i ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;... EXECUTE... M: CS(M); stmt M; main driver... Formula encoding: goto statement to formula – add a guard for each crossing control-flow edge = O(M^2) guards
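For a thread with three visible statements, the simulation function F_i in this naive scheme would look roughly as follows (a sketch; x and y stand for shared variables, pc_i and the CS macro are as defined on the previous slides, and named labels replace the slides' numeric ones):

void F_i(void) {
    /* resume: jump to the statement recorded in this thread's program counter */
    switch (pc_i) {
        case 0: goto L0;
        case 1: goto L1;
        case 2: goto L2;
    }
L0: CS(0); x = 1;
L1: CS(1); y = x + 1;
L2: CS(2); assert(y > x);
}
/* with #define CS(j) if (*) { pc_i = j; return; }  every visible statement
   may nondeterministically become a context-switch point */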
pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc i ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;.. E XE... M: CS(M); stmt M; main driver F i ()... CSeq-Lazy Sequentialization: CROSS PRODUCT SIMULATION
pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc k ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;.. E XE... M: CS(M); stmt M; main driver F i ()... CSeq-Lazy Sequentialization: CROSS PRODUCT SIMULATION
pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc k ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;.. E XE... M: CS(M); stmt M; main driver F i ()... CSeq-Lazy Sequentialization: CROSS PRODUCT SIMULATION
switch(pc k ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;... EXECUTE... M: CS(M); stmt M; main driver F i () CSeq-Lazy Sequentialization: CROSS PRODUCT SIMULATION NEW: #define CS(j) if (j >= nextCS) goto j+1; pc 0 =0;... pc N =0; local 0 ;... local k ; nextCS; main() for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) { nextCS = nondet; assume(nextCS >= pc i ); F i (); pc i = nextCS; }
resuming + context-switch 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2; ... M: CS(M); stmt M; statements before pc i are skipped (resume), statements from pc i up to nextCS are executed, statements from nextCS onwards are skipped (context switch) NEW: #define CS(j) if (j >= nextCS) goto j+1; CSeq-Lazy Sequentialization: CROSS PRODUCT SIMULATION
Individual Threads Sequentialization 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2; EXECUTE... M: CS(M); stmt M;... Formula encoding: goto statement to formula – add a guard for each crossing control-flow edge = O(M) guards
Individual Threads Sequentialization 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2; EXECUTE... M: CS(M); stmt M;... #define CS(j) if (j >= next_CS) goto j+1; inject light-weight, non-invasive control code: no non-determinism, no assignments, no return
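The same three-statement thread under the lightweight scheme (a sketch consistent with the reconstruction of the CS guard above; pc_i and next_CS are set by the driver):

void F_i(void) {
    /* resuming is still a single jump to the stored program counter */
    switch (pc_i) { case 0: goto L0; case 1: goto L1; case 2: goto L2; }
L0: if (0 >= next_CS) goto L1;  x = 1;
L1: if (1 >= next_CS) goto L2;  y = x + 1;
L2: if (2 >= next_CS) goto L3;  assert(y > x);
L3: ;   /* execution simply falls through once the guessed context-switch point
           is reached: no nondeterminism, no assignments, no return in the guards */
}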
Sequentialization via Memory Unwinding Basic Idea: Mu-CSeq Approach: T1 ∥ T2 ↝ M ; T′1 ; C1 ; T′2 ; C2 –M makes a consistent guess of the shared-memory contents –checkers Ci ensure that the threads T′i used the memory consistently convert concurrent programs into equivalent sequential programs
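Schematically, the sequentialized program therefore has the following top level (function names are illustrative placeholders for M, the T′i, and the Ci):

int main(void) {
    guess_memory_unwinding();   /* M: guess the sequence of shared-memory writes */
    simulate_thread_1();        /* T'1: simulate thread 1 against the guess      */
    check_thread_1();           /* C1: its writes must match the guessed ones    */
    simulate_thread_2();        /* T'2 */
    check_thread_2();           /* C2 */
    return 0;
}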
Memory Unwinding xy... z “memory”
Memory Unwinding xy... z “memory” Guess and store sequence of individual write operations:
Memory Unwinding xy... z N “memory” pos Guess and store sequence of individual write operations:
Memory Unwinding xy... z N “memory” pos Guess and store sequence of individual write operations: add N copies of shared variables (“memory”) _memory[i,v] is value of v-th variable after i-th write
Memory Unwinding xy... z 1x 1y 2y 1z 2x var N thr “memory” “writes” pos Guess and store sequence of individual write operations: add N copies of shared variables (“memory”) _memory[i,v] is value of v-th variable after i-th write add array to record writes (“writes”) i-th write is by _thr[i], which has written to _var[i]
Memory Unwinding xy... z 1x 1y 2y 1z 2x var... next[0] N thr “memory”“writes” “barrier” pos Guess and store sequence of individual write operations: add N copies of shared variables (“memory”) _memory[i,v] is value of v-th variable after i-th write add array to record writes (“writes”) i-th write is by _thr[i], which has written to _var[i] add array to record validity of writes (“barrier”) next write (after i) by thread t is at _next[i,t] N next[K]... N+1 N 3 3 N... N+1
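In C terms, the guessed unwinding can be pictured as the following data structures (a sketch; the names mirror the slide, with fixed bounds N for the number of guessed writes, V for the number of shared variables, and T for the number of threads):

int      _memory[N][V];   /* _memory[i][v]: value of variable v after the i-th write */
unsigned _thr[N];         /* _thr[i]: thread that performed the i-th write            */
unsigned _var[N];         /* _var[i]: variable written by the i-th write              */
unsigned _next[N][T];     /* _next[i][t]: index of thread t's next write after i      */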
Basic Idea: Simulation simulate all executions compatible with guessed memory unwinding
Basic Idea: uses auxiliary variables – pos (current index into unwound memory) – thread (id of currently simulated thread) – end (guessed index of last write into unwound memory) Simulation simulate all executions compatible with guessed memory unwinding
Basic Idea: uses auxiliary variables – pos (current index into unwound memory) – thread (id of currently simulated thread) – end (guessed index of last write into unwound memory) every thread is translated into a function –simulation starts from main thread Simulation simulate all executions compatible with guessed memory unwinding
Basic Idea: uses auxiliary variables – pos (current index into unwound memory) – thread (id of currently simulated thread) – end (guessed index of last write into unwound memory) every thread is translated into a function –simulation starts from main thread each thread creation interrupts simulation of current thread –new thread is called with current value of pos Simulation simulate all executions compatible with guessed memory unwinding
Basic Idea: uses auxiliary variables – pos (current index into unwound memory) – thread (id of currently simulated thread) – end (guessed index of last write into unwound memory) every thread is translated into a function –simulation starts from main thread each thread creation interrupts simulation of current thread –new thread is called with current value of pos terminated threads must have completed their memory writes –pos > end Simulation simulate all executions compatible with guessed memory unwinding
Basic Idea: uses auxiliary variables – pos (current index into unwound memory) – thread (id of currently simulated thread) – end (guessed index of last write into unwound memory) every thread is translated into a function –simulation starts from main thread each thread creation interrupts simulation of current thread –new thread is called with current value of pos terminated threads must have completed their memory writes –pos > end old thread resumed with old memory position Simulation simulate all executions compatible with guessed memory unwinding
Basic Idea: every read / write is translated into a function –simulates valid access into unwound memory similar simulations for pthread functions Simulation simulate all executions compatible with guessed memory unwinding
Simulating reads and writes xy... z 1x 1y 2y 1z 2x var N thr int read(uint x) { uint jmp=*; assume(jmp>=pos && jmp<next[pos,thread]); pos=jmp; if (pos>end) return; return(_memory[pos, x]); } “memory”“writes” pos next[0] “barrier” N next[K]... N+1 N 3 3 N... N+1
Simulating reads and writes xy... z 1x 1y 2y 1z 2x var N thr int write(uint x, int val) { pos=next[pos,thread]; if(pos>end) return; assume( _var[pos]==x && _memory[pos,x]==val); } “memory”“writes” pos next[0] “barrier” N next[K]... N+1 N 3 3 N... N+1
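With these helpers, each thread statement that touches shared variables is rewritten to go through the unwinding; for instance (my own example, assuming X and Y are the indices of shared variables x and y in the memory arrays):

/* original thread statement:   x = y + 1;   */
write(X, read(Y) + 1);

/* original:   assert(x > 0);   reads in assertions also go through the unwinding */
assert(read(X) > 0);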