Bernd Fischer RW714: SAT/SMT-Based Bounded Model Checking of Software.


Concurrency verification

Writing concurrent programs is DIFFICULT: programmers have to guarantee
– correctness of the sequential execution of each individual process
– under nondeterministic interference from other processes (schedules)
rare schedules result in errors that are difficult to find, reproduce, and repair
– testers can spend weeks chasing a single bug
⇒ huge productivity problem
[figure: processes P1 … PN communicating via a shared communication mechanism]

Concurrency verification

Spot the error...

void *threadA(void *arg) {
  lock(&mutex); x++;
  if (x == 1) lock(&lock);
  unlock(&mutex);
  lock(&mutex); x--;
  if (x == 0) unlock(&lock);
  unlock(&mutex);
}

void *threadB(void *arg) {
  lock(&mutex); y++;
  if (y == 1) lock(&lock);
  unlock(&mutex);
  lock(&mutex); y--;
  if (y == 0) unlock(&lock);
  unlock(&mutex);
}

(CS1) and (CS2) label the first and second mutex-protected block of each thread. The error: deadlock.
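
A sketch of one interleaving that exhibits the deadlock: threadA runs its first critical section (CS1) completely, so x becomes 1 and threadA acquires lock before releasing mutex. threadB then runs its CS1: it acquires mutex, sets y to 1, and tries lock(&lock), which threadA holds, so threadB blocks while still holding mutex. When threadA continues with its CS2 it tries lock(&mutex), which threadB holds. Each thread now waits for a lock held by the other, so neither can ever proceed.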

Concurrency verification

What happens here...???

int n = 0;                // shared variable

int P(void) {
  int tmp, i = 1;
  while (i <= 10) {
    tmp = n;
    n = tmp + 1;
    i++;
  }
}

int main(void) {
  id1 = thread_create(P);
  id2 = thread_create(P);
  join(id1);
  join(id2);
  assert(n == 20);
}

Which values can n actually have?
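
A sketch of one bad interleaving (the update tmp = n; n = tmp + 1; is not atomic): thread 1 reads n == 0 into tmp and is preempted; thread 2 runs nine complete iterations, so n == 9; thread 1 resumes and writes back tmp + 1 == 1; thread 2 reads n == 1 into tmp and is preempted; thread 1 finishes its remaining nine iterations, ending at n == 10; thread 2 finally writes back tmp + 1 == 2. The final value is then n == 2, so the assertion n == 20 can fail; 20 is reached only in executions without such lost updates.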

Concurrency verification

What happens here...???

int n = 0;                // shared variable

int P(void) {
  int tmp, i = 1;
  while (i <= 10) {
    tmp = n;
    n = tmp + 1;
    i++;
  }
}

int main(void) {
  id1 = thread_create(P);
  id2 = thread_create(P);
  // join(id1);
  // join(id2);
  assert(n >= 10 && n <= 20);
}

Concurrency errors

There are two main kinds of concurrency errors:
progress errors: deadlock, starvation, ...
– typically caused by wrong synchronization
– require modeling of synchronization primitives (e.g., mutex locking / unlocking)
– require modeling of a (global) error condition
safety errors: assertion violation, ...
– typically caused by data races (i.e., unsynchronized access to shared data)
– require modeling of synchronization primitives
– can be checked locally
⇒ focus here on safety errors

Shared memory concurrent programs

Concurrent programming styles:
communication via message passing
– "truly" parallel distributed systems
– multiple computations advancing simultaneously
communication via shared memory
– multi-threaded programs
– only one thread active at any given time (conceptually), but the active thread can change at any time
  • active == uncontested access to shared memory
  • can be single-core or multi-core
⇒ focus here on multi-threaded, shared memory programs

Multi-threaded programs

typical C implementation: pthreads
formed of individual sequential programs (threads)
– can be created and destroyed on the fly
– typically for BMC: assume an upper bound on the number of threads
– each possibly with loops and recursive function calls
– each with local variables
each thread can read and write shared variables
– assume sequential consistency: writes are immediately visible to all other threads
– weak memory models can be modeled
execution is an interleaving of thread executions
– only valid for sequential consistency

Round-robin scheduling

context: segment of a run of an active thread t_i
context switch: change of active thread from t_i to t_k
– the global state is passed on to t_k
– a context switch back to t_i resumes at the old local state (incl. pc)
round: formed of one context of each thread
round-robin schedule: same order of threads in each round
all schedules can be simulated by round-robin schedules
[figure: a run through states (l_0,s_0), (l_1,s_1), … with the active thread alternating between t_1, t_2, t_3]
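
For example, a schedule that runs t_1, then t_3, then t_1 again, then t_2 is not round-robin, but it is simulated by two rounds of the round-robin order t_1, t_2, t_3 in which t_2 gets an empty context in the first round and t_3 an empty context in the second: the non-empty contexts still occur in the order t_1, t_3, t_1, t_2. In general, empty contexts let a round-robin schedule reproduce any interleaving, at the cost of possibly needing more rounds.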

Context-bounded analysis

Important observation: Most concurrency errors are shallow!
i.e., they require only a few context switches
⇒ limit the search space by bounding the number of context switches / rounds

Context-bounded analysis But not all concurrency errors are shallow... int i=1, j=1; void * t1(void* arg) { for(int k=0; k<5; k++) i+=j; pthread_exit(NULL); } void * t2(void* arg) { for(int k=0; k<5; k++) j+=i; pthread_exit(NULL); } int main(int argc, char **argv) { pthread_t id1, id2; pthread_create(&id1, NULL, t1, NULL); pthread_create(&id2, NULL, t2, NULL); assert(i < 144 && j < 144); return 0; }
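
Why this error is deep (a sketch): if the two loop bodies alternate perfectly, i and j follow the Fibonacci sequence. Starting from i = j = 1, the interleaving i+=j, j+=i, i+=j, … yields 2, 3, 5, 8, 13, 21, 34, 55, 89 and finally j = 144 after the tenth addition, violating the assertion. Reaching 144 therefore needs roughly one context switch per loop iteration (about nine in total), so the violation is missed whenever the context-switch or round bound is set lower than that.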

Concurrency verification approaches

Explicit schedule exploration (ESBMC)
– lazy exploration
– schedule recording
Partial order methods (CBMC)
Sequentialization
– KISS
– Lal / Reps (eager sequentialization)
– Lazy CSeq
– memory unwinding

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); program counter: 0 mutexes: m1=0 m2=0 globals: val1=0 val2=0 locals: t1=0 t2=0 Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program state; (value of program counter and program variables) val1 and val2 should be updated synchronously Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); interleaving #1: 1 program counter: 1 mutexes: m1=1 m2=0 globals: val1=0 val2=0 locals: t1=0 t2=0 Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 2 mutexes: m1=1 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 interleaving #1: 1-2 Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 3 mutexes: m1=0 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 interleaving #1: Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 7 mutexes: m1=1 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 CS1 interleaving #1: 1-2-3–7 Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 8 mutexes: m1=1 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 CS1 interleaving #1: 1-2-3–7-8 Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 11 mutexes: m1=1 m2=0 globals: val1=1 val2=0 locals: t1=1 t2=0 CS1 interleaving #1: 1-2-3– Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 12 mutexes: m1=0 m2=0 globals: val1=1 val2=0 locals: t1=1 t2=0 CS1 interleaving #1: 1-2-3– Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 4 mutexes: m1=0 m2=1 globals: val1=1 val2=0 locals: t1=1 t2=0 CS1 CS2 interleaving #1: 1-2-3– –4 Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 5 mutexes: m1=0 m2=1 globals: val1=1 val2=2 locals: t1=1 t2=0 CS1 CS2 interleaving #1: 1-2-3– –4-5 Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 6 mutexes: m1=0 m2=0 globals: val1=1 val2=2 locals: t1=1 t2=0 CS1 CS2 interleaving #1: 1-2-3– –4-5-6 Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 13 mutexes: m1=0 m2=1 globals: val1=1 val2=2 locals: t1=1 t2=0 CS1 CS2 CS3 interleaving #1: 1-2-3– –4-5-6–13 Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 14 mutexes: m1=1 m2=1 globals: val1=1 val2=2 locals: t1=1 t2=2 CS1 CS2 CS3 interleaving #1: 1-2-3– –4-5-6–13-14 Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 15 mutexes: m1=1 m2=0 globals: val1=1 val2=2 locals: t1=1 t2=2 CS1 CS2 CS3 interleaving #1: 1-2-3– –4-5-6– Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 16 mutexes: m1=1 m2=0 globals: val1=1 val2=2 locals: t1=1 t2=2 CS1 CS2 CS3 interleaving #1: 1-2-3– –4-5-6– interleaving completed, so call single-threaded BMC QF formula is unsatisfiable, i.e., assertion holds...so try next interleaving Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); program counter: 0 mutexes: m1=0 m2=0 globals: val1=0 val2=0 locals: t1=0 t2=0 Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); interleaving #2: Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 3 mutexes: m1=0 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 interleaving #2: Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 7 mutexes: m1=1 m2=0 globals: val1=1 val2=0 locals: t1=0 t2=0 CS1 interleaving #2: 1-2-3–7 Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 16 mutexes: m1=0 m2=0 globals: val1=1 val2=0 locals: t1=1 t2=0 CS1 interleaving #2: 1-2-3– Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 4 mutexes: m1=0 m2=1 globals: val1=1 val2=0 locals: t1=1 t2=0 CS1 Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); CS2 interleaving #2: 1-2-3– –4 Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); program counter: 6 mutexes: m1=0 m2=0 globals: val1=1 val2=1 locals: t1=1 t2=0 CS1 Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); CS2 interleaving #2: 1-2-3– –4-5-6 interleaving completed, so call single-threaded BMC (again) QF formula is satisfiable, i.e., assertion fails...so found a bug for a specific interleaving Lazy exploration of interleavings

Lazy exploration of interleavings

Idea: iteratively generate all possible interleavings and call the BMC procedure on each interleaving. This combines
symbolic model checking: on each individual interleaving
explicit-state model checking: to explore all interleavings
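
To make the "generate an interleaving, then check it" loop concrete, here is a minimal, self-contained C sketch (an illustration only, not ESBMC's implementation). It uses stripped-down two-statement versions of the twoStage and reader threads (the locks and the early return are omitted), enumerates every interleaving explicitly, and evaluates the assertion t2 == t1 + 1 concretely at the end of each complete interleaving; a BMC back-end would instead encode each completed interleaving as a QF formula and pass it to a SAT/SMT solver.

#include <stdio.h>

typedef struct { int val1, val2, t1, t2; } state_t;

/* thread twoStage, reduced to: val1 = 1; val2 = val1 + 1; */
static void twoStage_step(state_t *s, int pc) {
    if (pc == 0) s->val1 = 1;
    else         s->val2 = s->val1 + 1;
}

/* thread reader, reduced to: t1 = val1; t2 = val2; */
static void reader_step(state_t *s, int pc) {
    if (pc == 0) s->t1 = s->val1;
    else         s->t2 = s->val2;
}

#define LEN 2                      /* statements per thread */

static int failing = 0;

/* depth-first enumeration of all interleavings; the assertion is
   checked once an interleaving is completed */
static void explore(state_t s, int pcA, int pcB, char *trace, int len) {
    if (pcA == LEN && pcB == LEN) {
        int ok = (s.t2 == s.t1 + 1);
        printf("%-6s %s\n", trace, ok ? "assertion holds" : "ASSERTION FAILS");
        failing += !ok;
        return;
    }
    if (pcA < LEN) {               /* schedule twoStage next */
        state_t s2 = s;
        twoStage_step(&s2, pcA);
        trace[len] = 'A'; trace[len + 1] = '\0';
        explore(s2, pcA + 1, pcB, trace, len + 1);
    }
    if (pcB < LEN) {               /* schedule reader next */
        state_t s2 = s;
        reader_step(&s2, pcB);
        trace[len] = 'B'; trace[len + 1] = '\0';
        explore(s2, pcA, pcB + 1, trace, len + 1);
    }
}

int main(void) {
    state_t s0 = {0, 0, 0, 0};
    char trace[2 * LEN + 1] = "";
    explore(s0, 0, 0, trace, 0);
    printf("interleavings with a failing assertion: %d\n", failing);
    return 0;
}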

Lazy exploration of interleavings – Reachability Tree

execution paths:
ν0: ⟨t_main, 0, val1=0, val2=0, m1=0, m2=0, …⟩   (initial state)
ν1: ⟨t_twoStage, 1, val1=0, val2=0, m1=1, m2=0, …⟩
ν2: ⟨t_twoStage, 2, val1=1, val2=0, m1=1, m2=0, …⟩
each node records the active thread, the context bound, and the global and local variables (expansion rules in the paper); context switches CS1 and CS2 occur along the path; when an interleaving is completed, call the single-threaded BMC

Lazy exploration of interleavings – Reachability Tree

execution paths and blocked execution paths (eliminated):
ν0: ⟨t_main, 0, val1=0, val2=0, m1=0, m2=0, …⟩   (initial state)
ν1: ⟨t_twoStage, 1, val1=0, val2=0, m1=1, m2=0, …⟩
ν2: ⟨t_twoStage, 2, val1=1, val2=0, m1=1, m2=0, …⟩
ν3: ⟨t_reader, 2, val1=0, val2=0, m1=1, m2=0, …⟩   (blocked)
backtrack to the last unexpanded node and continue symbolic execution; that the path is blocked can be determined statically (encoded in the instrumented mutex operations)

Lazy exploration of interleavings – Reachability Tree

execution paths and blocked execution paths (eliminated):
ν0: ⟨t_main, 0, val1=0, val2=0, m1=0, m2=0, …⟩   (initial state)
ν1: ⟨t_twoStage, 1, val1=0, val2=0, m1=1, m2=0, …⟩
ν2: ⟨t_twoStage, 2, val1=1, val2=0, m1=1, m2=0, …⟩
ν3: ⟨t_reader, 2, val1=0, val2=0, m1=1, m2=0, …⟩
ν4: ⟨t_reader, 1, val1=0, val2=0, m1=1, m2=0, …⟩
ν5: ⟨t_twoStage, 2, val1=0, val2=0, m1=1, m2=0, …⟩
ν6: ⟨t_reader, 2, val1=0, val2=0, m1=1, m2=0, …⟩

Exploring the Reachability Tree

use a reachability tree (RT) to describe the reachable states of a multi-threaded program
each node in the RT is a tuple for a given time step i, where:
– A_i represents the currently active thread
– C_i represents the context switch number
– s_i represents the current state
– the current location (program counter) of each thread j
– the control flow guards accumulated in thread j along the path from the root node to the current node
expand the RT by symbolically executing each instruction of the multi-threaded program

Expansion Rules of the RT

R1 (assign): If I is an assignment, we execute I, which generates s_{i+1}, and add a new node ν' as a child of the current node ν.
We have fully expanded ν if
• I is within an atomic block; or
• I contains no global variable; or
• the upper bound of context switches (C_i = C) is reached.
If ν is not fully expanded, then for each thread j ≠ A_i that is enabled in s_{i+1}, we also create a new child node.

Expansion Rules of the RT

R2 (skip): If I is a skip-statement, we increment the location of the current thread and continue with it. We explore no context switches.
R3 (unconditional goto): If I is an unconditional goto-statement with target l, we set the location of the current thread to l and continue with it. We explore no context switches.

Expansion Rules of the RT

R4 (conditional goto): If I is a conditional goto-statement with test c and target l, we create two child nodes ν' and ν''.
• for ν', we assume that c is true and proceed with the target instruction of the jump
• for ν'', we add ¬c to the guards and continue with the next instruction in the current thread
• prune one of the nodes if the condition can be determined statically
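
For example (using the reader thread from the earlier interleaving example): for the conditional at line 8, if (val1 == 0), node ν' assumes val1 == 0 and continues at line 9 (the jump target), while ν'' adds val1 != 0 to its guards and continues at line 11; if the symbolic state already implies val1 == 1, the ν' branch can be pruned statically.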

Expansion Rules of the RT

R5 (assume): If I is an assume-statement with argument c, we proceed similarly to R1.
• we continue with the unchanged state s_i but add c to all guards, as described in R4
• if c evaluates to false, we prune the execution path
R6 (assert): If I is an assert-statement with argument c, we proceed similarly to R1.
• we continue with the unchanged state s_i but add c to all guards, as described in R4
• we generate a verification condition to check the validity of c

Expansion Rules of the RT

R7 (start_thread): If I is a start_thread instruction, we add the indicated thread to the set of active threads:
• the new thread begins at its initial location, and
• it starts with the guards of the currently active thread
R8 (join_thread): If I is a join_thread instruction with argument Id, we add a child node:
• the current thread continues past the join only if the joining thread Id has exited

System Architecture

[architecture diagram] The C/C++ source is scanned, parsed, and type-checked into an IRep tree and multi-threaded goto programs (components reused/extended from the CBMC model checker). The symbolic execution engine (BMC) generates QF formulae as verification conditions for the properties (deadlock, atomicity and order violations, etc.); their satisfiability is checked using an SMT solver.

System Architecture

[architecture diagram, as before, extended with a scheduler] The scheduler guides the symbolic execution over the multi-threaded goto programs and stops the generate-and-test loop if there is an error; QF formula generation and the satisfiability check with the SMT solver are as before.

Lazy approach is naïve but useful

bugs usually manifest within a few context switches [Qadeer & Rehof '05]
bound the number of context switches allowed per thread
– number of executions: O(n^c)
exploit which transitions are enabled in a given state
– reduces the number of executions
keep in memory only the parent nodes of all unexplored paths
each formula corresponds to one possible interleaving only, so its size is relatively small
... but can suffer performance degradation:
• in particular for correct programs, where the SMT solver must be invoked once for each possible execution path

Schedule Recording

Idea: systematically encode all possible interleavings into one formula
explore the reachability tree in the same way as the lazy approach... but call the SMT solver only once
add a schedule guard ts_i for each context switch block i (0 < ts_i ≤ #threads)
– records in which order the scheduler has executed the program
– the SMT solver determines the order in which the threads are simulated
add schedule guards only to effective statements (assignments and assertions)
– record effective context switches (ECS)
– ECS block: sequence of program statements that are executed with no intervening ECS

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: reader-ECS: ECS block Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: 1 twoStage-ECS: (1,1) reader-ECS: ts 1 == 1 guarded statement can only be executed if statement 1 is scheduled in ECS block 1 each program statement is then prefixed by a schedule guard ts i = j, where: i is the ECS block number j is the thread identifier Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: 1-2 twoStage-ECS: (1,1)-(2,2) reader-ECS: ts 1 == 1 ts 2 == 1 Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3) reader-ECS: ts 1 == 1 ts 3 == 1 ts 2 == 1 Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3) reader-ECS: (7,4) CS ts 4 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3) reader-ECS: (7,4)-(8,5) CS ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3) reader-ECS: (7,4)-(8,5)-(11,6) CS ts 4 == 2 ts 6 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7) CS ts 7 == 2 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 6 == 2 Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7) ts 8 == 1 CS ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 7 == 2 ts 6 == 2 Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7) CS ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 9 == 1 ts 7 == 2 ts 6 == 2 Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9)-(6,10) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7) ts 10 == 1 CS ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 9 == 1 ts 7 == 2 ts 6 == 2 Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9)-(6,10) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7)-(13,11) CS ts 9 == 1 ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 10 == 1 ts 11 == 2 ts 7 == 2 ts 6 == 2 CS Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9)-(6,10) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7)-(13,11)-(14,12) ts 12 == 2 CS ts 9 == 1 ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 10 == 1 CS ts 11 == 2 ts 7 == 2 ts 6 == 2 Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9)-(6,10) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7)-(13,11)-(14,12)-(15,13) ts 13 == 2 CS ts 9 == 1 ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 10 == 1 CS ts 12 == 2 ts 11 == 2 ts 7 == 2 ts 6 == 2 Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,2)-(3,3)-(4,8)-(5,9)-(6,10) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7)-(13,11)-(14,12)-(15,13)-(16,14) ts 14 == 2 CS ts 9 == 1 ts 8 == 1 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 10 == 1 CS ts 13 == 2 ts 12 == 2 ts 11 == 2 ts 7 == 2 ts 6 == 2 interleaving completed, so build constraints for interleaving (but do not call SMT solver) Schedule Recording – Interleaving #1

Thread twoStage 1: lock(m1); 2: val1 = 1; 3: unlock(m1); 4: lock(m2); 5: val2 = val1 + 1; 6: unlock(m2); Thread reader 7: lock(m1); 8: if (val1 == 0) { 9: unlock(m1); 10: return NULL; } 11: t1 = val1; 12: unlock(m1); 13: lock(m2); 14: t2 = val2; 15: unlock(m2); 16: assert(t2==(t1+1)); statements: twoStage-ECS: (1,1)-(2,3)-(3,4)-(4,12)-(5,13)-(6,14) reader-ECS: (7,4)-(8,5)-(11,6)-(12,7)-(13,8)-(14,9)-(15,10)-(16,11) CS ts 11 == 2 ts 10 == 2 ts 9 == 2 ts 13 == 1 ts 12 == 1 ts 7 == 2 ts 4 == 2 ts 5 == 2 ts 1 == 1 ts 3 == 1 ts 2 == 1 ts 6 == 2 ts 14 == 1 ts 8 == 2 Schedule Recording – Interleaving #2

Schedule Recording: Execution Paths

[execution-path tree over the threads twoStage and reader; each node carries its accumulated schedule guards and the program statement it executes]
ts1==1: lock(m1)        ts1==2: lock(m1)
ts1==1 ∧ ts2==1: val1=1    ts1==1 ∧ ts2==2: lock(m1)    ts1==2 ∧ ts2==1: lock(m1)    ts1==2 ∧ ts2==2: unlock(m1)
the SMT solver instantiates the ts variables to evaluate all possible interleavings; if the guard of the parent node is false, then the guard of the child node is false as well

Observations about the schedule recording approach

systematically explore the thread interleavings as before, but:
– add schedule guards to record in which order the scheduler has executed the program
– encode all execution paths into one formula
• bound the number of context switches
• exploit which transitions are enabled in a given state
the number of threads and context switches grows very large quickly and can easily "blow up" the solver:
• there is a clear trade-off between the usage of time and memory resources
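
As a small worked illustration of the guard encoding, taken from interleaving #1 above: the write val1 = 1 is statement 2 of thread twoStage (thread id 1) and ends up in ECS block 2, so in the formula it appears guarded by ts2 == 1; the read t1 = val1 is statement 11 of thread reader (thread id 2), lands in ECS block 6, and is guarded by ts6 == 2. Leaving each ts_i constrained only by 0 < ts_i ≤ #threads, a single formula covers all interleavings: the solver's choice of ts values selects which thread's statements occupy each ECS block.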

Sequentialization Observation: Building verification tools for full-fledged concurrent languages is difficult and expensive... … but scalable verification techniques exist for sequential languages –Abstraction techniques –SAT/SMT techniques (i.e., bounded model checking) ⇒ How can we leverage these?

Sequentialization

⇒ How can we leverage these?
Sequentialization: convert concurrent programs into sequential programs such that reachability is preserved
replace control non-determinism by data non-determinism
P' simulates all computations (within certain bounds) of P
source-to-source transformation: T1 ∥ T2 ↝ T'1 ; T'2
⇒ reuse existing tools (largely) unchanged
⇒ easy to target multiple back-ends
⇒ easy to experiment with different approaches

A first sequentialization: KISS

KISS: Keep It Simple and Sequential [Qadeer-Wu, PLDI'04]
Under-approximation (subset of interleavings)
Thread creation → function call
– at context switches, either:
• the active thread is terminated, or
• a not yet scheduled thread is started (by calling its main function)
– when a thread is terminated, either:
• the thread that called it is resumed (if any), or
• a not yet scheduled thread is started

KISS schedules

[figure: a run through states (l_1,s_1), (l_2,s_1), … spread over threads T1, T2, T3]
Schedule 1: 1. start T1  2. start T2  3. terminate T2  4. start T3  5. terminate T3  6. resume T1
Schedule 2: 1. start T1  2. start T2  3. start T3  4. terminate T3  5. resume T2  6. terminate T2  7. resume T1
Schedule 3: 1. start T1  2. start T2  3. terminate T2  4. resume T1  5. start T3  6. terminate T3  7. resume T1

LR sequentialization

the first sequentialization for a bounded number of context switches (but a finite number of threads) [Lal-Reps, CAV'08]

LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2, …, a k 2.Execute T 1 to completion  Computes -local states l 1,.., l k, and -global states b 1, …, b k (l 1,b 1 ) T1T1 (l 1,a 2 ) (l 2,b 2 ) (l 2,a 3 ) (l 3,b 3 ) T2T2 T3T3

LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 (l 1,b 1 ) T1T1 (l 1,a 2 ) (l 2,b 2 ) (l 2,a 3 ) (l 3,b 3 ) T2T2 T3T3 b1b1 b2b2 b3b3

LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2  We can dismiss locals of T 1 T1T1 a2a2 a3a3 T2T2 T3T3 b1b1 b2b2 b3b3

LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 4.Execute T 2 to completion  Dismiss locals of T 2  Dismiss b 1,…,b k T1T1 a2a2 a3a3 T2T2 T3T3 b1b1 b2b2 b3b3 (l 1 ’,c 1 ) c3c3 (l 1 ’,b 2 ) (l 0 ’,b 1 ) (l 2 ’,c 2 ) (l 2 ’,b 3 )

LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 4.Execute T 2 to completion T1T1 a2a2 a3a3 T2T2 T3T3 b1b1 b2b2 b3b3 c1c1 c2c2 c3c3

LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 4.Execute T 2 to completion 5.Pass c 1,…,c k to T 3 6.Execute T 3 to completion  Dismiss locals of T 3  Dismiss c 1,…,c k T1T1 a2a2 a3a3 T2T2 T3T3 b1b1 b2b2 b3b3 c1c1 c2c2 c3c3 d3d3 (l 0 ’’, c 1 ) (l 1 ’,d 1 ) (l 1 ’,c 2 ) (l 1 ’,d 2 ) (l 1 ’,c 3 )

LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 4.Execute T 2 to completion 5.Pass c 1,…,c k to T 3 6.Execute T 3 to completion T1T1 a2a2 a3a3 T2T2 T3T3 b1b1 b2b2 b3b3 c1c1 c2c2 c3c3 d1d1 d2d2 d3d3

LR sequentialization - schema Sequential program (k-rounds) 1.Guess a 2,…,a k 2.Execute T 1 to completion 3.Pass b 1,…,b k to T 2 4.Execute T 2 to completion 5.Pass c 1,…,c k to T 3 6.Execute T 3 to completion 7.Computation iff d i = a i+1  i  [1,k-1] T1T1 a2a2 a3a3 T2T2 T3T3 b1b1 b2b2 b3b3 c1c1 c2c2 c3c3 d1d1 d2d2 d3d3

LR sequentialization - schema

Sequential program (k rounds):
1. Guess a_2, …, a_k
2. Execute T_1 to completion
3. Pass b_1, …, b_k to T_2
4. Execute T_2 to completion
5. Pass c_1, …, c_k to T_3
6. Execute T_3 to completion
7. The simulation corresponds to a real computation iff d_i = a_{i+1} for all i ∈ [1, k-1]
8. Report an error if a bug occurs during the simulation
[figure: guessed round-start states a_i and computed states b_i, c_i, d_i for threads T_1, T_2, T_3]

LR sequentialization - implementation considers only round-robin schedules with k rounds T0T0 T1T1 TnTn...

LR sequentialization - implementation considers only round-robin schedules with k rounds –thread → function, run to completion T0T0 T1T1 TnTn...

LR sequentialization - implementation considers only round-robin schedules with k rounds –thread → function, run to completion global memory copy for each round –scalar → array T0T0 S 2,0 S 0,0 S 1,0 S k,0 T1T1 S 2,1 S 0,1 S 1,1 S k,1 TnTn S 2,n S 0,n S 1,n S k,n...

LR sequentialization - implementation considers only round-robin schedules with k rounds –thread → function, run to completion global memory copy for each round –scalar → array context switch → round counter++ T0T0 S 2,0 S 0,0 S 1,0 S k,0 T1T1 S 2,1 S 0,1 S 1,1 S k,1 TnTn S 2,n S 0,n S 1,n S k,n...

LR sequentialization - implementation considers only round-robin schedules with k rounds –thread → function, run to completion global memory copy for each round –scalar → array context switch → round counter++ first thread starts with non-deterministic memory contents –other threads continue with content left by predecessor T0T0 T1T1 S 2,1 S 0,1 S 1,1 S k,1 TnTn... S 2,0 S 0,0 S 1,0 S k,0 S 2,n S 0,n S 1,n S k,n

LR sequentialization - implementation

considers only round-robin schedules with k rounds
– thread → function, run to completion
global memory copy for each round
– scalar → array
context switch → round counter++
first thread starts with non-deterministic memory contents
– other threads continue with the content left by their predecessor
checker prunes away inconsistent simulations
– assume(S[k+1,0] == S[k,n]);
– requires a second set of memory copies
– errors can only be checked at the end of the simulation ⇒ requires explicit error checks
[figure: memory copies S[r,t] for rounds r = 0..k and threads T_0 … T_n]

LR sequentialization - implementation

Original program:

//shared vars
type_g1 g1;
type_g2 g2;
…
//thread functions
t() {
  type_x1 x1;
  type_x2 x2;
  …
  stmt_1;
  stmt_2;
  …
}
…
main() { … }

Sequentialized program:

//shared vars
type_g1 g1[K];
type_g2 g2[K];
…
uint round=0; bool ret=0;   //aux vars

// context-switch simulation
cs() {
  unsigned int j;
  j = nondet();
  assume(round + j < K);
  round += j;
  if (round == K-1 && nondet()) ret = 1;
}

//thread functions
t() {
  type_x1 x1;
  type_x2 x2;
  …
  cs(); if (ret) return;
  stmt_1[round];
  cs(); if (ret) return;
  stmt_2[round];
  …
}
…
main_thread() { … }
main() { … }  //next slide

LR sequentialization - implementation

main() {
  type_g1 _g1[K];
  type_g2 _g2[K];
  …
  // first thread starts with non-deterministic memory contents
  for (i=1; i<K; i++) {
    _g1[i] = g1[i] = nondet();
    _g2[i] = g2[i] = nondet();
    …
  }
  // thread simulations
  t[0] = main_thread; round_born[0] = 0; is_created[0] = 1;
  for (i=0; i<N; i++) {
    if (is_created[i]) {
      ret = 0;
      round = round_born[i];
      t[i]();
    }
  }
  // consistency check
  for (i=0; i<K-1; i++) {
    assume(_g1[i+1] == g1[i]);
    assume(_g2[i+1] == g2[i]);
    …
  }
  // error detection
  assert(err == 0);
}

LR sequentialization - implementations

Corral (SMT-based analysis for Boogie programs)
– [Lal–Qadeer–Lahiri, CAV'12]
– [Lal–Qadeer, FSE'14]
CSeq (code-to-code translation for C + pthreads)
– [Fischer–Inverso–Parlato, ASE'13]
Rek (for real-time embedded software systems)
– [Chaki–Gurfinkel–Strichman, FMCAD'11]
Storm (implementation for C programs)
– [Lahiri–Qadeer–Rakamaric, CAV'09]
– [Rakamaric, ICSE'10]

LR sequentialization is "eager"

[figure: guessed states a_2, a_3 and computed states b_1, b_2 across threads T_1, T_2, T_3]
a_2 is guessed; a_2 may be unreachable
EAGER [Lal-Reps, CAV'08]

LR sequentialization does not preserve assertions

// shared variables
bool blocked=true;
int x=0, y=0;

void thread1() {
  while (blocked);
  x = x/y;
  if (x%2==1) ERROR;
}

void thread2() {
  x=12; y=2;
  //unblock thread1
  blocked=false;
}

[figure annotations: Inv: y != 0; blocked=true / blocked=false; guess x=13, y=0]
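
Why the division can fail spuriously (a sketch): in every real execution thread1 leaves the while (blocked) loop only after thread2 has set blocked = false, and thread2 sets y = 2 before that, so y != 0 holds at the division. In the eager LR simulation, however, thread1's later round starts from a guessed global state, e.g. x = 13, y = 0, blocked = false as in the diagram; the guess is checked against thread2's actual writes only after all threads have been simulated, so the simulation itself can execute x = x/y with y == 0 even though no real execution reaches that state.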

A lazy transformation is desirable A lazy sequential program explores only reachable states of the concurrent program Why is it desirable? In model-checking it can drastically reduce the explored state-space Better invariants for deductive verification / abstract interpretation We now illustrate a lazy transformation: [La Torre-Madhusudan-Parlato, CAV’09]

Lazy transformation: main idea  Execute T 1  Context-switch: store s 1 and abort  Execute T 2 from s 1  store s 2 and abort (l 1,s 1 ) (l’ 1,s 1 ) (l’ 2,s 2 ) T1T1 (l 0,s 0 ) T2T2 store s 1 & abort store s 2 & abort

Lazy transformation: main idea  Re-execute T 1 till it reaches s 1 May reach a new local state! Anyway it is correct !!  Restart from global s 2 and compute s 3 (l 1,s 1 ) (l’ 1,s 1 ) (l’ 2,s 2 ) T1T1 (l 0,s 0 ) T2T2 store s 1 & abort store s 2 & abort (l’’ 1,s 1 ) store s 3 & abort (l’’ 1,s 2 )

Lazy transformation: main idea  Switch to T 2  Execute till it reaches s 2  Continue computation from global s 3 (l 1,s 1 ) (l’ 1,s 1 ) (l’ 2,s 2 ) T1T1 (l 0,s 0 ) T2T2 store s 1 & abort store s 2 & abort (l’’ 1,s 1 ) store s 3 & abort (l’’’ 1,s 2 ) (l’’ 1,s 2 ) (l’’’ 1,s 3 )

Lazy transformation: main idea

[figure: T1 and T2 alternate; each pass re-executes up to the last stored shared state and extends the stored sequence s_1, s_2, s_3, s_4, s_5 until the end]

Lazy transformation: features Explores only reachable states Preserves invariants across the translation Tracks local state of one thread at any time Tracks values of shared variables at context switches (s 1, s 2, …, s k ) Requires recomputation of local states

Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } main driver a global pc for each thread thread locals  thread global

Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } main driver for each round for each thread T i simulate T i

Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc k ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;... E XE.. M: CS(M); stmt M; main driver F i ()

Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc i ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;... E XE.. M: CS(M); stmt M; main driver F i ()... context-switch resume mechanism

Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc i ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;... M: CS(M); stmt M; main driver F i ()... Context-switch simulation: #define CS(j) if (*) { pc i =j; return; }

Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc i ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;.. E XE... M: CS(M); stmt M; main driver F i ()...

Naïve Lazy Sequentialization: CROSS PRODUCT SIMULATION pc 0 =0; pc 1 =0;... pc N =0; local 0 ; local 1 ;... local k ; main() { for (r=0; r<R; r++) for (k=0; k<N; k++) // simulate T k F k (); } switch(pc i ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;.. E XE... M: CS(M); stmt M; main driver... Formula encoding: goto statement to formula add a guard for each crossing control-flow edge = O(M 2 ) guards Formula encoding: goto statement to formula add a guard for each crossing control-flow edge = O(M 2 ) guards

pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc i ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;.. E XE... M: CS(M); stmt M; main driver F i ()... CSeq-Lazy Sequentialization: CROSS PRODUCT SIMULATION

pc 0 =0;... pc N =0; local 0 ;... local k ; main() { for (r=0; r<K; r++) for (i=0; i<N; i++) // simulate T i if (active i ) F i (); } switch(pc k ) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; } 0: CS(0); stmt0; 1: CS(1); stmt1; 2: CS(2); stmt2;.. E XE... M: CS(M); stmt M; main driver F i ()... CSeq-Lazy Sequentialization: CROSS PRODUCT SIMULATION

CSeq-Lazy Sequentialization: CROSS PRODUCT SIMULATION

main driver:
pc_0=0; ... pc_N=0; local_0; ... local_k; nextCS;
main()
  for (r=0; r<K; r++)
    for (i=0; i<N; i++)    // simulate T_i
      if (active_i) {
        nextCS = nondet();
        assume(nextCS >= pc_i);
        F_i();
        pc_i = nextCS;
      }

F_i():
switch(pc_i) { case 0: goto 0; case 1: goto 1; case 2: goto 2;... case M: goto M; }
0: CS(0); stmt0;
1: CS(1); stmt1;
2: CS(2); stmt2;
...
M: CS(M); stmt M;

NEW: #define CS(j) if (j >= nextCS) goto j+1;

CSeq-Lazy Sequentialization: CROSS PRODUCT SIMULATION

resuming + context-switch:
0: CS(0); stmt0;
1: CS(1); stmt1;
2: CS(2); stmt2;
...
M: CS(M); stmt M;
[only the statements between pc_i and nextCS are executed; those before pc_i are skipped when resuming, those from nextCS on are skipped at the context switch]

NEW: #define CS(j) if (j >= nextCS) goto j+1;

Individual Threads Sequentialization

0: CS(0); stmt0;
1: CS(1); stmt1;
2: CS(2); stmt2;
...
M: CS(M); stmt M;

Formula encoding: translating the goto statements to a formula adds a guard for each crossing control-flow edge = O(M) guards

Individual Threads Sequentialization

0: CS(0); stmt0;
1: CS(1); stmt1;
2: CS(2); stmt2;
...
M: CS(M); stmt M;

#define CS(j) if (j >= next_CS) goto j+1;
inject light-weight, non-invasive control code: no non-determinism, no assignments, no return

Sequentialization via Memory Unwinding

Basic Idea: convert concurrent programs into equivalent sequential programs
Mu-CSeq Approach: T1 ∥ T2 ↝ M ; T'1 ; C1 ; T'2 ; C2
– M makes a consistent shared-memory guess
– the checkers C ensure that the threads T' used the memory consistently
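
A tiny illustration with hypothetical values, just to make the scheme concrete before the encoding below: suppose T1 executes x = 1; y = 3; and T2 executes y = 2;. One guessed memory unwinding of length 3 is: write 1 = (thread 1, variable x, value 1), write 2 = (thread 2, variable y, value 2), write 3 = (thread 1, variable y, value 3). The simulation T'1 must then perform exactly writes 1 and 3, in that order and with those values, and T'2 exactly write 2; executions that cannot match their guessed writes are pruned (their assume fails), so, roughly, only unwindings that some real execution can produce survive.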

Memory Unwinding xy... z “memory”

Memory Unwinding xy... z “memory” Guess and store sequence of individual write operations:

Memory Unwinding xy... z N “memory” pos Guess and store sequence of individual write operations:

Memory Unwinding xy... z N “memory” pos Guess and store sequence of individual write operations: add N copies of shared variables (“memory”) _memory[i,v] is value of v-th variable after i-th write

Memory Unwinding

[figure: the "memory" copies and the "writes" arrays (var, thr), with the current position pos]
Guess and store a sequence of individual write operations:
add N copies of the shared variables ("memory")
– _memory[i,v] is the value of the v-th variable after the i-th write
add an array to record the writes ("writes")
– the i-th write is by _thr[i], which has written to _var[i]

Memory Unwinding xy... z 1x 1y 2y 1z 2x var... next[0] N thr “memory”“writes” “barrier” pos Guess and store sequence of individual write operations: add N copies of shared variables (“memory”) _memory[i,v] is value of v-th variable after i-th write add array to record writes (“writes”) i-th write is by _thr[i], which has written to _var[i] add array to record validity of writes (“barrier”) next write (after i) by thread t is at _next[i,t] N next[K]... N+1 N 3 3 N... N+1

Basic Idea: Simulation simulate all executions compatible with guessed memory unwinding

Basic Idea: uses auxiliary variables – pos (current index into unwound memory) – thread (id of currently simulated thread) – end (guessed index of last write into unwound memory) Simulation simulate all executions compatible with guessed memory unwinding

Basic Idea: uses auxiliary variables – pos (current index into unwound memory) – thread (id of currently simulated thread) – end (guessed index of last write into unwound memory) every thread is translated into a function –simulation starts from main thread Simulation simulate all executions compatible with guessed memory unwinding

Basic Idea: uses auxiliary variables – pos (current index into unwound memory) – thread (id of currently simulated thread) – end (guessed index of last write into unwound memory) every thread is translated into a function –simulation starts from main thread each thread creation interrupts simulation of current thread –new thread is called with current value of pos Simulation simulate all executions compatible with guessed memory unwinding

Basic Idea: uses auxiliary variables – pos (current index into unwound memory) – thread (id of currently simulated thread) – end (guessed index of last write into unwound memory) every thread is translated into a function –simulation starts from main thread each thread creation interrupts simulation of current thread –new thread is called with current value of pos terminated threads must have completed their memory writes –pos > end Simulation simulate all executions compatible with guessed memory unwinding

Basic Idea: uses auxiliary variables – pos (current index into unwound memory) – thread (id of currently simulated thread) – end (guessed index of last write into unwound memory) every thread is translated into a function –simulation starts from main thread each thread creation interrupts simulation of current thread –new thread is called with current value of pos terminated threads must have completed their memory writes –pos > end old thread resumed with old memory position Simulation simulate all executions compatible with guessed memory unwinding

Basic Idea: every read / write is translated into a function –simulates valid access into unwound memory similar simulations for pthread functions Simulation simulate all executions compatible with guessed memory unwinding

Simulating reads and writes

int read(uint x) {
  uint jmp = *;
  assume(jmp >= pos && jmp < next[pos,thread]);
  pos = jmp;
  if (pos > end) return;
  return(_memory[pos, x]);
}

[figure: the "memory", "writes" (var, thr), and "barrier" (next) data structures, with the current position pos]

Simulating reads and writes

int write(uint x, int val) {
  pos = next[pos,thread];
  if (pos > end) return;
  assume(_var[pos]==x && _memory[pos,x]==val);
}

[figure: the "memory", "writes" (var, thr), and "barrier" (next) data structures, with the current position pos]
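
As a usage sketch (the index names here are hypothetical): under this scheme a statement that only touches thread-local variables is left unchanged, while a statement on shared variables such as x = y + 1 becomes write(x_idx, read(y_idx) + 1). The assume inside write then forces the simulated thread to follow exactly the writes recorded for it in the guessed unwinding, while the assume inside read lets it observe the memory at any position between its current position and its own next write.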