Verification for Concurrent Programs Part 3: Incomplete techniques and bug finding
Context bounding and Sequentialization
Context bounding
Folk knowledge: most concurrency bugs are shallow in terms of required context switches
- Most bugs require very few context switches
- Most concurrency bugs are atomicity violations or order violations
- For an empirical study, see Shan Lu et al. 2006…2008
Why not check concurrent programs only up to a few context switches? It is much more efficient
CHESS: Systematic exploration
Culmination of techniques proposed by Qadeer et al. starting in 2004
Correctness primarily given by assertions in the code
- Can also use monitors
- Can detect data races, deadlocks, etc.
Main idea: use a scheduler that explores traces of the program deterministically, prioritizing traces with few context switches
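The exploration loop can be sketched in a few lines of Python (a toy stand-in for CHESS, not its actual implementation). Two threads each perform a non-atomic increment split into a read and a write; the "scheduler" deterministically enumerates every schedule, ordered by number of context switches, and an assertion on the final state flags the atomicity violation. The program, names, and assertion here are illustrative assumptions:

```python
# Toy program: each thread does a non-atomic x++ as read-then-write.
def make_threads():
    state = {"x": 0}
    local = [{}, {}]
    def steps(tid):
        return [
            lambda: local[tid].__setitem__("t", state["x"]),      # read x
            lambda: state.__setitem__("x", local[tid]["t"] + 1),  # write x
        ]
    return state, [steps(0), steps(1)]

def schedules(n_threads, n_steps):
    # all order-preserving interleavings, as tuples of thread ids
    def gen(remaining):
        if sum(remaining) == 0:
            yield ()
            return
        for t in range(n_threads):
            if remaining[t]:
                rest = list(remaining)
                rest[t] -= 1
                for s in gen(rest):
                    yield (t,) + s
    return gen([n_steps] * n_threads)

def context_switches(sched):
    return sum(1 for a, b in zip(sched, sched[1:]) if a != b)

def explore():
    # deterministic exploration, fewest context switches first
    violations = []
    for sched in sorted(schedules(2, 2), key=context_switches):
        state, threads = make_threads()
        pcs = [0, 0]
        for tid in sched:
            threads[tid][pcs[tid]]()
            pcs[tid] += 1
        if state["x"] != 2:  # assertion: both increments took effect
            violations.append(sched)
    return violations

bugs = explore()
```

Of the six schedules, four lose an update, and the first violation found needs only two context switches, which is the point of prioritizing schedules with few switches.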
CHESS: Controlling the scheduler
Sources of non-determinism: input, scheduling, timing, and libraries
- Input non-determinism is controlled by specifying fixed inputs
- Scheduling non-determinism is controlled by writing a deterministic scheduler
- Timing and library non-determinism: model the library code
State-space explosion
Thread 1: x = 1; …; y = k    …    Thread n: x = 1; …; y = k
Exploring k steps in each of the n threads:
- Number of executions is O(n^(nk)), exponential in k
Exploring k steps in each thread, but only c context switches:
- Number of executions is O((n^2 k)^c * n!)
- Not exponential in k!
Additionally, the scheduler can use a polynomial amount of space:
- Remember the c spots for context switches
- Permutations of the n+c atomic blocks
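The reduction can be checked by brute force on tiny parameters. The sketch below (illustrative, not from the lecture) enumerates all interleavings of n threads with k steps each and counts how many survive a context-switch bound c:

```python
from math import comb

def schedules(n, k):
    # all interleavings: sequences over thread ids, thread t appearing k times
    def gen(remaining):
        if sum(remaining) == 0:
            yield ()
            return
        for t in range(n):
            if remaining[t]:
                rest = list(remaining)
                rest[t] -= 1
                for s in gen(rest):
                    yield (t,) + s
    return list(gen([k] * n))

def context_switches(s):
    return sum(1 for a, b in zip(s, s[1:]) if a != b)

n, k, c = 2, 4, 2
all_schedules = schedules(n, k)                                   # C(8, 4) = 70
bounded = [s for s in all_schedules if context_switches(s) <= c]  # only 8 remain
```

For n = 2 threads of k = 4 steps, 70 interleavings shrink to 8 under a bound of c = 2 switches, and the gap widens rapidly as k grows.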
Scheduling: Picking pre-emption points

void Deposit100() {
  ChessSchedule();  // potential pre-emption point
  EnterCriticalSection(&cs);
  balance += 100;
  LeaveCriticalSection(&cs);
}

void Withdraw100() {
  int t;
  ChessSchedule();  // potential pre-emption point
  EnterCriticalSection(&cs);
  t = balance;
  LeaveCriticalSection(&cs);
  balance = t - 100;
}

Heuristics: more pre-emption points in critical code, etc.
Coverage guarantee: when all traces with 2 context switches have been explored, every remaining bug requires at least 3 context switches
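Note that Withdraw100 is buggy: the write balance = t - 100 happens outside the critical section, so a pre-emption between the read and the write loses a concurrent deposit. A minimal Python simulation of exactly that schedule (a toy model with hand-picked schedules, not CHESS itself):

```python
def make_account(initial):
    state = {"balance": initial}
    def deposit_steps():
        # one atomic step: the whole body is inside the critical section
        yield lambda: state.__setitem__("balance", state["balance"] + 100)
    def withdraw_steps():
        local = {}
        yield lambda: local.__setitem__("t", state["balance"])        # read, inside CS
        yield lambda: state.__setitem__("balance", local["t"] - 100)  # write, OUTSIDE CS: the bug
    return state, [deposit_steps(), withdraw_steps()]

def run(schedule, initial=100):
    # execute the threads' steps in the order given by `schedule`
    state, threads = make_account(initial)
    for tid in schedule:
        next(threads[tid])()
    return state["balance"]

serial = run([0, 1, 1])  # deposit, then withdraw: 100 + 100 - 100 = 100
buggy  = run([1, 0, 1])  # pre-empt withdraw between its read and its write
```

The serial schedule ends with balance 100, while the pre-empted one ends with 0: the deposit is lost, and a single pre-emption was enough to expose it.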
CHESS: Summary
Build a deterministic scheduler
Complications: fairness and livelocks, weak memory models
Advantages:
- Runs real code on real systems; only the scheduler has been replaced
- Is mostly program agnostic
- Exhaustive testing
Sequentialization
CHESS approach: concurrent program + bound on context switches -> explore all interleavings
General sequentialization approach: concurrent program + bound on context switches -> sequential program
- Then, verify the sequential program using your favourite verification technique
Many flavours of context-bounded analysis:
- PDS based (Qadeer et al.)
- Transformation-based sequentialization: eager, lazy (Lal et al.)
- BMC based (Parlato et al.)
Sequentialization: Basic idea
What is hard about sequentialization? Have to remember local variables across phases (though they don't change)
- If exploring T1 T2 T1, have to remember the locals of T1 across the phase of T2
Lal-Reps 2008: instead, do a source-to-source transformation
- Copy each statement and global variable c times
- Now, we can explore T1 T1 T2 instead of T1 T2 T1
- Only one thread's local variables are relevant at each stage
Sequentialization: Basic idea
Replace each global variable X by copies X[tid][0..K]
- X[tid][i] represents the value of the global variable X the ith time thread tid is scheduled
The statement X := X + 1 is transformed into:

if (phase == 0)      X[tid][0] := X[tid][0] + 1
else if (phase == 1) X[tid][1] := X[tid][1] + 1
…
else if (phase == K) X[tid][K] := X[tid][K] + 1

Between statements, the phase may advance non-deterministically:

if (phase < K && *) phase++
if (phase == K + 1) { phase = 1; Thread[tid+1]() }
Sequentialization: Basic idea
A program (T1 || T2) is rewritten into Seq(T1); Seq(T2); check()
Roughly:
- Execute each thread sequentially
- But, at random points, guess new values for the global variables
- In the end, check that the guessed values are consistent:

for phase = 0 to K
  if (phase > 0)
    assume (X[0][phase] == X[N][phase - 1])
  for tid = 1 to N
    assume (X[tid][phase] == X[tid-1][phase])
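Here is a brute-force Python rendering of the guess-and-check scheme (illustrative only; the actual Lal-Reps construction works symbolically, not by enumeration). Each of N threads runs X := X + 1 once per phase; we guess a starting value of X for every (thread, phase) copy, run all threads sequentially, and keep only the guesses that pass the consistency check. The thread code, domain, and parameters are assumptions for the example:

```python
from itertools import product

N, PHASES, INIT = 2, 3, 0   # 2 threads, K+1 = 3 phases, X starts at 0
step = lambda x: x + 1      # each thread's code per phase: X := X + 1

def consistent_finals(domain=range(10)):
    finals = set()
    # guess a start value for every (tid, phase) copy except the real initial one
    free = [(t, p) for p in range(PHASES) for t in range(N) if (t, p) != (0, 0)]
    for guess in product(domain, repeat=len(free)):
        start = dict(zip(free, guess))
        start[(0, 0)] = INIT
        # run the threads sequentially: each copy is updated in isolation
        out = {(t, p): step(start[(t, p)]) for t in range(N) for p in range(PHASES)}
        # the consistency check, mirroring the assume()s above
        ok = all(start[(t, p)] == out[(t - 1, p)]
                 for p in range(PHASES) for t in range(1, N))
        ok = ok and all(start[(0, p)] == out[(N - 1, p - 1)]
                        for p in range(1, PHASES))
        if ok:
            finals.add(out[(N - 1, PHASES - 1)])
    return finals
```

Only one guess assignment is consistent: the increments force the chain 0, 1, 2, 3, 4, 5, so the only reachable final value is 6, exactly the N * PHASES increments of the real concurrent program.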
Sequentialization
Each green arrow is one part of the check!

Thread 0:                    Thread 1:
… X[0][0] := X[0][0] + 1     … X[1][0] := X[1][0] + 1
… X[0][1] := X[0][1] + 1     … X[1][1] := X[1][1] + 1
… X[0][2] := X[0][2] + 1     … X[1][2] := X[1][2] + 1
Sequentialization
The original Lal-Reps technique uses summarization for verification of the sequential program
- Compute summaries for the relation between initial and final values of global variables
- Extremely powerful idea
Advantages:
- Reduces the need to reason about locals of different threads
- No need to reason explicitly about interleavings; interleavings are encoded into data (variables)
- Scales linearly with the number of threads
Sequentialization and BMC
Currently, the best tools in the concurrency verification competitions use "sequentialization + BMC"
The previous sequentialization technique is better suited to static analysis than to model checking
- No additional advantage in introducing additional globals and then checking for consistency
- Instead, just use non-determinism explicitly
BMC for concurrency
First, rewrite the threads by unrolling loops and inlining function calls:
- No loops
- No function calls
- Forward-only control flow
Then write a driver "main" function to schedule the threads one by one
Naïve sequentialization for BMC

threadi():
  switch (pci) {
    case 0: goto 0;
    case 1: goto 1;
    …
  }
  0: CS(0); stmt0;
  1: CS(1); stmt1;
  …
  M: CS(M); stmtM;

CS(j) := if (*) { pci = j; return; }

Main driver:
  pc0 = 0, …, pcn = 0
  main() {
    for (r = 0; r < K; r++)
      for (i = 0; i < n; i++)
        threadi();
  }

The resume mechanism jumps to the "right" spot in the thread; there is a potential context switch before each statement
What's the problem? Lots of jumps in the control flow, which is bad for the SMT encoding
Better sequentialization for BMC

threadi():
  0: CS(0); stmt0;
  1: CS(1); stmt1;
  …
  M: CS(M); stmtM;

CS(j) := if (j < pci || j >= nextCS) { goto j+1; }  // skip stmtj unless pci <= j < nextCS

Main driver:
  pc0 = 0, …, pcn = 0
  main() {
    for (r = 0; r < K; r++)
      for (i = 0; i < n; i++) {
        nextCS = *
        assume (nextCS >= pci)
        threadi();
        pci = nextCS
      }
  }

Avoids the multiple control-flow-breaking jumps; non-determinism is restricted to one spot
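The driver's restricted non-determinism can be mimicked in Python by enumerating, per thread and per round, the cut point nextCS (a toy explicit-state rendering; BMC tools instead encode this choice symbolically for an SMT solver). The two-thread program here is an illustrative assumption:

```python
from itertools import product

# Toy threads as straight-line statement lists over a shared state
STMTS = {
    "A": [lambda s: s.__setitem__("x", s["x"] + 1),
          lambda s: s.__setitem__("x", s["x"] + 1)],
    "B": [lambda s: s.__setitem__("x", s["x"] * 2)],
}
ROUNDS = 2  # K in the driver above

def cuts(n_stmts):
    # nondecreasing nextCS values per round, finishing the thread in the last round
    def gen(prev, r):
        if r == ROUNDS - 1:
            yield (n_stmts,)
            return
        for nxt in range(prev, n_stmts + 1):
            for rest in gen(nxt, r + 1):
                yield (nxt,) + rest
    return list(gen(0, 0))

def explore():
    finals = set()
    for ca, cb in product(cuts(len(STMTS["A"])), cuts(len(STMTS["B"]))):
        chosen = {"A": ca, "B": cb}
        state = {"x": 0}
        pc = {"A": 0, "B": 0}
        for r in range(ROUNDS):
            for t in ("A", "B"):
                nextCS = chosen[t][r]
                # CS(j) skips every stmt outside [pc, nextCS)
                for stmt in STMTS[t][pc[t]:nextCS]:
                    stmt(state)
                pc[t] = nextCS
        finals.add(state["x"])
    return finals
```

With two rounds, the driver reaches exactly the final values {2, 3, 4} of the three true interleavings of B's doubling around A's two increments, so nothing is lost by confining the non-determinism to the cut points.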
Context bounding and Sequentialization: Summary
Host of related techniques:
- Can be adapted for analysis, model checking, testing, etc.
- Different techniques need different kinds of tuning
Basic idea: most bugs require few context switches to turn up
Can leverage standard sequential program analysis techniques
Odds and Ends Things we didn’t cover
Specification-free correctness
In many cases we don't want to write assertions
- We just want the concurrent program to do the same thing as a sequential program
Standard correctness conditions:
- Linearizability [Herlihy/Wing 91]
- Serializability [Papadimitriou and others, 70s]
(Figure: a concurrent execution of Methods 0 to 3 mapped to an equivalent sequential execution.)
Testing for concurrency
Root causes of bugs:
- Ordering violations
- Atomicity violations
- Data races
Coverage metrics and coverage-guided search:
- Define-use pairs [Tasiran et al.]: find ordering violations based on define-use orderings
- HaPSet [Wang et al.]: find interesting interleavings by trying to cover all "immediate histories" of events
- CUTE/jCUTE [Sen et al.]: concolic testing, i.e., accumulate constraints along a test run to guide future test runs
(Symbolic) Predictive Analysis
Analyze variations of a given concurrent trace:
- Run a test and record information
- Build a predictive model by relaxing the scheduling constraints
- Analyze the predictive model for alternate interleavings
- Can flag false bugs
Symbolic predictive analysis:
- From a trace, build a precise predictive model (as an SMT formula)
- No false bugs
This is the End
Brief overview of concurrent verification techniques:
- Lecture 1: Race detection
- Lecture 2: Full proof techniques
- Lecture 3: Bug finding
What did we learn?
- Full verification is hard; there are not many techniques for weak-memory architectures
- Use lightweight and incomplete techniques to detect shallow bugs
- Code using a strict concurrency discipline is more likely to be correct and easier to verify