CHESS: Systematic Concurrency Testing Tom Ball, Sebastian Burckhardt, Madan Musuvathi, Shaz Qadeer Microsoft Research
Testing concurrent programs is HARD
Rare thread interleavings expose bugs.
Coverage problem: testing misses the thread interleavings that expose errors.
Reproducibility problem: concurrency bugs == Heisenbugs. They are not reproducible and therefore hard to debug, and crash dumps don’t help.
Thread interleavings x++; x*=2;
Concurrency testing today
Concurrency testing == stress testing.
Example: testing a concurrent queue. Create 100 threads performing queue operations and run for days/weeks.
Stress increases the interleaving variety, but it is:
Not systematic: might miss interleavings.
Not predictable: cannot find the same error again, which makes any error found hard to debug.
1 Why stress is not sufficient
Concurrency testing : what we need Methodology and tools to systematically and predictably test thread interleavings
CHESS in a nutshell
Replace the OS scheduler with a demonic scheduler.
Systematically explore all scheduling choices.
[Diagram: Concurrent Program on top of the Win32 API; the Kernel Scheduler is replaced by the Demonic Scheduler.]
CHESS will run this program 6 times exploring all the different interleavings x++; x*=2;
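One way to arrive at six runs is to model each statement as two atomic steps, a load followed by a store, which gives C(4,2) = 6 order-preserving merges of the two threads. The sketch below enumerates them; the load/store split, the initial value x = 0, and all function names are assumptions made for illustration, not details taken from CHESS.

```python
from itertools import combinations

def interleavings(steps_a, steps_b):
    """Yield every merge of two step lists that keeps each list's own order."""
    total = len(steps_a) + len(steps_b)
    for slots in combinations(range(total), len(steps_a)):
        a, b = iter(steps_a), iter(steps_b)
        yield [next(a) if i in slots else next(b) for i in range(total)]

def run(schedule):
    """Execute one schedule against a fresh shared state and return final x."""
    state = {"x": 0, "tmp_inc": None, "tmp_dbl": None}  # x = 0 is an assumption
    for step in schedule:
        step(state)
    return state["x"]

# x++ modelled as load then store (assumption)
def inc_load(s):  s["tmp_inc"] = s["x"]
def inc_store(s): s["x"] = s["tmp_inc"] + 1

# x *= 2 modelled as load then store (assumption)
def dbl_load(s):  s["tmp_dbl"] = s["x"]
def dbl_store(s): s["x"] = s["tmp_dbl"] * 2

schedules = list(interleavings([inc_load, inc_store], [dbl_load, dbl_store]))
results = sorted({run(s) for s in schedules})
print(len(schedules), results)
```

Under these assumptions only two of the six schedules are free of lost updates, which is the kind of outcome a stress test can run for a long time without hitting.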
2 Don’t stress, use CHESS
CHESS architecture
[Diagram: Program, CHESS, Win32 API, and Kernel (threads, scheduler, synchronization objects).]
The program supplies TestScenario() { … } and CHESS runs the scenario in a loop: While(not done) { TestScenario() }
Every run takes a different interleaving. Every run is repeatable.
CHESS intercepts synchronization and threading calls to control and introduce nondeterminism.
Detects: assertion violations, deadlocks, data races, livelocks.
CHESS methodology generalizes
Need wrappers for every concurrency API. CHESS has wrappers for Win32, .NET, and Singularity.
Wrappers understand the semantics of the API and expose the nondeterminism in the API.
Looking for volunteers to build wrappers for Linux and Java.
[Diagram: three stacks: .NET Program, .NET CLR, CHESS; Win32 Program, Win32 / OS, CHESS; Singularity Program, Singularity, CHESS.]
CHESS clients
PCP = Parallel Computing Platform (for multi/many-cores)
PLINQ: Parallel LINQ
CDS: Concurrent Data Structures
STM: Software Transactional Memory
TPL: Task Parallel Library
ConcRT: Concurrency RunTime
CCR: Concurrency Coordination Runtime
Dryad: part of COSMOS
Singularity/Midori: CHESS can systematically test the boot and shutdown process
Stateless model checking [Verisoft ‘97]
Systematically enumerate all paths in a state-space graph.
Don’t capture program states: capturing states is extremely hard for large programs.
Effective for message-passing programs.
CHESS applies stateless model checking to shared-memory multithreaded programs.
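The idea of enumerating paths without capturing states can be sketched as follows: a path is identified only by the list of scheduling choices that produced it, and it is reached again by re-running the program from the start. This is a minimal illustration, assuming threads are Python generators whose `yield` marks a scheduling point; `explore`, `make_threads`, and `check` are hypothetical names, not part of CHESS or Verisoft.

```python
def explore(make_threads, check):
    """Enumerate every schedule by re-running the program from scratch.

    No program state is saved: a schedule is the list of thread indices
    chosen at each scheduling point, and it is replayed to get back to it.
    """
    runs = 0
    pending = [[]]                       # schedule prefixes still to explore
    while pending:
        prefix = pending.pop()
        state, threads = make_threads()  # fresh program instance for each run
        alive = list(range(len(threads)))
        schedule = []
        while alive:
            if len(schedule) < len(prefix):
                choice = prefix[len(schedule)]      # replay a recorded choice
            else:
                choice = alive[0]                   # default choice
                for other in alive[1:]:             # remember the alternatives
                    pending.append(schedule + [other])
            schedule.append(choice)
            try:
                next(threads[choice])               # run one step of the thread
            except StopIteration:
                alive.remove(choice)                # thread has finished
        check(state, schedule)
        runs += 1
    return runs

def make_threads():
    state = {"x": 0}                     # initial value is an assumption
    def inc():                           # x++ as load, scheduling point, store
        tmp = state["x"]
        yield
        state["x"] = tmp + 1
    def dbl():                           # x *= 2 as load, scheduling point, store
        tmp = state["x"]
        yield
        state["x"] = tmp * 2
    return state, [inc(), dbl()]

finals = set()
runs = explore(make_threads, lambda state, schedule: finals.add(state["x"]))
print(runs, sorted(finals))
```

The explorer relies on the program being deterministic apart from scheduling, so that replaying a prefix always reaches the same point.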
Outline Preemption bounding [PLDI ‘07] Fair stateless model checking [PLDI ‘08] Sober [CAV ’08, EC2 ‘08] FeatherLite Concurrency Explorer [EC2 ‘08]
Outline Preemption bounding Makes CHESS effective on deep state spaces Fair stateless model checking Sober FeatherLite Concurrency Explorer
State space explosion
n threads, k steps each (each thread runs x = 1; … y = k;).
Number of executions = $O(n^{nk})$: exponential in both n and k.
Typically: n < 10, k > 100.
This limits scalability to large programs.
Goal: scale CHESS to large programs (large k).
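To get a feel for the growth, the exact number of interleavings of n threads with k atomic steps each is the multinomial coefficient (nk)! / (k!)^n, which stays below n^(nk). The helper below is an illustrative sketch; the function name is an assumption.

```python
from math import factorial

def interleavings(n, k):
    """Number of ways to interleave n threads of k atomic steps each."""
    return factorial(n * k) // factorial(k) ** n

for n, k in [(2, 2), (2, 10), (3, 10), (4, 10)]:
    # exact count next to the n^(nk) upper bound
    print(n, k, interleavings(n, k), n ** (n * k))
```

Even two threads of ten steps each already give 184756 interleavings.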
Preemption bounding
Prioritize executions with a small number of preemptions.
Two kinds of context switches:
Preemptions: forced by the scheduler, e.g. time-slice expiration.
Non-preemptions: a thread voluntarily yields, e.g. blocking on an unavailable lock, thread end.
[Diagram: one thread runs x = 1; if (p != 0) { x = p->f; } and is preempted between the test and the dereference, where another thread runs p = 0; the switch back is a non-preemption.]
Polynomial state space
Terminating program with fixed inputs and deterministic threads.
n threads, k steps each, c preemptions.
Number of executions $\le \binom{nk}{c} \cdot (n+c)! = O\big((n^2 k)^c \cdot n!\big)$
Exponential in n and c, but not in k.
Why: choose c preemption points among the nk steps, then permute the resulting n+c atomic blocks.
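The bound can be checked on small cases by brute force. The sketch below counts complete schedules with at most c preemptions, assuming threads never block, so a switch counts as a preemption exactly when the thread being switched away from still has steps left. Function names are hypothetical.

```python
from functools import lru_cache
from math import comb, factorial

def count_schedules(n, k, c):
    """Count complete schedules of n threads, k steps each, with <= c preemptions."""
    @lru_cache(maxsize=None)
    def go(remaining, current, budget):
        if not any(remaining):
            return 1                     # every thread has finished
        total = 0
        for t in range(n):
            if remaining[t] == 0:
                continue
            # leaving a thread that could still run is a preemption
            preempt = (current is not None and t != current
                       and remaining[current] > 0)
            if preempt and budget == 0:
                continue
            nxt = remaining[:t] + (remaining[t] - 1,) + remaining[t + 1:]
            total += go(nxt, t, budget - preempt)
        return total
    return go((k,) * n, None, c)

def bound(n, k, c):
    """The bound from the slide: C(nk, c) * (n + c)!"""
    return comb(n * k, c) * factorial(n + c)

for k in (2, 5, 10):
    print(k, count_schedules(2, k, 1), bound(2, k, 1))
```

For two threads and one preemption the count is 2k, linear in k, while the unbounded count for the same program grows exponentially.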
3 Preemption bounding
Find lots of bugs with 2 preemptions
Program            Lines of code   Bugs
Work Stealing Q    4K              4
CDS                6K              1
CCR                9K              3
ConcRT             16K             4
Dryad              18K             7
APE                19K             4
STM                20K             2
TPL                24K             9
PLINQ              24K             1
Singularity        175K            2
Total                              37
Acknowledgement: testers from PCP team
So, is CHESS unsound?
Soundness: prove that the program is correct for a given input test harness. This requires exhaustively exploring all interleavings.
For small programs, CHESS is sound: iteratively increase the preemption bound.
Preemption bounding helps scale to large programs: a good “knob” to trade resources for coverage.
Better search algorithms give more coverage faster: partial-order reduction, modular testing of loosely-coupled programs.
Outline Preemption bounding Makes CHESS effective on deep state spaces Fair stateless model checking Makes CHESS effective on cyclic state spaces Enables CHESS to find liveness violations (livelocks) Sober FeatherLite Concurrency Explorer
Concurrent programs have cyclic state spaces
Sources of cycles: spinlocks, non-blocking algorithms, implementations of synchronization primitives, periodic timers, …
Example: L1: while( ! done) { L2: Sleep(); } running against M1: done = 1;
[State diagram over the states (!done, L1), (!done, L2), (done, L1), (done, L2).]
A demonic scheduler unrolls any cycle ad infinitum
Example: while( ! done) { Sleep(); } running against done = 1;
[Diagram: the !done / done states of the cycle, unrolled repeatedly.]
Depth bounding
Prune executions beyond a bounded number of steps.
[Diagram: the unrolled !done / done states, cut off at the depth bound.]
Problem 1: Ineffective state coverage
The bound has to be large enough to reach the deepest bug: typically, greater than 100 synchronization operations.
Every unrolling of a cycle redundantly explores the reachable state space.
[Diagram: unrolled !done states up to the depth bound.]
Problem 2: Cannot find livelocks
Livelocks: lack of progress in a program.
Example: temp = done; while( ! temp) { Sleep(); } running against done = 1;
Key idea
Example: while( ! done) { Sleep(); } running against done = 1;
This test terminates only when the scheduler is fair.
Fairness is assumed by programmers.
All cycles in correct programs are unfair. A fair cycle is a livelock.
We need a fair demonic scheduler
Avoid unrolling unfair cycles: effective state coverage.
Detect fair cycles: find livelocks (violations of fair termination).
[Diagram: Concurrent Program and Test Harness on top of the Win32 API; the Demonic Scheduler is replaced by the Fair Demonic Scheduler.]
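A toy simulation shows why fairness matters for the while( ! done) Sleep() example. The scheduler below is deliberately adversarial (it always favours the spinning thread), and the fairness rule it uses is a simplification chosen for this sketch: a thread that yields may not run again until every other live thread has been scheduled once. This is an assumption for illustration and not the algorithm used in CHESS; all names are hypothetical.

```python
def run(fair, max_steps=1000):
    """Return the number of steps until all threads finish, or None on timeout."""
    state = {"done": False}

    def waiter():                        # while (!done) Sleep();
        while not state["done"]:
            yield "yield"                # models Sleep()

    def setter():                        # done = 1;
        state["done"] = True
        yield "step"

    threads = {0: waiter(), 1: setter()}
    must_wait_for = {t: set() for t in threads}   # simplified priority relation
    for step in range(1, max_steps + 1):
        ready = [t for t in threads if not (fair and must_wait_for[t])]
        chosen = min(ready)              # demonic: always favour the waiter
        for waiting in must_wait_for.values():
            waiting.discard(chosen)      # chosen has now been scheduled
        try:
            action = next(threads[chosen])
        except StopIteration:
            del threads[chosen]
            del must_wait_for[chosen]
            if not threads:
                return step
            continue
        if action == "yield":
            # after yielding, let every other live thread run first
            must_wait_for[chosen] = set(threads) - {chosen}
    return None

print(run(fair=False), run(fair=True))
```

Without the fairness rule the adversarial scheduler spins until the step budget is exhausted; with it, the same adversary is forced to let done = 1 run and the test terminates.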
Fair termination allows CHESS to check for arbitrary liveness properties
Example: Good Samaritan assumption: Forall threads t : GF scheduled(t) ⇒ GF yield(t)
A thread that is scheduled infinitely often yields the processor infinitely often.
Examples of yield: Sleep(), ScheduleThread(), asm {rep nop;}, thread completion.
Example program: while( ! done) { Sleep(); } running against done = 1;
Outline Preemption bounding Makes CHESS effective on deep state spaces Fair stateless model checking Makes CHESS effective on cyclic state spaces Enables CHESS to find liveness violations (livelocks) Sober Detect relaxed-memory model errors Do not miss behaviors only possible in a relaxed memory model FeatherLite Concurrency Explorer
C# Example

volatile bool isIdling;
volatile bool hasWork;

// Consumer thread
void BlockOnIdle() {
    lock (condVariable) {
        isIdling = true;
        if (!hasWork)
            Monitor.Wait(condVariable);
        isIdling = false;
    }
}

// Producer thread
void NotifyPotentialWork() {
    hasWork = true;
    if (isIdling)
        lock (condVariable) {
            Monitor.Pulse(condVariable);
        }
}
Example: Store Buffer Vulnerability
Key pieces of code on previous slide (ii and hw abbreviate isIdling and hasWork): volatile int ii = 0; volatile int hw = 0;
On x86, hardware may perform a store late.
Bug: the Producer thread does not notice the waiting Consumer and does not send the signal.
[Diagram: Consumer and Producer threads; operations shown: Store ii, 1; Load hw, 0; Load ii, 1; Store hw, 1.]
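The effect of a late store can be reproduced with a small exhaustive simulation. The sketch below assumes a simplified store-buffer model (one FIFO buffer per thread, a load checks the thread's own buffer first, buffered stores drain to memory at arbitrary times) and reduces each thread to its store followed by its load. The model and all names are assumptions for illustration, not a description of Sober or of actual x86 hardware.

```python
def outcomes(store_buffers):
    """Return every (consumer's read of hw, producer's read of ii) pair."""
    programs = {
        "consumer": [("store", "ii", 1), ("load", "hw")],
        "producer": [("store", "hw", 1), ("load", "ii")],
    }
    results = set()

    def step(pc, memory, buffers, reads):
        progressed = False
        for t, prog in programs.items():
            # Option 1: thread t executes its next instruction.
            if pc[t] < len(prog):
                progressed = True
                op = prog[pc[t]]
                mem = dict(memory)
                bufs = {k: list(v) for k, v in buffers.items()}
                rds = dict(reads)
                if op[0] == "store":
                    if store_buffers:
                        bufs[t].append((op[1], op[2]))   # store is delayed
                    else:
                        mem[op[1]] = op[2]               # store is immediate
                else:
                    # a load sees the thread's own newest buffered store first
                    own = [v for (a, v) in bufs[t] if a == op[1]]
                    rds[t] = own[-1] if own else mem[op[1]]
                step({**pc, t: pc[t] + 1}, mem, bufs, rds)
            # Option 2: the oldest buffered store of t drains to memory.
            if buffers[t]:
                progressed = True
                mem = dict(memory)
                bufs = {k: list(v) for k, v in buffers.items()}
                addr, val = bufs[t].pop(0)
                mem[addr] = val
                step(pc, mem, bufs, reads)
        if not progressed:
            results.add((reads["consumer"], reads["producer"]))

    step({"consumer": 0, "producer": 0}, {"ii": 0, "hw": 0},
         {"consumer": [], "producer": []}, {})
    return results

print(sorted(outcomes(store_buffers=False)))
print(sorted(outcomes(store_buffers=True)))
```

The pair (0, 0) appears only with store buffers enabled: the Consumer sees no work and goes on to wait, while the Producer sees no idle Consumer and skips the signal.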
Sober algorithm
Programmers assume sequential consistency (SC) and insert synchronizations & fences to counter memory-model relaxations.
Sober checks if a program is memory-model safe, i.e., the program has only SC executions in a given memory model, and reports any violation as an error.
Sober is a dynamic monitor that checks if any SC execution can be extended to a non-SC execution.
Theorem: CHESS + Sober guarantees memory-model safety.
Outline Preemption bounding Makes CHESS effective on deep state spaces Fair stateless model checking Makes CHESS effective on cyclic state spaces Enables CHESS to find liveness violations (livelocks) Sober Detect relaxed-memory model errors Do not miss behaviors only possible in a relaxed memory model FeatherLite A light-weight data-race detection engine (<20% overhead) Concurrency Explorer
Outline Preemption bounding Makes CHESS effective on deep state spaces Fair stateless model checking Makes CHESS effective on cyclic state spaces Enables CHESS to find liveness violations (livelocks) Sober Detect relaxed-memory model errors Do not miss behaviors only possible in a relaxed memory model FeatherLite A light-weight data-race detection engine (<20% overhead) Concurrency Explorer First-class concurrency debugging
Conclusion Don’t stress, use CHESS CHESS binary and papers available at Stateless model checking is very effective Preemption bounding to scale to deep state spaces Fair demonic scheduler to handle nonterminating programs Need better testing and debugging methodologies for concurrent programs
Questions