1
CHESS: Find and Reproduce Heisenbugs in Concurrent Programs Tom Ball, Sebastian Burckhardt, Peli de Halleux, Madan Musuvathi, Shaz Qadeer Microsoft Research
2
The Heisenbug problem
Concurrent executions are highly nondeterministic
Rare thread interleavings result in Heisenbugs
  Difficult to find, reproduce, and debug
Observing the bug can “fix” it
  The likelihood of interleavings changes, say, when you add printfs
A huge productivity problem
  Developers and testers can spend weeks chasing a single Heisenbug
3
CHESS in a nutshell
CHESS is a user-mode scheduler
  Controls all scheduling nondeterminism
  Replaces the OS scheduler
Guarantees:
  Every program run takes a different thread interleaving
  The interleaving of every run can be reproduced
4
CHESS Demo Find a simple Heisenbug
5
CHESS architecture
[Diagram components: Unmanaged Program, Win32 Wrappers, Windows; Managed Program, .NET Wrappers, CLR; CHESS Scheduler; CHESS Exploration Engine; Windows Kernel; Kernel Sync.]
Every run takes a different interleaving
Reproduce the interleaving for every run
6
High level goals
Scale to large programs
Any error found by CHESS is possible in the wild
  CHESS does not introduce any new behaviors
Any error found in the wild can be found by CHESS
  Need to capture all sources of nondeterminism
  Exhaustively explore the nondeterminism (state explosion), e.g. enumerate all thread interleavings
  Hard to achieve
  Practical goal: beat stress
7
Errors that CHESS can find Assertions in the code Any dynamic monitor that you run Memory leaks, double-free detector, … Deadlocks Program enters a state where no thread is enabled Livelocks Program runs for a long time without making progress
8
CHESS architecture
[Diagram components: Unmanaged Program, Win32 Wrappers, Windows; Managed Program, .NET Wrappers, CLR; CHESS Scheduler; CHESS Exploration Engine]
Capture scheduling nondeterminism
Drive the program along an interleaving of choice
9
Running Example
Thread 1:
    Lock(l); bal += x; Unlock(l);
Thread 2:
    Lock(l); t = bal; Unlock(l);
    Lock(l); bal = t - y; Unlock(l);
10
Introduce Schedule() points
Thread 1:
    Schedule(); Lock(l); bal += x; Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l); t = bal; Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);
Instrument calls to the CHESS scheduler
Each call is a potential preemption point
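A minimal Python sketch of why the running example is buggy, assuming each lock-protected block executes atomically and that scheduling decisions happen only between blocks. The starting balance (100) and the amounts x = 10 and y = 3 are illustrative values, not taken from the slides. The sketch enumerates every interleaving of the three blocks and reports the final balance for each.

```python
from itertools import permutations

def deposit(state, x=10):
    state["bal"] += x                 # Thread 1: Lock(l); bal += x; Unlock(l)

def read_balance(state):
    state["t"] = state["bal"]         # Thread 2: Lock(l); t = bal; Unlock(l)

def write_balance(state, y=3):
    state["bal"] = state["t"] - y     # Thread 2: Lock(l); bal = t - y; Unlock(l)

THREADS = {1: [deposit], 2: [read_balance, write_balance]}

def interleavings(threads):
    """Every order of thread ids that respects each thread's program order."""
    ids = [tid for tid, blocks in threads.items() for _ in blocks]
    return sorted(set(permutations(ids)))

def run(schedule, start=100):
    """Execute the blocks in the given order and return the final balance."""
    state = {"bal": start, "t": None}
    pc = {tid: 0 for tid in THREADS}
    for tid in schedule:
        THREADS[tid][pc[tid]](state)
        pc[tid] += 1
    return state["bal"]

results = {s: run(s) for s in interleavings(THREADS)}
for schedule, bal in results.items():
    print(schedule, bal)
```

Only the schedule in which Thread 1 runs between Thread 2's two blocks loses the deposit; the other two schedules end with the expected balance.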
11
First-cut solution: Random sleeps
Introduce random sleep at schedule points
Does not introduce new behaviors
  Sleep models a possible preemption at each location
  Sleeping for a finite amount guarantees starvation-freedom
Thread 1:
    Sleep(rand()); Lock(l); bal += x; Sleep(rand()); Unlock(l);
Thread 2:
    Sleep(rand()); Lock(l); t = bal; Sleep(rand()); Unlock(l);
    Sleep(rand()); Lock(l); bal = t - y; Sleep(rand()); Unlock(l);
12
Improvement 1: Capture the “happens-before” graph
Thread 1:
    Schedule(); Lock(l); bal += x; Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l); t = bal; Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);
Delays that result in the same “happens-before” graph are equivalent
Avoid exploring equivalent interleavings
[Animation: Thread 2's two critical sections shown again at shifted positions, labeled Sleep(5)]
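The equivalence can be illustrated with a small Python sketch. It makes simplifying assumptions that are not from the slides: each step touches exactly one named variable, and any two accesses to the same variable conflict. Under those assumptions, the happens-before graph of an interleaving is determined by program order plus the order of accesses to each variable, so that per-variable order serves as a signature. The two-thread program here (thread-local work on `a` or `b`, then a shared lock `l`) is hypothetical.

```python
from itertools import permutations

# Each thread is a list of steps; a step names the single variable it touches.
# "a" and "b" are thread-local; "l" is a lock shared by both threads.
PROGRAM = {1: ["a", "l"], 2: ["b", "l"]}

def interleavings(program):
    """Every order of thread ids that respects each thread's program order."""
    ids = [tid for tid, steps in program.items() for _ in steps]
    return sorted(set(permutations(ids)))

def happens_before_signature(schedule, program):
    """Order of accesses to each variable; program order is implied by step index."""
    pc = {tid: 0 for tid in program}
    per_var = {}
    for tid in schedule:
        var = program[tid][pc[tid]]
        per_var.setdefault(var, []).append((tid, pc[tid]))
        pc[tid] += 1
    return tuple(sorted((var, tuple(ops)) for var, ops in per_var.items()))

classes = {}
for s in interleavings(PROGRAM):
    classes.setdefault(happens_before_signature(s, PROGRAM), []).append(s)

print(len(interleavings(PROGRAM)), "interleavings,",
      len(classes), "happens-before classes")
```

Exploring one representative per class is enough here, because interleavings in the same class differ only in the order of independent steps.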
13
Improvement 2: Understand synchronization semantics
Thread 1:
    Schedule(); Lock(l); bal += x; Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l); t = bal; Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);
Avoid exploring delays that are impossible
Identify when threads can make progress
CHESS maintains a run queue and a wait queue
  Mimics OS scheduler state
[Animation: Thread 2's critical sections shown again at shifted positions]
14
Emulate execution on a uniprocessor
Thread 1:
    Schedule(); Lock(l); bal += x; Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l); t = bal; Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);
Enable only one thread at a time
Linearizes a partial order into a total order
Controls the order of data races
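As a rough illustration of running one thread at a time and replaying a schedule, the following Python sketch models each thread as a generator whose `yield` stands in for a `Schedule()` call. This is an analogy only, not the way CHESS instruments real threads. The helper names (`run`, `replay`) and the concrete amounts are assumptions made for the example.

```python
import random

def thread1(state):
    yield                              # Schedule() before the critical section
    state["bal"] += 10                 # bal += x

def thread2(state):
    yield
    state["t"] = state["bal"]          # t = bal
    yield
    state["bal"] = state["t"] - 3      # bal = t - y

def run(choose, start=100):
    """Run one task at a time; `choose` picks the next task among the live ones."""
    state = {"bal": start}
    tasks = {1: thread1(state), 2: thread2(state)}
    for task in tasks.values():
        next(task)                     # advance each task to its first schedule point
    trace = []
    while tasks:
        tid = choose(sorted(tasks))
        trace.append(tid)
        try:
            next(tasks[tid])           # execute up to the next schedule point
        except StopIteration:
            del tasks[tid]             # task finished
    return trace, state["bal"]

def replay(trace):
    """A chooser that repeats a previously recorded trace."""
    it = iter(trace)
    return lambda live: next(it)

rng = random.Random(0)
trace, bal = run(rng.choice)                       # one arbitrary interleaving
replayed_trace, replayed_bal = run(replay(trace))  # the same interleaving again
```

Because every scheduling decision goes through `choose`, recording the decisions is sufficient to reproduce a run, which is the property the slides attribute to the CHESS scheduler.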
15
CHESS modes: speed vs coverage
Fast-mode
  Introduce schedule points before synchronizations, volatile accesses, and interlocked operations
  Finds many bugs in practice
Data-race mode
  Introduce schedule points before memory accesses
  Finds race conditions due to data races
  Captures all sequentially consistent (SC) executions
16
Capture all sources of nondeterminism? No.
Scheduling nondeterminism? Yes
Timing nondeterminism? Yes
  Controls when and in what order the timers fire
Nondeterministic system calls? Mostly
  CHESS uses precise abstractions for many system calls
Input nondeterminism? No
  Rely on users to provide inputs
    Program inputs, return values of system calls, files read, packets received, …
  Good tradeoff in the short term
    But can’t find race conditions in error-handling code
  Future extensions using symbolic execution? (DART, jCUTE, SAGE, PEX)
17
Capture all sources of nondeterminism? No.
Hardware relaxations? Yes
  Hardware can reorder instructions
  Non-SC executions possible in programs with data races
  Sober [CAV ‘08] can detect and explore such non-SC executions
Compiler relaxations? No
  Non-SC executions possible in Java programs with data races
  For C# programs, I don’t know
  Extending Sober to handle this is an ongoing effort
18
CHESS architecture
[Diagram components: Unmanaged Program, Win32 Wrappers, Windows; Managed Program, .NET Wrappers, CLR; CHESS Scheduler; CHESS Exploration Engine]
19
CHESS wrappers
Translate Win32/.NET synchronizations into CHESS scheduler abstractions
  Tasks: schedulable entities
    Threads, threadpool work items, async. callbacks, timer functions
  SyncVars: resources used by tasks
Generate happens-before edges during execution
Executable specification for complex APIs
  Most time-consuming and error-prone part of CHESS
  Enables CHESS to handle multiple platforms
20
CHESS Wrapper example
Asynchronous ReadFile
  1. Fork a child task
  2. Child synchronously waits for the read to complete
  3. And then queues an APC to the parent
CHESS scheduler interleaves the parent task and the child task
  Handles the synchronous case when the child finishes before the parent returns from ReadFile
21
CHESS architecture
[Diagram components: Unmanaged Program, Win32 Wrappers, Windows; Managed Program, .NET Wrappers, CLR; CHESS Scheduler; CHESS Exploration Engine]
22
State space explosion
Thread 1 … Thread n, each executing:
    x = 1; … y = k;
n threads, k steps each
Number of executions = $O(n^{nk})$
Exponential in both n and k
  Typically: n < 10, k > 100
Limits scalability to large programs
Goal: Scale CHESS to large programs (large k)
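To get a feel for the growth, the number of interleavings of n straight-line threads with k steps each can be computed exactly as the multinomial coefficient $(nk)! / (k!)^n$. This is a standard combinatorial count rather than a formula from the slides, and it stays below the $n^{nk}$ bound quoted above. A short Python sketch:

```python
from math import factorial

def interleavings(n, k):
    """Exact number of interleavings of n threads with k atomic steps each."""
    return factorial(n * k) // factorial(k) ** n

def upper_bound(n, k):
    """The n^(nk) bound: each of the nk steps picks one of n threads."""
    return n ** (n * k)

for k in (1, 2, 5, 10):
    print(f"n=2 k={k}: exact={interleavings(2, k)} bound={upper_bound(2, k)}")
```

Even with only two threads, ten steps each already give 184756 interleavings, which is why exhaustive enumeration does not scale to large k.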
23
Preemption bounding
CHESS, by default, is a non-preemptive, starvation-free scheduler
  Execute huge chunks of code atomically
Systematically insert a small number of preemptions
  Preemptions are context switches forced by the scheduler, e.g. time-slice expiration
  Non-preemptions: a thread voluntarily yields, e.g. blocking on an unavailable lock, thread end
Thread 1:
    x = 1; if (p != 0) { x = p->f; }
Thread 2:
    p = 0;
Example interleaving:
    Thread 1: x = 1; if (p != 0) {
        (preemption)
    Thread 2: p = 0;
        (non-preemption)
    Thread 1: x = p->f; }
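The example on this slide can be explored with a small preemption-bounded search. The Python sketch below is an illustration under stated assumptions, not CHESS's implementation: each statement is one atomic step, a context switch counts as a preemption only when the thread being switched away from could still run, and the names (`explore`, `taken`, the field value 42) are hypothetical.

```python
def t1_steps():
    # Thread 1: x = 1; if (p != 0) { x = p->f; }
    def assign(s):
        s["x"] = 1
    def check(s):
        s["taken"] = s["p"] is not None
    def deref(s):
        if s["taken"]:
            if s["p"] is None:
                raise RuntimeError("null dereference")
            s["x"] = s["p"]["f"]
    return [assign, check, deref]

def t2_steps():
    # Thread 2: p = 0;
    def clear(s):
        s["p"] = None
    return [clear]

def explore(bound):
    """Return the schedules that crash using at most `bound` preemptions."""
    programs = {1: t1_steps(), 2: t2_steps()}
    failures = []

    def dfs(schedule, pcs, state, current, used):
        enabled = [t for t in programs if pcs[t] < len(programs[t])]
        for t in enabled:
            # Switching away from a thread that could still run is a preemption.
            cost = 1 if current in enabled and t != current else 0
            if used + cost > bound:
                continue
            new_state = dict(state)
            try:
                programs[t][pcs[t]](new_state)
            except RuntimeError:
                failures.append(schedule + (t,))
                continue
            new_pcs = dict(pcs)
            new_pcs[t] += 1
            dfs(schedule + (t,), new_pcs, new_state, t, used + cost)

    dfs((), {1: 0, 2: 0}, {"p": {"f": 42}, "x": 0, "taken": False}, None, 0)
    return failures

print(explore(0))   # no preemptions: the bug is not reachable
print(explore(1))   # one preemption between the check and the dereference
```

With a bound of zero the search finds nothing; with a bound of one it finds the single failing schedule, in which Thread 2 runs between Thread 1's check and its dereference.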
24
Polynomial state space
Terminating program with fixed inputs and deterministic threads
n threads, k steps each, c preemptions
Number of executions $\le \binom{nk}{c} \cdot (n+c)! = O\big((n^2 k)^c \cdot n!\big)$
Exponential in n and c, but not in k
  Choose c preemption points
  Permute n+c atomic blocks
[Diagram: n threads, each executing x = 1; … y = k; split into atomic blocks at the chosen preemption points]
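The bound is easy to evaluate numerically. The Python sketch below computes $\binom{nk}{c} \cdot (n+c)!$ for two threads and two preemptions, and compares it with the exact unbounded count for two threads, $\binom{2k}{k}$ (a standard combinatorial fact, not a figure from the slides). The function name is an assumption for the example.

```python
from math import comb, factorial

def preemption_bound(n, k, c):
    """Upper bound on executions with at most c preemptions:
    choose c preemption points among nk steps, then permute n + c atomic blocks."""
    return comb(n * k, c) * factorial(n + c)

for k in (10, 100, 1000):
    bounded = preemption_bound(2, k, 2)
    unbounded = comb(2 * k, k)          # all interleavings of two k-step threads
    print(f"k={k}: bounded <= {bounded}, unbounded has {len(str(unbounded))} digits")
```

Going from k = 10 to k = 100 multiplies the bound by roughly 100, consistent with polynomial growth of degree c = 2 in k, while the unbounded count grows exponentially.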
25
Advantages of preemption bounding Most errors are caused by few (<2) preemptions Generates an easy to understand error trace Preemption points almost always point to the root-cause of the bug Leads to good heuristics Insert more preemptions in code that needs to be tested Avoid preemptions in libraries Insert preemptions in recently modified code A good coverage guarantee to the user When CHESS finishes exploration with 2 preemptions, any remaining bug requires 3 preemptions or more
26
Does CHESS scale?
The scheduler definitely does
  Can attach to programs like Singularity, IE, Windows Graphics framework
  Found and reproduced (unknown) bugs in all of them
The exploration engine? Yes.
  Preemption bounding with heuristics does a good job
  CHESS has reproduced every Heisenbug reported to us so far
    This can also be because of “low-hanging fruit”
  Better heuristics, reduction strategies, and massive parallelization will help
27
CHESS Demo Find and reproduce a Heisenbug in CCR CCR = Concurrency Coordination Runtime
28
CCR is widely used
  web request load balancing & IO handling
  real time inversion of seismic data to control their drilling
  security systems
  package sorting system
  out-of-stock shelf inspection system
  law enforcement intercept
  Supply Chain Modeling
  Trading systems
29
A stress test fails…
30
Bugs found and reproduced with CHESS
Program           Lines of code   Bugs
Work Stealing Q   4K              4
CDS               6K              1
CCR               9K              3
ConcRT            16K             4
Dryad             18K             7
APE               19K             4
STM               20K             2
TPL               24K             9
PLINQ             24K             1
Singularity       175K            2
31
Bugs found and reproduced with CHESS (table repeated from the previous slide)
Acknowledgement: testers from PCP team
32
Current status CHESS will be shipped as an add-on to Visual Studio http://msdn.microsoft.com/devlabs Command line version with academic license at http://research.microsoft.com/CHESS
33
Conclusions Don’t stress, use CHESS Systematic exploration of scheduling nondeterminism can be more effective than stress Biggest bottlenecks: Supporting CHESS for new platforms/APIs APIs should specify the nondeterminism exposed Writing test harness and generating inputs Currently done manually
34
Questions
35
Outline Preemption bounding Makes CHESS effective on deep state spaces Fair stateless model checking Sober FeatherLite Concurrency Explorer
36
Outline Preemption bounding Makes CHESS effective on deep state spaces Fair stateless model checking Makes CHESS effective on cyclic state spaces Enables CHESS to find liveness violations (livelocks) Sober FeatherLite Concurrency Explorer
37
Concurrent programs have cyclic state spaces
  Spinlocks
  Non-blocking algorithms
  Implementations of synchronization primitives
  Periodic timers
  …
Thread 1:
    L1: while( ! done) {
    L2:     Sleep();
        }
Thread 2:
    M1: done = 1;
[State diagram: states (! done, L1), (! done, L2), (done, L1), (done, L2)]
38
A demonic scheduler unrolls any cycle ad infinitum
Thread 1:
    while( ! done) { Sleep(); }
Thread 2:
    done = 1;
[Diagram labels: ! done, done (repeated along the unrolled cycle)]
39
Depth bounding
Prune executions beyond a bounded number of steps
[Diagram labels: ! done, done (repeated along the unrolled cycle), Depth bound]
40
Problem 1: Ineffective state coverage
Bound has to be large enough to reach the deepest bug
  Typically, greater than 100 synchronization operations
Every unrolling of a cycle redundantly explores reachable state space
[Diagram labels: ! done, Depth bound]
41
Problem 2: Cannot find livelocks
Livelocks: lack of progress in a program
Thread 1:
    temp = done;
    while( ! temp) { Sleep(); }
Thread 2:
    done = 1;
42
Key idea
This test terminates only when the scheduler is fair
Fairness is assumed by programmers
All cycles in correct programs are unfair
A fair cycle is a livelock
Thread 1:
    while( ! done) { Sleep(); }
Thread 2:
    done = 1;
[State diagram labels: ! done, done]
43
We need a fair demonic scheduler
Avoid unrolling unfair cycles
  Effective state coverage
Detect fair cycles
  Find livelocks
[Diagram components: Test Harness, Concurrent Program, Win32 API, Demonic Scheduler, Fair Demonic Scheduler]
44
What notion of “fairness” do we use?
45
Weak fairness
$\forall t :: \mathbf{GF}\,(\mathit{enabled}(t) \Rightarrow \mathit{scheduled}(t))$
A thread that remains enabled should eventually be scheduled
A weakly-fair scheduler will eventually schedule Thread 2
  Example: round-robin
Thread 1:
    while( ! done) { Sleep(); }
Thread 2:
    done = 1;
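A small Python simulation of this test, offered as a sketch: the `run` function and its `pick` callback are hypothetical, and each loop iteration of Thread 1 counts as one step. A scheduler that always picks Thread 1 never lets the test finish, while round-robin (a weakly fair scheduler) finishes after three steps.

```python
def run(pick, max_steps=1000):
    """Simulate the two-thread test; pick(step) returns the thread to run next.
    Returns the number of steps until Thread 1 exits, or None if the bound is hit."""
    done = False
    thread2_finished = False
    for step in range(max_steps):
        tid = pick(step)
        if tid == 1:
            if done:
                return step + 1        # Thread 1 sees done and leaves its loop
            # otherwise Thread 1 calls Sleep() and goes around the loop again
        elif not thread2_finished:
            done = True                # Thread 2: done = 1
            thread2_finished = True
        # picking Thread 2 after it has finished is treated as a no-op
    return None

unfair = run(lambda step: 1)                 # always schedule Thread 1
round_robin = run(lambda step: 1 + step % 2) # alternate Thread 1, Thread 2
print(unfair, round_robin)
```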
46
Weak fairness does not suffice
Thread 1:
    Lock( l );
    while( ! done) {
        Unlock( l );
        Sleep();
        Lock( l );
    }
    Unlock( l );
Thread 2:
    Lock( l );
    done = 1;
    Unlock( l );
Cycle in which only T1 is ever scheduled (next operation of each thread shown):
    en = {T1, T2}   T1: Sleep()       T2: Lock( l )
    en = {T1, T2}   T1: Lock( l )     T2: Lock( l )
    en = { T1 }     T1: Unlock( l )   T2: Lock( l )
    en = {T1, T2}   T1: Sleep()       T2: Lock( l )
47
Strong Fairness
$\forall t :: \mathbf{GF}\,\mathit{enabled}(t) \Rightarrow \mathbf{GF}\,\mathit{scheduled}(t)$
A thread that is enabled infinitely often is scheduled infinitely often
Thread 2 is enabled and competes for the lock infinitely often
Thread 1:
    Lock( l );
    while( ! done) {
        Unlock( l );
        Sleep();
        Lock( l );
    }
    Unlock( l );
Thread 2:
    Lock( l );
    done = 1;
    Unlock( l );
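The difference between the two notions can be checked mechanically on the lock example. The Python sketch below treats an infinite execution as a finite cycle repeated forever, with one entry per state giving the enabled set and the thread scheduled there. That encoding and the function names are assumptions for illustration. On such a cycle, weak fairness fails for a thread only if it is enabled in every state and never scheduled, and strong fairness fails if it is enabled in at least one state and never scheduled.

```python
# One entry per state of the cycle: (enabled threads, thread scheduled there).
CYCLE = [
    ({"T1", "T2"}, "T1"),   # T1 runs Sleep()
    ({"T1", "T2"}, "T1"),   # T1 runs Lock(l)
    ({"T1"},       "T1"),   # T1 runs Unlock(l); T2 is blocked on l
]
THREADS = {"T1", "T2"}

def weakly_fair(cycle, threads):
    """GF(enabled => scheduled) holds for every thread on the repeated cycle."""
    for t in threads:
        always_enabled = all(t in enabled for enabled, _ in cycle)
        ever_scheduled = any(s == t for _, s in cycle)
        if always_enabled and not ever_scheduled:
            return False
    return True

def strongly_fair(cycle, threads):
    """(GF enabled) => (GF scheduled) holds for every thread on the repeated cycle."""
    for t in threads:
        ever_enabled = any(t in enabled for enabled, _ in cycle)
        ever_scheduled = any(s == t for _, s in cycle)
        if ever_enabled and not ever_scheduled:
            return False
    return True

print(weakly_fair(CYCLE, THREADS), strongly_fair(CYCLE, THREADS))
```

The cycle passes the weak-fairness check because T2 is disabled whenever T1 holds the lock, yet it fails the strong-fairness check, so only a strongly fair scheduler is forced to break out of it.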
48
Good Samaritan violation
Threads should yield the processor when not making progress
$\forall t :: \mathbf{GF}\,\mathit{scheduled}(t) \Rightarrow \mathbf{GF}\,\mathit{yield}(t)$
Found many such violations, including one in the Singularity boot process
  Results in “sluggish I/O” behavior during bootup
Thread 1:
    while( ! done) { ; }
Thread 2:
    done = 1;
49
Results: Achieves more coverage faster
Work stealing queue with one stealer
                      With fairness   Without fairness, with depth bound
                                      20      30      40      50      60
States explored       1726            871     1505    1726    1307    683
Percentage coverage   100%            50%     87%     100%    76%     40%
Time (secs)           143977632531>5000 [values run together in the source; column boundaries not recoverable]
50
Finding livelocks and finding (not missing) safety violations
Program           Lines of code   Safety bugs   Livelocks
Work Stealing Q   4K              4
CDS               6K              1
CCR               9K              1             2
ConcRT            16K             2             2
Dryad             18K             7
APE               19K             4
STM               20K             2
TPL               24K             4             5
PLINQ             24K             1
Singularity       175K                          2
Total                             26            11
Acknowledgement: testers from PCP team
51
Outline Preemption bounding Makes CHESS effective on deep state spaces Fair stateless model checking Makes CHESS effective on cyclic state spaces Enables CHESS to find liveness violations (livelocks) Sober Detect relaxed-memory model errors Do not miss behaviors only possible in a relaxed memory model FeatherLite Concurrency Explorer
52
Single slide on Sober
Relaxed memory verification problem
  Is P correct on a relaxed memory model?
Sober: split the problem into two parts
  Is P correct on a sequentially consistent (SC) machine?
  Is P sequentially consistent on a relaxed memory model?
    Check this while only exploring SC executions
CAV ‘08 solves the problem for a memory model with store buffers (TSO)
EC2 ‘08 extends this approach to a general class of memory models
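To illustrate what "not sequentially consistent on a store-buffer model" means, the Python sketch below runs the classic store-buffering litmus test (Thread 0: x = 1; r1 = y and Thread 1: y = 1; r2 = x). This is a toy enumeration written for this example, not Sober's algorithm, which per the slide avoids exploring non-SC executions. The event encoding and the `outcomes` helper are assumptions.

```python
from itertools import permutations

def outcomes(thread_events):
    """Run every interleaving of the per-thread event lists; collect (r1, r2)."""
    results = set()
    ids = [tid for tid, events in enumerate(thread_events) for _ in events]
    for schedule in set(permutations(ids)):
        mem = {"x": 0, "y": 0}
        buf = [{}, {}]                     # one store buffer per thread
        regs = [None, None]
        pc = [0, 0]
        for tid in schedule:
            kind, var = thread_events[tid][pc[tid]]
            pc[tid] += 1
            if kind == "store":
                buf[tid][var] = 1          # buffered; invisible to the other thread
            elif kind == "flush":
                mem.update(buf[tid])       # buffered stores reach shared memory
                buf[tid].clear()
            else:                          # load: own buffer first, then memory
                regs[tid] = buf[tid].get(var, mem[var])
        results.add(tuple(regs))
    return results

def variants(store_var, load_var):
    """Flush right after the store (SC-like) or delayed past the load (TSO)."""
    eager = [("store", store_var), ("flush", None), ("load", load_var)]
    delayed = [("store", store_var), ("load", load_var), ("flush", None)]
    return {"eager": eager, "delayed": delayed}

T0 = variants("x", "y")    # x = 1; r1 = y
T1 = variants("y", "x")    # y = 1; r2 = x

sc_outcomes = outcomes([T0["eager"], T1["eager"]])
tso_outcomes = set().union(*(outcomes([a, b])
                             for a in T0.values() for b in T1.values()))
print(sorted(sc_outcomes))
print(sorted(tso_outcomes - sc_outcomes))
```

The outcome r1 = r2 = 0 never appears when stores are flushed immediately, but it does appear once a flush may be delayed past the following load, so this program is not sequentially consistent on a store-buffer model.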
53
Outline Preemption bounding Makes CHESS effective on deep state spaces Fair stateless model checking Makes CHESS effective on cyclic state spaces Enables CHESS to find liveness violations (livelocks) Sober Detect relaxed-memory model errors Do not miss behaviors only possible in a relaxed memory model FeatherLite A light-weight data-race detection engine (<20% overhead) Concurrency Explorer
54
Single slide on FeatherLite
Current data-race detection tools are slow
  Process every memory access done by the program
  One in 5 instructions access memory: 1 billion accesses/sec
Key idea: Do smart adaptive sampling of memory accesses
  Naïve sampling does not work, need to sample both racing instructions
Cold-path hypothesis: At least one of the racing instructions occurs in a cold path
  Races between fast-paths are most probably benign
FeatherLite adaptively samples cold-paths at 100% rate and hot-paths at 0.1% rate
Finds 70% of the data-races with <20% runtime overhead
  Existing data-race detection tools: >10X overhead
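The slide does not say how FeatherLite decides that a path has become hot, so the following Python sketch is a hypothetical policy chosen only to show the shape of adaptive sampling. It samples every access on a code path for its first `cold_limit` executions, then every `hot_period`-th execution, which gives 0.1% when `hot_period` is 1000. The class name, both parameters, and the deterministic period are assumptions.

```python
from collections import Counter

class AdaptiveSampler:
    """Sample cold paths at 100% and hot paths at 1 / hot_period (hypothetical policy)."""

    def __init__(self, cold_limit=10, hot_period=1000):
        self.cold_limit = cold_limit    # executions before a path counts as hot
        self.hot_period = hot_period    # 1000 gives a 0.1% sampling rate
        self.counts = Counter()

    def should_sample(self, path_id):
        self.counts[path_id] += 1
        n = self.counts[path_id]
        if n <= self.cold_limit:
            return True                 # still cold: sample everything
        return (n - self.cold_limit) % self.hot_period == 0

sampler = AdaptiveSampler()
hot_samples = sum(sampler.should_sample("hot_loop") for _ in range(100_010))
cold_samples = sum(sampler.should_sample("error_path") for _ in range(5))
print(hot_samples, cold_samples)
```

The rarely executed path is sampled on every execution, while the hot loop contributes only a small fraction of its accesses, which is the effect the cold-path hypothesis relies on.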
55
Outline Preemption bounding Makes CHESS effective on deep state spaces Fair stateless model checking Makes CHESS effective on cyclic state spaces Enables CHESS to find liveness violations (livelocks) Sober Detect relaxed-memory model errors Do not miss behaviors only possible in a relaxed memory model FeatherLite A light-weight data-race detection engine (<20% overhead) Concurrency Explorer First-class concurrency debugging
56
Concurrency explorer Single-step over a thread interleaving Inspect program states at each step Program state = Stack of all threads + globals Limited bi-directional debugging Interleaving slices for better understanding Working on: Closer integration with the Visual Studio debugger Explore neighborhood interleavings
57
Conclusion
Don’t stress, use CHESS
CHESS binary and papers available at http://research.microsoft.com/CHESS
58
Points to get across Capturing non-determinism Sync-orders, data-races, hardware interleavings Adding elastic delay Soundness & completeness Scoping Preemptions
59
Questions Did you find new bugs How is this different from your previous papers How is this different from previous mc efforts How is this different from