CHESS: Find and Reproduce Heisenbugs in Concurrent Programs
Tom Ball, Sebastian Burckhardt, Peli de Halleux, Madan Musuvathi, Shaz Qadeer
Microsoft Research

The Heisenbug problem
  Concurrent executions are highly nondeterministic
  Rare thread interleavings result in Heisenbugs
    Difficult to find, reproduce, and debug
  Observing the bug can “fix” it
    The likelihood of interleavings changes, say, when you add printfs
  A huge productivity problem
    Developers and testers can spend weeks chasing a single Heisenbug

CHESS in a nutshell
  CHESS is a user-mode scheduler
  Controls all scheduling nondeterminism
    Replaces the OS scheduler
  Guarantees:
    Every program run takes a different thread interleaving
    The interleaving of every run can be reproduced

CHESS Demo Find a simple Heisenbug

CHESS architecture
  [Diagram components: Unmanaged Program, Managed Program, CLR, Win32 Wrappers, .NET Wrappers, CHESS Scheduler, CHESS Exploration Engine, Windows, Windows Kernel, Kernel Sync.]
  Every run takes a different interleaving
  Reproduce the interleaving for every run

High level goals
  Scale to large programs
  Any error found by CHESS is possible in the wild
    CHESS does not introduce any new behaviors
  Any error found in the wild can be found by CHESS
    Need to capture all sources of nondeterminism
    Exhaustively explore the nondeterminism (state explosion), e.g. enumerate all thread interleavings
    Hard to achieve
  Practical goal: beat stress

Errors that CHESS can find
  Assertions in the code
  Any dynamic monitor that you run
    Memory leaks, double-free detector, …
  Deadlocks
    Program enters a state where no thread is enabled
  Livelocks
    Program runs for a long time without making progress

CHESS architecture
  [Diagram components: Unmanaged Program, Managed Program, CLR, Win32 Wrappers, .NET Wrappers, CHESS Scheduler, CHESS Exploration Engine, Windows]
  Capture scheduling nondeterminism
  Drive the program along an interleaving of choice

Running Example
  Thread 1:
    Lock(l); bal += x; Unlock(l);
  Thread 2:
    Lock(l); t = bal; Unlock(l);
    Lock(l); bal = t - y; Unlock(l);
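The running example is small enough to enumerate by hand. The sketch below is an illustration in Python, not CHESS code: it treats each lock-protected region as one atomic step and runs all three orderings of the two threads. The starting balance and the values of x and y are assumptions chosen for the example.

```python
def run(schedule, bal=100, x=10, y=30):
    """Run the lock-protected steps in the given thread order and return bal."""
    state = {"bal": bal, "t": None}
    steps = {
        # Thread 1: Lock(l); bal += x; Unlock(l);
        1: iter([lambda s: s.update(bal=s["bal"] + x)]),
        # Thread 2: Lock(l); t = bal; Unlock(l);  then  Lock(l); bal = t - y; Unlock(l);
        2: iter([lambda s: s.update(t=s["bal"]),
                 lambda s: s.update(bal=s["t"] - y)]),
    }
    for tid in schedule:
        next(steps[tid])(state)
    return state["bal"]


def interleavings():
    """Thread 1 has one step, Thread 2 has two: three possible orders."""
    for pos in range(3):
        order = [2, 2]
        order.insert(pos, 1)
        yield tuple(order)


results = {order: run(order) for order in interleavings()}
for order, final in results.items():
    print(order, final)
```

Only the order in which Thread 1 runs between Thread 2's two critical sections loses the deposit, which is why the bug is rare under ordinary scheduling.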

Introduce Schedule() points
  Thread 1:
    Schedule(); Lock(l); bal += x;
    Schedule(); Unlock(l);
  Thread 2:
    Schedule(); Lock(l); t = bal;
    Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y;
    Schedule(); Unlock(l);
  Instrument calls to the CHESS scheduler
  Each call is a potential preemption point

First-cut solution: Random sleeps
  Introduce random sleep at schedule points
  Does not introduce new behaviors
    Sleep models a possible preemption at each location
    Sleeping for a finite amount guarantees starvation-freedom
  Thread 1:
    Sleep(rand()); Lock(l); bal += x;
    Sleep(rand()); Unlock(l);
  Thread 2:
    Sleep(rand()); Lock(l); t = bal;
    Sleep(rand()); Unlock(l);
    Sleep(rand()); Lock(l); bal = t - y;
    Sleep(rand()); Unlock(l);

Improvement 1: Capture the “happens-before” graph
  Thread 1:
    Schedule(); Lock(l); bal += x;
    Schedule(); Unlock(l);
  Thread 2:
    Schedule(); Lock(l); t = bal;
    Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y;
    Schedule(); Unlock(l);
  Delays that result in the same “happens-before” graph are equivalent
  Avoid exploring equivalent interleavings
  [Slide animation: Thread 2's two critical sections shown shifted by a delay, labeled Sleep(5)]
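One way to picture the equivalence is to reduce each interleaving to a signature and keep only one interleaving per signature. The sketch below is a simplified stand-in rather than the CHESS implementation: it assumes every operation touches exactly one synchronization variable, and it uses the per-variable order of accesses (together with program order) as the happens-before signature. The thread and variable names are hypothetical.

```python
from itertools import permutations


def interleavings(threads):
    """All orderings of thread ids; each thread's own steps stay in program order."""
    ids = [tid for tid, ops in threads.items() for _ in ops]
    return sorted(set(permutations(ids)))


def signature(threads, schedule):
    """Per-variable order of accesses: a simplified happens-before signature."""
    pc = {tid: 0 for tid in threads}
    order = {}
    for tid in schedule:
        var = threads[tid][pc[tid]]
        order.setdefault(var, []).append((tid, pc[tid]))
        pc[tid] += 1
    return tuple(sorted((var, tuple(acc)) for var, acc in order.items()))


# Hypothetical program: T1 touches l then m, T2 touches only l.
threads = {"T1": ["l", "m"], "T2": ["l"]}
schedules = interleavings(threads)
classes = {signature(threads, s) for s in schedules}
print(len(schedules), "interleavings,", len(classes), "happens-before classes")
```

Moving T2's access to l past T1's access to m changes nothing observable, so two of the three interleavings collapse into one class.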

Improvement 2: Understand synchronization semantics
  Thread 1:
    Schedule(); Lock(l); bal += x;
    Schedule(); Unlock(l);
  Thread 2:
    Schedule(); Lock(l); t = bal;
    Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y;
    Schedule(); Unlock(l);
  Avoid exploring delays that are impossible
  Identify when threads can make progress
  CHESS maintains a run queue and a wait queue
    Mimics OS scheduler state
  [Slide animation: Thread 2's statements shown repositioned relative to Thread 1's Unlock(l)]

Emulate execution on a uniprocessor
  Thread 1:
    Schedule(); Lock(l); bal += x;
    Schedule(); Unlock(l);
  Thread 2:
    Schedule(); Lock(l); t = bal;
    Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y;
    Schedule(); Unlock(l);
  Enable only one thread at a time
  Linearizes a partial-order into a total-order
  Controls the order of data-races
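The ideas on the last few slides (schedule points before each synchronization, knowing which threads are enabled, and running one thread at a time) can be combined in a toy scheduler. The following is a minimal sketch under stated assumptions, not how CHESS is built: threads are Python generators that yield the synchronization operation they want next, `explore` and its `choices` list are hypothetical names, and the balance values are made up.

```python
def thread1(s):
    yield ("lock", "l")
    s["bal"] += s["x"]
    yield ("unlock", "l")


def thread2(s):
    yield ("lock", "l")
    s["t"] = s["bal"]
    yield ("unlock", "l")
    yield ("lock", "l")
    s["bal"] = s["t"] - s["y"]
    yield ("unlock", "l")


def explore(choices, bal=100, x=10, y=30):
    """Run one interleaving; choices[i] indexes into the enabled tasks at step i."""
    s = {"bal": bal, "x": x, "y": y, "t": None}
    gens = {1: thread1(s), 2: thread2(s)}
    pending = {tid: next(g) for tid, g in gens.items()}  # op each task wants next
    owner = {}                                           # lock name -> owning task
    trace = []
    step = 0
    while pending:
        # A task is enabled unless it is about to acquire a lock that is held.
        enabled = [t for t, (op, l) in sorted(pending.items())
                   if not (op == "lock" and l in owner)]
        if not enabled:
            raise RuntimeError("deadlock: no task is enabled")
        pick = choices[step] if step < len(choices) else 0
        tid = enabled[pick % len(enabled)]
        op, l = pending[tid]
        if op == "lock":
            owner[l] = tid
        else:
            del owner[l]
        trace.append(tid)
        try:
            pending[tid] = next(gens[tid])  # run only this task to its next sync op
        except StopIteration:
            del pending[tid]
        step += 1
    return s["bal"], trace


print(explore([]))          # default order: Thread 1 first
print(explore([1, 0, 0]))   # Thread 1 runs between Thread 2's critical sections
```

Because the scheduler makes every choice itself, passing the same `choices` list replays the same interleaving, which is the reproducibility guarantee the talk describes.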

CHESS modes: speed vs coverage
  Fast-mode
    Introduce schedule points before synchronizations, volatile accesses, and interlocked operations
    Finds many bugs in practice
  Data-race mode
    Introduce schedule points before memory accesses
    Finds race-conditions due to data races
    Captures all sequentially consistent (SC) executions

Capture all sources of nondeterminism? No.
  Scheduling nondeterminism? Yes
  Timing nondeterminism? Yes
    Controls when and in what order the timers fire
  Nondeterministic system calls? Mostly
    CHESS uses precise abstractions for many system calls
  Input nondeterminism? No
    Rely on users to provide inputs
    Program inputs, return values of system calls, files read, packets received, …
    Good tradeoff in the short term
    But can’t find race-conditions on error handling code
    Future extensions using symbolic execution? (DART, jCUTE, SAGE, PEX)

Capture all sources of nondeterminism? No.
  Hardware relaxations? Yes
    Hardware can reorder instructions
    Non-SC executions possible in programs with data races
    Sober [CAV ‘08] can detect and explore such non-SC executions
  Compiler relaxations? No
    Non-SC executions possible in Java programs with data races
    For C# programs, I don’t know
    Extending Sober to handle this is an ongoing effort

CHESS architecture
  [Diagram components: Unmanaged Program, Managed Program, CLR, Win32 Wrappers, .NET Wrappers, CHESS Scheduler, CHESS Exploration Engine, Windows]

CHESS wrappers
  Translate Win32/.NET synchronizations into CHESS scheduler abstractions
    Tasks: schedulable entities
      Threads, threadpool work items, async. callbacks, timer functions
    SyncVars: resources used by tasks
  Generate happens-before edges during execution
  Executable specification for complex APIs
    Most time consuming and error-prone part of CHESS
    Enables CHESS to handle multiple platforms

CHESS Wrapper example
  Asynchronous ReadFile
    1. Fork a child task
    2. Child synchronously waits for the read to complete
    3. And then queues an APC to the parent
  CHESS scheduler interleaves the parent task and the child task
    Handles the synchronous case when the child finishes before the parent returns from ReadFile

CHESS architecture
  [Diagram components: Unmanaged Program, Managed Program, CLR, Win32 Wrappers, .NET Wrappers, CHESS Scheduler, CHESS Exploration Engine, Windows]

State space explosion
  Thread 1 … Thread n, each executing: x = 1; … y = k;
  n threads, k steps each
  Number of executions = $O(n^{nk})$
  Exponential in both n and k
    Typically: n < 10, k > 100
  Limits scalability to large programs
  Goal: Scale CHESS to large programs (large k)
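A quick calculation shows how fast this grows. The sketch below counts the exact number of interleavings of n threads with k atomic steps each, which is the multinomial (nk)! / (k!)^n, and compares it with the looser n^(nk) bound (each of the nk steps picks one of n threads). The function names are illustrative.

```python
from math import factorial


def interleavings(n, k):
    """Exact number of interleavings of n threads with k atomic steps each."""
    return factorial(n * k) // factorial(k) ** n


def upper_bound(n, k):
    """Loose bound: each of the n*k steps is taken by one of n threads."""
    return n ** (n * k)


for n, k in [(2, 2), (2, 3), (3, 3), (2, 10)]:
    print(f"n={n} k={k} exact={interleavings(n, k)} bound={upper_bound(n, k)}")
```

Even two threads with ten steps each already have more than a hundred thousand interleavings, so exhaustive enumeration is out of reach for realistic k.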

Preemption bounding
  CHESS, by default, is a non-preemptive, starvation-free scheduler
    Execute huge chunks of code atomically
  Systematically insert a small number of preemptions
    Preemptions are context switches forced by the scheduler, e.g. time-slice expiration
    Non-preemptions: a thread voluntarily yields, e.g. blocking on an unavailable lock, thread end
  Example:
    One thread: x = 1; if (p != 0) { x = p->f; }
    Other thread: p = 0;
    [Diagram: x = 1; if (p != 0) { runs first; a preemption switches to p = 0;; a non-preemption switches back to x = p->f; }]
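The null-check example can be explored with a small bounded enumerator. This is a sketch under assumptions, not the CHESS search: thread A is split into three atomic steps (assign, check, dereference), thread B has one step, and a switch counts as a preemption only when the thread being left still has steps to run. All names are hypothetical.

```python
def schedules(steps, bound):
    """Yield schedules (tuples of thread ids) with at most `bound` preemptions."""
    def go(remaining, last, used, prefix):
        if not any(remaining.values()):
            yield tuple(prefix)
            return
        for tid in sorted(remaining):
            if remaining[tid] == 0:
                continue
            # Leaving a thread that could still run is a preemption.
            preempt = last is not None and tid != last and remaining[last] > 0
            if used + preempt > bound:
                continue
            remaining[tid] -= 1
            yield from go(remaining, tid, used + preempt, prefix + [tid])
            remaining[tid] += 1
    yield from go(dict(steps), None, 0, [])


def run(schedule):
    """A: x = 1; if (p != 0) { x = p->f; }    B: p = 0;"""
    p, x, checked, pc = {"f": 7}, None, False, 0
    for tid in schedule:
        if tid == "B":
            p = None
            continue
        if pc == 0:
            x = 1
        elif pc == 1:
            checked = p is not None
        elif pc == 2 and checked:
            if p is None:
                return "null dereference"
            x = p["f"]
        pc += 1
    return "ok"


steps = {"A": 3, "B": 1}
for bound in (0, 1):
    failing = [s for s in schedules(steps, bound) if run(s) != "ok"]
    print(bound, "preemptions:", failing)
```

With a bound of zero the bug is invisible; a single preemption between the check and the dereference exposes it, and the failing schedule points directly at the root cause.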

Polynomial state space
  Terminating program with fixed inputs and deterministic threads
  n threads, k steps each, c preemptions
  Number of executions $\le \binom{nk}{c} \cdot (n+c)! = O\big((n^2 k)^c \cdot n!\big)$
    Exponential in n and c, but not in k
  Choose c preemption points
  Permute n+c atomic blocks
  [Diagram: n threads, each executing x = 1; … y = k;, cut at the chosen preemption points into n+c atomic blocks]
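Plugging numbers into the bound makes the "not exponential in k" claim concrete. The sketch below evaluates C(nk, c) · (n+c)! for a fixed n and c while k grows; the function name and the chosen values are illustrative.

```python
from math import comb, factorial


def preemption_bound(n, k, c):
    """Choose c preemption points among n*k steps, then permute n + c atomic blocks."""
    return comb(n * k, c) * factorial(n + c)


n, c = 2, 2
growth = [preemption_bound(n, k, c) for k in (10, 100, 1000)]
for k, b in zip((10, 100, 1000), growth):
    print(f"k={k}: at most {b} executions")
```

Each tenfold increase in k multiplies the bound by roughly 10^c, which is polynomial growth in the program length.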

Advantages of preemption bounding
  Most errors are caused by few (<2) preemptions
  Generates an easy to understand error trace
    Preemption points almost always point to the root-cause of the bug
  Leads to good heuristics
    Insert more preemptions in code that needs to be tested
    Avoid preemptions in libraries
    Insert preemptions in recently modified code
  A good coverage guarantee to the user
    When CHESS finishes exploration with 2 preemptions, any remaining bug requires 3 preemptions or more

Does CHESS scale?
  The scheduler? Definitely
    Can attach to programs like Singularity, IE, Windows Graphics framework
    Found and reproduced (unknown) bugs in all of them
  The exploration engine? Yes
    Preemption bounding with heuristics does a good job
    CHESS has reproduced every Heisenbug reported to us so far
    This could also be because of “low hanging fruit”
    Better heuristics, reduction strategies, and massive parallelization will help

CHESS Demo
  Find and reproduce a Heisenbug in CCR
  CCR = Concurrency Coordination Runtime

CCR is widely used
  Web request load balancing & IO handling
  Real-time inversion of seismic data to control drilling
  Security systems
  Package sorting system
  Out-of-stock shelf inspection system
  Law enforcement intercept
  Supply chain modeling
  Trading systems

A stress test fails…

Bugs found and reproduced with CHESS

  Program           Lines of code   Bugs
  Work Stealing Q   4K              4
  CDS               6K              1
  CCR               9K              3
  ConcRT            16K             4
  Dryad             18K             7
  APE               19K             4
  STM               20K             2
  TPL               24K             9
  PLINQ             24K             1
  Singularity       175K            2

  Acknowledgement: testers from PCP team

Current status
  CHESS will be shipped as an add-on to Visual Studio
    http://msdn.microsoft.com/devlabs
  Command line version with academic license at
    http://research.microsoft.com/CHESS

Conclusions
  Don’t stress, use CHESS
  Systematic exploration of scheduling nondeterminism can be more effective than stress
  Biggest bottlenecks:
    Supporting CHESS for new platforms/APIs
      APIs should specify the nondeterminism exposed
    Writing test harness and generating inputs
      Currently done manually

Questions

Outline
  Preemption bounding
    Makes CHESS effective on deep state spaces
  Fair stateless model checking
  Sober
  FeatherLite
  Concurrency Explorer

Outline
  Preemption bounding
    Makes CHESS effective on deep state spaces
  Fair stateless model checking
    Makes CHESS effective on cyclic state spaces
    Enables CHESS to find liveness violations (livelocks)
  Sober
  FeatherLite
  Concurrency Explorer

Concurrent programs have cyclic state spaces
  Spinlocks
  Non-blocking algorithms
  Implementations of synchronization primitives
  Periodic timers
  …
  Thread 1:
    L1: while( ! done) {
    L2:   Sleep();
        }
  Thread 2:
    M1: done = 1;
  [State graph: states (! done, L1), (! done, L2), (done, L1), (done, L2)]

A demonic scheduler unrolls any cycle ad infinitum
  Thread 1: while( ! done) { Sleep(); }
  Thread 2: done = 1;
  [Diagram: execution tree in which every ! done state branches to a done state and to another ! done state, without end]

Depth bounding
  Prune executions beyond a bounded number of steps
  [Diagram: the same ! done / done execution tree, cut off at the depth bound]

Problem 1: Ineffective state coverage
  Bound has to be large enough to reach the deepest bug
    Typically, greater than 100 synchronization operations
  Every unrolling of a cycle redundantly explores reachable state space
  [Diagram: execution tree from ! done, cut off at the depth bound]

Problem 2: Cannot find livelocks
  Livelocks: lack of progress in a program
  Thread 1: temp = done; while( ! temp) { Sleep(); }
  Thread 2: done = 1;

Key idea
  This test terminates only when the scheduler is fair
  Fairness is assumed by programmers
  All cycles in correct programs are unfair
  A fair cycle is a livelock
  Thread 1: while( ! done) { Sleep(); }
  Thread 2: done = 1;
  [State graph: ! done, done]

We need a fair demonic scheduler
  Avoid unrolling unfair cycles
    Effective state coverage
  Detect fair cycles
    Find livelocks
  [Diagram components: Concurrent Program, Test Harness, Win32 API, Demonic Scheduler, Fair Demonic Scheduler]

What notion of “fairness” do we use?

Weak fairness
  Forall t :: GF ( enabled(t) ⇒ scheduled(t) )
  A thread that remains enabled should eventually be scheduled
  A weakly-fair scheduler will eventually schedule Thread 2
    Example: round-robin
  Thread 1: while( ! done) { Sleep(); }
  Thread 2: done = 1;
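The difference between an unfair and a weakly fair scheduler on this test can be seen in a few lines. The sketch below is illustrative only: threads are Python generators, each `next()` call is one scheduling step, and the `run` helper with its step limit is a hypothetical stand-in for "this run does not seem to terminate".

```python
def spinner(s):
    """Thread 1: while (!done) { Sleep(); }"""
    while not s["done"]:
        yield "sleep"          # Sleep(): give up the processor


def setter(s):
    """Thread 2: done = 1;"""
    s["done"] = True
    yield "done-set"


def run(pick, max_steps=1000):
    """Return the number of scheduling steps to termination, or None if cut off."""
    s = {"done": False}
    tasks = {"T1": spinner(s), "T2": setter(s)}
    for step in range(max_steps):
        if not tasks:
            return step
        tid = pick(step, sorted(tasks))
        try:
            next(tasks[tid])
        except StopIteration:
            del tasks[tid]
    return None


def unfair(step, enabled):
    return enabled[0]                      # demonic choice: always run T1


def round_robin(step, enabled):
    return enabled[step % len(enabled)]    # weakly fair: every enabled thread gets a turn


print("unfair:", run(unfair))
print("round-robin:", run(round_robin))
```

The unfair scheduler keeps unrolling the spin loop until the step limit cuts it off, while round-robin lets Thread 2 set the flag and the test finishes after a handful of steps.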

Weak fairness does not suffice
  Thread 1:
    Lock( l );
    while( ! done) {
      Unlock( l );
      Sleep();
      Lock( l );
    }
    Unlock( l );
  Thread 2:
    Lock( l );
    done = 1;
    Unlock( l );
  Cycle in which Thread 2 is never scheduled, although it is not continuously enabled:
    en = {T1, T2}   T1: Sleep()       T2: Lock( l )
    en = {T1, T2}   T1: Lock( l )     T2: Lock( l )
    en = { T1 }     T1: Unlock( l )   T2: Lock( l )
    en = {T1, T2}   T1: Sleep()       T2: Lock( l )

Strong Fairness
  Forall t :: GF enabled(t) ⇒ GF scheduled(t)
  A thread that is enabled infinitely often is scheduled infinitely often
  Thread 2 is enabled and competes for the lock infinitely often
  Thread 1:
    Lock( l );
    while( ! done) {
      Unlock( l );
      Sleep();
      Lock( l );
    }
    Unlock( l );
  Thread 2:
    Lock( l );
    done = 1;
    Unlock( l );

Good Samaritan violation
  Threads should yield the processor when not making progress
  Forall threads t : GF scheduled(t) ⇒ GF yield(t)
  Found many such violations, including one in the Singularity boot process
    Results in “sluggish I/O” behavior during bootup
  Thread 1: while( ! done) { ; }
  Thread 2: done = 1;

Results: Achieves more coverage faster
  Work stealing queue with one stealer

                        With       Without fairness, with depth bound
                        fairness   20      30      40      50      60
  States explored       1726       871     1505    1726    1307    683
  Percentage coverage   100%       50%     87%     100%    76%     40%
  Time (secs)           143977632531>5000

Finding livelocks and finding (not missing) safety violations

  Program           Lines of code   Safety bugs   Livelocks
  Work Stealing Q   4K              4
  CDS               6K              1
  CCR               9K              1             2
  ConcRT            16K             2             2
  Dryad             18K             7
  APE               19K             4
  STM               20K             2
  TPL               24K             4             5
  PLINQ             24K             1
  Singularity       175K                          2
  Total                             26            11

  Acknowledgement: testers from PCP team

Outline
  Preemption bounding
    Makes CHESS effective on deep state spaces
  Fair stateless model checking
    Makes CHESS effective on cyclic state spaces
    Enables CHESS to find liveness violations (livelocks)
  Sober
    Detect relaxed-memory model errors
    Do not miss behaviors only possible in a relaxed memory model
  FeatherLite
  Concurrency Explorer

Single slide on Sober
  Relaxed memory verification problem
    Is P correct on a relaxed memory model?
  Sober: split the problem into two parts
    Is P correct on a sequentially consistent (SC) machine?
    Is P sequentially consistent on a relaxed memory model?
      Check this while only exploring SC executions
  CAV ‘08 solves the problem for a memory model with store buffers (TSO)
  EC2 ‘08 extends this approach to a general class of memory models

Outline
  Preemption bounding
    Makes CHESS effective on deep state spaces
  Fair stateless model checking
    Makes CHESS effective on cyclic state spaces
    Enables CHESS to find liveness violations (livelocks)
  Sober
    Detect relaxed-memory model errors
    Do not miss behaviors only possible in a relaxed memory model
  FeatherLite
    A light-weight data-race detection engine (<20% overhead)
  Concurrency Explorer

Single slide on FeatherLite
  Current data-race detection tools are slow
    Process every memory access done by the program
    One in 5 instructions accesses memory ⇒ 1 billion accesses/sec
  Key idea: Do smart adaptive sampling of memory accesses
    Naïve sampling does not work; both racing instructions need to be sampled
    Cold-path hypothesis: At least one of the racing instructions occurs in a cold path
    Races between hot paths are most probably benign
  FeatherLite adaptively samples cold paths at 100% rate and hot paths at 0.1% rate
  Finds 70% of the data races with <20% runtime overhead
    Existing data-race detection tools have >10X overhead

Outline
  Preemption bounding
    Makes CHESS effective on deep state spaces
  Fair stateless model checking
    Makes CHESS effective on cyclic state spaces
    Enables CHESS to find liveness violations (livelocks)
  Sober
    Detect relaxed-memory model errors
    Do not miss behaviors only possible in a relaxed memory model
  FeatherLite
    A light-weight data-race detection engine (<20% overhead)
  Concurrency Explorer
    First-class concurrency debugging

Concurrency explorer
  Single-step over a thread interleaving
  Inspect program states at each step
    Program state = Stack of all threads + globals
  Limited bi-directional debugging
  Interleaving slices for better understanding
  Working on:
    Closer integration with the Visual Studio debugger
    Explore neighborhood interleavings

Conclusion
  Don’t stress, use CHESS
  CHESS binary and papers available at http://research.microsoft.com/CHESS

Points to get across
  Capturing non-determinism
    Sync-orders, data-races, hardware interleavings
  Adding elastic delay
  Soundness & completeness
  Scoping Preemptions

Questions
  Did you find new bugs?
  How is this different from your previous papers?
  How is this different from previous mc efforts?
  How is this different from
