CHESS: Finding and Reproducing Heisenbugs
Tom Ball, Sebastian Burckhardt, Madan Musuvathi, Shaz Qadeer (Microsoft Research)
Interns: Gerard Basler (ETH Zurich), Katie Coons (U. T. Austin), Tayfun Elmas (Koc University), P. Arumuga Nainar (U. Wisc. Madison), Iulian Neamtiu (U. Maryland, U.C. Riverside)

Concurrency is HARD
- Rare thread interleavings can result in bugs
- These bugs are hard to find, reproduce, and debug
- Heisenbugs: observing the bug can “fix” it!
- A huge productivity problem
  - Developers and testers can spend weeks chasing a single Heisenbug

Demo
- Let’s find a simple concurrency bug

CHESS motivation
- Today: concurrency testing == stress testing
- Stress increases the interleaving variety, but it is
  - Not predictable → Heisenbugs
  - Not systematic → poor coverage of interleavings

Don’t stress, use CHESS
- Basic primitive: drive a program along an interleaving of choice
  - The interleaving can be decided by a program or a user
  - Doing this today is surprisingly hard
- Use model checking techniques to systematically enumerate thread interleavings
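The basic primitive can be illustrated with a toy model. This is only a sketch and not how CHESS is implemented: it assumes each thread is a Python generator whose `yield` marks a point where the scheduler may switch, and the names `writer` and `run_schedule` are made up for illustration.

```python
def writer(shared, value):
    """A toy thread: each yield marks a point where the scheduler may switch."""
    shared["x"] = value
    yield
    shared["y"] = value
    yield


def run_schedule(schedule):
    """Run two writer threads, stepping them in the order given by `schedule`."""
    shared = {"x": 0, "y": 0}
    threads = {1: writer(shared, 1), 2: writer(shared, 2)}
    for tid in schedule:
        next(threads[tid])  # execute exactly one step of the chosen thread
    return shared["x"], shared["y"]


# The same schedule always gives the same result, so any failure is repeatable.
print(run_schedule([1, 1, 2, 2]))  # (2, 2): thread 2 writes both variables last
print(run_schedule([1, 2, 2, 1]))  # (2, 1): thread 1's write to y lands last
```

Because the schedule is an explicit input, a failing interleaving can be recorded once and replayed as often as needed.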

CHESS architecture
[Diagram labels: CHESS Scheduler; Monitors; Data races; Memory Model bugs; Coverage; Repro; Testing; Debugging; Visualization; Unmanaged Program (Windows); Managed Program (.NET CLR)]
- Record the interleaving executed
- Drive the program along an interleaving

Talk outline
- Introduction
- Preemption bounding [PLDI ‘07]
  - Tackling state space explosion
- Fair stateless model checking [PLDI ‘08]
  - Handling cycles in state spaces
- CHESS architecture details [OSDI ‘08]

Enumerating thread interleavings
Thread 1: x = 1; y = 1;
Thread 2: x = 2; y = 2;
[Figure: tree of all interleavings of the two threads, with nodes labeled by the (x, y) values reached: 0,0, 1,0, 2,0, 1,1, 1,2, 2,1, 2,2]
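The tree for this two-thread example is small enough to enumerate directly. The following is a minimal sketch, assuming each assignment is one atomic step; `interleavings` and `final_state` are illustrative helper names.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def final_state(trace):
    """Apply the assignments in order, starting from x = 0, y = 0."""
    state = {"x": 0, "y": 0}
    for var, val in trace:
        state[var] = val
    return state["x"], state["y"]


t1 = [("x", 1), ("y", 1)]
t2 = [("x", 2), ("y", 2)]
traces = list(interleavings(t1, t2))
outcomes = sorted({final_state(t) for t in traces})
print(len(traces), outcomes)  # 6 [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Six interleavings lead to four distinct final states, which already shows why revisiting states is a cost of the stateless approach.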

Stateless model checking [Verisoft]
- Systematically enumerate all paths in a state-space graph
- Don’t capture program states
  - Capturing states is extremely hard for large programs
  - State = globals, heap, stack, registers, kernel, filesystem, other processes, other machines, …
- Very effective on acyclic state spaces
  - Termination is guaranteed
  - Potentially revisits program states
  - Partial-order reduction alleviates redundant exploration

State space explosion
Thread 1 … Thread n, each running: x = 1; … y = k;
- n threads, k steps each
- Number of executions = $O(n^{nk})$
- Exponential in both n and k
  - Typically: n < 10, k > 100
- Limits scalability to large programs
- Goal: scale CHESS to large programs (large k)

Preemption bounding
- Prioritize executions with a small number of preemptions
- A preemption is a context switch forced by the scheduler
  - Unexpected by the programmer, e.g. time-slice expiration
- Hypothesis: most concurrency bugs result from few preemptions
Thread 1: x = 1; if (p != 0) { x = p->f; }
Thread 2: p = 0;
[Figure: thread 2's p = 0; runs between thread 1's check if (p != 0) and its read x = p->f;. Labels: preemption, non-preemption]
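A preemption-bounded search can be sketched as follows. This is an illustrative model rather than the CHESS implementation: it assumes threads never block, so a switch counts as a preemption exactly when the thread being left still has steps to run. The name `bounded_schedules` is hypothetical.

```python
def bounded_schedules(steps, bound):
    """Yield schedules (lists of thread ids) with at most `bound` preemptions.

    steps[t] is the number of steps thread t has to run. Switching away from a
    thread that still has steps left costs one preemption; switching after a
    thread has finished is free.
    """
    def explore(remaining, current, used, prefix):
        if all(r == 0 for r in remaining):
            yield list(prefix)
            return
        for t, r in enumerate(remaining):
            if r == 0:
                continue
            preempts = (current is not None and t != current
                        and remaining[current] > 0)
            cost = used + (1 if preempts else 0)
            if cost > bound:
                continue  # prune: this schedule needs too many preemptions
            remaining[t] -= 1
            prefix.append(t)
            yield from explore(remaining, t, cost, prefix)
            prefix.pop()
            remaining[t] += 1

    yield from explore(list(steps), None, 0, [])


# Two threads with two steps each: count schedules for bounds 0, 1, and 2.
counts = [sum(1 for _ in bounded_schedules([2, 2], c)) for c in range(3)]
print(counts)  # [2, 4, 6]
```

With a bound of 0 only the two run-to-completion schedules are explored; raising the bound adds schedules in order of how many forced switches they need.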

Polynomial state space
Thread 1 … Thread n, each running: x = 1; … y = k;
- Terminating program with fixed inputs and deterministic threads
- n threads, k steps each, c preemptions
- Number of executions $\le \binom{nk}{c} \cdot (n+c)! = O\big((n^2 k)^c \cdot n!\big)$
- Exponential in n and c, but not in k
- Choose c preemption points
- Permute n+c atomic blocks
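To get a feel for the gap, the sketch below evaluates the slide's bound next to the number of unbounded interleavings. The multinomial count $(nk)!/(k!)^n$ for n non-blocking threads is a standard result and is not taken from the slides; the function names are illustrative.

```python
from math import comb, factorial


def all_interleavings(n, k):
    """Exact number of interleavings of n non-blocking threads, k steps each."""
    return factorial(n * k) // factorial(k) ** n


def preemption_bound(n, k, c):
    """The slide's upper bound: choose c preemption points, permute n+c blocks."""
    return comb(n * k, c) * factorial(n + c)


n, c = 2, 2
for k in (10, 100):
    print(k, all_interleavings(n, k), preemption_bound(n, k, c))
```

For very small k the bound is loose (it exceeds the true count), but it grows only polynomially in k while the unbounded count grows exponentially.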

Find lots of bugs with 2 preemptions
Program            Lines of code   Bugs
Work Stealing Q    4K              4
CDS                6K              1
CCR                9K              3
ConcRT             16K             4
Dryad              18K             7
APE                19K             4
STM                20K             2
TPL                24K             9
PLINQ              24K             1
Singularity        175K            2
Total                              37
Acknowledgement: testers from PCP team

Good coverage metric
- When CHESS completes search with c preemptions, any remaining bug requires c+1 or more preemptions
- Two preemptions have been sufficient to reproduce all stress-test failures reported so far

Concurrent programs have cyclic state spaces
- Spinlocks
- Non-blocking algorithms
- Implementations of synchronization primitives
- Periodic timers
- …
Thread 1: L1: while( ! done) { L2: Sleep(); }
Thread 2: M1: done = 1;
[Figure: state graph over the states (! done, L1), (! done, L2), (done, L1), (done, L2)]

A demonic scheduler unrolls any cycle ad infinitum
Thread 1: while( ! done) { Sleep(); }
Thread 2: done = 1;
[Figure: execution tree in which the ! done state repeats, each time also branching to a done state]

Depth bounding
- Prune executions beyond a bounded number of steps
[Figure: the unrolled ! done / done execution tree, cut off at the depth bound]

Problem 1: Ineffective state coverage
- Bound has to be large enough to reach the deepest bug
  - Typically, greater than 100 synchronization operations
- Every unrolling of a cycle redundantly explores reachable state space
[Figure: the ! done cycle unrolled up to the depth bound]

Problem 2: Cannot find livelocks
- Livelock: lack of progress in a program
Thread 1: temp = done; while( ! temp) { Sleep(); }
Thread 2: done = 1;

Fair stateless model checking
- Make stateless model checking effective on cyclic state spaces
  - Effective state coverage
  - Detect livelocks

Key idea
- This test terminates only when the scheduler is fair
- Fairness is assumed by programmers
- All cycles in correct programs are unfair
- A fair cycle is a livelock
Thread 1: while( ! done) { Sleep(); }
Thread 2: done = 1;
[Figure: state graph with a ! done state and a done state]

Key idea
- This test terminates only when the scheduler is fair
- Fairness is assumed by programmers
- CHESS should only explore fair schedules
Thread 1: while( ! done) { Sleep(); }
Thread 2: done = 1;
[Figure: state graph with a ! done state and a done state]

What notion of fairness?

Weak fairness
- Forall t :: GF ( enabled(t) ⇒ scheduled(t) )
- A thread that remains enabled should eventually be scheduled
- A weakly-fair scheduler will eventually schedule Thread 2
- Example: round-robin, FIFO wait queues
Thread 1: while( ! done) { Sleep(); }
Thread 2: done = 1;

Weak fairness does not suffice
Thread 1: Lock( l ); while( ! done) { Unlock( l ); Sleep(); Lock( l ); } Unlock( l );
Thread 2: Lock( l ); done = 1; Unlock( l );
- Thread 2 is disabled whenever Thread 1 holds the lock, so it never remains enabled and a weakly fair scheduler may starve it
[Figure: a cycle of states, each listing the enabled set and the next operation of each thread]
  en = {T1, T2}   T1: Sleep()       T2: Lock( l )
  en = {T1, T2}   T1: Lock( l )     T2: Lock( l )
  en = { T1 }     T1: Unlock( l )   T2: Lock( l )
  en = {T1, T2}   T1: Sleep()       T2: Lock( l )

Strong fairness
- Forall t :: GF enabled(t) ⇒ GF scheduled(t)
- A thread that is enabled infinitely often is scheduled infinitely often
- Thread 2 is enabled and competes for the lock infinitely often
- Example: a round-robin scheduler with priorities [Apt & Olderog ‘83]
Thread 1: Lock( l ); while( ! done) { Unlock( l ); Sleep(); Lock( l ); } Unlock( l );
Thread 2: Lock( l ); done = 1; Unlock( l );

Constructing a strongly fair scheduler
- A round-robin scheduler is not strongly fair
  - It is only weakly fair
- Extend a round-robin scheduler with priorities [Apt & Olderog ‘83]

CHESS also needs to be demonic
- Cannot generate all fair schedules
  - There are infinitely many, even for simple programs
- It is sufficient to generate enough fair schedules to
  - Explore all states (safety coverage)
  - Explore at least one fair cycle, if any (livelock coverage)
- Do it without capturing the program states

Fair stateless model checking
- Given a concurrent program Q and a safety property P
  - Q does not necessarily have an acyclic state space
- Determine whether Q satisfies P and Q is fair-terminating (livelock-free)
- Without capturing program states

(Good) Programs indicate lack of progress
- Good Samaritan assumption: Forall threads t : GF scheduled(t) ⇒ GF yield(t)
- A thread that is scheduled infinitely often yields the processor infinitely often
- Examples of yield: Sleep(), ScheduleThread(), asm {rep nop;}, thread completion
Thread 1: while( ! done) { Sleep(); }
Thread 2: done = 1;

Robustness of the Good Samaritan assumption
- A violation of the Good Samaritan assumption is a performance error
  Thread 1: while( ! done) { ; }
  Thread 2: done = 1;
- Programs are parsimonious in the use of yields
  - A Sleep() almost always indicates a lack of progress
  - Implies that the thread is stuck in a state-space cycle

Fair demonic scheduler (outline)
- Maintain a priority order (a partial order) on threads
  - A < B means that A will not be scheduled in a state where B is enabled
- Threads get a lower priority only when they yield
  - Scheduler is fully demonic on yield-free paths
- A thread loses its priority once it executes
  - Remove all edges t < A when A executes
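The outline can be made concrete on the spin-loop example. The sketch below is a simplification under stated assumptions, not the algorithm from the talk: threads are Python generators that never block, a yielding thread is placed below every other live thread (the talk does not spell out which edges a yield adds), and the demonic choice is modeled by always picking the first schedulable thread. All names are hypothetical.

```python
def spinner(shared):
    while not shared["done"]:
        yield "yield"  # models Sleep(): the thread signals lack of progress


def setter(shared):
    shared["done"] = True
    yield "step"


def run(fair, max_steps=50):
    """Return (trace, terminated) for the spin-loop test."""
    shared = {"done": False}
    threads = {"T1": spinner(shared), "T2": setter(shared)}
    lower = set()  # (a, b) means a < b: a may not run while b is enabled
    trace = []
    while threads and len(trace) < max_steps:
        enabled = sorted(threads)
        schedulable = [t for t in enabled
                       if not any((t, u) in lower for u in enabled)]
        t = schedulable[0]  # adversarial choice: favour T1 whenever allowed
        trace.append(t)
        # t executes, so remove all edges of the form x < t.
        lower = {(a, b) for (a, b) in lower if b != t}
        try:
            event = next(threads[t])
        except StopIteration:
            del threads[t]  # thread finished; drop any edges mentioning it
            lower = {(a, b) for (a, b) in lower if t not in (a, b)}
            continue
        if fair and event == "yield":
            # Assumed rule: a yielding thread drops below all other threads.
            lower |= {(t, u) for u in threads if u != t}
    return trace, not threads


print(run(fair=True))       # (['T1', 'T2', 'T1', 'T2'], True)
print(run(fair=False)[1])   # False: T1 spins until the step limit
```

Without the priority edges the adversarial choice starves T2 and the run hits the step limit; with them, T1's first yield forces T2 to run and the test terminates.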

Four outcomes of the semi-algorithm
- Terminates without finding any errors
- Terminates with a safety violation
- Diverges with an infinite execution
  - that violates the GS assumption (a performance error)
  - that is strongly fair (a livelock)
- In practice: detect infinite executions by a very long execution

Coverage
- Theorem: the algorithm achieves full coverage if
  - every state is reachable by a yield-free path, and
  - there exists a fair cycle iff there exists a fair cycle with at most one yield per thread

Results: Achieves more coverage faster
Work stealing queue with one stealer
                      With fairness   Without fairness, with depth bound
                                      20      30      40      50      60
States explored       1726            871     1505    1726    1307    683
Percentage coverage   100%            50%     87%     100%    76%     40%
Time (secs)           143977632531>5000

Livelocks in Singularity (both fixed)
- A thread needlessly burns its CPU quantum in a spin-loop
  - “it's a bug that we think we have seen in practice, but that would have been very difficult to find through normal means” [Dean Tribble]
- An infinite loop in the Promise implementation
  - Manifested as a non-reproducible problem in an existing stress-test
  - CHESS found the bug in a simple test harness with a repeatable error-trace

CHESS architecture recap
[Diagram labels: CHESS Scheduler; Monitors; Data races; Memory Model bugs; Coverage; Repro; Testing; Debugging; Visualization; Unmanaged Program (Windows); Managed Program (.NET CLR)]
- Record the interleaving executed
- Drive the program along an interleaving

Capture the ‘happens-before’ graph
- Happens-before graph captures all communication between threads in a concurrent execution
- Abstracts time: for a given input, two executions that result in the same happens-before graph are behaviorally equivalent
[Figure: happens-before graph over the operations x = 1, setEvent(e), wait(e), and t = x]

Enforce a single-threaded execution
- The happens-before graph is a partial order, so it can be converted to a totally ordered single-threaded execution
- Big performance win
  - Data accesses are automatically ordered by synchronization events
  - Don’t need to instrument data accesses
- Cannot explore non-sequentially consistent executions of a program
  - These result from the relaxed memory model of the hardware
  - Sober: a tool that detects the presence of such executions [CAV ‘08]
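Turning the partial order into one total order is a topological sort. The sketch below uses Python's standard `graphlib` module on the four operations from the happens-before slide. It assumes that one thread runs x = 1 then setEvent(e) and the other runs wait(e) then t = x, which is an inferred reading of the figure, and the "T1"/"T2" labels are made up.

```python
from graphlib import TopologicalSorter

# Map each operation to the set of operations that must happen before it.
happens_before = {
    "T1: x = 1": set(),
    "T1: setEvent(e)": {"T1: x = 1"},      # program order in thread 1
    "T2: wait(e)": {"T1: setEvent(e)"},    # synchronization edge
    "T2: t = x": {"T2: wait(e)"},          # program order in thread 2
}

# Any topological order is a valid single-threaded execution of the graph.
order = list(TopologicalSorter(happens_before).static_order())
for op in order:
    print(op)
```

Here the graph is a single chain, so only one order exists. The write to x is ordered before the read through the event alone, which is why the data accesses themselves need no instrumentation.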

Directing the execution
- Given a happens-before graph
- Block the execution of a synchronization if it produces an edge not in the graph
- Need to understand the precise semantics of synchronization operations

Conclusion
- Message to concurrency programmers
  - Think seriously about interleaving coverage
- Message to system/PL researchers
  - Concurrency APIs should have a clear specification of the nondeterminism exposed
- Don’t stress, use CHESS
  - CHESS binary available for academic use: http://research.microsoft.com/CHESS
  - CHESS will be shipped for commercial use, very soon: http://msdn.microsoft.com/devlabs
- CHESS is extensible
  - Use CHESS scheduler for concurrency tools
  - Plug in new search algorithms

Questions

A stress test fails…

CHESS reproduces the bug in 2 mins
