CHESS: Finding and Reproducing Heisenbugs. Tom Ball, Sebastian Burckhardt, Madan Musuvathi, Shaz Qadeer, Microsoft Research.

CHESS: Finding and Reproducing Heisenbugs. Tom Ball, Sebastian Burckhardt, Madan Musuvathi, Shaz Qadeer, Microsoft Research. Interns: Gerard Basler (ETH Zurich), Katie Coons (U. T. Austin), Tayfun Elmas (Koc University), P. Arumuga Nainar (U. Wisc. Madison), Iulian Neamtiu (U. Maryland, U.C. Riverside).

Concurrency is HARD. Rare thread interleavings can result in bugs, and these bugs are hard to find, reproduce, and debug. Heisenbugs: observing the bug can "fix" it! This is a huge productivity problem: developers and testers can spend weeks chasing a single Heisenbug.

Demo: let's find a simple concurrency bug.
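The demo itself is not captured in the transcript. A minimal sketch of the kind of bug such a demo typically shows (an assumption on my part; the actual demo program is not recorded here) is a lost-update race, simulated here deterministically so the buggy interleaving is reproducible, which is exactly the capability CHESS provides:

```python
# Deterministic simulation of a lost-update race. Each "thread" performs a
# non-atomic increment (read, then write) on a shared counter; the schedule
# decides which thread's pending step runs next, so the outcome is repeatable.

def run(schedule):
    counter = 0

    def incrementer():
        nonlocal counter
        local = counter       # read the shared counter
        yield                 # preemption point between read and write
        counter = local + 1   # write back the incremented local copy

    threads = [incrementer(), incrementer()]
    for tid in schedule:
        try:
            next(threads[tid])
        except StopIteration:
            pass  # thread already finished
    return counter

# Sequential schedule [0, 0, 1, 1]: both increments take effect, counter == 2.
# Interleaved schedule [0, 1, 0, 1]: both threads read 0, one update is lost.
```

Driving the program along an interleaving of choice, as above, is the basic primitive the next slides build on.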

CHESS motivation. Today, concurrency testing == stress testing. Stress increases the interleaving variety, but it is not predictable (→ Heisenbugs) and not systematic (→ poor coverage of interleavings).

Don't stress, use CHESS. Basic primitive: drive a program along an interleaving of choice, where the interleaving can be decided by a program or a user. Doing this today is surprisingly hard. CHESS then uses model checking techniques to systematically enumerate thread interleavings.

CHESS architecture. [Architecture diagram: the CHESS scheduler sits between the program (a managed program on the .NET CLR, or an unmanaged program on Windows) and the OS. It records the interleaving executed and drives the program along an interleaving. Monitors on top detect data races and memory-model bugs, and support coverage, repro, testing, debugging, and visualization.]

Talk outline: Introduction; Preemption bounding [PLDI '07]: tackling state space explosion; Fair stateless model checking [PLDI '08]: handling cycles in state spaces; CHESS architecture details [OSDI '08].

Enumerating thread interleavings. [State-space diagram: Thread 1 executes x = 1; y = 1; and Thread 2 executes x = 2; y = 2;. The diagram explores every interleaving of the four statements, visiting the reachable (x, y) states (0,0), (1,0), (2,0), (1,1), (1,2), (2,1), (2,2).]
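The exploration on this slide can be reproduced mechanically. A small sketch (mine, not from the talk) enumerates every interleaving of the two two-step threads and collects the final states:

```python
from itertools import permutations

# Thread 1: x = 1; y = 1;   Thread 2: x = 2; y = 2;
# Each step is a function from the current (x, y) state to the next state.
T1 = [lambda s: (1, s[1]), lambda s: (s[0], 1)]
T2 = [lambda s: (2, s[1]), lambda s: (s[0], 2)]

def interleavings(k1, k2):
    """All distinct ways to interleave k1 steps of thread 1 with k2 of thread 2."""
    return sorted(set(permutations([0] * k1 + [1] * k2)))

def final_states():
    finals = set()
    for sched in interleavings(2, 2):
        state, idx = (0, 0), [0, 0]   # start from (x, y) = (0, 0)
        for tid in sched:
            state = (T1, T2)[tid][idx[tid]](state)
            idx[tid] += 1
        finals.add(state)
    return finals
```

There are C(4, 2) = 6 interleavings, and every combination of "which thread wrote x last" and "which wrote y last" is reachable, so all four final states (1,1), (1,2), (2,1), (2,2) occur.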

Stateless model checking [VeriSoft]. Systematically enumerate all paths in a state-space graph without capturing program states; capturing states is extremely hard for large programs (state = globals, heap, stack, registers, kernel, filesystem, other processes, other machines, …). Very effective on acyclic state spaces, where termination is guaranteed. It potentially revisits program states; partial-order reduction alleviates the redundant exploration.

State space explosion. [Diagram: n threads, k steps each, every thread running x = 1; … y = k;.] Number of executions = O(n^(nk)): exponential in both n and k. Typically n < 10 and k > 100, which limits scalability to large programs. Goal: scale CHESS to large programs (large k).

Preemption bounding. Prioritize executions with a small number of preemptions. A preemption is a context switch forced by the scheduler, unexpected by the programmer (e.g. time-slice expiration). Hypothesis: most concurrency bugs result from few preemptions. [Example: Thread 1 runs x = 1; if (p != 0) { x = p->f; } while Thread 2 runs p = 0;. A preemption between the null check and the dereference of p triggers the bug; the switch at the end of Thread 1's code is a non-preemption.]
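The example on this slide can be replayed deterministically. A sketch (mine, not from the talk; p->f is modeled as a dict lookup that fails when p is None) shows the bug surfaces only under the schedule that preempts Thread 1 between the check and the dereference:

```python
# Thread 1:  x = 1; if (p != 0) { x = p->f; }
# Thread 2:  p = 0;
# Each yield is a point where the scheduler may preempt the thread.

def run(schedule):
    shared = {"p": {"f": 42}, "x": 0}

    def t1():
        shared["x"] = 1
        yield
        if shared["p"] is not None:
            yield                            # preemption point after the check
            shared["x"] = shared["p"]["f"]   # crashes if p became None meanwhile

    def t2():
        shared["p"] = None
        yield

    threads = {"T1": t1(), "T2": t2()}
    for tid in schedule:
        try:
            next(threads[tid])
        except StopIteration:
            pass
    return shared["x"]

# Schedule T1,T1,T1,T2: no preemption inside the if, x ends up 42.
# Schedule T1,T1,T2,T1: preemption between check and dereference, TypeError.
```

This is the hypothesis in miniature: the bug needs exactly one well-placed preemption, so a search that bounds preemptions still finds it.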

Polynomial state space. For a terminating program with fixed inputs and deterministic threads, with n threads, k steps each, and c preemptions: number of executions ≤ C(nk, c) · (n+c)! = O((n²k)^c · n!). Exponential in n and c, but not in k. [Diagram: choose the c preemption points among the nk steps, then permute the resulting n + c atomic blocks.]
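The bound can be checked numerically. A small sketch (assumption: reading the slide's bound as C(nk, c) · (n+c)!, i.e. choose c preemption points, then order the n + c atomic blocks) compares it against the O(n^(nk)) figure from the state-space-explosion slide:

```python
from math import comb, factorial

def bounded_executions(n, k, c):
    """Upper bound with at most c preemptions: choose the c preemption
    points among the n*k steps, then order the n + c atomic blocks."""
    return comb(n * k, c) * factorial(n + c)

def unbounded_executions(n, k):
    """The O(n^(nk)) count of all interleavings from the earlier slide."""
    return n ** (n * k)
```

For fixed c the bound grows only polynomially in k, which is the whole point: for n = 2, k = 100, c = 2 the bound is under half a million executions, while the unbounded count is 2^200.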

Find lots of bugs with 2 preemptions.

Program           Lines of code   Bugs
Work Stealing Q   4K              4
CDS               6K              1
CCR               9K              3
ConcRT            16K             4
Dryad             18K             7
APE               19K             4
STM               20K             2
TPL               24K             9
PLINQ             24K             1
Singularity       175K            2
                                  37 (total)

Acknowledgement: testers from the PCP team.

Good coverage metric. When CHESS completes a search with c preemptions, any remaining bug requires c+1 or more preemptions. Two preemptions were sufficient to reproduce all stress-test failures reported so far.

Talk outline: Introduction; Preemption bounding [PLDI '07]: tackling state space explosion; Fair stateless model checking [PLDI '08]: handling cycles in state spaces; CHESS architecture details [OSDI '08].

Concurrent programs have cyclic state spaces: spinlocks, non-blocking algorithms, implementations of synchronization primitives, periodic timers, … [Example: Thread 1 runs L1: while (!done) { L2: Sleep(); } and Thread 2 runs M1: done = 1;. State diagram: the (!done, L1) and (!done, L2) states form a cycle until done is set.]

A demonic scheduler unrolls any cycle ad infinitum. [Diagram: Thread 1's while (!done) { Sleep(); } loop is unrolled forever when the scheduler never runs Thread 2's done = 1;.]

Depth bounding: prune executions beyond a bounded number of steps. [Diagram: the unrolled !done chain is cut off at the depth bound.]

Problem 1: ineffective state coverage. The bound has to be large enough to reach the deepest bug, typically greater than 100 synchronization operations. Every unrolling of a cycle redundantly explores the reachable state space.

Problem 2: cannot find livelocks. Livelocks: lack of progress in a program. [Example: Thread 1 runs temp = done; while (!temp) { Sleep(); } and Thread 2 runs done = 1;.]

Fair stateless model checking. Make stateless model checking effective on cyclic state spaces: effective state coverage, and livelock detection.

Key idea. This test terminates only when the scheduler is fair; fairness is assumed by programmers. All cycles in correct programs are unfair, so a fair cycle is a livelock. [Example: Thread 1 runs while (!done) { Sleep(); } and Thread 2 runs done = 1;. The !done cycle is unfair: it never schedules Thread 2.]

Key idea (continued). Because the test terminates only when the scheduler is fair, and fairness is assumed by programmers, CHESS should only explore fair schedules.

What notion of fairness?

Weak fairness: ∀t :: GF (enabled(t) ⇒ scheduled(t)). A thread that remains enabled should eventually be scheduled. A weakly fair scheduler will eventually schedule Thread 2 in the while (!done) { Sleep(); } vs. done = 1; example. Examples: round-robin, FIFO wait queues.
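A weakly fair scheduler can be sketched as plain round-robin (my sketch, not CHESS's actual scheduler): any thread that stays enabled runs within one round, so the slide's done-flag test terminates instead of spinning forever under a demonic schedule:

```python
# Round-robin simulation of:  Thread 1: while (!done) { Sleep(); }
#                             Thread 2: done = 1;
# Each yield models Sleep() / giving up the processor.

def run_round_robin(max_steps=100):
    state = {"done": False}
    steps = 0

    def waiter():
        while not state["done"]:
            yield          # Sleep(): give up the processor
        # loop exited: the waiter terminates

    def setter():
        state["done"] = True
        yield

    threads = [waiter(), setter()]
    while threads and steps < max_steps:
        t = threads.pop(0)       # round-robin: take the front of the queue
        try:
            next(t)
            threads.append(t)    # re-enqueue at the back
        except StopIteration:
            pass                 # thread finished, drop it
        steps += 1
    return steps
```

Under round-robin the setter runs on the second step, so both threads finish in four steps; a demonic scheduler could instead run the waiter forever.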

Weak fairness does not suffice. [Example: Thread 1 runs Lock(l); while (!done) { Unlock(l); Sleep(); Lock(l); } Unlock(l); and Thread 2 runs Lock(l); done = 1; Unlock(l);. Trace: en = {T1, T2}, T1: Sleep(), T2: Lock(l) → en = {T1, T2}, T1: Lock(l), T2: Lock(l) → en = {T1}, T1: Unlock(l), T2: Lock(l) → en = {T1, T2}, T1: Sleep(), T2: Lock(l) → … Thread 2 is repeatedly enabled, yet it can lose the race for the lock forever.]

Strong fairness: ∀t :: GF enabled(t) ⇒ GF scheduled(t). A thread that is enabled infinitely often is scheduled infinitely often. In the lock example, Thread 2 is enabled and competes for the lock infinitely often, so a strongly fair scheduler must eventually run it. Example: a round-robin scheduler with priorities [Apt & Olderog '83].

Constructing a strongly fair scheduler. A round-robin scheduler is not strongly fair; it is only weakly fair. Extend a round-robin scheduler with priorities [Apt & Olderog '83].

CHESS also needs to be demonic. It cannot generate all fair schedules: there are infinitely many, even for simple programs. It is sufficient to generate enough fair schedules to explore all states (safety coverage) and to explore at least one fair cycle, if any (livelock coverage), without capturing the program states.

Fair stateless model checking. Given a concurrent program Q and a safety property P, where Q does not necessarily have an acyclic state space: determine whether Q satisfies P and whether Q is fair-terminating (livelock-free), without capturing program states.

(Good) programs indicate lack of progress. Good Samaritan assumption: ∀ threads t :: GF scheduled(t) ⇒ GF yield(t). A thread, when scheduled infinitely often, yields the processor infinitely often. Examples of yield: Sleep(), ScheduleThread(), asm { rep nop; }, thread completion. [Example: the while (!done) { Sleep(); } loop yields on every iteration.]

Robustness of the Good Samaritan assumption. A violation of the Good Samaritan assumption is a performance error: programs are parsimonious in their use of yields. A Sleep() almost always indicates a lack of progress, implying that the thread is stuck in a state-space cycle. [Counterexample: a thread spinning in while (!done) { ; } burns the processor without yielding.]

Fair demonic scheduler (outline). Maintain a priority order (a partial order) on threads, where A < B means that A will not be scheduled in a state where B is enabled. Threads get a lower priority only when they yield, so the scheduler is fully demonic on yield-free paths. A thread loses its priority once it executes: remove all edges t < A when A executes.
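The bookkeeping in this outline can be sketched directly (my reconstruction, not the actual CHESS code): a yielding thread is demoted below every other enabled thread, and all edges into a thread are dropped once that thread executes:

```python
class FairDemonicScheduler:
    """Sketch of the priority-order bookkeeping from the slide.
    An edge (a, b) encodes a < b: a is not scheduled while b is enabled."""

    def __init__(self):
        self.edges = set()  # set of (low, high) pairs

    def schedulable(self, thread, enabled):
        # thread may run unless some enabled thread has priority over it
        return all((thread, high) not in self.edges for high in enabled)

    def on_yield(self, thread, enabled):
        # a yielding thread drops below every other currently enabled thread
        for other in enabled:
            if other != thread:
                self.edges.add((thread, other))

    def on_execute(self, thread):
        # a thread loses its priority once it executes:
        # remove all edges t < thread
        self.edges = {(lo, hi) for (lo, hi) in self.edges if hi != thread}
```

In the done-flag example this forces progress: once the spinning thread yields, it is demoted below the setter, so the only schedulable choice is to run the setter, after which the demotion edge disappears and the scheduler is demonic again.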

Four outcomes of the semi-algorithm: it terminates without finding any errors; it terminates with a safety violation; it diverges with an infinite execution that violates the GS assumption (a performance error); or it diverges with an infinite execution that is strongly fair (a livelock). In practice, infinite executions are detected by observing a very long execution.

Coverage. Theorem: the algorithm achieves full coverage if every state is reachable by a yield-free path, and there exists a fair cycle iff there exists a fair cycle with at most one yield per thread.

Results: achieves more coverage faster. [Table: work stealing queue with one stealer, comparing states explored, percentage coverage, and time (secs) with fairness vs. without fairness with a depth bound. The fair search reaches 100% coverage; the depth-bounded searches reach only partial coverage (figures reported: 100%, 50%, 87%, 100%, 76%, 40%), with one configuration exceeding 5000 secs.]

Livelocks in Singularity (both fixed). First, a thread needlessly burns its CPU quantum in a spin-loop: "it's a bug that we think we have seen in practice, but that would have been very difficult to find through normal means" [Dean Tribble]. Second, an infinite loop in the Promise implementation, which manifested as a non-reproducible problem in an existing stress test; CHESS found the bug in a simple test harness with a repeatable error trace.

Talk outline: Introduction; Preemption bounding [PLDI '07]: tackling state space explosion; Fair stateless model checking [PLDI '08]: handling cycles in state spaces; CHESS architecture details [OSDI '08].

CHESS architecture recap. [Same architecture diagram as before: the CHESS scheduler records the interleaving executed and drives the program (managed on the .NET CLR, or unmanaged on Windows) along an interleaving, with monitors for data races, memory-model bugs, coverage, repro, testing, debugging, and visualization.]

Capture the 'happens-before' graph. The happens-before graph captures all communication between threads in a concurrent execution. It abstracts time: for a given input, two executions that result in the same happens-before graph are behaviorally equivalent. [Diagram: x = 1 followed by setEvent(e) in one thread, wait(e) followed by t = x; in another, with a happens-before edge from setEvent(e) to wait(e).]
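One standard way to represent a happens-before graph (an illustrative sketch; the talk does not say how CHESS represents it internally) is with vector clocks, shown here for the slide's diagram, where the setEvent/wait edge is what orders the write of x before the read:

```python
# Vector clocks for a two-thread execution:
#   Thread 0:  x = 1; setEvent(e);
#   Thread 1:  wait(e); t = x;
# wait(e) joins the clock attached to the event, creating the edge.

def fresh_clock(n):
    return [0] * n

def tick(clock, tid):
    clock = clock[:]
    clock[tid] += 1
    return clock

def join(a, b):
    return [max(x, y) for x, y in zip(a, b)]

def happens_before(a, b):
    return a != b and all(x <= y for x, y in zip(a, b))

c_write = tick(fresh_clock(2), 0)                 # Thread 0: x = 1
c_event = tick(c_write, 0)                        # Thread 0: setEvent(e)
c_wait = tick(join(fresh_clock(2), c_event), 1)   # Thread 1: wait(e) joins event clock
c_read = tick(c_wait, 1)                          # Thread 1: t = x
```

Because the write's clock is component-wise below the read's, the data accesses are ordered by the synchronization edge alone, which is what lets CHESS avoid instrumenting data accesses (next slide).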

Enforce a single-threaded execution. The happens-before graph is a partial order, so it can be converted to a totally ordered, single-threaded execution. Big performance win: data accesses are automatically ordered by synchronization events, so there is no need to instrument data accesses. Limitation: CHESS cannot explore non-sequentially-consistent executions of a program, which result from the relaxed memory model of the hardware. Sober, a companion tool, detects the presence of such executions [CAV '08].

Directing the execution. Given a happens-before graph, block the execution of a synchronization operation if it would produce an edge not in the graph. This requires understanding the precise semantics of the synchronization operations.

Conclusion. Message to concurrency programmers: think seriously about interleaving coverage. Message to systems/PL researchers: concurrency APIs should have a clear specification of the nondeterminism they expose. Don't stress, use CHESS: the CHESS binary is available for academic use, and CHESS will be shipped for commercial use very soon. CHESS is extensible: use the CHESS scheduler to build concurrency tools, and plug in new search algorithms.

Questions

A stress test fails…

CHESS reproduces the bug in 2 mins