Debugging Concurrent Software by Context-Bounded Analysis Shaz Qadeer Microsoft Research Joint work with: Jakob Rehof, Microsoft Research Dinghao Wu, Princeton.

Slides:



Advertisements
Similar presentations
Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study Sebastian Burckhardt Rajeev Alur Milo M. K. Martin Department of.
Advertisements

Summarizing Procedures in Concurrent Programs Shaz Qadeer Sriram K. Rajamani Jakob Rehof Microsoft Research.
MATH 224 – Discrete Mathematics
A Program Transformation For Faster Goal-Directed Search Akash Lal, Shaz Qadeer Microsoft Research.
1 Model checking. 2 And now... the system How do we model a reactive system with an automaton ? It is convenient to model systems with Transition systems.
Automatic Verification Book: Chapter 6. What is verification? Traditionally, verification means proof of correctness automatic: model checking deductive:
Architecture-aware Analysis of Concurrent Software Rajeev Alur University of Pennsylvania Amir Pnueli Memorial Symposium New York University, May 2010.
Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu NASA Ames Research Center.
Α ϒ ʎ …… Reachability Modulo Theories Akash Lal Shaz Qadeer, Shuvendu Lahiri Microsoft Research.
1 Symbolic Execution for Model Checking and Testing Corina Păsăreanu (Kestrel) Joint work with Sarfraz Khurshid (MIT) and Willem Visser (RIACS)
1/20 Generalized Symbolic Execution for Model Checking and Testing Charngki PSWLAB Generalized Symbolic Execution for Model Checking and Testing.
Bebop: A Symbolic Model Checker for Boolean Programs Thomas Ball Sriram K. Rajamani
Keeping a Crowd Safe On the Complexity of Parameterized Verification Javier Esparza Technical University of Munich.
Background for “KISS: Keep It Simple and Sequential” cs264 Ras Bodik spring 2005.
CS 267: Automated Verification Lecture 10: Nested Depth First Search, Counter- Example Generation Revisited, Bit-State Hashing, On-The-Fly Model Checking.
Reducing Context-bounded Concurrent Reachability to Sequential Reachability Gennaro Parlato University of Illinois at Urbana-Champaign Salvatore La Torre.
1 Thorough Static Analysis of Device Drivers Byron Cook – Microsoft Research Joint work with: Tom Ball, Vladimir Levin, Jakob Lichtenberg,
Iterative Context Bounding for Systematic Testing of Multithreaded Programs Madan Musuvathi Shaz Qadeer Microsoft Research.
Poirot – A Concurrency Sleuth Shaz Qadeer Research in Software Engineering Microsoft Research.
1 Concurrency Specification. 2 Outline 4 Issues in concurrent systems 4 Programming language support for concurrency 4 Concurrency analysis - A specification.
CIS 540 Principles of Embedded Computation Spring Instructor: Rajeev Alur
Symmetry-Aware Predicate Abstraction for Shared-Variable Concurrent Programs Alastair Donaldson, Alexander Kaiser, Daniel Kroening, and Thomas Wahl Computer.
Using Programmer-Written Compiler Extensions to Catch Security Holes Authors: Ken Ashcraft and Dawson Engler Presented by : Hong Chen CS590F 2/7/2007.
Summarizing Procedures in Concurrent Programs Shaz Qadeer Sriram K. Rajamani Jakob Rehof Microsoft Research.
Termination Proofs for Systems Code Andrey Rybalchenko, EPFL/MPI joint work with Byron Cook, MSR and Andreas Podelski, MPI PLDI’2006, Ottawa.
Concurrency.
Model Checking Lecture 5. Outline 1 Specifications: logic vs. automata, linear vs. branching, safety vs. liveness 2 Graph algorithms for model checking.
Static Analysis of Embedded C Code John Regehr University of Utah Joint work with Nathan Cooprider.
Context-bounded model checking of concurrent software Shaz Qadeer Microsoft Research Joint work with: Jakob Rehof, Microsoft Research Dinghao Wu, Princeton.
1 Thread Modular Model Checking Cormac Flanagan Systems Research Center HP Labs Joint work with Shaz Qadeer (Microsoft Research)
Thread-modular Abstraction Refinement Tom Henzinger Ranjit Jhala Rupak Majumdar [UC Berkeley] Shaz Qadeer [Microsoft Research]
Synergy: A New Algorithm for Property Checking
CS 267: Automated Verification Lectures 14: Predicate Abstraction, Counter- Example Guided Abstraction Refinement, Abstract Interpretation Instructor:
1 Today Another approach to “coverage” Cover “everything” – within a well-defined, feasible limit Bounded Exhaustive Testing.
Automatically Validating Temporal Safety Properties of Interfaces Thomas Ball and Sriram K. Rajamani Software Productivity Tools, Microsoft Research Presented.
Predicate Abstraction for Software and Hardware Verification Himanshu Jain Model checking seminar April 22, 2005.
Part II: Atomicity for Software Model Checking. Class Account { int balance; static int MIN = 0, MAX = 100; bool synchronized deposit(int n) { int t =
Part 2: Reachability analysis of stack-based systems.
Model Checking Lecture 5. Outline 1 Specifications: logic vs. automata, linear vs. branching, safety vs. liveness 2 Graph algorithms for model checking.
272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 4: SMT-based Bounded Model Checking of Concurrent Software.
Verifying Concurrent Message- Passing C Programs with Recursive Calls Sagar Chaki, Edmund Clarke, Nicholas Kidd, Thomas Reps, and Tayssir Touili.
A Simple Method for Extracting Models from Protocol Code David Lie, Andy Chou, Dawson Engler and David Dill Computer Systems Laboratory Stanford University.
Using Model-Checking to Debug Device Firmware Sanjeev Kumar Microprocessor Research Labs, Intel Kai Li Princeton University.
1 VeriSoft A Tool for the Automatic Analysis of Concurrent Reactive Software Represents By Miller Ofer.
Aditya V. Nori, Sriram K. Rajamani Microsoft Research India.
Race Checking by Context Inference Tom Henzinger Ranjit Jhala Rupak Majumdar UC Berkeley.
Parameterized Verification of Thread-safe Libraries Thomas Ball Sagar Chaki Sriram K. Rajamani.
Context-bounded model checking of concurrent software Shaz Qadeer Microsoft Research Joint work with: Jakob Rehof, Microsoft Research Dinghao Wu, Princeton.
Formalizing Hardware/Software Interface Specifications
Symbolic Execution with Abstract Subsumption Checking Saswat Anand College of Computing, Georgia Institute of Technology Corina Păsăreanu QSS, NASA Ames.
Learning Symbolic Interfaces of Software Components Zvonimir Rakamarić.
The Yogi Project Software property checking via static analysis and testing Aditya V. Nori, Sriram K. Rajamani, Sai Deep Tetali, Aditya V. Thakur Microsoft.
CS527 Topics in Software Engineering (Software Testing and Analysis) Darko Marinov August 30, 2011.
Random Interpretation Sumit Gulwani UC-Berkeley. 1 Program Analysis Applications in all aspects of software development, e.g. Program correctness Compiler.
( = “unknown yet”) Our novel symbolic execution framework: - extends model checking to programs that have complex inputs with unbounded (very large) data.
/ PSWLAB Thread Modular Model Checking by Cormac Flanagan and Shaz Qadeer (published in Spin’03) Hong,Shin Thread Modular Model.
/ PSWLAB Evidence-Based Analysis and Inferring Preconditions for Bug Detection By D. Brand, M. Buss, V. C. Sreedhar published in ICSM 2007.
CIS 540 Principles of Embedded Computation Spring Instructor: Rajeev Alur
KISS: K EEP I T S IMPLE AND S EQUENTIAL Guy Martin, OSLab, GNU09 Shaz Qadeer Microsoft Research One Microsoft Way Redmond, WA Dinghao Wu Department.
Software Testing.
ZING Systematic State Space Exploration of Concurrent Software
Sequentializing Parameterized Programs
runtime verification Brief Overview Grigore Rosu
Concurrency Specification
Software Testing (Lecture 11-a)
Over-Approximating Boolean Programs with Unbounded Thread Creation
Coding Concepts (Basics)
Test Case Test case Describes an input Description and an expected output Description. Test case ID Section 1: Before execution Section 2: After execution.
Presentation transcript:

Debugging Concurrent Software by Context-Bounded Analysis Shaz Qadeer Microsoft Research Joint work with: Jakob Rehof, Microsoft Research Dinghao Wu, Princeton University

Concurrent software Operating systems, device drivers Databases, web servers, browsers, GUIs,... Modern languages: C#, Java Processor 1 Processor 2                         Thread 1 Thread 2 Thread 3 Thread 4

Concurrency is increasingly important Single-chip multiprocessors are an architectural inflexion point –Software running on these chips will be even more concurrent Embedded systems –Airplanes, cars, PDAs, cellphones Web services

Reliable concurrent software? Correctness Problem –does program behave correctly for all inputs and all interleavings? Bugs due to concurrency are insidious –nondeterministic, timing dependent –difficult to detect, reproduce, eliminate –coverage from testing very poor

Analysis of concurrent programs is difficult (1) Finite-data single-procedure program –n lines –m states for global data variables 1 thread –n * m states K threads –(n) K * m states

Analysis of concurrent programs is difficult (2) Finite-data program with procedures –n lines –m states for global data variables 1 thread –Infinite number of states –Can still decide assertions in O(n * m 3 ) –SLAM, ESP, BLAST implement this algorithm K  2 threads –Undecidable! (Ramalingam 00)

Context-bounded verification of concurrent software           Context Context switch Analyze all executions with small number of context switches !

Many subtle concurrency errors are manifested in executions with a small number of contexts Context-bounded analysis can be performed efficiently Why context-bounded analysis?

KISS: A static checker for concurrent software An implementation of context-bounded analysis –Technique to use any sequential checker to perform context-bounded concurrency analysis Has found a number of concurrency errors in NT device drivers

Sequential program Q KISS Sequential Checker Concurrent program P  No error found Error in Q indicates error in P KISS: A static checker for concurrent software

Sequential program Q KISS Concurrent program P KISS: A static checker for concurrent software  No error found Error in Q indicates error in P SDV

Sequential program Q KISS Concurrent program P KISS: A static checker for concurrent software  No error found Error in Q indicates error in P PREfix

Sequential program Q KISS Concurrent program P KISS: A static checker for concurrent software  No error found Error in Q indicates error in P ESP

Inside a static checker for sequential programs int x, y, z; void foo ( ) { if (x > y) { y = x; } if (y > z) { z = y; } assert (x ≤ z); } Symbolically analyze all paths Check the assertion for each path Interprocedural analysis – e.g., PREfix, ESP, SLAM, BLAST

KISS strategy Q encodes executions of P with small number of context switches –instrumentation introduces lots of extra paths to mimic context switches Leverage all-path analysis of sequential checkers Sequential program Q KISS Concurrent program P  SDV

PnpStop( ) { int t; de->stopping = T; t = AtomicDecr(& de->count); if (t == 0) SetEvent(& de->stopEvent); WaitEvent(& de->stopEvent); } DispatchRoutine( ) { int t; if (! de->stopping) { AtomicIncr(& de->count); // do useful work // … t = AtomicDecr(& de->count); if (t == 0) SetEvent(& de->stopEvent); }

DispatchRoutine( ) { int t; if (! de->stopping) { AtomicIncr(& de->count); // do useful work // … t = AtomicDecr(& de->count); if (t == 0) SetEvent(& de->stopEvent); } PnpStop( ) { int t; if ($) return; de->stopping = T; if ($) return; t = AtomicDecr(& de->count); if ($) return; if (t == 0) SetEvent(& de->stopEvent); if ($) return; WaitEvent(& de->stopEvent); }

PnpStop( ) { int t; if ($) return; de->stopping = T; if ($) return; t = AtomicDecr(& de->count); if ($) return; if (t == 0) SetEvent(& de->stopEvent); if ($) return; WaitEvent(& de->stopEvent); } DispatchRoutine( ) { int t; CODE; if (! de->stopping) { CODE; AtomicIncr(& de->count); // do useful work // … CODE; t = AtomicDecr(& de->count); CODE; if (t == 0) SetEvent(& de->stopEvent); CODE; } if ( !done ) { if ($) { done = T; PnpStop( ); } } CODE  bool done = F;

PnpStop( ) { int t; if ($) return; de->stopping = T; if ($) return; t = AtomicDecr(& de->count); if ($) return; if (t == 0) SetEvent(& de->stopEvent); if ($) return; WaitEvent(& de->stopEvent); } DispatchRoutine( ) { int t; CODE; if (! de->stopping) { CODE; AtomicIncr(& de->count); // do useful work // … CODE; t = AtomicDecr(& de->count); CODE; if (t == 0) SetEvent(& de->stopEvent); CODE; } if ( !done ) { if ($) { done = T; PnpStop( ); } } CODE  bool done = F; main( ) { DispatchRoutine( ); }

PnpStop( ) { int t; CODE; de->stopping = T; CODE; t = AtomicDecr(& de->count); CODE; if (t == 0) SetEvent(& de->stopEvent); CODE; WaitEvent(& de->stopEvent); CODE; } DispatchRoutine( ) { int t; if ($) return; if (! de->stopping) { if ($) return; AtomicIncr(& de->count); // do useful work // … if ($) return; t = AtomicDecr(& de->count); if ($) return; if (t == 0) SetEvent(& de->stopEvent); } if ( !done ) { if ($) { done = T; PnpStop( ); } } CODE  bool done = F; main( ) { PnpStop( ); }

KISS features KISS trades off soundness for scalability Cost of analyzing a concurrent program P = cost of analyzing a sequential program Q –Size of Q asymptotically same as size of P Unsoundness is precisely quantifiable –for 2-thread program, explores all executions with up to two context switches –for n-thread program, explores up to 2n-2 context switches Allows any sequential checker to analyze concurrency

Experimental Evaluation of KISS

Driver Stopping Error in Bluetooth Driver (1 KLOC) DispatchRoutine() { int t; if (! de->stopping) { AtomicIncr(& de->count); assert ! driverStopped; // do useful work // … t = AtomicDecr(& de->count); if (t == 0) SetEvent(& de->stopEvent); } PnpStop() { int t; de->stopping = T; t = AtomicDecr(& de->count); if (t == 0) SetEvent(& de->stopEvent); WaitEvent(& de->stopEvent); driverStopped = T; }

int t; if (! de->stopping) { int t; de->stopping = T; t = AtomicDecr(& de->count); if (t == 0) SetEvent(& de->stopEvent); WaitEvent(& de->stopEvent); driverStopped = T; AtomicIncr(& de->count); assert ! driverStopped; // do useful work // … t = AtomicDecr(& de->count); if (t == 0) SetEvent(& de->stopEvent); } Assertion fails!

DispatchRoutine(IRP *irp) { … irp->CancelRoutine = PacketCancelRoutine; Enqueue(irp); IoMarkIrpPending(irp); … } IoCancelIrp(IRP *irp) { IoAcquireCancelSpinLock(); if (irp->CancelRoutine) { (irp->CancelRoutine)(irp); } … } PacketCancelRoutine(IRP *irp) { … Dequeue(irp); IoCompleteRequest(irp); IoReleaseCancelSpinLock(); … } IRP Cancellation Error in Packet Driver (2.5 KLOC)

… irp->CancelRoutine = PacketCancelRoutine; Enqueue(irp); IoAcquireCancelSpinLock(); if (irp->CancelRoutine) { // inline PacketCancelRoutine(irp) … Dequeue(irp); IoCompleteRequest(irp); IoReleaseCancelSpinLock(); IoMarkIrpPending(irp); Error: An irp should not be marked pending after it has been completed !

Data-race Conditions in DDK Sample Drivers Device extension shared among threads Data-races on device extension fields 18 sample DDK drivers –Range KLOC –Total 70 KLOC Each field checked separately with resource limit of 20 minutes and 800MB Two threads: each calls nondeterministically chosen dispatch routine

DriverKLOC# Fields# Races Tracedrv0.530 Moufiltr Kbfiltr Imca1.151 Startio1.190 Toaster/toastmon1.481 Diskperf diag vdev Fakemodem Toaster/bus Serenum Toaster/func Mouclass Kbdclass Mouser Fdc Total: 30 races

ToastMon_DispatchPnp( DEVICE_OBJECT *obj, IRP *irp) { … IoAcquireRemoveLock(); … case IRP_MN_QUERY_STOP_DEVICE: // Race: write access deviceExt->DevicePnPState = StopPending; … break; … IoReleaseRemoveLock(); … } ToastMon_DispatchPower( DEVICE_OBJECT *obj, IRP *irp) { … // Race: read access if (deviceExt->DevicePnpState == Deleted) { … } … } DevicePnpState Field in Toaster/toastmon

Acknowledgments Tom Ball Byron Cook John Henry Doron Holan Vladimir Levin Jakob Lichtenberg Adrian Oney Sriram Rajamani Peter Wieland …

Keep It Simple and Sequential Context-bounded analysis by leveraging existing sequential checkers Validates the hypothesis that many concurrency errors require few context switches to show up

However… Hard limit on number of explored contexts –e.g., two context switches for concurrent program with two threads Case study: Concurrent transaction management code written in C# (Naik-Rehof 04) –Analyzed by the Zing model checker after automatically translating to the Zing input language –Found three bugs each requiring between three and four context switches

Is a tuning knob possible? Given a concurrent boolean program P and a positive integer c, does P go wrong by failing an assertion via an execution with at most c contexts? Given a concurrent boolean program P, does P go wrong by failing an assertion? Undecidable Decidable Given a concurrent boolean program P with unbounded fork-join parallelism and a positive integer c, does P go wrong by failing an assertion via an execution with at most c contexts? Decidable

          Context Context switch Problem: Unbounded computation possible within each context! Unbounded execution depth and reachable state space Different from bounded-depth model checking

Global store g, valuation to global variables Local store l, valuation to local variables Stack s, sequence of local stores State (g, s) Sequential boolean program

Example bool a = F; void main( ) { L1: a = T; L2: flip(a); L3: } void flip(bool x) { L4: a = !x; L5: } (F,  ) (F,  _, L3  ) (F,  _, L3   T, L5  ) (T,  _, L3   T, L4  ) (T,  _, L2  ) (F,  _, L1  ) (a,  x, pc  )

Global store g, valuation to global variables Local store l, valuation to local variables Stack s, sequence of local stores State (g, s) Sequential boolean program Transition relation: (g, s)  (g’, s’)

Reachability problem for sequential boolean program Given (g, s), is there s’ such that (g, s)  * (error,s’)?

Aggregate state Set of stacks ss Aggregate state (g, ss) = { (g,s) | s  ss } Reach(g, ss) = { (g’, s’) | exists s  ss such that (g, s)  * (g’, s’) }

Aggregate transition relation Observations: There is a unique smallest partition of Reach(g, ss) into aggregate states: (g’ 1, ss’ 1 )  …  (g’ n, ss’ n ) The number of elements in the partition is bounded by the number of global stores (g, ss)  (g’ 1, ss’ 1 ). (g, ss)  (g’ n, ss’ n )

Theorem (Buchi, Schwoon00) If ss is regular and (g, ss)  (g’, ss’), then ss’ is regular. If ss is given as a finite automaton A, then a finite automaton A’ for ss’ can be constructed from A in polynomial time.

Algorithm Solution: Compute automaton for ss’ such that (g, {s})  (error, ss’) and check if ss’ is nonempty. Problem: Given (g, s), is there s’ such that (g, s)  * (error,s’)?

Global store g, valuation to global variables Local store l, valuation to local variables Stack s, sequence of local stores State (g, s 1, s 2 ) Concurrent boolean program Transition relation: (g, s 1 )  (g’, s’ 1 ) in thread 1 (g, s 1, s 2 )  1 (g’, s’ 1, s 2 ) (g, s 2 )  (g’, s’ 2 ) in thread 2 (g, s 1, s 2 )  2 (g’, s 1, s’ 2 )

Reachability problem for concurrent boolean program Given (g, s 1, s 2 ), are there s’ 1 and s’ 2 such that (g, s 1, s 2 ) reaches (error, s’ 1, s’ 2 ) via an execution with at most c contexts?

Aggregate transition relation (g, ss 1 )  (g’, ss’ 1 ) in thread 1 (g, ss 1, ss 2 )  1 (g’, ss’ 1, ss 2 ) (g, ss 1, ss 2 )  2 (g’, ss 1, ss’ 2 ) (g, ss 2 )  (g’, ss’ 2 ) in thread 2 (g, ss 1, ss 2 ) = { (g, s 1, s 2 ) | s 1  ss 1, s 2  ss 2 }

Algorithm: 2 threads, c contexts   12   12   12 Depth c (g, {s 1 }, {s 2 }) Compute the set of reachable aggregate states. Report an error if (g, ss 1, ss 2 ) is reachable and g = error, ss 1 is nonempty, and ss 2 is nonempty.

Complexity: 2 threads, c contexts   12   12   12 Depth of tree = context bound c Branching factor bounded by G  2 (G = # of global stores) Number of edges bounded by (G  2) (c+1) Each edge computable in polynomial time Depth c (g, {s 1 }, {s 2 })

Context-bounded analysis of concurrent software Many subtle concurrency errors are manifested in executions with few context switches –Experience with KISS on Windows drivers –Experience with Zing on transaction manager Algorithms for context-bounded analysis are more efficient than those for unbounded analysis –Reducibility to sequential checking with KISS –Decidability of assertion checking for concurrent boolean programs

Applications of context-bounded analysis Coverage metric for testing concurrent software Analysis of computer protocols –networking –cache-coherence

Unbounded fork-join parallelism Fork operation: x = fork Join operation: join(x) Copy thread identifier from one variable to another

Algorithm: unbounded fork-join parallelism, c contexts At most c threads may perform a transition Reduce to previously solved problem with c threads and c contexts –Nondeterministically pick c forked threads for execution

start : {1, …, c}  boolean, initialized to i. (i == 1) end : {1, …, c}  boolean, initialized to i. false x = fork translates to if ($) { assume(count < c); count = count + 1; x = count; start[count] = true; } else { x = c + 1; } join(x) translates to assume(x  c); assume(end[x]); count : {1, …, c}, initialized to 1 c statically created threads thread i starts execution when start[i] is true thread i sets end[i] to true on termination