Catching Bugs in Software Rajeev Alur Systems Design Research Lab University of Pennsylvania www.cis.upenn.edu/~alur/



Software Reliability
- Software bugs are pervasive
  - Bugs can be expensive
  - Bugs can cost lives
  - Bulk of development cost is in validation, testing, bug fixes
- Old problem that just won't go away
- Many approaches and decades of research
  - Systematic testing
  - Programming languages technology (e.g. types)
  - Formal methods (specification and verification)
Grand challenge for computer science: Tools for designing "correct" software

- Correctness is formalized as a mathematical claim to be proved or falsified rigorously, always with respect to the given specification
- A brief history of formal verification
  - Structured programs; Hoare logic
  - Network protocols; State-space search
  - Cache coherency protocols; Symbolic search
  - Device drivers; Automated abstraction
[Figure: a verifier takes the software/model and a correctness specification, and answers either yes with a proof or no with a bug]

1. Program Verification
- Hoare logic for formalizing correctness of structured programs (late 1960s)
- Typical examples: sorting, graph algorithms
- Specification for sorting
  - Permute(A,B): array B is a permutation of elements in array A
  - Sorted(A): for 0<i<n, A[i]<=A[i+1]
- Function sort is correct if the following holds: {True} B := sort(A) {Permute(A,B) & Sorted(B)}
- Provides calculus for pre/post conditions of structured programs

Sample Proof: Bubble Sort

BubbleSort (A : array[1..n] of int) {
  B = A : array[1..n] of int;
  for (i=0; i<n; i++) {
    Invariant: Permute(A,B); Sorted(B[n-i+1..n]); for 0<k<=n-i and n-i<k'<=n, B[k]<=B[k']
    for (j=1; j<n-i; j++) {
      Invariant: the outer invariant, and for 0<k<j, B[k]<=B[j]
      if (B[j]>B[j+1]) swap(B,j,j+1)
    }
  };
  return B;
}

Key to proof: Finding suitable loop invariants
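The role of the loop invariants can be made concrete by checking them at run time. The following is a minimal sketch, not the slide's proof: it restates the invariants for a 0-indexed Python list, and the names `bubble_sort_checked` and `is_sorted` are introduced here for illustration only.

```python
from collections import Counter


def is_sorted(xs):
    """Sorted(xs): every element is <= its right neighbour."""
    return all(xs[k] <= xs[k + 1] for k in range(len(xs) - 1))


def bubble_sort_checked(a):
    """Bubble sort that asserts its loop invariants on every iteration."""
    n = len(a)
    b = list(a)
    for i in range(n):
        # Outer invariant: b is a permutation of a, the last i elements
        # are sorted, and each of them is >= every element before them.
        assert Counter(a) == Counter(b)
        assert is_sorted(b[n - i:])
        assert all(x <= y for x in b[:n - i] for y in b[n - i:])
        for j in range(n - i - 1):
            # Inner invariant: b[j] is a maximum of b[0..j].
            assert all(b[k] <= b[j] for k in range(j))
            if b[j] > b[j + 1]:
                b[j], b[j + 1] = b[j + 1], b[j]
    # Postcondition: Permute(A,B) & Sorted(B)
    assert Counter(a) == Counter(b) and is_sorted(b)
    return b
```

Run-time checks like these only test the invariants on particular inputs; a Hoare-logic proof shows that they hold for all inputs.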

Program Verification
- Powerful mathematical logic (e.g. first-order logic, higher-order logics) needed for formalization
  - Automation extremely difficult
  - Finding proof decomposition requires great expertise
- Alive and well, but not booming
- Contemporary theorem provers: HOL, PVS, ACL2 provide decision procedures and tactics for decomposition
- Main applications: Microprocessor verification, Correctness of JVM…

2. Protocol Analysis
- Automated analysis of finite-state protocols
  - Network protocols, Distributed algorithms
- Great progress in the last 20 years
  - Protocol modeled as communicating finite-state processes
  - Correctness specified using temporal logic
  - Verification performed automatically to reveal errors
  - Highly optimized state-space search techniques
- Model checker SPIN from Bell Labs
  - ACM Software System Award (2001)
  - Success in finding high-quality bugs in real systems (NASA space shuttle, Lucent's Pathstar switch)

Example: X.21 Communication Protocol

State-space Explosion!
- Analysis is basically a reachability problem in a graph
  - Nodes are states, where each state gives values of all the variables of all the communicating processes
  - An edge represents execution of a single action of one of the processes (asynchronous communication)
- Size of graph grows exponentially in the number of bits required for state encoding, but…
  - Graph is constructed only incrementally, on-the-fly
  - Clever hashing and state compaction techniques
  - Many techniques for exploiting structure: symmetry, data independence, partial order reduction…
  - Millions of states can be explored quickly to reveal bugs
- Great flexibility in modeling
  - Abstract many details, simplify
  - Scale down parameters (buffer size, number of network nodes…)

3. Symbolic Model Checking
- Constraint-based analysis of Boolean systems
  - Cache coherency protocols, Memory controllers,…
- Active in the past 12 years
  - Symbolic Boolean representations (propositional formulas, BDDs) used to encode system dynamics
  - Correctness specified using temporal logic CTL
  - Fix-point computation over state sets
  - Highly optimized memory management
- Model checker SMV from CMU
  - ACM Kanellakis Theory and Practice Award (1999)
  - Success in finding high-quality bugs in hardware applications (VHDL/Verilog code)

Cache consistency: Gigamax
- Real design of a distributed multiprocessor
- Similar successes: IEEE Futurebus+ standard, IBM/Intel/Motorola…
- Deadlock found using SMV
[Figure: Gigamax architecture with components labeled M, P, and UIC on cluster buses joined by a global bus; bus commands: read-shared/read-owned/write-invalid/write-shared/…]

Symbolic Reachability Problem
- Model variables X = {x1, … xn}
  - Each var is of finite type, say, boolean
- Initialization: I(X), a condition over X
- Update: T(X,X'), how new vars X' are related to old vars X as a result of executing one step of the program
- Target set: F(X)
- Computational problem: Can F be satisfied starting with I by repeatedly applying T? (Graph search problem)

Symbolic Solution
Data type: region to represent state-sets
  R := I(X)
  Repeat
    If R intersects F report "yes"
    else if R contains Post(R) report "no"
    else R := R union Post(R)
Post(R(X)) = (Exists X. R(X) and T(X,X'))[X' -> X]
Operations needed: union, intersection, test for inclusion/emptiness, projection, renaming
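The fixpoint loop can be sketched with Python frozensets standing in for the region data type. This is an illustration under stated assumptions: a real symbolic model checker would represent regions as BDDs or formulas rather than explicit sets, and the names `post` and `reachable` are hypothetical.

```python
def post(region, trans):
    """Successors of a region: exists X. R(X) and T(X,X'), renamed to X."""
    return frozenset(t for (s, t) in trans if s in region)


def reachable(init, trans, target):
    """Return True iff some target state is reachable from init via trans."""
    r = frozenset(init)
    target = frozenset(target)
    while True:
        if r & target:        # R intersects F: report "yes"
            return True
        nxt = post(r, trans)
        if nxt <= r:          # R contains Post(R): fixpoint, report "no"
            return False
        r = r | nxt           # R := R union Post(R)


# A two-bit system that cycles 00 -> 01 -> 10 -> 00 and never reaches 11.
TRANS = {((0, 0), (0, 1)), ((0, 1), (1, 0)), ((1, 0), (0, 0))}
```

With this transition relation, `reachable({(0, 0)}, TRANS, {(1, 0)})` reports yes and `reachable({(0, 0)}, TRANS, {(1, 1)})` reports no. The loop terminates because the region grows strictly on every iteration and the state space is finite.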

Binary Decision Diagrams
- Popular representations for Boolean functions
- Key properties:
  - Canonical!
  - Size depends on choice of ordering of variables
  - Operations such as union/intersection are efficient
- Like a decision graph
  - No redundant nodes
  - No isomorphic subgraphs
  - Variables tested in fixed order
[Figure: BDD over variables a, b, c, d for the function (a and b) or (c and d)]
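The two reduction rules and the effect of variable ordering can be seen in a small sketch. This is an assumption-laden illustration, not how production BDD packages work: it builds the diagram by exhaustive Shannon expansion (exponential in the number of variables), and the class `BDD` and helper `bdd_size` are hypothetical names.

```python
class BDD:
    """Reduced ordered BDD built by exhaustive expansion (illustration only)."""

    def __init__(self, order):
        self.order = list(order)  # fixed variable order
        self.unique = {}          # (level, low, high) -> node id

    def mk(self, level, low, high):
        if low == high:                 # rule 1: no redundant nodes
            return low
        key = (level, low, high)
        if key not in self.unique:      # rule 2: no isomorphic subgraphs
            self.unique[key] = len(self.unique) + 2  # ids 0, 1 are terminals
        return self.unique[key]

    def build(self, f, level=0, env=None):
        env = {} if env is None else env
        if level == len(self.order):
            return 1 if f(env) else 0
        var = self.order[level]
        low = self.build(f, level + 1, {**env, var: False})
        high = self.build(f, level + 1, {**env, var: True})
        return self.mk(level, low, high)

    def size(self, root):
        """Number of internal nodes reachable from root."""
        by_id = {nid: key for key, nid in self.unique.items()}
        seen, todo = set(), [root]
        while todo:
            n = todo.pop()
            if n in (0, 1) or n in seen:
                continue
            seen.add(n)
            _, low, high = by_id[n]
            todo += [low, high]
        return len(seen)


def f(env):
    return (env["a"] and env["b"]) or (env["c"] and env["d"])


def bdd_size(order):
    bdd = BDD(order)
    return bdd.size(bdd.build(f))
```

For the function on the slide, `bdd_size("abcd")` gives 4 internal nodes while `bdd_size("acbd")` gives 6, so the ordering matters even at this scale. Canonicity shows up as equal node ids: building a syntactically different but equivalent formula in the same `BDD` object returns the same root.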

Symbolic Search Techniques
- Size of BDDs can explode during search, and is quite unpredictable
  - Years of research leading to a plethora of heuristics
- Significant industrial interest
  - In-house groups: Cadence, Synopsys, IBM, NEC…
  - Commercial model checkers/verification consultants
- Recent focus: SAT solvers
  - Checking whether F can be reached within k steps can be formulated as satisfiability of a propositional formula with nk variables
  - Extremely fast solvers such as zChaff (from Princeton) can solve problems with 1000 vars fast!
  - SAT + BDD can be combined to great effect

4. Software Model Checking via Abstraction
- Can we apply model checking to C programs?
  - SPIN approach is fine for analyzing models, but constructing models is expensive, and models have no relation to code
- Given a program P, build an abstract finite-state (Boolean) model A such that the set of behaviors of P is a subset of those of A (conservative abstraction)
  - Basic ideas around for a while, but all components put together effectively only recently by the Microsoft Research team in the project SLAM
  - Shown to be effective on Windows device drivers, Linux source code (about 10K lines of code)

Program Abstraction
Concrete program:
  int x, y;
  if x>0 { … y:=x+1 … }
  else { … y:=x+1 … }
Abstract program:
  bool bx, by;
  if bx { … by:=true … }
  else { … by:={true,false} … }
Predicate Abstraction: bx : x>0; by : y>0
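That the Boolean program over-approximates the concrete one can be checked by brute force on a range of inputs. A minimal sketch, assuming the elided parts of the program do not touch x or y; the function names are introduced here for illustration.

```python
def concrete_step(x):
    """Both branches of the concrete program execute y := x + 1."""
    return x + 1


def abstract_step(bx):
    """Abstract program: the set of possible values of by, given bx."""
    return {True} if bx else {True, False}


def abstraction_is_conservative(xs):
    """Each concrete outcome (y > 0) must be allowed by the abstract step."""
    return all((concrete_step(x) > 0) in abstract_step(x > 0) for x in xs)
```

Here `abstraction_is_conservative(range(-1000, 1001))` holds. The nondeterministic choice in the else branch is needed: with bx false, x = 0 gives y = 1 (by true) while x = -5 gives y = -4 (by false).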

Verification Example
Does this code obey the locking spec?
  do {
    KeAcquireSpinLock();
    nPacketsOld = nPackets;
    if(request){
      request = request->Next;
      KeReleaseSpinLock();
      nPackets++;
    }
  } while (nPackets != nPacketsOld);
  KeReleaseSpinLock();
[Figure: Specification automaton with states Unlocked, Locked, and Error, and transitions labeled Acq and Rel]
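The locking spec can be read as a small monitor automaton. The sketch below assumes the usual reading of the specification figure (acquiring while locked or releasing while unlocked leads to Error); the transition table `SPEC` and the simulation `driver_trace`, which models the request list as a simple counter, are hypothetical and not taken from the slides.

```python
# Assumed transitions of the locking specification.
SPEC = {
    ("Unlocked", "Acq"): "Locked",
    ("Locked", "Rel"): "Unlocked",
    ("Unlocked", "Rel"): "Error",   # double release
    ("Locked", "Acq"): "Error",     # double acquire
}


def run_spec(events):
    """Feed a sequence of Acq/Rel events to the monitor; return its state."""
    state = "Unlocked"
    for event in events:
        state = SPEC[(state, event)]
        if state == "Error":
            break
    return state


def driver_trace(requests):
    """Acq/Rel events produced by the loop when `requests` items are queued."""
    events, n_packets = [], 0
    while True:
        events.append("Acq")            # KeAcquireSpinLock()
        n_packets_old = n_packets
        if requests > 0:                # if (request)
            requests -= 1               # request = request->Next
            events.append("Rel")        # KeReleaseSpinLock()
            n_packets += 1
        if n_packets == n_packets_old:  # loop exits when nothing was sent
            break
    events.append("Rel")                # final KeReleaseSpinLock()
    return events
```

Every simulated run ends in Unlocked, for example `run_spec(driver_trace(2))`. The sequence Acq, Rel, Rel drives the monitor to Error, but the concrete loop never produces it, because releasing inside the loop always changes nPackets and forces another iteration.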

Initial Abstraction
  do {
    KeAcquireSpinLock();
    if(*){
      KeReleaseSpinLock();
    }
  } while (*);
  KeReleaseSpinLock();
Model checking boolean program, using BDDs
[Figure: program points annotated with the reachable lock states U, L, and E]

Feasibility Analysis
Is error path feasible in C program? Requires theorem prover for constraint propagation.
  do {
    KeAcquireSpinLock();
    nPacketsOld = nPackets;
    if(request){
      request = request->Next;
      KeReleaseSpinLock();
      nPackets++;
    }
  } while (nPackets != nPacketsOld);
  KeReleaseSpinLock();
[Figure: program points annotated with the reachable lock states U, L, and E]

Predicate Discovery
Add new predicate to boolean program (new techniques): b : (nPacketsOld == nPackets)
  do {
    KeAcquireSpinLock();
    nPacketsOld = nPackets;              b = true;
    if(request){
      request = request->Next;
      KeReleaseSpinLock();
      nPackets++;                        b = b ? false : *;
    }
  } while (nPackets != nPacketsOld);     !b
  KeReleaseSpinLock();
[Figure: program points annotated with the reachable lock states U, L, and E]

Revised Abstraction
b : (nPacketsOld == nPackets)
  do {
    KeAcquireSpinLock();
    b = true;
    if(*){
      KeReleaseSpinLock();
      b = b ? false : *;
    }
  } while ( !b );
  KeReleaseSpinLock();
Model checking refined boolean program
[Figure: program points annotated with the reachable lock states U and L together with the value of b; the Error state is no longer reachable]

Abstraction Based Techniques
- Tools for verifying source code combine many techniques
  - Program analysis techniques such as slicing
  - Abstraction
  - Model checking
  - Refinement from counter-examples
- New challenges for model checking (beyond finite-state reachability analysis)
  - Recursion gives pushdown control
  - Pointers, dynamic creation of objects, inheritance…
- A very active and emerging research area

Research in Formal Methods
[Figure: verifier diagram (software, model, correctness specification; outputs proof or bug) annotated with research topics]
- Decision procedures; Algorithms engineering; Automated abstraction; Compositional analysis
- Temporal logics; Automata; From requirements to specs
- Modeling languages; Hierarchy, recursion; Real-time, Hybrid; Stochastic
- Bridging the gap: Model extraction; Model-based design: from models to code

Current Research Projects
- Foundations
  - Analysis of context-free models
  - Stochastic hybrid systems
  - Decision problems for timed automata
- Algorithms Engineering
  - Combining SAT, BDDs, Abstraction
  - Symbolic solutions to games
- Model-based design
  - From hybrid automata to embedded software
  - From state-machine models to Java card policies
- Software verification for Java classes

Classical Model Checking
- Both model M and specification S are regular (finite-state)
  - M as a generator of all possible behaviors
  - S as an acceptor of "good" behaviors (verification is language inclusion of M in S) or as an acceptor of "bad" behaviors (verification is checking emptiness of intersection of M and S)
- Typical specifications (using automata or temporal logic)
  - Safety: Always not (both P1 and P2 have write-exclusive copy)
  - Liveness: Always (if P1 requests, eventually it gets response)
- Robustness of theory of regular languages helps in many ways
  - M can be product of several components (closure under intersection)
- For liveness properties, one needs to consider automata over infinite words, but the corresponding theory of omega-regular languages is well developed and well understood

Recursive State Machines
[Figure: component machines A1, A2, A3 with entry points, exit points, and boxes (superstates) that invoke other components]
Boolean Programs:
  main() { bool y; … x = P(y); … z = P(x); … }
  bool P(u: bool) { … return Q(u); }
  bool Q(w: bool) { if … else return P(~w) }

Model Checking of Recursive Models
- Control-flow requires stack, so model M defines a context-free language
- Algorithms exist for checking regular specifications against context-free models
  - Emptiness of pushdown automata is solvable
  - Intersection of a regular language and a context-free language is context-free (product construction)
- But, checking context-free spec against a context-free model is undecidable!
  - Context-free languages are not closed under intersection
  - Inclusion as well as emptiness of intersection undecidable

Are Context-free Specs Interesting?
- Classical Hoare-style pre/post conditions
  - If p holds when procedure A is invoked, q holds upon return
  - Total correctness: every invocation of A terminates
  - Integral part of emerging standard JML
- Stack inspection properties (security/access control)
  - If a variable x is being accessed, procedure A must be in the call stack
- Above requires matching of calls with returns, or finding unmatched calls
  - Recall: Language of words over [, ] such that brackets are well matched is not regular, but context-free
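A stack inspection property can be checked on a single execution trace with an explicit call stack, which is exactly the unbounded memory a finite-state monitor lacks. This is a sketch under assumptions: the trace encoding as (event, name) pairs and the function name `violates_stack_inspection` are introduced here for illustration.

```python
def violates_stack_inspection(trace, var="x", required="A"):
    """Return True iff `var` is accessed while `required` is not on the stack.

    `trace` is a list of (event, name) pairs with event in
    {"call", "ret", "access"}.
    """
    stack = []
    for event, name in trace:
        if event == "call":
            stack.append(name)          # push on every call
        elif event == "ret":
            if stack:
                stack.pop()             # a return matches the latest call
        elif event == "access" and name == var:
            if required not in stack:   # A has no pending invocation
                return True
    return False


GOOD = [("call", "main"), ("call", "A"), ("access", "x"),
        ("ret", "A"), ("ret", "main")]
BAD = [("call", "main"), ("call", "A"), ("ret", "A"),
       ("access", "x"), ("ret", "main")]
```

`GOOD` satisfies the property; `BAD` violates it because A has already returned when x is accessed. Telling the two apart requires matching each return with its call.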

Caret for Context-free Specifications
- Caret: Temporal Logic of Calls and Returns [AEM03]
  - Context-free extension of Pnueli's Linear Temporal Logic LTL
  - Allows specification of pre/post conditions
  - Allows specification of stack inspection properties
- Main result: Checking Caret specifications against a context-free model is decidable
  - Polynomial in the size of the model and exponential in the size of formula (as in case of classical model checking)
  - Proof technique: Product of pushdown model M and Caret specification S is again a pushdown automaton
  - Key to success: The notion of calls and returns is the same for M as well as S

Caret Definition
- Interpreted over "structured" words in which positions are marked with calls { and returns }
- Caret provides classical temporal operators such as Next and Always
[Figure: example word p{q{rp rq{ppp}rq}p p, with positions marked q' where q' = Next(q) and p' where p' = Always(p or q)]

Caret Abstract Operators
- Abstract versions of operators jump from a call to the matching return
- Sample specification, pre/post: Always( p & call -> abstract-next q )
[Figure: example word p{q{rp rq{ppp}rq}p p, with positions marked q' where q' = abstract-next(q) and p' where p' = abstract-always(p or q)]
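The jump from a call to its matching return can be computed with a stack in one pass over the word. The sketch below covers only this call-to-return matching, not the full semantics of the abstract operators; the function name `matching_returns` is hypothetical.

```python
def matching_returns(word):
    """Map the position of each call '{' to the position of its return '}'."""
    match, stack = {}, []
    for pos, sym in enumerate(word):
        if sym == "{":
            stack.append(pos)            # remember the pending call
        elif sym == "}" and stack:
            match[stack.pop()] = pos     # innermost pending call returns
    return match
```

For the word `p{q{r}q}p`, the call at position 1 is matched with the return at position 7 and the call at position 3 with the return at position 5. Calls that never return do not appear in the result.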

Visibly Pushdown Languages [AM03]
- Subclass of context-free languages that is suitable for program analysis / algorithmic verification
- Alphabet is structured: Symbols are tagged with calls and returns
- A visibly pushdown automaton's moves are constrained by input
  - If current symbol is a call, it must push
  - If current symbol is a return, it must pop
  - Else it can only update control state
- Class of languages defined by these automata is very robust
  - Closed under union, intersection, complement, Kleene-*
  - Emptiness, inclusion, equivalence decidable
  - Alternative characterizations: Embeddings of regular tree languages, Monadic Second Order theory with a binary matching predicate
- Caret is a subset of visibly pushdown languages
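The input-driven stack discipline can be sketched as a simulator for a deterministic visibly pushdown automaton. This is a simplification under stated assumptions: it rejects a return on an empty stack and accepts only with an empty stack, which restricts it to well-matched words; the dictionary layout and the example automaton `MATCHED` are introduced here for illustration.

```python
def run_vpa(word, vpa):
    """Run a deterministic VPA; the input symbol alone decides push or pop."""
    state, stack = vpa["start"], []
    for sym in word:
        if sym in vpa["calls"]:                       # call: must push
            move = vpa["call"].get((state, sym))
            if move is None:
                return False
            state, pushed = move
            stack.append(pushed)
        elif sym in vpa["returns"]:                   # return: must pop
            if not stack:
                return False
            nxt = vpa["ret"].get((state, sym, stack.pop()))
            if nxt is None:
                return False
            state = nxt
        else:                                         # internal: state only
            nxt = vpa["int"].get((state, sym))
            if nxt is None:
                return False
            state = nxt
    return state in vpa["accepting"] and not stack


# Accepts words over {a, {, }} in which calls and returns are well matched.
MATCHED = {
    "start": "q", "accepting": {"q"},
    "calls": {"{"}, "returns": {"}"},
    "call": {("q", "{"): ("q", "g")},
    "ret": {("q", "}", "g"): "q"},
    "int": {("q", "a"): "q"},
}
```

`run_vpa("a{a{a}}a", MATCHED)` accepts, while `"{a"` and `"}"` are rejected. Because two such automata reading the same word always push and pop at the same positions, a product construction can run them in lockstep on a shared stack shape, which is the intuition behind closure under intersection.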

Synthesis of Behavioral Interfaces
- Behavioral type of a class specifies the allowed sequences of method calls
- Type for a file class may be (open; (read+open)*;close)*
- Can we synthesize this type automatically?
  - Given source code for the class implementation
  - Construct a regular language over the method calls so that a particular exception is never raised
- This is useful for compositional verification also: behavioral interface is a suitable abstraction of the class
- Proposed route (ongoing project)
  - Use abstraction to get a finite-state model
  - Solve a symbolic game to get the most general strategy for invoking methods to keep the abstract model "safe"
  - Extract interface type from the game solution

Behavioral Interface
AbstractList.ListItr:
  public Object next() { … lastRet = cursor++; …}
  public Object prev() { … lastRet = cursor; …}
  public void remove() { if (lastRet==-1) throw new IllegalExc(); … lastRet = -1; …}
  public void add(Object o) { … lastRet = -1; …}
[Figure: interface automaton with states Start, Unsafe, and Safe and transitions labeled add, next, prev, and remove]
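One plausible reading of the interface automaton is a two-state monitor that tracks whether lastRet is -1. The sketch below is an assumption based on the Java fragment, not a transcription of the figure: the function name `accepts` and the exact transitions are hypothetical.

```python
def accepts(calls):
    """Return True iff the call sequence can never make remove() throw.

    State `safe` abstracts the predicate lastRet != -1.
    """
    safe = False                    # initially lastRet == -1 (Unsafe)
    for method in calls:
        if method in ("next", "prev"):
            safe = True             # lastRet is set to a valid index
        elif method == "add":
            safe = False            # lastRet = -1
        elif method == "remove":
            if not safe:
                return False        # would throw: lastRet == -1
            safe = False            # lastRet = -1 after removal
        else:
            raise ValueError(f"unknown method: {method}")
    return True
```

Under this reading, `["next", "remove"]` is allowed, while `["remove"]`, `["next", "remove", "remove"]`, and `["next", "add", "remove"]` are rejected.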

Game in Abstracted Program
[Figure: game graph of the abstracted program, with method calls such as next and prev on the edges]
- From black states, Player0 gets to choose the input method call
- From purple states, Player1 gets to choose a path in the abstract program till the call returns
- Objective for Player0: Ensure error states (from which an exception can be raised) are avoided
- Winning strategy: Correct sequences of method calls
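For an explicitly given game graph, the states from which Player0 can avoid errors forever are found by a backward fixpoint from the error states. This is a minimal explicit-state sketch, whereas the slides propose solving the game symbolically; the function name, the state names, and the tiny example game are all hypothetical.

```python
def player0_winning_region(states0, states1, edges, errors):
    """States from which Player0 can keep the play out of `errors` forever."""
    losing = set(errors)
    changed = True
    while changed:
        changed = False
        for s in (states0 | states1) - losing:
            succs = edges.get(s, set())
            if s in states0:
                # Player0 loses only if every available move is losing.
                bad = bool(succs) and succs <= losing
            else:
                # Player1 wins as soon as one path leads to a losing state.
                bad = bool(succs & losing)
            if bad:
                losing.add(s)
                changed = True
    return (states0 | states1) - losing


# Hypothetical game: Player0 picks next/remove, Player1 runs the method body.
STATES0 = {"unsafe", "safe"}
STATES1 = {"next_u", "remove_u", "next_s", "remove_s", "err"}
EDGES = {
    "unsafe": {"next_u", "remove_u"}, "safe": {"next_s", "remove_s"},
    "next_u": {"safe"}, "remove_u": {"err"},
    "next_s": {"safe"}, "remove_s": {"unsafe"},
}
WIN = player0_winning_region(STATES0, STATES1, EDGES, {"err"})
```

In this example Player0 wins from both `unsafe` and `safe`, provided it never chooses remove in `unsafe`; the moves that stay inside `WIN` form the most general safe strategy.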

Challenges
- Techniques for generating finite-state abstractions
- How to solve large games symbolically?
  - In fact, a partial information game (Player0 should choose the next method call only based on values returned so far)
- How to construct an understandable behavioral type from the winning strategy?
- Abstraction refinement
  - If Player0 does not invoke any method, exceptions can never be raised
  - How to refine the current abstraction based on quality of current behavioral type?
- Integrating all these into a working tool