Automatic Generation of Program Specifications

Presentation transcript:

Automatic Generation of Program Specifications Jeremy Nimmer, MIT Lab for Computer Science, http://pag.lcs.mit.edu/ Joint work with Michael Ernst. Today I will present my research on automatic generation and checking of program specifications.

Synopsis Specifications are useful for many tasks Use of specifications has practical difficulties Dynamic analysis can capture specifications Recover from existing code Infer from traces Results are accurate (90%+) Specification matches implementation The main idea is the following. We know that specifications are useful for a variety of reasons. They can provide documentation, can be used to check assumptions, or can bootstrap proofs. However, specifications also present practical difficulties which hinder their use, such as the human cost of writing and checking them. This research suggests that dynamic analysis is able to recover specifications from existing code, that the results are accurate, and that the results are useful to users.

Outline Motivation Approach: Generate and check specifications Evaluation: Accuracy experiment Conclusion To illustrate this idea, I will present a brief motivation for this research, followed by a description of our approach. I will then present the results of an accuracy experiment, and conclude.

Advantages of specifications Describe behavior precisely Permit reasoning using summaries Can be verified automatically Specifications are a valuable part of any software project. They can precisely describe intended behavior, can permit reasoning using summaries of the code instead of the code itself, and can enable automatic verification of the code.

Problems with specifications Describe behavior precisely Tedious and difficult to write and maintain Permit reasoning using summaries Must be accurate if used in lieu of code Can be verified automatically Verification may require uninteresting annotations However, practical use of specifications also presents problems. Precise specifications can be tedious and difficult to write and maintain. If used in lieu of code, specifications must be accurate. Finally, automatic verification may require writing properties that are not part of the desired specification.

Solution Automatically generate and check specifications from the code These three problems can be solved by automatically generating and checking specifications from the code. The user supplies source code. The system produces a specification and a proof that the implementation meets it.

Solution scope Generate and check “complete” specifications Very difficult Generate and check partial specifications Nullness, types, bounds, modification targets, ... Need not operate in isolation User might have some interaction Goal: decrease overall effort Producing complete specifications that fully characterize the input-output behavior of program components is very hard. Instead, the system will produce partial specifications which only describe certain properties of the code. They constrain the behavior, but do not fully characterize it. Furthermore, we allow that the system need not be perfect. It is not meant to be used in isolation, but will be used by programmers. Therefore, asking the user to do a little work is acceptable, as long as there is benefit to doing so.
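To make "partial" concrete, here is a hypothetical method (not one from the talk) with a partial specification in the ESC/Java annotation style used throughout this work; it pins down nullness and bounds but says nothing about which value is returned:

  /*@ requires a != null;
      requires 0 <= i && i < a.length; */
  Object get(Object[] a, int i) {
    return a[i];  // partial spec: constrains the inputs, not the result value
  }

A complete specification would additionally have to relate the result to the array contents.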

Outline Motivation Approach: Generate and check specifications Evaluation: Accuracy experiment Conclusion We create specifications by generating them from the source code, and then checking that they are correct.

Previous approaches Generation: by hand; static analysis. Checking: non-executable models. There are a number of previous approaches, but all are generally lacking. Some have generated specifications by hand, but work by hand is tedious and error-prone. Some have used static analysis, but static analyses are often not powerful enough to produce useful results. Lighter-weight static checking often requires tedious annotation of programs. Some have checked specifications through non-executable models of the code, but models still have to be matched to the code, which is a difficult task in itself.

Our approach Dynamic detection proposes likely properties Static checking verifies properties Combining the techniques overcomes the weaknesses of each Ease annotation Guarantee soundness In our approach, we use dynamic detection to propose likely invariants, and static checking to verify them. Dynamic analysis may produce incorrect properties, so its results should be checked to increase confidence. Now I’ll explain each of the two parts in turn.

Daikon: Dynamic invariant detection Look for patterns in values the program computes: Instrument the program to write data trace files Run the program on a test suite Invariant detector reads data traces, generates potential invariants, and checks them The first part -- dynamic invariant detection -- looks for patterns in values the program computes at runtime. We use the Daikon invariant detector to perform this task. The program is instrumented and run over a test suite. This produces a trace database containing variable values. This database is processed by an invariant detector which generates a list of potential invariants, and checks which were true over the test suite. All of these steps are automatic, and the output is a specification that describes the program as it ran.
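As a minimal sketch of this idea (this is not Daikon's actual code; the trace values, candidate grammar, and class name are all illustrative), each candidate invariant is tested against every value in the trace, and falsified candidates are discarded:

  import java.util.LinkedHashMap;
  import java.util.Map;
  import java.util.function.IntPredicate;

  public class InvariantSketch {
    public static void main(String[] args) {
      // Hypothetical trace: values of topOfStack observed at method exits.
      int[] trace = {-1, 0, 1, 2, 1, 0};

      // Candidate invariants instantiated from a fixed grammar of patterns.
      Map<String, IntPredicate> candidates = new LinkedHashMap<>();
      candidates.put("topOfStack >= -1", v -> v >= -1);
      candidates.put("topOfStack >= 0",  v -> v >= 0);
      candidates.put("topOfStack == -1", v -> v == -1);

      // Discard any candidate falsified by an observed value.
      for (int v : trace) {
        candidates.values().removeIf(p -> !p.test(v));
      }
      // Prints: Likely invariants: [topOfStack >= -1]
      System.out.println("Likely invariants: " + candidates.keySet());
    }
  }

The real tool instantiates a much larger grammar over all variables in scope at each program point, and reports only invariants that pass a confidence test.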

ESC/Java: Invariant checking
ESC/Java: Extended Static Checker for Java
Lightweight technology: intermediate between type-checker and theorem-prover; unsound
Intended to detect array bounds and null dereference errors, and annotation violations
Modular: checks, and relies on, specifications

/*@ requires x != null */
/*@ ensures this.a[this.top] == x */
void push(Object x);

We use the Extended Static Checker for Java to perform the static checking. ESC/Java is a lightweight technology, intermediate between a type-checker and a theorem-prover. Its goal is to detect array bounds or null dereference errors, but it also reports when user-written assertions may not hold. Users annotate code with comments like these, which state a precondition that the argument is not null, and a postcondition that the argument is stored at the top of the stack. ESC/Java is a modular checker: it checks, and relies on, specifications. When checking a procedure, it assumes the preconditions and proves that the postconditions hold. When checking a call site, it proves that the preconditions are met and assumes the postconditions. ESC/Java is unsound; for instance, it does not model arithmetic overflow. However, we have chosen it because it is the tool with the best integration with a real programming language.
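A hypothetical fragment illustrating this modular checking (the client code is invented for the example):

  /*@ requires x != null */
  void push(Object x);

  void client(StackAr s, Object o) {
    s.push(o);       // warning: precondition x != null possibly violated
    if (o != null) {
      s.push(o);     // verifies: the guard establishes the precondition
    }
  }

At the first call ESC/Java cannot prove the precondition, so it reports a warning; at the second, the null test establishes it, and the postcondition is then assumed on behalf of the caller.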

Integration approach Run Daikon over target program Insert results into program as annotations Run ESC/Java on the annotated program All steps are automatic. Our approach to integrating these tools is simple. We run Daikon over the target program; insert its results into the program as annotations; and run ESC/Java on the annotated program. All of these steps are automatic.
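Concretely, with a current Daikon distribution the pipeline looks roughly as follows. The command names come from the Daikon manual; the class and file names are illustrative, and the tooling used in the original experiments differed in its details:

  # Instrument the program, run its test suite, and detect invariants:
  java daikon.Chicory --daikon DataStructures.StackArTester

  # Print the detected invariants in ESC/Java annotation format:
  java daikon.PrintInvariants --format esc StackArTester.inv.gz

  # Finally, merge the /*@ ... */ annotations into the source files
  # and run ESC/Java on the annotated program.

With that background in mind, let's look at an example of a specification generated and checked by our system.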

Stack object invariants

public class StackAr {
  Object[] theArray;
  int topOfStack;
  ...
}

/*@ invariant theArray != null;
    invariant \typeof(theArray) == \type(Object[]);
    invariant topOfStack >= -1;
    invariant topOfStack < theArray.length;
    invariant theArray[0..topOfStack] != null;
    invariant theArray[topOfStack+1..] == null;
*/

These are the object invariants our system generated for the StackAr class: the array is non-null and has runtime type Object[], topOfStack stays within bounds (with -1 denoting an empty stack), the live slots [0..topOfStack] hold non-null elements, and the remaining slots are null.
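Here is a sketch of an implementation consistent with these invariants (StackAr is drawn from a data structures textbook; the method bodies below are illustrative):

  public class StackAr {
    Object[] theArray;
    int topOfStack;

    public StackAr(int capacity) {
      theArray = new Object[capacity];  // every slot starts out null
      topOfStack = -1;                  // empty stack: topOfStack == -1
    }

    public void push(Object x) {        // precondition: x != null
      theArray[++topOfStack] = x;       // keeps theArray[0..topOfStack] != null
    }

    public Object topAndPop() {
      Object result = theArray[topOfStack];
      theArray[topOfStack--] = null;    // null out the popped slot, preserving
      return result;                    // theArray[topOfStack+1..] == null
    }
  }

Nulling out the popped slot is what maintains the last invariant, and it is also what allows popped elements to be garbage-collected, a property the summary slide returns to.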

Stack push method

/*@ requires x != null;
    requires topOfStack < theArray.length - 1;
    modifies topOfStack, theArray[*];
    ensures topOfStack == \old(topOfStack) + 1;
    ensures x == theArray[topOfStack];
    ensures theArray[0..\old(topOfStack)] == \old(theArray[0..topOfStack]);
*/
public void push( Object x ) { ... }

This is an example of one method's specification as generated by our system: push requires a non-null argument and spare capacity, increments topOfStack, places the argument on top, and leaves the elements below unchanged.

Stack summary ESC/Java verified all 25 Daikon invariants Reveal properties of the implementation (e.g., garbage collection of popped elements) No runtime errors if callers satisfy preconditions Implementation meets generated specification Daikon generated 25 invariants, and ESC/Java verified all of them. Now that you have some background on our approach, I can show some of the evaluation we’ve done.

Outline Motivation Approach: Generate and check specifications Evaluation: Accuracy experiment Conclusion To evaluate our approach, we performed an experiment to study the accuracy of the generated specifications.

Accuracy experiment Dynamic generation is potentially unsound How accurate are its results in practice? Combining static and dynamic analyses should produce benefits But perhaps their domains are too dissimilar? Dynamic specification generation is potentially unsound, but we were curious how accurate its results were in practice. Additionally, we speculated that combining static and dynamic analyses should produce benefits surpassing what each analysis could do on its own; but perhaps the kinds of properties that each is able to describe are too different to see any benefit from the combination. We investigated these questions with an experiment.

Programs studied 11 programs from libraries, assignments, texts Total 2449 NCNB LOC in 273 methods Test suites Used program's test suite if provided (9 did) If just example calls, spent <30 min. enhancing ~70% statement coverage We studied 11 small programs in the experiment. Programs were drawn from a data structures textbook, the Java class library, or solutions to an MIT programming course. Together they were over 2400 non-comment, non-blank (NCNB) lines of code.

Accuracy measurement Compare generated specification to a verifiable specification:

  invariant theArray[0..topOfStack] != null;
  invariant theArray[topOfStack+1..] == null;
  invariant theArray != null;
  invariant topOfStack >= -1;
  invariant topOfStack < theArray.length;
  invariant theArray[0..length-1] == null;

We ran these programs through the system and measured the result by comparing it to the nearest verifiable specification. The result may have been verifiable immediately. If not, we edited it so that it verified. We compare with a verifiable answer because it is a good measure of how much work needs to be done by a user of the system, and because it is objective. We use standard measures from information retrieval [Sal68, vR79]: Precision (correctness): 3 / 4 = 75% Recall (completeness): 3 / 5 = 60%
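As a small self-contained sketch of the computation (the set elements are placeholders, chosen only to reproduce the 3/4 and 3/5 figures above):

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Set;

  public class Accuracy {
    public static void main(String[] args) {
      // Invariants Daikon reported; X stands for one that does not verify.
      Set<String> reported = new HashSet<>(Arrays.asList("A", "B", "C", "X"));
      // Invariants in the nearest verifiable spec; D and E were missed.
      Set<String> verifiable = new HashSet<>(Arrays.asList("A", "B", "C", "D", "E"));

      Set<String> correct = new HashSet<>(reported);
      correct.retainAll(verifiable);  // intersection: {A, B, C}

      System.out.println("precision = " + correct.size() + "/" + reported.size());   // 3/4
      System.out.println("recall    = " + correct.size() + "/" + verifiable.size()); // 3/5
    }
  }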

Experiment results Daikon reported 554 invariants Precision: 96% of reported invariants verified Recall: 91% of necessary invariants were reported The results of this experiment were surprisingly accurate.

Causes of inaccuracy Limits on tool grammars Daikon: May not propose relevant property ESC: May not allow statement of relevant property Incompleteness in ESC/Java Always need programmer judgment Insufficient test suite Shows up as overly-strong specification Verification failure highlights problem; helpful in fixing System tests fared better than unit tests The remaining inaccuracies had three causes: limits on the tools' grammars, incompleteness in ESC/Java, and insufficient test suites.

Experiment conclusions Our dynamic analysis is accurate Recovered partial specification Even with limited test suites Enabled verifying lack of runtime exceptions Specification matches the code Results should scale Larger programs dominate results Approach is class- and method-centric What can we learn from these results? Many people believe that the results of a dynamic analysis will be terribly imprecise. But in fact, dynamic analysis accurately recovered a partial specification which was enough to prove the absence of runtime exceptions. We show this for small programs and tiny test suites. However, since the techniques are class- and method-centric, the accuracy should scale well to larger programs.

Value to programmers Generated specifications are accurate Are the specifications useful? How much does accuracy matter? How does Daikon compare with other annotation assistants? Answers at FSE'02 We were also interested in seeing how Daikon's accuracy affects users; our FSE'02 paper answers these questions.

Outline Motivation Approach: Generate and check specifications Evaluation: Accuracy experiment Conclusion To conclude, ...

Conclusion Specifications via dynamic analysis Accurately produced from limited test suites Automatically verifiable (minor edits) Specification characterizes the code Unsound techniques useful in program development To conclude, I have shown that specifications generated via dynamic analysis can be accurately produced using limited test suites. In other words, the results of our unsound analysis are very close to correct. The result is nearly machine-verifiable. Verification guarantees that the code encounters no runtime exceptions, and that the specification correctly characterizes the code. In a broader context, our results suggest that domains of analysis, such as specification and verification, that have traditionally been dominated by sound analyses are also amenable to an unsound, dynamic analysis.

Questions? (The slides that follow are extras, separate from the main part of the talk.)

Formal specifications Precise, mathematical description of behavior [LG01] (Another type of spec: requirements documents) Standard definition; novel use: generated after the implementation Still useful to produce [PC86] Many specifications exist for a program, depending on the task (e.g., runtime performance) The paper carefully explains our views.

Effect of bugs Case 1: Bug is exercised by test suite Falsifies one or more invariants Weaker specification May cause verification to fail Case 2: Bug is not exercised by test suite Not reflected in specification Code and specification disagree Verifier points out inconsistency
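To make case 1 concrete, consider a hypothetical buggy method (the code and the properties named below are invented for illustration):

  // BUG: the loop bound should be i < a.length, so the last element is skipped.
  static int max(int[] a) {
    int m = a[0];
    for (int i = 1; i < a.length - 1; i++) {
      if (a[i] > m) m = a[i];
    }
    return m;
  }

If some test's maximum sits in the last position, that run falsifies any candidate postcondition saying the result is at least every element, and only weaker properties survive; verification of a caller that relies on the stronger property then fails, drawing attention to the bug. If no test exercises the bug (case 2), the stronger property survives in the specification even though the code does not satisfy it, and the verifier reports the inconsistency between code and specification.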