Automatic Generation of Program Specifications Jeremy Nimmer MIT Lab for Computer Science http://pag.lcs.mit.edu/ Joint work with Michael Ernst [Be excited! Have a twinkle in my eye! This is good research; it is exciting; share the fun I had and enthusiasm I feel.] [Slow down! I will have enough time. Take a breath. Pause after main ideas.] [Make eye contact. Don’t say “so”or other filler words.] Today I will present my research on automatic generation and checking of program specifications.
Synopsis Specifications are useful for many tasks Use of specifications has practical difficulties Dynamic analysis can capture specifications Recover from existing code Infer from traces Results are accurate (90%+) Specification matches implementation The main idea is the following. We know that specifications are useful for a variety of reasons. They can provide documentation, can be used to check assumptions, or can bootstrap proofs. However, specifications also present practical difficulties which hinder their use, such as the human cost of writing and checking them. This research suggests that dynamic analysis is able to recover specifications from existing code, that the results are accurate, and that the results are useful to users.
Outline Motivation Approach: Generate and check specifications Evaluation: Accuracy experiment Conclusion To illustrate this idea, I will present a brief motivation for this research, followed by a description of our approach. I will also present the results from two experiments, and then conclude.
Advantages of specifications Describe behavior precisely Permit reasoning using summaries Can be verified automatically Specifications are a valuable part of any software project. They can precisely describe intended behavior, can permit reasoning using summaries of the code instead of the code itself, and can enable automatic verification of the code.
Problems with specifications Describe behavior precisely Tedious and difficult to write and maintain Permit reasoning using summaries Must be accurate if used in lieu of code Can be verified automatically Verification may require uninteresting annotations However, practical use of specifications also presents problems. Precise specifications can be tedious and difficult to write and maintain. If used in lieu of code, specifications must be accurate. Finally, automatic verification may require writing properties that are not part of the desired specification.
Solution Automatically generate and check specifications from the code These three problems can be solved by automatically generating and checking specifications from the code. The user supplies source code. The system produces a specification and a proof that the implementation meets it.
Solution scope Generate and check “complete” specifications Very difficult Generate and check partial specifications Nullness, types, bounds, modification targets, ... Need not operate in isolation User might have some interaction Goal: decrease overall effort Producing complete specifications that fully characterize the input-output behavior of program components is very hard. Instead, the system will produce partial specifications which only describe certain properties of the code. They constrain the behavior, but do not fully characterize it. Furthermore, we allow that the system need not be perfect. It is not meant to be used in isolation, but will be used by programmers. Therefore, asking the user to do a little work is acceptable, as long as there is benefit to doing so.
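To make "partial specification" concrete, here is a hypothetical method annotated in the ESC/Java style used later in the talk; the method, its parameters, and the clauses are invented for illustration. The annotations pin down nullness, an index bound, and a modification target, but deliberately say nothing about the full input-output behavior.

    /*@ requires name != null;                  // nullness
        requires 0 <= index && index < size;    // bounds
        modifies entries[index];                // modification target
        ensures \result != null;                // partial: constrains, but does not define, the result
      */
    Object replace(int index, String name) { ... }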
Outline Motivation Approach: Generate and check specifications Evaluation: Accuracy experiment Conclusion We create specifications by generating them from the source code, and then checking that they are correct.
Previous approaches Generation: by hand, static analysis Checking: non-executable models There are a number of previous approaches, but all are generally lacking. Some have generated specifications by hand, but work by hand is tedious and error-prone. Some have used static analysis, but static analyses are often not powerful enough to produce useful results. Lighter-weight static checking often requires tedious annotation of programs. Some have checked specifications through non-executable models of the code, but models still have to be matched to the code, which is a difficult task in itself.
Our approach Dynamic detection proposes likely properties Static checking verifies properties Combining the techniques overcomes the weaknesses of each Ease annotation Guarantee soundness In our approach, we use dynamic detection to propose likely invariants, and static checking to verify them. Dynamic analysis may produce incorrect properties, so its results should be checked to increase confidence. Now I’ll explain each of the two parts in turn.
Daikon: Dynamic invariant detection Look for patterns in values the program computes: Instrument the program to write data trace files Run the program on a test suite Invariant detector reads data traces, generates potential invariants, and checks them The first part -- dynamic invariant detection -- looks for patterns in values the program computes at runtime. We use the Daikon invariant detector to perform this task. The program is instrumented and run over a test suite. This produces a trace database containing variable values. This database is processed by an invariant detector which generates a list of potential invariants, and checks which were true over the test suite. All of these steps are automatic, and the output is a specification that describes the program as it ran.
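The following is a minimal sketch, in plain Java, of the detect-and-check idea; it is not Daikon's actual code, and the candidate grammar and variable name are invented. Each value recorded in the trace eliminates the candidates it falsifies, and whatever survives the whole trace is reported as a likely invariant.

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.function.IntPredicate;

    public class LikelyInvariants {
        public static void main(String[] args) {
            // Hypothetical candidate invariants over a single variable, say topOfStack.
            Map<String, IntPredicate> candidates = new LinkedHashMap<>();
            candidates.put("topOfStack >= -1", v -> v >= -1);
            candidates.put("topOfStack >= 0",  v -> v >= 0);
            candidates.put("topOfStack == 0",  v -> v == 0);

            int[] trace = {0, 1, 2, 1, 0, -1};                  // values recorded at runtime
            for (int v : trace) {
                candidates.values().removeIf(p -> !p.test(v));  // drop falsified candidates
            }
            System.out.println(candidates.keySet());            // prints [topOfStack >= -1]
        }
    }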
ESC/Java: Invariant checking ESC/Java: Extended Static Checker for Java Lightweight technology: intermediate between type-checker and theorem-prover; unsound Intended to detect array bounds and null dereference errors, and annotation violations Modular: checks, and relies on, specifications /*@ requires x != null */ /*@ ensures this.a[this.top] == x */ void push(Object x); We use the Extended Static Checker for Java to perform the static checking. ESC/Java is a lightweight technology, intermediate between a type-checker and a theorem-prover. Its goal is to detect array bounds or null dereference errors, but it also reports when user-written assertions may not hold. Users annotate code with comments like these, which state a precondition that the argument is not null, and a postcondition that the argument is stored at the top of the stack. // [[[ breathe ]]] ESC/Java is a modular checker: it checks and relies on specifications. When checking a procedure, it assumes the preconditions and proves that the postconditions hold. When checking a call site, it proves that the preconditions are met and assumes the postconditions. ESC/Java is unsound. For instance, it does not model arithmetic overflow. However, we have chosen it since it is the tool with the best integration with a real programming language.
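To illustrate modular checking, here is a hypothetical caller of the push method specified above (the caller itself is invented; it assumes the StackAr class shown on the next slides). At the call site, ESC/Java must establish push's preconditions from what it knows locally, and afterwards it may assume push's postconditions without re-examining push's body.

    class StackClient {
        void addElement(StackAr s, Object x) {
            if (x == null) return;                          // establishes: x != null
            if (s.topOfStack < s.theArray.length - 1) {     // establishes: room remains
                s.push(x);
                // Here the checker assumes the postconditions,
                // e.g. x == s.theArray[s.topOfStack].
            }
        }
    }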
Integration approach Run Daikon over target program Insert results into program as annotations Run ESC/Java on the annotated program All steps are automatic. Our approach to integrating these tools is simple. We run Daikon over the target program; insert its results into the program as annotations; and run ESC/Java on the annotated program. All of these steps are automatic. // ======= With that background in mind, let’s look at an example of a specification generated and checked by our system.
Stack object invariants public class StackAr { Object[] theArray; int topOfStack; ... /*@ invariant theArray != null; invariant \typeof(theArray) == \type(Object[]); invariant topOfStack >= -1; invariant topOfStack < theArray.length; invariant theArray[0..topOfStack] != null; invariant theArray[topOfStack+1..] == null; */ Here is part of the specification our system generated for StackAr, an array-based stack of objects. The object invariants say that the underlying array is never null and has runtime type Object[], that topOfStack is always a valid index or -1, that every slot in use holds a non-null element, and that every unused slot is null. [Spend time explaining this example. (It is an example, not an experiment)] [Write this out and practice it.]
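A constructor sketch helps explain why these object invariants hold when a StackAr is created; this is an assumed form, not necessarily the textbook's exact source.

    public StackAr(int capacity) {
        theArray = new Object[capacity];   // non-null, runtime type Object[], all slots null
        topOfStack = -1;                   // nothing in use yet, so topOfStack >= -1 holds
    }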
Stack push method public void push( Object x ) { ... } /*@ requires x != null; requires topOfStack < theArray.length - 1; modifies topOfStack, theArray[*]; ensures topOfStack == \old(topOfStack) + 1; ensures x == theArray[topOfStack]; ensures theArray[0..\old(topOfStack)] == \old(theArray[0..topOfStack]); */ This is an example of one method's specification as generated by our system. The preconditions say that the argument is non-null and that there is room in the array; the postconditions say that topOfStack is incremented, that the argument ends up on top, and that the existing elements are unchanged.
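A push body consistent with this specification might look as follows; this is a sketch, not necessarily the textbook's exact code. Given the precondition that room remains, it increments topOfStack, stores the argument there, and leaves the earlier elements untouched, which is exactly what the ensures clauses state.

    public void push(Object x) {
        theArray[++topOfStack] = x;
    }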
Stack summary ESC/Java verified all 25 Daikon invariants Reveal properties of the implementation (e.g., garbage collection of popped elements) No runtime errors if callers satisfy preconditions Implementation meets generated specification Daikon generated 25 invariants, and ESC/Java verified all of them. Now that you have some background on our approach, I can show some of the evaluation we’ve done.
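The "garbage collection of popped elements" property corresponds to the invariant that slots above topOfStack are null. A pop method along the following lines would explain it; again, this is a sketch of an assumed implementation.

    public Object topAndPop() {
        if (topOfStack == -1) return null;     // empty stack
        Object item = theArray[topOfStack];
        theArray[topOfStack--] = null;         // clear the vacated slot so it can be collected
        return item;
    }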
Outline Motivation Approach: Generate and check specifications Evaluation: Accuracy experiment Conclusion To evaluate our approach, we performed an experiment to study the accuracy of the generated specifications.
Accuracy experiment Dynamic generation is potentially unsound How accurate are its results in practice? Combining static and dynamic analyses should produce benefits But perhaps their domains are too dissimilar? Dynamic specification generation is potentially unsound, but we were curious how accurate its results were in practice. Additionally, we speculated that combining static and dynamic analyses should produce benefits surpassing what each analysis could do on its own … but … perhaps the kinds of properties that each is able to describe are too different to see any benefit from the combination. We investigated these questions with an experiment.
Programs studied 11 programs from libraries, assignments, texts Total 2449 NCNB LOC in 273 methods Test suites Used program’s test suite if provided (9 did) If just example calls, spent <30 min. enhancing ~70% statement coverage We studied 11 small programs in the experiment. Programs were drawn from a data structures textbook, the Java class library, or solutions to an MIT programming course. Together they were over 2400 NCNB lines of code.
Accuracy measurement Compare generated specification to a verifiable specification invariant theArray[0..topOfStack] != null; invariant theArray[topOfStack+1..] == null; invariant theArray != null; invariant topOfStack >= -1; invariant topOfStack < theArray.length; invariant theArray[0..length-1] == null; Standard measures from information retrieval [Sal68, vR79] Precision (correctness): 3 / 4 = 75% Recall (completeness): 3 / 5 = 60% We ran these programs through the system and measured the result by comparing it to the nearest verifiable specification. The result may have been verifiable immediately. If not, we edited it so that it verified. We compare with a verifiable answer because it is a good measure of how much work needs to be done by a user of the system, and because it is objective.
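For reference, the two measures work out as follows on the slide's small example, where "reported" is the set of generated invariants and "verifiable" is the set in the nearest verifiable specification:

\[ \text{precision} = \frac{|\text{reported} \cap \text{verifiable}|}{|\text{reported}|} = \frac{3}{4} = 75\% \qquad \text{recall} = \frac{|\text{reported} \cap \text{verifiable}|}{|\text{verifiable}|} = \frac{3}{5} = 60\% \]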
Experiment results Daikon reported 554 invariants Precision: 96% of reported invariants verified Recall: 91% of necessary invariants were reported The results of this experiment were surprisingly accurate.
Causes of inaccuracy Limits on tool grammars Daikon: May not propose relevant property ESC: May not allow statement of relevant property Incompleteness in ESC/Java Always need programmer judgment Insufficient test suite Shows up as overly-strong specification Verification failure highlights problem; helpful in fixing System tests fared better than unit tests Where the output fell short of a verifiable specification, there were three causes. The grammars of the tools are limited: Daikon may not propose a relevant property, and ESC/Java may not allow it to be stated. ESC/Java is incomplete, so some programmer judgment is always needed. Finally, an insufficient test suite shows up as an overly strong specification; the verification failure highlights the problem and helps in fixing it, and system tests fared better than unit tests.
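As a hypothetical illustration of the insufficient-test-suite case: if the test suite never pushed more than a handful of elements, the detector could report an invariant like the one below. It is true over the trace but overly strong in general, and ESC/Java's failure to verify it points back at the weak test suite rather than at a bug in the code. (The specific bound is invented for illustration.)

    /*@ invariant topOfStack <= 5; */   // holds over the trace, but not in general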
Experiment conclusions Our dynamic analysis is accurate Recovered partial specification Even with limited test suites Enabled verifying lack of runtime exceptions Specification matches the code Results should scale Larger programs dominate results Approach is class- and method-centric What can we learn from these results? Many people believe that the results of a dynamic analysis will be terribly imprecise. But in fact, dynamic analysis accurately recovered a partial specification which was enough to prove the absence of runtime exceptions. We show this for small programs and tiny test suites. However, since the techniques are class- and method-centric, the accuracy should scale well to larger programs.
Value to programmers Generated specifications are accurate Are the specifications useful? How much does accuracy matter? How does Daikon compare with other annotation assistants? Answers at FSE'02 [[We were interested in seeing how Daikon’s accuracy affected users.]]
Outline Motivation Approach: Generate and check specifications Evaluation: Accuracy experiment Conclusion To conclude, ...
Conclusion Specifications via dynamic analysis Accurately produced from limited test suites Automatically verifiable (minor edits) Specification characterizes the code Unsound techniques useful in program development To conclude… I have shown that specifications generated via dynamic analysis can be accurately produced using limited test suites. In other words, the results of our unsound analysis are very close to correct. The result is nearly machine verifiable. Verification guarantees that the code encounters no runtime exceptions, and that the specification correctly characterizes the code. In a broader context, our results suggest that domains of analysis, such as specification and verification, that have traditionally been dominated by sound analyses are also amenable to an unsound, dynamic analysis. [ Ask audience for questions. ]
Questions? [ Do not show this; the point of this slide is to separate my extra slides from the main part of the talk. ]
Formal specifications Precise, mathematical desc. of behavior [LG01] (Another type of spec: requirements documents) Standard definition; novel use Generated after implementation Still useful to produce [PC86] Many specifications for a program Depends on task e.g. runtime performance Paper carefully explains our views.
Effect of bugs Case 1: Bug is exercised by test suite Falsifies one or more invariants Weaker specification May cause verification to fail Case 2: Bug is not exercised by test suite Not reflected in specification Code and specification disagree Verifier points out inconsistency