Quickly Detecting Relevant Program Invariants

Presentation transcript:

Quickly Detecting Relevant Program Invariants
Michael Ernst, Adam Czeisler, Bill Griswold (UCSD), and David Notkin
University of Washington
http://www.cs.washington.edu/homes/mernst/daikon
[Be excited! Have a twinkle in my eye! This is good research; it is exciting; share the fun I had and enthusiasm I feel.] [Smile when I make a joke -- don’t be too dry, don’t sound mean.] [Don’t go too fast through first part of talk.] [Make eye contact.] This is a technique for helping programmers to cope with inadequately documented programs. Invite questions and comments. Daikon: an Asian radish (not the Asian radish) Perhaps don’t say this, but this is but a summary: I can’t talk about every aspect of this research, but welcome questions either now or in individual meetings. 30-minute slot, assume 25 minutes for talk itself.

Overview
Goal: improve dynamic invariant detection [ICSE 99, TSE]
Relevance improvements:
  add desired invariants (2 techniques)
  eliminate undesired ones (3 techniques)
Experiments validate the success
Undesired invariants are neither checked nor reported
I won’t talk so much about the experiments. (See the paper.)

Program invariants
Detect invariants (as in asserts or specifications):
  x > abs(y)
  x = 16*y + 4*z + 3
  array a contains no duplicates
  for each node n, n = n.child.parent
  graph g is acyclic
Say “at a program point.” …or in formal specifications. (The last three are not all that likely in an assert statement…) Segue: why would we want to find them? Now I will motivate that.
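To make these concrete, here is a minimal illustrative sketch (not from the talk) of how the first three properties could appear as executable Java assertions; the class, method, and parameter names are invented for the example:

    // Hypothetical sketch: the first three invariants above as executable asserts.
    class InvariantExamples {
        static void check(int x, int y, int z, int[] a) {
            assert x > Math.abs(y);          // x > abs(y)
            assert x == 16*y + 4*z + 3;      // linear relationship among x, y, z
            // array a contains no duplicates
            for (int i = 0; i < a.length; i++)
                for (int j = i + 1; j < a.length; j++)
                    assert a[i] != a[j];
        }
    }

(Run with assertions enabled, java -ea; the structural properties on the last two slide lines would need richer data types than this sketch shows.)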

Uses for invariants
Write better programs [Gries 81, Liskov 86]
Document code
Check assumptions: convert to assert
Maintain invariants to avoid introducing bugs
Locate unusual conditions
Validate test suite: value coverage
Provide hints for higher-level profile-directed compilation [Calder 98]
Bootstrap proofs [Wegbreit 74, Bensalem 96]
Invariants are useful in many aspects of program development, including design, coding, verification, testing, optimization, and maintenance. [This is a laundry list of potential uses for invariants. I'm trying to motivate why one would care about invariants.] Better programs when invariants are used in their design. But here I will focus on uses of invariants in already-constructed programs. Documentation is guaranteed to be up-to-date; for assert, is machine-checkable. Invariants established at one point may be depended upon elsewhere. This was the original motivation for the system. There’s evidence that maintenance introduces many bugs (and anecdotal evidence that some are due to violating invariants). The next four are arguably more useful with dynamic than with static invariants: Unusual (exceptional) conditions may indicate bugs or special cases that should be brought to a programmer’s attention. (But finding bugs isn’t my focus or what I consider the tool’s strength, though it was effective at that task. I would rather prevent a bug than have to find it any day.) Value coverage may be inadequate even if a test suite covers every line of a program. [Example: only small values.] Use invariants like profile-directed compilation, but at a higher and possibly more valuable level (cf “Value Profiling and Optimization” by Calder et al). Bootstrap proofs: or, formally prove the invariants. In theorem-proving, many consider it harder to determine what to prove than to actually do the proof. [Don’t say “turn the crank”; it belittles theorem-proving.] [If you prove the property, the system becomes sound: but probably don’t raise this red flag yet.] Now, I’ll try to convince you that this is at least possible.

Dynamic invariant detection is accurate
Recovered formal specifications, found bugs
Target programs:
  The Science of Programming [Gries 81]
  Program checkers [Detlefs 98, Xi 98]
  MIT 6.170 student programs
  Data Structures and Algorithm Analysis in Java [Weiss 99]

Dynamic invariant detection is useful
563-line C program: regexp search & replace [Hutchins 94, Rothermel 98]
Explicated data structures
Contradicted expectations, preventing bugs
Revealed bugs
Showed limited use of procedures
Improved test suite
Validated program changes

Dynamic invariant detection
Look for patterns in values the program computes:
1. Instrument the program to write data trace files
2. Run the program on a test suite
3. Invariant engine reads data traces, generates potential invariants, and checks them
[No setup needed; dive right into this.] [Remain excited! (Here and on next slide.)] All of these steps are automatic. Instrumenters are source-to-source translators. Define the front end here, or say “instrumenter” instead of “front end”. Possibly define back end here; but I think I don’t use that term.
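As an illustration of step 1, an instrumenter might rewrite each procedure to append variable values to a trace file at entry and exit. This is a minimal sketch, not Daikon's actual front end; the file name, record format, and class names are invented:

    // Hypothetical sketch of instrumented output: at each program point,
    // append the visible variables' names and values to a trace file.
    import java.io.FileWriter;
    import java.io.IOException;

    class Trace {
        static void record(String programPoint, String var, long value) {
            try (FileWriter out = new FileWriter("trace.dtrace", true)) {
                out.write(programPoint + " " + var + " = " + value + "\n");
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

    class ArraySum {
        static int sum(int[] b) {                 // instrumented version
            Trace.record("ArraySum.sum:ENTER", "length(b)", b.length);
            int s = 0;
            for (int i = 0; i < b.length; i++) s += b[i];
            Trace.record("ArraySum.sum:EXIT", "s", s);
            return s;
        }
    }

A real front end would record every visible variable and batch its writes; the point is only that instrumentation is a mechanical source-to-source rewrite.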

Checking invariants
For each potential invariant:
  instantiate (determine constants like a and b in y = ax + b)
  check for each set of variable values
  stop checking when falsified
This is inexpensive: many invariants, each cheap
All three steps are inexpensive: instantiation, checking, and number of checks (most checks fail quickly). There are many invariants, but each one is cheap. Can think of the property as a generalization from the first few values; then check the rest of the values. Example: 2 degrees of freedom in y=ax+b, so two (x,y) pairs uniquely determine a and b. (I don’t actually do it that way for reasons of numerical stability, but that is the idea.)
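A minimal sketch of this instantiate-then-check discipline for y = ax + b, using the conceptual two-point instantiation the notes describe (the real engine uses a numerically more stable formulation; the class name is invented):

    // Sketch: fix a and b from the first two samples, then check every later
    // sample; stop checking an invariant as soon as it is falsified.
    class LinearInvariant {
        private double a, b, x0, y0;
        private int samples = 0;
        private boolean falsified = false;

        void observe(double x, double y) {
            if (falsified) return;               // falsified invariants cost nothing
            if (samples == 0) {
                x0 = x; y0 = y;                  // remember the first sample
            } else if (samples == 1) {
                if (x == x0) { falsified = true; return; }  // cannot instantiate
                a = (y - y0) / (x - x0);         // two points determine a and b
                b = y0 - a * x0;
            } else if (y != a * x + b) {
                falsified = true;                // one counterexample suffices
            }
            samples++;
        }

        boolean holds() { return !falsified && samples > 2; }
    }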

Relevance
Usefulness to a programmer for a task
Contingent on task and programmer
We manually classified invariants
Perfect output is unnecessary (and impossible)
Programmer: style, knowledge of the codebase. Manually classified: based on our own judgment and careful examination of the program and the test cases. We made every effort to be fair and objective in our assessments. Full set of relevant invariants unknowable. Main thing is to help people. For this talk, we’ll focus on improvement more than absolute numbers like recall and precision. The usefulness of a set of invariants depends on the absolute and relative number of relevant invariants, their organization and presentation, the amount of time it takes to compute them (on-demand or incremental invariant computation complicates matters further), and other factors.

Improved invariant relevance
Add desired invariants:
1. Implicit values
2. Unused polymorphism
Eliminate undesired invariants (and improve performance):
3. Unjustified properties
4. Redundant invariants
5. Incomparable variables
Trade off performance and completeness.
The heuristics trade off performance, completeness, and output size. (Bigger output might be a plus or a minus.) [Make it clear that this is all completed work; I have solved these problems.] A naive implementation wouldn’t (and didn’t) produce that exact output. The simplified picture I have given you so far isn’t enough; it wouldn’t produce all the output you saw in the initial examples, and it would produce other undesirable output. I will now give some additional details. Unused polymorphism: overly wide types. I will discuss each of these in turn. Collectively/overall, how much improvement? I can’t say, because these are not just quantitative but qualitative improvements.

1. Implicit values
Goal: relationships over non-variables
Examples:
  for array a: length(a), sum(a), min(a), max(a)
  for array a and scalar i: a[i], a[0..i]
  for procedure p: #calls(p)
[Do not say “expressions not appearing in the source”.] [We saw some of these in the previous output.] These are a few examples.

Derived variables
Successfully produces desired invariants
Adds many new variables
Potential problems:
  slowdown: interleave derivation and inference
  irrelevant invariants: techniques 3–5, later in talk
Staged derivation and invariant inference:
  avoid deriving meaningless values
  avoid computing tautological invariants
[Don’t say “false positive”; it has a technical meaning and people won’t know whether the false positive is regarding truth or relevance.] The point of the many new variables is the invariants over them. The solution to “slowdown” is here on this slide. The solution to “irrelevant invariants” will come soon.
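For instance, here is an illustrative sketch (names invented, not Daikon's code) of deriving the array-related implicit values from one trace sample, so the engine can then look for invariants over them exactly as over real variables:

    // Sketch: derive implicit values from one sample of array a and index i.
    import java.util.LinkedHashMap;
    import java.util.Map;

    class Derive {
        static Map<String, Long> derive(String name, int[] a, int i) {
            Map<String, Long> d = new LinkedHashMap<>();
            d.put("length(" + name + ")", (long) a.length);
            if (a.length > 0) {
                long sum = 0; int min = a[0], max = a[0];
                for (int v : a) { sum += v; min = Math.min(min, v); max = Math.max(max, v); }
                d.put("sum(" + name + ")", sum);
                d.put("min(" + name + ")", (long) min);
                d.put("max(" + name + ")", (long) max);
            }
            if (0 <= i && i < a.length)   // skip nonsensical derivations, e.g. a[i] for i < 0
                d.put(name + "[i]", (long) a[i]);
            return d;
        }
    }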

2. Unused polymorphism
Variables declared with general type, used with more specific type
Example: given a generic list that contains only integers, report that the contents are sorted
Also applicable to subtype polymorphism
[Slowly and clearly explain the example.] Polymorphism available in the source code but not used during execution. “In the interest of time”, skip this section.

Unused polymorphism example

    class MyInteger { int value; … }
    class Link { Object element; Link next; … }
    class List { Link header; … }

    List myList = new List();
    for (int i = 0; i < 10; i++)
        myList.add(new MyInteger(i));

Desired invariant: in class List, header.closure(next) is sorted by ≤ over key .element.value
[Slowly and clearly explain the example.] Also works for subtype polymorphism! Polymorphism available in the source code but not used during execution. This is actual code, not a contrived example (actually it’s simplified). (MyInteger is used instead of Integer because Java 1.1’s Integer doesn’t support the Comparable interface, but MyInteger does (though I haven’t shown that on the slide).)

Polymorphism elimination
Daikon respects declared types
Pass 1: front end outputs object ID, runtime type, and all known fields
Pass 2: given refined type, front end outputs more fields
Sound for deterministic programs
Effective for programs tested so far
The Daikon implementation is designed to be simple, basic, and language-independent, to reduce errors and increase robustness. As a result, Daikon has no access to run-time types, and in fact the only types seen by the engine are integers, strings, and sequences, for predeclared variables. The decision about what fields to output is made statically, based on what fields the object is known to contain. The essential idea is to rewrite the program with stricter (and more accurate) types. We determine those stricter types by doing a dynamic analysis, of course! [The feedback mechanism isn’t completely automatic. The annotation is currently done by hand -- and intentionally so. When the output may be inaccurate, I choose to put a human in the loop.] Slide used to say “Annotate with refined type discovered by pass 1”. Pass 2: front end reads source comments, proceeds under the assumption that the object is actually of the specified type. Sound for deterministic programs: the attempted type downcast (the attempt to access a specific field) will never fail. But Daikon traps any such errors that occur anyway.
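A minimal sketch of what pass 1 might emit (the method and output format are invented; the real front end is a source-to-source translator and records much more):

    // Sketch of pass 1: record each object's identity and run-time type so a
    // refined declared type can be chosen before re-instrumenting for pass 2.
    class TypeRecorder {
        static void recordObject(String programPoint, String var, Object o) {
            String type = (o == null) ? "null" : o.getClass().getName();
            System.out.println(programPoint + " " + var
                + " id=" + System.identityHashCode(o) + " type=" + type);
        }
    }

In the List example above, pass 1 would report that every Link.element is actually a MyInteger, so pass 2 can also output the .value field.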

3. Unjustified properties
Given three samples for x:
  x = 7
  x = –42
  x = 22
Potential invariants:
  x ≠ 0
  x ≤ 22
  x ≥ –42
“Given three samples for x”: a particular program point is executed just three times by a test suite. These are true but not justified. “Not the sort of conclusion you or I would draw.”

Statistical checks
Check hypothesized distribution
To show x ≠ 0 for v values of x in a range of size r, the probability that no zeroes appear by chance (under a uniform distribution) is (1 – 1/r)^v; the invariant is reported only when this probability is small
Range limits (e.g., x ≤ 22):
  same number of samples as neighbors (uniform)
  more samples than neighbors (clipped)
Statistical test over hypothesized distribution (are the hypothesized distributions meaningful?) Much of the tool can be viewed as tests of hypothesized distributions. x ≠ 0: say range is from –r/2 to r/2 (approximately); assumes uniform distribution. [The hypothesized distribution is always the uniform distribution. Don’t say that early in the slide; only mention the uniform distribution when discussing the range limit test.] Recall that this is over multiple tests in a test suite, not a single test. There is a test like this for every invariant. This indicates the value of reporting the invariant; it is not a statistical test of the probability that it is really true over all possible executions. This solves much of the problem, but some such undesirable invariants remain.
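To make the check concrete with worked numbers (my arithmetic, using the formula above): the three samples on the previous slide span a range of size r = 22 – (–42) + 1 = 65, so the probability of seeing no zero in v = 3 uniform draws is (1 – 1/65)^3 ≈ 0.95, easily explained by chance, and x ≠ 0 is not reported. With v = 1000 samples the same probability falls to about 2 × 10^–7, and the invariant would be justified.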

Duplicate values
Array sum program: b is unchanged inside loop

    // Sum array b of length n into variable s.
    i := 0; s := 0;
    while i ≠ n do { s := s+b[i]; i := i+1 }

Problem: at loop head,
  –88 ≤ b[n–1] ≤ 99
  –556 ≤ sum(b) ≤ 539
Reason: more samples inside loop
These were eliminated from the procedure entry and exit, but remain at the loop head. These values (–556, 539, –88, 99) happened to be the extrema -- the largest and smallest values seen on this particular set of runs of the program.

Disregard duplicate values
Idea: count a value if its variable was just modified
Front end outputs a modification bit per value
Compared techniques for eliminating duplicates
Result: eliminates undesired invariants
A variable value contributes to invariant confidence at a program point if the variable was assigned to since the last time the program point was executed. [Tell the concept here, not just the implementation.] You can think of this as a dirty bit. [No need to avoid the term “lvalue”. Used to say “at each assignment, timestamp the lvalue”.] Actually there is a bit per lvalue/program point pair. [Used to say “if small test suite, eliminates desired invariants”; but don’t raise that red flag.] Don’t check just whether the value is the same, but whether the variable holding that value was assigned to.
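A sketch of the idea, assuming trace records that carry the modification bit (types, names, and the confidence threshold are invented): samples still update the candidate bound so that it remains true, but only freshly assigned values add statistical confidence.

    // Sketch: a sample counts toward an invariant's confidence only if its
    // variable was assigned since the program point was last executed.
    class Sample {
        final long value;
        final boolean modified;            // modification ("dirty") bit from the front end
        Sample(long value, boolean modified) { this.value = value; this.modified = modified; }
    }

    class UpperBoundInvariant {
        private long max = Long.MIN_VALUE;
        private int confidenceSamples = 0; // only fresh values are counted here

        void observe(Sample s) {
            if (s.value > max) max = s.value;  // the reported bound must still be sound
            if (s.modified) confidenceSamples++;
        }

        boolean justified() { return confidenceSamples >= 10; }  // illustrative threshold
    }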

4. Redundant invariants
Given: 0 ≤ i ≤ j
Redundant:
  a[i] ∈ a[0..j]
  max(a[0..i]) ≤ max(a[0..j])
Redundant invariants are logically implied
Implementation contains many such tests
[Say “greater than or equal to”, not “greater than”.] May be implied by one other invariant or by a collection of other invariants.

Suppress redundancies
Avoid deriving variables: suppress 25–50%
  equal to another variable
  nonsensical (a[i] when i < 0)
Avoid checking invariants:
  false invariants: trivial improvement
  true invariants: suppress 90%
Avoid reporting trivial invariants: suppress 25%
[Don’t say “tautologically equal”.] Make it clear that all of this is a good thing. First two are runtime improvements; third is UI improvement. Equal to another variable. EXAMPLE. Nonsensical: a[i] when i < 0 (or when i > length(a)). Staged inference and derivation is key to avoiding deriving variables; that is why this works. All this is a substantial chunk of code. All of the percentages are cumulative. It’s not just a quantitative but a qualitative improvement; system wouldn’t run without staging.

5. Unrelated variables
Problem: the following are of no interest
  bool b; int *p;              b < p
  int myweight, mybirthyear;   myweight < mybirthyear
Here are two examples of variables that shouldn’t be compared to one another. “non-null pointer p”. These are true and completely uninteresting. 150 pounds is 68 kilograms

Limit comparisons
Check relations only over comparable variables:
  declared program types
  Lackwit [O’Callahan 97]: value flow analysis based on polymorphic type inference
Evaluated several solutions for limiting comparisons; here I will discuss two of them. Repeat the birthyear/weight example here, for concreteness. [This slide used to say “potentially miss serendipitous invariants”. I don’t want to get into that here. In any event, the technique is already incomplete, and this is a tradeoff between completeness and efficiency.]
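As a sketch of the declared-type variant (illustrative only; Lackwit computes much finer comparability classes from value flow), candidate comparisons could be generated only within groups of variables that share a type:

    // Sketch: generate pairwise comparisons only among variables in the same
    // comparability class, here approximated by declared type.
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class Comparability {
        static List<String[]> comparablePairs(Map<String, String> declaredType) {
            Map<String, List<String>> byType = new HashMap<>();
            for (Map.Entry<String, String> e : declaredType.entrySet())
                byType.computeIfAbsent(e.getValue(), t -> new ArrayList<>()).add(e.getKey());
            List<String[]> pairs = new ArrayList<>();
            for (List<String> vars : byType.values())
                for (int i = 0; i < vars.size(); i++)
                    for (int j = i + 1; j < vars.size(); j++)
                        pairs.add(new String[] { vars.get(i), vars.get(j) });
            return pairs;  // note: myweight and mybirthyear share a declared type,
                           // so separating them takes a finer analysis such as Lackwit
        }
    }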

Comparability results
Comparisons:
  declared types: 60% as many comparisons
  Lackwit: 5% as many comparisons; scales well
Runtime: 40–70% improvement
Few differences in reported invariants
Declared types are worse for larger procedures; Lackwit types are better for larger procedures
“Few differences”: among any of the three techniques. This is a success and a failure: the comparison numbers are new (and gratifyingly low) and the runtime improves, but the system already does a good job of eliminating irrelevant invariants (so Lackwit is largely, but not entirely, redundant for that purpose). All the invariants eliminated by Lackwit were uninteresting (so it improved output a bit, but not a huge amount).

Future work
Online inference
Proving invariants
Characterize good test suites
New invariants: temporal, existential
User interface:
  control over instrumentation
  display and manipulation of invariants
Further experimental evaluation:
  apply to more and bigger programs
  apply to a variety of tasks
[Be excited about these opportunities!] (I have a number of interests and ideas for research directions. These are a few of the potential directions for this particular project.) These are in addition to ideas I have already mentioned, such as test suites, proving, etc. Existential quantifiers: Daikon can already do universals such as “all array elements are zero”; more challenging to detect “the array always contains at least one zero element”. The technique can be similar to what Daikon currently does: read a few values, generalize, and test the generalization. [Even/odd is a bad example because they are exclusive.] Experimental evaluation: be ready to mention specific people and tasks: testing: York, Hoffman; proving: ESC, Hongwei Xi; pedagogy (for student programs): UW, MIT.

Related work
Dynamic inference:
  inductive logic programming [Bratko 93, Cypher 93]
  program spectra [Reps 97, Harrold 98]
  finite state machines [Boigelot 97, Cook 98]
Static inference:
  checking specifications [Detlefs 96, Evans 96, Jacobs 98]
  specification extension [Givan 96, Hendren 92]
  other [Jeffords 98, Henry 90, Ward 96]
[Don’t say “offline”; Denise hates it.] Skip this slide, of course

Conclusions
Naive implementation is infeasible
Relevance improvements: accuracy, performance
  add desired invariants
  eliminate undesired invariants
Experimental validation
Dynamic invariant detection is promising for research and practice
I claimed that dynamic invariant detection is accurate and useful. That’s true, but a straightforward, unimaginative implementation of the technique misses invariants, reports uninteresting invariants, and simply doesn’t run. It is not obvious that this could be done at all: the reaction of various people, including my advisor, was that it is impossible. My advisor’s was not a minority opinion: that was the general consensus. Now people tell me it is trivial. Neither extreme view is true. Even after I had the crucial insight, I still had to come up with quite a few techniques, and do quite a bit of engineering, to make the system work. The implementation is adequate for at least small programs. This slide is conclusions and contributions. Make clear that the idea of dynamic invariant inference is mine. [Don’t say “more work needs to be done”. These are exciting opportunities, not onerous duties/responsibilities.] [Don’t put down my work. Don’t call it simple (that insults people who don’t find it so and doesn’t do justice to my significant hard work).]

Questions? The point of this slide is to separate my extra slides from the main part of the talk.

Ways to obtain invariants
Programmer-supplied
Static analysis: examine the program text [Cousot 77, Gannod 96]
  properties are guaranteed to be true
  pointers are intractable in practice
Dynamic analysis: run the program
  complementary to static techniques
Omit this slide? [Joke] I have heard that some programs lack exhaustively-specified invariants; we’ll assume for the rest of the talk that we have such invariants. Claim: programmers have these in mind, consciously or unconsciously, when they write code. We want access to that hidden part of the design space. We might want to check/refine/strengthen programmer-supplied invariants, and the program may be modified/maintained. (Leveson has noted that program self-checks are often ineffective.) Static analysis: for instance, abstract interpretation. The results of a conservative, sound analysis are guaranteed to be true. Static techniques can’t report true but undecidable properties (or properties of a program context). Pointers: problematic to represent program state and the heap; must approximate, resulting in too-weak results. Dynamic techniques are complementary to static ones.

Unused polymorphism example

    class MyInteger { int value; … }
    class Link { Object element; Link next; … }
    class List { Link header; … }

    List myList = new List();
    for (int i = 0; i < 10; i++)
        myList.add(new MyInteger(i));

Desired invariant: in class List, header.closure(next).element.value: sorted by ≤
[Slowly and clearly explain the example.] Also works for subtype polymorphism! Polymorphism available in the source code but not used during execution. This is actual code, not a contrived example (actually it’s simplified). (MyInteger is used instead of Integer because Java 1.1’s Integer doesn’t support the Comparable interface, but MyInteger does (though I haven’t shown that on the slide).)

Comparison with AI
Dynamic invariant detection:
  Can be formulated as an AI problem
  Cannot be solved by current AI techniques:
    not classification or clustering
    no noise
    no negative examples; many positive examples
    intelligible output
Viewed as an AI problem: search space, bias, etc.; a form of data mining. Not classification, clustering: I'm looking for higher-order notions in the output. No noise: typical statistical and AI techniques don’t work. Not a regression problem. Intelligible output: compare to neural networks. Also consider the suppression of irrelevant output. Various AI techniques can handle some of these issues; none can handle them all, so I can be viewed as doing forefront AI research as well as software engineering. (A potential exception: version spaces. (No negative examples (but then doesn’t eliminate unjustified hypotheses), can’t deal with noise.) But not really.) Some of my work was inspired by work in AI; I have had fruitful interactions with AI researchers; and since current AI techniques are likely to be applicable to parts of my problem, I expect these interactions to continue.

Is implication obvious?
Want:
  size(topOfStack.closure(next)) = size(orig(topOfStack.closure(next))) + 1
Get:
  size(topOfStack.next.closure(next)) = size(topOfStack.closure(next)) – 1
  topOfStack.next.closure(next) = orig(topOfStack.closure(next))
Solution: interactive UI, queries on variables