Dynamically Discovering Likely Program Invariants to Support Program Evolution
Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin
Presented by: Nick Rutar

Program Invariants
- Useful in software development
  - Protect programmers from making errant changes
  - Verify properties of a program
- Can be explicitly stated in programs
  - Programmers can annotate code with invariants
  - This can take time and effort
  - Many important invariants will be missed
Could there be a way to dynamically discover program invariants???
Daikon: An Invariant Detector
- Pick a source program (the invariant detector itself is language independent)
- Instrument the source program to trace variables of interest
- Run the instrumented program over test cases
- Infer invariants over
  - Instrumented variables (variables present in the source)
  - Derived variables (created variables that might be of interest)
Derived Variables
- From any sequence s
  - Length: size(s)
  - Extremal elements: s[0], s[1], s[-1], s[-2]
- From a numeric sequence s
  - sum(s), min(s), max(s)
- From any sequence s and numeric variable i
  - Element at index: s[i], s[i-1]
  - Subsequences: s[0…i], s[0…i-1]
- From function invocations
  - Number of calls so far
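The derived variables listed on this slide can be sketched in a few lines of Python. This is an illustration only, not Daikon's implementation: the function name `derived_variables`, the dictionary keys, and the choice to omit entries whose index does not exist are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def derived_variables(s: List[float], i: Optional[int] = None) -> Dict[str, object]:
    """Compute slide-style derived variables for a numeric sequence s.

    Entries whose index does not exist (for example s[1] on a one-element
    sequence) are omitted; that policy is an assumption of this sketch.
    """
    derived: Dict[str, object] = {"size(s)": len(s)}

    # Extremal elements: first two and last two, where they exist.
    for k in (0, 1, -1, -2):
        if -len(s) <= k < len(s):
            derived[f"s[{k}]"] = s[k]

    # Aggregates over a numeric sequence (min/max need at least one element).
    derived["sum(s)"] = sum(s)
    if s:
        derived["min(s)"] = min(s)
        derived["max(s)"] = max(s)

    # Variables derived from the sequence together with a numeric index i.
    if i is not None and 0 <= i < len(s):
        derived["s[i]"] = s[i]
        derived["s[0..i]"] = s[: i + 1]  # inclusive upper bound
    if i is not None and 1 <= i <= len(s):
        derived["s[i-1]"] = s[i - 1]
    if i is not None and 0 <= i <= len(s):
        derived["s[0..i-1]"] = s[:i]
    return derived


if __name__ == "__main__":
    for name, value in derived_variables([3, 1, 4, 1, 5], i=2).items():
        print(name, "=", value)
```

Each derived variable is then treated like any other variable when invariants are inferred, which is how a relation such as S = sum(B[0 … I-1]) can be reported even though no such expression appears in the source.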
Example Program (taken from “The Science of Programming”)
  i, s := 0, 0;
  do i ≠ n →
    i, s := i + 1, s + b[i]
  od
Precondition: n ≥ 0
Postcondition: s = (Σ j : 0 ≤ j < n : b[j])
Loop Invariant: 0 ≤ i ≤ n and s = (Σ j : 0 ≤ j < i : b[j])
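A direct Python transcription of this program, with the precondition, loop invariant, and postcondition written as assertions, might look like the sketch below. The function name `array_sum` and the extra bound `n <= len(b)` (needed so that `b[i]` stays in range) are assumptions of the sketch; the slide states only n ≥ 0.

```python
from typing import List


def array_sum(b: List[int], n: int) -> int:
    """Sum the first n elements of b, checking the stated conditions."""
    # Precondition from the slide (n >= 0); n <= len(b) is an added assumption.
    assert 0 <= n <= len(b)

    i, s = 0, 0
    while i != n:
        # Loop invariant: 0 <= i <= n and s is the sum of b[0..i-1].
        assert 0 <= i <= n and s == sum(b[:i])
        # Simultaneous assignment: the right-hand side uses the old value of i.
        i, s = i + 1, s + b[i]

    # The invariant still holds on exit, and with i == n it gives the postcondition.
    assert 0 <= i <= n and s == sum(b[:i])
    assert s == sum(b[:n])
    return s


if __name__ == "__main__":
    print(array_sum([4, -2, 7], 3))
```

Python's tuple assignment evaluates the whole right-hand side before binding, so it matches the simultaneous assignment in the original notation.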
Daikon results from the program (100 randomly generated input arrays of length 7-13)
ENTER
  N = size(B)
  N in [7 … 13]
  B - All elements ≥ -100
EXIT
  N = I = orig(N) = size(B)
  B = orig(B)
  S = sum(B)
  N in [7 … 13]
  B - All elements ≥ -100
LOOP
  N = size(B)
  S = sum(B[0 … I-1])
  N in [7 … 13]
  I in [0 … 13]
  I ≤ N
  B - all elements in [ ]
  sum(B) in [ ]
  B[0] nonzero in [-99.96]
  B[-1] in [-88.99]
  N != B[-1]
  B[0] != B[-1]
*boxes on the original slide indicate generated invariants that match expected ones
Architecture of the Daikon tool
Original Program → [Instrument] → Instrumented Program → [Run over Test Suite] → Data Trace → [Detect Invariants] → Invariants
Daikon has instrumenters for Java, C, and Lisp
- Source-to-source translation
  - Determines which variables are in scope
  - Inserts code to dump the variables into an output file
- Creates a declaration file
  - Variables being instrumented
  - Types in the original program
  - Representations in the trace file
  - Sets of variables that may be sensibly compared
- Operates only on scalar numbers and arrays of numbers
  - Scalar numbers include characters and booleans
  - Any other type is converted to one of these forms
[Pipeline stage: Original Program → Instrument → Instrumented Program]
At each program point of interest
- The instrumented program writes to a data trace file
  - All variables in scope
    - Global Variables
    - Procedure Arguments
    - Local Variables
    - Return Values (at procedure exits)
  - Modification bit: whether a value has been set since the last time this program point was reached
- For small programs, runtime may be I/O bound
[Pipeline stage: Instrumented Program → Run → Data Trace]
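As a rough illustration of what an instrumented program does at each program point, the sketch below dumps every in-scope variable together with a modification bit. The `Tracer` class, the record layout, and the point names `sum:ENTER`, `sum:LOOP`, and `sum:EXIT` are hypothetical and do not reproduce Daikon's actual trace format. The modification bit is approximated by comparing against the value last seen at the same point, which cannot detect a variable that was reassigned to the same value.

```python
import copy
import io
from typing import Dict, List, TextIO


class Tracer:
    """Writes one record per program-point visit (hypothetical format)."""

    def __init__(self, out: TextIO) -> None:
        self.out = out
        # Values seen the last time each program point was reached.
        self.previous: Dict[str, Dict[str, object]] = {}

    def dump(self, point: str, variables: Dict[str, object]) -> None:
        seen = self.previous.get(point, {})
        self.out.write(point + "\n")
        for name, value in variables.items():
            # Approximate modification bit: 1 if new or changed since last visit.
            modified = int(name not in seen or seen[name] != value)
            self.out.write(f"  {name} {value!r} {modified}\n")
        # Deep copy so later in-place changes to lists are still detected.
        self.previous[point] = copy.deepcopy(variables)


def traced_sum(b: List[int], tracer: Tracer) -> int:
    """Array-sum loop with trace calls at entry, loop head, and exit."""
    n = len(b)
    tracer.dump("sum:ENTER", {"b": b, "n": n})
    i, s = 0, 0
    while True:
        tracer.dump("sum:LOOP", {"b": b, "n": n, "i": i, "s": s})
        if i == n:
            break
        i, s = i + 1, s + b[i]
    tracer.dump("sum:EXIT", {"b": b, "n": n, "i": i, "s": s})
    return s


if __name__ == "__main__":
    buffer = io.StringIO()
    traced_sum([2, 3], Tracer(buffer))
    print(buffer.getvalue())
```

Because a record is written on every visit to every instrumented point, the amount of output grows with both the number of points and the number of loop iterations, which is consistent with the note that small programs can become I/O bound.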
Single-variable invariants (numeric or sequence)
- Constant value: x = a (variable is a constant)
- Uninitialized: x = uninit (variable is never set)
- Modulus: x ≡ a (mod b) (x mod b = a always holds)
Invariants over multiple variables, up to 3 (numeric or sequence)
- Linear relationship: y = ax + b
- Reversal: x is the reverse of y
- Invariants over x - y, x + y
These are just a few
- Complete list can be found in the paper
- Domain-specific invariants can easily be coded in
[Pipeline stage: Data Trace → Detect Invariants → Invariants]
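To make three of these invariant forms concrete, here is a minimal sketch that checks constant, modulus, and linear invariants over a list of integer samples. It is not Daikon's algorithm: it checks each candidate in one batch rather than discarding candidates incrementally as samples arrive, and the names `constant`, `modulus`, `linear`, and `detect` are invented for the example.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations
from math import gcd
from typing import Dict, List, Optional, Tuple

Sample = Dict[str, int]  # one trace record: variable name -> integer value


def constant(values: List[int]) -> Optional[int]:
    """x = a: return a if every sample has the same value."""
    return values[0] if len(set(values)) == 1 else None


def modulus(values: List[int]) -> Optional[Tuple[int, int]]:
    """x ≡ a (mod b): return (a, b) for the largest b >= 2 that fits."""
    # The gcd of all differences is the largest modulus under which
    # every observed value is congruent to the first one.
    b = reduce(gcd, (abs(v - values[0]) for v in values), 0)
    return (values[0] % b, b) if b >= 2 else None


def linear(xs: List[int], ys: List[int]) -> Optional[Tuple[Fraction, Fraction]]:
    """y = ax + b: fit a and b from two samples, then check all samples."""
    pairs = list(zip(xs, ys))
    x0, y0 = pairs[0]
    other = next(((x, y) for x, y in pairs if x != x0), None)
    if other is None:
        return None  # x never varies, so the slope is undetermined
    a = Fraction(other[1] - y0, other[0] - x0)
    b = y0 - a * x0
    return (a, b) if all(y == a * x + b for x, y in pairs) else None


def detect(samples: List[Sample]) -> List[str]:
    """Report the invariants that hold over every sample."""
    names = sorted(samples[0])
    cols = {v: [s[v] for s in samples] for v in names}
    found: List[str] = []
    varying: List[str] = []
    for v in names:
        c = constant(cols[v])
        if c is not None:
            found.append(f"{v} = {c}")
            continue
        varying.append(v)
        m = modulus(cols[v])
        if m is not None:
            found.append(f"{v} ≡ {m[0]} (mod {m[1]})")
    # Linear relationships are only tried between non-constant variables.
    for x, y in combinations(varying, 2):
        fit = linear(cols[x], cols[y])
        if fit is not None:
            found.append(f"{y} = {fit[0]} * {x} + {fit[1]}")
    return found


if __name__ == "__main__":
    data = [{"k": 7, "pizzas": p, "slices": 8 * p} for p in range(1, 6)]
    for invariant in detect(data):
        print(invariant)
```

Any property reported this way is only "likely": it held on every sample seen, and a richer test suite could still falsify it.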
Run Time of Daikon
Informally, can be characterized as
  Time = O((vars³ × falsetime + trueinvs × testsuite) × program)
- vars is the number of variables at a program point (in scope)
  - Most invariants are falsified quickly
  - Only true invariants are checked for the entire run
  - Potentially cubic because invariants involve at most 3 variables
- falsetime is the (small constant) time to falsify a potential invariant
- trueinvs is the (small) number of true invariants at a program point
- testsuite is the size of the test suite
  - Must balance accuracy versus runtime
- program is the number of instrumented program points
  - The default is proportional to the size of the program
  - Users can control the extent of instrumentation
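The informal cost expression can be evaluated directly to see which term dominates. The function below simply computes the quantity inside the O(...); the name `daikon_cost` and all the numbers in the usage example are made up for illustration and are not measurements.

```python
def daikon_cost(vars_: int, falsetime: float, trueinvs: int,
                testsuite: int, program: int) -> float:
    """Evaluate (vars^3 * falsetime + trueinvs * testsuite) * program."""
    falsified_term = vars_ ** 3 * falsetime  # candidates discarded early
    true_term = trueinvs * testsuite         # survivors checked on every sample
    return (falsified_term + true_term) * program


if __name__ == "__main__":
    # Hypothetical values, chosen only to show how the terms scale.
    base = daikon_cost(vars_=10, falsetime=1, trueinvs=20, testsuite=1000, program=5)
    more_vars = daikon_cost(vars_=20, falsetime=1, trueinvs=20, testsuite=1000, program=5)
    more_tests = daikon_cost(vars_=10, falsetime=1, trueinvs=20, testsuite=2000, program=5)
    print(base, more_vars, more_tests)
```

Doubling vars multiplies only the first term by eight, while doubling the test suite doubles only the second term, so which change hurts more depends on how many invariants survive.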
Invariant Stability
- Size of Test Suite
  - Too Small
    - Small number of invariants
    - More false invariants
  - Too large
    - Increases runtime linearly
- Interesting vs. Uninteresting
  - Test suites of different sizes will yield more or fewer invariants
  - Uninteresting differences
    - Difference in a bound on a variable’s range
    - Different small set of possible values
  - Interesting differences: everything else
Invariant differences (2500-element test suite)
[Table not reproduced. Columns: Invariant Type by number of Test Cases. Rows: Identical Unary; Missing Unary; Diff Unary (Interesting, Uninteresting); Identical Binary; Missing Binary; Diff Binary (Interesting, Uninteresting)]
Invariants and Program Correctness
- Compare invariants detected across programs
- Correct versions of programs have more invariants than incorrect ones
- Examination of 424 intro C programs from U of Washington
  - Given # of students, amount of money, # of pizzas, calculates whether the students can afford the pizzas
- Chose eight relevant invariants
  - people in [1…50]
  - pizzas in [1…10]
  - pizza_price in {9, 11}
  - excess_money in [0…40]
  - slices = 8 * pizzas
  - slices = 0 (mod 8)
  - slices_per in {0, 1, 2, 3}
  - slices_left ≤ people - 1
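Counting how many of the eight goal invariants hold for a given program can be sketched as follows. The record layout and the two sets of sample runs are invented for illustration; they are not data from the study. The last invariant is taken to be slices_left ≤ people - 1, and slices = 8 * pizza is read as referring to the variable pizzas.

```python
from typing import Callable, Dict, List

Run = Dict[str, int]  # variable values observed in one execution

# The eight goal invariants, each as a predicate over one run.
GOAL_INVARIANTS: Dict[str, Callable[[Run], bool]] = {
    "people in [1..50]": lambda r: 1 <= r["people"] <= 50,
    "pizzas in [1..10]": lambda r: 1 <= r["pizzas"] <= 10,
    "pizza_price in {9, 11}": lambda r: r["pizza_price"] in (9, 11),
    "excess_money in [0..40]": lambda r: 0 <= r["excess_money"] <= 40,
    "slices = 8 * pizzas": lambda r: r["slices"] == 8 * r["pizzas"],
    "slices = 0 (mod 8)": lambda r: r["slices"] % 8 == 0,
    "slices_per in {0, 1, 2, 3}": lambda r: r["slices_per"] in (0, 1, 2, 3),
    "slices_left <= people - 1": lambda r: r["slices_left"] <= r["people"] - 1,
}


def goal_invariants_held(runs: List[Run]) -> List[str]:
    """Return the goal invariants that hold on every run."""
    return [name for name, holds in GOAL_INVARIANTS.items()
            if all(holds(r) for r in runs)]


# Hypothetical traces from a correct solution.
CORRECT_RUNS: List[Run] = [
    {"people": 4, "pizzas": 1, "pizza_price": 9, "excess_money": 3,
     "slices": 8, "slices_per": 2, "slices_left": 0},
    {"people": 10, "pizzas": 3, "pizza_price": 11, "excess_money": 0,
     "slices": 24, "slices_per": 2, "slices_left": 4},
]

# Hypothetical traces from a solution that assumes 6 slices per pizza.
BUGGY_RUNS: List[Run] = [
    {"people": 4, "pizzas": 1, "pizza_price": 9, "excess_money": 3,
     "slices": 6, "slices_per": 1, "slices_left": 2},
    {"people": 10, "pizzas": 3, "pizza_price": 11, "excess_money": 0,
     "slices": 18, "slices_per": 1, "slices_left": 8},
]

if __name__ == "__main__":
    print(len(goal_invariants_held(CORRECT_RUNS)), "of 8 hold for the correct runs")
    print(len(goal_invariants_held(BUGGY_RUNS)), "of 8 hold for the buggy runs")
```

In this made-up case a single wrong constant removes two goal invariants at once, which is the kind of signal the comparison across student programs relies on.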
Relationship of Grade and Goal Invariants
[Chart not reproduced. Axes: Grade, Invariants Detected]
Other Applications of Invariants
- Inserted as assert statements for testing
- Double-check existing documentation
- Check against existing assert statements
- Useful when program self-checks are ineffective
- Discovering Bugs
- Generate test cases or validate existing test suites
- Could possibly direct a correctness proof
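One way to picture the first application is a decorator that turns reported invariants into assertions at procedure entry and exit. The decorator `with_invariants`, its parameters, and the key `result` are hypothetical helpers written for this sketch; the two invariants used are N = size(B) at ENTER and S = sum(B) at EXIT from the array-sum results.

```python
import functools
import inspect
from typing import Callable, Dict, Sequence, Tuple

# An invariant is a label plus a predicate over the variables in scope.
Check = Tuple[str, Callable[[Dict[str, object]], bool]]


def with_invariants(at_enter: Sequence[Check] = (), at_exit: Sequence[Check] = ()):
    """Assert the given invariants at procedure entry and exit."""
    def decorate(fn):
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            scope: Dict[str, object] = dict(bound.arguments)
            for label, holds in at_enter:
                assert holds(scope), f"ENTER invariant violated: {label}"
            result = fn(*args, **kwargs)
            scope["result"] = result  # return value is visible only at exit
            for label, holds in at_exit:
                assert holds(scope), f"EXIT invariant violated: {label}"
            return result
        return wrapper
    return decorate


@with_invariants(
    at_enter=[("N = size(B)", lambda v: v["n"] == len(v["b"]))],
    at_exit=[("S = sum(B)", lambda v: v["result"] == sum(v["b"]))],
)
def array_sum(b, n):
    i, s = 0, 0
    while i != n:
        i, s = i + 1, s + b[i]
    return s


if __name__ == "__main__":
    print(array_sum([1, 2, 3], 3))
    try:
        array_sum([1, 2, 3], 2)  # breaks the ENTER invariant N = size(B)
    except AssertionError as error:
        print(error)
```

A violated assertion in later testing then means either a bug or an inferred invariant that was only an artifact of the original test suite; both are worth knowing about.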
Ongoing and Future Work
- Increasing Relevance
  - An invariant is relevant if it assists the programmer
  - Suppress invariants logically implied by others
  - Unrelated variables don’t need to be compared
  - Ignore variables not assigned since the last time
- Viewing and Managing Invariants
  - Overwhelming for a programmer to sort through
  - Various tools for selective reporting of invariants
    - Ordering by category
    - Retrieving invariants based on a supplied property
    - Listing invariants by program point
More Ongoing Work
- Improving Performance
  - Balance between invariant quality and runtime
  - Number of derived variables used
- Richer Invariants
  - Invariants over pointer-based data structures
  - Computing conditional invariants

Resources
- Daikon website
  - Contains links to
    - Papers
    - Source Code
    - User Manual
    - Developers Manual
Questions???