Dynamically Discovering Likely Program Invariants to Support Program Evolution
Michael Ernst, Jake Cockrell, William Griswold, David Notkin
Presented by Charles Song

What Are Invariants?
- “An invariant is a condition that does not change, or should not, if the system is working correctly.” (Wikipedia)

Invariant Example
    int getDayOfMonth() { … }
- Invariant on the result: 0 < returned value <= 31

Invariant Example
    a = x;
    y = 0;
    // invariant: y = x - a
    while (a != 0) {
        y = y + 1;
        a = a - 1;
    }
    // invariant: y = x - a
- Trace for x = 5 (state each time the loop condition is tested):
    y = 0; x = 5; a = 5
    y = 1; x = 5; a = 4
    y = 2; x = 5; a = 3
    y = 3; x = 5; a = 2
    y = 4; x = 5; a = 1
    y = 5; x = 5; a = 0

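The loop example can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: `count_up` is a hypothetical name, and the guard against negative `x` is an added assumption (the loop on the slide would not terminate for a negative input). It asserts y = x - a each time the loop head is reached and records the same states as the trace.

```python
def count_up(x):
    """Transcription of the slide's loop; returns y and the loop-head states."""
    if x < 0:
        # Assumption added for this sketch: the loop only terminates for x >= 0.
        raise ValueError("x must be non-negative")
    a = x
    y = 0
    states = []
    while True:
        # Loop head: the invariant must hold every time control reaches here.
        assert y == x - a, "loop invariant violated"
        states.append((y, x, a))
        if a == 0:
            break
        y = y + 1
        a = a - 1
    return y, states


y, states = count_up(5)
for state in states:
    print("y = %d; x = %d; a = %d" % state)
```

When the loop exits, a = 0, so the invariant y = x - a reduces to y = x, which is the postcondition of the loop.
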
Invariants & Software Evolution
- Specify correct behavior of programs (axiomatic approach)
- Protect programmers from making changes that violate correct behavior

Explicit Invariants
- Invariants are great, but where do we get them?
  - Have programmers annotate code
  - Automatically infer invariants

Technique Overview
- Dynamic discovery of invariants
  - Execute a program on a collection of inputs
  - Extract variable values
  - Infer invariants

Invariant Detection Engine
- Instrumentation
  - Select program points at which to insert instrumentation
    - Procedure entry and exit points
    - Loop heads
  - Select variables to examine at the selected points
    - All variables in scope

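To make the instrumentation step concrete, here is a rough Python sketch of recording values at procedure entry and exit. It is an illustration only, not the instrumenter described in the paper: `instrument`, `TRACE`, and `bigger` are hypothetical names, and a decorator sees only arguments and the return value, so loop heads and other in-scope variables are not covered.

```python
import functools
import inspect

TRACE = []  # (program point, {variable name: value}) records


def instrument(func):
    """Record argument values at entry, and arguments plus result at exit."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # Procedure entry point: snapshot the parameters.
        TRACE.append((func.__name__ + ":ENTER", dict(bound.arguments)))
        result = func(*args, **kwargs)
        # Procedure exit point: parameters again, plus the return value.
        exit_vars = dict(bound.arguments)
        exit_vars["return"] = result
        TRACE.append((func.__name__ + ":EXIT", exit_vars))
        return result

    return wrapper


@instrument
def bigger(x, y):
    return x if x > y else y


# Running the "test suite" fills the trace that inference would consume.
for x, y in [(1, 2), (5, 3), (4, 4)]:
    bigger(x, y)
```

Each call contributes one record per program point, so an inference step can group the records by program point and look for properties that hold across all of them.
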
Invariant Detection Engine
- Selecting and running test suites
  - Require repeated execution of instrumentation points
  - Accuracy of inferred invariants depends on quality of inputs

Invariant Detection Engine
- Inferring invariants
  - Use outputs of instrumented programs
  - List invariants detected at each instrumented point

Invariants Checked
- Constants, or a small number of values
- Range (a < x < b), modulus
- Linear relationship (x = ay + bz + c)
- Comparisons (x < y)
- Functions (z = max(x, y))
- Sequences (e.g., every element < 100; membership)

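The kinds of invariant listed above can be checked by testing each candidate against every recorded sample and keeping only those that are never falsified. The sketch below is a simplified illustration, not the actual detection engine: `infer_invariants` is a hypothetical name, it assumes integer-valued variables, and it covers only constants, ranges, comparisons, and the `max` function (modulus, linear, and sequence invariants are left out).

```python
from itertools import combinations


def infer_invariants(samples):
    """Return candidate invariants that hold on every sample.

    samples: list of dicts mapping variable name -> int, all recorded at
    the same program point.
    """
    names = sorted(samples[0])
    found = []

    # Unary: a constant value, otherwise the observed range.
    for v in names:
        values = [s[v] for s in samples]
        if len(set(values)) == 1:
            found.append(f"{v} == {values[0]}")
        else:
            found.append(f"{min(values)} <= {v} <= {max(values)}")

    # Binary: keep the strongest comparison that no sample falsifies.
    for a, b in combinations(names, 2):
        pairs = [(s[a], s[b]) for s in samples]
        if all(p == q for p, q in pairs):
            found.append(f"{a} == {b}")
        elif all(p < q for p, q in pairs):
            found.append(f"{a} < {b}")
        elif all(p <= q for p, q in pairs):
            found.append(f"{a} <= {b}")
        elif all(p > q for p, q in pairs):
            found.append(f"{a} > {b}")
        elif all(p >= q for p, q in pairs):
            found.append(f"{a} >= {b}")

    # Ternary: a single function family, t == max(a, b).
    for t in names:
        others = [n for n in names if n != t]
        for a, b in combinations(others, 2):
            if all(s[t] == max(s[a], s[b]) for s in samples):
                found.append(f"{t} == max({a}, {b})")
    return found


samples = [
    {"x": 1, "y": 2, "z": 2},
    {"x": 5, "y": 3, "z": 5},
    {"x": 4, "y": 4, "z": 4},
]
found = infer_invariants(samples)
for inv in found:
    print(inv)
```

With only three samples the reported ranges are almost certainly accidental, which is why the quality and size of the test suite matter so much for this approach.
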
Other Invariants
- Negative invariants
  - Relationships that might be expected but are never observed
  - Reported based on probability
- Derived variables
  - Array: first and last element, length, subarray
  - Numeric array: sum, min, max
  - Function invocations

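Both ideas on this slide can be sketched briefly in Python. The helper names are hypothetical, and the probability model is an assumption made for illustration: values are treated as independent and uniformly distributed over a range of known size, so the chance that one particular value never shows up in n samples is (1 - 1/r)^n. A negative invariant such as x != 0 would then be reported only when that chance falls below a chosen threshold.

```python
def derive_array_vars(name, arr):
    """Derived variables for one observed array value."""
    derived = {f"len({name})": len(arr)}
    if arr:
        # First and last elements exist only for non-empty arrays.
        derived[f"{name}[0]"] = arr[0]
        derived[f"{name}[-1]"] = arr[-1]
        # Aggregates for numeric arrays.
        derived[f"sum({name})"] = sum(arr)
        derived[f"min({name})"] = min(arr)
        derived[f"max({name})"] = max(arr)
    return derived


def chance_of_never_seeing(range_size, num_samples):
    """Probability that one given value is absent from num_samples
    independent uniform draws over range_size possible values."""
    return (1 - 1 / range_size) ** num_samples


def report_negative_invariant(range_size, num_samples, threshold=0.01):
    """Report 'value never occurs' only if its absence is unlikely by chance.
    The default threshold is an arbitrary choice for this sketch."""
    return chance_of_never_seeing(range_size, num_samples) < threshold


print(derive_array_vars("A", [3, 1, 4]))
print(report_negative_invariant(10, 3))    # too few samples to conclude anything
print(report_negative_invariant(10, 100))  # absence is very unlikely to be chance
```

The derived variables are then treated like any other variable, so invariants such as `sum(A) >= 0` can be checked without special handling.
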
Staged Derivation & Inference
- Derived variables are not introduced until invariants have been computed for the variables they are derived from
- Example: if j >= len(A), then do not derive A[j]

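A small Python sketch of the staging idea, under the assumption that each sample is a dict holding an array and an index (`derive_indexed` and the sample layout are hypothetical): the derived variable A[j] is introduced only after the earlier stage has established that j is a valid index on every sample.

```python
def derive_indexed(samples, array="A", index="j"):
    """Return the values of A[j] per sample, or None if A[j] must not be derived."""
    # Stage 1: a fact about the existing variables, checked on all samples.
    in_bounds = all(0 <= s[index] < len(s[array]) for s in samples)
    if not in_bounds:
        # j >= len(A) (or j < 0) was observed, so A[j] is meaningless.
        return None
    # Stage 2: safe to introduce the derived variable.
    return [s[array][s[index]] for s in samples]


good = [{"A": [1, 2, 3], "j": 0}, {"A": [4, 5], "j": 1}]
bad = [{"A": [1, 2, 3], "j": 0}, {"A": [4, 5], "j": 2}]
print(derive_indexed(good))
print(derive_indexed(bad))
```

Staging this way avoids both out-of-bounds accesses and wasted work on derived variables that could never yield a meaningful invariant.
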
Evaluations
- The Science of Programming
  - Programs with formal pre- and postconditions and loop invariants
  - Detected the stated properties and more
- Search/replace C program
  - Undocumented code
  - Most invariants remained unchanged across modifications
  - The invariants that did change verified the modifications

Performance Factors
- Number of variables in scope
  - Largest effect on run time (quadratic)
  - Plotted for different sets of variables at the same instrumentation point
  - 10 derived variables per original variable
- Number of test cases
  - Smaller effect on run time (linear)

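A back-of-the-envelope Python calculation suggests why the number of variables dominates. It counts candidate variable pairs rather than measuring run time, and it reads the slide's "10 derived variables" figure as a fixed ratio per original variable, which is an assumption; `candidate_counts` is a hypothetical helper.

```python
from math import comb


def candidate_counts(num_original, derived_per_original=10):
    """Count the variables and variable pairs an engine would have to examine."""
    total = num_original * (1 + derived_per_original)
    return {
        "variables": total,              # unary candidates grow linearly
        "binary_pairs": comb(total, 2),  # pairwise candidates grow quadratically
    }


for n in (2, 4, 8):
    print(n, candidate_counts(n))
```

Doubling the original variables from 2 to 4 raises the pair count from 231 to 946, roughly a factor of four, whereas adding test cases only adds more samples to check against the same set of candidates.
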
Invariant Stability
- Test suites of 500, 1000, … 2500, 3000 test cases
- Compare unary and binary invariants
- Knee somewhere between 500 and 1000 test cases
- Problems with pointers and uninitialized arrays

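Stability can be measured by diffing the invariants reported for two test-suite sizes. The Python sketch below shows one way to do that; `invariant_diff` is a hypothetical helper, and the invariant strings are made up for illustration rather than taken from the experiment.

```python
def invariant_diff(smaller_suite, larger_suite):
    """Compare the invariants reported for two test-suite sizes."""
    a, b = set(smaller_suite), set(larger_suite)
    return {
        "stable": a & b,   # reported for both suites
        "dropped": a - b,  # falsified once more inputs were added
        "new": b - a,      # reported only with the larger suite
    }


# Illustrative invariants only, not results from the study.
from_500 = {"x >= 0", "x <= 41", "y == 2 * x"}
from_1000 = {"x >= 0", "y == 2 * x"}
diff = invariant_diff(from_500, from_1000)
print(diff)
```

A suite is large enough, in this sense, when growing it further leaves the "dropped" and "new" sets close to empty.
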
Performance Improvements
- Instrument only the parts of the program that are of interest
- Use fewer test cases, at the risk of less precise output
- Check fewer invariants

Conclusions
- Automatically detects invariants in programs
- Encourages programmers to think in terms of invariants
- Not useful to programmers who know exactly what they seek