Dynamically Discovering Likely Program Invariants to Support Program Evolution. Paper by: Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin. Presented by: Wes Toland, Geoff Gerfin.
Outline: Introduction, Overview, Code Instrumentation, Data Trace Generation, Inferring Invariants, Uses of Invariants, Evaluation, Related Work, Limitations & Future Work, Discussion
Invariants: What are invariants? A constraint over a single variable’s values, or a relationship among the values of multiple variables. Invariants are expressed as mathematical predicates (example: n >= 0).
Importance of Invariants. In program development: refining a specification; aiding runtime checking. In software evolution: helping the programmer understand the functionality of an undocumented program so that incorrect assumptions are not made. Violating an invariant results in a bug.
Daikon: Programmers do not usually annotate or document code with explicit invariants. Daikon aims to infer likely program invariants automatically and report them in a meaningful manner.
Daikon’s Infrastructure
Daikon’s Infrastructure: Original Program i,s := 0,0; do i != n -> i,s := i + 1, s + b[i] od
Daikon’s Infrastructure: Instrumented Program print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od
Daikon’s Infrastructure: Trace File print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od
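The instrumented program above can be sketched in Python to show what a trace might contain. This is only an illustration: the record layout (a program-point label plus a dictionary of values), the names `instrumented_sum` and `run_test_suite`, and the random test inputs are assumptions, not Daikon's actual trace file format.

```python
import random

def instrumented_sum(b, trace):
    """Python rendition of the instrumented loop; `trace` collects the records."""
    n = len(b)  # assumption: n is the length of b
    # Entry record: mirrors `print b, n`.
    trace.append(("entry", {"b": list(b), "n": n}))
    i, s = 0, 0
    while i != n:
        # Loop-head record: mirrors `print i, s, n, b[i]`.
        trace.append(("loop", {"i": i, "s": s, "n": n, "b[i]": b[i]}))
        # Simultaneous assignment, as in `i,s := i + 1, s + b[i]`.
        i, s = i + 1, s + b[i]
    return s

def run_test_suite(num_cases=5, seed=0):
    """Run the instrumented program over hypothetical random arrays."""
    rng = random.Random(seed)
    trace = []
    for _ in range(num_cases):
        b = [rng.randint(-100, 100) for _ in range(rng.randint(0, 6))]
        instrumented_sum(b, trace)
    return trace

if __name__ == "__main__":
    for point, values in run_test_suite():
        print(point, values)
```

Each test case contributes one entry record and one loop-head record per iteration, so the trace grows with both the size of the test suite and the length of each run.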
Daikon’s Infrastructure: Determined Invariants 1.) n >= 0 2.) s = SUM(b) 3.) i >= 0
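One way to picture how such invariants are determined is to test a fixed list of candidate predicates against every sample taken at the program exit and keep only those that are never falsified. The sketch below does this for the summing loop. The candidate list (including the deliberately false `s >= 0`), the sample inputs, and all function names are assumptions for illustration; Daikon's real candidate set is much larger.

```python
def sum_with_exit_sample(b):
    """Run the summing loop and return the variable values seen at exit."""
    n = len(b)  # assumption: n is the length of b
    i, s = 0, 0
    while i != n:
        i, s = i + 1, s + b[i]
    return {"b": list(b), "n": n, "i": i, "s": s}

# Candidate invariants: a name plus a predicate over one exit sample.
CANDIDATES = {
    "n >= 0": lambda v: v["n"] >= 0,
    "s = SUM(b)": lambda v: v["s"] == sum(v["b"]),
    "i >= 0": lambda v: v["i"] >= 0,
    "s >= 0": lambda v: v["s"] >= 0,  # expected to be falsified
}

def surviving_invariants(samples):
    """Keep the candidates that hold on every sample."""
    return [name for name, holds in CANDIDATES.items()
            if all(holds(v) for v in samples)]

# Hypothetical test suite; the negative elements falsify `s >= 0`.
TEST_INPUTS = [[], [3], [-5], [1, 2, 3], [-1, 4, -7, 2]]
SAMPLES = [sum_with_exit_sample(b) for b in TEST_INPUTS]

if __name__ == "__main__":
    print(surviving_invariants(SAMPLES))
```

If the test suite contained only non-negative arrays, `s >= 0` would also survive, which is why reported invariants are only "likely".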
Code Instrumentation (1/6)
Code Instrumentation (2/6) Daikon’s front-end modifies source code to trace specific variables at points of interest: function entry points (pre-conditions), function exit points (post-conditions), and loop heads (loop invariants). The trace data is the input to Daikon’s back-end, which infers invariants.
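Tracing at function entry and exit can be imitated in Python with a decorator. Note that this swaps in a different technique: Daikon's front-end rewrites source code, whereas the sketch below wraps the function at run time. The names `traced`, `TRACE`, and `clamp` are hypothetical.

```python
import functools

TRACE = []  # hypothetical in-memory stand-in for the trace file

def traced(func):
    """Record argument values at entry and the return value at exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append((func.__name__ + ":entry",
                      {"args": args, "kwargs": dict(kwargs)}))
        result = func(*args, **kwargs)
        TRACE.append((func.__name__ + ":exit",
                      {"args": args, "kwargs": dict(kwargs), "return": result}))
        return result
    return wrapper

@traced
def clamp(x, lo, hi):
    """Example function under test."""
    return max(lo, min(x, hi))

if __name__ == "__main__":
    clamp(5, 0, 3)
    for record in TRACE:
        print(record)
```

Entry records feed pre-condition inference and exit records feed post-condition inference. Loop heads cannot be reached this way, which is one reason a source-level instrumenter is needed.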
Code Instrumentation (3/6) Daikon uses an abstract syntax tree (AST) for code instrumentation. What is an AST?
Code Instrumentation (4/6) How could this be useful for code instrumentation?
Code Instrumentation (5/6) Daikon uses the AST to determine which variables are in scope at each point of interest. Code is inserted at each program point to write the values of all in-scope variables to a file in a specific format.
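As a rough illustration of using an AST to find the variables in scope, the sketch below uses Python's standard `ast` module to collect a function's parameters and the local names it assigns. It analyzes Python rather than the languages Daikon instruments, and the sample source and the helper name `variables_in_scope` are assumptions.

```python
import ast

# Hypothetical program to analyze: a Python rendition of the summing loop.
SOURCE = '''
def total(b, n):
    i = 0
    s = 0
    while i != n:
        s = s + b[i]
        i = i + 1
    return s
'''

def variables_in_scope(source, func_name):
    """Return the parameters and assigned local names of one function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            names = [arg.arg for arg in node.args.args]
            for child in ast.walk(node):
                # A Name node in Store context is an assignment target.
                is_target = (isinstance(child, ast.Name)
                             and isinstance(child.ctx, ast.Store))
                if is_target and child.id not in names:
                    names.append(child.id)
            return names
    raise ValueError("function not found: " + func_name)

if __name__ == "__main__":
    print(variables_in_scope(SOURCE, "total"))
```

An instrumenter could then emit one output statement per program point covering exactly these names.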
Code Instrumentation (6/6) A status variable is created for each original program variable and is passed along through function calls. Status variables record: a modification timestamp (used to prevent garbage output), the smallest and largest indices used (for arrays and pointers), and a linked-list flag. A status variable is updated whenever the program manipulates its associated variable.
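The bookkeeping described above might look roughly like the following. This is a guess at the shape of the data, not Daikon's implementation: the class `StatusVar`, its field names, the global counter, and the `"uninit"` marker are all hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

_clock = itertools.count(1)  # hypothetical global modification counter

@dataclass
class StatusVar:
    """Status information kept alongside one program variable."""
    modified_at: Optional[int] = None  # None means never assigned
    min_index: Optional[int] = None    # smallest index used so far
    max_index: Optional[int] = None    # largest index used so far
    is_linked_list: bool = False

    def note_assignment(self):
        """Call whenever the associated variable is assigned."""
        self.modified_at = next(_clock)

    def note_index(self, index):
        """Call whenever the associated array or pointer is indexed."""
        self.min_index = index if self.min_index is None else min(self.min_index, index)
        self.max_index = index if self.max_index is None else max(self.max_index, index)

def trace_value(value, status):
    """Emit a marker instead of garbage for a never-assigned variable."""
    return "uninit" if status.modified_at is None else value
```

The timestamp is what lets the instrumentation avoid writing the contents of uninitialized memory to the trace, and the index range bounds how much of an array is worth printing.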
Data Trace Generation (1/2)
Data Trace Generation (2/2) Instrumented Code: print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od [Figure: Data Trace DB]
Inferring Invariants
Types of Invariants (1/3)
Single Variables
  Constant Value: x = a
  Uninitialized Value: x = uninit
  Small Value Set: x ∈ {a,b,c}
Single Numeric Variables
  Range Limits: x >= a, x <= b, etc.
  Non-zero: x != 0
  Modulus: x = a (mod b)
  Non-Modulus: x != a (mod b)
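A minimal sketch of checking several of these single-variable invariants over the observed values of one integer variable follows. The function name, the `max_set_size` threshold, and the output strings are assumptions. The sketch reports whatever holds over the samples and makes no attempt to judge whether a property could have arisen by chance.

```python
from math import gcd

def unary_invariants(values, max_set_size=3):
    """Report simple invariants that hold over all observed integer values."""
    values = list(values)
    if not values:
        return []
    distinct = sorted(set(values))
    found = []
    if len(distinct) == 1:
        found.append(f"x = {distinct[0]}")             # constant value
    elif len(distinct) <= max_set_size:
        members = ", ".join(str(v) for v in distinct)
        found.append("x in {" + members + "}")         # small value set
    found.append(f"x >= {distinct[0]}")                # range limits
    found.append(f"x <= {distinct[-1]}")
    if 0 not in distinct:
        found.append("x != 0")                         # non-zero
    # Modulus: the gcd of the differences from the smallest value is the
    # largest b for which every value is congruent mod b.
    g = 0
    for v in distinct[1:]:
        g = gcd(g, v - distinct[0])
    if g >= 2:
        found.append(f"x = {distinct[0] % g} (mod {g})")
    return found

if __name__ == "__main__":
    print(unary_invariants([3, 7, 11, 15, 3]))
```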
Types of Invariants (2/3)
Two Numeric Variables
  Linear Relationship: y = ax + b
  Functional Relationship: y = f(x)
  Comparison: x > y, x = y, etc.
  Combinations of Single Numeric Values: x+y = a (mod b)
Three Numeric Variables
  Linear Relationship: z = ax + by + c
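A linear relationship y = ax + b between two variables can be inferred by solving for a and b from two samples and then checking the remaining samples against the result. The sketch below takes that approach with exact rational arithmetic. The function name and the sample data are assumptions, and this is not necessarily how Daikon itself computes the coefficients.

```python
from fractions import Fraction

def infer_linear(pairs):
    """Return (a, b) if y = a*x + b holds for every (x, y) pair, else None."""
    pairs = list(pairs)
    if not pairs:
        return None
    base = pairs[0]
    # Two samples with different x values are needed to determine the slope.
    other = next((p for p in pairs if p[0] != base[0]), None)
    if other is None:
        return None
    a = Fraction(other[1] - base[1], other[0] - base[0])
    b = base[1] - a * base[0]
    # Keep the candidate only if every sample satisfies it.
    if all(y == a * x + b for x, y in pairs):
        return a, b
    return None

if __name__ == "__main__":
    print(infer_linear([(0, 1), (1, 3), (2, 5)]))
    print(infer_linear([(0, 1), (1, 3), (2, 6)]))
```

The three-variable form z = ax + by + c can be handled the same way, using three samples to solve for the coefficients before checking the rest.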
Types of Invariants (3/3)
Single-sequence variables
  Range (min and max values)
  Ordering (increasing or decreasing)
  Invariants over all elements (given array[size], all elements >= c)
Two-sequence variables
  Linear relationship, elementwise ( y[i] = a*x[i] + b )
  Comparison ( x < y where x[i] = y[i]-1 )
  Reversal ( for(i = 0; i < length(y); i++) x[i] = y[length(y) - 1 - i] )
Sequence and numeric variables
  Membership ( i ∈ s )
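The sequence invariants can be expressed as small predicates that are evaluated on every sample. The helper names below are assumptions. With zero-based indexing the reversal check compares x[i] against y[len(y) - 1 - i].

```python
def is_nondecreasing(seq):
    """Ordering: each element is <= the next one."""
    return all(a <= b for a, b in zip(seq, seq[1:]))

def all_elements_ge(seq, c):
    """Invariant over all elements: every element is >= c."""
    return all(e >= c for e in seq)

def is_reversal(x, y):
    """Reversal: x is y read backwards."""
    return len(x) == len(y) and all(
        x[i] == y[len(y) - 1 - i] for i in range(len(y)))

def is_member(i, s):
    """Membership: the scalar i occurs in the sequence s."""
    return i in s

def holds_on_all(predicate, samples):
    """An invariant is reported only if it holds on every sample."""
    return all(predicate(*sample) for sample in samples)

if __name__ == "__main__":
    samples = [([1, 2, 3], [3, 2, 1]), ([7], [7]), ([], [])]
    print(holds_on_all(is_reversal, samples))
```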
Inferring Invariants (1/5) What invariants should be inferred from this method, regardless of the test suite input?
Inferring Invariants (2/5)
Inferring Invariants (3/5) Daikon can identify from this trace that for all samples, x = orig(x)
Inferring Invariants (4/5) Daikon can identify from this trace that for all samples, y = orig(y) = 1.
Inferring Invariants (5/5) Daikon can identify from this trace that for all samples, *x = orig(*x) + 1. Is this invariant too limited?
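Why such an invariant may be too limited can be seen with a small experiment. The method on the slide is not reproduced here, so the sketch assumes a hypothetical `increment(cell, y)` that adds y to the value a pointer refers to (modeled as a one-element list). With a test suite that always passes y = 1, the narrow invariant survives. With a more varied suite it is falsified and only the general relationship remains.

```python
def increment(cell, y):
    """Hypothetical method under test: adds y to *x (cell[0] stands in for *x)."""
    cell[0] += y

def collect(test_inputs):
    """Record pre-state (orig) and post-state values for each call."""
    samples = []
    for start, y in test_inputs:
        cell = [start]
        orig_value = cell[0]
        increment(cell, y)
        samples.append({"orig(*x)": orig_value, "*x": cell[0], "y": y})
    return samples

def holds(samples, predicate):
    """True if the candidate invariant holds on every sample."""
    return all(predicate(s) for s in samples)

def plus_one(s):
    return s["*x"] == s["orig(*x)"] + 1

def plus_y(s):
    return s["*x"] == s["orig(*x)"] + s["y"]

NARROW_SUITE = collect([(0, 1), (5, 1), (-3, 1)])  # y is always 1
BROAD_SUITE = collect([(0, 1), (5, 2), (-3, 7)])   # y varies

if __name__ == "__main__":
    print(holds(NARROW_SUITE, plus_one), holds(BROAD_SUITE, plus_one))
    print(holds(NARROW_SUITE, plus_y), holds(BROAD_SUITE, plus_y))
```

Under this assumption the reported `*x = orig(*x) + 1` says as much about the test suite as about the method.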
Uses of Invariants (1/2) Explicated data structures: clearly define undocumented data structures without looking through code. Confirmed and contradicted expectations: check an understanding of code functionality. Example: it may appear that x is always less than y, which Daikon can confirm for the programmer (assuming an adequate test suite). Bug discovery.
Uses of Invariants (2/2) Identify limited use of procedures: identify procedures whose functionality goes beyond what their inputs ever exercise. Demonstrate test suite inadequacy: Daikon’s output can reveal that the test suite fails to exercise all branches of a program. Validate program changes: after a piece of code has been heavily modified but should still abide by the original specification, compare the invariants before and after. If they match, the programmer can be more confident that the modifications had no adverse effects.
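Validating a change by comparing invariants amounts to a set difference between the invariants reported before and after the modification. A minimal sketch, assuming invariants are available as plain strings (the function name and the example invariants are hypothetical):

```python
def compare_invariants(before, after):
    """Classify invariants as kept, lost, or gained across a code change."""
    before, after = set(before), set(after)
    return {
        "kept": sorted(before & after),
        "lost": sorted(before - after),    # held before, no longer reported
        "gained": sorted(after - before),  # newly reported after the change
    }

if __name__ == "__main__":
    old = ["i >= 0", "n >= 0", "s = SUM(b)"]
    new = ["i >= 0", "n >= 0", "s >= 0"]
    print(compare_invariants(old, new))
```

An empty "lost" list supports the claim that the change preserved the original behavior over the test suite. A non-empty one points at the properties to inspect first.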
Evaluation Overview: Asserting Daikon’s Invariant Detection, Performance Evaluation, Stability Evaluation
Asserting Daikon’s Invariant Detection. A simple accuracy evaluation of Daikon: a sample program was taken from The Science of Programming, the “gold standard” of invariant identification. The program had documented precondition, postcondition, and loop invariant specifications. Daikon reproduced all documented specifications plus some additional invariants: invariants erroneously omitted from the documentation, information about the test suite, and extraneous (redundant) invariants.
Performance Evaluation: The Siemens replace program is run with varying numbers of test cases and variables. Most important factor: the number of variables over which invariants are checked. This is not the total number of program variables; rather, it is the number of variables in scope at a program point. Invariant detection time grows quadratically with this factor. Additionally, invariant detection time grows linearly with test suite size.
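One plausible reading of these growth rates is that most of the work lies in checking candidate relationships between pairs of in-scope variables against every sample. The cost model below is an assumption made for illustration, not a measurement from the evaluation.

```python
from itertools import combinations

def candidate_pair_count(variables):
    """Number of unordered variable pairs at one program point."""
    return sum(1 for _ in combinations(variables, 2))

def estimated_checks(variables, num_samples):
    """Rough cost model: every pair is checked against every sample."""
    return candidate_pair_count(variables) * num_samples

if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, candidate_pair_count(range(n)))
```

Under this model, doubling the number of variables in scope roughly quadruples the work, while doubling the number of samples only doubles it.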
Performance Evaluation
Stability Evaluation: The number of test cases affects different types of invariants in different ways. Note that the number of identical unary invariants does not vary much as the number of test cases is increased. However, the number of differing unary invariants varies widely.
Related Work (1/2) Static Approaches to Inferring Invariants: operate on program text, not test runs (symbolic execution) [Hoare69]. Advantages: reported invariants are true for any program run (but not necessarily exhaustive). Theoretically, a static analysis can detect all sound invariants if it is run to convergence. Limitations: static approaches omit properties that are true but uncomputable. Pointer manipulation is difficult to analyze and forces approximations.
Related Work (2/2) Dynamic Approaches to Inferring Invariants: Event traces [Blum93]: uses a state machine instead of an AST. Advantage: lower data storage requirements. Runtime switches based on user-inserted assert statements (value profiling) [Hanson93].
Limitations (1/2) Accuracy of inferred invariants depends on the quality and completeness of the test cases. Additional test cases could provide data that leads to additional invariants being inferred. Additionally, invariants may hold only for the cases in the test suite. Daikon produces gigabytes of trace data, even when analyzing trivial programs. The initial prototype implementation ran out of memory when run on 5,542 test cases.
Limitations (2/2) The instrumenter, and therefore Daikon, is currently limited to C, Java, and Lisp. Daikon does not yet follow arbitrary-length paths through recursive structures. Daikon cannot compute invariants such as linear relationships over more than three variables. Instrumenting the program by modifying object code (as opposed to source code) would allow for improved precision and portability, and exact memory locations could be traced. This approach has many more obstacles.
Future Work (1/2) Ernst et al. planned to increase relevance and performance after this work by: reducing redundant invariants; removing relations between variables that can be statically proven to be unrelated; ignoring variables that have not been assigned since their last instrumentation; converting the implementation of Daikon from Python to C; checking fewer invariants (useful when the programmer wants to focus on a specific part of the code).
Future Work (2/2) Since paper publication: Additional front-end support: 2002: Perl (dfepl front-end implementation); 2005: C++ (Kvasir front-end implementation). 2003: various performance improvements: data trace files are handled incrementally (the original implementation stored the entire trace file in memory). 2005: IDE plug-in support for Visual Studio.
Questions???