Dynamically Discovering Likely Program Invariants to Support Program Evolution Presented By: Wes Toland, Geoff Gerfin Michael D. Ernst, Jake Cockrell,

Presentation transcript:

Dynamically Discovering Likely Program Invariants to Support Program Evolution
Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin
Presented by: Wes Toland, Geoff Gerfin

Outline: Introduction; Overview; Code Instrumentation; Data Trace Generation; Inferring Invariants; Uses of Invariants; Evaluation; Related Work; Limitations & Future Work; Discussion

Invariants
What are invariants?
A constraint over a variable’s values, or a relationship between the values of multiple variables.
Invariants are defined as mathematical predicates (example: n >= 0).

Importance of Invariants
In program development: refining a specification; aiding runtime checking.
In software evolution: helping a programmer understand the functionality of an undocumented program so that incorrect assumptions are not made. A change that violates an invariant the program depends on results in a bug.

Daikon
Programmers do not usually annotate or document code with explicit invariants.
Daikon automatically infers likely program invariants and reports them in a meaningful manner.

Daikon’s Infrastructure

Daikon’s Infrastructure: Original Program i,s := 0,0; do i != n -> i,s := i + 1, s + b[i] od

Daikon’s Infrastructure: Instrumented Program print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od
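
As a rough illustration, the instrumented loop above can be mimicked in Python. This is only a sketch and not Daikon's front end: the function name `instrumented_sum`, the program-point labels, and the in-memory `trace` list are all assumptions made for the example.

```python
def instrumented_sum(b):
    """Sum the array b while recording variable values at each program point."""
    trace = []  # hypothetical in-memory stand-in for a trace file
    n = len(b)
    # Corresponds to "print b, n" before the loop.
    trace.append(("ENTER", {"b": list(b), "n": n}))
    i, s = 0, 0
    while i != n:
        # Corresponds to "print i, s, n, b[i]" inside the loop.
        trace.append(("LOOP", {"i": i, "s": s, "n": n, "b[i]": b[i]}))
        # Simultaneous assignment: the right-hand side uses the old value of i.
        i, s = i + 1, s + b[i]
    return s, trace


total, trace = instrumented_sum([3, 1, 4])
for point, values in trace:
    print(point, values)
```

Each run contributes one entry sample and one sample per loop iteration; these samples are the raw material for invariant inference.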

Daikon’s Infrastructure: Trace File print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od

Daikon’s Infrastructure: Invariants
Determined invariants: 1.) n >= 0 2.) s = SUM(B) 3.) i >= 0
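
The three invariants listed can be checked against recorded samples. The sketch below assumes a tiny, hypothetical test suite and a helper `run` that records samples at entry, at the loop head, and at exit; it shows only the checking step, not how Daikon proposes candidate invariants.

```python
def run(b):
    """Return (entry sample, loop-head samples, exit sample) for one run."""
    n = len(b)
    entry = {"b": list(b), "n": n}
    loop = []
    i, s = 0, 0
    while i != n:
        loop.append({"i": i, "s": s, "n": n})
        i, s = i + 1, s + b[i]
    exit_sample = {"b": list(b), "n": n, "i": i, "s": s}
    return entry, loop, exit_sample


# A tiny hypothetical test suite (assumption for this example).
runs = [run(b) for b in ([], [5], [2, -7, 9], [1, 1, 1, 1])]

# 1.) n >= 0 at entry
n_nonneg = all(entry["n"] >= 0 for entry, _, _ in runs)
# 2.) s = sum(b) at exit
s_is_sum = all(ex["s"] == sum(ex["b"]) for _, _, ex in runs)
# 3.) i >= 0 at the loop head
i_nonneg = all(sample["i"] >= 0 for _, loop, _ in runs for sample in loop)

print(n_nonneg, s_is_sum, i_nonneg)
```

An invariant is reported only if no sample falsifies it, which is why the result depends on the test suite.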

Code Instrumentation (1/6)

Code Instrumentation (2/6)
Daikon’s front-end modifies source code to trace specific variables at points of interest:
Function entry points (pre-conditions)
Function exit points (post-conditions)
Loop heads (loop invariants)
The trace data is used as input to Daikon’s back-end, which infers the invariants.

Code Instrumentation (3/6)
Daikon uses an abstract syntax tree (AST) for code instrumentation. What is an AST?

Code Instrumentation (4/6) How could this be useful for code instrumentation?

Code Instrumentation (5/6)
Daikon uses the AST to determine which variables are in scope at each point of interest.
Code is inserted at each program point to write the values of all variables in scope to a file in a specific format.
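
To make the idea concrete, the sketch below uses Python's standard `ast` module to list the variables visible at a function's entry and exit. It is an analogy only: the sample function `total` and the helper `variables_in_scope` are assumptions for the example, not part of Daikon's front end.

```python
import ast

# Hypothetical program to be analyzed (assumption for this example).
SOURCE = """
def total(b, n):
    i = 0
    s = 0
    while i != n:
        s = s + b[i]
        i = i + 1
    return s
"""


def variables_in_scope(source):
    """Map each function to the variables visible at its entry and exit."""
    scopes = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # Parameters are in scope at function entry.
            params = [arg.arg for arg in node.args.args]
            # Names that are assigned somewhere in the body are locals.
            assigned = sorted({
                name.id for name in ast.walk(node)
                if isinstance(name, ast.Name) and isinstance(name.ctx, ast.Store)
            })
            local_names = [v for v in assigned if v not in params]
            scopes[node.name] = {"entry": params, "exit": params + local_names}
    return scopes


scopes = variables_in_scope(SOURCE)
print(scopes)
```

An instrumenter could then emit one print statement per program point, covering exactly the listed variables.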

Code Instrumentation (6/6)
Status variables are created for each original program variable and are passed along through function calls.
Status variables:
Modification timestamp (used to prevent garbage output)
Smallest and largest indices (for arrays and pointers)
Linked list flag
A status variable is updated whenever the program manipulates its associated variable.

Data Trace Generation (1/2)

Data Trace Generation (2/2)
Instrumented code: print b, n; i,s := 0,0; do i != n -> print i, s, n, b[i]; i,s := i + 1, s + b[i] od
[Figure labels: Instrumented Code, Data Trace DB]

Inferring Invariants

Types of Invariants (1/3)
Single variables:
  Constant value: x = a
  Uninitialized value: x = uninit
  Small value set: x ∈ {a, b, c}
Single numeric variables:
  Range limits: x >= a, x <= b, etc.
  Non-zero: x != 0
  Modulus: x = a (mod b)
  Non-modulus: x != a (mod b)
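
A minimal sketch of how several of these single-variable invariants could be read off a list of observed values. The function name, the `max_set_size` threshold, and the output strings are assumptions for illustration, and the sketch omits any check of whether an invariant is statistically justified.

```python
from functools import reduce
from math import gcd


def single_variable_invariants(name, values, max_set_size=3):
    """Report simple invariants that hold over all observed integer values."""
    invariants = []
    distinct = sorted(set(values))
    if len(distinct) == 1:
        invariants.append(f"{name} = {distinct[0]}")            # constant value
    elif len(distinct) <= max_set_size:
        members = ", ".join(map(str, distinct))
        invariants.append(f"{name} in {{{members}}}")            # small value set
    invariants.append(f"{name} >= {distinct[0]}")                # range limits
    invariants.append(f"{name} <= {distinct[-1]}")
    if 0 not in distinct:
        invariants.append(f"{name} != 0")                        # non-zero
    # Modulus: the gcd of the differences is the largest b such that
    # all values are congruent modulo b.
    step = reduce(gcd, (v - distinct[0] for v in distinct[1:]), 0)
    if step >= 2:
        invariants.append(f"{name} = {distinct[0] % step} (mod {step})")
    return invariants


result = single_variable_invariants("x", [2, 6, 10, 6])
print(result)
```

The function assumes a non-empty list of integers; an empty list has no samples and therefore nothing to report.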

Types of Invariants (2/3)
Two numeric variables:
  Linear relationship: y = ax + b
  Functional relationship: y = f(x)
  Comparison: x > y, x = y, etc.
  Combinations of single numeric values: x + y = a (mod b)
Three numeric variables:
  Linear relationship: z = ax + by + c
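
One way to test for a linear relationship y = ax + b between two variables is to fit the line through two samples and then check it against the rest. The sketch below does this with exact rational arithmetic; the function name `infer_linear` and the sample data are assumptions, and this is not claimed to be Daikon's own procedure.

```python
from fractions import Fraction


def infer_linear(samples):
    """Return (a, b) with y = a*x + b for every (x, y) sample, else None.

    Assumes a non-empty list of integer pairs.
    """
    x0, y0 = samples[0]
    # Find a second sample with a different x so the slope is defined.
    other = next(((x, y) for x, y in samples if x != x0), None)
    if other is None:
        return None  # x never varies, so a slope cannot be determined
    x1, y1 = other
    a = Fraction(y1 - y0, x1 - x0)
    b = y0 - a * x0
    # A single counterexample falsifies the candidate invariant.
    if all(y == a * x + b for x, y in samples):
        return a, b
    return None


print(infer_linear([(0, 1), (1, 3), (2, 5)]))
print(infer_linear([(0, 1), (1, 3), (2, 6)]))
```

The first call finds the line y = 2x + 1; the second returns None because the last sample does not lie on that line.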

Types of Invariants (3/3)
Single-sequence variables:
  Range (min and max values)
  Ordering (increasing or decreasing)
  Invariants over all elements (given array[size], all elements >= c)
Two-sequence variables:
  Linear relationship (y[i] = a*x[i] + b for every index i)
  Comparison (x < y, for example when x[i] = y[i] - 1)
  Reversal: for (i = 0; i < length(y); i++) x[i] = y[length(y) - 1 - i]
Sequence and numeric variables:
  Membership (i ∈ s)
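
A few of the sequence invariants can be expressed as small predicates. The helper names below are assumptions for this sketch. Note that the reversal check indexes y at len(y) - 1 - i, since valid indices run from 0 to len(y) - 1.

```python
def is_nondecreasing(seq):
    """Ordering: each element is <= its successor."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


def all_elements_at_least(seq, c):
    """Invariant over all elements: every element >= c."""
    return all(v >= c for v in seq)


def is_reversal(x, y):
    """Reversal: x is y read backwards."""
    return len(x) == len(y) and all(
        x[i] == y[len(y) - 1 - i] for i in range(len(y))
    )


def is_member(i, s):
    """Membership: the scalar i occurs in the sequence s."""
    return i in s


print(is_nondecreasing([1, 2, 2, 5]))
print(all_elements_at_least([3, 4, 9], 3))
print(is_reversal([3, 2, 1], [1, 2, 3]))
print(is_member(2, [1, 2, 3]))
```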

Inferring Invariants (1/5) What invariants should be inferred from this method, regardless of the test suite input?

Inferring Invariants (2/5)

Inferring Invariants (3/5) Daikon can identify from this trace that for all samples, x = orig(x)

Inferring Invariants (4/5) Daikon can identify from this trace that for all samples, y = orig(y) = 1.

Inferring Invariants (5/5) Daikon can identify from this trace that for all samples, *x = orig(*x) + 1. Is this invariant too limited?
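
The method and trace from these slides are not reproduced in the transcript, so the sketch below uses a hypothetical stand-in, `inc`, that adds y to the value x points to. It shows how comparing entry values (orig) with exit values yields invariants such as *x = orig(*x) + 1, and why that invariant can be an artifact of a test suite that always passes y = 1.

```python
def inc(x, y):
    """Hypothetical traced method; x is a one-element list acting as a pointer."""
    x[0] += y


def trace_call(x, y):
    """Record variable values at entry (orig) and at exit."""
    entry = {"*x": x[0], "y": y}
    inc(x, y)
    exit_sample = {"*x": x[0], "y": y}
    return entry, exit_sample


# Assumed test suite: every call passes y = 1.
samples = [trace_call([v], 1) for v in (0, 7, -3, 42)]

y_unchanged = all(ex["y"] == en["y"] for en, ex in samples)   # y = orig(y)
y_is_one = all(ex["y"] == 1 for _, ex in samples)             # y = 1
deltas = {ex["*x"] - en["*x"] for en, ex in samples}          # *x - orig(*x)
print(y_unchanged, y_is_one, deltas)

# One extra call with y = 5 falsifies "*x = orig(*x) + 1",
# while the more general "*x = orig(*x) + y" still holds.
samples.append(trace_call([10], 5))
deltas_after = {ex["*x"] - en["*x"] for en, ex in samples}
general_holds = all(ex["*x"] == en["*x"] + en["y"] for en, ex in samples)
print(deltas_after, general_holds)
```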

Uses of Invariants (1/2)
Explicated data structures: clearly define undocumented data structures without looking through code.
Confirmed and contradicted expectations: check an understanding of code functionality. Example: it may appear that x is always less than y, which Daikon can confirm for the programmer (assuming an adequate test suite).
Bug discovery

Uses of Invariants (2/2)
Identify limited use of procedures: find procedures whose functionality goes partly unused given the inputs they actually receive.
Demonstrate test suite inadequacy: Daikon’s output can reveal that a test suite fails to exercise all branches of a program.
Validate program changes: after a piece of code has been heavily modified but should still abide by the original specification, compare the invariants before and after the change. If they match, the programmer can be more confident that the modifications did not have adverse effects.
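
Comparing invariants before and after a change amounts to a set difference. The sketch below assumes invariants are available as plain strings; the function name and the sample invariants are hypothetical.

```python
def compare_invariants(before, after):
    """Classify invariants as kept, lost, or gained across a program change."""
    before, after = set(before), set(after)
    return {
        "kept": sorted(before & after),
        "lost": sorted(before - after),
        "gained": sorted(after - before),
    }


# Hypothetical invariant reports for two versions of the same procedure.
old = ["n >= 0", "i >= 0", "s = sum(b)"]
new = ["n >= 0", "i >= 0", "s >= 0"]
report = compare_invariants(old, new)
print(report)
```

A non-empty "lost" list points the programmer at properties the modified code may no longer guarantee.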

Evaluation Overview
Asserting Daikon’s Invariant Detection
Performance Evaluation
Stability Evaluation

Asserting Daikon’s Invariant Detection
Simple accuracy evaluation of Daikon.
A sample program was taken from The Science of Programming, the “gold standard” of invariant identification.
The program had documented precondition, postcondition, and loop invariant specifications.
Daikon reproduced all documented specifications plus some additional invariants:
Invariants erroneously omitted from the documentation
Information about the test suite
Extraneous (redundant) invariants

Performance Evaluation
The Siemens replace program is run with varying numbers of test cases and variables.
Most important factor: the number of variables over which invariants are checked. This is not the total number of program variables; rather, it is the number of variables in scope at a program point.
Invariant detection time grows quadratically with this factor.
Additionally, invariant detection time grows linearly with test suite size.
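
A simple counting argument is consistent with the quadratic growth reported here, under the assumption (made only for this illustration) that the cost is dominated by checking invariants over pairs of variables: the number of pairs among n variables is n(n - 1)/2.

```python
from math import comb


def variable_pairs(n):
    """Number of unordered pairs among n variables in scope."""
    return comb(n, 2)


for n in (5, 10, 20, 40):
    print(n, variable_pairs(n))
```

Doubling the number of variables in scope roughly quadruples the number of pairs to examine.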

Performance Evaluation

Stability Evaluation Number of test cases affects different types of invariants in different ways: Note that the number of identical unary invariants does not vary much as the number of test cases is increased. However, the number of differing unary invariants varies considerably.

Related Work (1/2)
Static Approaches to Inferring Invariants
Operate on program text, not test runs (symbolic execution) [Hoare69].
Advantages: Reported invariants are true for any program run (but not necessarily exhaustive). Theoretically, static approaches can detect all sound invariants if a program is run to convergence.
Limitations: Omit properties that are true but uncomputable. Pointer manipulation is impossible to approximate.

Related Work (2/2)
Dynamic Approaches to Inferring Invariants
Event traces [Blum93]. Uses a state machine instead of an AST. Advantage: lower data storage requirements.
Runtime switches based on user-inserted assert statements (Value Profiling) [Hanson93].

Limitations (1/2)
Accuracy of inferred invariants depends on the quality and completeness of the test cases. Additional test cases could provide data that leads to additional invariants being inferred. Additionally, invariants may hold only for the cases in the test suite.
Daikon produces gigabytes of trace data, even while analyzing trivial programs. The initial prototype implementation ran out of memory when testing 5,542 test cases.

Limitations (2/2)
The instrumenter, and therefore Daikon, is currently limited to C, Java, and Lisp.
Daikon does not yet follow arbitrary-length paths through recursive structures.
Daikon cannot compute invariants such as linear relationships over numerous variables (more than 3).
Instrumenting the program by modifying object code (as opposed to source code) would allow for improved precision and portability. Exact memory locations could be traced. This approach has many more obstacles.

Future Work (1/2)
Ernst et al. planned to increase relevance and performance after this work by:
Reducing redundant invariants.
Removing relations between variables that can be statically proven to be unrelated.
Ignoring variables that have not been assigned since their last instrumentation.
Converting the implementation of Daikon from Python to C.
Checking fewer invariants (useful when the programmer wants to focus on a specific part of the code).

Future Work (2/2)
Since paper publication:
Additional front-end support: 2002: Perl (dfepl front-end implementation); 2005: C++ (Kvasir front-end implementation).
2003: Various performance improvements: data trace files are handled incrementally (the original implementation stored the entire trace file in memory).
2005: IDE plug-in support for Visual Studio.
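
The benefit of incremental trace handling can be sketched as follows: each sample updates a small running summary and is then discarded, so memory use does not grow with the length of the trace. The one-assignment-per-line trace format and the function name are assumptions for this example and do not reflect Daikon's actual trace file format.

```python
import io


def stream_ranges(lines):
    """Track (min, max) per variable while reading a trace one line at a time."""
    ranges = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, value = line.split("=")
        value = int(value)
        low, high = ranges.get(name, (value, value))
        ranges[name] = (min(low, value), max(high, value))
    return ranges


# Hypothetical trace in an assumed "name=value" format.
trace_file = io.StringIO("i=0\ns=0\ni=1\ns=3\ni=2\ns=4\n")
ranges = stream_ranges(trace_file)
print(ranges)
```

The same function accepts an open file object, since iterating over a file yields one line at a time.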

Questions???