Dynamic Invariant Discovery Modified from Tevfik Bultan’s original presentation.

Slides:



Advertisements
Similar presentations
Semantics Static semantics Dynamic semantics attribute grammars
Advertisements

A MATLAB function is a special type of M-file that runs in its own independent workspace. It receives input data through an input argument list, and returns.
50.530: Software Engineering Sun Jun SUTD. Week 10: Invariant Generation.
Michael Ernst, page 1 Learning and repair tools background Michael Ernst MIT Lab for Computer Science Joint work with Jake.
Chapter 12 Simple Regression
Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin Presented.
1 CSE1301 Computer Programming: Lecture 15 Flowcharts and Debugging.
272: Software Engineering Fall 2008 Instructor: Tevfik Bultan Lecture 16: Dynamic Invariant Discovery.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 23 Slide 1 Software testing.
Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael Ernst, Jake Cockrell, William Griswold, David Notkin Presented by.
IMSE Week 18 White Box or Structural Testing Reading:Sommerville (4th edition) ch 22 orPressman (4th edition) ch 16.
Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin Presented.
Michael Ernst, page 1 Improving Test Suites via Operational Abstraction Michael Ernst MIT Lab for Computer Science Joint.
CS 330 Programming Languages 09 / 16 / 2008 Instructor: Michael Eckmann.
Testing an individual module
1 CSE1301 Computer Programming: Lecture 15 Flowcharts, Testing and Debugging.
DAST 2005 Week 4 – Some Helpful Material Randomized Quick Sort & Lower bound & General remarks…
Programming Logic and Design, Introductory, Fourth Edition1 Understanding Computer Components and Operations (continued) A program must be free of syntax.
Handouts Software Testing and Quality Assurance Theory and Practice Chapter 5 Data Flow Testing
Ernst, ICSE 99, page 1 Dynamically Detecting Likely Program Invariants Michael Ernst, Jake Cockrell, Bill Griswold (UCSD), and David Notkin University.
MGT-491 QUANTITATIVE ANALYSIS AND RESEARCH FOR MANAGEMENT OSMAN BIN SAIF Session 14.
Ch. 8 & 9 – Linear Sorting and Order Statistics What do you trade for speed?
Software Testing Sudipto Ghosh CS 406 Fall 99 November 9, 1999.
272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 17: Code Mining.
Reverse Engineering State Machines by Interactive Grammar Inference Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin.
Presented By Dr. Shazzad Hosain Asst. Prof., EECS, NSU
CMSC 345 Fall 2000 Unit Testing. The testing process.
Dynamically Discovering Likely Program Invariants to Support Program Evolution Presented By: Wes Toland, Geoff Gerfin Michael D. Ernst, Jake Cockrell,
1 ECE 453 – CS 447 – SE 465 Software Testing & Quality Assurance Instructor Kostas Kontogiannis.
Chapter 3 (Part 3): Mathematical Reasoning, Induction & Recursion  Recursive Algorithms (3.5)  Program Correctness (3.6)
© by Kenneth H. Rosen, Discrete Mathematics & its Applications, Sixth Edition, Mc Graw-Hill, 2007 Chapter 4 (Part 3): Mathematical Reasoning, Induction.
Agenda Introduction Overview of White-box testing Basis path testing
Basic Concepts in Number Theory Background for Random Number Generation 1.For any pair of integers n and m, m  0, there exists a unique pair of integers.
1 Software Testing. 2 Path Testing 3 Structural Testing Also known as glass box, structural, clear box and white box testing. A software testing technique.
Introduction to Programming (in C++) Loops Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC.
The Daikon system for dynamic detection of likely invariants MIT Computer Science and Artificial Intelligence Lab. 16 January 2007 Presented by Chervet.
The Selection Problem. 2 Median and Order Statistics In this section, we will study algorithms for finding the i th smallest element in a set of n elements.
02/12/2014 Presenter: Yuanhang Wang Instructor: Christoph Csallner 1 Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael.
Coverage Criteria for Testing of Object Interactions in Sequence Diagrams Atanas (Nasko) Rountev Scott Kagan Jason Sawin Ohio State University.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Unit-1 Introduction Prepared by: Prof. Harish I Rathod
Dynamically Discovering Likely Program Invariants All material in this presentation is derived from documentation online at the Daikon website,
CS Data Structures I Chapter 2 Principles of Programming & Software Engineering.
1 Program Testing (Lecture 14) Prof. R. Mall Dept. of CSE, IIT, Kharagpur.
1 Test Selection for Result Inspection via Mining Predicate Rules Wujie Zheng
Chapter 4 Controlling Execution CSE Objectives Evaluate logical expressions –Boolean –Relational Change the flow of execution –Diagrams (e.g.,
Software Engineering1  Verification: The software should conform to its specification  Validation: The software should do what the user really requires.
CS451 Lecture 10: Software Testing Yugi Lee STB #555 (816)
Static Techniques for V&V. Hierarchy of V&V techniques Static Analysis V&V Dynamic Techniques Model Checking Simulation Symbolic Execution Testing Informal.
Higher Computing Science 2016 Prelim Revision. Topics to revise Computational Constructs parameter passing (value and reference, formal and actual) sub-programs/routines,
1 Phase Testing. Janice Regan, For each group of units Overview of Implementation phase Create Class Skeletons Define Implementation Plan (+ determine.
1 Test Coverage Coverage can be based on: –source code –object code –model –control flow graph –(extended) finite state machines –data flow graph –requirements.
CSC3315 (Spring 2009)1 CSC 3315 Languages & Compilers Hamid Harroud School of Science and Engineering, Akhawayn University
STROUD Worked examples and exercises are in the text Programme 10: Sequences PROGRAMME 10 SEQUENCES.
1 CSE1301 Computer Programming: Lecture 16 Flow Diagrams and Debugging.
Loops ( while and for ) CSE 1310 – Introduction to Computers and Programming Alexandra Stefan 1.
Verification vs. Validation Verification: "Are we building the product right?" The software should conform to its specification.The software should conform.
Software Testing and QA Theory and Practice (Chapter 5: Data Flow Testing) © Naik & Tripathy 1 Software Testing and Quality Assurance Theory and Practice.
Handouts Software Testing and Quality Assurance Theory and Practice Chapter 4 Control Flow Testing
Compiler Construction (CS-636)
Structural testing, Path Testing
Chapter 10 Verification and Validation of Simulation Models
UNIT-4 BLACKBOX AND WHITEBOX TESTING
Test Case Test case Describes an input Description and an expected output Description. Test case ID Section 1: Before execution Section 2: After execution.
Quantitative Reasoning
CSE 1020:Software Development
The Selection Problem.
50.530: Software Engineering
UNIT-4 BLACKBOX AND WHITEBOX TESTING
Software Testing and QA Theory and Practice (Chapter 5: Data Flow Testing) © Naik & Tripathy 1 Software Testing and Quality Assurance Theory and Practice.
Presentation transcript:

Dynamic Invariant Discovery Modified from Tevfik Bultan’s original presentation

Specifications Specification techniques help software developers document design decisions at a higher level of abstraction than the code –Unfortunately software developers do not like to write them! Developers may believe that the cost/benefit ratio is not attractive In practice, many systems have no specifications or even comments

Reverse Engineering Reverse engineering is the process of analyzing a subject system –to identify the system’s components and their inter-relationships, –to create representations of the system in another form or at a higher level of abstraction Examples –Producing call graphs or control flow graphs from the source code –Generating class diagrams from the source code Two types of reverse engineering –Redocumentation: the creation or revision of a semantically equivalent representation within the same relative abstraction layer –Design recovery: involves identifying meaningful higher level abstractions beyond those obtained directly by examining the system itself

Reverse Engineering The main goal is to help with the program comprehension Reverse engineering tries to compensate for missing documentation

Dynamically Discovering Likely Invariants Today I will talk about another reverse engineering approach References –Dynamically Discovering Likely Program Invariants to Support Program Evolution, M.D. Ernst, J. Cockrell, W.G. Griswold, D. Notkin, ICSE 1999 There is also a tool which implements the techniques called Daikon –works on C, C++, Java, Lisp –Can be downloaded from

Dynamically Discovering Likely Invariants The main idea is to discover likely program invariants from a given program In this work, an ``invariant’’ means an assertion that holds at a particular program point (not necessarily all program points) They discover likely program invariants –there is no guarantee that the discovered invariants will hold for all executions

Dynamically Discovering Likely Invariants Discovered properties are not stated in any part of the program –They are discovered by monitoring the execution of the program on a set of inputs (a test set) –Discovered properties held for all the inputs in the test set –No guarantee of soundness or completeness

An Example An example from “The Science of Programming,” by Gries, 1981 –One of the classic references on programming using assertions, Hoare Logic, weakest preconditions, etc. Program // Sum array b of length n into variable s i = 0; s = 0; while (i != n) { s = s + b[i]; i = i+1; } Precondition: n  0 Postcondition: s = (  j : 0  j  n : b[j]) Loop invariant: 0  i  n and s = (  j : 0  j  i : b[j])

An Example The test set used to discover invariants has 100 randomly-generated arrays –Length is uniformly distributed from 7 to 13 –Elements are uniformly distributed from –100 to 100 Daikon discovers invariants by –running the program on this test set –monitoring the values of the variables

Discovered Invariants :::ENTER100 samples N = size(B)(7 values) N in [7..13](7 values) B(100 values) All elements >= -100(200 values) These are the assertions that hold at the entry to the procedure –likely preconditions The invariant in the box implies the precondition of the original program (it is a stronger condition that implies the precondition that N is non-negative) Size of the test set Number of distinct values for N

Discovered Invariants :::EXIT100 samples N = I = orig(N) = size(B)(7 values) B = orig(B)(100 values) S = sum(B) (96 values) N in [7..13](7 values) B(100 values) All elements >= -100(200 values) These are the assertions that hold at the procedure exit –likely postconditions Note that orig(B) corresponds to Old.B in contracts

Discovered Invariants :::LOOP1107 samples N = size(B)(7 vallues) S = sum(B[0..I-1]) (452 values) N in [7..13](7 values) I in [0..13](14 values) I <= N(77 values) B(100 values) All elements in [ ](200 values) sum(B) in [ ](96 values) B[0] nonzero in [ ](79 values) B[-1] in [ ](80 values) B[0..I-1](985 values) All elements in [ ](200 values) N != B[-1](99 values) B[0] != B[-1](100 values) These are the assertions that hold at the loop entry and exit –likely loop invariants Means last element

A Different Test Set Instead of using a uniform distribution for the length and the contents of the array an exponential distribution is used The expected values for the array lengths and the element values are same for both test sets

Discovered Invariants :::ENTER100 samples N = size(B)(24 values) N >= 0 (24 values)

Discovered Invariants :::EXIT100 samples N = I = orig(N) = size(B)(24 vallues) B = orig(B)(96 values) S = sum(B) (95 values) N >= 0(24 values)

Discovered Invariants :::LOOP1107 samples N = size(B)(24 vallues) S = sum(B[0..I-1]) (858 values) N in [0..35](24 values) I >= 0 (36 values) I <= N(363 values) B(96 values) All elements in [ ](784 values) sum(B) in [ ](95 values) B[0..I-1](887 values) All elements in [ ](784 values)

Dynamic Invariant Detection How does dynamic invariant generation work? 1.Run the program on a test set 2.Monitor the program execution 3.Look for potential properties that hold for all the executions

Dynamic Invariant Detection Instrument the program to write data trace files Run the program on a test set Offline invariant engine reads data trace files, checks for a collection of potential invariants (online version exists as well) InstrumentRun Detect Invariants Original Program Test set Data traces Invariants

Inferring Invariants There are two issues 1.Choosing which invariants to infer 2.Inferring the invariants Daikon infers invariants at specific program points, such as: –procedure entries –procedure exits –loop heads Daikon can only infer certain types of invariants –it has a library of invariant patterns –it can only infer invariants which match to these patterns

Trace Values Daikon supports two forms of data values –Scalar number, character, boolean –Sequence of scalars All trace values must be converted to one of these forms For example, an array A of tree nodes each with left and a right child would be converted into two arrays –A.left (containing the object IDs for the left children) –A.right

Invariant Patterns Invariants over any variable x (where a, b, c are computed constants) –Constant value: x = a –Uninitialized: x = uninit –Small value set: x  {a, b, c} variable takes a small set of values Invariants over a single numeric variable: –Range limits: x  a, x  b, a  x  b –Nonzero: x  0 –Modulus: x = a (mod b) –Nonmodulus: x  a (mod b) reported only if x mod b takes on every value other than a

Invariant Patterns Invariants over two numeric variables x, y –Linear relationship: y = ax + b –Ordering comparison: x  y, x  y, x  y, x  y, x = y, x  y –Functions: y = fn(x) or x = fn(y) where fn is absolute value, negation, bitwise complement –Invariants over x+y invariants over single numeric variable where x+y is substituted for the variable –Invariants over x-y

Invariant Patterns Invariants over three numeric variables –Linear relationship: z = ax + by + c, y=ax+bz+c, x=ay+bz+c –Functions z = fn(x,y) where fn is min, max, multiplication, and, or, greatest common divisor, comparison, exponentiation, floating point rounding, division, modulus, left and right shifts All permutations of x, y, z are tested (three permutations for symmetric functions, 6 permutations for asymmetric functions)

Invariant Patterns Invariants over a single sequence variable –Range: minimum and maximum sequence values (based on lexicographic ordering) –Element ordering: nondecreasing, nonincreasing, equal –Invariants over all sequence elements: such as each value in an array being nonnegative

Invariant Patterns Invariants over two sequence variables: x, y –Linear relationship: y = ax + b, elementwise –Comparison: x  y, x  y, x  y, x  y, x = y, x  y (based on lexicographic ordering) –Subsequence relationship: x is a subsequence of y –Reversal: x is the reverse of y Invariants over a sequence x and a numeric variable y –Membership: x  y

Inferring Invariants For each invariant pattern –determine the constants in the pattern –stop checking the invariants that are falsified For example, –For invariant pattern x  a we have to determine the constant a –For invariant pattern x = ay + bz + c we have to determine the constants a, b, c

Inferring Invariants Consider the invariant pattern: x = ay + bz + c Consider an example data trace for variables (x, y, z) (0,1,-7), (10,2,1), (10,1,3), (0, 0,-5), (3, 1, -4), (7, 1, 1),... Based on the first three values for x, y, z in the trace we can figure out the constants a, b, and c 0 = a -7b +c 10 = 2a + b +c 10 = a +3b + c If you solve these equations for a, b, c you get: a=2, b=1, c=5 The next two tuples (0, 0,-5), (3, 1, -4) confirm the invariant However the last trace value (7, 1, 1) kills this invariant –Hence, it is not checked for the remaining trace values and it is not reported as an invariant

Inferring Invariants Determining the constants for invariants are not too expensive –For example three linearly independent data values are sufficient for figuring out the constants in the pattern x = ay + bz + c –there are at most three constants in each invariant pattern Once the constants for the invariants are determined, checking that an invariant holds for each data value is not expensive –Just evaluate the expression and check for equality

Cost of Inferring Invariants The cost of inferring invariants increases as follows: –quadratic in the number of variables at a program point (linear in the number of invariants checked/discovered) –linear in the number of samples or values (test set size) –linear in the the number of program points Example: –For a trace with 10,000 calls, 70 variables, instrumented entry and exit, inference could take a few minutes per procedure

Invariant Confidence Not all unfalsified invariants should be reported There may be a lot of irrelevant invariants which may just reflect properties of the test set If a lot of spurious invariants are reported the output may become unreadable Improving (increasing) the test set would reduce the number of spurious invariants

Invariant Confidence For each detected invariant Daikon computes the probability that such a property would appear by chance in a random input –If that probability is smaller than a user specified confidence parameter, then the property is reported Daikon assumes a distribution and performs a statistical test –It checks the probability that the observed values for the detected invariant were generated by chance from the distribution –If that probability is very low, then the invariant is reported

Invariant Confidence As an example, consider an integer variable x which takes values between r/2 and -r/2-1 Assume that x  0 for all test cases If the values of x is uniformly distributed between r/2 and -r/2-1, then the probability that x is not 0 is 1 – 1/r Given s samples the probability x is never 0 is (1-1/r) s If this probability is less than a user defined confidence level then x  0 is reported as an invariant

Derived Variables Looking for invariants only on variables declared in the program may not be sufficient to detect all interesting invariants Daikon adds certain derived variables (which are actually expressions) and also detects invariants on these derived variables

Derived Variables Derived from any sequence s: –Length: size(s) –Extremal elements: s[0], s[1], s[size(s)-1], s[size(s)-2] Daikon uses s[-1] to denote s[size(s)-1] and s[-2] to denote s[size(s)-2] Derived from any numeric sequence s: –Sum: sum(s) –Minimum element: min(s) –Maximum element: max(s)

Derived Variables Derived from any sequence s and any numeric variable i –Element at the index: s[i], s[i-1] –Subsequences: s[0..i], s[0..i-1] Derived from function invocations: –Number of calls so far

Dynamically Detecting Invariants: Many Uses Useful reengineering tool –Redocumentation Can be used as a coverage criterion during testing –Are all the values in a range covered by the test set? Can be helpful in detecting bugs –Found bugs in an existing program in a case study Can be useful in maintaining invariants –Prevents introducing bugs, programmers are less likely to break existing invariants