50.530: Software Engineering Sun Jun SUTD. Week 10: Invariant Generation.

Problem {pre} while B do program {post} holds if there exists an invariant inv such that the following are satisfied: (1) pre => inv, (2) {inv && B} program {inv}, (3) inv && !B => post, and the loop terminates. How do we find inv so as to complete the proof?
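The three obligations can be tried out mechanically on a small loop. The following is a minimal sketch, not a prover: it only searches a finite set of states for counterexamples, it does not check termination, and the loop, the predicates and the name check_loop_proof are all hypothetical illustrations.

```python
from itertools import product

OBLIGATIONS = ("pre => inv", "{inv && B} body {inv}", "inv && !B => post")

def check_loop_proof(pre, inv, guard, body, post, states):
    """Search `states` for a counterexample to each of the three obligations.

    Returns a dict mapping each obligation to the first counterexample found,
    or to None when no counterexample exists among `states`.
    """
    result = {name: None for name in OBLIGATIONS}
    for s in states:
        if result[OBLIGATIONS[0]] is None and pre(s) and not inv(s):
            result[OBLIGATIONS[0]] = s
        if result[OBLIGATIONS[1]] is None and inv(s) and guard(s) and not inv(body(s)):
            result[OBLIGATIONS[1]] = s
        if result[OBLIGATIONS[2]] is None and inv(s) and not guard(s) and not post(s):
            result[OBLIGATIONS[2]] = s
    return result

# Hypothetical loop: {x == 0 && n >= 0} while (x < n) x = x + 1; {x == n}
states = [{"x": x, "n": n} for x, n in product(range(-5, 6), repeat=2)]
pre = lambda s: s["x"] == 0 and s["n"] >= 0
guard = lambda s: s["x"] < s["n"]
body = lambda s: {"x": s["x"] + 1, "n": s["n"]}
post = lambda s: s["x"] == s["n"]

good = check_loop_proof(pre, lambda s: s["x"] <= s["n"], guard, body, post, states)
weak = check_loop_proof(pre, lambda s: True, guard, body, post, states)
```

With inv chosen as x <= n no counterexample is found, whereas the trivial invariant true passes (1) and (2) but fails (3): it is an invariant, just not one strong enough for the proof.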

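Big View [Figure, built up over three slides: the states satisfying pre lie inside inv (pre => inv); one iteration started in inv && B leads back into inv ({inv && B} program {inv}); the part of inv where !B holds lies inside post (inv && !B => post).]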

Static/Dynamic Analysis Static analysis: infer (loop) invariants from the source code without executing the program (treating the program as a mathematical formula). Dynamic analysis: infer (loop) invariants from testing results. – It’s about learning something about the invariants and making guesses!

Exercise 1 x = 0.1; y = 0; while (x < 2) { k = 4 - x*x; y = sqrt(4-k); x += 0.001; } if (y < 0) { error(); } Show that error() is never reached.

DYNAMICALLY DISCOVERING LIKELY PROGRAM INVARIANTS TO SUPPORT PROGRAM EVOLUTION Ernst et al. IEEE Transactions on Software Engineering 2001

The Approach Seem familiar?

Instrumentation Instrument the beginning and end of each method and the head of each loop. Daikon supports only two forms of data: scalar numbers (including characters and Booleans) and sequences of scalars. Other values are converted into one of these forms.

Example: Instrumentation Original: public int sumUp (int[] B, int N) { int i = 0; int s = 0; while (i != N) { s = s + B[i]; i = i + 1; } return s; } Instrumented: public int sumUp (int[] B, int N) { //add code to output values int i = 0; int s = 0; while (i != N) { //add code to output values s = s + B[i]; i = i + 1; } //add code to output values return s; }
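A minimal Python sketch of what the "add code to output values" comments could expand to. The trace format, the function names and the seeded random test suite are assumptions made for illustration and are not Daikon's actual front end; the loop adds b[i] before incrementing i so that every index stays in range.

```python
import random

def sum_up(b, n, trace):
    """Sum the first n elements of b, recording samples at three program points."""
    trace.append(("ENTER", {"N": n, "B": list(b)}))
    i = 0
    s = 0
    while True:
        # Loop head: sampled before every evaluation of the loop condition.
        trace.append(("LOOP", {"N": n, "B": list(b), "i": i, "s": s}))
        if i == n:
            break
        s = s + b[i]
        i = i + 1
    trace.append(("EXIT", {"N": n, "B": list(b), "i": i, "s": s}))
    return s

def run_test_suite(num_tests=100, seed=0):
    """Run sum_up on random arrays of length 7 to 13 with elements in [-100, 100]."""
    rng = random.Random(seed)
    trace = []
    for _ in range(num_tests):
        n = rng.randint(7, 13)
        b = [rng.randint(-100, 100) for _ in range(n)]
        sum_up(b, n, trace)
    return trace

trace = run_test_suite()
```

Each entry of trace is one sample: a program point together with the values of the variables in scope there, which is the raw material the invariant detector works on.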

Example: Testing The test suite consists of 100 randomly generated arrays of length 7 to 13, in which each element is a random number in the range -100 to 100. From these runs Daikon learns, in turn, a precondition, a postcondition, and loop invariants. [Figures: Daikon output for the three program points; not included in this transcript.]

Discussion What invariants should we infer?

What Invariants to Infer?
Invariants over any variable
– Constant value, e.g., x = a
– Uninitialized, e.g., x = uninit
Invariants over a single numeric variable
– Range limit, e.g., x >= a, x <= b, a <= x <= b
– Nonzero, e.g., x != 0
– Modulus, e.g., x mod b = a
– Non-modulus, e.g., x mod b != a

What Invariants to Infer?
Invariants over two numeric variables
– Linear relationship, e.g., y = ax+b
– Ordering comparison: x < y, x <= y, x > y, x >= y, x = y, x != y
– Functions, e.g., y = fn(x) or x = fn(y), where fn is one of Python’s built-in unary functions such as absolute value, negation, etc.
– Invariants over x+y: any invariant from the list of invariants over a single numeric variable, such as (x+y) mod b = a
– Invariants over x-y: as for x+y

What Invariants to Infer?
Invariants over three numeric variables
– Linear relationship, e.g., z = ax+by+c
– Functions, e.g., z = fn(x, y) (and likewise for the other permutations of x, y, z), where fn is one of Python’s built-in binary functions such as min, max, GCD, and, or, etc.
How about four variables and more?

What Invariants to Infer?
Invariants over a single sequence variable
– Range: minimum and maximum sequence values, ordered lexicographically; for instance, this can indicate the range of string or array values
– Element ordering: whether the elements of each sequence are non-decreasing, non-increasing, or equal
– Invariants over all the sequence elements (treated as a single large collection)

What Invariants to Infer?
Invariants over two sequence variables
– Linear relationship: y = ax + b, element-wise
– Comparison: x < y, x <= y, x > y, x >= y, x = y, x != y, performed lexicographically
– Subsequence relationship: x is a subsequence of y or vice versa
– Reversal: x is the reverse of y
Invariants over a sequence and a numeric variable
– Membership: i in s

What Invariants to Infer?
Derived variables
– Derived from any sequence s: length, size(s); extremal elements, s[0], s[1], s[size(s)-1], s[size(s)-2]
– Derived from any numeric sequence s: sum, sum(s); minimum element, min(s); maximum element, max(s)
– Derived from any sequence s and any numeric variable i: element at the index, s[i], s[i-1]; subsequences, s[0..i], s[0..i-1]
– Derived from function invocations: number of calls so far

Algorithm Collect samples at a program point (through instrumentation and testing) For all variables, test every potential invariant (defined above) Remove an invariant if it is violated by a sample.
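The three steps can be sketched in a few lines of Python. This is an illustrative toy, not Daikon: the template set is a small hypothetical subset of the invariant list, and the names candidate_invariants and detect_invariants are invented for the example.

```python
from itertools import combinations

def candidate_invariants(variables, samples):
    """Instantiate a small, hypothetical set of templates over the variables."""
    cands = {}
    for v in variables:
        cands[f"{v} != 0"] = lambda s, v=v: s[v] != 0
        cands[f"{v} >= 0"] = lambda s, v=v: s[v] >= 0
        first = samples[0][v]  # constant-value template, seeded by the first sample
        cands[f"{v} == {first}"] = lambda s, v=v, c=first: s[v] == c
    for x, y in combinations(variables, 2):
        cands[f"{x} <= {y}"] = lambda s, x=x, y=y: s[x] <= s[y]
        cands[f"{x} == {y}"] = lambda s, x=x, y=y: s[x] == s[y]
        cands[f"{y} <= {x}"] = lambda s, x=x, y=y: s[y] <= s[x]
    return cands

def detect_invariants(variables, samples):
    """Keep exactly the candidates that no sample violates."""
    cands = candidate_invariants(variables, samples)
    for sample in samples:
        cands = {name: p for name, p in cands.items() if p(sample)}
    return sorted(cands)

# Samples at the loop head of: i = 0; while (i != n) { i = i + 1; }  run with n = 3
samples = [{"i": i, "n": 3} for i in range(4)]
likely = detect_invariants(["i", "n"], samples)
```

For these samples the survivors are i <= n, i >= 0, n != 0, n == 3 and n >= 0. The last three hold only because every run used n = 3, which illustrates why the result depends on the test suite and why the reported invariants are only "likely".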

Exercise 2 int inc(int *x, int y) { *x += y; return *x; } Given the program and the collected data, what are the invariants?

Filtering Invariants Too many potential invariants could discourage programmers from looking through them. A better test suite could help. Daikon also filters invariants by computing a confidence for each one: assuming random input, what is the chance that the invariant would appear to hold purely by coincidence?

Invariant Confidence: Example A range invariant for a numeric variable, such as a <= x <= b, is reported only if the limits appear to be non-coincidental, that is, if several values near the extremes all appear about as often as would be expected (assuming a uniform distribution).

Invariant Confidence: Example Suppose the reported values for variable x fall in a range of size r that includes 0, and that x != 0 holds for all test cases. Assuming uniformly distributed values, the probability that x is never 0 purely by chance is (1-1/r)^n, where n is the number of samples. If this probability is less than a user-defined threshold, x != 0 is reported; otherwise it is discarded as possibly coincidental.
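A small numeric illustration, reading (1-1/r)^n as the probability that n uniformly random samples from a range of r values all miss 0, so that x != 0 counts as justified only when this chance is small. The threshold of 0.01 and the function names are assumptions made for the example.

```python
def chance_never_zero(r, n):
    """Probability that n uniform samples from a range of r values never hit 0."""
    return (1 - 1 / r) ** n

def report_nonzero(r, n, threshold=0.01):
    """Report x != 0 only if the coincidence probability is below the threshold."""
    return chance_never_zero(r, n) < threshold

few = chance_never_zero(100, 3)      # about 0.97: very likely a coincidence
many = chance_never_zero(100, 1000)  # below 0.0001: unlikely to be a coincidence
```

With r = 100, three samples that all avoid 0 say almost nothing, whereas a thousand such samples make a coincidence implausible, so only the second case would be reported.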

Scalability Daikon’s invariant detection time is:
– potentially cubic in the number of variables in scope at a program point (not the total number of variables in the program);
– linear in the number of samples (the number of times a program point is executed);
– linear in the number of instrumented program points.

Case Study: Invariant Stability Warning: One program!

Case Study: Invariant Stability Conclusion: Stable?

More Invariants, Better Programs? Experiment setup – 424 student programs from a single assignment for CSE 142 at University of Washington – The quality of the programs is measured by their scores. – Invariant detection was performed over 200 executions of each program, resulting in 3 to 28 invariants per program. Conclusion: no correlation between the number of invariants and the score.

Discussion For invariant generation, shall we use random test case generation or systematic test case generation? How do we measure the usefulness of the generated invariants? How do we test whether a generated invariant is really a loop invariant? How do we identify the useful templates for invariants? Can we discover disjunctive invariants?

UNBOUNDED SYMBOLIC EXECUTION FOR PROGRAM VERIFICATION Jaffar et al. RV’11

Motivation Symbolic execution doesn’t handle loops well because of path explosion. Loop invariants are essential to handle loops. Idea: learn loop invariants through symbolic execution.

Iterative Deepening Step 1: execute path L0,1,4,5 symbolically: x = 0 && //from L0 x >= n && //from L1 x < 0 //from L4 The path condition is unsatisfiable. Interpolant at L4: x >= 0 L0 x = 0; L1 while (x < n) { L2 x++; L3 } L4 if (x < 0) { L5 error(); L6 }

Iterative Deepening Step 2: check if x >= 0 is a loop invariant by checking whether the following is satisfiable. x >= 0 && x < n && x1 = x+1 && x1 < 0 No! Thus x >= 0 is a loop invariant. Complete the proof with Hoare logic rules. L0 x = 0; L1 while (x < n) { L2 x++; L3 } L4 if (x < 0) { L5 error(); L6 }
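The satisfiability query of Step 2 would normally go to an SMT solver. As a rough stand-in, the sketch below searches a small integer range for a model; the range, the helper find_model and the second query are assumptions for illustration, and a bounded search is evidence rather than a proof.

```python
from itertools import product

def find_model(formula, domain, arity):
    """Return the first tuple from domain^arity satisfying formula, or None."""
    for values in product(domain, repeat=arity):
        if formula(*values):
            return values
    return None

domain = range(-20, 21)

# x >= 0 && x < n && x1 = x+1 && x1 < 0: can one iteration break x >= 0?
step_query = lambda x, n, x1: x >= 0 and x < n and x1 == x + 1 and x1 < 0
model = find_model(step_query, domain, 3)

# Dropping the assumption x >= 0 shows the search does find models when they exist.
unconstrained = lambda x, n, x1: x < n and x1 == x + 1 and x1 < 0
witness = find_model(unconstrained, domain, 3)
```

No model exists for the first query, which matches the slide's conclusion that x >= 0 is preserved by the loop body; the second query is satisfiable because x may start out negative.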

Another Look Initially, L0 x = 0; L1 while (x < n) { L2 x++; L3 } L4 if (x < 0) { L5 error(); L6 } L0 L1 L2 L3 L4 error x>=n x<0 x<n

Another Look With the loop invariant, L0 x = 0; L1 while (x < n) { L2 x++; L3 } L4 if (x < 0) { L5 error(); L6 } L0 L1 L2 L3 L4 error x>=n x<0 x<n x>=0 This serves as a proof that error is not reachable. Finding a loop invariant amounts to finding this label at the loop head!

Iterative Deepening L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error(); Is error reachable? L0 L1 L2 L3 L6 error new=old lock=0 new!=old L4 L5 What label shall we generate at L1?

Iterative Deepening Step 1: execute path L0,1,6,7 symbolically lock=0&&new=old+1&& //from L0 new==old && //from L1 lock==0 //from L6 Interpolant at L6: lock!=0 Is lock!=0 an invariant during the loop? L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error();

Iterative Deepening Step 1: execute path L0,1,6,7 symbolically lock=0&&new=old+1&& //from L0 new=old && //from L1 lock==0 //from L6 What is the interpolant at L1? That is, A is lock=0&&new=old+1 B is new=old&&lock=0 L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error();

Ideal Case The interpolant at L1 is new!=old || lock != 0 Exercise: Is this a loop invariant strong enough to prove that error is not possible? L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error(); Recall that existing techniques only return conjunctive interpolants. The interpolant found at L1 is thus likely to be a single conjunct such as new!=old (lock!=0 on its own is not implied by A, so it cannot be an interpolant at L1), and new!=old is not a loop invariant.

Iterative Deepening L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error(); Step 2: execute path L0,1,2,3,5,1,6,7 symbolically lock=0&&new=old+1 && //from L0 new!=old && //from L1 lock1=1&&old1=new && //from L2 new=old1 && //from L1 lock1==0 //from L6 Interpolant at L1?

Iterative Deepening L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error(); Step 2: execute path L0,1,2,3,4,5,1,6,7 symbolically lock=0&&new=old+1 && //from L0 new!=old && //from L1 lock1=1&&old1=new && //from L2 lock2=0&&new1=new+1 && //from L4 new1=old1 && //from L1 lock2==0 //from L6 Interpolant at L1? It doesn’t help to execute more iterations.

Alternative Approach L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error(); L0 L2 L1 L3 L5 L1’ lock=0&&new=old+1 lock=1&&old=new Assume there is a label Inv at L1 which is a loop invariant; The following is true. lock=0&&new=old+1 => Inv lock=1&&old=new => Inv

Alternative Approach L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error(); L0 L2 L1 L3 L5 L1’ lock=0&&new=old+1 lock=1&&old=new L4 L5L1’ lock=0&&new=old+1 Ideally, we let Inv be (lock=0&&new=old+1) || (lock=1&&old=new) || (lock=0&&new=old+1) Exercise: check if Inv is indeed a loop invariant.

Invariant Validation L0 L1 L2 L3 L6 error new=old lock=0 new!=old L4 L5 (lock=0&&new=old+1) || (lock=1&&old=new) Since it is a loop invariant, we can label L1 now. Is it strong enough?
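The candidate labels for L1 discussed so far can be compared with a bounded check. The sketch below is an assumption-laden stand-in for a solver: lock ranges over {0, 1}, new and old over a small integer range, and the names successors and check are hypothetical.

```python
from itertools import product

def successors(lock, new, old):
    """One loop iteration (L2 to L5); the branch at L3 is nondeterministic."""
    lock, old = 1, new             # L2: lock=1; old=new
    not_taken = (lock, new, old)   # L3 branch skipped
    taken = (0, new + 1, old)      # L4: lock=0; new++
    return [not_taken, taken]

def check(inv, bound=6):
    """Return (holds initially, preserved by the body, excludes error at exit)."""
    vals = range(-bound, bound + 1)
    states = list(product((0, 1), vals, vals))
    init_ok = all(inv(0, old + 1, old) for old in vals)  # L0: lock=0; new=old+1
    inductive = all(inv(*t)
                    for lock, new, old in states
                    if inv(lock, new, old) and new != old
                    for t in successors(lock, new, old))
    safe = all(lock != 0
               for lock, new, old in states
               if inv(lock, new, old) and new == old)
    return init_ok, inductive, safe

candidates = {
    "new!=old": lambda lock, new, old: new != old,
    "new!=old || lock!=0": lambda lock, new, old: new != old or lock != 0,
    "(lock=0&&new=old+1) || (lock=1&&old=new)":
        lambda lock, new, old: (lock == 0 and new == old + 1) or (lock == 1 and old == new),
}
results = {name: check(inv) for name, inv in candidates.items()}
```

Within these bounds new!=old fails the inductiveness check (the iteration that skips L4 ends with old=new), while both disjunctive candidates pass all three checks.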

An Ideal Algorithm
1. Identify the paths which end at the loop head for the first time.
2. Test whether the disjunction of the path conditions is a loop invariant strong enough for the proof.
3. If so, terminate.
4. Otherwise, identify the paths which end at the loop head for the second time. …

Discussion int i = 0; while (i < 1000) { i++; } First time: i = 0; Second time: i = 1; Third time: i = 2; … How do we make the jump to i <= 1000?
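The ideal algorithm and its weakness on the counting loop can be reproduced with explicit state sets standing in for path conditions. This is a sketch under assumptions: states are concrete values rather than symbolic formulas, the lock program is tracked only through the pair (lock, new - old), and the function name and max_rounds cutoff are hypothetical.

```python
def ideal_algorithm(init, guard, step, safe, max_rounds=2000):
    """Accumulate loop-head states visit by visit until they form a safe invariant.

    Returns (rounds, states) on success, or None if max_rounds is exhausted.
    """
    reached = set()
    frontier = set(init)
    for rounds in range(1, max_rounds + 1):
        reached |= frontier  # disjunction of all labels seen so far
        post = {t for s in reached if guard(s) for t in step(s)}
        inductive = post <= reached
        strong = all(safe(s) for s in reached if not guard(s))
        if inductive and strong:
            return rounds, reached
        # States that reach the loop head on the next visit.
        frontier = {t for s in frontier if guard(s) for t in step(s)}
    return None

# Lock program, abstracted to (lock, new - old); error needs lock == 0 at exit.
lock_result = ideal_algorithm(
    init={(0, 1)},
    guard=lambda s: s[1] != 0,
    step=lambda s: [(1, 0), (0, 1)],
    safe=lambda s: s[0] != 0,
)

# Counting loop: int i = 0; while (i < 1000) { i++; }
count_result = ideal_algorithm(
    init={0},
    guard=lambda i: i < 1000,
    step=lambda i: [i + 1],
    safe=lambda i: i <= 1000,
)
```

The lock program closes after two visits to the loop head, but the counting loop needs 1001 visits before the accumulated set stops growing. Making the jump to i <= 1000 requires generalization, which plain enumeration does not provide.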

Another Look at Daikon L0 L1 L2 L3 L6 error new=old lock=0 new!=old L4 L5 {(lock=0,old=*, new=*+1), (lock=1,old=*+1, new=*+1), …} Pre-defined abstraction lock=0 new=old new=old+1 Can Daikon find the right invariant in this case?

New Approach: USE Step 1: execute symbolically L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error(); L0 L6 L1 L7

New Approach: USE Step 2: Compute interpolant L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error(); L0 L6 L1 L7 lock!=0

New Approach: USE Step 3: Label loop head L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error(); L0 L6 L1 L7 {lock=0, new=old+1} lock!=0

New Approach: USE Step 4: abstract loop head labels based on the new condition. The loop head L1 is visited with a different path with a new condition. Abstract the labels on L1 so that it is implied by the new condition. L0 L2 L1 L3 L5 L1’ lock=0&&new=old+1 lock=1&&old=new

New Approach: USE L0 L2 L1 L3 L5 L1’ lock=0&&new=old+1 lock=1&&old=new true Step 4: abstract loop head labels based on the new condition. Remove labels at L1 until the conjunction of the remaining labels is implied by the new condition Do we need to continue from L1’ given now it is stronger than an ancestor L1?
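Step 4 can be read as keeping exactly those labels that the new condition implies. The sketch below checks implication by bounded enumeration instead of a solver; the value ranges and the helper names are assumptions for illustration.

```python
from itertools import product

def implies(condition, label, bound=5):
    """Bounded check that every state satisfying condition also satisfies label."""
    vals = range(-bound, bound + 1)
    return all(label(lock, new, old)
               for lock, new, old in product((0, 1), vals, vals)
               if condition(lock, new, old))

def abstract_labels(labels, new_condition):
    """Drop every label the new condition does not imply; empty means `true`."""
    return {name: p for name, p in labels.items() if implies(new_condition, p)}

labels_at_L1 = {
    "lock=0": lambda lock, new, old: lock == 0,
    "new=old+1": lambda lock, new, old: new == old + 1,
}
condition_at_L1_prime = lambda lock, new, old: lock == 1 and old == new
remaining = abstract_labels(labels_at_L1, condition_at_L1_prime)
```

Neither lock=0 nor new=old+1 is implied by lock=1&&old=new, so no label remains and L1 is labelled true, as on the slide.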

New Approach: USE L0 L2 L1 L3 L4 L5 lock=0&&new=old+1 true Step 5: execute symbolically Since lock=0&&new=old+1 (at L1’) implies true (at L1), we stop. L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error(); L1’

USE: First Abstraction L0 L1 L2 L3 L6 error new=old lock=0 new!=old L4 L5 true Is error reachable or not based on this abstraction? Is this abstraction safe or not? It is safe iff, whenever error is not reachable based on this abstraction, error is not reachable in the actual program.

USE: Checking Running a DFS/BFS algorithm on this graph shows that error is reachable: L0 -> L1 -> L6 -> error. A counterexample based on the abstraction might not be a real counterexample! L0 L1 L2 L3 L6 error new=old lock=0 new!=old L4 L5 true

USE: Spuriousness Checking Running a DFS/BFS algorithm on this graph shows that error is reachable: L0 -> L1 -> L6 -> error. Symbolically execute the above path and conclude that it is spurious. L0 L1 L2 L3 L6 error new=old lock=0 new!=old L4 L5 true Why is it spurious?
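The two checks can be sketched as follows: a breadth-first search over the abstract graph, which ignores the edge conditions because L1 is labelled true, followed by a bounded search for values satisfying the path condition of the path found. The edge list is read off the diagram, and the bounded search replaces a solver; both are assumptions of this sketch.

```python
from collections import deque
from itertools import product

# Control-flow graph of the lock program as drawn on the slides.
EDGES = {
    "L0": ["L1"], "L1": ["L2", "L6"], "L2": ["L3"], "L3": ["L4", "L5"],
    "L4": ["L5"], "L5": ["L1"], "L6": ["error"], "error": [],
}

def bfs_path(start, goal):
    """Return a shortest path from start to goal in EDGES, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in EDGES[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def path_is_feasible(bound=5):
    """Path condition of L0,L1,L6,error: lock=0 && new=old+1 && new=old && lock=0."""
    vals = range(-bound, bound + 1)
    return any(lock == 0 and new == old + 1 and new == old
               for lock, new, old in product((0, 1), vals, vals))

abstract_path = bfs_path("L0", "error")
spurious = abstract_path is not None and not path_is_feasible()
```

The search returns the path L0, L1, L6, error, and no values satisfy its path condition because new=old+1 contradicts new=old. The counterexample therefore exists only in the abstraction.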

USE: Refinement The path L0,L1,L6,error is spurious. One (or more) loop heads in this path must be too abstract. Find an interpolant at the loop head (L1). L0 L1 L2 L3 L6 error new=old lock=0 new!=old L4 L5 true Path condition: lock=0&&new=old+1 && new=old && lock=0 Assume the interpolant found at L1 is: new!=old

USE: Refinement The path L0,L1,L6,error is spurious. One (or more) loop head in this path must be too abstract. Find an interpolant at the loop head (L1) L0 L6 error new=old lock=0 new!=old

USE: Re-explore Since the label at L1 has changed, we need to re-explore. This time, we can’t remove the label at L1. We continue instead. L0 L2 L1 L3 L5 L1’ new!=old lock=1&&old=new

USE: Re-explore Continue with L6; symbolic execution proves that error is not reachable along this path. L0 L2 L1 L3 L5 L1’ new!=old lock=1&&old=new L6 L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error();

USE: Re-explore Backtrack to L1’ and continue with L2; symbolic execution shows that this path is not feasible. L0 L2 L1 L3 L5 L1’ new!=old lock=1&&old=new L2’ L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error();

USE: Re-Explore L0 L2 L1 L3 L4 L5 lock=0&&new=old+1 Backtrack to L3, continue with L4,L5,L1. We can stop at L1’ because lock=0&&new=old+1 implies new!=old. L0 lock=0;new=old+1 L1 while (new!=old) { L2 lock=1;old=new; L3 if (*) { L4 lock=0;new++;} L5 }; L6 if (lock==0) L7 error(); L1’ new!=old

Recap: the USE Approach L0 L1 L2 L3 new!=old L4 L5 new!=old L1 L6 L2 L5 L1 subsumed by new!=old

Recap: the USE Approach This approach acknowledges the difficulty of finding (disjunctive) loop invariants and compensates for it with a combination of state-space exploration and abstraction refinement.

Case Study Iterative Deepening New Approach

Exercise 3 The path L0,L1,L6,error is spurious. One (or more) loop heads in this path must be too abstract. Find an interpolant at the loop head (L1). L0 L1 L2 L3 L6 error new=old lock=0 new!=old L4 L5 Path condition: lock=0&&new=old+1 && new=old && lock=0 What if the interpolant at L1 is: new=old+1?