Abstraction of Source Code

Slides:



Advertisements
Similar presentations
Technologies for finding errors in object-oriented software K. Rustan M. Leino Microsoft Research, Redmond, WA Lecture 1 Summer school on Formal Models.
Advertisements

Automated Theorem Proving Lecture 1. Program verification is undecidable! Given program P and specification S, does P satisfy S?
Abstraction of Source Code (from Bandera lectures and talks)
Semantics Static semantics Dynamic semantics attribute grammars
Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu NASA Ames Research Center.
Hoare’s Correctness Triplets Dijkstra’s Predicate Transformers
Rigorous Software Development CSCI-GA Instructor: Thomas Wies Spring 2012 Lecture 13.
1 Temporal Claims A temporal claim is defined in Promela by the syntax: never { … body … } never is a keyword, like proctype. The body is the same as for.
Copyright © 2006 Addison-Wesley. All rights reserved.1-1 ICS 410: Programming Languages Chapter 3 : Describing Syntax and Semantics Axiomatic Semantics.
ISBN Chapter 3 Describing Syntax and Semantics.
1 Semantic Description of Programming languages. 2 Static versus Dynamic Semantics n Static Semantics represents legal forms of programs that cannot be.
1/22 Programs : Semantics and Verification Charngki PSWLAB Programs: Semantics and Verification Mordechai Ben-Ari Mathematical Logic for Computer.
CS 355 – Programming Languages
1 Basic abstract interpretation theory. 2 The general idea §a semantics l any definition style, from a denotational definition to a detailed interpreter.
Constraint Logic Programming Ryan Kinworthy. Overview Introduction Logic Programming LP as a constraint programming language Constraint Logic Programming.
ESC Java. Static Analysis Spectrum Power Cost Type checking Data-flow analysis Model checking Program verification AutomatedManual ESC.
CS 267: Automated Verification Lectures 14: Predicate Abstraction, Counter- Example Guided Abstraction Refinement, Abstract Interpretation Instructor:
CS 330 Programming Languages 09 / 18 / 2007 Instructor: Michael Eckmann.
Describing Syntax and Semantics
Floyd Hoare Logic. Semantics A programming language specification consists of a syntactic description and a semantic description. Syntactic description:symbols.
Propositional Calculus Math Foundations of Computer Science.
Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu, October, 2001 Thesis Committee: Matthew Dwyer, Major Advisor David.
Systems Architecture I1 Propositional Calculus Objective: To provide students with the concepts and techniques from propositional calculus so that they.
Reading and Writing Mathematical Proofs
1 Automatic Refinement and Vacuity Detection for Symbolic Trajectory Evaluation Orna Grumberg Technion Haifa, Israel Joint work with Rachel Tzoref.
1 Inference Rules and Proofs (Z); Program Specification and Verification Inference Rules and Proofs (Z); Program Specification and Verification.
ISBN Chapter 3 Describing Semantics -Attribute Grammars -Dynamic Semantics.
CS 363 Comparative Programming Languages Semantics.
1 Bisimulations as a Technique for State Space Reductions.
Finding Feasible Counter-examples when Model Checking Abstracted Java Programs Corina S. Pasareanu, Matthew B. Dwyer (Kansas State University) and Willem.
Propositional Calculus CS 270: Mathematical Foundations of Computer Science Jeremy Johnson.
Chapter 3 Part II Describing Syntax and Semantics.
COP4020 Programming Languages Introduction to Axiomatic Semantics Prof. Robert van Engelen.
Tool-supported Program Abstraction for Finite-state Verification Matthew Dwyer 1, John Hatcliff 1, Corina Pasareanu 1, Robby 1, Roby Joehanes 1, Shawn.
Static Techniques for V&V. Hierarchy of V&V techniques Static Analysis V&V Dynamic Techniques Model Checking Simulation Symbolic Execution Testing Informal.
CS357 Lecture 13: Symbolic model checking without BDDs Alex Aiken David Dill 1.
Tool-supported Program Abstraction for Finite-state Verification Matthew Dwyer 1, John Hatcliff 1, Corina Pasareanu 1, Robby 1, Roby Joehanes 1, Shawn.
CSC3315 (Spring 2009)1 CSC 3315 Languages & Compilers Hamid Harroud School of Science and Engineering, Akhawayn University
C HAPTER 3 Describing Syntax and Semantics. D YNAMIC S EMANTICS Describing syntax is relatively simple There is no single widely acceptable notation or.
Boolean Programs: A Model and Process For Software Analysis By Thomas Ball and Sriram K. Rajamani Microsoft technical paper MSR-TR Presented by.
Abstraction and Abstract Interpretation. Abstraction (a simplified view) Abstraction is an effective tool in verification Given a transition system, we.
CSPs: Search and Arc Consistency Computer Science cpsc322, Lecture 12
Model Checking Java Programs (Java PathFinder)
Abstraction Data type based abstractions
CSPs: Search and Arc Consistency Computer Science cpsc322, Lecture 12
Reasoning About Code.
Reasoning about code CSE 331 University of Washington.
Symbolic Implementation of the Best Transformer
Design by Contract Fall 2016 Version.
CSPs: Search and Arc Consistency Computer Science cpsc322, Lecture 12
Propositional Calculus: Boolean Algebra and Simplification
Lecture 5 Floyd-Hoare Style Verification
Axiomatic semantics Points to discuss: The assignment statement
Programming Languages and Compilers (CS 421)
Programming Languages 2nd edition Tucker and Noonan
Slides by Steve Armstrong LeTourneau University Longview, TX
Over-Approximating Boolean Programs with Unbounded Thread Creation
Semantics In Text: Chapter 3.
Information Security CS 526
Predicate Transformers
Functional Program Verification
CS 240 – Lecture 7 Boolean Operations, Increment and Decrement Operators, Constant Types, enum Types, Precedence.
This Lecture Substitution model
Abstraction, Verification & Refinement
The Zoo of Software Security Techniques
Program correctness Axiomatic semantics
Predicate Abstraction
Programming Languages and Compilers (CS 421)
Programming Languages 2nd edition Tucker and Noonan
COP4020 Programming Languages
Presentation transcript:

Abstraction of Source Code (from Bandera lectures and talks)

Abstraction: the key to scaling up Original system property P symbolic state Abstract system property P’ represents a set of states abstraction Of course, most people recognize that abstraction is the key to scaling fsv tech. To reasoning about realistic programs. Abstraction involves creation of a new sys with fewer states, were each state in the abs.sys. Is like a “symbolic” state that actually represents multiple states. A symbolic state may even repres.an infinite number of state. When building abstractions, we want them to be safe. Safety allows us to infer that if a property is true for the abs system then the prop. Is true for the orig system. However, because of over-approx., the ability to disprove properties is in general sacrificed. Safety: The set of behaviors of the abstract system over-approximates the set of behaviors of the original system

Data Abstraction Data Abstraction Abstraction proceeds component-wise, where variables are components Even Odd …, -3, -1, 1, 3, … …, -2, 0, 2, 4, … x:int 1, 2, 3, … …, -3, -2, -1 Pos Neg Zero y:int

Data Type Abstraction Collapses data domains via abstract interpretation: Code Data domains int int x = 0; if (x == 0) x = x + 1; (n<0) : NEG (n==0): ZERO (n>0) : POS Signs NEG POS ZERO Signs x = ZERO; if (Signs.eq(x,ZERO)) x = Signs.add(x,POS); The technique can be illustrated on this fragment of code; assume we would like to keep only enough info about x to allow the conditional test to be decided. So we only need to know if x is zero or not for this we could use the signs abstraction that maps integers onto a set of 3 symbolic values:neg,zero,pos we transform the code so that to operate on the abs domain and it looks like this; here the concrete type int was replaced by abs type signs, concrete constants 0 and 1 were replaced with abs ct 0 and pos; and primitive ops on ints were replaced with calls to some methods that implement the abs.ops.that manipulate abstract values. Ex : equality operator was replaced with a call to method signs.eq and + was replace by signs.add . So, how do we apply this abstraction technique to the DEOS example ? We have to decide which variables to abstract, what abstrations to use and then we have to effectively transform the system to encode the abstractions.

Hypothesis Abstraction of data domains is necessary Automated support for Defining abstract domains (and operators) Selecting abstractions for program components Generating abstract program models Interpreting abstract counter-examples will make it possible to Scale property verification to realistic systems Ensure the safety of the verification process The DEOS experience makes it clear that abstraction is necessary for model checking of realistic Java programs. Furthermore,this experience also illustrates the need for several different forms of tool support to achieve abstraction of large programs Link to next slide … Now that you’ve seen the basic steps in abstracting a program in Bandera I’ll give an overview of the abstraction components starting with the user’s interface …

Definition of Abstractions in BASL operator + add begin (NEG , NEG) -> {NEG} ; (NEG , ZERO) -> {NEG} ; (ZERO, NEG) -> {NEG} ; (ZERO, ZERO) -> {ZERO} ; (ZERO, POS) -> {POS} ; (POS , ZERO) -> {POS} ; (POS , POS) -> {POS} ; (_,_) -> {NEG,ZERO,POS}; /* case (POS,NEG),(NEG,POS) */ end abstraction Signs abstracts int begin TOKENS = { NEG, ZERO, POS }; abstract(n) n < 0 -> {NEG}; n == 0 -> {ZERO}; n > 0 -> {POS}; end Automatic Generation Example: Start safe, then refine: +(NEG,NEG)={NEG,ZERO,POS} In BASL, we can define AI’s. an AI is a collection of 3 components : a domain of abs.values, an abstract function and a collection of abstract op., one for each primitive operation in the program. -For example, this is the BASL representation of the signs AI, where the abstract domain is the powerset of the set of tokens. -The abstract ops.are automatically generated using the PVS theorem prover, for example this is the abstract version of +. -Note that abstract operations can return more than one token.This is because by abstraction, we loose info about abs.values : you will see on the next slide, that imprecision is MODELED as a non-det. choice over the values in the set. -So now, how do we use the theorem prover. For ex, this is how we generate the def of abs.op + applied to 2 NEG values. We start whith set {NEG, ZERO, POS} which is safe because it covers all the possible results of adding 2 negative numbers. -We try to eliminate from the set the tokens that we can prove are not possible. We submit to PVS 3 implications : one for each token in the def. Here predicate NEG? ...POS? and ZERO? are defined similarly. -So we ask PVS : is it true that by adding 2 neg numbers we can not get a pos. number, and PVS answers yes, so we eliminate POS from the definition,…. -> same technique for all the cases! Forall n1,n2: neg?(n1) and neg?(n2) implies not pos?(n1+n2) Forall n1,n2: neg?(n1) and neg?(n2) implies not zero?(n1+n2) Forall n1,n2: neg?(n1) and neg?(n2) implies not neg?(n1+n2) Proof obligations submitted to PVS...

Compiling BASL Definitions abstraction Signs abstracts int begin TOKENS = { NEG, ZERO, POS }; abstract(n) n < 0 -> {NEG}; n == 0 -> {ZERO}; n > 0 -> {POS}; end operator + add (NEG , NEG) -> {NEG} ; (NEG , ZERO) -> {NEG} ; (ZERO, NEG) -> {NEG} ; (ZERO, ZERO) -> {ZERO} ; (ZERO, POS) -> {POS} ; (POS , ZERO) -> {POS} ; (POS , POS) -> {POS} ; (_,_)-> {NEG, ZERO, POS}; /* case (POS,NEG), (NEG,POS) */ public class Signs { public static final int NEG = 0; // mask 1 public static final int ZERO = 1; // mask 2 public static final int POS = 2; // mask 4 public static int abs(int n) { if (n < 0) return NEG; if (n == 0) return ZERO; if (n > 0) return POS; } public static int add(int arg1, int arg2) { if (arg1==NEG && arg2==NEG) return NEG; if (arg1==NEG && arg2==ZERO) return NEG; if (arg1==ZERO && arg2==NEG) return NEG; if (arg1==ZERO && arg2==ZERO) return ZERO; if (arg1==ZERO && arg2==POS) return POS; if (arg1==POS && arg2==ZERO) return POS; if (arg1==POS && arg2==POS) return POS; return Bandera.choose(7); /* case (POS,NEG), (NEG,POS) */ Compiled BASL specification are compiled into Java classes For example, this is the Java class for the signs abstraction. Abstract tokens are implemented as integer values. The abstract function is straightforward. And this is the definition of method add that implements the abstract version of + -> we use a special method Bandera.choose the model the imprecision introduced by the abs. -> For ex, Bandera.choose (7) will be interpreted in subsequent verification tools (i.e. JPF) as a non-deterministic choice between the tokens encoded in the bit-vector 7, that is between NEG, ZERO, POS.

Interpreting Results For an abstracted program, a counter-example may be infeasible because: Over-approximation introduced by abstraction Example: x = -2; if(x + 2 == 0) then ... x = NEG; if(Signs.eq(Signs.add(x,POS),ZERO)) then ... {NEG,ZERO,POS} -> abs.program has more behaviors than the original program -> a counter-ex may be reported for a behavior in the abstract system not present in the original system We developed a technique that searches for feasible counter-ex, during verification of the abstract system.

Choose-free state space search Theorem [Saidi:SAS’00] Every path in the abstracted program where all assignments are deterministic is a path in the concrete program. Bias the model checker to look only at paths that do not include instructions that introduce non-determinism JPF model checker modified to detect non-deterministic choice (i.e. calls to Bandera.choose()); backtrack from those points

Choice-bounded Search State space searched Undetectable Violation Detectable Violation choose() Verif. on choose free paths will report this path…. But not this paths….. That may correspond to a feasible execution and indicates a real defect X X

Counter-example guided simulation Use abstract counter-example to guide simulation of concrete program Why it works: Correspondence between concrete and abstracted program Unique initial concrete state

Example of Abstracted Code Java Program: class App{ public static void main(…) { [1] new AThread().start(); … [2] int i=0; [3] while(i<2) { [4] assert(!Global.done); [5] i++; }}} class Athread extends Thread { public void run() { [6] Global.done=true; }} Abstracted Program: class App{ public static void main(…) { [1] new AThread().start(); … [2] int i=Signs.ZERO; [3] while(Signs.lt(i,signs.POS)){ [4] assert(!Global.done); [5] i=Signs.add(i,Signs.POS); }}} class Athread extends Thread { public void run() { [6] Global.done=true; }} Nondeterminism! i=zero Problem: I is abstracted with Signs, the loop condition is abstracted, infeasible counter-examples because the loop gets executed more than 2 times Choose-free counter-example: 1 - 2 - 6 - 3 - 4

Example of Abstracted Code Java Program: class App{ public static void main(…) { [1] new AThread().start(); … [2] int i=0; [3] while(i<2) { [4] assert(!Global.done); [5] i++; }}} class Athread extends Thread { public void run() { [6] Global.done=true; }} Abstracted Program: class App{ public static void main(…) { [1] new AThread().start(); … [2] int i=Signs.ZERO; [3] while(Signs.lt(i,signs.POS)){ [4] assert(!Global.done); [5] i=Signs.add(i,Signs.POS); }}} class Athread extends Thread { public void run() { [6] Global.done=true; }} i=zero i=0 i=pos i=1 i=pos i=2 i=zero i=0 i=pos i=1 i=zero i=0 i=pos i=2 i=pos i=1 Abstract counter-example: 1 - 2 - 3 - 4 - 5 - 3 - 4 - 5 - 3 - 6 - 4 Mismatch

Hybrid Approach Program & Property Abstraction Refine selections Guided Simulation Mismatch Abstract Program & Property counter-example Abstract Property false! (counter-example) Property true! Model Check Choose-free Model Check

Property Abstraction Property System Model Property Abstraction (under-approximation) Program Abstraction (over-approximation) Property System Model If the abstract property holds on the abstract system, then the original property holds on the original system

Property Abstraction Properties are temporal logic formulas, written in negational normal form. Abstract propositions under-approximate the truth of concrete propositions. Examples: Invariance property: Abstracted to: (x > -1) ((x = zero)  (x=pos)) (x > -2) ((x = zero)  (x=pos))

Predicate Abstraction Use a boolean variable to hold the value of an associated predicate that expresses a relationship between variables true false (1, 2) (0, 0) (1, 1) (-1, -1) (-1, 3) (3, 2) … int * int predicate: x = y

An Example x and y are unbounded Init: x := 0; y := 0; z := 1; goto Body; Body: assert (z = 1); x := (x + 1); y := (y + 1); if (x = y) then Z1 else Z0; Z1: z := 1; Z0: z := 0; x and y are unbounded Data abstraction does not work in this case --- abstracting component-wise (per variable) cannot maintain the relationship between x and y We will use predicate abstraction in this example

Predicate Abstraction Process Add boolean variables to your program to represent current state of particular predicates E.g., add a boolean variable [x=y] to represent whether the condition x=y holds or not These boolean variables are updated whenever program statements update variables mentioned in predicates E.g., add updates to [x=y] whenever x or y or assigned

An Example Init: x := 0; y := 0; z := 1; goto Body; Body: assert (z = 1); x := (x + 1); y := (y + 1); if (x = y) then Z1 else Z0; Z1: z := 1; Z0: z := 0; We will use the predicates listed below, and remove variables x and y since they are unbounded. Don’t worry too much yet about how we arrive at this particular set of predicates; we will talk a little bit about that later Predicates Boolean Variables p1: (x = 0) p2: (y = 0) p3: (x = (y + 1)) p4: (x = y) b1: [(x = 0)] b2: [(y = 0)] b3: [(x = (y + 1))] b4: [(x = y)] This is our new syntax for representing boolean variables that helps make the correspondence to the predicates clear

Transforming Programs An example of how to transform an assignment statement Predicates Assignment Statement [(x=0)] := true; [(x=(y+1))] := if [$(y=0)] then false else top; [(x=y)] := if [$(y = 0)] then true else if ![$(y=0)] then false Where: [$P] = prev. value of [P] top is a non-deterministic choice between true and false The statement to the left is replaced the statements below x := 0; [(x = 0)] [(y = 0)] [(x = (y + 1))] [(x = y)] [(x=0)] := true; [(x=y)] := H([$(y=0)], ![$(y=0)]); [(x=(y+1))] := H(false, [$(y=0)]); Where: true, if e1 H (e, e2) = false, if e2 top, otherwise { Make a more compact representation using a helper function H (following SLAM notation)

State Simulation (n2,[x ! 2, y ! 2, z ! 0]) (n2,[ [x=0] ! False, Given a program abstracted by predicates E1, …, En, an abstract state simulates a concrete state if Ei holds on the concrete state iff the boolean variable [Ei] is true and remaining concrete vars and control points agree. Concrete Abstract (n2,[x ! 2, y ! 2, z ! 0]) simulates (n2,[ [x=0] ! False, [y=0] ! False, [x=(y+1)] ! False, [x=y] ! True, z ! 0]) (n2,[x ! 3, y ! 3, z ! 0]) (n2,[x ! 1, y ! 0, z ! 1]) simulates (n2,[ [x=0] ! False, [y=0] ! True, [x=(y+1)] ! True, [x=y] ! False, z ! 1]) (n2,[x ! 3, y ! 3, z ! 1]) does not simulate

Computing Abstracted Programs Truth values of predicates on s1. (n1,s1) (n2,s2) simulates (n1, [[P1], [P2], …, [Pn]]) Truth values of predicates on s1. simulates (n2, [[P1]’, [P2]’, …, [Pn]’]) For each statement s in the original program, we need to compute a new statement that describes how the given predicates change in value due to the effects of s. To do this for a given predicate Pi, we need to know if we should set [Pi] to true or false in the abstract target state. Thus, we need to know the conditions at (n1, s1) that guarantee that [Pi] will be true in the target state and the conditions that guarantee that [Pi] will be false in the target state. These conditions will be used in the helper function H. true, if e1 [P_i] := H (e, e2) = false, if e2 top, otherwise { Conditions that make [Pi] true. Conditions that make [Pi] false.

Computing Abstracted Programs Example (n1,s1) (n2,s2) (n1, [[P1], [P2], …, [Pn]]) a ´ x := 0; [x=y] := H(…?…, …?…) (n2, [[P1]’, [P2]’, …, [Pn]’]) What conditions have to hold before a is executed to guarantee that x=y is true (false) after a is executed? Note: we want the least restrictive conditions The technical term for what we want is the “weakest pre-condition of a with respect to x=y” Let’s take a little detour to learn about weakest preconditions.

Floyd-Hoare Triples {F1} C {F2} Command {F1} C {F2} Pre-condition (boolean formula) Post-condition (boolean formula) A triple is a logical judgement that holds when the following condition is met: For all states s that satisfies F1 (I.e., s ² F1), if executing C on s terminates with a state s’, then s’ ² F2.

Weaker/Stronger Formulas If F’  F (F’ implies F), we say that F is weaker than F’. Intuitively, F’ contains as least as much information as F because whatever F says about the world can be derived from F’. Intuitively, stronger formulas impose more restrictions on states. Thinking in terms of sets of states… SF’ = {s | s ² F’} SF = {s | s ² F} Note that SF’ µ SF since F’ imposes more restrictions than F Let Question: what formula is the weakest of all? (In other words, what formula describes the largest set of states? What formula imposes the least restrictions?)

Weakest Preconditions The weakest precondition of C with respect to F2 (denoted WP(C,F2)) is a formula F1 such that {F1} C {F2} and for all other F’1 such that {F’1} C {F2}, F’1 ) F1 (F1 is weaker then F’1). This notion is useful because it answers the question: “what formula F1 captures the fewest restrictions that I can impose on a state s so that when s’ = [[C]]s then s’ ² F2?” WP is interesting for us when calculating predicate abstractions because for a given command C and boolean variable [Pi], we want to know the least restrictive conditions that have to hold before C so that we can conclude that Pi is definitely true (false) after executing C.

Calculating Weakest Preconditions Calculating WP for assignments is easy: WP(x := e, F) = F[x ! e ] Intuition: x is going to get a new value from e, so if F has to hold after x := e, then F[x ! e] is required to hold before x := e is executed. Examples = (0 = y) WP(x := 0, x = y) = (x = y)[x ! 0] = (0 = y + 1) WP(x := 0, x = y + 1) = (x = y + 1)[x ! 0] = (x + 1 = y + 1) WP(x := x+1, x = y + 1) = (x = y + 1)[x ! x + 1]

Calculating Weakest Preconditions Calculating WP for other commands (state transformers): WP(skip, F) = F WP(assert e, F) = e ) F (: e  F) WP(assume e, F) = e ) F (: e  F) Skip: since the store is not modified, then F will hold afterward iff it holds before. Assert and Assume: even though we have a different operational interpretation of assert and assume in the verifier, the definition of WP of these rely on the fact that we assume that if an assertion or assume condition is violated, it’s the same as the command “not completing”. Note that if e is false, then the triple {(: e  F)} assert e {F} always holds since the command never completes for any state.

Assessment Intuition: Source Program atomic { } Abstracted Program Transformed Assignment statement C Assignment to each boolean variable [Pi] where each assignment has the form [Pi] := H(WP(C,Pi),WP(C,!Pi)). But what’s wrong with this? Answer: the predicates Pi refer to concrete variables, and the entire purpose of the abstraction process is to remove those from the program. The point is that the conditions in the ‘H’ function should be stated in terms of the boolean variables [Pi] instead of the predicates Pi.

Assessment In the case of x := 0 and the predicate x = y, we have WP(x := 0, x=y) = (0=y) WP(x := 0, !x=y) = !(0=y) In this case, the information in the predicate variables is enough to decide whether 0=y holds or not. That is, we can simply generate the assignment statement [(x=y)] := H([$(y = 0)],![$(y=0)]);

Assessment In the case of x := 0 and the predicate x = (y+1), we have WP(x := 0, (x=y+1)) = (0=y+1) WP(x := 0, !(x=y+1)) = !(0=y+1) In this case, we don’t have a predicate variable [0=y+1]. We must consider combinations of our existing predicate variables that imply the conditions above. That is, we consider stronger (more restrictive, less desirable but still useful) conditions formed using the predicate variables that we have.

What Are Appropriate Predicates? In general, a difficult question, and subject of much research Research focuses on automatic discovery of predicates by processing (infeasible) counterexamples If a counterexample is infeasible, add predicates that allow infeasible branches to be avoided counterexample Add the predicate x > y so that we have enough information to avoid the infeasible branch. If x > y … Infeasible branch taken feasible branch

What Are Appropriate Predicates? Some general heuristics that we will follow Use the predicates A mentioned in property P, if variables mentioned in predicates are being removed by abstraction At least, we want to tell if our property predicates are true or not Use predicates that appear in conditionals along “important paths” E.g., (x=y) Predicates that allow predicates above to be decided E.g., (x=0), (y=0), (x = (y + 1))