Spring 2016 Program Analysis and Verification

Presentation transcript:

Spring 2016 Program Analysis and Verification Lecture 7: Static Analysis I Roman Manevich Ben-Gurion University

Tentative syllabus
Program Verification: Operational semantics; Hoare Logic; Applying Hoare Logic; Weakest Precondition Calculus; Proving Termination; Data structures; Automated Verification
Program Analysis Basics: From Hoare Logic to Static Analysis; Control Flow Graphs; Equation Systems; Collecting Semantics; Using Soot
Abstract Interpretation fundamentals: Lattices; Fixed-Points; Chaotic Iteration; Galois Connections; Domain constructors; Widening/Narrowing
Analysis Techniques: Numerical Domains; Alias analysis; Interprocedural Analysis; Shape Analysis; CEGAR

Previously Axiomatic verification Weakest precondition calculus Strongest postcondition calculus Handling data structures Total correctness

Agenda Static analysis for compiler optimization Common Subexpression Elimination Available Expression domain Develop a static analysis: Simple Available Expressions Constant Propagation Basic concepts in static analysis Control flow graphs Equation systems Collecting semantics

Array-max example: Post1 nums : array N : int // N stands for nums' length { N≥0 } x := 0 { N≥0 ∧ x=0 } res := nums[0] { x=0 } Inv = { x≤N } while x < N { x=k ∧ k<N } if nums[x] > res then res := nums[x] { x=k ∧ k<N } x := x + 1 { x=k+1 ∧ k<N } { x≤N ∧ x≥N } { x=N }

Can we find this proof automatically? nums : array N : int { N≥0 } x := 0 { N≥0 ∧ x=0 } res := nums[0] { x=0 } Inv = { x≤N } while x < N { x=k ∧ k<N } if nums[x] > res then { x=k ∧ k<N } res := nums[x] { x=k ∧ k<N } { x=k ∧ k<N } x := x + 1 { x=k+1 ∧ k<N } { x≤N ∧ x≥N } { x=N } Observation: predicates in the proof have the general form ⋀ constraint, where each constraint has the form X - Y ≤ c or ±X ≤ c

Look under the street lamp … We may move lamp a bit

Zone Abstract Domain Developed by Antoine Miné in his Ph.D. thesis Uses constraints of the form X - Y ≤ c and ±X ≤ c

Analysis with Zone abstract domain Static analysis with Zone abstraction: nums : array N : int { N≥0 } x := 0 { N≥0 ∧ x=0 } res := nums[0] { N≥0 ∧ x=0 } Inv = { N≥0 ∧ 0≤x≤N } while x < N { N≥0 ∧ 0≤x<N } if nums[x] > res then { N≥0 ∧ 0≤x<N } res := nums[x] { N≥0 ∧ 0≤x<N } { N≥0 ∧ 0≤x<N } x := x + 1 { N≥0 ∧ 0<x≤N } { N≥0 ∧ 0≤x ∧ x=N } Manual proof: nums : array N : int { N≥0 } x := 0 { N≥0 ∧ x=0 } res := nums[0] { x=0 } Inv = { x≤N } while x < N { x=k ∧ k<N } if nums[x] > res then { x=k ∧ k<N } res := nums[x] { x=k ∧ k<N } { x=k ∧ k<N } x := x + 1 { x=k+1 ∧ k<N } { x≤N ∧ x≥N } { x=N }

Array-max example: Post3 nums : array { N≥0 ∧ 0≤m<N } // N stands for nums' length x := 0 { x=0 } res := nums[0] { x=0 ∧ res=nums(0) } Inv = { 0≤m<x ⇒ nums(m)≤res } while x < N { x=k ∧ res=oRes ∧ 0≤m<k ⇒ nums(m)≤oRes } if nums[x] > res then { nums(x)>oRes ∧ res=oRes ∧ x=k ∧ 0≤m<k ⇒ nums(m)≤oRes } res := nums[x] { res=nums(x) ∧ nums(x)>oRes ∧ x=k ∧ 0≤m<k ⇒ nums(m)≤oRes } { x=k ∧ 0≤m≤k ⇒ nums(m)≤res } { (x=k ∧ 0≤m<k ⇒ nums(m)<res) ∨ (res≥nums(x) ∧ x=k ∧ res=oRes ∧ 0≤m<k ⇒ nums(m)≤oRes) } { x=k ∧ 0≤m<k ⇒ nums(m)≤res } x := x + 1 { x=k+1 ∧ 0≤m≤x-1 ⇒ nums(m)≤res } { 0≤m<x ⇒ nums(m)≤res } { x=N ∧ 0≤m<x ⇒ nums(m)≤res } [univp] { ∀m. 0≤m<N ⇒ nums(m)≤res }

Can we find this proof automatically? Various static analysis techniques can: A framework for numeric analysis of array operations [Gopan et al. in POPL 2005]; Discovering properties about arrays in simple programs [Halbwachs & Péron in PLDI 2008]

Static analysis for compiler optimizations

Motivating problem: optimization A compiler optimization is defined by a program transformation: T : Stmt → Stmt The transformation is semantics-preserving: ∀s. Ssos⟦C⟧ s = Ssos⟦T(C)⟧ s The transformation is applied to the program only if an enabling condition is met We use static analysis for inferring enabling conditions

Common Subexpression Elimination If we have two variable assignments x := a op b … y := a op b and the values of x, a, and b have not changed between the assignments, rewrite the code as x := a op b … y := x Eliminates useless recalculation Paves the way for more optimizations (e.g., dead code elimination) op ∈ {+, -, *, ==, <=}

What do we need to prove? Before CSE: { true } C1 x := a op b C2 { x = a op b } y := a op b C3 After CSE: { true } C1 x := a op b C2 { x = a op b } y := x C3 The assertion localizes the decision

A simplified problem Before CSE: { true } C1 x := a + b C2 { x = a + b } y := a + b C3 After CSE: { true } C1 x := a + b C2 { x = a + b } y := x C3

Available Expressions analysis A static analysis that infers for every program point a set of facts of the form AV = { x = y | x, y ∈ Var } ∪ { x = op y | x, y ∈ Var, op ∈ {-, !} } ∪ { x = y op z | y, z ∈ Var, op ∈ {+, -, *, <=} } For every program with n=|Var| variables the number of possible facts is finite: |AV|=O(n³) Yields a trivial algorithm … Is it efficient?

Simple Available Expressions Define atomic facts (for SAV) as Φ = { x = y | x, y ∈ Var } ∪ { x = y + z | x, y, z ∈ Var } For n=|Var| the number of atomic facts is O(n³) Define SAV-predicates as 𝒟 = 2^Φ

Notation for conjunctive sets of facts For a set of atomic facts D ⊆ Φ, we define Conj(D) = ⋀D E.g., if D={a=b, c=b+d, b=c} then Conj(D) = (a=b) ∧ (c=b+d) ∧ (b=c) Notice that for two sets of facts D1 and D2 Conj(D1 ∪ D2) = Conj(D1) ∧ Conj(D2) What does Conj({}) stand for…?

Towards an automatic proof Goal: automatically compute an annotated program proving as many facts as possible of the form x = y and x = y + z Decision 1: develop a forward-going proof Decision 2: draw predicates from a finite set D “looking under the light of the lamp” A compromise that simplifies problem by focusing attention – possibly miss some facts that hold Challenge 1: handle straight-line code Challenge 2: handle conditions Challenge 3: handle loops

Challenge 1: handling straight-line code

Straight line code example { } x := a + b { x=a+b } z := a + c { x=a+b, z=a+c } b := a * c { z=a+c } Find a proof that satisfies both conditions

Straight line code example sp { } x := a + b { x=a+b } z := a + c { x=a+b, z=a+c } b := a * c { z=a+c } cons Frame Can we turn this into an algorithm? What should we ensure for each triple?

Goal Given a program of the form x1 := a1; … xn := an Find predicates P0, …, Pn such that {P0} x1 := a1 {P1} … {Pn-1} xn := an {Pn} is a proof That is: sp(xi := ai, Pi-1) ⇒ Pi Each Pi has the form Conj(Di) where Di is a set of atomic facts

Algorithm for straight-line code Goal: find predicates P0, …, Pn such that {P0} x1 := a1 {P1} … {Pn-1} xn := an {Pn} is a proof That is: sp(xi := ai, Pi-1) ⇒ Pi Each Pi has the form Conj(Di) where Di is a set of atomic facts Idea: define a function FSAV[x:=a] : 𝒟 → 𝒟 s.t. if FSAV[x:=a](D) = D’ then sp(x := a, Conj(D)) ⇒ Conj(D’) We call F the abstract transformer of x:=a Unless D0 is given, initialize D0={} (why?) For each i: compute Di = FSAV[xi := ai](Di-1) Finally Pi = Conj(Di)

Defining an SAV abstract transformer Goal: define a function FSAV[x:=a] : 𝒟 → 𝒟 s.t. if FSAV[x:=a](D) = D’ then sp(x := a, Conj(D)) ⇒ Conj(D’) Idea: define rules for individual facts and generalize to sets of facts by the conjunction rule


Defining an SAV abstract transformer Goal: define a function FSAV[x:=a] : 𝒟 → 𝒟 s.t. if FSAV[x:=a](D) = D’ then sp(x := a, Conj(D)) ⇒ Conj(D’) Idea: define rules for individual facts and generalize to sets of facts by the conjunction rule { x=α } x:=a { } [kill-lhs] { y=x+w } x:=a { } [kill-rhs-1] { y=w+x } x:=a { } [kill-rhs-2] { } x:=α { x=α } [gen] { y=z+w } x:=a { y=z+w } [preserve] where α is either a variable v or an addition expression v+w (in [gen], x must not occur in α; in [preserve], x is distinct from y, z, and w)

SAV abstract transformer example { } x := a + b { x=a+b } z := a + c { x=a+b, z=a+c } b := a * c { z=a+c } Rules (α is either a variable v or an addition expression v+w): { x=α } x:= aexpr { } [kill-lhs] { y=x+w } x:= aexpr { } [kill-rhs-1] { y=w+x } x:= aexpr { } [kill-rhs-2] { } x:= α { x=α } [gen] { y=z+w } x:= aexpr { y=z+w } [preserve]
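The kill, gen, and preserve rules above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the fact encoding (a left-hand variable paired with a tuple of right-hand operands), the statement encoding, and the name `f_sav` are assumptions made for this example, as is the side condition that no fact is generated when x occurs on its own right-hand side.

```python
def f_sav(stmt, facts):
    """Apply the SAV transformer for one assignment to a set of facts.

    Hypothetical encoding: a fact (x, (y,)) stands for x=y and (x, (y, z))
    stands for x=y+z; a statement is (x, op, args), where op is "var" for a
    copy x:=y, "+" for x:=y+z, and anything else for an opaque expression.
    """
    x, op, args = stmt
    # kill-lhs, kill-rhs-1, kill-rhs-2: drop every fact that mentions x;
    # all remaining facts are preserved.
    kept = {(lhs, rhs) for (lhs, rhs) in facts if lhs != x and x not in rhs}
    # gen: only copies and additions yield a fact, and only when x is not
    # one of the operands (assumed side condition).
    if op in ("var", "+") and x not in args:
        kept.add((x, tuple(args)))
    return frozenset(kept)


d0 = frozenset()
d1 = f_sav(("x", "+", ("a", "b")), d0)  # x := a + b
d2 = f_sav(("z", "+", ("a", "c")), d1)  # z := a + c
d3 = f_sav(("b", "*", ("a", "c")), d2)  # b := a * c
```

Under these assumptions the three results correspond to the annotations { x=a+b }, { x=a+b, z=a+c }, and { z=a+c } of the straight-line example.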

Problem 1: large expressions { } x := a + b + c { } y := a + b + c { } Missed CSE opportunity Large expressions on the right hand sides of assignments are problematic Can miss optimization opportunities Require complex transformers Solution: …?

Problem 1: large expressions { } x := a + b + c { } y := a + b + c { } Missed CSE opportunity Large expressions on the right hand sides of assignments are problematic Can miss optimization opportunities Require complex transformers Solution: transform code to normal form where right-hand sides have bounded size Standard compiler transformation – lowering into three address code

Three-address code Before: { } x := a + b + c { } y := a + b + c { } After: { } i1 := a + b { i1=a+b } x := i1 + c { i1=a+b, x=i1+c } i2 := a + b { i1=a+b, x=i1+c, i2=a+b } y := i2 + c { i1=a+b, x=i1+c, i2=a+b, y=i2+c } Main idea: simplify expressions by storing intermediate results in new temporary variables Number of variables in simplified statements ≤ 3

Three-address code Before: { } x := a + b + c { } y := a + b + c { } After: { } i1 := a + b { i1=a+b } x := i1 + c { i1=a+b, x=i1+c } i2 := a + b { i1=a+b, x=i1+c, i2=a+b } y := i2 + c { i1=a+b, x=i1+c, i2=a+b, y=i2+c } Need to infer i1=i2 Main idea: simplify expressions by storing intermediate results in new temporary variables Number of variables in simplified statements ≤ 3
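The lowering step can be sketched as follows. This is an illustrative sketch only: the nested-tuple expression encoding, the statement format, and the names `lower` and `fresh` are assumptions of this example; a real compiler front end would work on its own AST types.

```python
import itertools


def lower(target, expr, fresh):
    """Lower `target := expr` into three-address statements.

    Hypothetical encoding: an expression is a variable name or a nested
    tuple (op, left, right); a statement is (x, op, left, right).
    """
    stmts = []

    def atom(e):
        # Variables are already atomic; a compound subexpression is stored
        # in a fresh temporary and replaced by that temporary.
        if isinstance(e, str):
            return e
        op, left, right = e
        left, right = atom(left), atom(right)
        temp = next(fresh)
        stmts.append((temp, op, left, right))
        return temp

    if isinstance(expr, str):
        stmts.append((target, "var", expr, None))
    else:
        op, left, right = expr
        left, right = atom(left), atom(right)
        stmts.append((target, op, left, right))
    return stmts


fresh = (f"i{n}" for n in itertools.count(1))  # i1, i2, ...
abc = ("+", ("+", "a", "b"), "c")              # a + b + c, read left-associatively
code = lower("x", abc, fresh) + lower("y", abc, fresh)
```

With these inputs `code` holds the four statements i1 := a + b, x := i1 + c, i2 := a + b, y := i2 + c, each mentioning at most three variables.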

Problem 2: transformer precision { } i1 := a + b { i1=a+b } x := i1 + c { i1=a+b, x=i1+c } i2 := a + b { i1=a+b, x=i1+c, i2=a+b } y := i2 + c { i1=a+b, x=i1+c, i2=a+b, y=i2+c } Need to infer i1=i2 Our transformer only infers syntactically available expressions – ones that appear in the code explicitly We want a transformer that considers the meaning of the predicates Takes equalities into account

Defining a semantic reduction Idea: make as many implicit facts explicit by using symmetry and transitivity of equality, commutativity of addition, and the meaning of equality (equal variables can be substituted) For an SAV-predicate P=Conj(D) define reduce(D) = minimal set D* such that: D ⊆ D*; x=y ∈ D* implies y=x ∈ D*; x=y ∈ D* and y=z ∈ D* implies x=z ∈ D*; x=y+z ∈ D* implies x=z+y ∈ D*; x=y ∈ D* and x=z+w ∈ D* implies y=z+w ∈ D*; x=y ∈ D* and z=x+w ∈ D* implies z=y+w ∈ D*; x=z+w ∈ D* and y=z+w ∈ D* implies x=y ∈ D* Notice that reduce(D) ⊇ D reduce is a special case of a semantic reduction

Sharpening the transformer Define: F*[x:=aexpr] = reduce ∘ FSAV[x:=aexpr] { } i1 := a + b { i1=a+b, i1=b+a } x := i1 + c { i1=a+b, i1=b+a, x=i1+c, x=c+i1 } i2 := a + b { i1=a+b, i1=b+a, x=i1+c, x=c+i1, i2=a+b, i2=b+a, i1=i2, i2=i1, x=i2+c, x=c+i2 } y := i2 + c { ... } Since sets of facts and their conjunction are isomorphic we will use them interchangeably
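The closure computed by reduce can be sketched as a small fixed-point loop. This is a sketch under stated assumptions: the fact encoding and the function name are invented for the example, and trivial facts of the form x=x, which the closure rules would technically produce, are dropped here to match the annotations shown on the slides.

```python
def reduce(facts):
    """Close a set of SAV facts under the reduction rules.

    Hypothetical encoding: (x, (y,)) stands for x=y, (x, (y, z)) for x=y+z.
    Trivial facts x=x are discarded (an assumption of this sketch).
    """
    d = set(facts)
    while True:
        eqs = [f for f in d if len(f[1]) == 1]
        adds = [f for f in d if len(f[1]) == 2]
        new = set()
        for x, (y,) in eqs:
            new.add((y, (x,)))                                    # symmetry
            new.update((x, rhs) for lhs, rhs in eqs if lhs == y)  # transitivity
            # x=y and x=z+w give y=z+w
            new.update((y, rhs) for lhs, rhs in adds if lhs == x)
            # x=y and z=x+w give z=y+w
            new.update((lhs, (y, w)) for lhs, (z, w) in adds if z == x)
        for x, (y, z) in adds:
            new.add((x, (z, y)))                                  # commutativity
            # x=y+z and v=y+z give x=v
            new.update((x, (lhs,)) for lhs, rhs in adds if rhs == (y, z))
        new = {f for f in new if f[1] != (f[0],)}                 # drop x=x
        if new <= d:
            return frozenset(d)
        d |= new


facts = {("i1", ("a", "b")), ("x", ("i1", "c")), ("i2", ("a", "b"))}
closed = reduce(facts)
```

Starting from the three syntactic facts i1=a+b, x=i1+c, i2=a+b, the closure contains the ten facts listed after `i2 := a + b` above, including i1=i2, which is the fact needed to recognize the common subexpression.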

An algorithm for annotating SLP (straight-line programs) Annotate(P, x:=aexpr) = {P} x:=aexpr {F*[x:=aexpr](P)} Annotate(P, S1; S2) = let Annotate(P, S1) be {P} A1 {Q1} let Annotate(Q1, S2) be {Q1} A2 {Q2} return {P} A1; {Q1} A2 {Q2}

Challenge 2: handling conditions

Goal [ifp]: if { bexpr ∧ P } S1 { Q } and { ¬bexpr ∧ P } S2 { Q } then { P } if bexpr then S1 else S2 { Q } Annotate a program if bexpr then S1 else S2 with predicates from 𝒟 Assumption 1: P is given (otherwise use true) Assumption 2: bexpr is a simple binary expression e.g., x=y, x≠y, x<y (why?) { P } if bexpr then { bexpr ∧ P } S1 { Q1 } else { ¬bexpr ∧ P } S2 { Q2 } { Q }

Joining predicates [ifp]: if { bexpr ∧ P } S1 { Q } and { ¬bexpr ∧ P } S2 { Q } then { P } if bexpr then S1 else S2 { Q } Start with P or { bexpr ∧ P } (bexpr is possibly an SAV-fact) and annotate S1 (yielding Q1) Start with P or { ¬bexpr ∧ P } (¬bexpr is possibly an SAV-fact) and annotate S2 (yielding Q2) How do we infer a Q such that Q1 ⇒ Q and Q2 ⇒ Q? Q1=Conj(D1), Q2=Conj(D2) Define: Q = Q1 ⊔ Q2 = Conj(D1 ∩ D2) { P } if bexpr then { bexpr ∧ P } S1 { Q1 } else { ¬bexpr ∧ P } S2 { Q2 } { Q }

Joining predicates [ifp]: if { bexpr ∧ P } S1 { Q } and { ¬bexpr ∧ P } S2 { Q } then { P } if bexpr then S1 else S2 { Q } Start with P or { bexpr ∧ P } and annotate S1 (yielding Q1) Start with P or { ¬bexpr ∧ P } and annotate S2 (yielding Q2) How do we infer a Q such that Q1 ⇒ Q and Q2 ⇒ Q? Q1=Conj(D1), Q2=Conj(D2) Define: Q = Q1 ⊔ Q2 = Conj(D1 ∩ D2), the join operator for SAV { P } if bexpr then { bexpr ∧ P } S1 { Q1 } else { ¬bexpr ∧ P } S2 { Q2 } { Q }

Joining predicates Q1=Conj(D1), Q2=Conj(D2) We want to soundly approximate Q1 ∨ Q2 in 𝒟 Define: Q = Q1 ⊔ Q2 = Conj(D1 ∩ D2) Notice that Q1 ⇒ Q and Q2 ⇒ Q, meaning Q1 ∨ Q2 ⇒ Q
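Because a predicate is a conjunction of facts, the join is plain set intersection: a fact survives only if it holds on both incoming paths. A minimal sketch, with facts written as strings purely for readability (an assumption of this example) and two illustrative branch postconditions:

```python
def join(d1, d2):
    """Join two conjunctive fact sets by keeping the facts common to both."""
    return frozenset(d1) & frozenset(d2)


# Illustrative postconditions of a then-branch and an else-branch.
q1 = {"x=y", "y=x", "a=b+c", "a=c+b"}
q2 = {"a=b+c", "a=c+b", "d=b+c", "d=c+b", "a=d", "d=a"}
q = join(q1, q2)
```

The result has fewer conjuncts than either input, so it is implied by both Q1 and Q2, which is exactly the soundness requirement stated above.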

Simplifying handling of conditions Extend While with non-determinism (or) and an assume statement: ⟨assume b, s⟩ →sos s if B⟦b⟧ s = tt Use the fact that the following two statements are equivalent: if b then S1 else S2 and (assume b; S1) or (assume ¬b; S2)

Handling conditional expressions We want to soundly approximate D ∧ bexpr and D ∧ ¬bexpr in 𝒟 Define ρ(bexpr) = if bexpr is factoid {bexpr} else {} Define F[assume bexpr](D) = D ∪ ρ(bexpr) Can sharpen F*[assume bexpr] = reduce ∘ FSAV[assume bexpr]

Handling conditional expressions Notice bexpr ⇒ ρ(bexpr) Examples: ρ(y=z) = {y=z}, ρ(y<z) = {}

An algorithm for annotating conditions let Pt = F*[assume bexpr] P let Pf = F*[assume ¬bexpr] P let Annotate(Pt, S1) be {Pt} A1 {Q1} let Annotate(Pf, S2) be {Pf} A2 {Q2} return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}

Example { } if (x = y) { x=y, y=x } a := b + c { x=y, y=x, a=b+c, a=c+b } d := b - c { x=y, y=x, a=b+c, a=c+b } else { } a := b + c { a=b+c, a=c+b } d := b + c { a=b+c, a=c+b, d=b+c, d=c+b, a=d, d=a } { a=b+c, a=c+b }


Recap We now have an algorithm for soundly annotating loop-free code Generates forward-going proofs Algorithm operates on abstract syntax tree of code Handles straight-line code by applying F* Handles conditions by recursively annotating true and false branches and then intersecting their postconditions


Challenge 3: handling loops

Goal [whilep]: if { bexpr ∧ P } S { P } then { P } while bexpr do S { ¬bexpr ∧ P } Annotate a program while bexpr do S with predicates from 𝒟 s.t. P ⇒ N Main challenge: find N Assumption 1: P is given (otherwise use true) Assumption 2: bexpr is a simple binary expression { P } Inv = { N } while bexpr do { bexpr ∧ N } S { Q } { ¬bexpr ∧ N }

Example: annotate this program { y=x+a, y=a+x, w=d, d=w } Inv = { y=x+a, y=a+x } while (x ≠ z) do { y=x+a, y=a+x, w=d, d=w } x := x + 1 { w=d, d=w } y := x + a { y=x+a, y=a+x, w=d, d=w } d := x + a { y=x+a, y=a+x, d=x+a, d=a+x, y=d, d=y } { y=x+a, y=a+x, x=z, z=x }

Example: annotate this program { y=x+a, y=a+x, w=d, d=w } Inv = { y=x+a, y=a+x } while (x ≠ z) do { y=x+a, y=a+x } x := x + 1 { } y := x + a { y=x+a, y=a+x } d := x + a { y=x+a, y=a+x, d=x+a, d=a+x, y=d, d=y } { y=x+a, y=a+x, x=z, z=x }

Goal [whilep]: if { bexpr ∧ P } S { P } then { P } while bexpr do S { ¬bexpr ∧ P } Idea: try to guess a loop invariant from a small number of loop unrollings We know how to annotate S (by induction) { P } Inv = { N } while bexpr do { bexpr ∧ N } S { Q } { ¬bexpr ∧ N }

k-loop unrolling { y=x+a, y=a+x, w=d, d=w } if (x ≠ z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a } { P } Inv = { N } while (x ≠ z) do x := x + 1 y := x + a d := x + a { P } if (x ≠ z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a } if (x ≠ z) x := x + 1 y := x + a d := x + a Q2 = { y=x+a } …

k-loop unrolling { y=x+a, y=a+x, w=d, d=w } if (x ≠ z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } { P } Inv = { N } while (x ≠ z) do x := x + 1 y := x + a d := x + a { P } if (x ≠ z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } if (x ≠ z) x := x + 1 y := x + a d := x + a Q2 = { y=x+a, y=a+x } …

k-loop unrolling { y=x+a, y=a+x, w=d, d=w } if (x ≠ z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } { P } Inv = { N } while (x ≠ z) do x := x + 1 y := x + a d := x + a { P } if (x ≠ z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } if (x ≠ z) x := x + 1 y := x + a d := x + a Q2 = { y=x+a, y=a+x } The following must hold: P ⇒ N, Q1 ⇒ N, Q2 ⇒ N, …, Qk ⇒ N, …

k-loop unrolling { y=x+a, y=a+x, w=d, d=w } if (x ≠ z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } { P } Inv = { N } while (x ≠ z) do x := x + 1 y := x + a d := x + a { P } if (x ≠ z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } if (x ≠ z) x := x + 1 y := x + a d := x + a Q2 = { y=x+a, y=a+x } The following must hold: P ⇒ N, Q1 ⇒ N, Q2 ⇒ N, …, Qk ⇒ N, … Observation 1: No need to explicitly unroll the loop; we can reuse the postcondition from unrolling k-1 for k We can compute the following sequence: N0 = P, N1 = N0 ⊔ Q1, N2 = N1 ⊔ Q2, …, Nk = Nk-1 ⊔ Qk, …

k-loop unrolling { y=x+a, y=a+x, w=d, d=w } if (x ≠ z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } { P } Inv = { N } while (x ≠ z) do x := x + 1 y := x + a d := x + a { P } if (x ≠ z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } if (x ≠ z) x := x + 1 y := x + a d := x + a Q2 = { y=x+a, y=a+x } The following must hold: P ⇒ N, Q1 ⇒ N, Q2 ⇒ N, …, Qk ⇒ N, … Observation 2: the set of facts Nk decreases monotonically. Question: does it stabilize for some k? We can compute the following sequence: N0 = P, N1 = N0 ⊔ Q1, N2 = N1 ⊔ Q2, …, Nk = Nk-1 ⊔ Qk, …

Algorithm for annotating a loop Annotate(P, while bexpr do S) = Initialize Nc := P repeat Nold := Nc let Annotate(Nc, if bexpr then S else skip) be {Nc} if bexpr then S else skip {N} Nc := Nc ⊔ N until Nc = Nold return {P} Inv = {Nc} while bexpr do {F[assume bexpr](Nc)} Annotate(F[assume bexpr](Nc), S) {F[assume ¬bexpr](Nc)}

Putting it together

Algorithm for annotating a program Annotate(P, S) = case S is x:=aexpr: return {P} x:=aexpr {F*[x:=aexpr] P} case S is S1; S2: let Annotate(P, S1) be {P} A1 {Q1} let Annotate(Q1, S2) be {Q1} A2 {Q2} return {P} A1; {Q1} A2 {Q2} case S is if bexpr then S1 else S2: let Pt = F[assume bexpr] P let Pf = F[assume ¬bexpr] P let Annotate(Pt, S1) be {Pt} A1 {Q1} let Annotate(Pf, S2) be {Pf} A2 {Q2} return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2} case S is while bexpr do S’: Nc := P // Initialize repeat Nold := Nc let Pt = F[assume bexpr] Nc let Annotate(Pt, S’) be {Pt} Abody {N} Nc := Nc ⊔ N until Nc = Nold return {P} Inv = {Nc} while bexpr do {Pt} Abody {F[assume ¬bexpr](Nc)}

Exercise: apply algorithm { } y := a+b { } x := y { } while (x≠z) do { } w := a+b { } x := a+b { } a := z { }

Step 1/18 {} y := a+b { y=a+b }* x := y while (x≠z) do w := a+b x := a+b a := z (* not all factoids are shown; apply reduce to get all factoids)

Step 2/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* while (x≠z) do w := a+b x := a+b a := z

Step 3/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (x≠z) do w := a+b x := a+b a := z

Step 4/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (x≠z) do { y=a+b, x=y, x=a+b }* w := a+b x := a+b a := z

Step 5/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (x≠z) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b a := z

Step 6/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (x≠z) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z

Step 7/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (x≠z) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*

Step 8/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (x≠z) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*

Step 9/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (x≠z) do { x=y }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*

Step 10/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (x≠z) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*

Step 11/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (x≠z) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=y, w=x, x=y, a=z }*

Step 12/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (x≠z) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*

Step 13/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (x≠z) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*

Step 14/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (x≠z) do { } w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*

Step 15/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (x≠z) do { } w := a+b { w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*

Step 16/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (x≠z) do { } w := a+b { w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*

Step 17/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (x≠z) do { } w := a+b { w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*

Step 18/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv = { } while (x≠z) do { } w := a+b { w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }* { x=z }*
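The 18 steps above can be replayed mechanically. The following is a compact sketch of the whole annotation algorithm for SAV, not the course's implementation: the tuple encodings of facts, conditions, and statements and all function names are assumptions of this example. The loop case stops when the candidate invariant no longer changes, and trivial facts x=x are dropped. The function returns only the postcondition and records loop invariants in a list instead of building the fully annotated program.

```python
def f_sav(x, op, args, facts):
    # Kill every fact mentioning x; generate x=args for copies and additions.
    kept = {f for f in facts if f[0] != x and x not in f[1]}
    if op in ("var", "+") and x not in args:
        kept.add((x, tuple(args)))
    return kept


def reduce(facts):
    # Closure under symmetry, transitivity, commutativity, and substitution.
    d = set(facts)
    while True:
        eqs = [f for f in d if len(f[1]) == 1]
        adds = [f for f in d if len(f[1]) == 2]
        new = set()
        for x, (y,) in eqs:
            new.add((y, (x,)))
            new.update((x, rhs) for lhs, rhs in eqs if lhs == y)
            new.update((y, rhs) for lhs, rhs in adds if lhs == x)
            new.update((lhs, (y, w)) for lhs, (z, w) in adds if z == x)
        for x, (y, z) in adds:
            new.add((x, (z, y)))
            new.update((x, (lhs,)) for lhs, rhs in adds if rhs == (y, z))
        new = {f for f in new if f[1] != (f[0],)}  # drop trivial x=x
        if new <= d:
            return frozenset(d)
        d |= new


def assume(cond, facts):
    # Only an equality test contributes a fact; other tests contribute nothing.
    rel, a, b = cond
    extra = {(a, (b,))} if rel == "=" else set()
    return reduce(set(facts) | extra)


def negate(cond):
    rel, a, b = cond
    return ({"=": "!=", "!=": "="}.get(rel, "other"), a, b)


def annotate(pre, stmt, invariants):
    """Return the postcondition of stmt; append each loop invariant found."""
    kind = stmt[0]
    if kind == "assign":
        _, x, op, args = stmt
        return reduce(f_sav(x, op, args, pre))
    if kind == "seq":
        post = frozenset(pre)
        for s in stmt[1:]:
            post = annotate(post, s, invariants)
        return post
    if kind == "if":
        _, cond, s1, s2 = stmt
        q1 = annotate(assume(cond, pre), s1, invariants)
        q2 = annotate(assume(negate(cond), pre), s2, invariants)
        return q1 & q2                               # join = intersection
    if kind == "while":
        _, cond, body = stmt
        inv = reduce(pre)
        while True:
            candidate = inv & annotate(assume(cond, inv), body, [])
            if candidate == inv:                     # invariant has stabilized
                break
            inv = candidate
        invariants.append(inv)
        annotate(assume(cond, inv), body, invariants)  # record nested loops
        return assume(negate(cond), inv)
    raise ValueError(f"unknown statement kind: {kind}")


program = ("seq",
           ("assign", "y", "+", ("a", "b")),
           ("assign", "x", "var", ("y",)),
           ("while", ("!=", "x", "z"), ("seq",
                                        ("assign", "w", "+", ("a", "b")),
                                        ("assign", "x", "+", ("a", "b")),
                                        ("assign", "a", "var", ("z",)))))
invariants = []
post = annotate(frozenset(), program, invariants)
```

On the exercise program the candidate invariant shrinks from { y=a+b, x=y, x=a+b }* to { x=y }* to { }, and the final postcondition is { x=z }*, matching Step 18.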

Constant propagation

Second static analysis example Optimization: constant folding Example: x:=7; y:=x*9 transformed to: x:=7; y:=7*9 and then to: x:=7; y:=63 Analysis: constant propagation (CP) Infers facts of the form x=c, which simplify constant expressions Constant folding: { x=c } y := aexpr becomes y := eval(aexpr[c/x])

Plan Define domain – set of allowed assertions Handle assignments Handle composition Handle conditions Handle loops

Constant propagation domain

CP semantic domain ?

CP semantic domain Define CP-factoids: Φ = { x = c | x ∈ Var, c ∈ Z } How many factoids are there? Define predicates as 𝒟 = 2^Φ How many predicates are there? Do all predicates make sense? (x=5) ∧ (x=7) Treat conjunctive formulas as sets of factoids {x=5, y=7} ~ (x=5) ∧ (y=7)

Handling assignments

CP abstract transformer Goal: define a function FCP[x:=aexpr] : 𝒟 → 𝒟 such that if FCP[x:=aexpr] P = P’ then sp(x:=aexpr, P) ⇒ P’ ?

CP abstract transformer Goal: define a function FCP[x:=aexpr] : 𝒟 → 𝒟 such that if FCP[x:=aexpr] P = P’ then sp(x:=aexpr, P) ⇒ P’ { x=c } x:=aexpr { } [kill] { } x:=c { x=c } [gen-1] { y=c1, z=c2 } x:=y op z { x=c } where c = c1 op c2 [gen-2] { y=c } x:=aexpr { y=c } [preserve]

Gen-kill formulation of transformers Suited for analysis propagating sets of factoids Available expressions, Constant propagation, etc. For each statement, define a set of killed factoids and a set of generated factoids F[S] P = (P \ kill(S)) ∪ gen(S) FCP[x:=aexpr] P = (P \ {x=c}) if aexpr is not a constant FCP[x:=k] P = (P \ {x=c}) ∪ {x=k} Used in dataflow analysis, a special case of abstract interpretation
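The gen-kill shape of the CP transformer can be sketched as follows. This is a sketch under stated assumptions: a predicate is stored as a dictionary from variable names to known constants, expressions are an integer, a variable name, or a tuple (op, a, b), and the names `f_cp`, `value`, and `OPS` are invented for the example. Going slightly beyond the rules above, it also propagates a known constant through a plain copy x := y and accepts integer literals as operands.

```python
import operator

# Hypothetical operator table for the binary expressions handled here.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def value(operand, facts):
    """Return the constant value of an operand, or None if it is unknown."""
    if isinstance(operand, int):
        return operand
    return facts.get(operand)


def f_cp(x, expr, facts):
    """Transformer for x := expr over CP facts stored as {variable: constant}."""
    if isinstance(expr, tuple):
        op, a, b = expr
        va, vb = value(a, facts), value(b, facts)
        result = OPS[op](va, vb) if va is not None and vb is not None else None
    else:
        result = value(expr, facts)
    out = {v: c for v, c in facts.items() if v != x}  # kill x=c, preserve the rest
    if result is not None:
        out[x] = result                               # gen x=result
    return out


p1 = f_cp("x", 7, {})              # x := 7
p2 = f_cp("y", ("*", "x", 9), p1)  # y := x * 9
p3 = f_cp("x", "z", p2)            # x := z, where z is not a known constant
```

Here `p2` records y=63, the fact that justifies folding y := x*9 into y := 63, and `p3` shows the kill: after x := z nothing is known about x.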

Handling composition

Does this still work? Annotate(P, S1; S2) = let Annotate(P, S1) be {P} A1 {Q1} let Annotate(Q1, S2) be {Q1} A2 {Q2} return {P} A1; {Q1} A2 {Q2}

Handling conditions

Handling conditional expressions We want to soundly approximate D ∧ bexpr and D ∧ ¬bexpr in 𝒟 Define ρ(bexpr) = if bexpr is CP-factoid {bexpr} else {} Define F[assume bexpr](D) = D ∪ ρ(bexpr)

Does this still work? let Pt = F[assume bexpr] P let Pf = F[assume ¬bexpr] P let Annotate(Pt, S1) be {Pt} A1 {Q1} let Annotate(Pf, S2) be {Pf} A2 {Q2} return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2} How do we define join for CP?

Join example {x=5, y=7} ⊔ {x=3, y=7, z=9} =
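One natural answer, by analogy with the SAV join, is to keep exactly the factoids on which both predicates agree. A minimal sketch, assuming consistent predicates stored as dictionaries from variables to constants (the representation and the name `join_cp` are assumptions of this example):

```python
def join_cp(p, q):
    """Keep x=c only when both predicates contain the same factoid x=c."""
    return {v: c for v, c in p.items() if v in q and q[v] == c}


joined = join_cp({"x": 5, "y": 7}, {"x": 3, "y": 7, "z": 9})
```

For the example above this leaves only y=7: x is dropped because the two sides disagree on its value, and z because only one side knows it.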

Handling loops

Does this still work? Annotate(P, while bexpr do S) = Nc := P // Initialize repeat Nold := Nc let Pt = F[assume bexpr] Nc let Annotate(Pt, S) be {Pt} Abody {N} Nc := Nc ⊔ N until Nc = Nold return {P} Inv = {Nc} while bexpr do {Pt} Abody {F[assume ¬bexpr](Nc)} What about correctness? What about termination?

Does this still work? Annotate(P, while bexpr do S) = Nc := P // Initialize repeat Nold := Nc let Pt = F[assume bexpr] Nc let Annotate(Pt, S) be {Pt} Abody {N} Nc := Nc ⊔ N until Nc = Nold return {P} Inv = {Nc} while bexpr do {Pt} Abody {F[assume ¬bexpr](Nc)} What about correctness? If the loop terminates, is Nc a loop invariant? What about termination?

A termination principle
g : X → X is a function
How can we determine whether the sequence x0, x1 = g(x0), …, xk+1 = g(xk), … stabilizes?
Technique:
  Find a ranking function rank : X → ℕ (that is, show that rank(x) ≥ 0 for all x)
  Show that if x ≠ g(x) then rank(g(x)) < rank(x)
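The principle can be exercised with a small sketch that iterates g and checks the ranking condition at every step. The helper name `iterate_to_fixpoint` and the example function are assumptions for illustration.

```python
def iterate_to_fixpoint(g, x0, rank):
    """Iterate x := g(x) until g(x) == x, checking that rank strictly drops."""
    x, steps = x0, 0
    while True:
        nxt = g(x)
        if nxt == x:
            return x, steps
        # The ranking argument: a non-negative integer that strictly decreases
        # can only decrease finitely many times.
        assert 0 <= rank(nxt) < rank(x), "rank must strictly decrease"
        x, steps = nxt, steps + 1


# Example: g drops every factoid except y=7; rank is the number of factoids.
start = frozenset({"x=5", "y=7"})
fix, steps = iterate_to_fixpoint(lambda s: s & frozenset({"y=7"}), start, len)
```

The number of steps is at most rank(x0), which is the bound the ranking function provides.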

Rank function for available expressions rank(P) = ?

Rank function for available expressions
Annotate(P, while bexpr do S) =
  N := Nc := P // Initialize
  repeat
    let Pt = F[assume bexpr] Nc
    let Annotate(Pt, S) be {Nc} Abody {N}
    Nc := Nc ⊔ N
  until N = Nc
  return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}
rank(P) = |P| (number of factoids)
Prove that either Nc = Nc ⊔ N or rank(Nc ⊔ N) <? rank(Nc)

Rank function for constant propagation
Annotate(P, while bexpr do S) =
  N := Nc := P // Initialize
  repeat
    let Pt = F[assume bexpr] Nc
    let Annotate(Pt, S) be {Nc} Abody {N}
    Nc := Nc ⊔ N
  until N = Nc
  return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}
rank(P) = ?
Prove that either Nc = Nc ⊔ N or rank(Nc) >? rank(Nc ⊔ N)

Rank function for constant propagation
Annotate(P, while bexpr do S) =
  N’ := Nc := P // Initialize
  repeat
    let Pt = F[assume bexpr] Nc
    let Annotate(Pt, S) be {Nc} Abody {N’}
    Nc := Nc ⊔ N’
  until N’ = Nc
  return {P} INV= {N’} while bexpr do {Pt} Abody {F[assume ¬bexpr](N’)}
rank(P) = |P| (number of factoids)
Prove that either Nc = Nc ⊔ N’ or rank(Nc) >? rank(Nc ⊔ N’)

Generalizing Available Expressions and Constant Propagation to Abstract Interpretation
[Image: by NMZ (Photoshop) [CC0], via Wikimedia Commons]

Towards a recipe for static analysis
Two static analyses:
  Available Expressions (extended with equalities)
  Constant Propagation
Semantic domain: a family of formulas
Join operator: approximates pairs of formulas
Abstract transformers for basic statements:
  Assignments
  assume statements
Initial precondition

Control flow graphs

A technical issue
Unrolling loops is quite inconvenient and inefficient (but we can avoid it, as we just saw)
How do we handle more complex control-flow constructs, e.g., goto, break, exceptions…?
The problem: non-inductive control-flow constructs
Solution: model control flow by labels and goto statements
We would like a dedicated data structure that explicitly encodes control flow in support of the analysis
Solution: control-flow graphs (CFGs)

Modeling control flow with labels
Structured program:
  while (x  z) do
    x := x + 1
    y := x + a
    d := x + a
  a := b
With labels and goto:
  label0:
  if x  z goto label1
  x := x + 1
  y := x + a
  d := x + a
  goto label0
  label1:
  a := b

Control-flow graph example
Program, with line numbers:
  1  label0:
  2  if x  z goto label1
  3  x := x + 1
  4  y := x + a
  5  d := x + a
  6  goto label0
  7  label1:
  8  a := b
[Figure: control-flow graph with one node per numbered line]

Control-flow graph example
Program, with line numbers:
  1  label0:
  2  if x  z goto label1
  3  x := x + 1
  4  y := x + a
  5  d := x + a
  6  goto label0
  7  label1:
  8  a := b
[Figure: control-flow graph with one node per numbered line, plus special entry and exit nodes]

Control-flow graph
Nodes are statements or labels
Special nodes for entry/exit
An edge from node v to node w means that after executing the statement of v, control passes to w
Conditions are represented by split and join nodes
Loops create cycles
Can be generated from the abstract syntax tree in linear time
Automatically taken care of by the front-end
Usage: store analysis results (assertions) in CFG nodes
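A minimal sketch of such a graph, assuming a plain successor map keyed by statement text. The small loop used here and the helper names `predecessors` and `has_cycle` are chosen only for illustration.

```python
# Successor map: node -> list of nodes that control may pass to next.
cfg = {
    "entry": ["if x > 0"],
    "if x > 0": ["x := x - 1", "exit"],   # split node: two successors
    "x := x - 1": ["if x > 0"],           # back edge closes the loop
    "exit": [],
}


def predecessors(cfg):
    """Invert the successor map; join nodes have several predecessors."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    return preds


def has_cycle(cfg):
    """Depth-first search; reaching a node still on the stack means a loop."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in cfg}

    def visit(n):
        color[n] = GREY
        for s in cfg[n]:
            if color[s] == GREY or (color[s] == WHITE and visit(s)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in cfg)


preds = predecessors(cfg)
loops = has_cycle(cfg)
```

Analysis results can then be stored in a second dictionary keyed by the same node names.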

Eliminating labels We can use edges to point to the nodes following labels and remove all label nodes (other than entry/exit)

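Control-flow graph example
Program, with line numbers:
  1  label0:
  2  if x  z goto label1
  3  x := x + 1
  4  y := x + a
  5  d := x + a
  6  goto label0
  7  label1:
  8  a := b
[Figure: the control-flow graph after removing the label and goto nodes; remaining nodes: entry, 2 if x  z, 3 x := x + 1, 4 y := x + a, 5 d := x + a, 8 a := b, exit]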

Basic blocks
A basic block is a chain of nodes with a single entry point and a single exit point
Entry/exit nodes are separate blocks
[Figure: CFG with nodes entry, 2 if x  z, 3 x := x + 1, 4 y := x + a, 5 d := x + a, 8 a := b, exit]

Blocked CFG
Stores each basic block in a single node
Extended blocks: maximal connected loop-free subgraphs
[Figure: blocked CFG with nodes entry; 2 if x  z; 3-5 x := x + 1, y := x + a, d := x + a; 8 a := b; exit]
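Grouping nodes into basic blocks can be sketched as follows, assuming the CFG is a successor map. The function name `basic_blocks` and the node name `cond` (standing for the loop test) are hypothetical.

```python
def basic_blocks(cfg, entry="entry", exit_="exit"):
    """Group straight-line chains of CFG nodes into basic blocks."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)

    def is_leader(n):
        # A node starts a block if it is entry/exit, is a join point,
        # or follows entry or a split node.
        if n in (entry, exit_) or len(preds[n]) != 1:
            return True
        p = preds[n][0]
        return p == entry or len(cfg[p]) != 1

    blocks = []
    for start in cfg:
        if not is_leader(start):
            continue
        block, n = [start], start
        # Extend the chain while control can only continue to one non-leader.
        while n not in (entry, exit_) and len(cfg[n]) == 1 and not is_leader(cfg[n][0]):
            n = cfg[n][0]
            block.append(n)
        blocks.append(block)
    return blocks


cfg = {
    "entry": ["cond"],
    "cond": ["x := x + 1", "a := b"],
    "x := x + 1": ["y := x + a"],
    "y := x + a": ["d := x + a"],
    "d := x + a": ["cond"],
    "a := b": ["exit"],
    "exit": [],
}
blocks = basic_blocks(cfg)
```

On this graph the three assignments of the loop body end up in one block, while entry, the test, `a := b`, and exit each form a block of their own.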

Collecting semantics

Why do we need another semantics?
Operational semantics explains how to compute output from a given input
  Useful for implementing an interpreter/compiler
  Less useful for reasoning about safety properties
  Not suitable for analysis purposes: does not explicitly show how assertions at different program points influence each other
We need a more explicit semantics, defined over a control-flow graph

Control-flow graph example
Program, with line numbers:
  1  label0:
  2  if x <= 0 goto label1
  3  x := x - 1
  4  goto label0
  5  label1:
[Figure: control-flow graph with nodes entry, 1 label0:, 2 if x > 0, 3 x := x - 1, 4 goto label0, 5 label1:, exit]

Trimmed CFG
Program, with line numbers:
  1  label0:
  2  if x <= 0 goto label1
  3  x := x - 1
  4  goto label0
  5  label1:
[Figure: trimmed control-flow graph with nodes entry, 2 if x > 0, 3 x := x - 1, exit]

Collecting semantics example: input 1
[Figure: trimmed CFG of the loop program (if x > 0 / x := x - 1), annotated with the states that reach each node]
  inputs at entry: [x↦1], [x↦2], [x↦3], …
  reaching if x > 0: [x↦1], [x↦0]
  reaching x := x - 1: [x↦1]
  reaching exit: [x↦0]

Collecting semantics example: input 2
[Figure: trimmed CFG of the loop program (if x > 0 / x := x - 1), annotated with the states that reach each node]
  inputs at entry: [x↦1], [x↦2], [x↦3], …
  reaching if x > 0: [x↦2], [x↦1], [x↦0]
  reaching x := x - 1: [x↦2], [x↦1]
  reaching exit: [x↦0]

Collecting semantics example: input 3
[Figure: trimmed CFG of the loop program (if x > 0 / x := x - 1), annotated with the states that reach each node]
  inputs at entry: [x↦1], [x↦2], [x↦3], …
  reaching if x > 0: [x↦3], [x↦2], [x↦1], [x↦0]
  reaching x := x - 1: [x↦3], [x↦2], [x↦1]
  reaching exit: [x↦0]

ad infinitum – fixed point
[Figure: trimmed CFG of the loop program (if x > 0 / x := x - 1), annotated with the states that reach each node]
  inputs at entry: [x↦1], [x↦2], [x↦3], …
  reaching if x > 0: …, [x↦3], [x↦2], [x↦1], [x↦0]
  reaching x := x - 1: [x↦1], [x↦2], [x↦3], …
  reaching exit: [x↦0], [x↦-1], [x↦-2], …

Predicates at fixed point
[Figure: trimmed CFG of the loop program (if x > 0 / x := x - 1), annotated with predicates]
  at entry: {true}
  at if x > 0: {?}
  before x := x - 1: {?}
  at exit: {?}

Predicates at fixed point
[Figure: trimmed CFG of the loop program (if x > 0 / x := x - 1), annotated with predicates]
  at entry: {true}
  at if x > 0: {true}
  before x := x - 1: {x>0}
  after x := x - 1: {x≥0}
  at exit: {x≤0}

Collecting semantics
Accumulates, for each control-flow node, the (possibly infinite) set of states that can reach it when the program is executed from some given set of input states
Not computable in general
A reference point for static analysis
(An abstraction of the trace semantics)
We will define it formally

Collecting semantics in equational form

Math reference: function lifting
Let f : X → Y be a function
The lifted function f’ : 2^X → 2^Y is defined as f’(XS) = { f(x) | x ∈ XS }
We will sometimes use the same symbol for both functions when it is clear from the context which one is used
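Lifting is easy to state in code. The sketch below uses a hypothetical helper `lift` and finite Python sets in place of arbitrary subsets of X.

```python
def lift(f):
    """Turn f : X -> Y into f' on sets, f'(XS) = { f(x) | x in XS }."""
    return lambda xs: frozenset(f(x) for x in xs)


def dec(x):
    return x - 1  # plays the role of the semantics of x := x - 1


dec_lifted = lift(dec)
image = dec_lifted({1, 2, 3})
print(sorted(image))  # [0, 1, 2]
```

The lifted function of the empty set is the empty set, and distinct inputs with equal images collapse into one element.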

Equational definition example
A vector of variables R[0, 1, 2, 3, 4]
A (recursive) system of equations:
  R[0] = {x ∈ Z} // established input
  R[1] = R[0] ∪ R[4]
  R[2] = ⟦assume x>0⟧ R[1]
  R[3] = ⟦assume ¬(x>0)⟧ R[1]
  R[4] = ⟦x:=x-1⟧ R[2] // semantic function for x:=x-1 lifted to sets of states
[Figure: CFG with edges labeled R[0] (out of entry), R[1] (into if x > 0), R[2] (into x := x-1), R[3] (into exit), R[4] (out of x := x-1)]
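Although the collecting semantics is not computable in general, the system above can be solved by plain iteration when the input set is finite, since this loop only counts x down to 0. The sketch below assumes a state is just the integer value of x; the function name `solve` is hypothetical.

```python
def solve(inputs):
    """Iterate the five equations from empty sets until nothing changes."""
    R = [frozenset(inputs)] + [frozenset()] * 4
    while True:
        new = [
            R[0],                                     # R[0]: established input
            R[0] | R[4],                              # R[1] = R[0] union R[4]
            frozenset(x for x in R[1] if x > 0),      # R[2] = assume x>0
            frozenset(x for x in R[1] if not x > 0),  # R[3] = assume not(x>0)
            frozenset(x - 1 for x in R[2]),           # R[4] = x:=x-1, lifted
        ]
        if new == R:
            return R
        R = new


R = solve(range(-2, 4))  # inputs x in {-2, ..., 3}
```

For these inputs the solution agrees with the predicates x>0 at R[2], x≤0 at R[3], and x≥0 at R[4].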

General definition
A vector of variables R[0, …, k], one per input/output of a node
R[0] is for entry
For a node n with multiple predecessors, add the equation R[n] = ∪{R[k] | k is a predecessor of n}
For an atomic operation node S with input R[m] and output R[n], add the equation R[n] = ⟦S⟧ R[m]
Transform if b then S1 else S2 to (assume b; S1) or (assume ¬b; S2)
[Figure: CFG with edges labeled R[0] (out of entry), R[1] (into if x > 0), R[2] (into x := x-1), R[3] (into exit), R[4] (out of x := x-1)]

see you next time