Spring 2016 Program Analysis and Verification Lecture 7: Static Analysis I Roman Manevich Ben-Gurion University
Tentative syllabus
  Program Verification: Operational semantics, Hoare Logic, Applying Hoare Logic, Weakest Precondition Calculus, Proving Termination, Data structures, Automated Verification
  Program Analysis Basics: From Hoare Logic to Static Analysis, Control Flow Graphs, Equation Systems, Collecting Semantics, Using Soot
  Abstract Interpretation fundamentals: Lattices, Fixed-Points, Chaotic Iteration, Galois Connections, Domain constructors, Widening/Narrowing
  Analysis Techniques: Numerical Domains, Alias analysis, Interprocedural Analysis, Shape Analysis, CEGAR
Previously Axiomatic verification Weakest precondition calculus Strongest postcondition calculus Handling data structures Total correctness
Agenda
  Static analysis for compiler optimization
    Common Subexpression Elimination
    Available Expression domain
  Develop a static analysis
    Simple Available Expressions
    Constant Propagation
  Basic concepts in static analysis
    Control flow graphs
    Equation systems
    Collecting semantics
Array-max example: Post1
  nums : array
  N : int // N stands for the length of nums
  { N≥0 }
  x := 0
  { N≥0 ∧ x=0 }
  res := nums[0]
  { x=0 }
  Inv = { x≤N }
  while x < N
    { x=k ∧ k<N }
    if nums[x] > res then res := nums[x]
    { x=k ∧ k<N }
    x := x + 1
    { x=k+1 ∧ k<N }
  { x≤N ∧ x≥N }
  { x=N }
Can we find this proof automatically?
  nums : array
  N : int
  { N≥0 }
  x := 0
  { N≥0 ∧ x=0 }
  res := nums[0]
  { x=0 }
  Inv = { x≤N }
  while x < N
    { x=k ∧ k<N }
    if nums[x] > res then
      { x=k ∧ k<N }
      res := nums[x]
      { x=k ∧ k<N }
    { x=k ∧ k<N }
    x := x + 1
    { x=k+1 ∧ k<N }
  { x≤N ∧ x≥N }
  { x=N }
  Observation: predicates in the proof are conjunctions of constraints, where each constraint has the form X - Y ≤ c or ±X ≤ c
Look under the street lamp … We may move the lamp a bit
Zone Abstract Domain Developed by Antoine Miné in his Ph.D. thesis Uses constraints of the form X - Y ≤ c and ±X ≤ c
Analysis with Zone abstract domain Static Analysis with Zone Abstraction Manual Proof nums : array N : int { N0 } x := 0 { N0 x=0 } res := nums[0] { N0 x=0 } Inv = { N0 0xN } while x < N { N0 0x<N } if nums[x] > res then { N0 0x<N } res := nums[x] { N0 0x<N } { N0 0x<N } x := x + 1 { N0 0<x<N } {N0 0x x=N } nums : array N : int { N0 } x := 0 { N0 x=0 } res := nums[0] { x=0 } Inv = { xN } while x < N { x=k kN } if nums[x] > res then { x=k k<N } res := nums[x] { x=k k<N } { x=k k<N } x := x + 1 { x=k+1 k<N } { xN xN } { x=N }
Array-max example: Post3 nums : array { N0 0m<N } // N stands for num’s length x := 0 { x=0 } res := nums[0] { x=0 res=nums(0) } Inv = { 0m<x nums(m)res } while x < N { x=k res=oRes 0m<k nums(m)oRes } if nums[x] > res then { nums(x)>oRes res=oRes x=k 0m<k nums(m)oRes } res := nums[x] { res=nums(x) nums(x)>oRes x=k 0m<k nums(m)oRes } { x=k 0mk nums(m)<res } { (x=k 0m<k nums(m)<res) (res≥nums(x) x=k res=oRes 0m<k nums(m)oRes)} { x=k 0m<k nums(m)res } x := x + 1 { x=k+1 0mx-1 nums(m)res } { 0m<x nums(m)res } { x=N 0m<x nums(m)res} [univp]{ m. 0m<N nums(m)res }
Can we find this proof automatically? Various static analysis techniques can:
  A framework for numeric analysis of array operations [Gopan et al. in POPL 2005]
  Discovering properties about arrays in simple programs [Halbwachs & Péron in PLDI 2008]
Static analysis for compiler optimizations
Motivating problem: optimization
  A compiler optimization is defined by a program transformation: T : Stmt → Stmt
  The transformation is semantics-preserving: ∀s. Ssos⟦C⟧ s = Ssos⟦T(C)⟧ s
  The transformation is applied to the program only if an enabling condition is met
  We use static analysis for inferring enabling conditions
Common Subexpression Elimination
  If we have two variable assignments x := a op b … y := a op b, and the values of x, a, and b have not changed between the assignments, rewrite the code as x := a op b … y := x
  Eliminates useless recalculation
  Paves the way for more optimizations (e.g., dead code elimination)
  op ∈ {+, -, *, ==, <=}
What do we need to prove? CSE { true } C1 x := a op b C2 { x = a op b } y := a op b C3 { true } C1 x := a op b C2 { x = a op b } y := x C3 CSE Assertion localizes decision
A simplified problem CSE { true } C1 x := a + b C2 { x = a + b } y := a + b C3 { true } C1 x := a + b C2 { x = a + b } y := x C3 CSE
Available Expressions analysis
  A static analysis that infers for every program point a set of facts of the form
    AV = { x = y | x, y ∈ Var } ∪ { x = op y | x, y ∈ Var, op ∈ {-, !} } ∪ { x = y op z | x, y, z ∈ Var, op ∈ {+, -, *, <=} }
  For every program with n=|Var| variables the number of possible facts is finite: |AV|=O(n^3)
  Yields a trivial algorithm … Is it efficient?
Simple Available Expressions
  Define the set of atomic facts (for SAV) as { x = y | x, y ∈ Var } ∪ { x = y + z | x, y, z ∈ Var }
  For n=|Var| the number of atomic facts is O(n^3)
  Define sav-predicates as the subsets of the set of atomic facts (its power set)
Notation for conjunctive sets of facts
  For a set of atomic facts D, we define Conj(D) = ⋀ D
  E.g., if D={a=b, c=b+d, b=c} then Conj(D) = (a=b) ∧ (c=b+d) ∧ (b=c)
  Notice that for two sets of facts D1 and D2: Conj(D1 ∪ D2) = Conj(D1) ∧ Conj(D2)
  What does Conj({}) stand for…?
Towards an automatic proof Goal: automatically compute an annotated program proving as many facts as possible of the form x = y and x = y + z Decision 1: develop a forward-going proof Decision 2: draw predicates from a finite set D “looking under the light of the lamp” A compromise that simplifies problem by focusing attention – possibly miss some facts that hold Challenge 1: handle straight-line code Challenge 2: handle conditions Challenge 3: handle loops
Challenge 1: handling straight-line code
Straight line code example { } x := a + b { x=a+b } z := a + c { x=a+b, z=a+c } b := a * c { z=a+c } Find a proof that satisfies both conditions
Straight line code example sp { } x := a + b { x=a+b } z := a + c { x=a+b, z=a+c } b := a * c { z=a+c } cons Frame Can we turn this into an algorithm? What should we ensure for each triple?
Goal
  Given a program of the form x1 := a1; … xn := an
  Find predicates P0, …, Pn such that {P0} x1 := a1 {P1} … {Pn-1} xn := an {Pn} is a proof
    That is: sp(xi := ai, Pi-1) ⇒ Pi
    Each Pi has the form Conj(Di) where Di is a set of atomic facts
Algorithm for straight-line code
  Goal: find predicates P0, …, Pn such that {P0} x1 := a1 {P1} … {Pn-1} xn := an {Pn} is a proof
    That is: sp(xi := ai, Pi-1) ⇒ Pi
    Each Pi has the form Conj(Di) where Di is a set of atomic facts
  Idea: define a function FSAV[x:=a] from sets of atomic facts to sets of atomic facts s.t. if FSAV[x:=a](D) = D' then sp(x := a, Conj(D)) ⇒ Conj(D')
    We call F the abstract transformer of x:=a
  Unless D0 is given, initialize D0={} (why?)
  For each i = 1, …, n: compute Di = FSAV[xi := ai](Di-1)
  Finally Pi = Conj(Di)
Defining an SAV abstract transformer
  Goal: define a function FSAV[x:=a] from sets of atomic facts to sets of atomic facts s.t. if FSAV[x:=a](D) = D' then sp(x := a, Conj(D)) ⇒ Conj(D')
  Idea: define rules for individual facts and generalize to sets of facts by the conjunction rule
Defining an SAV abstract transformer
  Goal: define a function FSAV[x:=a] from sets of atomic facts to sets of atomic facts s.t. if FSAV[x:=a](D) = D' then sp(x := a, Conj(D)) ⇒ Conj(D')
  Idea: define rules for individual facts and generalize to sets of facts by the conjunction rule
  In the rules, α is either a variable v or an addition expression v+w
    { x=α } x:=a { } [kill-lhs]
    { y=x+w } x:=a { } [kill-rhs-1]
    { y=w+x } x:=a { } [kill-rhs-2]
    { } x:=α { x=α } [gen]
    { y=z+w } x:=a { y=z+w } [preserve]
SAV abstract transformer example
  { }
  x := a + b
  { x=a+b }
  z := a + c
  { x=a+b, z=a+c }
  b := a * c
  { z=a+c }
  In the rules, α is either a variable v or an addition expression v+w
    { x=α } x:=aexpr { } [kill-lhs]
    { y=x+w } x:=aexpr { } [kill-rhs-1]
    { y=w+x } x:=aexpr { } [kill-rhs-2]
    { } x:=α { x=α } [gen]
    { y=z+w } x:=aexpr { y=z+w } [preserve]
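The rules above can be sketched in Python. This is a minimal illustration, not the lecture's implementation: the name `f_sav` and the tuple encoding of facts are assumptions, and the sketch also assumes that [gen] is skipped when x occurs on its own right-hand side.

```python
def f_sav(facts, x, rhs):
    """Abstract transformer for x := rhs over a set of SAV facts.

    A fact is a pair (lhs, operands): operands is a 1-tuple for lhs = v
    and a 2-tuple for lhs = v + w (hypothetical encoding).
    rhs is an operands tuple, or None when the right-hand side is not
    a variable or a sum of two variables (e.g. a * c).
    """
    # kill-lhs, kill-rhs-1, kill-rhs-2: drop every fact that mentions x;
    # everything else is kept by [preserve]
    kept = {(l, ops) for (l, ops) in facts if l != x and x not in ops}
    # gen: record x = rhs (assumption: only when x does not occur in rhs)
    if rhs is not None and x not in rhs:
        kept.add((x, rhs))
    return kept

d0 = set()
d1 = f_sav(d0, "x", ("a", "b"))   # x := a + b
d2 = f_sav(d1, "z", ("a", "c"))   # z := a + c
d3 = f_sav(d2, "b", None)         # b := a * c kills x = a + b
```

Running the three assignments reproduces the annotations of the example: `d3` holds only the fact z = a + c.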
Problem 1: large expressions { } x := a + b + c { } y := a + b + c { } Missed CSE opportunity Large expressions on the right hand sides of assignments are problematic Can miss optimization opportunities Require complex transformers Solution: …?
Problem 1: large expressions { } x := a + b + c { } y := a + b + c { } Missed CSE opportunity Large expressions on the right hand sides of assignments are problematic Can miss optimization opportunities Require complex transformers Solution: transform code to normal form where right-hand sides have bounded size Standard compiler transformation – lowering into three address code
Three-address code
  Original:
    { } x := a + b + c { } y := a + b + c { }
  Lowered:
    { }
    i1 := a + b
    { i1=a+b }
    x := i1 + c
    { i1=a+b, x=i1+c }
    i2 := a + b
    { i1=a+b, x=i1+c, i2=a+b }
    y := i2 + c
    { i1=a+b, x=i1+c, i2=a+b, y=i2+c }
  Need to infer i1=i2
  Main idea: simplify expressions by storing intermediate results in new temporary variables
  Number of variables in simplified statements ≤ 3
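A small sketch of the lowering step for sums, assuming statements are kept as strings and temporaries are named i1, i2, … as in the example; `lower_sum` is a hypothetical helper, not part of any compiler API.

```python
from itertools import count

def lower_sum(target, operands, fresh):
    """Lower target := v1 + v2 + ... + vn into three-address statements."""
    if len(operands) == 1:
        return [f"{target} := {operands[0]}"]
    stmts = []
    acc = operands[0]
    for i, v in enumerate(operands[1:], start=1):
        last = (i == len(operands) - 1)
        # Only the final addition writes the real target;
        # earlier ones write fresh temporaries.
        dst = target if last else f"i{next(fresh)}"
        stmts.append(f"{dst} := {acc} + {v}")
        acc = dst
    return stmts

fresh = count(1)  # shared counter so temporaries are never reused
code = lower_sum("x", ["a", "b", "c"], fresh) + lower_sum("y", ["a", "b", "c"], fresh)
```

Every generated statement mentions at most three variables, which is what keeps the abstract transformer simple.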
Problem 2: transformer precision { } i1 := a + b { i1=a+b } x := i1 + c { i1=a+b, x=i1+c } i2 := a + b { i1=a+b, x=i1+c, i2=a+b } y := i2 + c { i1=a+b, x=i1+c, i2=a+b, y=i2+c } Need to infer i1=i2 Our transformer only infers syntactically available expressions – ones that appear in the code explicitly We want a transformer that considers the meaning of the predicates Takes equalities into account
Defining a semantic reduction
  Idea: make as many implicit facts explicit by
    Using symmetry and transitivity of equality
    Commutativity of addition
    Meaning of equality: can substitute equal variables
  For an SAV-predicate P=Conj(D) define reduce(D) = minimal set D* such that:
    D ⊆ D*
    x=y ∈ D* implies y=x ∈ D*
    x=y ∈ D* and y=z ∈ D* implies x=z ∈ D*
    x=y+z ∈ D* implies x=z+y ∈ D*
    x=y ∈ D* and x=z+w ∈ D* implies y=z+w ∈ D*
    x=y ∈ D* and z=x+w ∈ D* implies z=y+w ∈ D*
    x=z+w ∈ D* and y=z+w ∈ D* implies x=y ∈ D*
  Notice that reduce(D) ⊇ D, and Conj(reduce(D)) is logically equivalent to Conj(D)
  reduce is a special case of a semantic reduction
Sharpening the transformer
  Define: F*[x:=aexpr] = reduce ∘ FSAV[x:=aexpr]
  { }
  i1 := a + b
  { i1=a+b, i1=b+a }
  x := i1 + c
  { i1=a+b, i1=b+a, x=i1+c, x=c+i1 }
  i2 := a + b
  { i1=a+b, i1=b+a, x=i1+c, x=c+i1, i2=a+b, i2=b+a, i1=i2, i2=i1, x=i2+c, x=c+i2 }
  y := i2 + c
  { ... }
  Since sets of facts and their conjunctions are isomorphic, we will use them interchangeably
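The closure behind reduce can be sketched as a saturation loop. The encoding of facts as (lhs, operands) pairs and the name `reduce_facts` are assumptions of this sketch; it also leaves out trivial facts of the form x = x, which are always true and carry no information.

```python
def reduce_facts(facts):
    """Saturate a set of SAV facts under the rules of the semantic reduction."""
    closed = set(facts)
    while True:
        new = set()
        eqs = [(l, ops[0]) for (l, ops) in closed if len(ops) == 1]
        sums = [(l, ops) for (l, ops) in closed if len(ops) == 2]
        for x, y in eqs:
            new.add((y, (x,)))                  # symmetry
            for y2, z in eqs:
                if y2 == y:
                    new.add((x, (z,)))          # transitivity
            for l, (z, w) in sums:
                if l == x:
                    new.add((y, (z, w)))        # x=y, x=z+w gives y=z+w
                if z == x:
                    new.add((l, (y, w)))        # x=y, l=x+w gives l=y+w
        for l, (y, z) in sums:
            new.add((l, (z, y)))                # commutativity
            for l2, ops2 in sums:
                if ops2 == (y, z):
                    new.add((l, (l2,)))         # same sum, so equal variables
        new = {(l, ops) for (l, ops) in new if ops != (l,)}  # skip x = x
        if new <= closed:
            return closed
        closed |= new

d = {("i1", ("a", "b")), ("x", ("i1", "c")), ("i2", ("a", "b"))}
d_star = reduce_facts(d)
```

On the three-address example this derives i1 = i2 and x = i2 + c, the facts needed to spot the common subexpression. The loop terminates because the set of facts over a fixed set of variables is finite.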
An algorithm for annotating SLP
  Annotate(P, x:=aexpr) = {P} x:=aexpr {F*[x:=aexpr](P)}
  Annotate(P, S1; S2) =
    let Annotate(P, S1) be {P} A1 {Q1}
    let Annotate(Q1, S2) be {Q1} A2 {Q2}
    return {P} A1; {Q1} A2 {Q2}
Challenge 2: handling conditions
Goal
  [ifp]: from {bexpr ∧ P} S1 {Q} and {¬bexpr ∧ P} S2 {Q} conclude {P} if bexpr then S1 else S2 {Q}
  Annotate a program if bexpr then S1 else S2 with SAV-predicates
  Assumption 1: P is given (otherwise use true)
  Assumption 2: bexpr is a simple binary expression, e.g., x=y, x≠y, x<y (why?)
  { P } if bexpr then { bexpr ∧ P } S1 { Q1 } else { ¬bexpr ∧ P } S2 { Q2 } { Q }
Joining predicates
  [ifp]: from {bexpr ∧ P} S1 {Q} and {¬bexpr ∧ P} S2 {Q} conclude {P} if bexpr then S1 else S2 {Q}
  Start with P or {bexpr ∧ P} (bexpr is possibly an SAV-fact) and annotate S1 (yielding Q1)
  Start with P or {¬bexpr ∧ P} (¬bexpr is possibly an SAV-fact) and annotate S2 (yielding Q2)
  How do we infer a Q such that Q1⇒Q and Q2⇒Q?
  Q1=Conj(D1), Q2=Conj(D2)
  Define: Q = Q1 ⊔ Q2 = Conj(D1 ∩ D2) (the join operator for SAV)
  { P } if bexpr then { bexpr ∧ P } S1 { Q1 } else { ¬bexpr ∧ P } S2 { Q2 } { Q }
Joining predicates
  Q1=Conj(D1), Q2=Conj(D2)
  We want to soundly approximate Q1 ∨ Q2 by an SAV-predicate
  Define: Q = Q1 ⊔ Q2 = Conj(D1 ∩ D2)
  Notice that Q1⇒Q and Q2⇒Q, meaning Q1 ∨ Q2 ⇒ Q
Simplifying handling of conditions
  Extend While with
    Non-determinism (or)
    An assume statement: ⟨assume b, s⟩ →sos s if B⟦b⟧ s = tt
  Use the fact that the following two statements are equivalent:
    if b then S1 else S2
    (assume b; S1) or (assume ¬b; S2)
Handling conditional expressions
  We want to soundly approximate D ∧ bexpr and D ∧ ¬bexpr by SAV-predicates
  Define φ(bexpr) = if bexpr is a factoid then {bexpr} else {}
  Define F[assume bexpr](D) = D ∪ φ(bexpr)
  Can sharpen: F*[assume bexpr] = reduce ∘ FSAV[assume bexpr]
  Notice bexpr ⇒ Conj(φ(bexpr))
  Examples: φ(y=z) = {y=z}, φ(y<z) = {}
An algorithm for annotating conditions
  let Pt = F*[assume bexpr] P
  let Pf = F*[assume ¬bexpr] P
  let Annotate(Pt, S1) be {Pt} A1 {Q1}
  let Annotate(Pf, S2) be {Pf} A2 {Q2}
  return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
Example
  { }
  if (x = y)
    { x=y, y=x }
    a := b + c
    { x=y, y=x, a=b+c, a=c+b }
    d := b - c
    { x=y, y=x, a=b+c, a=c+b }
  else
    { }
    a := b + c
    { a=b+c, a=c+b }
    d := b + c
    { a=b+c, a=c+b, d=b+c, d=c+b, a=d, d=a }
  { a=b+c, a=c+b }
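The last line of the example is the join of the two branch postconditions. A minimal sketch, assuming facts are written as plain strings and with `assume` and `join` as hypothetical helper names:

```python
def assume(facts, cond_fact=None):
    """F[assume bexpr]: add bexpr only when it is itself a fact."""
    return set(facts) | ({cond_fact} if cond_fact else set())

def join(d1, d2):
    """Join of conjunctive fact sets: keep what both branches establish."""
    return d1 & d2

pre = set()
then_in = assume(pre, "x=y")   # the guard x = y is an SAV fact
else_in = assume(pre)          # its negation is not, so nothing is added
# Branch postconditions copied from the annotated example
q1 = then_in | {"y=x", "a=b+c", "a=c+b"}
q2 = else_in | {"a=b+c", "a=c+b", "d=b+c", "d=c+b", "a=d", "d=a"}
post = join(q1, q2)
```

Intersection is sound here because a fact survives only if it holds on both paths into the merge point.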
Recap We now have an algorithm for soundly annotating loop-free code Generates forward-going proofs Algorithm operates on abstract syntax tree of code Handles straight-line code by applying F* Handles conditions by recursively annotating true and false branches and then intersecting their postconditions
Challenge 3: handling loops
Goal
  [whilep]: from {bexpr ∧ P} S {P} conclude {P} while bexpr do S {¬bexpr ∧ P}
  Annotate a program while bexpr do S with SAV-predicates s.t. P ⇒ N
  Main challenge: find N
  Assumption 1: P is given (otherwise use true)
  Assumption 2: bexpr is a simple binary expression
  { P } Inv = { N } while bexpr do { bexpr ∧ N } S { Q } { ¬bexpr ∧ N }
Example: annotate this program { y=x+a, y=a+x, w=d, d=w } Inv = { y=x+a, y=a+x } while (x ≠ z) do { z=x+a, z=a+x, w=d, d=w } x := x + 1 { w=d, d=w } y := x + a { y=x+a, y=a+x, w=d, d=w } d := x + a { y=x+a, y=a+x, d=x+a, d=a+x, y=d, d=y } { y=x+a, y=a+x, x=z, z=x }
Example: annotate this program { y=x+a, y=a+x, w=d, d=w } Inv = { y=x+a, y=a+x } while (x ≠ z) do { y=x+a, y=a+x } x := x + 1 { } y := x + a { y=x+a, y=a+x } d := x + a { y=x+a, y=a+x, d=x+a, d=a+x, y=d, d=y } { y=x+a, y=a+x, x=z, z=x }
Goal
  [whilep]: from {bexpr ∧ P} S {P} conclude {P} while bexpr do S {¬bexpr ∧ P}
  Idea: try to guess a loop invariant from a small number of loop unrollings
  We know how to annotate S (by induction)
  { P } Inv = { N } while bexpr do { bexpr ∧ N } S { Q } { ¬bexpr ∧ N }
k-loop unrolling { y=x+a, y=a+x, w=d, d=w } if (x z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a } { P } Inv = { N } while (x z) do x := x + 1 y := x + a d := x + a { P } if (x z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a } if (x z) x := x + 1 y := x + a d := x + a Q2 = { y=x+a } …
k-loop unrolling { y=x+a, y=a+x, w=d, d=w } if (x z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } { P } Inv = { N } while (x z) do x := x + 1 y := x + a d := x + a { P } if (x z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } if (x z) x := x + 1 y := x + a d := x + a Q2 = { y=x+a, y=a+x } …
k-loop unrolling { y=x+a, y=a+x, w=d, d=w } if (x z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } { P } Inv = { N } while (x z) do x := x + 1 y := x + a d := x + a { P } if (x z) x := x + 1 y := x + a d := x + a Q1 = { y=x+a, y=a+x } if (x z) x := x + 1 y := x + a d := x + a Q2 = { y=x+a, y=a+x } The following must hold: P N Q1 N Q2 N … Qk N …
k-loop unrolling
  Loop: { P } Inv = { N } while (x ≠ z) do x := x + 1; y := x + a; d := x + a
  with P = { y=x+a, y=a+x, w=d, d=w }
  Unrolled: { P } if (x ≠ z) x := x + 1; y := x + a; d := x + a Q1 = { y=x+a, y=a+x } if (x ≠ z) x := x + 1; y := x + a; d := x + a Q2 = { y=x+a, y=a+x } …
  The following must hold: P ⇒ N, Q1 ⇒ N, Q2 ⇒ N, …, Qk ⇒ N, …
  Observation 1: No need to explicitly unroll the loop: we can reuse the postcondition from unrolling k-1 for k
  We can compute the following sequence: N0 = P, N1 = N0 ⊔ Q1, N2 = N1 ⊔ Q2, …, Nk = Nk-1 ⊔ Qk, …
  Observation 2: Nk is a monotonically decreasing set of facts. Question: does it stabilize for some k?
Algorithm for annotating a loop
  Annotate(P, while bexpr do S) =
    Initialize Nc := P
    repeat
      let Annotate(Nc, if bexpr then S else skip) be {Nc} if bexpr then S else skip {N}
      Nprev := Nc
      Nc := Nc ⊔ N
    until Nc = Nprev
    return {P} INV = {Nc} while bexpr do {F[assume bexpr](Nc)} Annotate(F[assume bexpr](Nc), S) {F[assume ¬bexpr](Nc)}
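The iteration can be sketched for a loop whose body is straight-line code, here the body x := x + 1; y := x + a; d := x + a from the earlier example. The sketch assumes the purely syntactic transformer (no reduce), a guard that is not an SAV fact (so assume adds nothing), and hypothetical names `f_sav` and `loop_invariant`.

```python
def f_sav(facts, x, rhs):
    """x := rhs on (lhs, operands) facts; rhs is None if not v or v + w."""
    kept = {(l, ops) for (l, ops) in facts if l != x and x not in ops}
    if rhs is not None and x not in rhs:
        kept.add((x, rhs))
    return kept

def loop_invariant(pre, body):
    """Iterate Nc := Nc join N until the candidate invariant stops changing."""
    inv = set(pre)
    while True:
        post = set(inv)             # assume(guard) adds no fact here
        for x, rhs in body:
            post = f_sav(post, x, rhs)
        new_inv = inv & post        # join = intersection of fact sets
        if new_inv == inv:
            return inv
        inv = new_inv

pre = {("y", ("x", "a")), ("w", ("d",))}
body = [("x", None), ("y", ("x", "a")), ("d", ("x", "a"))]  # x := x + 1 has no SAV rhs
inv = loop_invariant(pre, body)
```

The candidate only ever loses facts, so the loop runs at most |pre| + 1 times.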
Putting it together
Algorithm for annotating a program
  Annotate(P, S) =
    case S is x:=aexpr
      return {P} x:=aexpr {F*[x:=aexpr] P}
    case S is S1; S2
      let Annotate(P, S1) be {P} A1 {Q1}
      let Annotate(Q1, S2) be {Q1} A2 {Q2}
      return {P} A1; {Q1} A2 {Q2}
    case S is if bexpr then S1 else S2
      let Pt = F[assume bexpr] P
      let Pf = F[assume ¬bexpr] P
      let Annotate(Pt, S1) be {Pt} A1 {Q1}
      let Annotate(Pf, S2) be {Pf} A2 {Q2}
      return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
    case S is while bexpr do S'
      Nc := P // Initialize
      repeat
        let Pt = F[assume bexpr] Nc
        let Annotate(Pt, S') be {Pt} Abody {N}
        Nprev := Nc
        Nc := Nc ⊔ N
      until Nc = Nprev
      return {P} INV = {Nc} while bexpr do {Pt} Abody {N} {F[assume ¬bexpr](Nc)}
Exercise: apply algorithm { } y := a+b { } x := y { } while (x≠z) do { } w := a+b { } x := a+b { } a := z { }
Step 1/18 {} y := a+b { y=a+b }* x := y while (x≠z) do w := a+b x := a+b a := z (* Not all factoids are shown: apply reduce to get all factoids)
Step 2/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* while (xz) do w := a+b x := a+b a := z
Step 3/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (xz) do w := a+b x := a+b a := z
Step 4/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (xz) do { y=a+b, x=y, x=a+b }* w := a+b x := a+b a := z
Step 5/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (xz) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b a := z
Step 6/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (xz) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z
Step 7/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’ = { y=a+b, x=y, x=a+b }* while (xz) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*
Step 8/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (xz) do { y=a+b, x=y, x=a+b }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*
Step 9/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (xz) do { x=y }* w := a+b { y=a+b, x=y, x=a+b, w=a+b, w=x, w=y }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*
Step 10/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (xz) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { y=a+b, w=a+b, w=y, x=a+b, w=x, x=y }* a := z { w=y, w=x, x=y, a=z }*
Step 11/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (xz) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=y, w=x, x=y, a=z }*
Step 12/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’ = { x=y }* while (xz) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*
Step 13/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (xz) do { x=y }* w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*
Step 14/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (xz) do { } w := a+b { x=y, w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*
Step 15/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (xz) do { } w := a+b { w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*
Step 16/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (xz) do { } w := a+b { w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*
Step 17/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv’’’ = { } while (xz) do { } w := a+b { w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }*
Step 18/18 {} y := a+b { y=a+b }* x := y { y=a+b, x=y, x=a+b }* Inv = { } while (x≠z) do { } w := a+b { w=a+b }* x := a+b { x=a+b, w=a+b, w=x }* a := z { w=x, a=z }* { x=z }*
Constant propagation
Second static analysis example
  Optimization: constant folding
    Example: x:=7; y:=x*9 is transformed to x:=7; y:=7*9 and then to x:=7; y:=63
  Analysis: constant propagation (CP)
    Infers facts of the form x=c
    Simplifies constant expressions (constant folding): { x=c } y := aexpr ⇒ y := eval(aexpr[c/x])
Plan Define domain – set of allowed assertions Handle assignments Handle composition Handle conditions Handle loops
Constant propagation domain
CP semantic domain ?
CP semantic domain
  Define CP-factoids: { x = c | x ∈ Var, c ∈ Z }
    How many factoids are there?
  Define predicates as sets of CP-factoids (the power set)
    How many predicates are there?
    Do all predicates make sense? E.g., (x=5) ∧ (x=7)
  Treat conjunctive formulas as sets of factoids: {x=5, y=7} ~ (x=5) ∧ (y=7)
Handling assignments
CP abstract transformer
  Goal: define a function FCP[x:=aexpr] from CP-predicates to CP-predicates such that if FCP[x:=aexpr] P = P' then sp(x:=aexpr, P) ⇒ P'
    { x=c } x:=aexpr { } [kill]
    { } x:=c { x=c } [gen-1]
    { y=c1, z=c2 } x:=y op z { x=c } where c = c1 op c2 [gen-2]
    { y=c } x:=aexpr { y=c } [preserve]
Gen-kill formulation of transformers
  Suited for analysis propagating sets of factoids
    Available expressions, Constant propagation, etc.
  For each statement, define a set of killed factoids and a set of generated factoids:
    F[S] P = (P \ kill(S)) ∪ gen(S)
    FCP[x:=aexpr] P = (P \ {x=c | c ∈ Z}) if aexpr is not a constant
    FCP[x:=k] P = (P \ {x=c | c ∈ Z}) ∪ {x=k}
  Used in dataflow analysis, a special case of abstract interpretation
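A sketch of a gen-kill transformer for constant propagation, assuming a predicate is stored as a dictionary from variables to their known constants and that right-hand sides are either a constant or a triple (operand, op, operand). The names `f_cp` and `value` are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def value(facts, operand):
    """Constant value of an operand in the pre-state, or None if unknown."""
    if isinstance(operand, int):
        return operand
    return facts.get(operand)

def f_cp(facts, x, rhs):
    """Transformer for x := rhs; rhs is an int or a triple (y, op, z)."""
    out = {v: c for v, c in facts.items() if v != x}    # kill facts about x
    if isinstance(rhs, int):
        out[x] = rhs                                    # gen-1
    else:
        y, op, z = rhs
        c1, c2 = value(facts, y), value(facts, z)
        if c1 is not None and c2 is not None:
            out[x] = OPS[op](c1, c2)                    # gen-2
    return out

s1 = f_cp({}, "x", 7)                 # x := 7
s2 = f_cp(s1, "y", ("x", "*", 9))     # y := x * 9
s3 = f_cp(s2, "x", ("z", "+", 1))     # z unknown, so x is killed
```

Operands are evaluated in the pre-state, so an assignment such as x := x + 1 with x=5 known yields x=6 rather than losing the fact.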
Handling composition
Does this still work? Annotate(P, S1; S2) = let Annotate(P, S1) be {P} A1 {Q1} let Annotate(Q1, S2) be {Q1} A2 {Q2} return {P} A1; {Q1} A2 {Q2}
Handling conditions
Handling conditional expressions
  We want to soundly approximate D ∧ bexpr and D ∧ ¬bexpr by CP-predicates
  Define φ(bexpr) = if bexpr is a CP-factoid then {bexpr} else {}
  Define F[assume bexpr](D) = D ∪ φ(bexpr)
Does this still work?
  let Pt = F[assume bexpr] P
  let Pf = F[assume ¬bexpr] P
  let Annotate(Pt, S1) be {Pt} A1 {Q1}
  let Annotate(Pf, S2) be {Pf} A2 {Q2}
  return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
  How do we define join for CP?
Join example {x=5, y=7} ⊔ {x=3, y=7, z=9} = ?
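One natural choice, by analogy with the SAV join, is to keep exactly the factoids that appear in both predicates. The sketch below assumes predicates are dictionaries from variables to constants; `join_cp` is a hypothetical name.

```python
def join_cp(p1, p2):
    """Keep x=c only when both predicates agree on it."""
    return {v: c for v, c in p1.items() if v in p2 and p2[v] == c}

joined = join_cp({"x": 5, "y": 7}, {"x": 3, "y": 7, "z": 9})
```

For the example above only y=7 survives: the two sides disagree on x, and z is known on one side only.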
Handling loops
Does this still work?
  Annotate(P, while bexpr do S) =
    Nc := P // Initialize
    repeat
      let Pt = F[assume bexpr] Nc
      let Annotate(Pt, S) be {Pt} Abody {N}
      Nprev := Nc
      Nc := Nc ⊔ N
    until Nc = Nprev
    return {P} INV = {Nc} while bexpr do {Pt} Abody {N} {F[assume ¬bexpr](Nc)}
  What about correctness? If the loop terminates, is the computed predicate a loop invariant?
  What about termination?
A termination principle
  g : X → X is a function
  How can we determine whether the sequence x0, x1 = g(x0), …, xk+1 = g(xk), … stabilizes?
  Technique:
    Find a ranking function rank : X → N (that is, show that rank(x) ≥ 0 for all x)
    Show that if x ≠ g(x) then rank(g(x)) < rank(x)
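The principle can be checked at run time on a concrete sequence. A small sketch, with `iterate` as a hypothetical helper that asserts the rank condition at every step:

```python
def iterate(g, x0, rank):
    """Apply g until the value stabilizes, checking that rank strictly drops."""
    x, steps = x0, 0
    while True:
        nxt = g(x)
        if nxt == x:
            return x, steps
        assert 0 <= rank(nxt) < rank(x), "rank must strictly decrease"
        x, steps = nxt, steps + 1

# Example: a shrinking set of facts, ranked by its size
def drop_max(s):
    return s - {max(s)} if len(s) > 1 else s

fixed, steps = iterate(drop_max, frozenset({1, 2, 3, 4}), len)
```

Since the rank is a natural number that drops at every change, the number of steps is bounded by rank(x0).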
Rank function for available expressions rank(P) = ?
Rank function for available expressions
  In Annotate(P, while bexpr do S), each iteration computes the body postcondition N and updates Nc := Nc ⊔ N
  rank(P) = |P| (number of factoids)
  Prove that either Nc = Nc ⊔ N or rank(Nc ⊔ N) < rank(Nc)
Rank function for constant propagation
  In Annotate(P, while bexpr do S), each iteration computes the body postcondition N' and updates Nc := Nc ⊔ N'
  rank(P) = |P| (number of factoids)
  Prove that either Nc = Nc ⊔ N' or rank(Nc) > rank(Nc ⊔ N')
Generalizing
  Available Expressions, Constant Propagation → Abstract Interpretation
Towards a recipe for static analysis Two static analyses Available Expressions (extended with equalities) Constant Propagation Semantic domain – a family of formulas Join operator approximates pairs of formulas Abstract transformers for basic statements Assignments assume statements Initial precondition
Control flow graphs
A technical issue Unrolling loops is quite inconvenient and inefficient (but we can avoid it as we just saw) How do we handle more complex control-flow constructs, e.g., goto , break, exceptions…? The problem: non-inductive control flow constructs Solution: model control-flow by labels and goto statements Would like a dedicated data structure to explicitly encode control flow in support of the analysis Solution: control-flow graphs (CFGs)
Modeling control flow with labels
  Structured code:
    while (x ≠ z) do
      x := x + 1
      y := x + a
      d := x + a
    a := b
  With labels and gotos:
    label0:
      if x = z goto label1
      x := x + 1
      y := x + a
      d := x + a
      goto label0
    label1:
      a := b
Control-flow graph example
  1 label0:
  2 if x = z goto label1
  3 x := x + 1
  4 y := x + a
  5 d := x + a
  6 goto label0
  7 label1:
  8 a := b
  [Figure: control-flow graph with one node per numbered line, plus special entry and exit nodes]
Control-flow graph
  Nodes are statements or labels
  Special nodes for entry/exit
  An edge from node v to node w means that after executing the statement of v control passes to w
  Conditions are represented by split and join nodes
  Loops create cycles
  Can be generated from the abstract syntax tree in linear time
    Automatically taken care of by the front-end
  Usage: store analysis results (assertions) in CFG nodes
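A sketch of CFG construction from a label/goto program, assuming statements are given as tagged tuples numbered from 1 and that the loop guard is lowered to its negation (x = z). The helper name `build_cfg` and the tuple format are assumptions of this sketch.

```python
def build_cfg(stmts):
    """Return the edge set of the CFG.

    stmts: ("label", name), ("goto", name), ("if", cond, name) or
    ("stmt", text); node i is the i-th statement, counting from 1.
    """
    index = {s[1]: i for i, s in enumerate(stmts, start=1) if s[0] == "label"}
    n = len(stmts)

    def fallthrough(i):
        return i + 1 if i < n else "exit"

    edges = {("entry", 1 if n else "exit")}
    for i, s in enumerate(stmts, start=1):
        if s[0] == "goto":
            edges.add((i, index[s[1]]))
        elif s[0] == "if":
            edges.add((i, index[s[2]]))     # condition holds: jump
            edges.add((i, fallthrough(i)))  # otherwise continue below
        else:
            edges.add((i, fallthrough(i)))
    return edges

program = [
    ("label", "label0"), ("if", "x = z", "label1"), ("stmt", "x := x + 1"),
    ("stmt", "y := x + a"), ("stmt", "d := x + a"), ("goto", "label0"),
    ("label", "label1"), ("stmt", "a := b"),
]
edges = build_cfg(program)
```

The back edge from node 6 to node 1 is the cycle created by the loop, and node 2 is the split node with two outgoing edges.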
Eliminating labels We can use edges to point to the nodes following labels and remove all label nodes (other than entry/exit)
Control-flow graph example label0: if x z goto label1 x := x + 1 y := x + a d := x + a goto label0 label1: a := b 1 entry 2 3 4 5 6 2 if x z 7 8 x := x + 1 3 a := b y := x + a 8 4 exit d := x + a 5
Basic blocks A basic block is a chain of nodes with a single entry point and a single exit point Entry/exit nodes are separate blocks entry 2 if x z x := x + 1 3 a := b y := x + a 8 4 exit d := x + a 5
Blocked CFG Stores basic blocks in a single node Extended blocks – maximal connected loop-free subgraphs entry 2 if x z x := x + 1 y := x + a d := x + a 3 4 a := b 5 8 exit
Collecting semantics
Why need another semantics? Operational semantics explains how to compute output from a given input Useful for implementing an interpreter/compiler Less useful for reasoning about safety properties Not suitable for analysis purposes – does not explicitly show how assertions in different program points influence each other Need a more explicit semantics Over a control flow graph
Control-flow graph example label0: if x <= 0 goto label1 x := x – 1 goto label0 label1: 1 entry 2 3 label0: 4 1 5 2 if x > 0 x := x - 1 3 label1: goto label0: 5 4 exit
Trimmed CFG label0: if x <= 0 goto label1 x := x – 1 goto label0 label1: 1 2 3 entry 4 5 2 if x > 0 exit x := x - 1 3
Collecting semantics example: input 1 label0: if x <= 0 goto label1 x := x – 1 goto label0 label1: 1 2 … 3 [x3] [x2] [x1] entry 4 5 [x0] [x1] 2 if x > 0 [x0] exit [x1] x := x - 1 3
Collecting semantics example: input 2 label0: if x <= 0 goto label1 x := x – 1 goto label0 label1: 1 2 … 3 [x3] [x2] [x1] entry 4 5 [x2] [x0] [x1] 2 if x > 0 [x0] exit [x1] x := x - 1 3 [x2]
Collecting semantics example: input 3 label0: if x <= 0 goto label1 x := x – 1 goto label0 label1: 1 2 … 3 [x3] [x2] [x1] entry 4 5 [x3] [x2] [x0] [x1] 2 if x > 0 [x0] exit [x1] x := x - 1 3 [x3] [x2]
ad infinitum – fixed point label0: if x <= 0 goto label1 x := x – 1 goto label0 label1: 1 2 … 3 [x3] [x2] [x1] entry 4 5 … [x3] [x2] [x0] [x1] 2 if x > 0 [x0] exit [x1] x := x - 1 … 3 [x-2] [x-1] [x3] [x2] …
Predicates at fixed point label0: if x <= 0 goto label1 x := x – 1 goto label0 label1: 1 2 3 {true} entry 4 5 {?} 2 if x > 0 {?} exit {?} x := x - 1 3
Predicates at fixed point
  Program: label0: if x <= 0 goto label1; x := x - 1; goto label0; label1:
  After entry: {true}
  Before "if x > 0": {true}
  At exit (x > 0 is false): {x≤0}
  Before "x := x - 1" (x > 0 is true): {x>0}
  After "x := x - 1": {x≥0}
Collecting semantics Accumulates for each control-flow node the (possibly infinite) sets of states that can reach there by executing the program from some given set of input states Not computable in general A reference point for static analysis (An abstraction of the trace semantics) We will define it formally
Collecting semantics in equational form
Math reference: function lifting
  Let f : X → Y be a function
  The lifted function f' : 2^X → 2^Y is defined as f'(XS) = { f(x) | x ∈ XS }
  We will sometimes use the same symbol for both functions when it is clear from the context which one is used
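In code, lifting is a one-line set comprehension. A minimal sketch with the hypothetical name `lift`:

```python
def lift(f):
    """Turn f : X -> Y into f' : 2^X -> 2^Y, applied element by element."""
    return lambda xs: {f(x) for x in xs}

decrement_all = lift(lambda x: x - 1)   # lifted semantics of x := x - 1
result = decrement_all({1, 2, 3})
```

The lifted function may return a smaller set than its input when f maps several elements to the same value.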
Equational definition example
  A vector of variables R[0, 1, 2, 3, 4]
    R[0] = {x∈Z} // established input
    R[1] = R[0] ∪ R[4]
    R[2] = assume x>0 R[1]
    R[3] = assume ¬(x>0) R[1]
    R[4] = ⟦x:=x-1⟧ R[2] // semantic function for x:=x-1 lifted to sets of states
  A (recursive) system of equations
  [Figure: CFG with nodes entry, if x > 0, x := x-1, exit; R[0] leaves entry, R[1] enters the condition, R[2] is its true branch, R[3] leads to exit, R[4] follows x := x-1]
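The system can be solved by iterating from empty sets until nothing changes. The collecting semantics over all of Z is not computable by enumeration, so this sketch assumes a finite set of inputs; a state is represented by the value of x alone, and `solve` is a hypothetical name.

```python
def solve(inputs):
    """Least solution of the five equations for a finite set of inputs."""
    R = [set(inputs), set(), set(), set(), set()]
    while True:
        new = [
            set(inputs),                       # R[0]: established input
            R[0] | R[4],                       # R[1] = R[0] ∪ R[4]
            {x for x in R[1] if x > 0},        # R[2] = assume x>0
            {x for x in R[1] if not x > 0},    # R[3] = assume ¬(x>0)
            {x - 1 for x in R[2]},             # R[4] = x:=x-1, lifted
        ]
        if new == R:
            return R
        R = new

R = solve({0, 1, 2, 3})
```

For these inputs the states reaching exit are exactly those with x = 0, consistent with the fixed-point predicate x ≤ 0 at that node.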
General definition
  A vector of variables R[0, …, k], one per input/output of a node
    R[0] is for entry
  For a node n with multiple predecessors add the equation R[n] = ∪{R[k] | k is a predecessor of n}
  For an atomic operation node S with input R[m] and output R[n] add the equation R[n] = ⟦S⟧ R[m]
  Transform if b then S1 else S2 to (assume b; S1) or (assume ¬b; S2)
  [Figure: CFG with nodes entry, if x > 0, x := x-1, exit and edge variables R[0] to R[4]]
see you next time