Spring 2016
Program Analysis and Verification
Lecture 8: Abstract Interpretation I: Semantic Domains
Roman Manevich
Ben-Gurion University
Tentative syllabus
Program Verification: Operational semantics; Hoare Logic; Applying Hoare Logic; Weakest Precondition Calculus; Proving Termination; Data structures; Automated Verification
Program Analysis Basics: From Hoare Logic to Static Analysis; Control Flow Graphs; Equation Systems; Collecting Semantics; Using Soot
Abstract Interpretation fundamentals: Lattices; Fixed-Points; Chaotic Iteration; Galois Connections; Domain constructors; Widening/Narrowing
Analysis Techniques: Numerical Domains; Alias analysis; Interprocedural Analysis; Shape Analysis; CEGAR
Collecting semantics in equational form
A vector of variables R[0, …, k], one per input/output of a node
R[0] is for entry
For a node n with multiple predecessors, add the equation R[n] = ∪ {R[k] | k is a predecessor of n}
For an atomic operation node S with input R[m] and output R[n], add the equation R[n] = ⟦S⟧ R[m]
Transform "if b then S1 else S2" to (assume b; S1) or (assume ¬b; S2)
[Figure: control-flow graph with nodes entry, if x > 0, x := x-1, and exit, whose edges are labeled R[0] through R[4]]
Agenda
Appendix A.
Semantic domains
Preorders
Partial orders (posets)
Pointed posets
Ascending/descending chains
The height of a poset
Join and Meet operators
Complete lattices
Constructing new lattices from old
Abstract interpretation Theory [1977] By Rama (Own work) [CC-BY-SA-2.0-fr (http://creativecommons.org/licenses/by-sa/2.0/fr/deed.en)], via Wikimedia Commons
Abstract Interpretation [CC77]
A very general mathematical framework for approximating semantics
Generalizes Hoare Logic
Generalizes weakest precondition calculus
Allows designing sound static analysis algorithms
Usually computed by iterating to a fixed point
Not specific to any programming language style
Results of an abstract interpretation are (loop) invariants
Can be interpreted as axiomatic verification assertions and used for verification
Annotating programs
Annotate(P, S) =
  case S is x:=aexpr
    return {P} x:=aexpr {F*[x:=aexpr] P}
  case S is S1; S2
    let Annotate(P, S1) be {P} A1 {Q1}
    let Annotate(Q1, S2) be {Q1} A2 {Q2}
    return {P} A1; {Q1} A2 {Q2}
  case S is if bexpr then S1 else S2
    let Pt = F[assume bexpr] P
    let Pf = F[assume ¬bexpr] P
    let Annotate(Pt, S1) be {Pt} A1 {Q1}
    let Annotate(Pf, S2) be {Pf} A2 {Q2}
    return {P} if bexpr then {Pt} A1 {Q1} else {Pf} A2 {Q2} {Q1 ⊔ Q2}
  case S is while bexpr do S
    N := Nc := P // Initialize
    repeat
      let Pt = F[assume bexpr] Nc
      let Annotate(Pt, S) be {Nc} Abody {N}
      Nc := Nc ⊔ N
    until N = Nc
    return {P} INV= {N} while bexpr do {Pt} Abody {F[assume ¬bexpr](N)}
F*[x:=aexpr] approximates the concrete semantics sp(x:=aexpr, P)
⊔ approximates disjunction
Rule of consequence [consp]: from { P’ } S { Q’ } infer { P } S { Q }, if P ⇒ P’ and Q’ ⇒ Q
The big picture
Use semantic domains to define both concrete semantics and abstract semantics
Relate semantics in a sound way
Interpret program over abstract semantics
[Figure: in the collecting semantics, statement S maps a set of states to a set of states; in the abstract semantics, statement S maps an abstract representation of sets of states to an abstract representation of sets of states; the two levels are related by abstraction and meaning]
A theory of semantic domains 1. Approximating elements 2. Approximating sets of elements By Brett Jordan David Macdonald [CC-BY-2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
Overall idea
A semantic domain can be used to define properties (representations of predicates)
Also called abstract states
We called them assertions in axiomatic semantics
Common representations: logical formulas, automata, specialized graphs
A taxonomy of semantic domain types
Complete Lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤)
Lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤)
Join semilattice (D, ⊑, ⊔, ⊤)
Meet semilattice (D, ⊑, ⊓, ⊥)
Complete partial order (CPO) (D, ⊑, ⊥)
Partial order (poset) (D, ⊑)
Preorder (D, ⊑)
preorders
Preorder
Let D (for semantic domain) be a set of elements
We say that a binary order relation ⊑ over D is a preorder if the following conditions hold for every d, d’, d’’ ∈ D
Reflexive: d ⊑ d
Transitive: d ⊑ d’ and d’ ⊑ d’’ implies d ⊑ d’’
There may exist d, d’ such that d ⊑ d’ and d’ ⊑ d yet d ≠ d’
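The two conditions can be checked by brute force on a finite domain. The sketch below is illustrative only: the helper name is_preorder and the example relation (integers ordered by absolute value) are assumptions, not part of the lecture.

```python
from itertools import product

def is_preorder(elements, leq):
    """Return True if leq is reflexive and transitive over elements."""
    reflexive = all(leq(d, d) for d in elements)
    transitive = all(
        leq(a, c)
        for a, b, c in product(elements, repeat=3)
        if leq(a, b) and leq(b, c)
    )
    return reflexive and transitive

# Hypothetical example: order integers by absolute value.
elements = [-2, -1, 0, 1, 2]
abs_leq = lambda a, b: abs(a) <= abs(b)

print(is_preorder(elements, abs_leq))
# -1 and 1 are related in both directions, yet they are different elements.
print(abs_leq(-1, 1) and abs_leq(1, -1))
```

The second print illustrates the last bullet: a preorder may relate two distinct elements in both directions.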
Preorder examples: SAV-predicates
SAV-factoids = { x = y | x, y ∈ Var } ∪ { x = y + z | x, y, z ∈ Var }
SAV-predicates = 2^SAV-factoids
Order relation 1: P1 ⊑set P2 iff P1 ⊇ P2
Order relation 2: P1 ⊑imp P2 iff P1 ⇒ P2
Which order relation is stronger (contains more pairs)?
Which order relation is easier to check?
What if both P1 and P2 are in the image of reduce?
SAV preorder 1: P1 ⊑set P2 iff P1 ⊇ P2
[Figure: Hasse diagram for Var = {x, y}. Top: {}. Below it the singletons {x=y}, {y=x}, {x=x+x}, {y=y+y}, {y=x+y}, {y=y+x}, {x=x+y}, {x=y+x}, …; then pairs such as {x=y, y=x}, {x=y, x=x+x}, {x=x+y, x=y+x}, …; then triples such as {x=y, x=x+x, x=x+y}, …; bottom: {x=y, y=x, x=x+x, y=y+y, y=x+y, y=y+x, x=x+y, x=y+x}]
SAV preorder 2: P1 ⊑imp P2 iff P1 ⇒ P2
[Figure: diagram over the same elements for Var = {x, y}, from {} at the top down to {x=y, y=x, x=x+x, y=y+y, y=x+y, y=y+x, x=x+y, x=y+x} at the bottom, ordered by implication]
Preorder examples: CP-predicates
CP-factoids = { x = c | x ∈ Var, c ∈ Z }
CP-predicates = 2^CP-factoids
Order relation 1: P1 ⊑set P2 iff P1 ⊇ P2
Order relation 2: P1 ⊑imp P2 iff P1 ⇒ P2
Is there a difference?
{x=5, x=7, x=9} ⊑set {x=5, x=7}
{x=5, x=7, x=9} ⊑imp {x=5, x=7}
{x=5, x=7} ⊑imp {x=5, x=7, x=9}
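The difference between the two CP orders can be seen by running them on the example sets. This is a minimal sketch under stated assumptions: a factoid x = c is encoded as the tuple ("x", c), and implication is decided by the observation that a satisfiable set of constant bindings implies exactly the factoids it contains. The function names are hypothetical.

```python
def unsatisfiable(P):
    """True if P binds some variable to two different constants."""
    seen = {}
    for var, const in P:
        if seen.setdefault(var, const) != const:
            return True
    return False

def leq_set(P1, P2):
    # Order relation 1: more factoids means lower in the order.
    return P1 >= P2

def leq_imp(P1, P2):
    # Order relation 2: an unsatisfiable P1 implies everything; a satisfiable
    # P1 implies P2 exactly when every factoid of P2 already occurs in P1.
    return unsatisfiable(P1) or P1 >= P2

A = frozenset({("x", 5), ("x", 7), ("x", 9)})
B = frozenset({("x", 5), ("x", 7)})

print(leq_set(A, B), leq_set(B, A))
print(leq_imp(A, B), leq_imp(B, A))
```

The set order relates A and B in one direction only, while implication relates them in both directions even though A and B are different sets.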
CP preorder example
[Figure: Hasse diagram for Var = {x}: {} at the top, above the singletons …, {x=-3}, {x=-2}, {x=-1}, {x=0}, {x=1}, …]
CP preorder example
[Figure: Hasse diagram for Var = {x, y}: {} at the top, above singletons such as {x=-3}, {x=0}, {x=3}, {y=-5}, …]
The problem with preorders
Equivalent elements have different representations
{x=y, x=a+b} S {Q}
{x=y, y=a+b} S {Q’}
Leads to unpredictability
Which result should our static analysis give?
The problem with preorders
Equivalent elements have different representations
{x=y, x=a+b} assume y≠a+b {x=y, x=a+b}
{x=y, y=a+b} assume y≠a+b {false}
Leads to unpredictability
Which result should our static analysis give?
The problem with preorders
Equivalent elements have different representations
{x=y, x=a+b} assume x≠a+b {false}
{x=y, y=a+b} assume x≠a+b {x=y, y=a+b}
Leads to unpredictability
Which result should our static analysis give?
May turn a terminating analysis into a non-terminating one
Hasse diagram contains cycles
In practice some static analyses still use preorders (taking extreme care to ensure termination)
Partial orders
Partially ordered sets (partial orders)
A partially ordered set (poset for short) is a pair (D, ⊑)
⊑ ⊆ D × D has the following properties, for all d, d’, d’’ in D
Reflexive: d ⊑ d
Transitive: d ⊑ d’ and d’ ⊑ d’’ implies d ⊑ d’’
Anti-symmetric: d ⊑ d’ and d’ ⊑ d implies d = d’
If d ⊑ d’ and d ≠ d’ we write d ⊏ d’
Anti-symmetry makes it easier to choose the best element
SAV partial order: SAV-predicates
SAV-factoids = { x = y | x, y ∈ Var } ∪ { x = y + z | x, y, z ∈ Var }
SAV-predicates = 2^SAV-factoids
Order relation 1: P1 ⊑set P2 iff P1 ⊇ P2. Is this a partial order?
Order relation 2: P1 ⊑imp P2 iff P1 ⇒ P2, that is, models(P1) ⊆ models(P2). Is this a partial order?
Order relation 3: P1 ⊑set* P2 iff reduce(P1) ⊑set reduce(P2). Is this a partial order?
CP partial order: CP-predicates
CP-factoids = { x = c | x ∈ Var, c ∈ Z }
CP-predicates = 2^CP-factoids
Order relation 1: P1 ⊑set P2 iff P1 ⊇ P2. Is it a partial order?
Order relation 2: P1 ⊑imp P2 iff P1 ⇒ P2. Is it a partial order?
Can we define a more precise partial order?
CP partial order: CP-predicates
CP-factoids = { x = c | x ∈ Var, c ∈ Z }
CP-predicates = 2^CP-factoids ∪ {false}
Define reduce : 2^CP-factoids → CP-predicates
reduce(P) = if there exist {x=c1, x=c2} ⊆ P with c1 ≠ c2 then {false} else P
CPfalse = { P ∈ 2^CP-factoids | P = reduce(P) } ∪ {false}
Order relation: P1 ⊑ P2 if P1 ⊇ P2 or P1 = {false}
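The reduce operator and the resulting order can be sketched as follows. The encoding is an assumption made for illustration: a factoid x = c is the tuple ("x", c), and the special element {false} is the constant FALSE.

```python
# Assumed encoding of the special element {false}.
FALSE = frozenset({"false"})

def reduce(P):
    """Collapse every contradictory set of factoids to the single element FALSE."""
    if P == FALSE:
        return FALSE
    seen = {}
    for var, const in P:
        if seen.setdefault(var, const) != const:
            return FALSE
    return frozenset(P)

def leq(P1, P2):
    """P1 ⊑ P2 on reduced elements: P1 ⊇ P2 or P1 is FALSE."""
    return P1 == FALSE or P1 >= P2

A = frozenset({("x", 5), ("x", 7), ("x", 9)})
B = frozenset({("x", 5), ("x", 7)})
C = frozenset({("x", 5)})
D = frozenset({("x", 5), ("y", 1)})

print(reduce(A) == reduce(B))              # both collapse to FALSE
print(leq(reduce(D), reduce(C)), leq(reduce(C), reduce(D)))
```

After reduction the equivalent predicates A and B have a single representation, which is what restores anti-symmetry.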
Pointed poset
A poset (D, ⊑) with a least element ⊥ is called a pointed poset
For all d ∈ D we have that ⊥ ⊑ d
The pointed poset is denoted by (D, ⊑, ⊥)
We can always transform a poset (D, ⊑) into a pointed poset by adding a special bottom element: (D ∪ {⊥}, ⊑ ∪ {⊥ ⊑ d | d ∈ D}, ⊥)
Example: CPfalse = { P ∈ 2^CP-factoids | P = reduce(P) } ∪ {false}
chains
Chains
If d ⊑ d’ and d ≠ d’ we write d ⊏ d’
Similarly define d ⊐ d’
Let (D, ⊑) be a poset
An ascending chain is a sequence x1 ⊏ x2 ⊏ … ⊏ xk …
A descending chain is a sequence x1 ⊐ x2 ⊐ … ⊐ xk …
The height of a poset is the length of the maximal ascending chain
What is the height of the SAV poset?
What is the height of the CP poset?
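For a finite poset the height can be computed by a longest-path search. The sketch below uses the seven-element signs lattice from the following slides, encoding each element as the set of signs it allows (an assumed encoding), and counts the number of strict steps in the chain (also an assumption, since "length" could instead count elements).

```python
from functools import lru_cache

# Assumed encoding: each element is the set of signs of x that it allows.
SIGNS = {
    "false": frozenset(),
    "x<0": frozenset("-"), "x=0": frozenset("0"), "x>0": frozenset("+"),
    "x<=0": frozenset("-0"), "x>=0": frozenset("0+"),
    "true": frozenset("-0+"),
}

def leq(a, b):
    return SIGNS[a] <= SIGNS[b]

def height(elements, leq):
    """Number of strict steps in a longest ascending chain of a finite poset."""
    @lru_cache(maxsize=None)
    def longest_from(d):
        above = [e for e in elements if e != d and leq(d, e)]
        return 1 + max(map(longest_from, above)) if above else 0
    return max(longest_from(d) for d in elements)

print(height(list(SIGNS), leq))
```

One longest chain here is false ⊏ x<0 ⊏ x<=0 ⊏ true, which has three strict steps.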
Ascending chain example
[Figure: the signs lattice for x, with true at the top; x≤0 and x≥0 below it; then x<0, x=0, x>0; and false at the bottom]
Joining elements By Viviana Pastor (originally posted to Flickr as Harbour Bridge 1) [CC-BY-2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
Bounds
Let (D, ⊑) be a poset
Let X ⊆ D be a set of elements from D
An element d ∈ D is an upper bound (ub) of X iff for every x ∈ X we have that x ⊑ d
An element d ∈ D is a lower bound (lb) of X iff for every x ∈ X we have that d ⊑ x
Bounds
Let (D, ⊑) be a poset
Let X ⊆ D be a set of elements from D
An element d ∈ D is the least upper bound (lub) of X iff d is the minimal of all upper bounds of X
An element d ∈ D is the greatest lower bound (glb) of X iff d is the maximal of all lower bounds of X
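Bounds example: the signs lattice (for variable x)
[Figure: the signs lattice with true at the top; x≤0 and x≥0 below it; then x<0, x=0, x>0; and false at the bottom. For the highlighted subset, one of the middle-row elements and true are upper bounds, and that middle-row element is the least upper bound]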
Join (confluence) operator
Assume a poset (D, ⊑)
Let X ⊆ D be a subset of D (finite/infinite)
The join of X is defined as
⊔X = the least upper bound (LUB) of all elements in X if it exists
⊔X = min⊑ { b | for all x ∈ X we have that x ⊑ b }
The supremum of the elements in X
A kind of abstract union (disjunction) operator
Properties of a join operator
Commutative: x ⊔ y = y ⊔ x
Associative: (x ⊔ y) ⊔ z = x ⊔ (y ⊔ z)
Idempotent: x ⊔ x = x
x ⊔ y = y iff x ⊑ y
Properties of join
Can be used to define the partial order: x ⊔ y = y iff x ⊑ y
Monotone: if y ⊑ z then (x ⊔ y) ⊑ (x ⊔ z)
x ⊔ ⊥ = x
x ⊔ ⊤ = ⊤
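The first property follows directly from the definition of the least upper bound. A short derivation, written here in LaTeX as a worked example (it is not taken from the slides):

```latex
\textbf{Claim.} $x \sqcup y = y \iff x \sqsubseteq y$.

($\Rightarrow$) $x \sqcup y$ is an upper bound of $\{x, y\}$, so
$x \sqsubseteq x \sqcup y = y$.

($\Leftarrow$) Assume $x \sqsubseteq y$. By reflexivity $y \sqsubseteq y$, so $y$
is an upper bound of $\{x, y\}$. Every upper bound $b$ of $\{x, y\}$ satisfies
$y \sqsubseteq b$, so $y$ is the least upper bound: $x \sqcup y = y$.
```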
Meet operator
Assume a poset (D, ⊑)
Let X ⊆ D be a subset of D (finite/infinite)
The meet of X is defined as
⊓X = the greatest lower bound (GLB) of all elements in X if it exists
⊓X = max⊑ { b | for all x ∈ X we have that b ⊑ x }
The infimum of the elements in X
A kind of abstract intersection (conjunction) operator
Properties of a meet operator
Commutative: x ⊓ y = y ⊓ x
Associative: (x ⊓ y) ⊓ z = x ⊓ (y ⊓ z)
Idempotent: x ⊓ x = x
Complete partial orders
Complete partial order (CPO) A CPO is a partial order where each ascending chain has a supremum
CPO example
Is there a join here?
[Figure: the signs poset without a top element: x≤0 and x≥0 in the top row; then x<0, x=0, x>0; and false at the bottom]
lattices
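Complete lattice
A complete lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤) is
A set of elements D
A partial order x ⊑ y
A join operator ⊔, defined for every subset of D
A meet operator ⊓, defined for every subset of D
A least element ⊥ and a greatest element ⊤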
Join semilattice
A join semilattice (D, ⊑, ⊔, ⊤) is
A set of elements D with
A partial order x ⊑ y
A join operator ⊔
Meet semilattice
A meet semilattice (D, ⊑, ⊓, ⊥) is
A set of elements D with
A partial order x ⊑ y
A meet operator ⊓
Powerset lattices
For a set of elements X we define the powerset lattice for X as (2^X, ⊆, ∪, ∩, ∅, X)
Notice it is a complete lattice
For a set of program states State, we define the collecting lattice (2^State, ⊆, ∪, ∩, ∅, State)
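A small sketch of the powerset lattice over a finite set, using Python frozensets. The helper names are hypothetical; join and meet take a whole family of sets, so the empty family yields the bottom and top elements respectively.

```python
from itertools import combinations

def powerset(X):
    """All subsets of X, as frozensets."""
    xs = list(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def join(family):
    """Union of a family of sets; the empty family gives the bottom element."""
    return frozenset().union(*family)

def meet(family, top):
    """Intersection of a family of sets; the empty family gives the top element."""
    result = frozenset(top)
    for s in family:
        result &= s
    return result

X = frozenset({1, 2, 3})
print(len(powerset(X)))
print(sorted(join([{1}, {2}])), sorted(meet([{1, 2}, {2, 3}], X)))
print(join([]) == frozenset(), meet([], X) == X)
```

Because join and meet are defined for every family of subsets, including the empty and the full one, the structure is a complete lattice.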
Composing lattices
One lattice per variable
[Figure: two signs lattices side by side, one for x (true; x≤0, x≥0; x<0, x=0, x>0; false) and one for y (true; y≤0, y≥0; y<0, y=0, y>0; false)]
How can we compose them?
Cartesian product
Cartesian product of complete lattices
For two complete lattices
L1 = (D1, ⊑1, ⊔1, ⊓1, ⊥1, ⊤1)
L2 = (D2, ⊑2, ⊔2, ⊓2, ⊥2, ⊤2)
Define the poset Lcart = (D1×D2, ⊑cart, ⊔cart, ⊓cart, ⊥cart, ⊤cart) as follows:
(x1, x2) ⊑cart (y1, y2) iff x1 ⊑1 y1 and x2 ⊑2 y2
⊔cart = ?
⊓cart = ?
⊥cart = ?
⊤cart = ?
Lemma: Lcart is a complete lattice
Define the Cartesian constructor Lcart = Cart(L1, L2)
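One standard way to fill in the operators is componentwise, sketched below for the product of two signs lattices. The element names (neg, zero, pos, ...) and the set-of-signs encoding are assumptions made for illustration.

```python
# Assumed encoding of the signs lattice: each element is the set of signs it allows.
SIGNS = {
    "false": frozenset(),
    "neg": frozenset("-"), "zero": frozenset("0"), "pos": frozenset("+"),
    "nonpos": frozenset("-0"), "nonneg": frozenset("0+"),
    "true": frozenset("-0+"),
}

def sign_leq(a, b):
    return SIGNS[a] <= SIGNS[b]

def sign_join(a, b):
    """Least upper bound in the signs lattice: the smallest common upper bound."""
    ubs = [d for d in SIGNS if sign_leq(a, d) and sign_leq(b, d)]
    return min(ubs, key=lambda d: len(SIGNS[d]))

def cart_leq(p, q):
    # (x1, x2) is below (y1, y2) iff each component is below.
    return sign_leq(p[0], q[0]) and sign_leq(p[1], q[1])

def cart_join(p, q):
    # Componentwise join (an assumed answer to the question on the slide).
    return (sign_join(p[0], q[0]), sign_join(p[1], q[1]))

both_neg = ("neg", "neg")   # x<0 and y<0
both_pos = ("pos", "pos")   # x>0 and y>0
print(cart_join(both_neg, both_pos))
```

The join of (x<0, y<0) and (x>0, y>0) is (true, true): the product tracks each variable separately, so the correlation between the signs of x and y is lost.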
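Cartesian product example
[Figure: Hasse diagram of the product of the two signs lattices. Top: (true, true). Below it pairs such as (x≤0, true), (x≥0, true), (true, y≤0), (true, y≥0); then (x≤0, y≤0), (x≤0, y≥0), (x≥0, y≤0), (x≥0, y≥0), …; then pairs such as (x≤0, y<0), (x≤0, y=0), (x≤0, y>0), (x>0, y≤0), …; then (x<0, y<0), (x<0, y=0), (x<0, y>0), (x=0, y<0), (x=0, y=0), (x=0, y>0), (x>0, y<0), (x>0, y=0), (x>0, y>0), …; then pairs with a false component such as (x<0, false), (false, y>0). Bottom: (false, false)]
How does it represent (x<0∧y<0) ∨ (x>0∧y>0)?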
Disjunctive completion
Disjunctive completion
For a complete lattice L = (D, ⊑, ⊔, ⊓, ⊥, ⊤)
Define the powerset lattice L∨ = (2^D, ⊑∨, ⊔∨, ⊓∨, ⊥∨, ⊤∨)
⊑∨ = ?
⊔∨ = ?
⊓∨ = ?
⊥∨ = ?
⊤∨ = ?
Lemma: L∨ is a complete lattice
L∨ contains all subsets of D, which can be thought of as disjunctions of the corresponding predicates
Define the disjunctive completion constructor L∨ = Disj(L)
The base lattice CPfalse
[Figure: true at the top; the elements …, {x=-2}, {x=-1}, {x=0}, {x=1}, {x=2}, … in the middle row; false at the bottom]
The disjunctive completion of CPfalse
What is the height of this lattice?
[Figure: true at the top; then infinite disjunctions such as {x is even}, {x is odd}, {x is prime}, …; then {x=-1 ∨ x=1 ∨ x=-2}, {x=0 ∨ x=1 ∨ x=2}, …; then {x=-2 ∨ x=-1}, {x=-2 ∨ x=0}, {x=-2 ∨ x=1}, {x=1 ∨ x=2}, …; then {x=-2}, {x=-1}, {x=0}, {x=1}, {x=2}, …; false at the bottom]
Relational product
Relational product of lattices
L1 = (D1, ⊑1, ⊔1, ⊓1, ⊥1, ⊤1)
L2 = (D2, ⊑2, ⊔2, ⊓2, ⊥2, ⊤2)
Lrel = (2^(D1×D2), ⊑rel, ⊔rel, ⊓rel, ⊥rel, ⊤rel) as follows:
Lrel = ?
Relational product of lattices
L1 = (D1, ⊑1, ⊔1, ⊓1, ⊥1, ⊤1)
L2 = (D2, ⊑2, ⊔2, ⊓2, ⊥2, ⊤2)
Lrel = (2^(D1×D2), ⊑rel, ⊔rel, ⊓rel, ⊥rel, ⊤rel) as follows:
Lrel = Disj(Cart(L1, L2))
Lemma: Lrel is a complete lattice
What does it buy us?
Cartesian product example
[Figure: the Hasse diagram of the product of the two signs lattices, from true at the top, through pairs such as (x≤0, true), (x≥0, y≤0), (x<0, y<0), (x>0, y>0), (x<0, false), (false, y>0), down to false at the bottom]
How does it represent (x<0∧y<0) ∨ (x>0∧y>0)?
What is the height of this lattice?
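Relational product example
[Figure: Hasse diagram with true at the top; elements such as x≤0, x≥0, y≤0, y≥0 below it; then disjunctions such as (x<0∧y<0)∨(x>0∧y>0), (x<0∧y<0)∨(x>0∧y=0), (x<0∧y≥0)∨(x<0∧y≤0), …; false at the bottom]
How does it represent (x<0∧y<0) ∨ (x>0∧y>0)?
What is the height of this lattice?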
A lattice for collecting semantics
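Collecting semantics
label0: if x <= 0 goto label1
x := x - 1
goto label0
label1:
[Figure: control-flow graph with nodes entry, if x > 0, x := x - 1, and exit, with edges numbered 1 to 5. Each edge is annotated with the set of states that reach it, for example …, [x↦1], [x↦2], [x↦3], … on the edges inside the loop and …, [x↦-2], [x↦-1], [x↦0] on the edge to exit]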
Defining the collecting semantics How should we represent the set of states at a single control-flow node by a lattice? How should we represent the sets of states at all control-flow nodes by a lattice?
Finite maps
For a complete lattice L = (D, ⊑, ⊔, ⊓, ⊥, ⊤) and a finite set V
Define the poset LV→L = (V→D, ⊑V→L, ⊔V→L, ⊓V→L, ⊥V→L, ⊤V→L) as follows:
f1 ⊑V→L f2 iff for all v ∈ V, f1(v) ⊑ f2(v)
⊔V→L = ?
⊓V→L = ?
⊥V→L = ?
⊤V→L = ?
Lemma: LV→L is a complete lattice
Define the map constructor LV→L = Map(V, L)
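One natural way to fill in the operators is pointwise, sketched below with Python dicts as finite maps. The function names and the three-state universe are assumptions for illustration; the underlying lattice L is passed in as plain functions.

```python
def map_leq(f1, f2, leq):
    """f1 is below f2 iff f1(v) is below f2(v) for every key v."""
    return all(leq(f1[v], f2[v]) for v in f1)

def map_join(f1, f2, join):
    # Pointwise join (an assumed answer to the question on the slide).
    return {v: join(f1[v], f2[v]) for v in f1}

def map_bottom(V, bottom):
    """The map sending every key to the bottom element of L."""
    return {v: bottom for v in V}

# Hypothetical instance: L is the powerset lattice over three states.
V = ["entry", "2", "3", "exit"]
subset = lambda a, b: a <= b
union = lambda a, b: a | b

f1 = map_bottom(V, frozenset())
f2 = dict(f1, entry=frozenset({0}))
f3 = dict(f1, exit=frozenset({1}))

print(map_leq(f1, f2, subset), map_leq(f2, f3, subset))
print(map_join(f2, f3, union))
```

With L instantiated to a powerset of states and V to the control-flow nodes, each map assigns a set of states to every node.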
The collecting lattice Lattice for a given control-flow node v: ? Lattice for entire control-flow graph with nodes V: ? We will use this lattice as a baseline for static analysis and define abstractions of its elements
The collecting lattice
Lattice for a given control-flow node v: Lv = (2^State, ⊆, ∪, ∩, ∅, State)
Lattice for entire control-flow graph with nodes V: LCFG = Map(V, Lv)
We will use this lattice as a baseline for static analysis and define abstractions of its elements
Equational definition of the semantics
Define variables of type set of states for each control-flow node
Define constraints between them
[Figure: control-flow graph with nodes entry, 2 (if x > 0), 3 (x := x - 1), and exit, annotated with the variables R[entry], R[2], R[3], R[exit]]
Equational definition of the semantics
R[entry] = State
R[2] = R[entry] ∪ ⟦x:=x-1⟧ R[3]
R[3] = ⟦assume x>0⟧ R[2]
R[exit] = ⟦assume x≤0⟧ R[2]
A recursive system of equations
How can we approximate it using what we have learned so far?
[Figure: the same control-flow graph, annotated with R[entry], R[2], R[3], R[exit]]
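To see the recursive system being solved, the sketch below iterates the four equations from the empty assignment until nothing changes. It assumes a finite state space (x ranges over -3..3) so that the iteration terminates; the real State set is infinite, and the names STATE, step, and least_fixed_point are hypothetical.

```python
# Assumption: a state is just the value of x, restricted to a finite range.
STATE = frozenset(range(-3, 4))

def step(R):
    """Apply the right-hand side of every equation once."""
    return {
        "entry": STATE,
        "2": R["entry"] | {x - 1 for x in R["3"]},      # x := x - 1
        "3": {x for x in R["2"] if x > 0},              # assume x > 0
        "exit": {x for x in R["2"] if x <= 0},          # assume x <= 0
    }

def least_fixed_point(step, bottom):
    """Iterate step from bottom until the assignment stops changing."""
    R = bottom
    while True:
        nxt = step(R)
        if nxt == R:
            return R
        R = nxt

bottom = {n: frozenset() for n in ("entry", "2", "3", "exit")}
R = least_fixed_point(step, bottom)
print(sorted(R["3"]), sorted(R["exit"]))
```

The decrement is only applied to states with x > 0, so the results stay inside the assumed range.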
An abstract semantics
R[entry] = ⊤
R[2] = R[entry] ⊔ ⟦x:=x-1⟧# R[3]
R[3] = ⟦assume x>0⟧# R[2]
R[exit] = ⟦assume x≤0⟧# R[2]
⟦x:=x-1⟧# is the abstract transformer for x:=x-1
A recursive system of equations
[Figure: the same control-flow graph, annotated with R[entry], R[2], R[3], R[exit]]
The meaning of sound analysis result
R[entry] = ⊤
R[2] ⊒ R[entry] ⊔ ⟦x:=x-1⟧# R[3]
R[3] ⊒ ⟦assume x>0⟧# R[2]
R[exit] ⊒ ⟦assume x≤0⟧# R[2]
A recursive system of inequations
[Figure: the same control-flow graph, annotated with R[entry], R[2], R[3], R[exit]]
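As an illustration, the abstract system can be solved over the signs lattice by iterating from ⊥, and the result checked against the inequations. This is a sketch under assumptions: abstract elements are encoded as sets of allowed signs, the set {-, +} is rounded up to true because the seven-element signs lattice has no "x ≠ 0" element, and the transformer definitions are one plausible choice rather than the lecture's.

```python
TOP = frozenset("-0+")
BOTTOM = frozenset()

def normalize(signs):
    # {-, +} is not an element of the assumed signs lattice; round it up to true.
    return TOP if signs == frozenset("-+") else frozenset(signs)

def join(a, b):
    return normalize(a | b)

def leq(a, b):
    return a <= b

# Effect of x := x - 1 on each concrete sign: a positive x may become zero.
DEC = {"-": "-", "0": "-", "+": "0+"}

def dec(a):
    result = set()
    for s in a:
        result |= set(DEC[s])
    return normalize(result)

def assume_pos(a):
    return a & frozenset("+")

def assume_nonpos(a):
    return a & frozenset("-0")

def step(R):
    return {
        "entry": TOP,
        "2": join(R["entry"], dec(R["3"])),
        "3": assume_pos(R["2"]),
        "exit": assume_nonpos(R["2"]),
    }

R = {n: BOTTOM for n in ("entry", "2", "3", "exit")}
while step(R) != R:          # terminates: finite lattice, monotone transformers
    R = step(R)

print(sorted(R["3"]), sorted(R["exit"]))
# Every fixed point also satisfies the inequations.
print(leq(join(R["entry"], dec(R["3"])), R["2"]),
      leq(assume_pos(R["2"]), R["3"]),
      leq(assume_nonpos(R["2"]), R["exit"]))
```

The computed solution says x > 0 at node 3 and x ≤ 0 at exit, and any assignment above it that still satisfies the inequations would also be a sound, though less precise, result.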
see you next time