Program Analysis and Verification Spring 2014 Program Analysis and Verification Lecture 11: Abstract Interpretation III Roman Manevich Ben-Gurion University
Syllabus Semantics Natural Semantics Structural semantics Axiomatic Verification Static Analysis Automating Hoare Logic Control Flow Graphs Equation Systems Collecting Semantics Abstract Interpretation fundamentals LatticesFixed-Points Chaotic Iteration Galois Connections Domain constructors Widening/ Narrowing Interprocedural Analysis Analysis Techniques Numerical Domains CEGARAlias analysis Shape Analysis Crafting your own Soot From proofs to abstractions Systematically developing transformers 2
Previously Solving monotone systems Fixed-points Vanilla static analysis algorithm Chaotic iteration 3
Static analysis Given a system of equations for the collecting semantics A static analysis solves a corresponding system of equations over an abstract domain Questions: – How do you solve the second system? Chaotic Iteration – What is the relation between the solutions? This lecture 4 R[0] = { x Z} // established input R[1] = R[0] R[4] R[2] = assume x>0 R[1] R[3] = assume x 0 R[1] R[4] = x:=x-1 R[2] R[0] # = { x Z} # R[1] # = R[0] R[4] R[2] # = assume x>0 # R[1] R[3] # = assume x 0 # R[1] R[4] # = x:=x-1 # R[2]
Monotone function 5 f 1 x L1L1 L2L2 2 y 3 f(x)f(x) 4 f(y)f(y) f
Important cases of monotonicity Join: f(X, Y) = X Y is monotone in each operand – Prove it! Set lifting function: for a set X and any function g F(X) = { g(x) | x X } is monotone w.r.t. – Prove it! Notice that the collecting semantics function is defined in terms of – Join (set union) – Semantic function for atomic statements lifted to sets of states Conclusion: collecting semantics function is monotone 6
Fixed points L = (D, , , , , ) f : D D monotone Fix(f) = { d | f(d) = d } Red(f) = { d | f(d) d } Ext(f) = { d | d f(d) } Theorem [Tarski 1955] – lfp(f) = Fix(f) = Red(f) Fix(f) – gfp(f) = Fix(f) = Ext(f) Fix(f) 7 Red(f) Ext(f) Fix(f) lfp gfp fn()fn() 1.Does a solution always exist? Yes 2.If so, is it unique? No, but it has least/greatest solutions 3.If so, is it computable? Under some conditions…
Continuous functions Let L = (D, , , ) be a complete partial order – Every ascending chain has an upper bound A function f is continuous if for every increasing chain Y D*, f( Y) = { f(y) | y Y } Lemma: if f is continuous then f is monotone Proof: assume x y Therefore x y=y Then f(y) = f(x y) = f(x) f(y), which means f(x) f(y) 8
Continuous functions Let L = (D, , , ) be a complete partial order – Every ascending chain has an upper bound A function f is continuous if for every increasing chain Y D*, f( Y) = { f(y) | y Y } Lemma: if f is continuous then f is monotone Proof: assume x y Therefore x y=y Then f(y) = f(x y) = f(x) f(y), which means f(x) f(y) 9
Kleene’s fixed point theorem Let L = (D, , , ) be a complete partial order and a continuous function f: D D then lfp(f) = n N f n ( ) That is, take the ascending chain f( ) f(f( )) … f n ( ) … and return the supremum – Why is this an ascending chain? But how do you know if a function f is continuous 10
Continuity and ACC condition Let L = (D, , , ) be a complete partial order – Every ascending chain has an upper bound L satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes: d 0 d 1 … d n = d n+1 = d n+2 = … Lemma: Monotone functions on posets satisfying ACC are continuous 11
Resulting algorithm Kleene’s fixed point theorem gives a constructive method for computing lfp(f) over a poset with ACC when f is monotone 12 lfp fn()fn() f()f() f2()f2() … d := while f(d) d do d := f(d) return d Algorithm lfp(f) = n N f n ( ) Mathematical definition
Vanilla algorithm Problem Definition: 1.Lattice of properties L of finite height (ACC) 2.For each statement define a monotone transformer Preparation: 1.Parse program into AST 2.Convert AST into CFG 3.Generate system of equations from CFG Analysis: 1.Initialize each analysis variable with 2.Update all analysis variables of each equation until reaching a fixed point 13 Non-incremental. Most variables don’t change.
Chaotic iteration 14 Input: – A cpo L = (D, , , ) satisfying ACC – L n = L L … L – A monotone function f : D n D n – A system of equations { X[i] | f(X) | 1 i n } Output: lfp(f) A worklist-based algorithm for i:=1 to n do X[i] := WL = {1,…,n} while WL do j := pop WL // choose index non-deterministically N := F[i](X) if N X[i] then X[i] := N add all the indexes that directly depend on i to WL (X[j] depends on X[i] if F[j] contains X[i]) return X
Required knowledge Collecting semantics Abstract semantics (over lattices) Algorithm to compute abstract semantics (chaotic iteration) Connection between collecting semantics and abstract semantics Abstract transformers 15
Today Galois connections Abstract transformers Global soundness 16
Recap We defined a reference semantics – the collecting semantics We defined an abstract semantics for a given lattice and abstract transformers We defined an algorithm to compute abstract least fixed-point when transformers are monotone and lattice obeys ACC Questions: 1.What is the connection between the two least fixed- points? 2.Transformer monotonicity is required for termination – what should we require for correctness? 17
Recap We defined a reference semantics – the collecting semantics We defined an abstract semantics for a given lattice and abstract transformers We defined an algorithm to compute abstract least fixed-point when transformers are monotone and lattice obeys ACC Questions: 1.Does the algorithm terminate? 2.What is the connection between the two least fixed- points? 3.Transformer monotonicity is required for termination – what should we require for correctness? 18
Handling non-monotone transformers Kleene’s fixed point theorem gives a constructive method for computing lfp(f) over a poset with ACC when f is monotone Monotonicity ensures f( ) … f n ( ) … is an ascending chain What if f is not necessarily monotone? How can we ensure termination? 19 d := while f(d) d do d := f(d) return d Algorithm lfp(f) = n N f n ( ) Mathematical definition
Handling non-monotone transformers Define f’(d) = d f(d) Now f’ is extensive: d d f(d) = f’(d) and so f’( ) … f’ n ( ) … is an ascending chain Result is not necessarily the least fixed point – we get a (post)fixed point in finite time (ACC) 20 d := while f’(d) d do d := f’(d) return d Revised algorithm lfp(f) = n N f n ( ) n N f’ n ( ) Mathematical definition
Relating the concrete domain and the abstract domain 21
Galois Connection Given two complete lattices C = (D C, C, C, C, C, C )– concrete domain A = (D A, A, A, A, A, A )– abstract domain A Galois Connection (GC) is quadruple (C, , , A) that relates C and A via the monotone functions – The abstraction function : D C D A – The concretization function : D A D C For every concrete element c D C and abstract element a D A ( (a)) A a and c C ( (c)) Alternatively (c) A a iff c C (a) 22
Galois Connection: c C ( (c)) 23 1 c 2 (c)(c) 3 ( (c)) The most precise (least) element in A representing c CA
Galois Connection: ( (a)) A a 24 1 3 ( (a)) 2 (a)(a) a CA What a represents in C (its meaning)
Example: lattice of equalities Concrete lattice: C = (2 State, , , , , State) Abstract lattice: EQ = { x=y | x, y Var} A = (2 EQ, , , , EQ, ) – Treat elements of A as both formulas and sets of constraints Useful for copy propagation – a compiler optimization (X) = ? (Y) = ? 25
Example: lattice of equalities Concrete lattice: C = (2 State, , , , , State) Abstract lattice: EQ = { x=y | x, y Var} A = (2 EQ, , , , EQ, ) – Treat elements of A as both formulas and sets of constraints Useful for copy propagation – a compiler optimization ( ) = ({ }) = { x=y | x = y} that is x=y (X) = { ( ) | X} = A { ( ) | X} (Y) = { | Y } = models( Y) 26
Galois Connection: c C ( (c)) 27 1 [x 5, y 5, z 5] 2 x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y 3 … [x 6, y 6, z 6] [x 5, y 5, z 5] [x 4, y 4, z 4] … 4 x=x, y=y, z=z The most precise (least) element in A representing [x 5, y 5, z 5] CA
Most precise abstract representation 28 1 c 5 CA 8 9 (c)(c) (c) = {c’ | c (c’)}
Most precise abstract representation 29 1 c 5 CA 8 9 (c)= x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y (c) = {c’ | c (c’)} [x 5, y 5, z 5] x=y, y=z x=y, z=y x=y
Galois Connection: ( (a)) A a 30 1 3 x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y 2 … [x 6, y 6, z 6] [x 5, y 5, z 5] [x 4, y 4, z 4] … x=y, y=z What a represents in C (its meaning) is called a semantic reduction CA
Partial reduction The operator is called a semantic reduction since ( (a)) means the same a a but it is a reduced – more precise version of a An operator reduce : D A D A is a partial reduction if 1.reduce(a) A a and 2. (a)= (reduce(a)) 31
Galois Insertion a: ( (a))=a 32 1 x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y 2 … [x 6, y 6, z 6] [x 5, y 5, z 5] [x 4, y 4, z 4] … CA How can we obtain a Galois Insertion from a Galois Connection? All elements are reduced
Special cases 33
Properties of a Galois Connection The abstraction and concretization functions uniquely determine each other: (a) = {c | (c) a} (c) = {a | c (a)} 34
Abstracting (disjunctive) sets It is usually convenient to first define the abstraction of single elements (s) = ({s}) Then lift the abstraction to sets of elements (X) = A { (s) | s X} 35
The case of symbolic domains An important class of abstract domains are symbolic domains – domains of formulas C = (2 State, , , , , State) A = (D A, A, A, A, A, A ) If D A is a set of formulas then the abstraction of a state is defined as ( ) = ({ }) = A { | } the least formula from D A that s satisfies The abstraction of a set of states is (X) = A { ( ) | s X} The concretization is ( ) = { | } = models( ) 36
Composing Galois connections 37
Inducing along the connections Assume the complete lattices C = (D C, C, C, C, C, C ) A = (D A, A, A, A, A, A ) M = (D M, M, M, M, M, M ) and Galois connections GC C,A =(C, C,A, A,C, A) and GC A,M =(A, A,M, M,A, M) Lemma: both connections induce the GC C,M = (C, C,M, M,C, M) defined by C,M = C,A A,M and M,C = M,A A,C 38
Inducing along the connections 39 1 C,AC,A A,CA,C c 2 C,A(c)C,A(c) 5 CA 3 M A,MA,M 4 M,AM,A c’c’ a’ = A,M ( C,A (c))
Relating abstract transformers to concrete transformers 40
Sound abstract transformer Given two lattices C = (D C, C, C, C, C, C ) A = (D A, A, A, A, A, A ) and GC C,A =(C, , , A) with A concrete transformer f : D C D C an abstract transformer f # : D A D A We say that f # is a sound transformer (w.r.t. f) if c: f(c)=c’ (f # (c)) (c’) For every a and a’ such that (f( (a))) A f # (a) 41
Transformer soundness condition CA f 3 4 f#f# 5 c: f(c)=c’ (f # (c)) (c’)
Transformer soundness condition 2 43 CA 12 f#f# 3 5 f 4 a: f # (a)=a’ f( (a)) (a’)
Best (induced) transformer 44 CA 2 3 f f # (a)= (f( (a))) 1 f#f# 4 Problem: incomputable directly
Best abstract transformer [CC’77] Best in terms of precision – Most precise abstract transformer – May be too expensive to compute Constructively defined as f # = f – Induced by the GC Not directly computable because first step is concretization We often compromise for a “good enough” transformer – Useful tool: partial concretization 45
Developing a sound abstract transformer by example 46
Transformer example C = (2 State, , , , , State) EQ = { x=y | x, y Var} A = (2 EQ, , , , EQ, ) ( ) = ({ }) = { x=y | x = y } that is x=y (S) = { ( ) | S} = A { ( ) | S } ( ) = { | } = models( ) Concrete: x:=y S = { [x y] | S } Abstract: x:=y # S = ? 47
Developing a transformer for EQ - 1 Input has the form S = {a=b} sp(x:=expr, ) = v. x=expr[v/x] [v/x] sp(x:=y, S) = v. x=y[v/x] S[v/x] = … Let’s define helper notations: – Mod(x:=y, S) = {x=a, b=x S} Subset of equalities containing x (will be modified) – Frame(x:=y, S) = S \ Mod(x:=y, S) Subset of equalities not containing x (i.e., the frame) 48
Developing a transformer for EQ - 2 sp(x:=y, S) = v. x=y[v/x] {a=b}[v/x] = … Two cases – x is y: sp(x:=x, S) = S – x is different from y: sp(x:=y, S) = v. x=y Mod(x:=y, S) [v/x] Frame(x:=y, S) [v/x] = x=y Frame(x:=y, S) v. Mod(x:=y, S)[v/x] x=y Frame(x:=y, S) Vanilla transformer: x:=y #1 X = x=y Frame(x:=y, S) Example: x:=y #1 {x=p, q=x, m=n} = {x=y, m=n} Is this the most precise result? 49
Developing a transformer for EQ - 3 x:=y #1 {x=p, x=q, m=n} = {x=y, m=n} {x=y, m=n, p=q} – Where does the information p=q come from? sp(x:=y, S) = x=y Frame(x:=y, S) v. Mod(x:=y, S)[v/x] v. Mod(x:=y, S)[v/x] holds possible equalities between different a’s and b’s – how can we account for that? 50
Developing a transformer for EQ - 4 Define a reduction operator: reduce(S) = if {a=b, b=c} S and {a=c} S then reduce(S {a=c}) if {a=b} S and {b=a} S then reduce(S {b=a}) else S Define x:=y #2 = x:=y #1 reduce x:=y #2 ) {x=p, x=q, m=n}) = {x=y, m=n, p=q} is this the best transformer? 51
Developing a transformer for EQ - 5 x:=y #2 ) {y=z}) = {x=y, y=z} {x=y, y=z, x=z} Idea: apply reduction operator again after the vanilla transformer x:=y #3 = reduce x:=y #1 reduce Observation : after the first time we apply reduce, all subsequent values will be in the image of the abstraction so really we only need to apply it once to the input Finally: x:=y # S = reduce x:=y #1 – Best transformer for reduced elements (elements in the image of the abstraction) 52
Properties of abstract transformers 53
Negative property of best transformers Let f # = f Best transformer does not compose (f(f( (a)))) f # (f # (a)) Best transformer of composed operation (f 2 ) # = (f f) # = f f Composition of best transformers: (f # ) 2 = f # f # = f f 54 Source of precision loss
(f(f( (a)))) f # (f # (a)) 55 CA 2 3 f 1 f#f# f 7 f#f# 8 9 f
Global (fixed point) Soundness theorems 56
Soundness theorem 1 1.Given two complete lattices C = (D C, C, C, C, C, C ) A = (D A, A, A, A, A, A ) and GC C,A =(C, , , A) with 2.Monotone concrete transformer f : D C D C 3.Monotone abstract transformer f # : D A D A 4.Local soundness: a D A : f( (a)) (f # (a)) Then, global soundness: lfp(f) (lfp(f # )) (lfp(f)) lfp(f # ) 57
Soundness theorem 1 58 CA f fn fn … lpf(f) f2 f2 f3f3 f #n … lpf(f # ) f#2 f#2 f#3f#3 f# f# a D A : f( (a)) (f # (a)) a D A : f n ( (a)) (f #n (a)) a D A : lfp(f n )( (a)) (lfp(f #n )(a)) lfp(f) lfp(f # )
Soundness theorem 2 1.Given two complete lattices C = (D C, C, C, C, C, C ) A = (D A, A, A, A, A, A ) and GC C,A =(C, , , A) with 2.Monotone concrete transformer f : D C D C 3.Monotone abstract transformer f # : D A D A 4.Local soundness: c D C : (f(c)) f # ( (c)) Then, global soundness: (lfp(f)) lfp(f # ) lfp(f) (lfp(f # )) 59
Soundness theorem 2 60 CA f fn fn … lpf(f) f2 f2 f3f3 f #n … lpf(f # ) f#2 f#2 f#3f#3 f# f# c D C : (f(c)) f # ( (c)) c D C : (f n (c)) f #n ( (c)) c D C : (lfp(f)(c)) lfp(f # )( (c)) lfp(f) lfp(f # )
A recipe for a sound static analysis Define an “appropriate” operational semantics Define “collecting” structural operational semantics Establish a Galois connection between collecting states and abstract states Local correctness: show that the abstract interpretation of every atomic statement is sound w.r.t. the collecting semantics Global correctness: conclude that the analysis is sound 61
Completeness 62
Completeness Local property: – forward complete: c: (f # (c)) = (f(c)) – backward complete: a: f( (a)) = (f # (a)) A property of domain and the (best) transformer Global property: – (lfp(f)) = lfp(f # ) – lfp(f) = (lfp(f # )) Very ideal but usually not possible unless we change the program model (apply strong abstraction) and/or aim for very simple properties 63
Forward complete transformer CA f 3 4 f#f# c: (f # (c)) = (f(c))
Backward complete transformer 65 CA 12 f#f# 3 5 f a: f( (a)) = (f # (a))
Global (backward) completeness 66 CA f fn fn … lpf(f) f2 f2 f3f3 f #n … lpf(f # ) f#2 f#2 f#3f#3 f# f# a: f( (a)) = (f # (a)) a: f n ( (a)) = (f #n (a)) a D A : lfp(f n )( (a)) = (lfp(f #n )(a)) lfp(f) = lfp(f # )
Global (forward) completeness 67 CA f fn fn … lpf(f) f2 f2 f3f3 f #n … lpf(f # ) f#2 f#2 f#3f#3 f# f# c D C : (f(c)) = f # ( (c)) c D C : (f n (c)) = f #n ( (c)) c D C : (lfp(f)(c)) = lfp(f # )( (c)) lfp(f) = lfp(f # )
implementation 68
Implementing the VE analysis Implement the abstract states in the VE domain 2. Implement the VE abstract domain operations 3. Connect analysis to Soot A = (D A, A, A, A, A, A )
VE factoids 70 Don’t forget to override toString() – used for displaying analysis results
VE abstract state implementation 71 Implement bottom as a special object – must be handled appropriately in all abstract operations
Abstract domain base class 72
VE domain operations: , , , 73 used to match the right transformer for a given statement Handle bottom as a special corner case
VE abstract transformers 74 Find transformer by matching statement Improved transformer with reduction operator which branch of the condition
Matching transformers to statements 75 Extend matcher to easily pattern match statements It is usually safe to ignore many statements (skip) Handle assume x==y Handle assume x!=y Handle x=y Special case: x=x (what happens if we use the general case here?)
Matching transformers to statements 76 Handling x=expr as identity is unsound. Handle soundly by removing all factoids containing x (that is, lhs).
Example transformer x=y # 77 Check the precondition for applying this transformer Handle bottom as a special corner case A transformer is a unary operation parameterized by the type of abstract domain elements Override the apply method with one argument
Example transformer x=expr # 78 As a convention never mutate states – create new ones and modify them (functional programming style)
Next lecture: abstract interpretation IV