Spring 2016 Program Analysis and Verification

Lecture 10: Abstract Interpretation III (Galois Connections)
Roman Manevich, Ben-Gurion University

Tentative syllabus
Program Verification: Operational semantics; Hoare Logic; Applying Hoare Logic; Weakest Precondition Calculus; Proving Termination; Data structures; Automated Verification
Program Analysis Basics: From Hoare Logic to Static Analysis; Control Flow Graphs; Equation Systems; Collecting Semantics; Using Soot
Abstract Interpretation fundamentals: Lattices; Fixed-Points; Chaotic Iteration; Galois Connections; Domain constructors; Widening/Narrowing
Analysis Techniques: Numerical Domains; Alias analysis; Interprocedural Analysis; Shape Analysis; CEGAR

Previously
Solving monotone systems
Vanilla static analysis algorithm
Chaotic iteration

Static analysis
Given a system of equations for the collecting semantics:
  $R[0] = \mathit{State}$  // established input
  $R[1] = R[0] \cup R[4]$
  $R[2] = [\![\texttt{assume } x>0]\!]\; R[1]$
  $R[3] = [\![\texttt{assume } x \le 0]\!]\; R[1]$
  $R[4] = [\![x := x-1]\!]\; R[2]$
A static analysis solves a corresponding system of equations over an abstract domain:
  $R[0]^\# = \top$
  $R[1]^\# = R[0]^\# \sqcup R[4]^\#$
  $R[2]^\# = [\![\texttt{assume } x>0]\!]^\#\; R[1]^\#$
  $R[3]^\# = [\![\texttt{assume } x \le 0]\!]^\#\; R[1]^\#$
  $R[4]^\# = [\![x := x-1]\!]^\#\; R[2]^\#$
Questions:
  What is the relation between the solutions? (Next lecture)
  How do you solve the second system? (This lecture)

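Fixed points
$L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$, $f : D \to D$ monotone
$\mathrm{Fix}(f) = \{ d \mid f(d) = d \}$
$\mathrm{Red}(f) = \{ d \mid f(d) \sqsubseteq d \}$
$\mathrm{Ext}(f) = \{ d \mid d \sqsubseteq f(d) \}$
Theorem [Tarski 1955]
$\mathrm{lfp}(f) = \sqcap \mathrm{Fix}(f) = \sqcap \mathrm{Red}(f) \in \mathrm{Fix}(f)$
$\mathrm{gfp}(f) = \sqcup \mathrm{Fix}(f) = \sqcup \mathrm{Ext}(f) \in \mathrm{Fix}(f)$
[Figure: the regions Red(f), Fix(f) and Ext(f) of the lattice, with gfp and lfp marked and the iterates $f^n(\bot)$ shown]
Does a solution always exist? Yes
If so, is it unique? No, but it has least/greatest solutions
If so, is it computable? Under some conditions…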
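Tarski's characterization can be checked by brute force on a small finite lattice. The sketch below is only an illustration under stated assumptions: the universe `U`, the powerset lattice `D`, and the monotone function `f` are hypothetical choices, not taken from the lecture.

```python
from functools import reduce
from itertools import combinations

# Hypothetical universe; D is its powerset, ordered by subset inclusion.
U = frozenset({0, 1, 2})
D = [frozenset(c) for r in range(len(U) + 1) for c in combinations(sorted(U), r)]


def f(d):
    """A monotone function on D, chosen only for illustration."""
    out = {0} | (d & {2})      # always produce 0, keep 2 if present
    if 1 in d:
        out.add(2)             # 1 in the input forces 2 in the output
    return frozenset(out)


def meet(sets):
    """Greatest lower bound in the powerset lattice (intersection)."""
    return reduce(frozenset.intersection, sets, U)


def join(sets):
    """Least upper bound in the powerset lattice (union)."""
    return reduce(frozenset.union, sets, frozenset())


fix = [d for d in D if f(d) == d]   # fixed points
red = [d for d in D if f(d) <= d]   # reductive elements: f(d) below d
ext = [d for d in D if d <= f(d)]   # extensive elements: d below f(d)

lfp = meet(red)
gfp = join(ext)
print(sorted(lfp), sorted(gfp))
```

For this particular `f` there are two fixed points, so the solution is not unique, but the meet of Red(f) and the join of Ext(f) are themselves fixed points, as the theorem states.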

Continuous functions
Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order: every ascending chain has an upper bound.
A function $f$ is continuous if for every increasing chain $Y \subseteq D^*$: $f(\sqcup Y) = \sqcup \{ f(y) \mid y \in Y \}$
Lemma: if $f$ is continuous then $f$ is monotone.
Proof: assume $x \sqsubseteq y$. Therefore $x \sqcup y = y$. Then $f(y) = f(x \sqcup y) = f(x) \sqcup f(y)$, which means $f(x) \sqsubseteq f(y)$.

Kleene’s fixed point theorem
Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order and $f : D \to D$ a continuous function. Then
$\mathrm{lfp}(f) = \sqcup_{n \in \mathbb{N}} f^n(\bot)$
That is, take the ascending chain $\bot \sqsubseteq f(\bot) \sqsubseteq f(f(\bot)) \sqsubseteq \ldots \sqsubseteq f^n(\bot) \sqsubseteq \ldots$ and return the supremum.
Why is this an ascending chain?
But how do you know if a function $f$ is continuous?

Continuity and ACC condition
Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order: every ascending chain has an upper bound.
$L$ satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes: $d_0 \sqsubseteq d_1 \sqsubseteq \ldots \sqsubseteq d_n = d_{n+1} = d_{n+2} = \ldots$
Lemma: Monotone functions on posets satisfying ACC are continuous.

Resulting algorithm
Kleene’s fixed point theorem gives a constructive method for computing lfp(f) over a poset with ACC when f is monotone.
Mathematical definition: $\mathrm{lfp}(f) = \sqcup_{n \in \mathbb{N}} f^n(\bot)$
Algorithm:
  d := $\bot$
  while f(d) $\neq$ d do d := f(d)
  return d
[Figure: the ascending chain $\bot, f(\bot), f^2(\bot), \ldots, f^n(\bot)$ reaching lfp]
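The iteration above translates almost directly into code. This is a minimal sketch, assuming lattice elements can be compared with `!=`; the graph `EDGES` and the reachability function `step` are hypothetical examples, not part of the lecture.

```python
def kleene_lfp(f, bottom):
    """Iterate f from bottom until the value stops changing.

    Terminates when f is monotone and the lattice satisfies ACC.
    """
    d = bottom
    while f(d) != d:
        d = f(d)
    return d


# Hypothetical example: nodes reachable from node 1 in a small graph.
EDGES = {1: {2}, 2: {3}, 3: {1}, 4: {5}, 5: set()}


def step(reached):
    """Monotone on sets of nodes: keep what we have, add node 1 and successors."""
    successors = frozenset(s for n in reached for s in EDGES[n])
    return frozenset({1}) | reached | successors


reachable = kleene_lfp(step, frozenset())
print(sorted(reachable))
```

The chain computed here is the empty set, then {1}, {1, 2}, {1, 2, 3}, after which the value is stable.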

Vanilla algorithm: vector iteration
Problem definition:
  Lattice of properties L of finite height (ACC)
  For each statement define a monotone transformer
Preparation:
  Parse program into AST
  Lower AST to TAC
  Convert TAC into CFG
  Generate system of equations from CFG
Analysis:
  Initialize each analysis variable with $\bot$
  Update all analysis variables of each equation until reaching a fixed point
Non-incremental. Most variables don’t change.

Chaotic iteration
Input:
  A cpo $L = (D, \sqsubseteq, \sqcup, \bot)$ satisfying ACC
  $L^n = L \times L \times \ldots \times L$
  A monotone function $f : D^n \to D^n$
  A system of equations $\{ X[i] = F[i](X) \mid 1 \le i \le n \}$, where $F[i]$ is the i-th component of $f$
Output: lfp(f)
A worklist-based algorithm:
  for i := 1 to n do X[i] := $\bot$
  WL = {1,…,n}
  while WL $\neq \emptyset$ do
    j := pop WL // choose index non-deterministically
    N := F[j](X)
    if N $\neq$ X[j] then
      X[j] := N
      add all the indexes that directly depend on j to WL
      (X[q] depends on X[j] if F[q] contains X[j])
  return X
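The worklist algorithm can be sketched in Python as follows. This is an illustration under assumptions: the explicit `uses` table (which variables each equation reads) is a hypothetical way to encode dependencies, indexes are 0-based, and the example system is the collecting-semantics loop from the earlier "Static analysis" slide restricted to the hypothetical finite state space x in {0, 1, 2, 3}, with the second branch read as `x <= 0`.

```python
def chaotic_iteration(F, uses, bottom):
    """Solve X[i] = F[i](X) by worklist iteration.

    F[i] maps the whole vector X to a new value for X[i];
    uses[i] is the set of indexes that F[i] reads.
    """
    n = len(F)
    X = [bottom] * n
    # dependents[j]: equations that must be re-evaluated when X[j] changes.
    dependents = {j: [q for q in range(n) if j in uses[q]] for j in range(n)}
    worklist = set(range(n))
    while worklist:
        j = worklist.pop()          # arbitrary choice of index
        new = F[j](X)
        if new != X[j]:
            X[j] = new
            worklist.update(dependents[j])
    return X


# Hypothetical finite state space: the value of x.
STATES = frozenset(range(4))

F = [
    lambda R: STATES,                                   # R[0] = State
    lambda R: R[0] | R[4],                              # R[1] = R[0] U R[4]
    lambda R: frozenset(x for x in R[1] if x > 0),      # assume x > 0
    lambda R: frozenset(x for x in R[1] if x <= 0),     # assume x <= 0
    lambda R: frozenset(x - 1 for x in R[2]),           # x := x - 1
]
uses = [set(), {0, 4}, {1}, {1}, {2}]

R = chaotic_iteration(F, uses, frozenset())
for i, r in enumerate(R):
    print(i, sorted(r))
```

Because every equation is monotone and the lattice is finite, the result does not depend on the order in which indexes are popped.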

Required knowledge
Collecting semantics
Abstract semantics (over lattices)
Algorithm to compute abstract semantics: vector iteration, chaotic iteration
Connection between collecting semantics and abstract semantics
Abstract transformers

Agenda
Galois connections
Abstract transformers
Global soundness

Recap 1/2
We defined a reference semantics: the collecting semantics.
We defined an abstract semantics by
  choosing an abstract domain (lattice)
  developing algorithms for: testing partial order, join, abstract transformers

Recap 2/2
We defined an algorithm to compute the abstract least fixed-point when transformers are monotone and the lattice obeys ACC.
Questions:
  What is the connection between the two least fixed-points?
  Transformer monotonicity is required for termination. What should we require for correctness?

Handling non-monotone transformers
Kleene’s fixed point theorem gives a constructive method for computing lfp(f) over a poset with ACC when f is monotone.
Monotonicity ensures $\bot \sqsubseteq f(\bot) \sqsubseteq \ldots \sqsubseteq f^n(\bot) \sqsubseteq \ldots$ is an ascending chain.
What if f is not necessarily monotone? How can we ensure termination?
Mathematical definition: $\mathrm{lfp}(f) = \sqcup_{n \in \mathbb{N}} f^n(\bot)$
Algorithm:
  d := $\bot$
  while f(d) $\neq$ d do d := f(d)
  return d

Handling non-monotone transformers
Define $f'(d) = d \sqcup f(d)$
Now $f'$ is extensive: $d \sqsubseteq d \sqcup f(d) = f'(d)$, and so $\bot \sqsubseteq f'(\bot) \sqsubseteq \ldots \sqsubseteq f'^n(\bot) \sqsubseteq \ldots$ is an ascending chain.
The result is not necessarily the least fixed point: we get a (post)fixed point in finite time (ACC).
Mathematical definition: $\mathrm{lfp}(f) = \sqcup_{n \in \mathbb{N}} f^n(\bot) \sqsubseteq \sqcup_{n \in \mathbb{N}} f'^n(\bot)$
Revised algorithm:
  d := $\bot$
  while f'(d) $\neq$ d do d := f'(d)
  return d
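A small experiment shows why the extensive version helps. This is a sketch under assumptions: the toggling function `f` is a hypothetical non-monotone transformer on sets, and `iterate` caps the number of steps only so that the non-terminating case can be observed.

```python
def iterate(g, bottom, max_steps=100):
    """Iterate g from bottom; return the stable value, or None if none is found."""
    d = bottom
    for _ in range(max_steps):
        nxt = g(d)
        if nxt == d:
            return d
        d = nxt
    return None


def f(d):
    """Hypothetical non-monotone transformer: toggles membership of 0."""
    return d - {0} if 0 in d else d | {0}


def extensive(g):
    """Build g'(d) = d joined with g(d), using set union as the join."""
    return lambda d: d | g(d)


plain = iterate(f, frozenset())                # oscillates between {} and {0}
revised = iterate(extensive(f), frozenset())   # stabilizes
print(plain, sorted(revised))
```

The plain iteration never stabilizes, while the revised one stops at {0}, which is a post-fixed point of `f` since f({0}) is a subset of {0}.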

Relating the abstract domain to the concrete domain

Galois Connection
Given two complete lattices
  $C = (D_C, \sqsubseteq_C, \sqcup_C, \sqcap_C, \bot_C, \top_C)$ (concrete domain)
  $A = (D_A, \sqsubseteq_A, \sqcup_A, \sqcap_A, \bot_A, \top_A)$ (abstract domain)
A Galois Connection (GC) is a quadruple $(C, \alpha, \gamma, A)$ that relates C and A via the monotone functions
  the abstraction function $\alpha : D_C \to D_A$
  the concretization function $\gamma : D_A \to D_C$
such that for every concrete element $c \in D_C$ and abstract element $a \in D_A$ the following equivalent conditions hold:
  1) $\alpha(\gamma(a)) \sqsubseteq_A a$ and $c \sqsubseteq_C \gamma(\alpha(c))$
  2) $\alpha(c) \sqsubseteq_A a$ iff $c \sqsubseteq_C \gamma(a)$
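On finite domains both conditions can be verified exhaustively. The following sketch assumes a hypothetical sign abstraction over the integers -2..2; the names `ABSTRACT`, `GAMMA`, `leq`, `alpha` and `gamma` are illustrative and do not come from the lecture.

```python
from itertools import combinations

# Hypothetical concrete domain: all subsets of a small set of integers.
U = frozenset({-2, -1, 0, 1, 2})
CONCRETE = [frozenset(c) for r in range(len(U) + 1) for c in combinations(sorted(U), r)]

# Hypothetical abstract domain: the flat sign lattice bot < neg, zero, pos < top.
ABSTRACT = ["bot", "neg", "zero", "pos", "top"]
GAMMA = {
    "bot": frozenset(),
    "neg": frozenset({-2, -1}),
    "zero": frozenset({0}),
    "pos": frozenset({1, 2}),
    "top": U,
}


def leq(a, b):
    """Abstract partial order of the flat sign lattice."""
    return a == b or a == "bot" or b == "top"


def gamma(a):
    return GAMMA[a]


def alpha(c):
    """Least abstract element whose concretization covers c."""
    return next(a for a in ABSTRACT if c <= gamma(a))


# Condition 2: alpha(c) <= a  iff  c is a subset of gamma(a).
cond2 = all(leq(alpha(c), a) == (c <= gamma(a)) for c in CONCRETE for a in ABSTRACT)

# Condition 1: alpha(gamma(a)) <= a  and  c is a subset of gamma(alpha(c)).
cond1 = (all(leq(alpha(gamma(a)), a) for a in ABSTRACT)
         and all(c <= gamma(alpha(c)) for c in CONCRETE))

print(cond1, cond2)
```

Scanning `ABSTRACT` in order works for `alpha` here only because the three middle elements have disjoint concretizations; a general lattice would need an explicit meet.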

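(1.1) Galois Connection: $c \sqsubseteq_C \gamma(\alpha(c))$
[Figure: (1) a concrete element $c$; (2) $\alpha(c)$, the most precise (least) element in A representing $c$; (3) $\gamma(\alpha(c))$, which lies above $c$ in C]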

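(1.2) Galois Connection: $\alpha(\gamma(a)) \sqsubseteq_A a$
[Figure: (1) an abstract element $a$; (2) $\gamma(a)$, what $a$ represents in C (its meaning); (3) $\alpha(\gamma(a))$, which lies below $a$ in A]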

Example: lattice of equalities
Concrete lattice: $C = (2^{\mathit{State}}, \subseteq, \cup, \cap, \emptyset, \mathit{State})$
Abstract lattice: $EQ = \{ x=y \mid x, y \in \mathit{Var} \}$, $A = (2^{EQ}, \supseteq, \cap, \cup, EQ, \emptyset)$
Treat elements of A as both formulas and sets of constraints.
Useful for copy propagation, a compiler optimization.
$\alpha(X) = ?$
$\gamma(Y) = ?$

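Example: lattice of equalities
Concrete lattice: $C = (2^{\mathit{State}}, \subseteq, \cup, \cap, \emptyset, \mathit{State})$
Abstract lattice: $EQ = \{ x=y \mid x, y \in \mathit{Var} \}$, $A = (2^{EQ}, \supseteq, \cap, \cup, EQ, \emptyset)$
Treat elements of A as both formulas and sets of constraints.
Useful for copy propagation, a compiler optimization.
$\beta(\sigma) = \alpha(\{\sigma\}) = \{ x=y \mid \sigma\, x = \sigma\, y \}$, that is, $\sigma \models x=y$
$\alpha(X) = \cap \{ \beta(\sigma) \mid \sigma \in X \} = \sqcup_A \{ \beta(\sigma) \mid \sigma \in X \}$
$\gamma(Y) = \{ \sigma \mid \sigma \models \wedge Y \} = \mathrm{models}(\wedge Y)$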
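The three functions of the equalities domain can be written down directly for a finite state space. This is a sketch under assumptions: the variable names, the value range 4..6 (chosen to echo the states drawn on the following slides) and the encoding of an equality `a=b` as the pair `(a, b)` are all illustrative.

```python
from itertools import product

VARS = ("x", "y", "z")
VALUES = (4, 5, 6)  # hypothetical finite value range

# All concrete states, each a mapping from variable name to value.
STATES = [dict(zip(VARS, vals)) for vals in product(VALUES, repeat=len(VARS))]

# EQ: every equality a=b over VARS, encoded as the pair (a, b).
EQ = frozenset(product(VARS, repeat=2))


def beta(state):
    """Abstraction of a single state: the equalities it satisfies."""
    return frozenset((a, b) for (a, b) in EQ if state[a] == state[b])


def alpha(states):
    """Abstraction of a set of states: equalities satisfied by all of them."""
    result = EQ  # bottom of A, the abstraction of the empty set
    for s in states:
        result &= beta(s)
    return result


def gamma(eqs):
    """Concretization: all states that satisfy every equality in eqs."""
    return [s for s in STATES if all(s[a] == s[b] for (a, b) in eqs)]


c = [{"x": 5, "y": 5, "z": 5}]
print(len(alpha(c)), len(gamma(alpha(c))))
```

Here `alpha(c)` contains all nine equalities, and its concretization contains the three states in which x, y and z agree, so `c` is strictly included in `gamma(alpha(c))`.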

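Galois Connection: $c \sqsubseteq_C \gamma(\alpha(c))$
[Figure, with the numbered elements:]
1. $c = \{[x \mapsto 5, y \mapsto 5, z \mapsto 5]\}$
2. $\alpha(c) = \{x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y\}$: the most precise (least) element in A representing $[x \mapsto 5, y \mapsto 5, z \mapsto 5]$
3. $\gamma(\alpha(c)) = \{\ldots, [x \mapsto 6, y \mapsto 6, z \mapsto 6], [x \mapsto 5, y \mapsto 5, z \mapsto 5], [x \mapsto 4, y \mapsto 4, z \mapsto 4], \ldots\}$
4. $\{x=x, y=y, z=z\}$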

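Most precise abstract representation
Lemma: $\alpha(c) = \sqcap \{ c' \mid c \sqsubseteq \gamma(c') \}$
[Figure: a concrete element $c$ in C, the abstract elements of A whose concretization covers $c$, and their meet $\alpha(c)$]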

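Most precise abstract representation
Lemma: $\alpha(c) = \sqcap \{ c' \mid c \sqsubseteq \gamma(c') \}$
[Figure: for $c = \{[x \mapsto 5, y \mapsto 5, z \mapsto 5]\}$, abstract elements covering $c$ include $\{x=y\}$, $\{x=y, z=y\}$ and $\{x=y, y=z\}$; their meet is $\alpha(c) = \{x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y\}$]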

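Galois Connection: $\alpha(\gamma(a)) \sqsubseteq_A a$
[Figure, with the numbered elements:]
1. $a = \{x=y, y=z\}$
2. $\gamma(a) = \{\ldots, [x \mapsto 6, y \mapsto 6, z \mapsto 6], [x \mapsto 5, y \mapsto 5, z \mapsto 5], [x \mapsto 4, y \mapsto 4, z \mapsto 4], \ldots\}$: what $a$ represents in C (its meaning)
3. $\alpha(\gamma(a)) = \{x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y\}$
$\alpha \circ \gamma$ is called a semantic reduction.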

Partial/full reduction
The operator $\alpha \circ \gamma$ is called a semantic reduction (or full reduction), since $\alpha(\gamma(a))$ means the same as $a$ but is a reduced, more precise version of $a$.
An operator $\mathrm{reduce} : D_A \to D_A$ is a partial reduction if $\mathrm{reduce}(a) \sqsubseteq_A a$ and $\gamma(a) = \gamma(\mathrm{reduce}(a))$.

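Galois Insertion: $\forall a : \alpha(\gamma(a)) = a$
How can we obtain a Galois Insertion from a Galois Connection?
All elements are reduced.
[Figure: (1) $\{x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y\}$ in A; (2) its concretization $\{\ldots, [x \mapsto 6, y \mapsto 6, z \mapsto 6], [x \mapsto 5, y \mapsto 5, z \mapsto 5], [x \mapsto 4, y \mapsto 4, z \mapsto 4], \ldots\}$ in C]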

Special cases

Properties of a Galois Connection
Theorem: the abstraction and concretization functions uniquely determine each other:
$\gamma(a) = \sqcup \{ c \mid \alpha(c) \sqsubseteq a \}$
$\alpha(c) = \sqcap \{ a \mid c \sqsubseteq \gamma(a) \}$

Abstracting (disjunctive) sets
It is usually convenient to first define the abstraction of single elements: $\beta(s) = \alpha(\{s\})$
Then lift the abstraction to sets of elements: $\alpha(X) = \sqcup_A \{ \beta(s) \mid s \in X \}$

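The case of symbolic domains
An important class of abstract domains are symbolic domains: domains of formulas.
$C = (2^{\mathit{State}}, \subseteq, \cup, \cap, \emptyset, \mathit{State})$
$A = (D_A, \sqsubseteq_A, \sqcup_A, \sqcap_A, \bot_A, \top_A)$
If $D_A$ is a set of formulas then the abstraction of a state is defined as
$\beta(\sigma) = \alpha(\{\sigma\}) = \sqcap_A \{ \varphi \mid \sigma \models \varphi \}$, the least formula from $D_A$ that $\sigma$ satisfies.
The abstraction of a set of states is $\alpha(X) = \sqcup_A \{ \beta(\sigma) \mid \sigma \in X \}$
The concretization is $\gamma(\varphi) = \{ \sigma \mid \sigma \models \varphi \} = \mathrm{models}(\varphi)$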

Composing Galois connections

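Inducing along the connections
Assume the complete lattices
  $C = (D_C, \sqsubseteq_C, \sqcup_C, \sqcap_C, \bot_C, \top_C)$
  $A = (D_A, \sqsubseteq_A, \sqcup_A, \sqcap_A, \bot_A, \top_A)$
  $M = (D_M, \sqsubseteq_M, \sqcup_M, \sqcap_M, \bot_M, \top_M)$
and Galois connections $GC_{C,A} = (C, \alpha_{C,A}, \gamma_{A,C}, A)$ and $GC_{A,M} = (A, \alpha_{A,M}, \gamma_{M,A}, M)$.
Lemma: both Galois connections induce the Galois connection $GC_{C,M} = (C, \alpha_{C,M}, \gamma_{M,C}, M)$, defined by applying the functions one after the other:
  $\alpha_{C,M}(c) = \alpha_{A,M}(\alpha_{C,A}(c))$
  $\gamma_{M,C}(m) = \gamma_{A,C}(\gamma_{M,A}(m))$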

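Inducing along the connections
[Figure: (1) a concrete element $c$; (2) $\alpha_{C,A}(c)$ in A; (3) $a' = \alpha_{A,M}(\alpha_{C,A}(c))$ in M; (4) the result of $\gamma_{M,A}$ back in A; (5) $c'$, the result of $\gamma_{A,C}$ back in C]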

Relating abstract transformers to concrete transformers

Sound abstract transformer
Given two lattices
  $C = (D_C, \sqsubseteq_C, \sqcup_C, \sqcap_C, \bot_C, \top_C)$
  $A = (D_A, \sqsubseteq_A, \sqcup_A, \sqcap_A, \bot_A, \top_A)$
and $GC_{C,A} = (C, \alpha, \gamma, A)$ with
  a concrete transformer $f : D_C \to D_C$
  an abstract transformer $f^\# : D_A \to D_A$
We say that $f^\#$ is a sound transformer (w.r.t. $f$) if
  $\forall c : f(c) = c' \Rightarrow f^\#(\alpha(c)) \sqsupseteq_A \alpha(c')$
  For every $a$: $\alpha(f(\gamma(a))) \sqsubseteq_A f^\#(a)$
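Local soundness can be tested exhaustively when the abstract domain is finite. The sketch below assumes a hypothetical parity abstraction of the integers modulo 4 and a concrete transformer that adds one; the two candidate abstract transformers `flip` and `identity` are illustrative tables, not definitions from the lecture.

```python
# Hypothetical concrete values and parity abstraction.
U = frozenset(range(4))
ABSTRACT = ["bot", "even", "odd", "top"]
GAMMA = {
    "bot": frozenset(),
    "even": frozenset({0, 2}),
    "odd": frozenset({1, 3}),
    "top": U,
}


def leq(a, b):
    """Partial order of the flat parity lattice."""
    return a == b or a == "bot" or b == "top"


def gamma(a):
    return GAMMA[a]


def alpha(c):
    """Least abstract element whose concretization covers c."""
    return next(a for a in ABSTRACT if c <= gamma(a))


def f(c):
    """Concrete transformer: add one (modulo 4) to every value."""
    return frozenset((x + 1) % 4 for x in c)


def is_sound(f_sharp):
    """Check alpha(f(gamma(a))) <= f_sharp(a) for every abstract a."""
    return all(leq(alpha(f(gamma(a))), f_sharp[a]) for a in ABSTRACT)


def is_sound_via_gamma(f_sharp):
    """The same check stated on the concrete side: f(gamma(a)) within gamma(f_sharp(a))."""
    return all(f(gamma(a)) <= gamma(f_sharp[a]) for a in ABSTRACT)


flip = {"bot": "bot", "even": "odd", "odd": "even", "top": "top"}
identity = {a: a for a in ABSTRACT}

print(is_sound(flip), is_sound(identity))
```

The `flip` table passes both checks, while `identity` fails on `even`, because adding one to an even number yields an odd one.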

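Transformer soundness condition 1
$\forall c : f(c) = c' \Rightarrow f^\#(\alpha(c)) \sqsupseteq \alpha(c')$
[Figure: a concrete element $c$, its image $f(c) = c'$, the abstractions $\alpha(c)$ and $\alpha(c')$, and $f^\#(\alpha(c))$ lying above $\alpha(c')$ in A]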

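Transformer soundness condition 2
$\forall a : f^\#(a) = a' \Rightarrow f(\gamma(a)) \sqsubseteq \gamma(a')$
[Figure: an abstract element $a$, its image $f^\#(a) = a'$, the concretizations $\gamma(a)$ and $\gamma(a')$, and $f(\gamma(a))$ lying below $\gamma(a')$ in C]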

Best (induced) transformer
$f^\#(a) = \alpha(f(\gamma(a)))$
[Figure: from $a$ in A, apply $\gamma$, then $f$ in C, then $\alpha$ back to A]
Problem: $\gamma$ incomputable directly

Best abstract transformer [CC’77]
Best in terms of precision: the most precise abstract transformer
May be too expensive to compute
Constructively defined as $f^\# = \alpha \circ f \circ \gamma$, induced by the GC
Not directly computable because the first step is concretization
We often compromise for a “good enough” transformer
Useful tool: partial concretization

Developing a sound abstract transformer by example

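Transformer example
$C = (2^{\mathit{State}}, \subseteq, \cup, \cap, \emptyset, \mathit{State})$
$EQ = \{ x=y \mid x, y \in \mathit{Var} \}$, $A = (2^{EQ}, \supseteq, \cap, \cup, EQ, \emptyset)$
$\beta(\sigma) = \alpha(\{\sigma\}) = \{ x=y \mid \sigma\, x = \sigma\, y \}$, that is, $\sigma \models x=y$
$\alpha(S) = \cap \{ \beta(\sigma) \mid \sigma \in S \} = \sqcup_A \{ \beta(\sigma) \mid \sigma \in S \}$
$\gamma(\varphi) = \{ \sigma \mid \sigma \models \varphi \} = \mathrm{models}(\varphi)$
Concrete: $[\![x := y]\!]\; S = \{ \sigma[x \mapsto \sigma\, y] \mid \sigma \in S \}$
Abstract: $[\![x := y]\!]^\#\; S = ?$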

Developing a transformer for EQ - 1
Input has the form $S = \wedge\{a=b\}$
$sp(x := expr, \varphi) = \exists v.\; x = expr[v/x] \wedge \varphi[v/x]$
$sp(x := y, S) = \exists v.\; x = y[v/x] \wedge S[v/x] = \ldots$
Let’s define helper notations:
  $\mathrm{Mod}(x := y, S) = \{ x=a,\; b=x \in S \}$: the subset of equalities containing x (will be modified)
  $\mathrm{Frame}(x := y, S) = S \setminus \mathrm{Mod}(x := y, S)$: the subset of equalities not containing x (i.e., the frame)

sp(x:=y, S) = v. x=y[v/x]  {a=b}[v/x] = … Two cases x is y: sp(x:=x, S) = S x is different from y: sp(x:=y, S) = v. x=y  Mod(x:=y, S)[v/x]  Frame(x:=y, S)[v/x] = x=y  Frame(x:=y, S)  v. Mod(x:=y, S)[v/x]  x=y  Frame(x:=y, S) Vanilla transformer: x:=y#1 S = {x=y}  Frame(x:=y, S) Example: x:=y#1 {x=p, q=x, m=n} = {x=y, m=n} Is this the most precise result?

$[\![x := y]\!]^{\#1}\; \wedge\{x=p, x=q, m=n\} = \wedge\{x=y, m=n\} \sqsupset \wedge\{x=y, m=n, p=q\}$
Where does the information p=q come from?
$sp(x := y, S) = x=y \wedge \mathrm{Frame}(x := y, S) \wedge \exists v.\; \mathrm{Mod}(x := y, S)[v/x]$
$\exists v.\; \mathrm{Mod}(x := y, S)[v/x]$ holds possible equalities between different a’s and b’s. How can we account for that?

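Define a reduction operator:
reduce(S) =
  if $\{a=b, b=c\} \subseteq S$ and $\{a=c\} \not\subseteq S$ then reduce($S \cup \{a=c\}$)
  if $\{a=b\} \subseteq S$ and $\{b=a\} \not\subseteq S$ then reduce($S \cup \{b=a\}$)
  else S
Define $[\![x := y]\!]^{\#2} = [\![x := y]\!]^{\#1} \circ \mathrm{reduce}$
$[\![x := y]\!]^{\#2}\; \wedge\{x=p, x=q, m=n\} = \wedge\{x=y, m=n, p=q\}$
Is this the best transformer?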

$[\![x := y]\!]^{\#2}\; \wedge\{y=z\} = \wedge\{x=y, y=z\} \sqsupset \wedge\{x=y, y=z, x=z\}$
Solution: apply the reduction operator again after the vanilla transformer:
$[\![x := y]\!]^{\#3} = \mathrm{reduce} \circ [\![x := y]\!]^{\#1} \circ \mathrm{reduce}$
Observation: after the first time we apply reduce, all subsequent values will be in the image of the abstraction, so really we only need to apply it once to the input.
Finally: $[\![x := y]\!]^{\#}\; S = \mathrm{reduce} \circ [\![x := y]\!]^{\#1}$
Best transformer for reduced elements (elements in the image of the abstraction)
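The three versions of the transformer can be compared on the examples from the slides. This is a sketch under assumptions: an equality `a=b` is encoded as the pair `(a, b)`, and the function names `reduce_eq`, `assign_vanilla`, `assign_2` and `assign` are hypothetical labels for reduce and the transformers #1, #2 and the final one.

```python
def reduce_eq(S):
    """Close a set of equalities under symmetry and transitivity."""
    closed = set(S)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            # Symmetric pair plus everything reachable through b.
            new = {(b, a)} | {(a, d) for (c, d) in closed if c == b}
            if not new <= closed:
                closed |= new
                changed = True
    return frozenset(closed)


def assign_vanilla(x, y, S):
    """Transformer #1: drop equalities mentioning x, then add x=y."""
    if x == y:
        return frozenset(S)
    frame = {(a, b) for (a, b) in S if x not in (a, b)}
    return frozenset(frame | {(x, y)})


def assign_2(x, y, S):
    """Transformer #2: reduce the input, then apply the vanilla transformer."""
    return assign_vanilla(x, y, reduce_eq(S))


def assign(x, y, S):
    """Final transformer: apply the vanilla transformer, then reduce."""
    return reduce_eq(assign_vanilla(x, y, S))


S1 = {("x", "p"), ("x", "q"), ("m", "n")}
S2 = {("y", "z")}

print(sorted(assign_vanilla("x", "y", S1)))
print(("p", "q") in assign_2("x", "y", S1))
print(("x", "z") in assign_2("x", "y", S2), ("x", "z") in assign("x", "y", S2))
```

Note that the closure also produces symmetric and reflexive pairs such as `q=p` and `p=p`, which the slides leave implicit when they list results.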

Properties of abstract transformers

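Negative property of best transformers
Let $f^\# = \alpha \circ f \circ \gamma$
Best transformer does not compose: $\alpha(f(f(\gamma(a)))) \sqsubseteq f^\#(f^\#(a))$
Best transformer of the composed operation: $(f^2)^\# = (f \circ f)^\# = \alpha \circ f \circ f \circ \gamma$
Composition of best transformers: $(f^\#)^2 = f^\# \circ f^\# = \alpha \circ f \circ \gamma \circ \alpha \circ f \circ \gamma$
The intermediate $\gamma \circ \alpha$ is the source of precision loss.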

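$\alpha(f(f(\gamma(a)))) \sqsubseteq f^\#(f^\#(a))$
[Figure: the concrete path from $\gamma(a)$ through two applications of $f$ followed by $\alpha$, compared with the abstract path from $a$ through two applications of $f^\#$]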
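The inequality can be strict, as a small experiment shows. This is a sketch under assumptions: the sign abstraction over the hypothetical universe {-1, 0, 1, 2} and the concrete function x to 1 - x are illustrative choices, and `best` simply computes the induced transformer by enumeration.

```python
# Hypothetical universe, closed under the concrete function x -> 1 - x.
U = frozenset({-1, 0, 1, 2})
ABSTRACT = ["bot", "neg", "zero", "pos", "top"]
GAMMA = {
    "bot": frozenset(),
    "neg": frozenset({-1}),
    "zero": frozenset({0}),
    "pos": frozenset({1, 2}),
    "top": U,
}


def gamma(a):
    return GAMMA[a]


def alpha(c):
    """Least sign element whose concretization covers c."""
    return next(a for a in ABSTRACT if c <= gamma(a))


def f(c):
    """Concrete transformer: map every value x to 1 - x."""
    return frozenset(1 - x for x in c)


def best(g):
    """Best (induced) abstract transformer of g: alpha after g after gamma."""
    return lambda a: alpha(g(gamma(a)))


f_sharp = best(f)

# Best transformer of the composed operation f after f.
best_of_composition = best(lambda c: f(f(c)))("zero")

# Composition of the best transformers of f.
composition_of_best = f_sharp(f_sharp("zero"))

print(best_of_composition, composition_of_best)
```

Applying `f` twice returns every value to itself, so the best transformer of the composition maps `zero` to `zero`. Composing the best transformers instead passes through `pos`, whose concretization {1, 2} maps to {0, -1}, and the result degrades to `top`.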

Global (fixed point) Soundness theorems

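Soundness theorem 1
Given two complete lattices
  $C = (D_C, \sqsubseteq_C, \sqcup_C, \sqcap_C, \bot_C, \top_C)$
  $A = (D_A, \sqsubseteq_A, \sqcup_A, \sqcap_A, \bot_A, \top_A)$
and $GC_{C,A} = (C, \alpha, \gamma, A)$ with
  a monotone concrete transformer $f : D_C \to D_C$
  a monotone abstract transformer $f^\# : D_A \to D_A$
Local soundness: $\forall a \in D_A : f(\gamma(a)) \sqsubseteq \gamma(f^\#(a))$
Then, global soundness follows:
  $\mathrm{lfp}(f) \sqsubseteq \gamma(\mathrm{lfp}(f^\#))$
  $\alpha(\mathrm{lfp}(f)) \sqsubseteq \mathrm{lfp}(f^\#)$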

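Soundness theorem 1
$\forall a \in D_A : f(\gamma(a)) \sqsubseteq \gamma(f^\#(a))$
$\Rightarrow \forall a \in D_A : f^n(\gamma(a)) \sqsubseteq \gamma(f^{\#n}(a))$
$\Rightarrow \forall a \in D_A : \mathrm{lfp}(f^n)(\gamma(a)) \sqsubseteq \gamma(\mathrm{lfp}(f^{\#n})(a))$
$\Rightarrow \mathrm{lfp}(f) \sqsubseteq \gamma(\mathrm{lfp}(f^\#))$
[Figure: the chain $\bot, f, f^2, f^3, \ldots, f^n, \mathrm{lfp}(f)$ in C alongside the chain $\bot, f^\#, f^{\#2}, f^{\#3}, \ldots, f^{\#n}, \mathrm{lfp}(f^\#)$ in A, related step by step through $\gamma$]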

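Soundness theorem 2
Given two complete lattices
  $C = (D_C, \sqsubseteq_C, \sqcup_C, \sqcap_C, \bot_C, \top_C)$
  $A = (D_A, \sqsubseteq_A, \sqcup_A, \sqcap_A, \bot_A, \top_A)$
and $GC_{C,A} = (C, \alpha, \gamma, A)$ with
  a monotone concrete transformer $f : D_C \to D_C$
  a monotone abstract transformer $f^\# : D_A \to D_A$
Local soundness: $\forall c \in D_C : \alpha(f(c)) \sqsubseteq f^\#(\alpha(c))$
Then, global soundness follows:
  $\alpha(\mathrm{lfp}(f)) \sqsubseteq \mathrm{lfp}(f^\#)$
  $\mathrm{lfp}(f) \sqsubseteq \gamma(\mathrm{lfp}(f^\#))$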

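Soundness theorem 2
$\forall c \in D_C : \alpha(f(c)) \sqsubseteq f^\#(\alpha(c))$
$\Rightarrow \forall c \in D_C : \alpha(f^n(c)) \sqsubseteq f^{\#n}(\alpha(c))$
$\Rightarrow \forall c \in D_C : \alpha(\mathrm{lfp}(f)(c)) \sqsubseteq \mathrm{lfp}(f^\#)(\alpha(c))$
$\Rightarrow \alpha(\mathrm{lfp}(f)) \sqsubseteq \mathrm{lfp}(f^\#)$
[Figure: the chain $\bot, f, f^2, f^3, \ldots, f^n, \mathrm{lfp}(f)$ in C alongside the chain $\bot, f^\#, f^{\#2}, f^{\#3}, \ldots, f^{\#n}, \mathrm{lfp}(f^\#)$ in A, related step by step through $\alpha$]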

A recipe for a sound static analysis
Define an “appropriate” operational semantics
Define “collecting” structural operational semantics
Establish a Galois connection between collecting states and abstract states
Local correctness: show that the abstract interpretation of every atomic statement is sound w.r.t. the collecting semantics
Global correctness: conclude that the analysis is sound

Completeness

Completeness
Local property:
  forward complete: $\forall c : f^\#(\alpha(c)) = \alpha(f(c))$
  backward complete: $\forall a : f(\gamma(a)) = \gamma(f^\#(a))$
A property of the domain (assuming the best transformer)
Global property:
  $\alpha(\mathrm{lfp}(f)) = \mathrm{lfp}(f^\#)$
  $\mathrm{lfp}(f) = \gamma(\mathrm{lfp}(f^\#))$
Very ideal but usually not possible unless we
  change the program model (apply very coarse abstraction) and/or
  aim for very simple properties

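Forward complete transformer
$\forall c : f^\#(\alpha(c)) = \alpha(f(c))$
[Figure: abstracting $c$ and then applying $f^\#$ in A reaches the same element as applying $f$ in C and then abstracting]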

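Backward complete transformer
$\forall a : f(\gamma(a)) = \gamma(f^\#(a))$
[Figure: concretizing $a$ and then applying $f$ in C reaches the same element as applying $f^\#$ in A and then concretizing]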

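Global (backward) completeness
$\forall a : f(\gamma(a)) = \gamma(f^\#(a))$
$\Rightarrow \forall a : f^n(\gamma(a)) = \gamma(f^{\#n}(a))$
$\Rightarrow \forall a \in D_A : \mathrm{lfp}(f^n)(\gamma(a)) = \gamma(\mathrm{lfp}(f^{\#n})(a))$
$\Rightarrow \mathrm{lfp}(f) = \gamma(\mathrm{lfp}(f^\#))$
[Figure: the chain $\bot, f, f^2, f^3, \ldots, f^n, \mathrm{lfp}(f)$ in C alongside the chain $\bot, f^\#, f^{\#2}, f^{\#3}, \ldots, f^{\#n}, \mathrm{lfp}(f^\#)$ in A, related step by step through $\gamma$]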

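Global (forward) completeness
$\forall c \in D_C : \alpha(f(c)) = f^\#(\alpha(c))$
$\Rightarrow \forall c \in D_C : \alpha(f^n(c)) = f^{\#n}(\alpha(c))$
$\Rightarrow \forall c \in D_C : \alpha(\mathrm{lfp}(f)(c)) = \mathrm{lfp}(f^\#)(\alpha(c))$
$\Rightarrow \alpha(\mathrm{lfp}(f)) = \mathrm{lfp}(f^\#)$
[Figure: the chain $\bot, f, f^2, f^3, \ldots, f^n, \mathrm{lfp}(f)$ in C alongside the chain $\bot, f^\#, f^{\#2}, f^{\#3}, \ldots, f^{\#n}, \mathrm{lfp}(f^\#)$ in A, related step by step through $\alpha$]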

see you next time
