Automating Abstract Interpretation Mooly Sagiv Adapted from Thomas Reps VMCAI’2016 Invited Talk
Static Analysis in a Nutshell
Determine information about the possible situations that can arise at execution time, without actually running the program on specific inputs
Typically:
– For each point in the program, find a descriptor that represents (a superset of) the stores that could possibly arise at that point
Correctness of an analysis is justified via abstract interpretation [Cousot & Cousot 77]
Automating Abstract Interpretation
Abstract interpretation
– A “black art” → hard to work with
20-year quest to raise the level of automation in abstract interpretation
– 3-valued logic analysis (TVLA): M. Sagiv, T. Reps, R. Wilhelm, T. Lev-Ami, A. Loginov, R. Manevich
– machine-code analysis (TSL): T. Reps, J. Lim
– symbolic-abstraction algorithms: T. Reps, M. Sagiv, G. Yorsh, A. Thakur
Reps, T. and Thakur, A., “Automating abstract interpretation,” VMCAI, research.cs.wisc.edu/wpis/papers/vmcai16-invited.pdf
[Photos: Radhia Cousot, Patrick Cousot]
What Does It Mean to Automate Parsing?
A parsing-problem instance Parse(L,s) has two inputs
– L = a context-free language
– s = a string to be parsed
The string changes more frequently than the language
A context-free language has a context-free grammar
Yacc (and later, GNU Bison)
– Input: a context-free grammar that describes the language L to be parsed
– Output: a parsing function, yyparse(), for which executing yyparse() on string s computes Parse(L,s)
[Photo: Steve Johnson; source: simple-talk interview]
What Does It Mean to Automate Program Analysis? Follow a similar scheme... But first, why would you even want to invest the time doing so?
Why is Program Analysis Difficult?
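Sidestepping Undecidability
[Figure: the universe of states, containing the set of reachable states and the set of bad states]
Sidestepping Undecidability
[Figure: the same diagram with an overapproximation of the reachable states drawn around them; a point where the overapproximation meets the bad states is marked "False positive!"]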
Why is Program Analysis Difficult?
– Large/unbounded base types: int, float, string
– User-defined types/classes
– Pointers/aliasing + unbounded #’s of heap-allocated cells
– Procedure calls/recursion/calls through pointers/dynamic method lookup/overloading
– Concurrency + unbounded #’s of threads
Sources of Infinity
– Data: unbounded counters, integer variables, lists, queues
– Control structures: procedures, process creation
– Configuration parameters: unbounded number of processes, principals
– Real-time: discrete or continuous time
Some Successes of the Field Static Driver Verifier, a.k.a. SLAM (Microsoft) – Tool for finding possible bugs in Windows device drivers – Complicated back-out protocols in driver APIs when events cancelled or interrupted Astrée (ENS) – Established the absence of run-time errors in Airbus flight software
Example: Parity Analysis
f(a,b) = (16 * b + 3) * (2 * a + 1)
[Figure: expression tree for f, annotated bottom-up with parities: b ↦ ?, a ↦ ?, 16 ↦ E, 3 ↦ O, 2 ↦ E, 1 ↦ O, 16*b ↦ E, 16*b + 3 ↦ O, 2*a ↦ E, 2*a + 1 ↦ O, root ↦ O]
Abstract addition and multiplication on parities:
  +  | ?  O  E        *  | ?  O  E
  ?  | ?  ?  ?        ?  | ?  ?  E
  O  | ?  E  O        O  | ?  O  E
  E  | ?  O  E        E  | E  E  E
Abstract values, such as O, E, and ?, represent potentially infinite collections of concrete values
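The parity example can be replayed in a few lines of Python. This is a minimal sketch, not code from the lecture: the names `parity_of`, `add`, `mul`, and `f_parity` are hypothetical, and the two operator tables are encoded as small functions.

```python
# Abstract parity values: "E" (even), "O" (odd), "?" (unknown parity).
E, O, U = "E", "O", "?"

def parity_of(n: int) -> str:
    """Abstract a concrete integer constant to its parity."""
    return E if n % 2 == 0 else O

def add(p: str, q: str) -> str:
    """Abstract addition: unknown if either operand is unknown."""
    if U in (p, q):
        return U
    return E if p == q else O

def mul(p: str, q: str) -> str:
    """Abstract multiplication: an even factor forces an even product."""
    if E in (p, q):
        return E
    if U in (p, q):
        return U
    return O

def f_parity(a: str, b: str) -> str:
    """Parity of f(a,b) = (16*b + 3) * (2*a + 1) from the parities of a and b."""
    left = add(mul(parity_of(16), b), parity_of(3))
    right = add(mul(parity_of(2), a), parity_of(1))
    return mul(left, right)

print(f_parity(U, U))  # prints O: f(a,b) is odd for all a and b
```

Even with both inputs unknown, the analysis establishes that the result is odd, because the even constants 16 and 2 absorb the unknown operands.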
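Constant Propagation
Program: i = 0; j = 0; while (i … 2) { j = (j+1)/4; i = i+1 }; printf(i,j)
Transformers:
– i = 0: λe.e[i ↦ 0]
– j = 0: λe.e[j ↦ 0]
– j = (j+1)/4: λe.e[j ↦ (e(j)+1)/4]
– i = i+1: λe.e[i ↦ e(i) + 1]
– all other edges: λe.e
First pass: [i ↦ ?, j ↦ ?] at entry, [i ↦ 0, j ↦ ?] after i = 0, [i ↦ 0, j ↦ 0] after j = 0, [i ↦ 0, j ↦ 0] after j = (j+1)/4, [i ↦ 1, j ↦ 0] after i = i+1
Constant Propagation
Program: i = 0; j = 0; while (…) { j = (j+1)/4; i = i+1 }; printf(i,j)
At the fixed point: [i ↦ ?, j ↦ ?] at entry, [i ↦ 0, j ↦ ?] after i = 0, [i ↦ 0, j ↦ 0] after j = 0, [i ↦ ?, j ↦ 0] at the loop head, inside the loop, and at printf(i,j)
Result: i ∈ {…,-2,-1,0,1,2,…}, j ∈ {0}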
What Does It Mean to Automate Abstract Interpretation?
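Abstract Interpretation [CC77]
– α maps a set of concrete states (a subset of the universe of states) to an abstract value; γ maps an abstract value back to the set of states it represents
– α({(x ↦ 2, y ↦ 1), (x ↦ 5, y ↦ 3)}) = [x ↦ [2,5], y ↦ [1,3]]
– γ([x ↦ [2,5], y ↦ [1,3]]) = {(2,1), (2,2), (2,3), (3,1), (3,2), (3,3), (4,1), (4,2), (4,3), (5,1), (5,2), (5,3)}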
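The α/γ pair for this interval example can be sketched as follows. The function names `alpha` and `gamma` and the dictionary encoding of the box are assumptions made for illustration; only the two-variable integer case from the slide is handled.

```python
from itertools import product

def alpha(stores):
    """Abstract a non-empty set of (x, y) stores to one interval per variable."""
    xs = [x for x, _ in stores]
    ys = [y for _, y in stores]
    return {"x": (min(xs), max(xs)), "y": (min(ys), max(ys))}

def gamma(box):
    """Concretize a box: every integer store (x, y) that lies inside it."""
    (x_lo, x_hi), (y_lo, y_hi) = box["x"], box["y"]
    return set(product(range(x_lo, x_hi + 1), range(y_lo, y_hi + 1)))

box = alpha({(2, 1), (5, 3)})
print(box)              # {'x': (2, 5), 'y': (1, 3)}
print(len(gamma(box)))  # 12
```

Two concrete stores go in and twelve come out of γ(α(·)): the ten extra stores are the overapproximation introduced by the interval domain.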
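Best Transformer [CC79]
– For a concrete transformer τ, the best abstract transformer is τ# = α ∘ τ ∘ γ
– Any other safe abstract transformer overapproximates τ#; the difference is a loss of precision
– However, [CC79] gives no algorithm to apply the best transformer or to create the best transformer
[Figure: the universe of states with γ, τ, and α composed to form τ#, next to a less precise safe transformer]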
Why do we need algorithms for best transformers? Enables parametric semantics – X86 – Libraries Domain constructors – Reduced products Basic blocks and Loop free code – Simpler abstract domains
Challenge: Abstract Interpretation is Inherently Non-Compositional
In computer science, we rely on compositionality
– languages are expressed using context-free grammars
– many concepts and properties are defined using inductive definitions
– recursive tree traversals are a basic workhorse
– software is organized into layers
Example: (x + (–x)), evaluated in (x ↦ [5,10], y ↦ [10,20])
– Operator by operator: –x ↦ [-10,-5], so x + (–x) ↦ [-5,5]
– Best answer for the whole expression: [0,0]
– Suppose that you have in hand a collection of “best” abstract-interpretation operators
– Their composition may not provide the best (abstract) answer for the composition of the corresponding concrete operations
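The gap between composing best operators and the best answer for the whole expression can be checked directly. In this sketch, intervals are `(lo, hi)` tuples, `neg` and `add` are the usual best interval operators, and `best_whole` is a hypothetical helper that simply enumerates the concrete values of x.

```python
def neg(i):
    """Best interval operator for unary minus."""
    lo, hi = i
    return (-hi, -lo)

def add(i, j):
    """Best interval operator for addition."""
    return (i[0] + j[0], i[1] + j[1])

def best_whole(x):
    """Best interval for x + (-x), computed by enumerating the concrete values of x."""
    values = {v + (-v) for v in range(x[0], x[1] + 1)}
    return (min(values), max(values))

x = (5, 10)
composed = add(x, neg(x))
print(composed)       # (-5, 5)
print(best_whole(x))  # (0, 0)
```

Each operator is individually optimal, yet the composition forgets that both operands are the same x, which is exactly the non-compositionality the slide describes.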
Predicate Abstraction
Verify the safety of a program over infinite data using a fixed set of predicates (Booleans) P1, P2, …, Pn
The meaning of each predicate is a function from the states into Booleans
– Pi : States → {0, 1}
The program can be conservatively represented by a Boolean program
If the safety property holds in the Boolean program, then it also holds in the original program
A Simple Example X := 0 ; while true do { X := X + 1; assert X > 0 } P 1 = X 0P 2 = X 0 while true do { assert P1 P2 }
A Simple Example (Transition System) X=0X=1X=3X=4X=… X:=0 X:=X+1 X:=0 # X:=X+1 # X:=X+1 # P 1 = X 0P 2 = X 0 Concrete Abstract
Predicate Abstraction: Basics
[Figure: program state space with initial and error states, and its abstraction]
Abstraction: predicates on program state
– Signs: x > 0
– Aliasing: &x ≠ &y
States satisfying the same predicates are equivalent
– Merged into a single abstract state
(Predicate) Abstraction: A crash course
[Figure: program state space with initial and error states, and its abstraction]
Q1: Which predicates are required to verify a property?
Q2: How to compute abstract transformers?
The Predicate Abstraction Domain Fixed set of predicates Pred The meaning of each predicate p i Pred is a closed first order formula f i The relational domain is – Join is set union
A Simple Example int x, y; x = 1; y = 2 ; while (*) do { x = x + y ; } assert x > 0; Predicates: p1 = x > 0 p2 = y 0 bool p1, p2; p1 = true ; p2 = true ; while (*) do { p1 = (p1&&p2 ? 1 : *) } assert p1 ;
Existential Abstraction
Given a transition system M = (S, S0, T) and an abstraction α : S → S#
An abstract transition system M# = (S#, S0#, T#) is an existential abstraction of M w.r.t. α if
– ∀s0 ∈ S0. α(s0) = s0# ⇒ s0# ∈ S0#
– ∀(s, s’) ∈ T. α(s) = s# ∧ α(s’) = s’# ⇒ (s#, s’#) ∈ T#
Minimal Existential Abstraction
Given a transition system M = (S, S0, T) and an abstraction α : S → S#
An abstract transition system M# = (S#, S0#, T#) is a minimal existential abstraction of M w.r.t. α if
– (∃s0 ∈ S0. α(s0) = s0#) ⇔ s0# ∈ S0#
– (∃(s, s’) ∈ T. α(s) = s# ∧ α(s’) = s’#) ⇔ (s#, s’#) ∈ T#
But how does one compute the minimal abstraction?
– Employ a SAT solver
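For a finite, explicitly enumerated transition system the minimal existential abstraction can be computed by mapping every initial state and every transition through α. The sketch below does exactly that; it is an illustration under the assumption of a small finite system (a counter cut off at 5 with a hypothetical two-valued abstraction), whereas the slides use a SAT solver because real systems cannot be enumerated.

```python
def minimal_existential_abstraction(init, trans, alpha):
    """Abstract initial states and transitions are exactly the images under alpha."""
    abs_init = {alpha(s) for s in init}
    abs_trans = {(alpha(s), alpha(t)) for s, t in trans}
    return abs_init, abs_trans

# Hypothetical finite system: a counter that starts at 0 and counts up to 5.
init = {0}
trans = {(n, n + 1) for n in range(5)}

def alpha(n):
    """Assumed abstraction: distinguish only zero from positive values."""
    return "zero" if n == 0 else "pos"

abs_init, abs_trans = minimal_existential_abstraction(init, trans, alpha)
print(abs_init)           # {'zero'}
print(sorted(abs_trans))  # [('pos', 'pos'), ('zero', 'pos')]
```

The abstract system contains no transition back to "zero", so a property such as "the counter is positive after the first step" can be checked on two abstract states.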
The SAT Problem
Given a propositional formula (Boolean function)
– φ = (a ∨ b) ∧ (¬a ∨ ¬b ∨ c)
Determine if φ is valid
Determine if φ is satisfiable
– Find a satisfying assignment or report that none exists
For n variables, there are 2^n possible truth assignments to be checked
But many practical tools exist
[Figure: decision tree branching on a, then b, then c]
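The 2^n enumeration mentioned on this slide can be written out directly. The sketch assumes the example formula reads (a ∨ b) ∧ (¬a ∨ ¬b ∨ c), since the connectives are not legible in the slide text; `satisfying_assignments` is a hypothetical helper, not a real solver API.

```python
from itertools import product

def phi(a, b, c):
    """Assumed reading of the slide's formula: (a or b) and (not a or not b or c)."""
    return (a or b) and ((not a) or (not b) or c)

def satisfying_assignments(formula, n):
    """Check all 2**n truth assignments and keep the ones that satisfy the formula."""
    return [bits for bits in product([False, True], repeat=n) if formula(*bits)]

models = satisfying_assignments(phi, 3)
is_satisfiable = len(models) > 0
is_valid = len(models) == 2 ** 3
print(is_satisfiable, is_valid, len(models))  # True False 5
```

Practical SAT solvers avoid this exhaustive loop through clause learning and propagation, but they answer the same two questions.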
SAT made some progress…
The SMT Problem (Satisfiability Modulo Theories)
Given a quantifier-free first-order formula over some theory
– φ = (3x + y = z) ∧ (f(y) = z)
Determine if φ is valid
Determine if φ is satisfiable
– Find a satisfying assignment or report that none exists
Tools exist
– Z3 (Microsoft)
– CVC
Representing States as Formulas
[F] = states satisfying F = {s | s ⊨ F}, where F is a FO formula over prog. vars
– [F1] ∩ [F2] is represented by F1 ∧ F2
– [F1] ∪ [F2] is represented by F1 ∨ F2
– the complement of [F] is represented by ¬F
– [F1] ⊆ [F2] holds when F1 implies F2, i.e. F1 ∧ ¬F2 is unsatisfiable
A Simple Example (Again) X := 0 ; while true do { X := X + 1; assert X > 0 } P 1 = X 0P 2 = X 0 How can the SMT solver be used to compute the effect of X := X + 1 on P 1 and P 2 ?
Symbolic Operations: Three Value-Spaces
[Figure sequence: three value-spaces (formulas, concrete values, abstract values); a transformer T acts on formulas and on concrete values, and T# acts on abstract values]
– Parity example: formula even(x), abstract value x ↦ E, concrete values 2, 4, 16, …
– Shape-analysis example: a formula, the concrete linked lists pointed to by x, and an abstract structure with nodes u1 and u
Required Primitive Operations
– Abstraction: α(S) = ⊔_{store ∈ S} β(store) [Figure: β maps a concrete list pointed to by x to an abstract structure with nodes u1 and u]
– Symbolic concretization: γ̂([abstract structure with nodes u1 and u]) = ∃v1,v2: node_u1(v1) ∧ node_u(v2) ∧ v1 ≠ v2 ∧ ∀v: node_u1(v) ∨ node_u(v) ...
– Theorem prover returning a satisfying structure (store) S ⊨ φ
Constant-Propagation Domain
(Var → Z^⊤)_⊥, where Z^⊤ = Z ∪ {⊤}
Examples: ⊥, [x ↦ 0, y ↦ 43, z ↦ 0], [x ↦ ⊤, y ↦ ⊤, z ↦ 0], [x ↦ ⊤, y ↦ ⊤, z ↦ ⊤]
Infinite cardinality, but finite height
Three Value-Spaces
– Concrete values: [x ↦ 0, y ↦ 0, z ↦ 0], [x ↦ 0, y ↦ 1, z ↦ 0], [x ↦ 0, y ↦ 2, z ↦ 0]
– Formula: (x = 0) ∧ (z = 0)
– Abstract value: [x ↦ 0, y ↦ ⊤, z ↦ 0]
Required Primitive Operations
– Abstraction: α(S) = ⊔_{store ∈ S} β(store), e.g. β([x ↦ 0, y ↦ 2, z ↦ 0]) = [x ↦ 0, y ↦ 2, z ↦ 0]
– Symbolic concretization: γ̂([x ↦ 0, y ↦ ⊤, z ↦ 0]) = (x = 0) ∧ (z = 0)
– Theorem prover returning a satisfying structure (store) S ⊨ φ, e.g. [x ↦ 0, y ↦ 2, z ↦ 0] ⊨ (x = 0) ∧ (z = 0) and [x ↦ 0, y ↦ 2, z ↦ 0] ⊨ (z = 0) ∧ (x = y*z)
Constant Propagation: x = y * z
– Concrete: [x ↦ 3, y ↦ 4, z ↦ 1] is mapped to [x’ ↦ 4, y’ ↦ 4, z’ ↦ 1]
– As a function: T[x = y * z] = λe.e[x ↦ e(y)*e(z)]
– As a two-store formula: T[x := y*z] =df (x’ = y * z) ∧ (y’ = y) ∧ (z’ = z)
– [x ↦ 3, y ↦ 4, z ↦ 1, x’ ↦ 4, y’ ↦ 4, z’ ↦ 1] ⊨ (x’ = y * z) ∧ (y’ = y) ∧ (z’ = z)
Constant Propagation: x = y * z
– Abstract: [x ↦ 3, y ↦ ⊤, z ↦ 1] is mapped to [x’ ↦ ⊤, y’ ↦ ⊤, z’ ↦ 1]
– T#[x = y * z] = λe.e[x ↦ e(y) *# e(z)]
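A hand-written abstract transformer for x = y * z might look like the sketch below. It assumes that *# returns ⊤ whenever either operand is ⊤; the string `"TOP"` and the names `abs_mul` and `assign_mul` are illustrative choices, not part of the lecture.

```python
TOP = "TOP"  # stand-in for the abstract value ⊤ (not a constant)

def abs_mul(a, b):
    """Assumed definition of *#: the product is a constant only if both operands are."""
    if a == TOP or b == TOP:
        return TOP
    return a * b

def assign_mul(env, target, left, right):
    """T#[target = left * right] = λe.e[target ↦ e(left) *# e(right)]."""
    out = dict(env)
    out[target] = abs_mul(env[left], env[right])
    return out

print(assign_mul({"x": 3, "y": TOP, "z": 1}, "x", "y", "z"))
# {'x': 'TOP', 'y': 'TOP', 'z': 1}
print(assign_mul({"x": TOP, "y": TOP, "z": 0}, "x", "y", "z"))
# {'x': 'TOP', 'y': 'TOP', 'z': 0}
```

The second call shows the precision problem that motivates best transformers: every concrete store with z = 0 ends with x = 0, yet this operator-by-operator transformer reports x ↦ ⊤.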
Constant Propagation
[Figure: control-flow graph with nodes Start, x = 3, if ..., z = 2, y = x, y = z+1, printf(y)]
Transformers:
– Start: λe.⊤
– x = 3: λe.e[x ↦ 3]
– z = 2: λe.e[z ↦ 2]
– y = x: λe.e[y ↦ e(x)]
– y = z+1: λe.e[y ↦ e(z) +# 1]
– branch edges: λe.e
Abstract values on the graph: [x ↦ ⊤, y ↦ ⊤, z ↦ ⊤], [x ↦ 3, y ↦ ⊤, z ↦ ⊤], [x ↦ 3, y ↦ ⊤, z ↦ 2], [x ↦ 3, y ↦ 3, z ↦ 2], [x ↦ 3, y ↦ 3, z ↦ ⊤]
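Abstract Transformer T#[x := y*z]
– Abstract input: [x ↦ ⊤, y ↦ ⊤, z ↦ 0], which represents stores such as {[x ↦ 3, y ↦ 3, z ↦ 0], [x ↦ 7, y ↦ 2, z ↦ 0]}
– T[x := y*z] maps these stores to {[x ↦ 0, y ↦ 3, z ↦ 0], [x ↦ 0, y ↦ 2, z ↦ 0]}, whose abstraction is [x ↦ 0, y ↦ ⊤, z ↦ 0]
– T#[x := y*z] maps [x ↦ ⊤, y ↦ ⊤, z ↦ 0] to [x ↦ ⊤, y ↦ ⊤, z ↦ 0]
Best Abstract Transformer
– γ([x ↦ ⊤, y ↦ ⊤, z ↦ 0]) = {[x ↦ 0, y ↦ 0, z ↦ 0], [x ↦ 1, y ↦ 0, z ↦ 0], ..., [x ↦ 0, y ↦ 1, z ↦ 0], [x ↦ 1, y ↦ 1, z ↦ 0], ...}
– T[x := y*z] maps this set to {[x ↦ 0, y ↦ 0, z ↦ 0], [x ↦ 0, y ↦ 1, z ↦ 0], ...}
– α of the result is [x ↦ 0, y ↦ ⊤, z ↦ 0]
Three Value-Spaces
– γ̂([x ↦ ⊤, y ↦ ⊤, z ↦ 0]) = (z = 0)
– Applying T[x := y*z] to (z = 0) gives (x’ = 0) ∧ (z’ = 0)
– α̂ of that formula is [x’ ↦ 0, y’ ↦ ⊤, z’ ↦ 0]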
Remainder of the Lecture
– α̂(ψ): best abstract value that represents ψ
– Best(T, a): best abstract transformer
Idea Behind Procedure α̂_CP(ψ)
[Figure sequence over the three value-spaces (formulas, concrete values, abstract values): a store S ⊨ ψ is selected and abstracted to β(S); ans is joined with β(S); γ̂(ans) is then excluded from the formula, e.g. φ2 = φ1 ∧ ¬γ̂(ans); the steps repeat with stores S1, S2, … until the remaining formula is unsatisfiable (φ5 = false in the figure), at which point γ̂(ans) covers all of ψ]
Procedure α̂(formula ψ) {
  ans := ⊥
  φ := ψ
  while (φ is satisfiable) {
    Select a store S such that S ⊨ φ
    ans := ans ⊔ β(S)
    φ := φ ∧ ¬γ̂(ans)
  }
  return ans
}
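The procedure can be exercised for the constant-propagation domain with a toy stand-in for the theorem prover. In this sketch, formulas are Python predicates over stores, and `find_model` searches a bounded box of integers instead of calling an SMT solver. That bounded search is an assumption made only to keep the example self-contained: it can miss models outside the box, so it is not a sound decision procedure. All function names are hypothetical.

```python
from itertools import product

TOP, BOT = "TOP", "BOT"  # stand-ins for ⊤ and ⊥

def beta(store):
    """Abstract a single concrete store: every variable is a known constant."""
    return dict(store)

def join(a, b):
    """Least upper bound in the constant-propagation domain."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    return {v: (a[v] if a[v] == b[v] else TOP) for v in a}

def gamma_hat(a):
    """Symbolic concretization, returned as a predicate over stores."""
    if a == BOT:
        return lambda s: False
    return lambda s: all(c == TOP or s[v] == c for v, c in a.items())

def find_model(phi, variables, bound):
    """Toy 'theorem prover': look for a satisfying store with values in [-bound, bound]."""
    for values in product(range(-bound, bound + 1), repeat=len(variables)):
        store = dict(zip(variables, values))
        if phi(store):
            return store
    return None

def alpha_hat(psi, variables, bound=15):
    """Join the abstractions of models of psi until gamma_hat(ans) covers psi."""
    ans = BOT
    phi = psi
    while True:
        store = find_model(phi, variables, bound)
        if store is None:  # phi is unsatisfiable (within the bound)
            return ans
        ans = join(ans, beta(store))
        # phi := phi ∧ ¬γ̂(ans); default arguments freeze the current phi and ans.
        phi = lambda s, prev=phi, covered=gamma_hat(ans): prev(s) and not covered(s)

print(alpha_hat(lambda s: s["y"] == 3 and s["x"] == 4 * s["y"] + 1, ("x", "y")))
# {'x': 13, 'y': 3}
print(alpha_hat(lambda s: s["z"] == 0 and s["x"] == s["y"] * s["z"], ("x", "y", "z"), bound=3))
# {'x': 0, 'y': 'TOP', 'z': 0}
print(alpha_hat(lambda s: s["y"] == s["x"] + (-s["x"]), ("x", "y"), bound=3))
# {'x': 'TOP', 'y': 0}
```

Each iteration strictly raises ans in a domain of finite height, which is why the loop terminates; the particular stores chosen by the search differ from the ones on the slides, but the returned abstract values are the same.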
Example: α̂_CP((y = 3) ∧ (x = 4*y + 1))
Initialization: ans := ⊥; φ := (y = 3) ∧ (x = 4*y + 1)
Iteration 1:
– S := [x ↦ 13, y ↦ 3] // A satisfying store
– ans := ⊥ ⊔ β([x ↦ 13, y ↦ 3]) = [x ↦ 13, y ↦ 3]
– γ̂(ans) = (x = 13) ∧ (y = 3)
– φ := (y = 3) ∧ (x = 4*y + 1) ∧ ¬((x = 13) ∧ (y = 3)) = (y = 3) ∧ (x = 4*y + 1) ∧ ((x ≠ 13) ∨ (y ≠ 3)) = false
Iteration 2: φ is unsatisfiable
Return value: [x ↦ 13, y ↦ 3]
Example: α̂_CP((z = 0) ∧ (x = y * z))
Initialization: ans := ⊥; φ := (z = 0) ∧ (x = y * z)
Iteration 1:
– S := [x ↦ 0, y ↦ 43, z ↦ 0] // A satisfying store
– ans := ⊥ ⊔ β([x ↦ 0, y ↦ 43, z ↦ 0]) = [x ↦ 0, y ↦ 43, z ↦ 0]
– γ̂(ans) = (x = 0) ∧ (y = 43) ∧ (z = 0)
– φ := (z = 0) ∧ (x = y*z) ∧ ¬((x = 0) ∧ (y = 43) ∧ (z = 0)) = (z = 0) ∧ (x = y*z) ∧ (y ≠ 43)
Iteration 2:
– S := [x ↦ 0, y ↦ 46, z ↦ 0] // A satisfying store
– ans := [x ↦ 0, y ↦ 43, z ↦ 0] ⊔ β([x ↦ 0, y ↦ 46, z ↦ 0]) = [x ↦ 0, y ↦ 43, z ↦ 0] ⊔ [x ↦ 0, y ↦ 46, z ↦ 0] = [x ↦ 0, y ↦ ⊤, z ↦ 0]
– γ̂(ans) = (x = 0) ∧ (z = 0)
– φ := (z = 0) ∧ (x = y*z) ∧ (y ≠ 43) ∧ ¬((x = 0) ∧ (z = 0)) = false
Iteration 3: φ is unsatisfiable
Return value: [x ↦ 0, y ↦ ⊤, z ↦ 0]
[Figures: each step drawn over the three value-spaces (formulas, concrete values, abstract values), showing the selected store S, the current ans, and γ̂(ans)]
Example: α̂_CP(y = x + (-x))
Initialization: ans := ⊥; φ := (y = x + (-x))
Iteration 1:
– S := [x ↦ 43, y ↦ 0] // A satisfying store
– ans := ⊥ ⊔ β([x ↦ 43, y ↦ 0]) = [x ↦ 43, y ↦ 0]
– γ̂(ans) = (x = 43) ∧ (y = 0)
– φ := (y = x + (-x)) ∧ ((x ≠ 43) ∨ (y ≠ 0))
Iteration 2:
– S := [x ↦ 46, y ↦ 0] // A satisfying store
– ans := [x ↦ 43, y ↦ 0] ⊔ β([x ↦ 46, y ↦ 0]) = [x ↦ ⊤, y ↦ 0]
– γ̂(ans) = (y = 0)
– φ := (y = x + (-x)) ∧ ((x ≠ 43) ∨ (y ≠ 0)) ∧ (y ≠ 0)
Iteration 3: φ is unsatisfiable
Return value: [x ↦ ⊤, y ↦ 0]
[Figures: each step drawn over the three value-spaces (formulas, concrete values, abstract values), ending with ans = [x ↦ ⊤, y ↦ 0] and an unsatisfiable formula]
The Idea Behind Best(T, a)
[Figure sequence over the value-spaces: the abstract value a is concretized symbolically to γ̂(a); the formula γ̂(a) ∧ T describes all (pre-store, post-store) pairs that start in a; the post-stores are abstracted and joined into ans]
Procedure Best
Best(two-store-formula T, abs-store a) {
  ans’ := ⊥’
  φ := γ̂(a) ∧ T
  while (φ is satisfiable) {
    Select a store pair (S,S’) such that (S,S’) ⊨ φ
    ans’ := ans’ ⊔ β’(S’)
    φ := φ ∧ ¬γ̂’(ans’)
  }
  return ans’
}
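Procedure Best can be sketched in the same style as α̂. Again the "theorem prover" is a bounded brute-force search over pairs of stores, which is an assumption for illustration only and not a sound decision procedure; post-stores are keyed by the unprimed variable names, and every function name is hypothetical.

```python
from itertools import product

TOP, BOT = "TOP", "BOT"  # stand-ins for ⊤ and ⊥

def join(a, b):
    """Least upper bound in the constant-propagation domain."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    return {v: (a[v] if a[v] == b[v] else TOP) for v in a}

def gamma_hat(a):
    """Symbolic concretization, returned as a predicate over stores."""
    if a == BOT:
        return lambda s: False
    return lambda s: all(c == TOP or s[v] == c for v, c in a.items())

def find_pair(phi, variables, bound):
    """Toy solver: search for a (pre, post) store pair with values in [-bound, bound]."""
    n = len(variables)
    for values in product(range(-bound, bound + 1), repeat=2 * n):
        pre = dict(zip(variables, values[:n]))
        post = dict(zip(variables, values[n:]))
        if phi(pre, post):
            return pre, post
    return None

def best(trans, a, variables, bound=2):
    """Best abstract post-state of `a` under the two-store formula `trans`."""
    ans = BOT
    in_a = gamma_hat(a)
    phi = lambda s, t: in_a(s) and trans(s, t)  # φ := γ̂(a) ∧ T
    while True:
        pair = find_pair(phi, variables, bound)
        if pair is None:  # φ is unsatisfiable (within the bound)
            return ans
        _, post = pair
        ans = join(ans, dict(post))  # ans' := ans' ⊔ β'(S')
        # φ := φ ∧ ¬γ̂'(ans'); default arguments freeze the current phi and ans.
        phi = lambda s, t, prev=phi, covered=gamma_hat(ans): prev(s, t) and not covered(t)

def assign_x_yz(s, t):
    """T[x := y*z] as a two-store formula: x' = y*z, y' = y, z' = z."""
    return t["x"] == s["y"] * s["z"] and t["y"] == s["y"] and t["z"] == s["z"]

print(best(assign_x_yz, {"x": TOP, "y": TOP, "z": 0}, ("x", "y", "z")))
# {'x': 0, 'y': 'TOP', 'z': 0}
```

Unlike an operator-by-operator transformer, this procedure discovers that x’ is always 0 when z is 0, because it reasons about the whole formula γ̂(a) ∧ T at once.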
Best((x’ = y * z) ∧ (y’ = y) ∧ (z’ = z), [x ↦ ⊤, y ↦ ⊤, z ↦ 0])
Initialization: ans’ := ⊥’; φ := (z = 0) ∧ (x’ = y * z) ∧ (y’ = y) ∧ (z’ = z)
Iteration 1:
– (S,S’) := [x ↦ 5, y ↦ 17, z ↦ 0, x’ ↦ 0, y’ ↦ 17, z’ ↦ 0]
– ans’ := [x’ ↦ 0, y’ ↦ 17, z’ ↦ 0]
– γ̂’(ans’) = (x’ = 0) ∧ (y’ = 17) ∧ (z’ = 0)
– φ := (z = 0) ∧ (x’ = y*z) ∧ (y’ = y) ∧ (z’ = z) ∧ (y’ ≠ 17)
Iteration 2:
– (S,S’) := [x ↦ 12, y ↦ 99, z ↦ 0, x’ ↦ 0, y’ ↦ 99, z’ ↦ 0]
– ans’ := [x’ ↦ 0, y’ ↦ 17, z’ ↦ 0] ⊔ [x’ ↦ 0, y’ ↦ 99, z’ ↦ 0] = [x’ ↦ 0, y’ ↦ ⊤, z’ ↦ 0]
– γ̂’(ans’) = (x’ = 0) ∧ (z’ = 0)
– φ := (z = 0) ∧ (x’ = y * z) ∧ (y’ = y) ∧ (z’ = z) ∧ (y’ ≠ 17) ∧ ((x’ ≠ 0) ∨ (z’ ≠ 0)) = false
Iteration 3: φ is unsatisfiable
Return value: [x’ ↦ 0, y’ ↦ ⊤, z’ ↦ 0]
[Figure: the pair [x ↦ 5, y ↦ 17, z ↦ 0], [x’ ↦ 0, y’ ↦ 17, z’ ↦ 0] drawn as a model of γ̂(a) ∧ T]
Best(y = x→next, [abstract structure shown in the figure])
[Figure: input abstract structure with nodes u1 and u, x pointing to u1, nodes labeled r[x]; a concrete list with cells u1, u2, u3, u4; output abstract structure over the primed predicates x’, y’, r[x]’, r[y]’]
Fragment of the transition formula: ... (y’(v) ⇔ ∃v1: x(v1) ∧ n(v1,v)) ...
Predicate Abstraction
Program: y := 3; x := 4*y + 1
Predicates: { B1: (y = 1), B2: (y = 3), B3: (y = 4), B4: (x = 1), B5: (x = 3), B6: (x = 4) }
Concrete store: [x ↦ 13, y ↦ 3]
Abstract value over B1, …, B6: y = 3 ∧ x ∉ {1, 3, 4}
Three Value-Spaces
– Abstract value: (¬B1, B2, ¬B3, ¬B4, ¬B5, ¬B6)
– Formula: (y ≠ 1) ∧ (y = 3) ∧ (y ≠ 4) ∧ (x ≠ 1) ∧ (x ≠ 3) ∧ (x ≠ 4)
– Concrete values: [x ↦ 5, y ↦ 3], [x ↦ 0, y ↦ 3], [x ↦ 17, y ↦ 3]
Three Value-Spaces
– Input: (¬B1, B2, ¬B3, ¬B4, ¬B5, ¬B6), i.e. (y ≠ 1) ∧ (y = 3) ∧ (y ≠ 4) ∧ (x ≠ 1) ∧ (x ≠ 3) ∧ (x ≠ 4)
– After T[x := x+1] and α: (¬B1, B2, ¬B3, ¬B6), i.e. (y ≠ 1) ∧ (y = 3) ∧ (y ≠ 4) ∧ (x ≠ 4)
Predicate Abstraction
– Abstract value: (¬B1, B2, ¬B3, ¬B4, ¬B5, ¬B6)
– Apply γ̂, which performs γ symbolically: (y ≠ 1) ∧ (y = 3) ∧ (y ≠ 4) ∧ (x ≠ 1) ∧ (x ≠ 3) ∧ (x ≠ 4)
– Apply α̂T, which implements α ∘ T
α̂_PA: Most-Precise Abstract Value [Predicate Abstraction]
α̂_PA(φ) =
– false, if φ is unsatisfiable
– otherwise the conjunction over j = 1, …, k of: B_j if φ ⇒ B_j is valid; ¬B_j if φ ⇒ ¬B_j is valid; true otherwise
Example: α̂_PA((y = 3) ∧ (x = 4*y + 1)) = ¬B1, B2, ¬B3, ¬B4, ¬B5, ¬B6
– (y = 3) ∧ (x = 4*y + 1) ⇒ ¬(y = 1)
– (y = 3) ∧ (x = 4*y + 1) ⇒ (y = 3)
– (y = 3) ∧ (x = 4*y + 1) ⇒ ¬(y = 4)
– (y = 3) ∧ (x = 4*y + 1) ⇒ ¬(x = 1)
– (y = 3) ∧ (x = 4*y + 1) ⇒ ¬(x = 3)
– (y = 3) ∧ (x = 4*y + 1) ⇒ ¬(x = 4)
[Figure: three value-spaces, with α̂_PA mapping the formula (y = 3) ∧ (x = 4*y + 1) to the abstract value (¬B1, B2, ¬B3, ¬B4, ¬B5, ¬B6)]
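The case analysis for predicate abstraction needs two validity checks per predicate. The sketch below replaces the theorem prover with enumeration over a small finite set of candidate stores, an assumption that is only adequate for this toy example; `alpha_pa` and the `"¬"` prefix encoding are hypothetical.

```python
from itertools import product

def alpha_pa(phi, predicates, stores):
    """Per predicate: keep B_j if phi ⇒ B_j, keep ¬B_j if phi ⇒ ¬B_j, else nothing."""
    models = [s for s in stores if phi(s)]
    if not models:
        return None  # phi is unsatisfiable: the abstract value is false
    result = []
    for name, pred in predicates:
        if all(pred(s) for s in models):
            result.append(name)
        elif all(not pred(s) for s in models):
            result.append("¬" + name)
        # otherwise the predicate contributes `true` and is omitted
    return result

predicates = [
    ("B1", lambda s: s["y"] == 1), ("B2", lambda s: s["y"] == 3),
    ("B3", lambda s: s["y"] == 4), ("B4", lambda s: s["x"] == 1),
    ("B5", lambda s: s["x"] == 3), ("B6", lambda s: s["x"] == 4),
]
# Assumed finite search space standing in for the solver: x in 0..20, y in 0..5.
stores = [{"x": x, "y": y} for x, y in product(range(21), range(6))]

print(alpha_pa(lambda s: s["y"] == 3 and s["x"] == 4 * s["y"] + 1, predicates, stores))
# ['¬B1', 'B2', '¬B3', '¬B4', '¬B5', '¬B6']
```

With k predicates this takes at most 2k validity queries and never needs a satisfying store, which is the contrast with the general procedure drawn on the next slide.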
Procedure α̂_PA vs. General α̂
[Figure: two diagrams over the three value-spaces. α̂_PA works on the formulas φ_i only. The general procedure selects stores S_i ⊨ φ_i, computes ans_i = ans_{i-1} ⊔ β(S_i), and excludes γ̂(ans_{i-1}) from the formula]
Open Questions Infinite height domains Different algorithms for best transformers –Can we go down from –No counter examples –Use symbolic counter examples Can we operate on formulas directly? Lower bounds on the problem of computing the best transformer
Bibliography
– Susanne Graf, Hassen Saïdi: Construction of Abstract State Graphs with PVS. CAV 1997
– Thomas W. Reps, Shmuel Sagiv, Greta Yorsh: Symbolic Implementation of the Best Transformer. VMCAI 2004
– Aditya V. Thakur, Thomas W. Reps: A Generalization of Stålmarck's Method. SAS 2012
– Aditya V. Thakur, Matt Elder, Thomas W. Reps: Bilateral Algorithms for Symbolic Abstraction. SAS 2012
– Vijay D'Silva, Leopold Haller, Daniel Kroening: Abstract satisfaction. POPL 2014
Summary
Requirements
– Finite-height abstract domain
– Theorem prover that returns a satisfying structure (store)
– α(S) = ⊔_{s ∈ S} β(s)
– Symbolic-concretization operation γ̂(a)
Results
– α̂(ψ): best abstract value that represents ψ
– Best(T,a): best abstract transformer