1 Abstract interpretation Giorgio Levi Dipartimento di Informatica, Università di Pisa
2 The general idea
§ a semantics
  • any definition style, from a denotational definition to a detailed interpreter
  • assigning meanings to programs on a suitable concrete domain (concrete computations domain)
§ an abstract domain
  • modeling some properties of concrete computations and forgetting about the remaining information (abstract computations domain)
§ we derive an abstract semantics, which allows us to “execute” the program on the abstract domain to compute its abstract meaning, i.e., the modeled property
3 Concrete and Abstract Domains
§ two complete partial orders
  • the partial orders reflect precision: smaller is better
§ concrete domain $(\wp(C), \subseteq, \{\}, C, \cup, \cap)$
  • has the structure of a powerset (we will see later why)
§ abstract domain $(A, \le, \mathit{bottom}, \mathit{top}, \mathit{lub}, \mathit{glb})$
  • each abstract value is a description of “a set of” concrete values
4 Concretization
concrete domain $(\wp(C), \subseteq, \{\}, C, \cup, \cap)$, abstract domain $(A, \le, \mathit{bottom}, \mathit{top}, \mathit{lub}, \mathit{glb})$
§ the meaning of abstract values is defined by a concretization function $\gamma : A \to \wp(C)$
  • $\forall a \in A$, $\gamma(a)$ is the set of concrete computations described by $a$
  • that’s why the concrete domain needs to be a powerset
§ the concretization function must be monotonic
  • $\forall a_1, a_2 \in A$, $a_1 \le a_2$ implies $\gamma(a_1) \subseteq \gamma(a_2)$
  • concretization preserves relative precision
5 Abstraction
concrete domain $(\wp(C), \subseteq, \{\}, C, \cup, \cap)$, abstract domain $(A, \le, \mathit{bottom}, \mathit{top}, \mathit{lub}, \mathit{glb})$
§ every element of $\wp(C)$ should have a unique “best” (most precise) description in $A$
  • this is possible if and only if $A$ is a Moore family (closed under glb)
  • in such a case, we can define an abstraction function $\alpha : \wp(C) \to A$
  • $\forall c \in \wp(C)$, $\alpha(c)$ is the best abstract description of $c$
§ the abstraction function must be monotonic
  • $\forall c_1, c_2 \in \wp(C)$, $c_1 \subseteq c_2$ implies $\alpha(c_1) \le \alpha(c_2)$
  • abstraction preserves relative precision
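6 Galois connection
concrete domain $(\wp(C), \subseteq, \{\}, C, \cup, \cap)$, abstract domain $(A, \le, \mathit{bottom}, \mathit{top}, \mathit{lub}, \mathit{glb})$
§ $\gamma : A \to \wp(C)$ (concretization) and $\alpha : \wp(C) \to A$ (abstraction) are monotonic
§ Galois connection (insertion)
  • $\forall x \in \wp(C).\ x \subseteq \gamma(\alpha(x))$
  • $\forall y \in A.\ \alpha(\gamma(y)) \le y$ (for an insertion: $\forall y \in A.\ \alpha(\gamma(y)) = y$)
§ $\alpha$ and $\gamma$ mutually determine each other
§ there may be loss of information (approximation) in describing an element of $\wp(C)$ by an element of $A$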
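The statement that the two functions mutually determine each other can be made explicit. The following is the standard characterization for Galois connections between complete lattices, given here as a reminder rather than taken from the slides:

```latex
\alpha(x) = \mathit{glb}\,\{\, y \in A \mid x \subseteq \gamma(y) \,\}
\qquad
\gamma(y) = \bigcup\,\{\, x \in \wp(C) \mid \alpha(x) \le y \,\}
```

In words: the abstraction of x is the most precise abstract value whose concretization covers x, and the concretization of y collects every concrete set whose abstraction is at least as precise as y.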
7 Concrete semantics
§ the concrete semantics is defined as the least (or greatest) fixpoint of a concrete semantic evaluation function $F$ defined on the domain $C$
  • this does not necessarily mean that the semantic definition style is denotational!
§ $F$ is defined in terms of primitive semantic operations $f_i$ on $C$
§ the abstract semantic evaluation function $F^{\alpha}$ is obtained by replacing in $F$ each concrete operation $f_i$ by a suitable abstract operation $f_i^{\alpha}$
§ however, since the actual concrete domain is $\wp(C)$, we first need to lift the concrete semantics $\mathrm{lfp}\,F$ to a collecting semantics defined on $\wp(C)$
8 Collecting semantics
§ lifting $\mathrm{lfp}\,F$ to the powerset (to get the collecting semantics) is simply a conceptual operation
  • collecting semantics = $\{\mathrm{lfp}\,F\}$
  • we don’t need to define a brand new collecting semantic evaluation function $F_c$ on $\wp(C)$
  • we just need to reason in terms of liftings of all the primitive operations (and of $F$), while designing the abstract operations and establishing their properties
§ in the following, by abuse of notation, we will use the same notation for the standard and the collecting (“conceptually” lifted) operations
9 Abstract operations: local correctness
§ an abstract operator $f_i^{\alpha}$ defined on $A$ is locally correct wrt a concrete operator $f_i$ if
  $\forall x_1, \ldots, x_n \in \wp(C).\ f_i(x_1, \ldots, x_n) \subseteq \gamma(f_i^{\alpha}(\alpha(x_1), \ldots, \alpha(x_n)))$
  • the concrete computation step is more precise than the concretization of the “corresponding” abstract computation step
  • a very weak requirement, which is satisfied, for example, by an abstract operator which always computes the worst abstract value top
  • the real issue in the design of abstract operations is therefore precision
10 Abstract operations: optimality and completeness
§ correctness
  $\forall x_1, \ldots, x_n \in \wp(C).\ f_i(x_1, \ldots, x_n) \subseteq \gamma(f_i^{\alpha}(\alpha(x_1), \ldots, \alpha(x_n)))$
§ optimality
  $\forall y_1, \ldots, y_n \in A.\ f_i^{\alpha}(y_1, \ldots, y_n) = \alpha(f_i(\gamma(y_1), \ldots, \gamma(y_n)))$
  • the most precise abstract operator $f_i^{\alpha}$ correct wrt $f_i$
  • a theoretical bound and basis for the design, rather than an implementable definition
§ completeness (exactness or absolute precision)
  $\forall x_1, \ldots, x_n \in \wp(C).\ \alpha(f_i(x_1, \ldots, x_n)) = f_i^{\alpha}(\alpha(x_1), \ldots, \alpha(x_n))$
  • no loss of information: the abstraction of the concrete computation step is exactly the same as the result of the corresponding abstract computation step
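A small worked example may help separate optimality from completeness. It is not from the slides: it assumes C is the set of integers, takes addition (lifted to sets) as the concrete operation, and uses the sign values P (positive), M (negative), Zero and Top that the Sign domain introduces later.

```latex
% Optimal abstract addition applied to P and M (every integer is a sum of a positive and a negative one):
P +^{\alpha} M \;=\; \alpha(\gamma(P) + \gamma(M)) \;=\; \alpha(\mathbb{Z}) \;=\; \mathit{Top}

% Completeness fails for x_1 = \{1\}, x_2 = \{-1\}:
\alpha(\{1\} + \{-1\}) \;=\; \alpha(\{0\}) \;=\; \mathit{Zero}
\;\neq\;
\mathit{Top} \;=\; \alpha(\{1\}) +^{\alpha} \alpha(\{-1\})
```

So an operator can be the best correct one on a given domain and still lose information; completeness is a property of the pair (operation, domain), not only of how carefully the abstract operator is designed.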
11 From local to global correctness
§ the composition of locally correct abstract operations is locally correct wrt the composition of concrete operations
  • composition does not preserve optimality, i.e., the composition of optimal operators may be less precise than the optimal abstract version of the composition
§ if we obtain $F^{\alpha}$ (abstract semantic evaluation function) by replacing in $F$ every concrete semantic operation by a corresponding (locally correct) abstract operation, the local correctness property still holds
  • $\forall x \in \wp(C).\ F(x) \subseteq \gamma(F^{\alpha}(\alpha(x)))$
§ local correctness implies global correctness, i.e., correctness of the abstract semantics wrt the concrete one
  • $\mathrm{lfp}\,F \subseteq \gamma(\mathrm{lfp}\,F^{\alpha})$ and $\mathrm{gfp}\,F \subseteq \gamma(\mathrm{gfp}\,F^{\alpha})$
  • $\alpha(\mathrm{lfp}\,F) \le \mathrm{lfp}\,F^{\alpha}$ and $\alpha(\mathrm{gfp}\,F) \le \mathrm{gfp}\,F^{\alpha}$
§ the abstraction of the concrete semantics is more precise than the abstract semantics
12 $\alpha(\mathrm{lfp}\,F) \le \mathrm{lfp}\,F^{\alpha}$: why compute $\mathrm{lfp}\,F^{\alpha}$?
§ $\mathrm{lfp}\,F$ cannot be computed in finitely many steps
  • infinitely many ($\omega$) steps are in general required
§ $\mathrm{lfp}\,F^{\alpha}$ can be computed in finitely many steps, if the abstract domain is finite or at least noetherian
  • a noetherian domain does not contain infinite increasing chains
  • interesting for static program analysis, where the fixpoint computation must terminate
  • most program properties considered in static analysis are undecidable
  • we accept a loss of precision (safe approximation) in order to make the analysis feasible
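The finite-step computation mentioned above is plain iteration from the bottom element until nothing changes. The sketch below is only an illustration: the function name `lfp`, the example domain (subsets of {0,...,5} represented as sorted lists) and the `step` function are all hypothetical and do not come from the slides.

```ocaml
(* Iterate f starting from x until f x = x.
   Terminates when f is monotonic, x is below f x, and the domain
   has no infinite increasing chains (finite or noetherian). *)
let rec lfp (f : 'a -> 'a) (x : 'a) : 'a =
  let x' = f x in
  if x' = x then x else lfp f x'

(* Hypothetical monotonic function on subsets of {0,...,5}
   (sorted duplicate-free lists): always add 0, and add (n + 2) mod 6
   for every n already in the set. *)
let step (xs : int list) : int list =
  List.sort_uniq compare (0 :: List.map (fun n -> (n + 2) mod 6) xs)

(* Least fixpoint, computed from the bottom element (the empty set). *)
let reachable : int list = lfp step []
```

Starting from the empty set, the iterates are `[]`, `[0]`, `[0; 2]`, `[0; 2; 4]`, and the last one is stable. On an infinite domain with infinite increasing chains the same loop may never stop, which is the reason for the noetherian requirement.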
13 Applications
§ comparative semantics
  • a technique to reason about semantics at different levels of abstraction
    – non-noetherian abstract domain
    – abstraction without approximation (completeness): $\alpha(\mathrm{lfp}\,F) = \mathrm{lfp}\,F^{\alpha}$
§ static analysis = effective computation of the abstract semantics
  • if the abstract domain is noetherian and the abstract operations are computationally feasible
  • if the abstract domain is non-noetherian or if the fixpoint computation is too complex, use widening operators
    – which effectively compute an (upper) approximation of $\mathrm{lfp}\,F^{\alpha}$
      » one example later
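To give a feeling for what a widening operator does on a non-noetherian domain, here is a minimal sketch on integer intervals. The interval domain is not part of these slides; the type `itv`, the functions `join`, `widen`, `add_const`, `iterate_widen` and the example function `f` are all assumptions chosen for illustration, using the textbook interval widening (an unstable bound jumps to infinity).

```ocaml
(* Intervals with float bounds, so that infinite bounds are available. *)
type itv = Empty | Itv of float * float

(* Least upper bound of two intervals. *)
let join (a : itv) (b : itv) : itv =
  match (a, b) with
  | (Empty, x) | (x, Empty) -> x
  | (Itv (l1, u1), Itv (l2, u2)) -> Itv (min l1 l2, max u1 u2)

(* Widening: a bound that moved since the previous iterate is pushed to infinity. *)
let widen (a : itv) (b : itv) : itv =
  match (a, b) with
  | (Empty, x) | (x, Empty) -> x
  | (Itv (l1, u1), Itv (l2, u2)) ->
    Itv ((if l2 < l1 then neg_infinity else l1),
         (if u2 > u1 then infinity else u1))

let add_const (k : float) (a : itv) : itv =
  match a with
  | Empty -> Empty
  | Itv (l, u) -> Itv (l +. k, u +. k)

(* Hypothetical abstract function for "x := 0; loop x := x + 1". *)
let f (x : itv) : itv = join (Itv (0., 0.)) (add_const 1. x)

(* Upward iteration where each new iterate is widened against the previous one. *)
let rec iterate_widen (g : itv -> itv) (x : itv) : itv =
  let x' = widen x (g x) in
  if x' = x then x else iterate_widen g x'

let invariant : itv = iterate_widen f Empty
```

Without widening the iterates would be [0,0], [0,1], [0,2], ... forever; with it the upper bound is pushed to infinity at the second step and the iteration stabilizes on an upper approximation of the least fixpoint.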
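14 The abstract interpretation framework
§ $(\wp(C), \subseteq, \{\}, C, \cup, \cap)$ (concrete domain)
§ $(A, \le, \mathit{bottom}, \mathit{top}, \mathit{lub}, \mathit{glb})$ (abstract domain)
§ $\gamma : A \to \wp(C)$ monotonic (concretization function)
§ $\alpha : \wp(C) \to A$ monotonic (abstraction function)
§ $\forall x \in \wp(C).\ x \subseteq \gamma(\alpha(x))$ and $\forall y \in A.\ \alpha(\gamma(y)) \le y$ (Galois connection)
§ $\forall f_i\ \exists f_i^{\alpha} \mid \forall x_1, \ldots, x_n \in \wp(C).\ f_i(x_1, \ldots, x_n) \subseteq \gamma(f_i^{\alpha}(\alpha(x_1), \ldots, \alpha(x_n)))$ (local correctness)
§ critical choices
  • the abstract domain to model the property
  • the (possibly optimal) correct abstract operations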
15 Other approaches and extensions
§ there exist weaker versions of abstract interpretation
  • without Galois connections (e.g., concretization function only)
  • based on approximation operators (widening, narrowing)
  • without explicit abstract domain (closure operators)
§ the theory also provides several results on abstract domain design
  • how to combine domains
  • how to improve the precision of a domain
  • how to transform an abstract domain into a complete one
  • …
  • we will look at some of these results in the last lecture
16 A simple abstract interpreter computing Signs
§ concrete semantics
  • executable specification (in ML) of the denotational semantics of untyped $\lambda$-calculus without recursion
§ abstract semantics
  • abstract interpreter computing on the domain Sign
17 The language: syntax
§ type ide = Id of string
§ type exp =
    | Eint of int
    | Var of ide
    | Times of exp * exp
    | Ifthenelse of exp * exp * exp
    | Fun of ide * exp
    | Appl of exp * exp
18 A program
    Fun(Id "x",
        Ifthenelse(Var (Id "x"),
                   Times (Var (Id "x"), Var (Id "x")),
                   Times (Var (Id "x"), Eint (-1))))
§ the ML expression
    function x -> if x=0 then x * x else x * (-1)
19 Concrete semantics
§ denotational interpreter
§ eager semantics
§ separation from the main semantic evaluation function of the primitive operations
  • which will then be replaced by their abstract versions
§ abstraction of concrete values
  • identity function in the concrete semantics
§ symbolic “non-deterministic” semantics of the conditional
20 Semantic domains
§ type eval =
    | Funval of (eval -> eval)
    | Int of int
    | Wrong
  let alfa x = x
§ type env = ide -> eval
  let emptyenv (x: ide) = alfa(Wrong)
  let applyenv ((x: env), (y: ide)) = x y
  let bind ((r:env), (l:ide), (e:eval)) (lu:ide) =
    if lu = l then e else r(lu)
21 Semantic evaluation function
§ let rec sem (e:exp) (r:env) = match e with
    | Eint(n) -> alfa(Int(n))
    | Var(i) -> applyenv(r,i)
    | Times(a,b) -> times ( (sem a r), (sem b r))
    | Ifthenelse(a,b,c) ->
        let a1 = sem a r in
        (if valid(a1) then sem b r
         else (if unsatisfiable(a1) then sem c r
               else merge(a1,sem b r,sem c r)))
    | Fun(ii,aa) -> makefun(ii,aa,r)
    | Appl(a,b) -> applyfun(sem a r, sem b r)
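22 Primitive operations
    let times (x,y) = match (x,y) with
      | (Int nx, Int ny) -> Int (nx * ny)
      | _ -> alfa(Wrong)
    (* the guard holds when it evaluates to 0; non-integer guards are neither valid nor unsatisfiable *)
    let valid x = match x with
      | Int n -> n = 0
      | _ -> false
    let unsatisfiable x = match x with
      | Int n -> n <> 0
      | _ -> false
    (* in the concrete semantics merge is reached only when the guard is not an integer *)
    let merge (a,b,c) = match a with
      | Int n -> if b=c then b else alfa(Wrong)
      | _ -> alfa(Wrong)
    let applyfun ((x:eval),(y:eval)) = match x with
      | Funval f -> f y
      | _ -> alfa(Wrong)
    (* makefun and sem are mutually recursive: in a single file, introduce makefun with "and" right after sem *)
    let rec makefun(ii,aa,r) =
      Funval(function d ->
        if d = alfa(Wrong) then alfa(Wrong)
        else sem aa (bind(r,ii,d)))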
23 From the concrete to the collecting semantics
§ the concrete semantic evaluation function
  • sem: exp -> env -> eval
§ the collecting semantic evaluation function
  • semc: exp -> env -> $\wp$(eval)
  • semc e r = {sem e r}
  • all the concrete primitive operations have to be lifted to $\wp$(eval) in the design of the abstract operations
24 Example of concrete evaluation
    # let esempio = sem( Fun (Id "x", Ifthenelse (Var (Id "x"), Times (Var (Id "x"), Var (Id "x")), Times (Var (Id "x"), Eint (-1)))) ) emptyenv;;
    val esempio : eval = Funval <fun>
    # applyfun(esempio,Int 0);;
    - : eval = Int 0
    # applyfun(esempio,Int 1);;
    - : eval = Int (-1)
    # applyfun(esempio,Int(-1));;
    - : eval = Int 1
§ in the “virtual” collecting version
    applyfunc(esempio,{Int 0,Int 1}) = {Int 0, Int -1}
    applyfunc(esempio,{Int 0,Int -1}) = {Int 0, Int 1}
    applyfunc(esempio,{Int -1,Int 1}) = {Int 1, Int -1}
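The pieces of the concrete interpreter can be assembled into one file as in the following sketch. It follows the definitions on the previous slides with three adjustments that are assumptions of this sketch: `makefun` is tied to `sem` with `and` (they are mutually recursive), `valid` and `unsatisfiable` get a catch-all case, and the “virtual” `applyfunc` is realized on finite sets represented as lists.

```ocaml
type ide = Id of string
type exp =
  | Eint of int
  | Var of ide
  | Times of exp * exp
  | Ifthenelse of exp * exp * exp
  | Fun of ide * exp
  | Appl of exp * exp

type eval = Funval of (eval -> eval) | Int of int | Wrong

(* Identity in the concrete semantics. *)
let alfa x = x

type env = ide -> eval
let emptyenv (_ : ide) : eval = alfa Wrong
let applyenv ((r : env), (i : ide)) = r i
let bind ((r : env), (l : ide), (e : eval)) (lu : ide) = if lu = l then e else r lu

let times (x, y) = match (x, y) with
  | (Int nx, Int ny) -> Int (nx * ny)
  | _ -> alfa Wrong
(* The guard holds when it evaluates to 0. *)
let valid x = match x with Int n -> n = 0 | _ -> false
let unsatisfiable x = match x with Int n -> n <> 0 | _ -> false
let merge (a, b, c) = match a with
  | Int _ -> if b = c then b else alfa Wrong
  | _ -> alfa Wrong
let applyfun ((x : eval), (y : eval)) = match x with
  | Funval f -> f y
  | _ -> alfa Wrong

let rec sem (e : exp) (r : env) : eval = match e with
  | Eint n -> alfa (Int n)
  | Var i -> applyenv (r, i)
  | Times (a, b) -> times (sem a r, sem b r)
  | Ifthenelse (a, b, c) ->
    let a1 = sem a r in
    if valid a1 then sem b r
    else if unsatisfiable a1 then sem c r
    else merge (a1, sem b r, sem c r)
  | Fun (ii, aa) -> makefun (ii, aa, r)
  | Appl (a, b) -> applyfun (sem a r, sem b r)
and makefun (ii, aa, r) =
  Funval (fun d -> if d = alfa Wrong then alfa Wrong else sem aa (bind (r, ii, d)))

(* Assumed realization of the "virtual" lifting: apply pointwise to a finite set. *)
let applyfunc ((f : eval), (ys : eval list)) : eval list =
  List.map (fun y -> applyfun (f, y)) ys

let esempio =
  sem (Fun (Id "x",
            Ifthenelse (Var (Id "x"),
                        Times (Var (Id "x"), Var (Id "x")),
                        Times (Var (Id "x"), Eint (-1)))))
    emptyenv
```

With these definitions `applyfunc (esempio, [Int 0; Int 1])` evaluates each element separately, which is exactly the sense in which the collecting semantics is only a conceptual lifting of the standard one.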
25 From the collecting to the abstract semantics
§ concrete domain: ($\wp$(ceval), $\subseteq$)
§ concrete (non-collecting) environment:
  • cenv = ide -> ceval
§ abstract domain: (eval, $\le$)
§ abstract environment:
  • env = ide -> eval
§ the collecting semantic evaluation function
  • semc: exp -> env -> $\wp$(ceval)
§ the abstract semantic evaluation function
  • sem: exp -> env -> eval
26 The Sign Abstract Domain
§ concrete domain ($\wp(Z)$, $\subseteq$): sets of integers
§ abstract domain (Sign, $\le$)
[figure: the Sign lattice]
27 Redefining eval for Sign
    type ceval = Funval of (ceval -> ceval) | Int of int | Wrong
    type eval = Afunval of (eval -> eval) | Top | Bottom | Zero | Zerop | Zerom | P | M
    let alfa x = match x with
      | Wrong -> Top
      | Int n -> if n = 0 then Zero else if n > 0 then P else M
§ the partial order relation $\le$
  • the relation shown in the Sign lattice, extended with its lifting to functions
    – there exist no infinite increasing chains
    – we might add a recursive function construct and find a way to compute the abstract least fixpoint in a finite number of steps
§ lub and glb of eval are the obvious ones
§ concrete domain: ($\wp$(ceval), $\subseteq$, {}, ceval, $\cup$, $\cap$)
§ abstract domain: (eval, $\le$, Bottom, Top, lub, glb)
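28 Concretization function
concrete domain: ($\wp$(ceval), $\subseteq$, {}, ceval, $\cup$, $\cap$), abstract domain: (eval, $\le$, Bottom, Top, lub, glb)
$$
\gamma_s(x) = \begin{cases}
\{\} & \text{if } x = \mathtt{Bottom} \\
\{\mathtt{Int}(y) \mid y > 0\} & \text{if } x = \mathtt{P} \\
\{\mathtt{Int}(y) \mid y \ge 0\} & \text{if } x = \mathtt{Zerop} \\
\{\mathtt{Int}(0)\} & \text{if } x = \mathtt{Zero} \\
\{\mathtt{Int}(y) \mid y \le 0\} & \text{if } x = \mathtt{Zerom} \\
\{\mathtt{Int}(y) \mid y < 0\} & \text{if } x = \mathtt{M} \\
\mathtt{ceval} & \text{if } x = \mathtt{Top} \\
\{\mathtt{Funval}(g) \mid \forall y \in \mathtt{eval}.\ \forall x \in \gamma_s(y).\ g(x) \in \gamma_s(f(y))\} & \text{if } x = \mathtt{Afunval}(f)
\end{cases}
$$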
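29 Abstraction function
concrete domain: ($\wp$(ceval), $\subseteq$, {}, ceval, $\cup$, $\cap$), abstract domain: (eval, $\le$, Bottom, Top, lub, glb)
$$
\alpha_s(y) = \mathit{glb} \left\{ \begin{array}{ll}
\mathtt{Bottom}, & \text{if } y = \{\} \\
\mathtt{M}, & \text{if } y \subseteq \{\mathtt{Int}(z) \mid z < 0\} \\
\mathtt{Zerom}, & \text{if } y \subseteq \{\mathtt{Int}(z) \mid z \le 0\} \\
\mathtt{Zero}, & \text{if } y \subseteq \{\mathtt{Int}(0)\} \\
\mathtt{Zerop}, & \text{if } y \subseteq \{\mathtt{Int}(z) \mid z \ge 0\} \\
\mathtt{P}, & \text{if } y \subseteq \{\mathtt{Int}(z) \mid z > 0\} \\
\mathtt{Top}, & \text{if } y \subseteq \mathtt{ceval} \\
\mathit{lub}\{\mathtt{Afunval}(f) \mid \mathtt{Funval}(g) \in \gamma_s(\mathtt{Afunval}(f))\}, & \text{if } y \subseteq \{\mathtt{Funval}(g)\} \text{ and } \mathtt{Funval}(g) \in y
\end{array} \right\}
$$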
30 Galois connection
§ $\gamma_s$ and $\alpha_s$
  • are monotonic
  • define a Galois connection
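The two Galois-connection conditions can be checked mechanically on a small scale. The sketch below is an illustration under explicit assumptions: it covers only the non-functional values of Sign, represents sets of integers as lists, and replaces the integers by the finite universe {-2,...,2} so that the concretization becomes computable. The names `sign`, `universe`, `member`, `gamma`, `alpha`, `subset`, `extensive` and `insertion` are all hypothetical.

```ocaml
(* Non-functional part of Sign, with the constructor names used on the slides. *)
type sign = Bottom | M | Zero | P | Zerom | Zerop | Top

(* Assumption: a small finite universe stands in for the integers. *)
let universe = [-2; -1; 0; 1; 2]

(* Membership test corresponding to the concretization of each abstract value. *)
let member (a : sign) (n : int) : bool =
  match a with
  | Bottom -> false
  | M -> n < 0
  | Zero -> n = 0
  | P -> n > 0
  | Zerom -> n <= 0
  | Zerop -> n >= 0
  | Top -> true

let gamma (a : sign) : int list = List.filter (member a) universe

(* Best (most precise) description of a finite set of integers. *)
let alpha (xs : int list) : sign =
  if xs = [] then Bottom
  else if List.for_all (fun n -> n = 0) xs then Zero
  else if List.for_all (fun n -> n > 0) xs then P
  else if List.for_all (fun n -> n < 0) xs then M
  else if List.for_all (fun n -> n >= 0) xs then Zerop
  else if List.for_all (fun n -> n <= 0) xs then Zerom
  else Top

let subset xs ys = List.for_all (fun x -> List.mem x ys) xs

(* x is included in gamma (alpha x), for x drawn from the universe. *)
let extensive (xs : int list) : bool = subset xs (gamma (alpha xs))

(* alpha (gamma y) = y for every abstract value: the connection is an insertion. *)
let insertion : bool =
  List.for_all (fun y -> alpha (gamma y) = y) [Bottom; M; Zero; P; Zerom; Zerop; Top]
```

For instance `alpha [0; 2]` is `Zerop`, whose concretization within the universe is `[0; 1; 2]`: the original set is covered, but the information that 1 was absent is lost, which is the approximation the framework allows.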
31 Times on Sign
[table: the abstract times operation on Sign]
§ optimal (hence correct) and complete (no approximation)
32 Abstract operations
§ in addition to times and lub
    let valid x = match x with
      | Zero -> true
      | _ -> false
    let unsatisfiable x = match x with
      | M -> true
      | P -> true
      | _ -> false
    let merge (a,b,c) = match a with
      | Afunval(_) -> Top
      | _ -> lub(b,c)
    let applyfun ((x:eval),(y:eval)) = match x with
      | Afunval f -> f y
      | _ -> alfa(Wrong)
    let rec makefun(ii,aa,r) =
      Afunval(function d ->
        if d = alfa(Wrong) then d
        else sem aa (bind(r,ii,d)))
§ sem is left unchanged
33 An example of abstract evaluation
    # let esempio = sem( Fun (Id "x", Ifthenelse (Var (Id "x"), Times (Var (Id "x"), Var (Id "x")), Times (Var (Id "x"), Eint (-1)))) ) emptyenv;;
    val esempio : eval = Afunval <fun>
    # applyfun(esempio,P);;
    - : eval = M
    # applyfun(esempio,Zero);;
    - : eval = Zero
    # applyfun(esempio,M);;
    - : eval = P
    # applyfun(esempio,Zerop);;
    - : eval = Top
    # applyfun(esempio,Zerom);;
    - : eval = Zerop
    # applyfun(esempio,Top);;
    - : eval = Top
§ for comparison, the “virtual” collecting version
    applyfunc(esempio,{Int 0,Int 1}) = {Int 0, Int -1}
    applyfunc(esempio,{Int 0,Int -1}) = {Int 0, Int 1}
    applyfunc(esempio,{Int -1,Int 1}) = {Int 1, Int -1}
§ wrt the abstraction of the concrete (collecting) semantics, there is approximation for Zerop
§ there are no abstract operations which “invent” the values Zerop and Zerom
  • which are the only ones on which the conditional takes both ways and can introduce approximation
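The session above can be reproduced with a single-file sketch of the abstract interpreter. The slides do not show the definitions of `times` and `lub`, so the versions below are assumptions: `leq` is a hypothetical helper encoding the Sign order on non-functional values, `lub` returns `Top` for incomparable functional values, and `times` returns `Top` whenever an argument is `Top` or a function, because `Top` here also describes `Wrong`. The `Funval` case of `alfa` and the `and` linking `sem` with `makefun` are likewise additions of this sketch.

```ocaml
type ide = Id of string
type exp =
  | Eint of int | Var of ide | Times of exp * exp
  | Ifthenelse of exp * exp * exp | Fun of ide * exp | Appl of exp * exp

type ceval = Funval of (ceval -> ceval) | Int of int | Wrong
type eval = Afunval of (eval -> eval) | Top | Bottom | Zero | Zerop | Zerom | P | M

let alfa x = match x with
  | Wrong -> Top
  | Int n -> if n = 0 then Zero else if n > 0 then P else M
  | Funval _ -> Top (* assumption: this case is not defined on the slides *)

type env = ide -> eval
let emptyenv (_ : ide) : eval = alfa Wrong
let applyenv ((r : env), (i : ide)) = r i
let bind ((r : env), (l : ide), (e : eval)) (lu : ide) = if lu = l then e else r lu

(* Assumed order on Sign; functional values are only compared with Bottom and Top. *)
let leq x y = match (x, y) with
  | (Bottom, _) | (_, Top) -> true
  | (Zero, (Zero | Zerop | Zerom)) | (P, (P | Zerop)) | (M, (M | Zerom))
  | (Zerop, Zerop) | (Zerom, Zerom) -> true
  | _ -> false

let lub (x, y) =
  if leq x y then y
  else if leq y x then x
  else match (x, y) with
    | (Zero, P) | (P, Zero) -> Zerop
    | (Zero, M) | (M, Zero) -> Zerom
    | _ -> Top

(* Assumed abstract times: Top and functions yield Top since the result may be Wrong. *)
let times (x, y) = match (x, y) with
  | (Bottom, _) | (_, Bottom) -> Bottom
  | (Afunval _, _) | (_, Afunval _) | (Top, _) | (_, Top) -> Top
  | (Zero, _) | (_, Zero) -> Zero
  | (P, P) | (M, M) -> P
  | (P, M) | (M, P) -> M
  | (Zerop, Zerop) | (Zerom, Zerom) | (Zerop, P) | (P, Zerop)
  | (Zerom, M) | (M, Zerom) -> Zerop
  | (Zerop, Zerom) | (Zerom, Zerop) | (Zerop, M) | (M, Zerop)
  | (Zerom, P) | (P, Zerom) -> Zerom

let valid x = match x with Zero -> true | _ -> false
let unsatisfiable x = match x with M | P -> true | _ -> false
let merge (a, b, c) = match a with Afunval _ -> Top | _ -> lub (b, c)
let applyfun ((x : eval), (y : eval)) = match x with
  | Afunval f -> f y
  | _ -> alfa Wrong

let rec sem (e : exp) (r : env) : eval = match e with
  | Eint n -> alfa (Int n)
  | Var i -> applyenv (r, i)
  | Times (a, b) -> times (sem a r, sem b r)
  | Ifthenelse (a, b, c) ->
    let a1 = sem a r in
    if valid a1 then sem b r
    else if unsatisfiable a1 then sem c r
    else merge (a1, sem b r, sem c r)
  | Fun (ii, aa) -> makefun (ii, aa, r)
  | Appl (a, b) -> applyfun (sem a r, sem b r)
and makefun (ii, aa, r) =
  Afunval (fun d -> if d = alfa Wrong then d else sem aa (bind (r, ii, d)))

let esempio =
  sem (Fun (Id "x",
            Ifthenelse (Var (Id "x"),
                        Times (Var (Id "x"), Var (Id "x")),
                        Times (Var (Id "x"), Eint (-1)))))
    emptyenv
```

The loss of precision on `Zerop` is visible in the trace: the guard is neither valid nor unsatisfiable, so `merge` takes the lub of the two branches, `Zerop` and `Zerom`, which is `Top`, whereas the abstraction of the collecting result would be `Zerom`.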
34 Recursion
§ the language has no recursion
  • fixpoint computations are not needed
§ if (sets of) functions on the concrete domain are abstracted to functions on the abstract domain, we must be careful in the case of recursive definitions
  • a naïve solution might cause the application of a recursive abstract function to diverge, even if the domain is finite
  • we might never get rid of recursion, because the guard in the conditional is neither valid nor unsatisfiable
  • we cannot explicitly compute the fixpoint, because equivalence on functions cannot be expressed
  • termination can only be obtained by a loop checking mechanism (finitely many different recursive calls)
§ we will see a different solution in a case where (sets of) functions are abstracted to non-functional values
  • the explicit fixpoint computation will then be possible