Program Analysis
Mooly Sagiv, Tel Aviv University
Sunday: Schreiber 8; Monday: Schreiber 317
Textbook: Dataflow Analysis, Chapter 2 and Appendix A
Monotone Frameworks and Precision

Outline
• Lattice Theory
• Monotone Dataflow Frameworks
• Precision of Dataflow Analysis

Lattice Theory
• The foundation of
  – Denotational semantics
  – Program analysis
• Special topology theory
• Generalizes powersets and integers
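
Partial Orders
• Consider a set $P$
• A partial order is a relation $\sqsubseteq : P \times P \to \{false, true\}$ such that:
  – $\sqsubseteq$ is reflexive: $\forall p \in P : p \sqsubseteq p$
  – $\sqsubseteq$ is transitive: $\forall p_1, p_2, p_3 \in P : p_1 \sqsubseteq p_2 \wedge p_2 \sqsubseteq p_3 \Rightarrow p_1 \sqsubseteq p_3$
  – $\sqsubseteq$ is anti-symmetric: $\forall p_1, p_2 \in P : p_1 \sqsubseteq p_2 \wedge p_2 \sqsubseteq p_1 \Rightarrow p_1 = p_2$
• Partially ordered sets (posets): $(P, \sqsubseteq)$
• Examples
  – $(\mathbb{R}, \leq)$
  – $(\mathcal{P}(S), \subseteq)$
  – $(\mathcal{P}(S), \supseteq)$
  – (Alphanumeric strings, lexicographic order)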
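
The three axioms above can be checked mechanically on a finite set. The following is a minimal sketch; the helper name `is_partial_order` and the two sample relations are illustrative assumptions, not part of the lecture.

```python
from itertools import product

def is_partial_order(elements, leq):
    """Check reflexivity, transitivity and anti-symmetry of leq on a finite set."""
    elems = list(elements)
    reflexive = all(leq(p, p) for p in elems)
    transitive = all(
        leq(p1, p3)
        for p1, p2, p3 in product(elems, repeat=3)
        if leq(p1, p2) and leq(p2, p3)
    )
    antisymmetric = all(
        p1 == p2
        for p1, p2 in product(elems, repeat=2)
        if leq(p1, p2) and leq(p2, p1)
    )
    return reflexive and transitive and antisymmetric

# The powerset of {a, b} ordered by inclusion is a poset.
subsets = [frozenset(), frozenset("a"), frozenset("b"), frozenset("ab")]
subset_ok = is_partial_order(subsets, lambda x, y: x <= y)

# "Is no longer than" on strings is reflexive and transitive but not
# anti-symmetric: "ab" and "cd" are related both ways yet differ.
length_ok = is_partial_order(["ab", "cd", "e"], lambda x, y: len(x) <= len(y))
```

The second relation is a preorder only, which is why anti-symmetry is listed as a separate requirement.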

Upper Bounds
• Consider a poset $(P, \sqsubseteq)$
• An element $u \in P$ is an upper bound of a subset $S \subseteq P$ if $\forall s \in S : s \sqsubseteq u$
• An element $u \in P$ is a least upper bound of a subset $S \subseteq P$ if
  – $u$ is an upper bound of $S$
  – For every upper bound $u'$ of $S$: $u \sqsubseteq u'$
• The least upper bound of $S$ is unique if it exists (denoted by $\bigsqcup S$)
• For $S = \{p_1, p_2\}$: $p_1 \sqcup p_2 = \bigsqcup \{p_1, p_2\}$

Lower Bounds
• Consider a poset $(P, \sqsubseteq)$
• An element $l \in P$ is a lower bound of a subset $S \subseteq P$ if $\forall s \in S : l \sqsubseteq s$
• An element $l \in P$ is a greatest lower bound of a subset $S \subseteq P$ if
  – $l$ is a lower bound of $S$
  – For every lower bound $l'$ of $S$: $l' \sqsubseteq l$
• The greatest lower bound of $S$ is unique if it exists (denoted by $\bigsqcap S$)
• For $S = \{p_1, p_2\}$: $p_1 \sqcap p_2 = \bigsqcap \{p_1, p_2\}$
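
For a finite poset, the definitions of least upper bound and greatest lower bound translate directly into a search. The sketch below is illustrative only; the function names and the divisibility example are assumptions chosen for the demonstration.

```python
def least_upper_bound(poset, leq, subset):
    """Return the least upper bound of subset in a finite poset, or None if absent."""
    uppers = [u for u in poset if all(leq(s, u) for s in subset)]
    least = [u for u in uppers if all(leq(u, v) for v in uppers)]
    return least[0] if least else None

def greatest_lower_bound(poset, leq, subset):
    """Return the greatest lower bound of subset in a finite poset, or None if absent."""
    lowers = [l for l in poset if all(leq(l, s) for s in subset)]
    greatest = [l for l in lowers if all(leq(m, l) for m in lowers)]
    return greatest[0] if greatest else None

def divides(a, b):
    return b % a == 0

# Divisors of 12 ordered by divisibility: lub is lcm, glb is gcd.
divisors_of_12 = [1, 2, 3, 4, 6, 12]
lub = least_upper_bound(divisors_of_12, divides, [4, 6])
glb = greatest_lower_bound(divisors_of_12, divides, [4, 6])

# In the poset {1, 2, 3} the elements 2 and 3 have no upper bound at all.
no_lub = least_upper_bound([1, 2, 3], divides, [2, 3])
```

Anti-symmetry guarantees that the candidate lists `least` and `greatest` hold at most one element, which is the uniqueness claim on the slides.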
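
Complete Lattices
• A poset $(L, \sqsubseteq)$ in which $\bigsqcup S$ and $\bigsqcap S$ are both defined for every subset $S \subseteq L$ is called a complete lattice
• Denoted by $(L, \sqsubseteq) = (L, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$
  – $\bot$ is the minimum value
  – $\bot = \bigsqcup \emptyset = \bigsqcap L$
  – $\top$ is the maximum value
  – $\top = \bigsqcup L = \bigsqcap \emptyset$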

Lattices in Program Analysis
• The poset $(L, \sqsubseteq)$ describes "potential pieces of abstract information" (known when the analysis begins)
• $l_1 \sqsubseteq l_2$
  – $l_1$ is at least as precise as $l_2$
  – $l_2$ describes at least the program states described by $l_1$
  – $\bot$ describes an empty set of program states
  – $\top$ describes all the program states (trivial solution)
• $l_1 \sqcup l_2$ is the effect of integrating $l_1$ and $l_2$ from different control-flow paths
• $l_1 \sqcap l_2$ is the effect of integrating $l_1$ and $l_2$ from the same control-flow path

Lemma A.2
• Given a poset $(P, \sqsubseteq)$, the following claims are equivalent:
  – (i) $P$ is a complete lattice
  – (ii) for every subset $S \subseteq P$, $\bigsqcup S$ is defined
  – (iii) for every subset $S \subseteq P$, $\bigsqcap S$ is defined
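
The step from (ii) to (iii) rests on a standard construction, sketched here for reference: the greatest lower bound of $S$ can be taken as the least upper bound of the set of lower bounds of $S$.

```latex
\[
\bigsqcap S \;=\; \bigsqcup \{\, l \in P \mid \forall s \in S : l \sqsubseteq s \,\}
\]
```

Every $s \in S$ is an upper bound of the set on the right, so the join is below each $s$ and is therefore a lower bound of $S$. Every lower bound of $S$ belongs to that set, so it is below the join. The direction from (iii) to (ii) is the dual argument.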

Chains
• Consider a poset $(P, \sqsubseteq)$
• A chain is a subset $S \subseteq P$ which is totally ordered
  – for every $s_1, s_2 \in S$: $s_1 \sqsubseteq s_2$ or $s_2 \sqsubseteq s_1$
• $P$ satisfies the ascending chain condition if all the ascending chains in $P$ are finite
• $P$ has a finite height $h$ if all chains contain at most $h+1$ elements
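
As an illustration of finite height, the sketch below computes the longest chain of a small powerset lattice by brute force. The helper names `powerset` and `height` are assumptions made for this example.

```python
from itertools import combinations

def powerset(base):
    items = sorted(base)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def height(poset, strictly_less):
    """Number of elements in the longest chain of a finite poset, minus one."""
    memo = {}

    def longest_from(x):
        # Number of elements in the longest ascending chain that starts at x.
        if x not in memo:
            above = (longest_from(y) for y in poset if strictly_less(x, y))
            memo[x] = 1 + max(above, default=0)
        return memo[x]

    return max(longest_from(x) for x in poset) - 1

# P({x, y, z}) ordered by inclusion; frozenset's < is proper inclusion.
lattice = powerset({"x", "y", "z"})
h = height(lattice, lambda a, b: a < b)
```

Here the longest chains add one variable at a time, so they have four elements and the height equals the size of the base set.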

Construction of Complete Lattices
• It is possible to construct lattices from other lattices (like compound data types)
• Allows natural generalizations of static analysis algorithms
• Examples:
  – Cartesian products
  – Total function space
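
Cartesian Products
• Consider lattices
  – $(L_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$
  – $(L_2, \sqsubseteq_2, \sqcup_2, \sqcap_2, \bot_2, \top_2)$
• Define $L = (L_1 \times L_2, \sqsubseteq)$ where $(l_1, l_2) \sqsubseteq (u_1, u_2)$ if $l_1 \sqsubseteq_1 u_1$ and $l_2 \sqsubseteq_2 u_2$
• $L$ is a complete lattice
  – $\bigsqcup S = (\bigsqcup_1 \{ l_1 : \exists l_2 : (l_1, l_2) \in S \}, \bigsqcup_2 \{ l_2 : \exists l_1 : (l_1, l_2) \in S \})$
  – $\bigsqcap S = (\bigsqcap_1 \{ l_1 : \exists l_2 : (l_1, l_2) \in S \}, \bigsqcap_2 \{ l_2 : \exists l_1 : (l_1, l_2) \in S \})$
  – $\bot = (\bot_1, \bot_2)$
  – $\top = (\top_1, \top_2)$
  – If $L_1$ has a finite height $h_1$ and $L_2$ has a finite height $h_2$ then...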
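
The componentwise definitions can be sketched in a few lines. This example assumes two small concrete lattices picked for illustration: $L_1$ is the powerset of {a, b} under inclusion and $L_2$ is the chain 0 ≤ 1 ≤ 2 ≤ 3; the function names are hypothetical.

```python
from functools import reduce

def product_join(join1, join2, bottom1, bottom2, pairs):
    """Least upper bound of a set of pairs in L1 x L2, computed componentwise."""
    first = reduce(join1, (l1 for l1, _ in pairs), bottom1)
    second = reduce(join2, (l2 for _, l2 in pairs), bottom2)
    return (first, second)

def product_leq(leq1, leq2, a, b):
    """(l1, l2) is below (u1, u2) iff both components are below."""
    return leq1(a[0], b[0]) and leq2(a[1], b[1])

# L1 = P({a, b}) with union as join; L2 = {0, 1, 2, 3} with max as join.
S = [(frozenset("a"), 1), (frozenset("b"), 3), (frozenset(), 2)]
join = product_join(frozenset.union, max, frozenset(), 0, S)
```

Starting each fold from the component's bottom element makes the empty set of pairs map to $(\bot_1, \bot_2)$, matching $\bot = \bigsqcup \emptyset$.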
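
Total Function Space
• Consider
  – A lattice $(L_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$
  – A set $S$
• Define $L = (S \to L_1, \sqsubseteq)$ where $f_1 \sqsubseteq f_2$ if for every $s \in S$: $f_1(s) \sqsubseteq_1 f_2(s)$
• $L$ is a complete lattice
  – $(\bigsqcup Y)(s) = \bigsqcup_1 \{ f(s) : f \in Y \}$
  – $(\bigsqcap Y)(s) = \bigsqcap_1 \{ f(s) : f \in Y \}$
  – $\bot(s) = \bot_1$
  – $\top(s) = \top_1$
  – If $L_1$ has a finite height $h_1$ and $S$ is finite then...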
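
When $S$ is finite, a total function can be stored as a dictionary and the pointwise definitions become loops over the keys. The sketch below assumes $L_1$ is the chain 0 ≤ 1 ≤ 2 with `max` as join; the names are illustrative.

```python
def pointwise_join(join1, bottom1, domain, functions):
    """(join of Y)(s) = join1 of { f(s) : f in Y }, for functions stored as dicts."""
    result = {}
    for s in domain:
        value = bottom1
        for f in functions:
            value = join1(value, f[s])
        result[s] = value
    return result

def pointwise_leq(leq1, domain, f1, f2):
    """f1 is below f2 iff f1(s) is below f2(s) for every s in the domain."""
    return all(leq1(f1[s], f2[s]) for s in domain)

domain = ["x", "y"]
f = {"x": 1, "y": 0}
g = {"x": 0, "y": 2}
upper = pointwise_join(max, 0, domain, [f, g])
```

Note that `f` and `g` are incomparable, yet both lie below `upper`, which is the usual situation when merging information from two control-flow paths.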
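
Properties of Functions
• Consider a function $f : L_1 \to L_2$ where $(L_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$ and $(L_2, \sqsubseteq_2, \sqcup_2, \sqcap_2, \bot_2, \top_2)$ are complete lattices
• $f$ is strict if $f(\bot_1) = \bot_2$
• $f$ is monotone (or order-preserving) if $\forall s_1, b_1 \in L_1 : s_1 \sqsubseteq_1 b_1 \Rightarrow f(s_1) \sqsubseteq_2 f(b_1)$
• $f$ is additive (or distributive) if $\forall s_1, b_1 \in L_1 : f(s_1 \sqcup_1 b_1) = f(s_1) \sqcup_2 f(b_1)$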
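
Monotonicity and additivity are different properties, and a small exhaustive check makes the gap visible. The sketch below works over the powerset of {a, b, c}; the two sample functions `kill_gen` and `both` are assumptions invented for the comparison.

```python
from itertools import combinations, product

def powerset(base):
    items = sorted(base)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_monotone(lattice, f):
    return all(f(s) <= f(b) for s, b in product(lattice, repeat=2) if s <= b)

def is_additive(lattice, f):
    return all(f(s | b) == f(s) | f(b) for s, b in product(lattice, repeat=2))

L = powerset({"a", "b", "c"})

def kill_gen(x):
    # Kill/gen shape: remove "a", then add "c".
    return (x - {"a"}) | {"c"}

def both(x):
    # Produces "c" only when "a" and "b" are present together.
    return frozenset({"c"}) if {"a", "b"} <= x else frozenset()

kill_gen_props = (is_monotone(L, kill_gen), is_additive(L, kill_gen))
both_props = (is_monotone(L, both), is_additive(L, both))
```

`both` is monotone but not additive: joining {a} with {b} before applying it yields {c}, while applying it to each argument separately yields the empty set. `kill_gen` is not strict, since it maps the empty set to {c}.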

Fixed Points
• Consider a function $f : L \to L$ where $(L, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ is a complete lattice
• Let $Fix(f)$ be the set of fixed points of $f$: $Fix(f) = \{ l \mid f(l) = l \}$
  – $lfp(f)$ is the least element in $Fix(f)$ (unique if it exists)
  – $gfp(f)$ is the greatest element in $Fix(f)$ (unique if it exists)
• Let $Pre(f)$ be the set of pre-fixed points of $f$: $Pre(f) = \{ l \mid f(l) \sqsubseteq l \}$ (also written $Red(f)$)
• Let $Post(f)$ be the set of post-fixed points of $f$: $Post(f) = \{ l \mid l \sqsubseteq f(l) \}$ (also written $Ext(f)$)
• Tarski's Theorem: if $f$ is monotone then:
  – $lfp(f) = \bigsqcap Pre(f)$
  – $gfp(f) = \bigsqcup Post(f)$

Constructive Version of Tarski's Theorem
• Define the sequence:
  – $l_0 = \bot$
  – $l_{i+1} = f(l_i)$
• $l_i \sqsubseteq lfp(f)$
• If $L$ has height $h$ then $l_h = lfp(f)$
• Improvements
  – Stop when no more changes occur
  – Chaotic iterations
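
The sequence $l_0, l_1, \ldots$ together with the "stop when no more changes occur" improvement can be sketched as a short loop. The reachability example, the graph `edges`, and the function names are assumptions chosen for illustration; `step` is monotone on the powerset of nodes.

```python
def lfp(f, bottom):
    """Iterate l0 = bottom, l_{i+1} = f(l_i) until the value stops changing."""
    current = bottom
    steps = 0
    while True:
        nxt = f(current)
        if nxt == current:
            return current, steps
        current, steps = nxt, steps + 1

# Nodes reachable from "a", expressed as a least fixed point over P(nodes).
edges = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"a"}}

def step(reached):
    successors = {m for n in reached for m in edges[n]}
    return frozenset({"a"}) | successors

reachable, iterations = lfp(step, frozenset())
```

The loop terminates only when the ascending chain stabilizes, which is guaranteed here because the lattice has height 4 (four nodes); the node "d" never enters the result because nothing reaches it.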
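
Monotone Frameworks
• Generalizes Kill/Gen problems
• A complete lattice $(L, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ describes the "potential pieces of information"
• The initial value at entry is specified by $\iota \in L$
• The effect of every basic block at $l$ is described by a monotone function $f_l : L \to L$ (transfer function)
• Solve the following system of equations (forward):
  – $DF_{entry}(l) = \iota_l \sqcup \bigsqcup \{ DF_{exit}(l') \mid (l', l) \in flow(S_*) \}$, where $\iota_l = \iota$ if $l = init(S_*)$ and $\iota_l = \bot$ otherwise
  – $DF_{exit}(l) = f_l(DF_{entry}(l))$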

Instances of Monotone Frameworks
• Kill/Gen problems
  – $\sqcup = \cup$ or $\sqcup = \cap$
  – $f_l(entry(l)) = (entry(l) - kill(l)) \cup gen(l)$
• May-be-uninitialized (garbage) variables
• Constant propagation
• Truly-live variables
• Points-to analysis

May-be-garbage variables
• A variable may-be-garbage at a label $l$ if there may be a path to $l$ in which it is either uninitialized or set using an uninitialized variable
• Example:
    [x := 5]^1;
    if [z > 2]^2 then [y := 17]^3 else [skip]^4;
    [t := y + x]^5;

May-be-garbage variables (cont.)
• $L = (\mathcal{P}(Var_*), \subseteq, \cup, \cap, \emptyset, Var_*)$
• Initial value $\iota = Var_*$
• Transfer functions
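
Constant Propagation
• Determine variables with constant values
• Information lattice
  – Extended integer lattice $(L_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$
    » $L_1 = \mathbb{Z} \cup \{\bot_1, \top_1\}$
    » $\bot_1 \sqsubseteq_1 z \sqsubseteq_1 \top_1$ for every $z \in \mathbb{Z}$
  – Define $L = (S \to L_1, \sqsubseteq)$ where $S = Var_*$
• Transfer functions $A_{cp} : AExp \to (L \to L_1)$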
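
A minimal sketch of the extended integer lattice and of an abstract expression evaluator in the spirit of $A_{cp}$ follows. The expression encoding (integers, variable names, and `(op, left, right)` tuples), the string markers for $\bot_1$ and $\top_1$, and the helper names are all assumptions made for this example, not the lecture's definitions.

```python
BOTTOM, TOP = "bottom", "top"  # hypothetical markers for the two extra elements

def join1(a, b):
    """Join in the flat lattice: equal constants stay, different constants go to TOP."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP

def eval_abstract(expr, env):
    """Evaluate an expression over abstract values; env maps variables to L1."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    a, b = eval_abstract(left, env), eval_abstract(right, env)
    if BOTTOM in (a, b):
        return BOTTOM
    if TOP in (a, b):
        return TOP
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unsupported operator: {op}")

def assign(var, expr, env):
    """Transfer function of an assignment [var := expr]; returns a new environment."""
    new_env = dict(env)
    new_env[var] = eval_abstract(expr, env)
    return new_env

env = {"x": 2, "y": TOP}
known = assign("z", ("+", "x", 3), env)      # z becomes the constant 5
unknown = assign("z", ("+", "x", "y"), env)  # z is not a constant
```

Environments here play the role of elements of $S \to L_1$, and merging two environments at a control-flow join would apply `join1` variable by variable.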

Chaotic Iterations
for $l \in Lab_*$ do
    $DF_{entry}(l) := \bot$
    $DF_{exit}(l) := \bot$
$DF_{entry}(init(S_*)) := \iota$
$WL := Lab_*$
while $WL \neq \emptyset$ do
    select and remove an arbitrary $l \in WL$
    $temp := f_l(DF_{entry}(l))$
    if $temp \neq DF_{exit}(l)$ then
        $DF_{exit}(l) := temp$
        for $l'$ such that $(l, l') \in flow(S_*)$ do
            $DF_{entry}(l') := DF_{entry}(l') \sqcup DF_{exit}(l)$
            $WL := WL \cup \{l'\}$
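
The worklist algorithm can be sketched in Python and run on the five-label may-be-garbage example program. The transfer functions below are an assumption derived from the informal definition (an assignment makes its target garbage if any variable it reads may be garbage, and clean otherwise; tests and skip change nothing); the function and variable names are hypothetical.

```python
def chaotic_iteration(labels, flow, init, iota, bottom, join, transfer):
    """Worklist solver for the forward equations; returns (entry, exit) maps."""
    entry = {l: bottom for l in labels}
    exit_ = {l: bottom for l in labels}
    entry[init] = iota
    worklist = set(labels)
    while worklist:
        l = worklist.pop()  # arbitrary selection
        temp = transfer[l](entry[l])
        if temp != exit_[l]:
            exit_[l] = temp
            for src, dst in flow:
                if src == l:
                    entry[dst] = join(entry[dst], exit_[l])
                    worklist.add(dst)
    return entry, exit_

def assign(var, used):
    """Assumed transfer function for [var := a], where used = variables read by a."""
    used = frozenset(used)
    def f(garbage):
        if used & garbage:
            return garbage | {var}
        return garbage - {var}
    return f

def identity(garbage):
    return garbage

# [x := 5]^1; if [z > 2]^2 then [y := 17]^3 else [skip]^4; [t := y + x]^5
labels = [1, 2, 3, 4, 5]
flow = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5)]
transfer = {
    1: assign("x", []),
    2: identity,
    3: assign("y", []),
    4: identity,
    5: assign("t", ["y", "x"]),
}
all_vars = frozenset({"x", "y", "z", "t"})
entry, exit_ = chaotic_iteration(
    labels, flow, 1, all_vars, frozenset(), frozenset.union, transfer
)
```

Under these assumed transfer functions, `y` is still possibly garbage at the entry of label 5 because the else branch never assigns it, so `t` is possibly garbage after label 5. The result does not depend on the order in which `set.pop` selects labels, since the transfer functions are monotone.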

Complexity of Chaotic Iterations
• Parameters:
  – $|Lab|$ labels
  – $k$ is the maximum outdegree of $flow(S_*)$
  – A lattice of height $h$
  – $c$ is the maximum cost of
    » applying $f_l$
    » $\sqcup$
    » $L$ comparisons
• Complexity: $O(|Lab| \cdot h \cdot c \cdot k)$

Soundness of Chaotic Iterations
• Define an abstraction $\alpha$ : Collecting-States $\to L$
• Show that for every $l$:
  – $\alpha(\{ [b]_l(s) \mid s \in CS \}) \sqsubseteq f_l(\alpha(CS))$
• Conclude that the DF solution of chaotic iterations satisfies for every $l$:
  – $\alpha(CS_{entry}(l)) \sqsubseteq DF_{entry}(l)$
  – $\alpha(CS_{exit}(l)) \sqsubseteq DF_{exit}(l)$
• But it may be that chaotic iterations yield $DF_{entry}(l) = \top$ and yet $\alpha(CS_{entry}(l)) = \bot$
• How to measure precision?

Precision of Chaotic Iterations
• Optimal
  – $\alpha(CS_{entry}(l)) = DF_{entry}(l)$
  – $\alpha(CS_{exit}(l)) = DF_{exit}(l)$
• Join-over-all-paths: no loss of information w.r.t. straight-line code
• Relatively optimal (induced) w.r.t. the abstraction
• Compare at run-time
• Good enough for the intended optimization

The Join-Over-All-Paths (JOP)
• Let $paths(init(S_*), l)$ denote the potentially infinite set of paths from $init(S_*)$ to $l$ (written as sequences of labels)
• For a sequence of labels $[l_1, l_2, \ldots, l_n]$ define $f_{[l_1, l_2, \ldots, l_n]} : L \to L$ by composing the effects of basic blocks:
  – $f_{[l_1, l_2, \ldots, l_n]}(s) = f_{l_n}(\ldots(f_{l_2}(f_{l_1}(s)))\ldots)$
• $JOP_l = \bigsqcup \{ f_{[l_1, l_2, \ldots, l]}(\iota) \mid [l_1, l_2, \ldots, l] \in paths(init(S_*), l) \}$

JOP vs. Least Solution
• The DF solution obtained by chaotic iterations satisfies for every $l$:
  – $JOP_l \sqsubseteq DF_{entry}(l)$
• If every $f_l$ is additive (distributive) for all the labels $l$:
  – $JOP_l = DF_{entry}(l)$
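
Constant propagation is a standard case where the inequality is strict, because addition over the flat lattice is monotone but not additive. The sketch below uses a hypothetical acyclic program, not taken from the lecture: label 1 is a test, labels 2 and 3 set `x := 2; y := 3`, labels 4 and 5 set `x := 3; y := 2`, label 6 computes `z := x + y`, and label 7 is the point of interest. Paths are taken as the labels executed before reaching the target, which is an assumption about the intended reading of $paths$.

```python
from functools import reduce

TOP = "top"  # "not a constant"; bottom is not needed for this acyclic example

def join_value(a, b):
    return a if a == b else TOP

def join_env(e1, e2):
    return {v: join_value(e1[v], e2[v]) for v in e1}

def add(a, b):
    return TOP if TOP in (a, b) else a + b

transfer = {
    1: lambda e: e,
    2: lambda e: {**e, "x": 2},
    3: lambda e: {**e, "y": 3},
    4: lambda e: {**e, "x": 3},
    5: lambda e: {**e, "y": 2},
    6: lambda e: {**e, "z": add(e["x"], e["y"])},
}
flow = [(1, 2), (2, 3), (3, 6), (1, 4), (4, 5), (5, 6), (6, 7)]
iota = {"x": TOP, "y": TOP, "z": TOP}

def paths_to(target, start=1):
    """Label sequences executed from start before reaching target (graph is acyclic)."""
    if target == start:
        return [[]]
    return [p + [src] for src, dst in flow if dst == target for p in paths_to(src, start)]

def jop(target):
    """Join over all paths: run each path separately, then join the results."""
    results = []
    for path in paths_to(target):
        env = iota
        for l in path:
            env = transfer[l](env)
        results.append(env)
    return reduce(join_env, results)

def df_entry(target):
    """Solve the entry/exit equations; labels are numbered in topological order."""
    entry, exit_ = {1: iota}, {}
    for l in range(1, target + 1):
        if l != 1:
            entry[l] = reduce(join_env, [exit_[src] for src, dst in flow if dst == l])
        if l in transfer:
            exit_[l] = transfer[l](entry[l])
    return entry[target]

jop_7 = jop(7)
df_7 = df_entry(7)
```

Along each path `z` evaluates to 5, so the join over paths keeps `z = 5`. The equation-based solution joins `x` and `y` before label 6, loses both constants, and reports `z` as non-constant, which is a sound but less precise answer.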

Static Analysis Problems beyond Monotone Frameworks
• Infinite heights
  – Integer intervals
  – Linear relationships between variables
• Bi-directional problems
• Procedures

Conclusions
• Many dataflow problems can be solved via the Chaotic Iteration Algorithm
• Provide a tool to understand precision