Course Outline
Traditional Static Program Analysis
– Theory
   Compiler Optimizations; Control Flow Graphs
   Data-flow Analysis
   Data-flow Frameworks (today's class)
– Specific Analyses, Applications, etc.
Software Testing
Dynamic Program Analysis
Announcement
Handout Homework 1, due February 17th
Outline
The four classical data-flow problems, continued
– Solving data-flow problems
Data-flow frameworks
Reading: Compilers: Principles, Techniques, and Tools, by Aho, Lam, Sethi and Ullman, Chapters 9.2 and 9.3
Dataflow Problems
                     May Problems              Must Problems
Forward Problems     Reaching Definitions      Available Expressions
Backward Problems    Live Uses of Variables    Very Busy Expressions
Similarities
There is a finite set U of dataflow facts:
– Reaching Definitions: the set of all definitions in the program
– Live Uses of Variables: the set of all variables
– Available Expressions and Very Busy Expressions: the set of all expressions in the program
The solution at a program point i (i.e., in(i), out(i)) is a subset of U (e.g., each definition either reaches program point i or does not).
Similarities
Dataflow equations are of the form: out(i) = (in(i) − kill(i)) ∪ gen(i)
Dataflow equations define transfer functions:
– Transfer function F_i takes in(i) and computes out(i): out(i) = F_i(in(i))
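The gen/kill form above can be sketched as a small Python helper. This is a minimal illustration only: the name make_transfer and the sample node are assumptions made for the example, and definitions are written as (variable, node) pairs.

```python
def make_transfer(gen, kill):
    """Build F_i for one node, where F_i(in) = (in - kill) | gen."""
    gen, kill = frozenset(gen), frozenset(kill)

    def transfer(in_set):
        return (frozenset(in_set) - kill) | gen

    return transfer


# Hypothetical node 4, "x := a*b": it generates the definition (x, 4)
# and kills the other definition of x, (x, 1).
f4 = make_transfer(gen={("x", 4)}, kill={("x", 1)})
out4 = f4({("x", 1), ("a", 3)})
print(sorted(out4))
```

Because F_i depends only on the fixed sets gen(i) and kill(i), each node gets one such function that can be reused on every iteration.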
The Worklist Algorithm
/* initially all in_RD sets are empty */
for m := 2 to n do in_RD(m) := Ø;
in_RD(1) := UNDEF;
W := {1,2,…,n}   /* put every node on the worklist */
while W ≠ Ø do {
   remove k from W;
   new := ∪ over m ∈ pred(k) of { (in_RD(m) ∩ pres(m)) ∪ gen(m) };
          /* the braced term is out(m), i.e., F_m(in_RD(m)); pres(m) is the set of definitions preserved by m, so in_RD(m) ∩ pres(m) = in_RD(m) − kill(m) */
   if new ≠ in_RD(k) then {
      in_RD(k) := new;
      for j ∈ succ(k) do add j to W
   }
}
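A runnable sketch of the worklist iteration for Reaching Definitions follows. The four-node CFG, its gen/kill sets, and the function name are assumptions made for illustration. The entry set is simply left empty here rather than initialized to UNDEF, and kill is used in place of pres.

```python
def reaching_definitions(nodes, pred, succ, gen, kill):
    """Compute in_RD for every node by worklist iteration."""
    in_rd = {n: frozenset() for n in nodes}
    worklist = list(nodes)            # put every node on the worklist
    while worklist:
        k = worklist.pop()
        new = frozenset()
        for m in pred[k]:
            # out(m) = F_m(in(m)) = (in(m) - kill(m)) | gen(m)
            new = new | ((in_rd[m] - kill[m]) | gen[m])
        if new != in_rd[k]:
            in_rd[k] = new
            for j in succ[k]:
                if j not in worklist:
                    worklist.append(j)
    return in_rd


# Hypothetical CFG: 1: x := ...   2: branch   3: x := ...   4: y := ...
# Edges: 1->2, 2->3, 2->4, 3->4, and a back edge 4->2.
nodes = [1, 2, 3, 4]
pred = {1: [], 2: [1, 4], 3: [2], 4: [2, 3]}
succ = {1: [2], 2: [3, 4], 3: [4], 4: [2]}
gen = {1: {("x", 1)}, 2: set(), 3: {("x", 3)}, 4: {("y", 4)}}
kill = {1: {("x", 3)}, 2: set(), 3: {("x", 1)}, 4: set()}

in_rd = reaching_definitions(nodes, pred, succ, gen, kill)
for n in nodes:
    print(n, sorted(in_rd[n]))
```

The back edge 4->2 is what forces more than one pass: node 2 first sees only (x, 1), then is revisited once the definitions flowing around the loop arrive.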
Dataflow Frameworks
Lattices
– Partial ordering
– Meet, Join, Lattice, and Chain
Monotone functions
The “Maximal Fixed Point” (MFP) solution
The “Meet Over all Paths” (MOP) solution
Lattice Theory
Partial ordering (denoted by ≤ or ⊑)
– Relation between pairs of elements
– Reflexive: x ≤ x
– Anti-symmetric: x ≤ y and y ≤ x implies x = y
– Transitive: x ≤ y and y ≤ z implies x ≤ z
Poset: (set S, ≤)
0 element: 0 ≤ x, for every x in S
1 element: x ≤ 1, for every x in S
A poset does not necessarily have a 0 or a 1 element.
Poset Example
U = {a,b,c}
The poset is 2^U; ≤ is set inclusion
Hasse diagram, listed by level:
{}
{a}  {b}  {c}
{a,b}  {b,c}  {a,c}
{a,b,c}
Lattice Theory
Greatest lower bound (glb)
For l1, l2 in poset S, an element a in S is glb(l1,l2) if
– a ≤ l1 and a ≤ l2, and
– for any b in S, b ≤ l1 and b ≤ l2 implies b ≤ a
If the glb exists, it is unique. Why?
It is called the meet (denoted by ∧ or ⊓) of l1 and l2.
Least upper bound (lub)
For l1, l2 in poset S, an element c in S is lub(l1,l2) if
– c ≥ l1 and c ≥ l2, and
– for any d in S, d ≥ l1 and d ≥ l2 implies d ≥ c
If the lub exists, it is unique.
It is called the join (denoted by ∨ or ⊔) of l1 and l2.
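For a finite poset, the two definitions above can be checked by brute force. The sketch below is illustrative only. The helper names glb and lub, the powerset example, and the three-element divisibility poset are assumptions chosen for the demonstration.

```python
from itertools import combinations


def glb(elements, leq, a, b):
    """Greatest lower bound of a and b, or None if none exists."""
    lower = [x for x in elements if leq(x, a) and leq(x, b)]
    for x in lower:
        if all(leq(y, x) for y in lower):
            return x
    return None


def lub(elements, leq, a, b):
    """Least upper bound of a and b, or None if none exists."""
    upper = [x for x in elements if leq(a, x) and leq(b, x)]
    for x in upper:
        if all(leq(x, y) for y in upper):
            return x
    return None


def subset(s, t):
    return s <= t


def divides(m, n):
    return n % m == 0


# The poset 2^U for U = {a,b,c}, ordered by set inclusion.
subsets = [frozenset(c) for r in range(4) for c in combinations("abc", r)]
meet_ab_bc = glb(subsets, subset, frozenset("ab"), frozenset("bc"))
join_ab_bc = lub(subsets, subset, frozenset("ab"), frozenset("bc"))

# A small poset that is not a lattice: {1, 2, 3} ordered by divisibility.
small = [1, 2, 3]
missing_join = lub(small, divides, 2, 3)   # no common multiple in the set
print(sorted(meet_ab_bc), sorted(join_ab_bc), missing_join)
```

Uniqueness shows up in the code as well: if two elements both passed the test inside the loop, each would be ≤ the other, and anti-symmetry would make them equal.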
Definition of a Lattice (L, ∧, ∨)
L is a poset under ≤ such that every pair of elements has a glb (meet) and a lub (join)
A lattice need not contain a 0 or 1 element
A finite lattice must contain 0 and 1 elements
Not every poset is a lattice
If a ≤ x for every x in L, then a is the 0 element of L
If x ≤ a for every x in L, then a is the 1 element of L
A poset but not a lattice
[Figure: Hasse diagram of the example poset]
There is no lub(3,4) in this poset, so it is not a lattice.
Even if we add a lub(3,4), is it going to be a lattice?
Examples of Lattices
H = (2^U, ∩, ∪) where U is a finite set
– glb(s1,s2) is (s1 ∧ s2), which is s1 ∩ s2
– lub(s1,s2) is (s1 ∨ s2), which is s1 ∪ s2
J = (N_1, gcd, lcm)
– Partial order is integer divide on N_1 (n1 ≤ n2 when n1 divides n2)
– lub(n1,n2) is (n1 ∨ n2), which is lcm(n1,n2)
– glb(n1,n2) is (n1 ∧ n2), which is gcd(n1,n2)
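The claim that gcd and lcm are the meet and join under divisibility can be spot-checked on a bounded range. This is a sketch, not a proof: the bound 12 and the function names are assumptions, and N_1 is taken to mean the positive integers.

```python
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def divides(m, n):
    return n % m == 0


def gcd_lcm_are_meet_and_join(limit):
    """Check the glb/lub properties for all pairs in 1..limit."""
    for n1 in range(1, limit + 1):
        for n2 in range(1, limit + 1):
            g, l = gcd(n1, n2), lcm(n1, n2)
            # g is a lower bound and l is an upper bound
            if not (divides(g, n1) and divides(g, n2)):
                return False
            if not (divides(n1, l) and divides(n2, l)):
                return False
            # every common divisor divides g (g is the greatest lower bound)
            for d in range(1, limit + 1):
                if divides(d, n1) and divides(d, n2) and not divides(d, g):
                    return False
            # every common multiple is a multiple of l (l is the least upper bound)
            for m in range(1, limit * limit + 1):
                if divides(n1, m) and divides(n2, m) and not divides(l, m):
                    return False
    return True


print(gcd(6, 15), lcm(6, 15), gcd_lcm_are_meet_and_join(12))
```

Note that "greatest" and "least" here are measured by divisibility, not by numeric size, which is why the check uses divides rather than <=.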
Chain
A poset C where for every pair of elements c1, c2 in C, either c1 ≤ c2 or c2 ≤ c1.
– E.g., {} ≤ {a} ≤ {a,b} ≤ {a,b,c}
– And from the lattice J: 1 ≤ 2 ≤ 6 ≤ 30 and 1 ≤ 3 ≤ 15 ≤ …
Lattices are used in dataflow analysis to reason about the solution obtainable through fixed-point iteration.
Dataflow Lattices: Reaching Definitions
Program:
1. x:=a*b
2. if y<=a*b
3. a:=a+1
4. x:=a*b
5. goto 3
U = all definitions: {(x,1),(x,4),(a,3)}
The poset is 2^U; ≤ is the subset relation
Hasse diagram, listed by level:
{}   (the 0 element)
{(x,1)}  {(x,4)}  {(a,3)}
{(x,1),(x,4)}  {(x,4),(a,3)}  {(x,1),(a,3)}
{(x,1),(x,4),(a,3)}   (the 1 element)
Dataflow Lattices: Available Expressions
Program:
1. x:=a*b
2. if y*z<=a*b
3. a:=a+1
4. x:=a*b
5. goto 2
U = all expressions: {(a*b),(a+1),(y*z)}
The poset is 2^U; ≤ is the superset relation
Hasse diagram, listed by level:
{(a*b),(a+1),(y*z)}
{(a*b),(y*z)}  {(a*b),(a+1)}  {(a+1),(y*z)}
{(a*b)}  {(y*z)}  {(a+1)}
{}
Monotone Dataflow Frameworks
Framework parameters:
in(i) = ∨ { out(j) : j in pred(i) }
out(i) = F_i(in(i))
where:
– in(i), out(i) are elements of a property space
– the combination operator ∨ is ∪ for the may problems and ∩ for the must problems
– F_i is the transfer function associated with node i
– 2 other parameters: the set of initial/final CFG nodes, and the initial analysis information at them
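The framework parameters can be made concrete with a solver that takes the combination operator, the transfer functions, and the initial information as arguments. The sketch below is one possible forward-analysis arrangement, not a reference implementation. The names solve and gen_kill, the four-node CFG, and the Available Expressions gen/kill sets are all assumptions for illustration.

```python
def gen_kill(gen, kill):
    """Transfer function F_i(in) = (in - kill) | gen."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda in_set: (in_set - kill) | gen


def solve(nodes, pred, succ, transfer, join, bottom, entry, entry_value):
    """Forward analysis: in(i) = join of out(j) over j in pred(i)."""
    in_ = {n: bottom for n in nodes}      # bottom is the identity for join
    worklist = list(nodes)
    while worklist:
        i = worklist.pop()
        new = entry_value if i == entry else bottom
        for j in pred[i]:
            new = join(new, transfer[j](in_[j]))
        if new != in_[i]:
            in_[i] = new
            for s in succ[i]:
                if s not in worklist:
                    worklist.append(s)
    return in_


# Hypothetical CFG: 1: x := a*b   2: branch   3: a := a+1   4: y := a*b
# Edges: 1->2, 2->3, 2->4, 3->4.
U = frozenset({"a*b", "a+1"})
nodes = [1, 2, 3, 4]
pred = {1: [], 2: [1], 3: [2], 4: [2, 3]}
succ = {1: [2], 2: [3, 4], 3: [4], 4: []}
transfer = {
    1: gen_kill({"a*b"}, set()),
    2: gen_kill(set(), set()),
    3: gen_kill(set(), {"a*b", "a+1"}),   # assigning to a kills both expressions
    4: gen_kill({"a*b"}, set()),
}

# Available Expressions is a must problem: join is intersection, and the
# starting value for every node is U (the 0 element under the superset order).
avail_in = solve(nodes, pred, succ, transfer,
                 join=lambda x, y: x & y, bottom=U,
                 entry=1, entry_value=frozenset())
for n in nodes:
    print(n, sorted(avail_in[n]))
```

Passing join=lambda x, y: x | y with bottom=frozenset() turns the same solver into a may analysis such as Reaching Definitions, which is the point of factoring the parameters out.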
Monotone Frameworks (cont.)
The property space:
1. A complete lattice (L, ≤)
2. L satisfies the Ascending Chain Condition (i.e., all ascending chains are finite)
The combination operator: ∨, and lub(Y) = ∨Y
Reaching Definitions: L = 2^U, where U is the set of all definitions, and ≤ is set inclusion. Thus, ∨ is ∪. The lattice has finite height, therefore it satisfies the ACC.
Available Expressions: What is (L, ≤)? What is ∨? ACC?
Live Uses: What is (L, ≤)? What is ∨? ACC?
Monotone Frameworks (cont.)
The transfer functions: F_i : L → L. Formally, there is a function space F such that
1. F contains all F_i
2. F contains the identity function id(x) = x
3. F is closed under composition
4. Each F_i is monotone
Monotonicity
It is defined as
(1) a ≤ b implies f(a) ≤ f(b)
An equivalent definition is
(2) f(x) ∨ f(y) ≤ f(x ∨ y)
Lemma: The two definitions are equivalent.
First, we show that (1) implies (2).
Second, we show that (2) implies (1).
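One way to carry out the two steps of the lemma is sketched below. It uses only the definition of the join as a least upper bound and the standard fact that a ≤ b exactly when a ∨ b = b.

```latex
% (1) implies (2)
\begin{align*}
x \le x \vee y &\;\Rightarrow\; f(x) \le f(x \vee y) && \text{by (1)}\\
y \le x \vee y &\;\Rightarrow\; f(y) \le f(x \vee y) && \text{by (1)}\\
&\;\Rightarrow\; f(x) \vee f(y) \le f(x \vee y) && \text{the join is the least upper bound}
\end{align*}

% (2) implies (1): assume a \le b, so a \vee b = b
\begin{align*}
f(a) \;\le\; f(a) \vee f(b) \;\le\; f(a \vee b) \;=\; f(b)
\end{align*}
```

In the second chain, the first step holds because a join is an upper bound of its arguments, the second step is (2), and the last step substitutes a ∨ b = b.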
Distributivity
A distributive framework: a monotone framework with distributive transfer functions:
f(x ∨ y) = f(x) ∨ f(y)
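The difference between monotone and distributive can be checked exhaustively on a small powerset lattice with ∨ = ∪. In the sketch below, the universe {a,b,c} and both sample functions are hypothetical. The first is a gen/kill function. The second was chosen only because it is monotone without being distributive.

```python
from itertools import combinations


def powerset(universe):
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_monotone(f, lattice):
    """x <= y implies f(x) <= f(y), with <= as set inclusion."""
    return all(f(x) <= f(y) for x in lattice for y in lattice if x <= y)


def is_distributive(f, lattice):
    """f(x | y) == f(x) | f(y) for all x, y."""
    return all(f(x | y) == f(x) | f(y) for x in lattice for y in lattice)


def gen_kill_fn(s):
    # kill = {a}, gen = {c}
    return (s - {"a"}) | {"c"}


def needs_both(s):
    # yields {c} only when both a and b are present
    return frozenset({"c"}) if {"a", "b"} <= s else frozenset()


L = powerset({"a", "b", "c"})
print(is_monotone(gen_kill_fn, L), is_distributive(gen_kill_fn, L))
print(is_monotone(needs_both, L), is_distributive(needs_both, L))
```

The second function fails distributivity at x = {a}, y = {b}: f(x) ∪ f(y) is empty while f(x ∪ y) = {c}, which still satisfies the inequality f(x) ∨ f(y) ≤ f(x ∨ y) required for monotonicity.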