Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses and applications Software Testing Dynamic Program Analysis
Outline The four classical data-flow problems –Reaching definitions –Live variables –Available expressions –Very busy expressions Data-flow frameworks Reading: Compilers: Principles, Techniques and Tools, by Aho, Lam, Sethi and Ullman, Chapter 9.2
Four Classical Data-flow Problems Reaching definitions (Reach) Live uses of variables (Live) Available expressions (Avail) Very busy expressions (VeryB) Def-use chains built from Reach, and the dual Use-def chains, built from Live, play role in many optimizations Avail enables global common subexpression elimination VeryB is used for conservative code motion
Classical Data-flow Problems How to formulate the analysis using data-flow equations defined on the control flow graph? Forward and backward data-flow problems May and must data-flow problems out(i) = gen(i) (in(i) – kill(i)) in(i) = gen(i) (out(i) – kill(i)) Forward: Backward:
Problem 1: Reaching Definitions For each CFG node n, compute the set of definitions that reach n. i in RD (i) = { out RD (j) | j is predecessor of i } j: a=b+c out RD (i)= gen(i) (in RD (i)– kill(i)) kill(j): all definitions of a gen(j): this definition of a, (a,j)
Example 1. x:=read() 2. y:=1 3. if x<2 then 4. y:=x*y 5. x:=x-1 6. goto 3 7. … in RD (1) = Ø in RD (2) = out RD (1) in RD (3) = out RD (2) out RD (6) in RD (4) = out RD (3) in RD (5) = out RD (4) in RD (6) = out RD (5) in RD (7) = out RD (3) out RD (1) = (in RD (1)-D x ) {(x,1)} out RD (2) = (in RD (2)-D y ) {(y,2)} out RD (3) = in RD (3) out RD (4) = (in RD (4)-D y ) {(y,4)} out RD (5) = (in RD (5)-D x ) {(x,5)} out RD (6) = in RD (6)
Example 1. x:=read() 2. y:=1 3. if x<2 then 4. y:=x*y 5. x:=x-1 6. goto 3 7. … in RD (1) = Ø in RD (2) = {(x,1)} in RD (3) = {(x,1),(x,5),(y,2),(y,4)} in RD (4) = {(x,1),(x,5),(y,2),(y,4)} in RD (6) = {(x,5),(y,4)} in RD (7) = {(x,1),(x,5),(y,2),(y,4)} out RD (1) = {(x,1)} out RD (2) = {(x,1), (y,2)} out RD (3) = {(x,1),(x,5),(y,2),(y,4)} out RD (4) = {(x,1),(x,5),(y,4)} in RD (5) = {(x,1),(x,5),(y,4)} out RD (5) = {(x,5),(y,4)} in RD (6) = {(x,5),(y,4)}
Reaching Definitions m1 m2 m3 j in RD (m1) Forward, may dataflow problem in RD (j) in RD (m3)in RD (m2)
Equivalent Equations where: pres(m) is the set of definitions preserved through node m gen(m) is the set of definitions generated at node m pred(j) is the set of immediate predecessors of node j
Problem 2: Live Uses of Variables For each node n, compute the set of variables live on exit from n. i: out LV (i) = { in LV (j) | j is a successor of i } in LV (i)= gen(i) (out LV (i) – kill(i)) 1.x:=2; 2. y:=4; 3. x:=1; (if (y>x) then 5. z:=y; else 6. z:=y*y); 7. x:=z; What variables are live on exit from statement 1? Statement 3? x = y+z Q: What is gen(i)? Q: What is kill(i)?
Example 1. x:=2 2. y:=4 3. x:=1 4. if (y>x) 5. z:=y6. z:=y*y 7. x := z
Live Uses of Variables m1 m2 m3 j out LV (j) Backward, may dataflow problem out LV (m1) out LV (m2) out LV (m3)
Equivalent equations where: pres(m) is the set of uses preserved through node m (roughly, correspond to variables whose defs are preserved) gen(m) is the set of uses generated at node m succ(j) is the set of immediate successors of node j
Problem 3: Available Expressions An expression X op Y is available at node n if every path from entry to n evaluates X op Y, and after every evaluation prior to reaching n, there are NO subsequent assignments to X or Y X op Y X = … Y = … n ρ
Global Common Subexpressions z=a*b r=2*z q=a*b u=a*b z=u/2 w=a*b
Global Common Subexpressions t1=a*b z=t1 r=2*z t1=a*b q=t1 u=t1 z=u/2 w=a*b Can we eliminate w=a*b?
Available Expressions m1 m2 m3 j Forward, must dataflow problem in AE (j) = ? out AE (j) = ? gen(j) = ? kill(j) = ? x=y+z in AE (m1) in AE (m2) in AE (m3) in AE (j)
Example 1.x = a + b 2.y = a * b 3.if y <= a + b then goto 7 4.a = a x = a + b 6.goto 3 7.…
Problem 4: Very Busy Expressions An expression X op Y is very busy at node n, if along EVERY path from n to the end of the program, we come to a computation of X op Y BEFORE any redefinition of X or Y. X = … Y = … t1=X op Y n
Very Busy Expressions m1 m2 m3 j out VB (j) out VB (m1) out VB (m2) out VB (m3)
Very Busy Expressions where: pres(m) is the set of expressions preserved through node m gen(m) is the set of expressions generated at node m succ(j) is the set of immediate successors of node j
Dataflow Problems May ProblemsMust Problems Forward Problems Reaching Definitions Available Expressions Backward Problems Live Uses of Variables Very Busy Expressions
Similarities There is a finite set, U, of data-flow facts: –Reaching Definitions: the set of all definitions: e.g., { (x,1),(y,2),(x,4),(y,5) } –Available Expressions and Very Busy Expressions: the set of all arithmetic expressions e.g., { a+b,a*b,a+1 } –Live Uses: the set of all variables e.g., { x,y,z } The solution at a node is a subset of U (e.g., every definition either reaches node i or does not).
Similarities Equations (i.e., transfer functions) always have the form: out(i) = F i (in(i)) = (in(i) – kill(i)) gen(i) = (in(i) pres(i)) gen(i) A note: what makes the 4 classical problems special is that sets pres(i) and gen(i) are constants, i.e., they do not depend on in(i) Set union and set intersection can be implemented as logical OR and AND respectively
The worklist algorithm for data-flow Analysis: Reaching Definitions change = true; Initialize in RD (m) = Ø for m=2…n in RD (1) = UNDEF while (change) do { change = false; while ( j s.t. in RD (j) ≠ ((in RD (m) pres(m)) gen(m) ) { in RD (j) = ((in RD (m) pres(m)) gen(m) change = true; }
A Better Algorithm /* initially all in RD sets are empty */ for m := 2 to n do in RD (m) := Ø; in RD (1) = UNDEF W := {1,2,…,n} /* put every node on the worklist */ while W ≠ Ø do { remove j from W; new = {in RD (m) pres(m) gen(m) }; if new ≠ in RD (j) then { in RD (j) = new; for k succ(j) do add k to W }
An Implementation Use bitstring representation for sets: 1 bit position per variable definition For each control flow graph node j pres(j) –has 0 in bit positions corresponding to definitions of variables defined at node j –has 1 in bit positions corresponding to definitions of variables not defined at node j gen(j) –has 1 in bit positions corresponding to definitions at node j –has 0 in bit positions for all other definitions (i.e., definitions not at node j)
Detailed Algorithm W = empty // initialize the worklist for (i = 1; i < n+1; i++) // i varies over nodes for (j = 1; j < m+1; j++) { // j over definitions if (k pred(i) with j gen(k)) then { set j bit to 1 in in RD (i); add (j,i) to W} else { set j bit to 0 in in RD (i);} while (W not empty) do { remove (j,i) from W if (j pres(i)) then { for (k succ(i)) if (j bit in in RD (k) == 0) then { set j bit to 1 in in RD (k); add (j,k) to W } } } First loop (for) passes gen sets to successors. Second loop (while) performs worklist propagation.
Example, Bitvector Calculation i=0 k=0 i<0 mod(i,3) = 0? k:=k-1k:=k+1 i:=i+1 exit (i,1),(k,1) (k,4)(k,5) (i,6) B1 B2 B3 B4 B5 B6 Definitions and basic blocks are given unique identifiers
Initialization i=0 k=0 i<0 mod(i,3) = 0? k:=k-1k:=k+1 i:=i+1 exit B1 B2 B3 B4 B5 B6 B1 B2 B3 B4 B5 B6 pres: gen: Bits: i1,k1,k4,k5,i6 (i,1),(k,1) (k,4)(k,5) (i,6)
After Initialization Loop i=0 k=0 i<0 mod(i,3) = 0? k:=k-1k:=k+1 i:=i+1 exit B1 B2 B3 B4 B5 B B1 B2 B3 B4 B5 B6 pres: gen: Bits: i1,k1,k4,k5,i6 (i,1),(k,1) (k,4)(k,5) (i,6)
Propagation Loop Worklist W = {(i1,2),(k1,2),(i6,2),(k4,6),(k5,6)} Choose (i1,2); pres(2) = 11111, so Reach(3) = and we add (i1,3) to W. Then choose (k1,2) off W and set Reach(3) = and we add (k1,3) to W. Then choose (i6,2) off W and set Reach(3) = and add (i6,3) to W. Now W = {(k4,6),(k5,6), (i1,3), (k1,3), (i6,3)} Iteration continues until worklist is empty.
After Steps in Previous Slide i=0 k=0 i<0 mod(i,3) = 0? k:=k-1k:=k+1 i:=i+1 exit B1 B2 B3 B4 B5 B (i,1),(k,1) (k,4)(k,5) (i,6)
After Steps in Previous Slide i=0 k=0 i<0 mod(i,3) = 0? k:=k-1k:=k+1 i:=i+1 exit B1 B2 B3 B4 B5 B (i,1),(k,1) (k,4)(k,5) (i,6)
After Steps in Previous Slide i=0 k=0 i<0 mod(i,3) = 0? k:=k-1k:=k+1 i:=i+1 exit B1 B2 B3 B4 B5 B (i,1),(k,1) (k,4)(k,5) (i,6)
Solution (skipping some steps) i=0 k=0 i<0 mod(i,3) = 0? k:=k-1k:=k+1 i:=i+1 exit B1 B2 B3 B4 B5 B (i,1),(k,1) (k,4)(k,5) (i,6)