Iterative Data-flow Analysis
COMP 512, Rice University, Houston, Texas, Fall 2003
Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.
New lecture, 2003

Review
Last class:
  Looked at Global Common Subexpression Elimination (Cocke 70)
  Defined the available expressions problem as the key to finding opportunities and proving safety
    $\text{AVAIL}(n_0) = \emptyset$
    $\text{AVAIL}(b) = \bigcap_{x \in pred(b)} \left( \text{DEEXPR}(x) \cup \left( \text{AVAIL}(x) \cap \overline{\text{EXPRKILL}(x)} \right) \right)$
  Looked at an algorithm to solve these equations
    Compute initial information: DEEXPR & EXPRKILL
    Apply an iterative solver to find a fixed-point solution
Today:
  Why does the iterative solver work?

Data-flow Analysis
Definition
  Data-flow analysis is a collection of techniques for compile-time reasoning about the run-time flow of values
Almost always involves building a graph
  Problems are trivial on a basic block
  Global problems ⇒ control-flow graph (or derivative)
  Whole program problems ⇒ call graph (or derivative)
Usually formulated as a set of simultaneous equations
  Sets attached to nodes and edges
  Semilattice to describe values
  We solved AVAIL with an iterative fixed-point algorithm
Desired result is usually the meet-over-all-paths solution
  “What is true on every path from the entry?”
  “Can this happen on any path from the entry?”
  Related to the safety of optimization (how we use the results)

Data-flow Analysis
Limitations
1. Precision – “up to symbolic execution”
   Assume all paths are taken
2. Solution – cannot afford to compute the MOP solution
   Large class of problems where MOP = MFP = LFP
   Not all problems of interest are in this class
3. Arrays – treated naively in classical analysis
   Represent whole array with a single fact
4. Pointers – difficult (and expensive) to analyze
   Imprecision rapidly adds up
   Need to ask the right questions
Summary
  For scalar values, we can quickly solve simple problems
  Good news: simple problems can carry us pretty far

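Data-flow Analysis
Semilattice
A semilattice is a set $L$ and a meet operation $\wedge$ such that, $\forall\, a, b, c \in L$:
  1. $a \wedge a = a$
  2. $a \wedge b = b \wedge a$
  3. $a \wedge (b \wedge c) = (a \wedge b) \wedge c$
$\wedge$ imposes an order on $L$, $\forall\, a, b, c \in L$:
  1. $a \ge b \Leftrightarrow a \wedge b = b$
  2. $a > b \Leftrightarrow a \ge b$ and $a \ne b$
A semilattice has a bottom element, denoted $\bot$:
  1. $\forall\, a \in L$, $\bot \wedge a = \bot$
  2. $\forall\, a \in L$, $a \ge \bot$
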
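The three meet axioms and the bottom-element properties can be checked by brute force on a small powerset with intersection as the meet. This is only an illustrative sketch: the helper names (`powerset`, `meet`, `geq`, `is_semilattice`, `bottom`) and the three-expression universe are assumptions made for the example, not part of the lecture.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def meet(a, b):
    """Meet for the AVAIL-style lattice: set intersection."""
    return a & b

def geq(a, b):
    """The order induced by the meet: a >= b iff a ∧ b = b."""
    return meet(a, b) == b

def is_semilattice(L):
    """Exhaustively check idempotence, commutativity, associativity."""
    idem = all(meet(a, a) == a for a in L)
    comm = all(meet(a, b) == meet(b, a) for a in L for b in L)
    assoc = all(meet(a, meet(b, c)) == meet(meet(a, b), c)
                for a in L for b in L for c in L)
    return idem and comm and assoc

def bottom(L):
    """Return the element b with b ∧ a = b and a >= b for all a, if any."""
    for cand in L:
        if all(meet(cand, a) == cand and geq(a, cand) for a in L):
            return cand
    return None

L = powerset({"a+b", "c+d", "e+f"})
```

With this meet, the induced order is set inclusion, so the empty set is the bottom element and two distinct singletons are incomparable.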
Data-flow Analysis
How does this relate to data-flow analysis?
Choose a semilattice to represent the facts
  Attach a meaning to each $a \in L$
  Each $a \in L$ is a distinct set of known facts
With each node $n$, associate a function $f_n : L \to L$
  $f_n$ models the behavior of the code in the block corresponding to $n$
  Let $F$ be the set of all functions that the code might generate
Example: AVAIL
  Semilattice is $(2^E, \wedge)$, where $E$ is the set of all expressions & $\wedge$ is $\cap$
  Sets are bigger than |variables|, $\bot$ is $\emptyset$
  For a node $n$, $f_n$ has the form $f_n(x) = a_n \cup (x \cap b_n)$
  where $a_n$ is $\text{DEEXPR}(n)$ and $b_n$ is $\overline{\text{EXPRKILL}(n)}$, the complement of $\text{EXPRKILL}(n)$

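As a rough sketch of the AVAIL transfer function, the block below builds $f_n(x) = a_n \cup (x \cap b_n)$ as a closure and checks monotonicity exhaustively over $2^E$. The function names and the choice of block D's sets (DEEXPR = {b+18, a+b, e+f}, EXPRKILL = {e+f}, taken from the running example) are assumptions of this illustration.

```python
from itertools import combinations

UNIVERSE = frozenset({"a+b", "c+d", "e+f", "a+17", "b+18"})

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def make_transfer(de_expr, expr_kill, universe=UNIVERSE):
    """Build f_n(x) = a_n ∪ (x ∩ b_n), with b_n the complement of EXPRKILL."""
    a_n = frozenset(de_expr)
    b_n = frozenset(universe) - frozenset(expr_kill)
    def f_n(x):
        return a_n | (frozenset(x) & b_n)
    return f_n

def is_monotone(f, L):
    """x <= y (here: x ⊆ y) must imply f(x) <= f(y)."""
    return all(f(x) <= f(y) for x in L for y in L if x <= y)

# Block D of the running example: e <- b+18; s <- a+b; u <- e+f
f_D = make_transfer({"b+18", "a+b", "e+f"}, {"e+f"})
```

For instance, `f_D({"a+b", "c+d"})` keeps both incoming expressions (neither is killed) and adds the three expressions that block D computes.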
Concrete Example: Available Expressions
[Figure: control-flow graph of the running example, with these blocks]
  A: m ← a + b; n ← a + b
  B: p ← c + d; r ← c + d
  C: q ← a + b; r ← c + d
  D: e ← b + 18; s ← a + b; u ← e + f
  E: e ← a + 17; t ← c + d; u ← e + f
  F: v ← a + b; w ← c + d; x ← e + f
  G: y ← a + b; z ← c + d
E = { a+b, c+d, e+f, a+17, b+18 }
2^E is the set of all subsets of E
2^E = [
  {a+b, c+d, e+f, a+17, b+18},
  {a+b, c+d, e+f, a+17}, {a+b, c+d, e+f, b+18}, {a+b, c+d, a+17, b+18}, {a+b, e+f, a+17, b+18}, {c+d, e+f, a+17, b+18},
  {a+b, c+d, e+f}, {a+b, c+d, b+18}, {a+b, c+d, a+17}, {a+b, e+f, a+17}, {a+b, e+f, b+18}, {a+b, a+17, b+18}, {c+d, e+f, a+17}, {c+d, e+f, b+18}, {c+d, a+17, b+18}, {e+f, a+17, b+18},
  {a+b, c+d}, {a+b, e+f}, {a+b, a+17}, {a+b, b+18}, {c+d, e+f}, {c+d, a+17}, {c+d, b+18}, {e+f, a+17}, {e+f, b+18}, {a+17, b+18},
  {a+b}, {c+d}, {e+f}, {a+17}, {b+18},
  {}
]

Concrete Example: Available Expressions
The Lattice
[Figure: diagram of the lattice 2^E, drawn in levels by set size from { } through the five singletons, the ten pairs, the ten triples, and the five four-element sets to {a+b, c+d, e+f, a+17, b+18}. Annotation: comparability (transitive)]

Concrete Example: Available Expressions
The Lattice
[Figure: the same diagram of the lattice 2^E, from { } to {a+b, c+d, e+f, a+17, b+18}. Annotation: meet]

Round-robin Iterative Algorithm
  AVAIL(b_0) ← Ø
  for i ← 1 to N
    AVAIL(b_i) ← { all expressions }
  change ← true
  while (change)
    change ← false
    for i ← 0 to N
      $\text{TEMP} \leftarrow \bigcap_{x \in pred(b_i)} \left( \text{DEF}(x) \cup \left( \text{AVAIL}(x) \cap \text{NKILL}(x) \right) \right)$
      if AVAIL(b_i) ≠ TEMP then
        change ← true
        AVAIL(b_i) ← TEMP
Termination: does it halt?
Correctness: what answer does it produce?
Speed: how quickly does it find that answer?
The round-robin solver is easier to analyze than the worklist solver.

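A minimal Python sketch of the round-robin solver follows. It makes several assumptions that are not in the slide: the function and parameter names are invented, DEF and NKILL are read as DEEXPR and the complement of EXPRKILL, the entry block is left at Ø rather than recomputed (it has no predecessors), and the four-block loop used to exercise it is a made-up example.

```python
def solve_avail(blocks, preds, de_expr, expr_kill, universe):
    """Round-robin AVAIL solver. blocks[0] is the entry block.

    Returns (avail, sweeps), where sweeps counts passes of the while loop,
    including the final pass that detects no change.
    """
    universe = frozenset(universe)
    entry = blocks[0]
    # Optimistic initialization: every set except the entry's starts full.
    avail = {b: universe for b in blocks}
    avail[entry] = frozenset()
    sweeps = 0
    changed = True
    while changed:
        changed = False
        sweeps += 1
        for b in blocks:
            if b == entry:
                continue  # assumption: entry stays at the empty set
            temp = universe
            for x in preds[b]:
                not_killed = universe - frozenset(expr_kill[x])
                temp = temp & (frozenset(de_expr[x]) | (avail[x] & not_killed))
            if temp != avail[b]:
                avail[b] = temp
                changed = True
    return avail, sweeps

# Hypothetical loop: entry -> head -> body -> head, head -> exit
blocks = ["entry", "head", "body", "exit"]
preds = {"entry": [], "head": ["entry", "body"], "body": ["head"], "exit": ["head"]}
de_expr = {"entry": {"x+y"}, "head": set(), "body": {"p+q"}, "exit": set()}
expr_kill = {b: set() for b in blocks}
avail, sweeps = solve_avail(blocks, preds, de_expr, expr_kill, {"x+y", "p+q"})
```

In this example `x+y` stays available at the loop head even though the head has a back-edge predecessor, which is the effect of initializing the non-entry sets to all expressions.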
Round-robin Iterative Algorithm
Termination
  Makes sweeps over the nodes
  Halts when some sweep produces no change
(The algorithm is the round-robin solver shown on the previous slide.)

Iterative Data-flow Analysis
Termination
If every $f_n \in F$ is monotone, i.e., $x \le y \Rightarrow f(x) \le f(y)$, and
If the lattice is bounded, i.e., every descending chain is finite
  Chain is a sequence $x_1, x_2, \ldots, x_n$ where $x_i \in L$, $1 \le i \le n$
  $x_i > x_{i+1}$, $1 \le i < n$ ⇒ chain is descending
Then
  The set at each node can only change a finite number of times
  The iterative algorithm must halt on an instance of the problem
Any finite semilattice is bounded
Some infinite semilattices are bounded
  [Figure: lattice of real constants, with one level holding … 0 … .001 … .002 …]

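For AVAIL specifically, the termination argument can be turned into a crude count. The bound below is a sketch under stated assumptions: $N$ is the number of non-entry blocks, every non-entry set starts at $E$, and monotonicity means each set can only move down the lattice, so a descending chain in $2^E$ limits each set to at most $|E|$ strict changes.

```latex
\begin{align*}
\text{longest descending chain in } 2^E &: \quad |E| + 1 \text{ elements} \\
\text{changes at one node} &\le |E| \\
\text{total changes over all nodes} &\le N \cdot |E| \\
\text{sweeps (each but the last makes a change)} &\le N \cdot |E| + 1
\end{align*}
```

This is a worst-case ceiling only; it says nothing about how quickly the solver converges in practice.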
Iterative Data-flow Analysis
Correctness (What does it compute?)
If every $f_n \in F$ is monotone, i.e., $x \le y \Rightarrow f(x) \le f(y)$, and
If the semilattice is bounded, i.e., every descending chain is finite
  Chain is a sequence $x_1, x_2, \ldots, x_n$ where $x_i \in L$, $1 \le i \le n$
  $x_i > x_{i+1}$, $1 \le i < n$ ⇒ chain is descending
Given a bounded semilattice $S$ and a monotone function space $F$
  $\exists\, k$ such that $f^k(\bot) = f^j(\bot)$ $\forall\, j > k$
  $f^k(\bot)$ is called the least fixed-point of $f$ over $S$
  If $L$ has a $\top$, then $\exists\, k$ such that $f^k(\top) = f^j(\top)$ $\forall\, j > k$, and $f^k(\top)$ is called the maximal fixed-point of $f$ over $S$ (optimism)

Iterative Data-flow Analysis
Correctness
If every $f_n \in F$ is monotone, i.e., $f(x \wedge y) \le f(x) \wedge f(y)$, and
If the lattice is bounded, i.e., every descending chain is finite
  Chain is a sequence $x_1, x_2, \ldots, x_n$ where $x_i \in L$, $1 \le i \le n$
  $x_i > x_{i+1}$, $1 \le i < n$ ⇒ chain is descending
Then
  The round-robin algorithm computes a least fixed-point (LFP)
  The uniqueness of the solution depends on other properties of $F$
  Unique solution ⇒ it finds the one we want
  Multiple solutions ⇒ we need to know which one it finds

Iterative Data-flow Analysis
Correctness
Does the iterative algorithm compute the desired answer?
Admissible Function Spaces
  1. $\forall\, f \in F$, $\forall\, x, y \in L$, $f(x \wedge y) = f(x) \wedge f(y)$
  2. $\exists\, f_i \in F$ such that $\forall\, x \in L$, $f_i(x) = x$
  3. $f, g \in F \Rightarrow \exists\, h \in F$ such that $h(x) = f(g(x))$
  4. $\forall\, x \in L$, $\exists$ a finite subset $H \subseteq F$ such that $x = \bigwedge_{f \in H} f(\bot)$
If $F$ meets these four conditions, then an instance of the problem will have a unique fixed-point solution (instance ⇒ graph + initial values)
  ⇒ LFP = MFP = MOP
  ⇒ order of evaluation does not matter
Not distributive ⇒ fixed-point solution may not be unique

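As a worked check, the first three conditions can be verified by hand for AVAIL-style functions of the form $f(x) = a_f \cup (x \cap b_f)$ over $(2^E, \cap)$. The subscripted names $a_f, b_f, a_g, b_g, a_h, b_h$ are notation introduced only for this sketch.

```latex
\begin{align*}
\text{(1) distributive:}\quad
f(x \cap y) &= a_f \cup \bigl((x \cap y) \cap b_f\bigr)
             = a_f \cup \bigl((x \cap b_f) \cap (y \cap b_f)\bigr) \\
            &= \bigl(a_f \cup (x \cap b_f)\bigr) \cap \bigl(a_f \cup (y \cap b_f)\bigr)
             = f(x) \cap f(y) \\[4pt]
\text{(2) identity:}\quad
f_i(x) &= \emptyset \cup (x \cap E) = x
        \qquad (a = \emptyset,\; b = E) \\[4pt]
\text{(3) composition:}\quad
f(g(x)) &= a_f \cup \Bigl(\bigl(a_g \cup (x \cap b_g)\bigr) \cap b_f\Bigr) \\
        &= \underbrace{a_f \cup (a_g \cap b_f)}_{a_h} \;\cup\;
           \bigl(x \cap \underbrace{(b_g \cap b_f)}_{b_h}\bigr)
\end{align*}
```

The second line of (1) uses the fact that union distributes over intersection; (3) shows that the composition again has the form $a_h \cup (x \cap b_h)$.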
Iterative Data-flow Analysis
If a data-flow framework meets those admissibility conditions, then it has a unique fixed-point solution
  The iterative algorithm finds the (best) answer
  The solution does not depend on order of computation
  Algorithm can choose an order that converges quickly
Intuition
  Choose an order so that changes propagate as far as possible on each “sweep”
    Process a node’s predecessors before the node
  Cycles pose problems, of course
    Ignore back edges when computing the order?

Ordering the Nodes to Maximize Propagation
[Figure: an example graph numbered twice, once in postorder and once in reverse postorder]
Reverse postorder number = N + 1 − postorder number
Reverse postorder visits predecessors before visiting a node
Use reverse preorder for backward problems
  Reverse postorder on reverse CFG is reverse preorder

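One way to compute these orders is a depth-first search that records each node as its visit finishes. The sketch below is illustrative only: the function names are invented, and the successor map is an assumed reading of the running example's CFG (A→B, A→C, C→D, C→E, D→F, E→F, B→G, F→G), inferred from the AVAIL computation elsewhere in the deck.

```python
def postorder(succs, entry):
    """Nodes in the order their depth-first visits finish."""
    visited = set()
    order = []
    def dfs(n):
        visited.add(n)
        for s in succs.get(n, []):
            if s not in visited:
                dfs(s)
        order.append(n)  # n finishes after everything reachable from it
    dfs(entry)
    return order

def reverse_postorder(succs, entry):
    """Reverse of the postorder list."""
    return postorder(succs, entry)[::-1]

# Assumed edges for the running example.
succs = {"A": ["B", "C"], "B": ["G"], "C": ["D", "E"],
         "D": ["F"], "E": ["F"], "F": ["G"], "G": []}

po = postorder(succs, "A")
rpo = reverse_postorder(succs, "A")
N = len(po)
po_num = {n: i + 1 for i, n in enumerate(po)}     # 1-based postorder numbers
rpo_num = {n: i + 1 for i, n in enumerate(rpo)}   # 1-based RPO numbers
```

On this acyclic graph every edge goes from a lower RPO number to a higher one, so a single forward sweep in RPO sees each predecessor's updated set before visiting the node.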
Iterative Data-flow Analysis
Speed
For a problem with an admissible function space & a bounded semilattice,
If the functions all meet the rapid condition, i.e.,
  $\forall\, f, g \in F$, $\forall\, x \in L$, $f(g(\bot)) \ge g(\bot) \wedge f(x) \wedge x$
then a round-robin, reverse-postorder iterative algorithm will halt in d(G)+3 passes over a graph G
d(G) is the loop-connectedness of the graph w.r.t. a DFST
  Maximal number of back edges in an acyclic path
  Several studies suggest that, in practice, d(G) is small (<3)
  For most CFGs, d(G) is independent of the specific DFST
Sets stabilize in two passes around a loop
Each pass does O(E) meets & O(N) other operations

Iterative Data-flow Analysis
What does this mean?
Reverse postorder
  Easily computed order that increases propagation per pass
Round-robin iterative algorithm
  Visit all the nodes in a consistent order (RPO)
  Do it again until the sets stop changing
Rapid condition
  Most classic global data-flow problems meet this condition
These conditions are easily met
  Admissible framework, rapid function space
  Round-robin, reverse-postorder, iterative algorithm
  The analysis runs in (effectively) linear time

Some problems are not admissible
Global constant propagation
  First condition in admissibility: $\forall\, f \in F$, $\forall\, x, y \in L$, $f(x \wedge y) = f(x) \wedge f(y)$
  Constant propagation is not admissible
    Kam & Ullman time bound does not hold
    There are tight time bounds, however, based on lattice height
    Require a variable-by-variable formulation
Example: a block containing … a ← b + c, reached from two paths
  S1: {b=3, c=4}
  S2: {b=1, c=6}
  Function “f” models the block’s effects
  f(S1) = {a=7, b=3, c=4}
  f(S2) = {a=7, b=1, c=6}
  f(S1 ∧ S2) = Ø, whereas f(S1) ∧ f(S2) = {a=7}

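The failure of distributivity can be reproduced with a few lines of Python. The sketch assumes a simple representation that is not in the slide: an environment is a dict from variable name to known constant, a missing variable means "not known to be constant", and `meet` keeps only the facts on which both environments agree.

```python
def meet(s1, s2):
    """Meet of two constant environments: keep facts both sides agree on."""
    return {v: c for v, c in s1.items() if v in s2 and s2[v] == c}

def f(s):
    """Transfer function for the block `a <- b + c`."""
    out = {v: c for v, c in s.items() if v != "a"}  # old value of a is overwritten
    if "b" in s and "c" in s:
        out["a"] = s["b"] + s["c"]                  # both operands known: fold
    return out

S1 = {"b": 3, "c": 4}
S2 = {"b": 1, "c": 6}

meet_then_apply = f(meet(S1, S2))     # f(S1 ∧ S2)
apply_then_meet = meet(f(S1), f(S2))  # f(S1) ∧ f(S2)
```

Meeting first discards both `b` and `c`, so the block cannot fold `a`; applying `f` on each path first yields `a = 7` on both, and that fact survives the meet.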
Some admissible problems are not rapid
Interprocedural May Modify sets
  shift(a,b,c,d,e,f) {
    local t;
    …
    call shift(t,a,b,c,d,e);
    f = 1;
    …
  }
  Assume call-by-reference
  Compute the set of variables (in shift) that can be modified by a call to shift
  How long does it take?
  [Figure: the parameters a b c d e f of shift]
Iterations proportional to number of parameters
  Not a function of the call graph
  Can make example arbitrarily bad
Proportional to length of chain of bindings…
  Nothing to do with d(G)

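The chain of bindings can be simulated directly. This is a simplified sketch rather than a full interprocedural analysis: `may_mod_passes` is a hypothetical helper, it handles only the single recursive call site shown in the slide, and it counts the local `t` as modified once the formal `a` is.

```python
def may_mod_passes(formals):
    """Iterate MayMod for shift(p1..pk), which calls shift(t, p1..p(k-1))
    and assigns to pk. Returns (mod_set, passes), counting the final
    pass that detects no change."""
    actuals = ["t"] + formals[:-1]
    binding = dict(zip(formals, actuals))  # formal -> actual at the call site
    mod = {formals[-1]}                    # direct assignment: f = 1
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        # If the callee may modify formal p, the call may modify the actual
        # passed in p's position (call-by-reference).
        new = mod | {binding[p] for p in mod if p in binding}
        if new != mod:
            mod, changed = new, True
    return mod, passes

mod6, passes6 = may_mod_passes(list("abcdef"))
mod12, passes12 = may_mod_passes([f"p{i}" for i in range(12)])
```

Each pass pushes the modification one position down the parameter list, so the pass count grows with the number of parameters even though the call graph is a single node with one self-edge.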
Extra Slides Start Here

Computing Available Expressions
  $\text{AVAIL}(b) = \bigcap_{x \in pred(b)} \left( \text{DEEXPR}(x) \cup \left( \text{AVAIL}(x) \cap \overline{\text{EXPRKILL}(x)} \right) \right)$
where
  EXPRKILL(b) is the set of expressions killed in b, and
  DEEXPR(b) is the set of expressions defined in b and not subsequently killed in b
Initial condition
  $\text{AVAIL}(n_0) = \emptyset$, because nothing is computed before $n_0$
The other nodes’ AVAIL sets will be computed over their preds.
$n_0$ has no predecessor.

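The two local sets can be gathered in one forward walk over a block. The sketch below assumes a representation chosen for illustration: an operation is a pair `(dest, expr)`, an expression is a tuple `(left, op, right)`, and `universe` holds every expression in the procedure. The helper name `local_sets` is hypothetical.

```python
def local_sets(block, universe):
    """Return (DEExpr, ExprKill) for one basic block.

    block:    list of (dest, (left, op, right)) operations, in order
    universe: set of all (left, op, right) expressions in the procedure
    """
    de_expr, expr_kill = set(), set()
    for dest, expr in block:
        de_expr.add(expr)  # the expression is evaluated here
        # Assigning to dest kills every expression that uses dest as an operand.
        expr_kill |= {e for e in universe if dest in (e[0], e[2])}
        # ...including ones computed earlier in this block (or just now).
        de_expr = {e for e in de_expr if dest not in (e[0], e[2])}
    return de_expr, expr_kill

UNIVERSE = {("a", "+", "b"), ("c", "+", "d"), ("e", "+", "f"),
            ("a", "+", "17"), ("b", "+", "18")}

# Block D of the running example: e <- b+18; s <- a+b; u <- e+f
block_D = [("e", ("b", "+", "18")),
           ("s", ("a", "+", "b")),
           ("u", ("e", "+", "f"))]
de_D, kill_D = local_sets(block_D, UNIVERSE)
```

Note that `e+f` lands in both sets for block D: the assignment to `e` kills the value flowing in from above, and the later evaluation of `e+f` makes it available again at the block's exit.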
Making Theory Concrete
Computing AVAIL for the example
  $\text{AVAIL}(A) = \emptyset$
  $\text{AVAIL}(B) = \{a+b\} \cup (\emptyset \cap \text{all}) = \{a+b\}$
  $\text{AVAIL}(C) = \{a+b\}$
  $\text{AVAIL}(D) = \{a+b, c+d\} \cup (\{a+b\} \cap \text{all}) = \{a+b, c+d\}$
  $\text{AVAIL}(E) = \{a+b, c+d\}$
  $\text{AVAIL}(F) = [\{b+18, a+b, e+f\} \cup (\{a+b, c+d\} \cap \{\text{all} - e+f\})] \cap [\{a+17, c+d, e+f\} \cup (\{a+b, c+d\} \cap \{\text{all} - e+f\})] = \{a+b, c+d, e+f\}$
  $\text{AVAIL}(G) = [\{c+d\} \cup (\{a+b\} \cap \text{all})] \cap [\{a+b, c+d, e+f\} \cup (\{a+b, c+d, e+f\} \cap \text{all})] = \{a+b, c+d\}$
[Figure: control-flow graph of the running example, blocks A through G]

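The listed sets can be checked mechanically by substituting them back into the AVAIL equations. The sketch below assumes the predecessor map that the computations above imply (B and C follow A; D and E follow C; F follows D and E; G follows B and F) and uses invented names throughout; it verifies a fixed point rather than solving for one.

```python
from functools import reduce

UNIVERSE = frozenset({"a+b", "c+d", "e+f", "a+17", "b+18"})
# Assumed predecessor map, read off the slide's equations.
PREDS = {"A": [], "B": ["A"], "C": ["A"], "D": ["C"], "E": ["C"],
         "F": ["D", "E"], "G": ["B", "F"]}
DE_EXPR = {"A": {"a+b"}, "B": {"c+d"}, "C": {"a+b", "c+d"},
           "D": {"b+18", "a+b", "e+f"}, "E": {"a+17", "c+d", "e+f"},
           "F": {"a+b", "c+d", "e+f"}, "G": {"a+b", "c+d"}}
EXPR_KILL = {"A": set(), "B": set(), "C": set(), "D": {"e+f"},
             "E": {"e+f"}, "F": set(), "G": set()}
# The solution stated on the slide.
AVAIL = {"A": frozenset(), "B": frozenset({"a+b"}), "C": frozenset({"a+b"}),
         "D": frozenset({"a+b", "c+d"}), "E": frozenset({"a+b", "c+d"}),
         "F": frozenset({"a+b", "c+d", "e+f"}), "G": frozenset({"a+b", "c+d"})}

def flows_out(x, avail):
    """DEExpr(x) ∪ (AVAIL(x) ∩ complement of ExprKill(x))."""
    return frozenset(DE_EXPR[x]) | (avail[x] & (UNIVERSE - frozenset(EXPR_KILL[x])))

def is_fixed_point(avail):
    """True if every block's set equals the right-hand side of its equation."""
    if avail["A"] != frozenset():
        return False
    for b, ps in PREDS.items():
        if not ps:
            continue
        rhs = reduce(lambda s, t: s & t, (flows_out(x, avail) for x in ps), UNIVERSE)
        if rhs != avail[b]:
            return False
    return True
```

Perturbing any one set, for example dropping `c+d` from `AVAIL["G"]`, makes the check fail, since the equations on this acyclic graph determine each set uniquely.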
Redundancy Elimination Wrap-up

  Algorithm                       Acronym       Credits
  Local Value Numbering           LVN           Balke, 1967
  Superlocal Value Numbering      SVN           Many
  Dominator-based Value Num’g     DVNT          Simpson, 1996
  Global CSE (with AVAIL)         GCSE          Cocke, 1970
  SCC-based Value Numbering †     SCCVN/VDCM    Simpson, 1996
  Partitioning Algorithm †        AWZ           Alpern et al, 1988
  … and there are many others …
  † We have not seen these ones (yet).

Three general approaches
  Hash-based, bottom-up techniques
  Data-flow techniques
  Partitioning
Each has strengths & weaknesses

Making Theory Concrete
Comparing the techniques
[Figure: control-flow graph of the running example, blocks A through G, annotated with the labels LVN, SVN, DVN, and GRE]
The VN methods are ordered
  LVN ≤ SVN ≤ DVN (≤ SCCVN)
GRE is different
  o Based on names, not value
  o Two phase algorithm
      Analysis
      Replacement

Redundancy Elimination Wrap-up
Comparisons
  Better results in loops

Redundancy Elimination Wrap-up
Generalizations
  Hash-based methods are fastest
  AWZ (& SCCVN) find the most cases
  Expect better results with larger scope
Experimental data
  Ran LVN, SVN, DVNT, AWZ
  Used global name space for DVNT
    Requires offline replacement
    Exposes more opportunities
  Code was compiled with lots of optimization
How did they do?
  DVNT beat AWZ
  Improvements grew with scope
  DVNT vs. SCCVN was ± 1%
  DVNT 6x faster than SCCVN
  SCCVN 2.5x faster than AWZ
Note: AWZ is the partitioning method based on DFA minimization

Redundancy Elimination Wrap-up
Conclusions
  Redundancy elimination has some depth & subtlety
  Variations on names, algorithms & analysis matter
  Compile-time speed does not have to sacrifice code quality
DVNT is probably the method of choice
  Results quite close to the global methods (± 1%)
  Much lower costs than SCCVN or AWZ

Lattice Theory
  This stuff is somewhat dry
  Everybody stand up and stretch