Static Single Assignment March 20, 2008
Last Week … Data-Flow Analysis Gather conservative, approximate information about what a program does Result: some property that holds every time the instruction executes The Data-Flow Abstraction Execution of an instruction transforms program state To analyze a program, we must consider all possible sequences of program points (paths) Summarize all possible program states with finite set of facts
The General Approach Setting up and solving systems of equations that relate information at various points in the program such as out[S] = gen[S] È ( in[S] - kill[S] ) where S is a statement in[S] and out[S] are information before and after S gen[S] and kill[S] are information generated and killed by S definition of in, out, gen, and kill depends on the desired information
Data-Flow Analysis (cont.) Properties: either a forward analysis (out as function of in) or a backward analysis (in as a function of out). either an “along some path” problem or an “along all paths” problem. Data-flow analysis must be conservative Definitions: point between two statements (or before the first statements and after the last) path is a sequence of consecutive points in the control-flow graph
Reaching Definitions Reaching definitions: set of definitions that may reach (along one or more paths) a given point gen[S]: definition d is in gen[S] if d may reach the end of S, independently of whether it reaches the beginning of S. kill[S]: the set of definitions that never reach the end of S, even if they reach the beginning. Equations: in[S] = È (P a predecessor of S) out[P ] out[S] = gen[S] È ( in[S] - kill[S] )
Reaching Definitions (cont.) Algorithm: for each basic block B: out[B] := gen[B]; (1) do change := false; for each basic block B do in[B] = È (P a predecessor of B) out[P ]; (2) old-out = out[B]; (3) out[B] = gen[B] È (in[B] - kill[B]); (4) if (out[B] != old-out) then change := true; (5) end while change
Example for Reaching Definitions Compute gen/kill and iterate (visiting order: b1, b2, b3, b4) gen[b1] := {d1, d2, d3} kill[b1] := {d4, d5, d6, d7} gen[b2] := {} kill[b2] := {} gen[b3] := {} kill[b3] := {} gen[b4] := {} kill[b4] := {} i := m-1 d1 j := n d2 a := u1 d3 i := i+1 d4 j := j-1 d5 b1 b2 a := u2 d6 b3 i := u3 d7 b4 b1 b2 b3 b4 initial in[B] 000 0000 out[B] 000 0000 pass1 in[B] 000 0000 out[B] 000 0000 pass2 in[B] 000 0000 out[B] 000 0000 pass3 in[B] 000 0000 out[B] 000 0000
Generalizations: Other Data-Flow Analyses Reaching definitions is a (forward; some-path) analysis For backward analysis: interchange in / out sets in the previous algorithm, lines (1-5) For all-path analysis: intersection is substituted for union in line (2)
Common Subexpression Elimination Rule used to eliminate subexpression within a basic block The subexpression was already defined The value of the subexpression is not modified i.e. none of the values needed to compute the subexpression are redefined What about eliminating subexpressions across basic blocks?
Available Expressions An expression x+y is available at a point p: if every path from the initial node to p evaluates x+y, and after the last such evaluation, prior to reaching p, there are no subsequent assignments to x or y. Definitions: forward, all-path, e-gen[S]: expressions definitely generated by S, e.g. “z := x+y”: expression “x+y” is generated e-kill[S]: expressions that may be killed by S e.g. “z := x+y”: all expression containing “z” are killed. order: compute e-gen and then e-kill, e.g. “x:= x+y”
Available Expressions (cont.) Algorithm: for each basic block B: out[B] := e-gen[B]; (1) do change := false; for each basic block B do in[B] = Ç (P a predecessor of B) out[P]; (2) old-out = out[B]; (3) out[B] = e-gen[B] È (in[B] - e-kill[B]); (4) if (out[B] != old-out) then change := true; (5) end while change difference: line (2), use intersection instead of union
Pointer Analysis Identify the memory locations that may be addressed by a pointer may be formalized as a system of data-flow equations. Simple programming model: pointer to integer (or float, arrays of integer, arrays of float) no pointer to pointers allowed Definitions: in[S]: the set of pairs (p, a), where p is a pointer, a is a variable, and p might point to a before statement S. out[S]: the set of pairs (p, a), where p might point to a after statement S. gen[S]: the new pairs (p, a) generated by the statement S. kill[S]: the pairs (p, a) killed by the statement S.
Pointer Analysis (cont.) input set S: a=b+c gen [S ] = { } kill[S ] = { } input set S: p = &a gen [S ] = { (p, a) } kill[S, input set ] = { (p, b) | (p, b) is in input set } input set S: p = q gen [S, input set ] = { (p, b) | (q, b) is in input set } kill[S, input set ] = { (p, b) | (p, b) is in input set }
Pointer Analysis (cont.) Algorithm: for each basic block B: out[B] := gen [B ]; (1) do change := false; for each basic block B do in[B] = È (P a predecessor of B) out[P]; (2) old-out = out[B]; (3) out[B] = gen[B, in[B] ] È (in[B] - kill[B, in[B] ] ) (4) if (out[B] != old-out) then change := true; (5) end while change difference: line (4): gen and kill are functions of B and in[B].
Summary Iterative algorithm: solve data-flow problem for arbitrary control flow graph To solve a new data-flow problem: define gen/kill accordingly determine properties: forward / backward some-path / all-path
Static Single Assignment March 20, 2008
What is SSA? Many data-flow problems have been formulated To limit the number of analyses, use a single analysis to perform multiple transformations SSA Compiler optimization algorithms enhanced or enabled by SSA: Constant propagation Dead code elimination Global value numbering Partial redundancy elimination Register allocation
High-Level Definition Each assignment to a variable is given a unique name All of the uses reached by that assignment are renamed Easy for straight-line code V 4 V0 4 V + 5 V0 + 5 V 6 V1 6 V + 7 V1 + 7 What about control flow? -nodes: combine values from distinct edges (join points)
What About Control Flow? if (…) X 5 X 3 Y X B1 B2 B3 B4 if (…) X0 5 X1 3 X2 (X0, X1) Y0 X2 B1 B4 B2 B3 Before SSA After SSA
What About Control Flow? X 1 X X + 1 B1 B2 B1 X0 1 X1 (X2, X0) X2 X1 + 1 B2 Before SSA After SSA
SSA Construction Algorithm High-Level Sketch 1. Insert -functions 2. Rename values
SSA Construction Algorithm 1. Insert -functions at every join for every name 2. Solve reaching definitions 3. Rename each use to the def that reaches it (will be unique) What’s wrong with this approach Too many -functions Precision, Space, Time Need to relate edges to -functions parameters To do better, we need a more complex approach Builds maximal SSA
SSA Construction Algorithm 1. Insert -functions a.) calculate dominance frontiers b.) find global names for each name, build a list of blocks that define it c.) insert -functions global name n block b in which n is assigned block d in b’s dominance frontier insert a -function for n in d add d to n’s list of defining blocks Moderately complex Compute list of blocks where each name is assigned & use as a worklist This adds to the worklist ! { Creates the iterated dominance frontier Use a checklist to avoid putting blocks on the worklist twice; keep another checklist to avoid inserting the same -function twice.
SSA Construction Algorithm 2. Rename variables in a pre-order walk over dominator tree (use an array of stacks, one stack per global name) Staring with the root block, b a.) generate unique names for each -function and push them on the appropriate stacks b.) rewrite each operation in the block i. Rewrite uses of global names with the current version (from the stack) ii. Rewrite definition by inventing & pushing new name c.) fill in -function parameters of successor blocks d.) recurse on b’s children in the dominator tree e.) <on exit from block b > pop names generated in b from stacks
SSA Construction Algorithm Computing Dominance First step in -function insertion computes dominance Recall that n dominates m iff n is on every path from n0 to m Every node dominates itself n’s immediate dominator is its closest dominator, IDOM(n) DOM(n0 ) = { n0 } DOM(n) = { n } (ppreds(n) DOM(p)) Initially, DOM(n) = N, n≠n0
Dominance Frontiers Dominance Frontiers B1 B2 B3 B4 B5 B6 B7 B0 B1 B2 B3 B4 B5 B6 B7 B0 Dominance Frontiers V dominates some predecessor of w V does not strictly dominate w Intuitively: The fringe just beyond the region V dominates Dominance Tree Flow Graph n B0 B1 B2 B3 B4 B5 B6 B7 DOM(n) 0,1 0,1,2 0,1,3 0,1,3,4 0,1,3,5 0,1,3,6 0,1,7 IDOM(n) -- 1 3 DF(n) 7 6
Example Dominance Frontiers Dominance Frontiers & -Function Insertion A definition at n forces a -function at m iff n DOM(m) but n DOM(p) for some p preds(m) DF(n ) is fringe just beyond region n dominates B1 B2 B3 B4 B5 B6 B7 B0 x (...) in 7 forces -function in DF(7) = {1} x ... x (...) DF(4) is {6}, so in 4 forces -function in 6 x (...) in 6 forces -function in DF(6) = {7} Dominance Frontiers in 1 forces -function in DF(1) = Ø (halt ) For each assignment, we insert the -functions
Example Computing Dominance Frontiers Dominance Frontiers Only join points are in DF(n) for some n Leads to a simple, intuitive algorithm for computing dominance frontiers For each join point x (i.e., |preds(x)| > 1) For each CFG predecessor of x Run up to IDOM(x ) in the dominator tree, adding x to DF(n) for each n between x and IDOM(x ) B1 B2 B3 B4 B5 B6 B7 B0 For some applications, we need post-dominance, the post-dominator tree, and reverse dominance frontiers, RDF(n ) Just dominance on the reverse CFG Reverse the edges & add unique exit node We will use these in dead code elimination Dominance Frontiers
SSA Construction Algorithm (Reminder) 1. Insert -functions at every join for every name a.) calculate dominance frontiers b.) find global names for each name, build a list of blocks that define it c.) insert -functions global name n block b in which n is assigned block d in b’s dominance frontier insert a -function for n in d add d to n’s list of defining blocks Needs a little more detail
SSA Construction Algorithm Finding global names Different between two forms of SSA Minimal uses all names Semi-pruned uses names that are live on entry to some block Shrinks name space & number of -functions Pays for itself in compile-time speed For each “global name”, need a list of blocks where it is defined Drives -function insertion b defines x implies a -function for x in every c DF(b) Pruned SSA adds a test to see if x is live at insertion point Otherwise, don’t need a -function
Excluding local names avoids ’s for y & z a (a,a) b (b,b) c (c,c) d (d,d) y a+b z c+d i i+1 B7 i > 100 i ... B0 b ... c ... d ... B2 c (c,c) i (i,i) a ... B1 B3 B4 B5 B6 Excluding local names avoids ’s for y & z With all the -functions Lots of new ops Renaming is next Assume a, b, c, & d defined before B0
SSA Construction Algorithm (reminder) 2. Rename variables in a pre-order walk over dominator tree (use an array of stacks, one stack per global name) Staring with the root block, b a.) generate unique names for each -function and push them on the appropriate stacks b.) rewrite each operation in the block i. Rewrite uses of global names with the current version (from the stack) ii. Rewrite definition by inventing & pushing new name c.) fill in -function parameters of successor blocks d.) recurse on b’s children in the dominator tree e.) <on exit from block b > pop names generated in b from stacks
Before processing B0 Assume a, b, c, & d defined before B0 a (a,a) b (b,b) c (c,c) d (d,d) y a+b z c+d i i+1 B7 i > 100 i ... B0 b ... c ... d ... B2 i (i,i) a ... B1 B3 B4 B5 B6 Before processing B0 Assume a, b, c, & d defined before B0 i has not been defined a b c d i Counters Stacks 1 1 1 1 a0 b0 c0 d0 20
End of B0 a b c d i Counters Stacks 1 1 1 1 1 21 a0 b0 c0 d0 i0 B0 a (a0,a) b (b0,b) c (c0,c) d (d0,d) i (i0,i) a ... c ... B1 End of B0 b ... c ... d ... B2 a ... d ... B3 d ... B4 c ... B5 d (d,d) c (c,c) b ... B6 a b c d i Counters Stacks 1 1 1 1 1 B7 a (a,a) b (b,b) c (c,c) d (d,d) y a+b z c+d i i+1 a0 b0 c0 d0 i0 i > 100 21
End of B1 a b c d i Counters Stacks 3 2 3 2 2 22 a0 b0 c0 d0 i0 a1 b1 a1 (a0,a) b1 (b0,b) c1 (c0,c) d1 (d0,d) i1 (i0,i) a2 ... c2 ... B1 End of B1 b ... c ... d ... B2 a ... d ... B3 d ... B4 c ... B5 d (d,d) c (c,c) b ... B6 a b c d i Counters Stacks 3 2 3 2 2 B7 a (a,a) b (b,b) c (c,c) d (d,d) y a+b z c+d i i+1 a0 b0 c0 d0 i0 a1 b1 c1 d1 i1 a2 c2 i > 100 22
End of B2 a b c d i Counters Stacks 3 3 4 3 2 23 a0 b0 c0 d0 i0 a1 b1 a1 (a0,a) b1 (b0,b) c1 (c0,c) d1 (d0,d) i1 (i0,i) a2 ... c2 ... B1 End of B2 b2 ... c3 ... d2 ... B2 a ... d ... B3 d ... B4 c ... B5 d (d,d) c (c,c) b ... B6 a b c d i Counters Stacks 3 3 4 3 2 B7 a (a2,a) b (b2,b) c (c3,c) d (d2,d) y a+b z c+d i i+1 a0 b0 c0 d0 i0 a1 b1 c1 d1 i1 a2 b2 c2 d2 c3 i > 100 23
Before starting B3 a b c d i Counters Stacks 3 3 4 3 2 24 a0 b0 c0 d0 a1 (a0,a) b1 (b0,b) c1 (c0,c) d1 (d0,d) i1 (i0,i) a2 ... c2 ... B1 Before starting B3 b2 ... c3 ... d2 ... B2 a ... d ... B3 d ... B4 c ... B5 d (d,d) c (c,c) b ... B6 a b c d i Counters Stacks 3 3 4 3 2 B7 a (a2,a) b (b2,b) c (c3,c) d (d2,d) y a+b z c+d i i+1 i ≤ 100 a0 b0 c0 d0 i0 a1 b1 c1 d1 i1 a2 c2 i > 100 24
End of B3 a b c d i Counters Stacks 4 3 4 4 2 25 a0 b0 c0 d0 i0 a1 b1 a1 (a0,a) b1 (b0,b) c1 (c0,c) d1 (d0,d) i1 (i0,i) a2 ... c2 ... B1 End of B3 b2 ... c3 ... d2 ... B2 a3 ... d3 ... B3 d ... B4 c ... B5 d (d,d) c (c,c) b ... B6 a b c d i Counters Stacks 4 3 4 4 2 B7 a (a2,a) b (b2,b) c (c3,c) d (d2,d) y a+b z c+d i i+1 a0 b0 c0 d0 i0 a1 b1 c1 d1 i1 a2 c2 d3 a3 i > 100 25
End of B4 a b c d i Counters Stacks 4 3 4 5 2 26 a0 b0 c0 d0 i0 a1 b1 a1 (a0,a) b1 (b0,b) c1 (c0,c) d1 (d0,d) i1 (i0,i) a2 ... c2 ... B1 End of B4 b2 ... c3 ... d2 ... B2 a3 ... d3 ... B3 d4 ... B4 c ... B5 d (d4,d) c (c2,c) b ... B6 a b c d i Counters Stacks 4 3 4 5 2 B7 a (a2,a) b (b2,b) c (c3,c) d (d2,d) y a+b z c+d i i+1 a0 b0 c0 d0 i0 a1 b1 c1 d1 i1 a2 c2 d3 a3 d4 i > 100 26
End of B5 a b c d i Counters Stacks 4 3 5 5 2 27 a0 b0 c0 d0 i0 a1 b1 a1 (a0,a) b1 (b0,b) c1 (c0,c) d1 (d0,d) i1 (i0,i) a2 ... c2 ... B1 End of B5 b2 ... c3 ... d2 ... B2 a3 ... d3 ... B3 d4 ... B4 c4 ... B5 d (d4,d3) c (c2,c4) b ... B6 a b c d i Counters Stacks 4 3 5 5 2 B7 a (a2,a) b (b2,b) c (c3,c) d (d2,d) y a+b z c+d i i+1 a0 b0 c0 d0 i0 a1 b1 c1 d1 i1 a2 c2 d3 a3 c4 i > 100 27
End of B6 a b c d i Counters Stacks 4 4 6 6 2 28 a0 b0 c0 d0 i0 a1 b1 a1 (a0,a) b1 (b0,b) c1 (c0,c) d1 (d0,d) i1 (i0,i) a2 ... c2 ... B1 End of B6 b2 ... c3 ... d2 ... B2 a3 ... d3 ... B3 d4 ... B4 c4 ... B5 d5 (d4,d3) c5 (c2,c4) b3 ... B6 a b c d i Counters Stacks 4 4 6 6 2 B7 a (a2,a3) b (b2,b3) c (c3,c5) d (d2,d5) y a+b z c+d i i+1 a0 b0 c0 d0 i0 a1 b1 c1 d1 i1 a2 b3 c2 d3 a3 c5 d5 i > 100 28
Before B7 a b c d i Counters Stacks 4 4 6 6 2 29 a0 b0 c0 d0 i0 a1 b1 a1 (a0,a) b1 (b0,b) c1 (c0,c) d1 (d0,d) i1 (i0,i) a2 ... c2 ... B1 Before B7 b2 ... c3 ... d2 ... B2 a3 ... d3 ... B3 d4 ... B4 c4 ... B5 d5 (d4,d3) c5 (c2,c4) b3 ... B6 a b c d i Counters Stacks 4 4 6 6 2 B7 a (a2,a3) b (b2,b3) c (c3,c5) d (d2,d5) y a+b z c+d i i+1 a0 b0 c0 d0 i0 a1 b1 c1 d1 i1 a2 c2 i > 100 29
End of B7 a b c d i Counters Stacks 5 5 7 7 3 30 a0 b0 c0 d0 i0 a1 b1 a1 (a0,a4) b1 (b0,b4) c1 (c0,c6) d1 (d0,d6) i1 (i0,i2) a2 ... c2 ... B1 End of B7 b2 ... c3 ... d2 ... B2 a3 ... d3 ... B3 d4 ... B4 c4 ... B5 d5 (d4,d3) c5 (c2,c4) b3 ... B6 a b c d i Counters Stacks 5 5 7 7 3 B7 a4 (a2,a3) b4 (b2,b3) c6 (c3,c5) d6 (d2,d5) y a4+b4 z c6+d6 i2 i1+1 a0 b0 c0 d0 i0 a1 b1 c1 d1 i1 a2 b4 c2 d6 i2 a4 c6 i > 100 30
Semi-pruned only names live in 2 or more blocks are “global names”. a1 (a0,a4) b1 (b0,b4) c1 (c0,c6) d1 (d0,d6) i1 (i0,i2) a2 ... c2 ... B1 After renaming Semi-pruned SSA form We’re finished … b2 ... c3 ... d2 ... B2 a3 ... d3 ... B3 d4 ... B4 c4 ... B5 d5 (d4,d3) c5 (c2,c4) b3 ... B6 B7 a4 (a2,a3) b4 (b2,b3) c6 (c3,c5) d6 (d2,d5) y a4+b4 z c6+d6 i2 i1+1 Semi-pruned only names live in 2 or more blocks are “global names”. i > 100 31
SSA Construction Algorithm (Pruned SSA) What’s this “pruned SSA” stuff? Minimal SSA still contains extraneous -functions Inserts some -functions where they are dead Would like to avoid inserting them Two ideas Semi-pruned SSA: discard names used in only one block Significant reduction in total number of -functions Needs only local Live information (cheap to compute) Pruned SSA: only insert -functions where value is live Inserts even fewer -functions, but costs more to do Requires global Live variable analysis (more expensive) In practice, both are simple modifications to step 1.
SSA Deconstruction At some point, we need executable code Few machines implement operations Need to fix up the flow of values Basic idea Insert copies -function pred’s Simple algorithm Works in most cases Adds lots of copies Most of them coalesce away X17 (x10,x11) ... x17 ... ... x17 X17 x10 X17 x11
Summary Static Single Assignment is a standard form for intermediate code Used in many optimizations Simple … except for join points