Optimizations using SSA CS 671 March 25, 2008
Last Time – SSA Form
  Generating SSA form
    Inserting φ-functions using dominance frontiers
    Renaming variables
  [Figure: the same four-block CFG before and after conversion to SSA]
    Before SSA: B1: if (…)   B2: X ← 5    B3: X ← 3    B4: Y ← X
    After SSA:  B1: if (…)   B2: X0 ← 5   B3: X1 ← 3   B4: X2 ← φ(X0, X1); Y0 ← X2
Dominance Tree
  [Figure: renaming example on a CFG with blocks B0–B7 and variables a, b, c, d, i, shown next to its dominance tree and the per-variable counters and stacks used during renaming. The renamed code contains φ-functions such as a1 ← φ(a0, a4), b1 ← φ(b0, b4), c1 ← φ(c0, c6), d1 ← φ(d0, d6), i1 ← φ(i0, i2), d5 ← φ(d4, d3), c5 ← φ(c2, c4), a4 ← φ(a2, a3), b4 ← φ(b2, b3), c6 ← φ(c3, c5), d6 ← φ(d2, d5), followed by y ← a4 + b4, z ← c6 + d6, i2 ← i1 + 1 and the loop test i > 100.]
Today – Using SSA in Optimizations SSA simplifies many optimization algorithms Simplifies def-use chains Examples: Dead code elimination, constant propagation
Dead-Code Elimination
  Dead code is either:
    Unreachable code
    Assignments where the result is never used
  Examples
    “y in 1” is dead
    “x in 1” is partially dead: dead along path 1-2-4 but not along 1-3-4
    “z in 4,5” is never used in relevant computations (here only x and y are relevant)
  [Figure: CFG with nodes 1–6 containing the statements x=a+b, y=c+d, x=, z=z+1, y=, z=x+y, out(x,y)]
Dead-Code Elimination
  SSA makes dead-code analysis particularly simple
  Defn: A variable is live at its definition iff its list of uses is not empty
    There can be no other definition of the variable
    The definition of a variable dominates every use
    So there must be a path from definition to use

    while (there is some variable v with no uses &&
           the statement that defines v has no side effects)
        delete the statement that defines v

  When deleting v ← x ⊕ y or v ← φ(x, y), remove the deleted statement from the use lists of x and y
Dead-Code Elimination in SSA Form
    W ← a list of all variables in SSA program
    while W is not empty
        remove some variable v from W
        if v’s list of uses is empty
            let S be v’s statement of definition
            if S has no other side effects
                delete S from the program
                for each variable xi used by S
                    delete S from the list of uses of xi
                    W ← W ∪ {xi}
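The work-list algorithm above can be sketched in Python as follows. This is a minimal illustration, not the lecture's code: the `Stmt` record, its field names, and the list-of-statements program representation are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Stmt:
    target: str                     # SSA name defined here ("" if the statement defines nothing)
    operands: tuple                 # SSA names used by the statement
    has_side_effects: bool = False  # e.g. I/O or a call (hypothetical flag)

def dead_code_elimination(program):
    """Return the statements that survive work-list dead-code elimination."""
    live = dict(enumerate(program))                       # statement id -> Stmt
    def_site = {s.target: i for i, s in live.items() if s.target}
    uses = {v: set() for v in def_site}                   # variable -> ids of statements using it
    for i, s in live.items():
        for x in s.operands:
            uses.setdefault(x, set()).add(i)

    worklist = list(def_site)                             # W <- all variables
    while worklist:
        v = worklist.pop()
        i = def_site[v]
        if uses[v] or i not in live:                      # still used, or already deleted
            continue
        s = live[i]
        if s.has_side_effects:
            continue
        del live[i]                                       # delete S from the program
        for x in s.operands:
            uses[x].discard(i)                            # delete S from the use list of x
            if x in def_site:
                worklist.append(x)                        # W <- W U {x}
    return [live[i] for i in sorted(live)]

program = [
    Stmt("a1", ()),                 # a1 <- 1
    Stmt("b1", ("a1",)),            # b1 <- a1 + 1
    Stmt("c1", ("b1",)),            # c1 <- b1 * 2   (never used)
    Stmt("", ("a1",), True),        # print(a1)
]
survivors = dead_code_elimination(program)
```

Deleting `c1` empties the use list of `b1`, so `b1` is deleted in turn; `a1` survives because the side-effecting statement still uses it.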
Simple Constant Propagation
  For any statement of the form v ← c for some constant c
    Any use of v can be replaced with a use of c
  Any φ-function of the form v ← φ(c1, c2, …, cn) where all the ci are equal can be replaced by v ← c
  Easy to detect using SSA
  Easy to implement using a work-list algorithm
Simple Constant Prop in SSA Form
    W ← a list of all statements in SSA program
    while W is not empty
        remove some statement S from W
        if S is v ← φ(c, c, …, c) for some constant c
            replace S by v ← c
        if S is v ← c for some constant c
            delete S from the program
            for each statement T that uses v
                substitute c for v in T
                W ← W ∪ {T}
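A minimal Python sketch of this work-list algorithm, run on the deck's running example. The dictionary representation (SSA name mapped to a `(kind, args)` pair, with `"copy"` standing for both `v ← c` and `v ← x`) is an assumption made for illustration. The sketch scans all statements to find uses; a real implementation would keep use lists to stay linear.

```python
def simple_constant_propagation(program):
    """program: {name: (kind, args)}; args holds SSA names (str) or int constants.
    Returns (remaining statements, {name: constant})."""
    stmts = {v: (kind, list(args)) for v, (kind, args) in program.items()}
    constants = {}
    worklist = list(stmts)
    while worklist:
        v = worklist.pop()
        if v not in stmts:
            continue
        kind, args = stmts[v]
        # v <- c, or v <- phi(c, c, ..., c) with all arguments the same constant
        is_const = (kind in ("copy", "phi")
                    and all(isinstance(a, int) for a in args)
                    and len(set(args)) == 1)
        if not is_const:
            continue
        c = args[0]
        constants[v] = c
        del stmts[v]                                   # delete S from the program
        for t, (_, t_args) in stmts.items():
            if v in t_args:                            # T uses v
                t_args[:] = [c if a == v else a for a in t_args]
                worklist.append(t)                     # W <- W U {T}
    return stmts, constants

example = {
    "i1": ("copy", [1]), "j1": ("copy", [1]), "k1": ("copy", [0]),
    "j2": ("phi", ["j4", "j1"]), "k2": ("phi", ["k4", "k1"]),
    "j3": ("copy", ["i1"]), "k3": ("+", ["k2", 1]),
    "j5": ("copy", ["k2"]), "k5": ("+", ["k2", 2]),
    "j4": ("phi", ["j3", "j5"]), "k4": ("phi", ["k3", "k5"]),
}
remaining, constants = simple_constant_propagation(example)
```

On this input only `i1`, `j1`, `k1` and `j3` are found to be constant; `j2` and `j4` remain as φ-functions with one non-constant argument each.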
Other Transformations … Can be incorporated into the work-list algorithm All can be done in linear time Examples Copy propagation Constant conditions Unreachable code
Copy Propagation
  A single-argument φ-function x ← φ(y) or a copy assignment x ← y can be deleted, and y substituted for every use of x
  [Figure: the running example in SSA form, one block per line]
    i1 ← 1;  j1 ← 1;  k1 ← 0
    j2 ← φ(j4, j1);  k2 ← φ(k4, k1);  if k2 < 100
    if j2 < 20
    return j2
    j3 ← i1;  k3 ← k2 + 1
    j5 ← k2;  k5 ← k2 + 2
    j4 ← φ(j3, j5);  k4 ← φ(k3, k5)
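As a rough Python sketch of copy propagation, under the same assumed representation (SSA name mapped to `(kind, args)`, which is not from the lecture): every copy or single-argument φ is dropped and its source substituted into the remaining statements. The sketch assumes well-formed SSA, so chains of copies contain no cycles.

```python
def copy_propagation(program):
    """program: {name: (kind, args)}; returns the program with copies removed."""
    # x <- y and x <- phi(y) both mean "x is just another name for y".
    replacement = {
        v: args[0]
        for v, (kind, args) in program.items()
        if kind in ("copy", "phi") and len(args) == 1 and isinstance(args[0], str)
    }

    def resolve(name):
        # Follow chains such as x <- y, y <- z down to the original name.
        while name in replacement:
            name = replacement[name]
        return name

    return {
        v: (kind, [resolve(a) if isinstance(a, str) else a for a in args])
        for v, (kind, args) in program.items()
        if v not in replacement
    }

program = {
    "j3": ("copy", ["i1"]),
    "j5": ("copy", ["k2"]),
    "j4": ("phi", ["j3", "j5"]),
    "r1": ("phi", ["j4"]),          # single-argument phi (hypothetical extra statement)
    "s1": ("+", ["r1", 1]),
}
result = copy_propagation(program)
```

Here `j4` becomes φ(i1, k2) and the use of `r1` in `s1` is replaced by `j4`.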
Constant Conditions
  if (a < b) goto L1 else L2, where a and b are constant, becomes goto L1 (or goto L2)
  The extraneous control-flow edge must be deleted
  φ-functions in the block that lost the edge must be adjusted: the block has one fewer predecessor, so each φ drops the corresponding argument
  [Figure: j = 1; if (j < 20) goto L1 else goto L2]
Unreachable Code
  Deleting a predecessor may cause L2 to become unreachable
    All statements in L2 can be deleted
    Use-lists of all variables used in L2 must be adjusted
    L2 can be deleted (and its successors updated)
  [Figure: j = 1; goto L1, with L2 no longer reachable]
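The two clean-up steps (folding a constant branch, then deleting blocks that became unreachable) might look like this in Python. The CFG encoding is an assumption for illustration: `succ` maps a block to its successor list, and `phis` maps a block to its φ-functions, each stored as a dictionary from predecessor block to argument name.

```python
def reachable(succ, entry):
    """Blocks reachable from entry by a depth-first walk."""
    seen, stack = {entry}, [entry]
    while stack:
        for s in succ[stack.pop()]:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen

def drop_phi_args(phis, block, pred):
    """The edge pred -> block is gone: remove the matching argument of every phi."""
    for args in phis.get(block, {}).values():
        args.pop(pred, None)

def fold_constant_branch(succ, phis, block, taken):
    """Replace the conditional branch ending `block` with a jump to `taken`."""
    for target in succ[block]:
        if target != taken:
            drop_phi_args(phis, target, block)
    succ[block] = [taken]

def remove_unreachable(succ, phis, entry):
    """Delete unreachable blocks and adjust phis in their successors."""
    live = reachable(succ, entry)
    for b in [b for b in succ if b not in live]:
        for target in succ[b]:
            drop_phi_args(phis, target, b)
        del succ[b]
        phis.pop(b, None)

# B0 ends in "if (j < 20) goto L1 else goto L2" with j known to be 1.
succ = {"B0": ["L1", "L2"], "L1": ["L3"], "L2": ["L3"], "L3": []}
phis = {"L3": {"x3": {"L1": "x1", "L2": "x2"}}}
fold_constant_branch(succ, phis, "B0", "L1")
remove_unreachable(succ, phis, "B0")
```

After both steps L2 is gone and the φ in L3 has a single argument, which makes it a candidate for copy propagation.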
Conditional Constant Propagation
  Is j always equal to 1?
  Simple constant propagation missed this opportunity!
  [Figure: the running example before SSA, one block per line]
    i ← 1;  j ← 1;  k ← 0
    if k < 100
    if j < 20
    return j
    j ← i;  k ← k + 1
    j ← k;  k ← k + 2
SSA Conditional Constant Propagation Keeps track of the result of conditional branches Only propagate definitions when the flow graph is marked executable When propagating constants, ignore edges at join nodes that are not executable. Does not assume that a variable is non-constant until there is evidence Does not assume that we execute a given block until there is evidence
SSA Conditional Constant Propagation
  Uses a lattice:
    [x] = ⊤   No evidence that any assignment to x is executed
    [x] = 4   Evidence of x ← 4 has been seen
    [x] = ⊥   Evidence that x may have two different values
  Tracks the run-time value of variables
  New information can only move a variable down the lattice
  [Figure: lattice with ⊤ (never defined) at the top, the constants … ci cj ck cl cm cn … (defined as c) in the middle, and ⊥ (overdefined) at the bottom]
Constant Propagation (cont.)
  The meet operator ⊓ combines lattice values:
    ⊤ ⊓ x = x
    ci ⊓ cj = ci if ci = cj, otherwise ⊥
    ⊥ ⊓ x = ⊥
  [Figure: lattice values of x and y combined to give the value of z in z = f(x, y)]
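The meet rules can be written as a small Python function. The string sentinels `TOP` and `BOTTOM` are an assumption of this sketch; they are safe only because the constants here are integers.

```python
from functools import reduce

TOP, BOTTOM = "TOP", "BOTTOM"   # never defined / overdefined

def meet(a, b):
    """Greatest lower bound of two lattice values."""
    if a == TOP:
        return b                # TOP meet x = x
    if b == TOP:
        return a
    return a if a == b else BOTTOM   # equal constants stay, anything else falls to BOTTOM

def meet_all(values):
    """Meet of the arguments of a phi-function."""
    return reduce(meet, values, TOP)
```

For instance, `meet_all([1, TOP, 1])` stays at the constant 1, while `meet_all([1, 0])` drops to `BOTTOM`.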
Executability
  Also track the executability of each block:
    [B] = false   We have seen no evidence that block B can ever be executed
    [B] = true    We have seen evidence that block B can be executed
  Start with all blocks: [B] = false
  The start block B1 is executable: [B1] = true
  For any executable block B with one successor C: [C] = true
  For executable branches if x<y goto L1 else L2:
    if [x] = ⊥ or [y] = ⊥, then [L1] = true and [L2] = true
    if [x] and [y] are both constants, only the target selected by the comparison becomes executable
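An Example
  Start with all variables: [x] = ⊤
  Start with all blocks: [B] = false
  Calculate [x] and [B]
  [Figure: the running example in SSA form, blocks numbered 1–7]
    1: i1 ← 1;  j1 ← 1;  k1 ← 0
    2: j2 ← φ(j4, j1);  k2 ← φ(k4, k1);  if k2 < 100
    3: if j2 < 20
    4: return j2
    5: j3 ← i1;  k3 ← k2 + 1
    6: j5 ← k2;  k5 ← k2 + 2
    7: j4 ← φ(j3, j5);  k4 ← φ(k3, k5)
  [Tables to fill in: [x] for x in i1, j1, j2, j3, j4, j5, k1, k2, k3, k4, k5, and [B] for B in 1–7]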
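The example can be worked through with a small Python sketch. It is a simplified fixed-point iteration rather than the usual two-work-list formulation, and it tracks executable blocks (as on these slides) rather than executable edges. The block encoding, the helper names, and the edge directions (block 2 goes to 3 when true and 4 when false; block 3 goes to 5 when true and 6 when false) are assumptions of the sketch.

```python
import operator

TOP, BOTTOM = "TOP", "BOTTOM"            # lattice extremes; constants are ints
OPS = {"+": operator.add, "<": operator.lt}

def meet(a, b):
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOTTOM

def sccp(blocks, entry):
    value, executable = {}, {entry}      # missing entries in value mean TOP

    def val(x):
        return x if isinstance(x, int) else value.get(x, TOP)

    def evaluate(op, a, b):
        a, b = val(a), val(b)
        if BOTTOM in (a, b):
            return BOTTOM
        if TOP in (a, b):
            return TOP
        return OPS[op](a, b)

    changed = True
    while changed:
        changed = False
        for blk in sorted(executable):
            stmts, term = blocks[blk]
            for target, kind, args in stmts:
                if kind == "phi":
                    new = TOP            # ignore arguments from non-executable predecessors
                    for pred, x in args:
                        if pred in executable:
                            new = meet(new, val(x))
                elif kind == "copy":
                    new = val(args[0])
                else:
                    new = evaluate(kind, *args)
                new = meet(value.get(target, TOP), new)   # values only move down
                if new != value.get(target, TOP):
                    value[target] = new
                    changed = True
            if term[0] == "jump":
                targets = [term[1]]
            elif term[0] == "branch":
                _, op, a, b, if_true, if_false = term
                cond = evaluate(op, a, b)
                if cond == BOTTOM:
                    targets = [if_true, if_false]
                elif cond == TOP:
                    targets = []
                else:
                    targets = [if_true if cond else if_false]
            else:
                targets = []             # return
            for t in targets:
                if t not in executable:
                    executable.add(t)
                    changed = True
    return value, executable

EXAMPLE = {
    1: ([("i1", "copy", [1]), ("j1", "copy", [1]), ("k1", "copy", [0])], ("jump", 2)),
    2: ([("j2", "phi", [(7, "j4"), (1, "j1")]), ("k2", "phi", [(7, "k4"), (1, "k1")])],
        ("branch", "<", "k2", 100, 3, 4)),
    3: ([], ("branch", "<", "j2", 20, 5, 6)),
    4: ([], ("return", "j2")),
    5: ([("j3", "copy", ["i1"]), ("k3", "+", ["k2", 1])], ("jump", 7)),
    6: ([("j5", "copy", ["k2"]), ("k5", "+", ["k2", 2])], ("jump", 7)),
    7: ([("j4", "phi", [(5, "j3"), (6, "j5")]), ("k4", "phi", [(5, "k3"), (6, "k5")])],
        ("jump", 2)),
}
values, executable = sccp(EXAMPLE, entry=1)
```

With these assumptions the sketch finds j2 = j3 = j4 = 1 and never marks block 6 executable, while k2, k3 and k4 fall to ⊥; j5 and k5 stay at ⊤ because their block is never reached.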
Using SSA – Dead code elimination
  Conceptually similar to mark-sweep garbage collection
    Mark useful operations
    Everything not marked is useless
  Need an efficient way to find and to mark useful operations
    Start with critical operations
    Work back up SSA edges to find their antecedents
  Define critical: I/O statements, linkage code (entry & exit blocks), return values, calls to other procedures
  Algorithm will use post-dominators & reverse dominance frontiers
Using SSA – Dead code elimination
  Mark
    for each op i
        clear i’s mark
        if i is critical then
            mark i
            add i to WorkList
    while (WorkList ≠ Ø)
        remove i from WorkList   (i has form “x ← y op z”)
        if def(y) is not marked then
            mark def(y)
            add def(y) to WorkList
        if def(z) is not marked then
            mark def(z)
            add def(z) to WorkList
        for each b ∈ RDF(block(i))
            mark the block-ending branch in b
            add it to WorkList
  Sweep
    for each op i
        if i is not marked then
            if i is a branch then
                rewrite with a jump to i’s nearest useful post-dominator
            if i is not a jump then
                delete i
  Notes:
    Eliminates some branches
    Reconnects dead branches to the remaining live code
    Find useful post-dominator by walking post-dom tree
    Entry & exit nodes are useful
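A minimal Python sketch of the Mark phase only. The operation records, the `op` helper, and the precomputed `rdf` and `branch_of` maps are assumptions made for illustration; the Sweep phase, which needs the post-dominator tree to retarget dead branches, is left out.

```python
def op(block, target, uses, critical=False):
    """Hypothetical operation record: defines `target` (or None) from `uses`."""
    return {"block": block, "target": target, "uses": uses, "critical": critical}

def mark(ops, rdf, branch_of):
    """ops: {id: op record}; rdf: {block: set of blocks};
    branch_of: {block: id of the block-ending branch}. Returns the marked ids."""
    def_of = {o["target"]: i for i, o in ops.items() if o["target"]}
    marked = {i for i, o in ops.items() if o["critical"]}
    worklist = list(marked)
    while worklist:
        i = worklist.pop()
        # Definitions of the operands, plus the branches this op is control dependent on.
        needed = [def_of[u] for u in ops[i]["uses"] if u in def_of]
        needed += [branch_of[b] for b in rdf.get(ops[i]["block"], ())]
        for d in needed:
            if d not in marked:
                marked.add(d)
                worklist.append(d)
    return marked

# B0 -> B1, B1 -> B1 or B2. The loop in B1 only counts; B2 returns x0.
ops = {
    0: op("B0", "x0", []),
    1: op("B0", "i0", []),
    2: op("B1", "i1", ["i0", "i2"]),          # i1 <- phi(i0, i2)
    3: op("B1", "i2", ["i1"]),                # i2 <- i1 + 1
    4: op("B1", None, ["i2"]),                # branch on i2 (block-ending branch of B1)
    5: op("B2", None, ["x0"], critical=True), # return x0
}
rdf = {"B0": set(), "B1": {"B1"}, "B2": set()}
useful = mark(ops, rdf, branch_of={"B1": 4})
dead = set(ops) - useful
```

Only the return and the definition of `x0` are marked. The whole counting loop, including its branch, is left unmarked, so Sweep would rewrite that branch as a jump.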
Using SSA – Dead code elimination
  When is a branch useful?
    When a useful operation depends on its existence
  j is control dependent on i if one path from i leads to j and another doesn’t
    The set of such blocks i is the reverse dominance frontier of j (RDF(j))
  Algorithm uses RDF(n) to mark branches as live
  In the CFG, j is control dependent on i if
    1. ∃ a non-null path p from i to j such that j post-dominates every node on p after i
    2. j does not strictly post-dominate i
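The definition translates fairly directly into Python. The sketch below computes post-dominator sets by simple iteration and then applies the edge form of the definition: i is in RDF(j) when j post-dominates some successor of i but does not strictly post-dominate i. The function names and the successor-map encoding are assumptions, and the code assumes a single exit block that every other block can reach.

```python
def post_dominators(succ, exit_node):
    """pdom[n] = set of blocks that post-dominate n (including n itself)."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom

def reverse_dominance_frontier(succ, exit_node):
    """rdf[j] = blocks i such that j is control dependent on i."""
    pdom = post_dominators(succ, exit_node)
    rdf = {n: set() for n in succ}
    for i in succ:
        for s in succ[i]:
            for j in pdom[s]:                    # j post-dominates a successor of i
                if j == i or j not in pdom[i]:   # j does not strictly post-dominate i
                    rdf[j].add(i)
    return rdf

# Diamond: block 1 branches to 2 or 3, both of which jump to 4.
diamond = {1: [2, 3], 2: [4], 3: [4], 4: []}
rdf_diamond = reverse_dominance_frontier(diamond, exit_node=4)

# Self-loop: B0 -> B1, B1 -> B1 or B2.
loop = {"B0": ["B1"], "B1": ["B1", "B2"], "B2": []}
rdf_loop = reverse_dominance_frontier(loop, exit_node="B2")
```

In the diamond, blocks 2 and 3 are control dependent on block 1 and block 4 is not. In the self-loop, B1 is control dependent only on itself, and RDF(B2) is empty.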
Using SSA – Dead Code Elimination What’s left? Algorithm eliminates useless definitions & some useless branches Algorithm leaves behind empty blocks & extraneous control-flow Two more issues Simplifying control-flow Eliminating unreachable blocks Both are CFG transformations (no need for SSA)
Eliminating Useless Control Flow
  Transformation: eliminating redundant branches
    Both sides of the branch target Bi
      Neither block needs to be empty
      Replace the branch with a jump to Bi
      Simple rewrite of last op in B1
    How does this happen? Rewriting other branches
    How do we find it? Check each branch
  [Figure: B1 ends in a branch (not a jump) whose two edges both lead to B2]
Eliminating Useless Control Flow
  Transformation: eliminating empty blocks
    Merging an empty block
      Empty B1 ends in a jump
      Coalesce B1 with B2
      Move B1’s incoming edges
      Eliminates extraneous jump
      Faster, smaller code
    How does this happen? Eliminate operations in B1
    How do we find it? Test for empty block
  [Figure: empty block B1 jumping to B2]
Eliminating Useless Control Flow
  Transformation: combining non-empty blocks
    Coalescing blocks
      Neither block needs to be empty
      B1 ends with a jump
      B2 has 1 predecessor
      Combine the two blocks
      Eliminates a jump
    How does this happen? Simplifying edges out of B1
    How do we find it? Check the target of the jump: |preds| = 1
  B1 and B2 should be a single basic block: if one executes, both execute, in linear order.
  [Figure: B1 followed by B2, before and after combining]
Eliminating Useless Control Flow
  Transformation: hoisting branches from empty blocks
    Jump to a branch
      B1 ends with jump, B2 is empty
      Eliminates pointless jump
      Copy branch into end of B1
      Might make B2 unreachable
    How does this happen? Eliminating operations in B2
    How do we find this? Jump to empty block
  [Figure: B1 jumping to an empty B2 that ends in a branch]
Eliminating Useless Control Flow
  The Algorithm
    OnePass()
        for each block i, in postorder
            if i ends in a conditional branch then
                if both targets are identical then
                    replace the branch with a jump
            if i ends in a jump to j then
                if i is empty then
                    replace transfers to i with transfers to j
                if j has only one predecessor then
                    coalesce i and j
                if j is empty & j ends in a conditional branch then
                    rewrite i’s jump with j’s branch
    Clean()
        until CFG stops changing
            compute postorder
            OnePass()
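A rough Python sketch of OnePass and Clean. The CFG encoding is an assumption: each block is a dictionary with `ops` (non-control operations) and `succ` (two entries for a conditional branch, one for a jump, none for a return). The sketch applies at most one of the three jump transformations per block per pass, keeps the entry block in place, and ignores φ-functions.

```python
def postorder(cfg, entry):
    seen, order = set(), []
    def visit(n):
        seen.add(n)
        for s in cfg[n]["succ"]:
            if s not in seen:
                visit(s)
        order.append(n)
    visit(entry)
    return order

def preds(cfg, n):
    return [p for p in cfg if n in cfg[p]["succ"]]

def one_pass(cfg, entry):
    changed = False
    for i in postorder(cfg, entry):
        if i not in cfg:                          # removed earlier in this pass
            continue
        blk = cfg[i]
        # Redundant branch: both targets identical -> jump.
        if len(blk["succ"]) == 2 and blk["succ"][0] == blk["succ"][1]:
            blk["succ"] = [blk["succ"][0]]
            changed = True
        if len(blk["succ"]) != 1 or blk["succ"][0] == i:
            continue
        j = blk["succ"][0]
        if not blk["ops"] and i != entry:
            # Empty block: send transfers to i straight to j.
            for p in preds(cfg, i):
                cfg[p]["succ"] = [j if s == i else s for s in cfg[p]["succ"]]
            del cfg[i]
            changed = True
        elif j != entry and len(preds(cfg, j)) == 1:
            # j has only one predecessor: coalesce i and j.
            blk["ops"] = blk["ops"] + cfg[j]["ops"]
            blk["succ"] = list(cfg[j]["succ"])
            del cfg[j]
            changed = True
        elif not cfg[j]["ops"] and len(cfg[j]["succ"]) == 2:
            # j is empty and ends in a branch: hoist the branch into i.
            blk["succ"] = list(cfg[j]["succ"])
            changed = True
    return changed

def clean(cfg, entry):
    while one_pass(cfg, entry):                   # postorder is recomputed each pass
        pass

# An empty block B1 sitting between B0 and B2.
cfg = {
    "B0": {"ops": ["x0 <- 1"], "succ": ["B1"]},
    "B1": {"ops": [], "succ": ["B2"]},
    "B2": {"ops": ["return x0"], "succ": []},
}
clean(cfg, "B0")
```

On this input the empty block B1 is removed first, after which B0 and B2 are coalesced into a single block.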
Eliminating Useless Control Flow
  What about an empty loop?
    By itself, CLEAN cannot eliminate the loop
      Loop body branches to itself
        Branch is not redundant: it targets two distinct blocks
        Doesn’t end with a jump
    Key is to eliminate the self-loop
      Add a new transformation?
      Then, B1 merges with B2
  A new transformation must recognize that B1 is empty. Presumably, B1 has code to test the exit condition and (probably) increment an induction variable. This requires looking at code inside B1 and doing some sophisticated pattern matching. This is awfully complicated.
  [Figure: B0 → B1, B1 → B1, B1 → B2, shown before and after the self-loop is removed]
Eliminating Useless Control Flow
  What about an empty loop?
    How to eliminate <B1,B1>?
      Pattern matching?
      Useless code elimination?
    What does DEAD do to B1?
      Remember, it is empty
      So, B1 ∉ RDF(B2)
      B1’s branch is useless
      DEAD rewrites it as a jump
    DEAD converts it to a form where CLEAN handles it
  [Figure: B0 → B1, B1 → B1, B1 → B2 before DEAD; B0 → B1 → B2 after DEAD]
Dead Code Elimination Summary
  Useless computations: DEAD (mark and sweep)
  Useless control flow: CLEAN
  Unreachable blocks: execution counts
  Other techniques
    Constant propagation can eliminate branches
    Algebraic identities eliminate some operations
    Redundancy elimination creates useless operations, or eliminates them
Using SSA In general, using SSA leads to Cleaner formulations Better results Faster algorithms We’ve seen two SSA-based algorithms. Dead-code elimination Constant propagation These optimizations leave behind other inefficiencies