Data Flow Analysis Compiler Baojian Hua
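Front End
[Figure: source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR]
Middle End
[Figure: AST -> translation -> IR1 -> translation -> IR2 -> other IRs and translations -> asm]
Optimizations
[Figure: the same pipeline (AST -> translation -> IR1 -> translation -> IR2 -> other IRs and translations -> asm), with an optimization step (opt) added]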
General Scheme for Optimization
Analysis: control flow, data flow, dependency, … analysis, to obtain conservative static knowledge of the program being optimized (an approximation of its dynamic behavior).
Rewriting: rewrite the program based on the knowledge obtained above.
[Figure: IR -> analysis -> static information -> rewriting -> IR’]
“Conservative Static”
    Cjump (x==5 ? L1 : L2)
    y = 1          y = 2          (the two branch targets)
    print (y)
Can we substitute y in print (y) with the value 2? This amounts to proving that x is always equal to 5! If x is an input from the user, it’s impossible to know its value statically. So one must be conservative when using static knowledge.
Liveness Analysis
Motivation
Low-level IRs assume an infinite number of abstract “registers”. This is good for code generation but bad for execution on a real machine, which has only a finite number of registers. So how do we bridge this gap?
The goal of register allocation (an optimization) is to map an unbounded number of variables onto a few registers. This requires liveness analysis.
Example
Consider this TAC:
    a = 1
    b = a + 2
    c = b + 3
    return c
It has three variables: a, b, and c. Assume that the target machine has only one register: r. Is it possible to put all three variables “a”, “b” and “c” in register “r”?
Example
Calculate which variables are “live” at each program point of the TAC:
    a = 1
                 {a}
    b = a + 2
                 {b}
    c = b + 3
                 {c}
    return c
The “liveness” information gives live ranges. The live ranges of a, b, and c don’t overlap, thus all three variables can be put into one register.
Example
Register allocation for the same TAC (live sets {a}, {b}, {c}): a => r, b => r, c => r.
Code rewriting:
    r = 1
    r = r + 2
    r = r + 3
    return r
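Data Flow Equations for Liveness
Inside basic blocks (backward), for each statement n:
    in[n] = use[n] \/ (out[n] - def[n])
// Example:
    a = 1
    b = a + 2
    c = b + 3
    return c
// Example:
    a = 1
    b = a + 2
    c = b + 3
    return a + c
[Figure: the in and out sets of each statement, filled in from the last statement to the first]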
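A minimal sketch of the backward pass inside one basic block, applied to the two example programs. The encoding of a statement as a (defs, uses) pair of variable-name sets and the function name live_in_per_statement are assumptions made for illustration; they do not come from the slides.

```python
def live_in_per_statement(stmts, live_out):
    """Return the live-in set of every statement, computed backward.

    stmts: list of (defs, uses) pairs of variable-name sets (assumed encoding).
    live_out: variables live after the last statement of the block.
    """
    live = set(live_out)
    result = []
    for defs, uses in reversed(stmts):
        # in[n] = use[n] \/ (out[n] - def[n])
        live = set(uses) | (live - set(defs))
        result.append(live)
    result.reverse()
    return result


# a = 1; b = a + 2; c = b + 3; return c
first = live_in_per_statement(
    [({"a"}, set()), ({"b"}, {"a"}), ({"c"}, {"b"}), (set(), {"c"})],
    set(),
)

# a = 1; b = a + 2; c = b + 3; return a + c
second = live_in_per_statement(
    [({"a"}, set()), ({"b"}, {"a"}), ({"c"}, {"b"}), (set(), {"a", "c"})],
    set(),
)
```

In the second program a stays live across the definitions of b and c, so a and b (and a and c) can no longer share a register.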
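For general CFG
Equations:
    in[n] = use[n] \/ (out[n] - def[n])
    out[n] = \/ s ∈ succ[n] in[s]
Fixpoint algorithm:
    initialize all in and out sets to {}
    loop until no set changes, recomputing in[n] and out[n] for every node n
[Figure: a CFG node n annotated with use[n], def[n], in[n], and out[n]]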
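Example
    1: a = 0
    2: b = a + 1
    3: c = c + b
    4: a = b * 2
    5: a < N          (branches back to 2 or continues to 6)
    6: return c

    node   1     2     3       4     5       6
    def    {a}   {b}   {c}     {a}   {}      {}
    use    {}    {a}   {b,c}   {b}   {a,N}   {c}

Loop over the nodes in the order 1, 2, 3, 4, 5, 6, starting with in[n] = out[n] = {} and applying
    in[n] = use[n] \/ (out[n] - def[n])
    out[n] = \/ s ∈ succ[n] in[s]
until no set changes.
[Table: in/out sets of every node after each iteration; intermediate iterations omitted]

Final live_out (N does not appear in the sets shown, i.e. it is treated as a constant):
    node       1       2       3       4       5       6
    live_out   {a,c}   {b,c}   {b,c}   {a,c}   {a,c}   {}
{c} is live on entry to node 6 (return c).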
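The fixpoint loop can be sketched as follows for the six-node example. This is a sketch under stated assumptions: the dictionary encoding of the CFG and the function name liveness are illustrative, and N is treated as a constant and left out of use[5], as in the final live_out sets.

```python
def liveness(succ, use, defs):
    """Iterate the liveness equations until no in/out set changes."""
    nodes = list(succ)
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            # out[n] = union of in[s] over all successors s of n
            new_out = set()
            for s in succ[n]:
                new_out |= live_in[s]
            # in[n] = use[n] \/ (out[n] - def[n])
            new_in = use[n] | (new_out - defs[n])
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out


# CFG of the example: node 5 (a < N) branches back to 2 or continues to 6.
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
use = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}

live_in, live_out = liveness(succ, use, defs)
```

The result does not depend on the order in which nodes are visited; the order only affects how many iterations are needed.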
Interference Graph
Final live_out:
    a = 0          {a,c}
    b = a + 1      {b,c}
    c = c + b      {b,c}
    a = b * 2      {a,c}
    a < N          {a,c}
    return c       {}          ({c} is live on entry)
For any two variables x and y, if they are live simultaneously, then draw an (undirected) edge x -- y.
[Figure: interference graph over the nodes a, b, c]
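A small sketch of the edge rule, applied to the final live_out sets of the example. The function name interference_edges and the representation of an undirected edge as an alphabetically ordered pair are assumptions for illustration.

```python
from itertools import combinations


def interference_edges(live_sets):
    """Return undirected edges (x, y), x < y, for variables live together."""
    edges = set()
    for live in live_sets:
        for x, y in combinations(sorted(live), 2):
            edges.add((x, y))
    return edges


final_live_out = [
    {"a", "c"},  # a = 0
    {"b", "c"},  # b = a + 1
    {"b", "c"},  # c = c + b
    {"a", "c"},  # a = b * 2
    {"a", "c"},  # a < N
    set(),       # return c
]
edges = interference_edges(final_live_out)
```

Under this rule a and b never appear in the same set, so they get no edge and could share a register, while c interferes with both.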
Speeding-up the analysis
Ordering the nodes for liveness analysis: reverse top-sort order (you do this in lab 5)
Once a variable …
Careful selection of set representation
Careful data structure engineering
    Say: bit-vector
Basic block (you do this in lab 5)
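One possible bit-vector representation, sketched with Python integers: each variable gets a bit position, union becomes bitwise or, and set difference becomes and-not. All helper names here are hypothetical.

```python
def make_index(variables):
    """Assign each variable a bit position."""
    return {v: i for i, v in enumerate(variables)}


def to_bits(names, index):
    """Encode a set of variable names as an integer bit-vector."""
    bits = 0
    for v in names:
        bits |= 1 << index[v]
    return bits


def to_names(bits, index):
    """Decode an integer bit-vector back into a set of names."""
    return {v for v, i in index.items() if (bits >> i) & 1}


def transfer(use_bits, def_bits, out_bits):
    # in[n] = use[n] \/ (out[n] - def[n]), as two bitwise operations
    return use_bits | (out_bits & ~def_bits)


index = make_index(["a", "b", "c"])
# Node 4 of the example: a = b * 2, with live_out = {a, c}
in_bits = transfer(
    to_bits({"b"}, index), to_bits({"a"}, index), to_bits({"a", "c"}, index)
)
live_in_4 = to_names(in_bits, index)
```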
Basic Blocks
Step 1: calculate def and use for each basic block b (one-pass backward calculation)
Step 2: do liveness analysis on the blocks, just as discussed above
Step 3: calculate liveness information for each statement in each block (one-pass backward calculation)
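Example
    block 1:  a = 0
    block 2:  b = a + 1
              c = c + b
              a = b * 2
              a < N
    block 3:  return c

    block   1     2         3
    def     {a}   {a,b,c}   {}
    use     {}    {a,c}     {c}
use[2] does NOT contain variable “b”. Why?

Blocks are processed in reverse topo-sort order (3, 2, 1), starting from {} everywhere:
    block   out (pass 1)   in (pass 1)   out (pass 2)   in (pass 2)
    3       {}             {c}           {}             {c}
    2       {c}            {a,c}         {a,c}          {a,c}
    1       {a,c}          {c}           {a,c}          {c}
live_out for each block: block 1 {a,c}, block 2 {a,c}, block 3 {}.
Backward calculation of live_out for each statement in block 2:
    b = a + 1      {b,c}
    c = c + b      {b,c}
    a = b * 2      {a,c}
    a < N          {a,c}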
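Step 1 can be sketched as a single backward pass over the statements of a block. The (defs, uses) encoding of a statement and the name block_def_use are assumptions; N is again treated as a constant. The pass also shows why b is missing from use[2]: b is defined inside block 2 before it is used, so the block never reads a value of b that comes from outside.

```python
def block_def_use(stmts):
    """Summarize a basic block as one (def, use) pair, scanning backward."""
    defs, uses = set(), set()
    for s_defs, s_uses in reversed(stmts):
        # A variable is "used" by the block only if it is read before
        # being written inside the block.
        uses = set(s_uses) | (uses - set(s_defs))
        defs |= set(s_defs)
    return defs, uses


block2 = [
    ({"b"}, {"a"}),       # b = a + 1
    ({"c"}, {"c", "b"}),  # c = c + b
    ({"a"}, {"b"}),       # a = b * 2
    (set(), {"a"}),       # a < N
]
def2, use2 = block_def_use(block2)
```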
Reaching Definition
The problem: at any program point, we’d like to know where the value of a variable x is defined.
    a = 0
    b = a + 1
    c = c + b
    a = b * 2
    a < N
    return c
E.g., can we substitute the variable a with 0? If so, we are doing the so-called constant propagation optimization.
Implementation
Number each definition:
    5: a = 0
    6: b = a + 1
    7: c = c + b
    8: a = b * 2
       a < N
       return c
Here we number the four definitions 5, 6, 7, 8. The numbers have no special meaning; they only need to be:
1. different from the block numbers, and
2. all unique.
Equations
Calculate def and kill for each block, based on the equations for a statement:
    def[d: x = …] = {d}
    kill[d: x = …] = defs(x) - {d}
For the example (5: a = 0, 6: b = a + 1, 7: c = c + b, 8: a = b * 2):
    def[1] = {5}         kill[1] = {8}
    def[2] = {6,7,8}     kill[2] = {5}
    def[3] = {}          kill[3] = {}
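A sketch of how the per-statement equations can be combined into per-block def and kill sets. The composition rule used here (a later statement's kill removes earlier definitions from the block's def set) is the usual way to compose such transfer functions and is an assumption, as are the name block_def_kill and the (id, variable) encoding.

```python
def block_def_kill(block, defs_of):
    """Compute (def, kill) for a block of numbered definitions.

    block: list of (definition_id, variable) in program order.
    defs_of: variable -> set of all definition ids of that variable.
    """
    gen, kill = set(), set()
    for d, x in block:
        k = defs_of[x] - {d}   # kill[d: x = ...] = defs(x) - {d}
        gen = (gen - k) | {d}  # def[d: x = ...] = {d}
        kill |= k
    return gen, kill


defs_of = {"a": {5, 8}, "b": {6}, "c": {7}}
def1, kill1 = block_def_kill([(5, "a")], defs_of)
def2, kill2 = block_def_kill([(6, "b"), (7, "c"), (8, "a")], defs_of)
def3, kill3 = block_def_kill([], defs_of)
```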
Data Flow Equation
Forward calculation:
    in[b] = \/ q ∈ pred(b) out[q]
    out[b] = def[b] \/ (in[b] - kill[b])
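Fixpoint algorithm
    block 1:  5: a = 0
    block 2:  6: b = a + 1
              7: c = c + b
              8: a = b * 2
                 a < N
    block 3:     return c

    block   1      2         3
    def     {5}    {6,7,8}   {}
    kill    {8}    {5}       {}

    in[b] = \/ q ∈ pred(b) out[q]
    out[b] = def[b] \/ (in[b] - kill[b])

Starting from {} everywhere and visiting the blocks in the order 1, 2, 3:
    block   in (pass 1)   out (pass 1)   in (pass 2)   out (pass 2)
    1       {}            {5}            {}            {5}
    2       {5}           {6,7,8}        {5,6,7,8}     {6,7,8}
    3       {6,7,8}       {6,7,8}        {6,7,8}       {6,7,8}
Final in sets: in[1] = {}, in[2] = {5,6,7,8}, in[3] = {6,7,8}.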
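The forward fixpoint can be sketched as below, using the predecessor lists of the three-block example. The function name reaching_definitions and the dictionary encoding are illustrative assumptions.

```python
def reaching_definitions(pred, gen, kill):
    """Iterate the forward equations until no in/out set changes."""
    blocks = list(pred)
    r_in = {b: set() for b in blocks}
    r_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # in[b] = union of out[q] over all predecessors q of b
            new_in = set()
            for q in pred[b]:
                new_in |= r_out[q]
            # out[b] = def[b] \/ (in[b] - kill[b])
            new_out = gen[b] | (new_in - kill[b])
            if new_in != r_in[b] or new_out != r_out[b]:
                r_in[b], r_out[b] = new_in, new_out
                changed = True
    return r_in, r_out


# Block 2 is its own predecessor because of the loop back from a < N.
pred = {1: [], 2: [1, 2], 3: [2]}
gen = {1: {5}, 2: {6, 7, 8}, 3: set()}
kill = {1: {8}, 2: {5}, 3: set()}

r_in, r_out = reaching_definitions(pred, gen, kill)
```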
Constant Propagation
    block 1:  5: a = 0            in = {}
    block 2:  6: b = a + 1        in = {5,6,7,8}
              7: c = c + b
              8: a = b * 2
                 a < N
    block 3:     return c         in = {6,7,8}
Can we substitute the variable a in b = a + 1 with the constant “0”? No! Because there are two definitions of “a” which may reach this point: 5 and 8.
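The check on this slide can be sketched as a small function: a use of a variable may be replaced by a constant only if every reaching definition of that variable assigns the same constant. The name constant_at and the table mapping each definition to (variable, constant or None) are hypothetical.

```python
def constant_at(var, reaching, definitions):
    """Return the constant that var must hold at a point, or None.

    reaching: ids of the definitions reaching the point.
    definitions: id -> (variable, constant value or None if not constant).
    """
    values = {definitions[d][1] for d in reaching if definitions[d][0] == var}
    if len(values) == 1:
        (value,) = values
        return value  # still None if the single definition is not constant
    return None


definitions = {5: ("a", 0), 6: ("b", None), 7: ("c", None), 8: ("a", None)}

# At b = a + 1 both 5 and 8 reach, so a is not a known constant.
at_block2 = constant_at("a", {5, 6, 7, 8}, definitions)
# If only definition 5 reached a point, a could be replaced by 0.
only_def5 = constant_at("a", {5}, definitions)
```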
Available Expressions
The problem: at a given program point, we’d like to know whether or not the value of an expression e has been calculated and is still available:
1. The expression e must be calculated on every path to the point, and
2. the variables used in e must not have been redefined after that calculation.
    a = 0
    b = a + 1
    c = c + b
    a = a + 1
    a < N
    return c
E.g., has the right-side expression “a+1” already been calculated, and is it thus available, at a = a + 1? If so, the second calculation can be avoided!
Implementation
    block 1:  a = 0
    block 2:  b = a + 1
              c = c + b
              a = a + 1
              a < N
    block 3:  return c
Calculate gen and kill for each block, based on the equations for a statement (Tiger table 17.4).
All possible expressions: ALL = {a+1, c+b}
    gen[1] = {}     kill[1] = {a+1}
    gen[2] = {}     kill[2] = ALL
    gen[3] = {}     kill[3] = {}
Implementation
Calculate in/out for each block, based on the fixpoint algorithm, with gen and kill as before:
    gen[1] = {}     kill[1] = {a+1}
    gen[2] = {}     kill[2] = ALL
    gen[3] = {}     kill[3] = {}
All possible expressions: ALL = {a+1, c+b}
in/out (initial value -> value at the fixpoint):
    block   in            out
    1       {}            ALL -> {}
    2       ALL -> {}     ALL -> {}
    3       ALL -> {}     ALL -> {}
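A sketch of the block-level fixpoint for available expressions. The slides do not spell out the equations, so the formulation assumed here is the standard one: in[b] is the intersection of out[p] over all predecessors p, out[b] = gen[b] \/ (in[b] - kill[b]), the entry block starts with in = {}, and every other set starts at ALL. The function name and encoding are illustrative.

```python
def available_expressions(pred, gen, kill, all_exprs, entry):
    """Iterate the (assumed) available-expression equations to a fixpoint."""
    blocks = list(pred)
    a_in = {b: set(all_exprs) for b in blocks}
    a_out = {b: set(all_exprs) for b in blocks}
    a_in[entry] = set()
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                new_in = set()  # nothing is available on entry
            else:
                # in[b] = intersection of out[p] over all predecessors p
                new_in = set(all_exprs)
                for p in pred[b]:
                    new_in &= a_out[p]
            # out[b] = gen[b] \/ (in[b] - kill[b])
            new_out = gen[b] | (new_in - kill[b])
            if new_in != a_in[b] or new_out != a_out[b]:
                a_in[b], a_out[b] = new_in, new_out
                changed = True
    return a_in, a_out


ALL = {"a+1", "c+b"}
pred = {1: [], 2: [1, 2], 3: [2]}
gen = {1: set(), 2: set(), 3: set()}
kill = {1: {"a+1"}, 2: set(ALL), 3: set()}

a_in, a_out = available_expressions(pred, gen, kill, ALL, entry=1)
```

Unlike liveness and reaching definitions, this analysis intersects over predecessors, which is why the non-entry sets start at ALL rather than {}.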
Implementation
Calculate in/out for each statement, based on the in/out for each block (all block-level sets are {} at the fixpoint).
All possible expressions: ALL = {a+1, c+b}
Available expressions inside block 2:
                   {}
    b = a + 1
                   {a+1}
    c = c + b
                   {a+1}
    a = a + 1
                   {}
    a < N
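The per-statement pass inside block 2 can be sketched as a forward scan. The statement rule assumed here is that t = e first makes e available and then kills every expression that mentions t. The (text, operands) encoding of an expression and the function name are hypothetical.

```python
def available_after_each(stmts, avail_in):
    """Return the set of available expressions after each statement.

    stmts: list of (target, expr); expr is (text, operands) or None.
    avail_in: expressions available on entry to the block.
    """
    avail = set(avail_in)
    result = []
    for target, expr in stmts:
        if expr is not None:
            avail = avail | {expr}
        if target is not None:
            # Redefining target invalidates expressions that read it.
            avail = {e for e in avail if target not in e[1]}
        result.append({text for text, _ in avail})
    return result


A_PLUS_1 = ("a+1", frozenset({"a"}))
C_PLUS_B = ("c+b", frozenset({"c", "b"}))

block2 = [
    ("b", A_PLUS_1),  # b = a + 1
    ("c", C_PLUS_B),  # c = c + b
    ("a", A_PLUS_1),  # a = a + 1
    (None, None),     # a < N (not one of the tracked expressions)
]
after = available_after_each(block2, set())
```

The set before a = a + 1 is the one after c = c + b, which contains a+1.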
Common Sub-expression Elimination (CSE)
    a = 0
    b = a + 1
    c = c + b
    a = a + 1
    a < N
    return c
[Figure: block 2 annotated with the available-expression sets {}, {a+1}, {}]
After the available expression analysis, we know “a+1” is available at a = a + 1, so the second calculation can be omitted!
But which variable should the expression “a+1” be replaced with? We need to do reaching expression analysis... (Read the text and do homework!)