Data Flow Analysis Compiler Baojian Hua

1 Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

2 Front End: source code → lexical analyzer → tokens → parser → abstract syntax tree → semantic analyzer → IR

3 Middle End: AST → translation → IR1 → translation → IR2 → other IR and translation → asm

4 Optimizations: AST → translation → IR1 → translation → IR2 → other IR and translation → asm, with optimization (opt) applied along this pipeline

5 General Scheme for Optimization. Analysis: control flow, data flow, dependency, … analysis obtains conservative static knowledge of the program being optimized, an approximation of its dynamic behavior. Rewriting: rewrite the program depending on the knowledge obtained above. Pipeline: IR → analysis → static information → rewriting → IR′.

6 “Conservative Static”. Cjump (x==5? L1: L2) branches to one of two blocks, y = 1 or y = 2, and both continue to print(y). Can we substitute y with the value 2? This amounts to proving that x is always equal to 5! If x is an input from the user, it is impossible to know its value statically. So one must be conservative when using static knowledge.

7 Liveness Analysis

8 Motivation. Low-level IRs assume an infinite number of abstract “registers”: good for code generation but bad for execution on a real machine, which has a finite number of registers. So how do we bridge this gap? The goal of register allocation (an optimization) is to fit an unbounded number of variables into a few registers, and this needs liveness analysis.

9 Example. Consider this TAC: a = 1; b = a + 2; c = b + 3; return c. There are three variables: a, b, and c. Assume that the target machine has only one register: r. Is it possible to put all three variables “a”, “b” and “c” in register “r”?

10 Example. Consider this TAC: a = 1; b = a + 2; c = b + 3; return c. Calculate which variables are “live” at each program point: {a} after a = 1, {b} after b = a + 2, {c} after c = b + 3. The “liveness” information gives live ranges. The live ranges don’t overlap, thus all three variables can be put into one register.

11 Example. Consider this TAC: a = 1; b = a + 2; c = b + 3; return c, with live sets {a}, {b}, {c}. Register allocation: a => r, b => r, c => r. Code rewriting: r = 1; r = r + 2; r = r + 3; return r.

12 Data Flow Equations for Liveness. Inside basic blocks (backward): in[n] = use[n] ∪ (out[n] − def[n]), where in[n] and out[n] are the live sets just before and just after statement n. Example 1: a = 1; b = a + 2; c = b + 3; return c. Example 2: a = 1; b = a + 2; c = b + 3; return a + c.
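The backward rule can be sketched in Python for straight-line code. This is a minimal illustration, not the course's implementation: the encoding of a statement as a `(defs, uses)` pair and the name `liveness_straight_line` are assumptions made here.

```python
def liveness_straight_line(stmts, live_at_exit=frozenset()):
    """Return a (live_in, live_out) pair per statement of straight-line code.

    Each statement is a hypothetical (defs, uses) pair of variable-name sets.
    """
    out = set(live_at_exit)
    result = []
    for defs, uses in reversed(stmts):      # walk the code backward
        live_out = set(out)
        live_in = set(uses) | (live_out - set(defs))  # in = use ∪ (out − def)
        result.append((live_in, live_out))
        out = live_in                       # becomes out of the statement above
    result.reverse()
    return result

# Example 1: a = 1; b = a + 2; c = b + 3; return c
first = [({"a"}, set()), ({"b"}, {"a"}), ({"c"}, {"b"}), (set(), {"c"})]
# Example 2: same code, but ending in: return a + c
second = [({"a"}, set()), ({"b"}, {"a"}), ({"c"}, {"b"}), (set(), {"a", "c"})]

for name, prog in (("return c", first), ("return a + c", second)):
    print(name, [sorted(o) for _, o in liveness_straight_line(prog)])
```

In the second example a stays live across the definitions of b and c, so the live ranges overlap and a single register is no longer enough.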

13 For general CFG. Equations: in[n] = use[n] ∪ (out[n] − def[n]); out[n] = ∪_{s ∈ succ[n]} in[s]. Fixpoint algorithm: initialize all in and out sets with {}; loop until no set changes. Each node n carries use[n], def[n], in[n] and out[n].

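14 Example. Program (nodes 1 to 6): 1: a = 0; 2: b = a + 1; 3: c = c + b; 4: a = b * 2; 5: a<N; 6: return c. def: 1 {a}, 2 {b}, 3 {c}, 4 {a}, 5 {}, 6 {}. use: 1 {}, 2 {a}, 3 {b, c}, 4 {b}, 5 {a, N}, 6 {c}. Equations: in[n] = use[n] ∪ (out[n] − def[n]); out[n] = ∪_{s ∈ succ[n]} in[s]. Loop over the nodes in the order 1, 2, 3, 4, 5, 6. [Table: in and out sets of nodes 1 to 6 per iteration, all starting from {}; later columns are elided with … on the slide.] Final live_out: 1 {a,c}, 2 {b,c}, 3 {b,c}, 4 {a,c}, 5 {a,c}; the diagram additionally marks the set {c}. N appears in use[5] but in none of the live sets, so it is evidently treated as a constant.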
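The fixpoint iteration for this example can be sketched as follows. The CFG edges (in particular the back edge from node 5 to node 2) are inferred from the program, N is treated as a constant and left out of use[5], and the function name `liveness` is hypothetical.

```python
def liveness(nodes, succ, use, defs):
    """Iterate the liveness equations until no in/out set changes."""
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_in = use[n] | (live_out[n] - defs[n])   # in = use ∪ (out − def)
            new_out = set()
            for s in succ[n]:                           # out = ∪ in[s]
                new_out |= live_in[s]
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out

nodes = [1, 2, 3, 4, 5, 6]
# Assumed edges: straight line 1..5, loop back 5 -> 2, exit 5 -> 6.
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
use = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}

live_in, live_out = liveness(nodes, succ, use, defs)
for n in nodes:
    print(n, sorted(live_in[n]), sorted(live_out[n]))
```

The computed live_out sets agree with the final sets on the slide, and both in[1] and in[6] come out as {c}.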

15 Interference Graph. Program: 1: a = 0; 2: b = a + 1; 3: c = c + b; 4: a = b * 2; 5: a<N; 6: return c. Final live_out: 1 {a,c}, 2 {b,c}, 3 {b,c}, 4 {a,c}, 5 {a,c}; the diagram additionally marks {c}. For any two variables x and y, if they are live simultaneously, then draw an (undirected) edge x - y. Graph nodes: a, b, c. By this rule the edges are a - c and b - c.
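The edge rule above can be sketched directly from the final live_out sets. This follows only the rule stated on the slide (two variables in the same live set interfere); the function name `interference_edges` is an assumption for illustration.

```python
from itertools import combinations

def interference_edges(live_sets):
    """Collect an undirected edge for every pair of variables live together."""
    edges = set()
    for live in live_sets:
        for x, y in combinations(sorted(live), 2):
            edges.add((x, y))   # stored as a sorted pair, so (x, y) == (y, x)
    return edges

final_live_out = [{"a", "c"}, {"b", "c"}, {"b", "c"}, {"a", "c"}, {"a", "c"}, set()]
edges = interference_edges(final_live_out)
print(sorted(edges))  # [('a', 'c'), ('b', 'c')]
```

Since there is no edge between a and b, those two variables can share a register, so two registers suffice for this program.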

16 Speeding-up the analysis. Ordering the nodes for liveness analysis: reverse top-sort order (you do this in lab 5). Once a variable … Careful selection of set representation and careful data structure engineering, say: bit-vector. Working on basic blocks instead of single statements (you do this in lab 5).
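As a rough sketch of the bit-vector idea, each variable can be given one bit of an integer so that set union, difference and membership become bitwise operations. The helper names `encode`, `decode` and `transfer` are hypothetical, and Python integers stand in for fixed-width machine words.

```python
VARS = ["a", "b", "c"]
BIT = {v: 1 << i for i, v in enumerate(VARS)}   # a -> 0b001, b -> 0b010, c -> 0b100

def encode(names):
    """Turn a set of variable names into a bit mask."""
    mask = 0
    for v in names:
        mask |= BIT[v]
    return mask

def decode(mask):
    """Turn a bit mask back into a set of variable names."""
    return {v for v in VARS if mask & BIT[v]}

def transfer(use, defs, out):
    """in = use ∪ (out − def), computed with bitwise operations on masks."""
    return use | (out & ~defs)

# Node 4 of the example: a = b * 2, with out = {a, c}
live_in_4 = transfer(encode({"b"}), encode({"a"}), encode({"a", "c"}))
print(sorted(decode(live_in_4)))  # ['b', 'c']
```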

17 Basic Blocks. Step 1: calculate def and use for each basic block b (one-pass backward calculation). Step 2: do liveness analysis on the blocks, just as discussed above. Step 3: calculate liveness information for each statement in each block (one-pass backward calculation).

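18 Example. Blocks: 1: a = 0; 2: b = a + 1; c = c + b; a = b * 2; a<N; 3: return c. def: 1 {a}, 2 {a,b,c}, 3 {}. use: 1 {}, 2 {a,c}, 3 {c}. The set {a,c} of block 2 does NOT contain variable “b”. Why? Blocks are processed in reverse topo-sort order (3, 2, 1), computing out then in, starting from {}: block 3: out {}, in {c}; block 2: out {c}, in {a,c}, then out {a,c} on the next pass; block 1: out {a,c}, in {c}. live_out for each block: 1 {a,c}, 2 {a,c}, 3 {}. Backward calculation of live_out for each statement then gives the per-statement sets, e.g. {b,c} and {a,c} inside block 2.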
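Step 1 for block 2 can be sketched as a single backward pass. The `(defs, uses)` statement encoding and the name `block_def_use` are assumptions for illustration, and N is again treated as a constant.

```python
def block_def_use(stmts):
    """Summarize a basic block given as (defs, uses) pairs in program order.

    block_use holds variables read before any definition inside the block.
    """
    block_def, block_use = set(), set()
    for defs, uses in reversed(stmts):
        block_def |= defs
        block_use = uses | (block_use - defs)
    return block_def, block_use

# Block 2: b = a + 1; c = c + b; a = b * 2; a < N
block2 = [({"b"}, {"a"}), ({"c"}, {"c", "b"}), ({"a"}, {"b"}), (set(), {"a"})]
d, u = block_def_use(block2)
print(sorted(d), sorted(u))  # ['a', 'b', 'c'] ['a', 'c']
```

The use set leaves out b because every read of b in the block comes after b = a + 1, so b's value never flows in from outside the block.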

19 Reaching Definition

20 Program: block 1: a = 0; block 2: b = a + 1; c = c + b; a = b * 2; a<N; block 3: return c. The problem: at any program point, we’d like to know where the value of a variable x is defined. E.g., can we substitute the variable a with 0? If so, we are doing the so-called constant propagation optimization.

21 Implementation. Number each definition: 5: a = 0 (block 1); 6: b = a + 1, 7: c = c + b, 8: a = b * 2 (block 2, which ends with the test a<N); block 3 is return c. Here we number the four definitions with 5, 6, 7, 8. The numbers have no special meaning, just: 1. they are different from the block numbers, and 2. they are all unique.

22 Equations. Calculate def and kill for each block, based on the equations for a statement: def[d: x = …] = {d}; kill[d: x = …] = defs(x) − {d}. With 5: a = 0; 6: b = a + 1; 7: c = c + b; 8: a = b * 2: def[1] = {5}, kill[1] = {8}; def[2] = {6,7,8}, kill[2] = {5}; def[3] = {}, kill[3] = {}.
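The per-block def and kill sets can be sketched by composing the statement equations in program order. The encoding of a block as a list of `(label, variable)` definitions and the name `block_def_kill` are assumptions made for this illustration.

```python
def block_def_kill(block, defs_of):
    """Combine statement-level def/kill sets into block-level sets.

    block:   list of (label, variable) definitions in program order
    defs_of: variable -> set of all definition labels of that variable
    """
    gen, kill = set(), set()
    for label, var in block:
        others = defs_of[var] - {label}   # kill[d: x = ...] = defs(x) − {d}
        gen = (gen - others) | {label}    # a later definition overrides earlier ones
        kill |= others
    return gen, kill

defs_of = {"a": {5, 8}, "b": {6}, "c": {7}}
blocks = {1: [(5, "a")], 2: [(6, "b"), (7, "c"), (8, "a")], 3: []}
summary = {b: block_def_kill(stmts, defs_of) for b, stmts in blocks.items()}
print(summary)
```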

23 Data Flow Equation. Forward calculation: in[b] = ∪_{q ∈ pred(b)} out[q]; out[b] = def[b] ∪ (in[b] − kill[b]).

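24 Fixpoint algorithm. Blocks and definitions: block 1: 5: a = 0; block 2: 6: b = a + 1; 7: c = c + b; 8: a = b * 2; a<N; block 3: return c. def: 1 {5}, 2 {6,7,8}, 3 {}. kill: 1 {8}, 2 {5}, 3 {}. Equations: in[b] = ∪_{q ∈ pred(b)} out[q]; out[b] = def[b] ∪ (in[b] − kill[b]). All sets start as {}. Pass 1: block 1: in {}, out {5}; block 2: in {5}, out {6,7,8}; block 3: in {6,7,8}. Pass 2: block 2: in {5,6,7,8}, out {6,7,8}. Final in sets: 1 {}, 2 {5,6,7,8}, 3 {6,7,8}.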
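A minimal Python sketch of this forward fixpoint follows. The predecessor lists (block 2 reached from block 1 and from itself, block 3 from block 2) are inferred from the program's loop, and the name `reaching_definitions` is hypothetical.

```python
def reaching_definitions(blocks, pred, gen, kill):
    """Iterate the forward equations until no in/out set changes."""
    rd_in = {b: set() for b in blocks}
    rd_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = set()
            for q in pred[b]:                       # in[b] = ∪ out[q]
                new_in |= rd_out[q]
            new_out = gen[b] | (new_in - kill[b])   # out[b] = def[b] ∪ (in[b] − kill[b])
            if new_in != rd_in[b] or new_out != rd_out[b]:
                rd_in[b], rd_out[b] = new_in, new_out
                changed = True
    return rd_in, rd_out

blocks = [1, 2, 3]
pred = {1: [], 2: [1, 2], 3: [2]}      # assumed CFG: 1 -> 2, 2 -> 2, 2 -> 3
gen = {1: {5}, 2: {6, 7, 8}, 3: set()}
kill = {1: {8}, 2: {5}, 3: set()}

rd_in, rd_out = reaching_definitions(blocks, pred, gen, kill)
reaching_a = rd_in[2] & {5, 8}         # definitions of a reaching block 2
print(rd_in, sorted(reaching_a))
```

Because both definitions of a (5 and 8) reach the entry of block 2, the use of a in b = a + 1 cannot be replaced by the constant 0.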

25 Constant Propagation. Program: block 1: 5: a = 0; block 2: 6: b = a + 1; 7: c = c + b; 8: a = b * 2; a<N; block 3: return c. Reaching definitions at block entry: 1 {}, 2 {5,6,7,8}, 3 {6,7,8}. Can we substitute the variable a in b = a + 1 with the constant “0”? No! Because there are two definitions for “a” which may reach this point: 5 and 8.

26 Available Expressions

27 Program: block 1: a = 0; block 2: b = a + 1; c = c + b; a = a + 1; a<N; block 3: return c. The problem: at a given program point, we’d like to know whether or not the value of an expression e has been calculated and is still available: 1. the expression e must be calculated on every path to the point, and 2. the variables used in e must not have been redefined after that calculation. E.g., has the right-side expression “a+1” been calculated and is it thus available at a = a + 1? If so, the second calculation can be avoided!

28 Implementation. Program: block 1: a = 0; block 2: b = a + 1; c = c + b; a = a + 1; a<N; block 3: return c. Calculate gen and kill for each block, based on the equations for statements (Tiger table 17.4). All possible expressions: ALL = {a+1, c+b}. gen[1] = {}, kill[1] = {a+1}; gen[2] = {}, kill[2] = ALL; gen[3] = {}, kill[3] = {}.

29 Implementation. Program: block 1: a = 0; block 2: b = a + 1; c = c + b; a = a + 1; a<N; block 3: return c. Calculate in/out for each block, based on the fixpoint algorithm. All possible expressions: ALL = {a+1, c+b}. gen[1] = {}, kill[1] = {a+1}; gen[2] = {}, kill[2] = ALL; gen[3] = {}, kill[3] = {}. in/out: block 1 has in = {} and its out set starts as ALL and becomes {}; for blocks 2 and 3 the sets start as ALL and become {}.
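The block-level fixpoint can be sketched as below. The slides defer the equations to the Tiger book; this sketch assumes the standard formulation, in which in[b] is the intersection of out[q] over all predecessors q, out[b] = gen[b] ∪ (in[b] − kill[b]), and every set except the entry's in set starts as ALL. The CFG edges and the name `available_expressions` are likewise assumptions.

```python
def available_expressions(blocks, pred, gen, kill, all_exprs, entry):
    """Forward 'must' analysis: start from ALL and shrink to a fixpoint."""
    av_in = {b: set(all_exprs) for b in blocks}
    av_out = {b: set(all_exprs) for b in blocks}
    av_in[entry] = set()                    # nothing is available at program entry
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                new_in = set()
            else:
                new_in = set(all_exprs)
                for q in pred[b]:           # in[b] = ∩ out[q]
                    new_in &= av_out[q]
            new_out = gen[b] | (new_in - kill[b])
            if new_in != av_in[b] or new_out != av_out[b]:
                av_in[b], av_out[b] = new_in, new_out
                changed = True
    return av_in, av_out

ALL = {"a+1", "c+b"}
blocks = [1, 2, 3]
pred = {1: [], 2: [1, 2], 3: [2]}           # assumed CFG: 1 -> 2, 2 -> 2, 2 -> 3
gen = {1: set(), 2: set(), 3: set()}
kill = {1: {"a+1"}, 2: set(ALL), 3: set()}

av_in, av_out = available_expressions(blocks, pred, gen, kill, ALL, entry=1)
print(av_in, av_out)
```

All block-level sets shrink to {}, matching the slide; the interesting information only appears at statement level inside block 2.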

30 Implementation. Program: block 1: a = 0; block 2: b = a + 1; c = c + b; a = a + 1; a<N; block 3: return c. Calculate in/out for each statement, based on the in/out for each block. All possible expressions: ALL = {a+1, c+b}. Block-level in/out sets all converge to {}. Inside block 2: {} at block entry, {a+1} after b = a + 1, {} at block exit.
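The statement-level pass inside block 2 can be sketched as follows. It assumes the usual statement rule (an assignment kills every expression that reads its target, and generates its own right-hand side unless that is killed too); the names `step` and `expr_vars` are hypothetical.

```python
def step(avail, target, expr, expr_vars):
    """Available expressions after the statement `target = expr`.

    expr_vars maps each expression to the set of variables it reads.
    """
    killed = {e for e, vs in expr_vars.items() if target in vs}
    gen = {expr} - killed
    return gen | (avail - killed)

expr_vars = {"a+1": {"a"}, "c+b": {"c", "b"}}
block2 = [("b", "a+1"), ("c", "c+b"), ("a", "a+1")]   # the test a<N assigns nothing

avail = set()            # in[2] = {} from the block-level analysis
before = []              # available set just before each statement
for target, expr in block2:
    before.append(set(avail))
    avail = step(avail, target, expr, expr_vars)

print(before, avail)
```

Just before a = a + 1 the set is {a+1}, which is why that statement is the candidate for common sub-expression elimination, and after it the set is {} again.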

31 Common Sub-expression Elimination (CSE). Program: block 1: a = 0; block 2: b = a + 1; c = c + b; a = a + 1; a<N; block 3: return c. E.g., has the right-side expression “a+1” been calculated and is it thus available at a = a + 1? After the available expression analysis, we know “a+1” is available there (the sets inside block 2 are {}, {a+1}, {}), so the second calculation can be omitted! But which variable should replace the expression “a+1” (b on the slide)? We need to do reaching expression analysis... (Read the text and do homework!)

