Control Flow Analysis (Chapter 7) Mooly Sagiv (with Contributions by Hanne Riis Nielson)
Outline
– What is Control Flow Analysis?
– Structure of an optimizing compiler
– A motivating example
– Constructing basic blocks
– Depth first search
– Finding dominators
– Reducibility
– Interval and Structural Analysis
– Conclusions
Control Flow Analysis
Input: a sequence of IR instructions
Output:
– A partition of the IR into basic blocks
– A control flow graph
– The loop structure
Compiler Structure
[Figure: String of characters -> Scanner -> tokens -> Parser -> AST -> Semantic analyzer -> IR -> Code Generator -> Object code; the phases share the symbol table and access routines and the OS interface]
Optimizing Compiler Structure
[Figure: String of characters -> Front-End -> IR -> Control Flow Analysis -> CFG -> Data Flow Analysis -> CFG + information -> Program Transformations -> instruction selection -> Object code]
An Example: Reaching Definitions
– A definition is an assignment to a variable
– An assignment d reaches a program point if there exists an execution path to this point along which the value assigned at d is still active (the variable has not been reassigned)
Running Example

unsigned int fib(unsigned int m) {
    unsigned int f0 = 0, f1 = 1, f2, i;
    if (m <= 1) {
        return m;
    } else {
        for (i = 2; i <= m; i++) {
            f2 = f0 + f1;
            f0 = f1;
            f1 = f2;
        }
        return f2;
    }
}

 1:     receive m(val)
 2:     f0 ← 0
 3:     f1 ← 1
 4:     if m <= 1 goto L3
 5:     i ← 2
 6: L1: if i <= m goto L2
 7:     return f2
 8: L2: f2 ← f0 + f1
 9:     f0 ← f1
10:     f1 ← f2
11:     i ← i + 1
12:     goto L1
13: L3: return m

[Figure annotations: sets of reaching definitions at successive program points: {1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 5}, {1, 2, 3, 5, 8, 9, 10, 11}, {1, 3, 5, 8, 9, 10, 11}, {1, 5, 8, 9, 10, 11}, {1, 8, 9, 10, 11}]
[Figure: control flow graph of the running example, from entry to exit, with edges annotated by sets of reaching definitions such as {2, 3} and {2, 3, 5, 8, 9, 10, 11}]
Approaches for Data Flow Analysis
– Iterative
  – Compute natural loops and iterate on CFG
– Interval Based
  – Reduce the CFG to single node
  – Inductively define the data flow solution
– Structural
  – Identify control flow structures in the CFG
  – Inductively define the data flow solution
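The iterative approach can be illustrated on the running example with a minimal sketch of reaching definitions. The block names (B1 to B6 in listing order), the helper name reaching_definitions, and the gen/kill formulation are assumptions made for this sketch; the slides do not prescribe them. Definitions are identified by their instruction numbers in the MIR listing.

```python
# Definitions of the running example: instruction number -> variable defined.
DEFS = {1: "m", 2: "f0", 3: "f1", 5: "i", 8: "f2", 9: "f0", 10: "f1", 11: "i"}

# Hypothetical block names, in listing order, with the definitions each contains.
BLOCK_DEFS = {
    "entry": [], "B1": [1, 2, 3], "B2": [5], "B3": [], "B4": [],
    "B5": [8, 9, 10, 11], "B6": [], "exit": [],
}
EDGES = [("entry", "B1"), ("B1", "B2"), ("B1", "B6"), ("B2", "B3"),
         ("B3", "B4"), ("B3", "B5"), ("B5", "B3"), ("B4", "exit"), ("B6", "exit")]


def reaching_definitions(block_defs, edges, defs):
    """Iterate OUT(b) = gen(b) | (IN(b) - kill(b)) until nothing changes."""
    preds = {b: [] for b in block_defs}
    for m, n in edges:
        preds[n].append(m)

    gen, kill = {}, {}
    for b, ds in block_defs.items():
        last = {defs[d]: d for d in ds}   # last definition of each variable in b
        gen[b] = set(last.values())
        # Every other definition of a variable assigned in b is killed by b.
        kill[b] = {d for d, v in defs.items() if v in last} - gen[b]

    rin = {b: set() for b in block_defs}
    rout = {b: set(gen[b]) for b in block_defs}
    changed = True
    while changed:
        changed = False
        for b in block_defs:
            # IN(b) is the union of OUT(p) over all predecessors p of b.
            rin[b] = set().union(*(rout[p] for p in preds[b]))
            new_out = gen[b] | (rin[b] - kill[b])
            if new_out != rout[b]:
                rout[b], changed = new_out, True
    return rin, rout


RIN, ROUT = reaching_definitions(BLOCK_DEFS, EDGES, DEFS)
```

With these inputs, the set reaching the loop test, RIN["B3"], is {1, 2, 3, 5, 8, 9, 10, 11}, and the set leaving the loop body, ROUT["B5"], is {1, 8, 9, 10, 11}, which agrees with the annotations of the running example.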
[Figure: control flow graph of the running example, from entry to exit, with the reaching-definition sets {2, 3}, {2, 3, 5}, {2, 3, 5, 8, 9, 10, 11}, and {8, 9, 10, 11} attached to its edges]
[Figure sequence: the control flow graph of the running example is reduced step by step to a single node between entry and exit. Each node carries a pair of definition sets, which match the kill and gen sets of the corresponding blocks.
Step 1: ({9, 10}, {1, 2, 3}), ({11}, {5}), ({2, 3, 5}, {8, 9, 10, 11})
Step 2: ({9, 10}, {1, 2, 3}), ({11}, {5}), (Ø, {8, 9, 10, 11})
Step 3: ({9, 10}, {1, 2, 3}), (Ø, {8, 9, 10, 11, 5})
Step 4: (Ø, {1, 2, 3, 8, 9, 10, 11, 5})]
Finding Basic Blocks
– A basic block is a maximal sequence of straight-line IR instructions
  – no fork or join inside the block
– A leader is an IR instruction that is
  – the entry of a routine
  – a target of a branch
  – the instruction immediately following a branch
Constructing basic blocks
Input: a sequence of MIR instructions
Output: a list of basic blocks where each MIR instruction occurs in exactly one block
Method:
– determine the leaders of the basic blocks:
  - the first instruction in the procedure is a leader
  - any instruction that is the target of a jump is a leader
  - any instruction immediately after a branch is a leader
– for each leader, its basic block consists of
  - the leader and
  - all instructions up to but not including the next leader or the end of the program
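The leader rules above can be sketched as follows. This is a minimal illustration, assuming the MIR is given as a list of strings with optional "Lk:" labels and "goto Lk" branches; the helper names (label_of, target_of, find_leaders, basic_blocks) are hypothetical, and treating a return like a branch that ends its block is an extra assumption of the sketch.

```python
import re

# MIR of the running example, one instruction per entry (index 0 is instruction 1).
MIR = [
    "receive m(val)",
    "f0 <- 0",
    "f1 <- 1",
    "if m <= 1 goto L3",
    "i <- 2",
    "L1: if i <= m goto L2",
    "return f2",
    "L2: f2 <- f0 + f1",
    "f0 <- f1",
    "f1 <- f2",
    "i <- i + 1",
    "goto L1",
    "L3: return m",
]


def label_of(instr):
    """Return the label that prefixes an instruction, or None."""
    m = re.match(r"(\w+):", instr)
    return m.group(1) if m else None


def target_of(instr):
    """Return the label a branch jumps to, or None for non-branches."""
    m = re.search(r"goto (\w+)", instr)
    return m.group(1) if m else None


def find_leaders(code):
    """Indices of leaders: first instruction, branch targets, branch successors."""
    targets = {target_of(ins) for ins in code} - {None}
    leaders = {0}
    for i, ins in enumerate(code):
        if label_of(ins) in targets:
            leaders.add(i)
        # Assumption: a return ends a block just like a branch does.
        ends_block = target_of(ins) or re.search(r"\breturn\b", ins)
        if ends_block and i + 1 < len(code):
            leaders.add(i + 1)
    return sorted(leaders)


def basic_blocks(code):
    """Each block runs from a leader up to, but not including, the next leader."""
    bounds = find_leaders(code) + [len(code)]
    return [code[a:b] for a, b in zip(bounds, bounds[1:])]


BLOCKS = basic_blocks(MIR)
```

For the running example this yields six blocks, covering instructions 1 to 4, 5, 6, 7, 8 to 12, and 13 of the listing.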
Running Example

 1:     receive m(val)
 2:     f0 ← 0
 3:     f1 ← 1
 4:     if m <= 1 goto L3
 5:     i ← 2
 6: L1: if i <= m goto L2
 7:     return f2
 8: L2: f2 ← f0 + f1
 9:     f0 ← f1
10:     f1 ← f2
11:     i ← i + 1
12:     goto L1
13: L3: return m

[Figure: the listing partitioned into six basic blocks, labeled B1 to B6]
Constructing Control Flow Graph (CFG)
– Special entry block r without predecessors
– Special exit block without successors
– There is an edge m -> n if one of the following holds:
  – m = entry and the first instruction in n begins the procedure
  – n = exit and the last instruction in m is a return or the last instruction in the procedure
  – there is a branch from the last instruction in m to the first instruction in n
  – the first instruction in n immediately follows the last instruction in m, and that instruction is not an unconditional branch
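The edge rules above can be sketched for the running example. This is a minimal illustration, not a definitive implementation: the block names B1 to B6 are assigned in listing order as an assumption (the figure may label them differently), and build_cfg is a hypothetical helper that recognizes only the "goto Lk" and "return" forms used in the example.

```python
import re

# Basic blocks of the running example, named in listing order (an assumption).
BLOCKS = {
    "B1": ["receive m(val)", "f0 <- 0", "f1 <- 1", "if m <= 1 goto L3"],
    "B2": ["i <- 2"],
    "B3": ["L1: if i <= m goto L2"],
    "B4": ["return f2"],
    "B5": ["L2: f2 <- f0 + f1", "f0 <- f1", "f1 <- f2", "i <- i + 1", "goto L1"],
    "B6": ["L3: return m"],
}


def build_cfg(blocks):
    """Return the set of CFG edges, including the entry and exit blocks."""
    names = list(blocks)  # dicts keep insertion order, i.e. listing order
    label_to_block = {}
    for name in names:
        m = re.match(r"(\w+):", blocks[name][0])
        if m:
            label_to_block[m.group(1)] = name

    edges = {("entry", names[0])}
    for k, name in enumerate(names):
        last = re.sub(r"^\w+:\s*", "", blocks[name][-1])  # drop a leading label
        if last.startswith("return"):
            edges.add((name, "exit"))
            continue
        goto = re.search(r"goto (\w+)", last)
        if goto:
            edges.add((name, label_to_block[goto.group(1)]))
        if not last.startswith("goto"):
            # Fall-through: the last instruction is not an unconditional branch.
            if k + 1 < len(names):
                edges.add((name, names[k + 1]))
            else:
                edges.add((name, "exit"))
    return edges


CFG = build_cfg(BLOCKS)
```

Under these assumptions the graph has nine edges, and the only edge that goes backwards in the listing is B5 -> B3, the jump from the loop body to the loop test.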
Running Example
[Figure: control flow graph of the running example: the basic blocks B1 to B6 of the MIR listing connected by edges, together with the entry and exit blocks]
How to treat call instructions?
– A call is an atomic instruction
– A call ends a basic block
– Replace the call by the procedure body (inline)
– A call is a “goto” into the procedure
– A call is handled in a special way
Potential Difficulties
– Gotos outside procedure boundaries
– Exit/Trap calls
– Exception handling
– Computed gotos
– setjmp(), longjmp() calls
Identifying Natural Loops
– A basic block m dominates a basic block n if every path from entry to n includes m
– The domination relation is reflexive, transitive, and anti-symmetric, and can be represented as a tree
– A back edge is an edge m -> n such that n dominates m
– The natural loop of a back edge m -> n contains n and the blocks on the paths from n to m (the blocks that can reach m without passing through n)
[Figure: control flow graph of the running example, with nodes entry, B0 to B7, and exit]
Reducible Flow Graphs
– All the loops are natural
– Can be “reduced” into a single node via a sequence of special transformations
  – Example: T1, T2 transformations
– Every loop has a single entry
– Result from “well structured” programs
– Most programs compile into reducible flow graphs
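T1/T2 Transformations
[Figure: T1 removes a self-loop edge n -> n; T2 merges a node that has a single predecessor into that predecessor]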
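A reducibility test based on T1 and T2 can be sketched as follows. This is a minimal, quadratic illustration that assumes every node is reachable from the root; the function name is_reducible and both example graphs are hypothetical (the first uses the running example with blocks named B1 to B6 in listing order, the second is a small two-entry loop).

```python
def is_reducible(nodes, edges, root):
    """Apply T1 and T2 until neither applies; reducible iff one node remains."""
    nodes, edges = set(nodes), set(edges)
    changed = True
    while changed:
        changed = False
        # T1: remove self-loop edges n -> n.
        self_loops = {(a, b) for a, b in edges if a == b}
        if self_loops:
            edges -= self_loops
            changed = True
        # T2: merge a non-root node with a unique predecessor into it.
        for n in list(nodes):
            if n == root:
                continue
            preds = {a for a, b in edges if b == n}
            if len(preds) == 1:
                (p,) = preds
                # Drop p -> n and redirect the remaining edges of n to p.
                edges = {(p if a == n else a, p if b == n else b)
                         for a, b in edges if (a, b) != (p, n)}
                nodes.remove(n)
                changed = True
                break
    return len(nodes) == 1


# Running example (block names in listing order are an assumption).
FIB_NODES = ["entry", "B1", "B2", "B3", "B4", "B5", "B6", "exit"]
FIB_EDGES = [("entry", "B1"), ("B1", "B2"), ("B1", "B6"), ("B2", "B3"),
             ("B3", "B4"), ("B3", "B5"), ("B5", "B3"), ("B4", "exit"), ("B6", "exit")]

# Hypothetical irreducible graph: the cycle a <-> b can be entered at a or at b.
BAD_NODES = ["r", "a", "b"]
BAD_EDGES = [("r", "a"), ("r", "b"), ("a", "b"), ("b", "a")]
```

Here is_reducible(FIB_NODES, FIB_EDGES, "entry") returns True, while is_reducible(BAD_NODES, BAD_EDGES, "r") returns False because neither a nor b ever has a unique predecessor.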
Bad Example
[Figure: an irreducible flow graph over the blocks B1 to B5]
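Node Splitting
[Figure: a flow graph over the blocks B1 to B5 before node splitting, and the same graph after splitting, where B3 is duplicated as B3a]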
Why can’t we construct loops from source?
– Language dependent
– Non uniform
– Source to source transformations
– Most programming languages support “wild” GOTOs
Depth-first spanning tree
Input: a flow graph G = (N, E, r)
Output: a depth-first spanning tree (N, T)
Method:
    T := Ø;
    for each node n in N do mark n unvisited;
    call DFS(r)
Using:
    procedure DFS(n) is
        mark n visited;
        for each n -> s in E do
            if s is not visited then
                add the edge n -> s to T;
                call DFS(s)
Better DFS Implementations
– Explicit stack instead of recursion
– Pointer reversal
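The explicit-stack variant can be sketched as follows. This is a minimal illustration, assuming the flow graph is given as a successor map; the function name spanning_tree_iterative and the block names of the running example (B1 to B6 in listing order) are assumptions. Each stack entry keeps an iterator over the successors of its node, so the traversal resumes where a recursive call would have returned.

```python
def spanning_tree_iterative(succ, root):
    """Depth-first spanning tree edges, computed without recursion."""
    visited = {root}
    tree = []
    stack = [(root, iter(succ.get(root, [])))]
    while stack:
        n, successors = stack[-1]
        for s in successors:
            if s not in visited:
                visited.add(s)
                tree.append((n, s))
                stack.append((s, iter(succ.get(s, []))))
                break  # descend into s before looking at further successors of n
        else:
            stack.pop()  # all successors of n are done
    return tree


# Successor map of the running example (block names are an assumption).
SUCC = {
    "entry": ["B1"], "B1": ["B2", "B6"], "B2": ["B3"], "B3": ["B4", "B5"],
    "B4": ["exit"], "B5": ["B3"], "B6": ["exit"], "exit": [],
}
TREE = spanning_tree_iterative(SUCC, "entry")
```

The edges B5 -> B3 and B6 -> exit are left out of the tree because their targets are already visited when they are examined.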
Pre-ordering
Input: a flow graph G = (N, E, r)
Output: a depth-first spanning tree (N, T) and an ordering Pre of N
Method:
    T := Ø;
    for each node n in N do mark n unvisited;
    i := 1;
    call DFS(r)
Using:
    procedure DFS(n) is
        mark n visited;
        Pre(n) := i; i := i + 1;
        for each n -> s in E do
            if s is not visited then
                add the edge n -> s to T;
                call DFS(s);
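The pre-ordering procedure translates almost line by line into a recursive sketch. The successor-map representation, the function name depth_first_preorder, and the block names of the running example (B1 to B6 in listing order) are assumptions; membership in the Pre map doubles as the visited mark.

```python
def depth_first_preorder(succ, root):
    """Return (T, Pre): spanning tree edges and preorder numbers from 1."""
    pre = {}
    tree = []

    def dfs(n):
        pre[n] = len(pre) + 1  # Pre(n) := i; i := i + 1
        for s in succ.get(n, []):
            if s not in pre:   # s is not visited
                tree.append((n, s))
                dfs(s)

    dfs(root)
    return tree, pre


# Successor map of the running example (block names are an assumption).
SUCC = {
    "entry": ["B1"], "B1": ["B2", "B6"], "B2": ["B3"], "B3": ["B4", "B5"],
    "B4": ["exit"], "B5": ["B3"], "B6": ["exit"], "exit": [],
}
T, PRE = depth_first_preorder(SUCC, "entry")
```

With this successor order the numbering is entry = 1, B1 = 2, B2 = 3, B3 = 4, B4 = 5, exit = 6, B5 = 7, B6 = 8. A different order of successors gives a different, equally valid, spanning tree and numbering.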
Computing dominators
Input: a flow graph G = (N, E, r)
Output: for each node n, a set DOM(n) of dominators
Method:
    DOM(r) := { r };
    for each n in N \ { r } do DOM(n) := N;
    while changes in some DOM(n) do
        for each n in N \ { r } do
            DOM(n) := { n } U ∩ { DOM(p) | p -> n is in E }
(the intersection ranges over all predecessors p of n)
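The iteration above can be sketched directly with Python sets. This is a minimal illustration that assumes every node is reachable from the root; the function name dominators and the block names of the running example (B1 to B6 in listing order) are assumptions.

```python
def dominators(nodes, edges, root):
    """Iteratively compute DOM(n) for every node n of the flow graph."""
    preds = {n: set() for n in nodes}
    for m, n in edges:
        preds[n].add(m)

    dom = {n: set(nodes) for n in nodes}  # DOM(n) := N
    dom[root] = {root}                    # DOM(r) := { r }
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == root:
                continue
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]             # intersect over all predecessors
            new |= {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Running example (block names are an assumption).
NODES = ["entry", "B1", "B2", "B3", "B4", "B5", "B6", "exit"]
EDGES = [("entry", "B1"), ("B1", "B2"), ("B1", "B6"), ("B2", "B3"),
         ("B3", "B4"), ("B3", "B5"), ("B5", "B3"), ("B4", "exit"), ("B6", "exit")]
DOM = dominators(NODES, EDGES, "entry")
```

Here DOM["B5"] contains B3, so B5 -> B3 is a back edge, while DOM["exit"] is only {entry, B1, exit} because exit is reached both through B4 and through B6.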
[Figure: control flow graph of the running example, with nodes entry, B0 to B7, and exit]
Other Algorithms for Finding Dominators
– Lengauer & Tarjan: O(e log n) algorithm
– Harel: linear time algorithm
– Thorup: linear time algorithm
– Alstrup & Lauridsen: incremental algorithm
Computing natural loops
Input: a flow graph G = (N, E, r) and a back edge m -> n
Output: a set, loop, of the nodes in the natural loop of m -> n
Method:
    stack := empty;
    loop := {n};
    call add(m);
    while stack is not empty do
        pop d from the stack;
        for each p with p -> d in E do call add(p)
Using:
    procedure add(p) is
        if p is not in loop then
            loop := loop U {p};
            push p on the stack
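The worklist procedure above can be sketched as follows. The function name natural_loop, the edge-list representation, and both example graphs are assumptions of this sketch; the caller is expected to pass a genuine back edge m -> n, which the code does not check.

```python
def natural_loop(edges, m, n):
    """Nodes of the natural loop of the back edge m -> n."""
    preds = {}
    for a, b in edges:
        preds.setdefault(b, []).append(a)

    loop = {n}   # the header is in the loop and blocks the backward search
    stack = []

    def add(p):
        if p not in loop:
            loop.add(p)
            stack.append(p)

    add(m)
    while stack:
        d = stack.pop()
        for p in preds.get(d, []):
            add(p)
    return loop


# Running example (block names in listing order are an assumption).
FIB_EDGES = [("entry", "B1"), ("B1", "B2"), ("B1", "B6"), ("B2", "B3"),
             ("B3", "B4"), ("B3", "B5"), ("B5", "B3"), ("B4", "exit"), ("B6", "exit")]

# Hypothetical loop whose body contains an if-then-else; back edge d -> h.
DIAMOND_EDGES = [("h", "a"), ("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "h")]
```

For the running example, natural_loop(FIB_EDGES, "B5", "B3") gives {B3, B5}. For the hypothetical graph, natural_loop(DIAMOND_EDGES, "d", "h") gives all five nodes, since both branches lead back to d.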
Issues
– Natural loops with different headers are either disjoint or nested within each other
– But what about loops which share a header?
Two Loops with the same header

B1: i = 1
    if (i >= 100) goto B4
    else if ((i % 10) == 0) goto B3
    else
B2: ....
    i++;
    goto B1
B3: ....
    i++;
    goto B1
B4: ...

B1: if (i < j) goto B2
    else if (i > j) goto B3
    else goto B4
B2: ....
    i++;
    goto B1
B3: ....
    i++;
    goto B1
B4: ...
Strongly connected components
Input: a flow graph G = (N, E, r)
Output: a set of strongly connected components
Method:
    for all n in N do mark n unvisited;
    i := 1;
    stack := empty;
    while there exists an unvisited node n do
        call SCC(n)
Using:
    procedure SCC(n) is...
procedure SCC(n) is
    mark n visited;
    Pre(n) := i; Low(n) := i;    (Low: lowest number for a node in the SCC)
    i := i + 1;
    push n on the stack;
    for each n -> s in E do
        if s is not visited then
            call SCC(s);
            Low(n) := min(Low(n), Low(s))
        else if Pre(s) < Pre(n) and s is on the stack    (back or cross edge)
            then Low(n) := min(Low(n), Pre(s));
    if Low(n) = Pre(n)    (n is the root of an SCC)
        then SCC := Ø;
             repeat
                 pop d off the stack;
                 SCC := SCC U {d}
             until d = n;
             return SCC
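The SCC procedure can be sketched in Python as follows. The successor-map representation (every node must appear as a key), the function name strongly_connected_components, and the block names of the running example are assumptions; the sketch collects the components in a list instead of returning them one at a time, and it is recursive, so very deep graphs would need an explicit stack.

```python
def strongly_connected_components(succ):
    """Return the strongly connected components as a list of sets."""
    pre, low = {}, {}
    stack, on_stack = [], set()
    components = []

    def scc(n):
        pre[n] = low[n] = len(pre) + 1   # Pre(n) := i; Low(n) := i; i := i + 1
        stack.append(n)
        on_stack.add(n)
        for s in succ[n]:
            if s not in pre:             # s is not visited
                scc(s)
                low[n] = min(low[n], low[s])
            elif pre[s] < pre[n] and s in on_stack:   # back or cross edge
                low[n] = min(low[n], pre[s])
        if low[n] == pre[n]:             # n is the root of an SCC
            component = set()
            while True:
                d = stack.pop()
                on_stack.discard(d)
                component.add(d)
                if d == n:
                    break
            components.append(component)

    for n in succ:
        if n not in pre:
            scc(n)
    return components


# Successor map of the running example (block names are an assumption).
SUCC = {
    "entry": ["B1"], "B1": ["B2", "B6"], "B2": ["B3"], "B3": ["B4", "B5"],
    "B4": ["exit"], "B5": ["B3"], "B6": ["exit"], "exit": [],
}
SCCS = strongly_connected_components(SUCC)
```

On the running example this finds seven components: the loop {B3, B5} and six single-node components.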
Structural Analysis
– Identify “common” structures in the control flow graph (even irreducible)
– Reduce the CFG into “simple regions”
– Shift some dataflow analysis from compile time to compiler-generation time
– Can be efficiently implemented via DFS
Block Schema
[Figure: a sequence of blocks B1, B2, ..., Bn]
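Conditionals
[Figure: three conditional schemas: one over blocks B1, B2; one over blocks B1, B2, B3; and one over B0 with branches B1, B2, ..., Bn]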
Loops
[Figure: three loop schemas: two over blocks B1, B2 and one over blocks B1, B2, B3]