Control Flow Analysis (Chapter 7)

Outline
- What is Control Flow Analysis?
- Structure of an optimizing compiler
- Constructing basic blocks
- Depth first search
- Finding dominators
- Reducibility
- Interval and Structural Analysis
- Conclusions

Control Flow Analysis
Input: a sequence of IR instructions
Output:
- a partition of the IR into basic blocks
- a control flow graph
- the loop structure

Compiler Structure
String of characters -> Scanner -> tokens -> Parser -> AST -> Semantic analyzer -> IR -> Code Generator -> Object code
Alongside these phases: symbol table and access routines, OS interface

Optimizing Compiler Structure
String of characters -> Front-End -> IR -> Control Flow Analysis -> CFG -> Data Flow Analysis -> CFG + information -> Program Transformations -> IR -> instruction selection -> Object code

An Example: Reaching Definitions
- A definition is an assignment to a variable.
- A definition d reaches a program point if there exists an execution path to that point along which the value assigned at d is still active, i.e., the variable has not been reassigned on the way.

Running Example
C routine:
    unsigned int fib(unsigned int m)
    {   unsigned int f0 = 0, f1 = 1, f2, i;
        if (m <= 1) {
            return m;
        } else {
            for (i = 2; i <= m; i++) {
                f2 = f0 + f1; f0 = f1; f1 = f2;
            }
            return f2;
        }
    }
MIR intermediate code for the C routine:
     1:     receive m(val)
     2:     f0 <- 0
     3:     f1 <- 1
     4:     if m <= 1 goto L3
     5:     i <- 2
     6: L1: if i <= m goto L2
     7:     return f2
     8: L2: f2 <- f0 + f1
     9:     f0 <- f1
    10:     f1 <- f2
    11:     i <- i + 1
    12:     goto L1
    13: L3: return m
Reaching definitions annotated on the slide (each set lists the numbers of the defining instructions, in the order the sets appear): {1}; {1, 2}; {1, 2, 3}; {1, 2, 3, 5}; {1, 2, 3, 5, 8, 9, 10, 11}; {1, 2, 3, 5, 8, 9, 10, 11}; {1, 3, 5, 8, 9, 10, 11}; {1, 5, 8, 9, 10, 11}; {1, 8, 9, 10, 11}
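As a rough illustration, the reaching definitions of the running example can be computed by propagating sets over individual instructions until nothing changes. This is a minimal Python sketch, not the method used on the slides; the names DEFS, SUCC, and reaching_definitions are made up for the example, and the successor table was read off the MIR listing by hand.

```python
# Variable defined by each defining instruction of the running example.
DEFS = {1: "m", 2: "f0", 3: "f1", 5: "i", 8: "f2", 9: "f0", 10: "f1", 11: "i"}

# Successors of each instruction: 4 and 6 are conditional branches,
# 12 is a goto, 7 and 13 are returns (assumed from the MIR listing).
SUCC = {1: [2], 2: [3], 3: [4], 4: [5, 13], 5: [6], 6: [7, 8], 7: [],
        8: [9], 9: [10], 10: [11], 11: [12], 12: [6], 13: []}

def reaching_definitions(succ, defs):
    """Return out[n]: the definitions that reach the point just after n."""
    pred = {n: [] for n in succ}
    for m, targets in succ.items():
        for n in targets:
            pred[n].append(m)

    out = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            # Everything that reaches the end of a predecessor reaches n.
            reach = set().union(*(out[p] for p in pred[n]))
            if n in defs:
                # n kills the other definitions of its variable and adds itself.
                reach = {d for d in reach if defs[d] != defs[n]} | {n}
            if reach != out[n]:
                out[n] = reach
                changed = True
    return out

out = reaching_definitions(SUCC, DEFS)
for n in sorted(out):
    print(n, sorted(out[n]))
```

Under these assumptions the set after instruction 11 comes out as {1, 8, 9, 10, 11}, which agrees with the last set listed with the running example.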

Our first task in analyzing a program is to discover its control structure.
The control structure is obvious in source code but harder to see in intermediate code, so we create a visual representation of it, namely a flowchart.
[Figure: flowchart of the running example.]

Identify basic blocks. Informally, a basic block is a straight-line sequence of code that can be entered only at the beginning and exited only at the end. In the flowchart:
- nodes 1-4 form basic block B1
- nodes 8-11 form block B6
- node 12 becomes block B2
- node 5 becomes B3
- node 6 becomes B4
- node 7 becomes B5

Approaches for Control Flow Analysis
- Iterative: compute natural loops and iterate on the CFG
- Interval based: reduce the CFG to a single node; inductively define the data flow solution
- Structural: identify control flow structures in the CFG

[Figure sequence: the flow graph of the running example, from entry to exit, annotated with sets of definition numbers and collapsed step by step until a single node remains between entry and exit.]

Finding Basic Blocks
A basic block is a maximal sequence of straight-line IR instructions, with no fork or join inside it.
A leader is an IR instruction that is one of:
- the entry of a routine
- a target of a branch
- the instruction immediately following a branch

Constructing basic blocks
Input: a sequence of MIR instructions
Output: a list of basic blocks where each MIR instruction occurs in exactly one block
Method:
Determine the leaders of the basic blocks:
- the first instruction in the procedure is a leader
- any instruction that is the target of a jump is a leader
- any instruction after a branch is a leader
For each leader, its basic block consists of:
- the leader and
- all instructions up to but not including the next leader or the end of the program
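The method above can be sketched in a few lines of Python. This is only an illustration under assumptions: the MIR instructions are held as plain strings, "<-" stands for the assignment arrow, a branch is recognised by the word "goto", and the names MIR and find_basic_blocks are invented for the example.

```python
import re

# MIR listing of the running example, one string per instruction.
MIR = [
    "receive m(val)",
    "f0 <- 0",
    "f1 <- 1",
    "if m <= 1 goto L3",
    "i <- 2",
    "L1: if i <= m goto L2",
    "return f2",
    "L2: f2 <- f0 + f1",
    "f0 <- f1",
    "f1 <- f2",
    "i <- i + 1",
    "goto L1",
    "L3: return m",
]

def find_basic_blocks(instrs):
    """Return the basic blocks as lists of 1-based instruction numbers."""
    # Map each label to the number of the instruction it marks.
    label_at = {}
    for n, text in enumerate(instrs, start=1):
        m = re.match(r"(\w+):", text)
        if m:
            label_at[m.group(1)] = n

    leaders = {1}  # the first instruction in the procedure
    for n, text in enumerate(instrs, start=1):
        target = re.search(r"goto (\w+)", text)
        if target:
            leaders.add(label_at[target.group(1)])  # target of a jump
            if n < len(instrs):
                leaders.add(n + 1)  # instruction after a branch

    # Each block runs from its leader up to, but not including, the next leader.
    ordered = sorted(leaders)
    blocks = []
    for k, start in enumerate(ordered):
        end = ordered[k + 1] if k + 1 < len(ordered) else len(instrs) + 1
        blocks.append(list(range(start, end)))
    return blocks

blocks = find_basic_blocks(MIR)
print(blocks)
```

For the running example this yields six blocks, the same number as the blocks B1 to B6 on the slides.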

Running Example
[Figure: the fib routine and its MIR code, partitioned into the basic blocks B1 to B6.]

Constructing Control Flow Graph (CFG)
- Special entry block r without predecessors
- Special exit block without successors
- There is an edge m -> n if one of the following holds:
  - m = entry and the first instruction in n begins the procedure
  - n = exit and the last instruction in m is a return or the last instruction in the procedure
  - there is a branch from the last instruction in m to the first instruction in n
  - the first instruction in n immediately follows the last instruction in m, and that instruction can fall through (it is not an unconditional branch or a return)
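The edge rules can be tried out on the running example with a short Python sketch. Several things here are assumptions made for illustration: blocks are keyed by the number of their first MIR instruction rather than by the slide names, "<-" stands for the assignment arrow, and the helpers strip_label and build_cfg are hypothetical.

```python
# Basic blocks of the running example, keyed by the number of their leader
# and listed in program order.
BLOCKS = {
    1: ["receive m(val)", "f0 <- 0", "f1 <- 1", "if m <= 1 goto L3"],
    5: ["i <- 2"],
    6: ["L1: if i <= m goto L2"],
    7: ["return f2"],
    8: ["L2: f2 <- f0 + f1", "f0 <- f1", "f1 <- f2", "i <- i + 1", "goto L1"],
    13: ["L3: return m"],
}

def strip_label(text):
    """Drop a leading 'Lk: ' label, if any."""
    head, sep, rest = text.partition(": ")
    return rest if sep and head.isalnum() else text

def build_cfg(blocks):
    """Return the set of CFG edges (m, n), including 'entry' and 'exit'."""
    order = list(blocks)
    # Which block starts with a given label?
    block_of_label = {}
    for b, instrs in blocks.items():
        head, sep, _ = instrs[0].partition(": ")
        if sep and head.isalnum():
            block_of_label[head] = b

    edges = {("entry", order[0])}
    for k, b in enumerate(order):
        words = strip_label(blocks[b][-1]).split()
        is_return = words[0] == "return"
        is_goto = words[0] == "goto"  # unconditional branch
        if "goto" in words:
            # branch from the last instruction of b to the first of its target
            edges.add((b, block_of_label[words[-1]]))
        if is_return or (k == len(order) - 1 and not is_goto):
            edges.add((b, "exit"))
        elif not is_goto and k + 1 < len(order):
            # fall through to the block that follows b in the code
            edges.add((b, order[k + 1]))
    return edges

cfg_edges = build_cfg(BLOCKS)
print(sorted(cfg_edges, key=str))
```

Keying blocks by leader number avoids guessing which slide name belongs to which block; the two conditional branches each contribute a branch edge and a fall-through edge.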

Running Example
[Figure: control flow graph of the running example, with the entry and exit blocks and the basic blocks B1 to B6.]

How to treat call instructions?
- A call is an atomic instruction
- A call ends a basic block
- Replace the call by the procedure body (inline)
- A call is a “goto” into the procedure
- A call is handled in a special way

Potential Difficulties
- Gotos outside procedure boundaries
- Exit/trap calls
- Exception handling
- Computed gotos
- setjmp(), longjmp() calls

DFS, Preorder Traversal & Postorder Traversal
All four concepts (depth-first search, preorder traversal, postorder traversal, and breadth-first search) apply to rooted, directed graphs and therefore to flow graphs.
DFS: visits the descendants of a node in the graph before visiting any of its siblings that are not also its descendants.

Rooted directed graph with Depth-First presentation of it.

DFS algorithm

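The depth-first presentation includes all the graph's nodes and the edges that make up the depth-first order, displayed as a tree. The edges that are part of the depth-first spanning tree are called tree edges. The edges that are not part of the spanning tree are divided into 3 classes:
1. Forward edges (F): from a node to one of its descendants in the tree
2. Back edges (B): from a node to one of its ancestors in the tree
3. Cross edges (C): between two nodes neither of which is an ancestor of the other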
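A small Python sketch of how the four edge classes can be told apart during a depth-first search. The graph GRAPH is a made-up example (the slide's figure is not available), and classify_edges is a hypothetical helper; it uses preorder and postorder numbers assigned during the search.

```python
# Hypothetical rooted graph: node -> list of successors.
GRAPH = {"a": ["b", "d", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}

def classify_edges(succ, root):
    """DFS from root; label each reachable edge tree/forward/back/cross."""
    pre, post, kinds = {}, {}, {}
    counter = [0]

    def visit(x):
        counter[0] += 1
        pre[x] = counter[0]
        for y in succ[x]:
            if y not in pre:
                kinds[(x, y)] = "tree"      # y is first reached through x
                visit(y)
            elif y not in post:
                kinds[(x, y)] = "back"      # y is still open: an ancestor of x
            elif pre[x] < pre[y]:
                kinds[(x, y)] = "forward"   # y was finished below x: a descendant
            else:
                kinds[(x, y)] = "cross"     # y was finished before x was reached
        counter[0] += 1
        post[x] = counter[0]

    visit(root)
    return kinds

kinds = classify_edges(GRAPH, "a")
for edge, kind in kinds.items():
    print(edge, kind)
```

The classification depends on the order in which successors are visited, so a different successor order can give a different spanning tree and different labels for the same graph.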

To obtain the DFS of a given graph, we use an instance of depth-first search that computes both a depth-first spanning tree and the preorder and postorder traversals of the graph G = <N, E> with root r.
DFS for the given graph is: 1, 2, 3, 4, 5, 6, 7, 8
BFS for the given graph is: 1, 2, 6, 3, 4, 5, 7, 8
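The following Python sketch computes a depth-first spanning tree together with the preorder and postorder traversals, and a breadth-first order for comparison. The graph used by the slide is not available, so GRAPH below is a hypothetical graph chosen only because it is consistent with the two orders quoted above; depth_first and breadth_first are illustrative names.

```python
from collections import deque

# Hypothetical rooted graph, consistent with the DFS and BFS orders in the text.
GRAPH = {1: [2, 6], 2: [3, 4, 5], 3: [], 4: [], 5: [], 6: [7, 8], 7: [], 8: []}

def depth_first(succ, root):
    """Return (preorder, postorder, tree_edges) of a DFS from root."""
    preorder, postorder, tree_edges = [], [], []
    seen = set()

    def visit(x):
        seen.add(x)
        preorder.append(x)           # numbered when first reached
        for y in succ[x]:
            if y not in seen:
                tree_edges.append((x, y))
                visit(y)
        postorder.append(x)          # numbered when all descendants are done

    visit(root)
    return preorder, postorder, tree_edges

def breadth_first(succ, root):
    """Return the nodes in breadth-first order from root."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in succ[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return order

preorder, postorder, tree_edges = depth_first(GRAPH, 1)
bfs_order = breadth_first(GRAPH, 1)
print(preorder, postorder, bfs_order)
```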

Dominators and Postdominators
To determine the loops in a flow graph, we first define a binary relation called dominance on flow graph nodes. Node d dominates node i, written d dom i, if every possible execution path from entry to i includes d. dom is reflexive (every node dominates itself), transitive (if a dom b and b dom c, then a dom c), and antisymmetric (if a dom b and b dom a, then b = a).
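One common way to compute the dominance relation is a simple iteration over the flow graph; the sketch below is such an iteration in Python, offered as an illustration rather than as the algorithm of the slides. The graph CFG encodes the running example under the assumption that B2 is the block returning m and B6 is the loop body; the function name dominators is made up.

```python
# Flow graph of the running example (block naming is an assumption).
CFG = {
    "entry": ["B1"], "B1": ["B2", "B3"], "B2": ["exit"], "B3": ["B4"],
    "B4": ["B5", "B6"], "B5": ["exit"], "B6": ["B4"], "exit": [],
}

def dominators(succ, entry):
    """Return dom[n], the set of nodes d such that d dom n."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for m, targets in succ.items():
        for n in targets:
            pred[n].add(m)

    # Start from "everything dominates everything" and shrink.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            # d dominates n if d is n or d dominates every predecessor of n.
            new = set(nodes)
            for p in pred[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

dom = dominators(CFG, "entry")
for n in CFG:
    print(n, sorted(dom[n]))
```

Reflexivity shows up directly in the result: every node appears in its own set.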

Immediate dominance, idom: for a ≠ b, a idom b iff a dom b and there does not exist a node c such that c ≠ a and c ≠ b for which a dom c and c dom b. We write idom(b) to denote the immediate dominator of b. Clearly, the immediate dominator of a node is unique.
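Given the dominator sets, the immediate dominator can be picked out directly from the definition. The Python sketch below assumes the sets in DOM, which were worked out by hand for the running example with the block naming B2 = "return m" and B6 = loop body (an assumption); immediate_dominators is a hypothetical helper.

```python
# dom sets of the running example: DOM[n] = nodes that dominate n.
DOM = {
    "entry": {"entry"},
    "B1": {"entry", "B1"},
    "B2": {"entry", "B1", "B2"},
    "B3": {"entry", "B1", "B3"},
    "B4": {"entry", "B1", "B3", "B4"},
    "B5": {"entry", "B1", "B3", "B4", "B5"},
    "B6": {"entry", "B1", "B3", "B4", "B6"},
    "exit": {"entry", "B1", "exit"},
}

def immediate_dominators(dom):
    """Return idom[b] for every node b that has a strict dominator."""
    idom = {}
    for b, ds in dom.items():
        strict = ds - {b}
        if strict:
            # The strict dominators of b form a chain under dom. The one
            # closest to b is dominated by all the others, so it has the
            # largest dominator set.
            idom[b] = max(strict, key=lambda d: len(dom[d]))
    return idom

idom = immediate_dominators(DOM)
for b, a in idom.items():
    print(a, "idom", b)
```

The entry node has no strict dominator, so it gets no entry in idom; it is the root of the tree formed by the other pairs.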

The immediate dominance relation forms a tree of the nodes of a flow graph whose root is the entry node, whose edges are the immediate dominances, and whose paths display all the dominance relationships.
d strictly dominates i, written d sdom i, if d dominates i and d ≠ i.
Node p postdominates node i, written p pdom i, if every possible execution path from i to exit includes p, i.e., p dom i in the flow graph with all the edges reversed and entry and exit interchanged.
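Postdominators can be obtained by running any dominator computation on the reversed flow graph, rooted at exit. The Python sketch below does this with a simple iterative dominator routine; the graph CFG and the names dominators and postdominators are assumptions made for the example, with B2 taken to be the block returning m and B6 the loop body.

```python
CFG = {
    "entry": ["B1"], "B1": ["B2", "B3"], "B2": ["exit"], "B3": ["B4"],
    "B4": ["B5", "B6"], "B5": ["exit"], "B6": ["B4"], "exit": [],
}

def dominators(succ, root):
    """Iterative dominator sets for the graph succ rooted at root."""
    nodes = set(succ)
    pred = {n: {m for m in nodes if n in succ[m]} for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            new = set(nodes)
            for p in pred[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def postdominators(succ, exit_node):
    """pdom[i] = set of nodes p with p pdom i."""
    # Reverse every edge, then treat exit as the root.
    reverse = {n: [] for n in succ}
    for m, targets in succ.items():
        for n in targets:
            reverse[n].append(m)
    return dominators(reverse, exit_node)

pdom = postdominators(CFG, "exit")
for n in CFG:
    print(n, sorted(pdom[n]))
```

In this graph B5 postdominates B4, since every path from the loop header to exit leaves the loop through B5.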

Loops and strongly connected components
A back edge indicates a loop: in the figure, the back edge from d to c identifies a loop.

Given a back edge m -> n, the natural loop of m -> n is the subgraph consisting of the set of nodes containing n and all the nodes from which m can be reached in the flow graph without passing through n, together with the edge set connecting all the nodes in its node set. Node n is the loop header. An algorithm can compute the set of nodes of the loop and, if needed, its set of edges.
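The node set of a natural loop can be collected by walking predecessors backwards from m and never going past n. The Python sketch below is one way to do it, shown on the running example's flow graph under the assumption that B6 -> B4 is its back edge; CFG, natural_loop, and loop_edges are illustrative names.

```python
CFG = {
    "entry": ["B1"], "B1": ["B2", "B3"], "B2": ["exit"], "B3": ["B4"],
    "B4": ["B5", "B6"], "B5": ["exit"], "B6": ["B4"], "exit": [],
}

def natural_loop(succ, m, n):
    """Return the node set of the natural loop of the back edge m -> n."""
    pred = {x: [] for x in succ}
    for a, targets in succ.items():
        for b in targets:
            pred[b].append(a)

    loop = {n, m}
    stack = [m] if m != n else []
    while stack:
        x = stack.pop()
        # Every predecessor of a loop node reaches m without passing
        # through n, because n itself is never expanded.
        for p in pred[x]:
            if p not in loop:
                loop.add(p)
                stack.append(p)
    return loop

def loop_edges(succ, loop):
    """Edges of the flow graph with both ends inside the loop."""
    return {(a, b) for a in loop for b in succ[a] if b in loop}

loop = natural_loop(CFG, "B6", "B4")
print(sorted(loop), sorted(loop_edges(CFG, loop)))
```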

Many optimizations require moving code from inside a loop to just before its header. To make this possible, we introduce the concept of a preheader. A preheader is an initially empty block placed just before the header of a loop, such that all the edges that previously went to the header from outside the loop now go to the preheader, and there is a single new edge from the preheader to the header.
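The edge rewiring described above is mechanical, as the following Python sketch suggests. The flow graph, the loop {B4, B6} with header B4, and the preheader name "B4_pre" are assumptions chosen for illustration, and insert_preheader is a hypothetical helper that only rewires edges (it does not move any code).

```python
CFG = {
    "entry": ["B1"], "B1": ["B2", "B3"], "B2": ["exit"], "B3": ["B4"],
    "B4": ["B5", "B6"], "B5": ["exit"], "B6": ["B4"], "exit": [],
}

def insert_preheader(succ, header, loop_nodes, name):
    """Return a new successor map with an empty preheader before header."""
    new_succ = {}
    for m, targets in succ.items():
        if m in loop_nodes:
            # Edges from inside the loop, including back edges, are kept.
            new_succ[m] = list(targets)
        else:
            # Edges entering the header from outside now go to the preheader.
            new_succ[m] = [name if t == header else t for t in targets]
    new_succ[name] = [header]  # the single new edge preheader -> header
    return new_succ

with_pre = insert_preheader(CFG, "B4", {"B4", "B6"}, "B4_pre")
print(with_pre)
```

After the rewrite the header has exactly one predecessor outside the loop, which gives loop-invariant code a single place to be moved to.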

Loop without and with pre header

Unless two natural loops have the same header, they are either disjoint or one is nested within the other.
The most general looping structure that may occur is a strongly connected component (SCC) of a flow graph, which is a subgraph Gs = <Ns, Es> such that every node in Ns is reachable from every other node in Ns by a path that includes only edges in Es. A strongly connected component is maximal if every strongly connected component containing it is the component itself.
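Maximal strongly connected components can be found with Tarjan's algorithm, which is a standard technique and not necessarily the one intended by the slides. The Python sketch below applies it to the running example's flow graph; the graph encoding and the function name are assumptions made for the example.

```python
CFG = {
    "entry": ["B1"], "B1": ["B2", "B3"], "B2": ["exit"], "B3": ["B4"],
    "B4": ["B5", "B6"], "B5": ["exit"], "B6": ["B4"], "exit": [],
}

def strongly_connected_components(succ):
    """Tarjan's algorithm; returns the maximal SCCs as a list of sets."""
    index, low = {}, {}
    stack, on_stack = [], set()
    result = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop it off the stack.
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            result.append(comp)

    for v in succ:
        if v not in index:
            visit(v)
    return result

sccs = strongly_connected_components(CFG)
print([sorted(c) for c in sccs if len(c) > 1])
```

Every node outside a cycle ends up in a component of its own, so only the components with more than one node (or with a self loop) correspond to looping structures.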

Reducibility
The name comes from several kinds of transformations that can be applied to flow graphs; they collapse subgraphs into single nodes and hence reduce the flow graph. A flow graph is reducible if a sequence of such transformations reduces it to a single node. The patterns that make flow graphs irreducible are called improper regions; they are multiple-entry strongly connected components of a flow graph.

The simplest improper regions are the two-entry loop and the three-entry loop shown in the figure.
It is easy to produce an infinite sequence of distinct improper regions beginning with these two.

Practical approaches to deal with irreducibility
1. Perform iterative data flow analysis on the irreducible regions and plug the results into the data flow equations for the rest of the flow graph.
2. Use a technique called node splitting, which transforms irreducible regions into reducible ones.
3. Perform an induced iteration on the lattice of monotone functions from the lattice to itself (chap 8).

The result of node splitting to B3

Interval analysis and control trees
Interval analysis is the name given to several approaches to both control flow and data flow analysis. In control flow analysis, interval analysis refers to dividing the flow graph into regions of various sorts and consolidating each region into a new node. A flow graph resulting from one or more such transformations is called an abstract flow graph.

The result of applying a sequence of such transformations is a control tree, defined as follows:
1. The root of the control tree is an abstract graph representing the original flow graph.
2. The leaves of the control tree are individual basic blocks.
3. The nodes between the root and the leaves are abstract nodes representing regions of the flow graph.
4. The edges represent the relationship between each abstract node and the regions that are its descendants.

Example: T1-T2 Analysis
- T1: collapses a one-node self loop to a single node
- T2: collapses a sequence of two nodes, such that the first is the only predecessor of the second, to a single node
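The two transformations can be applied repeatedly to test whether a flow graph reduces to a single node. The Python sketch below is a straightforward, unoptimised illustration; the two graphs and the name t1_t2_reduce are assumptions for the example, with TWO_ENTRY_LOOP standing for a loop that can be entered at either of its two nodes.

```python
CFG = {
    "entry": ["B1"], "B1": ["B2", "B3"], "B2": ["exit"], "B3": ["B4"],
    "B4": ["B5", "B6"], "B5": ["exit"], "B6": ["B4"], "exit": [],
}
# Hypothetical irreducible graph: the loop a <-> b has two entries.
TWO_ENTRY_LOOP = {"entry": ["a", "b"], "a": ["b"], "b": ["a"]}

def t1_t2_reduce(succ, entry):
    """Apply T1 and T2 until neither applies; return the surviving nodes."""
    g = {n: set(ts) for n, ts in succ.items()}  # private copy
    changed = True
    while changed:
        changed = False
        # T1: remove self loops.
        for n in g:
            if n in g[n]:
                g[n].discard(n)
                changed = True
        # T2: merge a node into its only predecessor.
        for n in list(g):
            if n == entry:
                continue
            preds = [m for m in g if n in g[m]]
            if len(preds) == 1:
                p = preds[0]
                g[p].discard(n)
                g[p] |= g.pop(n)  # p inherits the successors of n
                changed = True
                break             # predecessor lists are stale, start over
    return set(g)

print(t1_t2_reduce(CFG, "entry"))
print(t1_t2_reduce(TWO_ENTRY_LOOP, "entry"))
```

A graph that ends with only its entry node left is reducible by T1 and T2; the two-entry loop gets stuck with three nodes because neither loop node has a unique predecessor.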

Structural Analysis
- Identify “common” structures in the control flow graph (even irreducible ones)
- Reduce the CFG into “simple regions”
- Shift some data flow analysis from compile time to compiler-generation time
- Can be implemented efficiently via DFS

Examples of acyclic regions used in structural analysis

Examples of cyclic regions used in structural analysis

Structural analysis of a flow graph
