
1 Advanced Compiler Techniques: Loops. LIU Xianhua, School of EECS, Peking University

2 Content  Concepts: dominators, depth-first ordering, back edges, graph depth, reducibility. Natural loops. Efficiency of iterative algorithms. Dependences & loop transformation.

3 Loops are Important!  Loops dominate program execution time and need special treatment during optimization. Loops also affect the running time of program analyses; e.g., a dataflow problem can be solved in just a single pass if a program has no loops.

4 Dominators  Node d dominates node n if every path from the entry to n goes through d, written d dom n. Quick observations: every node dominates itself; the entry dominates every node. Common cases: the test of a while loop dominates all blocks in the loop body; the test of an if-then-else dominates all blocks in either branch.

5 Dominator Tree  Immediate dominance: d idom n means d dom n, d ≠ n, and there is no other m such that d dom m and m dom n. Immediate dominance relationships form a tree. (diagram: a flow graph on nodes 1-5 and its dominator tree)

6 Finding Dominators  A dataflow analysis problem: for each node, find all of its dominators. Direction: forward. Confluence: set intersection. Boundary: OUT[Entry] = {Entry}. Initialization: OUT[B] = all nodes. Equations: OUT[B] = IN[B] ∪ {B}; IN[B] = ∩ OUT[p] over all predecessors p of B.
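
A minimal Python sketch of this iterative scheme, assuming the flow graph is given as a successor map succ (every node mapped to a possibly empty list of successors, all nodes reachable from entry); the names are illustrative, not from the slides:

    def dominators(succ, entry):
        # Iterative dominator computation: OUT[B] = IN[B] U {B},
        # IN[B] = intersection of OUT[p] over all predecessors p of B.
        nodes = set(succ)
        pred = {n: set() for n in nodes}
        for n, ss in succ.items():
            for s in ss:
                pred[s].add(n)
        dom = {n: set(nodes) for n in nodes}     # initialization: all nodes
        dom[entry] = {entry}                     # boundary condition
        changed = True
        while changed:
            changed = False
            for n in nodes - {entry}:
                new = (set.intersection(*(dom[p] for p in pred[n])) | {n}) if pred[n] else {n}
                if new != dom[n]:
                    dom[n] = new
                    changed = True
        return dom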

7 Example: Dominators  (diagram: for the running flow graph on nodes 1-5, the dominator sets are {1}, {1,2}, {1,2,3}, {1,4}, {1,5})

8 Depth-First Search  Start at the entry. If you can follow an edge to an unvisited node, do so. If not, backtrack to your parent (the node from which you were visited).

9 Depth-First Spanning Tree  Root = entry. Tree edges are the edges along which we first visit the node at the head. (diagram: a DFST over nodes 1-5)

10 Depth-First Node Order  The reverse of the order in which a DFS retreats from the nodes: 1-4-5-2-3. Equivalently, the reverse of a postorder traversal of the tree (postorder: 3-2-5-4-1).
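
A small Python sketch of computing this order by DFS and reversing the postorder; succ and entry are as in the earlier sketch:

    def df_order(succ, entry):
        # Depth-first (reverse postorder) numbering: a node is appended to the
        # postorder list when the DFS retreats from it; DF order is the reverse.
        visited, postorder = set(), []
        def dfs(n):
            visited.add(n)
            for s in succ.get(n, []):
                if s not in visited:
                    dfs(s)              # tree edge: first visit of s
            postorder.append(n)         # retreat from n
        dfs(entry)
        return list(reversed(postorder))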

11 Four Kinds of Edges  1. Tree edges. 2. Advancing edges (node to proper descendant). 3. Retreating edges (node to ancestor, including edges to self). 4. Cross edges (between two nodes, neither of which is an ancestor of the other).

12 A Little Magic  Of these edges, only retreating edges go from high to low in DF order. Proof sketch: you must retreat from the head of a tree edge before you can retreat from its tail. Also surprising: all cross edges go right to left in the DFST, assuming we add children of any node from the left.

13 Example: Non-Tree Edges  (diagram: the running graph on nodes 1-5 with its retreating, forward, and cross edges labeled)

14 Back Edges  An edge is a back edge if its head dominates its tail. Theorem: every back edge is a retreating edge in every DFST of every flow graph. The converse is almost always true, but not always. Reason: the head of a back edge is reached before the tail in any DFST, and the search must reach the tail before retreating from the head, so the tail is a descendant of the head.
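
Given the dominator sets from the sketch above, identifying back edges is a one-liner; this is illustrative Python, not code from the slides:

    def back_edges(succ, dom):
        # An edge t -> h is a back edge when its head h dominates its tail t,
        # i.e. h is in dom[t].
        return [(t, h) for t, hs in succ.items() for h in hs if h in dom[t]]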

15 Example: Back Edges  (diagram: the running graph on nodes 1-5 with dominator sets {1}, {1,2}, {1,2,3}, {1,4}, {1,5}; the back edges are highlighted)

16 Reducible Flow Graphs  A flow graph is reducible if every retreating edge in any DFST for that flow graph is a back edge. Testing reducibility: remove all back edges from the flow graph and check that the result is acyclic. Hint why it works: all cycles must include some retreating edge in every DFST, in particular the edge that enters the first node of the cycle that is visited.
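
A sketch of this test in Python, reusing the dominators and back_edges sketches above (same assumptions on succ and entry):

    def is_reducible(succ, entry):
        # Remove all back edges, then check that the remaining graph is acyclic.
        back = set(back_edges(succ, dominators(succ, entry)))
        fwd = {n: [s for s in ss if (n, s) not in back] for n, ss in succ.items()}
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {n: WHITE for n in fwd}
        def has_cycle(n):                # DFS: reaching a GRAY node closes a cycle
            color[n] = GRAY
            for s in fwd[n]:
                if color[s] == GRAY or (color[s] == WHITE and has_cycle(s)):
                    return True
            color[n] = BLACK
            return False
        return not any(color[n] == WHITE and has_cycle(n) for n in fwd)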

17 DFST on a Cycle  (diagram: depth-first search reaches one node of the cycle first; the search must reach the other cycle nodes before leaving the cycle, so the edge back into the first-visited node is a retreating edge)

18 Why Reducibility?  Folk theorem: all flow graphs in practice are reducible. Fact: if you use only while-loops, for-loops, repeat-loops, if-then(-else), break, and continue, then your flow graph is reducible.

19 Example: Remove Back Edges  (diagram: the running graph on nodes 1-5 with its back edges removed; the remaining graph is acyclic)

20 Example: Nonreducible Graph  (diagram: nodes A, B, C, where B and C form a cycle that can be entered from A at either node). In any DFST, one of the cycle's edges will be a retreating edge, but no head dominates its tail, so deleting back edges leaves the cycle.

21 Why Care About Back/Retreating Edges?  1. Proper ordering of nodes during an iterative algorithm ensures the number of passes is limited by the number of "nested" back edges. 2. The depth of nested loops upper-bounds the number of nested back edges.

22 DF Order and Retreating Edges  Suppose that for an RD analysis we visit nodes during each iteration in DF order. The fact that a definition d reaches a block will propagate in one pass along any increasing sequence of blocks. When d arrives at the tail of a retreating edge, it is too late to propagate d from OUT to IN: the IN at the head has already been computed for that round.

23 Example: DF Order  (diagram: definition d is generated by node 2; the first pass propagates d along the increasing paths, the second pass carries it across the retreating edge)

24 Depth of a Flow Graph  The depth of a flow graph with a given DFST and DF order is the greatest number of retreating edges along any acyclic path. For RD, if we use DF order to visit nodes, we converge in depth + 2 passes: depth + 1 passes to follow that many increasing segments, plus 1 more pass to realize we have converged.
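
A rough Python sketch of iterative reaching definitions visited in DF order, with a pass counter to observe the depth + 2 bound; gen and kill are assumed to be per-block sets of definition IDs, and df_order is the sketch above:

    def reaching_definitions(succ, entry, gen, kill):
        order = df_order(succ, entry)            # visit blocks in DF order
        pred = {n: [] for n in order}
        for n in order:
            for s in succ.get(n, []):
                pred[s].append(n)
        IN = {n: set() for n in order}
        OUT = {n: set() for n in order}
        passes, changed = 0, True
        while changed:                           # the last pass only confirms convergence
            changed, passes = False, passes + 1
            for n in order:
                IN[n] = set().union(*(OUT[p] for p in pred[n]))
                new_out = gen[n] | (IN[n] - kill[n])
                if new_out != OUT[n]:
                    OUT[n], changed = new_out, True
        return IN, OUT, passes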

25 Example: Depth = 2  An acyclic path 1->4->7 ---> 3->10->17 ---> 6->18->20, where each ---> is a retreating edge: the three increasing segments are covered in Pass 1, Pass 2, and Pass 3 respectively, and the two retreating edges give depth = 2.

26 Similarly...  AE (available expressions) also works in depth + 2 passes: unavailability propagates along retreat-free node sequences in one pass. So does LV (live variables) if we use the reverse of DF order: a use propagates backward along paths that do not use a retreating edge in one pass.

27 In General...  The depth + 2 bound works for any monotone framework, as long as information only needs to propagate along acyclic paths. Example: if a definition reaches a point, it does so along an acyclic path.

28 However...  Constant propagation does not have this property. Example: L: a = b; b = c; c = 1; goto L. The constant 1 must travel around the loop to reach b and then a, so the useful information propagates along a cyclic path.

29 Why Depth+2 is Good  Normal control-flow constructs produce reducible flow graphs with the number of back edges at most the nesting depth of the loops. Nesting depth tends to be small: a study by Knuth found the average depth of typical flow graphs to be about 2.75.

30 Example: Nested Loops  3 nested while-loops: depth = 3. 3 nested repeat-loops: depth = 1.

31 Natural Loops  A natural loop is defined by: a single entry point called the header (the header dominates all nodes in the loop), and a back edge that enters the loop header. Otherwise it is not possible for the flow of control to return to the header directly from the "loop", i.e., there really is no loop.

32 Find Natural Loops  The natural loop of a back edge a -> b is {b} plus the set of nodes that can reach a without going through b: remove b from the flow graph and find all nodes that can reach a (by following predecessors). Theorem: two natural loops are either disjoint, identical, or nested.
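
A short Python sketch of this backward walk, assuming a predecessor map pred (node to list of predecessors), which could be built from succ as in the earlier sketches; t -> h is the back edge:

    def natural_loop(pred, t, h):
        # {h} plus every node that reaches t without passing through h:
        # walk predecessors backward from t, never stepping past the header h.
        loop, stack = {h}, []
        def insert(n):
            if n not in loop:
                loop.add(n)
                stack.append(n)
        insert(t)
        while stack:
            for p in pred.get(stack.pop(), []):
                insert(p)
        return loop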

33 Example: Natural Loops  (diagram: in the running graph on nodes 1-5, the natural loop of back edge 3 -> 2 and the natural loop of back edge 5 -> 1)

34 Relationship between Loops  If two loops do not have the same header, they are either disjoint or one is entirely contained in (nested within) the other; the innermost loop is one that contains no other loop. If two loops share the same header, it is hard to tell which is the inner loop, so they are combined as one. (diagram: two loops on nodes 1-4 sharing a header)

35 Basic Parallelism  Examples: FOR i = 1 TO 100: a[i] = b[i] + c[i]; FOR i = 11 TO 20: a[i] = a[i-1] + 3; FOR i = 11 TO 20: a[i] = a[i-10] + 3. Does there exist a data dependence edge between two different iterations? A data dependence edge is loop-carried if it crosses iteration boundaries. DoAll loops: loops without loop-carried dependences.

36 Data Dependence of Variables  True dependence: a = ...; ... = a (write then read). Anti-dependence: ... = a; a = ... (read then write). Output dependence: a = ...; a = ... (write then write). Input dependence: ... = a; ... = a (read then read).

37 Affine Array Accesses  Common patterns of data accesses (i, j, k are loop indexes): A[i], A[j], A[i-1], A[0], A[i+j], A[2*i], A[2*i+1], A[i, j], A[i-1, j+1]. Array indexes are affine expressions of the surrounding loop indexes. Loop indexes: i_n, i_{n-1}, ..., i_1; integer constants: c_n, c_{n-1}, ..., c_0; array index: c_n*i_n + c_{n-1}*i_{n-1} + ... + c_1*i_1 + c_0. An affine expression is a linear expression plus a constant term (c_0).

38 Formulating Data Dependence Analysis  FOR i := 2 to 5 do A[i-2] = A[i] + 1. Between the read access A[i] and the write access A[i-2] there is a dependence if there exist two iterations i_r and i_w within the loop bounds that read and write the same array element: ∃ integers i_w, i_r with 2 ≤ i_w, i_r ≤ 5 and i_r = i_w - 2. Between the write access A[i-2] and the write access A[i-2] there is a dependence if: ∃ integers i_w, i_v with 2 ≤ i_w, i_v ≤ 5 and i_w - 2 = i_v - 2. To rule out the case where the same instance depends on itself, add the constraint i_w ≠ i_v.
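
Because this example's iteration space is tiny, the dependence can also be confirmed by brute force; the following Python lines are only an illustration of the constraint system above:

    # Enumerate iterations of "FOR i := 2 to 5 do A[i-2] = A[i] + 1" and find
    # pairs where the element written in iteration iw is read in iteration ir.
    deps = [(iw, ir) for iw in range(2, 6) for ir in range(2, 6) if iw - 2 == ir]
    print(deps)   # [(4, 2), (5, 3)]: e.g. iteration 4 writes A[2], which iteration 2 reads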

39 Memory Disambiguation  Undecidable at compile time: read(n); for i = ...: a[i] = a[n].

40 Domain of Data Dependence Analysis  Only use loop bounds and array indexes that are affine functions of loop variables: for i = 1 to n, for j = 2i to 100: a[i+2j+3][4i+2j][i*i] = ...; ... = a[1][2i+1][j]. Assume a data dependence between the read and write operation if there exist integers i_r, j_r, i_w, j_w with 1 ≤ i_w, i_r ≤ n, 2i_w ≤ j_w ≤ 100, 2i_r ≤ j_r ≤ 100, i_w + 2j_w + 3 = 1, and 4i_w + 2j_w = 2i_r + 1. Equate each dimension of the array access; ignore non-affine ones. No solution → no data dependence. Solution → there may be a dependence.
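
For a concrete (hypothetical) bound such as n = 10, the system can be checked by brute force, illustrating the "no solution, hence no dependence" outcome; the snippet is only a sketch of the constraints above:

    # Test the affine constraints from the slide for n = 10 (an assumed bound);
    # the non-affine third dimension (i*i vs. j) is ignored, as the slide says.
    n = 10
    solutions = [(iw, jw, ir, jr)
                 for iw in range(1, n + 1) for jw in range(2 * iw, 101)
                 for ir in range(1, n + 1) for jr in range(2 * ir, 101)
                 if iw + 2 * jw + 3 == 1 and 4 * iw + 2 * jw == 2 * ir + 1]
    print(solutions)   # [] -> no integer solution, so no data dependence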

41 Iteration Space  An abstraction for loops: each iteration is represented as coordinates in the iteration space. for i = 0, 5: for j = 0, 3: a[i, j] = 3. (diagram: the grid of (i, j) iteration points)

42 Iteration Space  An abstraction for loops. for i = 0, 5: for j = i, 3: a[i, j] = 0. (diagram: the corresponding (i, j) iteration points)

43 Iteration Space  An abstraction for loops. for i = 0, 5: for j = i, 7: a[i, j] = 0. (diagram: the corresponding (i, j) iteration points)

44 Affine Access

45 Affine Transform  (diagram: an affine transform mapping the i-j iteration space to the u-v iteration space)

46 Loop Transformation  for i = 1, 100: for j = 1, 200: A[i, j] = A[i, j] + 3  is transformed into  for u = 1, 200: for v = 1, 100: A[v, u] = A[v, u] + 3

47 Old Iteration Space  for i = 1, 100: for j = 1, 200: A[i, j] = A[i, j] + 3  (diagram: the 100 x 200 i-j iteration space)

48 New Iteration Space  for u = 1, 200: for v = 1, 100: A[v, u] = A[v, u] + 3  (diagram: the 200 x 100 u-v iteration space)

49 Old Array Accesses  for i = 1, 100: for j = 1, 200: A[i, j] = A[i, j] + 3  (diagram: the array elements touched by A[i, j])

50 New Array Accesses  for u = 1, 200: for v = 1, 100: A[v, u] = A[v, u] + 3  (diagram: the array elements touched by A[v, u])

51 Interchange Loops?  for i = 2, 1000: for j = 1, 1000: A[i, j] = A[i-1, j+1] + 3; e.g. dependence vector d_old = (1, -1). After interchange: for u = 1, 1000: for v = 2, 1000: A[v, u] = A[v-1, u+1] + 3. (diagram: the dependence in the i-j iteration space)

52 Interchange Loops?  A transformation is legal if the new dependence is lexicographically positive, i.e. the leading non-zero entry in the dependence vector is positive. Distance vector: (1, -1) = (4, 2) - (3, 3). Loop interchange is not legal if there exists a dependence (+, -).
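
A tiny Python sketch of this legality test; dependence vectors are given as tuples, and the helper names are illustrative:

    def lexicographically_positive(d):
        # Legal dependence: the leading non-zero component is positive.
        for c in d:
            if c != 0:
                return c > 0
        return True          # all-zero vector: dependence within one iteration

    def interchange_is_legal(dep_vectors):
        # Interchange swaps the two components of every dependence vector;
        # it is legal iff each swapped vector stays lexicographically positive.
        return all(lexicographically_positive((d[1], d[0])) for d in dep_vectors)

    print(interchange_is_legal([(1, -1)]))   # False: a (+, -) dependence forbids interchange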

53 GCD Test  for i = 1, 100: a[2*i] = ...; ... = a[2*i+1] + 3. Is there any dependence? Solve a linear Diophantine equation: 2*i_w = 2*i_r + 1.

54 GCD  The greatest common divisor (GCD) of integers a_1, a_2, ..., a_n, denoted gcd(a_1, a_2, ..., a_n), is the largest integer that evenly divides all of these integers. Theorem: the linear Diophantine equation a_1*x_1 + a_2*x_2 + ... + a_n*x_n = c has an integer solution x_1, x_2, ..., x_n iff gcd(a_1, a_2, ..., a_n) divides c.
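
A short Python sketch of the test applied to the example from the previous slide; gcd_test is an illustrative helper, not an API from any particular tool:

    from math import gcd
    from functools import reduce

    def gcd_test(coeffs, c):
        # a1*x1 + ... + an*xn = c has an integer solution iff gcd(a1,...,an) divides c
        # (assumes at least one coefficient is non-zero).
        return c % reduce(gcd, (abs(a) for a in coeffs)) == 0

    # Slide 53: 2*iw = 2*ir + 1, i.e. 2*iw - 2*ir = 1; gcd(2, -2) = 2 does not divide 1.
    print(gcd_test([2, -2], 1))    # False -> no dependence between a[2*i] and a[2*i+1]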

55 Examples  Example 1: gcd(2, -2) = 2, which does not divide 1 (the equation from the GCD test above), so there are no solutions. Example 2: gcd(24, 36, 54) = 6, so an equation with these coefficients has (many) integer solutions whenever 6 divides its right-hand side.

56 Loop Fusion  for i = 1, 1000: A[i] = B[i] + 3 followed by for j = 1, 1000: C[j] = A[j] + 5  fuses into  for i = 1, 1000: A[i] = B[i] + 3; C[i] = A[i] + 5. Better reuse of A[i] between the two statements.

57 Loop Distribution  for i = 1, 1000: A[i] = A[i-1] + 3; C[i] = B[i] + 5  distributes into  for i = 1, 1000: A[i] = A[i-1] + 3  and  for i = 1, 1000: C[i] = B[i] + 5. The 2nd loop is parallel.

58 Register Blocking  for j = 1, 2*m: for i = 1, 2*n: A[i, j] = A[i-1, j] + A[i-1, j-1]  is register-blocked (unrolled 2x2) into  for j = 1, 2*m, 2: for i = 1, 2*n, 2: A[i, j] = A[i-1, j] + A[i-1, j-1]; A[i, j+1] = A[i-1, j+1] + A[i-1, j]; A[i+1, j] = A[i, j] + A[i, j-1]; A[i+1, j+1] = A[i, j+1] + A[i, j]. Better reuse of the A[i, j] values among the unrolled statements.

59 Virtual Register Allocation  for j = 1, 2*M, 2: for i = 1, 2*N, 2: r1 = A[i-1, j]; r2 = r1 + A[i-1, j-1]; A[i, j] = r2; r3 = A[i-1, j+1] + r1; A[i, j+1] = r3; A[i+1, j] = r2 + A[i, j-1]; A[i+1, j+1] = r3 + r2. Memory operations are reduced to register loads/stores: 8MN loads become 4MN loads.

60 Scalar Replacement  for i = 2, N+1: ... = A[i-1] + 1; A[i] = ...  becomes  t1 = A[1]; for i = 2, N+1: ... = t1 + 1; t1 = ...; A[i] = t1. Eliminates loads and stores for array references.

61 Unroll-and-Jam  for j = 1, 2*M: for i = 1, N: A[i, j] = A[i-1, j] + A[i-1, j-1]  becomes  for j = 1, 2*M, 2: for i = 1, N: A[i, j] = A[i-1, j] + A[i-1, j-1]; A[i, j+1] = A[i-1, j+1] + A[i-1, j]. Exposes more opportunity for scalar replacement.

62 Large Arrays  for i = 1, 1000: for j = 1, 1000: A[i, j] = A[i, j] + B[j, i]. Suppose arrays A and B have row-major layout: B has poor cache locality, and loop interchange will not help.

63 Loop Blocking  for v = 1, 1000, 20: for u = 1, 1000, 20: for j = v, v+19: for i = u, u+19: A[i, j] = A[i, j] + B[j, i]. Access to small blocks of the arrays has good cache locality.

64 Loop Unrolling for ILP  for i = 1, 10: a[i] = b[i]; *p = ...  unrolls into  for i = 1, 10, 2: a[i] = b[i]; *p = ...; a[i+1] = b[i+1]; *p = ... Larger scheduling regions, fewer dynamic branches, increased code size.

65 Next Time  Homework: 9.6.2, 9.6.4, 9.6.7. Static Single Assignment (SSA). Readings: Cytron '91, Chow '97.

