CMPUT680 - Fall 2006 Topic A: Data Dependence in Loops José Nelson Amaral
CMPUT Compiler Design and Optimization

Reading

Michael Wolfe, High Performance Compilers for Parallel Computing, Addison-Wesley, 1996. Chapter 5.
Randy Allen and Ken Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-based Approach, Morgan Kaufmann, 200. Chapter 2.

Basic Concept and Motivation

A loop-carried data dependence occurs when a memory access in iteration i of a loop cannot be performed until an access in some earlier iteration i-k has been performed.

- There is a data dependence between an access a at iteration i-k and an access b at iteration i if:
  - a and b access the same memory location;
  - there is a path from a to b;
  - either a or b is a write.

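Three Types of Data Dependence

Flow dependence, $\delta$ (a write followed by a read):
    X = ...
    ... = X

Anti-dependence, $\delta^{-1}$ (a read followed by a write):
    ... = X
    X = ...

Output dependence, $\delta^{0}$ (a write followed by a write):
    X = ...
    X = ...
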
Data Dependence

Example 1:
    S1: A = 0
    S2: B = A
    S3: A = B + 1
    S4: C = A

S2 is flow dependent on S1, written $S_1 \, \delta^f \, S_2$ or simply $S_1 \, \delta \, S_2$.
S1 is the source and S2 is the target of the dependence. (Wolfe, p. 138)

Data Dependence

Example 1:
    S1: A = 0
    S2: B = A
    S3: A = B + 1
    S4: C = A

$S_2 \, \delta \, S_3$: S3 is flow-dependent on S2.
$S_1 \, \delta^{0} \, S_3$: S3 is output-dependent on S1.
$S_2 \, \delta^{-1} \, S_3$: S3 is anti-dependent on S2.

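The dependences of Example 1 can be recovered mechanically. The sketch below is only an illustration: the function name `find_dependences` and the `(label, written, reads)` encoding are assumptions made here, not notation from Wolfe. It reports a dependence only against the most recent write of a variable, so a value that has been overwritten no longer produces a flow dependence.

```python
def find_dependences(statements):
    """Return a set of (source, target, kind) for straight-line code.

    `statements` is a list of (label, written_variable, read_variables).
    """
    last_writer = {}          # variable -> label of the most recent write
    readers_since_write = {}  # variable -> labels that read it since that write
    deps = set()
    for label, written, reads in statements:
        # The right-hand side is evaluated first, so handle the reads first.
        for var in reads:
            if var in last_writer:
                deps.add((last_writer[var], label, "flow"))
        # The write conflicts with earlier reads and with the earlier write.
        for reader in readers_since_write.get(written, []):
            deps.add((reader, label, "anti"))
        if written in last_writer:
            deps.add((last_writer[written], label, "output"))
        last_writer[written] = label
        readers_since_write[written] = []
        for var in reads:
            if var != written:
                readers_since_write.setdefault(var, []).append(label)
    return deps


example_1 = [
    ("S1", "A", []),       # A = 0
    ("S2", "B", ["A"]),    # B = A
    ("S3", "A", ["B"]),    # A = B + 1
    ("S4", "C", ["A"]),    # C = A
]

for dep in sorted(find_dependences(example_1)):
    print(dep)
```

For Example 1 this lists the three dependences named on the slide plus the flow dependences from S1 to S2 and from S3 to S4.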
Parameterized Dependences

    DO I = 1, N
      S1: A(I+1) = A(I) + B(I)
    ENDDO

“Statement S1 depends upon itself.”

    DO I = 1, N
      S1: A(I+2) = A(I) + B(I)
    ENDDO

“Statement S1 depends on an instance of itself two iterations previous.”

We need to be able to describe such dependences formally. (Allen-Kennedy, p. 39)

Loop Normalization

Given a loop of the form:
    DO I = L, U STEP S
      …
    ENDDO
the normalized value of an iteration k can be obtained from:
    Normalized(k) = (k - L + S) / S

Example:
    DO I = 5, 26 STEP 3
      …
    ENDDO
Iteration space: 5, 8, 11, …, 26
Normalized iteration space: 1, 2, 3, …, 8
(Allen-Kennedy, p. 39)

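A small sketch of the normalization formula applied to the example loop. The function name `normalized` and its argument names are chosen here for illustration; the check that k is actually an iteration of the loop is an addition, not part of the formula on the slide.

```python
def normalized(k, lower, step):
    """Map index value k of `DO I = lower, upper STEP step` to 1, 2, 3, ..."""
    if (k - lower) % step != 0:
        raise ValueError(f"{k} is not an iteration of this loop")
    return (k - lower + step) // step


# DO I = 5, 26 STEP 3
iterations = list(range(5, 26 + 1, 3))
print(iterations)                                  # [5, 8, 11, 14, 17, 20, 23, 26]
print([normalized(k, 5, 3) for k in iterations])   # [1, 2, 3, 4, 5, 6, 7, 8]
```

The same formula works for a negative step, for example `normalized(2, 6, -2)` for a loop that counts down from 6 by 2.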
Data Dependences

Loop carried: between two statement instances in two different iterations of a loop.
Loop independent: between two statement instances in the same loop iteration.
Lexically forward: the source comes before the target.
Lexically backward: otherwise.
The right-hand side of an assignment is considered to precede the left-hand side.

Review of Linear Algebra: Lexicographic Order

Two n-vectors a and b are equal, $a = b$, if $a_i = b_i$ for $1 \le i \le n$.
We say that a is less than b, $a < b$, if $a_i < b_i$ for $1 \le i \le n$.
We say that a is lexicographically less than b at level j, $a \ll_j b$, if $a_i = b_i$ for $1 \le i < j$ and $a_j < b_j$.
We say that a is lexicographically less than b, $a \ll b$, if there is a j, $1 \le j \le n$, such that $a \ll_j b$.
(Wolfe, p. 86)

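The level-j definition translates directly into a scan from the left. This is a minimal sketch; `lex_level` and `lex_less` are names assumed here for illustration.

```python
def lex_level(a, b):
    """Return the level j (1-based) with a <<_j b, or None if there is none."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    for j, (x, y) in enumerate(zip(a, b), start=1):
        if x < y:
            return j      # all earlier components were equal
        if x > y:
            return None   # b is lexicographically less than a
    return None           # a == b


def lex_less(a, b):
    """True when a << b at some level."""
    return lex_level(a, b) is not None


print(lex_level((1, 2, 3), (1, 4, 0)))   # 2
print(lex_less((1, 2, 3), (1, 2, 3)))    # False
print(lex_less((2, 0, 0), (1, 9, 9)))    # False
```

Note that (1, 2, 3) is lexicographically less than (1, 4, 0) even though it is not less than it in the component-wise sense.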
Lexicographic Order: Example of vectors

Properties of Lexicographic Order

Let $n \ge 1$, and let i, j, k denote arbitrary vectors in $R^n$.
1. For each u in $1 \le u \le n$, the relation $\ll_u$ in $R^n$ is irreflexive and transitive.
2. The n relations $\ll_u$ are pairwise disjoint: $i \ll_u j$ and $i \ll_v j$ imply that $u = v$.
3. If $i \ne j$, there is a unique integer u such that $1 \le u \le n$ and exactly one of the following two conditions holds: $i \ll_u j$ or $j \ll_u i$.
4. $i \ll_u j$ and $j \ll_v k$ together imply that $i \ll_w k$, where $w = \min(u, v)$.

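The properties can be spot-checked by exhaustive search over a small set of integer vectors. This is a sanity check, not a proof, and the sample set (components in {-1, 0, 1}, n = 3) is an arbitrary choice made here. Property 2 holds by construction, because the helper returns at most one level.

```python
from itertools import product


def lex_level(a, b):
    """Level u (1-based) with a <<_u b, or None."""
    for u, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return u if x < y else None
    return None


n = 3
vectors = list(product(range(-1, 2), repeat=n))   # 27 integer sample points

for i in vectors:
    # Property 1 (irreflexive): i is never less than itself at any level.
    assert lex_level(i, i) is None
    for j in vectors:
        u, u_rev = lex_level(i, j), lex_level(j, i)
        # Property 3: for i != j exactly one of i <<_u j, j <<_u i holds.
        assert (i == j) == (u is None and u_rev is None)
        assert u is None or u_rev is None
        for k in vectors:
            v = lex_level(j, k)
            # Property 4, which gives transitivity when u == v.
            if u is not None and v is not None:
                assert lex_level(i, k) == min(u, v)

print("all properties hold on", len(vectors), "sample vectors")
```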
Data Dependence in Loops: An Example

Find the dependence relations due to the array X in the program below:
    (S1) for i = 2 to 9 do
    (S2)   X[i] = Y[i] + Z[i]
    (S3)   A[i] = X[i-1] + 1
    (S4) end for

Solution: to find the data dependence relations in a simple loop, we can unroll the loop and see which statement instances depend on which others:

           i = 2                 i = 3                 i = 4
    (S2)   X[2] = Y[2] + Z[2]    X[3] = Y[3] + Z[3]    X[4] = Y[4] + Z[4]
    (S3)   A[2] = X[1] + 1       A[3] = X[2] + 1       A[4] = X[3] + 1
(Wolfe, p. 140)

Data Dependence in Loops

    (S1) for i = 2 to 9 do
    (S2)   X[i] = Y[i] + Z[i]
    (S3)   A[i] = X[i-1] + 1
    (S4) end for

           i = 2                 i = 3                 i = 4
    (S2)   X[2] = Y[2] + Z[2]    X[3] = Y[3] + Z[3]    X[4] = Y[4] + Z[4]
    (S3)   A[2] = X[1] + 1       A[3] = X[2] + 1       A[4] = X[3] + 1

There is a loop-carried, lexically forward, flow dependence from S2 to S3.

Data dependence graph for statements in a loop: an edge from S2 to S3 labeled (1,3), where (1,3) means that the iteration distance is 1 and the latency is 3.

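Unrolling by hand can be mimicked by simulating the loop and recording which iteration wrote each element. The sketch below assumes a loop body with one write followed by one read of the same array, each with a subscript of the form i + offset; the function and parameter names are hypothetical.

```python
def flow_dependences(lower, upper, write_offset, read_offset):
    """Flow dependences for a loop body of the form

        S2: X[i + write_offset] = ...
        S3: ...                 = X[i + read_offset]

    Returns (source_iteration, target_iteration, distance) triples.
    """
    written_at = {}   # array element -> iteration whose S2 wrote it
    deps = []
    for i in range(lower, upper + 1):
        written_at[i + write_offset] = i           # S2 runs first
        source = written_at.get(i + read_offset)   # S3 then reads
        if source is not None:
            deps.append((source, i, i - source))
    return deps


deps = flow_dependences(2, 9, write_offset=0, read_offset=-1)
for source, target, distance in deps:
    print(f"S2 at i={source} -> S3 at i={target}, distance {distance}")
```

Every reported pair has distance 1, which matches the first component of the (1,3) edge label. The latency is a property of the target machine and is not modeled here.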
Iteration Space (an informal introduction)

- Iteration space and iteration-space dependence graph

Example: show the iteration space dependence graph for the loop in our example.
Solution: [Figure: iteration space dependence graph]
We need an abstraction for this.

Iteration Space (an informal introduction)

    (S1) for i = 3 to 9 do
    (S2)   X[i] = Y[i] + Z[i]
    (S3)   A[i] = X[i-2] + 1
    (S4)   B[i] = A[i-1] + 2
    (S5) end for

[Figure: dependence graph over S2, S3, and S4]

Iteration Vector: a vector formed by the values of the index variable used to access an array in the loop.
For each dependence, there is an iteration vector for the source and one for the target:

    $i_2^S(X)$ = [3; 4; 5; 6; 7; 8; 9]
    $i_3^T(X)$ = [1; 2; 3; 4; 5; 6; 7]
    $i_3^S(A)$ = [3; 4; 5; 6; 7; 8; 9]
    $i_4^T(A)$ = [2; 3; 4; 5; 6; 7; 8]

Iteration Space (an informal introduction)

    (S1) for i = 3 to 9 do
    (S2)   X[i] = Y[i] + Z[i]
    (S3)   A[i] = X[i-2] + 1
    (S4)   B[i] = A[i-1] + 2
    (S5) end for

[Figure: dependence graph over S2, S3, and S4]

Distance Vector: a vector formed by the difference between the iteration vectors of the source and the target of a dependence.

    $i_2^S(X)$ = [3; 4; 5; 6; 7; 8; 9]
    $i_3^T(X)$ = [1; 2; 3; 4; 5; 6; 7]
    $d(X) = i_3^T(X) - i_2^S(X)$ = [-2; -2; -2; -2; -2; -2; -2]

    $i_3^S(A)$ = [3; 4; 5; 6; 7; 8; 9]
    $i_4^T(A)$ = [2; 3; 4; 5; 6; 7; 8]
    $d(A) = i_4^T(A) - i_3^S(A)$ = [-1; -1; -1; -1; -1; -1; -1]

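The vectors on this slide can be reproduced in a few lines. The sketch follows the slide's informal convention (target subscripts minus source subscripts within the same iteration); `subscript_vector` and `distance_vector` are names introduced here for illustration.

```python
def subscript_vector(lower, upper, offset):
    """Subscripts touched by a reference A[i + offset] for i = lower..upper."""
    return [i + offset for i in range(lower, upper + 1)]


def distance_vector(source, target):
    """Element-wise target - source, following the slide's convention."""
    return [t - s for s, t in zip(source, target)]


# for i = 3 to 9:  X[i] = ...;  A[i] = X[i-2] + 1;  B[i] = A[i-1] + 2
x_source = subscript_vector(3, 9, 0)     # S2 writes X[i]
x_target = subscript_vector(3, 9, -2)    # S3 reads  X[i-2]
a_source = subscript_vector(3, 9, 0)     # S3 writes A[i]
a_target = subscript_vector(3, 9, -1)    # S4 reads  A[i-1]

print(distance_vector(x_source, x_target))   # [-2, -2, -2, -2, -2, -2, -2]
print(distance_vector(a_source, a_target))   # [-1, -1, -1, -1, -1, -1, -1]
```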
Iteration Space (an informal introduction)

    (S1) for i = 3 to 9 do
    (S2)   X[i] = Y[i] + Z[i]
    (S3)   A[i] = X[i-2] + 1
    (S4)   B[i] = A[i-1] + 2
    (S5) end for

[Figure: dependence graph over S2, S3, and S4]

Direction Vector: contains only information about the direction of the dependence, with no iteration distance information.

    dir(X) = [<; <; <; <; <; <; <]
    dir(A) = [<; <; <; <; <; <; <]

The elements of a direction vector are <, >, and =. Other authors use +, -, 0.

Iteration Space (an informal introduction)

- Each element of the direction vector can be stored in two bits.
- Given a distance vector, we can compute the direction vector, but not vice versa.

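Three symbols fit in two bits, so a whole direction vector packs into one machine word. The particular bit codes below are an assumption made for this sketch; the slides do not prescribe an encoding.

```python
CODES = {"=": 0b00, "<": 0b01, ">": 0b10}   # assumed encoding, 0b11 unused
SYMBOLS = {code: symbol for symbol, code in CODES.items()}


def pack(direction):
    """Pack a direction vector such as ('=', '<', '>') into one integer."""
    word = 0
    for symbol in direction:
        word = (word << 2) | CODES[symbol]
    return word


def unpack(word, length):
    """Recover a direction vector of `length` elements from `pack` output."""
    symbols = []
    for _ in range(length):
        symbols.append(SYMBOLS[word & 0b11])
        word >>= 2
    return tuple(reversed(symbols))


packed = pack(("=", "<", ">"))
print(bin(packed))          # 0b110
print(unpack(packed, 3))    # ('=', '<', '>')
```

The length has to be stored separately (in practice it is the depth of the loop nest), because leading '=' elements encode as zero bits.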
Iteration Space (an informal introduction)

Example: show the index variable iteration vectors and normalized iteration vectors for the iterations in the loop below:
    (1) for i = 2 to 6 do
    (2)   for j = 6 to 2 by -2 do
    (3)     A[i, j] = A[i, j+2] + 1
    (4)   end for
    (5) end for

Solution: since there are two nested loops, the iteration space has two dimensions.

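Both kinds of iteration vector can be enumerated by running the loop headers. This sketch reuses the normalization formula from the loop-normalization slide; the list names are chosen here for illustration.

```python
def normalized(k, lower, step):
    """Normalized(k) = (k - L + S) / S from the loop-normalization slide."""
    return (k - lower + step) // step


index_vectors = []        # (i, j) as the index variables take their values
normalized_vectors = []   # the same iterations, counted 1, 2, 3, ...
for i in range(2, 6 + 1):            # for i = 2 to 6
    for j in range(6, 2 - 1, -2):    # for j = 6 to 2 by -2
        index_vectors.append((i, j))
        normalized_vectors.append((normalized(i, 2, 1), normalized(j, 6, -2)))

print(index_vectors[:4])        # [(2, 6), (2, 4), (2, 2), (3, 6)]
print(normalized_vectors[:4])   # [(1, 1), (1, 2), (1, 3), (2, 1)]
print(len(index_vectors))       # 15
```

Because j counts down, the index vectors are not in lexicographic order as the loop executes, whereas the normalized vectors are. That is one reason to normalize before reasoning about distances.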
Iteration Space (an informal introduction)

    (1) for i = 2 to 6 do
    (2)   for j = 6 to 2 by -2 do
    (3)     A[i, j] = A[i, j+2] + 1
    (4)   end for
    (5) end for

[Figure: iteration space dependence graph corresponding to the index variable iteration vectors, with axes i and j]

Distance/Direction Vectors

- It is often convenient to deal with incompletely specified direction vectors.

Example 1: {(0, 0, 0, 1), (0, -1, 0, 1), (0, 0, 1, 1), (0, -1, 1, 1)} ==> {(0, ≤0, ≥0, 1)}
Example 2: {(0, -1, 0, -1), (0, 0, 0, -1), (0, 1, 0, -1)} ==> {(0, *, 0, -1)}

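One way to build an incompletely specified vector is to collect, position by position, the set of values that occur. The sketch below is an illustration under that assumption; the ASCII spellings `<=0`, `>=0`, `!=0` and the function name `summarize` are choices made here.

```python
def summarize(vectors):
    """Merge direction vectors (components -1, 0, 1) into one summary vector."""
    names = {
        frozenset({-1}): "-1",
        frozenset({0}): "0",
        frozenset({1}): "1",
        frozenset({-1, 0}): "<=0",
        frozenset({0, 1}): ">=0",
        frozenset({-1, 1}): "!=0",
        frozenset({-1, 0, 1}): "*",
    }
    columns = zip(*vectors)   # values seen in each position
    return tuple(names[frozenset(column)] for column in columns)


example_1 = [(0, 0, 0, 1), (0, -1, 0, 1), (0, 0, 1, 1), (0, -1, 1, 1)]
example_2 = [(0, -1, 0, -1), (0, 0, 0, -1), (0, 1, 0, -1)]

print(summarize(example_1))   # ('0', '<=0', '>=0', '1')
print(summarize(example_2))   # ('0', '*', '0', '-1')
```

A summary of this kind stands for every combination of the per-position values, so it can describe more vectors than were in the original set. In both examples here the set is exactly that full combination.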
Distance/Direction Vectors

- Let a, b denote two vectors in $R^n$ and s their direction vector. Then $a \ll b$ if and only if s has one of the following forms:
    (1, *, *, …, *)
    (0, 1, *, …, *)
    (0, 0, 1, *, …, *)
    (0, 0, …, 0, 1)
  More precisely, $a \ll_u b$ for u in $1 \le u \le n$ if and only if s has the form with a leading 1 after (u - 1) zeros.
- Notation: (0, 1, -1) or (=, >, <)

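The equivalence can be checked exhaustively on a small grid. The sketch assumes that the direction vector of a and b is the component-wise sign of b - a, which the slide does not spell out, and compares against Python's built-in tuple ordering, which is lexicographic.

```python
from itertools import product


def sign(x):
    return (x > 0) - (x < 0)


def direction(a, b):
    """Component-wise sign of b - a (assumed meaning of 'their direction vector')."""
    return tuple(sign(y - x) for x, y in zip(a, b))


def leading_one_level(s):
    """Return u if s is (0, ..., 0, 1, *, ..., *) with u - 1 zeros, else None."""
    for u, component in enumerate(s, start=1):
        if component != 0:
            return u if component == 1 else None
    return None


points = list(product(range(3), repeat=3))
for a in points:
    for b in points:
        # Python compares tuples lexicographically, so a < b means a << b.
        assert (a < b) == (leading_one_level(direction(a, b)) is not None)

print(direction((1, 2, 3), (1, 4, 0)))                      # (0, 1, -1)
print(leading_one_level(direction((1, 2, 3), (1, 4, 0))))   # 2
```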
An Example

    do i = 3, 100
      S: A[2i] = B[i] + 2
      T: C[i] = D[i] + 2*A[2i+1] + A[2i-4] + A[i]
    done

What are the dependences and the dependence distance vectors in the example above?

An Example

Statement T is split so that each statement contains one reference to A:

    do i = 3, 100
      S:  A[2i] = B[i] + 2
      T1: TEMP1 = D[i] + 2*A[2i+1]
      T2: TEMP2 = TEMP1 + A[2i-4]
      T3: C[i] = TEMP2 + A[i]
    done

    $i_S(A)$    = [6; 8; 10; 12; 14; 16; …; 198; 200]
    $i_{T1}(A)$ = [7; 9; 11; 13; 15; 17; …; 199; 201]
    $i_{T2}(A)$ = [2; 4; 6; 8; 10; 12; …; 194; 196]
    $i_{T3}(A)$ = [3; 4; 5; 6; 7; 8; …; 99; 100]

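An Example

    do i = 3, 100
      S:  A[2i] = B[i] + 2
      T1: TEMP1 = D[i] + 2*A[2i+1]
      T2: TEMP2 = TEMP1 + A[2i-4]
      T3: C[i] = TEMP2 + A[i]
    done

    $i_S(A)$    = [6; 8; 10; 12; 14; 16; …; 198; 200]
    $i_{T1}(A)$ = [7; 9; 11; 13; 15; 17; …; 199; 201]
    $d(T1, S) = i_{T1}(A) - i_S(A)$ = 1

The subscripts of the two references differ by 1 in every iteration. However, S writes only even elements of A and T1 reads only odd elements, so the two references never access the same memory location, and by the definition of data dependence there is no dependence between S and T1.
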
An Example

    do i = 3, 100
      S:  A[2i] = B[i] + 2
      T1: TEMP1 = D[i] + 2*A[2i+1]
      T2: TEMP2 = TEMP1 + A[2i-4]
      T3: C[i] = TEMP2 + A[i]
    done

    $i_S(A)$    = [6; 8; 10; 12; 14; 16; …; 198; 200]
    $i_{T2}(A)$ = [2; 4; 6; 8; 10; 12; …; 194; 196]
    $d(T2, S) = i_{T2}(A) - i_S(A)$ = -4

T2 is flow dependent on S. The subscript difference is -4; because the subscript 2i advances by 2 per iteration, the element read by T2 in iteration i was written by S in iteration i-2.

An Example

    do i = 3, 100
      S:  A[2i] = B[i] + 2
      T1: TEMP1 = D[i] + 2*A[2i+1]
      T2: TEMP2 = TEMP1 + A[2i-4]
      T3: C[i] = TEMP2 + A[i]
    done

    $i_S(A)$    = [6; 8; 10; 12; 14; 16; …; 198; 200]
    $i_{T3}(A)$ = [3; 4; 5; 6; 7; 8; …; 99; 100]
    $d(T3, S) = i_{T3}(A) - i_S(A)$ = (i - 2i) = -i

T3 is flow dependent on S. The subscript difference, -i, changes from iteration to iteration, so this dependence has no constant distance: for even i ≥ 6, the element A[i] read in iteration i was written by S in iteration i/2.

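The vectors above compare subscripts within one iteration. A brute-force sketch can instead measure, for each read, how many iterations earlier S wrote the same element (read iteration minus write iteration, the convention of the Allen-Kennedy definition). The function name and the lambda encoding of each subscript are assumptions of this sketch, and it looks only for flow dependences from S.

```python
def iteration_distances(read_subscript, lower=3, upper=100):
    """Distances (read iteration - write iteration) of flow dependences from
    S: A[2i] = ... to a later read A[read_subscript(i)] in the same loop."""
    written_at = {}     # element of A -> iteration in which S wrote it
    distances = set()
    for i in range(lower, upper + 1):
        written_at[2 * i] = i                 # S executes first in the body
        source = written_at.get(read_subscript(i))
        if source is not None:
            distances.add(i - source)
    return distances


print(iteration_distances(lambda i: 2 * i + 1))   # T1: set()
print(iteration_distances(lambda i: 2 * i - 4))   # T2: {2}
t3 = iteration_distances(lambda i: i)             # T3
print(min(t3), max(t3), len(t3))                  # 3 50 48
```

Under this measure T1 never reads an element written by S, T2 always reads the element written two iterations earlier, and T3 has a different distance for each even i from 6 to 100.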
Wolfe’s Definition

From Wolfe, p. 140:

“An anti-dependence from a statement to itself is considered lexically forward”:
    Sk: x[i] = x[i+1] + 1

“A dependence is lexically forward when the source comes before the target without passing through a loop back edge”:
    x[1] = x[2] + 1
    x[2] = x[3] + 1
    x[3] = x[4] + 1
    (back edge)

Wolfe’s Definition

From Wolfe, p. 140:

“A self-flow dependence is lexically backward”:
    Sk: x[i] = x[i-1] + 1

    x[1] = x[0] + 1
    x[2] = x[1] + 1
    x[3] = x[2] + 1
    (back edge)

Allen-Kennedy Definition

From Allen-Kennedy, p. 45:

“Suppose that there is a dependence from statement S1 on iteration i of a loop nest of n loops and statement S2 on iteration j; then the dependence distance vector d(i,j) is defined as a vector of length n such that:

    $d(i,j)_k = j_k - i_k$”

Allen-Kennedy Definition

From Allen-Kennedy, p. 46:

“Suppose that there is a dependence from statement S1 on iteration i of a loop nest of n loops and statement S2 on iteration j; then the dependence direction vector D(i,j) is defined as a vector of length n such that:

    $D(i,j)_k$ = “<” if $d(i,j)_k > 0$
    $D(i,j)_k$ = “=” if $d(i,j)_k = 0$
    $D(i,j)_k$ = “>” if $d(i,j)_k < 0$”

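The Allen-Kennedy definitions are component-wise: the distance is $d(i,j)_k = j_k - i_k$, and the direction is "<", "=", or ">" according to whether that component is positive, zero, or negative. A minimal sketch, with hypothetical iteration vectors chosen only to exercise each case:

```python
def distance(i, j):
    """d(i, j)_k = j_k - i_k for source iteration i and target iteration j."""
    return tuple(jk - ik for ik, jk in zip(i, j))


def direction(i, j):
    """D(i, j)_k is '<', '=' or '>' as d(i, j)_k is positive, zero or negative."""
    return tuple("<" if d > 0 else "=" if d == 0 else ">" for d in distance(i, j))


source, target = (2, 5, 1), (2, 7, 0)   # hypothetical iterations of a 3-deep nest
print(distance(source, target))          # (0, 2, -1)
print(direction(source, target))         # ('=', '<', '>')
# Different distances can share one direction vector, so the mapping
# from distance to direction cannot be inverted.
print(direction((2, 5, 1), (2, 9, -3)))  # ('=', '<', '>')
```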
Allen-Kennedy Definition

From Allen-Kennedy, p. 50:

“Statement S2 has a loop-carried dependence on statement S1 if and only if S1 references location M on iteration i, S2 references M on iteration j, and d(i,j) > 0 (that is, D(i,j) contains a “<” as its leftmost non-“=” component).”

“A loop-carried dependence from statement S1 to statement S2 is said to be backward if S2 appears before S1 in the loop body or if S1 and S2 are the same statement. The carried dependence is said to be forward if S2 appears after S1 in the loop body.”

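Both tests on this slide reduce to simple scans. In the sketch below, returning 0 for a loop-independent dependence and None for an invalid direction vector are conventions chosen here, and statement positions are assumed to be given as integers in textual order within the loop body.

```python
def carried_level(direction):
    """Return the 1-based loop level that carries the dependence, 0 if the
    dependence is loop independent (all '='), or None if the vector is not
    a valid dependence direction (leftmost non-'=' component is '>')."""
    for level, symbol in enumerate(direction, start=1):
        if symbol == "<":
            return level
        if symbol == ">":
            return None
    return 0


def classify_carried(source_position, target_position):
    """Backward if the target appears before the source in the loop body or
    is the same statement, forward if it appears after."""
    return "forward" if target_position > source_position else "backward"


print(carried_level(("=", "<", ">")))   # 2
print(carried_level(("=", "=", "=")))   # 0
print(carried_level((">", "<")))        # None
print(classify_carried(1, 2))           # forward
print(classify_carried(2, 2))           # backward
```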