CMPUT680 - Fall 2006 Topic A: Data Dependence in Loops José Nelson Amaral



Slide 2: Reading
- Wolfe, Michael, High Performance Compilers for Parallel Computing, Addison-Wesley, 1996. Chapter 5.
- Randy Allen, Ken Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, Morgan Kaufmann, 2001. Chapter 2.

Slide 3: Basic Concept and Motivation
A loop-carried data dependence occurs when a memory access in iteration i of a loop cannot be performed before an access in some earlier iteration i-k has completed. There is a data dependence between an access a at iteration i-k and an access b at iteration i if:
- a and b access the same memory location;
- there is a path from a to b;
- either a or b is a write.

Slide 4: Three Types of Data Dependence
Flow dependence: a write to X followed by a read of X.
    X = ...
    ... = X
Anti-dependence: a read of X followed by a write to X.
    ... = X
    X = ...
Output dependence: a write to X followed by another write to X.
    X = ...
    X = ...

Slide 5: Data Dependence
Example 1:
    S1: A = 0
    S2: B = A
    S3: A = B + 1
    S4: C = A
S2 is flow dependent on S1, written S1 δ^f S2 (also drawn as an edge S1 → S2 in the dependence graph). S1 is the source and S2 is the target of the dependence. (Wolfe, p. 138)

Slide 6: Data Dependence
Example 1 (continued):
    S1: A = 0
    S2: B = A
    S3: A = B + 1
    S4: C = A
S2 δ^f S3: S3 is flow dependent on S2.
S1 δ^0 S3: S3 is output dependent on S1.
S2 δ^-1 S3: S3 is anti-dependent on S2.

Slide 7: Parameterized Dependences
DO I = 1, N
    S1: A(I+1) = A(I) + B(I)
ENDDO
"Statement S1 depends upon itself."
DO I = 1, N
    S1: A(I+2) = A(I) + B(I)
ENDDO
"Statement S1 depends on an instance of itself two iterations previous."
We need to be able to describe such dependences formally. (Allen-Kennedy, p. 39)

Slide 8: Loop Normalization
Given a loop of the form:
    DO I = L, U STEP S
      ...
    ENDDO
the normalized number of an iteration k can be obtained from:
    Normalized(k) = (k - L + S) / S
Example:
    DO I = 5, 26 STEP 3
      ...
    ENDDO
Iteration space: 5, 8, 11, 14, 17, 20, 23, 26
Normalized iteration space: 1, 2, 3, 4, 5, 6, 7, 8
(Allen-Kennedy, p. 39)
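To make the formula concrete, here is a minimal Python sketch (the function name normalized is mine, not from the slides) that reproduces the example above:

    # Loop normalization: map an iteration value k of "DO I = L, U STEP S"
    # to its normalized number 1..N. Assumes S != 0 and k is a valid
    # iteration value of the loop.
    def normalized(k, L, S):
        # Normalized(k) = (k - L + S) / S, as defined on the slide above.
        return (k - L + S) // S

    # Example from the slide: DO I = 5, 26 STEP 3
    iters = list(range(5, 27, 3))                # [5, 8, 11, 14, 17, 20, 23, 26]
    print([normalized(k, 5, 3) for k in iters])  # [1, 2, 3, 4, 5, 6, 7, 8]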

Slide 9: Data Dependences
Loop carried: between two statement instances in two different iterations of a loop.
Loop independent: between two statement instances in the same loop iteration.
Lexically forward: the source comes before the target.
Lexically backward: otherwise.
The right-hand side of an assignment is considered to precede the left-hand side.

Slide 10: Review of Linear Algebra: Lexicographic Order
Two n-vectors a and b are equal, a = b, if a_i = b_i for 1 ≤ i ≤ n.
We say that a is less than b, a < b, if a_i < b_i for 1 ≤ i ≤ n.
We say that a is lexicographically less than b at level j, a «_j b, if a_i = b_i for 1 ≤ i < j and a_j < b_j.
We say that a is lexicographically less than b, a « b, if there is a j, 1 ≤ j ≤ n, such that a «_j b.
(Wolfe, p. 86)

Slide 11: Lexicographic Order
[Figure: example vectors illustrating lexicographic order; not recoverable from the transcript.]
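Since the example figure is lost, a small Python sketch of slide 10's definitions can stand in for it (the function names are my own):

    # Lexicographic comparison of two equal-length vectors,
    # following the definitions on slide 10.
    def lex_level(a, b):
        # Return the level j (1-based) at which a « b holds, or None.
        for j, (x, y) in enumerate(zip(a, b), start=1):
            if x < y:
                return j       # a_i = b_i for all i < j, and a_j < b_j
            if x > y:
                return None    # first difference goes the wrong way
        return None            # the vectors are equal

    def lex_less(a, b):
        # a « b iff a «_j b at some level j.
        return lex_level(a, b) is not None

    print(lex_level([1, 2, 3], [1, 3, 0]))  # 2: a « b at level 2
    print(lex_less([0, 0, 1], [0, 0, 1]))   # False: equal vectors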

Slide 12: Properties of Lexicographic Order
Let n ≥ 1, and let i, j, k denote arbitrary vectors in R^n.
1. For each u in 1 ≤ u ≤ n, the relation «_u in R^n is irreflexive and transitive.
2. The n relations «_u are pairwise disjoint: i «_u j and i «_v j imply that u = v.
3. If i ≠ j, there is a unique integer u with 1 ≤ u ≤ n such that exactly one of the following two conditions holds: i «_u j or j «_u i.
4. i «_u j and j «_v k together imply that i «_w k, where w = min(u, v).

Slide 13: Data Dependence in Loops
An Example: Find the dependence relations due to the array X in the program below:
(S1) for i = 2 to 9 do
(S2)   X[i] = Y[i] + Z[i]
(S3)   A[i] = X[i-1] + 1
(S4) end for
Solution: To find the data dependence relations in a simple loop, we can unroll the loop and see which statement instances depend on which others:
        i = 2               i = 3               i = 4
(S2)    X[2] = Y[2]+Z[2]    X[3] = Y[3]+Z[3]    X[4] = Y[4]+Z[4]
(S3)    A[2] = X[1]+1       A[3] = X[2]+1       A[4] = X[3]+1
(Wolfe, p. 140)

Slide 14: Data Dependence in Loops
(S1) for i = 2 to 9 do
(S2)   X[i] = Y[i] + Z[i]
(S3)   A[i] = X[i-1] + 1
(S4) end for
        i = 2               i = 3               i = 4
(S2)    X[2] = Y[2]+Z[2]    X[3] = Y[3]+Z[3]    X[4] = Y[4]+Z[4]
(S3)    A[2] = X[1]+1       A[3] = X[2]+1       A[4] = X[3]+1
There is a loop-carried, lexically forward, flow dependence from S2 to S3. In the data dependence graph for the statements in the loop, the edge S2 → S3 is labeled (1,3), meaning the iteration distance is 1 and the latency is 3.
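Here is a small Python sketch of the unrolling argument (variable names are mine): record, for each element of X, the iteration that wrote it, then check every read of X[i-1] against that record.

    # Unroll "(S2) X[i] = ...; (S3) ... = X[i-1]" for i = 2..9 and
    # recover the loop-carried flow dependence S2 -> S3.
    writer = {}                    # X subscript -> iteration that wrote it
    deps = []
    for i in range(2, 10):
        writer[i] = i              # S2 writes X[i] in iteration i
        src = writer.get(i - 1)    # S3 reads X[i-1] in the same iteration
        if src is not None:
            deps.append((src, i, i - src))  # (source iter, target iter, distance)
    print(deps[:3])   # [(2, 3, 1), (3, 4, 1), (4, 5, 1)]: distance 1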

Slide 15: Iteration Space (an informal introduction)
Iteration space and iteration-space dependence graph.
Example: Show the iteration space dependence graph for the loop in our example.
Solution: [Figure: iteration space dependence graph, one node per iteration, with edges between dependent iterations; not recoverable from the transcript.] We need an abstraction for this.

Slide 16: Iteration Space (an informal introduction)
(S1) for i = 3 to 9 do
(S2)   X[i] = Y[i] + Z[i]
(S3)   A[i] = X[i-2] + 1
(S4)   B[i] = A[i-1] + 2
(S5) end for
Dependences: S2 → S3 (through X) and S3 → S4 (through A).
Iteration Vector: a vector formed by the values of the index variable used to access an array in the loop. For each dependence, there is an iteration vector for the source and one for the target:
i2^S(X) = [3; 4; 5; 6; 7; 8; 9]
i3^T(X) = [1; 2; 3; 4; 5; 6; 7]
i3^S(A) = [3; 4; 5; 6; 7; 8; 9]
i4^T(A) = [2; 3; 4; 5; 6; 7; 8]

Slide 17: Iteration Space (an informal introduction)
(S1) for i = 3 to 9 do
(S2)   X[i] = Y[i] + Z[i]
(S3)   A[i] = X[i-2] + 1
(S4)   B[i] = A[i-1] + 2
(S5) end for
Distance Vector: a vector formed by the difference between the iteration vectors of the target and the source of a dependence.
i2^S(X) = [3; 4; 5; 6; 7; 8; 9]    i3^T(X) = [1; 2; 3; 4; 5; 6; 7]
i3^S(A) = [3; 4; 5; 6; 7; 8; 9]    i4^T(A) = [2; 3; 4; 5; 6; 7; 8]
d(X) = i3^T(X) - i2^S(X) = [-2; -2; -2; -2; -2; -2; -2]
d(A) = i4^T(A) - i3^S(A) = [-1; -1; -1; -1; -1; -1; -1]
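The informal computation above can be reproduced mechanically; a small Python sketch (variable names are mine):

    # Build the subscript vectors for each access and subtract
    # (target minus source), as on slide 17.
    I = range(3, 10)                       # i = 3..9
    iS2_X = [i for i in I]                 # S2 writes X[i]
    iT3_X = [i - 2 for i in I]             # S3 reads  X[i-2]
    iS3_A = [i for i in I]                 # S3 writes A[i]
    iT4_A = [i - 1 for i in I]             # S4 reads  A[i-1]
    print([t - s for s, t in zip(iS2_X, iT3_X)])  # [-2, -2, -2, -2, -2, -2, -2]
    print([t - s for s, t in zip(iS3_A, iT4_A)])  # [-1, -1, -1, -1, -1, -1, -1]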

Slide 18: Iteration Space (an informal introduction)
(S1) for i = 3 to 9 do
(S2)   X[i] = Y[i] + Z[i]
(S3)   A[i] = X[i-2] + 1
(S4)   B[i] = A[i-1] + 2
(S5) end for
Direction Vector: contains only information about the direction of the dependence, with no iteration distance information.
dir(X) = [<; <; <; <; <; <; <]
dir(A) = [<; <; <; <; <; <; <]
The elements of a direction vector are <, =, and >. Other authors use +, 0, and -.
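Note that dir(X) is "<" even though the informal d(X) above is negative: the direction records that the source iteration precedes the target iteration, not the sign of the subscript difference. A small Python sketch (my own formulation) makes this explicit; only the five reads whose source lies inside the loop yield an entry:

    # For each element of X, compare the iteration of the write (source)
    # with the iteration of the read (target): '<' means the source
    # iteration precedes the target iteration.
    write_iter = {i: i for i in range(3, 10)}   # X[i] written at iteration i
    dirs = []
    for i in range(3, 10):                      # X[i-2] read at iteration i
        w = write_iter.get(i - 2)
        if w is not None:
            dirs.append('<' if w < i else '=' if w == i else '>')
    print(dirs)   # ['<', '<', '<', '<', '<']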

Slide 19: Iteration Space (an informal introduction)
- Each element of the direction vector can be stored in two bits.
- Given a distance vector, we can compute the direction vector, but not vice versa.

Slide 20: Iteration Space (an informal introduction)
Example: Show the index variable iteration vectors and normalized iteration vectors for the iterations in the loop below:
(1) for i = 2 to 6 do
(2)   for j = 6 to 2 by -2 do
(3)     A[i, j] = A[i, j+2] + 1
(4)   end for
(5) end for
Solution: Since there are two nested loops, the iteration space has two dimensions.

Slide 21: Iteration Space (an informal introduction)
(1) for i = 2 to 6 do
(2)   for j = 6 to 2 by -2 do
(3)     A[i, j] = A[i, j+2] + 1
(4)   end for
(5) end for
[Figure: iteration space dependence graph over the i and j axes, corresponding to the index variable iteration vectors; within each column i, edges run from (i, 6) to (i, 4) and from (i, 4) to (i, 2).]
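A small Python sketch (my own) that enumerates this iteration space and its dependence edges:

    # Walk the loop nest of slide 20 in execution order, recording
    # which iteration point wrote each element of A; a read of an
    # element written earlier yields a dependence edge.
    write_at = {}                           # (i, j) element -> iteration point
    edges = []
    for i in range(2, 7):                   # i = 2..6
        for j in range(6, 1, -2):           # j = 6, 4, 2
            src = write_at.get((i, j + 2))  # read of A[i, j+2]
            if src is not None:
                edges.append((src, (i, j)))
            write_at[(i, j)] = (i, j)       # write of A[i, j]
    print(edges[:2])  # [((2, 6), (2, 4)), ((2, 4), (2, 2))]: carried by the j loop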

Slide 22: Distance/Direction Vectors
It is often convenient to deal with incompletely specified direction vectors.
Example 1: {(0, 0, 0, 1), (0, -1, 0, 1), (0, 0, 1, 1), (0, -1, 1, 1)} ==> {(0, ≤0, ≥0, 1)}
Example 2: {(0, -1, 0, -1), (0, 0, 0, -1), (0, 1, 0, -1)} ==> {(0, *, 0, -1)}
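A sketch of the collapsing rule in Python (the encoding of '<=0', '>=0', and '*' is my own):

    # Collapse a set of direction/distance vectors into one incompletely
    # specified vector: per position, a single value stays; {0, -1}
    # becomes '<=0'; {0, 1} becomes '>=0'; both signs become '*'.
    def collapse(vectors):
        out = []
        for vals in zip(*vectors):
            s = set(vals)
            if len(s) == 1:
                out.append(str(s.pop()))
            elif s <= {0, -1}:
                out.append('<=0')
            elif s <= {0, 1}:
                out.append('>=0')
            else:
                out.append('*')
        return tuple(out)

    print(collapse([(0, 0, 0, 1), (0, -1, 0, 1), (0, 0, 1, 1), (0, -1, 1, 1)]))
    # ('0', '<=0', '>=0', '1')
    print(collapse([(0, -1, 0, -1), (0, 0, 0, -1), (0, 1, 0, -1)]))
    # ('0', '*', '0', '-1')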

Slide 23: Distance/Direction Vectors
Let a, b denote two vectors in R^n and s their direction vector. Then a « b if and only if s has one of the following forms:
(1, *, *, …, *)
(0, 1, *, …, *)
(0, 0, 1, *, …, *)
…
(0, 0, …, 0, 1)
More precisely, a «_u b, for u in 1 ≤ u ≤ n, if and only if s has the form with a leading 1 after (u - 1) zeros.
Notation: (0, 1, -1) corresponds to (=, <, >).

Slide 24: An Example
do i = 3, 100
  S:  A[2i] = B[i] + 2
  T:  C[i] = D[i] + 2 * A[2i+1] + A[2i-4] + A[i]
done
What are the dependences and the dependence distance vectors in the example above?

Slide 25: An Example
do i = 3, 100
  S:   A[2i] = B[i] + 2
  T1:  TEMP1 = D[i] + 2 * A[2i+1]
  T2:  TEMP2 = TEMP1 + A[2i-4]
  T3:  C[i] = TEMP2 + A[i]
done
iS(A)  = [6; 8; 10; 12; 14; 16; …; 198; 200]
iT1(A) = [7; 9; 11; 13; 15; 17; …; 199; 201]
iT2(A) = [2; 4; 6; 8; 10; 12; …; 194; 196]
iT3(A) = [3; 4; 5; 6; 7; 8; …; 99; 100]

Slide 26: An Example
do i = 3, 100
  S:   A[2i] = B[i] + 2
  T1:  TEMP1 = D[i] + 2 * A[2i+1]
  T2:  TEMP2 = TEMP1 + A[2i-4]
  T3:  C[i] = TEMP2 + A[i]
done
iS(A)  = [6; 8; 10; 12; 14; 16; …; 198; 200]
iT1(A) = [7; 9; 11; 13; 15; 17; …; 199; 201]
d(T1,S) = iT1(A) - iS(A) = [1; 1; …; 1]
T1 is flow dependent on S with dependence distance 1.

Slide 27: An Example
do i = 3, 100
  S:   A[2i] = B[i] + 2
  T1:  TEMP1 = D[i] + 2 * A[2i+1]
  T2:  TEMP2 = TEMP1 + A[2i-4]
  T3:  C[i] = TEMP2 + A[i]
done
iS(A)  = [6; 8; 10; 12; 14; 16; …; 198; 200]
iT2(A) = [2; 4; 6; 8; 10; 12; …; 194; 196]
d(T2,S) = iT2(A) - iS(A) = [-4; -4; …; -4]
T2 is flow dependent on S with dependence distance -4.

Slide 28: An Example
do i = 3, 100
  S:   A[2i] = B[i] + 2
  T1:  TEMP1 = D[i] + 2 * A[2i+1]
  T2:  TEMP2 = TEMP1 + A[2i-4]
  T3:  C[i] = TEMP2 + A[i]
done
iS(A)  = [6; 8; 10; 12; 14; 16; …; 198; 200]
iT3(A) = [3; 4; 5; 6; 7; 8; …; 99; 100]
d(T3,S) = iT3(A) - iS(A) = [-3; -4; -5; …; -100]
T3 is flow dependent on S with dependence distance (i - 2i) = -i, which is not constant.
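The three subtractions on slides 26 through 28 can be checked with a few lines of Python (variable names are mine):

    # Subscript vectors for each access to A in the loop of slide 25,
    # and their elementwise differences (target minus source).
    I = list(range(3, 101))
    iS  = [2 * i for i in I]        # S  writes A[2i]
    iT1 = [2 * i + 1 for i in I]    # T1 reads  A[2i+1]
    iT2 = [2 * i - 4 for i in I]    # T2 reads  A[2i-4]
    iT3 = [i for i in I]            # T3 reads  A[i]
    print({t - s for s, t in zip(iS, iT1)})       # {1}: constant distance 1
    print({t - s for s, t in zip(iS, iT2)})       # {-4}: constant distance -4
    print(len({t - s for s, t in zip(iS, iT3)}))  # 98: distance -i, not constant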

Slide 29: Wolfe's Definition
From Michael Wolfe, p. 140:
"An anti-dependence from a statement to itself is considered lexically forward":
Sk: x[i] = x[i+1] + 1
"A dependence is lexically forward when the source comes before the target without passing through a loop back edge":
x[1] = x[2] + 1
x[2] = x[3] + 1
x[3] = x[4] + 1
(back edge)

Slide 30: Wolfe's Definition
From Michael Wolfe, p. 140:
"A self-flow dependence is lexically backward":
Sk: x[i] = x[i-1] + 1
x[1] = x[0] + 1
x[2] = x[1] + 1
x[3] = x[2] + 1
(back edge)

Slide 31: Allen-Kennedy Definition
From Allen-Kennedy, p. 45: "Suppose that there is a dependence from statement S1 on iteration i of a loop nest of n loops and statement S2 on iteration j; then the dependence distance vector d(i,j) is defined as a vector of length n such that
    d(i,j)_k = j_k - i_k."

Slide 32: Allen-Kennedy Definition
From Allen-Kennedy, p. 46: "Suppose that there is a dependence from statement S1 on iteration i of a loop nest of n loops and statement S2 on iteration j; then the dependence direction vector D(i,j) is defined as a vector of length n such that
    D(i,j)_k = "<" if d(i,j)_k > 0,
               "=" if d(i,j)_k = 0,
               ">" if d(i,j)_k < 0."
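A direct Python transcription of the two definitions (function names are mine):

    # Allen-Kennedy distance and direction vectors for a dependence from
    # source iteration i to target iteration j in an n-deep loop nest.
    def distance_vector(i, j):
        # d(i,j)_k = j_k - i_k
        return tuple(jk - ik for ik, jk in zip(i, j))

    def direction_vector(i, j):
        # '<' if d_k > 0, '=' if d_k == 0, '>' if d_k < 0
        return tuple('<' if d > 0 else '=' if d == 0 else '>'
                     for d in distance_vector(i, j))

    # e.g. source iteration (1, 3) and target iteration (2, 2) in a 2-deep nest:
    print(distance_vector((1, 3), (2, 2)))   # (1, -1)
    print(direction_vector((1, 3), (2, 2)))  # ('<', '>')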

Slide 33: Allen-Kennedy Definition
From Allen-Kennedy, p. 50:
"Statement S2 has a loop-carried dependence on statement S1 if and only if S1 references location M on iteration i, S2 references M on iteration j, and d(i,j) > 0 (that is, D(i,j) contains a "<" as its leftmost non-"=" component)."
"A loop-carried dependence from statement S1 to statement S2 is said to be backward if S2 appears before S1 in the loop body, or if S1 and S2 are the same statement. The carried dependence is said to be forward if S2 appears after S1 in the loop body."