Parallelization, Compilation and Platforms 5LIM0 Loop parallelization Dependence checking Henk Corporaal March 2016 19-11-2018
Theory and Practice of Dependence Testing Data and control dependences Scalar data dependences True-, anti-, and output-dependences Loop dependences Loop parallelism Dependence distance and direction vectors Loop-carried dependences Loop-independent dependences Loop dependence and loop iteration reordering Dependence tests Slides largely based on Van Engelen, FSUniv. See also: https://en.wikipedia.org/wiki/Loop_dependence_analysis
Data vs Control Dependences Data dependence: data is produced and consumed in the correct order Control dependence: a dependence that arises as a result of control flow S1 PI = 3.14 S2 R = 5.0 S3 AREA = PI * R ** 2 S1 IF (T.EQ.0.0) GOTO S3 S2 A = A / T S3 CONTINUE
Data Dependence Definition There is a data dependence from statement S1 to statement S2 iff 1) both S1 and S2 access the same memory location and at least one of them stores into it, and 2) there is a feasible run-time execution path from S1 to S2
Data Dependence Classification
Dependence relations are similar to hardware hazards (which may cause pipeline stalls).
True dependences (also called flow dependences), denoted by S1 δ S2, are the same as RAW (Read After Write) hazards:
    S1  X = …
    S2  … = X
Anti dependences, denoted by S1 δ^-1 S2, are the same as WAR (Write After Read) hazards:
    S1  … = X
    S2  X = …
Output dependences, denoted by S1 δ^o S2, are the same as WAW (Write After Write) hazards:
    S1  X = …
    S2  X = …
Note: anti and output dependences are often called false dependences. Why? Are they real? How can they be avoided?
Compiling for Loop Parallelism Dependence properties are crucial for determining and exploiting loop parallelism Bernstein’s conditions: Iteration I1 does not write into a location that is read by iteration I2 Iteration I2 does not write into a location that is read by iteration I1 Iteration I1 does not write into a location that is written into by iteration I2
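Bernstein's conditions can be checked mechanically once the read and write sets of two iterations are known. The sketch below is only an illustration: the function name `bernstein_independent` and the encoding of memory locations as `(array, index)` tuples are assumptions, not part of the slides.

```python
def bernstein_independent(reads1, writes1, reads2, writes2):
    """Return True if iterations I1 and I2 satisfy all three Bernstein conditions."""
    r1, w1, r2, w2 = set(reads1), set(writes1), set(reads2), set(writes2)
    no_flow = not (w1 & r2)    # I1 does not write what I2 reads
    no_anti = not (w2 & r1)    # I2 does not write what I1 reads
    no_output = not (w1 & w2)  # I1 and I2 do not write the same location
    return no_flow and no_anti and no_output


# A(I+1) = A(I) + B(I), iterations I = 1 and I = 2 (hypothetical encoding)
carried = bernstein_independent(
    reads1={("A", 1), ("B", 1)}, writes1={("A", 2)},
    reads2={("A", 2), ("B", 2)}, writes2={("A", 3)},
)

# A(I) = A(I) + B(I), iterations I = 1 and I = 2
independent = bernstein_independent(
    reads1={("A", 1), ("B", 1)}, writes1={("A", 1)},
    reads2={("A", 2), ("B", 2)}, writes2={("A", 2)},
)
```

Here `carried` is False because iteration 1 writes A(2), which iteration 2 reads, while `independent` is True.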
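Fortran’s PARALLEL DO
The PARALLEL DO (Fortran) / FORALL construct guarantees that there are no scheduling constraints among its iterations, so the programmer must ensure that the iterations are independent.
    PARALLEL DO I = 1, N
      A(I+1) = A(I) + B(I)
    ENDDO
Not OK: iteration I writes A(I+1), which iteration I+1 reads.
    PARALLEL DO I = 1, N
      A(I) = A(I) + B(I)
    ENDDO
OK: every iteration touches only its own elements.
    PARALLEL DO I = 1, N
      A(I-1) = A(I) + B(I)
    ENDDO
Not OK: iteration I writes A(I-1), which iteration I-1 reads.
    PARALLEL DO I = 1, N
      S = S + B(I)
    ENDDO
Not OK: every iteration reads and writes S.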
Loop Dependences While the PARALLEL DO requires the programmer to assure the loop iterations can be executed in parallel, an optimizing compiler must analyze loop dependences for automatic loop parallelization and vectorization Granularity of parallelism Instruction level parallelism (ILP), not loop-specific Synchronous (e.g. SIMD, also vector and SSE) parallelism is fine grain and compiler aims to optimize inner loops Asynchronous (e.g. MIMD, work sharing) parallelism is coarse grain and compiler aims to optimize outer loops
Synchronous Parallel Machines
SIMD model, simpler and cheaper compared to MIMD. Processors operate in lockstep, executing the same instruction on different portions of the data space. The application should exhibit data parallelism. Control flow requires masking operations (similar to predicated instructions), which can be inefficient.
Figure: example 4x4 processor mesh (processors p1,1 to p4,4).
Asynchronous Parallel Computers
Multiprocessors: shared memory is directly accessible by each PE (processing element).
Multicomputers: distributed memory requires message passing between PEs.
SMPs: shared memory in small clusters of processors; message passing is needed across clusters.
Figure: processors p1 to p4 sharing one memory over a bus; processors p1 to p4 with local memories m1 to m4 connected by a bus, x-bar, or network.
Advanced Compiler Technology
Powerful restructuring compilers transform loops into parallel or vector code. They must analyze data dependences in loop nests as well as in straight-line code.
Original loop nest:
    DO I = 1, N
      DO J = 1, N
        C(J,I) = 0.0
        DO K = 1, N
          C(J,I) = C(J,I)+A(J,K)*B(K,I)
        ENDDO
      ENDDO
    ENDDO
Vectorized:
    DO I = 1, N
      DO J = 1, N, 64
        C(J:J+63,I) = 0.0
        DO K = 1, N
          C(J:J+63,I) = C(J:J+63,I)+A(J:J+63,K)*B(K,I)
        ENDDO
      ENDDO
    ENDDO
Parallelized:
    PARALLEL DO I = 1, N
      DO J = 1, N
        C(J,I) = 0.0
        DO K = 1, N
          C(J,I) = C(J,I)+A(J,K)*B(K,I)
        ENDDO
      ENDDO
    ENDDO
Normalized Iteration Space In most situations it is preferable to use normalized iteration spaces For an arbitrary loop with loop index I and loop bounds L and U and stride S, the normalized iteration number is i = (I-L+S)/S DO I = 100, 20, -10 A(I) = B(100-I) + C(I/5) ENDDO is normalized with i = (110-I)/10 DO i = 1, 9 A(110-10*i) = B(10*i-10) + C(22-2*i) ENDDO
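The normalization formula can be checked by enumerating both loops and comparing the subscripts they generate. This is a minimal sketch; the helper name `trip_count` and the tuple layout are assumptions made for the illustration, and Python's `//` matches Fortran integer division here because all operands are positive.

```python
def trip_count(L, U, S):
    """Number of iterations of DO I = L, U, S (assumes the loop runs at least once)."""
    return (U - L + S) // S


# Original loop: DO I = 100, 20, -10 with subscripts A(I), B(100-I), C(I/5)
original = [(I, 100 - I, I // 5) for I in range(100, 19, -10)]

# Normalized loop: DO i = 1, 9 with subscripts A(110-10*i), B(10*i-10), C(22-2*i)
n = trip_count(100, 20, -10)
normalized = [(110 - 10 * i, 10 * i - 10, 22 - 2 * i) for i in range(1, n + 1)]

same_accesses = original == normalized
```

Both loops visit the same array elements in the same order, so `same_accesses` is True.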
Dependence in Loops
In this example loop, statement S1 “depends on itself” from the previous iteration. More precisely, statement S1 in iteration i has a true dependence with S1 in iteration i+1.
    DO I = 1, N
S1    A(I+1) = A(I) + B(I)
    ENDDO
is executed as
S1  A(2) = A(1) + B(1)
S1  A(3) = A(2) + B(2)
S1  A(4) = A(3) + B(3)
S1  A(5) = A(4) + B(4)
    …
Iteration Vector Definition Given a nest of n loops, the iteration vector i of a particular iteration of the innermost loop is a vector of integers i = (i1, i2, …, in) where ik represents the iteration number for the loop at nesting level k The set of all possible iteration vectors is an iteration space
Iteration Vector Example
    DO I = 1, 3
      DO J = 1, I
S1      A(I,J) = A(J,I)
      ENDDO
    ENDDO
The iteration space of the statement at S1 is {(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)}. At iteration i = (2,1) the value of A(1,2) is assigned to A(2,1).
Figure: triangular iteration space with axes I and J.
Iteration Vector Ordering
The iteration vectors are naturally ordered according to a lexicographical order, e.g. iteration (1,1) precedes (2,1) and (2,2) in the example on the previous slide.
Definition, assuming normalized loops: iteration i precedes iteration j, denoted i < j, iff
1) i[1] < j[1], or
2) i[1:k-1] = j[1:k-1] and ik < jk for some k > 1,
i.e. the first k-1 indices are equal and the first non-equal index is smaller in i.
The lexicographical ordering represents the ordering of iterations as defined by the original, untransformed loop, that is, the ordering specified by the programmer.
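The ordering definition translates directly into a comparison on tuples. The sketch below uses an assumed helper name `precedes`; Python's built-in tuple comparison implements the same lexicographical order, which the last line uses as a cross-check.

```python
def precedes(i, j):
    """Return True if iteration vector i lexicographically precedes j."""
    for a, b in zip(i, j):
        if a != b:
            return a < b  # first non-equal index decides
    return False          # equal vectors: neither precedes the other


# Iteration space of DO I = 1, 3 / DO J = 1, I in execution order
space = [(i, j) for i in range(1, 4) for j in range(1, i + 1)]
in_program_order = all(precedes(space[k], space[k + 1]) for k in range(len(space) - 1))
matches_builtin = all(precedes(a, b) == (a < b) for a in space for b in space)
```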
Loop Dependence Definition
There exists a dependence from S1 to S2 in a loop nest iff there exist two iteration vectors i and j such that
1) i < j, or i = j and there is a path from S1 to S2 in the loop body
2) S1 accesses memory location M on iteration i and S2 accesses memory location M on iteration j
3) at least one of these accesses is a write
If both accesses are writes, the dependence is an output (WaW) dependence.
Loop Dependence Example Show that the example loop has no loop dependences Answer: there are no iteration vectors i and j in {(1,1), (2,1), (2,2), (3,1), (3,2), (3,3)} such that i < j and either S1 in i writes to the same element of A that is read at S1 in iteration j, or S1 in iteration i reads an element A that is written in iteration j DO I = 1, 3 DO J = 1, I S1 A(I,J) = A(J,I) ENDDO ENDDO What about WaW?
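For a loop nest this small, the claim can also be confirmed by brute force: enumerate every ordered pair of iterations and compare the elements they touch. The function below is a sketch with assumed names; it covers the true, anti, and output (WaW) cases.

```python
def find_dependences():
    """Enumerate loop-carried dependences of S1: A(I,J) = A(J,I) for I = 1..3, J = 1..I."""
    space = [(i, j) for i in range(1, 4) for j in range(1, i + 1)]
    deps = []
    for src in space:
        for snk in space:
            if not src < snk:                    # tuple '<' is the lexicographical order
                continue
            write_src, read_src = src, (src[1], src[0])
            write_snk, read_snk = snk, (snk[1], snk[0])
            if write_src == read_snk:
                deps.append(("true", src, snk))
            if read_src == write_snk:
                deps.append(("anti", src, snk))
            if write_src == write_snk:
                deps.append(("output", src, snk))
    return deps


dependences = find_dependences()
```

The list comes back empty: because J never exceeds I, the written element A(I,J) lies on or below the diagonal and the read element A(J,I) on or above it, so two different iterations never meet.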
Loop Reordering Transformations Definitions A reordering transformation is any program transformation that changes the execution order of the code, without adding or deleting any statement executions A reordering transformation preserves a dependence if it preserves the relative execution order of the source and sink of that dependence
Fundamental Theorem of Dependence Any reordering transformation that preserves every dependence in a program preserves the meaning of that program
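Valid Transformations
A transformation is said to be valid for the program to which it applies if it preserves all dependences in the program.
Original loop:
    DO I = 1, 3
S1    A(I+1) = B(I)
S2    B(I+1) = A(I)
    ENDDO
Valid (statement interchange; both loop-carried dependences keep their source before their sink):
    DO I = 1, 3
      B(I+1) = A(I)
      A(I+1) = B(I)
    ENDDO
Invalid (loop reversal; each sink now executes before its source):
    DO I = 3, 1, -1
      A(I+1) = B(I)
      B(I+1) = A(I)
    ENDDO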
Dependences and Loop Transformations Loop dependences are tested before a transformation is applied When a dependence test is inconclusive, dependence must be assumed In this example the value of K is unknown and loop dependence is assumed: DO I = 1, N S1 A(I+K) = A(I) + B(I) ENDDO
Dependence Distance Vector Definition Suppose that there is a dependence from S1 on iteration i to S2 on iteration j; then the dependence distance vector d(i, j) is defined as d(i, j) = j - i Example: True dependence between S1 and itself on i = (1,1) and j = (2,1): d(i, j) = (1,0) i = (2,1) and j = (3,1): d(i, j) = (1,0) i = (2,2) and j = (3,2): d(i, j) = (1,0) DO I = 1, 3 DO J = 1, I S1 A(I+1,J) = A(I,J) ENDDO ENDDO
Dependence Direction Vector
Definition: suppose that there is a dependence from S1 on iteration i to S2 on iteration j; then the dependence direction vector D(i, j) is defined as:
    D(i, j)k = “<” if d(i, j)k > 0
               “=” if d(i, j)k = 0
               “>” if d(i, j)k < 0
Watch the flipping of the condition from < to >: a positive distance gives direction “<”.
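Distance and direction vectors are simple element-wise computations on the source and sink iteration vectors. A minimal sketch, with function names chosen for this illustration:

```python
def distance(i, j):
    """Dependence distance vector d(i, j) = j - i."""
    return tuple(b - a for a, b in zip(i, j))


def direction(i, j):
    """Dependence direction vector D(i, j), derived from the distance vector."""
    return tuple("<" if d > 0 else "=" if d == 0 else ">" for d in distance(i, j))


# Source iteration (1,1) and sink iteration (2,1) of S1: A(I+1,J) = A(I,J)
d = distance((1, 1), (2, 1))
D = direction((1, 1), (2, 1))
```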
Example
    DO I = 1, 3
      DO J = 1, I
S1      A(I+1,J) = A(I,J)
      ENDDO
    ENDDO
The write A(I+1,J) is the source and the read A(I,J) is the sink.
True dependence between S1 and itself on
    i = (1,1) and j = (2,1): d(i, j) = (1,0), D(i, j) = (<, =)
    i = (2,1) and j = (3,1): d(i, j) = (1,0), D(i, j) = (<, =)
    i = (2,2) and j = (3,2): d(i, j) = (1,0), D(i, j) = (<, =)
Figure: iteration space with axes I and J, points (1,1), (2,1), (2,2), (3,1), (3,2), (3,3).
Data Dependence Graphs
Data dependences arise between statement instances. It is generally infeasible to represent all data dependences that arise in a program. Usually only static data dependences are recorded, using the data dependence direction vector, and depicted in a data dependence graph to compactly represent the data dependences in a loop nest.
    DO I = 1, 10000
S1    A(I) = B(I) * 5
S2    C(I+1) = C(I) + A(I)
    ENDDO
Static data dependences for accesses to A and C: S1 (=) S2 and S2 (<) S2.
Data dependence graph: an edge labeled (=) from S1 to S2 and a self-edge labeled (<) on S2.
Explain the difference between static and dynamic dependences (compare DSA and SSA). E.g. a static dependence may not hold for all iterations, i.e. for all dynamic statement instances.
Loop-Independent Dependences
Definition: there is a loop-independent dependence from statement S1 to statement S2 iff there exist two iteration vectors i and j such that
1) S1 refers to memory location M on iteration i, S2 refers to M on iteration j, and i = j
2) there is a control flow path from S1 to S2 within the iteration
Also called INTRA-iteration dependence.
Loop-Carried Dependences
Definitions: there is a loop-carried dependence from statement S1 to statement S2 iff there exist two iteration vectors i and j such that S1 refers to memory location M on iteration i, S2 refers to M on iteration j, and d(i, j) > 0 (that is, D(i, j) contains a “<” as its leftmost non-“=” component).
A loop-carried dependence from S1 to S2 is
1) forward if S2 appears after S1 in the loop body
2) backward if S2 appears before S1 (or if S1 = S2)
The level of a loop-carried dependence is the index of the leftmost non-“=” component of D(i, j) for the dependence.
Also called INTER-iteration dependences.
Example (1) The loop-carried dependence S1 (<) S2 is forward The loop-carried dependence S2 (<) S1 is backward All loop-carried dependences are of level 1, because D(i, j) = (<) for every dependence DO I = 1, 3 S1 A(I+1) = F(I) S2 F(I+1) = A(I) ENDDO S1 (<) S2 and S2 (<) S1
Example (2)
    DO I = 1, 10
      DO J = 1, 10
        DO K = 1, 10
S1        A(I,J,K+1) = A(I,J,K)
        ENDDO
      ENDDO
    ENDDO
S1 (=,=,<) S1
All loop-carried dependences are of level 3, because D(i, j) = (=, =, <).
Note: level-k dependences are also denoted by Sx δk Sy. The alternative notation for this level-3 dependence is S1 δ3 S1.
Direction Vectors and Reordering Transformations (1) Let T be a transformation on a loop nest that does not rearrange the statements in the body of the innermost loop; then T is valid if, after it is applied, none of the direction vectors for dependences with source and sink in the nest has a leftmost non-“=” component that is “>”
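The validity rule above can be sketched as a check on the direction vectors that result from a transformation. The helpers `is_valid`, `interchange`, and `reverse` are hypothetical names for this illustration, and the sketch assumes every component is one of "<", "=", ">" (no "*" summaries).

```python
def is_valid(direction_vectors):
    """True if no vector has '>' as its leftmost non-'=' component."""
    for dv in direction_vectors:
        for component in dv:
            if component == "<":
                break             # leftmost non-'=' is '<': this dependence is preserved
            if component == ">":
                return False      # sink would run before source
    return True


def interchange(dv, p, q):
    """Direction vector after interchanging the loops at positions p and q."""
    out = list(dv)
    out[p], out[q] = out[q], out[p]
    return tuple(out)


def reverse(dv, p):
    """Direction vector after reversing the loop at position p."""
    flip = {"<": ">", ">": "<", "=": "="}
    out = list(dv)
    out[p] = flip[out[p]]
    return tuple(out)


ok_interchange = is_valid([interchange(("<", "="), 0, 1)])   # ('=', '<')
bad_interchange = is_valid([interchange(("<", ">"), 0, 1)])  # ('>', '<')
bad_reversal = is_valid([reverse(("<", "="), 0)])            # ('>', '=')
```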
Direction Vectors and Reordering Transformations (2)
A reordering transformation preserves all level-k dependences if
1) it preserves the iteration order of the level-k loop,
2) it does not interchange any loop at level < k to a position inside the level-k loop, and
3) it does not interchange any loop at level > k to a position outside the level-k loop
Examples
    DO I = 1, 10
S1    F(I+1) = A(I)
S2    A(I+1) = F(I)
    ENDDO
Find deps.: both S1 (<) S2 and S2 (<) S1 are level-1 dependences.
Choose transform (statement interchange, the I loop order is preserved):
    DO I = 1, 10
S2    A(I+1) = F(I)
S1    F(I+1) = A(I)
    ENDDO

    DO I = 1, 10
      DO J = 1, 10
        DO K = 1, 10
S1        A(I+1,J+2,K+3) = A(I,J,K) + B
        ENDDO
      ENDDO
    ENDDO
Find deps.: S1 (<,<,<) S1 is a level-1 dependence.
Choose transform (interchange J and K, reverse K; the I loop order is preserved):
    DO I = 1, 10
      DO K = 10, 1, -1
        DO J = 1, 10
S1        A(I+1,J+2,K+3) = A(I,J,K) + B
        ENDDO
      ENDDO
    ENDDO
Direction Vectors and Reordering Transformations (3)
A reordering transformation preserves a loop-independent dependence between statements S1 and S2 if it does not move statement instances between iterations and preserves the relative order of S1 and S2
Examples
    DO I = 1, 10
S1    A(I) = …
S2    … = A(I)
    ENDDO
Find deps.: S1 (=) S2 is a loop-independent dependence.
Choose transform (loop reversal, statement order is preserved):
    DO I = 10, 1, -1
S1    A(I) = …
S2    … = A(I)
    ENDDO

    DO I = 1, 10
      DO J = 1, 10
S1      A(I,J) = …
S2      … = A(I,J)
      ENDDO
    ENDDO
Find deps.: S1 (=,=) S2 is a loop-independent dependence.
Choose transform (loop interchange and reversal, statement order is preserved):
    DO J = 10, 1, -1
      DO I = 10, 1, -1
S1      A(I,J) = …
S2      … = A(I,J)
      ENDDO
    ENDDO
Combining Direction Vectors
When a loop nest has multiple different directions at the same loop level k, we sometimes abbreviate the level-k direction vector component with “*”:
    S1 (*) S2 = S1 (<) S2 ∪ S1 (=) S2 ∪ S1 (>) S2
    DO I = 1, 9
S1    A(I) = …
S2    … = A(10-I)
    ENDDO
S1 (*) S2
    DO I = 1, 10
S1    S = S + A(I)
    ENDDO
S1 (*) S1
    DO I = 1, 10
      DO J = 1, 10
S1      A(J) = A(J)+1
      ENDDO
    ENDDO
S1 (*,=) S1
Dependence vectors: example 1
    DO I = 1, 3
      DO J = 1, I
S1      A(I+1,J) = A(I,J)
      ENDDO
    ENDDO
Distance vector is (1,0); S1 (<,=) S1.
Figure: iteration space for I = 1..3 with dependence arrows. Note: J is the inner-loop index, plotted vertically.
Dependence vectors: example 2
    DO I = 1, 4
      DO J = 1, 4
S1      A(I,J+1) = A(I-1,J)
      ENDDO
    ENDDO
Distance vector is (1,1); S1 (<,<) S1.
Figure: 4x4 iteration space (I horizontal, J vertical) with dependence arrows.
Dependence vectors: example 3
    DO I = 1, 4
      DO J = 1, 5-I
S1      A(I+1,J+I-1) = A(I,J+I-1)
      ENDDO
    ENDDO
Distance vector is (1,-1); S1 (<,>) S1.
Figure: triangular iteration space (I horizontal, J vertical) with dependence arrows.
Dependence vectors: example 4
    DO I = 1, 4
      DO J = 1, 4
S1      B(I) = B(I) + C(I)
S2      A(I+1,J) = A(I,J)
      ENDDO
    ENDDO
Distance vectors are (0,1) and (1,0); S1 (=,<) S1 and S2 (<,=) S2.
Figure: 4x4 iteration space (I horizontal, J vertical) with dependence arrows.
Parallelization
It is valid to convert a sequential loop to a parallel loop if the loop carries no inter-iteration dependence.
    DO I = 1, 4
      DO J = 1, 4
S1      A(I,J+1) = A(I,J)
      ENDDO
    ENDDO
S1 (=,<) S1: the dependence is carried by the J loop only, so the I loop can be parallelized:
    PARALLEL DO I = 1, 4
      DO J = 1, 4
S1      A(I,J+1) = A(I,J)
      ENDDO
    ENDDO
Figure: 4x4 iteration space (I horizontal, J vertical) with dependence arrows along J.
Vectorization (1)
A single-statement loop that carries no (inter-iteration) dependence can be vectorized.
    DO I = 1, 4
S1    X(I) = X(I) + C
    ENDDO
vectorizes to the Fortran 90 array statement
S1  X(1:4) = X(1:4) + C
Figure: vector operation adding C to X(1), X(2), X(3), X(4) in one step.
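Vectorization (2)
A loop-carried (inter-iteration) flow dependence prevents vectorization: the array statement reads all of X(1:N) before it writes, so it is not equivalent to the loop and is therefore invalid.
    DO I = 1, N
S1    X(I+1) = X(I) + C
    ENDDO
S1 (<) S1
Invalid (parallel) Fortran 90 statement:
S1  X(2:N+1) = X(1:N) + C
Figure: iterations I=1 to I=8, each depending on the previous one.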
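The difference between the sequential loop and the array statement can be observed directly. The sketch below models Fortran 90 array-assignment semantics (the whole right-hand side is evaluated before any element is stored) with plain Python lists; the function names and the 0-based indexing are assumptions of this illustration.

```python
def sequential_loop(x, c):
    """DO I: X(I+1) = X(I) + C, executed one iteration at a time."""
    x = list(x)
    for i in range(len(x) - 1):
        x[i + 1] = x[i] + c   # uses the value written by the previous iteration
    return x


def array_statement(x, c):
    """X(2:N+1) = X(1:N) + C: the whole right-hand side is read before any store."""
    rhs = [v + c for v in x[:-1]]
    return [x[0]] + rhs


start = [1, 0, 0, 0]
loop_result = sequential_loop(start, 1)
vector_result = array_statement(start, 1)
```

The loop propagates X(1) through the whole array, while the array statement only shifts the old values by one position, so the two results differ.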
Vectorization (3) For some compilers: only single loop statements can be vectorized, loops with multiple statements must be transformed using the loop distribution transformation Loop has no loop-carried dependence or has forward flow dependences DO I = 1, N S1 A(I+1) = B(I) + C S2 D(I) = A(I) + E ENDDO loop distribution DO I = 1, N S1 A(I+1) = B(I) + C ENDDO DO I = 1, N S2 D(I) = A(I) + E ENDDO vectorize S1 A(2:N+1) = B(1:N) + C S2 D(1:N) = A(1:N) + E
Vectorization (4) When a loop has backward flow dependences and no loop-independent dependences, interchange the statements to enable loop distribution DO I = 1, N S1 D(I) = A(I) + E S2 A(I+1) = B(I) + C ENDDO loop distribution DO I = 1, N S2 A(I+1) = B(I) + C ENDDO DO I = 1, N S1 D(I) = A(I) + E ENDDO vectorize S2 A(2:N+1) = B(1:N) + C S1 D(1:N) = A(1:N) + E
Vectorization (5)
Can this loop be vectorized? Can this loop be parallelized?
    DO I = 1, N
S1    A(I) = A(I+1)
    ENDDO
Figure: iterations I=1 to I=8.
Preliminary Transformations to Support Dependence Testing
Dependence tests that determine distance and direction vectors require array subscripts to be in a standard form. Preliminary transformations put more subscripts in affine form, where subscripts are linear integer functions of loop induction variables:
Loop normalization
Induction-variable substitution
Constant propagation
Loop Normalization
A normalized loop starts its iteration count at 0 (or 1) and increments by 1. Dependence tests assume iteration spaces are normalized. Normalization distorts dependences, but it is otherwise preferred to support loop analysis.
Make lower bound 1 (or 0); make stride 1.
Original loop, S1 (<,=) S1:
    DO I = 1, M
      DO J = I, N
S1      A(J,I) = A(J,I-1) + 5
      ENDDO
    ENDDO
Normalized loop, S1 (<,>) S1:
    DO I = 1, M
      DO J = 1, N-I+1
S1      A(J+I-1,I) = A(J+I-1,I-1) + 5
      ENDDO
    ENDDO
Data Flow Analysis
Data flow analysis with reaching definitions or SSA form supports constant propagation and dead code elimination. These are needed to standardize array subscripts.
    K = 2
    DO I = 1, 100
S1    A(I+K) = A(I) + 5
    ENDDO
After constant propagation:
    DO I = 1, 100
S1    A(I+2) = A(I) + 5
    ENDDO
Forward Substitution Forward substitution and induction variable substitution (IVS) yield array subscripts that are amenable to dependence analysis DO I = 1, 100 K = I + 2 S1 A(K) = A(K) + 5 ENDDO forward substitution DO I = 1, 100 S1 A(I+2) = A(I+2) + 5 ENDDO
Induction Variable (IV) Substitution
Example loop with IVs:
    I = 0
    J = 1
    while (I<N)
      I = I+1
      … = A[J]
      J = J+2
      K = 2*I
      A[K] = …
    endwhile
After IV substitution (IVS) (note the affine indexes):
    for i=0 to N-1
S1:   … = A[2*i+1]
S2:   A[2*i+2] = …
    endfor
Dependence test: the write A[2*i+2] and the read A[2*i+1] touch the same element only if 2*id + 2 = 2*iu + 1. The GCD test is used to solve this dependence equation, 2*id - 2*iu = -1. Since 2 does not divide 1 there is no data dependence.
After parallelization:
    forall (i=0,N-1)
      … = A[2*i+1]
      A[2*i+2] = …
    endforall
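The substitution can be sanity-checked by running the original while loop and recording the subscripts it produces. This is a sketch for illustration only; the function names are assumptions and the array accesses are replaced by logging the subscript values.

```python
def while_loop_subscripts(N):
    """Run the original loop and record the subscripts of the read A[J] and the write A[K]."""
    reads, writes = [], []
    I, J = 0, 1
    while I < N:
        I = I + 1
        reads.append(J)      # … = A[J]
        J = J + 2
        K = 2 * I
        writes.append(K)     # A[K] = …
    return reads, writes


def substituted_subscripts(N):
    """Subscripts after IV substitution: read A[2*i+1], write A[2*i+2] for i = 0..N-1."""
    return [2 * i + 1 for i in range(N)], [2 * i + 2 for i in range(N)]


reads, writes = while_loop_subscripts(6)
overlap = set(reads) & set(writes)
```

Reads touch only odd elements and writes only even ones, so `overlap` is empty, in line with the GCD argument.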
Dependence Testing
After loop normalization, IVS, and constant propagation, array subscripts are standardized. Affine subscripts are important, since most dependence tests require affine forms: a1 i1 + a2 i2 + … + an in + e. Non-linear subscripts contain unknowns (symbolic terms), function values, array values, shifts >> <<, etc.
Dependence Equation (1)
A dependence equation defines the access requirement.
    DO I = 1, N
S1    A(f(I)) = A(g(I))
    ENDDO
To prove flow dependence: for which values of α < β is f(α) = g(β)?
    DO I = 1, N
S1    A(I+1) = A(I)
    ENDDO
α+1 = β has solution α = β-1
    DO I = 1, N
S1    A(2*I+1) = A(2*I)
    ENDDO
2*α+1 = 2*β has no solution
Dependence Equation (2)
A dependence equation defines the access requirement.
    DO I = 1, N
S1    A(f(I)) = A(g(I))
    ENDDO
To prove anti dependence: for which values of α > β is f(α) = g(β)?
    DO I = 1, N
S1    A(I+1) = A(I)
    ENDDO
α+1 = β has no solution
    DO I = 1, N
S1    A(2*I) = A(2*I+2)
    ENDDO
2*α = 2*β+2 has solution α = β+1
Dependence Equation (3)
To prove flow dependence: for which values of α < β is f(α) = g(β)?
    DO I = 1, N
      DO J = 1, N
S1      A(I-1) = A(J)
      ENDDO
    ENDDO
α1-1 = β2 has solution; note α = (α1, α2) and β = (β1, β2) are iteration vectors
    DO I = 1, N
S1    A(5) = A(I)
    ENDDO
5 = β has solution if 5 ≤ N
    DO I = 1, N
S1    A(5) = A(6)
    ENDDO
5 = 6 has no solution
ZIV, SIV, and MIV
ZIV (zero index variable) subscript pairs
SIV (single index variable) subscript pairs
MIV (multiple index variable) subscript pairs
    DO I = …
      DO J = …
        DO K = …
S1        A(5,I+1,J) = A(N,I,K)
        ENDDO
      ENDDO
    ENDDO
The pair (5, N) is a ZIV pair, (I+1, I) is an SIV pair, and (J, K) is an MIV pair.
Example
The first distance vector component is easy (SIV pair), but the second is not so easy (MIV pair).
    DO I = 1, 4
      DO J = 1, 5-I
S1      A(I+1,J+I-1) = A(I,J+I-1)
      ENDDO
    ENDDO
Distance vector is (1,-1); S1 (<,>) S1.
Figure: triangular iteration space (I horizontal, J vertical) with dependence arrows.
Separability and Coupled Subscripts
When testing multidimensional arrays, subscripts are separable if their indices do not occur in other subscripts. ZIV and SIV subscript pairs are separable. MIV pairs may give rise to coupled subscript groups.
    DO I = …
      DO J = …
        DO K = …
S1        A(I,J,J) = A(I,J,K)
        ENDDO
      ENDDO
    ENDDO
The SIV pair in the first dimension is separable. Index J is used in both the 2nd and 3rd dimension of A, so the MIV pair gives coupled indices.
Dependence Testing Overview Partition subscripts into separable and coupled groups Classify each subscript as ZIV, SIV, or MIV For each separable subscript, apply the applicable single subscript test (ZIV, SIV, or MIV) to prove independence or produce direction vectors for dependence For each coupled group, apply a multiple subscript test If any test yields independence, no further testing needed Otherwise, merge all dependence vectors into one set
Solving Data Dependences
Memory disambiguation: array indices should be affine, i.e. linear expressions of the (outer) loop indices. The following example is unsolvable at compile time, because n is only known at run time:
    read(n)
    DO I = 1, 4
      A[I] = A[n] + 3
Dependence Testing is Conservative or Exact
Dependence tests solve dependence equations.
Conservative tests attempt to prove that there is no solution to a dependence equation:
    Absence of a dependence is proven
    Proving dependence is sometimes possible and useful
    But if proofs fail, dependence must be assumed
    When dependence has to be assumed, the translated code is correct but suboptimal
Exact tests detect dependences iff they exist.
Test overview: comparing 2 index expressions
ZIV (Zero Induction Variables): compares 2 expressions containing no loop indices
SIV (Single Induction Variable)
MIV (Multiple Induction Variable)
GCD: testing multiple dimensions. GCD performs no loop bounds checking, so GCD can only prove independence!
Banerjee: enhanced GCD, taking loop bounds into account
ZIV Test The ZIV test compares two subscripts If the expressions are proven unequal, no dependence can exist DO I = 1, N S1 A(5) = A(6) ENDDO K = 10 DO I = 1, N S1 A(5) = A(K) ENDDO K = 10 DO I = 1, N DO J = 1, N S1 A(I,5) = A(I,K) ENDDO ENDDO
Strong SIV Test
Requires subscript pairs of the form aI+c1 and aI’+c2 for the def and use, respectively. The dependence equation aI+c1 = aI’+c2 has a solution if the dependence distance d = (c1 - c2)/a is an integer and |d| ≤ U - L, with L and U the loop bounds.
    DO I = 1, N
S1    A(I+1) = A(I)
    ENDDO
    DO I = 1, N
S1    A(2*I+2) = A(2*I)
    ENDDO
    DO I = 1, N
S1    A(I+N) = A(I)
    ENDDO
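A minimal sketch of the strong SIV test, assuming the bound check is inclusive (|d| ≤ U - L) and a ≠ 0. The function name `strong_siv` and the convention of returning `None` for independence are choices made for this illustration; N = 10 is an arbitrary concrete bound.

```python
def strong_siv(a, c1, c2, L, U):
    """Strong SIV test for a*I + c1 (def) and a*I' + c2 (use), a != 0.

    Returns the dependence distance d if a dependence is possible, else None.
    """
    if (c1 - c2) % a != 0:
        return None                 # distance is not an integer
    d = (c1 - c2) // a
    if abs(d) > U - L:
        return None                 # distance exceeds the iteration range
    return d


N = 10
d1 = strong_siv(1, 1, 0, 1, N)   # A(I+1)   = A(I)
d2 = strong_siv(2, 2, 0, 1, N)   # A(2*I+2) = A(2*I)
d3 = strong_siv(1, N, 0, 1, N)   # A(I+N)   = A(I)
```

With these inputs the first two loops have distance 1, and the third is independent because the distance N is larger than U - L = N - 1.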
Weak-Zero SIV Test
Requires subscript pairs of the form a1I+c1 and a2I’+c2 with either a1 = 0 or a2 = 0. If a2 = 0, the dependence equation I = (c2 - c1)/a1 has a solution if (c2 - c1)/a1 is an integer and L ≤ (c2 - c1)/a1 ≤ U. The case a1 = 0 is similar.
    DO I = 1, N
S1    A(I) = A(1)
    ENDDO
    DO I = 1, N
S1    A(2*I+2) = A(2)
    ENDDO
    DO I = 1, N
S1    A(1) = A(I) + A(1)
    ENDDO
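The weak-zero case reduces to solving one linear equation for a single iteration number. The sketch below handles the a2 = 0 form; the name `weak_zero_siv`, the inclusive bounds, and N = 10 are assumptions of this illustration.

```python
def weak_zero_siv(a1, c1, c2, L, U):
    """Weak-zero SIV test for a1*I + c1 against the constant subscript c2 (a1 != 0).

    Returns the iteration I at which both subscripts coincide, or None.
    """
    if (c2 - c1) % a1 != 0:
        return None                 # no integer iteration solves the equation
    i = (c2 - c1) // a1
    return i if L <= i <= U else None


N = 10
first = weak_zero_siv(1, 0, 1, 1, N)    # A(I)     vs A(1)
second = weak_zero_siv(2, 2, 2, 1, N)   # A(2*I+2) vs A(2)
```

In the first loop the subscripts meet at I = 1, which is useful to know because peeling that iteration removes the dependence; in the second loop the solution I = 0 lies outside the loop bounds.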
GCD
The greatest common divisor (GCD) of integers a1, a2, …, an, denoted gcd(a1, a2, …, an), is the largest integer that evenly divides all these integers.
Theorem: the linear Diophantine equation a1x1 + a2x2 + … + anxn = c has an integer solution x1, x2, …, xn iff gcd(a1, a2, …, an) divides c
GCD examples Example 1: gcd(2,-2) = 2. No solutions Example 2: gcd(24,36,54) = 6. Many solutions
GCD Test Assumes that f() and g() are affine: f(x1,x2,…,xn)= a0+a1x1+…+anxn g(y1,y2,…,yn)= b0+b1y1+…+bnyn Reordering gives linear Diophantine equation a1x1-b1y1 +…+anxn-bnyn =b0-a0 which has a solution iff gcd(a1,…,an,b1,…,bn) divides b0-a0 DO I = 1, N S1 A(2*I+1) = A(2*I) ENDDO DO I = 1, N S1 A(4*I+1) = A(2*I+3) ENDDO DO I = 1, 10 DO J = 1, 10 S1 A(1+2*I+20*J) = A(2+20*I+2*J) ENDDO ENDDO
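The GCD test is a one-line divisibility check once the coefficients are collected. A sketch using `math.gcd` and `functools.reduce` from the standard library; the function name `gcd_test` and its argument layout are assumptions of this illustration. A result of True only means a dependence cannot be ruled out, since loop bounds are ignored.

```python
from functools import reduce
from math import gcd


def gcd_test(a_coeffs, b_coeffs, a0, b0):
    """GCD test for f = a0 + sum(a_k*x_k) and g = b0 + sum(b_k*y_k).

    Returns False if independence is proven, True if a dependence is possible.
    """
    g = reduce(gcd, [abs(c) for c in list(a_coeffs) + list(b_coeffs)], 0)
    if g == 0:
        return a0 == b0            # both subscripts are constants
    return (b0 - a0) % g == 0


ex1 = gcd_test([2], [2], 1, 0)             # A(2*I+1)       vs A(2*I)
ex2 = gcd_test([4], [2], 1, 3)             # A(4*I+1)       vs A(2*I+3)
ex3 = gcd_test([2, 20], [20, 2], 1, 2)     # A(1+2*I+20*J)  vs A(2+20*I+2*J)
```

For the three example loops this gives independence, a possible dependence, and independence, respectively.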
GCD dependence test example 1 for(int i=0;i<100;i++) { s1 a[2*i]=b[i]; s2 c[i]=a[4*i+6]; } GCD (2,4) = 2, does divide 6 => solution, e.g. 2*3 = 4*0 + 6 is this possible? GCD does not check direction, and whether the solution is within the loop bounds !!
GCD dependence test example 2
    for (i=0; i<Ni; i++)
      for (j=0; j<Nj; j++)
        A[4*i + 2*j + 1] = .. A[6*i + 2*j + 4]
gcd(4,-6,2,-2) = 2. Does 2 divide 4-1?
Banerjee Test
    for (i=L; i<=U; i++) {
      x[a_0 + a_1*i] = ...
      ... = x[b_0 + b_1*i]
    }
Does a_0 + a_1*i = b_0 + b_1*i’ for some integers i and i’ within the loop bounds? If so, then (a_1*i - b_1*i’) = (b_0 - a_0). Determine upper and lower bounds on (a_1*i - b_1*i’):
    for (i=1; i<=5; i++)
      x[i+5] = x[i];
upper bound = a_1*max(i) - b_1*min(i’) = 4
lower bound = a_1*min(i) - b_1*max(i’) = -4
b_0 - a_0 = -5 is out of this range, so there is no dependence!
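The bounds computation for the single-loop case can be sketched as follows. The function name `banerjee_single_loop` and the returned tuple are assumptions of this illustration; taking min and max over both loop bounds lets the sketch also handle negative coefficients.

```python
def banerjee_single_loop(a0, a1, b0, b1, L, U):
    """Banerjee-style bounds check for x[a0 + a1*i] vs x[b0 + b1*i'], L <= i, i' <= U.

    Returns (may_depend, (lower, upper), target), where target = b0 - a0.
    """
    lower = min(a1 * L, a1 * U) - max(b1 * L, b1 * U)
    upper = max(a1 * L, a1 * U) - min(b1 * L, b1 * U)
    target = b0 - a0
    return lower <= target <= upper, (lower, upper), target


# x[i+5] = x[i] for i = 1..5
may_depend, bounds, target = banerjee_single_loop(a0=5, a1=1, b0=0, b1=1, L=1, U=5)
```

For this loop the bounds are (-4, 4) and the target is -5, so `may_depend` is False.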
Run-Time Dependence Testing
When a dependence equation includes unknowns, we can sometimes find the constraints on these unknowns to solve the equation and impose these constraints at run time on multi-version code.
    DO I = 1, 10
S1    A(I) = A(I-K) + B(I)
    ENDDO
becomes
    IF K = 0 OR K < -9 OR K > 9 THEN
      PARALLEL DO I = 1, 10
S1      A(I) = A(I-K) + B(I)
      ENDDO
    ELSE
      DO I = 1, 10
S1      A(I) = A(I-K) + B(I)
      ENDDO
    ENDIF
If nothing works: use profiling
Track dependences at run time; the tracking code can be inserted in the IR of your compiler. Profiling can 'easily' check all dependences, including e.g. A[B[i]] cases and interprocedural dependences:
    for i=0,N
      A[i] = f(..)
    function f(..)
      for j=0,M
        A[j] = A[..]
However: when you do not 'see' a dependence during profiling, independence is not guaranteed.
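The run-time guard can be validated by brute force against the loop it protects. In this sketch, `loop_carried` and `guard` are hypothetical helper names; `loop_carried` checks whether any iteration reads an element of A that a different iteration writes.

```python
def loop_carried(K, lo=1, hi=10):
    """True if some iteration I reads A(I-K) that a different iteration writes."""
    written = set(range(lo, hi + 1))          # A(I) is written for I = lo..hi
    return any((i - K) in written and (i - K) != i for i in range(lo, hi + 1))


def guard(K):
    """Run-time condition under which the PARALLEL DO version is selected."""
    return K == 0 or K < -9 or K > 9


guard_is_exact = all(guard(K) == (not loop_carried(K)) for K in range(-20, 21))
```

For every K in the tested range, the guard selects the parallel version exactly when the loop carries no dependence.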
What did you learn?
Summary
A loop can be parallelized or vectorized if it has no inter-iteration dependences.
The GCD test can check whether two affine index expressions can be equal:
    it does not check the dependence directions
    it does not check whether the solution is within the loop bounds
More advanced tests are needed:
    they can take more computation time
    increasingly elaborate tests can be applied iteratively to prove independence
Backup slides
Banerjee Test (1)
f() and g() are affine:
    f(x1,x2,…,xn) = a0+a1x1+…+anxn
    g(y1,y2,…,yn) = b0+b1y1+…+bnyn
The dependence equation a0-b0+a1x1-b1y1+…+anxn-bnyn = 0 has no solution if 0 is not within the lower bound L and upper bound U of the equation’s LHS.
    IF M > 0 THEN
      DO I = 1, 10
S1      A(I) = A(I+M) + B(I)
      ENDDO
The dependence equation α = β+M is rewritten as -M + α - β = 0.
Constraints: 1 ≤ M ≤ ∞, 0 ≤ α ≤ 9, 0 ≤ β ≤ 9
Banerjee Test (2)
Test for (*) dependence: possible dependence because 0 lies within [-∞, 8].
    IF M > 0 THEN
      DO I = 1, 10
S1      A(I) = A(I+M) + B(I)
      ENDDO
Dependence equation: -M + α - β = 0
Constraints: 1 ≤ M ≤ ∞, 0 ≤ α ≤ 9, 0 ≤ β ≤ 9
    L             U             step
    -M + α - β    -M + α - β    original eq.
    -M + α - 9    -M + α - 0    eliminate β
    -M - 9        -M + 9        eliminate α
    -∞            8             eliminate M
Banerjee Test (3)
Test for (<) dependence: assume that α ≤ β-1. No dependence because 0 does not lie within [-∞, -2].
    IF M > 0 THEN
      DO I = 1, 10
S1      A(I) = A(I+M) + B(I)
      ENDDO
Dependence equation: -M + α - β = 0
Constraints: 1 ≤ M ≤ ∞, 0 ≤ α ≤ β-1 ≤ 8, 1 ≤ α+1 ≤ β ≤ 9
    L             U                 step
    -M + α - β    -M + α - β        original eq.
    -M + α - 9    -M + α - (α+1)    eliminate β
    -M - 9        -M - 1            eliminate α
    -∞            -2                eliminate M
Banerjee Test (4)
Test for (>) dependence: assume that α ≥ β+1. Possible dependence because 0 lies within [-∞, 8].
    IF M > 0 THEN
      DO I = 1, 10
S1      A(I) = A(I+M) + B(I)
      ENDDO
Dependence equation: -M + α - β = 0
Constraints: 1 ≤ M ≤ ∞, 1 ≤ β+1 ≤ α ≤ 9, 0 ≤ β ≤ α-1 ≤ 8
    L                 U             step
    -M + α - β        -M + α - β    original eq.
    -M + α - (α-1)    -M + α        eliminate β
    -M + 1            -M + 9        eliminate α
    -∞                8             eliminate M
Banerjee Test (5)
Test for (=) dependence: assume that α = β. No dependence because 0 does not lie within [-∞, -1].
    IF M > 0 THEN
      DO I = 1, 10
S1      A(I) = A(I+M) + B(I)
      ENDDO
Dependence equation: -M + α - β = 0
Constraints: 1 ≤ M ≤ ∞, 0 ≤ α = β ≤ 9
    L             U             step
    -M + α - β    -M + α - β    original eq.
    -M + α - α    -M + α - α    eliminate β
    -M            -M            eliminate α
    -∞            -1            eliminate M
Testing for All Direction Vectors
Refine the dependence between a pair of statements by testing for the most general set of direction vectors, and upon failure to prove independence refine each component recursively.
    (*,*,*)
      (=,*,*)   (<,*,*)   (>,*,*)
        (<,=,*)   (<,<,*)   (<,>,*)
          (<,=,=)   (<,=,<)   (<,=,>)
Each test answers yes (may depend), in which case the vector is refined further, or no (indep), in which case the subtree is pruned.
Exact analysis Most memory disambiguations are simple integer programs. Approach: Solve exactly – yes, or no solution Solve exactly with Fourier-Motzkin + branch and bound Omega package from University of Maryland