1 Partitioning Loops with Variable Dependence Distances
Yijun Yu and Erik D’Hollander
Department of Electronics and Information Systems, University of Ghent, Belgium

2 Introduction
1. Overview
2. Dependence analysis: pseudo distance matrix (PDM)
3. Loop transformations: unimodular and partitioning
4. Results
5. Conclusion

3 1. Overview
Loop with linear array subscripts
Solve the dependence equation
Find all non-constant distances
Create the maximally covering grid and its base vectors
Create the pseudo distance matrix (PDM), which contains all base vectors of the covering grid
Find independent loops or independent partitions, based on the rank of the PDM

4 Approach
[Flowchart]
Dependence analysis: linear dependence equation, giving H = PDM. Distances may be uniform (constant) or variable (non-constant).
Loop transformation:
rank(H) < loop depth? Non-full rank: unimodular transformation. Full rank: partitioning transformation.
det(H) > 1? (Y / N)
Loop parallelization

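5 2. Dependence Analysis
General form: A[f(I)] = … and … = A[g(I)]. Iterations i = (I1, I2) and j = (J1, J2) are dependent when f(i) = g(j), i.e. iA + a = jB + b.

Example loop:
L1: do I1 = -N, N
L2:   do I2 = -N, N
        A(4I1 - I2 + 3, 2I1 + I2 - 2) = …
        … = A(I1 + I2 - 1, I1 - I2 + 2)
      enddo
    enddo

Dependence equations:
4I1 - I2 + 3 = J1 + J2 - 1
2I1 + I2 - 2 = J1 - J2 + 2

Sample dependent iterations (d = |j - i| is written with a positive leading component, so the third row lists i - j):
i          j          A[f(i)] = A[g(j)]   d = |j - i|
(1, -5)    (3, 10)    A[12, -5]           (2, 15)
(3, 0)     (9, 7)     A[15, 4]            (6, 7)
(-3, -3)   (-9, 4)    A[-6, -11]          (6, -7)
…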
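The sample rows of this slide can be checked by brute force. The Python sketch below is not part of the slides: the helper names `f`, `g` and `dependent_pairs` are chosen here for illustration, and N = 10 is an arbitrary bound that happens to contain the three sample pairs.

```python
def f(i1, i2):
    """Subscripts of the written element A(4*I1 - I2 + 3, 2*I1 + I2 - 2)."""
    return (4 * i1 - i2 + 3, 2 * i1 + i2 - 2)


def g(j1, j2):
    """Subscripts of the read element A(J1 + J2 - 1, J1 - J2 + 2)."""
    return (j1 + j2 - 1, j1 - j2 + 2)


def dependent_pairs(n):
    """Return all (i, j) with f(i) == g(j) for i, j in [-n, n] x [-n, n]."""
    written = {}
    for i1 in range(-n, n + 1):
        for i2 in range(-n, n + 1):
            written.setdefault(f(i1, i2), []).append((i1, i2))
    pairs = []
    for j1 in range(-n, n + 1):
        for j2 in range(-n, n + 1):
            for i in written.get(g(j1, j2), []):
                pairs.append((i, (j1, j2)))
    return pairs


pairs = dependent_pairs(10)
# Raw differences j - i; these vary from pair to pair (non-constant distance).
distances = {(j[0] - i[0], j[1] - i[1]) for i, j in pairs}
print(len(pairs), "dependent pairs,", len(distances), "distinct distances")
```

The set `distances` holds the signed differences j - i, so the third sample row appears there as (-6, 7) rather than (6, -7).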

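6 The dependence distance
1. The linear dependence equation: iA + a = jB + b, i.e. (i, j) [A; -B] = b - a, where [A; -B] denotes A stacked on top of -B.
2. Using Banerjee’s unimodular transformation U to obtain an echelon matrix S, with U [A; -B] = S, the equation t S = (b - a) is solved, yielding: (i, j) = t U.
3. The distance between dependent iterations i and j is: d = j - i = t (U_r - U_l).
U_l and U_r are the left and right halves of U; t has a constant part t_1 and an unknown part t_2.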

7 The distance set
1. From the dependence equation t S = (b - a), the solution vector t contains a constant part and an arbitrary part: t = (t_1, t_2).
2. Matrix F = U_r - U_l can be vertically separated into two sub-matrices: F = [F_1; F_2].
3. The distance set of the dependence equations is: { d = t_1 F_1 + t_2 F_2 }, with t_2 ranging over arbitrary integer vectors.

8 Distances in the iteration space
Iteration space (i1, i2) of loop 1 with dependence equations:
4I1 - I2 + 3 = J1 + J2 - 1
2I1 + I2 - 2 = J1 - J2 + 2
The arrows (I1, I2) → (J1, J2) represent the distance vectors between dependent iterations.
[Figure: iteration space with axes i1 and i2]

9 Distance base vectors
1. The dependence distance is non-constant for the reference pair, e.g. (2, 15), (6, 7), (6, -7), as highlighted.
2. However, the distance set is spanned by the grid generated by the base vectors (2, 1) and (0, 2).
3. For example, (2, 15) = (2, 1) + 7 (0, 2), (6, 7) = 3 (2, 1) + 2 (0, 2), (6, -7) = 3 (2, 1) - 5 (0, 2).
[Figure: distance grid with axes i1 and i2]
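The decompositions in item 3 can be reproduced mechanically. The following Python sketch is only illustrative: `lattice_coords` is a hypothetical helper, and it assumes a triangular basis in which the second base vector has a zero first component, as is the case for (2, 1) and (0, 2).

```python
def lattice_coords(d, b1=(2, 1), b2=(0, 2)):
    """Return integers (x, y) with d == x*b1 + y*b2, or None if d is off the grid.

    Assumes b2[0] == 0, so x is fixed by the first component of d alone.
    """
    x, rem = divmod(d[0], b1[0])
    if rem:
        return None
    y, rem = divmod(d[1] - x * b1[1], b2[1])
    if rem:
        return None
    return (x, y)


for d in [(2, 15), (6, 7), (6, -7), (1, 0)]:
    print(d, "->", lattice_coords(d))
```

A vector such as (1, 0) is not on the grid, which is what makes the four partitions of the later partitioning slide independent of each other.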

10 The largest base vectors
The distance set consists of linear combinations of the row vectors of a matrix R.
A lattice L(R) is the group of vectors generated by all integer linear combinations of the independent row vectors of a matrix R.
We look for the smallest lattice L(R) (generating the largest grid) that covers the whole distance set.
In this way, possible spurious dependencies introduced by replacing the distance set with a lattice are minimized.

11 Pseudo Distance Matrix (PDM)
H = HNF(R), L(H) = L(R)
The Hermite normal form HNF(R) is a full-row-rank matrix reduced from the echelon form of R by unimodular transformations. Therefore H generates the same lattice as R does, that is, the smallest lattice. In addition, the rows of the HNF are base vectors.
H is called the pseudo distance matrix (PDM) because its row vectors generate the distance set.
Since the row vectors of H are constant, the techniques developed for uniform distance dependence matrices may apply.

12 Calculating the PDM
1. Solving the linear dependence equations:
2. Expressing the distance set:
3. Finding the largest base vectors:
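For the example loop of slide 5 the three steps can be traced by hand. The derivation and the helper `lattice_basis` below are an illustrative sketch, not the authors' implementation. Solving the two dependence equations gives J1 = 3 I1 and J2 = I1 - I2 + 4, so d = j - i = (2 I1, I1 - 2 I2 + 4) = (0, 4) + I1 (2, 1) + I2 (0, -2). Using the rows (0, 4), (2, 1), (0, -2) as lattice generators is an assumption of this sketch; it parametrizes the solution by I1 and I2 directly instead of by the vector t of Banerjee's method.

```python
def lattice_basis(generators):
    """Echelon (Hermite-style) basis of the integer row lattice of `generators`.

    Only integer row operations are used (Euclid's algorithm per column),
    so the returned rows span exactly the same lattice as the input rows.
    """
    width = len(generators[0])
    rows = [list(r) for r in generators if any(r)]
    basis = []
    for col in range(width):
        while True:
            active = [r for r in rows if r[col] != 0]
            if len(active) <= 1:
                break
            pivot = min(active, key=lambda r: abs(r[col]))
            for r in active:
                if r is not pivot:
                    q = r[col] // pivot[col]
                    for k in range(width):
                        r[k] -= q * pivot[k]
        if active:
            pivot = active[0]
            # Drop the pivot and any rows that were reduced to zero.
            rows = [r for r in rows if r is not pivot and any(r)]
            if pivot[col] < 0:
                pivot = [-x for x in pivot]
            basis.append(pivot)
    return basis


# Generators taken from d = (0,4) + I1*(2,1) + I2*(0,-2) (derived above).
pdm = lattice_basis([(0, 4), (2, 1), (0, -2)])
det = pdm[0][0] * pdm[1][1] - pdm[0][1] * pdm[1][0]
print(pdm, det)
```

Under these assumptions the result agrees with the base vectors (2, 1) and (0, 2) of slide 9, and the determinant matches the four partitions used in the partitioning result.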

13 3. Loop transformations: unimodular and partitioning
Legality: any transformation should be legal, i.e. preserve the execution order of dependent iterations.
Transformations depending on rank(H):
3.1 Unimodular transformation: non-full rank PDM
3.2 Partitioning transformation: full rank PDM
3.3 Combined approach

14 3.1 Unimodular transformation
Given a non-full rank (r < m) pseudo distance matrix H, where r = rank(H) and m is the loop depth, a unimodular matrix T can be constructed such that the first m - r columns of HT are zero. As a result, the m - r outermost loops can be parallelized.

15 3.2 Partitioning transformation
Given a full rank pseudo distance matrix H, the loop nest can be split into det(H) independent partitions, so the partitioned parallelism is det(H).

16 3.3 Combined approach
After a unimodular transformation on a non-full rank PDM, the transformed PDM has a full rank sub-matrix S. When det(S) > 1, additional parallelism can be found using the loop partitioning transformation.

17 4. Results (1) Non-full rank PDM
PDM=(2,2)(2,0)(0,2)

Original loop:
L1: do I1 = -N, N
L2:   do I2 = -N, N
        A(3I1 + 1, 2I1 + I2 - 1) = …
        … = A(I1 + 3, I2 + 1)
      enddo
    enddo

Transformed loop:
L'1: doall J1 = -2N, 2N
L'2:   do J2 = max(-N, -N - J1), min(N, N - J1)
         I1 = J2
         I2 = J1 + J2
         A(3I1 + 1, 2I1 + I2 - 1) = …
         … = A(I1 + 3, I2 + 1)
       enddo
     enddoall
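The transformed nest on this slide can be sanity-checked with a small simulation. The Python sketch below is an illustration under stated assumptions: the matrix `T` is read off from I1 = J2, I2 = J1 + J2 (so J1 = I2 - I1, J2 = I1), and `partner` is obtained here by solving 3 I1 + 1 = J1 + 3 and 2 I1 + I2 - 1 = J2 + 1; neither name appears in the slides.

```python
# (J1, J2) = (I1, I2) T, i.e. J1 = I2 - I1 and J2 = I1.
T = [[-1, 1],
     [1, 0]]


def row_times_matrix(v, m):
    """Row vector v times matrix m."""
    return [sum(v[k] * m[k][c] for k in range(len(v))) for c in range(len(m[0]))]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def partner(i1, i2):
    """Iteration that touches the element written by iteration (i1, i2)."""
    return (3 * i1 - 2, 2 * i1 + i2 - 2)


def transformed_nest(n):
    """Visit the iterations in the order of the doall J1 / do J2 nest."""
    visited = []
    for j1 in range(-2 * n, 2 * n + 1):              # doall J1
        lo, hi = max(-n, -n - j1), min(n, n - j1)
        for j2 in range(lo, hi + 1):                 # do J2
            i1, i2 = j2, j1 + j2
            visited.append((j1, (i1, i2)))
    return visited


print(row_times_matrix([2, 2], T), det2(T))  # transformed PDM row and det(T)
```

If the first component of the transformed row is zero and |det(T)| = 1, dependent iterations share the same J1, which is what allows the outer loop to run as a doall.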

18 NF-rank: Dependence graphs
[Figure: dependence graphs; axis labels j1, j2, i2]

19 4. Results (2) partitioning

Original loop:
L1: do I1 = -N, N
L2:   do I2 = -N, N
        A(4I1 - I2 + 3, 2I1 + I2 - 2) = …
        … = A(I1 + I2 - 1, I1 - I2 + 2)
      enddo
    enddo

Partitioned loop:
L'1: doall Io1 = 0, 1
L'2:   doall Io2 = 0, 1
L'3:     do I1 = -N + mod(N + Io1, 2), N - mod(N - Io1, 2), 2
           Io'2 = Io2 + (I1 - Io1)/2
L'4:       do I2 = -N + mod(N + Io'2, 2), N - mod(N - Io'2, 2), 2
             A(4I1 - I2 + 3, 2I1 + I2 - 2) = …
             … = A(I1 + I2 - 1, I1 - I2 + 2)
           enddo
         enddo
       enddoall
     enddoall
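A small simulation can confirm that the four partitions tile the iteration space and keep dependent iterations together. The Python sketch below is illustrative only: `partner` comes from solving the two dependence equations of this loop for (J1, J2), and Python's `%` is used for `mod`, which assumes a non-negative remainder even for negative arguments.

```python
def partner(i1, i2):
    """Iteration (J1, J2) that touches the element written by (i1, i2)."""
    return (3 * i1, i1 - i2 + 4)


def partitions(n):
    """Map (Io1, Io2) to the list of iterations executed by that partition."""
    parts = {}
    for io1 in (0, 1):                                   # doall Io1
        for io2 in (0, 1):                               # doall Io2
            its = []
            lo1 = -n + (n + io1) % 2
            hi1 = n - (n - io1) % 2
            for i1 in range(lo1, hi1 + 1, 2):            # do I1, step 2
                io2p = io2 + (i1 - io1) // 2             # Io'2
                lo2 = -n + (n + io2p) % 2
                hi2 = n - (n - io2p) % 2
                for i2 in range(lo2, hi2 + 1, 2):        # do I2, step 2
                    its.append((i1, i2))
            parts[(io1, io2)] = its
    return parts


parts = partitions(5)
owner = {p: key for key, its in parts.items() for p in its}
print({key: len(its) for key, its in parts.items()})
```

The dictionary `owner` records which partition executes each iteration; a dependence is respected by the partitioning when an iteration and its partner have the same owner.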

20 F-rank partitioning: dependence graphs

21 4. Results (3) Combined
PDM=(2,2)

Loop after the unimodular transformation, (0, 2):
L'1: doall J1 = -2N, 2N
L'2:   do J2 = max(-N, -N - J1), min(N, N - J1)
         I1 = J2
         I2 = J1 + J2
         A(3I1 + 1, 2I1 + I2 - 1) = …
         … = A(I1 + 3, I2 + 1)
       enddo
     enddoall

Loop after the additional partitioning, (0, 1):
L''1: doall Jo2 = 0, 1
L''2:   doall J1 = -2N, 2N
          p2 = max(-N, -N - J1)
          q2 = min(N, N - J1)
L''3:     do J2 = p2 + mod(Jo2 - p2, 2), q2 - mod(q2 - Jo2, 2), 2
            I1 = J2
            I2 = J1 + J2
            A(3I1 + 1, 2I1 + I2 - 1) = …
            … = A(I1 + 3, I2 + 1)
          enddo
        enddoall
      enddoall
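The combined nest can be checked in the same brute-force way. This Python sketch is an illustration, not the authors' code: `partner` is derived here from 3 I1 + 1 = J1 + 3 and 2 I1 + I2 - 1 = J2 + 1, and Python's `%` stands in for `mod` on the assumption that the remainder is non-negative.

```python
def partner(i1, i2):
    """Iteration that touches the element written by iteration (i1, i2)."""
    return (3 * i1 - 2, 2 * i1 + i2 - 2)


def combined_nest(n):
    """Map each parallel task (Jo2, J1) to its sequential list of iterations."""
    tasks = {}
    for jo2 in (0, 1):                                   # doall Jo2
        for j1 in range(-2 * n, 2 * n + 1):              # doall J1
            p2 = max(-n, -n - j1)
            q2 = min(n, n - j1)
            lo = p2 + (jo2 - p2) % 2
            hi = q2 - (q2 - jo2) % 2
            # do J2 = lo, hi, 2 with I1 = J2 and I2 = J1 + J2
            tasks[(jo2, j1)] = [(j2, j1 + j2) for j2 in range(lo, hi + 1, 2)]
    return tasks


tasks = combined_nest(4)
owner = {p: key for key, its in tasks.items() for p in its}
print(len(tasks), "parallel tasks,", len(owner), "iterations")
```

Compared with the purely unimodular version, the number of independent tasks doubles, which corresponds to det(S) = 2 for the sub-matrix S = (2).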

22 F-rank submatrix dependence graph
[Figure: dependence graphs; axis labels j1, j2]

23 5. Conclusion
The distances between dependent iterations are generally non-constant when the array subscripts are linear.
A pseudo distance matrix (PDM) with the largest base vectors of the distance space is computed from the linear dependence equations.
Parallelism can still be exploited for loops with variable distances by the unimodular and partitioning transformations derived from the PDM.
