Published by Amos Malone. Modified over 6 years ago.
Loop Restructuring

- Loop unswitching
- Loop peeling
- Loop fusion
- Loop alignment for fusion
- Loop reversal
- Loop fission
- Loop alignment
- Loop index set splitting
- Loop interchange
- Scalar expansion
Unswitching

    DO I = 1, N
      DO J = 2, N
        IF (T(I) > 0) THEN
          A(I,J) = A(I,J-1)*T(I)+B(I)
        ELSE
          A(I,J) = ...
        ENDIF
      ENDDO
    ENDDO

- Loop unswitching moves a loop-invariant conditional out of a loop (here, T(I) > 0 does not depend on J)
- Reduces the frequency of executing branches
- But: leads to code expansion, because the loop is duplicated in each branch

    DO I = 1, N
      IF (T(I) > 0) THEN
        DO J = 2, N
          A(I,J) = A(I,J-1)*T(I)+B(I)
        ENDDO
      ELSE
        DO J = 2, N
          A(I,J) = ...
        ENDDO
      ENDIF
    ENDDO
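A minimal Python sketch of unswitching, assuming nested lists with index 0 unused to mimic Fortran's 1-based arrays and random hypothetical inputs. The slide's ELSE right-hand side is missing, so the sketch substitutes 0.0 as a hypothetical placeholder (`ELSE_VALUE`); the only point is that hoisting the J-invariant test out of the J loop leaves the result unchanged. The function names `before` and `after` are illustrative.

```python
import random

ELSE_VALUE = 0.0  # hypothetical stand-in for the elided ELSE right-hand side


def before(A, T, B, n):
    """Original nest: T[i] > 0 is tested on every J iteration."""
    for i in range(1, n + 1):
        for j in range(2, n + 1):
            if T[i] > 0:
                A[i][j] = A[i][j - 1] * T[i] + B[i]
            else:
                A[i][j] = ELSE_VALUE
    return A


def after(A, T, B, n):
    """Unswitched nest: the J-invariant test is hoisted out of the J loop."""
    for i in range(1, n + 1):
        if T[i] > 0:
            for j in range(2, n + 1):
                A[i][j] = A[i][j - 1] * T[i] + B[i]
        else:
            for j in range(2, n + 1):
                A[i][j] = ELSE_VALUE
    return A


rng = random.Random(1)
n = 6
A0 = [[rng.uniform(-1, 1) for _ in range(n + 1)] for _ in range(n + 1)]
T = [rng.uniform(-1, 1) for _ in range(n + 1)]
B = [rng.uniform(-1, 1) for _ in range(n + 1)]

r1 = before([row[:] for row in A0], T, B, n)
r2 = after([row[:] for row in A0], T, B, n)
print(r1 == r2)  # True
```

Both versions perform the same floating-point operations in the same order, so the results are bit-identical; the unswitched version simply evaluates the branch N times instead of N*(N-1) times.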
Peeling

    J = 0
    K = M
    DO I = 0, N
      A(K) = B(J) - B(K)
      K = J
      J = J + 1
    ENDDO

- Loop peeling moves the first (or last) iteration of a loop into separate code
- Enables loop fusion by changing the bounds of one loop to match the bounds of another
- But: leads to code expansion

    J = 0
    K = M
    A(K) = B(J) - B(K)
    K = J
    J = J + 1
    DO I = 1, N
      A(K) = B(J) - B(K)
      K = J
      J = J + 1
    ENDDO
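A small Python sketch of the peeling example, using a dict for A so that any subscript M works, and hypothetical values for B, N and M. It also makes one consequence of peeling explicit: in this example K is a wrap-around variable (M in the first iteration, J-1 afterwards), so once iteration I = 0 is peeled, K == I-1 and J == I hold in every remaining iteration, which the loop asserts.

```python
def original(B, n, m):
    """Original loop: K holds M in the first iteration, then the previous J."""
    A = {}
    j, k = 0, m
    for _ in range(0, n + 1):          # DO I = 0, N
        A[k] = B[j] - B[k]
        k = j
        j = j + 1
    return A


def peeled(B, n, m):
    """First iteration peeled off; the loop now starts at I = 1."""
    A = {}
    j, k = 0, m
    A[k] = B[j] - B[k]                 # peeled iteration I = 0
    k = j
    j = j + 1
    for i in range(1, n + 1):          # DO I = 1, N
        # In every remaining iteration K == I-1 and J == I.
        assert k == i - 1 and j == i
        A[k] = B[j] - B[k]
        k = j
        j = j + 1
    return A


B = [float(x * x) for x in range(20)]
for m in (3, 12):
    print(original(B, 8, m) == peeled(B, 8, m))  # True
```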
Fusion

    S1  B(1) = T(1)*X(1)
    S2  DO I = 2, N
    S3    B(I) = T(I)*X(I)
    S4  ENDDO
    S5  DO I = 2, N
    S6    A(I) = B(I) - B(I-1)
    S7  ENDDO

- Combines two consecutive loops with the same induction variable and loop bounds into one
- The fused loop must preserve all dependence relations of the original loops
- Enables more effective scalar optimizations in the fused loop
- But: may reduce temporal locality

The original code has the flow dependences (δ) S1 δ S6 and S3 δ S6.

    S1  B(1) = T(1)*X(1)
    Sx  DO I = 2, N
    S3    B(I) = T(I)*X(I)
    S6    A(I) = B(I) - B(I-1)
    Sy  ENDDO

The fused loop has the dependences S1 δ S6, S3 δ(=) S6 and S3 δ(<) S6. Every element of B is still written before it is read, so the fusion is legal.
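A minimal Python check of the fusion example, assuming hypothetical input values for T and X and lists with index 0 unused. It runs both the two-loop and the fused versions and compares the arrays they produce.

```python
def unfused(T, X, n):
    A = [0.0] * (n + 1)
    B = [0.0] * (n + 1)
    B[1] = T[1] * X[1]                 # S1
    for i in range(2, n + 1):          # S2-S4
        B[i] = T[i] * X[i]
    for i in range(2, n + 1):          # S5-S7
        A[i] = B[i] - B[i - 1]
    return A, B


def fused(T, X, n):
    A = [0.0] * (n + 1)
    B = [0.0] * (n + 1)
    B[1] = T[1] * X[1]                 # S1
    for i in range(2, n + 1):          # Sx-Sy
        B[i] = T[i] * X[i]             # S3
        A[i] = B[i] - B[i - 1]         # S6: B(I) from this iteration, B(I-1) from the previous one
    return A, B


n = 10
T = [float(i) for i in range(n + 1)]
X = [1.0 / (i + 1) for i in range(n + 1)]
print(unfused(T, X, n) == fused(T, X, n))  # True
```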
Example

Which of the three fused loops is legal?

    S1  DO I = 1, N
    S2    A(I) = B(I) + 1
    S3  ENDDO
    S4  DO I = 1, N
    S5    C(I) = A(I)/2
    S6  ENDDO
    S7  DO I = 1, N
    S8    D(I) = 1/C(I+1)
    S9  ENDDO

a)

    S1  DO I = 1, N
    S2    A(I) = B(I) + 1
    S3  ENDDO
    Sx  DO I = 1, N
    S5    C(I) = A(I)/2
    S8    D(I) = 1/C(I+1)
    Sy  ENDDO

b)

    Sx  DO I = 1, N
    S2    A(I) = B(I) + 1
    S5    C(I) = A(I)/2
    Sy  ENDDO
    S7  DO I = 1, N
    S8    D(I) = 1/C(I+1)
    S9  ENDDO

c)

    Sx  DO I = 1, N
    S2    A(I) = B(I) + 1
    S5    C(I) = A(I)/2
    S8    D(I) = 1/C(I+1)
    Sy  ENDDO
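One way to check the answer is to execute all four versions and compare them. This is a Python sketch with hypothetical inputs (B(I) = I, and C initialized to a sentinel value 7.0 so that reading C(I+1) too early shows up in D). It suggests that only b) is legal: in a) and c), S8 reads C(I+1) one iteration before S5 writes it, which turns the original flow dependence into an anti-dependence.

```python
def run(variant, n):
    """Execute the original three loops or one of the fused versions a), b), c)."""
    B = [float(i) for i in range(n + 2)]   # hypothetical input B(I) = I
    A = [0.0] * (n + 2)
    C = [7.0] * (n + 2)                    # sentinel initial value of C
    D = [0.0] * (n + 2)
    rng = range(1, n + 1)
    if variant in ("original", "a"):
        for i in rng:                      # loop S1 kept separate
            A[i] = B[i] + 1
    if variant == "original":
        for i in rng:
            C[i] = A[i] / 2
        for i in rng:
            D[i] = 1 / C[i + 1]
    elif variant == "a":                   # loops S4 and S7 fused
        for i in rng:
            C[i] = A[i] / 2
            D[i] = 1 / C[i + 1]
    elif variant == "b":                   # loops S1 and S4 fused, S7 separate
        for i in rng:
            A[i] = B[i] + 1
            C[i] = A[i] / 2
        for i in rng:
            D[i] = 1 / C[i + 1]
    elif variant == "c":                   # all three loops fused
        for i in rng:
            A[i] = B[i] + 1
            C[i] = A[i] / 2
            D[i] = 1 / C[i + 1]
    return A, C, D


n = 5
reference = run("original", n)
legal = [v for v in ("a", "b", "c") if run(v, n) == reference]
print(legal)  # ['b']
```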
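Alignment for Fusion

    S1  DO I = 1, N
    S2    B(I) = T(I)/C
    S3  ENDDO
    S4  DO I = 1, N
    S5    A(I) = B(I+1) - B(I-1)
    S6  ENDDO

- Alignment for fusion changes the iteration bounds of one loop to enable fusion when dependences would otherwise prevent it (fusing the loops above directly would make S5 read B(I+1) before S2 writes it)

Dependence: S2 δ S5

    S1  DO I = 0, N-1
    S2    B(I+1) = T(I+1)/C
    S3  ENDDO
    S4  DO I = 1, N
    S5    A(I) = B(I+1) - B(I-1)
    S6  ENDDO

Dependence: S2 δ S5

The first iteration of the shifted loop (Sx) and the last iteration of the second loop (Sy) are peeled so that the remaining bounds match:

    Sx  B(1) = T(1)/C
    S1  DO I = 1, N-1
    S2    B(I+1) = T(I+1)/C
    S5    A(I) = B(I+1) - B(I-1)
    S6  ENDDO
    Sy  A(N) = B(N+1) - B(N-1)

Loop dependences: S2 δ(=) S5 and S2 δ(<) S5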
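A Python sketch comparing the original two loops with the aligned, peeled and fused version, assuming N >= 1, hypothetical values for T and C (here `c`), and lists with index 0 unused except for B(0). B(0) and B(N+1) are read but never written in either version, so they keep their initial value 0.0.

```python
def original(T, c, n):
    B = [0.0] * (n + 2)                # B(0) and B(N+1) are read but never written
    A = [0.0] * (n + 2)
    for i in range(1, n + 1):
        B[i] = T[i] / c
    for i in range(1, n + 1):
        A[i] = B[i + 1] - B[i - 1]
    return A, B


def aligned_and_fused(T, c, n):
    B = [0.0] * (n + 2)
    A = [0.0] * (n + 2)
    B[1] = T[1] / c                    # Sx: peeled first iteration of the shifted loop
    for i in range(1, n):              # DO I = 1, N-1
        B[i + 1] = T[i + 1] / c        # S2, shifted by one iteration
        A[i] = B[i + 1] - B[i - 1]     # S5 reads B(I+1) written just above
    A[n] = B[n + 1] - B[n - 1]         # Sy: peeled last iteration of the second loop
    return A, B


T = [float(i * i) for i in range(12)]
for n in (1, 2, 8):
    print(original(T, 2.0, n) == aligned_and_fused(T, 2.0, n))  # True
```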
Reversal

    S1  DO I = 1, N
    S2    B(I) = T(I)*X(I)
    S3  ENDDO
    S4  DO I = 1, N
    S5    A(I) = B(I+1)
    S6  ENDDO

- Reverses the direction of the iteration
- Only legal for loops that have no loop-carried dependences
- Enables loop fusion by ensuring that dependences between the loop statements are preserved

Dependence: S2 δ S5

    S1  DO I = N, 1, -1
    S2    B(I) = T(I)*X(I)
    S3  ENDDO
    S4  DO I = N, 1, -1
    S5    A(I) = B(I+1)
    S6  ENDDO

Dependence: S2 δ S5

    S1  DO I = N, 1, -1
    S2    B(I) = T(I)*X(I)
    S5    A(I) = B(I+1)
    S6  ENDDO

Dependence: S2 δ(<) S5
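A Python sketch, with hypothetical inputs T and X, showing why reversal is needed before fusing here: fusing the loops while iterating upward makes S5 read B(I+1) before S2 writes it, whereas fusing the reversed loops reproduces the original result. The helper `fuse` is an illustrative name.

```python
def original(T, X, n):
    B = [0.0] * (n + 2)                # B(N+1) is read but never written
    A = [0.0] * (n + 2)
    for i in range(1, n + 1):
        B[i] = T[i] * X[i]
    for i in range(1, n + 1):
        A[i] = B[i + 1]
    return A, B


def fuse(T, X, n, reverse):
    """Fuse the two loops, iterating upward or (after reversal) downward."""
    B = [0.0] * (n + 2)
    A = [0.0] * (n + 2)
    order = range(n, 0, -1) if reverse else range(1, n + 1)
    for i in order:
        B[i] = T[i] * X[i]             # S2
        A[i] = B[i + 1]                # S5
    return A, B


n = 6
T = [float(i + 1) for i in range(n + 1)]
X = [2.0] * (n + 1)
print(fuse(T, X, n, reverse=True) == original(T, X, n))   # True
print(fuse(T, X, n, reverse=False) == original(T, X, n))  # False
```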
Fission (1)

    S1  DO I = 1, 10
    S2    DO J = 1, 10
    S3      A(I,J) = B(I,J) + C(I,J)
    S4      D(I,J) = A(I,J-1) * 2.0
    S5    ENDDO
    S6  ENDDO

- Loop fission (or loop distribution) splits a single loop into multiple loops
- Enables vectorization
- Enables parallelization of the separate loops if the original loop is sequential
- Loop fission must preserve all dependence relations of the original loop

Dependence: S3 δ(=,<) S4

    S1  DO I = 1, 10
    S2    DO J = 1, 10
    S3      A(I,J) = B(I,J) + C(I,J)
    Sx    ENDDO
    Sy    DO J = 1, 10
    S4      D(I,J) = A(I,J-1) * 2.0
    S5    ENDDO
    S6  ENDDO

Dependence: S3 δ(=,<) S4

    S1  PARALLEL DO I = 1, 10
    S3    A(I,1:10) = B(I,1:10) + C(I,1:10)
    S4    D(I,1:10) = A(I,0:9) * 2.0
    S6  ENDDO

Dependence: S3 δ(=,<) S4
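A Python sketch of the three versions above, assuming hypothetical inputs B and C and (N+1) x (N+1) lists in which column 0 holds A(I,0). The row-vectorized version processes the I iterations in reverse order to illustrate that, after fission, they are independent and could run in parallel.

```python
def zeros(n):
    # (N+1) x (N+1) grid; column 0 of A holds A(I,0), row 0 is unused
    return [[0.0] * (n + 1) for _ in range(n + 1)]


def original(B, C, n):
    A, D = zeros(n), zeros(n)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            A[i][j] = B[i][j] + C[i][j]     # S3
            D[i][j] = A[i][j - 1] * 2.0     # S4
    return A, D


def distributed(B, C, n):
    A, D = zeros(n), zeros(n)
    for i in range(1, n + 1):
        for j in range(1, n + 1):           # first J loop: S3 only
            A[i][j] = B[i][j] + C[i][j]
        for j in range(1, n + 1):           # second J loop: S4 only
            D[i][j] = A[i][j - 1] * 2.0
    return A, D


def row_vectorized(B, C, n):
    """Each J loop becomes a whole-row operation; I iterations are independent."""
    A, D = zeros(n), zeros(n)
    for i in reversed(range(1, n + 1)):     # any I order gives the same result
        A[i][1:] = [b + c for b, c in zip(B[i][1:], C[i][1:])]
        D[i][1:] = [a * 2.0 for a in A[i][:n]]  # A(I,0:N-1)
    return A, D


n = 10
B = [[float(i + j) for j in range(n + 1)] for i in range(n + 1)]
C = [[float(i * j) for j in range(n + 1)] for i in range(n + 1)]
ref = original(B, C, n)
print(distributed(B, C, n) == ref, row_vectorized(B, C, n) == ref)  # True True
```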
Fission (2)

    S1  DO I = 1, 10
    S2    A(I) = A(I) + B(I-1)
    S3    B(I) = C(I-1)*X + Z
    S4    C(I) = 1/B(I)
    S5    D(I) = sqrt(C(I))
    S6  ENDDO

- Compute the acyclic condensation of the dependence graph to find a legal order of the loops

Dependences: S3 δ(<) S2, S4 δ(<) S3, S3 δ(=) S4 and S4 δ(=) S5. S3 and S4 lie on a dependence cycle, so they must stay in the same loop.

    S1  DO I = 1, 10
    S3    B(I) = C(I-1)*X + Z
    S4    C(I) = 1/B(I)
    Sx  ENDDO
    Sy  DO I = 1, 10
    S2    A(I) = A(I) + B(I-1)
    Sz  ENDDO
    Su  DO I = 1, 10
    S5    D(I) = sqrt(C(I))
    Sv  ENDDO

[Figure: dependence graph and its acyclic condensation, in which S3 and S4 collapse into one node with edges to S2 and S5]
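The condensation step can be sketched in Python with Tarjan's strongly-connected-components algorithm. This is one standard choice; the slide does not name an algorithm, and `condensation_order` is a hypothetical helper. Nodes are statements and edges are the four dependences listed above, from source to sink; each component becomes one loop, emitted in topological order.

```python
def condensation_order(nodes, edges):
    """Return the SCCs of a dependence graph in topological order (Tarjan)."""
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of an SCC
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(sorted(component))

    for v in nodes:
        if v not in index:
            visit(v)
    return sccs[::-1]  # Tarjan emits SCCs in reverse topological order


# Dependences of the Fission (2) loop, source -> sink
edges = {
    "S3": ["S2", "S4"],   # S3 δ(<) S2, S3 δ(=) S4
    "S4": ["S3", "S5"],   # S4 δ(<) S3, S4 δ(=) S5
}
order = condensation_order(["S2", "S3", "S4", "S5"], edges)
print(order)  # [['S3', 'S4'], ['S5'], ['S2']]
```

S2 and S5 do not depend on each other, so the slide's order (the S2 loop before the S5 loop) is just as legal as the order this sketch happens to produce.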
Alignment

    S1  DO I = 2, N
    S2    A(I) = B(I) + C(I)
    S3    D(I) = A(I-1) * 2.0
    S4  ENDDO

- Aligns statements in a loop body by expanding the iteration set
- Attempts to transform loop-carried dependences into loop-independent dependences
- Enables loop parallelization

Dependence: S2 δ(<) S3

    S1  DO I = 1, N
    S2    IF (I > 1) A(I) = B(I) + C(I)
    S3    IF (I < N) D(I+1) = A(I) * 2.0
    S4  ENDDO

Dependence: S2 δ(=) S3

[Figure: dependences between iterations, before and after alignment]
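A Python sketch of the alignment example, with hypothetical inputs B and C. The `order` parameter (an illustrative addition) lets the aligned loop run its iterations in any sequence. Running it backwards still reproduces the original result, which is what turning S2 δ(<) S3 into S2 δ(=) S3 buys: the iterations no longer depend on each other.

```python
def original(B, C, n):
    A = [0.0] * (n + 1)                # A(1) is read but never written
    D = [0.0] * (n + 1)
    for i in range(2, n + 1):
        A[i] = B[i] + C[i]             # S2
        D[i] = A[i - 1] * 2.0          # S3 reads the previous iteration's A
    return A, D


def aligned(B, C, n, order=None):
    """Aligned loop; `order` lets the iterations run in any sequence."""
    if order is None:
        order = range(1, n + 1)
    A = [0.0] * (n + 1)
    D = [0.0] * (n + 1)
    for i in order:
        if i > 1:
            A[i] = B[i] + C[i]         # S2
        if i < n:
            D[i + 1] = A[i] * 2.0      # S3 now reads A(i) from the same iteration
    return A, D


n = 8
B = [float(i) for i in range(n + 1)]
C = [0.5 * i for i in range(n + 1)]
ref = original(B, C, n)
print(aligned(B, C, n) == ref)                          # True
print(aligned(B, C, n, order=range(n, 0, -1)) == ref)   # True
```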
Index Set Splitting

    S1  DO I = 1, 100
    S2    A(I) = B(I) + C(I)
    S3    IF (I > 10) THEN
    S4      D(I) = A(I) + A(I-10)
    S5    ENDIF
    S6  ENDDO

- Divides the index set into two portions
- Removes conditionals to enable other transformations
- The general case handles affine conditions in multi-dimensional loops by finding a hyperplane through the iteration-space polytope
- But: code expansion

    S1  DO I = 1, 10
    S2    A(I) = B(I) + C(I)
    Sx  ENDDO
    Sy  DO I = 11, 100
    S2    A(I) = B(I) + C(I)
    S4    D(I) = A(I) + A(I-10)
    Su  ENDDO

[Figure: I-J iteration space split by the hyperplane 3*J > I into the regions of Loop1 and Loop2]
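A Python sketch of the split, with hypothetical inputs B and C and N = 100 as on the slide. The split point 10 comes from the condition I > 10: it is always false in the first portion and always true in the second, so neither loop needs the test.

```python
def original(B, C, n=100):
    A = [0.0] * (n + 1)
    D = [0.0] * (n + 1)
    for i in range(1, n + 1):
        A[i] = B[i] + C[i]
        if i > 10:
            D[i] = A[i] + A[i - 10]
    return A, D


def split(B, C, n=100):
    A = [0.0] * (n + 1)
    D = [0.0] * (n + 1)
    for i in range(1, 11):             # I = 1..10: condition always false
        A[i] = B[i] + C[i]
    for i in range(11, n + 1):         # I = 11..N: condition always true
        A[i] = B[i] + C[i]
        D[i] = A[i] + A[i - 10]
    return A, D


B = [float(i) for i in range(101)]
C = [float(100 - i) for i in range(101)]
print(original(B, C) == split(B, C))  # True
```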
Loop Interchange (1)

    S1  DO I = 1, N
    S2    DO J = 1, M
    S3      A(I,J) = A(I,J-1) + B(I,J)
    S4    ENDDO
    S5  ENDDO

- Changes the nesting order of nested loops
- Loop interchange must preserve all dependence relations of the original loop nest
- Enables vectorization of an outer loop
- Can be used to improve spatial locality (with Fortran's column-major layout, an inner I loop accesses A(I,J) and B(I,J) with stride 1)

Dependence: S3 δ(=,<) S3

    S2  DO J = 1, M
    S1    DO I = 1, N
    S3      A(I,J) = A(I,J-1) + B(I,J)
    S4    ENDDO
    S5  ENDDO

Dependence: S3 δ(<,=) S3

    S2  DO J = 1, M
    S3    A(1:N,J) = A(1:N,J-1) + B(1:N,J)
    S5  ENDDO

Dependence: S3 δ(<,=) S3
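A Python sketch comparing the I-outer, J-outer and column-vectorized versions, assuming hypothetical inputs and a column 0 holding hypothetical initial values A(I,0). Python lists are row-major, so the sketch says nothing about locality; it only checks that the interchange preserves the result, because the single dependence is carried by J.

```python
def make_a(n, m):
    # Row i holds A(i, 0..M); column 0 holds hypothetical initial values A(I,0)
    return [[float(i)] + [0.0] * m for i in range(n + 1)]


def i_outer(B, n, m):
    A = make_a(n, m)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = A[i][j - 1] + B[i][j]
    return A


def j_outer(B, n, m):
    A = make_a(n, m)
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            A[i][j] = A[i][j - 1] + B[i][j]
    return A


def j_outer_vectorized(B, n, m):
    """Column J computed as one vector: A(1:N,J) = A(1:N,J-1) + B(1:N,J)."""
    A = make_a(n, m)
    for j in range(1, m + 1):
        column = [A[i][j - 1] + B[i][j] for i in range(1, n + 1)]
        for i, value in enumerate(column, start=1):
            A[i][j] = value
    return A


n, m = 4, 5
B = [[float(i * m + j) for j in range(m + 1)] for i in range(n + 1)]
ref = i_outer(B, n, m)
print(j_outer(B, n, m) == ref, j_outer_vectorized(B, n, m) == ref)  # True True
```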
Loop Interchange (2)

    S1  DO I = 1, N
    S2    DO J = 1, M
    S3      DO K = 1, L
    S4        A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
    S5      ENDDO
    S6    ENDDO
    S7  ENDDO

- Compute the direction matrix and find which columns can be permuted without violating the dependence relations of the original loop nest: a permutation is legal only if the leftmost non-"=" entry of every row is still "<"

Dependences: S4 δ(<,<,=) S4 and S4 δ(<,=,>) S4

    Direction matrix (columns I, J, K):
        < < =
        < = >

    Loop order J, K, I (invalid, the second row becomes = > <):
        < = <
        = > <

    Loop order J, I, K (valid):
        < < =
        = < >
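The legality test above can be sketched in Python by trying every column permutation of the direction matrix. `is_legal` and `MATRIX` are illustrative names; the rule applied is the one stated on the slide, where each permuted row must have "<" as its leftmost non-"=" entry (an all-"=" row is loop-independent and always fine).

```python
from itertools import permutations

LOOPS = ("I", "J", "K")
MATRIX = [("<", "<", "="),   # S4 δ(<,<,=) S4
          ("<", "=", ">")]   # S4 δ(<,=,>) S4


def is_legal(matrix, order):
    """True if every permuted row starts (ignoring '=') with '<'."""
    for row in matrix:
        for d in (row[c] for c in order):
            if d == "<":
                break            # carried by an outer loop in the right direction
            if d == ">":
                return False     # the dependence would run backwards
    return True


legal = ["".join(LOOPS[c] for c in order)
         for order in permutations(range(3))
         if is_legal(MATRIX, order)]
print(legal)  # ['IJK', 'IKJ', 'JIK']
```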
Scalar Expansion

    S1  DO I = 1, N
    S2    T = A(I) + B(I)
    S3    C(I) = T + 1/T
    S4  ENDDO

- Breaks anti-dependence relations by expanding (promoting) a scalar into an array
- Scalar anti-dependence relations prevent certain loop transformations, such as loop fission and loop interchange

Dependences (δ⁻¹ denotes an anti-dependence): S2 δ(=) S3 and S2 δ⁻¹(<) S3

    Sx  IF (N > 0) THEN
    Sy    ALLOCATE(Tx(1:N))
    S1    DO I = 1, N
    S2      Tx(I) = A(I) + B(I)
    S3      C(I) = Tx(I) + 1/Tx(I)
    S4    ENDDO
    Sz    T = Tx(N)
    Su  ENDIF

Dependence: S2 δ(=) S3. Statement Sz restores the last value of T in case T is used after the loop.
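A Python sketch of scalar expansion, with hypothetical inputs A and B. To illustrate the claim that expansion enables fission, the expanded loop is also split into two loops, which would not be legal with the scalar T because of its anti-dependence. The function names are illustrative, and `None` stands in for T's undefined value before the loop.

```python
def original(A, B, n):
    C = [0.0] * (n + 1)
    T = None                           # hypothetical: T has no useful value before the loop
    for i in range(1, n + 1):
        T = A[i] + B[i]                # S2
        C[i] = T + 1 / T               # S3
    return C, T


def expanded_and_distributed(A, B, n):
    C = [0.0] * (n + 1)
    T = None
    if n > 0:                          # Sx
        Tx = [0.0] * (n + 1)           # Sy: one element of T per iteration
        # With the anti-dependence gone, the loop can be split in two.
        for i in range(1, n + 1):
            Tx[i] = A[i] + B[i]        # S2
        for i in range(1, n + 1):
            C[i] = Tx[i] + 1 / Tx[i]   # S3
        T = Tx[n]                      # Sz: final value of T
    return C, T


n = 5
A = [float(i) for i in range(n + 1)]
B = [1.0] * (n + 1)
print(original(A, B, n) == expanded_and_distributed(A, B, n))  # True
print(original(A, B, 0) == expanded_and_distributed(A, B, 0))  # True
```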
Example

    S1  DO I = 1, 10
    S2    T = A(I,1)
    S3    DO J = 2, 10
    S4      T = T + A(I,J)
    S5    ENDDO
    S6    B(I) = T
    S7  ENDDO

Dependences: S2 δ(=) S4, S4 δ(=,<) S4, S4 δ(=) S6 and S2 δ⁻¹(<) S6

After scalar expansion of T:

    S1  DO I = 1, 10
    S2    Tx(I) = A(I,1)
    S3    DO J = 2, 10
    S4      Tx(I) = Tx(I) + A(I,J)
    S5    ENDDO
    S6    B(I) = Tx(I)
    S7  ENDDO

Dependences: S2 δ(=) S4, S4 δ(=,<) S4 and S4 δ(=) S6

After loop fission:

    S1  DO I = 1, 10
    S2    Tx(I) = A(I,1)
    Sx  ENDDO
    S1  DO I = 1, 10
    S3    DO J = 2, 10
    S4      Tx(I) = Tx(I) + A(I,J)
    S5    ENDDO
    Sy  ENDDO
    Sz  DO I = 1, 10
    S6    B(I) = Tx(I)
    S7  ENDDO

Dependences: S2 δ S4, S4 δ(=,<) S4 and S4 δ S6

After interchanging the I and J loops of the middle nest (S4 δ(<,=) S4) and vectorizing:

    S2  Tx(1:10) = A(1:10,1)
    S3  DO J = 2, 10
    S4    Tx(1:10) = Tx(1:10) + A(1:10,J)
    S5  ENDDO
    S6  B(1:10) = Tx(1:10)
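A Python sketch comparing the original nest with the final vectorized form, assuming hypothetical values in A and lists with index 0 unused. Each vector statement is modeled by building a new list from the old Tx, which matches Fortran array-assignment semantics (the whole right-hand side is evaluated first). Each row is still summed in the same order, so the results agree exactly.

```python
def original(A, n=10):
    B = [0.0] * (n + 1)
    for i in range(1, n + 1):
        T = A[i][1]                    # S2
        for j in range(2, n + 1):
            T = T + A[i][j]            # S4
        B[i] = T                       # S6
    return B


def transformed(A, n=10):
    # S2: Tx(1:10) = A(1:10,1)
    Tx = [0.0] + [A[i][1] for i in range(1, n + 1)]
    for j in range(2, n + 1):          # S3: J is now the only loop
        # S4: Tx(1:10) = Tx(1:10) + A(1:10,J), one whole vector per J
        Tx = [0.0] + [Tx[i] + A[i][j] for i in range(1, n + 1)]
    return list(Tx)                    # S6: B(1:10) = Tx(1:10)


A = [[1.0 / (i + j + 1) for j in range(11)] for i in range(11)]
print(original(A) == transformed(A))  # True
```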
Other Loop Restructuring Transformations

- Loop skewing: changes the shape of the iteration space (skews it) by adding a multiple of an outer loop index to an inner loop index, which can make loop interchange legal
- Strip mining: decomposes a single loop into two nested loops, where the inner loop computes one strip of the data
- Loop tiling: divides the iteration space into tiles
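A Python sketch of strip mining and tiling that records the order in which iterations are visited; the strip and tile sizes are hypothetical parameters. Strip mining keeps the original order exactly, while tiling reorders a 2-D nest but still visits every (I, J) point exactly once. The sketch checks only the iteration set, not whether the reordering respects a particular loop's dependences.

```python
def strip_mined(n, strip):
    """DO I = 1, N as a loop over strips and a loop within each strip."""
    order = []
    for ii in range(1, n + 1, strip):                       # strip loop
        for i in range(ii, min(ii + strip - 1, n) + 1):     # loop within one strip
            order.append(i)
    return order


def tiled(n, m, tile):
    """2-D nest over I = 1..N, J = 1..M, visited tile by tile."""
    order = []
    for ii in range(1, n + 1, tile):
        for jj in range(1, m + 1, tile):
            for i in range(ii, min(ii + tile - 1, n) + 1):
                for j in range(jj, min(jj + tile - 1, m) + 1):
                    order.append((i, j))
    return order


print(strip_mined(10, 4) == list(range(1, 11)))  # True: same order as the original loop
t = tiled(5, 7, 3)
print(sorted(t) == [(i, j) for i in range(1, 6) for j in range(1, 8)])  # True
print(len(t) == len(set(t)))  # True: no point visited twice
```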