Published by Bartholomew Bailey. Modified over 8 years ago.
1
Dependence Analysis and Loops CS 3220 Spring 2016
2
Loop Examples
Loop permutation for improved locality

    do j = 1,6
      do i = 1,5
        A(j,i) = A(j,i)+1
      enddo
    enddo

    do i = 1,5
      do j = 1,6
        A(j,i) = A(j,i)+1
      enddo
    enddo
3
Loop Examples
Parallelization

    do i = 1,100
      A(i) = A(i)+1
    enddo

    do i = 1,100
      A(i) = A(i-1)+1
    enddo
4
Data Dependences and Loops
How do we identify dependences in loops?
Simple view
- Imagine that all loops are fully unrolled
- Examine data dependences as before
Problems
- Impractical
- Loses loop structure

    do i = 1,5
      A(i) = A(i-1)+1
    enddo
5
Data Dependence
Definition: Data dependences are constraints on the order in which statements may be executed.
Types of dependences
- Flow (true) dependence: s1 writes memory that s2 later reads (RAW)
- Anti-dependence: s1 reads memory that s2 later writes (WAR)
- Output dependence: s1 writes memory that s2 later writes (WAW)
- Input dependence: s1 reads memory that s2 later reads (RAR)
Notation: s1 δ s2
- s1 is called the source of the dependence
- s2 is called the sink or target
- s1 must be executed before s2
6
Example

    s1  a = b;
    s2  b = c + d;
    s3  e = a + d;
    s4  b = 3;
    s5  f = b * 2;
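The RAW/WAR/WAW classification can be applied mechanically to these five statements. The following is a minimal Python sketch, not part of the original slides: the `stmts` table and the `find_dependences` helper are hypothetical names. The check is purely memory-based, so it reports every pair of conflicting accesses in program order and ignores intervening writes (it reports s2 → s5 even though s4 overwrites b in between).

```python
# Each statement: (name, variable written, set of variables read).
stmts = [
    ("s1", "a", {"b"}),
    ("s2", "b", {"c", "d"}),
    ("s3", "e", {"a", "d"}),
    ("s4", "b", set()),
    ("s5", "f", {"b"}),
]


def find_dependences(stmts):
    """Return (source, kind, sink) triples for every earlier/later pair."""
    deps = []
    for pos, (src, src_write, src_reads) in enumerate(stmts):
        for snk, snk_write, snk_reads in stmts[pos + 1:]:
            if src_write in snk_reads:   # write then read: RAW
                deps.append((src, "flow", snk))
            if snk_write in src_reads:   # read then write: WAR
                deps.append((src, "anti", snk))
            if src_write == snk_write:   # write then write: WAW
                deps.append((src, "output", snk))
    return deps


for source, kind, sink in find_dependences(stmts):
    print(source, kind, sink)
```

Under these assumptions the sketch reports flow dependences s1 → s3, s2 → s5, and s4 → s5, anti-dependences s1 → s2 and s1 → s4, and an output dependence s2 → s4.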
7
Dependences and Loops
Loop-independent dependences: dependences within the same loop iteration

    do i = 1,100
      A(i) = B(i) + 1
      C(i) = A(i) * 2
    enddo

Loop-carried dependences: dependences that cross loop iterations

    do i = 1,100
      A(i) = B(i) + 1
      C(i) = A(i-1) * 2
    enddo
8
Concepts
Iteration space
- A set of tuples that represent the iterations of a loop
- Dependences can be visualized in the iteration space

    do j = 1,6
      do i = 1,5
        A(j,i) = A(j-1,i-1)+1
      enddo
    enddo
9
Protein String Matching Example
10
Distance Vectors
Idea
- Concisely describe dependence relationships between iterations of an iteration space
- For each dimension of an iteration space, the distance is the number of iterations between accesses to the same memory location
Definition
- v = i_T - i_S (target iteration minus source iteration)

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-2)+1
      enddo
    enddo

Distance vector: (1,2)
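For small constant bounds, the definition can be checked by brute force: enumerate the iteration space, find pairs of iterations where an earlier write and a later read touch the same array element, and subtract source from target. The sketch below is an illustration only; `distance_vectors`, `write`, and `read` are hypothetical names, and it considers only flow dependences from the single write reference to the single read reference.

```python
from itertools import product


def distance_vectors(bounds, write, read):
    """Flow-dependence distance vectors for one write and one read reference.

    bounds: list of inclusive (low, high) pairs, outermost loop first.
    write, read: functions mapping an iteration to the subscripts accessed.
    """
    iterations = list(product(*[range(lo, hi + 1) for lo, hi in bounds]))
    writers = {}
    for it in iterations:
        writers.setdefault(write(*it), []).append(it)

    vectors = set()
    for target in iterations:
        for source in writers.get(read(*target), []):
            # Tuples compare lexicographically, which matches execution order.
            if source < target:
                vectors.add(tuple(t - s for s, t in zip(source, target)))
    return vectors


# A(i,j) = A(i-1,j-2)+1 with i = 1..6 and j = 1..5
print(distance_vectors([(1, 6), (1, 5)],
                       write=lambda i, j: (i, j),
                       read=lambda i, j: (i - 1, j - 2)))
```

For the loop nest above this yields the single vector (1, 2), matching the slide.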
11
More Examples
Sample code

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

Distance vector: ?
12
Distance Vectors and Loop Transformations
Any transformation we perform on the loop must respect the dependences.
Example: Can we permute the i and j loops?

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-2)+1
      enddo
    enddo

    do j = 1,5
      do i = 1,6
        A(i,j) = A(i-1,j-2)+1
      enddo
    enddo
13
Exercise

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

- Iteration space?
- Distance vector?
- What if we exchange the order of the i and j loops?
- What kinds of dependences are there?
14
Direction Vector
Definition
- A direction vector serves the same purpose as a distance vector when less precision is required or available
- Element i of a direction vector is <, >, or = based on whether the source of the dependence precedes, follows, or is in the same iteration as the target in loop i

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-1)+1
      enddo
    enddo

Distance vector = ? (1,1)
Direction vector = ? (<,<)
15
Distance Vectors: Legality
Definition
- A dependence vector, v, is lexicographically nonnegative when the leftmost nonzero entry in v is positive or all elements of v are zero
  Yes: (0,0,0), (0,1), (0,2,-2)
  No: (-1), (0,-2), (0,-1,1)
- A dependence vector is legal when it is lexicographically nonnegative (assuming that indices increase as we iterate)
Why are lexicographically negative distance vectors illegal?
What are legal direction vectors?
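The legality definition amounts to scanning for the first nonzero entry. A minimal Python sketch follows; the function name `is_lex_nonnegative` is an illustrative choice, not something from the slides.

```python
def is_lex_nonnegative(v):
    """True if the first nonzero entry of v is positive, or v is all zeros."""
    for entry in v:
        if entry > 0:
            return True
        if entry < 0:
            return False
    return True  # every entry is zero


legal = [(0, 0, 0), (0, 1), (0, 2, -2)]
illegal = [(-1,), (0, -2), (0, -1, 1)]
print([is_lex_nonnegative(v) for v in legal])
print([is_lex_nonnegative(v) for v in illegal])
```

The two lists are the "Yes" and "No" examples from the slide. A lexicographically negative vector would mean the target iteration runs before its source.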
16
Loop-Carried Dependences
Definition
- A dependence D = (d1,...,dn) is carried at loop level i if di is the first nonzero element of D
Example

    do i = 1,5
      do j = 1,5
        A(i,j) = B(i-1,j)+1
        B(i,j) = A(i,j-1) * 2
      enddo
    enddo

Distance vectors
- (0,1) for accesses to A
- (1,0) for accesses to B
Loop-carried dependences
- The j loop carries the dependence due to A
- The i loop carries the dependence due to B
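The definition of the carrying level translates directly into code. In this hedged sketch, `carried_level` and `parallel_levels` are hypothetical helper names, and levels are numbered from 1 with the outermost loop first. A level that carries no dependence is a candidate for parallel execution.

```python
def carried_level(distance):
    """1-based loop level of the first nonzero entry, or None if loop-independent."""
    for level, entry in enumerate(distance, start=1):
        if entry != 0:
            return level
    return None


def parallel_levels(depth, distances):
    """Loop levels (1..depth) that carry none of the given dependences."""
    carrying = {carried_level(d) for d in distances}
    return [level for level in range(1, depth + 1) if level not in carrying]


# The two distance vectors from the example: one carried by i, one by j.
print(carried_level((1, 0)), carried_level((0, 1)))
print(parallel_levels(2, [(1, 0), (0, 1)]))
```

With both (1,0) and (0,1) present, neither loop level is free of carried dependences, so `parallel_levels` returns an empty list.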
17
Parallelization
Idea
- The iterations of a loop may be executed in parallel if the loop carries no dependences

    do i = 1,5
      do j = 1,5
        A(i,j) = B(i-1,j-1)+1
        B(i,j) = A(i,j-1) * 2
      enddo
    enddo

Parallelize i loop?
18
Parallelization (cont.)
Idea
- The iterations of a loop may be executed in parallel if the loop carries no dependences

    do i = 1,5
      do j = 1,5
        A(i,j) = B(i-1,j-1)+1
        B(i,j) = A(i,j-1) * 2
      enddo
    enddo

Parallelize j loop?
19
Scalar Expansion
Problem
- Loop-carried dependences inhibit parallelism
- Scalar references result in loop-carried dependences

    do i = 1,6
      t = A(i) + B(i)
      C(i) = t + 1/t
    enddo

Can this loop be parallelized?
What kind of dependences are these?
20
Scalar Expansion
Idea
- Eliminate false dependences by introducing extra storage
Example

    do i = 1,6
      T(i) = A(i) + B(i)
      C(i) = T(i) + 1/T(i)
    enddo

Can this loop be parallelized?
Disadvantages?
21
Scalar Expansion Details
Restrictions
- The loop must be a countable loop, i.e., the loop trip count must be independent of the body of the loop
- There cannot be loop-carried flow dependences due to the scalar
- The expanded scalar must have no upward exposed uses in the loop

    do i = 1,6
      print(t)
      t = A(i) + B(i)
      C(i) = t + 1/t
    enddo

- Nested loops may require much more storage
- When the scalar is live after the loop, we must move the correct array value into the scalar
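The scalar expansion example can be written out in Python to check that the expanded loop computes the same values, including the final copy needed when the scalar is live after the loop. This is an illustrative sketch with hypothetical function names; it assumes no element of A plus B is zero, since the loop divides by t.

```python
def with_scalar(A, B):
    """Original loop: every iteration reuses the single scalar t."""
    C = [0.0] * len(A)
    t = None
    for i in range(len(A)):
        t = A[i] + B[i]
        C[i] = t + 1 / t
    return C, t


def with_expanded_scalar(A, B):
    """Scalar t expanded into array T: iterations no longer share storage."""
    n = len(A)
    T = [0.0] * n
    C = [0.0] * n
    for i in range(n):  # no anti or output dependences on t remain
        T[i] = A[i] + B[i]
        C[i] = T[i] + 1 / T[i]
    # t is live after the loop, so copy the last array value back into it.
    t = T[n - 1] if n else None
    return C, t


A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
B = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(with_scalar(A, B) == with_expanded_scalar(A, B))
```

The extra array T is the storage cost the slide asks about: one element per iteration instead of a single scalar.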
22
Example Revisited
Sample code

    do j = 1,6
      do i = 1,5
        A(j,i) = A(j,i)+1
      enddo
    enddo

    do i = 1,5
      do j = 1,6
        A(j,i) = A(j,i)+1
      enddo
    enddo

Why is this legal?
- No loop-carried dependences, so we can arbitrarily change the order of iteration execution
23
Dependence Testing
Consider the following code…

    do i = 1,5
      A(3*i+2) = A(2*i+1)+1
    enddo

Question
- How do we determine whether one array reference depends on another across iterations of an iteration space?
24
Dependence Testing in General
General code

    do i1 = l1,h1
      ...
      do in = ln,hn
        A(f(i1,...,in)) = ... A(g(i1,...,in))
      enddo
      ...
    enddo

There exists a dependence between iterations I = (i1,...,in) and J = (j1,...,jn) when
- f(I) = g(J)
- (l1,...,ln) ≤ I, J ≤ (h1,...,hn)
25
Multi-dimension Arrays
Integer linear programming

    int A[1..100]
    … A[2*i, 2*j] …
    … A[2*i+3, 3*j-3] …
26
Algorithms for Solving the Dependence Problem
Heuristics
- GCD test (Banerjee76, Towle76): determines whether an integer solution is possible, no bounds checking
- Banerjee test (Banerjee 79): checks real bounds
- I-Test (Kong et al. 90): integer solution in real bounds
- Lambda test (Li et al. 90): all dimensions simultaneously
- Delta test (Goff et al. 91): pattern matches for efficiency
- Power test (Wolfe et al. 92): extended GCD and Fourier-Motzkin combination
Use some form of Fourier-Motzkin elimination for integers
- Parametric Integer Programming (Feautrier91)
- Omega test (Pugh92)
27
Dependence Testing: Simple Case
Sample code

    do i = l,h
      A(a*i+c1) = ... A(a*i+c2)
    enddo

Dependence?
- a*i1 + c1 = a*i2 + c2, or a*i1 - a*i2 = c2 - c1
- A solution exists if a divides c2 - c1
28
Example
Code

    do i = l,h
      A(2*i+2) = A(2*i-2)+1
    enddo

Dependence?
- 2*i1 - 2*i2 = -2 - 2 = -4 (yes, 2 divides -4)
Kind of dependence?
- Anti? i2 + d = i1 ⇒ d = -2
- Flow? i1 + d = i2 ⇒ d = 2
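The simple-case reasoning can be sketched in Python for a loop with known constant bounds. The slides keep l and h symbolic, so the concrete `lo` and `hi` parameters and the function name `simple_dependence` are assumptions made for illustration. Here i1 is the writing iteration, i2 is the reading iteration, and d = i2 - i1.

```python
def simple_dependence(a, c1, c2, lo, hi):
    """Test A(a*i + c1) = ... A(a*i + c2) for i in lo..hi.

    Returns (kind, distance), or None when no dependence exists.
    """
    if a == 0:
        raise ValueError("a must be nonzero for the simple-case test")
    diff = c1 - c2               # a*i1 + c1 = a*i2 + c2  =>  a*(i2 - i1) = c1 - c2
    if diff % a != 0:
        return None              # a does not divide the difference
    d = diff // a                # d = i2 - i1
    if abs(d) > hi - lo:
        return None              # iterations that far apart never both execute
    if d > 0:
        return ("flow", d)       # write happens d iterations before the read
    if d < 0:
        return ("anti", -d)      # read happens -d iterations before the write
    return ("loop-independent", 0)


# A(2*i+2) = A(2*i-2)+1 with assumed bounds 1..10
print(simple_dependence(2, 2, -2, 1, 10))
```

For the example above, with the assumed bounds 1..10, this reports a flow dependence with distance 2, in agreement with the slide.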
29
GCD Test
Idea
- Generalize the test to linear functions of iterators
Code

    do i = li,hi
      do j = lj,hj
        A(a1*i + a2*j + a0) = ... A(b1*i + b2*j + b0) ...
      enddo
    enddo

Again
- a1*i1 - b1*i2 + a2*j1 - b2*j2 = b0 - a0
- A solution exists if gcd(a1,a2,b1,b2) divides b0 - a0
30
Example
Code

    do i = li,hi
      do j = lj,hj
        A(4*i + 2*j + 1) = ... A(6*i + 2*j + 4) ...
      enddo
    enddo
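The GCD test takes only a few lines of Python. In this sketch, `gcd_test` and its parameter names are hypothetical. As the algorithm overview notes, the test ignores loop bounds, so a `True` result means only that a dependence cannot be ruled out.

```python
from functools import reduce
from math import gcd


def gcd_test(write_coeffs, a0, read_coeffs, b0):
    """GCD test for A(sum(a_k * i_k) + a0) = ... A(sum(b_k * i_k) + b0).

    Returns False when no integer solution exists (no dependence),
    True when an integer solution may exist (bounds are not checked).
    """
    coeffs = [abs(c) for c in (*write_coeffs, *read_coeffs)]
    g = reduce(gcd, coeffs, 0)
    if g == 0:                  # all coefficients are zero: constant subscripts
        return a0 == b0
    return (b0 - a0) % g == 0


# A(4*i + 2*j + 1) = ... A(6*i + 2*j + 4)
print(gcd_test((4, 2), 1, (6, 2), 4))
```

Here gcd(4, 2, 6, 2) = 2 and b0 - a0 = 3, which 2 does not divide, so the test reports that the two references can never touch the same element.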
31
Till Now
Improve performance by...
- improving data locality
- parallelizing the computation
Data dependences
- iteration space
- distance vectors and direction vectors
- loop-carried dependences
Transformation legality
- must respect data dependences
- scalar expansion as a technique to remove anti and output dependences
Data dependence testing
- general formulation of the problem
- GCD test
32
Affine Array Indexes
An array access in a loop is affine if
- The bounds of the loop are expressed as affine expressions of the surrounding loop variables and symbolic constants
- The index for each dimension of the array is also an affine expression of the surrounding loop variables and symbolic constants
Examples
- X[i-1]
- X[i, j+1]
- X[1, i, 2*i+j]
- X[i*j]: not an affine array access
33
Nonaffine Accesses in Practice Sparse matrices X[Y[i]]
34
Loop Permutation
Idea
- Swap the order of two loops to increase parallelism, to improve spatial locality, or to enable other transformations
- Also known as loop interchange

    do i = 1,5
      do j = 1,5
        x = A(2,j) + 1
      enddo
    enddo

The access strides through a row of A.

    do j = 1,5
      do i = 1,5
        x = A(2,j) + 1
      enddo
    enddo

The access is invariant with respect to the inner loop.
35
More examples
A(i,j)

    do i = 1,5
      do j = 1,5
        x = A(i,j) + 1
      enddo
    enddo

Stride n access

    do j = 1,5
      do i = 1,5
        x = A(i,j) + 1
      enddo
    enddo

Stride 1 access
36
Dependency Problem
What is the distance or direction vector of the dependences?
- May require an exponential number of calls to a dependence testing algorithm that only returns yes/no
- Input: IP problem
- Output: distance or direction vector for dependences
- Example outputs: (1,0), (<,>), (>,=), (0,3)
- Which one of the above dependence vectors is not legal?
What is the dependence relation?
- A mapping from one iteration space to another
- Input: Presburger formula (i.e., affine constraints, existential and universal quantifiers, logical operators)
- Output: simplified Presburger formula representing the dependence relation
- Example input: { [i,j] → [i’,j’] | 1 <= i,j,i’,j’<=10 & i=i’-1 & j=j’ & i<i’ & j<j’ }
- Example output: { [i,j] → [i+1,j] | 1 <= i,j <= 10 }
37
Legality of Loop Interchange
Case analysis of the direction vectors
- (=,=): The dependence is loop independent, so it is unaffected by interchange.
- (=,<): The dependence is carried by the j loop. After interchange the dependence will be (<,=), so it is still carried by the j loop and the dependence relations do not change.
- (<,=): The dependence is carried by the i loop. After interchange the dependence will be (=,<), so it is still carried by the i loop and the dependence relations do not change.
38
Legality of Loop Interchange (cont.)
More cases
- (<,<): The dependence distance is positive in both dimensions. After interchange it will still be positive in both dimensions, so the dependence relations do not change.
- (<,>): The dependence is carried by the outer loop. After interchange the dependence will be (>,<), which changes the dependences and results in an illegal direction vector, so interchange is illegal.
- (>,*) and (=,>): Such direction vectors are not possible for the original loop.
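The case analysis can be replayed mechanically: swap the two entries of each direction vector and check whether the first entry that is not "=" is still "<". The helper names in this Python sketch are hypothetical and it covers only two-deep loop nests.

```python
def interchange(direction):
    """Swap the outer and inner entries of a two-element direction vector."""
    outer, inner = direction
    return (inner, outer)


def is_legal_direction(direction):
    """Legal when the first entry that is not '=' is '<' (or all are '=')."""
    for entry in direction:
        if entry == "<":
            return True
        if entry == ">":
            return False
    return True


for d in [("=", "="), ("=", "<"), ("<", "="), ("<", "<"), ("<", ">")]:
    print(d, "->", interchange(d), is_legal_direction(interchange(d)))
```

Only the vector with "<" in the outer position and ">" in the inner position becomes illegal after the swap, which is the one case where interchange must be rejected.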
39
Loop Interchange Example
Consider the (<,>) case
40
Frameworks for Loop Transformations
Unimodular loop transformations [Banerjee 90], [Wolf & Lam 91]
- Cover loop permutation, loop reversal, and loop skewing
- Idea: T i = i’, where T is a matrix and i and i’ are iteration vectors
- A transformation is legal if the transformed dependence vectors remain lexicographically positive
Limitations
- Only perfectly nested loops
- All statements are transformed the same way
41
Revisit the Legality of Loop Interchange
Interchange matrix
(=,=) (=,<) (<,>)
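In the unimodular framework, legality reduces to a matrix-vector product followed by a lexicographic check. The sketch below assumes the standard 2x2 matrices for interchange, inner-loop reversal, and inner-loop skewing by a factor of 1. The constant and function names are illustrative, and only loop-carried distance vectors should be passed in.

```python
INTERCHANGE = ((0, 1), (1, 0))      # swaps the two loops
REVERSE_INNER = ((1, 0), (0, -1))   # negates the inner-loop entry
SKEW_INNER = ((1, 0), (1, 1))       # adds the outer entry to the inner entry


def transform(T, d):
    """Apply matrix T to distance vector d."""
    return tuple(sum(t * x for t, x in zip(row, d)) for row in T)


def is_lex_positive(v):
    """True if the first nonzero entry of v is positive."""
    for entry in v:
        if entry != 0:
            return entry > 0
    return False


def is_legal(T, distances):
    """Legal if every transformed distance vector stays lexicographically positive."""
    return all(is_lex_positive(transform(T, d)) for d in distances)


print(is_legal(INTERCHANGE, [(1, 2)]))    # (1,2) becomes (2,1)
print(is_legal(INTERCHANGE, [(1, -1)]))   # (1,-1) becomes (-1,1)
print(is_legal(REVERSE_INNER, [(1, 1)]))  # (1,1) becomes (1,-1)
print(transform(SKEW_INNER, (1, -1)))     # skewing turns (1,-1) into (1,0)
```

The last line hints at why skewing is useful: after skewing, the vector (1,-1) that blocked interchange becomes (1,0), which interchange maps to the legal vector (0,1).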
42
Loop Reversal
Idea
- Change the direction of loop iteration (i.e., from low-to-high indices to high-to-low indices or vice versa)
Benefits
- Improved cache performance
- Enables other transformations (coming soon)
Example
43
Loop Reversal and Distance Vectors
Impact
- Reversal of loop i negates the i-th entry of all distance vectors associated with the loop
- What about direction vectors?
When is reversal legal?
- When the loop being reversed does not carry a dependence (i.e., when the transformed distance vectors remain legal)
Example

    do i = 1,5
      do j = 1,6
        A(i,j) = A(i-1,j-1)+1
      enddo
    enddo

Dependence
- Distance vector: (1,1)
- Transformed distance vector (after reversing the j loop): (1,-1). Legal?
44
Transforming the Dependences and Array Accesses
45
Loop Reversal Example Legality Loop reversal will change the direction of the dependence relation Is the following legal?
46
Loop Skewing
47
Transforming the Loop Bounds
48
Loop Fusion
Idea
- Combine multiple loop nests into one
Example
Pros
- May improve data locality
- Reduces loop overhead
- Enables array contraction (opposite of scalar expansion)
- May enable better instruction scheduling
Cons
- May hurt data locality
- May hurt icache performance
49
Legality of Loop Fusion
Basic conditions
- Both loops must have the same structure: same loop depth, same loop bounds, same iteration directions
- Dependences must be preserved, e.g., flow dependences must not become anti-dependences
50
Loop Fusion Example What are the dependences? Is there some transformation that will enable fusion of these loops?
51
Loop Fusion Example (cont.)
- Loop reversal is legal for the original loops: it does not change the direction of any dependence in the original code
- Reversing the direction in the fused loop: s3 δ^a s2 will become s2 δ^f s3
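The slide's own code for this example did not survive in the transcript, so the following Python sketch uses a hypothetical pair of loops chosen to show the same effect. Fusing them directly turns a flow dependence on A into an anti-dependence, while reversing the iteration direction first makes the fused loop compute the original result.

```python
def separate(B):
    """Two loops: the second reads A[i+1], written by the first loop."""
    n = len(B)
    A = [0] * (n + 1)
    C = [0] * n
    for i in range(n):
        A[i] = B[i] + 1
    for i in range(n):
        C[i] = A[i + 1] * 2
    return C


def fused_forward(B):
    """Naive fusion: A[i+1] is read before iteration i+1 writes it (illegal)."""
    n = len(B)
    A = [0] * (n + 1)
    C = [0] * n
    for i in range(n):
        A[i] = B[i] + 1
        C[i] = A[i + 1] * 2
    return C


def fused_reversed(B):
    """Fusion after reversal: A[i+1] was written by the previous iteration."""
    n = len(B)
    A = [0] * (n + 1)
    C = [0] * n
    for i in range(n - 1, -1, -1):
        A[i] = B[i] + 1
        C[i] = A[i + 1] * 2
    return C


B = [1, 2, 3]
print(separate(B), fused_forward(B), fused_reversed(B))
```

On this input the forward-fused loop produces different values from the separate loops, while the reversed fused loop matches them.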
52
Loop Distribution
Idea
- Split a loop nest into multiple loop nests (the inverse of fusion)
Motivation?
- Produces multiple (potentially) less constrained loops
- May improve locality
- Enables other transformations, such as interchange
53
Legality Loop distribution is legal when the loop body contains no cycles in the dependence graph
54
Example Reverse of our previous example
55
Example If there are no cycles, we can reorder the loops with a topological sort
56
Loop Unrolling
Motivation
- Reduces loop overhead
- Improves the effectiveness of other transformations, such as code scheduling and CSE
The transformation
- Make n copies of the loop body, where n is the unrolling factor
- Adjust loop bounds accordingly
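As an illustration of the transformation, here is a hand-unrolled summation loop in Python with an unrolling factor of 4. The function name and the choice of factor are assumptions. The cleanup loop handles trip counts that are not a multiple of the factor, which is what adjusting the loop bounds amounts to in practice.

```python
def sum_unrolled(xs):
    """Sum a list with the loop body unrolled four times."""
    n = len(xs)
    main = n - n % 4                # largest multiple of 4 not exceeding n
    total = 0
    for i in range(0, main, 4):     # one loop test per four elements
        total += xs[i]
        total += xs[i + 1]
        total += xs[i + 2]
        total += xs[i + 3]
    for i in range(main, n):        # cleanup loop for the leftover iterations
        total += xs[i]
    return total


print(sum_unrolled(list(range(10))) == sum(range(10)))
```

In a compiled language the benefit comes from fewer branch and counter updates and a longer straight-line body for the scheduler. The Python version only shows the structure.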
57
Loop Balance Problem We’d like to produce loops with the right balance of memory operations and floating point operations The ideal balance is machine-dependent e.g. How many load-store units are connected to the L1 cache? e.g. How many functional units are provided?
58
Unroll and Jam
Idea
- Restructure loops so that loaded values are used many times per iteration
Unroll and jam
- Unroll the outer loop some number of times
- Fuse (jam) the resulting inner loops
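A matrix-vector product is a common way to illustrate the two steps. It is a hypothetical example chosen here, not the one from the slides. The outer loop is unrolled by 2 and the two resulting inner loops are jammed, so each loaded `x[j]` is used twice per inner iteration.

```python
def matvec(A, x):
    """Reference loop nest: y[i] = sum over j of A[i][j] * x[j]."""
    y = [0] * len(A)
    for i in range(len(A)):
        for j in range(len(x)):
            y[i] += A[i][j] * x[j]
    return y


def matvec_unroll_and_jam(A, x):
    """Outer loop unrolled by 2, inner loops fused into one."""
    n, m = len(A), len(x)
    y = [0] * n
    main = n - n % 2
    for i in range(0, main, 2):
        for j in range(m):
            xj = x[j]               # loaded once, used for two rows
            y[i] += A[i][j] * xj
            y[i + 1] += A[i + 1][j] * xj
    for i in range(main, n):        # leftover row when n is odd
        for j in range(m):
            y[i] += A[i][j] * x[j]
    return y


A = [[1, 2], [3, 4], [5, 6]]
x = [1, 1]
print(matvec(A, x) == matvec_unroll_and_jam(A, x))
```

The jam step is a loop fusion, so it is subject to the same legality conditions as fusion. Here the rows are independent, so it is safe.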
59
Example
60
Unroll and Jam IS Tiling
61
Discussion
The problem is hard
- Just finding a legal unimodular transformation is exponential in the number of loops
Heuristic
- Perform reuse analysis to determine the innermost tile (i.e., the localized vector space)
- For the localized vector space, break the problem into all possible tiling combinations
- Apply the S(kew)R(eversal)P(ermutation) algorithm in an attempt to make loops fully permutable
  - Definitely works when dependences are lexicographically positive distance vectors
  - O(n^2 * d), where n is the loop nest depth and d is the number of dependence vectors