Dependence Analysis and Loops CS 3220 Spring 2016.


Loop Examples
- Loop permutation for improved locality

    do j = 1,6
      do i = 1,5
        A(j,i) = A(j,i)+1
      enddo
    enddo

    do i = 1,5
      do j = 1,6
        A(j,i) = A(j,i)+1
      enddo
    enddo

  In Fortran's column-major layout, the second nest runs the first subscript (j) in the inner loop, so consecutive iterations touch adjacent memory locations.
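To make the locality argument concrete, here is a small Python sketch (not from the slides) that lists the memory offsets each nest visits. It assumes Fortran column-major storage for a 6x5 array A(1:6,1:5); the helper names `offset`, `trace`, and `inner_strides` are hypothetical.

```python
# Sketch: offsets visited by the two loop orders of A(j,i) = A(j,i)+1,
# assuming column-major storage of a 6x5 array (rows j, columns i).

NROWS, NCOLS = 6, 5

def offset(j, i):
    """Column-major offset of A(j,i) with 1-based indices."""
    return (i - 1) * NROWS + (j - 1)

def trace(outer_is_j):
    """Offsets touched, in execution order, for one loop ordering."""
    offsets = []
    if outer_is_j:
        for j in range(1, NROWS + 1):          # do j = 1,6
            for i in range(1, NCOLS + 1):      #   do i = 1,5
                offsets.append(offset(j, i))
    else:
        for i in range(1, NCOLS + 1):          # do i = 1,5
            for j in range(1, NROWS + 1):      #   do j = 1,6
                offsets.append(offset(j, i))
    return offsets

def inner_strides(offsets):
    """Distinct distances between consecutive accesses."""
    return {b - a for a, b in zip(offsets, offsets[1:])}

print(sorted(inner_strides(trace(outer_is_j=True))))   # [-23, 6]
print(sorted(inner_strides(trace(outer_is_j=False))))  # [1]
```

With j outer, every inner step jumps a whole column (6 elements). With i outer, every step is stride 1.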

Loop Examples
- Parallelization

    do i = 1,100
      A(i) = A(i)+1
    enddo

    do i = 1,100
      A(i) = A(i-1)+1
    enddo
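One informal way to see the difference between the two loops is to run each with its iterations in forward and in reverse order. A loop-carried dependence shows up as order-dependent results. This Python sketch assumes A starts as all zeros, and the names `loop1`/`loop2` are hypothetical.

```python
# Sketch: execute each loop body in two different iteration orders.
N = 100

def loop1(order):
    A = [0] * (N + 1)               # A(0..100), initially zero
    for i in order:
        A[i] = A[i] + 1             # A(i) = A(i)+1
    return A

def loop2(order):
    A = [0] * (N + 1)
    for i in order:
        A[i] = A[i - 1] + 1         # A(i) = A(i-1)+1
    return A

forward = range(1, N + 1)
backward = range(N, 0, -1)

print(loop1(forward) == loop1(backward))  # True: same result in either order
print(loop2(forward) == loop2(backward))  # False: iteration i needs A(i-1) from iteration i-1
```

Same results in two orders do not prove independence in general. Different results, as for the second loop, do show that iteration order matters.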

Data Dependences and Loops
- How do we identify dependences in loops?
- Simple view
  - Imagine that all loops are fully unrolled
  - Examine data dependences as before
- Problems
  - Impractical
  - Lose loop structure

    do i = 1,5
      A(i) = A(i-1)+1
    enddo

Data Dependence
- Definition: data dependences are constraints on the order in which statements may be executed
- Types of dependences
  - Flow (true) dependence: s1 writes memory that s2 later reads (RAW)
  - Anti-dependence: s1 reads memory that s2 later writes (WAR)
  - Output dependence: s1 writes memory that s2 later writes (WAW)
  - Input dependence: s1 reads memory that s2 later reads (RAR)
- Notation: s1 δ s2
  - s1 is called the source of the dependence
  - s2 is called the sink or target
  - s1 must be executed before s2

Example

    s1  a = b;
    s2  b = c + d;
    s3  e = a + d;
    s4  b = 3;
    s5  f = b * 2;
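The four dependence kinds can be found mechanically from each statement's write (def) and read (use) sets. The Python sketch below does this for s1 to s5. One assumption is not stated on the slides: a dependence from si to sj on a variable is reported only if no statement strictly between them writes that variable, so an intervening write kills it. The names `STMTS`, `KIND`, and `dependences` are hypothetical.

```python
# Sketch: enumerate dependences among straight-line statements.
STMTS = {                    # statement -> (defs, uses)
    1: ({"a"}, {"b"}),       # s1: a = b
    2: ({"b"}, {"c", "d"}),  # s2: b = c + d
    3: ({"e"}, {"a", "d"}),  # s3: e = a + d
    4: ({"b"}, set()),       # s4: b = 3
    5: ({"f"}, {"b"}),       # s5: f = b * 2
}

KIND = {  # (source access, sink access) -> dependence kind
    ("w", "r"): "flow",
    ("r", "w"): "anti",
    ("w", "w"): "output",
    ("r", "r"): "input",
}

def accesses(s):
    defs, uses = STMTS[s]
    return [(v, "w") for v in defs] + [(v, "r") for v in uses]

def dependences():
    deps = []
    order = sorted(STMTS)
    for pos, src in enumerate(order):
        for snk in order[pos + 1:]:
            between = [s for s in order if src < s < snk]
            for v, k1 in accesses(src):
                for w, k2 in accesses(snk):
                    # an intervening write to v kills the dependence (assumption)
                    killed = any(v in STMTS[s][0] for s in between)
                    if v == w and not killed:
                        deps.append((src, snk, KIND[(k1, k2)], v))
    return deps

for src, snk, kind, v in dependences():
    print(f"s{src} -> s{snk}: {kind} on {v}")
```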

Dependences and Loops
- Loop-independent dependences: dependences within the same loop iteration

    do i = 1,100
      A(i) = B(i) + 1
      C(i) = A(i) * 2
    enddo

- Loop-carried dependences: dependences that cross loop iterations

    do i = 1,100
      A(i) = B(i) + 1
      C(i) = A(i-1) * 2
    enddo

Concepts
- Iteration space
  - A set of tuples that represent the iterations of a loop
  - Can visualize the dependences in an iteration space

    do j = 1,6
      do i = 1,5
        A(j,i) = A(j-1,i-1)+1
      enddo
    enddo

Protein String Matching Example

Distance Vectors
- Idea
  - Concisely describe dependence relationships between iterations of an iteration space
  - For each dimension of an iteration space, the distance is the number of iterations between accesses to the same memory location
- Definition
  - v = i_T - i_S (target iteration minus source iteration)

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-2)+1
      enddo
    enddo

  Distance vector: (1,2)

More Examples
- Sample code

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

  Distance vector: ?
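For small bounds, the distance vectors of nests like the two above can be found by brute force. The Python sketch below assumes the only write is A(i,j), the only read is A(i+di, j+dj), and the nest is do i = 1,6 / do j = 1,5. The earlier of the two iterations is the source, so the distance is always later minus earlier. The function name `distance_vectors` is hypothetical.

```python
from itertools import product

def distance_vectors(di, dj, ni=6, nj=5):
    """Distance vectors between the write A(i,j) and the read A(i+di,j+dj)
    in the nest do i = 1,ni / do j = 1,nj. Returns a set of (vector, kind)."""
    iters = list(product(range(1, ni + 1), range(1, nj + 1)))
    space = set(iters)
    found = set()
    for reader in iters:
        # A(i,j) is written only by iteration (i,j), so the element read by
        # `reader` is written by iteration w.
        w = (reader[0] + di, reader[1] + dj)
        if w not in space or w == reader:   # outside the nest, or same iteration
            continue
        if w < reader:                      # tuples compare lexicographically
            found.add(((reader[0] - w[0], reader[1] - w[1]), "flow"))
        else:
            found.add(((w[0] - reader[0], w[1] - reader[1]), "anti"))
    return found

print(distance_vectors(-1, -2))   # A(i,j) = A(i-1,j-2)+1 -> {((1, 2), 'flow')}
```

Enumerating every iteration does not scale. That is the motivation for the dependence tests that follow.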

Distance Vectors and Loop Transformations
- Any transformation we perform on the loop must respect the dependences
- Example: can we permute the i and j loops?

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-2)+1
      enddo
    enddo

    do j = 1,5
      do i = 1,6
        A(i,j) = A(i-1,j-2)+1
      enddo
    enddo

Exercise

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

- Iteration space?
- Distance vector?
- What if we exchange the order of the i and j loops?
- What kind of dependence is it?

Direction Vector
- Definition
  - A direction vector serves the same purpose as a distance vector when less precision is required or available
  - Element i of a direction vector is <, >, or = based on whether the source of the dependence precedes, follows, or is in the same iteration as the target in loop i

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-1)+1
      enddo
    enddo

  Distance vector = (1,1)
  Direction vector = (<,<)

Distance Vectors: Legality
- Definition
  - A dependence vector v is lexicographically nonnegative when the leftmost nonzero entry in v is positive or all elements of v are zero
    Yes: (0,0,0), (0,1), (0,2,-2)
    No: (-1), (0,-2), (0,-1,1)
  - A dependence vector is legal when it is lexicographically nonnegative (assuming that indices increase as we iterate)
- Why are lexicographically negative distance vectors illegal?
- What are legal direction vectors?
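The legality definition is easy to state as code. The Python sketch below treats a distance entry and a direction symbol uniformly by mapping both to a sign, where '<' means the source precedes the target, i.e., a positive distance. It also converts distance vectors to direction vectors. The names `sign`, `direction`, and `lex_nonnegative` are hypothetical.

```python
def sign(x):
    """Map a distance entry (int) or a direction symbol to -1, 0 or +1."""
    if isinstance(x, str):
        return {"<": 1, "=": 0, ">": -1}[x]
    return (x > 0) - (x < 0)

def direction(distance):
    """Direction vector of a distance vector, e.g. (1, 1) -> ('<', '<')."""
    return tuple({1: "<", 0: "=", -1: ">"}[sign(d)] for d in distance)

def lex_nonnegative(v):
    """True if the leftmost nonzero entry is positive, or all entries are zero."""
    for x in v:
        s = sign(x)
        if s != 0:
            return s > 0
    return True

for v in [(0, 0, 0), (0, 1), (0, 2, -2), (-1,), (0, -2), (0, -1, 1)]:
    print(v, lex_nonnegative(v))   # True for the first three, False for the rest
print(direction((1, 1)))           # ('<', '<')
```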

Loop-Carried Dependences
- Definition
  - A dependence D = (d1,...,dn) is carried at loop level i if di is the first nonzero element of D
- Example

    do i = 1,5
      do j = 1,5
        A(i,j) = B(i-1,j)+1
        B(i,j) = A(i,j-1) * 2
      enddo
    enddo

  - Distance vectors: (0,1) for accesses to A, (1,0) for accesses to B
  - Loop-carried dependences: the j loop carries the dependence due to A; the i loop carries the dependence due to B

Parallelization
- Idea
  - The iterations of a loop may be executed in parallel if the loop carries no dependences

    do i = 1,5
      do j = 1,5
        A(i,j) = B(i-1,j-1)+1
        B(i,j) = A(i,j-1) * 2
      enddo
    enddo

  Parallelize i loop?

Parallelization (cont.)
- Idea
  - The iterations of a loop may be executed in parallel if the loop carries no dependences

    do i = 1,5
      do j = 1,5
        A(i,j) = B(i-1,j-1)+1
        B(i,j) = A(i,j-1) * 2
      enddo
    enddo

  Parallelize j loop?
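The "carried at level i" definition and the parallelization rule combine into two short functions. In this Python sketch, a loop level may run in parallel when none of the distance vectors is carried at that level, with the enclosing loops assumed to stay sequential. The distance vectors used are derived from the nests above. In the Loop-Carried Dependences example, A gives (0,1) and B gives (1,0). In the Parallelization nest, A(i,j) read as A(i,j-1) gives (0,1), and B(i,j) read as B(i-1,j-1) gives (1,1). The names `carried_level` and `parallelizable` are hypothetical.

```python
def carried_level(d):
    """1-based loop level that carries distance vector d, or None."""
    for level, x in enumerate(d, start=1):
        if x != 0:
            return level
    return None  # all zeros: loop-independent dependence

def parallelizable(level, deps):
    """Loop `level` can run its iterations in parallel if it carries none of deps."""
    return all(carried_level(d) != level for d in deps)

# Loop-Carried Dependences example: A -> (0,1), B -> (1,0)
print([carried_level(d) for d in [(0, 1), (1, 0)]])   # [2, 1]

# Parallelization example: A -> (0,1), B -> (1,1)
deps = [(0, 1), (1, 1)]
print(parallelizable(1, deps), parallelizable(2, deps))  # False False
```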

Scalar Expansion
- Problem
  - Loop-carried dependences inhibit parallelism
  - Scalar references result in loop-carried dependences
- Can this loop be parallelized?
- What kind of dependences are these?

    do i = 1,6
      t = A(i) + B(i)
      C(i) = t + 1/t
    enddo

Scalar Expansion
- Idea
  - Eliminate false dependences by introducing extra storage
- Example

    do i = 1,6
      T(i) = A(i) + B(i)
      C(i) = T(i) + 1/T(i)
    enddo

- Can this loop be parallelized?
- Disadvantages?
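The effect of scalar expansion can be checked by simulation. This Python sketch compares the loop that uses the scalar t with its expanded form using T(i). It runs the expanded body both forward and in reverse and gets the same C, because each iteration now has its own storage. It also copies the last element of T back into t, as needed when t is live after the loop. The input values are arbitrary and the function names are hypothetical.

```python
# Sketch: scalar t versus expanded array T(i); input data is arbitrary.
A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
B = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
N = len(A)

def original():
    C = [0.0] * N
    t = 0.0
    for i in range(N):           # every iteration reuses the single scalar t
        t = A[i] + B[i]
        C[i] = t + 1 / t
    return C, t

def expanded(order):
    C = [0.0] * N
    T = [0.0] * N                # one element of T per iteration
    for i in order:
        T[i] = A[i] + B[i]
        C[i] = T[i] + 1 / T[i]
    t = T[N - 1]                 # value t would hold after the sequential loop
    return C, t

print(original() == expanded(range(N)))            # True
print(original() == expanded(reversed(range(N))))  # True: order no longer matters
```

The price is the extra array of N elements, which matches the "Disadvantages?" question above.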

Scalar Expansion Details
- Restrictions
  - The loop must be a countable loop, i.e., the loop trip count must be independent of the body of the loop
  - There cannot be loop-carried flow dependences due to the scalar
  - The expanded scalar must have no upward-exposed uses in the loop

    do i = 1,6
      print(t)
      t = A(i) + B(i)
      C(i) = t + 1/t
    enddo

  - Nested loops may require much more storage
  - When the scalar is live after the loop, we must move the correct array value into the scalar

Example Revisited
- Sample code

    do j = 1,6
      do i = 1,5
        A(j,i) = A(j,i)+1
      enddo
    enddo

    do i = 1,5
      do j = 1,6
        A(j,i) = A(j,i)+1
      enddo
    enddo

- Why is this legal?
  - There are no loop-carried dependences, so we can arbitrarily change the order of iteration execution

Dependence Testing
- Consider the following code...

    do i = 1,5
      A(3*i+2) = A(2*i+1)+1
    enddo

- Question
  - How do we determine whether one array reference depends on another across iterations of an iteration space?
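For a loop this small, the question can be answered by brute force: compare every writing iteration with every reading iteration. The Python sketch below does this for A(3*i+2) = A(2*i+1)+1 with i = 1..5. It works only because the bounds are tiny and known, which is exactly why the analytic tests below are needed. The function name `conflicts` is hypothetical.

```python
def conflicts(lo=1, hi=5):
    """All (write iteration, read iteration, kind) that touch the same element."""
    pairs = []
    for iw in range(lo, hi + 1):          # iteration that writes A(3*iw+2)
        for ir in range(lo, hi + 1):      # iteration that reads A(2*ir+1)
            if 3 * iw + 2 == 2 * ir + 1:
                if iw < ir:
                    kind = "flow"         # write happens first
                elif iw > ir:
                    kind = "anti"         # read happens first
                else:
                    kind = "loop-independent"
                pairs.append((iw, ir, kind))
    return pairs

print(conflicts())  # [(1, 2, 'flow'), (3, 5, 'flow')]
```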

Dependence Testing in General
- General code

    do i1 = l1,h1
      ...
        do in = ln,hn
          A(f(i1,...,in)) = ... A(g(i1,...,in))
        enddo
      ...
    enddo

- There exists a dependence between iterations I = (i1,...,in) and J = (j1,...,jn) when
  - f(I) = g(J)
  - (l1,...,ln) <= I, J <= (h1,...,hn) (componentwise)

Multi-dimension Arrays
- Integer linear programming

    int A[1..100]
    ...A[2*i, 2*j]...
    ...A[2*i+3, 3*j-3]...

Algs for Solving the Dependence Problem
- Heuristics
  - GCD test (Banerjee76, Towle76): determines whether an integer solution is possible, no bounds checking
  - Banerjee test (Banerjee79): checks real bounds
  - I-Test (Kong et al. 90): integer solution in real bounds
  - Lambda test (Li et al. 90): all dimensions simultaneously
  - Delta test (Goff et al. 91): pattern matches for efficiency
  - Power test (Wolfe et al. 92): extended GCD and Fourier-Motzkin combination
- Use some form of Fourier-Motzkin elimination for integers
  - Parametric Integer Programming (Feautrier91)
  - Omega test (Pugh92)

Dependence Testing: Simple Case
- Sample code

    do i = l,h
      A(a*i+c1) = ... A(a*i+c2)
    enddo

- Dependence?
  - a*i1 + c1 = a*i2 + c2, or
  - a*i1 - a*i2 = c2 - c1
  - An integer solution exists if a divides c2 - c1 (loop bounds are not checked)

Example
- Code

    do i = l,h
      A(2*i+2) = A(2*i-2)+1
    enddo

- Dependence? 2*i1 - 2*i2 = -2 - 2 = -4 (yes, 2 divides -4)
- Kind of dependence?
  - Anti? i2 + d = i1 ⇒ d = -2 (negative, so not an anti-dependence)
  - Flow? i1 + d = i2 ⇒ d = 2 (flow dependence with distance 2)
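In the simple case, the divisibility check also gives the distance directly. The write in iteration i1 and the read in iteration i2 conflict when a*i1 + c1 = a*i2 + c2, i.e., i2 - i1 = (c1 - c2)/a. The sign of that difference tells flow from anti. This Python sketch follows that reasoning, ignores loop bounds as the slide does, and assumes a is nonzero. The function name `simple_dependence` is hypothetical.

```python
def simple_dependence(a, c1, c2):
    """Write A(a*i+c1), read A(a*i+c2). Returns (kind, distance), or None
    when no integer solution exists. Loop bounds are ignored."""
    if a == 0:
        raise ValueError("a must be nonzero")
    if (c1 - c2) % a != 0:
        return None                  # a does not divide c2 - c1: no dependence
    d = (c1 - c2) // a               # d = i2 - i1
    if d > 0:
        return ("flow", d)           # the write happens d iterations before the read
    if d < 0:
        return ("anti", -d)          # the read happens -d iterations before the write
    return ("loop-independent", 0)

print(simple_dependence(2, 2, -2))   # A(2*i+2) = A(2*i-2)+1 -> ('flow', 2)
```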

GCD Test
- Idea
  - Generalize the test to linear functions of iterators
- Code

    do i = li,hi
      do j = lj,hj
        A(a1*i + a2*j + a0) = ... A(b1*i + b2*j + b0) ...
      enddo
    enddo

- Again
  - a1*i1 - b1*i2 + a2*j1 - b2*j2 = b0 - a0
  - An integer solution exists if gcd(a1,a2,b1,b2) divides b0 - a0 (loop bounds are not checked)

Example
- Code

    do i = li,hi
      do j = lj,hj
        A(4*i + 2*j + 1) = ... A(6*i + 2*j + 4) ...
      enddo
    enddo
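The GCD test fits in a few lines. The Python sketch below takes the coefficients of the dependence equation sum(coeffs[k] * x[k]) = rhs and reports whether an integer solution can exist, without checking loop bounds. For the example above, the write A(4*i + 2*j + 1) in iteration (i1,j1) and the read A(6*i + 2*j + 4) in iteration (i2,j2) give 4*i1 + 2*j1 - 6*i2 - 2*j2 = 3. The function name `gcd_test` is hypothetical.

```python
from functools import reduce
from math import gcd

def gcd_test(coeffs, rhs):
    """False means no integer solution, hence no dependence.
    True means a dependence is possible (bounds are not checked)."""
    g = reduce(gcd, (abs(c) for c in coeffs), 0)
    if g == 0:                  # all coefficients are zero
        return rhs == 0
    return rhs % g == 0

print(gcd_test([4, 2, -6, -2], 3))   # False: gcd is 2, which does not divide 3
```

A True answer is only a "maybe". For instance, the Dependence Testing loop gives 3*i1 - 2*i2 = -1, which passes trivially because the gcd is 1, and the real dependence still has to be confirmed within the bounds.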

Till Now
- Improve performance by...
  - improving data locality
  - parallelizing the computation
- Data dependences
  - iteration space
  - distance vectors and direction vectors
  - loop-carried dependences
- Transformation legality
  - must respect data dependences
  - scalar expansion as a technique to remove anti and output dependences
- Data dependence testing
  - general formulation of the problem
  - GCD test

Affine Array Indexes
- An array access in a loop is affine if
  - The bounds of the loop are expressed as affine expressions of the surrounding loop variables and symbolic constants
  - The index for each dimension of the array is also an affine expression of the surrounding loop variables and symbolic constants
- Examples
  - X[i-1]
  - X[i, j+1]
  - X[1, i, 2*i+j]
  - X[i*j]: not an affine array access

Nonaffine Accesses in Practice
- Sparse matrices
  - X[Y[i]]

Loop Permutation
- Idea
  - Swap the order of two loops to increase parallelism, to improve spatial locality, or to enable other transformations
  - Also known as loop interchange

    do i = 1,5
      do j = 1,5
        x = A(2,j)+1
      enddo
    enddo

  Accessing strides through a row of A

    do j = 1,5
      do i = 1,5
        x = A(2,j)+1
      enddo
    enddo

  A(2,j) is invariant w.r.t. the inner loop

More examples
- A(i,j)

    do i = 1,5
      do j = 1,5
        x = A(i,j)+1
      enddo
    enddo

  Stride n access

    do j = 1,5
      do i = 1,5
        x = A(i,j)+1
      enddo
    enddo

  Stride 1 access

Dependency Problem
- What is the distance or direction vector of the dependences?
  - May require an exponential number of calls to a dependence testing algorithm that only returns yes/no
  - Input: IP problem
  - Output: distance or direction vector for dependences
  - Example outputs: (1,0), ( ), (>,=), (0,3)
    Which one of the above dependence vectors is not legal?
- What is the dependence relation?
  - A mapping from one iteration space to another
  - Input: Presburger formula (i.e., affine constraints, existential and universal quantifiers, logical operators)
  - Output: simplified Presburger formula representing the dependence relation
  - Example input: { [i,j] → [i',j'] | 1 <= i,j,i',j' <= 10 & i = i'-1 & j = j' & i < i' }
  - Example output: { [i,j] → [i+1,j] | 1 <= i <= 9 & 1 <= j <= 10 }

Legality of Loop Interchange
- Case analysis of the direction vectors
  - (=,=) The dependence is loop-independent, so it is unaffected by interchange.
  - (=,<) The dependence is carried by the j loop. After interchange the dependence will be (<,=), so the dependence will still be carried by the j loop, so the dependence relations do not change.
  - (<,=) The dependence is carried by the i loop. After interchange the dependence will be (=,<), so the dependence will still be carried by the i loop, so the dependence relations do not change.

Legality of Loop Interchange (cont.)
- More cases
  - (<,<) The dependence distance is positive in both dimensions. After interchange it will still be positive in both dimensions, so the dependence relations do not change.
  - (<,>) The dependence is carried by the outer loop. After interchange the dependence will be (>,<), which changes the dependences and results in an illegal direction vector, so interchange is illegal.
  - (>,*), (=,>) Such direction vectors are not possible for the original loop.
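The case analysis reduces to one rule: swap the two entries of every direction vector and check that each result is still lexicographically nonnegative. This Python sketch applies that rule to 2-deep nests. It assumes the entries are only '<', '=', or '>', so a '*' wildcard is not handled. The names `lex_legal` and `interchange_legal` are hypothetical.

```python
def lex_legal(dirs):
    """Leftmost non-'=' entry must be '<' (all '=' is also legal)."""
    for d in dirs:
        if d == "<":
            return True
        if d == ">":
            return False
    return True

def interchange_legal(direction_vectors):
    """Interchanging a 2-deep nest swaps the two entries of every vector."""
    return all(lex_legal((dv[1], dv[0])) for dv in direction_vectors)

for dv in [("=", "="), ("=", "<"), ("<", "="), ("<", "<"), ("<", ">")]:
    print(dv, interchange_legal([dv]))   # only ('<', '>') prints False
```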

Loop Interchange Example
- Consider the (<,>) case

Frameworks for Loop Transformations
- Unimodular loop transformations
  - [Banerjee 90], [Wolf & Lam 91]
  - For loop permutation, loop reversal, and loop skewing
  - Idea: T i = i', where T is a matrix and i and i' are iteration vectors
  - The transformation is legal if the transformed dependence vectors remain lexicographically positive
  - Limitations
    - only perfectly nested loops
    - all statements are transformed the same way
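In the unimodular framework, a transformation is a matrix T, and each distance vector d becomes T*d. This Python sketch applies three 2x2 matrices for permutation (interchange), reversal of the j loop, and one common form of skewing (j' = i + j), then checks the transformed vectors. It checks lexicographic nonnegativity rather than strict positivity so that loop-independent zero vectors, which every T leaves at zero, count as legal. The matrix and function names are hypothetical.

```python
def apply(T, d):
    """Matrix-vector product T*d for a distance vector d."""
    return tuple(sum(T[r][c] * d[c] for c in range(len(d))) for r in range(len(T)))

def lex_nonnegative(v):
    for x in v:
        if x != 0:
            return x > 0
    return True

def legal(T, deps):
    """T is legal if every transformed distance vector stays lexicographically nonnegative."""
    return all(lex_nonnegative(apply(T, d)) for d in deps)

INTERCHANGE = ((0, 1), (1, 0))   # (i, j) -> (j, i)
REVERSE_J = ((1, 0), (0, -1))    # (i, j) -> (i, -j)
SKEW_J = ((1, 0), (1, 1))        # (i, j) -> (i, i + j)

print(apply(REVERSE_J, (1, 1)), legal(REVERSE_J, [(1, 1)]))        # (1, -1) True
print(apply(INTERCHANGE, (1, -1)), legal(INTERCHANGE, [(1, -1)]))  # (-1, 1) False
print(apply(SKEW_J, (1, -1)), legal(SKEW_J, [(1, -1)]))            # (1, 0) True
```

In the last line, skewing turns (1,-1), which blocks interchange, into (1,0), which interchange then maps to the legal (0,1).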

Revisit the Legality of Loop Interchange
- Interchange matrix
  - (=,=)
  - (=,<)
  - ( )

Loop Reversal
- Idea
  - Change the direction of loop iteration (i.e., from low-to-high indices to high-to-low indices or vice versa)
- Benefits
  - Improved cache performance
  - Enables other transformations (coming soon)
- Example

Loop Reversal and Distance Vectors
- Impact
  - Reversal of loop i negates the i-th entry of all distance vectors associated with the loop
  - What about direction vectors?
- When is reversal legal?
  - When the loop being reversed does not carry a dependence (i.e., when the transformed distance vectors remain legal)
- Example

    do i = 1,5
      do j = 1,6
        A(i,j) = A(i-1,j-1)+1
      enddo
    enddo

  Dependence: distance vector (1,1)
  Reversing the j loop gives the transformed distance vector (1,-1): legal?

Transforming the Dependences and Array Accesses

Loop Reversal Example
- Legality
  - Loop reversal will change the direction of the dependence relation
- Is the following legal?

Loop Skewing

Transforming the Loop Bounds

Loop Fusion
- Idea
  - Combine multiple loop nests into one
- Example
- Pros
  - May improve data locality
  - Reduces loop overhead
  - Enables array contraction (opposite of scalar expansion)
  - May enable better instruction scheduling
- Cons
  - May hurt data locality
  - May hurt icache performance

Legality of Loop Fusion
- Basic conditions
  - Both loops must have the same structure: same loop depth, same loop bounds, same iteration directions
  - Dependences must be preserved, e.g., flow dependences must not become anti-dependences
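The "flow must not become anti" condition can be shown by running a pair of loops both unfused and fused. In this Python sketch, the first loop computes A(i) = B(i) + 1 and the second computes C(i) = A(i+offset) * 2. A hypothetical `offset` parameter selects which element of A the second loop reads. With offset 0 or -1, the fused loop still reads values that were already written. With offset +1, it reads A(i+1) before it is written, so the flow dependence turns into an anti-dependence and the result changes. Array sizes and input values are arbitrary.

```python
N = 8
B = list(range(N + 2))          # arbitrary input, indices 0..N+1

def unfused(offset):
    A = [0] * (N + 2)
    C = [0] * (N + 2)
    for i in range(1, N + 1):   # first loop: A(i) = B(i) + 1
        A[i] = B[i] + 1
    for i in range(1, N + 1):   # second loop: C(i) = A(i+offset) * 2
        C[i] = A[i + offset] * 2
    return C

def fused(offset):
    A = [0] * (N + 2)
    C = [0] * (N + 2)
    for i in range(1, N + 1):   # one loop containing both bodies
        A[i] = B[i] + 1
        C[i] = A[i + offset] * 2
    return C

for off in (0, -1, 1):
    print(off, unfused(off) == fused(off))   # True, True, False
```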

Loop Fusion Example
- What are the dependences?
- Is there some transformation that will enable fusion of these loops?

Loop Fusion Example (cont)
- Loop reversal is legal for the original loops
  - Does not change the direction of any dependence in the original code
  - Reverses the direction in the fused loop: s3 δa s2 will become s2 δf s3

Loop Distribution
- Idea
  - Split a loop nest into multiple loop nests (the inverse of fusion)
- Motivation?
  - Produces multiple (potentially) less constrained loops
  - May improve locality
  - Enables other transformations, such as interchange

Legality
- Loop distribution is legal when the loop body contains no cycles in the dependence graph

Example
- Reverse of our previous example

Example
- If there are no cycles, we can reorder the loops with a topological sort
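The topological sort can be done with Kahn's algorithm over the statement-level dependence graph. Edges should include both loop-independent and loop-carried dependences. This Python sketch follows the all-or-nothing rule stated above and returns None when the graph has a cycle. It does not implement the finer-grained approach of keeping each cycle together in one loop. The function name `distribution_order` and the example edges are hypothetical.

```python
from collections import deque

def distribution_order(stmts, edges):
    """Order in which the distributed loops can run, or None if the
    dependence graph has a cycle. edges are (source, sink) pairs."""
    indeg = {s: 0 for s in stmts}
    succ = {s: [] for s in stmts}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    ready = deque(s for s in stmts if indeg[s] == 0)
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for t in succ[s]:
            indeg[t] -= 1
            if indeg[t] == 0:
                ready.append(t)
    return order if len(order) == len(stmts) else None

print(distribution_order(["s1", "s2", "s3"], [("s3", "s1"), ("s1", "s2")]))  # ['s3', 's1', 's2']
print(distribution_order(["s1", "s2"], [("s1", "s2"), ("s2", "s1")]))        # None
```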

Loop Unrolling
- Motivation
  - Reduces loop overhead
  - Improves effectiveness of other transformations
    - Code scheduling
    - CSE
- The transformation
  - Make n copies of the loop body: n is the unrolling factor
  - Adjust loop bounds accordingly
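The shape of the transformation is easiest to see on a reduction. This Python sketch unrolls a summation loop by a factor of 4 and adds a cleanup loop for the leftover iterations when the trip count is not a multiple of 4. In an interpreter this shows only the structure, not the overhead savings a compiler would get. The function names are hypothetical.

```python
def sum_rolled(x):
    s = 0
    for i in range(len(x)):
        s += x[i]
    return s

def sum_unrolled4(x):
    n = len(x)
    s = 0
    i = 0
    while i + 3 < n:        # main loop: 4 copies of the body per trip
        s += x[i]
        s += x[i + 1]
        s += x[i + 2]
        s += x[i + 3]
        i += 4              # adjusted bound and step
    while i < n:            # cleanup loop for the n mod 4 leftover iterations
        s += x[i]
        i += 1
    return s

print(sum_unrolled4(list(range(10))) == sum_rolled(list(range(10))))  # True
```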

Loop Balance
- Problem
  - We'd like to produce loops with the right balance of memory operations and floating-point operations
  - The ideal balance is machine-dependent
    - e.g., how many load-store units are connected to the L1 cache?
    - e.g., how many functional units are provided?

Unroll and Jam
- Idea
  - Restructure loops so that loaded values are used many times per iteration
- Unroll and jam
  - Unroll the outer loop some number of times
  - Fuse (jam) the resulting inner loops
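As an illustration of unroll and jam, here is a Python sketch on matrix multiplication, a kernel chosen here for illustration rather than taken from the slides. The outer i loop is unrolled by 2, and the two copies of the j/k nest are fused, so each B[k][j] is read once and used for two rows of C. The transformation is legal here because iterations of i write disjoint rows of C. A cleanup row handles an odd number of rows. The function names are hypothetical.

```python
def matmul_naive(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_unroll_jam2(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    i = 0
    while i + 1 < n:                  # outer loop unrolled by 2
        for j in range(p):            # the two inner nests, jammed into one
            for k in range(m):
                b = B[k][j]           # read once, used for rows i and i+1
                C[i][j] += A[i][k] * b
                C[i + 1][j] += A[i + 1][k] * b
        i += 2
    if i < n:                         # leftover row when n is odd
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4], [5, 6]]
B = [[1, 0, 2, 1], [0, 1, 1, 3]]
print(matmul_unroll_jam2(A, B) == matmul_naive(A, B))  # True
```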

Example

Unroll and Jam IS Tiling

Discussion
- The problem is hard
  - Just finding a legal unimodular transformation is exponential in the number of loops
- Heuristic
  - Perform reuse analysis to determine the innermost tile (i.e., the localized vector space)
  - For the localized vector space, break the problem into all possible tiling combinations
  - Apply the S(kew)R(eversal)P(ermutation) algorithm in an attempt to make the loops fully permutable
    - Definitely works when dependences are lexicographically positive distance vectors
    - O(n^2 * d), where n is the loop nest depth and d is the number of dependence vectors
