Stanford University CS243 Winter 2006 Wei Li 1 Data Dependences and Parallelization.

Slide 2: Agenda
- Introduction
- Single Loop
- Nested Loops
- Data Dependence Analysis

Slide 3: Motivation
- DOALL loops: loops whose iterations can execute in parallel
- A new abstraction is needed
  - The abstraction used in data-flow analysis is inadequate: it combines information from all instances of a statement

    for i = 11, 20
      a[i] = a[i] + 3

Slide 4: Examples

    for i = 11, 20
      a[i] = a[i] + 3
    Parallel

    for i = 11, 20
      a[i] = a[i-1] + 3
    Parallel?

Slide 5: Examples

    for i = 11, 20
      a[i] = a[i] + 3
    Parallel

    for i = 11, 20
      a[i] = a[i-1] + 3
    Not parallel

    for i = 11, 20
      a[i] = a[i-10] + 3
    Parallel?
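For concrete bounds, these answers can be checked mechanically: enumerate pairs of distinct iterations and test whether one iteration writes an element that another iteration touches. The helper below (`loop_carried` is our own name, not from the slides) does exactly that for loops of the shape `a[f(i)] = a[g(i)] + 3`:

```python
def loop_carried(lo, hi, write_idx, read_idx):
    """True iff the loop `for i = lo, hi: a[write_idx(i)] = a[read_idx(i)] + 3`
    has a loop-carried dependence: two different iterations touch the same
    element, at least one of the accesses being a write."""
    for i in range(lo, hi + 1):
        for j in range(lo, hi + 1):
            if i == j:
                continue
            # write in iteration i vs. read or write in iteration j
            if write_idx(i) in (read_idx(j), write_idx(j)):
                return True
    return False

print(loop_carried(11, 20, lambda i: i, lambda i: i))       # False: a[i] = a[i] + 3
print(loop_carried(11, 20, lambda i: i, lambda i: i - 1))   # True:  a[i] = a[i-1] + 3
print(loop_carried(11, 20, lambda i: i, lambda i: i - 10))  # False: a[i] = a[i-10] + 3
```

The third loop is parallel because it writes a[11..20] but reads only a[1..10]: the two sets never overlap.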

Slide 6: Agenda
- Introduction
- Single Loop
- Nested Loops
- Data Dependence Analysis

Slide 7: Data Dependence of Scalar Variables
- True dependence:   a = ...  then  ... = a
- Anti-dependence:   ... = a  then  a = ...
- Output dependence: a = ...  then  a = ...
- Input dependence:  ... = a  then  ... = a

Slide 8: Array Accesses in a Loop

    for i = 2, 5
      a[i] = a[i] + 3

Each iteration reads and writes the same element: a[2], a[3], a[4], a[5].

Slide 9: Array Anti-dependence

    for i = 2, 5
      a[i-2] = a[i] + 3

    i = 2: read a[2], write a[0]
    i = 3: read a[3], write a[1]
    i = 4: read a[4], write a[2]
    i = 5: read a[5], write a[3]

a[2] and a[3] are each read in an early iteration and written in a later one: an anti-dependence.

Slide 10: Array True-dependence

    for i = 2, 5
      a[i] = a[i-2] + 3

    i = 2: read a[0], write a[2]
    i = 3: read a[1], write a[3]
    i = 4: read a[2], write a[4]
    i = 5: read a[3], write a[5]

a[2] and a[3] are each written in an early iteration and read in a later one: a true dependence.

Slide 11: Dynamic Data Dependence
- Let o and o' be two (dynamic) operations
- A data dependence exists from o to o' iff:
  - either o or o' is a write operation
  - o and o' may refer to the same location
  - o executes before o'

Slide 12: Static Data Dependence
- Let a and a' be two static array accesses (not necessarily distinct)
- A data dependence exists from a to a' iff:
  - either a or a' is a write operation
  - there exist a dynamic instance o of a and a dynamic instance o' of a' such that:
    - o and o' may refer to the same location
    - o executes before o'

Slide 13: Recognizing DOALL Loops
- Find the data dependences in the loop
- Definition: a dependence is loop-carried if it crosses an iteration boundary
- If there are no loop-carried dependences, the loop is parallelizable

Slide 14: Compute Dependence
- There is a dependence between a[i] and a[i-2] if there exist two iterations i_r and i_w within the loop bounds such that iteration i_r reads and iteration i_w writes the same array element
- That is: there exist i_r, i_w with 2 ≤ i_r, i_w ≤ 5 and i_r = i_w - 2

    for i = 2, 5
      a[i-2] = a[i] + 3
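The existence question on this slide is small enough to enumerate directly; this sketch checks every (i_r, i_w) pair in the bounds:

```python
# Dependence between a[i] (read) and a[i-2] (write) in `for i = 2, 5`:
# do there exist i_r, i_w with 2 <= i_r, i_w <= 5 and i_r = i_w - 2?
solutions = [(i_r, i_w)
             for i_r in range(2, 6)
             for i_w in range(2, 6)
             if i_r == i_w - 2]
print(solutions)  # [(2, 4), (3, 5)]: a[2] and a[3] are read, then overwritten later
```

A non-empty solution set means the dependence exists.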

Slide 15: Compute Dependence
- There is a dependence between a[i-2] and a[i-2] if there exist two iterations i_v and i_w within the loop bounds such that both write the same array element
- That is: there exist i_v, i_w with 2 ≤ i_v, i_w ≤ 5 and i_v - 2 = i_w - 2

    for i = 2, 5
      a[i-2] = a[i] + 3

Slide 16: Parallelization
- Is there a loop-carried dependence between a[i] and a[i-2]?
- Is there a loop-carried dependence between a[i-2] and a[i-2]?

    for i = 2, 5
      a[i-2] = a[i] + 3

Slide 17: Agenda
- Introduction
- Single Loop
- Nested Loops
- Data Dependence Analysis

Slide 18: Nested Loops
- Which loop(s) are parallel?

    for i1 = 0, 5
      for i2 = 0, 3
        a[i1,i2] = a[i1-2,i2-1] + 3

Slide 19: Iteration Space
- An abstraction for loops
- Each iteration is represented as a point, with coordinates in the iteration space

    for i1 = 0, 5
      for i2 = 0, 3
        a[i1,i2] = 3

(Figure: a 6 x 4 grid of iteration points, axes i1 and i2.)

Slide 20: Execution Order
- Sequential execution order of iterations is lexicographic order: [0,0], [0,1], ... [0,3], [1,0], [1,1], ... [1,3], [2,0], ...
- Let I = (i_1, i_2, ..., i_n). I is lexicographically less than I', written I < I', iff there exists k such that (i_1, ..., i_{k-1}) = (i'_1, ..., i'_{k-1}) and i_k < i'_k
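Python tuples compare lexicographically, so the definition can be checked directly against the 6 x 4 loop nest from the previous slides (a small sketch, not from the deck):

```python
# Iteration space of the 6 x 4 loop nest, generated in sequential execution order
iters = [(i1, i2) for i1 in range(6) for i2 in range(4)]
print(iters[:5])        # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)]

# Sequential execution order coincides with lexicographic order on the tuples
assert iters == sorted(iters)

# Lexicographic, not component-wise: (0, 3) precedes (1, 0)
print((0, 3) < (1, 0))  # True
```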

Slide 21: Parallelism for Nested Loops
- Is there a data dependence between a[i1,i2] and a[i1-2,i2-1]?
- That is, do there exist i1_r, i2_r, i1_w, i2_w such that:
  - 0 ≤ i1_r, i1_w ≤ 5
  - 0 ≤ i2_r, i2_w ≤ 3
  - i1_r - 2 = i1_w
  - i2_r - 1 = i2_w
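As on the single-loop slides, the system is small enough to enumerate; this sketch lists every dependent pair of iterations:

```python
# Solve: 0 <= i1_r, i1_w <= 5, 0 <= i2_r, i2_w <= 3,
#        i1_r - 2 = i1_w, i2_r - 1 = i2_w
sols = [((i1_r, i2_r), (i1_w, i2_w))
        for i1_r in range(6) for i2_r in range(4)
        for i1_w in range(6) for i2_w in range(4)
        if i1_r - 2 == i1_w and i2_r - 1 == i2_w]
print(len(sols))  # 12 dependent iteration pairs
print(sols[0])    # ((2, 1), (0, 0)): iteration (0,0) writes a[0,0], (2,1) reads it

# i1_r != i1_w in every solution, so the dependence is carried by the outer loop
assert all(r[0] != w[0] for r, w in sols)
```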

Slide 22: Loop-carried Dependence
- If there are no loop-carried dependences, the loop is parallelizable
- Dependence carried by the outer loop: i1_r ≠ i1_w
- Dependence carried by the inner loop: i1_r = i1_w and i2_r ≠ i2_w
- This extends naturally to a dependence carried at loop level k

Slide 23: Nested Loops
- Which loop carries the dependence?

    for i1 = 0, 5
      for i2 = 0, 3
        a[i1,i2] = a[i1-2,i2-1] + 3

Slide 24: Agenda
- Introduction
- Single Loop
- Nested Loops
- Data Dependence Analysis

Slide 25: Solving Data Dependence Problems
- Memory disambiguation is undecidable at compile time

    read(n)
    for i = 0, 3
      a[i] = a[n] + 3

Slide 26: Domain of Data Dependence Analysis
- Only handle loop bounds and array indices that are integer linear (affine) functions of the loop variables

    for i1 = 1, n
      for i2 = 2*i1, 100
        a[i1+2*i2+3][4*i1+2*i2][i1*i1] = ...
        ... = a[1][2*i1+1][i2] + 3

Slide 27: Equations
- There is a data dependence if there exist i1_r, i2_r, i1_w, i2_w such that:
  - 1 ≤ i1_r, i1_w ≤ n
  - 2*i1_r ≤ i2_r ≤ 100
  - 2*i1_w ≤ i2_w ≤ 100
  - i1_w + 2*i2_w + 3 = 1
  - 4*i1_w + 2*i2_w = 2*i1_r + 1
- Note: the non-affine relation from the index i1*i1 is ignored

    for i1 = 1, n
      for i2 = 2*i1, 100
        a[i1+2*i2+3][4*i1+2*i2][i1*i1] = ...
        ... = a[1][2*i1+1][i2] + 3

Slide 28: Solutions
- There is a data dependence if there exist i1_r, i2_r, i1_w, i2_w such that:
  - 1 ≤ i1_r, i1_w ≤ n
  - 2*i1_r ≤ i2_r ≤ 100
  - 2*i1_w ≤ i2_w ≤ 100
  - i1_w + 2*i2_w + 3 = 1
  - 4*i1_w + 2*i2_w = 2*i1_r + 1
- No solution → no data dependence
- A solution → there may be a dependence
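For this system the equality i1_w + 2*i2_w + 3 = 1 is already infeasible on its own: it requires i1_w + 2*i2_w = -2, which is impossible because the loop bounds keep both indices nonnegative. A brute-force check for one concrete bound (n = 10 is an arbitrary choice, following the loop header `for i1 = 1, n`) confirms there is no solution, hence no dependence:

```python
n = 10  # any concrete value of the symbolic bound n
sols = [(i1_w, i2_w)
        for i1_w in range(1, n + 1)
        for i2_w in range(2 * i1_w, 101)
        if i1_w + 2 * i2_w + 3 == 1]
print(sols)  # [] -> the first equality has no solution, so no dependence
```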

Slide 29: Form of Data Dependence Analysis
- A data dependence problem initially contains equalities and inequalities
- Eliminate disequalities (≠) from the problem statement: replace a ≠ b with two sub-problems, a > b or a < b
- We get sub-problems containing only equalities and inequalities

Slide 30: Form of Data Dependence Analysis
- Eliminate equalities from the problem statement: replace a = b with the pair of inequalities a ≤ b and b ≤ a
- We get an integer program containing only inequalities
- Integer programming is NP-complete, i.e., expensive

Slide 31: Techniques: Inexact Tests
- Examples: the GCD test, Banerjee's test
- Two outcomes:
  - No → no dependence
  - Don't know → assume there is a solution → dependence
- May report extra data dependences
- Sacrifices some parallelism for compiler efficiency

Slide 32: GCD Test
- Is there any dependence?
- Solve a linear Diophantine equation: 2*i_w = 2*i_r + 1

    for i = 1, 100
      a[2*i] = ...
      ... = a[2*i+1] + 3
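The test itself is a single divisibility check. Rewriting the equation as 2*i_w - 2*i_r = 1 and applying the theorem on the next slide (a sketch using Python's `math.gcd`):

```python
from math import gcd

# Dependence equation for a[2*i] (write) vs a[2*i+1] (read):
#   2*i_w - 2*i_r = 1
coeffs, c = [2, -2], 1
g = gcd(*(abs(a) for a in coeffs))
print(g)           # 2
print(c % g == 0)  # False: gcd(2,-2) = 2 does not divide 1,
                   # so there is no integer solution and no dependence
```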

Slide 33: GCD
- The greatest common divisor (GCD) of integers a_1, a_2, ..., a_n, denoted gcd(a_1, a_2, ..., a_n), is the largest integer that evenly divides all of them
- Theorem: the linear Diophantine equation a_1*x_1 + a_2*x_2 + ... + a_n*x_n = c has an integer solution x_1, x_2, ..., x_n iff gcd(a_1, a_2, ..., a_n) divides c

Slide 34: Examples
- Example 1: gcd(2,-2) = 2. No solutions
- Example 2: gcd(24,36,54) = 6. Many solutions

Slide 35: Multiple Equalities
- Equation 1: gcd(1,-2,1) = 1. Many solutions
- Equation 2: gcd(3,2,1) = 1. Many solutions
- Is there any solution satisfying both equations?

Slide 36: The Euclidean Algorithm
- Assume a and b are positive integers with a > b
- Let c be the remainder of a/b
  - If c = 0, then gcd(a,b) = b
  - Otherwise, gcd(a,b) = gcd(b,c)
- gcd(a_1, a_2, ..., a_n) = gcd(gcd(a_1, a_2), a_3, ..., a_n)
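The steps above translate directly into code (the function names are our own):

```python
from functools import reduce

def euclid_gcd(a, b):
    """Euclidean algorithm: replace (a, b) by (b, a mod b) until the remainder is 0."""
    while b != 0:
        a, b = b, a % b
    return a

def gcd_many(*xs):
    # gcd(a1, ..., an) = gcd(gcd(a1, a2), a3, ..., an)
    return reduce(euclid_gcd, xs)

print(euclid_gcd(36, 24))    # 12
print(gcd_many(24, 36, 54))  # 6, matching Example 2 on slide 34
```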

Slide 37: Exact Analysis
- Most memory disambiguation problems are simple integer programs
- Approach: solve exactly, yielding yes or no solution
- Solve exactly with Fourier-Motzkin elimination plus branch and bound
- The Omega package from the University of Maryland takes this approach

Slide 38: Incremental Analysis
- Use a series of simple tests to solve simple programs (based on properties of the inequalities rather than array access patterns)
- Solve the remainder exactly with Fourier-Motzkin elimination plus branch and bound
- Memoization: many identical integer programs arise for each program, so save the results so they need not be recomputed

Slide 39: State of the Art
- Multiprocessors need large outer parallel loops
- Many interprocedural optimizations are needed:
  - Interprocedural scalar optimizations: dependence, privatization, reduction recognition
  - Interprocedural array analysis: array section analysis

Slide 40: Summary
- DOALL loops
- Iteration Space
- Data dependence analysis