Data Dependences CS 524 – High-Performance Computing.



CS 524 (Wi 2003/04) - Asim, LUMS

Slide 2: Data Dependences

Fundamental execution ordering constraints:

S1: a = b + c
S2: d = a * 2
S3: a = c + 2
S4: e = d + c + 2

- S1 must execute before S2 (flow dependence)
- S2 must execute before S3 (anti-dependence)
- S1 must execute before S3 (output dependence)
- But S3 and S4 can execute concurrently
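The claim that S3 and S4 may run concurrently can be sanity-checked by executing the four statements in both orders and comparing the final values. A minimal Python sketch (the initial values of b and c are arbitrary assumptions, and `e2` stands in for variable e):

```python
def run(order):
    # env holds the variables; b and c get arbitrary initial values.
    env = {"b": 4, "c": 3}
    stmts = {
        "S1": lambda e: e.update(a=e["b"] + e["c"]),       # S1: a = b + c
        "S2": lambda e: e.update(d=e["a"] * 2),            # S2: d = a * 2
        "S3": lambda e: e.update(a=e["c"] + 2),            # S3: a = c + 2
        "S4": lambda e: e.update(e2=e["d"] + e["c"] + 2),  # S4: e = d + c + 2
    }
    for s in order:
        stmts[s](env)
    return env

seq = run(["S1", "S2", "S3", "S4"])  # original order
par = run(["S1", "S2", "S4", "S3"])  # S3 and S4 swapped
print(seq == par)  # True: no dependence between S3 and S4
```

Swapping S2 and S3 instead would change the value S2 reads (the anti-dependence on a), so that reorder is illegal.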

Slide 3: Types of Dependences (1)

Three types are usually defined:
- Flow dependence: a variable assigned a value in one statement, say S1, is later used in another statement, say S2. Written S1 δ^f S2.
- Anti-dependence: a variable used in one statement, say S1, is later assigned a value in another statement, say S2. Written S1 δ^a S2.
- Output dependence: a variable assigned a value in one statement, say S1, is later reassigned in another statement, say S2. Written S1 δ^o S2.

Slide 4: Types of Dependences (2)

The type of dependence can be found using IN and OUT sets for each statement:
- IN(S): the set of memory locations read by statement S
- OUT(S): the set of memory locations written by statement S
- A memory location may be included in both IN(S) and OUT(S)

If S1 is executed before S2 in sequential execution, then:
- OUT(S1) ∩ IN(S2) ≠ {}  ⇒  S1 δ^f S2  (flow)
- IN(S1) ∩ OUT(S2) ≠ {}  ⇒  S1 δ^a S2  (anti)
- OUT(S1) ∩ OUT(S2) ≠ {} ⇒  S1 δ^o S2  (output)
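These three intersection tests translate directly into code. A small sketch (the function name and the encoding of memory locations as variable-name sets are my own):

```python
def classify_deps(in1, out1, in2, out2):
    """Return the dependences from S1 to S2, where S1 executes first and
    IN/OUT are sets of memory locations (here, variable names)."""
    deps = []
    if out1 & in2:
        deps.append("flow")    # OUT(S1) ∩ IN(S2) ≠ {}
    if in1 & out2:
        deps.append("anti")    # IN(S1) ∩ OUT(S2) ≠ {}
    if out1 & out2:
        deps.append("output")  # OUT(S1) ∩ OUT(S2) ≠ {}
    return deps

# From the earlier example: S1: a = b + c,  S2: d = a * 2,  S3: a = c + 2
print(classify_deps({"b", "c"}, {"a"}, {"a"}, {"d"}))  # ['flow']    S1 -> S2
print(classify_deps({"a"}, {"d"}, {"c"}, {"a"}))       # ['anti']    S2 -> S3
print(classify_deps({"b", "c"}, {"a"}, {"c"}, {"a"}))  # ['output']  S1 -> S3
```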

Slide 5: Data Dependence in Loops (1)

Associate an instance with each statement and determine dependences between the instances. For example, S1(10) means the instance of S1 when i = 10.

do i = 1, 50
S1:  A(i) = B(i-1) + C(i)
S2:  B(i) = A(i+2) + C(i)
end do

Slide 6: Data Dependence in Loops (2)

do i = 1, 50
S1:  A(i) = B(i-1) + C(i)
S2:  B(i) = A(i+2) + C(i)
end do

S1(1):  A(1)  = B(0)  + C(1)
S2(1):  B(1)  = A(3)  + C(1)
S1(2):  A(2)  = B(1)  + C(2)
S2(2):  B(2)  = A(4)  + C(2)
S1(3):  A(3)  = B(2)  + C(3)
S2(3):  B(3)  = A(5)  + C(3)
...
S1(50): A(50) = B(49) + C(50)
S2(50): B(50) = A(52) + C(50)
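Rather than eyeballing the unrolled instances, the same dependences can be recovered mechanically by recording which instance writes and which instances read each array element. A sketch (the data structures are my own choice):

```python
writes, reads = {}, {}
for i in range(1, 51):
    writes.setdefault(("A", i), []).append(("S1", i))     # S1(i) writes A(i)
    reads.setdefault(("B", i - 1), []).append(("S1", i))  # S1(i) reads  B(i-1)
    writes.setdefault(("B", i), []).append(("S2", i))     # S2(i) writes B(i)
    reads.setdefault(("A", i + 2), []).append(("S2", i))  # S2(i) reads  A(i+2)

# Flow: a location is written in an earlier iteration than it is read.
flow = [(w, r) for loc, ws in writes.items()
        for w in ws for r in reads.get(loc, []) if w[1] < r[1]]
# Anti: a location is read in an earlier iteration than it is written.
anti = [(r, w) for loc, ws in writes.items()
        for w in ws for r in reads.get(loc, []) if r[1] < w[1]]

print(len(flow), len(anti))  # 49 48: B carries a flow dependence of
                             # distance 1, A an anti-dependence of distance 2
```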

Slide 7: Data Flow Dependence

Data written by some statement instance is later read by some statement instance in the serial execution of the program. Written S1 δ^f S2.

do i = 3, 50
S1:  A(i+1) = ...
S2:  ... = A(i-2) ...
end do

Here A(k) is written by S1(k-1) and later read by S2(k+2): a flow dependence of distance 3.

Slide 8: Data Anti-Dependence

Data read by some statement instance is later written by some statement instance in the serial execution of the program. Written S1 δ^a S2.

do i = 1, 50
S1:  A(i-1) = ...
S2:  ... = A(i+1) ...
end do

Here A(k) is read by S2(k-1) and later written by S1(k+1): an anti-dependence of distance 2.

Slide 9: Iteration Space Graph (1)

Nested loops define an iteration space:

do i = 1, 4
  do j = 1, 4
    A(i,j) = B(i,j) + C(j)
  end do
end do

Sequential execution traverses the space row by row: for each i, all values of j in order.

Slide 10: Iteration Space Graph (2)

Dimensionality of iteration space = loop nest depth.
- The space is not restricted to be rectangular
- Triangular iteration spaces are common in scientific codes

do i = 1, 5
  do j = i, 5
    A(i,j) = B(i,j) + C(j)
  end do
end do

Sequential execution traverses the triangle row by row: for each i, j runs from i to 5.
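The traversal order of the triangular nest can be listed by enumerating it directly:

```python
# Enumerate the triangular iteration space of: do i = 1, 5 / do j = i, 5
order = [(i, j) for i in range(1, 6) for j in range(i, 6)]
print(len(order))  # 15 iterations: 5 + 4 + 3 + 2 + 1
print(order[:6])   # [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2)]
```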

Slide 11: Sequential Execution

Ordering of execution:
- Given two iterations (i1, j1) and (i2, j2) (with positive loop steps), we say (i1, j1) ≺ (i2, j2) if and only if either i1 < i2, or i1 = i2 and j1 < j2.
- This rule extends to multi-dimensional iteration spaces.
- A vector (d1, d2) is positive if (0, 0) ≺ (d1, d2), i.e., its first (leading) nonzero component is positive.

Here ≺ means "precedes", the ordering symbol for iteration vectors.
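The lexicographic "precedes" relation and the positivity test are a few lines of code. A sketch with names of my own choosing:

```python
def precedes(I, J):
    """Lexicographic order on iteration vectors (positive loop steps):
    I executes before J iff the first differing component is smaller in I."""
    for a, b in zip(I, J):
        if a != b:
            return a < b
    return False  # equal vectors: the same iteration, neither precedes

def is_positive(d):
    """A vector is positive iff (0, ..., 0) precedes it,
    i.e. its leading nonzero component is positive."""
    return precedes((0,) * len(d), d)

print(precedes((1, 4), (2, 1)), is_positive((0, 0, 2)))  # True True
```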

Slide 12: Dependences in Loop Nests

do i1 = L1, U1
  do i2 = L2, U2
    ...
    do in = Ln, Un
      BODY(i1, i2, ..., in)
    end do
    ...
  end do
end do

There is a dependence in the loop nest if there are iterations I = (i1, i2, ..., in) and J = (j1, j2, ..., jn) and some memory location M such that:
1. I ≺ J (I precedes J)
2. BODY(I) and BODY(J) both reference M
3. There is no intervening iteration K with I ≺ K ≺ J that accesses M

Slide 13: Distance and Direction Vectors

Assume a dependence from BODY(I), I = (i1, i2, ..., in), to BODY(J), J = (j1, j2, ..., jn).

The distance vector is d = (j1 - i1, j2 - i2, ..., jn - in).

Define the sign function sgn(x) of a scalar x:
  sgn(x) = -  if x < 0
           0  if x = 0
           +  if x > 0

The direction vector is (sgn(d1), sgn(d2), ..., sgn(dn)), where dk = jk - ik for k = 1, ..., n.

Slide 14: Example of Dependence Vectors

do i = 1, N
  do j = 1, N
    A(i,j) = A(i,j-3) + A(i-2,j) + A(i-1,j+2) + A(i+1,j-1)
  end do
end do

RHS reference   Type   Distance vector   Direction vector
A(i, j-3)       Flow   (0, 3)            (0, +)
A(i-2, j)       Flow   (2, 0)            (+, 0)
A(i-1, j+2)     Flow   (1, -2)           (+, -)
A(i+1, j-1)     Anti   (1, -1)           (+, -)
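The table can be re-derived from the subscripts. For a read reference A(i+a, j+b) against the write A(i, j), the element written at one iteration is read again (-a, -b) iterations later; if that vector is lexicographically positive, the write happens first and the dependence is a flow dependence, otherwise the read happens first and it is an anti-dependence with distance (a, b). A sketch (function names are mine):

```python
def lex_positive(d):
    """True iff the leading nonzero component of d is positive."""
    for x in d:
        if x != 0:
            return x > 0
    return False

def classify(a, b):
    """Dependence between the write A(i, j) and the read A(i+a, j+b)."""
    d = (-a, -b)                  # write-to-read iteration distance
    if lex_positive(d):
        kind = "Flow"
    else:
        kind, d = "Anti", (a, b)  # the read happens first
    direction = tuple("+" if x > 0 else "-" if x < 0 else "0" for x in d)
    return kind, d, direction

# The four RHS references from the slide:
for a, b in [(0, -3), (-2, 0), (-1, 2), (1, -1)]:
    print(f"A(i{a:+d}, j{b:+d}):", classify(a, b))
```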

Slide 15: Validity of Loop Permutation (1)

Before interchange:
do i = 1, N
  do j = 1, N
    ...
  end do
end do

After interchange:
do j = 1, N
  do i = 1, N
    ...
  end do
end do

A dependence with direction vector (+, -) prevents interchange: after interchange it becomes (-, +), which is lexicographically negative, so the sink would execute before the source.

Slide 16: Validity of Loop Permutation (2)

Loop permutation is valid if all dependences are satisfied after the interchange:
- Geometric view: the source of every dependence still executes before its sink
- Algebraic view: every permuted dependence direction vector is lexicographically non-negative
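The algebraic test is easy to mechanize: permute the components of every dependence direction vector and check that each result is still lexicographically non-negative. A sketch using +1/0/-1 for the +/0/- direction signs (an encoding of my own):

```python
def lex_nonnegative(v):
    """Leading nonzero component positive, or all zero (loop-independent)."""
    for x in v:
        if x != 0:
            return x > 0
    return True

def interchange_valid(direction_vectors):
    """A two-loop interchange is valid iff every direction vector, with
    its components swapped, stays lexicographically non-negative."""
    return all(lex_nonnegative((d2, d1)) for d1, d2 in direction_vectors)

print(interchange_valid([(1, 1), (0, 1)]))  # True
print(interchange_valid([(1, -1)]))         # False: (+, -) becomes (-, +)
```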

Slide 17: Iteration Space Blocking (Tiling)

A tile in an n-dimensional iteration space is an n-dimensional subset of the iteration space:
- A tile is defined by a set of boundaries regularly spaced apart
- Each tile boundary is an (n-1)-dimensional plane

Slide 18: Validity of Loop Blocking

Loop blocking is valid if all dependences are still satisfied in the tiled execution order of the iteration space (i.e., every source still executes before its sink):
- Geometric view: no two dependences cross any tile boundary in opposite directions
- Algebraic view: for each tile boundary, the dot products of all dependence distance vectors with the boundary's normal have the same sign
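The algebraic view can also be checked in a few lines: for each family of tile boundaries (represented by its normal vector), compute the dot product with every dependence distance vector; if some products are positive and others negative, two dependences cross that boundary in opposite directions and the tiling is invalid. A sketch, assuming axis-aligned rectangular tiles whose boundary normals are the unit vectors:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tiling_valid(distance_vectors, normals):
    """Valid iff, for each boundary normal, all dependence distance
    vectors cross the boundary in the same direction (zero is neutral)."""
    for n in normals:
        products = [dot(d, n) for d in distance_vectors]
        if any(p > 0 for p in products) and any(p < 0 for p in products):
            return False  # opposite crossings of this boundary family
    return True

# Distance vectors from the earlier example, axis-aligned 2-D tiles:
deps = [(0, 3), (2, 0), (1, -2), (1, -1)]
print(tiling_valid(deps, [(1, 0), (0, 1)]))              # False: (0, 3) and
                                                         # (1, -2) cross j-boundaries oppositely
print(tiling_valid([(0, 3), (2, 0)], [(1, 0), (0, 1)]))  # True
```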

Slide 19: Summary

- Data dependence analysis is essential to avoid changing the meaning of a program when performance-optimizing transformations are applied.
- It is equally essential when designing parallel algorithms from sequential code.
- Data dependences must be preserved by every loop transformation; otherwise the transformation is illegal.
- A dependence can exist in a loop nest only when its distance vector is lexicographically non-negative: the source iteration must precede (or coincide with) the sink iteration.