CS314 – Section 5 Recitation 13


Long Zhao (lz311@rutgers.edu)
SIV, ZIV Vectorization
Slides available at http://www.ilab.rutgers.edu/~lz311/CS314

Dependence Distance Vector

Definition: Suppose there is a dependence from S1 on iteration x to S2 on iteration y. The i-th element of the dependence distance vector d is defined as d_i = y_i - x_i.

Example:

    DO I = 1, 3
      DO J = 1, I
    S1      A(I+1,J) = A(I,J)
      ENDDO
    ENDDO

True dependences between S1 and itself:
    x = (1,1), y = (2,1): d = (1,0)
    x = (2,1), y = (3,1): d = (1,0)
    x = (2,2), y = (3,2): d = (1,0)
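The distance vectors above can be checked by brute-force enumeration of the iteration space. A minimal Python sketch (the function name and write-tracking scheme are my own, not from the slides):

```python
def distance_vectors():
    """Enumerate true dependences for S1: A(I+1,J) = A(I,J), I=1..3, J=1..I."""
    writes = {}  # array element -> iteration (I, J) that wrote it
    deps = []
    for I in range(1, 4):
        for J in range(1, I + 1):
            # The read of A(I,J) happens before the write of A(I+1,J).
            if (I, J) in writes:          # reading an element written earlier
                x, y = writes[(I, J)], (I, J)
                deps.append((y[0] - x[0], y[1] - x[1]))  # d = y - x
            writes[(I + 1, J)] = (I, J)
    return deps

print(distance_vectors())  # [(1, 0), (1, 0), (1, 0)]
```

All three dependences have distance (1, 0), matching the table above.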

Dependence Direction Vector

Definition: Suppose there is a dependence from S1 on iteration i to S2 on iteration j. The k-th element of the dependence direction vector D(i, j) is defined as:

    D(i, j)_k = "<"  if d(i, j)_k > 0
                "="  if d(i, j)_k = 0
                ">"  if d(i, j)_k < 0
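The definition maps directly onto code. A small Python sketch (the helper name is my own):

```python
def direction_vector(d):
    """Map a dependence distance vector to its direction vector:
    '<' for a positive component, '=' for zero, '>' for negative."""
    return tuple('<' if dk > 0 else '=' if dk == 0 else '>' for dk in d)

print(direction_vector((1, 0)))  # ('<', '=')
print(direction_vector((1, 1)))  # ('<', '<')
```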

Example

    DO I = 1, 4
      DO J = 1, 4
    S1      A(I,J+1) = A(I-1,J)
      ENDDO
    ENDDO

The distance vector is (1,1), so the direction vector is (<,<): S1 depends on itself across both loops.

ZIV Test

    DO j = 1, 100
    S       A(e1) = A(e2) + B(j)
    ENDDO

e1 and e2 are constants or loop-invariant symbols. If e1 - e2 ≠ 0, no dependence exists between the two references to A.
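The ZIV test is a single comparison. A Python sketch (the function name is my own):

```python
def ziv_test(e1, e2):
    """ZIV test for a subscript pair (e1, e2), both loop-invariant.
    Returns False when independence is proven (e1 - e2 != 0),
    True when a dependence may exist (the subscripts always collide)."""
    return e1 - e2 == 0

print(ziv_test(7, 7))  # True  -- same element every iteration
print(ziv_test(3, 5))  # False -- provably independent
```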

Strong SIV Test

Strong SIV subscripts are of the form a*i + c1 and a*i + c2: both references are linear in the same induction variable with the same coefficient a. For example, A(i+1) paired with A(i), or A(2*i+3) paired with A(2*i), are strong SIV subscript pairs.

Strong SIV Test

    DO i1 = L1, U1
    S1      A(f(i1,...,in)) = ...
    S2      ... = A(g(i1,...,in))
    ENDDO

The strong SIV test applies when f(...) = a*i + c1 and g(...) = a*i + c2. Let S1 write on iteration α and S2 read on iteration β; setting f(α) = g(β) and solving gives β - α = (c1 - c2)/a. A dependence exists from S1 to S2 if:
    β - α is an integer
    |β - α| ≤ U1 - L1
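The two conditions translate into a few lines of Python. A sketch (the function name is my own; it assumes a > 0 for simplicity):

```python
def strong_siv_test(a, c1, c2, L1, U1):
    """Strong SIV test for subscripts a*i + c1 (in S1) and a*i + c2 (in S2).
    Returns the dependence distance beta - alpha = (c1 - c2)/a when a
    dependence exists within the loop bounds, else None.
    Assumes a > 0 for simplicity."""
    if (c1 - c2) % a != 0:
        return None                      # beta - alpha is not an integer
    d = (c1 - c2) // a
    return d if abs(d) <= U1 - L1 else None

# A(j+1) vs A(j) on DO j = 1, 100: a=1, c1=1, c2=0
print(strong_siv_test(1, 1, 0, 1, 100))  # 1
```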

Strong SIV Test Example

    DO k = 1, 100
      DO j = 1, 100
    S1      A(j+1,k) = ...
    S2      ... = A(j,k) + 32
      ENDDO
    ENDDO

For the j subscript, a = 1, c1 = 1, c2 = 0, so β - α = 1: an integer with |1| ≤ 99, so a dependence exists from S1 to S2 with distance 1.

Weak SIV Tests

Weak SIV subscripts are of the form a1*i + c1 and a2*i + c2 with a1 ≠ a2. For example, A(i) paired with A(5), or A(i+1) paired with A(-i), are weak SIV subscript pairs. The two important special cases are weak-zero (one coefficient is zero) and weak-crossing (a2 = -a1).

Weak-Zero SIV Test

    DO i1 = L1, U1
    S1      A(f(i1,...,in)) = ...
    S2      ... = A(g(i1,...,in))
    ENDDO

The weak-zero SIV test applies when f(...) = a*i + c1 and g(...) = c2. Plugging in iteration α and solving f(α) = c2 gives α = (c2 - c1)/a. A dependence exists from S1 to S2 if:
    α is an integer
    L1 ≤ α ≤ U1
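Here the dependence, if any, involves exactly one iteration. A Python sketch of the test (function name mine; assumes a > 0 for simplicity):

```python
def weak_zero_siv_test(a, c1, c2, L1, U1):
    """Weak-zero SIV test for subscripts a*i + c1 and c2.
    Returns the single iteration alpha = (c2 - c1)/a involved in the
    dependence if it is an integer within [L1, U1], else None.
    Assumes a > 0 for simplicity."""
    if (c2 - c1) % a != 0:
        return None                      # alpha is not an integer
    alpha = (c2 - c1) // a
    return alpha if L1 <= alpha <= U1 else None

# A(i) vs A(5) on DO i = 1, 100: only iteration i = 5 touches A(5)
print(weak_zero_siv_test(1, 0, 5, 1, 100))  # 5
```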

Weak-Crossing SIV Test

    DO i1 = L1, U1
    S1      A(f(i1,...,in)) = ...
    S2      ... = A(g(i1,...,in))
    ENDDO

The weak-crossing SIV test applies when f(...) = a*i + c1 and g(...) = -a*i + c2. To find the crossing point, set α = β and solve: α = (c2 - c1)/(2a). A dependence exists from S1 to S2 if:
    2α is an integer
    L1 ≤ α ≤ U1
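The crossing point may fall halfway between two iterations, which is why only 2α (not α itself) must be an integer. A Python sketch (function name mine; assumes a > 0):

```python
def weak_crossing_siv_test(a, c1, c2, L1, U1):
    """Weak-crossing SIV test for subscripts a*i + c1 and -a*i + c2.
    The crossing point is alpha = (c2 - c1)/(2a); a dependence exists
    iff 2*alpha is an integer and L1 <= alpha <= U1.
    Assumes a > 0 for simplicity."""
    if (c2 - c1) % a != 0:   # 2*alpha = (c2 - c1)/a must be an integer
        return None
    alpha = (c2 - c1) / (2 * a)
    return alpha if L1 <= alpha <= U1 else None

# A(i) = A(11-i) on DO i = 1, 10: a=1, c1=0, c2=11
print(weak_crossing_siv_test(1, 0, 11, 1, 10))  # 5.5 -- iterations cross there
```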

Vectorization

Simple vectorizer assumptions:
- singly-nested loops
- constant upper and lower bounds; the step is always 1
- the body is a sequence of assignment statements to array variables
- simple array index expressions of the induction variable (i ± c or c), so the ZIV and SIV tests suffice
- no function calls

Vectorization (1)

A single-statement loop that carries no dependence can be vectorized:

    DO I = 1, 4
    S1      X(I) = X(I) + C
    ENDDO

The only dependence is a loop-independent anti-dependence (d = 0), so the loop vectorizes to the Fortran 90 array statement

    S1  X(1:4) = X(1:4) + C

which computes all four elements in one vector operation.

Vectorization (2)

This example has a loop-carried true dependence (d = 1), so vectorizing it is invalid:

    DO I = 1, N
    S1      X(I+1) = X(I) + C
    ENDDO

    S1  X(2:N+1) = X(1:N) + C

The array statement reads all of X(1:N) before writing, but the loop requires each iteration to read the value written by the previous one, so the two are not equivalent.

Vectorization (2)

This example has a loop-carried anti-dependence (d = 1), and vectorizing it is valid:

    DO I = 2, N
    S1      X(I-1) = X(I) + C
    ENDDO

    S1  X(1:N-1) = X(2:N) + C

Each iteration reads only values that no earlier iteration writes, so reading the whole right-hand side up front preserves the loop's semantics.
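Fortran 90 array statements conceptually read the entire right-hand side before writing the left-hand side. A Python sketch (lists stand in for the arrays; all names are mine) simulates both evaluation orders to show why the true-dependence loop cannot be vectorized while the anti-dependence loop can:

```python
def loop_true(X, C):
    """DO I = 1, N: X(I+1) = X(I) + C, sequentially (0-based lists)."""
    X = X[:]
    for i in range(len(X) - 1):
        X[i + 1] = X[i] + C           # each step sees the freshly written value
    return X

def vector_true(X, C):
    """X(2:N+1) = X(1:N) + C as an array statement: RHS read before any write."""
    X = X[:]
    rhs = [x + C for x in X[:-1]]     # snapshot of the old values
    X[1:] = rhs
    return X

def loop_anti(X, C):
    """DO I = 2, N: X(I-1) = X(I) + C, sequentially."""
    X = X[:]
    for i in range(1, len(X)):
        X[i - 1] = X[i] + C           # reads only values no earlier step wrote
    return X

def vector_anti(X, C):
    """X(1:N-1) = X(2:N) + C as an array statement."""
    X = X[:]
    X[:-1] = [x + C for x in X[1:]]
    return X

print(loop_true([1, 0, 0, 0], 1))    # [1, 2, 3, 4] -- propagates new values
print(vector_true([1, 0, 0, 0], 1))  # [1, 2, 1, 1] -- reads stale values: wrong
print(loop_anti([1, 2, 3, 4], 1) == vector_anti([1, 2, 3, 4], 1))  # True
```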

Vectorization (3)

    DO I = 1, N
    S1      A(I+1) = B(I) + C
    S2      D(I) = A(I) + E
    ENDDO

There is a true dependence from S1 to S2 (d = 1). Because only single-statement loops can be vectorized, loops with multiple statements must first be transformed by loop distribution, which is legal when the loop has no loop-carried dependences or only forward flow dependences:

    DO I = 1, N
    S1      A(I+1) = B(I) + C
    ENDDO
    DO I = 1, N
    S2      D(I) = A(I) + E
    ENDDO

Each loop then vectorizes:

    S1  A(2:N+1) = B(1:N) + C
    S2  D(1:N) = A(1:N) + E
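Loop distribution's legality for a forward loop-carried dependence can be checked concretely. A Python sketch (0-based lists, arbitrary placeholder inputs, names mine) runs the fused and distributed versions and compares:

```python
def fused(A, B, D, c, e):
    """DO I: S1 A(I+1) = B(I) + c; S2 D(I) = A(I) + e  (0-based)."""
    A, D = A[:], D[:]
    for i in range(len(B)):
        A[i + 1] = B[i] + c   # S1
        D[i] = A[i] + e       # S2 reads A(i), written by S1 one iteration ago
    return A, D

def distributed(A, B, D, c, e):
    """The same statements after loop distribution: S1's loop completes first."""
    A, D = A[:], D[:]
    for i in range(len(B)):
        A[i + 1] = B[i] + c
    for i in range(len(B)):
        D[i] = A[i] + e
    return A, D

A0, B0, D0 = [9, 0, 0, 0], [1, 2, 3], [0, 0, 0]
print(fused(A0, B0, D0, 10, 100) == distributed(A0, B0, D0, 10, 100))  # True
```

Because the flow dependence points forward (S1 to S2), every value S2 needs is already written when its loop runs, so the results agree.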

Vectorization (4)

    DO I = 1, N
    S1      D(I) = A(I) + E
    S2      A(I+1) = B(I) + C
    ENDDO

There is a true dependence from S2 to S1 (d = 1). When a loop has backward flow dependences and no loop-independent dependences, interchange the statements to enable loop distribution:

    DO I = 1, N
    S2      A(I+1) = B(I) + C
    ENDDO
    DO I = 1, N
    S1      D(I) = A(I) + E
    ENDDO

Each loop then vectorizes:

    S2  A(2:N+1) = B(1:N) + C
    S1  D(1:N) = A(1:N) + E
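When the flow dependence runs backward (S2 to S1), distributing in source order would break it, but interchanging the statements first makes distribution legal. A Python sketch (0-based lists, names mine) checks the equivalence:

```python
def original(A, B, D, c, e):
    """DO I: S1 D(I) = A(I) + e; S2 A(I+1) = B(I) + c  (0-based)."""
    A, D = A[:], D[:]
    for i in range(len(B)):
        D[i] = A[i] + e       # S1 reads A(i), written by S2 one iteration ago
        A[i + 1] = B[i] + c   # S2
    return A, D

def interchanged_and_distributed(A, B, D, c, e):
    """S2's loop runs to completion, then S1's: legal after interchange."""
    A, D = A[:], D[:]
    for i in range(len(B)):
        A[i + 1] = B[i] + c   # S2
    for i in range(len(B)):
        D[i] = A[i] + e       # S1
    return A, D

A0, B0, D0 = [9, 0, 0, 0], [1, 2, 3], [0, 0, 0]
print(original(A0, B0, D0, 10, 100) ==
      interchanged_and_distributed(A0, B0, D0, 10, 100))  # True
```

S1's read of A(i) sees the value written one iteration earlier in both versions (A(i) is written exactly once), so the transformation preserves the result.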

Vectorization (5)

    DO I = 2, 100
    S1      A(I) = D(I-1) + B(I-1)
    S2      B(I) = A(I-1) * 2
    S3      C(I) = C(I-1) + 3
    S4      D(I) = C(I)
    ENDDO

Vectorization (5)

The dependences in the loop are:
    S1 → S2 on A: true, d = 1
    S2 → S1 on B: true, d = 1
    S3 → S3 on C: true, d = 1
    S3 → S4 on C: true, d = 0
    S4 → S1 on D: true, d = 1

Vectorization (5)

In the dependence graph, S1 and S2 form a cycle (a recurrence on A and B) and S3 has a self true dependence; condensing the strongly connected components orders them S3 → S4 → {S1, S2}.

Vectorization (5) S3 S4 S1 S2 (true), d = 1 DO I = 2, 100 S1 A(I) = D(I-1) + B(I-1) S2 B(I) = A(I-1) * 2 S3 C(I) = C(I-1) + 3 S4 D(I) = C(I) ENDDO S3 (true), d = 0 S4 D: (true), d = 1 DO I = 2, 100 S3 C(I) = C(I-1) + 3 S4 D(I) = C(I) S1 A(I) = D(I-1) + B(I-1) S2 B(I) = A(I-1) * 2 ENDDO S1 A: (true), d = 1 B: (true), d = 1 S2

Vectorization (5) S3 S4 S1 S2 (true), d = 1 DO I = 2, 100 S3 C(I) = C(I-1) + 3 S4 D(I) = C(I) S1 A(I) = D(I-1) + B(I-1) S2 B(I) = A(I-1) * 2 ENDDO S3 (true), d = 0 S4 D: (true), d = 1 DO I = 2, 100 S3 C(I) = C(I-1) + 3 ENDDO DO I = 2, 100 S4 D(I) = C(I) S1 A(I) = D(I-1) + B(I-1) S2 B(I) = A(I-1) * 2 ENDDO S1 A: (true), d = 1 B: (true), d = 1 S2

Vectorization (5)

S4 is not part of any cycle, so it distributes into its own loop and vectorizes:

    DO I = 2, 100
    S3      C(I) = C(I-1) + 3
    ENDDO
    S4  D(2:100) = C(2:100)
    DO I = 2, 100
    S1      A(I) = D(I-1) + B(I-1)
    S2      B(I) = A(I-1) * 2
    ENDDO

Vectorization (5)

In the final code, only S4 vectorizes: S3 carries a true self-dependence (d = 1), and S1 and S2 form a recurrence (true dependences on A and B, both d = 1), so both remaining loops must stay sequential.
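As a sanity check, the original loop and the final transformed code can be run side by side. A Python sketch (0-based lists of size 101 so the Fortran indices map directly; the initial values are arbitrary placeholders of my choosing):

```python
def run_original():
    A, B, C, D = [0] * 101, [1] * 101, [0] * 101, [0] * 101
    for I in range(2, 101):
        A[I] = D[I - 1] + B[I - 1]   # S1
        B[I] = A[I - 1] * 2          # S2
        C[I] = C[I - 1] + 3          # S3
        D[I] = C[I]                  # S4
    return A, B, C, D

def run_transformed():
    A, B, C, D = [0] * 101, [1] * 101, [0] * 101, [0] * 101
    for I in range(2, 101):          # S3: recurrence, stays sequential
        C[I] = C[I - 1] + 3
    D[2:101] = C[2:101]              # S4: vectorized (slice assignment)
    for I in range(2, 101):          # S1, S2: recurrence, stays sequential
        A[I] = D[I - 1] + B[I - 1]
        B[I] = A[I - 1] * 2
    return A, B, C, D

print(run_original() == run_transformed())  # True
```

All the values S1 reads from D were produced by S4 one iteration earlier in the original, so computing all of C and D first leaves every read unchanged.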