CS4230 Parallel Programming
Lecture 9: Locality and Data Parallel Algorithms
Mary Hall
September 20, 2012


Administrative

- SVD assignment status
  - Problems with speedup on water
  - I suspect OMP implementation
  - Will resolve today in some way with Axel

More locality: Image correlation

    … Initialize th[i][j] = 0 …

    /* compute array convolution */
    for (m = 0; m < IMAGE_NROWS - TEMPLATE_NROWS + 1; m++) {
      for (n = 0; n < IMAGE_NCOLS - TEMPLATE_NCOLS + 1; n++) {
        for (i = 0; i < TEMPLATE_NROWS; i++) {
          for (j = 0; j < TEMPLATE_NCOLS; j++) {
            if (mask[i][j] != 0) {
              th[m][n] += image[i+m][j+n];
            }
          }
        }
      }
    }

    /* scale array with bright count and template bias */
    …
    th[i][j] = th[i][j] * bc - bias;
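As a rough illustration of the loop nest on this slide, here is a minimal self-contained C sketch. The array sizes are hypothetical, and folding the scaling step into the same loop nest (so each `th[m][n]` is written once while it is still in a register) is an assumption made for the example, not something the slide states.

```c
#include <assert.h>

/* Hypothetical sizes, chosen only to keep the example small. */
enum { IMAGE_NROWS = 4, IMAGE_NCOLS = 5, TEMPLATE_NROWS = 2, TEMPLATE_NCOLS = 2 };
enum { OUT_NROWS = IMAGE_NROWS - TEMPLATE_NROWS + 1,
       OUT_NCOLS = IMAGE_NCOLS - TEMPLATE_NCOLS + 1 };

/* For every placement (m, n) of the template, sum the image pixels that lie
 * under nonzero mask entries, then scale by bc and subtract bias. */
static void correlate(int image[IMAGE_NROWS][IMAGE_NCOLS],
                      int mask[TEMPLATE_NROWS][TEMPLATE_NCOLS],
                      int th[OUT_NROWS][OUT_NCOLS], int bc, int bias)
{
    for (int m = 0; m < OUT_NROWS; m++) {
        for (int n = 0; n < OUT_NCOLS; n++) {
            int sum = 0;  /* scalar accumulator: th[m][n] is stored once */
            for (int i = 0; i < TEMPLATE_NROWS; i++) {
                for (int j = 0; j < TEMPLATE_NCOLS; j++) {
                    if (mask[i][j] != 0) {
                        sum += image[i + m][j + n];
                    }
                }
            }
            th[m][n] = sum * bc - bias;
        }
    }
}
```

Accumulating into a scalar and scaling in place avoids a second sweep over `th`, which is one simple way to improve temporal locality here.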

Example Data Parallel Algorithm

Problem 1 (#10 in Lin/Snyder on p. 111): The Red/Blue computation simulates two interactive flows. An n x n board is initialized so cells have one of three colors: red, white, and blue, where white is empty, red moves right, and blue moves down. Colors wrap around on the opposite side when reaching the edge. In the first half step of an iteration, any red color can move right one cell if the cell to the right is unoccupied (white). On the second half step, any blue color can move down one cell if the cell below it is unoccupied. The case where red vacates a cell (first half) and blue moves into it (second half) is okay. Viewing the board as overlaid with t x t tiles (where t divides n evenly), the computation terminates if any tile’s colored squares are more than c% one color. Use Peril-L to write a solution to the Red/Blue computation.
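Before parallelizing, it can help to have a sequential reference for one full iteration. The sketch below is an assumption-laden illustration rather than the course solution: the board size `N`, the color constants, and the `board[row][col]` layout (red moves to `col + 1`, blue to `row + 1`) are all choices made for the example.

```c
#include <assert.h>
#include <string.h>

enum { N = 4 };                        /* hypothetical board size */
enum { WHITE = 0, RED = 1, BLUE = 2 }; /* hypothetical color encoding */

/* First half step: each red cell moves right one column, wrapping around,
 * if the cell to its right was white at the start of the half step. */
static void red_half_step(int board[N][N])
{
    int next[N][N];
    memcpy(next, board, sizeof(next));
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            int right = (c + 1) % N;
            if (board[r][c] == RED && board[r][right] == WHITE) {
                next[r][c] = WHITE;
                next[r][right] = RED;
            }
        }
    }
    memcpy(board, next, sizeof(next));
}

/* Second half step: each blue cell moves down one row, wrapping around,
 * if the cell below it is white after the red half step. */
static void blue_half_step(int board[N][N])
{
    int next[N][N];
    memcpy(next, board, sizeof(next));
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            int down = (r + 1) % N;
            if (board[r][c] == BLUE && board[down][c] == WHITE) {
                next[r][c] = WHITE;
                next[down][c] = BLUE;
            }
        }
    }
    memcpy(board, next, sizeof(next));
}

/* One full iteration: red half step followed by blue half step. */
static void iterate(int board[N][N])
{
    red_half_step(board);
    blue_half_step(board);
}
```

Each half step reads the old board and writes a copy, so a move never depends on the order in which cells are visited. Because the blue half step reads the board left by the red half step, a blue cell can move into a cell that red has just vacated, as the problem statement allows.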

Steps for “Localized” Solution

Step 1. Partition global grid for n/t x n/t processors
Step 2. Initialize half iteration (red) data structure
Step 3. Iterate within each t x t tile until convergence (guaranteed?)
Step 4. Compute new positions of red elements & copy blue elements
Step 5. Communicate red boundary values
Step 6. Compute new positions of blue elements
Step 7. Communicate blue boundary values
Step 8. Check locally if DONE

Steps 1, 3 & 8. Partition Global Grid and Iterate

    int grid[n,n], lgrid[t,t];
    boolean gdone = FALSE;
    int thr = (n/t)*(n/t);               // one thread per t x t tile
    forall (index in (0..thr-1)) {
      int myx = t * (index / (n/t));     // origin of this thread's tile in grid
      int myy = t * (index % (n/t));
      lgrid[] = localize(grid[myx,myy]);
      while (!gdone) {
        // Steps 3-7: compute new locations and determine if done
        if (Bsum > threshold || Wsum > threshold || Rsum > threshold)
          exclusive { gdone = TRUE; }
        barrier;
      }
    }
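The index arithmetic in Step 1 can be checked in isolation. The following C sketch assumes that `myx` and `myy` are meant to be the grid coordinates of the first cell of each thread's tile; the `TileOrigin` struct and the `tile_origin` function are hypothetical names introduced for this example.

```c
#include <assert.h>

/* Hypothetical helper type: grid coordinates of a tile's first cell. */
typedef struct {
    int x;
    int y;
} TileOrigin;

/* Map a linear tile index in [0, (n/t)*(n/t)) to the origin of its
 * t x t tile in an n x n grid. Requires that t divides n evenly. */
static TileOrigin tile_origin(int index, int n, int t)
{
    assert(t > 0 && n % t == 0);
    int tiles_per_side = n / t;
    assert(index >= 0 && index < tiles_per_side * tiles_per_side);

    TileOrigin origin;
    origin.x = t * (index / tiles_per_side);  /* which band of tiles */
    origin.y = t * (index % tiles_per_side);  /* position within the band */
    return origin;
}
```

For example, with n = 8 and t = 2 there are 16 tiles, and index 7 maps to the tile starting at (2, 6).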

Step 2: Initialization of lr the first time

    int lr[t+2,t+2], lb[t+2,t+2];
    // copy from lgrid and grid to lr
    lr[1:t,1:t] = lgrid[0:t-1,0:t-1];
    lr[0,1:t] = grid[myx,myy:myy+t-1];
    lr[t,1:t+1] = … ;
    lr[1:t,0] = …;
    lr[1:t,t+1] = …;
    lr[0,0] = lr[0,t+1] = lr[t+1,0] = lr[t+1,t+1] = W;  // don’t care about these

[Figure: grid, lgrid, and lr. The boundary of lr is ghost cells surrounding the t x t tile.]
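One way to picture the ghost-cell layout is to fill the whole (t+2) x (t+2) buffer from the global grid with wraparound indexing. This C sketch is an illustration under assumptions: `N`, `T`, and `load_tile_with_ghosts` are hypothetical, and unlike the slide it fills the four corner cells with real neighbor values instead of W, which is harmless because the corners are never read.

```c
#include <assert.h>

enum { N = 6, T = 3 };  /* hypothetical sizes; T must divide N */

/* Copy the T x T tile whose first cell is grid[x0][y0] into the interior
 * lr[1..T][1..T], and fill the one-cell border of lr with the neighboring
 * grid cells, wrapping around the edges of the board. */
static void load_tile_with_ghosts(int grid[N][N], int x0, int y0,
                                  int lr[T + 2][T + 2])
{
    assert(x0 >= 0 && x0 + T <= N && y0 >= 0 && y0 + T <= N);
    for (int i = 0; i < T + 2; i++) {
        for (int j = 0; j < T + 2; j++) {
            /* lr index i corresponds to grid index x0 + i - 1;
             * adding N before the modulo keeps the result non-negative. */
            int gx = (x0 + i - 1 + N) % N;
            int gy = (y0 + j - 1 + N) % N;
            lr[i][j] = grid[gx][gy];
        }
    }
}
```

The offset of one between `lr` and `lgrid` indices is the same shift that appears in `lr[1:t,1:t] = lgrid[0:t-1,0:t-1]`.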

Steps 4 & 5: Compute new positions of red elements and Communicate Boundary Values

    int lb[t+2,t+2];                     // same size as lr, including ghost cells
    lb[0:t+1,0:t+1] = W;
    for (i=0; i<t+1; i++) {
      for (j=0; j<t+1; j++) {
        if (lr[i,j] == B) lb[i,j] = B;                      // blue does not move in this half step
        if (lr[i,j] == R && lr[i+1,j] == W) lb[i+1,j] = R;  // red moves into the empty cell
        else if (lr[i,j] == R) lb[i,j] = R;                 // blocked red stays put
      }
    }
    barrier;
    lgrid[0:t-1,0:t-1] = lb[1:t,1:t];    // update local portion of global data structure
    barrier;
    // copy leftmost ghosts from grid
    if (myx-1 >= 0) lb[0,1:t] = grid[myx-1, 0:t-1];
    else            lb[0,1:t] = grid[n-1, 0:t-1];

Steps 6 & 7: Compute new positions of blue elements and communicate Boundary Values

    // lb holds the result of Steps 4 & 5; lr is reused as the destination
    lr[0:t+1,0:t+1] = W;
    for (i=0; i<t+1; i++) {
      for (j=0; j<t+1; j++) {
        if (lb[i,j] == R) lr[i,j] = R;                      // red does not move in this half step
        if (lb[i,j] == B && lb[i,j+1] == W) lr[i,j+1] = B;  // blue moves into the empty cell
        else if (lb[i,j] == B) lr[i,j] = B;                 // blocked blue stays put
      }
    }
    barrier;
    lgrid[0:t-1,0:t-1] = lr[1:t,1:t];    // update local portion of global data structure
    barrier;
    // copy top ghosts from grid
    if (myy-1 >= 0) lr[1:t,0] = grid[0:t-1, myy-1];
    else            lr[1:t,0] = grid[0:t-1, n-1];

Step 8: Compute locally if DONE

    int Rsum = 0, Bsum = 0, Wsum = 0;    // reset the counts on every iteration
    for (i=1; i<t+1; i++) {
      for (j=1; j<t+1; j++) {
        if (lr[i,j] == R) Rsum++;
        if (lr[i,j] == B) Bsum++;
        if (lr[i,j] == W) Wsum++;
      }
    }
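The local termination test can be sketched in C as follows. This is a hedged illustration, not the slide's exact test: it follows the problem statement's wording (more than c% of a tile's colored squares are one color), so white cells are excluded from the denominator, and treating an all-white tile as not done is an assumption. `T`, the color constants, and `tile_done` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum { T = 4 };                        /* hypothetical tile size */
enum { WHITE = 0, RED = 1, BLUE = 2 }; /* hypothetical color encoding */

/* Return true if more than c percent of the colored (non-white) cells in
 * the interior lr[1..T][1..T] are a single color. Ghost cells on the
 * border of lr are ignored. */
static bool tile_done(int lr[T + 2][T + 2], int c)
{
    int rsum = 0, bsum = 0;
    for (int i = 1; i < T + 1; i++) {
        for (int j = 1; j < T + 1; j++) {
            if (lr[i][j] == RED) {
                rsum++;
            } else if (lr[i][j] == BLUE) {
                bsum++;
            }
        }
    }
    int colored = rsum + bsum;
    if (colored == 0) {
        return false;  /* assumption: an empty tile never triggers termination */
    }
    /* Compare rsum/colored > c/100 using integers to avoid rounding. */
    return 100 * rsum > c * colored || 100 * bsum > c * colored;
}
```

In the parallel version, each thread would run a check like this on its own tile and set the shared `gdone` flag inside an `exclusive` block when it returns true.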