09/25/2012CS4230 CS4230 Parallel Programming Lecture 10: Red-Black, cont., and Data Parallel Algorithms Mary Hall September 25, 2012 1.

Slides:



Advertisements
Similar presentations
CS 101 Introductory Programming - Lecture 7: Loops In C & Good Coding Practices Presenter: Ankur Chattopadhyay.
Advertisements

Lecture 19: Parallel Algorithms
Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.
Sharks and Fishes – The problem
CS107 Introduction to Computer Science Lecture 3, 4 An Introduction to Algorithms: Loops.
Image Processing A brief introduction (by Edgar Alejandro Guerrero Arroyo)
CSCI-455/552 Introduction to High Performance Computing Lecture 26.
Parallelizing stencil computations Based on slides from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS267.
CISC October Goals for today: Foster’s parallel algorithm design –Partitioning –Task dependency graph Granularity Concurrency Collective communication.
CSE115: Introduction to Computer Science I Dr. Carl Alphonce 343 Davis Hall
CS 584. Review n Systems of equations and finite element methods are related.
CS 536 Spring Global Optimizations Lecture 23.
1 Lecture 25: Parallel Algorithms II Topics: matrix, graph, and sort algorithms Tuesday presentations:  Each group: 10 minutes  Describe the problem,
CS 376b Introduction to Computer Vision 02 / 27 / 2008 Instructor: Michael Eckmann.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.
1 Parallel Algorithms III Topics: graph and sort algorithms.
Prof. Fateman CS 164 Lecture 221 Global Optimization Lecture 22.
Thresholding Thresholding is usually the first step in any segmentation approach We have talked about simple single value thresholding already Single value.
Prof. Bodik CS 164 Lecture 16, Fall Global Optimization Lecture 16.
Computer Vision Spring ,-685 Instructor: S. Narasimhan WH 5409 T-R 10:30 – 11:50am.
L15: Putting it together: N-body (Ch. 6) October 30, 2012.
Lecture 8 – Stencil Pattern Stencil Pattern Parallel Computing CIS 410/510 Department of Computer and Information Science.
Today’s lecture 2-Dimensional indexing Color Format Thread Synchronization within for- loops Shared Memory Tiling Review example programs Using Printf.
09/21/2010CS4961 CS4961 Parallel Programming Lecture 9: Red/Blue and Introduction to Data Locality Mary Hall September 21,
Parallel Algorithms Patrick Cozzi University of Pennsylvania CIS Fall 2013.
CSE 501N Fall ‘09 12: Recursion and Recursive Algorithms 8 October 2009 Nick Leidenfrost.
CS 376b Introduction to Computer Vision 03 / 21 / 2008 Instructor: Michael Eckmann.
CSC 211 Data Structures Lecture 13
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M
SIMD Image Processor Eric Liskay Andrew Northy Neraj Kumar 1.
Algorithm Design.
10/02/2012CS4230 CS4230 Parallel Programming Lecture 11: Breaking Dependences and Task Parallel Algorithms Mary Hall October 2,
Instructor: S. Narasimhan
Lecture 4 TTH 03:30AM-04:45PM Dr. Jianjun Hu CSCE569 Parallel Computing University of South Carolina Department of.
CS344: Introduction to Artificial Intelligence (associated lab: CS386) Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 29: Perceptron training and.
Decomposition Data Decomposition – Dividing the data into subgroups and assigning each piece to different processors – Example: Embarrassingly parallel.
Parallel and Distributed Simulation Time Parallel Simulation.
CS 484. Iterative Methods n Gaussian elimination is considered to be a direct method to solve a system. n An indirect method produces a sequence of values.
Tessellations You will need two pieces of computer paper and a small piece of thick paper and your final paper.
1 Searching and Sorting Searching algorithms with simple arrays Sorting algorithms with simple arrays –Selection Sort –Insertion Sort –Bubble Sort –Quick.
CS4961 Parallel Programming Lecture 5: Data and Task Parallelism, cont
Data Structures and Algorithms in Parallel Computing
CS/EE 217 GPU Architecture and Parallel Programming Midterm Review
1 Arrays of Arrays An array can represent a collection of any type of object - including other arrays! The world is filled with examples Monthly magazine:
Dr. Naveed Riaz Design and Analysis of Algorithms 1 1 Formal Methods in Software Engineering Lecture # 26.
Fast Marching Algorithm & Minimal Paths Vida Movahedi Elder Lab, February 2010.
Conditionals-Mod8-part21 Conditionals – part 2 Edge Detection Barb Ericson Georgia Institute of Technology May 2007.
1 Chapter 11 Global Properties (Distributed Termination)
Course 3 Binary Image Binary Images have only two gray levels: “1” and “0”, i.e., black / white. —— save memory —— fast processing —— many features of.
09/20/2012CS4230 CS4230 Parallel Programming Lecture 9: Locality and Data Parallel Algorithms Mary Hall September 20,
08/23/2012CS4230 CS4230 Parallel Programming Lecture 2: Introduction to Parallel Algorithms Mary Hall August 23,
09/10/2010CS4961 CS4961 Parallel Programming Lecture 6: SIMD Parallelism in SSE-3 Mary Hall September 10,
Searching and Sorting Searching algorithms with simple arrays
High Altitude Low Opening?
CS4961 Parallel Programming Lecture 8: SIMD, cont
Computer Vision Lecture 13: Image Segmentation III
Sieve of Eratosthenes.
Computer Vision Lecture 12: Image Segmentation II
Lecture 22: Parallel Algorithms
Software Testing (Lecture 11-a)
Repetition Control Structure
Binary Search A binary search algorithm finds the position of a specified value within a sorted array. Binary search is a technique for searching an ordered.
Notes on Assignment 3 OpenMP Stencil Pattern
Solve and Graph 2x + 3 < 9 2x + 3 = x = x = 3
Patterns Paraguin Compiler Version 2.1.
Stencil Pattern ITCS 4/5145 Parallel computing, UNC-Charlotte, B. Wilkinson Oct 14, 2014 slides6b.ppt 1.
Stencil Pattern ITCS 4/5145 Parallel computing, UNC-Charlotte, B. Wilkinson Jan 28,
Introduction to High Performance Computing Lecture 16
Stencil Pattern ITCS 4/5145 Parallel computing, UNC-Charlotte, B. Wilkinson StencilPattern.ppt Oct 14,
Op Art Directions.
Presentation transcript:

09/25/2012CS4230 CS4230 Parallel Programming Lecture 10: Red-Black, cont., and Data Parallel Algorithms Mary Hall September 25,

Example Data Parallel Algorithm Problem 1 (#10 in Lin/Snyder on p. 111): The Red/Blue computation simulates two interactive flows. An n x n board is initialized so cells have one of three colors: red, white, and blue, where white is empty, red moves right, and blue moves down. Colors wrap around on the opposite side when reaching the edge. In the first half step of an iteration, any red color can move right one cell if the cell to the right is unoccupied (white). On the second half step, any blue color can move down one cell if the cell below it is unoccupied. The case where red vacates a cell (first half) and blue moves into it (second half) is okay. Viewing the board as overlaid with t x t tiles (where t divides n evenly), the computation terminates if any tile’s colored squares are more than c% one color. Use Peril-L to write a solution to the Red/Blue computation. 09/25/2012CS42302

Sequential Red-Black Computation int input[n,n], output[n,n]; done = FALSE; while (!done) { for (xx=0; xx<n; xx+=t) { for (yy=0; yy<n; yy+=t) { // red step for (x=xx; x<xx+t; x++) { for (y=yy; y<yy+t; y++) { output[x][y] = // red if white and left red } } input = &output; output[0:n-1][0:n-1] = “w”; 09/25/2012CS42303

Sequential Red-Black Computation, cont. // blue step for (x=xx; x<xx+t; x++) { for (y=yy; y<yy+t; y++) { output[x][y] = ?? } // Test for convergence for (x=xx; x<xx+t; x++) { for (y=yy; y<yy+t; y++) { case output[x][y] } 09/25/2012CS42304

Steps for “Localized” Solution Step 1. Partition global grid for n/t x n/t processors Step 2. Initialize half iteration (red) data structure Step 3. Iterate within each t x t tile until convergence (guaranteed?) Step 4. Compute new positions of red elts & copy blue elts Step 5. Communicate red boundary values Step 6. Compute new positions of blue elts Step 7. Communicate blue boundary values Step 8. Check locally if DONE 09/25/2012CS42305

Steps 1, 3 & 8. Partition Global Grid and Iterate int grid[n,n], lgrid[t,t]; boolean gdone = FALSE; int thr = (n/t)*(n/t); forall (index in(0..thr-1)) int myx = (n/t) * index/(n/t); int myy =(n/t) * (index % (n/t)); lgrid[] = localize(grid[myx,myy]); while (!gdone) { // Steps 3-7: compute new locations and determine if done if (Bsum > threshold || Wsum > threshold || Rsum > threshold) exclusive { gdone = TRUE; } barrier; } 09/25/2012CS42306

Step 2: Initialization of lr the first time int lr[t+2,t+2], lb[t+2,t+2]; // copy from lgrid and grid to lr lr[1:t,1:t] = lgrid[0:t-1,0:t-1]; lr[0,1:t] = grid[myx,myy:myy+t-1]; lr[t,1:t+1] = … ; lr[1:t,0] =…; lr[1:t,t+1] = ; lr[0,0] = lr[0,t+1] = lr[t+1,0] = lr[t+1,t+1] = W; // don’t care about these 09/25/2012CS42307 grid lgrid lr Boundary is ghost cells t

Steps 4 & 5: Compute new positions of red elts and Communicate Boundary Values int lb[t+1,t+1]; lb[0:t+1,0:t+1] = W; for (i=0; i<t+1; i++) { for (j=0; j<t+1; j++) { if (lr[i,j] == B) lb[i,j] = B; if (lr[i,j] == R && lr[i+1,j] = W) lb[i+1,j] = R; else lb[i,j] = R; } } barrier; lgrid[0:t-1,0:t-1] = lb[1:t,1:t]; //update local portion of global ds barrier; //copy leftmost ghosts from grid if (myx-1 >= 0) lb[0,1:t] = grid[myx-1, 0:t-1]; else lb[0,1:t] = grid[n-1,0:t-1]; 09/25/2012CS42308

Steps 6 & 7: Compute new positions of blue elts and communicate Boundary Values int lb[t+1,t+1]; lr[0:t+1,0:t+1] = W; for (i=0; i<t+1; i++) { for (j=0; j<t+1; j++) { if (lb[i,j] == R) then lr[i,j] = R; if (lb[i,j] == B && lb[i,j+1] = W) lr[i,j+1] = B; else lr[i,j] = B; } } barrier; lgrid[0:t-1,0:t-1] = lr[1:t,1:t]; //update local portion of global ds barrier; //copy top ghosts from grid if (myy-1 >= 0) lr[1:t,0] = grid[0:t-1,myy-1]; else lr[1:t,0] = grid[0:t-1,0]; 09/25/2012CS42309

Step 8: Compute locally if DONE for (i=1; i<t+1; i++) { for (j=1; j<t+1; j++) { if (lr[i,j] == R) then Rsum++; if (lr[i,j] == B) then Bsum++; if (lr[i,j] == W) then Wsum++; } 09/25/2012CS423010

Introduction to Stencils This pattern of looking around your boundary is called a stencil Common in numerical applications Also, common in image and signal processing 09/25/2012CS423011

Example Stencil: Sobel Edge Detection Sobel edge detection: Find the boundaries of the image where there is significant difference as compared to neighboring “pixels” and replace values to find edges for (i = 1; i < ImageNRows - 1; i++) for (j = 1; j < ImageNCols -1; j++) { sum1 = u[i-1][j+1] - u[i-1][j-1] + 2 * u[i][j+1] - 2 * u[i][j-1] + u[i+1][j+1] - u[i+1][j-1]; sum2 = u[i-1][j-1] + 2 * u[i-1][j] + u[i-1][j+1] - u[i+1][j-1] - 2 * u[i+1][j] - u[i+1][j+1]; magnitude = sum1*sum1 + sum2*sum2; if (magnitude > THRESHOLD) e[i][j] = 255; else e[i][j] = 0; } I  J  I  J  sum2 only sum1 only both UE