Embarrassingly Parallel Computations

Embarrassingly Parallel Computations Chapter 3

1. Ideal Parallel Computation In this chapter we consider applications in which there is minimal interaction between the slave processes. Slaves may be statically or dynamically assigned. Load-balancing techniques can improve performance; simple load balancing is introduced here, while Chapter 7 covers load balancing when the slaves must interact.

1. Ideal Parallel Computation Parallelizing such applications is straightforward and requires no special techniques or algorithms. A truly embarrassingly parallel computation requires no communication at all (the SPMD model).

1. Ideal Parallel Computation Nearly embarrassingly parallel computations require the input data to be distributed and the results collected in some way. A common approach is the master-slave organization.
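
As an aside (not part of the original slides), a minimal MPI sketch of this master-slave, scatter-compute-gather pattern might look as follows; the work() function, the array size N, and the assumption that N divides evenly among the processes are all illustrative.

    #include <mpi.h>
    #include <stdio.h>

    #define N 1024                              /* total number of work items (hypothetical) */

    double work(double x) { return x * x; }     /* placeholder for the independent computation */

    int main(int argc, char *argv[])
    {
        int rank, nprocs;
        static double input[N], output[N];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int chunk = N / nprocs;                 /* assume nprocs divides N evenly */
        double local_in[N], local_out[N];

        if (rank == 0)                          /* master fills the input data */
            for (int i = 0; i < N; i++) input[i] = (double) i;

        /* distribute the data, compute independently, collect the results */
        MPI_Scatter(input, chunk, MPI_DOUBLE, local_in, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        for (int i = 0; i < chunk; i++)
            local_out[i] = work(local_in[i]);
        MPI_Gather(local_out, chunk, MPI_DOUBLE, output, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if (rank == 0) printf("first result: %f\n", output[0]);
        MPI_Finalize();
        return 0;
    }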

2. Embarrassingly Parallel Examples 2.1 Geometrical Transformation of Images The most basic way to store a computer image is a pixmap, in which each pixel is stored in a two-dimensional array. Geometrical transformations perform mathematical operations on the coordinates of each pixel without affecting its value.

2.1 Geometrical Transformation of Images There are several different transformations. Shifting: the object is shifted Δx in the x dimension and Δy in the y dimension: x' = x + Δx, y' = y + Δy, where x and y are the original coordinates and x' and y' are the new coordinates. Scaling: the object is scaled by a factor Sx in the x direction and Sy in the y direction: x' = x·Sx, y' = y·Sy.

2.1 Geometrical Transformation of Images Rotation: the object is rotated through an angle θ about the origin of the coordinate system: x' = x cos θ + y sin θ, y' = −x sin θ + y cos θ. Clipping: deletes from the displayed picture those points that lie outside a defined area.
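
For concreteness, a small illustrative C helper (not from the original slides) that applies these coordinate transformations to a single pixel might be:

    #include <math.h>

    /* Apply a shift by (dx, dy), a scale by (sx, sy), or a rotation by theta
       (in radians) to the pixel coordinate (x, y); names are illustrative. */
    void shift(int x, int y, int dx, int dy, int *xp, int *yp) {
        *xp = x + dx;
        *yp = y + dy;
    }
    void scale(int x, int y, double sx, double sy, int *xp, int *yp) {
        *xp = (int)(x * sx);
        *yp = (int)(y * sy);
    }
    void rotate(int x, int y, double theta, int *xp, int *yp) {
        *xp = (int)( x * cos(theta) + y * sin(theta));
        *yp = (int)(-x * sin(theta) + y * cos(theta));
    }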

2.1 Geometrical Transformation of Images The input data is the pixmap (bitmap), typically held in a file and then copied into an array. The master is concerned with dividing the array among the processors. There are two common partitionings: blocks and strips.

2.1 Geometrical Transformation of Images [Figures: the pixmap partitioned into square blocks and into row strips]

2.1 Geometrical Transformation of Images: Master – pseudocode

for (i = 0, row = 0; i < 48; i++, row = row + 10)   /* for each process */
    send(row, Pi);                                  /* send row no. */
for (i = 0; i < 480; i++)                           /* initialize temp */
    for (j = 0; j < 640; j++)
        temp_map[i][j] = 0;
for (i = 0; i < (640 * 480); i++) {                 /* for each pixel */
    recv(oldrow, oldcol, newrow, newcol, PANY);     /* accept new coords */
    if (!((newrow < 0) || (newrow >= 480) || (newcol < 0) || (newcol >= 640)))
        temp_map[newrow][newcol] = map[oldrow][oldcol];
}
for (i = 0; i < 480; i++)                           /* update bitmap */
    for (j = 0; j < 640; j++)
        map[i][j] = temp_map[i][j];

2.1 Geometrical Transformation of Images: Slave – pseudocode

recv(row, Pmaster);                                 /* receive row no. */
for (oldrow = row; oldrow < (row + 10); oldrow++)
    for (oldcol = 0; oldcol < 640; oldcol++) {      /* transform coords */
        newrow = oldrow + delta_x;                  /* shift in x direction */
        newcol = oldcol + delta_y;                  /* shift in y direction */
        send(oldrow, oldcol, newrow, newcol, Pmaster);  /* coords to master */
    }

2.1 Geometrical Transformation of Images: Analysis Sequential: if each pixel requires one time step, then ts = n², so the time complexity is O(n²). Recall that for the parallel version tp = tcomp + tcomm, where tcomm = tstartup + m·tdata = O(m) and tcomp = 2(n²/p) = O(n²/p), giving tp = O(n²). What's the problem?

2.1 Geometrical Transformation of Images: Analysis The constant factors in the communication term (parallel) far exceed those in the computation term (sequential), which makes the message-passing version relatively impractical. This problem is probably better suited to a shared-memory system.
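
A rough worked example with assumed, illustrative constants (not from the original slides) makes the point. Suppose tstartup ≈ 10,000 arithmetic-step equivalents and tdata ≈ 50, and note that the pseudocode above sends one 4-value message per pixel of a 640 × 480 image. Then

    tcomm ≈ 640 × 480 × (10,000 + 4 × 50) ≈ 3.1 × 10^9 steps
    tcomp ≈ 2 × 640 × 480 / p ≈ 6.1 × 10^5 / p steps

so no realistic number of processes p makes the message-passing version competitive with the O(n²) sequential time.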

2.2 Mandelbrot Set Definition: the set of points in the complex plane that are quasi-stable when computed by iterating the function z_{k+1} = z_k² + c, where z and c are complex numbers. The initial value of z is 0. Iteration continues until the magnitude of z exceeds 2 or the number of iterations reaches some limit.

2.2 Mandelbrot Set [Figure: image of the Mandelbrot set]

2.2 Mandelbrot Set: Sequential

int cal_pixel(complex c)
{
    int count, max;
    complex z;
    float temp, lengthsq;

    max = 256;
    z.real = 0; z.imag = 0;
    count = 0;                                      /* number of iterations */
    do {
        temp = z.real * z.real - z.imag * z.imag + c.real;
        z.imag = 2 * z.real * z.imag + c.imag;
        z.real = temp;
        lengthsq = z.real * z.real + z.imag * z.imag;
        count++;
    } while ((lengthsq < 4.0) && (count < max));
    return count;
}

2.2 Mandelbrot Set: Parallel Static Task Assignment – Master

for (i = 0, row = 0; i < 48; i++, row = row + 10)   /* for each process */
    send(&row, Pi);                                 /* send row no. */
for (i = 0; i < (480 * 640); i++) {                 /* from processes, any order */
    recv(&c, &color, PANY);                         /* receive coords & colors */
    display(c, color);                              /* display pixel on screen */
}

2.2 Mandelbrot Set: Slave

recv(&row, Pmaster);                                /* receive row no. */
for (x = 0; x < disp_width; x++)                    /* screen coordinates x and y */
    for (y = row; y < (row + 10); y++) {
        c.real = min_real + ((float) x * scale_real);
        c.imag = min_imag + ((float) y * scale_imag);
        color = cal_pixel(c);
        send(&c, &color, Pmaster);                  /* send coords, color to master */
    }

2.2 Mandelbrot Set Dynamic Task Assignment – Work Pool / Processor Farm

Master

count = 0;                                  /* counter for termination */
row = 0;                                    /* row being sent */
for (k = 0; k < procno; k++) {              /* assuming procno < disp_height */
    send(&row, Pk, data_tag);               /* send initial row to process */
    count++;                                /* count rows sent */
    row++;                                  /* next row */
}
do {
    recv(&slave, &r, color, PANY, result_tag);
    count--;                                /* reduce count as rows received */
    if (row < disp_height) {
        send(&row, Pslave, data_tag);       /* send next row */
        count++;
        row++;
    } else
        send(&row, Pslave, terminator_tag); /* terminate */
    rows_recv++;
    display(r, color);                      /* display row */
} while (count > 0);

2.2 Mandelbrot Set: Slave

recv(&y, Pmaster, ANYTAG, source_tag);      /* receive 1st row to compute */
while (source_tag == data_tag) {
    c.imag = imag_min + ((float) y * scale_imag);
    for (x = 0; x < disp_width; x++) {      /* compute row colors */
        c.real = real_min + ((float) x * scale_real);
        color[x] = cal_pixel(c);
    }
    send(&i, &y, color, Pmaster, result_tag);   /* row colors to master (i = slave id) */
    recv(&y, Pmaster, source_tag);          /* receive next row */
}

2.2 Mandelbrot Set [Figure]

2.2 Mandelbrot Set: Analysis Sequential: ts = max × n = O(n), where max is the iteration limit and n the number of pixels. Parallel: Phase 1, communication out: tcomm1 = s(tstartup + tdata); it may be possible to use a scatter routine here, which would cut the number of startup times. Phase 2, computation: (max × n)/s. Phase 3, communication back in: tcomm2 = (n/s)(tstartup + tdata). Overall: tp ≤ (max × n)/s + (n/s + s)(tstartup + tdata).

2.3 Monte Carlo Methods The basis of Monte Carlo methods is the use of random selections in calculations. Example: calculate π. A circle of unit radius is formed within a square; the square therefore has sides of length 2 and area 4.

2.3 Monte Carlo Methods [Figure: unit circle inscribed in a 2 × 2 square]

2.3 Monte Carlo Methods The ratio of the area of the circle to the area of the square is (πr²)/(2 × 2) = π/4. Points within the square are chosen randomly, and a score is kept of how many of them lie within the circle. Given a sufficient number of randomly selected samples, the fraction of points within the circle approaches π/4.
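
A minimal sequential sketch of this estimator (assumed details: the C library rand() as the random source and 1,000,000 samples) is shown below; since every sample is independent, the loop divides trivially among processes.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const long samples = 1000000;           /* number of random points (illustrative) */
        long in_circle = 0;

        for (long i = 0; i < samples; i++) {
            /* random point in the square [-1,1] x [-1,1] */
            double x = 2.0 * rand() / RAND_MAX - 1.0;
            double y = 2.0 * rand() / RAND_MAX - 1.0;
            if (x * x + y * y <= 1.0)           /* inside the unit circle? */
                in_circle++;
        }
        printf("pi is approximately %f\n", 4.0 * in_circle / samples);
        return 0;
    }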

2.3 Monte Carlo Methods Random Number Generation The most popular way of creating a pseudorandom number sequence x_1, x_2, x_3, …, x_{i−1}, x_i, x_{i+1}, …, x_{n−1}, x_n is to evaluate x_{i+1} from a carefully chosen function of x_i, often of the form x_{i+1} = (a·x_i + c) mod m, where a, c, and m are constants chosen to create a sequence with properties similar to those of truly random sequences.
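
A small illustrative sketch of such a generator follows; the constants a = 16807, c = 0, m = 2^31 − 1 are the well-known Park-Miller "minimal standard" choice, used here only as an example.

    #include <stdio.h>

    /* Linear congruential generator x_{i+1} = (a*x_i + c) mod m. */
    static unsigned long long seed = 1;         /* example starting seed */

    unsigned long lcg_next(void)
    {
        const unsigned long long a = 16807, c = 0, m = 2147483647ULL;
        seed = (a * seed + c) % m;
        return (unsigned long) seed;
    }

    int main(void)
    {
        for (int i = 0; i < 5; i++)             /* print the first few values */
            printf("%lu\n", lcg_next());
        return 0;
    }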

2.3 Monte Carlo Methods Parallel Random Number Generation It turns out that if x_{i+1} = (a·x_i + c) mod m, then x_{i+k} = (A·x_i + C) mod m, where A = a^k mod m, C = c(a^{k−1} + a^{k−2} + … + a + 1) mod m, and k is a selected "jump" constant.
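
A small sketch (with assumed example constants and k = 4) checking that one step of the jumped recurrence equals k steps of the base generator; this is what lets each of k processes generate its own interleaved subsequence without communication.

    #include <stdio.h>

    #define M  2147483647ULL         /* modulus m (example value, 2^31 - 1) */
    #define A0 16807ULL              /* multiplier a (example value) */
    #define C0 12345ULL              /* increment c (example value) */

    /* one step of the base generator: x_{i+1} = (a*x_i + c) mod m */
    unsigned long long step(unsigned long long x)
    {
        return (A0 * x + C0) % M;
    }

    int main(void)
    {
        const int k = 4;             /* number of processes / jump distance */

        /* A = a^k mod m and C = c(a^{k-1} + ... + a + 1) mod m */
        unsigned long long Ak = 1, Ck = 0;
        for (int i = 0; i < k; i++) {
            Ck = (Ck + Ak) % M;      /* accumulate a^0 + a^1 + ... + a^{k-1} */
            Ak = (Ak * A0) % M;
        }
        Ck = (Ck * C0) % M;

        unsigned long long x = 1, jumped = 1;    /* both start at x_0 = 1 */
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < k; j++) x = step(x);   /* k base steps ...      */
            jumped = (Ak * jumped + Ck) % M;           /* ... equal one jumped step */
            printf("x_%d: base = %llu, jumped = %llu\n", (i + 1) * k, x, jumped);
        }
        return 0;
    }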

5. Summary This chapter introduced the following concepts:
An ideal embarrassingly parallel computation
Embarrassingly parallel problems and their analyses
Partitioning a two-dimensional data set
The work-pool approach to achieving load balancing
The counter termination algorithm
Monte Carlo methods
Parallel random number generation