COMPE472 Parallel Computing Embarrassingly Parallel Computations Partitioning and Divide-and-Conquer Strategies Pipelined Computations Synchronous Computations.

Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

Parallel Techniques
- Embarrassingly Parallel Computations
- Partitioning and Divide-and-Conquer Strategies
- Pipelined Computations
- Synchronous Computations
- Asynchronous Computations
- Load Balancing and Termination Detection
3.1

Chapter 3: Embarrassingly Parallel Computations 3.2

Embarrassingly Parallel Computations
A computation that can obviously be divided into a number of completely independent parts, each of which can be executed by a separate process(or).
- No communication, or very little communication, between processes
- Each process can do its tasks without any interaction with the other processes
3.3

Practical embarrassingly parallel computation with static process creation and master-slave approach 3.4

Practical embarrassingly parallel computation with dynamic process creation and master-slave approach 3.5

Embarrassingly Parallel Computation Examples
- Low-level image processing
- Mandelbrot set
- Monte Carlo calculations
3.6

Low-level image processing
Many low-level image processing operations involve only local data, with very limited communication, if any, between areas of interest.
- Grayscale: 8 bits represent 256 gray levels
- An array of 8-bit values may represent a figure (bitmap)
3.7

Some geometrical operations
Shifting (Kaydırma)
Object shifted by Δx in the x-dimension and Δy in the y-dimension:
   x′ = x + Δx
   y′ = y + Δy
where x and y are the original and x′ and y′ are the new coordinates.
Scaling (Ölçekleme)
Object scaled by a factor Sx in the x-direction and Sy in the y-direction:
   x′ = x Sx
   y′ = y Sy
3.8

Rotation (Döndürme)
Object rotated through an angle θ about the origin of the coordinate system:
   x′ = x cos θ + y sin θ
   y′ = -x sin θ + y cos θ
Input: a bitmap representation of a figure (array)
Parallelism: the bitmap can be divided into pixel groups, with the assumption that #pixels > #processors
3.8

Partitioning into regions for individual processes
A square region for each process (can also use strips): 48 regions of 80 × 80 pixels each.
3.9

1 master + 48 slaves

Parallel Shifting
1. Distribute first row number to the slaves
2. Slaves do the shifting for the rows assigned and send the shifted bitmap back to the master
3. Master collects and displays the shifted figure

Algorithm par_shift
Given a bitmap map[][], produce the shifted bitmap in temp_map[][].

MASTER:
for (i = 0, row = 0; i < 48; i++, row = row + 10)   /* assign 10 rows per slave */
   send(row, Pi);
for (i = 0; i < 480; i++)                           /* initialize temp_map */
   for (j = 0; j < 640; j++)
      temp_map[i][j] = 0;
for (i = 0; i < 640 * 480; i++) {                   /* receive pixel coordinates */
   recv(oldrow, oldcol, newrow, newcol, Pany);
   if (!(newrow >= 480 || newcol >= 640))
      temp_map[newrow][newcol] = map[oldrow][oldcol];
}

SLAVE:
recv(row, Pmaster);
for (oldrow = row; oldrow < row + 10; oldrow++)
   for (oldcol = 0; oldcol < 640; oldcol++) {
      newrow = oldrow + delta_x;
      newcol = oldcol + delta_y;
      send(oldrow, oldcol, newrow, newcol, Pmaster);
   }

Complexity analysis
For each pixel, 2 computational steps are required.
Serial computation:
   t_s = 2n² = O(n²)
Parallel computation:
   t_p = t_comm + t_comp
   t_comm = p(t_startup + t_data) + n²(t_startup + 4 t_data) = O(p + n²)
   t_comp = 2(n²/p) = O(n²/p)
   t_p = O(n²), since p is a constant

Mandelbrot Set
Set of points in a complex plane that are quasi-stable (will increase and decrease, but not exceed some limit) when computed by iterating the function
   z_{k+1} = z_k² + c
where z_{k+1} is the (k + 1)th iteration of the complex number z = a + bi and c is a complex number giving the position of the point in the complex plane. The initial value for z is zero.
Iterations are continued until the magnitude of z is greater than 2 or the number of iterations reaches an arbitrary limit.
The magnitude of z is the length of the vector, given by
   |z| = √(a² + b²)
3.10

Sequential routine computing the value of one point, returning the number of iterations:

struct complex {
   float real;
   float imag;
};

int cal_pixel(struct complex c)
{
   int count, max;
   struct complex z;
   float temp, lengthsq;
   max = 256;
   z.real = 0;
   z.imag = 0;
   count = 0;                       /* number of iterations */
   do {
      temp = z.real * z.real - z.imag * z.imag + c.real;
      z.imag = 2 * z.real * z.imag + c.imag;
      z.real = temp;
      lengthsq = z.real * z.real + z.imag * z.imag;
      count++;
   } while ((lengthsq < 4.0) && (count < max));
   return count;
}
3.11

Mandelbrot set 3.12

Parallelizing the Mandelbrot Set Computation
Static Task Assignment
Simply divide the region into a fixed number of parts, each computed by a separate processor.
Not very successful, because different regions require different numbers of iterations and hence different amounts of time.
3.13

Static Master

for (i = 0, row = 0; i < 48; i++, row = row + 10)   /* assign 10 rows per slave */
   send(&row, Pi);
for (i = 0; i < 480 * 640; i++) {                   /* receive pixel colors */
   recv(&c, &color, Pany);
   display(c, color);
}

Static Slave (process i)

recv(&row, Pmaster);
for (x = 0; x < disp_width; x++)
   for (y = row; y < row + 10; y++) {
      c.real = min_real + x * scale_real;
      c.imag = min_imag + y * scale_imag;
      color = cal_pixel(c);
      send(&c, &color, Pmaster);
   }

Dynamic Task Assignment
Have processors request new regions after computing their previous regions.

Dynamic Task Assignment: Work Pool/Processor Farms 3.14

Work Pool Master

count = 0;                            /* number of rows outstanding */
row = 0;
for (k = 0; k < num_proc; k++) {      /* send one row to each slave */
   send(&row, Pk, data_tag);
   count++;
   row++;
}
do {
   recv(&slave, &r, &color, Pany, result_tag);
   count--;
   if (row < disp_height) {
      send(&row, Pslave, data_tag);   /* send next row */
      row++;
      count++;
   } else
      send(&row, Pslave, terminator_tag);
   display(r, color);
} while (count > 0);

Slave

recv(y, Pmaster, source_tag);
while (source_tag == data_tag) {
   c.imag = imag_min + ((float) y * scale_imag);
   for (x = 0; x < disp_width; x++) {
      c.real = real_min + ((float) x * scale_real);
      color[x] = cal_pixel(c);
   }
   send(&x, &y, color, Pmaster, result_tag);
   recv(&y, Pmaster, source_tag);
}

Monte Carlo Methods
Another embarrassingly parallel computation.
Monte Carlo methods make use of random selections.
3.15

Circle formed within a 2 × 2 square. The ratio of the area of the circle to the area of the square is given by
   (π × 1²) / (2 × 2) = π/4
Points within the square are chosen randomly, and a score is kept of how many happen to lie within the circle. The fraction of points within the circle will be π/4, given a sufficient number of randomly selected samples.
3.16


Computing an Integral
One quadrant can be described by the integral
   ∫₀¹ √(1 − x²) dx = π/4
Random pairs of numbers (x_r, y_r) are generated, each between 0 and 1, and counted as in the circle if
   x_r² + y_r² ≤ 1
3.18

Alternative (better) Method
Use random values of x to compute f(x) and sum the values of f(x):
   area = lim_{N→∞} (1/N) Σ f(x_r) (x2 - x1)
where the x_r are randomly generated values of x between x1 and x2.
The Monte Carlo method is very useful if the function cannot be integrated numerically (perhaps because it has a large number of variables).
3.19

Example
Computing the integral
   I = ∫_{x1}^{x2} (x² - 3x) dx

Sequential code:

sum = 0;
for (i = 0; i < N; i++) {            /* N random samples */
   xr = rand_v(x1, x2);              /* generate next random value */
   sum = sum + xr * xr - 3 * xr;     /* compute f(xr) */
}
area = (sum / N) * (x2 - x1);

Routine rand_v(x1, x2) returns a pseudorandom number between x1 and x2.

Parallel Monte Carlo (MASTER)

for (i = 0; i < N/n; i++) {            /* hand out N samples, n at a time */
   for (j = 0; j < n; j++)
      xr[j] = rand();                  /* generate n randoms */
   recv(Pany, req_tag, Psource);       /* wait for a slave's request */
   send(xr, &n, Psource, compute_tag);
}
for (i = 0; i < num_slaves; i++) {     /* tell every slave to stop */
   recv(Pi, req_tag);
   send(Pi, stop_tag);
}
sum = 0;
reduce_add(&sum, Pgroup);
3.21

SLAVES

sum = 0;
send(Pmaster, req_tag);
recv(xr, &n, Pmaster, source_tag);
while (source_tag == compute_tag) {
   for (i = 0; i < n; i++)
      sum = sum + xr[i] * xr[i] - 3 * xr[i];
   send(Pmaster, req_tag);
   recv(xr, &n, Pmaster, source_tag);
}
reduce_add(&sum, Pgroup);

Assignment 4
Part I: Implement the sequential Mandelbrot algorithm discussed in the lecture.
Due date: