Embarrassingly Parallel


Embarrassingly Parallel Geoffrey Fox: not embarrassing, rather ideal; he meant "embarrassed" as in embarrassed to submit a paper on something so simple. No communication, except to distribute and gather the data. Best speedup possibilities: linear and sometimes superlinear.

Embarrassingly Parallel [figure: processes working independently; the only communication is a send/recv pair to distribute the data and a send/recv pair to gather the results]

Example: Geometrical Image Transformation Shift, scale, rotate, clip, etc. For rotation by angle θ: x' = x cos θ + y sin θ and y' = -x sin θ + y cos θ. The transformation of a pixel is independent of all other pixel transformations, so we need only consider how to partition the problem.
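A minimal C sketch of one such transformation, rotation by θ (the image layout, the clipping behaviour, and the function name are illustrative assumptions, not from the slides):

#include <math.h>

/* Rotate every pixel of a W x H image about the origin by angle theta.
 * Sketch only: images are stored row-major in in[] and out[]; pixels that
 * land outside the frame are simply clipped. */
void rotate_image(const unsigned char *in, unsigned char *out,
                  int W, int H, double theta)
{
    double c = cos(theta), s = sin(theta);
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            /* each (x, y) is transformed independently of every other pixel,
             * so these iterations could be handed to separate processors */
            int xp = (int)( x * c + y * s);
            int yp = (int)(-x * s + y * c);
            if (xp >= 0 && xp < W && yp >= 0 && yp < H)
                out[yp * W + xp] = in[y * W + x];
        }
    }
}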

Mandelbrot set
for (x = 0; x < disp_width; x++)
  for (y = 0; y < disp_height; y++) {
    real  = rmin + x * realscale;
    imag  = imin + y * imagscale;
    color = CalcPixel(real, imag);
    display(x, y, color);
  }
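CalcPixel is not defined on the slide; a plausible sketch is the standard escape-time iteration (the iteration cap and the mapping from count to colour are assumptions):

/* Escape-time count for the Mandelbrot set at the point c = real + imag*i.
 * Returns the number of iterations before |z| exceeds 2, capped at MAXITER;
 * the caller maps this count to a display colour. */
#define MAXITER 256

int CalcPixel(double real, double imag)
{
    double zr = 0.0, zi = 0.0;
    int iter = 0;
    while (zr * zr + zi * zi < 4.0 && iter < MAXITER) {
        double tmp = zr * zr - zi * zi + real;   /* z = z*z + c */
        zi = 2.0 * zr * zi + imag;
        zr = tmp;
        iter++;
    }
    return iter;
}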

Partitioning Schemes One pixel per process? One row per process? One column per process? What formula maps pixels or rows to processes?

Partitioning Schemes
if (iproc == 0)                         /* master: send one block of rows to each process */
  for (p = 0, n = 0; n < nrows; n += nrows/nproc)
    send(pict[n], p++);
else
  recv(pict, 0);                        /* worker: receive its block of rows */
for (x = 0; x < nrows/nproc; x++)       /* every process then computes its own rows */
  for (y = 0; y < ncols; y++)
    pict[x][y] = ... ;                  /* compute this pixel */
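An MPI rendering of the same row-block distribution might look like the sketch below; the flat row-major array, the tag value, and the assumption that nrows divides evenly by nproc are all illustrative.

#include <mpi.h>

/* Sketch: rank 0 sends one block of rows to each other rank; every rank
 * then works on its own nrows/nproc rows.  Assumes nrows % nproc == 0. */
void distribute_rows(int *pict, int nrows, int ncols, int iproc, int nproc)
{
    int block = (nrows / nproc) * ncols;          /* ints per process */
    if (iproc == 0) {
        for (int p = 1; p < nproc; p++)           /* rank 0 keeps block 0 */
            MPI_Send(pict + p * block, block, MPI_INT, p, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(pict, block, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    /* ... each process now computes its own rows of the image ... */
}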

Partitioning Schemes pixel/section per process? How balanced and efficient is the algorithm?

Partitioning Schemes Static partitioning is simple, but it could be imbalanced. Also think about memory requirements: must it all fit on a single node?

Dynamic Partitioning Schemes Give work to processes as needed Generally Master-Worker

Master-Worker Pseudocode
Master:
  create work pool
  while (there is work)
    recv(message)
    if (message is a work request) send(work)
    else recv(answer)
  send poison pill to all workers
Worker:
  send(work request)
  recv(work)
  while (there is work)
    process work
    send(results coming)
    send(results)
    send(work request)
    recv(work)
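A compact MPI realisation of this pattern could look as follows; the message tags, the integer work items, and the square-the-number "work" are purely illustrative stand-ins.

#include <mpi.h>

#define TAG_REQUEST 1
#define TAG_WORK    2
#define TAG_RESULT  3
#define TAG_STOP    4   /* the poison pill */

/* Master (rank 0): hand out ntasks work items on request, collect results,
 * then answer every further request with a stop message. */
void master(int nworkers, int ntasks)
{
    int next = 0, results = 0, stopped = 0, buf;
    MPI_Status st;
    while (results < ntasks || stopped < nworkers) {
        MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_REQUEST) {
            if (next < ntasks) {                  /* send the next task */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                next++;
            } else {                              /* no work left: poison pill */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                stopped++;
            }
        } else {
            results++;                            /* answer arrived in buf */
        }
    }
}

/* Worker: keep requesting work until the poison pill arrives. */
void worker(void)
{
    int task, result, request = 0;
    MPI_Status st;
    for (;;) {
        MPI_Send(&request, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
        MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_STOP) break;
        result = task * task;                     /* stand-in for real work */
        MPI_Send(&result, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
    }
}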

Monte Carlo Methods Use of random sampling / statistical properties to perform calculations. Example: compute pi. A circle of unit radius inscribed in a 2x2 square has area pi; the square has area 4. Randomly pick points in the square and determine whether each lies inside or outside the circle; the fraction of points inside approaches pi/4.
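A minimal serial sketch of this estimate (the use of the C library rand() and the sample count are illustrative):

#include <stdio.h>
#include <stdlib.h>

/* Sample points uniformly in the unit square and count how many fall inside
 * the quarter circle of radius 1; the ratio approaches pi/4. */
int main(void)
{
    long N = 10000000, inside = 0;
    for (long i = 0; i < N; i++) {
        double x = rand() / (double)RAND_MAX;
        double y = rand() / (double)RAND_MAX;
        if (x * x + y * y <= 1.0)
            inside++;
    }
    printf("pi ~ %f\n", 4.0 * inside / N);
    return 0;
}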

Monte Carlo Methods Area ratios can be inefficient; an alternative for integrals is the mean-value approach:
sum = 0
for (i = 0; i < N; i++)
  xr = random value between x1 and x2
  sum += f(xr)
area = (x2 - x1) * sum / N
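The same pseudocode as a small C function (the function name and the use of rand() are assumptions):

#include <stdlib.h>

/* Mean-value Monte Carlo estimate of the integral of f over [x1, x2]:
 * average f at N uniformly random points and scale by the interval length. */
double mc_integrate(double (*f)(double), double x1, double x2, long N)
{
    double sum = 0.0;
    for (long i = 0; i < N; i++) {
        double xr = x1 + (x2 - x1) * (rand() / (double)RAND_MAX);
        sum += f(xr);
    }
    return (x2 - x1) * sum / N;
}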

Parallel Monte Carlo Very simple, static approach: everybody computes N values and a local sum; gather and add up the sums; print out the answer. Highly efficient, but is it the best way?
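This static scheme maps directly onto a reduction; a sketch in C with MPI, reusing the pi example (seeding each rank with srand(rank + 1) is a naive illustrative choice that the next slides call into question):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Every rank generates N points and counts its own hits; MPI_Reduce adds the
 * counts on rank 0, which prints the answer. */
int main(int argc, char **argv)
{
    int rank, size;
    long N = 1000000, local = 0, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    srand(rank + 1);                              /* naive per-rank seeding */
    for (long i = 0; i < N; i++) {
        double x = rand() / (double)RAND_MAX;
        double y = rand() / (double)RAND_MAX;
        if (x * x + y * y <= 1.0) local++;
    }
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~ %f\n", 4.0 * total / ((double)N * size));
    MPI_Finalize();
    return 0;
}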

Parallel Monte Carlo All Monte Carlo calculations depend on a uniform random number sequence. Choices in parallel: a single random-number server process, or everybody generates their own (be careful). Is the combination of P random number sequences necessarily uniform? How does it compare with what is done on a single processor?

Parallel Random Number Generation General form of the (linear congruential) random number calculation: x_(i+1) = (a * x_i + c) mod m. It can be shown that x_(i+k) = (A * x_i + C) mod m, where A = a^k mod m, C = c * (a^(k-1) + a^(k-2) + ... + a + 1) mod m, and k is the jump constant. A and C only need to be computed once.

Parallel Random Number Generation Given k processors, generate the first k numbers sequentially using the normal formula; each processor then uses one of these as its starting value and leaps ahead k steps at a time:
p1: x1, x5, x9, ...
p2: x2, x6, x10, ...
p3: x3, x7, x11, ...
p4: x4, x8, x12, ...
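A self-contained sketch of the leapfrog scheme, simulating the k processes inside one program (the LCG constants a, c, m and the seed are illustrative, not taken from the slides):

#include <stdio.h>

#define A0 1103515245ULL   /* a: illustrative multiplier */
#define C0 12345ULL        /* c: illustrative increment  */
#define M  (1ULL << 31)    /* m: illustrative modulus    */

typedef unsigned long long ull;

/* one step of the sequential generator x' = (a*x + c) mod m */
static ull next_seq(ull x) { return (A0 * x + C0) % M; }

int main(void)
{
    int k = 4;                          /* number of processes (simulated) */
    ull A = 1, C = 0, seed = 12345, x[4];

    /* jump constants, computed once: A = a^k mod m,
     * C = c*(a^(k-1) + ... + a + 1) mod m */
    for (int i = 0; i < k; i++) {
        C = (A0 * C + C0) % M;
        A = (A * A0) % M;
    }

    /* the first k values are generated sequentially, one per process */
    x[0] = next_seq(seed);
    for (int p = 1; p < k; p++) x[p] = next_seq(x[p - 1]);

    /* each "process" then jumps k steps at a time, independently */
    for (int step = 0; step < 3; step++)
        for (int p = 0; p < k; p++) {
            printf("p%d: %llu\n", p + 1, x[p]);
            x[p] = (A * x[p] + C) % M;
        }
    return 0;
}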