Embarrassingly Parallel Geoffrey Fox Not embarrassing rather ideal He meant embarrassed to submit a paper No communication Except to distribute and gather the data Best speedup possibilities Linear and sometimes superlinear
Embarrassingly Parallel send recv send recv
Example Geometrical Image transformation Shift, scale, rotate, clip, etc. y’ = -x sin T + y cos T and x’ = x cos T + y sin T Transformation of a pixel is independent of other pixel transformations We must only consider the partitioning of the problem
Mandelbrot set for (x = 0; x < disp_width; x++) for (y = 0; y < disp_height; y++) { real = rmin + x * realscale imag = imin + y * imagscale color = CalcPixel(real, imag) display(x,y,color) }
Partitioning Schemes Pixel/process Row/process Column/process Formula?
Partitioning Schemes If (iproc == 0) for(p = 0, n = 0; n < nrows; nrows += nrows/nproc) send(pict[n], p++) Else recv(pict, 0) for(x = 0; x < nrows/nproc; x++) for(y = 0; y < ncols; y++) pict[x][y] = blah blah blah Etc.
Partitioning Schemes pixel/section per process? How balanced and efficient is the algorithm?
Partitioning Schemes Static partitioning is simple but … Could be imbalanced Also think about memory requirements Must it all fit on a single node?
Dynamic Partitioning Schemes Give work to processes as needed Generally Master-Worker
Master-Worker Pseudocode create work pool while (there is work) recv(message) if (message is request) send(work) else recv(answer) send poison pill to all workers Worker send (work request) recv(work) while (there is work) process work send(work results coming) send(results) send(work request)
Monte Carlo Methods Use of random/statistical properties to perform calculations Example: Compute pi Area of circle pi Area of square 4 Randomly pick points Determine if in or out of circle The fraction of points within will be pi/4
Monte Carlo Methods Area ratios can be inefficient Consider alternative for integrals sum=0 for(i = 0; i < N; i++) xr = random value between x1 and x2 sum += f(xr) area = sum/N
Parallel Monte Carlo Very simple Static Print out the answer Everybody computes N values & a sum Gather and sum up the sums Print out the answer Highly efficient But is it the best way?
Parallel Monte Carlo All Monte Carlo calculations depend on a uniform random number sequence. Choices for parallel Single random number process Everybody generate their own (be careful) Is the combination of P random number sequences necessarily uniform? How does it compare with what is done on a single processor?
Parallel Random Number Generation General form for random number calc. It can be shown that: k is jump constant A & C only need to be computed once
Parallel Random Number Generation Given k processors, generate the first k numbers sequentially using normal formula use each of these numbers to generate more p1 p2 p3 p4 x1 x5 x9 … x2 x6 x10 … x3 x7 x11 … x4 x8 x12 …