Copyright © 2009 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
Principles of Parallel Programming, First Edition
by Calvin Lin and Lawrence Snyder
Chapter 7: MPI and Other Local View Languages
Figure 7.1 An MPI solution to the Count 3s problem.
Code Spec 7.1 MPI_Init().
Code Spec 7.2 MPI_Finalize().
Code Spec 7.3 MPI_Comm_size().
Code Spec 7.4 MPI_Comm_rank().
Code Spec 7.5 MPI_Send().
Code Spec 7.6 MPI_Recv().
Code Spec 7.7 MPI_Reduce().
Code Spec 7.8 MPI_Scatter().
Figure 7.2 Replacement code (for lines 16–48 of Figure 7.1) to distribute data using a scatter operation.
Code Spec 7.9 MPI_Gather().
Figure 7.3 Each message must be copied as it moves across four address spaces, each contributing to the overall latency.
Code Spec 7.10 MPI_Scan().
Code Spec 7.11 MPI_Bcast(). MPI routine to broadcast data from one root process to all other processes in the communicator.
Code Spec 7.12 MPI_Barrier().
Code Spec 7.13 MPI_Wtime().
Figure 7.4 Example of collective communication within a group.
Code Spec 7.14 MPI_Comm_group().
Code Spec 7.15 MPI_Group_incl().
Code Spec 7.16 MPI_Comm_create().
Figure 7.5 A 2D relaxation replaces, on each iteration, all interior values by the average of their four nearest neighbors.
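The relaxation step in the Figure 7.5 caption can be sketched sequentially in plain C. This is only an illustration under stated assumptions, not the book's MPI code: the function name `relax_step`, the row-major layout, and the use of separate input and output arrays are all hypothetical choices made for the example.

```c
#include <stddef.h>

/* One relaxation sweep over a rows x cols grid stored in row-major order.
 * Each interior cell of `next` receives the average of its four nearest
 * neighbors in `cur`; boundary cells are copied through unchanged.
 * (Name, layout, and two-array scheme are assumptions for this sketch.) */
void relax_step(const double *cur, double *next, size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            size_t k = i * cols + j;
            if (i == 0 || j == 0 || i == rows - 1 || j == cols - 1) {
                next[k] = cur[k];               /* boundary: keep value */
            } else {
                next[k] = (cur[k - cols]        /* neighbor above */
                         + cur[k + cols]        /* neighbor below */
                         + cur[k - 1]           /* neighbor left  */
                         + cur[k + 1]) / 4.0;   /* neighbor right */
            }
        }
    }
}
```

In a message-passing version, each process would apply the same update to its own block of the grid after exchanging boundary rows and columns with its neighbors.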
Figure 7.6 MPI code for the main loop of the 2D SOR computation.
Figure 7.7 Depiction of dynamic work redistribution in MPI.
Figure 7.8 A 2D SOR MPI program using non-blocking sends and receives.
Code Spec 7.17 MPI_Waitall().
Figure 7.9 Creating a derived data type.
Partitioned Global Address Space Languages
– Higher level of abstraction
– Built on top of distributed-memory clusters
– Memory is treated as a single address space
– Allows definition of global data structures
– Programmer must still distinguish local from global data
– Message-passing details and distributed data structures no longer need to be considered
– Use a more efficient one-sided communication substrate
Main PGAS Languages
– Co-Array Fortran
– Unified Parallel C
– Titanium
Co-Array Fortran (CAF)
– Extends Fortran
– Originally called F--
– Elegant and simple
– Uses co-arrays (communication arrays)
   – Real, dimension (n,n)[p,*] :: a, b, c
   – a, b, c are co-arrays
– Memory for a co-array is distributed across the processes, as determined by the dimension statement
Unified Parallel C (UPC)
– Global view of address space
– Shared arrays are distributed in a cyclic or block-cyclic arrangement (aids load balancing)
– Supports C pointers, in 4 types
   – private private
   – shared private
   – private shared
   – shared shared
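The cyclic and block-cyclic arrangements named on this slide can be made concrete with a small plain-C helper that computes which thread owns a given array element. This is a sketch, not UPC itself: the function name `owner_of` and its parameters are hypothetical, and it assumes blocks are dealt to threads round-robin starting at thread 0.

```c
#include <stddef.h>

/* Thread that owns element i of a shared array under a block-cyclic
 * layout: consecutive blocks of `block_size` elements are dealt to
 * `nthreads` threads in round-robin order, starting with thread 0.
 * A block_size of 1 gives the purely cyclic layout.
 * (Hypothetical helper; block_size and nthreads must be nonzero.) */
size_t owner_of(size_t i, size_t block_size, size_t nthreads)
{
    return (i / block_size) % nthreads;
}
```

With 4 threads, element 5 belongs to thread 1 under the cyclic layout (block size 1) and to thread 2 with a block size of 2. Dealing out small blocks in turn is what spreads work evenly when the cost per element varies along the array.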
C pointers
– Private pointer pointing locally
   – int *p1;
– Private pointer pointing to shared space
   – shared int *p2;
– Shared pointer pointing locally
   – int *shared p3;
– Shared pointer pointing into shared space
   – shared int *shared p4;
UPC
– Has a forall construct
   – upc_forall
– Distributes the iterations of a normal C for loop across all processes
– A global operation, whereas most other operations are local
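How upc_forall divides loop iterations can be emulated in plain C. The sketch below assumes the common case of an integer affinity expression, as in `upc_forall (i = 0; i < n; i++; i)`, where iteration i is run by the thread whose number equals i modulo the thread count. The function `forall_slice` and its parameters are hypothetical names for this illustration; real UPC supplies the thread number and count itself.

```c
#include <stddef.h>

/* Emulates the share of work one thread performs for
 *     upc_forall (i = 0; i < n; i++; i) a[i] = 2 * a[i];
 * Every thread walks the full index range but executes only the
 * iterations whose index maps to it (i % nthreads == mythread).
 * Returns the number of iterations this thread executed.
 * (Plain-C stand-in; mythread and nthreads are passed explicitly.) */
size_t forall_slice(int *a, size_t n, size_t mythread, size_t nthreads)
{
    size_t done = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % nthreads != mythread)
            continue;            /* iteration belongs to another thread */
        a[i] *= 2;
        done++;
    }
    return done;
}
```

Calling the function once for each thread number from 0 to nthreads - 1 touches every element exactly once, which is the sense in which the loop is a global operation.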
Titanium
– Extends Java
– Object-oriented
– Adds regions
   – Supports safe memory management
– Unordered iteration
   – foreach
   – Allows concurrency over multiple indices in a block