1 Copyright © 2010, Elsevier Inc. All rights Reserved Chapter 3 Distributed Memory Programming with MPI An Introduction to Parallel Programming Peter Pacheco

2 Copyright © 2010, Elsevier Inc. All rights Reserved Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in MPI. Collective communication. MPI derived datatypes. Performance evaluation of MPI programs. Parallel sorting. Safety in MPI programs.

3 A distributed memory system Copyright © 2010, Elsevier Inc. All rights Reserved

4 A shared memory system Copyright © 2010, Elsevier Inc. All rights Reserved

5 Hello World! Copyright © 2010, Elsevier Inc. All rights Reserved (a classic)

6 Identifying MPI processes Common practice to identify processes by nonnegative integer ranks. p processes are numbered 0, 1, 2, ..., p-1. Copyright © 2010, Elsevier Inc. All rights Reserved

7 Our first MPI program Copyright © 2010, Elsevier Inc. All rights Reserved
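The program on this slide is shown in the book as mpi_hello.c. Below is a minimal sketch consistent with the compilation and execution slides that follow; the message text matches the output shown on slide 10.

#include <stdio.h>
#include <string.h>
#include <mpi.h>

const int MAX_STRING = 100;

int main(void) {
   char greeting[MAX_STRING];
   int  comm_sz;   /* number of processes */
   int  my_rank;   /* my process rank     */

   MPI_Init(NULL, NULL);
   MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

   if (my_rank != 0) {
      /* every process except 0 sends a greeting to process 0 */
      sprintf(greeting, "Greetings from process %d of %d !", my_rank, comm_sz);
      MPI_Send(greeting, strlen(greeting)+1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
   } else {
      /* process 0 prints its own greeting, then receives and prints the rest */
      printf("Greetings from process %d of %d !\n", my_rank, comm_sz);
      for (int q = 1; q < comm_sz; q++) {
         MPI_Recv(greeting, MAX_STRING, MPI_CHAR, q, 0,
                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
         printf("%s\n", greeting);
      }
   }

   MPI_Finalize();
   return 0;
}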

8 Compilation Copyright © 2010, Elsevier Inc. All rights Reserved
mpicc -g -Wall -o mpi_hello mpi_hello.c
mpicc : wrapper script to compile
-g : produce debugging information
-Wall : turns on all warnings
-o mpi_hello : create this executable file name (as opposed to default a.out)
mpi_hello.c : source file

9 Execution Copyright © 2010, Elsevier Inc. All rights Reserved
mpiexec -n <number of processes> <executable>
mpiexec -n 1 ./mpi_hello      (run with 1 process)
mpiexec -n 4 ./mpi_hello      (run with 4 processes)

10 Execution Copyright © 2010, Elsevier Inc. All rights Reserved
mpiexec -n 1 ./mpi_hello
Greetings from process 0 of 1 !
mpiexec -n 4 ./mpi_hello
Greetings from process 0 of 4 !
Greetings from process 1 of 4 !
Greetings from process 2 of 4 !
Greetings from process 3 of 4 !

11 MPI Programs Written in C. Has main. Uses stdio.h, string.h, etc. Need to add mpi.h header file. Identifiers defined by MPI start with “MPI_”. First letter following underscore is uppercase. For function names and MPI-defined types. Helps to avoid confusion. Copyright © 2010, Elsevier Inc. All rights Reserved

12 MPI Components MPI_Init Tells MPI to do all the necessary setup. MPI_Finalize Tells MPI we’re done, so clean up anything allocated for this program. Copyright © 2010, Elsevier Inc. All rights Reserved

13 Basic Outline Copyright © 2010, Elsevier Inc. All rights Reserved

14 Communicators A collection of processes that can send messages to each other. MPI_Init defines a communicator that consists of all the processes created when the program is started. Called MPI_COMM_WORLD. Copyright © 2010, Elsevier Inc. All rights Reserved

15 Communicators Copyright © 2010, Elsevier Inc. All rights Reserved
MPI_Comm_size : number of processes in the communicator
MPI_Comm_rank : my rank (the process making this call)

16 SPMD Single-Program Multiple-Data We compile one program. Process 0 does something different. Receives messages and prints them while the other processes do the work. The if-else construct makes our program SPMD. Copyright © 2010, Elsevier Inc. All rights Reserved

17 Communication Copyright © 2010, Elsevier Inc. All rights Reserved

18 Data types Copyright © 2010, Elsevier Inc. All rights Reserved

19 Communication Copyright © 2010, Elsevier Inc. All rights Reserved
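For reference, the C prototypes of the two functions sketched on slides 17 and 19; the parameter names follow the book's convention.

int MPI_Send(void*        msg_buf_p    /* in */,
             int          msg_size     /* in */,
             MPI_Datatype msg_type     /* in */,
             int          dest         /* in */,
             int          tag          /* in */,
             MPI_Comm     communicator /* in */);

int MPI_Recv(void*        msg_buf_p    /* out */,
             int          buf_size     /* in  */,
             MPI_Datatype msg_type     /* in  */,
             int          source       /* in  */,
             int          tag          /* in  */,
             MPI_Comm     communicator /* in  */,
             MPI_Status*  status_p     /* out */);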

20 Message matching Copyright © 2010, Elsevier Inc. All rights Reserved A message sent by process q (MPI_Send with dest = r) is matched by the receive posted by process r (MPI_Recv with src = q).

21 Receiving messages A receiver can get a message without knowing: the amount of data in the message, the sender of the message, or the tag of the message. Copyright © 2010, Elsevier Inc. All rights Reserved

22 status_p argument Copyright © 2010, Elsevier Inc. All rights Reserved The MPI_Status* argument points to a struct with (at least) the members MPI_SOURCE, MPI_TAG, and MPI_ERROR.
MPI_Status status;
status.MPI_SOURCE
status.MPI_TAG

23 How much data am I receiving? Copyright © 2010, Elsevier Inc. All rights Reserved
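The slide refers to MPI_Get_count, which extracts the element count from the status returned by MPI_Recv. A small usage sketch; the buffer name recv_buf and the size of 100 are illustrative.

MPI_Status status;
double recv_buf[100];
int count;

MPI_Recv(recv_buf, 100, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);
MPI_Get_count(&status, MPI_DOUBLE, &count);   /* number of doubles actually received */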

24 Issues with send and receive Exact behavior is determined by the MPI implementation. MPI_Send may behave differently with regard to buffer size, cutoffs and blocking. MPI_Recv always blocks until a matching message is received. Know your implementation; don’t make assumptions! Copyright © 2010, Elsevier Inc. All rights Reserved

25 TRAPEZOIDAL RULE IN MPI Copyright © 2010, Elsevier Inc. All rights Reserved

26 The Trapezoidal Rule Copyright © 2010, Elsevier Inc. All rights Reserved

27 The Trapezoidal Rule Copyright © 2010, Elsevier Inc. All rights Reserved

28 One trapezoid Copyright © 2010, Elsevier Inc. All rights Reserved

29 Pseudo-code for a serial program Copyright © 2010, Elsevier Inc. All rights Reserved
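The serial pseudo-code on this slide corresponds to a function along the following lines; this is a sketch, and f is whatever integrand the user supplies.

double f(double x);   /* the integrand, defined by the user */

/* Estimate the integral of f from a to b using n trapezoids of width h */
double Trap(double a, double b, int n, double h) {
   double approx = (f(a) + f(b)) / 2.0;
   for (int i = 1; i <= n-1; i++) {
      double x_i = a + i*h;
      approx += f(x_i);
   }
   return h*approx;
}

/* caller:  h = (b-a)/n;  integral = Trap(a, b, n, h); */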

30 Parallelizing the Trapezoidal Rule 1. Partition the problem solution into tasks. 2. Identify communication channels between tasks. 3. Aggregate tasks into composite tasks. 4. Map composite tasks to cores. Copyright © 2010, Elsevier Inc. All rights Reserved

31 Parallel pseudo-code Copyright © 2010, Elsevier Inc. All rights Reserved

32 Tasks and communications for Trapezoidal Rule Copyright © 2010, Elsevier Inc. All rights Reserved

33 First version (1) Copyright © 2010, Elsevier Inc. All rights Reserved

34 First version (2) Copyright © 2010, Elsevier Inc. All rights Reserved

35 First version (3) Copyright © 2010, Elsevier Inc. All rights Reserved
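Slides 33-35 show the first MPI version of the trapezoidal rule program as images. Below is a condensed sketch of its structure: the hard-coded values of a, b, and n follow the book's example, Trap is the local integration function sketched earlier, and the fragment is the body of main (it assumes <stdio.h> and <mpi.h>).

int my_rank, comm_sz, n = 1024, local_n;
double a = 0.0, b = 3.0, h, local_a, local_b;
double local_int, total_int;

MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

h = (b-a)/n;             /* h is the same for all processes         */
local_n = n/comm_sz;     /* so is the number of trapezoids per rank */
local_a = a + my_rank*local_n*h;
local_b = local_a + local_n*h;
local_int = Trap(local_a, local_b, local_n, h);

if (my_rank != 0) {
   /* every other process sends its partial result to process 0 */
   MPI_Send(&local_int, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
} else {
   /* process 0 adds up the partial results and prints the estimate */
   total_int = local_int;
   for (int source = 1; source < comm_sz; source++) {
      MPI_Recv(&local_int, 1, MPI_DOUBLE, source, 0,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      total_int += local_int;
   }
   printf("With n = %d trapezoids, our estimate of the integral\n", n);
   printf("from %f to %f = %.15e\n", a, b, total_int);
}
MPI_Finalize();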

36 Dealing with I/O Copyright © 2010, Elsevier Inc. All rights Reserved Each process just prints a message.

37 Running with 6 processes Copyright © 2010, Elsevier Inc. All rights Reserved unpredictable output

38 Input Most MPI implementations only allow process 0 in MPI_COMM_WORLD access to stdin. Process 0 must read the data (scanf) and send to the other processes. Copyright © 2010, Elsevier Inc. All rights Reserved

39 Function for reading user input Copyright © 2010, Elsevier Inc. All rights Reserved

40 COLLECTIVE COMMUNICATION Copyright © 2010, Elsevier Inc. All rights Reserved

41 Tree-structured communication
1. In the first phase:
(a) Process 1 sends to 0, 3 sends to 2, 5 sends to 4, and 7 sends to 6.
(b) Processes 0, 2, 4, and 6 add in the received values.
(c) Processes 2 and 6 send their new values to processes 0 and 4, respectively.
(d) Processes 0 and 4 add the received values into their new values.
2. (a) Process 4 sends its newest value to process 0.
(b) Process 0 adds the received value to its newest value.
Copyright © 2010, Elsevier Inc. All rights Reserved

42 A tree-structured global sum Copyright © 2010, Elsevier Inc. All rights Reserved

43 An alternative tree-structured global sum Copyright © 2010, Elsevier Inc. All rights Reserved

44 MPI_Reduce Copyright © 2010, Elsevier Inc. All rights Reserved
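The C prototype shown on this slide, with the argument names used in the book, followed by a typical call that forms a global sum on process 0.

int MPI_Reduce(void*        input_data_p  /* in  */,
               void*        output_data_p /* out */,
               int          count         /* in  */,
               MPI_Datatype datatype      /* in  */,
               MPI_Op       operator      /* in  */,
               int          dest_process  /* in  */,
               MPI_Comm     comm          /* in  */);

/* e.g., sum each process's local_int into total_int on process 0 */
MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);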

45 Predefined reduction operators in MPI Copyright © 2010, Elsevier Inc. All rights Reserved

46 Collective vs. Point-to-Point Communications All the processes in the communicator must call the same collective function. For example, a program that attempts to match a call to MPI_Reduce on one process with a call to MPI_Recv on another process is erroneous, and, in all likelihood, the program will hang or crash. Copyright © 2010, Elsevier Inc. All rights Reserved

47 Collective vs. Point-to-Point Communications The arguments passed by each process to an MPI collective communication must be “compatible.” For example, if one process passes in 0 as the dest_process and another passes in 1, then the outcome of a call to MPI_Reduce is erroneous, and, once again, the program is likely to hang or crash. Copyright © 2010, Elsevier Inc. All rights Reserved

48 Collective vs. Point-to-Point Communications The output_data_p argument is only used on dest_process. However, all of the processes still need to pass in an actual argument corresponding to output_data_p, even if it’s just NULL. Copyright © 2010, Elsevier Inc. All rights Reserved

49 Collective vs. Point-to-Point Communications Point-to-point communications are matched on the basis of tags and communicators. Collective communications don’t use tags. They’re matched solely on the basis of the communicator and the order in which they’re called. Copyright © 2010, Elsevier Inc. All rights Reserved

50 Example (1) Copyright © 2010, Elsevier Inc. All rights Reserved Multiple calls to MPI_Reduce

51 Example (2) Suppose that each process calls MPI_Reduce with operator MPI_SUM, and destination process 0. At first glance, it might seem that after the two calls to MPI_Reduce, the value of b will be 3, and the value of d will be 6. Copyright © 2010, Elsevier Inc. All rights Reserved

52 Example (3) However, the names of the memory locations are irrelevant to the matching of the calls to MPI_Reduce. The order of the calls will determine the matching, so the value stored in b will be 1+2+1 = 4, and the value stored in d will be 2+1+2 = 5. Copyright © 2010, Elsevier Inc. All rights Reserved
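A sketch of the situation described on slides 50-52; the values a = 1 and c = 2 are illustrative. If one process issues its two calls in the opposite order, the calls are still matched purely by the order in which they occur.

double a = 1, c = 2, b, d;

/* processes 0 and 2 call in this order ...              */
MPI_Reduce(&a, &b, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
MPI_Reduce(&c, &d, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

/* ... while process 1 calls in the opposite order:      */
/*    MPI_Reduce(&c, &d, ...);  MPI_Reduce(&a, &b, ...); */
/* so on process 0 the first reduction combines a, c, a  */
/* (b = 1+2+1 = 4) instead of the intended a, a, a.      */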

53 MPI_Allreduce Useful in a situation in which all of the processes need the result of a global sum in order to complete some larger computation. Copyright © 2010, Elsevier Inc. All rights Reserved

54 Copyright © 2010, Elsevier Inc. All rights Reserved A global sum followed by distribution of the result.

55 Copyright © 2010, Elsevier Inc. All rights Reserved A butterfly-structured global sum.

56 Broadcast Data belonging to a single process is sent to all of the processes in the communicator. Copyright © 2010, Elsevier Inc. All rights Reserved

57 Copyright © 2010, Elsevier Inc. All rights Reserved A tree-structured broadcast.

58 A version of Get_input that uses MPI_Bcast Copyright © 2010, Elsevier Inc. All rights Reserved
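The function on this slide is shown in the book roughly as follows; a sketch in which process 0 reads the input and every process receives it via three broadcasts.

void Get_input(int my_rank, int comm_sz, double* a_p, double* b_p, int* n_p) {
   if (my_rank == 0) {
      printf("Enter a, b, and n\n");
      scanf("%lf %lf %d", a_p, b_p, n_p);
   }
   /* broadcast the three values from process 0 to every process */
   MPI_Bcast(a_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
   MPI_Bcast(b_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
   MPI_Bcast(n_p, 1, MPI_INT, 0, MPI_COMM_WORLD);
}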

59 Data distributions Copyright © 2010, Elsevier Inc. All rights Reserved Compute a vector sum.

60 Serial implementation of vector addition Copyright © 2010, Elsevier Inc. All rights Reserved
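The serial loop this slide shows as an image amounts to the following sketch.

void Vector_sum(double x[], double y[], double z[], int n) {
   for (int i = 0; i < n; i++)
      z[i] = x[i] + y[i];   /* componentwise z = x + y */
}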

61 Different partitions of a 12-component vector among 3 processes Copyright © 2010, Elsevier Inc. All rights Reserved

62 Partitioning options Block partitioning Assign blocks of consecutive components to each process. Cyclic partitioning Assign components in a round robin fashion. Block-cyclic partitioning Use a cyclic distribution of blocks of components. Copyright © 2010, Elsevier Inc. All rights Reserved

63 Data Layout When allocating ensemble memory, an array is divided into layers, where each layer consists of K*P array elements spread evenly across the P processors. For array index i:
Layer: l = floor(i / (K*P))
Processor: p = floor((i mod (K*P)) / K)
Offset: f = i mod K

64 Example (1/2) Show how to find an element of an array A[2:100] distributed over eight processors with a block size of five. Solution: Element k will be found in layer floor((k-2)/40), at processor floor(((k-2) mod 40)/5), with offset (k-2) mod 5.

65 Example (2/2)

66 Parallel implementation of vector addition Copyright © 2010, Elsevier Inc. All rights Reserved
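With a block partition, each process adds only its own local_n = n/comm_sz components; a sketch of the parallel version shown on this slide.

void Parallel_vector_sum(double local_x[], double local_y[],
                         double local_z[], int local_n) {
   for (int local_i = 0; local_i < local_n; local_i++)
      local_z[local_i] = local_x[local_i] + local_y[local_i];
}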

67 Scatter MPI_Scatter can be used in a function that reads in an entire vector on process 0 but only sends the needed components to each of the other processes. Copyright © 2010, Elsevier Inc. All rights Reserved

68 MPI_Scatter
int Sdata[8] = {1,2,3,4,5,6,7,8}, Rdata[2];
int Send_cnt = 2, Recv_cnt = 2, src = 0;
MPI_Scatter(Sdata, Send_cnt, MPI_INT, Rdata, Recv_cnt, MPI_INT, src, MPI_COMM_WORLD);
With four processes, the root's Sdata = [1,2,3,4,5,6,7,8] is split so that Rdata is [1,2] on CPU 0, [3,4] on CPU 1, [5,6] on CPU 2, and [7,8] on CPU 3.

69 Reading and distributing a vector Copyright © 2010, Elsevier Inc. All rights Reserved

70 Gather Collect all of the components of the vector onto process 0, and then process 0 can process all of the components. Copyright © 2010, Elsevier Inc. All rights Reserved

71 MPI_Gather
int Send_cnt = 2, Recv_cnt = 2, dest = 0;
MPI_Gather(Sdata, Send_cnt, MPI_INT, Rdata, Recv_cnt, MPI_INT, dest, MPI_COMM_WORLD);
With four processes, the Sdata blocks [1,2], [3,4], [5,6], and [7,8] on CPUs 0-3 are collected into Rdata = [1,2,3,4,5,6,7,8] on CPU 0.

72 Print a distributed vector (1) Copyright © 2010, Elsevier Inc. All rights Reserved

73 Print a distributed vector (2) Copyright © 2010, Elsevier Inc. All rights Reserved

74 Allgather Concatenates the contents of each process’ send_buf_p and stores this in each process’ recv_buf_p. As usual, recv_count is the amount of data being received from each process. Copyright © 2010, Elsevier Inc. All rights Reserved

75 MPI_Allgather
int Send_cnt = 2, Recv_cnt = 2;   /* the receive count is the amount received from each process */
MPI_Allgather(Sdata, Send_cnt, MPI_INT, Rdata, Recv_cnt, MPI_INT, MPI_COMM_WORLD);
With four processes, the Sdata blocks [1,2], [3,4], [5,6], and [7,8] are concatenated into Rdata = [1,2,3,4,5,6,7,8] on every CPU.

76 MPI_Gatherv
int dest = 0;
int rc[4] = {3,1,2,2}, disp[4] = {0,3,4,6};
int Send_cnt = rc[my_rank];   /* each process sends its own number of elements */
MPI_Gatherv(Sdata, Send_cnt, MPI_INT, Rdata, rc, disp, MPI_INT, dest, MPI_COMM_WORLD);
With four processes, CPU 0 sends [1,2,3], CPU 1 sends [4], CPU 2 sends [5,6], and CPU 3 sends [7,8]; CPU 0 receives Rdata = [1,2,3,4,5,6,7,8].

77 MPI_Scatterv int MPI_Scatterv(const void *sendbuf, const int *sendcounts, const int *displs, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm) Copyright © 2010, Elsevier Inc. All rights Reserved

78 Matrix-vector multiplication Copyright © 2010, Elsevier Inc. All rights Reserved The i-th component of y is the dot product of the i-th row of A with x.

79 Matrix-vector multiplication Copyright © 2010, Elsevier Inc. All rights Reserved

80 Multiply a matrix by a vector Copyright © 2010, Elsevier Inc. All rights Reserved Serial pseudo-code

81 C style arrays Copyright © 2010, Elsevier Inc. All rights Reserved A two-dimensional array is stored as a single contiguous one-dimensional array, row after row.

82 Serial matrix-vector multiplication Copyright © 2010, Elsevier Inc. All rights Reserved
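A sketch of the serial routine this slide shows, with the m x n matrix A stored as a one-dimensional array in row-major order (as on slide 81).

void Mat_vect_mult(double A[], double x[], double y[], int m, int n) {
   for (int i = 0; i < m; i++) {
      y[i] = 0.0;
      for (int j = 0; j < n; j++)
         y[i] += A[i*n + j] * x[j];   /* y[i] = dot product of row i with x */
   }
}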

83 An MPI matrix-vector multiplication function (1) Copyright © 2010, Elsevier Inc. All rights Reserved

84 An MPI matrix-vector multiplication function (2) Copyright © 2010, Elsevier Inc. All rights Reserved
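A condensed sketch of the MPI version shown on slides 83-84: each process holds a block of rows (local_A) and a block of x (local_x), gathers the full x with MPI_Allgather, and computes its block of y. It assumes <stdlib.h> for malloc/free.

void Mat_vect_mult(double local_A[], double local_x[], double local_y[],
                   int local_m, int n, int local_n, MPI_Comm comm) {
   double* x = malloc(n * sizeof(double));

   /* every process needs all of x, so concatenate the blocks everywhere */
   MPI_Allgather(local_x, local_n, MPI_DOUBLE, x, local_n, MPI_DOUBLE, comm);

   for (int local_i = 0; local_i < local_m; local_i++) {
      local_y[local_i] = 0.0;
      for (int j = 0; j < n; j++)
         local_y[local_i] += local_A[local_i*n + j] * x[j];
   }
   free(x);
}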

85 MPI DERIVED DATATYPES Copyright © 2010, Elsevier Inc. All rights Reserved

86 Derived datatypes Used to represent any collection of data items in memory by storing both the types of the items and their relative locations in memory. The idea is that if a function that sends data knows this information about a collection of data items, it can collect the items from memory before they are sent. Similarly, a function that receives data can distribute the items into their correct destinations in memory when they’re received. Copyright © 2010, Elsevier Inc. All rights Reserved

87 Derived datatypes Formally, a derived datatype consists of a sequence of basic MPI data types together with a displacement for each of the data types. Trapezoidal Rule example: Copyright © 2010, Elsevier Inc. All rights Reserved

88 MPI_Type_create_struct Builds a derived datatype that consists of individual elements that have different basic types. Copyright © 2010, Elsevier Inc. All rights Reserved

89 MPI_Get_address Returns the address of the memory location referenced by location_p. The special type MPI_Aint is an integer type that is big enough to store an address on the system. Copyright © 2010, Elsevier Inc. All rights Reserved

90 MPI_Type_commit Allows the MPI implementation to optimize its internal representation of the datatype for use in communication functions. Copyright © 2010, Elsevier Inc. All rights Reserved

91 MPI_Type_free When we’re finished with our new type, this frees any additional storage used. Copyright © 2010, Elsevier Inc. All rights Reserved

92 Get input function with a derived datatype (1) Copyright © 2010, Elsevier Inc. All rights Reserved

93 Get input function with a derived datatype (2) Copyright © 2010, Elsevier Inc. All rights Reserved

94 Get input function with a derived datatype (3) Copyright © 2010, Elsevier Inc. All rights Reserved
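Slides 92-94 show the derived-datatype version of Get_input as images. Below is a sketch of how the pieces on slides 88-91 fit together, following the book's Build_mpi_type function.

void Build_mpi_type(double* a_p, double* b_p, int* n_p,
                    MPI_Datatype* input_mpi_t_p) {
   int array_of_blocklengths[3] = {1, 1, 1};
   MPI_Datatype array_of_types[3] = {MPI_DOUBLE, MPI_DOUBLE, MPI_INT};
   MPI_Aint a_addr, b_addr, n_addr;
   MPI_Aint array_of_displacements[3] = {0};

   /* displacements are measured relative to the address of a */
   MPI_Get_address(a_p, &a_addr);
   MPI_Get_address(b_p, &b_addr);
   MPI_Get_address(n_p, &n_addr);
   array_of_displacements[1] = b_addr - a_addr;
   array_of_displacements[2] = n_addr - a_addr;

   MPI_Type_create_struct(3, array_of_blocklengths, array_of_displacements,
                          array_of_types, input_mpi_t_p);
   MPI_Type_commit(input_mpi_t_p);
}

/* In Get_input, process 0 reads a, b, and n, and then a single call
      MPI_Bcast(a_p, 1, input_mpi_t, 0, MPI_COMM_WORLD);
   on every process transfers all three values; afterwards each process
   calls MPI_Type_free(&input_mpi_t).                                   */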

95 MPI Datatype An MPI datatype is recursively defined as: predefined, corresponding to a data type from the language (e.g., MPI_INT, MPI_DOUBLE_PRECISION); a contiguous array of MPI datatypes; a strided block of datatypes; an indexed array of blocks of datatypes; an arbitrary structure of datatypes. There are MPI functions to construct custom datatypes, such as an array of (int, float) pairs, or a row of a matrix stored columnwise.

96 MPI_Type_contiguous Creates a new datatype made up of a fixed number of contiguous elements, all of the same old datatype.
int err, count;
MPI_Datatype oldtype, newtype;
err = MPI_Type_contiguous(count, oldtype, &newtype);
count : the number of elements
oldtype : the old data type
newtype : the new data type

97 MPI_Type_contiguous
int err, count = 3;
MPI_Datatype newtype;
err = MPI_Type_contiguous(count, MPI_INTEGER, &newtype);
The new type describes three consecutive INTEGER elements (elements 0, 1, and 2).

98 MPI_Type_vector Creates a new datatype made up of equally spaced blocks of elements of the same old datatype.
int err, count, blocklength, stride;
MPI_Datatype oldtype, newtype;
err = MPI_Type_vector(count, blocklength, stride, oldtype, &newtype);
count : the number of blocks
blocklength : the number of elements of the old data type in each block
stride : the distance between the starts of consecutive blocks, in units of the old data type
oldtype : the old data type
newtype : the new data type

99 MPI_Type_vector
int count = 2, blocklength = 2, stride = 3;
MPI_Datatype newtype;
MPI_Type_vector(count, blocklength, stride, MPI_INTEGER, &newtype);
The new type describes two blocks of two consecutive INTEGER elements, with the block starts three elements apart (elements 0-1 and 3-4).

100 MPI_Type_indexed Creates a new datatype made up of blocks of the same old datatype, where each block may have its own length and displacement.
int err, count, length[], disp[];
MPI_Datatype oldtype, newtype;
err = MPI_Type_indexed(count, length, disp, oldtype, &newtype);
count : the number of blocks
length : the number of elements of the old data type in each block
disp : the displacement of each block, in units of the old data type
oldtype : the old data type
newtype : the new data type

101 MPI_Type_indexed
int count = 3, length[3] = {1,3,2}, disp[3] = {0,3,7};
MPI_Datatype newtype;
MPI_Type_indexed(count, length, disp, MPI_INTEGER, &newtype);
The new type describes one element at displacement 0, three elements at displacements 3-5, and two elements at displacements 7-8.

102 MPI_Type_struct Creates a new datatype from an arbitrary combination of data types (MPI_Type_create_struct in current MPI).
int err, count, length[];
MPI_Aint disp[];
MPI_Datatype oldtype[], newtype;
err = MPI_Type_struct(count, length, disp, oldtype, &newtype);
count : the number of blocks
length : the number of elements in each block
disp : the displacement of each block, in bytes
oldtype : the old data type of each block
newtype : the new data type

103 MPI_Type_struct
int count = 2, length[2] = {2,4};
MPI_Aint disp[2] = {0, extent(MPI_INTEGER)*2};   /* byte offset of the second block */
MPI_Datatype oldtype[2] = {MPI_INTEGER, MPI_DOUBLE}, newtype;
MPI_Type_struct(count, length, disp, oldtype, &newtype);
The new type describes two INTEGER elements followed by four DOUBLE elements; the extent of an INTEGER can be obtained with MPI_Type_extent (next slide).

104 MPI_Type_extent Used to find the size in memory of a specified data type.
MPI_Datatype type;
MPI_Aint extent;
int err = MPI_Type_extent(type, &extent);
type : the data type
extent : the memory size of one unit of the data type

105 PERFORMANCE EVALUATION Copyright © 2010, Elsevier Inc. All rights Reserved

106 Elapsed parallel time Returns the number of seconds that have elapsed since some time in the past. Copyright © 2010, Elsevier Inc. All rights Reserved

107 Elapsed serial time In this case, you don’t need to link in the MPI libraries. Returns time in microseconds elapsed from some point in the past. Copyright © 2010, Elsevier Inc. All rights Reserved

108 Elapsed serial time Copyright © 2010, Elsevier Inc. All rights Reserved

109 MPI_Barrier Ensures that no process will return from calling it until every process in the communicator has started calling it. Copyright © 2010, Elsevier Inc. All rights Reserved

110 MPI_Barrier Copyright © 2010, Elsevier Inc. All rights Reserved
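A typical timing idiom combining MPI_Barrier and MPI_Wtime, along the lines the book uses; my_rank and comm are assumed to be set up already.

double local_start, local_finish, local_elapsed, elapsed;

MPI_Barrier(comm);                 /* try to start every process together */
local_start = MPI_Wtime();
/* ... code to be timed ... */
local_finish = MPI_Wtime();
local_elapsed = local_finish - local_start;

/* report the slowest process's time as the parallel run-time */
MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
if (my_rank == 0)
   printf("Elapsed time = %e seconds\n", elapsed);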

111 Run-times of serial and parallel matrix-vector multiplication Copyright © 2010, Elsevier Inc. All rights Reserved (Seconds)

112 Speedup Copyright © 2010, Elsevier Inc. All rights Reserved
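The formula on this slide is the usual definition of speedup; in LaTeX form:

S(n, p) = \frac{T_{\text{serial}}(n)}{T_{\text{parallel}}(n, p)}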

113 Efficiency Copyright © 2010, Elsevier Inc. All rights Reserved
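Likewise, the efficiency formula shown here is speedup per process:

E(n, p) = \frac{S(n, p)}{p} = \frac{T_{\text{serial}}(n)}{p \cdot T_{\text{parallel}}(n, p)}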

114 Speedups of Parallel Matrix- Vector Multiplication Copyright © 2010, Elsevier Inc. All rights Reserved

115 Efficiencies of Parallel Matrix- Vector Multiplication Copyright © 2010, Elsevier Inc. All rights Reserved

116 Scalability A program is scalable if the problem size can be increased at a rate so that the efficiency doesn't decrease as the number of processes increases. Copyright © 2010, Elsevier Inc. All rights Reserved

117 Scalability Programs that can maintain a constant efficiency without increasing the problem size are sometimes said to be strongly scalable. Programs that can maintain a constant efficiency if the problem size increases at the same rate as the number of processes are sometimes said to be weakly scalable. Copyright © 2010, Elsevier Inc. All rights Reserved

118 A PARALLEL SORTING ALGORITHM Copyright © 2010, Elsevier Inc. All rights Reserved

119 Sorting n keys and p = comm_sz processes. n/p keys assigned to each process. No restrictions on which keys are assigned to which processes. When the algorithm terminates: The keys assigned to each process should be sorted in (say) increasing order. If 0 ≤ q < r < p, then each key assigned to process q should be less than or equal to every key assigned to process r. Copyright © 2010, Elsevier Inc. All rights Reserved

120 Serial bubble sort Copyright © 2010, Elsevier Inc. All rights Reserved

121 Odd-even transposition sort A sequence of phases. Even phases: compare-swap the pairs (a[0], a[1]), (a[2], a[3]), (a[4], a[5]), ... Odd phases: compare-swap the pairs (a[1], a[2]), (a[3], a[4]), (a[5], a[6]), ... Copyright © 2010, Elsevier Inc. All rights Reserved

122 Example Start: 5, 9, 4, 3 Even phase: compare-swap (5,9) and (4,3) getting the list 5, 9, 3, 4 Odd phase: compare-swap (9,3) getting the list 5, 3, 9, 4 Even phase: compare-swap (5,3) and (9,4) getting the list 3, 5, 4, 9 Odd phase: compare-swap (5,4) getting the list 3, 4, 5, 9 Copyright © 2010, Elsevier Inc. All rights Reserved

123 Serial odd-even transposition sort Copyright © 2010, Elsevier Inc. All rights Reserved
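A sketch of the serial algorithm this slide shows: n phases alternating between the even-phase and odd-phase compare-swaps described on slide 121.

void Odd_even_sort(int a[], int n) {
   int phase, i, tmp;
   for (phase = 0; phase < n; phase++) {
      if (phase % 2 == 0) {
         /* even phase: compare-swap (a[0],a[1]), (a[2],a[3]), ... */
         for (i = 1; i < n; i += 2)
            if (a[i-1] > a[i]) { tmp = a[i-1]; a[i-1] = a[i]; a[i] = tmp; }
      } else {
         /* odd phase: compare-swap (a[1],a[2]), (a[3],a[4]), ... */
         for (i = 1; i < n-1; i += 2)
            if (a[i] > a[i+1]) { tmp = a[i]; a[i] = a[i+1]; a[i+1] = tmp; }
      }
   }
}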

124 Communications among tasks in odd-even sort Copyright © 2010, Elsevier Inc. All rights Reserved Tasks determining a[i] are labeled with a[i].

125 Parallel odd-even transposition sort Copyright © 2010, Elsevier Inc. All rights Reserved

126 Pseudo-code Copyright © 2010, Elsevier Inc. All rights Reserved

127 Compute_partner Copyright © 2010, Elsevier Inc. All rights Reserved
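A sketch of the partner computation this slide refers to: in even phases odd ranks pair with the rank below and even ranks with the rank above, and vice versa in odd phases; a process with no partner sits the phase out.

if (phase % 2 == 0) {                /* even phase */
   if (my_rank % 2 != 0)
      partner = my_rank - 1;
   else
      partner = my_rank + 1;
} else {                             /* odd phase  */
   if (my_rank % 2 != 0)
      partner = my_rank + 1;
   else
      partner = my_rank - 1;
}
if (partner == -1 || partner == comm_sz)
   partner = MPI_PROC_NULL;          /* no partner: idle during this phase */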

128 Safety in MPI programs The MPI standard allows MPI_Send to behave in two different ways: it can simply copy the message into an MPI managed buffer and return, or it can block until the matching call to MPI_Recv starts. Copyright © 2010, Elsevier Inc. All rights Reserved

129 Safety in MPI programs Many implementations of MPI set a threshold at which the system switches from buffering to blocking. Relatively small messages will be buffered by MPI_Send. Larger messages will cause it to block. Copyright © 2010, Elsevier Inc. All rights Reserved

130 Safety in MPI programs If the MPI_Send executed by each process blocks, no process will be able to start executing a call to MPI_Recv, and the program will hang or deadlock. Each process is blocked waiting for an event that will never happen. Copyright © 2010, Elsevier Inc. All rights Reserved (see pseudo-code)

131 Safety in MPI programs A program that relies on MPI provided buffering is said to be unsafe. Such a program may run without problems for various sets of input, but it may hang or crash with other sets. Copyright © 2010, Elsevier Inc. All rights Reserved

132 MPI_Ssend An alternative to MPI_Send defined by the MPI standard. The extra “s” stands for synchronous and MPI_Ssend is guaranteed to block until the matching receive starts. Copyright © 2010, Elsevier Inc. All rights Reserved

133 Restructuring communication Copyright © 2010, Elsevier Inc. All rights Reserved

134 MPI_Sendrecv An alternative to scheduling the communications ourselves. Carries out a blocking send and a receive in a single call. The dest and the source can be the same or different. Especially useful because MPI schedules the communications so that the program won’t hang or crash. Copyright © 2010, Elsevier Inc. All rights Reserved

135 MPI_Sendrecv Copyright © 2010, Elsevier Inc. All rights Reserved
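The C prototype shown on this slide, followed by the kind of call the odd-even sort can use to exchange its block of keys with its partner; the buffer names my_keys and recv_keys are illustrative.

int MPI_Sendrecv(void*        send_buf_p,    int send_buf_size,
                 MPI_Datatype send_buf_type, int dest,   int send_tag,
                 void*        recv_buf_p,    int recv_buf_size,
                 MPI_Datatype recv_buf_type, int source, int recv_tag,
                 MPI_Comm     communicator,  MPI_Status* status_p);

MPI_Sendrecv(my_keys,   n_local, MPI_INT, partner, 0,
             recv_keys, n_local, MPI_INT, partner, 0,
             comm, MPI_STATUS_IGNORE);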

136 Safe communication with five processes Copyright © 2010, Elsevier Inc. All rights Reserved

137 Parallel odd-even transposition sort Copyright © 2010, Elsevier Inc. All rights Reserved

138 Run-times of parallel odd-even sort Copyright © 2010, Elsevier Inc. All rights Reserved (times are in milliseconds)

139 Concluding Remarks (1) MPI or the Message-Passing Interface is a library of functions that can be called from C, C++, or Fortran programs. A communicator is a collection of processes that can send messages to each other. Many parallel programs use the single-program multiple-data or SPMD approach. Copyright © 2010, Elsevier Inc. All rights Reserved

140 Concluding Remarks (2) Most serial programs are deterministic: if we run the same program with the same input we’ll get the same output. Parallel programs often don’t possess this property. Collective communications involve all the processes in a communicator. Copyright © 2010, Elsevier Inc. All rights Reserved

141 Concluding Remarks (3) When we time parallel programs, we’re usually interested in elapsed time or “wall clock time”. Speedup is the ratio of the serial run-time to the parallel run-time. Efficiency is the speedup divided by the number of parallel processes. Copyright © 2010, Elsevier Inc. All rights Reserved

142 Concluding Remarks (4) If it’s possible to increase the problem size (n) so that the efficiency doesn’t decrease as p is increased, a parallel program is said to be scalable. An MPI program is unsafe if its correct behavior depends on the fact that MPI_Send is buffering its input. Copyright © 2010, Elsevier Inc. All rights Reserved