An Introduction to Parallel Programming with MPI
March 22, 24, 29
David Adams
Outline
- Disclaimers
- Overview of basic parallel programming on a cluster with the goals of MPI
- Batch system interaction
- Startup procedures
- Quick review
- Blocking message passing
- Non-blocking message passing
- Lab day
- Collective communications
Review
Functions we have covered in detail: MPI_INIT, MPI_FINALIZE, MPI_COMM_SIZE, MPI_COMM_RANK, MPI_SEND, MPI_RECV
Useful constants: MPI_COMM_WORLD, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_SUCCESS
Motivating Example for Deadlock
[Figure: processes P1 through P10 arranged in a ring, with SEND and RECV operations between neighboring processes; the animation steps through timesteps 1 to 10.]
Solution
MPI_SENDRECV(sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, recvcount, recvtype, source, recvtag, comm, status, ierror)
The semantics of a send-receive operation is what would be obtained if the caller forked two concurrent threads, one to execute the send, and one to execute the receive, followed by a join of these two threads.
Nonblocking Message Passing
Allows for the overlap of communication and computation.
Each send and each receive is split into a post step and a complete step, so a message exchange takes four steps instead of two:
- post-send
- complete-send
- post-receive
- complete-receive
Posting Operations
MPI_ISEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, REQUEST, IERROR)
IN BUF(*)
IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
OUT INTEGER REQUEST, IERROR
MPI_IRECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, REQUEST, IERROR)
OUT BUF(*)
IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
OUT INTEGER REQUEST, IERROR
Request Objects
All nonblocking communications use request objects to identify communication operations and to link the posting operation with the completion operation. Conceptually, a request handle can be thought of as a pointer to a specific message instance floating around in MPI space. Just as with pointers, request handles must be treated with care: overwriting or discarding a handle before its operation has been completed creates a request handle leak (like a memory leak), and you completely lose access to the status of that message.
Request Objects The value MPI_REQUEST_NULL is used to indicate an invalid request handle. Operations that deallocate request objects set the request handle to this value. Posting operations allocate memory for request objects and completion operations deallocate that memory and clean up the space.
Completion Operations
MPI_WAIT(REQUEST, STATUS, IERROR)
INOUT INTEGER REQUEST
OUT STATUS, IERROR
A call to MPI_WAIT returns when the operation identified by REQUEST is complete. MPI_WAIT is the blocking version of the completion operations, used when the program has determined it can’t do any more useful work without completing the current message. In this case, it chooses to block until the corresponding send or receive completes. In iterative parallel code, an MPI_WAIT is often placed directly before the next post operation that intends to use the same request object variable. Successful completion of MPI_WAIT sets REQUEST = MPI_REQUEST_NULL.
Completion Operations
MPI_TEST(REQUEST, FLAG, STATUS, IERROR)
INOUT INTEGER REQUEST
OUT STATUS(MPI_STATUS_SIZE)
OUT LOGICAL FLAG
A call to MPI_TEST returns immediately, with FLAG = true if the operation identified by REQUEST is complete and FLAG = false otherwise. MPI_TEST is the nonblocking version of the completion operations. If FLAG = true, MPI_TEST cleans up the space associated with REQUEST, deallocating the memory and setting REQUEST = MPI_REQUEST_NULL. MPI_TEST allows the user to write code that attempts to communicate as much as possible but continues doing useful work if messages are not ready.
Maximizing Overlap
To achieve maximum overlap between computation and communication, communications should be started as soon as possible and completed as late as possible.
- Sends should be posted as soon as the data to be sent is available.
- Receives should be posted as soon as the receive buffer is free to be overwritten.
- Sends should be completed just before the send buffer is to be reused.
- Receives should be completed just before the data in the receive buffer is to be used.
Overlap can often be increased by reordering the computation.
Setting up your account for MPI
exercise.html
List of 124 machine names:
More Stuff
Note: to log in to the McB 124 Linux machines from the outside world, do "ssh rlogin.cslab.vt.edu". You will then be logged into one of the machines in the lab.
Set up a public/private key pair. You only have to do this once. It will allow you to launch MPI jobs from any of the McB 124 machines and have them run on any of these machines without having to type passwords.
First, enter the command
ssh-keygen -t dsa -N ""
The result of this command will be something like this:
Generating public/private dsa key pair.
Enter file in which to save the key (/home/ugrads/NAME/.ssh/id_dsa):
Your identification has been saved in /home/ugrads/NAME/.ssh/id_dsa.
Your public key has been saved in /home/ugrads/NAME/.ssh/id_dsa.pub.
The key fingerprint is:
89:ff:00:5f:06:fd:d0:a2:9e:51:b1:00:cd:0a:76:6f
Then do this:
cd .ssh
cp id_dsa.pub authorized_keys2
To make sure this step worked, try ssh'ing to another machine in the lab, e.g., "ssh strawberry". You should be able to do this without being prompted for a password.
Even More Stuff
Put /home/staff/ribbens/mpich-1.2.6/bin in your path.
Make a subdirectory (mkdir MPI) and cd to it.
Hello world example:
Copy hello.c from /home/staff/ribbens/MPI.
Compile and link: mpicc -o hello hello.c
Run on 4 processors: mpirun -np 4 hello
Learn more about mpirun: mpirun -help