Introduction to Message Passing Interface (MPI)
C. Ferner & B. Wilkinson, 2014

Shared-Memory Systems
[Diagram: several processors, each with a bus interface, connected over a processor/memory bus to a memory controller and shared memory.]
All processors can access all of the shared memory.

For processors that share the same memory, if one processor computes a value or values needed by other processors, they need to synchronize: the processors that need the data must wait for the processor that computes it. However, because all processors can access the same memory, nothing else is required.

Distributed-Memory Systems
[Diagram: computers, each with a processor and local memory, connected by an interconnection network and exchanging messages.]
In a cluster, each processor cannot access the memory of the other processors. Memory is private.

If one processor computes a value or values needed by other processors, it has to transmit that data. This requires message passing. All data is private.

MPI (Message Passing Interface)
Widely adopted message-passing library standard. MPI-1 finalized in 1994, MPI-2 in 1997, MPI-3 in 2012.
Process-based: processes communicate between themselves with messages, both point-to-point and collectively.
A specification, not an implementation. Several free implementations exist, e.g., OpenMPI and MPICH.
Large number of routines (each version of the standard added more), but typically only a few are used.
C and Fortran bindings (the C++ bindings were removed in MPI-3).
Originally for distributed systems, but now used for all types: clusters, shared memory, hybrid.

Message-Passing Options
Sockets (very low level)
Parallel Virtual Machine (PVM, created by Oak Ridge National Lab)
MPI (Message Passing Interface), the standard; implementations include LAM, MPICH, and OpenMPI (we use this one)
All of these are libraries that can be used from within a C program.

To begin
MPI_Init() initializes the processes and gets them ready to run. It should be the first executable statement in the program.
MPI_Finalize() cleans up after the parallel program is done. It should be the last executable statement in the program.
Although all of the processes exist before and after Init and Finalize, it is convenient to think of MPI_Init() as when all the processes are created and MPI_Finalize() as when they are destroyed.

To begin
[Diagram: timeline showing processes 0 through 3 running between MPI_Init() and MPI_Finalize().]

To begin

#include "mpi.h"

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}

To begin
MPI_Init takes two arguments, &argc and &argv, which are the addresses of the arguments of main. MPI_Finalize takes no arguments.
Each process is assigned a rank in the range 0 ≤ rank < NP, where NP is the number of processes being used.

Other Useful Functions
To determine how many processes there are:
    MPI_Comm_size(MPI_COMM_WORLD, &NP);
To determine the current process's rank among all the processes:
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
Each process is given a unique rank in the range 0 ≤ rank < NP.

How is the number of processes determined?
When you run your MPI program, you can specify how many processes you want:
    $ mpirun -np 8 <program>
The -np option tells mpirun to run your parallel program using the specified number of processes.
OR
    $ mpiexec -n 8 <program>
The -n option tells mpiexec to run your parallel program using the specified number of processes.

How can these be used?

#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int NP, myrank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &NP);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("Hello world from rank %d of %d.\n", myrank, NP);
    MPI_Finalize();
    return 0;
}

Compiling and Running

$ mpicc hello.c -o hello
$ mpirun -np 5 hello
Hello world from rank 0 of 5.
Hello world from rank 3 of 5.
Hello world from rank 4 of 5.
Hello world from rank 1 of 5.
Hello world from rank 2 of 5.
$

mpicc is essentially gcc but makes sure that the MPI libraries are included.
Why are the statements not in order of rank?

All processes are executing the same code (although asynchronously). How can one have them execute separate code? Or how can one have a section of code executed by only one process?

Use their rank

if (myrank >= 0 && myrank < x) {
    ... // Code executed by a subset of processes
}

OR

if (myrank == 0) {
    ... // Code executed by only one process
} else {
    ... // Code executed by all other processes
}

(Client/Server model)

Sending messages
Messages can be sent between processes using the MPI_Send() and MPI_Recv() functions. These are one-to-one communications.
MPI_Recv is blocking, meaning that execution will stop until the appropriate message is received.
There are other non-blocking forms of communication, as well as one-to-many and many-to-many (collective) operations; a small broadcast sketch follows below.
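As a taste of the one-to-many operations mentioned above, here is a minimal, hypothetical sketch using the standard MPI_Bcast routine (variable names are ours; it assumes the usual MPI_Init/MPI_Comm_rank setup shown earlier):

    // Hypothetical sketch: broadcast one int from rank 0 to all processes.
    int value = 0;
    if (myrank == 0)
        value = 42;   // only the root has the value initially
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    // After the call, every process's copy of value is 42.

Note that every process calls MPI_Bcast; the root argument (here 0) determines who sends.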

[Diagram: message-passing concept using library routines.]
Note each computer executes its own program.

Sending messages

int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)

buf is the address of the data to send
count is the number of elements (1 if a scalar, N if an array, or strlen+1 if a string)
datatype is the type of the elements
dest is the rank of the destination

Sending messages

int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)

tag is user-defined (allows you to mark different messages with your own tag). This is useful when two processes are sending multiple messages to each other.
comm is what is known as a communicator. Basically, it is a subset of processes. MPI_COMM_WORLD is used for all processes.
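For instance, in this minimal sketch (our variable names, assuming the setup from the earlier examples), tags let the receiver distinguish two messages from the same sender:

    int a = 10, b = 20;
    if (myrank == 0) {
        MPI_Send(&a, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);   // tag 1
        MPI_Send(&b, 1, MPI_INT, 1, 2, MPI_COMM_WORLD);   // tag 2
    } else if (myrank == 1) {
        // Receives are matched by tag, so each value lands in the
        // intended variable regardless of arrival order.
        MPI_Recv(&a, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
        MPI_Recv(&b, 1, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
    }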

Receiving messages

int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status *status)

buf is the address in which to store the message
count is the size of buf; it can be bigger than the actual message
datatype is the type of the elements
source is the rank of the sender

Receiving messages

int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status *status)

tag is user-defined
comm is the communicator; MPI_COMM_WORLD is used for all processes
status is a structure that contains information about the transmission

Parameters of blocking send:
MPI_Send(buf, count, datatype, dest, tag, comm)
    buf: address of send buffer (notice it is a pointer)
    count: number of items to send
    datatype: datatype of each item
    dest: rank of destination process
    tag: message tag
    comm: communicator

Parameters of blocking receive:
MPI_Recv(buf, count, datatype, src, tag, comm, status)
    buf: address of receive buffer
    count: maximum number of items to receive
    datatype: datatype of each item
    src: rank of source process
    tag: message tag
    comm: communicator
    status: status after operation

In our code we do not check status, but it is good programming practice to do so. Usually the send and recv counts are the same.

MPI Datatypes (defined in mpi.h)
MPI_BYTE
MPI_PACKED
MPI_CHAR
MPI_SHORT
MPI_INT
MPI_LONG
MPI_FLOAT
MPI_DOUBLE
MPI_LONG_DOUBLE
MPI_UNSIGNED_CHAR

Example (Hello World 2)

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    char message[256];
    int i, rank, NP, tag = 99;
    char machine_name[256];
    MPI_Status status;

Example (Hello World 2)

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &NP);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(machine_name, 255);

Example (Hello World 2)

    if (rank == 0) {
        printf("Hello world from master process %d running on %s\n",
               rank, machine_name);
        for (i = 1; i < NP; i++) {
            MPI_Recv(message, 256, MPI_CHAR, i, tag,
                     MPI_COMM_WORLD, &status);
            printf("Message from process = %d : %s\n", i, message);
        }
    }

Example (Hello World 2)

    else {
        sprintf(message, "Hello world from process %d running on %s",
                rank, machine_name);
        // The destination is the master process (rank 0)
        MPI_Send(message, strlen(message) + 1, MPI_CHAR, 0, tag,
                 MPI_COMM_WORLD);
    }
    MPI_Finalize();
}

Result

$ mpirun -np 9 ./hello
Hello world from master process 0 running on compute-0-0.local
Message from process = 1 : Hello world from process 1 running on compute-0-0.local
Message from process = 2 : Hello world from process 2 running on compute-0-0.local
Message from process = 3 : Hello world from process 3 running on compute-0-0.local
Message from process = 4 : Hello world from process 4 running on compute-0-0.local
Message from process = 5 : Hello world from process 5 running on compute-0-0.local
Message from process = 6 : Hello world from process 6 running on compute-0-0.local
Message from process = 7 : Hello world from process 7 running on compute-0-0.local
Message from process = 8 : Hello world from process 8 running on compute-0-1.local
$

Any source or tag
In MPI_Recv, the source can be MPI_ANY_SOURCE and the tag can be MPI_ANY_TAG. These cause the Recv to take any message destined for the current process regardless of the source and/or regardless of the tag.
Ex.
    MPI_Recv(message, 256, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);
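When these wildcards are used, the status structure reports what actually arrived: the MPI standard defines the MPI_SOURCE and MPI_TAG fields for exactly this purpose. A short sketch (our variable names):

    MPI_Recv(message, 256, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);
    // Recover who sent the message and which tag it carried
    printf("Got a message from rank %d with tag %d\n",
           status.MPI_SOURCE, status.MPI_TAG);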

Another Example (array)

int array[100];
... // rank 0 fills the array with data
if (rank == 0)
    // 100 = number of elements, 1 = destination, 0 = tag
    MPI_Send(array, 100, MPI_INT, 1, 0, MPI_COMM_WORLD);
else if (rank == 1)
    // 0 = source, 0 = tag
    MPI_Recv(array, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);

Another Example (scalar)

int N;
... // rank 2 assigns a value to N, which is needed by rank 3
if (rank == 2)
    // the count is only 1 for a scalar; 3 = destination, 5 = tag
    MPI_Send(&N, 1, MPI_INT, 3, 5, MPI_COMM_WORLD);
else if (rank == 3)
    // 2 = source, 5 = tag
    MPI_Recv(&N, 1, MPI_INT, 2, 5, MPI_COMM_WORLD, &status);

Another Example (Ring)
In the ring example, each process (except the master) receives a token from the process with rank one less than its own.
Then each process increments the token and sends it to the next process (with rank one more than its own).
The last process sends the token back to the master.

Another Example (Ring)
Each process (except the master) receives a token from the process with rank one less than its own. Then each process increments the token by 2 and sends it to the next process (with rank one more than its own). The last process sends the token back to the master.
Question: Do we have a pattern for this?
(Slide based upon slides from C. Ferner, UNC-W)

Another Example (Ring)

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int token, NP, myrank;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &NP);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

Another Example (Ring)

    if (myrank != 0) {
        // Everyone except the master receives from the process
        // with rank one less than its own.
        MPI_Recv(&token, 1, MPI_INT, myrank - 1, 0,
                 MPI_COMM_WORLD, &status);
        printf("Process %d received token %d from process %d\n",
               myrank, token, myrank - 1);

Another Example (Ring)

    } else {
        // The master sets the initial value before sending.
        token = -1;
    }

    token += 2;
    MPI_Send(&token, 1, MPI_INT, (myrank + 1) % NP, 0, MPI_COMM_WORLD);

Another Example (Ring)

    // Now process 0 can receive from the last process.
    if (myrank == 0) {
        MPI_Recv(&token, 1, MPI_INT, NP - 1, 0, MPI_COMM_WORLD, &status);
        printf("Process %d received token %d from process %d\n",
               myrank, token, NP - 1);
    }

    MPI_Finalize();
    return 0;
}

Results (Ring)

Process 1 received token 1 from process 0
Process 2 received token 3 from process 1
Process 3 received token 5 from process 2
Process 4 received token 7 from process 3
Process 5 received token 9 from process 4
Process 6 received token 11 from process 5
Process 7 received token 13 from process 6
Process 0 received token 15 from process 7

Send and Receive Semantics
MPI_Send and MPI_Recv are locally blocking: they return after the local actions are complete.
MPI_Send returns after the data has been copied into a buffer and the message is prepared and on its way.
MPI_Recv returns after the data has been received and copied into the user's buffer (it blocks until the message arrives).
There are other variations of send and receive (more on this later); a brief non-blocking sketch follows below.
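As a preview of those variations, here is a minimal, hypothetical sketch of the standard non-blocking forms MPI_Isend and MPI_Irecv, which return immediately and are completed later with MPI_Wait (our variable names, assuming the usual setup):

    MPI_Request req;
    int value = myrank;
    if (myrank == 0) {
        // Returns immediately; the transfer proceeds in the background
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        // ... useful work can overlap the communication here ...
        MPI_Wait(&req, MPI_STATUS_IGNORE);   // now safe to reuse value
    } else if (myrank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        // ... useful work here ...
        MPI_Wait(&req, MPI_STATUS_IGNORE);   // now value holds the data
    }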

Matching up sends and recvs
Notice in the code how you have to be very careful matching up sends and recvs. Every send must have a matching recv.
Sends return after their local actions complete, but a recv will wait for its message, so it is easy to get deadlock if the code is written wrong.
Pre-implemented patterns are designed to avoid deadlock. We will look at deadlock again; a small sketch of the classic pitfall follows below.
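To make the pitfall concrete, consider two ranks that each want the other's value. If both posted their receives first, each would wait forever for a send that never starts. One standard fix is to reverse the call order on one rank (the combined MPI_Sendrecv routine is another). A hypothetical sketch with our variable names:

    int mine = myrank, theirs;
    if (myrank == 0) {
        MPI_Send(&mine, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&theirs, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
    } else if (myrank == 1) {
        // Order reversed relative to rank 0, so the calls pair up
        MPI_Recv(&theirs, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&mine, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }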

Measuring the Execution Time

double MPI_Wtime(void)

Returns a double that is the number of seconds of wall-clock time since some fixed point in the past (e.g., January 1, 1970, in many implementations).

double start_time, end_time, elapsed_time;
...
start_time = MPI_Wtime();
... // Measure time to execute this section
end_time = MPI_Wtime();
elapsed_time = end_time - start_time;

Measuring the Execution Time
Alternatively, one can use gettimeofday (a system call to the operating system):

#include <sys/time.h>

double elapsed_time;
struct timeval tv1, tv2;

gettimeofday(&tv1, NULL);
... // Measure time to execute this section
gettimeofday(&tv2, NULL);
elapsed_time = (tv2.tv_sec - tv1.tv_sec) +
               ((tv2.tv_usec - tv1.tv_usec) / 1000000.0);

This is useful if you are creating a sequential (non-MPI) version with which to compare.

Executing a program on multiple computers
Usually the computers are specified in a file containing the names of the computers and possibly the number of processes that should run on each. The file is then given to mpiexec with the -machinefile option (or the -hostfile or -f options).
An implementation-specific algorithm selects computers from the list to run the user processes. Typically MPI will cycle through the list in round-robin fashion.
If a machines file is not specified, a default machines file may be used, or the program may only run on a single computer.

Cluster at UNCW
[Diagram: user connects to the master node of a dedicated cluster; compute nodes sit behind a switch, each with an Ethernet interface.]
Compute nodes: compute-0-0, compute-0-1, compute-0-2, ...
Head node: harpua
Submit host: babbage

Cluster at UNCW
We use the Sun Grid Engine (SGE) to schedule jobs on the cluster. This allows users to have exclusive use of the compute nodes, so that one user's application doesn't interfere with the performance of another's.
The scheduler (SGE) is responsible for allocating compute nodes to jobs exclusively.
Compile as normal:
    $ mpicc hello.c -o hello

SGE
But running is done through a job submission file (or job description file). Some SGE commands:
    qsub - submits a job to the scheduler to run
    qstat - see the status of submitted jobs (waiting, queued, running, terminated, etc.)
    qdel - deletes a job (by number) from the system
    qhost - see a list of hosts

SGE
Example job submission file (hello.sge):

#!/bin/sh
# Usage: qsub hello.sge
#$ -S /bin/sh
#$ -pe orte 16        # Specify how many processors we want
# -- our name ---
#$ -N Hello           # Name for the job
#$ -l h_rt=00:01:00   # Request 1 minute to execute
#$ -cwd               # Make sure that the .e and .o files arrive in the working directory
#$ -j y               # Merge the standard out and standard error to one file
mpirun -np $NSLOTS ./hello

SGE
Example job submission file (hello.sge), first part:

#!/bin/sh
# Usage: qsub hello.sge
#$ -S /bin/sh
#$ -pe orte 16        # Specify how many processors we want

SGE
Example job submission file (hello.sge), continued:

# -- our name ---
#$ -N Hello           # Name for the job
#$ -l h_rt=00:01:00   # Request 1 minute to execute

The -N line gives the name of the job, which also names the output files: Hello.o### and Hello.po###.
The -l h_rt line indicates that the job will need only a minute. This is important so that SGE will clean up if the program hangs or terminates incorrectly. You may need to increase the time for longer programs, or SGE will terminate the program before it has completed.

SGE
Example job submission file (hello.sge), continued:

#$ -cwd   # Make sure that the .e and .o files arrive in the working directory
#$ -j y   # Merge the standard out and standard error to one file

The -cwd option says to do the job in the current directory.
SGE will create 3 files: Hello.o##, Hello.e##, and Hello.po##. The -j y option will merge the Hello.o and Hello.e files (standard out and standard error).

SGE
Example job submission file (hello.sge), continued:

mpirun -np $NSLOTS ./hello

And finally, the command to run the MPI program. $NSLOTS is the same number given on the '#$ -pe orte 16' line.

SGE Example

$ qstat
$ qsub hello.sge
Your job 106 ("Hello") has been submitted
$ qstat
job-ID  prior  name   user     state  submit/start at   queue  slots  ja-task-ID
106     ...    Hello  cferner  qw     09/04/2012 ...           16
$

The state of "qw" means queued and waiting.

SGE Example

$ qstat
job-ID  prior  name   user     state  submit/start at   queue  slots  ja-task-ID
...     ...    Hello  cferner  r      09/04/2012 ...           16
$

The state of "r" means running.

SGE Example

$ ls
hello  hello.c  Hello.o106  Hello.po106  hello.sge  ring  ring.c  ring.sge  test  test.c  test.sge
$ cat Hello.o106
Hello world from master process 0 running on compute-0-2.local
Message from process = 1 : Hello world from process 1 running on compute-0-2.local
Message from process = 2 : Hello world from process 2 running on compute-0-2.local
...

You will want to clean up the output files when you are done with them, or you will end up with a bunch of clutter.

Deleting a job

$ qstat
job-ID  prior  name   user     state  submit/start at   queue  slots  ja-task-ID
108     ...    Hello  cferner  qw     09/04/2012 ...           16
$ qdel 108
cferner has registered the job 108 for deletion
$ qstat
$

Executing a program on the UNCC cluster
On the UNCC cci-gridgw.uncc.edu cluster, the mpiexec command is mpiexec.hydra.
Internal compute nodes have names used just internally. For example, a machines file to use nodes 5, 7, and 8 and the front node of the cci-grid0x cluster would be:

cci-grid05
cci-grid07
cci-grid08
cci-gridgw.uncc.edu

Then:

mpiexec.hydra -machinefile machines -n 4 ./prog

would run prog with four processes, one on cci-grid05, one on cci-grid07, one on cci-grid08, and one on cci-gridgw.uncc.edu.

Specifying the number of processes to execute on each computer
The machines file can include how many processes to execute on each computer. For example:

# a comment
cci-grid05:2            # first 2 processes on 05
cci-grid07:3            # next 3 processes on 07
cci-grid08:4            # next 4 processes on 08
cci-gridgw.uncc.edu:1   # last process on gridgw (09)

That is 10 processes in total. Then:

mpiexec.hydra -machinefile machines -n 10 ./prog

If more processes were specified, they would be scheduled in round-robin fashion.

Eclipse IDE
The PTP (Parallel Tools Platform) plug-in supports development of parallel programs (MPI, OpenMP). It is possible to edit and execute an MPI program on the client or on a remote machine.
Eclipse-PTP is installed on the course virtual machine. We hope to explore Eclipse-PTP in assignments.

Visualization Tools
Programs can be watched as they are executed in a space-time diagram (or process-time diagram).
[Diagram: space-time diagram for processes 1-3 showing, over time, periods of computing, waiting, and message-passing system routines, with messages drawn between processes.]
Visualization tools are available for MPI, e.g., Upshot.

Questions?