MPI and High Performance Computing: Systems and Programming
Barry Britt, Systems Administrator
Department of Computer Science
Iowa State University


2 Purpose
To give you:
- an overview of some new system-level MPI functions
- access to the tools you need to compile and run MPI jobs
- some instruction in creating and using Makefiles
- some instruction on measuring elapsed time in C programs

3 Makefiles

4 GNU Make  Enables the end user to build and install a package without worrying about the details.  Automatically figures out which files it needs to update based on which source files have changed.  Not language dependent  Not limited to building a package; can be used to install or uninstall

5 Makefile Rules
A rule tells Make how to execute a series of commands in order to build a target from source files.
- It specifies a list of dependencies (GNU Make calls them prerequisites).
- The dependencies should include ALL files that the target depends on.

target: dependencies...
	commands...

Each command line must begin with a TAB character, not spaces.

6 Example Makefile for C Source

CC=gcc
CFLAGS=-Wall
INCLUDES=
BINARIES=rand test

.SUFFIXES: .c .o

.c.o:
	$(CC) $(CFLAGS) -c $*.c

all: $(BINARIES)

rand.o: rand.c
test.o: test.c

rand: rand.o
	$(CC) $(CFLAGS) -o rand rand.o

test: test.o
	$(CC) $(CFLAGS) -o test test.o

clean:
	rm -f a.out core *.o $(BINARIES)

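7 Example Makefile for C Source

CC=gcc
CFLAGS=-Wall
INCLUDES=
BINARIES=rand test

Variables
- CC selects the compiler; here it is set to GCC. For MPI programs, set it to mpicc, not gcc.
- CFLAGS: -Wall turns on the most common compiler warnings (despite its name, not every warning). The -c flag (compile to an object file without linking) is not part of CFLAGS; it is supplied separately in the .c.o rule.
- INCLUDES is empty in this example.
- BINARIES lists the programs built by the "all" target.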

8 Example Makefile for C Source
Target “clean”. Use it by typing:
- make clean
The rule states:
- In the current directory, run: rm -f a.out core *.o $(BINARIES)
- which expands to: rm -f a.out core *.o rand test

clean:
	rm -f a.out core *.o $(BINARIES)

9 Example Makefile for C Source
Makefile instruction on how to handle .c files and turn them into object (.o) files:
- Compile using the $(CC) value with $(CFLAGS)
- Compile each individual file into its corresponding .o file. In a suffix rule, $* is the file name without its suffix, so $*.c names the source file.

.SUFFIXES: .c .o
.c.o:
	$(CC) $(CFLAGS) -c $*.c

10 Example Makefile for C Source
Target: rand or test
- Runs: $(CC) $(CFLAGS) -o rand rand.o
- which expands to: gcc -Wall -o rand rand.o
If you need to link external libraries, add them at the end of the command (for example, -lm for the math library), after the object files. The linker processes its arguments in order, so a library must come after the object files that use it.

rand.o: rand.c
test.o: test.c

rand: rand.o
	$(CC) $(CFLAGS) -o rand rand.o

test: test.o
	$(CC) $(CFLAGS) -o test test.o

11 Random Matrix Generation

12 Random Generator for Matrices Rand  -f:filename to which to write the matrix  -c:number of matrix columns  -r:number of matrix rows  -h:help documentation  -s:seed  -m:max integer in matrix cells

13 Random Generator for Matrices
- Completely random generation of a matrix with a given number of rows (-r) and columns (-c)
- Uses a seed (-s) to initialize the random number generator
Output file
- The first line contains the number of rows and the number of columns
- Subsequent lines contain the matrix cell values, one per line, in row-major order

14 Random Generator for Matrices
For a matrix whose rows have length m (that is, a matrix with m columns), cell A[i,j] with zero-indexed i and j is on line:
- m * i + j + 2
Lines are not zero-indexed for the purpose of this calculation, and line 1 holds the header; together these account for the + 2. Therefore, for a 5 x 5 matrix (zero-indexed cells):
- A[0, 0] is on line 2
- A[0, 1] is on line 3
- A[4, 4] is on line 26
- A[2, 3] is on line 15

15 Calculating Run Time in C

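16 Calculating Running Time in C

#include <stdio.h>
#include <sys/time.h>   /* gettimeofday(), struct timeval */
#include <unistd.h>     /* sleep() */

int main(void)
{
    struct timeval begin, end;
    double time;

    gettimeofday(&begin, NULL);
    sleep(10);                      /* the work being timed */
    gettimeofday(&end, NULL);

    /* whole seconds plus the microsecond difference, converted to seconds */
    time = (end.tv_sec - begin.tv_sec) +
           ((end.tv_usec - begin.tv_usec) / 1000000.0);

    printf("This program ran for %f seconds\n", time);
    return 0;
}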

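17 C Time
struct timeval
- Holds seconds (tv_sec) and microseconds (tv_usec)
- Used by the gettimeofday() system call
gettimeofday()
- Reports the number of seconds (and microseconds) since the UNIX Epoch (00:00:00 UTC, January 1, 1970)
- Is this completely accurate? No, but it is usually very close (often within a few microseconds). Its resolution is one microsecond, the actual precision depends on the system clock, and the value can jump if the system time is adjusted while the program runs.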

18 C Time
- You MUST use the timeval struct for the gettimeofday() call.
- On UNIX systems, you need to include sys/time.h to use it.
- The elapsed time is: (end seconds - begin seconds) + ((end microseconds - begin microseconds) / 1000000.0). Divide by 1000000.0 rather than 1000000 so that C performs floating-point division instead of truncating integer division.
You can calculate:
- Program run time
- Algorithm execution time

19 Using the PBS Job Submission System

20 PBS (Torque/Maui)
- hpc-class job submission system: qsub
- All queues are managed by the scheduler.
- PBS scripts can be created at:

21 Example script

#!/bin/csh
#PBS -o BATCH_OUTPUT
#PBS -e BATCH_ERRORS
#PBS -lvmem=256Mb,pmem=256Mb,mem=256Mb,nodes=16:ppn=2,cput=2:00:00,walltime=1:00:00
# Change to directory from which qsub was executed
cd $PBS_O_WORKDIR
time mpirun -np 32

22 PBS Variables
-l (resources)
- vmem: total virtual memory
- pmem: memory per task (process)
- mem: total aggregate memory
- nodes: total number of nodes
- ppn: processors per node
- cput: CPU time
- walltime: elapsed (wall-clock) time the job is allowed to run

23 PBS Variables
- vmem = pmem = mem
- total CPUs = nodes * ppn
- cput = walltime * ppn

24 PBS (Torque/Maui)
Based on the previous script:
- BATCH_OUTPUT contains the output from the batch job
- BATCH_ERRORS contains the error information from the batch job

25 Some other important information
- Max CPUs: 32 for classwork
- Max memory: 2.0 GB
- Max swap: 2.0 GB
- Short queue:
  - 4 nodes per job; 16 total CPUs
  - 1 hour per job
  - 2 total jobs per user

26 MPI Blocking vs. Non-Blocking Communication

27 MPI Communication
Blocking communication:
- MPI_Send
- MPI_Recv
MPI_Send → Basic blocking send operation. The routine returns only after the application buffer in the sending task is free for reuse. This does not guarantee that the message has already been received.
MPI_Recv → Receives a message and blocks until the requested data is available in the application buffer in the receiving task.

28 MPI Communication
Non-blocking communication:
- MPI_Isend | MPI_Irecv
- MPI_Wait | MPI_Test
MPI_Isend → Identifies an area in memory to serve as a send buffer. Processing continues without waiting for the message to be copied out of the buffer.
MPI_Irecv → Identifies an area in memory to serve as a receive buffer. Processing continues immediately without waiting for the message to be received and copied into the buffer.
MPI_Test → Checks, without blocking, whether a non-blocking send or receive has completed.
MPI_Wait → Blocks until a specified non-blocking send or receive operation has completed.

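29 Why non-blocking communication?
In some cases, it can increase performance. If you have an expensive operation to perform, the program can do it while the message is in transit instead of waiting idle, for example:
- Disk I/O
- Heavy processing on data already received
BE CAREFUL!!!
- Do not read a receive buffer, or modify a send buffer, until MPI_Wait (or an MPI_Test that reports completion) says the operation is finished. If you access the buffer before then, the result is undefined: you may read incomplete data or corrupt the message being sent.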

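30 Blocking Communication Example: main() and master()

#include <stdio.h>
#include <mpi.h>

#define TAG 0   /* message tag; any value agreed on by sender and receiver works */

int master(void);
int slave(void);

int main(int argc, char **argv)
{
    int myRank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    if (myRank == 0)
        master();   /* rank 0 collects the partial sums */
    else
        slave();    /* every other rank computes one partial sum */

    MPI_Finalize();
    return 0;
}

int master(void)
{
    int i, size, my_answer = 0, their_work = 0;
    MPI_Status status;

    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Blocking receive from each slave, in rank order */
    for (i = 1; i < size; i++) {
        MPI_Recv(&their_work, 1, MPI_INT, i, TAG, MPI_COMM_WORLD, &status);
        my_answer += their_work;
    }

    printf("The answer is: %d\n", my_answer);
    return 0;
}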

31 Blocking Communication Example: slave()

int slave(void)
{
    int i, myRank, size, namelength, work = 0;
    int chunk, first, last;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelength);

    /* Split 1..100 into (size - 1) consecutive chunks, one per slave */
    chunk = 100 / (size - 1);
    first = chunk * (myRank - 1) + 1;
    last  = chunk * myRank;
    if (myRank == size - 1)
        last = 100;   /* the last slave also takes any remainder */

    for (i = first; i <= last; i++)
        work = work + i;

    printf("[%s]: Adding the numbers %d to %d = %d\n", name, first, last, work);

    /* Blocking send of the partial sum to rank 0 */
    MPI_Send(&work, 1, MPI_INT, 0, TAG, MPI_COMM_WORLD);
    return 0;
}

