Supplementary Slides S.1
Empirical Study of Parallel Programs
–Measuring execution time
–Visualizing execution trace
–Debugging
–Optimization strategies
Supplementary Slides S.2
Empirical Study of Parallel Programs (cont’d)
Objective
–An initiation into empirical analysis of parallel programs
–By example: number summation
–Basis for coursework
Outcome: ability to
–Follow the same steps to measure simple parallel programs
–Explore the detailed functionality of the tools
–Gain better insight into, and explain, the behavior of parallel programs
–Optimize parallel programs
–Use similar tools for program measurements
Supplementary Slides S.3
Homework Contract
Requirements
–A number generator program
–Assemble and compile the homework program
–Instrument the homework program with MPI timing functions
–A file management script
Deliverables
–Speedup (and linear speedup) graph plots (on the same page) showing # processors against problem size
–A file of raw execution times of the form: data size, # processors, execution time
–Jumpshot visualization graphs
–A report explaining your work, especially the instrumentation, the speedup graphs, and the Jumpshot graphs
Supplementary Slides S.4
Execution Time: Number Generator Program

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main(int argc, char **argv)
{
  int i;
  FILE *fp;

  if (argc != 4) {
    printf("randFile filename #ofValues powerOfTwo\n");
    return -1;
  }
  srand(clock());                       /* seed the random number generator */
  fp = fopen(argv[1], "w");
  if (fp == NULL) return -1;
  fprintf(fp, "%d\n", atoi(argv[2]));   /* first line: the number of values */
  for (i = 0; i < atoi(argv[2]); i++)   /* one value per line, in [0, 2^powerOfTwo) */
    fprintf(fp, "%d\n", rand() % (int)pow(2, atoi(argv[3])));
  fclose(fp);
  return 0;
}
Supplementary Slides S.5
Number Generator: Compiling & Running
–Compiling and running: see the sketch below
–Should generate more than four groups of numbers of different sizes: 1000, 5000, 10000, 15000, 20000, etc.
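A minimal sketch of compiling and running, assuming the source file is named genRandom.c (the binary name genRandom matches the helper script on the next slide; the math library is needed for pow()):

gcc -o genRandom genRandom.c -lm
./genRandom data1000.txt 1000 16

This writes 1000 random values, each in [0, 2^16), to data1000.txt.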
Supplementary Slides S.6
Number Generator: A Helper Script

for var in 1000 5000 10000 15000 20000
do
  ./genRandom data$var.txt $var 16
done
Supplementary Slides S.7
Sample MPI Program: Summation

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
  int myid, numprocs;
  int data[MAXSIZE], i, x, low, high, myresult = 0, result;
  char fn[255];
  FILE *fp;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);

  if (myid == 0) {  /* open input file and initialize data */
    strcpy(fn, getenv("HOME"));
    strcat(fn, "/MPI/rand_data.txt");
    if ((fp = fopen(fn, "r")) == NULL) {
      printf("Can't open the input file: %s\n\n", fn);
      exit(1);
    }
    for (i = 0; i < MAXSIZE; i++)
      fscanf(fp, "%d", &data[i]);
  }
  …
Supplementary Slides S.8
Sample MPI Program: Summation (cont’d)

  …
  MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);  /* broadcast data */

  /* add my portion of the data */
  x = MAXSIZE / numprocs;
  low = myid * x;
  high = low + x;
  for (i = low; i < high; i++)
    myresult += data[i];
  printf("I got %d from %d\n", myresult, myid);

  /* compute the global sum */
  MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  if (myid == 0) printf("The sum is %d.\n", result);

  MPI_Finalize();
  return 0;
}
Supplementary Slides S.9
Summation Program: Instrumentation
–Place your instrumentation code carefully; you need to justify its placement (see the sketch below)
–MPI_Wtime(): returns the elapsed (wall clock) time on the calling processor
–MPI_Wtick(): returns, as a double precision value, the number of seconds between successive clock ticks
  –For example, if the clock is implemented in hardware as a counter incremented every millisecond, MPI_Wtick returns 10^-3
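A minimal sketch of one reasonable placement: timing the broadcast/compute/reduce phase of the summation program. The barrier is an assumption here; it makes all processes enter the timed region together:

  double t_start, t_end;

  MPI_Barrier(MPI_COMM_WORLD);        /* synchronize before timing */
  t_start = MPI_Wtime();

  /* ... MPI_Bcast, local summation, MPI_Reduce ... */

  t_end = MPI_Wtime();
  if (myid == 0)
    printf("Elapsed: %f s (clock resolution %f s)\n",
           t_end - t_start, MPI_Wtick());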
Supplementary Slides S.10
Summation Program: Compiling & Running
–Recompile for each data size, or
–Take the data size and input file dynamically (e.g., as command-line arguments)
–Sample script (the processor counts are illustrative):

for var1 in forData1000 forData5000 forData10000 forData15000 forData20000
do
  for var2 in 1 2 4 8
  do
    mpirun -np $var2 $var1
  done
done
Supplementary Slides S.11
Jumpshot: Visualizing Execution Trace
–Jumpshot is a graphical tool for investigating the behavior of parallel programs
  –Implemented in Java (Jumpshot can run as an applet)
–It is a "post-mortem" analyzer: its input is a logfile of time-stamped events
  –The file is written by the companion package CLOG
–Jumpshot can present multiple views of logfile data:
  –Per-process timelines: the primary view, showing with colored bars the state of each process at each time
  –State duration histograms view
  –"Mountain range" view, showing the aggregate number of processes in each state at each time
Supplementary Slides S.12
Visualizing Program Execution
Other logfile-based tools with similar features:
–Commercial tools include TimeScan and Vampir
–Academic tools include ParaGraph, TraceView, XPVM, XMPI, and Pablo
Supplementary Slides S.13
Linking with Logging Libraries
Generating log files:
–Compile your MPI code and link using the -mpilog flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation numbersSummation.o -mpilog

–Check the file names associated with the compiled program:

bash-2.04$ ls numbersSummation*
numbersSummation    numbersSummation.o    numbersSummation.xls
numbersSummation.c  numbersSummation.txt
Supplementary Slides S.14
Linking with Logging Libraries (cont’d)
Generating log files:
–Run the MPI program:

bash-2.04$ mpirun -np 8 numbersSummation
I got … from 0
The sum is …
Writing logfile....
Finished writing logfile.
I got … from 3
I got … from 6
I got … from 2
I got … from 7
I got … from 4
I got … from 1
I got … from 5

–Check that the .clog file was created:

bash-2.04$ !l
ls numbersSummation*
numbersSummation    numbersSummation.clog  numbersSummation.txt
numbersSummation.c  numbersSummation.o     numbersSummation.xls
Supplementary Slides S.15
Linking with Logging Libraries (cont’d)
Use Jumpshot to visualize the .clog file:
–Run vncserver to get a Linux remote desktop
–Launch Jumpshot on the .clog file
  –May require conversion to the SLOG-2 format first (see the sketch below)
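A sketch of the conversion and launch; the exact converter name depends on the installed MPE version (clogTOslog2 is assumed here):

bash-2.04$ clogTOslog2 numbersSummation.clog     # produces numbersSummation.slog2
bash-2.04$ jumpshot numbersSummation.slog2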
Supplementary Slides S.16
Jumpshot: Sample Display
(screenshot of a Jumpshot display)
Supplementary Slides S.17
Linking with Tracing Libraries
–Compile your MPI code and link using the -mpitrace flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation numbersSummation.o -mpitrace

–Running:

bash-2.04$ mpirun -np 4 numbersSummation
Starting MPI_Init...
[1] Ending MPI_Init
[1] Starting MPI_Comm_size...
[1] Ending MPI_Comm_size
[1] Starting MPI_Comm_rank...
[1] Ending MPI_Comm_rank
[1] Starting MPI_Bcast...
[2] Ending MPI_Init
[3] Ending MPI_Init
……
Supplementary Slides S.18
Linking with Animation Libraries
–Compile your MPI code and link using the -mpianim flag:

bash-2.04$ mpicc -c numbersSummation.c
bash-2.04$ mpicc -o numbersSummation -mpianim numbersSummation.o \
    -L/export/tools/mpich/lib -lmpe -L/usr/X11R6/lib -lX11 -lm

–Running:

bash-2.04$ mpirun -np 4 numbersSummation
Supplementary Slides S.19
Starting mpirun with a Debugger

bash-2.04$ mpirun -dbg=gdb -np 4 summation
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
Breakpoint 1 at 0x804cbee
Breakpoint 1, 0x0804cbee in MPI_Init ()
(gdb)
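From the (gdb) prompt the usual gdb commands apply; a hypothetical continuation of the session (mpirun sets the MPI_Init breakpoint automatically):

(gdb) bt                 (show the call stack)
(gdb) break MPI_Reduce   (stop again later, e.g. at the reduction)
(gdb) cont               (continue execution)
(gdb) print myresult     (inspect variables when stopped)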
Supplementary Slides S.20
Optimization Strategies
–Structural changes may need to be made to a parallel program after measuring its performance (hot spots exposed, etc.)
–A number of measures can be taken to optimize a parallel program:
1.Change the number of processes to alter process granularity
2.Increase message sizes to lessen the effect of startup times
3.Recompute values locally rather than receive them in additional messages
4.Latency hiding: overlap communication with computation (see the sketch below)
5.Perform critical path analysis: determine the longest path that dominates overall execution time
6.Address the effect of the memory hierarchy: reduce cache misses by, for example, reordering the memory requests in the program
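To illustrate latency hiding (strategy 4), a minimal sketch using nonblocking MPI calls; compute_on() and the buffer arguments are hypothetical, not part of the summation program:

#include "mpi.h"

#define COUNT 1000
#define TAG   0

void compute_on(int *buf);   /* hypothetical work routine */

/* Overlap communication with computation: start the receive of the
   next block, do useful work on the current block, and wait for the
   transfer to complete only when the data is actually needed. */
void process_blocks(int *current_buf, int *next_buf, int src)
{
  MPI_Request req;
  MPI_Irecv(next_buf, COUNT, MPI_INT, src, TAG, MPI_COMM_WORLD, &req);
  compute_on(current_buf);              /* computation hides the latency */
  MPI_Wait(&req, MPI_STATUS_IGNORE);
}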
Supplementary Slides S.21
References
–Check the documentation for mpich, jumpshot, and mpe in: /tools/mpich/doc