
Performance Analysis Tools


1 Performance Analysis Tools

2 Performance Analysis Goal
Once we have a working parallel program, we want to tune it to run faster.
Hot spot: an area of code that uses a significant amount of CPU time.
Bottleneck: an area of code that uses resources inefficiently and slows the program down (e.g. communication).

3 Timers
One way to identify hot spots and bottlenecks is to use timers.
We have used timers to measure the elapsed time of an entire algorithm, but they can also measure the time spent in individual parts of the algorithm.

4 Timers
Timer           Usage         Wallclock / CPU   Resolution        Languages
time            Shell script  Both              1/100th second    Any
gettimeofday    Subroutine    Wallclock         Microseconds      C/C++
read_real_time  Subroutine    Wallclock         Nanoseconds       C/C++ on IBM AIX systems
MPI_Wtime       Subroutine    Wallclock         Microseconds      C/C++, Fortran

5 Time command
Usage: time mpirun -np # command
Result:
real 0m1.071s
user 0m0.177s
sys 0m0.045s

6 Time command Meaning
Real time: the total wall-clock time (start to finish) your program took to load, execute, and exit.
User time: the total CPU time your program spent executing in user mode.
System time: the CPU time spent in operating system calls made on behalf of your program.

7 gettimeofday
gettimeofday is a system call that fills in a structure giving the time since the Epoch (January 1, 1970):
    int gettimeofday(struct timeval *tv, struct timezone *tz);
The timeval structure holds seconds and microseconds:
    struct timeval {
        time_t      tv_sec;   /* seconds */
        suseconds_t tv_usec;  /* microseconds */
    };

8 gettimeofday Usage:
    #include <sys/time.h>
    struct timeval tv1, tv2;
    double elapsed_time;
    ...
    gettimeofday(&tv1, NULL);
    ... // Work to be timed
    gettimeofday(&tv2, NULL);
    // Convert time to seconds (floating-point divisor avoids integer truncation)
    elapsed_time = (tv2.tv_sec - tv1.tv_sec) + ((tv2.tv_usec - tv1.tv_usec) / 1000000.0);

9 MPI_Wtime
Returns a single double-precision value: the number of seconds since some time in the past (most likely the Epoch).
MPI also provides an MPI_Wtick() routine that returns the timer resolution (most likely microseconds).

10 MPI_Wtime Usage:
    #include "mpi.h"
    ...
    double start, end, resolution;
    MPI_Init(&argc, &argv);
    start = MPI_Wtime();   /* start time */
    ... // Work to be timed
    end = MPI_Wtime();     /* end time */
    resolution = MPI_Wtick();
    printf("elapsed= %e resolution= %e\n", end-start, resolution);

11 MPI_Wtime Sample output:
Wallclock times(secs): start= end= elapsed= e-03 resolution= e-06
elapsed: elapsed time (seconds)
resolution: accurate to microseconds

12 read_real_time
read_real_time is a system call that fills in a structure giving the time since the Epoch (January 1, 1970):
    int read_real_time(timebasestruct_t *t, size_t size_of_timebasestruct_t);
Designed to measure time accurate to nanoseconds.
Guarantees correct time units across different IBM RS/6000 architectures.

13 read_real_time Usage:
    #include <sys/time.h>
    ...
    timebasestruct_t start, finish;
    int secs, n_secs;
    read_real_time(&start, TIMEBASE_SZ);
    /* do some work */
    read_real_time(&finish, TIMEBASE_SZ);
    /* Make sure both values are in seconds and nanoseconds */
    time_base_to_time(&start, TIMEBASE_SZ);
    time_base_to_time(&finish, TIMEBASE_SZ);

14 read_real_time Usage continued:
    ...
    /* Subtract the starting time from the ending time */
    secs = finish.tb_high - start.tb_high;
    n_secs = finish.tb_low - start.tb_low;
    /* Fix carry from low-order to high-order during the measurement */
    if (n_secs < 0) {
        secs--;
        n_secs += 1000000000;
    }
    printf("Sample time was %d seconds %d nanoseconds\n", secs, n_secs);

15 Profilers
Profiler:
prof
gprof
Xprofiler
mpiP

16 prof
Compile your program with the -p option: gcc -p <program>.c -o <program>
Run the program.
A profile file called mon.out is created.
Run: prof -m mon.out

17 Sample Output from prof
Name          %Time  Seconds  Cumsecs  #Calls  msec/call
.fft           51.8     0.59             1024      0.576
.main          40.4     0.46     1.05       1    460.
.bit_reverse    7.9     0.09     1.14               0.088
.cos            0.0     0.00              256
.sin .catopen 0. .setlocale ._doprnt 7 ._flsbuf 11 ._xflsbuf ._wrtchk ._findbuf ._xwrite .free 2 .free_y .write .exit .memchr 19 .atoi .__nl_langinfo_std 4 .gettimeofday 8 .printf

18 gprof
Compile your program with the -pg option: gcc -pg <program>.c -o <program>
Run the program.
A profile file called gmon.out is created.
Run: gprof <program> gmon.out

19 Sample Output from gprof
granularity: Each sample hit covers 4 bytes. Time: 1.17 seconds
                                  called/total      parents
index  %time  self  descendents   called+self   name      index
                                  called/total      children
         /1     .__start [2]
[1]             main [1]
         /1024  .fft [3]
         /256   .cos [6]
         /256   .sin [7]
         /8     .gettimeofday [11]
         /7     .printf [16]
         /1     .atoi [31]
         /1     .exit [33]

20 xprofiler
X Windows profiler based on gprof.
Compile and run the program as you would with gprof.
Run: xprofiler <program> gmon.out
Provides a graphical representation of the program execution.

21 Library View

22 Function View

23 mpiP
Compile an MPI program with -g: mpcc -g <program>.c -o <program> -L/usr/local/tools/mpiP/lib -lmpiP -lbfd
Run the MPI program as usual.
A file called <program>.N.XXXXX.mpiP is created, where N is the number of processors and XXXXX is the process ID of the collector task.

24 Sample output from mpiP
@ Command :
Version :
Build date : Mar ,
Start time :
Stop time :
Number of tasks :
Collector Rank :
Collector PID :
Event Buffer Size :
Final Trace Dir :
Local Trace Dir :
Task Map : 0 blue333.pacific.llnl.gov
Task Map : 1 blue334.pacific.llnl.gov
Task Map : 2 blue335.pacific.llnl.gov
Task Map : 3 blue336.pacific.llnl.gov 0

25 Sample output from mpiP
MPI Time (seconds)
Task  AppTime  MPITime  MPI%
*

26 Sample output from mpiP
Callsites:
ID MPICall ParentFunction Filename Line PC
1 Barrier copyglob copyglob.f b9c
2 Barrier copypriv.f cd4
3 Barrier copypriv.f c
4 Barrier copypriv.f
Barrier copypriv.f b04
6 Barrier sphot sphot.f f2c
7 Bcast rdopac rdopac.f
Comm_rank copyglob copyglob.f a8
9 Comm_rank copypriv copypriv.f c38
10 Comm_rank genxsec genxsec.f c
11 Comm_rank rdinput rdinput.f d4
…

27 Sample output from mpiP
Aggregate Time (top twenty, descending, milliseconds)
Call Site Time App% MPI%
Bcast e
Barrier e
Barrier
Waitall
Reduce
Barrier
Barrier
Barrier
Comm_rank
Barrier
Comm_rank
…

28 Sample output from mpiP
Callsite statistics (all, milliseconds):
Name Site Rank Count Max Mean Min App% MPI%
Barrier Barrier Barrier e e e Barrier e e e Barrier 1 * e e Barrier Barrier Barrier Barrier Barrier 2 * Send Send Send Send 31 *

29 Questions

