1
Performance Monitoring Tools on TCS
Roberto Gomez and Raghu Reddy, Pittsburgh Supercomputing Center
David O'Neal, National Center for Supercomputing Applications
2
Objective
Measure single PE performance
– Operation counts, wall time, MFLOP rates
– Cache utilization ratio
Study scalability
– Time spent in MPI calls vs. computation
– Time spent in OpenMP parallel sections
3
Atom Tools
atom(1)
– Various tools
– Low overhead
– No recompiling or re-linking in some cases
4
Useful Tools
Flop2:
– Floating point operation counts
Timer5:
– Wall time (inclusive & exclusive) per routine
Calltrace:
– Detailed statistics of calls and their arguments
Developed by Dick Foster @ Compaq
5
Instrumentation
Load atom module
– module load atom
Create routines file
– nm -g a.out | awk '{if($5=="T") print $1}' > routines
Edit routines file
– place the main routine first; remove unwanted ones
Instrument the executable
– cat routines | atom -tool flop2 a.out
– cat routines | atom -tool timer5 a.out
Execute
– run a.out.flop2 and a.out.timer5 to create the fprof.* and tprof.* files
The full sequence is collected in the sketch below.
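A minimal end-to-end sketch of the steps above, assuming an executable named a.out; the routines file is normally edited by hand between the nm step and the atom steps:

    module load atom
    # List global text (T) symbols; these become the candidate routines
    nm -g a.out | awk '{if ($5 == "T") print $1}' > routines
    # (edit routines here: main routine first, unwanted entries removed)
    # Build the instrumented executables a.out.flop2 and a.out.timer5
    cat routines | atom -tool flop2 a.out
    cat routines | atom -tool timer5 a.out
    # Run them to produce the fprof.* and tprof.* profile files
    ./a.out.flop2
    ./a.out.timer5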
6
Single PE Performance Analysis
Sample Timer5 output file:

Procedure                 Calls     Self Time   Total Time
=========                 =====     =========   ==========
$null_evol$null_j_         3072      60596709     79880903
$null_eth$null_d1_        72458      45499161     45499161
$null_hyper_u$null_u_      3328      39889655     44500045
$null_hyper_w$null_w_      3328      19195271     33769541
...
Total                   1961226     248258934    248258934
7
Single PE Performance Analysis
Sample Flop2 output file:

Procedure                 Calls            Fops
=========                 =====            ====
$null_evol$null_j_         3072     20406036288
$null_eth$null_d1_        72458     20220926518
$null_hyper_u$null_u_      3328     14062774258
$null_hyper_w$null_w_      3328      3823795456
...
Total                   1936818     70876179927

Obtain MFLOPS = Fops / (Self Time)
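Combining the two profiles gives per-routine MFLOP rates. A sketch, assuming Self Time is reported in microseconds (consistent with the ~248-second total run above, which makes Fops/SelfTime come out directly in MFLOPS) and that fprof.out and tprof.out are whitespace-separated in the column orders shown; the file names and column positions are assumptions to check against the actual output:

    # Pass 1 reads fprof.out (name, calls, fops); pass 2 reads tprof.out
    # (name, calls, self, total) and prints MFLOPS = fops / self_time_us
    awk 'NR==FNR {fops[$1]=$3; next} ($1 in fops) && $3>0 {printf "%-24s %10.1f\n", $1, fops[$1]/$3}' fprof.out tprof.out

For the leading routine above this gives 20406036288 / 60596709, about 337 MFLOPS.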
8
MPI calltrace
module load atom
cat $ATOMPATH/mpicalls | atom -tool calltrace a.out
Execute a.out.calltrace to generate one trace file per PE
Gather timings for desired MPI routines (see the sketch below)
Repeat for increasing numbers of processors
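A sketch of the gathering step, assuming each PE writes its own trace file (here named ctrace.*) containing one statistics line per MPI routine with a cumulative time in the last field; both the file naming and the line layout are assumptions to adjust to the actual calltrace output:

    # Sum MPI_BARRIER time across all per-PE trace files
    grep MPI_BARRIER ctrace.* | awk '{ sum += $NF } END { print sum }'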
9
Sample calltrace statistics:

Number of processors      8 PEs     128 PEs    256 PEs
Processor grid            2x2x2     8x4x4      8x8x4
Total run time          277.028    314.857    422.170
MPI_ISEND                 1.250      1.498      2.265
MPI_RECV                  4.349     19.779     26.537
MPI_WAIT                  9.172     16.311     20.150
MPI_ALLTOALL              5.072      9.433     12.894
MPI_REDUCE                0.013      0.162      0.002
MPI_ALLREDUCE             0.391      2.073     10.313
MPI_BCAST                 0.061      1.135      1.382
MPI_BARRIER              14.959     28.694     62.028
____________________________________________________
Total MPI time           35.267     79.085    135.571

MPI time grows from about 13% of the total run time at 8 PEs to about 32% at 256 PEs, with MPI_BARRIER the largest single contributor.
10
calltrace timings graph
11
DCPI
Digital Continuous Profiling Infrastructure
– daemon and profiling utilities
Very low overhead (1-2%)
Aggregate or per-process data and analysis
No code modifications
Requires interactive access to compute nodes
12
DCPI Example
Driver script
– creates map file and host list
– calls daemon and profiling scripts
Daemon startup script
– starts daemon with selected options
Daemon shutdown script
– halts daemon
Profiling script
– executes post-processing utility with selected options
13
DCPI Driver Script
PBS job file
– dcpi.pbs
Creates map file and host list
– Image map generated by dcpiscan(1)
– Host list used by dsh(1) commands
Executes daemon and profiling scripts
– Start daemon, run test executable, stop daemon, post-process (sketched below)
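A skeletal driver along these lines, assuming dsh(1) runs the helper scripts on every compute node and prun launches the test executable; the PBS directives, the dcpiscan invocation, the host-list construction, and the dsh option for a host-list file are all assumptions to check against the local setup and man pages:

    #!/bin/csh
    #PBS -j oe
    set WORK = $PBS_O_WORKDIR
    set EXE  = $WORK/a.out
    set MAP  = $WORK/a.out.map
    # Build the image map and the list of hosts in this job
    dcpiscan $EXE > $MAP
    # ... write the allocated node names to $WORK/hostlist for dsh(1) ...
    dsh -f $WORK/hostlist $WORK/dcpi_start.csh $MAP $WORK $EXE
    prun $EXE
    dsh -f $WORK/hostlist $WORK/dcpi_stop.csh
    dsh -f $WORK/hostlist $WORK/dcpi_post.csh $MAP $WORK $EXE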
14
DCPI Startup Script
C shell script
– dcpi_start.csh
Three arguments defined by driver job
– MAP, WORK, EXE
Creates database directory (DCPIDB)
– Derived from WORK + hostname
Starts dcpid(1) process
– Events of interest are specified here (see the sketch below)
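A sketch of a startup script in this shape; how dcpid(1) takes the database directory and event selections varies, so the final line is an assumption to verify in the man page:

    #!/bin/csh
    # Arguments supplied by the driver job
    set MAP  = $1
    set WORK = $2
    set EXE  = $3
    # Per-host database directory: WORK + hostname
    set DB = $WORK/`hostname`
    mkdir -p $DB
    setenv DCPIDB $DB
    # Start the daemon in the background (it must not block the driver);
    # events of interest are specified here -- see dcpid(1) for syntax
    dcpid $DB &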
15
DCPI Stop Script
C shell script
– dcpi_stop.csh
No arguments
dcpiquit(1) flushes buffers and halts the daemon process
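The shutdown script reduces to a single call; a minimal sketch, assuming dcpiquit(1) locates the daemon via the DCPIDB environment set at startup:

    #!/bin/csh
    # Flush sample buffers and halt the dcpid daemon on this host
    dcpiquit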
16
DCPI Profiling Script
C shell script
– dcpi_post.csh
Three arguments defined by driver job
– MAP, WORK, EXE
Determines database location (as before)
Uses dcpiprof(1) to post-process database files
– Profile selection(s) must be consistent with daemon startup options
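A post-processing sketch; the dcpiprof(1) arguments shown are assumptions, and any event selections must match what dcpid was started with:

    #!/bin/csh
    set MAP  = $1
    set WORK = $2
    set EXE  = $3
    # Same per-host database directory the startup script created
    setenv DCPIDB $WORK/`hostname`
    # Basic profile of the test executable (add event selections that
    # are consistent with the dcpid startup options)
    dcpiprof $EXE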
17
DCPI Example Output
Profiler writes to stdout by default
– dcpi.output
Single node output in four sections
– Start daemon, run test, halt daemon
– Basic dcpiprof output
– Memory operations (MOPS)
– Floating point operations (FOPS)
See the profiling script for details
18
Other DCPI Options
Per-process output files
– See the dcpid(1) -bypid option
Trim output
– See the dcpiprof(1) -keep option
– The host list can also be cropped
ProfileMe events for EV67 and later
– Focus on -pm events
– See dcpiprofileme(1) options
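Illustrative invocations of the options named above; the argument forms are assumptions to verify against the man pages:

    # Per-process output files (see dcpid(1))
    dcpid -bypid $DCPIDB &
    # Trim dcpiprof output (see dcpiprof(1) for the -keep argument form)
    dcpiprof -keep 20 $EXE
    # ProfileMe analysis on EV67 and later (event names from dcpiprofileme(1))
    dcpiprofileme -pm <event> $EXE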
19
Common DCPI Problems
Login denied (dsh)
– Requires permission to log in on compute nodes
Daemon not started in the background
NFS is flaky for larger node counts (100+)
Filemode of the DCPIDB directory must be set correctly
Mismatch between startup configuration and profiling specifications
– See dcpid(1), dcpiprof(1), and dcpiprofileme(1)
20
Summary
Low-level interfaces provide access to hardware counters
Very effective, but experience is required
Minimal overhead costs
Report timings, flop counts, and MFLOP rates for user code and library calls, e.g. MPI
More information is available, e.g. message sizes, time variability