Profiling Tools on the NERSC Crays and IBM/SP
NERSC User Services
National Energy Research Scientific Computing Center
Outline
Profiling tools on NERSC platforms:
– Cray PVP (killeen, seymour)
– Cray T3E (mcurie)
– IBM/SP (gseaborg)
UNIX profiling/performance analysis tools
References
Why Profile?
Characterise the application:
– Is the code CPU bound?
– Is the code I/O bound?
– Is the code memory bound?
– Analyse communication patterns (distributed-memory codes)
Focus the optimisation effort... and ultimately...
Improve performance and resource utilisation.
Cray PVP/T3E - Application Characterization
Job accounting (ja):
  ja
  ./a.out
  ja -st -n a.out
(see the next slide for sample output)
Look out for:
– Maximum memory used > available memory
– Total I/O wait time (locked + unlocked) > 50% of user CPU time
– Multitasking breakdown for parallel codes
Job accounting: summary report

Elapsed Time               : 8 Seconds
User CPU Time              : 35.5939 Seconds

Multitasking/Multistreaming Breakdown
(Concurrent CPUs * Connect seconds = CPU seconds)
          1 * 0.0100 =  0.0100
          2 * 0.0100 =  0.0200
          3 * 0.0600 =  0.1800
          4 * 8.8500 = 35.4000
  (Avg.) 3.99 * (total) 8.9300 = (total) 35.6100

System CPU Time            : 0.1226 Seconds
I/O Wait Time (Locked)     : 0.0000
I/O Wait Time (Unlocked)   : 0.0000
CPU Time Memory Integral   : 5.3854 Mword-seconds
Data Transferred           : 0.0001 MWords
Maximum memory used        : 0.4746 MWords
HPM - Hardware Performance Monitor
Helps locate CPU-related code bottlenecks; reports use of vector registers, instruction buffers, and memory ports.
  hpm {options} ./a.out {prog_arguments}
  -g2 -> memory access information
  -g3 -> vector register information
Look for: the ratio of floating ops per CPU second to CPU memory references per second should reflect the floating-point intensity of the code.
Sample hpm output (hpm -g0 ./a.out):

Million inst/sec (MIPS) :  7.67     Instructions      : 274017290
Avg. clock periods/inst : 26.06
% CP holding issue      : 94.02     CP holding issue  : 6714667737
Inst.buffer fetches/sec :  0.04M    Inst.buf. fetches : 1420802
Floating adds/sec       : 15.40M    F.P. adds         : 550002417
Floating multiplies/sec : 24.36M    F.P. multiplies   : 870004996
Floating reciprocal/sec :  0.28M    F.P. reciprocals  : 10000042
Cache hits/sec          :  0.00M    Cache hits        : 45893
CPU mem. references/sec : 34.64M    CPU references    : 1236978495

Floating ops/CPU second : 40.5M
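As a worked reading of this report: floating adds + multiplies + reciprocals give roughly 15.40M + 24.36M + 0.28M ≈ 40M floating ops per second (reported as 40.5M), against 34.64M memory references per second, i.e. about 1.2 floating-point operations per memory reference. A ratio this close to one means the run performs roughly as much memory traffic as arithmetic.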
Cray PVP - CPU Bound Codes: prof/profview
Instruments the code to report the % of CPU time in function calls:
  f90 -lprof prog.f90
  ./a.out                -> generates prof.data
  prof -st ./a.out > prof.report
The chart on the next slide shows the relative distribution of CPU execution time by function call:
  prof -x a.out > pgm.prof
  profview pgm.prof
N ATIONAL E NERGY R ESEARCH S CIENTIFIC C OMPUTING C ENTER 9 Profview - Sample Output
I/O and Memory Bound Codes: procstat/procview
  procstat -m -i -R a.raw a.out
  procview a.raw
– I/O analysis: Reports -> Files -> All User Files (Long Report) -> Bytes Processed or I/O Wait Time
– Memory analysis: Reports -> Processes -> Maximum Memory Used (Long Format)
I/O Bound Codes: procview
procview indicates which files consume the most real time for I/O processing.
Memory Bound Codes: procview
– A "high" time to complete memory requests (> 10% of elapsed time) may indicate a memory-bound code.
– Use the Graphs option to plot memory use over the elapsed time of the application.
ATExpert - Autotasking Prediction
Analysis of the source code to predict autotasking performance on a dedicated Cray PVP:
  f90 -eX -O3 -r4 -o {prog_name} prog.f90
  ./a.out
  atexpert               -> shows predicted speed-up
ATExpert sample output: indicates a predicted speed-up of 4.3 on a dedicated 8-processor PVP when the source code is autotasked.
Also available on Cray PVP
– flowtrace/flowview: times subroutines and functions during program execution, using operating-system timers.
– jumptrace/jumpview: provides exact timings of functions/subroutines by analysing the machine instructions in the program.
– perftrace/perfview: times subroutines/functions based on statistics gathered by the HPM tool.
Cray T3E - Apprentice
Locates performance problems and inefficiencies: MPI and shared-memory performance, load balance and communication, memory use.
Provides hardware performance information and tuning recommendations (Displays -> Observations).
Compile/link:
  f90 -o {prog} -eA {prog_name.f90} -lapp
  cc -o {prog} -happrentice {prog_name.c} -lapp
Run the code to generate app.rif (a minimal session sketch follows).
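Putting the pieces together, a minimal session might look like the following (the program name and PE count are placeholders):
  f90 -o myprog -eA myprog.f90 -lapp
  mpprun -n 16 ./myprog        (generates app.rif)
  apprentice app.rif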
Output from: apprentice app.rif
Cray T3E - PAT
Generates a profile of CPU time in functions, load balance across PEs, and hardware counter information.
Compile and link with the PAT library:
  f90 -o {exe} -lpat {source.f} pat.cld
Run the program as normal:
  mpprun -n {procs} {exe}      -> generates {exe}.pif
  pat {exe} {exe}.pif
Profile based on relative CPU time in function calls.
Load balance histogram for routine "COLL".
Cray T3E - ACTS/TAU
Performance analysis of distributed/shared-memory applications (C++ in particular):
– module load tau
– instrument programs with TAU macros
– add $(TAU_DEFS), $(TAULIBS) to the compile/link
– run the application; view the tracefile with pprof or VAMPIR
References:
  http://acts.nersc.gov/tau
  http://hpcf.nersc.gov/training/classes/Teleconf/1999july/Wu
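A plausible end-to-end session, assuming the makefile already splices in $(TAU_DEFS) and $(TAULIBS), and with myprog as a placeholder name:
  module load tau
  make myprog
  mpprun -n 8 ./myprog
  pprof                        (reads the profile files written to the current directory)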
Cray T3E - Vampir
Analysis of message-passing characteristics: generates a display of MPI activity over the instrumented time period (e.g. sender, receiver, message size, elapsed time).
  module load VAMPIR
  module load vampirtrace
There is a facility to instrument codes with VAMPIRtrace calls; generate the trace file using TAU or VAMPIRtrace.
Reference: http://hpcf.nersc.gov/software/tools/vampir.html
IBM/SP - Xprofiler
Graphical interface to gprof profiles of parallel applications:
– compile and link the code with "-g -pg"
– poe ./a.out -procs {n}
  generates a gmon.out.{n} file for each process; may introduce significant overhead (up to a factor of 2)
– (in $TMPDIR) xprofiler ./a.out gmon.out.*
The Report menu provides a (gprof) text profile; source-statement profiling is also shown.
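A minimal session might look like this (mpxlf90 and the process count are placeholder choices; any MPI compile with -g -pg works):
  mpxlf90 -g -pg -o myprog myprog.f
  poe ./myprog -procs 4        (writes gmon.out.0 ... gmon.out.3)
  xprofiler ./myprog gmon.out.*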
[Screenshot: Xprofiler graphical output]
A statement-level profile is available by clicking on the relevant function in the graphical output; use the Show Source Code option.
IBM/SP - Visualization Tool (VT)
Message-passing trace visualization; realtime system activity monitor (limited); MPI load-balance overview:
  poe ./a.out -procs {n} -tlevel=3
  copy a.out.trc to $TMPDIR
  (in $TMPDIR) invoke vt
In trace-visualization mode, "Play" a.out.trc.
See the next slide for a sample of interprocessor communication during program execution.
[Screenshot: interprocessor communication during program execution]
IBM/SP: system_stats (IBM internal tool)
  module load sptools
– instrument the code with a system_stats() call
– link with $(SPTOOLS); run the code as normal
Sample output:

Summary of the utilization of system resources:
node  hostname  wall(s)  user(s)  sys(s)  size(KB)  pswitches
  0   gs01015    16.80    13.18    0.04     2748       2138
  1   gs01015    16.80    16.07    0.04     2744       1868
  2   gs01003    16.80    16.62    0.04     2740       1870
  3   gs01003    16.80    16.56    0.03     2732       1841
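A minimal Fortran sketch of the instrumentation, assuming system_stats is a no-argument subroutine called at the point where the resource summary should be taken (this is an IBM-internal tool, so the actual interface may differ):

  program work
  implicit none
  real :: a(1000000)
  call random_number(a)        ! stand-in workload
  print *, sum(a)
  call system_stats()          ! assumed no-argument interface; resolved by linking with $(SPTOOLS)
  end program work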
IBM/SP - trace-mpi (IBM internal tool)
Quantitative information on MPI calls:
– module load USG ; module load trace-mpi
– Fortran: add $(TRACE_MPIF) to the build
– C: add $(TRACE_MPI) to the build
– poe ./a.out -procs {n} generates an mpi.trace_file for each process (the executable must call MPI_Finalize)
– summary mpi.trace_file.{n} (see the next slide)
Useful check for load balance (see the session sketch below):
  grep "Total Communication" mpi.trace_file.*
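A plausible session, assuming the module exports TRACE_MPIF as an environment variable (it may instead be a make macro) and that mpxlf90/myprog are placeholders:
  module load USG ; module load trace-mpi
  mpxlf90 -o myprog myprog.f $TRACE_MPIF
  poe ./myprog -procs 4
  grep "Total Communication" mpi.trace_file.*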
MPI message-passing summary for mpi.trace_file.3

MPI Function     #calls   Avg Bytes   Time (sec)
-------------------------------------------------
MPI_Allreduce:     9355         8.0        3.596
MPI_Barrier:          3         0.0        0.017
MPI_Bcast:           66         5.8        0.013
MPI_Scatter:         31      1008.0        0.088
MPI_Comm_rank:        1         0.0        0.000
MPI_Comm_size:        1         0.0        0.000
MPI_Isend:        43023      2003.7        0.893
MPI_Recv:         43023      2003.7        7.481
MPI_Wait:         43023      2003.7        3.739

Total Communication Information: WALL = 15.8277, CPU = 15.53, MBYTES = 258.72
The total amount of wall time = 26.229613
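Reading this sample: communication accounts for 15.83 of the 26.23 seconds of total wall time (roughly 60%), and MPI_Recv alone costs 7.48 seconds, so this process spends more time waiting for messages than sending them. Comparing these totals across the per-process trace files is the quick load-balance check mentioned on the previous slide.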
Upcoming on the SP
– ACTS/TAU (C/C++) is currently being ported to the IBM/SP
– VAMPIR has been ordered; awaiting delivery
– the Performance Monitor Toolkit (HPM) should be available with the Phase II system (requires AIX 4.3.4)
Also see the Performance API (PAPI) project:
  http://icl.cs.utk.edu/projects/papi
General/UNIX Profiling Tools
Command-line profilers and system analysis:
– prof/gprof (enabled for MPI on the IBM/SP)
– csh time command: time ./a.out
– vmstat: look for high paging over an extended time period (the application may require more memory)
Fortran/C function timers (see the timing sketch below):
– getrusage
– rtc, irtc
– etime, dtime, mclock
– MPI_Wtime
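A minimal sketch of wall-clock timing with MPI_Wtime in Fortran; the program name and the loop standing in for real work are placeholders:

  program timing_demo
  implicit none
  include 'mpif.h'
  integer :: ierr, i
  double precision :: t0, t1, s
  call MPI_Init(ierr)
  s = 0.0d0
  t0 = MPI_Wtime()             ! wall-clock seconds since an arbitrary origin
  do i = 1, 10000000
     s = s + dble(i)           ! stand-in workload
  end do
  t1 = MPI_Wtime()
  print *, 'elapsed seconds =', t1 - t0, '  sum =', s
  call MPI_Finalize(ierr)
  end program timing_demo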
Reference Material
NERSC web pages:
  http://hpcf.nersc.gov/software/tools
Cray PVP / Cray T3E (http://www.cray.com/swpubs):
– Optimizing Code on Cray PVP Systems
– Cray T3E C and Fortran Optimization Guides
IBM/SP:
– LLNL Workshop on Performance Tools