1
Profiling S3D on Cray XT3 using TAU
Sameer Shende
tau-team@cs.uoregon.edu
2
TAU Performance System / Profiling S3D Harness
Acknowledgements
Alan Morris [UO]
Kevin Huck [UO]
Allen D. Malony [UO]
Kenneth Roche [ORNL]
Bronis R. de Supinski [LLNL]
3
TAU Parallel Performance System
http://www.cs.uoregon.edu/research/tau/
Multi-level performance instrumentation
  Multi-language automatic source instrumentation
Flexible and configurable performance measurement
Widely-ported parallel performance profiling system
  Computer system architectures and operating systems
  Different programming languages and compilers
Support for multiple parallel programming paradigms
  Multi-threading, message passing, mixed-mode, hybrid
4
TAU Performance System Architecture
[Diagram; visible label: event selection]
5
TAU Performance System Architecture
6
Program Database Toolkit (PDT)
[Diagram] Application / Library
  Parsers: C / C++ parser; Fortran parser (F77/90/95)
  IL
  Analyzers: C / C++ IL analyzer; Fortran IL analyzer
  Program Database Files
  DUCTAPE
    PDBhtml: Program documentation
    SILOON: Application component glue
    CHASM: C++ / F90/95 interoperability
    TAU_instr: Automatic source instrumentation
7
PAPI
Performance Application Programming Interface
The purpose of the PAPI project is to design, standardize and implement a portable and efficient API to access the hardware performance monitor counters found on most modern microprocessors.
Parallel Tools Consortium project
Developed by University of Tennessee, Knoxville
http://icl.cs.utk.edu/papi/
8
S3D - Building with TAU
Change name of compiler in build/make.XT3
  ftn => tau_f90.sh
  cc => tau_cc.sh
Set compile time environment variables
  setenv TAU_MAKEFILE /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi
    Choose callpath, PAPI counters, MPI profiling, PDT for source instrumentation
  setenv TAU_OPTIONS '-optTauSelectFile=select.tau -optPreProcess'
    Selective instrumentation file eliminates instrumentation in lightweight routines
    Pre-process Fortran source code using cpp before compiling
Set runtime environment variables for instrumentation control and PAPI counter selection in job submission script:
  export TAU_THROTTLE=1
  export COUNTER1=GET_TIME_OF_DAY
  export COUNTER2=PAPI_FP_INS
  export COUNTER3=PAPI_L1_DCM
  export COUNTER4=PAPI_RES_STL
  export COUNTER5=PAPI_L2_DCM
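The compile-time settings on this slide use csh syntax (setenv), while the job-script settings use sh syntax (export). For readers building from an sh-family shell, the following is a minimal sketch of the same compile-time settings. The TAU_ROOT variable and the readability check are illustrative additions, not part of the original slides; the paths and option strings are copied from the slide.

```shell
# sh/bash form of the compile-time setenv lines on this slide.
# TAU_ROOT is a hypothetical convenience variable, not something TAU reads.
TAU_ROOT=/spin/proj/perc/TOOLS/tau_latest

export TAU_MAKEFILE="$TAU_ROOT/xt3/lib/Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi"
export TAU_OPTIONS='-optTauSelectFile=select.tau -optPreProcess'

# Illustrative sanity check: warn (without aborting) if the stub makefile
# is not readable from this machine.
if [ ! -r "$TAU_MAKEFILE" ]; then
    echo "warning: TAU stub makefile not readable: $TAU_MAKEFILE" >&2
fi
```

With these variables set, the tau_f90.sh and tau_cc.sh compiler wrappers named on the slide would pick up the chosen stub makefile and options when make runs.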
9
Selective Instrumentation in TAU
% cat select.tau
BEGIN_EXCLUDE_LIST
MCADIF
GETRATES
TRANSPORT_M::MCAVIS_NEW
MCEDIF
MCACON
CKYTCP
THERMCHEM_M::MIXCP
THERMCHEM_M::MIXENTH
THERMCHEM_M::GIBBSENRG_ALL_DIMT
CKRHOY
MCEVAL4
THERMCHEM_M::HIS
THERMCHEM_M::CPS
THERMCHEM_M::ENTROPY
END_EXCLUDE_LIST
BEGIN_INSTRUMENT_SECTION
loops routine="#"
END_INSTRUMENT_SECTION
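The exclude list tends to change as profiling reveals more lightweight routines, so it can be convenient to generate the file from a list of names. The following is a minimal sketch; write_select_file is a hypothetical helper invented for illustration, and it reproduces only the two sections shown on this slide.

```shell
# write_select_file OUT ROUTINE...
# Hypothetical helper: writes a selective instrumentation file with the
# given routines excluded and loop-level instrumentation for all routines,
# mirroring the layout of the select.tau shown on the slide.
write_select_file() {
    out=$1
    shift
    {
        echo "BEGIN_EXCLUDE_LIST"
        for routine in "$@"; do
            echo "$routine"
        done
        echo "END_EXCLUDE_LIST"
        echo "BEGIN_INSTRUMENT_SECTION"
        echo 'loops routine="#"'
        echo "END_INSTRUMENT_SECTION"
    } > "$out"
}

# Example use (writes select.tau in the current directory):
#   write_select_file select.tau MCADIF GETRATES THERMCHEM_M::MIXCP
```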
10
TAU’s ParaProf Profile Browser - Manager
Derived Metrics
Flops = PAPI_FP_INS/wallclock time
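The derived metric above is a simple ratio, and the same arithmetic can be checked by hand outside ParaProf. The sketch below assumes the wallclock value is in microseconds (an assumption about how the GET_TIME_OF_DAY metric is reported), in which case floating-point instructions per microsecond equals MFLOPS. The mflops function name is hypothetical.

```shell
# mflops FP_INS USEC
# Hypothetical helper: divides a PAPI_FP_INS count by a wallclock time.
# Assumes the time is given in microseconds, so the quotient is in MFLOPS.
mflops() {
    awk -v fp="$1" -v usec="$2" 'BEGIN {
        if (usec <= 0) {            # guard against division by zero
            print "nan"
            exit 1
        }
        printf "%.2f\n", fp / usec
    }'
}

# Example: 3,000,000 FP instructions in 1,500,000 microseconds (1.5 s)
#   mflops 3000000 1500000
```

The same pattern applies to the other ratios shown later in the deck, such as FP instructions per L1 data cache miss.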
11
Main Window - 8 cpus (MPI Ranks 0-7)
Some routines execute on different sets of processors
12
Mean Profile Over 8 cpus -- Exclusive Time
13
Mean Percentage -- Exclusive Time
14
Loop Level Profile With PAPI Counter Data
15
ParaProf’s Source Browser
16
Exclusive MFLOPS
17
FP Instructions per L1 Data Cache Miss (rank 0)
18
Level 1 Data Cache Misses
19
Callpath Profiles
20
Callpath Profiles: Flops, Resource Stalls
21
Callpath Thread Relations Window
[Screenshot labels: parent, routine, children]
22
Flat Profile
23
TAU’s ParaProf Profile Browser - Manager
Different sections of code within the same routine execute on odd and even processors!
24
3D Window: Rank, Routine, Time, Instructions
25
3D Window: Variations in FP/L1 DCM ratios
26
Getting Access to TAU on Jaguar
set path=(/spin/proj/perc/TOOLS/tau_latest/x86_64/bin $path)
Choose Stub Makefiles (TAU_MAKEFILE env. var.) from /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.*
  Makefile.tau-mpi-pdt-pgi (flat profile)
  Makefile.tau-mpi-pdt-pgi-trace (event trace, for use with Vampir)
  Makefile.tau-callpath-mpi-pdt-pgi (single metric, callpath profile)
Binaries of S3D can be found in: ~sameer/scratch/S3D-BINARIES
  withtau
    papi, multiplecounters, mpi, pdt, pgi options
  without_tau
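Because the measurement mode is chosen entirely by which stub makefile TAU_MAKEFILE points to, switching modes amounts to a lookup. The sketch below maps a mode name to one of the three makefiles listed on this slide. The stub_makefile function, the TAU_LIB variable, and the mode names flat, trace, and callpath are hypothetical labels chosen for illustration; only the directory and file names come from the slide.

```shell
# Directory holding the stub makefiles, as given on the slide.
# TAU_LIB is a hypothetical variable name used only in this sketch.
TAU_LIB=/spin/proj/perc/TOOLS/tau_latest/xt3/lib

# stub_makefile MODE
# Hypothetical helper: prints the stub makefile path for a measurement mode.
stub_makefile() {
    case "$1" in
        flat)     echo "$TAU_LIB/Makefile.tau-mpi-pdt-pgi" ;;
        trace)    echo "$TAU_LIB/Makefile.tau-mpi-pdt-pgi-trace" ;;
        callpath) echo "$TAU_LIB/Makefile.tau-callpath-mpi-pdt-pgi" ;;
        *)        echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Example use in an sh-family shell:
#   export TAU_MAKEFILE="$(stub_makefile callpath)"
```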