Published by Evangeline Ferguson. Modified over 9 years ago.
1
Simplifying the Usage of Performance Evaluation Tools: Experiences with TAU and DyninstAPI
Paradyn/Condor Week 2010, Rm 221, Fluno Center, U. of Wisconsin, Madison
10:45 am - 11:30 am, Tuesday, 14th April, 2010
Sameer Shende, Allen D. Malony, Alan Morris
Performance Research Laboratory, University of Oregon, Eugene, OR
{sameer, malony, amorris}@cs.uoregon.edu
http://tau.uoregon.edu
2
Acknowledgements: University of Oregon
- Dr. Allen D. Malony, Professor, CIS Dept, and Director, NeuroInformatics Center
- Alan Morris, Senior software engineer
- Dr. Chee Wai Lee, Research faculty
- Wyatt Spear, Software engineer
- Scott Biersdorff, Software engineer
- Dr. Robert Yelle, Research faculty
- Suzanne Millstein, Ph.D. student
And Matt Legendre and Dan McNulty, University of Wisconsin at Madison
3
Motivation
- We have made great advances in instrumentation, measurement and analysis techniques
- Tools are rich in features and have a complex tool dependency
- Tools are getting more complex to use and to install
- We need to simplify the usage of our performance evaluation tools!
4
TAU Performance System ®
- Integrated toolkit for performance problem solving
  - Instrumentation, measurement, analysis, visualization
  - Portable performance profiling and tracing facility
  - Performance data management and data mining
- Based on direct performance measurement approach
- Open source
- Available on all HPC platforms
- Partners
  - LLNL, ANL, ORNL, LANL, PNNL, LBL
  - Research Centre Jülich, TU Dresden
[Figure: TAU Architecture]
5
TAU Parallel Performance System Goals
- Portable (open source) parallel performance system
  - Computer system architectures and operating systems
  - Different programming languages and compilers
- Multi-level, multi-language performance instrumentation
- Flexible and configurable performance measurement
- Support for multiple parallel programming paradigms
  - Multi-threading, message passing, mixed-mode, hybrid, object oriented (generic), component-based
- Support for performance mapping
- Integration of leading performance technology
- Scalable (very large) parallel performance analysis
6
TAU Performance System Components
[Figure: TAU Architecture. Labels: Program Analysis (PDT); Parallel Profile Analysis (PerfDMF, ParaProf); Performance Data Mining (PerfExplorer); Performance Monitoring (TAUoverMRNet (ToM))]
7
TAU Performance System Architecture
8
TAU Performance System Architecture
9
Parallel Profile Visualization: ParaProf
10
Scalable Visualization: ParaProf (128k cores)
11
Scatter Plot: ParaProf (128k cores)
12
ParaProf: Communication Matrix Display
13
Comparing Effects of Multi-Core Processors
- AORSA2D magnetized plasma simulation
- Automatic loop level instrumentation
- Blue is single node
- Red is dual core
- Cray XT3 (4K cores)
14
ParaProf: Mflops Sorted by Exclusive Time
[Figure annotation: low mflops?]
15
Performance Regression Testing
16
Usage Scenarios: Evaluate Scalability
17
Scaling NAMD with CUDA (Jumpshot with TAU)
[Figure annotation: Data transfer]
18
Measuring Performance of PGI Accelerated Code
19
TAU and Eclipse
- Provide an interface for configuring TAU’s automatic instrumentation within Eclipse’s build system
- Manage runtime configuration settings and environment variables for execution of TAU instrumented programs
[Figure labels: C/C++/Fortran Project in Eclipse; Add or modify an Eclipse build configuration w/ TAU; Temporary copy of instrumented code; Compilation/linking with TAU libraries; TAU instrumented libraries; Program execution; Performance data; Program output]
20
TAU and Eclipse
[Figure label: PerfDMF]
21
Choosing PAPI Counters with TAU in Eclipse
22
TAU Performance System Architecture
23
TAU Instrumentation Approach
- Support for standard program events
  - Routines, classes and templates
  - Statement-level blocks
  - Begin/End events (Interval events)
- Support for user-defined events
  - Begin/End events specified by user
  - Atomic events (e.g., size of memory allocated/freed)
  - Selection of event statistics
- Support definition of “semantic” entities for mapping
- Support for event groups (aggregation, selection)
- Instrumentation optimization
  - Eliminate instrumentation in lightweight routines
24
TAU Instrumentation Mechanisms
- Source code
  - Manual (TAU API, TAU component API)
  - Automatic (robust)
    - C, C++, F77/90/95 (Program Database Toolkit (PDT))
    - OpenMP (directive rewriting (Opari), POMP2 spec)
- Object code
  - Compiler-based instrumentation (-optCompInst)
  - Pre-instrumented libraries (e.g., MPI using PMPI)
  - Statically-linked and dynamically-linked (tau_wrap)
- Executable code
  - Binary re-writing and dynamic instrumentation (DyninstAPI, U. Wisconsin, U. Maryland)
  - Virtual machine instrumentation (e.g., Java using JVMPI)
  - Interpreter based instrumentation (Python)
  - Kernel based instrumentation (KTAU)
25
Program Database Toolkit (PDT)
[Figure: PDT architecture. Labels: Application / Library; C / C++ parser; Fortran parser (F77/90/95); IL; C / C++ IL analyzer; Fortran IL analyzer; Program Database Files; DUCTAPE; PDBhtml (program documentation); SILOON (application component glue); CHASM (C++ / F90/95 interoperability); TAU_instr (automatic source instrumentation)]
26
Automatic Source-Level Instrumentation in TAU
- TAU v2.19.1+: If source-based instrumentation fails, compiler-based instrumentation is used automatically
27
Using TAU with Source Code Instrumentation
- TAU supports several measurement options (profiling, tracing, profiling with hardware counters, etc.)
- Each measurement configuration of TAU corresponds to a unique stub makefile that is generated when you configure it
- To instrument source code using PDT
  - Choose an appropriate TAU stub makefile in /lib:
      % export TAU_MAKEFILE=/usr/local/packages/tau/x86_64/lib/Makefile.tau-mpi-pdt
      % export TAU_OPTIONS='-optVerbose …' (see tau_compiler.sh -help)
  - And use tau_f90.sh, tau_cxx.sh or tau_cc.sh as Fortran, C++ or C compilers:
      % mpif90 foo.f90
    changes to
      % tau_f90.sh foo.f90
- Execute application and analyze performance data:
    % pprof (for text based profile display)
    % paraprof (for GUI)
28
TAU Measurement Configuration - Examples
    % cd /usr/local/packages/tau/x86_64/lib; ls Makefile.*
    Makefile.tau-pdt
    Makefile.tau-mpi-pdt
    Makefile.tau-papi-mpi-pdt
    Makefile.tau-pthread-pdt
    Makefile.tau-pthread-mpi-pdt
    Makefile.tau-openmp-opari-pdt
    Makefile.tau-openmp-opari-mpi-pdt
    Makefile.tau-papi-openmp-opari-mpi-pdt
    …
- For an MPI+F90 application, you may want to start with:
    Makefile.tau-mpi-pdt
  - Supports MPI instrumentation & PDT for automatic source instrumentation
    % setenv TAU_MAKEFILE /usr/local/packages/tau/x86_64/lib/Makefile.tau-mpi-pdt
    % tau_f90.sh application.f90; mpirun -np 256 ./a.out
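Because each measurement configuration maps to one stub makefile, a mistyped configuration name only shows up later as a confusing build failure. The sketch below guards against that by exporting TAU_MAKEFILE only when the requested file exists. `select_tau_makefile` is a hypothetical helper written for this illustration, not part of TAU, and the directory and configuration names depend on how TAU was configured at your site.

```shell
#!/bin/sh
# Hypothetical helper (not part of TAU): export TAU_MAKEFILE only if the
# requested stub makefile exists in the given TAU lib directory.
select_tau_makefile() {
    tau_lib_dir=$1   # e.g. /usr/local/packages/tau/x86_64/lib
    config=$2        # e.g. mpi-pdt or papi-mpi-pdt
    candidate="$tau_lib_dir/Makefile.tau-$config"
    if [ -f "$candidate" ]; then
        TAU_MAKEFILE=$candidate
        export TAU_MAKEFILE
        return 0
    fi
    echo "no stub makefile for configuration '$config' in $tau_lib_dir" >&2
    return 1
}
```

A possible use, assuming the installation path shown on the slide: `select_tau_makefile /usr/local/packages/tau/x86_64/lib mpi-pdt && tau_f90.sh application.f90`.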
29
Compile-Time Environment Variables
Optional parameters for TAU_OPTIONS: [tau_compiler.sh -help]
  -optVerbose             Turn on verbose debugging messages
  -optCompInst            Use compiler based instrumentation
  -optNoCompInst          Do not revert to compiler instrumentation if source instrumentation fails
  -optDetectMemoryLeaks   Turn on debugging memory allocations/de-allocations to track leaks
  -optKeepFiles           Does not remove intermediate .pdb and .inst.* files
  -optPreProcess          Preprocess Fortran sources before instrumentation
  -optTauSelectFile=""    Specify selective instrumentation file for tau_instrumentor
  -optLinking=""          Options passed to the linker. Typically $(TAU_MPI_FLIBS) $(TAU_LIBS) $(TAU_CXXLIBS)
  -optCompile=""          Options passed to the compiler. Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
  -optPdtF95Opts=""       Add options for Fortran parser in PDT (f95parse/gfparse)
  -optPdtF95Reset=""      Reset options for Fortran parser in PDT (f95parse/gfparse)
  -optPdtCOpts=""         Options for C parser in PDT (cparse). Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
  -optPdtCxxOpts=""       Options for C++ parser in PDT (cxxparse). Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
  ...
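Several of these options are usually combined into a single TAU_OPTIONS string before invoking the TAU compiler scripts. The following is a minimal sketch using only flags from the list above; the file name `select.tau` is a placeholder chosen for this example, not something the slides specify.

```shell
#!/bin/sh
# Combine several compile-time options from the list into TAU_OPTIONS.
# select.tau is a hypothetical selective instrumentation file name.
TAU_OPTIONS='-optVerbose -optKeepFiles'
TAU_OPTIONS="$TAU_OPTIONS -optTauSelectFile=select.tau"
export TAU_OPTIONS

# Show what the TAU compiler scripts would see in their environment.
echo "TAU_OPTIONS=$TAU_OPTIONS"
```

Building the string in steps keeps each flag on its own line, which makes it easier to comment one out while debugging an instrumentation failure.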
30
Runtime Environment Variables in TAU
Environment variable (default): description
- TAU_TRACE (0): Setting to 1 turns on tracing
- TAU_CALLPATH (0): Setting to 1 turns on callpath profiling
- TAU_TRACK_HEAP or TAU_TRACK_HEADROOM (0): Setting to 1 turns on tracking heap memory/headroom at routine entry & exit using context events (e.g., Heap at Entry: main=>foo=>bar)
- TAU_CALLPATH_DEPTH (2): Specifies depth of callpath. Setting to 0 generates no callpath or routine information, setting to 1 generates flat profile and context events have just parent information (e.g., Heap Entry: foo)
- TAU_SYNCHRONIZE_CLOCKS (1): Synchronize clocks across nodes to correct timestamps in traces
- TAU_COMM_MATRIX (0): Setting to 1 generates communication matrix display using context events
- TAU_THROTTLE (1): Setting to 0 turns off throttling. Enabled by default to remove instrumentation in lightweight routines that are called frequently
- TAU_THROTTLE_NUMCALLS (100000): Specifies the number of calls before testing for throttling
- TAU_THROTTLE_PERCALL (10): Specifies value in microseconds. Throttle a routine if it is called over 100000 times and takes less than 10 usec of inclusive time per call
- TAU_COMPENSATE (0): Setting to 1 enables runtime compensation of instrumentation overhead
- TAU_PROFILE_FORMAT (Profile): Setting to “merged” generates a single file. “snapshot” generates xml format
- TAU_METRICS (TIME): Setting to a comma separated list generates other metrics. (e.g., TIME:linuxtimers:PAPI_FP_OPS:PAPI_NATIVE_ )
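The throttling rule combines three of these variables, so it helps to see it as a single predicate. The sketch below restates the rule exactly as the slide words it: a routine is throttled when it is called more than TAU_THROTTLE_NUMCALLS times and its inclusive time per call is below TAU_THROTTLE_PERCALL microseconds. `would_throttle` is a hypothetical function written for this illustration; it is not TAU's implementation, and how TAU evaluates the condition at runtime may differ.

```shell
#!/bin/sh
# Sketch of the throttling rule as described on the slide; not TAU's code.
# Arguments: number of calls, total inclusive time in microseconds.
would_throttle() {
    calls=$1
    inclusive_usec=$2
    enabled=${TAU_THROTTLE:-1}                  # default 1 (throttling on)
    numcalls=${TAU_THROTTLE_NUMCALLS:-100000}   # default 100000 calls
    percall=${TAU_THROTTLE_PERCALL:-10}         # default 10 usec per call

    if [ "$enabled" -eq 0 ]; then
        echo keep
        return
    fi
    # "inclusive_usec / calls < percall", rewritten without integer division.
    if [ "$calls" -gt "$numcalls" ] &&
       [ "$inclusive_usec" -lt $((percall * calls)) ]; then
        echo throttle
    else
        echo keep
    fi
}

would_throttle 200000 1000000   # 5 usec per call over 200000 calls
would_throttle 200000 4000000   # 20 usec per call
would_throttle 50000 100000     # 2 usec per call, but too few calls
```

With the defaults, only the first call reports `throttle`; the other two report `keep` because one routine is too slow per call and the other is not called often enough.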
31
Simplifying Instrumentation using DyninstAPI
- TAU uses DyninstAPI to create a binary re-writer (tau_run)
- TAU’s measurement library (DSO) is loaded by tau_run
- Both runtime instrumentation and binary re-writing are supported
- Selection of files and routines based on exclude/include lists
- Simplifies tool usage greatly!
- Available on POINT LiveDVD [http://tau.uoregon.edu/point.iso]
- Usage:
    % tau_run a.out -o a.inst.out
    % mpirun -np 4 a.inst.out
    % paraprof
32
Issues
- Re-writing static executables limited to gcc, limited platforms in beta
  - Currently, we support dynamic executables (v6.1)
  - We are working on supporting both static and dynamic executables
  - We hope to support more platforms, compilers and runtime systems in the future
- Rewriting shared libraries used by the application
  - LD_PRELOAD’able wrapper libraries can be created using tau_wrap
  - Requires interface information in header file
33
Binary Rewriting in TAU using DyninstAPI
34
Wish List for tau_run
- Support for more platforms
  - Apple Mac OS X, Windows, IBM BG/P, AIX, …
- Support for more compilers
- Support for rewriting shared objects
- Support for static binary rewriting with validation for compilers other than gcc
  - XLC, PathScale, Cray CCE, Intel, PGI, …
35
Other Tools…
- Other TAU tools that use technologies from the ParaDyn/DyninstAPI group
  - TAU over MRNet (ToM) for runtime
  - Stackwalker API for accessing callstack
36
StackWalkerAPI in TAU
- Requirements overview:
  - Minimal information required (PC is enough)
  - Threaded support necessary
  - Low overhead (for high sample rates)
  - Stack unwinding from a signal handler
    - Malloc could be interrupted
    - Need to walk through signal handler frame
37
Issues encountered with StackWalkerAPI
- StackWalkerAPI:
  - Isn’t thread safe (and locking to use it can cause significant overhead)
  - Uses malloc/new (and so do dependent libraries such as libdwarf)
  - C++ (we would prefer C)
  - Issues walking certain kinds of stack frames
- Matt Legendre was able to help us out a lot though!
- Alternatives:
  - TAU is currently using stack walking constructs from HPCToolkit
38
Online Monitoring using TAU over MRNet (ToM)
- Back-End (BE)
  - TAU adapter offloads performance data
- Filters
  - reduction
  - distributed analysis
  - upstream / downstream
- Front-End (FE)
  - unpacks, interprets, stores
- Paths
  - reverse data reduction path
  - multicast control path
- Push-Pull model
  - source pushes, sink pulls
39
Conclusions
- TAU and DyninstAPI represent mature technology for performance instrumentation, measurement and analysis
- Using DyninstAPI’s binary re-writing capabilities, we have produced a tool that simplifies code instrumentation
- We hope to collaborate on other projects and include support for an enhanced stack walker API
- Questions?
40
Support Acknowledgements
- Department of Energy (DOE)
  - Office of Science
    - MICS, Argonne National Lab
  - ASC/NNSA
    - University of Utah ASC/NNSA Level 1
    - ASC/NNSA, LLNL
- Department of Defense (DoD)
- NSF SDCI
- Partners:
  - Research Centre Juelich
  - LBL, ORNL, ANL, LANL, PNNL, LLNL
  - TU Dresden
  - ParaTools, Inc.