TAU: Tuning and Analysis Utilities

TAU Performance System Framework
- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable, configurable performance profiling/tracing facility
  - Open software approach
- University of Oregon, LANL, FZJ Germany

TAU Performance System Architecture

TAU Instrumentation
- Flexible instrumentation mechanisms at multiple levels
  - Source code
    - manual
    - automatic using Program Database Toolkit (PDT), OPARI
  - Object code
    - pre-instrumented libraries (e.g., MPI using PMPI)
    - statically linked
    - dynamically linked (e.g., virtual machine instrumentation)
    - fast breakpoints (compiler generated)
  - Executable code
    - dynamic instrumentation (pre-execution) using DynInstAPI

TAU Instrumentation (continued)
- Targets common measurement interface (TAU API)
- Object-based design and implementation
  - Macro-based, using constructor/destructor techniques
  - Program units: functions, classes, templates, blocks
  - Uniquely identify functions and templates
    - name and type signature (name registration)
    - runtime type identification for template instantiations
  - C and Fortran instrumentation variants
- Instrumentation and measurement optimization
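The "macro-based, using constructor/destructor techniques" point can be illustrated with a small self-contained sketch. This is not TAU's implementation or API: `ScopedTimer`, `TimerStats`, `registry()`, and the `SKETCH_PROFILE` macro are hypothetical names made up for this example. It only shows the general idea that an object created at the top of a routine starts a timer in its constructor and stops it in its destructor, with the routine identified by its name plus type signature.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical per-routine record; TAU's real data structures differ.
struct TimerStats {
    long calls = 0;
    double inclusive_usec = 0.0;
};

// Hypothetical registry keyed by "name type-signature", mirroring the
// idea of name registration for uniquely identifying routines.
inline std::map<std::string, TimerStats>& registry() {
    static std::map<std::string, TimerStats> r;
    return r;
}

// The constructor starts the measurement and the destructor stops it,
// so every exit path of the enclosing scope is covered automatically.
class ScopedTimer {
public:
    ScopedTimer(const std::string& name, const std::string& type)
        : stats_(registry()[name + " " + type]),
          start_(std::chrono::steady_clock::now()) {
        ++stats_.calls;
    }
    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        assert(end >= start_);  // steady_clock never goes backwards
        stats_.inclusive_usec +=
            std::chrono::duration<double, std::micro>(end - start_).count();
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    TimerStats& stats_;  // std::map references stay valid after later inserts
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical instrumentation macro: one line at the top of a routine.
#define SKETCH_PROFILE(name, type) ScopedTimer sketch_timer_((name), (type))

// Example of an instrumented routine.
int fib(int n) {
    SKETCH_PROFILE("fib", "int (int)");
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}
```

After calling `fib(10)`, `registry()["fib int (int)"].calls` holds the number of invocations. Because the routine is recursive, this simple sketch adds the elapsed time of every nested call to the same record, so its accumulated time overstates the true inclusive time; a real measurement system has to account for that.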

Multi-Level Instrumentation
- Uses multiple instrumentation interfaces
- Shares information: cooperation between interfaces
- Taps information at multiple levels
- Provides selective instrumentation at each level
- Targets a common performance model
- Presents a unified view of execution

TAU Measurement
- Performance information
  - High-resolution timer library (real-time / virtual clocks)
  - General software counter library (user-defined events)
  - Hardware performance counters
    - PAPI (Performance API) (UTK, Ptools Consortium)
    - consistent, portable API
- Organization
  - Node, context, thread levels
  - Profile groups for collective events (runtime selective)
  - Performance data mapping between software levels
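One common way to make profile groups selectable at run time is a bitmask with one bit per group. The sketch below assumes that approach purely for illustration; the slides do not say how TAU implements group selection, and `ProfileGroup`, `GroupSelector`, and the group names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical group identifiers, one bit each (names are illustrative only).
enum ProfileGroup : std::uint32_t {
    GROUP_DEFAULT = 1u << 0,
    GROUP_IO      = 1u << 1,
    GROUP_COMM    = 1u << 2
};

// Holds the set of groups whose events should currently be measured.
class GroupSelector {
public:
    // Turn measurement on for every group whose bit is set in `groups`.
    void enable(std::uint32_t groups) { mask_ |= groups; }

    // Turn measurement off for every group whose bit is set in `groups`.
    void disable(std::uint32_t groups) { mask_ &= ~groups; }

    // An instrumentation point would check this before starting a timer.
    bool is_enabled(std::uint32_t group) const { return (mask_ & group) != 0; }

private:
    std::uint32_t mask_ = GROUP_DEFAULT;  // only the default group starts enabled
};
```

With this layout, an instrumented routine tagged `GROUP_IO` costs a single mask test when its group is disabled, which is the attraction of selecting groups at run time rather than rebuilding the application.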

TAU Measurement (continued)
- Parallel profiling
  - Function-level, block-level, statement-level
  - Supports user-defined events
  - TAU parallel profile database
  - Function callstack
  - Hardware counter values (in place of time)
- Tracing
  - All profile-level events
  - Inter-process communication events
  - Timestamp synchronization
- User-configurable measurement library (user controlled)
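The role of the function callstack in profiling can be sketched as follows: inclusive time is the full span between entering and leaving a routine, and exclusive time is that span minus the time spent in the routines it called. The class below is a minimal illustration under stated assumptions, not TAU's profiler. `CallstackProfiler`, `Profile`, and the explicit `now` arguments are hypothetical; timestamps are passed in by the caller so the arithmetic is easy to follow.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-routine profile record.
struct Profile {
    long calls = 0;
    double inclusive = 0.0;  // time from entry to exit
    double exclusive = 0.0;  // inclusive time minus time spent in callees
};

class CallstackProfiler {
public:
    // Record entry into a routine at timestamp `now`.
    void enter(const std::string& name, double now) {
        stack_.push_back({name, now, 0.0});
    }

    // Record exit from the routine on top of the callstack at timestamp `now`.
    void exit(double now) {
        assert(!stack_.empty());
        const Frame f = stack_.back();
        stack_.pop_back();

        const double inclusive = now - f.start;
        Profile& p = profiles_[f.name];
        ++p.calls;
        p.inclusive += inclusive;
        p.exclusive += inclusive - f.child_time;

        // Charge this routine's inclusive time to its caller as child time.
        if (!stack_.empty()) {
            stack_.back().child_time += inclusive;
        }
    }

    const Profile& get(const std::string& name) const {
        return profiles_.at(name);
    }

private:
    struct Frame {
        std::string name;
        double start;
        double child_time;
    };
    std::vector<Frame> stack_;
    std::map<std::string, Profile> profiles_;
};
```

For example, if `main` is entered at 0, `solve` runs from 10 to 40, and `main` exits at 50, then `main` has an inclusive time of 50 and an exclusive time of 20, while `solve` has 30 for both. The same bookkeeping works if the "timestamps" are hardware counter readings instead of clock values.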

TAU Measurement System Configuration
configure [OPTIONS]
  {-c++=, -cc= }          Specify C++ and C compilers
  {-pthread, -sproc}      Use pthread or SGI sproc threads
  -openmp                 Use OpenMP threads
  -opari=                 Specify location of Opari OpenMP tool
  -papi=                  Specify location of PAPI
  -pdt=                   Specify location of PDT
  -dyninst=               Specify location of DynInst Package
  {-mpiinc=, -mpilib= }   Specify MPI library instrumentation
  -TRACE                  Generate TAU event traces
  -PROFILE                Generate TAU profiles
  -MULTIPLECOUNTERS       Use more than one hardware counter
  -CPUTIME                Use usertime+system time
  -PAPIWALLCLOCK          Use PAPI to access wallclock time
  -PAPIVIRTUAL            Use PAPI for virtual (user) time
  …

TAU Measurement Configuration – Examples
- ./configure -c++=xlC -cc=xlc -pdt=/usr/packages/pdtoolkit-2.1 -pthread
  - Use TAU with IBM's xlC compiler, PDT and the pthread library
  - Enable TAU profiling (default)
- ./configure -TRACE -PROFILE
  - Enable both TAU profiling and tracing
- ./configure -c++=guidec++ -cc=guidec -papi=/usr/local/packages/papi -openmp -mpiinc=/usr/packages/mpich/include -mpilib=/usr/packages/mpich/lib
  - Use OpenMP+MPI with KAI's Guide compiler suite, and use PAPI to access hardware performance counters for measurements
- Typically configure multiple measurement libraries

Program Database Toolkit (PDT)
- Program code analysis framework for developing source-based tools
- High-level interface to source code information
- Integrated toolkit for source code parsing, database creation, and database query
  - commercial grade front end parsers
  - portable IL analyzer, database format, and access API
  - open software approach for tool development
- Target and integrate multiple source languages
- Use in TAU to build automated performance instrumentation tools

PDT Architecture and Tools C/C++ Fortran 77/90

PDT Components
- Language front end
  - Edison Design Group (EDG): C (C99), C++
  - Mutek Solutions Ltd.: F77, F90
  - creates an intermediate-language (IL) tree
- IL Analyzer
  - processes the intermediate language (IL) tree
  - creates "program database" (PDB) formatted file
- DUCTAPE
  - C++ program Database Utilities and Conversion Tools APplication Environment
  - processes and merges PDB files
  - C++ library to access the PDB for PDT applications

Including TAU Makefile - Example

include /usr/tau/sgi64/lib/Makefile.tau-pthread-kcc
CXX = $(TAU_CXX)
CC = $(TAU_CC)
CFLAGS = $(TAU_DEFS)
LIBS = $(TAU_LIBS)
OBJS = ...
TARGET = a.out

# Link the application against the TAU measurement library
$(TARGET): $(OBJS)
	$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)

# Compile C++ sources with the TAU definitions
.cpp.o:
	$(CXX) $(CFLAGS) -c $< -o $@

TAU Makefile for PDT

include /usr/tau/include/Makefile
CXX = $(TAU_CXX)
CC = $(TAU_CC)
PDTPARSE = $(PDTDIR)/$(CONFIG_ARCH)/bin/cxxparse
TAUINSTR = $(TAUROOT)/$(CONFIG_ARCH)/bin/tau_instrumentor
CFLAGS = $(TAU_DEFS)
LIBS = $(TAU_LIBS)
OBJS = ...
TARGET = a.out

# Link the application against the TAU measurement library
$(TARGET): $(OBJS)
	$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)

# Parse each source file with PDT, instrument it, then compile the instrumented copy
.cpp.o:
	$(PDTPARSE) $<
	$(TAUINSTR) $*.pdb $< -o $*.inst.cpp
	$(CXX) $(CFLAGS) -c $*.inst.cpp -o $@

Setup: Running Applications
  % setenv PROFILEDIR /home/data/experiments/profile/01
  % setenv TRACEDIR /home/data/experiments/trace/01    (optional)
  % set path=($path / /bin)
  % setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH\: / /lib
For PAPI (1 counter):
  % setenv PAPI_EVENT PAPI_FP_INS
For PAPI (multiple counters):
  % setenv COUNTER1 PAPI_FP_INS
  % setenv COUNTER2 PAPI_L1_DCM
  % setenv COUNTER3 P_WALL_CLOCK_TIME    (PAPI's wallclock time)
  % mpirun -np
For DyninstAPI:
  % a.out
  % tau_run a.out    (instruments using default TAU library)
  % tau_run -XrunTAUsh-papi a.out    (uses libTAUsh-papi.so)

TAU Analysis
- Profile analysis
  - pprof: parallel profiler with text-based display
  - racy: graphical interface to pprof (Tcl/Tk)
  - jracy: Java implementation of Racy
- Trace analysis and visualization
  - Trace merging and clock adjustment (if necessary)
  - Trace format conversion (ALOG, SDDF, Vampir)
  - Vampir (Pallas) trace visualization

Pprof Command
pprof [-c|-b|-m|-t|-e|-i] [-r] [-s] [-n num] [-f file] [-l] [nodes]
  -c        Sort according to number of calls
  -b        Sort according to number of subroutines called
  -m        Sort according to msecs (exclusive time total)
  -t        Sort according to total msecs (inclusive time total)
  -e        Sort according to exclusive time per call
  -i        Sort according to inclusive time per call
  -v        Sort according to standard deviation (exclusive usec)
  -r        Reverse sorting order
  -s        Print only summary profile information
  -n num    Print only first number of functions
  -f file   Specify full path and filename without node ids
  -l        List all functions and exit
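The sort options above differ only in the key used to rank routines. The sketch below illustrates a few of those keys on in-memory records. It is not pprof's source code: `FunctionProfile`, `SortKey`, `key_value`, and `sorted_profile` are hypothetical names, and the default of largest-value-first is an assumption made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical profile row for one routine.
struct FunctionProfile {
    std::string name;
    long calls;
    double exclusive_msec;
    double inclusive_msec;
};

// Sort keys loosely corresponding to the -c, -m, -t, -e, and -i options.
enum class SortKey {
    Calls, ExclusiveTotal, InclusiveTotal, ExclusivePerCall, InclusivePerCall
};

// Compute the value a row is ranked by for the chosen key.
double key_value(const FunctionProfile& f, SortKey key) {
    switch (key) {
        case SortKey::Calls:          return static_cast<double>(f.calls);
        case SortKey::ExclusiveTotal: return f.exclusive_msec;
        case SortKey::InclusiveTotal: return f.inclusive_msec;
        case SortKey::ExclusivePerCall:
            return f.calls > 0 ? f.exclusive_msec / f.calls : 0.0;
        case SortKey::InclusivePerCall:
            return f.calls > 0 ? f.inclusive_msec / f.calls : 0.0;
    }
    return 0.0;
}

// Return a sorted copy of the rows. `reverse` flips the order (like -r) and
// a non-zero `limit` keeps only the first `limit` rows (like -n num).
std::vector<FunctionProfile> sorted_profile(std::vector<FunctionProfile> rows,
                                            SortKey key,
                                            bool reverse = false,
                                            std::size_t limit = 0) {
    std::stable_sort(rows.begin(), rows.end(),
        [key, reverse](const FunctionProfile& a, const FunctionProfile& b) {
            const double ka = key_value(a, key);
            const double kb = key_value(b, key);
            return reverse ? ka < kb : ka > kb;
        });
    if (limit > 0 && limit < rows.size()) {
        rows.erase(rows.begin() + static_cast<std::ptrdiff_t>(limit), rows.end());
    }
    return rows;
}
```

Ranking by exclusive time per call rather than by exclusive total can surface a routine that is called rarely but is expensive each time, which a total-time ranking may bury below frequently called cheap routines.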

Pprof Output (NAS Parallel Benchmark – LU)
- Intel Quad PIII Xeon, RedHat, PGI F90
- F90 + MPICH
- Profile for: node, context, thread
- Application events and MPI events

jRacy (NAS Parallel Benchmark – LU)
- n: node, c: context, t: thread
- Global profiles
- Individual profile
- Routine profile across all nodes

Vampir Trace Visualization Tool
- Visualization and Analysis of MPI Programs
- Originally developed by Forschungszentrum Jülich
- Current development by Technical University Dresden
- Distributed by PALLAS, Germany

Vampir (NAS Parallel Benchmark – LU)
- Timeline display
- Callgraph display
- Communications display
- Parallelism display

Applications: EVH1

Applications: VTF (ASCI ASAP Caltech)
- C++, C, F90, Python
- PDT, MPI

Applications: SAMRAI (LLNL)
- C++
- PDT, MPI
- SAMRAI timers (groups)

Applications: Uintah (U. Utah) (500 CPUs)
- TAU uses SCIRun [U. Utah] for visualization of performance data (online/offline)

Applications: Uintah (contd.)
- Scalability analysis

TAU Performance System Status
- Computing platforms
  - IBM SP, SGI Origin, ASCI Red, Cray T3E, Compaq SC, HP, Sun, Apple, Windows, IA-32, IA-64 (Linux), Hitachi, NEC
- Programming languages
  - C, C++, Fortran 77/90, HPF, Java
- Communication libraries
  - MPI, PVM, Nexus, Tulip, ACLMPL, MPIJava
- Thread libraries
  - pthread, Java, Windows, SGI sproc, Tulip, SMARTS, OpenMP
- Compilers
  - KAI (KCC, KAP/Pro), PGI, GNU, Fujitsu, HP, Sun, Microsoft, SGI, Cray, IBM, Compaq, Hitachi, NEC, Intel

Support Acknowledgement
- TAU and PDT support:
  - Department of Energy (DOE)
    - DOE 2000 ACTS contract
    - DOE MICS contract
    - DOE ASCI Level 3 (LANL, LLNL)
    - U. of Utah DOE ASCI Level 1 subcontract
  - DARPA
  - NSF National Young Investigator (NYI) award