SC’01 Tutorial, Nov. 7, 2001: TAU Performance System Framework


TAU Performance System Framework
- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable performance profiling/tracing facility
  - Open software approach

General Complex System Computation Model
- Node: physically distinct shared memory machine
  - Message passing node interconnection network
- Context: distinct virtual memory space within node
- Thread: execution threads (user/system) in context
[Figure: physical view (SMP nodes with memory, joined by an interconnection network) and model view (node, VM space / context, threads, inter-node message communication)]
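The node / context / thread hierarchy can be pictured as a three-part key attached to every measurement. The following is a minimal sketch under that assumption; `ThreadId`, `label`, `TimeByThread`, and `nodeTotal` are hypothetical names chosen for illustration and are not part of TAU.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical identifier for one thread in the node/context/thread model.
struct ThreadId {
    int node;     // physically distinct shared-memory machine
    int context;  // distinct virtual memory space within the node
    int thread;   // execution thread within the context

    // Order by node first, then context, then thread.
    bool operator<(const ThreadId& other) const {
        return std::tie(node, context, thread) <
               std::tie(other.node, other.context, other.thread);
    }
};

// Human-readable label (the "n,c,t" prefix is an illustrative choice).
inline std::string label(const ThreadId& id) {
    return "n,c,t " + std::to_string(id.node) + "," +
           std::to_string(id.context) + "," + std::to_string(id.thread);
}

// Per-thread measurements keyed by position in the hierarchy.
using TimeByThread = std::map<ThreadId, double>;

// Sum the time recorded by every context and thread on one node.
inline double nodeTotal(const TimeByThread& times, int node) {
    assert(node >= 0);
    double total = 0.0;
    for (const auto& entry : times) {
        if (entry.first.node == node) total += entry.second;
    }
    return total;
}
```

Keying measurements this way lets the same data be viewed per thread, per context, or rolled up per node without changing how it is collected.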

TAU Performance System Architecture

TAU Instrumentation
- Flexible instrumentation mechanisms at multiple levels
  - Source code
    - manual
    - automatic using Program Database Toolkit (PDT)
  - Object code
    - pre-instrumented libraries (e.g., MPI using PMPI)
    - statically linked
    - dynamically linked
    - fast breakpoints (compiler generated)
  - Executable code
    - dynamic instrumentation (pre-execution) using DynInstAPI

TAU Instrumentation (continued)
- Targets common measurement interface (TAU API)
- Object-based design and implementation
  - Macro-based, using constructor/destructor techniques
  - Program units: functions, classes, templates, blocks
  - Uniquely identify functions and templates
    - name and type signature (name registration)
    - static object creates performance entry
    - dynamic object receives static object pointer
    - runtime type identification for template instantiations
- C and Fortran instrumentation variants
- Instrumentation and measurement optimization
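The constructor/destructor technique can be illustrated without TAU itself. The sketch below assumes a simple design: a static pointer registers the function once under its name and type signature, and a scoped object created on each call receives that pointer, starts a timer in its constructor, and records the elapsed time in its destructor. `FunctionInfo`, `registry`, `registerFunction`, `ScopedProfiler`, and `SKETCH_PROFILE` are hypothetical names, not TAU's actual implementation.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical per-function performance entry.
struct FunctionInfo {
    std::string name;
    std::string type;
    long calls = 0;
    double inclusiveSeconds = 0.0;
};

// Registry keyed by name plus type signature, so overloads and template
// instantiations would get distinct entries.
inline std::map<std::string, FunctionInfo>& registry() {
    static std::map<std::string, FunctionInfo> entries;
    return entries;
}

inline FunctionInfo* registerFunction(const std::string& name,
                                      const std::string& type) {
    FunctionInfo& info = registry()[name + " " + type];
    info.name = name;
    info.type = type;
    return &info;  // std::map element addresses stay valid after later inserts
}

// Dynamic object: timing starts in the constructor and stops in the destructor.
class ScopedProfiler {
public:
    explicit ScopedProfiler(FunctionInfo* info)
        : info_(info), start_(std::chrono::steady_clock::now()) {
        assert(info_ != nullptr);
    }
    ~ScopedProfiler() {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        info_->calls += 1;
        info_->inclusiveSeconds += elapsed.count();
    }
    ScopedProfiler(const ScopedProfiler&) = delete;
    ScopedProfiler& operator=(const ScopedProfiler&) = delete;

private:
    FunctionInfo* info_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical macro: the static pointer is initialized on the first call
// only; every call creates a scoped profiler that receives it.
#define SKETCH_PROFILE(name, type)                                       \
    static FunctionInfo* sketchInfo_ = registerFunction((name), (type)); \
    ScopedProfiler sketchProfiler_(sketchInfo_)

// Example instrumented function.
inline int square(int x) {
    SKETCH_PROFILE("square", "int (int)");
    return x * x;
}
```

Because the destructor runs on every exit path, including early returns and exceptions, a single macro at the top of a function is enough to bracket the whole call.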

Program Database Toolkit (PDT)
- Program code analysis framework for developing source-based tools
- High-level interface to source code information
- Integrated toolkit for source code parsing, database creation, and database query
  - commercial grade front end parsers
  - portable IL analyzer, database format, and access API
  - open software approach for tool development
- Target and integrate multiple source languages
- Use in TAU to build automated performance instrumentation tools

PDT Architecture and Tools
[Figure: PDT architecture and tools for C/C++ and Fortran 77/90 sources]

PDT Components
- Language front end
  - Edison Design Group (EDG): C, C++, Java
  - Mutek Solutions Ltd.: F77, F90
  - creates an intermediate-language (IL) tree
- IL Analyzer
  - processes the intermediate-language (IL) tree
  - creates “program database” (PDB) formatted file
- DUCTAPE (Bernd Mohr, ZAM, Germany)
  - C++ program Database Utilities and Conversion Tools APplication Environment
  - processes and merges PDB files
  - C++ library to access the PDB for PDT applications

TAU Measurement
- Performance information
  - High-resolution timer library (real-time / virtual clocks)
  - General software counter library (user-defined events)
  - Hardware performance counters
    - PCL (Performance Counter Library) (ZAM, Germany)
    - PAPI (Performance API) (UTK, Ptools Consortium)
    - consistent, portable API
- Organization
  - Node, context, thread levels
  - Profile groups for collective events (runtime selective)
  - Performance data mapping between software levels

TAU Measurement (continued)
- Parallel profiling
  - Function-level, block-level, statement-level
  - Supports user-defined events
  - TAU parallel profile database
  - Function callstack
  - Hardware counter values (in place of time)
- Tracing
  - All profile-level events
  - Interprocess communication events
  - Timestamp synchronization
- User-configurable measurement library (user controlled)

TAU Measurement API
- Initialization and runtime configuration
    TAU_PROFILE_INIT(argc, argv);
    TAU_PROFILE_SET_NODE(myNode);
    TAU_PROFILE_SET_CONTEXT(myContext);
    TAU_PROFILE_EXIT(message);
- Function and class methods
    TAU_PROFILE(name, type, group);
- Template
    TAU_TYPE_STRING(variable, type);
    TAU_PROFILE(name, type, group);
    CT(variable);
- User-defined timing
    TAU_PROFILE_TIMER(timer, name, type, group);
    TAU_PROFILE_START(timer);
    TAU_PROFILE_STOP(timer);

TAU Measurement API (continued)
- User-defined events
    TAU_REGISTER_EVENT(variable, event_name);
    TAU_EVENT(variable, value);
    TAU_PROFILE_STMT(statement);
- Mapping
    TAU_MAPPING(statement, key);
    TAU_MAPPING_OBJECT(funcIdVar);
    TAU_MAPPING_LINK(funcIdVar, key);
    TAU_MAPPING_PROFILE(funcIdVar);
    TAU_MAPPING_PROFILE_TIMER(timer, funcIdVar);
    TAU_MAPPING_PROFILE_START(timer);
    TAU_MAPPING_PROFILE_STOP(timer);
- Reporting
    TAU_REPORT_STATISTICS();
    TAU_REPORT_THREAD_STATISTICS();
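A user-defined event differs from a timer in that the application supplies the value to record. As a rough illustration of the register-then-trigger pattern, the sketch below keeps running statistics for each triggered value. The `UserEvent` class and the particular statistics it tracks are assumptions made for this example, not a description of what the TAU macros do internally.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <string>
#include <utility>

// Hypothetical user-defined event that accumulates statistics over the
// values it is triggered with (for example, message sizes in bytes).
class UserEvent {
public:
    explicit UserEvent(std::string name) : name_(std::move(name)) {}

    // Record one occurrence of the event with an application-supplied value.
    void trigger(double value) {
        count_ += 1;
        sum_ += value;
        min_ = std::min(min_, value);
        max_ = std::max(max_, value);
    }

    const std::string& name() const { return name_; }
    long count() const { return count_; }
    double min() const { assert(count_ > 0); return min_; }
    double max() const { assert(count_ > 0); return max_; }
    double mean() const { assert(count_ > 0); return sum_ / count_; }

private:
    std::string name_;
    long count_ = 0;
    double sum_ = 0.0;
    double min_ = std::numeric_limits<double>::infinity();
    double max_ = -std::numeric_limits<double>::infinity();
};
```

In use, the event object would be created once (the registration step) and then triggered at each point of interest, which keeps the per-occurrence cost to a few arithmetic operations.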

TAU Analysis
- Profile analysis
  - Pprof: parallel profiler with text-based display
  - Racy: graphical interface to pprof (Tcl/Tk)
  - jRacy: Java implementation of Racy
- Trace analysis and visualization
  - Trace merging and clock adjustment (if necessary)
  - Trace format conversion (ALOG, SDDF, Vampir)
  - Vampir (Pallas) trace visualization
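Trace merging with clock adjustment can be sketched as follows, assuming each node's trace has a known constant clock offset relative to a reference clock. The `TraceEvent` layout, the `mergeTraces` function, and the constant-offset model are all assumptions for illustration; the slides do not describe how TAU's merging tool computes or applies its adjustment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical trace record.
struct TraceEvent {
    int node;
    double timestamp;  // seconds on the recording node's local clock
    std::string name;
};

// Shift each trace onto the reference clock by adding its offset, then
// merge all events into one timeline ordered by adjusted timestamp.
inline std::vector<TraceEvent> mergeTraces(
    const std::vector<std::vector<TraceEvent>>& perNodeTraces,
    const std::vector<double>& clockOffsets) {
    assert(perNodeTraces.size() == clockOffsets.size());

    std::vector<TraceEvent> merged;
    for (std::size_t i = 0; i < perNodeTraces.size(); ++i) {
        for (TraceEvent event : perNodeTraces[i]) {  // copy, then adjust
            event.timestamp += clockOffsets[i];
            merged.push_back(event);
        }
    }

    // A stable sort keeps each node's original order for equal timestamps.
    std::stable_sort(merged.begin(), merged.end(),
                     [](const TraceEvent& a, const TraceEvent& b) {
                         return a.timestamp < b.timestamp;
                     });
    return merged;
}
```

Without the adjustment step, a receive recorded on a node whose clock runs behind could appear in the merged timeline before the matching send.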

Pprof Command
pprof [-c|-b|-m|-t|-e|-i] [-r] [-s] [-n num] [-f file] [-l] [nodes]
  -c        Sort according to number of calls
  -b        Sort according to number of subroutines called
  -m        Sort according to msecs (exclusive time total)
  -t        Sort according to total msecs (inclusive time total)
  -e        Sort according to exclusive time per call
  -i        Sort according to inclusive time per call
  -v        Sort according to standard deviation (exclusive usec)
  -r        Reverse sorting order
  -s        Print only summary profile information
  -n num    Print only first number of functions
  -f file   Specify full path and filename without node ids
  -l        List all functions and exit
  nodes     Prints only info about all contexts/threads of given node numbers
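The sort options correspond to different keys computed from the same profile entries. The sketch below shows one way such key selection could work; `ProfileEntry`, `SortKey`, `keyValue`, and `sortProfile` are hypothetical, and the largest-first default with a reversing flag is an assumption rather than documented pprof behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical profile row for one function.
struct ProfileEntry {
    std::string name;
    long calls;
    long subroutines;
    double exclusiveMsec;  // time in the function itself
    double inclusiveMsec;  // time including called routines
};

// Keys loosely corresponding to the -c, -b, -m, -t, -e, and -i options.
enum class SortKey {
    Calls, Subroutines, ExclusiveTotal, InclusiveTotal,
    ExclusivePerCall, InclusivePerCall
};

inline double keyValue(const ProfileEntry& e, SortKey key) {
    assert(e.calls >= 0);
    // Guard against division by zero for entries that were never called.
    const double calls = e.calls > 0 ? static_cast<double>(e.calls) : 1.0;
    switch (key) {
        case SortKey::Calls:            return static_cast<double>(e.calls);
        case SortKey::Subroutines:      return static_cast<double>(e.subroutines);
        case SortKey::ExclusiveTotal:   return e.exclusiveMsec;
        case SortKey::InclusiveTotal:   return e.inclusiveMsec;
        case SortKey::ExclusivePerCall: return e.exclusiveMsec / calls;
        case SortKey::InclusivePerCall: return e.inclusiveMsec / calls;
    }
    return 0.0;
}

// Sort largest-first by default (an assumption); reverse flips the order.
inline void sortProfile(std::vector<ProfileEntry>& entries, SortKey key,
                        bool reverse = false) {
    std::stable_sort(entries.begin(), entries.end(),
                     [key, reverse](const ProfileEntry& a, const ProfileEntry& b) {
                         const double ka = keyValue(a, key);
                         const double kb = keyValue(b, key);
                         return reverse ? ka < kb : ka > kb;
                     });
}
```

Separating total and per-call keys matters in practice: a routine called many times may dominate the exclusive total while being cheap per call, which points to reducing the call count rather than optimizing the routine body.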

Pprof Output (NAS Parallel Benchmark – LU)
- Intel Quad PIII Xeon, RedHat, PGI F90
- F90 + MPICH
- Profile for: node, context, thread
- Application events and MPI events

jRacy (NAS Parallel Benchmark – LU)
- n: node, c: context, t: thread
[Figure: jRacy windows showing global profiles, an individual profile, and a routine profile across all nodes]

TAU and PAPI (NAS Parallel Benchmark – LU)
- Floating point operations
- Replaces execution time
- Only requires relinking to different measurement library

Semantic Performance Mapping
- Associate performance measurements with high-level semantic abstractions
- Need mapping support in the performance measurement system to assign data correctly

Semantic Entities/Attributes/Associations (SEAA)
- New dynamic mapping scheme (S. Shende, Ph.D. thesis)
  - Contrast with ParaMap (Miller and Irvin)
- Entities defined at any level of abstraction
- Attribute entity with semantic information
- Entity-to-entity associations
- Two association types (implemented in TAU API)
  - Embedded: extends data structure of associated object to store performance measurement entity
  - External: creates an external look-up table using address of object as the key to locate performance measurement entity
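The two association types can be contrasted in a few lines. This is a minimal sketch, assuming a simple `PerfEntity` record: the embedded form adds the record as a member of the object, while the external form keeps a look-up table keyed by the object's address. `PerfEntity`, `InstrumentedTask`, and `ExternalAssociations` are hypothetical names and do not reflect how the TAU mapping macros are implemented.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical performance measurement entity.
struct PerfEntity {
    std::string label;
    long samples = 0;
    double seconds = 0.0;
};

// Embedded association: the object's own data structure is extended
// with a member that stores the performance measurement entity.
struct InstrumentedTask {
    int id = 0;
    PerfEntity perf;
};

// External association: a look-up table uses the object's address as the
// key, so the object's type does not need to change.
class ExternalAssociations {
public:
    // Create (or relabel) the entity linked to this object.
    PerfEntity& link(const void* object, const std::string& label) {
        assert(object != nullptr);
        PerfEntity& entity = table_[object];
        entity.label = label;
        return entity;
    }

    // Return the linked entity, or nullptr if the object was never linked.
    PerfEntity* find(const void* object) {
        auto it = table_.find(object);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<const void*, PerfEntity> table_;
};
```

The trade-off in this sketch is typical: the embedded form gives direct access but requires modifying the object's type, whereas the external form works with unmodified types at the cost of a hash look-up and care when objects are destroyed or their addresses are reused.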

TAU Performance System Status
- Computing platforms
  - IBM SP, SGI Origin 2K/3K, Intel Teraflop, Cray T3E, Compaq SC, HP, Sun, Windows, IA-32, IA-64, Linux, …
- Programming languages
  - C, C++, Fortran 77/90, HPF, Java, OpenMP
- Communication libraries
  - MPI, PVM, Nexus, Tulip, ACLMPL, MPIJava
- Thread libraries
  - pthreads, Java, Windows, Tulip, SMARTS, OpenMP
- Compilers
  - KAI, PGI, GNU, Fujitsu, Sun, Microsoft, SGI, Cray, IBM, Compaq

TAU Performance System Status (continued)
- Application libraries
  - Blitz++, A++/P++, ACLVIS, PAWS, SAMRAI, Overture
- Application frameworks
  - POOMA, POOMA-2, MC++, Conejo, Uintah, UPS, …
- Performance Projects
  - Aurora / SCALEA: ACPC, University of Vienna
- TAU full distribution (Version 2.10, web download)
  - Measurement library and profile analysis tools
  - Automatic software installation
  - Performance analysis examples
  - Extensive TAU User’s Guide

PDT Status
- Program Database Toolkit (Version 2.0, web download)
  - EDG C++ front end (Version )
  - Mutek Fortran 90 front end (Version 2.4.1)
  - C++ and Fortran 90 IL Analyzer
  - DUCTAPE library
  - Standard C++ system header files (KCC Version 4.0f)
- PDT-constructed tools
  - Automatic TAU performance instrumentation
    - C, C++, Fortran 77, and Fortran 90
  - Program analysis support for SILOON and CHASM

Usage Scenarios
- Message passing computation
- Multi-threaded computation
  - (Abstract) thread-based performance measurement
  - Multi-threaded parallel execution and asynchronous RTS
- Mixed-mode parallel computation
  - Integrate messaging events with multi-threading events
  - OpenMP + MPI, Java + MPI, …
- Object-oriented programming and C++
  - Performance measurement of template-derived code
  - Object-based performance analysis
- Hierarchical parallel software frameworks
  - Multi-level software framework and work scheduling

Evolution of the TAU Performance System
- TAU’s existing strength lies in its robust support for performance instrumentation and measurement
- TAU will evolve to support new performance capabilities
  - Online performance data access via application-level API
  - Whole-system, integrative performance monitoring
  - Dynamic performance measurement control
  - Generalize performance mapping
  - Runtime performance analysis and visualization
  - Performance experimentation environment and database
  - Cross-experiment performance analysis
- Three-year DOE MICS research and development grant