Recent Advances in the TAU Performance System
Sameer Shende, Allen D. Malony
Department of Computer and Information Science, Computational Science Institute, University of Oregon
University of Utah, Oct. 29, 2002

Outline
- Introduction to TAU and PDT
- New features
  - Instrumentation
  - CCA
- Integration of Uintah and TAU
- Performance Monitoring Framework
- Performance Tracking and Reporting: XPARE
- Performance Database Framework
- Work in Progress
- Conclusions

TAU Performance System Framework
- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable, configurable performance profiling/tracing facility
- Open software approach
  - University of Oregon, LANL, FZJ Germany

TAU Performance System Architecture
(architecture diagram; EPILOG and Paraver trace formats shown)

Program Database Toolkit (PDT)
- Program code analysis framework for developing source-based tools
- High-level interface to source code information
- Integrated toolkit for source code parsing, database creation, and database query
  - commercial-grade front-end parsers
  - portable IL analyzer, database format, and access API
  - open software approach for tool development
- Target and integrate multiple source languages
- Used in TAU to build automated performance instrumentation tools

PDT Architecture and Tools
(diagram; C/C++ and Fortran 77/90 front ends)

New Features in TAU
- Instrumentation
  - OPARI – OpenMP directive rewriting approach [POMP, FZJ]
  - Selective instrumentation – grouping, include/exclude lists
  - tau_reduce – rule-based detection of high-overhead lightweight routines
  - CCA: TAU component interface
- Measurement
  - PAPI [UTK] – support for multiple hardware counters/time
  - Callpath profiling (1-level)
  - Native generation of EPILOG traces [EXPERT, FZJ]
- Analysis
  - Support for the Paraver [CEPBA] trace visualizer
  - jracy – new Java-based profile browser in TAU
- Availability
  - Support for new platforms and compilers (NEC, Hitachi, Intel, ...)

Instrumentation Control
- Selection of which performance events to observe
  - Could depend on scope, type, level of interest
  - Could depend on instrumentation overhead
- How is selection supported in the instrumentation system?
  - No choice
  - Include / exclude lists (TAU)
  - Environment variables
  - Static vs. dynamic
- Problem: controlling instrumentation of small routines
  - High relative measurement overhead
  - Significant intrusion and possible perturbation

Instrumentation Control: Grouping
- Profile groups
  - A group of related routines forms a profile group
- Statically defined
  - TAU_DEFAULT, TAU_USER[1-5], TAU_MESSAGE, TAU_IO, ...
- Dynamically defined
  - Group name based on a string ("integrator", "particles")
  - Runtime lookup in a map to get a unique group identifier
  - tau_instrumentor file.pdb file.cpp -o file.i.cpp -g "particles" assigns all routines in file.cpp to the group "particles"
- Ability to change group names at runtime
- Instrumentation control based on profile groups

TAU Instrumentation Control API
- Enabling profile groups
  - TAU_ENABLE_INSTRUMENTATION();        // global control
  - TAU_ENABLE_GROUP(TAU_GROUP);         // statically defined
  - TAU_ENABLE_GROUP_NAME("group name"); // dynamic
  - TAU_ENABLE_ALL_GROUPS();             // for all groups
- Disabling profile groups
  - TAU_DISABLE_INSTRUMENTATION();
  - TAU_DISABLE_GROUP(TAU_GROUP);
  - TAU_DISABLE_GROUP_NAME();
  - TAU_DISABLE_ALL_GROUPS();
- Obtaining a profile group identifier
  - TAU_GET_PROFILE_GROUP("group name");
- Runtime switching of profile groups
  - TAU_PROFILE_SET_GROUP(TAU_GROUP);
  - TAU_PROFILE_SET_GROUP_NAME("group name");
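A minimal sketch of how these control macros might appear in instrumented application code; the routine name, group name, and surrounding structure are illustrative, not taken from the talk:

  #include <TAU.h>

  // Hypothetical application routine; assumed to have been assigned to the
  // "particles" group by tau_instrumentor -g "particles".
  void advance_particles() { /* ... particle update ... */ }

  void timestep()
  {
    TAU_PROFILE("timestep", "void ()", TAU_DEFAULT);  // standard TAU timer macro
    TAU_DISABLE_ALL_GROUPS();              // stop observing everything ...
    TAU_ENABLE_GROUP_NAME("particles");    // ... then re-enable only this dynamic group
    advance_particles();                   // only "particles" events are recorded here
    TAU_ENABLE_ALL_GROUPS();               // restore full instrumentation
  }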

TAU Pre-execution Instrumentation Control
- Dynamic groups defined at file scope
- Group names and group associations may be modified at runtime
- Controlling groups at pre-execution time using the --profile option:
  % tau_instrumentor app.pdb app.cpp -o app.i.cpp -g "particles"
  % mpirun -np 4 application --profile particles+field+mesh+io
  Enables instrumentation for TAU_DEFAULT and the particles, field, mesh, and io groups.
- Examples:
  - POOMA v1 (LANL): static groups used
  - VTF (ASAP Caltech): dynamic execution instrumentation control by a Python-based controller

Selective Instrumentation: Include/Exclude Lists
% tau_instrumentor
Usage: tau_instrumentor <pdbfile> <sourcefile> [-o <outputfile>] [-noinline] [-g groupname] [-i headerfile] [-c|-c++|-fortran] [-f <selective instrumentation file>]
For selective instrumentation, use the -f option:
% cat selective.dat
# Selective instrumentation: specify an exclude/include list.
BEGIN_EXCLUDE_LIST
void quicksort(int *, int, int)
void sort_5elements(int *)
void interchange(int *, int *)
END_EXCLUDE_LIST
# If an include list is specified, the routines in the list will be the only
# routines that are instrumented.
# To specify an include list (a list of routines that will be instrumented)
# remove the leading # to uncomment the following lines
#BEGIN_INCLUDE_LIST
#int main(int, char **)
#int select_
#END_INCLUDE_LIST

Rule-Based Overhead Analysis (N. Trebon, UO)
- Analyze the performance data to determine events with high (relative) measurement overhead
- Create a select list for excluding those events
- Rule grammar (used in the tau_reduce tool): [GroupName:] Field Operator Number
  - GroupName indicates the rule applies only to events in that group
  - Field is an event metric attribute (from profile statistics): numcalls, numsubs, percent, usec, cumusec, count [PAPI], totalcount, stdev, usecs/call, counts/call
  - Operator is one of >, <, or =
  - Number is any number
- Compound rules are possible using & between simple rules

Example Rules
# Exclude all events that are members of TAU_USER
# and use less than 1000 microseconds
TAU_USER:usec < 1000

# Exclude all events that have less than 100 microseconds
# and are called only once
usec < 1000 & numcalls = 1

# Exclude all events that have less than 1000 usecs per call
# OR have a (total inclusive) percent less than 5
usecs/call < 1000
percent < 5

- Scientific notation can be used in rules such as:
usec > 1000 & numcalls > … & usecs/call … 25
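To make the rule semantics concrete, the following is a hedged sketch of how a compound rule such as usec < 1000 & numcalls = 1 could be evaluated against one event's profile statistics; it mirrors the grammar above but is not code from tau_reduce (the field usecs/call is renamed, since '/' cannot appear in a C++ identifier):

  // Illustrative only: fields mirror the metric attributes listed in the grammar.
  struct EventStats {
    double usec;
    double cumusec;
    double percent;
    double usecs_per_call;
    long   numcalls;
    long   numsubs;
  };

  // Compound rule "usec < 1000 & numcalls = 1":
  // exclude an event if it is cheap and called only once.
  bool exclude_event(const EventStats& e)
  {
    return (e.usec < 1000) && (e.numcalls == 1);
  }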

CCA: Extended Component Design
- PKC: Performance Knowledge Component
- POC: Performance Observability Component
(figure: generic component)

Design of the Performance Observation Component
- One performance component per context
- The performance component provides a Measurement port
- The Measurement port allows a user to create and access:
  - Timer (start/stop, set name/type/group)
  - Event (trigger)
  - Control (enable/disable groups)
  - Query (get functions, metrics, counters, dump to disk)
(figure: Performance Component exposing Timer, Event, Control, and Query through a Measurement port)

Measurement Port in CCAFEINE
namespace performance {
namespace ccaports {
class Measurement : public virtual classic::gov::cca::Port {
public:
  virtual ~Measurement() {}
  /* Create a Timer */
  virtual performance::Timer* createTimer(void) = 0;
  virtual performance::Timer* createTimer(string name) = 0;
  virtual performance::Timer* createTimer(string name, string type) = 0;
  virtual performance::Timer* createTimer(string name, string type, string group) = 0;
  /* Create a Query interface */
  virtual performance::Query* createQuery(void) = 0;
  /* Create a User Defined Event interface */
  virtual performance::Event* createEvent(void) = 0;
  virtual performance::Event* createEvent(string name) = 0;
  /** Create a Control interface for selectively enabling and disabling
   *  the instrumentation based on groups */
  virtual performance::Control* createControl(void) = 0;
};
}
}

Timer Class Interface
namespace performance {
class Timer {
public:
  virtual ~Timer() {}
  /* Start the Timer. Implement these methods in
   * a derived class to provide the required functionality. */
  virtual void start(void) = 0;
  /* Stop the Timer. */
  virtual void stop(void) = 0;
  virtual void setName(string name) = 0;
  virtual string getName(void) = 0;
  virtual void setType(string name) = 0;
  virtual string getType(void) = 0;
  /** Set the group name associated with the Timer
   *  (e.g., all MPI calls can be grouped into an "MPI" group). */
  virtual void setGroupName(string name) = 0;
  virtual string getGroupName(void) = 0;
  virtual void setGroupId(unsigned long group) = 0;
  virtual unsigned long getGroupId(void) = 0;
};
}

Control Class Interface
namespace performance {
class Control {
public:
  virtual ~Control() {}
  /* Control instrumentation: enable group id. */
  virtual void enableGroupId(unsigned long id) = 0;
  /* Disable group id. */
  virtual void disableGroupId(unsigned long id) = 0;
  /* Enable group name. */
  virtual void enableGroupName(string name) = 0;
  /* Disable group name. */
  virtual void disableGroupName(string name) = 0;
  /* Enable all groups. */
  virtual void enableAllGroups(void) = 0;
  /* Disable all groups. */
  virtual void disableAllGroups(void) = 0;
};
}

Query Class Interface
namespace performance {
class Query {
public:
  virtual ~Query() {}
  /* Get the list of Timer names */
  virtual void getTimerNames(const char **& functionList, int& numFuncs) = 0;
  /* Get the list of Counter names */
  virtual void getCounterNames(const char **& counterList, int& numCounters) = 0;
  /* getTimerData: returns lists of metrics. */
  virtual void getTimerData(const char **& inTimerList, int numTimers,
                            double **& counterExclusive, double **& counterInclusive,
                            int*& numCalls, int*& numChildCalls,
                            const char **& counterNames, int& numCounters) = 0;
  virtual void dumpProfileData(void) = 0;
  virtual void dumpProfileDataIncremental(void) = 0;  // timestamped dump
  virtual void dumpTimerNames(void) = 0;
  virtual void dumpTimerData(const char **& inTimerList, int numTimers) = 0;
  virtual void dumpTimerDataIncremental(const char **& inTimerList, int numTimers) = 0;
};
}

Measurement Port Implementation
- The TAU component implements the Measurement port
  - Implements the Timer, Event, Control, and Query classes
  - Registers the port with the CCAFEINE framework
- Components target the generic Measurement port interface
  - Runtime selection of the TAU component during execution
  - Instrumentation code is independent of the underlying tool
  - Instrumentation code is independent of the measurement choice
- The TauMeasurement_CCA port implementation uses a specific TAU measurement library

Using the Measurement Port
#include "ports/Measurement_CCA.h"
...
double MonteCarloIntegrator::integrate(double lowBound, double upBound, int count)
{
  classic::gov::cca::Port * port;
  double sum = 0.0;
  // Get the Measurement port
  port = frameworkServices->getPort("MeasurementPort");
  if (port)
    measurement_m = dynamic_cast<performance::ccaports::Measurement *>(port);
  if (measurement_m == 0) {
    cerr << "Connection to MeasurementPort failed" << endl;  // error handling lost in the transcript; reconstructed
  }
  performance::Timer *t = measurement_m->createTimer(string("IntegrateTimer"));
  t->start();
  for (int i = 0; i < count; i++) {             // loop bounds reconstructed
    double x = random_m->getRandomNumber();     // random generator member name assumed
    sum = sum + function_m->evaluate(x);
  }
  t->stop();
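Continuing the example above, a user-defined event could be obtained through the same port's createEvent() factory; the event name and the assumption that trigger() takes the value to record are illustrative, based only on the "Event (trigger)" description earlier:

  // Sketch (not from the slides): record an application-level value through
  // the performance component's Event interface.
  performance::Event *samples = measurement_m->createEvent(string("Monte Carlo samples"));
  samples->trigger(count);   // assumed signature: record one sample of this value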

Using the TAU Component in CCAFEINE
repository get TauMeasurement
repository get Driver
repository get MidpointIntegrator
repository get MonteCarloIntegrator
repository get RandomGenerator
repository get LinearFunction
repository get NonlinearFunction
repository get PiFunction
create LinearFunction lin_func
create NonlinearFunction nonlin_func
create PiFunction pi_func
create MonteCarloIntegrator mc_integrator
create RandomGenerator rand
create TauMeasurement tau
connect mc_integrator RandomGeneratorPort rand RandomGeneratorPort
connect mc_integrator FunctionPort nonlin_func FunctionPort
connect mc_integrator MeasurementPort tau MeasurementPort
create Driver driver
connect driver IntegratorPort mc_integrator IntegratorPort
go driver Go
quit

Uintah Problem Solving Environment (U. Utah)
- Enhanced SCIRun PSE
  - Pure dataflow -> component-based
  - Shared memory -> scalable multi-/mixed-mode parallelism
  - Interactive only -> interactive plus standalone
- Design and implement the Uintah component architecture
  - Application programmers provide:
    - a description of the computation (tasks and variables)
    - code to perform a task on a single "patch" (sub-region of space)
  - Components for scheduling, partitioning, load balance, ...
  - Follows the Common Component Architecture (CCA) model
- Design and implement the Uintah Computational Framework (UCF) on top of the component architecture

Performance Analysis Objectives for Uintah
- Micro tuning
  - Optimization of simulation code (task) kernels for maximum serial performance
- Scalability tuning
  - Identification of parallel execution bottlenecks
    - overheads: scheduler, data warehouse, communication
    - load imbalance
  - Adjustment of task graph decomposition and scheduling
- Performance tracking
  - Understand performance impacts of code modifications
  - Throughout the course of software development
  - C-SAFE application and UCF software

Uintah Task Graph (Material Point Method)
- Diagram of named tasks (ovals) and data (edges)
- Imminent computation
- Dataflow-constrained
- MPM: Newtonian material point motion time step
  - Solid: values defined at material point (particle)
  - Dashed: values defined at vertex (grid)
  - Prime ('): values updated during time step

Task Execution in the Uintah Parallel Scheduler
- Profile methods and functions in the scheduler and in the MPI library
(profile screenshots: task execution time dominates (what task?); MPI communication overheads (where?); task execution time distribution per process)
- Need to map performance data!

Performance Data Mapping using TAU
- Two-level mappings: Level 1, Level 2 (details shown in the figure)
- Embedded association vs. external association
(figure labels: Data (object), Performance Data, Hash Table)
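As an illustration of the external-association alternative, the following hedged sketch keeps per-object performance data in a lookup table beside the application objects; the types and names are illustrative, not TAU internals:

  #include <map>

  // Per-object performance record; the fields are illustrative.
  struct PerfData {
    double inclusive_usec;
    long   calls;
  };

  // External association: the performance data lives in a table on the side,
  // keyed by the application object (here a task pointer), instead of being
  // embedded as an extra member of the object itself.
  static std::map<const void*, PerfData> perf_table;

  void record_task_time(const void* task, double usec)
  {
    PerfData& p = perf_table[task];   // hash/map lookup, as in the figure
    p.inclusive_usec += usec;
    p.calls += 1;
  }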

Task Performance Mapping Instrumentation
void MPIScheduler::execute(const ProcessorGroup * pc,
                           DataWarehouseP & old_dw,
                           DataWarehouseP & dw)
{
  ...
  TAU_MAPPING_CREATE(task->getName(), "[MPIScheduler::execute()]",
                     (TauGroup_t)(void*)task->getName(), task->getName(), 0);
  ...
  TAU_MAPPING_OBJECT(tautimer)
  TAU_MAPPING_LINK(tautimer, (TauGroup_t)(void*)task->getName());  // EXTERNAL ASSOCIATION
  ...
  TAU_MAPPING_PROFILE_TIMER(doitprofiler, tautimer, 0)
  TAU_MAPPING_PROFILE_START(doitprofiler, 0);
  task->doit(pc);
  TAU_MAPPING_PROFILE_STOP(0);
  ...
}
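Read loosely (this gloss is an interpretation, not text from the slide): a mapping timer is created and linked to the task's name through the external association above, then started and stopped around task->doit(pc), so the time spent executing the task body is attributed to the named task rather than to the generic MPIScheduler::execute() routine.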

Task Performance Mapping (Profile)
(profile screenshots: performance mapping for different tasks; mapped task performance across processes)

Performance Mapping using Tasks and Patches

Task Performance Mapping (Trace)
(trace screenshot: work packet computation events colored by task type)
- Distinct phases of computation can be identified based on task

Task Performance Mapping (Trace - Zoom)
(trace screenshot: startup communication imbalance)

Task Performance Mapping (Trace - Parallelism)
(trace screenshot: communication / load imbalance)

Comparing Uintah Traces for Scalability Analysis
(trace screenshots: 8 processes vs. 32 processes)

Performance Monitoring Framework (K. Li)
(architecture diagram labels: Application, TAU Performance System, || performance data streams ||, performance data output, file system, Performance Data Integrator, sample sequencing, Performance Data Reader, reader synchronization, Performance Analyzer, Performance Visualizer, Performance Steering, SCIRun)

3D Field Performance Visualization in SCIRun
(SCIRun program screenshot)

3D Field Performance Visualization in SCIRun (continued)
(SCIRun program screenshot)

Uintah Computational Framework (UCF)
- University of Utah
- UCF analysis
  - Scheduling
  - MPI library
  - components
- 500 processes
- Use for online and offline visualization
- Incorporate steering

Performance Tracking and Reporting
- Integrated performance measurement allows performance analysis throughout the development lifetime
- Applied performance engineering in the software design and development (software engineering) process
- Create a "performance portfolio" from regular performance experimentation (coupled with software testing)
- Use performance knowledge in making key software design decisions, prior to major development stages
- Use performance benchmarking and regression testing to identify irregularities
- Support automatic reporting of "performance bugs"
- Enable cross-platform (cross-generation) evaluation

XPARE - eXPeriment Alerting and REporting
- Experiment launcher automates measurement / analysis
  - Configuration and compilation of performance tools
  - Instrumentation control for the Uintah experiment type
  - Execution of multiple performance experiments
  - Performance data collection, analysis, and storage
  - Integrated in the Uintah software testing harness
- Reporting system conducts performance regression tests
  - Applies performance-difference thresholds (alert ruleset); see the sketch after this list
  - Alerts users via e-mail if thresholds have been exceeded
  - Web-based alerting setup and full performance data reporting
  - Historical performance data analysis
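For illustration, the alert-ruleset comparison mentioned above amounts to a per-event threshold test of roughly this form; this is a hedged sketch, and the 10% default and function name are illustrative rather than XPARE's actual implementation:

  // Compare a new timing against the stored historical baseline and flag an
  // alert when the slowdown exceeds the allowed fraction for that event.
  bool exceeds_threshold(double baseline_usec, double current_usec,
                         double allowed_slowdown = 0.10)
  {
    return current_usec > baseline_usec * (1.0 + allowed_slowdown);
  }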

XPARE System Architecture (A. Morris, Dav)
(architecture diagram components: Experiment Launch, Mail server, Performance Database, Performance Reporter, Comparison Tool, Regression Analyzer, Alerting Setup, Web server)

Experiment Results Viewing Selection

Web-Based Experiment Reporting

Web-Based Experiment Reporting (continued)

Alerting Setup

TAU Performance Database Framework (PerfDBF) (Li Li)
- profile data only
- XML representation
- project / experiment / trial
(framework diagram labels: performance analysis programs, performance analysis and query toolkit, PerfDML translators, performance data description, raw performance data, ORDB PostgreSQL PerfDB)

TAU Status
- Instrumentation supported:
  - Source, preprocessor, compiler, MPI, runtime, virtual machine
- Languages supported:
  - C++, C, F90, Java, Python
  - HPF, ZPL, HPC++, pC++, ...
- Packages supported:
  - PAPI [UTK], PCL [FZJ] (hardware performance counter access)
  - Opari, PDT [UO, LANL, FZJ], DyninstAPI [U. Maryland] (instrumentation)
  - EXPERT, EPILOG [FZJ], Vampir [Pallas], Paraver [CEPBA] (visualization)
- Platforms supported:
  - IBM SP, SGI Origin, Sun, HP Superdome, HP/Compaq Tru64 ES
  - Linux clusters (IA-32, IA-64, PowerPC, Alpha), Apple, Windows
  - Hitachi SR8000, NEC SX, Cray T3E, ...
- Compiler suites supported:
  - GNU, Intel KAI (KCC, KAP/Pro), Intel, SGI, IBM, Compaq, HP, Fujitsu, Hitachi, Sun, Apple, Microsoft, NEC, Cray, PGI, Absoft, ...
- Thread libraries supported:
  - Pthreads, SGI sproc, OpenMP, Windows, Java, SMARTS

Work in Progress
- Instrumentation of individual tasks
- SCIRun-based online performance data monitoring
- Integration of XPARE with the performance database framework
- Support for complex SQL queries
- Instrumentation of mixed-mode (MPI+threads) Uintah executions
- Instrumentation of Uintah CCA components using the TAU CCA interface

Concluding Remarks
- Modern scientific simulation environments involve a complex (scientific) software engineering process
  - Iterative, diverse expertise, multiple teams, concurrent
- Complex parallel software and systems pose challenging performance analysis problems that require flexible and robust performance technology and methods
  - Cross-platform, cross-language, large-scale
  - Fully integrated performance analysis system
  - Performance mapping
- Need to support a performance engineering methodology within scientific software design and development
  - Performance comparison and tracking

Acknowledgements
- Department of Energy (DOE), ASCI Academic Strategic Alliances Program (ASAP)
- Center for the Simulation of Accidental Fires and Explosions (C-SAFE), ASCI/ASAP Level 1 center, University of Utah
- Computational Science Institute, ASCI/ASAP Level 3 projects with LLNL / LANL, University of Oregon
Slides: ftp://ftp.cs.uoregon.edu/pub/malony/Talks/ishpc2002.ppt