
Performance Technology for Productive, High Performance Computing: the TAU Performance System™
Allen D. Malony
Department of Computer and Information Science, Performance Research Laboratory, University of Oregon

Acknowledgements
- Dr. Sameer Shende, Director, Performance Research Lab
- Alan Morris, Senior software engineer
- Wyatt Spear, Software engineer
- Scott Biersdorff, Software engineer
- Dr. Matt Sottile, Research Faculty
- Kevin Huck, Ph.D. student
- Aroon Nataraj, Ph.D. student
- Shangkar Mayanglambam, Ph.D. student
- Brad Davidson, Systems administrator

Outline
- Overview of features
- Instrumentation and measurement
- Analysis tools
  - Parallel profile analysis (ParaProf)
  - Performance data management (PerfDMF)
  - Performance data mining (PerfExplorer)
- TAU Portal
- Kernel and systems monitoring
  - KTAU, TAUoverSupermon, TAUoverMRNet
- Application examples
- Demos at the DOE NNSA/ASC booth

TAU Performance System Project
- Tuning and Analysis Utilities (15+ year project effort)
- Performance system framework for HPC systems
  - Integrated, scalable, and flexible
  - Targets parallel programming paradigms
- Integrated toolkit for performance problem solving
  - Instrumentation, measurement, analysis, and visualization
  - Portable performance profiling and tracing facility
  - Performance data management and data mining
- Partners
  - LLNL, ANL, LANL
  - Research Centre Jülich, TU Dresden

TAU Parallel Performance System Goals
- Portable (open source) parallel performance system
  - Across computer system architectures and operating systems
  - Across programming languages and compilers
- Multi-level, multi-language performance instrumentation
- Flexible and configurable performance measurement
- Support for multiple parallel programming paradigms
  - Multi-threading, message passing, mixed-mode, hybrid, object-oriented (generic), component-based
- Support for performance mapping
- Integration of leading performance technology
- Scalable (very large) parallel performance analysis

TAU Performance System Components
[Diagram: TAU architecture linking program analysis (PDT), parallel profile analysis (ParaProf), performance data management (PerfDMF), performance data mining (PerfExplorer), and performance monitoring (TAUoverSupermon)]

TAU Performance System Architecture

Building Bridges to Other Tools

TAU Instrumentation Approach
- Support for standard program events
  - Routines, classes, and templates
  - Statement-level blocks
  - Begin/end events (interval events)
- Support for user-defined events (see the API sketch below)
  - Begin/end events specified by the user
  - Atomic events (e.g., size of memory allocated/freed)
  - Selection of event statistics
- Support for defining "semantic" entities for mapping
- Support for event groups (aggregation, selection)
- Instrumentation optimization
  - Eliminate instrumentation in lightweight routines
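As an illustration of the two event types, here is a minimal sketch of TAU's manual instrumentation API in C; the routine and event names are hypothetical, while the macros are TAU's standard interval- and atomic-event calls.

```c
#include <TAU.h>
#include <stdlib.h>

void compute(size_t n)
{
    /* Interval (begin/end) event: a timer around this routine */
    TAU_PROFILE_TIMER(t, "compute", "", TAU_USER);
    TAU_PROFILE_START(t);

    /* Atomic event: record the size of each allocation */
    TAU_REGISTER_EVENT(alloc_ev, "Bytes allocated");
    double *buf = (double *) malloc(n * sizeof(double));
    TAU_EVENT(alloc_ev, (double)(n * sizeof(double)));

    /* ... computation on buf ... */

    free(buf);
    TAU_PROFILE_STOP(t);
}
```

For atomic events, TAU keeps statistics (count, min, max, mean, etc.) over the recorded values rather than a begin/end duration.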

TAU Instrumentation Mechanisms
- Source code
  - Manual (TAU API, TAU component API)
  - Automatic (robust): C, C++, F77/90/95 via the Program Database Toolkit (PDT); OpenMP via directive rewriting (Opari), POMP2 spec
- Object code
  - Pre-instrumented libraries (e.g., MPI using PMPI; sketched below)
  - Statically linked and dynamically linked
- Executable code
  - Binary and dynamic instrumentation (Dyninst)
  - Virtual machine instrumentation (e.g., Java using JVMPI)
- TAU_COMPILER to automate the instrumentation process
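The pre-instrumented MPI library relies on the MPI standard's PMPI profiling interface: every MPI routine has a name-shifted PMPI_ entry point, so a wrapper can time the call and forward it. A simplified sketch of such interposition (not TAU's actual wrapper source; the const-qualified signature follows modern MPI headers):

```c
#include <mpi.h>
#include <TAU.h>

/* Wrapper intercepts MPI_Send; the name-shifted PMPI_Send
 * is the real implementation (standard PMPI interface). */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    TAU_PROFILE_TIMER(t, "MPI_Send()", "", TAU_MESSAGE);
    TAU_PROFILE_START(t);
    int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
    TAU_PROFILE_STOP(t);
    return rc;
}
```

Because interposition happens at link time, the application needs no source changes to obtain MPI-level measurements.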

Multi-Level Instrumentation and Mapping
[Diagram: instrumentation levels spanning the problem domain, source code (preprocessor, compiler), object code and libraries (linker), executable, OS, virtual machine, and runtime image, all feeding performance data]
- Multiple interfaces
- Information sharing between interfaces
- Event selection within/between levels
- Mapping: associate performance data with high-level semantic abstractions

TAU Measurement Approach
- Portable and scalable parallel profiling solution
  - Multiple profiling types and options
  - Event selection and control (enabling/disabling, throttling)
  - Online profile access and sampling
  - Online profile overhead compensation
- Portable and scalable parallel tracing solution
  - Trace translation to OTF, EPILOG, Paraver, and SLOG2
  - Trace streams (OTF) and hierarchical trace merging
- Robust timing and hardware performance support
  - Multiple counters (hardware, user-defined, system)
- Performance measurement for CCA component software

TAU Measurement Mechanisms
- Parallel profiling
  - Function-level, block-level, statement-level
  - Supports user-defined events and mapping events
  - TAU parallel profile stored (dumped) during execution
  - Support for flat, callgraph/callpath, and phase profiling
  - Support for memory profiling (headroom, malloc/leaks; see the sketch below)
  - Support for tracking I/O (wrappers, read/write/print calls)
- Tracing
  - All profile-level events
  - Inter-process communication events
  - Inclusion of multiple counter data in traced events
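As one example, the memory profiling support can be enabled directly from application code. A minimal sketch, assuming TAU's memory-tracking macros (TAU_TRACK_MEMORY, TAU_TRACK_MEMORY_HEADROOM) and interrupt-driven sampling:

```c
#include <TAU.h>

int main(int argc, char **argv)
{
    TAU_PROFILE_TIMER(t, "main()", "", TAU_DEFAULT);
    TAU_PROFILE_SET_NODE(0);       /* single-process example */
    TAU_PROFILE_START(t);

    /* Periodically sample heap usage and remaining headroom */
    TAU_TRACK_MEMORY();
    TAU_TRACK_MEMORY_HEADROOM();
    TAU_SET_INTERRUPT_INTERVAL(1); /* sample once per second */

    /* ... application work ... */

    TAU_PROFILE_STOP(t);
    return 0;
}
```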

Types of Parallel Performance Profiling
- Flat profiles
  - Metric (e.g., time) spent in an event (callgraph nodes)
  - Exclusive/inclusive values, number of calls, child calls
- Callpath profiles (calldepth profiles)
  - Time spent along a calling path (edges in the callgraph)
  - "main => f1 => f2 => MPI_Send" (event name)
  - Depth controlled by the TAU_CALLPATH_DEPTH environment variable
- Phase profiles
  - Flat profiles under a phase (nested phases are allowed)
  - Default "main" phase
  - Supports static or dynamic (per-iteration) phases (see the sketch below)
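A sketch of dynamic (per-iteration) phases using TAU's phase API; the phase names and loop are hypothetical. Each iteration becomes its own phase, so the flat profile of events inside it can be examined per iteration:

```c
#include <TAU.h>
#include <stdio.h>

void timestep_loop(int nsteps)
{
    for (int i = 0; i < nsteps; i++) {
        char name[64];
        snprintf(name, sizeof(name), "Iteration %d", i);

        /* All events triggered between start and stop are
         * attributed to this dynamically named phase. */
        TAU_PHASE_CREATE_DYNAMIC(phase, name, "", TAU_USER);
        TAU_PHASE_START(phase);
        /* ... one timestep of work ... */
        TAU_PHASE_STOP(phase);
    }
}
```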

Performance Analysis and Visualization
- Analysis of parallel profile and trace measurements
- Parallel profile analysis (ParaProf)
  - Java-based analysis and visualization tool
  - Support for large-scale parallel profiles
  - Performance data management framework (PerfDMF)
- Parallel trace analysis
  - Translation to VTF (V3.0), EPILOG, and OTF formats
  - Integration with Vampir / VampirServer (TU Dresden)
  - Profile generation from trace data
- Online parallel analysis and visualization
- Integration with the CUBE browser (KOJAK, UTK, FZJ)

ParaProf Parallel Performance Profile Analysis

ParaProf Manager
[Screenshot: ParaProf manager window listing raw profile files (HPMToolkit, mpiP, TAU) and PerfDMF-managed database entries, organized by Application > Experiment > Trial with associated metadata]

ParaProf – Flat Profile (Miranda, BG/L)
[Screenshot: flat profile on 8K processors, organized by node, context, thread]
- Miranda: hydrodynamics code, Fortran + MPI, LLNL
- Runs to 64K processors

ParaProf – Stacked View (Miranda)

ParaProf – Callpath Profile (Flash)
- Flash: thermonuclear flashes, Fortran + MPI, Argonne

Comparing Effects of Multi-Core Processors
- AORSA2D: magnetized plasma simulation
- Cray XT3 (4K cores)
- Blue is single node / red is dual core

Comparing FLOPS (AORSA2D, Cray XT3)
- Blue is dual core / red is single node

ParaProf – Scalable Histogram View (Miranda)
[Screenshots: histogram views for 8K and 16K processors]

ParaProf – Full Profile (Miranda)
[Screenshot: full profile, 16K processors]

ParaProf – Full Profile (Matmult, ANL BG/P)
[Screenshot: full profile, 256 processors]

ParaProf – 3D Scatterplot (Miranda)
- Each point is a "thread" of execution
- A total of four metrics shown in relation
- ParaProf's visualization library: JOGL

Visualizing Hybrid Problems (S3D, XT3+XT4)
- ORNL Jaguar: Cray XT3/XT4, 6400 cores
- S3D combustion simulation (DOE SciDAC PERI)

Zoom View of Hybrid Execution (S3D, XT3+XT4)
- Gap represents XT3 nodes
- MPI_Wait takes less time; other routines take more time

Visualizing Hybrid Execution (S3D, XT3+XT4)
[Screenshot: hybrid execution across 6400 cores]
- Process metadata is used to map performance to machine type
- Memory speed accounts for the performance difference

S3D Run on XT4 Only
- Better balance across nodes
- More performance uniformity

Profile Snapshots in ParaProf (Flash)
[Screenshot: snapshot timeline spanning initialization, checkpointing, and finalization]
- Profile snapshots are parallel profiles recorded at runtime
- Used to highlight profile changes during execution

Profile Snapshots in ParaProf (Flash)
- Filter snapshots (show only main loop iterations)

Profile Snapshots in ParaProf (Flash)
- Breakdown as a percentage

Snapshot Replay in ParaProf (Flash)
- All windows update dynamically

Profile Snapshots in ParaProf (Flash)
- Follow the progression of various displays through time
- 3D scatter plot shown at T = 0 s and T = 11 s

Performance Data Management
- Need robust processing and storage of multiple profile performance data sets
- Avoid developing independent data management solutions
  - Waste of resources
  - Incompatibility among analysis tools
- Goals
  - Foster multi-experiment performance evaluation
  - Develop a common, reusable foundation for performance data storage, access, and sharing
  - Serve as a core module in an analysis system and/or as a central repository of performance data

PerfDMF Approach
- Performance Data Management Framework
- Originally designed to address critical TAU requirements
- Broader goal is to provide an open, flexible framework to support common data management tasks
- Extensible toolkit to promote integration and reuse across available performance tools
- Supported profile formats: TAU, CUBE (KOJAK), Dynaprof, HPCToolkit (Rice), HPM Toolkit (IBM), gprof, mpiP, psrun (PerfSuite), Open|SpeedShop, …
- Supported DBMSs: PostgreSQL, MySQL, Oracle, DB2, Derby/Cloudscape
- Profile query and analysis API

PerfDMF Architecture
K. Huck, A. Malony, R. Bell, and A. Morris, "Design and Implementation of a Parallel Performance Data Management Framework," ICPP 2005.

Metadata Collection
- Integration of XML metadata with each profile
- Three ways to incorporate metadata
  - Measured hardware/system information (TAU, PERI-DB): CPU speed, memory in GB, MPI node IDs, …
  - Application instrumentation (application-specific): TAU_METADATA() inserts any name/value pair, e.g., application parameters, input data, domain decomposition (see the sketch below)
  - PerfDMF data management tools can incorporate an XML file of additional metadata: compiler flags, submission scripts, input files, …
- Metadata can be imported from / exported to PERI-DB
  - PERI SciDAC project (UTK, NERSC, UO, PSU, TAMU)
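A minimal sketch of application-level metadata insertion with TAU_METADATA(); the field names and helper routine are hypothetical:

```c
#include <TAU.h>
#include <stdio.h>

void record_run_metadata(int px, int py, const char *input_file)
{
    char decomp[32];

    /* Name/value pairs are attached to the profile as XML
     * metadata and carried into PerfDMF with the trial. */
    snprintf(decomp, sizeof(decomp), "%d x %d", px, py);
    TAU_METADATA("Domain decomposition", decomp);
    TAU_METADATA("Input file", input_file);
}
```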

Metadata for Each Experiment
[Screenshot: per-experiment metadata across multiple PerfDMF databases]

Performance Data Mining
- Conduct the parallel performance analysis process
  - In a systematic, collaborative, and reusable manner
  - Manage performance complexity
  - Discover performance relationships and properties
  - Automate the process
- Multi-experiment performance analysis
- Large-scale performance data reduction
  - Summarize characteristics of large processor runs
- Implement an extensible analysis framework
  - Abstraction/automation of data mining operations
  - Interface to existing analysis and data mining tools

Performance Data Mining (PerfExplorer)
- Performance knowledge discovery framework
  - Data mining analysis applied to parallel performance data: comparative, clustering, correlation, dimension reduction, …
  - Uses the existing TAU infrastructure: TAU performance profiles, PerfDMF
- Technology integration
  - Java API and toolkit for portability
  - Built on top of PerfDMF
  - R-project/Omegahat and Octave/Matlab statistical analysis
  - WEKA data mining package
  - JFreeChart for visualization, vector output (EPS, SVG)

Performance Data Mining (PerfExplorer v1)
K. Huck and A. Malony, "PerfExplorer: A Performance Data Mining Framework for Large-Scale Parallel Computing," SC 2005.

Relative Comparisons (GTC, XT3, DOE PERI)
Data: GYRO on various architectures
- Total execution time
- Timesteps per second
- Relative efficiency
- Relative efficiency per event
- Relative speedup
- Relative speedup per event
- Group fraction of total
- Runtime breakdown
- Correlate events with total runtime
- Relative efficiency per phase
- Relative speedup per phase
- Distribution visualizations

Cluster Analysis
Data: sPPM on Frost (LLNL), 256 threads
[Screenshots: cluster count, PCA scatterplot, and topology views (min/avg/max), drawn from PerfDMF databases]

Correlation Analysis
Data: FLASH on BG/L (LLNL), 64 nodes
- Strong negative linear correlation between CALC_CUT_BLOCK_CONTRIBUTIONS and MPI_Barrier

4-D Visualization
Data: FLASH on BG/L (LLNL), 1024 nodes
- Four "significant" events are selected
- Clusters and correlations are visible

PerfExplorer v2 – Requirements and Features
- Component-based analysis process
  - Analysis operations implemented as modules
  - Linked together into analysis processes and workflows
- Scripting
  - Provides process/workflow development and automation
- Metadata input, management, and access
- Inference engine
  - Reasoning about causes of performance phenomena
  - Analysis knowledge captured in expert rules
- Persistence of intermediate results
- Provenance
  - Provides a historical record of analysis results

PerfExplorer v2: Architecture and Interaction

TAU Integration with IDEs
- High-performance software development environments
  - Tools may be complicated to use
  - Interfaces and mechanisms differ between platforms and operating systems
- Integrated development environments
  - Consistent development environment
  - Numerous enhancements to the development process
  - Standard in industrial software development
- Integrated performance analysis
  - Tools limited to a single platform or programming language
  - Rarely compatible with third-party analysis tools
  - Little or no support for parallel projects

TAU and Eclipse
- Provide an interface for configuring TAU's automatic instrumentation within Eclipse's build system
- Manage runtime configuration settings and environment variables for execution of TAU-instrumented programs
[Workflow diagram: C/C++/Fortran project in Eclipse → add or modify an Eclipse build configuration with TAU → temporary copy of instrumented code → compilation/linking with TAU-instrumented libraries → program execution → performance data and program output]

TAU and Eclipse
[Screenshot: performance data from an Eclipse-launched run stored in PerfDMF]

TAU Portal
- Web-based access to TAU
  - Supports collaborative performance study
  - Secure performance data sharing
  - Does not require a TAU installation
  - Launches TAU performance tools: ParaProf, PerfExplorer
- FLASH regression testing
  - Nightly regression test cases
  - Uploaded to the database automatically
  - Interactive review of performance through the TAU Portal
  - Multi-experiment analysis

Portal: Nightly Performance Regression Testing

TAU Portal: Launch ParaProf/PerfExplorer

PerfExplorer: Regression Testing

PerfExplorer: Limiting Events (> 3%), Oct 2007

PerfExplorer: Exclusive Time for Events (2007)

Full System Performance: the KTAU Project
- Trend toward extremely large scales
  - System-level influences are increasingly dominant contributors to performance bottlenecks
  - Applications are sensitive to the system at scale
  - Complex I/O paths and subsystems are another example
  - Isolating system-level factors is non-trivial
- OS kernel instrumentation and measurement is important to understanding system-level influences
- How to correlate application and OS performance?
- KTAU / TAU (part of the ANL/UO ZeptoOS project)
A. Nataraj, A. Malony, S. Shende, and A. Morris, "Kernel-level Measurement for Integrated Performance Views: the KTAU Project," Cluster 2006.

KTAU System Architecture

Applying KTAU+TAU
- How does real OS noise affect real applications?
  - Requires OS + application performance measurement
  - Estimate application slowdown due to noise components
  - Interrupts and scheduling are significant
- Performance of multi-layered I/O systems
  - Requires measurement and analysis of multi-component I/O subsystems
  - Tracking of the I/O "long path" and assignment to the application
  - Working with Argonne on PVFS2
A. Nataraj, A. Morris, A. Malony, M. Sottile, and P. Beckman, "The Ghost in the Machine: Observing the Effects of Kernel Operation on Parallel Application Performance," SC'07 (Wednesday, 10:30–12:00).

TAU Monitoring
- Runtime access to parallel performance data
- Monitoring modes
  - Offline / post-mortem observation and analysis: fewest requirements for a specialized transport
  - Online observation: long-running applications, especially at scale; dumping to the file system can be suboptimal
  - Online observation with feedback into the application
- TAUoverSupermon (Sottile and Minnich, LANL)
- TAUoverMRNet (Arnold and Miller, University of Wisconsin)
A. Nataraj, M. Sottile, A. Morris, A. Malony, and S. Shende, "TAUoverSupermon: Low-overhead Online Parallel Performance Monitoring," Euro-Par 2007.

Project Affiliations (selected)
- Lawrence Livermore National Lab
  - Hydrodynamics (Miranda), radiation diffusion (KULL)
  - Open Trace Format (OTF) implementation on BG/L
- Argonne National Lab
  - ZeptoOS project and KTAU
  - Astrophysical thermonuclear flashes (Flash)
- Center for Simulation of Accidental Fires and Explosions (C-SAFE)
  - University of Utah, ASCI ASAP Center
  - Uintah Computational Framework (UCF)
- Oak Ridge National Lab
  - Contribution to the Joule Report (S3D, AORSA3D)

Project Affiliations (continued)
- Sandia National Lab
  - Simulation of turbulent reactive flows (S3D)
  - Combustion code (CFRFS)
- Los Alamos National Lab
  - Monte Carlo transport (MCNP)
  - SAIC's Adaptive Grid Eulerian (SAGE)
  - perflib integration (Jeff Brown)
- CCSM / ESMF / WRF climate/earth/weather simulation
  - NSF, NOAA, DOE, NASA, …
- Common Component Architecture (CCA) integration
- Performance Engineering Research Institute (PERI)

Support Acknowledgements
- Department of Energy (DOE)
  - Office of Science: MICS, Argonne National Lab
  - ASC/NNSA: University of Utah ASC/NNSA Level 1 center; Lawrence Livermore National Lab
- Department of Defense (DoD)
  - HPC Modernization Office (HPCMO)
- NSF Software and Tools for High-End Computing
- Research Centre Jülich
- Los Alamos National Laboratory
- TU Dresden
- ParaTools, Inc.