Allen D. Malony Department of Computer and Information Science Performance Research Laboratory.

Slides:



Advertisements
Similar presentations
Machine Learning-based Autotuning with TAU and Active Harmony Nicholas Chaimov University of Oregon Paradyn Week 2013 April 29, 2013.
Advertisements

K T A U Kernel Tuning and Analysis Utilities Department of Computer and Information Science Performance Research Laboratory University of Oregon.
Dynamic performance measurement control Dynamic event grouping Multiple configurable counters Selective instrumentation Application-Level Performance Access.
Workload Characterization using the TAU Performance System Sameer Shende, Allen D. Malony, Alan Morris University of Oregon {sameer,
Sameer Shende, Allen D. Malony, and Alan Morris {sameer, malony, Steven Parker, and J. Davison de St. Germain {sparker,
Robert Bell, Allen D. Malony, Sameer Shende Department of Computer and Information Science Computational Science.
Scalability Study of S3D using TAU Sameer Shende
Sameer Shende Department of Computer and Information Science Neuro Informatics Center University of Oregon Tool Interoperability.
Allen D. Malony, Sameer Shende Department of Computer and Information Science Computational Science Institute University.
Profiling S3D on Cray XT3 using TAU Sameer Shende
TAU Parallel Performance System DOD UGC 2004 Tutorial Allen D. Malony, Sameer Shende, Robert Bell Univesity of Oregon.
The TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen.
Nick Trebon, Alan Morris, Jaideep Ray, Sameer Shende, Allen Malony {ntrebon, amorris, Department of.
On the Integration and Use of OpenMP Performance Tools in the SPEC OMP2001 Benchmarks Bernd Mohr 1, Allen D. Malony 2, Rudi Eigenmann 3 1 Forschungszentrum.
Allen D. Malony Department of Computer and Information Science Performance Research Laboratory University of Oregon Performance Technology.
Allen D. Malony, Sameer Shende Department of Computer and Information Science Computational Science Institute University.
The TAU Performance System: Advances in Performance Mapping Sameer Shende University of Oregon.
Workshop on Performance Tools for Petascale Computing 9:30 – 10:30am, Tuesday, July 17, 2007, Snowbird, UT Sameer S. Shende
TAU Performance System Alan Morris, Sameer Shende, Allen D. Malony University of Oregon {amorris, sameer,
Performance Tools BOF, SC’07 5:30pm – 7pm, Tuesday, A9 Sameer S. Shende Performance Research Laboratory University.
Early Experiences with KTAU on the IBM Blue Gene / L A. Nataraj, A. Malony, A. Morris, S. Shende Performance.
Allen D. Malony Department of Computer and Information Science Computational Science Institute University of Oregon TAU Performance.
Allen D. Malony Department of Computer and Information Science Performance Research Laboratory NeuroInformatics Center University.
Workshop on Performance Tools for Petascale Computing 9:30 – 10:30am, Tuesday, July 17, 2007, Snowbird, UT Sameer S. Shende
Performance Evaluation of S3D using TAU Sameer Shende
TAU: Performance Regression Testing Harness for FLASH Sameer Shende
Scalability Study of S3D using TAU Sameer Shende
Kai Li, Allen D. Malony, Robert Bell, Sameer Shende Department of Computer and Information Science Computational.
The TAU Performance System Sameer Shende, Allen D. Malony, Robert Bell University of Oregon.
Sameer Shende, Allen D. Malony Computer & Information Science Department Computational Science Institute University of Oregon.
Performance Tools for Empirical Autotuning Allen D. Malony, Nick Chaimov, Kevin Huck, Scott Biersdorff, Sameer Shende
Allen D. Malony Performance Research Laboratory (PRL) Neuroinformatics Center (NIC) Department.
Aroon Nataraj, Matthew Sottile, Alan Morris, Allen D. Malony, Sameer Shende { anataraj, matt, amorris, malony,
Allen D. Malony Performance Research Laboratory (PRL) Neuroinformatics Center (NIC) Department.
Score-P – A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir Alexandru Calotoiu German Research School for.
Integrated Performance Views in Charm++: Projections meets TAU Scott Biersdorff Allen D. Malony Department Computer and Information Science University.
A Component Infrastructure for Performance and Power Modeling of Parallel Scientific Applications Boyana Norris Argonne National Laboratory Van Bui, Lois.
Using TAU on SiCortex Alan Morris, Aroon Nataraj Sameer Shende, Allen D. Malony University of Oregon {amorris, anataraj, sameer,
Allen D. Malony Department of Computer and Information Science Performance Research Laboratory University of Oregon Performance Technology.
Profile Analysis with ParaProf Sameer Shende Performance Reseaerch Lab, University of Oregon
Allen D. Malony, Aroon Nataraj Department of Computer and Information Science Performance.
Kevin A. Huck Department of Computer and Information Science Performance Research Laboratory University of.
Dynamic performance measurement control Dynamic event grouping Multiple configurable counters Selective instrumentation Application-Level Performance Access.
ASC Tri-Lab Code Development Tools Workshop Thursday, July 29, 2010 Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA This work.
Early Experiences with KTAU on the Blue Gene / L A. Nataraj, A. Malony, A. Morris, S. Shende Performance Research Lab University of Oregon.
PerfExplorer Component for Performance Data Analysis Kevin Huck – University of Oregon Boyana Norris – Argonne National Lab Li Li – Argonne National Lab.
Allen D. Malony, Sameer S. Shende, Alan Morris, Robert Bell, Kevin Huck, Nick Trebon, Suravee Suthikulpanit, Kai Li, Li Li
Allen D. Malony Performance Research Laboratory (PRL) Neuroinformatics Center (NIC) Department.
Allen D. Malony Department of Computer and Information Science TAU Performance Research Laboratory University of Oregon Discussion:
Allen D. Malony, Sameer S. Shende, Robert Bell Kai Li, Li Li, Kevin Huck Department of Computer.
Shangkar Mayanglambam, Allen D. Malony, Matthew J. Sottile Computer and Information Science Department Performance.
Allen D. Malony Department of Computer and Information Science Performance Research Laboratory University.
Integrated Performance Views in Charm++: Projections meets TAU Scott Biersdorff Allen D. Malony Department Computer and Information Science University.
Aroon Nataraj, Matthew Sottile, Alan Morris, Allen D. Malony, Sameer Shende { anataraj, matt, amorris, malony,
Performane Analyzer Performance Analysis and Visualization of Large-Scale Uintah Simulations Kai Li, Allen D. Malony, Sameer Shende, Robert Bell Performance.
TAU Performance System ® TAU is a profiling and tracing toolkit that supports programs written in C, C++, Fortran, Java, Python,
Allen D. Malony Performance Research Laboratory (PRL) Neuroinformatics Center (NIC) Department.
Aroon Nataraj, Matthew Sottile, Alan Morris, Allen D. Malony, Sameer Shende { anataraj, matt, amorris, malony,
Kai Li, Allen D. Malony, Sameer Shende, Robert Bell
Introduction to the TAU Performance System®
Performance Technology for Scalable Parallel Systems
TAU integration with Score-P
Tutorial Outline – Part 1
Allen D. Malony, Sameer Shende
TAU Parallel Performance System
TAU Parallel Performance System
TAU: A Framework for Parallel Performance Analysis
Outline Introduction Motivation for performance mapping SEAA model
Parallel Program Analysis Framework for the DOE ACTS Toolkit
TAU Performance DataBase Framework (PerfDBF)
Presentation transcript:

Allen D. Malony Department of Computer and Information Science Performance Research Laboratory University of Oregon Parallel Performance Technology for Scientific Application Competitiveness: the TAU Parallel Performance System Project

TAU Parallel Performance SystemASC Booth SC062 Acknowledgements  Dr. Sameer Shende, Senior scientist  Alan Morris, Senior software engineer  Wyatt Spear, Software engineer  Scott Biersdorff, Software engineer  Li Li, Ph.D. student  Kevin Huck, Ph.D. student  Aroon Nataraj, Ph.D. student  Brad Davidson, Systems administrator

TAU Parallel Performance SystemASC Booth SC063 Outline  Overview of features  Instrumentation  Measurement  Analysis tools  Parallel profile analysis (ParaProf)  Performance data management (PerfDMF)  Performance data mining (PerfExplorer)  Application examples  Miranda, Flash, ESMF, sPPM, CFRFS  Kernel monitoring and KTAU  Demos at DOE ASC booth

TAU Parallel Performance SystemASC Booth SC064 TAU Performance System  Tuning and Analysis Utilities (14+ year project effort)  Performance system framework for HPC systems  Integrated, scalable, flexible, and parallel  Targets a general complex system computation model  Entities: nodes / contexts / threads  Multi-level: system / software / parallelism  Measurement and analysis abstraction  Integrated toolkit for performance problem solving  Instrumentation, measurement, analysis, and visualization  Portable performance profiling and tracing facility  Performance data management and data mining  Partners: LLNL, ANL, Research Center Jülich, LANL

TAU Parallel Performance SystemASC Booth SC065 TAU Parallel Performance System Goals  Portable (open source) parallel performance system  Computer system architectures and operating systems  Different programming languages and compilers  Multi-level, multi-language performance instrumentation  Flexible and configurable performance measurement  Support for multiple parallel programming paradigms  Multi-threading, message passing, mixed-mode, hybrid, object oriented (generic), component-based  Support for performance mapping  Integration of leading performance technology  Scalable (very large) parallel performance analysis

TAU Parallel Performance SystemASC Booth SC066 TAU Performance System Architecture

TAU Parallel Performance SystemASC Booth SC067 TAU Performance System Architecture

TAU Parallel Performance SystemASC Booth SC068 TAU Instrumentation Approach  Support for standard program events  Routines, classes and templates  Statement-level blocks  Support for user-defined events  Begin/End events (“user-defined timers”)  Atomic events (e.g., size of memory allocated/freed)  Selection of event statistics  Support definition of “semantic” entities for mapping  Support for event groups (aggregation, selection)  Instrumentation optimization  Eliminate instrumentation in lightweight routines

TAU Parallel Performance SystemASC Booth SC069 TAU Instrumentation Mechanisms  Source code  Manual (TAU API, TAU component API)  Automatic (robust)  C, C++, F77/90/95 (Program Database Toolkit (PDT))  OpenMP (directive rewriting (Opari), POMP2 spec)  Object code  Pre-instrumented libraries (e.g., MPI using PMPI)  Statically-linked and dynamically-linked  Executable code  Dynamic instrumentation (pre-execution) (DynInstAPI)  Virtual machine instrumentation (e.g., Java using JVMPI)  TAU_COMPILER to automate instrumentation process

TAU Parallel Performance SystemASC Booth SC0610 User-level abstractions problem domain source code object codelibraries instrumentation executable runtime image compiler linkerOS VM instrumentation performance data run preprocessor Multi-Level Instrumentation and Mapping  Multiple interfaces  Information sharing  Between interfaces  Event selection  Within/between levels  Mapping  Associate performance data with high-level semantic abstractions

TAU Parallel Performance SystemASC Booth SC0611 TAU Measurement Approach  Portable and scalable parallel profiling solution  Multiple profiling types and options  Event selection and control (enabling/disabling, throttling)  Online profile access and sampling  Online performance profile overhead compensation  Portable and scalable parallel tracing solution  Trace translation to EPILOG, VTF3, and OTF  Trace streams (OTF) and hierarchical trace merging  Robust timing and hardware performance support  Multiple counters (hardware, user-defined, system)  Performance measurement for CCA component software

TAU Parallel Performance SystemASC Booth SC0612 TAU Measurement Mechanisms  Parallel profiling  Function-level, block-level, statement-level  Supports user-defined events and mapping events  TAU parallel profile stored (dumped) during execution  Support for flat, callgraph/callpath, phase profiling  Support for memory profiling (headroom, leaks)  Tracing  All profile-level events  Inter-process communication events  Inclusion of multiple counter data in traced events

TAU Parallel Performance SystemASC Booth SC0613 Types of Parallel Performance Profiling  Flat profiles  Metric (e.g., time) spent in an event (callgraph nodes)  Exclusive/inclusive, # of calls, child calls  Callpath profiles (Calldepth profiles)  Time spent along a calling path (edges in callgraph)  “main=> f1 => f2 => MPI_Send” (event name)  TAU_CALLPATH_LENGTH environment variable  Phase profiles  Flat profiles under a phase (nested phases are allowed)  Default “main” phase  Supports static or dynamic (per-iteration) phases

TAU Parallel Performance SystemASC Booth SC0614 Performance Analysis and Visualization  Analysis of parallel profile and trace measurement  Parallel profile analysis  ParaProf: parallel profile analysis and presentation  ParaVis: parallel performance visualization package  Profile generation from trace data (tau2pprof)  Performance data management framework (PerfDMF)  Parallel trace analysis  Translation to VTF (V3.0), EPILOG, OTF formats  Integration with VNG (Technical University of Dresden)  Online parallel analysis and visualization  Integration with CUBE browser (KOJAK, UTK, FZJ)

TAU Parallel Performance SystemASC Booth SC0615 ParaProf Parallel Performance Profile Analysis HPMToolkit MpiP TAU Raw files PerfDMF managed (database) Metadata Application Experiment Trial

TAU Parallel Performance SystemASC Booth SC0616 ParaProf – Flat Profile (Miranda, BG/L) 8K processors node, context, thread Miranda  hydrodynamics  Fortran + MPI  LLNL Run to 64K

TAU Parallel Performance SystemASC Booth SC0617 ParaProf – Stacked View (Miranda)

TAU Parallel Performance SystemASC Booth SC0618 ParaProf – Callpath Profile (Flash) Flash  thermonuclear flashes  Fortran + MPI  Argonne

TAU Parallel Performance SystemASC Booth SC0619 ParaProf – Scalable Histogram View (Miranda) 8k processors 16k processors

TAU Parallel Performance SystemASC Booth SC0620 ParaProf – 3D Full Profile (Miranda) 16k processors

TAU Parallel Performance SystemASC Booth SC0621 ParaProf – 3D Full Profile (Flash) 128 processors

TAU Parallel Performance SystemASC Booth SC0622 ParaProf Bar Plot (Zoom in/out +/-)

TAU Parallel Performance SystemASC Booth SC0623 ParaProf – 3D Scatterplot (Miranda)  Each point is a “thread” of execution  A total of four metrics shown in relation  ParaVis 3D profile visualization library  JOGL

TAU Parallel Performance SystemASC Booth SC0624 Component-Based Scientific Applications  How to support performance analysis and tuning process consistent with application development methodology?  Common Component Architecture (CCA) applications  Performance tools should integrate with software  Design performance observation component  Measurement port and measurement interfaces  Build support for application component instrumentation  Interpose a proxy component for each port  Inside the proxy, track caller/callee invocations, timings  Automate the process of proxy component creation  using PDT for static analysis of components  include support for selective instrumentation

TAU Parallel Performance SystemASC Booth SC0625 Flame Reaction-Diffusion (Sandia) CCAFFEINE

TAU Parallel Performance SystemASC Booth SC0626 Earth Systems Modeling Framework  Coupled modeling with modular software framework  Instrumentation for ESMF framework and applications  PDT automatic instrumentation  Fortran 95 code modules  C / C++ code modules  MPI wrapper library for MPI calls  ESMF component instrumentation (using CCA)  CCA measurement port manual instrumentation  Proxy generation using PDT and runtime interposition  Significant callpath profiling used by ESMF team

TAU Parallel Performance SystemASC Booth SC0627 Using TAU Component in ESMF/CCA

TAU Parallel Performance SystemASC Booth SC0628 TAU Traces with Hardware Counters (ESMF)

TAU Parallel Performance SystemASC Booth SC0629 Performance Data Management (PerfDMF) K. Huck, A. Malony, R. Bell, A. Morris, “Design and Implementation of a Parallel Performance Data Management Framework,” ICPP 2005.

TAU Parallel Performance SystemASC Booth SC0630 Performance Data Mining (PerfExplorer) K. Huck and A. Malony, “PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing,” SC 2005, Thursday, 11:30, Room

TAU Parallel Performance SystemASC Booth SC0631 PerfExplorer Analysis Methods  Data summaries, distributions, scatterplots  Clustering  k-means  Hierarchical  Correlation analysis  Dimension reduction  PCA  Random linear projection  Thresholds  Comparative analysis  Data management views

TAU Parallel Performance SystemASC Booth SC0632 Correlation Analysis (Flash)  Describes strength and direction of a linear relationship between two variables (events) in the data

TAU Parallel Performance SystemASC Booth SC0633 Flash Clustering on 16K BG/L Processors  Four significant events automatically selected  Clusters and correlations are visible

TAU Parallel Performance SystemASC Booth SC0634 ZeptoOS and TAU  DOE OS/RTS for Extreme Scale Scientific Computation  ZeptoOS  scalable components for petascale architectures  Argonne National Laboratory and University of Oregon  University of Oregon  Kernel-level performance monitoring  OS component performance assessment and tuning  KTAU (Kernel Tuning and Analysis Utilities)  integration of TAU infrastructure in Linux kernel  integration with ZeptoOS  installation on BG/L  Argonne booth demo/talk: T/W/Th 3:30-4:00 pm

TAU Parallel Performance SystemASC Booth SC0635 Linux Kernel Profiling using TAU – Goals  Fine-grained kernel-level performance measurement  Parallel applications  Support both profiling and tracing  Both process-centric and system-wide view  Merge user-space performance with kernel-space  User-space: (TAU) profile/trace  Kernel-space: (KTAU) profile/trace  Detailed program-OS interaction data  Including interrupts (IRQ)  Analysis and visualization compatible with TAU

TAU Parallel Performance SystemASC Booth SC0636 KTAU System Architecture A. Nataraj, A. Malony, S. Shende, and A. Morris, “Kernel-level Measurement for Integrated Performance Views: the KTAU Project,” Cluster 2006, distinguished paper.

TAU Parallel Performance SystemASC Booth SC0637 Project Affiliations (selected)  Lawrence Livermore National Lab  Hydrodynamics (Miranda), radiation diffusion (KULL)  Open Trace Format (OTF) implementation on BG/L  Argonne National Lab  ZeptoOS project and KTAU  Astrophysical thermonuclear flashes (Flash)  Center for Simulation of Accidental Fires and Explosion  University of Utah, ASCI ASAP Center, C-SAFE  Uintah Computational Framework (UCF)  Oak Ridge National Lab  Contribution to the Joule Report (S3D, AORSA3D)

TAU Parallel Performance SystemASC Booth SC0638 Project Affiliations (continued)  Sandia National Lab  Simulation of turbulent reactive flows (S3D)  Combustion code (CFRFS)  Los Alamos National Lab  Monte Carlo transport (MCNP)  SAIC’s Adaptive Grid Eulerian (SAGE)  perflib integration (Jeff Brown)  CCSM / ESMF / WRF climate/earth/weather simulation  NSF, NOAA, DOE, NASA, …  Common component architecture (CCA) integration  Performance Engineering Research Institute (PERI)

TAU Parallel Performance SystemASC Booth SC0639 TAU Performance System Status  Computing platforms  IBM, SGI, Cray, HP, Sun, Hitachi, NEC, Linux clusters, Apple, Windows, …  Programming languages  C, C++, Fortran 90/95, UPC, HPF, Java, OpenMP, Python  Thread libraries  pthreads, SGI sproc, Java,Windows, OpenMP  Communications libraries  MPI-1/2, PVM, shmem, …  Compilers  IBM, Intel, PGI, GNU, Fujitsu, Sun, NAG, Microsoft, SGI, Cray, HP, NEC, Absoft, Lahey, PathScale, Open64

TAU Parallel Performance SystemASC Booth SC0640 Support Acknowledgements  Department of Energy (DOE)  Office of Science  MICS, Argonne National Lab  ASC/NNSA  University of Utah ASC/NNSA Level 1  ASC/NNSA, Lawrence Livermore National Lab  Department of Defense (DoD)  HPC Modernization Office (HPCMO)  Programming Environment and Training (PET)  NSF Software and Tools for High-End Computing  Research Centre Juelich  Los Alamos National Laboratory  ParaTools, Inc.