Allen D. Malony Department of Computer and Information Science Performance Research Laboratory.


1 Parallel Performance Technology for Scientific Application Competitiveness: the TAU Parallel Performance System Project
Allen D. Malony
malony@cs.uoregon.edu
http://www.cs.uoregon.edu/research/tau
Department of Computer and Information Science
Performance Research Laboratory
University of Oregon

2 TAU Parallel Performance System, ASC Booth SC06
Acknowledgements
- Dr. Sameer Shende, Senior scientist
- Alan Morris, Senior software engineer
- Wyatt Spear, Software engineer
- Scott Biersdorff, Software engineer
- Li Li, Ph.D. student
- Kevin Huck, Ph.D. student
- Aroon Nataraj, Ph.D. student
- Brad Davidson, Systems administrator

3 Outline
- Overview of features
- Instrumentation
- Measurement
- Analysis tools
  - Parallel profile analysis (ParaProf)
  - Performance data management (PerfDMF)
  - Performance data mining (PerfExplorer)
- Application examples
  - Miranda, Flash, ESMF, sPPM, CFRFS
- Kernel monitoring and KTAU
- Demos at DOE ASC booth

4 TAU Performance System
- Tuning and Analysis Utilities (14+ year project effort)
- Performance system framework for HPC systems
  - Integrated, scalable, flexible, and parallel
- Targets a general complex system computation model
  - Entities: nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance problem solving
  - Instrumentation, measurement, analysis, and visualization
  - Portable performance profiling and tracing facility
  - Performance data management and data mining
- Partners: LLNL, ANL, Research Center Jülich, LANL

5 TAU Parallel Performance System Goals
- Portable (open source) parallel performance system
  - Computer system architectures and operating systems
  - Different programming languages and compilers
- Multi-level, multi-language performance instrumentation
- Flexible and configurable performance measurement
- Support for multiple parallel programming paradigms
  - Multi-threading, message passing, mixed-mode, hybrid, object oriented (generic), component-based
- Support for performance mapping
- Integration of leading performance technology
- Scalable (very large) parallel performance analysis

6 TAU Performance System Architecture

7 TAU Performance System Architecture

8 TAU Instrumentation Approach
- Support for standard program events
  - Routines, classes and templates
  - Statement-level blocks
- Support for user-defined events
  - Begin/End events (“user-defined timers”)
  - Atomic events (e.g., size of memory allocated/freed)
  - Selection of event statistics
- Support definition of “semantic” entities for mapping
- Support for event groups (aggregation, selection)
- Instrumentation optimization
  - Eliminate instrumentation in lightweight routines
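To make the two kinds of user-defined events concrete, here is a minimal Python sketch that assumes nothing about TAU's real API: a begin/end timer accumulates call counts and elapsed time, while an atomic event only summarizes the values it is triggered with (for example, bytes allocated). The class names `UserTimer` and `AtomicEvent` are hypothetical, and the clock is injectable so the behavior can be checked deterministically.

```python
import time
from dataclasses import dataclass


@dataclass
class AtomicEvent:
    """Hypothetical atomic event: keeps summary statistics of triggered values."""
    name: str
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def trigger(self, value: float) -> None:
        # Each trigger is one sample; only the summary is stored, not the sample.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class UserTimer:
    """Hypothetical begin/end ("user-defined") timer: call count and total time."""

    def __init__(self, name: str, clock=time.perf_counter):
        self.name = name
        self.clock = clock
        self.calls = 0
        self.total = 0.0
        self._start = None

    def start(self) -> None:
        self._start = self.clock()

    def stop(self) -> None:
        if self._start is None:
            raise RuntimeError(f"timer {self.name!r} stopped without start")
        self.total += self.clock() - self._start
        self.calls += 1
        self._start = None
```

The timer answers "how long and how often", while the atomic event answers "how large", which is why the two are kept as separate event kinds.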

9 TAU Instrumentation Mechanisms
- Source code
  - Manual (TAU API, TAU component API)
  - Automatic (robust)
    - C, C++, F77/90/95 (Program Database Toolkit (PDT))
    - OpenMP (directive rewriting (Opari), POMP2 spec)
- Object code
  - Pre-instrumented libraries (e.g., MPI using PMPI)
  - Statically-linked and dynamically-linked
- Executable code
  - Dynamic instrumentation (pre-execution) (DyninstAPI)
  - Virtual machine instrumentation (e.g., Java using JVMPI)
- TAU_COMPILER to automate instrumentation process
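The "pre-instrumented libraries" mechanism relies on interposition: the MPI profiling interface lets a wrapper library define `MPI_Send` and forward to the real `PMPI_Send`. The Python sketch below only illustrates that wrap-and-forward pattern, with a hypothetical `send` routine standing in for an MPI call; it is not how TAU's C wrapper library is built.

```python
import functools
import time
from collections import defaultdict

# Per-routine statistics collected by the wrappers: name -> [calls, seconds]
stats = defaultdict(lambda: [0, 0.0])


def interpose(func, name=None, clock=time.perf_counter):
    """Return a wrapper that times each call, then forwards to the real routine."""
    label = name or func.__name__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = clock()
        try:
            # Analogue of calling PMPI_*: the original, uninstrumented routine.
            return func(*args, **kwargs)
        finally:
            entry = stats[label]
            entry[0] += 1
            entry[1] += clock() - start

    return wrapper


# Hypothetical "library" routine standing in for MPI_Send.
def send(buf, dest):
    return len(buf)


# Rebinding the public name plays the role of linking the wrapper library first.
send = interpose(send, name="send")
```

Callers keep using `send` unchanged, which is the point of interposition: no application source has to be modified.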

10 Multi-Level Instrumentation and Mapping
[Figure: instrumentation at multiple levels (user-level abstractions, problem domain, preprocessor, source code, compiler, object code, libraries, linker, executable, runtime image, OS, VM) producing performance data for a run]
- Multiple interfaces
- Information sharing
  - Between interfaces
- Event selection
  - Within/between levels
- Mapping
  - Associate performance data with high-level semantic abstractions

11 TAU Measurement Approach
- Portable and scalable parallel profiling solution
  - Multiple profiling types and options
  - Event selection and control (enabling/disabling, throttling)
  - Online profile access and sampling
  - Online performance profile overhead compensation
- Portable and scalable parallel tracing solution
  - Trace translation to EPILOG, VTF3, and OTF
  - Trace streams (OTF) and hierarchical trace merging
- Robust timing and hardware performance support
- Multiple counters (hardware, user-defined, system)
- Performance measurement for CCA component software
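Throttling can be pictured as a simple rule: once an event has been called very often and each call is very cheap, stop measuring it, since its instrumentation overhead likely outweighs the information it provides. The sketch below applies such a rule; the threshold values and the `EventStats`/`record_call` names are assumptions for illustration, not TAU's configuration.

```python
from dataclasses import dataclass

# Illustrative thresholds (assumed values, not read from any TAU configuration).
MAX_CALLS = 100_000       # calls before an event becomes a throttling candidate
MIN_PER_CALL_US = 10.0    # events cheaper than this per call count as lightweight


@dataclass
class EventStats:
    name: str
    calls: int = 0
    inclusive_us: float = 0.0
    enabled: bool = True


def record_call(ev: EventStats, duration_us: float) -> None:
    """Record one call; disable the event once it is frequent and lightweight."""
    if not ev.enabled:
        return  # throttled events are no longer measured
    ev.calls += 1
    ev.inclusive_us += duration_us
    if ev.calls > MAX_CALLS and ev.inclusive_us / ev.calls < MIN_PER_CALL_US:
        ev.enabled = False
```

Requiring both conditions keeps frequently called but expensive routines measured, while tiny accessor-like routines drop out.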

12 TAU Measurement Mechanisms
- Parallel profiling
  - Function-level, block-level, statement-level
  - Supports user-defined events and mapping events
  - TAU parallel profile stored (dumped) during execution
  - Support for flat, callgraph/callpath, phase profiling
  - Support for memory profiling (headroom, leaks)
- Tracing
  - All profile-level events
  - Inter-process communication events
  - Inclusion of multiple counter data in traced events

13 Types of Parallel Performance Profiling
- Flat profiles
  - Metric (e.g., time) spent in an event (callgraph nodes)
  - Exclusive/inclusive, # of calls, child calls
- Callpath profiles (Calldepth profiles)
  - Time spent along a calling path (edges in callgraph)
  - “main => f1 => f2 => MPI_Send” (event name)
  - TAU_CALLPATH_LENGTH environment variable
- Phase profiles
  - Flat profiles under a phase (nested phases are allowed)
  - Default “main” phase
  - Supports static or dynamic (per-iteration) phases
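The difference between flat and callpath profiles follows from one call stack. In the Python sketch below (not TAU's implementation), each exit charges inclusive time to the routine, subtracts time spent in children to get exclusive time, and records the same inclusive time under a callpath name built from the last `max_depth` frames; `max_depth` is a hypothetical parameter loosely analogous to the TAU_CALLPATH_LENGTH setting named on the slide. Recursion is not specially handled, so a recursive routine's inclusive time would be counted more than once.

```python
from collections import defaultdict


class CallpathProfiler:
    """Sketch of flat + callpath profiling driven by explicit enter/exit calls.

    Timestamps are supplied by the caller so the example is deterministic.
    """

    def __init__(self, max_depth=2):
        self.max_depth = max_depth
        self.stack = []  # frames: [name, start_time, time_spent_in_children]
        self.flat = defaultdict(lambda: {"calls": 0, "incl": 0.0, "excl": 0.0})
        self.paths = defaultdict(lambda: {"calls": 0, "incl": 0.0})

    def enter(self, name, t):
        self.stack.append([name, t, 0.0])

    def exit(self, t):
        name, start, child_time = self.stack.pop()
        incl = t - start
        entry = self.flat[name]
        entry["calls"] += 1
        entry["incl"] += incl
        entry["excl"] += incl - child_time
        # Callpath event name from the last max_depth frames, ending with this one.
        frames = [f[0] for f in self.stack] + [name]
        path = " => ".join(frames[-self.max_depth:])
        self.paths[path]["calls"] += 1
        self.paths[path]["incl"] += incl
        if self.stack:
            self.stack[-1][2] += incl  # the parent's children now include this call
```

With `max_depth=2`, a call chain `main => f1 => f2` is recorded as `f1 => f2`, so the flat profile says how much time `f2` took overall while the callpath profile separates the `f2` calls made from `f1` from those made directly by `main`.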

14 Performance Analysis and Visualization
- Analysis of parallel profile and trace measurement
- Parallel profile analysis
  - ParaProf: parallel profile analysis and presentation
  - ParaVis: parallel performance visualization package
  - Profile generation from trace data (tau2pprof)
- Performance data management framework (PerfDMF)
- Parallel trace analysis
  - Translation to VTF (V3.0), EPILOG, OTF formats
  - Integration with VNG (Technical University of Dresden)
- Online parallel analysis and visualization
- Integration with CUBE browser (KOJAK, UTK, FZJ)
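Profile generation from trace data amounts to replaying each thread's enter/exit records against a per-thread stack. The sketch below assumes a simple hypothetical record format `(thread_id, timestamp, kind, event)`, not tau2pprof's actual input, and produces call counts and inclusive time per event per thread.

```python
from collections import defaultdict


def trace_to_profile(records):
    """Build per-thread flat profiles {thread: {event: [calls, inclusive]}}.

    records: iterable of (thread_id, timestamp, kind, event), where kind is
    "E" (enter) or "X" (exit); assumed time-ordered and properly nested per thread.
    """
    stacks = defaultdict(list)  # thread_id -> stack of (event, enter_time)
    profile = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for tid, ts, kind, event in records:
        if kind == "E":
            stacks[tid].append((event, ts))
        elif kind == "X":
            name, start = stacks[tid].pop()
            if name != event:
                raise ValueError(f"unbalanced trace on thread {tid}: {name} vs {event}")
            entry = profile[tid][event]
            entry[0] += 1
            entry[1] += ts - start
        else:
            raise ValueError(f"unknown record kind {kind!r}")
    return profile
```

Keeping one stack per thread matters because records from different threads interleave in a merged trace.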

15 ParaProf Parallel Performance Profile Analysis
[Figure: ParaProf data sources: HPMToolkit, mpiP, and TAU raw files, plus PerfDMF-managed data (database) organized as Metadata / Application / Experiment / Trial]
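The Application / Experiment / Trial hierarchy can be sketched as nested records with per-trial metadata. This is a hypothetical in-memory model for illustration, not PerfDMF's database schema; `find_trials` is an invented helper showing how metadata supports selecting trials, for example by processor count.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    name: str
    metadata: dict = field(default_factory=dict)  # e.g., processor count, machine
    profile: dict = field(default_factory=dict)   # hypothetical: event -> value


@dataclass
class Experiment:
    name: str
    trials: list = field(default_factory=list)


@dataclass
class Application:
    name: str
    experiments: list = field(default_factory=list)

    def find_trials(self, **criteria):
        """Return trials whose metadata matches every key=value criterion."""
        return [
            t
            for e in self.experiments
            for t in e.trials
            if all(t.metadata.get(k) == v for k, v in criteria.items())
        ]
```

Grouping trials under experiments is what makes comparisons such as scaling studies (the same code at different processor counts) straightforward to select.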

16 ParaProf – Flat Profile (Miranda, BG/L)
[Figure: 8K processors; node, context, thread]
- Miranda
  - Hydrodynamics
  - Fortran + MPI
  - LLNL
  - Run to 64K

17 ParaProf – Stacked View (Miranda)

18 ParaProf – Callpath Profile (Flash)
- Flash
  - Thermonuclear flashes
  - Fortran + MPI
  - Argonne

19 ParaProf – Scalable Histogram View (Miranda)
[Figure panels: 8K processors; 16K processors]
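A histogram view scales to thousands of processors because it bins the per-thread values of one event instead of drawing one bar per thread. A minimal equal-width binning sketch follows; `histogram` is a hypothetical helper, not ParaProf code.

```python
def histogram(values, num_bins=10):
    """Count how many threads fall into each of num_bins equal-width bins.

    Returns (low, high, counts) so a viewer can label the bin edges.
    """
    lo, hi = min(values), max(values)
    counts = [0] * num_bins
    if hi == lo:
        counts[0] = len(values)  # all threads identical: one populated bin
        return lo, hi, counts
    width = (hi - lo) / num_bins
    for v in values:
        # The maximum value would map to index num_bins, so clamp it to the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return lo, hi, counts
```

The output size depends only on `num_bins`, so the view stays readable whether the input has 8K or 16K values.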

20 ParaProf – 3D Full Profile (Miranda)
[Figure: 16K processors]

21 ParaProf – 3D Full Profile (Flash)
[Figure: 128 processors]

22 ParaProf Bar Plot (Zoom in/out +/-)

23 ParaProf – 3D Scatterplot (Miranda)
- Each point is a “thread” of execution
- A total of four metrics shown in relation
- ParaVis 3D profile visualization library
  - JOGL

24 Component-Based Scientific Applications
- How can the performance analysis and tuning process be supported in a way consistent with the application development methodology?
- Common Component Architecture (CCA) applications
- Performance tools should integrate with software
  - Design performance observation component
  - Measurement port and measurement interfaces
- Build support for application component instrumentation
  - Interpose a proxy component for each port
  - Inside the proxy, track caller/callee invocations, timings
  - Automate the process of proxy component creation
    - Using PDT for static analysis of components
    - Include support for selective instrumentation
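Interposing a proxy on a port can be sketched in Python as an object that forwards each method call to the real port while recording invocations and timings per caller/callee pair. `MeasurementProxy` and `SolverPort` are hypothetical names; a real CCA proxy is generated from the port's interface (the slide mentions PDT-based automation) rather than relying on dynamic attribute lookup as here.

```python
import time
from collections import defaultdict


class MeasurementProxy:
    """Hypothetical proxy interposed on a component port.

    Forwards method calls to the real port and records, per
    (caller, "Port.method") pair, the invocation count and elapsed time.
    """

    def __init__(self, port, port_name, caller, clock=time.perf_counter):
        self._port = port
        self._port_name = port_name
        self._caller = caller
        self._clock = clock
        self.records = defaultdict(lambda: [0, 0.0])

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself.
        target = getattr(self._port, attr)
        if not callable(target):
            return target

        def timed(*args, **kwargs):
            start = self._clock()
            try:
                return target(*args, **kwargs)
            finally:
                rec = self.records[(self._caller, f"{self._port_name}.{attr}")]
                rec[0] += 1
                rec[1] += self._clock() - start

        return timed


# Hypothetical component port used only for illustration.
class SolverPort:
    def solve(self, n):
        return 2 * n
```

The caller holds the proxy wherever it would have held the port, so neither the calling nor the called component changes.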

25 Flame Reaction-Diffusion (Sandia)
[Figure: CCAFFEINE]

26 Earth System Modeling Framework
- Coupled modeling with modular software framework
- Instrumentation for ESMF framework and applications
  - PDT automatic instrumentation
    - Fortran 95 code modules
    - C / C++ code modules
  - MPI wrapper library for MPI calls
- ESMF component instrumentation (using CCA)
  - CCA measurement port manual instrumentation
  - Proxy generation using PDT and runtime interposition
- Significant callpath profiling used by ESMF team

27 Using TAU Component in ESMF/CCA

28 TAU Traces with Hardware Counters (ESMF)

29 Performance Data Management (PerfDMF)
K. Huck, A. Malony, R. Bell, A. Morris, “Design and Implementation of a Parallel Performance Data Management Framework,” ICPP 2005.

30 Performance Data Mining (PerfExplorer)
K. Huck and A. Malony, “PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing,” SC 2005, Thursday, 11:30, Room 606-607.

31 PerfExplorer Analysis Methods
- Data summaries, distributions, scatterplots
- Clustering
  - k-means
  - Hierarchical
- Correlation analysis
- Dimension reduction
  - PCA
  - Random linear projection
- Thresholds
- Comparative analysis
- Data management views
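Of the methods listed, k-means clustering is easy to sketch: each thread becomes a feature vector (for example, exclusive time in each selected event), and threads with similar behavior end up in the same cluster. The Python below is a plain textbook k-means with deterministic seeding from the first k points; it is an illustration, not PerfExplorer's implementation.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on per-thread feature vectors.

    Centroids are seeded with the first k points for determinism; a real
    analysis would use a better seeding strategy.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids
```

In a profile setting, two clusters often correspond to groups of threads doing different work, which is the kind of structure the Flash clustering result makes visible.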

32 Correlation Analysis (Flash)
- Describes the strength and direction of a linear relationship between two variables (events) in the data
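The usual measure of the strength and direction of a linear relationship is the Pearson correlation coefficient. Assuming each event contributes one value per thread, a minimal sketch with a hypothetical `pearson` helper is below; values near +1 mean the two events rise together across threads, and values near -1 mean one rises as the other falls.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two events' per-thread values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / (sx * sy)
```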

33 Flash Clustering on 16K BG/L Processors
- Four significant events automatically selected
- Clusters and correlations are visible

34 ZeptoOS and TAU
- DOE OS/RTS for Extreme Scale Scientific Computation
- ZeptoOS
  - Scalable components for petascale architectures
  - Argonne National Laboratory and University of Oregon
- University of Oregon
  - Kernel-level performance monitoring
  - OS component performance assessment and tuning
- KTAU (Kernel Tuning and Analysis Utilities)
  - Integration of TAU infrastructure in Linux kernel
  - Integration with ZeptoOS
  - Installation on BG/L
- Argonne booth demo/talk: T/W/Th 3:30-4:00 pm

35 Linux Kernel Profiling using TAU – Goals
- Fine-grained kernel-level performance measurement
  - Parallel applications
  - Support both profiling and tracing
- Both process-centric and system-wide view
- Merge user-space performance with kernel-space
  - User-space: (TAU) profile/trace
  - Kernel-space: (KTAU) profile/trace
- Detailed program-OS interaction data
  - Including interrupts (IRQ)
- Analysis and visualization compatible with TAU
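One simple way to picture merging user-space and kernel-space views is a time-ordered merge of two event streams. The sketch below assumes both streams are already sorted and share a common clock, and uses hypothetical `(timestamp, name)` records; it is not KTAU's data format or merging mechanism.

```python
import heapq


def merge_views(user_events, kernel_events):
    """Merge time-sorted user-space and kernel-space streams into one timeline.

    Each input event is (timestamp, name); each output event is
    (timestamp, source, name), where source is "user" or "kernel".
    """
    user = ((ts, "user", name) for ts, name in user_events)
    kernel = ((ts, "kernel", name) for ts, name in kernel_events)
    return list(heapq.merge(user, kernel))
```

The merged timeline shows, for example, kernel activity such as an interrupt occurring between two user-level events, which neither view reveals on its own.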

36 KTAU System Architecture
A. Nataraj, A. Malony, S. Shende, and A. Morris, “Kernel-level Measurement for Integrated Performance Views: the KTAU Project,” Cluster 2006, distinguished paper.

37 Project Affiliations (selected)
- Lawrence Livermore National Lab
  - Hydrodynamics (Miranda), radiation diffusion (KULL)
  - Open Trace Format (OTF) implementation on BG/L
- Argonne National Lab
  - ZeptoOS project and KTAU
  - Astrophysical thermonuclear flashes (Flash)
- Center for Simulation of Accidental Fires and Explosions
  - University of Utah, ASCI ASAP Center, C-SAFE
  - Uintah Computational Framework (UCF)
- Oak Ridge National Lab
  - Contribution to the Joule Report (S3D, AORSA3D)

38 Project Affiliations (continued)
- Sandia National Lab
  - Simulation of turbulent reactive flows (S3D)
  - Combustion code (CFRFS)
- Los Alamos National Lab
  - Monte Carlo transport (MCNP)
  - SAIC’s Adaptive Grid Eulerian (SAGE)
  - perflib integration (Jeff Brown)
- CCSM / ESMF / WRF climate/earth/weather simulation
  - NSF, NOAA, DOE, NASA, …
- Common component architecture (CCA) integration
- Performance Engineering Research Institute (PERI)

39 TAU Performance System Status
- Computing platforms
  - IBM, SGI, Cray, HP, Sun, Hitachi, NEC, Linux clusters, Apple, Windows, …
- Programming languages
  - C, C++, Fortran 90/95, UPC, HPF, Java, OpenMP, Python
- Thread libraries
  - pthreads, SGI sproc, Java, Windows, OpenMP
- Communications libraries
  - MPI-1/2, PVM, shmem, …
- Compilers
  - IBM, Intel, PGI, GNU, Fujitsu, Sun, NAG, Microsoft, SGI, Cray, HP, NEC, Absoft, Lahey, PathScale, Open64

40 Support Acknowledgements
- Department of Energy (DOE)
  - Office of Science
    - MICS, Argonne National Lab
  - ASC/NNSA
    - University of Utah ASC/NNSA Level 1
    - ASC/NNSA, Lawrence Livermore National Lab
- Department of Defense (DoD)
  - HPC Modernization Office (HPCMO)
  - Programming Environment and Training (PET)
- NSF Software and Tools for High-End Computing
- Research Centre Juelich
- Los Alamos National Laboratory
- ParaTools, Inc.

