1
Allen D. Malony
malony@cs.uoregon.edu
http://www.cs.uoregon.edu/research/tau
Department of Computer and Information Science
Performance Research Laboratory
University of Oregon
Parallel Performance Technology for Scientific Application Competitiveness: the TAU Parallel Performance System Project
2
TAU Parallel Performance System, ASC Booth, SC06
Acknowledgements
  Dr. Sameer Shende, Senior scientist
  Alan Morris, Senior software engineer
  Wyatt Spear, Software engineer
  Scott Biersdorff, Software engineer
  Li Li, Ph.D. student
  Kevin Huck, Ph.D. student
  Aroon Nataraj, Ph.D. student
  Brad Davidson, Systems administrator
3
Outline
  Overview of features
  Instrumentation
  Measurement
  Analysis tools
    Parallel profile analysis (ParaProf)
    Performance data management (PerfDMF)
    Performance data mining (PerfExplorer)
  Application examples
    Miranda, Flash, ESMF, sPPM, CFRFS
  Kernel monitoring and KTAU
  Demos at DOE ASC booth
4
TAU Performance System
  Tuning and Analysis Utilities (14+ year project effort)
  Performance system framework for HPC systems
    Integrated, scalable, flexible, and parallel
  Targets a general complex system computation model
    Entities: nodes / contexts / threads
    Multi-level: system / software / parallelism
    Measurement and analysis abstraction
  Integrated toolkit for performance problem solving
    Instrumentation, measurement, analysis, and visualization
    Portable performance profiling and tracing facility
    Performance data management and data mining
  Partners: LLNL, ANL, Research Center Jülich, LANL
5
TAU Parallel Performance System Goals
  Portable (open source) parallel performance system
    Computer system architectures and operating systems
    Different programming languages and compilers
  Multi-level, multi-language performance instrumentation
  Flexible and configurable performance measurement
  Support for multiple parallel programming paradigms
    Multi-threading, message passing, mixed-mode, hybrid, object oriented (generic), component-based
  Support for performance mapping
  Integration of leading performance technology
  Scalable (very large) parallel performance analysis
6
TAU Performance System Architecture
7
TAU Performance System Architecture
8
TAU Instrumentation Approach
  Support for standard program events
    Routines, classes and templates
    Statement-level blocks
  Support for user-defined events
    Begin/End events (“user-defined timers”)
    Atomic events (e.g., size of memory allocated/freed)
    Selection of event statistics
  Support definition of “semantic” entities for mapping
  Support for event groups (aggregation, selection)
  Instrumentation optimization
    Eliminate instrumentation in lightweight routines
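The two kinds of user-defined events can be illustrated with a small sketch. This is not the TAU API: the `Timer` and `AtomicEvent` classes, their methods, and the fake clock in `demo` are all hypothetical names chosen for illustration. A begin/end event accumulates elapsed time between a start and a stop, while an atomic event records a value (here, bytes allocated) at a single point and keeps summary statistics.

```python
import time


class Timer:
    """Hypothetical begin/end event (a "user-defined timer"); not the TAU API."""

    def __init__(self, name, clock=time.perf_counter):
        self.name = name
        self.clock = clock
        self.calls = 0
        self.total = 0.0
        self._started = None

    def start(self):
        self._started = self.clock()

    def stop(self):
        if self._started is None:
            raise RuntimeError(f"timer {self.name!r} stopped before start")
        self.total += self.clock() - self._started
        self.calls += 1
        self._started = None


class AtomicEvent:
    """Hypothetical atomic event: records one value at one point in time."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.total = 0.0
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def trigger(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


def demo():
    # A fake clock keeps the example deterministic; real code would use
    # the default time.perf_counter.
    ticks = iter([0.0, 2.0, 5.0, 6.0])
    solve = Timer("solve", clock=lambda: next(ticks))
    alloc = AtomicEvent("bytes allocated")
    for size in (1024, 4096):
        solve.start()
        alloc.trigger(size)
        solve.stop()
    return solve, alloc
```

Calling `demo()` returns a timer with two recorded calls and an atomic event holding the count, minimum, maximum, and mean of the two sizes, which is the sort of event statistic selection the slide refers to.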
9
TAU Instrumentation Mechanisms
  Source code
    Manual (TAU API, TAU component API)
    Automatic (robust)
      C, C++, F77/90/95 (Program Database Toolkit (PDT))
      OpenMP (directive rewriting (Opari), POMP2 spec)
  Object code
    Pre-instrumented libraries (e.g., MPI using PMPI)
    Statically-linked and dynamically-linked
  Executable code
    Dynamic instrumentation (pre-execution) (DynInstAPI)
    Virtual machine instrumentation (e.g., Java using JVMPI)
  TAU_COMPILER to automate instrumentation process
10
Multi-Level Instrumentation and Mapping
  Multiple interfaces
  Information sharing
    Between interfaces
  Event selection
    Within/between levels
  Mapping
    Associate performance data with high-level semantic abstractions
  [Figure labels: user-level abstractions, problem domain, source code, preprocessor, compiler, object code, libraries, linker, executable, OS, runtime image, VM, instrumentation, run, performance data]
11
TAU Measurement Approach
  Portable and scalable parallel profiling solution
    Multiple profiling types and options
    Event selection and control (enabling/disabling, throttling)
    Online profile access and sampling
    Online performance profile overhead compensation
  Portable and scalable parallel tracing solution
    Trace translation to EPILOG, VTF3, and OTF
    Trace streams (OTF) and hierarchical trace merging
  Robust timing and hardware performance support
    Multiple counters (hardware, user-defined, system)
  Performance measurement for CCA component software
12
TAU Measurement Mechanisms
  Parallel profiling
    Function-level, block-level, statement-level
    Supports user-defined events and mapping events
    TAU parallel profile stored (dumped) during execution
    Support for flat, callgraph/callpath, phase profiling
    Support for memory profiling (headroom, leaks)
  Tracing
    All profile-level events
    Inter-process communication events
    Inclusion of multiple counter data in traced events
13
Types of Parallel Performance Profiling
  Flat profiles
    Metric (e.g., time) spent in an event (callgraph nodes)
    Exclusive/inclusive, # of calls, child calls
  Callpath profiles (Calldepth profiles)
    Time spent along a calling path (edges in callgraph)
    “main => f1 => f2 => MPI_Send” (event name)
    TAU_CALLPATH_LENGTH environment variable
  Phase profiles
    Flat profiles under a phase (nested phases are allowed)
    Default “main” phase
    Supports static or dynamic (per-iteration) phases
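The difference between a flat profile and a callpath profile can be sketched from a stream of enter/exit records. This is an illustration only, not how TAU stores its data: the `profile` function, the record layout, and `SAMPLE_EVENTS` are assumptions, and the `callpath_length` parameter merely stands in for the TAU_CALLPATH_LENGTH setting named on the slide. Recursive calls are not handled specially.

```python
from collections import defaultdict

# Hypothetical (kind, name, timestamp) records for one thread.
SAMPLE_EVENTS = [
    ("enter", "main", 0),
    ("enter", "f1", 1),
    ("enter", "f2", 2),
    ("enter", "MPI_Send", 3),
    ("exit", "MPI_Send", 5),
    ("exit", "f2", 6),
    ("exit", "f1", 8),
    ("exit", "main", 10),
]


def profile(events, callpath_length=2):
    """Build a flat profile and a callpath profile from enter/exit records.

    callpath_length limits how many trailing frames make up a callpath
    name; it is an illustrative stand-in for TAU_CALLPATH_LENGTH.
    """
    flat = defaultdict(lambda: {"calls": 0, "inclusive": 0.0, "exclusive": 0.0})
    callpaths = defaultdict(float)
    stack = []  # each frame: [name, enter_time, time_spent_in_children]

    for kind, name, when in events:
        if kind == "enter":
            stack.append([name, when, 0.0])
            continue
        frame_name, entered, child_time = stack.pop()
        if frame_name != name:
            raise ValueError(f"exit of {name!r} does not match {frame_name!r}")
        inclusive = when - entered
        entry = flat[name]
        entry["calls"] += 1
        entry["inclusive"] += inclusive
        entry["exclusive"] += inclusive - child_time
        if stack:
            stack[-1][2] += inclusive  # charge this call to its parent
        path = [frame[0] for frame in stack] + [name]
        callpaths[" => ".join(path[-callpath_length:])] += inclusive
    return dict(flat), dict(callpaths)
```

With `callpath_length=4` the sample yields the event name "main => f1 => f2 => MPI_Send"; with a length of 2 the same call is reported as "f2 => MPI_Send", which shows how a shorter path length merges calling contexts.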
14
Performance Analysis and Visualization
  Analysis of parallel profile and trace measurement
  Parallel profile analysis
    ParaProf: parallel profile analysis and presentation
    ParaVis: parallel performance visualization package
    Profile generation from trace data (tau2pprof)
  Performance data management framework (PerfDMF)
  Parallel trace analysis
    Translation to VTF (V3.0), EPILOG, OTF formats
    Integration with VNG (Technical University of Dresden)
  Online parallel analysis and visualization
  Integration with CUBE browser (KOJAK, UTK, FZJ)
15
ParaProf Parallel Performance Profile Analysis
  [Figure labels: HPMToolkit, MpiP, TAU, raw files, PerfDMF managed (database), metadata, Application, Experiment, Trial]
16
ParaProf – Flat Profile (Miranda, BG/L)
  8K processors
  node, context, thread
  Miranda
    hydrodynamics
    Fortran + MPI
    LLNL
  Run to 64K
17
ParaProf – Stacked View (Miranda)
18
ParaProf – Callpath Profile (Flash)
  Flash
    thermonuclear flashes
    Fortran + MPI
    Argonne
19
ParaProf – Scalable Histogram View (Miranda)
  8k processors
  16k processors
20
ParaProf – 3D Full Profile (Miranda)
  16k processors
21
ParaProf – 3D Full Profile (Flash)
  128 processors
22
ParaProf Bar Plot (Zoom in/out +/-)
23
ParaProf – 3D Scatterplot (Miranda)
  Each point is a “thread” of execution
  A total of four metrics shown in relation
  ParaVis 3D profile visualization library
    JOGL
24
Component-Based Scientific Applications
  How to support a performance analysis and tuning process consistent with the application development methodology?
  Common Component Architecture (CCA) applications
  Performance tools should integrate with software
  Design performance observation component
    Measurement port and measurement interfaces
  Build support for application component instrumentation
    Interpose a proxy component for each port
    Inside the proxy, track caller/callee invocations, timings
    Automate the process of proxy component creation using PDT for static analysis of components
    Include support for selective instrumentation
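The proxy idea above can be sketched in a few lines. This is a generic interposition example, not CCA or TAU code: `PortProxy`, `Integrator`, `new_stats`, and the caller name passed in are all hypothetical. The proxy forwards every method call to the real port unchanged and records, per caller/callee pair, how many invocations happened and how long they took.

```python
import time
from collections import defaultdict


class PortProxy:
    """Hypothetical proxy placed between a caller and a component port.

    Calls are forwarded unchanged; only invocation counts and elapsed
    time per (caller, callee) pair are recorded.
    """

    def __init__(self, port, caller, stats, clock=time.perf_counter):
        self._port = port
        self._caller = caller
        self._stats = stats
        self._clock = clock

    def __getattr__(self, method):
        # Only reached for names not defined on the proxy itself.
        target = getattr(self._port, method)
        if not callable(target):
            return target
        key = (self._caller, f"{type(self._port).__name__}.{method}")

        def wrapper(*args, **kwargs):
            started = self._clock()
            try:
                return target(*args, **kwargs)
            finally:
                record = self._stats[key]
                record["calls"] += 1
                record["seconds"] += self._clock() - started

        return wrapper


class Integrator:
    """Stand-in for an application component that provides one port."""

    def integrate(self, f, lo, hi, steps=1000):
        # Midpoint rule; the numerical method is irrelevant to the proxy.
        width = (hi - lo) / steps
        return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


def new_stats():
    return defaultdict(lambda: {"calls": 0, "seconds": 0.0})
```

A driver would wrap the port once, for example `proxy = PortProxy(Integrator(), "Driver", new_stats())`, and then call `proxy.integrate(...)` exactly as it would call the component. Generating such proxies automatically from a static analysis of the component interfaces is the step the slide attributes to PDT.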
25
Flame Reaction-Diffusion (Sandia)
  CCAFFEINE
26
Earth Systems Modeling Framework
  Coupled modeling with modular software framework
  Instrumentation for ESMF framework and applications
    PDT automatic instrumentation
      Fortran 95 code modules
      C / C++ code modules
    MPI wrapper library for MPI calls
  ESMF component instrumentation (using CCA)
    CCA measurement port manual instrumentation
    Proxy generation using PDT and runtime interposition
  Significant callpath profiling used by ESMF team
27
Using TAU Component in ESMF/CCA
28
TAU Traces with Hardware Counters (ESMF)
29
Performance Data Management (PerfDMF)
  K. Huck, A. Malony, R. Bell, A. Morris, “Design and Implementation of a Parallel Performance Data Management Framework,” ICPP 2005.
30
Performance Data Mining (PerfExplorer)
  K. Huck and A. Malony, “PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing,” SC 2005, Thursday, 11:30, Room 606-607.
31
PerfExplorer Analysis Methods
  Data summaries, distributions, scatterplots
  Clustering
    k-means
    Hierarchical
  Correlation analysis
  Dimension reduction
    PCA
    Random linear projection
    Thresholds
  Comparative analysis
  Data management views
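As an illustration of the clustering method named above, here is a minimal k-means sketch (plain Lloyd's algorithm) applied to per-thread measurements. It is not PerfExplorer's implementation: the `kmeans` function, its deterministic choice of starting centroids, and the `THREADS` sample of (compute seconds, MPI seconds) pairs are assumptions made for the example.

```python
import math

# Hypothetical per-thread data: (seconds in computation, seconds in MPI).
THREADS = [
    (9.0, 1.0), (9.2, 0.8), (8.8, 1.1),  # mostly computing
    (4.0, 6.0), (4.2, 5.9), (3.9, 6.2),  # mostly waiting in MPI
]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm.

    Starting centroids are k points spread evenly through the input
    order, an arbitrary but deterministic choice for this sketch.
    """
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids
```

Running `kmeans(THREADS, 2)` separates the compute-bound threads from the communication-bound ones. At the scale of thousands of threads, grouping by behavior in this way lets an analyst look at a handful of representative clusters instead of every thread.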
32
Correlation Analysis (Flash)
  Describes strength and direction of a linear relationship between two variables (events) in the data
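A standard measure with exactly this meaning is the Pearson correlation coefficient, sketched below. Whether PerfExplorer uses this estimator is an assumption; the `pearson` function and the idea of feeding it per-thread times for two events are illustrative only. Values near +1 or -1 indicate a strong linear relationship (rising together or in opposition), and values near 0 indicate little linear relationship.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sum(a * b for a, b in zip(dx, dy)) / denom
```

In a profile setting, `xs` and `ys` would hold one value per thread for two different events, so a strongly negative result would suggest that threads spending more time in one event spend less in the other.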
33
Flash Clustering on 16K BG/L Processors
  Four significant events automatically selected
  Clusters and correlations are visible
34
ZeptoOS and TAU
  DOE OS/RTS for Extreme Scale Scientific Computation
    ZeptoOS
      scalable components for petascale architectures
      Argonne National Laboratory and University of Oregon
  University of Oregon
    Kernel-level performance monitoring
    OS component performance assessment and tuning
    KTAU (Kernel Tuning and Analysis Utilities)
      integration of TAU infrastructure in Linux kernel
      integration with ZeptoOS
      installation on BG/L
  Argonne booth demo/talk: T/W/Th 3:30-4:00 pm
35
Linux Kernel Profiling using TAU – Goals
  Fine-grained kernel-level performance measurement
    Parallel applications
    Support both profiling and tracing
  Both process-centric and system-wide view
  Merge user-space performance with kernel-space
    User-space: (TAU) profile/trace
    Kernel-space: (KTAU) profile/trace
  Detailed program-OS interaction data
    Including interrupts (IRQ)
  Analysis and visualization compatible with TAU
36
KTAU System Architecture
  A. Nataraj, A. Malony, S. Shende, and A. Morris, “Kernel-level Measurement for Integrated Performance Views: the KTAU Project,” Cluster 2006, distinguished paper.
37
Project Affiliations (selected)
  Lawrence Livermore National Lab
    Hydrodynamics (Miranda), radiation diffusion (KULL)
    Open Trace Format (OTF) implementation on BG/L
  Argonne National Lab
    ZeptoOS project and KTAU
    Astrophysical thermonuclear flashes (Flash)
  Center for Simulation of Accidental Fires and Explosions
    University of Utah, ASCI ASAP Center, C-SAFE
    Uintah Computational Framework (UCF)
  Oak Ridge National Lab
    Contribution to the Joule Report (S3D, AORSA3D)
38
Project Affiliations (continued)
  Sandia National Lab
    Simulation of turbulent reactive flows (S3D)
    Combustion code (CFRFS)
  Los Alamos National Lab
    Monte Carlo transport (MCNP)
    SAIC’s Adaptive Grid Eulerian (SAGE)
    perflib integration (Jeff Brown)
  CCSM / ESMF / WRF climate/earth/weather simulation
    NSF, NOAA, DOE, NASA, …
  Common component architecture (CCA) integration
  Performance Engineering Research Institute (PERI)
39
TAU Performance System Status
  Computing platforms
    IBM, SGI, Cray, HP, Sun, Hitachi, NEC, Linux clusters, Apple, Windows, …
  Programming languages
    C, C++, Fortran 90/95, UPC, HPF, Java, OpenMP, Python
  Thread libraries
    pthreads, SGI sproc, Java, Windows, OpenMP
  Communications libraries
    MPI-1/2, PVM, shmem, …
  Compilers
    IBM, Intel, PGI, GNU, Fujitsu, Sun, NAG, Microsoft, SGI, Cray, HP, NEC, Absoft, Lahey, PathScale, Open64
40
Support Acknowledgements
  Department of Energy (DOE)
    Office of Science
      MICS, Argonne National Lab
    ASC/NNSA
      University of Utah ASC/NNSA Level 1
      ASC/NNSA, Lawrence Livermore National Lab
  Department of Defense (DoD)
    HPC Modernization Office (HPCMO)
    Programming Environment and Training (PET)
  NSF Software and Tools for High-End Computing
  Research Centre Juelich
  Los Alamos National Laboratory
  ParaTools, Inc.