Published by Colin McCoy. Modified over 9 years ago.
SUPER 1
Bob Lucas, University of Southern California, Sept. 23, 2011
Science Pipeline
Allen D. Malony, University of Oregon, May 6, 2014
Support for this work was provided through the Scientific Discovery through Advanced Computing (SciDAC) program funded by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research.
SUPER 2 Fundamental Objectives
SUPER funnels the rich intellectual products born from a history of research and development in performance areas into an effective performance engineering center of mass for the SciDAC program
SUPER pulls from prior investments by ASCR and others the technology and expertise that past efforts produced, especially with respect to methodologies, tools, and integration across performance engineering areas
– measurement, analysis, modeling
– program analysis, optimization and tuning
– resilience
SUPER focuses on integration of expertise for addressing performance engineering problems across the SciDAC landscape, leveraging the robust performance tools available
SUPER 3 Pipeline to Tools/Technology Integration and Application
[Figure: pipeline diagram. Funding sources: DOE funding, other funding. Areas: Performance, Modeling, Reliability, Autotuning, Optimization, Resilience, Energy, Code analysis. Tools / Technologies: TAU, mpiP, PBound, Active Harmony, CHiLL, Roofline, PAPI, PEBIL, PSiNtracer, GPTL, RCRToolkit, ROSE, Orio. Other labels: Center of mass for performance engineering, End-to-end Integration, SciDAC applications.]
SUPER 4 Performance Engineering Tools/Tech Integration SUPER focuses on integrating developed tools and technologies to build enhanced capabilities
SUPER 5 End-to-end Performance Optimization SUPER is establishing processes for applying integrated tools for end-to-end optimization
SUPER 6 Tools and Technologies
Performance
– TAU Performance System, PAPI, mpiP, GPTL
Power / Energy
– PEBIL, PSiNtracer
Autotuning
– Active Harmony, CHiLL, Orio
Resilience and source analysis
Modeling
– PBound, Roofline
Optimization
SUPER 7 TAU Performance System
Tuning and Analysis Utilities (20+ year project)
Performance problem solving framework for HPC
Integrated performance toolkit
– Multi-level performance instrumentation
– Flexible and configurable performance measurement
– Widely-ported performance profiling / tracing system
– Performance data management and data mining
– Open source (BSD-style license)
Broad use in complex software, systems, applications
Long history of funding by DOE, NSF, and DoD
SUPER 8 TAU’s Funding and Development Pipeline
Funding pipeline: 2001 – 2011
[Figure: funding and development timeline; labels as extracted] PRIMA MOGO Vancouver CCA ZeptoOS Source code analysis (PDT) Performance data management (PerfDMF) Automated source instrumentation Modeling and computational QoS Productive Evolution Flexible performance measurements, Performance mapping in software layers Kernel-level measurement Runtime scalable monitoring Knowledge Performance knowledge Performance data mining (PerfExplorer) Measurement infrastructure refactor TAU + Scalasca Score-P Parallel performance visualization Automatic library wrapping Heterogeneous performance Accelerator analysis POINT Glassbox Open source interoperation Performance engineering Cross-layer Integration DOE NSF
SUPER 9 TAU Technologies TAU + Scalasca Score-P ParaProf TAUdb PerfExplorer
SUPER 10 Impact of TAUdb and PerfExplorer TAUdb CUDA OpenCL CHiLL + AH Orio ROSE Geant4 MPAS-O CESM PerfExplorer XGC1
SUPER 11 End-to-End Performance Variability Analysis (CESM)
Use of GPTL (General Purpose Timing Library)
Lightweight profiling to bundle with app
NSF + DOE funding
Couple with platform systems information
TAUdb extended to support this data
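To make "lightweight profiling to bundle with app" concrete, here is a minimal Python sketch of named start/stop timers that accumulate time and call counts per region. It is not the GPTL API: the class `RegionTimers` and the region name `"step"` are hypothetical, chosen only for illustration.

```python
import time
from collections import defaultdict


class RegionTimers:
    """Accumulate wall-clock time and call counts per named region (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock          # injectable clock, so the logic can be checked deterministically
        self._open = {}              # region name -> start timestamp
        self.total = defaultdict(float)
        self.calls = defaultdict(int)

    def start(self, name):
        self._open[name] = self._clock()

    def stop(self, name):
        elapsed = self._clock() - self._open.pop(name)
        self.total[name] += elapsed
        self.calls[name] += 1
        return elapsed


# Time a stand-in "model step" three times.
timers = RegionTimers()
for _ in range(3):
    timers.start("step")
    sum(i * i for i in range(10_000))
    timers.stop("step")
```

Comparing `timers.total` across runs, together with platform information recorded alongside it, is the kind of data a variability analysis would start from.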
SUPER 12 Geant4 Performance Analysis and Tuning
Geant4 is extremely important to the design and execution of HEP experiments
– How to evolve design to best exploit current/future architectures?
– Geant4 HEP and ASCR partnership
Not a standard performance analysis/tuning scenario
– Quantifying performance impact of OO design choices
– Class-based performance analysis
   polymorphism (same function name, many implementations)
   virtual functions (what object types are functions invoked on?)
SUPER 13 Using TAU in Geant4
TAU collects data for Simplified Calorimeter experiment
– Sampling profiles: low-overhead measurements of full-scale experiments
– Instrumentation-based: selectively instrumented classes and functions to collect precise measurements for functions (and whole classes) identified through sampling
Data stored in TAUdb (shared with physics collaborators)
New analysis enabled by TAUdb and PerfExplorer
– Class-based profiles: hardware counters and derived metrics
– Compare impacts of high-level and low-level optimizations
   changing inheritance structure (design) (high)
   performance metrics (cache misses, vectorization, …) (low)
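A class-based profile can be pictured as a roll-up of per-function measurements by the class that owns each function. The sketch below assumes profile entries keyed by `Class::method` names with two metrics each. The function `class_profile`, the metric names, and the sample numbers are all invented for illustration, not TAUdb or PerfExplorer output.

```python
from collections import defaultdict


def class_profile(function_profile):
    """Roll per-function metrics up to their owning class and add a derived metric."""
    by_class = defaultdict(lambda: {"time_us": 0.0, "cache_misses": 0})
    for name, metrics in function_profile.items():
        # "Class::method" -> "Class"; free functions go into a "<global>" bucket.
        owner = name.rsplit("::", 1)[0] if "::" in name else "<global>"
        by_class[owner]["time_us"] += metrics["time_us"]
        by_class[owner]["cache_misses"] += metrics["cache_misses"]
    for totals in by_class.values():
        # Derived metric: cache misses per microsecond spent in the class.
        time_us = totals["time_us"]
        totals["misses_per_us"] = totals["cache_misses"] / time_us if time_us else 0.0
    return dict(by_class)


# Hypothetical measurements, not real Geant4 data.
sample = {
    "Solid::distance": {"time_us": 40.0, "cache_misses": 200},
    "Solid::inside": {"time_us": 10.0, "cache_misses": 50},
    "Tracker::step": {"time_us": 50.0, "cache_misses": 100},
    "main": {"time_us": 5.0, "cache_misses": 5},
}
profile = class_profile(sample)
```

Running the same roll-up before and after a design change (for example, a different inheritance structure) gives a per-class view of its cost.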
SUPER 14 Performance API (PAPI)
PAPI is middleware that provides a consistent interface and methodology for the performance counter hardware in major microprocessors
PAPI enables software engineers to see the relation between software performance and hardware events
PAPI component architecture provides access to a collection of components that expose performance measurement opportunities across the system
– network, I/O system, accelerators, power/energy
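One way to relate software performance to hardware events is to turn raw counter readings into derived metrics. The sketch below does not call PAPI. It assumes counter values have already been collected and stored under PAPI preset event names, and the function `derived_metrics` and the sample readings are hypothetical.

```python
def derived_metrics(counters):
    """Compute two common derived metrics from raw hardware counter values."""
    instructions = counters["PAPI_TOT_INS"]   # total instructions completed
    cycles = counters["PAPI_TOT_CYC"]         # total cycles
    l1_misses = counters["PAPI_L1_DCM"]       # level 1 data cache misses
    return {
        "ipc": instructions / cycles,
        "l1_dcm_per_kilo_ins": 1000.0 * l1_misses / instructions,
    }


# Hypothetical readings for one code region.
sample_counters = {
    "PAPI_TOT_INS": 4_000_000,
    "PAPI_TOT_CYC": 2_000_000,
    "PAPI_L1_DCM": 20_000,
}
metrics = derived_metrics(sample_counters)
```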
SUPER 15 PAPI Pipeline
DOE support
– ASCR (2002-05)
– PERC (2001-06)
– PERI (2006-11)
PAPI is widely available on processors and is heavily used in SUPER across areas
PaRSEC (UTK)
TAU (UO)
PerfSuite (NCSA)
HPCToolkit (Rice)
SCALASCA (FZJ, UTK)
Vampir (TUD)
Open|Speedshop (LLNL)
SvPablo (RENCI)
SUPER 16 Performance Analysis for Communication (mpiP)
Lightweight and scalable profiling tool for MPI applications
DOE funding history
– ASC, PERC, PERI
SUPER is extending mpiP to collect communication topology information for point-to-point and collective communication
– SciDAC application characterization studies
– Benchmarks and applications from DOE-funded Oxbow project
Developing an automated approach for characterizing the communication topology
[Figure labels: LAMMPS, LULESH]
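Communication topology information for point-to-point traffic can be thought of as a rank-by-rank matrix of bytes sent. The following sketch builds such a matrix from a list of `(source, destination, bytes)` records. The record format and the helper names `communication_matrix` and `neighbors` are assumptions for illustration, not mpiP output or API.

```python
def communication_matrix(num_ranks, messages):
    """Sum bytes sent from each source rank to each destination rank."""
    matrix = [[0] * num_ranks for _ in range(num_ranks)]
    for src, dst, nbytes in messages:
        matrix[src][dst] += nbytes
    return matrix


def neighbors(matrix, rank):
    """Ranks that `rank` sent at least one byte to."""
    return sorted(dst for dst, nbytes in enumerate(matrix[rank]) if nbytes > 0)


# Hypothetical trace: a 4-rank ring where each rank sends 1024 bytes to its right neighbor, twice.
ring = [(r, (r + 1) % 4, 1024) for r in range(4)] * 2
matrix = communication_matrix(4, ring)
```

A nearest-neighbor pattern shows up as a band near the diagonal of the matrix, while all-to-all traffic fills it.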
SUPER 17 Analyzing and Modeling Performance and Power
How can we get energy-efficient HPC?
Understand and model how computation and communication affect the overall performance and energy requirements of HPC applications
Use performance and power models to design software and hardware-aware “green” techniques to optimize energy footprint
PEBIL and PSiNtracer (PMaC Labs)
RCRToolkit (RENCI)
SUPER 18 Analysis and Modeling with PEBIL and PSiNstracer
Capture fundamental operations used by the application
– Requires low-level, specific details of application
– Analysis required on large-scale production codes
PEBIL binary instrumentation
– Static analysis (memory, FP counts, op parallelism, …)
– Dynamic (cache hits, execution counts, loop length, …)
PSiNstracer communication characterization
– Profiles all communication routines during a run
Funding heritage
– DOE (ASCR, PERC, PERI)
– DoD, NSF
SUPER 19 RCRToolkit for Runtime Resource Monitoring
Resource Centric Reflection (RCR) Toolkit
– Node-wide performance monitoring and analysis
– Uncore (“outside the core”)
– Access through shared blackboard (RCRblackboard)
Funding pipeline
– DoD ACS MAESTRO and ATPER
– DOE (XGC, XPRESS)
– NSF GENI
Impact
– Adaptive scheduling for power and energy
– Target deterministic strategies for (auto)tuning
– SciDAC end applications amenable to using
SUPER 20 Autotuning Pipeline
SUPER brings several research efforts together to enable the use and integration of automatic tuning methods and tools
– Active Harmony (University of Maryland)
– CHiLL (Utah, USC)
– Orio (Argonne, UO)
Powerful capability for performance engineering
– Parameter exploration automation
– Couple with code transformation techniques
Impact can be significant in improving ability to explore multi-dimensional performance space
SUPER 21 Active Harmony
Active Harmony (AH) is an auto-tuning framework that supports online and offline auto-tuning
– Flexible, plugin-based architecture
How does it work?
– Measures program performance
– Adapts tunable parameters
– Search heuristics explore options
Development funding pipeline
– NSF (1997–2000)
– DOD (1997–2000, 2010–present)
– DOE (ASCR, 2001–2012)
– DOE (SciDAC, 2001–present)
[Figure: Active Harmony loop between the search strategy and the client application, steps 1 to 3. The client FETCHes candidate points and REPORTs evaluated performance.]
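The fetch/report cycle described on this slide can be sketched in a few lines of Python. This is not the Active Harmony API: the class `RandomSearchSession`, the parameter names `tile` and `unroll`, and the synthetic `measure` function (a stand-in for timing the client application) are all hypothetical, and plain random sampling stands in for a real search heuristic.

```python
import random


class RandomSearchSession:
    """Toy tuning session: hands out candidate points and tracks the best report."""

    def __init__(self, space, seed=0):
        self._space = space
        self._rng = random.Random(seed)
        self.best_point = None
        self.best_perf = float("inf")

    def fetch(self):
        # Search strategy: pick each parameter uniformly at random.
        return {name: self._rng.choice(values) for name, values in self._space.items()}

    def report(self, point, perf):
        # Lower is better (think: measured run time).
        if perf < self.best_perf:
            self.best_point, self.best_perf = dict(point), perf


def measure(point):
    # Synthetic cost model standing in for a real measurement of the client application.
    return (point["tile"] - 32) ** 2 + 10 * (point["unroll"] - 4) ** 2 + 1.0


space = {"tile": [8, 16, 32, 64, 128], "unroll": [1, 2, 4, 8]}
session = RandomSearchSession(space, seed=1)
for _ in range(200):
    point = session.fetch()                 # 1. fetch a candidate point
    perf = measure(point)                   # 2. evaluate its performance
    session.report(point, perf)             # 3. report it back to the search
```

Swapping the body of `fetch` for a smarter heuristic leaves the client-side loop unchanged, which is the appeal of keeping the search behind a fetch/report interface.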
SUPER 22 Active Harmony Integration
CHiLL integration
– Plugin used to access AH search methods
– Explores performance space from code generation
TAU integration
– Plugin used within AH to read from / write to TAUdb
– TAU used with CHiLL and AH to capture performance
Application
– Used with MPAS-O (partitioning optimization)
– Developed new auto-tuned FFT (1.8x faster than FFTW)
SUPER 23 CHiLL Autotuning Pipeline
CHiLL autotuning system developed in PERI (Utah)
– Compiler framework for loop transformations
– Integrated into the PERI autotuning framework
– Integrated this in SciDAC with other research at Utah
Funding pipeline
– NSF NGS (2002)
– NSF CSR (2005)
– DOE PERI (2006)
– DOE ASCR XTUNE (2008)
Broadening the autotuning research agenda in SUPER
– Heterogeneous systems
– Other objectives, in particular energy and resilience
SUPER 24 Orio Autotuning Framework
Express any properties of the computation that can possibly be exploited to optimize
Orio approach
– Optimization specifications capture typical optimizations
   – tiling, unrolling, …
   specialized implementations
   – different input sizes
– Transform code based on knowledge
   CUDA, OpenCL, OpenMP, …
– Empirical analysis of variants (different code output)
– Search for best
Orio integration with TAU for empirical autotuning
SUPER impact on PETSc and other libraries
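The "generate variants, measure them, keep the best" idea can be illustrated with a small self-contained Python sketch. It is not Orio and does not use its annotation language. The helper `make_variant` generates Python source for a dot product with a chosen unroll factor, and the factors tried are arbitrary.

```python
import timeit


def make_variant(unroll):
    """Generate and compile a dot product whose main loop is unrolled `unroll` times."""
    body = "\n".join(
        f"        total += a[i + {k}] * b[i + {k}]" for k in range(unroll)
    )
    source = (
        "def dot(a, b):\n"
        "    n = len(a)\n"
        "    total = 0.0\n"
        f"    main = n - n % {unroll}\n"
        f"    for i in range(0, main, {unroll}):\n"
        f"{body}\n"
        "    for i in range(main, n):\n"      # remainder loop for leftover elements
        "        total += a[i] * b[i]\n"
        "    return total\n"
    )
    namespace = {}
    exec(source, namespace)
    return namespace["dot"]


# Small-integer inputs keep the floating-point sums exact, so variants can be compared with ==.
a = [float(i % 7) for i in range(1003)]
b = [float(i % 5) for i in range(1003)]
reference = sum(x * y for x, y in zip(a, b))

timings = {}
for unroll in (1, 2, 4, 8):
    variant = make_variant(unroll)
    assert variant(a, b) == reference        # every variant must compute the same result
    timings[unroll] = min(timeit.repeat(lambda: variant(a, b), number=20, repeat=3))

best_unroll = min(timings, key=timings.get)  # empirical choice; depends on the machine
```

Which factor wins depends on the platform and interpreter, which is why the selection is made by measurement and not fixed in advance.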
SUPER 25 Modeling through Source and Empirical Analysis
Performance bounds give the upper limit in performance that can be expected for a given application on a given system
Different existing approaches:
– Fully automatic (ignores machine information)
– Theoretical peak (based on FP units)
– Fully dynamic (profiling-based, time, overhead)
PBound approach (Argonne)
– Application signatures + architecture bounds
Roofline modeling (LBL)
SUPER 26 PBound
Developed under PERC, PERI, and SUPER
– ROSE-based tool that generates performance bounds from source code (C, C++, Fortran)
– Example: what is the best achievable execution time?
Based on static (source code) analysis
– Produces parameterized closed-form expressions expressing the computational and data load/store requirements of application kernels
Coupled with architectural information
– Produces upper bounds on the performance of the application
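As a worked illustration of combining closed-form operation counts with architectural information, the sketch below uses hand-derived counts for the kernel `y[i] = alpha * x[i] + y[i]` over `n` double-precision elements. It is not PBound output: the functions `axpy_counts` and `time_lower_bound` are hypothetical, the machine numbers are invented, and the bound assumes perfect overlap of computation and memory traffic.

```python
def axpy_counts(n):
    """Closed-form counts for y[i] = alpha * x[i] + y[i], i in [0, n)."""
    return {
        "flops": 2 * n,    # one multiply and one add per element
        "loads": 2 * n,    # x[i] and y[i]
        "stores": n,       # y[i]
    }


def time_lower_bound(counts, peak_flops, bandwidth_bytes, word_bytes=8):
    """Best achievable time: the slower of the compute and memory limits."""
    compute_time = counts["flops"] / peak_flops
    bytes_moved = (counts["loads"] + counts["stores"]) * word_bytes
    memory_time = bytes_moved / bandwidth_bytes
    return max(compute_time, memory_time)


# Hypothetical machine: 10 GFLOP/s peak, 20 GB/s memory bandwidth.
counts = axpy_counts(1_000_000)
bound_seconds = time_lower_bound(counts, peak_flops=10e9, bandwidth_bytes=20e9)
```

Here the memory term dominates (24 MB at 20 GB/s is 1.2 ms, against 0.2 ms of arithmetic), so under these assumptions the kernel is bandwidth-bound.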
SUPER 27 Roofline Modeling
Roofline models characterize architectures and help visualize application performance within the architectural roofline
– Shows the range of possible application performance
– Determines how optimizations affect application performance
Performance space determined by either:
– Static performance models such as those generated by PBound
– Empirical models based upon platform experiments
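In its basic form, the roofline bound is the smaller of peak compute throughput and memory bandwidth multiplied by arithmetic intensity (floating-point operations per byte moved). A minimal sketch follows, with the function name `roofline_gflops` and the machine numbers chosen purely for illustration.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s for a kernel with the given FLOPs-per-byte intensity."""
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


# Hypothetical machine: 100 GFLOP/s peak, 50 GB/s bandwidth (ridge point at 2 FLOPs/byte).
memory_bound = roofline_gflops(0.5, peak_gflops=100.0, bandwidth_gbs=50.0)
compute_bound = roofline_gflops(4.0, peak_gflops=100.0, bandwidth_gbs=50.0)
```

A kernel to the left of the ridge point gains from optimizations that raise its arithmetic intensity, while one to the right is limited by compute throughput.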
SUPER 28 Resilience Pipeline
Express knowledge of application requirements
– Semiconductor Research Corporation (SRC)
– Multiscale Systems (MUSYC) Focused Center Research Program (FCRP)
New grant from ARO
– Transition technology into the ROSE compiler (LLNL)
– Create runtime system based on JPL technology
Additional NSF and SRC funding with Utah
– Automatic derivation of predicates
– Help detect silent errors
Hardware component based on FPGAs
– Use FPGAs as co-processors
– Originally funded by DARPA under the ACS (Adaptive Computing Systems)
Work continues in SUPER
– Collaborating with LLNL’s resilience research team
– Broaden the space of applications and assertions
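The role of predicates in catching silent errors can be shown with a toy example: application-level invariants are checked against program state, and a violated invariant flags a corruption that raised no exception. The predicates here are written by hand, not derived automatically, and the names `violated`, `predicates`, and the state layout are hypothetical.

```python
import math


def violated(predicates, state):
    """Return the names of all predicates that do not hold for `state`."""
    return [name for name, holds in predicates.items() if not holds(state)]


# Hand-written invariants for a hypothetical vector of normalized weights.
predicates = {
    "weights_sum_to_one": lambda s: math.isclose(sum(s["weights"]), 1.0, rel_tol=1e-9),
    "weights_non_negative": lambda s: all(w >= 0.0 for w in s["weights"]),
}

good_state = {"weights": [0.25, 0.25, 0.5]}
corrupted_state = {"weights": [0.25, 0.25, 8.5]}   # simulated silent corruption of one value

good_report = violated(predicates, good_state)
bad_report = violated(predicates, corrupted_state)
```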
SUPER 29 SUPER Science Pipeline Impact and Outcomes
Tools continue to improve and are widely distributed and downloaded
75 papers produced
35 presentations among the institutions
24 students matriculated and/or graduated
4 postdocs
10 internships at DOE national labs