Towards Scalable Cross-Platform Application Performance Analysis -- Tool Goals and Progress
Shirley Moore
LACSI Symposium, Santa Fe, NM, October 18, 2001


Slide 2: Scalability Issues
– Code instrumentation
  – Hand instrumentation too tedious for large codes
– Runtime control of data collection
– Batch queueing systems
  – Cause problems for interactive tools
– Tracefile size and complexity
– Data analysis

Slide 3: Cross-Platform Issues
– Goal: similar user interfaces across different platforms
– Tools necessarily rely on platform-dependent substrates, e.g., for accessing hardware counters
– Standardization of interfaces and data formats promotes interoperability and allows design of portable tools

Slide 4: Where Is Standardization Needed?
– Performance data
  – Trace records vs. summary statistics
  – Data format
  – Data semantics
– Library interfaces
  – Access to hardware counters
  – Statistical profiling
  – Dynamic instrumentation

Slide 5: Where Is Standardization Needed? (cont.)
– User interfaces
  – Common set of commands
  – Common functionality
– Timing routines
– Memory utilization information

Slide 6: Parallel Tools Consortium
– Interaction between vendors, researchers, and users
– Venue for standardization
– Current projects
  – PAPI
  – DPCL

Slide 7: Hardware Counters
– A small set of registers that count events, which are occurrences of specific signals related to the processor's function
– Monitoring these events facilitates correlation between the structure of the source/object code and the efficiency of that code's mapping onto the underlying architecture

Slide 8: Goals of PAPI
– A solid foundation for cross-platform performance analysis tools
– Free tool developers from re-implementing counter access
– Standardization among vendors, academics, and users
– Encourage vendors to provide hardware and OS support for counter access
– Reference implementations for a number of HPC architectures
– Well documented and easy to use

Slide 9: PAPI Implementation (layer diagram)
– Portable layer: the PAPI high-level and low-level APIs, on which tools are built
– Machine-specific layer: the PAPI machine-dependent substrate, the operating system kernel extension, and the hardware performance counters

Slide 10: PAPI Preset Events
– Proposed standard set of events deemed most relevant for application performance tuning
– Defined in papiStdEventDefs.h
– Mapped to native events on a given platform
  – Run tests/avail to see the list of PAPI preset events available on a platform

Slide 11: Statistical Profiling
– PAPI provides support for execution profiling based on any counter event
– PAPI_profil() creates a histogram, by text address, of overflow counts for a specified region of the application code
– Used in the vprof tool from Sandia National Laboratories

Slide 12: PAPI Reference Implementations
– Linux/x86, Windows 2000
  – Requires a patch to the Linux kernel; a driver for Windows
– Linux/IA-64
– Sun Solaris 2.8/Ultra I/II
– IBM AIX 4.3+/POWER
  – Contact IBM for pmtoolkit
– SGI IRIX/MIPS
– Compaq Tru64/Alpha EV6 and EV67
  – Requires an OS device driver patch from Compaq
  – Per-thread and per-process counts not possible
  – Extremely limited number of events
– Cray T3E/Unicos

Slide 13: PAPI Future Work
– Improve accuracy of hardware counter and statistical profiling data
  – Microbenchmarks to measure accuracy (Pat Teller, UTEP)
  – Use hardware support for overflow interrupts
  – Use Event Address Registers (EARs) where available
– Data-structure-based performance counters (collaboration with UMd)
  – Qualify event counting by address range
  – Page-level counters in cache coherence hardware

Slide 14: PAPI Future Work (cont.)
– Memory utilization extensions (the following list suggested by Jack Horner, LANL)
  – Memory available on a node
  – Total memory available/used
  – High-water-mark memory used by a process/thread
  – Disk swapping by a process
  – Process-memory locality
  – Location of memory used by an object
– Dynamic instrumentation, e.g., PAPI probe modules

Slide 15: For More Information
– Software and documentation
– Reference materials
– Papers and presentations
– Third-party tools
– Mailing lists

Slide 16: DPCL
– Dynamic Probe Class Library
– Built on top of the IBM version of the University of Maryland's dyninst
– Current platforms
  – IBM AIX
  – Linux/x86 (limited functionality)
– Dyninst has been ported to more platforms, but by itself it lacks functionality for easily instrumenting parallel applications

Slide 17: Infrastructure Components?
– Parsers for common languages
– Access to hardware counter data
– Communication behavior instrumentation and analysis
– Dynamic instrumentation capability
– Runtime control of data collection and analysis
– Performance data management

Slide 18: Case Studies
– Test tools on large-scale applications in a production environment
– Reveal limitations of tools and point out areas where improvements are needed
– Develop performance tuning methodologies for large-scale codes

Slide 19: PERC: Performance Evaluation Research Center
– Developing a science for understanding the performance of scientific applications on high-end computer systems
– Developing engineering strategies for improving performance on these systems
– DOE labs: ANL, LBNL, LLNL, ORNL
– Universities: UCSD, UIUC, UMD, UTK
– Funded by SciDAC: Scientific Discovery through Advanced Computing

Slide 20: PERC: Real-World Applications
– High Energy and Nuclear Physics
  – Shedding New Light on Exploding Stars: Terascale Simulations of Neutrino-Driven Supernovae and Their Nucleosynthesis
  – Advanced Computing for 21st Century Accelerator Science and Technology
– Biology and Environmental Research
  – Collaborative Design and Development of the Community Climate System Model for Terascale Computers
– Fusion Energy Sciences
  – Numerical Computation of Wave-Plasma Interactions in Multi-dimensional Systems
– Advanced Scientific Computing
  – Terascale Optimal PDE Solvers (TOPS)
  – Applied Partial Differential Equations Center (APDEC)
  – Scientific Data Management (SDM)
– Chemical Sciences
  – Accurate Properties for Open-Shell States of Large Molecules
– ...and more

Slide 21: Parallel Climate Transition Model
– Components for ocean, atmosphere, sea ice, land surface, and river transport
– Developed by Warren Washington's group at NCAR
– POP: Parallel Ocean Program, from LANL
– CCM3: Community Climate Model 3.2, from NCAR, including LSM: Land Surface Model
– ICE: CICE from LANL and CCSM from NCAR
– RTM: River Transport Module, from UT Austin
– Fortran 90 with MPI

Slide 22: PCTM: Parallel Climate Transition Model (component diagram)
– Sequential execution of parallelized modules: flux coupler, land surface model, ocean model, atmosphere model, sea ice model, river model

Slide 23: PCTM Instrumentation
– Vampir tracefile in the tens-of-gigabytes range even for a toy problem
– Hand instrumentation with PAPI is tedious
– UIUC working on SvPablo instrumentation
– Must work in a batch queueing environment
– Plan to try other tools
  – MPE logging and Jumpshot
  – TAU
  – VGV?

Slide 24: In Progress
– Standardization and reference implementations for memory utilization information (funded by DoD HPCMP PET; a Ptools-sponsored project)
– Repositories of application performance evaluation case studies (e.g., SciDAC PERC)
– Portable dynamic instrumentation for parallel applications (DOE MICS project: UTK, UMd, UWisc)
– Increased functionality and accuracy of hardware counter data collection (DoD HPCMP, DOE MICS)

Slide 25: Next Steps
– Additional areas for standardization?
  – Scalable trace file format
  – Metadata standards for performance data
  – New hardware counter metrics (e.g., SMP and DMP events, data-centric counters)
  – Others?

Slide 26: Next Steps (cont.)
– Sharing of tools and data
  – Open source software
  – Machine and software profiles
  – Runtime performance data
  – Benchmark results
  – Application examples and case studies
– Long-term goal: a common performance tool infrastructure across HPC systems