Challenges in Performance Evaluation and Improvement of Scientific Codes
Boyana Norris, Argonne National Laboratory
Ivana Veljkovic, Pennsylvania State University
SIAM CSE, February 13, 2005

Outline
- Performance evaluation challenges
- Component-based approach
- Motivating example: adaptive linear system solution
- A component infrastructure for performance monitoring and adaptation of applications
- Summary and future work

Acknowledgments
- Ivana Veljkovic, Padma Raghavan (Penn State)
- Sanjukta Bhowmick (ANL/Columbia)
- Lois Curfman McInnes (ANL)
- TAU developers (U. Oregon)
- PERC members
- Sponsors: DOE and NSF

Challenges in performance evaluation
- Many tools for performance data gathering and analysis: PAPI, TAU, SvPablo, KOJAK, …
- Various interfaces, levels of automation, and approaches to information presentation
- User's point of view:
  - What do the different tools do? Which is most appropriate for a given application?
  - (How) can multiple tools be used in concert?
  - I have tons of performance data, now what?
  - What automatic tuning tools are available, and what exactly do they do?
  - How hard is it to install/learn/use tool X?
  - Is instrumented code portable? What is the overhead of instrumentation?
  - How does code evolution affect the performance analysis process?

Incomplete list of tools
- Source instrumentation: TAU/PDT, KOJAK (MPI/OpenMP), SvPablo, Performance Assertions, …
- Binary instrumentation: HPCToolkit, Paradyn, DyninstAPI, …
- Performance monitoring: MetaSim Tracer (memory), PAPI (see the sketch below), HPCToolkit, Sigma++ (memory), DPOMP (OpenMP), mpiP, gprof, psrun, …
- Modeling/analysis/prediction: MetaSim Convolver (memory), DIMEMAS (network), SvPablo (scalability), Paradyn, Sigma++, …
- Source/binary optimization: Automated Empirical Optimization of Software (ATLAS), OSKI, ROSE
- Runtime adaptation: ActiveHarmony, SALSA
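As a concrete example of the performance-monitoring category, the minimal sketch below counts hardware events around a computational kernel using PAPI's low-level C API. The kernel and the choice of events (total cycles, floating-point operations) are illustrative assumptions; event availability is platform-dependent, and error handling is largely omitted.

    /* Minimal PAPI sketch (illustrative): count hardware events around a kernel. */
    #include <stdio.h>
    #include <papi.h>

    /* Stand-in for the code region being measured (assumed for illustration). */
    static void compute_kernel(void) {
        volatile double s = 0.0;
        for (int i = 0; i < 1000000; ++i) s += (double)i * 1.0e-6;
    }

    int main(void) {
        int eventset = PAPI_NULL;
        long long counts[2];

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            return 1;                                /* PAPI failed to initialize */
        PAPI_create_eventset(&eventset);
        PAPI_add_event(eventset, PAPI_TOT_CYC);      /* total cycles */
        PAPI_add_event(eventset, PAPI_FP_OPS);       /* floating-point operations */

        PAPI_start(eventset);
        compute_kernel();
        PAPI_stop(eventset, counts);

        printf("cycles = %lld, fp ops = %lld\n", counts[0], counts[1]);
        return 0;
    }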

Challenges (where is the complexity?)
More effective use calls for integration.
- Tool developer's perspective:
  - Overhead of initially implementing one-to-one interoperability
  - Managing dependencies on other tools
  - Maintaining interoperability as different tools evolve
- Individual scientist's perspective:
  - Learning curve for performance tools leaves less time to focus on one's own research (modeling, physics, mathematics)
  - Potentially significant time investment needed to find out whether/how using someone else's tool would improve performance, so scientists tend to do their own hand-coded optimizations (time-consuming, non-reusable)
  - Lack of tools that automate (at least partially) algorithm discovery, assembly, and configuration, and that enable runtime adaptivity

What can be done
How to manage complexity? Provide:
- Performance tools that are truly interoperable
- Uniform, easy access to tools
- Component implementations of software, especially supporting numerical codes such as linear algebra algorithms
- New algorithms (e.g., interactive/dynamic techniques, algorithm composition)
Implementation approach: components, both for tools and for the application software.

What is being done
- No "integrated" environment for performance monitoring, analysis, and optimization
- Most past efforts: one-to-one tool interoperability
- More recently:
  - OSPAT (initial meeting at SC'04), focused on common data representation and interfaces
  - Tool-independent performance databases: PerfDMF
  - Eclipse parallel tools project (LANL)
  - …

OSPAT
The following areas were recommended for OSPAT to investigate:
- A common instrumentation API for source-level, compiler-level, library-level, and binary instrumentation
- A common probe interface for routine entry and exit events (a hypothetical sketch follows below)
- A common profile database schema
- An API to walk the call stack and examine heap memory
- A common API for thread creation and the fork interface
- Visualization components for drawing histograms and the hierarchical displays typically used by performance tools
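To make the idea of a common probe interface for routine entry/exit events concrete, here is a hypothetical sketch of what such an interface might look like: an instrumentor inserts enter/exit calls, and any tool can register handlers for them. The names (perf_probe_register, perf_probe_enter, perf_probe_exit) are invented for illustration and do not correspond to an actual OSPAT specification.

    /* Hypothetical common probe interface sketch (names are illustrative). */
    #include <stdio.h>

    typedef void (*probe_handler_t)(const char *routine, void *tool_data);

    static probe_handler_t on_entry, on_exit;
    static void *tool_ctx;

    /* A performance tool registers its entry/exit handlers once. */
    void perf_probe_register(probe_handler_t entry, probe_handler_t exit_, void *ctx) {
        on_entry = entry;  on_exit = exit_;  tool_ctx = ctx;
    }

    /* Source, compiler, library, or binary instrumentation would emit these calls. */
    void perf_probe_enter(const char *routine) { if (on_entry) on_entry(routine, tool_ctx); }
    void perf_probe_exit(const char *routine)  { if (on_exit)  on_exit(routine, tool_ctx); }

    /* Example tool: a trivial tracer that prints every event. */
    static void trace(const char *routine, void *ctx) { (void)ctx; printf("event: %s\n", routine); }

    int main(void) {
        perf_probe_register(trace, trace, NULL);
        perf_probe_enter("solve");
        /* ... instrumented routine body ... */
        perf_probe_exit("solve");
        return 0;
    }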

Components
Working definition: a component is a piece of software that can be composed with other components within a framework; composition can be either static (at link time) or dynamic (at run time); a sketch of dynamic composition follows below.
- A "plug-and-play" model for building applications
- For more information: C. Szyperski, Component Software: Beyond Object-Oriented Programming, ACM Press, New York, 1998
Components enable:
- Tool interoperability
- Automation of performance instrumentation/monitoring
- Application adaptivity (automated or user-guided)
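The static vs. dynamic composition distinction can be illustrated with a minimal sketch in which a "framework" loads a component at run time and binds to it through an agreed-upon interface using POSIX dlopen/dlsym. The interface type, library name, and symbol name are assumptions for illustration, not part of any particular component framework (e.g., CCA).

    /* Illustrative sketch of dynamic composition (link with -ldl on Linux).
       The solver interface, library name, and symbol are hypothetical. */
    #include <stdio.h>
    #include <dlfcn.h>

    typedef int (*solver_apply_fn)(int n, const double *b, double *x);

    int main(void) {
        /* Static composition would link the component in directly;
           here the component is chosen and loaded at run time. */
        void *component = dlopen("./libgmres_component.so", RTLD_NOW);
        if (!component) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        solver_apply_fn apply = (solver_apply_fn) dlsym(component, "solver_apply");
        if (!apply) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        double b[4] = {1, 2, 3, 4}, x[4];
        int status = apply(4, b, x);   /* use the component through its interface */
        printf("solver returned %d\n", status);

        dlclose(component);
        return 0;
    }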

Example: component infrastructure for multimethod linear solvers
Goal: provide a framework for
- Performance monitoring of numerical components
- Dynamic adaptivity, based on:
  - Off-line analysis of past performance information
  - Online analysis of current execution performance information
Motivating application examples:
- Driven cavity flow [Coffey et al., 2003], nonlinear PDE solution
- FUN3D: incompressible and compressible Euler equations
Prior work in multimethod linear solvers: McInnes et al. '03, Bhowmick et al. '03 and '05, Norris et al. '05

Example: driven cavity flow
- Linear solver: GMRES(30), varying only the fill level of the ILU preconditioner
- Adaptive heuristic based on: previous linear solution convergence rate, nonlinear solution convergence rate, and rate of increase of linear solution iterations (an illustrative sketch follows below)
- 96x96 mesh, Grashof number = 10^5, lid velocity = 100
- Intel P4 Xeon, dual 2.2 GHz, 4 GB RAM
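Below is a minimal sketch of the kind of heuristic described here: pick the ILU fill level for the next GMRES(30) solve from recent convergence behavior. The thresholds and update rule are invented for illustration; they are not the published heuristic from the cited work. (In PETSc terms, the chosen level would correspond to adjusting the -pc_factor_levels option between solves.)

    /* Illustrative adaptive fill-level heuristic (not the published one). */
    #include <stdio.h>

    typedef struct {
        double linear_conv_rate;     /* convergence rate of the previous linear solve */
        double nonlinear_conv_rate;  /* convergence rate of the outer nonlinear iteration */
        double iter_growth;          /* relative growth in linear iteration counts */
    } solve_history_t;

    /* Return the ILU(k) fill level to use for the next linear solve. */
    int choose_fill_level(int current_k, const solve_history_t *h) {
        /* Linear solves are stalling and iteration counts are climbing:
           invest more in the preconditioner. */
        if (h->linear_conv_rate > 0.9 || h->iter_growth > 0.25)
            return current_k + 1;
        /* Everything converges comfortably: try a cheaper preconditioner. */
        if (h->linear_conv_rate < 0.5 && h->nonlinear_conv_rate < 0.5 && current_k > 0)
            return current_k - 1;
        return current_k;
    }

    int main(void) {
        solve_history_t h = { 0.95, 0.6, 0.3 };   /* made-up measurements */
        int k = choose_fill_level(1, &h);
        printf("next ILU fill level: %d\n", k);
        return 0;
    }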

Example: Compressible PETSc-FUN3D
- Finite volume discretization, variable-order Roe scheme on a tetrahedral, vertex-centered mesh
- Initial discretization: first-order scheme; switch to second-order after the shock position has settled down
- Large sparse linear system solution takes approximately 72% of the overall solution time
- Original FUN3D developer: W. K. Anderson et al., NASA Langley
- Image: Dinesh Kaushik

PETSc-FUN3D, cont.
- A3: nonsequence-based adaptive strategy based on polynomial interpolation [Bhowmick et al., '05]
- A3 vs. base method time: from ~1% slowdown to 32% improvement
- Hand-tuned adaptive vs. base method time: 7% to 42% improvement

Component architecture
[Architecture diagram: components include an Experiment (TAU-based measurement), a Monitor, a Runtime DB, PerfDMF, a Metadata extractor, Checkpoint support, and Off-line analysis; labeled interactions include insert/extract of performance data, start/stop/trigger-checkpoint, query, and adapt requests (adapt: algorithm, parameters). A sketch of the implied monitor-and-adapt loop follows below.]
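The control flow implied by this architecture can be sketched as a simple monitor-and-adapt loop. All of the types and function names below are hypothetical placeholders standing in for the actual measurement, database, monitor, and application components.

    /* Hypothetical monitor-and-adapt loop (all names are placeholders). */
    #include <stdio.h>

    typedef struct { double time_per_iter; int linear_iters; } perf_sample_t;

    static perf_sample_t experiment_measure(int step);        /* measurement (TAU-like) */
    static void database_insert(const perf_sample_t *s);      /* runtime DB / PerfDMF insert */
    static int  monitor_should_adapt(const perf_sample_t *s); /* monitor decision */
    static void application_adapt(int step);                  /* adapt: algorithm, parameters */

    int main(void) {
        for (int step = 0; step < 10; ++step) {
            perf_sample_t s = experiment_measure(step);  /* start/stop around the step */
            database_insert(&s);                         /* keep data for off-line analysis */
            if (monitor_should_adapt(&s))                /* trigger checkpoint + adapt request */
                application_adapt(step);
        }
        return 0;
    }

    /* Trivial stand-ins so the sketch runs. */
    static perf_sample_t experiment_measure(int step) {
        perf_sample_t s = { 0.1 * (step + 1), 10 + 2 * step };
        return s;
    }
    static void database_insert(const perf_sample_t *s) { (void)s; }
    static int  monitor_should_adapt(const perf_sample_t *s) { return s->linear_iters > 20; }
    static void application_adapt(int step) { printf("adapting at step %d\n", step); }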

Future work
Integration of ongoing efforts in:
- Performance tools: common interfaces and data representation (leverage OSPAT, PerfDMF, TAU performance interfaces, and similar efforts)
- Numerical components: emerging common interfaces (e.g., TOPS solver interfaces) increase the choice of solution methods and enable automated composition and adaptation strategies
Long term:
- Is a more organized (but not too restrictive) environment for the scientific software development lifecycle possible/desirable?

Typical application development "cycle"
[Diagram: stages include Design, Implementation, Compilation/Linking, Testing, Deployment, Production Execution, Performance evaluation, and Debugging, supported by external dependencies and version control, configure/make, performance tools, and job/results management.]

Future work
Beyond components:
- Workflow
- Reproducible results: associate all information necessary for reproducing a particular application instance
- An ontology of tools, and tools to guide their selection and use

Summary
- No shortage of performance evaluation, analysis, and optimization technology (and new capabilities are continuously added)
- Little shared infrastructure, limiting the utility of performance technology in scientific computing
- Components, both in performance tools and in numerical software, can be used to manage complexity and enable better performance through dynamic adaptation or multimethod solvers
- A life-cycle environment may be the best long-term solution
Some relevant sites: … (performance tools), … (component specification)