Technologies for Computational Science Boyana Norris Argonne National Laboratory

March 15, 2005

Outline
- Automatic differentiation
  - Applications in optimization
  - How AD works
- Components for scientific computing
- Performance evaluation and modeling
- Bringing it all together

What is automatic differentiation?
Automatic differentiation (AD) is a technology for automatically augmenting computer programs, including arbitrarily complex simulations, with statements that compute derivatives (also known as sensitivities).
(The Computational Differentiation Project at Argonne National Laboratory)

What is it good for?
The need to compute derivatives of complicated simulation codes accurately and efficiently arises regularly in:
- Optimization (finding a minimum)
- Solving nonlinear differential equations
- Sensitivity and uncertainty analysis
- Inverse problems, including data assimilation and parameter identification
AD tools automate the generation of derivative code without precluding the exploitation of high-level knowledge.

Sensitivity Analysis
MM5 (a mesoscale weather model from NCAR and Penn State): impact of perturbations of the initial temperature on temperature throughout the system. Low-amplitude supersonic waves are clearly visible with AD (left) but not with divided-difference approximations of the derivatives (right).

Parameter Tuning
Sea ice model (Todd Arbetter, University of Colorado): ice thickness for the standard (left) and tuned (right) parameter values, with actual observations at two locations indicated.

Optimization Problems
Often we look for the extreme, or optimum, values that a function takes on a given domain. More formally, unconstrained minimization problems are those of the form

$$\min_{x \in \mathbb{R}^n} f(x), \quad f: \mathbb{R}^n \to \mathbb{R}.$$

Note: since a maximum of $f$ is a minimum of $-f$, we need only look for minima.

Newton's Method
Newton's method finds $x$ such that $f(x) = 0$ by iterating

$$x_{k+1} = x_k - [\nabla f(x_k)]^{-1} f(x_k), \quad k = 0, 1, \ldots$$

For optimization we want $\nabla f(x^*) = 0$, so we iterate

$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k), \quad k = 0, 1, \ldots$$
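
A minimal one-dimensional sketch of the two iterations above, written in C with user-supplied derivative functions. The function names and the test problems ($x^2 - 2$ for root finding, $x^4/4 - x$ for minimization) are illustrative assumptions. The optimization variant simply applies the root-finding iteration to $f'(x) = 0$, with $f''$ playing the role of the Hessian:

```c
#include <assert.h>
#include <math.h>

/* Scalar function type used for g, g', f', and f'' below. */
typedef double (*scalar_fn)(double);

/* Newton's method for g(x) = 0: x_{k+1} = x_k - g(x_k) / g'(x_k).
   Stops when the step length drops below tol or after max_iter steps. */
double newton_root(scalar_fn g, scalar_fn dg, double x0, double tol, int max_iter)
{
    assert(max_iter > 0 && tol > 0.0);
    double x = x0;
    for (int k = 0; k < max_iter; ++k) {
        double step = g(x) / dg(x);
        x -= step;
        if (fabs(step) < tol)
            break;
    }
    return x;
}

/* Root finding: g(x) = x^2 - 2, whose positive root is sqrt(2). */
double g_sqrt2(double x)  { return x * x - 2.0; }
double dg_sqrt2(double x) { return 2.0 * x; }

/* Minimization: f(x) = x^4/4 - x. Newton is applied to f'(x) = x^3 - 1,
   with f''(x) = 3x^2 in place of the Hessian; the minimizer is x = 1. */
double f_grad(double x) { return x * x * x - 1.0; }
double f_hess(double x) { return 3.0 * x * x; }
```

For example, `newton_root(f_grad, f_hess, 2.0, 1e-12, 50)` converges to the minimizer x = 1. The sketch needs both derivatives as separate code, which is exactly the code AD can generate.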

Example: Minimum Surface
Objective: find the surface of minimal area that satisfies Dirichlet boundary conditions and is constrained to lie above a solid plate.
[Figures: solution and error]

Example: Minimum Surface (Cont.)
[Figures: solution and error]

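We can compute derivatives via:
- Analytic code
  - By hand
  - Automatic differentiation
- Numerical approximation: finite differencing (FD)

For finite differences, recall the one-sided approximation

$$\frac{\partial f}{\partial x_i}(x) \approx \frac{f(x + h e_i) - f(x)}{h},$$

where $e_i$ is the $i$-th unit vector and $h$ is a small step size.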

Why use AD?
Compared with other methods (numerical differentiation via finite differences, hand coding, etc.), AD offers several advantages:
- Accuracy
- Performance
- Reduced effort
- Algorithm awareness

More accurate derivatives mean faster convergence
Application: modeling transonic flow over an ONERA M6 airplane wing.

Who uses it?
AD has been used successfully in applications including:
- Atmospheric chemistry
- Breast cancer modeling
- Computational fluid dynamics
- Mesoscale climate modeling
- Network Enabled Optimization System (NEOS)
- Semiconductor device modeling
Others include groundwater remediation, multidisciplinary design optimization, reactor engineering, superconductor simulation, multibody simulations, molecular dynamics simulations, power system analysis, water reservoir simulation, and storm modeling.

How AD Works
Every programming language provides a limited number of elementary mathematical functions, e.g., +, -, *, /, sin, cos, and so on. Every function computed by a program can therefore be viewed as a composition of these so-called intrinsic functions. Derivatives of the intrinsic functions are known and can be combined using the chain rule of differential calculus.

A Simple Example (Fortran)
Original program:

    pi = 4.0*atan(1.0)
    x = pi/4.0
    a = sin(x)
    b = cos(x)
    t = a/b

Differentiated program:

    pi = 4.0*atan(1.0)
    x = pi/4.0
    dxdx = 1.0                    ! Initialize "seed matrix"
    a = sin(x)
    dadx = cos(x)*dxdx            ! TL/CR
    b = cos(x)
    dbdx = -sin(x)*dxdx           ! TL/CR
    t = a/b
    dtda = 1.0/b                  ! TL
    dtdb = -a/(b*b)               ! TL
    dtdx = dtda*dadx + dtdb*dbdx  ! CR

Key: CR = chain rule; TL = table lookup

Modes of AD
Forward mode
- The mode used in the simple example
- Propagates derivative vectors, often denoted ∇u or g_u
- The derivative vector ∇u contains the derivatives of u with respect to the independent variables
- Time and storage are proportional to the vector length (number of independent variables)
Reverse (or adjoint) mode
- Propagates adjoints, denoted ū or u_bar
- The adjoint ū contains the derivatives of the dependent variables with respect to u
- Propagation starts with the dependent variables, so the flow of computation must be reversed
- Time is proportional to the adjoint vector length (number of dependent variables)
- Storage is proportional to the number of operations
- Because of this storage requirement, reverse mode is often applied to subprograms

Another Simple Example (C code)
DERIV_val(y): value of program variable y
DERIV_grad(y): derivative object associated with y

Original code:

    y = x1*x2*x3*x4;

Derivative type and generated code:

    typedef struct {
        double value;
        double grad[ad_GRAD_MAX];
    } DERIV_TYPE;

    /* Reverse (or adjoint) mode of AD at the statement level: original value and local partials */
    ad_loc_0 = DERIV_val(x1) * DERIV_val(x2);
    ad_loc_1 = ad_loc_0 * DERIV_val(x3);          /* dy/dx4 */
    ad_loc_2 = ad_loc_1 * DERIV_val(x4);          /* y (original value) */
    ad_adj_0 = ad_loc_0 * DERIV_val(x4);          /* dy/dx3 */
    ad_adj_1 = DERIV_val(x3) * DERIV_val(x4);
    ad_adj_2 = DERIV_val(x1) * ad_adj_1;          /* dy/dx2 */
    ad_adj_3 = DERIV_val(x2) * ad_adj_1;          /* dy/dx1 */
    /* Forward mode of AD: combine the local partials with the incoming derivative vectors */
    ad_axpy_4(DERIV_grad(y), ad_adj_3, DERIV_grad(x1), ad_adj_2, DERIV_grad(x2),
              ad_adj_0, DERIV_grad(x3), ad_loc_1, DERIV_grad(x4));
    DERIV_val(y) = ad_loc_2;
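
To let the generated statements above compile and run on their own, the following C sketch supplies minimal stand-ins for the pieces they depend on. The macro definitions, ad_GRAD_MAX = 4, make_independent, and this version of ad_axpy_4 are assumptions for illustration, not ADIC's actual runtime library. Seeding each x_i with the i-th unit vector makes DERIV_grad(y) the gradient of y = x1*x2*x3*x4:

```c
#include <assert.h>

#define ad_GRAD_MAX 4  /* assumed: one derivative entry per independent variable */

typedef struct {
    double value;
    double grad[ad_GRAD_MAX];
} DERIV_TYPE;

#define DERIV_val(v)  ((v).value)
#define DERIV_grad(v) ((v).grad)

/* Stand-in for the runtime routine: g = a1*g1 + a2*g2 + a3*g3 + a4*g4. */
static void ad_axpy_4(double *g, double a1, const double *g1, double a2, const double *g2,
                      double a3, const double *g3, double a4, const double *g4)
{
    for (int i = 0; i < ad_GRAD_MAX; ++i)
        g[i] = a1 * g1[i] + a2 * g2[i] + a3 * g3[i] + a4 * g4[i];
}

/* Independent variable with value v, seeded with the unit vector e_index. */
DERIV_TYPE make_independent(double v, int index)
{
    assert(index >= 0 && index < ad_GRAD_MAX);
    DERIV_TYPE d;
    d.value = v;
    for (int i = 0; i < ad_GRAD_MAX; ++i)
        d.grad[i] = (i == index) ? 1.0 : 0.0;
    return d;
}

/* Differentiated y = x1*x2*x3*x4, following the generated statements above. */
void product4(DERIV_TYPE *y, DERIV_TYPE x1, DERIV_TYPE x2, DERIV_TYPE x3, DERIV_TYPE x4)
{
    double ad_loc_0 = DERIV_val(x1) * DERIV_val(x2);
    double ad_loc_1 = ad_loc_0 * DERIV_val(x3);   /* dy/dx4 */
    double ad_loc_2 = ad_loc_1 * DERIV_val(x4);   /* y */
    double ad_adj_0 = ad_loc_0 * DERIV_val(x4);   /* dy/dx3 */
    double ad_adj_1 = DERIV_val(x3) * DERIV_val(x4);
    double ad_adj_2 = DERIV_val(x1) * ad_adj_1;   /* dy/dx2 */
    double ad_adj_3 = DERIV_val(x2) * ad_adj_1;   /* dy/dx1 */
    ad_axpy_4(DERIV_grad(*y), ad_adj_3, DERIV_grad(x1), ad_adj_2, DERIV_grad(x2),
              ad_adj_0, DERIV_grad(x3), ad_loc_1, DERIV_grad(x4));
    DERIV_val(*y) = ad_loc_2;
}
```

With x = (1, 2, 3, 4), the result has value 24 and gradient (24, 12, 8, 6). The inputs are passed by value so that y cannot alias them while the gradient is being written.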

The AD Process
[Figure: the application code and control files are processed by the AD tool, which produces code with derivatives. This code is compiled and linked with the user's derivative driver and the AD support libraries to form the derivative program, which computes the derivatives.]

Ways of Implementing AD
Operator overloading
- Uses language features to generate a trace ("tape") of the computation, i.e., an implicit computational graph
- Easy to implement; hard to optimize
- Examples: ADOL-C
Source transformation (ST)
- Relies on compiler technology
- Hard to implement; more powerful
- Examples: ADIFOR, ADIC, ODYSSEE, TAMC

Example AD Tool Architecture (ST)
- The AD engine is isolated from the front ends and back ends via XAIF (XML AD Interface Format)
  - XML representation of the computational graph
  - Unifies the "relevant" Fortran and C constructs
  - Implements abstractions, e.g., the "derivative object"
- Shared "plug-in" differentiation modules

XAIF Representation: Reverse Mode
[Figure]

XAIF: Abstraction of the Program at the "AD Level" (Expression Example)
Only the core structure of the program is reflected in XAIF:
- Control flow
- Variable information for active variables
- Basic blocks: expression DAGs
[Figure: expression DAG for var_1 = var_2 * var_3 + const]

Estimates of Incremental Computational Costs

Hessian Module
The Hessian module can compute H, H*V, V^T*H*V, and W^T*H*V, as well as arbitrary elements of the Hessian (e.g., the diagonal or n predetermined entries). Code generation involves a trade-off between source expansion and speed.
[Figure: Hessian/function ratio]

Techniques for Improving Performance of AD Code
- Exploit sparsity (SparsLinC and/or coloring)
- Exploit parallelism
  - Data: stripmine the derivative computation
  - Task: multithread independent loops
  - Time: break the computation into phases; pipeline derivative computations
- Exploit interface contractions
  - For computations of the form $f(g(x))$, where the intermediate $g$ has few components: compute $dg/dx$ and $df/dg$, then multiply them to form $df/dx$
- Exploit mathematics (e.g., differentiating through linear/nonlinear equation solvers)
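
As a small illustration of exploiting sparsity by coloring, the C sketch below computes a tridiagonal Jacobian with three forward-mode sweeps instead of one per variable: columns whose indices agree mod 3 never share a row, so each sweep recovers at most one nonzero per row. The residual function, problem size, and names are hypothetical; general sparsity patterns need a graph-coloring step to choose the column groups.

```c
#include <assert.h>

#define NVARS 8  /* assumed problem size */

/* Value and directional derivative along one seed direction. */
typedef struct { double val, dot; } dual;

/* Hypothetical residual F_i = x_{i-1} - 2 x_i + x_{i+1} + x_i^2 (neighbors outside
   the range are omitted), evaluated in forward mode; its Jacobian is tridiagonal. */
static void residual(const dual *x, dual *F)
{
    for (int i = 0; i < NVARS; ++i) {
        dual r;
        r.val = -2.0 * x[i].val + x[i].val * x[i].val;
        r.dot = -2.0 * x[i].dot + 2.0 * x[i].val * x[i].dot;
        if (i > 0)         { r.val += x[i - 1].val; r.dot += x[i - 1].dot; }
        if (i < NVARS - 1) { r.val += x[i + 1].val; r.dot += x[i + 1].dot; }
        F[i] = r;
    }
}

/* Jacobian at x0 via 3 colored forward sweeps: color c seeds every column j with j % 3 == c. */
void tridiag_jacobian(const double *x0, double J[NVARS][NVARS])
{
    assert(x0 && J);
    for (int i = 0; i < NVARS; ++i)
        for (int j = 0; j < NVARS; ++j)
            J[i][j] = 0.0;
    for (int color = 0; color < 3; ++color) {
        dual x[NVARS], F[NVARS];
        for (int j = 0; j < NVARS; ++j) {
            x[j].val = x0[j];
            x[j].dot = (j % 3 == color) ? 1.0 : 0.0;
        }
        residual(x, F);
        /* Within the band {i-1, i, i+1} at most one column has this color,
           so F[i].dot is that single Jacobian entry. */
        for (int i = 0; i < NVARS; ++i)
            for (int j = i - 1; j <= i + 1; ++j)
                if (j >= 0 && j < NVARS && j % 3 == color)
                    J[i][j] = F[i].dot;
    }
}
```

The number of sweeps stays at three however large NVARS becomes, which is the point of compressing the seed matrix.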

ANL Tools for AD
ADIFOR, developed in collaboration with Rice University:
- Full support for Fortran 77
- Support for parallelism via MPI and PVM
- Support for sparse Jacobians
ADIC, the first and only compiler-based AD tool for ANSI C:
- Support for the complete ANSI C standard
- Will soon support a large subset of C++
- XAIF specification and differentiation modules (OpenAD project)

AD in Numerical Toolkits
NEOS (Network-Enabled Optimization Server)
- Efficient computation of gradients for large problems where the objective function is partially separable, $f(x) = \sum_{i} f_i(x)$
PETSc (Portable Extensible Toolkit for Scientific Computation) solvers (work in progress)
- The user only needs to provide the sequential "subdomain update" function in F77 or ANSI C
- A differentiated version of the toolkit enables optimization and sensitivity analysis of models based on PETSc

Optimization Solution (PETSc & TAO)
[Figure: application structure. User code (main routine, application initialization, function, gradient, and Hessian evaluation, post-processing), AD-generated code, and PETSc code (linear solvers (SLES) built from KSP and PC, nonlinear solvers (SNES)) work together with TAO, which solves min F(u).]
TAO solver categories shown: unconstrained minimization, bound-constrained optimization, complementarity, and nonlinear least squares. Methods shown include Newton-based methods (line search, trust region), the limited-memory variable metric (LMVM) method, LMVM with bound constraints, conjugate gradient methods (Fletcher-Reeves, Polak-Ribière, Polak-Ribière-Plus), GPCG, interior point, KT, semi-smooth methods, Levenberg-Marquardt (with and without bound constraints), Gauss-Newton, and others.
TAO interfaces to external libraries for parallel vectors, matrices, and linear solvers:
- PETSc (initial interface)
- Trilinos (SNL; capability via ESI, thanks to M. Heroux and A. Williams)
- Global Arrays (PNNL, J. Nieplocha et al.)
- Etc.

Using AD with the Toolkit for Advanced Optimization (TAO)
[Figure: parallel function assembly (global-to-local scatter of ghost values, then local function computation) and parallel Hessian assembly (global-to-local scatter of ghost values, then local Hessian computation). ADIFOR or ADIC, driven by a script file, generates the local Hessian computation from the local function computation; seed matrix initialization is coded manually but can be automated. Legend: PETSc code, user code, AD-generated code.]

Outline
- Automatic differentiation
- Components for scientific computing (CCA: Common Component Architecture)
  - Introduction
  - Example applications
- Performance evaluation and modeling
- Summary

Software development approaches
- Libraries: collections of subroutines
- Object-oriented libraries: collections of classes
- Components
- Architectures
- Unstructured code (everything in main)

Components
Working definition: a component is a piece of software that can be composed with other components within a framework; composition can be either static (at link time) or dynamic (at run time).
- A "plug-and-play" model for building applications
- For more information: C. Szyperski, Component Software: Beyond Object-Oriented Programming, ACM Press, New York, 1998
Components enable:
- Software and tool interoperability
- Automation of performance instrumentation and monitoring
- Application adaptivity (automated or user-guided)

Object-oriented vs component-oriented development
- Component-oriented development can be viewed as augmenting object-oriented development (OOD) with certain policies, e.g., requiring that certain abstract interfaces be implemented
- Components, once compiled, require a special execution environment
- OO techniques are useful for building individual components by relatively small teams; component technologies facilitate sharing code developed by different groups by addressing issues in:
  - Language interoperability, via an interface definition language (IDL)
  - Well-defined abstract interfaces, which enable "plug-and-play"
  - Dynamic composability: components can discover information about their environment (e.g., interface discovery) from the framework and from connected components
- Code can be converted from an object orientation to a component orientation
  - Automatic tools can help with the conversion (ongoing work by C. Rasmussen and M. Sottile, LANL)

Motivating scientific applications
[Figure: applications in astrophysics, molecular structures, aerodynamics, and fusion, built from shared capabilities: discretization, algebraic solvers, parallel I/O, meshes, data redistribution, physics, optimization, derivative computation, diagnostics, steering, visualization, and adaptive solution.]

Motivation: For Application Developers and Users
- You have difficulty managing multiple third-party libraries in your code
- You use (or want to use) more than two languages in your application
- Your code is long-lived, and different pieces evolve at different rates
- You want to be able to swap competing implementations of the same idea and test them without modifying any of your code
- You want to compose your application with others that were not originally designed to be combined

The model for scientific component programming
[Figure: "Industry" vs. "Science", with the CCA as the answer to "?" for science]

CCA Delivers Performance
Local
- No CCA overhead within components
- Small overhead between components
- Small overhead for language interoperability
- Be aware of costs and design with them in mind; the costs are small and easily amortized
Parallel
- No CCA overhead on parallel computing
- Use your favorite parallel programming model
- Supports SPMD and MPMD approaches
Distributed (remote)
- No CCA overhead; performance depends on networks and protocols
- CCA frameworks support OGSA/Grid Services/Web Services and other approaches
[Figures: maximum 0.2% overhead for CCA vs. native C++ code for parallel molecular dynamics on up to 170 CPUs; aggregate time for the linear solver component in an unconstrained minimization problem with PETSc.]

Overhead from Component Invocation
- Invoke a component with different arguments: array, complex, double complex
- Compare with f77 method invocation
- Environment: 500 MHz Pentium III, Linux, GCC
- Components took about 3X longer (roughly 2.8X in each case below)
- Ensure granularity is appropriate!
- Paper by Bernholdt, Elwasif, Kohl, and Epperly

    Function arg type   f77     Component
    Array               80 ns   224 ns
    Complex             75 ns   209 ns
    Double complex      86 ns   241 ns

Language interoperability: what is so hard?
[Figure: pairwise bridges among C, C++, f77, f90, Python, and Java, each using a different mechanism: native calls, cfortran.h, SWIG, JNI, Siloon, Chasm, or platform-dependent conventions.]

SIDL/Babel makes all supported languages peers
[Figure: C, C++, f77, f90, Python, Java]
This is not a lowest-common-denominator solution!

CCA Concepts: Components and Ports
- Components provide or use one or more ports
- Components include some code that interacts with a CCA framework
- Frameworks provide services, such as component instantiation and port connection
[Figure: an Optimization Algorithm component (provides OptimizerPort; uses FunctionPort, GradientPort, and HessianPort) connected to Objective Function, Function Gradient, and Function Hessian components that provide those ports.]
Implementation details:
- CCA components:
  - Inherit from gov.cca.Component
  - Implement the setServices method to register the ports the component will provide and use
  - Implement the ports they provide
  - Use ports on other components by calling the getPort/releasePort methods of the framework Services object
- Ports (interfaces) extend the gov.cca.Port interface
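
The provides/uses pattern can be sketched in plain C with a function-pointer interface and a toy registry. This is only an analogy: the Framework type, add_provides_port, get_port, and the other names are invented here and are not the gov.cca API (which is specified in SIDL and reached through the Services object's getPort/releasePort). The optimizer component depends only on the FunctionPort interface, so any provider of that port could be swapped in:

```c
#include <assert.h>
#include <string.h>

#define MAX_PORTS 8

/* A "FunctionPort" interface: a function pointer plus the provider's private state. */
typedef struct {
    double (*evaluate)(void *self, const double *x, int n);
    void *self;
} FunctionPort;

/* Toy framework: a registry that connects ports by name (hypothetical, not gov.cca). */
typedef struct {
    const char *name[MAX_PORTS];
    FunctionPort *port[MAX_PORTS];
    int count;
} Framework;

/* Provider side: register a port this component provides. */
void add_provides_port(Framework *fw, const char *name, FunctionPort *p)
{
    assert(fw->count < MAX_PORTS);
    fw->name[fw->count] = name;
    fw->port[fw->count] = p;
    fw->count++;
}

/* User side: look up a connected port by name; returns 0 if none is registered. */
FunctionPort *get_port(const Framework *fw, const char *name)
{
    for (int i = 0; i < fw->count; ++i)
        if (strcmp(fw->name[i], name) == 0)
            return fw->port[i];
    return 0;
}

/* Objective Function component: f(x) = sum of x_i^2. */
double sum_of_squares(void *self, const double *x, int n)
{
    (void)self;  /* this provider keeps no state */
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += x[i] * x[i];
    return s;
}

/* Optimization Algorithm component: it sees only the FunctionPort interface. */
double optimizer_evaluate(const Framework *fw, const double *x, int n)
{
    FunctionPort *f = get_port(fw, "FunctionPort");
    assert(f != 0);
    return f->evaluate(f->self, x, n);
}
```

A real framework adds much more (language interoperability, dynamic loading, connection management), but the decoupling shown here, a user that knows only an interface name and type, is the core of the port idea.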

Example: Unconstrained Minimization Problem
- Given a rectangular two-dimensional domain and boundary values along the edges of the domain
- Find the surface with minimal area that satisfies the boundary conditions, i.e., compute $\min f(x)$, where $f: \mathbb{R}^n \to \mathbb{R}$
- Solve using optimization components based on TAO (ANL)

Unconstrained Minimization Using a Structured Mesh
[Figure: component wiring diagram, with labels "Reused", "TAO Solver", and "Driver/Physics"]

Computational Chemistry: Molecular Optimization
- Problem domain: optimization of molecular structures using quantum chemical methods
- Investigators: Yuri Alexeev (PNNL), Steve Benson (ANL), Curtis Janssen (SNL), Joe Kenny (SNL), Manoj Krishnan (PNNL), Lois McInnes (ANL), Jarek Nieplocha (PNNL), Jason Sarich (ANL), Theresa Windus (PNNL)
- Goals: demonstrate interoperability among software packages, develop experience with large existing code bases, and seed interest in the chemistry domain

Molecular Optimization Overview
- Decouple geometry optimization from electronic structure
- Demonstrate interoperability of electronic structure components
- Build toward more challenging optimization problems, e.g., protein/ligand binding studies
Components in gray can be swapped in to create new applications with different capabilities.

Wiring Diagram for Molecular Optimization
- Electronic structure components: MPQC (SNL), NWChem (PNNL)
- Optimization components: TAO (ANL)
- Linear algebra components: Global Arrays (PNNL), PETSc (ANL)

Outline
- Automatic differentiation
- Components for scientific computing
- Performance evaluation and modeling
  - Performance evaluation challenges
  - Component-based approach
  - Motivating example: adaptive linear system solution
  - A component infrastructure for performance monitoring and adaptation of applications
- Summary

Why Model Performance?
Performance models enable understanding of the factors that affect performance:
- Inform the tuning process (of the application and the machine)
  - Identify bottlenecks
  - Identify underperforming components
- Guide applications to the best machine
- Enable application-driven architecture design
- Extrapolate the performance of future systems

Challenges in performance evaluation
+ Many tools for performance data gathering and analysis (PAPI, TAU, SvPablo, Kojak, …), with various interfaces, levels of automation, and approaches to information presentation
- The user's point of view:
  - What do the different tools do? Which is most appropriate for a given application?
  - (How) can multiple tools be used in concert?
  - I have tons of performance data; now what?
  - What automatic tuning tools are available, and what exactly do they do?
  - How hard is it to install/learn/use tool X?
  - Is instrumented code portable? What is the overhead of instrumentation?
  - How does code evolution affect the performance analysis process?

Incomplete list of tools
- Source instrumentation: TAU/PDT, KOJAK (MPI/OpenMP), SvPablo, Performance Assertions, …
- Binary instrumentation: HPCToolkit, Paradyn, DyninstAPI, …
- Performance monitoring: MetaSim Tracer (memory), PAPI, HPCToolkit, Sigma++ (memory), DPOMP (OpenMP), mpiP, gprof, psrun, …
- Modeling/analysis/prediction: MetaSim Convolver (memory), DIMEMAS (network), SvPablo (scalability), Paradyn, Sigma++, …
- Source/binary optimization: Automated Empirical Optimization of Software (ATLAS), OSKI, ROSE
- Runtime adaptation: ActiveHarmony, SALSA

Challenges (where is the complexity?)
- More effective use requires integration
- Tool developer's perspective:
  - Overhead of initially implementing one-to-one interoperability
  - Ongoing management of dependencies on other tools
- Individual scientist's perspective:
  - Learning curve for performance tools, which leaves less time to focus on one's own research (modeling, physics, mathematics, optimization)
  - Potentially significant time investment needed to find out whether and how someone else's tool would improve performance, so scientists tend to do their own hand-coded optimizations (time-consuming, non-reusable)
  - Lack of tools that automate (at least partially) algorithm discovery, assembly, and configuration, and that enable runtime adaptivity

What can be done
How can complexity be managed? Provide:
- Performance tools that are truly interoperable
- Uniform, easy access to tools
- Component implementations of software, especially supporting numerical codes such as linear algebra algorithms
- New algorithms (e.g., interactive/dynamic techniques, algorithm composition)
Implementation approach: components, both for the tools and for the application software

Performance Evaluation Research Center

What is being done
- No "integrated" environment for performance monitoring, analysis, and optimization (yet)
- Most past efforts: one-to-one tool interoperability
- More recently:
  - OSPAT (initial meeting at SC'04), focusing on common data representation and interfaces
  - Tool-independent performance databases: PerfDMF
  - Eclipse parallel tools project (LANL)
  - …

OSPAT
The following areas were recommended for OSPAT to investigate:
- A common instrumentation API for source-level, compiler-level, library-level, and binary instrumentation
- A common probe interface for routine entry and exit events
- A common profile database schema
- An API to walk the call stack and examine heap memory
- A common API for thread creation and the fork interface
- Visualization components for drawing the histograms and hierarchical displays typically used by performance tools

Example: component infrastructure for multimethod linear solvers
Goal: provide a framework for
- Performance monitoring of numerical components
- Dynamic adaptivity, based on:
  - Off-line analyses of past performance information
  - Online analysis of current execution performance information
Motivating application examples:
- Driven cavity flow [Coffey et al., 2003], nonlinear PDE solution
- FUN3D: incompressible and compressible Euler equations
Prior work in multimethod linear solvers:
- McInnes et al. '03; Bhowmick et al. '03 and '05; Norris et al. '05

Adaptive Linear System Solution
Motivation:
- Approximately 80% of the total solution time is devoted to linear system solution
- Multi-phase nonlinear solution method, requiring the solution of linear systems with varying levels of ill-conditioning [Kelley and Keyes, 1998]
New approach aiming to reduce the overall time to solution:
- Combine more robust (but more costly) methods where needed in some phases with faster (but less powerful) methods in other phases
- Dynamically select a new preconditioner in each phase based on the CFL number

Example: driven cavity flow
- Linear solver: GMRES(30); only the fill level of the ILU preconditioner is varied
- Adaptive heuristic based on:
  - Previous linear solution convergence rate, nonlinear solution convergence rate, and rate of increase of linear solution iterations
- 96x96 mesh, Grashof number = 10^5, lid velocity = 100
- Intel P4 Xeon, dual 2.2 GHz, 4 GB RAM
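
The heuristic's exact rules are not given here, so the following C sketch is purely hypothetical: it shows the general shape of a rule that raises the ILU fill level when linear solves are getting harder and lowers it when the nonlinear iteration is converging easily, using the three kinds of metrics listed above. The struct fields and all thresholds are invented for illustration and are not taken from the experiments:

```c
#include <assert.h>

/* Hypothetical per-step measurements (field names are invented for this sketch). */
typedef struct {
    double linear_conv_rate;     /* average residual reduction per linear iteration (closer to 1 = slower) */
    double nonlinear_conv_rate;  /* ratio of successive nonlinear residual norms */
    double linear_iter_growth;   /* linear iterations in this step / in the previous step */
} solve_metrics;

/* Pick the ILU fill level for the next linear solve; all thresholds are illustrative. */
int next_ilu_fill(int current_fill, solve_metrics m, int max_fill)
{
    assert(current_fill >= 0 && current_fill <= max_fill);
    int harder = m.linear_iter_growth > 1.5 || m.linear_conv_rate > 0.9;
    int easier = m.linear_iter_growth < 0.8 && m.nonlinear_conv_rate < 0.5;
    if (harder && current_fill < max_fill)
        return current_fill + 1;   /* more robust, more costly preconditioner */
    if (easier && current_fill > 0)
        return current_fill - 1;   /* cheaper preconditioner while the problem is easy */
    return current_fill;
}
```

In a component setting, a rule like this would live in the adaptive heuristic component and read its metrics from the performance monitor, so it can be replaced without touching the solvers.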

Bringing it all together
Integration of ongoing efforts in:
- Performance tools: common interfaces and data representation (leveraging OSPAT, PerfDMF, TAU performance interfaces, and similar efforts)
- Numerical components: emerging common interfaces (e.g., TOPS solver interfaces) increase the choice of solution methods, which enables automated composition and adaptation strategies
- Code generation, e.g., AD
Long term:
- Is a more organized (but not too restrictive) environment for scientific software lifecycle development possible/desirable?

Multimethod linear solver components
[Figure: a nonlinear solver uses a linear solver component; an adaptive heuristic, informed by a performance monitor, selects among linear solvers A, B, and C. Other components: mesh, checkpointing, physics.]

AD as Component Factory
- Both NEOS and PETSc rely on a well-defined function interface in order to provide derivatives via AD
- Extend this idea to components
[Figure: Function component → AD tool → Jacobian component]

Summary
Automation at all levels of the application development process can simplify and speed up application development and result in better software quality and performance:
- AD addresses the widespread need for accurate and efficient derivative computations
- CCA defines a high-performance component model, enabling large-scale software development
- A growing array of performance tools and methodologies aids in understanding and fine-tuning application performance
Current and future work: bringing these technologies together in a coherent way, making large-scale scientific application development as easy as possible

Acknowledgments
- Paul Hovland, Jean Utke, Lois Curfman McInnes (ANL)
- Sanjukta Bhowmick (ANL/Columbia)
- Ivana Veljkovic, Padma Raghavan (Penn State)
- Sameer Shende, Al Malony (U. Oregon)
- CCA and PERC members
Funding: DOE and NSF

For More Information
Automatic differentiation
- Andreas Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM.
- …: publications, tools, etc.
- …: ADIC server
- neos.mcs.anl.gov: NEOS server
Common component architecture: …
Performance tools: perc.nersc.gov
Student opportunities at MCS/ANL: www-fp.mcs.anl.gov/division/information/educational_programs/studentopps.html
Boyana Norris, Web: …