Technologies for Computational Science Boyana Norris Argonne National Laboratory
March 15, 2005
Outline
- Automatic differentiation
  - Applications in optimization
  - How AD works
- Components for scientific computing
- Performance evaluation and modeling
- Bringing it all together
What is automatic differentiation?
Automatic differentiation (AD): a technology for automatically augmenting computer programs, including arbitrarily complex simulations, with statements for the computation of derivatives, also known as sensitivities.
(The Computational Differentiation Project at Argonne National Laboratory)
What is it good for?
The need to compute derivatives of complicated simulation codes accurately and efficiently arises regularly in
- Optimization (finding a minimum)
- Solving nonlinear differential equations
- Sensitivity and uncertainty analysis
- Inverse problems, including:
  - Data assimilation
  - Parameter identification
AD tools automate the generation of derivative code without precluding the exploitation of high-level knowledge.
Sensitivity Analysis
MM5 (a mesoscale weather model, NCAR and Penn State)
[Figure: impact of perturbations of initial temperature on temperature in the system; low-amplitude supersonic waves are clearly visible with AD (left) but not with divided-difference approximations of derivatives (right).]
Parameter Tuning
Sea Ice Model (Todd Arbetter, University of Colorado)
[Figure: ice thickness for the standard (left) and tuned (right) parameter values, with actual observations at two locations indicated.]
Optimization Problems
Often we look for extreme, or optimum, values that a function has on a given domain. More formally: given $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$, find $x^* \in D$ such that $f(x^*) \le f(x)$ for all $x \in D$.
Unconstrained minimization problems are ones in which $D = \mathbb{R}^n$.
Note: Since a maximum of $f$ is a minimum of $-f$, we need only look for the minimum.
Newton’s Method
Method for finding $x$ such that $f(x) = 0$:
$$x_{k+1} = x_k - [f'(x_k)]^{-1} f(x_k), \quad k = 0, 1, \ldots$$
For optimization, we want $\nabla f(x^*) = 0$, so iterate:
$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k), \quad k = 0, 1, \ldots$$
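The optimization form of the iteration can be sketched in a few lines for a scalar variable. This is only an illustration: the test function f(x) = exp(x) - 2x, the starting point, and the tolerance are assumptions chosen for the example, and the derivatives are written by hand here rather than generated by an AD tool.

```python
import math

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Iterate x_{k+1} = x_k - grad(x_k) / hess(x_k) until |grad| < tol."""
    x = x0
    for k in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, k          # converged after k Newton steps
        x = x - g / hess(x)      # scalar version of [hess]^{-1} * grad
    return x, max_iter

# Illustrative objective f(x) = exp(x) - 2x, whose minimizer is x* = ln 2.
x_star, iterations = newton_minimize(
    grad=lambda x: math.exp(x) - 2.0,   # f'(x)
    hess=lambda x: math.exp(x),         # f''(x)
    x0=0.0,
)
print(x_star, iterations)
```

The quality of `grad` and `hess` directly controls how quickly the loop terminates, which is where accurate derivatives matter.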
Example: Minimum Surface
Objective: Find a surface with minimal area that satisfies Dirichlet boundary conditions and is constrained to lie above a solid plate.
[Figure panels: Solution, Error]
Example: Minimum Surface (Cont.)
[Figure panels: Solution, Error]
We can compute derivatives via:
- Analytic code
  - By hand
  - Automatic differentiation
- Numerical approximation: finite differencing (FD)
For finite differences, recall:
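The formula on this slide did not survive in the text, so the sketch below assumes the standard one-sided (forward) difference, f'(x) ≈ (f(x + h) - f(x)) / h. It illustrates why the step size h is hard to choose: a large h gives truncation error, a tiny h gives roundoff error. The test function sin and the three step sizes are arbitrary choices for the illustration.

```python
import math

def forward_difference(f, x, h):
    """One-sided finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x)) / h

x = 1.0
exact = math.cos(x)  # derivative of sin at x

# Absolute error for a large, a moderate, and a very small step size.
errors = {
    h: abs(forward_difference(math.sin, x, h) - exact)
    for h in (1e-1, 1e-8, 1e-15)
}
for h, err in errors.items():
    print(f"h = {h:g}: error = {err:.3e}")
```

Derivative code produced by AD has no step size and therefore avoids this tradeoff.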
Why use AD?
Compared with other methods (numerical differentiation via finite differences, hand coding, etc.), AD offers a number of advantages:
- Accuracy
- Performance
- Reduced effort
- Algorithm-awareness
More accurate derivatives = faster convergence
Application: modeling transonic flow over an ONERA M6 airplane wing.
Who uses it?
AD has been successfully employed in applications in:
- Atmospheric chemistry
- Breast cancer modeling
- Computational fluid dynamics
- Mesoscale climate modeling
- Network Enabled Optimization System
- Semiconductor device modeling
And also: groundwater remediation, multidisciplinary design optimization, reactor engineering, superconductor simulation, multibody simulations, molecular dynamics simulations, power system analysis, water reservoir simulation, and storm modeling.
How AD Works
- Every programming language provides a limited number of elementary mathematical functions, e.g., +, -, *, /, sin, cos, ...
- Thus, every function computed by a program may be viewed as the composition of these so-called intrinsic functions
- Derivatives for the intrinsic functions are known and can be combined using the chain rule of differential calculus
A Simple Example (Fortran)
Original program:
    x = pi/4.0
    a = sin(x)
    b = cos(x)
    t = a/b
Differentiated program:
    x = pi/4.0
    dxdx = 1.0                      ! Initialize “seed matrix”
    a = sin(x)
    dadx = cos(x)*dxdx              ! TL/CR
    b = cos(x)
    dbdx = -sin(x)*dxdx             ! TL/CR
    t = a/b
    dtda = 1.0/b                    ! TL
    dtdb = -a/(b*b)                 ! TL
    dtdx = dtda*dadx + dtdb*dbdx    ! CR
Key: CR = chain rule; TL = table lookup
Modes of AD
- Forward mode
  - Mode used in the simple example
  - Propagates derivative vectors, often denoted ∇u or g_u
  - Derivative vector ∇u contains derivatives of u with respect to the independent variables
  - Time and storage proportional to vector length (# independents)
- Reverse (or adjoint) mode
  - Propagates adjoints, denoted ū or u_bar
  - Adjoint ū contains derivatives of the dependent variables with respect to u
  - Propagation starts with the dependent variables, so the flow of computation must be reversed
  - Time proportional to adjoint vector length (# dependents)
  - Storage proportional to number of operations
  - Because of this storage requirement, often applied to subprograms
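Reverse mode can be sketched with an explicit tape. This is a minimal illustration, not how any particular AD tool is implemented: the `Var` class, the helper functions, and the example function y = x1*x2 + sin(x1) are all assumptions made for the sketch. Every operation appends one node to the tape, which is the source of the storage cost proportional to the number of operations.

```python
import math

class Var:
    """A value plus the local partial derivatives with respect to its parents."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # tuple of (parent Var, local partial)
        self.adjoint = 0.0       # derivative of the output w.r.t. this variable

def record(tape, value, parents):
    node = Var(value, parents)
    tape.append(node)            # one tape entry per operation
    return node

def mul(tape, a, b):
    return record(tape, a.value * b.value, ((a, b.value), (b, a.value)))

def add(tape, a, b):
    return record(tape, a.value + b.value, ((a, 1.0), (b, 1.0)))

def sin(tape, a):
    return record(tape, math.sin(a.value), ((a, math.cos(a.value)),))

def backward(tape, output):
    """Propagate adjoints from the dependent variable back to the inputs."""
    output.adjoint = 1.0
    for node in reversed(tape):  # reverse the flow of computation
        for parent, partial in node.parents:
            parent.adjoint += node.adjoint * partial

tape = []
x1, x2 = Var(2.0), Var(3.0)
y = add(tape, mul(tape, x1, x2), sin(tape, x1))
backward(tape, y)
print(x1.adjoint, x2.adjoint, len(tape))
```

A single backward sweep yields the derivatives with respect to both inputs, regardless of how many inputs there are.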
Another Simple Example (C code)
Original code: y = x1*x2*x3*x4;
DERIV_val(y): value of program variable y
DERIV_grad(y): derivative object associated with y

    typedef struct {
        double value;
        double grad[ad_GRAD_MAX];
    } DERIV_TYPE;

    /* original value */
    ad_loc_0 = DERIV_val(x1) * DERIV_val(x2);
    ad_loc_1 = ad_loc_0 * DERIV_val(x3);            /* dy/dx4 */
    ad_loc_2 = ad_loc_1 * DERIV_val(x4);            /* y */
    /* reverse (or adjoint) mode of AD */
    ad_adj_0 = ad_loc_0 * DERIV_val(x4);            /* dy/dx3 */
    ad_adj_1 = DERIV_val(x3) * DERIV_val(x4);
    ad_adj_2 = DERIV_val(x1) * ad_adj_1;            /* dy/dx2 */
    ad_adj_3 = DERIV_val(x2) * ad_adj_1;            /* dy/dx1 */
    /* forward mode of AD */
    ad_axpy_4(DERIV_grad(y), ad_adj_3, DERIV_grad(x1),
                             ad_adj_2, DERIV_grad(x2),
                             ad_adj_0, DERIV_grad(x3),
                             ad_loc_1, DERIV_grad(x4));
    DERIV_val(y) = ad_loc_2;
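The same statement-level pattern can be mirrored in Python to check the arithmetic. This is a sketch under one assumption: that `ad_axpy_4` forms the linear combination grad(y) = sum of (dy/dxi * grad(xi)); the function name `product4_with_gradient` and the test values are hypothetical.

```python
def product4_with_gradient(vals, grads):
    """Value and gradient of y = x1*x2*x3*x4.

    vals  : the four operand values
    grads : the four operand gradient vectors (lists of equal length)
    """
    x1, x2, x3, x4 = vals
    # Original value, keeping intermediate products.
    loc0 = x1 * x2
    loc1 = loc0 * x3          # dy/dx4
    y = loc1 * x4
    # Reverse mode within the statement: partials of y w.r.t. each operand.
    adj0 = loc0 * x4          # dy/dx3
    adj1 = x3 * x4
    adj2 = x1 * adj1          # dy/dx2
    adj3 = x2 * adj1          # dy/dx1
    partials = (adj3, adj2, adj0, loc1)
    # Forward mode across statements: combine the operand gradient vectors.
    n = len(grads[0])
    grad_y = [sum(p * g[j] for p, g in zip(partials, grads)) for j in range(n)]
    return y, grad_y

# Seed with the identity so that grad_y is the gradient w.r.t. (x1, x2, x3, x4).
identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
y, grad_y = product4_with_gradient((1.0, 2.0, 3.0, 4.0), identity)
print(y, grad_y)
```

Computing the four partials takes a handful of scalar multiplications, after which only one vector operation per operand is needed.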
The AD Process
[Diagram: the Application Code and Control Files are input to the AD Tool, which produces Code with Derivatives; this is compiled and linked with the User’s Derivative Driver and the AD Support Libraries to form the Derivative Program.]
Ways of Implementing AD
- Operator Overloading
  - Use language features to generate a trace (“tape”) of the computation -> implicit computational graph
  - Easy to implement; hard to optimize
  - Examples: ADOL-C
- Source Transformation (ST)
  - Relies on compiler technology
  - Hard to implement; more powerful
  - Examples: ADIFOR, ADIC, ODYSSEE, TAMC
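Operator overloading can be illustrated with a small value/derivative pair class. This sketch is the simplest variant, which propagates derivatives forward as each overloaded operation executes; it does not record a tape as the tools named above do. The `Dual` class and the free functions `sin` and `cos` are hypothetical names. It reproduces the earlier Fortran example, t = sin(x)/cos(x) at x = π/4.

```python
import math

class Dual:
    """A value together with its derivative w.r.t. one independent variable."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    def __truediv__(self, other):
        # Quotient rule.
        return Dual(self.value / other.value,
                    (self.deriv * other.value - self.value * other.deriv)
                    / (other.value * other.value))

def sin(u):
    return Dual(math.sin(u.value), math.cos(u.value) * u.deriv)

def cos(u):
    return Dual(math.cos(u.value), -math.sin(u.value) * u.deriv)

x = Dual(math.pi / 4.0, 1.0)   # seed: dx/dx = 1
t = sin(x) / cos(x)            # the user code is unchanged apart from the type of x
print(t.value, t.deriv)        # tan(x) and 1/cos(x)**2
```

The user's expression stays as written; only the type of `x` changes, which is why this approach is easy to implement but gives the tool little opportunity for global optimization.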
Example AD Tool Architecture (ST)
- AD engine isolated from front ends and back ends via XAIF (XML AD Interface Format)
  - XML representation of the computational graph
  - Unifies “relevant” Fortran and C constructs
  - Implements abstractions, e.g., “derivative object”
- Shared “plug-in” differentiation modules
XAIF Representation
Reverse Mode
XAIF: Abstraction of the Program at “AD-Level”
Only the core structure of the program is reflected in XAIF:
- Control flow
- Variable information for active variables
- Basic blocks: expression DAGs
Expression Example
[Figure: expression DAG for an assignment (=) to var_1, with nodes +, *, var_2, var_3, and const]
Estimates of Incremental Computational Costs
Hessian Module
- The Hessian module can compute H, H*V, V^T*H*V, W^T*H*V, as well as arbitrary elements of the Hessian (e.g., the diagonal, or n predetermined entries).
- Code generation involves tradeoffs between source expansion and speed.
[Figure: Hessian/function ratio]
Techniques for Improving Performance of AD Code
- Exploit sparsity (SparsLinC and/or coloring)
- Exploit parallelism
  - Data: stripmine the derivative computation
  - Task: multithread independent loops
  - Time: break the computation into phases; pipeline derivative computations
- Exploit interface contractions
  - For computations of the form $f(g(x))$: compute dg/dx and df/dg, then multiply to form df/dx
- Exploit mathematics (e.g., differentiating through linear/nonlinear equation solvers)
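The interface-contraction idea can be sketched for the extreme case in which the interface between g and f is a single scalar. The functions `g` and `f` below are invented for the illustration, and their derivatives are written by hand where an AD tool would generate them. The full m-by-n Jacobian is then an outer product of a length-m column and a length-n row, so only m + n derivative values are computed rather than m*n.

```python
import math

def g(x):
    """Narrow interface: maps R^n to a single scalar."""
    return sum(xi * xi for xi in x)

def dg_dx(x):
    """Row vector dg/dx (length n)."""
    return [2.0 * xi for xi in x]

def f(s):
    """Maps the scalar interface value to R^m (here m = 2)."""
    return [math.sin(s), s * s]

def df_dg(s):
    """Column vector df/dg (length m)."""
    return [math.cos(s), 2.0 * s]

def jacobian_by_contraction(x):
    """df/dx = (df/dg) * (dg/dx), formed as an outer product."""
    s = g(x)
    row = dg_dx(x)
    col = df_dg(s)
    return [[c * r for r in row] for c in col]

J = jacobian_by_contraction([1.0, 2.0, 3.0])
print(J)
```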
ANL Tools for AD
- ADIFOR was developed in collaboration with Rice University
  - Full support for Fortran 77
  - Support for parallelism via MPI and PVM
  - Support for sparse Jacobians
- ADIC is the first and only compiler-based AD tool for ANSI C
  - Support for the complete ANSI standard
  - Will soon support a large subset of C++
- XAIF specification and differentiation modules (OpenAD project)
AD in Numerical Toolkits
- NEOS (Network-Enabled Optimization Server)
  - Efficient computation of gradients for large problems, where the objective function has the form
- PETSc (Portable Extensible Toolkit for Scientific Computation) solvers (work in progress)
  - User only needs to provide the sequential “subdomain update” function in F77 or ANSI C.
  - Differentiated version of the toolkit enables optimization/sensitivity analysis of models based on PETSc
Optimization Solution (PETSc & TAO)
[Diagram: solving min F(u). Legend: PETSc code, user code, AD-generated code.]
- Application: Main Routine, Application Initialization, Function Evaluation, Gradient Evaluation, Hessian Evaluation, Post-Processing
- Numerical library: Nonlinear Solvers (SNES), Linear Solvers (SLES), KSP, PC
TAO solvers:
- Unconstrained minimization
  - Newton-based methods: line search, trust region
  - Limited Memory Variable Metric (LMVM) method
  - Conjugate gradient methods: Fletcher-Reeves, Polak-Ribière, Polak-Ribière-Plus
  - Others
- Bound constrained optimization: Newton trust region, GPCG, interior point, LMVM, KT, others
- Nonlinear least squares: Levenberg-Marquardt, Gauss-Newton, LMVM, Levenberg-Marquardt with bound constraints, LMVM with bound constraints, others
- Complementarity: semi-smooth methods, others
TAO interfaces to external libraries for parallel vectors, matrices, and linear solvers:
- PETSc (initial interface)
- Trilinos (SNL; capability via ESI, thanks to M. Heroux and A. Williams)
- Global Arrays (PNNL, J. Nieplocha et al.)
- Etc.
Using AD with the Toolkit for Advanced Optimization (TAO)
[Diagram. Legend: PETSc code, user code, AD-generated code.]
- Parallel function assembly: global-to-local scatter of ghost values; local function computation
- Parallel Hessian assembly: global-to-local scatter of ghost values; seed matrix initialization; local Hessian computation
- ADIFOR or ADIC (with a script file) turns the local function computation into the local Hessian computation
- Coded manually; can be automated
Outline
- Automatic differentiation
- Components for scientific computing
  - Introduction
  - Example applications
- Performance evaluation and modeling
- Summary
CCA: Common Component Architecture
Software development approaches
- Unstructured code (everything in main)
- Libraries: collections of subroutines
- Object-oriented libraries: collections of classes
- Component architectures
Components
- Working definition: a component is a piece of software that can be composed with other components within a framework; composition can be either static (at link time) or dynamic (at run time)
  - “Plug-and-play” model for building applications
  - For more info: C. Szyperski, Component Software: Beyond Object-Oriented Programming, ACM Press, New York, 1998
- Components enable
  - Software and tool interoperability
  - Automation of performance instrumentation/monitoring
  - Application adaptivity (automated or user-guided)
Object-oriented vs. component-oriented development
- Component-oriented development can be viewed as augmenting OOD with certain policies, e.g., requiring that certain abstract interfaces be implemented
- Components, once compiled, require a special execution environment
- OO techniques are useful for building individual components by relatively small teams; component technologies facilitate sharing of code developed by different groups by addressing issues in
  - Language interoperability
    - Via an interface definition language (IDL)
  - Well-defined abstract interfaces
    - Enable “plug-and-play”
  - Dynamic composability
    - Components can discover information about their environment (e.g., interface discovery) from the framework and connected components
- Can convert from an object orientation to a component orientation
  - Automatic tools can help with conversion (ongoing work by C. Rasmussen and M. Sottile, LANL)
Motivating scientific applications
[Figure: applications in astrophysics, molecular structures, aerodynamics, and fusion, built from discretization, algebraic solvers, parallel I/O, meshes, data redistribution, physics, optimization, derivative computation, diagnostics, steering, visualization, and adaptive solution]
Motivation: For Application Developers and Users
- You have difficulty managing multiple third-party libraries in your code
- You (want to) use more than two languages in your application
- Your code is long-lived and different pieces evolve at different rates
- You want to be able to swap competing implementations of the same idea and test without modifying any of your code
- You want to compose your application with some other(s) that weren’t originally designed to be combined
The model for scientific component programming
[Diagram labels: Science, Industry, ?, CCA]
CCA Delivers Performance
- Local
  - No CCA overhead within components
  - Small overhead between components
  - Small overhead for language interoperability
  - Be aware of costs and design with them in mind: small costs, easily amortized
- Parallel
  - No CCA overhead on parallel computing
  - Use your favorite parallel programming model
  - Supports SPMD and MPMD approaches
- Distributed (remote)
  - No CCA overhead; performance depends on networks and protocols
  - CCA frameworks support OGSA/Grid Services/Web Services and other approaches
[Figure: maximum 0.2% overhead for CCA vs. native C++ code for parallel molecular dynamics on up to 170 CPUs]
[Figure: aggregate time for the linear solver component in an unconstrained minimization problem with PETSc]
Overhead from Component Invocation
- Invoke a component with different arguments: array, complex, double complex
- Compare with f77 method invocation
- Environment: 500 MHz Pentium III, Linux, GCC
- Components took about 3X longer
- Ensure granularity is appropriate!
- Paper by Bernholdt, Elwasif, Kohl, and Epperly

Function arg type   f77     Component
Array               80 ns   224 ns
Complex             75 ns   209 ns
Double complex      86 ns   241 ns
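The granularity advice can be made concrete with the timings from the table. The per-call work amounts used below (1 ms and 100 ns) are hypothetical, chosen only to contrast a coarse-grained and a fine-grained component call.

```python
# (f77 ns, component ns) per invocation, taken from the table above.
timings_ns = {
    "Array": (80, 224),
    "Complex": (75, 209),
    "Double complex": (86, 241),
}

# Slowdown of a component invocation relative to a plain f77 call.
ratios = {arg: comp / f77 for arg, (f77, comp) in timings_ns.items()}

def relative_overhead(work_ns, extra_ns):
    """Fraction of total call time spent on the extra invocation cost."""
    return extra_ns / (work_ns + extra_ns)

extra = timings_ns["Array"][1] - timings_ns["Array"][0]   # 144 ns per call
coarse = relative_overhead(1_000_000, extra)   # call doing 1 ms of work
fine = relative_overhead(100, extra)           # call doing 100 ns of work

for arg, r in ratios.items():
    print(f"{arg}: {r:.2f}x")
print(f"coarse-grained: {coarse:.4%}, fine-grained: {fine:.1%}")
```

The ratio is close to 3X in every case, but the absolute cost only matters when the work done per call is comparable to a few hundred nanoseconds.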
Language interoperability: what is so hard?
[Diagram: C, C++, f77, f90, Python, and Java connected pairwise by different mechanisms: native, cfortran.h, SWIG, JNI, Siloon, Chasm, platform dependent]
SIDL/Babel makes all supported languages peers
[Diagram: C, C++, f77, f90, Python, Java]
This is not a lowest-common-denominator solution!
CCA Concepts: Components and Ports
- Components provide or use one or more ports
- Components include some code that interacts with a CCA framework
- Frameworks provide services, such as component instantiation and port connection
[Diagram: an Optimization Algorithm component with OptimizerPort, FunctionPort, GradientPort, and HessianPort, connected to Objective Function (FunctionPort), Function Gradient (GradientPort), and Function Hessian (HessianPort) components]
Implementation details: CCA components...
- Inherit from gov.cca.Component
  - Implement the setServices method to register the ports this component will provide and use
- Implement the ports they provide
- Use ports on other components
  - Call the getPort/releasePort methods of the framework Services object
Ports (interfaces) extend the gov.cca.Port interface
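The provides/uses pattern can be sketched with a toy framework. Everything below is a hypothetical stand-in written for this illustration: the `Services` class, its snake_case methods, and the `connect` helper are not the gov.cca API or the Babel bindings, and ports are modeled as plain callables rather than interfaces.

```python
class Services:
    """Toy stand-in for a framework Services object (not the real CCA API)."""
    def __init__(self):
        self._provided = {}
        self._connections = {}

    def add_provides_port(self, name, port):
        self._provided[name] = port

    def connect(self, uses_name, provider, provides_name):
        # In a real framework the connection is made by the framework itself.
        self._connections[uses_name] = provider._provided[provides_name]

    def get_port(self, name):
        return self._connections[name]

    def release_port(self, name):
        pass  # a real framework would track port usage here

class ObjectiveFunction:
    """Component that provides a function port and a gradient port."""
    def set_services(self, services):
        services.add_provides_port("FunctionPort", self.evaluate)
        services.add_provides_port("GradientPort", self.gradient)

    def evaluate(self, x):
        return (x - 3.0) ** 2

    def gradient(self, x):
        return 2.0 * (x - 3.0)

class GradientDescentOptimizer:
    """Component that uses a gradient port; it never sees the provider's class."""
    def set_services(self, services):
        self.services = services

    def solve(self, x0, step=0.25, iterations=60):
        grad = self.services.get_port("GradientPort")
        x = x0
        for _ in range(iterations):
            x -= step * grad(x)
        self.services.release_port("GradientPort")
        return x

# The "framework" instantiates the components and connects their ports.
f_services, opt_services = Services(), Services()
objective, optimizer = ObjectiveFunction(), GradientDescentOptimizer()
objective.set_services(f_services)
optimizer.set_services(opt_services)
opt_services.connect("GradientPort", f_services, "GradientPort")
x_min = optimizer.solve(0.0)
print(x_min)
```

Because the optimizer depends only on the port name, a different objective component (for example, one whose gradient is AD-generated) could be connected without changing the optimizer.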
Example: Unconstrained Minimization Problem
- Given a rectangular 2-dimensional domain and boundary values along the edges of the domain
- Find the surface with minimal area that satisfies the boundary conditions, i.e., compute min f(x), where $f: \mathbb{R}^n \to \mathbb{R}$
- Solve using optimization components based on TAO (ANL)
Unconstrained Minimization Using a Structured Mesh
[Wiring diagram labels: Reused, TAO Solver, Driver/Physics]
Computational Chemistry: Molecular Optimization
- Problem domain: optimization of molecular structures using quantum chemical methods
- Investigators: Yuri Alexeev (PNNL), Steve Benson (ANL), Curtis Janssen (SNL), Joe Kenny (SNL), Manoj Krishnan (PNNL), Lois McInnes (ANL), Jarek Nieplocha (PNNL), Jason Sarich (ANL), Theresa Windus (PNNL)
- Goals: demonstrate interoperability among software packages, develop experience with large existing code bases, seed interest in the chemistry domain
Molecular Optimization Overview
- Decouple geometry optimization from electronic structure
- Demonstrate interoperability of electronic structure components
- Build towards more challenging optimization problems, e.g., protein/ligand binding studies
[Figure: components in gray can be swapped in to create new applications with different capabilities.]
Wiring Diagram for Molecular Optimization
- Electronic structure components: MPQC (SNL), NWChem (PNNL)
- Optimization components: TAO (ANL)
- Linear algebra components: Global Arrays (PNNL), PETSc (ANL)
Outline
- Automatic differentiation
- Components for scientific computing
- Performance evaluation and modeling
  - Performance evaluation challenges
  - Component-based approach
  - Motivating example: adaptive linear system solution
  - A component infrastructure for performance monitoring and adaptation of applications
- Summary
Why Performance Model?
Performance models enable understanding of the factors that affect performance
- Inform the tuning process (of application and machine)
  - Identify bottlenecks
  - Identify underperforming components
- Guide applications to the best machine
- Enable applications-driven architecture design
- Extrapolate the performance of future systems
Challenges in performance evaluation
+ Many tools for performance data gathering and analysis
    PAPI, TAU, SvPablo, Kojak, ...
    Various interfaces, levels of automation, and approaches to information presentation
User’s point of view:
- What do the different tools do? Which is most appropriate for a given application?
- (How) can multiple tools be used in concert?
- I have tons of performance data, now what?
- What automatic tuning tools are available, and what exactly do they do?
- How hard is it to install/learn/use tool X?
- Is instrumented code portable? What’s the overhead of instrumentation? How does code evolution affect the performance analysis process?
Incomplete list of tools
- Source instrumentation: TAU/PDT, KOJAK (MPI/OpenMP), SvPablo, Performance Assertions, ...
- Binary instrumentation: HPCToolkit, Paradyn, DyninstAPI, ...
- Performance monitoring: MetaSim Tracer (memory), PAPI, HPCToolkit, Sigma++ (memory), DPOMP (OpenMP), mpiP, gprof, psrun, ...
- Modeling/analysis/prediction: MetaSim Convolver (memory), DIMEMAS (network), SvPablo (scalability), Paradyn, Sigma++, ...
- Source/binary optimization: Automated Empirical Optimization of Software (ATLAS), OSKI, ROSE
- Runtime adaptation: ActiveHarmony, SALSA
Challenges (where is the complexity?)
- More effective use -> integration
- Tool developer’s perspective
  - Overhead of initially implementing one-to-one interoperability
  - Ongoing management of dependencies on other tools
- Individual scientist’s perspective
  - Learning curve for performance tools -> less time to focus on own research (modeling, physics, mathematics, optimization)
  - Potentially significant time investment needed to find out whether/how using someone else’s tool would improve performance -> tend to do own hand-coded optimizations (time-consuming, non-reusable)
  - Lack of tools that automate (at least partially) algorithm discovery, assembly, and configuration, and that enable runtime adaptivity
What can be done
How to manage complexity? Provide
- Performance tools that are truly interoperable
- Uniform, easy access to tools
- Component implementations of software, especially supporting numerical codes, such as linear algebra algorithms
- New algorithms (e.g., interactive/dynamic techniques, algorithm composition)
Implementation approach: components, both for tools and the application software
Performance Evaluation Research Center (PERC)
What is being done
- No “integrated” environment for performance monitoring, analysis, and optimization (yet)
- Most past efforts: one-to-one tool interoperability
- More recently
  - OSPAT (initial meeting at SC’04), with a focus on common data representation and interfaces
  - Tool-independent performance databases: PerfDMF
  - Eclipse parallel tools project (LANL)
  - ...
OSPAT
The following areas were recommended for OSPAT to investigate:
- A common instrumentation API for source-level, compiler-level, library-level, and binary instrumentation
- A common probe interface for routine entry and exit events
- A common profile database schema
- An API to walk the call stack and examine the heap memory
- A common API for thread creation and fork interface
- Visualization components for drawing histograms and hierarchical displays typically used by performance tools
Example: component infrastructure for multimethod linear solvers
- Goal: provide a framework for
  - Performance monitoring of numerical components
  - Dynamic adaptivity, based on:
    - Off-line analyses of past performance information
    - Online analysis of current execution performance information
- Motivating application examples:
  - Driven cavity flow [Coffey et al., 2003], nonlinear PDE solution
  - FUN3D: incompressible and compressible Euler equations
- Prior work in multimethod linear solvers
  - McInnes et al. ’03, Bhowmick et al. ’03 and ’05, Norris et al. ’05
Adaptive Linear System Solution
- Motivation:
  - Approximately 80% of total solution time is devoted to linear system solution
  - Multi-phase nonlinear solution method, requiring the solution of linear systems with varying levels of ill-conditioning [Kelley and Keyes, 1998]
- New approach aiming to reduce overall time to solution
  - Combine more robust (but more costly) methods when needed in some phases with faster (but less powerful) methods in other phases
  - Dynamically select a new preconditioner in each phase based on the CFL number
Example: driven cavity flow
- Linear solver: GMRES(30); vary only the fill level of the ILU preconditioner
- Adaptive heuristic based on:
  - Previous linear solution convergence rate, nonlinear solution convergence rate, rate of increase of linear solution iterations
- 96x96 mesh, Grashof = $10^5$, lid velocity = 100
- Intel P4 Xeon, dual 2.2 GHz, 4 GB RAM
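A heuristic of this kind might be structured as follows. This is purely illustrative and is not the heuristic used in the experiments: the function name, the inputs, and every threshold are assumptions invented for the sketch. It raises the ILU fill level when the linear iteration count grows quickly or the nonlinear residual stalls, and lowers it when both indicators look healthy.

```python
def choose_fill_level(current_fill, linear_its_history, nonlinear_reduction,
                      max_fill=3, growth_threshold=1.5, stall_threshold=0.9):
    """Pick the ILU fill level for the next nonlinear iteration.

    linear_its_history  : linear solver iteration counts from previous solves
    nonlinear_reduction : ratio of the current to the previous nonlinear
                          residual norm (smaller means faster convergence)
    All thresholds are hypothetical values chosen for illustration.
    """
    if len(linear_its_history) >= 2 and linear_its_history[-2] > 0:
        growth = linear_its_history[-1] / linear_its_history[-2]
    else:
        growth = 1.0

    if growth > growth_threshold or nonlinear_reduction > stall_threshold:
        # Linear solves are getting harder or progress has stalled:
        # switch to a more robust (but more costly) preconditioner.
        return min(current_fill + 1, max_fill)
    if growth < 1.0 and nonlinear_reduction < 0.5:
        # Things are going well: try a cheaper preconditioner.
        return max(current_fill - 1, 0)
    return current_fill

print(choose_fill_level(0, [10, 20], 0.3))   # iteration count doubled
print(choose_fill_level(2, [20, 15], 0.2))   # converging well
```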
Bringing it all together
- Integration of ongoing efforts in
  - Performance tools: common interfaces and data representation (leverage OSPAT, PerfDMF, TAU performance interfaces, and similar efforts)
  - Numerical components: emerging common interfaces (e.g., TOPS solver interfaces); increased choice of solution method; automated composition and adaptation strategies
  - Code generation, e.g., AD
- Long term
  - Is a more organized (but not too restrictive) environment for scientific software lifecycle development possible/desirable?
Multimethod linear solver components
[Component diagram: Physics, Mesh, Checkpointing, Nonlinear Solver, Adaptive Heuristic, Performance Monitor, and Linear Solvers A, B, and C]
AD as Component Factory
- Both NEOS and PETSc rely on a well-defined function interface in order to provide derivatives via AD
- Extend this idea to components
[Diagram: Function -> AD Tool -> Jacobian]
Summary
- Automation at all levels of the application development process can simplify and speed up application development and result in better software quality and performance
  - AD addresses the widespread need for accurate and efficient derivative computations
  - CCA defines a high-performance component model, enabling large-scale software development
  - A growing array of performance tools and methodologies aids in understanding and fine-tuning application performance
- Current and future work: bringing these technologies together in a coherent way, making large-scale scientific application development as easy as possible
Acknowledgments
- Paul Hovland, Jean Utke, Lois Curfman McInnes (ANL)
- Sanjukta Bhowmick (ANL/Columbia)
- Ivana Veljkovic, Padma Raghavan (Penn State)
- Sameer Shende, Al Malony (U. Oregon)
- CCA and PERC members
Funding: DOE and NSF
For More Information
- Automatic differentiation
  - Andreas Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM.
  - Publications, tools, etc.
  - ADIC server
  - NEOS server: neos.mcs.anl.gov
- Common Component Architecture
- Performance tools: perc.nersc.gov
- Student opportunities at MCS/ANL: www-fp.mcs.anl.gov/division/information/educational_programs/studentopps.html
Boyana Norris