Combinatorial Aspects of Automatic Differentiation Paul D. Hovland Mathematics & Computer Science Division Argonne National Laboratory

Group Members Andrew Lyons (U.Chicago) Priyadarshini Malusare Boyana Norris Ilya Safro Jaewook Shin Jean Utke (joint w/ UChicago) Programmer/postdoc TBD Alumni: J. Abate, S. Bhowmick, C. Bischof, A. Griewank, P. Khademi, J. Kim, U. Naumann, L. Roh, M. Strout, B. Winnicka

Funding Current: –DOE: Applied Mathematics Base Program –DOE: Computer Science Base Program –DOE: CSCAPES SciDAC Institute –NASA: ECCO-II Consortium –NSF: Collaborations in Math & Geoscience Past: –DOE: Applied Math –NASA Langley –NSF: ITR

Outline Introduction to automatic differentiation (AD) Some application highlights Some combinatorial problems in AD –Derivative accumulation –Minimal representation –Optimal checkpointing strategy –Graph coloring Summary of available tools More application highlights Conclusions

Lingo AD = Automatic Differentiation CFD = computational fluid dynamics Gradient = first derivatives of a scalar function Jacobian = first derivatives of a vector function Hessian = second derivatives of a (scalar) function ODE = ordinary differential equation PDE = partial differential equation Sparse matrix = matrix where most entries are zero Krylov method = iterative algorithm for solving system of linear equations Optimization = finding minimum of some function

Why Automatic Differentiation? Derivatives are used for –Measuring the sensitivity of a simulation to unknown or poorly known parameters (e.g., how does ocean bottom topography affect flow?) –Assessing the role of algorithm parameters in a numerical solution (e.g., how does the filter radius impact a large eddy simulation?) –Computing a descent direction in numerical optimization (e.g., compute gradients and Hessians for use in aircraft design) –Solving discretized nonlinear PDEs (e.g., compute Jacobians or Jacobian-vector products for combustion simulations)

Why Automatic Differentiation? (cont.) Alternative #1: hand-coded derivatives –hand-coding is tedious and error-prone –coding time grows with program size and complexity –automatically generated code may be faster –no natural way to compute derivative matrix-vector products (Jv, J^T v, Hv) without forming the full matrix –maintenance is a problem (must maintain consistency) Alternative #2: finite difference approximations –introduce truncation error that in the best case halves the digits of accuracy –cost grows with number of independents –no natural way to compute J^T v products
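A quick numeric illustration (ours, not from the slides) of the truncation-error point: a forward difference for d/dx sin(x) is most accurate near h ≈ sqrt(machine epsilon), where roughly half of double precision's ~16 digits survive.

```python
import math

# Forward-difference approximation of d/dx sin(x) at x = 1.0.
# Truncation error (~h) and rounding error (~eps/h) balance near
# h = sqrt(eps) ~ 1e-8, so the best error is ~1e-8: half the digits.
x, exact = 1.0, math.cos(1.0)
for h in [1e-2, 1e-4, 1e-6, 1e-8, 1e-10, 1e-12]:
    fd = (math.sin(x + h) - math.sin(x)) / h
    print(f"h = {h:.0e}   error = {abs(fd - exact):.2e}")
```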

AD in a Nutshell Technique for computing analytic derivatives of programs (millions of lines of code) Derivatives used in optimization, nonlinear PDEs, sensitivity analysis, inverse problems, etc. AD = analytic differentiation of elementary functions + propagation by the chain rule –Every programming language provides a limited number of elementary mathematical functions –Thus, every function computed by a program may be viewed as a composition of these so-called intrinsic functions –Derivatives of the intrinsic functions are known and can be combined using the chain rule of differential calculus Associativity of the chain rule leads to two main modes: forward and reverse Can be implemented using source transformation or operator overloading
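As a sketch of the operator-overloading route (a minimal illustration, assuming nothing about any particular AD tool; all names here are ours), forward mode can be implemented by pairing every value with its derivative and applying the chain rule inside each intrinsic:

```python
import math

# Forward-mode AD via operator overloading: each Dual carries a value and
# a derivative, and every elementary operation applies the chain rule.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        return Dual(self.val + o.val, self.dot + o.dot)
    def __mul__(self, o):
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

def sin(u):  # derivative of the intrinsic is known: cos
    return Dual(math.sin(u.val), math.cos(u.val) * u.dot)

def exp(u):
    return Dual(math.exp(u.val), math.exp(u.val) * u.dot)

# Differentiate the example used below w.r.t. x: seed x.dot = 1, y.dot = 0.
x, y = Dual(0.5, 1.0), Dual(2.0, 0.0)
a = exp(x); b = sin(y) * y; c = a * b; f = a * c
print(f.val, f.dot)   # f and df/dx at (0.5, 2.0)
```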

What is feasible & practical Jacobians of functions with small number (1–1000) of independent variables (forward mode) Jacobians of functions with small number (1–100) of dependent variables (reverse/adjoint mode) Very (extremely) large, but (very) sparse Jacobians and Hessians (forward mode plus coloring) Jacobian-vector products (forward mode) Transposed-Jacobian-vector products (adjoint mode) Hessian-vector products (forward + adjoint modes) Large, dense Jacobian matrices that are effectively sparse or effectively low rank (e.g., see Abdel-Khalik et al., AD2008)

Application: Sensitivity analysis in simplified climate model Sensitivity of flow through Drake Passage to ocean bottom topography –Finite difference approximations: 23 days –Naïve automatic differentiation: 2 hours 23 minutes –Smart automatic differentiation: 22 minutes

Application: solution of nonlinear PDEs Jacobian-free Newton-Krylov solution of model problem (driven cavity) AD + TFQMR: AD + BiCGStab: FD(w=10^-5) + GMRES: FD(w=10^-3) + GMRES: AD + GMRES: FD(w=10^-5) + BiCGStab: FD(w=10^-7) + GMRES: does not converge FD + TFQMR: does not converge AD = automatic differentiation FD = finite differences w = noise estimate for Brown-Saad

Application: mesh quality optimization Optimization used to move mesh vertices to create elements as close to equilateral triangles/tetrahedra as possible Semi-automatic differentiation is 10–25% faster than hand-coding for the gradient and 5–10% faster than hand-coding for the Hessian Automatic differentiation is a factor of 2–5 faster than finite differences [figure: mesh before and after optimization]

Combinatorial problems in AD Derivative accumulation Minimal representation Optimal checkpointing strategy Graph coloring

Accumulating Derivatives Represent function using a directed acyclic graph (DAG) Computational graph –Vertices are intermediate variables, annotated with function/operator –Edges are unweighted Linearized computational graph –Edge weights are partial derivatives –Vertex labels are not needed Compute the sum of weights over all paths from independent to dependent variable(s), where the path weight is the product of the weights of all edges along the path [Baur & Strassen] Find an order in which to compute path weights that minimizes cost (flops): identify common subpaths (= common subexpressions in the Jacobian)

A simple example b = sin(y)*y a = exp(x) c = a*b f = a*c [figure: computational graph with independents x, y; intermediates a, b, c; dependent f; vertices annotated sin, exp, *]

A simple example t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c [figure: the computational graph and its linearization, with edge weights d0, t0, y, a, b, c]

Brute force Compute products of edge weights along all paths Sum all paths from same source to same target Hope the compiler does a good job recognizing common subexpressions dfdy = d0*y*a*a + t0*a*a dfdx = a*b*a + a*c 8 mults, 2 adds [figure: linearized graph]
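A small check (ours) that these path products agree with the closed form f = exp(2x)·y·sin(y):

```python
import math

x, y = 0.5, 2.0
t0, d0 = math.sin(y), math.cos(y)
b = t0 * y; a = math.exp(x); c = a * b; f = a * c

dfdy = d0*y*a*a + t0*a*a   # sum over the two y -> f paths
dfdx = a*b*a + a*c         # sum over the two x -> f paths
assert abs(dfdy - math.exp(2*x) * (t0 + y*d0)) < 1e-12
assert abs(dfdx - 2 * math.exp(2*x) * y * t0) < 1e-12
```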

Vertex elimination Multiply each in edge by each out edge, add the product to the edge from the predecessor to the successor Conserves path weights This procedure always terminates The terminal form is a bipartite graph [figure: graph before eliminating the vertex c]

Vertex elimination Multiply each in edge by each out edge, add the product to the edge from the predecessor to the successor Conserves path weights This procedure always terminates The terminal form is a bipartite graph [figure: graph after eliminating c, with new edge weights a*a and c + a*b]
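The slides that follow apply this rule in different orders, and the orders differ in cost. A minimal numeric sketch of vertex elimination (implementation details ours) reproduces their flop counts: 6 mults for the forward and reverse orders, 5 for cross-country.

```python
import math

# Vertex elimination on a linearized DAG: edges[(u, v)] holds dv/du.
# Eliminating v multiplies each in edge by each out edge and adds the
# product to the (u, t) edge, conserving path weights; we count flops.
def eliminate(edges, v, cost):
    ins  = {u: w for (u, t), w in edges.items() if t == v}
    outs = {t: w for (s, t), w in edges.items() if s == v}
    for u in ins:  del edges[(u, v)]
    for t in outs: del edges[(v, t)]
    for u, w1 in ins.items():
        for t, w2 in outs.items():
            cost["mults"] += 1
            if (u, t) in edges:
                cost["adds"] += 1
                edges[(u, t)] += w1 * w2
            else:
                edges[(u, t)] = w1 * w2

x, y = 0.5, 2.0
a, t0, d0 = math.exp(x), math.sin(y), math.cos(y)
b = t0 * y; c = a * b
def graph():   # linearized DAG of the running example
    return {("y", "t0"): d0, ("t0", "b"): y, ("y", "b"): t0,
            ("x", "a"): a,  ("a", "c"): b,  ("b", "c"): a,
            ("a", "f"): c,  ("c", "f"): a}

for name, order in [("forward",       ["t0", "b", "a", "c"]),
                    ("reverse",       ["c", "b", "t0", "a"]),
                    ("cross-country", ["t0", "c", "b", "a"])]:
    edges, cost = graph(), {"mults": 0, "adds": 0}
    for v in order:
        eliminate(edges, v, cost)
    print(name, cost, edges[("x", "f")], edges[("y", "f")])
```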

Forward mode: eliminate vertices in topological order t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c [figure: linearized graph, nothing eliminated yet]

Forward mode: eliminate vertices in topological order t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = t0 + d0*y [figure: graph after eliminating the sin vertex]

Forward mode: eliminate vertices in topological order t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = t0 + d0*y d2 = d1*a [figure: graph after also eliminating b]

Forward mode: eliminate vertices in topological order t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = t0 + d0*y d2 = d1*a d3 = a*b d4 = a*c [figure: graph after also eliminating a]

Forward mode: eliminate vertices in topological order t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = t0 + d0*y d2 = d1*a d3 = a*b d4 = a*c dfdy = d2*a dfdx = d4 + d3*a 6 mults, 2 adds [figure: final bipartite graph]

Reverse mode: eliminate in reverse topological order t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c [figure: linearized graph, nothing eliminated yet]

Reverse mode: eliminate in reverse topological order t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = a*a d2 = c + b*a [figure: graph after eliminating c]

Reverse mode: eliminate in reverse topological order t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = a*a d2 = c + b*a d3 = t0*d1 d4 = y*d1 [figure: graph after also eliminating b]

Reverse mode: eliminate in reverse topological order t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = a*a d2 = c + b*a d3 = t0*d1 d4 = y*d1 dfdy = d3 + d0*d4 [figure: graph after also eliminating t0]

Reverse mode: eliminate in reverse topological order t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = a*a d2 = c + b*a d3 = t0*d1 d4 = y*d1 dfdy = d3 + d0*d4 dfdx = a*d2 6 mults, 2 adds [figure: final bipartite graph]

“Cross-country” mode t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c [figure: linearized graph, nothing eliminated yet]

“Cross-country” mode t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = t0 + d0*y [figure: graph after eliminating the sin vertex]

“Cross-country” mode t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = t0 + d0*y d2 = a*a d3 = c + b*a [figure: graph after also eliminating c]

“Cross-country” mode t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = t0 + d0*y d2 = a*a d3 = c + b*a dfdy = d1*d2 [figure: graph after also eliminating b]

“Cross-country” mode t0 = sin(y) d0 = cos(y) b = t0*y a = exp(x) c = a*b f = a*c d1 = t0 + d0*y d2 = a*a d3 = c + b*a dfdy = d1*d2 dfdx = a*d3 5 mults, 2 adds [figure: final bipartite graph]

Edge elimination Eliminate only a single in edge or out edge, rather than all in and out edges as in vertex elimination Forward or back elimination [figure: linearized graph]

Edge elimination Eliminate only a single in edge or out edge, rather than all in and out edges as in vertex elimination Forward or back elimination [figure: graph after eliminating one edge]

Edge elimination Eliminate only a single in edge or out edge, rather than all in and out edges as in vertex elimination Forward or back elimination [figure: graph after a further edge elimination]

What We Know Reverse mode is within a factor of 2 of optimal for functions with one dependent variable. This bound is sharp. Eliminating one edge at a time (edge elimination) can be cheaper than eliminating entire vertices at a time Eliminating pairs of edges (face elimination) can be cheaper than edge elimination Optimal Jacobian accumulation is NP-hard Various linear- and polynomial-time heuristics Optimal orderings for certain special cases –Polynomial-time algorithm for optimal vertex elimination in the case where all intermediate vertices have one out edge

What We Don’t Know What is the worst case ratio of optimal vertex elimination to optimal edge elimination? … edge to face? When should we stop? (minimal representation problem) How to adjust cost metric to account for cache/memory behavior? Is O(min(#indeps,#deps)) a sharp bound for the cost of computing a general Jacobian relative to the function?

Minimal graph of a Jacobian (scarcity) Reduce graph to one with a minimal number of edges (or smallest number of DOF) How to find the minimal graph? Relationship to matrix properties? Avoid “catastrophic fill-in” (empirical evidence that this happens in practice) In essence, represent the Jacobian as a sum/product of sparse/low-rank matrices [figures: original DAG, minimal DAG, bipartite DAG]

Practical Matters: constructing computational graphs At compile time (source transformation) –Structure of graph is known, but edge weights are not: in effect, implement inspector (symbolic) phase at compile time (offline), executor (numeric) phase at run time (online) –In order to assemble the graph from individual statements, must be able to resolve aliases and match variable definitions and uses –Scope of computational graph construction is usually limited to statements or basic blocks –Computational graph usually has O(10)–O(100) vertices At run time (operator overloading) –Structure and weights both discovered at run time –Completely online: cannot afford polynomial-time algorithms to analyze the graph –Computational graph may have O(10,000) vertices

Reverse Mode and Checkpointing Reverse mode propagates derivatives from dependent variables to independent variables Cost is proportional to number of dependent variables: ideal for scalar functions with large number of independents Gradient can be computed for small constant times cost of function Partial derivatives of most intrinsics require value of input(s) –d(a*b)/db = a, d(a*b)/da = b –d(sin(x))/dx = cos(x) Reversal of control flow requires that all intermediate values are preserved or recomputed Standard strategies rely on –Taping: store all intermediate variables when they are overwritten –Checkpointing: store data needed to restore state, recompute
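A minimal tape sketch (ours) for the running example: the forward sweep records each operation's inputs and local partials; the reverse sweep makes one pass over the tape, so its cost does not grow with the number of independents.

```python
import math

tape = []   # entries: (output index, [(input index, local partial)])

# Forward sweep, taping values and partials. Indices:
# 0:x 1:y 2:t0 3:b 4:a 5:c 6:f
x, y = 0.5, 2.0
v = [x, y, 0.0, 0.0, 0.0, 0.0, 0.0]
v[2] = math.sin(v[1]); tape.append((2, [(1, math.cos(v[1]))]))
v[3] = v[2] * v[1];    tape.append((3, [(2, v[1]), (1, v[2])]))
v[4] = math.exp(v[0]); tape.append((4, [(0, v[4])]))
v[5] = v[4] * v[3];    tape.append((5, [(4, v[3]), (3, v[4])]))
v[6] = v[4] * v[5];    tape.append((6, [(4, v[5]), (5, v[4])]))

# Reverse sweep: propagate adjoints from the dependent f back to x and y.
adj = [0.0] * 7
adj[6] = 1.0
for out, partials in reversed(tape):
    for i, p in partials:
        adj[i] += adj[out] * p
print("df/dx =", adj[0], "df/dy =", adj[1])
```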

Checkpointing: notation Perform forward (function) computation Perform reverse (adjoint) computation Record overwritten variables (taping) Checkpoint state at subroutine entry Restore state from checkpoint Combinations are possible:

Timestepping with no checkpoints

Checkpointing based on timesteps

Checkpointing based on timesteps: parallelism

Checkpointing based on timestep: hierarchical

3-level checkpointing Suppose we use 3-level checkpointing for 100 time steps, checkpointing every 25 steps (at level 1), every 5 steps (at level 2), and every step (at level 3) Then, the checkpoints are stored in the following order: 25, 50, 75, 80, 85, 90, 95, 96, 97, 98, 99, 91, 92, 93, 94, 86, 87, 88, 89, 81, 82, 83, 84, 76, 77, 78, 79, 55, 60, 65, 70, 71, 72, 73, 74, 66, 67, 68, 69, 61, 62, 63, 64, 56, 57, 58, 59, 51, 52, 53, 54, 30, 35, 40, 45, 46, …
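A short sketch (ours) that regenerates exactly this order: at each level, sweep forward storing checkpoints at the level's stride, then adjoin the segments last to first with the next finer stride.

```python
def checkpoint_order(lo, hi, strides):
    """Yield checkpoints in the order they are written for hierarchical
    checkpointing of steps lo..hi with the given per-level strides."""
    if not strides:
        return
    marks = list(range(lo + strides[0], hi, strides[0]))
    yield from marks                              # forward sweep: store
    bounds = [lo] + marks + [hi]
    for i in reversed(range(len(bounds) - 1)):    # adjoin last segment first
        yield from checkpoint_order(bounds[i], bounds[i + 1], strides[1:])

print(list(checkpoint_order(0, 100, [25, 5, 1])))
# [25, 50, 75, 80, 85, 90, 95, 96, 97, 98, 99, 91, 92, 93, 94, 86, ...]
```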

Checkpointing based on call tree [figure: function, split mode, joint mode] Assume all subroutines have structure x.1, call child, x.2 (t = taped execution, a = adjoint sweep) split mode: 1.1t, 2.1t, 3.1t, 3.2t, 2.2t, 1.2t, 1.2a, 2.2a, 3.2a, 3.1a, 2.1a, 1.1a joint mode: 1.1t, 2.1, 3.1, 3.2, 2.2, 1.2t, 1.2a, 2.1t, 3.1, 3.2, 2.2t, 2.2a, 3.1t, 3.2t, 3.2a, 3.1a, 2.1a, 1.1a
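Both sequences follow from a simple recursion, sketched here (ours) for a call chain of depth 3, where routine i runs i.1, calls routine i+1, then runs i.2:

```python
DEPTH = 3

def plain(i):        # untaped forward sweep (values only)
    return [] if i > DEPTH else [f"{i}.1"] + plain(i + 1) + [f"{i}.2"]

def taped(i):        # taped forward sweep: record everything (split mode)
    return [] if i > DEPTH else [f"{i}.1t"] + taped(i + 1) + [f"{i}.2t"]

def adjoint_all(i):  # split-mode reverse sweep over the full tape
    return [] if i > DEPTH else [f"{i}.2a"] + adjoint_all(i + 1) + [f"{i}.1a"]

def joint(i):        # joint mode: tape one level, recompute children untaped
    if i > DEPTH:
        return []
    return ([f"{i}.1t"] + plain(i + 1) + [f"{i}.2t"] +
            [f"{i}.2a"] + joint(i + 1) + [f"{i}.1a"])

print("split:", taped(1) + adjoint_all(1))
print("joint:", joint(1))
```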

Checkpointing real applications In practice, need a combination of all of these techniques At the timestep level, 2- or 3-level checkpointing is typical: too many timesteps to checkpoint every timestep At the call tree level, some mixture of joint and split mode is desirable –Pure split mode consumes too much memory –Pure joint mode wastes time recomputing at the lowest levels of the call tree Currently, OpenAD provides a templating mechanism to simplify the use of mixed checkpointing strategies Future research will attempt to automate some of the checkpointing strategy selection, including dynamic adaptation

Matrix Coloring Jacobian matrices are often sparse The forward mode of AD computes J × S, where S is usually an identity matrix or a vector Can “compress” the Jacobian by choosing S such that structurally orthogonal columns are combined A set of columns are structurally orthogonal if no two of them have nonzeros in the same row Equivalent problem: color the graph whose adjacency matrix is J^T J Equivalent problem: distance-2 color the bipartite graph of J
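A greedy sketch (ours, not a production coloring algorithm) of the compression step: assign each column to the first group whose columns it is structurally orthogonal to; each group then contributes one column of the seed matrix S.

```python
def group_columns(pattern, n_cols):
    """pattern: set of (row, col) positions that may be nonzero.
    Returns a color per column; columns of one color share no row."""
    col_rows = [{r for (r, c) in pattern if c == j} for j in range(n_cols)]
    group_rows = []            # rows already covered by each group
    color = [0] * n_cols
    for j in range(n_cols):
        for k, rows in enumerate(group_rows):
            if not rows & col_rows[j]:     # structurally orthogonal
                rows |= col_rows[j]; color[j] = k
                break
        else:
            group_rows.append(set(col_rows[j])); color[j] = len(group_rows) - 1
    return color, len(group_rows)

# Tridiagonal 6x6 Jacobian: 3 groups suffice instead of 6 forward passes.
pattern = {(i, j) for i in range(6) for j in range(6) if abs(i - j) <= 1}
print(group_columns(pattern, 6))   # ([0, 1, 2, 0, 1, 2], 3)
```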

Matrix Coloring

Compressed Jacobian

Scenarios N small: use forward mode on full computation M small: use reverse mode on full computation M & N large, P small: use reverse mode on A, forward mode on B & C M & N large, K small: use reverse mode on A & B, forward mode on C N, P, K, M large, Jacobians of A, B, C sparse: compressed forward mode N, P, K, M large, Jacobians of A, B, C low rank: scarce forward mode [figure: composition of stages A: n → p, B: p → k, C: k → m]

Scenarios N small: use forward mode on full computation M small: use reverse mode on full computation M & N large, P small: use reverse mode on A, forward mode on B & C M & N large, K small: use reverse mode on A & B, forward mode on C N, P, K, M large, Jacobians of A, B, C sparse: compressed forward mode N, P, K, M large, Jacobians of A, B, C low rank: scarce forward mode N, P, K, M large, Jacobians of A, B, C dense: what to do? [figure: composition of stages A: n → p, B: p → k, C: k → m]

What to do for very large, dense Jacobian matrices? Jacobian matrix might be large and dense, but “effectively sparse” –Many entries below some threshold ε (“almost zero”) –Can tolerate errors up to δ in entries greater than ε –Example: advection-diffusion for finite time step: nonlocal terms fall off exponentially –Solution: do a partial coloring to compress this dense Jacobian: requires solving a modified graph coloring problem Jacobian might be large and dense, but “effectively low rank” –Can be well approximated by a low-rank matrix –Jacobian-vector products (and J^T v) are cheap –Adaptation of efficient subspace method uses Jv and J^T v in random directions to build a low-rank approximation (Abdel-Khalik et al., AD08)
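For the low-rank case, a sketch (ours; the efficient subspace method of Abdel-Khalik et al. differs in detail) of how Jv and J^T v products alone can recover a low-rank approximation:

```python
import numpy as np

def low_rank_from_matvecs(Jv, JTv, n, r):
    """Approximate J ~ Q @ B (rank r) using only matvec products."""
    Omega = np.random.randn(n, r)                              # random probes
    Y = np.column_stack([Jv(Omega[:, i]) for i in range(r)])   # sample range
    Q, _ = np.linalg.qr(Y)                                     # orthonormalize
    B = np.column_stack([JTv(Q[:, i]) for i in range(r)]).T    # B = Q^T J
    return Q, B

# Demo on an exactly rank-5 dense "Jacobian".
m, n, r = 200, 300, 5
J = np.random.randn(m, r) @ np.random.randn(r, n)
Q, B = low_rank_from_matvecs(lambda v: J @ v, lambda v: J.T @ v, n, r)
print(np.linalg.norm(J - Q @ B) / np.linalg.norm(J))   # ~ 1e-15
```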

Tools Fortran 95 C/C++ Fortran 77 MATLAB

Tools: Fortran 95 TAF (FastOpt) –Commercial tool –Support for (almost) all of Fortran 95 –Used extensively in geophysical sciences applications Tapenade (INRIA) –Support for many Fortran 95 features –Developed by a team with extensive compiler experience OpenAD/F (Argonne/UChicago/Rice) –Support for many Fortran 95 features –Developed by a team with expertise in combinatorial algorithms, compilers, software engineering, and numerical analysis –Development driven by climate model & astrophysics code All three: forward and reverse; source transformation

Tools: C/C++ ADOL-C (Dresden) –Mature tool –Support for all of C++ –Operator overloading; forward and reverse modes ADIC (Argonne/UChicago) –Support for all of C, some C++ –Source transformation; forward mode (reverse under development) –New version (2.0) based on industrial strength compiler infrastructure –Shares some infrastructure with OpenAD/F SACADO: –Operator overloading; forward and reverse modes –See Phipps presentation TAC++ (FastOpt) –Commercial tool (under development) –Support for much of C/C++ –Source transformation; forward and reverse modes –Shares some infrastructure with TAF

Tools: Fortran 77 ADIFOR (Rice/Argonne) –Mature and very robust tool –Support for all of Fortran 77 –Forward and (adequate) reverse modes –Hundreds of users; ~150 citations AdiMat (Aachen): source transformation MAD (Cranfield/TOMLAB): operator overloading Various research prototypes Tools: MATLAB

Application highlights Atmospheric chemistry Breast cancer biostatistical analysis CFD: CFL3D, NSC2KE, (Fluent 4.52: Aachen)... Chemical kinetics Climate and weather: MITGCM, MM5, CICE Semiconductor device simulation Water reservoir simulation

Parameter tuning: sea ice model [figure: simulated (yellow) and observed (green) March ice thickness (m), tuned vs. standard parameters]

Differentiated Toolkit: CVODES (née SensPVODE) Diurnal kinetics advection-diffusion equation 100x100 structured grid 16 Pentium III nodes

Sensitivity Analysis: mesoscale weather model

Conclusions & Future Work Automatic differentiation research involves a wide range of combinatorial problems AD is a powerful tool for scientific computing Modern automatic differentiation tools are robust and produce efficient code for complex simulation codes –Robustness requires an industrial-strength compiler infrastructure –Efficiency requires sophisticated compiler analysis Effective use of automatic differentiation depends on insight into problem structure Future Work –Further develop and test techniques for computing Jacobians that are effectively sparse or effectively low rank –Develop techniques to automatically generate complex and adaptive checkpointing strategies

About Argonne First national laboratory 25 miles southwest of Chicago ~3000 employees (~1000 scientists) Mathematics and Computer Science (MCS) Division –~100 scientific staff –Research in parallel programming models, grid computing, scientific computing, component-based software engineering, applied math, … Summer programs for undergrads and grad students (Jan/Feb deadlines) Research semester and co-op opportunities for undergrads Postdoc, programmer, and scientist positions available –AD group has 1 opening

For More Information Andreas Griewank, Evaluating Derivatives, SIAM. Griewank, “On Automatic Differentiation”; this and other technical reports available online at: AD in general: ADIFOR: ADIC: OpenAD: Other tools: