Automatic Differentiation
Richard Fateman, CS 282, Lecture 18b

What is “Automatic Differentiation”?
We want to compute both f(x) and f’(x), given a program for f(x) := exp(x^2)*tan(x)+log(x). We could do this via…

But this is too much work.
All we really want is a way to compute, for a specific numeric value of x, both f(x) and f’(x). Say the value is x = 2.0. We could evaluate those two expressions, or we could evaluate a Taylor series for f around the point 2.0, whose coefficients would give us f(2) and f’(2) (the coefficient of (x-2)^n is the nth derivative at 2 divided by n!). Or we could do something cleverer.
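For reference (this line is not in the transcript, but follows by ordinary calculus), “those two expressions” are f itself and its symbolic derivative
    f’(x) = 2x*exp(x^2)*tan(x) + exp(x^2)*sec(x)^2 + 1/x,
both evaluated at x = 2.0.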

ADOL, ADOL-C, Fortran (77, 90) or C “program differentiation”
Rewrite (via a compiler?) a more-or-less arbitrary Fortran program that computes f(c) into a program that computes the pair ⟨f(c), f’(c)⟩ for a real value c. How to do this? Two ways: “forward” and “reverse”.

Forward automatic differentiation
In the program to compute f(c), make these substitutions:
- Where we see the expression x, replace it with ⟨c, 1⟩ (the rule d(u) = 1 if u = x).
- Where we see a constant, e.g. 43, use ⟨43, 0⟩ (the rule d(u) = 0 if u does not depend on x).
- Where we see an operation ⟨a, b⟩ + ⟨r, s⟩, use ⟨a+r, b+s⟩ (the rule d(u+v) = du + dv).
- Where we see an operation ⟨a, b⟩ * ⟨r, s⟩, use ⟨a*r, b*r+a*s⟩ (the rule d(u*v) = u*dv + v*du).
- Where we see cos(⟨r, s⟩), use ⟨cos(r), -sin(r)*s⟩ (the rule d(cos(u)) = -sin(u)*du).
- Etc.
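A minimal Python sketch of these substitution rules (the helper names var, const, add, mul, and cos_p are illustrative and not part of the slides); each quantity is represented as the pair ⟨value, derivative⟩:

    import math

    # Illustrative sketch, not from the slides: forward AD on pairs (value, d/dx).
    def var(c):    return (c, 1.0)                      # the variable x:  d(x) = 1
    def const(k):  return (k, 0.0)                      # constants:       d(k) = 0
    def add(u, v): return (u[0] + v[0], u[1] + v[1])    # d(u+v) = du + dv
    def mul(u, v): return (u[0]*v[0], u[0]*v[1] + u[1]*v[0])      # d(u*v) = u*dv + v*du
    def cos_p(u):  return (math.cos(u[0]), -math.sin(u[0])*u[1])  # d(cos u) = -sin(u)*du

    # Example: g(x) = x*cos(x) + 43, evaluated together with g'(x) at x = 2.0
    x = var(2.0)
    g = add(mul(x, cos_p(x)), const(43))    # g is the pair (g(2.0), g'(2.0))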

Two ways to do forward AD
- Rewrite the program so that every variable is now TWO variables, var and var_x; the extra var_x is the derivative.
- Leave the “program” alone and overlay the meaning of all the operations like +, *, cos with generalizations on pairs ⟨ , ⟩.
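The second (overloading) approach can be sketched in Python as follows; the class name Dual and the wrappers exp, tan, log are illustrative, not from the slides. Note that the text of the program for f is left completely alone:

    import math

    class Dual:
        """Illustrative sketch: a pair <val, dx>, a value plus its derivative w.r.t. x."""
        def __init__(self, val, dx=0.0):
            self.val, self.dx = val, dx
        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.dx + other.dx)
        __radd__ = __add__
        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val * other.val,
                        self.val * other.dx + self.dx * other.val)
        __rmul__ = __mul__

    def exp(u): return Dual(math.exp(u.val), math.exp(u.val) * u.dx)
    def tan(u): return Dual(math.tan(u.val), u.dx / math.cos(u.val)**2)
    def log(u): return Dual(math.log(u.val), u.dx / u.val)

    def f(x):                     # the "program" for f, written as usual
        return exp(x*x)*tan(x) + log(x)

    r = f(Dual(2.0, 1.0))         # seed derivative 1 because x is the variable
    # r.val is f(2.0) and r.dx is f'(2.0)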

Does this work “hands off”? (Either method.)
Mostly, but not entirely. Some constructions are not differentiable: Floor, Ceiling, decision points. Some constructions don’t exist in normal Fortran (etc.), e.g. “get the derivative”, needed for use by
- Newton iteration: z_{i+1} := z_i - f(z_i)/f’(z_i)
Is it still useful? Apparently enough to build up a minor industry in enhanced compilers; see … for comprehensive references. If the programs start in Lisp, the transformations are clean and short. See AD.pdf.
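For instance, a Newton iteration that obtains f’(z_i) by forward AD might look like this Python sketch (the pair arithmetic is written inline; the equation solved, z*exp(z) = 2, is just an illustrative choice, not from the slides):

    import math

    # Illustrative sketch: each quantity is a pair (value, derivative w.r.t. z).
    def mul(u, v): return (u[0]*v[0], u[0]*v[1] + u[1]*v[0])
    def sub(u, v): return (u[0] - v[0], u[1] - v[1])
    def exp(u):    return (math.exp(u[0]), math.exp(u[0])*u[1])

    def f(z):                       # f(z) = z*exp(z) - 2, in pair form
        return sub(mul(z, exp(z)), (2.0, 0.0))

    def newton(f, z0, n=20):
        z = z0
        for _ in range(n):
            fz, dfz = f((z, 1.0))   # forward AD yields f(z) and f'(z) together
            z = z - fz/dfz          # z_{i+1} := z_i - f(z_i)/f'(z_i)
        return z

    root = newton(f, 1.0)           # converges to the root of z*exp(z) = 2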

Where is this useful?
A number of numerical methods benefit from having derivatives conveniently available: minimization, root-finding, ODE solving. This generalizes to F(x, y, z), and to partial derivatives of order 2, 3 (or more?).

What about reverse differentiation?
The reverse mode computes derivatives of dependent variables with respect to intermediate variables and propagates from one statement to the previous statement according to the chain rule. Thus, the reverse mode requires a “reversal” of the program execution, or alternatively, a stacking of intermediate values for later access.
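A minimal reverse-mode sketch (the names Var, backward, and the tape layout are illustrative, not from the slides): the forward sweep records each operation and its local partial derivatives on a tape, and the backward sweep walks the tape in reverse, applying the chain rule:

    # Illustrative sketch, not from the slides: tape-based reverse-mode AD.
    class Var:
        """A value that records the operations applied to it on a shared tape."""
        def __init__(self, val, tape):
            self.val, self.grad, self.tape = val, 0.0, tape
        def _record(self, val, parents):    # parents: list of (Var, local partial)
            out = Var(val, self.tape)
            self.tape.append((out, parents))
            return out
        def __add__(self, other):
            return self._record(self.val + other.val, [(self, 1.0), (other, 1.0)])
        def __mul__(self, other):
            return self._record(self.val * other.val,
                                [(self, other.val), (other, self.val)])

    def backward(output):
        output.grad = 1.0
        # Walk the tape backward, pushing each adjoint to its parents (chain rule).
        for out, parents in reversed(output.tape):
            for parent, local in parents:
                parent.grad += out.grad * local

    # y = x1*x2 + x1, so dy/dx1 = x2 + 1 = 5 and dy/dx2 = x1 = 3
    tape = []
    x1, x2 = Var(3.0, tape), Var(4.0, tape)
    y = x1*x2 + x1
    backward(y)
    # now x1.grad == 5.0 and x2.grad == 3.0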

Reverse…
Given independent vars x = {x1, x2, …} and dependent vars y = {y1(x), y2(x), …}, and also temporary vars… Consider the program below (on the slide, the grey derivative annotations are read upward):

    A = f(x1, x2)
    B = g(A, x3)    … we know d/dx1 of B is (∂g/∂A) * (dA/dx1)
    R = A + B       … we know d/dx1 of R is dA/dx1 + dB/dx1; what are they?
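As a concrete worked instance (with illustrative choices f(x1,x2) = x1*x2 and g(A,x3) = A*x3, not from the slides), the reverse sweep runs the statements backward, accumulating adjoints adj(v) = dR/dv:

    R = A + B       gives   adj(A) += 1            and   adj(B)  += 1
    B = g(A, x3)    gives   adj(A) += adj(B)*x3    and   adj(x3) += adj(B)*A
    A = f(x1, x2)   gives   adj(x1) += adj(A)*x2   and   adj(x2) += adj(A)*x1

So adj(A) = 1 + x3 and dR/dx1 = adj(x1) = (1 + x3)*x2, which agrees with differentiating R = x1*x2 + x1*x2*x3 directly.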

Which is better?
Hybrid methods have been advocated to take advantage of both approaches. Reverse has a major time advantage when there are many independent variables; its disadvantage is non-linear memory use, which depends on program flow (loop iterations must be stacked). There are numerous conferences on this subject, and lots of tools and substantial cleverness.