Derivative-Based Optimization
Dan Simon, Cleveland State University
Jang, Sun, and Mizutani, Neuro-Fuzzy and Soft Computing, Chapter 6

Outline
1. Gradient Descent
2. The Newton-Raphson Method
3. The Levenberg–Marquardt Algorithm
4. Trust Region Methods

Gradient descent: head downhill. (Figure: contour plot.)

Fuzzy controller optimization: Find the MF parameters that minimize the tracking error:
    min E(θ) with respect to θ
where θ is the n-element vector of MF parameters and E(θ) is the controller tracking error.
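
Gradient descent takes steps in the direction of the negative gradient: θ_{k+1} = θ_k − α ∇E(θ_k). A minimal MATLAB sketch of that loop (not from the slides; the function name graddescent is arbitrary, and gradE, theta0, alpha, and maxIter are placeholders the caller supplies):

function theta = graddescent(gradE, theta0, alpha, maxIter)
% Minimize E(theta) by gradient descent: theta <- theta - alpha*gradE(theta)
% (graddescent is a hypothetical name; gradE is a user-supplied gradient handle)
theta = theta0;
for k = 1:maxIter
    g = gradE(theta);              % gradient of the tracking error at theta
    theta = theta - alpha*g;       % step downhill
    if norm(g) < 1e-6, break; end  % stop when the gradient is (nearly) zero
end
end

For example, graddescent(@(t) [2*t(1); 20*t(2)], [2; 1], 0.05, 1000) minimizes x^2 + 10*y^2.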

MATLAB code for the contour plot of x² + 10y²:

x = -3 : 0.1 : 3;
y = -2 : 0.1 : 2;
for i = 1:length(x)
    for j = 1:length(y)
        z(i,j) = x(i)^2 + 10*y(j)^2;
    end
end
contour(x, y, z')   % transpose z so rows correspond to y, as contour expects

α too small: convergence takes a long time
α too large: overshoot the minimum
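
As an illustration (a sketch, not from the slides), running gradient descent on E = x² + 10y² with three fixed step sizes shows both failure modes; since the gradient is [2x; 20y], any α larger than 0.1 makes the y-component diverge:

gradE = @(p) [2*p(1); 20*p(2)];       % gradient of E = x^2 + 10*y^2 at p = [x; y]
for alpha = [0.001 0.05 0.11]         % too small, reasonable, too large
    p = [2; 1];                       % starting point
    for k = 1:100
        p = p - alpha*gradE(p);       % gradient descent update
    end
    fprintf('alpha = %.3f: final E = %g\n', alpha, p(1)^2 + 10*p(2)^2);
end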

Gradient descent is a local optimization method. (Figure: the Rastrigin function, a standard test function with many local minima.)

Step Size Selection
How should we select the step size α_k?
α_k too small: convergence takes a long time
α_k too large: overshoot the minimum
Line minimization: choose α_k to minimize E along the search direction, α_k = argmin_{α ≥ 0} E(θ_k − α ∇E(θ_k))
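
A sketch of gradient descent with line minimization (not from the slides), using MATLAB's fminbnd for the one-dimensional minimization; the search interval [0, 1] for α is an assumption, not something prescribed by the method:

E     = @(t) t(1)^2 + 10*t(2)^2;       % example objective
gradE = @(t) [2*t(1); 20*t(2)];
theta = [2; 1];
for k = 1:20
    d = -gradE(theta);                 % steepest descent direction
    phi = @(alpha) E(theta + alpha*d); % E restricted to the search line
    alphaK = fminbnd(phi, 0, 1);       % 1-D minimization over the step size
    theta = theta + alphaK*d;
end
disp(theta)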

Step Size Selection
Recall the general Taylor series expansion: f(x) = f(x₀) + f′(x₀)(x − x₀) + … Therefore,
    E(θ_{k+1}) ≈ E(θ_k) + ∇E(θ_k)ᵀ (θ_{k+1} − θ_k) + (1/2) (θ_{k+1} − θ_k)ᵀ H (θ_{k+1} − θ_k)
The minimum of E(θ_{k+1}) occurs when its derivative with respect to θ_{k+1} is 0, which means that:
    ∇E(θ_k) + H (θ_{k+1} − θ_k) = 0, that is, θ_{k+1} = θ_k − H⁻¹ ∇E(θ_k)

Step Size Selection
Compare the two equations:
    gradient descent: θ_{k+1} = θ_k − α_k ∇E(θ_k)
    from the Taylor expansion: θ_{k+1} = θ_k − H⁻¹ ∇E(θ_k)
We see that choosing the step size as α_k = H⁻¹ gives Newton's method, also called the Newton-Raphson method.

Jang, Fig. 6.3 – The Newton-Raphson method approximates E(θ) as a quadratic function.

Step Size Selection
The Newton-Raphson method: θ_{k+1} = θ_k − H⁻¹ ∇E(θ_k)
The matrix of second derivatives H is called the Hessian (after Otto Hesse, 1800s).
How easy or difficult is it to calculate the Hessian?
What if the Hessian is not invertible?
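
A one-step Newton-Raphson sketch on the quadratic E = x² + 10y² (not from the slides). Its Hessian is the constant matrix diag(2, 20), so a single Newton step lands exactly on the minimum:

gradE = @(t) [2*t(1); 20*t(2)];   % gradient of E = x^2 + 10*y^2
H     = [2 0; 0 20];              % Hessian (constant for a quadratic)
theta = [2; 1];
theta = theta - H \ gradE(theta); % Newton step; '\' solves H*dt = g without forming inv(H)
disp(theta)                       % prints the minimum, [0; 0]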

Step Size Selection
The Levenberg–Marquardt algorithm: θ_{k+1} = θ_k − (H + λI)⁻¹ ∇E(θ_k)
λ is a parameter selected to balance between steepest descent (λ = ∞) and Newton-Raphson (λ = 0).
We can also control the step size with another parameter η: θ_{k+1} = θ_k − η (H + λI)⁻¹ ∇E(θ_k)
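
A minimal Levenberg–Marquardt-style loop on the same quadratic example (a sketch, not from the slides; multiplying or dividing λ by 10 after a bad or good step is one common heuristic, not necessarily the schedule used in the text):

E      = @(t) t(1)^2 + 10*t(2)^2;
gradE  = @(t) [2*t(1); 20*t(2)];
H      = [2 0; 0 20];
theta  = [2; 1];  lambda = 1;  eta = 1;
for k = 1:20
    step  = (H + lambda*eye(2)) \ gradE(theta);  % Levenberg-Marquardt direction
    trial = theta - eta*step;
    if E(trial) < E(theta)
        theta = trial;  lambda = lambda/10;      % good step: behave more like Newton-Raphson
    else
        lambda = lambda*10;                      % bad step: behave more like steepest descent
    end
end
disp(theta)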

Jang, Fig. 6.5 – Illustration of Levenberg–Marquardt gradient descent.

Step Size Selection
Trust region methods: These are used in conjunction with the Newton-Raphson method, which approximates E as quadratic in Δθ:
    E(θ_k + Δθ) ≈ E(θ_k) + ∇E(θ_k)ᵀ Δθ + (1/2) Δθᵀ H Δθ
If we use Newton-Raphson to minimize E with a step size of Δθ_k, then we are implicitly assuming that E(θ_k + Δθ_k) will be equal to the above expression.

Step Size Selection
Actually, we expect the actual improvement in E to be somewhat smaller than the improvement predicted by the quadratic approximation:
    ρ_k = (actual improvement) / (predicted improvement)
We limit the step size Δθ_k so that |Δθ_k| < R_k.
Our “trust” in the quadratic approximation is proportional to ρ_k.

Step Size Selection
ρ_k = ratio of actual to expected improvement
R_k = trust region radius: the maximum allowable size of Δθ_k
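
A sketch of how ρ_k and R_k fit together (standard trust-region bookkeeping, not taken verbatim from the slides): compute the Newton step, clip it to the radius R_k, compare actual and predicted improvement, and then grow or shrink R_k:

E      = @(t) t(1)^2 + 10*t(2)^2;
gradE  = @(t) [2*t(1); 20*t(2)];
H      = [2 0; 0 20];
theta  = [2; 1];   R = 0.5;                        % current iterate and trust region radius
for k = 1:20
    g  = gradE(theta);
    dt = -(H \ g);                                 % Newton step
    if norm(dt) > R, dt = dt*(R/norm(dt)); end     % enforce |dt| <= R
    predicted = -(g'*dt + 0.5*dt'*H*dt);           % improvement predicted by the quadratic model
    actual    = E(theta) - E(theta + dt);          % improvement actually obtained
    rho = actual/predicted;                        % rho_k: trust ratio
    if rho > 0.75, R = 2*R; end                    % model is trustworthy: expand the region
    if rho < 0.25, R = R/2; end                    % model is poor: shrink the region
    if rho > 0, theta = theta + dt; end            % accept the step only if E improved
end
disp(theta)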
