6.6 The Marquardt Algorithm

- limitations of the gradient and Taylor expansion methods
- recasting the Taylor expansion in terms of chi-square derivatives
- recasting the gradient search into an iterative matrix formalism
- Marquardt's algorithm, which automatically combines the gradient and Taylor expansion methods
- program description and example output for the Gaussian peak

Limitations of Previous Methods

The gradient search works well when the slope of chi-square space is steep, but it flounders near the minimum because the gradient approaches zero. A second-order Taylor series expansion works best when the initial guesses lie within a region of chi-square space having positive curvature; when the initial estimates lie in a region of negative curvature, the iterative procedure diverges.

The Marquardt algorithm automatically switches between the gradient search and the Taylor expansion to enhance convergence to the minimum in chi-square space.

Taylor Series Derivatives

One of the biggest drawbacks of the second-order Taylor expansion is the need to provide functional forms for the first and second derivatives, which is especially awkward when writing a computer program. Marquardt showed that the analytical derivatives can be replaced by numeric differentiation of chi-square. Marquardt also pointed out that the Δy_i term in the second-derivative expression sums to a small value as N increases, since the deviations have a pdf with a mean of zero.
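The code below is a minimal Python sketch of this idea, with names that mirror the worksheet functions chisqr, beta, and alpha described later in this section; the model argument f, the sigma weighting, the finite-difference formulas, and the 0.5 factors (matching the convention β_r = −0.5 ∂χ²/∂a_r) are assumptions for illustration, not the worksheet's actual implementation.

```python
import numpy as np

def chisqr(y, x, a, f, sigma=1.0):
    """Chi-square of the model f(x, a) against the data (x, y)."""
    r = (y - f(x, a)) / sigma
    return np.sum(r ** 2)

def beta(y, x, a, f, d=1e-3):
    """beta_r = -0.5 * d(chi^2)/d(a_r), by central differences of chi-square."""
    a = np.asarray(a, dtype=float)
    b = np.zeros(a.size)
    for r in range(a.size):
        ap, am = a.copy(), a.copy()
        ap[r] += d
        am[r] -= d
        b[r] = -0.5 * (chisqr(y, x, ap, f) - chisqr(y, x, am, f)) / (2 * d)
    return b

def alpha(y, x, a, f, d=1e-3):
    """alpha_{r,c} = 0.5 * d^2(chi^2)/(d a_r d a_c), by forward differences."""
    a = np.asarray(a, dtype=float)
    n = a.size
    A = np.zeros((n, n))
    c0 = chisqr(y, x, a, f)
    for r in range(n):
        for c in range(n):
            arc = a.copy(); arc[r] += d; arc[c] += d
            ar = a.copy(); ar[r] += d
            ac = a.copy(); ac[c] += d
            A[r, c] = 0.5 * (chisqr(y, x, arc, f) - chisqr(y, x, ar, f)
                             - chisqr(y, x, ac, f) + c0) / d ** 2
    return A
```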

Gradient Search Matrix Formalism

The second important thing Marquardt recognized was that the gradient search can be written as an iterative matrix problem, just like the Taylor series expansion. The matrix solution is δ = Δa·β, where δ and β are the previously defined vectors and Δa is a diagonal matrix containing the step sizes along each coefficient axis in chi-square space.

Since χ² is unitless, each β_r has units of 1/a_r. Since each δ_r must have units of a_r, each step size Δa_{r,r} has to have units of a_r². The most natural choice for the step size is one proportional to the coefficient variance, (α⁻¹)_{r,r}. Finally, the step should be some small fraction of the variance, (α⁻¹)_{r,r}/λ, where λ > 1. The Δa matrix is then obtained by multiplying the diagonal of α by the constant λ, with λ serving the same role as the factor f in the gradient search.
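As a concrete illustration of this diagonal step-size matrix, the gradient step can be coded as a matrix solve; this is a sketch only, and the function name and default λ are illustrative rather than taken from the worksheet.

```python
import numpy as np

def gradient_step(alpha_mat, beta_vec, lam=1e3):
    """Pure gradient step written as a matrix problem:
    (lambda * diag(alpha)) . delta = beta, i.e. delta = Da . beta,
    with Da diagonal and Da_{r,r} roughly (alpha^-1)_{r,r} / lambda."""
    Da_inv = lam * np.diag(np.diag(alpha_mat))  # inverse of the step-size matrix
    return np.linalg.solve(Da_inv, beta_vec)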

Gradient Search Matrix Formalism

A quick review of the gradient search: the gradient of χ² has components ∂χ²/∂a_r. The step size is proportional to the magnitude of the gradient, weighted by the empirical factor f = 0.01 to 0.001. But (remarkably) the same step can be written as the matrix equation δ = Δa·β, where Δa is a diagonal matrix.
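In the notation introduced on the previous slide, these relations can be written as follows (a reconstruction; the exact form on the original slide may differ):

```latex
\beta_r \equiv -\tfrac{1}{2}\,\frac{\partial \chi^2}{\partial a_r},
\qquad
\delta_r \propto -f\,\frac{\partial \chi^2}{\partial a_r} = 2 f \beta_r,
\qquad
\boldsymbol{\delta} = \Delta a \, \boldsymbol{\beta}
```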

Matrix Solution

Marquardt [Journal of the Society for Industrial and Applied Mathematics, 1963, vol. 11, pp. 431-441] demonstrated that the second-order expansion and the gradient search could be combined into one mathematical operation. To do this, define a new matrix α' with the same off-diagonal elements as α, α'_{r,c} = α_{r,c}, and with diagonal elements α'_{r,r} = α_{r,r}(1 + λ). When λ << 1, the solution corresponds to a Taylor expansion; when λ >> 1, the solution corresponds to a gradient search. The calculation uses the standard iterative procedure, solving α'·δ = β and setting a_new = a_cur + δ, where the subscript "cur" denotes the current guesses and the subscript "new" denotes the improved estimate of the coefficient values. Iteration involves substituting the new guesses for the current guesses and repeating the matrix algebra.
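A minimal sketch of this combined step in Python; setl mirrors the worksheet's function name, and the (1 + λ) scaling of the diagonal is the standard Marquardt prescription assumed here.

```python
import numpy as np

def setl(alpha_mat, lam):
    """alpha-prime: off-diagonal elements unchanged, diagonal scaled by (1 + lambda)."""
    ap = alpha_mat.copy()
    np.fill_diagonal(ap, np.diag(alpha_mat) * (1.0 + lam))
    return ap

def update(alpha_mat, beta_vec, a_cur, lam):
    """One iteration: solve alpha' . delta = beta and return a_new = a_cur + delta."""
    delta = np.linalg.solve(setl(alpha_mat, lam), beta_vec)
    return np.asarray(a_cur, dtype=float) + delta
```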

The Algorithm

1. Compute chi-square at the initial guesses.
2. Assemble the α matrix and β vector using the partial derivatives.
3. Start with λ equal to a small number, say 10⁻³, and assemble the α' matrix - the fit will start as a Taylor expansion.
4. Solve the matrix equation for a_new and compute chi-square at a_new.
5. (a) If χ²(a_new) > χ²(a_cur), multiply λ by 10, reassemble α', and repeat step 4 with a_cur - make the fit more like a gradient search.
   (b) If χ²(a_new) < χ²(a_cur), divide λ by 10, reassemble α', and repeat step 4 with a_cur = a_new - make the fit more like an expansion.
   (c) If χ²(a_new) ≅ χ²(a_cur), stop the iterations and use a_cur as estimates of the coefficients - the minimum has been found.
6. Set λ = 0 and compute α⁻¹ at a_cur. Use the diagonal elements to obtain the variances of the coefficients.

A sketch of this loop is given below.
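The loop below is a compact Python sketch of these six steps, reusing the chisqr, beta, alpha, and setl helpers sketched earlier; the convergence tolerance and iteration cap are assumptions rather than values taken from the worksheet.

```python
import numpy as np

def marquardt(y, x, a0, f, d=1e-3, tol=1e-6, max_iter=200):
    """Levenberg-Marquardt loop following steps 1-6 above."""
    a_cur = np.asarray(a0, dtype=float)
    chi_cur = chisqr(y, x, a_cur, f)              # step 1
    lam = 1e-3                                    # step 3
    for it in range(max_iter):
        b = beta(y, x, a_cur, f, d)               # step 2
        A = alpha(y, x, a_cur, f, d)
        delta = np.linalg.solve(setl(A, lam), b)  # step 4
        a_new = a_cur + delta
        chi_new = chisqr(y, x, a_new, f)
        if abs(chi_new - chi_cur) < tol:          # step 5c: converged
            break
        if chi_new >= chi_cur:                    # step 5a: more gradient-like
            lam *= 10.0
        else:                                     # step 5b: more expansion-like
            lam /= 10.0
            a_cur, chi_cur = a_new, chi_new
    cov_scale = np.linalg.inv(alpha(y, x, a_cur, f, d))  # step 6: lambda = 0
    return a_cur, chi_cur, lam, it + 1, cov_scale
```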

Marquardt Program Description

A program that automatically computes the minimum in chi-square space is shown in the Mathcad worksheet, "6.6 Marquardt Algorithm.mcd". It uses six functions with a variety of inputs:
- f(x,a): inputs are the x-data and the coefficient guesses, a; output is the corresponding y-value as a scalar.
- chisqr(y,x,a): inputs are the x- and y-data and the a-coefficients; output is chi-square at the location given by a.
- beta(y,x,a,D): inputs are the data, the coefficients, and the step size D used to compute the derivatives; output is the β vector, which contains −0.5·∂χ²/∂a_r.
- alpha(y,x,a,D): computes the derivatives; output is the α matrix, which contains the second-order derivatives of chi-square.
- setl(a,l): computes the α' matrix given α and λ.
- marquart(y,x,a,D): executes the Marquardt algorithm; output is a vector containing the coefficient values at the minimum, the value of chi-square at the minimum, the final value of λ, and the number of required iterations.
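As a usage illustration only, here is a hypothetical two-coefficient Gaussian peak model in the same spirit as f(x,a). The choice of a0 as amplitude, a1 as center, and the fixed unit width are guesses; the worksheet's actual functional form and data are not reproduced here.

```python
import numpy as np

def f(x, a):
    """Hypothetical Gaussian peak: a[0] = amplitude, a[1] = center, fixed width (assumed)."""
    width = 1.0
    return a[0] * np.exp(-0.5 * ((x - a[1]) / width) ** 2)

# With measured arrays x and y (not reproduced here), the fit would be run as:
# a_fit, chi_min, lam_final, n_iter, cov = marquardt(y, x, [2.0, 51.0], f, d=1e-3)
```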

Marquardt Program Output

Start with the initial guesses a0 = 2 and a1 = 51, and Δ = 0.001 (the largest value of Δ giving a stable derivative). Three iterations were required. λ went from 0.001 to 0.13, staying well below 1, which means the search always behaved essentially as a Taylor expansion. The coefficients have the lowest χ² that we have seen.

Test the function in a region of negative curvature, a0 = 6 and a1 = 50.94. The large number of iterations indicates that the algorithm spent a significant fraction of the time performing a gradient search.

Coefficient Errors

The coefficients were set to those at the minimum, a0 = 2.1402 and a1 = 50.9376, and the alpha matrix was recalculated with λ = 0. The α⁻¹ matrix yielded the diagonal elements (α⁻¹)_{0,0} = 47.06 and (α⁻¹)_{1,1} = 70.482. The estimated variance of the fit was s² = 2.73×10⁻⁵. The coefficient variances can be computed from these two quantities.

All of the methods gave statistically indistinguishable results for the least-squares coefficients. The Taylor expansion and the Marquardt algorithm gave statistically indistinguishable standard deviations for the coefficients. The grid search is slow but easy to implement manually; the Marquardt algorithm is the most tolerant of bad initial guesses.
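A sketch of this error calculation, assuming the standard unweighted least-squares relation σ²(a_r) = s²·(α⁻¹)_{r,r} and reusing the helpers sketched earlier; the worksheet's exact formulas may differ.

```python
import numpy as np

def coefficient_errors(y, x, a_min, f, d=1e-3):
    """Standard deviations of the coefficients at the chi-square minimum."""
    n, m = len(y), len(a_min)
    A_inv = np.linalg.inv(alpha(y, x, a_min, f, d))   # alpha recomputed with lambda = 0
    s2 = chisqr(y, x, a_min, f) / (n - m)             # estimated variance of the fit
    return np.sqrt(s2 * np.diag(A_inv))
```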