
Lecture 15 Nonlinear Problems Newton’s Method

Syllabus
Lecture 01: Describing Inverse Problems
Lecture 02: Probability and Measurement Error, Part 1
Lecture 03: Probability and Measurement Error, Part 2
Lecture 04: The L2 Norm and Simple Least Squares
Lecture 05: A Priori Information and Weighted Least Squares
Lecture 06: Resolution and Generalized Inverses
Lecture 07: Backus-Gilbert Inverse and the Trade Off of Resolution and Variance
Lecture 08: The Principle of Maximum Likelihood
Lecture 09: Inexact Theories
Lecture 10: Nonuniqueness and Localized Averages
Lecture 11: Vector Spaces and Singular Value Decomposition
Lecture 12: Equality and Inequality Constraints
Lecture 13: L1, L∞ Norm Problems and Linear Programming
Lecture 14: Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15: Nonlinear Problems: Newton's Method
Lecture 16: Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17: Factor Analysis
Lecture 18: Varimax Factors, Empirical Orthogonal Functions
Lecture 19: Backus-Gilbert Theory for Continuous Problems; Radon's Problem
Lecture 20: Linear Operators and Their Adjoints
Lecture 21: Fréchet Derivatives
Lecture 22: Exemplary Inverse Problems, incl. Filter Design
Lecture 23: Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24: Exemplary Inverse Problems, incl. Vibrational Problems

Purpose of the Lecture
Introduce Newton's Method
Generalize it to an Implicit Theory
Introduce the Gradient Method

Part 1 Newton’s Method

Grid search and the Monte Carlo method are completely undirected. The alternative: take directions from the local properties of the error function E(m).

Newton's Method: start with a guess m^(p); near m^(p), approximate E(m) as a parabola and find its minimum; set the new guess to this minimum and iterate.

[Figure: the error curve E(m) with a parabolic approximation about the current estimate m_n^est; the parabola's minimum gives the next estimate m_(n+1)^est, which lies closer to the global minimum m^GM.]

Taylor series approximation for E(m): expand E around a point m^(p).
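The slide's equation is an image that did not survive in the transcript; the standard quadratic expansion it presumably shows is

\[ E(\mathbf{m}) \approx E(\mathbf{m}^{(p)}) + \mathbf{b}^{T}\Delta\mathbf{m} + \tfrac{1}{2}\,\Delta\mathbf{m}^{T}\mathbf{B}\,\Delta\mathbf{m}, \qquad \Delta\mathbf{m} = \mathbf{m} - \mathbf{m}^{(p)}, \]

with \( b_i = \partial E/\partial m_i \) and \( B_{ij} = \partial^2 E/\partial m_i \partial m_j \), both evaluated at \( \mathbf{m}^{(p)} \).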

differentiate and set result to zero to find minimum
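The resulting formula is also missing from the transcript; differentiating the quadratic approximation and setting the result to zero presumably gives

\[ \mathbf{b} + \mathbf{B}\,\Delta\mathbf{m} \approx 0 \quad\Rightarrow\quad \Delta\mathbf{m} = -\mathbf{B}^{-1}\mathbf{b}, \qquad \mathbf{m}^{(p+1)} = \mathbf{m}^{(p)} - \mathbf{B}^{-1}\mathbf{b}. \]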

relate b and B to g(m) linearized data kernel
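For the least squares error \( E(\mathbf{m}) = [\mathbf{d} - \mathbf{g}(\mathbf{m})]^{T}[\mathbf{d} - \mathbf{g}(\mathbf{m})] \), a plausible reconstruction of the missing relations, written with the linearized data kernel \( G_{ij} = \partial g_i/\partial m_j \) evaluated at \( \mathbf{m}^{(p)} \), is

\[ \mathbf{b} = -2\,\mathbf{G}^{T}[\mathbf{d} - \mathbf{g}(\mathbf{m}^{(p)})], \qquad \mathbf{B} \approx 2\,\mathbf{G}^{T}\mathbf{G}, \]

where the approximation to B neglects the term containing second derivatives of g.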

formula for approximate solution
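Combining the two previous results gives the update that the missing equation presumably states, and that the MATLAB script below implements:

\[ \mathbf{m}^{(p+1)} = \mathbf{m}^{(p)} + [\mathbf{G}^{T}\mathbf{G}]^{-1}\mathbf{G}^{T}[\mathbf{d} - \mathbf{g}(\mathbf{m}^{(p)})]. \]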

relate b and B to g(m) very reminiscent of least squares

What do you do if you can't analytically differentiate g(m)? Use finite differences to numerically differentiate g(m) or E(m).

first derivative

with the perturbation vector Δm proportional to [0, ..., 0, 1, 0, ..., 0]^T (the 1 in the i-th position); need to evaluate E(m) M+1 times

second derivative: need to evaluate E(m) about ½M² times
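The finite-difference formulas behind the last three slides are not in the transcript; with a small step ε and unit vector e_i (the Δm of the previous slide), they are presumably

\[ \frac{\partial E}{\partial m_i} \approx \frac{E(\mathbf{m} + \varepsilon\,\mathbf{e}_i) - E(\mathbf{m})}{\varepsilon} \qquad (M+1 \text{ evaluations of } E), \]

\[ \frac{\partial^2 E}{\partial m_i\,\partial m_j} \approx \frac{E(\mathbf{m} + \varepsilon\mathbf{e}_i + \varepsilon\mathbf{e}_j) - E(\mathbf{m} + \varepsilon\mathbf{e}_i) - E(\mathbf{m} + \varepsilon\mathbf{e}_j) + E(\mathbf{m})}{\varepsilon^{2}} \qquad (\text{about } \tfrac{1}{2}M^{2} \text{ more, using } B_{ij} = B_{ji}). \]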

what can go wrong? convergence to a local minimum

[Figure: E(m) with both a local minimum and the global minimum m^GM; starting too close to the local minimum, the iterates m_n^est, m_(n+1)^est converge to it rather than to the global minimum.]

analytically differentiate: sample inverse problem d_i(x_i) = sin(ω₀ m₁ x_i) + m₁ m₂
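For this model the two analytic derivatives, which form the columns of the linearized data kernel G in the script below, are

\[ \frac{\partial d_i}{\partial m_1} = \omega_0 x_i \cos(\omega_0 m_1 x_i) + m_2, \qquad \frac{\partial d_i}{\partial m_2} = m_1. \]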


often, the convergence is very rapid

but sometimes the solution converges to a local minimum and sometimes it even diverges

mg = [1, 1]';
for k = [1:Niter]
    % predicted data and error at the current guess
    dg = sin(w0*mg(1)*x) + mg(1)*mg(2);
    dd = dobs - dg;
    Eg = dd'*dd;
    % linearized data kernel G(i,j) = d g_i / d m_j
    G = zeros(N,2);
    G(:,1) = w0*x.*cos(w0*mg(1)*x) + mg(2);
    G(:,2) = mg(1)*ones(N,1);   % d(m1*m2)/dm2 = m1
    % least squares solution for the model perturbation
    dm = (G'*G)\(G'*dd);
    % update
    mg = mg + dm;
end
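The script assumes that w0, x, dobs, N and Niter already exist in the workspace. A minimal setup sketch follows, purely as an assumption for running it; none of these values appear in the transcript.

% hypothetical setup for the Newton iteration above (all values are assumptions)
N = 40; M = 2;                  % number of data and of model parameters
w0 = 20;                        % angular frequency in the model
x = (0:N-1)'/(N-1);             % auxiliary variable
mtrue = [1.2; 1.5];             % hypothetical true model
dobs = sin(w0*mtrue(1)*x) + mtrue(1)*mtrue(2) + 0.02*randn(N,1);   % noisy synthetic data
y = dobs;                       % the gradient-method script later in the lecture calls the data y
Niter = 20;                     % number of Newton iterations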

Part 2 Newton’s Method for an Implicit Theory

Implicit Theory f(d,m)=0 with Gaussian prediction error and a priori information about m

to simplify the algebra, group d and m into a vector x
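Concretely, the grouping is presumably the standard one:

\[ \mathbf{x} = \begin{bmatrix} \mathbf{d} \\ \mathbf{m} \end{bmatrix}, \qquad \langle\mathbf{x}\rangle = \begin{bmatrix} \mathbf{d}^{obs} \\ \langle\mathbf{m}\rangle \end{bmatrix}, \qquad \mathbf{C}_x = \begin{bmatrix} \mathbf{C}_d & 0 \\ 0 & \mathbf{C}_m \end{bmatrix}, \]

so that the Gaussian on the next slide is \( p(\mathbf{x}) \propto \exp\{-\tfrac{1}{2}(\mathbf{x}-\langle\mathbf{x}\rangle)^{T}\mathbf{C}_x^{-1}(\mathbf{x}-\langle\mathbf{x}\rangle)\} \).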

[Figure: the plane of parameters x₁ and x₂.]

Represent the data and a priori model parameters as a Gaussian p(x). The theory f(x)=0 defines a surface in the space of x. Maximize p(x) on this surface; the maximum likelihood point is x^est.

[Figure, two panels (A, B): the (x₁, x₂) plane showing p(x), the surface f(x)=0, and the estimate (x₁^est, x₂^est); and the distribution p(x₁) with its maximum likelihood point x₁^ML.]

can get local maxima if f(x) is very non-linear

[Figure, two panels (A, B): as above, but with a strongly non-linear f(x)=0, for which p(x₁) has several local maxima.]

mathematical statement of the problem and its solution (using Lagrange multipliers), with F_ij = ∂f_i/∂x_j

mathematical statement of the problem and its solution (using Lagrange multipliers): reminiscent of the minimum length solution

mathematical statement of the problem and its solution (using Lagrange multipliers): oops! x appears in 3 places

Solution: iterate! The new value of x is x^(p+1); the old value is x^(p).
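The solution formulas on these slides are images missing from the transcript. A hedged reconstruction, consistent with the comments that the result resembles the minimum length solution and that x appears in three places: minimize \( (\mathbf{x}-\langle\mathbf{x}\rangle)^{T}\mathbf{C}_x^{-1}(\mathbf{x}-\langle\mathbf{x}\rangle) \) subject to \( \mathbf{f}(\mathbf{x})=0 \), with the iterated solution

\[ \mathbf{x}^{(p+1)} = \langle\mathbf{x}\rangle + \mathbf{C}_x\mathbf{F}^{T}\,[\mathbf{F}\mathbf{C}_x\mathbf{F}^{T}]^{-1}\,\{\mathbf{F}(\mathbf{x}^{(p)} - \langle\mathbf{x}\rangle) - \mathbf{f}(\mathbf{x}^{(p)})\}, \]

where \( F_{ij} = \partial f_i/\partial x_j \) is evaluated at \( \mathbf{x}^{(p)} \).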

special case of an explicit theory, f(x) = d − g(m): equivalent to solving with simple least squares

special case of an explicit theory, f(x) = d − g(m): a weighted least squares generalized inverse with a linearized data kernel

special case of an explicit theory, f(x) = d − g(m): Newton's method, but making E+L small, not just E small

Part 3 The Gradient Method

What if you can compute E(m) and ∂E/∂m_p, but you can't compute ∂g/∂m_p or ∂²E/∂m_p∂m_q?

[Figure: the error curve E(m) with the current estimate m_n^est and the global minimum m^GM.]

(Same figure as above.) You know the direction towards the minimum, but not how far away it is.

The unit vector pointing towards the minimum is ν = −∇E/|∇E|, so an improved solution would be m^(p+1) = m^(p) + αν, if we knew how big to make α.

Armijo's rule provides an acceptance criterion for α: accept it if E(m^(p) + αν) ≤ E(m^(p)) + cα ν^T ∇E(m^(p)), with c ≈ 10⁻⁴. A simple strategy: start with a largish α and divide it by 2 whenever it fails Armijo's rule.

[Figure, three panels: (A) data d versus x; (B) the (m₁, m₂) plane; (C) error E and model parameters m₁, m₂ versus iteration.]

% gradient method: error and its gradient at the trial solution
mgo = [1,1]';
ygo = sin(w0*mgo(1)*x) + mgo(1)*mgo(2);
Ego = (ygo-y)'*(ygo-y);
dydmo = zeros(N,2);
dydmo(:,1) = w0*x.*cos(w0*mgo(1)*x) + mgo(2);
dydmo(:,2) = mgo(1)*ones(N,1);   % d(m1*m2)/dm2 = m1
dEdmo = 2*dydmo'*(ygo-y);
alpha = 0.05;
c1 = 1.0e-4;   % Armijo constant c; value lost in the transcript, taken as ~1e-4 per the slide above
tau = 0.5;     % backstep factor
Niter = 500;
for k = [1:Niter]
    % unit vector pointing downhill
    v = -dEdmo / sqrt(dEdmo'*dEdmo);

    % backstep: shrink alpha until Armijo's rule is satisfied
    for kk = [1:10]
        mg = mgo + alpha*v;
        yg = sin(w0*mg(1)*x) + mg(1)*mg(2);
        Eg = (yg-y)'*(yg-y);
        dydm = zeros(N,2);
        dydm(:,1) = w0*x.*cos(w0*mg(1)*x) + mg(2);
        dydm(:,2) = mg(1)*ones(N,1);   % d(m1*m2)/dm2 = m1
        dEdm = 2*dydm'*(yg-y);
        if( Eg <= (Ego + c1*alpha*v'*dEdmo) )
            break;
        end
        alpha = tau*alpha;
    end

    % change in solution
    Dmg = sqrt( (mg-mgo)'*(mg-mgo) );
    % update the trial solution
    mgo = mg; ygo = yg; Ego = Eg; dydmo = dydm; dEdmo = dEdm;
    if( Dmg < 1.0e-6 )
        break;
    end
end   % closing end of the outer iteration loop, missing in the transcript

often, the convergence is reasonably rapid

exception: when the minimum lies along a long, shallow valley