Least Squares example There are 3 mountains u, y, z whose heights have been measured from one site as 2474 ft., 3882 ft., and 4834 ft. But from u, y looks 1422 ft. taller and z looks 2354 ft. taller, and from y, z looks 950 ft. taller. Set up the overdetermined system Ax ~ b:

    [ 1  0  0 ]            [ 2474 ]
    [ 0  1  0 ]   [ u ]    [ 3882 ]
    [ 0  0  1 ]   [ y ]  ~ [ 4834 ]
    [-1  1  0 ]   [ z ]    [ 1422 ]
    [-1  0  1 ]            [ 2354 ]
    [ 0 -1  1 ]            [  950 ]

Want to minimize ||Ax - b||_2.
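As a hedged illustration (not part of the original slides), the system above can be set up and handed to NumPy's least-squares routine:

```python
import numpy as np

# Rows encode: u, y, z measured directly, then the differences y-u, z-u, z-y.
A = np.array([[ 1, 0, 0],
              [ 0, 1, 0],
              [ 0, 0, 1],
              [-1, 1, 0],
              [-1, 0, 1],
              [ 0,-1, 1]], dtype=float)
b = np.array([2474, 3882, 4834, 1422, 2354, 950], dtype=float)

# Minimize ||Ax - b||_2 for x = (u, y, z).
x, residual, rank, svals = np.linalg.lstsq(A, b, rcond=None)
print(x)   # least-squares estimates of the three heights
```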

Exponential example Assume you are given a small table of data values b observed at times t, which you think fits the model b = u + w e^(-0.5 t) for some values of u and w which you wish to determine. The matrix A would then have one row [1  e^(-0.5 t_i)] per data point, with x = (u, w)^T and b the vector of observations.
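A minimal sketch of how that design matrix could be built; the t and b values below are made up, since the slide's actual data table does not survive in the transcript:

```python
import numpy as np

# Hypothetical sample times and observations (placeholders for the slide's table).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([5.0, 3.5, 2.7, 2.3, 2.1])

# Model b = u + w * exp(-0.5 t): a column of ones and a column of exp(-0.5 t).
A = np.column_stack([np.ones_like(t), np.exp(-0.5 * t)])

uw, *_ = np.linalg.lstsq(A, b, rcond=None)
print(uw)   # fitted (u, w)
```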

Approaches to solve Ax ~ b
Normal equations - quick and dirty
QR - standard in libraries, uses an orthogonal decomposition
SVD - decomposition which also gives an indication of how linearly independent the columns are
Conjugate gradient - no decompositions, good for large sparse problems

Quick and Dirty Approach Multiply by A^T to get the normal equations: A^T A x = A^T b. For the mountain example the matrix A^T A is 3 x 3. The matrix A^T A is symmetric. However, sometimes A^T A can be nearly singular or singular. Consider the matrix

    A = [ 1  1 ]
        [ e  0 ]
        [ 0  e ]

The matrix

    A^T A = [ 1+e^2    1   ]
            [   1    1+e^2 ]

becomes numerically singular if e is less than the square root of the machine precision (1 + e^2 rounds to 1).
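A quick hedged demonstration of that loss of information in floating point; the particular value of e below is chosen only for illustration:

```python
import numpy as np

eps = np.finfo(float).eps          # machine precision, about 2.2e-16
e = 0.5 * np.sqrt(eps)             # slightly below sqrt(machine precision)

A = np.array([[1.0, 1.0],
              [e,   0.0],
              [0.0, e  ]])

AtA = A.T @ A
# In exact arithmetic AtA = [[1+e^2, 1], [1, 1+e^2]] is nonsingular,
# but 1 + e**2 rounds to 1.0, so the computed AtA is exactly singular.
print(AtA)
print(np.linalg.matrix_rank(AtA))  # reports rank 1
print(np.linalg.matrix_rank(A))    # A itself still has full column rank 2
```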

Normal equations for least squares - slide 2 For the hill problem:

    A^T A x = [  3 -1 -1 ] [ u ]   [ -1302 ]
              [ -1  3 -1 ] [ y ] = [  4354 ] = A^T b
              [ -1 -1  3 ] [ z ]   [  8138 ]

Solution is u = 2472, y = 3886, z = 4832.
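A small hedged check of these numbers, reusing the A and b from the first example:

```python
import numpy as np

A = np.array([[ 1, 0, 0], [ 0, 1, 0], [ 0, 0, 1],
              [-1, 1, 0], [-1, 0, 1], [ 0,-1, 1]], dtype=float)
b = np.array([2474, 3882, 4834, 1422, 2354, 950], dtype=float)

AtA = A.T @ A        # [[3,-1,-1],[-1,3,-1],[-1,-1,3]]
Atb = A.T @ b        # [-1302, 4354, 8138]

x = np.linalg.solve(AtA, Atb)
print(AtA, Atb, x)   # x is approximately [2472, 3886, 4832]
```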

QR: A more stable approach Assume the m x n matrix A and the m-vector b can be partitioned into

    A = [ R ]   and   b = [ c ]
        [ 0 ]             [ d ]

where R is a nonsingular n x n matrix and c has length n. Then ||Ax - b||_2^2 = ||Rx - c||_2^2 + ||d||_2^2. So pick x such that Rx = c, i.e. ||Rx - c||_2^2 = 0, which implies ||Ax - b||_2^2 = ||d||_2^2. This is the best one can do.

QR for least squares - slide 2 Most A matrices do not have this form, but using orthogonal transformations we can transform matrices of full column rank to this form. A matrix Q is orthogonal if Q^T Q = I. If Q is orthogonal then for any x, ||Qx||_2^2 = ||x||_2^2, that is, an orthogonal matrix preserves the 2-norm. Example of an orthogonal matrix: the Givens rotation

    [  cos y  sin y ]
    [ -sin y  cos y ]

for some angle y.
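A tiny hedged check that a Givens rotation is orthogonal and preserves the 2-norm (the angle and vector are arbitrary choices):

```python
import numpy as np

theta = 0.37                       # arbitrary angle
c, s = np.cos(theta), np.sin(theta)
Q = np.array([[ c, s],
              [-s, c]])            # a Givens rotation

x = np.array([3.0, -4.0])
print(Q.T @ Q)                     # identity (up to rounding): Q is orthogonal
print(np.linalg.norm(x), np.linalg.norm(Q @ x))   # both equal 5.0
```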

QR for least squares - slide 3 We wish to pick a sequence of orthogonal matrices such that A is transformed into upper triangular form, e.g. for a 4 x 3 matrix:

    [ x x x ]      [ x x x ]      [ x x x ]      [ x x x ]
    [ x x x ]  ->  [ 0 x x ]  ->  [ 0 x x ]  ->  [ 0 x x ]  =  [ R ]
    [ x x x ]      [ 0 x x ]      [ 0 0 x ]      [ 0 0 x ]     [ 0 ]
    [ x x x ]      [ 0 x x ]      [ 0 0 x ]      [ 0 0 0 ]

The transformations are then applied to the data vector b. The solution is now found by solving Rx = c, where c is the first n elements of the transformed b.

QR for least squares - slide 4 Householder transformations of the form I - 2uu^T/(u^T u) can eliminate whole columns at a time. If the A matrix is almost square, the QR approach and the normal equations approach require about the same number of operations. If A is tall and skinny, the QR approach takes about twice the number of operations. Good project: Investigate the structure of R if A is sparse. Most good least squares solvers use the QR approach. In Matlab: x = A\b.
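For illustration, a hedged sketch of the QR route for the mountain problem using NumPy's Householder-based np.linalg.qr; the Matlab backslash above does essentially this internally:

```python
import numpy as np
from scipy.linalg import solve_triangular   # back substitution

A = np.array([[ 1, 0, 0], [ 0, 1, 0], [ 0, 0, 1],
              [-1, 1, 0], [-1, 0, 1], [ 0,-1, 1]], dtype=float)
b = np.array([2474, 3882, 4834, 1422, 2354, 950], dtype=float)

Q, R = np.linalg.qr(A)        # "reduced" QR: Q is 6x3, R is 3x3 upper triangular
c = Q.T @ b                   # the first n elements of the transformed b
x = solve_triangular(R, c)    # solve Rx = c by back substitution
print(x)                      # approximately [2472, 3886, 4832]
```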

Singular Value Decomposition The singular value decomposition (SVD) of a matrix A is given by A = USV^T, where U and V are orthogonal and S is a diagonal matrix with entries s_1 >= s_2 >= ... >= 0, the singular values. If any of the s's are 0, the matrix is singular. Thus one can determine how close A is to a singular matrix by looking at the smallest s's. Good project: Investigate an application of SVD.

SVD for least squares If A is an m x n matrix of rank n, then

    A = U S V^T = [ U_1  U_2 ] [ S_1 ] V^T = U_1 S_1 V^T
                               [  0  ]

where U_1 has the first n columns of U and S_1 is n x n. The solution to the least squares problem Ax ~ b is given by x = V S_1^{-1} U_1^T b. Requires 4 to 10 times more work than QR but shows dependencies in the model.
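A hedged sketch of that formula on the mountain problem, using NumPy's thin SVD:

```python
import numpy as np

A = np.array([[ 1, 0, 0], [ 0, 1, 0], [ 0, 0, 1],
              [-1, 1, 0], [-1, 0, 1], [ 0,-1, 1]], dtype=float)
b = np.array([2474, 3882, 4834, 1422, 2354, 950], dtype=float)

U1, s, Vt = np.linalg.svd(A, full_matrices=False)  # U1: 6x3, s: singular values, Vt: V^T
print(s)                     # smallest entry shows how close A is to rank deficiency
x = Vt.T @ ((U1.T @ b) / s)  # x = V S_1^{-1} U_1^T b
print(x)                     # approximately [2472, 3886, 4832]
```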

Conjugate gradient
Does not require a decomposition of the matrix
Good for large sparse problems, like PET (positron emission tomography)
Iterative method that requires a matrix-vector multiplication by A and by A^T each iteration
In exact arithmetic, for n variables it is guaranteed to converge in n iterations - so 2 iterations for the exponential fit and 3 iterations for the hill problem
Does not zigzag

Conjugate gradient algorithm
Initialization: Guess x. Set r = b - Ax, p = A^T r (p will be a direction), γ = p^T p.
Until convergence:
  Set q = Ap, α = γ / q^T q
  Reset x to x + α p
  Reset r to r - α q
  Set s = A^T r, γ_new = s^T s, β = γ_new / γ, γ = γ_new
  Reset p to s + β p
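A direct, hedged transcription of that loop into Python (CGLS-style); the tolerance and iteration cap are arbitrary choices:

```python
import numpy as np

def cgls(A, b, tol=1e-10, max_iter=100):
    """Conjugate gradient for least squares, following the slide's loop."""
    x = np.zeros(A.shape[1])
    r = b - A @ x
    p = A.T @ r                  # initial direction
    gamma = p @ p
    for _ in range(max_iter):
        q = A @ p
        alpha = gamma / (q @ q)
        x = x + alpha * p
        r = r - alpha * q
        s = A.T @ r
        gamma_new = s @ s
        if gamma_new < tol:      # converged
            break
        beta = gamma_new / gamma
        gamma = gamma_new
        p = s + beta * p
    return x

A = np.array([[ 1, 0, 0], [ 0, 1, 0], [ 0, 0, 1],
              [-1, 1, 0], [-1, 0, 1], [ 0,-1, 1]], dtype=float)
b = np.array([2474, 3882, 4834, 1422, 2354, 950], dtype=float)
print(cgls(A, b))    # about 3 iterations to reach [2472, 3886, 4832]
```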

Nonlinear example Assume in the exponential example that the model was b = u + w e^(-kt), with k added to the list of unknowns. The variable k enters nonlinearly while u and w enter linearly. Possible approaches (see the sketch below):
Treat all three variables as nonlinear and use a nonlinear solver.
Use a 1-dimensional solver for k; each time a function value is requested, solve for u and w using a linear least squares solver, plug the best values for them in, and give back the residual.
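A hedged sketch of the second approach (1-D search over k with an inner linear solve), using made-up data and SciPy's scalar minimizer; the bracket chosen for k is an assumption:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical data generated from u=2, w=3, k=0.7 (for illustration only).
t = np.linspace(0.0, 5.0, 20)
b = 2.0 + 3.0 * np.exp(-0.7 * t)

def residual_norm(k):
    """For a fixed k, solve the linear least squares problem for (u, w)
    and return the norm of the residual."""
    A = np.column_stack([np.ones_like(t), np.exp(-k * t)])
    uw, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.linalg.norm(A @ uw - b)

res = minimize_scalar(residual_norm, bounds=(0.01, 5.0), method='bounded')
k = res.x
A = np.column_stack([np.ones_like(t), np.exp(-k * t)])
u, w = np.linalg.lstsq(A, b, rcond=None)[0]
print(k, u, w)    # should recover roughly k=0.7, u=2, w=3
```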

One dimensional minimization of f(x) Assumption: given an interval [a,b] where one "knows" there is a minimum, f is unimodal on [a,b], i.e. there is only one local minimum. With this assumption we can find the minimum by sampling and discarding portions of the interval that cannot contain the solution. The best algorithms are combinations of golden section search and quadratic interpolation through 3 points, taking the minimum of the quadratic (Brent's method).

Golden section on -(2x^3 - 9x^2 + 12x + 6) Originally a = 0, x_1 = .7639, x_2 = 1.2361, b = 2.0. [The slide then tabulated the subsequent values of a and b as the interval shrinks.]
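A hedged implementation of golden section search applied to that function, printing a and b each iteration in the spirit of the slide's table; the tolerance is an arbitrary choice:

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Golden section search for the minimum of a unimodal f on [a, b]."""
    rho = (math.sqrt(5.0) - 1.0) / 2.0      # about 0.618
    x1, x2 = b - rho * (b - a), a + rho * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                         # minimum cannot lie in [a, x1]
            a, x1, f1 = x1, x2, f2
            x2 = a + rho * (b - a)
            f2 = f(x2)
        else:                               # minimum cannot lie in [x2, b]
            b, x2, f2 = x2, x1, f1
            x1 = b - rho * (b - a)
            f1 = f(x1)
        print(a, b)                         # subsequent values of a and b
    return (a + b) / 2.0

f = lambda x: -(2*x**3 - 9*x**2 + 12*x + 6)
print(golden_section(f, 0.0, 2.0))          # minimizer is x = 1
```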

Unconstrained minimization of f(x) where x has n elements
Steepest descent - requires first derivatives (gradient); might zigzag; good beginning strategy; sped up by conjugate gradient
Newton - requires first and second derivatives (Hessian); requires solution of a linear system with n variables; fast if close to the solution
Quasi-Newton (most practical) - requires first derivatives; no linear system to solve; builds up an approximation to the inverse of the Hessian

Newton's method for minimization Let g = gradient, H = matrix of second partials. Taylor's Theorem: f(x+s) ≈ f(x) + g^T s + (1/2) s^T H s. This quadratic function in s is minimized when s = -H^{-1} g.
Algorithm: guess x
Until convergence:
  Solve H(x) s = -g(x)   {needs H and a linear solver}
  Reset x to x + s
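A hedged sketch of that loop on a small test function; the function, starting point, and stopping test are illustrative choices, not from the slides:

```python
import numpy as np

# Illustrative smooth function: f(x) = x0^4 + x0*x1 + (1 + x1)^2
def grad(x):
    return np.array([4*x[0]**3 + x[1],
                     x[0] + 2*(1 + x[1])])

def hess(x):
    return np.array([[12*x[0]**2, 1.0],
                     [1.0,        2.0]])

x = np.array([0.75, -1.25])          # initial guess
for _ in range(50):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:    # converged
        break
    s = np.linalg.solve(hess(x), -g) # solve H(x) s = -g(x)
    x = x + s                        # Newton step
print(x)
```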

Quasi-Newton Method Builds up an approximation B to the inverse of the Hessian in the directions that have been searched. Almost as fast as Newton.
Initial: pick x, set B = I.
Until convergence:
  set s = -Bg  (no linear system to solve)
  set x_new = x + s
  let γ = g(x_new) - g(x); δ = x_new - x; x = x_new
  reset B to B + δδ^T/(δ^T γ) - (Bγ)(Bγ)^T/(γ^T B γ)
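A hedged transcription of that update (a DFP-style inverse-Hessian update) on the same illustrative function as the Newton sketch above; in practice a line search would accompany the step s = -Bg, which is omitted here:

```python
import numpy as np

def grad(x):  # same illustrative function as the Newton sketch
    return np.array([4*x[0]**3 + x[1],
                     x[0] + 2*(1 + x[1])])

x = np.array([0.75, -1.25])
B = np.eye(2)                        # approximation to the inverse Hessian
g = grad(x)
for _ in range(100):
    if np.linalg.norm(g) < 1e-8:
        break
    s = -B @ g                       # no linear system to solve
    x_new = x + s
    g_new = grad(x_new)
    gamma = g_new - g
    delta = x_new - x
    Bg = B @ gamma
    B = (B + np.outer(delta, delta) / (delta @ gamma)
           - np.outer(Bg, Bg) / (gamma @ Bg))
    x, g = x_new, g_new
print(x)
```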

Comparison of steepest descent, Newton, and quasi-Newton on f(x) = 0.5x_1^2 + x_2^2 [table of iterates: Iteration | Steepest | Newton | Quasi-Newton]

Large Scale Problems Conjugate gradient vs. Limited Memory Quasi-Newton
Conjugate gradient - each step is a linear combination of the previous step and the current gradient
Limited Memory (Nocedal, Schnabel, Byrd, Kaufman) - do not multiply B out but keep the vectors; need to keep 2 vectors per iteration; after k steps (k is about 5) reset B to I and start again
Experimentation favors LMQN over CG
Good project: How should LMQN be done?
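For comparison, a hedged snippet showing both options through SciPy's generic interface (limited-memory quasi-Newton as 'L-BFGS-B', nonlinear conjugate gradient as 'CG'); the test function is the illustrative one used in the earlier sketches:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0]**4 + x[0]*x[1] + (1 + x[1])**2

def grad(x):
    return np.array([4*x[0]**3 + x[1], x[0] + 2*(1 + x[1])])

x0 = np.array([0.75, -1.25])
lbfgs = minimize(f, x0, jac=grad, method='L-BFGS-B')   # limited-memory quasi-Newton
cg    = minimize(f, x0, jac=grad, method='CG')         # nonlinear conjugate gradient
print(lbfgs.x, lbfgs.nit)
print(cg.x, cg.nit)
```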