CS5321 Numerical Optimization

03 Line Search Methods

Line search method
Given a point $x_k$, find a descent direction $p_k$; then find a step length $\alpha$ that minimizes $f(x_k + \alpha p_k)$.
- A descent direction $p_k$ is one whose directional derivative satisfies $\nabla f(x_k)^T p_k < 0$.
- In Newton's method, $p_k = -H_k^{-1} g_k$, where $H_k = \nabla^2 f(x_k)$ is the Hessian and $g_k = \nabla f(x_k)$ is the gradient.
- If $H_k$ is positive definite, then $\nabla f(x_k)^T p_k = -g_k^T H_k^{-1} g_k < 0$.
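As a concrete illustration (a sketch, not part of the slides), the NumPy snippet below computes the Newton direction by solving $H_k p_k = -g_k$ and checks the descent property; the example function is hypothetical:

```python
import numpy as np

def newton_direction(hess, grad):
    """Solve H p = -g for the Newton direction (assumes H is positive definite)."""
    p = np.linalg.solve(hess, -grad)
    assert grad @ p < 0, "not a descent direction; H may not be positive definite"
    return p

# Example: f(x) = x0^2 + 2*x1^2 at x = (1, 1)
H = np.array([[2.0, 0.0], [0.0, 4.0]])   # Hessian of f (constant here)
g = np.array([2.0, 4.0])                 # gradient of f at (1, 1)
p = newton_direction(H, g)               # p = (-1, -1), pointing at the minimizer
print(p, g @ p)                          # directional derivative is negative
```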

Modified Newton's methods
If the Hessian $H$ is indefinite, ill-conditioned, or singular, modified Newton's methods instead compute the inverse of $H + E$ for a correction matrix $E$ such that $-(H+E)^{-1}\nabla f(x_k)$ is a descent direction. The matrix $E$ can be obtained via:
1. Eigenvalue modification
2. Diagonal shift
3. Modified Cholesky factorization
4. Modified symmetric indefinite factorization

1. Eigenvalue modification
Modify the eigenvalues of $H$ so that $H + E$ becomes positive definite.
- Let $H = Q \Lambda Q^T$ be the spectral decomposition of $H$, with $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ and $\lambda_1 \ge \cdots \ge \lambda_i > 0 \ge \lambda_{i+1} \ge \cdots \ge \lambda_n$.
- Define $\Delta\Lambda = \mathrm{diag}(0, \ldots, 0, \delta_{i+1}, \ldots, \delta_n)$ such that $\Lambda + \Delta\Lambda > 0$; then $E = Q\,\Delta\Lambda\,Q^T$.
- Problem: the eigenvalue decomposition is too expensive.
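A direct (and deliberately naive) NumPy sketch of this idea, raising every eigenvalue below a small threshold `delta`; the threshold value is an illustrative choice:

```python
import numpy as np

def eigenvalue_modification(H, delta=1e-8):
    """Build E = Q diag(max(delta - lam_j, 0)) Q^T so H + E is positive definite.

    This is the O(n^3) spectral approach the slide warns is too expensive.
    """
    lam, Q = np.linalg.eigh(H)               # H = Q diag(lam) Q^T
    shift = np.maximum(delta - lam, 0.0)     # nonzero only where lam_j < delta
    return (Q * shift) @ Q.T                 # Q diag(shift) Q^T

H = np.array([[1.0, 2.0], [2.0, 1.0]])       # eigenvalues 3 and -1 (indefinite)
E = eigenvalue_modification(H)
print(np.linalg.eigvalsh(H + E))             # all eigenvalues now >= delta
```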

2. Diagonal shift
- If $\Delta\Lambda = \mathrm{diag}(\tau, \ldots, \tau) = \tau I$, then $E = Q\,\Delta\Lambda\,Q^T = \tau Q Q^T = \tau I$.
- To make $H + E$ positive definite, one only needs $\tau > |\lambda_{\min}|$, where $\lambda_{\min}$ is the most negative eigenvalue of $H$.
- How can $\tau$ be chosen without explicitly performing the eigenvalue decomposition? $H + \tau I$ is positive definite exactly when it has a Cholesky factorization. So guess a shift $\tau$, attempt the Cholesky factorization of $H + \tau I$, and if it fails, increase $\tau$ and try again until it succeeds.
- The choice of the increment is heuristic.
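A sketch of this trial-and-error loop, in the spirit of Algorithm 3.3 in Nocedal and Wright; the initial shift and growth factor below are tuning choices, not fixed by the slides:

```python
import numpy as np

def cholesky_with_added_identity(H, beta=1e-3, factor=2.0, max_tries=60):
    """Find tau >= 0 so that H + tau*I is positive definite, by trial Cholesky."""
    n = H.shape[0]
    min_diag = np.min(np.diag(H))
    tau = 0.0 if min_diag > 0 else -min_diag + beta   # heuristic initial shift
    for _ in range(max_tries):
        try:
            L = np.linalg.cholesky(H + tau * np.eye(n))
            return tau, L                    # success: H + tau*I = L L^T
        except np.linalg.LinAlgError:
            tau = max(factor * tau, beta)    # failed: increase the shift, retry
    raise RuntimeError("no suitable shift found")

H = np.array([[1.0, 2.0], [2.0, 1.0]])       # indefinite example
tau, L = cholesky_with_added_identity(H)
print(tau)                                   # some tau > |lambda_min| = 1
```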

3. Modified Cholesky factorization
- An SPD matrix $H$ can be factored as $H = LDL^T$, where $L$ is a unit lower triangular matrix and $D$ is diagonal with positive elements.
- Relation to the Cholesky factorization $H = MM^T$: $M = LD^{1/2}$.
- If $H$ is not SPD, modify the elements of $L$ and $D$ during the factorization so that $D(j,j) \ge \delta > 0$ and $|L(i,j)\,D(j,j)^{1/2}| \le \beta$.
- The factorization can then be used directly for solving linear systems.
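A simplified Gill-Murray-style sketch of this modified $LDL^T$ factorization; the parameters `delta` and `beta` are illustrative (production codes derive $\beta$ from norms of $H$):

```python
import numpy as np

def modified_ldl(H, delta=1e-8, beta=10.0):
    """Return unit lower-triangular L and diagonal d with d[j] >= delta and
    |L[i,j]*sqrt(d[j])| <= beta, so L @ diag(d) @ L.T = H + E, E diagonal >= 0."""
    n = H.shape[0]
    L, d = np.eye(n), np.zeros(n)
    C = np.zeros((n, n))                       # column quantities c_ij
    for j in range(n):
        C[j, j] = H[j, j] - np.dot(d[:j], L[j, :j] ** 2)
        for i in range(j + 1, n):
            C[i, j] = H[i, j] - np.sum(d[:j] * L[i, :j] * L[j, :j])
        theta = np.max(np.abs(C[j + 1:, j])) if j < n - 1 else 0.0
        d[j] = max(abs(C[j, j]), (theta / beta) ** 2, delta)   # the modification
        L[j + 1:, j] = C[j + 1:, j] / d[j]
    return L, d

H = np.array([[1.0, 2.0], [2.0, 1.0]])
L, d = modified_ldl(H)
print(L @ np.diag(d) @ L.T - H)                # E: a nonnegative diagonal shift
```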

4. Modified symmetric indefinite factorization
- The symmetric indefinite factorization $PHP^T = LBL^T$ has better numerical stability than the Cholesky factorization for indefinite matrices. The matrix $B$ is block diagonal (1×1 and 2×2 blocks) and has the same inertia as $H$. (The inertia of a matrix is the number of positive, zero, and negative eigenvalues of the matrix.)
- Apply eigenvalue modification to $B$ so that $B + F$ is positive definite. The eigendecomposition of $B$ is cheap since $B$ is block diagonal.
- Thus $P(H+E)P^T = L(B+F)L^T$ with $E = P^T L F L^T P$.
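A sketch assuming SciPy's `scipy.linalg.ldl` for the symmetric indefinite factorization. For brevity it calls `eigh` on all of $B$ at once; a real implementation would modify each 1×1 or 2×2 block separately, which is what makes the modification cheap:

```python
import numpy as np
from scipy.linalg import ldl

def modified_indefinite(H, delta=1e-8):
    """Factor H = L B L^T, raise B's eigenvalues to at least delta, reassemble."""
    L, B, perm = ldl(H)                       # B block diagonal (1x1/2x2 blocks)
    lam, Q = np.linalg.eigh(B)
    F = (Q * (np.maximum(lam, delta) - lam)) @ Q.T   # B + F positive definite
    return L @ (B + F) @ L.T                  # = H + E, now positive definite

H = np.array([[0.0, 1.0], [1.0, 0.0]])        # indefinite: eigenvalues +1, -1
print(np.linalg.eigvalsh(modified_indefinite(H)))
```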

Step length α
Assume $p_k$ is a descent direction. The ideal step length solves
$$\min_{\alpha > 0} \phi(\alpha) = f(x_k + \alpha p_k)$$
- This one-dimensional minimization problem may be difficult to solve exactly (it is nonlinear).
- The alternative is to find an $\alpha$ that satisfies some weaker acceptance conditions (a backtracking sketch follows this list):
  - Wolfe conditions
  - Goldstein conditions
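For instance, a minimal backtracking search (not from the slides) that enforces only the sufficient-decrease part of the Wolfe conditions; `alpha0`, `rho`, and `c` are conventional default choices:

```python
import numpy as np

def backtracking(f, grad_f, x, p, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until f(x + alpha*p) <= f(x) + c*alpha*grad(x)^T p holds."""
    fx, slope = f(x), grad_f(x) @ p
    assert slope < 0, "p must be a descent direction"
    alpha = alpha0
    while f(x + alpha * p) > fx + c * alpha * slope:
        alpha *= rho                          # step too long: cut it back
    return alpha

f = lambda x: x[0] ** 2 + 10 * x[1] ** 2      # a made-up test function
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
x = np.array([1.0, 1.0])
p = -grad_f(x)                                # steepest-descent direction
print(backtracking(f, grad_f, x, p))          # a step with sufficient decrease
```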

The Goldstein conditions
With $0 < c < 1/2$,
$$f(x_k) + (1-c)\,\alpha_k \nabla f_k^T p_k \;\le\; f(x_k + \alpha_k p_k) \;\le\; f(x_k) + c\,\alpha_k \nabla f_k^T p_k$$
Suitable for Newton-type methods, but not well suited for quasi-Newton methods.
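A direct check of the two inequalities for a trial step (a sketch; `c = 0.25` is an illustrative choice):

```python
import numpy as np

def satisfies_goldstein(f, grad_f, x, p, alpha, c=0.25):
    """Check both Goldstein inequalities for a trial step alpha (0 < c < 1/2)."""
    fx, slope = f(x), grad_f(x) @ p           # slope = grad f(x)^T p < 0
    f_new = f(x + alpha * p)
    lower = fx + (1 - c) * alpha * slope      # rules out overly long steps
    upper = fx + c * alpha * slope            # the sufficient-decrease bound
    return lower <= f_new <= upper
```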

The Wolfe conditions
- Sufficient decrease condition: for $c_1 \in (0, 1)$,
  $$f(x_k + \alpha p_k) \le f(x_k) + c_1 \alpha \nabla f_k^T p_k$$
- Curvature condition: for $c_2 \in (c_1, 1)$,
  $$\nabla f(x_k + \alpha p_k)^T p_k \ge c_2 \nabla f_k^T p_k$$
- One usually chooses $c_2 = 0.9$ for Newton's method and $c_2 = 0.1$ for the conjugate gradient method.
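The corresponding check (a sketch; `c1 = 1e-4` with `c2 = 0.9` is the conventional Newton-type choice):

```python
import numpy as np

def satisfies_wolfe(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    """Check the (weak) Wolfe conditions for a trial step alpha."""
    slope = grad_f(x) @ p                                    # grad f(x)^T p < 0
    sufficient = f(x + alpha * p) <= f(x) + c1 * alpha * slope
    curvature = grad_f(x + alpha * p) @ p >= c2 * slope      # slope rose enough
    return sufficient and curvature
```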

[Figure: the Goldstein conditions and the Wolfe conditions]

The strong Wolfe conditions
Limit the range of $\phi'(\alpha_k)$ from both sides:
$$f(x_k + \alpha p_k) \le f(x_k) + c_1 \alpha \nabla f_k^T p_k$$
$$|\nabla f(x_k + \alpha p_k)^T p_k| \le c_2\,|\nabla f_k^T p_k|$$
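Compared with the weak Wolfe check above, only the curvature test changes (again a sketch):

```python
def satisfies_strong_wolfe(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    """Check the strong Wolfe conditions: the same sufficient decrease,
    plus a two-sided bound on the new directional derivative."""
    slope = grad_f(x) @ p
    sufficient = f(x + alpha * p) <= f(x) + c1 * alpha * slope
    curvature = abs(grad_f(x + alpha * p) @ p) <= c2 * abs(slope)
    return sufficient and curvature
```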

Convergence of line search
For descent directions $p_k$ and step lengths $\alpha_k$ that satisfy the Wolfe conditions, if $f$ is bounded below and continuously differentiable in an open neighborhood $N$ of the level set $\{x : f(x) \le f(x_0)\}$, and $\nabla f$ is Lipschitz continuous in $N$, then (Zoutendijk's theorem)
$$\sum_{k \ge 0} \cos^2\theta_k \,\|\nabla f(x_k)\|^2 < \infty$$
where $\theta_k$ is the angle between $p_k$ and $-\nabla f(x_k)$.
- Consequently $\cos^2\theta_k\,\|\nabla f(x_k)\|^2 \to 0$, so either $\cos\theta_k \to 0$ or $\|\nabla f(x_k)\| \to 0$.
- The other two conditions (Goldstein and strong Wolfe) yield similar results.
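As a quick illustration (not from the slides): for steepest descent $p_k = -\nabla f(x_k)$ we have $\cos\theta_k = 1$, so the theorem forces $\|\nabla f(x_k)\| \to 0$. The numbers below are a made-up example:

```python
import numpy as np

def cos_theta(p, g):
    """cos(theta_k): angle between the search direction p and the
    steepest-descent direction -g, as used in Zoutendijk's condition."""
    return -(g @ p) / (np.linalg.norm(g) * np.linalg.norm(p))

g = np.array([2.0, 4.0])
print(cos_theta(-g, g))                        # steepest descent: cos = 1
H = np.array([[2.0, 0.0], [0.0, 4.0]])
print(cos_theta(np.linalg.solve(H, -g), g))    # Newton direction: cos in (0, 1]
```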