CS5321 Numerical Optimization 03 Line Search Methods 12/1/2018
Line search method
Given a point xk, find a descent direction pk, then find a step length α that minimizes f(xk + α pk).
A descent direction pk is one whose directional derivative satisfies ∇f(xk)^T pk < 0.
In Newton's method, pk = −Hk^{-1} gk, where Hk = ∇²f(xk) is the Hessian and gk = ∇f(xk) is the gradient.
If Hk is positive definite, ∇f(xk)^T pk = −gk^T Hk^{-1} gk < 0, so pk is a descent direction.
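As a minimal NumPy sketch of the claim above: for a positive definite Hessian, the Newton step p = −H^{-1}g gives a negative directional derivative g^T p. The matrix and gradient here are made-up example values, not from the lecture.

```python
import numpy as np

def newton_direction(H, g):
    """Solve H p = -g for the Newton step (assumes H is positive definite)."""
    return np.linalg.solve(H, -g)

# Example: positive definite Hessian and a gradient at the current point
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
p = newton_direction(H, g)
print(g @ p)  # directional derivative g^T p = -g^T H^{-1} g: negative
```

Since H is SPD, g^T H^{-1} g > 0 for any nonzero g, which is exactly why the Newton step is guaranteed to be a descent direction in this case.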
Modified Newton's methods
If the Hessian H is indefinite, ill-conditioned, or singular, modified Newton's methods work with H + E instead, choosing E so that −(H+E)^{-1} ∇f(xk) is a descent direction.
The matrix E can be obtained via:
1. Eigenvalue modification
2. Diagonal shift
3. Modified Cholesky factorization
4. Modified symmetric indefinite factorization
1. Eigenvalue modification
Modify the eigenvalues of H so that H becomes positive definite.
Let H = QΛQ^T be the spectral decomposition of H, with Λ = diag(λ1, λ2, …, λn) and λ1 ≥ … ≥ λi > 0 ≥ λi+1 ≥ … ≥ λn.
Define ΔΛ = diag(0, …, 0, δi+1, …, δn) such that Λ + ΔΛ > 0, and set E = QΔΛQ^T.
Problem: the eigenvalue decomposition is too expensive.
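A small NumPy sketch of the eigenvalue modification: compute the spectral decomposition, push any nonpositive eigenvalue up to a small threshold δ, and form E = QΔΛQ^T. The threshold value and the example matrix are illustrative choices, not from the lecture.

```python
import numpy as np

def eig_modify(H, delta=1e-6):
    """Build E so that H + E is positive definite, via the spectral decomposition."""
    lam, Q = np.linalg.eigh(H)              # H = Q diag(lam) Q^T
    shift = np.maximum(lam, delta) - lam    # raise nonpositive eigenvalues to delta
    return Q @ np.diag(shift) @ Q.T         # E = Q * DeltaLambda * Q^T

H = np.array([[1.0, 0.0], [0.0, -2.0]])    # indefinite example
E = eig_modify(H)
print(np.linalg.eigvalsh(H + E).min())     # smallest eigenvalue is now positive
```

This is exactly the operation the slide describes, and also why it is expensive: `eigh` costs O(n^3) with a large constant compared to a Cholesky factorization.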
2. Diagonal shift
If ΔΛ = diag(τ, …, τ) = τI, then E = QΔΛQ^T = τQQ^T = τI.
To make H + E positive definite, one only needs τ > |λmin|, where λmin is the most negative eigenvalue of H.
How to find τ without explicitly performing the eigenvalue decomposition?
If H is positive definite, H has a Cholesky decomposition.
Guess a shift τ and try the Cholesky decomposition on H + τI. If it fails, increase τ and try again until it succeeds.
The choice of the increment is heuristic.
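The guess-and-retry loop above can be sketched as follows. The starting shift and the doubling increment are one common heuristic (in the spirit of the "Cholesky with added multiple of the identity" algorithm); the specific constants are assumptions, not from the slide.

```python
import numpy as np

def cholesky_with_shift(H, beta=1e-3):
    """Try Cholesky on H + tau*I, increasing tau until it succeeds (heuristic)."""
    diag_min = np.diag(H).min()
    tau = 0.0 if diag_min > 0 else -diag_min + beta   # initial guess for the shift
    n = H.shape[0]
    while True:
        try:
            L = np.linalg.cholesky(H + tau * np.eye(n))
            return L, tau                             # success: H + tau*I is SPD
        except np.linalg.LinAlgError:
            tau = max(2 * tau, beta)                  # heuristic: double the shift

H = np.array([[-1.0, 2.0], [2.0, 1.0]])               # indefinite example
L, tau = cholesky_with_shift(H)
print(tau)
```

Each failed attempt costs at most one Cholesky factorization, which is still far cheaper than an eigenvalue decomposition.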
3. Modified Cholesky factorization
An SPD matrix H can be decomposed as H = LDL^T, where L is a unit lower triangular matrix and D is diagonal with positive elements.
Relation to the Cholesky decomposition H = MM^T: M = LD^{1/2}.
If H is not SPD, modify the elements of L and D during the decomposition so that D(i,i) > 0 and the quantities |L(i,j)| D(i,i)^{1/2} remain bounded.
The decomposition can be reused for solving linear systems.
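To illustrate the M = LD^{1/2} relation on an SPD matrix (not the modified algorithm itself, which alters elements during the factorization), one can recover the LDL^T factors from NumPy's Cholesky factor. The example matrix is an assumption for demonstration.

```python
import numpy as np

def ldl_spd(H):
    """LDL^T factors of an SPD matrix, recovered from its Cholesky factor M = L D^{1/2}."""
    M = np.linalg.cholesky(H)      # H = M M^T, M lower triangular
    d = np.diag(M) ** 2            # D(i,i) = M(i,i)^2
    L = M / np.diag(M)             # divide column j by M(j,j): unit lower triangular
    return L, d

H = np.array([[4.0, 2.0], [2.0, 3.0]])   # SPD example
L, d = ldl_spd(H)
print(np.allclose(L @ np.diag(d) @ L.T, H))  # True
```

In the modified algorithm, D(i,i) would be increased on the fly whenever it dips below a threshold, so the final factors correspond to H + E rather than H.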
4. Modified symmetric indefinite factorization
Symmetric indefinite factorization: PHP^T = LBL^T, which has better numerical stability than the Cholesky decomposition for indefinite matrices.
Matrix B has the same inertia as H. The inertia of a matrix is the number of positive, zero, and negative eigenvalues of the matrix.
Apply eigenvalue modification to B so that B + F is positive definite. The eigendecomposition of B is cheap since B is block diagonal.
Thus P(H+E)P^T = L(B+F)L^T and E = P^T LFL^T P.
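A sketch of the factorization step, assuming SciPy is available: `scipy.linalg.ldl` computes a Bunch-Kaufman-style symmetric indefinite factorization with B block diagonal (1x1 and 2x2 blocks), and by Sylvester's law of inertia B and H have the same inertia. The example matrix is made up; its zero leading entry forces pivoting.

```python
import numpy as np
from scipy.linalg import ldl

H = np.array([[0.0, 1.0, 2.0],
              [1.0, -1.0, 0.0],
              [2.0, 0.0, 3.0]])       # symmetric indefinite example

L, B, perm = ldl(H)                   # H = L B L^T, with L[perm] triangular
print(np.allclose(L @ B @ L.T, H))    # factorization reproduces H

# Same inertia: the counts of positive/negative eigenvalues of B match those of H
pos_B = (np.linalg.eigvalsh(B) > 0).sum()
pos_H = (np.linalg.eigvalsh(H) > 0).sum()
print(pos_B == pos_H)
```

The modification step (building F from the small blocks of B) is not shown; since each block is at most 2x2, its eigenvalues come from a closed-form quadratic, which is what makes this approach cheap.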
Step length
Assume pk is a descent direction. Find an optimal step length α by solving
min_{α>0} φ(α) = f(xk + α pk).
This minimization problem may itself be difficult to solve exactly (it is nonlinear).
An alternative is to find an α that satisfies some conditions:
The Wolfe conditions
The Goldstein conditions
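A standard inexact alternative to minimizing φ(α) is backtracking: start from a trial step and shrink it until a sufficient-decrease test holds. This is a minimal sketch; the shrink factor and constant are conventional assumed values, and the quadratic test function is made up.

```python
import numpy as np

def backtracking(f, grad, x, p, alpha=1.0, rho=0.5, c=1e-4):
    """Shrink alpha by rho until f(x + alpha*p) <= f(x) + c*alpha*grad(x)^T p."""
    slope = grad(x) @ p                  # directional derivative, negative for descent
    while f(x + alpha * p) > f(x) + c * alpha * slope:
        alpha *= rho
    return alpha

f = lambda x: x @ x                      # simple quadratic test function
grad = lambda x: 2 * x
x = np.array([2.0, 0.0])
p = -grad(x)                             # steepest descent direction
a = backtracking(f, grad, x, p)
print(f(x + a * p) < f(x))               # the accepted step decreases f
```

Backtracking enforces only the sufficient-decrease half of the conditions below; because the trial step only shrinks, steps that are "too short" are avoided by construction.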
The Goldstein conditions
With 0 < c < 1/2,
f(xk) + (1−c) α ∇f(xk)^T pk ≤ f(xk + α pk) ≤ f(xk) + c α ∇f(xk)^T pk.
Suitable for Newton-type methods, but not well suited for quasi-Newton methods.
The Wolfe conditions
Sufficient decrease condition: for c1 in (0, 1),
f(xk + α pk) ≤ f(xk) + c1 α ∇f(xk)^T pk.
Curvature condition: for c2 in (c1, 1),
∇f(xk + α pk)^T pk ≥ c2 ∇f(xk)^T pk.
Usually c2 = 0.9 for Newton-type methods and c2 = 0.1 for the nonlinear conjugate gradient method.
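The two conditions translate directly into code. This sketch checks both for a candidate step length; the constants and the quadratic test function are assumed example values.

```python
import numpy as np

def wolfe_ok(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check the (weak) Wolfe conditions for step length alpha along direction p."""
    g0p = grad(x) @ p                                          # = phi'(0), negative
    sufficient = f(x + alpha * p) <= f(x) + c1 * alpha * g0p   # decrease condition
    curvature = grad(x + alpha * p) @ p >= c2 * g0p            # slope has flattened
    return sufficient and curvature

f = lambda x: 0.5 * x @ x
grad = lambda x: x
x = np.array([1.0, 1.0])
p = -grad(x)                             # steepest descent direction
print(wolfe_ok(f, grad, x, p, 1.0))      # alpha = 1 is the exact minimizer here
print(wolfe_ok(f, grad, x, p, 3.0))      # overshoots: sufficient decrease fails
```

Note the curvature test compares two negative slopes: it rules out steps so short that φ is still decreasing steeply.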
[Figures: step-length intervals accepted by the Goldstein conditions and by the Wolfe conditions]
The strong Wolfe conditions
Limit the range of φ'(αk) from both sides:
f(xk + α pk) ≤ f(xk) + c1 α ∇f(xk)^T pk,
|∇f(xk + α pk)^T pk| ≤ c2 |∇f(xk)^T pk|.
Convergence of line search
For a descent direction pk and a step length that satisfies the Wolfe conditions, if f is bounded below and continuously differentiable in an open neighborhood N, and ∇f is Lipschitz continuous in N, then (Zoutendijk)
Σ_k cos²θk ‖∇f(xk)‖² < ∞, and hence cos²θk ‖∇f(xk)‖² → 0,
where θk is the angle between pk and −∇f(xk).
Note that this can come from either cos θk → 0 or ‖∇f(xk)‖ → 0.
The other two conditions (Goldstein and strong Wolfe) give similar results.