Qualifier Exam in HPC, February 10th, 2010

Quasi-Newton methods - Alexandru Cioaca

Quasi-Newton methods (nonlinear systems)

- Nonlinear systems: F(x) = 0, with F : R^n → R^n, F(x) = [ f_i(x_1, ..., x_n) ]^T
- Such systems appear in the simulation of processes (physical, chemical, etc.)
- They are solved with iterative algorithms, such as Newton's method
- Newton's method (for systems) is not the same problem as nonlinear least squares

Quasi-Newton methods (nonlinear systems)

Standard assumptions:
1. F is continuously differentiable in an open convex set D
2. F' is Lipschitz continuous on D
3. There is x* in D such that F(x*) = 0 and F'(x*) is nonsingular

Newton's method: starting from an initial iterate x_0, compute
    x_{k+1} = x_k - F'(x_k)^{-1} F(x_k),    {x_k} → x*
until a termination criterion is satisfied.

Quasi-Newton methods (nonlinear systems)

- Linear model around x_n:  M_n(x) = F(x_n) + F'(x_n)(x - x_n)
- Setting M_n(x) = 0 gives  x_{n+1} = x_n - F'(x_n)^{-1} F(x_n)
- In practice the iterates are computed by solving a linear system:
    F'(x_n) s_n = F(x_n)
    x_{n+1} = x_n - s_n

Quasi-Newton methods (nonlinear systems)

Evaluate F'(x_n):
- symbolically
- numerically, with finite differences
- automatic differentiation

Solve the linear system F'(x_n) s_n = F(x_n):
- direct solvers: LU, Cholesky
- iterative methods: GMRES, CG

A sketch of one Newton iteration combining both steps is shown below.
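As an illustration, here is a minimal sketch (not part of the original slides, assuming NumPy is available) of Newton's method with a finite-difference Jacobian and a dense direct solve; the helper names jacobian_fd and newton_fd and the test system are illustrative.

```python
# Minimal sketch (not from the slides): Newton's method with a
# finite-difference Jacobian and a dense direct solve (numpy.linalg.solve).
import numpy as np

def jacobian_fd(F, x, eps=1e-7):
    """Approximate F'(x) column by column with forward differences."""
    Fx = F(x)
    J = np.empty((x.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (F(xp) - Fx) / eps
    return J

def newton_fd(F, x0, tol=1e-10, maxit=50):
    x = x0.astype(float)
    for _ in range(maxit):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:       # termination criterion
            break
        J = jacobian_fd(F, x)              # evaluate F'(x_n)
        s = np.linalg.solve(J, Fx)         # solve F'(x_n) s_n = F(x_n)
        x = x - s                          # x_{n+1} = x_n - s_n
    return x

# Illustrative test system: x0^2 + x1^2 = 2, x0 - x1 = 0  ->  x* = (1, 1)
F = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
print(newton_fd(F, np.array([2.0, 0.5])))
```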

Quasi-Newton methods (nonlinear systems)

Computational cost per iteration:
- F(x_k): n scalar function evaluations
- F'(x_k): n^2 scalar function evaluations
- LU factorization: O(2n^3/3) flops
- Cholesky factorization: O(n^3/3) flops
- Krylov methods: depends on the condition number

Quasi-Newton methods (nonlinear systems)

- LU and Cholesky are useful when we want to reuse the factorization (quasi-implicit schemes)
- They are difficult to parallelize and to load-balance
- Cholesky is faster and more stable, but requires an SPD matrix
- For large n (n ~ 10^6), a full factorization becomes impractical
- Krylov methods are built from easily parallelizable kernels (vector updates, inner products, matrix-vector products)
- CG is faster and more stable, but also requires an SPD matrix

Quasi-Newton methods (nonlinear systems)

Advantages:
- Under the standard assumptions, Newton's method converges locally and quadratically
- There exists a domain of attraction S that contains the solution
- Once the iterates enter S, they stay in S and eventually converge to x*
- The algorithm is memoryless (self-corrective)

Quasi-Newton methods (nonlinear systems)

Disadvantages:
- Convergence depends on the choice of x_0
- F'(x) has to be evaluated at each iterate x_k
- The computation can be expensive: F(x_k), F'(x_k), s_k

Quasi-Newton methods (nonlinear systems)

- Implicit schemes for ODEs y' = f(t, y):
    Forward Euler:  y_{n+1} = y_n + h f(t_n, y_n)           (explicit)
    Backward Euler: y_{n+1} = y_n + h f(t_{n+1}, y_{n+1})    (implicit)
- Implicit schemes require the solution of a nonlinear system at every step (likewise Crank-Nicolson, implicit Runge-Kutta, and linear multistep formulas). A sketch of one backward Euler step solved with a Newton-type method is shown below.
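To make the connection concrete, here is a minimal sketch (not from the original slides, assuming SciPy is available) of one backward Euler step, where scipy.optimize.fsolve stands in for the Newton-type nonlinear solver; the test ODE is illustrative.

```python
# Minimal sketch (not from the slides): one backward Euler step requires
# solving the nonlinear system y_{n+1} - y_n - h*f(t_{n+1}, y_{n+1}) = 0.
# scipy.optimize.fsolve stands in here for the Newton-type solver.
import numpy as np
from scipy.optimize import fsolve

def backward_euler_step(f, t_n, y_n, h):
    residual = lambda y_next: y_next - y_n - h * f(t_n + h, y_next)
    return fsolve(residual, y_n)           # y_n serves as the initial guess

# Illustrative stiff scalar ODE: y' = -50*(y - cos(t)), one step from y(0) = 0
f = lambda t, y: -50.0 * (y - np.cos(t))
print(backward_euler_step(f, 0.0, np.array([0.0]), 0.1))
```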

Quasi-Newton methods (nonlinear systems)

- How can we avoid evaluating F'(x_k)?
- Broyden's method:
    B_{k+1} = B_k + (y_k - B_k s_k) s_k^T / (s_k^T s_k)
    x_{k+1} = x_k - B_k^{-1} F(x_k)
- Inverse update (Sherman-Morrison formula):
    H_{k+1} = H_k + (s_k - H_k y_k) s_k^T H_k / (s_k^T H_k y_k)
    x_{k+1} = x_k - H_k F(x_k)
  where s_k = x_{k+1} - x_k and y_k = F(x_{k+1}) - F(x_k). A sketch of the inverse update is given below.
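Below is a minimal sketch (not part of the original slides, assuming NumPy) of Broyden's method with the inverse update; the starting guess, the test system, and the choice H_0 = I are illustrative, and convergence from an arbitrary starting point is not guaranteed.

```python
# Minimal sketch (not from the slides): Broyden's method with the inverse
# (Sherman-Morrison) update, so no linear system is solved per iteration.
import numpy as np

def broyden_inverse(F, x0, tol=1e-10, maxit=100):
    x = x0.astype(float)
    Fx = F(x)
    H = np.eye(x.size)                      # illustrative choice H_0 = I
    for _ in range(maxit):
        if np.linalg.norm(Fx) < tol:
            break
        s = -H @ Fx                         # x_{k+1} = x_k - H_k F(x_k)
        x_new = x + s
        F_new = F(x_new)
        y = F_new - Fx
        Hy = H @ y
        # H_{k+1} = H_k + (s_k - H_k y_k) s_k^T H_k / (s_k^T H_k y_k)
        H = H + np.outer(s - Hy, s @ H) / (s @ Hy)
        x, Fx = x_new, F_new
    return x

# Illustrative test system: x0^2 + x1^2 = 2, x0 - x1 = 0  ->  x* = (1, 1)
F = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
print(broyden_inverse(F, np.array([1.5, 1.0])))
```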

Quasi-Newton methods (nonlinear systems)

Advantages:
- No need to compute F'(x_k)
- With the inverse update, there is no linear system to solve

Disadvantages:
- Convergence is only superlinear (instead of quadratic)
- The method is no longer memoryless

Quasi-Newton methods (unconstrained optimization)

- Problem: find the global minimizer of a cost function f : R^n → R,  x* = arg min f
- If f is differentiable, the problem can be attacked by looking for zeros of the gradient, ∇f(x) = 0

Quasi-Newton methods (unconstrained optimization)

- Descent methods: x_{k+1} = x_k - λ_k P_k ∇f(x_k)
    P_k = I_n             - steepest descent
    P_k = ∇²f(x_k)^{-1}   - Newton's method
    P_k = B_k^{-1}        - quasi-Newton
- The angle between P_k ∇f(x_k) and ∇f(x_k) must be less than 90°, so that the step is a descent direction
- B_k has to mimic the behavior of the Hessian

Quasi-Newton methods (unconstrained optimization)

Global convergence:
- Line search
    step length: backtracking, interpolation
    sufficient decrease: Wolfe conditions
- Trust regions

A backtracking line-search sketch follows below.
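Here is a minimal line-search sketch (not from the original slides, assuming NumPy): backtracking until the Armijo sufficient-decrease condition holds; the test function, gradient, and constants are illustrative.

```python
# Minimal sketch (not from the slides): backtracking line search enforcing
# the Armijo (sufficient-decrease) condition, one half of the Wolfe conditions.
import numpy as np

def backtracking(f, grad_f, x, p, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink the step length alpha until f(x + alpha*p) decreases enough."""
    alpha = alpha0
    fx = f(x)
    slope = grad_f(x) @ p                  # directional derivative, must be < 0
    while f(x + alpha * p) > fx + c * alpha * slope:
        alpha *= rho
    return alpha

# Illustrative example: f(x) = x^T x with the steepest-descent direction
f = lambda x: x @ x
grad_f = lambda x: 2.0 * x
x = np.array([3.0, -4.0])
p = -grad_f(x)
print(backtracking(f, grad_f, x, p))       # returns an accepted step length
```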

Quasi-Newton methods (unconstrained optimization)

For quasi-Newton, B_k has to resemble ∇²f(x_k). The classical updates (all satisfying the secant condition B_{k+1} s_k = y_k) come in several flavors:
- single-rank updates
- symmetric updates
- positive-definite updates
- inverse updates (applied directly to H_k = B_k^{-1})

A sketch of the BFGS inverse update is given below.
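As one concrete example, here is a sketch (not from the original slides, assuming NumPy) of the standard BFGS update applied to the inverse Hessian approximation H_k; the sample step s and gradient difference y are illustrative.

```python
# Minimal sketch (not from the slides): the standard BFGS update of the
# inverse Hessian approximation H, given step s and gradient change y.
import numpy as np

def bfgs_inverse_update(H, s, y):
    """H_{k+1} = (I - rho s y^T) H_k (I - rho y s^T) + rho s s^T, rho = 1/(y^T s)."""
    rho = 1.0 / (y @ s)
    V = np.eye(H.shape[0]) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# Illustrative single update starting from H_0 = I
H = np.eye(2)
s = np.array([0.5, -0.2])                  # s_k = x_{k+1} - x_k
y = np.array([1.0, -0.3])                  # y_k = grad f(x_{k+1}) - grad f(x_k)
print(bfgs_inverse_update(H, s, y))
```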

Quasi-Newton methods (unconstrained optimization)

Computation:
- matrix updates, inner products
- DFP, PSB: 3 matrix-vector products
- BFGS: 2 matrix-matrix products

Storage:
- limited-memory versions (L-BFGS)
- store {s_k, y_k} for the last m iterations and recompute the action of H on the fly (see the two-loop recursion sketched below)
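To illustrate the limited-memory idea, here is a sketch (not from the original slides, assuming NumPy) of the standard L-BFGS two-loop recursion; the stored (s, y) pairs and the gradient are illustrative.

```python
# Minimal sketch (not from the slides): the L-BFGS two-loop recursion, which
# applies the inverse Hessian approximation to a gradient g using only the
# last m stored pairs {s_i, y_i}, without ever forming H explicitly.
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Return H*g built from stored (s, y) pairs, oldest pair first in the lists."""
    q = g.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest pair
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        q -= a * y
        alphas.append(a)
    # Initial scaling H_0 = gamma * I (a standard choice)
    s, y = s_list[-1], y_list[-1]
    r = ((s @ y) / (y @ y)) * q
    # Second loop: oldest pair to newest pair
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return r                                # the search direction is -r

# Illustrative example with two stored pairs
g = np.array([1.0, 2.0])
s_list = [np.array([0.4, 0.1]), np.array([0.2, -0.3])]
y_list = [np.array([0.5, 0.2]), np.array([0.3, -0.6])]
print(lbfgs_direction(g, s_list, y_list))
```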

Further improvements: preconditioning the linear system

- For faster convergence one may solve the preconditioned system K B_k p_k = K F(x_k)
- If B_k is SPD (and sparse), a sparse approximate inverse can be used to generate the preconditioner
- This preconditioner can be refined on a subspace of B_k using an algebraic multigrid technique
- This requires solving an eigenvalue problem

Further improvements: model reduction

- Sometimes the dimension of the system is very large
- We look for a smaller model that captures the essence of the original
- An approximation of the model variability can be retrieved from an ensemble of forward simulations
- The covariance matrix of the ensemble gives the reduced subspace
- Again, we need to solve an eigenvalue problem

QR/QL algorithms for symmetric matrices

- Solves the eigenvalue problem
- Iterative algorithm using a QR/QL factorization at each step (A = Q R, Q unitary, R upper triangular):

    for k = 1, 2, ...
        A_k = Q_k R_k
        A_{k+1} = R_k Q_k
    end

- The diagonal of A_k converges to the eigenvalues of A (a sketch of the basic iteration is shown below)
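Here is a minimal sketch (not from the original slides, assuming NumPy) of the unshifted QR iteration on a small symmetric matrix; the test matrix is illustrative.

```python
# Minimal sketch (not from the slides): unshifted QR iteration on a symmetric
# matrix; the diagonal of the iterates converges to the eigenvalues (slowly,
# which is why shifts are introduced on a later slide).
import numpy as np

def qr_iteration(A, iters=200):
    Ak = A.copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)             # A_k = Q_k R_k
        Ak = R @ Q                          # A_{k+1} = R_k Q_k (similar to A_k)
    return np.diag(Ak)

# Illustrative symmetric tridiagonal test matrix
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(np.sort(qr_iteration(A)))
print(np.sort(np.linalg.eigvalsh(A)))       # reference eigenvalues
```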

QR/QL algorithms for symmetric matrices

- The matrix A is reduced to upper Hessenberg form before starting the iterations
- The reduction is done column by column with Householder reflections (U = I - 2 v v^T, with ||v|| = 1)
- If A is symmetric, the reduction yields a tridiagonal matrix

QR/QL algorithms for symmetric matrices

- Convergence to triangular form can be slow
- Origin shifts are used to accelerate it:

    for k = 1, 2, ...
        A_k - z_k I = Q_k R_k
        A_{k+1} = R_k Q_k + z_k I
    end

- The Wilkinson shift is a common choice for z_k
- QR makes heavy use of matrix-matrix products

Alternatives to quasi-Newton

Inexact Newton methods:
- Inner iteration: determine a search direction by solving the Newton linear system only up to a certain tolerance
- Only Hessian-vector products are necessary
- Outer iteration: line search along the computed search direction

Nonlinear CG:
- The residual is replaced by the gradient of the cost function
- Combined with a line search
- Comes in different flavors (e.g. Fletcher-Reeves, Polak-Ribiere); a Fletcher-Reeves sketch is given below
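Here is a minimal sketch (not from the original slides, assuming NumPy) of nonlinear CG with the Fletcher-Reeves update and a backtracking line search; the test function and the restart safeguard are illustrative choices.

```python
# Minimal sketch (not from the slides): nonlinear CG with the Fletcher-Reeves
# beta and a simple backtracking (Armijo) line search.
import numpy as np

def backtracking(f, x, p, g, alpha=1.0, rho=0.5, c=1e-4):
    while f(x + alpha * p) > f(x) + c * alpha * (g @ p):
        alpha *= rho
    return alpha

def nonlinear_cg_fr(f, grad_f, x0, tol=1e-8, maxit=200):
    x = x0.astype(float)
    g = grad_f(x)
    p = -g                                  # first direction: steepest descent
    for _ in range(maxit):
        if np.linalg.norm(g) < tol:
            break
        alpha = backtracking(f, x, p, g)
        x = x + alpha * p
        g_new = grad_f(x)
        beta = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves formula
        p = -g_new + beta * p
        if g_new @ p >= 0.0:                # safeguard: restart with steepest descent
            p = -g_new
        g = g_new
    return x

# Illustrative quadratic test function
f = lambda x: (x[0] - 1.0)**2 + 10.0 * (x[1] + 2.0)**2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
print(nonlinear_cg_fr(f, grad_f, np.array([0.0, 0.0])))
```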

Alternatives to quasi-Newton

Direct search:
- Does not involve derivatives of the cost function
- Uses a structure called a simplex to search for decrease in f
- Stops when further progress cannot be achieved
- Can get stuck in a local minimum (a usage example with SciPy's Nelder-Mead implementation follows below)
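As a usage example (not from the original slides, assuming SciPy), the Nelder-Mead simplex method is available through scipy.optimize.minimize; the test function and starting point are illustrative.

```python
# Minimal sketch (not from the slides): derivative-free minimization with the
# Nelder-Mead simplex method as implemented in scipy.optimize.minimize.
import numpy as np
from scipy.optimize import minimize

# Illustrative smooth test function
f = lambda x: (x[0] - 1.0)**2 + (x[1] + 2.0)**2 + 0.1 * np.sin(5.0 * x[0])
res = minimize(f, x0=np.array([3.0, 3.0]), method="Nelder-Mead")
print(res.x, res.fun)
```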

More alternatives

Monte Carlo:
- A computational method relying on repeated random sampling
- Can be used for optimization (e.g. MDO) and for inverse problems, via random walks
- When there are multiple correlated variables, the correlation matrix is SPD, so it can be factorized with Cholesky to generate correlated samples (see the sketch below)
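To illustrate the last point, here is a minimal sketch (not from the original slides, assuming NumPy) of generating correlated Gaussian samples from a Cholesky factor of an SPD covariance matrix; the matrix and sample size are illustrative.

```python
# Minimal sketch (not from the slides): drawing correlated Gaussian samples via
# a Cholesky factorization of an SPD covariance matrix: if C = L L^T and
# z ~ N(0, I), then x = L z has covariance C.
import numpy as np

C = np.array([[1.0, 0.8],
              [0.8, 1.0]])                  # illustrative SPD correlation matrix
L = np.linalg.cholesky(C)                   # lower-triangular factor

rng = np.random.default_rng(0)
z = rng.standard_normal((2, 10000))         # independent standard normal samples
x = L @ z                                   # correlated samples

print(np.corrcoef(x))                       # should be close to C
```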

Conclusions

- Newton's method is a very powerful method with many applications and uses (solving nonlinear systems, finding minima of cost functions). It can be combined with many other numerical algorithms (factorizations, linear solvers)
- Optimizing and parallelizing matrix-vector and matrix-matrix products, decompositions, and other numerical kernels can have a significant impact on overall performance

Thank you for your time!