Unconstrained Optimization Rong Jin
Recap: Gradient Ascent/Descent
- Simple algorithm; only requires the first-order derivative.
- Problem: difficulty in determining the step size.
  - Too small a step size: slow convergence.
  - Too large a step size: oscillation or divergence.
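A minimal sketch of the update rule in Python, using a fixed step size so the sensitivity to that choice is visible (the test function and step value are illustrative, not from the slides):

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, iters=100):
    # Plain gradient descent: only the first-order derivative is needed.
    # The fixed step size is the weak point: too small -> slow progress,
    # too large -> oscillation or divergence.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); the minimizer is x = 3.
print(gradient_descent(lambda x: 2 * (x - 3), x0=[0.0], step=0.1))
```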
Recap: Newton Method
- Univariate Newton method: x_{k+1} = x_k − f'(x_k) / f''(x_k).
- Multivariate Newton method: x_{k+1} = x_k − H(x_k)^{-1} ∇f(x_k), where H is the Hessian matrix.
- Guaranteed to converge when the objective function is convex/concave.
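A minimal Newton iteration in Python, applied to a small quadratic test problem (the test matrix, right-hand side, and starting point are illustrative, not from the slides):

```python
import numpy as np

def newton(grad, hess, x0, iters=20):
    # Multivariate Newton update: x <- x - H(x)^{-1} grad(x).
    # For a strictly convex (or concave, when maximizing) objective the
    # iteration converges, but each step needs the Hessian and a linear solve.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# Quadratic test problem f(x) = 1/2 x^T A x - b^T x with known minimizer A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(newton(grad=lambda x: A @ x - b, hess=lambda x: A, x0=[0.0, 0.0]))
```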
Recap: Quasi-Newton Methods
- Problems with the standard Newton method:
  - Computing the inverse of the Hessian matrix H is expensive: O(n^3).
  - The Hessian matrix H itself can be very large: O(n^2) entries.
- Quasi-Newton method (BFGS):
  - Approximates the inverse of the Hessian H with another matrix B.
  - Avoids the difficulty of computing the inverse of H.
  - However, still problematic when B is large.
- Limited-memory quasi-Newton method (L-BFGS):
  - Stores a small set of vectors instead of the matrix B.
  - Avoids both the difficulty of computing the inverse of H and the difficulty of storing the large matrix B.
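In practice L-BFGS is rarely implemented by hand; below is a sketch of calling SciPy's L-BFGS-B routine on a standard test function (the function and starting point are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock test function and its gradient; L-BFGS-B keeps only a short
# history of update vectors instead of a full n x n inverse-Hessian
# approximation, so memory stays O(n).
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])

res = minimize(f, x0=np.zeros(2), jac=grad, method="L-BFGS-B")
print(res.x)   # close to the minimizer (1, 1)
```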
Recap
Method                                        Complexity   Number of variables   Convergence rate
Standard Newton method                        O(n^3)       Small                 Very fast
Quasi-Newton method (BFGS)                    O(n^2)       Medium                Fast
Limited-memory quasi-Newton method (L-BFGS)   O(n)         Large                 Reasonably fast
Empirical Study: Learning Conditional Exponential Model
Limited-memory quasi-Newton method (L-BFGS) vs. gradient ascent.

Dataset    Instances    Features
Rule       29,…         …
Lex        42,509       135,182
Summary    24,044       198,467
Shallow    8,625,782    264,142

Dataset    Iterations    Time (s)
Rule       …             …
Lex        …             …
Summary    …             …
Shallow    …             …
Free Software
- L-BFGS (…ftware.html)
- L-BFGS-B (…ftware.html)
Conjugate Gradient: Another Great Numerical Optimization Method!
Linear Conjugate Gradient Method
- Consider optimizing the quadratic function f(x) = (1/2) x^T A x − b^T x.
- Conjugate vectors: the set of vectors {p_1, p_2, …, p_l} is said to be conjugate with respect to a matrix A if p_i^T A p_j = 0 for all i ≠ j.
- Important property: the quadratic function can be optimized by simply optimizing it along each individual direction in the conjugate set.
- Optimal solution: x* = Σ_k α_k p_k, where α_k is the minimizer along the k-th conjugate direction.
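To see why conjugacy decouples the problem, here is the standard derivation; the notation f(x) = (1/2)xᵀAx − bᵀx and the coefficients α_k are the usual presentation, assumed here in place of the slide's lost formulas:

```latex
% Writing x = \sum_k \alpha_k p_k, the cross terms p_j^\top A p_k (j \neq k)
% vanish by conjugacy, so the objective separates into one-dimensional pieces:
\begin{aligned}
f(x) &= \tfrac{1}{2}\Big(\sum_k \alpha_k p_k\Big)^{\!\top} A \Big(\sum_k \alpha_k p_k\Big)
        - b^{\top}\sum_k \alpha_k p_k \\
     &= \sum_k \Big( \tfrac{1}{2}\alpha_k^2\, p_k^{\top} A p_k
        - \alpha_k\, b^{\top} p_k \Big).
\end{aligned}
% Each \alpha_k can therefore be chosen independently:
\alpha_k = \frac{b^{\top} p_k}{p_k^{\top} A p_k}.
```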
Example
- Minimize the quadratic function of (x_1, x_2) defined by the matrix A.
- Conjugate directions: p_1 = (1, 1) and p_2 = (1, −1).
- Optimization:
  - Along the first direction, set x_1 = x_2 = x and minimize over x.
  - Along the second direction, set x_1 = −x_2 = x and minimize over x.
- Solution: x_1 = x_2 = 1.
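The slide's matrix did not survive extraction, so the numerical check below uses a hypothetical A = [[2, 1], [1, 2]] and b = (3, 3), chosen so that the directions (1, 1) and (1, −1) are conjugate and the minimizer of f(x) = (1/2)xᵀAx − bᵀx is (1, 1), matching the slide's answer:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # hypothetical, not from the slide
b = np.array([3.0, 3.0])                 # hypothetical, not from the slide
p1, p2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])

print(p1 @ A @ p2)                       # 0.0 -> the directions are A-conjugate

# Minimize f(x) = 1/2 x^T A x - b^T x along each direction independently:
# the optimal coefficient along p_k is alpha_k = (b . p_k) / (p_k . A p_k).
a1 = (b @ p1) / (p1 @ A @ p1)
a2 = (b @ p2) / (p2 @ A @ p2)
print(a1 * p1 + a2 * p2)                 # [1. 1.] -> the solution x1 = x2 = 1
```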
How to Efficiently Find a Set of Conjugate Directions
- Iterative procedure: given the conjugate directions {p_1, p_2, …, p_{k-1}} and the current residual r_k = b − A x_k, set
    p_k = r_k + β_k p_{k-1},   with β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1}).
- Theorem: the direction generated in the above step is conjugate to all previous directions {p_1, p_2, …, p_{k-1}}, i.e., p_k^T A p_j = 0 for all j < k.
- Note: computing the k-th direction p_k requires only the previous direction p_{k-1}, not the whole set.
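A minimal linear conjugate gradient loop sketching this recurrence; it reuses the hypothetical A and b from the example above, and the variable names (r for the residual, beta for the mixing coefficient) are mine, not the slide's:

```python
import numpy as np

def linear_cg(A, b, x0, iters=None):
    # Linear conjugate gradient for f(x) = 1/2 x^T A x - b^T x, with A
    # symmetric positive definite.  Each new direction is built from the
    # current residual and only the *previous* direction, yet remains
    # conjugate to all earlier directions.
    x = np.asarray(x0, dtype=float)
    r = b - A @ x                          # negative gradient at x
    p = r.copy()
    for _ in range(iters or len(b)):
        alpha = (r @ r) / (p @ A @ p)      # exact minimizer along p
        x = x + alpha * p
        r_new = r - alpha * (A @ p)
        if np.linalg.norm(r_new) < 1e-10:  # converged
            return x
        beta = (r_new @ r_new) / (r @ r)   # coefficient for the next direction
        p = r_new + beta * p               # next conjugate direction
        r = r_new
    return x

A = np.array([[2.0, 1.0], [1.0, 2.0]])     # hypothetical test problem
b = np.array([3.0, 3.0])
print(linear_cg(A, b, x0=np.zeros(2)))     # converges to (1, 1)
```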
Nonlinear Conjugate Gradient
- Even though conjugate gradient is derived for a quadratic objective function, it can be applied directly to other nonlinear functions.
- Convergence is guaranteed when the objective is convex/concave.
- Variants:
  - Fletcher-Reeves conjugate gradient (FR-CG).
  - Polak-Ribiere conjugate gradient (PR-CG): more robust than FR-CG.
- Compared with the Newton method:
  - Conjugate gradient is a first-order method.
  - It is usually less efficient than the Newton method.
  - However, it is simpler to implement.
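A sketch of running nonlinear CG from SciPy on a non-quadratic function; SciPy's "CG" option is a Polak-Ribiere-type nonlinear conjugate gradient, and the test function here is illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# A smooth, non-quadratic convex test function and its gradient.
def f(x):
    return (x[0] - 1.0)**4 + (x[1] + 2.0)**2

def grad(x):
    return np.array([4.0 * (x[0] - 1.0)**3, 2.0 * (x[1] + 2.0)])

# Only first-order information (the gradient) is required by the method.
res = minimize(f, x0=np.zeros(2), jac=grad, method="CG")
print(res.x)   # approximately (1, -2)
```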
Empirical Study: Learning Conditional Exponential Model
Limited-memory quasi-Newton method (L-BFGS) vs. conjugate gradient (PR).

Dataset    Instances    Features
Rule       29,…         …
Lex        42,509       135,182
Summary    24,044       198,467
Shallow    8,625,782    264,142

Dataset    Iterations    Time (s)
Rule       …             …
Lex        …             …
Summary    …             …
Shallow    …             …
Free Software
- CG+ (…ftware.html)
When Should We Use Which Optimization Technique?
- Use the Newton method if you can find a package.
- Use conjugate gradient if you have to implement it yourself.
- Use gradient ascent/descent if you are lazy.
Logarithm Bound Algorithms
- Goal: maximize the objective f(x).
- Start with a guess x_0.
- For t = 1, 2, …, T:
  - Compute the current objective value f(x_{t-1}).
  - Find a decoupling lower-bound function Δ(x) ≤ f(x) − f(x_{t-1}) (touch point: Δ(x_{t-1}) = 0).
  - Find the optimal solution x_t of Δ(x).
Logarithm Bound Algorithm
- Start with an initial guess x_0.
- Come up with a lower-bound function Δ(x) ≤ f(x) − f(x_0), with touch point Δ(x_0) = 0.
- Find the optimal solution x_1 of Δ(x).
- Repeat the above procedure from x_1, x_2, …
- The iterates converge to the optimal point (see the sketch below).
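A minimal bound-optimization sketch in Python. The objective f(x) = log x − x and the logarithmic lower bound log(x) − log(x_t) ≥ 1 − x_t/x used here are illustrative choices, not the slides' conditional exponential model:

```python
import math

def f(x):
    # Concave objective to maximize; its maximizer is x = 1.
    return math.log(x) - x

def surrogate_argmax(x_t):
    # Lower bound on the improvement, using log(x) - log(x_t) >= 1 - x_t/x:
    #   Delta(x; x_t) = 1 - x_t/x - (x - x_t)  <=  f(x) - f(x_t),
    # with touch point Delta(x_t; x_t) = 0.  Setting dDelta/dx = 0 gives
    # x = sqrt(x_t), so the surrogate is maximized in closed form.
    return math.sqrt(x_t)

x = 0.1                       # initial guess x_0
for t in range(10):
    x = surrogate_argmax(x)   # maximize the bound touched at the current x
    print(t, x, f(x))         # f(x) increases monotonically toward f(1) = -1
```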
Property of Concave Functions
- For any concave function f and any weights λ_i ≥ 0 with Σ_i λ_i = 1:
    f(Σ_i λ_i x_i) ≥ Σ_i λ_i f(x_i).
Important Inequality
- log(x) and −exp(x) are concave functions.
- Therefore, for weights λ_i ≥ 0 with Σ_i λ_i = 1:
    log(Σ_i λ_i x_i) ≥ Σ_i λ_i log(x_i).
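This inequality is what turns a log of a sum into a sum of logs when deriving bound optimizers such as EM. A standard way to write it, where the weights q_i are my notation rather than the slide's:

```latex
% For any weights q_i > 0 with \sum_i q_i = 1 and any positive terms a_i,
% concavity of the logarithm (Jensen's inequality) gives
\log \sum_i a_i
  = \log \sum_i q_i \,\frac{a_i}{q_i}
  \;\ge\; \sum_i q_i \log \frac{a_i}{q_i},
% with equality when q_i \propto a_i, i.e. at the bound's touch point.
```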
Expectation-Maximization Algorithm
- Derive the EM algorithm for the hierarchical mixture model: the input X is routed by a gating function r(x) to one of two expert models m_1(x) and m_2(x), which predict the output y.
- Objective: the log-likelihood of the training data.
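The log-likelihood formula on the slide was an image and is not recoverable; assuming the usual two-expert mixture, with gating probability r(x) selecting expert m_1 and 1 − r(x) selecting expert m_2, it and its EM lower bound would read as follows (the responsibilities q_i are my notation):

```latex
% Log-likelihood of the training data \{(x_i, y_i)\}_{i=1}^N under the
% hierarchical (two-expert) mixture model:
\ell = \sum_{i=1}^{N} \log\Big[ r(x_i)\, m_1(y_i \mid x_i)
       + \big(1 - r(x_i)\big)\, m_2(y_i \mid x_i) \Big].
% Applying the logarithm bound with responsibilities q_i \in (0, 1):
\ell \;\ge\; \sum_{i=1}^{N} \Big[ q_i \log \frac{r(x_i)\, m_1(y_i \mid x_i)}{q_i}
       + (1 - q_i) \log \frac{\big(1 - r(x_i)\big)\, m_2(y_i \mid x_i)}{1 - q_i} \Big].
% E-step: set q_i to the posterior probability that expert 1 generated y_i;
% M-step: maximize the (decoupled) bound over the parameters of r, m_1, m_2.
```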