Unconstrained Optimization Rong Jin

Recap  Gradient ascent/descent Simple algorithm, only requires the first order derivative Problem: difficulty in determining the step size  Small step size  slow convergence  Large step size  oscillation or bubbling

Recap: Newton Method
- Univariate Newton method
- Multivariate Newton method (uses the Hessian matrix)
- Guaranteed to converge when the objective function is convex/concave
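The slide's equations were images; the sketch below shows the standard multivariate Newton update x_{k+1} = x_k - H(x_k)^{-1} g(x_k) under assumed gradient and Hessian callables, solving a linear system rather than forming the inverse.

```python
import numpy as np

def newton_method(grad, hess, x0, n_iters=20, tol=1e-8):
    """Multivariate Newton: x_{k+1} = x_k - H(x_k)^{-1} g(x_k).
    Solves H d = g instead of explicitly inverting the Hessian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), g)
        x = x - d
    return x

# Assumed example: minimize f(x) = x1^4 + x2^2, whose minimizer is the origin.
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess = lambda x: np.array([[12 * x[0]**2, 0.0], [0.0, 2.0]])
print(newton_method(grad, hess, [2.0, 3.0]))
```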

Recap  Problem with standard Newton method Computing inverse of Hessian matrix H is expensive (O(n^3)) The size of Hessian matrix H can be very large (O(n^2))  Quasi-Newton method (BFGS): Approximate the inverse of Hessian matrix H with another matrix B Avoid the difficulty in computing inverse of H However, still have problem when the size of B is large  Limited memory Quasi-Newton method (L-BFGS) Storing a set of vectors instead of matrix B Avoid the difficulty in computing the inverse of H Avoid the difficulty in storing the large-size B

Recap

  Method                                               Number of variables   Convergence rate
  Standard Newton method: O(n^3)                       Small                 V-Fast
  Quasi-Newton method (BFGS): O(n^2)                   Medium                Fast
  Limited-memory Quasi-Newton method (L-BFGS): O(n)    Large                 R-Fast

Empirical Study: Learning Conditional Exponential Model

  Dataset    Instances    Features
  Rule       29,…         …
  Lex        42,509       135,182
  Summary    24,044       198,467
  Shallow    8,625,782    264,142

Iterations and time (s) per dataset: limited-memory Quasi-Newton method vs. gradient ascent.

Free Software (…ftware.html)
- L-BFGS
- L-BFGS-B

Conjugate Gradient  Another Great Numerical Optimization Method !

Linear Conjugate Gradient Method
- Consider optimizing the quadratic function f(x) = 1/2 x^T A x - b^T x, with A symmetric positive definite
- Conjugate vectors: the set of vectors {p_1, p_2, ..., p_l} is said to be conjugate with respect to a matrix A if p_i^T A p_j = 0 for all i ≠ j
- Important property: the quadratic function can be optimized by simply optimizing it along each individual direction in the conjugate set
  - Optimal solution: x* = x_0 + sum_k alpha_k p_k, where alpha_k is the minimizer along the k-th conjugate direction
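A minimal NumPy sketch of the linear conjugate gradient method for the quadratic above (equivalently, solving A x = b with A symmetric positive definite); the 3x3 system is an assumed example, not one from the slides.

```python
import numpy as np

def linear_cg(A, b, x0=None, tol=1e-10, max_iters=None):
    """Linear conjugate gradient for min 1/2 x^T A x - b^T x (i.e. solve A x = b),
    with A symmetric positive definite."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x                     # negative gradient at x
    p = r.copy()                      # first search direction
    for _ in range(max_iters or n):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)    # exact minimizer along direction p
        x = x + alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p          # next direction, conjugate to the previous ones
        r = r_new
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
print(linear_cg(A, b))                # converges in at most 3 steps for a 3x3 SPD system
print(np.linalg.solve(A, b))          # reference solution
```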

Example  Minimize the following function  Matrix A  Conjugate direction  Optimization First direction, x 1 = x 2 =x: Second direction, x 1 =- x 2 =x: Solution: x 1 = x 2 =1

How to Efficiently Find a Set of Conjugate Directions
- Iterative procedure
  - Given conjugate directions {p_1, p_2, ..., p_{k-1}}
  - Set p_k as follows (see the formula after this slide)
- Theorem: the direction generated in the above step is conjugate to all previous directions {p_1, p_2, ..., p_{k-1}}, i.e., p_k^T A p_i = 0 for all i < k
- Note: computing the k-th direction p_k only requires the previous direction p_{k-1}
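The update formula itself was an image on the original slide; the standard construction (stated here as an assumption about what the slide showed) is

$$ p_k = -\nabla f(x_k) + \beta_k\, p_{k-1}, \qquad \beta_k = \frac{\nabla f(x_k)^{\top} A\, p_{k-1}}{p_{k-1}^{\top} A\, p_{k-1}}, $$

chosen so that $p_k^{\top} A\, p_{k-1} = 0$; conjugacy to all earlier directions then follows by induction, using the orthogonality of the residuals to the previous search directions.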

Nonlinear Conjugate Gradient
- Even though conjugate gradient is derived for a quadratic objective function, it can be applied directly to other nonlinear functions
  - Convergence is guaranteed if the objective is convex/concave
- Variants:
  - Fletcher-Reeves conjugate gradient (FR-CG)
  - Polak-Ribiere conjugate gradient (PR-CG): more robust than FR-CG
- Compared to the Newton method
  - A first-order method
  - Usually less efficient than the Newton method
  - However, it is simple to implement (see the sketch below)
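A minimal sketch of nonlinear conjugate gradient with a simple backtracking line search; the FR and PR formulas for beta are standard, while the test function (Rosenbrock) and all tolerances are assumed for illustration.

```python
import numpy as np

def nonlinear_cg(f, grad, x0, n_iters=2000, tol=1e-8, variant="PR"):
    """Nonlinear conjugate gradient with Fletcher-Reeves ('FR') or
    Polak-Ribiere ('PR') beta and a simple backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(n_iters):
        if np.linalg.norm(g) < tol:
            break
        step = 1.0
        while f(x + step * d) > f(x) + 1e-4 * step * (g @ d):   # Armijo backtracking
            step *= 0.5
        x_new = x + step * d
        g_new = grad(x_new)
        if variant == "FR":
            beta = (g_new @ g_new) / (g @ g)
        else:                                   # Polak-Ribiere, clipped at 0 for robustness
            beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        if g_new @ d >= 0:                      # safeguard: restart if not a descent direction
            d = -g_new
        x, g = x_new, g_new
    return x

# Assumed test problem: the Rosenbrock function, minimized at (1, 1).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(nonlinear_cg(f, grad, [-1.0, 1.0]))       # should approach the minimizer (1, 1)
```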

Empirical Study: Learning Conditional Exponential Model

  Dataset    Instances    Features
  Rule       29,…         …
  Lex        42,509       135,182
  Summary    24,044       198,467
  Shallow    8,625,782    264,142

Iterations and time (s) per dataset: limited-memory Quasi-Newton method vs. conjugate gradient (PR).

Free Software (…ftware.html)
- CG+

When Should We Use Which Optimization Technique?
- Use the Newton method if you can find a package
- Use conjugate gradient if you have to implement it yourself
- Use gradient ascent/descent if you are lazy

Logarithm Bound Algorithms
- To maximize the objective function:
  - Start with a guess
  - For t = 1, 2, ..., T:
    - Compute
    - Find a decoupling function
    - Find its optimal solution (the touch point)

Logarithm Bound Algorithm
- Start with an initial guess x_0
- Come up with a lower-bounding function Φ(x): f(x) ≥ f(x_0) + Φ(x)
  - Touch point: Φ(x_0) = 0
- Optimal solution x_1 for Φ(x)

Logarithm Bound Algorithm
- Start with an initial guess x_0
- Come up with a lower-bounding function Φ(x): f(x) ≥ f(x_0) + Φ(x)
  - Touch point: Φ(x_0) = 0
- Optimal solution x_1 for Φ(x)
- Repeat the above procedure

Logarithm Bound Algorithm
- Start with an initial guess x_0
- Come up with a lower-bounding function Φ(x): f(x) ≥ f(x_0) + Φ(x)
  - Touch point: Φ(x_0) = 0
- Optimal solution x_1 for Φ(x)
- Repeat the above procedure
- Converge to the optimal point
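A short argument, implied by the slides but not shown on them, for why each iteration of the bound algorithm improves the objective: since $f(x) \ge f(x_t) + \Phi_t(x)$ for all $x$ and $\Phi_t(x_t) = 0$, taking $x_{t+1} = \arg\max_x \Phi_t(x)$ gives

$$ f(x_{t+1}) \ \ge\ f(x_t) + \Phi_t(x_{t+1}) \ \ge\ f(x_t) + \Phi_t(x_t) \ =\ f(x_t), $$

so the objective never decreases; convergence to the optimal point, as stated above, additionally requires the objective to be well behaved (e.g. concave).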

Property of Concave Functions
- For any concave function f and weights λ_i ≥ 0 with Σ_i λ_i = 1:
  f(Σ_i λ_i x_i) ≥ Σ_i λ_i f(x_i)

Important Inequality
- log(x) and -exp(x) are concave functions
- Therefore, for λ_i ≥ 0 with Σ_i λ_i = 1:
  log(Σ_i λ_i x_i) ≥ Σ_i λ_i log(x_i)   and   exp(Σ_i λ_i x_i) ≤ Σ_i λ_i exp(x_i)

Expectation-Maximization Algorithm
- Derive the EM algorithm for the Hierarchical Mixture Model
  [Figure: input x feeds a gating function r(x) and two expert models m_1(x), m_2(x), which combine to predict y]
- Log-likelihood of the training data
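The log-likelihood itself was an image on the slide; purely as an illustration (assuming the usual two-expert mixture form with gating weight r(x) and expert models m_1, m_2), the inequality above is what decouples the log of a sum into a lower bound that is easy to maximize:

$$ \log\big( r(x)\, m_1(y \mid x) + (1 - r(x))\, m_2(y \mid x) \big) \ \ge\ q_1 \log \frac{r(x)\, m_1(y \mid x)}{q_1} + q_2 \log \frac{(1 - r(x))\, m_2(y \mid x)}{q_2} $$

for any $q_1, q_2 \ge 0$ with $q_1 + q_2 = 1$. The E-step picks $q_1, q_2$ so the bound is tight at the current parameters (the touch point); the M-step maximizes the bound over the parameters.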