Unconstrained Optimization
Rong Jin
Recap: Gradient Ascent/Descent
- Simple algorithm; only requires the first-order derivative
- Problem: difficulty in determining the step size
  - Too small a step size: slow convergence
  - Too large a step size: oscillation or divergence
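To make the step-size tradeoff concrete, here is a minimal gradient-descent sketch in Python. The quadratic objective, the learning rates, and the function name are illustrative assumptions, not from the slides:

```python
import numpy as np

def gradient_descent(grad, x0, step_size, n_iters=100):
    """Plain gradient descent: x_{k+1} = x_k - step_size * grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - step_size * grad(x)
    return x

# Illustrative objective f(x) = 0.5 * x^T A x with an ill-conditioned A.
A = np.array([[1.0, 0.0], [0.0, 10.0]])
grad = lambda x: A @ x

x_small = gradient_descent(grad, [1.0, 1.0], step_size=0.01)  # converges, but slowly
x_large = gradient_descent(grad, [1.0, 1.0], step_size=0.25)  # oscillates and diverges along the stiff direction
print(x_small, x_large)
```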
Recap: Newton Method
- Univariate Newton method: x_{k+1} = x_k - f'(x_k) / f''(x_k)
- Multivariate Newton method: x_{k+1} = x_k - H^{-1} ∇f(x_k), where H is the Hessian matrix
- Guaranteed to converge when the objective function is convex/concave
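A minimal multivariate Newton sketch. The convex quadratic test function and the helper name are illustrative assumptions:

```python
import numpy as np

def newton_method(grad, hess, x0, n_iters=20):
    """Newton's method: x_{k+1} = x_k - H(x_k)^{-1} grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        # Solve H d = grad(x) instead of forming the inverse explicitly.
        d = np.linalg.solve(hess(x), grad(x))
        x = x - d
    return x

# Illustrative convex quadratic: f(x) = 0.5 x^T A x - b^T x, minimized where A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = newton_method(lambda x: A @ x - b, lambda x: A, [0.0, 0.0])
print(x_star)  # matches np.linalg.solve(A, b); Newton converges in one step on a quadratic
```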
Recap: Quasi-Newton Methods
- Problems with the standard Newton method:
  - Computing the inverse of the Hessian matrix H is expensive: O(n^3)
  - The Hessian matrix H itself can be very large: O(n^2) entries
- Quasi-Newton method (BFGS):
  - Approximates the inverse of the Hessian H with another matrix B
  - Avoids the difficulty of computing the inverse of H
  - However, still problematic when B is large
- Limited-memory quasi-Newton method (L-BFGS):
  - Stores a small set of vectors instead of the matrix B
  - Avoids the difficulty of computing the inverse of H
  - Avoids the difficulty of storing the large matrix B
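In practice L-BFGS is rarely implemented from scratch; a common route is SciPy's minimize with the L-BFGS-B method. This is a sketch with an illustrative Rosenbrock-style objective; SciPy is not mentioned on the slides:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    """Illustrative smooth objective (Rosenbrock)."""
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def grad_f(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])

result = minimize(f, x0=np.zeros(2), jac=grad_f, method="L-BFGS-B",
                  options={"maxcor": 10})  # maxcor = number of stored correction vectors
print(result.x)  # approximately [1.0, 1.0]
```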
Recap

  Method                                  Cost      Number of variables   Convergence rate
  Standard Newton method                  O(n^3)    Small                 Very fast
  Quasi-Newton method (BFGS)              O(n^2)    Medium                Fast
  Limited-memory quasi-Newton (L-BFGS)    O(n)      Large                 Reasonably fast
Empirical Study: Learning Conditional Exponential Model

  Dataset    Instances    Features
  Rule       29,602       246
  Lex        42,509       135,182
  Summary    24,044       198,467
  Shallow    8,625,782    264,142

  Dataset    Gradient ascent            Limited-memory quasi-Newton (L-BFGS)
             Iterations   Time (s)      Iterations   Time (s)
  Rule       350          4.8           81           1.13
  Lex        1,545        114.21        176          20.02
  Summary    3,321        190.22        69           8.52
  Shallow    14,527       85,962.53     421          2,420.30
Free Software
- http://www.ece.northwestern.edu/~nocedal/software.html
- L-BFGS
- L-BFGS-B
Conjugate Gradient
Another great numerical optimization method!
Linear Conjugate Gradient Method
- Consider optimizing the quadratic function f(x) = (1/2) x^T A x - b^T x
- Conjugate vectors: the set of vectors {p_1, p_2, ..., p_l} is said to be conjugate with respect to a matrix A if p_i^T A p_j = 0 for all i ≠ j
- Important property: the quadratic function can be optimized by simply optimizing it along each individual direction in the conjugate set
- Optimal solution: x* = Σ_k α_k p_k, where α_k is the minimizer along the k-th conjugate direction
Example
- Minimize a two-variable quadratic function with matrix A
- Conjugate directions: (1, 1) and (1, -1)
- Optimization:
  - First direction, x_1 = x_2 = x: minimize along this line
  - Second direction, x_1 = -x_2 = x: minimize along this line
- Solution: x_1 = x_2 = 1
How to Efficiently Find a Set of Conjugate Directions
- Iterative procedure:
  - Given conjugate directions {p_1, p_2, ..., p_{k-1}}
  - Set p_k = -r_k + β_k p_{k-1}, where r_k is the current gradient (residual) and β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})
- Theorem: the direction generated in the above step is conjugate to all previous directions {p_1, p_2, ..., p_{k-1}}, i.e., p_k^T A p_i = 0 for i < k
- Note: computing the k-th direction p_k only requires the previous direction p_{k-1}
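A minimal linear conjugate gradient sketch for minimizing f(x) = (1/2) x^T A x - b^T x, following the update above. The 2x2 test system and the function name are illustrative assumptions:

```python
import numpy as np

def linear_cg(A, b, x0, tol=1e-10, max_iters=None):
    """Linear CG for minimizing 0.5 x^T A x - b^T x (A symmetric positive definite)."""
    x = np.asarray(x0, dtype=float)
    r = A @ x - b                 # gradient / residual
    p = -r                        # first direction: steepest descent
    max_iters = max_iters or len(b)
    for _ in range(max_iters):
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (p @ A @ p)      # exact line search along p
        x = x + alpha * p
        r_new = r + alpha * (A @ p)
        beta = (r_new @ r_new) / (r @ r)   # makes the new direction A-conjugate to p
        p = -r_new + beta * p
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(linear_cg(A, b, np.zeros(2)))   # solves A x = b in at most 2 iterations
```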
Nonlinear Conjugate Gradient
- Even though conjugate gradient is derived for a quadratic objective function, it can be applied directly to other nonlinear functions
- Guaranteed to converge if the objective is convex/concave
- Variants:
  - Fletcher-Reeves conjugate gradient (FR-CG)
  - Polak-Ribiere conjugate gradient (PR-CG): more robust than FR-CG
- Compared to the Newton method:
  - A first-order method
  - Usually less efficient than the Newton method
  - However, simple to implement
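A sketch of nonlinear CG on a non-quadratic objective via SciPy, whose "CG" method is a Polak-Ribiere-type nonlinear conjugate gradient. The objective and starting point are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    """Illustrative non-quadratic convex objective."""
    return np.log(1 + np.exp(x[0] + x[1])) + 0.5 * (x[0]**2 + x[1]**2)

def grad_f(x):
    s = 1.0 / (1.0 + np.exp(-(x[0] + x[1])))   # sigmoid of x[0] + x[1]
    return np.array([s + x[0], s + x[1]])

# "CG" here is a nonlinear conjugate gradient (Polak-Ribiere variant).
result = minimize(f, x0=np.array([1.0, -2.0]), jac=grad_f, method="CG")
print(result.x, result.nit)
```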
Empirical Study: Learning Conditional Exponential Model

  Dataset    Instances    Features
  Rule       29,602       246
  Lex        42,509       135,182
  Summary    24,044       198,467
  Shallow    8,625,782    264,142

  Dataset    Conjugate gradient (PR)    Limited-memory quasi-Newton (L-BFGS)
             Iterations   Time (s)      Iterations   Time (s)
  Rule       142          1.93          81           1.13
  Lex        281          21.72         176          20.02
  Summary    537          31.66         69           8.52
  Shallow    2,813        16,251.12     421          2,420.30
Free Software
- http://www.ece.northwestern.edu/~nocedal/software.html
- CG+
When Should We Use Which Optimization Technique?
- Use the Newton method if you can find a package
- Use conjugate gradient if you have to implement it yourself
- Use gradient ascent/descent if you are lazy
Logarithm Bound Algorithms
- To maximize f(x):
  - Start with a guess x_0
  - For t = 1, 2, ..., T:
    - Compute a bound of f(x) at the current guess x_t
    - Find a decoupling function: a lower bound in which the variables decouple, touching f at x_t (the touch point)
    - Find the optimal solution of the bound and take it as the next guess x_{t+1}
Logarithm Bound Algorithm
- Start with an initial guess x_0
- Come up with a lower-bound function Φ(x) such that f(x) ≥ Φ(x) + f(x_0)
- Touch point: Φ(x_0) = 0, so the bound touches f at x_0
- Find the optimal solution x_1 of Φ(x)
- Repeat the above procedure from x_1
- The iterates converge to the optimal point
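A minimal bound-optimization sketch of this loop in Python. The concrete problem, estimating the weight of a two-component mixture of fixed densities, is an illustrative assumption, not from the slides; it uses the log-of-a-sum inequality from the next slides to build a touching, decoupled lower bound at each step:

```python
import numpy as np

def fit_mixture_weight(a, b, n_iters=100):
    """Bound optimization for maximizing sum_i log(w*a_i + (1-w)*b_i) over w in [0, 1].

    a, b: fixed per-example likelihoods under two component models.
    Each step builds a lower bound via log(sum) >= weighted sum of logs;
    the bound touches the objective at the current w and decouples,
    so its maximizer has a closed form.
    """
    w = 0.5                                   # initial guess
    for _ in range(n_iters):
        q = w * a / (w * a + (1 - w) * b)     # touch-point weights (responsibilities)
        w = q.mean()                          # maximizer of the decoupled lower bound
    return w

# Illustrative data: likelihoods of 4 examples under the two components.
a = np.array([0.9, 0.1, 0.7, 0.3])
b = np.array([0.1, 0.9, 0.3, 0.7])
print(fit_mixture_weight(a, b))               # converges to 0.5 for this symmetric example
```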
Property of Concave Functions
- For any concave function f and weights α_i ≥ 0 with Σ_i α_i = 1:
  f(Σ_i α_i x_i) ≥ Σ_i α_i f(x_i)   (Jensen's inequality)
Important Inequality
- log(x) and -exp(x) are concave functions
- Therefore, for α_i ≥ 0 with Σ_i α_i = 1:
  log(Σ_i α_i x_i) ≥ Σ_i α_i log(x_i)
  -exp(Σ_i α_i x_i) ≥ -Σ_i α_i exp(x_i)
Expectation-Maximization Algorithm
- Derive the EM algorithm for the hierarchical mixture model: a gating function r(x) routes the input x to two expert models m_1(x) and m_2(x), which produce the output y
- Log-likelihood of the training data (see the sketch below)
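A sketch of the log-likelihood and the EM lower bound, assuming the standard mixture-of-experts form suggested by the slide's diagram (one binary gate r(x), two experts m_1 and m_2 with predictive densities p_1 and p_2); the exact notation on the original slide may differ:

```latex
\[
\ell(\theta) \;=\; \sum_{i=1}^{N} \log\Big[ r(x_i)\, p_1(y_i \mid x_i)
  + \big(1 - r(x_i)\big)\, p_2(y_i \mid x_i) \Big]
\]
Applying the inequality $\log\big(\sum_k \alpha_k z_k\big) \ge \sum_k \alpha_k \log z_k$
with weights $q_{i1} + q_{i2} = 1$ gives a touching lower bound:
\[
\ell(\theta) \;\ge\; \sum_{i=1}^{N} \Big[ q_{i1} \log \frac{r(x_i)\, p_1(y_i \mid x_i)}{q_{i1}}
  + q_{i2} \log \frac{\big(1 - r(x_i)\big)\, p_2(y_i \mid x_i)}{q_{i2}} \Big].
\]
The bound touches $\ell$ at the current parameters when
\[
q_{i1} \;=\; \frac{r(x_i)\, p_1(y_i \mid x_i)}
  {r(x_i)\, p_1(y_i \mid x_i) + \big(1 - r(x_i)\big)\, p_2(y_i \mid x_i)}
\]
(the E-step); maximizing the bound over the parameters of $r$, $m_1$, and $m_2$ is the M-step.
```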