Outline Preface Fundamentals of Optimization

Presentation transcript:

Outline
Preface
Fundamentals of Optimization
Unconstrained Optimization
Ideas of finding solutions
One-Dimensional Search
Gradient Methods
Newton's Method and Its Variations

Gradient Methods
The increment approach is to find which direction can improve the current situation based on the current error (working backward from the error). Usually, an incremental approach updates the parameter vector as x(k+1) = x(k) + Δx. In practice, such an approach is usually realized as a gradient approach, that is, Δx = −α ∂f(x)/∂x. We need a relationship between the current error and the change of the variable considered; that is why Δx = −α ∂f(x)/∂x is employed.

Gradient Methods
These methods use the gradient of the given function to search for the minimizer. The gradient acts in such a direction that, for a given small displacement, the function increases more in the gradient direction than in any other direction: for any ||d|| = 1, ⟨∇f, d⟩ ≤ ||∇f|| (Cauchy–Schwarz inequality). Also, ⟨∇f, ∇f/||∇f||⟩ = ||∇f||, so the bound is attained in the gradient direction. Here ⟨·,·⟩ denotes the inner product, and note that we are now considering multi-variable functions.
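As a quick numerical illustration of this Cauchy–Schwarz argument, the sketch below (not from the slides; the test objective and all names are illustrative) compares the directional derivative ⟨∇f, d⟩ over many random unit directions d with ||∇f||:

```python
# Illustrative check: among unit directions d, the directional derivative
# <grad f, d> never exceeds ||grad f|| and is attained along the gradient.
import numpy as np

def grad_f(x):
    # Gradient of an arbitrary smooth test function f(x) = (x1-4)^2 + 2(x2-3)^2.
    return np.array([2 * (x[0] - 4), 4 * (x[1] - 3)])

x = np.array([1.0, 1.0])
g = grad_f(x)

rng = np.random.default_rng(0)
d = rng.normal(size=(1000, 2))
d /= np.linalg.norm(d, axis=1, keepdims=True)      # ||d|| = 1 for every row

print(np.max(d @ g), "<=", np.linalg.norm(g))      # <grad f, d> <= ||grad f||
print((g / np.linalg.norm(g)) @ g, "=", np.linalg.norm(g))
```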

Gradient Methods
Thus, the iteration algorithm is x(k+1) = x(k) − α_k ∇f(x(k)), where α_k is called the step size. This is often referred to as the gradient descent algorithm. The issue is how to select α_k. Usually it is a constant selected in an ad hoc manner: a small α gives a long search time, while a large α gives a zigzag path to the minimizer.
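A minimal fixed-step version of this iteration might look as follows (a sketch, not the lecturer's code; the objective, step size, and iteration count are illustrative choices):

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, num_iters=1000):
    """Fixed-step gradient descent: x(k+1) = x(k) - alpha * grad f(x(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * grad_f(x)
    return x

# Example: f(x) = (x1 - 1)^2 + 2*(x2 + 2)^2 has its minimizer at [1, -2].
grad_f = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])
print(gradient_descent(grad_f, [0.0, 0.0]))    # approaches [1, -2]
```

With a much smaller alpha the loop needs many more iterations; with a much larger alpha the iterates oscillate or diverge, which is exactly the tradeoff described above.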

Gradient Methods
[Figure: level sets of the objective and the sequence of steepest-descent steps.]

Gradient Methods
Steepest descent selects α_k to achieve the maximum amount of decrease of the function, i.e., to minimize φ_k(α) ≜ f(x(k) − α∇f(x(k))):
α_k = arg min_{α≥0} f(x(k) − α∇f(x(k)))
"arg" means the argument that achieves the required minimum, and "arg min_{α≥0}" means the value of α that achieves the minimum over α ≥ 0. Thus, we can conduct a line search in the direction of −∇f(x(k)) to find x(k+1). This is called the steepest descent method.
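A sketch of the whole procedure, with the one-dimensional search delegated to scipy's bounded scalar minimizer as a stand-in for any line-search routine (all names, bounds, and tolerances here are illustrative, not the lecturer's code):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad_f, x0, tol=1e-8, max_iters=200, alpha_max=100.0):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:                # gradient-norm stopping test
            break
        phi = lambda a: f(x - a * g)               # phi_k(alpha) = f(x(k) - alpha*g)
        alpha = minimize_scalar(phi, bounds=(0.0, alpha_max),
                                method="bounded").x
        x = x - alpha * g                          # move along -grad f(x(k))
    return x
```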

Gradient Methods
If {x(k)}_{k=0}^∞ is a steepest descent sequence for a given function, then for each k, x(k+1) − x(k) is orthogonal to x(k+2) − x(k+1). Orthogonal means ⟨x(k+1) − x(k), x(k+2) − x(k+1)⟩ = 0.
Proof: ⟨x(k+1) − x(k), x(k+2) − x(k+1)⟩ = α_k α_{k+1} ⟨∇f(x(k)), ∇f(x(k+1))⟩. Note that α_k = arg min_{α≥0} f(x(k) − α∇f(x(k))) = arg min_{α≥0} φ_k(α). By the FONC, φ_k′(α_k) = 0 = dφ_k(α_k)/dα = ∇f(x(k) − α_k∇f(x(k)))ᵀ(−∇f(x(k))) = −⟨∇f(x(k+1)), ∇f(x(k))⟩, so the inner product above is zero. The proof is complete.
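This orthogonality is easy to observe numerically. The sketch below (illustrative only) runs exact-line-search steepest descent on a small quadratic, using the closed-form step size derived later in these slides, and prints the inner products of consecutive steps:

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])     # symmetric positive definite
b = np.array([1.0, 2.0])

x = np.array([10.0, -7.0])
steps = []
for _ in range(3):
    g = Q @ x - b                          # grad f(x) for f = 0.5 x'Qx - b'x
    alpha = (g @ g) / (g @ Q @ g)          # exact minimizing step size
    x_new = x - alpha * g
    steps.append(x_new - x)
    x = x_new

# Consecutive steps are (numerically) orthogonal.
print(steps[0] @ steps[1], steps[1] @ steps[2])
```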

Gradient Methods
Let {x(k)}_{k=0}^∞ be a steepest descent sequence for a given function. If ∇f(x(k)) ≠ 0, then f(x(k+1)) < f(x(k)).
Proof: x(k+1) = x(k) − α_k∇f(x(k)) and α_k = arg min_{α≥0} φ_k(α). Thus φ_k(α_k) ≤ φ_k(α) for all α ≥ 0, and it is easy to see that f(x(k+1)) = φ_k(α_k) ≤ f(x(k)) = φ_k(0). This alone is not sufficient for the strict inequality, so consider φ_k′(0) = ∇f(x(k) − 0·∇f(x(k)))ᵀ(−∇f(x(k))) = −||∇f(x(k))||². Since ∇f(x(k)) ≠ 0, φ_k′(0) < 0. This implies there exists an ᾱ > 0 such that φ_k(ᾱ) < φ_k(0). Hence f(x(k+1)) = φ_k(α_k) ≤ φ_k(ᾱ) < φ_k(0) = f(x(k)). The proof is complete.

®Copyright of Shun-Feng Su Gradient Methods If f(x(k))=0, then f(x(k+1))=f(x(k)). It means x(k) satisfies the FONC. It is a stopping (termination) criterion. However, this criterion is not directly suitable as a practical stopping criterion because f(x(k))=0 may not be obtained in practical cases. A practical stopping criterion is to check ||f(x(k))|| is less than a pre-specified threshold  or to check whether |f(x(k+1)) f(x(k))|< (or relatively, divided by |f(x(k))|). Another alternative is ||x(k+1)) x(k)||< (or relatively divided by ||x(k)||). preferable

Gradient Methods
Relative criteria are preferable because they are scale-independent: scaling the objective function does not change whether the criterion is satisfied. A relative criterion such as |f(x(k+1)) − f(x(k))| / |f(x(k))| < ε may run into trouble when |f(x(k))| is very small. Thus, sometimes we use |f(x(k+1)) − f(x(k))| / max(1, |f(x(k))|) < ε.
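A small helper capturing these guarded relative tests might look like this (a sketch; the threshold ε and the function name are placeholders):

```python
import numpy as np

def should_stop(f_prev, f_curr, x_prev, x_curr, eps=1e-6):
    """Scale-independent stopping tests, guarded by max(1, .) to avoid
    dividing by a very small |f(x(k))| or ||x(k)||."""
    rel_f = abs(f_curr - f_prev) / max(1.0, abs(f_prev))
    rel_x = (np.linalg.norm(np.asarray(x_curr) - np.asarray(x_prev))
             / max(1.0, float(np.linalg.norm(x_prev))))
    return rel_f < eps or rel_x < eps
```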

Gradient Methods
Example: consider f(x) = (x1 − 4)^4 + (x2 − 3)^2 + 4(x3 + 5)^4.
Ans: Let the initial point be x(0) = [4, 2, −1]ᵀ.
∇f(x) = [4(x1 − 4)^3, 2(x2 − 3), 16(x3 + 5)^3]ᵀ, so ∇f(x(0)) = [0, −2, 1024]ᵀ.
α_0 = arg min_{α≥0} f(x(0) − α∇f(x(0))); by using the secant method, α_0 = 3.967×10^−3, and x(1) = [4.0, 2.008, −5.062]ᵀ.
Any one-dimensional method can be used to find the minimizer of the line search.

Gradient Methods f(x(1))=[0, -1.984, -0.003875]T. ®Copyright of Shun-Feng Su Gradient Methods f(x(1))=[0, -1.984, -0.003875]T. 1 =arg min0f(x(1)f(x(1))),  1 =0.5. x(2)=[4.0, 3.0, -5.060]T. f(x(2))=[0.0, 0.0, -0.003525]T. 2 =arg min0f(x(2)f(x(2))), 2 =16.29. x(3)=[4.0, 3.0, -5.002]T. Note that the minimizer is [4, 3, -5]. In three iterations, it almost reaches the minimizer.

Gradient Methods
Consider a quadratic function in steepest descent:
f(x) = (1/2)xᵀQx − bᵀx, with ∇f(x) = Qx − b.
Assume Q is a symmetric matrix. (If not, say A ≠ Aᵀ, then since xᵀAx is a scalar, xᵀAx = (xᵀAx)ᵀ = xᵀAᵀx, so xᵀAx = (1/2)(xᵀAx + xᵀAᵀx) = (1/2)xᵀ(A + Aᵀ)x = (1/2)xᵀQx, with Q = A + Aᵀ symmetric.)
The Hessian matrix of f is H(x) = ∇²f(x) = Q.
The steepest descent iteration is x(k+1) = x(k) − α_k∇f(x(k)).


Gradient Methods
To find arg min_{α≥0} f(x(k) − αg(k)):
Define g(k) = ∇f(x(k)) and φ_k(α) = f(x(k) − αg(k)). Assume g(k) ≠ 0 (if g(k) = 0, then x(k) = x*).
φ_k(α) = (1/2)(x(k) − αg(k))ᵀQ(x(k) − αg(k)) − (x(k) − αg(k))ᵀb
φ_k′(α) = (x(k) − αg(k))ᵀQ(−g(k)) − bᵀ(−g(k))
Letting φ_k′(α_k) = 0, we have α_k = (g(k)ᵀg(k)) / (g(k)ᵀQg(k)), an explicit formula for α_k, or
x(k+1) = x(k) − [(g(k)ᵀg(k)) / (g(k)ᵀQg(k))] g(k).
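For the quadratic case this closed-form step size removes the need for a numerical line search. A minimal sketch (Q, b, and the tolerance are illustrative):

```python
import numpy as np

def quadratic_steepest_descent(Q, b, x0, tol=1e-10, max_iters=1000):
    """Steepest descent for f(x) = 0.5 x'Qx - b'x with the exact step size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = Q @ x - b                       # grad f(x) = Qx - b
        if np.linalg.norm(g) < tol:
            break
        alpha = (g @ g) / (g @ Q @ g)       # alpha_k = (g'g)/(g'Qg)
        x = x - alpha * g
    return x

Q = np.array([[3.0, 1.0], [1.0, 2.0]])      # symmetric positive definite
b = np.array([1.0, -1.0])
x_star = quadratic_steepest_descent(Q, b, np.zeros(2))
print(np.allclose(Q @ x_star, b))           # minimizer solves Qx = b
```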

Gradient Methods
Note that the above explicit form of the steepest descent step holds for the quadratic form only. There are also analyses of the convergence properties and the convergence rate, but the quadratic form is usually a simple problem. If you are studying convergence properties, you can check those details in the references.

Gradient Methods
An important result concerns the fixed-step gradient algorithm (still for a quadratic form): for the fixed step size α, x(k) → x* for any x(0) if and only if 0 < α < 2/λ_max(Q), where λ_max(Q) denotes the maximal eigenvalue of Q. Note that this result holds only for the quadratic form, but it can still inform convergence analysis for general problems.
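The bound is easy to observe on a small example. The sketch below (illustrative values) iterates the fixed-step rule x(k+1) = x(k) − α(Qx(k) − b) with α slightly below and slightly above 2/λ_max(Q):

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])              # symmetric positive definite
b = np.array([1.0, -1.0])
alpha_bound = 2.0 / np.max(np.linalg.eigvalsh(Q))   # 2 / lambda_max(Q)

def residual_after(alpha, iters=200):
    x = np.array([5.0, 5.0])
    for _ in range(iters):
        x = x - alpha * (Q @ x - b)                 # fixed-step gradient iteration
    return np.linalg.norm(Q @ x - b)                # ||grad f(x)|| after the run

print(residual_after(0.95 * alpha_bound))           # tiny residual: converges
print(residual_after(1.05 * alpha_bound))           # huge residual: diverges
```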


Gradient Methods
Selected homework in Prob 3: 8.5, 8.6, 8.13, 8.14, and 8.17.