®Copyright of Shun-Feng Su

Outline
Preface
Fundamentals of Optimization
Unconstrained Optimization
Ideas of finding solutions
One-Dimensional Search
Gradient Methods
Newton's Method and Its Variations
Gradient Methods
The incremental approach is to find which direction can improve the current situation based on the current error (a back-forward approach). Usually, an incremental approach updates the parameter vector as x(k+1) = x(k) + Δx. In fact, such an approach is usually realized as a gradient approach; that is, Δx = −α ∂f(x)/∂x. We need a relationship between the current error and the change of the variable considered; that is why Δx = −α ∂f(x)/∂x is employed.
Sept, 2010
Gradient Methods
These methods use the gradient of the given function in searching for the minimizer. The gradient points in the direction in which, for a given small displacement, the function increases more than in any other direction: for any d with ‖d‖ = 1, ⟨∇f, d⟩ ≤ ‖∇f‖ (Cauchy–Schwarz inequality), while ⟨∇f, ∇f/‖∇f‖⟩ = ‖∇f‖. Note we are now considering multi-variable functions; ⟨·,·⟩ denotes the inner product.
Gradient Methods
Thus, the iteration algorithm is x(k+1) = x(k) − αk∇f(x(k)), where αk is called the step size. This is often referred to as the gradient descent algorithm. The issue is how to select αk. Usually, it is a constant and is selected in an ad hoc manner: too small → long searching time; too large → zigzag path to the minimizer.
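The fixed-step iteration above can be sketched as follows (a minimal sketch; the quadratic test function and all names are illustrative assumptions, not from the slides):

```python
import numpy as np

def gradient_descent(grad, x0, alpha, num_iters=1000):
    """Fixed-step gradient descent: x(k+1) = x(k) - alpha * grad(x(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * grad(x)
    return x

# Illustrative quadratic f(x) = x1^2 + 10*x2^2, minimizer at the origin.
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
x_small = gradient_descent(grad, [1.0, 1.0], alpha=0.01)  # small step: slow
x_ok = gradient_descent(grad, [1.0, 1.0], alpha=0.09)     # larger step: faster, zigzags
```

With the small step the iterate creeps toward the origin; with the larger step it zigzags but still converges (for this particular function, any fixed step above 0.1 would diverge).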
Gradient Methods
Idea of level sets: a sequence of steepest-descent steps (figure).
Gradient Methods
Steepest descent selects αk to achieve the maximum amount of decrease of the function, i.e., to minimize φk(α) = f(x(k) − α∇f(x(k))):
αk = arg min α≥0 f(x(k) − α∇f(x(k)))
"arg" means the argument that achieves what is required; "arg min α≥0" means the value of α that achieves the minimum over α ≥ 0. Thus, we can conduct a line search in the direction of −∇f(x(k)) to find x(k+1). This is called the steepest descent method.
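The line search for αk can be implemented with any one-dimensional method; here is a golden-section sketch (the bracket [0, 1], the test function, and all names are assumptions for illustration):

```python
import numpy as np

def golden_section(phi, a=0.0, b=1.0, tol=1e-8):
    """Minimize a unimodal 1-D function phi on [a, b] by golden-section search."""
    r = (np.sqrt(5.0) - 1.0) / 2.0  # golden ratio conjugate, ~0.618
    c, d = b - r * (b - a), a + r * (b - a)
    while (b - a) > tol:
        if phi(c) < phi(d):
            b, d = d, c               # minimizer lies in [a, d]
            c = b - r * (b - a)
        else:
            a, c = c, d               # minimizer lies in [c, b]
            d = a + r * (b - a)
    return (a + b) / 2.0

def steepest_descent_step(f, grad, x):
    """One steepest-descent step: minimize phi(alpha) = f(x - alpha * grad f(x))."""
    g = grad(x)
    phi = lambda alpha: f(x - alpha * g)
    return x - golden_section(phi) * g

# Illustrative quadratic; the bracket [0, 1] happens to contain alpha_k here.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
x0 = np.array([1.0, 1.0])
x1 = steepest_descent_step(f, grad, x0)
x2 = steepest_descent_step(f, grad, x1)
```

Each step decreases f, and consecutive steps come out (numerically) orthogonal, matching the orthogonality property proved on the next slide.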
Gradient Methods
If {x(k)} (k = 0, 1, …) is a steepest descent sequence for a given function, then for each k, (x(k+1) − x(k)) is orthogonal to (x(k+2) − x(k+1)). Orthogonal means ⟨x(k+1) − x(k), x(k+2) − x(k+1)⟩ = 0.
Proof: ⟨x(k+1) − x(k), x(k+2) − x(k+1)⟩ = αk αk+1 ⟨∇f(x(k)), ∇f(x(k+1))⟩. Note αk = arg min α≥0 f(x(k) − α∇f(x(k))) = arg min α≥0 φk(α). By the FONC, φk′(αk) = 0 = ∇f(x(k) − αk∇f(x(k)))ᵀ(−∇f(x(k))) = −⟨∇f(x(k+1)), ∇f(x(k))⟩. The proof is complete.
Gradient Methods
Let {x(k)} (k = 0, 1, …) be a steepest descent sequence for a given function. If ∇f(x(k)) ≠ 0, then f(x(k+1)) < f(x(k)).
Proof: x(k+1) = x(k) − αk∇f(x(k)) and αk = arg min α≥0 φk(α). Thus, φk(αk) ≤ φk(α) for all α ≥ 0. It is easy to see f(x(k+1)) = φk(αk) ≤ f(x(k)) = φk(0). (Not sufficient: this only gives ≤, not strict decrease.)
Gradient Methods
Let {x(k)} (k = 0, 1, …) be a steepest descent sequence for a given function. If ∇f(x(k)) ≠ 0, then f(x(k+1)) < f(x(k)).
Proof: x(k+1) = x(k) − αk∇f(x(k)) and αk = arg min α≥0 φk(α). Thus, φk(αk) ≤ φk(α) for all α ≥ 0, and f(x(k+1)) = φk(αk) ≤ f(x(k)) = φk(0). Consider φk′(0) = ∇f(x(k))ᵀ(−∇f(x(k))) = −‖∇f(x(k))‖². Since ∇f(x(k)) ≠ 0, φk′(0) < 0. This implies there exists an ᾱ > 0 such that φk(ᾱ) < φk(0). Hence f(x(k+1)) = φk(αk) ≤ φk(ᾱ) < φk(0) = f(x(k)). The proof is complete.
Gradient Methods
If ∇f(x(k)) = 0, then x(k+1) = x(k) and f(x(k+1)) = f(x(k)). It means x(k) satisfies the FONC, so ∇f(x(k)) = 0 is a stopping (termination) criterion. However, this criterion is not directly suitable as a practical stopping criterion because ∇f(x(k)) = 0 may not be obtained exactly in practical cases. A practical stopping criterion is to check whether ‖∇f(x(k))‖ is less than a pre-specified threshold ε, or to check whether |f(x(k+1)) − f(x(k))| < ε (or, relatively, divided by |f(x(k))|). Another alternative is ‖x(k+1) − x(k)‖ < ε (or, relatively, divided by ‖x(k)‖). The relative versions are preferable.
Gradient Methods
Relative criteria are preferable because they are scale-independent (scaling the objective function does not change whether the criterion is satisfied). A relative criterion like |f(x(k+1)) − f(x(k))| / |f(x(k))| < ε may encounter problems when |f(x(k))| is very small. Thus, sometimes we use |f(x(k+1)) − f(x(k))| / max(1, |f(x(k))|) < ε.
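The guarded relative test can be sketched as (the function name and the default ε are assumptions):

```python
def should_stop(f_prev, f_curr, eps=1e-8):
    """Relative-decrease stopping test, guarded against very small |f(k)|:
    |f(k+1) - f(k)| / max(1, |f(k)|) < eps."""
    return abs(f_curr - f_prev) / max(1.0, abs(f_prev)) < eps
```

The max(1, ·) guard keeps the ratio finite when |f(x(k))| is near zero, while leaving the scale-independent relative test intact for |f(x(k))| ≥ 1.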
Gradient Methods
Example: consider f(x) = (x1 − 4)⁴ + (x2 − 3)² + 4(x3 + 5)⁴.
Ans: Let the initial point be x(0) = [4, 2, −1]ᵀ. ∇f(x) = [4(x1 − 4)³, 2(x2 − 3), 16(x3 + 5)³]ᵀ, so ∇f(x(0)) = [0, −2, 1024]ᵀ. α0 = arg min α≥0 f(x(0) − α∇f(x(0))); by using the secant method, α0 = 3.967×10⁻³. x(1) = [4.0, 2.008, −5.062]ᵀ.
Any method can be used to perform the one-dimensional search for αk (here, the secant method).
Gradient Methods
∇f(x(1)) = [0, −1.984, −0.003875]ᵀ. α1 = arg min α≥0 f(x(1) − α∇f(x(1))) = 0.5, so x(2) = [4.0, 3.0, −5.060]ᵀ. ∇f(x(2)) ≈ [0.0, 0.0, −0.0035]ᵀ. α2 = arg min α≥0 f(x(2) − α∇f(x(2))) = 16.29, so x(3) ≈ [4.0, 3.0, −5.003]ᵀ. Note that the minimizer is [4, 3, −5]ᵀ. In three iterations, the sequence almost reaches the minimizer.
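To sanity-check the numbers, here is a sketch that assumes the objective f(x) = (x1 − 4)⁴ + (x2 − 3)² + 4(x3 + 5)⁴, which is consistent with the stated gradient values and the minimizer [4, 3, −5]ᵀ:

```python
import numpy as np

# Assumed objective, consistent with the stated gradient and minimizer.
f = lambda x: (x[0] - 4) ** 4 + (x[1] - 3) ** 2 + 4 * (x[2] + 5) ** 4
grad = lambda x: np.array([4 * (x[0] - 4) ** 3,
                           2 * (x[1] - 3),
                           16 * (x[2] + 5) ** 3])

x0 = np.array([4.0, 2.0, -1.0])
g0 = grad(x0)                # matches the slides: [0, -2, 1024]
x1 = x0 - 3.967e-3 * g0      # step size alpha_0 found by the secant method
```

Here g0 reproduces [0, −2, 1024]ᵀ, x1 ≈ [4.0, 2.008, −5.062]ᵀ, and f drops from 1025 to about 0.98 in one step.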
Gradient Methods
Consider a quadratic function in steepest descent: f(x) = ½xᵀQx − bᵀx, so ∇f(x) = Qx − b. Assume Q is a symmetric matrix. (If not, say A ≠ Aᵀ: since xᵀAx is a scalar, xᵀAx = (xᵀAx)ᵀ = xᵀAᵀx, so xᵀAx = ½(xᵀAx + xᵀAᵀx) = ½xᵀ(A + Aᵀ)x = ½xᵀQx with Q = A + Aᵀ symmetric.) The Hessian matrix of f is H(x) = ∇²f(x) = Q. The steepest descent iteration is x(k+1) = x(k) − αk∇f(x(k)).
Gradient Methods
Gradient Methods
To find arg min α≥0 f(x(k) − αg(k)): define g(k) = ∇f(x(k)) and φk(α) = f(x(k) − αg(k)). Assume g(k) ≠ 0 (if g(k) = 0, x(k) = x*).
φk(α) = ½(x(k) − αg(k))ᵀQ(x(k) − αg(k)) − (x(k) − αg(k))ᵀb
φk′(α) = (x(k) − αg(k))ᵀQ(−g(k)) − bᵀ(−g(k))
Setting φk′(αk) = 0, we obtain the explicit formula αk = (g(k)ᵀg(k)) / (g(k)ᵀQg(k)), or
x(k+1) = x(k) − [(g(k)ᵀg(k)) / (g(k)ᵀQg(k))] g(k).
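The closed-form step size can be sketched directly (Q and b below are illustrative data, not from the slides):

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, num_iters=50):
    """Steepest descent on f(x) = 0.5 x^T Q x - b^T x using the closed-form
    step alpha_k = (g^T g) / (g^T Q g), where g = grad f(x) = Q x - b."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = Q @ x - b
        if np.linalg.norm(g) < 1e-12:  # g = 0 means x already satisfies Qx = b
            break
        x = x - (g @ g) / (g @ Q @ g) * g
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])  # illustrative symmetric positive definite Q
b = np.array([1.0, 2.0])
x_star = steepest_descent_quadratic(Q, b, [0.0, 0.0])
```

The minimizer of this quadratic satisfies Qx = b, so the result can be checked against np.linalg.solve(Q, b).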
Gradient Methods
Note that the above (an explicit form of the steepest descent approach) is for the quadratic form only. There are also analyses of its convergence property and convergence rate, but the quadratic form is usually a simple problem. If you are studying convergence properties, you can check those details in the references.
Gradient Methods
An important result concerns the fixed-step-size gradient algorithm (still for a quadratic form): for a fixed step size α, x(k) → x* for any x(0) if and only if 0 < α < 2/λmax(Q), where λmax(Q) denotes the maximal eigenvalue of Q. Note that this holds only for the quadratic form, but it can also be used in convergence analysis for general problems.
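A quick numerical check of the 2/λmax threshold (Q is illustrative; λmax is hard-coded for this diagonal example):

```python
import numpy as np

Q = np.diag([1.0, 10.0])      # f(x) = 0.5 x^T Q x, so grad f(x) = Q x (b = 0)
lam_max = 10.0                # largest eigenvalue of this diagonal Q
x0 = np.array([1.0, 1.0])

def run(alpha, iters=500):
    """Fixed-step gradient descent; returns the norm of the final iterate."""
    x = x0.copy()
    for _ in range(iters):
        x = x - alpha * (Q @ x)
    return np.linalg.norm(x)

converged = run(0.9 * 2.0 / lam_max)  # alpha < 2/lam_max: converges to x* = 0
diverged = run(1.1 * 2.0 / lam_max)   # alpha > 2/lam_max: blows up
```

Just below the threshold the iterate shrinks to the minimizer x* = 0; just above it, the component along the largest eigenvalue grows without bound.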
Gradient Methods
Gradient Methods
Selected homework in Prob 3: 8.5, 8.6, 8.13, 8.14, and 8.17.