1 Computación Inteligente: Derivative-Based Optimization
2 Contents Optimization problems Mathematical background Descent Methods The Method of Steepest Descent Conjugate Gradient
3 OPTIMIZATION PROBLEMS
4 1. Objective function – the mathematical function that is optimized by changing the values of the design variables. 2. Design variables – those variables which we, as designers, can change. 3. Constraints – functions of the design variables that establish limits on individual variables or combinations of design variables.
5 3 basic ingredients… –an objective function, –a set of decision variables, –a set of equality/inequality constraints. The problem is to search for the values of the decision variables that minimize the objective function while satisfying the constraints…
6 –Design Variables: the decision vector –Constraints: equality and inequality –Bounds: feasible ranges for the variables –Objective Function: maximization can be converted to minimization due to the duality principle
7 1. Identify the quantity or function, f, to be optimized. 2. Identify the design variables: x1, x2, x3, …, xn. 3. Identify the constraints, if any exist: a. equalities, b. inequalities. 4. Adjust the design variables (x's) until f is optimized and all of the constraints are satisfied.
8 1. Objective functions may be unimodal or multimodal. a. Unimodal – only one optimum. b. Multimodal – more than one optimum. 2. Most search schemes are based on the assumption of a unimodal surface; the optimum determined in such cases is called a local optimum design. 3. The global optimum is the best of all local optimum designs.
9 Existence of a global minimum: if f(x) is continuous on a feasible set S that is closed and bounded, then f(x) has a global minimum in S. –A set S is closed if it contains all its boundary points. –A set S is bounded if it is contained in the interior of some ball. Compact = closed and bounded.
10 [Figure: function surface over the (x1, x2) plane]
11 [Figure: local maximum and saddle point]
12 Derivative-based optimization (gradient based) –Capable of determining “search directions” according to an objective function’s derivative information steepest descent method; Newton’s method; Newton-Raphson method; Conjugate gradient, etc. Derivative-free optimization random search method; genetic algorithm; simulated annealing; etc.
13 MATHEMATICAL BACKGROUND
14 A square matrix M is positive definite if x^T M x > 0 for all x ≠ 0. It is positive semidefinite if x^T M x ≥ 0 for all x. The scalar x^T M x is called a quadratic form.
15 A symmetric matrix M = M^T is positive definite if and only if its eigenvalues λ_i > 0. (semidefinite ↔ λ_i ≥ 0) –Proof (→): let v_i be the eigenvector for the i-th eigenvalue λ_i, so M v_i = λ_i v_i. –Then 0 < v_i^T M v_i = λ_i v_i^T v_i = λ_i ‖v_i‖², which implies λ_i > 0. –(←): prove that positive eigenvalues imply positive definiteness (exercise).
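The eigenvalue criterion can be checked numerically. A minimal sketch in Python (NumPy assumed; the matrices M1 and M2 are illustrative, not from the slides):

```python
import numpy as np

def is_positive_definite(M, tol=1e-12):
    """Test a symmetric matrix for positive definiteness via its eigenvalues."""
    eigvals = np.linalg.eigvalsh(M)  # eigenvalues of a symmetric matrix, ascending
    return bool(np.all(eigvals > tol))

# Illustrative matrices: M1 has eigenvalues 1 and 3; M2 has eigenvalues 3 and -1.
M1 = np.array([[2.0, -1.0], [-1.0, 2.0]])
M2 = np.array([[1.0, 2.0], [2.0, 1.0]])
```

Note that `np.linalg.eigvalsh` exploits symmetry; for a non-symmetric matrix one would first form its symmetric part.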
16 Theorem: if a matrix M = U^T U with U nonsingular, then M is positive definite. Proof: let f be defined as f(x) = x^T M x = x^T U^T U x. If we can show that f is always positive for x ≠ 0, then M must be positive definite. Writing b = U x, we can write this as f = b^T b = ‖b‖² ≥ 0. Provided that U x gives a nonzero vector for all values of x except x = 0 (which holds when U is nonsingular), f must always be positive.
17 f: R^n → R is a quadratic function if f(x) = (1/2) x^T Q x − b^T x + c, –where Q is symmetric.
18 It is not necessary for Q to be symmetric. –Suppose the matrix P is non-symmetric. Since x^T P x = x^T P^T x (a scalar equals its own transpose), we have x^T P x = x^T Q x with Q = (1/2)(P + P^T), and this Q is symmetric.
19 –Suppose the matrix P is non-symmetric. Example: replacing P by its symmetric part Q = (1/2)(P + P^T) leaves the quadratic form unchanged, and Q is symmetric.
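This symmetrization can be verified numerically; a small sketch (NumPy assumed; the random P is chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((3, 3))   # a generic non-symmetric matrix
Q = 0.5 * (P + P.T)               # its symmetric part
x = rng.standard_normal(3)

# The antisymmetric part of P contributes nothing to the quadratic form,
# so x^T P x and x^T Q x agree.
qf_P = x @ P @ x
qf_Q = x @ Q @ x
```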
20 Given the quadratic function f(x) = (1/2) x^T Q x − b^T x + c: if Q is positive definite, then f is a parabolic “bowl.”
21 Two other shapes can result from the quadratic form. –If Q is negative definite, then f is a parabolic “bowl” upside down. –If Q is indefinite, then f describes a saddle.
22 Quadratics are useful in the study of optimization. –Often, objective functions are “close to” quadratic near the solution. –It is easier to analyze the behavior of algorithms when applied to quadratics. –Analysis of algorithms for quadratics gives insight into their behavior in general.
23 The derivative of f: R → R is a function f′: R → R given by f′(x) = lim_{h→0} [f(x + h) − f(x)] / h, if the limit exists.
24 Along the axes: the partial derivative of f with respect to x_i is ∂f/∂x_i (x) = lim_{h→0} [f(x + h e_i) − f(x)] / h, where e_i is the i-th coordinate vector.
25 In a general direction: the directional derivative of f at x along v is ∂f/∂v (x) = lim_{h→0} [f(x + h v) − f(x)] / h.
27 Definition: a real-valued function f: R^n → R is said to be continuously differentiable if the partial derivatives ∂f/∂x_1, …, ∂f/∂x_n exist for each x in R^n and are continuous functions of x. In this case, we say f ∈ C¹ (f is a smooth, C¹, function).
28 Definition: the gradient of f: R² → R is a function ∇f: R² → R² given, in the plane, by ∇f(x, y) = ( ∂f/∂x, ∂f/∂y )^T.
29 Definition: the gradient of f: R^n → R is a function ∇f: R^n → R^n given by ∇f(x) = ( ∂f/∂x_1, …, ∂f/∂x_n )^T.
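The definition suggests a simple numerical approximation of the gradient via central differences; a sketch (NumPy assumed; the test function is illustrative):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f: R^n -> R by central differences."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

# Example: f(x) = x1^2 + 3*x1*x2, so grad f = (2*x1 + 3*x2, 3*x1).
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
g = numerical_gradient(f, [1.0, 2.0])   # exact gradient at (1, 2) is (8, 3)
```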
30 The gradient defines a (hyper)plane approximating the function infinitesimally: near p, f(x) ≈ f(p) + ∇f(p)^T (x − p).
31 By the chain rule, the derivative of f along the line t ↦ p + t v is d/dt f(p + t v) |_{t=0} = ∇f(p) · v = ∂f/∂v (p).
32 Proposition 1: ∂f/∂v (p) is maximal choosing v = ∇f(p) / ‖∇f(p)‖. Intuitive: the gradient points in the direction of greatest change. Prove it!
33 Proof: –Assign v = ∇f(p) / ‖∇f(p)‖. –By the chain rule: ∂f/∂v (p) = ∇f(p) · v = ∇f(p) · ∇f(p) / ‖∇f(p)‖ = ‖∇f(p)‖.
34 Proof (continued): –On the other hand, for a general unit vector v: ∂f/∂v (p) = ∇f(p) · v ≤ ‖∇f(p)‖ ‖v‖ = ‖∇f(p)‖, by the Cauchy–Schwarz inequality. So no direction does better than the normalized gradient.
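Proposition 1 can be sanity-checked numerically: no unit direction gives a larger directional derivative than the normalized gradient. A sketch (NumPy assumed; f is an illustrative function, not from the slides):

```python
import numpy as np

f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]

def directional_derivative(f, p, v, h=1e-6):
    """Central-difference estimate of the derivative of f at p along unit vector v."""
    p, v = np.asarray(p, float), np.asarray(v, float)
    return (f(p + h * v) - f(p - h * v)) / (2 * h)

p = np.array([1.0, 2.0])
grad = np.array([2 * p[0] + 3 * p[1], 3 * p[0]])   # analytic gradient (8, 3)
v_star = grad / np.linalg.norm(grad)               # steepest-ascent direction

# Compare against many random unit directions: none beats v_star.
rng = np.random.default_rng(1)
best_random = max(
    directional_derivative(f, p, v / np.linalg.norm(v))
    for v in rng.standard_normal((200, 2))
)
along_gradient = directional_derivative(f, p, v_star)  # equals ||grad|| = sqrt(73)
```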
35 Proposition 2: let f: R^n → R be a smooth (C¹) function around p. If f has a local minimum (maximum) at p, then ∇f(p) = 0. Intuitive: this is a necessary condition for a local min (max).
36 Proof: intuitive. If ∇f(p) ≠ 0, then a small step from p in the direction −∇f(p) decreases f, and a step in the direction +∇f(p) increases it, so p can be neither a local minimum nor a local maximum.
37 We found the best INFINITESIMAL DIRECTION at each point. Looking for the minimum is then a “blind man” procedure: how can we derive the way to the minimum using this local knowledge?
38 The derivative of f: R^n → R^m is a function Df: R^n → R^{m×n} given by the matrix of partial derivatives [Df(x)]_{ij} = ∂f_i/∂x_j, called the Jacobian. Note that for f: R^n → R, we have ∇f(x) = Df(x)^T.
39 If the derivative of ∇f exists, we say that f is twice differentiable. –Write the second derivative as D²f (or F), and call it the Hessian of f: F(x) = [∂²f/∂x_i ∂x_j].
40 The level set of a function f: R n → R at level c is the set of points S = {x: f(x) = c}.
41 Fact: ∇f(x_0) is orthogonal to the level set at x_0.
42 Proof of fact: –Imagine a particle traveling along the level set. –Let g(t) be the position of the particle at time t, with g(0) = x_0. –Note that f(g(t)) = constant for all t. –The velocity vector g′(t) is tangent to the level set. –Consider F(t) = f(g(t)). We have F′(0) = 0, and by the chain rule, F′(0) = ∇f(x_0)^T g′(0). –Hence, ∇f(x_0) and g′(0) are orthogonal.
43 Suppose f: R → R is in C¹. Then f(x_0 + h) = f(x_0) + f′(x_0) h + o(h), where –o(h) is a term such that o(h)/h → 0 as h → 0. –At x_0, f can be approximated by a linear function, and the approximation gets better the closer we are to x_0.
44 Suppose f: R → R is in C². Then f(x_0 + h) = f(x_0) + f′(x_0) h + (1/2) f″(x_0) h² + o(h²). –At x_0, f can be approximated by a quadratic function.
45 Suppose f: R^n → R. –If f is in C¹, then f(x) = f(x_0) + ∇f(x_0)^T (x − x_0) + o(‖x − x_0‖). –If f is in C², then f(x) = f(x_0) + ∇f(x_0)^T (x − x_0) + (1/2) (x − x_0)^T F(x_0) (x − x_0) + o(‖x − x_0‖²), where F is the Hessian.
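The order of the remainder terms can be observed numerically: halving h should cut the first-order remainder by about 4 (it is O(h²)) and the second-order remainder by about 8 (it is O(h³)). A sketch using math.sin as an illustrative f:

```python
import math

f, x0 = math.sin, 0.5
fp, fpp = math.cos(x0), -math.sin(x0)   # f'(x0) and f''(x0)

def remainder1(h):
    """Error of the first-order (linear) Taylor approximation."""
    return f(x0 + h) - (f(x0) + fp * h)

def remainder2(h):
    """Error of the second-order (quadratic) Taylor approximation."""
    return f(x0 + h) - (f(x0) + fp * h + 0.5 * fpp * h ** 2)

# Ratios when h is halved: ~4 for the O(h^2) term, ~8 for the O(h^3) term.
r1_ratio = remainder1(1e-3) / remainder1(5e-4)
r2_ratio = remainder2(1e-3) / remainder2(5e-4)
```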
46 We already know that ∇f(x_0) is orthogonal to the level set at x_0. –Suppose ∇f(x_0) ≠ 0. Fact: ∇f points in the direction of increasing f.
47 Consider x_α = x_0 + α ∇f(x_0), α > 0. –By Taylor's formula, f(x_α) = f(x_0) + α ‖∇f(x_0)‖² + o(α). Therefore, for sufficiently small α > 0, f(x_α) > f(x_0).
48 DESCENT METHODS
49 The following theorem is the link from the previous gradient properties to a constructive algorithm. The problem: minimize f(x) over x ∈ R^n.
50 We introduce a model for the algorithm: Data: x_0 ∈ R^n. Step 0: set i = 0. Step 1: if ∇f(x_i) = 0, stop; else, compute a search direction h_i. Step 2: compute the step size λ_i. Step 3: set x_{i+1} = x_i + λ_i h_i, i = i + 1; go to Step 1.
51 The Theorem: –Suppose f: R^n → R is C¹ smooth, and there exists a continuous function k: R^n → [0, 1] with k(x) > 0 whenever ∇f(x) ≠ 0. –Suppose the search vectors constructed by the model algorithm satisfy ∇f(x_i)^T h_i ≤ −k(x_i) ‖∇f(x_i)‖ ‖h_i‖ (each h_i is a descent direction bounded away from orthogonality to the gradient),
52 –and the step sizes achieve at least the decrease of an exact line search: f(x_{i+1}) ≤ min_{λ≥0} f(x_i + λ h_i). Then –if {x_i} is the sequence constructed by the algorithm model, –any accumulation point y of this sequence satisfies ∇f(y) = 0.
53 The theorem has a very intuitive interpretation: always go in a descent direction. The principal differences between various descent algorithms lie in the procedure for determining the successive directions.
54 STEEPEST DESCENT
55 We now use what we have learned to implement the most basic minimization technique. First we introduce the algorithm, which is a version of the model algorithm. The problem: minimize f(x) over x ∈ R^n.
56 Steepest descent algorithm: Data: x_0 ∈ R^n. Step 0: set i = 0. Step 1: if ∇f(x_i) = 0, stop; else, set the search direction h_i = −∇f(x_i). Step 2: compute the step size λ_i = argmin_{λ≥0} f(x_i + λ h_i). Step 3: set x_{i+1} = x_i + λ_i h_i, i = i + 1; go to Step 1.
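For a quadratic f(x) = (1/2) x^T A x − b^T x with A symmetric positive definite, the exact line-search step has the closed form λ = (r^T r)/(r^T A r) with r = −∇f(x) = b − A x, so the algorithm can be sketched directly (NumPy assumed; A and b are an illustrative example, not from the slides):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    """Minimize f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite)
    by steepest descent with the exact line-search step for quadratics."""
    x = np.asarray(x0, dtype=float)
    for i in range(max_iter):
        r = b - A @ x                   # r = -grad f(x): steepest-descent direction
        if np.linalg.norm(r) < tol:     # Step 1: stop when the gradient vanishes
            break
        lam = (r @ r) / (r @ (A @ r))   # Step 2: exact minimizer along the line
        x = x + lam * r                 # Step 3: take the step
    return x, i

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x_min, iters = steepest_descent(A, b, np.zeros(2))
# The minimizer solves A x = b; here x = (2, -2).
```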
57 Theorem: –If {x_i} is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies ∇f(y) = 0. –Proof: from the Wolfe theorem. Remark: the Wolfe theorem gives us numerical stability even if the derivatives aren't given analytically (they are calculated numerically).
58 How long a step to take? Note the search direction is h_i = −∇f(x_i), –so we are limited to a line search: choose λ to minimize f(x_i + λ h_i). At the minimizer, the directional derivative along h_i is equal to zero.
59 How long a step to take? –From the chain rule: 0 = d/dλ f(x_i + λ h_i) = ∇f(x_{i+1})^T h_i. Therefore the method of steepest descent moves along −∇f(x_i) until the new gradient ∇f(x_{i+1}) is orthogonal to it: successive search directions are orthogonal!
61 Given: Find the minimum when x 1 is allowed to vary from 0.5 to 1.5 and x 2 is allowed to vary from 0 to 2. λ arbitrary
62 Given: Find the minimum when x 1 is allowed to vary from 0.5 to 1.5 and x 2 is allowed to vary from 0 to 2.
63 CONJUGATE GRADIENT
64 From now on we assume we want to minimize the quadratic function f(x) = (1/2) x^T A x − b^T x + c. If A is symmetric positive definite, this is equivalent to solving the linear problem A x = b (set ∇f(x) = A x − b = 0).
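The equivalence can be checked directly: the stationary point of f solves A x = b, and perturbing it only increases f. A sketch (NumPy assumed; A and b are illustrative):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # symmetric positive definite
b = np.array([2.0, -8.0])
f = lambda x: 0.5 * x @ A @ x - b @ x

x_star = np.linalg.solve(A, b)           # stationary point: grad f(x) = A x - b = 0

# Since A is positive definite, f(x_star + d) - f(x_star) = 0.5 d^T A d >= 0,
# so x_star is the global minimizer.
rng = np.random.default_rng(2)
no_improvement = all(f(x_star + d) >= f(x_star) for d in rng.standard_normal((100, 2)))
```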
65 The solution is the intersection of the lines: in 2D, each equation of A x = b defines a line.
66 –Each ellipsoid is a level set on which f(x) is constant. In general, the solution x lies at the intersection point of n hyperplanes, each having dimension n − 1.
67 What is the problem with steepest descent? –We can repeat the same directions over and over… Wouldn’t it be better if, every time we took a step, we got it right the first time?
68 What is the problem with steepest descent? –We can repeat the same directions over and over… Conjugate gradient requires n gradient evaluations and n line searches.
69 First, let's define the error as e_i = x_i − x*, where x* is the solution. e_i is a vector that indicates how far we are from the solution. [Figure: start point x_0 and solution x*]
70 Let's pick a set of orthogonal search directions d_0, d_1, …, d_{n−1} (they should span R^n). –In each search direction we'll take exactly one step, and that step will be just the right length to line up evenly with the solution.
71 –Unfortunately, this method only works if you already know the answer. Using the coordinate axes as search directions…
72 We have x_{i+1} = x_i + α_i d_i, and therefore e_{i+1} = e_i + α_i d_i.
73 Given d_i, how do we calculate α_i? e_{i+1} should be orthogonal to d_i: d_i^T e_{i+1} = 0.
74 Given d_i, how do we calculate α_i? –That is, d_i^T (e_i + α_i d_i) = 0, so α_i = −(d_i^T e_i) / (d_i^T d_i).
75 How do we find the components of the error? –Since the search vectors form a basis, we can expand e_0 = Σ_{j=0}^{n−1} δ_j d_j. On the other hand, after i steps, e_i = e_0 + Σ_{j=0}^{i−1} α_j d_j.
76 We want the error after n steps to be 0: e_n = e_0 + Σ_{j=0}^{n−1} α_j d_j = 0. –Here is an idea: if α_j = −δ_j for every j, then e_n = 0. So if we can choose each step length as α_j = −δ_j, the error is eliminated one component at a time.
77 So we look for α_i such that it can be computed without knowing e_i. –A simple calculation shows that if we take the directions to be A-orthogonal (conjugate), d_i^T A d_j = 0 for i ≠ j, the correct choice is α_i = −(d_i^T A e_i) / (d_i^T A d_i) = (d_i^T r_i) / (d_i^T A d_i), where the residual r_i = b − A x_i = −A e_i is computable.
78 Conjugate gradient algorithm for minimizing f: Step 0 (Data): A, b, x_0; set r_0 = b − A x_0, d_0 = r_0, i = 0. Step 1: α_i = (r_i^T r_i) / (d_i^T A d_i). Step 2: x_{i+1} = x_i + α_i d_i, r_{i+1} = r_i − α_i A d_i. Step 3: β_{i+1} = (r_{i+1}^T r_{i+1}) / (r_i^T r_i), d_{i+1} = r_{i+1} + β_{i+1} d_i. Step 4: i = i + 1 and repeat (at most n times).
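These steps translate directly into code; a minimal sketch (NumPy assumed; A and b are an illustrative symmetric positive definite system, not from the slides):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Solve A x = b for symmetric positive definite A, i.e. minimize
    f(x) = 0.5 x^T A x - b^T x, by the conjugate gradient method."""
    n = b.size
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x              # initial residual r_0 = -grad f(x_0)
    d = r.copy()               # first search direction
    rs_old = r @ r
    for _ in range(n):         # at most n steps in exact arithmetic
        Ad = A @ d
        alpha = rs_old / (d @ Ad)   # exact step length along d
        x = x + alpha * d
        r = r - alpha * Ad          # updated residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs_old      # direction-update coefficient
        d = r + beta * d            # next direction, A-conjugate to the previous
        rs_old = rs_new
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x_cg = conjugate_gradient(A, b)
```

For this 3×3 system the method terminates within n = 3 iterations up to rounding error, matching the "n gradient evaluations and n line searches" claim earlier in the deck.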
79 Sources Jyh-Shing Roger Jang, Chuen-Tsai Sun, and Eiji Mizutani, slides for Ch. 5 of “Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence”, First Edition, Prentice Hall, 1997. Djamel Bouchaffra, Soft Computing, course materials, Oakland University, Fall 2005. Lecture slides, Soft Computing, teaching materials, Dipartimento di Elettronica e Informazione, Politecnico di Milano, 2004. Jeen-Shing Wang, Introduction to Neural Networks, lecture notes, Department of Electrical Engineering, National Cheng Kung University, Fall 2005.
80 Sources Carlo Tomasi, Mathematical Methods for Robotics and Vision, Stanford University, Fall 2000. Petros Ioannou and Jing Sun, Robust Adaptive Control, Prentice-Hall, Upper Saddle River, NJ, 1996. Jonathan Richard Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, Edition 1¼, School of Computer Science, Carnegie Mellon University, Pittsburgh, August 4, 1994. Gordon C. Everstine, Selected Topics in Linear Algebra, The George Washington University, 8 June 2004.