Chapter 2-OPTIMIZATION G.Anuradha

Contents
Derivative-based Optimization
– Descent Methods
– The Method of Steepest Descent
– Classical Newton's Method
– Step Size Determination
Derivative-free Optimization
– Genetic Algorithms
– Simulated Annealing
– Random Search
– Downhill Simplex Search

What is Optimization?
– Choosing the best element from a set of available alternatives
– Solving problems in which one seeks to minimize or maximize a real-valued function

Notation of Optimization
Optimize y = f(x_1, x_2, ..., x_n)                                   (Eqn. 1)
subject to g_j(x_1, x_2, ..., x_n) ≤ / ≥ / = b_j,  j = 1, 2, ..., m   (Eqn. 2)
Eqn. 1 is the objective function; Eqn. 2 is the set of constraints imposed on the solution.
x_1, x_2, ..., x_n are the decision variables.
Note: the problem is either to maximize or to minimize the value of the objective function.
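As a concrete illustration of this notation (the functions and bounds below are made up for the example), a small constrained problem might read:

```latex
\begin{aligned}
\text{minimize}\quad   & f(x_1, x_2) = (x_1 - 1)^2 + (x_2 - 2)^2 \\
\text{subject to}\quad & g_1(x_1, x_2) = x_1 + x_2 \le 3, \\
                       & g_2(x_1, x_2) = x_1 \ge 0 .
\end{aligned}
```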

Complicating factors in optimization
1. Existence of multiple decision variables
2. Complex nature of the relationships between the decision variables and the associated income
3. Existence of one or more complex constraints on the decision variables

Types of optimization
– Constrained: the objective function is maximized or minimized subject to constraints on the decision variables
– Unconstrained: no constraints are imposed on the decision variables, and differential calculus can be used to analyze them
Examples

Least Square Methods for System Identification
System identification: determining a mathematical model for an unknown system by observing its input-output data pairs.
System identification is required
– To predict a system's behavior
– To explain the interactions and relationships between inputs and outputs
– To design a controller
System identification consists of
– Structure identification
– Parameter identification

Structure identification
Apply a priori knowledge about the target system to determine a class of models within which the search for the most suitable model is conducted:
y = f(u; θ)
where y is the model's output, u is the input vector, and θ is the parameter vector.

Parameter identification
The structure of the model is known, and optimization techniques are applied to determine the parameter vector θ.

Block diagram of parameter identification

Parameter identification (contd.)
– An input u_i is applied to both the system and the model.
– The difference between the target system's output y_i and the model's output ŷ_i is used to update the parameter vector θ so as to minimize the difference.
– System identification is not a one-pass process; structure identification and parameter identification need to be carried out repeatedly.

Classification of Optimization Algorithms
– Derivative-based algorithms
– Derivative-free algorithms

Characteristics of derivative-free algorithms
1. Derivative freeness: rely on repeated evaluation of the objective function rather than on its derivatives
2. Intuitive guidelines: concepts are based on nature's wisdom, such as evolution and thermodynamics
3. Slower convergence than derivative-based methods
4. Flexibility
5. Randomness: they act as global optimizers
6. Analytic opacity: knowledge about them is based on empirical studies
7. Iterative nature

Characteristics of derivative-free algorithms (contd.)
Stopping condition of iteration: let k denote the iteration count and f_k denote the best objective function value obtained at count k. The stopping condition may depend on
– Computation time
– Optimization goal
– Minimal improvement
– Minimal relative improvement
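A minimal sketch of how these stopping tests could be combined in code; the function name, thresholds, and defaults below are illustrative, not part of the original slides:

```python
import time

def should_stop(f_best, f_prev, f_goal, start_time,
                max_seconds=60.0, abs_tol=1e-6, rel_tol=1e-4):
    """Return True if any of the four stopping criteria above is met."""
    if time.time() - start_time > max_seconds:   # computation time exceeded
        return True
    if f_best <= f_goal:                         # optimization goal reached
        return True
    improvement = f_prev - f_best
    if improvement < abs_tol:                    # minimal improvement
        return True
    if abs(f_prev) > 0 and improvement / abs(f_prev) < rel_tol:
        return True                              # minimal relative improvement
    return False
```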

Basics of Matrix Manipulation and Calculus

Gradient of a Scalar Function
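For a scalar function F(x) of n variables, the gradient referred to here is the column vector of first partial derivatives (standard definition, restated for reference):

```latex
\nabla F(\mathbf{x}) =
\begin{bmatrix}
\dfrac{\partial F}{\partial x_1} \\[4pt]
\dfrac{\partial F}{\partial x_2} \\[4pt]
\vdots \\[2pt]
\dfrac{\partial F}{\partial x_n}
\end{bmatrix}
```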

Jacobian of a Vector Function
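For a vector function f(x) with m components of n variables, the Jacobian referred to here is the m × n matrix of first partial derivatives (standard definition, restated for reference):

```latex
J(\mathbf{x}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\[6pt]
\vdots & \ddots & \vdots \\[2pt]
\dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{bmatrix}
```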

Least Square Estimator
– The method of least squares is a standard approach to the approximate solution of overdetermined systems (more equations than unknowns).
– Least squares: the overall solution minimizes the sum of the squares of the errors made in solving every single equation.
– Application: data fitting.

Types of Least Squares
– Linear: the model is a linear combination of the parameters. It may represent a straight line, a parabola, or any other linear combination of basis functions.
– Non-linear: the parameters appear inside non-linear functions, such as β² or e^{βx}.
If the derivatives of the model with respect to the parameters are either constant or depend only on the values of the independent variable, the model is linear; otherwise it is non-linear.

Differences between Linear and Non-Linear Least Squares
Linear:
– Algorithms do not require initial values
– Globally concave; non-convergence is not an issue
– Normally solved using direct methods
– The solution is unique
– Yields unbiased estimates when the errors are uncorrelated with the predictor values
Non-Linear:
– Algorithms require initial values
– Non-convergence is a common issue
– Usually an iterative process
– Multiple minima in the sum of squares
– Yields biased estimates

Linear model: Regression Function

Linear model contd…
Using matrix notation, the m equations can be written compactly as Aθ = y, where A is an m×n matrix.

Due to noise, a small error term is added to each equation: Aθ + e = y.

Least Square Estimator
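Written out (assuming the noisy linear model Aθ + e = y introduced above), the least-squares estimator on this slide minimizes the squared error and is given by the normal equations:

```latex
\hat{\theta}
= \arg\min_{\theta}\, \lVert \mathbf{y} - A\theta \rVert^{2}
= (A^{T}A)^{-1} A^{T}\mathbf{y}
```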

Problem on Least Square Estimator
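A minimal numerical sketch of such a problem in NumPy; the data points below are made up for illustration, fitting a straight line y = a + b·x by least squares:

```python
import numpy as np

# Made-up data points (x_i, y_i) to be fitted by a straight line y = a + b*x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 4.1, 6.2, 7.8, 10.1])

# Design matrix A: a column of ones (intercept a) and a column of x values (slope b)
A = np.column_stack([np.ones_like(x), x])

# Least-squares estimate: theta_hat = (A^T A)^{-1} A^T y
theta_hat = np.linalg.solve(A.T @ A, A.T @ y)
a, b = theta_hat
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")

# Residual sum of squares, the quantity minimized by the estimator
rss = np.sum((y - A @ theta_hat) ** 2)
print(f"residual sum of squares = {rss:.4f}")
```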

Derivative-Based Optimization
– Deals with gradient-based optimization techniques, capable of determining search directions according to an objective function's derivative information
– Used in optimizing non-linear neuro-fuzzy models, e.g.
  – Steepest descent
  – Conjugate gradient

First-Order Optimality Condition
Expand F(x) in a Taylor series about a candidate minimum point x*:
F(x) = F(x* + \Delta x) = F(x*) + g^T \Delta x + \tfrac{1}{2}\,\Delta x^T H\,\Delta x + \cdots
where g = \nabla F(x)|_{x=x*} and H = \nabla^2 F(x)|_{x=x*}.
For small \Delta x:   F(x* + \Delta x) \approx F(x*) + g^T \Delta x.
If x* is a minimum, this implies g^T \Delta x \ge 0 for every \Delta x.
If g^T \Delta x > 0 for some \Delta x, then for the opposite step F(x* - \Delta x) \approx F(x*) - g^T \Delta x < F(x*). But this would imply that x* is not a minimum. Therefore g^T \Delta x = 0.
Since this must be true for every \Delta x,   \nabla F(x)|_{x=x*} = 0.

Second-Order Condition
If the first-order condition is satisfied (zero gradient), then
F(x* + \Delta x) \approx F(x*) + \tfrac{1}{2}\,\Delta x^T H\,\Delta x,   with H = \nabla^2 F(x)|_{x=x*}.
A strong minimum will exist at x* if \Delta x^T H\,\Delta x > 0 for any \Delta x \neq 0. Therefore the Hessian matrix must be positive definite. A matrix A is positive definite if z^T A z > 0 for any z \neq 0. This is a sufficient condition for optimality.
A necessary condition is that the Hessian matrix be positive semidefinite. A matrix A is positive semidefinite if z^T A z \ge 0 for any z.

Basic Optimization Algorithm
x_{k+1} = x_k + \alpha_k p_k,   or equivalently   \Delta x_k = x_{k+1} - x_k = \alpha_k p_k
p_k – search direction
\alpha_k – learning rate (step size)

Steepest Descent
Choose the next step so that the function decreases:  F(x_{k+1}) < F(x_k).
For small changes in x we can approximate F(x):
F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k,   where g_k \equiv \nabla F(x)|_{x=x_k}.
If we want the function to decrease:  g_k^T \Delta x_k = \alpha_k g_k^T p_k < 0.
We can maximize the decrease by choosing:  p_k = -g_k,  giving  x_{k+1} = x_k - \alpha_k g_k.
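A minimal Python sketch of this update rule; the quadratic objective, starting point, and learning rate below are chosen only for illustration:

```python
import numpy as np

# Illustrative quadratic F(x) = 0.5 * x^T A x + d^T x (A and d are made up)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
d = np.array([-1.0, -1.0])

def grad_F(x):
    """Gradient of the quadratic: A x + d."""
    return A @ x + d

x = np.array([2.0, 2.0])   # starting point
alpha = 0.1                # fixed learning rate

for k in range(100):
    g = grad_F(x)
    if np.linalg.norm(g) < 1e-8:   # stop when the gradient is (nearly) zero
        break
    x = x - alpha * g              # steepest-descent step: p_k = -g_k

print("approximate minimizer:", x)
print("exact minimizer:      ", np.linalg.solve(A, -d))
```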

Example

Plot

Effect of learning rate
– As the learning rate increases, the trajectory becomes more oscillatory, which can make the algorithm unstable.
– For quadratic functions an upper limit on the learning rate can be derived.

Stable Learning Rates (Quadratic)
For a quadratic function F(x) = \tfrac{1}{2} x^T A x + d^T x + c, the steepest-descent update becomes
x_{k+1} = x_k - \alpha (A x_k + d) = [I - \alpha A] x_k - \alpha d.
Stability is determined by the eigenvalues of this matrix. The eigenvalues of [I - \alpha A] are (1 - \alpha \lambda_i), where \lambda_i is an eigenvalue of A.
Stability requirement: |1 - \alpha \lambda_i| < 1,   i.e.   \alpha < 2 / \lambda_{max}.
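A quick numerical check of this bound, reusing the illustrative matrix A from the steepest-descent sketch above:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues = np.linalg.eigvalsh(A)        # eigenvalues of the (symmetric) Hessian
alpha_max = 2.0 / eigenvalues.max()        # stability bound: alpha < 2 / lambda_max
print("eigenvalues:", eigenvalues)         # -> [1. 3.]
print("maximum stable learning rate:", alpha_max)  # -> 0.666...
```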

Example

Newton's Method
Start from the second-order Taylor approximation
F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k + \tfrac{1}{2}\,\Delta x_k^T A_k \Delta x_k,   with A_k \equiv \nabla^2 F(x)|_{x=x_k}.
Take the gradient of this second-order approximation and set it equal to zero to find the stationary point:
g_k + A_k \Delta x_k = 0   \Rightarrow   \Delta x_k = -A_k^{-1} g_k,   so   x_{k+1} = x_k - A_k^{-1} g_k.
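A minimal sketch of the Newton update in Python, applied to the same illustrative quadratic as before (for which a single Newton step reaches the minimum):

```python
import numpy as np

# Same illustrative quadratic as before: F(x) = 0.5 * x^T A x + d^T x
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
d = np.array([-1.0, -1.0])

def grad_F(x):
    return A @ x + d        # gradient g_k

def hess_F(x):
    return A                # Hessian A_k (constant for a quadratic)

x = np.array([2.0, 2.0])
for k in range(10):
    g = grad_F(x)
    if np.linalg.norm(g) < 1e-10:
        break
    # Newton step: x_{k+1} = x_k - A_k^{-1} g_k (solve a linear system instead of inverting)
    x = x - np.linalg.solve(hess_F(x), g)

print("minimizer found after", k, "step(s):", x)
```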

Example

Plot

Conjugate Vectors
A set of vectors {p_k} is mutually conjugate with respect to a positive definite Hessian matrix A if
p_j^T A p_k = 0,   for j \neq k.
One set of conjugate vectors consists of the eigenvectors of A. (The eigenvectors of symmetric matrices are orthogonal.)

For Quadratic Functions
The gradient is \nabla F(x) = A x + d. The change in the gradient at iteration k is
\Delta g_k = g_{k+1} - g_k = A \Delta x_k,   where   \Delta x_k = x_{k+1} - x_k = \alpha_k p_k.
The conjugacy conditions can be rewritten
\alpha_k p_k^T A p_j = \Delta x_k^T A p_j = \Delta g_k^T p_j = 0,   for k \neq j.
This does not require knowledge of the Hessian matrix.

Forming Conjugate Directions
Choose the initial search direction as the negative of the gradient:  p_0 = -g_0.
Choose subsequent search directions to be conjugate:
p_k = -g_k + \beta_k p_{k-1},
where
\beta_k = \dfrac{\Delta g_{k-1}^T g_k}{\Delta g_{k-1}^T p_{k-1}}
or   \beta_k = \dfrac{g_k^T g_k}{g_{k-1}^T g_{k-1}}   (Fletcher–Reeves)
or   \beta_k = \dfrac{\Delta g_{k-1}^T g_k}{g_{k-1}^T g_{k-1}}   (Polak–Ribière).

Conjugate Gradient Algorithm
1. The first search direction is the negative of the gradient:  p_0 = -g_0.
2. Select the learning rate \alpha_k to minimize the function along the line x_k + \alpha p_k (for quadratic functions, \alpha_k = -\dfrac{g_k^T p_k}{p_k^T A p_k}).
3. Select the next search direction as  p_k = -g_k + \beta_k p_{k-1}, using one of the formulas for \beta_k above.
4. If the algorithm has not converged, return to step 2.
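A compact Python sketch of the algorithm for a quadratic objective, reusing the illustrative A and d from earlier and the Fletcher–Reeves choice of β_k:

```python
import numpy as np

# Illustrative quadratic F(x) = 0.5 * x^T A x + d^T x
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
d = np.array([-1.0, -1.0])

x = np.array([2.0, 2.0])
g = A @ x + d          # gradient g_0
p = -g                 # first search direction: negative gradient

for k in range(len(d)):                 # converges in at most n steps for a quadratic
    alpha = -(g @ p) / (p @ (A @ p))    # exact line minimization for a quadratic
    x = x + alpha * p
    g_new = A @ x + d
    if np.linalg.norm(g_new) < 1e-10:
        break
    beta = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves formula for beta_k
    p = -g_new + beta * p               # next conjugate direction
    g = g_new

print("minimizer:", x)                  # should match np.linalg.solve(A, -d)
```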

Example

Plots: Conjugate Gradient vs. Steepest Descent

Step Size Determination
Line minimization methods and their stopping criteria are used to determine the step size along a search direction:
– Initial bracketing
– Line searches
  – Newton's method
  – Secant method
  – Sectioning method