Gradient Methods Yaron Lipman May 2003

Preview: Background, Steepest Descent, Conjugate Gradient

Background: Motivation, The gradient notion, The Wolfe theorems

Motivation. The min(max) problem: find min f(x) over x in R^n (or the corresponding max). But we learned in calculus how to solve that kind of question!

Motivation. Not exactly. For general functions, e.g. high-order polynomials, setting the derivatives to zero gives equations we cannot solve in closed form. And what about functions that don't have an analytic representation at all, a "black box"?

Motivation. A "real world" problem: finding a harmonic mapping. The general problem: find a global min (max). This lecture will concentrate on finding a local minimum.

Background: Motivation, The gradient notion, The Wolfe theorems

Directional Derivatives: first, the one-dimensional derivative: f'(x) = lim_{h→0} (f(x+h) − f(x)) / h.

Directional Derivatives: along the axes, these are the partial derivatives ∂f/∂x_i (x) = lim_{h→0} (f(x + h·e_i) − f(x)) / h, where e_i is the i-th standard basis vector.

Directional Derivatives: in a general direction, for a unit vector v, D_v f(x) = lim_{h→0} (f(x + h·v) − f(x)) / h.
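A quick worked example (not from the original slides): take f(x,y) = x^2 + y, the point p = (1,2) and the unit direction v = (3/5, 4/5). Then f(p + h·v) = (1 + 3h/5)^2 + 2 + 4h/5 = 3 + 2h + 9h^2/25, so (f(p + h·v) − f(p)) / h = 2 + 9h/25 → 2 as h → 0, i.e. D_v f(p) = 2.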

Directional Derivatives

The Gradient: definition in the plane: ∇f(x,y) = (∂f/∂x, ∂f/∂y).

The Gradient: definition in R^n: ∇f(x) = (∂f/∂x_1, …, ∂f/∂x_n).

The Gradient: properties. The gradient defines the (hyper)plane approximating the function infinitesimally: f(x + δ) ≈ f(x) + ∇f(x)·δ.
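For instance (an example added here, not from the slides): for f(x,y) = x^2 + y^2 near the point (1,1) we have ∇f(1,1) = (2,2), so the approximating plane is f(1+δ_1, 1+δ_2) ≈ 2 + 2·δ_1 + 2·δ_2.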

The Gradient: properties. By the chain rule (important for later use): for a differentiable curve x(t), d/dt f(x(t)) = ∇f(x(t)) · x'(t).
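A quick check of the chain rule (example not from the slides): take f(x,y) = x^2 + y^2 and the curve x(t) = (cos t, sin t). Then f(x(t)) = 1 is constant, and indeed ∇f(x(t)) · x'(t) = (2 cos t, 2 sin t) · (−sin t, cos t) = 0.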

The Gradient: properties. Proposition 1: the directional derivative D_v f(x) is maximal when choosing v = ∇f(x)/||∇f(x)|| and minimal when choosing v = −∇f(x)/||∇f(x)|| (intuitively: the gradient points in the direction of greatest change).

The Gradient: properties. Proof (only for the minimum case): assign v = −∇f(x)/||∇f(x)||; by the chain rule, D_v f(x) = ∇f(x) · v = −||∇f(x)||.

The Gradient: properties. On the other hand, for a general unit vector v, the Cauchy–Schwarz inequality gives D_v f(x) = ∇f(x) · v ≥ −||∇f(x)|| ||v|| = −||∇f(x)||, so the choice above is indeed minimal.

The Gradient: properties. Proposition 2: let f be a smooth function around p; if f has a local minimum (maximum) at p, then ∇f(p) = 0 (intuitively: a necessary condition for a local min (max)).
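For example (not from the slides): f(x,y) = x^2 + y^2 has its minimum at the origin, where ∇f = (2x, 2y) = (0,0). The converse does not hold: f(x,y) = x^2 − y^2 also has ∇f = (0,0) at the origin, yet the origin is a saddle point, not a minimum.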

The Gradient: properties. Proof, intuitively: at a local minimum no direction can lead downhill, so no directional derivative at p can be negative; but the smallest one equals −||∇f(p)|| (Proposition 1), hence ∇f(p) = 0.

The Gradient: properties. Formally: for any direction v, the one-dimensional function g(t) = f(p + t·v) has a local minimum at t = 0, so g'(0) = ∇f(p) · v = 0. Since this holds for every v, we get ∇f(p) = 0.

The Gradient: properties. We found the best infinitesimal direction at each point. Looking for a minimum is then a "blind man" procedure. How can we derive the way to the minimum using this knowledge?

Background: Motivation, The gradient notion, The Wolfe theorems

The Wolfe Theorem. This is the link from the previous gradient properties to a constructive algorithm. The problem: min f(x) over x in R^n.

The Wolfe Theorem. We introduce a model algorithm:
Data: a starting point x_0 in R^n.
Step 0: set i = 0.
Step 1: if ∇f(x_i) = 0, stop; else, compute a search direction h_i in R^n.
Step 2: compute the step size λ_i, e.g. by minimizing f(x_i + λ·h_i) over λ ≥ 0.
Step 3: set x_{i+1} = x_i + λ_i·h_i and go to Step 1.
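A minimal Python sketch of this model algorithm (added for illustration; function and variable names are not from the slides, and the exact line search via scipy stands in for whatever step-size rule is used):

import numpy as np
from scipy.optimize import minimize_scalar

def model_algorithm(f, grad, direction, x0, tol=1e-8, max_iter=1000):
    # direction(x) returns the search direction h_i at the current iterate
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # Step 1: stop near a critical point
            break
        h = direction(x)                 # Step 1: search direction
        lam = minimize_scalar(lambda t: f(x + t * h)).x   # Step 2: 1-D minimization
        x = x + lam * h                  # Step 3: update and repeat
    return x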

The Wolfe Theorem. The theorem: suppose f is C^1 smooth, and there exists a continuous function k: R^n → [0,1] with k(x) > 0 whenever ∇f(x) ≠ 0, and the search vectors constructed by the model algorithm satisfy the angle condition ∇f(x_i) · h_i ≤ −k(x_i) ||∇f(x_i)|| ||h_i||.

The Wolfe Theorem. And h_i ≠ 0 whenever ∇f(x_i) ≠ 0. Then if {x_i} is the sequence constructed by the model algorithm, any accumulation point y of this sequence satisfies ∇f(y) = 0.

The Wolfe Theorem. The theorem has a very intuitive interpretation: always go in a descent direction.

Preview: Background, Steepest Descent, Conjugate Gradient

Steepest Descent. What does it mean? We now use what we have learned to implement the most basic minimization technique. First we introduce the algorithm, which is a version of the model algorithm. The problem: min f(x) over x in R^n.

Steepest Descent. The steepest descent algorithm:
Data: a starting point x_0 in R^n.
Step 0: set i = 0.
Step 1: if ∇f(x_i) = 0, stop; else, set the search direction h_i = −∇f(x_i).
Step 2: compute the step size λ_i by minimizing f(x_i + λ·h_i) over λ ≥ 0.
Step 3: set x_{i+1} = x_i + λ_i·h_i and go to Step 1.
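Continuing the illustrative sketch above (again, not from the slides), steepest descent is just the model algorithm with h_i = −∇f(x_i); the quadratic below is a made-up test problem:

# Steepest descent = model algorithm with the negative gradient as search direction
H = np.array([[3.0, 1.0], [1.0, 2.0]])       # symmetric positive definite
b = np.array([1.0, 1.0])
f    = lambda x: 0.5 * x @ H @ x - b @ x
grad = lambda x: H @ x - b
x_min = model_algorithm(f, grad, direction=lambda x: -grad(x), x0=np.zeros(2))
print(x_min, np.linalg.solve(H, b))          # the two should (approximately) agree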

Steepest Descent. Theorem: if {x_i} is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies ∇f(y) = 0. Proof: from the Wolfe theorem, since with h_i = −∇f(x_i) the angle condition holds with k(x) ≡ 1.

Steepest Descent. From the chain rule, at the optimal step size: 0 = d/dλ f(x_i + λ·h_i) at λ = λ_i, i.e. ∇f(x_{i+1}) · h_i = 0, so consecutive search directions are orthogonal: h_{i+1} ⊥ h_i. Therefore the method of steepest descent looks like this:

Steepest Descent

The steepest descent method finds a critical point, in practice a local minimum (the function value decreases at every step). The step-size rule is implicit: we actually reduced the problem to a one-dimensional minimization, min over λ ≥ 0 of f(x_i + λ·h_i). There are extensions that give the step-size rule in a discrete sense (Armijo).
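A minimal sketch of such a discrete (Armijo backtracking) step-size rule, added for illustration; the parameter values are common defaults, not taken from the slides:

def armijo_step(f, grad, x, h, sigma=1e-4, beta=0.5, lam0=1.0, max_halvings=50):
    # Backtracking: shrink lam until the sufficient-decrease condition
    #   f(x + lam*h) <= f(x) + sigma * lam * <grad f(x), h>
    # holds; h is assumed to be a descent direction (grad(x) @ h < 0).
    fx = f(x)
    g_dot_h = grad(x) @ h
    lam = lam0
    for _ in range(max_halvings):
        if f(x + lam * h) <= fx + sigma * lam * g_dot_h:
            break
        lam *= beta
    return lam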

Preview: Background, Steepest Descent, Conjugate Gradient

Conjugate Gradient. Modern optimization methods: "conjugate direction" methods. A method to solve quadratic function minimization: min f(x) = ½ x^T H x − b^T x (H is symmetric and positive definite).

Conjugate Gradient. Originally aimed at solving linear problems: Hx = b (equivalent to minimizing the quadratic above). Later extended to general functions, under the rationale that a quadratic approximation to a function near a minimum is quite accurate.

Conjugate Gradient. The basic idea: decompose the n-dimensional quadratic problem into n problems of 1 dimension. This is done by exploring the function in "conjugate directions". Definition: nonzero vectors d_0, …, d_{n−1} are H-conjugate if d_i^T H d_j = 0 for every i ≠ j.
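A small example (not from the slides): for H = [[3, 1], [1, 2]], the vectors d_0 = (1, 0) and d_1 = (1, −3) are H-conjugate, since d_0^T H d_1 = (3, 1) · (1, −3) = 0.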

Conjugate Gradient. If there is an H-conjugate basis d_0, …, d_{n−1}, then the problem splits into n problems of 1 dimension (each a simple upward-opening parabola). The global minimizer is calculated sequentially starting from x_0: x_{i+1} = x_i + λ_i·d_i, where λ_i minimizes f along d_i.
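A minimal sketch of the resulting (linear) conjugate gradient iteration, added for illustration under the assumption that H is symmetric positive definite; variable names are illustrative, not from the slides:

import numpy as np

def conjugate_gradient(H, b, x0, tol=1e-10):
    # Minimize f(x) = 0.5 x^T H x - b^T x, i.e. solve H x = b.
    x = np.asarray(x0, dtype=float)
    r = b - H @ x                      # residual = -grad f(x)
    d = r.copy()                       # first direction: steepest descent
    for _ in range(len(b)):            # at most n steps in exact arithmetic
        if np.linalg.norm(r) < tol:
            break
        Hd = H @ d
        lam = (r @ r) / (d @ Hd)       # exact 1-D minimizer along d
        x = x + lam * d
        r_new = r - lam * Hd
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d           # next H-conjugate direction
        r = r_new
    return x

H = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(conjugate_gradient(H, b, np.zeros(2)))   # converges in at most n = 2 steps
print(np.linalg.solve(H, b))                   # reference solution H^{-1} b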