Gradient Descent (梯度下降法)
J.-S. Roger Jang (張智星)
jang@mirlab.org
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University
Introduction to Gradient Descent (GD)
Goal: minimize a function based on its gradient.
Concept:
Gradient of a multivariate function f(x), x = (x_1, ..., x_n): ∇f(x) = (∂f/∂x_1, ..., ∂f/∂x_n)
Gradient descent: an iterative method for finding a local minimum of the function,
x_next = x_now - η ∇f(x_now)
where η is the step size or learning rate.
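To make the update rule concrete, here is a minimal MATLAB sketch of gradient descent on a simple quadratic bowl; the objective, step size, start point, and stopping test are illustrative choices, not part of the original slides.

% Gradient descent on f(x) = x1^2 + 10*x2^2 (a simple quadratic bowl)
f     = @(x) x(1)^2 + 10*x(2)^2;     % objective function
gradf = @(x) [2*x(1); 20*x(2)];      % its gradient
eta   = 0.05;                        % step size (learning rate)
x     = [4; -3];                     % start point
for k = 1:200
    g = gradf(x);
    if norm(g) < 1e-6, break; end    % stop once the gradient (nearly) vanishes
    x = x - eta*g;                   % x_next = x_now - eta*grad f(x_now)
end
fprintf('Reached (%g, %g) with f = %g\n', x(1), x(2), f(x));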
Single-Input Functions
If n = 1, GD reduces to the problem of going left or right.
Example animation: http://www.onmyphd.com/?p=gradient.descent
Basin of Attraction in 1D
Each point/region with zero gradient has a basin of attraction.
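As a sketch of the idea (the double-well function and the start points below are illustrative assumptions, not from the slides), running GD from different start points shows which basin of attraction each point belongs to:

% f(x) = x^4 - 3*x^2 + x has two local minima, each with its own basin of attraction
f     = @(x) x.^4 - 3*x.^2 + x;
gradf = @(x) 4*x.^3 - 6*x + 1;
eta   = 0.01;                                 % step size
for x0 = [-2, -0.5, 0.5, 2]                   % start points in different basins
    x = x0;
    for k = 1:1000
        x = x - eta*gradf(x);                 % GD update
    end
    fprintf('start %5.1f -> x = %7.4f, f = %7.4f\n', x0, x, f(x));
end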
“Peaks” Function (1/2)
If n = 2, GD needs to find a descent direction in the 2D plane.
Example: the “peaks” function in MATLAB, which has 3 local maxima and 3 local minima.
Animation: gradientDescentDemo.m
The gradient is perpendicular to the contours. Why?
“Peaks” Function (2/2)
Gradient of the “peaks” function:
dz/dx = -6*(1-x)*exp(-x^2-(y+1)^2) - 6*(1-x)^2*x*exp(-x^2-(y+1)^2) - 10*(1/5-3*x^2)*exp(-x^2-y^2) + 20*(1/5*x-x^3-y^5)*x*exp(-x^2-y^2) - 1/3*(-2*x-2)*exp(-(x+1)^2-y^2)
dz/dy = 3*(1-x)^2*(-2*y-2)*exp(-x^2-(y+1)^2) + 50*y^4*exp(-x^2-y^2) + 20*(1/5*x-x^3-y^5)*y*exp(-x^2-y^2) + 2/3*y*exp(-(x+1)^2-y^2)
Second derivative with respect to x:
d(dz/dx)/dx = 36*x*exp(-x^2-(y+1)^2) - 18*x^2*exp(-x^2-(y+1)^2) - 24*x^3*exp(-x^2-(y+1)^2) + 12*x^4*exp(-x^2-(y+1)^2) + 72*x*exp(-x^2-y^2) - 148*x^3*exp(-x^2-y^2) - 20*y^5*exp(-x^2-y^2) + 40*x^5*exp(-x^2-y^2) + 40*x^2*y^5*exp(-x^2-y^2) - 2/3*exp(-(x+1)^2-y^2) - 4/3*x^2*exp(-(x+1)^2-y^2) - 8/3*x*exp(-(x+1)^2-y^2)
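Assuming the standard MATLAB definition of the “peaks” surface (the expression below is the one whose partial derivatives match the formulas above), a minimal GD run on it can be sketched as follows; the step size and start point are illustrative.

% The "peaks" surface and its gradient (consistent with the derivatives above)
z    = @(x,y) 3*(1-x).^2.*exp(-x.^2-(y+1).^2) ...
       - 10*(x/5 - x.^3 - y.^5).*exp(-x.^2-y.^2) - 1/3*exp(-(x+1).^2-y.^2);
dzdx = @(x,y) -6*(1-x).*exp(-x.^2-(y+1).^2) - 6*(1-x).^2.*x.*exp(-x.^2-(y+1).^2) ...
       - 10*(1/5-3*x.^2).*exp(-x.^2-y.^2) + 20*(x/5-x.^3-y.^5).*x.*exp(-x.^2-y.^2) ...
       - 1/3*(-2*x-2).*exp(-(x+1).^2-y.^2);
dzdy = @(x,y) 3*(1-x).^2.*(-2*y-2).*exp(-x.^2-(y+1).^2) + 50*y.^4.*exp(-x.^2-y.^2) ...
       + 20*(x/5-x.^3-y.^5).*y.*exp(-x.^2-y.^2) + 2/3*y.*exp(-(x+1).^2-y.^2);
eta = 0.02;  p = [0; -1.5];                   % illustrative step size and start point
for k = 1:500
    g = [dzdx(p(1),p(2)); dzdy(p(1),p(2))];   % gradient, perpendicular to the contours
    p = p - eta*g;                            % move downhill
end
fprintf('Ended near (%.3f, %.3f), z = %.3f\n', p(1), p(2), z(p(1),p(2)));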
Basin of Attraction in 2D
Each point/region with zero gradient has a basin of attraction.
Rosenbrock Function
The Rosenbrock function: f(x, y) = (1 - x)^2 + 100*(y - x^2)^2, whose minimum at (1, 1) sits at the bottom of a long, narrow, curved valley.
Plain GD zig-zags across this valley, which is the justification for using momentum terms (see the sketch below).
More about this function
Animation: http://www.onmyphd.com/?p=gradient.descent
Document on how to optimize this function
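A minimal MATLAB sketch of GD with a momentum term on the Rosenbrock function; the step size, momentum coefficient, start point, and iteration count are illustrative assumptions. The velocity term accumulates past gradients, which damps the zig-zag across the narrow valley.

% GD with momentum on the Rosenbrock function f(x,y) = (1-x)^2 + 100*(y-x^2)^2
gradf = @(p) [-2*(1-p(1)) - 400*(p(2)-p(1)^2)*p(1); 200*(p(2)-p(1)^2)];
eta   = 2e-4;                        % step size
alpha = 0.9;                         % momentum coefficient
p = [-1.5; 1.5];  v = [0; 0];        % start point and initial velocity
for k = 1:50000
    v = alpha*v - eta*gradf(p);      % velocity: running combination of past gradients
    p = p + v;                       % update along the smoothed direction
end
fprintf('Final point: (%.4f, %.4f)\n', p(1), p(2));   % the true minimum is at (1, 1)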
Properties of Gradient Descent
No guarantee of finding the global optimum.
Feasible only for differentiable objective functions.
Performance depends on the start point and the step size.
Variants: use a momentum term to reduce zig-zag paths, or perform a line minimization along the descent direction at each iteration (see the sketch below).
Other optimization schemes: conjugate gradient descent, the Gauss-Newton method, and the Levenberg-Marquardt method.
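As a sketch of the line-minimization variant (sometimes called steepest descent with exact line search), each iteration minimizes the objective along the current descent direction; the quadratic objective and the bracket passed to fminbnd are illustrative choices.

% Steepest descent with line minimization along the descent direction
f     = @(p) p(1)^2 + 10*p(2)^2;
gradf = @(p) [2*p(1); 20*p(2)];
p = [4; -3];
for k = 1:50
    g = gradf(p);
    if norm(g) < 1e-8, break; end
    d = -g/norm(g);                  % unit descent direction
    phi = @(t) f(p + t*d);           % objective restricted to the line
    t = fminbnd(phi, 0, 10);         % line minimization for the best step length
    p = p + t*d;
end                                  % successive directions are orthogonal: the path zig-zags
fprintf('Reached (%.6f, %.6f)\n', p(1), p(2));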
Gauss-Newton Method
Synonyms: linearization method, extended Kalman filter method.
Concept:
General nonlinear model: y = f(x, θ)
Linearization at θ = θ_now: y ≈ f(x, θ_now) + a_1 (θ_1 - θ_1,now) + a_2 (θ_2 - θ_2,now) + ...
LSE solution: θ_next = θ_now + η (A^T A)^(-1) A^T B, where A collects the coefficients a_i (the partial derivatives of f with respect to θ) over the training data and B collects the residuals y - f(x, θ_now).
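A minimal MATLAB sketch of Gauss-Newton iterations for a hypothetical nonlinear model y = θ1*exp(θ2*x); the model, the synthetic data, and η = 1 are assumptions made for illustration only. A is the Jacobian of the model output with respect to θ and B is the residual vector, as in the formula above.

% Gauss-Newton for fitting y = theta1*exp(theta2*x) to synthetic data
x = (0:0.2:2)';                           % input data
y = 2*exp(-1.5*x);                        % targets generated with theta = [2; -1.5]
theta = [1; -1];                          % initial guess theta_now
eta = 1;                                  % step size
for iter = 1:20
    yhat = theta(1)*exp(theta(2)*x);      % f(x, theta_now)
    A = [exp(theta(2)*x), theta(1)*x.*exp(theta(2)*x)];  % Jacobian of f w.r.t. theta
    B = y - yhat;                         % residuals
    theta = theta + eta*((A'*A)\(A'*B));  % theta_next = theta_now + eta*(A'A)^(-1)*A'*B
end
disp(theta')                              % should recover [2, -1.5]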
Levenberg-Marquardt Method
Formula: θ_next = θ_now + η (A^T A + λI)^(-1) A^T B
Effects of λ: a small λ makes the update close to the Gauss-Newton method; a large λ makes it close to gradient descent.
How to update λ: a greedy policy makes λ small; a cautious policy makes λ big.
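A sketch of the Levenberg-Marquardt update on the same hypothetical curve-fitting problem as above, with a simple λ schedule: shrink λ after a successful step (greedy, closer to Gauss-Newton) and grow it after a failed one (cautious, closer to gradient descent). All constants are illustrative.

% Levenberg-Marquardt with a simple lambda-update policy
x = (0:0.2:2)';  y = 2*exp(-1.5*x);        % same synthetic data as above
theta = [1; -1];  lambda = 0.01;
sse = @(t) sum((y - t(1)*exp(t(2)*x)).^2); % sum of squared errors
for iter = 1:50
    yhat = theta(1)*exp(theta(2)*x);
    A = [exp(theta(2)*x), theta(1)*x.*exp(theta(2)*x)];
    B = y - yhat;
    step = (A'*A + lambda*eye(2))\(A'*B);  % (A'A + lambda*I)^(-1)*A'*B
    if sse(theta + step) < sse(theta)      % improvement: accept step, make lambda small
        theta = theta + step;  lambda = lambda/10;
    else                                   % no improvement: reject step, make lambda big
        lambda = lambda*10;
    end
end
disp(theta')                               % should recover [2, -1.5]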
Comparisons
Steepest descent (SD): treats all parameters as nonlinear.
Hybrid learning (SD+LSE): distinguishes between linear and nonlinear parameters.
Gauss-Newton (GN) method: linearizes and treats all parameters as linear.
Levenberg-Marquardt (LM) method: switches smoothly between SD and GN.
Exercises
Can we use gradient descent to find the minimum of f(x) = |x|?
What is the gradient of the sigmoid function?
What are the basins of attraction of the following curve?