Gradient Descent 梯度下降法


Gradient Descent (梯度下降法)
J.-S. Roger Jang (張智星), jang@mirlab.org, http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University

Introduction to Gradient Descent (GD)
Goal: minimize a function based on its gradient.
Concept:
- Gradient of a multivariate function f(x), x = (x1, ..., xn): ∇f(x) = [∂f/∂x1, ..., ∂f/∂xn]^T
- Gradient descent: an iterative method to find a local minimum of the function, x_next = x_now - η∇f(x_now), where η is the step size or learning rate.
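As a concrete companion to the update rule above, here is a minimal MATLAB sketch of the iterative procedure; the objective function, start point, step size, and stopping tolerance are illustrative choices, not values from the original slides.

% Minimal gradient descent loop (illustrative sketch)
f     = @(x) x(1)^2 + 10*x(2)^2;     % example objective: an elongated bowl (assumed)
gradF = @(x) [2*x(1); 20*x(2)];      % its gradient
x   = [4; 1];                        % initial point x_now (assumed)
eta = 0.08;                          % step size (learning rate), assumed
for k = 1:200
    g = gradF(x);
    if norm(g) < 1e-8, break; end    % stop when the gradient (almost) vanishes
    x = x - eta*g;                   % x_next = x_now - eta * grad f(x_now)
end
disp(x)                              % approaches the minimizer [0; 0], zig-zagging in the steep direction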

Single-Input Functions
If n=1, GD reduces to the problem of deciding whether to go left or right.
Example animation: http://www.onmyphd.com/?p=gradient.descent
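To make the left-or-right picture concrete, the following small MATLAB sketch runs GD on the made-up 1D function f(x) = (x-2)^2; the start point and step size are arbitrary.

% 1D gradient descent on f(x) = (x-2)^2 (hypothetical example):
% the sign of the derivative decides whether to move left or right.
f  = @(x) (x-2).^2;
df = @(x) 2*(x-2);
x   = -5;                 % start to the left of the minimum (assumed)
eta = 0.2;                % step size (assumed)
for k = 1:50
    x = x - eta*df(x);    % negative slope -> move right, positive slope -> move left
end
fprintf('x = %.4f, f(x) = %.4f\n', x, f(x));   % converges toward the minimum at x = 2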

Basin of Attraction in 1D Each point/region with zero gradient has a basin of attraction

“Peaks” Function (1/2)
If n=2, GD needs to find a descent direction in the 2D plane.
Example: the “peaks” function in MATLAB, which has 3 local maxima and 3 local minima.
Animation: gradientDescentDemo.m
The gradient is perpendicular to the contours. Why?

“Peaks” Function (2/2)
Gradient of the “peaks” function:
dz/dx = -6*(1-x)*exp(-x^2-(y+1)^2) - 6*(1-x)^2*x*exp(-x^2-(y+1)^2) - 10*(1/5-3*x^2)*exp(-x^2-y^2) + 20*(1/5*x-x^3-y^5)*x*exp(-x^2-y^2) - 1/3*(-2*x-2)*exp(-(x+1)^2-y^2)
dz/dy = 3*(1-x)^2*(-2*y-2)*exp(-x^2-(y+1)^2) + 50*y^4*exp(-x^2-y^2) + 20*(1/5*x-x^3-y^5)*y*exp(-x^2-y^2) + 2/3*y*exp(-(x+1)^2-y^2)
d(dz/dx)/dx = 36*x*exp(-x^2-(y+1)^2) - 18*x^2*exp(-x^2-(y+1)^2) - 24*x^3*exp(-x^2-(y+1)^2) + 12*x^4*exp(-x^2-(y+1)^2) + 72*x*exp(-x^2-y^2) - 148*x^3*exp(-x^2-y^2) - 20*y^5*exp(-x^2-y^2) + 40*x^5*exp(-x^2-y^2) + 40*x^2*exp(-x^2-y^2)*y^5 - 2/3*exp(-(x+1)^2-y^2) - 4/3*exp(-(x+1)^2-y^2)*x^2 - 8/3*exp(-(x+1)^2-y^2)*x
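For reference, the derivatives listed above can be dropped directly into a descent loop. The MATLAB sketch below does exactly that on the peaks surface; the start point, step size, and iteration count are arbitrary choices, not values from the original slides.

% Gradient descent on MATLAB's peaks surface, using the analytic gradient above
dzdx = @(x,y) -6*(1-x).*exp(-x.^2-(y+1).^2) - 6*(1-x).^2.*x.*exp(-x.^2-(y+1).^2) ...
       - 10*(1/5-3*x.^2).*exp(-x.^2-y.^2) + 20*(1/5*x-x.^3-y.^5).*x.*exp(-x.^2-y.^2) ...
       - 1/3*(-2*x-2).*exp(-(x+1).^2-y.^2);
dzdy = @(x,y) 3*(1-x).^2.*(-2*y-2).*exp(-x.^2-(y+1).^2) + 50*y.^4.*exp(-x.^2-y.^2) ...
       + 20*(1/5*x-x.^3-y.^5).*y.*exp(-x.^2-y.^2) + 2/3*y.*exp(-(x+1).^2-y.^2);
p   = [0; -1.5];                 % arbitrary start point (assumed)
eta = 0.02;                      % small step size, assumed, to stay inside one basin
for k = 1:500
    g = [dzdx(p(1),p(2)); dzdy(p(1),p(2))];
    p = p - eta*g;               % move against the gradient toward a local minimum
end
disp(p)                          % should settle near one of the three local minima of peaks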

Basin of Attraction in 2D Each point/region with zero gradient has a basin of attraction

Rosenbrock Function
Rosenbrock function: f(x, y) = (1-x)^2 + 100*(y-x^2)^2, whose global minimum at (1, 1) lies inside a long, curved, nearly flat valley.
More about this function
Animation: http://www.onmyphd.com/?p=gradient.descent
Document on how to optimize this function
The slow, zig-zag progress of plain GD along this valley is the usual justification for using momentum terms (see the sketch below).
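As a rough illustration of why momentum helps here, the MATLAB sketch below runs gradient descent with a momentum term on the Rosenbrock function; the step size, momentum coefficient, and iteration count are assumptions chosen for illustration.

% Gradient descent with a momentum term on the Rosenbrock function
% f(x,y) = (1-x)^2 + 100*(y-x^2)^2
gradF = @(p) [-2*(1-p(1)) - 400*(p(2)-p(1)^2)*p(1); 200*(p(2)-p(1)^2)];
p     = [-1.2; 1];            % common starting point for this test function
v     = [0; 0];               % accumulated update ("velocity")
eta   = 1e-3;                 % step size (assumed)
alpha = 0.9;                  % momentum coefficient (assumed)
for k = 1:50000
    v = alpha*v - eta*gradF(p);   % momentum term damps the zig-zag across the valley
    p = p + v;
end
disp(p)                        % should end up close to the minimizer (1, 1)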

Properties of Gradient Descent
- No guarantee of finding the global optimum.
- Applicable only to differentiable objective functions.
- Performance depends on the start point and the step size.
- Variants: add a momentum term to reduce zig-zag paths; perform line minimization along the descent direction at each iteration (a backtracking sketch follows this list).
- Other optimization schemes: conjugate gradient descent, Gauss-Newton method, Levenberg-Marquardt method.
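One simple way to approximate the "line minimization at each iteration" variant mentioned above is backtracking (Armijo) line search; the MATLAB sketch below applies it to the Rosenbrock function from the previous slide, with conventional but arbitrary constants.

% Gradient descent with Armijo backtracking line search (one realization of
% "line minimization at each iteration"; the constants 1e-4 and 1/2 are conventional choices)
f     = @(p) (1-p(1))^2 + 100*(p(2)-p(1)^2)^2;                          % Rosenbrock again
gradF = @(p) [-2*(1-p(1)) - 400*(p(2)-p(1)^2)*p(1); 200*(p(2)-p(1)^2)];
p = [-1.2; 1];
for k = 1:5000
    g   = gradF(p);
    eta = 1;                                      % start with a bold step
    while f(p - eta*g) > f(p) - 1e-4*eta*(g'*g)   % sufficient-decrease test
        eta = eta/2;                              % backtrack until the step helps
    end
    p = p - eta*g;
end
disp(p)   % crawls toward (1, 1); the zig-zag remains, which motivates the methods on the next slides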

Gauss-Newton Method
Synonyms: linearization method, extended Kalman filter method.
Concept:
- General nonlinear model: y = f(x, θ)
- Linearization at θ = θ_now: y = f(x, θ_now) + a1*(θ1 - θ1,now) + a2*(θ2 - θ2,now) + ...
- LSE solution: θ_next = θ_now + η*(A^T*A)^-1*A^T*B
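The following is a minimal MATLAB sketch of the update above applied to a made-up curve-fitting problem, y = θ1*exp(θ2*x); here A is the Jacobian of the model output with respect to θ and B is the residual vector, and the data and initial guess are invented for illustration.

% Gauss-Newton sketch: fit y = th1*exp(th2*x) by repeated linearization
x  = (0:0.5:4)';                     % synthetic inputs (assumed)
y  = 2*exp(-0.7*x);                  % synthetic targets generated with theta = [2; -0.7]
th = [1.5; -0.5];                    % initial guess theta_now (assumed)
for k = 1:20
    yhat = th(1)*exp(th(2)*x);
    A = [exp(th(2)*x), th(1)*x.*exp(th(2)*x)];   % Jacobian of the model w.r.t. theta
    B = y - yhat;                                % residual vector
    th = th + (A'*A)\(A'*B);                     % theta_next = theta_now + (A'A)^-1 A'B (eta = 1)
end
disp(th)                              % recovers the generating parameters [2; -0.7]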

Levenberg-Marquardt Method
Formula: θ_next = θ_now + η*(A^T*A + λI)^-1*A^T*B
Effects of λ:
- λ small → behaves like the Gauss-Newton method
- λ big → behaves like gradient descent
How to update λ:
- Greedy policy → make λ small
- Cautious policy → make λ big
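Continuing the same made-up fitting problem used in the Gauss-Newton sketch, the MATLAB sketch below adds the λI term and the greedy/cautious λ update described above; the initial λ and the factor of 10 are illustrative choices.

% Levenberg-Marquardt sketch on the same synthetic fitting problem
x  = (0:0.5:4)';   y = 2*exp(-0.7*x);
th = [1.5; -0.5];  lambda = 0.01;                 % initial guess and initial lambda (assumed)
sse = @(t) sum((y - t(1)*exp(t(2)*x)).^2);        % sum of squared errors
for k = 1:50
    A = [exp(th(2)*x), th(1)*x.*exp(th(2)*x)];    % Jacobian, as in Gauss-Newton
    B = y - th(1)*exp(th(2)*x);                   % residuals
    step = (A'*A + lambda*eye(2))\(A'*B);         % theta update with lambda*I added
    if sse(th + step) < sse(th)
        th = th + step;  lambda = lambda/10;      % greedy: step worked, act more like Gauss-Newton
    else
        lambda = lambda*10;                       % cautious: step failed, act more like gradient descent
    end
end
disp(th)                                           % converges to [2; -0.7]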

Comparisons
- Steepest descent (SD): treats all parameters as nonlinear.
- Hybrid learning (SD+LSE): distinguishes between linear and nonlinear parameters.
- Gauss-Newton (GN) method: linearizes and treats all parameters as linear.
- Levenberg-Marquardt (LM) method: switches smoothly between SD and GN.

Exercises
- Can we use gradient descent to find the minimum of f(x) = |x|?
- What is the gradient of the sigmoid function?
- What are the basins of attraction of the following curve?