CHAPTER 5: STOCHASTIC GRADIENT FORM OF STOCHASTIC APPROXIMATION
Organization of chapter in ISSO
–Stochastic gradient
    Core algorithm
    Basic principles
    Nonlinear regression
    Connections to LMS
–Neural network training
–Discrete-event dynamic systems
–Image processing
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

5-2 Stochastic Gradient Formulation
For differentiable L(θ), recall the familiar set of p equations in p unknowns used to find a minimum θ*:
    g(θ*) = ∂L/∂θ |θ = θ* = 0
Above is a special case of the root-finding problem
Suppose L(θ) and g(θ) cannot be observed except in the presence of noise
–Adaptive control (target tracking)
–Simulation-based optimization
–Etc.
Seek an unbiased measurement of ∂L/∂θ for optimization

5-3 Stochastic Gradient Formulation (Cont’d)
Suppose L(θ) = E[Q(θ, V)]
–V represents all random effects
–Q(θ, V) represents “observed” cost (noisy measurement of L(θ))
Seek a representation where ∂Q/∂θ is an unbiased measurement of ∂L/∂θ
–Not true when the distribution function for V depends on θ
Above implies that the desired representation is L(θ) = ∫ Q(θ, v) pV(v) dv, with a density pV(·) for V that does not depend on θ, rather than a representation in which pV depends on θ
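
As a concrete illustration of this point (not from ISSO): suppose V ~ N(θ, 1), so that L(θ) = E[V] = θ. Writing Q(θ, V) = V gives ∂Q/∂θ = 0, a biased measurement of ∂L/∂θ = 1, because the density of V depends on θ. Reparameterizing as V = θ + ε with ε ~ N(0, 1) gives the desired representation Q(θ, ε) = θ + ε, whose derivative ∂Q/∂θ = 1 is an unbiased (here exact) measurement of ∂L/∂θ.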

5-4 Stochastic Gradient Measurement and Algorithm
When the density pV(·) is independent of θ, ∂Q(θ, V)/∂θ is an unbiased measurement of ∂L/∂θ
–Above requires the derivative–integral interchange in ∂L/∂θ = ∂E[Q(θ, V)]/∂θ = E[∂Q(θ, V)/∂θ] to be valid
Can use the root-finding (Robbins–Monro) SA algorithm to attempt to find θ*:
    θ̂k+1 = θ̂k - ak ĝk(θ̂k),  where ĝk(θ̂k) = ∂Q(θ, V)/∂θ evaluated at θ = θ̂k with the kth noise realization of V
Unbiased measurement satisfies key convergence conditions of SA (Section 4.3 in ISSO)
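
To make the recursion concrete, here is a minimal Python sketch (not from ISSO; the function names, the toy cost Q(θ, V) = ‖θ - V‖² with E[V] = θ*, and the gain sequence are illustrative assumptions):

```python
import numpy as np

def stochastic_gradient_sa(grad_Q, theta0, gain, n_iter, rng):
    """Root-finding (Robbins-Monro) SA driven by noisy gradient measurements.

    grad_Q(theta, rng) returns one realization of dQ(theta, V)/dtheta,
    assumed to be an unbiased measurement of dL/dtheta.
    """
    theta = np.asarray(theta0, dtype=float)
    for k in range(n_iter):
        # theta_{k+1} = theta_k - a_k * (noisy gradient measurement at theta_k)
        theta = theta - gain(k) * grad_Q(theta, rng)
    return theta

# Toy problem: V = theta* + noise and Q(theta, V) = ||theta - V||^2, so
# L(theta) = E[Q(theta, V)] is minimized at theta* and dQ/dtheta = 2(theta - V)
# is an unbiased measurement of dL/dtheta = 2(theta - theta*).
theta_star = np.array([1.0, -2.0])
grad_Q = lambda theta, rng: 2.0 * (theta - (theta_star + rng.normal(size=2)))

rng = np.random.default_rng(0)
estimate = stochastic_gradient_sa(grad_Q, theta0=np.zeros(2),
                                  gain=lambda k: 0.5 / (k + 1), n_iter=5000, rng=rng)
print(estimate)   # close to [1, -2]
```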

5-5 Stochastic Gradient Tendency to Move Iterate in Correct Direction

5-6 Stochastic Gradient and LMS Connections
Recall the basic linear model from Chapter 3: zk = hkᵀθ + vk
Consider the standard MSE loss: L(θ) = ½ E[(zk - hkᵀθ)²]
–Implies Q(θ, V) = ½ (zk - hkᵀθ)² with V = (hk, zk), so ∂Q/∂θ = -hk(zk - hkᵀθ)
Recall the basic LMS algorithm from Chapter 3: θ̂k+1 = θ̂k + ak hk+1(zk+1 - hk+1ᵀθ̂k)
Hence LMS is a direct application of stochastic gradient SA
Proposition 5.1 in ISSO shows how SA convergence theory applies to LMS
–Implies convergence of LMS to θ*
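
A minimal sketch of LMS viewed as stochastic gradient SA (illustrative only; the data-generating model, gain sequence, and variable names are assumptions, not from ISSO):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 5000
theta_star = np.array([2.0, -1.0, 0.5])       # true parameter of the linear model

theta_hat = np.zeros(p)
for k in range(n):
    h = rng.normal(size=p)                    # regressor vector h_k
    z = h @ theta_star + 0.1 * rng.normal()   # measurement z_k = h_k' theta* + v_k
    # Instantaneous cost Q = 0.5*(z - h'theta)^2 has gradient -h*(z - h'theta),
    # so the stochastic gradient SA step is exactly the LMS update:
    a_k = 0.5 / (k + 1)
    theta_hat = theta_hat + a_k * h * (z - h @ theta_hat)

print(theta_hat)                              # approaches theta_star as k grows
```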

5-7 Neural Networks
Neural networks (NNs) are general function approximators
Actual output zk is represented by a NN according to the standard model zk = h(θ, xk) + vk
–h(θ, xk) represents the NN output for input xk and weight values θ
–vk represents noise
Diagram of a simple feedforward NN on next slide
Most popular training method is backpropagation (mean-squared-type loss function)
Backpropagation is the following stochastic gradient recursion, with instantaneous squared-error cost Q(θ, xk, zk) = ½ [zk - h(θ, xk)]²:
    θ̂k+1 = θ̂k - ak ∂Q(θ, xk, zk)/∂θ |θ = θ̂k
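
A toy sketch of this recursion for a one-hidden-layer network (illustrative only; the architecture, target function, gain, and hand-coded gradient are assumptions, and a practical implementation would normally use an automatic-differentiation library):

```python
import numpy as np

rng = np.random.default_rng(1)

# One-hidden-layer network h(theta, x) with tanh units; theta = (W1, b1, w2, b2).
def forward(params, x):
    W1, b1, w2, b2 = params
    a = np.tanh(W1 @ x + b1)           # hidden activations
    return w2 @ a + b2, a              # length-1 output array and activations

def grad_Q(params, x, z):
    """Gradient of Q = 0.5*(z - h(theta, x))^2 w.r.t. each weight array (hand-coded backprop)."""
    W1, b1, w2, b2 = params
    y, a = forward(params, x)
    e = y - z                          # dQ/dy
    g_w2, g_b2 = e * a, e
    d_a = e * w2 * (1.0 - a ** 2)      # back-propagate through the tanh layer
    g_W1, g_b1 = np.outer(d_a, x), d_a
    return [g_W1, g_b1, g_w2, g_b2]

# Noisy training data from an assumed target function (stand-in for z_k = h(theta, x_k) + v_k).
target = lambda x: np.sin(np.pi * x[0]) + 0.5 * x[1]
params = [0.5 * rng.normal(size=(8, 2)), np.zeros(8), 0.5 * rng.normal(size=8), np.zeros(1)]

for k in range(20000):
    x = rng.uniform(-1.0, 1.0, size=2)
    z = target(x) + 0.05 * rng.normal()
    a_k = 0.05                         # small constant gain, common in NN practice
    params = [p - a_k * g for p, g in zip(params, grad_Q(params, x, z))]
```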

5-8 Simple Feedforward Neural Network with p = 25 Weight Parameters

5-9 Discrete-Event Dynamic Systems
Many applications of stochastic gradient methods in simulation-based optimization
Discrete-event dynamic systems frequently modeled by simulation
–Trajectories of process are piecewise constant
Derivative–integral interchange critical
–Interchange not valid in many realistic systems
–Interchange condition checked on case-by-case basis
Overall approach requires knowledge of inner workings of simulation
–Needed to obtain ∂Q(θ, V)/∂θ
–Chapters 14 and 15 of ISSO have extensive discussion of simulation-based optimization
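
As a sketch of what "knowledge of the inner workings of the simulation" can mean, the fragment below propagates the derivative of a single-server queue's waiting times with respect to a service-time scale parameter θ through the Lindley recursion, in the style of infinitesimal perturbation analysis. The queue model, parameterization, and function names are assumptions chosen for illustration (not from ISSO), and the interchange/unbiasedness conditions would still need to be verified for any real system:

```python
import numpy as np

def waiting_time_cost_and_grad(theta, n_customers, rng):
    """Simulate a single-server FIFO queue whose service times are theta * X_i.

    Returns Q(theta, V) = mean waiting time over the run and a sample-path
    estimate of dQ/dtheta obtained by differentiating the Lindley recursion.
    """
    X = rng.exponential(1.0, n_customers)       # base service randomness (theta-free)
    A = rng.exponential(1.5, n_customers)       # interarrival times
    W, dW = 0.0, 0.0                            # waiting time and its theta-derivative
    total, dtotal = 0.0, 0.0
    for i in range(n_customers):
        busy = W + theta * X[i] - A[i] > 0.0    # does the next customer wait?
        W = max(0.0, W + theta * X[i] - A[i])   # Lindley recursion
        dW = (dW + X[i]) if busy else 0.0       # derivative propagated through max(0, .)
        total, dtotal = total + W, dtotal + dW
    return total / n_customers, dtotal / n_customers

rng = np.random.default_rng(2)
Q, dQ = waiting_time_cost_and_grad(theta=1.0, n_customers=10_000, rng=rng)
print(Q, dQ)   # one noisy cost and gradient measurement, usable in the recursion of slide 5-4
```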

5-10 Image Restoration
Aim is to recover the true image from a recorded image corrupted by noise
Common to construct a least-squares-type problem where Hs represents a convolution of the measurement process (H) and the true pixel-by-pixel image (s)
Can be solved by either batch linear regression methods or the LMS/RLS methods
Nonlinear measurements need the full power of the stochastic gradient method
–Measurements modeled as Z = F(s, x, V)
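
A minimal sketch of the linear-measurement case treated with an LMS-style stochastic gradient update (illustrative assumptions: a one-dimensional "image" for brevity, a known tridiagonal blur operator H, one randomly chosen measurement per step, and a fixed small gain):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 64                                               # pixels in a 1-D image
s_true = rng.uniform(0.0, 1.0, n)                    # true image
H = np.eye(n) + 0.3 * np.eye(n, k=1) + 0.3 * np.eye(n, k=-1)   # known blur operator

s_hat = np.zeros(n)
for k in range(100_000):
    i = rng.integers(n)                              # pick one measurement location at random
    h = H[i]
    z = h @ s_true + 0.1 * rng.normal()              # noisy recorded value Z_i = (H s)_i + noise
    # LMS / stochastic gradient step on Q = 0.5*(z - h @ s)^2:
    s_hat += 0.1 * h * (z - h @ s_hat)

print(np.max(np.abs(s_hat - s_true)))                # worst-case pixel error, limited by the noise
```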