Backpropagation Training

Backpropagation Training Michael J. Watts   http://mike.watts.net.nz

Lecture Outline Backpropagation training; error calculation; the error surface; pattern vs. batch mode training; restrictions of backprop; problems with backprop.

Backpropagation Training Backpropagation of error training, also known as backprop or BP, is a supervised learning algorithm. The outputs of the network are compared to the desired outputs.

Backpropagation Training The error is calculated, propagated backwards over the network, and used to calculate the weight changes.
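
As a rough illustration only (the network, names and shapes below are invented, not taken from the lecture), one pattern of this process for a single-hidden-layer MLP with sigmoid units might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, target, W1, W2, eta=0.1):
    """One pattern of backprop training for a single-hidden-layer MLP (no biases)."""
    # Forward pass: compute the network's outputs for this example
    hidden = sigmoid(W1 @ x)
    output = sigmoid(W2 @ hidden)

    # Compare the outputs of the network to the desired outputs
    error = target - output

    # Propagate the error backwards (sigmoid derivative is a * (1 - a))
    delta_out = error * output * (1.0 - output)
    delta_hid = (W2.T @ delta_out) * hidden * (1.0 - hidden)

    # Use the propagated error to calculate the weight changes
    W2 += eta * np.outer(delta_out, hidden)
    W1 += eta * np.outer(delta_hid, x)
    return error

# Example: 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))
W2 = rng.normal(size=(1, 3))
backprop_step(np.array([0.5, -1.0]), np.array([1.0]), W1, W2)
```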

Backpropagation Training The weights are adjusted by gradient descent with a momentum term: Δw(t) = -η ∂E/∂w + α Δw(t-1)

Backpropagation Training where Δw (delta) is the weight change, η (eta) is the learning rate, and α (alpha) is the momentum term

Backpropagation Training Δw(t) is the change to the weight at step t, and ∂E/∂w is the partial derivative of the error E with respect to the weight w
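
Written as a small sketch (names are illustrative), the update keeps the previous weight change between steps:

```python
def weight_change(grad_E, prev_delta, eta=0.1, alpha=0.9):
    """Delta w(t) = -eta * dE/dw + alpha * Delta w(t-1)."""
    return -eta * grad_E + alpha * prev_delta

# A positive gradient pushes the weight down, nudged by the previous change
delta = weight_change(grad_E=0.5, prev_delta=0.02)   # -0.05 + 0.018 = -0.032
```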

Error Calculation Several different error measures exist, applied over individual examples or the complete training set: Sum Squared Error (SSE) and Mean Squared Error (MSE).

Error Calculation Sum Squared Error (SSE) is measured over the entire data set.

Error Calculation Mean Squared Error (MSE) is measured over individual examples; it reduces the errors over multiple outputs to a single value.
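
A minimal sketch of the two measures, assuming numpy and targets/outputs given as lists of per-example output values:

```python
import numpy as np

def sse(targets, outputs):
    """Sum Squared Error, summed over every example and output in the data set."""
    return np.sum((np.asarray(targets) - np.asarray(outputs)) ** 2)

def mse(target, output):
    """Mean Squared Error for one example: averages the squared errors over its
    outputs, reducing them to a single value."""
    return np.mean((np.asarray(target) - np.asarray(output)) ** 2)

# Example: two training examples, each with two outputs
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.8, 0.1], [0.3, 0.7]]
print(sse(targets, outputs))          # error over the entire data set
print(mse(targets[0], outputs[0]))    # error for a single example
```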

Error Surface A plot of the network error against the weight values. Consider the network as a function that returns an error; each connection weight is one parameter of this function. Low points in the surface are local minima, and the lowest is the global minimum.

Error Surface

Error Surface At any time t the network is at one point on the error surface. Movement across the surface from time t to t+1 is due to the BP rule: the network moves “downhill” to points of lower error. The BP rule is like gravity.

Error Surface The learning rate is like a multiplier on gravity: it determines how fast the network will move downhill. The network can get stuck in a dip, i.e. stuck in a local minimum: a low local error, but not the lowest global error.

Error Surface Too low a learning rate means slow learning; too high a learning rate means a high chance of getting stuck.

Error Surface Momentum is like the momentum of a mass: once in motion, it will keep moving. It prevents sudden changes in direction and can carry the network out of a local minimum.

Error Surface Not enough momentum means less chance of escaping a local minimum; too much momentum means the network can fly out of the global minimum.
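
A toy, invented one-dimensional “error surface” (not from the lecture) can illustrate how the learning rate and momentum interact with local minima:

```python
# Invented 1-D "error surface" 0.05*w**4 - 0.5*w**2 + 0.1*w, which has a local
# minimum near w = 2.2 and a deeper global minimum near w = -2.3.
def gradient(w):
    return 0.2 * w**3 - w + 0.1   # derivative of the surface with respect to w

def descend(w, eta, alpha, steps=200):
    """Gradient descent with momentum on the toy surface."""
    delta = 0.0
    for _ in range(steps):
        delta = -eta * gradient(w) + alpha * delta   # BP-style update with momentum
        w += delta
    return w

# With these particular settings, the run without momentum tends to settle in the
# nearer, local minimum, while the momentum run is carried over the hump towards
# the global minimum.
print(descend(w=4.0, eta=0.05, alpha=0.0))
print(descend(w=4.0, eta=0.05, alpha=0.9))
```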

Pattern vs Batch Training Also known as online and offline training. Pattern mode applies the weight deltas after each example; batch mode accumulates the deltas and applies them all at once.

Pattern vs Batch Training Batch mode is closer to true gradient descent, but requires a smaller learning rate. The step size is smaller, giving a smooth traversal of the error surface, but many epochs are required.

Pattern vs Batch Training Pattern mode is easier to implement, but requires shuffling of the training set. It is not simple gradient descent and might not take a direct downward path; it requires a small step size (learning rate) to avoid getting stuck.
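
A sketch of the two modes, using a plain linear unit and an invented error_gradient helper in place of a full MLP:

```python
import numpy as np
import random

def error_gradient(w, x, target):
    """Gradient of the squared error for a linear unit y = w . x
    (a stand-in for the full MLP gradient)."""
    y = w @ x
    return -(target - y) * x

def pattern_mode(w, data, eta, epochs):
    """Pattern (online) mode: apply the weight delta after every example."""
    for _ in range(epochs):
        random.shuffle(data)                       # shuffle the training set each epoch
        for x, target in data:
            w = w - eta * error_gradient(w, x, target)
    return w

def batch_mode(w, data, eta, epochs):
    """Batch (offline) mode: accumulate the deltas over the whole set and apply
    them all at once - one true gradient descent step per epoch."""
    for _ in range(epochs):
        total = sum(error_gradient(w, x, target) for x, target in data)
        w = w - eta * total
    return w

# Toy data: learn y = x1 + x2
data = [(np.array([1.0, 0.0]), 1.0),
        (np.array([0.0, 1.0]), 1.0),
        (np.array([1.0, 1.0]), 2.0)]
w0 = np.zeros(2)
print(pattern_mode(w0.copy(), data, eta=0.1, epochs=50))
print(batch_mode(w0.copy(), data, eta=0.1, epochs=50))
```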

Restrictions on Backprop The error and activation functions must be differentiable. Hard threshold functions, e.g. the step (Heaviside) function, cannot be used, and discontinuous functions cannot be modelled. Mixed activation functions cannot be used.
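
For contrast, a small sketch of why the sigmoid works where a hard threshold does not: its derivative is simple and defined everywhere, so an error signal can flow back through it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # smooth and non-zero, so backprop gets a gradient

def step(x):
    return np.where(x >= 0.0, 1.0, 0.0)   # hard threshold: derivative is 0 everywhere
                                           # (undefined at 0), so no error can propagate
```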

Problems with Backprop Time consuming: hundreds or thousands of epochs may be required; variants of BP, such as quickprop, address this. Selection of parameters (epochs, learning rate and momentum) is difficult.

Problems with Backprop Local minima: the network can easily get stuck in a locally minimal error. Overfitting: the network can overlearn the training data and lose the ability to generalise. Sensitive to the MLP topology: the number of connections (“free parameters”).

Summary Backprop is a supervised learning algorithm that reduces errors by gradient descent. The error surface is a multidimensional surface that describes the performance of the network in relation to the weights.

Summary Traversal of the error surface is influenced by the learning rate and momentum parameters. The difference between pattern and batch mode training is when the deltas are applied. Backprop has some problems: local minima and overfitting.