Non-Symbolic AI lecture 8 (EASy, Summer 2004): Backpropagation in a multi-layer perceptron

Jiggling the weights in each layer

When we present the training set, each Input gives an actual Output; comparing this with the Target gives the Error, E = T – O.

Backprop allows you to use this error-at-output to adjust the weights arriving at the output layer … but it then also allows you to calculate the effective error 'one layer back', and use this to adjust the weights arriving there … and so on, back-propagating errors through any number of layers.

Differentiable sigmoid

The trick is the use of a sigmoid as the non-linear transfer function

    f(x) = 1 / (1 + e^(-x))

because this is nicely differentiable – it so happens that

    f'(x) = f(x) (1 – f(x))

so the derivative at any node can be computed directly from that node's activation.
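
For concreteness, here is a minimal C sketch of that pair of facts. The name sigmoid matches the function called in the code on the later slides; sigmoid_deriv is only an illustrative helper added here, not something the slides define.

    #include <math.h>

    /* logistic sigmoid: squashes any real input into the range (0, 1) */
    float sigmoid(float x)
    {
        return 1.0f / (1.0f + expf(-x));
    }

    /* derivative written in terms of the activation y = sigmoid(x) */
    float sigmoid_deriv(float y)
    {
        return y * (1.0f - y);
    }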

How to do it in practice

For the details, consult a suitable textbook, e.g. Hertz, Krogh and Palmer, "Introduction to the Theory of Neural Computation", Addison-Wesley, 1991.

But here is a step-by-step description of how you have to code it up.

Initial decisions

Decide how many inputs you have. Add one more pseudo-input (clamped to value 1) to provide the biases for the next layer.

Decide how many hidden nodes you have. Add one more, clamped to value 1, for the biases to the next layer. Probably you will have just one hidden layer, though in principle there can be more.

Decide how many output nodes you have.
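
As a concrete illustration (the particular sizes are an assumption, chosen to match the XOR exercise at the end of the lecture), these three decisions might be recorded as constants so that the array declarations on the next slides compile with fixed sizes:

    /* hypothetical sizes: 2 inputs, 2 hidden units, 1 output (XOR exercise);
       the +1 bias units are added where the arrays are declared */
    enum { a = 2,    /* number of real inputs  */
           b = 2,    /* number of hidden units */
           c = 1 };  /* number of output units */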

Weights

Now you can make arrays holding all the weights. Suppose there are a input units, b hidden units and c output units:

    float i_to_h_wts[a+1][b];
    float h_to_o_wts[b+1][c];

The +1 in each case is to account for the biases. Initialise all the weights to small random values.
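
The slides do not give code for that last step; a minimal sketch, assuming the weight arrays are declared at file scope and that values in [-0.5, 0.5] count as "small" (both assumptions of this sketch, not requirements from the lecture), might be:

    #include <stdlib.h>

    /* fill both weight arrays with small random values in [-0.5, 0.5] */
    void init_weights(void)
    {
        int i, j;
        for (i = 0; i < a+1; i++)
            for (j = 0; j < b; j++)
                i_to_h_wts[i][j] = (float)rand() / RAND_MAX - 0.5f;
        for (i = 0; i < b+1; i++)
            for (j = 0; j < c; j++)
                h_to_o_wts[i][j] = (float)rand() / RAND_MAX - 0.5f;
    }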

Presenting the inputs

Now we are going to present members of the training set to the network, and see what outputs result – and how good they are.

We could present a whole batch from the training set, and calculate how to jiggle the weights on the basis of all the errors. Or we can do it incrementally, one at a time. This is what we shall do – present a single input vector, and calculate the resulting activations at the hidden and output layers.

First, this single pass forward.

Activations

We need vectors (or arrays) to hold the activations at each layer.

    float inputs[a+1];   inputs[a] = 1.0;  /* bias */
    float hidden[b+1];   hidden[b] = 1.0;  /* bias */
    float outputs[c];
    float targets[c];    /* what the outputs should be */

Pick out one member of the training set, and set the input values equal to it. Now calculate the activations in the hidden layer.
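
That "pick out one member" step might look like the fragment below; the arrays training_inputs and training_targets and the pattern index p are assumptions of this sketch, not names given on the slides:

    /* copy training pattern p into the network's input and target arrays */
    for (j = 0; j < a; j++)
        inputs[j] = training_inputs[p][j];
    for (j = 0; j < c; j++)
        targets[j] = training_targets[p][j];
    /* inputs[a] stays clamped at 1.0 to act as the bias */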

Forward to Hidden layer

    for (i = 0; i < b; i++) {
        sum = 0.0;
        for (j = 0; j < a+1; j++)
            sum += inputs[j] * i_to_h_wts[j][i];
        hidden[i] = sigmoid(sum);
    }

using a sigmoid function you have written to calculate f(x) = 1 / (1 + e^(-x)).

Forward to Output layer

    for (i = 0; i < c; i++) {
        sum = 0.0;
        for (j = 0; j < b+1; j++)
            sum += hidden[j] * h_to_o_wts[j][i];
        outputs[i] = sigmoid(sum);
    }

End of forward pass

That has got us all the way forward.

Calculating deltas

Now, for all the nodes in every layer except the input layer, we are going to calculate deltas (the appropriate basis for changes at those nodes).

    float delta_hidden[b+1];
    float delta_outputs[c];

The underlying formula used for an output node is

    delta_j = f'(net_j) * (T_j - O_j)

which, for the sigmoid, conveniently is

    delta_j = O_j * (1 - O_j) * (T_j - O_j)

Deltas on output layer

    for (j = 0; j < c; j++)
        delta_outputs[j] = outputs[j] * (1.0 - outputs[j])
                         * (targets[j] - outputs[j]);

Store these deltas for the final output layer, and also use them to propagate the errors backward (using the weights on the connections) through to the hidden layer …

Deltas on Hidden layer

    for (j = 0; j < b+1; j++) {
        error = 0.0;
        for (k = 0; k < c; k++)
            error += h_to_o_wts[j][k] * delta_outputs[k];
        delta_hidden[j] = hidden[j] * (1.0 - hidden[j]) * error;
    }

End of backward pass

That has got us all the way backward, calculating deltas.

Now the weight-changes

OK, we have calculated the errors at all the nodes, including hidden nodes. Let's use these to calculate weight changes everywhere, using a learning rate parameter eta.

    float delta_i_to_h_wts[a+1][b];
    float delta_h_to_o_wts[b+1][c];

Calculate the weight-changes

    for (j = 0; j < c; j++)
        for (k = 0; k < b+1; k++) {
            delta_h_to_o_wts[k][j] = eta * delta_outputs[j] * hidden[k];
            h_to_o_wts[k][j] += delta_h_to_o_wts[k][j];
        }

That gives new values for all the hidden-to-output weights.

Calculate the weight-changes (2)

    for (j = 0; j < b; j++)
        for (k = 0; k < a+1; k++) {
            delta_i_to_h_wts[k][j] = eta * delta_hidden[j] * inputs[k];
            i_to_h_wts[k][j] += delta_i_to_h_wts[k][j];
        }

That gives new values for all the input-to-hidden weights.

How big is eta?

For example, eta = 0.02 is a common rate to use.

End of forward and backward pass

Repeat many times

That was a single pass, based on a single member of the training set, making small jiggles in the weights (based on the learning rate eta, e.g. eta = 1.0).

Repeat this lots of times for different members of the training set, indeed going back and using each member many times – each time making a small change in the weights.

Eventually (fingers crossed) the errors (Target – Output) will get satisfactorily small and, unless it has over-fitted the training set, the Black Box should generalise to unseen test data.

Wrapping up in a program

I presented the things-to-do in a pragmatic order, but for writing a program you have to wrap it up a bit differently.

Define all your weights, activations and deltas as arrays of floats (or doubles) first. Define your sigmoid function. Write functions for a pass forward, and a pass backward. Write a big loop that goes over many presentations of inputs.

All this is left as an exercise for the reader.
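
One possible skeleton for that big loop is sketched below. The names load_pattern, forward_pass, backward_pass and update_weights are hypothetical wrappers around the code from the earlier slides, and NUM_PATTERNS and the epoch count are likewise assumptions; none of them are prescribed by the lecture.

    /* train for a fixed number of epochs, presenting every pattern once per epoch */
    void train(int num_epochs)
    {
        int epoch, p;
        for (epoch = 0; epoch < num_epochs; epoch++) {
            for (p = 0; p < NUM_PATTERNS; p++) {
                load_pattern(p);    /* copy pattern p into inputs[] and targets[]              */
                forward_pass();     /* 'Forward to Hidden layer' and 'Forward to Output layer' */
                backward_pass();    /* 'Deltas on output layer' and 'Deltas on Hidden layer'   */
                update_weights();   /* both 'Calculate the weight-changes' steps               */
            }
        }
    }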

Problems?

If there are, for instance, 100 weights to be jiggled around, then backprop is equivalent to gradient descent on a 100-dimensional error surface – like a marble rolling down towards the basin of minimum error. (There are other methods, e.g. conjugate gradient descent, that might be faster.)

What about worries that 'the marble may get trapped in a local optimum'? Actually, that rarely happens, though another problem is more frequent.

Valleys

Using the marble metaphor, there may well be valleys (pictured on the original slide) with steep sides and a gently sloping floor. Gradient descent tends to waste time swooshing up and down each side of the valley (think marbles!).

Momentum

If you can add a momentum term, one that tends to cancel out the back-and-forth movements and emphasise any consistent direction, then this will go down such valleys with gently sloping floors much more successfully – that is, faster.

Implementing momentum

This means keeping track of all the delta_weight values from the last pass, and making the new value of each delta_weight fairly similar to the previous value – i.e. giving it momentum, or a 'reluctance to change'.

Look back a few slides to 'Calculate the weight-changes' and substitute

    delta_wts[k][j] = eta * delta_outputs[j] * hidden[k]
                    + alpha * old_delta_wts[k][j];
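
Spelled out for the hidden-to-output layer, the earlier update loop might become something like the sketch below; old_delta_h_to_o_wts is an extra array you would have to declare and zero-initialise yourself (an assumption of this sketch, not a name used on the slides):

    for (j = 0; j < c; j++)
        for (k = 0; k < b+1; k++) {
            delta_h_to_o_wts[k][j] = eta * delta_outputs[j] * hidden[k]
                                   + alpha * old_delta_h_to_o_wts[k][j];
            h_to_o_wts[k][j] += delta_h_to_o_wts[k][j];
            /* remember this change so the next pass can add momentum to it */
            old_delta_h_to_o_wts[k][j] = delta_h_to_o_wts[k][j];
        }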

Value for momentum

Alpha is the momentum factor, between 0 and 1. A typical value to use is 0.9.

So common parameter settings (to be changed when appropriate) are: learning rate eta = 0.02, momentum alpha = 0.9.
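
In code these defaults might simply be kept as named constants (just one way of recording the values above):

    const float eta   = 0.02f;  /* learning rate   */
    const float alpha = 0.9f;   /* momentum factor */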

Applications

You now have all the knowledge needed to write a back-prop program to tackle even the most complex applications – e.g. NETtalk (Sejnowski and Rosenberg, 1986).

Exercise for the reader

Construct an ANN with 2 input nodes, 2 hidden nodes and one output node. Use binary inputs, so there are 4 possible input vectors.

Train it on the XOR rule, so that these Inputs give these Target outputs:

    Inputs (0, 0)  ->  Target 0
    Inputs (0, 1)  ->  Target 1
    Inputs (1, 0)  ->  Target 1
    Inputs (1, 1)  ->  Target 0
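
One way of holding that training set in code, using the same (assumed) array names as in the earlier pattern-loading sketch:

    /* XOR training set: 4 patterns of 2 binary inputs, 1 target output each */
    float training_inputs[4][2]  = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
    float training_targets[4][1] = { {0},    {1},    {1},    {0}    };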

Any questions?