Backpropagation.


Backpropagation

Last time…

Correlational learning: the Hebb rule. What Hebb actually said: “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.” The minimal version of the Hebb rule: when there is a synapse between cell A and cell B, increment the strength of the synapse whenever A and B fire together (or in close succession). The minimal Hebb rule as implemented in a network is sketched below.
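A minimal sketch of that rule in code (Python/NumPy; not from the original slides, and the learning rate value is an arbitrary illustrative choice): the weight between two units is incremented in proportion to the product of their activities.

```python
import numpy as np

def hebb_update(w, a_pre, a_post, epsilon=0.1):
    """Minimal Hebb rule: w[i, j] grows in proportion to a_pre[i] * a_post[j]."""
    return w + epsilon * np.outer(a_pre, a_post)

# Two "sending" units and one "receiving" unit, all firing together.
w = np.zeros((2, 1))
w = hebb_update(w, a_pre=np.array([1.0, 1.0]), a_post=np.array([1.0]))
print(w)  # both weights have increased by epsilon
```

Note that repeated application of this update keeps increasing the weights, which is exactly the unbounded-growth problem raised on the next slide.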

Limitations of Hebbian learning. There are many association problems it cannot solve, especially those in which similar input patterns must produce quite different outputs. Without further constraints, the weights grow without bound. Each weight is learned independently of all the others, and the weight changes are exactly the same from pass to pass.

Predictive error-driven learning with linear units…

[Figure: a simple network in which input units 1 and 2 project to unit 3 through weights $w_1$ and $w_2$.]

Why don’t we just use two-layer perceptrons?

$a_j = b_w + \sum_i a_i\, w_{ij}$ [Figure: an input unit $i$ connected to unit $j$ by weight $w_1$, plus a bias unit $b$ connected by weight $b_w$.]

$a_3 = b_w + a_1 w_1 + a_2 w_2$ [Figure: input units 1 and 2 feed output unit 3 through weights $w_1$ and $w_2$, with a bias unit $b$, alongside a table of Input 1, Input 2, and Output values.]
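As a quick worked example of this activation rule (Python sketch; the numeric values are illustrative, not from the slides):

```python
def linear_unit(a1, a2, w1, w2, bw):
    """a_3 = b_w + a_1*w_1 + a_2*w_2 for a two-input linear unit with a bias."""
    return bw + a1 * w1 + a2 * w2

# With inputs (1, 0), weights (0.5, -0.3), and bias weight 0.2:
print(linear_unit(1.0, 0.0, w1=0.5, w2=-0.3, bw=0.2))  # 0.7
```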

Linear separability problem: with linear units (and thresholded linear units), *no solutions exist* for input/output mappings that are not linearly separable. What if we add an extra layer between input and output?

With a layer of linear hidden units (unit 3 receives weights $w_1$ and $w_3$ from inputs 1 and 2, unit 4 receives $w_2$ and $w_4$, and both feed output unit 5 through $w_5$ and $w_6$): $a_5 = a_3 w_5 + a_4 w_6 = (a_1 w_1 + a_2 w_3)\,w_5 + (a_1 w_2 + a_2 w_4)\,w_6 = a_1(w_1 w_5 + w_2 w_6) + a_2(w_3 w_5 + w_4 w_6) = a_1 w_x + a_2 w_y$. Same as a linear network without any hidden layer!
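A quick numerical check of this algebra (Python sketch; the weight and activation values are arbitrary): composing the two linear layers gives exactly the same output as the single-layer network with the collapsed weights $w_x$ and $w_y$.

```python
# Arbitrary weights, labeled as in the diagram above
w1, w2, w3, w4 = 0.5, -1.0, 2.0, 0.3   # input-to-hidden weights
w5, w6 = 1.5, -0.7                     # hidden-to-output weights
a1, a2 = 0.8, -0.2                     # arbitrary input activations

# Two linear layers
a3 = a1 * w1 + a2 * w3
a4 = a1 * w2 + a2 * w4
a5_two_layers = a3 * w5 + a4 * w6

# Equivalent single layer with collapsed weights
wx = w1 * w5 + w2 * w6
wy = w3 * w5 + w4 * w6
a5_one_layer = a1 * wx + a2 * wy

print(abs(a5_two_layers - a5_one_layer) < 1e-12)  # True
```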

What if we use thresholded units?

$\text{net}_j = \sum_i a_i w_{ij}$; if $\text{net}_j > \text{thresh}$, then $a_j = 1$, else $a_j = 0$. [Figure: the same two-layer network, with inputs 1 and 2 connected to hidden units 3 and 4 by weights $w_1$–$w_4$, and hidden units connected to output unit 5 by $w_5$ and $w_6$.]

$\text{net}_j = \sum_i a_i w_{ij}$; if $\text{net}_j > 9.9$, then $a_j = 1$, else $a_j = 0$. [Figure: the same network with hand-set weights: both inputs connect to unit 3 with weight 10 and to unit 4 with weight 5; unit 3 connects to output unit 5 with weight 10 and unit 4 with weight -10. A table lists the activations of units 3 and 4 for the four input patterns 00, 01, 10, and 11.]
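These hand-set weights can be checked directly. The sketch below (Python; it assumes the weight assignment described above, with weights of 10 into unit 3, 5 into unit 4, output weights of +10 and -10, and a threshold of 9.9 at every unit) produces an XOR-style output: the network fires for 01 and 10 but not for 00 or 11.

```python
def step(net, thresh=9.9):
    """Thresholded unit: a_j = 1 if net_j > thresh, else 0."""
    return 1 if net > thresh else 0

def network(a1, a2):
    a3 = step(10 * a1 + 10 * a2)    # on if at least one input is on (OR-like)
    a4 = step(5 * a1 + 5 * a2)      # on only if both inputs are on (AND-like)
    return step(10 * a3 - 10 * a4)  # on when a3 is on and a4 is off

for a1, a2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a1, a2, "->", network(a1, a2))   # 0, 1, 1, 0
```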

[Figure: the four input patterns 00, 01, 10, and 11 plotted in the input space and again in the hidden space defined by the activations of units 3 and 4; the hidden layer re-represents the inputs so that the patterns become linearly separable.]

So with thresholded units and a hidden layer, solutions exist… …and solutions can be viewed as “re-representing” the inputs, so as to make the mapping to the output unit learnable. BUT, how can we learn the correct weights instead of just setting them by hand?

But what if $a_j$ is not simply the net input? Start from the simple delta rule. With a linear unit, $a_j = \sum_i a_i w_{ij}$ and $E = \tfrac{1}{2}\sum_j (t_j - a_j)^2$, so $\frac{dE}{dw_{ij}} = \frac{dE}{da_j}\frac{da_j}{dw_{ij}} = -(t_j - a_j)\,a_i$, giving the update $\Delta w_{ij} = \alpha\,(t_j - a_j)\,a_i$. Now suppose instead that $\text{net}_j = \sum_i a_i w_{ij}$ and $a_j = f(\text{net}_j)$. Then $\frac{dE}{dw_{ij}} = \frac{dE}{da_j}\frac{da_j}{d\text{net}_j}\frac{d\text{net}_j}{dw_{ij}}$. …What function should we use for $a_j$?
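For concreteness, one simple delta-rule update for a linear unit, before any nonlinearity is introduced (Python sketch; the inputs, target, and learning rate are illustrative):

```python
def delta_rule_update(weights, inputs, target, alpha=0.1):
    """Simple delta rule for a linear unit: dw_i = alpha * (t - a) * a_i."""
    a = sum(w * x for w, x in zip(weights, inputs))      # a_j = sum_i a_i * w_ij
    return [w + alpha * (target - a) * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
weights = delta_rule_update(weights, inputs=[1.0, 0.5], target=1.0)
print(weights)  # a was 0 and the error 1, so the weights move to [0.1, 0.05]
```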

Can’t use a threshold function. Why not? [Figure: plot of activation against net input for a threshold unit: the derivative is 0 everywhere except at the threshold, where it is infinite.]

Sigmoid function: $a_j = \dfrac{1}{1 + e^{-\text{net}_j}}$. [Figure: plot of activation (0 to 1) against net input (-10 to 10); the curve is steepest near a net input of 0, where a given change in net input produces the largest change in activation.]

With the sigmoid, $a_j = \dfrac{1}{1 + e^{-\text{net}_j}}$, $\text{net}_j = \sum_i a_i w_{ij}$, and $E = \tfrac{1}{2}\sum_j (t_j - a_j)^2$. The sigmoid’s derivative is $\frac{da_j}{d\text{net}_j} = a_j(1 - a_j)$, so $\frac{dE}{dw_{ij}} = \frac{dE}{da_j}\frac{da_j}{d\text{net}_j}\frac{d\text{net}_j}{dw_{ij}} = -(t_j - a_j)\,a_j(1 - a_j)\,a_i$, and the weight update becomes $\Delta w_{ij} = \alpha\,(t_j - a_j)\,a_j(1 - a_j)\,a_i$ (compare the simple delta rule $\Delta w_{ij} = \alpha\,(t_j - a_j)\,a_i$).
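The same update written out with the sigmoid activation, so the extra $a_j(1-a_j)$ factor is visible (Python sketch; the numeric values are illustrative):

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_delta_update(weights, inputs, target, alpha=0.5):
    """dw_ij = alpha * (t_j - a_j) * a_j * (1 - a_j) * a_i."""
    net = sum(w * x for w, x in zip(weights, inputs))
    a = sigmoid(net)
    delta = (target - a) * a * (1.0 - a)
    return [w + alpha * delta * x for w, x in zip(weights, inputs)]

weights = sigmoid_delta_update([0.0, 0.0], inputs=[1.0, 1.0], target=1.0)
print(weights)  # a = 0.5, delta = 0.125, so each weight becomes 0.0625
```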

For the output unit (unit 5): $\frac{dE}{dw_5} = \frac{dE}{da_5}\frac{da_5}{d\text{net}_5}\frac{d\text{net}_5}{dw_5}$. Define $\delta_5 = \frac{dE}{da_5}\frac{da_5}{d\text{net}_5}$, the error signal stored at unit 5. For a weight one layer back, such as $w_1$ into hidden unit 3: since $\text{net}_j = \sum_i a_i w_{ij}$, we have $\frac{d\text{net}_5}{da_3} = w_5$, so $\frac{dE}{da_3} = \delta_5 w_5$; also $\frac{da_3}{d\text{net}_3} = a_3(1 - a_3)$ and $\frac{d\text{net}_3}{dw_1} = a_1$. Putting these together, $\frac{dE}{dw_1} = \delta_5 \times \frac{d\text{net}_5}{da_3}\frac{da_3}{d\text{net}_3}\frac{d\text{net}_3}{dw_1} = \delta_5\, w_5\, a_3(1 - a_3)\, a_1$. [Figure: the two-layer network with inputs 1 and 2, hidden units 3 and 4 (weights $w_1$–$w_4$), and output unit 5 (weights $w_5$, $w_6$).]

[Figure: a network with input units 1 and 2, hidden units 3 and 4, and output units 5 and 6, with targets supplied at the outputs.] For output units, the delta is computed directly from the error. The delta is stored at each unit and also used directly to adjust each incoming weight. For hidden units there are no targets; the “error” signal is instead the sum of the output units’ deltas, each weighted by the connecting weight. These are used to compute deltas for the hidden units, which are again stored at each unit and used to directly change its incoming weights. Deltas, and hence the error signal at the output, can propagate backward through the network, through many layers, until they reach the input.
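Putting the pieces together, here is a compact sketch of one full backpropagation pass through the small two-input, two-hidden-unit, one-output network used in the slides above (Python/NumPy; the random initial weights, input pattern, target, and learning rate are illustrative choices, not values from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W_in_hid = rng.normal(scale=0.5, size=(2, 2))   # inputs 1,2 -> hidden units 3,4
W_hid_out = rng.normal(scale=0.5, size=(2, 1))  # hidden 3,4 -> output unit 5
alpha = 0.5

x = np.array([1.0, 0.0])   # input activations a_1, a_2
t = np.array([1.0])        # target for the output unit

# Forward pass
a_hid = sigmoid(x @ W_in_hid)        # a_3, a_4
a_out = sigmoid(a_hid @ W_hid_out)   # a_5

# Backward pass: delta at the output, then deltas at the hidden units
delta_out = (t - a_out) * a_out * (1 - a_out)
delta_hid = (delta_out @ W_hid_out.T) * a_hid * (1 - a_hid)

# Each weight changes by alpha * (delta at receiving unit) * (activation of sending unit)
W_hid_out += alpha * np.outer(a_hid, delta_out)
W_in_hid += alpha * np.outer(x, delta_hid)
```

The hidden-unit deltas here are exactly the weight-weighted sums of the output deltas described above, multiplied by each hidden unit’s own $a(1-a)$.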

Alternative error functions.

Sum-squared error: $E = \tfrac{1}{2}(t - a)^2$, so $\frac{dE}{da} = -(t - a)$ and, for a sigmoid unit, $\frac{dE}{d\text{net}} = -(t - a)\,a(1 - a)$. Cross-entropy error: $E = -t \ln a - (1 - t)\ln(1 - a)$, for which, with a sigmoid unit, $\frac{dE}{d\text{net}} = a - t$. [Figure: the same two-layer network.]
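A small numerical comparison of the two gradients at a saturated, badly wrong sigmoid output (Python sketch; the activation and target values are arbitrary). The cross-entropy gradient at the net input stays proportional to the error, while the sum-squared-error gradient is shrunk by the $a(1-a)$ factor, which is one common reason for preferring cross-entropy with sigmoid units.

```python
a, t = 0.98, 0.0   # a saturated output that should have been 0 (arbitrary values)

sse_grad = -(t - a) * a * (1 - a)   # sum-squared error: dE/dnet = -(t - a) * a * (1 - a)
ce_grad = a - t                     # cross-entropy error: dE/dnet = a - t

print(round(sse_grad, 4))  # 0.0192 -- tiny gradient despite the large error
print(ce_grad)             # 0.98   -- gradient proportional to the error
```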

[Figure: the two-layer network again: inputs 1 and 2, hidden units 3 and 4 (weights $w_1$–$w_4$), and output unit 5 (weights $w_5$, $w_6$).]

[Figure: the two-input network (units 1 and 2 feeding unit 3 through $w_1$ and $w_2$) alongside a table with columns Input 1, Input 2, New input, and Output.]