
Previous Lecture
Perceptron: w(t+1) = w(t) + η(t) [d(t) - sign(w(t)·x)] x
Adaline: w(t+1) = w(t) + η(t) [d(t) - f(w(t)·x)] f′ x
Gradient descent method

Multi-Layer Perceptron (MLP, 1)
Purpose: to introduce the MLP and the various techniques developed for training such networks, and to understand MLP learning.

Multi-Layer Perceptron (MLP, 1)
Topics: the XOR problem, the credit assignment problem, and the back-propagation algorithm (one of the central topics of the course).

XOR problem
XOR (exclusive OR): 0 ⊕ 0 = 0, 1 ⊕ 1 = 0 (1 + 1 = 2 = 0 mod 2), 1 ⊕ 0 = 1, 0 ⊕ 1 = 1.
A single-layer perceptron does not work here.
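A quick way to see this failure concretely is to run the perceptron rule from the previous lecture on the XOR truth table. The sketch below is only an illustration, not part of the original slides; the learning rate and epoch count are arbitrary choices.

```python
import numpy as np

# XOR truth table: inputs and desired outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
d = np.array([0, 1, 1, 0])        # try d = [0, 1, 1, 1] (OR) to see the loop converge

w = np.zeros(2)
b = 0.0
eta = 0.1                          # assumed learning rate

for epoch in range(100):
    errors = 0
    for x, target in zip(X, d):
        y = 1 if w @ x + b > 0 else 0        # hard-threshold (sign-like) unit
        w += eta * (target - y) * x          # perceptron rule from the previous lecture
        b += eta * (target - y)
        errors += int(y != target)
    if errors == 0:
        break

print("errors in final epoch:", errors)      # stays > 0 for XOR: not linearly separable
```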

Credit assignment problem
Minsky & Papert (1969) offered a solution to the XOR problem by combining the perceptron unit responses using a second layer of units.

Credit assignment problem

This is a linearly separable problem!

For the four points { (-1,1), (-1,-1), (1,1), (1,-1) }, the problem is always linearly separable if we want to have three points in one class.

Four-layer networks
[Figure: inputs x1, x2, …, xn feeding hidden layers and an output layer]

Properties of architecture
No connections within a layer
No direct connections between input and output layers
Fully connected between adjacent layers
Often more than 3 layers
The number of output units need not equal the number of input units
The number of hidden units per layer can be more or fewer than the number of input or output units
Each unit is a perceptron
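A minimal sketch of what these properties mean in code, with arbitrary, illustrative layer sizes: each pair of adjacent layers gets one full weight matrix, and nothing else.

```python
import numpy as np

n_input, n_hidden, n_output = 4, 6, 2        # hidden size need not match input/output size

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(n_hidden, n_input))    # input -> hidden connections
W = rng.normal(scale=0.1, size=(n_output, n_hidden))   # hidden -> output connections
# There is no input-to-output matrix (no direct skip connections)
# and no matrix within a layer (no intra-layer connections).
```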

[Figure: a small network with hidden units 1 and 2 feeding output unit 3]

But how are the weights for units 1 and 2 found when the error is computed only for output unit 3? There is no direct error signal for units 1 and 2!
Credit assignment problem: the problem of assigning 'credit' or 'blame' to the individual elements (the hidden units) involved in forming the overall response of a learning system. In neural networks, the problem is deciding which weights should be altered, by how much, and in which direction.

Backpropagation learning algorithm ('BP')
A solution to the credit assignment problem in the MLP: Rumelhart, Hinton and Williams (1986).
BP has two phases:
Forward pass phase: computes the 'functional signal', the feedforward propagation of the input pattern signals through the network.
Backward pass phase: computes the 'error signal', the propagation of the error (the difference between the actual and desired output values) backwards through the network, starting at the output units.

[Figure: the chain I → y → O, with weight w(t) from the input to the hidden unit and W(t) from the hidden unit to the output]
I'll work out the trivial case; the general case is similar.
Task: given data {I, d}, minimize the error function at the output unit
E = (d - o)² / 2 = [d - f(W(t) y(t))]² / 2 = [d - f(W(t) f(w(t) I))]² / 2
where y = f(w(t) I) is the output of the hidden unit. The weights at time t are w(t) and W(t); we intend to find the weights w and W at time t+1.

Forward pass phase
Suppose that we have w(t) and W(t) at time t. For a given input I, we can calculate
y = f(w(t) I) and o = f(W(t) y) = f(W(t) f(w(t) I)).
The error function at the output unit will be E = (d - o)² / 2.
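A minimal numeric sketch of this forward pass for the trivial chain, assuming a sigmoid for f (the slides only require a differentiable f) and arbitrary example values:

```python
import numpy as np

def f(a):
    return 1.0 / (1.0 + np.exp(-a))   # sigmoid, one possible choice of f

I, d = 0.5, 1.0                       # example input and desired output
w, W = 0.3, -0.2                      # w(t), W(t): current weights (arbitrary values)

y = f(w * I)                          # hidden unit output
o = f(W * y)                          # network output
E = 0.5 * (d - o) ** 2                # error at the output unit
```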

Backward pass phase
Output weight update: W(t+1) = W(t) + η(t) Δ y, where Δ = (d - o) f′(W(t) y)

Hidden weight update: w(t+1) = w(t) + η(t) δ I, where δ = Δ W(t) f′(w(t) I)
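The same trivial case in code, forward pass plus the backward-pass updates above; the sigmoid and its derivative f′(a) = f(a)(1 − f(a)), the learning rate, and the starting weights are all assumptions of the sketch.

```python
import numpy as np

f = lambda a: 1.0 / (1.0 + np.exp(-a))        # same sigmoid as in the forward-pass sketch
f_prime = lambda a: f(a) * (1.0 - f(a))       # its derivative

I, d = 0.5, 1.0                               # example input and desired output
w, W = 0.3, -0.2                              # w(t), W(t)
eta = 0.5                                     # assumed learning rate

y = f(w * I)                                  # forward pass
o = f(W * y)

Delta = (d - o) * f_prime(W * y)              # error signal at the output unit
delta = Delta * W * f_prime(w * I)            # error signal at the hidden unit

W = W + eta * Delta * y                       # W(t+1)
w = w + eta * delta * I                       # w(t+1)
```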

Work out the learning rule by yourself for the three-layer case.

We will concentrate on the three-layer case, but this easily generalizes to the general case.
I: inputs; O: outputs; w: connections between input and hidden units; W: connections between hidden units and outputs; y: activity of the hidden units.
y_i(t) = f( Σ_j w_ij(t) I_j(t) ) = f( net_i(t) )   at time t
O_i(t) = f( Σ_j W_ij(t) y_j(t) ) = f( Net_i(t) )   at time t
net(t) = network input to the unit at time t

Forward pass
Weights are fixed during the forward and backward pass at time t.
1. Compute values for the hidden units: y_j(t) = f( Σ_i w_ji(t) I_i(t) )
2. Compute values for the output units: O_k(t) = f( Σ_j W_kj(t) y_j(t) )
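In vector form, steps 1 and 2 are two matrix-vector products, each followed by the elementwise activation. The sketch below is illustrative; the activation choice and the layer sizes are assumptions.

```python
import numpy as np

f = lambda a: 1.0 / (1.0 + np.exp(-a))         # assumed activation

def forward(I, w, W):
    """Forward pass: the weights are held fixed while the signals propagate."""
    net = w @ I                                # net_j(t) = sum_i w_ji(t) I_i(t)
    y = f(net)                                 # 1. hidden-unit values
    Net = W @ y                                # Net_k(t) = sum_j W_kj(t) y_j(t)
    O = f(Net)                                 # 2. output-unit values
    return net, y, Net, O

# example with 4 inputs, 6 hidden units, 2 outputs
rng = np.random.default_rng(0)
net, y, Net, O = forward(rng.random(4), rng.normal(size=(6, 4)), rng.normal(size=(2, 6)))
```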

Backward pass
Recall the delta rule (Lecture 5); the error measure for pattern n is
E(t) = 1/2 Σ_k ( d_k(t) - O_k(t) )²
We want to know how to modify the weights in order to decrease E, i.e. to use
Δw_ij(t) = w_ij(t+1) - w_ij(t) = -η ∂E / ∂w_ij(t)
both for hidden units and output units. This can be rewritten as the product of two terms using the chain rule:
∂E / ∂w_ij(t) = ( ∂E / ∂net_i(t) ) ( ∂net_i(t) / ∂w_ij(t) )

Term A: how the error for the pattern changes as a function of a change in the network input to unit i.
Term B: how the net input to unit i changes as a function of a change in weight w_ij.
This holds both for hidden units and output units.

Term A: let δ_i(t) = -∂E / ∂net_i(t) for hidden units and Δ_i(t) = -∂E / ∂Net_i(t) for output units.
Term B: ∂net_i(t) / ∂w_ij(t) = I_j(t) and ∂Net_i(t) / ∂W_ij(t) = y_j(t).

Combining A and B gives
∂E / ∂w_ij(t) = -δ_i(t) I_j(t)  and  ∂E / ∂W_ij(t) = -Δ_i(t) y_j(t)
So to achieve gradient descent in E we should change the weights by
w_ij(t+1) - w_ij(t) = η δ_i(t) I_j(t)
W_ij(t+1) - W_ij(t) = η Δ_i(t) y_j(t)
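As code, each update is an outer product of an error signal with the activities feeding that layer of weights, scaled by the learning rate; `delta`, `Delta`, and the default η below are illustrative names and values, not part of the original slides.

```python
import numpy as np

def update_weights(w, W, I, y, delta, Delta, eta=0.1):
    """w_ij(t+1) = w_ij(t) + eta*delta_i*I_j ;  W_ij(t+1) = W_ij(t) + eta*Delta_i*y_j."""
    return w + eta * np.outer(delta, I), W + eta * np.outer(Delta, y)
```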

Now we need to find δ_i(t) and Δ_i(t) for each unit in the network; a simple recursive method is used to compute them for each unit.
For an output unit:
Term 1 = ( d_i(t) - O_i(t) )
Term 2 = f′( Net_i(t) ), since O_i(t) = f( Net_i(t) )
Combining Term 1 and Term 2 gives Δ_i(t) = ( d_i(t) - O_i(t) ) f′( Net_i(t) )
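For the output units this is a one-line computation once the forward-pass quantities are stored; the helper below is hypothetical, and f_prime is passed in rather than assumed.

```python
def output_delta(d, O, Net, f_prime):
    """Delta_i(t) = (d_i(t) - O_i(t)) * f'(Net_i(t))."""
    return (d - O) * f_prime(Net)
```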

For a hidden unit: Term 1
E(t) = 1/2 Σ_k ( d_k(t) - O_k(t) )²
     = 1/2 Σ_k [ d_k(t) - f( Net_k(t) ) ]²
     = 1/2 Σ_k [ d_k(t) - f( Σ_i W_ki(t) y_i(t) ) ]²
     = 1/2 Σ_k [ d_k(t) - f( Σ_i W_ki(t) f( net_i(t) ) ) ]²

For a hidden unit:
Term 1 = Σ_k Δ_k(t) W_ki(t)   (differentiating E(t) with respect to net_i(t))
Term 2 = f′( net_i(t) ), as for the output unit.
Combining Term 1 and Term 2 gives δ_i(t) = f′( net_i(t) ) Σ_k Δ_k(t) W_ki(t)
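For the hidden units the error arrives through the outgoing weights; in the hypothetical helper below, `W.T @ Delta` implements the sum over k of Δ_k(t) W_ki(t).

```python
def hidden_delta(Delta, W, net, f_prime):
    """delta_i(t) = f'(net_i(t)) * sum_k Delta_k(t) * W_ki(t)."""
    return f_prime(net) * (W.T @ Delta)
```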

Backward pass
The weights here can be viewed as providing a degree of 'credit' or 'blame' to the hidden units.

Summary
Weight updates are local:
Output unit: W_ij(t+1) = W_ij(t) + η Δ_i(t) y_j(t) = W_ij(t) + η ( d_i(t) - O_i(t) ) f′( Net_i(t) ) y_j(t)
Hidden unit: w_ij(t+1) = w_ij(t) + η δ_i(t) I_j(t) = w_ij(t) + η f′( net_i(t) ) [ Σ_k Δ_k(t) W_ki(t) ] I_j(t)

Once the weight changes are computed for all units, the weights are updated at the same time (the bias is included as a weight here). We now compute the derivative of the activation function.
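Putting the pieces together, here is a minimal end-to-end sketch that trains a small network of this kind on XOR. The hidden-layer size, learning rate, random seed, and sigmoid activation are all assumptions; the bias is folded in as an extra constant-1 input, as the slide suggests.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda a: 1.0 / (1.0 + np.exp(-a))
fp = lambda a: f(a) * (1.0 - f(a))            # f'

# XOR patterns with a constant 1 appended, so biases are just extra weights
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([0.0, 1.0, 1.0, 0.0])

n_hidden, eta = 4, 0.5                        # assumed hidden size and learning rate
w = rng.normal(scale=0.5, size=(n_hidden, 3))         # input(+bias) -> hidden
W = rng.normal(scale=0.5, size=(1, n_hidden + 1))     # hidden(+bias) -> output

for epoch in range(5000):
    for I, d in zip(X, D):
        net = w @ I                           # forward pass
        y = np.append(f(net), 1.0)            # hidden activities plus constant 1 (output bias)
        Net = W @ y
        O = f(Net)
        Delta = (d - O) * fp(Net)             # backward pass: output error signal
        delta = fp(net) * (W[:, :-1].T @ Delta)   # hidden error signal (bias column excluded)
        W = W + eta * np.outer(Delta, y)      # all weights updated at the same time
        w = w + eta * np.outer(delta, I)

for I in X:
    print(I[:2], f(W @ np.append(f(w @ I), 1.0))[0])  # outputs should approach 0, 1, 1, 0
```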