Presentation transcript:

Topic 3. Learning Rules of the Artificial Neural Networks.

Multilayer Perceptron. The first layer is the input layer, and the last layer is the output layer. All other layers with no direct connections from or to the outside are called hidden layers.

Multilayer Perceptron. The input is processed and relayed from one layer to the next, until the final result has been computed. This process represents the feedforward scheme.
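The feedforward scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logistic activation, the absence of bias terms, the function names `sigmoid` and `forward`, and the weight values are all hypothetical choices, not taken from the slides.

```python
import math

def sigmoid(s):
    """Logistic activation; an assumed choice for this sketch."""
    return 1.0 / (1.0 + math.exp(-s))

def forward(layers, inputs):
    """Relay the input from one layer to the next.

    layers[k][h][t] is the weight from unit t of the previous layer
    to unit h of the current layer. Returns the outputs of every
    layer, with the input layer first and the output layer last.
    """
    outputs = [list(inputs)]
    for weights in layers:
        prev = outputs[-1]
        outputs.append([sigmoid(sum(w * x for w, x in zip(row, prev)))
                        for row in weights])
    return outputs

# Hypothetical 2-2-1 network with arbitrary illustrative weights.
layers = [
    [[0.5, -0.5], [0.3, 0.8]],   # hidden layer: 2 units, 2 inputs each
    [[1.0, -1.0]],               # output layer: 1 unit, 2 inputs
]
activations = forward(layers, [1.0, 0.0])
print(activations[-1])           # the final result computed by the output layer
```

Keeping the outputs of every layer, rather than only the last one, is a design choice that becomes useful once the error has to be propagated backwards.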

Multilayer Perceptron. The structural credit assignment problem: when an error is made at the output of a network, how is credit (or blame) to be assigned to neurons deep within the network? One of the most popular techniques for training the hidden neurons is error backpropagation, whereby the error of the output units is propagated back to yield estimates of how much a given hidden unit contributed to the output error.

Multilayer Perceptron. The error function of the multilayer perceptron: the best performance of the network corresponds to the minimum of the total squared error E, and during network training we adjust the weights of connections in order to reach that minimum.
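As a rough illustration of the total squared error, the sketch below sums the squared differences between desired and actual outputs over all training patterns and output units. The factor 1/2, the name `total_squared_error` and the sample numbers are assumptions made for this example.

```python
def total_squared_error(targets, outputs):
    """E = 1/2 * sum over patterns p and output units j of (d_jp - X_jp)^2.

    The 1/2 factor is an assumed convention; it only rescales E.
    """
    return 0.5 * sum((d - x) ** 2
                     for d_p, x_p in zip(targets, outputs)
                     for d, x in zip(d_p, x_p))

# Two hypothetical training patterns, two output units each.
targets = [[1.0, 0.0], [0.0, 1.0]]   # desired outputs d_jp
outputs = [[0.8, 0.1], [0.3, 0.6]]   # network outputs X_jp
E = total_squared_error(targets, outputs)
print(E)
```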

Multilayer Perceptron. A combination of the weights, including those of the hidden neurons, which minimises the error function E is considered to be a solution of the multilayer perceptron learning problem.

Multilayer Perceptron. The error function of multilayer perceptron: The backpropagation algorithm looks for the minimum of the multi-variable error function E in the space of weights of connections w using the method of gradient descent.

Multilayer Perceptron. Following calculus, at a local minimum of a function of two or more variables the gradient is equal to zero: $\partial E / \partial w^{(k)}_{ht} = 0$ for every weight, where $\partial E / \partial w^{(k)}_{ht}$ is the partial derivative of the error function E with respect to the weight of the connection between the h-th unit in layer k and the t-th unit in the previous layer k-1.

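Multilayer Perceptron. We would like to go in the direction opposite to the gradient of E to most rapidly minimise E. Therefore, during the iterative process of gradient descent each weight of connection, including the hidden ones, is updated as $w^{(k)}_{ht} \leftarrow w^{(k)}_{ht} + \Delta w^{(k)}_{ht}$, using the increment $\Delta w^{(k)}_{ht} = -C \, \partial E / \partial w^{(k)}_{ht}$; here C represents the learning rate.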
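A minimal sketch of the gradient descent iteration, applied to a toy error surface rather than to a network. The quadratic function, the helper names `gradient_descent` and `toy_grad`, and the values of C and the step count are assumptions chosen only to make the update rule visible.

```python
def gradient_descent(grad, w, C=0.1, steps=100):
    """Repeatedly move every weight against the gradient: w <- w - C * dE/dw."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - C * gi for wi, gi in zip(w, g)]
    return w

def toy_grad(w):
    """Gradient of the toy surface E(w) = (w0 - 1)^2 + (w1 + 2)^2."""
    return [2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)]

# The toy surface has its single minimum at w = (1, -2).
w_min = gradient_descent(toy_grad, [0.0, 0.0])
print(w_min)
```

With a learning rate that is too large the iteration can overshoot the minimum, which is why C is usually kept small.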

Multilayer Perceptron. Since calculus-based methods of minimisation rest on the taking of derivatives, their application to network training requires the error function E to be a differentiable function, which requires the network output Xjp to be differentiable, which in turn requires the activation functions f(S) to be differentiable. This provides a powerful motivation for using continuous and differentiable activation functions.

Multilayer Perceptron. Since calculus-based methods of minimisation rest on the taking of derivatives, their application to network training requires the activation functions f(S) to be differentiable. To make a multilayer perceptron able to learn, a useful choice is the generic sigmoid activation function associated with a hidden or output neuron. An important property of the generic sigmoid function is that it is differentiable, with a very simple and easy-to-compute derivative.

Multilayer Perceptron. If all activation functions f(S) in the network are differentiable then, according to the chain rule of calculus, we can express the partial derivative of the error function E with respect to the weight of the connection under consideration.
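To illustrate why a sigmoid is convenient, the sketch below assumes the standard logistic form f(S) = 1 / (1 + exp(-S)), whose derivative is f(S)(1 - f(S)). The exact "generic" form used on the slide is not preserved, so treat this particular formula as an assumption. The derivative is checked against a finite difference.

```python
import math

def sigmoid(s):
    """Logistic sigmoid f(S) = 1 / (1 + exp(-S)) (assumed form)."""
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_prime(s):
    """Derivative expressed through the function value: f'(S) = f(S) * (1 - f(S))."""
    f = sigmoid(s)
    return f * (1.0 - f)

# Compare the closed-form derivative with a central finite difference.
s, h = 0.7, 1e-6
numeric = (sigmoid(s + h) - sigmoid(s - h)) / (2 * h)
print(sigmoid_prime(s), numeric)
```

Because the derivative reuses the already computed output f(S), it costs almost nothing extra during the forward pass.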

Multilayer Perceptron. Then, applying the chain rule layer by layer, the correction to the hidden weight of the connection between the h-th unit in the k-th layer and the t-th unit in the previous (k-1)-th layer can be found.

Multilayer Perceptron. Learning rule. The correction is defined by the output layer errors ejp, the derivatives of the activation functions of all neurons in the upper layers with numbers p > k, the derivative of the activation function of the neuron h itself in layer k, and the activation function of the connected neuron t in the previous layer (k-1).

Multilayer Perceptron. Learning rule. We can easily measure the output errors of the network, and it is up to us to define all the activation functions. If we also know the derivatives of the activation functions, then we can easily find all the corrections to the weights of connections of all neurons in the network, including the hidden ones, during the second run back through the network.
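One common way to write the learning rule in symbols is sketched below for a single training pattern. It assumes a total squared error with a factor 1/2 and an error defined as desired minus actual output; the symbols $d_{jp}$ (desired output), $\delta$ (local error signal), $S$ (net input of a unit) and $K$ (number of the output layer) are notation introduced here and may differ from the original slides.

```latex
\begin{align}
E &= \tfrac{1}{2} \sum_{p} \sum_{j} e_{jp}^{2},
  \qquad e_{jp} = d_{jp} - X_{jp} \\
\delta^{(K)}_{j} &= e_{jp}\, f'\!\big(S^{(K)}_{j}\big)
  && \text{(output layer)} \\
\delta^{(k)}_{h} &= f'\!\big(S^{(k)}_{h}\big) \sum_{j} \delta^{(k+1)}_{j}\, w^{(k+1)}_{jh}
  && \text{(hidden layer, } k < K\text{)} \\
\Delta w^{(k)}_{ht} &= -C\, \frac{\partial E}{\partial w^{(k)}_{ht}}
  = C\, \delta^{(k)}_{h}\, X^{(k-1)}_{t}
\end{align}
```

Unrolling the recursion for $\delta^{(k)}_{h}$ gives exactly the ingredients listed in the text: the output errors, the derivatives of the activation functions in all upper layers, the derivative for neuron h itself, and the output of the connected neuron t in the previous layer.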

Multilayer Perceptron Training. The training process of the multilayer perceptron consists of two phases. Initial values of the weights of connections are set randomly. Then, during the first, feedforward phase, starting from the input layer and proceeding layer by layer, the outputs of every unit in the network are computed together with the corresponding derivatives. In the second, feedback phase, corrections to all weights of connections of all units, including the hidden ones, are computed using the outputs and derivatives computed during the feedforward phase. Figure: Directions of two basic signal flows in multilayer perceptron: forward propagation of function signals and back-propagation of error signals.

Multilayer Perceptron Training. To understand the second, error back-propagation phase of computing corrections to the weights, let us follow an example of a small three-layer perceptron. Suppose that we have found all outputs and the corresponding derivatives of the activation functions of all computing units in the network, including the hidden ones.

[Figure: three-layer perceptron with an input layer, a hidden layer (layer number 1) and an output layer (layer number 2); units are numbered within each layer.]

Multilayer Perceptron Training. In the formulas, values of the layer under consideration and values of the layer previous to it are marked distinctly.

Multilayer Perceptron Training. The weight $w^{(2)}_{10}$ of the connection between unit number 1 (first lower index) in the output layer (layer number 2, shown as the upper index) and unit number 0 (second lower index) in the previous layer (number 1 = 2 - 1) receives, after presentation of a training pattern, a correction $\Delta w^{(2)}_{10}$ determined by the output error of unit 1, the derivative of its activation function, and the output of unit 0 in the hidden layer.

Multilayer Perceptron Training. Analogously, the corrections to all six weights of connections between the output layer and the hidden layer are obtained.
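The output-layer corrections can be sketched as follows. The layer sizes (two output units, three hidden-layer units with unit 0 treated as a constant bias unit, giving six weights), the logistic activation, the error defined as desired minus actual output, and all numeric values are assumptions made for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

C = 0.5                          # learning rate (assumed value)
hidden_out = [1.0, 0.6, 0.4]     # X^(1)_t; unit 0 assumed to be a constant bias unit
w2 = [[0.1, 0.2, -0.3],          # w^(2)_{jt}: 2 output units x 3 hidden-layer units
      [-0.2, 0.4, 0.1]]
targets = [1.0, 0.0]             # desired outputs for this training pattern

# Feedforward through the output layer.
net = [sum(w * x for w, x in zip(row, hidden_out)) for row in w2]
out = [sigmoid(s) for s in net]

# delta_j = e_j * f'(S_j), with f' = f * (1 - f) for the logistic sigmoid.
delta2 = [(d - y) * y * (1.0 - y) for d, y in zip(targets, out)]

# Delta w^(2)_{jt} = C * delta_j * X^(1)_t: six corrections in total.
dw2 = [[C * dj * x for x in hidden_out] for dj in delta2]
for row in dw2:
    print(row)
```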

Multilayer Perceptron Training. Corrections to hidden units connections. In the formulas, values of the layer under consideration, values of the layer previous to it, and values of the layers above it are marked distinctly.

Multilayer Perceptron Training. Corrections to hidden units connections. The weight $w^{(1)}_{10}$ of the connection between unit number 1 (first lower index) in the hidden layer (layer number 1, shown as the upper index) and unit number 0 (second lower index) in the previous, input layer receives a correction $\Delta w^{(1)}_{10}$ that depends on the errors and derivatives of the output layer above, the derivative of the activation function of hidden unit 1, and the value of input unit 0.

Multilayer Perceptron Training. Corrections to hidden units connections. Analogously, the corrections to all six weights of connections between the hidden layer and the input layer are obtained.
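A sketch of the hidden-layer corrections, continuing the same kind of small example. It assumes two computing hidden units, three input-layer units (unit 0 treated as a constant bias unit, giving six weights), logistic activations, and output-layer error signals that have already been computed; the function name `hidden_corrections` and all numbers are hypothetical.

```python
def hidden_corrections(inputs, hidden_out, delta_above, w_above, C):
    """Corrections for the weights feeding the hidden layer.

    delta_above[j] is the error signal of unit j in the layer above,
    w_above[j][h] the weight from hidden unit h to that unit.
    Returns (delta_hidden, corrections).
    """
    delta_hidden = []
    for h, x in enumerate(hidden_out):
        # Error propagated back from the layer above through its weights.
        back = sum(delta_above[j] * w_above[j][h] for j in range(len(delta_above)))
        # Multiply by the derivative of the hidden unit's own activation.
        delta_hidden.append(x * (1.0 - x) * back)
    corrections = [[C * dh * a for a in inputs] for dh in delta_hidden]
    return delta_hidden, corrections

C = 0.5                          # learning rate (assumed value)
inputs = [1.0, 0.2, 0.9]         # X^(0)_t; unit 0 assumed to be a constant bias unit
hidden_out = [0.6, 0.4]          # outputs of the two computing hidden units
delta2 = [0.05, -0.03]           # output-layer error signals (assumed given)
w2 = [[0.2, -0.3],               # w^(2)_{jh}: output unit j <- hidden unit h
      [0.4, 0.1]]

delta1, dw1 = hidden_corrections(inputs, hidden_out, delta2, w2, C)
print(delta1)
print(dw1)
```

Note that the backward pass uses the old output-layer weights, so the corrections for all layers should be computed before any weight is changed.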

Multilayer Perceptron Training. Corrections to hidden units connections. In this way, going backwards through the network, one obtains the corrections to all weights and then updates the weights. After that, with the new weights, one goes forward to get new outputs, finds the new error, goes backwards, and so on. Hopefully, sooner or later the iterative procedure will arrive at the output with the minimum error, i.e. the absolute minimum of the error function E.

Multilayer Perceptron Training. Unfortunately, as a function of many variables, the error function might have more than one minimum, and one may arrive not at the absolute minimum but at a relative one. If this happens, the error function stops decreasing regardless of the number of iterations. Some measures must be taken to get out of the relative minimum, for example adding small random values, i.e. “noise”, to one or more of the weights. The iterative procedure then restarts from that new point in the hope of eventually reaching the absolute minimum.
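The whole iterative procedure, including the noise-based escape from a suspected relative minimum, might be sketched as below. Everything specific here is an assumption: the 2-2-1 architecture with bias units, the logistic activation, the XOR training set, batch accumulation of corrections over all patterns before each update, and the values of `C`, `noise`, `tol` and the stopping threshold. The sketch does not guarantee that the absolute minimum is reached.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(w1, w2, x):
    """Feedforward phase; unit 0 of each layer is an assumed constant bias unit."""
    xb = [1.0] + list(x)
    hidden = [1.0] + [sigmoid(sum(w * a for w, a in zip(row, xb))) for row in w1]
    out = sigmoid(sum(w * a for w, a in zip(w2, hidden)))
    return xb, hidden, out

def total_error(w1, w2, data):
    return 0.5 * sum((d - forward(w1, w2, x)[2]) ** 2 for x, d in data)

def train(data, n_hidden=2, C=0.5, epochs=2000, noise=0.1, tol=1e-7, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Initial weights are set randomly.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = [total_error(w1, w2, data)]
    for _ in range(epochs):
        dw1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        dw2 = [0.0] * (n_hidden + 1)
        for x, d in data:
            xb, hidden, out = forward(w1, w2, x)          # go forward
            delta_out = (d - out) * out * (1.0 - out)     # output error signal
            for h in range(n_hidden):                     # go backwards
                a = hidden[h + 1]
                delta_hid = a * (1.0 - a) * delta_out * w2[h + 1]
                for t in range(n_in + 1):
                    dw1[h][t] += C * delta_hid * xb[t]
            for h in range(n_hidden + 1):
                dw2[h] += C * delta_out * hidden[h]
        # Update all weights only after all corrections are known.
        w1 = [[w + dw for w, dw in zip(row, drow)] for row, drow in zip(w1, dw1)]
        w2 = [w + dw for w, dw in zip(w2, dw2)]
        history.append(total_error(w1, w2, data))
        stalled = abs(history[-2] - history[-1]) < tol
        if stalled and history[-1] > 0.01:
            # Relative minimum suspected: add small random "noise" to the weights.
            w1 = [[w + rng.gauss(0.0, noise) for w in row] for row in w1]
            w2 = [w + rng.gauss(0.0, noise) for w in w2]
    return w1, w2, history

xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
w1, w2, history = train(xor_data)
print(history[0], min(history))
```

Recording the error after every pass makes it easy to see whether training is still progressing or has stalled.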

Multilayer Perceptron Training. Finally, after successful training, the perceptron is able to produce the desired responses to all input patterns of the training set. Then all the network weights of connections are fixed, and the network is presented with inputs it must “recognise”, i.e. inputs that are not in the training set. If an input under consideration produces an output similar to that of one of the training set inputs, it is said to belong to the same type or cluster of inputs as the corresponding training input. If the network produces an output that is not similar to any output of the training set, the input is said not to have been recognised.
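The recognition step can be sketched as a nearest-output comparison. What counts as "similar" is not defined in the text, so the Euclidean distance, the `tolerance` threshold, the function name `recognise` and the class labels are all hypothetical choices.

```python
def recognise(output, training_outputs, tolerance=0.2):
    """Return the label of the closest training output, or None if none is similar.

    `tolerance` is an assumed threshold on the Euclidean distance.
    """
    best_label, best_dist = None, float("inf")
    for label, ref in training_outputs.items():
        dist = sum((o - r) ** 2 for o, r in zip(output, ref)) ** 0.5
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= tolerance else None

# Outputs the trained network produces for the training set (illustrative values).
training_outputs = {"class A": [1.0, 0.0], "class B": [0.0, 1.0]}

print(recognise([0.9, 0.1], training_outputs))   # similar to the class A output
print(recognise([0.5, 0.5], training_outputs))   # not similar to any training output
```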

Multilayer Perceptron Training. Conclusion. In 1969 Minsky and Papert not only found the solution to the XOR problem in the form of a multilayer perceptron, they also gave a very thorough mathematical analysis of the time it takes to train such networks. They emphasized that training times increase very rapidly for certain problems as the number of input lines and weights of connections increases. The difficulties were seized upon by opponents of the subject. In particular, this was true of those working in the field of artificial intelligence (AI), who at that time did not want to concern themselves with the underlying “wetware” of the brain, but only with the functional aspects, regarded by them solely as logical processing.

Multilayer Perceptron Training. Conclusion. Due to the limitations of funding, competition between the AI and neural network communities could have only one victor. Neural networks then went into relative quietude for more than fifteen years, with only a few devotees still working on them. Then new vigour came from various sources. One was the increasing power of computers, allowing simulations of otherwise intractable problems. Finally, established by the mid-1980s, the backpropagation algorithm solved the difficulty of training hidden neurons. Nowadays, the perceptron is an effective tool for recognising protein and amino-acid sequences and processing other complex biological data.