NEURAL NETWORKS Perceptron


NEURAL NETWORKS Perceptron
Dear students, today's topic is the perceptron. The perceptron is the simplest form of a neural network: it consists of a single neuron with adjustable synaptic weights and a hard limiter. The underlying artificial neuron with a hard-limiting activation function was proposed by McCulloch and Pitts in 1943, and Rosenblatt's perceptron is built on that model. The term multilayer perceptron is nowadays often used as a synonym for a multilayer feedforward neural network, but in this section the word perceptron refers to the original single-neuron model. We will talk about the capabilities of the perceptron, and the learning algorithm known as the perceptron learning rule will be explained and illustrated with examples.
PROF. DR. YUSUF OYSAL

NEURAL NETWORKS – Perceptron
The Perceptron
[Figure: inputs x1 and x2 with weights w1 and w2 feed a linear combiner (summation node), followed by a hard limiter with threshold θ, producing the output Y.]
The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter (a step or sign function). The weighted sum of the inputs is applied to the hard limiter. The neuron computes the weighted sum of the input signals and compares the result with a threshold value θ. If the net input is less than the threshold, the neuron output is -1; if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains the value +1. This type of activation function is called a sign function.
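As a small illustration (a Python sketch, not part of the original slides; the weights, threshold and inputs below are arbitrary example values), the perceptron's forward pass is just a weighted sum followed by the hard limiter:

def perceptron_output(x, w, theta):
    # Linear combiner: weighted sum of the inputs
    net = sum(wi * xi for wi, xi in zip(w, x))
    # Hard limiter (sign function): +1 if the net input reaches the threshold, -1 otherwise
    return 1 if net >= theta else -1

print(perceptron_output([1.0, 0.5], w=[0.4, 0.6], theta=0.2))   # -> 1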

NEURAL NETWORKS – Perceptron
Exercise
A neuron uses a step function as its activation function, has threshold 0.2 and has w1 = 0.1, w2 = 0.1. What is the output for the following values of x1 and x2: (0, 0), (0, 1), (1, 0) and (1, 1)? The output is 0 for the first three input pairs, because the threshold is larger than the weighted sum, so the input to the hard-limit function is negative for these three pairs. For the last input pair (1, 1), the summation reaches the threshold and the output is 1. As you can see, this single-layer perceptron implements the AND operator. Can you find weights that will give an OR function? An XOR function? We will now look at a systematic way to answer these questions.
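A quick way to check the exercise (an illustrative sketch, not from the lecture) is to evaluate the step-function neuron on all four input pairs:

def step_neuron(x1, x2, w1=0.1, w2=0.1, theta=0.2):
    # Step activation: output 1 only if the weighted sum reaches the threshold
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, step_neuron(x1, x2))   # outputs 0, 0, 0, 1 -> the AND function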

NEURAL NETWORKS – Perceptron
Can a single neuron learn a task?
In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron. The perceptron is the simplest form of a neural network; it consists of a single neuron with adjustable synaptic weights and a hard limiter. Two kinds of units and learning rules will come up: Linear Threshold Units (LTUs), trained with the Perceptron Learning Rule, and Linearly Graded Units (LGUs), trained with the Widrow-Hoff learning rule. How does the perceptron learn its classification tasks? This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron. The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to obtain outputs consistent with the training examples.

NEURAL NETWORKS – Perceptron
Perceptron for Classification
We try to find suitable values for the weights in such a way that the training examples are correctly classified. A single perceptron classifies input patterns x into two classes. The input patterns that can be classified by a single perceptron into two distinct classes are called linearly separable patterns. Geometrically, we try to find a hyper-plane that separates the examples of the two classes. In order for the perceptron to perform a desired task, its weights must be properly selected. In general, two basic methods can be employed to select a suitable weight vector. By off-line calculation of weights: if the problem is relatively simple, it is often possible to calculate the weight vector from the specification of the problem. By learning procedure: the weight vector is determined from a given (training) set of input-output vectors (exemplars) in such a way as to achieve the best classification of the training vectors. Once the weights are selected (the encoding process), the perceptron performs its desired task, e.g. pattern classification (the decoding process).

NEURAL NETWORKS – Perceptron
Learning Rules
Linear Threshold Units (LTUs) use the Perceptron Learning Rule; Linearly Graded Units (LGUs) use the Widrow-Hoff learning rule.
[Figure: a single-layer network with inputs x1, x2, ..., xm-1 and a constant bias input xm = 1, weights w11 ... wnm, outputs y1, ..., yn, and desired (goal) outputs d1, ..., dn, trained on a training set.]
The learning rules in general share the goal of making the neural network outputs converge to the desired output values. We have desired input-output pairs that are used for training, and the learning process continues until the network produces outputs very close to the desired outputs, with small errors.
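For concreteness (an assumed representation, not prescribed by the lecture), such a training set of input vectors and desired outputs can be written as a plain list of (inputs, d) pairs; the AND problem from the earlier exercise serves as an example:

# Each training pattern is a pair (input vector, desired output d)
and_training_set = [
    ((0, 0), 0),
    ((0, 1), 0),
    ((1, 0), 0),
    ((1, 1), 1),
]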

NEURAL NETWORKS – Perceptron
Perceptron Geometric View
The equation below describes a (hyper-)plane in the input space consisting of real-valued m-dimensional vectors; the plane splits the input space into two regions, each of them describing one class:
w1x1 + w2x2 + w0 = 0
[Figure: the decision boundary w1x1 + w2x2 + w0 = 0 in the (x1, x2) plane, with the decision region for class C1 on one side and the decision region for class C2 on the other.]
A linear combination of signals and weights for which the augmented activation potential is zero, in other words for which the total signal entering the activation function is zero, describes a decision surface which partitions the input space into two regions.
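As a minimal sketch (with arbitrary example weights w1, w2 and bias w0), classifying a point amounts to checking on which side of this plane it lies:

def classify(x1, x2, w1=1.0, w2=1.0, w0=-1.5):
    # Points on the non-negative side of w1*x1 + w2*x2 + w0 = 0 belong to class C1, the rest to C2
    return "C1" if w1 * x1 + w2 * x2 + w0 >= 0 else "C2"

print(classify(1, 1))   # C1
print(classify(0, 0))   # C2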

NEURAL NETWORKS – Perceptron
Implementing Logic Gates with a Single-Layer Perceptron
In each case we have inputs x and outputs y, and need to determine the weights and thresholds. It is easy to find solutions by inspection. The truth tables are:
NOT: x = 0 gives y = 1; x = 1 gives y = 0.
AND: (0, 0), (0, 1) and (1, 0) give y = 0; (1, 1) gives y = 1.
OR: (0, 0) gives y = 0; (0, 1), (1, 0) and (1, 1) give y = 1.
One possible choice of parameters (with the threshold value added to the weighted sum, and output 1 when the result is non-negative) is: NOT: w = -1, θ = 0.5; AND: w1 = 1, w2 = 1, θ = -1.5; OR: w1 = 1, w2 = 1, θ = -0.5.
We can use McCulloch-Pitts neurons to implement the basic logic gates. All we need to do is find the appropriate connection weights and neuron thresholds to produce the right outputs for each set of inputs. We shall see explicitly how one can construct simple networks that perform NOT, AND, and OR. It is then a well-known result from logic that we can construct any logical function from these three operations. The resulting networks, however, will usually have a much more complex architecture than a simple perceptron. We generally want to avoid decomposing complex problems into simple logic gates, and instead find the weights and thresholds that work directly in a perceptron architecture. For each of the NOT, AND and OR networks there are in fact infinitely many such solutions.
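A short sketch (assuming the convention above, in which the threshold value is added to the weighted sum and the output is 1 when the result is non-negative) confirms these three gates:

def mcp_neuron(inputs, weights, theta):
    # McCulloch-Pitts style unit: fire (output 1) if the weighted sum plus theta is non-negative
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + theta >= 0 else 0

NOT = lambda x: mcp_neuron([x], [-1], 0.5)
AND = lambda a, b: mcp_neuron([a, b], [1, 1], -1.5)
OR = lambda a, b: mcp_neuron([a, b], [1, 1], -0.5)

print([NOT(0), NOT(1)])                                            # [1, 0]
print([AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])    # [0, 0, 0, 1]
print([OR(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])     # [0, 1, 1, 1]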

NEURAL NETWORKS – Perceptron
Limitations of Simple Perceptrons
We can follow the same procedure for the XOR network, whose truth table is: (0, 0) gives 0, (0, 1) gives 1, (1, 0) gives 1, (1, 1) gives 0. Requiring the output to be 1 exactly when the weighted sum reaches the threshold θ gives:
w1·0 + w2·0 − θ < 0, which means θ > 0
w1·0 + w2·1 − θ ≥ 0, which means w2 ≥ θ
w1·1 + w2·0 − θ ≥ 0, which means w1 ≥ θ
w1·1 + w2·1 − θ < 0, which means w1 + w2 < θ
Clearly the second and third inequalities are incompatible with the fourth, so there is in fact no solution: XOR is linearly non-separable. We need more complex networks, e.g. ones that combine together many simple networks, or use different activation/thresholding/transfer functions. It then becomes much more difficult to determine all the weights and thresholds by hand. In the next slides we will see how a neural network can learn these parameters. First, we need to consider what these more complex networks might involve.
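A brute-force check (an illustrative sketch that searches a coarse grid of candidate weights and thresholds) shows that AND is solvable by a single threshold unit while XOR is not:

import itertools

def fits(target, w1, w2, theta):
    # True if a unit that outputs 1 when w1*x1 + w2*x2 >= theta reproduces the whole truth table
    return all((1 if w1 * x1 + w2 * x2 >= theta else 0) == t
               for (x1, x2), t in target.items())

AND_TABLE = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

grid = [k / 4 for k in range(-8, 9)]    # candidate values -2.0, -1.75, ..., 2.0
print(any(fits(AND_TABLE, *p) for p in itertools.product(grid, repeat=3)))   # True
print(any(fits(XOR_TABLE, *p) for p in itertools.product(grid, repeat=3)))   # False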

NEURAL NETWORKS – Perceptron
Decision Hyperplanes and Linear Separability
If we have two inputs, then the weights define a decision boundary that is a one-dimensional straight line in the two-dimensional input space of possible input values. If we have n inputs, the weights define a decision boundary that is an (n-1)-dimensional hyperplane in the n-dimensional input space:
w1x1 + w2x2 + ... + wnxn + θ = 0
This hyperplane is clearly still linear (i.e. straight/flat) and can still only divide the space into two regions. We still need more complex transfer functions, or more complex networks, to deal with XOR-type problems. Problems with input patterns which can be classified using a single hyperplane are said to be linearly separable. Problems (such as XOR) which cannot be classified in this way are said to be non-linearly separable.

NEURAL NETWORKS – Perceptron
Learning and Generalization
A network will also produce outputs for input patterns that it was not originally set up to classify (shown with question marks in the figure). There are two important aspects of the network's operation to consider. Learning: the network must learn decision surfaces from a set of training patterns so that these training patterns are classified correctly. Generalization: after training, the network must also be able to generalize, i.e. correctly classify test patterns it has never seen before. Often we wish the network to learn perfectly, and then generalize well. Sometimes, the training data may contain errors (e.g. noise in the experimental determination of the input values, or incorrect classifications). In this case, learning the training data perfectly may make the generalization worse. There is an important tradeoff between learning and generalization that arises quite generally.

NEURAL NETWORKS – Perceptron
Generalization in Classification
Suppose the task of our network is to learn a classification decision boundary. Our aim is for the network to generalize, i.e. to classify new inputs appropriately. If we know that the training data contains noise, we don't necessarily want the training data to be classified totally accurately, as that is likely to reduce the generalization ability. We can expect the network output to give a better representation of the underlying function if its output curve does not pass through all the data points. Again, allowing a larger error on the training data is likely to lead to better generalization.

NEURAL NETWORKS – Perceptron
Training a Neural Network
Whether our neural network is a simple perceptron or a much more complicated multilayer network with special activation functions, we need to develop a systematic procedure for determining appropriate connection weights. The general procedure is to have the network learn the appropriate weights from a representative set of training data. In all but the simplest cases, however, direct computation of the weights is intractable. Instead, we usually start off with random initial weights and adjust them in small steps until the required outputs are produced. We will now look at a brute-force derivation of such an iterative learning algorithm for simple perceptrons. Then, in later lectures, we shall see how more powerful and general techniques can easily lead to learning algorithms which will work for neural networks of any specification we could possibly dream up.

NEURAL NETWORKS – Perceptron
Perceptron Learning
Wnew = Wold + ΔW, where ΔW = learning rate × error (d − y) × input.
For simple perceptrons performing classification, we have seen that the decision boundaries are hyperplanes, and we can think of learning as the process of shifting the hyperplanes around until each training pattern is classified correctly. Somehow, we need to formalise that process of "shifting around" into a systematic algorithm that can easily be implemented on a computer. The "shifting around" can conveniently be split up into a number of small steps. If the network weights at time t are wij(t), then the shifting process corresponds to moving them by an amount Δwij(t), so that at time t + 1 we have the weights wij(t + 1) = wij(t) + Δwij(t). It is convenient to treat the thresholds as weights, as discussed previously, so we don't need separate equations for them. Each update can be written in the form wij_new = wij_old + η × (target d − output y) × input. This weight update equation is called the Perceptron Learning Rule. The positive parameter η is called the learning rate or step size; it determines how smoothly we shift the decision boundaries.
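A minimal sketch of a single update under this rule (the threshold is treated as an extra weight with a constant input of 1, as described above; the numbers are arbitrary example values):

def perceptron_update(w, x, d, eta=0.1):
    # One learning step: w_new = w_old + eta * (d - y) * x
    # x is the input vector with a trailing 1 for the bias; w has the same length
    y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0   # current output
    return [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]

w = [-0.2, 0.3, 0.0]                           # w1, w2 and the bias weight
print(perceptron_update(w, [1, 0, 1], d=1))    # -> [-0.1, 0.3, 0.1]: the misclassified pattern pulls the weights toward it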

NEURAL NETWORKS – Perceptron
Convergence of Perceptron Learning
The weight changes Δwij need to be applied repeatedly, for each weight wij in the network and for each training pattern in the training set. One pass through all the weights for the whole training set is called one epoch of training. Eventually, when all the network outputs match the targets for all the training patterns, all the Δwij will be zero and the process of training will cease; we then say that the training process has converged to a solution. It can be shown that if there exists a set of weights for a perceptron which solves the given problem correctly, i.e. if the problem is linearly separable, then the Perceptron Learning Rule will find such a set of weights in a finite number of iterations: for a linearly separable training set, the learning process converges in a finite number of steps.
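Putting the pieces together, a self-contained sketch of epoch-based training on the AND problem from the earlier exercise (the function names and the learning rate are illustrative assumptions, not taken from the lecture):

def train_perceptron(training_set, eta=0.1, max_epochs=100):
    # Apply the perceptron learning rule pattern by pattern, epoch by epoch,
    # and stop as soon as a whole epoch passes without any weight change.
    n_inputs = len(training_set[0][0])
    w = [0.0] * (n_inputs + 1)                # weights plus the bias weight
    for epoch in range(max_epochs):
        errors = 0
        for inputs, d in training_set:
            x = list(inputs) + [1]            # constant bias input
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
            if y != d:
                w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:                       # converged
            break
    return w

and_training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(and_training_set))     # a weight vector (w1, w2, bias) that implements AND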

NEURAL NETWORKS – Perceptron
The Learning Scenario
Learning is a recursive procedure of modifying the weights from a given set of input-output patterns. For a single perceptron, the objective of the learning (encoding) procedure is to find the decision plane (that is, the related weight vector) which separates two classes of given input-output training vectors. Once the learning is finished, every input vector will be classified into an appropriate class. A single perceptron can classify only linearly separable patterns. The perceptron learning procedure is an example of a supervised error-correcting learning law.

NEURAL NETWORKS – Perceptron
The Learning Scenario
As illustrated in the figure, we can identify a current weight vector w(n), the next weight vector w(n + 1), and the correct weight vector w*. The related decision planes are orthogonal to these vectors and are depicted as straight lines. During the learning process the current weight vector w(n) is modified in the direction of the current input vector x(n) if the input pattern is misclassified, that is, if the error is non-zero.

NEURAL NETWORKS – Perceptron
The Learning Scenario
The figure shows how the weight vector changes at each iteration.

NEURAL NETWORKS – Perceptron The Learning Scenario

NEURAL NETWORKS – Perceptron The Learning Scenario

The Learning Scenario
[Figure: w4 = w3 — the weight vector does not change at this step because the pattern is already classified correctly, so the error is zero.]

The Learning Scenario
If the perceptron is presented with enough training vectors, the weight vector w(n) will tend to the correct value w*. The demonstration is in augmented space: conceptually, in augmented space, we adjust the weight vector to fit the data. Rosenblatt proved that if the input patterns are linearly separable, then the perceptron learning law converges, and the hyperplane separating the two classes of input patterns can be determined. This example demonstrated the perceptron learning law. In the example we followed three main steps typical of most neural network applications. Preparation of the training patterns: normally, the training patterns are specific to the particular application; in our example we generate a set of vectors and partition this set with a plane into two linearly separable classes, then split the set into two halves, one used as a training set, the other as a validation set. Learning: using the training set, we apply the perceptron learning law as described. Validation: using the validation set, we verify the quality of the learning process.
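These three steps can be sketched end to end (a hypothetical example: the data generation, the 50/50 split and all function names are assumptions, and train_perceptron is the helper from the previous sketch, not code from the lecture):

import random

def make_separable_data(n=200, margin=0.1):
    # Generate 2-D points labelled by a known separating plane x1 + x2 - 1 = 0,
    # skipping points that fall too close to the boundary
    data = []
    while len(data) < n:
        x1, x2 = random.uniform(-1, 2), random.uniform(-1, 2)
        if abs(x1 + x2 - 1) > margin:
            data.append(((x1, x2), 1 if x1 + x2 - 1 >= 0 else 0))
    return data

data = make_separable_data()
train_set, val_set = data[:100], data[100:]   # 1. preparation of the training patterns

w = train_perceptron(train_set)               # 2. learning with the perceptron learning rule

correct = sum((1 if w[0] * x1 + w[1] * x2 + w[2] >= 0 else 0) == d
              for (x1, x2), d in val_set)
print(correct / len(val_set))                 # 3. validation: fraction of the validation set classified correctly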