Neural Networks Lecture 5: The Perceptron (September 21, 2010)

Slide 1: Supervised Function Approximation
In supervised learning, we train an ANN with a set of vector pairs, so-called exemplars. Each pair (x, y) consists of an input vector x and a corresponding output vector y. Whenever the network receives input x, we would like it to provide output y. The exemplars thus describe the function that we want to “teach” our network. Besides learning the exemplars, we would like our network to generalize, that is, give plausible output for inputs that the network has not been trained with.
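As a concrete illustration (not from the slides), a set of exemplars can be stored as paired input and output vectors; the following sketch uses the logical AND function and hypothetical names of my own choosing.

```python
import numpy as np

# Hypothetical exemplar set for the logical AND function: each row of X is
# an input vector x, and y holds the corresponding desired output, using
# the -1/+1 output coding adopted later in this lecture.
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
y = np.array([-1, -1, -1, 1])

# "Teaching" the network means asking it to reproduce y for the inputs in X,
# while still producing plausible outputs for inputs it has never seen.
for x_vec, target in zip(X, y):
    print(f"input {x_vec} -> desired output {target}")
```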

Slide 2: Supervised Function Approximation
There is a tradeoff between a network's ability to precisely learn the given exemplars and its ability to generalize (i.e., inter- and extrapolate). This problem is similar to fitting a function to a given set of data points. Let us assume that you want to find a fitting function f: R → R for a set of three data points. You try to do this with polynomials of degree one (a straight line), two, and nine.

Slide 3: Supervised Function Approximation
Obviously, the polynomial of degree 2 provides the most plausible fit.
[Figure: the three data points with the fitted curves of degree 1, 2, and 9, plotted as f(x) versus x.]
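A quick way to see this tradeoff is to fit the three polynomials directly; the sketch below uses numpy.polyfit on three made-up data points, since the slide's actual points are not given.

```python
import numpy as np

# Three hypothetical data points (the slide's actual points are not given).
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.8, 2.2])

# Fit polynomials of degree 1, 2, and 9, as in the slide's example. With
# only three points, the degree-2 fit passes exactly through the data,
# while the degree-9 fit is underdetermined: numpy issues a RankWarning
# and returns just one of infinitely many curves through the points,
# illustrating how the extra degrees of freedom are unconstrained by the
# exemplars.
for degree in (1, 2, 9):
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    print(f"degree {degree}: fitted values at the data points = {fitted}")
```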

Slide 4: Supervised Function Approximation
The same principle applies to ANNs: If an ANN has too few neurons, it may not have enough degrees of freedom to precisely approximate the desired function. If an ANN has too many neurons, it will learn the exemplars perfectly, but its additional degrees of freedom may cause it to show implausible behavior for untrained inputs; it then generalizes poorly. Unfortunately, there are no known equations that could tell you the optimal size of your network for a given application; there are only heuristics.

Slide 5: Evaluation of Networks
Basic idea: define an error function and measure the error on untrained data (the testing set). Typical choice: the summed squared error E = Σ (d − o)² over the testing set, where d is the desired output and o is the actual output. For classification: E = number of misclassified samples / total number of samples.
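Both error measures are straightforward to compute; here is a minimal sketch with illustrative arrays (the values are made up, not from the slides).

```python
import numpy as np

# Desired outputs d and actual network outputs o on a held-out testing set
# (illustrative values only).
d = np.array([1.0, -1.0, 1.0, -1.0])
o = np.array([0.8, -0.9, -0.2, -1.0])

# Summed squared error over the testing set.
sse = np.sum((d - o) ** 2)

# Classification error: fraction of misclassified samples, taking the sign
# of the output as the predicted class.
classification_error = np.mean(np.sign(o) != np.sign(d))

print(f"summed squared error: {sse:.3f}")
print(f"classification error: {classification_error:.2f}")
```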

Slide 6: The Perceptron
[Diagram: perceptron unit i with inputs x_1, x_2, …, x_n, weights w_1, w_2, …, w_n, a net input signal, a threshold θ, and output f(x_1, x_2, …, x_n).]

Slide 7: The Perceptron
[Diagram: the same perceptron unit with an additional constant input x_0 = 1 and weight w_0.]
w_0 corresponds to −θ. Here, only the weight vector is adaptable, but not the threshold.

Slide 8: Perceptron Computation
Similar to a TLU, a perceptron divides its n-dimensional input space by an (n−1)-dimensional hyperplane defined by the equation
w_0 + w_1·x_1 + w_2·x_2 + … + w_n·x_n = 0.
For w_0 + w_1·x_1 + w_2·x_2 + … + w_n·x_n > 0, its output is 1, and for w_0 + w_1·x_1 + w_2·x_2 + … + w_n·x_n ≤ 0, its output is −1. With the right weight vector (w_0, …, w_n)^T, a single perceptron can compute any linearly separable function. We are now going to look at an algorithm that determines such a weight vector for a given function.
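In code, this decision rule is just a weighted sum plus the bias weight w_0, followed by a sign; a minimal sketch (the function name and the example weights are mine, not from the slides):

```python
import numpy as np

def perceptron_output(w: np.ndarray, x: np.ndarray) -> int:
    """Output of a perceptron with weights w = (w_0, w_1, ..., w_n).

    Returns 1 if w_0 + w_1*x_1 + ... + w_n*x_n > 0, and -1 otherwise,
    i.e. it reports on which side of the hyperplane the input lies.
    """
    net = w[0] + np.dot(w[1:], x)   # w[0] plays the role of w_0 = -theta
    return 1 if net > 0 else -1

# Example: a weight vector implementing logical AND on 0/1 inputs.
w_and = np.array([-1.5, 1.0, 1.0])
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", perceptron_output(w_and, np.array(x, dtype=float)))
```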

Slide 9: Perceptron Training Algorithm
Algorithm Perceptron:
  Start with a randomly chosen weight vector w_0;
  Let k = 1;
  while there exist input vectors that are misclassified by w_{k−1} do
    Let i_j be a misclassified input vector;
    Let x_k = class(i_j)·i_j, implying that w_{k−1}·x_k < 0;
    Update the weight vector to w_k = w_{k−1} + η·x_k;
    Increment k;
  end-while;
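A direct translation of this pseudocode into Python might look as follows. The function and variable names are mine, and prepending a constant 1 to each input so that the bias weight is learned like any other weight is an assumption consistent with slide 7, not something stated in the pseudocode itself.

```python
import numpy as np

def train_perceptron(inputs, classes, eta=1.0, max_steps=1000, rng=None):
    """Perceptron training loop following the slide's pseudocode (sketch).

    inputs  : array of shape (m, n), one input vector i_j per row
    classes : array of shape (m,), entries +1 or -1
    Returns the weight vector (w_0, ..., w_n), or raises if no separating
    vector was found within max_steps (e.g. data not linearly separable).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Augment each input with a leading 1 so that w[0] acts as the bias w_0.
    aug = np.hstack([np.ones((len(inputs), 1)), np.asarray(inputs, float)])
    w = rng.standard_normal(aug.shape[1])     # randomly chosen initial weights

    for _ in range(max_steps):
        # Find input vectors misclassified by the current weight vector.
        predictions = np.where(aug @ w > 0, 1, -1)
        misclassified = np.flatnonzero(predictions != classes)
        if misclassified.size == 0:
            return w                          # all samples correctly classified
        j = misclassified[0]
        x_k = classes[j] * aug[j]             # x_k = class(i_j) * i_j
        w = w + eta * x_k                     # w_k = w_{k-1} + eta * x_k
    raise RuntimeError("no separating weight vector found within max_steps")

# Usage: learn the logical AND function from its four exemplars.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
print(train_perceptron(X, y))
```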

Slide 10: Perceptron Training Algorithm
For example, for some input i with class(i) = −1: if w·i > 0, then we have a misclassification. The weight vector then needs to be modified to w + Δw with (w + Δw)·i < w·i to possibly improve classification. We can choose Δw = −η·i, because (w + Δw)·i = (w − η·i)·i = w·i − η·(i·i) < w·i, since i·i is the square of the length of vector i and is thus positive. If class(i) = 1, things are the same but with opposite signs; we introduce x to unify these two cases.
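A tiny numeric check of this argument, with made-up numbers of my own:

```python
import numpy as np

# A misclassified input: class(i) = -1 but w . i > 0.
w = np.array([0.5, 1.0])
i = np.array([1.0, 2.0])
eta = 0.1

before = np.dot(w, i)               # 2.5 > 0, so i is misclassified
after = np.dot(w - eta * i, i)      # equals before - eta * (i . i) = 2.0
print(before, after, np.dot(i, i))  # the dot product indeed decreases
```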

Slide 11: Learning Rate and Termination
Terminate when all samples are correctly classified. If the number of misclassified samples has not changed in a large number of steps, the problem could be the choice of the learning rate η: if η is too large, classification may just be swinging back and forth and take a long time to reach the solution; on the other hand, if η is too small, changes in classification can be extremely slow. If changing η does not help, the samples may not be linearly separable, and training should terminate. If it is known that there will be a minimum number of misclassifications, train until that number is reached.

Slide 12: Guarantee of Success: Novikoff (1963)
Theorem 2.1: Given training samples from two linearly separable classes, the perceptron training algorithm terminates after a finite number of steps, and correctly classifies all elements of the training set, irrespective of the initial random non-zero weight vector w_0.
Let w_k be the current weight vector. We need to prove that there is an upper bound on k.

Slide 13: Guarantee of Success: Novikoff (1963)
Proof: Assume η = 1, without loss of generality. After k steps of the learning algorithm, the current weight vector is
w_k = w_0 + x_1 + x_2 + … + x_k.   (2.1)
Since the two classes are linearly separable, there must be a vector of weights w* that correctly classifies them, that is, sgn(w*·i_k) = class(i_k). Multiplying each side of eq. 2.1 with w*, we get:
w*·w_k = w*·w_0 + w*·x_1 + w*·x_2 + … + w*·x_k.

Slide 14: Guarantee of Success: Novikoff (1963)
w*·w_k = w*·w_0 + w*·x_1 + w*·x_2 + … + w*·x_k.
For each input vector i_j, the dot product w*·i_j has the same sign as class(i_j). Since the corresponding element of the training sequence is x = class(i_j)·i_j, we can be assured that w*·x = w*·(class(i_j)·i_j) > 0. Therefore, there exists an α > 0 such that w*·x_i > α for every member x_i of the training sequence. Hence:
w*·w_k > w*·w_0 + k·α.   (2.2)

Slide 15: Guarantee of Success: Novikoff (1963)
w*·w_k > w*·w_0 + k·α.   (2.2)
By the Cauchy-Schwarz inequality:
|w*·w_k|² ≤ ||w*||² · ||w_k||².   (2.3)
We may assume that ||w*|| = 1, since the unit-length vector w*/||w*|| also correctly classifies the same samples. Using this assumption and eqs. 2.2 and 2.3, we obtain a lower bound for the square of the length of w_k:
||w_k||² > (w*·w_0 + k·α)².   (2.4)

Slide 16: Guarantee of Success: Novikoff (1963)
Since w_j = w_{j−1} + x_j, the following upper bound can be obtained for this vector's squared length:
||w_j||² = w_j·w_j = w_{j−1}·w_{j−1} + 2·w_{j−1}·x_j + x_j·x_j = ||w_{j−1}||² + 2·w_{j−1}·x_j + ||x_j||².
Since w_{j−1}·x_j < 0 whenever a weight change is required by the algorithm, we have:
||w_j||² − ||w_{j−1}||² < ||x_j||².
Summation of the above inequalities over j = 1, …, k gives the upper bound:
||w_k||² − ||w_0||² < k · max_j ||x_j||².

Slide 17: Guarantee of Success: Novikoff (1963)
||w_k||² − ||w_0||² < k · max_j ||x_j||².
Combining this with inequality 2.4,
||w_k||² > (w*·w_0 + k·α)²,   (2.4)
gives us:
(w*·w_0 + k·α)² < ||w_k||² < ||w_0||² + k · max_j ||x_j||².
Now the lower bound of ||w_k||² increases at the rate of k², and its upper bound increases at the rate of k. Therefore, there must be a finite value of k such that
(w*·w_0 + k·α)² > ||w_0||² + k · max_j ||x_j||².
This means that k cannot increase without bound, so the algorithm must eventually terminate.
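To make the final step concrete, here is a worked version of the bound on k. It is a simplification of my own under the stated assumptions (η = 1, ||w*|| = 1) plus the additional choice w_0 = 0, which the slides do not make.

```latex
% With w_0 = 0, the combined inequality from slide 17 becomes:
\[
  (k\alpha)^2 \;<\; \|w_k\|^2 \;<\; k \max_j \|x_j\|^2
  \quad\Longrightarrow\quad
  k^2 \alpha^2 \;<\; k \max_j \|x_j\|^2
  \quad\Longrightarrow\quad
  k \;<\; \frac{\max_j \|x_j\|^2}{\alpha^2}.
\]
```

So, under these simplifying assumptions, the number of weight updates is bounded by a quantity that depends only on the lengths of the training vectors and the separation constant α, which is exactly why the lower bound must eventually overtake the upper bound.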