Lecture 8. Learning (V): Perceptron Learning


OUTLINE
Perceptron model
Perceptron learning algorithm
Convergence of the perceptron learning algorithm
Example

(C) 2001-2003 by Yu Hen Hu

PERCEPTRON
The perceptron consists of a single neuron with a threshold activation function and binary output values. Its net function, u = w^T x + w0, defines a hyperplane that partitions the feature space into two half-spaces.
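To make the model concrete, here is a minimal Python sketch of a single perceptron with threshold activation; the function name and the example numbers are my own, not from the slides.

```python
import numpy as np

def perceptron_output(w, w0, x):
    """Binary perceptron: output 1 if the net function u = w.x + w0 is positive, else 0."""
    u = np.dot(w, x) + w0              # net function: which side of the hyperplane x lies on
    return 1 if u > 0 else 0

# A 2-D input on the positive side of the hyperplane defined by w and w0.
print(perceptron_output(w=np.array([1.0, -2.0]), w0=0.5, x=np.array([2.0, 0.0])))   # -> 1
```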

Perceptron Learning Problem
Problem statement: Given training samples D = {(x(k), t(k)); 1 ≤ k ≤ K}, with t(k) ∈ {0, 1} or {–1, 1}, find the weight vector W such that the number of outputs y(k) that match the target values t(k) is maximized.
Comment: This corresponds to solving K linear inequalities in M unknown variables, i.e., a linear programming problem.
Example. D = {(1; 1), (3; 1), (–0.5; –1), (–2; –1)} gives 4 inequalities:
(1, 1): w1·1 + w0 > 0;  (3, 1): w1·3 + w0 > 0
(–0.5, –1): w1·(–0.5) + w0 < 0;  (–2, –1): w1·(–2) + w0 < 0
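As a sanity check of this inequality view of learning, the following sketch (the helper name and test weights are hypothetical) verifies whether a candidate pair (w1, w0) satisfies all four inequalities of the 1-D example above.

```python
# (x, t) pairs from the 1-D example; t = +1 requires w1*x + w0 > 0, t = -1 requires < 0.
samples = [(1.0, +1), (3.0, +1), (-0.5, -1), (-2.0, -1)]

def satisfies_all(w1, w0):
    """True iff (w1, w0) puts every sample on the correct side of w1*x + w0 = 0."""
    return all((w1 * x + w0 > 0) if t == +1 else (w1 * x + w0 < 0) for x, t in samples)

print(satisfies_all(w1=1.0, w0=0.0))    # True: the boundary x = 0 separates the samples
print(satisfies_all(w1=-1.0, w0=0.0))   # False: every sample falls on the wrong side
```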

Perceptron Example
A linearly separable problem: if the patterns can be separated by a linear hyperplane, then the solution space is a non-empty set.
(1, 1): w1·1 + w0 > 0;  (3, 1): w1·3 + w0 > 0
(–0.5, –1): w1·(–0.5) + w0 < 0;  (–2, –1): w1·(–2) + w0 < 0
[Figure: the data space (left) and the corresponding solution space of feasible weight vectors (right).]

Perceptron Learning Rule and Convergence Theorem
Perceptron learning rule (η > 0: learning rate):
W(k+1) = W(k) + η (t(k) – y(k)) x(k)
Convergence theorem: If {(x(k), t(k))} is linearly separable, then a solution W* can be found in a finite number of steps using the perceptron learning algorithm.
Problems with the perceptron:
It can solve only linearly separable problems.
It may need a large number of steps to converge.
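A minimal sketch of the learning rule above, assuming targets in {0, 1}, a threshold activation, and inputs augmented with a leading 1 for the bias; the function name and the stopping criterion are my own.

```python
import numpy as np

def perceptron_train(X, t, eta=1.0, max_epochs=1000):
    """X: K x (M+1) augmented inputs (leading 1 for the bias); t: K targets in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_k, t_k in zip(X, t):
            y_k = 1 if np.dot(w, x_k) > 0 else 0     # threshold activation
            if y_k != t_k:
                w = w + eta * (t_k - y_k) * x_k      # W(k+1) = W(k) + eta*(t - y)*x
                errors += 1
        if errors == 0:                              # every sample classified correctly
            break
    return w
```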

Proof of the Perceptron Learning Theorem
Let w* be an optimal weight vector. The change of the distance to w* over successive iterations is
|w(k+1) – w*|^2 – |w(k) – w*|^2 = A η^2 + 2 η B   (*)
where A η^2 = η^2 |x(k)|^2 [t(k) – y(k)]^2 and 2 η B = 2 η [w(k) – w*]^T [t(k) – y(k)] x(k).
If y(k) = t(k), the RHS of (*) is 0. Hence only consider y(k) ≠ t(k); that is, only t(k) – y(k) = ±1 needs to be considered. Thus A = |x(k)|^2 > 0.
Case I: t(k) = 1, y(k) = 0 ⟹ w^T(k) x(k) < 0 and (w*)^T x(k) > 0, so B = w^T(k) x(k) – (w*)^T x(k) < 0.
Case II: t(k) = 0, y(k) = 1 ⟹ w^T(k) x(k) > 0 and (w*)^T x(k) < 0, so B = –[w^T(k) x(k) – (w*)^T x(k)] < 0.
In both cases B < 0. Hence, for 0 < η < –2B/A, the RHS of (*) is negative, and each correcting step moves the weights closer to w*.
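For reference, equation (*) follows directly from expanding the update rule; a short LaTeX sketch of that step, using the same symbols as the slide:

```latex
% Substituting w(k+1) = w(k) + \eta (t(k)-y(k)) x(k) into |w(k+1) - w*|^2:
\begin{aligned}
\|w(k{+}1)-w^*\|^2
  &= \|\,(w(k)-w^*) + \eta\,(t(k)-y(k))\,x(k)\,\|^2 \\
  &= \|w(k)-w^*\|^2
     + \eta^2\,(t(k)-y(k))^2\,\|x(k)\|^2
     + 2\eta\,(t(k)-y(k))\,(w(k)-w^*)^{T} x(k),
\end{aligned}
```

so A = [t(k) – y(k)]^2 |x(k)|^2 and B = [w(k) – w*]^T [t(k) – y(k)] x(k), as used in the two cases above.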

Perceptron Learning Example
4 data points {(x1(i), x2(i); t(i)); i = 1, 2, 3, 4} = (–1, 1; 0), (–0.8, 1; 1), (0.8, –1; 0), (1, –1; 1).
Initialize randomly, say, w(1) = [0.3 –0.5 0.5]^T, and set η = 1. Here sgn(·) denotes the threshold activation with outputs in {0, 1}, and each input is augmented with a leading 1.
y(1) = sgn([1 –1 1]·w(1)) = sgn(1.3) = 1 ≠ t(1) = 0
w(2) = w(1) + 1·(t(1) – y(1))·x(1) = [0.3 –0.5 0.5]^T + 1·(–1)·[1 –1 1]^T = [–0.7 0.5 –0.5]^T
y(2) = sgn([1 –0.8 1]·[–0.7 0.5 –0.5]^T) = sgn(–1.6) = 0
w(3) = [–0.7 0.5 –0.5]^T + 1·(1 – 0)·[1 –0.8 1]^T = [0.3 –0.3 0.5]^T
y(3), w(4), y(4), ... can be computed in the same manner.
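The hand iteration above can be reproduced with a few lines of Python; this is a sketch under the same assumptions (augmented inputs, {0, 1} targets, η = 1).

```python
import numpy as np

X = np.array([[1, -1.0,  1.0],
              [1, -0.8,  1.0],
              [1,  0.8, -1.0],
              [1,  1.0, -1.0]])          # augmented inputs [1, x1, x2]
t = np.array([0, 1, 0, 1])               # target values
w = np.array([0.3, -0.5, 0.5])           # w(1)
eta = 1.0

for k in range(4):                       # one pass over the four samples
    y = 1 if np.dot(X[k], w) > 0 else 0
    w = w + eta * (t[k] - y) * X[k]
    print(f"k={k + 1}: y={y}, w={w}")
# The first two steps match the slide: w(2) = [-0.7, 0.5, -0.5], w(3) = [0.3, -0.3, 0.5].
```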

Perceptron Learning Example
[Figure: (a) the initial decision boundary and (b) the final decision boundary (see Perceptron.m).]

Perceptron and Linear Classifier
The perceptron can be used as a pattern classifier, for example, to sort eggs into medium, large, and jumbo. Features: weight, length, and diameter.
A linear classifier forms a (loosely speaking) linear weighted function of the feature vector x,
g(x) = w^T x + w0,
and then makes a decision based on the sign of g(x).

Linear Classifier Example
Jumbo egg classifier decision rule: if w0 + w1·weight + w2·length > 0, then the egg is a jumbo egg.
Let x1 = weight and x2 = length; then g(x1, x2) = w0 + w1·x1 + w2·x2 = 0 is a hyperplane (a straight line in the 2-D feature space).
[Figure: the line g(x1, x2) = 0, with [w1 w2]^T as the normal vector perpendicular to it and PQ marking the distance to the line.]
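A small numerical sketch of the decision rule and the geometry described above; the weights and egg measurements here are made up purely for illustration, and the distance from a point x to the line g = 0 is |g(x)| / ||[w1 w2]||.

```python
import numpy as np

# Hypothetical weights for g(x) = w0 + w1*x1 + w2*x2, with x1 = weight and x2 = length.
w0, w = -5.0, np.array([0.06, 0.02])

def classify(x):
    g = w0 + np.dot(w, x)
    label = "jumbo" if g > 0 else "not jumbo"
    distance = abs(g) / np.linalg.norm(w)   # distance from x to the decision line g = 0
    return label, distance

print(classify(np.array([70.0, 80.0])))   # g = -5 + 4.2 + 1.6 = 0.8 > 0  -> jumbo
print(classify(np.array([50.0, 60.0])))   # g = -5 + 3.0 + 1.2 = -0.8 < 0 -> not jumbo
```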

Limitations of the Perceptron
If the two classes of feature vectors are linearly separable, then a linear classifier, implemented by a perceptron, can be applied.
Question: Can a perceptron implement the Boolean XOR function, which is not linearly separable?
x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0
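To see the limitation concretely, one can run the same learning rule on the XOR truth table; in the sketch below the per-epoch error count never reaches zero, no matter how many epochs are allowed.

```python
import numpy as np

X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])   # augmented inputs [1, x1, x2]
t = np.array([0, 1, 1, 0])                                    # XOR targets
w = np.zeros(3)

errors = len(X)
for epoch in range(1000):
    errors = 0
    for x_k, t_k in zip(X, t):
        y_k = 1 if np.dot(w, x_k) > 0 else 0
        if y_k != t_k:
            w = w + (t_k - y_k) * x_k    # perceptron update with eta = 1
            errors += 1
    if errors == 0:
        break

print(errors)   # remains > 0: the weights keep cycling and never separate XOR
```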