Lecture 8. Learning (V): Perceptron Learning


OUTLINE
Perceptron model
Perceptron learning algorithm
Convergence of the perceptron learning algorithm
Example

(C) 2001-2003 by Yu Hen Hu

PERCEPTRON
The perceptron consists of a single neuron with a threshold activation function and binary output values. Its net function, u = w^T x + w0, defines a hyperplane that partitions the feature space into two half-spaces.
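To make the model concrete, here is a minimal Python sketch of a single perceptron with threshold activation; the function name and the example numbers are my own, not from the slides.

```python
import numpy as np

def perceptron_output(w, w0, x):
    """Binary perceptron: output 1 if the net function u = w.x + w0 is positive, else 0."""
    u = np.dot(w, x) + w0              # net function: which side of the hyperplane x lies on
    return 1 if u > 0 else 0

# A 2-D input on the positive side of the hyperplane defined by w and w0.
print(perceptron_output(w=np.array([1.0, -2.0]), w0=0.5, x=np.array([2.0, 0.0])))   # -> 1
```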

Perceptron Learning Problem
Problem statement: Given training samples D = {(x(k), t(k)); 1 ≤ k ≤ K}, with t(k) ∈ {0, 1} or {–1, 1}, find the weight vector W such that the number of outputs y(k) that match the target values t(k) is maximized.
Comment: This corresponds to solving K linear inequalities in M unknown variables, i.e., a linear programming problem.
Example. D = {(1; 1), (3; 1), (–0.5; –1), (–2; –1)} gives 4 inequalities:
(1, 1): w1·1 + w0 > 0;  (3, 1): w1·3 + w0 > 0
(–0.5, –1): w1·(–0.5) + w0 < 0;  (–2, –1): w1·(–2) + w0 < 0
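As a sanity check of this inequality view of learning, the following sketch (the helper name and test weights are hypothetical) verifies whether a candidate pair (w1, w0) satisfies all four inequalities of the 1-D example above.

```python
# (x, t) pairs from the 1-D example; t = +1 requires w1*x + w0 > 0, t = -1 requires < 0.
samples = [(1.0, +1), (3.0, +1), (-0.5, -1), (-2.0, -1)]

def satisfies_all(w1, w0):
    """True iff (w1, w0) puts every sample on the correct side of w1*x + w0 = 0."""
    return all((w1 * x + w0 > 0) if t == +1 else (w1 * x + w0 < 0) for x, t in samples)

print(satisfies_all(w1=1.0, w0=0.0))    # True: the boundary x = 0 separates the samples
print(satisfies_all(w1=-1.0, w0=0.0))   # False: every sample falls on the wrong side
```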

Perceptron Example
A linearly separable problem: if the patterns can be separated by a linear hyperplane, then the solution space is a non-empty set.
(1, 1): w1·1 + w0 > 0;  (3, 1): w1·3 + w0 > 0
(–0.5, –1): w1·(–0.5) + w0 < 0;  (–2, –1): w1·(–2) + w0 < 0
[Figure: the data space (left) and the corresponding solution space of feasible weight vectors (right).]

Perceptron Learning Rule and Convergence Theorem
Perceptron learning rule (η > 0: learning rate):
W(k+1) = W(k) + η (t(k) – y(k)) x(k)
Convergence theorem: If {(x(k), t(k))} is linearly separable, then a solution W* can be found in a finite number of steps using the perceptron learning algorithm.
Problems with the perceptron:
It can solve only linearly separable problems.
It may need a large number of steps to converge.
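A minimal sketch of the learning rule above, assuming targets in {0, 1}, a threshold activation, and inputs augmented with a leading 1 for the bias; the function name and the stopping criterion are my own.

```python
import numpy as np

def perceptron_train(X, t, eta=1.0, max_epochs=1000):
    """X: K x (M+1) augmented inputs (leading 1 for the bias); t: K targets in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_k, t_k in zip(X, t):
            y_k = 1 if np.dot(w, x_k) > 0 else 0     # threshold activation
            if y_k != t_k:
                w = w + eta * (t_k - y_k) * x_k      # W(k+1) = W(k) + eta*(t - y)*x
                errors += 1
        if errors == 0:                              # every sample classified correctly
            break
    return w
```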

Proof of the Perceptron Learning Theorem
Let w* be an optimal weight vector. The change of the distance to w* over successive iterations is
|w(k+1) – w*|^2 – |w(k) – w*|^2 = A η^2 + 2 η B   (*)
where A η^2 = η^2 |x(k)|^2 [t(k) – y(k)]^2 and 2 η B = 2 η [w(k) – w*]^T [t(k) – y(k)] x(k).
If y(k) = t(k), the RHS of (*) is 0. Hence only consider y(k) ≠ t(k); that is, only t(k) – y(k) = ±1 needs to be considered. Thus A = |x(k)|^2 > 0.
Case I: t(k) = 1, y(k) = 0 ⟹ w^T(k) x(k) < 0 and (w*)^T x(k) > 0, so B = w^T(k) x(k) – (w*)^T x(k) < 0.
Case II: t(k) = 0, y(k) = 1 ⟹ w^T(k) x(k) > 0 and (w*)^T x(k) < 0, so B = –[w^T(k) x(k) – (w*)^T x(k)] < 0.
In both cases B < 0. Hence, for 0 < η < –2B/A, the RHS of (*) is negative, and each correcting step moves the weights closer to w*.
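For reference, equation (*) follows directly from expanding the update rule; a short LaTeX sketch of that step, using the same symbols as the slide:

```latex
% Substituting w(k+1) = w(k) + \eta (t(k)-y(k)) x(k) into |w(k+1) - w*|^2:
\begin{aligned}
\|w(k{+}1)-w^*\|^2
  &= \|\,(w(k)-w^*) + \eta\,(t(k)-y(k))\,x(k)\,\|^2 \\
  &= \|w(k)-w^*\|^2
     + \eta^2\,(t(k)-y(k))^2\,\|x(k)\|^2
     + 2\eta\,(t(k)-y(k))\,(w(k)-w^*)^{T} x(k),
\end{aligned}
```

so A = [t(k) – y(k)]^2 |x(k)|^2 and B = [w(k) – w*]^T [t(k) – y(k)] x(k), as used in the two cases above.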

Perceptron Learning Example
4 data points {(x1(i), x2(i); t(i)); i = 1, 2, 3, 4} = (–1, 1; 0), (–0.8, 1; 1), (0.8, –1; 0), (1, –1; 1).
Initialize randomly, say, w(1) = [0.3 –0.5 0.5]^T, and set η = 1. Here sgn(·) denotes the threshold activation with outputs in {0, 1}, and each input is augmented with a leading 1.
y(1) = sgn([1 –1 1]·w(1)) = sgn(1.3) = 1 ≠ t(1) = 0
w(2) = w(1) + 1·(t(1) – y(1))·x(1) = [0.3 –0.5 0.5]^T + 1·(–1)·[1 –1 1]^T = [–0.7 0.5 –0.5]^T
y(2) = sgn([1 –0.8 1]·[–0.7 0.5 –0.5]^T) = sgn(–1.6) = 0
w(3) = [–0.7 0.5 –0.5]^T + 1·(1 – 0)·[1 –0.8 1]^T = [0.3 –0.3 0.5]^T
y(3), w(4), y(4), ... can be computed in the same manner.
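The hand iteration above can be reproduced with a few lines of Python; this is a sketch under the same assumptions (augmented inputs, {0, 1} targets, η = 1).

```python
import numpy as np

X = np.array([[1, -1.0,  1.0],
              [1, -0.8,  1.0],
              [1,  0.8, -1.0],
              [1,  1.0, -1.0]])          # augmented inputs [1, x1, x2]
t = np.array([0, 1, 0, 1])               # target values
w = np.array([0.3, -0.5, 0.5])           # w(1)
eta = 1.0

for k in range(4):                       # one pass over the four samples
    y = 1 if np.dot(X[k], w) > 0 else 0
    w = w + eta * (t[k] - y) * X[k]
    print(f"k={k + 1}: y={y}, w={w}")
# The first two steps match the slide: w(2) = [-0.7, 0.5, -0.5], w(3) = [0.3, -0.3, 0.5].
```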

Perceptron Learning Example
[Figure: (a) the initial decision boundary and (b) the final decision boundary (see Perceptron.m).]

Perceptron and Linear Classifier
The perceptron can be used as a pattern classifier, for example, to sort eggs into medium, large, and jumbo. Features: weight, length, and diameter.
A linear classifier forms a (loosely speaking) linear weighted function of the feature vector x,
g(x) = w^T x + w0,
and then makes a decision based on the sign of g(x).

Linear Classifier Example
Jumbo egg classifier decision rule: if w0 + w1·weight + w2·length > 0, then the egg is a jumbo egg.
Let x1 = weight and x2 = length; then g(x1, x2) = w0 + w1·x1 + w2·x2 = 0 is a hyperplane (a straight line in the 2-D feature space).
[Figure: the line g(x1, x2) = 0, with [w1 w2]^T as the normal vector perpendicular to it and PQ marking the distance to the line.]
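A small numerical sketch of the decision rule and the geometry described above; the weights and egg measurements here are made up purely for illustration, and the distance from a point x to the line g = 0 is |g(x)| / ||[w1 w2]||.

```python
import numpy as np

# Hypothetical weights for g(x) = w0 + w1*x1 + w2*x2, with x1 = weight and x2 = length.
w0, w = -5.0, np.array([0.06, 0.02])

def classify(x):
    g = w0 + np.dot(w, x)
    label = "jumbo" if g > 0 else "not jumbo"
    distance = abs(g) / np.linalg.norm(w)   # distance from x to the decision line g = 0
    return label, distance

print(classify(np.array([70.0, 80.0])))   # g = -5 + 4.2 + 1.6 = 0.8 > 0  -> jumbo
print(classify(np.array([50.0, 60.0])))   # g = -5 + 3.0 + 1.2 = -0.8 < 0 -> not jumbo
```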

Limitations of the Perceptron
If the two classes of feature vectors are linearly separable, then a linear classifier, implemented by a perceptron, can be applied.
Question: Can a perceptron implement the Boolean XOR function, which is not linearly separable?
x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0
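To see the limitation concretely, one can run the same learning rule on the XOR truth table; in the sketch below the per-epoch error count never reaches zero, no matter how many epochs are allowed.

```python
import numpy as np

X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])   # augmented inputs [1, x1, x2]
t = np.array([0, 1, 1, 0])                                    # XOR targets
w = np.zeros(3)

errors = len(X)
for epoch in range(1000):
    errors = 0
    for x_k, t_k in zip(X, t):
        y_k = 1 if np.dot(w, x_k) > 0 else 0
        if y_k != t_k:
            w = w + (t_k - y_k) * x_k    # perceptron update with eta = 1
            errors += 1
    if errors == 0:
        break

print(errors)   # remains > 0: the weights keep cycling and never separate XOR
```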