Chapter 2 Single Layer Feedforward Networks

Perceptrons (Rosenblatt, 1962)
- Proposed for modeling visual perception (the retina).
- A feedforward network of three layers of units: Sensory (S), Association (A), and Response (R).
- Learning occurs only on the weights from A units to R units; the weights from S units to A units are fixed.
- Each R unit receives inputs from n A units.
- For a given training sample s:t, the weights between A and R are changed only if the computed output y differs from the target output t (error driven).
(Diagram: S → A → R, with weight matrices wSA between S and A and wAR between A and R.)

Perceptrons: a simple perceptron
Structure:
- A single output node with a threshold function.
- n input nodes with weights wi, i = 1, …, n.
- Classifies input patterns into one of two classes (depending on whether the output is 0 or 1).
Example: input patterns (x1, x2)
- Two groups of input patterns: (0, 0), (0, 1), (1, 0), (-1, -1) versus (2.1, 0), (0, -2.5), (1.6, -1.6).
- They can be separated by the line x1 - x2 = 2 on the (x1, x2) plane.
- Classification is done by a perceptron with w1 = 1, w2 = -1, threshold = 2 (see the sketch below).
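This classification can be checked with a short Python sketch (the function and variable names below are illustrative, not from the slides):

    # Perceptron with w1 = 1, w2 = -1, threshold = 2, as described above.
    def perceptron(x1, x2, w1=1.0, w2=-1.0, threshold=2.0):
        """Output 1 if the weighted sum reaches the threshold, else 0."""
        net = w1 * x1 + w2 * x2
        return 1 if net >= threshold else 0

    group_a = [(0, 0), (0, 1), (1, 0), (-1, -1)]    # expected class 0
    group_b = [(2.1, 0), (0, -2.5), (1.6, -1.6)]    # expected class 1

    print([perceptron(x1, x2) for x1, x2 in group_a])   # [0, 0, 0, 0]
    print([perceptron(x1, x2) for x1, x2 in group_b])   # [1, 1, 1]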

Perceptrons: implementing the threshold as a bias
- Add a node x0 with constant output 1 and weight w0 = -threshold.
- The threshold test then becomes checking whether w0 x0 + w1 x1 + … + wn xn >= 0.
- This is a common practice in NN design.

Perceptrons: linear separability
- A set of (2D) patterns (x1, x2) of two classes is linearly separable if there exists a line on the (x1, x2) plane, w0 + w1 x1 + w2 x2 = 0, that separates all patterns of one class from the other class.
- The corresponding perceptron can be built with 3 inputs x0 = 1, x1, x2 and weights w0, w1, w2.
- For n-dimensional patterns (x1, …, xn) the separator is a hyperplane w0 + w1 x1 + w2 x2 + … + wn xn = 0, dividing the space into two regions.
- Can we get the weights from a set of sample patterns? If the problem is linearly separable, then YES (by perceptron learning).

Examples of linearly separable classes
- Logical AND function, bipolar patterns (weights w1 = 1, w2 = 1, w0 = -1; decision boundary -1 + x1 + x2 = 0):

    x1   x2   output
    -1   -1     -1
    -1    1     -1
     1   -1     -1
     1    1      1

- Logical OR function, bipolar patterns (weights w1 = 1, w2 = 1, w0 = 1; decision boundary 1 + x1 + x2 = 0):

    x1   x2   output
    -1   -1     -1
    -1    1      1
     1   -1      1
     1    1      1

  (In the plots, x marks class I (output = 1) and o marks class II (output = -1); in both cases a single line separates the two classes. The weight vectors are verified in the sketch below.)
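The AND and OR weight vectors above can be verified with a short Python sketch using the bias convention x0 = 1 from the previous slide (the function names are illustrative):

    def sign(net):
        # Bipolar output: class I (+1) when the net input is non-negative.
        return 1 if net >= 0 else -1

    def classify(w0, w1, w2, x1, x2):
        return sign(w0 * 1 + w1 * x1 + w2 * x2)

    patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    # Logical AND: w0 = -1, w1 = 1, w2 = 1
    print([classify(-1, 1, 1, x1, x2) for x1, x2 in patterns])   # [-1, -1, -1, 1]
    # Logical OR:  w0 = 1, w1 = 1, w2 = 1
    print([classify(1, 1, 1, x1, x2) for x1, x2 in patterns])    # [-1, 1, 1, 1]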

Perceptron Learning
The network:
- Input vector ij (including the threshold input x0 = 1).
- Weight vector w = (w0, w1, …, wn).
- Output: bipolar (-1, 1), using the sign node function.
Training samples:
- Pairs (ij, class(ij)), where class(ij) is the correct classification of ij.
Training:
- Update w so that all sample inputs are correctly classified (if possible).
- An input ij is misclassified by the current w when class(ij) · (w · ij) < 0.
- In that case, change w to w + Δw so that (w + Δw) · ij is closer to class(ij).

Perceptron Learning: the weight update
Δw = η · class(ij) · ij, where η > 0 is the learning rate.
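A minimal Python sketch of training with this rule follows. The zero initialization, the epoch cap, and the choice to treat a net input of exactly 0 as misclassified (so learning can start from zero weights) are assumptions of the sketch, not part of the slides:

    def train_perceptron(samples, n, eta=1.0, max_epochs=100):
        """samples: list of (i, t) where i includes the bias input i[0] = 1
        and t = class(i) is the bipolar target (+1 or -1)."""
        w = [0.0] * (n + 1)                      # w0 ... wn
        for _ in range(max_epochs):
            errors = 0
            for i, t in samples:
                net = sum(wk * ik for wk, ik in zip(w, i))
                if t * net <= 0:                 # misclassified: error-driven update
                    w = [wk + eta * t * ik for wk, ik in zip(w, i)]
                    errors += 1
            if errors == 0:                      # all samples correctly classified
                break
        return w

    # Example: learning the bipolar AND function (bias input prepended).
    and_samples = [([1, -1, -1], -1), ([1, -1, 1], -1),
                   ([1, 1, -1], -1), ([1, 1, 1], 1)]
    print(train_perceptron(and_samples, n=2))    # converges to [-1.0, 1.0, 1.0]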

Perceptron Learning: justification
After an update on a misclassified ij,
(w + Δw) · ij = w · ij + η · class(ij) · (ij · ij) = w · ij + η · class(ij) · ||ij||².
Since ||ij||² > 0, the net input w · ij moves in the direction of class(ij), so (w + Δw) · ij is closer to class(ij), as required.

Perceptron learning convergence theorem
Informally: any classification problem that can be represented by a perceptron can be learned by the learning rule.
Theorem: If there is a weight vector w* such that class(ip) · (w* · ip) > 0 for all P training sample patterns ip, then for any starting weight vector, the perceptron learning rule will converge to a weight vector w such that class(ip) · (w · ip) > 0 for all p. (w and w* may not be the same.)
Proof: reading for grad students (Sec. 2.4).

Perceptron Learning: notes
- It is supervised learning: class(ij) is given for every sample input ij.
- Learning occurs only when a sample input is misclassified (error driven).
- Termination criterion: learning stops when all samples are correctly classified, assuming the problem is linearly separable and the learning rate η is sufficiently small.
- Choice of learning rate: if η is too large, the existing weights are overwhelmed by Δw = η · class(ij) · ij; if η is too small (≈ 0), convergence is very slow. A common choice is η = 1.
- Non-numeric input requires an encoding scheme, e.g. Color = (red, blue, green, yellow), where (0, 0, 1, 0) encodes "green" (see the sketch below).
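A one-line Python illustration of such an encoding scheme (the names are illustrative):

    colors = ["red", "blue", "green", "yellow"]
    # One vector component per value; exactly one component is 1.
    encode = {c: [1 if i == j else 0 for j in range(len(colors))]
              for i, c in enumerate(colors)}
    print(encode["green"])   # [0, 0, 1, 0]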

Perceptron Learning: learning quality
- Generalization: can a trained perceptron correctly classify patterns not included in the training samples? This is a common issue for many NN learning models.
- It depends on the quality of the training samples selected, and to some extent on the learning rate and the initial weights.
- How can we know the learning is OK? Reserve a few samples for testing.

Adaline (Widrow and Hoff, ~1960)
- Adaptive linear elements for signal processing.
- The same architecture as the perceptron.
- Learning method: the delta rule (another form of error-driven learning), also called the Widrow-Hoff learning rule.
- Tries to reduce the mean squared error (MSE) between the net input and the desired output.

Adaline: delta rule
- Let ij = (i0,j, i1,j, …, in,j) be an input vector with desired output dj.
- The squared error is E = (dj - netj)², where netj = Σl wl il,j, so its value is determined by the weights wl.
- Modify the weights by a gradient descent approach, i.e. change each weight in the opposite direction of ∂E/∂wl:
  ∂E/∂wl = -2 (dj - netj) il,j, giving Δwl = -η ∂E/∂wl = 2η (dj - netj) il,j
  (the constant factor 2 is usually absorbed into the learning rate, leaving Δwl = η (dj - netj) il,j).

Adaline Learning Algorithm
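The algorithm listing on this slide did not survive the transcript. The following Python sketch implements per-sample delta-rule training as derived on the previous slide; the function names, zero initialization, fixed epoch count, and the final thresholding step are assumptions:

    def train_adaline(samples, n, eta=0.1, epochs=50):
        """samples: list of (i, d) with bias input i[0] = 1 and desired output d."""
        w = [0.0] * (n + 1)
        for _ in range(epochs):
            for i, d in samples:
                net = sum(wk * ik for wk, ik in zip(w, i))
                err = d - net                            # d_j - net_j
                w = [wk + eta * err * ik for wk, ik in zip(w, i)]
        return w

    def adaline_classify(w, i):
        # After training, classification applies a threshold to the linear unit.
        net = sum(wk * ik for wk, ik in zip(w, i))
        return 1 if net >= 0 else -1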

Adaline Learning: delta rule in batch mode
- Based on the mean squared error over all P samples: E = (1/P) Σp (dp - netp)².
- E is again a function of w = (w0, w1, …, wn).
- The gradient of E: ∂E/∂wl = -(2/P) Σp (dp - netp) il,p.
- Therefore the batch update is Δwl = -η ∂E/∂wl = (2η/P) Σp (dp - netp) il,p.
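A corresponding batch-mode sketch in Python, in which the gradient of the mean squared error over all P samples is accumulated before a single weight update per pass (again with illustrative names and constants):

    def train_adaline_batch(samples, n, eta=0.1, epochs=200):
        w = [0.0] * (n + 1)
        P = len(samples)
        for _ in range(epochs):
            grad = [0.0] * (n + 1)                       # dE/dw_l over the batch
            for i, d in samples:
                net = sum(wk * ik for wk, ik in zip(w, i))
                for l in range(n + 1):
                    grad[l] += -2.0 * (d - net) * i[l] / P
            w = [wk - eta * g for wk, g in zip(w, grad)] # step against the gradient
        return w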

Adaline Learning: notes
- Weights are changed even when an input is classified correctly.
- E decreases monotonically until the system reaches a state with (locally) minimum E, where a small change of any wi would cause E to increase.
- At a local-minimum state the gradient is zero (∂E/∂wl = 0 for all l), but E itself is not guaranteed to be zero (netj ≠ dj in general).
- This is why Adaline applies a threshold function to the trained linear unit for classification, rather than using the linear output directly.

Linear Separability Again
Examples of linearly inseparable classes
- Logical XOR (exclusive OR) function, bipolar patterns:

    x1   x2   output
    -1   -1     -1
    -1    1      1
     1   -1      1
     1    1     -1

  No line can separate these two classes. Correct classification would require weights satisfying
    (1) w0 - w1 - w2 < 0
    (2) w0 - w1 + w2 >= 0
    (3) w0 + w1 - w2 >= 0
    (4) w0 + w1 + w2 < 0
  but this system has no solution: (1) + (4) gives w0 < 0 while (2) + (3) gives w0 >= 0, a contradiction.
  (In the plot, x marks class I (output = 1) and o marks class II (output = -1); the two classes sit at opposite corners of the square.)

Why must hidden units be non-linear?
A multi-layer net with linear hidden layers is equivalent to a single-layer net. Suppose z1 and z2 are linear units:
  z1 = a1 * (x1*v11 + x2*v21) + b1
  z2 = a2 * (x1*v12 + x2*v22) + b2
Then
  nety = z1*w1 + z2*w2 = x1*u1 + x2*u2 + (b1*w1 + b2*w2),
where u1 = a1*v11*w1 + a2*v12*w2 and u2 = a1*v21*w1 + a2*v22*w2.
So nety is still a linear combination of x1 and x2 (see the numeric check below).
(Diagram: inputs x1, x2 feed hidden units z1, z2 through weights v11, v12, v21, v22; z1, z2 feed output Y through weights w1, w2; Y has threshold 0.)
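A quick numeric check of this claim in Python/numpy, with arbitrary illustrative weights (all values below are assumptions, chosen only to exercise the algebra):

    import numpy as np

    V = np.array([[0.5, -1.2],    # v[l][k]: weight from input l to hidden unit k
                  [2.0,  0.3]])
    a = np.array([1.5, -0.7])     # hidden slopes a1, a2
    b = np.array([0.2,  0.4])     # hidden offsets b1, b2
    w = np.array([1.1, -0.8])     # hidden-to-output weights w1, w2

    x = np.array([0.9, -2.0])     # an arbitrary input (x1, x2)

    z = a * (x @ V) + b           # linear hidden units z1, z2
    net_y = z @ w                 # two-layer net input to Y

    u = V @ (a * w)               # collapsed single-layer weights u1, u2
    net_single = x @ u + b @ w    # equivalent single-layer net input

    print(np.isclose(net_y, net_single))   # True: the two nets agree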

XOR can be solved by a more complex network with hidden units
- Two hidden units z1, z2 with threshold 1 and input weights of magnitude 2; output unit Y with threshold 0.
- Mapping from input (x1, x2) through hidden (z1, z2) to output y:

    (x1, x2)     (z1, z2)     y
    (-1, -1)     (-1, -1)    -1
    (-1,  1)     (-1,  1)     1
    ( 1, -1)     ( 1, -1)     1
    ( 1,  1)     (-1, -1)    -1

  (See the sketch below.)
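A Python sketch of one weight assignment consistent with the table above: hidden weights of ±2 with threshold 1, and output weights of 1 with threshold 0. The output weights are a reconstruction, since they are not legible in the transcript:

    def step(net, threshold):
        # Bipolar step unit: fire (+1) when the net input reaches the threshold.
        return 1 if net >= threshold else -1

    def xor_net(x1, x2):
        z1 = step(2 * x1 - 2 * x2, threshold=1)
        z2 = step(-2 * x1 + 2 * x2, threshold=1)
        y = step(z1 + z2, threshold=0)
        return (z1, z2), y

    for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
        print((x1, x2), xor_net(x1, x2))
    # Reproduces the table: y = -1, 1, 1, -1 for the four inputs.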

Summary
- Single-layer nets have limited representational power (the linear separability problem).
- Error-driven updating appears to be a good way to train a net.
- Multi-layer nets (nets with non-linear hidden units) may overcome the linear inseparability problem, but learning methods for such nets are needed.
- Threshold/step output functions hinder the effort to develop learning methods for multi-layer nets.