Perceptrons: Introduced in 1957 by Rosenblatt


Perceptrons
Introduced in 1957 by Rosenblatt and used for pattern recognition. The name is in use both for a particular artificial neuron model and for entire systems built from these neurons. Perceptrons were introduced as a model for the visual system. They were heavily criticized by Minsky and Papert (1969); this caused a decline in ANN research that lasted for more than a decade, until the advent of BP learning for MLFF networks (Rumelhart et al. 1986) and RNN networks (Hopfield et al. 1982-85).

Single-layer Perceptrons
A discrete-neuron single-layer perceptron consists of:
- an input layer of n real-valued input nodes (not neurons)
- an output layer of m neurons
- the output of a discrete neuron can only take the values zero (not firing) and one (firing)
- each neuron has a real-valued threshold and fires if and only if its accumulated input exceeds that threshold
- each connection from an input node j to an output neuron i has a real-valued weight w_ij
It computes a vector function f: R^n → {0,1}^m.
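As an illustration of this definition, here is a minimal sketch in Python/NumPy of the forward computation; the function and variable names are illustrative and not taken from the lecture notes.

```python
import numpy as np

def single_layer_perceptron(x, W, theta):
    """Discrete single-layer perceptron with n input nodes and m output neurons.

    x     : shape (n,)   -- real-valued input nodes
    W     : shape (m, n) -- W[i, j] is the weight w_ij from input node j to neuron i
    theta : shape (m,)   -- real-valued threshold of each neuron
    Neuron i fires (output 1) iff its accumulated input W[i] @ x exceeds theta[i],
    so the whole layer computes a function R^n -> {0, 1}^m.
    """
    return (W @ x > theta).astype(int)

# Tiny example with n = 2 inputs and m = 2 neurons
W = np.array([[2.0, 2.0],
              [1.0, -1.0]])
theta = np.array([3.0, 0.0])
print(single_layer_perceptron(np.array([1.0, 1.0]), W, theta))  # [1 0]
```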

Questions
Since a perceptron with n input nodes and m output neurons computes a function f: R^n → {0,1}^m, we study two questions:
- Which functions can be computed?
- Does there exist a learning method, i.e. is there an algorithm that optimizes the weights?

Single-layer Single-output Perceptron
We start with the simplest configuration. A single-layer single-output perceptron consists of a single neuron whose output is either zero or one; -w0 is called the threshold.
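The defining expression is only an image in the original slides; a standard form consistent with the surrounding text (output zero or one, with -w0 acting as the threshold) is

$$
y = H\!\Big(\sum_{k=1}^{n} w_k x_k + w_0\Big),
\qquad
H(v) = \begin{cases} 1 & \text{if } v \ge 0,\\ 0 & \text{if } v < 0,\end{cases}
$$

so the neuron fires exactly when the weighted sum of the inputs reaches the threshold -w0 (whether the boundary case counts as firing depends on the convention chosen for H(0)).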

Where do we put the threshold?
Two placements of the Heaviside function:
- a linear combiner followed by a Heaviside function with threshold
- an affine combiner followed by the standard Heaviside function

Artificial Neuron
An artificial neuron consists of an affine combiner followed by a transfer function.
Synonyms: adder or integrator instead of linear combiner; activation function or squashing function instead of transfer function.
w0 is the threshold, also called the bias.
v = Σ_k w_k x_k + w_0 is called the local field or activation potential.

From affine to linear combiners

Boolean Function: AND
Logical (truth table):
X Y | X ∧ Y
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
Geometrical: the corner (1,1) satisfies 2x + 2y > 3, while the corners (0,0), (0,1) and (1,0) satisfy 2x + 2y < 3, so a single line separates the two classes.
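A quick check of the geometric description, assuming the weights (2, 2) and threshold 3 read off the slide:

```python
# AND as a single discrete neuron: weights (2, 2), threshold 3.
# Only the corner (1, 1) satisfies 2x + 2y > 3, so the neuron fires exactly for X AND Y.
for x in (0, 1):
    for y in (0, 1):
        print(x, y, int(2 * x + 2 * y > 3))
# 0 0 0
# 0 1 0
# 1 0 0
# 1 1 1
```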

Boolean Function: OR
Logical (truth table):
X Y | X ∨ Y
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1
Geometrical: a single line separates the corner (0,0) from the other three corners, so OR is also computed by a single neuron.

Boolean Function: XOR
Logical (truth table):
X Y | X ⊕ Y
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
No single line can separate the corners with output 1 from those with output 0, so XOR cannot be computed by a single neuron.

Linearly Separable Sets
A set X ⊆ R^n × {0,1} is called (absolutely) linearly separable if there exists a vector w ∈ R^(n+1) such that the condition below holds for each pair (x,t) ∈ X.
A training set X is correctly classified by a perceptron if for each (x,t) ∈ X the output of the perceptron with input x is t.
A finite set X can be classified correctly by a one-layer perceptron if and only if it is linearly separable.
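The separating condition is an image in the original; written out with the augmented input (x, 1) ∈ R^(n+1), absolute linear separability of X means there is a w ∈ R^(n+1) such that

$$
w^{\top}\hat{x} > 0 \ \text{ if } t = 1,
\qquad
w^{\top}\hat{x} < 0 \ \text{ if } t = 0,
\qquad \text{for all } (x, t) \in X, \ \hat{x} = (x, 1).
$$

Here the last component of w plays the role of w0; the lecture notes may phrase the same condition slightly differently.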

A Linearly Separable Set (in 2D)

A Set That Is Not Linearly Separable (in 2D)

One-layer Perceptron Learning
Since the output neurons of a one-layer perceptron are independent, it suffices to study a perceptron with a single output. Consider a finite set X ⊆ R^n × {0,1}, also called a training set. We say that such a set X is correctly classified by a perceptron if for each pair (x,t) in X the output of the perceptron with input x is t. A finite set X can be classified correctly by a one-layer perceptron if and only if it is linearly separable.

Perceptron Learning Rule (incremental version)
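The rule itself is shown only as an image; the sketch below implements the standard incremental perceptron update w ← w + η (t - y) (x, 1) on misclassified examples, which should match the rule in the notes up to notation.

```python
import numpy as np

def heaviside(v):
    return 1 if v >= 0 else 0

def perceptron_learn_incremental(X, T, eta=1.0, max_epochs=1000):
    """Incremental (online) perceptron learning rule.

    X : shape (N, n) training inputs, T : shape (N,) targets in {0, 1}.
    Works on augmented inputs (x, 1); the last weight component is w0,
    so -w0 is the threshold.  Weights change only on misclassified pairs.
    """
    Xh = np.hstack([X, np.ones((len(X), 1))])   # augment each x with a constant 1
    w = np.zeros(Xh.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(Xh, T):
            y = heaviside(w @ x)
            if y != t:
                w += eta * (t - y) * x           # rotate w toward (t=1) or away from (t=0) x
                errors += 1
        if errors == 0:                          # training set correctly classified
            return w
    return w

# Example: the (linearly separable) AND training set
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 0, 0, 1])
print(perceptron_learn_incremental(X, T))
```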

Geometric Interpretation
The weights are modified such that the angle with the input vector is decreased; this increases w·x, as needed when the target is 1 but the neuron did not fire.

Geometric Interpretation
The weights are modified such that the angle with the input vector is increased; this decreases w·x, as needed when the target is 0 but the neuron fired.

Perceptron Convergence Theorem
Let X be a finite, linearly separable training set, let the initial weight vector be arbitrary, and let the learning parameter η be an arbitrary positive number. Then for each infinite sequence of training pairs from X, the sequence of weight vectors obtained by applying the perceptron learning rule converges in a finite number of steps.

Proof sketch 1

Proof sketch 2

Proof sketch 3

Remarks
- The perceptron learning algorithm is a form of reinforcement learning and is due to Rosenblatt.
- By adjusting the weights sufficiently, the network may learn the current training vector; other vectors, however, may be unlearned.
- Although the learning algorithm converges for any positive learning parameter η, faster convergence can be obtained by a suitable choice, possibly dependent on the observed error.
- Scaling of the input vectors can also be beneficial to the convergence of the algorithm.

Perceptron Learning Rule (batch version)
See the lecture notes for a proof of convergence (similar to the incremental version). One pass of the inner loop over the whole training set is called training for an epoch (the term used by Matlab).
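A corresponding sketch of the batch rule, under the same assumptions and naming as the incremental sketch above: corrections are accumulated over one epoch and applied in a single update.

```python
import numpy as np

def perceptron_learn_batch(X, T, eta=1.0, max_epochs=1000):
    """Batch perceptron learning: one weight update per epoch.

    The inner loop over the whole training set is one epoch; the corrections
    eta * (t - y) * x of all misclassified pairs are summed and applied together.
    """
    Xh = np.hstack([X, np.ones((len(X), 1))])    # augmented inputs (x, 1)
    w = np.zeros(Xh.shape[1])
    for _ in range(max_epochs):
        delta = np.zeros_like(w)
        for x, t in zip(Xh, T):
            y = 1 if w @ x >= 0 else 0
            delta += eta * (t - y) * x
        if not delta.any():                      # no misclassifications left
            break
        w += delta
    return w
```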

Learning by Error Minimization
Consider an error function E(w); its gradient then yields the weight updates (batch version) via gradient descent, Δw = -η ∇E(w).
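The formulas are images in the original; one standard choice that reproduces the batch update (the perceptron criterion, not necessarily the exact function used in the notes) is

$$
E(w) = \sum_{(x,t)\in X} \bigl(y(x) - t\bigr)\, w^{\top}\hat{x},
\qquad y(x) = H(w^{\top}\hat{x}),
$$

where only misclassified pairs contribute (terms with y(x) = t vanish) and every contribution is non-negative. Treating y(x) as piecewise constant,

$$
\nabla E(w) = \sum_{(x,t)\in X} \bigl(y(x) - t\bigr)\,\hat{x},
\qquad
\Delta w = -\eta\, \nabla E(w) = \eta \sum_{(x,t)\in X} \bigl(t - y(x)\bigr)\,\hat{x},
$$

which is exactly the batch update sketched above.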

Capacity of One-layer Perceptrons
The number of boolean functions of n arguments is 2^(2^n). Each boolean function defines a dichotomy of the corner points of the n-dimensional hypercube. The number B_n of linear dichotomies of these corner points is bounded by C(2^n, n), where C(m, n) is the number of linear dichotomies of m points in R^n (in general position).
(The results should be known; the derivation need not be. No questions about Section 2.3 in the exam.)
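The counting formula is an image in the original. Assuming the affine form of Cover's count, which matches the B_2 entry in the table below, the number of linear dichotomies of m points in general position in R^n is C(m, n) = 2 Σ_{k=0..n} binom(m-1, k); a few values:

```python
from math import comb

def C(m, n):
    """Assumed affine form of Cover's count: the number of linear dichotomies
    of m points in general position in R^n is 2 * sum_{k=0..n} binom(m-1, k)."""
    return 2 * sum(comb(m - 1, k) for k in range(n + 1))

print(C(4, 2))   # 14  -- the four corners of the square, matching B_2 below
print(C(8, 3))   # 128 -- upper bound for B_3 = 104 (cube corners are not in general position)
print(C(5, 2))   # 22  -- five corners of a regular pentagon (see the question below)
```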

Number of boolean functions versus number of linearly separable dichotomies B_n

n | 2^(2^n)                                  | B_n
1 | 4                                        | 4
2 | 16                                       | 14
3 | 256                                      | 104
4 | 65536                                    | 1882
5 | 4294967296                               | 94572
6 | 18446744073709551616                     | 15028134
7 | 340282366920938463463374607431768211456  | 8378070864

For n = 2 there are two boolean functions that are not given by linear dichotomies. Which are these? How many linear dichotomies are there for the 5 corners of a regular pentagon? (Hint: they are in general position.)

Multi-layer Perceptrons
A discrete-neuron multi-layer perceptron consists of:
- an input layer of n real-valued input nodes (not neurons)
- an output layer of m neurons
- several intermediate (hidden) layers consisting of one or more neurons
- with the exception of the last layer, the nodes of each layer serve as inputs to the nodes of the next layer
- each connection from node j in layer k-1 to node i in layer k has a real-valued weight w_ijk
It computes a function f: R^n → {0,1}^m.
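A minimal sketch of the forward computation of such a network, reusing the conventions of the single-layer sketch above; the names are illustrative.

```python
import numpy as np

def multilayer_perceptron(x, layers):
    """Discrete multi-layer perceptron.

    layers is a list of (W, theta) pairs, one per layer k; W[i, j] is the
    weight w_ijk from node j of layer k-1 to neuron i of layer k, and
    theta[i] is neuron i's threshold.  The {0,1} output of each layer
    serves as the input of the next layer.
    """
    a = np.asarray(x, dtype=float)
    for W, theta in layers:
        a = (W @ a > theta).astype(float)
    return a.astype(int)
```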

Graphical representation
Input nodes on the left, one or more hidden layers in the middle, output nodes on the right; edges are directed from left to right (arrowheads not drawn).

Discrete Multi-layer Perceptrons
The computational capabilities of multi-layer perceptrons with two and three layers are as follows:
- Every boolean function can be computed by a two-layer perceptron.
- Every region in R^n that is bounded by a finite number of (n-1)-dimensional hyperplanes can be classified by a three-layer perceptron.
Unfortunately, there is no simple learning algorithm for multi-layer perceptrons.

Disjunctive Normal Form
A boolean function f(x1, x2, x3, x4, x5) is written as a disjunction of clauses C_j, where each clause is a conjunction of literals (the variables x1, ..., x5 or their negations). Electrical engineers also know this as the sum-of-products form. (The slide shows the logic table for f.)

Perceptron for a Clause
In fact this is a McCulloch-Pitts neuron. Edges with weight +1 are called excitatory edges; edges with weight -1 are called inhibitory edges. The neuron acts as a generalized AND-gate.

2-layer Perceptron for a Boolean Function
The neuron in the second layer is a generalized OR-gate. How would the construction look if negated variables are allowed (threshold 0.5 - s)? A sketch of the construction is given below.
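A small sketch of this standard construction: a layer of generalized AND-gates, one per clause, feeding a single generalized OR-gate. The choice "threshold p - 0.5, where p is the number of un-negated literals" (equivalently a bias of 0.5 - p) is an interpretation of the slide's hint, and XOR is used as the test case.

```python
def clause_neuron(literals, x):
    """Generalized AND-gate for one clause.

    literals: list of (index, positive) pairs.  Weight +1 for an un-negated
    variable, -1 for a negated one.  With p un-negated literals the neuron
    fires iff the weighted sum exceeds p - 0.5, i.e. exactly when every
    literal of the clause is satisfied.
    """
    p = sum(1 for _, positive in literals if positive)
    s = sum(x[i] if positive else -x[i] for i, positive in literals)
    return int(s > p - 0.5)

def dnf_perceptron(clauses, x):
    """Two-layer perceptron for a boolean function in DNF: clause neurons
    followed by a generalized OR-gate with threshold 0.5."""
    return int(sum(clause_neuron(c, x) for c in clauses) > 0.5)

# XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)
xor_clauses = [[(0, True), (1, False)], [(0, False), (1, True)]]
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, dnf_perceptron(xor_clauses, x))   # 0, 1, 1, 0
```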

XOR revisited
Use the standard construction.

XOR revisited again
A non-layered neural net; strictly speaking not a perceptron.

Minsky-Papert observation
No diameter-limited perceptron can determine whether a geometric figure is connected. In the example, figures A and D are not connected, while B and C are connected. If necessary, the figures are stretched in the horizontal direction to exceed the limits of perception (the receptive fields) of the individual neurons.

Diameter-limited Perceptron
Each circle stands for a clause (also called a predicate) that has a limited number of inputs (its receptive field). The clauses can be classified into three groups:
- the left group G1 distinguishes a pattern from {A, C} from a pattern from {B, D}
- the right group G3 distinguishes a pattern from {A, B} from a pattern from {C, D}
- the middle group has no discriminating power.


Star Region

3-layer Perceptron for the Star Region
Set up equations for the five lines. Let the origin lie at the center of region R6, and for each line let the origin lie in the positive half-space. Let l1: 1 - x2 = 0; then (0,0) lies in the positive half-space and W0 = (1, 1, -1)^T. The normal vector of line l1 is (1, -1). Rotate this vector over the angle φ = 2π/5 to obtain the normal vectors of the other lines, i.e. premultiply by the counterclockwise rotation matrix

$$
\begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}
$$
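A tiny sketch of just the rotation step described above; the starting normal (1, -1) and the angle 2π/5 are taken from the slide, while the exact line equations depend on the figure, which is not part of the transcript.

```python
import numpy as np

phi = 2 * np.pi / 5
R = np.array([[np.cos(phi), -np.sin(phi)],     # counterclockwise rotation matrix
              [np.sin(phi),  np.cos(phi)]])

n = np.array([1.0, -1.0])                      # normal vector of line l1
for k in range(5):
    print(f"l{k + 1}: normal = {np.round(n, 3)}")
    n = R @ n                                  # normal of the next line of the star
```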

Summary
- One-layer perceptrons have limited computational capabilities: only linearly separable sets can be classified.
- For one-layer perceptrons there exists a learning algorithm with robust convergence properties.
- Multi-layer perceptrons have larger computational capabilities (all boolean functions for two-layer perceptrons), but for those there does not exist a simple learning algorithm.