CSC321: Neural Networks Lecture 3: Perceptrons

Geoffrey Hinton
www.cs.toronto.edu/~hinton/csc321/notes/lec3.htm

The connectivity of a perceptron

The input is recoded using hand-picked features that do not adapt. Only the last layer of weights is learned.
The output units are binary threshold neurons and are learned independently.

[Diagram: input units feed a layer of non-adaptive hand-coded features, which feed the output units.]

Binary threshold neurons (McCulloch and Pitts, 1943)

First compute a weighted sum of the inputs from other neurons, then output a 1 if the weighted sum exceeds the threshold:

$$z = \sum_i x_i w_i, \qquad y = \begin{cases} 1 & \text{if } z \ge \theta \\ 0 & \text{otherwise} \end{cases}$$

[Diagram: the step function y as a function of z, jumping from 0 to 1 at the threshold.]
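
A minimal sketch of this unit in Python/NumPy, following the formula above; the function name and the example weights are illustrative, not from the slide:

```python
import numpy as np

def binary_threshold_neuron(x, w, theta):
    """McCulloch-Pitts unit: weighted sum of the inputs, then a hard threshold."""
    z = np.dot(w, x)                 # weighted sum of the inputs
    return 1 if z >= theta else 0    # output 1 iff the sum reaches the threshold

# Illustrative example: two inputs with weights 0.5 and threshold 0.75 act as a logical AND.
print(binary_threshold_neuron(np.array([1, 1]), np.array([0.5, 0.5]), 0.75))  # -> 1
print(binary_threshold_neuron(np.array([1, 0]), np.array([0.5, 0.5]), 0.75))  # -> 0
```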

The perceptron convergence procedure

Add an extra component with value 1 to each input vector. The "bias" weight on this component is minus the threshold. Now we can forget the threshold.
Pick training cases using any policy that ensures that every training case will keep getting picked.
- If the output unit is correct, leave its weights alone.
- If the output unit incorrectly outputs a zero, add the input vector to the weight vector.
- If the output unit incorrectly outputs a 1, subtract the input vector from the weight vector.
This is guaranteed to find a suitable set of weights if any such set exists.
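
A minimal sketch of this procedure in Python/NumPy; the toy dataset and the fixed number of sweeps are illustrative additions, not part of the slide:

```python
import numpy as np

def train_perceptron(inputs, targets, n_sweeps=100):
    """Perceptron convergence procedure with the threshold folded into a bias weight."""
    # Add an extra component with value 1 to each input vector;
    # its weight plays the role of minus the threshold.
    X = np.hstack([inputs, np.ones((len(inputs), 1))])
    w = np.zeros(X.shape[1])
    for _ in range(n_sweeps):                 # every training case keeps getting picked
        for x, t in zip(X, targets):
            y = 1 if np.dot(w, x) >= 0 else 0
            if y == t:
                pass                          # correct: leave the weights alone
            elif t == 1:                      # incorrectly output a zero
                w = w + x                     # add the input vector
            else:                             # incorrectly output a one
                w = w - x                     # subtract the input vector
    return w

# Illustrative linearly separable task: logical OR of two bits.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 1])
w = train_perceptron(X, t)
print([1 if np.dot(w, np.append(x, 1)) >= 0 else 0 for x in X])  # -> [0, 1, 1, 1]
```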

Weight space

Imagine a space in which each axis corresponds to a weight. A point in this space is a weight vector.
Each training case defines a plane through the origin, perpendicular to its input vector (with the bias component included). On one side of the plane the output is wrong.
To get all the training cases right we need to find a point on the right side of all the planes.

[Diagram: weight space with the planes of two training cases passing through the origin; good weight vectors lie on the right side of both planes, bad weight vectors are on the wrong side of at least one.]

Why the learning procedure works

Consider the squared distance between any satisfactory weight vector and the current weight vector. Every time the perceptron makes a mistake, the learning algorithm moves the current weight vector towards all satisfactory weight vectors (unless it crosses the constraint plane).
So consider "generously satisfactory" weight vectors that lie within the feasible region by a margin at least as great as the largest update.
Every time the perceptron makes a mistake, the squared distance to all of these weight vectors is decreased by at least the squared length of the smallest update vector.

[Diagram: the feasible region in weight space with its right/wrong boundary and a margin strip inside it; generously satisfactory weight vectors lie beyond the margin.]
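
A sketch of the distance argument, using the bias-augmented notation from the previous slides (this derivation is an added illustration, not text from the slide). Suppose the perceptron wrongly outputs 0 on input $\mathbf{x}$, so it adds $\mathbf{x}$ to the current weights $\mathbf{w}$, and let $\mathbf{w}^*$ be a generously satisfactory vector with margin $\mathbf{w}^* \cdot \mathbf{x} \ge \|\mathbf{x}\|^2$. Since the mistake means $\mathbf{w} \cdot \mathbf{x} < 0$,

$$\|\mathbf{w}^* - (\mathbf{w} + \mathbf{x})\|^2 = \|\mathbf{w}^* - \mathbf{w}\|^2 - 2\,\mathbf{x}\cdot\mathbf{w}^* + 2\,\mathbf{x}\cdot\mathbf{w} + \|\mathbf{x}\|^2 \le \|\mathbf{w}^* - \mathbf{w}\|^2 - \|\mathbf{x}\|^2 .$$

The squared distance therefore drops by at least the squared length of the update; the case of wrongly outputting a 1 is symmetric.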

What perceptrons cannot do

A binary threshold output unit cannot even tell whether two single-bit numbers are the same!
Same: (1,1) → 1; (0,0) → 1
Different: (1,0) → 0; (0,1) → 0
The corresponding set of inequalities (weights $w_1, w_2$, threshold $\theta$) is impossible to satisfy:

$$w_1 + w_2 \ge \theta, \qquad 0 \ge \theta, \qquad w_1 < \theta, \qquad w_2 < \theta$$

Adding the first pair gives $w_1 + w_2 \ge 2\theta$, while adding the second pair gives $w_1 + w_2 < 2\theta$.

[Diagram, data space: the four points (0,0), (0,1), (1,0), (1,1), with the output = 1 cases on one diagonal and the output = 0 cases on the other. The positive and negative cases cannot be separated by a plane.]

What can perceptrons do?

They can only solve tasks if the hand-coded features convert the original task into a linearly separable one. How difficult is this?
The N-bit parity task: requires N features of the form "Are at least m bits on?" Each such feature must look at all the components of the input.
The 2-D connectedness task requires an exponential number of features!

The N-bit even parity task

There is a simple solution that requires N hidden units. Each hidden unit computes whether more than M of the inputs are on. This is a linearly separable problem.
There are many variants of this solution. It can be learned. It generalizes well if ...

[Diagram: a 4-bit example with input 1 0 1 0. Four hidden units test whether more than 0, 1, 2, or 3 of the inputs are on; they connect to the output unit with weights alternating -2, +2, -2, +2, and the output unit has a +1 bias.]
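
A minimal sketch of this construction in Python/NumPy. The specific numbers follow the diagram above (a +1 bias and hidden-to-output weights alternating -2, +2), which is just one of the many possible variants:

```python
import numpy as np

def even_parity_net(bits):
    """N-bit even parity with N hidden threshold units.
    Hidden unit m fires if more than m of the inputs are on (m = 0 .. N-1)."""
    bits = np.asarray(bits)
    n = len(bits)
    n_on = bits.sum()
    hidden = np.array([1 if n_on > m else 0 for m in range(n)])
    out_weights = np.array([-2 if m % 2 == 0 else 2 for m in range(n)])  # -2, +2, -2, +2, ...
    z = 1 + np.dot(out_weights, hidden)   # +1 bias on the output unit
    return 1 if z > 0 else 0              # 1 means an even number of bits are on

for x in ([1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0], [1, 1, 1, 1]):
    print(x, even_parity_net(x))          # -> 1, 0, 1, 1
```

With k bits on, the first k hidden units fire and the alternating weights sum to 0 (k even) or -2 (k odd), so the +1 bias tips the output to 1 exactly when k is even.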

Why connectedness is hard to compute

Even for simple line drawings, there are exponentially many cases.
Removing one segment can break connectedness, but this depends on the precise arrangement of the other pieces. Unlike parity, there are no simple summaries of the other pieces that tell us what will happen.
Connectedness is easy to compute with an iterative algorithm:
- Start anywhere in the ink.
- Propagate a marker.
- See if all the ink gets marked.
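
A minimal sketch of that iterative algorithm in Python, as a flood fill over a binary image stored as a list of lists (the representation is an assumption, not from the slide):

```python
def is_connected(image):
    """True if all the 'ink' pixels (value 1) form a single 4-connected component."""
    ink = {(r, c) for r, row in enumerate(image)
                  for c, v in enumerate(row) if v == 1}
    if not ink:
        return True
    # Start anywhere in the ink and propagate a marker to neighbouring ink pixels.
    start = next(iter(ink))
    marked, frontier = {start}, [start]
    while frontier:
        r, c = frontier.pop()
        for nbr in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nbr in ink and nbr not in marked:
                marked.add(nbr)
                frontier.append(nbr)
    # See if all the ink got marked.
    return marked == ink

print(is_connected([[1, 1, 0],
                    [0, 1, 0],
                    [0, 1, 1]]))   # -> True
print(is_connected([[1, 0, 0],
                    [0, 0, 0],
                    [0, 0, 1]]))   # -> False
```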

Distinguishing T from C in any orientation and position

What kind of features are required to distinguish two different patterns of 5 pixels independent of position and orientation? Do we need to replicate T and C templates across all positions and orientations?
Looking at pairs of pixels will not work.
Looking at triples will work, if we assume that each input image only contains one object.
Replicate the following two feature detectors in all positions:

[Diagram: two small feature detectors, each a local template of + and - pixel weights.]

If any of these detectors reaches its threshold of 2, it's a C. If not, it's a T.
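
The exact templates are not recoverable from the transcript, so the sketch below only illustrates the general recipe: slide a fixed +/- template over every position of a binary image and check whether the weighted sum reaches the threshold anywhere. The 1x3 template used here is a hypothetical placeholder, not the detector from the slide:

```python
import numpy as np

def replicated_detector_fires(image, template, threshold=2):
    """Apply a fixed +/- template at every position; report whether any position reaches the threshold."""
    image = np.asarray(image)
    th, tw = template.shape
    H, W = image.shape
    for r in range(H - th + 1):
        for c in range(W - tw + 1):
            if np.sum(template * image[r:r + th, c:c + tw]) >= threshold:
                return True
    return False

template = np.array([[1, 1, -1]])        # hypothetical "+ + -" detector, threshold 2
image = np.array([[0, 1, 1, 0],
                  [0, 0, 0, 0]])
print(replicated_detector_fires(image, template))   # -> True (fires where two on-pixels meet a blank)
```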

Beyond perceptrons

- Need to learn the features, not just how to weight them to make a decision. This is a much harder task. We may need to abandon guarantees of finding optimal solutions.
- Need to make use of recurrent connections, especially for modeling sequences. The network needs a memory (in the activities) for events that happened some time ago, and we cannot easily put an upper bound on this time. Engineers call this an "Infinite Impulse Response" system. Long-term temporal regularities are hard to learn.
- Need to learn representations without a teacher. This makes it much harder to define what the goal of learning is.

Beyond perceptrons

- Need to learn complex hierarchical representations for structures like "John was annoyed that Mary disliked Bill."
- We need to apply the same computational apparatus to the embedded sentence as to the whole sentence. This is hard if we are using special purpose hardware in which the activities of hardware units are the representations and the connections between hardware units are the program.
- We must somehow traverse deep hierarchies using fixed hardware and sharing knowledge between levels.

Sequential Perception

- We need to attend to one part of the sensory input at a time. We only have high resolution in a tiny region. Vision is a very sequential process (but the scale varies). We do not do high-level processing of most of the visual input (lack of motion tells us nothing has changed).
- Segmentation and the sequential organization of sensory processing are often ignored by neural models.
- Segmentation is a very difficult problem. Segmenting a figure from its background seems very easy because we are so good at it, but it's actually very hard. Contours sometimes have imperceptible contrast, but we still perceive them. Segmentation often requires a lot of top-down knowledge.