CSC321 Lecture 5: Applying backpropagation to shape recognition. Geoffrey Hinton.

Learning by back-propagating error derivatives
[Figure: a network diagram showing an input vector, hidden layers, and outputs.]
–Compare the outputs with the correct answer to get an error signal.
–Back-propagate the error signal to get derivatives for learning.
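As a concrete illustration (not from the slides), here is a minimal sketch of this forward and backward pass for a tiny two-layer logistic network trained with squared error; the layer sizes, learning rate, and variable names are all illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny network: 2 inputs -> 3 hidden logistic units -> 1 logistic output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (3, 2)), np.zeros(3)
W2, b2 = rng.normal(0, 0.1, (1, 3)), np.zeros(1)

x, t = np.array([0.5, -1.0]), np.array([1.0])   # one training case

for step in range(1000):
    # Forward pass: input vector -> hidden layer -> output.
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)

    # Compare the output with the correct answer to get the error signal.
    dE_dy = y - t                      # derivative of 0.5 * (y - t)^2

    # Back-propagate the error signal to get derivatives for learning.
    dE_dz2 = dE_dy * y * (1 - y)       # through the output non-linearity
    dE_dW2 = np.outer(dE_dz2, h)
    dE_dh  = W2.T @ dE_dz2
    dE_dz1 = dE_dh * h * (1 - h)       # through the hidden non-linearity
    dE_dW1 = np.outer(dE_dz1, x)

    # Plain gradient-descent update.
    lr = 0.5
    W2 -= lr * dE_dW2; b2 -= lr * dE_dz2
    W1 -= lr * dE_dW1; b1 -= lr * dE_dz1
```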

Some Success Stories
Back-propagation has been used for a large number of practical applications:
–Recognizing hand-written characters
–Predicting the future price of stocks
–Detecting credit card fraud
–Recognizing speech ("wreck a nice beach")
–Predicting the next word in a sentence from the previous words. This is essential for good speech recognition.
–Understanding the effects of brain damage

Applying backpropagation to shape recognition
People are very good at recognizing shapes.
–It's intrinsically difficult, and computers are bad at it.
Some reasons why it is difficult:
–Segmentation: real scenes are cluttered.
–Invariances: we are very good at ignoring all sorts of variations that do not affect the shape.
–Deformations: natural shape classes allow variations (faces, letters, chairs).
–A huge amount of computation is required.

The invariance problem
Our perceptual systems are very good at dealing with invariances:
–translation, rotation, scaling
–deformation, contrast, lighting, rate
We are so good at this that it's hard to appreciate how difficult it is.
–It's one of the main difficulties in making computers perceive.
–We still don't have generally accepted solutions.

The invariant feature approach
Extract a large, redundant set of features that are invariant under transformations
–e.g. "a pair of parallel lines with a dot between them".
With enough of these features, there is only one way to assemble them into an object.
–We don't need to represent the relationships between features directly, because they are captured by other features.
We must avoid forming features from parts of different objects!

The normalization approach
Do preprocessing to normalize the data
–e.g. put a box around an object and represent the locations of its pieces relative to this box.
–This eliminates as many degrees of freedom as the box has: translation, rotation, scale, shear, elongation.
–But it's not always easy to choose the box.
[Figure: example handwritten shapes labelled R, d, b, d.]
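A minimal sketch of the box idea, assuming the object is a binary image stored as a NumPy array: crop the tight bounding box around the ink and resample it to a fixed grid. This only removes translation and scale; handling rotation, shear, and elongation would need a more elaborate box fit. The function name and sizes are illustrative.

```python
import numpy as np

def normalize_to_box(img, out_size=20):
    """Crop the tight bounding box around the ink and rescale it to a
    fixed out_size x out_size grid (nearest-neighbour resampling)."""
    rows = np.any(img > 0, axis=1)
    cols = np.any(img > 0, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    crop = img[r0:r1 + 1, c0:c1 + 1]

    # Piece locations are now expressed relative to the box.
    ri = np.linspace(0, crop.shape[0] - 1, out_size).round().astype(int)
    ci = np.linspace(0, crop.shape[1] - 1, out_size).round().astype(int)
    return crop[np.ix_(ri, ci)]
```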

The replicated feature approach
Use many different copies of the same feature detector.
–The copies all have slightly different positions.
–We could also replicate across scale and orientation, but that is tricky and expensive.
–Replication reduces the number of free parameters to be learned.
Use several different feature types, each with its own replicated pool of detectors.
–This allows each patch of image to be represented in several ways.
[Figure: in the diagram, the red connections all have the same weight.]
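In code, a pool of replicated detectors is just one shared kernel applied at every image position, i.e. a plain cross-correlation. A minimal sketch (not from the slides), with illustrative sizes and names:

```python
import numpy as np

def replicated_detector(img, kernel):
    """Apply one feature detector (kernel) at every position: all copies
    share the same weights, so there is only one set of parameters."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Several feature types = several kernels, each with its own replicated pool.
img = np.random.rand(28, 28)
kernels = [np.random.randn(5, 5) for _ in range(6)]
feature_maps = [replicated_detector(img, k) for k in kernels]
```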

Backpropagation with weight constraints
It is easy to modify the backpropagation algorithm to incorporate linear constraints between the weights.
–We compute the gradients as usual, and then modify them so that they satisfy the constraints.
–So if the weights started off satisfying the constraints, they will continue to satisfy them.
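The slide does not spell out the update rule, but a common way to keep two weights equal is to compute both gradients as usual and apply their sum to the single shared value. A minimal sketch with made-up numbers:

```python
def tied_update(w, grads, lr=0.1):
    """Keep two weights equal: compute their gradients as usual, then use
    the combined gradient for both, so if they start equal they stay equal."""
    g1, g2 = grads            # dE/dw1 and dE/dw2, computed separately
    g = g1 + g2               # combined gradient for the shared weight
    return w - lr * g         # one shared value, used in both places

w_shared = 0.3
w_shared = tied_update(w_shared, grads=(0.05, -0.02))
```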

Combining the outputs of replicated features
Get a small amount of translational invariance at each level by averaging four neighboring replicated detectors to give a single output to the next level.
–Taking the maximum of the four should work better.
Achieving invariance in multiple stages seems to be what the monkey visual system does.
–Segmentation may also be done in multiple stages.
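A minimal sketch of this combining step, assuming the detector outputs form a 2-D array and we pool non-overlapping 2x2 blocks; the mode argument switches between averaging and taking the maximum. The names and sizes are illustrative.

```python
import numpy as np

def pool2x2(fmap, mode="average"):
    """Combine each 2x2 block of neighbouring detector outputs into one
    output for the next level (average pooling or max pooling)."""
    H, W = fmap.shape
    blocks = fmap[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2)
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    return blocks.max(axis=(1, 3))

fmap = np.random.rand(24, 24)          # output of one replicated-feature pool
coarse = pool2x2(fmap, mode="max")     # 12 x 12, slightly translation-invariant
```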

The hierarchical partial invariance approach
At each level of the hierarchy, we use an "or" to get features that are invariant across a bigger range of transformations.
–Receptive fields in the brain look like this.
–We can combine this approach with an initial approximate normalization.

LeNet
Yann LeCun and others developed a really good recognizer for handwritten digits by using backpropagation in a feedforward net with:
–Many hidden layers
–Many pools of replicated units in each layer
–Averaging of the outputs of nearby replicated units
–A wide net that can cope with several characters at once, even if they overlap
Look at all of the demos of LeNet at [link].
–These demos are required "reading" for the tests.

A brute force approach
LeNet uses knowledge about the invariances to design:
–the network architecture
–or the weight constraints
–or the types of feature
But it's much simpler to incorporate knowledge of invariances by just creating extra training data:
–For each training image, produce new training data by applying all of the transformations we want to be insensitive to (LeNet can benefit from this too).
–Then train a large, dumb net on a fast computer.
–This works surprisingly well if the transformations are not too big (so do approximate normalization first).
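A minimal sketch of this brute-force idea (not from the slides), using only small pixel shifts as the transformations, implemented with np.roll for simplicity; a real setup would also add rotations, scalings, and local deformations.

```python
import numpy as np

def augment(images, shifts=(-2, -1, 1, 2)):
    """For each training image, produce extra cases by applying small
    translations that we want the net to be insensitive to."""
    extra = []
    for img in images:
        for dx in shifts:
            extra.append(np.roll(img, dx, axis=1))   # shift left/right
        for dy in shifts:
            extra.append(np.roll(img, dy, axis=0))   # shift up/down
    return images + extra

train = [np.random.rand(28, 28) for _ in range(3)]
train = augment(train)          # 3 originals -> 3 + 3*8 = 27 training cases
```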

Making dumb backpropagation work really well for recognizing digits
Use the standard viewing transformations plus local deformation fields to get LOTS of data.
Use a single hidden layer with very small initial weights:
–It needs to break symmetry very slowly to find a good local minimum.
Use a more appropriate error measure for multi-class categorization.

Problems with squared error
The squared error measure has some drawbacks:
–If the desired output is 1 and the actual output is very close to 0, there is almost no gradient for a logistic unit to fix up the error.
–We know that the outputs should sum to 1, but we are depriving the network of this knowledge.
Is there a different cost function that is more appropriate and works better?
–Force the outputs to represent a probability distribution across discrete alternatives.
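To make the first drawback concrete, here is a tiny numeric check (not from the slides) for a logistic output unit trained with squared error: when the target is 1 but the output is near 0, the gradient reaching the unit's input is vanishingly small.

```python
y, t = 1e-8, 1.0                 # actual output near 0, desired output 1
dE_dy = y - t                    # derivative of 0.5 * (y - t)^2, about -1
dy_dz = y * (1.0 - y)            # slope of the logistic at this output, ~1e-8
dE_dz = dE_dy * dy_dz            # gradient reaching the unit's input
print(dE_dz)                     # ~ -1e-8: almost no gradient to fix the error
```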

Softmax
The output units use a non-local non-linearity:
y_i = \frac{e^{x_i}}{\sum_j e^{x_j}}
The cost function is the negative log probability of the right answer:
C = -\sum_j t_j \log y_j
The steepness of C exactly balances the flatness of the output non-linearity:
\frac{\partial C}{\partial x_i} = y_i - t_i
Here the x_i are the total inputs to the output units, the y_i are their outputs, and the t_j are the desired values.
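A minimal sketch of this output layer in NumPy, showing that the gradient of the cost with respect to each input x_i is simply y_i - t_i; subtracting the max before exponentiating is a standard numerical-stability detail, not from the slides.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())          # subtract max for numerical stability
    return e / e.sum()

x = np.array([1.0, 2.0, 0.5])        # total inputs to the output units
t = np.array([0.0, 1.0, 0.0])        # desired values (one-hot right answer)

y = softmax(x)
C = -np.sum(t * np.log(y))           # negative log prob of the right answer
dC_dx = y - t                        # gradient on each input: steepness of C
                                     # balances the flatness of the softmax
```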