Learning to perceive how hand-written digits were drawn Geoffrey Hinton Canadian Institute for Advanced Research and University of Toronto

Good old-fashioned neural networks

A feed-forward net maps an input vector through hidden layers (each unit with a bias) to outputs. Compare the outputs with the correct answer to get an error signal, then back-propagate the error signal to get the derivatives needed for learning.
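
As a concrete illustration (not from the slides), here is a minimal sketch of such a network trained by backpropagation; the layer sizes, learning rate, and squared-error loss are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_net(n_in=784, n_hid=100, n_out=10, seed=0):
    """Hypothetical sizes: 784 input pixels, 100 hidden units, 10 class outputs."""
    rng = np.random.default_rng(seed)
    return {"W1": rng.normal(0, 0.01, (n_in, n_hid)), "b1": np.zeros(n_hid),
            "W2": rng.normal(0, 0.01, (n_hid, n_out)), "b2": np.zeros(n_out)}

def train_step(net, x, target, lr=0.1):
    """One forward pass and one backward pass for a single training case."""
    h = sigmoid(x @ net["W1"] + net["b1"])        # hidden layer activities
    y = sigmoid(h @ net["W2"] + net["b2"])        # output activities
    # Compare outputs with the correct answer to get the error signal.
    dy = (y - target) * y * (1 - y)               # squared error back through the output sigmoid
    # Back-propagate the error signal to get derivatives for learning.
    dh = (dy @ net["W2"].T) * h * (1 - h)
    net["W2"] -= lr * np.outer(h, dy); net["b2"] -= lr * dy
    net["W1"] -= lr * np.outer(x, dh); net["b1"] -= lr * dh
```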

Two different ways to use backpropagation for handwritten digit recognition

Standard method: Train a neural network with 10 output units to map from pixel intensities to class labels.
– This works well, but it does not make use of a lot of prior knowledge that we have about how the images were generated.

The generative approach: First write a graphics program that converts a motor program (a sequence of muscle commands) into an image.
– Then learn to map pixel intensities to motor programs.
– Digit classes are far more natural in the space of motor programs. (Figure caption: these two have very similar motor programs.)

A variation

It is very difficult to train a single net to recognize very different motor programs, so train a separate network for each digit class.
– When given a test image, get each of the 10 networks to extract a motor program.
– Then see which of the 10 motor programs is best at reconstructing the test image. Also consider how similar that motor program is to the other motor programs for that class.
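
A hedged sketch of that decision rule. The per-class recognizers, the generator (the graphics program), the class prototypes, and the weighting of the two terms are hypothetical names and choices for illustration, not details given in the slides.

```python
import numpy as np

def classify(test_image, recognizers, generator, class_prototypes):
    """Pick the digit class whose extracted motor program best explains the image.

    recognizers: list of 10 functions, each mapping an image to a motor program (code).
    generator: function mapping a motor program to a reconstructed image.
    class_prototypes: a typical motor program for each class (for the similarity term).
    """
    best_class, best_score = None, np.inf
    for c in range(10):
        code = recognizers[c](test_image)                        # extract a motor program
        recon = generator(code)                                  # re-render it as an image
        recon_error = np.sum((recon - test_image) ** 2)          # how well it reconstructs the image
        atypicality = np.sum((code - class_prototypes[c]) ** 2)  # how unusual the program is for class c
        score = recon_error + 0.1 * atypicality                  # weighting is an arbitrary assumption
        if score < best_score:
            best_class, best_score = c, score
    return best_class
```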

A simple generative model

We can generate digit images by simulating the physics of drawing: a pen is controlled by four springs.
– The trajectory of the pen is determined by the 4 spring stiffnesses at each of 17 time steps.
The ink is produced from 60 evenly spaced points along the trajectory:
– Use bilinear interpolation to distribute the ink at each point to the 4 closest pixels.
– Use time-invariant parameters for the ink intensity and the width of a convolution kernel.
– Then clip intensities above 1.
We can also learn to set the mass, viscosity, and positions of the four endpoints for each image.
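
A rough sketch of that simulation, assuming a unit time step, a 28x28 pixel grid, and pixel-coordinate spring endpoints; none of these details are specified on the slide, and the final convolution with an ink kernel is omitted for brevity.

```python
import numpy as np

def draw_digit(stiffness, anchors, mass=1.0, viscosity=0.5,
               n_steps=17, n_ink=60, grid=28, ink=0.5):
    """Simulate a pen pulled by four springs and deposit ink on a pixel grid.

    stiffness: (n_steps, 4) spring stiffnesses over time (the motor program).
    anchors:   (4, 2) fixed endpoints of the four springs, in pixel coordinates.
    mass, viscosity, grid size and ink intensity are illustrative assumptions.
    """
    pos = anchors.mean(axis=0)               # start the pen at the centre of the anchors
    vel = np.zeros(2)
    traj = [pos.copy()]
    for t in range(n_steps):
        # Each spring pulls the pen towards its endpoint with its current stiffness.
        force = np.sum(stiffness[t][:, None] * (anchors - pos), axis=0)
        force -= viscosity * vel              # viscous damping
        vel += force / mass                   # unit time step
        pos = pos + vel
        traj.append(pos.copy())
    traj = np.array(traj)

    # Resample 60 evenly spaced points along the trajectory.
    ts = np.linspace(0, n_steps, n_ink)
    pts = np.stack([np.interp(ts, np.arange(n_steps + 1), traj[:, d]) for d in range(2)], axis=1)

    # Distribute the ink at each point to its 4 closest pixels by bilinear interpolation.
    image = np.zeros((grid, grid))
    for x, y in pts:
        i, j = int(np.floor(x)), int(np.floor(y))
        fx, fy = x - i, y - j
        for di, wi in ((0, 1 - fx), (1, fx)):
            for dj, wj in ((0, 1 - fy), (1, fy)):
                if 0 <= i + di < grid and 0 <= j + dj < grid:
                    image[i + di, j + dj] += ink * wi * wj
    return np.clip(image, 0.0, 1.0)           # clip intensities above 1
```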

Some ways to invert a generator

Look inside the generator to see how it works and try to invert each step of the generative process.
– It's hard to invert processes that lose information, e.g. the third dimension, or the correspondence between model-parts and image-parts.
Define a prior distribution over codes and generate lots of (code, image) pairs. Then train a "recognition" neural network that maps image → code.
– But where do we get the prior over codes?
– The distribution of codes is exactly what we want to learn from the data!
– Is there any way to do without the prior over codes?

A way to train a class-specific model from a single prototype

Start with a single prototype code.
Learn to invert the generator in the vicinity of the prototype by adding noise to the code and generating (code, image) pairs for training a neural net.
Then use the learned model to get codes for real images that are similar to the prototype. Add noise to these codes and generate more training data.
– This extends the region that can be inverted along the data manifold (with genetic jumps).
(Diagram: the prototype and a nearby datapoint on the manifold of the digit class in code space.)
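
A hedged sketch of that bootstrapping loop. The helpers generator(code) (the graphics simulator) and train_recognizer(codes, images), along with the noise level, number of rounds, and acceptance threshold, are illustrative assumptions.

```python
import numpy as np

def bootstrap_recognizer(prototype_code, real_images, generator, train_recognizer,
                         n_rounds=5, n_samples=1000, noise=0.1, accept_error=5.0):
    """Learn image -> code starting from a single prototype code.

    generator(code) -> image is the graphics simulator; train_recognizer(codes, images)
    returns a function mapping an image to a code.
    """
    rng = np.random.default_rng(0)
    known_codes = [np.asarray(prototype_code)]
    recognizer = None
    for _ in range(n_rounds):
        # Add noise to the codes we can already handle and render (code, image) pairs.
        base = np.array(known_codes)
        codes = base[rng.integers(len(base), size=n_samples)]
        codes = codes + noise * rng.standard_normal(codes.shape)
        images = np.array([generator(c) for c in codes])
        recognizer = train_recognizer(codes, images)

        # Use the current net on real images; keep codes that reconstruct them well.
        for img in real_images:
            code = recognizer(img)
            if np.sum((generator(code) - img) ** 2) < accept_error:
                known_codes.append(code)   # extends the invertible region along the manifold
    return recognizer
```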

An example during the later stages of training (figure; "About 2 ms")

How training examples are created

(Diagram, roughly: a clean code, initially the prototype, has noise added to give a noisy code; the generator turns the noisy code into a training image; the recognition net's hidden units, with biases set from the prototype, map that image to a predicted code. The code error against the clean code is used for training, while the reconstruction error between the re-rendered image and the training image is used for testing.)

How the perceived trajectory changes at the early stages of learning

Typical fits to the training data at the end of training

Performance on test data (blue = 2, red = 3)

The five errors

Performance on test data if we do not use an extra trick

On test data, the model often gets the registration slightly wrong

The neural net has solved the difficult global search problem, but it has got the fine details wrong. So we need to perform a local search to fix up the details.

Local search

Use the trajectory produced by the neural network as an initial guess.
– Then use the difference between the data and the reconstruction to adjust the guess.
This means we need to convert residual errors in pixel space into gradients in trajectory space.
– But how do we backpropagate the pixel residuals through the generative graphics model?
Make a neural network version of the generative model.

How the generative neural net is trained

We use the same training pairs as for the recognition net.
(Diagram, roughly: a clean code has noise added to give a noisy code; the generator turns the noisy code into a training image; the generative net's hidden units map the same noisy code to a reconstructed image, and the reconstruction error against the training image is used for training. The recognition net's hidden units are shown alongside for comparison.)
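
A minimal sketch of that step, under the assumption that the generative net is a one-hidden-layer sigmoid MLP trained with squared reconstruction error; the hidden-layer size, learning rate, and number of epochs are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_generative_net(noisy_codes, target_images, hidden=400, lr=0.1, epochs=10, seed=0):
    """Train code -> image to mimic the graphics simulator on the same (code, image) pairs
    used for the recognition net. target_images are the simulator's outputs, flattened."""
    rng = np.random.default_rng(seed)
    n_code, n_pix = noisy_codes.shape[1], target_images.shape[1]
    W1 = rng.normal(0, 0.01, (n_code, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.01, (hidden, n_pix)); b2 = np.zeros(n_pix)
    for _ in range(epochs):
        for c, x in zip(noisy_codes, target_images):
            h = sigmoid(c @ W1 + b1)                  # generative hidden units
            y = sigmoid(h @ W2 + b2)                  # reconstructed image
            dy = (y - x) * y * (1 - y)                # reconstruction error signal
            dh = (dy @ W2.T) * h * (1 - h)
            W2 -= lr * np.outer(h, dy); b2 -= lr * dy
            W1 -= lr * np.outer(c, dh); b1 -= lr * dh
    return W1, b1, W2, b2
```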

An example during the later stages of training the generative neural network

How the generative neural net is used

(Diagram, roughly: the recognition net's hidden units map the real image to a code that initializes the current code; the generative net's hidden units map the current code to a current image; the pixel error against the real image is backpropagated through the generative net to give code gradients, which are used to update the code. The reconstruction error tracks progress.)
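
A hedged sketch of the resulting local search, assuming the one-hidden-layer generative net from the previous sketch (weights W1, b1, W2, b2) and a recognizer(image) function that supplies the initial code; the step size and number of iterations are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def refine_code(real_image, recognizer, W1, b1, W2, b2, n_steps=20, lr=0.05):
    """Local search: start from the recognition net's code and follow code gradients."""
    code = recognizer(real_image)                  # initialize from the recognition net
    x = real_image.ravel()
    for _ in range(n_steps):
        h = sigmoid(code @ W1 + b1)                # generative hidden units
        y = sigmoid(h @ W2 + b2)                   # current (generated) image
        pixel_residual = y - x                     # pixel error
        # Backpropagate the pixel residual through the generative net
        # to convert it into a gradient in code (trajectory) space.
        dy = pixel_residual * y * (1 - y)
        dh = (dy @ W2.T) * h * (1 - h)
        code_grad = dh @ W1.T
        code = code - lr * code_grad               # update the code
    return code
```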

The 2 model gets better at fitting 2's, but it also gets better at fitting 3's

Improvement from local search

Local search reduces the squared pixel error of the correct model by 20-50%.
– It depends on how well the networks are trained.
The squared pixel error of the wrong model is reduced by a much smaller fraction.
– It often increases, because the pixel residuals are converted to motor program adjustments using a generative net that has not been trained in that part of the space.
The classification error improves by 25-40%.

Why spring stiffnesses are a good language

The four spring stiffnesses define a quadratic energy function, and this function generates an acceleration. The acceleration depends on where the mass is: if the mass is not where we thought it was during planning, the force that is generated is the planned force plus a feedback term that pulls the mass back towards where it should have been.
(Diagram: an iso-energy contour around the planned position, showing the planned force and the feedback term.)
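
In symbols (the notation is mine, not the slide's): with stiffnesses k_i, spring endpoints a_i, pen position x, and planned position x*, the quadratic energy and the force it generates decompose as

```latex
E(x) = \tfrac{1}{2}\sum_{i=1}^{4} k_i \,\lVert x - a_i \rVert^2,
\qquad
F(x) = -\nabla E(x) = \sum_{i=1}^{4} k_i\,(a_i - x)
     = \underbrace{\sum_{i=1}^{4} k_i\,(a_i - x^{*})}_{\text{planned force}}
     + \underbrace{\Big(\sum_{i=1}^{4} k_i\Big)\,(x^{*} - x)}_{\text{feedback term}}
```

so any displacement of the pen from the planned position automatically adds a restoring force proportional to the total stiffness, with no explicit error computation needed.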

A more ambitious application

Suppose we have a way to generate realistic images of faces from underlying variables that describe emotional expression, lighting, pose, and individual 3-D face shape. Given a big database of unlabeled faces, we should be able to recover the state of the generator.

Conclusions so far

Computers are becoming fast enough to allow adaptive networks to be used in all sorts of novel ways. The most principled way to do perception is by inverting a generative model.

THE END

An even more ambitious application

Suppose we have a model of a person in which the muscles are springs. We are also given a prototype motor program that makes the model walk for a few steps. Can we train a neural network to convert the current dynamic state, plus a partial description of a desired walk, into a motor program that produces the right behaviour for the next 100 milliseconds?

Some other generative models

We could use a B-spline with 8 control points to generate the curve (Williams et al.). This does not learn to fit the images nearly as well, and if we use 16 control points it ties itself in knots. Momentum is useful.