CSC321: Computation in Neural Networks Lecture 21: Stochastic Hopfield nets and simulated annealing Geoffrey Hinton.

Another computational role for Hopfield nets
Instead of using the net to store memories, use it to construct interpretations of sensory input.
– The input is represented by the visible units.
– The interpretation is represented by the states of the hidden units.
– The badness of the interpretation is represented by the energy.
This raises two difficult issues:
– How do we escape from poor local minima to get good interpretations?
– How do we learn the weights on connections to the hidden units?
[Figure: visible units are used to represent the inputs; hidden units are used to represent an interpretation of the inputs]
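As a rough illustration of the energy as a "badness" score, here is a minimal sketch (not from the slides; the weight matrix, biases and toy sizes are all made up) of the standard Hopfield energy of a joint visible/hidden configuration:

```python
import numpy as np

def hopfield_energy(state, W, b):
    """Energy of a binary state vector under symmetric weights W and biases b.

    Lower energy = better interpretation. The state concatenates the clamped
    visible units (the input) and the hidden units (the interpretation).
    """
    # E = -0.5 * sum_ij w_ij s_i s_j - sum_i b_i s_i
    return -0.5 * state @ W @ state - b @ state

# Hypothetical toy example: 3 visible + 2 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 5))
W = 0.5 * (W + W.T)          # weights must be symmetric
np.fill_diagonal(W, 0.0)     # no self-connections
b = rng.normal(size=5)

visible = np.array([1.0, 0.0, 1.0])   # clamped input
hidden = np.array([0.0, 1.0])         # one candidate interpretation
print(hopfield_energy(np.concatenate([visible, hidden]), W, b))
```

Different settings of the hidden units give different energies; the lower-energy settings are the better interpretations of the clamped input.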

An example: Interpreting a line drawing
Use one "2-D line" unit for each possible line in the picture.
– Any particular picture will only activate a very small subset of the line units.
Use one "3-D line" unit for each possible 3-D line in the scene.
– Each 2-D line unit could be the projection of many possible 3-D lines. Make these 3-D lines compete.
Make 3-D lines support each other if they join in 3-D. Make them strongly support each other if they join at right angles.
[Figure: 2-D line units in the picture project to candidate 3-D line units in the scene; 3-D lines that join in 3-D, especially at right angles, support each other]
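The competition and support described above amount to choosing signs for the weights between units. A hypothetical sketch, with the unit indices and weight values invented purely for illustration:

```python
import numpy as np

# Hypothetical: 6 "3-D line" units, where units 0-2 are all candidate
# interpretations of the same 2-D line, so they should compete.
n_3d = 6
W = np.zeros((n_3d, n_3d))
for i in (0, 1, 2):
    for j in (0, 1, 2):
        if i != j:
            W[i, j] = -2.0   # rivals for the same 2-D line inhibit each other

# Suppose 3-D lines 2 and 3 join in 3-D: mild positive (supporting) weight.
W[2, 3] = W[3, 2] = +1.0
# Suppose 3-D lines 3 and 4 join at a right angle: strong support.
W[3, 4] = W[4, 3] = +3.0
```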

Noisy networks find better energy minima
A Hopfield net always makes decisions that reduce the energy.
– This makes it impossible to escape from local minima.
We can use random noise to escape from poor minima.
– Start with a lot of noise so it's easy to cross energy barriers.
– Slowly reduce the noise so that the system ends up in a deep minimum. This is "simulated annealing".
[Figure: an energy landscape with three points labelled A, B and C]
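A minimal sketch of simulated annealing for a binary Hopfield-style net, assuming the logistic stochastic update rule described on the next slide; the geometric cooling schedule and the particular temperatures are illustrative choices, not something the slides specify:

```python
import numpy as np

def anneal(W, b, n_steps=5000, T_start=10.0, T_end=0.1, rng=None):
    """Run stochastic updates while lowering the temperature.

    High temperature lets the state jump over energy barriers; as T falls,
    the state settles into a (hopefully deep) energy minimum.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(b)
    s = rng.integers(0, 2, size=n).astype(float)   # random initial state
    for t in range(n_steps):
        T = T_start * (T_end / T_start) ** (t / (n_steps - 1))  # geometric cooling
        i = rng.integers(n)
        gap = W[i] @ s + b[i]          # energy gap: E(s_i = 0) - E(s_i = 1)
        p_on = 1.0 / (1.0 + np.exp(-gap / T))
        s[i] = float(rng.random() < p_on)
    return s
```

Cooled slowly enough, the final state tends to lie in a deep minimum; cooled too quickly, the net behaves like the deterministic version and can get stuck in a poor one.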

Stochastic units
Replace the binary threshold units by binary stochastic units that make biased random decisions.
– The "temperature" controls the amount of noise.
– Decreasing all the energy gaps between configurations is equivalent to raising the noise level.
[Equation on slide: the stochastic update rule, with the temperature labelled]
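The equation on the slide is not legible in this transcript, but the standard rule for binary stochastic units turns unit i on with probability 1 / (1 + exp(-ΔE_i / T)), where ΔE_i = E(s_i = 0) - E(s_i = 1) is the unit's energy gap and T is the temperature. A sketch:

```python
import numpy as np

def stochastic_unit(energy_gap, T):
    """Binary stochastic unit: on with probability 1 / (1 + exp(-energy_gap / T)).

    energy_gap = E(unit off) - E(unit on); T is the "temperature".
    As T -> 0 this approaches the deterministic binary threshold unit;
    as T -> infinity every decision becomes a fair coin flip.
    """
    p_on = 1.0 / (1.0 + np.exp(-energy_gap / T))
    return float(np.random.random() < p_on)
```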

The annealing trade-off
At high temperature the transition probabilities for uphill jumps are much greater.
At low temperature the equilibrium probabilities of good states are much better than the equilibrium probabilities of bad ones.
[Figure: an uphill jump over an energy increase]
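The trade-off can be made concrete with the Boltzmann distribution: at thermal equilibrium the probability ratio of two configurations depends only on their energy difference and the temperature, p(A) / p(B) = exp(-(E_A - E_B) / T). A small numerical illustration (the energies are hypothetical):

```python
import numpy as np

# At thermal equilibrium:  p(A) / p(B) = exp(-(E_A - E_B) / T)
E_good, E_bad = 0.0, 2.0          # hypothetical energies (good state is lower)
for T in (10.0, 1.0, 0.1):
    ratio = np.exp(-(E_good - E_bad) / T)
    print(f"T={T:5.1f}: p(good)/p(bad) = {ratio:.3g}")
```

At high temperature the good state is only slightly more probable than the bad one (but uphill jumps are easy); at low temperature the good state dominates overwhelmingly (but barriers become very hard to cross).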

How temperature affects transition probabilities
[Figure: two panels showing the transition probabilities between states A and B, one at high temperature and one at low temperature]

Thermal equilibrium
Thermal equilibrium is a difficult concept!
– It does not mean that the system has settled down into the lowest energy configuration.
– The thing that settles down is the probability distribution over configurations.
The best way to think about it is to imagine a huge ensemble of systems that all have exactly the same energy function.
– The probability distribution is just the fraction of the systems that are in each possible configuration.
– We could start with all the systems in the same configuration, or with an equal number of systems in each possible configuration.
– After running the systems stochastically in the right way, we eventually reach a situation where the number of systems in each configuration remains constant, even though any given system keeps moving between configurations.
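A sketch of this ensemble picture: run many copies of a tiny two-configuration system (the configurations, energies and Metropolis-style update rule are invented for illustration) and watch the fraction of systems in each configuration stop changing, even though every individual system keeps hopping:

```python
import numpy as np

rng = np.random.default_rng(0)
E = np.array([0.0, 1.0])        # energies of a tiny 2-configuration system (hypothetical)
T = 1.0
n_systems = 100_000
states = np.zeros(n_systems, dtype=int)   # start every system in configuration 0

for step in range(1, 31):
    # Metropolis update: propose flipping to the other configuration and
    # accept with probability min(1, exp(-(E_new - E_old) / T)).
    proposed = 1 - states
    accept_prob = np.minimum(1.0, np.exp(-(E[proposed] - E[states]) / T))
    flip = rng.random(n_systems) < accept_prob
    states = np.where(flip, proposed, states)
    if step in (1, 2, 5, 30):
        print(f"step {step:2d}: fraction in configuration 1 = {states.mean():.3f}")

# Boltzmann equilibrium fraction for configuration 1:
p = np.exp(-E / T); p /= p.sum()
print(f"equilibrium fraction: {p[1]:.3f}")
```

The printed fractions converge to the Boltzmann value even though no individual system ever stops switching between the two configurations.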

An analogy
Imagine a casino in Las Vegas that is full of card dealers (we need many more than 52! of them). We start with all the card packs in standard order and then the dealers all start shuffling their packs.
– After a few time steps, the king of spades still has a good chance of being next to the queen of spades. The packs have not been fully randomized.
– After prolonged shuffling, the packs will have forgotten where they started. There will be an equal number of packs in each of the 52! possible orders.
– Once equilibrium has been reached, the number of packs that leave a configuration at each time step will be equal to the number that enter that configuration.
The only thing wrong with this analogy is that all the configurations have equal energy, so they all end up with the same probability.