
CSC2535: Computation in Neural Networks Lecture 8: Hopfield nets Geoffrey Hinton

Hopfield Nets
Networks of binary threshold units with recurrent connections are very hard to analyse. But Hopfield realized that if the connections are symmetric, there is a global energy function.
– Each “configuration” of the network has an energy.
– The binary threshold decision rule obeys the energy function: it minimizes energy locally.
Hopfield proposed that memories could be energy minima of a neural net.
– The binary threshold decision rule can then be used to “clean up” incomplete or corrupted memories.

The energy function
The global energy is the sum of many contributions. Each contribution depends on one connection weight and the binary states of the two neurons it connects. This simple quadratic energy function makes it easy to compute how the state of one neuron affects the global energy.
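
The equations themselves were images on the original slide and are missing from this transcript. As a sketch, the standard quadratic Hopfield energy for binary states s_i (0 or 1) with weights w_ij and biases b_i, and the resulting “energy gap” for a single unit, are:

    E = -\sum_i s_i b_i - \sum_{i<j} s_i s_j w_{ij}

    \Delta E_i = E(s_i = 0) - E(s_i = 1) = b_i + \sum_j s_j w_{ij}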

Settling to an energy minimum
Pick the units one at a time and flip their states if doing so reduces the global energy. (The slide shows a small example net and asks the reader to find its minima.) If units make simultaneous decisions, the energy can go up.
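
A minimal sketch of this settling procedure in Python with NumPy. The function and variable names (settle, W, b, s) are illustrative assumptions, not taken from the lecture; it uses 0/1 states and the energy gap defined above.

    import numpy as np

    def settle(W, b, s, max_sweeps=100, rng=None):
        """Asynchronously update binary threshold units until no state changes.
        W: symmetric weights with zero diagonal, b: biases, s: 0/1 state vector (updated in place)."""
        rng = np.random.default_rng() if rng is None else rng
        n = len(s)
        for _ in range(max_sweeps):
            changed = False
            for i in rng.permutation(n):            # pick units one at a time, in random order
                gap = b[i] + W[i] @ s               # energy gap E(s_i=0) - E(s_i=1); W[i, i] is 0
                new_state = 1 if gap > 0 else 0     # adopt whichever state gives lower energy
                if new_state != s[i]:
                    s[i] = new_state
                    changed = True
            if not changed:                         # a full sweep with no flips: local minimum
                break
        return s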

Storing memories
If we use activities of 1 and -1, we can store a state vector by incrementing the weight between any two units by the product of their activities.
– Treat biases as weights from a permanently-on unit.
With states of 0 and 1 the rule is slightly more complicated.
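
In symbols, the rule for ±1 activities is \Delta w_{ij} = s_i s_j. A sketch of one-shot storage in the same illustrative NumPy style as above (the function name store and the shape conventions are assumptions):

    import numpy as np

    def store(patterns):
        """Hebbian one-shot storage for patterns of shape (num_patterns, num_units) in {-1, +1}."""
        patterns = np.asarray(patterns, dtype=float)
        n = patterns.shape[1]
        W = np.zeros((n, n))
        for p in patterns:
            W += np.outer(p, p)        # w_ij += s_i * s_j for every pair of units
        np.fill_diagonal(W, 0)         # no self-connections
        return W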

Spurious minima
Each time we memorize a configuration, we hope to create a new energy minimum. But what if two nearby minima merge to create a minimum at an intermediate location? This limits the capacity of a Hopfield net. Using Hopfield's storage rule, the capacity of a totally connected net with N units is only 0.15N memories.
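
As a quick worked example of that capacity figure: a fully connected net with N = 1000 units has about N(N-1)/2 ≈ 500,000 weights, yet by this rule it can store only about 0.15 × 1000 = 150 memories.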

Better storage rules
We could improve efficiency by using sparse vectors.
– It is optimal to have log N bits on in each vector and the rest off. This gives a good number of useful bits per bit of weight, provided we adjust the thresholds dynamically during retrieval.
Instead of trying to store vectors in one shot as Hopfield does, cycle through the training set and use the perceptron convergence procedure to train each unit to take the correct state given the states of all the other units in that vector (see the sketch below).
– This uses the capacity of the weights efficiently.
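
A rough sketch of that second idea in the same illustrative NumPy style (every name here is an assumption; in particular, symmetrizing the weights at the end is an extra step, since plain per-unit perceptron training does not keep W symmetric):

    import numpy as np

    def train_by_perceptron(patterns, epochs=50, lr=1.0):
        """Cycle through +/-1 patterns; train each unit to output its correct state
        given the states of all the other units in that pattern."""
        patterns = np.asarray(patterns, dtype=float)
        n = patterns.shape[1]
        W = np.zeros((n, n))
        b = np.zeros(n)
        for _ in range(epochs):
            for s in patterns:
                for i in range(n):
                    net = b[i] + W[i] @ s - W[i, i] * s[i]   # input from all the other units
                    pred = 1.0 if net >= 0 else -1.0
                    if pred != s[i]:                          # perceptron convergence update
                        b[i] += lr * s[i]
                        W[i] += lr * s[i] * s
                        W[i, i] = 0.0
        W = (W + W.T) / 2                                     # force symmetry (assumption)
        return W, b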

Avoiding spurious minima by unlearning
Hopfield, Feinstein and Palmer suggested the following strategy:
– Let the net settle from a random initial state and then do unlearning (sketched below).
– This will get rid of deep, spurious minima and increase memory capacity.
– Crick and Mitchison proposed unlearning as a model of what dreams are for.
But how much unlearning should we do?
– And can we analyse what unlearning achieves?
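
A minimal sketch of one unlearning step under the conventions used above (the anti-Hebbian form, the step size epsilon, and the reuse of the settle function from the earlier sketch are all assumptions):

    import numpy as np

    def unlearn_once(W, b, epsilon=0.01, rng=None):
        """Settle from a random state, then slightly weaken the minimum the net fell into."""
        rng = np.random.default_rng() if rng is None else rng
        s = rng.integers(0, 2, size=len(b))      # random initial 0/1 state
        settle(W, b, s)                          # fall into some (possibly spurious) minimum
        s_pm = 2 * s - 1                         # convert to -1/+1 for the Hebbian-style update
        W -= epsilon * np.outer(s_pm, s_pm)      # anti-Hebbian: unlearn the settled state
        np.fill_diagonal(W, 0)
        return W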

Wishful thinking?
Wouldn't it be nice if interleaved learning and unlearning corresponded to maximum-likelihood fitting of a model to the training set?
– This seems improbable, especially if we want to include hidden units whose states are not specified by the vectors to be stored.

Another computational role for Hopfield nets
Instead of using the net to store memories, use it to construct interpretations of sensory input.
– The input is represented by the visible units.
– The interpretation is represented by the states of the hidden units.
– The badness of the interpretation is represented by the energy.
This raises two difficult issues:
– How do we escape from poor local minima to get good interpretations?
– How do we learn the weights on connections to the units?
(Diagram: visible units represent the inputs; hidden units represent an interpretation of the inputs.)

Stochastic units make search easier
Replace the binary threshold units by binary stochastic units, and use temperature to make it easier to cross energy barriers.
– Start at a high temperature, where it is easy to cross energy barriers.
– Reduce the temperature slowly to a low value, where good states are much more probable than bad ones.
(Diagram: an energy landscape with states A, B and C separated by barriers.)
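
The stochastic update rule itself is not in the transcript; the standard form for a binary stochastic unit with energy gap \Delta E_i at temperature T is

    p(s_i = 1) = \frac{1}{1 + e^{-\Delta E_i / T}}

so at high T the unit behaves almost randomly, and as T \to 0 it approaches the deterministic binary threshold rule.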

The annealing trade-off
At high temperature, the transition probabilities for uphill jumps are much greater.
At low temperature, the equilibrium probabilities of good states are much higher than those of bad ones.
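
Quantitatively (a standard result rather than something spelled out in the transcript), at thermal equilibrium the relative probability of two states A and B depends only on their energy difference and the temperature:

    \frac{p(A)}{p(B)} = e^{-(E_A - E_B)/T}

so lowering T makes low-energy states increasingly dominant, but also makes uphill transitions increasingly rare.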

Why annealing works
In high dimensions, energy barriers are typically much more degenerate than the minima that they separate. At high temperature the free energy of the barrier is lower than the free energy of the minima.
(Diagram: energy E and free energy F of minima A and C separated by a degenerate barrier B, at high and low temperature.)
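
A sketch of the argument in terms of the standard definition of free energy (the formula is not in the transcript):

    F = \langle E \rangle - T S

where S is the entropy of a set of states. A degenerate barrier contains many states of similar energy, so it has high entropy; at high temperature the -TS term therefore pulls its free energy below that of a narrow, low-entropy minimum, which makes the barrier easy to cross.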