Boltzmann Machines (Stochastic Hopfield Machines), Lecture 11e: https://class.coursera.org/neuralnets-2012-001/lecture/131
Document classification given binary vectors. Example: monitoring a nuclear power station, where you don't want to have to rely on positive examples!
Two ways a model can generate data:
1) Causal model: first generate the latent variables (hidden units), then use them to generate the visible states.
2) Boltzmann machine: an energy-based model, where the probability of a joint configuration of visible and hidden units is defined by its energy rather than by a generative pass.
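A minimal sketch (not from the slides) of the two stories, assuming logistic binary units; the names (W_gen for the causal model's generative weights, a symmetric W over all units for the Boltzmann machine) are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    # Causal (belief-net) story: sample the latent variables from their priors
    # first, then generate the visible vector from them.
    def generate_causal(W_gen, b_h, b_v):
        h = (rng.random(b_h.size) < sigmoid(b_h)).astype(float)
        v = (rng.random(b_v.size) < sigmoid(b_v + W_gen @ h)).astype(float)
        return v, h

    # Boltzmann-machine story: no generative pass; every joint configuration of
    # visible and hidden units gets an energy, and its probability is
    # proportional to exp(-E).
    def energy(s, W, b):
        # s: states of all units (visible + hidden); W: symmetric weights with zero diagonal
        return -(0.5 * s @ W @ s + b @ s)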
What to do when the network is big
What is Needed for Learning:
Learning in Boltzmann Machines (Lecture 12a)
Modelling the input vectors: there are no labels; we want to build a model of a set of input vectors.
Given that each weight needs to know about all the other weights, it is very surprising that there is such a simple learning algorithm:
The rule compares two statistics, delta w_ij proportional to <s_i s_j>_data - <s_i s_j>_model: how often units i and j are on together when a data vector v is clamped on the visible units, versus how often i and j are on together when nothing is clamped and the network samples from its own distribution.
The first term in the rule says: raise the weights in proportion to the product of the activities of the units (Hebbian learning). But if we used only this term, the weights would all become positive and the whole system would blow up. So the second term says: decrease the weights in proportion to how often the units are on together when you are sampling from the model's own distribution. An alternative view is that the first term is like the storage term for a Hopfield net, and the second term is the unlearning term for getting rid of spurious minima; this rule tells you exactly how much unlearning to do.
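A minimal numpy sketch of the two-term rule described above, assuming the unit states have already been collected at (approximate) thermal equilibrium in the clamped and free phases; the function name and learning rate are illustrative:

    import numpy as np

    def boltzmann_weight_update(s_clamped, s_free, lr=0.01):
        # s_clamped: (N, n_units) unit states sampled with data clamped on the visibles
        # s_free:    (M, n_units) unit states sampled freely from the model
        pos = s_clamped.T @ s_clamped / len(s_clamped)  # <s_i s_j> data-clamped (Hebbian / storage term)
        neg = s_free.T @ s_free / len(s_free)           # <s_i s_j> from the model (unlearning term)
        return lr * (pos - neg)                         # delta w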
Unlearning to get rid of the spurious minima
- Sampling how often two units are on together = measuring the correlation between the two units.
- Repeat over all the data vectors.
- You expect the energy landscape to have many different minima that are fairly separated and have about the same energy.
- The goal is to model a set of images that all have about the same (low) energy, while unreasonable images get very high energy.
Restricted Boltzmann Machines (Lecture 12c)
Much simplified architecture: no connections between hidden units. If the visible units are given, the equilibrium distribution of the hidden units can be computed in one step, because the hidden units are all independent of one another given the states of the visible units. The proper Boltzmann machine learning algorithm is still slow for a restricted Boltzmann machine. In 1998, Hinton found a shortcut for Boltzmann machines; it is only an approximation, but it works well in practice and caused a resurgence of interest in this area.
Note that this does not depend on what the other units are doing, so it can all be computed in parallel.
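The slide's formula is not in the text; as a sketch, the standard RBM conditional p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij) can be computed for the whole hidden layer at once (variable names are assumptions):

    import numpy as np

    def hidden_probs(v, W, b_h):
        # Each hidden unit sees only the visible vector, never the other hidden
        # units, so the whole layer is one matrix-vector product plus a sigmoid.
        return 1.0 / (1.0 + np.exp(-(b_h + v @ W)))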
Fantasy particles = global configurations. After each weight update, you update the fantasy particles a little, which should bring them back close to equilibrium. This algorithm works very well at building density models.
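A minimal sketch of keeping a batch of fantasy particles (persistent Markov chains) and nudging them after each weight update; the number of Gibbs steps and the sampling details are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    def refresh_fantasy_particles(V, W, b_v, b_h, n_gibbs=1):
        # Run a few steps of alternating Gibbs sampling from the current particles,
        # which keeps them close to the model's equilibrium distribution.
        for _ in range(n_gibbs):
            H = (rng.random((len(V), b_h.size)) < sigmoid(b_h + V @ W)).astype(float)
            V = (rng.random((len(V), b_v.size)) < sigmoid(b_v + H @ W.T)).astype(float)
        return V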
An alternative but much faster algorithm:
Contrastive divergence (Hinton, 2002).
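A minimal sketch of the one-step shortcut (CD-1) on a mini-batch; using probabilities rather than binary samples for the reconstruction, and the learning rate, are illustrative choices, not taken from the slides:

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    def cd1_update(V0, W, b_v, b_h, lr=0.01):
        # v0 -> h0 -> v1 -> h1: compare pairwise statistics at the data and after
        # one reconstruction, instead of waiting for equilibrium.
        H0p = sigmoid(b_h + V0 @ W)
        H0 = (rng.random(H0p.shape) < H0p).astype(float)  # sample binary hidden states
        V1 = sigmoid(b_v + H0 @ W.T)                       # reconstruction (probabilities)
        H1p = sigmoid(b_h + V1 @ W)
        dW = (V0.T @ H0p - V1.T @ H1p) / len(V0)           # <v h>_data - <v h>_recon
        return W + lr * dW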
Example of Contrastive Divergence (Lecture 12d)
RBMs for Collaborative Filtering (Lecture 12e)