1 The Boltzmann Machine Psych 419/719 March 1, 2001

2 Recall Constraint Satisfaction. We have a network of units and connections. Finding an optimal state involves relaxation: letting the network settle into a configuration that maximizes a goodness function. This is done by annealing.
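
For concreteness, here is a minimal sketch of such a goodness function, assuming binary unit states s, a symmetric weight matrix W with a zero diagonal, and bias terms b (these names and the Python framing are illustrative, not from the slides):

```python
import numpy as np

def goodness(s, W, b):
    """Goodness of a binary state vector s under symmetric weights W
    (zero diagonal) and biases b: each co-active connected pair
    contributes its weight once, plus a bias term per active unit."""
    return 0.5 * s @ W @ s + b @ s
```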

3 Simulated Annealing Update unit states according to a probability distribution, which is based on:
– The input to the unit. Higher input = greater odds of being on.
– The temperature. High temperature = more random. Low temperature = deterministic function of input.
Start with high temperature, and gradually reduce it.
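
A hedged sketch of this update rule, assuming binary units, a logistic probability of turning on, and an arbitrary cooling schedule (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def update_unit(i, s, W, b, T):
    """Stochastically set unit i: higher net input -> greater odds of being on;
    high temperature T -> more random; low T -> nearly deterministic."""
    net = W[i] @ s + b[i]
    p_on = 1.0 / (1.0 + np.exp(-net / T))
    s[i] = 1 if rng.random() < p_on else 0

def anneal(s, W, b, schedule=(10.0, 5.0, 2.0, 1.0, 0.5)):
    """Start hot and gradually cool, sweeping over all units at each temperature."""
    for T in schedule:
        for i in rng.permutation(len(s)):
            update_unit(i, s, W, b, T)
    return s
```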

4 Constraint Satisfaction Networks Have Nice Properties
– Can settle into stable configurations based on partial or noisy information
– Can do pattern completion
– Have well-formed attractors corresponding to stable states
BUT: How can we make a network learn?

5 What about Backprop? Two problems:
– Tends to split the probability distributions. If the input is ambiguous (say, the word LEAD), the output reflects that distribution, unlike the Necker cube.
– Not very biologically plausible. Error gradients travel backwards along connections; neurons don't seem to do this.

6 We Need Hidden Units Hidden units are needed to solve XOR-style problems. In these networks, we have a set of symmetric connections between units. Some units are visible and others are hidden.
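
A sketch of that setup, with arbitrary (illustrative) sizes: one symmetric weight matrix over all units, no self-connections, and the first units designated as visible. The later sketches below reuse these names.

```python
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 8, 4
n = n_visible + n_hidden

W = rng.normal(0.0, 0.1, size=(n, n))
W = (W + W.T) / 2            # symmetric connections
np.fill_diagonal(W, 0.0)     # no self-connections
b = np.zeros(n)

visible = slice(0, n_visible)    # the first n_visible units are visible
hidden = slice(n_visible, n)     # the rest are hidden
```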

7 The Boltzmann Machine: Memorizing Patterns Here, we want to train the network on a set of patterns. We want the network to learn about the statistics and relationships between the parts of the patterns. It is not really performing an explicit mapping (the kind of task backprop is good for).

8 How it Works
Step 1. Pick an example.
Step 2. Run network in positive phase.
Step 3. Run network in negative phase.
Step 4. Compare the statistics of the two phases.
Step 5. Update the weights based on statistics.
Step 6. Go to Step 1 and repeat.
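
A sketch of that loop, assuming the helper routines settle_clamped, settle_free, coactivation_stats, and update_weights sketched under the following slides (the step numbering matches the list above):

```python
def train(patterns, W, b, k=0.01, n_sweeps=1000):
    """One pass of the recipe above, repeated; sketch only."""
    for _ in range(n_sweeps):
        pattern = patterns[rng.integers(len(patterns))]              # Step 1: pick an example
        p_plus = coactivation_stats(settle_clamped(pattern, W, b))   # Step 2: positive phase
        p_minus = coactivation_stats(settle_free(W, b))              # Step 3: negative phase
        W = update_weights(W, p_plus, p_minus, k)                    # Steps 4-5: compare and update
    return W                                                         # Step 6: loop and repeat
```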

9 Step 1: Pick Example Pretty simple. Just select an example at random.

10 Step 2. The Positive Phase Clamp our visible units with the pattern specified by our current example. Let the network settle using the simulated annealing method. Record the outputs of the units. Start again with our example, settling again and recording the units again.
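
Continuing the running sketch (reusing update_unit, rng, n, n_visible, and visible from above), the positive phase might look like:

```python
def settle_clamped(pattern, W, b, n_runs=10, schedule=(10.0, 5.0, 2.0, 1.0, 0.5)):
    """Positive phase: clamp the visible units to the pattern, anneal only
    the hidden units, and record the full state after each settle."""
    states = []
    for _ in range(n_runs):
        s = np.zeros(n, dtype=int)
        s[visible] = pattern                   # clamped: never updated below
        for T in schedule:
            for i in rng.permutation(np.arange(n_visible, n)):   # hidden units only
                update_unit(i, s, W, b, T)
        states.append(s.copy())
    return np.array(states)
```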

11 Step 3. The Negative Phase Here, we don’t clamp the network units. We just let it settle to some state as before. Do this several times, again recording the unit outputs.
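
And the matching free-running phase, again reusing the earlier helpers:

```python
def settle_free(W, b, n_runs=10):
    """Negative phase: nothing is clamped; start from a random state,
    anneal every unit, and record where the network settles."""
    states = []
    for _ in range(n_runs):
        s = rng.integers(0, 2, size=n)
        anneal(s, W, b)
        states.append(s.copy())
    return np.array(states)
```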

12 Step 4. Compare Statistics For each pair of units, we compute the odds that both units are coactive (both on) in the positive phase. Do the same for the negative phase. If we have n units, this gives us two n x n matrices of probabilities, where p_ij is the probability that units i and j are both on.
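
In the running sketch, those co-activation probabilities can be estimated directly from the recorded states:

```python
def coactivation_stats(states):
    """p[i, j]: the fraction of recorded states in which units i and j
    are both on (an n x n matrix; the diagonal holds single-unit rates)."""
    states = np.asarray(states, dtype=float)
    return (states.T @ states) / len(states)
```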

13 Step 5: Update Weights Change each weight according to the difference between the co-activation probabilities in the positive and negative phases: Δw_ij = k (p⁺_ij − p⁻_ij). Here, k is like a learning rate.
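
In the running sketch, that update is:

```python
def update_weights(W, p_plus, p_minus, k=0.01):
    """Raise weights between pairs that are co-active more often with the
    pattern clamped than when free-running; lower them in the opposite case."""
    W = W + k * (p_plus - p_minus)
    np.fill_diagonal(W, 0.0)    # keep self-connections at zero
    return W
```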

14 Why it Works This reduces the difference between what the network settles to when the inputs are clamped and what it settles to when it is allowed to free-run. So, the weights learn about what kinds of visible units go together. It also recruits hidden units to help learn higher-order relationships.

15 Can Be Used For Mappings Too Here, the positive phase involves clamping both the input and output units and letting the network settle. The negative phase involves clamping just the input units. The network learns that, given the input, it should settle to a state where the output units are what they should be.

16 Contrastive Hebbian Learning Very similar to a normal Boltzmann machine, except we can have units whose outputs are a deterministic function of their input (like the logistic). As before, we have two phases: positive and negative.
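
A minimal sketch of such a deterministic unit, replacing the stochastic rule above with the logistic of the net input (here a holds real-valued activations; the name is illustrative):

```python
def update_unit_deterministic(i, a, W, b):
    """Deterministic variant: the unit's output is the logistic of its
    net input (no temperature, no coin flip)."""
    net = W[i] @ a + b[i]
    a[i] = 1.0 / (1.0 + np.exp(-net))
```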

17 Contrastive Hebbian Learning Rule Weight updates are based on actual unit outputs, not the probabilities that they're both on.
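
The rule being described is presumably the standard contrastive Hebbian update, Δw_ij = k (a⁺_i a⁺_j − a⁻_i a⁻_j), where a⁺ and a⁻ are the activations at the end of the positive and negative settles. Continuing the sketch:

```python
def chl_update(W, a_plus, a_minus, k=0.01):
    """Contrastive Hebbian update: use the actual activations from the two
    settled states rather than co-activation probabilities."""
    W = W + k * (np.outer(a_plus, a_plus) - np.outer(a_minus, a_minus))
    np.fill_diagonal(W, 0.0)
    return W
```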

18 Problems
– Weight explosion. If weights get too big too early, the network will get stuck in one goodness optimum. Can be alleviated with weight decay.
– Settling time. Time to process an example is long, due to the settling process.
– Learning time. Takes a lot of presentations to learn.
– Symmetric weights? Phases?

19 Sleep? It has been suggested that something like the minus phase might be happening during sleep: spontaneous correlations between hidden units (not those driven by external input) get subtracted off and will vanish unless driven by external input while awake. There is not a lot of evidence to support this conjecture. We can learn while awake!

20 For Next Time Optional reading handed out. Ends the section on learning internal representations. Next: biologically plausible learning. Remember:
– No class next Thursday
– Homework 3 due March 13
– Project proposal due March 15. See web page.

