Energy models and Deep Belief Networks


Energy models and Deep Belief Networks
- Definition and examples; overcomplete ICA: generative vs. energy-based models
- Learning: contrastive divergence (and score matching)
- Restricted Boltzmann Machines
- Deep Belief Networks: infinite networks <-> RBMs; learning: greedy layer-wise training + wake-sleep phase

Deep Belief Networks
Demo: http://www.cs.toronto.edu/~hinton/digits.html
Other datasets: http://www.cs.toronto.edu/~roweis/data.html

Goal: the target architecture (figure on the original slide)

Ingredient 1
- Write code to perform inference and CD learning in an RBM (a minimal sketch follows below).
- Test that, given the weights and a visible vector v, you obtain the correct distribution over h (both probabilities and samples); do the same for v given h.
- Test that, starting from a perturbed version of the correct weights w, the learning algorithm comes back to the minimum.
- Optional: add weight-decay and momentum terms.
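A minimal NumPy sketch of such an RBM with CD-1 learning. The class name, initialization scale, and the optional decay/momentum arguments are illustrative assumptions, not part of the original exercise.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_vis, n_hid):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b_vis = np.zeros(n_vis)
        self.b_hid = np.zeros(n_hid)

    def p_h_given_v(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def p_v_given_h(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def sample(self, p):
        return (rng.random(p.shape) < p).astype(float)

    def cd1_step(self, v0, lr=0.1, decay=0.0, momentum=0.0, prev_dW=0.0):
        # Positive phase: hidden probabilities/samples given the data
        ph0 = self.p_h_given_v(v0)
        h0 = self.sample(ph0)
        # Negative phase: one Gibbs step (the "reconstruction")
        pv1 = self.p_v_given_h(h0)
        ph1 = self.p_h_given_v(pv1)
        # CD-1 update, with optional weight decay and momentum
        dW = lr * ((v0.T @ ph0 - pv1.T @ ph1) / len(v0) - decay * self.W) + momentum * prev_dW
        self.W += dW
        self.b_vis += lr * (v0 - pv1).mean(axis=0)
        self.b_hid += lr * (ph0 - ph1).mean(axis=0)
        return dW
```

The unit tests suggested on the slide can then compare `p_h_given_v` / `p_v_given_h` and the empirical sample frequencies against probabilities computed by brute-force enumeration on a very small model.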

Ingredient 2
- Write code to perform inference and CD learning in an RBM with softmax label units (a sketch of the pieces that change follows below).
- Define a simple model with 2 labels, 2 hidden units, and 4 input units; label 1 generates either [1,0,0,0] or [0,1,0,0]; label 2 generates [0,0,1,0] or [0,0,0,1].
- Verify that generating with a fixed label gives you the correct input patterns.
- Verify that, after learning, the label is inferred correctly in all 4 cases.
- Optional: add weight-decay and momentum terms.
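A hedged sketch of what changes relative to Ingredient 1: the toy dataset from the slide, and a visible-layer sampler in which the label units form a single softmax group appended to the visible layer. It assumes the `RBM` class and `sigmoid` from the previous sketch; the layout and helper name are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from the slide: 4 input patterns, 2 classes (one-hot labels)
X = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
y = np.array([0, 0, 1, 1])
labels = np.eye(2)[y]
visible_data = np.hstack([X, labels])   # inputs and labels share the visible layer

def sample_visible_with_labels(rbm, h, n_inputs, n_labels):
    """Sample v given h when the last n_labels visible units are one softmax group."""
    act = h @ rbm.W.T + rbm.b_vis                              # input to all visible units
    v_inp = (rng.random(n_inputs) < sigmoid(act[:n_inputs])).astype(float)
    logits = act[n_inputs:n_inputs + n_labels]
    p_lab = np.exp(logits - logits.max())
    p_lab /= p_lab.sum()                                       # softmax over the label units
    lab = np.zeros(n_labels)
    lab[rng.choice(n_labels, p=p_lab)] = 1.0                   # exactly one label unit on
    return np.concatenate([v_inp, lab])
```

Generation with a clamped label and label inference can then reuse this sampler: clamp the label block and Gibbs-sample the input block, or clamp the input block and read off the softmax probabilities of the label block.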

Greedy learning
- At this point you can already train the DBN with the greedy layer-wise algorithm (a sketch follows below). The input to each RBM is given by the probabilities of the hidden units of the previous layer.
- Begin with a few letter classes from the small dataset binaryalphadigs.mat (100 units per layer should be enough).
- Try to generate letters by clamping one of the labels.
- Given all examples of one class, how invariant is the representation in the various layers? Does that change if you learn a model without labels?
- If you want to use the MNIST dataset, divide the data into balanced mini-batches (e.g., 10 examples from each class).
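A minimal sketch of the greedy stacking loop, assuming the `RBM` class from the Ingredient 1 sketch; layer sizes, epoch count, and function name are illustrative.

```python
def train_dbn_greedy(data, layer_sizes=(100, 100, 100), epochs=50, lr=0.1):
    """Train a stack of RBMs bottom-up. Each RBM sees the hidden-unit
    probabilities of the layer below as its 'visible' data."""
    rbms, layer_input = [], data
    for n_hid in layer_sizes:
        rbm = RBM(n_vis=layer_input.shape[1], n_hid=n_hid)
        for _ in range(epochs):
            rbm.cd1_step(layer_input, lr=lr)
        rbms.append(rbm)
        # Propagate probabilities (not samples) up to feed the next RBM
        layer_input = rbm.p_h_given_v(layer_input)
    return rbms
```

For the labelled version on the slide, the label units are concatenated to the input of the top RBM only, as in the previous sketch.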

Classification
Try to classify the input letters in two ways (a free-energy sketch follows below):
- easy, but inaccurate: use the recognition model, set uniform probabilities over the labels, and look at the probabilities over the labels at equilibrium;
- harder, but more accurate: take the representation at the top hidden layer, compute the free energy for that state with each label lit in turn, and pick the label with the lowest free energy (Teh & Hinton, 2001).
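A hedged sketch of the "harder, more accurate" route: clamp the penultimate-layer representation together with each candidate label on the visible layer of the top RBM, compute the free energy, and pick the label with the lowest value. The `top_rbm` dict and function names are illustrative assumptions.

```python
import numpy as np

def free_energy(v, W, b_vis, b_hid):
    """Free energy F(v) = -b_vis.v - sum_j log(1 + exp(b_hid_j + v.W_j))
    of a binary RBM with the visible vector v clamped."""
    return -(v @ b_vis) - np.logaddexp(0.0, v @ W + b_hid).sum()

def classify(h_pen, top_rbm, n_labels):
    """h_pen: penultimate-layer representation for one example.
    top_rbm: dict with the top RBM's parameters over [h_pen, labels]."""
    energies = []
    for k in range(n_labels):
        lab = np.zeros(n_labels)
        lab[k] = 1.0
        v = np.concatenate([h_pen, lab])        # representation + candidate label
        energies.append(free_energy(v, top_rbm['W'], top_rbm['b_vis'], top_rbm['b_hid']))
    return int(np.argmin(energies))             # lowest free energy = most probable label
```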

Up-down algorithm
- Write code to refine the greedy solution using the up-down algorithm; see Appendix B in (Hinton et al., 2006). A structural sketch follows below.
- Increase the number of Gibbs iterations at the top level as learning proceeds.
- Optional: how does accuracy depend on the architecture? How good can one get by just greedily learning many layers?
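A structural sketch of one up-down step for a DBN with layers v -> h1 -> h2 plus a softmax label group joined to the top RBM, following the organization of Appendix B in Hinton, Osindero & Teh (2006). All parameter names and shapes are illustrative assumptions, and bias updates are omitted for brevity; the appendix remains the reference for the full update.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def up_down_step(p, v_data, labels, lr=0.01, n_cd=3):
    """One up-down step. p is a dict of recognition (rec_*), generative (gen_*)
    and top-level RBM (top_*) parameters; v_data, labels are one mini-batch."""
    # ---- Up-pass: sample hidden states with the recognition weights ----
    wake_h1 = sample(sigmoid(v_data @ p['rec_vh1'] + p['rec_b1']))
    wake_h2 = sample(sigmoid(wake_h1 @ p['rec_h1h2'] + p['rec_b2']))
    wake_top = sample(sigmoid(wake_h2 @ p['top_h2t'] + labels @ p['top_lt'] + p['top_bt']))

    # ---- Contrastive divergence in the top-level associative memory ----
    pos_h2top = wake_h2.T @ wake_top
    pos_labtop = labels.T @ wake_top
    neg_top = wake_top
    for _ in range(n_cd):                      # slide: increase n_cd as learning proceeds
        neg_h2 = sample(sigmoid(neg_top @ p['top_h2t'].T + p['top_bh2']))
        neg_lab = softmax(neg_top @ p['top_lt'].T + p['top_blab'])
        neg_top = sample(sigmoid(neg_h2 @ p['top_h2t'] + neg_lab @ p['top_lt'] + p['top_bt']))
    p['top_h2t'] += lr * (pos_h2top - neg_h2.T @ neg_top)
    p['top_lt'] += lr * (pos_labtop - neg_lab.T @ neg_top)

    # ---- Down-pass: generate with the generative weights ----
    sleep_h2 = neg_h2
    sleep_h1 = sample(sigmoid(sleep_h2 @ p['gen_h2h1'] + p['gen_b1']))
    sleep_v = sigmoid(sleep_h1 @ p['gen_h1v'] + p['gen_bv'])

    # ---- Wake phase: update generative weights from up-pass states ----
    pv = sigmoid(wake_h1 @ p['gen_h1v'] + p['gen_bv'])
    ph1 = sigmoid(wake_h2 @ p['gen_h2h1'] + p['gen_b1'])
    p['gen_h1v'] += lr * wake_h1.T @ (v_data - pv)
    p['gen_h2h1'] += lr * wake_h2.T @ (wake_h1 - ph1)

    # ---- Sleep phase: update recognition weights from down-pass states ----
    ph2_rec = sigmoid(sleep_h1 @ p['rec_h1h2'] + p['rec_b2'])
    ph1_rec = sigmoid(sleep_v @ p['rec_vh1'] + p['rec_b1'])
    p['rec_h1h2'] += lr * sleep_h1.T @ (sleep_h2 - ph2_rec)
    p['rec_vh1'] += lr * sleep_v.T @ (sleep_h1 - ph1_rec)
```

After greedy pre-training, the recognition and generative weights of each directed layer start out tied to the corresponding RBM weights; the up-down procedure then untied and fine-tunes them as sketched above.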