Learning Deep Generative Models by Ruslan Salakhutdinov


Learning Deep Generative Models by Ruslan Salakhutdinov Frank Jiang COS 598E: Unsupervised Learning

Motivation Deep, non-linear models can capture powerful, complex representations of the data through non-linear distributed representations (as opposed to mixture models). However, they are hard to optimize because their loss functions are non-convex with many poor local minima, and the quantity of labelled data is small compared to the large unlabelled datasets available. We therefore want to learn structure from unlabelled data and use it for discriminative tasks.

Solution In 2006, Hinton et al. introduced a fast, greedy learning algorithm for deep generative models called Deep Belief Networks (DBNs), based on the idea of stacking Restricted Boltzmann Machines (RBMs). In 2009, Salakhutdinov and Hinton introduced Deep Boltzmann Machines (DBMs), which extend DBNs.

Energy-Based Models We model a probability distribution through an energy function: P(x) = exp(-E(x)) / Z, where Z = sum_x exp(-E(x)) is the partition function. The lower the energy, the higher the probability. We want to shape the energy manifold so that it has troughs around actual data and peaks elsewhere.
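As a toy illustration (not from the slides), the sketch below enumerates a tiny binary space and turns an arbitrary, made-up energy function into probabilities via the Boltzmann distribution; every name and value here is invented for the example.

```python
# Toy energy-based model: lower energy -> higher probability.
import numpy as np

def energy(x):
    # Invented energy: low near the pattern [1, 1, 0], high elsewhere.
    target = np.array([1, 1, 0])
    return np.sum((x - target) ** 2)

# Enumerate all 3-bit binary vectors (only feasible for tiny spaces).
states = np.array([[int(b) for b in format(i, "03b")] for i in range(8)])
energies = np.array([energy(x) for x in states])

# Boltzmann distribution: P(x) = exp(-E(x)) / Z
unnormalized = np.exp(-energies)
Z = unnormalized.sum()
probs = unnormalized / Z

for x, e, p in zip(states, energies, probs):
    print(x, f"E={e:.1f}", f"P={p:.3f}")  # lower energy -> higher probability
```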

Restricted Boltzmann Machines An RBM is a Markov random field with a two-layer architecture: a visible layer and a hidden layer. Each layer consists of stochastic binary variables. Edges connect every hidden node to every visible node, but there are no edges within a layer.

Restricted Boltzmann Machines RBMs are defined by the following energy function: E(v, h) = - sum_i b_i v_i - sum_j a_j h_j - sum_{i,j} v_i W_ij h_j, where W are the weights and b, a are the visible and hidden biases. The joint distribution over the hidden and visible layers is: P(v, h) = exp(-E(v, h)) / Z, with partition function Z = sum_{v,h} exp(-E(v, h)).

Restricted Boltzmann Machines The marginal distribution over the visible layer is: P(v) = sum_h P(v, h) = (1/Z) sum_h exp(-E(v, h)). Due to the bipartite structure of the RBM, the hidden variables can be explicitly marginalized out: P(v) = (1/Z) exp(sum_i b_i v_i) prod_j (1 + exp(a_j + sum_i W_ij v_i)).
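A minimal sketch of this marginalization, assuming a binary RBM with weight matrix W and bias vectors b_vis and b_hid (the names are mine, not the paper's): the "free energy" F(v) collects the analytically summed-out hidden terms, so that P(v) is proportional to exp(-F(v)).

```python
import numpy as np

def free_energy(v, W, b_vis, b_hid):
    """F(v) = -b^T v - sum_j log(1 + exp(a_j + sum_i v_i W_ij)),
    so that P(v) is proportional to exp(-F(v))."""
    hidden_term = np.logaddexp(0.0, b_hid + v @ W).sum()
    return -v @ b_vis - hidden_term

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 4))   # 6 visible units, 4 hidden units
b_vis = np.zeros(6)
b_hid = np.zeros(4)
v = rng.integers(0, 2, size=6).astype(float)
print("unnormalized log P(v) =", -free_energy(v, W, b_vis, b_hid))
```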

RBM – Conditional Distributions These equations give us the following important and very intuitive conditional distributions, which factorize over the units in each layer: P(h_j = 1 | v) = σ(a_j + sum_i W_ij v_i) and P(v_i = 1 | h) = σ(b_i + sum_j W_ij h_j), where σ(x) = 1 / (1 + exp(-x)) is the logistic sigmoid.
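These factorized conditionals are what make sampling in an RBM cheap. A hedged NumPy sketch, with helper names of my own choosing:

```python
# P(h_j = 1 | v) = sigmoid(a_j + sum_i W_ij v_i)
# P(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b_hid, rng):
    p_h = sigmoid(b_hid + v @ W)          # independent Bernoullis given v
    return (rng.random(p_h.shape) < p_h).astype(float), p_h

def sample_visible(h, W, b_vis, rng):
    p_v = sigmoid(b_vis + h @ W.T)        # independent Bernoullis given h
    return (rng.random(p_v.shape) < p_v).astype(float), p_v
```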

RBM – Learning Procedure We want to maximize the probability of the data under this model, so we perform maximum likelihood estimation on the marginal P(v). Given a set of observations {v_n}, n = 1, ..., N, the derivative of the log-likelihood with respect to the weights is: (1/N) sum_n d log P(v_n) / d W_ij = E_data[v_i h_j] - E_model[v_i h_j], i.e. the difference between the data expectation (with h drawn from P(h | v)) and the model expectation under the joint distribution.

RBM – Learning Procedure Unfortunately, computing the model expectation requires time exponential in the number of variables. In 2002, Hinton introduced an algorithm called Contrastive Divergence, which approximates the model expectation with a sample from a Gibbs chain that is initialized at the data and run for T steps. In practice, setting T = 1 has yielded good results. Contrastive Divergence can be implemented using either expectations or samples.
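A minimal CD-1 update sketch for a single training vector, under the same naming assumptions as above; this illustrates the update rule rather than reproducing the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v_data, W, b_vis, b_hid, lr=0.05, rng=np.random.default_rng(0)):
    # Positive phase: hidden probabilities driven by the data.
    p_h_data = sigmoid(b_hid + v_data @ W)
    h_sample = (rng.random(p_h_data.shape) < p_h_data).astype(float)

    # Negative phase: one step of Gibbs sampling (the "T = 1" approximation).
    p_v_model = sigmoid(b_vis + h_sample @ W.T)
    v_model = (rng.random(p_v_model.shape) < p_v_model).astype(float)
    p_h_model = sigmoid(b_hid + v_model @ W)

    # Approximate gradient: data expectation minus model expectation.
    W += lr * (np.outer(v_data, p_h_data) - np.outer(v_model, p_h_model))
    b_vis += lr * (v_data - v_model)
    b_hid += lr * (p_h_data - p_h_model)
    return W, b_vis, b_hid
```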

Inference Inference uses the same alternating Gibbs sampling to generate samples of either the visible data or the hidden variables. Running the Gibbs sampler for one step with the visible variables fixed yields the hidden variables corresponding to that visible data. Running the Gibbs sampler for one step with the hidden variables fixed yields a representation of what that set of hidden variables encodes. Running the Gibbs sampler for a long time from a random initialization yields an approximate sample from the joint distribution of the RBM.
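A sketch of the alternating Gibbs chain just described, assuming the same binary RBM parameterization as in the earlier sketches; a long chain from a random start approximates a draw from the model's joint distribution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_chain(W, b_vis, b_hid, n_steps=1000, rng=np.random.default_rng(0)):
    n_vis, n_hid = W.shape
    v = rng.integers(0, 2, size=n_vis).astype(float)   # random start
    h = np.zeros(n_hid)
    for _ in range(n_steps):
        p_h = sigmoid(b_hid + v @ W)                    # sample h given v
        h = (rng.random(n_hid) < p_h).astype(float)
        p_v = sigmoid(b_vis + h @ W.T)                  # sample v given h
        v = (rng.random(n_vis) < p_v).astype(float)
    return v, h  # approximate sample from the joint after many steps
```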

Gaussian-Bernoulli RBMs An extension of RBMs for modelling real-valued data. The conditional distributions are: P(h_j = 1 | v) = σ(a_j + sum_i W_ij v_i / s_i) and p(v_i | h) = N(b_i + s_i sum_j W_ij h_j, s_i^2), where s_i is the standard deviation of visible unit i. Note that the conditional distribution of the visible data given the hidden variables is a Gaussian whose mean is shifted by a weighted sum of the hidden variables.
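A hedged sketch of these conditionals, where `sigma` holds the per-unit standard deviations s_i; the function and variable names are mine, not the paper's.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden_gb(v, W, b_hid, sigma, rng):
    # P(h_j = 1 | v) = sigmoid(a_j + sum_i W_ij v_i / s_i)
    p_h = sigmoid(b_hid + (v / sigma) @ W)
    return (rng.random(p_h.shape) < p_h).astype(float)

def sample_visible_gb(h, W, b_vis, sigma, rng):
    # p(v_i | h) = Normal(b_i + s_i * sum_j W_ij h_j, s_i^2)
    mean = b_vis + sigma * (h @ W.T)
    return rng.normal(mean, sigma)
```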

Gaussian-Bernoulli RBMs

Replicated Softmax RBMs An extension of RBMs for modelling word-count data such as documents.

Deep Belief Networks A DBN is a probabilistic generative model with an RBM in the top two layers and a directed sigmoid belief network in the lower layers.

Deep Belief Networks The joint distribution over a DBN with two hidden layers decomposes into the directed sigmoid belief network and the top RBM: P(v, h^(1), h^(2)) = P(v | h^(1)) P(h^(1), h^(2)). Directed sigmoid belief network: P(v | h^(1)) = prod_i P(v_i | h^(1)), with P(v_i = 1 | h^(1)) = σ(b_i + sum_j W^(1)_ij h^(1)_j). Top RBM: P(h^(1), h^(2)) is proportional to exp(sum_{j,k} h^(1)_j W^(2)_jk h^(2)_k + sum_j a^(1)_j h^(1)_j + sum_k a^(2)_k h^(2)_k).
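To make the decomposition concrete, here is an illustrative (not the authors') ancestral-sampling sketch for a two-hidden-layer DBN: run Gibbs sampling in the top RBM over (h1, h2), then take a single top-down pass through the directed sigmoid layer to get v. All names are mine.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_from_dbn(W1, W2, b_v, b_h1, b_h2, n_gibbs=500,
                    rng=np.random.default_rng(0)):
    # Gibbs sampling in the top RBM P(h1, h2).
    h1 = rng.integers(0, 2, size=b_h1.shape[0]).astype(float)
    for _ in range(n_gibbs):
        p_h2 = sigmoid(b_h2 + h1 @ W2)
        h2 = (rng.random(p_h2.shape) < p_h2).astype(float)
        p_h1 = sigmoid(b_h1 + h2 @ W2.T)
        h1 = (rng.random(p_h1.shape) < p_h1).astype(float)
    # Directed sigmoid belief net: one top-down pass gives P(v | h1).
    p_v = sigmoid(b_v + h1 @ W1.T)
    return (rng.random(p_v.shape) < p_v).astype(float)
```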

Deep Belief Networks If we set W(2) = W(1)T, then the joint distribution over the visible and first hidden layer of the DBN is actually equal to the joint distribution of the RBM with weights W(1). Note that the marginal P(h(1)) is the same for the RBM and the DBN, as is the conditional P(v | h(1)).

DBNs – Learning Procedure A greedy, layer-by-layer training approach in which each pair of adjacent layers is trained as an RBM: train the bottom RBM on the data, freeze its weights, use its hidden activations as the data for the next RBM, and repeat (see the sketch below).
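A sketch of the greedy stacking loop, assuming a `train_rbm` routine (for example, one built from the CD-1 update sketched earlier) that trains a single RBM and returns its parameters; the function and argument names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def greedy_pretrain(data, layer_sizes, train_rbm):
    """data: (n_examples, n_visible); layer_sizes: hidden sizes, bottom-up."""
    weights, inputs = [], data
    for n_hidden in layer_sizes:
        # Train one RBM on the current representation.
        W, b_vis, b_hid = train_rbm(inputs, n_hidden)
        weights.append((W, b_vis, b_hid))
        # Freeze it and propagate the data up to drive the next RBM.
        inputs = sigmoid(b_hid + inputs @ W)
    return weights
```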

DBNs – Learning Procedure The variational lower bound on the log-likelihood of the visible data is: log P(v) >= sum_{h^(1)} Q(h^(1) | v) [log P(h^(1)) + log P(v | h^(1))] + H(Q(h^(1) | v)), where H denotes entropy. We set the approximating distribution Q to the corresponding conditional from the first RBM: Q(h^(1) | v) = prod_j q(h^(1)_j | v), with q(h^(1)_j = 1 | v) = σ(a_j + sum_i W^(1)_ij v_i). Hence, to maximize this bound with W^(1) fixed, we maximize sum_{h^(1)} Q(h^(1) | v) log P(h^(1)), which is equivalent to maximum likelihood training of the second-layer RBM.

DBNs – Learning Procedure When the second hidden layer is trained, the variational bound is initially tight, so improving it strictly improves the log-likelihood of the data. For deeper layers the bound is not initially tight, so there is no such guarantee, but in practice the procedure still works well.

DBN – Inference

Deep Boltzmann Machines DBMs are a variant of DBNs in which all of the edges are undirected, making the whole model a single multilayer Boltzmann machine; during inference, each hidden layer therefore receives input from both the layer below and the layer above (see the sketch below).
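One consequence is that approximate inference in a DBM uses mean-field updates in which each layer's activation depends on both of its neighbouring layers. A simplified sketch for a two-hidden-layer DBM (my own naming and update schedule, not the paper's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W1, W2, b_h1, b_h2, n_iters=10):
    mu1 = sigmoid(b_h1 + v @ W1)                     # initialize bottom-up
    mu2 = sigmoid(b_h2 + mu1 @ W2)
    for _ in range(n_iters):
        mu1 = sigmoid(b_h1 + v @ W1 + mu2 @ W2.T)    # input from below + above
        mu2 = sigmoid(b_h2 + mu1 @ W2)               # input from below only (top layer)
    return mu1, mu2                                  # approximate posterior marginals
```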

Discriminative DBNs and DBMs DBNs and DBMs can easily be applied to discriminative tasks after pre-training on unlabelled data. The stochastic binary variables of the DBN/DBM are converted to deterministic, real-valued variables, and a top layer is added to the network to transform the hidden variables into the desired output. The standard backpropagation algorithm is then used to fine-tune the weights of this multilayer network. For DBMs, an approximating distribution Q(h^(2) | v) is used to create an augmented input for the network, in addition to the actual input.
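A sketch of the resulting deterministic forward pass seeded by pre-trained weights, with an added softmax output layer; the names and the softmax choice are illustrative rather than taken from the paper. For a DBM, the mean-field marginals Q(h^(2) | v) sketched earlier would be appended to the input vector.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(v, pretrained, W_out, b_out):
    """pretrained: list of (W, b_vis, b_hid) from greedy pre-training."""
    h = v
    for W, _, b_hid in pretrained:
        h = sigmoid(b_hid + h @ W)     # stochastic units become deterministic
    logits = b_out + h @ W_out         # added top layer for the task
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()             # class probabilities; fine-tune by backprop
```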

Deep Boltzmann Machines

Experimental Results (with DBMs) MNIST: a 2-layer DBM achieves a 0.95% error rate on the full MNIST test set, compared to 1.4% for SVMs and 1.2% for DBNs.

Experimental Results (with DBMs) NORB: 10.8% error on the full test set, compared to 11.6% for SVMs, 22.5% for logistic regression, and 18.4% for k-nearest neighbours.