Restricted Boltzmann Machines for Classification Jin Mao Postdoc, School of Information, University of Arizona Nov. 3rd, 2016
Outline Restricted Boltzmann Machines RBM For Classification Semi-supervised Learning Experiments
Restricted Boltzmann Machines General RBM Boltzmann Machines (BMs) are a particular form of log-linear Markov Random Field (MRF), i.e., the energy function is linear in its free parameters. Restriction: no visible unit is connected to any other visible unit, and no hidden unit is connected to any other hidden unit, so the connectivity graph is bipartite.
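A consequence of the bipartite restriction, stated here for reference (standard RBM results, not on the original slide; W, b, c denote the weights, visible biases, and hidden biases used on the following slides, and sigm is the logistic sigmoid): units in one layer are conditionally independent given the other layer, so both conditionals factorize and are cheap to compute.

% Factorized conditionals implied by the bipartite connectivity
\begin{align}
  p(h \mid x) = \prod_j p(h_j \mid x), \qquad
  p(h_j = 1 \mid x) = \mathrm{sigm}\Big(c_j + \textstyle\sum_i W_{ji} x_i\Big) \\
  p(x \mid h) = \prod_i p(x_i \mid h), \qquad
  p(x_i = 1 \mid h) = \mathrm{sigm}\Big(b_i + \textstyle\sum_j W_{ji} h_j\Big)
\end{align}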
Restricted Boltzmann Machines General RBM The energy function for a visible vector x and hidden vector h: E(x, h) = -b^T x - c^T h - h^T W x, which defines the log-linear model p(x, h) = e^{-E(x,h)} / Z with partition function Z = Σ_{x,h} e^{-E(x,h)}. The data negative log-likelihood gradient: -∂ log p(x)/∂θ = E_{h|x}[∂E(x,h)/∂θ] - E_{x,h}[∂E(x,h)/∂θ], i.e., a data-dependent ("positive") expectation minus a model ("negative") expectation; the second term is intractable because of Z and is approximated by sampling (e.g., contrastive divergence).
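Not from the slides: a minimal NumPy sketch of these quantities, including the free energy F(x) = -log Σ_h e^{-E(x,h)}, which is what implementations usually work with; the function names and array shapes (W of shape n_hidden x n_visible) are my own choices.

    import numpy as np

    def rbm_energy(x, h, W, b, c):
        """E(x, h) = -b^T x - c^T h - h^T W x for binary vectors x, h."""
        return -np.dot(b, x) - np.dot(c, h) - np.dot(h, W @ x)

    def free_energy(x, W, b, c):
        """F(x) = -log sum_h exp(-E(x, h)) = -b^T x - sum_j softplus(c_j + (W x)_j)."""
        pre_activation = c + W @ x                      # hidden-unit inputs
        return -np.dot(b, x) - np.sum(np.logaddexp(0.0, pre_activation))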
Restricted Boltzmann Machines Learning Weights Set the states of the visible units according to a training example. Next, update the states of the hidden units: for each hidden unit, compute its activation energy a_j = Σ_i w_ij x_i and set h_j = 1 with probability sigm(a_j). Then, for each edge e_ij, compute Positive(e_ij) = x_i * h_j, which measures the association the network learns from the training data.
Restricted Boltzmann Machines Learning Weights Now reconstruct the visible units: for each visible unit, compute its activation energy a_i = Σ_j w_ij h_j and sample its state, then update the hidden units once more. For each edge e_ij, compute Negative(e_ij) = x_i' * h_j', which measures the association that the network itself generates (or "daydreams" about) when no units are fixed to training data. Finally, update the weight of each edge: w_ij ← w_ij + ε (Positive(e_ij) − Negative(e_ij)), where ε is the learning rate (a code sketch follows).
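A minimal NumPy sketch of the positive/negative-phase update described on the last two slides (CD-1, in the spirit of the Chen tutorial listed in the references); function names are my own, biases are omitted, and a real implementation would use mini-batches, bias updates, and momentum.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cd1_update(x, W, rng, learning_rate=0.1):
        """One CD-1 weight update for a single binary training vector x.
        W has shape (n_hidden, n_visible); biases are omitted for brevity."""
        # Positive phase: clamp the visible units to the training example.
        h_prob = sigmoid(W @ x)
        h = (rng.random(h_prob.shape) < h_prob).astype(float)
        positive = np.outer(h, x)                 # Positive(e_ij) = h_j * x_i

        # Negative phase: reconstruct the visible units, then resample the hidden units.
        x_recon = sigmoid(W.T @ h)                # "daydreamed" visible activations
        h_recon = sigmoid(W @ x_recon)
        negative = np.outer(h_recon, x_recon)     # Negative(e_ij)

        return W + learning_rate * (positive - negative)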
Restricted Boltzmann Machines Learning Weights Repeat over the training set; test convergence by the KL divergence between the data distribution and the model's reconstructions.
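Not from the slides: the exact KL divergence to the model distribution is intractable, so here is a sketch of one common proxy, assuming we compare the mean activations of a data batch with its reconstructions; plain reconstruction error is another common monitor.

    import numpy as np

    def reconstruction_kl(x_batch, x_recon_batch, eps=1e-8):
        """Summed per-unit KL(data || reconstruction) over a batch of binary vectors.
        A practical convergence monitor, not the true KL to the RBM distribution."""
        p = np.clip(x_batch.mean(axis=0), eps, 1 - eps)        # empirical unit activations
        q = np.clip(x_recon_batch.mean(axis=0), eps, 1 - eps)  # reconstructed activations
        return np.sum(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q)))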
Restricted Boltzmann Machines Application The power lies in the hidden layer: in classification problems, RBMs are used as feature extractors (Gehler et al., 2006) or as a good initial training phase for deep neural network classifiers (Hinton, 2007).
RBM For Classification Generative Model Objective: minimize L_gen(D_train) = -Σ_{i=1}^{|D_train|} log p(y_i, x_i). Energy Function: E(y, x, h) = -h^T W x - b^T x - c^T h - d^T y - h^T U y, where y is the "one out of C" binary encoding of the class label. Gradient: as for the basic RBM, -∂ log p(y_i, x_i)/∂θ = E_{h|y_i,x_i}[∂E(y_i, x_i, h)/∂θ] - E_{y,x,h}[∂E(y, x, h)/∂θ]; the second expectation is intractable and is estimated with contrastive divergence. Activation Functions: p(h_j = 1 | y, x) = sigm(c_j + U_{jy} + Σ_i W_{ji} x_i), p(x_i = 1 | h) = sigm(b_i + Σ_j W_{ji} h_j), p(y | h) = exp(d_y + Σ_j U_{jy} h_j) / Σ_{y*} exp(d_{y*} + Σ_j U_{jy*} h_j).
RBM For Classification Discriminative Model Objective: minimize only the conditional likelihood, L_disc(D_train) = -Σ_{i=1}^{|D_train|} log p(y_i | x_i), using the same energy function. p(y | x) can be computed exactly: p(y | x) = exp(d_y) Π_j (1 + exp(c_j + U_{jy} + Σ_i W_{ji} x_i)) / Σ_{y*} exp(d_{y*}) Π_j (1 + exp(c_j + U_{jy*} + Σ_i W_{ji} x_i)). Gradient: because p(y | x) is tractable, the gradient of L_disc can be computed exactly, with no sampling approximation (see the sketch below).
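A minimal NumPy sketch of that exact p(y | x) computation in its numerically stable "softplus" form (parameter names and shapes are my own; U holds the class-to-hidden weights and d the class biases from the energy function above).

    import numpy as np

    def predict_proba(x, W, U, c, d):
        """Exact p(y | x) for the classification RBM.
        Shapes: W (n_hidden, n_visible), U (n_hidden, n_classes), c (n_hidden,), d (n_classes,)."""
        # For each class y, the hidden-unit inputs are c + W x + U[:, y].
        hidden_input = c + W @ x                                       # (n_hidden,)
        # Unnormalized log p(y | x): d_y + sum_j softplus(hidden_input_j + U_jy)
        log_scores = d + np.logaddexp(0.0, hidden_input[:, None] + U).sum(axis=0)
        log_scores -= log_scores.max()                                 # numerical stability
        scores = np.exp(log_scores)
        return scores / scores.sum()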
RBM For Classification Hybrid Model Objective: L_hybrid(D_train) = L_disc(D_train) + α L_gen(D_train), where the weight α of the generative criterion is chosen by cross-validation; the generative term acts as a data-dependent regularizer for the discriminative one.
Semi-supervised Learning Often there are few labeled training examples but many unlabeled examples of the inputs. Semi-supervised learning algorithms (Chapelle et al., 2006) address this situation by using the unlabeled data to introduce constraints on the trained model. In the RBM framework, a natural constraint is that the model should be a good generative model of the unlabeled data. Unsupervised objective: L_unsup(D_unlab) = -Σ_{i=1}^{|D_unlab|} log p(x_i), where p(x_i) = Σ_y p(y, x_i) is obtained by marginalizing out the class label.
Semi-supervised Learning Combine the unsupervised objective with a supervised objective: L_semi-sup(D_train, D_unlab) = L_TYPE(D_train) + β L_unsup(D_unlab), where TYPE ∈ {gen, disc, hybrid} and the weight β is chosen by cross-validation; the gradient of L_unsup is estimated with contrastive divergence, as in generative training.
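Not from the paper or slides: a schematic sketch of how such a combined objective could be optimized with stochastic gradient descent; supervised_grad and unsup_grad stand for hypothetical routines returning parameter gradients of the chosen supervised objective and of -log p(x) (the latter estimated with contrastive divergence).

    def semi_supervised_step(params, labeled_batch, unlabeled_batch,
                             supervised_grad, unsup_grad, beta=0.1, lr=0.01):
        """One SGD step on L_TYPE(labeled) + beta * L_unsup(unlabeled).
        supervised_grad and unsup_grad are hypothetical callables returning
        dicts of gradients keyed like params; they are not from the paper."""
        sup = supervised_grad(params, labeled_batch)
        unsup = unsup_grad(params, unlabeled_batch)   # e.g., CD estimate of -d log p(x)/d theta
        for name in params:
            params[name] -= lr * (sup[name] + beta * unsup[name])
        return params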
Experiments Model Selection Experiments on two classification problems: character recognition and text classification. In all experiments, we performed model selection on a validation set before testing. For the different RBM models, model selection consisted of finding good values for the learning rate, the size of the hidden layer n, and good weights for the different types of learning (the generative weight α and the semi-supervised weight β).
Experiments Results: character recognition and document classification.
Reference This presentation is based on: Larochelle, H., & Bengio, Y. (2008). Classification using discriminative restricted Boltzmann machines. In Proceedings of the 25th International Conference on Machine Learning (pp. 536-543). ACM. Other materials: https://deeplearning4j.org/restrictedboltzmannmachine http://deeplearning.net/tutorial/rbm.html http://blog.echen.me/2011/07/18/introduction-to-restricted-boltzmann-machines/ Hinton, G. (2010). A practical guide to training restricted Boltzmann machines. Momentum, 9(1), 926.
Thank you!