1
Speeding up Gradient-Based Inference and Learning in Deep/Recurrent Bayes Nets with Continuous Latent Variables (by a neural network equivalence)
Diederik P. (Durk) Kingma, Ph.D. candidate
Preprint:
Speaker notes: Introduce yourself. Say you've mainly worked on neural nets and, since a few months, also on Bayes nets. Theano is used for evaluating and differentiating the joint log-p.d.f.
2
Bayesian networks
- Intro: Bayes nets with continuous latent variables
- Example: generative model of MNIST
- Inference/learning with EM and HMC: fast, except in deep/recurrent Bayes nets
- Trick: auxiliary form, giving more efficient inference/learning in an auxiliary latent space
- Why so interesting?
3
Bayesian networks (with continuous latent variables)
- Joint p.d.f. = product of prior/conditional p.d.f.'s (a minimal worked example follows below)
- We can use conditional p.d.f.'s and any DAG graph structure => a natural way to use prior knowledge
- Countless ML models are actually Bayes nets: generative models (PCA, ICA), probabilistic classification/regression models (neural nets, etc.), LDA, nonlinear dynamical systems, etc.
- Inference is fast in special cases but slow in the general case
- Personally I like Bayes nets because:
  - Prior knowledge: very important for state-of-the-art performance in any problem, including big data problems.
  - Flexible framework: possible to model a broad range of problems, and to cast many existing regression/classification problems as instances of Bayes nets.
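As a concrete instance of "joint p.d.f. = product of prior/conditional p.d.f.'s", here is a minimal sketch in Python/NumPy (the two-node model z -> x, its parameterization, and the numbers are hypothetical, not from the talk):

import numpy as np
from scipy.stats import norm

def log_joint(x, z, theta):
    """log p(x, z) = log p(z) + log p(x | z) for the toy net z -> x.

    Prior:       z ~ N(0, 1)
    Conditional: x | z ~ N(w * z + b, sigma^2),  theta = (w, b, sigma)
    """
    w, b, sigma = theta
    log_prior = norm.logpdf(z, loc=0.0, scale=1.0)          # log p(z)
    log_cond = norm.logpdf(x, loc=w * z + b, scale=sigma)   # log p(x | z)
    return log_prior + log_cond

# Example evaluation with made-up values.
print(log_joint(x=0.5, z=-0.2, theta=(1.3, 0.0, 0.8)))

In a general DAG the same recipe applies: sum the log-density of every node given its parents, which is exactly the joint log-p.d.f. that Theano evaluates and differentiates.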
4
Bayesian networks with cont. latent variables
Reasonably fast algorithm to train them all?
5
Inference in Bayes nets (with continuous latent variables)
Joint p.d.f.:
\begin{align}
p(\mathbf{x}, \mathbf{z}) &= \prod_j p_\theta(\mathbf{x}_j \mid \mathbf{pa}_j) \prod_j p_\theta(\mathbf{z}_j \mid \mathbf{pa}_j) \\
\log p(\mathbf{x}, \mathbf{z}) &= \sum_j \log p_\theta(\mathbf{x}_j \mid \mathbf{pa}_j) + \sum_j \log p_\theta(\mathbf{z}_j \mid \mathbf{pa}_j)
\end{align}
- We're going to use Hybrid Monte Carlo (HMC): a method for sampling z|x using first-order gradients of log p(z|x) (a single HMC step is sketched below)
- Gradients of log p(z|x) w.r.t. z are easy: for fixed x, p(z|x) is proportional to p(x, z), so they equal the gradients of the joint log-p.d.f. above
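A minimal sketch of one HMC step in Python/NumPy (leapfrog integration plus a Metropolis accept/reject), assuming user-supplied callables for log p(x, z) and its gradient w.r.t. z; the step size and number of leapfrog steps are illustrative, not values from the talk:

import numpy as np

def hmc_step(z, log_joint, grad_log_joint, eps=0.01, n_leapfrog=20, rng=np.random):
    """One Hybrid Monte Carlo step targeting p(z | x) through log p(x, z)."""
    r = rng.randn(*z.shape)                        # sample auxiliary momentum
    z_new, r_new = z.copy(), r.copy()

    # Leapfrog integration of the Hamiltonian dynamics.
    r_new += 0.5 * eps * grad_log_joint(z_new)
    for _ in range(n_leapfrog - 1):
        z_new += eps * r_new
        r_new += eps * grad_log_joint(z_new)
    z_new += eps * r_new
    r_new += 0.5 * eps * grad_log_joint(z_new)

    # Metropolis correction based on the change in total energy.
    h_old = -log_joint(z) + 0.5 * np.sum(r ** 2)
    h_new = -log_joint(z_new) + 0.5 * np.sum(r_new ** 2)
    if np.log(rng.rand()) < h_old - h_new:
        return z_new                               # accept proposal
    return z                                       # reject, keep current sample

Since p(x) does not depend on z, passing log p(x, z) and its z-gradient is enough to target the posterior p(z | x).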
6
Learning in any Bayes net with continuous variables
MCMC-EM (a schematic loop follows below):
- E-step: generate samples z ~ p(z|x) with a gradient-based sampler, e.g. HMC
- M-step: maximize the Monte Carlo estimate of E_{z ~ p(z|x)}[log p_θ(x, z)] w.r.t. θ with gradient-based optimization, e.g. Adagrad
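A schematic of that loop in Python/NumPy, assuming a gradient-based posterior sampler (such as the hmc_step sketch above) and a callable for the gradient of log p(x, z; θ) w.r.t. θ; the Adagrad details and hyperparameters are illustrative:

import numpy as np

def mcmc_em(x, z, theta, sample_z, grad_theta, n_iter=100, lr=0.1):
    """MCMC-EM sketch.

    sample_z(x, z, theta): returns an (approximate) sample from p(z | x, theta),
                           e.g. by running a few HMC steps starting from z.
    grad_theta(x, z, theta): gradient of log p(x, z; theta) w.r.t. theta.
    """
    g_hist = np.zeros_like(theta)                          # Adagrad accumulator
    for _ in range(n_iter):
        z = sample_z(x, z, theta)                          # E-step
        g = grad_theta(x, z, theta)                        # M-step gradient
        g_hist += g ** 2
        theta = theta + lr * g / (np.sqrt(g_hist) + 1e-8)  # Adagrad ascent step
    return theta, z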
7
[Figure: learned "basis functions"; the 2-D latent space is projected to the observed space (transformed to the unit square using the inverse CDF of a Gaussian), alongside posterior samples z ~ p(z|x)]
8
Example: simple generative MNIST model
[Graphical model: latent variable z → observed variable x]
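A minimal sketch of such a model in Python/NumPy (the 2-D latent code, layer sizes and tanh/sigmoid parameterization are hypothetical choices for illustration): a standard-normal z is mapped through a small MLP to Bernoulli pixel probabilities, and log p(x, z) is again a sum of prior and conditional terms:

import numpy as np

def log_joint_mnist(x, z, params):
    """log p(x, z) for a toy generative MNIST model: z ~ N(0, I), x | z ~ Bernoulli(y(z))."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ z + b1)                          # hidden layer
    y = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))          # pixel probabilities in (0, 1)
    log_prior = -0.5 * np.sum(z ** 2 + np.log(2 * np.pi))       # log N(z; 0, I)
    log_lik = np.sum(x * np.log(y) + (1 - x) * np.log(1 - y))   # Bernoulli log-likelihood
    return log_prior + log_lik

# Hypothetical sizes: 2-D latent code, 64 hidden units, 784 binarized pixels.
rng = np.random.RandomState(0)
params = (rng.randn(64, 2) * 0.1, np.zeros(64),
          rng.randn(784, 64) * 0.1, np.zeros(784))
x = (rng.rand(784) > 0.5).astype(float)
print(log_joint_mnist(x, rng.randn(2), params))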
9
Problem: HMC mixes slowly for deep or recurrent Bayes nets
10
Auxiliary form: equivalent neural net with injected noise
11
Auxiliary form
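In equations (a paraphrase of the idea the figure illustrates, using the notation of the joint p.d.f. above): every continuous latent variable is rewritten as a deterministic function of its parents and an injected noise variable with a fixed, parameter-free distribution, so that all randomness sits at root nodes:

\begin{align}
\mathbf{z}_j \sim p_\theta(\mathbf{z}_j \mid \mathbf{pa}_j)
\quad\Longleftrightarrow\quad
\mathbf{z}_j = g_j(\mathbf{pa}_j, \boldsymbol{\epsilon}_j; \theta),
\qquad \boldsymbol{\epsilon}_j \sim p(\boldsymbol{\epsilon}_j)
\end{align}

Inference and learning then operate in ε-space, where the network behaves like an ordinary deterministic neural net with injected noise.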
12
Example: Gaussian conditional with nonlinear dependency of mean
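A sketch of this case in Python/NumPy (the MLP for the mean and the fixed σ are illustrative): instead of sampling z directly from N(μ(pa), σ²), sample ε ~ N(0, I) at a root node and compute z deterministically from its parents and ε:

import numpy as np

def sample_z_original(pa, params, rng=np.random):
    """Original form: z ~ N(mu(pa), sigma^2), with a nonlinear mean mu(pa)."""
    W1, b1, W2, b2, sigma = params
    mu = W2 @ np.tanh(W1 @ pa + b1) + b2      # nonlinear dependency of the mean
    return mu + sigma * rng.randn(*mu.shape)

def sample_z_auxiliary(pa, eps, params):
    """Auxiliary form: z = mu(pa) + sigma * eps, with eps ~ N(0, I) injected at a root node."""
    W1, b1, W2, b2, sigma = params
    mu = W2 @ np.tanh(W1 @ pa + b1) + b2
    return mu + sigma * eps                   # deterministic given (pa, eps)

Both forms yield samples from the same conditional, but in the auxiliary form z is a differentiable, deterministic function of its parents and ε, so gradients flow through the whole net.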
13
Experiment: DBN
14
Experiment: DBN
[Figure: sample traces and autocorrelation plots, original form vs. auxiliary form; the original form shows terribly slow mixing, the auxiliary form shows fast mixing]
15
Extra: EM-free Learning
- In auxiliary form, all latent random variables are root nodes
- MC likelihood: estimate the marginal likelihood p(x) by averaging the conditional likelihood over L samples of the root-node noise (a sketch follows below); only works if all z's are root nodes of the graph and p(z) is not affected by 'w'
- Dropout: MC likelihood with L = 1
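A sketch of that estimator in Python (the function names, the argument structure, and the use of logsumexp are assumptions for illustration):

import numpy as np
from scipy.special import logsumexp

def mc_log_likelihood(x, log_lik_given_eps, sample_eps, L=10):
    """Estimate log p(x) ~= log( (1/L) * sum_l p(x | eps_l) ),  eps_l ~ p(eps).

    log_lik_given_eps(x, eps): log p(x | eps), with all z's computed
                               deterministically from the injected noise eps.
    sample_eps():              one draw from the fixed noise distribution p(eps).
    """
    log_liks = [log_lik_given_eps(x, sample_eps()) for _ in range(L)]
    return logsumexp(log_liks) - np.log(L)    # log of the sample mean, computed stably

With L = 1 this reduces to training on a single noise sample per datapoint, which is the dropout connection mentioned above.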
16
Conclusion: auxiliary form
- Fast mixing => fast inference and learning
- Very broadly applicable (e.g. via the inverse CDF method)
- Link at:
- Next step: applications!