Presentation on theme: "Speeding up Gradient-Based Inference and Learning in deep/recurrent Bayes Nets with Continuous Latent Variables (by a neural network equivalence) Preprint:"— Presentation transcript:

1 Speeding up Gradient-Based Inference and Learning in deep/recurrent Bayes Nets with Continuous Latent Variables (by a neural network equivalence) Preprint:
Speaker notes: Introduce yourself. Say you've mainly worked on neural nets and, for the past few months, also on Bayes nets.
Using Theano for evaluating and differentiating the joint log-pdf.
Diederik P. (Durk) Kingma, Ph.D. Candidate

2 Bayesian networks
Intro: Bayes nets with continuous latent variables
Example: a generative model of MNIST
Inference/learning with EM and HMC: fast, except in deep/recurrent Bayes nets
Trick: the auxiliary form, for more efficient inference/learning in an auxiliary latent space
Why so interesting?

3 Bayesian networks (with continuous latent variables)
Joint p.d.f. = product of prior/conditional p.d.f.'s. We can use any conditional p.d.f.'s and any DAG graph structure, which is a natural way to encode prior knowledge (see the sketch after this slide).
Countless ML models are actually Bayes nets: generative models (PCA, ICA), probabilistic classification/regression models (neural nets, etc.), LDA, nonlinear dynamical systems, and so on.
Inference is fast in special cases but slow in the general case.
Personally I like Bayes nets because:
Prior knowledge: very important for state-of-the-art performance on any problem, including big-data problems.
Flexible framework: possible to model a broad range of problems, and to cast many existing regression/classification problems as instances of Bayes nets.
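To make the factorization concrete, here is a minimal sketch (Python/NumPy/SciPy) of the joint log-pdf of a hypothetical two-node net z -> x, with a standard-normal prior on z and a Gaussian conditional on x; the tanh link and all variable names are illustrative assumptions, not the model from the talk.

    import numpy as np
    from scipy.stats import norm

    def log_joint(x, z, theta):
        """log p(x, z) = log p(z) + log p(x | z), i.e. a sum of the log prior
        and log conditional densities following the DAG structure."""
        w, b, sigma_x = theta
        log_prior = norm.logpdf(z, loc=0.0, scale=1.0).sum()        # p(z) = N(0, I)
        mean_x = np.tanh(w * z + b)                                  # assumed nonlinear link
        log_cond = norm.logpdf(x, loc=mean_x, scale=sigma_x).sum()   # p(x | z)
        return log_prior + log_cond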

4 Bayesian networks with cont. latent variables
Is there a reasonably fast algorithm to train them all?

5 Inference in Bayes nets (with continuous latent variables)
Joint p.d.f.:
\begin{align}
p_\theta(\mathbf{x}, \mathbf{z}) &= \prod_j p_\theta(x_j \mid \mathrm{pa}_j) \cdot \prod_j p_\theta(z_j \mid \mathrm{pa}_j) \\
\log p_\theta(\mathbf{x}, \mathbf{z}) &= \sum_j \log p_\theta(x_j \mid \mathrm{pa}_j) + \sum_j \log p_\theta(z_j \mid \mathrm{pa}_j)
\end{align}
We're going to use Hybrid Monte Carlo (HMC), a method for sampling z|x using first-order gradients of log p(z|x). These gradients are easy: since p(z|x) is proportional to p(x, z) for fixed x, the gradient of log p(z|x) w.r.t. z equals the gradient of log p(x, z) w.r.t. z, which is just a sum of gradients of the individual log conditionals.
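A minimal single-step HMC sketch (NumPy), assuming hypothetical helpers log_joint(x, z) and grad_log_joint(x, z) that return log p(x, z) and its gradient w.r.t. z (in the talk these come from Theano's automatic differentiation):

    import numpy as np

    def hmc_step(z, x, log_joint, grad_log_joint, step_size=0.01, n_leapfrog=20):
        """One HMC transition targeting p(z | x), proportional to p(x, z)."""
        z0 = z.copy()
        p = np.random.randn(*z.shape)                         # sample momentum
        current_H = -log_joint(x, z0) + 0.5 * np.sum(p ** 2)

        # Leapfrog integration of the Hamiltonian dynamics
        p = p + 0.5 * step_size * grad_log_joint(x, z)
        for _ in range(n_leapfrog - 1):
            z = z + step_size * p
            p = p + step_size * grad_log_joint(x, z)
        z = z + step_size * p
        p = p + 0.5 * step_size * grad_log_joint(x, z)

        proposed_H = -log_joint(x, z) + 0.5 * np.sum(p ** 2)
        if np.log(np.random.rand()) < current_H - proposed_H:   # Metropolis accept
            return z
        return z0                                               # reject: keep old z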

6 Learning in any Bayes net with continuous variables
MCMC-EM:
E-step: generate posterior samples z ~ p(z|x) with a gradient-based sampler, e.g. HMC.
M-step: maximize the expected log joint w.r.t. the parameters θ with a gradient-based optimizer, e.g. Adagrad.
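A sketch of the resulting alternation, assuming the hmc_step above plus hypothetical helpers grad_log_joint_z and grad_log_joint_theta for the gradients of log p_theta(x, z) w.r.t. z and theta; Adagrad is replaced by a plain gradient step for brevity:

    import numpy as np

    def mcmc_em(x, z, theta, hmc_step, log_joint, grad_log_joint_z,
                grad_log_joint_theta, n_iter=1000, lr=1e-3):
        for _ in range(n_iter):
            # E-step: draw a posterior sample z ~ p(z | x) with HMC
            z = hmc_step(z, x,
                         lambda xx, zz: log_joint(xx, zz, theta),
                         lambda xx, zz: grad_log_joint_z(xx, zz, theta))
            # M-step: one gradient ascent step on log p_theta(x, z)
            theta = theta + lr * grad_log_joint_theta(x, z, theta)
        return theta, z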

7 Basis functions
[Figure: the 2-D latent space projected to the observed space (transformed to the unit square using the inverse CDF of the Gaussian); points are samples z ~ p(z|x).]
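One way such a figure could be produced, as a sketch under assumptions: decode is a hypothetical function mapping a 2-D latent point to the mean of p(x|z), and equally spaced unit-square coordinates are pushed through the inverse Gaussian CDF to get latent points.

    import numpy as np
    from scipy.stats import norm

    def manifold_grid(decode, n=10):
        # Unit-square coordinates mapped to latent space via the inverse CDF
        # (ppf) of the standard normal, then decoded to observed space.
        coords = norm.ppf(np.linspace(0.05, 0.95, n))
        return [[decode(np.array([z1, z2])) for z1 in coords] for z2 in coords]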

8 Example: simple generative MNIST model
[Graphical model: latent variable z with an arrow to the observed variable x.]

9 Problem: HMC mixes slowly for deep or recurrent Bayes nets

10 Auxiliary form: equivalent neural net with injected noise

11 Auxiliary form

12 Example: Gaussian conditional with nonlinear dependency of mean
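A sketch of what this could look like in the auxiliary form (NumPy): instead of sampling z directly from N(mu(pa), sigma^2), standard-normal noise eps is injected at a root node and z is computed deterministically. The tanh mean function mu_nn and all names are assumptions standing in for the nonlinear dependency, not the exact network from the talk.

    import numpy as np

    def mu_nn(parents, W, b):
        return np.tanh(parents @ W + b)       # assumed nonlinear mean function

    def sample_original(parents, W, b, sigma):
        # Original form: z ~ N(mu(pa), sigma^2)
        return np.random.normal(mu_nn(parents, W, b), sigma)

    def sample_auxiliary(parents, W, b, sigma, eps):
        # Auxiliary form: eps ~ N(0, I) is the injected root noise; z is now a
        # deterministic, differentiable function of parents, parameters and eps.
        return mu_nn(parents, W, b) + sigma * eps

Because z is now a deterministic function of the simple root noise eps, the sampler only has to mix over eps, and gradients w.r.t. the parameters flow through the transformation as in an ordinary neural net.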

13 Experiment: DBN

14 Experiment: DBN
[Figure: posterior samples and their autocorrelation for the original form versus the auxiliary form: terribly slow mixing in the original form, fast mixing in the auxiliary form.]

15 Extra: EM-free Learning
In the auxiliary form, all latent random variables are root nodes.
MC likelihood: only works if all z's are root nodes of the graph and p(z) is not affected by the parameters 'w' (see the sketch after this slide).
Dropout: MC likelihood with L = 1.
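A minimal sketch of the Monte Carlo likelihood estimate in that setting, assuming hypothetical helpers log_cond(x, z) returning log p(x | z) and sample_prior() returning z ~ p(z):

    import numpy as np
    from scipy.special import logsumexp

    def mc_log_likelihood(x, log_cond, sample_prior, L=100):
        # log p(x) ~= log (1/L) * sum_l p(x | z^l), with z^l ~ p(z)
        log_w = np.array([log_cond(x, sample_prior()) for _ in range(L)])
        return logsumexp(log_w) - np.log(L)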

16 Conclusion: auxiliary form
Fast mixing => fast inference and learning.
Very broadly applicable (e.g. via the inverse CDF method).
Link at:
Next step: applications!

