1
Speeding up Gradient-Based Inference and Learning in Deep/Recurrent Bayes Nets with Continuous Latent Variables (by a neural network equivalence)
Diederik P. (Durk) Kingma, Ph.D. candidate
Preprint:
Speaker notes: Introduce yourself. Say you've mainly worked on neural nets and, since a few months, also on Bayes nets. Theano is used for evaluating and differentiating the joint log-p.d.f.
2
Bayesian networks
- Intro: Bayes nets with continuous latent variables
- Example: generative model of MNIST
- Inference/learning with EM and HMC: fast, except in deep/recurrent Bayes nets
- Trick: auxiliary form, giving more efficient inference/learning in an auxiliary latent space
- Why so interesting?
3
Bayesian networks (with continuous latent variables)
- Joint p.d.f. = product of prior/conditional p.d.f.'s (a minimal worked example follows below)
- We can use conditional p.d.f.'s and any DAG graph structure => a natural way to use prior knowledge
- Countless ML models are actually Bayes nets: generative models (PCA, ICA), probabilistic classification/regression models (neural nets, etc.), LDA, nonlinear dynamical systems, etc.
- Inference is fast in special cases but slow in the general case
- Personally I like Bayes nets because:
  - Prior knowledge: very important for state-of-the-art performance in any problem, including big data problems.
  - Flexible framework: possible to model a broad range of problems, and to cast many existing regression/classification problems as instances of Bayes nets.
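As a concrete instance of "joint p.d.f. = product of prior/conditional p.d.f.'s", here is a minimal sketch in Python/NumPy (the two-node model z -> x, its parameterization, and the numbers are hypothetical, not from the talk):

import numpy as np
from scipy.stats import norm

def log_joint(x, z, theta):
    """log p(x, z) = log p(z) + log p(x | z) for the toy net z -> x.

    Prior:       z ~ N(0, 1)
    Conditional: x | z ~ N(w * z + b, sigma^2),  theta = (w, b, sigma)
    """
    w, b, sigma = theta
    log_prior = norm.logpdf(z, loc=0.0, scale=1.0)          # log p(z)
    log_cond = norm.logpdf(x, loc=w * z + b, scale=sigma)   # log p(x | z)
    return log_prior + log_cond

# Example evaluation with made-up values.
print(log_joint(x=0.5, z=-0.2, theta=(1.3, 0.0, 0.8)))

In a general DAG the same recipe applies: sum the log-density of every node given its parents, which is exactly the joint log-p.d.f. that Theano evaluates and differentiates.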
4
Bayesian networks with cont. latent variables
Reasonably fast algorithm to train them all?
5
Inference in Bayes nets (with continuous latent variables)
Joint p.d.f.:
\begin{align}
p(\mathbf{x}, \mathbf{z}) &= \prod_j p_\theta(\mathbf{x}_j \mid \mathbf{pa}_j) \prod_j p_\theta(\mathbf{z}_j \mid \mathbf{pa}_j) \\
\log p(\mathbf{x}, \mathbf{z}) &= \sum_j \log p_\theta(\mathbf{x}_j \mid \mathbf{pa}_j) + \sum_j \log p_\theta(\mathbf{z}_j \mid \mathbf{pa}_j)
\end{align}
- We're going to use Hybrid Monte Carlo (HMC): a method for sampling z|x using first-order gradients of log p(z|x) (a single HMC step is sketched below)
- Gradients of log p(z|x) w.r.t. z are easy: for fixed x, p(z|x) is proportional to p(x, z), so they equal the gradients of the joint log-p.d.f. above
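A minimal sketch of one HMC step in Python/NumPy (leapfrog integration plus a Metropolis accept/reject), assuming user-supplied callables for log p(x, z) and its gradient w.r.t. z; the step size and number of leapfrog steps are illustrative, not values from the talk:

import numpy as np

def hmc_step(z, log_joint, grad_log_joint, eps=0.01, n_leapfrog=20, rng=np.random):
    """One Hybrid Monte Carlo step targeting p(z | x) through log p(x, z)."""
    r = rng.randn(*z.shape)                        # sample auxiliary momentum
    z_new, r_new = z.copy(), r.copy()

    # Leapfrog integration of the Hamiltonian dynamics.
    r_new += 0.5 * eps * grad_log_joint(z_new)
    for _ in range(n_leapfrog - 1):
        z_new += eps * r_new
        r_new += eps * grad_log_joint(z_new)
    z_new += eps * r_new
    r_new += 0.5 * eps * grad_log_joint(z_new)

    # Metropolis correction based on the change in total energy.
    h_old = -log_joint(z) + 0.5 * np.sum(r ** 2)
    h_new = -log_joint(z_new) + 0.5 * np.sum(r_new ** 2)
    if np.log(rng.rand()) < h_old - h_new:
        return z_new                               # accept proposal
    return z                                       # reject, keep current sample

Since p(x) does not depend on z, passing log p(x, z) and its z-gradient is enough to target the posterior p(z | x).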
6
Learning in any Bayes net with continuous variables
MCMC-EM (a schematic loop follows below):
- E-step: generate samples z ~ p(z|x) with a gradient-based sampler, e.g. HMC
- M-step: maximize the Monte Carlo estimate of E_{z ~ p(z|x)}[log p_θ(x, z)] w.r.t. θ with gradient-based optimization, e.g. Adagrad
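A schematic of that loop in Python/NumPy, assuming a gradient-based posterior sampler (such as the hmc_step sketch above) and a callable for the gradient of log p(x, z; θ) w.r.t. θ; the Adagrad details and hyperparameters are illustrative:

import numpy as np

def mcmc_em(x, z, theta, sample_z, grad_theta, n_iter=100, lr=0.1):
    """MCMC-EM sketch.

    sample_z(x, z, theta): returns an (approximate) sample from p(z | x, theta),
                           e.g. by running a few HMC steps starting from z.
    grad_theta(x, z, theta): gradient of log p(x, z; theta) w.r.t. theta.
    """
    g_hist = np.zeros_like(theta)                          # Adagrad accumulator
    for _ in range(n_iter):
        z = sample_z(x, z, theta)                          # E-step
        g = grad_theta(x, z, theta)                        # M-step gradient
        g_hist += g ** 2
        theta = theta + lr * g / (np.sqrt(g_hist) + 1e-8)  # Adagrad ascent step
    return theta, z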
7
[Figure: learned "basis functions"; the 2-D latent space is projected to the observed space (transformed to the unit square using the inverse CDF of a Gaussian), alongside posterior samples z ~ p(z|x)]
8
Example: simple generative MNIST model
[Graphical model: latent variable z → observed variable x]
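A minimal sketch of such a model in Python/NumPy (the 2-D latent code, layer sizes and tanh/sigmoid parameterization are hypothetical choices for illustration): a standard-normal z is mapped through a small MLP to Bernoulli pixel probabilities, and log p(x, z) is again a sum of prior and conditional terms:

import numpy as np

def log_joint_mnist(x, z, params):
    """log p(x, z) for a toy generative MNIST model: z ~ N(0, I), x | z ~ Bernoulli(y(z))."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ z + b1)                          # hidden layer
    y = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))          # pixel probabilities in (0, 1)
    log_prior = -0.5 * np.sum(z ** 2 + np.log(2 * np.pi))       # log N(z; 0, I)
    log_lik = np.sum(x * np.log(y) + (1 - x) * np.log(1 - y))   # Bernoulli log-likelihood
    return log_prior + log_lik

# Hypothetical sizes: 2-D latent code, 64 hidden units, 784 binarized pixels.
rng = np.random.RandomState(0)
params = (rng.randn(64, 2) * 0.1, np.zeros(64),
          rng.randn(784, 64) * 0.1, np.zeros(784))
x = (rng.rand(784) > 0.5).astype(float)
print(log_joint_mnist(x, rng.randn(2), params))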
9
Problem: HMC mixes slowly for deep or recurrent Bayes nets
10
Auxiliary form: equivalent neural net with injected noise
11
Auxiliary form
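In equations (a paraphrase of the idea the figure illustrates, using the notation of the joint p.d.f. above): every continuous latent variable is rewritten as a deterministic function of its parents and an injected noise variable with a fixed, parameter-free distribution, so that all randomness sits at root nodes:

\begin{align}
\mathbf{z}_j \sim p_\theta(\mathbf{z}_j \mid \mathbf{pa}_j)
\quad\Longleftrightarrow\quad
\mathbf{z}_j = g_j(\mathbf{pa}_j, \boldsymbol{\epsilon}_j; \theta),
\qquad \boldsymbol{\epsilon}_j \sim p(\boldsymbol{\epsilon}_j)
\end{align}

Inference and learning then operate in ε-space, where the network behaves like an ordinary deterministic neural net with injected noise.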
12
Example: Gaussian conditional with nonlinear dependency of mean
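A sketch of this case in Python/NumPy (the MLP for the mean and the fixed σ are illustrative): instead of sampling z directly from N(μ(pa), σ²), sample ε ~ N(0, I) at a root node and compute z deterministically from its parents and ε:

import numpy as np

def sample_z_original(pa, params, rng=np.random):
    """Original form: z ~ N(mu(pa), sigma^2), with a nonlinear mean mu(pa)."""
    W1, b1, W2, b2, sigma = params
    mu = W2 @ np.tanh(W1 @ pa + b1) + b2      # nonlinear dependency of the mean
    return mu + sigma * rng.randn(*mu.shape)

def sample_z_auxiliary(pa, eps, params):
    """Auxiliary form: z = mu(pa) + sigma * eps, with eps ~ N(0, I) injected at a root node."""
    W1, b1, W2, b2, sigma = params
    mu = W2 @ np.tanh(W1 @ pa + b1) + b2
    return mu + sigma * eps                   # deterministic given (pa, eps)

Both forms yield samples from the same conditional, but in the auxiliary form z is a differentiable, deterministic function of its parents and ε, so gradients flow through the whole net.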
13
Experiment: DBN
14
Experiment: DBN
[Figure: sample traces and autocorrelation plots, original form vs. auxiliary form; the original form shows terribly slow mixing, the auxiliary form shows fast mixing]
15
Extra: EM-free Learning
- In auxiliary form, all latent random variables are root nodes
- MC likelihood: estimate the marginal likelihood p(x) by averaging the conditional likelihood over L samples of the root-node noise (a sketch follows below); only works if all z's are root nodes of the graph and p(z) is not affected by 'w'
- Dropout: MC likelihood with L = 1
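A sketch of that estimator in Python (the function names, the argument structure, and the use of logsumexp are assumptions for illustration):

import numpy as np
from scipy.special import logsumexp

def mc_log_likelihood(x, log_lik_given_eps, sample_eps, L=10):
    """Estimate log p(x) ~= log( (1/L) * sum_l p(x | eps_l) ),  eps_l ~ p(eps).

    log_lik_given_eps(x, eps): log p(x | eps), with all z's computed
                               deterministically from the injected noise eps.
    sample_eps():              one draw from the fixed noise distribution p(eps).
    """
    log_liks = [log_lik_given_eps(x, sample_eps()) for _ in range(L)]
    return logsumexp(log_liks) - np.log(L)    # log of the sample mean, computed stably

With L = 1 this reduces to training on a single noise sample per datapoint, which is the dropout connection mentioned above.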
16
Conclusion: auxiliary form
- Fast mixing => fast inference and learning
- Very broadly applicable (e.g. via the inverse CDF method)
- Link at:
- Next step: applications!