Variational Bayesian Inference


1 Variational Bayesian Inference
For when you look at Gibbs sampling and think “Well, that’s just too easy”.

2 What is it for?
Observed variables X, latent variables Z.
We know P(X, Z); we want P(Z | X).

3 What is it for?
Observed variables X, latent variables Z: we know P(X, Z); we want P(Z | X).
Observed variables X, models M1 and M2 with parameters θ1 and θ2: we know P(X | Mi, θi); we want P(X | Mi) (the model evidence; see the note below).
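Why the second problem is amenable to the same machinery: the model evidence integrates the parameters out (a standard definition, assuming a prior P(θi | Mi) over each model's parameters; this is not spelled out on the slides):
$$P(X \mid M_i) = \int P(X \mid \theta_i, M_i)\, P(\theta_i \mid M_i)\, d\theta_i$$
Variational Bayes produces a tractable lower bound on $\log P(X \mid M_i)$ (the bound that falls out of the KL decomposition on slide 5), which is what makes it useful for comparing models.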

4 The basic idea
Approximate P(Z | X) using some simpler distribution Q(Z). Find the Q*(Z) which minimizes some distance function d(Q; P), using the calculus of variations (hence "variational" Bayes).

5 The basic idea
Minimize the Kullback-Leibler divergence $D_{KL}(Q \| P)$:
$$D_{KL}(Q \| P) = E_Q\big[\log Q(Z) - \log P(Z \mid X)\big]$$
$$D_{KL}(Q \| P) = \int_Z Q(Z)\,\big[\log Q(Z) - \log P(Z \mid X)\big]\,dZ$$
$$D_{KL}(Q \| P) = \int_Z Q(Z)\,\big[\log Q(Z) - \log P(Z, X)\big]\,dZ + \log P(X)$$
Re-write the KL divergence using P(Z | X) = P(Z, X) / P(X).
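Rearranging the last line makes the point of the rewrite explicit (a standard identity, using only the quantities already defined above):
$$\log P(X) = \underbrace{E_Q\big[\log P(Z, X) - \log Q(Z)\big]}_{\mathcal{L}(Q)} + D_{KL}(Q \| P)$$
Since $\log P(X)$ does not depend on Q, minimizing $D_{KL}(Q \| P)$ is the same as maximizing the lower bound $\mathcal{L}(Q)$, and $\mathcal{L}(Q)$ involves only the joint P(Z, X), which we know.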

6 The key trick
Assume Q factorizes over some partition of Z:
$$Q(\mathbf{Z}) = \prod_i q_i(\mathbf{Z}_i)$$
Then, by applying the calculus of variations*, we find:
$$\log q_j^*(\mathbf{Z}_j) = E_{i \neq j}\big[\log P(\mathbf{Z}, \mathbf{X})\big] + C$$
*left as a simple exercise for the viewer
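A sketch of the asterisked exercise (the standard mean-field argument, in the notation above): substitute the factorized Q into the last expression for $D_{KL}(Q \| P)$ on slide 5 and keep only the terms that involve a single factor $q_j$:
$$D_{KL}(Q \| P) = \int q_j(\mathbf{Z}_j)\,\Big[\log q_j(\mathbf{Z}_j) - E_{i \neq j}\big[\log P(\mathbf{Z}, \mathbf{X})\big]\Big]\, d\mathbf{Z}_j + \text{const}$$
Up to a constant this is the KL divergence between $q_j$ and the distribution proportional to $\exp\!\big(E_{i \neq j}[\log P(\mathbf{Z}, \mathbf{X})]\big)$, so it is minimized when the two coincide, giving the update above; C is whatever constant normalizes $q_j^*$.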

7 Iterate, iterate, iterate.
$$\log q_j^*(\mathbf{Z}_j) = E_{i \neq j}\big[\log P(\mathbf{Z}, \mathbf{X})\big] + C$$
The optimal $q_j^*(\mathbf{Z}_j)$ are usually functions of fixed hyperparameters and of moments of the other latent variables not in $\mathbf{Z}_j$. Iterate, using the parameters from the previous iteration to determine the moments of the latent variables.

8 Beware!
Q(Z) approximates P(Z | X), but qi(Zi) does not necessarily approximate P(Zi | X): variational Bayes approximates the joint posterior, not the marginals.
Fox, Charles W., and Stephen J. Roberts. "A tutorial on variational Bayesian inference." Artificial Intelligence Review 38.2 (2012).

9 Beware!
Fox, Charles W., and Stephen J. Roberts. "A tutorial on variational Bayesian inference." Artificial Intelligence Review 38.2 (2012).

10 A Simple Example
$$\mu \sim N\!\big(\mu_0, (\lambda_0 \tau)^{-1}\big)$$
$$\tau \sim \mathrm{Gamma}(a_0, b_0)$$
$$x_i \sim N(\mu, \tau^{-1}) \quad \text{for } i = 1, \ldots, N$$
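A minimal runnable sketch of the mean-field updates for this model, assuming the standard coordinate updates for the conjugate Normal-Gamma case (as in, e.g., Bishop's Pattern Recognition and Machine Learning, §10.1.3); the hyperparameter values and the synthetic data are illustrative assumptions, not taken from the slides:

import numpy as np

# Model (from the slide):
#   mu  ~ N(mu0, (lam0 * tau)^-1)
#   tau ~ Gamma(a0, b0)
#   x_i ~ N(mu, tau^-1), i = 1..N
# Mean-field approximation: q(mu, tau) = q(mu) q(tau), with
#   q(mu) = N(muN, lamN^-1) and q(tau) = Gamma(aN, bN).

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=100)   # synthetic data (assumption)
N, xbar = x.size, x.mean()
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0         # prior hyperparameters (assumption)

muN = (lam0 * mu0 + N * xbar) / (lam0 + N)     # fixed by conjugacy
aN = a0 + (N + 1) / 2                          # also fixed; only lamN and bN iterate
E_tau = a0 / b0                                # initial guess for E[tau]

for _ in range(50):
    lamN = (lam0 + N) * E_tau                  # q(mu) update uses the current E[tau]
    E_mu, var_mu = muN, 1.0 / lamN             # moments of mu under q(mu)
    bN = b0 + 0.5 * (np.sum((x - E_mu) ** 2) + N * var_mu
                     + lam0 * ((E_mu - mu0) ** 2 + var_mu))
    E_tau = aN / bN                            # q(tau) update uses the moments of mu

print("q(mu):  mean %.3f, variance %.5f" % (muN, 1.0 / lamN))
print("q(tau): mean %.3f (data generated with precision 4.0)" % E_tau)

Note that muN and aN never change: only lamN and bN (through the moments E[mu], E[mu^2] and E[tau]) have to be iterated, which is exactly the pattern described on slide 7.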

11 A Not-So-Simple Example

12 A Not-So-Simple Example

13 Variational message passing
No need for all that maths if every hidden variable:
(1) has an exponential-family distribution (when conditioned on its parents), and
(2) is conjugate with respect to the distributions over these parent variables, i.e. P(X | Y) has the same form as P(W | X) as a function of X (Y a parent of X, W a child of X).
In this case parents send their children the expectation of their sufficient statistic, and children send their parents their natural parameter; a small illustration follows below.
Winn, John, and Christopher M. Bishop. "Variational message passing." Journal of Machine Learning Research 6 (2005).
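A concrete illustration of the messages, using the Gaussian node from the simple example (a sketch in the spirit of Winn and Bishop, not their notation). Written as an exponential family in $\mu$,
$$\log N(x_i \mid \mu, \tau^{-1}) = \begin{bmatrix} \tau x_i & -\tfrac{\tau}{2} \end{bmatrix} \begin{bmatrix} \mu \\ \mu^2 \end{bmatrix} + f(x_i, \tau),$$
so an observed child $x_i$ sends its parent $\mu$ the natural-parameter message $\big[\langle\tau\rangle x_i,\; -\langle\tau\rangle/2\big]$, while $\mu$ sends its children the expected sufficient statistics $\big[\langle\mu\rangle,\; \langle\mu^2\rangle\big]$. Summing the child messages with the prior's natural parameters reproduces the $q^*(\mu)$ update from the simple example.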


