Bayesian Inference Ekaterina Lomakina TNU seminar: Bayesian inference 1 March 2013.



Outline
- Probability distributions
- Maximum likelihood estimation
- Maximum a posteriori estimation
- Conjugate priors
- Conceptualizing models as collections of priors
- Noninformative priors
- Empirical Bayes

Probability distribution
Density estimation: modeling the distribution $p(x)$ of a random variable $x$ given a finite set of observations $x_1, \dots, x_N$.
Nonparametric approach: histogram; kernel density estimation; nearest-neighbor approach.
Parametric approach: Gaussian distribution; beta distribution; …
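As a rough illustration of the first two nonparametric approaches listed above, the following sketch estimates a density from samples with a histogram and with a Gaussian kernel density estimator. It assumes NumPy; the helper names `histogram_density` and `kernel_density`, the standard-normal synthetic data and the bandwidth of 0.3 are illustrative choices, not part of the original slides.

```python
import numpy as np

def histogram_density(samples, bins):
    """Histogram estimate: bin counts normalized so the density integrates to 1."""
    density, edges = np.histogram(samples, bins=bins, density=True)
    return density, edges

def kernel_density(samples, grid, bandwidth):
    """Gaussian kernel density estimate: p(x) ~ (1/N) * sum_n N(x | x_n, h^2), h = bandwidth."""
    samples = np.asarray(samples)[:, None]          # shape (N, 1)
    grid = np.asarray(grid)[None, :]                # shape (1, G)
    z = (grid - samples) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (np.sqrt(2 * np.pi) * bandwidth)
    return kernels.mean(axis=0)                     # average the N kernels at each grid point

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)     # synthetic observations x_1..x_N

density, edges = histogram_density(data, bins=20)
grid = np.linspace(-5.0, 5.0, 1001)
kde = kernel_density(data, grid, bandwidth=0.3)

# Both estimates are proper densities: they integrate to (approximately) 1.
print(np.sum(density * np.diff(edges)))
print(kde.sum() * (grid[1] - grid[0]))
```

The bandwidth plays the same role as the bin width: smaller values follow the data more closely but give noisier estimates.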

The exponential family
Examples: Gaussian distribution; binomial distribution; beta distribution; etc.
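For reference, members of the exponential family can be written as $p(x|\eta) = h(x)\,g(\eta)\exp(\eta^\top u(x))$, with natural parameters $\eta$, sufficient statistic $u(x)$, base measure $h(x)$ and normalizer $g(\eta)$. The sketch below checks this form numerically for the Bernoulli distribution (the binomial with N = 1), where $\eta = \ln\frac{\mu}{1-\mu}$, $u(x) = x$, $h(x) = 1$ and $g(\eta) = 1/(1 + e^{\eta})$. The function names are hypothetical; only the Python standard library is used.

```python
import math

def bernoulli_pmf(x, mu):
    """Standard form: mu^x * (1 - mu)^(1 - x) for x in {0, 1}."""
    return mu ** x * (1 - mu) ** (1 - x)

def bernoulli_expfam(x, mu):
    """The same pmf written as h(x) * g(eta) * exp(eta * u(x))."""
    eta = math.log(mu / (1 - mu))      # natural parameter (log-odds)
    h = 1.0                            # base measure h(x)
    g = 1.0 / (1.0 + math.exp(eta))    # normalizer g(eta)
    u = x                              # sufficient statistic u(x) = x
    return h * g * math.exp(eta * u)

for mu in (0.1, 0.5, 0.8):
    for x in (0, 1):
        assert math.isclose(bernoulli_pmf(x, mu), bernoulli_expfam(x, mu))
print("Bernoulli matches its exponential-family form")
```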

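Gaussian distribution
The Gaussian density with mean μ and variance σ² is
$$\mathcal{N}(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
The central limit theorem (CLT) states that, under certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and a well-defined variance, will be approximately normally distributed.
(Figure: bean machine by Sir Francis Galton.)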
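The bean machine can be mimicked in a few lines: at each row of pins a ball moves left or right with probability 1/2, so its final slot is a sum of independent random variables and, by the CLT, its distribution is approximately Gaussian with mean n/2 and variance n/4 for n rows. This is an illustrative simulation assuming NumPy; the numbers of rows and balls are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows, n_balls = 50, 20_000

# At every row of pins a ball goes right (1) or left (0) with probability 1/2.
steps = rng.integers(0, 2, size=(n_balls, n_rows))
slots = steps.sum(axis=1)                    # final slot = number of right-steps

# CLT prediction: slots are approximately Normal(n_rows / 2, n_rows / 4).
mu, var = n_rows / 2, n_rows / 4
freq = np.bincount(slots, minlength=n_rows + 1) / n_balls
x = np.arange(n_rows + 1)
gauss = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

print(slots.mean(), slots.var())             # close to 25 and 12.5
print(np.abs(freq - gauss).max())            # small: the histogram follows the Gaussian curve
```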

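Maximum likelihood estimation
The frequentist approach to estimating the parameters θ of a distribution from a set of observations $X = \{x_1, \dots, x_N\}$ is to maximize the likelihood.
- Because the data are i.i.d., the likelihood factorizes: $p(X|\theta) = \prod_{n=1}^{N} p(x_n|\theta)$.
- Because the logarithm is a monotonic transformation, maximizing $\ln p(X|\theta) = \sum_{n=1}^{N} \ln p(x_n|\theta)$ gives the same estimate.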
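To make the two points above concrete, the following sketch evaluates the likelihood as a product and the log-likelihood as a sum over a grid of candidate means. It assumes NumPy, Gaussian data with known σ = 1, and a grid search chosen only for illustration. Both curves peak at the same μ, and for a large sample only the log form remains numerically usable.

```python
import numpy as np

def gauss_pdf(x, mu, sigma=1.0):
    """Univariate Gaussian density N(x | mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=20)      # i.i.d. sample; sigma = 1 treated as known
mu_grid = np.linspace(0.0, 4.0, 4001)

# i.i.d. data: the likelihood is a product of per-observation densities ...
lik = np.array([np.prod(gauss_pdf(x, m)) for m in mu_grid])
# ... and its logarithm is a sum; log is monotonic, so the maximizer is the same.
loglik = np.array([np.sum(np.log(gauss_pdf(x, m))) for m in mu_grid])
same_argmax = np.argmax(lik) == np.argmax(loglik)
print(same_argmax, mu_grid[np.argmax(loglik)])

# In practice the product of many densities underflows, while the sum of logs does not.
big = rng.normal(loc=2.0, scale=1.0, size=5000)
print(np.prod(gauss_pdf(big, 2.0)))              # 0.0 (underflow)
print(np.sum(np.log(gauss_pdf(big, 2.0))))       # a finite negative number
```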

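MLE for Gaussian distribution
Setting the derivative of the log-likelihood with respect to μ to zero gives a simple average:
$$\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n$$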
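A minimal check of the closed-form estimates, assuming NumPy and a small made-up data set. Besides the mean, it computes the corresponding ML variance $\sigma^2_{ML} = \frac{1}{N}\sum_n (x_n - \mu_{ML})^2$, which divides by N (matching `np.var` with its default `ddof=0`), and confirms that perturbing either estimate lowers the log-likelihood. The function names are hypothetical.

```python
import numpy as np

def gaussian_mle(x):
    """Closed-form ML estimates for a univariate Gaussian."""
    x = np.asarray(x, dtype=float)
    mu_ml = x.mean()                        # simple average
    var_ml = np.mean((x - mu_ml) ** 2)      # divides by N (biased), not N - 1
    return mu_ml, var_ml

def gaussian_loglik(x, mu, var):
    """ln p(X | mu, var) = sum_n ln N(x_n | mu, var)."""
    x = np.asarray(x, dtype=float)
    return -0.5 * len(x) * np.log(2 * np.pi * var) - np.sum((x - mu) ** 2) / (2 * var)

x = np.array([1.2, 0.7, 2.3, 1.9, 1.4])
mu_ml, var_ml = gaussian_mle(x)
print(mu_ml, var_ml)                        # 1.5 and 0.308, up to rounding

# Perturbing either parameter can only lower the log-likelihood.
best = gaussian_loglik(x, mu_ml, var_ml)
for d in (-0.1, 0.1):
    assert gaussian_loglik(x, mu_ml + d, var_ml) < best
    assert gaussian_loglik(x, mu_ml, var_ml + d) < best
```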

Maximum a posteriori estimation
The Bayesian approach to estimating the parameters of a distribution from a set of observations is to maximize the posterior distribution, $p(\theta|X) \propto p(X|\theta)\,p(\theta)$. This allows prior information to be taken into account.

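MAP for Gaussian distribution
Assuming the variance σ² is known and placing a Gaussian prior $p(\mu) = \mathcal{N}(\mu|\mu_0, \sigma_0^2)$ on the mean, the posterior distribution is given by
$$p(\mu|X) = \mathcal{N}(\mu|\mu_N, \sigma_N^2)$$
$$\mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\,\mu_{ML}, \qquad \frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}$$
so the MAP estimate $\mu_N$ is a weighted average of the prior mean $\mu_0$ and the ML estimate $\mu_{ML}$ (the sample mean).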
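The following sketch computes $\mu_N$ and $\sigma_N^2$ from the weighted-average formula and compares $\mu_N$ with a brute-force maximization of log-likelihood plus log-prior over a grid. It assumes NumPy, known σ² = 1, a prior N(0, 1) on μ and a small made-up data set; `gaussian_mean_posterior` is a hypothetical helper.

```python
import numpy as np

def gaussian_mean_posterior(x, sigma2, mu0, sigma0_2):
    """Posterior N(mu | mu_N, sigma_N^2) for the mean of a Gaussian with known
    variance sigma2, under the prior N(mu | mu0, sigma0_2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu_ml = x.mean()
    w_prior = sigma2 / (n * sigma0_2 + sigma2)       # weight on the prior mean
    mu_n = w_prior * mu0 + (1 - w_prior) * mu_ml      # weighted average
    sigma_n2 = 1.0 / (1.0 / sigma0_2 + n / sigma2)
    return mu_n, sigma_n2

x = np.array([1.2, 0.7, 2.3, 1.9, 1.4])              # sample mean 1.5
mu_n, sigma_n2 = gaussian_mean_posterior(x, sigma2=1.0, mu0=0.0, sigma0_2=1.0)

# Brute-force MAP: maximize log-likelihood + log-prior on a fine grid.
grid = np.linspace(-3.0, 3.0, 60001)
log_post = (-np.sum((x[:, None] - grid) ** 2, axis=0) / 2.0   # log-likelihood, sigma2 = 1
            - grid ** 2 / 2.0)                                 # log-prior, mu0 = 0, sigma0_2 = 1
mu_map = grid[np.argmax(log_post)]
print(mu_n, sigma_n2, mu_map)
```

With these numbers $\mu_N = 1.25$ lies between the prior mean 0 and the sample mean 1.5, as the weighted-average formula predicts.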

Conjugate prior
In general, for a given probability distribution $p(x|\eta)$, we can seek a prior $p(\eta)$ that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior. For any member of the exponential family, $p(x|\eta) = h(x)\,g(\eta)\exp(\eta^\top u(x))$, there exists a conjugate prior that can be written in the form
$$p(\eta|\chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp(\nu\, \eta^\top \chi)$$
where $f(\chi, \nu)$ is a normalization coefficient.
Important conjugate pairs (likelihood – prior) include:
- Binomial – Beta
- Multinomial – Dirichlet
- Gaussian – Gaussian (for the mean)
- Gaussian – Gamma (for the precision)
- Exponential – Gamma
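One way to see conjugacy in action is the Exponential – Gamma pair from the list above. With a prior $\mathrm{Gamma}(\lambda|a, b) \propto \lambda^{a-1}e^{-b\lambda}$ on the rate and likelihood $\prod_n \lambda e^{-\lambda x_n} = \lambda^N e^{-\lambda\sum_n x_n}$, the posterior is $\mathrm{Gamma}(\lambda|a + N, b + \sum_n x_n)$. The sketch below, assuming NumPy and a shape/rate parameterization of the Gamma distribution, multiplies prior and likelihood on a grid, normalizes numerically, and compares the result with the closed-form posterior. The data, prior values and helper names are made up for illustration.

```python
import math
import numpy as np

def gamma_pdf(lam, a, b):
    """Gamma density with shape a and rate b, evaluated at lam > 0."""
    return np.exp(a * np.log(b) - math.lgamma(a) + (a - 1) * np.log(lam) - b * lam)

def exponential_gamma_update(x, a, b):
    """Conjugate update of a Gamma(a, b) prior on the rate of an exponential likelihood."""
    x = np.asarray(x, dtype=float)
    return a + len(x), b + x.sum()

x = np.array([0.8, 1.5, 0.3, 2.2, 0.9, 1.1])
a0, b0 = 2.0, 1.0
a_n, b_n = exponential_gamma_update(x, a0, b0)      # (8.0, 7.8)

# Numerical check: prior * likelihood, normalized on a grid, matches Gamma(a_n, b_n).
lam = np.linspace(1e-4, 5.0, 50_000)
dlam = lam[1] - lam[0]
likelihood = np.prod(lam[:, None] * np.exp(-lam[:, None] * x), axis=1)
unnorm = gamma_pdf(lam, a0, b0) * likelihood
numeric_post = unnorm / (unnorm.sum() * dlam)
max_err = np.abs(numeric_post - gamma_pdf(lam, a_n, b_n)).max()
print(a_n, b_n, max_err)
```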

MLE for Binomial distribution
The binomial distribution models the probability of m “heads” out of N tosses:
$$\mathrm{Bin}(m|N,\mu) = \binom{N}{m}\mu^m(1-\mu)^{N-m}$$
Its only parameter, μ, encodes the probability of a single event (“head”). The maximum likelihood estimate is given by
$$\mu_{ML} = \frac{m}{N}$$
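A quick numerical check that m/N maximizes the binomial log-likelihood, using only the Python standard library (`math.comb` requires Python 3.8 or later). The counts m = 7, N = 10 and the grid resolution are arbitrary.

```python
import math

def binomial_loglik(m, n, mu):
    """ln Bin(m | N, mu) = ln C(N, m) + m ln mu + (N - m) ln(1 - mu)."""
    return math.log(math.comb(n, m)) + m * math.log(mu) + (n - m) * math.log(1 - mu)

m, n = 7, 10                                   # 7 heads out of 10 tosses
mu_ml = m / n                                  # closed-form MLE

grid = [k / 1000 for k in range(1, 1000)]      # candidate mu values in (0, 1)
mu_grid = max(grid, key=lambda mu: binomial_loglik(m, n, mu))
print(mu_ml, mu_grid)                          # 0.7 0.7
```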

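MAP for Binomial distribution
The conjugate prior for this distribution is the beta distribution:
$$\mathrm{Beta}(\mu|a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\mu^{a-1}(1-\mu)^{b-1}$$
The posterior is then given by
$$p(\mu|m,l,a,b) = \frac{\Gamma(m+a+l+b)}{\Gamma(m+a)\Gamma(l+b)}\mu^{m+a-1}(1-\mu)^{l+b-1}$$
where $l = N - m$ is simply the number of “tails”. Its mode gives the MAP estimate $\mu_{MAP} = \frac{m+a-1}{N+a+b-2}$ (for $m + a > 1$ and $l + b > 1$).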
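Because the posterior is again a beta distribution, the update amounts to adding the observed counts of heads and tails to a and b. The sketch below (plain Python; the `Beta` class, the prior a = b = 2 and the toss sequence are illustrative assumptions) computes the MAP estimate and checks that updating one toss at a time yields the same posterior as a single batch update.

```python
from dataclasses import dataclass

@dataclass
class Beta:
    a: float   # first beta parameter (prior a plus heads seen so far)
    b: float   # second beta parameter (prior b plus tails seen so far)

    def update(self, m: int, l: int) -> "Beta":
        """Conjugate update after observing m heads and l tails."""
        return Beta(self.a + m, self.b + l)

    def mode(self) -> float:
        """MAP estimate (a - 1) / (a + b - 2); valid for a > 1 and b > 1."""
        return (self.a - 1) / (self.a + self.b - 2)

    def mean(self) -> float:
        return self.a / (self.a + self.b)

prior = Beta(a=2.0, b=2.0)
tosses = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]      # 1 = head, 0 = tail
m = sum(tosses)                              # number of heads
l = len(tosses) - m                          # number of tails

batch = prior.update(m, l)

# Updating one toss at a time gives the same posterior (conjugacy).
sequential = prior
for t in tosses:
    sequential = sequential.update(t, 1 - t)

print(batch, batch.mode(), m / len(tosses))
```

With 7 heads in 10 tosses, the MAP estimate 8/12 ≈ 0.667 lies between the prior mode 0.5 and the ML estimate 0.7.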

Models as collections of priors - 1
Take a simple regression model, add a prior on the weights, and get Bayesian linear regression!
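The slide's formulas did not survive extraction, so the sketch below assumes the standard formulation: targets $t = \Phi w + \epsilon$ with Gaussian noise of precision β, and a prior $p(w) = \mathcal{N}(w|0, \alpha^{-1}I)$. The posterior over the weights is then $\mathcal{N}(w|m_N, S_N)$ with $S_N^{-1} = \alpha I + \beta\Phi^\top\Phi$ and $m_N = \beta S_N\Phi^\top t$. It assumes NumPy; the feature map (bias plus x), α, β, the "true" weights and the synthetic data are hypothetical.

```python
import numpy as np

def bayesian_linear_regression(phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for t = phi @ w + noise.

    alpha: precision of the zero-mean isotropic Gaussian prior on w
    beta:  precision of the Gaussian observation noise
    """
    d = phi.shape[1]
    s_n_inv = alpha * np.eye(d) + beta * phi.T @ phi
    s_n = np.linalg.inv(s_n_inv)
    m_n = beta * s_n @ phi.T @ t
    return m_n, s_n

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=30)
phi = np.column_stack([np.ones_like(x), x])          # features: bias and x
w_true = np.array([-0.3, 0.5])                       # hypothetical ground truth
t = phi @ w_true + rng.normal(scale=0.2, size=x.size)

m_n, s_n = bayesian_linear_regression(phi, t, alpha=2.0, beta=25.0)
w_ls = np.linalg.lstsq(phi, t, rcond=None)[0]        # least-squares (ML) weights
print(m_n, w_ls)
```

As α → 0 the prior becomes flat and $m_N$ approaches the least-squares solution; a larger α shrinks the weights toward zero.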

Models as collections of priors - 2
Take the simple regression model again, add a prior on the function itself, and get Gaussian processes!
(Figure: graphical models with nodes $y_n$ and β, and $y_n$, β and K.) Here $y_n$ is some function of $x_n$.
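The formulas are again not on the slide; the sketch assumes a zero-mean Gaussian process prior whose kernel K is squared-exponential (length-scale 0.3 and unit variance, both arbitrary) and Gaussian noise with precision β. It uses the standard predictive equations: mean $k_*^\top C^{-1} t$ and variance $k(x_*, x_*) + \beta^{-1} - k_*^\top C^{-1} k_*$, where $C = K + \beta^{-1}I$. NumPy is assumed; the function names and the sine-wave data are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_predict(x_train, t_train, x_test, beta, **kernel_args):
    """Predictive mean and variance of noisy targets at x_test."""
    c = rbf_kernel(x_train, x_train, **kernel_args) + np.eye(len(x_train)) / beta
    k_star = rbf_kernel(x_train, x_test, **kernel_args)          # shape (N, M)
    c_inv_k = np.linalg.solve(c, k_star)                         # C^{-1} k_* per test point
    mean = c_inv_k.T @ t_train
    var = (rbf_kernel(x_test, x_test, **kernel_args).diagonal() + 1.0 / beta
           - np.sum(k_star * c_inv_k, axis=0))
    return mean, var

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 15)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=x_train.size)
x_test = np.array([0.25, 0.5, 3.0])
mean, var = gp_predict(x_train, t_train, x_test, beta=100.0)
print(mean, var)
```

Far from the training inputs (x = 3.0) the prediction falls back to the prior: a mean close to 0 and a variance close to 1 + 1/β.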

Models as collections of priors - 3
Take a model where $x_n$ is discrete and unknown, add a prior on the states $x_n$ assuming they are temporally smooth, and get a hidden Markov model!
(Figure: graphical model with parameters θ, latent chain $x_1, x_2, \dots, x_{n-1}, x_n, x_{n+1}$ and observations $t_1, t_2, \dots, t_{n-1}, t_n, t_{n+1}$.)
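Inference in this hidden Markov model can be sketched with the forward algorithm, which sums over all state paths recursively instead of enumerating them, rescaling at each step for numerical stability. The sketch assumes NumPy, discrete observations $t_n$, and made-up values for the initial distribution, the transition matrix A (whose large diagonal entries encode temporal smoothness) and the emission matrix B, which together play the role of θ. A brute-force sum over all paths is included as a check; all names are hypothetical.

```python
import numpy as np

def hmm_forward(pi, A, emission_probs):
    """Scaled forward algorithm for a discrete-state HMM.

    pi:             initial state distribution, shape (K,)
    A:              transitions, A[i, j] = p(x_{n+1} = j | x_n = i), shape (K, K)
    emission_probs: emission_probs[n, k] = p(t_n | x_n = k), shape (T, K)
    Returns filtered posteriors p(x_n | t_1..t_n) and log p(t_1..t_T).
    """
    T, K = emission_probs.shape
    alpha = np.zeros((T, K))
    log_evidence = 0.0
    for n in range(T):
        predicted = pi if n == 0 else alpha[n - 1] @ A   # p(x_n | t_1..t_{n-1})
        a = predicted * emission_probs[n]
        c = a.sum()                                      # p(t_n | t_1..t_{n-1})
        alpha[n] = a / c
        log_evidence += np.log(c)
    return alpha, log_evidence

pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])       # "sticky" transitions = temporal smoothness
B = np.array([[0.7, 0.3], [0.1, 0.9]])       # B[k, o] = p(t = o | x = k)
obs = [0, 0, 1, 1]
emission_probs = B[:, obs].T                 # shape (T, K)
alpha, log_ev = hmm_forward(pi, A, emission_probs)

# Brute force: sum the joint probability over all 2^T state paths.
brute = 0.0
for path in np.ndindex(*([2] * len(obs))):
    p = pi[path[0]] * emission_probs[0, path[0]]
    for n in range(1, len(obs)):
        p *= A[path[n - 1], path[n]] * emission_probs[n, path[n]]
    brute += p
print(log_ev, np.log(brute))
```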

Noninformative priors
Sometimes we have no strong prior belief but still want to apply Bayesian inference. Then we need noninformative priors. If our parameter λ is a discrete variable with K states, we can simply set each prior probability to 1/K. For continuous variables, however, the choice is less clear. One example is a noninformative prior over μ for the Gaussian distribution: take the prior $p(\mu) = \mathcal{N}(\mu|\mu_0, \sigma_0^2)$ with $\sigma_0^2 \to \infty$. In this limit the posterior mean $\mu_N$ tends to the ML estimate $\mu_{ML}$ and the posterior variance $\sigma_N^2$ tends to $\sigma^2/N$, so the effect of the prior on the posterior over μ vanishes.
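The limit can be seen numerically by evaluating the posterior of μ for increasingly broad priors. The sketch assumes NumPy, known σ² = 1, prior mean μ0 = 0 and a small made-up data set with sample mean 1.5; it uses the standard posterior formulas for a Gaussian mean with known variance.

```python
import numpy as np

x = np.array([1.2, 0.7, 2.3, 1.9, 1.4])   # sample mean 1.5
sigma2, mu0 = 1.0, 0.0                     # known variance and prior mean (assumed values)
n, mu_ml = len(x), x.mean()

results = {}
for sigma0_2 in (0.1, 1.0, 10.0, 1e6):
    # Posterior N(mu | mu_N, sigma_N^2) under the prior N(mu | mu0, sigma0_2).
    mu_n = (sigma2 * mu0 + n * sigma0_2 * mu_ml) / (n * sigma0_2 + sigma2)
    sigma_n2 = 1.0 / (1.0 / sigma0_2 + n / sigma2)
    results[sigma0_2] = (mu_n, sigma_n2)
    print(f"sigma0^2={sigma0_2:g}: mu_N={mu_n:.4f}, sigma_N^2={sigma_n2:.4f}")
# As sigma0^2 grows, mu_N -> mu_ML = 1.5 and sigma_N^2 -> sigma2 / N = 0.2.
```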

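Empirical Bayes
But what if we still want to assume some prior information, yet learn it from the data instead of fixing it in advance? Imagine the following model:
(Figure: graphical model with hyperparameter λ, parameters $\theta_s$ in a plate over S, and observations $x_n$ in a plate over N.)
We cannot use full Bayesian inference, but we can approximate it by finding the value $\lambda^*$ that maximizes the marginal likelihood $p(X|\lambda)$.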

We can estimate $\lambda^*$ with the following iterative procedure (EM algorithm):
- Initialize $\lambda^*$.
- E-step: compute $p(\theta|X, \lambda^*)$ given the fixed $\lambda^*$.
- M-step: update $\lambda^* = \arg\max_{\lambda}\, \mathbb{E}_{p(\theta|X, \lambda^*)}[\ln p(X, \theta|\lambda)]$.
- Repeat until convergence.
This illustrates the other name for empirical Bayes: maximum marginal likelihood. It is not a fully Bayesian treatment; however, it offers a useful compromise between the Bayesian and frequentist approaches.
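The slide specifies the model only as a diagram, so the sketch below assumes a concrete Gaussian instance: $\theta_s \sim \mathcal{N}(m, \tau^2)$ with hyperparameters $\lambda = (m, \tau^2)$, and $x_{s,n} \sim \mathcal{N}(\theta_s, \sigma^2)$ with σ² known and equal group sizes. Because everything is Gaussian, the E-step posterior is exact, and the M-step maximizes $\mathbb{E}[\ln p(\theta|\lambda)]$, the only term of $\ln p(X, \theta|\lambda)$ that depends on λ. With equal group sizes the maximum of $p(X|\lambda)$ also has a closed form (the group means are marginally $\mathcal{N}(m, \tau^2 + \sigma^2/N)$), which serves as a check. NumPy is assumed; the function name, initial values and synthetic data are hypothetical.

```python
import numpy as np

def empirical_bayes_em(x, sigma2, m=0.0, tau2=1.0, n_iter=200):
    """EM for lambda = (m, tau2) in theta_s ~ N(m, tau2), x[s, n] ~ N(theta_s, sigma2).

    x: array of shape (S, N), one row per group; sigma2 is treated as known.
    Returns m, tau2 and the posterior means of theta_s from the last E-step.
    """
    S, N = x.shape
    xbar = x.mean(axis=1)
    for _ in range(n_iter):
        # E-step: Gaussian posterior p(theta_s | X, lambda*) for every group.
        post_var = 1.0 / (1.0 / tau2 + N / sigma2)
        post_mean = post_var * (m / tau2 + N * xbar / sigma2)
        # M-step: maximize E[ln p(theta | lambda)] over (m, tau2).
        m = post_mean.mean()
        tau2 = np.mean((post_mean - m) ** 2) + post_var
    return m, tau2, post_mean

rng = np.random.default_rng(0)
S, N, sigma2 = 50, 10, 1.0
theta = rng.normal(loc=3.0, scale=2.0, size=S)                  # hypothetical group parameters
x = theta[:, None] + rng.normal(scale=np.sqrt(sigma2), size=(S, N))

m_hat, tau2_hat, theta_hat = empirical_bayes_em(x, sigma2)

# Closed-form maximum marginal likelihood for equal group sizes.
xbar = x.mean(axis=1)
m_closed = xbar.mean()
tau2_closed = max(np.var(xbar) - sigma2 / N, 0.0)
print(m_hat, m_closed)
print(tau2_hat, tau2_closed)
```

The posterior means `theta_hat` are shrunk from the raw group averages toward the learned prior mean, which is the practical effect of learning the prior from the data.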

Thank you for your attention!