Lecture 15 Sampling
Approximate Inference Motivation comes from EM, where in the E-step we need to do inference which is often intractable. Approximate techniques include: Laplace approx. (Gaussian around the mode), mean field (factorized ansatz), loopy belief propagation – all are biased but are efficient. Sampling is asymptotically unbiased but is typically slow.
Sampling Ancestral sampling: top-down sampling from Bayes’ net. Sampling by transformation: try to find a transformation f(.) such that x = f(y) and y is simple to sample from. Rejection sampling: find a curve f(x) >= p(x) and sample from f(x). Then accept a fraction p(x)/f(x). Many more methods that we don’t have time for. Most methods however break down in high dimensions: curse of dimensionality.
MCMC & Gibbs We give up that all samples have to be independent. Sample a new sample given the previous one according to some proposal distribution S(x’|x). We only accept the new sample x’ with a certain probability A(x,x’). T(x,x’) = S(x’|x) X A(x,x’) is chosen such that the samples eventually come from the desired distribution p(x). p(x) should be the invariant distribution of the kernel T, which is unique of the chain is ergodic (aperiodic & irreducible). Detailed balance is a condition that is sufficient and is easier to satisfy.
MH & Gibbs Metropis Hastings Markov Chain Monte Carlo sampler always accepts of the prob. of x’ is larger and reject with a fraction p(x’)/p(x) (in case of symmetric S). Gibbs sampling is MH method that always accepts and uses the conditional distributions to sample: X, Y ~P(X,Y) iterate X~P(X|Y) & Y~P(Y|X). MCMC has problems if the random variables are highly dependent and if there are multiple modes. In both cases it takes very long to explore all of the space that has significant probability. This is called slow mixing.