Computer Vision Lab. SNU Young Ki Baik An Introduction to MCMC for Machine Learning (Markov Chain Monte Carlo)


1 Computer Vision Lab. SNU Young Ki Baik An Introduction to MCMC for Machine Learning (Markov Chain Monte Carlo)

2 References
An Introduction to MCMC for Machine Learning. Andrieu et al. (Machine Learning, 2003)
Introduction to Monte Carlo Methods. David MacKay.
Markov Chain Monte Carlo for Computer Vision. Zhu, Dellaert and Tu. (tutorial at ICCV 2005) http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm
Various MCMC presentations on the web

3 Contents
MCMC
Metropolis-Hastings algorithm
Mixtures and cycles of MCMC kernels
Auxiliary variable samplers
Adaptive MCMC
Other applications of MCMC
Convergence problems and tricks of MCMC
Remaining problems
Conclusion

4 MCMC
The problem with plain Monte Carlo (MC): assembling the entire distribution for MC is usually hard:
- Complicated energy landscapes
- High-dimensional systems
- Extraordinarily difficult normalization
Solution: MCMC
- Build up the distribution from a Markov chain
- Choose local transition probabilities that generate the distribution of interest (ensure detailed balance)
- Each random variable is chosen based on the previous variable in the chain
- "Walk" along the Markov chain until convergence is reached
Result: normalization is not required, and all calculations are local.

5 MCMC
What is a Markov chain?
A Markov chain is a mathematical model for a stochastic system that generates random variables X1, X2, ..., Xt, where the distribution of the next random variable depends only on the current one. Over time the chain settles into a stationary probability distribution.

6 MCMC
What is Markov Chain Monte Carlo?
MCMC is a general-purpose technique for generating fair samples from a probability distribution in a high-dimensional space, using random numbers (dice) drawn from a uniform distribution over a certain range. (Figure: a sequence of Markov chain states driven by independent trials of the dice.)

7 MCMC
MCMC as a general-purpose computing technique
Task 1: Simulation: draw fair (typical) samples from a probability distribution that governs a system.
Task 2: Integration/computation in very high dimensions, i.e. computing expectations E_p[f(x)].
Task 3: Optimization with an annealing scheme.
Task 4: Learning: unsupervised learning with hidden variables (simulated from the posterior), or MLE learning of the parameters of p(x; Θ), which needs simulation as well.
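The integration task above can be sketched concretely: given fair samples x_i ~ p, the expectation E_p[f(x)] is approximated by the sample average (1/N) Σ_i f(x_i). A minimal sketch (an assumed example, not from the slides): the target here is a standard normal, sampled directly for illustration, with f(x) = x², whose true expectation is Var(x) = 1.

```python
import random

# Assumed illustrative example: estimate E_p[f(x)] with f(x) = x^2
# under p = N(0, 1), sampled directly (no Markov chain is needed for
# such a simple target). The true value is Var(x) = 1.
rng = random.Random(0)
n = 100_000
samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
estimate = sum(x * x for x in samples) / n
```

In high dimensions, where direct sampling is impossible, the x_i would instead come from an MCMC sampler targeting p; the averaging step stays the same.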

8 MCMC
Some notation
The stochastic process x_1, x_2, ..., x_N is called a Markov chain if p(x_i | x_{i-1}, ..., x_1) = p(x_i | x_{i-1}).
The chain is homogeneous if the transition T(x_i | x_{i-1}) remains invariant for all i, with Σ_{x_i} T(x_i | x_{i-1}) = 1 for any i.
The chain then depends solely on the current state and a fixed transition matrix.

9 MCMC
Example
Transition graph for a Markov chain with three states (s = 3). Reading the edge probabilities off the slide (which follows the example in Andrieu et al., 2003), the transition matrix is
T = [ 0 1 0 ; 0 0.1 0.9 ; 0.6 0.4 0 ]
Whatever the initial state distribution, repeatedly applying T drives the marginal distribution of the state to the invariant distribution p(x) ≈ (0.22, 0.41, 0.37). This stability result plays a fundamental role in MCMC.
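This stability is easy to check empirically. A sketch using the three-state matrix with the edge weights as read off the slide: simulate the chain and compare the visit frequencies against the invariant distribution.

```python
import random

# Three-state transition matrix with the edge weights from the slide
# (rows are the current state, columns the next state; rows sum to 1).
T = [
    [0.0, 1.0, 0.0],
    [0.0, 0.1, 0.9],
    [0.6, 0.4, 0.0],
]

def simulate_chain(T, x0, n_steps, seed=0):
    """Run the chain from state x0 and return visit frequencies."""
    rng = random.Random(seed)
    counts = [0] * len(T)
    x = x0
    for _ in range(n_steps):
        x = rng.choices(range(len(T)), weights=T[x])[0]
        counts[x] += 1
    return [c / n_steps for c in counts]

freqs = simulate_chain(T, x0=0, n_steps=100_000)
# The frequencies approach the invariant distribution
# p ≈ (0.221, 0.410, 0.369) regardless of the starting state.
```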

10 MCMC
Convergence properties
For any starting point, the chain converges to the invariant distribution p(x), as long as T is a stochastic transition matrix that obeys the following properties:
1) Irreducibility: every state must be (eventually) reachable from every other state.
2) Aperiodicity: this stops the chain from oscillating deterministically between states.
3) Reversibility (detailed balance): p(x) T(x' | x) = p(x') T(x | x'). This condition ensures that the system remains in its stationary distribution.
In the discrete case T is a transition matrix; in the continuous case it is a transition kernel (proposal distribution).

11 MCMC
Eigen-analysis
By spectral theory, p(x) is the left eigenvector of the matrix T with eigenvalue 1 (p T = p): the eigenvalue λ1 is always 1, and its eigenvector is the stationary distribution. The second-largest eigenvalue determines the rate of convergence of the chain, and should be as small as possible.
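The eigen-analysis can be carried out directly, again using the three-state example matrix as an assumed input: the left eigenvector of T for eigenvalue 1 is the stationary distribution, and the magnitude of the second-largest eigenvalue governs the convergence rate.

```python
import numpy as np

T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.1, 0.9],
              [0.6, 0.4, 0.0]])

# Left eigenvectors of T are right eigenvectors of T transposed.
evals, evecs = np.linalg.eig(T.T)
k = np.argmin(np.abs(evals - 1.0))   # index of the eigenvalue closest to 1
p = np.real(evecs[:, k])
p = p / p.sum()                      # normalize into a probability vector

# Second-largest eigenvalue magnitude: the smaller it is, the faster
# the chain converges to p.
second = sorted(np.abs(evals))[-2]
```

For this T, p comes out near (0.221, 0.410, 0.369), matching the empirical frequencies from the previous slide's chain.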

12 Metropolis-Hastings algorithm
The MH algorithm: the most popular MCMC method.
Ingredients: invariant distribution p(x), proposal distribution q(x* | x), candidate value x*, acceptance probability A(x, x*), kernel K.
1. Initialize x_0.
2. For i = 0 to N-1:
   - Sample u ~ U(0, 1) and a candidate x* ~ q(x* | x_i).
   - If u < A(x_i, x*) = min{ 1, [p(x*) q(x_i | x*)] / [p(x_i) q(x* | x_i)] }, set x_{i+1} = x*;
   - else set x_{i+1} = x_i.

13 Metropolis-Hastings algorithm
Results of running the MH algorithm
Target distribution: (equation lost from the slide)
Proposal distribution: (equation lost from the slide)
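The equations for this slide did not survive extraction, so the sketch below substitutes an illustrative setup (an assumption, not the slide's own): a bimodal unnormalized target with modes near 0 and 10, and a Gaussian random-walk proposal q(x* | x) = N(x, σ²), whose symmetry cancels the q terms in the acceptance ratio.

```python
import math
import random

def p_unnorm(x):
    # Assumed illustrative bimodal target (unnormalized):
    # a mixture of bumps near x = 0 and x = 10.
    return 0.3 * math.exp(-0.2 * x**2) + 0.7 * math.exp(-0.2 * (x - 10.0)**2)

def metropolis_hastings(p, n_samples, sigma=10.0, x0=0.0, seed=0):
    """Random-walk MH with symmetric proposal N(x, sigma^2);
    symmetry reduces A(x, x*) to min(1, p(x*) / p(x))."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        x_star = rng.gauss(x, sigma)
        if rng.random() < min(1.0, p(x_star) / p(x)):
            x = x_star                 # accept the candidate
        samples.append(x)              # on rejection, repeat the old state
    return samples

samples = metropolis_hastings(p_unnorm, n_samples=50_000)
# With sigma = 10 the chain jumps between both modes and mixes well.
```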

14 Metropolis-Hastings algorithm
Different choices of the proposal standard deviation σ*
MH requires careful design of the proposal distribution:
- If σ* is too narrow, only one mode of p(x) might be visited.
- If σ* is too wide, the rejection rate can be high.
- If all the modes are visited while the acceptance probability is high, the chain is said to "mix" well.

15 Mixtures and cycles of MCMC kernels
Mixtures and cycles
It is possible to combine several samplers into mixtures and cycles of the individual samplers. If the transition kernels K1 and K2 each leave p(x) invariant, then the cycle hybrid kernel K1 K2 and the mixture hybrid kernel ν K1 + (1 - ν) K2 (0 ≤ ν ≤ 1) are also transition kernels with invariant distribution p(x).

16 Mixtures and cycles of MCMC kernels
Mixtures of kernels
Incorporate global proposals to explore vast regions of the state space and local proposals to discover finer details of the target distribution -> well suited to target distributions with many narrow peaks (cf. the reversible jump MCMC algorithm).

17 Mixtures and cycles of MCMC kernels
Cycles of kernels
Split a multivariate state vector into components (blocks) that can be updated separately, blocking highly correlated variables together (cf. the Gibbs sampling algorithm).
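Updating blocks in turn from their full conditionals is exactly the Gibbs sampler. A minimal sketch for an assumed target (not from the slides): a standard bivariate normal with correlation ρ, whose full conditionals are themselves Gaussian.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampling for a standard bivariate normal with
    correlation rho: each full conditional is
    x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2."""
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0
    sd = (1.0 - rho**2) ** 0.5
    out = []
    for _ in range(n_samples):
        x1 = rng.gauss(rho * x2, sd)   # update block 1 given block 2
        x2 = rng.gauss(rho * x1, sd)   # update block 2 given block 1
        out.append((x1, x2))
    return out

draws = gibbs_bivariate_normal(rho=0.8, n_samples=50_000)
```

The higher ρ is, the slower this componentwise scheme mixes, which is exactly why the slide recommends blocking highly correlated variables together.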

18 Auxiliary variable samplers
Auxiliary variables
It is often easier to sample from an augmented distribution p(x, u), where u is an auxiliary variable. Marginal samples of x are then obtained by sampling (x, u) pairs and ignoring the u components.
Examples: Hybrid Monte Carlo (HMC), which uses gradient information, and slice sampling.
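A concrete instance of the auxiliary-variable idea is slice sampling: the auxiliary u is a uniform "height" under the unnormalized density, so (x, u) is uniform on the region below the curve. The sketch below is a basic 1-D stepping-out/shrinkage version (illustrative; parameters such as the step width w are assumptions), run on an unnormalized standard normal.

```python
import math
import random

def slice_sample(p, x0, n_samples, w=2.0, seed=0):
    """1-D slice sampler: draw the auxiliary height u uniformly under
    p(x), find an interval covering the slice {x : p(x) > u} by
    stepping out, then sample x uniformly with shrinkage."""
    rng = random.Random(seed)
    x = x0
    out = []
    for _ in range(n_samples):
        u = rng.random() * p(x)            # auxiliary variable: slice height
        lo = x - w * rng.random()          # randomly position the first window
        hi = lo + w
        while p(lo) > u:                   # step out until both ends
            lo -= w                        # fall below the slice
        while p(hi) > u:
            hi += w
        while True:                        # shrinkage: sample, shrink on reject
            x_new = rng.uniform(lo, hi)
            if p(x_new) > u:
                x = x_new
                break
            if x_new < x:
                lo = x_new
            else:
                hi = x_new
        out.append(x)
    return out

p = lambda x: math.exp(-0.5 * x * x)       # unnormalized N(0, 1)
xs = slice_sample(p, x0=0.0, n_samples=20_000)
```

Note that discarding the u values leaves exactly the marginal samples of x, as the slide describes.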

19 Adaptive MCMC
Adaptive selection of the proposal distribution
The variance of the proposal distribution is important, so we would like to automate the choice of proposal as much as possible.
Problem: adaptation can disturb the stationary distribution. Gelfand and Sahu (1994) show that the stationary distribution is disturbed despite the fact that each participating kernel has the same stationary distribution.
Workarounds: carry out adaptation only for an initial fixed number of steps; run parallel chains; and so on -> these are inefficient, and much more research is required.

20 Other applications of MCMC
Simulated annealing for global optimization: to find the global maximum of p(x).
Monte Carlo EM: to compute a fast approximation of the E-step.
Sequential Monte Carlo methods and particle filters: to carry out on-line approximation of probability distributions using samples (using parallel sampling).
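The simulated-annealing idea can be sketched as annealed Metropolis sampling from p(x)^(1/T_i) with a temperature T_i decaying toward 0, so the chain concentrates on the global maximum of p. Everything below (the bimodal target, linear cooling schedule, and proposal width) is an illustrative assumption.

```python
import math
import random

def simulated_annealing(p, x0, n_iters, sigma=5.0, seed=0):
    """Annealed Metropolis search for a global maximum of p(x):
    accept with probability min(1, (p(x*)/p(x))^(1/T)) while the
    temperature T decays, and track the best point visited."""
    rng = random.Random(seed)
    x = x0
    best = x0
    for i in range(n_iters):
        t = max(0.01, 1.0 - i / n_iters)   # linear cooling, floored at 0.01
        x_star = x + rng.gauss(0.0, sigma)
        ratio = p(x_star) / p(x)
        # Guard: only exponentiate ratios below 1 to avoid overflow.
        a = 1.0 if ratio >= 1.0 else ratio ** (1.0 / t)
        if rng.random() < a:
            x = x_star
        if p(x) > p(best):
            best = x
    return best

# Assumed bimodal example: the taller bump sits near x = 10.
p = lambda x: 0.3 * math.exp(-0.2 * x**2) + 0.7 * math.exp(-0.2 * (x - 10.0)**2)
best = simulated_annealing(p, x0=0.0, n_iters=20_000)
```

At high temperature the chain explores freely; as T falls, downhill moves are accepted ever more rarely, freezing the chain onto a mode.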

21 Convergence problems and tricks of MCMC
Convergence problem: determining the length of the Markov chain is a difficult task.
Tricks:
- For starting biases: discard an initial set of samples (burn-in), or set the initial sample value manually.
- Markov chain tests: apply several graphical and statistical tests to assess whether the chain has stabilized -> these do not yet provide entirely satisfactory diagnostics.
The convergence problem remains an active area of study.

22 Remaining problems
Large-dimensional models: combining sampling algorithms with either gradient optimization or exact methods.
Massive data sets: a few solutions based on importance sampling have been proposed.
Many and varied applications... -> but there is still great room for innovation in this area.

23 Conclusion
MCMC
Markov Chain Monte Carlo methods cover a variety of different fields and applications. There are great opportunities for combining existing sub-optimal algorithms with MCMC in many machine learning problems. Some areas already benefiting from sampling methods include:
- Tracking, restoration, segmentation
- Probabilistic graphical models
- Classification
- Data association for localization
- Classical mixture models

