An Introduction to MCMC for Machine Learning (Markov Chain Monte Carlo)
Young Ki Baik, Computer Vision Lab., SNU

References
- C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, "An Introduction to MCMC for Machine Learning," Machine Learning, 2003.
- D. MacKay, "Introduction to Monte Carlo Methods."
- S.-C. Zhu, F. Dellaert, and Z. Tu, "Markov Chain Monte Carlo for Computer Vision" (tutorial at ICCV 2005).
- Various MCMC slides available on the web.

Contents
- MCMC
- Metropolis-Hastings algorithm
- Mixtures and cycles of MCMC kernels
- Auxiliary variable samplers
- Adaptive MCMC
- Other applications of MCMC
- Convergence problems and tricks of MCMC
- Remaining problems
- Conclusion

MCMC The problem with plain Monte Carlo (MC)
Assembling the entire distribution for MC is usually hard:
- Complicated energy landscapes
- High-dimensional systems
- Extraordinarily difficult normalization
Solution: MCMC
- Build up the distribution from a Markov chain.
- Choose local transition probabilities that generate the distribution of interest (ensure detailed balance).
- Each random variable is chosen based on the previous variable in the chain.
- "Walk" along the Markov chain until convergence is reached.
Result: normalization is not required, and all calculations are local.

MCMC What is a Markov chain?
A Markov chain is a mathematical model for a stochastic system that generates random variables X1, X2, ..., Xt, where the distribution of the next random variable depends only on the current one:
p(X_{t+1} | X_t, ..., X_1) = p(X_{t+1} | X_t).
Under suitable conditions, the chain converges to a stationary probability distribution.

MCMC What is Markov chain Monte Carlo?
MCMC is a general-purpose technique for generating fair samples from a probability distribution in a high-dimensional space, driven by random numbers (dice) drawn from a uniform distribution over a certain range: the Markov chain moves between states using independent trials of the dice.

MCMC MCMC as a general-purpose computing technique
- Task 1: Simulation: draw fair (typical) samples from the probability distribution that governs a system.
- Task 2: Integration/computation in very high dimensions, i.e. compute E_p[f(x)] = ∫ f(x) p(x) dx ≈ (1/N) Σ_i f(x^(i)).
- Task 3: Optimization with an annealing scheme.
- Task 4: Learning: unsupervised learning with hidden variables (simulated from the posterior), or MLE learning of parameters p(x; Θ), also needs simulation.
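Task 2 can be sketched in a few lines: given samples from p, the integral is approximated by the sample average of f. A minimal illustration, assuming p is a standard normal and f(x) = x^2, so the true value of the integral is 1:

```python
import numpy as np

# Monte Carlo estimate of E_p[f(x)] = ∫ f(x) p(x) dx using samples from p.
# Here p is a standard normal and f(x) = x**2, so the exact answer is 1.
rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)   # x^(i) ~ p(x)
estimate = np.mean(samples ** 2)         # (1/N) Σ_i f(x^(i))
```

MCMC enters when direct samples from p are unavailable and the x^(i) must come from a Markov chain instead.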

MCMC Some notation
The stochastic process (x_i) is called a Markov chain if
p(x_i | x_{i-1}, ..., x_1) = T(x_i | x_{i-1}).
The chain is homogeneous if the transition kernel T remains invariant for all i, with Σ_{x_i} T(x_i | x_{i-1}) = 1 for any i. The chain thus depends solely on its current state and a fixed transition matrix.

MCMC Example
Transition graph and transition matrix T for a Markov chain with three states (s = 3); both appeared as figures in the original slides. Whatever the initial state distribution, iterating the chain converges to the same stationary distribution. This stability result plays a fundamental role in MCMC.
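The stability result can be checked numerically. The matrix below is a hypothetical 3-state stochastic matrix chosen for illustration (the matrix from the original slide is not reproduced here); simulating the chain and comparing the empirical state frequencies against the fixed point of p = pT shows both agree:

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1).
T = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.60, 0.20],
              [0.3, 0.30, 0.40]])

rng = np.random.default_rng(1)
state, counts = 0, np.zeros(3)
for _ in range(200_000):
    state = rng.choice(3, p=T[state])   # one step of the chain
    counts[state] += 1
empirical = counts / counts.sum()       # empirical state frequencies

# The stationary distribution is the fixed point of p = p T (power iteration).
p = np.ones(3) / 3
for _ in range(100):
    p = p @ T
```

Starting the simulation from state 1 or 2 instead of 0 leaves `empirical` essentially unchanged, which is exactly the stability property the slide refers to.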

MCMC Convergence properties
For any starting point, the chain converges to the invariant distribution p(x), as long as T (a stochastic transition matrix in the discrete case; a kernel or proposal distribution in the continuous case) obeys the following properties:
1) Irreducibility: every state must be (eventually) reachable from every other state.
2) Aperiodicity: this stops the chain from oscillating between different states.
3) Reversibility (detailed balance): p(x) T(x' | x) = p(x') T(x | x'); this ensures the system remains in its stationary distribution.

MCMC Eigen-analysis
From spectral theory, p(x) is the left eigenvector of the matrix T with eigenvalue 1 (the largest eigenvalue of a stochastic matrix is always λ1 = 1, and its left eigenvector is the stationary distribution). The second-largest eigenvalue determines the rate of convergence of the chain, and should be as small as possible.
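This eigen-analysis can be carried out directly with numpy, here on the same hypothetical 3-state matrix used above (an illustrative choice, not the matrix from the original slide):

```python
import numpy as np

# Hypothetical stochastic matrix for illustration.
T = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.60, 0.20],
              [0.3, 0.30, 0.40]])

# Left eigenvectors of T are right eigenvectors of T transposed.
vals, vecs = np.linalg.eig(T.T)
i = np.argmin(np.abs(vals - 1.0))       # eigenvalue closest to 1
stationary = np.real(vecs[:, i])
stationary /= stationary.sum()          # normalize to a probability vector

# Modulus of the second-largest eigenvalue controls the convergence rate.
second = sorted(np.abs(vals))[-2]
```

For an irreducible, aperiodic chain `second` is strictly below 1; the smaller it is, the faster the chain forgets its starting point.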

Metropolis-Hastings algorithm The MH algorithm
The most popular MCMC method. Given the invariant (target) distribution p(x), a proposal distribution q(x* | x) generates a candidate value x*, which is accepted with probability A(x, x*); together these define the transition kernel K.
1. Initialize x^(0).
2. For i = 0 to N-1:
   - Sample u ~ U(0, 1) and a candidate x* ~ q(x* | x^(i)).
   - If u < A(x^(i), x*) = min{1, [p(x*) q(x^(i) | x*)] / [p(x^(i)) q(x* | x^(i))]}, set x^(i+1) = x*;
     else set x^(i+1) = x^(i).

Metropolis-Hastings algorithm Results of running the MH algorithm
Target distribution p(x) and proposal distribution q(x* | x) (both given as figures in the original slides), with histograms of the resulting chains.
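A runnable sketch of such an experiment, using an illustrative bimodal target of the kind shown in Andrieu et al. (a mixture of two Gaussian-shaped bumps, modes near 0 and 10) and a Gaussian random-walk proposal; the specific constants below are assumptions, not the slide's exact figures:

```python
import numpy as np

# Unnormalized bimodal target: modes near x = 0 and x = 10.
def p_unnorm(x):
    return 0.3 * np.exp(-0.2 * x**2) + 0.7 * np.exp(-0.2 * (x - 10)**2)

def metropolis_hastings(n_samples, sigma=10.0, seed=0):
    rng = np.random.default_rng(seed)
    x, chain = 0.0, []
    for _ in range(n_samples):
        x_star = rng.normal(x, sigma)                  # propose x* ~ q(x*|x)
        a = min(1.0, p_unnorm(x_star) / p_unnorm(x))   # q is symmetric, so q terms cancel
        if rng.uniform() < a:                          # accept with probability a
            x = x_star
        chain.append(x)
    return np.array(chain)

chain = metropolis_hastings(20_000)
```

Note that the normalizing constant of p is never needed: only the ratio p(x*)/p(x) enters the acceptance probability, which is the "normalization not required" point from the earlier slide.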

Metropolis-Hastings algorithm Different choices of the proposal standard deviation
MH requires careful design of the proposal distribution:
- If the proposal standard deviation is too narrow, only one mode of p(x) might be visited.
- If it is too wide, the rejection rate can be high.
- If all the modes are visited while the acceptance probability stays high, the chain is said to "mix" well.

Mixture and cycles of MCMC kernels Mixtures and cycles
It is possible to combine several samplers into mixtures and cycles of the individual samplers: if the transition kernels K1 and K2 both have invariant distribution p(x), then the cycle hybrid kernel K1 K2 and the mixture hybrid kernel ν K1 + (1 − ν) K2, for 0 ≤ ν ≤ 1, are also transition kernels with invariant distribution p(x).

Mixture and cycles of MCMC kernels Mixtures of kernels
Incorporate global proposals to explore vast regions of the state space and local proposals to discover the finer details of the target distribution
-> suited to target distributions with many narrow peaks (e.g. the reversible jump MCMC algorithm).

Mixture and cycles of MCMC kernels Cycles of kernels
Split a multivariate state vector into components (blocks)
-> each component can be updated separately
-> blocking highly correlated variables (e.g. the Gibbs sampling algorithm)
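Gibbs sampling is exactly such a cycle: each kernel in the cycle resamples one block from its full conditional. A minimal sketch on a bivariate normal with correlation rho, whose full conditionals are the well-known univariate normals used below (the target and rho = 0.8 are illustrative assumptions):

```python
import numpy as np

# Gibbs sampler for a bivariate normal with unit variances and correlation rho.
# Full conditionals: x | y ~ N(rho*y, 1-rho^2) and y | x ~ N(rho*x, 1-rho^2).
rho = 0.8
rng = np.random.default_rng(0)
x, y = 0.0, 0.0
xs, ys = [], []
for _ in range(50_000):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))  # kernel 1: update x | y
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))  # kernel 2: update y | x
    xs.append(x)
    ys.append(y)
xs, ys = np.array(xs), np.array(ys)
```

Because x and y are highly correlated here, the chain moves in small conditional steps; blocking them into a single joint update would mix faster, which is the motivation the slide gives for blocking correlated variables.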

Auxiliary variable samplers Auxiliary variables
It is often easier to sample from an augmented distribution p(x, u), where u is an auxiliary variable: marginal samples of x are obtained by sampling pairs (x, u) and ignoring the sampled u. Examples:
- Hybrid Monte Carlo (HMC): uses gradient information about the target.
- Slice sampling.
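Slice sampling makes the auxiliary-variable idea concrete: the height u under the density is the auxiliary variable, sampled uniformly on (0, p(x)), after which x is sampled uniformly from the "slice" {x : p(x) > u}. A minimal sketch of Neal's stepping-out/shrinkage procedure, assuming a standard-normal target and an illustrative initial slice width w:

```python
import numpy as np

def p(x):
    # Unnormalized standard normal density.
    return np.exp(-0.5 * x**2)

def slice_sample(n, w=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, out = 0.0, []
    for _ in range(n):
        u = rng.uniform(0, p(x))          # auxiliary height u | x
        left = x - w * rng.uniform()      # randomly place an interval of width w
        right = left + w
        while p(left) > u:                # step out until both ends leave the slice
            left -= w
        while p(right) > u:
            right += w
        while True:                       # shrink until a draw lands inside the slice
            x_new = rng.uniform(left, right)
            if p(x_new) > u:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        out.append(x)                     # keep x, discard u (marginal sample)
    return np.array(out)

samples = slice_sample(20_000)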

Adaptive MCMC Adaptive selection of the proposal distribution
The variance of the proposal distribution matters, so one would like to automate the process of choosing the proposal distribution as much as possible.
Problem: adaptation can disturb the stationary distribution. Gelfand and Sahu (1994) show that the stationary distribution is disturbed despite the fact that each participating kernel has the same stationary distribution.
Ways to avoid this:
- Carry out adaptation only for an initial, fixed number of steps.
- Run parallel chains.
- And so on...
-> These remedies are inefficient; much more research is required.

Other applications of MCMC
- Simulated annealing for global optimization: to find the global maximum of p(x).
- Monte Carlo EM: to find a fast approximation for the E-step.
- Sequential Monte Carlo methods and particle filters: to carry out on-line approximation of probability distributions using samples -> using parallel sampling.
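Simulated annealing is MH applied to p(x)^(1/T) while the temperature T is cooled toward zero, so the chain concentrates on the global maximum. A sketch on the same illustrative bimodal target used earlier (global mode near x = 10; the cooling schedule and proposal width are assumptions):

```python
import numpy as np

# Illustrative bimodal target; the taller mode (global maximum) is near x = 10.
def p_unnorm(x):
    return 0.3 * np.exp(-0.2 * x**2) + 0.7 * np.exp(-0.2 * (x - 10)**2)

rng = np.random.default_rng(0)
x, best = 0.0, 0.0
for i in range(5_000):
    T = max(0.01, 0.999 ** i)                  # geometric cooling schedule
    x_star = rng.normal(x, 3.0)                # random-walk proposal
    a = min(1.0, (p_unnorm(x_star) / p_unnorm(x)) ** (1.0 / T))
    if rng.uniform() < a:
        x = x_star
    if p_unnorm(x) > p_unnorm(best):           # track the best point seen
        best = x
```

At high T the chain crosses freely between modes; as T shrinks, uphill moves dominate and the chain settles near the global mode rather than the local one at x = 0.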

Convergence problems and tricks of MCMC
Convergence problem: determining the length of the Markov chain is a difficult task.
Tricks:
- For the initialization problem (starting biases): discard an initial set of samples (burn-in), or set the initial sample value manually.
- Markov chain tests: apply several graphical and statistical tests to assess whether the chain has stabilized -> these do not yet provide entirely satisfactory diagnostics.
The convergence problem remains an active area of study.
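One common statistical test of the kind the slide alludes to: run several chains from widely dispersed starting points, discard a burn-in prefix, and check that the post-burn-in means agree (the idea underlying the Gelman-Rubin diagnostic). A sketch, assuming a standard-normal target sampled with random-walk MH:

```python
import numpy as np

def mh_chain(x0, n, seed):
    # Random-walk MH targeting a standard normal, started at x0.
    rng = np.random.default_rng(seed)
    x, out = x0, []
    for _ in range(n):
        x_star = rng.normal(x, 1.0)
        # Acceptance ratio for the standard-normal target (symmetric proposal).
        if rng.uniform() < min(1.0, np.exp(0.5 * (x**2 - x_star**2))):
            x = x_star
        out.append(x)
    return np.array(out)

# Three chains from over-dispersed starting points.
chains = [mh_chain(x0, 20_000, seed) for seed, x0 in enumerate([-10.0, 0.0, 10.0])]
means = [c[5_000:].mean() for c in chains]   # discard the first 25% as burn-in
spread = max(means) - min(means)             # small spread suggests convergence
```

A small `spread` is consistent with convergence but does not prove it, which is why the slide calls such diagnostics not entirely satisfactory.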

Remaining problems
- Large-dimensional models: combining sampling algorithms with either gradient optimization or exact methods.
- Massive data sets: a few solutions based on importance sampling have been proposed.
- Many and varied applications... -> but there is still great room for innovation in this area.

Conclusion
Markov chain Monte Carlo methods cover a variety of different fields and applications, and there are great opportunities for combining existing sub-optimal algorithms with MCMC in many machine learning problems. Areas already benefiting from sampling methods include:
- Tracking, restoration, segmentation
- Probabilistic graphical models
- Classification
- Data association for localization
- Classical mixture models