
1 Ch 11. Sampling Methods. Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Summarized by I.-H. Lee, Biointelligence Laboratory, Seoul National University. http://bi.snu.ac.kr/

2 Contents
- 11.0 Introduction
- 11.1 Basic Sampling Algorithms
- 11.2 Markov Chain Monte Carlo
- 11.3 Gibbs Sampling
- 11.4 Slice Sampling
- 11.5 The Hybrid Monte Carlo Algorithm
- 11.6 Estimating the Partition Function

3 11.0 Introduction
- The problem: evaluating the expectation of some function f(z) with respect to a probability distribution p(z).
- It can be approximated by drawing L independent samples z^(l) from p and averaging: E[f] ≈ (1/L) Σ_l f(z^(l)).

4 11.1 Basic Sampling Algorithms
- Transformation method
  - Use a uniform random number generator and transform its output: if z ~ Uniform(0, 1) and h is the cumulative distribution function of the desired distribution, then y = h^{-1}(z) has that distribution.
  - Can we always obtain h^{-1}? In practice the inverse CDF is available in closed form only for simple distributions, such as the exponential or the Cauchy (see the sketch below).
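A minimal sketch of the transformation method for the exponential distribution, where the inverse CDF has a closed form; the choice of distribution here is an illustrative assumption, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential distribution with rate lam: CDF h(y) = 1 - exp(-lam * y),
# so h^{-1}(z) = -ln(1 - z) / lam maps Uniform(0, 1) draws to exponential draws.
lam = 2.0
z = rng.uniform(0.0, 1.0, size=100_000)
y = -np.log(1.0 - z) / lam

print(y.mean())   # approx 1 / lam = 0.5
```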

5 Rejection sampling
- Assumptions
  - Sampling directly from the target distribution p(z) is difficult.
  - Evaluating p(z), possibly up to an unknown normalization constant, is easy for any given value of z.
- Idea: draw from a simpler proposal distribution q(z), scaled by a constant k so that k q(z) ≥ p(z) everywhere, and accept each draw with probability p(z) / (k q(z)).
- How to choose q? It must be easy to sample from and should hug p as tightly as possible, since the overall acceptance rate is proportional to 1/k (see the sketch below).
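A minimal rejection sampler, assuming an unnormalized Beta(2, 2)-shaped target on [0, 1] with a uniform proposal; this concrete target and envelope constant are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # Unnormalized target, proportional to Beta(2, 2) on [0, 1]; maximum 0.25.
    return z * (1.0 - z)

def rejection_sample(n, k=0.25):
    # With q = Uniform(0, 1), i.e. q(z) = 1, k = 0.25 guarantees k*q(z) >= p_tilde(z).
    samples = []
    while len(samples) < n:
        z = rng.uniform(0.0, 1.0)     # draw from the proposal q
        u = rng.uniform(0.0, k)       # uniform height under the envelope k*q(z)
        if u <= p_tilde(z):           # accept with probability p_tilde / (k*q)
            samples.append(z)
    return np.array(samples)

print(rejection_sample(100_000).mean())   # approx 0.5 by symmetry
```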

6 Adaptive rejection sampling
- Construct the envelope q on the fly from evaluated values of p (for log-concave p, from tangent lines to ln p at a set of grid points).
- If a sample is rejected, it is added to the set of grid points and q is refined.
- The acceptance rate decreases exponentially with dimensionality, which limits all rejection-based methods to low-dimensional problems (see the illustration below).
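A quick illustration of the dimensionality problem, following PRML's Gaussian example: if target and proposal are both zero-mean isotropic Gaussians with standard deviations σ_p < σ_q, the tightest envelope constant is k = (σ_q/σ_p)^D, so the acceptance rate 1/k decays exponentially in the dimension D.

```python
# Acceptance rate of rejection sampling when both target and proposal are
# zero-mean isotropic Gaussians and sigma_q is only 1% larger than sigma_p.
sigma_p, sigma_q = 1.0, 1.01
for D in (1, 10, 100, 1000):
    print(D, (sigma_p / sigma_q) ** D)   # at D = 1000, roughly 1/20000
```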

7 Importance sampling
- Approximates the expectation E[f] directly, without generating samples distributed according to p itself.
- Motivation
  - The expectation could be approximated by discretizing z space into a uniform grid and summing, but the number of terms grows exponentially with dimensionality.
  - Furthermore, only small regions of z space typically carry significant probability mass, so most grid points would contribute almost nothing.

8 Importance sampling (continued)
- Draw samples from an approximating distribution q and weight each one by p: E[f] ≈ (1/L) Σ_l r_l f(z^(l)), with importance weights r_l = p(z^(l)) / q(z^(l)), or normalized weights when p is known only up to a constant (see the sketch below).
- The accuracy depends strongly on the choice of q.
- A badly chosen q can produce results that are severely in error with no diagnostic indication.
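A self-normalized importance sampling sketch, assuming an unnormalized Gaussian target and a wider Gaussian proposal; the specific densities are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    return np.exp(-0.5 * (z - 1.0) ** 2)     # proportional to N(1, 1)

def f(z):
    return z ** 2

L = 100_000
z = rng.normal(0.0, 2.0, size=L)             # samples from q = N(0, 2^2)
q = np.exp(-0.5 * (z / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))
w = p_tilde(z) / q                            # unnormalized importance weights
w /= w.sum()                                  # normalize: p_tilde lacks its constant

print(np.sum(w * f(z)))                       # approx E[z^2] = 1 + 1^2 = 2
```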

9 Sampling-importance-resampling (SIR)
- Motivation: it is difficult to determine a suitable constant k for rejection sampling.
1. Draw L samples from q.
2. Assign each sample a normalized weight, as in importance sampling.
3. Resample L values from this set, with probabilities given by the weights.
- The final samples approximate draws from p, exactly so in the limit of large sample size L.
- Accuracy again depends on the choice of q; moments of p can also be evaluated directly from the weighted samples at step 2 (see the sketch below).
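A minimal SIR sketch, reusing the illustrative Gaussian target and proposal assumed above:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    return np.exp(-0.5 * (z - 1.0) ** 2)      # proportional to N(1, 1)

L = 50_000
z = rng.normal(0.0, 2.0, size=L)              # 1. sample from q = N(0, 2^2)
q = np.exp(-0.5 * (z / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))
w = p_tilde(z) / q
w /= w.sum()                                   # 2. normalized importance weights

resampled = rng.choice(z, size=L, replace=True, p=w)   # 3. resample by weight
print(resampled.mean(), resampled.var())       # approx 1 and 1

# Moments can already be read off at step 2: E[z] is approx sum(w * z).
print(np.sum(w * z))
```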

10 Sampling and the EM algorithm
- Sampling can be used to approximate the E-step of the EM algorithm when it cannot be performed analytically: the Monte Carlo EM algorithm (the standard approximation is written out below).
- IP algorithm (data augmentation)
1. I-step (imputation, analogous to the E-step): draw samples of the latent variables from their current posterior distribution.
2. P-step (posterior, analogous to the M-step): compute a revised estimate of the parameter posterior using the samples from the I-step.
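For concreteness, the Monte Carlo E-step replaces the exact expected complete-data log likelihood with a sample average (the standard form, following PRML Section 11.1):

$$Q(\theta, \theta^{\text{old}}) = \int p(Z \mid X, \theta^{\text{old}}) \ln p(Z, X \mid \theta)\, dZ \;\approx\; \frac{1}{L} \sum_{l=1}^{L} \ln p(Z^{(l)}, X \mid \theta),$$

where the Z^(l) are samples drawn from the current posterior p(Z | X, θ^old).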

11 11.2 Markov Chain Monte Carlo
- Allows sampling from a large class of distributions.
- Scales well with the dimensionality of the sample space.
- Basic Metropolis algorithm
  - Maintain a record of the current state z^(t).
  - A candidate state z* is sampled from a proposal distribution q(z|z^(t)), which must be symmetric.
  - The candidate is accepted with probability A(z*, z^(t)) = min(1, p̃(z*) / p̃(z^(t))), where p̃ is the target evaluated up to normalization.
  - If rejected, the current state is added to the record again and becomes the next state.
  - The distribution of z^(t) tends to p(z) as t → ∞.
  - Successive states are autocorrelated, so only every Mth sample is retained; for large M, the retained samples will be approximately independent (a sampler sketch follows below).
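A minimal Metropolis sampler, assuming a 1-D unnormalized Gaussian target and a symmetric Gaussian proposal; both choices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # Unnormalized target: proportional to a standard Gaussian.
    return np.exp(-0.5 * z ** 2)

def metropolis(n_steps, step_size=0.5):
    z, chain = 0.0, []
    for _ in range(n_steps):
        z_star = z + rng.normal(0.0, step_size)        # symmetric proposal
        if rng.uniform() < min(1.0, p_tilde(z_star) / p_tilde(z)):
            z = z_star                                  # accept the candidate
        chain.append(z)                                 # rejection repeats the state
    return np.array(chain)

chain = metropolis(20_000)
print(chain.mean(), chain.var())   # approx 0 and 1
```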

12 Random walk behavior
- After t steps, the average distance covered by a random walk is proportional to the square root of t (see the quick check below).
- Random walks are therefore very inefficient at exploring the state space.
- Designing MCMC methods that avoid random walk behavior is central to making them efficient.
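A quick empirical check of the square-root law; the simulation setup is an illustrative assumption, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# RMS displacement of a unit-step random walk after t steps grows like sqrt(t).
walks = rng.choice([-1, 1], size=(10_000, 400)).cumsum(axis=1)
for t in (100, 200, 400):
    rms = np.sqrt((walks[:, t - 1] ** 2).mean())
    print(t, round(rms, 1), round(np.sqrt(t), 1))   # the two columns agree
```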

13 Markov chains
- Homogeneous: the transition probabilities are the same at every step.
- Invariant distribution: a distribution that is left unchanged by each step of the chain.
- Detailed balance: a sufficient condition for a distribution to be invariant (spelled out below).
- Ergodicity: the chain converges to the invariant distribution regardless of the initial state.
- Equilibrium distribution: the invariant distribution of an ergodic chain.
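Spelled out, detailed balance for transition probabilities T(z, z') reads

$$p^\star(z)\, T(z, z') = p^\star(z')\, T(z', z),$$

and summing both sides over z shows that p* is invariant:

$$\sum_{z} p^\star(z)\, T(z, z') = p^\star(z') \sum_{z} T(z', z) = p^\star(z').$$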

14 Metropolis-Hastings algorithm
- Generalization of the Metropolis algorithm: the proposal q may be non-symmetric.
- Acceptance probability: A_k(z*, z) = min(1, [p̃(z*) q_k(z|z*)] / [p̃(z) q_k(z*|z)]).
- p(z) is an invariant distribution of the Markov chain defined by the Metropolis-Hastings algorithm.
- The common choice for q is a Gaussian centered on the current state, giving a trade-off between step size and convergence time: small steps mean slow random-walk exploration, large steps mean high rejection rates (the acceptance test is sketched below).
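The acceptance test in code form; p_tilde and q_pdf are placeholder names, with q_pdf(a, b) standing for the density of proposing a from state b:

```python
import numpy as np

def mh_accept(rng, z, z_star, p_tilde, q_pdf):
    # Metropolis-Hastings acceptance with the Hastings correction for an
    # asymmetric proposal; reduces to the Metropolis rule when q is symmetric.
    a = (p_tilde(z_star) * q_pdf(z, z_star)) / (p_tilde(z) * q_pdf(z_star, z))
    return rng.uniform() < min(1.0, a)
```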

15 11.3 Gibbs Sampling
- Simple and widely applicable; a special case of the Metropolis-Hastings algorithm.
- Each step replaces the value of one of the variables by a value drawn from the distribution of that variable conditioned on the values of the remaining variables.
- The procedure
1. Initialize {z_i : i = 1, ..., M}.
2. For τ = 1, ..., T: sample each variable in turn from its conditional, z_i^(τ+1) ~ p(z_i | z_1^(τ+1), ..., z_{i-1}^(τ+1), z_{i+1}^(τ), ..., z_M^(τ)) (a two-variable sketch follows below).
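A minimal Gibbs sampler, assuming a bivariate Gaussian target with zero means, unit variances, and correlation ρ, so that each conditional is a 1-D Gaussian; the target is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

rho = 0.8   # correlation of the assumed bivariate Gaussian target

def gibbs(n_steps):
    z1, z2 = 0.0, 0.0
    chain = np.empty((n_steps, 2))
    for t in range(n_steps):
        # z1 | z2 ~ N(rho * z2, 1 - rho^2), and symmetrically for z2 | z1
        z1 = rng.normal(rho * z2, np.sqrt(1.0 - rho ** 2))
        z2 = rng.normal(rho * z1, np.sqrt(1.0 - rho ** 2))
        chain[t] = (z1, z2)
    return chain

chain = gibbs(20_000)
print(np.corrcoef(chain.T)[0, 1])   # approx rho = 0.8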

16 1. p is invariant under each individual Gibbs sampling step, and hence under the whole Markov chain.
  - At each step the marginal distribution p(z_\i) is invariant, because z_\i is left unchanged.
  - Each step samples correctly from the conditional distribution p(z_i | z_\i).
2. The resulting Markov chain is ergodic.
  - A sufficient condition is that none of the conditional distributions be zero anywhere.
  - Together, invariance and ergodicity imply that Gibbs sampling samples correctly from p.
- Gibbs sampling as an instance of the Metropolis-Hastings algorithm
  - Consider a step involving z_k in which the remaining variables z_\k stay fixed.
  - The transition probability is q_k(z*|z) = p(z*_k | z_\k), for which the acceptance probability is exactly 1 (derivation below).
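The acceptance probability indeed reduces to 1: using z*_\k = z_\k,

$$A(z^\star, z) = \frac{p(z^\star)\, q_k(z \mid z^\star)}{p(z)\, q_k(z^\star \mid z)} = \frac{p(z_k^\star \mid z_{\setminus k})\, p(z_{\setminus k})\, p(z_k \mid z_{\setminus k})}{p(z_k \mid z_{\setminus k})\, p(z_{\setminus k})\, p(z_k^\star \mid z_{\setminus k})} = 1,$$

so every Gibbs proposal is accepted.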

17 Random walk behavior
- The number of steps needed to obtain roughly independent samples is of order (L/l)^2, where L is the largest and l the smallest length scale of the distribution.
- Over-relaxation can reduce this random walk behavior when the conditional distributions are Gaussian (one standard form is given below).
- The practical applicability of Gibbs sampling depends on the ease of sampling from the conditional distributions.
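For a Gaussian conditional with mean μ_i and variance σ_i², one standard over-relaxation update (following PRML Section 11.3) replaces z_i by

$$z_i' = \mu_i + \alpha\,(z_i - \mu_i) + \sigma_i\,(1 - \alpha^2)^{1/2}\, \nu, \qquad \nu \sim \mathcal{N}(0, 1),\quad -1 < \alpha < 1,$$

which leaves the conditional distribution invariant; α = 0 recovers ordinary Gibbs sampling, while negative α biases successive steps in a consistent direction and suppresses random walk behavior.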

18 11.4 Slice Sampling
- Augments z with an auxiliary height variable u and samples uniformly from the region under the (unnormalized) density, so the effective step size is adaptive: it adjusts automatically to match the characteristics of the distribution (a 1-D sketch follows below).
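A 1-D slice sampling sketch using the stepping-out and shrinkage scheme (one common variant, due to Neal); the unnormalized Gaussian target is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # Unnormalized target density (illustrative): a standard Gaussian.
    return np.exp(-0.5 * z ** 2)

def slice_step(z, w=1.0):
    u = rng.uniform(0.0, p_tilde(z))        # auxiliary height under the curve
    # Step out: randomly place a bracket of width w, then grow it past the slice.
    zmin = z - w * rng.uniform()
    zmax = zmin + w
    while p_tilde(zmin) > u:
        zmin -= w
    while p_tilde(zmax) > u:
        zmax += w
    # Shrink: sample uniformly within the bracket, shrinking it on rejection.
    while True:
        z_new = rng.uniform(zmin, zmax)
        if p_tilde(z_new) > u:
            return z_new
        if z_new < z:
            zmin = z_new
        else:
            zmax = z_new

z, chain = 0.0, []
for _ in range(10_000):
    z = slice_step(z)
    chain.append(z)
print(np.mean(chain), np.var(chain))        # approx 0 and 1
```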

19 11.5 The Hybrid Monte Carlo Algorithm
- Hamiltonian dynamics
  - Defines a joint distribution over phase space (z, r) whose total energy is the Hamiltonian H(z, r).
  - The dynamics leaves H invariant, so r is periodically replaced by a draw from its conditional distribution given z, allowing the chain to move between energy levels.
- Hamiltonian dynamics + Metropolis algorithm
  - Update the momentum via such resampling (itself a Markov chain update).
  - Simulate the Hamiltonian dynamics approximately with the leapfrog integrator.
  - Accept the new state with probability min(1, exp{H(z, r) − H(z*, r*)}) (a sketch follows below).
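A minimal hybrid Monte Carlo sketch for a 1-D standard Gaussian, with E(z) = z²/2, K(r) = r²/2, and H = E + K; the target, step size, and trajectory length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def E(z):
    return 0.5 * z ** 2          # potential energy (unnormalized -log target)

def grad_E(z):
    return z

def leapfrog(z, r, eps, n_leap):
    # Leapfrog integration: half momentum step, alternating full steps, half step.
    r -= 0.5 * eps * grad_E(z)
    for _ in range(n_leap - 1):
        z += eps * r
        r -= eps * grad_E(z)
    z += eps * r
    r -= 0.5 * eps * grad_E(z)
    return z, r

def hmc(n_samples, eps=0.1, n_leap=20):
    z, chain = 0.0, []
    for _ in range(n_samples):
        r = rng.normal()                         # resample momentum: r | z ~ N(0, 1)
        z_new, r_new = leapfrog(z, r, eps, n_leap)
        h_old = E(z) + 0.5 * r ** 2
        h_new = E(z_new) + 0.5 * r_new ** 2
        if rng.uniform() < min(1.0, np.exp(h_old - h_new)):   # Metropolis accept
            z = z_new
        chain.append(z)
    return np.array(chain)

chain = hmc(5_000)
print(chain.mean(), chain.var())                 # approx 0 and 1
```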

20 11.6 Estimating the Partition Function
- The partition function is the normalization constant of the density; most sampling algorithms avoid it, but it is sometimes needed explicitly.
- Estimating the ratio of two partition functions
  - Needed for model comparison and model averaging.
  - Can be done by importance sampling from a distribution with energy function G (one standard estimator is given below).
- Finding the absolute value of the partition function of a complex distribution: chaining, i.e., linking it to a tractable distribution through a sequence of intermediate distributions and multiplying the successive ratios.
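Concretely, for energy-based densities p_E(z) = exp{−E(z)}/Z_E and p_G(z) = exp{−G(z)}/Z_G, the importance sampling estimator of the ratio (following PRML Section 11.6) is

$$\frac{Z_E}{Z_G} = \mathbb{E}_{p_G}\!\left[\exp\{-E(z) + G(z)\}\right] \approx \frac{1}{L} \sum_{l=1}^{L} \exp\{-E(z^{(l)}) + G(z^{(l)})\},$$

with the z^(l) drawn from p_G; chaining then writes Z_M / Z_1 = (Z_2/Z_1)(Z_3/Z_2) ⋯ (Z_M/Z_{M-1}), with each factor estimated this way.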

