
1 Ch 11. Sampling Methods. Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Summarized by I.-H. Lee, Biointelligence Laboratory, Seoul National University. http://bi.snu.ac.kr/

2 Contents
- 11.0 Introduction
- 11.1 Basic Sampling Algorithms
- 11.2 Markov Chain Monte Carlo
- 11.3 Gibbs Sampling
- 11.4 Slice Sampling
- 11.5 The Hybrid Monte Carlo Algorithm
- 11.6 Estimating the Partition Function

3 11.0 Introduction
- The problem: evaluating the expectation of some function f(z) with respect to a probability distribution p(z).
- It can be approximated by drawing L independent samples z^(l) from p and averaging: E[f] ≈ (1/L) Σ_l f(z^(l)).

4 11.1 Basic Sampling Algorithms
- Transformation method
  - Use a uniform random number generator and transform its output: if z ~ Uniform(0, 1) and h is the cumulative distribution function of the desired distribution, then y = h^{-1}(z) has that distribution.
  - Can we always obtain h^{-1}? In practice the inverse CDF is available in closed form only for simple distributions, such as the exponential or the Cauchy (see the sketch below).
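A minimal sketch of the transformation method for the exponential distribution, where the inverse CDF has a closed form; the choice of distribution here is an illustrative assumption, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential distribution with rate lam: CDF h(y) = 1 - exp(-lam * y),
# so h^{-1}(z) = -ln(1 - z) / lam maps Uniform(0, 1) draws to exponential draws.
lam = 2.0
z = rng.uniform(0.0, 1.0, size=100_000)
y = -np.log(1.0 - z) / lam

print(y.mean())   # approx 1 / lam = 0.5
```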

5 Rejection sampling
- Assumptions
  - Sampling directly from the target distribution p(z) is difficult.
  - Evaluating p(z), possibly up to an unknown normalization constant, is easy for any given value of z.
- Idea: draw from a simpler proposal distribution q(z), scaled by a constant k so that k q(z) ≥ p(z) everywhere, and accept each draw with probability p(z) / (k q(z)).
- How to choose q? It must be easy to sample from and should hug p as tightly as possible, since the overall acceptance rate is proportional to 1/k (see the sketch below).
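A minimal rejection sampler, assuming an unnormalized Beta(2, 2)-shaped target on [0, 1] with a uniform proposal; this concrete target and envelope constant are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # Unnormalized target, proportional to Beta(2, 2) on [0, 1]; maximum 0.25.
    return z * (1.0 - z)

def rejection_sample(n, k=0.25):
    # With q = Uniform(0, 1), i.e. q(z) = 1, k = 0.25 guarantees k*q(z) >= p_tilde(z).
    samples = []
    while len(samples) < n:
        z = rng.uniform(0.0, 1.0)     # draw from the proposal q
        u = rng.uniform(0.0, k)       # uniform height under the envelope k*q(z)
        if u <= p_tilde(z):           # accept with probability p_tilde / (k*q)
            samples.append(z)
    return np.array(samples)

print(rejection_sample(100_000).mean())   # approx 0.5 by symmetry
```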

6 Adaptive rejection sampling
- Construct the envelope q on the fly from evaluated values of p (for log-concave p, from tangent lines to ln p at a set of grid points).
- If a sample is rejected, it is added to the set of grid points and q is refined.
- The acceptance rate decreases exponentially with dimensionality, which limits all rejection-based methods to low-dimensional problems (see the illustration below).
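A quick illustration of the dimensionality problem, following PRML's Gaussian example: if target and proposal are both zero-mean isotropic Gaussians with standard deviations σ_p < σ_q, the tightest envelope constant is k = (σ_q/σ_p)^D, so the acceptance rate 1/k decays exponentially in the dimension D.

```python
# Acceptance rate of rejection sampling when both target and proposal are
# zero-mean isotropic Gaussians and sigma_q is only 1% larger than sigma_p.
sigma_p, sigma_q = 1.0, 1.01
for D in (1, 10, 100, 1000):
    print(D, (sigma_p / sigma_q) ** D)   # at D = 1000, roughly 1/20000
```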

7 Importance sampling
- Approximates the expectation E[f] directly, without generating samples distributed according to p itself.
- Motivation
  - The expectation could be approximated by discretizing z space into a uniform grid and summing, but the number of terms grows exponentially with dimensionality.
  - Furthermore, only small regions of z space typically carry significant probability mass, so most grid points would contribute almost nothing.

8 Importance sampling (continued)
- Draw samples from an approximating distribution q and weight each one by p: E[f] ≈ (1/L) Σ_l r_l f(z^(l)), with importance weights r_l = p(z^(l)) / q(z^(l)), or normalized weights when p is known only up to a constant (see the sketch below).
- The accuracy depends strongly on the choice of q.
- A badly chosen q can produce results that are severely in error with no diagnostic indication.
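A self-normalized importance sampling sketch, assuming an unnormalized Gaussian target and a wider Gaussian proposal; the specific densities are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    return np.exp(-0.5 * (z - 1.0) ** 2)     # proportional to N(1, 1)

def f(z):
    return z ** 2

L = 100_000
z = rng.normal(0.0, 2.0, size=L)             # samples from q = N(0, 2^2)
q = np.exp(-0.5 * (z / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))
w = p_tilde(z) / q                            # unnormalized importance weights
w /= w.sum()                                  # normalize: p_tilde lacks its constant

print(np.sum(w * f(z)))                       # approx E[z^2] = 1 + 1^2 = 2
```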

9 Sampling-importance-resampling (SIR)
- Motivation: it is difficult to determine a suitable constant k for rejection sampling.
1. Draw L samples from q.
2. Assign each sample a normalized weight, as in importance sampling.
3. Resample L values from this set, with probabilities given by the weights.
- The final samples approximate draws from p, exactly so in the limit of large sample size L.
- Accuracy again depends on the choice of q; moments of p can also be evaluated directly from the weighted samples at step 2 (see the sketch below).
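A minimal SIR sketch, reusing the illustrative Gaussian target and proposal assumed above:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    return np.exp(-0.5 * (z - 1.0) ** 2)      # proportional to N(1, 1)

L = 50_000
z = rng.normal(0.0, 2.0, size=L)              # 1. sample from q = N(0, 2^2)
q = np.exp(-0.5 * (z / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))
w = p_tilde(z) / q
w /= w.sum()                                   # 2. normalized importance weights

resampled = rng.choice(z, size=L, replace=True, p=w)   # 3. resample by weight
print(resampled.mean(), resampled.var())       # approx 1 and 1

# Moments can already be read off at step 2: E[z] is approx sum(w * z).
print(np.sum(w * z))
```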

10 Sampling and the EM algorithm
- Sampling can be used to approximate the E-step of the EM algorithm when it cannot be performed analytically: the Monte Carlo EM algorithm (the standard approximation is written out below).
- IP algorithm (data augmentation)
1. I-step (imputation, analogous to the E-step): draw samples of the latent variables from their current posterior distribution.
2. P-step (posterior, analogous to the M-step): compute a revised estimate of the parameter posterior using the samples from the I-step.
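For concreteness, the Monte Carlo E-step replaces the exact expected complete-data log likelihood with a sample average (the standard form, following PRML Section 11.1):

$$Q(\theta, \theta^{\text{old}}) = \int p(Z \mid X, \theta^{\text{old}}) \ln p(Z, X \mid \theta)\, dZ \;\approx\; \frac{1}{L} \sum_{l=1}^{L} \ln p(Z^{(l)}, X \mid \theta),$$

where the Z^(l) are samples drawn from the current posterior p(Z | X, θ^old).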

11 11.2 Markov Chain Monte Carlo
- Allows sampling from a large class of distributions.
- Scales well with the dimensionality of the sample space.
- Basic Metropolis algorithm
  - Maintain a record of the current state z^(t).
  - A candidate state z* is sampled from a proposal distribution q(z|z^(t)), which must be symmetric.
  - The candidate is accepted with probability A(z*, z^(t)) = min(1, p̃(z*) / p̃(z^(t))), where p̃ is the target evaluated up to normalization.
  - If rejected, the current state is added to the record again and becomes the next state.
  - The distribution of z^(t) tends to p(z) as t → ∞.
  - Successive states are autocorrelated, so only every Mth sample is retained; for large M, the retained samples will be approximately independent (a sampler sketch follows below).
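A minimal Metropolis sampler, assuming a 1-D unnormalized Gaussian target and a symmetric Gaussian proposal; both choices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # Unnormalized target: proportional to a standard Gaussian.
    return np.exp(-0.5 * z ** 2)

def metropolis(n_steps, step_size=0.5):
    z, chain = 0.0, []
    for _ in range(n_steps):
        z_star = z + rng.normal(0.0, step_size)        # symmetric proposal
        if rng.uniform() < min(1.0, p_tilde(z_star) / p_tilde(z)):
            z = z_star                                  # accept the candidate
        chain.append(z)                                 # rejection repeats the state
    return np.array(chain)

chain = metropolis(20_000)
print(chain.mean(), chain.var())   # approx 0 and 1
```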

12 Random walk behavior
- After t steps, the average distance covered by a random walk is proportional to the square root of t (see the quick check below).
- Random walks are therefore very inefficient at exploring the state space.
- Designing MCMC methods that avoid random walk behavior is central to making them efficient.
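A quick empirical check of the square-root law; the simulation setup is an illustrative assumption, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# RMS displacement of a unit-step random walk after t steps grows like sqrt(t).
walks = rng.choice([-1, 1], size=(10_000, 400)).cumsum(axis=1)
for t in (100, 200, 400):
    rms = np.sqrt((walks[:, t - 1] ** 2).mean())
    print(t, round(rms, 1), round(np.sqrt(t), 1))   # the two columns agree
```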

13 Markov chains
- Homogeneous: the transition probabilities are the same at every step.
- Invariant distribution: a distribution that is left unchanged by each step of the chain.
- Detailed balance: a sufficient condition for a distribution to be invariant (spelled out below).
- Ergodicity: the chain converges to the invariant distribution regardless of the initial state.
- Equilibrium distribution: the invariant distribution of an ergodic chain.
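Spelled out, detailed balance for transition probabilities T(z, z') reads

$$p^\star(z)\, T(z, z') = p^\star(z')\, T(z', z),$$

and summing both sides over z shows that p* is invariant:

$$\sum_{z} p^\star(z)\, T(z, z') = p^\star(z') \sum_{z} T(z', z) = p^\star(z').$$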

14 Metropolis-Hastings algorithm
- Generalization of the Metropolis algorithm: the proposal q may be non-symmetric.
- Acceptance probability: A_k(z*, z) = min(1, [p̃(z*) q_k(z|z*)] / [p̃(z) q_k(z*|z)]).
- p(z) is an invariant distribution of the Markov chain defined by the Metropolis-Hastings algorithm.
- The common choice for q is a Gaussian centered on the current state, giving a trade-off between step size and convergence time: small steps mean slow random-walk exploration, large steps mean high rejection rates (the acceptance test is sketched below).
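The acceptance test in code form; p_tilde and q_pdf are placeholder names, with q_pdf(a, b) standing for the density of proposing a from state b:

```python
import numpy as np

def mh_accept(rng, z, z_star, p_tilde, q_pdf):
    # Metropolis-Hastings acceptance with the Hastings correction for an
    # asymmetric proposal; reduces to the Metropolis rule when q is symmetric.
    a = (p_tilde(z_star) * q_pdf(z, z_star)) / (p_tilde(z) * q_pdf(z_star, z))
    return rng.uniform() < min(1.0, a)
```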

15 11.3 Gibbs Sampling
- Simple and widely applicable; a special case of the Metropolis-Hastings algorithm.
- Each step replaces the value of one of the variables by a value drawn from the distribution of that variable conditioned on the values of the remaining variables.
- The procedure
1. Initialize {z_i : i = 1, ..., M}.
2. For τ = 1, ..., T: sample each variable in turn from its conditional, z_i^(τ+1) ~ p(z_i | z_1^(τ+1), ..., z_{i-1}^(τ+1), z_{i+1}^(τ), ..., z_M^(τ)) (a two-variable sketch follows below).
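A minimal Gibbs sampler, assuming a bivariate Gaussian target with zero means, unit variances, and correlation ρ, so that each conditional is a 1-D Gaussian; the target is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

rho = 0.8   # correlation of the assumed bivariate Gaussian target

def gibbs(n_steps):
    z1, z2 = 0.0, 0.0
    chain = np.empty((n_steps, 2))
    for t in range(n_steps):
        # z1 | z2 ~ N(rho * z2, 1 - rho^2), and symmetrically for z2 | z1
        z1 = rng.normal(rho * z2, np.sqrt(1.0 - rho ** 2))
        z2 = rng.normal(rho * z1, np.sqrt(1.0 - rho ** 2))
        chain[t] = (z1, z2)
    return chain

chain = gibbs(20_000)
print(np.corrcoef(chain.T)[0, 1])   # approx rho = 0.8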

16 1. p is invariant under each individual Gibbs sampling step, and hence under the whole Markov chain.
  - At each step the marginal distribution p(z_\i) is invariant, because z_\i is left unchanged.
  - Each step samples correctly from the conditional distribution p(z_i | z_\i).
2. The resulting Markov chain is ergodic.
  - A sufficient condition is that none of the conditional distributions be zero anywhere.
  - Together, invariance and ergodicity imply that Gibbs sampling samples correctly from p.
- Gibbs sampling as an instance of the Metropolis-Hastings algorithm
  - Consider a step involving z_k in which the remaining variables z_\k stay fixed.
  - The transition probability is q_k(z*|z) = p(z*_k | z_\k), for which the acceptance probability is exactly 1 (derivation below).
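The acceptance probability indeed reduces to 1: using z*_\k = z_\k,

$$A(z^\star, z) = \frac{p(z^\star)\, q_k(z \mid z^\star)}{p(z)\, q_k(z^\star \mid z)} = \frac{p(z_k^\star \mid z_{\setminus k})\, p(z_{\setminus k})\, p(z_k \mid z_{\setminus k})}{p(z_k \mid z_{\setminus k})\, p(z_{\setminus k})\, p(z_k^\star \mid z_{\setminus k})} = 1,$$

so every Gibbs proposal is accepted.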

17 Random walk behavior
- The number of steps needed to obtain roughly independent samples is of order (L/l)^2, where L is the largest and l the smallest length scale of the distribution.
- Over-relaxation can reduce this random walk behavior when the conditional distributions are Gaussian (one standard form is given below).
- The practical applicability of Gibbs sampling depends on the ease of sampling from the conditional distributions.
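For a Gaussian conditional with mean μ_i and variance σ_i², one standard over-relaxation update (following PRML Section 11.3) replaces z_i by

$$z_i' = \mu_i + \alpha\,(z_i - \mu_i) + \sigma_i\,(1 - \alpha^2)^{1/2}\, \nu, \qquad \nu \sim \mathcal{N}(0, 1),\quad -1 < \alpha < 1,$$

which leaves the conditional distribution invariant; α = 0 recovers ordinary Gibbs sampling, while negative α biases successive steps in a consistent direction and suppresses random walk behavior.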

18 11.4 Slice Sampling
- Augments z with an auxiliary height variable u and samples uniformly from the region under the (unnormalized) density, so the effective step size is adaptive: it adjusts automatically to match the characteristics of the distribution (a 1-D sketch follows below).
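A 1-D slice sampling sketch using the stepping-out and shrinkage scheme (one common variant, due to Neal); the unnormalized Gaussian target is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    # Unnormalized target density (illustrative): a standard Gaussian.
    return np.exp(-0.5 * z ** 2)

def slice_step(z, w=1.0):
    u = rng.uniform(0.0, p_tilde(z))        # auxiliary height under the curve
    # Step out: randomly place a bracket of width w, then grow it past the slice.
    zmin = z - w * rng.uniform()
    zmax = zmin + w
    while p_tilde(zmin) > u:
        zmin -= w
    while p_tilde(zmax) > u:
        zmax += w
    # Shrink: sample uniformly within the bracket, shrinking it on rejection.
    while True:
        z_new = rng.uniform(zmin, zmax)
        if p_tilde(z_new) > u:
            return z_new
        if z_new < z:
            zmin = z_new
        else:
            zmax = z_new

z, chain = 0.0, []
for _ in range(10_000):
    z = slice_step(z)
    chain.append(z)
print(np.mean(chain), np.var(chain))        # approx 0 and 1
```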

19 11.5 The Hybrid Monte Carlo Algorithm
- Hamiltonian dynamics
  - Defines a joint distribution over phase space (z, r) whose total energy is the Hamiltonian H(z, r).
  - The dynamics leaves H invariant, so r is periodically replaced by a draw from its conditional distribution given z, allowing the chain to move between energy levels.
- Hamiltonian dynamics + Metropolis algorithm
  - Update the momentum via such resampling (itself a Markov chain update).
  - Simulate the Hamiltonian dynamics approximately with the leapfrog integrator.
  - Accept the new state with probability min(1, exp{H(z, r) − H(z*, r*)}) (a sketch follows below).
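A minimal hybrid Monte Carlo sketch for a 1-D standard Gaussian, with E(z) = z²/2, K(r) = r²/2, and H = E + K; the target, step size, and trajectory length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def E(z):
    return 0.5 * z ** 2          # potential energy (unnormalized -log target)

def grad_E(z):
    return z

def leapfrog(z, r, eps, n_leap):
    # Leapfrog integration: half momentum step, alternating full steps, half step.
    r -= 0.5 * eps * grad_E(z)
    for _ in range(n_leap - 1):
        z += eps * r
        r -= eps * grad_E(z)
    z += eps * r
    r -= 0.5 * eps * grad_E(z)
    return z, r

def hmc(n_samples, eps=0.1, n_leap=20):
    z, chain = 0.0, []
    for _ in range(n_samples):
        r = rng.normal()                         # resample momentum: r | z ~ N(0, 1)
        z_new, r_new = leapfrog(z, r, eps, n_leap)
        h_old = E(z) + 0.5 * r ** 2
        h_new = E(z_new) + 0.5 * r_new ** 2
        if rng.uniform() < min(1.0, np.exp(h_old - h_new)):   # Metropolis accept
            z = z_new
        chain.append(z)
    return np.array(chain)

chain = hmc(5_000)
print(chain.mean(), chain.var())                 # approx 0 and 1
```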

20 11.6 Estimating the Partition Function
- The partition function is the normalization constant of the density; most sampling algorithms avoid it, but it is sometimes needed explicitly.
- Estimating the ratio of two partition functions
  - Needed for model comparison and model averaging.
  - Can be done by importance sampling from a distribution with energy function G (one standard estimator is given below).
- Finding the absolute value of the partition function of a complex distribution: chaining, i.e., linking it to a tractable distribution through a sequence of intermediate distributions and multiplying the successive ratios.
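Concretely, for energy-based densities p_E(z) = exp{−E(z)}/Z_E and p_G(z) = exp{−G(z)}/Z_G, the importance sampling estimator of the ratio (following PRML Section 11.6) is

$$\frac{Z_E}{Z_G} = \mathbb{E}_{p_G}\!\left[\exp\{-E(z) + G(z)\}\right] \approx \frac{1}{L} \sum_{l=1}^{L} \exp\{-E(z^{(l)}) + G(z^{(l)})\},$$

with the z^(l) drawn from p_G; chaining then writes Z_M / Z_1 = (Z_2/Z_1)(Z_3/Z_2) ⋯ (Z_M/Z_{M-1}), with each factor estimated this way.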

