Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models
Omiros Papaspiliopoulos and Gareth O. Roberts
Presented by Yuting Qi, ECE Dept., Duke Univ., 10/06/06
Overview
- DP hierarchical models
- Two Gibbs samplers: Pólya urn (Escobar 1994; Escobar & West 1995); blocked Gibbs sampler (Ishwaran, 2000)
- Retrospective sampling
- MCMC for DP models with retrospective sampling
- Performance
- Conclusions
DP Mixture Models (1)
- DP mixture models (DMMs): assume each Yi is drawn from a parametric distribution with parameter Xi.
- All Xi share one common prior P; some Xi may take the same value.
- Prior distribution: P ~ DP(α, H).
- Property of the Pólya urn scheme: marginalizing out P yields the predictive scheme shown below.
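The slide's formulas did not survive extraction; a standard statement of the DMM and of the Pólya urn predictive obtained by marginalizing P (following Escobar & West; the paper's exact notation may differ):

```latex
\begin{align*}
Y_i \mid X_i &\sim f(\cdot \mid X_i), \qquad X_i \mid P \sim P, \qquad P \sim \mathrm{DP}(\alpha, H),\\
X_i \mid X_1,\dots,X_{i-1} &\sim \frac{\alpha}{\alpha+i-1}\,H \;+\; \frac{1}{\alpha+i-1}\sum_{l=1}^{i-1}\delta_{X_l}.
\end{align*}
```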
DP Mixture Models (2)
- Explicit form of the DP (Sethuraman, 1994): stick-breaking, with Vj ~ Beta(1, α) and Zj ~ H (see the construction below).
- Role of α:
  - α large => each Vj ~ Beta(1, α) is small => small pj, many sticks of short length => P consists of infinitely many Zj with small pj => P -> H.
  - α -> 0 => each Vj ~ Beta(1, α) is large => few large sticks => P puts large mass on a small subset of the Zj => most Xi share the same value.
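The stick-breaking construction itself, written out consistently with the slide's notation:

```latex
\[
P = \sum_{j=1}^{\infty} p_j\,\delta_{Z_j}, \qquad
p_1 = V_1, \quad p_j = V_j \prod_{l<j}(1 - V_l), \qquad
V_j \overset{iid}{\sim} \mathrm{Beta}(1,\alpha), \quad Z_j \overset{iid}{\sim} H.
\]
```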
West’s Gibbs Sampler (1)
- Target: the joint posterior of X1, ..., Xn, with P marginalized out.
- Sample each Xi from its full conditional distribution.
- The full conditional mixes a "new value" component (the base prior H updated by the likelihood of Yi) with point masses at the other Xl; the mixing weights come from the DP prior (Pólya urn) and the likelihood.
West’s Gibbs Sampler (2)
- Gibbs sampling scheme (a sketch follows this slide):
  - Sampling Xi is equivalent to sampling an indicator Ki (Ki = k means Xi takes the cluster value Xk*); given the old {Ki}i=1..n and {Xk*}k=1..K*, generate a new Ki from its conditional posterior.
  - For Ki = 0, draw a new Xi from the base prior updated by the likelihood of Yi.
  - For Ki > 0, generate a new set of cluster values Xk* from their conditional posteriors.
- Drawbacks: converges slowly; difficult to implement when H and the likelihood are not conjugate.
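A minimal sketch of one Pólya-urn allocation step, assuming the conjugate toy model y[i] | X_i ~ N(X_i, 1) with H = N(0, tau2); all names and the concrete model are illustrative, not the paper's:

```python
import numpy as np

def gibbs_allocation_step(i, y, K, Xstar, alpha, tau2, rng):
    """One Polya-urn Gibbs update of indicator K[i] (Escobar/West style).

    Illustrative conjugate model: y[j] | X_j ~ N(X_j, 1), H = N(0, tau2).
    Xstar maps cluster label -> cluster value.
    """
    labels = sorted({K[j] for j in range(len(y)) if j != i})
    counts = {k: sum(1 for j in range(len(y)) if j != i and K[j] == k)
              for k in labels}
    # Existing cluster k: weight n_k * f(y_i | Xstar[k]); the common
    # 1/sqrt(2*pi) factors cancel in the normalization below.
    w = [counts[k] * np.exp(-0.5 * (y[i] - Xstar[k]) ** 2) for k in labels]
    # New cluster: alpha * marginal density of y_i under H, y_i ~ N(0, 1+tau2).
    w.append(alpha * np.exp(-0.5 * y[i] ** 2 / (1 + tau2)) / np.sqrt(1 + tau2))
    w = np.array(w) / sum(w)
    choice = rng.choice(len(w), p=w)
    if choice == len(labels):                  # open a new cluster
        new_label = max(labels, default=0) + 1
        post_var = tau2 / (1 + tau2)           # conjugate posterior of X_i
        Xstar[new_label] = rng.normal(post_var * y[i], np.sqrt(post_var))
        K[i] = new_label
    else:
        K[i] = labels[choice]
```

Sweeping this step over i = 1..n, then resampling the occupied Xstar values from their conditional posteriors, gives one full Gibbs iteration in this toy setting.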
Blocked Gibbs Sampler (1)
- Works with the stick-breaking representation directly.
- Estimates the joint posterior including P: P is updated in each Gibbs iteration, and the Xi need not be sampled separately (Xi = Z_Ki).
- Must truncate the stick-breaking sum at a finite level K.
Blocked Gibbs Sampler (2)
- Sampling scheme (the standard conditionals are written out below):
  - Sample Zj: for components j occupied by some Xi, sample Zj from its conditional posterior; for components not occupied by any Xi, sample Zj from the base prior H.
  - Sample K from its conditional posteriors.
  - Sample p from its conditional posteriors: Vk* is the posterior of Vk updated by the likelihood (allocation counts), and the pk are rebuilt from the Vk*.
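The slide's formulas are lost; the standard truncated-DP conditionals (as in Ishwaran & James, 2001, with mk the number of observations allocated to component k) take the form:

```latex
\[
V_k^* \sim \mathrm{Beta}\!\Big(1 + m_k,\; \alpha + \sum_{l=k+1}^{K} m_l\Big),
\qquad m_k = \#\{i : K_i = k\},
\]
\[
p_k = V_k^* \prod_{l<k}(1 - V_l^*), \qquad
\Pr(K_i = k \mid \cdots) \;\propto\; p_k\, f(y_i \mid Z_k).
\]
```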
Retrospective Sampling (1)
- In the blocked Gibbs sampler, given the pj, one samples Ki and sets Xi = Z_Ki; sampling the infinitely many pairs (pj, Zj) up front is not feasible.
- To sample Ki by inversion, first generate Ui from Uniform[0, 1], then set Ki = j iff Σ_{l<j} pl < Ui ≤ Σ_{l≤j} pl.
- Retrospective sampling exchanges the order of sampling Ui and sampling the pairs (pj, Zj): if a given Ui needs more pj than we currently have, simulate pairs (pj, Zj) retrospectively until the condition above is satisfied.
Retrospective Sampling (2) Algorithm
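The algorithm figure is not reproduced here; a minimal sketch of the retrospective inversion draw, assuming the sticks are extended by prior simulation (function and variable names are illustrative, not the paper's notation):

```python
import numpy as np

def retrospective_draw(u, p, Z, alpha, rng, base_draw):
    """Find K with sum(p[:K-1]) < u <= sum(p[:K]), extending the
    stick-breaking pairs (p_j, Z_j) on demand.

    p, Z hold the pairs simulated so far; base_draw(rng) samples Z_j ~ H.
    """
    remaining = 1.0 - sum(p)   # stick mass left after the existing breaks
    cum, j = 0.0, 0
    while True:
        if j == len(p):
            # Not enough sticks yet: break off a new one retrospectively.
            v = rng.beta(1.0, alpha)
            p.append(remaining * v)
            Z.append(base_draw(rng))
            remaining *= (1.0 - v)
        cum += p[j]
        if u <= cum:
            return j + 1       # 1-based component index K_i
        j += 1

# Usage: the lists p, Z persist and grow only as far as the U's demand.
rng = np.random.default_rng(0)
p, Z = [], []
k = retrospective_draw(rng.uniform(), p, Z, alpha=1.0, rng=rng,
                       base_draw=lambda r: r.normal())
```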
MCMC for DMM (1)
- MCMC with retrospective sampling.
- Notation follows the paper.
MCMC for DMM (2)
- Sampling scheme:
  - Sample Zj from their conditional posteriors.
  - Sample p from its conditional posteriors.
  - Sample K by retrospective sampling.
MCMC for DMM (3)
- Sampling K uses Metropolis-Hastings steps.
- To update Ki, the sampler proposes to move from the current allocation k to k(i, j) (k with its i-th entry replaced by j).
- The distribution for generating the proposed j is sketched below; Mi is a constant that controls the probability of proposing j greater than max{k}, i.e., a component not yet instantiated.
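The slide's proposal formula is lost; a reconstruction of its structure (not the paper's verbatim expression) is:

```latex
\[
q_i(j) \;\propto\;
\begin{cases}
p_j\, f(y_i \mid Z_j), & j \le \max\{k\},\\[2pt]
M_i\, p_j, & j > \max\{k\}.
\end{cases}
\]
```

Since \(\sum_{j > \max\{k\}} p_j = 1 - \sum_{j \le \max\{k\}} p_j\), the proposal can be normalized without instantiating infinitely many components.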
MCMC for DMM (4) Algorithm
Performance
- “lepto” data set (unimodal): 0.67 N(0, 1) + 0.33 N(0.3, 0.25²)
- “bimod” data set (bimodal): 0.5 N(-1, 0.5²) + 0.5 N(1, 0.5²)
- Autocorrelation time: a standard way to measure the speed of convergence, i.e., how well the algorithm explores the high-dimensional model space.
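A minimal sketch of estimating the integrated autocorrelation time from a scalar MCMC trace; the simple initial-positive cutoff used here is one of several windowing conventions, not necessarily the paper's estimator:

```python
import numpy as np

def autocorrelation_time(x, max_lag=None):
    """Integrated autocorrelation time tau = 1 + 2 * sum_t rho(t)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    if max_lag is None:
        max_lag = n // 3
    var = np.dot(x, x) / n
    tau = 1.0
    for t in range(1, max_lag):
        rho = np.dot(x[:-t], x[t:]) / (n * var)
        if rho <= 0.0:        # stop at the first non-positive autocorrelation
            break
        tau += 2.0 * rho
    return tau
```

Larger tau means slower mixing, so comparing tau across samplers (e.g., blocked vs. retrospective) on the same trace quantifies the convergence-speed claim above.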
Conclusions & Comments
- The retrospective methodology is applied to the DP, avoiding truncation approximation.
- Robust to large data sets.
- Comment: one of the most wordy and worst-organized papers I have read.