
Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models
Omiros Papaspiliopoulos and Gareth O. Roberts
Presented by Yuting Qi, ECE Dept., Duke Univ., 10/06/06

Overview
DP hierarchical models
Two Gibbs samplers: the Pólya urn sampler (Escobar & West, 1994, 1995) and the blocked Gibbs sampler (Ishwaran, 2000)
Retrospective sampling
MCMC for DP mixtures with retrospective sampling
Performance
Conclusions

DP Mixture Models (1)
DP mixture models (DMMs): assume each Yi is drawn from a parametric distribution with parameters Xi and a shared parameter φ. All Xi have one common prior P, so some Xi may take the same value. The prior distribution is P ~ DP(α, H).
Property of the Pólya urn scheme: marginalizing out P gives the predictive distribution of each Xi given the preceding X's.
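The model equations on this slide were figures and did not survive the transcript; a standard way to write the hierarchy and the Pólya urn predictive, assuming the usual notation (f the likelihood kernel, φ any shared likelihood parameters, α the concentration parameter, H the base measure), is:

```latex
Y_i \mid X_i, \phi \sim f(\cdot \mid X_i, \phi), \qquad
X_i \mid P \overset{iid}{\sim} P, \qquad
P \sim \mathrm{DP}(\alpha, H).
% Marginalizing out P gives the Pólya urn predictive:
X_i \mid X_1, \dots, X_{i-1} \;\sim\; \frac{\alpha}{\alpha + i - 1}\, H
  \;+\; \frac{1}{\alpha + i - 1} \sum_{j=1}^{i-1} \delta_{X_j}.
```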

DP Mixture Models (2)
Explicit form of the DP (Sethuraman, 1994): P = Σj pj δZj with pj = Vj Πl<j (1 − Vl), Vj ~ Beta(1, α), Zj ~ H.
Relationship to α:
α large => Vj ~ Beta(1, α) tends to be small => small pj, many sticks of short length => P consists of an infinite number of Zj with small pj => P → H.
α → 0 => Vj ~ Beta(1, α) tends to be large => a few large sticks => P puts a large mass on a small subset of the Zj => most Xi share the same value.
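As a rough illustration of this relationship (not part of the slides), a truncated stick-breaking draw in Python; the truncation level K and the NumPy-based helper are illustrative choices:

```python
import numpy as np

def stick_breaking(alpha, base_sampler, K=1000, rng=None):
    """Draw a K-atom truncated stick-breaking approximation to DP(alpha, H)."""
    rng = np.random.default_rng() if rng is None else rng
    V = rng.beta(1.0, alpha, size=K)          # V_j ~ Beta(1, alpha)
    V[-1] = 1.0                               # force the weights to sum to one
    pieces = np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    p = V * pieces                            # p_j = V_j * prod_{l<j} (1 - V_l)
    Z = base_sampler(K, rng)                  # Z_j ~ H
    return p, Z

# Large alpha: many small weights (P close to H); small alpha: a few big weights.
for alpha in (0.5, 50.0):
    p, _ = stick_breaking(alpha, lambda k, rng: rng.normal(size=k))
    print(f"alpha={alpha:5.1f}  largest weight={p.max():.3f}")
```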

West’s Gibbs Sampler (1)
Target: the joint posterior of X1, ..., Xn given the data.
Sample each Xi from its full conditional distribution, which combines the base posterior (the prior H updated via the likelihood of Yi) with the point masses at the other Xj implied by the DP prior through the Pólya urn.
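The conditional itself was a formula on the slide; in the standard Escobar–West form (a reconstruction, with H_i denoting H updated by the likelihood of Y_i, notation assumed) it reads:

```latex
X_i \mid X_{-i}, Y_i \;\sim\; q_{i,0}\, H_i + \sum_{j \neq i} q_{i,j}\, \delta_{X_j},
\qquad
q_{i,0} \propto \alpha \int f(Y_i \mid X)\, H(\mathrm{d}X),
\qquad
q_{i,j} \propto f(Y_i \mid X_j),
```
with the weights q normalized to sum to one.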

West’s Gibbs Sampler (2)
Gibbs sampling scheme:
Sampling Xi is equivalent to sampling an indicator Ki (Ki = k means Xi takes the value Xk*): given the current {Ki}i=1..n and {Xk*}k=1..K*, generate a new Ki from its conditional posterior.
If Ki = 0, draw a new Xi from the base posterior; if Ki > 0, set Xi = XKi*. Then resample the Xk* from their conditional posteriors given the observations allocated to them.
Drawbacks: converges slowly; difficult to implement when H and the likelihood are not conjugate.

Blocked Gibbs Sampler (1)
Works directly with the stick-breaking representation.
Estimates the joint posterior by updating P itself in each Gibbs iteration; the Xi are not sampled individually (they are recovered through the allocations and atoms).
Must truncate the stick-breaking representation at a finite level K.

Blocked Gibbs Sampler (2)
Sampling scheme:
Sample Zj: for components j occupied by some Xi, sample Zj from its conditional posterior; for components not occupied by any Xi, sample Zj from the base prior H.
Sample the indicators K from their conditional posteriors.
Sample p from its conditional posterior: each Vk* is drawn from the posterior of Vk updated by the allocation counts, and the weights pk are rebuilt by stick-breaking.
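The conditional posteriors shown as formulas on this slide are missing from the transcript; in the usual truncated (Ishwaran–James) parameterization, with m_k = #{i : K_i = k}, they take the form below (a reconstruction, not copied from the slides):

```latex
V_k \mid \cdot \;\sim\; \mathrm{Beta}\!\Big(1 + m_k,\; \alpha + \sum_{l=k+1}^{K} m_l\Big),
\qquad
p_k = V_k \prod_{l<k} (1 - V_l),
\qquad
\Pr(K_i = k \mid \cdot) \;\propto\; p_k \, f(Y_i \mid Z_k).
```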

Retrospective Sampling (1)
In the blocked Gibbs sampler we sample Ki given the weights pj and set Xi = ZKi, but sampling the infinite collection of pairs (pj, Zj) is not feasible.
To sample Ki, first generate Ui from Uniform[0, 1], then set Ki = j iff Σl<j pl < Ui ≤ Σl≤j pl.
Retrospective sampling exchanges the order of sampling Ui and sampling the pairs (pj, Zj): if, for a given Ui, more pj are needed than are currently available, simulate additional pairs (pj, Zj) retrospectively until Σl≤j pl ≥ Ui is satisfied.

Retrospective Sampling (2) Algorithm
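The algorithm on this slide appeared only as a figure; a minimal Python sketch of the retrospective allocation step described above, assuming a standard-normal base measure H and NumPy (both illustrative choices, not taken from the paper), could look like this:

```python
import numpy as np

def sample_allocation_retrospectively(U, V, Z, alpha, rng):
    """Return K_i such that sum_{l<K_i} p_l < U <= sum_{l<=K_i} p_l,
    extending the lists V (stick fractions) and Z (atoms) only on demand."""
    def weights(V):
        V = np.asarray(V)
        return V * np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))

    cum = np.cumsum(weights(V))
    j = int(np.searchsorted(cum, U))          # first index with cum[j] >= U
    while j >= len(V):                        # not enough sticks yet:
        V.append(rng.beta(1.0, alpha))        #   simulate V_j ~ Beta(1, alpha)
        Z.append(rng.normal())                #   and Z_j ~ H retrospectively
        cum = np.cumsum(weights(V))
        j = int(np.searchsorted(cum, U))
    return j                                  # 0-based component index

# Usage: start with a few sticks and extend only when a uniform draw demands it.
rng = np.random.default_rng(0)
V, Z, alpha = [rng.beta(1.0, 2.0)], [rng.normal()], 2.0
K_i = sample_allocation_retrospectively(rng.uniform(), V, Z, alpha, rng)
```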

MCMC for DMM (1)
MCMC with retrospective sampling.
Notation:

MCMC for DMM (2)
Sampling scheme:
Sample the Zj from their conditional posteriors.
Sample p from its conditional posteriors.
Sample the allocations K by retrospective sampling.

MCMC for DMM (3)
Sampling K uses Metropolis–Hastings steps. When updating Ki, the sampler proposes to move from the current allocation vector k to k(i, j), in which the i-th entry is changed to j. The distribution for generating the proposed j involves a constant Mi that controls the probability of proposing a j greater than max{k}, i.e. a not-yet-represented component.
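The proposal distribution itself is a formula on the slide and is not reproduced here; purely as an illustration of the Metropolis–Hastings mechanics (a generic accept/reject step under user-supplied target and proposal functions, not the paper's exact proposal), an update of Ki might be organized as:

```python
import numpy as np

def mh_update_allocation(K, i, propose_j, log_target, log_proposal, rng):
    """Generic M-H step for one allocation variable K[i].
    propose_j(K, i)        -> proposed new label j
    log_target(K)          -> log unnormalized posterior of the allocation vector
    log_proposal(j, K, i)  -> log probability of proposing j from state K"""
    j_new = propose_j(K, i)
    K_new = K.copy()
    K_new[i] = j_new
    log_ratio = (log_target(K_new) - log_target(K)
                 + log_proposal(K[i], K_new, i) - log_proposal(j_new, K, i))
    if np.log(rng.uniform()) < log_ratio:     # accept with prob min(1, ratio)
        return K_new
    return K                                  # otherwise keep the current state
```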

MCMC for DMM (4) Algorithm

Performance
“lepto” data set (unimodal): 0.67 N(0, 1) + 0.33 N(0.3, 0.25^2)
“bimod” data set (bimodal): 0.5 N(-1, 0.5^2) + 0.5 N(1, 0.5^2)
Autocorrelation time: a standard way to measure the speed of convergence, i.e. how well the algorithm explores the high-dimensional model space.
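Autocorrelation time is not defined further on the slide; a common estimate is the integrated autocorrelation time τ = 1 + 2 Σt ρt, as in this sketch (the truncation window and the file name in the usage comment are illustrative):

```python
import numpy as np

def integrated_autocorrelation_time(x, max_lag=200):
    """Estimate tau = 1 + 2 * sum_{t>=1} rho_t from a scalar MCMC trace x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    rho = np.array([np.dot(x[:-t], x[t:]) / (len(x) * var)
                    for t in range(1, max_lag + 1)])
    # Truncate the sum at the first negative autocorrelation estimate.
    cutoff = np.argmax(rho < 0.0) if np.any(rho < 0.0) else len(rho)
    return 1.0 + 2.0 * rho[:cutoff].sum()

# Usage: larger tau means slower mixing, e.g. on the number of occupied clusters.
# trace = np.loadtxt("n_clusters_trace.txt"); print(integrated_autocorrelation_time(trace))
```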

Performance

Conclusions & Comments
The retrospective methodology is applied to the DP, avoiding the truncation approximation.
Robust to large data sets.
Comment: one of the most wordy and worst-organized papers I have read.