Gibbs Sampling Methods for Stick-Breaking Priors, Hemant Ishwaran and Lancelot F. James (2001). Presented by Yuting Qi, ECE Dept., Duke Univ., 03/03/06.

Presentation transcript:


Overview
- Introduction: what are stick-breaking priors? Relationship between different priors
- Two Gibbs samplers: the Polya urn Gibbs sampler and the blocked Gibbs sampler
- Results
- Conclusions

Introduction. What are stick-breaking priors? They are discrete random probability measures whose random weights p_k are independent of the Z_k, where the Z_k are iid random elements with a nonatomic distribution H. The random weights are constructed through a stick-breaking procedure.
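In symbols, the random measure has the standard stick-breaking form (a reconstruction of the usual definition; the label P_N is just notation, with N finite or infinite):

```latex
\[
\mathcal{P}_N(\cdot) \;=\; \sum_{k=1}^{N} p_k \,\delta_{Z_k}(\cdot),
\qquad p_k \ge 0, \quad \sum_{k=1}^{N} p_k = 1, \qquad Z_k \overset{iid}{\sim} H .
\]
```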

Introduction (cont'd). Stick-breaking construction: the V_k ~ Beta(a_k, b_k) are independent random variables (in the DP case, iid Beta(1, α)), and the weights are p_1 = V_1, p_k = V_k (1 - V_1)(1 - V_2)...(1 - V_{k-1}).
If N is finite: set V_N = 1 to guarantee that the weights sum to one; the p_k then have a generalized Dirichlet distribution, which is conjugate to the multinomial distribution.
If N is infinite: infinite-dimensional priors of this form include the DP, the two-parameter Poisson-Dirichlet process (Pitman-Yor process), and the beta two-parameter process.
[Slide figure: a unit stick successively broken into pieces of length v_1, v_2(1 - v_1), v_3(1 - v_1)(1 - v_2), ..., with remainders 1 - v_1, (1 - v_1)(1 - v_2), ...] A numerical sketch of the finite construction follows.
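A minimal numerical sketch of that finite construction (assuming NumPy; for the DP(α) special case, a_k = 1 and b_k = α):

```python
import numpy as np

def stick_breaking_weights(a, b, rng=None):
    """Finite stick-breaking: V_k ~ Beta(a_k, b_k) independently, V_N set to 1,
    p_1 = V_1, p_k = V_k * prod_{j<k} (1 - V_j).  Returns weights summing to one."""
    rng = np.random.default_rng(rng)
    a, b = np.asarray(a, float), np.asarray(b, float)
    V = rng.beta(a, b)
    V[-1] = 1.0                                   # V_N = 1 guarantees the weights sum to one
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    return V * leftover

# Truncated DP(alpha) with N atoms: V_k ~ Beta(1, alpha)
alpha, N = 1.0, 20
p = stick_breaking_weights(np.ones(N), alpha * np.ones(N), rng=0)
print(p.sum())   # sums to one (up to floating point)
```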

Pitman-Yor Process (Two-parameter Poisson-Dirichlet Process). A discrete random probability measure whose random weights have a GEM distribution. It admits a prediction rule (generalized Polya urn characterization) and is a special case of a stick-breaking random measure; see the reconstruction below.
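A hedged reconstruction of the formulas for PY(a, b), with 0 <= a < 1 and b > -a (these are the standard two-parameter results, not copied from the slide):

```latex
% stick-breaking representation
\[
V_k \overset{ind}{\sim} \mathrm{Beta}(1-a,\; b+ka), \qquad
p_1 = V_1, \quad p_k = V_k \prod_{j<k}(1-V_j), \qquad
\mathcal{P}(\cdot) = \sum_{k=1}^{\infty} p_k \,\delta_{Z_k}(\cdot).
\]
% prediction rule (generalized Polya urn), where Y_1^*, ..., Y_m^* are the m distinct
% values among Y_1, ..., Y_n and n_j is the number of Y_i equal to Y_j^*:
\[
\Pr\bigl(Y_{n+1} \in \cdot \mid Y_1,\dots,Y_n\bigr)
  = \frac{b + a m}{b + n}\, H(\cdot)
  + \sum_{j=1}^{m} \frac{n_j - a}{b + n}\, \delta_{Y_j^{*}}(\cdot).
\]
```

The DP(b H) is recovered as the special case a = 0.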

Generalized Dirichlet Random Weights. Finite stick-breaking priors and the GD: the vector of random weights p = [p_1, ..., p_N] constructed from a finite stick-breaking procedure has a generalized Dirichlet (GD) distribution. Its density factors into a product of conditionals, f(p_1, ..., p_N) = f(p_N | p_{N-1}, ..., p_1) f(p_{N-1} | p_{N-2}, ..., p_1) ... f(p_1), in terms of the Beta stick-breaking parameters a_k and b_k. (Taking a_k = α_k and b_k = α_{k+1} + ... + α_N recovers the ordinary Dirichlet; see the next slide.)
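Written out, the generalized Dirichlet (Connor-Mosimann) density implied by independent V_k ~ Beta(a_k, b_k) takes the following form (derived here by a change of variables, not copied from the slide, so worth checking against the paper):

```latex
\[
f(p_1,\dots,p_{N-1})
 = \left[\prod_{k=1}^{N-1}\frac{\Gamma(a_k+b_k)}{\Gamma(a_k)\,\Gamma(b_k)}\right]
   p_N^{\,b_{N-1}-1}\,
   \prod_{k=1}^{N-1} p_k^{\,a_k-1}
   \prod_{k=1}^{N-2} \bigl(1-p_1-\cdots-p_k\bigr)^{\,b_k-(a_{k+1}+b_{k+1})},
\qquad p_N = 1-\sum_{k=1}^{N-1} p_k .
\]
```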

Generalized Dirichlet Random Weights (cont'd). Finite-dimensional Dirichlet priors: for a random measure with weights p = (p_1, ..., p_N) ~ Dirichlet(α_1, ..., α_N), p has a GD distribution with a_k = α_k and b_k = α_{k+1} + ... + α_N. Connection: every random measure based on Dirichlet random weights is a stick-breaking random measure with finite N.
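Concretely, a Dirichlet(α_1, ..., α_N) weight vector arises from the finite stick-breaking scheme with

```latex
\[
V_k \overset{ind}{\sim} \mathrm{Beta}\!\Bigl(\alpha_k,\ \sum_{j=k+1}^{N}\alpha_j\Bigr),
\quad k = 1,\dots,N-1, \qquad V_N = 1,
\qquad p_k = V_k \prod_{j<k}(1-V_j).
\]
```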

Truncations. A finite stick-breaking random measure P_N can be viewed as a truncation of the infinite measure P_∞: discard the terms k = N+1, N+2, ... and replace p_N with 1 - p_1 - ... - p_{N-1}. This is an approximation. When the truncation P_N is used as the prior in a Bayesian hierarchical model, the resulting Bayesian marginal density m_N(X) is close to the marginal density m_∞(X) obtained under the full infinite-dimensional prior.

Truncations (cont'd). For example, with n = 1000 observations, truncation level N = 20, and α = 1, the truncation error is on the order of 10^(-5); see the bound below.
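The bound behind this number has, as best I can reconstruct it (verify against Ishwaran and James 2001), the form

```latex
\[
\bigl\lVert m_N(\mathbf{X}) - m_\infty(\mathbf{X}) \bigr\rVert_1
 \;\le\; 4\Bigl[\,1 - \mathbb{E}\Bigl\{\Bigl(\sum_{k=1}^{N-1} p_k\Bigr)^{\!n}\Bigr\}\Bigr]
 \;\approx\; 4\,n\,\exp\{-(N-1)/\alpha\} \quad \text{for the DP}(\alpha),
\]
```

and with n = 1000, N = 20, α = 1 this gives 4 · 1000 · e^{-19} ≈ 2.2 × 10^{-5}, consistent with the "~10^(-5)" on the slide.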

Polya Urn Gibbs Sampler. Stick-breaking measures are used as priors in Bayesian semiparametric models: X_i | Y_i ~ f(x_i | Y_i) and Y_i | P ~ P, with P given a stick-breaking prior. Integrating over P gives the joint (Polya urn) distribution of Y_1, ..., Y_n, which leads to the Polya urn Gibbs sampler: (a) for i = 1, ..., n, draw Y_i from its conditional distribution given Y_{-i} and the data; (b) re-draw the distinct values of Y given the current cluster configuration, to improve mixing.
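For the DP(αH) special case, step (a) uses the familiar Polya-urn conditional (a standard result, stated here for concreteness rather than copied from the slide):

```latex
\[
(Y_i \mid \mathbf{Y}_{-i}, X_i) \;\sim\;
   \sum_{j \ne i} q_{i,j}\, \delta_{Y_j}(\cdot) \;+\; q_{i,0}\, H(dY_i \mid X_i),
\qquad
q_{i,j} \propto f(X_i \mid Y_j), \quad
q_{i,0} \propto \alpha \int f(X_i \mid Y)\, H(dY),
\]
```

where H(dY_i | X_i) is the posterior of Y_i under the base measure H and the single observation X_i, and the q's are normalized to sum to one.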

Blocked Gibbs Sampler. Assume the prior is a finite-dimensional stick-breaking measure P_N. The model is rewritten in terms of classification variables K = (K_1, ..., K_n), atoms Z = (Z_1, ..., Z_N), and weights p = (p_1, ..., p_N): X_i | Z, K ~ f(x_i | Z_{K_i}) and K_i | p ~ Σ_{k=1}^{N} p_k δ_k, with the Z_k iid from H and p from the stick-breaking (generalized Dirichlet) prior. Direct posterior inference: iteratively draw values of (Z, K, p) from their full conditional distributions; these converge to values from the joint posterior of (Z, K, p) given the data. Each draw defines a random measure P(·) = Σ_k p_k δ_{Z_k}(·), so the sampler also yields draws from the posterior of the random measure itself.
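The full conditionals of the blocked Gibbs sampler, written as a sketch of the standard updates (M_k below counts the number of K_i equal to k):

```latex
\begin{align*}
&\text{(i) atoms: } (Z_k \mid \mathbf{K}, \mathbf{X}) \propto H(dZ_k) \prod_{\{i:\,K_i = k\}} f(X_i \mid Z_k),
  \qquad Z_k \sim H \ \text{ if } M_k = 0, \\
&\text{(ii) classifications: } (K_i \mid \mathbf{Z}, \mathbf{p}, \mathbf{X}) \sim \sum_{k=1}^{N} p_{k,i}\, \delta_k,
  \qquad p_{k,i} \propto p_k\, f(X_i \mid Z_k), \\
&\text{(iii) weights: } V_k^{*} \sim \mathrm{Beta}\Bigl(a_k + M_k,\ b_k + \sum_{l=k+1}^{N} M_l\Bigr),
  \quad p_1 = V_1^{*},\ \ p_k = V_k^{*} \prod_{j<k}(1 - V_j^{*}),\ \ V_N^{*} = 1 .
\end{align*}
```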

Blocked Gibbs Algorithm. Algorithm: let {K_1*, ..., K_m*} denote the set of current m unique values of K. Each iteration cycles through the conditional draws above: (Z | K, X), with atoms of occupied clusters drawn from their posterior under H and unoccupied atoms drawn fresh from H; (K | Z, p, X); and (p | K). A runnable sketch for a simple normal mixture follows.
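A compact sketch of the blocked Gibbs sampler for a truncated DP mixture of normals (known unit variance, N(0, τ²) base measure; all function names and hyperparameter choices here are illustrative, not from the paper):

```python
import numpy as np

def blocked_gibbs_dp_normal(x, N=20, alpha=1.0, tau2=10.0, iters=1000, seed=0):
    """Truncated-DP (stick-breaking) mixture of N(z_k, 1) components.
    Cycles through the conditionals (p | K), (Z | K, X), (K | Z, p, X)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    Z = rng.normal(0.0, np.sqrt(tau2), size=N)     # atoms
    K = rng.integers(0, N, size=n)                 # classifications
    samples = []
    for _ in range(iters):
        # (p | K): conditional stick-breaking, V_k ~ Beta(1 + M_k, alpha + sum_{l>k} M_l)
        M = np.bincount(K, minlength=N)
        tail = np.concatenate((np.cumsum(M[::-1])[::-1][1:], [0]))   # sum_{l>k} M_l
        V = rng.beta(1.0 + M, alpha + tail)
        V[-1] = 1.0
        p = V * np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
        # (Z | K, X): normal-normal posterior for occupied atoms, prior draw for empty ones
        for k in range(N):
            idx = (K == k)
            if idx.any():
                prec = 1.0 / tau2 + idx.sum()
                mean = idx.sum() * x[idx].mean() / prec
                Z[k] = rng.normal(mean, np.sqrt(1.0 / prec))
            else:
                Z[k] = rng.normal(0.0, np.sqrt(tau2))
        # (K | Z, p, X): multinomial with probabilities proportional to p_k * N(x_i | Z_k, 1)
        logw = np.log(p + 1e-300) - 0.5 * (x[:, None] - Z[None, :]) ** 2
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        K = np.array([rng.choice(N, p=w[i]) for i in range(n)])
        samples.append((p.copy(), Z.copy()))
    return samples

# Example mirroring the toy setup on the Results slide: 50 draws from a standard normal
x = np.random.default_rng(1).normal(size=50)
draws = blocked_gibbs_dp_normal(x)
```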

Comparisons. In the Polya urn Gibbs sampler, within one Gibbs iteration each data point chooses among the existing m clusters and a single new cluster, one point at a time; in the extreme case each data point forms its own cluster, i.e., the number of clusters equals the number of data points. In the blocked Gibbs sampler, within one iteration all n data points choose among the existing m clusters and N - m new clusters; that is, the infinitely many unrepresented clusters of the Polya urn process are represented by N - m clusters. Since the number of data points is finite, once N >= n, the N possible clusters suffice for all data even in the extreme case where each point forms its own cluster. In this sense, the blocked Gibbs sampler is equivalent to the Polya urn Gibbs sampler.

Results. Simulated 50 observations from a standard normal distribution.