Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes. Y. W. Teh, M. I. Jordan, M. J. Beal & D. M. Blei, NIPS 2004. Presented by Yuting Qi, ECE Dept., Duke Univ., 08/26/05.

Overview: Motivation; Dirichlet Processes; Hierarchical Dirichlet Processes; Inference; Experimental results; Conclusions.

Motivation. Multi-task learning viewed as clustering. Goal: share clusters among multiple related clustering problems (model-based). Approach: hierarchical, nonparametric Bayesian, using a DP mixture model that learns a generative model over the data and treats the classes as hidden variables.

Dirichlet Processes. Let (Θ, B) be a measurable space, G0 a probability measure on that space, and α a positive real number. A Dirichlet process is the distribution of a random probability measure G over (Θ, B) such that, for every finite measurable partition (A1, ..., Ar) of Θ, (G(A1), ..., G(Ar)) ~ Dir(αG0(A1), ..., αG0(Ar)); we then write G ~ DP(α, G0). Properties: draws G from a DP are discrete with probability one, G = Σk βk δ_Өk, where the atoms Өk ~ G0 are i.i.d. and the weights βk are random and depend on α.
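To make the discreteness property concrete, here is a minimal sketch (not from the paper) that draws an approximate sample G = Σk βk δ_Өk from DP(α, G0) by truncating the stick-breaking construction; the truncation level K and the choice of G0 as a standard normal are illustrative assumptions.

```python
# Truncated stick-breaking draw from DP(alpha, G0) -- illustrative sketch.
import numpy as np

def sample_dp(alpha, sample_g0, K=100, rng=None):
    rng = np.random.default_rng(rng)
    v = rng.beta(1.0, alpha, size=K)                              # stick-breaking fractions
    beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # weights beta_k
    theta = np.array([sample_g0(rng) for _ in range(K)])          # atoms theta_k ~ G0
    return beta, theta

beta, theta = sample_dp(alpha=1.0, sample_g0=lambda rng: rng.normal(), rng=0)
print(beta[:5], theta[:5])
```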

Chinese Restaurant Process. The CRP (Polya urn scheme): let Φ1, ..., Φi-1 be i.i.d. random variables distributed according to G, let Ө1, ..., ӨK be the distinct values taken on by Φ1, ..., Φi-1, and let nk be the number of Φi' equal to Өk for i' < i. Integrating out G, the predictive distribution is Φi | Φ1, ..., Φi-1 ~ Σk nk/(i-1+α) δ_Өk + α/(i-1+α) G0. (This slide is from "Chinese Restaurants and Stick-Breaking: An Introduction to the Dirichlet Process", NLP Group, Stanford, Feb. 2005.)
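The predictive rule above is easy to simulate; the following is a minimal sketch (illustrative names, not the paper's code) in which customer i joins an existing table k with probability nk/(i-1+α) or opens a new table with probability α/(i-1+α).

```python
# Chinese restaurant process / Polya urn simulation -- illustrative sketch.
import numpy as np

def crp(n_customers, alpha, rng=None):
    rng = np.random.default_rng(rng)
    counts = []                                  # n_k: customers at each table
    assignments = []
    for i in range(n_customers):
        p = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(p), p=p / p.sum())    # denominator is (i - 1) + alpha for the i-th customer (1-indexed)
        if k == len(counts):
            counts.append(1)                     # open a new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts

print(crp(10, alpha=1.0, rng=0))
```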

DP Mixture Model. One of the most important applications of the DP is as a nonparametric prior distribution on the components of a mixture model. Why not apply the DP directly to density estimation? Because draws G from a DP are discrete.
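As a sketch of why that discreteness is useful in a mixture model, the following illustrative code uses a truncated DP draw as the mixing measure of a 1-D Gaussian mixture: repeated draws Φi land on the same atom, which is exactly what produces clusters. The base measure N(0, 25) and unit observation variance are assumptions made for illustration.

```python
# DP mixture of 1-D Gaussians via truncated stick-breaking -- illustrative sketch.
import numpy as np

def sample_dp_mixture(n, alpha, K=100, rng=None):
    rng = np.random.default_rng(rng)
    v = rng.beta(1.0, alpha, size=K)
    beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # G = sum_k beta_k * delta(theta_k)
    theta = rng.normal(0.0, 5.0, size=K)                          # theta_k ~ G0 = N(0, 5^2)
    z = rng.choice(K, size=n, p=beta / beta.sum())                # component index of phi_i ~ G
    x = rng.normal(theta[z], 1.0)                                 # x_i ~ F(phi_i) = N(phi_i, 1)
    return x, z

x, z = sample_dp_mixture(200, alpha=1.0, rng=0)
print(len(np.unique(z)), "distinct components used")
```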

HDP - Problem statement. We have J groups of data, {Xj}, j = 1, ..., J, with Xj = {xji}, i = 1, ..., nj. The data in each group are modeled with a mixture model. The mixing proportions are specific to each group, but the groups share the same set of mixture components (underlying clusters); each group uses its own combination of the shared components. Goal: discover the distribution of the components within a group and the distribution of the components across groups.

HDP - General representation. G0 is the global probability measure, G0 ~ DP(r, H), where r is a concentration parameter and H is the base measure. Gj is the probability distribution for group j, Gj ~ DP(α, G0). Φji is the hidden parameter of the distribution F(Φji) corresponding to xji. The overall model is the two-level DP: G0 ~ DP(r, H); Gj | G0 ~ DP(α, G0) for each j; Φji | Gj ~ Gj; xji | Φji ~ F(Φji).
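A minimal truncated sketch of this two-level construction (assumptions: truncation at K shared atoms, H = N(0, 25), unit-variance Gaussian F): G0 ~ DP(r, H) gives global weights β over atoms Өk, and each group then draws its own mixing proportions πj ~ DP(α, β) over the same atoms, so the groups share components but weight them differently.

```python
# Two-level (truncated) HDP generative sketch -- illustrative, not the paper's code.
import numpy as np

def sample_hdp_groups(group_sizes, r, alpha, K=50, rng=None):
    rng = np.random.default_rng(rng)
    v = rng.beta(1.0, r, size=K)
    beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # global weights of G0
    theta = rng.normal(0.0, 5.0, size=K)                          # shared atoms theta_k ~ H
    data = []
    for n_j in group_sizes:
        pi_j = rng.dirichlet(alpha * beta + 1e-12)  # group weights pi_j ~ DP(alpha, beta), truncated
        z = rng.choice(K, size=n_j, p=pi_j)
        data.append(rng.normal(theta[z], 1.0))      # x_ji ~ F(theta_{z_ji})
    return data, theta, beta

groups, theta, beta = sample_hdp_groups([100, 100, 100], r=1.0, alpha=1.0, rng=0)
```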

HDP - General representation G 0 places non-zeros mass only on, thus,, i.i.d, r.v. distributed according to H.

HDP-CR franchise. First level: within each group, a DP mixture. Φj1, ..., Φj,i-1 are i.i.d. random variables distributed according to Gj; Ѱj1, ..., ѰjTj are the distinct values taken on by Φj1, ..., Φj,i-1, and njt is the number of Φji' equal to Ѱjt for i' < i. Second level: across groups, sharing components. The base measure of each group is itself a draw from a DP: Ѱjt | G0 ~ G0, with G0 ~ DP(r, H); Ө1, ..., ӨK are the distinct values taken on by the Ѱjt, and mk is the number of Ѱjt equal to Өk over all j, t.
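A minimal sketch of this seating process (illustrative code, not the paper's): within group j, a customer joins table t with probability proportional to njt or opens a new table with probability proportional to α; each new table orders a dish, choosing dish k with probability proportional to mk (tables serving k across all groups) or a new dish with probability proportional to r.

```python
# Chinese restaurant franchise simulation -- illustrative sketch.
import numpy as np

def crf(group_sizes, alpha, r, rng=None):
    rng = np.random.default_rng(rng)
    m = []                                      # m_k: tables serving dish k, over all groups
    seating, dish_of = [], []
    for n_j in group_sizes:
        table_counts, dishes, labels = [], [], []
        for _ in range(n_j):
            p = np.array(table_counts + [alpha], dtype=float)
            t = rng.choice(len(p), p=p / p.sum())
            if t == len(table_counts):          # new table: it must order a dish
                q = np.array(m + [r], dtype=float)
                k = rng.choice(len(q), p=q / q.sum())
                if k == len(m):
                    m.append(0)                 # brand-new dish (component)
                m[k] += 1
                table_counts.append(0)
                dishes.append(k)
            table_counts[t] += 1
            labels.append(dishes[t])            # a customer's dish is its table's dish
        seating.append(labels)
        dish_of.append(dishes)
    return seating, dish_of, m

seating, dish_of, m = crf([30, 30, 30], alpha=1.0, r=1.0, rng=0)
print(len(m), "dishes shared across the three groups")
```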

HDP-CR franchise (continued). Values of Φji are shared among groups. Integrating out Gj gives the within-group Polya urn, Φji | Φj1, ..., Φj,i-1 ~ Σt njt/(i-1+α) δ_Ѱjt + α/(i-1+α) G0, and integrating out G0 gives the top-level urn over dishes, Ѱjt | all other Ѱ ~ Σk mk/(m·+r) δ_Өk + r/(m·+r) H, where m· = Σk mk.

Inference - MCMC. Gibbs sampling of the posterior in the CR franchise: instead of dealing directly with Φji and Ѱjt to obtain p(Φ, Ѱ | X), we obtain p(t, k, Ө | X) by sampling t, k, and Ө, where t = {tji}, tji being the index of the table that Φji is associated with (Φji = Ѱj,tji), and k = {kjt}, kjt being the index of the dish that Ѱjt takes its value on (Ѱjt = Ө_kjt). Given the prior defined by the CR franchise, the posterior is sampled iteratively: sample t, then k, then Ө.
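As a sketch of the first of these steps, written in the notation above and following the conditionals stated in the paper (f_k^{-x_ji} denotes the predictive density of x_ji under component k given all other data currently assigned to k, and the prior predictive under H for a new component):

$$
p(t_{ji}=t \mid \mathbf{t}^{-ji}, \mathbf{k}) \propto
\begin{cases}
n_{jt}^{-ji}\, f_{k_{jt}}^{-x_{ji}}(x_{ji}) & \text{for an existing table } t,\\
\alpha\, p(x_{ji} \mid \mathbf{t}^{-ji}, t_{ji}=t^{\mathrm{new}}, \mathbf{k}) & \text{for a new table,}
\end{cases}
$$

$$
p(x_{ji} \mid \mathbf{t}^{-ji}, t_{ji}=t^{\mathrm{new}}, \mathbf{k})
= \sum_{k} \frac{m_k}{m_\cdot + r}\, f_{k}^{-x_{ji}}(x_{ji})
+ \frac{r}{m_\cdot + r}\, f_{k^{\mathrm{new}}}^{-x_{ji}}(x_{ji}),
\qquad m_\cdot = \sum_k m_k.
$$

Sampling k and Ө follows the same pattern: a table's dish kjt is resampled with probability proportional to mk times the predictive density of all data at that table, and Ө is drawn from its conditional given the data currently assigned to each component.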

Experiments on synthetic data. Data description: there are three groups of data; each group is a Gaussian mixture; different groups can share the same clusters; each cluster contributes 50 2-D data points with independent features. Group 1 uses clusters [1, 2, 3, 7], Group 2 uses [3, 4, 5, 7], and Group 3 uses [5, 6, 1, 7].
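A minimal sketch reproducing the structure of this data set; the cluster centers and the unit variances are illustrative assumptions, since the slides do not list them.

```python
# Synthetic grouped data with shared 2-D Gaussian clusters -- illustrative sketch.
import numpy as np

rng = np.random.default_rng(0)
means = {k: rng.uniform(-10, 10, size=2) for k in range(1, 8)}        # hypothetical centers of clusters 1..7
group_clusters = {1: [1, 2, 3, 7], 2: [3, 4, 5, 7], 3: [5, 6, 1, 7]}  # which clusters each group uses

data = {
    j: np.vstack([rng.normal(means[k], 1.0, size=(50, 2)) for k in ks])  # 50 points per cluster, indep. features
    for j, ks in group_clusters.items()
}
print({j: x.shape for j, x in data.items()})  # each group: (200, 2)
```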

Experiments on synthetic data. HDP definition: here F(xji | φji) is a Gaussian distribution with φji = {μji, σji}, and each φji takes one of the values θk = {μk, σk}, k = 1, 2, .... The base measure H is a Normal-Gamma joint distribution: μ ~ N(m, σ/β) and σ^-1 ~ Gamma(a, b), with m, β, a, b given hyperparameters. Goal: model each group as a Gaussian mixture, and model the cluster distribution over groups.
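A minimal sketch of drawing one component parameter θk = (μk, σk) from this Normal-Gamma base measure H, assuming Gamma(a, b) is parameterized by shape a and rate b; the hyperparameter values are illustrative.

```python
# Draw (mean, variance) from a Normal-Gamma base measure -- illustrative sketch.
import numpy as np

def sample_normal_gamma(m=0.0, beta=1.0, a=1.0, b=1.0, rng=None):
    rng = np.random.default_rng(rng)
    lam = rng.gamma(shape=a, scale=1.0 / b)          # precision 1/sigma ~ Gamma(a, b), with b a rate
    mu = rng.normal(m, np.sqrt(1.0 / (beta * lam)))  # mu ~ N(m, sigma / beta)
    return mu, 1.0 / lam                             # component mean and variance

print(sample_normal_gamma(rng=0))
```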

Results on synthetic data. Global distribution: the shared components estimated over all groups, together with the corresponding mixing proportions. The number of components is open-ended; only a partial set is shown.

Results on synthetic data. Mixture within each group: the number of components in each group is also open-ended; only a partial set is shown.

Conclusions and discussion. This hierarchical Bayesian method can automatically determine the appropriate number of mixture components. A set of DPs is coupled via a shared base measure to achieve component sharing among groups. The DPs provide nonparametric priors; this is not nonparametric density estimation.