Hierarchical Topic Models and the Nested Chinese Restaurant Process


Hierarchical Topic Models and the Nested Chinese Restaurant Process
Liutong Chen (lc6re), Siwei Liu (sl7vy), Shaojia Li (sl4ab)

Outline: Introduction, CRP, Model, Experiment, Conclusion

Introduction
The paper takes a Bayesian approach, generating an appropriate prior via a distribution on partitions.
It builds a hierarchical topic model.
It illustrates the approach on simulated and real data.

CRP
Imagine a Chinese restaurant with an infinite number of tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1 or at the next table. In general, the m-th customer sits at a table drawn from the following equation:
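The equation itself did not survive the transcript; what is meant is the standard CRP seating rule (Eq. 1 of the paper), with m_i the number of previous customers at table i and γ the CRP parameter:

```latex
\[
p(\text{occupied table } i \mid \text{previous customers}) = \frac{m_i}{\gamma + m - 1},
\qquad
p(\text{next unoccupied table} \mid \text{previous customers}) = \frac{\gamma}{\gamma + m - 1}.
\]
```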

Extending CRP to Hierarchies
Assumption: a city with an infinite number of infinite-table Chinese restaurants. One of them is the root restaurant. Each table has a card that refers to another restaurant, and each restaurant is referred to exactly once.

Extending CRP to Hierarchies
Scene:
1. On day 1, a tourist enters the root restaurant and chooses a table according to the equation above.
2. On day 2, the tourist goes to the restaurant referred to by that table and again chooses a table by the same equation.
3. On day L, the tourist is at the L-th restaurant and has established a path from the root to level L of the infinite tree.
4. After M tourists each take an L-day trip, the collection of their paths is a particular L-level subtree of the infinite tree.
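A minimal sketch of this path-drawing process (not the authors' code; the dict-based tree and the names crp_choose_table and nested_crp_path are our own):

```python
import random

def crp_choose_table(counts, gamma):
    """Choose a table index under the standard CRP seating rule.

    counts[i] is the number of previous customers at table i; a new table
    (index len(counts)) is opened with probability gamma / (gamma + m - 1),
    where m - 1 = sum(counts) is the number of customers seated so far.
    """
    r = random.uniform(0, sum(counts) + gamma)
    for i, c in enumerate(counts):
        if r < c:
            return i
        r -= c
    return len(counts)  # open a new table

def nested_crp_path(root, gamma, L):
    """Draw one root-to-leaf path of length L through the nested CRP.

    root is a dict-based tree node {"counts": [], "children": {}}; each
    table in a restaurant points to the child restaurant visited next.
    """
    path = [root]
    node = root
    for _ in range(L - 1):
        table = crp_choose_table(node["counts"], gamma)
        if table == len(node["counts"]):          # new table -> new restaurant
            node["counts"].append(0)
            node["children"][table] = {"counts": [], "children": {}}
        node["counts"][table] += 1
        node = node["children"][table]
        path.append(node)
    return path

# M tourists each taking an L-day trip trace out an L-level subtree.
root = {"counts": [], "children": {}}
paths = [nested_crp_path(root, gamma=1.0, L=3) for _ in range(10)]
```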

Hierarchical LDA
Given an L-level tree in which each node is associated with a topic:
1. Choose a path from the root to a leaf.
2. Draw a vector of topic proportions θ from an L-dimensional Dirichlet.
3. Generate the words in the document from a mixture of the topics along the path from root to leaf, with mixing proportions θ.
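Written out as a generative model for document d, with β_k denoting the topic (word distribution) at node k and c_d[z] the node at level z of the chosen path (this notation is ours, not the slide's):

```latex
\[
c_d \sim p(\text{path from root to leaf}), \qquad
\theta_d \sim \mathrm{Dirichlet}(\alpha),
\]
\[
z_{d,n} \mid \theta_d \sim \mathrm{Mult}(\theta_d), \qquad
w_{d,n} \mid z_{d,n}, c_d \sim \mathrm{Mult}\!\bigl(\beta_{c_d[z_{d,n}]}\bigr).
\]
```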

Nested CRP with LDA
1. Let c1 be the root restaurant.
2. For each level l ∈ {2, ..., L}: draw a table from restaurant c_{l−1} using Eq. (1) and set c_l to be the restaurant referred to by that table.
3. Draw an L-dimensional topic proportion vector θ from Dir(α).
4. For each word n ∈ {1, ..., N}: draw z_n ∈ {1, ..., L} from Mult(θ) and draw w_n from the topic associated with restaurant c_{z_n}.
The nested CRP relaxes the assumption of a fixed tree structure with K possible topics: it can be used to place a prior on possible trees. In the graphical model, the node labeled T refers to a collection of an infinite number of L-level paths drawn from a nested CRP.
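A sketch of this generative process in code, reusing nested_crp_path from the earlier sketch; topic_for_node is a hypothetical helper that returns the word distribution attached to a node:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_document(root, gamma, alpha, L, N, vocab_size, topic_for_node):
    """Sketch of the nested-CRP + LDA generative process for one document.

    Assumes nested_crp_path() from the earlier sketch; topic_for_node(node)
    is a hypothetical helper returning a vocab_size-dimensional topic.
    """
    path = nested_crp_path(root, gamma, L)      # restaurants c_1, ..., c_L
    theta = rng.dirichlet([alpha] * L)          # topic proportions over levels
    words = []
    for _ in range(N):
        z = rng.choice(L, p=theta)              # level z_n ~ Mult(theta)
        beta = topic_for_node(path[z])          # topic at restaurant c_{z_n}
        words.append(int(rng.choice(vocab_size, p=beta)))
    return words
```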

Gibbs Sampling
The goal is to estimate z_{m,n}, the assignment of the n-th word in the m-th document to one of the L available topics, and c_{m,l}, the restaurant corresponding to the l-th topic in document m. First, given the current state of the CRP, we sample the z_{m,n} variables of the underlying LDA model.
In the generative direction, the priors are known and are used to generate documents; given a corpus of M documents, we instead want to infer the topic distributions and word distributions. Since the priors are Dirichlet, the task is to estimate the posterior distributions over topics and words, i.e. P(w_j | z_k) and P(z_k | d_i).
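The per-word update referenced above is the usual collapsed-Gibbs LDA conditional, restricted to the L topics on document m's path. Assuming a symmetric Dirichlet(η) prior on topics and a vocabulary of size V (notation not from the slide), it takes the form

```latex
\[
p(z_{m,n} = \ell \mid \mathbf{z}_{-(m,n)}, \mathbf{c}, \mathbf{w})
\;\propto\;
\bigl(n^{(m)}_{-(m,n),\,\ell} + \alpha\bigr)\,
\frac{n^{(w_{m,n})}_{-(m,n),\,c_{m,\ell}} + \eta}
     {n^{(\cdot)}_{-(m,n),\,c_{m,\ell}} + V\eta},
\]
```

where n^{(m)}_{-(m,n),ℓ} counts the words in document m currently assigned to level ℓ, and n^{(w)}_{-(m,n),c_{m,ℓ}} counts how often word w is assigned to the topic at restaurant c_{m,ℓ}, both excluding z_{m,n}.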

Gibbs Sampling
Second, given the values of the LDA hidden variables, we sample the c_{m,l} variables, which are associated with the CRP prior. The conditional distribution for c_m, the L topics associated with document m, is an instance of Bayes' rule: p(w_m | c, w_{−m}, z) is the likelihood of the data given a particular choice of c_m, and p(c_m | c_{−m}) is the prior on c_m implied by the nested CRP. The full expression is written out below.
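Reconstructing the elided expression from the description above:

```latex
\[
p(\mathbf{c}_m \mid \mathbf{w}, \mathbf{c}_{-m}, \mathbf{z})
\;\propto\;
p(\mathbf{w}_m \mid \mathbf{c}, \mathbf{w}_{-m}, \mathbf{z})\,
p(\mathbf{c}_m \mid \mathbf{c}_{-m}).
\]
```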

Experiment
1. Compare the CRP method with the Bayes factor method.
2. Estimate five different hierarchies.
3. Demonstration on real data.

Compare CRP method with Bayes factor method
Compared with the Bayes factor method, the CRP approach is faster, has only one free parameter to set, and is more effective:
1. Bayes factors involve multiple runs of a Gibbs sampler, while the CRP needs only a single run.
2. Bayes factors require choosing an appropriate range of K, while the CRP only requires setting the single free parameter γ.
3. The dimensionality found under the CRP prior matched the true dimensionality more often.

Estimate five different hierarchies
Results of estimating hierarchies on simulated data. "Structure" refers to a three-level hierarchy: the first integer is the number of branches from the root, and the following numbers are the numbers of children of each branch. "Leaf error" refers to how many leaves were incorrect in the resulting tree.

Demonstration on real data
Dataset: 1,717 NIPS abstracts, 208,890 words, vocabulary of 1,600 terms.
The model estimates a three-level hierarchy.
At the root, it nicely captured the function words without using an auxiliary list.
At the next level, it separated the words pertaining to neuroscience abstracts from those pertaining to machine learning abstracts.
Finally, it generated several important subtopics within the two fields.

Conclusion
The paper introduces the nested Chinese restaurant process and a Gibbs sampling procedure for the resulting hierarchical topic model.
Extensions:
1. The depth of the hierarchy can vary from document to document: each document is still a mixture of topics along a path in the hierarchy, but different documents can express paths of different lengths, as they represent different levels of specialization.
2. Documents can be allowed to mix over paths: although in this model a document is associated with a single path, it is also natural to consider models in which documents mix over multiple paths.

Questions?