Hierarchical Topic Models and the Nested Chinese Restaurant Process
Liutong Chen (lc6re), Siwei Liu (sl7vy), Shaojia Li (sl4ab)
Outline
- Introduction
- CRP Model
- Experiment
- Conclusion
Introduction
- Takes a Bayesian approach: generates an appropriate prior via a distribution on partitions
- Builds a hierarchical topic model
- Illustrates the approach on simulated data
CRP
- A Chinese restaurant with an infinite number of tables, each with infinite capacity.
- Customer 1 sits at the first table.
- Each subsequent customer either sits at an already occupied table or at the next unoccupied one.
- The m-th customer chooses a table according to Eq. (1):

p(\text{occupied table } i \mid \text{previous customers}) = \frac{m_i}{\gamma + m - 1}
p(\text{next unoccupied table} \mid \text{previous customers}) = \frac{\gamma}{\gamma + m - 1}    (1)

where m_i is the number of previous customers already at table i and γ is a free parameter.
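To make the seating rule concrete, here is a minimal Python sketch of a single CRP step. The representation (a list of per-table counts), the function name crp_seat, and the simulation loop are our own illustrative choices, not from the paper.

```python
import random

def crp_seat(table_counts, gamma):
    """Seat one customer: return the index of an occupied table i with
    probability m_i / (gamma + m - 1), or len(table_counts) to open the
    next unoccupied table with probability gamma / (gamma + m - 1)."""
    m = sum(table_counts) + 1            # this customer is the m-th
    r = random.uniform(0, gamma + m - 1)
    for i, m_i in enumerate(table_counts):
        r -= m_i
        if r < 0:
            return i
    return len(table_counts)             # open a new table

# Simulate 10 customers entering the restaurant.
tables = []
for _ in range(10):
    t = crp_seat(tables, gamma=1.0)
    if t == len(tables):
        tables.append(0)
    tables[t] += 1
print(tables)  # a random partition of 10 customers, e.g. [6, 3, 1]
```

Larger γ makes new tables more likely, so γ controls how many clusters the partition tends to have.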
Extending CRP to Hierarchies
- Assumption: a city with infinitely many Chinese restaurants, each with infinitely many tables.
- One of them is the root restaurant.
- Each table has a card that refers to another restaurant.
- Each restaurant is referred to exactly once.
Extending CRP to Hierarchies
Scene (sketched in code below):
1. On day 1, a tourist enters the root restaurant and chooses a table by Eq. (1).
2. On day 2, the tourist goes to the restaurant referred to by that table and again chooses a table by the same equation.
3. On day L, the tourist is at the L-th referred restaurant, having established a path from the root to level L of the infinite tree.
4. With M tourists each taking L-day trips, the collection of their paths is a particular L-level subtree of the infinite tree.
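A sketch of the tourist process in Python, reusing the same seating rule. The node ids (tuples of branch indices) and the tree dictionary are our own encoding, assumed only for illustration.

```python
import random

def crp_seat(counts, gamma):
    """Eq. (1): choice i w.p. m_i/(gamma+m-1), new w.p. gamma/(gamma+m-1)."""
    m = sum(counts) + 1
    r = random.uniform(0, gamma + m - 1)
    for i, m_i in enumerate(counts):
        r -= m_i
        if r < 0:
            return i
    return len(counts)

def ncrp_path(tree, L, gamma):
    """One tourist's L-day trip: a root-to-level-L path of node ids.
    `tree` maps a node id (tuple of branch choices) to the visit counts
    of its children, and is grown in place."""
    path = [()]                           # () is the root restaurant
    for _ in range(L - 1):
        counts = tree.setdefault(path[-1], [])
        i = crp_seat(counts, gamma)
        if i == len(counts):
            counts.append(0)              # card to a brand-new restaurant
        counts[i] += 1
        path.append(path[-1] + (i,))
    return path

# M = 20 tourists, L = 3 days each: their paths form a random
# three-level subtree of the infinite tree.
tree = {}
paths = [ncrp_path(tree, L=3, gamma=1.0) for _ in range(20)]
```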
Hierarchical LDA
Given an L-level tree in which each node is associated with a topic:
- Choose a path from root to leaf.
- Draw a vector of topic proportions θ from an L-dimensional Dirichlet.
- Generate the words of the document from a mixture of the topics along the chosen path, with mixing proportions θ.
Nested CRP with LDA
We use the nested CRP to relax the assumption of a fixed tree structure over K possible topics: it places a prior on possible trees. The full generative process (sketched in code below) is:
1. Let c_1 be the root restaurant.
2. For each level l ∈ {2, ..., L}: draw a table from restaurant c_{l−1} using Eq. (1), and set c_l to the restaurant referred to by that table.
3. Draw an L-dimensional topic proportion vector θ ~ Dir(α).
4. For each word n ∈ {1, ..., N}: draw z_n ∈ {1, ..., L} from Mult(θ), then draw w_n from the topic associated with restaurant c_{z_n}.
In the graphical model, the node labeled T refers to a collection of an infinite number of L-level paths drawn from a nested CRP.
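Putting the pieces together, here is an end-to-end sketch of the generative process in Python with NumPy. All names (ncrp_path, topic_for, generate_document) and the hyperparameter values are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
V, L, gamma, alpha, eta = 50, 3, 1.0, 1.0, 0.1

tree = {}     # node id -> visit counts of its children
topics = {}   # node id -> topic, a distribution over the V word types

def topic_for(node):
    """Lazily draw the topic at a node from a symmetric Dirichlet(eta)."""
    if node not in topics:
        topics[node] = rng.dirichlet(np.full(V, eta))
    return topics[node]

def ncrp_path(L):
    """Steps 1-2: draw an L-level path c_1..c_L with the nested CRP."""
    path = [()]
    for _ in range(L - 1):
        counts = tree.setdefault(path[-1], [])
        m = sum(counts) + 1
        probs = np.array(counts + [gamma]) / (gamma + m - 1)  # Eq. (1)
        i = rng.choice(len(probs), p=probs)
        if i == len(counts):
            counts.append(0)
        counts[i] += 1
        path.append(path[-1] + (i,))
    return path

def generate_document(N):
    c = ncrp_path(L)                          # steps 1-2: path through the tree
    theta = rng.dirichlet(np.full(L, alpha))  # step 3: level proportions
    words = []
    for _ in range(N):                        # step 4: one level draw per word
        z = rng.choice(L, p=theta)
        words.append(rng.choice(V, p=topic_for(c[z])))
    return c, words

docs = [generate_document(N=100) for _ in range(10)]
```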
Gibbs Sampling
The goal is to estimate the posterior over z_{m,n}, the assignment of the n-th word in the m-th document to one of the L available topics, and c_{m,l}, the restaurant corresponding to the l-th topic in document m. In the generative model above we knew the prior and produced documents; now, given a corpus of M documents, we invert the process and infer the per-document topic distributions and per-topic word distributions, i.e. the posteriors p(z_k | d_i) and p(w_j | z_k). Because θ and the topics have Dirichlet priors, they can be integrated out, and we need only sample the discrete variables.
First, given the current state of the CRP, we sample the z_{m,n} variables of the underlying LDA model with the standard collapsed Gibbs update:

p(z_{m,n} = l \mid \mathbf{z}_{-(m,n)}, \mathbf{c}, \mathbf{w}) \propto \left(n^{(l)}_{m,-n} + \alpha\right) \frac{n^{(w_{m,n})}_{c_{m,l},-(m,n)} + \eta}{n^{(\cdot)}_{c_{m,l},-(m,n)} + V\eta}

where n^{(l)}_{m,-n} is the number of words in document m assigned to level l, n^{(w)}_{c_{m,l},-(m,n)} is the number of times word w is assigned to the topic at node c_{m,l}, both excluding the current word, and V is the vocabulary size.
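A sketch of this update as a single collapsed Gibbs step. The bookkeeping structures (doc_level_counts, topic_word_counts, topic_totals, and a per-document path of node ids) are names we invent for illustration; the function implements the proportionality above, not the paper's actual implementation.

```python
import numpy as np

def resample_z(m, n, w, path, doc_level_counts, topic_word_counts,
               topic_totals, z, alpha, eta, V, rng):
    """Collapsed Gibbs update for z[m][n]: p(z = l) is proportional to
    (doc-level count + alpha) * (topic-word count + eta) / (topic total
    + V*eta), with word (m, n) removed from all counts first."""
    old = z[m][n]
    doc_level_counts[m][old] -= 1          # exclude the current word
    topic_word_counts[path[old]][w] -= 1
    topic_totals[path[old]] -= 1

    L = len(path)
    probs = np.empty(L)
    for l in range(L):
        probs[l] = ((doc_level_counts[m][l] + alpha)
                    * (topic_word_counts[path[l]][w] + eta)
                    / (topic_totals[path[l]] + V * eta))
    probs /= probs.sum()
    new = rng.choice(L, p=probs)

    z[m][n] = new                          # add the word back in
    doc_level_counts[m][new] += 1
    topic_word_counts[path[new]][w] += 1
    topic_totals[path[new]] += 1
```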
Gibbs Sampling
Second, given the values of the LDA hidden variables, we sample the c_{m,l} variables associated with the CRP prior. The conditional distribution for c_m, the L topics associated with document m, is:

p(c_m \mid \mathbf{w}, \mathbf{c}_{-m}, \mathbf{z}) \propto p(\mathbf{w}_m \mid \mathbf{c}, \mathbf{w}_{-m}, \mathbf{z})\, p(c_m \mid \mathbf{c}_{-m})

This expression is an instance of Bayes' rule, with p(w_m | c, w_{−m}, z) as the likelihood of the data given a particular choice of c_m, and p(c_m | c_{−m}) as the prior on c_m implied by the nested CRP.
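A sketch of the c_m resampling step under the same assumed bookkeeping. With each topic integrated out, the per-level likelihood is a Dirichlet-multinomial marginal; log_ncrp_prior, excl_counts, and doc_counts are hypothetical structures holding the nested-CRP prior log-probabilities over candidate paths and the word counts with document m held out.

```python
import numpy as np
from scipy.special import gammaln

def log_dm_likelihood(n_excl, n_doc, eta):
    """log p(document m's words at one level | held-out counts): the
    Dirichlet-multinomial marginal. n_excl and n_doc are length-V
    word-count vectors (corpus minus document m, and document m)."""
    V = len(n_excl)
    return (gammaln(n_excl.sum() + V * eta)
            - gammaln(n_excl.sum() + n_doc.sum() + V * eta)
            + np.sum(gammaln(n_excl + n_doc + eta)
                     - gammaln(n_excl + eta)))

def resample_path(candidates, log_ncrp_prior, doc_counts, excl_counts,
                  eta, rng):
    """Sample c_m over candidate paths (existing paths plus new branches):
    log posterior = nCRP log prior + per-level DM log likelihood."""
    log_post = []
    for path in candidates:
        lp = log_ncrp_prior[path]
        for l, node in enumerate(path):
            lp += log_dm_likelihood(excl_counts[node], doc_counts[l], eta)
        log_post.append(lp)
    log_post = np.array(log_post)
    probs = np.exp(log_post - log_post.max())   # stable normalization
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```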
Experiment
1. Compare the CRP method with the Bayes factor method.
2. Estimate five different hierarchies.
3. Demonstrate on real data.
Compare CRP method with Bayes Factor method
Compared with the Bayes factor method, the CRP approach is:
- Faster: Bayes factors require multiple runs of a Gibbs sampler (one per candidate dimensionality K), while the CRP needs only a single run.
- Easier to tune: Bayes factors require choosing an appropriate range of K, while the CRP has only one free parameter, γ.
- More effective: the estimated dimensionality matched the true dimensionality more often under the CRP prior.
Estimate five different hierarchies
Results of estimating hierarchies on simulated data. "Structure" refers to a three-level hierarchy:
- The first integer is the number of branches from the root.
- The following numbers are the numbers of children of each branch.
- "Leaf error" counts how many leaves were incorrect in the resulting tree.
Demonstration on real data
Dataset: 1,717 NIPS abstracts (208,890 words, vocabulary of 1,600 terms).
- The model estimates a three-level hierarchy.
- It nicely captures the function words at the root without using an auxiliary stop-word list.
- At the next level, it separates words pertaining to neuroscience abstracts from those of machine learning abstracts.
- Finally, it generates several important subtopics within the two fields.
Conclusion
- Nested Chinese restaurant process
- Gibbs sampling procedure for the model
Extensions:
1. The depth of the hierarchy can vary from document to document: each document is still a mixture of topics along a path, but different documents can express paths of different lengths, representing different levels of specialization.
2. Documents can be allowed to mix over paths: in this model a document is associated with a single path, but it is natural to consider models in which documents mix over multiple paths.
Questions?