Hierarchical Topic Models and the Nested Chinese Restaurant Process Blei, Griffiths, Jordan, Tenenbaum presented by Rodrigo de Salvo Braz.


1 Hierarchical Topic Models and the Nested Chinese Restaurant Process Blei, Griffiths, Jordan, Tenenbaum presented by Rodrigo de Salvo Braz

2 Document classification One-class approach: one topic per document, with words generated according to the topic. For example, a Naive Bayes model.
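A minimal sketch of the one-topic-per-document generative story, in the spirit of Naive Bayes. The toy topics, probabilities, and the function name generate_single_topic_doc are hypothetical and chosen only for illustration. One topic is drawn for the whole document, and every word is then drawn from that topic's word distribution.

```python
import random

def generate_single_topic_doc(topic_probs, word_dists, n_words, rng):
    """One topic per document (hypothetical helper, Naive-Bayes-style).

    topic_probs: K topic probabilities.
    word_dists:  K dicts mapping word -> probability.
    """
    # A single topic is drawn for the entire document.
    k = rng.choices(range(len(topic_probs)), weights=topic_probs, k=1)[0]
    words = list(word_dists[k].keys())
    weights = list(word_dists[k].values())
    # Every word comes from that one topic's word distribution.
    return k, rng.choices(words, weights=weights, k=n_words)

# Hypothetical toy topics: "sports" and "politics".
word_dists = [
    {"ball": 0.5, "goal": 0.4, "vote": 0.1},
    {"vote": 0.6, "law": 0.3, "goal": 0.1},
]
rng = random.Random(0)
topic, doc = generate_single_topic_doc([0.5, 0.5], word_dists, 8, rng)
print(topic, doc)
```

Because the topic is fixed per document, a document cannot mix "sports" and "politics" words from different topics, which is the limitation the next slide addresses.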

3 Document classification It is more realistic to assume more than one topic per document. Generative model: pick a mixture distribution over K topics and generate words from it.

4 Document classification Even more realistic: topics may be organized in a hierarchy rather than being independent. Pick a path from the root to a leaf of a tree, where each node is a topic, and sample words from a mixture of the topics along that path.

5 Dirichlet distribution (DD) Distribution over probability vectors p of dimension K: $P(p; u, \alpha) = \frac{1}{Z(\alpha u)} \prod_{i=1}^{K} p_i^{\alpha u_i - 1}$. The parameters encode a prior distribution u (“previous observations”), with α controlling how strongly it is weighted. The symmetric Dirichlet distribution assumes a uniform prior distribution ($u_i = u_j$ for all i, j).
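To make the Dirichlet concrete, here is a small sketch that draws p by normalizing independent Gamma variates, a standard construction. It assumes the parameterization by a base distribution u (the slide's prior distribution) and a scale α; the name alpha and the function sample_dirichlet are this sketch's own. A small α gives draws concentrated on a few components, while a large α gives draws close to u.

```python
import random

def sample_dirichlet(alpha, u, rng):
    """Draw p ~ Dirichlet(alpha * u) by normalizing Gamma(alpha * u_i, 1) draws.

    u is the base (prior) distribution; alpha scales it, acting like a count
    of "previous observations". Both names are assumptions of this sketch.
    """
    gammas = [rng.gammavariate(alpha * ui, 1.0) for ui in u]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(1)
K = 4
symmetric_u = [1.0 / K] * K                           # symmetric: u_i = u_j for all i, j
p_small = sample_dirichlet(0.5, symmetric_u, rng)     # small alpha: sparse draws
p_large = sample_dirichlet(100.0, symmetric_u, rng)   # large alpha: close to u
print([round(x, 3) for x in p_small])
print([round(x, 3) for x in p_large])
```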

6 Latent Dirichlet Allocation (LDA) Generative model of multiple-topic documents: for each document, draw a mixture distribution over topics from a Dirichlet distribution; for each word, pick a topic according to that distribution and generate the word from the topic's word distribution.
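The LDA generative process above can be sketched as follows. This is a minimal illustration, assuming the K topic word distributions are already given; the two toy topics, the vocabulary, and the function names are hypothetical. Each document gets its own topic mixture theta, and each word gets its own topic z.

```python
import random

def sample_dirichlet(params, rng):
    # Normalized independent Gamma(a, 1) draws give a Dirichlet(params) sample.
    g = [rng.gammavariate(a, 1.0) for a in params]
    s = sum(g)
    return [x / s for x in g]

def generate_lda_doc(alpha, topics, vocab, n_words, rng):
    """Generate one document under LDA (hypothetical helper).

    alpha:  Dirichlet hyperparameter, one positive value per topic.
    topics: K word distributions, each a list of probabilities over vocab.
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    doc, assignments = [], []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta, k=1)[0]  # topic for this word
        w = rng.choices(vocab, weights=topics[z], k=1)[0]           # word drawn from topic z
        assignments.append(z)
        doc.append(w)
    return theta, assignments, doc

vocab = ["gene", "dna", "ball", "goal"]
topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]  # hypothetical topics
rng = random.Random(2)
theta, z, doc = generate_lda_doc([1.0, 1.0], topics, vocab, 10, rng)
print(theta, doc)
```

Unlike the one-topic model, a single document here can contain words from both topics, in proportions set by theta.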

7 Latent Dirichlet Allocation (LDA) [Figure: LDA graphical model, with nodes for the DD hyperparameter, the topic distribution, the topics, and the words w; labels K and W]

8-16 Chinese Restaurant Process (CRP) [Animation over nine slides: customers 1 through 9 are seated one at a time; on the final slide, a data point (a distribution itself) is sampled]
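The seating animation follows the standard CRP rule: after n customers are seated, customer n+1 joins an occupied table k with probability n_k / (n + gamma), where n_k is the number of customers already at table k, or opens a new table with probability gamma / (n + gamma). A minimal sketch of that rule for nine customers follows; the concentration value gamma = 1.0 is an arbitrary assumption.

```python
import random

def crp(n_customers, gamma, rng):
    """Seat customers by the Chinese restaurant process.

    Each new customer joins occupied table k with weight n_k (customers
    already there) or opens a new table with weight gamma; rng.choices
    normalizes the weights by (n + gamma).
    Returns the table chosen by each customer and the final table sizes.
    """
    counts = []   # counts[k] = number of customers at table k
    seating = []
    for _ in range(n_customers):
        weights = counts + [gamma]  # existing tables, then a new table
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(counts):
            counts.append(1)        # open a new table
        else:
            counts[k] += 1
        seating.append(k)
    return seating, counts

rng = random.Random(3)
seating, counts = crp(9, gamma=1.0, rng=rng)
print(seating, counts)
```

The "rich get richer" effect is visible in the table sizes: popular tables attract more customers, while gamma controls how often new tables (topics) appear.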

17 Species Sampling Mixture Generative model of multiple-topic documents: generate a mixture distribution over topics using a CRP prior; for each word, pick a topic according to that distribution and generate the word from the topic's word distribution.
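One simple instance of a CRP-based topic mixture is sketched below, assuming that the words of a document are the "customers" and topics are the "tables". It is a Dirichlet-process (CRP) mixture, chosen here as an illustration rather than as the exact model on the slide. A newly opened topic draws its word distribution from a symmetric Dirichlet with a hypothetical parameter eta, so the number of topics is not fixed in advance.

```python
import random

def sample_dirichlet(params, rng):
    # Normalized independent Gamma(a, 1) draws give a Dirichlet(params) sample.
    g = [rng.gammavariate(a, 1.0) for a in params]
    s = sum(g)
    return [x / s for x in g]

def generate_crp_mixture_doc(n_words, gamma, eta, vocab, rng):
    """Generate one document whose topics are chosen by a CRP (hypothetical helper).

    Each word joins existing topic k with weight n_k (words already assigned
    to k) or opens a new topic with weight gamma. A new topic draws its word
    distribution from a symmetric Dirichlet(eta) over vocab.
    """
    counts, topics, doc, z = [], [], [], []
    for _ in range(n_words):
        k = rng.choices(range(len(counts) + 1), weights=counts + [gamma], k=1)[0]
        if k == len(counts):  # open a new topic with a fresh word distribution
            counts.append(0)
            topics.append(sample_dirichlet([eta] * len(vocab), rng))
        counts[k] += 1
        doc.append(rng.choices(vocab, weights=topics[k], k=1)[0])
        z.append(k)
    return doc, z, topics

rng = random.Random(6)
vocab = ["a", "b", "c", "d", "e"]
doc, z, topics = generate_crp_mixture_doc(30, gamma=1.0, eta=0.5, vocab=vocab, rng=rng)
print(len(topics), "topics used")
```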

18 Species Sampling Mixture [Figure: graphical model, with nodes for the CRP hyperparameter, the topic distribution, the topics, and the words w; labels K and W]

19 Nested CRP [Figure: a tree of restaurants; customers labeled 1 to 6, each label appearing three times]
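In a nested CRP, every node of a tree is itself a restaurant, and each table in it points to a child restaurant one level down. A minimal sketch of sampling root-to-leaf paths for six documents in a three-level tree follows; the depth, gamma value, and the tuple encoding of nodes are assumptions made for illustration. At each node, a document picks an existing child in proportion to how many earlier documents passed through it, or a new child with weight gamma.

```python
import random

def ncrp_paths(n_docs, depth, gamma, rng):
    """Sample root-to-leaf paths by a nested CRP (hypothetical helper).

    Nodes are tuples of child indices; the root is ().
    """
    visits = {(): 0}      # number of documents that passed through each node
    children = {(): []}   # child nodes of each node, in creation order
    paths = []
    for _ in range(n_docs):
        node = ()
        visits[node] += 1
        path = [node]
        for _level in range(depth - 1):
            kids = children[node]
            # CRP at this node: existing children weighted by past visits, new child by gamma.
            weights = [visits[c] for c in kids] + [gamma]
            i = rng.choices(range(len(weights)), weights=weights, k=1)[0]
            if i == len(kids):            # new table = new child node
                child = node + (len(kids),)
                kids.append(child)
                visits[child] = 0
                children[child] = []
            else:
                child = kids[i]
            visits[child] += 1
            node = child
            path.append(node)
        paths.append(path)
    return paths

rng = random.Random(4)
paths = ncrp_paths(6, depth=3, gamma=1.0, rng=rng)
for p in paths:
    print(p)
```

All documents share the root, and documents that share a prefix of their paths share the corresponding higher-level nodes, which is what produces the hierarchy.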

20 Hierarchical LDA (hLDA) Generative model of multiple-topic documents: for each document, draw a path from the root to a leaf using a nested CRP prior; draw a mixture distribution over the topics on that path; for each word, pick a topic according to that distribution and generate the word from the topic's word distribution.
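Putting the pieces together, the hLDA generative process can be sketched as below. This assumes a fixed tree depth, level proportions theta drawn from a symmetric Dirichlet(alpha) over the nodes on the path, and a topic for each new node drawn from a symmetric Dirichlet(eta) over the vocabulary; the parameter names and the function generate_hlda_corpus are hypothetical.

```python
import random

def sample_dirichlet(params, rng):
    # Normalized independent Gamma(a, 1) draws give a Dirichlet(params) sample.
    g = [rng.gammavariate(a, 1.0) for a in params]
    s = sum(g)
    return [x / s for x in g]

def generate_hlda_corpus(n_docs, n_words, depth, gamma, alpha, eta, vocab, rng):
    """Sketch of the hLDA generative process (hypothetical helper).

    1. Choose a root-to-leaf path with a nested CRP (weight gamma for new children).
    2. Draw level proportions theta ~ Dirichlet(alpha) over the nodes on the path.
    3. For each word, choose a level ~ theta and emit a word from that node's topic.
    """
    new_topic = lambda: sample_dirichlet([eta] * len(vocab), rng)
    visits, children, topics = {(): 0}, {(): []}, {(): new_topic()}
    corpus = []
    for _ in range(n_docs):
        node, path = (), [()]
        visits[()] += 1
        for _level in range(depth - 1):
            kids = children[node]
            i = rng.choices(range(len(kids) + 1),
                            weights=[visits[c] for c in kids] + [gamma], k=1)[0]
            if i == len(kids):  # new child node with a fresh topic
                kids.append(node + (i,))
                visits[kids[i]], children[kids[i]] = 0, []
                topics[kids[i]] = new_topic()
            node = kids[i]
            visits[node] += 1
            path.append(node)
        theta = sample_dirichlet([alpha] * depth, rng)
        levels = rng.choices(range(depth), weights=theta, k=n_words)
        words = [rng.choices(vocab, weights=topics[path[z]], k=1)[0] for z in levels]
        corpus.append((path, levels, words))
    return corpus, topics

rng = random.Random(5)
vocab = ["a", "b", "c", "d", "e"]
corpus, topics = generate_hlda_corpus(5, 20, depth=3, gamma=1.0, alpha=1.0,
                                      eta=0.5, vocab=vocab, rng=rng)
for path, levels, words in corpus:
    print(path, "".join(words))
```

Because every document includes the root, the root topic tends to absorb words common to all documents, while deeper nodes capture more specialized vocabulary.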

21 hLDA graphical model

22 Artificial data experiment 100 documents of 1,000 words each, over a 25-term vocabulary. Each vertical bar is a topic.

23 CRP prior vs. Bayes Factors

24 Predicting the structure

25 NIPS abstracts

26 Comments Accommodates growing collections of data. The hierarchical organization makes sense, but it is not clear to me why the CRP prior is the best prior for it. There is no mention of running time; inference may take a very long time.

