Latent Dirichlet Allocation. D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993-1022, January 2003. Presented by Jonathan Huang.

Presentation transcript:

Latent Dirichlet Allocation. D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993-1022, January 2003. Jonathan Huang. Advisor: Carlos Guestrin. 11/15/2005.

“Bag of Words” Models  Let’s assume that all the words within a document are exchangeable.
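A tiny illustration of what exchangeability buys (the two example sentences are made up): word order is discarded, so documents with the same word counts are indistinguishable to these models.

```python
from collections import Counter

doc_a = "the dog chased the cat".split()
doc_b = "the cat chased the dog".split()

# Same multiset of words, different order -> identical bag-of-words representation.
print(Counter(doc_a) == Counter(doc_b))  # True
```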

Mixture of Unigrams  Mixture of Unigrams Model (this is just Naïve Bayes). For each of M documents:
- Choose a topic z.
- Choose N words by drawing each one independently from a multinomial conditioned on z.
In the Mixture of Unigrams model, we can only have one topic per document! [Graphical model: a single topic Z_i generates all of document i's words w_i1, ..., w_i4.]
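A quick sketch of this generative story in NumPy; the topic prior and per-topic word distributions below are toy numbers invented for illustration, not values from the paper.

```python
import numpy as np

def sample_document_mixture(topic_prior, word_given_topic, n_words, rng):
    """Mixture of unigrams: draw ONE topic z for the whole document,
    then draw every word i.i.d. from that topic's word distribution."""
    z = rng.choice(len(topic_prior), p=topic_prior)
    words = rng.choice(word_given_topic.shape[1], size=n_words, p=word_given_topic[z])
    return z, words

rng = np.random.default_rng(0)
topic_prior = np.array([0.6, 0.4])                           # toy p(z)
word_given_topic = np.array([[0.5, 0.3, 0.1, 0.05, 0.05],    # toy p(w | z), one row per topic
                             [0.05, 0.05, 0.1, 0.3, 0.5]])
print(sample_document_mixture(topic_prior, word_given_topic, n_words=6, rng=rng))
```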

The pLSI Model  Probabilistic Latent Semantic Indexing (pLSI) Model. For each word of document d in the training set:
- Choose a topic z according to a multinomial conditioned on the index d.
- Generate the word by drawing from a multinomial conditioned on z.
In pLSI, documents can have multiple topics. [Graphical model: the document index d selects a per-word topic z_dn, which generates the word w_dn.]
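The corresponding sketch for pLSI, where the topic is re-drawn for every word from a document-specific distribution p(z | d); the toy parameter tables are again made up for illustration.

```python
import numpy as np

def sample_document_plsi(d, topic_given_doc, word_given_topic, n_words, rng):
    """pLSI: each word of training document d gets its own topic drawn from p(z | d)."""
    topics = rng.choice(topic_given_doc.shape[1], size=n_words, p=topic_given_doc[d])
    words = [rng.choice(word_given_topic.shape[1], p=word_given_topic[z]) for z in topics]
    return topics, words

rng = np.random.default_rng(0)
topic_given_doc = np.array([[0.9, 0.1],     # toy p(z | d): one row per training-document index d
                            [0.2, 0.8],
                            [0.5, 0.5]])
word_given_topic = np.array([[0.5, 0.3, 0.1, 0.05, 0.05],    # toy p(w | z)
                             [0.05, 0.05, 0.1, 0.3, 0.5]])
print(sample_document_plsi(0, topic_given_doc, word_given_topic, n_words=6, rng=rng))
```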

Motivations for LDA
- In pLSI, the observed variable d is an index into some training set. There is no natural way for the model to handle previously unseen documents.
- The number of parameters for pLSI grows linearly with M (the number of documents in the training set).
- We would like to be Bayesian about our topic mixture proportions.

Dirichlet Distributions
- In the LDA model, we would like to say that the topic mixture proportions for each document are drawn from some distribution.
- So, we want to put a distribution on multinomials, that is, on k-tuples of non-negative numbers that sum to one.
- The space of all of these multinomials has a nice geometric interpretation as a (k-1)-simplex, which is just a generalization of a triangle to (k-1) dimensions.
- Criteria for selecting our prior: it needs to be defined over a (k-1)-simplex, and, algebraically speaking, we would like it to play nicely with the multinomial distribution.

Dirichlet Examples

Dirichlet Distributions  Useful facts:
- This distribution is defined over a (k-1)-simplex. That is, it takes k non-negative arguments which sum to one. Consequently it is a natural distribution to use over multinomial distributions.
- In fact, the Dirichlet distribution is the conjugate prior to the multinomial distribution. (This means that if our likelihood is multinomial with a Dirichlet prior, then the posterior is also Dirichlet!)
- The Dirichlet parameter α_i can be thought of as a prior count of the i-th class.
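Both facts are easy to check numerically with NumPy's Dirichlet sampler; the α values and observed counts below are arbitrary toy numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = np.array([1.0, 2.0, 3.0])      # toy "prior count" parameters
theta = rng.dirichlet(alpha)           # a point on the 2-simplex
print(theta, theta.sum())              # non-negative entries summing to 1.0

# Conjugacy: after observing multinomial counts, the posterior is Dirichlet again,
# with the counts simply added to the prior parameters.
observed_counts = np.array([5, 0, 2])
posterior_alpha = alpha + observed_counts
print(posterior_alpha)                 # parameters of the Dirichlet posterior
```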

The LDA Model  [Graphical model: for each document, topic proportions θ are drawn from a Dirichlet(α) prior; each word w_n gets its own topic assignment z_n.] For each document:
- Choose θ ~ Dirichlet(α).
- For each of the N words w_n:
  - Choose a topic z_n ~ Multinomial(θ).
  - Choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n.
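A minimal sketch of this generative process in NumPy; the α and β values below are toy numbers invented for illustration, not parameters from the paper.

```python
import numpy as np

def sample_document_lda(alpha, beta, n_words, rng):
    """LDA: theta ~ Dirichlet(alpha); then for each word, a topic
    z_n ~ Multinomial(theta) and a word w_n ~ Multinomial(beta[z_n])."""
    theta = rng.dirichlet(alpha)                             # per-document topic proportions
    topics = rng.choice(len(alpha), size=n_words, p=theta)   # per-word topic assignments
    words = [rng.choice(beta.shape[1], p=beta[z]) for z in topics]
    return theta, topics, words

# Toy example: 2 topics over a 5-word vocabulary.
rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5])
beta = np.array([[0.5, 0.3, 0.1, 0.05, 0.05],
                 [0.05, 0.05, 0.1, 0.3, 0.5]])
print(sample_document_lda(alpha, beta, n_words=6, rng=rng))
```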


Inference  The inference problem in LDA is to compute the posterior of the hidden variables given a document and corpus parameters α and β. That is, compute p(θ, z | w, α, β). Unfortunately, exact inference is intractable, so we turn to alternatives…
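For reference, the posterior the slide refers to can be written out as in the original paper; the denominator is the marginal likelihood of the document, and the coupling of θ and β inside it is what makes exact inference intractable.

```latex
p(\theta, z \mid w, \alpha, \beta) \;=\; \frac{p(\theta, z, w \mid \alpha, \beta)}{p(w \mid \alpha, \beta)},
\qquad
p(w \mid \alpha, \beta) \;=\; \int p(\theta \mid \alpha)
  \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \; d\theta .
```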

Variational Inference  In variational inference, we consider a simplified graphical model with variational parameters γ and φ and minimize the KL divergence between the variational and posterior distributions.
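Concretely, the variational family in Blei et al. (2003) factorizes as q(θ, z | γ, φ) = q(θ | γ) ∏_n q(z_n | φ_n), with γ a Dirichlet parameter and each φ_n a multinomial over topics; coordinate ascent on the KL divergence gives updates of the following form (Ψ is the digamma function):

```latex
\phi_{ni} \;\propto\; \beta_{i\,w_n} \exp\!\left(\Psi(\gamma_i) - \Psi\!\Big(\sum_{j=1}^{k}\gamma_j\Big)\right),
\qquad
\gamma_i \;=\; \alpha_i + \sum_{n=1}^{N} \phi_{ni}.
```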

Parameter Estimation
- Given a corpus of documents, we would like to find the parameters α and β which maximize the likelihood of the observed data.
- Strategy (Variational EM): lower bound log p(w | α, β) by a function L(γ, φ; α, β), then repeat until convergence:
  - Maximize L(γ, φ; α, β) with respect to the variational parameters γ, φ.
  - Maximize the bound with respect to the model parameters α and β.
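A minimal Python sketch of this variational EM loop, under simplifying assumptions: α is held fixed (the paper also updates α by Newton–Raphson, which is omitted here), `docs` is assumed to be a list of word-index lists, and the initializations are the paper's simple defaults. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np
from scipy.special import digamma

def e_step_document(word_ids, alpha, beta, n_iter=50):
    """Coordinate-ascent updates of (gamma, phi) for one document,
    following the form of the updates in Blei et al. (2003)."""
    K, N = beta.shape[0], len(word_ids)
    phi = np.full((N, K), 1.0 / K)          # q(z_n): start uniform over topics
    gamma = alpha + N / K                   # q(theta): Dirichlet parameter init
    for _ in range(n_iter):
        exp_elog_theta = np.exp(digamma(gamma) - digamma(gamma.sum()))
        phi = beta[:, word_ids].T * exp_elog_theta   # phi[n, i] ∝ beta[i, w_n] * exp(E[log theta_i])
        phi /= phi.sum(axis=1, keepdims=True)
        gamma = alpha + phi.sum(axis=0)
    return gamma, phi

def variational_em(docs, alpha, n_topics, vocab_size, n_em_iter=20, seed=0):
    """Outer loop: E-step per document, then M-step re-estimate of beta.
    alpha: length-n_topics array of Dirichlet parameters (held fixed here)."""
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet(np.ones(vocab_size), size=n_topics)     # random topic-word init
    for _ in range(n_em_iter):
        expected_counts = np.zeros((n_topics, vocab_size))
        for word_ids in docs:
            _, phi = e_step_document(word_ids, alpha, beta)
            np.add.at(expected_counts.T, word_ids, phi)          # accumulate E[topic-word counts]
        beta = expected_counts / expected_counts.sum(axis=1, keepdims=True)  # M-step for beta
    return beta
```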

Some Results  Given a topic, LDA can return the most probable words. For the following results, LDA was trained on 10,000 text articles posted to 20 online newsgroups with 40 iterations of EM. The number of topics was set to 50.

Some Results  The most probable words for five of the learned topics:

  “politics”     “sports”    “space”      “computers”   “christianity”
  Political      Team        Space        Drive         God
  Party          Game        NASA         Windows       Jesus
  Business       Play        Research     Card          His
  Convention     Year        Center       DOS           Bible
  Institute      Games       Earth        SCSI          Christian
  Committee      Win         Health       Disk          Christ
  States         Hockey      Medical      System        Him
  Rights         Season      Gov          Memory        Christians

Extensions/Applications
- Multimodal Dirichlet Priors
- Correlated Topic Models
- Hierarchical Dirichlet Processes
- Abstract Tagging in Scientific Journals
- Object Detection/Recognition

Visual Words  Idea: given a collection of images,
- Think of each image as a document.
- Think of feature patches of each image as words.
- Apply the LDA model to extract topics.
(J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, W. T. Freeman. Discovering object categories in image collections. MIT AI Lab Memo, February 2005.)
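A rough sketch of that pipeline using scikit-learn, under stated assumptions: `patch_descriptors_per_image` is a hypothetical list of per-image descriptor arrays, the vocabulary size and topic count are placeholders, and sklearn's LatentDirichletAllocation uses online variational Bayes rather than the exact procedure in the memo.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def visual_word_topics(patch_descriptors_per_image, n_visual_words=200, n_topics=10):
    """Quantize local patch descriptors into 'visual words' with k-means,
    build a word-count matrix (one row per image), then fit LDA on it."""
    all_descriptors = np.vstack(patch_descriptors_per_image)
    kmeans = KMeans(n_clusters=n_visual_words, n_init=10, random_state=0).fit(all_descriptors)

    counts = np.zeros((len(patch_descriptors_per_image), n_visual_words), dtype=int)
    for i, descriptors in enumerate(patch_descriptors_per_image):
        words = kmeans.predict(descriptors)                  # each patch -> nearest visual word
        counts[i] = np.bincount(words, minlength=n_visual_words)

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(counts)                   # per-image topic proportions
    return doc_topics, lda.components_                       # components_: topic-word weights
```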

Visual Words Examples of ‘visual words’

Visual Words

Thanks!  Questions?
References:
- Latent Dirichlet allocation. D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993-1022, January 2003.
- Finding Scientific Topics. T. Griffiths and M. Steyvers. Proceedings of the National Academy of Sciences, 101 (suppl. 1):5228-5235, 2004.
- Hierarchical topic models and the nested Chinese restaurant process. D. Blei, T. Griffiths, M. Jordan, and J. Tenenbaum. In S. Thrun, L. Saul, and B. Scholkopf, editors, Advances in Neural Information Processing Systems (NIPS) 16, MIT Press, Cambridge, MA, 2004.
- Discovering object categories in image collections. J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, W. T. Freeman. MIT AI Lab Memo, February 2005.