Online Learning for Latent Dirichlet Allocation

Presentation transcript:

Online Learning for Latent Dirichlet Allocation
Matthew D. Hoffman, David M. Blei and Francis Bach
NIPS 2010
Presented by Lingbo Li

Latent Dirichlet Allocation (LDA)
Draw each topic beta_k ~ Dirichlet(eta), k = 1, ..., K
For each document d:
  Draw topic proportions theta_d ~ Dirichlet(alpha)
  For each word n:
    Draw a topic assignment z_dn ~ Mult(theta_d)
    Draw the word w_dn ~ Mult(beta_{z_dn})
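A minimal NumPy sketch of this generative process (not from the paper; the topic count, vocabulary size, number of documents, document length, and hyperparameter values below are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 10, 1000, 5, 100   # topics, vocab size, docs, words per doc (illustrative)
alpha, eta = 0.1, 0.01          # Dirichlet hyperparameters (illustrative)

# Draw each topic beta_k ~ Dirichlet(eta)
beta = rng.dirichlet(np.full(V, eta), size=K)            # K x V

docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(K, alpha))             # topic proportions for document d
    z = rng.choice(K, size=N, p=theta)                   # per-word topic assignments
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # words drawn from the chosen topics
    docs.append(w)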

Batch variational Bayes for LDA
For a collection of D documents, infer:
  Per-word topic assignments z_dn
  Per-document topic proportions theta_d
  Per-corpus topic distributions beta_k
The true posterior p(beta, theta, z | w, alpha, eta) is intractable; it is approximated by a fully factorized variational distribution q(beta, theta, z) with parameters lambda (topics), gamma (topic proportions), and phi (topic assignments).
Optimize the variational bound (ELBO) over these parameters by alternating an E step that fits gamma and phi for each document with an M step that updates lambda from the documents' sufficient statistics.
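A rough sketch of the per-document E step (the gamma/phi/lambda names follow the paper's notation; the word_ids/word_cts document encoding and the fixed iteration count are my own assumptions):

import numpy as np
from scipy.special import digamma

def e_step(word_ids, word_cts, lam, alpha, iters=50):
    """Variational E step for one document, holding the topic parameters lambda fixed.

    word_ids : indices of the unique words in the document
    word_cts : counts of those words
    lam      : K x V variational topic parameters (lambda)
    Returns the document's gamma (K,) and phi (K x n_unique).
    """
    K = lam.shape[0]
    Elog_beta = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))  # E_q[log beta_k]
    gamma = np.ones(K)                                                  # initialize gamma
    for _ in range(iters):
        Elog_theta = digamma(gamma) - digamma(gamma.sum())              # E_q[log theta_d]
        log_phi = Elog_theta[:, None] + Elog_beta[:, word_ids]          # phi ~ exp(E[log theta] + E[log beta])
        phi = np.exp(log_phi - log_phi.max(axis=0))
        phi /= phi.sum(axis=0)                                          # normalize over topics per word
        gamma = alpha + (phi * word_cts).sum(axis=1)                    # gamma update
    return gamma, phi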

Online variational inference for LDA
Mini-batches: process the corpus S documents at a time. For mini-batch t, run the E step on each document, form an estimate lambda-tilde of the topics as if the whole corpus of D documents consisted of that mini-batch repeated D/S times, and blend it with the current estimate:
  lambda <- (1 - rho_t) * lambda + rho_t * lambda-tilde,  with rho_t = (tau_0 + t)^(-kappa), kappa in (0.5, 1].
Hyperparameter estimation: the Dirichlet hyperparameters alpha and eta can also be updated once per mini-batch, using a Newton step scaled by rho_t.
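A sketch of the resulting online update for one mini-batch, reusing the e_step sketch above (the tau0 and kappa defaults are illustrative):

import numpy as np

def online_update(lam, minibatch, D, t, alpha, eta, tau0=1024.0, kappa=0.7):
    """One online VB step for a mini-batch of documents (sketch).

    lam       : current K x V topic parameters (lambda)
    minibatch : list of (word_ids, word_cts) pairs, length S
    D         : (estimated) total number of documents in the corpus
    t         : mini-batch counter
    """
    K, V = lam.shape
    S = len(minibatch)
    sstats = np.zeros((K, V))
    for word_ids, word_cts in minibatch:
        gamma, phi = e_step(word_ids, word_cts, lam, alpha)  # E step from the batch sketch
        sstats[:, word_ids] += phi * word_cts
    lam_hat = eta + (D / S) * sstats                         # mini-batch estimate lambda-tilde
    rho = (tau0 + t) ** (-kappa)                             # step size, kappa in (0.5, 1]
    return (1.0 - rho) * lam + rho * lam_hat                 # blend old and new estimates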

Analysis of convergence
With rho_t = (tau_0 + t)^(-kappa) and kappa in (0.5, 1], the step sizes satisfy the Robbins-Monro conditions (sum_t rho_t = infinity, sum_t rho_t^2 < infinity), so the online updates converge to a stationary point of the variational objective.

Analysis of convergence
Stochastic gradient algorithms can be sped up by premultiplying the gradient by the inverse of an appropriate positive definite matrix H.
Choosing H to be the Fisher information matrix of the variational distribution q gives the natural gradient; for LDA this natural-gradient step is exactly the blending update above, so online LDA is stochastic natural gradient ascent on the variational objective.
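Sketched in LaTeX, my reading of the resulting natural-gradient step (with \tilde{\lambda} denoting the per-mini-batch estimate from the E step):

% The natural gradient premultiplies the ordinary gradient of the ELBO by H^{-1},
% where H is the Fisher information of q(beta | lambda); for LDA it reduces to
% the difference between the mini-batch estimate and the current lambda.
\hat{\nabla}_{\lambda}\,\mathcal{L} \;=\; H^{-1}\,\nabla_{\lambda}\mathcal{L} \;=\; \tilde{\lambda} - \lambda,
\qquad
\lambda \;\leftarrow\; \lambda + \rho_t\,\hat{\nabla}_{\lambda}\,\mathcal{L}
\;=\; (1-\rho_t)\,\lambda + \rho_t\,\tilde{\lambda}.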

Experiments
Use per-word perplexity on held-out documents as the measure of model fit:
  perplexity(D_test) = exp( - sum_d log p(w_d) / sum_d N_d ),
where log p(w_d) is approximated by the per-document variational bound; gamma and phi for the held-out documents are fit using the E step in Algorithm 2, with lambda held fixed.
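A sketch of the held-out evaluation, reusing the e_step sketch above; elbo() stands for a hypothetical helper that computes the per-document variational bound:

import numpy as np

def perplexity(test_docs, lam, alpha, eta):
    """Per-word perplexity on held-out documents: exp(-sum_d log p(w_d) / sum_d N_d)."""
    total_bound, total_words = 0.0, 0
    for word_ids, word_cts in test_docs:
        gamma, phi = e_step(word_ids, word_cts, lam, alpha)   # fit gamma, phi with lambda fixed
        # log p(w_d) approximated by the document's ELBO (hypothetical helper, not shown)
        total_bound += elbo(word_ids, word_cts, gamma, phi, lam, alpha, eta)
        total_words += int(np.sum(word_cts))
    return np.exp(-total_bound / total_words)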

Evaluating learning parameters
Two corpora: 352,549 documents from the journal Nature, and 100,000 documents from the English version of Wikipedia.
For each corpus, set aside a 1,000-document test set and a separate 1,000-document validation set.
Run online LDA for five hours on the remaining documents from each corpus for a range of settings of the learning parameters: the decay kappa, the delay tau_0, and the mini-batch size S.

Compare batch and online LDA fit to fixed corpora: on both Nature and Wikipedia, online LDA reaches a given held-out perplexity in a fraction of the wall-clock time needed by batch VB, and converges to models that are as good or better.

True online setting: run online LDA on Wikipedia articles downloaded at random and analyzed one mini-batch at a time, as though they were arriving in a continuous stream; it performs about as well as when fit to a fixed downloaded corpus.