Bayesian Inference for Mixture Language Models


Bayesian Inference for Mixture Language Models
Chase Geigle, ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign

Deficiency of PLSA
- Not a generative model
  - Can't compute the probability of a new document (why do we care about this?)
  - Heuristic workarounds are possible, though
- Many parameters → high model complexity
  - Many local maxima
  - Prone to overfitting
- Overfitting is not necessarily a problem for text mining (we are only interested in fitting the "training" documents)

Latent Dirichlet Allocation (LDA)
- Make PLSA a generative model by imposing a Dirichlet prior on the model parameters → LDA = Bayesian version of PLSA
- Parameters are regularized
- Can achieve the same goal as PLSA for text mining purposes
  - Topic coverage and topic word distributions can be inferred using Bayesian inference

PLSA → LDA (illustration)
- In PLSA, both the topic word distributions p(w|θ_j) and the topic choices p(θ_j) = π_{d,j} are free parameters; LDA imposes a prior on both.
- Example topics from the figure: Topic 1 (government 0.3, response 0.2, …), Topic 2 (city 0.2, new 0.1, orleans 0.05, …), …, Topic k (donate 0.1, relief 0.05, help 0.02, …), each paired with its document coverage π_{d,1}, π_{d,2}, …, π_{d,k}.

Likelihood Functions for PLSA vs. LDA
(The slide annotates the two likelihood formulas: the core assumption shared by all topic models, the PLSA component that both models share, and the terms added by LDA.)
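A hedged reconstruction of those formulas (my transcription, using the π/θ notation of the earlier slides; the original slide may write them slightly differently):

```latex
% PLSA: every \pi_{d,j} and \theta_j is a free parameter (the "PLSA component");
% the inner sum over topics is the core assumption shared by all topic models
\log p(C \mid \Lambda) = \sum_{d \in C} \sum_{w \in V} c(w,d)\,
    \log \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j)

% LDA: the same mixture sits inside, but \pi_d and \theta_j become random
% variables with Dirichlet priors -- the integrals and priors are "added by LDA"
p(C \mid \alpha, \beta) = \int \Big[ \prod_{j=1}^{k} p(\theta_j \mid \beta) \Big]
    \prod_{d \in C} \int p(\pi_d \mid \alpha)
    \prod_{w \in V} \Big[ \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j) \Big]^{c(w,d)}
    d\pi_d \; d\theta_1 \cdots d\theta_k
```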

Parameter Estimation and Inference in LDA
- The hyperparameters (α and β) can be estimated using an ML estimator
- However, {θ_j} and {π_{d,j}} must now be computed using posterior inference
  - Computationally intractable
  - Must resort to approximate inference
  - Many different inference methods are available
- How many parameters are there in LDA vs. PLSA?
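A back-of-the-envelope answer to the closing question, offered as an illustration rather than a quote from the slides: with k topics, vocabulary size V, and D documents,

```latex
\underbrace{k\,(V-1)}_{\{\theta_j\}} \;+\; \underbrace{D\,(k-1)}_{\{\pi_{d,j}\}}
\ \text{free parameters in PLSA (grows with } D\text{)}
\qquad\text{vs.}\qquad
\underbrace{k}_{\alpha} \;+\; \underbrace{V}_{\beta}
\ \text{hyperparameters in LDA (just two scalars if symmetric)}
```

In LDA, θ_j and π_d are latent variables to be inferred rather than parameters to be fit, which is exactly what regularizes the model.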

Posterior Inference in LDA
Computing the posterior over the latent variables exactly is computationally intractable; we must resort to approximate inference!
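For concreteness, the posterior in question (a sketch in the notation of the earlier slides) is:

```latex
p\big(\{\theta_j\}, \{\pi_d\}, \mathbf{z} \mid \mathbf{w}, \alpha, \beta\big)
  = \frac{p\big(\{\theta_j\}, \{\pi_d\}, \mathbf{z}, \mathbf{w} \mid \alpha, \beta\big)}
         {p(\mathbf{w} \mid \alpha, \beta)}
```

The numerator is easy to write down from the generative model, but the normalizer p(w | α, β) requires summing over every possible topic assignment z and integrating over all θ and π, which is what makes exact inference intractable.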

Illustration of Bayesian Estimation
- Bayesian inference: estimate f(θ) = ?
- Prior: p(θ)
- Likelihood: p(X|θ), with X = (x1, …, xN)
- Posterior: p(θ|X) ∝ p(X|θ) p(θ)
- (The figure marks θ₀: prior mode, θ₁: posterior mode, θ_ml: ML estimate, and the posterior mean.)
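A small worked example of this picture (my own illustration, not from the slides), using a Beta prior on the bias of a coin flipped N times with n heads:

```latex
% Prior: \theta \sim \mathrm{Beta}(a, b), with a, b > 1 so the prior mode exists
p(\theta \mid X) \;\propto\; \theta^{n}(1-\theta)^{N-n}\,\theta^{a-1}(1-\theta)^{b-1}
   \;=\; \mathrm{Beta}(a + n,\; b + N - n)

\hat{\theta}_{\mathrm{ml}} = \frac{n}{N}, \qquad
\theta_0 = \frac{a-1}{a+b-2}\ (\text{prior mode}), \qquad
\mathbb{E}[\theta \mid X] = \frac{a+n}{a+b+N}\ (\text{posterior mean})
```

As N grows, the posterior mean moves from the prior mode toward the ML estimate; the Dirichlet priors in LDA play the same smoothing role for the topic coverage and topic word distributions.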

LDA as a graphical model [Blei et al. 03a]
- α, β: Dirichlet priors
- θ^(d) ~ Dirichlet(α): distribution over topics for each document (same as π_d on the previous slides)
- φ^(j) ~ Dirichlet(β), j = 1, …, T: distribution over words for each topic (same as θ_j on the previous slides)
- z_i ~ Discrete(θ^(d)): topic assignment for each word
- w_i ~ Discrete(φ^(z_i)): word generated from its assigned topic
- Plates: N_d words per document, D documents in the corpus
- Most approximate inference algorithms aim to infer z, from which the other interesting variables can be easily computed
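As a concrete reading of this generative story, here is a minimal sketch (my own illustration, with made-up sizes and symmetric hyperparameters) that samples a tiny corpus from the LDA model:

```python
import numpy as np

rng = np.random.default_rng(0)

T, V, D = 3, 20, 5          # topics, vocabulary size, documents (illustrative values)
alpha, beta = 0.1, 0.01     # symmetric Dirichlet hyperparameters (assumed)
doc_lengths = rng.integers(10, 20, size=D)         # N_d for each document

# phi^(j) ~ Dirichlet(beta): one word distribution per topic
phi = rng.dirichlet(np.full(V, beta), size=T)      # shape (T, V)

corpus = []
for d in range(D):
    # theta^(d) ~ Dirichlet(alpha): topic proportions for document d
    theta_d = rng.dirichlet(np.full(T, alpha))
    doc = []
    for _ in range(doc_lengths[d]):
        z_i = rng.choice(T, p=theta_d)             # z_i ~ Discrete(theta^(d))
        w_i = rng.choice(V, p=phi[z_i])            # w_i ~ Discrete(phi^(z_i))
        doc.append(w_i)
    corpus.append(doc)

print(corpus[0])   # word ids of the first sampled document
```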

Approximate Inference for LDA
Many different approaches; each has its pros & cons:
- Deterministic approximation
  - variational EM [Blei et al. 03a]
  - expectation propagation [Minka & Lafferty 02]
- Markov chain Monte Carlo
  - full Gibbs sampler [Pritchard et al. 00]
  - collapsed Gibbs sampler [Griffiths & Steyvers 04]: the most efficient and quite popular, but only works with conjugate priors

The Collapsed Gibbs Sampler [Griffiths & Steyvers 04]
- Using the conjugacy of the Dirichlet and multinomial distributions, integrate out the continuous parameters (θ and φ)
- This defines a distribution over the discrete topic assignments z alone
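For reference, the collapsed distributions from Griffiths & Steyvers (2004) have the form below (reproduced from memory, so treat the exact constants with care). Here n_j^{(w)} counts how often word w is assigned to topic j, n_j^{(d)} counts how often topic j appears in document d, and dots denote sums over that index:

```latex
P(\mathbf{w} \mid \mathbf{z}) =
  \left( \frac{\Gamma(V\beta)}{\Gamma(\beta)^{V}} \right)^{\!T}
  \prod_{j=1}^{T} \frac{\prod_{w} \Gamma\!\big(n_j^{(w)} + \beta\big)}
                       {\Gamma\!\big(n_j^{(\cdot)} + V\beta\big)}
\qquad
P(\mathbf{z}) =
  \left( \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^{T}} \right)^{\!D}
  \prod_{d=1}^{D} \frac{\prod_{j} \Gamma\!\big(n_j^{(d)} + \alpha\big)}
                       {\Gamma\!\big(n_{\cdot}^{(d)} + T\alpha\big)}
```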

The Collapsed Gibbs Sampler [Griffiths & Steyvers 04]
- Sample each z_i conditioned on z_{-i} (all other topic assignments)
- This is nicer than your average Gibbs sampler:
  - memory: the counts can be cached in two sparse matrices
  - optimization: no special functions, just simple arithmetic
  - the distributions on θ and φ are analytic given z and w, and can later be recovered for each sample
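The conditional used for each draw (again my transcription of the standard Griffiths & Steyvers update; n_{-i} denotes counts computed with token i excluded):

```latex
P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
  \underbrace{\frac{n_{-i,j}^{(w_i)} + \beta}{n_{-i,j}^{(\cdot)} + V\beta}}_{\text{how likely topic } j \text{ generates word } w_i}
  \;\times\;
  \underbrace{\frac{n_{-i,j}^{(d_i)} + \alpha}{n_{-i,\cdot}^{(d_i)} + T\alpha}}_{\text{how likely document } d_i \text{ chooses topic } j}
```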

Gibbs Sampling in LDA (illustration)
(The original slides animate a table of word tokens and their topic assignments, updated token by token over iteration 1, iteration 2, …, up to iteration 1000.)
- For each token w_i in document d_i, ask: what is the most likely topic for w_i in d_i?
- The first factor of the update above answers "how likely would topic j generate word w_i?": the count of instances where w_i is assigned to topic j, over the count of all words assigned to topic j (smoothed by β).
- The second factor answers "how likely would d_i choose topic j?": the number of words in d_i assigned to topic j, over the number of words in d_i assigned to any topic (smoothed by α).
- After enough sweeps (e.g., 1000 iterations), the sampled assignments z can be used to recover θ and φ, as noted on the previous slide.
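To make the whole procedure concrete, here is a compact collapsed Gibbs sampler sketch (my own illustration under the assumptions above: symmetric priors, dense count arrays instead of sparse matrices, and the toy `corpus` of word ids sampled earlier):

```python
import numpy as np

def collapsed_gibbs(corpus, T, V, alpha=0.1, beta=0.01, iters=1000, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of word-id lists."""
    rng = np.random.default_rng(seed)
    D = len(corpus)
    n_dj = np.zeros((D, T))      # words in document d assigned to topic j
    n_jw = np.zeros((T, V))      # instances where word w is assigned to topic j
    n_j = np.zeros(T)            # all words assigned to topic j
    z = [rng.integers(T, size=len(doc)) for doc in corpus]   # random initialization

    for d, doc in enumerate(corpus):             # build counts from the initialization
        for i, w in enumerate(doc):
            j = z[d][i]
            n_dj[d, j] += 1; n_jw[j, w] += 1; n_j[j] += 1

    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                j = z[d][i]
                # exclude token i from the counts (the "-i" in the update)
                n_dj[d, j] -= 1; n_jw[j, w] -= 1; n_j[j] -= 1
                # word-topic factor times document-topic factor; the document-length
                # denominator is constant in j, so it cancels after normalization
                p = (n_jw[:, w] + beta) / (n_j + V * beta) * (n_dj[d] + alpha)
                j = rng.choice(T, p=p / p.sum())
                z[d][i] = j
                n_dj[d, j] += 1; n_jw[j, w] += 1; n_j[j] += 1

    # analytic posterior means for theta and phi given the final z and w
    theta = (n_dj + alpha) / (n_dj + alpha).sum(axis=1, keepdims=True)
    phi = (n_jw + beta) / (n_jw + beta).sum(axis=1, keepdims=True)
    return theta, phi

# usage with the toy corpus sampled above:
# theta, phi = collapsed_gibbs(corpus, T=3, V=20, iters=200)
```

Keeping only the count arrays (rather than θ and φ themselves) is what the "two sparse matrices" remark refers to; a production implementation would also discard burn-in sweeps and average over several samples of z.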