Topic models for corpora and for graphs

Motivation
Social graphs seem to have
- some aspects of randomness: small diameter, giant connected components, ...
- some structure: homophily, scale-free degree distributions?

More terms
“Stochastic block model”, aka “block-stochastic matrix”:
- Draw n_i nodes in block i.
- With probability p_ij, connect pairs (u, v) where u is in block i and v is in block j.
- Special, simple case: p_ii = q_i, and p_ij = s for all i ≠ j.
A sampling sketch follows.
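To make this concrete, here is a minimal sketch (our own illustrative code, not from the slides; block sizes and probabilities are made-up values) of sampling an undirected graph from this model:

```python
# Sample an undirected graph from a stochastic block model.
import numpy as np

def sample_sbm(block_sizes, P, rng):
    """Connect each pair (u, v) with probability P[i, j], where
    u is in block i and v is in block j."""
    n = sum(block_sizes)
    z = np.repeat(np.arange(len(block_sizes)), block_sizes)  # block ids
    A = np.zeros((n, n), dtype=int)
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < P[z[u], z[v]]:
                A[u, v] = A[v, u] = 1
    return A, z

# The special, simple case from the slide: p_ii = q_i, p_ij = s for i != j.
rng = np.random.default_rng(0)
q, s = [0.3, 0.4], 0.05
P = np.full((2, 2), s)
np.fill_diagonal(P, q)
A, z = sample_sbm([50, 50], P, rng)
```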

Block-stochastic or not? [Figure: football network.]

Block-stochastic or not? [Figures: books network; an artificial graph.]

Stochastic blockmodel graphs
An alternative to spectral methods: can you directly maximize Pr(structure, parameters | data) for a reasonable model?

Outline
- Stochastic block models & the inference question
- Review of text models
  - Mixture of multinomials & EM
  - LDA and Gibbs (or variational EM)
- Block models and inference
- Mixed-membership block models
- Multinomial block models and inference w/ Gibbs

Review – supervised naïve Bayes
Naïve Bayes model, in compact plate representation. [Figure: plate diagrams — class C generating words W_1, W_2, W_3, ..., W_N, with parameters π and β; plates over N words and M documents.]

Review – supervised naïve Bayes
Multinomial naïve Bayes:
- For each document d = 1, ..., M: generate C_d ~ Mult(· | π)
- For each position n = 1, ..., N_d: generate w_n ~ Mult(· | β, C_d)
[Figure: plate diagram with C, W_1 ... W_N, and parameters π, β.]

Review – supervised naïve Bayes
Multinomial naïve Bayes: learning. Maximize the log-likelihood of the observed variables w.r.t. the parameters. This is a convex function, so it has a global optimum, and the solution is in closed form (sketched below).
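The slide elides the closed-form solution; it is the usual maximum-likelihood count ratio. In the slide's notation (π for the class prior, β for the per-class word distributions), and writing n_{d,w} for the count of word w in document d and N_d for the length of d, a sketch:

$$
\hat{\pi}_c = \frac{\sum_{d=1}^{M} \mathbb{1}[C_d = c]}{M},
\qquad
\hat{\beta}_{c,w} = \frac{\sum_{d:\,C_d = c} n_{d,w}}{\sum_{d:\,C_d = c} N_d}
$$

In practice one adds Dirichlet pseudocounts to smooth these estimates.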

Review – unsupervised naïve Bayes
Mixture model: the unsupervised naïve Bayes model. We can still write the joint probability of words and classes, but the classes are not visible: the class becomes a latent variable Z. [Figure: plate diagram with latent Z, words W, parameters π, β.]

Review – unsupervised naïve Bayes
Mixture model: learning.
- The log-likelihood is not a convex function, so there is no guaranteed global optimum.
- Solution: Expectation Maximization (EM), an iterative algorithm that finds a local optimum and is guaranteed to maximize a lower bound on the log-likelihood of the observed data.

Review – unsupervised naïve Bayes
Mixture model: the EM solution alternates an E-step (estimate the distribution over each document's latent class given the observed words and current parameters) and an M-step (re-estimate the parameters from the resulting expected counts). Key capability: estimating the distribution of latent variables given observed variables. A code sketch follows.
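A minimal EM sketch for this mixture of multinomials (our own illustrative code; X is a document-by-word count matrix and K the number of classes):

```python
# EM for a mixture of multinomials (unsupervised naive Bayes).
import numpy as np

def em_mixture_multinomial(X, K, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    M, V = X.shape
    pi = np.full(K, 1.0 / K)                  # class prior
    beta = rng.dirichlet(np.ones(V), size=K)  # per-class word distribution
    for _ in range(n_iter):
        # E-step: responsibilities r[d, k] = Pr(Z_d = k | w_d), in log space.
        log_r = np.log(pi) + X @ np.log(beta).T
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the expected counts.
        pi = r.mean(axis=0)
        beta = r.T @ X + 1e-9                 # small smoothing avoids log(0)
        beta /= beta.sum(axis=1, keepdims=True)
    return pi, beta, r
```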

Review - LDA

Review - LDA
Motivation. Assumptions: (1) documents are i.i.d.; (2) within a document, words are i.i.d. (bag of words).
- For each document d = 1, ..., M: generate θ_d ~ D1(...)
- For each word n = 1, ..., N_d: generate w_n ~ D2(· | θ_d)
Now pick your favorite distributions for D1, D2. [Figure: plate diagram with θ and w; plates over N words and M documents.]

Latent Dirichlet Allocation
“Mixed membership” model:
- For each document d = 1, ..., M: generate θ_d ~ Dir(· | α)
- For each position n = 1, ..., N_d:
  - generate z_n ~ Mult(· | θ_d)
  - generate w_n ~ Mult(· | β_{z_n})
[Figure: plate diagram with α, θ, z, w, and K topic-word distributions β.] A generative code sketch follows.
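A sketch of this generative story in code (ours; the corpus sizes, α, and β are made-up values):

```python
# Generate a tiny corpus from LDA: theta_d ~ Dir(alpha), z_n ~ Mult(theta_d),
# w_n ~ Mult(beta[z_n]).
import numpy as np

rng = np.random.default_rng(0)
K, V, M, N_d = 5, 1000, 20, 100
alpha = np.full(K, 0.1)                   # Dirichlet prior on topic mixtures
beta = rng.dirichlet(np.ones(V), size=K)  # K topic-word distributions

docs = []
for d in range(M):
    theta = rng.dirichlet(alpha)          # per-document topic proportions
    z = rng.choice(K, size=N_d, p=theta)  # a topic for each position
    w = np.array([rng.choice(V, p=beta[k]) for k in z])
    docs.append(w)
```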

LDA’s view of a document

LDA topics

Review - LDA
Parameter learning for LDA:
- Variational EM: numerical approximation using lower bounds; results in biased solutions; convergence has numerical guarantees.
- Gibbs sampling: stochastic simulation; unbiased solutions; stochastic convergence.

Review - LDA
Gibbs sampling:
- Applicable when the joint distribution is hard to evaluate but the conditional distributions are known.
- The sequence of samples forms a Markov chain whose stationary distribution is the joint distribution.
- Key capability: estimating the distribution of one latent variable given the other latent variables and the observed variables.

Why does Gibbs sampling work?
- What's the fixed point? The stationary distribution of the chain is the joint distribution.
- When will it converge (in the limit)? When the graph defined by the chain is connected.
- How long will it take to converge? That depends on the second eigenvalue of the chain's transition matrix (the spectral gap); a tiny numeric illustration follows.
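A tiny numeric illustration (ours, not from the slides): for a two-state chain, the distance to the stationary distribution shrinks geometrically at a rate set by the second-largest eigenvalue modulus.

```python
# Eigenvalues of a Markov transition matrix: the largest is always 1;
# the second-largest modulus governs the convergence (mixing) rate.
import numpy as np

T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
evals = sorted(abs(np.linalg.eigvals(T)), reverse=True)
print(evals)  # [1.0, 0.7] -> error decays roughly like 0.7**t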

This is called “collapsed Gibbs sampling”, since you've marginalized away some of the variables (for LDA, the multinomial parameters θ and β). From: Parameter estimation for text analysis, Gregor Heinrich.
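The resulting conditional, in the standard Griffiths & Steyvers / Heinrich form (η is the topic-word Dirichlet prior, V the vocabulary size, and the counts n^{¬mn} exclude the token being resampled):

$$
P(z_{m,n}=k \mid \mathbf{z}_{\neg mn}, \mathbf{w})
\;\propto\;
\frac{n^{\neg mn}_{k,w_{mn}} + \eta}{n^{\neg mn}_{k,\cdot} + V\eta}
\,\bigl(n^{\neg mn}_{m,k} + \alpha\bigr)
$$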

Review - LDA
Collapsed Gibbs sampling for LDA:
- Randomly initialize each z_{m,n}.
- Repeat for t = 1, ...: for each doc m and word n, compute Pr(z_{m,n} = k | other z's) and sample z_{m,n} according to that distribution.
A code sketch of this sampler appears below.
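A minimal collapsed Gibbs sampler in this style (our own sketch; α and η are the usual Dirichlet hyperparameters and the conditional is the count-ratio form above):

```python
# Collapsed Gibbs sampling for LDA. docs is a list of arrays of word ids.
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    M = len(docs)
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random init
    ndk = np.zeros((M, K)); nkw = np.zeros((K, V)); nk = np.zeros(K)
    for m, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[m][n]
            ndk[m, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for m, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[m][n]
                ndk[m, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1  # remove token
                p = (ndk[m] + alpha) * (nkw[:, w] + eta) / (nk + V * eta)
                k = rng.choice(K, p=p / p.sum())            # resample z_mn
                z[m][n] = k
                ndk[m, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return z, ndk, nkw
```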

Outline
- Stochastic block models & the inference question
- Review of text models
  - Mixture of multinomials & EM
  - LDA and Gibbs (or variational EM)
- Block models and inference
- Mixed-membership block models
- Multinomial block models and inference w/ Gibbs
- Bestiary of other probabilistic graph models: latent-space models, exchangeable graphs, p1, ERGMs

Review - LDA
Motivation. Assumptions: (1) documents are i.i.d.; (2) within a document, words are i.i.d. (bag of words).
- For each document d = 1, ..., M: generate θ_d ~ D1(...)
- For each word n = 1, ..., N_d: generate w_n ~ D2(· | θ_d)
Docs and words are exchangeable. [Figure: plate diagram with θ and w; plates over N words and M documents.]

Stochastic Block models: assume that (1) nodes within a block z, and (2) edges between blocks z_p, z_q, are exchangeable. [Figure: plate diagram with block assignments z_p, z_q, edge indicators a_{pq}, and model parameters; plates over N nodes and N^2 node pairs.]

Stochastic Block models: assume that (1) nodes within a block z, and (2) edges between blocks z_p, z_q, are exchangeable.
Gibbs sampling:
- Randomly initialize z_p for each node p.
- For t = 1, ...: for each node p, compute the distribution over z_p given the other z's, and sample z_p.
See: Snijders & Nowicki, 1997, Estimation and Prediction for Stochastic Blockmodels for Groups with Latent Graph Structure. A code sketch follows.
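A sketch of this Gibbs loop for an undirected SBM (our own illustrative code, assuming Beta(a, b) priors on the link probabilities and a uniform prior over block assignments; it alternates sampling P | z, A and each z_p | z_-p, P, A):

```python
# Gibbs sampling for a stochastic block model on a symmetric 0/1 matrix A
# with zero diagonal.
import numpy as np

def gibbs_sbm(A, K, a=1.0, b=1.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    z = rng.integers(K, size=N)
    for _ in range(n_iter):
        # Sample block-pair link probabilities P[i, j] | z, A
        # (conjugate Beta update from link / non-link counts).
        P = np.empty((K, K))
        for i in range(K):
            for j in range(i, K):
                mi, mj = z == i, z == j
                links = A[np.ix_(mi, mj)].sum()
                pairs = mi.sum() * mj.sum()
                if i == j:  # don't double-count unordered within-block pairs
                    links, pairs = links / 2, mi.sum() * (mi.sum() - 1) / 2
                P[i, j] = P[j, i] = rng.beta(a + links, b + pairs - links)
        # Sample each node's block z_p | z_-p, P, A.
        for p in range(N):
            others = np.arange(N) != p
            edges = A[p, others]
            logp = np.zeros(K)
            for k in range(K):
                pr = P[k, z[others]]
                logp[k] = np.sum(edges * np.log(pr) +
                                 (1 - edges) * np.log(1 - pr))
            w = np.exp(logp - logp.max())
            z[p] = rng.choice(K, p=w / w.sum())
    return z
```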

Mixed Membership Stochastic Block models. [Figure: plate diagram with per-node membership vectors, per-edge block indicators z_{p·} and z_{·q}, and edge indicators a_{pq}; plates over N nodes and N^2 pairs.] Airoldi et al., JMLR 2008.

Mixed Membership Stochastic Block models. [Further figures illustrating the model.]

Parkkinen et al. paper

Another mixed membership block model

Another mixed membership block model. Notation:
- z = (z_i, z_j) is a pair of block ids
- n_z = # of pairs labeled z
- q_{z1,i} = # of links to node i from block z1
- q_{z1,·} = # of outlinks in block z1
- δ = indicator for the diagonal
- M = # of nodes

Another mixed membership block model. [Further figures.]

Experiments
Balasubramanyan, Lin, Cohen, NIPS workshop 2010, plus lots of synthetic data.

Experiments (continued). [Further results from Balasubramanyan, Lin, Cohen, NIPS workshop 2010.]