Topic models for corpora and for graphs

Presentation transcript:

Topic models for corpora and for graphs

More experiments: more “real” network datasets

More experiments

More experiments. Varying the number of clusters for PIC and for PIC4 (which starts with a random 4-d point rather than a random 1-d point).

More experiments. Varying the amount of noise for PIC and for PIC4 (which starts with a random 4-d point rather than a random 1-d point).

Motivation. Social graphs seem to have some aspects of randomness (small diameter, giant connected components, …) and some structure (homophily, a scale-free degree distribution?).

More terms. “Stochastic block model”, aka “block-stochastic matrix”:
Draw ni nodes in block i.
With probability pij, connect pairs (u, v) where u is in block i and v is in block j.
Special, simple case: pii = qi, and pij = s for all i ≠ j.
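To make the definition concrete, here is a minimal sketch of sampling a graph from this model; the block sizes and probabilities below are illustrative values, not taken from the slides.

```python
import numpy as np

def sample_sbm(block_sizes, P, rng=None):
    """Sample an undirected graph from a stochastic block model.

    block_sizes: list of n_i, the number of nodes in each block.
    P: K x K matrix of connection probabilities p_ij.
    Returns the adjacency matrix and the block label of each node.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.repeat(np.arange(len(block_sizes)), block_sizes)  # block id per node
    N = len(z)
    A = np.zeros((N, N), dtype=int)
    for u in range(N):
        for v in range(u + 1, N):
            if rng.random() < P[z[u], z[v]]:
                A[u, v] = A[v, u] = 1
    return A, z

# The special simple case from the slide: p_ii = q_i on the diagonal, p_ij = s off it.
q, s = [0.3, 0.4, 0.25], 0.02     # illustrative values
P = np.full((3, 3), s) + np.diag(np.array(q) - s)
A, z = sample_sbm([50, 50, 50], P)
```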

Not a block model? (Figure: the football network.)

Not a block model? (Figures: the books network; an artificial graph.)

Stochastic blockmodel graphs. An alternative to spectral methods: can you directly maximize Pr(structure, parameters | data) for a reasonable model?

The next big question. An alternative to spectral methods: can you directly maximize Pr(structure, parameters | data) for a reasonable model? How expensive is it? How well does it work? What’s the “reasonable model”, and how reasonable does it have to be? We’ve looked in depth at one generative model so far….

Outline:
Stochastic block models & the inference question
Review of text models: mixture of multinomials & EM; LDA and Gibbs (or variational EM)
Block models and inference: mixed-membership block models; multinomial block models and inference w/ Gibbs

Review – supervised Naïve Bayes. The Naïve Bayes model and its compact (plate) representation. (Plate diagrams: class C generating words W1 … WN, with class-prior and per-class word-distribution parameters, repeated over M documents; the compact form puts W inside an N plate nested in an M plate.)

Review – supervised Naïve Bayes. Multinomial Naïve Bayes:
For each document d = 1, …, M:
  generate Cd ~ Mult(· | π), where π is the class prior
  for each position n = 1, …, Nd:
    generate wn ~ Mult(· | β, Cd)
(Plate diagram: C and W1 … WN, parameters π and β, repeated over M documents.)

Review – supervised Naïve Bayes. Multinomial naïve Bayes: learning. Maximize the log-likelihood of the observed variables w.r.t. the parameters. This is a convex function, so there is a global optimum, and the solution is a set of closed-form count ratios (sketched below).
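A standard reconstruction of the objective and its maximizer, using the π (class prior) and β (per-class word distribution) notation above; the exact equations on the original slide are not recoverable from the transcript:

```latex
\ell(\pi,\beta) \;=\; \sum_{d=1}^{M}\Big[\log \pi_{C_d} \;+\; \sum_{n=1}^{N_d}\log \beta_{C_d,\,w_{dn}}\Big]
\qquad\Longrightarrow\qquad
\hat{\pi}_k \;=\; \frac{\#\{d:\,C_d=k\}}{M},
\qquad
\hat{\beta}_{k,w} \;=\; \frac{\#\{(d,n):\,C_d=k,\ w_{dn}=w\}}{\sum_{w'}\#\{(d,n):\,C_d=k,\ w_{dn}=w'\}}
```

In practice these counts are usually smoothed (e.g., add-one), but the unsmoothed ratios are the exact maximum-likelihood solution.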

Review – unsupervised Naïve Bayes. Mixture model: the unsupervised naïve Bayes model. We can still write the joint probability of words and classes, but the classes are not visible: the class label becomes a latent variable Z. (Plate diagram: latent class Z generating words W in an N plate nested in an M plate, with class-prior and word-distribution parameters.)

Review – unsupervised Naïve Bayes. Mixture model: learning. The likelihood is not a convex function, so there is no closed-form global-optimum solution. Solution: Expectation Maximization, an iterative algorithm that finds a local optimum and is guaranteed to maximize a lower bound on the log-likelihood of the observed data.

Review – unsupervised Naïve Bayes. Mixture model: the EM solution. E-step: compute the posterior Pr(label for d is k | words, current parameters), i.e. the responsibilities γ. M-step: re-estimate Pr(word w | class k) and the class prior from those γ’s. Key capability: estimate the distribution of the latent variables given the observed variables.
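Written out for a mixture of multinomials (a standard form; the slide’s own equations are images and not recoverable here), with n_{dw} the count of word w in document d:

```latex
\text{E-step:}\quad
\gamma_{dk} \;=\; \Pr(Z_d=k \mid w_d)
\;=\; \frac{\pi_k \prod_{w}\beta_{k,w}^{\,n_{dw}}}{\sum_{k'}\pi_{k'}\prod_{w}\beta_{k',w}^{\,n_{dw}}}
\qquad
\text{M-step:}\quad
\pi_k \;=\; \frac{1}{M}\sum_{d}\gamma_{dk},
\qquad
\beta_{k,w} \;=\; \frac{\sum_{d}\gamma_{dk}\,n_{dw}}{\sum_{w'}\sum_{d}\gamma_{dk}\,n_{dw'}}
```

Each iteration increases (or leaves unchanged) the marginal log-likelihood of the observed words.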

Review - LDA

Review – LDA. Motivation. Assumptions: (1) documents are i.i.d.; (2) within a document, words are i.i.d. (bag of words). For each document d = 1, …, M: generate θd ~ D1(…); for each word n = 1, …, Nd: generate wn ~ D2(· | θdn). Now pick your favorite distributions for D1 and D2. (Plate diagram: θ and w, with w in an N plate inside an M plate.)

Latent Dirichlet Allocation: “mixed membership”.
For each document d = 1, …, M:
  generate θd ~ Dir(· | α)
  for each position n = 1, …, Nd:
    generate zn ~ Mult(· | θd)
    generate wn ~ Mult(· | βzn)
(Plate diagram: α → θ → z → w, with z and w in an N plate inside an M plate, and K topic–word distributions β.)
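A minimal sketch of this generative process; the topic count, vocabulary size, and hyperparameter values are illustrative choices, not taken from the slides.

```python
import numpy as np

def generate_lda_corpus(M, N, K, V, alpha=0.1, eta=0.01, rng=None):
    """Sample a toy corpus from the LDA generative process described above.

    M documents of N words each, K topics, vocabulary size V.
    """
    rng = np.random.default_rng() if rng is None else rng
    beta = rng.dirichlet([eta] * V, size=K)        # K topic-word distributions
    docs = []
    for _ in range(M):
        theta = rng.dirichlet([alpha] * K)         # per-document topic proportions
        z = rng.choice(K, size=N, p=theta)         # topic assignment for each position
        w = np.array([rng.choice(V, p=beta[k]) for k in z])  # word drawn from its topic
        docs.append((z, w))
    return beta, docs

beta, docs = generate_lda_corpus(M=5, N=20, K=3, V=50)
```

Inference runs the other way: given only the words w, recover θ, β, and the z’s.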

LDA’s view of a document

LDA topics

Review – LDA. Latent Dirichlet Allocation: parameter learning.
Variational EM: numerical approximation using lower bounds; results in biased solutions; convergence has numerical guarantees.
Gibbs sampling: stochastic simulation; unbiased solutions; stochastic convergence.

Review – LDA. Gibbs sampling: applicable when the joint distribution is hard to evaluate but the conditional distributions are known. The sequence of samples comprises a Markov chain, and the stationary distribution of the chain is the joint distribution. Key capability: estimate the distribution of one latent variable given the other latent variables and the observed variables.

Why does Gibbs sampling work? What’s the fixed point? The stationary distribution of the chain is the joint distribution. When will it converge (in the limit)? When the state graph defined by the chain is connected. How long will it take to converge? That depends on the second eigenvalue of the chain’s transition matrix; see the bound below.
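One standard way to quantify that last point (not spelled out on the slide): for a reversible chain with stationary distribution π and eigenvalues 1 = λ1 > |λ2| ≥ …, the time to get within ε of the stationary distribution satisfies roughly

```latex
t_{\mathrm{mix}}(\epsilon) \;\lesssim\; \frac{1}{1-|\lambda_2|}\;\log\!\frac{1}{\epsilon\,\pi_{\min}}
```

so a small spectral gap 1 − |λ2| means slow mixing.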

Called “collapsed Gibbs sampling”, since you’ve marginalized away some of the variables (here, the θ’s and β’s). From: “Parameter estimation for text analysis”, Gregor Heinrich.

Review – LDA. “Mixed membership” Latent Dirichlet Allocation: collapsed Gibbs sampling.
Randomly initialize each zm,n.
Repeat for t = 1, … (30? 100? passes):
  for each doc m and word n:
    find Pr(zmn = k | the other z’s)
    sample zmn according to that distribution.
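The conditional being sampled is the standard collapsed Gibbs update for LDA, assuming symmetric Dirichlet priors α on θ and η on β (reconstructed here in conventional notation; superscript ¬mn means “with the current token removed”):

```latex
\Pr(z_{mn}=k \mid \mathbf{z}^{\neg mn}, \mathbf{w})
\;\propto\;
\big(n^{\neg mn}_{m,k} + \alpha\big)\;
\frac{n^{\neg mn}_{k,\,w_{mn}} + \eta}{n^{\neg mn}_{k,\cdot} + V\eta}
```

Here n_{m,k} counts tokens in document m assigned to topic k, n_{k,w} counts tokens of word type w assigned to topic k, and V is the vocabulary size; keeping these two count tables up to date is what makes each sampling step cheap.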

Outline:
Stochastic block models & the inference question
Review of text models: mixture of multinomials & EM; LDA and Gibbs (or variational EM)
Block models and inference: mixed-membership block models; multinomial block models and inference w/ Gibbs
A bestiary of other probabilistic graph models: latent-space models, exchangeable graphs, p1, ERGM

Review – LDA. Motivation. Assumptions: (1) documents are i.i.d.; (2) within a document, words are i.i.d. (bag of words). For each document d = 1, …, M: generate θd ~ D1(…); for each word n = 1, …, Nd: generate wn ~ D2(· | θdn). Docs and words are exchangeable.

Stochastic block models: assume that (1) nodes within a block z, and (2) edges between blocks zp, zq, are exchangeable. (Plate diagram: a block assignment zp for each of N nodes and an edge variable apq for each of the N² node pairs, with block-membership parameters and block–block edge probabilities.)

Stochastic block models: assume that (1) nodes within a block z, and (2) edges between blocks zp, zq, are exchangeable.
Gibbs sampling: randomly initialize zp for each node p; then for t = 1, …: for each node p, compute the distribution of zp given the other z’s and sample zp from it (see the sketch below).
See: Snijders & Nowicki, 1997, “Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure”.
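A minimal sketch of such a sampler, assuming Beta priors on the block–block edge probabilities and a Dirichlet prior on the block weights, and treating node pairs as ordered for simplicity; the function name, priors, and defaults are illustrative, not from the slides or the cited paper.

```python
import numpy as np

def gibbs_sbm(A, K, iters=100, a=1.0, b=1.0, alpha=1.0, rng=None):
    """Toy Gibbs sampler for a stochastic block model.

    A: N x N 0/1 adjacency matrix; K: number of blocks.
    Alternates between resampling each node's block assignment zp and
    resampling the block-block edge probabilities P and block weights pi.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = A.shape[0]
    z = rng.integers(K, size=N)              # random initialization of block labels
    P = np.full((K, K), 0.1)                 # initial block-block edge probabilities
    pi = np.full(K, 1.0 / K)                 # initial block weights
    for _ in range(iters):
        for p in range(N):
            logpost = np.log(pi)
            for k in range(K):
                pr = np.delete(P[k, z], p)   # edge prob from block k to every other node's block
                ap = np.delete(A[p], p)      # observed edges from node p (self-loop removed)
                logpost[k] += np.sum(ap * np.log(pr) + (1 - ap) * np.log(1 - pr))
            probs = np.exp(logpost - logpost.max())
            z[p] = rng.choice(K, p=probs / probs.sum())
        for i in range(K):                   # resample P from its Beta posterior
            for j in range(K):
                mask = np.outer(z == i, z == j)
                np.fill_diagonal(mask, False)            # ignore self-pairs
                edges, pairs = A[mask].sum(), mask.sum()
                P[i, j] = rng.beta(a + edges, b + pairs - edges)
        pi = rng.dirichlet(alpha + np.bincount(z, minlength=K))  # resample block weights
    return z, P
```

For example, running gibbs_sbm on a graph drawn from the sample_sbm sketch earlier should typically recover the planted blocks up to a relabeling of the block ids.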

Mixed Membership Stochastic Block models. (Plate diagram: per-node membership vectors, per-pair sender and receiver block indicators zp→· and z·→q, and edge variables apq over the N² node pairs.) Airoldi et al., JMLR 2008.

(Originally from PSK, MLG 2009.)

Another mixed membership block model

Another mixed membership block model. Notation:
z = (zi, zj) is a pair of block ids
nz = #pairs with label z
qz1,i = #links to node i from block z1
qz1,· = #outlinks in block z1
δ = indicator for the diagonal
M = #nodes

Another mixed membership block model

Experiments. Balasubramanyan, Lin, Cohen, NIPS w/s 2010, plus lots of synthetic data.

Experiments Balasubramanyan, Lin, Cohen, NIPS w/s 2010

Experiments Balasubramanyan, Lin, Cohen, NIPS w/s 2010

Other observations… Much faster to learn from a large sparse graph than the MMSB model. Easy to combine with other generative components, e.g., in order to model documents plus graph structure.