Topic models for corpora and for graphs

Motivation
Social graphs seem to have
- some aspects of randomness: small diameter, giant connected components, ...
- some structure: homophily, scale-free degree distributions?

More terms
“Stochastic block model”, aka “block-stochastic matrix”:
- Draw n_i nodes in block i.
- With probability p_ij, connect pairs (u, v) where u is in block i and v is in block j.
- Special, simple case: p_ii = q_i, and p_ij = s for all i ≠ j.
A sampling sketch follows.
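To make this concrete, here is a minimal sketch (our own illustrative code, not from the slides; block sizes and probabilities are made-up values) of sampling an undirected graph from this model:

```python
# Sample an undirected graph from a stochastic block model.
import numpy as np

def sample_sbm(block_sizes, P, rng):
    """Connect each pair (u, v) with probability P[i, j], where
    u is in block i and v is in block j."""
    n = sum(block_sizes)
    z = np.repeat(np.arange(len(block_sizes)), block_sizes)  # block ids
    A = np.zeros((n, n), dtype=int)
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < P[z[u], z[v]]:
                A[u, v] = A[v, u] = 1
    return A, z

# The special, simple case from the slide: p_ii = q_i, p_ij = s for i != j.
rng = np.random.default_rng(0)
q, s = [0.3, 0.4], 0.05
P = np.full((2, 2), s)
np.fill_diagonal(P, q)
A, z = sample_sbm([50, 50], P, rng)
```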

Block-stochastic or not? [Figure: football network.]

Block-stochastic or not? [Figures: books network; an artificial graph.]

Stochastic blockmodel graphs
An alternative to spectral methods: can you directly maximize Pr(structure, parameters | data) for a reasonable model?

Outline
- Stochastic block models & the inference question
- Review of text models
  - Mixture of multinomials & EM
  - LDA and Gibbs (or variational EM)
- Block models and inference
- Mixed-membership block models
- Multinomial block models and inference w/ Gibbs

Review – supervised naïve Bayes
Naïve Bayes model, in compact plate representation. [Figure: plate diagrams — class C generating words W_1, W_2, W_3, ..., W_N, with parameters π and β; plates over N words and M documents.]

Review – supervised naïve Bayes
Multinomial naïve Bayes:
- For each document d = 1, ..., M: generate C_d ~ Mult(· | π)
- For each position n = 1, ..., N_d: generate w_n ~ Mult(· | β, C_d)
[Figure: plate diagram with C, W_1 ... W_N, and parameters π, β.]

Review – supervised naïve Bayes
Multinomial naïve Bayes: learning. Maximize the log-likelihood of the observed variables w.r.t. the parameters. This is a convex function, so it has a global optimum, and the solution is in closed form (sketched below).
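The slide elides the closed-form solution; it is the usual maximum-likelihood count ratio. In the slide's notation (π for the class prior, β for the per-class word distributions), and writing n_{d,w} for the count of word w in document d and N_d for the length of d, a sketch:

$$
\hat{\pi}_c = \frac{\sum_{d=1}^{M} \mathbb{1}[C_d = c]}{M},
\qquad
\hat{\beta}_{c,w} = \frac{\sum_{d:\,C_d = c} n_{d,w}}{\sum_{d:\,C_d = c} N_d}
$$

In practice one adds Dirichlet pseudocounts to smooth these estimates.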

Review – unsupervised naïve Bayes
Mixture model: the unsupervised naïve Bayes model. We can still write the joint probability of words and classes, but the classes are not visible: the class becomes a latent variable Z. [Figure: plate diagram with latent Z, words W, parameters π, β.]

Review – unsupervised naïve Bayes
Mixture model: learning.
- The log-likelihood is not a convex function, so there is no guaranteed global optimum.
- Solution: Expectation Maximization (EM), an iterative algorithm that finds a local optimum and is guaranteed to maximize a lower bound on the log-likelihood of the observed data.

Review – unsupervised naïve Bayes
Mixture model: the EM solution alternates an E-step (estimate the distribution over each document's latent class given the observed words and current parameters) and an M-step (re-estimate the parameters from the resulting expected counts). Key capability: estimating the distribution of latent variables given observed variables. A code sketch follows.
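A minimal EM sketch for this mixture of multinomials (our own illustrative code; X is a document-by-word count matrix and K the number of classes):

```python
# EM for a mixture of multinomials (unsupervised naive Bayes).
import numpy as np

def em_mixture_multinomial(X, K, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    M, V = X.shape
    pi = np.full(K, 1.0 / K)                  # class prior
    beta = rng.dirichlet(np.ones(V), size=K)  # per-class word distribution
    for _ in range(n_iter):
        # E-step: responsibilities r[d, k] = Pr(Z_d = k | w_d), in log space.
        log_r = np.log(pi) + X @ np.log(beta).T
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the expected counts.
        pi = r.mean(axis=0)
        beta = r.T @ X + 1e-9                 # small smoothing avoids log(0)
        beta /= beta.sum(axis=1, keepdims=True)
    return pi, beta, r
```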

Review - LDA

Review - LDA
Motivation. Assumptions: (1) documents are i.i.d.; (2) within a document, words are i.i.d. (bag of words).
- For each document d = 1, ..., M: generate θ_d ~ D1(...)
- For each word n = 1, ..., N_d: generate w_n ~ D2(· | θ_d)
Now pick your favorite distributions for D1, D2. [Figure: plate diagram with θ and w; plates over N words and M documents.]

Latent Dirichlet Allocation
“Mixed membership” model:
- For each document d = 1, ..., M: generate θ_d ~ Dir(· | α)
- For each position n = 1, ..., N_d:
  - generate z_n ~ Mult(· | θ_d)
  - generate w_n ~ Mult(· | β_{z_n})
[Figure: plate diagram with α, θ, z, w, and K topic-word distributions β.] A generative code sketch follows.
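A sketch of this generative story in code (ours; the corpus sizes, α, and β are made-up values):

```python
# Generate a tiny corpus from LDA: theta_d ~ Dir(alpha), z_n ~ Mult(theta_d),
# w_n ~ Mult(beta[z_n]).
import numpy as np

rng = np.random.default_rng(0)
K, V, M, N_d = 5, 1000, 20, 100
alpha = np.full(K, 0.1)                   # Dirichlet prior on topic mixtures
beta = rng.dirichlet(np.ones(V), size=K)  # K topic-word distributions

docs = []
for d in range(M):
    theta = rng.dirichlet(alpha)          # per-document topic proportions
    z = rng.choice(K, size=N_d, p=theta)  # a topic for each position
    w = np.array([rng.choice(V, p=beta[k]) for k in z])
    docs.append(w)
```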

LDA’s view of a document

LDA topics

Review - LDA
Parameter learning for LDA:
- Variational EM: numerical approximation using lower bounds; results in biased solutions; convergence has numerical guarantees.
- Gibbs sampling: stochastic simulation; unbiased solutions; stochastic convergence.

Review - LDA
Gibbs sampling:
- Applicable when the joint distribution is hard to evaluate but the conditional distributions are known.
- The sequence of samples forms a Markov chain whose stationary distribution is the joint distribution.
- Key capability: estimating the distribution of one latent variable given the other latent variables and the observed variables.

Why does Gibbs sampling work?
- What's the fixed point? The stationary distribution of the chain is the joint distribution.
- When will it converge (in the limit)? When the graph defined by the chain is connected.
- How long will it take to converge? That depends on the second eigenvalue of the chain's transition matrix (the spectral gap); a tiny numeric illustration follows.
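A tiny numeric illustration (ours, not from the slides): for a two-state chain, the distance to the stationary distribution shrinks geometrically at a rate set by the second-largest eigenvalue modulus.

```python
# Eigenvalues of a Markov transition matrix: the largest is always 1;
# the second-largest modulus governs the convergence (mixing) rate.
import numpy as np

T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
evals = sorted(abs(np.linalg.eigvals(T)), reverse=True)
print(evals)  # [1.0, 0.7] -> error decays roughly like 0.7**t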

This is called “collapsed Gibbs sampling”, since you've marginalized away some of the variables (for LDA, the multinomial parameters θ and β). From: Parameter estimation for text analysis, Gregor Heinrich.
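The resulting conditional, in the standard Griffiths & Steyvers / Heinrich form (η is the topic-word Dirichlet prior, V the vocabulary size, and the counts n^{¬mn} exclude the token being resampled):

$$
P(z_{m,n}=k \mid \mathbf{z}_{\neg mn}, \mathbf{w})
\;\propto\;
\frac{n^{\neg mn}_{k,w_{mn}} + \eta}{n^{\neg mn}_{k,\cdot} + V\eta}
\,\bigl(n^{\neg mn}_{m,k} + \alpha\bigr)
$$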

Review - LDA
Collapsed Gibbs sampling for LDA:
- Randomly initialize each z_{m,n}.
- Repeat for t = 1, ...: for each doc m and word n, compute Pr(z_{m,n} = k | other z's) and sample z_{m,n} according to that distribution.
A code sketch of this sampler appears below.
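A minimal collapsed Gibbs sampler in this style (our own sketch; α and η are the usual Dirichlet hyperparameters and the conditional is the count-ratio form above):

```python
# Collapsed Gibbs sampling for LDA. docs is a list of arrays of word ids.
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    M = len(docs)
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random init
    ndk = np.zeros((M, K)); nkw = np.zeros((K, V)); nk = np.zeros(K)
    for m, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[m][n]
            ndk[m, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for m, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[m][n]
                ndk[m, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1  # remove token
                p = (ndk[m] + alpha) * (nkw[:, w] + eta) / (nk + V * eta)
                k = rng.choice(K, p=p / p.sum())            # resample z_mn
                z[m][n] = k
                ndk[m, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return z, ndk, nkw
```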

Outline
- Stochastic block models & the inference question
- Review of text models
  - Mixture of multinomials & EM
  - LDA and Gibbs (or variational EM)
- Block models and inference
- Mixed-membership block models
- Multinomial block models and inference w/ Gibbs
- Bestiary of other probabilistic graph models: latent-space models, exchangeable graphs, p1, ERGMs

Review - LDA
Motivation. Assumptions: (1) documents are i.i.d.; (2) within a document, words are i.i.d. (bag of words).
- For each document d = 1, ..., M: generate θ_d ~ D1(...)
- For each word n = 1, ..., N_d: generate w_n ~ D2(· | θ_d)
Docs and words are exchangeable. [Figure: plate diagram with θ and w; plates over N words and M documents.]

Stochastic Block models: assume that (1) nodes within a block z, and (2) edges between blocks z_p, z_q, are exchangeable. [Figure: plate diagram with block assignments z_p, z_q, edge indicators a_{pq}, and model parameters; plates over N nodes and N^2 node pairs.]

Stochastic Block models: assume that (1) nodes within a block z, and (2) edges between blocks z_p, z_q, are exchangeable.
Gibbs sampling:
- Randomly initialize z_p for each node p.
- For t = 1, ...: for each node p, compute the distribution over z_p given the other z's, and sample z_p.
See: Snijders & Nowicki, 1997, Estimation and Prediction for Stochastic Blockmodels for Groups with Latent Graph Structure. A code sketch follows.
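A sketch of this Gibbs loop for an undirected SBM (our own illustrative code, assuming Beta(a, b) priors on the link probabilities and a uniform prior over block assignments; it alternates sampling P | z, A and each z_p | z_-p, P, A):

```python
# Gibbs sampling for a stochastic block model on a symmetric 0/1 matrix A
# with zero diagonal.
import numpy as np

def gibbs_sbm(A, K, a=1.0, b=1.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    z = rng.integers(K, size=N)
    for _ in range(n_iter):
        # Sample block-pair link probabilities P[i, j] | z, A
        # (conjugate Beta update from link / non-link counts).
        P = np.empty((K, K))
        for i in range(K):
            for j in range(i, K):
                mi, mj = z == i, z == j
                links = A[np.ix_(mi, mj)].sum()
                pairs = mi.sum() * mj.sum()
                if i == j:  # don't double-count unordered within-block pairs
                    links, pairs = links / 2, mi.sum() * (mi.sum() - 1) / 2
                P[i, j] = P[j, i] = rng.beta(a + links, b + pairs - links)
        # Sample each node's block z_p | z_-p, P, A.
        for p in range(N):
            others = np.arange(N) != p
            edges = A[p, others]
            logp = np.zeros(K)
            for k in range(K):
                pr = P[k, z[others]]
                logp[k] = np.sum(edges * np.log(pr) +
                                 (1 - edges) * np.log(1 - pr))
            w = np.exp(logp - logp.max())
            z[p] = rng.choice(K, p=w / w.sum())
    return z
```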

Mixed Membership Stochastic Block models. [Figure: plate diagram with per-node membership vectors, per-edge block indicators z_{p·} and z_{·q}, and edge indicators a_{pq}; plates over N nodes and N^2 pairs.] Airoldi et al., JMLR 2008.

Mixed Membership Stochastic Block models. [Further figures illustrating the model.]

Parkkinen et al. paper

Another mixed membership block model

Another mixed membership block model. Notation:
- z = (z_i, z_j) is a pair of block ids
- n_z = # of pairs labeled z
- q_{z1,i} = # of links to node i from block z1
- q_{z1,·} = # of outlinks in block z1
- δ = indicator for the diagonal
- M = # of nodes

Another mixed membership block model. [Further figures.]

Experiments
Balasubramanyan, Lin, Cohen, NIPS workshop 2010, plus lots of synthetic data.

Experiments (continued). [Further results from Balasubramanyan, Lin, Cohen, NIPS workshop 2010.]