Topic Modeling with Network Regularization
Md Mustafizur Rahman


Outline
- Introduction
- Topic Models
- Findings & Ideas
- Methodologies
- Experimental Analysis

Making sense of text
- Suppose you want to learn something about a corpus that is too big to read:
  - What topics are trending today on Twitter? (half a billion tweets daily)
  - What research topics receive grant funding, and from whom? (80,000 active NIH grants)
  - What issues are considered by Congress, and which politicians are interested in which topic? (hundreds of bills each year)
  - Are certain topics discussed more in certain languages on Wikipedia? (Wikipedia is big)
- Why not throw all these documents at the computer and see what interesting patterns it finds?

Preview
- Topic models help you automatically discover patterns in a corpus (unsupervised learning)
- Topic models automatically:
  - group topically related words into "topics"
  - associate tokens and documents with those topics

Twitter topics

So what is a "topic"?
- Loose idea: a grouping of words that are likely to appear in the same context
- A hidden structure that helps determine what words are likely to appear in a corpus
  - e.g., if "war" and "military" appear in a document, you probably won't be surprised to find that "troops" appears later on
  - Why? Not because they are all nouns, but because they all belong to the same topic

You've seen these ideas before
- Most of NLP is about inferring hidden structures that we assume lie behind the observed text, e.g., parts of speech (POS) and syntax trees
- Hidden Markov models (HMMs) for POS tagging:
  - the probability of a word token depends on its hidden state
  - the probability of that token's state depends on the state of the previous token (in a first-order model)
- The states are not observed, but you can infer them with the forward-backward or Viterbi algorithm (see the factorization below)
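For reference, a minimal sketch of the first-order HMM factorization mentioned above, in standard notation (w are word tokens, s their hidden states; this notation is mine, not the slides'):

```latex
P(w_{1:N}, s_{1:N}) \;=\; \prod_{n=1}^{N} P(s_n \mid s_{n-1})\, P(w_n \mid s_n)
```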

Topic models
- Take an HMM, but give every document its own transition probabilities (rather than a single global parameter of the corpus)
- This lets you specify that certain topics are more common in certain documents
  - whereas with parts of speech, you probably assume this does not depend on the specific document
- We also assume the hidden state of a token does not depend on the previous tokens ("0th order")
  - individual documents probably don't have enough data to estimate full transition matrices
  - and our notion of "topic" doesn't care about local interactions

Topic models
- The probability of a token is the joint probability of the word and the topic label:
  P(word = Apple, topic = 1 | θ_d, β_1) = P(word = Apple | topic = 1, β_1) · P(topic = 1 | θ_d)
- Each topic k has a distribution β_k over words (the emission probabilities), global across all documents
- Each document d has a distribution θ_d over topics (the 0th-order "transition" probabilities), local to each document
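In general form, using the θ_d and β_k notation from this slide, the joint probability of a word w with topic label z = k in document d, and the resulting marginal word probability, can be written as:

```latex
P(w, z = k \mid d) \;=\; P(z = k \mid \theta_d)\, P(w \mid \beta_k) \;=\; \theta_{d,k}\, \beta_{k,w},
\qquad
P(w \mid d) \;=\; \sum_{k=1}^{K} \theta_{d,k}\, \beta_{k,w}
```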

Estimating the parameters (θ, β)
- We need to estimate the parameters θ and β
  - we want to pick the parameters that maximize the likelihood of the observed data
- This would be easy if all tokens were labeled with topics (observed variables): just counting
- But we don't actually know the (hidden) topic assignments
- Expectation Maximization (EM):
  1. Compute the expected values of the hidden variables, given the current model parameters
  2. Pretend these expected counts are real and update the parameters based on them; parameter estimation is back to "just counting"
  3. Repeat until convergence
- A small code sketch of this loop follows below
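To make the three EM steps concrete, here is a minimal, self-contained sketch of this loop for a plain PLSA-style model. It is illustrative code written for this summary, not the authors' implementation; the names `plsa_em`, `counts`, `theta`, and `beta` are my own.

```python
# A minimal EM sketch for a PLSA-style topic model, illustrating the
# "expected counts, then just counting" loop described above.
import numpy as np

def plsa_em(counts, num_topics, num_iters=50, seed=0):
    """counts: (num_docs, vocab_size) matrix of word counts c(w, d)."""
    rng = np.random.default_rng(seed)
    num_docs, vocab_size = counts.shape

    # theta[d, k] = P(topic k | doc d), beta[k, w] = P(word w | topic k)
    theta = rng.random((num_docs, num_topics))
    theta /= theta.sum(axis=1, keepdims=True)
    beta = rng.random((num_topics, vocab_size))
    beta /= beta.sum(axis=1, keepdims=True)

    for _ in range(num_iters):
        # E-step: posterior P(z = k | d, w) for every (doc, word) pair,
        # proportional to theta[d, k] * beta[k, w].
        post = theta[:, None, :] * beta.T[None, :, :]       # (D, V, K)
        post /= post.sum(axis=2, keepdims=True) + 1e-12

        # M-step: treat expected counts as real counts and re-normalize.
        expected = counts[:, :, None] * post                # c(w, d) * P(z=k | d, w)
        theta = expected.sum(axis=1)                         # sum over words -> (D, K)
        theta /= theta.sum(axis=1, keepdims=True) + 1e-12
        beta = expected.sum(axis=0).T                        # sum over docs -> (K, V)
        beta /= beta.sum(axis=1, keepdims=True) + 1e-12

    return theta, beta
```

For example, `theta, beta = plsa_em(counts, num_topics=10)` on a document-word count matrix returns per-document topic proportions and per-topic word distributions.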

Topic Models
- Probabilistic Latent Semantic Analysis (PLSA)
- Latent Dirichlet Allocation (LDA)

Probabilistic Latent Semantic Analysis (PLSA)
(plate diagram: document d, topic z, word w; M documents, N_d positions per document; θ_d is the per-document topic distribution)
Generative process:
- Select a document d ~ Mult(·)
- For each position n = 1, ..., N_d:
  - generate a topic z_n ~ Mult(· | θ_d)
  - generate a word w_n ~ Mult(· | β_{z_n})
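The generative story above can be mirrored directly in code; a small illustrative sampler, written for this summary (the helper name and arguments are assumptions, not from the slides):

```python
# Sample a bag of words for one document under the PLSA generative story,
# given theta_d (topic distribution of the document) and beta (word
# distribution per topic).
import numpy as np

def generate_document(theta_d, beta, num_words, rng=None):
    """theta_d: (K,) topic distribution; beta: (K, V) word distributions."""
    if rng is None:
        rng = np.random.default_rng()
    words = []
    for _ in range(num_words):
        z = rng.choice(len(theta_d), p=theta_d)      # z_n ~ Mult(. | theta_d)
        w = rng.choice(beta.shape[1], p=beta[z])     # w_n ~ Mult(. | beta_{z_n})
        words.append(w)
    return words
```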

Parameter estimation in PLSA
- E-step: compute the posterior probability that word w in doc d is generated
  - from topic j
  - from the background model
  (an application of Bayes' rule)
- M-step: re-estimate
  - the mixing weights (document-topic proportions)
  - the word-topic distributions
  using the fractional counts of topic j being used in generating d, and of word w being generated from topic j, summed over all docs in the collection
- The standard update equations are sketched below
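For reference, a standard way to write these updates for PLSA with a fixed background model θ_B and background weight λ_B; the slide shows the equations as images, so the notation here (π_{d,j} for the mixing weights, c(w, d) for word counts, z_{d,w} for the hidden label) is assumed:

```latex
% E-step: posteriors of the hidden label z_{d,w}
P(z_{d,w}=j) = \frac{\pi_{d,j}\, P(w \mid \theta_j)}{\sum_{j'} \pi_{d,j'}\, P(w \mid \theta_{j'})},
\qquad
P(z_{d,w}=B) = \frac{\lambda_B\, P(w \mid \theta_B)}
                    {\lambda_B\, P(w \mid \theta_B) + (1-\lambda_B)\sum_{j} \pi_{d,j}\, P(w \mid \theta_j)}

% M-step: re-estimate mixing weights and word-topic distributions
\pi_{d,j}^{\text{new}} = \frac{\sum_{w} c(w,d)\,\bigl(1-P(z_{d,w}=B)\bigr)\,P(z_{d,w}=j)}
                              {\sum_{j'}\sum_{w} c(w,d)\,\bigl(1-P(z_{d,w}=B)\bigr)\,P(z_{d,w}=j')},
\qquad
P^{\text{new}}(w \mid \theta_j) = \frac{\sum_{d} c(w,d)\,\bigl(1-P(z_{d,w}=B)\bigr)\,P(z_{d,w}=j)}
                                        {\sum_{w'}\sum_{d} c(w',d)\,\bigl(1-P(z_{d,w'}=B)\bigr)\,P(z_{d,w'}=j)}
```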

Likelihood of PLSA
(the slide shows the log-likelihood formula, with annotations pointing out the word-topic distribution β, the document-topic distribution θ_d, and the count of word w in document d)
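The formula being annotated is the standard PLSA log-likelihood, written here in the notation used by the later slides (P(θ_j | d) for the document-topic proportions and P(w | θ_j) for the word-topic distribution, corresponding to θ_d and β in this slide); treat this as a reconstruction, since the slide's equation is an image:

```latex
L(C) \;=\; \sum_{d \in C} \sum_{w \in V} c(w, d)\, \log \sum_{j=1}^{k} P(\theta_j \mid d)\, P(w \mid \theta_j)
```

A background component, as in the E/M sketch above, can additionally be mixed into the inner sum.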

Graph (Revisited)
- A network associated with a text collection C is a graph G = {V, E}, where V is a set of vertices and E is a set of edges
- Each vertex v is associated with a subset of documents D_v
  - in an author graph, a vertex represents an author and D_v is the set of all documents that author published
- An edge {u, v} is a binary relation between two vertices u and v
  - e.g., two authors are connected if they contributed to the same paper/document
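For concreteness, a tiny sketch of building such an author graph from (document, author list) records; the toy data and variable names here are hypothetical, chosen only for illustration:

```python
# Co-authorship graph: vertices are authors, D_v is the set of documents an
# author wrote, and an edge links two authors who share a paper.
from collections import defaultdict
from itertools import combinations

papers = {                      # hypothetical toy data
    "doc1": ["alice", "bob"],
    "doc2": ["bob", "carol"],
    "doc3": ["alice"],
}

docs_per_author = defaultdict(set)   # D_v for each vertex v
edges = set()                        # E: unordered author pairs

for doc, authors in papers.items():
    for author in authors:
        docs_per_author[author].add(doc)
    for u, v in combinations(sorted(authors), 2):
        edges.add((u, v))

print(dict(docs_per_author))   # vertex -> set of documents
print(edges)                   # {('alice', 'bob'), ('bob', 'carol')}
```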

Observation
- Collections of text data often come with a network structure attached, e.g.:
  - author-topic analysis
  - spatial topic analysis

Findings
- In a network like the author-topic graph, vertices that are connected to each other should have similar topic assignments
Idea
- Apply some kind of regularization to the topic model
- Tweak the log-likelihood L(C) of PLSA

Regularized Topic Model
- Start from the likelihood L(C) of PLSA
- Combine it with a graph regularizer into a regularized objective O(C, G) (sketched below)
- Minimizing O(C, G) gives us the topics that best fit the collection C while respecting the network G
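Based on the NetPLSA paper this presentation follows, the regularized objective combines the negated data log-likelihood with a graph regularizer R(C, G), traded off by a weight λ ∈ [0, 1]; since the slide's formula is an image, this reconstruction should be read as a sketch:

```latex
O(C, G) \;=\; (1 - \lambda)\,\bigl(-L(C)\bigr) \;+\; \lambda\, R(C, G)
```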

Regularized Topic Model
- The regularizer R(C, G) is a harmonic function over the graph (shown below)
- f(θ_j, u) is a weighting function of topic θ_j on vertex u
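A sketch of this harmonic regularizer, reconstructed from the description above (w(u, v) denotes the weight of edge {u, v}; for an unweighted graph one can assume w(u, v) = 1):

```latex
R(C, G) \;=\; \frac{1}{2} \sum_{\{u, v\} \in E} w(u, v) \sum_{j=1}^{k} \bigl( f(\theta_j, u) - f(\theta_j, v) \bigr)^2
```

It penalizes neighboring vertices whose topic weights differ, and can equivalently be expressed as a quadratic form involving the graph Laplacian.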

Parameter Estimation
- When λ = 0, minimizing O(C, G) boils down to maximizing the PLSA likelihood L(C)
- So we can simply apply the parameter estimation of PLSA
- E-step: the posterior over topic assignments, as in the PLSA updates shown earlier

Parameter Estimation
- When λ = 0, minimizing O(C, G) boils down to maximizing the PLSA likelihood L(C)
- So we can simply apply the parameter estimation of PLSA
- M-step: re-estimate P(θ_j | d) and P(w | θ_j) from the expected counts, as in the PLSA updates shown earlier

Parameter Estimation (M-Step)
- When λ ≠ 0, we work with the complete expected data likelihood of the regularized objective
- The normalization constraints on P(θ_j | d) and P(w | θ_j) are enforced with Lagrange multipliers
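As a hedged sketch (the slide's equation is an image, so the exact form here is assumed rather than quoted), the quantity optimized in this M-step is the λ-weighted combination of the expected complete-data log-likelihood and the regularizer, with Lagrange multipliers added for the sum-to-one constraints:

```latex
Q \;=\; (1-\lambda) \sum_{d}\sum_{w} c(w,d) \sum_{j} P(z_{d,w}=j)\,
        \log\bigl[ P(\theta_j \mid d)\, P(w \mid \theta_j) \bigr]
        \;-\; \lambda\, R(C, G),
\quad \text{s.t.} \quad \sum_{j} P(\theta_j \mid d) = 1 \;\; \forall d,
\qquad \sum_{w} P(w \mid \theta_j) = 1 \;\; \forall j
```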

Parameter Estimation (M-Step)
- The estimation of P(w | θ_j) does not involve the regularizer
  - the calculation is the same as when λ = 0
- The estimation of P(θ_j | d) does involve the regularizer
  - not the same as when λ = 0; there is no closed form
  - Way 1: apply the Newton-Raphson method
  - Way 2: solve a system of linear equations (see the illustrative sketch below)
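To convey the flavor of the second option, here is a minimal illustrative sketch that treats the linear system as a Jacobi-style smoothing iteration, repeatedly interpolating each document's topic proportions toward the weighted average of its graph neighbors. This is an approximation written for this summary (function and variable names are mine), not the authors' exact update:

```python
# Illustrative graph-smoothing step for the document-topic proportions
# theta[d, k] ~ P(theta_k | d). `adjacency` is a (D, D) symmetric weight
# matrix w(u, v); `lam` plays the role of the regularization weight.
import numpy as np

def smooth_topic_proportions(theta_plsa, adjacency, lam=0.5, num_iters=20):
    theta = theta_plsa.copy()
    degree = adjacency.sum(axis=1, keepdims=True) + 1e-12
    for _ in range(num_iters):
        neighbor_avg = adjacency @ theta / degree           # average over neighbors
        theta = (1.0 - lam) * theta_plsa + lam * neighbor_avg
        theta /= theta.sum(axis=1, keepdims=True)           # keep rows normalized
    return theta
```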

Experimental Analysis
- Two sets of experiments:
  - DBLP author-topic analysis
  - Geographic topic analysis
- Baseline: PLSA
- Datasets:
  - Conference proceedings from 4 conferences (WWW, SIGIR, KDD, NIPS)
  - A blog dataset collected from Google Blog Search

Experimental Analysis

Topical Communities Analysis (Graph Methods)
(figure: community layouts produced by a spring embedder and by Gower metric scaling)

Topical Communities Analysis (Regularized PLSA)

Topic Mapping

Geographical Topic Analysis

Conclusion
- Regularizes a topic model (PLSA) using the network structure of an associated graph
- Develops a method to solve the resulting constrained optimization problem
- Performs extensive experimental analysis, including comparison against plain PLSA

Courtesy
- Some of the slides in this presentation are borrowed from:
  - Prof. Hongning Wang, University of Virginia
  - Prof. Michael Paul, Johns Hopkins University