Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models
Ramesh Nallapati
Joint work with John Lafferty, Amr Ahmed, William Cohen and Eric Xing
Machine Learning Department, Carnegie Mellon University
ICDM'07 HPDM workshop, 8/28/2007

Slide 2/28: Introduction
Statistical topic modeling: an attractive framework for topic discovery
– Completely unsupervised
– Models text very well: lower perplexity compared to unigram models
– Reveals meaningful semantic patterns
– Can help summarize and visualize document collections
– e.g.: PLSA, LDA, DPM, DTM, CTM, PA

Slide 3/28: Introduction
A common assumption in all the variants:
– Exchangeability: the "bag of words" assumption
– Topics represented as a ranked list of words
Consequences:
– Word correlation information is lost
  – e.g.: "white-house" vs. "white" and "house"
  – Long-distance correlations

Slide 4/28: Introduction
Objective:
– To capture correlations between words within topics
Motivation:
– A more interpretable representation of topics, as a network of words rather than a list
– Helps better visualize and summarize document collections
– May reveal unexpected relationships and patterns within topics

Slide 5/28: Past Work: Topic Models
Bigram topic models [Wallach, ICML 2006]
– Requires KV(V-1) parameters
– Only captures local dependencies
– Does not model sparsity of correlations
– Does not capture "within-topic" correlations

Slide 6/28: Past Work: Other Approaches
Hyperspace Analog to Language (HAL) [Lund and Burgess, Cog. Sci., '96]
– Word-pair correlation is measured as a weighted count of the number of times the two words occur within a fixed-length window
– Weight of an occurrence ∝ 1/(mutual distance)
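As a concrete illustration of that weighting scheme, here is a minimal sketch (assuming a pre-tokenized document and an illustrative window size of 5; the function name and parameters are hypothetical, not from the original talk):

```python
from collections import defaultdict

def hal_weights(tokens, window=5):
    """HAL-style co-occurrence weights: every ordered pair of words that
    appears within `window` positions contributes 1/(mutual distance)."""
    weights = defaultdict(float)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            weights[(w, tokens[j])] += 1.0 / (j - i)
    return weights

# Toy usage on a single "document"
print(hal_weights("the white house issued a statement today".split()))
```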

Slide 7/28: Past Work: Other Approaches
Hyperspace Analog to Language (HAL) [Lund and Burgess, Cog. Sci., '96]
– Plusses: sparse solutions, scalability
– Minuses:
  – Only unearths global correlations, not semantic correlations, e.g.: "river – bank", "bank – check"
  – Only local dependencies

Slide 8/28: Past Work: Other Approaches
Query expansion in IR
– Similar in spirit: finds words that highly co-occur with the query words
– However, not a corpus visualization tool: requires a context to operate on
WordNet
– Semantic networks
– Human-labeled: not directly related to our goal

Slide 9/28: Our Approach
L1-norm regularization
– Known to enforce sparse solutions
  – Sparsity permits scalability
– Convex optimization problem
  – Globally optimal solutions
– Recent advances in learning the structure of graphical models: the L1 regularization framework asymptotically leads to the true structure

Slide 10/28: Background: LASSO
Example: linear regression
Regularization used to improve generalizability
– E.g. 1: Ridge regression: L2-norm regularization
– E.g. 2: LASSO: L1-norm regularization

Slide 11/28: Background: LASSO
LASSO encourages sparse solutions
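A small scikit-learn illustration of the contrast drawn on these two slides (a sketch only; the alpha values and problem sizes are arbitrary): on the same synthetic regression problem, the L1 penalty drives most coefficients exactly to zero, while the L2 penalty only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic problem in which only 5 of 50 features actually matter
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2-norm regularization
lasso = Lasso(alpha=1.0).fit(X, y)   # L1-norm regularization

print("nonzero coefficients (ridge):", int(np.sum(ridge.coef_ != 0)))  # typically all 50
print("nonzero coefficients (lasso):", int(np.sum(lasso.coef_ != 0)))  # typically close to 5
```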

Slide 12/28: Background: Gaussian Random Fields
Multivariate Gaussian distribution
Random field structure: G = (V, E)
– V: the set of all variables {X_1, …, X_p}
– (s, t) ∈ E ⟺ (Σ⁻¹)_st ≠ 0
– X_s ⊥ X_u | X_N(s) for all u ∉ N(s)
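The formulas behind these bullets did not survive extraction; the standard statements are (reconstructed here, not copied from the slide):

```latex
p(x) \propto \exp\!\Big(-\tfrac{1}{2}\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\Big),
\qquad
(s,t)\notin E \;\Longleftrightarrow\; (\Sigma^{-1})_{st}=0
\;\Longleftrightarrow\; X_s \perp X_t \mid X_{V\setminus\{s,t\}} .
```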

Slide 13/28: Background: Gaussian Random Fields
Estimating the graph structure of a GRF from data [Meinshausen and Buhlmann, Annals. Stats., 2006]
– Regress each variable onto the others, imposing an L1 penalty to encourage sparsity
– Estimated neighborhood of a variable: the set of variables whose regression coefficients are nonzero
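A compact sketch of that neighborhood-selection procedure, assuming scikit-learn's Lasso (the alpha value and the zero threshold are illustrative choices, not taken from the paper):

```python
import numpy as np
from sklearn.linear_model import Lasso

def estimate_neighborhoods(X, alpha=0.1, tol=1e-10):
    """X: (n_samples, p) data matrix sampled from the GRF.
    Regress each variable on all the others with an L1 penalty and
    return {variable index: set of indices with nonzero coefficients}."""
    p = X.shape[1]
    neighborhoods = {}
    for s in range(p):
        others = np.delete(np.arange(p), s)
        fit = Lasso(alpha=alpha).fit(X[:, others], X[:, s])
        neighborhoods[s] = {int(t) for t in others[np.abs(fit.coef_) > tol]}
    return neighborhoods
```

The per-variable neighborhoods are then symmetrized into an undirected graph, e.g. by keeping an edge (s, t) when either endpoint (or, more conservatively, when both endpoints) selects the other.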

Slide 14/28: Background: Gaussian Random Fields
(Figure: true graph vs. estimated graph)
Courtesy: [Meinshausen and Buhlmann, Annals. Stats., 2006]

Slide 15/28: Background: Gaussian Random Fields
Application to topic models: CTM [Blei and Lafferty, NIPS, 2006]

Slide 16/28: Background: Gaussian Random Fields
Application to CTM: [Blei & Lafferty, Annals. Appl. Stats., '07]

Slide 17/28: Structure Learning of an MRF
Ising model
L1-regularized conditional likelihood learns the true structure asymptotically [Wainwright, Ravikumar and Lafferty, NIPS '06]
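Concretely, "L1-regularized conditional likelihood" here means fitting, for each node s, an L1-penalized logistic regression of X_s on the remaining variables; in the usual ±1 encoding the per-node objective can be written as (notation mine, not from the slide):

```latex
\hat{\theta}_{\setminus s}
  = \arg\min_{\theta}\;
    \frac{1}{n}\sum_{i=1}^{n}
      \log\!\Big(1+\exp\big(-x^{(i)}_{s}\,\langle \theta,\, x^{(i)}_{\setminus s}\rangle\big)\Big)
    \;+\; \lambda\,\|\theta\|_{1},
\qquad
\hat{N}(s) = \{\, t : \hat{\theta}_{st}\neq 0 \,\}.
```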

Slide 18/28: Structure Learning of an MRF
Courtesy: [Wainwright, Ravikumar and Lafferty, NIPS '06]

Slide 19/28: Sparse Word Graphs
Algorithm:
– Run LDA on the document collection and obtain topic assignments
– Convert the topic assignments for each document into K binary vectors X, one per topic, indicating which words were assigned to that topic in the document
– Assume an MRF for each topic with X as the underlying data
– Apply structure learning for the MRF using L1-regularized conditional likelihood (see the sketch below)
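A rough end-to-end sketch of this pipeline (not the authors' implementation): it assumes the LDA step has already produced, for each document, a list of (word id, topic id) assignments, and it substitutes scikit-learn's L1-penalized logistic regression for the interior-point solver used in the talk; the C value and zero threshold are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sparse_word_graphs(topic_assignments, V, K, C=1.0, tol=1e-10):
    """topic_assignments: one list of (word_id, topic_id) pairs per document,
    as produced by an LDA run. Returns a list of K edge sets, one per topic."""
    D = len(topic_assignments)

    # Step 1: K binary document-by-word matrices:
    # X[k][d, w] = 1 iff word w was assigned to topic k somewhere in document d.
    X = [np.zeros((D, V), dtype=int) for _ in range(K)]
    for d, doc in enumerate(topic_assignments):
        for w, k in doc:
            X[k][d, w] = 1

    # Step 2: per topic, regress each word's indicator on all the others with
    # an L1 penalty; nonzero coefficients become edges of that topic's graph.
    graphs = []
    for k in range(K):
        edges = set()
        for w in range(V):
            y = X[k][:, w]
            if y.min() == y.max():      # word never (or always) assigned to topic k
                continue
            others = np.delete(np.arange(V), w)
            clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
            clf.fit(X[k][:, others], y)
            for t in others[np.abs(clf.coef_[0]) > tol]:
                edges.add(tuple(sorted((w, int(t)))))
        graphs.append(edges)
    return graphs
```

Each of the K outer iterations is independent, which is what makes the per-topic parallelization mentioned on the scalability slide possible.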

Slide 20/28: Sparse Word Graphs

Slide 21/28: Sparse Word Graphs: Scalability
We still run V logistic regression problems, each of size V, for each topic: O(KV²)!
– However, each example is very sparse
– The L1 penalty results in sparse solutions
– Each topic can be run in parallel
– Efficient interior-point based L1-regularized logistic regression [Koh, Kim & Boyd, JMLR '07]

Slide 22/28: Experiments
Small AP corpus
– 2.2K docs, 10.5K unique words
Ran a 10-topic LDA model
Used a regularization parameter of 0.1 in the L1 logistic regression
Took just 45 min. per topic
Very sparse solutions
– Fewer than 0.1% of the total number of possible edges are selected

Slide 23/28: Topic "Business": neighborhood of top LDA terms

Slide 24/28: Topic "Business": neighborhood of top edges

Slide 25/28: Topic "War": neighborhood of top LDA terms

Slide 26/28: Topic "War": neighborhood of top edges

Slide 27/28: Concluding Remarks
Pros:
– A highly scalable algorithm for capturing within-topic word correlations
– Captures both short-distance and long-distance correlations
– Makes topics more interpretable
Cons:
– Not a complete probabilistic model
  – A significant modeling challenge, since the correlations are latent

Slide 28/28: Concluding Remarks
Applications of Sparse Word Graphs:
– A better document summarization and visualization tool
– Word sense disambiguation
– Semantic query expansion
Future work:
– Evaluation on a "real task"
– Build a unified statistical model