Slide 1: Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models
Ramesh Nallapati, joint work with John Lafferty, Amr Ahmed, William Cohen and Eric Xing
Machine Learning Department, Carnegie Mellon University
ICDM'07 HPDM workshop, August 28, 2007
Slide 2: Introduction
Statistical topic modeling: an attractive framework for topic discovery
– Completely unsupervised
– Models text very well: lower perplexity compared to unigram models
– Reveals meaningful semantic patterns
– Can help summarize and visualize document collections
– Examples: PLSA, LDA, DPM, DTM, CTM, PA
Slide 3: Introduction
A common assumption in all the variants:
– Exchangeability: the "bag of words" assumption
– Topics represented as a ranked list of words
Consequences:
– Word correlation information is lost, e.g. "white-house" vs. "white" and "house"
– Long-distance correlations are missed
Slide 4: Introduction
Objective:
– To capture correlations between words within topics
Motivation:
– A more interpretable representation of topics as a network of words rather than a list
– Helps better visualize and summarize document collections
– May reveal unexpected relationships and patterns within topics
Slide 5: Past Work: Topic Models
Bigram topic models [Wallach, ICML 2006]
– Require KV(V-1) parameters
– Only capture local dependencies
– Do not model sparsity of correlations
– Do not capture "within-topic" correlations
Slide 6: Past Work: Other Approaches
Hyperspace Analog to Language (HAL) [Lund and Burgess, Cog. Sci., '96]
– Word-pair correlation is measured as a weighted count of the number of times the two words occur within a fixed-length window (sketched below)
– Weight of an occurrence ∝ 1/(mutual distance)
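A minimal sketch of this weighted windowed counting in Python; the 1/distance weighting follows the slide's description, and the window size in the toy run is an assumption:

```python
from collections import defaultdict

def hal_counts(tokens, window=10):
    """Weighted co-occurrence counts in the spirit of HAL:
    every pair of words within `window` tokens of each other
    contributes a weight inversely proportional to their
    mutual distance, per the slide's description."""
    counts = defaultdict(float)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            counts[(w, tokens[j])] += 1.0 / (j - i)
    return counts

# Toy run: note the counts are global co-occurrence, not per-topic
print(hal_counts("the bank approved the check".split(), window=3))
```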
Slide 7: Past Work: Other Approaches
Hyperspace Analog to Language (HAL) [Lund and Burgess, Cog. Sci., '96]
– Plusses: sparse solutions, scalability
– Minuses:
  – Only unearths global correlations, not semantic correlations; e.g. "river – bank" vs. "bank – check"
  – Only local dependencies
Slide 8: Past Work: Other Approaches
Query expansion in IR
– Similar in spirit: finds words that highly co-occur with the query words
– However, it is not a corpus visualization tool: it requires a query context to operate on
WordNet
– Semantic networks
– Human-labeled: not directly related to our goal
Slide 9: Our Approach
L1-norm regularization
– Known to enforce sparse solutions; sparsity permits scalability
– Convex optimization problem: globally optimal solutions
– Recent advances in learning the structure of graphical models: the L1-regularization framework asymptotically recovers the true structure
Slide 10: Background: LASSO
Example: linear regression
Regularization is used to improve generalization:
– E.g. 1: Ridge regression: L2-norm regularization
– E.g. 2: Lasso: L1-norm regularization
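Written out for a design matrix X, response y, and regularization weight λ (standard formulations, filled in here since the slide's formulas did not survive extraction):

```latex
\hat{\beta}_{\mathrm{ridge}} = \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2
\qquad
\hat{\beta}_{\mathrm{lasso}} = \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\|\beta\|_1
```

The only change is the penalty norm, but the corners of the L1 ball at the axes drive many coefficients exactly to zero, which is the sparsity the next slide illustrates.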
Slide 11: Background: LASSO
Lasso encourages sparse solutions.
Slide 12: Background: Gaussian Random Fields
Multivariate Gaussian distribution with random field structure G = (V, E):
– V: the set of all variables {X_1, ..., X_p}
– (s, t) ∈ E ⟺ (Σ⁻¹)_{st} ≠ 0
– X_s ⊥ X_u | X_{N(s)} for every u ∉ N(s)
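Concretely, for a zero-mean multivariate Gaussian the density is governed by the precision matrix Σ⁻¹, and a missing edge is exactly a zero precision entry (a standard identity, stated here to make the slide's notation explicit):

```latex
p(x) \propto \exp\!\left(-\tfrac{1}{2}\, x^{\top}\Sigma^{-1}x\right),
\qquad
(\Sigma^{-1})_{st} = 0 \iff X_s \perp X_t \mid X_{V\setminus\{s,t\}}
```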
Slide 13: Background: Gaussian Random Fields
Estimating the graph structure of a GRF from data [Meinshausen and Buhlmann, Annals of Statistics, 2006]
– Regress each variable onto all the others, imposing an L1 penalty to encourage sparsity
– Estimated neighborhood of a variable: the set of variables with nonzero regression coefficients
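A minimal sketch of this neighborhood-selection procedure using scikit-learn's Lasso; the regularization strength and the zero-coefficient threshold are assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, alpha=0.1):
    """Meinshausen-Buhlmann-style neighborhood selection:
    lasso-regress each variable onto all the others; the
    estimated neighbors are the variables with nonzero
    coefficients (alpha is an assumed regularization strength)."""
    n, p = X.shape
    neighbors = {}
    for s in range(p):
        others = [t for t in range(p) if t != s]
        coef = Lasso(alpha=alpha).fit(X[:, others], X[:, s]).coef_
        neighbors[s] = [t for t, c in zip(others, coef) if abs(c) > 1e-8]
    return neighbors

# Toy data: X2 is a noisy copy of X0, X1 is independent noise
rng = np.random.default_rng(0)
x0, x1 = rng.normal(size=(2, 500))
X = np.column_stack([x0, x1, x0 + 0.1 * rng.normal(size=500)])
print(neighborhood_selection(X))  # expect 0 and 2 to be neighbors
```

The paper then combines the per-node neighborhoods into an undirected graph with either an AND or an OR rule on the two directed estimates.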
Slide 14: Background: Gaussian Random Fields
[Figure: true graph vs. estimated graph; courtesy of Meinshausen and Buhlmann, Annals of Statistics, 2006]
Slide 15: Background: Gaussian Random Fields
Application to topic models: CTM [Blei and Lafferty, NIPS, 2006]
Slide 16: Background: Gaussian Random Fields
Application to CTM [Blei and Lafferty, Annals of Applied Statistics, '07]
Slide 17: Structure Learning of an MRF
Ising model: L1-regularized conditional likelihood learns the true structure asymptotically [Wainwright, Ravikumar and Lafferty, NIPS '06]
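For binary (Ising) variables the node-wise regression becomes an L1-regularized logistic regression; a sketch for a single node, with C (the inverse regularization strength) and the solver choice as assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ising_neighborhood(X, s, C=1.0):
    """Estimate the neighbors of binary variable s by
    L1-regularized logistic regression of X[:, s] on the rest
    (node-wise estimator in the style of Wainwright, Ravikumar
    and Lafferty; C is an assumed regularization setting)."""
    others = [t for t in range(X.shape[1]) if t != s]
    clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
    clf.fit(X[:, others], X[:, s])
    return [t for t, c in zip(others, clf.coef_[0]) if abs(c) > 1e-8]

# Toy data: X2 is X0 with 10% of its bits flipped
rng = np.random.default_rng(1)
x0 = rng.integers(0, 2, size=400)
X = np.column_stack([x0, rng.integers(0, 2, size=400),
                     x0 ^ (rng.random(400) < 0.1)])
print(ising_neighborhood(X, s=2))  # expect [0]
```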
Slide 18: Structure Learning of an MRF
[Figure: structure-recovery results; courtesy of Wainwright, Ravikumar and Lafferty, NIPS '06]
Slide 19: Sparse Word Graphs
Algorithm (a sketch follows below):
– Run LDA on the document collection and obtain topic assignments
– Convert the topic assignments of each document into K binary vectors X: entry w of the vector for topic k is 1 iff word w occurs in the document and is assigned to topic k
– Assume an MRF for each topic, with X as the underlying data
– Apply structure learning for each MRF using L1-regularized conditional likelihood
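A sketch of the whole pipeline under stated assumptions: per-token topic assignments z are taken as given (e.g. from a collapsed Gibbs sampler for LDA), scikit-learn's liblinear solver stands in for the interior-point solver the talk cites, and the dense matrix X is only for clarity, since the real data is very sparse:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sparse_word_graph(docs, z, K, V, C=1.0):
    """Sketch of the Sparse Word Graphs pipeline.
    docs: list of documents, each a list of word ids
    z:    same shape, the topic assigned to each token
    Returns, per topic, the edge set of the estimated word graph."""
    # Step 2: one binary document-by-word matrix per topic
    # (dense here for clarity; the real matrices are very sparse)
    X = np.zeros((K, len(docs), V), dtype=int)
    for d, (words, topics) in enumerate(zip(docs, z)):
        for w, k in zip(words, topics):
            X[k, d, w] = 1
    # Steps 3-4: node-wise L1 logistic regression per topic
    graphs = {}
    for k in range(K):
        edges = set()
        for s in range(V):
            y = X[k][:, s]
            if len(np.unique(y)) < 2:
                continue  # word s never (or always) occurs in topic k
            others = [t for t in range(V) if t != s]
            clf = LogisticRegression(penalty="l1", C=C,
                                     solver="liblinear")
            clf.fit(X[k][:, others], y)
            edges |= {tuple(sorted((s, t)))
                      for t, c in zip(others, clf.coef_[0])
                      if abs(c) > 1e-8}
        graphs[k] = edges
    return graphs
```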
Slide 20: Sparse Word Graphs
Slide 21: Sparse Word Graphs: Scalability
We still run V logistic regression problems, each of size V, for each topic: O(KV^2)!
– However, each example is very sparse
– The L1 penalty results in sparse solutions
– Each topic can be run in parallel
– Efficient interior-point-based L1-regularized logistic regression [Koh, Kim & Boyd, JMLR '07]
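Because the K topic-specific problems share no parameters, the per-topic loop parallelizes trivially; a sketch with joblib (an assumed tooling choice), where fit_topic is a hypothetical wrapper around the per-topic loop of the earlier sketch:

```python
from joblib import Parallel, delayed

# fit_topic(k) is a hypothetical helper wrapping the per-topic
# regressions from the earlier sketch; topics run independently
graphs = Parallel(n_jobs=-1)(delayed(fit_topic)(k) for k in range(K))
```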
Slide 22: Experiments
Small AP corpus: 2.2K docs, 10.5K unique words
– Ran a 10-topic LDA model
– Used λ = 0.1 in the L1-regularized logistic regression
– Took just 45 min. per topic
– Very sparse solutions: under 0.1% of the total number of possible edges are selected
Slide 23: Topic "Business": neighborhood of top LDA terms
Slide 24: Topic "Business": neighborhood of top edges
Slide 25: Topic "War": neighborhood of top LDA terms
Slide 26: Topic "War": neighborhood of top edges
Slide 27: Concluding Remarks
Pros:
– A highly scalable algorithm for capturing within-topic word correlations
– Captures both short-distance and long-distance correlations
– Makes topics more interpretable
Cons:
– Not a complete probabilistic model; building one is a significant modeling challenge since the correlations are latent
Slide 28: Concluding Remarks
Applications of Sparse Word Graphs:
– Better document summarization and visualization
– Word sense disambiguation
– Semantic query expansion
Future work:
– Evaluation on a "real task"
– Build a unified statistical model