Semantic History Embedding in Online Generative Topic Models
Pu Wang (presenter)
Authors: Loulwah AlSumait, Daniel Barbará, Carlotta Domeniconi
Department of Computer Science, George Mason University
SDM 2009
Outline
- Introduction and related work
- Online LDA (OLDA)
  - Parameter generation
  - Sliding history window
  - Contribution weights
- Experiments
- Conclusion and future work
Introduction
- When a topic is observed at a certain time, it is more likely to appear in the future.
- Previously discovered topics hold important information about the underlying structure of the data.
- Incorporating such information into future knowledge discovery can enhance the inferred topics.
Related Work
- Q. Sun, R. Li et al. (ACL): an LDA-based Fisher kernel to measure the semantic similarity between blocks of documents.
- X. Wang et al. (ICDM 2007): a Topical N-Gram model that automatically identifies feasible n-grams based on the context that surrounds them.
- X. Phan et al. (IW3C): a classifier trained on a small set of labeled documents combined with an LDA topic model estimated from Wikipedia.
Tracking Topics: Online LDA (OLDA)
[Figure: OLDA plate diagram over consecutive time slices. At time t, a stream S_t of M documents (N_d words each) is modeled with K topics over assignments z and words w; the time between t and t+1 is ε. The inferred model at time t feeds three components: Topic Evolution Tracking, Priors Construction, and Emerging Topic Detection (which outputs an emerging topic list). The constructed priors initialize the model for stream S_{t+1}.]
Inference Process
- Parameter generation combines the current stream with historic observations.
- This reduces learning at each time slice to a simple inference problem, solved with Gibbs sampling.
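To make the inference step concrete, here is a minimal collapsed Gibbs sampler for LDA, the kind of inference OLDA runs on each stream. This is an illustrative sketch, not the authors' implementation (they used the Matlab Topic Modeling Toolbox); all names are ours. Note the scalar `beta` here: OLDA's semantic embedding replaces it with a K x V prior matrix built from the history.

```python
import numpy as np

def gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, V).
    In OLDA, the scalar `beta` would become a K x V matrix of
    history-derived priors, one row per topic.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # document-topic counts
    nkw = np.zeros((K, V))           # topic-word counts
    nk = np.zeros(K)                 # tokens per topic
    z = []                           # topic assignment per token
    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    # Resample each token's topic from its full conditional.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    phi = (nkw + beta) / (nk[:, None] + V * beta)                  # K x V
    theta = (ndk + alpha) / (ndk.sum(1, keepdims=True) + K * alpha)  # M x K
    return phi, theta
```

Each row of `phi` is a topic's distribution over the vocabulary; it is these rows that OLDA stores in its evolution matrices.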
Topic Evolution Tracking
- Topic alignment over time
- Handles changes in lexicon and topic drift
Example of aligned topics over time, showing P(topic) and P(word|topic):
- Time t: Topic 1 (0.65): bank (0.44), money (0.35), loan (0.21); Topic 2 (0.35): factory (0.53), production (0.34), labor (0.13)
- Time t+1: Topic 1 (0.43): bank (0.5), credit (0.32), money (0.18); Topic 2 (0.57): factory (0.48), cost (0.32), manufacturing (0.2)
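One simple way to realize the alignment the slide illustrates is to match each topic at time t+1 to its most similar topic at time t by cosine similarity of the topic-word distributions. This matching criterion is our illustrative choice, not necessarily the one used in the paper (in OLDA topics also stay aligned implicitly because each topic's prior is derived from its own history).

```python
import numpy as np

def align_topics(phi_old, phi_new):
    """Match each new topic to its closest old topic (illustrative sketch).

    phi_old, phi_new: K x V topic-word matrices at times t and t+1.
    Returns, for each new topic, the index of the best-matching old topic.
    """
    a = phi_old / np.linalg.norm(phi_old, axis=1, keepdims=True)
    b = phi_new / np.linalg.norm(phi_new, axis=1, keepdims=True)
    sim = b @ a.T                 # K_new x K_old cosine similarities
    return sim.argmax(axis=1)     # best old topic per new topic
```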
Sliding History Window
- Consider all topic-word distributions within a "sliding history window" of size δ
- Alternatives for keeping track of history at time t:
  - Full memory: δ = t
  - Short memory: δ = 1
  - Intermediate memory: δ = c
[Figure: evolution matrix per topic, tracking each topic's distribution over the dictionary across time.]
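A sketch of the window bookkeeping, assuming the history is kept as the last δ topic-word matrices (one per stream) and that each topic's evolution matrix stacks its past word distributions as columns. Names and data layout are illustrative.

```python
import numpy as np
from collections import deque

def make_history_buffer(delta):
    """Sliding window of the last `delta` topic-word matrices.

    delta = t gives full memory, delta = 1 short memory,
    delta = c intermediate memory; with maxlen, models older
    than the window fall out automatically.
    """
    return deque(maxlen=delta)

def push_model(buffer, phi):
    """Append the K x V topic-word matrix inferred at the current time."""
    buffer.append(phi)

def evolution_matrices(buffer, K):
    """Per-topic evolution matrix B_k: V x delta, one column per time slice."""
    return [np.column_stack([phi[k] for phi in buffer]) for k in range(K)]
```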
Contribution Control
- Evolution-tuning parameters ω: individual weights of past models
  - Decaying history: ω_1 < ω_2 < … < ω_δ
  - Equal contributions: ω_1 = ω_2 = … = ω_δ
- Total weight of history (vs. weight of new observations):
  - Balanced weights (sum = 1)
  - Biased toward the past (sum > 1)
  - Biased toward the future (sum < 1)
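The two weighting schemes and the total-weight control can be sketched as a small helper, assuming a decaying scheme with linearly increasing weights toward the present (the paper only requires ω_1 < … < ω_δ; the linear profile is our choice) and a `total` argument setting sum(ω).

```python
import numpy as np

def history_weights(delta, scheme="equal", total=1.0):
    """Weight vector omega of length delta (illustrative sketch).

    scheme="equal":  omega_1 = ... = omega_delta
    scheme="decay":  omega_1 < ... < omega_delta (recent models weigh more)
    total: sum(omega); 1 = balanced, >1 biased toward the past,
           <1 biased toward new observations.
    """
    if scheme == "equal":
        w = np.ones(delta)
    elif scheme == "decay":
        w = np.arange(1, delta + 1, dtype=float)  # grows toward the present
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return total * w / w.sum()
```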
Parameter Generation
- The priors of the topic distribution over words at time t+1 are generated from the history: each topic's prior is a weighted combination of that topic's past word distributions in its evolution matrix, with weights ω.
- The topic distributions for the new stream are then inferred under these priors.
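Under the definitions above, the generation step amounts to one matrix-vector product per topic, β_k^{t+1} = B_k^t ω, where B_k^t is topic k's V x δ evolution matrix and ω the length-δ weight vector. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def semantic_beta(evolution_mats, omega):
    """Semantic Dirichlet priors for time t+1 (illustrative sketch).

    evolution_mats: list of K matrices B_k, each V x delta.
    omega: length-delta weight vector.
    Returns a K x V matrix of priors, one row per topic, to be used
    in place of the fixed scalar beta in the next Gibbs sampling run.
    """
    return np.vstack([B_k @ omega for B_k in evolution_mats])
```

With sum(ω) = 1, each row of the result is itself a distribution over the vocabulary, so the prior's total mass directly encodes the bias toward past versus new observations.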
Experimental Design
- Implementation: "Matlab Topic Modeling Toolbox" by Mark Steyvers and Tom Griffiths
- Datasets:
  - NIPS Proceedings: 1,740 papers; 13,649 unique words; 2,301,375 word tokens; 13 streams of 90 to 250 documents each
  - Reuters News, 26-FEB-1987 to 19-OCT-1987: 10,337 documents; 12,112 unique words; 793,936 word tokens; 30 streams (29 streams of 340 documents, 1 of 517)
- Baselines:
  - OLDA_fixed: no memory
  - OLDA (ω(1)): short memory
- Performance evaluation measure: perplexity, on the documents of the next year or stream as the test set
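Perplexity on a held-out stream is exp of the negative average per-word log-likelihood. The sketch below scores test documents with the trained topic-word matrix `phi` and a uniform topic mixture; in the actual evaluation the document-topic mixtures would be inferred for the test documents, so this shortcut is only illustrative.

```python
import numpy as np

def perplexity(test_docs, phi):
    """Per-word perplexity of held-out documents (illustrative sketch).

    test_docs: list of documents as lists of word ids.
    phi: K x V trained topic-word matrix.
    Uses a uniform theta per document as a simplification.
    """
    K = phi.shape[0]
    theta = np.full(K, 1.0 / K)
    log_lik, n_words = 0.0, 0
    for doc in test_docs:
        for w in doc:
            log_lik += np.log(theta @ phi[:, w])  # p(w) under the mixture
            n_words += 1
    return np.exp(-log_lik / n_words)
```

Lower is better: a model assigning uniform probability over a V-word vocabulary scores exactly V.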
Reuters: OLDA with Fixed β vs. OLDA with Semantic β
[Figure: perplexity over streams, comparing the no-memory fixed-β baseline against OLDA with semantic β.]
Reuters: OLDA with Different Window Sizes and Weights
- Increasing the window size enhanced prediction
- Incremental history information (δ > 1, sum > 1) did not improve topic estimation at all
[Figure: perplexity curves for increasing window size, short memory, equal contributions, and incremental history information.]
NIPS: OLDA with Different Window Sizes
- Increasing the window size enhanced prediction w.r.t. short memory
- Window sizes greater than 3 enhanced prediction
[Figure: perplexity curves for no memory, short memory, and larger windows, including the effect of the total weight.]
NIPS: OLDA with Different Total Weights
- Models with a lower total weight resulted in better prediction
[Figure: perplexity curves for no memory, sum of weights = 1, and decreasing sums of weights.]
NIPS & Reuters: OLDA with Different Total Weights (δ = 2)
[Figure: perplexity as the total sum of weights sum(ω) is increased or decreased.]
NIPS: OLDA with Equal vs. Decaying History Contributions
[Figure: perplexity comparison of the equal and decaying weighting schemes.]
Conclusions
- Studied the effect of embedding semantic information in LDA topic modeling of text streams
- Parameter generation based on topical structures inferred in the past
- Semantic embedding enhances OLDA prediction
- Evaluated the effects of the total influence of history, the history window size, and equal vs. decaying contributions
Future Work
- Use of prior knowledge
- Effect of embedded historic semantics on detecting emerging and/or periodic topics