A Probabilistic Model for Bursty Topic Discovery in Microblogs
Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng
CAS Key Laboratory of Web Data Science and Technology

Similar presentations
Alexander Kotov, ChengXiang Zhai, Richard Sproat University of Illinois at Urbana-Champaign.
Topic models Source: Topic models, David Blei, MLSS 09.
Face Alignment by Explicit Shape Regression
Mixture Models and the EM Algorithm
One Theme in All Views: Modeling Consensus Topics in Multiple Contexts Jian Tang 1, Ming Zhang 1, Qiaozhu Mei 2 1 School of EECS, Peking University 2 School.
1.Accuracy of Agree/Disagree relation classification. 2.Accuracy of user opinion prediction. 1.Task extraction performance on Bing web search log with.
Named Entity Recognition in Query Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li (ACM SIGIR 2009) Speaker: Yi-Lin Hsu Advisor: Dr. Koh, Jia-ling Date: 2009/11/16.
Statistical Topic Modeling part 1
A Joint Model of Text and Aspect Ratings for Sentiment Summarization Ivan Titov (University of Illinois) Ryan McDonald (Google Inc.) ACL 2008.
Systems Engineering and Engineering Management The Chinese University of Hong Kong Parameter Free Bursty Events Detection in Text Streams Gabriel Pui Cheong.
Generative Topic Models for Community Analysis
Integrating Bayesian Networks and Simpson’s Paradox in Data Mining Alex Freitas University of Kent Ken McGarry University of Sunderland.
Caimei Lu et al. (KDD 2010) Presented by Anson Liang.
Topic Modeling with Network Regularization Md Mustafizur Rahman.
Mining Cross-network Association for YouTube Video Promotion Ming Yan Institute of Automation, Chinese Academy of Sciences May 15, 2014.
TwitterSearch : A Comparison of Microblog Search and Web Search
«Tag-based Social Interest Discovery» Proceedings of the 17th International World Wide Web Conference (WWW2008) Xin Li, Lei Guo, Yihong Zhao Yahoo! Inc.,
Rui Yan, Yan Zhang Peking University
Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Yong-Joong Kim Dept. of Computer Science Yonsei.
Dongyeop Kang1, Youngja Park2, Suresh Chari2
Semantic History Embedding in Online Generative Topic Models Pu Wang (presenter) Authors: Loulwah AlSumait Daniel Barbará
Tag Clouds Revisited Date: 2011/12/12 Source: CIKM’11 Speaker: I-Chih Chiu Advisor: Dr. Koh, Jia-ling.
Topic Models in Text Processing IR Group Meeting Presented by Qiaozhu Mei.
Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections Xuan-Hieu Phan, Le-Minh Nguyen, Susumu Horiguchi GSIS,
Newsjunkie: Providing Personalized Newsfeeds via Analysis of Information Novelty Gabrilovich et al. WWW2004.
Query Routing in Peer-to-Peer Web Search Engine Speaker: Pavel Serdyukov Supervisors: Gerhard Weikum Christian Zimmer Matthias Bender International Max.
Cognitive Computer Vision Kingsley Sage and Hilary Buxton Prepared under ECVision Specific Action 8-3
Mining Cross-network Association for YouTube Video Promotion Ming Yan, Jitao Sang, Changsheng Xu*. 1 Institute of Automation, Chinese Academy of Sciences,
1 University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:
Universität Dortmund, LS VIII
Chengjie Sun, Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology.
Microblogs: Information and Social Network Huang Yuxin.
Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. Visit for
Style & Topic Language Model Adaptation Using HMM-LDA Bo-June (Paul) Hsu, James Glass.
Wei Feng, Jiawei Han, Jianyong Wang, Charu Aggarwal, Jianbin Huang
Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute School of Computer Science.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Estimating Component Availability by Dempster-Shafer Belief Networks Lan Guo Lane.
Cognitive Processes Chapter 8. Studying Cognition, Language Use, Visual Cognition, Problem Solving and Reasoning, Judgment and Decision Making, Recapping Main Points.
Building Topic Models in a Federated Digital Library Through Selective Document Exclusion ASIST 2011 New Orleans, LA October 10, 2011 Miles Efron Peter.
Anant Pradhan PET: A Statistical Model for Popular Events Tracking in Social Communities Cindy Xide Lin, Bo Zhao, Qiaozhu Mei, Jiawei Han (UIUC)
Latent Dirichlet Allocation D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3: , January Jonathan Huang
Artificial Intelligence 8. Supervised and unsupervised learning Japan Advanced Institute of Science and Technology (JAIST) Yoshimasa Tsuruoka.
Introduction to LDA Jinyang Gao. Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter.
Institute of Computing Technology, Chinese Academy of Sciences 1 A Unified Framework of Recommending Diverse and Relevant Queries Speaker: Xiaofei Zhu.
Topic Models Presented by Iulian Pruteanu Friday, July 28 th, 2006.
What Is Text Mining? Also known as Text Data Mining Process of examining large collections of unstructured textual resources in order to generate new.
1 A Biterm Topic Model for Short Texts Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng Institute of Computing Technology, Chinese Academy of Sciences.
Named Entity Recognition in Query Jiafeng Guo 1, Gu Xu 2, Xueqi Cheng 1,Hang Li 2 1 Institute of Computing Technology, CAS, China 2 Microsoft Research.
Local Linear Matrix Factorization for Document Modeling Institute of Computing Technology, Chinese Academy of Sciences Lu Bai,
More Than Relevance: High Utility Query Recommendation By Mining Users' Search Behaviors Xiaofei Zhu, Jiafeng Guo, Xueqi Cheng, Yanyan Lan Institute of.
Link Distribution on Wikipedia [0407]KwangHee Park.
Automatic Labeling of Multinomial Topic Models
A Novel Relational Learning-to-Rank Approach for Topic-focused Multi-Document Summarization Yadong Zhu, Yanyan Lan, Jiafeng Guo, Pan Du, Xueqi Cheng Institute.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
Enhanced hypertext categorization using hyperlinks Soumen Chakrabarti (IBM Almaden) Byron Dom (IBM Almaden) Piotr Indyk (Stanford)
Modeling and Visualizing Information Propagation in Microblogging Platforms Chien-Tung Ho, Cheng-Te Li, and Shou-De Lin National Taiwan University ASONAM.
Reinforcement Learning for Mapping Instructions to Actions S.R.K. Branavan, Harr Chen, Luke S. Zettlemoyer, Regina Barzilay Computer Science and Artificial.
Harbin Institute of Technology Information Retrieval Lab: HITIR’s Update Summary at TAC2008 Extractive Content Selection Using Evolutionary Manifold-ranking and Spectral Clustering Reporter: Ph.d.
Using Blog Properties to Improve Retrieval Gilad Mishne (ICWSM 2007)
Predictive Automatic Relevance Determination by Expectation Propagation Y. Qi T.P. Minka R.W. Picard Z. Ghahramani.
Topic Modeling for Short Texts with Auxiliary Word Embeddings
Extracting Mobile Behavioral Patterns with the Distant N-Gram Topic Model Lingzi Hong Feb 10th.
Artificial Intelligence
Advanced Artificial Intelligence Evolutionary Search Algorithm
CS 188: Artificial Intelligence
Web Mining Department of Computer Science and Engg.
Michal Rosen-Zvi University of California, Irvine
Junghoo “John” Cho UCLA
Jinwen Guo, Shengliang Xu, Shenghua Bao, and Yong Yu
Presentation transcript:

1 A Probabilistic Model for Bursty Topic Discovery in Microblogs
Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng
CAS Key Laboratory of Web Data Science and Technology

2 Bursty Topics in Microblogs
Bursty topics: novel topics attracting wide interest, such as hot events, activities and discussions.
Bursty topics carry valuable information for:
- public opinion analysis
- business intelligence
- news clue tracking
- message recommendation
- ...

3 Problems & Challenges
Microblog posts are very short
- Conventional topic models (e.g., LDA and PLSA) are not effective over short texts
- How to discover topics in such short texts?
Microblog posts are very diverse and noisy
- Lots of pointless babble, daily chatting and other non-bursty content
- How to distinguish bursty topics from other topics?

4 Our Work
We propose a probabilistic model that solves the two challenges in a principled and effective way.
How to learn topics over short texts?
- Exploit the rich global word co-occurrence to learn topics (following our previous work, the biterm topic model)
How to distinguish bursty topics from non-bursty content?
- Exploit the burstiness of biterms as prior knowledge for bursty topic discovery

5 Biterm Topic Model (BTM) for Short Texts (Yan et al., WWW'13)
A biterm is a co-occurring word pair drawn from a short document.
BTM can better learn topics over short texts:
- It directly models the word co-occurrences (biterms)
- It fully exploits the rich global word co-occurrence to overcome the data sparsity problem in short documents
But BTM learns general topics rather than bursty topics.
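To make the biterm notion concrete, here is a minimal sketch of how biterms (unordered co-occurring word pairs) can be extracted from a short post; the tokenisation and the function name are illustrative assumptions, not the authors' released code.

```python
from itertools import combinations

def extract_biterms(doc):
    """Return all unordered co-occurring word pairs (biterms) in one short document."""
    words = doc.lower().split()
    # In a short post, every pair of word positions forms a biterm,
    # so the whole post acts as a single co-occurrence context.
    return [tuple(sorted(pair)) for pair in combinations(words, 2)]

print(extract_biterms("world cup final tonight"))
# [('cup', 'world'), ('final', 'world'), ('tonight', 'world'), ('cup', 'final'), ...]
```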

6 Observations
- A bursty biterm (e.g., "world cup") is more likely to be generated from some bursty topic (e.g., a topic about the 2014 World Cup)
- A non-bursty biterm (e.g., "good day") is less likely to be generated from any bursty topic

7 Bursty Probability of a Biterm
How likely is it that a biterm will be generated from some bursty topic?
(Equation on the slide: a biterm's actual count in a time slice is decomposed into a count attributed to non-bursty topics and a count attributed to bursty topics, and the bursty probability is estimated from this decomposition.)
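As a rough illustration of this idea (not the exact estimator from the paper), the sketch below compares a biterm's observed count in the current time slice with an expected background count and treats the excess as the part generated by bursty topics; the background estimate and the smoothing constant are assumptions.

```python
def bursty_probability(count_now, counts_history, eps=1.0):
    """Sketch: fraction of a biterm's current count that exceeds its expected
    (non-bursty) count, read as the probability that the biterm is bursty."""
    expected = sum(counts_history) / max(len(counts_history), 1)  # count in non-bursty topics
    excess = max(count_now - expected, 0.0)                       # count in bursty topics
    return (excess + eps) / (count_now + 2 * eps)                 # smoothed ratio

# "world cup" spikes in the current slice; "good day" stays flat.
print(bursty_probability(120, [5, 8, 6]))    # close to 1 -> likely bursty
print(bursty_probability(10, [9, 11, 10]))   # close to 0 -> likely non-bursty
```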

8 Bursty Biterm Topic Model (BBTM)
(Graphical model on the slide: each biterm carries a topic-type indicator that switches between non-bursty and bursty topics, with the biterm's bursty probability as its prior.)
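A minimal sketch of the generative story the model figure depicts, assuming a single background topic, K bursty topics, and a per-biterm topic-type indicator whose prior is the bursty probability; the variable names and the single background distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_biterm(eta_b, theta, phi_bursty, phi_background):
    """Sketch of the generative story for one biterm.
    eta_b: bursty probability of the biterm (prior on the topic-type indicator)
    theta: distribution over the K bursty topics
    phi_bursty: K x V word distributions of the bursty topics
    phi_background: length-V word distribution of the background topic
    """
    is_bursty = rng.random() < eta_b              # draw the topic-type indicator
    if is_bursty:
        k = rng.choice(len(theta), p=theta)       # pick a bursty topic
        phi = phi_bursty[k]
    else:
        k, phi = None, phi_background             # fall back to the background topic
    w1, w2 = rng.choice(len(phi), size=2, p=phi)  # draw the two words of the biterm
    return is_bursty, k, (w1, w2)
```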

9 Parameter Inference by Gibbs Sampling
- Randomly assign a topic to each biterm
- Repeatedly update the topic of each biterm in a sequential way until convergence
The conditional probability of a topic for a biterm combines the popularity of the topic with the closeness between the two words and the bursty topic, so BBTM always chooses bursty and relevant biterms to construct bursty topics (see the sketch below).
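The sketch below shows one collapsed Gibbs-sampling draw for a biterm's topic, with the two factors annotated. The count arrays, priors, and normalisation follow a standard BTM-style update and are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def sample_topic(w1, w2, n_k, n_kw, alpha, beta, rng):
    """Draw a topic for the biterm (w1, w2); counts must exclude this biterm."""
    V = n_kw.shape[1]                        # vocabulary size
    popularity = n_k + alpha                 # how many biterms each topic already owns
    closeness = ((n_kw[:, w1] + beta) / (2 * n_k + V * beta) *
                 (n_kw[:, w2] + beta) / (2 * n_k + 1 + V * beta))
    p = popularity * closeness               # P(z = k | rest), up to a constant
    p /= p.sum()
    return rng.choice(len(p), p=p)

rng = np.random.default_rng(0)
n_k = np.array([50.0, 5.0])                              # biterms assigned to each topic
n_kw = np.array([[30.0, 28.0, 1.0], [2.0, 1.0, 6.0]])    # word counts per topic
print(sample_topic(0, 1, n_k, n_kw, alpha=1.0, beta=0.01, rng=rng))  # likely topic 0
```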

10 Experiments

11 Accuracy of Bursty Topics

12 Coherence and Novelty of Bursty Topics
Coherence: the more relevant the top words in a topic are to each other, the higher the coherence
Novelty: the more the top words overlap between two time slices, the lower the novelty
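To illustrate the novelty measure, here is a sketch that scores a topic's top words against the topics of the previous time slice by word overlap and turns that into a novelty value; the Jaccard-style overlap is an assumption, and the paper's exact formula may differ.

```python
def novelty(top_words_now, topics_prev):
    """1 minus the largest word overlap with any topic from the previous time slice."""
    now = set(top_words_now)
    if not topics_prev:
        return 1.0
    max_overlap = max(len(now & set(prev)) / len(now | set(prev)) for prev in topics_prev)
    return 1.0 - max_overlap

print(novelty(["world", "cup", "brazil", "goal"],
              [["world", "cup", "match", "win"], ["stock", "market", "price", "drop"]]))
# about 0.67: the topic partly repeats an earlier "world cup" topic
```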

13 Efficiency
- BBTM-S costs much less time than the other methods, since it only uses a subset of biterms for training
- BBTM is more efficient than IBTM and UTM, since they waste time learning non-bursty topics

14 Summary
We propose the bursty biterm topic model (BBTM) for bursty topic discovery in microblogs:
- It exploits the rich global word co-occurrence for better topic learning over short texts
- It exploits the burstiness of biterms to distill bursty topics automatically
Future work:
- Improve the estimation of the bursty probability
- Improve topic representations with external data

15 Thank You!
Code:
Our related work: