On Sparsity and Drift for Effective Real- time Filtering in Microblogs Date : 2014/05/13 Source : CIKM’13 Advisor : Prof. Jia-Ling, Koh Speaker : Yi-Hsuan.

Slides:



Advertisements
Similar presentations
Chapter 5: Introduction to Information Retrieval
Advertisements

Entity-Centric Topic-Oriented Opinion Summarization in Twitter Date : 2013/09/03 Author : Xinfan Meng, Furu Wei, Xiaohua, Liu, Ming Zhou, Sujian Li and.
A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy Date : 2014/04/15 Source : KDD’13 Authors : Chi Wang, Marina Danilevsky, Nihit.
TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets Chun Chen 1, Feng Li 2, Beng Chin Ooi 2, and Sai Wu 2 1 Zhejiang University, 2 National.
CLEar (Clairaudient Ear) A Realtime Online Observatory for Bursty and Viral Events A demonstration of CLEar System.
By Lee Betancourt Director of Communications and Public Relations Jane Myers Public Relations, Communications and Social Media Coordinator Social Media.
Personalized News Josh Alspector, Alek Kolcz - University of Colorado at Colorado Springs.
Comparing Twitter Summarization Algorithms for Multiple Post Summaries David Inouye and Jugal K. Kalita SocialCom May 10 Hyewon Lim.
DOMAIN DEPENDENT QUERY REFORMULATION FOR WEB SEARCH Date : 2013/06/17 Author : Van Dang, Giridhar Kumaran, Adam Troy Source : CIKM’12 Advisor : Dr. Jia-Ling.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
Search Engines and Information Retrieval
MARS: Applying Multiplicative Adaptive User Preference Retrieval to Web Search Zhixiang Chen & Xiannong Meng U.Texas-PanAm & Bucknell Univ.
EVENT IDENTIFICATION IN SOCIAL MEDIA Hila Becker, Luis Gravano Mor Naaman Columbia University Rutgers University.
Social networking FACEBOOK AND TWITTER. Then In the beginning of Facebook, there were very few features. There were no status updates, messages, photo.
Emerging Topic Detection on Twitter (Cataldi et al., MDMKDD 2010) Padmini Srinivasan Computer Science Department Department of Management Sciences
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Personalization in Local Search Personalization of Content Ranking in the Context of Local Search Philip O’Brien, Xiao Luo, Tony Abou-Assaleh, Weizheng.
1 Opinion Spam and Analysis (WSDM,08)Nitin Jindal and Bing Liu Date: 04/06/09 Speaker: Hsu, Yu-Wen Advisor: Dr. Koh, Jia-Ling.
Search Engines and Information Retrieval Chapter 1.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
Date: 2012/10/18 Author: Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka Source: World Wide Web conference (WWW "12) Advisor: Jia-ling, Koh Speaker: Jiun.
1 Retrieval and Feedback Models for Blog Feed Search SIGIR 2008 Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
Beyond Co-occurrence: Discovering and Visualizing Tag Relationships from Geo-spatial and Temporal Similarities Date : 2012/8/6 Resource : WSDM’12 Advisor.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
1 A Unified Relevance Model for Opinion Retrieval (CIKM 09’) Xuanjing Huang, W. Bruce Croft Date: 2010/02/08 Speaker: Yu-Wen, Hsu.
Pete Bohman Adam Kunk. What is real-time search? What do you think as a class?
Information Filtering LBSC 796/INFM 718R Douglas W. Oard Session 10, April 13, 2011.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
1 Discovering Authorities in Question Answer Communities by Using Link Analysis Pawel Jurczyk, Eugene Agichtein (CIKM 2007)
Incident Threading for News Passages (CIKM 09) Speaker: Yi-lin,Hsu Advisor: Dr. Koh, Jia-ling. Date:2010/06/14.
A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:
Pete Bohman Adam Kunk. Real-Time Search  Definition: A search mechanism capable of finding information in an online fashion as it is produced. Technology.
Automatic Selection of Social Media Responses to News Date : 2013/10/02 Author : Tadej Stajner, Bart Thomee, Ana-Maria Popescu, Marco Pennacchiotti and.
Pete Bohman Adam Kunk. What is real-time search? What do you think as a class?
Microblogs: Information and Social Network Huang Yuxin.
Date: 2012/4/23 Source: Michael J. Welch. al(WSDM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Topical semantics of twitter links 1.
Relevance Feedback Hongning Wang What we have learned so far Information Retrieval User results Query Rep Doc Rep (Index) Ranker.
Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES.
Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval Ben Cartrette and Praveen Chandar Dept. of Computer and Information Science.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
OCLC Online Computer Library Center 1 Social Media and Advocacy.
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
1 Blog site search using resource selection 2008 ACM CIKM Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
A Word Clustering Approach for Language Model-based Sentence Retrieval in Question Answering Systems Saeedeh Momtazi, Dietrich Klakow University of Saarland,Germany.
Date: 2015/11/19 Author: Reza Zafarani, Huan Liu Source: CIKM '15
Ranking Related Entities Components and Analyses CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
Reporter: Shau-Shiang Hung( 洪紹祥 ) Adviser:Shu-Chen Cheng( 鄭淑真 ) Date:99/06/15.
PERSONALIZED DIVERSIFICATION OF SEARCH RESULTS Date: 2013/04/15 Author: David Vallet, Pablo Castells Source: SIGIR’12 Advisor: Dr.Jia-ling, Koh Speaker:
Relevance Feedback Hongning Wang
TO Each His Own: Personalized Content Selection Based on Text Comprehensibility Date: 2013/01/24 Author: Chenhao Tan, Evgeniy Gabrilovich, Bo Pang Source:
LEARNING FROM THE PAST: ANSWERING NEW QUESTIONS WITH PAST ANSWERS Date: 2012/11/22 Author: Anna Shtok, Gideon Dror, Yoelle Maarek, Idan Szpektor Source:
Personalization Services in CADAL Zhang yin Zhuang Yuting Wu Jiangqin College of Computer Science, Zhejiang University November 19,2006.
TWinner : Understanding News Queries with Geo-content using Twitter Satyen Abrol,Latifur Khan University of Texas at Dallas,Department of Computer Science.
{ Adaptive Relevance Feedback in Information Retrieval Yuanhua Lv and ChengXiang Zhai (CIKM ‘09) Date: 2010/10/12 Advisor: Dr. Koh, Jia-Ling Speaker: Lin,
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012.
Alvin CHAN Kay CHEUNG Alex YING Relationship between Twitter Events and Real-life.
Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.
Using Blog Properties to Improve Retrieval Gilad Mishne (ICWSM 2007)
University Of Seoul Ubiquitous Sensor Network Lab Query Dependent Pseudo-Relevance Feedback based on Wikipedia 전자전기컴퓨터공학 부 USN 연구실 G
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Learning Literature Search Models from Citation Behavior
Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,
Zhixiang Chen & Xiannong Meng U.Texas-PanAm & Bucknell Univ.
Topic: Semantic Text Mining
Presentation transcript:

On Sparsity and Drift for Effective Real- time Filtering in Microblogs Date : 2014/05/13 Source : CIKM’13 Advisor : Prof. Jia-Ling, Koh Speaker : Yi-Hsuan Yeh

Outline  Introduction  Adaptive Filtering of Tweets  Handling Sparsity  Topic Drifting  Experiments  Conclusions 2

Introduction  Social media have grown as massive networks of information publishers and consumers. (ex: Twitter)  Consumers may have difficulties to keep up with the vast amounts of real-time information.  Publishers have no way to ensure that their content can reach their targeted audience.  Information filtering (IF) can help both publishers and consumers by ensuring that only relevant information is delivered to the right audiences. 3

Introduction  In this paper, we study the problem of real-time filtering in Twitter.  Traditional state-of-the-art news filtering techniques are not as effective when applied on tweets.  Challenge: 1.Sparsity  The acute sparsity issue in filtering tweets is a unique challenge caused by the shortness of tweets. 2.Drift  Different aspects (subtopics) of the original topic that become more popular over time,  Certain events that occurred and drifted the topics into new aspects. 4

Introduction  We devise a solution by building on an effective news filtering technique that is based on the text classification approach of Incremental Rocchio.  Solution: 1.Sparsity  Use a query expansion (QE) approach to enrich the representation of the user's profile (the explicit relevant judgments of the user) during the filtering process. 2.Drift  Modify the classifier such that it recognizes short-term interests (emerging subtopics).  Balances between the importance of short-term interests and the long-term interests in the overall topic. 5

Outline  Introduction  Adaptive Filtering of Tweets  Incremental Ricchio  Regularised Logistic Regression  Handling Sparsity  Topic Drifting  Experiments  Conclusions 6

Adaptive Filtering of Tweets  Incremental Rocchio (RC) 7 New tweets User judged relevant New tweets update Topic : Football World Cup

Adaptive Filtering of Tweets 8 Profile tweets New tweets

Outline  Introduction  Adaptive Filtering of Tweets  Handling Sparsity  Topic Drifting  Experiments  Conclusions 9

Handling Sparsity  Use a query expansion (QE) approach to enrich the representation of the user's profile. 10 Timeline 10:00 am Topic : Football World Cup Tweets that are search result of “Football World Cup.” New tweets … Pseudo-relevant tweets

Outline  Introduction  Adaptive Filtering of Tweets  Handling Sparsity  Topic Drifting  Experiments  Conclusions 11

Topic Drifting  Our idea is to dynamically change the centroid over time by introducing a decay factor that balances between short-term and long-term interests.  Long-term interests  The overall topic.  Ex: Football World Cup  Short-term interests  Emerging subtopics.  Ex: player, goal 12

Topic Drifting 13 n = 3 Tweet that post in the current calendar day

Topic Drifting C.Event detection 14 Timeline 9:50 am9:40 am 9:30 am 9:20 am 9:10 am 10:00 am 10:10 am Step 3 : Use Grubb’s test to determines if the tweeting rate about the topic at the current time is an outlier. Step 1 : Use DFReekLIM weighting model to score individual tweets for a topic. Step 2 : Use CompSUM voting technique to estimate the final score of the tweets set.

Outline  Introduction  Adaptive Filtering of Tweets  Handling Sparsity  Topic Drifting  Experiments  Conclusions 15

Experiments  Tweet 2011 Jan 23 to Feb 8  tweets  Use Dirichlet language model for weighting the terms in the vector.  h1: tweets that do not contain at least one query term are not considered for similarity computation and are regarded as irrelevant.  h2: tweets that contain at least one term in either the query or the first positive example.  Precision, recall, F_0.5, T11SU 16

Experiments 17

Experiments 18

Experiments 19

Outline  Introduction  Adaptive Filtering of Tweets  Handling Sparsity  Topic Drifting  Experiments  Conclusions 20

Conclusion  In this paper, we approach the problem of real-time filtering in the Twitter Microblogging platform.  To tackle the acute sparsity problem, we apply query expansion to derive terms or related tweets for a richer initialisation of the user interests within the profile.  To deal with drift, we modify the user profile to balance between the importance of the short-term interests, i.e. emerging subtopics, and the long-term interests in the overall topic. 21