CLEar (Clairaudient Ear): A Realtime Online Observatory for Bursty and Viral Events. A demonstration of the CLEar system.

The Architecture of CLEar
To sum up, a meaningful event observatory should provide the following functions:
• Detection of a bursty topic as soon as it emerges;
• Early prediction of whether the bursty topic is likely to go viral;
• Summarization of related bursty topics into semantically coherent events that can be monitored;
• Contextualization of each event with its temporal evolution and its corresponding coverage across other news media.

Recommended Materials
• A tutorial at WWW 2014: Towards a Social Media Analytics Platform: Event Detection and User Profiling for Twitter
• A tutorial at KDD 2009: Tutorial on Event Detection (Hila Becker)

Why Bursty and Viral?
• Compared with traditional news media, Twitter has been recognized as a much more responsive and reliable source for picking up bursty events.
• Bursty events trigger a surge of public interest within a short period of time.
• Capable of handling both planned and unplanned events.

Topic Detection in Social Media
• Document-pivot: assign each new tweet to a similar existing event, or treat it as a new event if no similar event exists (such a tweet is called the first story of the event).
  • Sasa Petrovic et al., Streaming first story detection with application to Twitter, HLT '10
• Feature-pivot: while an event is happening, some bursty features of the hidden event show a sharp increase beyond what is expected.
  • Chen Lin et al., Generating event storylines from microblogs, CIKM '12
  • Chenliang Li et al., Twevent: segment-based event detection from tweets, CIKM '12
[Figure: feature-pivot pipeline of Bursty Term Detection, Bursty Term Grouping, and Candidate Event Filtering. Example bursty terms: #MH370, lives, Southern, #eat, Tmr, korean, Sleep, indian, save. After grouping: (MH370, Southern, indian, Korean, save, lives) and (#eattmr, sleep).]
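The bursty term detection step of the feature-pivot approach can be sketched as follows. This is only an illustration, not the method used by CLEar or by the cited papers: the function name `bursty_terms`, the z-score test, and the `z_threshold` and `min_count` parameters are all assumptions. A term is flagged when its count in the newest time window is far above its mean count in earlier windows.

```python
from collections import Counter
from statistics import mean, pstdev


def bursty_terms(history, current, z_threshold=2.0, min_count=3):
    """Return {term: z-score} for terms that spike in the current window.

    history: list of Counter objects, one per past time window (term -> count)
    current: Counter for the newest time window
    The z-score rule and both thresholds are illustrative assumptions.
    """
    bursty = {}
    for term, count in current.items():
        if count < min_count:
            continue  # ignore terms too rare to be meaningful
        past = [window.get(term, 0) for window in history]
        mu = mean(past) if past else 0.0
        sigma = pstdev(past) if past else 0.0
        # Adding 1.0 smooths the denominator for terms never seen before
        z = (count - mu) / (sigma + 1.0)
        if z >= z_threshold:
            bursty[term] = z
    return bursty
```

Terms that pass this test would then be handed to the grouping and filtering steps; a steadily frequent term such as "sleep" is not flagged because its current count stays close to its historical mean.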

Shortcomings of Existing Work
• Existing work mostly focuses on event detection and extraction, without any post-processing.
• The lack of a well-established analysis of a detected event limits its utility.
Many challenging research problems remain: popularity prediction, topic clustering, event summarization, event contextualization, …

Popularity Prediction
• User behaviors such as replying and retweeting provide new mechanisms for information diffusion.
• Topic popularity can be measured by the number of users involved.
• Predicting topic popularity not only reveals event trends but also helps remove noisy and spam bursty topics at an early stage.
• The challenges come from the uncertainty of the information diffusion path and the insufficient information available at the early stage of a burst, which offers little clue as to whether the detected bursty topic will sustain its virality or simply die down quickly.

Topic Clustering
• Because many detected topics are duplicates or semantically close, it is desirable to remove duplicate topics and group related topics into a coherent event.
• This is a single-pass incremental clustering problem.
• Similarity based simply on the co-occurrence of bursty keywords is unreliable: shared keywords are likely to be absent because topics are much shorter than formal documents and depend largely on the detection algorithm.
The essential problem of clustering is to define a metric that measures the similarity between a topic and an existing event (cluster).
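A single-pass incremental clustering loop can be sketched as below. This is a minimal illustration under stated assumptions, not the CLEar implementation: topics are represented as keyword sets, the `jaccard` function is only a placeholder for the similarity metric (the slide itself notes that keyword co-occurrence alone is weak), and the `threshold` value is arbitrary.

```python
def jaccard(a, b):
    """Placeholder similarity between two keyword sets (an assumption)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_pass_cluster(topics, similarity=jaccard, threshold=0.3):
    """Assign each incoming topic to its most similar event, or start a new one.

    Each topic is seen exactly once, in arrival order. An event is a list of
    topics; topic-to-event similarity is the best match against any member.
    """
    events = []
    for topic in topics:
        best_idx, best_sim = None, 0.0
        for idx, event in enumerate(events):
            sim = max(similarity(topic, member) for member in event)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is not None and best_sim >= threshold:
            events[best_idx].append(topic)   # join the existing event
        else:
            events.append([topic])           # open a new event
    return events
```

Because `similarity` is a parameter, a richer metric can be passed in without changing the clustering loop.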

Topic Clustering (cont.)
• Measure the similarity between a topic and an event from the following perspectives: content similarity, user similarity, entity similarity, volume similarity, and time similarity.
• How should those individual similarities be combined?
• An intuitive approach is to combine them using different weights. However, the number of possible weight combinations is huge, and we have no prior knowledge about the weights.
• Instead, learn the weighting scheme through a classification model to form a unified similarity metric.
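One way to learn such a weighting scheme is to train a binary classifier on labeled (topic, event) pairs and use its output probability as the unified similarity. The sketch below assumes logistic regression trained with plain stochastic gradient descent; the slides do not say which classification model CLEar uses, and the function names and toy training values are invented for illustration.

```python
import math

# Order of the five individual similarities in each feature vector
FEATURES = ("content", "user", "entity", "volume", "time")


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_weights(samples, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights with stochastic gradient descent.

    samples: list of 5-element similarity vectors
    labels:  1 if the topic belongs to the event, else 0
    """
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y
            bias -= lr * err
            weights = [w - lr * err * v for w, v in zip(weights, x)]
    return weights, bias


def unified_similarity(x, weights, bias):
    """Probability that a topic with similarity vector x belongs to the event."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Made-up training pairs, for illustration only
toy_samples = [
    [0.9, 0.8, 0.7, 0.6, 0.9], [0.8, 0.6, 0.9, 0.7, 0.8], [0.7, 0.9, 0.6, 0.8, 0.7],
    [0.1, 0.2, 0.1, 0.5, 0.3], [0.2, 0.1, 0.3, 0.4, 0.2], [0.3, 0.2, 0.2, 0.6, 0.1],
]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_weights, toy_bias = train_weights(toy_samples, toy_labels)
```

The learned probability can then serve as the similarity function in the clustering step, with a cutoff such as 0.5 deciding whether a topic joins an existing event.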

Event Summarization
• Traditional summarization methods mainly focus on content summarization, extracting representative tweets from the set of tweets relevant to an event.
• In addition, we propose to summarize the event from the structure and user perspectives.
• A fundamental problem is sub-event detection.

Sub-event Detection
• An event usually contains several finer-grained stages, and detection algorithms generally cannot detect every stage of an event.
• Detecting all possible sub-events provides a basis for studying deeper properties of the event.
• Both the volume [2,3] and the content [1] of the event provide signals of sub-event occurrence. Compared with the volume curve, we consider the content more trustworthy, because the volume curve depends largely on the retrieval results and on users' publishing patterns.
To solve this problem, we must overcome the following two difficulties:
• Retrieval: how to retrieve high-quality tweets about this event?
• Sub-event: how to detect all sub-events in an online manner?
[1] Akshaya Iyengar et al., Content-based prediction of temporal boundaries for events in Twitter, SocialCom 2011
[2] Jeffrey Nichols et al., Summarizing sporting events using Twitter, IUI 2012
[3] Arkaitz Zubiaga et al., Towards real-time summarization of scheduled events from Twitter streams

1. How to retrieve high-quality tweets about this event?
• Common practice: use the event keywords as a query to search the tweet collection.
• Three factors remain large obstacles to employing standard retrieval methods:
  A. Seemingly relevant tweets with good textual quality might not be truly relevant to the event;
  B. Tweets highly relevant to the event might not contain any of the query keywords;
  C. The query keywords might not represent the event comprehensively and might even be a noisy indicator.
• To solve A, besides the relevance score returned by Elasticsearch, we can integrate other features, such as tweet-specific features and publisher features, to reorder the search results.
• To solve B and C, we can use event keyword expansion, taking the burstiness of a term [1] into consideration in addition to the traditional TF-IDF value when selecting expansion terms.
[1] Metzler D, Cai C, Hovy E. Structured event retrieval over microblog archives. ACL 2012.
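Expansion term selection that considers both TF-IDF and burstiness might look like the sketch below. The scoring formula is an assumption made for illustration: the slides do not say how the two signals are combined, and the burstiness measure here (how much more often a term appears in the event tweets than in a background sample) is a simple stand-in rather than the measure from the cited paper. Tweets are assumed to be already tokenized into lists of terms.

```python
import math
from collections import Counter


def expansion_terms(event_tweets, background_tweets, query_terms, top_k=3, alpha=1.0):
    """Rank candidate expansion terms by TF-IDF boosted with burstiness.

    event_tweets:      tokenized tweets retrieved for the event
    background_tweets: tokenized tweets from a general background sample
    query_terms:       terms already in the query (never suggested again)
    alpha:             hypothetical knob for how strongly burstiness counts
    """
    if not event_tweets:
        return []
    n_bg = len(background_tweets)
    bg_df = Counter(t for tweet in background_tweets for t in set(tweet))
    ev_df = Counter(t for tweet in event_tweets for t in set(tweet))
    tf = Counter(t for tweet in event_tweets for t in tweet)

    scores = {}
    for term, freq in tf.items():
        if term in query_terms:
            continue
        # Smoothed inverse document frequency over the background sample
        idf = math.log((1 + n_bg) / (1 + bg_df[term])) + 1.0
        # Burstiness: event document rate relative to smoothed background rate
        event_rate = ev_df[term] / len(event_tweets)
        bg_rate = (bg_df[term] + 1) / (n_bg + 1)
        burst = event_rate / bg_rate
        scores[term] = freq * idf * (1.0 + alpha * math.log1p(burst))
    # Highest score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

With this scoring, a common word that is equally frequent in the background sample receives a low score even if it appears in the event tweets, while terms concentrated in the event tweets rise to the top.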

2. How to detect all sub-events in an online manner?
• Topic model: high complexity, and its output is usually a general topic.
• Event boundary prediction: can only divide the event into before, during, and after.
• We propose to first divide the event duration into equal-sized, non-overlapping timespans, then merge adjacent timespans into sub-events in chronological order.
• Finally, we should verify each sub-event's popularity and reliability to filter out spurious sub-events. Reliability can be measured by the total number of followers of all publishers, while popularity can be reflected by the number of retweets.
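The divide, merge, and filter procedure can be sketched as follows. The slides do not state the criterion for merging adjacent timespans, so this sketch assumes term overlap between neighboring timespans; the dictionary keys (`time`, `terms`, `retweets`, `user`, `followers`), the function names, and all thresholds are likewise hypothetical.

```python
import math


def term_overlap(a, b):
    """Jaccard overlap between two term sets (assumed merge criterion)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_sub_events(tweets, start, end, span, merge_threshold=0.3,
                      min_retweets=10, min_followers=1000):
    """Split [start, end) into equal timespans, merge neighbors, filter results."""
    # 1. Divide the event duration into equal-sized, non-overlapping timespans
    n_spans = max(1, math.ceil((end - start) / span))
    buckets = [[] for _ in range(n_spans)]
    for tweet in tweets:
        if start <= tweet["time"] < end:
            buckets[int((tweet["time"] - start) // span)].append(tweet)

    # 2. Merge adjacent timespans in chronological order
    groups, prev_terms = [], None
    for bucket in buckets:
        if not bucket:
            prev_terms = None  # an empty timespan breaks adjacency
            continue
        terms = set().union(*(t["terms"] for t in bucket))
        if prev_terms is not None and term_overlap(prev_terms, terms) >= merge_threshold:
            groups[-1].extend(bucket)
        else:
            groups.append(list(bucket))
        prev_terms = terms

    # 3. Keep only sub-events that are popular and reliable enough
    sub_events = []
    for group in groups:
        popularity = sum(t["retweets"] for t in group)
        followers = {t["user"]: t["followers"] for t in group}  # one entry per publisher
        reliability = sum(followers.values())
        if popularity >= min_retweets and reliability >= min_followers:
            sub_events.append({
                "start": min(t["time"] for t in group),
                "end": max(t["time"] for t in group),
                "popularity": popularity,
                "reliability": reliability,
                "tweets": group,
            })
    return sub_events
```

Each timespan is compared only with the one immediately before it, so the procedure processes timespans once in arrival order, which suits an online setting.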

Event Contextualization
• Find a representative picture of the event.
• Find related news about the event.