CLEar (Clairaudient Ear) A Realtime Online Observatory for Bursty and Viral Events A demonstration of CLEar System.


2 The Architecture of CLEar
To sum up, a meaningful event observatory should provide the following functions:
• Detection of a bursty topic as soon as it emerges;
• Early prediction of whether the bursty topic is likely to go viral;
• Summarization of related bursty topics into semantically coherent events that can be monitored;
• Contextualization of the events with their temporal evolution and corresponding coverage across other news media.

3 Recommended Materials
• A tutorial at WWW 2014: Towards a Social Media Analytics Platform: Event Detection and User Profiling for Twitter
• A tutorial at KDD 2009: Tutorial on Event Detection, Hila Becker, http://www.cs.columbia.edu/~hila/

4 Why Bursty and Viral?
• Compared with traditional news media, Twitter has been recognized as a much more responsive and reliable source for picking up bursty events.
• Bursty events trigger a surge of public interest within a short period of time.
• Capable of handling both planned and unplanned events.

5 Topic Detection in Social Media
• Document-pivot: for a new tweet, assign it to a similar existing event, or treat it as a new event if no similar event exists (such a tweet is called the first story of the event).
  - Sasa Petrovic et al. Streaming first story detection with application to Twitter. HLT '10
• Feature-pivot: some bursty features of a hidden event show a sharper increase than expected while the event is happening.
  - Chen Lin et al. Generating event storylines from microblogs. CIKM '12
  - Chenliang Li et al. Twevent: segment-based event detection from tweets. CIKM '12
[Figure: feature-pivot pipeline, Bursty Term Detection → Bursty Term Grouping → Candidate Event Filtering, illustrated on bursty terms such as #MH370, Southern, indian, save, lives, korean, #eat, Tmr, sleep]
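A minimal Python sketch of both strategies, under stated assumptions: tweets are whitespace-tokenized bags of words. The document-pivot part compares each tweet with running event centroids by linear scan against a hypothetical similarity threshold (Petrovic et al. instead use locality-sensitive hashing against individual tweets so that this scales to a stream). The feature-pivot part flags a term as bursty when its count in the current window exceeds its historical mean by a hypothetical z-score cutoff. Function names and parameters are illustrative, not taken from CLEar.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_pivot(tweets: Sequence[str],
                   threshold: float = 0.3) -> Tuple[List[List[int]], List[int]]:
    """Assign each tweet to the most similar existing event, or open a new event.

    Returns (events, first_stories): events as lists of tweet indices, and the
    indices of the tweets that opened a new event (the first stories).
    """
    events: List[List[int]] = []
    centroids: List[Counter] = []   # summed term counts of each event's tweets
    first_stories: List[int] = []
    for i, text in enumerate(tweets):
        vec = Counter(text.lower().split())
        best, best_sim = -1, 0.0
        for e, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = e, sim
        if best >= 0 and best_sim >= threshold:
            events[best].append(i)
            centroids[best].update(vec)
        else:                        # nothing similar enough: a first story
            events.append([i])
            centroids.append(Counter(vec))
            first_stories.append(i)
    return events, first_stories


def bursty_terms(current: Counter, history: Sequence[Counter],
                 z: float = 3.0, min_count: int = 3) -> List[Tuple[str, float]]:
    """Feature-pivot step: terms whose current count is far above their history."""
    bursty = []
    for term, count in current.items():
        past = [window.get(term, 0) for window in history] or [0]
        mean = sum(past) / len(past)
        std = math.sqrt(sum((x - mean) ** 2 for x in past) / len(past))
        score = (count - mean) / (std + 1.0)   # +1 avoids division by zero for new terms
        if count >= min_count and score >= z:
            bursty.append((term, score))
    return sorted(bursty, key=lambda item: item[1], reverse=True)
```

The bursty terms returned per window would then still need the grouping and filtering stages shown in the figure.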

6 Shortcomings of Existing Works
• Existing works mostly focus on event detection and extraction without any post-processing.
• The lack of a well-established analysis of each detected event limits its utility.
Many challenging research problems remain:
1. Popularity prediction
2. Topic clustering
3. Event summarization
4. Event contextualization
…

7 Popularity Prediction
• User behaviors such as replying and retweeting provide a new mechanism for information diffusion.
• Topic popularity can be measured by the number of users involved.
• Predicting topic popularity not only reveals event trends but also helps remove noisy and spam bursty topics at an early stage.
• The challenges come from the uncertainty of the information diffusion path and the scarce information available at the early stage of a burst, which offers little clue as to whether the detected bursty topic will sustain its virality or simply die down quickly.
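One simple way to turn early user counts into a virality forecast is a log-linear extrapolation in the style of Szabo and Huberman's popularity-prediction model; this is a swapped-in baseline, not the predictor used in CLEar. The sketch assumes each topic is a list of (user_id, timestamp) interactions, measures popularity as the number of distinct users, learns a single log-growth offset beta from past topics, and compares the forecast with a hypothetical viral_threshold.

```python
import math
from typing import Hashable, Iterable, List, Sequence, Tuple

Interaction = Tuple[Hashable, float]   # (user_id, timestamp)


def involved_users(interactions: Iterable[Interaction], until: float) -> int:
    """Popularity = number of distinct users who posted, replied, or retweeted by `until`."""
    return len({user for user, ts in interactions if ts <= until})


def fit_offset(train: Sequence[List[Interaction]], t_ref: float, t_target: float) -> float:
    """Fit beta in ln N(t_target) = ln N(t_ref) + beta as the mean log-growth of past topics."""
    diffs = []
    for topic in train:
        n_ref = involved_users(topic, t_ref)
        if n_ref > 0:
            diffs.append(math.log(involved_users(topic, t_target)) - math.log(n_ref))
    if not diffs:
        raise ValueError("no training topic has users before t_ref")
    return sum(diffs) / len(diffs)


def predict_popularity(topic: List[Interaction], t_ref: float, beta: float) -> float:
    """Extrapolate the number of involved users at t_target from the count at t_ref."""
    return involved_users(topic, t_ref) * math.exp(beta)


def likely_viral(topic: List[Interaction], t_ref: float, beta: float,
                 viral_threshold: float) -> bool:
    """Flag a young bursty topic whose forecast popularity reaches the threshold."""
    return predict_popularity(topic, t_ref, beta) >= viral_threshold
```

A single offset ignores the diffusion-path uncertainty the slide mentions; a richer model would add early diffusion features (reply/retweet structure, user influence) as predictors.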

8 Topic Clustering
• Because many detected topics are duplicates or semantically close, it is desirable to remove duplicate topics and group related topics together into coherent events.
• This is a single-pass incremental clustering problem.
• Similarity based simply on the co-occurrence of bursty keywords is unreliable: shared keywords are often absent, because topics are much shorter than formal documents and their keywords largely depend on the detection algorithm.
The essential problem of clustering is to define a metric that measures the similarity between a topic and an existing event (cluster).
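The single-pass incremental scheme can be sketched as follows: each arriving topic is compared with every existing event and joins the most similar one if the similarity reaches a threshold; otherwise it starts a new event. The similarity function is a parameter. The Jaccard overlap of keyword sets used here is only a placeholder, and it is exactly the kind of keyword co-occurrence measure that is unreliable on its own. Names and the threshold are hypothetical.

```python
from typing import Callable, List, Sequence, Set, TypeVar

T = TypeVar("T")


def single_pass_cluster(topics: Sequence[T],
                        similarity: Callable[[T, List[T]], float],
                        threshold: float) -> List[List[T]]:
    """Cluster topics in arrival order, visiting each topic exactly once."""
    events: List[List[T]] = []
    for topic in topics:
        best_idx, best_sim = -1, threshold
        for idx, event in enumerate(events):
            sim = similarity(topic, event)
            if sim >= best_sim:          # most similar event at or above the threshold
                best_idx, best_sim = idx, sim
        if best_idx >= 0:
            events[best_idx].append(topic)
        else:
            events.append([topic])       # no event is similar enough: new event
    return events


def keyword_jaccard(topic: Set[str], event: List[Set[str]]) -> float:
    """Placeholder metric: Jaccard overlap between a topic and the event's keyword union."""
    vocab = set().union(*event)
    union = topic | vocab
    return len(topic & vocab) / len(union) if union else 0.0
```

Because the loop never revisits earlier decisions, the cost per topic is linear in the number of open events, which suits a streaming setting.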

9 Topic Clustering (cont.)
Measure the similarity between a topic and an event from the following perspectives:
• Content similarity
• User similarity
• Entity similarity
• Volume similarity
• Time similarity
How should these individual similarities be combined?
• An intuitive approach is to combine them with different weights. However, the number of possible weight combinations is huge, and we have no prior knowledge about the weights.
• Instead, learn the weighting scheme through a classification model to form a unified similarity metric.
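Learning the weights with a classifier can be sketched as logistic regression over the five similarity features: each training example is a (topic, event) pair labeled 1 if the topic belongs to the event and 0 otherwise, and the predicted probability serves as the unified similarity. The feature order, the plain batch gradient-descent trainer, and its hyperparameters are assumptions for illustration; the slides do not say which classifier CLEar uses.

```python
import math
from typing import List, Sequence, Tuple

# Assumed feature order for each (topic, event) pair.
FEATURES = ("content", "user", "entity", "volume", "time")


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n_feat, m = len(X[0]), len(X)
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def unified_similarity(features: Sequence[float], w: Sequence[float], b: float) -> float:
    """Probability that the topic belongs to the event, used as the combined similarity."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)
```

The learned weights replace the hand search over weight combinations, and the output can be plugged into the incremental clustering as the topic-event metric with 0.5 as a natural threshold.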

10 Event Summarization
• Traditional summarization methods mainly focus on content, extracting representative tweets from the set of tweets relevant to an event.
• In addition, we propose to summarize the event from structural and user perspectives.
• A fundamental problem is sub-event detection.
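For the content side, a common extractive baseline (swapped in here; the slides do not name one) ranks tweets by cosine similarity to the event's term centroid and skips near-duplicates of tweets already chosen. The sketch assumes whitespace tokenization, raw term counts, and a hypothetical redundancy cutoff of 0.8.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def representative_tweets(tweets: Sequence[str], k: int = 2,
                          redundancy: float = 0.8) -> List[str]:
    """Pick up to k tweets closest to the event centroid, skipping near-duplicates."""
    vecs = [Counter(t.lower().split()) for t in tweets]
    centroid: Counter = Counter()
    for v in vecs:
        centroid.update(v)
    # sorted() is stable even with reverse=True, so ties keep arrival order
    ranked = sorted(range(len(tweets)), key=lambda i: cosine(vecs[i], centroid), reverse=True)
    chosen: List[int] = []
    for i in ranked:
        if all(cosine(vecs[i], vecs[j]) < redundancy for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    return [tweets[i] for i in chosen]
```

Structural and user-level summaries would need further signals (sub-event boundaries, key participants) that this content-only baseline does not capture.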

11 Sub-event Detection
• An event usually consists of several finer-grained stages, and detection algorithms generally cannot detect every stage of an event.
• Detecting all possible sub-events provides a basis for studying deeper properties of the event.
• Both the volume [2,3] and the content [1] of an event signal the occurrence of a sub-event. Compared with the volume curve, we consider content more trustworthy, because the volume curve depends largely on the retrieval results and on users' publishing patterns.
To solve this problem, we must overcome two difficulties:
• Retrieval: how to retrieve high-quality tweets about the event?
• Sub-event: how to detect all sub-events in an online manner?
[1] Akshaya Iyengar et al. Content-based prediction of temporal boundaries for events in Twitter. SocialCom 2011.
[2] Jeffrey Nichols et al. Summarizing sporting events using Twitter. IUI 2012.
[3] Arkaitz Zubiaga et al. Towards real-time summarization of scheduled events from Twitter streams.
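For contrast with the content-based view, the volume signal used by works such as [2,3] can be illustrated with a generic sliding-window detector: a time bin is flagged when its tweet count exceeds the mean of the preceding bins by k standard deviations. This is not the exact method of either paper; the window size, k, and the floor of 1.0 on the standard deviation are assumptions.

```python
import statistics
from typing import List, Sequence


def volume_spikes(counts: Sequence[int], window: int = 5, k: float = 2.0) -> List[int]:
    """Return indices of time bins whose volume jumps above the recent baseline."""
    spikes = []
    for i in range(window, len(counts)):
        past = counts[i - window:i]
        mean = statistics.mean(past)
        std = max(statistics.pstdev(past), 1.0)   # floor avoids flagging tiny wiggles
        if counts[i] > mean + k * std:
            spikes.append(i)
    return spikes
```

Such a detector reacts directly to how many tweets retrieval returns per bin, which is why the volume curve inherits the weaknesses of retrieval and of users' publishing patterns.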

12 1. How to retrieve high-quality tweets about the event?
• Common practice: use event keywords as a query to search tweet collections.
• The following three factors remain a large obstacle to applying standard retrieval methods:
  A. Seemingly relevant tweets with good textual quality might not be truly relevant to the event;
  B. Tweets highly relevant to the event might not contain any of the query keywords;
  C. Query keywords might not represent the event comprehensively and may even be a noisy indicator.
• To solve A, besides the relevance score returned by Elasticsearch, we can integrate other features, such as tweet-specific features and publisher features, to rerank the search results.
• To solve B and C, we can use event keyword expansion, taking the burstiness of terms [1] into account alongside the traditional TF-IDF value when selecting expansion terms.
[1] Metzler D, Cai C, Hovy E. Structured event retrieval over microblog archives. ACL 2012: 646-655.
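Both remedies can be sketched briefly. The first function reranks search hits with a weighted sum of the Elasticsearch relevance score and extra tweet or publisher features; the feature names and hand-set weights are hypothetical (in practice they would be tuned or learned). The second selects expansion terms from pseudo-relevant tweets by multiplying a TF-IDF score by a burstiness ratio, the term's relative frequency in the event window versus a background period; this ratio is an illustrative stand-in, not the exact formulation of Metzler et al. [1].

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def rerank(hits: List[Dict[str, float]], weights: Dict[str, float]) -> List[Dict[str, float]]:
    """Sort hits by a weighted sum of their features (missing features count as 0)."""
    return sorted(hits, key=lambda h: sum(w * h.get(f, 0.0) for f, w in weights.items()),
                  reverse=True)


def expand_query(query: Set[str], feedback: Sequence[str], current: Sequence[str],
                 background: Sequence[str], n_terms: int = 3) -> List[str]:
    """Rank candidate expansion terms by TF-IDF x burstiness.

    feedback:   top-ranked tweets for the original query (pseudo-relevance feedback)
    current:    all tweets in the event's time window (for IDF and current frequency)
    background: tweets from an earlier, quiet period
    """
    docs = [set(t.lower().split()) for t in current]
    now = Counter(w for t in current for w in t.lower().split())
    bg = Counter(w for t in background for w in t.lower().split())
    tf = Counter(w for t in feedback for w in t.lower().split())
    now_total, bg_total = sum(now.values()), sum(bg.values())
    scores: Dict[str, float] = {}
    for term, freq in tf.items():
        if term in query or now[term] == 0:
            continue
        df = sum(1 for d in docs if term in d)
        idf = math.log((len(docs) + 1) / (df + 1)) + 1.0
        p_now = now[term] / now_total
        p_bg = (bg[term] + 1) / (bg_total + len(bg) + 1)   # add-one smoothing
        scores[term] = freq * idf * (p_now / p_bg)
    return sorted(scores, key=scores.get, reverse=True)[:n_terms]
```

Terms that are frequent in the retrieved tweets but also common in quiet periods (for example "search" or "pilot" below) are pushed down by the burstiness factor.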

13 2. How to detect all sub-events in an online manner?
• Topic models: high complexity, and their output is usually a general topic.
• Event boundary prediction: can only divide an event into before, during, and after.
• We propose to first divide the event duration into equal-sized, non-overlapping timespans, then merge adjacent timespans into sub-events in chronological order.
• Finally, we verify each sub-event's popularity and reliability to filter out spurious sub-events. Reliability can be measured by the total number of followers of all publishers, while popularity can be reflected by the number of retweets.
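The proposed procedure can be sketched as below, under assumptions the slides leave open: adjacent timespans are merged while the cosine similarity between the new span's terms and the running sub-event's terms stays above a threshold; an empty span closes the current sub-event; and a sub-event is dropped as spurious when the summed follower count of its distinct publishers (reliability) or its total retweets (popularity) falls below a hypothetical cutoff. The Tweet fields and all thresholds are illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Tweet:
    ts: float        # seconds since the event started (assumed >= 0)
    text: str
    user: str        # publisher id
    followers: int   # publisher's follower count
    retweets: int


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_subevents(tweets: Sequence[Tweet], span: float, sim_threshold: float = 0.3,
                     min_followers: int = 1000, min_retweets: int = 10) -> List[List[Tweet]]:
    if not tweets:
        return []
    # 1. Equal-sized, non-overlapping timespans.
    n_bins = int(max(t.ts for t in tweets) // span) + 1
    bins: List[List[Tweet]] = [[] for _ in range(n_bins)]
    for t in tweets:
        bins[int(t.ts // span)].append(t)
    # 2. Merge adjacent timespans in chronological order.
    segments: List[List[Tweet]] = []
    current: List[Tweet] = []
    current_vec: Counter = Counter()
    for b in bins:
        vec = Counter(w for t in b for w in t.text.lower().split())
        if current and b and cosine(vec, current_vec) >= sim_threshold:
            current.extend(b)
            current_vec.update(vec)
        else:                        # dissimilar or empty span: close the current sub-event
            if current:
                segments.append(current)
            current, current_vec = list(b), vec
    if current:
        segments.append(current)

    # 3. Filter by reliability (followers of distinct publishers) and popularity (retweets).
    def reliability(seg: List[Tweet]) -> int:
        return sum({t.user: t.followers for t in seg}.values())

    def popularity(seg: List[Tweet]) -> int:
        return sum(t.retweets for t in seg)

    return [s for s in segments
            if reliability(s) >= min_followers and popularity(s) >= min_retweets]
```

Processing spans strictly in time order keeps the method online: a sub-event is emitted as soon as a dissimilar or empty span arrives.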

14 Event Contextualization
• Find a representative picture of the event.
• Find related news about the event.

