Presenters: Liu, Ya; Tian, Yujia; Pham, Anh
TwitterMonitor: Trend Detection over the Twitter Stream
EvenTweet: Online Localized Event Detection from Twitter

Presentation transcript:

Michael Mathioudakis, Nick Koudas
TwitterMonitor: Trend Detection over the Twitter Stream

INTRODUCTION
TwitterMonitor is a system that performs trend detection over the Twitter stream. It identifies emerging topics on Twitter in real time and provides analytics that synthesize an accurate description of each topic.
Michael Mathioudakis, Nick Koudas. TwitterMonitor: trend detection over the Twitter stream. In: SIGMOD Conference, 2010.

TREND DETECTION AND ANALYSIS
TwitterMonitor detects trends in two steps and analyzes them in a third:
1. Identify 'bursty' keywords.
2. Group bursty keywords into trends.
3. Extract additional information from the tweets of each trend to discover its interesting aspects.

Detecting Bursty Keywords
Bursty keyword: a keyword that appears at an unusually high rate in the stream. Such a burst suggests that a new topic has emerged, which the system then seeks to explore further.
Algorithm: QueueBurst
1) One-pass.
2) Real-time.
3) Adjustable against 'spurious' bursts.
4) Adjustable against spam.
5) Theoretically sound.

From Bursty Keywords to Trends

Trend Analysis
Compose a more accurate description of each trend:
- Identify more keywords associated with it: context extraction algorithms (PCA, SVD, etc.) search the recent history and report the most correlated keywords.
- Use Grapevine's entity extractor to identify the entities mentioned in the trend.
- Add frequently cited sources to the trend description.
- Identify frequent geographical origins.
- Produce a chart for each trend and keep it updated.

Architecture Index
Michael Mathioudakis, Nick Koudas. TwitterMonitor: trend detection over the Twitter stream. In: SIGMOD Conference, pp. 1156, 2010.

Architecture: Back-End
The StreamListener module receives, via the Twitter API, a sample of 10M out of the 50M tweets posted per day. It then separates the tweet information into fields and exports two feeds:
- Tweets with all their fields go to the Index module.
- Only the text and timestamp of tweets go to the Bursty Keywords Detection module.
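The two-feed split described above can be sketched as follows. This is only an illustration, not TwitterMonitor's actual code: the `RawTweet` fields and the `split_feeds` function are hypothetical names, and the real field set delivered by the Twitter API is not given in the slides.

```python
from dataclasses import dataclass, asdict


@dataclass
class RawTweet:
    # Hypothetical field set; the slides only say that a tweet has
    # several fields, including its text and timestamp.
    tweet_id: int
    user: str
    text: str
    timestamp: float
    location: str = ""


def split_feeds(tweet):
    """Return (index_record, burst_record) for one incoming tweet.

    index_record carries every field (for the Index module);
    burst_record carries only text and timestamp (for the
    Bursty Keywords Detection module).
    """
    index_record = asdict(tweet)
    burst_record = (tweet.text, tweet.timestamp)
    return index_record, burst_record


tweet = RawTweet(1, "alice", "NBA finals tonight", 100.0)
index_record, burst_record = split_feeds(tweet)
```

Keeping the second feed minimal means the burst detector never touches fields it does not need, which matters when it has to keep up with the stream in one pass.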

Architecture: Back-End (Cont.)
After bursty keywords are identified and grouped into trends, the Trend Analysis module queries the Index to retrieve information on the tweets that belong to each trend.

Architecture: Front-End
Michael Mathioudakis, Nick Koudas. TwitterMonitor: trend detection over the Twitter stream. In: SIGMOD Conference, pp. 1157, 2010.

Architecture: Front-End (Cont.)
- A webpage reports recent trends in real time.
- An interface lets users rank trends by recency or current activity rate and submit their own short descriptions for trends.
- An additional tab displays daily trends.

Demonstration
Every trend will be represented by its entities and its related bursty keywords. The audience will have the option to use the interface to acquire more information:
1. They will be shown additional keywords and can skim through representative tweets.
2. They will be able to track a trend's popularity over time and spot its origin.
3. They will interact with the system by tracking the displayed trends according to different criteria and submitting descriptions.

Hamed Abdelhaq, Christian Sengstock, and Michael Gertz EvenTweet: Online Localized Event Detection from Twitter

1. Introduction
2. Localized Event Detection
   - Temporal Keyword Extraction
   - Spatial Keyword Identification
   - Keyword Clustering
   - Cluster Scoring
3. System overview
4. Demonstration

INTRODUCTION
- EvenTweet is a system that detects localized events from a stream of tweets in real time.
- Only about 1% of tweets are georeferenced.
- It continuously analyzes the most recent tweets within a time-based sliding window.
- Each event is described by 1) related keywords and 2) an estimate of its start time and geographic location.
Hamed Abdelhaq, Christian Sengstock, Michael Gertz: EvenTweet: Online Localized Event Detection from Twitter. PVLDB 6(12) (2013)

INTRODUCTION (cont.)
- Tracks the evolution of events over time at a fine-grained temporal resolution.
- A scoring scheme gives each event a score over time.
- Does not estimate geo-coordinates for non-geo-tagged tweets, but can identify localized events using a possibly small amount of geo-tagged tweets:
  - Both geo-tagged and non-geo-tagged tweets are used to identify the words that best describe events.
  - Only geo-tagged tweets are used to estimate the spatial distribution of such words.

Localized Event Detection: Basic Definitions
- Event: a phenomenon that stimulates people to post messages for a certain period of time.
- Localized event: an event that happens within a small region, i.e. has a small spatial extent (e.g., concerts, soccer matches, road works).

Localized Event
A localized event is described as a tuple le = (el, et, K):
- el is the event location, represented as a small set of connected rectangular cells.
- et is the start time.
- K is a set of words frequently published during the event time and at that location.

Online Detection: Basic Notation
Each tweet is a tuple tw = (W, uid, l, t):
- W: a set of words
- uid: a user id
- l = (lon, lat): a geographic location
- t: a timestamp
The timeline is divided into a sequence of equal-length time frames (…, f_{c-1}, f_c), where f_c denotes the current time frame. Each time frame represents a short time interval during which tweets are posted.
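As a rough illustration of this notation, the tuples tw = (W, uid, l, t) and le = (el, et, K) could be modelled as small record types, with a helper that maps a timestamp to its time frame. The class names, the `Cell` type, the `origin` and `frame_length` parameters and the use of `None` for non-geo-tagged tweets are all assumptions made for this sketch; they do not come from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Cell = Tuple[int, int]  # hypothetical (row, column) index of a grid cell


@dataclass(frozen=True)
class Tweet:
    words: FrozenSet[str]                    # W
    uid: int                                 # user id
    location: Optional[Tuple[float, float]]  # l = (lon, lat); None if not geo-tagged
    t: float                                 # timestamp, in seconds


@dataclass(frozen=True)
class LocalizedEvent:
    location: FrozenSet[Cell]  # el: a small set of connected cells
    start_time: float          # et
    keywords: FrozenSet[str]   # K


def frame_index(t, origin, frame_length):
    """Index of the equal-length time frame that contains timestamp t.

    origin is the timestamp at which frame 0 starts; frame_length is
    the frame duration in the same unit as t.
    """
    return int((t - origin) // frame_length)


tw = Tweet(frozenset({"concert", "tonight"}), 42, (8.69, 49.41), 1250.0)
c = frame_index(tw.t, origin=0.0, frame_length=600.0)  # frame holding this tweet
```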

Basic Notation (cont.)
We use a time-based sliding window win^k_{f_c} composed of k time frames with f_c as its end point. The detection procedure of EvenTweet is triggered every time a new time frame elapses.
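One possible way to realize such a window is a bounded queue of frames that drops the oldest frame when a new one is closed. The sketch below assumes a hypothetical `SlidingWindow` class and an `on_frame_elapsed` callback standing in for the detection procedure; neither name comes from the paper.

```python
from collections import deque


class SlidingWindow:
    """Keeps the k most recent time frames; the newest frame is f_c."""

    def __init__(self, k, on_frame_elapsed):
        # deque(maxlen=k) silently discards the oldest frame once full.
        self.frames = deque(maxlen=k)
        self.on_frame_elapsed = on_frame_elapsed

    def close_frame(self, tweets_in_frame):
        """Call when a time frame elapses: store it and trigger detection."""
        self.frames.append(list(tweets_in_frame))
        return self.on_frame_elapsed(list(self.frames))


# Usage: the callback here just counts tweets in the window.
window = SlidingWindow(k=3, on_frame_elapsed=lambda fs: sum(len(f) for f in fs))
window.close_frame(["tw1", "tw2"])
window.close_frame(["tw3"])
```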

Temporal Keyword Extraction
Extract the words that show a bursty frequency in the current time frame; these words are called keywords, Y_c. Given the set of words W_c from the tweets published during the current time frame f_c, extract a subset Y_c ⊆ W_c of words likely to describe localized events.

Temporal Keyword Extraction (cont.)
Use the discrepancy paradigm to extract keywords based on their burstiness.
u(w, c): the usage of word w during time frame f_c, normalized by the number of users publishing tweets containing w.

Temporal Keyword Extraction (cont.)
In addition, hist_w = (u(w, 1), u(w, 2), …, u(w, m)) is a fixed historical sequence of usage values for w collected before the current time frame f_c, i.e. m < c. It describes the normal behavior of word w over previous time frames.

Temporal Keyword Extraction (cont.)
The discrepancy paradigm measures the deviation between the word usage value u(w, c) in the current time frame and an expected word usage baseline b(w), which is estimated from hist_w. The values in hist_w are assumed to be drawn from a Gaussian distribution with mean b(w).μ and standard deviation b(w).σ. The higher the deviation, the higher the burstiness degree.

Temporal Keyword Extraction (cont.)
The burstiness degree of a word w is its z-score:
b_degree(w, c) := (u(w, c) − b(w).μ) / b(w).σ
Words whose usage lies more than two standard deviations above the mean, i.e. b_degree(w, c) > 2, are chosen as keywords. Keywords observed for the first time have μ = 0 and σ = 0.
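The z-score test can be sketched in a few lines. This is a minimal sketch under stated assumptions: the baseline uses the population standard deviation, and a word with σ = 0 (including a word seen for the first time, where μ = 0 and σ = 0) is treated as infinitely bursty whenever its current usage exceeds μ. The slides do not say how the σ = 0 case is actually handled, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def burstiness_degree(current_usage, history):
    """z-score of u(w, c) against the baseline estimated from hist_w."""
    mu = mean(history) if history else 0.0
    sigma = pstdev(history) if history else 0.0
    if sigma == 0:
        # Assumption: avoid division by zero by calling any usage
        # above the baseline mean maximally bursty.
        return math.inf if current_usage > mu else 0.0
    return (current_usage - mu) / sigma


def extract_keywords(usage_now, histories, threshold=2.0):
    """Return Y_c: the words of W_c whose burstiness degree exceeds threshold."""
    return {
        w
        for w, u in usage_now.items()
        if burstiness_degree(u, histories.get(w, [])) > threshold
    }


usage_now = {"concert": 5, "the": 10, "newword": 3}
histories = {"concert": [1, 2, 3, 2, 2], "the": [10, 10, 11, 9]}
keywords = extract_keywords(usage_now, histories)
```

In this toy input, "the" stays at its usual level and is rejected, while "concert" jumps well above its history and "newword" has no history at all.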

Spatial Keyword Identification
- Find keywords that are highly localized.
- Only use georeferenced tweets.
- The space is partitioned into a grid G of cells g.

Spatial Keyword Identification (cont.)
- Only use georeferenced tweets.
- Calculate the entropy H(S_i) of each keyword's distribution S_i over the grid cells g.
- Discard all keywords with entropy larger than a threshold ρ. Why? A high entropy means the keyword's usage is spread over many cells, so the keyword is not localized.
- The result is Y_c, the set of filtered keywords.
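The entropy filter might look like the sketch below. It assumes that S_i is given as a mapping from grid cell to the number of geo-tagged tweets containing the keyword, that the entropy is Shannon entropy in bits, and that keywords without any geo-tagged occurrence are dropped. All three choices, and the function names, are assumptions of this sketch rather than details taken from the paper.

```python
import math


def spatial_entropy(cell_counts):
    """Shannon entropy (bits) of a keyword's distribution over grid cells."""
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in cell_counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log2(p)
    return h


def filter_localized(keyword_cells, rho):
    """Keep keywords with geo-tagged evidence and entropy of at most rho."""
    return {
        w
        for w, cells in keyword_cells.items()
        if sum(cells.values()) > 0 and spatial_entropy(cells) <= rho
    }


keyword_cells = {
    "concert": {(3, 4): 8},                                   # one cell: entropy 0
    "weekend": {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1},  # spread out: entropy 2
}
localized = filter_localized(keyword_cells, rho=1.0)
```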

Keyword Clustering
- Each S_i is a vector.
- Event keywords are clustered using their S_i.
- Similarity calculation: cosine similarity.
Cosine Similarity, Wikipedia

Keyword Clustering (cont.)
- There is a distance threshold T.
- If a new keyword falls outside the threshold of every existing cluster, it forms a new cluster by itself.
Saed Sayad, Kmeans clustering
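A threshold-based clustering of this kind can be sketched as a single pass over the keywords. The sketch assumes that distance is 1 minus cosine similarity, that each cluster is represented by the running mean of its members' vectors, and that a keyword joins the nearest cluster when it is within the threshold. These choices and the function names are illustrative assumptions; the slides do not specify the exact procedure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_keywords(signatures, threshold):
    """Assign each keyword to the nearest cluster, or open a new one."""
    clusters = []  # each cluster: {"centroid": [...], "members": [...]}
    for word, vec in signatures.items():
        best, best_dist = None, None
        for cluster in clusters:
            dist = 1.0 - cosine_similarity(vec, cluster["centroid"])
            if best_dist is None or dist < best_dist:
                best, best_dist = cluster, dist
        if best is not None and best_dist <= threshold:
            best["members"].append(word)
            n = len(best["members"])
            # Running mean of member vectors (an assumption of this sketch).
            best["centroid"] = [
                (m * (n - 1) + x) / n for m, x in zip(best["centroid"], vec)
            ]
        else:
            clusters.append({"centroid": list(vec), "members": [word]})
    return [cluster["members"] for cluster in clusters]


signatures = {"concert": [5, 0, 0], "band": [4, 1, 0], "match": [0, 0, 6]}
groups = cluster_keywords(signatures, threshold=0.2)
```

Because the pass is incremental, the result can depend on the order in which keywords arrive.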

Cluster Scoring
Determine which keyword clusters are most likely to refer to localized events, and filter out spurious clusters. To score a cluster:
1. Score each keyword.
2. Sum up all scores.
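The two scoring steps can be illustrated as follows. The per-keyword scores are taken as given input because the slides do not state how a keyword is scored; the `min_score` cutoff used to drop spurious clusters and the function names are likewise hypothetical.

```python
def score_cluster(cluster, keyword_score):
    """Step 2: a cluster's score is the sum of its keywords' scores."""
    return sum(keyword_score[w] for w in cluster)


def rank_clusters(clusters, keyword_score, min_score):
    """Drop clusters scoring below min_score and sort the rest, best first."""
    scored = [(score_cluster(c, keyword_score), c) for c in clusters]
    kept = [(s, c) for s, c in scored if s >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Step 1 is assumed done: every keyword already has a score.
keyword_score = {"concert": 3.0, "band": 2.5, "lol": 0.4}
clusters = [["lol"], ["concert", "band"]]
ranked = rank_clusters(clusters, keyword_score, min_score=1.0)
```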

Cluster Scoring

System Overview

Demonstration