Alvin CHAN Kay CHEUNG Alex YING Relationship between Twitter Events and Real-life.



Introduction: “Twitter is your window to the world.” Every day… 500 million users, 500 million tweets

Real-life Twitter

Introduction. User Engagement: Twitter is a platform to share real-world events; there is little understanding of people's engagement in real-world events. Event Detection: Twitter is a primary source of news content; it is hard to spot useful information among so many tweets.

Introduction. User Engagement: predict the (i) presence and (ii) degree of the user’s engagement; 643 real-world events. Event Detection: aggressive data preprocessing, hierarchical clustering of tweets, time-dependent n-gram, cluster ranking, headlines re-clustering.

Event Detection in Twitter using Aggressive Filtering and Hierarchical Tweet Clustering

Methodology – Data Source 1. US Presidential Elections in Nov 2012, 23: Nov 2012, 06:30 2. Ukraine, Syria and the Bitcoin in Feb 2014, 17:30 – 26 Feb 2014, 18:15.

Methodology – Data Pre-processing and filtering. Aggressive Filtering: Remove Tweets, Vocabulary

Methodology – Data Pre-processing and filtering: removal of URLs, user mentions, hashtags, digits and punctuation; tokenization by white space; removal of stop words.
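The normalization steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenters' code: the regular expressions, the lowercasing step, and the tiny `STOP_WORDS` set are all assumptions, since the slides do not say which patterns or stop-word list were used.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list (the slides do not name one).
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and", "for", "on"}

# Assumed patterns for the items the slide says are removed.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
DIGIT_RE = re.compile(r"\d+")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def normalize_tweet(text):
    """Return the tokens left after the removals listed on the slide."""
    text = URL_RE.sub(" ", text)        # remove URLs first, before punctuation
    text = MENTION_RE.sub(" ", text)    # remove user mentions
    text = HASHTAG_RE.sub(" ", text)    # remove hashtags
    text = DIGIT_RE.sub(" ", text)      # remove digits
    text = text.translate(PUNCT_TABLE)  # remove punctuation
    tokens = text.lower().split()       # tokenize by white space (lowercasing is an assumption)
    return [t for t in tokens if t not in STOP_WORDS]
```

URLs are stripped before punctuation so that the pieces of a link do not survive as stray tokens.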

Methodology – Data Pre-processing and filtering. Structure-based filtering: remove tweets with > 2 user mentions, > 2 hashtags, or < 4 text tokens.
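A small sketch of the structure-based filter, reading the three thresholds on the slide as removal criteria. What counts as a "text token" is an assumption here (a whitespace-separated token that is not a URL, mention, or hashtag), and the function name `keep_tweet` is hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def keep_tweet(text, max_mentions=2, max_hashtags=2, min_tokens=4):
    """Return True if the tweet passes the structure-based filter."""
    no_urls = URL_RE.sub(" ", text)
    mentions = len(MENTION_RE.findall(no_urls))
    hashtags = len(HASHTAG_RE.findall(no_urls))
    # Assumed definition of text tokens: what is left after dropping
    # URLs, mentions and hashtags, split on white space.
    plain = HASHTAG_RE.sub(" ", MENTION_RE.sub(" ", no_urls))
    tokens = plain.split()
    return (
        mentions <= max_mentions
        and hashtags <= max_hashtags
        and len(tokens) >= min_tokens
    )
```

For example, `keep_tweet("wow #election2012")` is rejected because only one text token remains.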

Methodology – Data Pre-processing and filtering. Vocabulary filtering: Bi-gram, Tri-gram

Hierarchical clustering – Step 1: computing hierarchical clustering with the fastcluster library in Python; scale and normalize the tweet-term matrix; compute tweet pairwise distance.

Hierarchical clustering – Step 2: cutting the dendrogram at a 0.5 distance threshold. Higher threshold: different topics in the same cluster. Lower threshold: same topic in lots of different clusters, i.e. topic fragmentation.
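To make the effect of the threshold concrete, here is a dependency-free sketch of agglomerative clustering that stops merging once the closest pair of clusters is farther apart than the threshold. It is a stand-in for illustration only: the slides use the fastcluster library, while this code uses a naive pure-Python loop, and the choices of cosine distance and average linkage are assumptions not stated in the slides.

```python
import math
from collections import Counter
from itertools import combinations


def unit_vector(tokens):
    """L2-normalized term-count vector; tokens must be non-empty."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def cosine_distance(u, v):
    """1 minus the dot product of two unit vectors."""
    return 1.0 - sum(w * v.get(term, 0.0) for term, w in u.items())


def cluster_tweets(token_lists, threshold=0.5):
    """Average-linkage agglomeration, stopped at the distance threshold."""
    vectors = [unit_vector(t) for t in token_lists]
    pairs = combinations(range(len(vectors)), 2)
    dist = {(i, j): cosine_distance(vectors[i], vectors[j]) for i, j in pairs}
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Average of all pairwise tweet distances between two clusters.
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), best = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break  # equivalent to cutting the dendrogram at `threshold`
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)
```

Raising `threshold` lets unrelated tweets merge into one cluster, while lowering it leaves near-duplicates in separate clusters, which is the fragmentation trade-off the slide describes.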

Hierarchical clustering – Step 3

Hierarchical clustering – Step 4: selecting topic headlines by the clusters’ size; re-clustering headlines to avoid topic fragmentation; for each selected topic, select the headline with the earliest publication time.
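The ranking and headline choice in this step might look like the sketch below. It covers only two of the three items on the slide (ranking clusters by size and taking the earliest tweet as the headline); the headline re-clustering is left out. The function name, the `top_k` parameter, and the `(text, published_at)` tuple layout are hypothetical.

```python
def pick_headlines(clusters, tweets, top_k=2):
    """Pick one headline per cluster for the top_k largest clusters.

    clusters: list of lists of tweet indices.
    tweets:   list of (text, published_at) tuples; published_at can be
              any comparable timestamp (assumed layout).
    """
    # Rank clusters by size, largest first.
    largest = sorted(clusters, key=len, reverse=True)[:top_k]
    headlines = []
    for members in largest:
        # The headline is the earliest-published tweet in the cluster.
        earliest = min(members, key=lambda i: tweets[i][1])
        headlines.append(tweets[earliest][0])
    return headlines
```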

Results Analysis: Tweet Length and Structure. Tweet length at least: 5 | 3. Tweet-term matrix: 3,258 | 3,777. Terms: 588

Results Analysis: Unigrams vs Bi-grams/Tri-grams. Vocabulary: Bi-grams and Tri-grams | Uni-gram. Tweet-term matrix. Terms: 588 | 482

Results Analysis: Topic Precision (Stream 1). Accuracy: 100%
Ground Truth | Detected Topic Headline
Obama wins Vermont | WASHINGTON (AP) - Obama wins Vermont; Romney wins Kentucky. #Election2012
Romney wins Indiana | Not a shocker NBC reporting #Romney wins Indiana & Kentucky #Obama wins Vermont
Romney wins Kentucky | Sky News projection: Romney wins Kentucky. #election2012

Results Analysis: Topic Precision (Stream 2). Googled for the first 100 detected topics; 80% of detected topics are published as news.

Implications. Advantage: simplicity and efficiency, runs in less than an hour; strong filtering of tweets and terms seems to lead to efficient and clean results. Limitation: topic fragmentation, where topics get repeated across several clusters. Overcome the heavy noise aspect of Twitter content.

Predicting User Engagement on Twitter with Real-World Events

5 questions to address: Does a person post tweets about an event because they are interested in the topic pertaining to that event? Are they instead engaged because their friends are also posting tweets about it? Perhaps they are just a very active user of Twitter? Is their engagement a reflection of the fact that this is a local event? How and to what extent do the different topics of events affect the degree of a user’s engagement?

Dataset. Events: an automated event detection algorithm was applied to 2.7 billion English tweets, giving 7,468 real-world event clusters; annotators read sample tweets from each event cluster; geolocations were inferred for 643 event clusters. Users: Twitter users were selected based on the 643 events; their locations were predicted by a location inference algorithm; all tweets posted by each user in the most recent 6 months preceding their first engagement with any of the 643 events were collected.

The Statistical Model - Dependent Variables. Presence of engagement: existence of at least one tweet that refers to a particular event on Twitter; binary measure (1: engaged; 0: not engaged). Degree of engagement: number of tweets that a user posts regarding the event; continuous measure.
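The two dependent variables can be computed from a per-tweet event label, as in this small sketch. The input layout (one event id, or `None`, per tweet) and the function name are assumptions made for illustration.

```python
def engagement_measures(tweet_event_ids, event_id):
    """Return (presence, degree) of a user's engagement with one event.

    tweet_event_ids: for each of the user's tweets, the id of the event
    it refers to, or None if it refers to no event (assumed layout).
    """
    # Degree: number of the user's tweets about the event.
    degree = sum(1 for e in tweet_event_ids if e == event_id)
    # Presence: 1 if at least one such tweet exists, else 0.
    presence = 1 if degree >= 1 else 0
    return presence, degree
```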

The Statistical Model - Predictor Variables. 17 variables of 5 major types: Twitter activities, tweet content, Twitter user types, geolocation, social network structure.

Result – Prediction of Presence Standardize the measures ➜ logistic regression ➜ Predict user’s engagement
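The pipeline on this slide (standardize, fit a logistic regression, predict presence) can be sketched without external libraries. This is an illustrative toy, not the study's model: it fits by plain batch gradient descent, and the learning rate, epoch count, and function names are arbitrary assumptions.

```python
import math


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - y
            grad_b += error
            for k, x in enumerate(row):
                grad_w[k] += error * x
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def predict_presence(weights, bias, row):
    """Return 1 (engaged) if the predicted probability is at least 0.5, else 0."""
    score = bias + sum(w * x for w, x in zip(weights, row))
    return 1 if sigmoid(score) >= 0.5 else 0
```

Standardizing first puts predictors measured on very different scales (for instance a raw tweet count and a ratio) on a common footing, so the fitted weights can be compared with each other.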

Prediction of Presence. Twitter activity: total tweets posted by a user prior to her event engagement ✔; lower directed / higher broadcast communication ✔; ratio of hashtags used ✔; ratio of retweets ✔. Tweet content: topical interest ✘. Twitter user type: informer ✔; meformer ✘.

Prediction of Presence. Geolocation ✘. Social network: number of news friends ✔; number of friends/followers and neighbors ✘.

Result – Prediction of Degree. Linear regression: participation levels in past ➜ participation levels in final. Most significant predictor: number of posts from the user’s friends prior to the user’s engagement ✔. User’s network size ✘.
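As a reminder of what the linear regression step estimates, here is a least-squares fit with a single predictor. The study uses several predictors; restricting to one (for example, the number of friends' posts before the user's engagement) is a simplification for illustration, and the function name is hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and cross-product of x and y around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A positive slope here would correspond to a ✔ on the slide: more activity in the predictor goes with a higher degree of engagement.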

Prediction of Degree w.r.t. Different Topics: linear regression again; allow only 1 label for a given event.

Prediction of Degree w.r.t. Different Topics. Topical interest: Politics, Business and Sports events ✔✔; Entertainment ✔. Following News friends vs Friends: News friends ➜ Politics & Business, Technology & Science, Sports, Entertainment; Friends ➜ Local, Odd. Geolocation: Sports, Local ✔.

5 questions to address: Does a person post tweets about an event because they are interested in the topic pertaining to that event? Are they instead engaged because their friends are also posting tweets about it? Perhaps they are just a very active user of Twitter? Is their engagement a reflection of the fact that this is a local event? How and to what extent do the different topics of events affect the degree of a user’s engagement?

Answers to the 5 questions: Does a person post tweets about an event because they are interested in the topic pertaining to that event? Yes, the correlation between the content of tweets related to events in specific topics and the user’s engagement increases in significance.

Answers to the 5 questions Are they instead engaged because their friends are also posting tweets about it? Yes, conditioned on the type of event (local events and odd news)

Answers to the 5 questions Perhaps they are just a very active user of Twitter? Yes, more active users are more likely to be interested and engaged in a new event

Answers to the 5 questions Is their engagement a reflection of the fact that this is a local event? Depends on the kind of event, yes for Sports and Local events

Answers to the 5 questions: How and to what extent do the different topics of events affect the degree of a user’s engagement? Politics & Business, Technology & Science, and Sports events depend more on the content of past tweets. Local and Odd events depend more on the user’s social network.

Implications. Limitation: each event is assigned to a single category only; people’s personality was not considered; the existence of different kinds of target users was not considered.

Conclusion. User Engagement: users’ prior activities and social network structure are good predictors for presence and degree; content of tweets and geographic proximity provide additional predictive power. Event Detection: many topics are published as news; users can trace the news back to its original tweet.

THANK YOU