Social Theory Driven Operational Forecasting of Civil Unrest Event Outbreaks Final Project Presentation Peter Wu Apr 30, 2015.



Outline
- Introduction
  - Political conflict prediction
  - Protest participation theory
- Methodology
  - Feature design
  - Ground truth labels
  - Modeling
- Findings

Political conflict prediction
- Crisis early warning
  - Nature: strategic
  - Predictand: future state of intra-national conflict or international relations
  - Predictors: socio-economic indices and historical crisis records
- Civil unrest event forecasting
  - Nature: operational
  - Predictand: occurrence of concrete civil unrest events on a future day
  - Predictors:
    - GDELT (Global Database of Events, Language, and Tone) event counts; retweet cascade lengths on Twitter (Ramakrishnan et al., 2014)
    - Topic proportions and hashtag counts on Twitter (Boecking et al., 2014)

“While we have a pretty good track record using event data for political forecasting using statistical methods, typically guided by a considerable amount of theory, the jury is probably out with respect to atheoretical Big Data methods ... Big Data approaches appear to work fairly reliably if you have something specific in mind that is invariant to noise and you are looking for a specific pattern, which is to say, at least in some sense you have a theory ... But generally if you expect the data simply to "speak to you", you are going to be disappointed.” (Schrodt, 2015)

Protest participation theory (Verba et al., 1995; Schussman & Soule, 2005; Van Laer, 2011)

Metric development
Where to measure?
- Questionnaire → online social media (Twitter)
- Specifically, a data set containing all the tweets created by Cairo Twitter users from 12/1/2010 to 3/1/2011

Metric development (cont’d)
What to measure? Each survey-based factor is mapped to a Twitter metric:
- Interested in politics (enjoys political discussion) → daily volume of political tweets
- Has been asked to participate → daily volume of tweets that present future protest information
- Knowledgeable about politics (reads a daily newspaper) → daily volume of tweets from popular political news media
- Affiliated with a social organization → daily volume of tweets from salient political activists

Metric development (cont’d)
How to measure?
- Volume of political tweets: keyword matching with TF-IDF-based query term expansion
- Volume of “future protest” tweets: keyword matching rule requiring the simultaneous occurrence of protest-related words and “future day” words, in English or Arabic
- Volume of news media and political activist tweets: manually identify news media and political activists from the list of usernames with the most political tweets
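The “future protest” rule above can be sketched as a simple co-occurrence check. This is only an illustration: the keyword sets `PROTEST_WORDS` and `FUTURE_WORDS` are hypothetical placeholders, since the slides do not list the actual vocabularies, and the tokenizer is a plain word split rather than whatever preprocessing the project used.

```python
import re

# Hypothetical keyword lists (assumption); the slides do not give the real ones.
PROTEST_WORDS = {"protest", "demonstration", "march", "مظاهرة", "احتجاج"}
FUTURE_WORDS = {"tomorrow", "friday", "غدا", "بكرة"}

TOKEN_RE = re.compile(r"\w+")  # \w is Unicode-aware in Python 3, so Arabic letters match


def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def is_future_protest(text):
    """True if a protest word and a future-day word occur in the same tweet."""
    toks = tokens(text)
    return bool(toks & PROTEST_WORDS) and bool(toks & FUTURE_WORDS)


def daily_volume(tweets):
    """tweets: iterable of (date_string, text). Returns {date: matching tweet count}."""
    counts = {}
    for day, text in tweets:
        if is_future_protest(text):
            counts[day] = counts.get(day, 0) + 1
    return counts
```

Days with no matching tweets are simply absent from the returned dictionary; a caller building a daily time series would fill those days with zero.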

Ground truth labels
- Protest outbreaks in Cairo during 12/1/2010-3/31/2011
- Manually curated through Google News search
- 15 protest outbreaks identified
- Example:

Research question
Is a change in the value of a protest participation metric for Cairo, measured over a base period of the past M days (M = 1, 2, 3), significantly correlated with a protest outbreak that happens within a prediction horizon of the next N days (N = 1, 2, 3)?
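To make the roles of M and N concrete, here is a minimal sketch of how a feature and a label could be built for each day. The slides do not define "change" precisely, so the definition used here (today's value minus the mean of the previous M days) is an assumption, as are the function names.

```python
def change_over_base(series, m):
    """Change on day t relative to the mean of the previous m days.

    Assumed definition; the slides do not spell out the formula.
    Days without m days of history get None.
    """
    out = []
    for t in range(len(series)):
        if t < m:
            out.append(None)
        else:
            base = sum(series[t - m:t]) / m
            out.append(series[t] - base)
    return out


def outbreak_within_horizon(outbreak_days, n_days_total, n):
    """Label day t with 1 if an outbreak falls on any of days t+1 .. t+n.

    Days whose horizon runs past the end of the data get None.
    """
    outbreaks = set(outbreak_days)
    labels = []
    for t in range(n_days_total):
        if t + n >= n_days_total:
            labels.append(None)
        else:
            hit = any((t + k) in outbreaks for k in range(1, n + 1))
            labels.append(int(hit))
    return labels
```

Rows where either the feature or the label is None would be dropped before modeling, so each (M, N) configuration ends up with a slightly different number of usable days.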

Modeling & prediction
- Logistic regression with backward stepwise selection based on the Akaike information criterion (AIC), for each configuration of base period M and predicting horizon N
- Leave-one-out cross-validation to evaluate prediction
- Performance compared against baseline models built using GDELT event count features
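The modeling steps above can be sketched in plain Python as follows. The slides do not say which software was used, so this is an illustrative stand-in, not the project's code: the logistic model is fit by simple gradient ascent with a tiny L2 penalty (an assumption added for numerical stability), and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, cols, steps=2000, lr=0.1, l2=1e-3):
    """Fit intercept + weights for the chosen columns by gradient ascent."""
    w = [0.0] * (len(cols) + 1)  # w[0] is the intercept
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            feats = [1.0] + [xi[c] for c in cols]
            err = yi - sigmoid(sum(a * b for a, b in zip(w, feats)))
            for j, f in enumerate(feats):
                grad[j] += err * f
        w = [wj + lr * (g / n - l2 * wj) for wj, g in zip(w, grad)]
    return w


def predict(w, xi, cols):
    feats = [1.0] + [xi[c] for c in cols]
    return sigmoid(sum(a * b for a, b in zip(w, feats)))


def aic(X, y, cols):
    """AIC = 2k - 2 * log-likelihood, with k counting the intercept."""
    w = fit_logistic(X, y, cols)
    eps = 1e-12
    ll = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, xi, cols), eps), 1 - eps)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return 2 * (len(cols) + 1) - 2 * ll


def backward_stepwise(X, y):
    """Start from all columns; repeatedly drop the one whose removal lowers AIC most."""
    cols = list(range(len(X[0])))
    best = aic(X, y, cols)
    while cols:
        scores = [(aic(X, y, [c for c in cols if c != d]), d) for d in cols]
        cand, drop = min(scores)
        if cand >= best:
            break  # no single removal improves AIC
        best = cand
        cols.remove(drop)
    return cols


def loocv_scores(X, y, cols):
    """Leave-one-out: refit without day i, then score day i."""
    scores = []
    for i in range(len(y)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], cols)
        scores.append(predict(w, X[i], cols))
    return scores
```

In this sketch the feature subset is selected once on all data and then evaluated by leave-one-out; whether the original study repeated the selection inside each fold is not stated in the slides.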

Highlight of findings
- Daily volume of tweets that present future protest information has a significant positive correlation with future protest outbreaks under all configurations of M and N.
- Daily volume of political tweets (percentage) is only significant under M=3 and N=1,2 and, surprisingly, has a negative effect.
- To predict protest outbreaks 1 or 2 days into the future, choosing a base period of M=3 gives the best performance; when N=3, the best model is obtained at M=1.
- The selected main model achieves an AUC of under N=1, outperforming the baseline model the most, by 36.8%.
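For readers unfamiliar with the AUC figure reported above, the following sketch shows one standard way to compute it from out-of-sample scores: the fraction of (outbreak day, non-outbreak day) pairs in which the outbreak day receives the higher score, with ties counted as half. The function name is illustrative and the slides do not show how the AUC was actually computed.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of outbreak days from non-outbreak days.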

Highlight of findings (cont’d)

Questions?