Predicting Flu Trends using Twitter Data. Harshavardhan Achrekar [1], Avinash Gandhe [2], Ross Lazarus [3], Ssu-Hsin Yu [2], Benyuan Liu [1]. Workshop on Cyber-Physical Networking Systems (CPNS 2011).


Predicting Flu Trends using Twitter Data
SNEFT: Social Network Enabled Flu Trends
Harshavardhan Achrekar [1], Avinash Gandhe [2], Ross Lazarus [3], Ssu-Hsin Yu [2], Benyuan Liu [1]
[1] Computer Science Department, University of Massachusetts Lowell
[2] Scientific Systems Company Inc, Woburn, MA
[3] Department of Population Medicine, Harvard Medical School
Workshop on Cyber-Physical Networking Systems (CPNS 2011), in conjunction with INFOCOM 2011, Shanghai, China

Outline
Background
Related Work
Our Approach
SNEFT System Architecture
Twitter Dataset Description
Twitter Dataset Analysis
Detection and Prediction
Conclusion

Seasonal Flu
Influenza (flu) is a contagious respiratory illness caused by influenza viruses.
It is seasonal and occurs in waves.
5 to 20% of the population gets the flu.
About 200,000 people are hospitalized from flu-related complications.
36,000 people die from flu every year in the USA.
The worldwide death toll is 250,000 to 500,000.
Epidemiologists use early detection of disease outbreaks to reduce the number of people affected.

Related Work: Google Flu Trends
Certain web search terms are good indicators of flu activity.
Google Flu Trends uses aggregated search data on flu indicators to estimate current flu activity around the world in real time.
Limitation: accuracy of the data, since not every person who searches for "flu" is sick.
For example, Google Flu Trends detects increased flu activity two weeks before the CDC.
CDC stands for Centers for Disease Control and Prevention.

Our Approach
Online social networks (OSNs) have emerged as a popular platform for people to make connections, share information, and interact.
OSNs represent a previously untapped data source for detecting the onset of an epidemic and predicting its spread.
Messages exchanged between users, such as "i am down with flu" or "got flu.", provide early, robust predictions.
Twitter and Facebook mobile users post updates together with their geo-location, which helps in carrying out refined analysis.
User demographics such as age, gender, location, and affiliated networks can be inferred from the data.
The result is a snapshot of the current epidemic condition and a preview of what to expect next, on a daily or hourly basis.
User population (in millions): Facebook 400, Myspace 200, Twitter 80.

System Architecture of SNEFT
[Figure: SNEFT architecture diagram. Labels: Internet; Data Collection Engine (ILI data downloader, OSN data crawler); OSN models; math models; ARMA model; novelty detector; filter / predictor; outputs: ILI prediction, flu warning, state estimate.]
ILI stands for Influenza-Like Illness.

OSN Data Collection
[Figure: design of the Twitter data collection engine (crawler).]

Twitter Data Set
The Real Time Response Stream fetches entries relevant to the searched keyword, returning tweets in reverse-time order.
Data collection has been active from October 18, 2009 until the present.
Up to October 23, 2010 we collected 4.7 million tweets from 1.5 million unique users.
CDC's inactive ILI period lasted from May 23, 2010 to October 9, 2010, which leaves 31 weeks of CDC data available for comparison with the Twitter dataset.
(A power outage at our data collection site resulted in no data being collected from January 18, 2010 to January 20, 2010.)

Spatio-Temporal Database for the Twitter Data Set
The crawler uses the Streaming Real Time Search Application Programming Interface (API) to fetch data at regular time intervals.
Each tweet carries the Twitter user name, the post with its status id, and a timestamp.
From the Twitter user name we can get the profile details attached to every user: number of followers, number of friends, profile creation date, location (public or private, from the profile page or the mobile client), and status update count.
The user's current location is passed as input to Google's location-based web services to get geo-location codes (i.e., latitude and longitude) along with the country, state, and city, with a certain accuracy scale.
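The fields listed on this slide could be held in a record like the one sketched below. This is only an illustration: the class and field names (TweetRecord, UserProfile, GeoCode, and so on) are hypothetical and are not taken from the SNEFT database schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class UserProfile:
    """Profile details looked up from the Twitter user name (hypothetical names)."""
    user_name: str
    followers_count: int
    friends_count: int
    created_at: datetime
    status_count: int
    location: Optional[str] = None  # None when the user published no location


@dataclass
class GeoCode:
    """Geocoding result for the free-text location (hypothetical names)."""
    latitude: float
    longitude: float
    country: str
    state: Optional[str] = None
    city: Optional[str] = None
    accuracy: Optional[int] = None  # accuracy scale reported by the geocoder


@dataclass
class TweetRecord:
    """One stored post: status id, text, timestamp, author, optional geocode."""
    status_id: int
    text: str
    posted_at: datetime
    user: UserProfile
    geo: Optional[GeoCode] = None  # None when the location could not be geocoded
```

Keeping the geocode optional reflects the fact that some users do not publish location details.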

Twitter Data Set Analysis
In our Twitter dataset, 30.6% of users are from the USA, 41.3% are outside the USA, and 28.1% have not published their location details.
Status posting times (tweet timestamps in GMT) are converted to the local time zone of the individual profile.
Daylight saving time adjustments are applied within the required time frame.
[Figure: state-wise distribution of USA users on Twitter for flu postings.]

Twitter Data Set Analysis
[Figures: hourly Twitter usage pattern in the USA; average daily Twitter usage within a week.]
The first figure shows the percentage of unique Twitter users who mentioned flu in their tweets at different hours of the day.
The hourly activity pattern matches our expectations: traffic is high from late morning to early afternoon, and fewer tweets are posted from midnight to early morning, reflecting people's work and rest hours within a day.
The average daily usage pattern within a week suggests a trend on OSN sites, with more people discussing flu on weekdays than on weekends.
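The hourly figure could be computed along the lines of the sketch below. It assumes the input is a list of (user name, local posting time) pairs and that the denominator is the total number of unique users in the dataset; the slide does not state the denominator, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_user_share(posts):
    """Percentage of all unique users who posted in each hour of the day.

    posts: iterable of (user_name, local_datetime) pairs.
    Returns a dict {hour: percent}, containing only hours that saw a post.
    """
    users_by_hour = defaultdict(set)
    all_users = set()
    for user, when in posts:
        users_by_hour[when.hour].add(user)  # a user counts once per hour bucket
        all_users.add(user)
    if not all_users:
        return {}
    total = len(all_users)
    return {hour: 100.0 * len(users_by_hour[hour]) / total
            for hour in sorted(users_by_hour)}
```

Because one user can post in several different hours, the percentages across hours can add up to more than 100.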

Twitter Data Set Cleaning
Retweets: a retweet is a post originally made by one user that is forwarded by another user.
Syndrome elapsed time: an individual patient may have multiple encounters associated with a single episode of illness. To avoid duplication, the first encounter for each patient within any single syndrome group is reported to CDC, but subsequent encounters with the same syndrome are not reported as new episodes until more than six weeks have elapsed since the most recent encounter in the same syndrome. We call this the syndrome elapsed time.
We remove retweets, and tweets from the same user within a certain syndrome elapsed time, since they do not indicate new ILI cases.
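The cleaning rule can be sketched as follows. Several details are assumptions rather than part of the slide: retweets are recognized by a leading "RT @" in the text, each tweet is a dict with "user", "time", and "text" keys, the window is measured from the user's most recent non-retweet post (mirroring the "most recent encounter" wording), and the default window is one week.

```python
from datetime import datetime, timedelta


def filter_new_cases(tweets, elapsed=timedelta(weeks=1)):
    """Keep only tweets that may indicate a new ILI case.

    tweets: iterable of dicts with keys "user", "time" (datetime), "text".
    Drops retweets, and drops a user's tweet unless more than `elapsed`
    has passed since that user's most recent non-retweet post.
    """
    last_seen = {}  # user -> time of that user's most recent non-retweet post
    kept = []
    for tw in sorted(tweets, key=lambda t: t["time"]):
        # Assumed retweet marker: text beginning with "RT @"
        if tw["text"].lstrip().lower().startswith("rt @"):
            continue
        previous = last_seen.get(tw["user"])
        last_seen[tw["user"]] = tw["time"]  # updated even if the tweet is dropped
        if previous is None or tw["time"] - previous > elapsed:
            kept.append(tw)
    return kept
```

Updating the last-seen time on every post means a user who keeps tweeting about the same illness is counted once, not once per window.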

Twitter Data Set Analysis
[Figure: number of Twitter users per week versus percentage of weighted ILI visits reported by CDC.]
An increase in the number of users tweeting about flu-related activity is accompanied by an increase in the percentage of weighted ILI visits reported by CDC in the same week.
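The week-by-week agreement between the two series can be quantified with the Pearson correlation coefficient. The sketch below is a plain-Python version of the standard formula; the function name is hypothetical, and the inputs are assumed to be two equal-length lists of weekly values (unique flu-tweeting users and CDC weighted ILI percentages).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

The coefficient is unaffected by rescaling either series, so raw user counts can be compared directly with percentages.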

Twitter Data Set Analysis
[Figure: percentage of weighted ILI visits reported by CDC, the original scaled Twitter dataset, and the scaled Twitter dataset filtered by retweets and a syndrome elapsed time of one week, displayed on a weekly basis.]
The plot compares CDC's percentage of ILI visits to physicians with the original Twitter data and the filtered Twitter data, both normalized to the scale of the CDC data, over the thirty-one weeks when our data collection mechanism was active and CDC was publishing its reports online.
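The slide does not say how the Twitter counts were normalized to the CDC scale. One simple possibility, shown purely as an assumption, is to multiply the Twitter series by a constant so that its peak matches the peak of the CDC series; the function name is hypothetical.

```python
def scale_to_reference(series, reference):
    """Rescale `series` by a constant factor so its maximum equals max(reference).

    Peak matching is an assumed normalization, chosen only for illustration.
    """
    peak = max(series)
    if peak == 0:
        raise ValueError("cannot rescale a series whose peak is zero")
    factor = max(reference) / peak
    return [value * factor for value in series]
```

A constant factor changes only the plotting scale, so it leaves the correlation between the two series unchanged.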

Twitter Improves Prediction of Influenza Data
An ARX (auto-regressive with exogenous input) model relates the CDC ILI data to the Twitter data.
In the model, t indexes weeks, y(t) denotes the percentage of physician visits due to ILI in week t, u(t) represents the number of unique Twitter users with flu-related tweets in week t, and e(t) is a sequence of independent random variables. c is a constant term that accounts for offset.
The flu cases in week t are predicted with the ARX model from the CDC data, which arrives with 2 weeks of delay, and/or the up-to-date Twitter data. ˆy(t) represents the predicted CDC data in week t.
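The symbol definitions on this slide fit a standard ARX model. As a sketch only, one common way to write such a model and its predictor is given below. The orders m and n, the coefficients a_i and b_j, and the index ranges are assumptions made for illustration, and the shift of the y lags caused by the two-week CDC reporting delay is left out.

```latex
% Assumed ARX(m, n) form; a_i, b_j and the index ranges are illustrative.
\begin{align}
y(t) &= \sum_{i=1}^{m} a_i \, y(t-i) + \sum_{j=0}^{n-1} b_j \, u(t-j) + c + e(t) \\
% Prediction: drop the noise term e(t) and use the fitted coefficients.
\hat{y}(t) &= \sum_{i=1}^{m} \hat{a}_i \, y(t-i) + \sum_{j=0}^{n-1} \hat{b}_j \, u(t-j) + \hat{c}
\end{align}
```

Under this indexing, m = 0 leaves the first sum empty, so the prediction depends only on the Twitter terms and the constant.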

Cross Validation Results
The results of 5-fold cross validation are given in Table II.
According to these results, the model with m = 0 and n = 3 has the lowest RMSE (root mean squared error).
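A 5-fold cross validation of this kind might look like the sketch below. It is not the authors' code. It assumes m counts lagged y terms, n counts Twitter terms u(t), ..., u(t-n+1), coefficients are fitted by ordinary least squares, and folds are formed by shuffling the weekly rows; all function names are hypothetical.

```python
import random
from math import sqrt


def build_rows(y, u, m, n):
    """Regression rows [y(t-1..t-m), u(t..t-n+1), 1] with target y(t)."""
    start = max(m, n - 1)  # first week with all required lags available
    X, Y = [], []
    for t in range(start, len(y)):
        row = [y[t - i] for i in range(1, m + 1)]
        row += [u[t - j] for j in range(n)]
        row.append(1.0)  # column for the constant term c
        X.append(row)
        Y.append(y[t])
    return X, Y


def least_squares(X, Y):
    """Solve the normal equations (X^T X) theta = X^T Y by Gauss-Jordan."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, Y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def cv_rmse(y, u, m, n, folds=5, seed=0):
    """RMSE over all held-out rows in k-fold cross validation."""
    X, Y = build_rows(y, u, m, n)
    if len(X) < folds:
        raise ValueError("not enough rows for the requested number of folds")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    squared_error = 0.0
    for f in range(folds):
        held_out = set(idx[f::folds])
        train = [i for i in idx if i not in held_out]
        theta = least_squares([X[i] for i in train], [Y[i] for i in train])
        for i in held_out:
            pred = sum(c * x for c, x in zip(theta, X[i]))
            squared_error += (pred - Y[i]) ** 2
    return sqrt(squared_error / len(X))
```

Model selection then amounts to calling cv_rmse for each candidate (m, n) pair and keeping the pair with the smallest value.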

Predicted Influenza Data (percentage of weighted ILI visits)
[Figure: Twitter dataset normalized to the same scale as the CDC data, along with its predicted values for the percentage of weighted ILI visits (5-fold cross validation).]

Conclusion and Future Work
We investigated the use of a previously untapped data source, namely messages posted on Twitter, to track and predict the influenza epidemic situation in the real world.
Results show that the number of flu-related tweets is highly correlated with ILI activity in CDC data, as measured by the Pearson correlation coefficient.
We built auto-regression models to predict the number of ILI cases in a population as the percentage of visits to physicians in successive weeks.
We tested our regression models with historic CDC data and verified that Twitter data substantially improves the models' accuracy in predicting ILI cases.
In view of the lag inherent in CDC's ILI reports, Twitter data provides a near real-time assessment of influenza activity and can be used to effectively predict current ILI activity levels.
This is an opportunity to significantly enhance public health preparedness among the masses for influenza epidemics and other large-scale pandemics.

Thank You