Members: Raghuram Krishnamachari, Manish Maheshwari, Maryam El Kherba
Guided by: Prof. Alan Mislove
Flu Prediction / Activity
- CDC flu activity reports: influenza-like illness (ILI) for each region
- Google Flu Trends: aggregates search data to estimate flu activity
- Our experiment (Twitter): analyze Twitter data (tweets) to estimate flu activity
Google Flu Trends
[Figure: CDC's ILI data vs. Google Flu Trends]
Google Flu Trends vs. Twitter
Tweets, Phrases
"having a cold"             4
"have a cold"               7
"feel feverish" "flu"       5
"headache" "flu"            8
"sick" "flu"                9
"flu" "fever"               5
"came down with the flu"    7
"chills" "flu"              7
"catching the flu"          6
"cough" "flu"               6
"fatigue" "flu"             8
"weakness" "flu"            6
"flu like symptoms"         4
"runny nose" "flu"          5
"sore throat" "flu"         7
"stomach ache" "flu"        6
"stuffy nose" "flu"         6
"tiredness" "flu"           4
"vomiting" "flu"            4
"watery eyes" "flu"         6
"body hurts" "flu"          7
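The slides' filtering command takes a single pattern (`grep "$1"`). As a hedged sketch of applying several phrases in one pass, the function below reads one tweet per line on stdin and prints those that match any rule. Reading paired entries such as "headache" "flu" as "both terms must appear in the tweet" is our interpretation, not something the slides state; the names `match_flu_tweets` and `FLU_RULES`, the `;`/`,` separators, and case-insensitive matching are all assumptions, and only a subset of the phrases is listed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the tweets (one per line on stdin) that match at
# least one phrase rule. Rules are separated by ";"; the terms inside a rule are
# separated by "," and must ALL appear in the tweet (case-insensitive).
# Only a subset of the slide's phrases is listed here.
FLU_RULES='having a cold;have a cold;came down with the flu;catching the flu;flu like symptoms;headache,flu;sore throat,flu;runny nose,flu;flu,fever'

match_flu_tweets() {
  awk -v rules="$FLU_RULES" '
    BEGIN { n = split(rules, rule, ";") }
    {
      text = tolower($0)
      for (i = 1; i <= n; i++) {
        m = split(rule[i], term, ",")
        matched = 1
        for (j = 1; j <= m; j++) {
          # index() is a plain substring search, so no regex escaping is needed.
          if (index(text, term[j]) == 0) { matched = 0; break }
        }
        # Print the original (not lowercased) tweet once, then move on.
        if (matched) { print; next }
      }
    }'
}
```

Because matching is by substring, "flu" also matches words such as "fluent"; matching on word boundaries would be a natural refinement.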
Process
1. Filter
   - Filter flu-related tweets from the Twitter data
   - Store the data for each state (by FIPS code)
2. Count
   - Count flu-related tweets (weekly)
   - Count total tweets (weekly)
   - Compute the ratio of flu-related tweets to total tweets
3. Compare against Google Flu Trends / CDC data
4. Plot
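The ratio step can be sketched with awk, assuming the weekly flu-related and total counts have already been written to two files of `WEEK COUNT` lines. The function name `flu_ratio`, the file layout, and the week labels are hypothetical; weeks that appear only in the totals file are treated as having zero flu-related tweets.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: weekly ratio of flu-related tweets to all tweets.
# Each input file has lines "WEEK COUNT" (e.g. "2010-W10 42"); the file names
# and week-label format are assumptions, not taken from the slides.
# Note: the NR == FNR idiom assumes the flu-count file is non-empty.
flu_ratio() {
  local flu_counts="$1" total_counts="$2"
  awk '
    # First file: remember the flu-related tweet count for each week.
    NR == FNR { flu[$1] = $2; next }
    # Second file: divide by the weekly total; missing flu weeks count as 0.
    $2 > 0 { printf "%s %.6f\n", $1, (($1 in flu) ? flu[$1] : 0) / $2 }
  ' "$flu_counts" "$total_counts"
}
```

Normalizing by total tweets keeps weeks comparable even when overall Twitter volume changes.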
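Implementation
- Linux bash shell scripts
- Filtering:
    find fips -name "*.gz" -exec zcat {} \; | grep "$1"
- Counting (tweets per day):
    # assumes tab-separated records whose 3rd field is the tweet timestamp,
    # e.g. "Sun Mar 14 12:34:56 +0000 2010" -> "14 Mar 2010"
    find … -exec zcat {} \; | awk -F'\t' '{ print $3 }' | awk '{ print $3 " " $2 " " $6 }' | sort -k3,3n -k2,2M -k1,1n | uniq -c
- Plotting:
    # merge the dates and per-state totals side by side as comma-separated columns
    pr -mft -s, dates.txt NJ.tot NY.tot > RE2.tot
    Microsoft Excel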
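The counting pipeline's `uniq -c` prints one line per day (`COUNT DAY MON YEAR`), while the process calls for weekly counts. A hedged sketch of the roll-up follows, assuming GNU date (standard on Linux) and ISO weeks, which start on Monday. CDC reporting (MMWR) weeks start on Sunday, so a real comparison against CDC data may need a different week rule. The function name `daily_to_weekly` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: turn daily "COUNT DAY MON YEAR" lines (as printed by
# `uniq -c` in the counting pipeline) into ISO-week totals "YYYY-Www TOTAL".
# Assumes GNU date, whose -d option parses strings such as "14 Mar 2010".
daily_to_weekly() {
  local count day mon year week
  while read -r count day mon year; do
    # ISO week label, e.g. "14 Mar 2010" -> "2010-W10"; skip unparseable lines.
    week=$(date -d "$day $mon $year" +%G-W%V) || continue
    printf '%s %s\n' "$week" "$count"
  done | awk '{ total[$1] += $2 } END { for (w in total) print w, total[w] }' | sort
}
```

The function would be appended to the end of the counting pipeline, after `uniq -c`.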
Challenges
- Filtering: choosing phrases that express flu symptoms; processing time; segregating tweets by location
- Counting: processing time; storage format
- Plotting: lack of consistent CDC data; handling large numeric data
Future
- Better prediction algorithm
- Live tweet monitoring
- Flu propagation
- Facebook application