
1 Google Flu Trends
Terminology
– Influenza = flu
– ILI = influenza-like illness
CDC ILI time series
– Weekly
– 1-2 week publication lag
Predicting it from the frequency of search queries
– Machine learning problem
– Logit regression

2 Flu: a big problem
Tens of millions of cases every year, worldwide
250,000 to 500,000 deaths every year, worldwide
The swine flu pandemic is worse
Surveillance
– CDC
– European Influenza Surveillance Scheme (EISS)
– Also, monitoring the volume of calls to help lines
– and the volume of over-the-counter sales

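3 Regression
$I(t)$ = fraction of doctor’s visits due to flu
$Q(t)$ = fraction of search queries related to flu
$\operatorname{logit}(\cdot)$ is used to map $I(t)$ and $Q(t)$ from $(0, 1)$ to $\mathbb{R}$
– $\operatorname{logit}(p) = \log\left[\frac{p}{1-p}\right]$
Regression: $\operatorname{logit}(I(t)) = \beta_1 \operatorname{logit}(Q(t)) + \beta_0 + \varepsilon(t)$
– $\varepsilon(t)$ = error
– Correlation is used as the performance measure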
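A minimal Python sketch of the regression on this slide, assuming the coefficients are estimated by ordinary least squares on the logit-transformed series (the slide does not say how they are fitted). The helper names `fit_logit_regression`, `predict`, and `pearson` are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def logit(p: float) -> float:
    """Map a fraction p in (0, 1) to the real line: log(p / (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is only defined for 0 < p < 1")
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: map a real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (the performance measure)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_logit_regression(q: Sequence[float], i: Sequence[float]) -> Tuple[float, float]:
    """Fit logit(I(t)) = beta1 * logit(Q(t)) + beta0 + eps(t).

    Assumption: plain ordinary least squares, with eps(t) as the residual.
    Returns (beta1, beta0).
    """
    xs = [logit(v) for v in q]
    ys = [logit(v) for v in i]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = my - beta1 * mx
    return beta1, beta0


def predict(q: Sequence[float], beta1: float, beta0: float) -> List[float]:
    """Estimate I(t) from Q(t), mapping the fitted logit back into (0, 1)."""
    return [inv_logit(beta1 * logit(v) + beta0) for v in q]


if __name__ == "__main__":
    # Made-up weekly fractions, for illustration only.
    q = [0.0010, 0.0021, 0.0040, 0.0031, 0.0014]
    i = [0.010, 0.020, 0.041, 0.030, 0.015]
    b1, b0 = fit_logit_regression(q, i)
    print(b1, b0, pearson(predict(q, b1, b0), i))
```

Fitting in logit space keeps every prediction inside $(0, 1)$ once it is mapped back with `inv_logit`, which a regression on the raw fractions would not guarantee.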

4 Which Queries?
Training data: $I_1(t), \dots, I_9(t)$
– 9 regions
– Some subset of 9/28/2003 to 3/11/2007 for which $I_j(t) > 0$
Candidate queries
– Database of the 50 million most common search queries in the US
– $Q_j(t)$ = volume fraction of the candidate query in region $j$
– Calculate the correlations $\rho_j = \operatorname{corr}\big(\operatorname{logit}(Q_j(\cdot)), \operatorname{logit}(I_j(\cdot))\big)$
– Fisher Z-transform the correlations so that they are approximately normally distributed
– Average the Z-transformed correlations across the regions
– Rank the candidate queries by this average
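The ranking steps on this slide can be sketched in Python as follows, assuming Pearson correlation for $\operatorname{corr}$ and Fisher's transform $z = \operatorname{atanh}(r)$ for the Z-transform. The names `score_query` and `rank_queries` are hypothetical, and the clamping of $r$ away from $\pm 1$ is an added safeguard, not something the slide describes.

```python
import math
from typing import Dict, List, Sequence


def logit(p: float) -> float:
    """log(p / (1 - p)) for a fraction 0 < p < 1."""
    return math.log(p / (1.0 - p))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r: float, eps: float = 1e-12) -> float:
    """Fisher z-transform atanh(r); r is clamped away from +/-1 to stay finite."""
    r = max(-1.0 + eps, min(1.0 - eps, r))
    return math.atanh(r)


def score_query(q_by_region: Sequence[Sequence[float]],
                i_by_region: Sequence[Sequence[float]]) -> float:
    """Average over regions j of z(rho_j), rho_j = corr(logit(Q_j), logit(I_j))."""
    zs = [
        fisher_z(pearson([logit(v) for v in q_j], [logit(v) for v in i_j]))
        for q_j, i_j in zip(q_by_region, i_by_region)
    ]
    return sum(zs) / len(zs)


def rank_queries(candidates: Dict[str, Sequence[Sequence[float]]],
                 i_by_region: Sequence[Sequence[float]]) -> List[str]:
    """Order candidate queries from highest to lowest average z score.

    candidates maps each (hypothetical) query string to its per-region Q_j(t).
    """
    scores = {name: score_query(q, i_by_region) for name, q in candidates.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)
```

Averaging in z space rather than averaging the raw correlations is the point of the transform: raw correlations near 1 are compressed, while their z values are roughly normal and so average sensibly.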

5 Which Queries?
$Q_j(t)$ = sum of the volume fractions of the top $n = 45$ queries
– Out-of-sample validation used to choose $n$
Avg. correlation of 0.90 over 9/28/2003 to 3/11/2007 (the training period)
Avg. correlation of 0.97 over 3/18/2007 to 5/11/2008 (later data, not used for training)

Query type                        # among top 45
Influenza complications           11
Cold/flu remedy                   8
General flu symptoms              5
Term for influenza                4
Specific flu symptoms             4
Symptoms of a flu complication    4
Antibiotics                       3
General flu remedies              2
Symptoms of a related disease     2
Antivirals                        1
Related disease                   1
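A rough Python sketch of choosing $n$ by out-of-sample validation. It assumes the queries are already ranked (as on the previous slide), that the earliest weeks are used for fitting and the rest are held out, and that held-out correlation is the selection criterion; the slide does not specify the split, so `choose_n`, `ols`, and the parameter `n_train` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logit(p: float) -> float:
    """log(p / (1 - p)) for a fraction 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    beta1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return beta1, my - beta1 * mx


def choose_n(ranked: Sequence[Sequence[float]], ili: Sequence[float],
             n_train: int, max_n: int) -> Tuple[int, float]:
    """Return (n, held-out correlation) for the best number of top queries.

    ranked: weekly volume fractions per query, best-ranked query first.
    ili: observed I(t) for the same weeks.
    Hypothetical split: fit on weeks [0, n_train), evaluate on the rest.
    """
    y = [logit(v) for v in ili]
    best_n, best_r = 0, -math.inf
    for n in range(1, max_n + 1):
        # Q(t) is the plain sum of the top-n query fractions in week t.
        x = [logit(sum(s[t] for s in ranked[:n])) for t in range(len(ili))]
        beta1, beta0 = ols(x[:n_train], y[:n_train])
        # Predict the held-out weeks and compare with the observed I(t).
        pred = [inv_logit(beta1 * v + beta0) for v in x[n_train:]]
        r = pearson(pred, ili[n_train:])
        if r > best_r:
            best_n, best_r = n, r
    return best_n, best_r
```

Adding more queries helps only while the extra ones track flu activity; once unrelated queries enter the sum, held-out correlation drops, which is what the validation step is meant to detect.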

6 Extensions
Geographic granularity
– Example: predictions specific to the state of Utah
Not including query variants, misspellings, etc.
Not using a weighted sum of query volumes
– Just a plain sum
– Query volumes are highly correlated
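The rationale for a plain sum can be illustrated with a small Python sketch. The data and the weights below are hypothetical: three made-up queries share one seasonal signal, and the sketch checks that a weighted sum and a plain sum of them rise and fall almost identically. It illustrates the stated reasoning; it is not a test of the actual model.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def plain_sum(series: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted total query volume per week."""
    return [sum(week) for week in zip(*series)]


def weighted_sum(series: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Weighted total query volume per week (weights are hypothetical)."""
    return [sum(w * v for w, v in zip(weights, week)) for week in zip(*series)]


if __name__ == "__main__":
    # Made-up shared seasonal signal plus a small deterministic wobble per query.
    s = [1.0, 2.0, 4.0, 3.0, 1.5, 2.5, 5.0, 2.0]
    q1 = [0.001 * v for v in s]
    q2 = [0.0005 * v * (1.02 if t % 2 else 0.98) for t, v in enumerate(s)]
    q3 = [0.002 * v * (1.01 if t % 3 == 0 else 0.99) for t, v in enumerate(s)]
    series = [q1, q2, q3]
    print(pearson(plain_sum(series), weighted_sum(series, [3.0, 1.0, 0.5])))
```

When the queries are nearly proportional to one another, any positive weighting yields an aggregate nearly proportional to the plain sum, so estimating weights adds parameters without changing the signal much.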

7 Discussion
Uses of early warning
– New strain? Extra capacity needed? Public awareness?
Searches may not indicate infection
– They may instead be driven by flu-related news
– The actual queries used in the regression are kept secret
Not a substitute for actual surveillance
– Demographics, genotype, …

