1
Forecasting World Wide Pandemics: Using Google Flu Data to Forecast the Flu
Brian Abe, Dan Helling, Eric Howard, Ting Zheng, Laura Braeutigam, Noelle Hirneise
Group C
2
Preview of Coming Attractions
–Introduction
–The Data
–Our Best Model
–Verifying The Model
–Comparing Our Model with Real Data
–Conclusions
3
Background: The Flu
–General term for several different viruses
–Responsible for respiratory illness and up to 500,000 deaths worldwide per year
–Virus is able to flourish in those with weaker immune systems: the young, the elderly, and the sick
4
Background
Conventional methods for forecasting possible medical catastrophes
–Step 1: Patient realizes they are sick
–Step 2: Patient makes a medical appointment
–Step 3: Patient goes to the appointment and is diagnosed
–Step 4: Medical professional sends data to the CDC
5
The Future is Google
Future methods for predicting local epidemics and world wide pandemics lie within the hands of Google
–Google Flu Trends: weekly data, with collection starting on June 1, 2003
–Data is a normalized aggregate of the number of searches for “flu” or similar queries in a given area
–Source: http://www.google.org/about/flutrends/download.html
–http://www.cdc.gov/flu/weekly/fluactivity.htm
6
Google Flu Trends
–The idea behind the project was to predict pandemics and epidemics faster than conventional methods
–Early detection could lead to a lower rate of infection and fewer subsequent deaths
–Could save your life and your family's lives someday
7
Google Flu Trends: The Future of Forecasting
–Step 1: The sick realize they are sick
–Step 2: Patient “Googles” their symptoms
–Step 3: Data is aggregated and sent to the CDC
8
Pitfalls of the Data
–Not everyone has internet access
–Not everyone knows how to use the internet
–Not everyone uses Google (≈18%)
–New strains of virus, such as H1N1, may not behave like former strains
–This may or may not be an issue
9
Hypothesis Google data on the flu can be used to forecast future outbreaks of the flu
10
The Data
–Trace shows strong seasonality
–Notice the spike in 2003 from an increased number of searches due to the bird flu scare
11
The Data Histogram of the data – clearly not normally distributed, with a very large Jarque-Bera statistic
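To make the Jarque-Bera statistic on this slide concrete, here is a minimal sketch of how it is computed from sample skewness and kurtosis. The function name `jarque_bera` and the small sample are hypothetical stand-ins, not the presenters' data or software output.

```python
def jarque_bera(x):
    """Return (JB statistic, skewness, kurtosis) for a sequence of numbers."""
    n = len(x)
    mean = sum(x) / n
    # Central moments, dividing by n (population form)
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    # JB = n/6 * (S^2 + (K - 3)^2 / 4); a normal sample has S near 0 and K near 3
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, skew, kurt

# Hypothetical right-skewed sample standing in for the weekly search index
sample = [1, 1, 1, 2, 2, 3, 4, 6, 9, 15]
jb, skew, kurt = jarque_bera(sample)
print(f"JB={jb:.2f} skewness={skew:.2f} kurtosis={kurt:.2f}")
```

Under normality the statistic is asymptotically chi-squared with 2 degrees of freedom, so values far above roughly 6 point away from a normal distribution.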
12
Correlogram of the Data – looks like a possible AR(2) or AR(3)
13
The Data Unit-Root test – significant at the 1% level but not conclusive
14
Seasonal Differencing was done to make the data more stationary: SDUS=US-US(-52)
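The seasonal difference SDUS=US-US(-52) subtracts the value from 52 weeks earlier. A minimal sketch, assuming the weekly US series is held in a plain list (the synthetic `us` series below is made up for illustration):

```python
def seasonal_difference(series, lag=52):
    """Return series[t] - series[t - lag] for every t >= lag."""
    if lag <= 0 or lag >= len(series):
        raise ValueError("lag must be positive and shorter than the series")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Synthetic weekly series: a fixed yearly pattern plus a step of 2 each year
seasonal_pattern = [(week - 26) ** 2 for week in range(52)]
us = [seasonal_pattern[t % 52] + 2 * (t // 52) for t in range(156)]

sdus = seasonal_difference(us, lag=52)
print(len(us), len(sdus))  # the first 52 observations are lost
print(sdus[:5])
```

In this toy series the yearly pattern cancels exactly and only the year-over-year change remains, which is the effect the slide relies on.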
15
Histogram of the seasonally differenced data: still not normal, but closer to normal, with less skewness and now a single peak.
16
The Data Correlogram of the seasonal difference – looks like an AR(2)
17
The Data Unit Root Test – Further evidence of stationarity:
18
The Data First modeled using OLS, trying AR(1) and AR(2) terms first
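As a rough illustration of fitting an AR(2) by least squares, the sketch below regresses a series on a constant and its first two lags using NumPy. This is conditional least squares on simulated data with made-up coefficients (0.6 and 0.2); it is an assumption that it approximates what the presenters' software did, and the parameterization of the constant may differ.

```python
import numpy as np

def fit_ar2_ols(y):
    """Regress y[t] on a constant, y[t-1] and y[t-2]; return (coef, residuals)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 2), y[1:-1], y[:-2]])
    target = y[2:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return coef, resid

# Simulated stand-in for the seasonally differenced series
rng = np.random.default_rng(0)
n = 2000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + 0.2 * y[t - 2] + rng.normal()

coef, resid = fit_ar2_ols(y)
print("constant, AR(1), AR(2):", np.round(coef, 3))
```

The residuals returned here are the series that the later diagnostics (correlogram, histogram, ARCH test) would be applied to.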
19
The Data Correlogram – orthogonal
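Whether a correlogram is "orthogonal" is usually judged from the Q-statistics printed beside it, which are commonly Ljung-Box statistics. The sketch below is a plain-Python version under that assumption; the function names and the alternating test series are illustrative only.

```python
def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom

def ljung_box_q(x, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )

# An alternating series is strongly (negatively) autocorrelated at lag 1
alternating = [1.0, -1.0] * 50
print("rho_1 =", autocorr(alternating, 1))
print("Q(1)  =", ljung_box_q(alternating, 1))
```

A small Q (large p-value) is what "orthogonal" means here; a large Q means the residuals are still autocorrelated. When the series is a set of ARMA residuals, the chi-squared degrees of freedom are normally reduced by the number of estimated AR and MA terms.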
20
The Data Histogram of the residuals – highly kurtotic and negatively skewed.
21
The Data Serial correlation test – no serial correlation detected.
22
The Data Correlogram of squared residuals – shows some significance:
23
The Data Test for autoregressive conditional heteroskedasticity (ARCH) – positive for ARCH:
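One common form of this test is Engle's LM test: regress the squared residuals on their own lags and compare n·R² with a chi-squared critical value. The sketch below assumes that form and uses a simulated ARCH(1) series with made-up parameters, since the slide's actual test output is not in the transcript.

```python
import numpy as np

def arch_lm_stat(resid, lags=1):
    """Engle's ARCH LM statistic: n * R^2 from regressing e_t^2 on its lags."""
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[lags:]
    lagged = [e2[lags - k: len(e2) - k] for k in range(1, lags + 1)]
    X = np.column_stack([np.ones(len(y))] + lagged)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return len(y) * (1.0 - ss_res / ss_tot)

# Simulated ARCH(1) residuals: variance depends on the previous squared shock
rng = np.random.default_rng(1)
n = 2000
e = np.zeros(n)
for t in range(1, n):
    sigma2 = 0.2 + 0.5 * e[t - 1] ** 2
    e[t] = np.sqrt(sigma2) * rng.normal()

lm = arch_lm_stat(e, lags=1)
print("ARCH LM statistic:", round(lm, 1), "(5% critical value for 1 lag: 3.84)")
```

A statistic above the critical value is the "positive for ARCH" result reported on the slide.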
24
The Data Trace of the squared residuals – shows spikes, suggesting ARCH is present:
25
The Data ARCH GARCH model used:
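The transcript does not state the order of the ARCH GARCH model, so the sketch below assumes the common GARCH(1,1) form, h[t] = omega + alpha·e[t-1]² + beta·h[t-1], with made-up parameter values. It shows how the conditional variance and the standardized residuals are built once the parameters are known; it does not estimate them.

```python
import math

def garch11_variance(resid, omega, alpha, beta):
    """Conditional variance path for an assumed GARCH(1,1) model."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite variance")
    # Start from the unconditional variance omega / (1 - alpha - beta)
    h = [omega / (1.0 - alpha - beta)]
    for t in range(1, len(resid)):
        h.append(omega + alpha * resid[t - 1] ** 2 + beta * h[t - 1])
    return h

def standardize(resid, h):
    """Divide each residual by its conditional standard deviation."""
    return [e / math.sqrt(v) for e, v in zip(resid, h)]

# Hypothetical residuals and parameters, for illustration only
resid = [0.0, 2.0, 0.0]
h = garch11_variance(resid, omega=0.1, alpha=0.2, beta=0.7)
print(h)
print(standardize(resid, h))
```

The standardized residuals are the ones whose correlogram and histogram are checked after the model is fitted.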
26
The Data Correlogram of the residuals – now not orthogonal:
27
The Data Correlogram of squared residuals – now orthogonal:
28
The Data Histogram of ARCH GARCH residuals – far less kurtosis and skewness and closer to being normally distributed than before:
29
The Data Test for ARCH is no longer significant:
30
The Data Looking back at the OLS estimates and correlogram, there is a spike at lag 9 that could be significant, so we added an MA(9) term to see whether it would orthogonalize the correlogram in the ARCH GARCH model.
31
The Data
32
The Q-statistics still indicate orthogonal residuals (no significant autocorrelation):
33
The Data Still a positive test for ARCH:
34
The Data ARCH GARCH model estimated:
35
The Data Now the residuals are orthogonal at all visible lags:
36
The Data Squared residual correlogram is also significant:
37
The Data Histogram of the residuals – still single peaked, slightly skewed and kurtotic:
38
The Data No longer a positive test for ARCH:
39
Correlogram of Standardized Residuals
40
Correlogram of Resid Squared
41
GARCH Trace
42
GARCH Histogram
43
Ordinary residuals
44
Standardized residuals – lower kurtosis
45
Forecast with 1 Year of Data Saved – Good fit
46
95% Confidence Interval Included
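A 95% band of this kind is typically drawn as the point forecast plus or minus about 1.96 forecast standard errors, assuming approximately normal forecast errors. A minimal sketch with hypothetical forecast values and standard errors:

```python
Z_95 = 1.96  # two-sided 95% normal quantile

def confidence_band(forecast, std_err, z=Z_95):
    """Return (lower, upper) bounds: forecast -/+ z * standard error."""
    return [(f - z * s, f + z * s) for f, s in zip(forecast, std_err)]

# Hypothetical point forecasts with standard errors that grow with the horizon
forecast = [10.0, 11.0, 12.5]
std_err = [2.0, 3.0, 5.0]
for (lower, upper), f in zip(confidence_band(forecast, std_err), forecast):
    print(f"forecast={f:.1f}  95% band=({lower:.2f}, {upper:.2f})")
```

Because the band width is proportional to the standard error, it widens as the forecast horizon lengthens.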
47
Recolored Forecast With One Year Saved – Looks like a really good fit!
48
Few Months Ahead Forecast
49
Few months ahead forecast with 95% confidence interval included:
50
Recolored forecast with confidence interval included:
51
1 Year Ahead Forecast – Standard error becomes very large toward the end of the forecast horizon
52
Forecast With Actual Data:
53
Forecast and data with 95% confidence interval:
54
Recolored forecast – Looks the same as the previous year but is actually slightly different:
55
Google Search Data and Actual Flu Cases – Trace of Google search data and actual cases:
56
The Correlation Matrix and Granger Test
–Highly correlated to the actual flu cases
–Both significant at 5% level in Granger Causality Test
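A Granger causality test asks whether lags of one series improve the prediction of another beyond that series' own lags, usually through an F-test on the added lags. The sketch below is a bare-bones NumPy version on simulated data; the series, the lag length of 2, and the function names are all illustrative assumptions, not the presenters' setup.

```python
import numpy as np

def _rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

def granger_f(y, x, lags=2):
    """F-statistic for H0: lags of x do not help predict y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y) - lags
    target = y[lags:]
    y_lags = [y[lags - k: len(y) - k] for k in range(1, lags + 1)]
    x_lags = [x[lags - k: len(x) - k] for k in range(1, lags + 1)]
    X_restricted = np.column_stack([np.ones(n)] + y_lags)
    X_full = np.column_stack([np.ones(n)] + y_lags + x_lags)
    rss_r = _rss(X_restricted, target)
    rss_u = _rss(X_full, target)
    df_resid = n - X_full.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / df_resid)

# Simulated pair in which x leads y by one period
rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

print("F (x -> y):", round(granger_f(y, x), 1))
print("F (y -> x):", round(granger_f(x, y), 1))
```

Running the test in both directions, as on the slide, shows whether the predictive relationship is one-way or two-way.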
57
Vector Autoregression Model
–Lab-confirmed cases Granger-cause Google searches: significant at lags 1, 3, 4, 6, 7, 8, 9
–Google searches Granger-cause lab-confirmed cases: only significant at lag 2
58
The response graph
59
Conclusions
–Model fits very well
–The forecasting approach could be used for more than just the flu: it could apply to any medical ailment that is easily contracted
–Could be especially useful in the coming months, when H1N1 may mutate and return this fall