STATS 330: Lecture 12
Diagnostics 4

Aim of today's lecture: to discuss diagnostics for independence.
Independence

One of the regression assumptions is that the errors are independent. Data collected sequentially over time often have errors that are not independent. If the independence assumption does not hold, the standard errors will be wrong and the tests and confidence intervals will be unreliable. Thus, we need to be able to detect lack of independence.
Types of dependence

If large positive errors tend to follow large positive errors, and large negative errors tend to follow large negative errors, we say the data have positive autocorrelation.

If large positive errors tend to follow large negative errors, and large negative errors tend to follow large positive errors, we say the data have negative autocorrelation.
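The two types of dependence can be illustrated by simulation. This is a minimal sketch (not from the lecture, and `arima.sim` is introduced here only for illustration): generate error series with positive and negative lag-1 dependence and check the sign of the correlation between each error and the previous one.

```r
set.seed(330)

# AR(1) errors with positive and negative dependence on the previous error
pos.err <- arima.sim(model = list(ar =  0.7), n = 200)  # positive autocorrelation
neg.err <- arima.sim(model = list(ar = -0.7), n = 200)  # negative autocorrelation

# correlation between each value and the previous value
lag1.cor <- function(e) cor(e[-1], e[-length(e)])

lag1.cor(pos.err)  # positive
lag1.cor(neg.err)  # negative
```

With positive autocorrelation the sample lag-1 correlation comes out positive; with negative autocorrelation it comes out negative, matching the descriptions above.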
Diagnostics

If the errors are positively autocorrelated:
- Plotting the residuals against time will show long runs of positive and negative residuals.
- Plotting residuals against the previous residual (i.e. e_i vs e_{i-1}) will show a positive trend.
- A correlogram of the residuals will show positive spikes, gradually decaying.
Diagnostics (2)

If the errors are negatively autocorrelated:
- Plotting the residuals against time will show alternating positive and negative residuals.
- Plotting residuals against the previous residual (i.e. e_i vs e_{i-1}) will show a negative trend.
- A correlogram of the residuals will show alternating positive and negative spikes, gradually decaying.
Residuals against time

res <- residuals(lm.obj)
plot(1:length(res), res,
     xlab="time", ylab="residuals",
     type="b")          # type="b" draws both dots and connecting lines
abline(h=0, lty=2)      # dotted line at 0 (the mean residual)

The x vector can be omitted if it is just the sequence numbers: plot(res, type="b") gives the same picture.
Residuals against previous

res <- residuals(lm.obj)
n <- length(res)
plot.res <- res[-1]     # element 1 has no previous residual
prev.res <- res[-n]     # drop the last so the two vectors have equal length
plot(prev.res, plot.res,
     xlab="previous residual", ylab="residual")
Plots for different degrees of autocorrelation

[Figure: residual vs previous-residual plots for several degrees of autocorrelation]
Correlogram

acf(residuals(lm.obj))

The correlogram (autocorrelation function, acf) is a plot of the lag-k autocorrelation versus k. The lag-k autocorrelation is the correlation of residuals k time units apart.
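To make "correlation of residuals k time units apart" concrete, here is a sketch (not from the lecture) that computes the lag-k autocorrelation by hand, the way acf does: the same overall mean is subtracted throughout, and the denominator is the lag-0 sum of squares at every lag.

```r
set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 100)  # stand-in for a residual series

# Lag-k autocorrelation as acf() computes it
lag.k.acf <- function(x, k) {
  n  <- length(x)
  xc <- x - mean(x)                               # demean with the overall mean
  sum(xc[1:(n - k)] * xc[(k + 1):n]) / sum(xc^2)  # lag-k sum / lag-0 sum
}

r <- acf(x, plot = FALSE)
lag.k.acf(x, 1)          # manual lag-1 autocorrelation
as.numeric(r$acf[2])     # acf's value; r$acf[1] is lag 0, which is always 1
```

The manual value matches acf's, so the spikes in the correlogram really are these lagged correlations, plotted against k.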
Durbin-Watson test

We can also do a formal hypothesis test (the Durbin-Watson test) for independence. The test assumes the errors follow a model of the form

    e_i = ρ e_{i-1} + u_i,

where the u_i's are independent, normal, and have constant variance. ρ is the lag-1 correlation; this is the autoregressive model of order 1 (AR(1)). NB: we require |ρ| < 1.
Durbin-Watson test (2)

When ρ = 0, the errors are independent. The DW test tests independence by testing ρ = 0. ρ is estimated by

    ρ̂ = ( Σ_{i=2}^n e_i e_{i-1} ) / ( Σ_{i=1}^n e_i² )
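A minimal sketch (not from the lecture) of computing ρ̂ on a simulated residual series. Note that the later slides estimate ρ with cor(plot.res, prev.res) instead; for a long, roughly mean-zero residual series the two are very close.

```r
set.seed(2)
e <- arima.sim(model = list(ar = 0.5), n = 500)  # stand-in residuals, true rho = 0.5
n <- length(e)

# The estimator from the slide: lagged cross-product over the sum of squares
rho.hat <- sum(e[-1] * e[-n]) / sum(e^2)
rho.hat

# The sample correlation with the previous value gives nearly the same answer
cor(e[-1], e[-n])
```

Both estimates land near the true value 0.5 here, and near each other.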
Durbin-Watson test (3)

The DW test statistic is

    DW = ( Σ_{i=2}^n (e_i − e_{i-1})² ) / ( Σ_{i=1}^n e_i² ) ≈ 2(1 − ρ̂)

The value of DW is between 0 and 4:
- Values of DW around 2 are consistent with independence.
- Values close to 0 indicate positive serial correlation.
- Values close to 4 indicate negative serial correlation.
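The approximation DW ≈ 2(1 − ρ̂) can be checked numerically. A sketch (not from the lecture): compute the exact statistic and the approximation on the same simulated series; they differ only by the edge terms (e_1² + e_n²)/Σe_i², which is small for moderate n.

```r
set.seed(3)
e <- arima.sim(model = list(ar = 0.4), n = 300)  # stand-in residual series
n <- length(e)

DW <- sum(diff(e)^2) / sum(e^2)                  # exact Durbin-Watson statistic

rho.hat   <- sum(e[-1] * e[-n]) / sum(e^2)       # rho-hat from the previous slide
approx.DW <- 2 * (1 - rho.hat)                   # the approximation

c(DW = DW, approx = approx.DW)                   # the two agree closely
```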
Durbin-Watson test (4)

There exist critical values d_L and d_U, depending on the number of variables k in the regression and the sample size n (see the table on the next slide). Use the value of DW to decide on independence as follows:

    0 ≤ DW < d_L            positive autocorrelation
    d_L ≤ DW < d_U          inconclusive
    d_U ≤ DW ≤ 4 − d_U      independence
    4 − d_U < DW ≤ 4 − d_L  inconclusive
    4 − d_L < DW ≤ 4        negative autocorrelation
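The decision rule above can be written as a small function. This is my own sketch (the function name and the d_U value in the example are illustrative, not from the lecture); d_L = 1.34 is the value the lecture later uses for the advertising data.

```r
# Apply the DW decision rule, given tabulated critical values dL and dU
dw.decision <- function(DW, dL, dU) {
  if (DW < dL)           "positive autocorrelation"
  else if (DW < dU)      "inconclusive"
  else if (DW <= 4 - dU) "independence"
  else if (DW <= 4 - dL) "inconclusive"
  else                   "negative autocorrelation"
}

# dL = 1.34 as in the advertising example; dU = 1.58 is illustrative only
dw.decision(1.11, dL = 1.34, dU = 1.58)  # positive autocorrelation
dw.decision(2.00, dL = 1.34, dU = 1.58)  # independence
```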
Durbin-Watson table

[Table: critical values d_L and d_U for various sample sizes n and numbers of regressors k]
Example: the advertising data

Data on monthly sales and advertising spend for 35 months. The model is Sales ~ spend + prev.spend, where prev.spend is the spend in the previous month.
Advertising data

> ad.df
   spend prev.spend sales
1     16         15  20.5
2     18         16  21.0
3     27         18  15.5
4     21         27  15.3
5     49         21  23.5
6     21         49  24.5
7     22         21  21.3
8     28         22  23.5
9     36         28  28.0
10    40         36  24.0
11     3         40  15.5
12    21          3  17.3
...
(35 rows in all)
R code for residual vs previous plot

advertising.lm <- lm(sales ~ spend + prev.spend, data = ad.df)
res <- residuals(advertising.lm)
n <- length(res)
plot.res <- res[-1]
prev.res <- res[-n]
plot(prev.res, plot.res,
     xlab="previous residual", ylab="residual",
     main="Residual versus previous residual \n for the advertising data")
abline(coef(lm(plot.res ~ prev.res)), col="red", lwd=2)   # add the fitted trend line
Time series plot, correlogram - R code

par(mfrow=c(2,1))   # stack the two plots vertically
plot(res, type="b", xlab="Time Sequence", ylab="Residual",
     main="Time series plot of residuals for the advertising data")
abline(h=0, lty=2, lwd=2, col="blue")
acf(res, main="Correlogram of residuals for the advertising data")
[Figure: time series plot and correlogram of the residuals. Increasing trend?]
Calculating DW

> rhohat <- cor(plot.res, prev.res)
> rhohat
[1] 0.4450734
> DW <- 2*(1 - rhohat)
> DW
[1] 1.109853

For n = 35 and k = 2, d_L = 1.34. Since DW = 1.11 < d_L = 1.34, there is strong evidence of positive serial correlation.
Durbin-Watson table

The table has no row for n = 35, so interpolate between the neighbouring tabulated values of d_L: use (1.28 + 1.39)/2 ≈ 1.34.
Remedy (1)

If we detect serial correlation, we need to fit special time series models to the data. For full details see STATS 326/726. Assuming that the AR(1) model is OK, we can use the arima function in R to fit the regression.
Fitting a regression with AR(1) errors

> arima(ad.df$sales, order=c(1,0,0), xreg=cbind(spend, prev.spend))

Call:
arima(x = ad.df$sales, order = c(1, 0, 0), xreg = cbind(spend, prev.spend))

Coefficients:
         ar1  intercept   spend  prev.spend
      0.4966    16.9080  0.1218      0.1391
s.e.  0.1580     1.6716  0.0308      0.0316

sigma^2 estimated as 9.476:  log likelihood = -89.16,  aic = 188.32

(This assumes spend and prev.spend are visible in the workspace, e.g. after attach(ad.df); otherwise write ad.df$spend and ad.df$prev.spend in the xreg matrix.)
Comparisons

                            lm              arima
Const (std err)             15.60 (1.34)    16.90 (1.67)
Spend (std err)             0.142 (0.035)   0.128 (0.031)
Prev spend (std err)        0.166 (0.036)   0.139 (0.031)
1st-order correlation       0.442           0.497
Sigma                       3.652           3.078
Remedy (2)

Recall there was a trend in the time series plot of the residuals: the residuals seem related to time. Thus, time is a "lurking variable", a variable that should be in the regression but isn't. Try the model Sales ~ spend + prev.spend + time.
Fitting new model

time <- 1:35
new.advertising.lm <- lm(sales ~ spend + prev.spend + time, data = ad.df)
res <- residuals(new.advertising.lm)
n <- length(res)
plot.res <- res[-1]
prev.res <- res[-n]
DW <- 2*(1 - cor(plot.res, prev.res))
DW Retest

DW is now 1.73. For a model with 3 explanatory variables, d_U is about 1.66 (refer to the table), so there is no evidence of serial correlation. Time is a highly significant variable in the regression. The problem is fixed!