STATS 330: Lecture 12
Diagnostics 4

Aim of today's lecture: to discuss diagnostics for independence.
Independence

One of the regression assumptions is that the errors are independent. Data collected sequentially over time often have errors that are not independent. If the independence assumption does not hold, the standard errors will be wrong and the tests and confidence intervals will be unreliable. Thus, we need to be able to detect lack of independence.
Types of dependence

If large positive errors tend to follow large positive errors, and large negative errors tend to follow large negative errors, we say the data have positive autocorrelation.

If large positive errors tend to follow large negative errors, and large negative errors tend to follow large positive errors, we say the data have negative autocorrelation.
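The two types of dependence can be illustrated by simulation. This is a minimal sketch (not from the lecture, and `arima.sim` is introduced here only for illustration): generate error series with positive and negative lag-1 dependence and check the sign of the correlation between each error and the previous one.

```r
set.seed(330)

# AR(1) errors with positive and negative dependence on the previous error
pos.err <- arima.sim(model = list(ar =  0.7), n = 200)  # positive autocorrelation
neg.err <- arima.sim(model = list(ar = -0.7), n = 200)  # negative autocorrelation

# correlation between each value and the previous value
lag1.cor <- function(e) cor(e[-1], e[-length(e)])

lag1.cor(pos.err)  # positive
lag1.cor(neg.err)  # negative
```

With positive autocorrelation the sample lag-1 correlation comes out positive; with negative autocorrelation it comes out negative, matching the descriptions above.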
Diagnostics

If the errors are positively autocorrelated:
- Plotting the residuals against time will show long runs of positive and negative residuals.
- Plotting residuals against the previous residual (i.e. e_i vs e_{i-1}) will show a positive trend.
- A correlogram of the residuals will show positive spikes, gradually decaying.
Diagnostics (2)

If the errors are negatively autocorrelated:
- Plotting the residuals against time will show alternating positive and negative residuals.
- Plotting residuals against the previous residual (i.e. e_i vs e_{i-1}) will show a negative trend.
- A correlogram of the residuals will show alternating positive and negative spikes, gradually decaying.
Residuals against time

res <- residuals(lm.obj)
plot(1:length(res), res,
     xlab="time", ylab="residuals",
     type="b")          # type="b" draws both dots and connecting lines
abline(h=0, lty=2)      # dotted line at 0 (the mean residual)

The x vector can be omitted if it is just the sequence numbers: plot(res, type="b") gives the same picture.
Residuals against previous

res <- residuals(lm.obj)
n <- length(res)
plot.res <- res[-1]     # element 1 has no previous residual
prev.res <- res[-n]     # drop the last so the two vectors have equal length
plot(prev.res, plot.res,
     xlab="previous residual", ylab="residual")
Plots for different degrees of autocorrelation

[Figure: residual vs previous-residual plots for several degrees of autocorrelation]
Correlogram

acf(residuals(lm.obj))

The correlogram (autocorrelation function, acf) is a plot of the lag-k autocorrelation versus k. The lag-k autocorrelation is the correlation of residuals k time units apart.
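To make "correlation of residuals k time units apart" concrete, here is a sketch (not from the lecture) that computes the lag-k autocorrelation by hand, the way acf does: the same overall mean is subtracted throughout, and the denominator is the lag-0 sum of squares at every lag.

```r
set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 100)  # stand-in for a residual series

# Lag-k autocorrelation as acf() computes it
lag.k.acf <- function(x, k) {
  n  <- length(x)
  xc <- x - mean(x)                               # demean with the overall mean
  sum(xc[1:(n - k)] * xc[(k + 1):n]) / sum(xc^2)  # lag-k sum / lag-0 sum
}

r <- acf(x, plot = FALSE)
lag.k.acf(x, 1)          # manual lag-1 autocorrelation
as.numeric(r$acf[2])     # acf's value; r$acf[1] is lag 0, which is always 1
```

The manual value matches acf's, so the spikes in the correlogram really are these lagged correlations, plotted against k.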
Durbin-Watson test

We can also do a formal hypothesis test (the Durbin-Watson test) for independence. The test assumes the errors follow a model of the form

    e_i = ρ e_{i-1} + u_i,

where the u_i's are independent, normal, and have constant variance. ρ is the lag-1 correlation; this is the autoregressive model of order 1 (AR(1)). NB: we require |ρ| < 1.
Durbin-Watson test (2)

When ρ = 0, the errors are independent. The DW test tests independence by testing ρ = 0. ρ is estimated by

    ρ̂ = ( Σ_{i=2}^n e_i e_{i-1} ) / ( Σ_{i=1}^n e_i² )
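A minimal sketch (not from the lecture) of computing ρ̂ on a simulated residual series. Note that the later slides estimate ρ with cor(plot.res, prev.res) instead; for a long, roughly mean-zero residual series the two are very close.

```r
set.seed(2)
e <- arima.sim(model = list(ar = 0.5), n = 500)  # stand-in residuals, true rho = 0.5
n <- length(e)

# The estimator from the slide: lagged cross-product over the sum of squares
rho.hat <- sum(e[-1] * e[-n]) / sum(e^2)
rho.hat

# The sample correlation with the previous value gives nearly the same answer
cor(e[-1], e[-n])
```

Both estimates land near the true value 0.5 here, and near each other.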
Durbin-Watson test (3)

The DW test statistic is

    DW = ( Σ_{i=2}^n (e_i − e_{i-1})² ) / ( Σ_{i=1}^n e_i² ) ≈ 2(1 − ρ̂)

The value of DW is between 0 and 4:
- Values of DW around 2 are consistent with independence.
- Values close to 0 indicate positive serial correlation.
- Values close to 4 indicate negative serial correlation.
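The approximation DW ≈ 2(1 − ρ̂) can be checked numerically. A sketch (not from the lecture): compute the exact statistic and the approximation on the same simulated series; they differ only by the edge terms (e_1² + e_n²)/Σe_i², which is small for moderate n.

```r
set.seed(3)
e <- arima.sim(model = list(ar = 0.4), n = 300)  # stand-in residual series
n <- length(e)

DW <- sum(diff(e)^2) / sum(e^2)                  # exact Durbin-Watson statistic

rho.hat   <- sum(e[-1] * e[-n]) / sum(e^2)       # rho-hat from the previous slide
approx.DW <- 2 * (1 - rho.hat)                   # the approximation

c(DW = DW, approx = approx.DW)                   # the two agree closely
```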
Durbin-Watson test (4)

There exist critical values d_L and d_U, depending on the number of variables k in the regression and the sample size n (see the table on the next slide). Use the value of DW to decide on independence as follows:

    0 ≤ DW < d_L            positive autocorrelation
    d_L ≤ DW < d_U          inconclusive
    d_U ≤ DW ≤ 4 − d_U      independence
    4 − d_U < DW ≤ 4 − d_L  inconclusive
    4 − d_L < DW ≤ 4        negative autocorrelation
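The decision rule above can be written as a small function. This is my own sketch (the function name and the d_U value in the example are illustrative, not from the lecture); d_L = 1.34 is the value the lecture later uses for the advertising data.

```r
# Apply the DW decision rule, given tabulated critical values dL and dU
dw.decision <- function(DW, dL, dU) {
  if (DW < dL)           "positive autocorrelation"
  else if (DW < dU)      "inconclusive"
  else if (DW <= 4 - dU) "independence"
  else if (DW <= 4 - dL) "inconclusive"
  else                   "negative autocorrelation"
}

# dL = 1.34 as in the advertising example; dU = 1.58 is illustrative only
dw.decision(1.11, dL = 1.34, dU = 1.58)  # positive autocorrelation
dw.decision(2.00, dL = 1.34, dU = 1.58)  # independence
```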
Durbin-Watson table

[Table: critical values d_L and d_U for various sample sizes n and numbers of regressors k]
Example: the advertising data

Data on monthly sales and advertising spend for 35 months. The model is Sales ~ spend + prev.spend, where prev.spend is the spend in the previous month.
Advertising data

> ad.df
   spend prev.spend sales
1     16         15  20.5
2     18         16  21.0
3     27         18  15.5
4     21         27  15.3
5     49         21  23.5
6     21         49  24.5
7     22         21  21.3
8     28         22  23.5
9     36         28  28.0
10    40         36  24.0
11     3         40  15.5
12    21          3  17.3
...
(35 rows in all)
R code for residual vs previous plot

advertising.lm <- lm(sales ~ spend + prev.spend, data = ad.df)
res <- residuals(advertising.lm)
n <- length(res)
plot.res <- res[-1]
prev.res <- res[-n]
plot(prev.res, plot.res,
     xlab="previous residual", ylab="residual",
     main="Residual versus previous residual \n for the advertising data")
abline(coef(lm(plot.res ~ prev.res)), col="red", lwd=2)   # add the fitted trend line
Time series plot, correlogram - R code

par(mfrow=c(2,1))   # stack the two plots vertically
plot(res, type="b", xlab="Time Sequence", ylab="Residual",
     main="Time series plot of residuals for the advertising data")
abline(h=0, lty=2, lwd=2, col="blue")
acf(res, main="Correlogram of residuals for the advertising data")
[Figure: time series plot and correlogram of the residuals. Increasing trend?]
Calculating DW

> rhohat <- cor(plot.res, prev.res)
> rhohat
[1] 0.4450734
> DW <- 2*(1 - rhohat)
> DW
[1] 1.109853

For n = 35 and k = 2, d_L = 1.34. Since DW = 1.11 < d_L = 1.34, there is strong evidence of positive serial correlation.
Durbin-Watson table

The table has no row for n = 35, so interpolate between the neighbouring tabulated values of d_L: use (1.28 + 1.39)/2 ≈ 1.34.
Remedy (1)

If we detect serial correlation, we need to fit special time series models to the data. For full details see STATS 326/726. Assuming that the AR(1) model is OK, we can use the arima function in R to fit the regression.
Fitting a regression with AR(1) errors

> arima(ad.df$sales, order=c(1,0,0), xreg=cbind(spend, prev.spend))

Call:
arima(x = ad.df$sales, order = c(1, 0, 0), xreg = cbind(spend, prev.spend))

Coefficients:
         ar1  intercept   spend  prev.spend
      0.4966    16.9080  0.1218      0.1391
s.e.  0.1580     1.6716  0.0308      0.0316

sigma^2 estimated as 9.476:  log likelihood = -89.16,  aic = 188.32

(This assumes spend and prev.spend are visible in the workspace, e.g. after attach(ad.df); otherwise write ad.df$spend and ad.df$prev.spend in the xreg matrix.)
Comparisons

                            lm              arima
Const (std err)             15.60 (1.34)    16.90 (1.67)
Spend (std err)             0.142 (0.035)   0.128 (0.031)
Prev spend (std err)        0.166 (0.036)   0.139 (0.031)
1st-order correlation       0.442           0.497
Sigma                       3.652           3.078
Remedy (2)

Recall there was a trend in the time series plot of the residuals: the residuals seem related to time. Thus, time is a "lurking variable", a variable that should be in the regression but isn't. Try the model Sales ~ spend + prev.spend + time.
Fitting new model

time <- 1:35
new.advertising.lm <- lm(sales ~ spend + prev.spend + time, data = ad.df)
res <- residuals(new.advertising.lm)
n <- length(res)
plot.res <- res[-1]
prev.res <- res[-n]
DW <- 2*(1 - cor(plot.res, prev.res))
DW Retest

DW is now 1.73. For a model with 3 explanatory variables, d_U is about 1.66 (refer to the table), so there is no evidence of serial correlation. Time is a highly significant variable in the regression. The problem is fixed!