STATS 330: Lecture 16 Case Study 7/17/2018

Case study
Aim of today’s lecture: to illustrate the modelling process using the evaporation data.

The Evaporation data
Data are in the data frame evap.df.
Aims of the analysis:
- Understand the relationships between the explanatory variables and the response
- Be able to predict evaporation loss given the other variables

Case Study: Evaporation data
Recall from Lecture 15: the variables are
- evap: the amount of moisture evaporating from the soil in the 24 hour period (response)
- maxst: maximum soil temperature over the 24 hour period
- minst: minimum soil temperature over the 24 hour period
- avst: average soil temperature over the 24 hour period
- maxat: maximum air temperature over the 24 hour period
- minat: minimum air temperature over the 24 hour period
- avat: average air temperature over the 24 hour period
- maxh: maximum humidity over the 24 hour period
- minh: minimum humidity over the 24 hour period
- avh: average humidity over the 24 hour period
- wind: average wind speed over the 24 hour period

Modelling cycle
[Flow diagram: Choose model (using plots, theory) -> Fit model -> Examine residuals. Bad fit: Transform and go round again. Good fit: Use model.]

Modelling cycle (2)
Our plan of attack:
- Graphical check
    - Suitability for regression
    - Gross outliers
- Preliminary fit
- Model selection (for prediction)
- Transforming if required
- Outlier check
- Use model for prediction

Step 1: Plots
Preliminary plots:
- Want to get an initial idea of the suitability of the data for regression modelling
- Check for linear relationships, outliers
- Use pairs plots, coplots
- Data look OK to proceed, but the evap/maxh plot looks curved
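As a rough sketch of this kind of preliminary check, the following draws a scatterplot matrix and prints the matching correlations. The lecture's evap.df is not reproduced here, so the data frame toy.df below is a made-up stand-in (46 simulated rows borrowing a few of the column names); every number it produces is illustrative only.

```r
# Synthetic stand-in for evap.df (NOT the lecture data); names reused for illustration.
set.seed(330)
n <- 46
avst  <- rnorm(n, mean = 80, sd = 5)
maxst <- avst + runif(n, 3, 10)      # max soil temp tracks the average
maxh  <- runif(n, 60, 100)
wind  <- runif(n, 50, 400)
evap  <- 0.8 * maxst - 0.5 * maxh + rnorm(n, sd = 5)
toy.df <- data.frame(evap, avst, maxst, maxh, wind)

pairs(toy.df)                        # scatterplot matrix: look for curvature, outliers
cor.mat <- round(cor(toy.df), 2)     # numerical companion to the pairs plot
print(cor.mat)
```

With the real data one would call pairs(evap.df) directly; the correlation matrix is just a convenient way to confirm what the plot suggests about related predictors.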

Points to note
- avh takes only a few distinct values
- Strong relationships between the response and some variables (particularly maxh, avst)
- Not much relationship between the response and minst, minat, wind
- Strong relationships between the min, av and max versions of each measurement
- No obvious outliers

Step 2: preliminary fit
Coefficients (numeric columns were not preserved in this transcript; terms and significance codes as shown):
             Estimate Std. Error t value Pr(>|t|)
(Intercept)
avst                                              *
minst
maxst                                             *
avat
minat
maxat
avh
minh
maxh                                              **
wind
Residual standard error: on 35 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 10 and 35 DF, p-value: 2.073e-11
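The output above has the shape of a full main-effects fit: 10 explanatory variables and 35 residual degrees of freedom, which implies 46 observations. A minimal sketch of such a fit follows, assuming a formula of the form evap ~ . (the slide does not show the call). The data are simulated stand-ins, not the evaporation data, so only the structure of the output is meaningful.

```r
# Synthetic stand-in for evap.df (NOT the lecture data), with the same 10 predictor names.
set.seed(330)
n <- 46
st <- rnorm(n, 80, 5); at <- rnorm(n, 75, 6); h <- runif(n, 40, 80)
toy.df <- data.frame(
  avst = st, minst = st - runif(n, 3, 10), maxst = st + runif(n, 3, 10),
  avat = at, minat = at - runif(n, 3, 10), maxat = at + runif(n, 3, 10),
  avh  = h,  minh  = h  - runif(n, 5, 20), maxh  = h  + runif(n, 5, 20),
  wind = runif(n, 50, 400)
)
toy.df$evap <- with(toy.df, 0.5 * maxat - 0.4 * maxh + 0.02 * wind + rnorm(n, sd = 5))

full.lm <- lm(evap ~ ., data = toy.df)   # response on all other columns
print(summary(full.lm))
full.lm$df.residual                      # 46 observations - 11 coefficients
```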

Findings
- Plots OK, normality dubious
- Gam plots indicated no transformations
- Point 31 has quite high Cook's distance, but removing it doesn’t change the regression much
- Model is OK. Could interpret coefficients, but the variables are highly correlated.

Step 3: Model selection
- Use APR (all possible regressions)
- Model selected was
    evap ~ maxat + maxh + wind
- However, this model does not fit all that well (outliers, non-normality)
- Try the “best AIC” model
    evap ~ avst + maxst + maxat + minh + maxh
- Now proceed to step 4
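For readers without the course software, an AIC-guided search can be approximated with base R's step(). Note this is a swapped-in technique: step() is a stepwise search, not the all-possible-regressions search used in the lecture, and it may land on a different model. The data below are simulated stand-ins, so the selected formula says nothing about the evaporation data.

```r
# Synthetic stand-in for evap.df (NOT the lecture data).
set.seed(330)
n <- 46
toy.df <- data.frame(
  avst  = rnorm(n, 80, 5),
  maxat = rnorm(n, 85, 6),
  minh  = runif(n, 30, 60),
  maxh  = runif(n, 60, 100),
  wind  = runif(n, 50, 400)
)
toy.df$maxst <- toy.df$avst + runif(n, 3, 10)
toy.df$evap  <- with(toy.df, 0.5 * maxat - 0.4 * maxh + 0.02 * wind + rnorm(n, sd = 5))

full.lm <- lm(evap ~ ., data = toy.df)
sel.lm  <- step(full.lm, direction = "both", trace = 0)  # stepwise search guided by AIC
print(formula(sel.lm))
print(AIC(full.lm, sel.lm))                              # lower AIC is preferred
```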

Step 4: Diagnostic checks
For a quick check, plot the regression object produced by lm:
    model1.lm <- lm(evap ~ avst + maxst + maxat + minh + maxh, data=evap.df)
    plot(model1.lm)

[Diagnostic plots for model1.lm, annotated: Outliers? Non-normal?]

Conclusions?
- No real evidence of non-linearity, but will check further with gams
- Normal plot looks curved
- Some largish outliers
- Points 2, 41 have largish Cook's D

Checking linearity
Check for linearity with gams:
> library(mgcv)
> plot(gam(evap ~ s(avst) + s(maxst) + s(maxat) + s(maxh) + s(wind), data=evap.df))

[Gam plots, annotated: Transform avst, maxh?]

Remedy
- Gam plots for avst and maxh are curved
- Try cubics in these variables
- Plots look better
- Cubic terms are significant

> model2.lm <- lm(evap ~ poly(avst,3) + maxst + maxat + minh + poly(maxh,3), data=evap.df)
> summary(model2.lm)
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)                                         **
poly(avst, 3)1                                      **
poly(avst, 3)2                                      *
poly(avst, 3)3
maxst                                      e-05     ***
maxat                                               **
minh
poly(maxh, 3)1                             e-05     ***
poly(maxh, 3)2
poly(maxh, 3)3                                      *
---
Residual standard error: on 36 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 9 and 36 DF, p-value: 4.459e-15
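Besides reading the individual t-tests, one way to judge whether the cubic terms earn their place is a single F-test comparing the straight-line model with the cubic one. The sketch below does this with anova() on simulated stand-in data in which a curved maxh effect has been built in deliberately; it illustrates the mechanics only and is not a result about the evaporation data.

```r
# Synthetic stand-in for evap.df (NOT the lecture data), with curvature in maxh by design.
set.seed(330)
n <- 46
avst <- rnorm(n, 80, 5)
maxh <- runif(n, 60, 100)
evap <- 20 + 0.5 * avst + 0.05 * (maxh - 80)^2 + rnorm(n, sd = 2)
toy.df <- data.frame(evap, avst, maxh)

linear.lm <- lm(evap ~ avst + maxh, data = toy.df)
cubic.lm  <- lm(evap ~ poly(avst, 3) + poly(maxh, 3), data = toy.df)

# The models are nested, so anova() gives an F-test for the 4 extra
# (quadratic and cubic) coefficients taken together.
cmp <- anova(linear.lm, cubic.lm)
print(cmp)
```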

New model
Let's now adopt the model
    lm(evap ~ poly(avst,3) + maxst + maxat + poly(maxh,3) + wind, data=evap.df)
Outliers are not too bad, but let's check:
> influenceplots(model2.lm)

Deletion of points
- Points 2, 6, 7, 41 are affecting the fitted values and some coefficients.
- Removing these one at a time and refitting indicates that the cubics are not very robust, so we revert to the non-polynomial model.
- The coefficients of the non-polynomial model are fairly stable when we delete these points one at a time, so we decide to retain these points.
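The delete-one-point-and-refit check described above can be sketched in base R as follows. The data are a simulated stand-in, so the row numbers 2, 6, 7 and 41 are simply borrowed from the slide and index the toy data rather than the lecture's influential points; the two-predictor model is likewise only an illustration.

```r
# Synthetic stand-in for evap.df (NOT the lecture data).
set.seed(330)
n <- 46
toy.df <- data.frame(avst = rnorm(n, 80, 5), maxh = runif(n, 60, 100))
toy.df$evap <- with(toy.df, 20 + 0.5 * avst - 0.4 * maxh + rnorm(n, sd = 5))

fit <- lm(evap ~ avst + maxh, data = toy.df)
suspects <- c(2, 6, 7, 41)   # row numbers quoted on the slide; arbitrary for toy data

# Refit with each suspect row removed, one at a time, and collect the coefficients.
loo.coefs <- t(sapply(suspects, function(i) {
  coef(lm(formula(fit), data = toy.df[-i, ]))
}))
rownames(loo.coefs) <- paste("drop", suspects)

print(rbind(all = coef(fit), loo.coefs))   # compare against the full-data fit
```

Coefficients that barely move from the "all" row suggest the fit is stable with respect to that point; dfbeta(fit) reports the same changes for every row without refitting.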

Normality?
- However, the normal plot for the non-polynomial model is not very straight; the WB (Weisberg-Bingham) test has p-value 0.
- Normality of the polynomial model is better
- Try predictions with both
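A residual normality check of this kind can be sketched with base R alone. The example below swaps in the Shapiro-Wilk test (shapiro.test) as a stand-in for the WB test named on the slide; the two are different tests and need not give the same p-value. The data are simulated, so the result has no bearing on the evaporation models.

```r
# Synthetic stand-in for evap.df (NOT the lecture data).
set.seed(330)
n <- 46
x <- runif(n, 60, 100)
toy.df <- data.frame(maxh = x, evap = 50 - 0.4 * x + rnorm(n, sd = 5))

fit <- lm(evap ~ maxh, data = toy.df)
res <- residuals(fit)

qqnorm(res); qqline(res)     # normal plot: points should follow the line
sw <- shapiro.test(res)      # Shapiro-Wilk, used here in place of the WB test
print(sw)
```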

predict.df = data.frame(avst = mean(evap.df$avst),
                        maxst = mean(evap.df$maxst),
                        maxat = mean(evap.df$maxat),
                        maxh = mean(evap.df$maxh),
                        minh = mean(evap.df$minh))
rbind(predict(model1.lm, predict.df, interval="p"),
      predict(model2.lm, predict.df, interval="p"))
  fit lwr upr
CV fit:
