Download presentation
Presentation is loading. Please wait.
Published byPaula Mathews Modified over 9 years ago
1
5/13/2015330 lecture 91 STATS 330: Lecture 9
2
5/13/2015330 lecture 92 Diagnostics Aim of today’s lecture: To give you an overview of the modelling cycle, and begin the detailed discussion of diagnostic procedures
3
5/13/2015330 lecture 93 The modelling cycle We have seen that the regression model describes rather specialized forms of data Data are “planar” Scatter is uniform over the plane We have looked at some plots that help us decide if the data is suitable for regression modelling pairs reg3d coplots
4
5/13/2015330 lecture 94 Residual analysis Another approach is to fit the model and examine the residuals If the model is appropriate the residuals have no pattern A pattern in the residuals usually indicates that the model is not appropriate If this is the case we have two options Option 1: Select another form of model eg non-linear regression – see other courses Option 2: transform the data so that the regression model fits the transformed data – see next slide
5
5/13/2015330 lecture 95 Modelling cycle This leads us into a modelling cycle Fit Examine residuals Transform data or change model if necessary This cycle is repeated until we are happy with the fitted model Diagramatically….
6
5/13/2015330 lecture 96 Modelling cycle Examine residuals Fit model Transform/ change Choose Model Bad fit Good fit Use model Plots, theory
7
5/13/2015330 lecture 97 What makes a bad fit? Non-planar data Outliers in the data Errors depend on covariates (non-constant scatter) Errors not independent Errors not normally distributed First two points can be serious as they affect the meaning and accuracy of the estimated coefficients. The others affect mainly standard errors, not estimated coefficients – see subsequent lectures
8
5/13/2015330 lecture 98 Detecting non-planar data For the next 2 lectures, we look at diagnosing non-planar data and choosing a transformation. We can diagnose non-planar data (nonlinearity) by fitting the model, and plotting residuals versus fitted values, residuals against explanatory variables fitting additive models In each case, a curved plot indicates non-planar data
9
5/13/2015330 lecture 99 Plotting residuals versus fitted values plot(cherry.lm, which=1) which = 1 selects a plot of residuals versus fitted values
10
5/13/2015330 lecture 910 Indication of a curve: so data non-planar
11
5/13/2015330 lecture 911 Additive models These are models of the form Y=g 1 (x 1 ) + g 2 (x 2 ) + … + g k (x k ) + where g 1,…,g k are transformations Fitted using the gam function in R The transformations are estimated by the software Use the function to suggest good transformations
12
5/13/2015330 lecture 912 Example: cherry trees > library(mgcv) This is mgcv 0.8-8 > cherry.gam<- gam(volume~s(height) + s(diameter),data=cherry.df) > plot(cherry.gam) Don’t forget the s if you want to estimate the transformation
13
5/13/2015330 lecture 913 Example Strong suggestion that diameter be transformed
14
5/13/2015330 lecture 914 Fitting polynomials To fit model y = b 0 +b 1 x + b 2 x 2, use y ~ poly(x,2) To fit model y = b 0 +b 1 x + b 2 x 2 + b 3 x 3, use y ~ poly(x,3) and so on
15
5/13/2015330 lecture 915 Orthogonal polynomials The model fitted by y~poly(x,2) is of the form Y = 0 + 1 p 1 (x) + 2 p 2 (x) where p 1 is a polynomial of degree 1 ie of form ax + b and p 2 is a polynomial of degree 2 ie of form ax 2 + bx + c p 1, p 2 chosen to be uncorrelated (best possible estimation)
16
5/13/2015330 lecture 916 Adding a quadratic term (cherry trees) Suggestion of quadratic behaviour in diameter, so fit a quadratic model in diameter (add a term diameter^2) cherry2.lm=lm(volume~poly(diameter,2)+height,data=cherry.df) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.56553 6.72218 0.233 0.817603 poly(diameter, 2)1 80.25223 3.07346 26.111 < 2e-16 *** poly(diameter, 2)2 15.39923 2.63157 5.852 3.13e-06 *** height 0.37639 0.08823 4.266 0.000218 *** --- Residual standard error: 2.625 on 27 degrees of freedom Multiple R-Squared: 0.9771, Adjusted R-squared: 0.9745 How to add a quadratic term R2 improved from 94.8% to 97.7%
17
5/13/2015330 lecture 917 Equation of quadratic To see which quadratic is fitted, use > cherry.quad = lm(volume ~ diameter + I(diameter^2) + height, data=cherry.df) > summary(cherry.quad) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -9.92041 10.07911 -0.984 0.333729 diameter -2.88508 1.30985 -2.203 0.036343 * I(diameter^2) 0.26862 0.04590 5.852 3.13e-06 *** height 0.37639 0.08823 4.266 0.000218 *** Residual standard error: 2.625 on 27 degrees of freedom Multiple R-Squared: 0.9771, Adjusted R-squared: 0.9745 Surface fitted fitted is -9.92041 - 2.88508 x diameter + 0.26862 x diameter 2 + 0.37639 x height Same model!
18
Splines An alternative to polynomials are splines – these are piecewise cubics, which join smoothly at “knots” Give a more flexible fit to the data Values at one point not affected by values at distant points, unlike polynomials 5/13/2015330 lecture 918
19
Example with 4 knots 5/13/2015330 lecture 919
20
5/13/2015330 lecture 920 Adding a spline term (cherry trees) library(splines) cherry3.lm=lm(volume~bs(diameter,6)+height,data=cherry.df) Coefficients: (Intercept) -16.3679 7.4856 -2.187 0.03921 * bs(diameter, df = 6)1 0.1941 7.9374 0.024 0.98070 bs(diameter, df = 6)2 5.5744 3.1704 1.758 0.09201. bs(diameter, df = 6)3 10.7976 3.9798 2.713 0.01240 * bs(diameter, df = 6)4 31.4053 5.5545 5.654 9.35e-06 *** bs(diameter, df = 6)5 42.2665 6.1297 6.895 4.97e-07 *** bs(diameter, df = 6)6 58.6454 4.2781 13.708 1.49e-12 *** height 0.3970 0.1050 3.780 0.00097 *** Residual standard error: 2.8 on 23 degrees of freedom Multiple R-squared: 0.9778, Adjusted R-squared: 0.971 How to add a spline term (3 knots) R2 improved from 94.8% to 97.78%
21
5/13/2015330 lecture 921
22
5/13/2015330 lecture 922 Example: Tyre abrasion data Data collected in an experiment to study the abrasion resistance of tyres Variables are Hardness: Hardness of rubber Tensile: Tensile strength of rubber Abrasion Loss: Amount of rubber worn away in a standard test (response)
23
5/13/2015330 lecture 923 Tyre abrasion data > rubber.df hardness tensile abloss 1 45 162 372 2 61 232 175 3 71 231 136 4 81 224 55 5 53 203 221 6 64 210 164 7 79 196 82 8 56 200 228 9 75 188 128 10 88 119 64 11 71 151 219 … 30 observations in all
24
5/13/2015330 lecture 924 Tyre abrasion data > rubber.lm<-lm(abloss~hardness + tensile, data=rubber.df) > summary(rubber.lm) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 885.1611 61.7516 14.334 3.84e-14 *** hardness -6.5708 0.5832 -11.267 1.03e-11 *** tensile -1.3743 0.1943 -7.073 1.32e-07 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 36.49 on 27 degrees of freedom Multiple R-Squared: 0.8402, Adjusted R-squared: 0.8284 F-statistic: 71 on 2 and 27 DF, p-value: 1.767e-11
25
5/13/2015330 lecture 925 Tyre abrasion data We will use this example to illustrate all the methods we have discussed so far to check if the data are planar, scattered about a flat regression plane i.e. Pairs Spinning plot Coplot Residual versus fitted value plot Fitting GAMS
26
5/13/2015330 lecture 926 Tyre abrasion data (pairs)
27
Spinning 5/13/2015330 lecture 927
28
5/13/2015330 lecture 928 Tyre abrasion data (coplot)
29
5/13/2015330 lecture 929 Tyre abrasion data: residuals v fitted values data(rubber.df) rubber.lm = lm(abloss~., data=rubber.df) plot(rubber.lm, which=1)
30
5/13/2015330 lecture 930 Tyre abrasion data: gams library(mgcv) rubber.gam<- gam(abloss~s(tensile) + s(hardness),data=rubber.df) plot(rubber.gam)
31
5/13/2015330 lecture 931 Tyre abrasion data: Conclusions Pairs plot : not very informative Spinner: Hint of a “kink” Coplot: suggestion of non-linearity, a “kink” Residuals versus fitted values: weak suggestion that regression is not planar gam plots: hardness is OK, but strong suggestion that tensile needs transforming Suggested transformation: looks a bit like a cubic or 4 th degree polynomial, so try these (or a spline). Using a spline, the R 2 is increased from 84% to 94.3%.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.