Published by Ralph Stokes. Modified over 9 years ago.
1
Occasionally we can see clear violations of the constant variance assumption by looking at a residual plot, which shows a characteristic “funnel” shape. This can often be fixed with a variance-stabilizing transformation:
–if the standard deviation of the response is proportional to the mean, then the logarithm transformation of the response often works: do a regression of log(y) against the explanatory variables
–if the variance of the response is proportional to the mean, then the square root transformation of the response often works: do a regression of sqrt(y) against the explanatory variables
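As a rough illustration of the log case above, here is a minimal sketch on simulated data. The data, the seed, and the helper name `funnel` are assumptions made for this example and do not come from the course; the correlation between absolute residuals and fitted values is only a crude numerical stand-in for looking at the residual plot.

```r
# Simulated data (an assumption for illustration): the standard deviation of y
# grows in proportion to its mean, which produces the "funnel" residual plot.
set.seed(1)
n  <- 500
x  <- runif(n, 1, 10)
mu <- exp(0.5 + 0.3 * x)
y  <- mu * exp(rnorm(n, sd = 0.3))   # multiplicative error: sd(y) proportional to mean

fit_raw <- lm(y ~ x)        # untransformed response
fit_log <- lm(log(y) ~ x)   # log-transformed response

# funnel() is a hypothetical helper: a positive correlation between
# |residuals| and fitted values suggests spread that grows with the mean.
funnel <- function(fit) cor(abs(residuals(fit)), fitted(fit))
funnel_raw <- funnel(fit_raw)
funnel_log <- funnel(fit_log)

# Residual plots side by side (drawn only in an interactive session).
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  plot(fitted(fit_raw), residuals(fit_raw), main = "y ~ x")
  abline(h = 0, lty = 2)
  plot(fitted(fit_log), residuals(fit_log), main = "log(y) ~ x")
  abline(h = 0, lty = 2)
  par(op)
}
```

For count-like data whose variance is proportional to the mean, the same comparison could be made with `lm(sqrt(y) ~ x)` in place of the log fit.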
2
In any case, always perform the transformation on the response, then refit the regression and check the residuals to make sure you’ve found the transformation that gives the best residual plots. Note that if you transform the response you will probably need to express the predictions back in the original scale: if you fit log(y), the prediction will be exp( ) of the value fitted on the log scale. The regression coefficients, though, will have to be interpreted on the transformed scale. For the log transform we have a nice interpretation:

$$\log \hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_p x_p \quad\Longrightarrow\quad \hat{y} = e^{\hat\beta_0}\, e^{\hat\beta_1 x_1} \cdots e^{\hat\beta_p x_p}$$
3
This implies that an increase of 1 in $x_1$, with the other variables held fixed, means that the original response is predicted to increase by a factor of $e^{\hat\beta_1}$; the coefficients can therefore be interpreted as multiplicative effects instead of additive ones.

Let’s consider the Box-Cox method of determining a transformation. It should be used with positive response variables, and it finds the transformation that gives the best fit. It uses the general formula

$$t_\lambda(y) = \begin{cases} \dfrac{y^\lambda - 1}{\lambda}, & \lambda \neq 0 \\ \log y, & \lambda = 0 \end{cases}$$

Using maximum likelihood we may find the “best” value of lambda, or more usefully a confidence interval for lambda. See the R code…
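Before turning to Box-Cox, the multiplicative reading of log-scale coefficients and the back-transformation of predictions can be checked numerically. This is a minimal sketch on simulated data; the variable names, coefficients, and seed are assumptions chosen for illustration only.

```r
# Simulated data (an assumption for illustration): log(y) is linear in x1 and x2.
set.seed(2)
n  <- 100
x1 <- runif(n, 0, 5)
x2 <- rnorm(n)
y  <- exp(1 + 0.4 * x1 - 0.2 * x2 + rnorm(n, sd = 0.25))

fit <- lm(log(y) ~ x1 + x2)

# Multiplicative effect on y of a one-unit increase in x1, x2 held fixed.
factor_x1 <- exp(coef(fit)[["x1"]])

# predict() returns values on the log scale; exponentiate to get back
# to the original scale of y.
new_lo  <- data.frame(x1 = 2, x2 = 0)
new_hi  <- data.frame(x1 = 3, x2 = 0)   # same x2, x1 one unit higher
pred_lo <- exp(predict(fit, new_lo))
pred_hi <- exp(predict(fit, new_hi))

# The ratio of the two back-transformed predictions equals exp(beta1-hat).
ratio <- unname(pred_hi / pred_lo)
```

Note that under normal errors on the log scale, the exponentiated prediction estimates the median of y rather than its mean.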
4
#read in the gasconsumption data
#bring in the MASS library and apply the
#boxcox function on the simple linear model
attach(gasconsumption)
g <- lm(MPG ~ WT); summary(g)
library(MASS)
boxcox(g, plotit = TRUE)   #plot log-likelihood against lambda - find the maximum
#notice that values between ~.25 and -1.5
#are in the 95% confidence interval of
#the maximum. Your authors chose -1
#and worked with GPM instead of MPG since
#GPM=1/MPG. If you want to find the exact
#lambda, try this…
l <- boxcox(g); l$x[which.max(l$y)]
#note this is harder to interpret than its rounded value -1…
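The interval read off the plot can also be computed from the values that boxcox() returns. The sketch below assumes the built-in mtcars data (mpg against wt) as a stand-in, since the gasconsumption data is not reproduced here, so its lambda values will not match the ones quoted above; `box_cox` is a hypothetical helper that just writes out the Box-Cox family by hand.

```r
library(MASS)   # provides boxcox()

# The Box-Cox family written out by hand; box_cox() is a hypothetical helper.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))   # the method needs a positive response
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Stand-in data (an assumption): mtcars ships with R and is NOT the
# gasconsumption data used in the slides.
g  <- lm(mpg ~ wt, data = mtcars)
bc <- boxcox(g, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Maximum-likelihood estimate of lambda on the grid.
lambda_hat <- bc$x[which.max(bc$y)]

# 95% interval: all lambda whose profile log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum (the cutoff line drawn by boxcox()).
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci     <- range(bc$x[bc$y >= cutoff])

# Refit on the transformed scale; the residuals of g_bc should then be re-checked.
g_bc <- lm(box_cox(mpg, lambda_hat) ~ wt, data = mtcars)
```

In practice one would usually pick a convenient value inside `ci` (such as -1, 0, or 0.5) rather than `lambda_hat` itself, for the interpretability reason given in the comments above.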
5
Now for practice, load the faraway library and get the dataset called prostate. Look at the help file for the dataset, go through the various diagnostics that we’ve considered in this chapter, and find the best model for predicting log(psa):
–check the normality assumption on the errors: are any transformations required?
–find large leverage points & look for outliers
–see if there are influential points
–is the constant variance assumption met?
HW: Work on #6.1, 6.14, 6.15, 6.18, 6.20, 6.21, 6.23
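One possible starting point for the practice exercise is sketched below. `diagnose` is a hypothetical helper, not part of any package; the cutoffs (twice the average leverage, a Bonferroni critical value for studentized residuals) are common rules of thumb rather than requirements, and the sketch assumes the response column in prostate is named lpsa. It is a first pass at the checklist, not a worked solution.

```r
# diagnose() is a hypothetical helper that collects the checks listed above.
diagnose <- function(fit) {
  n  <- length(residuals(fit))
  p  <- length(coef(fit))      # number of coefficients, intercept included
  h  <- hatvalues(fit)         # leverages
  rs <- rstudent(fit)          # studentized (jackknife) residuals
  cd <- cooks.distance(fit)    # influence measure
  list(
    # normality of the errors (also look at a Q-Q plot)
    shapiro_p        = shapiro.test(residuals(fit))$p.value,
    # leverage: rule of thumb flags points above twice the average, 2p/n
    high_leverage    = names(h)[h > 2 * p / n],
    # outliers: compare the largest |studentized residual| to a Bonferroni cutoff
    max_abs_rstudent = max(abs(rs)),
    bonf_cutoff      = qt(1 - 0.05 / (2 * n), df = n - p - 1),
    # influence: the case with the largest Cook's distance
    most_influential = names(which.max(cd)),
    # constant variance: slope of |residuals| on fitted values
    funnel_slope     = coef(summary(lm(abs(residuals(fit)) ~ fitted(fit))))[2, ]
  )
}

# Apply it to the prostate data if the faraway package is installed.
# Assumption: the response column is named lpsa.
if (requireNamespace("faraway", quietly = TRUE)) {
  data(prostate, package = "faraway")
  fit <- lm(lpsa ~ ., data = prostate)
  str(diagnose(fit))
  if (interactive()) {
    qqnorm(residuals(fit)); qqline(residuals(fit))
    plot(fitted(fit), residuals(fit)); abline(h = 0, lty = 2)
  }
}
```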