Remedial measures … or “how to fix problems with the model”: transforming the data so that the simple linear regression model is appropriate for the transformed data.
Options for fixing problems with the model: (1) Abandon the simple linear regression model and find a more appropriate (but typically more complex) model. (2) Transform the data so that the simple linear regression model works for the transformed (new) data.
Abandoning the model
If not linear: try a different function, such as a quadratic (Ch. 7) or an exponential function (Ch. 13).
If unequal error variances: use weighted least squares (Ch. 10).
If error terms are not independent: try fitting a time series model (Ch. 12).
If important predictor variables are omitted: try fitting a multiple regression model (Ch. 6).
If outliers: use a robust estimation procedure (Ch. 10).
Choices for transforming the data Transform X values only. Transform Y values only. Transform both X and Y values simultaneously.
If the only thing wrong with your model is that the relationship is not linear, try transforming only the X values. You wouldn’t want to transform the Y values here, because you might change the well-behaved error terms (normal, equal variances) into badly-behaved error terms (not normal, unequal variances).
Example 1: Memory retention. Subjects were asked to memorize a list of disconnected items, then asked to recall them at various times up to a week later. Predictor time = time, in minutes, since the list was initially memorized. Response prop = proportion of items recalled correctly.
Example 1: Fitted line plot
Example 1: Residual vs. fits plot
Example 1: Normal probability plot
Example 1: Transform the X data. Change (“transform”) the predictor time to log10(time), stored in a new column log10_time alongside time and prop.
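The X-only transformation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis from the slides: the `time` and `prop` values below are synthetic (generated to follow a logarithmic trend), and `fit_line` is a hypothetical helper built on NumPy's `polyfit`.

```python
import numpy as np

# Synthetic data for illustration only; NOT the values from the
# memory-retention study. Times are in minutes, up to one week (10080).
time = np.array([1, 5, 15, 30, 60, 120, 240, 480, 720,
                 1440, 2880, 5760, 10080], dtype=float)
rng = np.random.default_rng(0)
prop = 0.85 - 0.08 * np.log10(time) + rng.normal(0.0, 0.01, size=time.size)


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    b1, b0 = np.polyfit(x, y, deg=1)
    resid = y - (b0 + b1 * x)
    r2 = 1.0 - np.sum(resid**2) / np.sum((y - np.mean(y))**2)
    return b0, b1, r2


# Transform the predictor only; the response is left untouched.
log10_time = np.log10(time)

raw_fit = fit_line(time, prop)        # prop against time
log_fit = fit_line(log10_time, prop)  # prop against log10(time)

print("R^2 using time:       ", round(raw_fit[2], 3))
print("R^2 using log10(time):", round(log_fit[2], 3))
```

Because only X is transformed, the error terms keep whatever distribution they had; the residual and normal probability plots should still be rechecked after the new fit.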
Example 1: New fitted line plot
Example 1: Predicting new proportion Estimated regression function: Therefore, we predict the proportion of words recalled after 1000 minutes is:
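The prediction step amounts to substituting log10 of the new time into the fitted function. The sketch below uses made-up coefficients `b0` and `b1` (the estimates from the slide are not reproduced here), and `predict_prop` is a hypothetical helper name.

```python
import math

# Hypothetical coefficients, chosen only to illustrate the arithmetic.
b0, b1 = 0.85, -0.08


def predict_prop(time_value, b0, b1):
    """Predicted proportion from the model prop = b0 + b1 * log10(time)."""
    return b0 + b1 * math.log10(time_value)


# log10(1000) = 3, so the prediction is b0 + 3 * b1.
pred = predict_prop(1000, b0, b1)
print(round(pred, 2))  # 0.61 with these made-up coefficients
```

The key point is that the new X value must be transformed in the same way as the data before it is plugged into the fitted line.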
Example 1: New residuals vs. fits plot
Example 1: Normal probability plot
Some possible transformations of X These are guidelines only and not complete. It usually takes some trial and error to find the best transformation.
Example 1: Time* = 1/Time
Example 1: Time* = exp(-Time)
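The trial-and-error search over candidate transformations of X can be sketched as a loop that refits the line for each candidate. The data here are synthetic (not the study values), the candidate set is limited to the transformations named in these slides, and `r_squared` is a hypothetical helper.

```python
import numpy as np

# Synthetic data for illustration only (generated from a log10 trend).
time = np.array([1, 5, 15, 30, 60, 120, 240, 480, 720,
                 1440, 2880, 5760, 10080], dtype=float)
rng = np.random.default_rng(0)
prop = 0.85 - 0.08 * np.log10(time) + rng.normal(0.0, 0.01, size=time.size)


def r_squared(x, y):
    """R^2 of the least-squares line of y on x."""
    b1, b0 = np.polyfit(x, y, deg=1)
    resid = y - (b0 + b1 * x)
    return 1.0 - np.sum(resid**2) / np.sum((y - np.mean(y))**2)


# Candidate transformations of the predictor.
candidates = {
    "time": time,
    "log10(time)": np.log10(time),
    "1/time": 1.0 / time,
    "exp(-time)": np.exp(-time),
}

scores = {name: r_squared(x, prop) for name, x in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name:12s} R^2 = {score:.3f}")
print("highest R^2:", best)
```

R² is only a first screen. The residuals vs. fits plot and the normal probability plot for the chosen transformation still need to be inspected, as the slides do for each candidate.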
If evidence of non-normality and unequal error variances … Since it is the shapes and spreads of the Y distributions that need to be changed, try transforming the Y values. Transformation on Y may also help “straighten out” a curved relationship. May also need to simultaneously transform the X values.
Example 2: Gestation time and birthweight for mammals. Data columns: Mammal, Birthwgt, Gestation. Mammals: Goat, Sheep, Deer, Porcupine, Bear, Hippo, Horse, Camel, Zebra, Giraffe, Elephant. Predictor Birthwgt = birthweight, in kg, of mammal. Response Gestation = number of days until birth.
Example 2: Fitted line plot
Example 2: Residual vs. fits plot
Example 2: Normal probability plot
Example 2: Transform the Y data. Data columns: Mammal, Birthwgt, Gestation, logGest. Mammals: Goat, Sheep, Deer, Porcupine, Bear, Hippo, Horse, Camel, Zebra, Giraffe, Elephant. Change (“transform”) the response Gestation to log10(Gestation), stored as logGest.
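A Y-only transformation follows the same pattern, with the response transformed instead of the predictor. The sketch below uses synthetic birthweight and gestation values (not the mammal data from the slides), generated so that log10(gestation) is roughly linear in birthweight.

```python
import numpy as np

# Synthetic values for illustration only; NOT the mammal data.
birthwgt = np.array([2.0, 4.0, 10.0, 20.0, 35.0, 50.0, 70.0, 90.0, 110.0])  # kg
rng = np.random.default_rng(1)
gestation = 10 ** (2.2 + 0.004 * birthwgt
                   + rng.normal(0.0, 0.02, size=birthwgt.size))  # days

# Transform the response only.
log_gest = np.log10(gestation)

# Fit logGest = b0 + b1 * Birthwgt by least squares.
b1, b0 = np.polyfit(birthwgt, log_gest, deg=1)
resid = log_gest - (b0 + b1 * birthwgt)

print("intercept:", round(b0, 3), "slope:", round(b1, 5))
print("residual standard deviation (log10 scale):", round(resid.std(ddof=2), 4))
```

The residuals are on the log10 scale, so the residual and normal probability plots should be drawn for `resid` as computed here, not for errors on the original scale of days.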
Example 2: New fitted line plot
Example 2: Predicting new gestation Estimated regression function: Therefore, since: we predict the gestation length of another mammal with a birthweight of 50 kg to be:
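When the response has been log-transformed, the fitted line predicts log10(Gestation), so the prediction has to be back-transformed with a power of 10 to return to days. A minimal sketch with made-up coefficients (the slide's estimates are not reproduced here); `predict_gestation` is a hypothetical helper name.

```python
# Hypothetical coefficients for logGest = b0 + b1 * Birthwgt.
b0, b1 = 2.2, 0.004


def predict_gestation(birthwgt_kg, b0, b1):
    """Predict gestation in days by undoing the log10 transformation."""
    pred_log = b0 + b1 * birthwgt_kg  # prediction on the log10 scale
    return 10 ** pred_log             # back-transform to days


pred_days = predict_gestation(50, b0, b1)
print(round(pred_days, 1))  # 251.2 with these made-up coefficients
```

Forgetting the back-transformation is a common slip: the value 2.4 here is a log10 of days, not a number of days.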
Example 2: New residual vs fits plot
Example 2: New normal probability plot
Some possible transformations of Y if not normal and unequal variances. These are guidelines only. It usually takes trial and error to find the best transformation, and a simultaneous transformation of X may also be needed.
Example 3: Length and Weight of Alligators
Example 3: Residuals vs fits plot
Example 3: Normal probability plot
Example 3: Transform the data. Data columns: weight, length, loge_wt, loge_len (… and so on …). Transform predictor weight to ln(weight), stored as loge_wt. Transform response length to ln(length), stored as loge_len.
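Transforming both variables with the natural log turns a power-law relationship, length = a · weight^b, into a straight line, ln(length) = ln(a) + b · ln(weight). The sketch below uses synthetic values (not the alligator data), and the power-law form itself is an assumption made for illustration.

```python
import numpy as np

# Synthetic values for illustration only; NOT the alligator data.
weight = np.array([20.0, 35.0, 50.0, 80.0, 120.0,
                   180.0, 260.0, 350.0, 450.0, 600.0])
rng = np.random.default_rng(2)
length = 30.0 * weight**0.3 * np.exp(rng.normal(0.0, 0.02, size=weight.size))

# Transform both the predictor and the response.
loge_wt = np.log(weight)
loge_len = np.log(length)

# Fit loge_len = b0 + b1 * loge_wt by least squares.
b1, b0 = np.polyfit(loge_wt, loge_len, deg=1)

# On the original scale the fit corresponds to length = a_hat * weight**b1.
a_hat = np.exp(b0)
print("slope (power):", round(b1, 3), "multiplier:", round(a_hat, 2))
```

The slope on the log-log scale is the exponent of the power law, which is why this pair of transformations is a natural first try when both the curvature and the spread grow with X.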
Example 3: New fitted line plot
Example 3: New residual plot
Example 3: New normal probability plot
Transforming data in Minitab
Select Calc >> Calculator …
In the box labeled “Store result in variable,” tell Minitab in which column (variable) you want the transformed data stored.
Type the expression for the desired transformation in the box labeled “Expression,” using the available functions.
Select OK. The transformed data will appear in the worksheet column that you specified.