Presentation is loading. Please wait.

Presentation is loading. Please wait.

Remedial measures … or “how to fix problems with the model” Transforming the data so that the simple linear regression model is okay for the transformed.

Similar presentations


Presentation on theme: "Remedial measures … or “how to fix problems with the model” Transforming the data so that the simple linear regression model is okay for the transformed."— Presentation transcript:

1 Remedial measures … or “how to fix problems with the model” Transforming the data so that the simple linear regression model is okay for the transformed data.

2 Options for fixing problems with the model Abandon the simple linear regression model and find a more appropriate (but typically more complex) model. Transform the data so that the simple linear regression model works for the transformed (new) data.

3 Abandoning the model If not linear: try a different function, like a quadratic (Ch. 7) or an exponential function (Ch. 13). If unequal error variances: use weighted least squares (Ch. 10). If error terms are not independent: try fitting a time series model (Ch. 12). If important predictor variables omitted: try fitting a multiple regression model (Ch. 6). If outlier: use robust estimation procedure (Ch. 10).

4 Choices for transforming the data Transform X values only. Transform Y values only. Transform both X and Y values simultaneously.

5 If the only thing wrong with your model is that linear doesn’t work… Try transforming only the X values. You wouldn’t want to transform the Y values here, because you might change the well-behaved error terms (normal, equal variances) into badly-behaved error terms (not normal, unequal variances).

6 Example 1: Memory retention time prop 1 0.84 5 0.71 15 0.61 30 0.56 60 0.54 120 0.47 240 0.45 480 0.38 720 0.36 1440 0.26 2880 0.20 5760 0.16 10080 0.08 Subjects asked to memorize a list of disconnected items. Asked to recall them at various times up to a week later Predictor time = time, in minutes, since initially memorized the list. Response prop = proportion of items recalled correctly.

7 Example 1: Fitted line plot

8 Example 1: Residual vs. fits plot

9 Example 1: Normal probability plot

10 Example 1: Transform the X data time prop log10_time 1 0.84 0.00000 5 0.71 0.69897 15 0.61 1.17609 30 0.56 1.47712 60 0.54 1.77815 120 0.47 2.07918 240 0.45 2.38021 480 0.38 2.68124 720 0.36 2.85733 1440 0.26 3.15836 2880 0.20 3.45939 5760 0.16 3.76042 10080 0.08 4.00346 Change (“transform”) the predictor time to log 10 (time).

11 Example 1: New fitted line plot

12 Example 1: Predicting new proportion Estimated regression function: Therefore, we predict the proportion of words recalled after 1000 days is:

13 Example 1: New residuals vs. fits plot

14 Example 1: Normal probability plot

15 Some possible transformations of X These are guidelines only and not complete. It usually takes some trial and error to find the best transformation.

16 Example 1: Time* = 1/Time

17 Example 1: Time* = exp(-Time)

18 If evidence of non-normality and unequal error variances … Since it is the shapes and spreads of the Y distributions that need to be changed, try transforming the Y values. Transformation on Y may also help “straighten out” a curved relationship. May also need to simultaneously transform the X values.

19 Example 2: Gestation time and birthweight for mammals Mammal Birthwgt Gestation Goat 2.75 155 Sheep 4.00 175 Deer 0.48 190 Porcupine1.50 210 Bear 0.37 213 Hippo 50.00 243 Horse 30.00 340 Camel 40.00 380 Zebra 40.00 390 Giraffe98.00 457 Elephant113.00670 Predictor Birthwgt = birthweight, in kg, of mammal. Response Gestation = number of days until birth

20 Example 2: Fitted line plot

21 Example 2: Residual vs. fits plot

22 Example 2: Normal probability plot

23 Example 2: Transform the Y data Mammal Birthwgt Gestation logGest Goat 2.75 155 2.19033 Sheep 4.00 175 2.24304 Deer 0.48 190 2.27875 Porcupine1.50 210 2.32222 Bear 0.37 213 2.32838 Hippo 50.00 243 2.38561 Horse 30.00 340 2.53148 Camel 40.00 380 2.57978 Zebra 40.00 390 2.59106 Giraffe 98.00 457 2.65992 Elephant 113.00 670 2.82607 Change (“transform”) the response Gestation to log 10 (Gestation).

24 Example 2: New fitted line plot

25 Example 2: Predicting new gestation Estimated regression function: Therefore, since: we predict the gestation length of another mammal at 50 kgs to be:

26 Example 2: New residual vs fits plot

27 Example 2: New normal probability plot

28 Some possible transformations of Y if not normal and unequal variances These are guidelines only. It usually takes trial and error to find the best transformation. And maybe a simultaneous transformation on X.

29 Example 3: Length and Weight of Alligators

30 Example 3: Residuals vs fits plot

31 Example 3: Normal probability plot

32 Example 3: Transform the data weight length loge_wt loge_len 130 94 4.86753 4.54329 51 74 3.93183 4.30407 640 147 6.46147 4.99043 28 58 3.33220 4.06044 80 86 4.38203 4.45435 110 94 4.70048 4.54329 33 63 3.49651 4.14313 90 86 4.49981 4.45435 36 69 3.58352 4.23411 38 72 3.63759 4.27667 366 128 5.90263 4.85203 84 85 4.43082 4.44265 80 82 4.38203 4.40672 102 90 4.62497 4.49981 … and so on … Transform predictor weight to log e (weight) Transform response length to log e (length)

33 Example 3: New fitted line plot

34 Example 3: New residual plot

35 Example 3: New normal probability plot

36 Transforming data in Minitab Calc >> Calculator … In box labeled “Store result in variable,”, tell Minitab in which column (variable) you want the transformed data stored. Type (input) the expression for the desired transformation in the box labeled Expression. Use the available functions. Select okay. The data will appear in the column of the worksheet that you specified.


Download ppt "Remedial measures … or “how to fix problems with the model” Transforming the data so that the simple linear regression model is okay for the transformed."

Similar presentations


Ads by Google