1 Re-expressing Data Chapter 6 – Normal Model –What if data do not follow a Normal model? Chapters 8 & 9 – Linear Model –What if a relationship between two variables is not linear?
2 Re-expressing Data Re-expression is another name for changing the scale of (transforming) the data. Usually we re-express the response variable, Y.
3 Goals of Re-expression Goal 1 – Make the distribution of the re-expressed data more symmetric. Goal 2 – Make the spread of the re-expressed data more similar across groups.
4 Goals of Re-expression Goal 3 – Make the form of a scatter plot more linear. Goal 4 – Make the scatter in the scatter plot more even across all values of the explanatory variable.
5 Ladder of Powers Power: 2 Re-expression: y² Comment: Use on left-skewed data.
6 Ladder of Powers Power: 1 Re-expression: y Comment: No re-expression. Do not re-express the data if they are already well behaved.
7 Ladder of Powers Power: ½ Re-expression: √y Comment: Use on count data or when the scatter in a scatter plot tends to increase as the explanatory variable increases.
8 Ladder of Powers Power: “0” Re-expression: log(y) Comments: Not really the “0” power. Use on right-skewed data. Measurements cannot be negative or zero.
9 Ladder of Powers Power: –½, –1 Re-expression: –1/√y, –1/y Comments: Use on right-skewed data. Measurements cannot be negative or zero. Use on ratios.
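As a rough illustration of the ladder, the sketch below applies one rung at a time to a list of measurements. The function name `reexpress` is hypothetical. Two choices are assumptions: the natural log stands in for the “0” power, and a leading minus sign is used on the negative powers so that larger values stay larger after re-expression.

```python
import math

def reexpress(values, power):
    """Re-express each value using one rung of the ladder of powers.

    power == 0 means the log rung (natural log assumed here).
    Negative powers are negated so the ordering of the data is preserved.
    """
    if power <= 0 and any(v <= 0 for v in values):
        # The log and reciprocal rungs need strictly positive measurements.
        raise ValueError("measurements must be positive for this re-expression")
    if power == 0:
        return [math.log(v) for v in values]
    if power < 0:
        return [-(v ** power) for v in values]
    return [v ** power for v in values]

data = [1.0, 4.0, 9.0, 16.0]
print(reexpress(data, 2))     # squares
print(reexpress(data, 0.5))   # square roots
print(reexpress(data, 0))     # natural logs
print(reexpress(data, -1))    # negative reciprocals
```

Moving down the ladder (2, 1, ½, “0”, –½, –1) pulls in a long right tail more and more strongly, which is why the lower rungs are suggested for right-skewed data.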
10 Goal 1 - Symmetry Data are obtained on the time between nerve pulses along a nerve fiber. Time is rounded to the nearest half unit where a unit is of a second. –30.5 represents
11 Time ( sec)
12 Time – Nerve Pulses Distribution is skewed right. Sample mean (12.305) is much larger than the sample median (7.5). Many potential outliers. Data not from a Normal model.
13 Sqrt(Time)
14 Log(Time)
15 Summary Time – Highly skewed to the right. Sqrt(Time) – Still skewed right. Log(Time) –Fairly symmetric and mounded in the middle. –Could have come from a Normal model.
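The same comparison can be sketched numerically. The nerve-pulse data are not reproduced here, so the example below uses synthetic right-skewed values (drawn from a lognormal model, which is purely an assumption) and a moment-based skewness coefficient as a stand-in for judging the histograms by eye. The names `skewness`, `skew_raw`, `skew_sqrt`, and `skew_log` are hypothetical.

```python
import math
import random
import statistics

def skewness(xs):
    """Moment-based skewness: near 0 for symmetric data, positive for right skew."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Synthetic right-skewed "times" (assumption: NOT the nerve-pulse data).
random.seed(1)
times = [random.lognormvariate(2.0, 1.0) for _ in range(2000)]

skew_raw = skewness(times)
skew_sqrt = skewness([math.sqrt(t) for t in times])
skew_log = skewness([math.log(t) for t in times])

print("mean vs median:", statistics.mean(times), statistics.median(times))
print("skewness raw / sqrt / log:", skew_raw, skew_sqrt, skew_log)
```

On data like these the raw values are strongly right skewed, the square roots less so, and the logs are close to symmetric, mirroring the Time, Sqrt(Time), Log(Time) progression.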
16 Goal 3 – Straighten Up What is the relationship between the temperature of coffee and the time since it was poured? –Y, temperature (°F) –X, time (minutes)
18 Cooling Coffee There is a general negative association – as time since the coffee was poured increases the temperature of the coffee decreases.
19 Linear Model
20 Linear Model Fit Summary –Predicted Temp = – 1.56*Time –On average, temperature decreases 1.56 °F per minute. –R² = 0.99; 99% of the variation in temperature is explained by the linear relationship with time.
21 Plot of Residuals
22 Curved Pattern There is a clear pattern in the plot of residuals versus time. –Under predict, over predict, under predict. The linear fit is very good, but we can do better.
24 Log(Temp) by Time Summary –Predicted Log(Temp) = 5.195 – 0.0114*Time –On average, log temperature decreases 0.0114 log(°F) per minute.
25 Plot of Residuals
26 Interpretation There is a random scatter of points around the zero line. The linear model relating Log(Temp) to Time is the best we can do.
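The contrast between the two residual plots can be reproduced with a small sketch. The coffee measurements are not listed in the slides, so the temperatures below are synthetic: noise-free values generated from the curve 180.3*e^(–0.0114*Time) at hypothetical times of 0 to 60 minutes. The helper names `fit_line` and `residuals` are also hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def residuals(xs, ys, a, b):
    """Observed minus predicted for the line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Synthetic, noise-free readings (assumption: not the actual coffee data).
times = list(range(0, 61, 5))
temps = [180.3 * math.exp(-0.0114 * t) for t in times]

# Straight line fitted to Temp: residuals curve (positive, negative, positive).
a_lin, b_lin = fit_line(times, temps)
lin_res = residuals(times, temps, a_lin, b_lin)

# Straight line fitted to Log(Temp): residuals are essentially zero.
log_temps = [math.log(temp) for temp in temps]
a_log, b_log = fit_line(times, log_temps)
log_res = residuals(times, log_temps, a_log, b_log)

print([round(r, 2) for r in lin_res])
print(max(abs(r) for r in log_res))
```

With real measurements the Log(Temp) residuals would show random scatter rather than being exactly zero; the point of the sketch is only that the systematic under, over, under pattern disappears after the re-expression.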
27 Original Scale? Predicted Log(Temp) = 5.195 – 0.0114*Time Predicted Temp = 180.3*e^(–0.0114*Time) –Predicted temp at time = 0: 180.3 °F –The predicted temp in one more minute is the predicted temp now multiplied by e^(–0.0114) ≈ 0.9887
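The back-transformation can be checked with a few lines of arithmetic. This is a minimal sketch using the two numbers that appear in the fitted curve above (180.3 and –0.0114) and assuming Log means the natural log; the names `predicted_temp` and `per_minute_factor` are hypothetical.

```python
import math

START_TEMP = 180.3      # predicted temperature at time 0, in degrees F
LOG_SLOPE = -0.0114     # change in Log(Temp) per minute (natural log assumed)

def predicted_temp(minutes):
    """Back-transform the Log(Temp) line to the original temperature scale."""
    return START_TEMP * math.exp(LOG_SLOPE * minutes)

# Each additional minute multiplies the predicted temperature by this factor.
per_minute_factor = math.exp(LOG_SLOPE)

print(predicted_temp(0))
print(round(per_minute_factor, 4))
print(predicted_temp(11) / predicted_temp(10))
```

A constant decrease on the log scale therefore corresponds to a constant percentage decrease on the original scale: a drop of roughly 1.1% of the current predicted temperature each minute.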
28 JMP Method 1 –Create a new column in JMP, Log(Temp): Cols – Formula – Transcendental – Log.
29 JMP Method 1 (continued) –Fit Y by X Y – Log(Temp) X – Time –Fit Linear
30 JMP Method 2 –Fit Y by X Y – Temp X – Time –Fit Special Transform Y – Log