Published by Martin Fowler. Modified over 9 years ago.
1
Lack of Fit (LOF) Test A formal F test for checking whether a specific type of regression function adequately fits the data
2
Example 1 Do the data suggest that a linear function is adequate in describing the relationship between skin cancer mortality and latitude?
3
Example 2 Do the data suggest that a linear function is adequate in describing the relationship between the length and weight of an alligator?
4
Example 3 Do the data suggest that a linear function is adequate in describing the relationship between iron content and weight loss due to corrosion?
5
Lack of fit test for a linear function: the basic idea. Use the general linear test approach. The full model is the most general model, with no restrictions on the means μ_j at each X_j level. The reduced model assumes that the μ_j are a linear function of the X_j, i.e., μ_j = β_0 + β_1 X_j. Determine SSE(F), SSE(R), and the F statistic. If the P-value is small, reject the reduced model (H_0: no lack of fit, i.e., linear) in favor of the full model (H_A: lack of fit, i.e., not linear).
6
Assumptions and requirements The Y observations for a given X level are independent. The Y observations for a given X level are normally distributed. The distribution of Y for each level of X has the same variance. LOF test requires repeat observations, called replications (or replicates), for at least one of the X values.
7
Notation

iron   wgtloss
0.01   127.6
0.01   130.1
0.01   128.0
0.48   124.0
0.48   122.0
0.71   110.8
0.71   113.1
0.95   103.9
1.19   101.5
1.44    92.3
1.44    91.4
1.96    83.7
1.96    86.2

c = number of different levels of X (c = 7, with X_1 = 0.01, X_2 = 0.48, …, X_7 = 1.96).
n_j = number of replicates at the j-th level of X, X_j (n_1 = 3, n_2 = 2, …, n_7 = 2), for a total of n = n_1 + … + n_c observations.
Y_ij = observed value of the response variable for the i-th replicate at X_j (Y_11 = 127.6, Y_21 = 130.1, …, Y_27 = 86.2).
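The notation can be checked against the data with a short Python sketch. The variable names (`iron`, `wgtloss`, `groups`, `n_j`) are illustrative choices, not part of the original slides; only the data values come from the slide.

```python
from collections import defaultdict

# Iron content (X) and weight loss (Y), copied from the slide.
iron = [0.01, 0.01, 0.01, 0.48, 0.48, 0.71, 0.71,
        0.95, 1.19, 1.44, 1.44, 1.96, 1.96]
wgtloss = [127.6, 130.1, 128.0, 124.0, 122.0, 110.8, 113.1,
           103.9, 101.5, 92.3, 91.4, 83.7, 86.2]

# Collect the responses observed at each distinct X level.
groups = defaultdict(list)
for x, y in zip(iron, wgtloss):
    groups[x].append(y)

levels = sorted(groups)                  # X_1, ..., X_c
c = len(levels)                          # number of distinct X levels
n_j = [len(groups[x]) for x in levels]   # replicates per level
n = sum(n_j)                             # total number of observations

print("c =", c)
print("n_j =", n_j)
print("n =", n)
```

Levels with n_j = 1 (here X = 0.95 and X = 1.19) are the "rows with no replicates" that Minitab reports for this data set.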
8
The Full Model Assume nothing about (or “put no structure on”) the means of the responses, μ_j, at the j-th level of X: Y_ij = μ_j + ε_ij. Make the usual assumptions about the error terms ε_ij: normal, mean 0, constant variance σ². The least squares estimate of each μ_j is the sample mean of the responses at the X_j level, Ȳ_j. “Pure error sum of squares”: SSPE = Σ_j Σ_i (Y_ij - Ȳ_j)².
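As a rough illustration of the full model, the pure error sum of squares for the iron and corrosion data can be computed from the level means. The function name `pure_error_ss` is a hypothetical helper introduced here, not something from the slides.

```python
from collections import defaultdict
from statistics import mean

iron = [0.01, 0.01, 0.01, 0.48, 0.48, 0.71, 0.71,
        0.95, 1.19, 1.44, 1.44, 1.96, 1.96]
wgtloss = [127.6, 130.1, 128.0, 124.0, 122.0, 110.8, 113.1,
           103.9, 101.5, 92.3, 91.4, 83.7, 86.2]

def pure_error_ss(x, y):
    """Sum of squared deviations of each response from its X-level mean."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    # Under the full model the fitted value at X_j is the sample mean at X_j.
    return sum((yi - mean(ys)) ** 2 for ys in groups.values() for yi in ys)

sspe = pure_error_ss(iron, wgtloss)
print("SSPE =", sspe)
```

Levels with a single observation contribute zero to SSPE, which is why the test needs replicates at one or more X values. The result should agree, up to rounding, with the Pure Error row of the Example 3 ANOVA table.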
9
The Reduced Model Assume the means of the responses, μ_j, are linearly related to the j-th level of X (same model as before, just modified subscripts): Y_ij = β_0 + β_1 X_j + ε_ij. Make the usual assumptions about the error terms ε_ij: normal, mean 0, constant variance σ². The least squares estimates of the μ_j are as usual: Ŷ_ij = b_0 + b_1 X_j. “Error sum of squares”: SSE = Σ_j Σ_i (Y_ij - Ŷ_ij)².
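A minimal sketch of the reduced model fit for the iron and corrosion data, using the textbook least squares formulas for a simple linear regression. The helper name `fit_line` is an assumption made for this example.

```python
from statistics import mean

iron = [0.01, 0.01, 0.01, 0.48, 0.48, 0.71, 0.71,
        0.95, 1.19, 1.44, 1.44, 1.96, 1.96]
wgtloss = [127.6, 130.1, 128.0, 124.0, 122.0, 110.8, 113.1,
           103.9, 101.5, 92.3, 91.4, 83.7, 86.2]

def fit_line(x, y):
    """Return the least squares intercept b0 and slope b1."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = fit_line(iron, wgtloss)

# Error sum of squares: squared deviations of responses from the fitted line.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(iron, wgtloss))
print("b0 =", b0, "b1 =", b1)
print("SSE =", sse)
```

The SSE computed this way should match, up to rounding, the Residual Error row of the Example 3 ANOVA table.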
10
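Error sum of squares decomposition (Y_ij - Ŷ_ij) = (Y_ij - Ȳ_j) + (Ȳ_j - Ŷ_ij), that is, error deviation = pure error deviation + lack of fit deviation. Squaring and summing over all observations gives SSE = SSPE + SSLF.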
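The decomposition can be verified numerically on the iron and corrosion data. This is a sketch under the assumption that the reduced model is the simple linear regression described earlier; the variable names are illustrative.

```python
from collections import defaultdict
from statistics import mean

iron = [0.01, 0.01, 0.01, 0.48, 0.48, 0.71, 0.71,
        0.95, 1.19, 1.44, 1.44, 1.96, 1.96]
wgtloss = [127.6, 130.1, 128.0, 124.0, 122.0, 110.8, 113.1,
           103.9, 101.5, 92.3, 91.4, 83.7, 86.2]

# Reduced model: least squares line through all n observations.
xbar, ybar = mean(iron), mean(wgtloss)
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(iron, wgtloss))
      / sum((x - xbar) ** 2 for x in iron))
b0 = ybar - b1 * xbar

groups = defaultdict(list)
for x, y in zip(iron, wgtloss):
    groups[x].append(y)

sse = sspe = sslf = 0.0
for x, ys in groups.items():
    fitted = b0 + b1 * x      # fitted value under the reduced model
    level_mean = mean(ys)     # fitted value under the full model
    for y in ys:
        sse += (y - fitted) ** 2             # error deviation
        sspe += (y - level_mean) ** 2        # pure error deviation
        sslf += (level_mean - fitted) ** 2   # lack of fit deviation

print("SSE =", sse)
print("SSPE + SSLF =", sspe + sslf)
```

The cross-product term vanishes because the pure error deviations sum to zero within each X level, so the two printed values agree up to floating-point error.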
11
The F test F* = [SSLF / (c - 2)] / [SSPE / (n - c)] = MSLF / MSPE
12
The Decision (Intuitively) If the largest portion of the error sum of squares is due to lack of fit, the F* statistic should be large. A large F* statistic leads to a small P-value (determined by the F(c-2, n-c) distribution). If the P-value is small, reject the null and conclude significant lack of (linear) fit.
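The statistic can be computed from the summary quantities alone. The sketch below assumes a linear reduced model (two parameters), so the lack of fit degrees of freedom are c - 2 and the pure error degrees of freedom are n - c, matching the Lack of Fit and Pure Error rows of the example tables. The function name `lack_of_fit_f` is hypothetical.

```python
def lack_of_fit_f(sse, sspe, n, c):
    """Return (F*, numerator df, denominator df) for a linear lack of fit test.

    sse  : error sum of squares from the linear (reduced) model
    sspe : pure error sum of squares from the full model
    n    : total number of observations
    c    : number of distinct X levels
    """
    sslf = sse - sspe            # lack of fit sum of squares
    df_lf, df_pe = c - 2, n - c
    mslf = sslf / df_lf
    mspe = sspe / df_pe
    return mslf / mspe, df_lf, df_pe

# Rounded sums of squares from the iron and corrosion example (n = 13, c = 7).
f_star, df1, df2 = lack_of_fit_f(sse=102.9, sspe=11.8, n=13, c=7)
print(f_star, df1, df2)
```

Because the inputs are rounded, the result differs slightly from the 9.28 that Minitab reports. If SciPy is installed, the P-value can be obtained as `scipy.stats.f.sf(f_star, df1, df2)`; that dependency is left out here so the sketch runs on the standard library alone.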
13
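LOF Test summarized in an ANOVA Table

Source           DF     SS     MS     F
Regression       1      SSR    MSR    MSR/MSE
Residual Error   n-2    SSE    MSE
  Lack of Fit    c-2    SSLF   MSLF   MSLF/MSPE
  Pure Error     n-c    SSPE   MSPE
Total            n-1    SSTO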
14
LOF Test in Minitab Stat >> Regression >> Regression … Specify predictor and response. Under Options…, under Lack of Fit Tests, select box labeled “Pure error.” Select OK. Select OK. ANOVA table appears in session window.
15
Example 1 Do the data suggest that a linear function is adequate in describing the relationship between skin cancer mortality and latitude?
16
Example 1: Mortality and Latitude

Analysis of Variance
Source           DF     SS     MS      F      P
Regression        1  36464  36464  99.80  0.000
Residual Error   47  17173    365
  Lack of Fit    30  12863    429   1.69  0.128
  Pure Error     17   4310    254
Total            48  53637

19 rows with no replicates
17
Example 2 Do the data suggest that a linear function is adequate in describing the relationship between the length and weight of an alligator?
18
Example 2: Alligator length and weight

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        1  342350  342350  117.35  0.000
Residual Error   23   67096    2917
  Lack of Fit    17   66567    3916   44.36  0.000
  Pure Error      6     530      88
Total            24  409446

14 rows with no replicates
19
Example 3 Do the data suggest that a linear function is adequate in describing the relationship between iron content and weight loss due to corrosion?
20
Example 3: Iron and corrosion

Analysis of Variance
Source           DF      SS      MS       F      P
Regression        1  3293.8  3293.8  352.27  0.000
Residual Error   11   102.9     9.4
  Lack of Fit     5    91.1    18.2    9.28  0.009
  Pure Error      6    11.8     2.0
Total            12  3396.6

2 rows with no replicates
21
Closing comment #1 The t-test or F = MSR/MSE test only tests whether there is a linear relation between the predictor and response (β_1 ≠ 0) or not (β_1 = 0). Failing to reject the null does not imply that there is no relation between the predictor and response; a strong nonlinear relation may still be present.
22
Example: Closing comment #1
23
The regression equation is
Y* = 14.1 - 0.100 X

Predictor      Coef  SE Coef      T      P
Constant     14.118    2.598   5.44  0.000
X           -0.0998   0.6942  -0.14  0.887

S = 13.25   R-Sq = 0.1%   R-Sq(adj) = 0.0%

Analysis of Variance
Source           DF      SS     MS       F      P
Regression        1     3.6    3.6    0.02  0.887
Residual Error   24  4210.4  175.4
  Lack of Fit    11  4188.3  380.8  223.87  0.000
  Pure Error     13    22.1    1.7
Total            25  4214.0
24
Closing comments #2, #3 We used general linear test approach to test appropriateness of a linear function. It can just as easily be used to test for appropriateness of other functions (quadratic, cubic). The alternative H A : Lack of fit (not linear) includes all possible regression functions other than a linear one. Use residuals to help identify what type of function is appropriate.
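One possible way to carry the test over to other polynomial functions is sketched below. It assumes a reduced model with p = degree + 1 parameters, so the lack of fit degrees of freedom become c - p. The helpers `polyfit_ls` and `lof_f` are hypothetical, and solving the normal equations directly is a convenience for a small standard-library example, not a recommendation for numerical work.

```python
from collections import defaultdict
from statistics import mean

iron = [0.01, 0.01, 0.01, 0.48, 0.48, 0.71, 0.71,
        0.95, 1.19, 1.44, 1.44, 1.96, 1.96]
wgtloss = [127.6, 130.1, 128.0, 124.0, 122.0, 110.8, 113.1,
           103.9, 101.5, 92.3, 91.4, 83.7, 86.2]

def polyfit_ls(x, y, degree):
    """Least squares coefficients [b0, b1, ...] via the normal equations."""
    p = degree + 1
    # Augmented matrix [X'X | X'y] for the polynomial design matrix.
    a = [[sum(xi ** (r + s) for xi in x) for s in range(p)]
         + [sum(yi * xi ** r for xi, yi in zip(x, y))]
         for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[r][p] / a[r][r] for r in range(p)]

def lof_f(x, y, degree):
    """Return (F*, df_lack_of_fit, df_pure_error) for a polynomial fit."""
    coefs = polyfit_ls(x, y, degree)
    p = len(coefs)
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    n, c = len(y), len(groups)
    sspe = sum((yi - mean(ys)) ** 2 for ys in groups.values() for yi in ys)
    # Lack of fit: level means versus the fitted polynomial, weighted by n_j.
    sslf = sum(len(ys) * (mean(ys) - sum(b * xj ** k for k, b in enumerate(coefs))) ** 2
               for xj, ys in groups.items())
    return (sslf / (c - p)) / (sspe / (n - c)), c - p, n - c

f_linear, df1_linear, df2_linear = lof_f(iron, wgtloss, degree=1)
f_quad, df1_quad, df2_quad = lof_f(iron, wgtloss, degree=2)
print("linear:   ", f_linear, df1_linear, df2_linear)
print("quadratic:", f_quad, df1_quad, df2_quad)
```

With degree=1 this reproduces the linear lack of fit statistic from Example 3 up to rounding. The quadratic result is left for the reader to run; the slides do not report it.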