Lack of Fit (LOF) Test A formal F test for checking whether a specific type of regression function adequately fits the data
Example 1 Do the data suggest that a linear function is adequate in describing the relationship between skin cancer mortality and latitude?
Example 2 Do the data suggest that a linear function is adequate in describing the relationship between the length and weight of an alligator?
Example 3 Do the data suggest that a linear function is adequate in describing the relationship between iron content and weight loss due to corrosion?
Lack of fit test for a linear function … the basic idea Use the general linear test approach. The full model is the most general model, with no restrictions on the means $\mu_j$ at each $X_j$ level. The reduced model assumes that the $\mu_j$ are a linear function of the $X_j$, i.e., $\mu_j = \beta_0 + \beta_1 X_j$. Determine SSE(F), SSE(R), and the F statistic. If the P-value is small, reject the reduced model ($H_0$: no lack of fit, i.e., linear) in favor of the full model ($H_A$: lack of fit, i.e., not linear).
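The general linear test statistic mentioned above can be written out as follows. This is the standard textbook form, not something recovered from the slide; $df_R$ and $df_F$ are shorthand introduced here for the error degrees of freedom of the reduced and full models, with n observations taken at c distinct X levels.

```latex
\[
F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F},
\qquad df_R = n - 2, \quad df_F = n - c .
\]
```

The reduced model estimates two parameters ($\beta_0$ and $\beta_1$) and the full model estimates c means, so the numerator has $(n - 2) - (n - c) = c - 2$ degrees of freedom and the denominator has $n - c$.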
Assumptions and requirements The Y observations for a given X level are independent. The Y observations for a given X level are normally distributed. The distribution of Y for each level of X has the same variance. LOF test requires repeat observations, called replications (or replicates), for at least one of the X values.
Notation (illustrated with the iron and wgtloss data) There are c different levels of X (c = 7, with $X_1 = 0.01$, $X_2 = 0.48$, …, $X_7 = 1.96$). $n_j$ = number of replicates at the j-th level of X, $X_j$ ($n_1 = 3$, $n_2 = 2$, …, $n_7 = 2$), for a total of $n = n_1 + \dots + n_c$ observations. $Y_{ij}$ = observed value of the response variable for the i-th replicate at $X_j$ ($Y_{11} = 127.6$, $Y_{21} = 130.1$, …, $Y_{27} = 86.2$).
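The Full Model Assume nothing about (or “put no structure on”) the means of the responses, $\mu_j$, at the j-th level of X: $Y_{ij} = \mu_j + \varepsilon_{ij}$. Make the usual assumptions about the error terms $\varepsilon_{ij}$: normal, mean 0, constant variance $\sigma^2$. The least squares estimates of the $\mu_j$ are the sample means of the responses at each $X_j$ level, $\hat{\mu}_j = \bar{Y}_j$. “Pure error sum of squares”: $SSPE = SSE(F) = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2$.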
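The Reduced Model Assume the means of the responses, $\mu_j$, are linearly related to the j-th level of X (same model as before, just with modified subscripts): $Y_{ij} = \beta_0 + \beta_1 X_j + \varepsilon_{ij}$. Make the usual assumptions about the error terms $\varepsilon_{ij}$: normal, mean 0, constant variance $\sigma^2$. The least squares estimates of the $\mu_j$ are the usual fitted values, $\hat{Y}_{ij} = b_0 + b_1 X_j$. “Error sum of squares”: $SSE = SSE(R) = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (Y_{ij} - \hat{Y}_{ij})^2$.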
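Error sum of squares decomposition error deviation = pure error deviation + lack of fit deviation: $Y_{ij} - \hat{Y}_{ij} = (Y_{ij} - \bar{Y}_j) + (\bar{Y}_j - \hat{Y}_{ij})$. Squaring and summing over all observations (the cross-product term sums to zero) gives SSE = SSPE + SSLF, where $SSLF = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (\bar{Y}_j - \hat{Y}_{ij})^2$ is the lack of fit sum of squares.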
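The decomposition can be checked numerically. The sketch below uses a small made-up data set (four X levels with two replicates each; these are not the iron and corrosion data), and all variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical data: 4 levels of X, 2 replicates per level (not from the slides).
x = np.array([1, 1, 2, 2, 3, 3, 4, 4], dtype=float)
y = np.array([2, 4, 5, 7, 6, 8, 9, 11], dtype=float)

# Reduced model: ordinary least squares line.
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar
fitted = b0 + b1 * x

# Full model: the fitted value is the sample mean of Y at each X level.
level_mean = np.array([y[x == xi].mean() for xi in x])

error_dev = y - fitted           # error deviation
pure_dev = y - level_mean        # pure error deviation
lof_dev = level_mean - fitted    # lack of fit deviation

sse = np.sum(error_dev ** 2)
sspe = np.sum(pure_dev ** 2)
sslf = np.sum(lof_dev ** 2)
print(f"SSE = {sse:.1f}, SSPE = {sspe:.1f}, SSLF = {sslf:.1f}")
```

On this toy data SSE = 9.6, SSPE = 8.0, and SSLF = 1.6, so the error sum of squares splits exactly into its pure error and lack of fit parts.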
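The F test $F^* = \frac{SSLF/(c-2)}{SSPE/(n-c)} = \frac{MSLF}{MSPE}$, where SSLF is the lack of fit sum of squares and SSPE is the pure error sum of squares. Under $H_0$ (no lack of fit), $F^*$ follows an F distribution with c-2 numerator and n-c denominator degrees of freedom.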
The Decision (Intuitively) If the largest portion of the error sum of squares is due to lack of fit, the F* statistic should be large. A large F* statistic leads to a small P-value (determined by the F(c-2, n-c) distribution). If the P-value is small, reject the null and conclude that there is significant lack of (linear) fit.
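As a rough illustration of the decision rule, the sketch below computes F* and its P-value for a straight-line fit. The function name `lack_of_fit_test` and the data are hypothetical (the data are made up so that Y is roughly X squared); the P-value comes from SciPy's `scipy.stats.f.sf`, the upper-tail probability of the F distribution.

```python
import numpy as np
from scipy import stats


def lack_of_fit_test(x, y):
    """Lack of fit F test for a straight-line fit (requires replicates)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size

    # Distinct X levels; `level` maps each observation to its level index.
    levels, level = np.unique(x, return_inverse=True)
    c = levels.size
    if c < 3 or n <= c:
        raise ValueError("need at least 3 X levels and at least one replicate")

    # Full model: the fitted value is the sample mean at each level.
    level_means = np.bincount(level, weights=y) / np.bincount(level)
    sspe = np.sum((y - level_means[level]) ** 2)

    # Reduced model: ordinary least squares line.
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    fitted = b0 + b1 * x
    sslf = np.sum((level_means[level] - fitted) ** 2)

    df_lf, df_pe = c - 2, n - c
    f_star = (sslf / df_lf) / (sspe / df_pe)   # MSLF / MSPE
    p_value = stats.f.sf(f_star, df_lf, df_pe)
    return f_star, (df_lf, df_pe), p_value


# Hypothetical data (not from the slides): Y is roughly X squared.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [0.5, 1.5, 3.5, 4.5, 8.5, 9.5, 15.5, 16.5]
f_star, df, p_value = lack_of_fit_test(x, y)
print(f"F* = {f_star:.2f} on {df} df, P-value = {p_value:.3f}")
```

For this toy data F* = 8 on (2, 4) degrees of freedom with a P-value of 0.04, so at the 0.05 level the linear function would be judged inadequate.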
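LOF Test summarized in an ANOVA Table
Source           DF     SS     MS                  F
Regression       1      SSR    MSR = SSR/1         MSR/MSE
Residual Error   n-2    SSE    MSE = SSE/(n-2)
  Lack of Fit    c-2    SSLF   MSLF = SSLF/(c-2)   MSLF/MSPE
  Pure Error     n-c    SSPE   MSPE = SSPE/(n-c)
Total            n-1    SSTO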
LOF Test in Minitab Stat >> Regression >> Regression … Specify the predictor and response. Under Options…, under Lack of Fit Tests, select the box labeled “Pure error.” Select OK to close the Options dialog, then select OK again to run the regression. The ANOVA table appears in the session window.
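Example 1: Mortality and Latitude [Minitab Analysis of Variance output. Columns: Source, DF, SS, MS, F, P. Rows: Regression, Residual Error (partitioned into Lack of Fit and Pure Error), Total. The output also reports the number of rows with no replicates.]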
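Example 2: Alligator length and weight [Minitab Analysis of Variance output. Columns: Source, DF, SS, MS, F, P. Rows: Regression, Residual Error (partitioned into Lack of Fit and Pure Error), Total. The output also reports the number of rows with no replicates.]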
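Example 3: Iron and corrosion [Minitab Analysis of Variance output. Columns: Source, DF, SS, MS, F, P. Rows: Regression, Residual Error (partitioned into Lack of Fit and Pure Error), Total. The output also reports the number of rows with no replicates.]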
Closing comment #1 The t-test or the F = MSR/MSE test only tests whether there is a linear relation between the predictor and response ($\beta_1 \neq 0$) or not ($\beta_1 = 0$). Failing to reject the null does not imply that there is no relation between the predictor and response; the relation may simply not be linear.
Example: Closing comment #1
[Minitab output: regression of Y* on X with a coefficient table (Predictor, Coef, SE Coef, T, P for Constant and X); R-Sq = 0.1%, R-Sq(adj) = 0.0%; Analysis of Variance table with columns Source, DF, SS, MS, F, P and rows Regression, Residual Error, Lack of Fit, Pure Error, Total.]
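A near-zero R-Sq like the one above can arise even when the response is completely determined by the predictor. The sketch below uses made-up data (not the data behind the Minitab output) in which Y is an exact quadratic function of X, yet the fitted slope and R-Sq are essentially zero.

```python
import numpy as np

# Hypothetical data: Y is an exact, but nonlinear, function of X.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Least squares line; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, 1)
r_squared = np.corrcoef(x, y)[0, 1] ** 2

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R-sq = {r_squared:.1%}")
```

Because the X values are symmetric about zero, the positive and negative contributions to the slope cancel, so a test of $\beta_1 = 0$ would find nothing even though the relation is perfect.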
Closing comments #2, #3 We used the general linear test approach to test the appropriateness of a linear function. It can just as easily be used to test the appropriateness of other functions (quadratic, cubic). The alternative $H_A$: lack of fit (not linear) includes all possible regression functions other than a linear one. Use residuals to help identify what type of function is appropriate.
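To illustrate the extension to other functions, the sketch below generalizes the test to a polynomial reduced model of a chosen degree. The function name `lack_of_fit_poly` and the data are hypothetical; the only change from the linear case is that the reduced model has degree + 1 parameters, so the lack of fit degrees of freedom become c - (degree + 1).

```python
import numpy as np
from scipy import stats


def lack_of_fit_poly(x, y, degree):
    """Lack of fit F test for a polynomial reduced model of the given degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    levels, level = np.unique(x, return_inverse=True)
    n, c, p = y.size, levels.size, degree + 1
    if c <= p or n <= c:
        raise ValueError("need more X levels than parameters, plus replicates")

    # Full model: sample mean of Y at each X level (pure error).
    level_means = np.bincount(level, weights=y) / np.bincount(level)
    sspe = np.sum((y - level_means[level]) ** 2)

    # Reduced model: least squares polynomial of the requested degree.
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    sslf = np.sum((level_means[level] - fitted) ** 2)

    df_lf, df_pe = c - p, n - c
    f_star = (sslf / df_lf) / (sspe / df_pe)
    return f_star, stats.f.sf(f_star, df_lf, df_pe)


# Hypothetical data (not from the slides): Y is roughly X squared.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [0.5, 1.5, 3.5, 4.5, 8.5, 9.5, 15.5, 16.5]

f_lin, p_lin = lack_of_fit_poly(x, y, degree=1)
f_quad, p_quad = lack_of_fit_poly(x, y, degree=2)
print(f"linear:    F* = {f_lin:.2f}, P-value = {p_lin:.3f}")
print(f"quadratic: F* = {f_quad:.2f}, P-value = {p_quad:.3f}")
```

On this toy data the linear function shows lack of fit (P-value 0.04), while the quadratic passes through the level means and shows essentially none.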