Linear Lack of Fit (LOF) Test: an F test for checking whether a linear regression function is inadequate in describing the trend in the data
Where does this topic fit in? Model formulation, model estimation, model evaluation, model use
Example 1 Do the data suggest that a linear function is inadequate in describing the relationship between skin cancer mortality and latitude?
Example 2 Do the data suggest that a linear function is inadequate in describing the relationship between the length and weight of an alligator?
Example 3 Do the data suggest that a linear function is inadequate in describing the relationship between iron content and weight loss due to corrosion?
Some notation
Decomposing the error
The basic idea
Break down the residual error (“error sum of squares” – SSE) into two components:
– a component that is due to lack of model fit (“lack of fit sum of squares” – SSLF)
– a component that is due to pure random error (“pure error sum of squares” – SSPE)
If the lack of fit sum of squares is a large component of the residual error, it suggests that a linear function is inadequate.
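The breakdown described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the slides: the function name `decompose_error` and the data are hypothetical, the line is fitted by ordinary least squares, and replicates are taken to be observations that share exactly the same predictor value.

```python
from collections import defaultdict


def decompose_error(x, y):
    """Split the SSE of a simple linear regression into SSLF and SSPE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Ordinary least squares estimates of the slope and intercept.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Group the responses by predictor value (the replicates).
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    # Residual error: observed response minus fitted value.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    sslf = 0.0  # group mean minus fitted value (lack of fit)
    sspe = 0.0  # observed response minus group mean (pure error)
    for xi, ys in groups.items():
        group_mean = sum(ys) / len(ys)
        sslf += len(ys) * (group_mean - (b0 + b1 * xi)) ** 2
        sspe += sum((yi - group_mean) ** 2 for yi in ys)

    return sse, sslf, sspe


# Hypothetical data: two replicates at each of four predictor values,
# with a curved trend so that the lack of fit component dominates.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.1, 0.9, 4.2, 3.8, 9.1, 8.9, 15.8, 16.2]

sse, sslf, sspe = decompose_error(x, y)
print(f"SSE = {sse:.3f}, SSLF = {sslf:.3f}, SSPE = {sspe:.3f}")
```

In this made-up data set nearly all of the residual error comes from the lack of fit component, which is the pattern the slide describes as evidence against a linear function.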
A geometric decomposition
The decomposition holds for the sum of the squared deviations, too: Error sum of squares (SSE) = Lack of fit sum of squares (SSLF) + Pure error sum of squares (SSPE)
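Written out term by term, the decomposition of the sums of squares can be sketched as follows. The notation is assumed rather than taken from the slides: y_ij is the j-th response at the i-th distinct predictor value, ȳ_i is the mean response at that value, ŷ_ij is the fitted value from the line, c is the number of distinct predictor values, and n_i is the number of replicates at the i-th value.

```latex
\underbrace{\sum_{i=1}^{c}\sum_{j=1}^{n_i}\left(y_{ij}-\hat{y}_{ij}\right)^2}_{\text{SSE}}
=
\underbrace{\sum_{i=1}^{c}\sum_{j=1}^{n_i}\left(\bar{y}_{i}-\hat{y}_{ij}\right)^2}_{\text{SSLF}}
+
\underbrace{\sum_{i=1}^{c}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_{i}\right)^2}_{\text{SSPE}}
```

The cross-product term drops out because, within each predictor value, the deviations y_ij − ȳ_i sum to zero while ȳ_i − ŷ_ij is the same for every replicate.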
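Breakdown of degrees of freedom
Degrees of freedom associated with SSE: n-2
Degrees of freedom associated with SSLF: c-2
Degrees of freedom associated with SSPE: n-c
Here n is the number of observations and c is the number of distinct values of the predictor, so that n-2 = (c-2) + (n-c).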
Definitions of Mean Squares
The lack of fit mean square (MSLF) is defined as: MSLF = SSLF/(c-2)
And, the pure error mean square (MSPE) is defined as: MSPE = SSPE/(n-c)
Expected Mean Squares
If μ_i = β_0 + β_1 X_i, we’d expect the ratio MSLF/MSPE to be …
If μ_i ≠ β_0 + β_1 X_i, we’d expect the ratio MSLF/MSPE to be …
Use the ratio MSLF/MSPE to decide whether or not to reject μ_i = β_0 + β_1 X_i.
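For reference, the standard textbook expressions for the two expected mean squares, under the usual regression error assumptions, can be written as below. The symbols are assumed notation: σ² is the error variance, n_i is the number of replicates at the i-th distinct predictor value, and c is the number of distinct predictor values.

```latex
E(\text{MSPE}) = \sigma^2
\qquad
E(\text{MSLF}) = \sigma^2 + \frac{\sum_{i=1}^{c} n_i\left[\mu_i - \left(\beta_0 + \beta_1 X_i\right)\right]^2}{c-2}
```

The second term in E(MSLF) is zero exactly when every mean μ_i lies on the line, and positive otherwise.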
Expanded Analysis of Variance Table
Source            DF     SS     MS     F
Regression        1
Residual error    n-2
  Lack of fit     c-2
  Pure error      n-c
Total             n-1
The formal lack of fit F-test
Null hypothesis H_0: μ_i = β_0 + β_1 X_i
Alternative hypothesis H_A: μ_i ≠ β_0 + β_1 X_i
Test statistic: F* = MSLF/MSPE
P-value = What is the probability that we’d get an F* statistic as large as we did, if the null hypothesis is true?
The P-value is determined by comparing F* to an F distribution with c-2 numerator degrees of freedom and n-c denominator degrees of freedom.
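The test can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the input sums of squares are made-up numbers, and the upper tail of the F distribution is computed from the regularized incomplete beta function with a textbook continued-fraction expansion rather than a statistics package.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f, df_num, df_den):
    """P(F >= f) for an F distribution with the given degrees of freedom."""
    if f <= 0.0:
        return 1.0
    x = df_den / (df_den + df_num * f)
    return reg_inc_beta(df_den / 2.0, df_num / 2.0, x)


def lack_of_fit_test(sslf, sspe, c, n):
    """Return (F*, P-value) for the lack of fit F-test."""
    mslf = sslf / (c - 2)  # lack of fit mean square
    mspe = sspe / (n - c)  # pure error mean square
    f_star = mslf / mspe
    return f_star, f_upper_tail(f_star, c - 2, n - c)


# Hypothetical inputs: n = 8 observations at c = 4 distinct predictor values.
f_star, p_value = lack_of_fit_test(sslf=8.0, sspe=0.2, c=4, n=8)
print(f"F* = {f_star:.2f}, P-value = {p_value:.6f}")
```

A small P-value is evidence against the null hypothesis that the means fall on a line. Where SciPy is available, `scipy.stats.f.sf(f_star, c - 2, n - c)` gives the same tail probability with far less code.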
LOF Test in Minitab Stat >> Regression >> Regression … Specify predictor and response. Under Options… –under Lack of Fit Tests, select the box labeled Pure error. Select OK.
Decomposing the error
Is there lack of linear fit? [Minitab Analysis of Variance output: columns Source, DF, SS, MS, F, P; rows Regression, Residual Error, Lack of Fit, Pure Error, Total; followed by the count of rows with no replicates]
Example 1 Do the data suggest that a linear function is not adequate in describing the relationship between skin cancer mortality and latitude?
Example 1: Mortality and Latitude [Minitab Analysis of Variance output: columns Source, DF, SS, MS, F, P; rows Regression, Residual Error, Lack of Fit, Pure Error, Total; followed by the count of rows with no replicates]
Example 2 Do the data suggest that a linear function is not adequate in describing the relationship between the length and weight of an alligator?
Example 2: Alligator length and weight [Minitab Analysis of Variance output: columns Source, DF, SS, MS, F, P; rows Regression, Residual Error, Lack of Fit, Pure Error, Total; followed by the count of rows with no replicates]
Example 3 Do the data suggest that a linear function is not adequate in describing the relationship between iron content and weight loss due to corrosion?
Example 3: Iron and corrosion [Minitab Analysis of Variance output: columns Source, DF, SS, MS, F, P; rows Regression, Residual Error, Lack of Fit, Pure Error, Total; followed by the count of rows with no replicates]
Example 4 Do the data suggest that a linear function is not adequate in describing the relationship between mileage and groove depth?
Example 4: Tread wear [Minitab Analysis of Variance output: columns Source, DF, SS, MS, F, P; rows Regression, Residual Error, Total] No replicates. Cannot do pure error test.
When is it okay to perform the LOF Test? When the “INE” part of the “LINE” assumptions is met, that is, when the errors are independent, normally distributed, and have equal variances. The LOF test also requires repeat observations, called replicates, for at least one of the values of the predictor X.
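The replicate requirement can be checked before fitting anything. The sketch below is a hypothetical helper, not part of Minitab: it treats observations as replicates only when their predictor values are exactly equal, and it also requires more than two distinct predictor values so that the numerator degrees of freedom c-2 are positive.

```python
from collections import Counter


def replicate_summary(x):
    """Summarize whether predictor values x allow a lack of fit test."""
    counts = Counter(x)
    n = len(x)       # number of observations
    c = len(counts)  # number of distinct predictor values
    # Observations whose predictor value appears only once.
    rows_without_replicates = sum(1 for k in counts.values() if k == 1)
    return {
        "n": n,
        "c": c,
        "rows_without_replicates": rows_without_replicates,
        # Need n - c > 0 (at least one replicated value) and c - 2 > 0.
        "lof_test_possible": n > c and c > 2,
    }


# Hypothetical predictor values.
with_replicates = replicate_summary([1, 1, 2, 3, 3, 4])
without_replicates = replicate_summary([1, 2, 3, 4, 5])

print(with_replicates)
print(without_replicates)
```

For measured predictors stored as floating-point numbers, values that should be replicates may differ slightly, so rounding before counting may be needed.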