Conditional Test Statistics
Suppose that we are considering two log-linear models and that Model 2 is a special case of Model 1. That is, the parameters of Model 2 are a subset of the parameters of Model 1. Also assume that Model 1 has been shown to fit the data adequately.
In this case one is interested in testing whether the differences in the expected frequencies between Model 1 and Model 2 are simply due to random variation. The likelihood ratio chi-square statistic that achieves this goal is: G2(2|1) = G2(2) – G2(1), with degrees of freedom equal to the d.f. of Model 2 minus the d.f. of Model 1.
Example
Goodness-of-fit tests for the all k-factor models
Conditional tests for zero k-factor interactions
Conclusions
The four-factor interaction is not significant: G2(3|4) = 0.7 (p = 0.705).
The all three-factor model provides an adequate fit: G2(3) = 0.7 (p = 0.705).
The three-factor interactions are not significantly different from 0: G2(2|3) = 9.2 (p = 0.239).
The all two-factor model provides an adequate fit: G2(2) = 9.9 (p = 0.359).
There are significant two-factor interactions: G2(1|2) = 33.0 (p = 0.00083).
Conclude that the model should contain main effects and some two-factor interactions.
There may also be a natural sequence of progressively more complex models that one might want to identify. In the laundry detergent example the variables are:
1 = Softness of laundry used
2 = Previous use of brand M
3 = Temperature of laundry water used
4 = Preference of brand X over brand M
A natural order of increasingly complex models to consider might be:
[1][2][3][4]: the all-main-effects model (independence amongst all four variables).
[1][3][24]: since previous use of brand M may be highly related to preference for brand M, add first the 2-4 interaction.
[1][34][24]: brand M is recommended for hot water, so add second the 3-4 interaction.
[13][34][24]: brand M is also recommended for soft laundry, so add third the 1-3 interaction.
[13][234] and [134][234]: finally, add some possible three-factor interactions.
Likelihood ratio G2 for various models
Model                   d.f.   G2
[1][3][24]              17     22.4
[1][24][34]             16     18
[13][24][34]            14     11.9
[13][23][24][34]        13     11.2
[12][13][23][24][34]    11     10.1
[1][234]                       14.5
[134][24]               10     12.2
[13][234]               12     8.4
[24][34][123]           9
[123][234]              8      5.6
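The conditional statistic G2(2|1) = G2(2) – G2(1) can be applied along the natural sequence of models using the d.f. and G2 values in the table above. The following is a minimal sketch, not a definitive implementation: the function name `conditional_tests` and the choice of a 5% level are assumptions made for illustration, and the critical values are the standard chi-square 5% points for 1 and 2 d.f.

```python
# Standard 5% critical values of the chi-square distribution (1 and 2 d.f.).
CRITICAL_5PCT = {1: 3.841, 2: 5.991}

# (model, d.f., G2) for part of the nested sequence, copied from the table.
SEQUENCE = [
    ("[1][3][24]", 17, 22.4),
    ("[1][24][34]", 16, 18.0),
    ("[13][24][34]", 14, 11.9),
    ("[13][234]", 12, 8.4),
]


def conditional_tests(sequence):
    """Compare each model with the next, larger model in the sequence.

    Returns tuples (smaller model, larger model, G2 difference,
    d.f. difference, significant at 5%?).
    """
    results = []
    for (small, df_s, g2_s), (big, df_b, g2_b) in zip(sequence, sequence[1:]):
        g2_cond = g2_s - g2_b   # G2(small | big) = G2(small) - G2(big)
        df_cond = df_s - df_b   # d.f. of the conditional test
        significant = g2_cond > CRITICAL_5PCT[df_cond]
        results.append((small, big, round(g2_cond, 1), df_cond, significant))
    return results


for row in conditional_tests(SEQUENCE):
    print(row)
```

Under these assumptions, a significant result means the terms added at that step improve the fit by more than random variation would explain.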
Stepwise selection procedures
Forward Selection
Backward Elimination
Forward Selection: Starting with a model that underfits the data, log-linear parameters that are not in the model are added step by step until a model that does fit is achieved. At each step the log-linear parameter that is most significant is added to the model. To determine the significance of a parameter added we use the statistic G2(2|1) = G2(2) – G2(1), where Model 1 contains the parameter and Model 2 does not.
Backward Elimination: Starting with a model that overfits the data, log-linear parameters that are in the model are deleted step by step until we reach a model that still fits the data and contains the smallest number of parameters, all of them significant. At each step the log-linear parameter that is least significant is deleted from the model. To determine the significance of a parameter deleted we use the statistic G2(2|1) = G2(2) – G2(1), where Model 1 contains the parameter and Model 2 does not.
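One step of forward selection can be sketched as follows, reusing d.f. and G2 values from the laundry detergent table of G2 statistics given earlier. This is an illustrative sketch only: the helper names `chi2_sf` and `forward_step`, the 0.05 threshold, and the choice of [13][23][24][34] as the current model are assumptions, not part of the original example. The p-value helper uses the standard closed-form recurrence for the chi-square upper tail with integer d.f.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        nu, q = 2, math.exp(-half)                 # exact tail for 2 d.f.
    else:
        nu, q = 1, math.erfc(math.sqrt(half))      # exact tail for 1 d.f.
    while nu < df:
        # Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) exp(-x/2) / Gamma(nu/2 + 1)
        q += math.exp((nu / 2.0) * math.log(half) - half
                      - math.lgamma(nu / 2.0 + 1.0))
        nu += 2
    return q


def forward_step(current, candidates, alpha=0.05):
    """Find the most significant single addition to the current model.

    `current` and each candidate are (model, d.f., G2) tuples; every
    candidate must contain the current model as a special case.
    Returns (best model, p-value, whether it would be added at `alpha`).
    """
    _, df_cur, g2_cur = current
    scored = []
    for name, df_new, g2_new in candidates:
        # G2(current | candidate) = G2(current) - G2(candidate)
        p = chi2_sf(g2_cur - g2_new, df_cur - df_new)
        scored.append((p, name))
    p_best, best = min(scored)
    return best, p_best, p_best < alpha


current = ("[13][23][24][34]", 13, 11.2)
candidates = [
    ("[12][13][23][24][34]", 11, 10.1),   # adds the 1-2 interaction
    ("[13][234]", 12, 8.4),               # adds the 2-3-4 interaction
]
best, p_best, added = forward_step(current, candidates)
print(best, round(p_best, 3), added)
```

A backward elimination step would use the same p-values but delete the term with the largest one, stopping when every remaining deletion is significant.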
K = Knowledge, N = Newspaper, R = Radio, S = Reading, L = Lectures
Continuing after 10 steps
The final step
The best model was found at the previous step: [LN][KLS][KR][KN][LR][NR][NS]
Logit Models
To date we have not worried whether any of the variables were dependent or independent variables. The logit model is used when we have a single binary dependent variable.
The variables
Type of seedling (T): Longleaf seedling, Slash seedling
Depth of planting (D): Too low, Too high
Mortality (M) (the dependent variable): Dead, Alive
The Log-linear Model
Note:
mij1 = # dead when T = i and D = j.
mij2 = # alive when T = i and D = j.
mij1/mij2 = mortality ratio when T = i and D = j.
Hence since
The logit model: where
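The step from the log-linear model to the logit model can be written out as below. This is a sketch under assumptions: the u-term notation and the w symbols for the logit parameters are chosen here for illustration, and the usual sum-to-zero constraints are assumed, so that for the two-category variable M each term with k = 2 is the negative of the corresponding term with k = 1. That assumption agrees with the later statement that the logit parameters are twice the loglinear parameters.

```latex
\begin{align*}
\ln m_{ijk} &= u + u_{T(i)} + u_{D(j)} + u_{M(k)}
              + u_{TD(ij)} + u_{TM(ik)} + u_{DM(jk)} + u_{TDM(ijk)} \\[4pt]
\ln\frac{m_{ij1}}{m_{ij2}} &= \ln m_{ij1} - \ln m_{ij2} \\
  &= \bigl[u_{M(1)} - u_{M(2)}\bigr] + \bigl[u_{TM(i1)} - u_{TM(i2)}\bigr]
   + \bigl[u_{DM(j1)} - u_{DM(j2)}\bigr] + \bigl[u_{TDM(ij1)} - u_{TDM(ij2)}\bigr] \\
  &= 2u_{M(1)} + 2u_{TM(i1)} + 2u_{DM(j1)} + 2u_{TDM(ij1)} \\
  &= w + w_{T(i)} + w_{D(j)} + w_{TD(ij)},
\end{align*}
\text{where } w = 2u_{M(1)},\quad w_{T(i)} = 2u_{TM(i1)},\quad
w_{D(j)} = 2u_{DM(j1)},\quad w_{TD(ij)} = 2u_{TDM(ij1)}.
```

All terms that do not involve M (u, the T and D main effects, and the TD interaction) cancel in the difference, which is why only interactions with the dependent variable appear in the logit model.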
Thus, corresponding to a loglinear model there is a logit model predicting the log ratio of expected frequencies of the two categories of the dependent variable. Also, (k + 1)-factor interactions with the dependent variable in the loglinear model determine k-factor interactions in the logit model:
k + 1 = 1: constant term in the logit model
k + 1 = 2: main effects in the logit model
1 = Depth, 2 = Mort, 3 = Type
Log-Linear parameters for Model: [TM][TD][DM]
Logit Model for predicting the Mortality
The best model found by forward selection was [LN][KLS][KR][KN][LR][NR][NS]. To fit a logit model to predict K (Knowledge) we need to fit a loglinear model with the important interactions with K, namely [LNRS][KLS][KR][KN]. The logit model will contain:
Main effects for L (Lectures), N (Newspapers), R (Radio), and S (Reading)
A two-factor interaction effect for L and S
The Logit Parameters for the Model: LNSR, KLS, KR, KN
(Multiplicative effects are given in brackets; logit parameters = 2 × loglinear parameters)
The constant term: -0.226 (0.798)
The main effects on Knowledge:
Lectures     Lect    0.268 (1.307)    None   -0.268 (0.765)
Newspaper    News    0.324 (1.383)    None   -0.324 (0.723)
Reading      Solid   0.340 (1.405)    Not    -0.340 (0.712)
Radio        Radio   0.150 (1.162)    None   -0.150 (0.861)
The two-factor interaction effect of Reading and Lectures on Knowledge
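The bracketed multiplicative effects are the exponentials of the logit parameters, which the short sketch below checks against the values listed above. The names `multiplicative_effect` and `main_effects_log_odds` are hypothetical. The second function is deliberately partial: it adds only the constant and the main effects, because the Reading by Lectures interaction values are not given in the text.

```python
import math

# Logit parameters copied from the table above.
CONSTANT = -0.226
MAIN_EFFECTS = {
    "Lectures": {"Lect": 0.268, "None": -0.268},
    "Newspaper": {"News": 0.324, "None": -0.324},
    "Reading": {"Solid": 0.340, "Not": -0.340},
    "Radio": {"Radio": 0.150, "None": -0.150},
}


def multiplicative_effect(logit_parameter):
    """Effect on the odds scale: exp of the effect on the log-odds scale."""
    return math.exp(logit_parameter)


def main_effects_log_odds(levels):
    """Constant plus main effects for the chosen level of each variable.

    Partial by design: the Reading x Lectures interaction is omitted
    because its values are not available here.
    """
    return CONSTANT + sum(MAIN_EFFECTS[var][lvl] for var, lvl in levels.items())


print(round(multiplicative_effect(CONSTANT), 3))
for variable, effects in MAIN_EFFECTS.items():
    for level, value in effects.items():
        print(variable, level, value, round(multiplicative_effect(value), 3))

exposed = {"Lectures": "Lect", "Newspaper": "News",
           "Reading": "Solid", "Radio": "Radio"}
print(main_effects_log_odds(exposed))
```

Because the effects add on the log-odds scale, the corresponding multiplicative effects multiply on the odds scale.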