Published by Pearl Casey. Modified over 9 years ago.
STA 617 – Chp9 Loglinear/Logit Models
Chapters 5-7: logistic regression, a GLM with logit link for binomial / multinomial responses.
Chapter 8: loglinear models for contingency tables, with log link and Poisson cell counts.
Chapter 9 (here):
- graphs that show a model's association and conditional independence patterns
- selection and comparison of loglinear models
- diagnostics for checking models, such as residuals
- association between ordinal variables
9.1 ASSOCIATION GRAPHS AND COLLAPSIBILITY
Darroch et al. (1980) used mathematical graph theory to represent loglinear models that have a conditional independence structure. An association graph has a set of vertices, each vertex representing a variable. An edge connecting two variables represents a conditional association between them.
Consider the loglinear model (WX, WY, WZ, YZ), which has no XY or XZ term. It assumes independence between X and Y and between X and Z, each conditional on the remaining two variables. Its association graph has edges W-X, W-Y, W-Z, and Y-Z. Two loglinear models with the same pairwise associations have the same association graph. For instance, this association graph is also the one for model (WX, WYZ), which adds a three-factor WYZ interaction.
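As a small illustration of why the two models share one graph, the edges can be derived mechanically from the generating terms. This is a sketch in Python; the function name `association_edges` and the string encoding of the models are assumptions made for this example only.

```python
from itertools import combinations

def association_edges(generators):
    """Return the edges implied by a loglinear model's generating terms.

    Each generator is a string of single-letter variable names, e.g. "WYZ".
    Every pair of variables that appears together in a generator gets an edge.
    """
    edges = set()
    for term in generators:
        for pair in combinations(sorted(term), 2):
            edges.add(frozenset(pair))
    return edges

model_a = ["WX", "WY", "WZ", "YZ"]   # no three-factor term
model_b = ["WX", "WYZ"]              # adds the WYZ interaction

edges_a = association_edges(model_a)
edges_b = association_edges(model_b)
same_graph = edges_a == edges_b

print(sorted("".join(sorted(e)) for e in edges_a))
print("same association graph:", same_graph)
```

The graph records only which pairs are linked, so it cannot distinguish a model with a three-factor term from one with the three corresponding two-factor terms.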
A path in an association graph is a sequence of edges leading from one variable to another. Two variables X and Y are said to be separated by a subset of variables if all paths connecting X and Y intersect that subset. For instance, in the graph for model (WX, WY, WZ, YZ), W separates X and Y, since any path connecting X and Y goes through W. The subset {W, Z} also separates X and Y. A fundamental result states that two variables are conditionally independent given any subset of variables that separates them.
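The definition of separation can be checked directly by listing every path between two variables. The sketch below does this for the graph of model (WX, WY, WZ, YZ). The helper names `simple_paths` and `separates` are illustrative, and enumerating all paths is only practical for small graphs like this one.

```python
# Adjacency sets for the association graph of model (WX, WY, WZ, YZ).
GRAPH = {
    "W": {"X", "Y", "Z"},
    "X": {"W"},
    "Y": {"W", "Z"},
    "Z": {"W", "Y"},
}

def simple_paths(graph, start, goal, visited=None):
    """Yield every path from start to goal that repeats no vertex."""
    visited = (visited or []) + [start]
    if start == goal:
        yield visited
        return
    for nxt in sorted(graph[start]):
        if nxt not in visited:
            yield from simple_paths(graph, nxt, goal, visited)

def separates(graph, subset, a, b):
    """True if every path from a to b passes through a vertex in subset."""
    subset = set(subset)
    # path[1:-1] holds the interior vertices; an edge a-b has none,
    # so adjacent variables can never be separated.
    return all(set(path[1:-1]) & subset for path in simple_paths(graph, a, b))

print(list(simple_paths(GRAPH, "X", "Y")))
print(separates(GRAPH, {"W"}, "X", "Y"))
print(separates(GRAPH, {"W", "Z"}, "X", "Y"))
print(separates(GRAPH, {"Z"}, "X", "Y"))
```

Here Z alone does not separate X and Y, because the path X-W-Y avoids it.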
9.1.2 Collapsibility in Three-Way Contingency Tables
Conditional associations in partial tables usually differ from marginal associations. However, under certain collapsibility conditions, they are the same. The fitted XY odds ratio is identical in the partial tables and the marginal table for models with association graphs X-Y-Z and Y-X-Z (or simpler), that is, when Z is conditionally independent of X or of Y.
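A numeric check can make the collapsibility claim concrete. The sketch below assumes the condition that Z is conditionally independent of X given Y (model (XY, YZ)) and uses made-up probabilities. It builds the joint distribution and compares the XY odds ratio in each partial table with the one in the marginal table.

```python
# Hypothetical probabilities, chosen only for illustration.
p_xy = [[0.30, 0.10],          # P(X = i, Y = j)
        [0.15, 0.45]]
p_z_given_y = [[0.7, 0.3],     # P(Z = k | Y = 0)
               [0.2, 0.8]]     # P(Z = k | Y = 1)

# Joint distribution under model (XY, YZ): given Y, Z does not depend on X.
p = [[[p_xy[i][j] * p_z_given_y[j][k] for k in range(2)]
      for j in range(2)]
     for i in range(2)]

def odds_ratio(table):
    """Odds ratio of a 2x2 table given as nested lists."""
    return (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])

# XY odds ratio within each level k of Z (the partial tables).
conditional = [
    odds_ratio([[p[i][j][k] for j in range(2)] for i in range(2)])
    for k in range(2)
]
# XY odds ratio after summing over Z (the marginal table).
marginal = odds_ratio([[sum(p[i][j]) for j in range(2)] for i in range(2)])

print(conditional, marginal)
```

The factor P(Z = k | Y = j) cancels in each conditional odds ratio, which is why the partial and marginal values agree under this model.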
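9.1.4 Collapsibility and Association Graphs for Multiway Tables
Bishop et al. (1975, p. 47) provided a parametric collapsibility condition for multiway tables: suppose the variables partition into three mutually exclusive subsets A, B, and C such that B separates A and C in the association graph. After collapsing the table over the variables in C, the parameters relating variables in A, and the parameters relating variables in A to variables in B, are unchanged.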
9.2 MODEL SELECTION AND COMPARISON
Key: a model should be complex enough to fit well but also relatively simple to interpret, smoothing rather than overfitting the data. The potentially useful models are usually a small subset of the possible models.
9.2.1 Considerations in Model Selection
- Inclusion of certain terms: a study designed to answer specific questions, such as differences between treatment groups, determines which terms the model must contain.
- Models should recognize distinctions between response and explanatory variables.
- The modeling process should concentrate on terms linking responses and terms linking explanatory variables to responses.
- The model should contain the most general interaction term relating the explanatory variables. From the likelihood equations, this has the effect of equating the fitted totals to the sample totals at combinations of their levels.
Automobile example (Table 8.8)
Two responses: I (injury) and S (seat-belt use). Two explanatory variables: G (gender) and L (location). The sample totals at each combination of G and L are then treated as fixed, so a loglinear model should include the GL term; by the likelihood equations, this makes the fitted G-L totals equal the sample totals. If S is also explanatory and only I is a response, the totals at each combination of G, L, and S should be fixed. We should then use logit rather than loglinear models, since the main focus is describing effects on that response.
For exploratory studies, a search among potential models may provide clues about associations and interactions. One approach first fits the model having single-factor terms, then the model having two-factor and single-factor terms, then the model having three-factor and lower terms, and so on. Fitting such models often reveals a restricted range of good-fitting models.
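The term lists for such a sequence of models can be generated rather than typed by hand. The following Python sketch assumes five single-letter variables named as in the student survey example (A, C, M, R, G). The helper `terms_of_order` is hypothetical and only builds the term strings; it does not fit any model.

```python
from itertools import combinations

variables = ["A", "C", "M", "R", "G"]

def terms_of_order(names, k):
    """All k-factor interaction terms, written in SAS-style 'A*C' notation."""
    return ["*".join(combo) for combo in combinations(names, k)]

single = terms_of_order(variables, 1)
two_factor = terms_of_order(variables, 2)
three_factor = terms_of_order(variables, 3)

print(" ".join(single))
print(" ".join(two_factor))
print(" ".join(three_factor))
```

With five variables there are 10 two-factor terms and 10 three-factor terms. The order of the factors within a term (for example R*G versus G*R) does not change the model.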
Automatic model selection
Backward elimination, forward selection, and stepwise procedures may also be useful but should be used with care and skepticism. Such a strategy need not yield a meaningful model.
The Dayton Student Survey
Variables: alcohol use (A), cigarette use (C), marijuana use (M), gender (G), and race (R).
SAS Code
data table9_1;
  /* each input row: alcohol (A), cigarettes (C), then 8 counts */
  input A $ C $ x1-x8;
  array x{*} x1-x8;
  retain i;
  i=0;
  drop i x1-x8;
  /* counts are ordered by race, then gender, then marijuana use */
  do R='White', 'Other';
    do G='Female','Male';
      do M='Yes', 'No';
        i=i+1;
        count=x{i};
        output;
      end;
    end;
  end;
  cards;
Yes Yes 405 268 453 228 23 23 30 19
Yes No 13 218 28 201 2 19 1 18
No Yes 1 17 1 17 0 1 1 8
No No 1 117 1 133 0 12 0 17
;
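To make the reshaping in the DATA step easier to follow, here is a rough Python equivalent. It is a sketch, not part of the course code: the names `ROWS` and `records` are assumptions, and the counts are copied from the data lines above.

```python
from itertools import product

# (A, C, eight counts) exactly as on the data lines above.
ROWS = [
    ("Yes", "Yes", [405, 268, 453, 228, 23, 23, 30, 19]),
    ("Yes", "No",  [13, 218, 28, 201, 2, 19, 1, 18]),
    ("No",  "Yes", [1, 17, 1, 17, 0, 1, 1, 8]),
    ("No",  "No",  [1, 117, 1, 133, 0, 12, 0, 17]),
]

RACE = ["White", "Other"]
GENDER = ["Female", "Male"]
MARIJUANA = ["Yes", "No"]

records = []
for a, c, counts in ROWS:
    # product() varies its last argument fastest, like the innermost do-loop.
    for (r, g, m), n in zip(product(RACE, GENDER, MARIJUANA), counts):
        records.append({"A": a, "C": c, "R": r, "G": g, "M": m, "count": n})

total = sum(rec["count"] for rec in records)
print(len(records), total)
```

Each of the 4 input rows expands into 8 cells, giving one record per cell of the 2x2x2x2x2 table.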
Responses: alcohol (A), cigarettes (C), marijuana (M).
Explanatory: gender (G) and race (R), so always include GR.
Model selection:
- Mutual independence + GR
- Homogeneous association
- All three-factor terms
- Backward selection
%let maineffect=A C M R G;
%let data=table9_1;
data allfit; run;

/*STEP 1 main effects + GR*/
%modelbuild(G*R,model1);
proc print data=modelfit; run;

/*STEP 2 main effects + 2fis (two-factor interactions)*/
%modelbuild(A*C A*M A*R A*G C*M C*R C*G M*R M*G G*R,model2);
proc print data=modelfit; run;

/*STEP 3 main effects + 2fis + 3fis (not necessary for this example)*/
%modelbuild(A|C|M A|C|R A|C|G A|M|R A|M|G A|R|G C|M|R C|M|G C|R|G M|R|G,model3);
proc print data=modelfit; run;
/*STEP 4 Backward selection starting from Model 2*/
/*each candidate drops one two-factor term; G*R is always kept*/
%modelbuild(A*M A*R A*G C*M C*R C*G M*R M*G G*R,model4a);     /* drops A*C */
%modelbuild(A*C A*R A*G C*M C*R C*G M*R M*G G*R,model4b);     /* drops A*M */
%modelbuild(A*C A*M A*G C*M C*R C*G M*R M*G G*R,model4c);     /* drops A*R */
%modelbuild(A*C A*M A*R C*M C*R C*G M*R M*G G*R,model4d);     /* drops A*G */
%modelbuild(A*C A*M A*R A*G C*R C*G M*R M*G G*R,model4e);     /* drops C*M */
%modelbuild(A*C A*M A*R A*G C*M C*G M*R M*G G*R,model4f);     /* drops C*R */
%modelbuild(A*C A*M A*R A*G C*M C*R M*R M*G G*R,model4g);     /* drops C*G */
%modelbuild(A*C A*M A*R A*G C*M C*R C*G M*G G*R,model4h);     /* drops M*R */
%modelbuild(A*C A*M A*R A*G C*M C*R C*G M*R G*R,model4i);     /* drops M*G */
proc print data=allfit; run;
/*Thus we delete CR*/
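The candidate lists in each backward step follow a simple rule: drop one two-factor term at a time, never G*R. A Python sketch of that bookkeeping is below. The helper `drop_one_candidates` is hypothetical; it only generates term lists and does not fit models or choose which term to delete.

```python
def drop_one_candidates(terms, forced=("G*R",)):
    """Return (dropped_term, remaining_terms) pairs for one backward step.

    Terms listed in `forced` are never dropped, mirroring the rule that
    the G*R term for the explanatory variables stays in every model.
    """
    return [(t, [u for u in terms if u != t]) for t in terms if t not in forced]

model2 = "A*C A*M A*R A*G C*M C*R C*G M*R M*G G*R".split()

step4 = drop_one_candidates(model2)
for dropped, remaining in step4:
    print("drop", dropped, "->", " ".join(remaining))

# The fit comparison in the source led to deleting C*R, so the next
# round starts from that candidate.
model4f = dict(step4)["C*R"]
step5 = drop_one_candidates(model4f)
print(len(step4), len(step5))
```

Starting from 10 two-factor terms, the first round has 9 candidates and each later round has one fewer.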
/*STEP 5 Backward selection starting from Model 4f above*/
%modelbuild(A*M A*R A*G C*M C*G M*R M*G G*R,model5a);     /* drops A*C */
%modelbuild(A*C A*R A*G C*M C*G M*R M*G G*R,model5b);     /* drops A*M */
%modelbuild(A*C A*M A*G C*M C*G M*R M*G G*R,model5c);     /* drops A*R */
%modelbuild(A*C A*M A*R C*M C*G M*R M*G G*R,model5d);     /* drops A*G */
%modelbuild(A*C A*M A*R A*G C*G M*R M*G G*R,model5e);     /* drops C*M */
%modelbuild(A*C A*M A*R A*G C*M M*R M*G G*R,model5);      /* drops C*G */
%modelbuild(A*C A*M A*R A*G C*M C*G M*G G*R,model5g);     /* drops M*R */
%modelbuild(A*C A*M A*R A*G C*M C*G M*R G*R,model5h);     /* drops M*G */
proc print data=allfit; run;
/*Thus we delete CG*/
/*STEP 6 Backward selection starting from Model 5 above*/
%modelbuild(A*M A*R A*G C*M M*R M*G G*R,model6a);     /* drops A*C */
%modelbuild(A*C A*R A*G C*M M*R M*G G*R,model6b);     /* drops A*M */
%modelbuild(A*C A*M A*G C*M M*R M*G G*R,model6c);     /* drops A*R */
%modelbuild(A*C A*M A*R C*M M*R M*G G*R,model6d);     /* drops A*G */
%modelbuild(A*C A*M A*R A*G M*R M*G G*R,model6e);     /* drops C*M */
%modelbuild(A*C A*M A*R A*G C*M M*G G*R,model6);      /* drops M*R */
%modelbuild(A*C A*M A*R A*G C*M M*R G*R,model6g);     /* drops M*G */
proc print data=allfit; run;
/*Thus we delete MR*/
/*STEP 7 Backward selection starting from Model 6 above*/
%modelbuild(A*M A*R A*G C*M M*G G*R,model7a);     /* drops A*C */
%modelbuild(A*C A*R A*G C*M M*G G*R,model7b);     /* drops A*M */
%modelbuild(A*C A*M A*G C*M M*G G*R,model7c);     /* drops A*R */
%modelbuild(A*C A*M A*R C*M M*G G*R,model7d);     /* drops A*G */
%modelbuild(A*C A*M A*R A*G M*G G*R,model7e);     /* drops C*M */
%modelbuild(A*C A*M A*R A*G C*M G*R,model7f);     /* drops M*G */
proc print data=allfit; run;
/*STOP Model selection, final model Model 6 above*/
Model 7d (dropping AG) versus Model 6: G2 = 25.17 - 19.91 = 5.26, df = 1, p = 0.02, so the AG term is not removed.
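The p-value quoted for the comparison of Model 7d with Model 6 can be reproduced without a statistics package, since a chi-squared variable with 1 df is the square of a standard normal. The sketch below relies on that identity; the function name `chi2_sf_df1` is an assumption for this example.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared variable with 1 df.

    Uses P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)),
    where Z is standard normal.
    """
    return math.erfc(math.sqrt(x / 2.0))

g2_model7d = 25.17   # deviance of the model without A*G
g2_model6 = 19.91    # deviance of Model 6
g2_diff = g2_model7d - g2_model6

p_value = chi2_sf_df1(g2_diff)
print(round(g2_diff, 2), round(p_value, 3))
```

The result is close to 0.02, in line with the value reported above, so dropping AG would significantly worsen the fit at the 0.05 level.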
The final model is Model 6, denoted by (AC, AM, CM, AG, AR, GM, GR). In its association graph, every path between C and {G, R} involves a variable in {A, M}. Given the outcome on alcohol use and marijuana use, the model states that cigarette use is independent of both gender and race. Collapsing over the explanatory variables race and gender, the conditional associations between C and A and between C and M are the same as with the model (AC, AM, CM) fitted in Section 8.2.4.
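The separation claim for Model 6 can be verified by deleting A and M from the graph and checking what C can still reach. This Python sketch builds the graph from the model's two-factor terms. The helper `reachable` is an illustrative name, not something from the course code.

```python
from itertools import combinations

generators = ["AC", "AM", "CM", "AG", "AR", "GM", "GR"]

# Build adjacency sets: each two-factor term contributes one edge.
neighbors = {}
for term in generators:
    for u, v in combinations(term, 2):
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

def reachable(start, blocked):
    """Vertices reachable from start without entering a blocked vertex."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in neighbors[node] - blocked - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen

from_c_given_am = reachable("C", blocked={"A", "M"})
from_c_given_a = reachable("C", blocked={"A"})

print(sorted(from_c_given_am))
print(sorted(from_c_given_a))
```

With both A and M blocked, C reaches neither G nor R, so {A, M} separates C from {G, R}. Blocking A alone is not enough, because the path C-M-G remains.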
Removing the GM term gives model (AC, AM, CM, AG, AR, GR), with G2 = 28.8 (df = 20), p-value = 0.09167. It does not fit poorly. With this model, one might collapse over gender and race in studying associations among the primary variables. However, an advantage of the full five-variable model is that it estimates effects of gender and race on these responses, in particular the effects of race and gender on alcohol use and the effect of gender on marijuana use.
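As a check on the reported p-value, the upper tail of a chi-squared distribution with an even number of degrees of freedom has a closed form as a finite sum. The following Python sketch uses that formula. The function `chi2_sf_even_df` is an assumption for this example, and it deliberately handles only even df.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df.

    For df = 2m, P(chi2 > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0          # k = 0 term
    for k in range(1, df // 2):
        term *= half / k            # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(28.8, 20)
print(round(p_value, 4))
```

The result is about 0.092, which agrees with the reported 0.09167 up to the rounding of G2 to 28.8.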
9.2.3 Loglinear Model Comparison Statistics
The likelihood-ratio statistic, or alternatively a Pearson-type statistic, compares two nested loglinear models M0 and M1, with M0 the simpler model. It is asymptotically chi-squared with df equal to the difference between the df for M0 and M1.
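The formulas on this slide did not survive in the text. As a hedged reconstruction from the standard definitions, with n_i the observed counts and \hat\mu_{0i}, \hat\mu_{1i} the fitted values under M0 and M1 (this notation is an assumption here), the two statistics can be written as:

```latex
G^2(M_0 \mid M_1) = G^2(M_0) - G^2(M_1)
                  = 2 \sum_i n_i \log\frac{\hat\mu_{1i}}{\hat\mu_{0i}},
\qquad
X^2(M_0 \mid M_1) = \sum_i \frac{(\hat\mu_{1i} - \hat\mu_{0i})^2}{\hat\mu_{0i}}
```

For example, the comparison of Model 7d with Model 6 in the student survey analysis gives G^2 = 25.17 - 19.91 = 5.26 on 20 - 19 = 1 df.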
9.3 DIAGNOSTICS FOR CHECKING MODELS
The model comparison test is useful for detecting whether an extra term improves a model's fit. Cell residuals provide a cell-specific indication of model lack of fit.
9.3.1 Residuals for Loglinear Models
The Pearson residual compares each cell's observed count with its fitted value. Haberman (1973) defined the standardized Pearson residual.
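The slide's formulas are missing from the text. The usual definitions for a Poisson loglinear model are sketched below, with n_i the observed count, \hat\mu_i the fitted count, and \hat h_i the leverage of cell i (this notation is assumed, not taken from the slide):

```latex
e_i = \frac{n_i - \hat\mu_i}{\sqrt{\hat\mu_i}},
\qquad
r_i = \frac{n_i - \hat\mu_i}{\sqrt{\hat\mu_i \,(1 - \hat h_i)}}
    = \frac{e_i}{\sqrt{1 - \hat h_i}}
```

When the model holds, the standardized residuals r_i are approximately standard normal, so absolute values larger than about 2 or 3 point to cells the model fits poorly.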
9.3.2 Student Survey Example Revisited
The residuals are examined for two models, one of them the model with all two-factor associations. Both models fit well.