STA 617 – Chp9 Loglinear/Logit Models

Loglinear / Logit Models
- Chapters 5–7: logistic regression, a GLM with logit link for binomial / multinomial responses
- Chapter 8: loglinear models for contingency tables, a GLM with log link for Poisson cell counts
- Chapter 9 (here):
  - graphs that show a model's association and conditional independence patterns
  - selection and comparison of loglinear models
  - diagnostics for checking models, such as residuals
  - association between ordinal variables

9.1 ASSOCIATION GRAPHS AND COLLAPSIBILITY
- Darroch et al. (1980) used mathematical graph theory to represent certain loglinear models that have a conditional independence structure.
- An association graph has a set of vertices, each vertex representing a variable.
- An edge connecting two variables represents a conditional association between them.

- Consider the loglinear model (WX, WY, WZ, YZ), which has no XY or XZ terms. Its association graph has edges W–X, W–Y, W–Z, and Y–Z.
- It assumes independence between X and Y and between X and Z, conditional on the remaining two variables.
- Two loglinear models with the same pairwise associations have the same association graph.
- For instance, this association graph is also the one for model (WX, WYZ), which adds a three-factor WYZ interaction.
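For reference, here is a sketch of both models in the usual λ (ANOVA-type) parameterization. The symbols μ_{hijk} (expected count at W = h, X = i, Y = j, Z = k) and λ are assumed notation, not taken from the slides. Each two-factor term corresponds to one edge of the graph. The extra λ^{WYZ} term in (WX, WYZ) links only variables that are already joined by edges, so it adds no new edge.

```latex
\begin{aligned}
\text{(WX, WY, WZ, YZ):}\quad \log \mu_{hijk} &= \lambda + \lambda^{W}_{h} + \lambda^{X}_{i} + \lambda^{Y}_{j} + \lambda^{Z}_{k}
  + \lambda^{WX}_{hi} + \lambda^{WY}_{hj} + \lambda^{WZ}_{hk} + \lambda^{YZ}_{jk} \\
\text{(WX, WYZ):}\quad \log \mu_{hijk} &= \lambda + \lambda^{W}_{h} + \lambda^{X}_{i} + \lambda^{Y}_{j} + \lambda^{Z}_{k}
  + \lambda^{WX}_{hi} + \lambda^{WY}_{hj} + \lambda^{WZ}_{hk} + \lambda^{YZ}_{jk} + \lambda^{WYZ}_{hjk}
\end{aligned}
```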

- A path in an association graph is a sequence of edges leading from one variable to another.
- Two variables X and Y are said to be separated by a subset of variables if all paths connecting X and Y intersect that subset.
- For instance, in the association graph for model (WX, WY, WZ, YZ), W separates X and Y, since any path connecting X and Y goes through W.
- The subset {W, Z} also separates X and Y.
- A fundamental result states that two variables are conditionally independent given any subset of variables that separates them.
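The example can be checked by hand. The edge set W–X, W–Y, W–Z, Y–Z is the one implied by the two-factor terms of (WX, WY, WZ, YZ). Listing every path out of X shows that each one passes through W. The fundamental result then gives the conditional independences in the last line.

```latex
\begin{aligned}
&\text{Paths from } X \text{ to } Y:\ (X, W, Y),\ (X, W, Z, Y) \\
&\text{Paths from } X \text{ to } Z:\ (X, W, Z),\ (X, W, Y, Z) \\
&\text{Every path contains } W \;\Rightarrow\; \{W\} \text{ separates } X \text{ from } Y \text{ and from } Z \\
&\Rightarrow\; X \perp Y \mid W, \qquad X \perp Z \mid W, \qquad X \perp Y \mid (W, Z)
\end{aligned}
```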

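Collapsibility in Three-Way Contingency Tables
- Conditional associations in partial tables usually differ from marginal associations. However, under certain collapsibility conditions, they are the same.
- In three-way tables, the XY marginal and conditional odds ratios are identical if Z is conditionally independent of X or of Y (or both).
- Thus the fitted XY odds ratio is identical in the partial tables and the marginal table for models with association graphs X–Y–Z and Y–X–Z, that is, models (XY, YZ) and (XY, XZ), or for even simpler models.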
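Here is a short derivation of why this holds. It is sketched for model (XY, XZ), uses the standard λ parameterization (assumed notation, not shown in the slides), and takes the XY odds ratio from a 2 × 2 subtable. Summing over Z produces a factor a_i that depends only on i, and that factor cancels from the odds ratio.

```latex
\begin{aligned}
\mu_{ijk} &= \exp\!\left(\lambda + \lambda^{X}_{i} + \lambda^{Y}_{j} + \lambda^{Z}_{k} + \lambda^{XY}_{ij} + \lambda^{XZ}_{ik}\right) \\
\mu_{ij+} &= \exp\!\left(\lambda + \lambda^{X}_{i} + \lambda^{Y}_{j} + \lambda^{XY}_{ij}\right) a_i,
\qquad a_i = \sum_{k} \exp\!\left(\lambda^{Z}_{k} + \lambda^{XZ}_{ik}\right) \\
\theta_{XY} &= \frac{\mu_{11+}\,\mu_{22+}}{\mu_{12+}\,\mu_{21+}}
= \exp\!\left(\lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21}\right)
= \frac{\mu_{11k}\,\mu_{22k}}{\mu_{12k}\,\mu_{21k}} = \theta_{XY(k)} \quad \text{for every } k
\end{aligned}
```

The same argument covers (XY, YZ) with the roles of X and Y swapped. There the summed factor depends only on j and cancels in the same way.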

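Collapsibility and Association Graphs for Multiway Tables
- Bishop et al. (1975, p. 47) provided a parametric collapsibility condition for multiway tables. Suppose a model partitions the variables into three mutually exclusive subsets S1, S2, S3 such that S2 separates S1 and S3. After collapsing the table over the variables in S3, the parameters relating variables in S1, and the parameters relating variables in S1 to variables in S2, are unchanged.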

9.2 MODEL SELECTION AND COMPARISON
Key points:
- A model should be complex enough to fit well,
- but also relatively simple to interpret, smoothing rather than overfitting the data.
- The potentially useful models are usually a small subset of the possible models.

Considerations in Model Selection
- Certain terms may need to be included because the study was designed to answer specific questions, for example about the effect of a treatment group.
- Models should recognize distinctions between response and explanatory variables.
- The modeling process should concentrate on terms linking responses and terms linking explanatory variables to responses.
- The model should contain the most general interaction term relating the explanatory variables. From the likelihood equations, this has the effect of equating the fitted totals to the sample totals at combinations of their levels.

Automobile example (Table 8.8)
- Two responses: I (injury) and S (seat-belt use). Two explanatory variables: G (gender) and L (location).
- The model should then include the GL term. A loglinear model with the GL term fixes the fitted G × L totals at their observed values.
- If S is also explanatory and only I is a response, the G × L × S totals should be fixed, so the model should contain the GLS term.
- We should then use logit rather than loglinear models when the main focus is describing effects on that response.
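The likelihood-equation fact behind this can be written out for the automobile example. The subscripts g, i, l, s for G, I, L, S are assumed notation, and a "+" means summing over that index. For a Poisson loglinear model, each term's fitted marginal totals equal the observed ones. Including λ^{GL}, or λ^{GLS}, therefore gives:

```latex
\begin{aligned}
\text{with } \lambda^{GL}:&\quad \hat{\mu}_{g+l+} = n_{g+l+} \quad \text{for all } g, l \\
\text{with } \lambda^{GLS}:&\quad \hat{\mu}_{g+ls} = n_{g+ls} \quad \text{for all } g, l, s
\end{aligned}
```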

- For exploratory studies, a search among potential models may provide clues about associations and interactions.
- One approach first fits the model having only single-factor terms,
- then the model having two-factor and single-factor terms,
- then the model having three-factor and lower-order terms, and so on.
- Fitting such models often reveals a restricted range of good-fitting models.

Automatic model selection
- Backward, forward, or stepwise selection may also be useful, but should be used with care and skepticism.
- Such a strategy need not yield a meaningful model.

The Dayton Student Survey
- Variables: alcohol use (A), cigarette use (C), marijuana use (M), gender (G), and race (R).

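SAS Code
```sas
/* Read Table 9.1: each data line must supply A, C, and then the eight
   cell counts x1-x8, ordered by race R (White, Other), then gender
   G (Female, Male), then marijuana use M (Yes, No). */
data table9_1;
  input A $ C $ x1-x8;
  array x{*} x1-x8;
  drop i x1-x8;
  i = 0;                       /* index into x1-x8 for this data line */
  do R = 'White', 'Other';
    do G = 'Female', 'Male';
      do M = 'Yes', 'No';
        i = i + 1;
        count = x{i};          /* count for cell (A, C, R, G, M) */
        output;                /* one observation per cell */
      end;
    end;
  end;
  cards;
Yes Yes
Yes No
No Yes
No No
;
run;
```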

- Responses: alcohol A, cigarettes C, marijuana M. Explanatory: gender G and race R, so always include GR.
- Model selection:
  - mutual independence + GR
  - homogeneous association
  - all three-factor terms
  - backward selection
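As a quick check on the degrees of freedom these models should report, here is the arithmetic (ours, not from the slides). All five variables are binary, so the table has 32 cells. Each term then contributes one parameter, and the residual df is 32 minus the number of parameters (intercept + 5 main effects + interaction terms).

```latex
\begin{aligned}
(A, C, M, GR):&\quad 32 - (1 + 5 + 1) = 25 \\
\text{all two-factor terms}:&\quad 32 - (1 + 5 + 10) = 16 \\
\text{all three-factor terms}:&\quad 32 - (1 + 5 + 10 + 10) = 6
\end{aligned}
```

The same count gives 32 − (1 + 5 + 6) = 20 for the model (AC, AM, CM, AG, AR, GR), which matches the df = 20 quoted for that model.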


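```sas
/* %modelbuild(terms, name) is a user-defined macro that is not shown in
   these slides; it appears to fit the loglinear model with the main
   effects in &maineffect plus the listed terms, write fit statistics to
   MODELFIT, and append them to ALLFIT. */
%let maineffect=A C M R G;
%let data=table9_1;
data allfit; run;   /* initialize the collector data set ALLFIT */

/* STEP 1: main effects + GR */
%modelbuild(G*R,model1);
proc print data=modelfit; run;

/* STEP 2: main effects + all two-factor interactions (2fis) */
%modelbuild(A*C A*M A*R A*G C*M C*R C*G M*R M*G G*R,model2);
proc print data=modelfit; run;

/* STEP 3: main effects + 2fis + all three-factor interactions (3fis);
   not necessary for this example */
%modelbuild(A|C|M A|C|R A|C|G A|M|R A|M|G A|R|G C|M|R C|M|G C|R|G M|R|G,model3);
proc print data=modelfit; run;
```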

```sas
/* STEP 4: backward selection starting from Model 2;
   each candidate drops one two-factor term (GR is always kept) */
%modelbuild(A*M A*R A*G C*M C*R C*G M*R M*G G*R,model4a); /* drop AC */
%modelbuild(A*C A*R A*G C*M C*R C*G M*R M*G G*R,model4b); /* drop AM */
%modelbuild(A*C A*M A*G C*M C*R C*G M*R M*G G*R,model4c); /* drop AR */
%modelbuild(A*C A*M A*R C*M C*R C*G M*R M*G G*R,model4d); /* drop AG */
%modelbuild(A*C A*M A*R A*G C*R C*G M*R M*G G*R,model4e); /* drop CM */
%modelbuild(A*C A*M A*R A*G C*M C*G M*R M*G G*R,model4f); /* drop CR */
%modelbuild(A*C A*M A*R A*G C*M C*R M*R M*G G*R,model4g); /* drop CG */
%modelbuild(A*C A*M A*R A*G C*M C*R C*G M*G G*R,model4h); /* drop MR */
%modelbuild(A*C A*M A*R A*G C*M C*R C*G M*R G*R,model4i); /* drop MG */
proc print data=allfit; run;
/* Thus we delete CR (model4f) */
```

```sas
/* STEP 5: backward selection starting from Model 4f above */
%modelbuild(A*M A*R A*G C*M C*G M*R M*G G*R,model5a); /* drop AC */
%modelbuild(A*C A*R A*G C*M C*G M*R M*G G*R,model5b); /* drop AM */
%modelbuild(A*C A*M A*G C*M C*G M*R M*G G*R,model5c); /* drop AR */
%modelbuild(A*C A*M A*R C*M C*G M*R M*G G*R,model5d); /* drop AG */
%modelbuild(A*C A*M A*R A*G C*G M*R M*G G*R,model5e); /* drop CM */
%modelbuild(A*C A*M A*R A*G C*M M*R M*G G*R,model5);  /* drop CG */
%modelbuild(A*C A*M A*R A*G C*M C*G M*G G*R,model5g); /* drop MR */
%modelbuild(A*C A*M A*R A*G C*M C*G M*R G*R,model5h); /* drop MG */
proc print data=allfit; run;
/* Thus we delete CG (model5) */
```

```sas
/* STEP 6: backward selection starting from Model 5 above */
%modelbuild(A*M A*R A*G C*M M*R M*G G*R,model6a); /* drop AC */
%modelbuild(A*C A*R A*G C*M M*R M*G G*R,model6b); /* drop AM */
%modelbuild(A*C A*M A*G C*M M*R M*G G*R,model6c); /* drop AR */
%modelbuild(A*C A*M A*R C*M M*R M*G G*R,model6d); /* drop AG */
%modelbuild(A*C A*M A*R A*G M*R M*G G*R,model6e); /* drop CM */
%modelbuild(A*C A*M A*R A*G C*M M*G G*R,model6);  /* drop MR */
%modelbuild(A*C A*M A*R A*G C*M M*R G*R,model6g); /* drop MG */
proc print data=allfit; run;
/* Thus we delete MR (model6) */
```

```sas
/* STEP 7: backward selection starting from Model 6 above */
%modelbuild(A*M A*R A*G C*M M*G G*R,model7a); /* drop AC */
%modelbuild(A*C A*R A*G C*M M*G G*R,model7b); /* drop AM */
%modelbuild(A*C A*M A*G C*M M*G G*R,model7c); /* drop AR */
%modelbuild(A*C A*M A*R C*M M*G G*R,model7d); /* drop AG */
%modelbuild(A*C A*M A*R A*G M*G G*R,model7e); /* drop CM */
%modelbuild(A*C A*M A*R A*G C*M G*R,model7f); /* drop MG */
proc print data=allfit; run;
/* STOP model selection; the final model is Model 6 above */
```
Comparing model 7d (AG dropped) with Model 6: G²(7d) − G²(6) = 5.26, df = 1 (the AG term), p = 0.02.

Final model
- Model 6, denoted (AC, AM, CM, AG, AR, GM, GR), has an association graph with edges A–C, A–M, C–M, A–G, A–R, G–M, and G–R.
- Every path between C and {G, R} involves a variable in {A, M}. Given the outcome on alcohol use and marijuana use, the model states that cigarette use is independent of both gender and race.
- Collapsing over the explanatory variables race and gender, the conditional associations between C and A and between C and M are the same as with the model (AC, AM, CM) fitted earlier to the marginal A × C × M table.
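This is a worked application of the parametric collapsibility condition. The condition says that if one subset of variables separates two others, we can collapse over one of the outer subsets without changing the parameters within, or linking to, the other. The edges used are those implied by the two-factor terms of Model 6, (AC, AM, CM, AG, AR, GM, GR).

```latex
\begin{aligned}
&\text{Neighbors of } C:\ \{A, M\} \;\Rightarrow\; \{A, M\} \text{ separates } \{C\} \text{ from } \{G, R\} \\
&\Rightarrow\; C \perp (G, R) \mid (A, M) \\
&\text{Collapse over } G, R:\quad \lambda^{AC},\ \lambda^{CM} \text{ (and } \lambda^{C}\text{) are unchanged}
\end{aligned}
```

The AM association is not covered by this condition, because it links two variables inside the separating set.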

- Removing the GM term gives the model (AC, AM, CM, AG, AR, GR), with G² = 28.8 (df = 20).
- It does not fit poorly. In its association graph, A separates {C, M} from {G, R}, so one might collapse over gender and race in studying associations among the primary variables A, C, and M.
- An advantage of the full five-variable model is that it estimates effects of gender and race on these responses, in particular the effects of race and gender on alcohol use and the effect of gender on marijuana use.
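The goodness-of-fit p-value for G² = 28.8 on ν = 20 df can be approximated without software. One way is the Wilson–Hilferty normal approximation to the chi-squared distribution; this is a standard method chosen here, not one taken from the slides.

```latex
\begin{aligned}
z &= \frac{(G^2/\nu)^{1/3} - \left(1 - \tfrac{2}{9\nu}\right)}{\sqrt{2/(9\nu)}}
   = \frac{(28.8/20)^{1/3} - (1 - 0.0111)}{\sqrt{0.0111}}
   \approx \frac{1.1292 - 0.9889}{0.1054} \approx 1.33 \\
p &= P(\chi^2_{20} > 28.8) \approx 1 - \Phi(1.33) \approx 0.09
\end{aligned}
```

This agrees with tabled critical values. Because 28.8 lies between χ²_{20, 0.10} ≈ 28.41 and χ²_{20, 0.05} ≈ 31.41, we have 0.05 < p < 0.10.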

Loglinear Model Comparison Statistics

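- The likelihood-ratio statistic for comparing a model M0 with a more complex model M1 that contains it (M0 nested in M1) is
$$G^2(M_0 \mid M_1) = G^2(M_0) - G^2(M_1)$$
- Or, for two nested loglinear models, equivalently
$$G^2(M_0 \mid M_1) = 2 \sum_i \hat{\mu}_{1i} \log\frac{\hat{\mu}_{1i}}{\hat{\mu}_{0i}}$$
where μ̂_{0i} and μ̂_{1i} are the fitted values for cell i under M0 and M1.
- It is asymptotically chi-squared, with df equal to the difference between the df for M0 and M1.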
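The statistic can be applied to the comparison quoted for the backward-selection steps: dropping AG from Model 6 (AC, AM, CM, AG, AR, GM, GR) gives model 7d. With 32 cells and binary variables, Model 6 has 1 + 5 + 7 = 13 parameters (df = 19), and model 7d has 12 (df = 20). Because a χ²₁ variable is the square of a standard normal variable, its tail probability can be read from Φ.

```latex
\begin{aligned}
G^2(M_{7d} \mid M_6) &= G^2(M_{7d}) - G^2(M_6) = 5.26, \qquad df = 20 - 19 = 1 \\
p &= P(\chi^2_1 > 5.26) = 2\left[1 - \Phi(\sqrt{5.26})\right] = 2\left[1 - \Phi(2.293)\right] \approx 0.022
\end{aligned}
```

A p-value this small is consistent with keeping the AG term and stopping the selection at Model 6.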

9.3 DIAGNOSTICS FOR CHECKING MODELS
- The model comparison test using G²(M0 | M1) is useful for detecting whether an extra term improves a model fit.
- Cell residuals provide a cell-specific indication of model lack of fit.

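Residuals for Loglinear Models
- The Pearson residual for cell i, with observed count n_i and fitted value μ̂_i, is
$$e_i = \frac{n_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i}}$$
- Haberman (1973) defined the standardized Pearson residual
$$r_i = \frac{n_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i\,(1 - \hat{h}_i)}} = \frac{e_i}{\sqrt{1 - \hat{h}_i}}$$
where ĥ_i is the leverage of cell i (the ith diagonal element of the estimated hat matrix).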
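Here is a small numeric illustration with made-up values (n_i = 30, μ̂_i = 20, ĥ_i = 0.5), not taken from the survey data. When the model holds, the standardized residual is approximately standard normal. Values beyond about 2 or 3 in absolute value therefore suggest lack of fit in that cell.

```latex
\begin{aligned}
e_i &= \frac{30 - 20}{\sqrt{20}} = \frac{10}{4.472} \approx 2.24 \\
r_i &= \frac{e_i}{\sqrt{1 - 0.5}} \approx \frac{2.236}{0.7071} \approx 3.16
\end{aligned}
```

Because 0 ≤ ĥ_i < 1, dividing by √(1 − ĥ_i) never shrinks |e_i|. The raw Pearson residual therefore tends to understate how unusual a cell is, especially a cell with high leverage.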

Student Survey Example Revisited
- Model

- Two-factor associations model
- Both models fit well.