Log-linear Models HRP /03/04
Log-Linear Models for Multi-way Contingency Tables
1. GLM for Poisson-distributed data with log link (see Agresti, chapter 4).
2. Recall: $\log \mu = \alpha + \beta x$, so $\mu = e^{\alpha} (e^{\beta})^{x}$. A one-unit increase in X has a multiplicative impact of $e^{\beta}$ on $\mu$.
3. General idea: predict the expected frequency (count) in each cell by a product of “effects”: main effects and interactions.
4. (Take logs to linearize.)
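To make the "product of effects" idea concrete, here is a sketch for a three-way table under mutual independence. The symbols $N$, $\pi$ and $\lambda$ follow common textbook notation and are assumptions, not notation taken from these slides; $W$, $S$ and $D$ stand for weight, sex and disease.

```latex
\begin{align*}
\mu_{ijk} &= N \,\pi_{i++}\,\pi_{+j+}\,\pi_{++k}
  && \text{(expected count as a product of marginal effects)} \\
\log \mu_{ijk} &= \lambda + \lambda_i^{W} + \lambda_j^{S} + \lambda_k^{D}
  && \text{(taking logs makes the model linear in its parameters)}
\end{align*}
```

When independence does not hold, interaction terms such as $\lambda_{jk}^{SD}$ are added to the right-hand side.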
Log-linear vs. logistic
1. The cell counts are assumed to follow a Poisson distribution, not a binomial one.
2. The link function is the log, not the logit.
3. Predictions are estimates of the cell counts in a contingency table, not the logit of y.
Log-linear vs. logistic
The variables investigated by log-linear models are all treated as “response variables.” Log-linear models therefore only demonstrate association between variables (like a chi-square test or a correlation coefficient). If clear explanatory and response variables exist, logistic regression should be used instead. Logistic regression is also preferable if the variables are continuous and cannot be broken down into discrete categories.
Example: 3-way contingency table

Body weight       Sex       Heart disease: Yes    No    Total
Not overweight    Male                      15     5       20
                  Female                    40    60      100
                  Total                     55    65      120
Overweight        Male                      20    10       30
                  Female                    10    40       50
                  Total                     30    50       80

Source: Angela Jeansonne
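The `proc genmod` calls in these slides read a dataset named `loglinear` with the variables `total`, `Overweight`, `IsMale` and `HeartDis`, but the data step is not shown. Here is a minimal sketch of how it might be built. It assumes one row per cell, 0/1 coding (1 = yes), and the cell counts implied by the marginal tables and odds-ratio arithmetic in these slides.

```sas
/* One row per cell of the 2x2x2 table; total = observed count.    */
/* Assumed coding: 1 = overweight / male / heart disease, 0 = not. */
data loglinear;
  input Overweight IsMale HeartDis total;
  datalines;
0 1 1 15
0 1 0 5
0 0 1 40
0 0 0 60
1 1 1 20
1 1 0 10
1 0 1 10
1 0 0 40
;
run;

/* Quick check: the counts should sum to 200. */
proc means data=loglinear sum;
  var total;
run;
```

With this coding, a main-effect coefficient is the log odds of the "yes" level in the reference cell, which is how the slides read the parameter estimates.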
In-class exercise: Analyze these data using methods we have already learned. Is gender related to heart disease, and is this effect modified or confounded by weight? What is the relationship between overweight and gender (controlling for CHD), and between overweight and heart disease (controlling for gender)?
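One way to approach the exercise in SAS is sketched below. It assumes the cell-count dataset `loglinear` used by the `proc genmod` calls in these slides, with `Overweight`, `IsMale` and `HeartDis` coded 0/1 and `total` holding the cell count.

```sas
proc freq data=loglinear;
  weight total;                     /* each row is a cell, not a person */
  /* Crude 2x2 tables with odds ratios */
  tables IsMale*HeartDis IsMale*Overweight Overweight*HeartDis / relrisk;
  /* Stratified analyses: the first variable in each request is the    */
  /* stratifier; CMH requests the Mantel-Haenszel summary odds ratio.  */
  tables Overweight*IsMale*HeartDis
         HeartDis*IsMale*Overweight
         IsMale*Overweight*HeartDis / cmh;
run;
```

Because the variables are coded 0/1, SAS lists level 0 first. The odds ratios are unaffected, but the relative-risk columns in the output refer to the level-0 row and column.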
Crude OR CHD-Male (ignore overweight)

Sex       Heart disease: Yes    No    Total
Male                      35    15       50
Female                    50   100      150
Total                     85   115      200

OR CHD-Male = 35*100/(15*50) = 4.67
Crude OR Overweight-Male (ignore heart disease)

Sex       Overweight: Yes    No    Total
Male                   30    20       50
Female                 50   100      150
Total                  80   120      200

OR Overweight-Male = 30*100/(20*50) = 3.0
Crude OR CHD-Overweight (ignore gender)

Weight    Heart disease: Yes    No    Total
Heavy                     30    50       80
Light                     55    65      120
Total                     85   115      200

OR CHD-Overweight = 30*65/(50*55) = 0.71
OR MH (CHD-Male) – stratified by Overweight
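A worked sketch of this summary odds ratio, assuming the standard Mantel-Haenszel estimator and the cell counts implied by the tables in these slides. In each weight stratum $k$, $a_k$ = men with disease, $b_k$ = men without, $c_k$ = women with disease, $d_k$ = women without, and $n_k$ is the stratum total. The counts are 15, 5, 40, 60 (not overweight, $n = 120$) and 20, 10, 10, 40 (overweight, $n = 80$).

```latex
\[
\widehat{OR}_{MH}
  = \frac{\sum_k a_k d_k / n_k}{\sum_k b_k c_k / n_k}
  = \frac{\dfrac{15 \cdot 60}{120} + \dfrac{20 \cdot 40}{80}}
         {\dfrac{5 \cdot 40}{120} + \dfrac{10 \cdot 10}{80}}
  = \frac{7.5 + 10}{1.67 + 1.25}
  \approx 6.0
\]
```

The stratum-specific odds ratios are 15*60/(5*40) = 4.5 and 20*40/(10*10) = 8.0, so the summary value lies between them.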
Stratified by heart disease

CHD status       Sex       Overweight: Yes    No    Total
Heart disease    Male                   20    15       35
                 Female                 10    40       50
                 Total                  30    55       85
No CHD           Male                   10     5       15
                 Female                 40    60      100
                 Total                  50    65      115
OR MH (Overweight-Male) – stratified by Heart Disease
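The same calculation for the overweight-by-gender association, again assuming the standard Mantel-Haenszel estimator and the cell counts implied by these slides. In each disease stratum, $a_k$ = overweight men, $b_k$ = non-overweight men, $c_k$ = overweight women, $d_k$ = non-overweight women. The counts are 20, 15, 10, 40 (heart disease, $n = 85$) and 10, 5, 40, 60 (no CHD, $n = 115$).

```latex
\[
\widehat{OR}_{MH}
  = \frac{\dfrac{20 \cdot 40}{85} + \dfrac{10 \cdot 60}{115}}
         {\dfrac{15 \cdot 10}{85} + \dfrac{5 \cdot 40}{115}}
  = \frac{9.41 + 5.22}{1.76 + 1.74}
  \approx 4.18
\]
```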
Stratified by gender

Gender    Weight    Heart disease: Yes    No    Total
Male      Heavy                     20    10       30
          Light                     15     5       20
          Total                     35    15       50
Female    Heavy                     10    40       50
          Light                     40    60      100
          Total                     50   100      150
OR MH (CHD-Overweight) – stratified by Gender
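A worked sketch for the disease-by-weight association, assuming the standard Mantel-Haenszel estimator and the cell counts implied by these slides. In each gender stratum, $a_k$ = heavy with disease, $b_k$ = heavy without, $c_k$ = light with disease, $d_k$ = light without. The counts are 20, 10, 15, 5 (men, $n = 50$) and 10, 40, 40, 60 (women, $n = 150$).

```latex
\[
\widehat{OR}_{MH}
  = \frac{\dfrac{20 \cdot 5}{50} + \dfrac{10 \cdot 60}{150}}
         {\dfrac{10 \cdot 15}{50} + \dfrac{40 \cdot 40}{150}}
  = \frac{2 + 4}{3 + 10.67}
  \approx 0.44
\]
```

This is further from 1 than the crude value of 0.71, which suggests that gender confounds the crude association.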
Model with log-linear models
Model 1: Independence
SAS code for a generalized linear model with Poisson distribution and log link function:

  proc genmod data=loglinear;
    model total = Overweight IsMale HeartDis / dist=poisson link=log pred;
  run;

Model 1 (main effects only): $\log(\text{counts}) = \alpha + \text{overweight} + \text{isMale} + \text{HeartDisease}$
This implies that the cell counts depend only on the MARGINAL probabilities (odds).
Independence model: parameters

Parameter     Estimate    Pr > ChiSq
Intercept                 <.0001
Overweight    -0.41
IsMale        -1.1        <.0001
HeartDis      -0.30

Model 1: $\log(\text{counts}) = \alpha - 0.41\,(\text{overweight}) - 1.1\,(\text{male}) - 0.30\,(\text{heart disease})$
Interpretation of Parameters: Marginal Odds
Model 1: $\log(\text{counts}) = \alpha - 0.41\,(\text{overweight}) - 1.1\,(\text{male}) - 0.30\,(\text{heart disease})$
$e^{-0.41}$ = the (marginal) odds of being overweight = 0.66 ≈ 80/120
$e^{-1.1}$ = the odds of being male = 0.33 = 50/150
$e^{-0.30}$ = the odds of having disease = 0.74 = 85/115
Marginal probabilities
P(overweight) = 0.66/(0.66+1) = 0.40 (80/200)
P(male) = 0.33/(0.33+1) = 0.25 (50/200)
P(heart disease) = 0.74/1.74 = 0.425 (85/200)

Predicted counts
As examples: the expected number of light men with heart disease = 200*(1-0.40)(0.25)(0.425) = 12.75 under independence, and the expected number of light men without disease = 200*(1-0.40)(0.25)(1-0.425) = 17.25 under independence.
Independence model: goodness-of-fit

Cell                        Observed     Pred
light/male/disease             15       12.75
light/male/no disease           5       17.25
light/female/disease           40       38.25
light/female/no disease        60       51.75
heavy/male/disease             20        8.50
heavy/male/no disease          10       11.50
heavy/female/disease           10       25.50
heavy/female/no disease        40       34.50

df = cells - parameters in model = 8 - 4 = 4
This suggests that the independence model is a poor fit!
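One way to quantify the lack of fit is the Pearson chi-square statistic, sketched below. It assumes the observed counts implied by these slides and the predicted counts $200 \times P(\text{weight}) \times P(\text{sex}) \times P(\text{disease})$ of the independence model. The result is a hand calculation, not SAS output.

```latex
\begin{align*}
X^2 &= \sum_{\text{cells}} \frac{(O - E)^2}{E} \\
    &= \frac{(15-12.75)^2}{12.75} + \frac{(5-17.25)^2}{17.25}
     + \frac{(40-38.25)^2}{38.25} + \frac{(60-51.75)^2}{51.75} \\
    &\quad + \frac{(20-8.5)^2}{8.5} + \frac{(10-11.5)^2}{11.5}
     + \frac{(10-25.5)^2}{25.5} + \frac{(40-34.5)^2}{34.5} \\
    &\approx 36.5, \qquad df = 8 - 4 = 4
\end{align*}
```

This is far above 9.49, the 5% critical value of a chi-square distribution with 4 degrees of freedom. The largest contributions come from the heavy/male/disease and heavy/female/disease cells.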
Predicted table (note: marginal proportions don’t change)

Body weight       Sex       Heart disease: Yes       No    Total
Not overweight    Male                   12.75    17.25       30
                  Female                 38.25    51.75       90
                  Total                  51       69         120
Overweight        Male                    8.5     11.5        20
                  Female                 25.5     34.5        60
                  Total                  34       46          80
Predicted OR CHD-Male

Sex       Heart disease: Yes       No    Total
Male                   21.25    28.75       50
Female                 63.75    86.25      150
Total                  85      115         200

OR CHD-Male = 21.25*86.25/(28.75*63.75) = 1.0
The model coefficients have an odds ratio interpretation…
Coefficients represent predicted counts in each cell.
Coefficients have a direct odds ratio interpretation.
Calculate OR CHD-Male in each weight stratum.
This interpretation becomes more interesting/useful when interaction terms occur!
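A short derivation of why the main-effects model forces every odds ratio to 1. The notation is an assumption for illustration: $\mu_{w,s,d}$ is the expected count for weight $w$, sex $s$ and disease $d$ (each coded 0/1), and $\log \mu_{w,s,d} = \alpha + \beta_W w + \beta_S s + \beta_D d$.

```latex
\begin{align*}
\log OR_{\text{CHD-Male} \mid w}
  &= \log \mu_{w,1,1} - \log \mu_{w,1,0} - \log \mu_{w,0,1} + \log \mu_{w,0,0} \\
  &= (\beta_S + \beta_D) - \beta_S - \beta_D + 0
     \qquad \text{(the } \alpha \text{ and } \beta_W w \text{ terms cancel)} \\
  &= 0 \quad\Rightarrow\quad OR_{\text{CHD-Male} \mid w} = e^{0} = 1
\end{align*}
```

If an interaction term $\beta_{SD}\, s\, d$ is added, the same subtraction leaves $\log OR = \beta_{SD}$. The exponentiated interaction coefficient is then the conditional odds ratio.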
Expected OR CHD-Overweight

Weight    Heart disease: Yes    No    Total
Heavy                     34    46       80
Light                     51    69      120
Total                     85   115      200

OR CHD-Overweight = 34*69/(46*51) = 1.0
Expected OR Overweight-Male

Sex       Overweight: Yes    No    Total
Male                   20    30       50
Female                 60    90      150
Total                  80   120      200

OR Overweight-Male = 20*90/(60*30) = 1.0
Model with Interaction
Model 2 (main effects + interactions with gender): This model corresponds to the case where heart disease and overweight are conditionally independent (conditional on gender).
$\log(\text{counts}) = \alpha + \text{overweight} + \text{isMale} + \text{HeartDisease} + \text{isMale} * \text{HeartDisease} + \text{isMale} * \text{overweight}$

  proc genmod data=loglinear;
    model total = Overweight IsMale HeartDis IsMale*HeartDis IsMale*Overweight / dist=poisson link=log pred;
  run;

This implies that gender is associated with heart disease and with overweight, but overweight and heart disease are independent given gender: OR CHD-Male ≠ 1 and OR Overweight-Male ≠ 1, but OR CHD-Overweight = 1.
Model 2: $\log(\text{counts}) = \alpha + \beta_{W}\,(\text{overweight}) - 2.4\,(\text{male}) - 0.69\,(\text{heart disease}) + 1.54\,(\text{male and heart disease}) + \beta_{WM}\,(\text{overweight and male})$

Analysis of Parameter Estimates

Parameter            Estimate    Pr > ChiSq
Intercept                        <.0001
Overweight                       <.0001
IsMale               -2.4        <.0001
HeartDis             -0.69       <.0001
IsMale*HeartDis       1.54       <.0001
Overweight*IsMale
Interpretation of Parameters, Model 2
Model 2: $\log(\text{counts}) = \alpha + \beta_{W}\,(\text{overweight}) - 2.4\,(\text{male}) - 0.69\,(\text{heart disease}) + 1.54\,(\text{male and heart disease}) + \beta_{WM}\,(\text{overweight and male})$
OR estimate from predicted counts

Cell                        Observed    Pred
light/male/disease             15       14
light/male/no disease           5        6
light/female/disease           40       33.3
light/female/no disease        60       66.7
heavy/male/disease             20       21
heavy/male/no disease          10        9
heavy/female/disease           10       16.7
heavy/female/no disease        40       33.3

OR CHD-Male is not confounded by weight.
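A worked check of this claim, assuming the Model 2 predicted counts (men: 14, 6, 21, 9; women: 33.33, 66.67, 16.67, 33.33, in the cell order light/disease, light/no disease, heavy/disease, heavy/no disease). These are hand calculations, not SAS output.

```latex
\begin{align*}
OR_{\text{CHD-Male} \mid \text{light}} &= \frac{14 \times 66.67}{6 \times 33.33} \approx 4.67 \\
OR_{\text{CHD-Male} \mid \text{heavy}} &= \frac{21 \times 33.33}{9 \times 16.67} \approx 4.67 \\
e^{1.54} &\approx 4.66
\end{align*}
```

Both strata give the same value, which matches the crude OR of 35*100/(15*50) and, up to rounding of the coefficient, the exponentiated male-by-heart-disease coefficient.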
OR Overweight-Male
Model 2: $\log(\text{counts}) = \alpha + \beta_{W}\,(\text{overweight}) - 2.4\,(\text{male}) - 0.69\,(\text{heart disease}) + 1.54\,(\text{male and heart disease}) + \beta_{WM}\,(\text{overweight and male})$

OR estimate from predicted counts

Cell                        Observed    Pred
light/male/disease             15       14
light/male/no disease           5        6
light/female/disease           40       33.3
light/female/no disease        60       66.7
heavy/male/disease             20       21
heavy/male/no disease          10        9
heavy/female/disease           10       16.7
heavy/female/no disease        40       33.3

OR Overweight-Male is not confounded by CHD.
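The same check for the overweight-by-gender odds ratio within each disease stratum, assuming the Model 2 predicted counts (men: 14, 6, 21, 9; women: 33.33, 66.67, 16.67, 33.33, in the cell order light/disease, light/no disease, heavy/disease, heavy/no disease). These are hand calculations, not SAS output.

```latex
\begin{align*}
OR_{\text{Overweight-Male} \mid \text{disease}}
  &= \frac{21 \times 33.33}{14 \times 16.67} \approx 3.0 \\
OR_{\text{Overweight-Male} \mid \text{no disease}}
  &= \frac{9 \times 66.67}{6 \times 33.33} \approx 3.0
\end{align*}
```

Both equal the crude OR of 30*100/(20*50) = 3.0.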
OR CHD-Overweight
Model 2: $\log(\text{counts}) = \alpha + \beta_{W}\,(\text{overweight}) - 2.4\,(\text{male}) - 0.69\,(\text{heart disease}) + 1.54\,(\text{male and heart disease}) + \beta_{WM}\,(\text{overweight and male})$
Interpretation: Model 2
Overweight and heart disease are independent when you condition on gender.

                  Heart disease: Yes      No
Men       Overweight             21        9
          Normal                 14        6
          OR = 21*6/(14*9) = 1.0
Women     Overweight             16.7     33.3
          Normal                 33.3     66.7
          OR = 16.7*66.7/(33.3*33.3) = 1.0
Model 3: only male and CHD are related
Model 3 (main effects + single interaction): This model corresponds to the case where heart disease and overweight, and gender and overweight, are conditionally independent.
$\log(\text{counts}) = \alpha + \text{overweight} + \text{isMale} + \text{HeartDisease} + \text{isMale} * \text{HeartDisease}$
Output:
Model 3: $\log(\text{counts}) = \alpha + \beta_{W}\,(\text{overweight}) - 1.9\,(\text{male}) - 0.69\,(\text{heart disease}) + 1.54\,(\text{male and heart disease})$
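The SAS call for Model 3 is not shown on this slide. Below is a sketch that follows the pattern of the other `proc genmod` calls in these slides and keeps only the gender-by-disease interaction, assuming the same `loglinear` dataset and variable names.

```sas
/* Model 3: main effects plus the IsMale*HeartDis interaction only. */
proc genmod data=loglinear;
  model total = Overweight IsMale HeartDis IsMale*HeartDis
        / dist=poisson link=log pred;
run;
```

This model has 5 parameters, which leaves 8 - 5 = 3 degrees of freedom for the goodness-of-fit test.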
OR: Male and CHD
Model 3: $\log(\text{counts}) = \alpha + \beta_{W}\,(\text{overweight}) - 1.9\,(\text{male}) - 0.69\,(\text{heart disease}) + 1.54\,(\text{male and heart disease})$
Model 3: only male and CHD are related

Cell                        Observed    Pred
light/male/disease             15       21
light/male/no disease           5        9
light/female/disease           40       30
light/female/no disease        60       60
heavy/male/disease             20       14
heavy/male/no disease          10        6
heavy/female/disease           10       20
heavy/female/no disease        40       40
Collapses to… (predicted counts, collapsed over weight)

            CHD    No CHD
Male         35      15
Female       50     100

And… heart disease and overweight are independent, regardless of gender

              CHD    No CHD
Overweight     34      46
Light          51      69

And… overweight and gender are independent, regardless of disease

              Male    Female
Overweight     20       60
Light          30       90
M4: All pairwise interactions

  proc genmod data=loglinear;
    model total = Overweight IsMale HeartDis IsMale*HeartDis IsMale*Overweight Overweight*HeartDis / dist=poisson link=log pred;
  run;

Model 4 (main effects + all pairwise interactions): No pair of variables is conditionally independent.
$\log(\text{counts}) = \alpha + \text{overweight} + \text{isMale} + \text{HeartDisease} + \text{isMale} * \text{HeartDisease} + \text{isMale} * \text{overweight} + \text{HeartDis} * \text{overweight}$
Model 4: $\log(\text{counts}) = \alpha + \beta_{W}\,(\text{overweight}) - 2.7\,(\text{male}) - 0.45\,(\text{heart disease}) + 1.8\,(\text{male and heart disease}) + \beta_{WM}\,(\text{overweight and male}) - 0.82\,(\text{overweight and heart disease})$

Analysis of Parameter Estimates

Parameter              Estimate    Pr > ChiSq
Intercept                          <.0001
Overweight
IsMale                 -2.7        <.0001
HeartDis               -0.45
IsMale*HeartDis         1.8        <.0001
Overweight*IsMale
Overweight*HeartDis    -0.82
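OR: Male and CHD
Model 4: $\log(\text{counts}) = \alpha + \beta_{W}\,(\text{overweight}) - 2.7\,(\text{male}) - 0.45\,(\text{heart disease}) + 1.8\,(\text{male and heart disease}) + \beta_{WM}\,(\text{overweight and male}) - 0.82\,(\text{overweight and heart disease})$
The coefficient 1.8 (male and heart disease) corresponds to the M-H summary OR, stratified by overweight.

OR: CHD and overweight
Model 4: $\log(\text{counts}) = \alpha + \beta_{W}\,(\text{overweight}) - 2.7\,(\text{male}) - 0.45\,(\text{heart disease}) + 1.8\,(\text{male and heart disease}) + \beta_{WM}\,(\text{overweight and male}) - 0.82\,(\text{overweight and heart disease})$
The coefficient -0.82 (overweight and heart disease) corresponds to the M-H summary OR, stratified by gender.

OR: Male and overweight
Model 4: $\log(\text{counts}) = \alpha + \beta_{W}\,(\text{overweight}) - 2.7\,(\text{male}) - 0.45\,(\text{heart disease}) + 1.8\,(\text{male and heart disease}) + \beta_{WM}\,(\text{overweight and male}) - 0.82\,(\text{overweight and heart disease})$
The overweight-and-male coefficient corresponds to the M-H summary OR, stratified by CHD.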
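A quick arithmetic check on this correspondence, using the two Model 4 interaction coefficients quoted above. The Mantel-Haenszel values are hand calculations from the stratified counts implied by these slides, assuming the standard M-H estimator. They are not SAS output.

```latex
\begin{align*}
e^{1.8}   &\approx 6.05 & \widehat{OR}_{MH}(\text{CHD-Male, stratified by overweight}) &\approx 6.0 \\
e^{-0.82} &\approx 0.44 & \widehat{OR}_{MH}(\text{CHD-Overweight, stratified by gender}) &\approx 0.44
\end{align*}
```

The agreement is close but not exact. The model coefficient is a maximum-likelihood estimate of the common conditional odds ratio, whereas Mantel-Haenszel is a different weighted estimator of the same quantity.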
OR estimate from predicted counts

Cell                        Observed    Pred
light/male/disease             15       16
light/male/no disease           5        4
light/female/disease           40       39
light/female/no disease        60       61
heavy/male/disease             20       19
heavy/male/no disease          10       11
heavy/female/disease           10       11
heavy/female/no disease        40       39

GOOD FIT!
The saturated model
Model 5 (saturated): $\log(\text{counts}) = \alpha + \text{overweight} + \text{isMale} + \text{HeartDisease} + \text{isMale} * \text{HeartDisease} + \text{isMale} * \text{overweight} + \text{HeartDis} * \text{overweight} + \text{isMale} * \text{HeartDisease} * \text{overweight}$
Perfect fit, but no degrees of freedom.
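A sketch of the corresponding SAS call, following the pattern of the earlier `proc genmod` calls and assuming the same `loglinear` dataset and variable names. With 8 cells and 8 parameters the predicted counts reproduce the observed counts exactly, leaving 8 - 8 = 0 degrees of freedom.

```sas
/* Model 5 (saturated): all main effects, two-way and three-way terms. */
proc genmod data=loglinear;
  model total = Overweight IsMale HeartDis
                IsMale*HeartDis IsMale*Overweight Overweight*HeartDis
                IsMale*HeartDis*Overweight
        / dist=poisson link=log pred;
run;
```

The three-way coefficient is the log of the ratio between the two stratum-specific odds ratios, so a test of that term is a test of effect modification.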