Adjusting for extraneous factors

Topics for today:
- More on logistic regression analysis for binary data, and how it relates to the Wolf and Mantel-Haenszel estimates of a common odds ratio
- Interpreting logistic regression analyses
- Estimating a common risk ratio in the presence of a stratification factor
- Connection to Poisson regression

Readings: Jewell, Chapter 9
Stratified analysis for binary data

Data from the ith stratum:

            D      Not D
  E         a_i    b_i
  Not E     c_i    d_i

Variance formulae are given in Jewell, Chapter 9.
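The two common odds ratio estimators that the regression results are compared with can be computed directly from the stratum tables. A minimal Python sketch, using hypothetical counts (not the Berkeley data) and the standard formulas: Wolf's estimate is the inverse-variance weighted average of the stratum log odds ratios, with w_i = 1/(1/a_i + 1/b_i + 1/c_i + 1/d_i) and SE = 1/sqrt(sum of w_i); the Mantel-Haenszel estimate is sum(a_i d_i/n_i) / sum(b_i c_i/n_i). It assumes no zero cells, since Wolf's estimate is undefined otherwise.

```python
import math

# Each stratum is a 2x2 table (a, b, c, d) laid out as
#            D    Not D
#   E        a    b
#   Not E    c    d


def wolf_log_or(tables):
    """Wolf's inverse-variance weighted common log odds ratio and its SE."""
    num = den = 0.0
    for a, b, c, d in tables:
        log_or = math.log(a * d / (b * c))
        w = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)  # 1 / Var(log OR_i)
        num += w * log_or
        den += w
    return num / den, 1.0 / math.sqrt(den)


def mh_or(tables):
    """Mantel-Haenszel estimate of a common odds ratio."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical strata, each with an odds ratio of exactly 2
tables = [(20, 10, 10, 10), (40, 20, 10, 10)]
log_or_w, se_w = wolf_log_or(tables)
print("Wolf OR:", math.exp(log_or_w), "SE(log OR):", se_w)
print("MH OR:", mh_or(tables))
```

With real data the two estimates generally differ a little, and both are typically close to the exponentiated male coefficient from the stratified logistic regression.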
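Regression-based stratified analysis for the Berkeley data

data berkeley;
  input stratum male a b;   /* a = number admitted, b = number not admitted */
  /* data lines, one per department-by-gender cell, follow CARDS (not shown) */
  cards;
run;

data berkeley;
  set berkeley;
  n = a + b;   /* number of applicants in the cell */
run;

proc genmod data=berkeley;
  class stratum;   /* department as a categorical factor */
  model a/n = male stratum / dist=binomial link=logit;
run;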
Stratified analysis: GENMOD parameter estimates

  Parameter   DF  Estimate  Standard Error  95% Conf. Limits  Chi-Square  Pr > ChiSq
  Intercept                                                               <.0001
  male
  stratum 1                                                               <.0001
  stratum 2                                                               <.0001
  stratum 3                                                               <.0001
  stratum 4                                                               <.0001
  stratum 5                                                               <.0001
  stratum 6
  Scale

Wolf estimate:   SE =    CI: ( , )
Mantel-Haenszel estimate (take the log of this estimate to compare it with the male coefficient):   CI: ( , )

Let's talk about the rest of these regression results. What do they mean?
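The confidence intervals for the Wolf and Mantel-Haenszel estimates are formed the same way as GENMOD's Wald limits: estimate ± 1.96 × SE on the log scale, then exponentiate to get limits for the odds ratio. A minimal Python sketch with hypothetical inputs; the MH estimate is an odds ratio, so its log is taken first. The SE of the log MH odds ratio (for example the Robins-Breslow-Greenland variance) is not computed here and is simply passed in as an assumed value.

```python
import math


def ratio_ci(log_est, se, z=1.96):
    """Wald-type 95% CI for a ratio, from a log-scale estimate and its SE."""
    return math.exp(log_est - z * se), math.exp(log_est + z * se)


# Hypothetical values, not the Berkeley results
wolf_log, wolf_se = 0.25, 0.10
print("Wolf OR CI:", ratio_ci(wolf_log, wolf_se))

mh_estimate = 1.30              # MH gives an odds ratio ...
mh_log = math.log(mh_estimate)  # ... so take logs to compare with the male coefficient
mh_se = 0.10                    # assumed SE of log(OR_MH)
print("MH OR CI:", ratio_ci(mh_log, mh_se))
```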
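Interpreting the logistic regression model for the Berkeley data

We are fitting the following model, for an applicant to department j:

  logit P(admitted) = β0 + β1·male + γj,   j = 1 to 6, with γ6 = 0 (department 6 is the reference level)

Based on the fitted model, we can predict the admission probability in each gender-by-department cell.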
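Predicted probabilities come from inverting the logit. A minimal Python sketch, assuming the fitted model has the form logit(p) = b0 + b_male·male + gamma_dept, with gamma = 0 for the reference department (the last CLASS level, department 6, under SAS's default parameterization). The coefficient values below are placeholders; substitute the estimates from the GENMOD output.

```python
import math


def expit(x):
    """Inverse logit: converts a log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_prob(b0, b_male, gamma, male, dept):
    """Predicted admission probability under
    logit(p) = b0 + b_male * male + gamma[dept],
    where gamma holds the department coefficients and the
    reference department (absent from gamma) contributes 0."""
    return expit(b0 + b_male * male + gamma.get(dept, 0.0))


# Placeholder coefficients (hypothetical, not the Berkeley estimates)
b0, b_male = -2.0, 0.2
gamma = {1: 2.5, 2: 2.0, 3: 1.5, 4: 1.0, 5: 0.5}
for dept in range(1, 7):
    for male in (0, 1):
        print(dept, "male" if male else "female",
              round(predicted_prob(b0, b_male, gamma, male, dept), 3))
```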
Observed and predicted values for the Berkeley data

  Gender   Dept   Logit of admit probability   Predicted prob   Observed prob   N
  Female   1                                                                    108
  Male     1                                                                    825
  Female   2                                                                    25
  Male     2                                                    63%             560
  Female   3                                                                    593
  Male     3                                                    37%             325
  Female   4                                                                    375
  Male     4                                                    33%             417
  Female   5                                                                    393
  Male     5                                                    28%             191
  Female   6                                                                    341
  Male     6                                                    6%              373
More on the Berkeley logistic regression analysis In addition to providing an estimate of the overall gender effect, the logistic regression analysis allows us to compare admission rates between the departments. Based on the observed data, what is the log odds ratio for admission to department 5 versus department 6 for females? What about for males? What about department 1 versus department 6 for males? Females?
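A log odds ratio between two departments is the difference of their logits. Under the fitted model that difference is the same for both genders: for department 5 versus department 6 it is just the department 5 coefficient, because the intercept and male terms cancel. A minimal Python sketch checking this with hypothetical coefficients; computed from the observed proportions instead, the female and male log odds ratios will generally differ, and the model pools them into a single value.

```python
import math


def logit(p):
    """Log odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def log_or(p1, p2):
    """Log odds ratio comparing probability p1 with probability p2."""
    return logit(p1) - logit(p2)


def model_prob(b0, b_male, g_dept, male):
    """Admission probability under logit(p) = b0 + b_male*male + g_dept."""
    return 1.0 / (1.0 + math.exp(-(b0 + b_male * male + g_dept)))


# Hypothetical coefficients; department 6 is the reference (g = 0)
b0, b_male, g5 = -2.0, 0.2, 0.5
for male in (0, 1):
    lor = log_or(model_prob(b0, b_male, g5, male), model_prob(b0, b_male, 0.0, male))
    print("male" if male else "female", round(lor, 6))  # equals g5 for both
```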
Another example

We can add further factors to the logistic regression model to estimate the log odds ratio while adjusting for those factors. Example: smoking in the Epilepsy study. Let's look in SAS:

proc freq;
  tables one3*cig2 / chisq;
run;
Epilepsy data in SAS
Model with drug only:

  Parameter   DF  Estimate  Standard Error  Wald 95% Confidence Limits  Chi-Square  Pr > ChiSq
  Intercept                                                                         <.0001
  DRUG
  DRUG
  DRUG
  Scale

Model with drug and smoking (CIG):

  Parameter   DF  Estimate  Standard Error  Wald 95% Confidence Limits  Chi-Square  Pr > ChiSq
  Intercept                                                                         <.0001
  DRUG
  DRUG
  DRUG
  CIG
  Scale
Why don't the drug estimates change much?
Hint: look at the association between drug and smoking.

proc freq;
  tables one3*cig2 / chisq;
run;
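A covariate can confound the drug comparison only if it is associated with both the outcome and the drug group. Among the statistics printed by the CHISQ option of PROC FREQ is the Pearson chi-square for the drug-by-smoking table; a small value means the two are close to independent, so adding smoking should leave the drug coefficients nearly unchanged (not exactly, because odds ratios are non-collapsible). A minimal Python sketch of the Pearson statistic, with hypothetical counts:

```python
def pearson_chisq(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    given as a list of rows of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical drug group (rows) by smoking status (columns) counts
counts = [[30, 20], [33, 22], [27, 18]]
print(pearson_chisq(counts))  # statistic is 0: 40% smoke in every drug group
```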
Relative risks

We've talked about estimating odds ratios while adjusting for another factor. Several approaches:
1. Cochran-Mantel-Haenszel test
2. Wolf estimate of the adjusted log odds ratio
3. Mantel-Haenszel estimate of the adjusted odds ratio
4. Logistic regression

Let's now turn to the analogous methods for risk ratios, or relative risks.
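The first approach in the list, the Cochran-Mantel-Haenszel test, compares the total observed count in the a cells with its expected value under no association, summed over strata. A minimal Python sketch of the statistic (no continuity correction, hypergeometric variance), with hypothetical tables in the (a, b, c, d) layout used for the stratified 2x2 tables:

```python
def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel chi-square (1 df, no continuity correction).
    Each table is (a, b, c, d): exposed D, exposed not D, unexposed D, unexposed not D."""
    diff = 0.0
    var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        diff += a - (a + b) * (a + c) / n  # observed minus expected a under H0
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return diff ** 2 / var


# Hypothetical strata
print(cmh_statistic([(20, 10, 10, 20), (15, 15, 10, 20)]))
```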
Example from Jewell Table 9.2

  Weight (lb)   Behavior type   CHD   No CHD   RR (log RR)
  <=150         A
                B               10    305      (.924)
                A
                B               10    270      (.832)
                A
                B               21    297      (.298)
                A
                B               19    253      (.825)
  180+          A
                B               19    361      (.993)

Relationship between behavior type (type A vs. type B personality) and coronary heart disease (CHD) events (see p. 82 in Jewell for a description). The unadjusted RR was 2.2, with a 95% CI of (1.72, 2.87). Since weight is important, we need to adjust for it as well.
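Each stratum's RR compares the CHD risk among type A with that among type B, and the number in parentheses is its natural log. A minimal Python sketch; the type B counts are those of the lightest weight stratum in the table (10 with CHD, 305 without), while the type A counts are hypothetical placeholders.

```python
import math


def risk_ratio(a, b, c, d):
    """RR and log RR for one stratum:
    a, b = CHD / no CHD among type A; c, d = CHD / no CHD among type B."""
    rr = (a / (a + b)) / (c / (c + d))
    return rr, math.log(rr)


# Type A counts are hypothetical; type B counts (10, 305) are from the <=150 lb stratum
rr, log_rr = risk_ratio(25, 250, 10, 305)
print(round(rr, 3), round(log_rr, 3))
```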
Wolf’s adjusted relative risk
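Wolf's adjusted RR weights each stratum's log RR by the inverse of its estimated large-sample variance, Var(log RR_i) ≈ b_i/(a_i(a_i + b_i)) + d_i/(c_i(c_i + d_i)), using the stratified 2x2 layout from the start of the lecture (a_i, b_i = diseased and non-diseased among the exposed). A minimal Python sketch with hypothetical strata; it requires a_i > 0 and c_i > 0 in every stratum.

```python
import math


def wolf_log_rr(tables):
    """Wolf's inverse-variance weighted common log RR and its standard error.
    Each table is (a, b, c, d): exposed D, exposed not D, unexposed D, unexposed not D."""
    num = den = 0.0
    for a, b, c, d in tables:
        log_rr = math.log((a / (a + b)) / (c / (c + d)))
        var = b / (a * (a + b)) + d / (c * (c + d))  # large-sample Var(log RR_i)
        num += log_rr / var
        den += 1.0 / var
    return num / den, 1.0 / math.sqrt(den)


# Hypothetical strata, each with RR = 2
tables = [(20, 80, 10, 90), (40, 60, 20, 80)]
est, se = wolf_log_rr(tables)
print("RR_W =", math.exp(est), "95% CI:",
      math.exp(est - 1.96 * se), math.exp(est + 1.96 * se))
```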
Mantel-Haenszel adjusted relative risk
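The Mantel-Haenszel adjusted RR is the sum of a_i(c_i + d_i)/n_i divided by the sum of c_i(a_i + b_i)/n_i. Unlike Wolf's estimate, it stays defined when a stratum has no events. A minimal Python sketch with hypothetical strata:

```python
def mh_rr(tables):
    """Mantel-Haenszel common risk ratio.
    Each table is (a, b, c, d): exposed D, exposed not D, unexposed D, unexposed not D."""
    num = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in tables)
    den = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical strata, each with RR = 2
print(mh_rr([(20, 80, 10, 90), (40, 60, 20, 80)]))
```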
CHD example

Jewell Table 9.7 provides the weights for the Wolf and Mantel-Haenszel methods. Let's look at using Poisson regression to do the adjustment.
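Why Poisson regression estimates a risk ratio: with a log link and the log of the cell size n as an offset, the model describes the risk (cases divided by subjects) directly, so exponentiated coefficients are ratios of risks. A sketch of the algebra, writing a_ij for the CHD count and n_ij for the number of subjects with behavior type i in weight stratum j, x_i for the type A indicator, and γ_j for the weight stratum effect (notation introduced here):

```latex
\log E(a_{ij}) = \log n_{ij} + \beta_0 + \beta_1 x_i + \gamma_j
\quad\Longleftrightarrow\quad
\frac{E(a_{ij})}{n_{ij}} = e^{\beta_0}\, e^{\beta_1 x_i}\, e^{\gamma_j}
```

Within any weight stratum, the ratio of risks for type A versus type B is therefore exp(β1). Because the counts are really binomial, the Poisson-based standard errors tend to be conservative unless the risks are small.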
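Fitting the CHD model in SAS

data chd;
  input weight behavior a b;   /* a = CHD cases, b = no CHD */
  /* data lines, one per weight-by-behavior cell, follow CARDS (not shown) */
  cards;
run;

data chd;
  set chd;
  lnn = log(a + b);   /* log of the number of subjects in each cell */
run;

/* Unadjusted */
proc genmod data=chd;
  model a = behavior / dist=poisson link=log offset=lnn;
run;

/* Adjusted for weight */
proc genmod data=chd;
  class weight;
  model a = behavior weight / dist=poisson link=log offset=lnn;
run;

Notice the offset term, the log of the sample size in each cell.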
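For the unadjusted model the Poisson fit has a closed form. Setting the score equations for β0 and β1 to zero forces the fitted number of cases to equal the observed number in each behavior group, so exp(β1) is simply the crude risk ratio: total cases over total subjects in type A, divided by the same quantity in type B. A minimal Python sketch of that calculation with hypothetical (x, cases, n) cells:

```python
def poisson_offset_rr(cells):
    """exp(beta1) from the Poisson model log E(cases) = log(n) + beta0 + beta1*x,
    with a single binary covariate x. The score equations make the fitted rate in
    each x group equal to total cases / total n in that group, so the MLE is the
    crude risk ratio. cells: list of (x, cases, n) with x in {0, 1}."""
    cases = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for x, k, n in cells:
        cases[x] += k
        total[x] += n
    return (cases[1] / total[1]) / (cases[0] / total[0])


# Hypothetical cells: one row per (weight stratum, behavior type)
cells = [(1, 20, 100), (0, 10, 100), (1, 40, 100), (0, 20, 100)]
print(poisson_offset_rr(cells))
```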
SAS results

Unadjusted:

Adjusted:

How do we interpret the weight coefficients?
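With weight as a CLASS variable, each weight stratum gets its own intercept. Exponentiating the behavior coefficient gives the weight-adjusted risk ratio for type A versus type B, and each weight coefficient is a log risk ratio comparing that weight stratum with the reference stratum (by default the last CLASS level) for a fixed behavior type. A minimal Python sketch of the same maximum likelihood fit without GENMOD: for fixed β the stratum intercepts have a closed form, leaving a one-dimensional score equation that is solved here by bisection. The strata are hypothetical, and the sketch assumes there is at least one case in every stratum and in each behavior group.

```python
import math


def poisson_adjusted_rr(strata, lo=-10.0, hi=10.0, tol=1e-12):
    """Poisson MLE for log E(y_sx) = log(n_sx) + alpha_s + beta * x, x in {0, 1}.

    strata: list of (y1, n1, y0, n0) = cases and subjects with x = 1 and x = 0.
    For fixed beta, the MLE of alpha_s satisfies
        exp(alpha_s) = (y0 + y1) / (n0 + n1 * exp(beta)),
    and substituting it leaves a score for beta that decreases in beta.
    Returns beta and the stratum effects relative to the last stratum."""

    def score(beta):
        eb = math.exp(beta)
        return sum(y1 - (y0 + y1) * n1 * eb / (n0 + n1 * eb)
                   for y1, n1, y0, n0 in strata)

    while hi - lo > tol:  # bisection on the decreasing score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    eb = math.exp(beta)
    alphas = [math.log((y0 + y1) / (n0 + n1 * eb)) for y1, n1, y0, n0 in strata]
    gammas = [a - alphas[-1] for a in alphas]  # last stratum is the reference
    return beta, gammas


# Hypothetical strata (y1, n1, y0, n0): risk doubles with x, and
# also doubles from the first stratum to the second
beta, gammas = poisson_adjusted_rr([(20, 100, 10, 100), (40, 100, 20, 100)])
print("adjusted RR:", math.exp(beta))
print("stratum RRs vs reference:", [math.exp(g) for g in gammas])
```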
Let's analyze the arsenic data in SAS

data mlung;
  input village conc age atrisk events;
  cards;
………………………
…………………
run;

data mlung;
  set mlung;
  latrisk = log(atrisk);   /* offset: log of the number at risk */
run;

/* Unadjusted: concentration only */
proc genmod data=mlung;
  class conc;
  model events = conc / dist=poisson offset=latrisk;
  where conc = 0 | conc > 900;
run;

/* Adjusted for age */
proc genmod data=mlung;
  class age conc;
  model events = conc age / dist=poisson offset=latrisk;
  where conc = 0 | conc > 900;
run;
Results

Unadjusted analysis:

Adjusted analysis:

Why does the concentration effect change? How do we interpret the age effects? How could we include all of the concentration levels?
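One way a concentration effect can change after adjustment is confounding by age: if the exposed group has relatively more of its population at risk in older age groups, where cancer rates are higher anyway, the crude comparison absorbs part of the age effect. In the Poisson model, each exponentiated age coefficient is a rate ratio comparing that age group with the reference age group at a fixed concentration. The Python sketch below illustrates the confounding mechanism with hypothetical data, using the Mantel-Haenszel rate ratio as a non-regression stand-in for the age-adjusted Poisson fit (it is not the GENMOD model itself); whether this is what happens in the arsenic data depends on the actual age distributions.

```python
def crude_rate_ratio(strata):
    """Rate ratio ignoring the strata. Each stratum is (e1, t1, e0, t0):
    events and population at risk in the exposed and unexposed groups."""
    e1 = sum(s[0] for s in strata)
    t1 = sum(s[1] for s in strata)
    e0 = sum(s[2] for s in strata)
    t0 = sum(s[3] for s in strata)
    return (e1 / t1) / (e0 / t0)


def mh_rate_ratio(strata):
    """Mantel-Haenszel rate ratio adjusted for the stratifying factor."""
    num = sum(e1 * t0 / (t1 + t0) for e1, t1, e0, t0 in strata)
    den = sum(e0 * t1 / (t1 + t0) for e1, t1, e0, t0 in strata)
    return num / den


# Hypothetical age strata: within each, exposed and unexposed rates are equal,
# but the exposed are concentrated in the older, higher-rate stratum.
strata = [
    (1, 1000, 9, 9000),    # younger: rate 0.001 in both groups
    (90, 9000, 10, 1000),  # older: rate 0.01 in both groups
]
print("crude:", crude_rate_ratio(strata))      # about 4.8
print("age-adjusted:", mh_rate_ratio(strata))  # 1.0
```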