Discussion: Week 4 Phillip Keung
Outline
Logistic regression and the HIVNET dataset
    2x2 tables vs logistic regression
    Effect modifiers in logistic regression
    Presence of confounding
Other forms of binary regression
    And associated computational issues
Note on mhodds
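Logistic Regression
    logistic will6 c.know6 will0 know0 i.group i.educ cohort age, coef
Syntax: the c.varname prefix marks a continuous covariate, the i.varname prefix marks a categorical covariate, and the coef option reports log-odds coefficients instead of odds ratios.
    estimates store base
The estimates need to be saved for the LR test later.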
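To make the effect of the coef option concrete, here is a minimal sketch of three ways to display the same fit. It assumes the HIVNET data are already loaded with the variable names used on this slide.

```stata
* Odds ratios: the default display for logistic
logistic will6 c.know6 will0 know0 i.group i.educ cohort age

* Log-odds coefficients: same model and same fit, different display
logistic will6 c.know6 will0 know0 i.group i.educ cohort age, coef

* logit reports log-odds coefficients by default
logit will6 c.know6 will0 know0 i.group i.educ cohort age
```

Each odds ratio in the first display is the exponential of the matching coefficient in the other two.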
OR Estimate
Comparing two groups whose month-6 knowledge scores differ by one point, the odds of being willing to participate in a vaccine trial in the higher-scoring group are 0.954 times the odds in the lower-scoring group, given that the two groups are from the same cohort, are the same age, and have the same education level, risk group, baseline knowledge score, and baseline willingness.
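The 0.954 is the exponentiated know6 coefficient. As a small worked check, the lines below use the coefficient and 95% confidence limits copied from the base model output shown under Effect Modification. Only display arithmetic is involved, so no data are needed.

```stata
* Odds ratio for a one-point difference in know6 (about 0.954)
display exp(-.0467021)

* Lower 95% confidence limit on the odds-ratio scale
display exp(-.1231145)

* Upper 95% confidence limit on the odds-ratio scale
display exp(.0297103)
```

The limits come out at roughly 0.884 and 1.030, so the interval for the odds ratio includes 1.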
Aside: Tables vs Regression
Logistic regression is helpful when:
    Adjusting for lots of covariates
    Adjusting for continuous covariates
    Data is sparse
Let’s look at the OR of willingness and dichotomized education level (i.e. the school variable) using tables and logistic regression.
Aside: Tables vs Regression
    egen five_age = cut(age), group(5)
Convert age to a discrete variable with 5 groups of roughly equal size.
    mhodds will6 school, by(five_age)
OR: 0.811
    logistic will6 school i.five_age
OR: 0.809
    logistic will6 school age
OR: 0.796
Aside: Tables vs Regression
Notice that (with 5 age groups) the logistic regression OR and the M-H adjusted OR are very close.
The choice of the number of categories is arbitrary, and could leave residual confounding.
Logistic regression allows more flexibility for modeling the relationship between age and willingness.
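One way to use that flexibility is to let the age effect curve rather than forcing it to be linear. The quadratic term below is an illustrative assumption, not a model from these slides, and the names age_linear and age_quad are made up for the sketch.

```stata
* Linear age, as in the last command on the previous slide
logistic will6 school c.age
estimates store age_linear

* Linear plus quadratic age: c.age##c.age expands to age and age squared
logistic will6 school c.age##c.age
estimates store age_quad

* LR test of the quadratic term (the linear model is nested in the quadratic one)
lrtest age_linear age_quad
```

Comparing the school OR across the two fits shows how sensitive the adjusted estimate is to the way age is modeled.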
Education Interaction
    logistic will6 c.know6##i.educ will0 know0 i.group cohort age, coef
Syntax: ## includes both main effects and their interaction.
    estimates store educ_inter
Store the estimates from this regression.
    lrtest base educ_inter
Compare the base model to the model with the education interaction term using an LR test.
Effect Modification
Base model:
------------------------------------------------------------------------------
       will6 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       know6 |  -.0467021   .0389866    -1.20   0.231    -.1231145    .0297103
       will0 |   2.394492   .2140915    11.18   0.000     1.974881    2.814104
       know0 |   .0602142   .0381116     1.58   0.114    -.0144832    .1349116
             |
       group |
          2  |   .1675457   .3851763     0.43   0.664     -.587386    .9224775
          3  |   .1116234   .4154844     0.27   0.788    -.7027111    .9259579
          4  |  -.3821958   .4885276    -0.78   0.434    -1.339692    .5753007
             |
        educ |
          2  |   .1504567   .3705785     0.41   0.685    -.5758638    .8767771
          3  |   .0118304   .4053975     0.03   0.977    -.7827342    .8063949
          4  |   .0824443   .4692348     0.18   0.861     -.837239    1.002128
          5  |   .5598642   .5823134     0.96   0.336    -.5814491    1.701177
          6  |   .0829779   .5169882     0.16   0.872    -.9303003    1.096256
             |
      cohort |   .2638523   .2289511     1.15   0.249    -.1848836    .7125882
         age |   .0229322   .0128295     1.79   0.074    -.0022131    .0480776
       _cons |  -3.178244    .674853    -4.71   0.000    -4.500932   -1.855557
Effect Modification
New model:
------------------------------------------------------------------------------
       will6 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       know6 |  -.0554959   .0811209    -0.68   0.494    -.2144899    .1034981
             |
        educ |
          2  |   .2652283   .5050073     0.53   0.599    -.7245678    1.255024
          3  |   .0274294   .5789323     0.05   0.962    -1.107257    1.162116
          4  |   -1.18485   .9756056    -1.21   0.225    -3.097002    .7273021
          5  |   1.277227   1.246544     1.02   0.306    -1.165954    3.720408
          6  |   .3250482   1.170416     0.28   0.781    -1.968926    2.619022
             |
educ#c.know6 |
          2  |   -.024935   .1067504    -0.23   0.815     -.234162     .184292
          3  |    .001546   .1053807     0.01   0.988    -.2049964    .2080885
          4  |   .1706287   .1342288     1.27   0.204    -.0924548    .4337123
          5  |  -.0974846   .1758253    -0.55   0.579    -.4420959    .2471266
          6  |   -.026445    .155034    -0.17   0.865    -.3303061    .2774162
             |
       will0 |   2.398833   .2149534    11.16   0.000     1.977532    2.820134
       know0 |   .0625678   .0384289     1.63   0.103    -.0127516    .1378871
             |
       group |
          2  |    .179361   .3953121     0.45   0.650    -.5954364    .9541584
          3  |   .0946594   .4197758     0.23   0.822    -.7280859    .9174048
          4  |  -.3755364   .4984134    -0.75   0.451    -1.352409    .6013358
             |
      cohort |   .2759143   .2301219     1.20   0.231    -.1751164    .7269449
         age |   .0235063   .0129468     1.82   0.069     -.001869    .0488815
       _cons |  -3.189212   .6970006    -4.58   0.000    -4.555308   -1.823116
LR Test
Can only be used to compare nested models
The base model is a sub-model of the new model
P-value of 0.66, so there is no evidence that education level modifies the association between knowledge score at month 6 and willingness
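Two follow-up commands can complement the LR test: a Wald test of the same five interaction terms, and an education-specific odds ratio built from the interaction coefficients. This is a sketch that assumes the HIVNET data are loaded. Education level 4 is picked only as an example.

```stata
* Refit the interaction model so its estimates are the active results
logistic will6 c.know6##i.educ will0 know0 i.group cohort age, coef

* Wald test that all five educ-by-know6 interaction terms are zero
testparm i.educ#c.know6

* Odds ratio per one-point increase in know6 within education level 4
lincom know6 + 4.educ#c.know6, or
```

From the coefficients in the new model table, the log-OR that lincom combines is -0.0555 + 0.1706, or about 0.115.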
Group Interaction
    logistic will6 c.know6##i.group will0 know0 i.educ cohort age, coef
    estimates store group_inter
    lrtest base group_inter
Confounding
Look at how the OR between willingness and knowledge changes with and without the possible confounder.
There is no statistical test for confounding.
Rules of thumb exist (e.g. a 10-15% change in the coefficient), but they are not recommended.
You should have a scientific reason to declare a covariate a confounder.
Confounding
Base model
Log-OR: -0.0467
Without educ:
    logistic will6 c.know6 will0 know0 i.group cohort age, coef
Log-OR: -0.0465
Without group:
    logistic will6 c.know6 will0 know0 i.educ cohort age, coef
Log-OR: -0.0459
Without cohort:
    logistic will6 c.know6 will0 know0 i.group i.educ age, coef
Log-OR: -0.0451
Without age:
    logistic will6 c.know6 will0 know0 i.group i.educ cohort, coef
Log-OR: -0.521
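For reference, the percent change that the rule of thumb looks at can be computed directly from the log-ORs above. The sketch below does this for the models without educ and without group. It only shows the arithmetic and is not an endorsement of the rule.

```stata
* Percent change in the know6 log-OR when educ is dropped from the base model
display 100 * abs((-.0465) - (-.0467)) / abs(-.0467)

* Same calculation when group is dropped
display 100 * abs((-.0459) - (-.0467)) / abs(-.0467)
```

These come out at roughly 0.4% and 1.7%, both well below a 10% threshold.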
RR/RD Regression
Use binreg to get RR/RD estimates
    binreg will6 c.know6 cohort, coef rr
RR: 0.973
    binreg will6 c.know6 cohort, rd
RD: -0.00620
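RR/RD Regression
What do the models look like? Here $p$ is the probability of being willing to participate at month 6.
For RR: $\log p = \beta_0 + \beta_1\,\text{know6} + \beta_2\,\text{cohort}$
For RD: $p = \beta_0 + \beta_1\,\text{know6} + \beta_2\,\text{cohort}$
Interpretation of coefficients/output?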
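The two models above are binomial GLMs with a log link and an identity link, so they can also be written with glm. The sketch below assumes the HIVNET data are loaded. Since glm and binreg use different default fitting methods, they may behave differently when a model is hard to fit, but they should agree closely when both converge.

```stata
* Log link: coefficients are log risk ratios, and eform exponentiates them
glm will6 c.know6 cohort, family(binomial) link(log) eform

* Identity link: coefficients are risk differences
glm will6 c.know6 cohort, family(binomial) link(identity)
```

Under the log link, a one-point increase in know6 multiplies the probability by the exponential of its coefficient. Under the identity link, it adds the coefficient to the probability.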
RR/RD Regression
RR: Comparing two groups whose knowledge scores differ by one point, the probability of being willing to participate in vaccine trials in the higher-scoring group is 0.973 times the probability in the lower-scoring group, given that the groups are from the same cohort.
RR/RD Regression
RD: In a group whose knowledge score is one point higher than another group's, about 6.2 fewer people per 1000 would be willing to participate in vaccine trials, given that both groups are from the same cohort.
Aside: Computational Issues
We encountered no problems when running the full model with logistic regression.
Try:
    binreg will6 c.know6 will0 know0 i.group i.educ cohort age, rr
    binreg will6 c.know6 will0 know0 i.group i.educ cohort age, rd
After more than 600 iterations (for the RD model), the fit will still not have converged.
Aside: Computational Issues
Why?
The reasons are technical, but loosely speaking, the problem is that estimated probabilities must lie between 0 and 1.
It is not easy to force the estimates to stay inside that range unless you model the log-odds.
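A few display commands show why the log-odds scale is the easy case. The linear predictor values used here (-4, 4, and 0.3) are arbitrary illustrations, not estimates from the HIVNET models.

```stata
* Logit link: any linear predictor maps back to a probability strictly between 0 and 1
display invlogit(-4)
display invlogit(4)

* Log link: a linear predictor above 0 implies a "probability" above 1
display exp(0.3)

* Identity link: the linear predictor is used as the probability directly,
* so a value such as -0.2 or 1.3 is simply out of range
```

With the log or identity link, the fitting routine has to keep every fitted value in range at each iteration, and this is where convergence tends to break down.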
RR/RD vs OR
Pros for OR:
    Works when you have case-control data
    Is approximately the RR for rare events
    Logistic regression is implemented in most statistical software packages
Cons for OR:
    OR is not necessarily the right measure of association
    Also, OR is arguably harder to interpret
    OR has strange properties (look up non-collapsibility if you’re interested)
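The rare-events approximation is easy to check with made-up risks. The numbers below are hypothetical and chosen so that the RR is 0.5 in both scenarios.

```stata
* Hypothetical rare outcome: risks of 1% and 2%
display 0.01 / 0.02
display (0.01 / 0.99) / (0.02 / 0.98)

* Hypothetical common outcome: risks of 40% and 80%
display 0.40 / 0.80
display (0.40 / 0.60) / (0.80 / 0.20)
```

In the rare scenario the OR is about 0.495, close to the RR of 0.5. In the common scenario the OR is about 0.167, far from the same RR.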
RR/RD vs OR
Pros for RR/RD:
    RR or RD is often the actual measure of association that you’re interested in
Cons for RR/RD:
    RR/RD regression can be numerically unstable (i.e. the model may fail to converge)
    Not applicable to case-control data
    Not necessarily available in every software package
Aside: mhodds
Word of warning about continuous covariates and mhodds: it does something like logistic regression when you give it a variable that seems continuous.

. mhodds will6 age

Score test for trend of odds with age

(The Odds Ratio estimate is an approximation to the odds ratio
for a one unit increase in age)

    ----------------------------------------------------------------
     Odds Ratio    chi2(1)        P>chi2       [95% Conf. Interval]
       1.010376       1.00        0.3176       0.990126    1.031039
Aside: mhodds

. logistic will6 age, iter(1)
convergence not achieved

Logistic regression                             Number of obs     =        750
                                                LR chi2(1)        =       0.99
                                                Prob > chi2       =     0.3199
Log likelihood = -388.26013                     Pseudo R2         =     0.0013

------------------------------------------------------------------------------
       will6 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.010389   .0102959     1.01   0.310     .9904101    1.030772
Warning: convergence not achieved

The answers are slightly different for technical reasons which are uninteresting.
Questions?