Discussion: Week 4 Phillip Keung.

Outline
- Logistic regression and the HIVNET dataset
- 2x2 tables vs logistic regression
- Effect modifiers in logistic regression
- Presence of confounding
- Other forms of binary regression, and associated computational issues
- Note on mhodds

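Logistic Regression
    logistic will6 c.know6 will0 know0 i.group i.educ cohort age, coef
Syntax: c.varname treats the variable as continuous, i.varname treats it as categorical, and the coef option reports coefficients (log odds ratios) instead of odds ratios.
    estimates store base
The estimates need to be saved for the LR test later.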

OR Estimate
Comparing two groups whose month-6 knowledge scores differ by one point, the odds of being willing to participate in a vaccine trial in the group with the higher score are 0.954 times the odds in the group with the lower score, given that the two groups are from the same cohort, are the same age, and have the same education level, risk group, baseline knowledge score and baseline willingness.
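As a quick check of where 0.954 comes from, the sketch below exponentiates the know6 coefficient and its confidence limits as reported in the base model output. It is illustrative arithmetic in Python only and was not part of the original Stata session.

```python
import math

# know6 coefficient (a log odds ratio) and its 95% CI limits,
# copied from the base model output in these slides
log_or = -0.0467021
ci_low, ci_high = -0.1231145, 0.0297103

# Exponentiating moves from the log-odds scale to the odds-ratio scale
odds_ratio = math.exp(log_or)
or_ci = (math.exp(ci_low), math.exp(ci_high))

print(f"OR = {odds_ratio:.3f}, 95% CI = ({or_ci[0]:.3f}, {or_ci[1]:.3f})")
```

The interval on the OR scale contains 1, which matches the non-significant p-value of 0.231 for know6.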

Aside: Tables vs Regression
Logistic regression is helpful when:
- Adjusting for lots of covariates
- Adjusting for continuous covariates
- Data are sparse
Let’s look at the OR of willingness and dichotomized education level (i.e. the school variable) using tables and logistic regression.

Aside: Tables vs Regression
    egen five_age = cut(age), group(5)
Convert age to a discrete variable with 5 groups of roughly equal size.
    mhodds will6 school, by(five_age)
OR: 0.811
    logistic will6 school i.five_age
OR: 0.809
    logistic will6 school age
OR: 0.796

Aside: Tables vs Regression
- Notice that (with 5 age groups) the logistic regression OR and the M-H adjusted OR are very close
- The choice of the number of categories is arbitrary, and could leave residual confounding
- Logistic regression allows more flexibility for modeling the relationship between age and willingness
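To make the table-based adjustment concrete, here is a minimal sketch of the textbook Mantel-Haenszel pooled odds ratio across strata. The counts are hypothetical toy numbers, not the HIVNET data, and the function name is made up for illustration; it is not claimed to reproduce every detail of what mhodds computes.

```python
def mantel_haenszel_or(strata):
    """Textbook M-H pooled OR.

    Each stratum is a tuple (a, b, c, d):
    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical 2x2 tables for two strata (e.g. two age groups)
toy_strata = [(10, 20, 5, 40), (30, 10, 20, 20)]

# Crude OR ignores the stratification and pools the counts first
a, b, c, d = (sum(col) for col in zip(*toy_strata))
crude_or = (a * d) / (b * c)

mh_or = mantel_haenszel_or(toy_strata)
print(f"crude OR = {crude_or:.3f}, M-H adjusted OR = {mh_or:.3f}")
```

In this toy example the stratum-specific ORs are 4 and 3, and the M-H estimate falls between them as a weighted average.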

Education Interaction
    logistic will6 c.know6##i.educ will0 know0 i.group cohort age, coef
Syntax: ## includes both main effects and their interaction.
    estimates store educ_inter
Store the estimates from this regression.
    lrtest base educ_inter
Compare the base model to the model with the education interaction terms using an LR test.

Effect Modification
Base model:
------------------------------------------------------------------------------
       will6 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       know6 |  -.0467021   .0389866    -1.20   0.231    -.1231145    .0297103
       will0 |   2.394492   .2140915    11.18   0.000     1.974881    2.814104
       know0 |   .0602142   .0381116     1.58   0.114    -.0144832    .1349116
             |
       group |
          2  |   .1675457   .3851763     0.43   0.664     -.587386    .9224775
          3  |   .1116234   .4154844     0.27   0.788    -.7027111    .9259579
          4  |  -.3821958   .4885276    -0.78   0.434    -1.339692    .5753007
             |
        educ |
          2  |   .1504567   .3705785     0.41   0.685    -.5758638    .8767771
          3  |   .0118304   .4053975     0.03   0.977    -.7827342    .8063949
          4  |   .0824443   .4692348     0.18   0.861     -.837239    1.002128
          5  |   .5598642   .5823134     0.96   0.336    -.5814491    1.701177
          6  |   .0829779   .5169882     0.16   0.872    -.9303003    1.096256
             |
      cohort |   .2638523   .2289511     1.15   0.249    -.1848836    .7125882
         age |   .0229322   .0128295     1.79   0.074    -.0022131    .0480776
       _cons |  -3.178244    .674853    -4.71   0.000    -4.500932   -1.855557

Effect Modification
New model:
------------------------------------------------------------------------------
       will6 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       know6 |  -.0554959   .0811209    -0.68   0.494    -.2144899    .1034981
             |
        educ |
          2  |   .2652283   .5050073     0.53   0.599    -.7245678    1.255024
          3  |   .0274294   .5789323     0.05   0.962    -1.107257    1.162116
          4  |   -1.18485   .9756056    -1.21   0.225    -3.097002    .7273021
          5  |   1.277227   1.246544     1.02   0.306    -1.165954    3.720408
          6  |   .3250482   1.170416     0.28   0.781    -1.968926    2.619022
             |
educ#c.know6 |
          2  |   -.024935   .1067504    -0.23   0.815     -.234162     .184292
          3  |    .001546   .1053807     0.01   0.988    -.2049964    .2080885
          4  |   .1706287   .1342288     1.27   0.204    -.0924548    .4337123
          5  |  -.0974846   .1758253    -0.55   0.579    -.4420959    .2471266
          6  |   -.026445    .155034    -0.17   0.865    -.3303061    .2774162
             |
       will0 |   2.398833   .2149534    11.16   0.000     1.977532    2.820134
       know0 |   .0625678   .0384289     1.63   0.103    -.0127516    .1378871
             |
       group |
          2  |    .179361   .3953121     0.45   0.650    -.5954364    .9541584
          3  |   .0946594   .4197758     0.23   0.822    -.7280859    .9174048
          4  |  -.3755364   .4984134    -0.75   0.451    -1.352409    .6013358
             |
      cohort |   .2759143   .2301219     1.20   0.231    -.1751164    .7269449
         age |   .0235063   .0129468     1.82   0.069     -.001869    .0488815
       _cons |  -3.189212   .6970006    -4.58   0.000    -4.555308   -1.823116
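One way to read the interaction output is to add each educ#c.know6 term to the know6 main effect, which gives the know6 log-OR within each education level. The sketch below does that arithmetic in Python with the coefficients reported in the new model. It assumes that educ = 1 is the omitted reference level, which the output suggests (only levels 2 to 6 are listed) but does not state.

```python
import math

# know6 main effect: the slope in the reference education level
# (assumed here to be educ = 1, since only levels 2-6 appear in the output)
know6_main = -0.0554959

# educ#c.know6 interaction coefficients from the new model output
interaction = {2: -0.024935, 3: 0.001546, 4: 0.1706287, 5: -0.0974846, 6: -0.026445}

# Slope for know6 within each education level = main effect + interaction term
slopes = {1: know6_main}
slopes.update({level: know6_main + delta for level, delta in interaction.items()})

for level, slope in sorted(slopes.items()):
    print(f"educ={level}: log-OR per point = {slope:+.4f}, OR = {math.exp(slope):.3f}")
```

These are point estimates only; the wide standard errors on the interaction terms are why the LR test, rather than the individual rows, is used to judge effect modification.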

LR Test
- Can only be used to compare nested models
- The base model is a sub-model of the new model
- P-value of 0.66, so no evidence of interaction between education level and knowledge score at month 6
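For reference, the statistic behind the LR test can be written as below. The log-likelihood symbols are notation introduced here, not taken from the slides; the 5 degrees of freedom follow from the five educ#c.know6 terms that the new model adds to the base model.

```latex
\[
\Lambda \;=\; 2\,\bigl(\ell_{\text{new}} - \ell_{\text{base}}\bigr)
\;\overset{\text{approx.}}{\sim}\; \chi^2_{5}
\quad \text{under } H_0:\ \text{all five } \mathit{educ}\times\mathit{know6} \text{ coefficients equal } 0
\]
```

A large p-value, such as the 0.66 reported here, means the extra interaction terms do not improve the fit by more than chance would explain.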

Group Interaction
    logistic will6 c.know6##i.group will0 know0 i.educ cohort age, coef
    estimates store group_inter
    lrtest base group_inter

Confounding
- Look at how the OR between willingness and knowledge changes with and without the possible confounder
- There is no statistical test for confounding
- Rules of thumb exist (e.g. a 10-15% change in the coefficient), but they are not recommended
- You should have a scientific reason to declare a covariate a confounder

Confounding
Base model
Log-OR: -0.0467
Without education:
    logistic will6 c.know6 will0 know0 i.group cohort age, coef
Log-OR: -0.0465
Without risk group:
    logistic will6 c.know6 will0 know0 i.educ cohort age, coef
Log-OR: -0.0459
Without cohort:
    logistic will6 c.know6 will0 know0 i.group i.educ age, coef
Log-OR: -0.0451
Without age:
    logistic will6 c.know6 will0 know0 i.group i.educ cohort, coef
Log-OR: -0.521
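The change-in-estimate comparison can be sketched as a percent change in the know6 log-OR relative to the base model. The helper name below is hypothetical, and the example is shown for the models that drop education, risk group and cohort, using the log-ORs reported on this slide.

```python
def percent_change(base, reduced):
    """Absolute percent change of a coefficient when a covariate is dropped."""
    return abs(reduced - base) / abs(base) * 100


base_log_or = -0.0467  # know6 log-OR in the base model

# know6 log-OR after dropping each covariate, as reported on the slide
reduced_log_or = {
    "educ": -0.0465,
    "group": -0.0459,
    "cohort": -0.0451,
}

changes = {name: percent_change(base_log_or, value) for name, value in reduced_log_or.items()}

for name, change in changes.items():
    print(f"dropping {name}: {change:.1f}% change in log-OR")
```

All three changes are well under the 10-15% rule of thumb, although as noted above that rule is not a substitute for a scientific argument.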

RR/RD Regression
Use binreg to get RR/RD estimates.
    binreg will6 c.know6 cohort, coef rr
RR: 0.973
    binreg will6 c.know6 cohort, rd
RD: -0.00620

RR/RD Regression
What do the models look like?
For RR: $\log p = \beta_0 + \beta_1\,\mathit{know6} + \beta_2\,\mathit{cohort}$
For RD: $p = \beta_0 + \beta_1\,\mathit{know6} + \beta_2\,\mathit{cohort}$
Interpretation of coefficients/output?

RR/RD Regression
RR: Comparing two groups whose knowledge scores differ by one point, the probability of being willing to participate in vaccine trials in the group with the higher score is 0.973 times the probability in the group with the lower score, given that both groups are from the same cohort.

RR/RD Regression
RD: For every 1000 people, 6.2 fewer would be willing to participate in vaccine trials in a group whose knowledge score is one point higher than that of another group, given that both groups are from the same cohort.
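The two interpretations map back to the model coefficients in a simple way: in the RR model the know6 coefficient is the log of the RR, and in the RD model it is the RD itself. A small Python sketch of that arithmetic, using the estimates reported on the slides:

```python
import math

rr = 0.973      # reported risk ratio per one-point increase in know6
rd = -0.00620   # reported risk difference per one-point increase in know6

# RR model: log p = b0 + b1*know6 + b2*cohort, so b1 = log(RR)
b1_rr_model = math.log(rr)

# RD model: p = b0 + b1*know6 + b2*cohort, so b1 = RD;
# multiplying by 1000 expresses it per 1000 people
per_1000 = rd * 1000

print(f"log-RR coefficient = {b1_rr_model:.4f}")
print(f"difference per 1000 people = {per_1000:.1f}")
```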

Aside: Computational Issues
We encountered no problems when running the full model with logistic regression.
Try:
    binreg will6 c.know6 will0 know0 i.group i.educ cohort age, rr
    binreg will6 c.know6 will0 know0 i.group i.educ cohort age, rd
After > 600 iterations (for the RD), the model will not have converged.

Aside: Computational Issues
Why?
- Reasons are technical, but loosely speaking, it has to do with the fact that the estimated probabilities should lie between 0 and 1
- It’s not easy to force the estimates to remain inside that sensible range unless you use the log-odds
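The range problem can be illustrated by pushing the same linear predictor through the three links. This is a sketch with hypothetical coefficients chosen for illustration, not values estimated from the HIVNET data.

```python
import math

# Hypothetical intercept and slope, chosen only to illustrate the point
b0, b1 = -1.5, 0.3
etas = [b0 + b1 * x for x in range(0, 11)]  # linear predictor for x = 0..10

identity_p = [eta for eta in etas]                   # RD model: p = eta
log_p = [math.exp(eta) for eta in etas]              # RR model: log p = eta
logit_p = [1 / (1 + math.exp(-eta)) for eta in etas] # logistic: log-odds = eta

def out_of_range(ps):
    """Count fitted values that are not valid probabilities."""
    return sum(1 for p in ps if p < 0 or p > 1)

print("identity link, invalid values:", out_of_range(identity_p))
print("log link, invalid values:     ", out_of_range(log_p))
print("logit link, invalid values:   ", out_of_range(logit_p))
```

Only the logit link keeps every fitted value strictly between 0 and 1 for any linear predictor, which is why the fitting routine has nothing to guard against in logistic regression.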

RR/RD vs OR
Pros for OR:
- Works when you have case-control data
- Is approximately the RR for rare events
- Logistic regression is implemented in most statistical software packages
Cons for OR:
- OR is not necessarily the right measure of association
- Also, OR is arguably harder to interpret
- OR has strange properties (look up non-collapsibility if you’re interested)
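The rare-event approximation is easy to see numerically. The sketch below compares the RR and OR for a rare and a common outcome using hypothetical risks picked for illustration; the function names are likewise made up.

```python
def risk_ratio(p1, p0):
    """Ratio of the outcome probability in group 1 to that in group 0."""
    return p1 / p0


def odds_ratio(p1, p0):
    """Ratio of the outcome odds in group 1 to those in group 0."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Hypothetical risks: both scenarios have RR = 2
rare = (0.02, 0.01)
common = (0.60, 0.30)

for label, (p1, p0) in [("rare", rare), ("common", common)]:
    print(f"{label}: RR = {risk_ratio(p1, p0):.3f}, OR = {odds_ratio(p1, p0):.3f}")
```

With the rare outcome the OR is about 2.02, close to the RR of 2; with the common outcome the OR is 3.5, far from the same RR.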

RR/RD vs OR
Pros for RR/RD:
- RR or RD is often the actual measure of association that you’re interested in
Cons for RR/RD:
- RR/RD regression is numerically unstable (i.e. the model may not converge)
- Not applicable to case-control data
- Not necessarily available in every software package

Aside: mhodds
Word of warning about continuous covariates and mhodds: it does something like logistic regression when you give it a variable that seems continuous.
. mhodds will6 age
Score test for trend of odds with age
(The Odds Ratio estimate is an approximation to the odds ratio for a one unit increase in age)
    ----------------------------------------------------------------
     Odds Ratio    chi2(1)        P>chi2       [95% Conf. Interval]
       1.010376       1.00        0.3176       0.990126    1.031039

Aside: mhodds
. logistic will6 age, iter(1)
convergence not achieved
Logistic regression                               Number of obs   =        750
                                                  LR chi2(1)      =       0.99
                                                  Prob > chi2     =     0.3199
Log likelihood = -388.26013                       Pseudo R2       =     0.0013
------------------------------------------------------------------------------
       will6 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   1.010389   .0102959     1.01   0.310     .9904101    1.030772
Warning: convergence not achieved
Answers are slightly different for technical reasons which are uninteresting.

Questions?