Sec 9C – Logistic Regression and Propensity scores

Slides:



Advertisements
Similar presentations
Lecture 11 (Chapter 9).
Advertisements

A workshop introducing doubly robust estimation of treatment effects
M2 Medical Epidemiology
HSRP 734: Advanced Statistical Methods July 24, 2008.
Propensity Score Matching Lava Timsina Kristina Rabarison CPH Doctoral Seminar Fall 2012.
The World Bank Human Development Network Spanish Impact Evaluation Fund.
Chance, bias and confounding
Multinomial Logistic Regression
Stratification and Adjustment
ANCOVA Lecture 9 Andrew Ainsworth. What is ANCOVA?
Unit 6: Standardization and Methods to Control Confounding.
Advanced Statistics for Interventional Cardiologists.
Simple Linear Regression
Biostatistics Case Studies 2005 Peter D. Christenson Biostatistician Session 4: Taking Risks and Playing the Odds: OR vs.
X 11 X 12 X 13 X 21 X 22 X 23 X 31 X 32 X 33. Research Question Are nursing homes dangerous for seniors? Does admittance to a nursing home increase risk.
Amsterdam Rehabilitation Research Center | Reade Multiple regression analysis Analysis of confounding and effectmodification Martin van de Esch, PhD.
Assessing ETA Violations, and Selecting Attainable/Realistic Parameters Causal Effect/Variable Importance Estimation and the Experimental Treatment Assumption.
Statistics for clinicians Biostatistics course by Kevin E. Kip, Ph.D., FAHA Professor and Executive Director, Research Center University of South Florida,
Estimating Causal Effects from Large Data Sets Using Propensity Scores Hal V. Barron, MD TICR 5/06.
Latent Growth Modeling Byrne Chapter 11. Latent Growth Modeling Measuring change over repeated time measurements – Gives you more information than a repeated.
Tim Wiemken PhD MPH CIC Assistant Professor Division of Infectious Diseases University of Louisville, Kentucky Confounding.
Evaluating Risk Adjustment Models Andy Bindman MD Department of Medicine, Epidemiology and Biostatistics.
A short introduction to epidemiology Chapter 9: Data analysis Neil Pearce Centre for Public Health Research Massey University Wellington, New Zealand.
Matching STA 320 Design and Analysis of Causal Studies Dr. Kari Lock Morgan and Dr. Fan Li Department of Statistical Science Duke University.
1 Multivariable Modeling. 2 nAdjustment by statistical model for the relationships of predictors to the outcome. nRepresents the frequency or magnitude.
IMPORTANCE OF STATISTICS MR.CHITHRAVEL.V ASST.PROFESSOR ACN.
Using Propensity Score Matching in Observational Services Research Neal Wallace, Ph.D. Portland State University February
1 Statistical Review of the Observational Studies of Aprotinin Safety Part II: The i3 Drug Safety Study CRDAC and DSaRM Meeting September 12, 2007 P. Chris.
Randomized Assignment Difference-in-Differences
Logistic regression (when you have a binary response variable)
POPLHLTH 304 Regression (modelling) in Epidemiology Simon Thornley (Slides adapted from Assoc. Prof. Roger Marshall)
Rerandomization to Improve Covariate Balance in Randomized Experiments Kari Lock Harvard Statistics Advisor: Don Rubin 4/28/11.
Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.1 Contingency Tables.
MATCHING Eva Hromádková, Applied Econometrics JEM007, IES Lecture 4.
DISCRIMINANT ANALYSIS. Discriminant Analysis  Discriminant analysis builds a predictive model for group membership. The model is composed of a discriminant.
1 Borgan and Henderson: Event History Methodology Lancaster, September 2006 Session 8.1: Cohort sampling for the Cox model.
(www).
Methods of Presenting and Interpreting Information Class 9.
Research and Evaluation Methodology Program College of Education A comparison of methods for imputation of missing covariate data prior to propensity score.
Matching methods for estimating causal effects Danilo Fusco Rome, October 15, 2012.
Survival-time inverse-probability- weighted regression adjustment
The comparative self-controlled case series (CSCCS)
Advanced Quantitative Techniques
Constructing Propensity score weighted and matched Samples Stacey L
Notes on Logistic Regression
Advanced Quantitative Techniques
Workshop on Residential Property Price Indices
Lecture 18 Matched Case Control Studies
Applied Biostatistics: Lecture 2
Sampling Population: The overall group to which the research findings are intended to apply Sampling frame: A list that contains every “element” or.
Epidemiology 503 Confounding.
Regression Techniques
Introduction to logistic regression a.k.a. Varbrul
Multiple logistic regression
Jeffrey E. Korte, PhD BMTRY 747: Foundations of Epidemiology II
Date: Presenter: Ryan Chen
Methods of Economic Investigation Lecture 12
Treatment effect: Part 2
Impact Evaluation Methods
Why use marginal model when I can use a multi-level model?
Introduction to Logistic Regression
Evaluating Effect Measure Modification
The Aga Khan University
The European Statistical Training Programme (ESTP)
Chapter: 9: Propensity scores
Enhancing causal influence (in observational studies)
Enhancing Causal Inference in Observational Studies
Confounders.
Enhancing Causal Inference in Observational Studies
Effect Modifiers.
Presentation transcript:

Sec 9C – Logistic Regression and Propensity scores

propensity In a randomized trial, we know the probability (“propensity”) that a person with a certain set of covariates/ risk factors (age, gender …) will be in group A or B. In a randomized trial with 50:50 allocation, the probability is 50% for being in A or B for all variables. Everyone has the SAME propensity and, on average, the covariates are the same in A and B. These are linked.

Controlling for confounding In comparing group A to B in a non randomized study, one may have confounding as the risk factors are not necessarily balanced between the two groups. One option to control for confounding is to include all the potential covariates in a multivariate model. If there are only a few covariates, another option is to make strata. Within a stratum, there would be no association between treatment (A or B) and covariates. For example, if gender and smoking were the only risk factors, could compare A to B in male smokers, female smokers, male non smokers and female non smokers.

However, if we knew the probability of each person being assigned to treatment A (= 1- prob of assignment to B), one can shows that if one stratifies or matches on this probability (this propensity), the average values of the covariates within each stratum (or each match) are (at least) roughly the same between the two treatments! That is, it is not necessary to use all the covariate variables directly to make (too many) strata or matches. Those with the same propensity have the same (or very similar) covariate values.

Logistic for estimating propensity While we do not know the probability of assignment to A (and B) we can model it using logistic regression. Here the outcome is treatment group (A or B) and the potential confounders are the predictors. We can then use the logit score to “summarize” all the covariates into a single score. We can then make strata using this score or use it as a single continuous covariate. We do not have to be concerned whether this model for A or B is “correct”, or have any meaning as long as the strata made in this way produce balance.

Example: tooth whitening Is a new treatment for "whiter teeth" better than the standard treatment? Sample of n=350 people. t test - comparing mean gray scale scores (high is bad) Unadjusted scores - observational study This is not a randomized trial   Group n mean SD SEM STD 208 39.45 24.1 1.67 NEW 142 42.51 20.8 1.75 Mean difference 3.06 2.49 t=-1.23, p=0.219

Covariate comparison- not the same STD, n=208   NEW, n=142 p value mean SD sem  age 22.36 6.47 0.45 24.4 6.33 0.53 0.004 Sugar use 6.10 3.08 0.21 5.84 3.06 0.26 0.435 PCT SE Male 28.4% 3.1% 47.2% 4.2% 0.0003 Floss 28.9% 35.9% 4.0% 0.1629 Yearly clean 31.7% 3.2% 32.4% 3.9% 0.8960 drink coffee 42.3% 3.4% 74.7% 3.7% <0.0001 drink tea 30.8% 62.7% 4.1% use mouthwash 22.1% 2.9% 25.4% 0.4827

Logistic model for “new tx”-propensity variable Log OR SE p value Intercept -1.798 0.5417 0.0009 Age 0.0214 0.0196 0.2744 Male 0.3898 0.2559 0.1277 Floss 0.3280 0.2601 0.2073 Yearly clean -0.0543 0.2556 0.8319 Sugar use -0.0401 0.0393 0.3078 Coffee 0.9042 0.2767 0.0011 Tea 0.8681 0.2570 0.0007 mouthwash -0.1009 0.2844 0.7228 Score = -1.798 + 0.0214 Age + 0.3898 Male + 0.3280 Floss – 0.0543 Y clean – 0.0401 Sugar + 0.9042 coffee + 0.8681 tea – 0.1009 mouthwash Propensity (“new tx”) = exp(score) / [1 + exp(score)]

Covariate compare by propensity strata mean age tx stratum 1 stratum 2 stratum 3 stratum 4 STD 18.0 24.8 25.5 25.6 NEW 25.2 23.5 23.7 25.8 p value 0.0668 0.2648 0.1696 0.8743 mean sugar use 6.55 5.63 6.05 5.76 7.62 6.66 5.55 5.33 0.4616 0.1587 0.3865 0.5455 pct male 3.6% 24.5% 44.7% 71.1% 0.0% 30.8% 46.0% 65.3% 0.078 0.514 0.906 0.566 pct who floss 20.5% 34.7% 26.3% 42.1% 25.0% 23.1% 30.0% 53.1% 0.838 0.225 0.702 0.307

Covariate compare by propensity strata pct yearly tooth clean tx stratum 1 stratum 2 stratum 3 stratum 4 STD 26.5% 40.8% 34.2% 28.9% NEW 75.0% 25.6% 32.0% 34.7% p value 0.070 0.126 0.827 0.566 pct drink coffee 0.0% 86.8% 100.0% 46.2% 78.0% 1.000 0.274 0.271 pct drink tea 8.2% 57.9% 60.0% 0.040 0.842 pct use mouthwash 19.3% 14.3% 31.6% 50.0% 16.0% 32.7% 0.226 0.186 0.150 0.915

Gray scale means by propensity strata (quartiles)   STD NEW n mean p value score stratum difference 1 83 21.3 4 27.5 87 6.2 0.5304 0-.2 2 49 43.9 39 36.9 88 -7.0 0.0915 0.2-0.4 3 38 53.9 50 40.6 -13.3 0.0014 0.4-0.6 58.9 50.2 -8.7 0.0358 0.6+ total n 208 142 350 adjusted mean 44.5 38.8 -5.7 0.06 unadjusted mean 39.4 42.5 3.1 0.21 adj mean 52.2 -9.7 stratum 2,3,4

Propensity score as continuous covariate Regression on gray scale variable Regression coefficient SE p value Intercept 52.57 1.69 < 0.0001 New tx -9.77 2.31 score 17.56 1.43 New tx * score -7.94 2.76 0.0042 R square = 0.328, SDe = 18.8 Q- If the propensity score is a good proxy for the 8 covariates, what should happen if any or all of the 8 covariates are added to the above model?

Propensity score as continuous covariate As the propensity to choose the NEW treatment increases, the mean difference between the two treatments increases.

Advantages of propensity score 1. Reduces all the covariates to one dimension 2. Easy to check if the two groups being compared overlap on the score (ie on the covariates) 3. Does not extrapolate beyond the range of the data (unlike linear regression) 4. Robust – Does not matter if model for propensity score is incorrectly specified as long as covariates are the same in the strata or matches made by the score Disadvantages Can only have two groups (can be modified) Don’t directly assess effects of covariates on outcome

Can check propensity score overlap between the two groups Lack of overlap indicates that some subjects have covariate values on one group that are completely absent in the other group.