Threats to validity in observational studies


Threats to validity in observational studies
Jay S. Kaufman, PhD, McGill University, Montreal QC
25 February 2016, 11:05 AM – 11:45 AM
National Academy of Sciences, 2101 Constitution Ave NW, Washington, DC 20418 USA

Statistical models are used to estimate relationships between variables in observational data sets.
[Figure: regression of Y on X, with intercept β0 and slope β1.]

Three main inferential targets of these models:
1) Real world in the present: surveillance, descriptive study
2) Real world in the future: clinical prediction model
3) Hypothetical world in the future: causal inference, etiologic study
The inferential target determines the adjustment strategy. Most people here are interested in 3).

You observe: Pr(Y|X=x)
You want to know: Pr(Y|SET[X=x])
If you are trying to estimate the causal effect of a treatment, your job is to PREDICT what would happen in the FUTURE if you did thing A compared to what would happen if you did thing B. To do this from observational data, you must often adjust statistically for factors that are associated with the treatment and the outcome. Pr(Y|SET[X=x]) is the intervention you want to know about, but unfortunately you don’t really get to “SET” anything.

The adjustment tradition in statistics exists to link these two quantities. With a common cause Z of treatment and outcome (Z → X, Z → Y, X → Y):
Pr(Y|X=x) ≠ Pr(Y|SET[X=x])
BUT!
Σ_z Pr(Y|X=x, Z=z) Pr(Z=z) = Pr(Y|SET[X=x])
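The standardization formula on this slide can be checked with a small exact calculation. The sketch below is not from the presentation: the probabilities are invented, Z is assumed to be the only confounder, and the function names are hypothetical. The treatment is built to raise the risk of Y by 0.2 in every stratum of Z.

```python
# Hypothetical joint distribution for binary Z (confounder), X (treatment), Y (outcome).
# All numbers are invented for illustration.
P_Z = {0: 0.5, 1: 0.5}              # Pr(Z=z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}     # Pr(X=1 | Z=z): Z makes treatment more likely


def p_y1(x, z):
    """Pr(Y=1 | X=x, Z=z): treatment adds 0.2 to the risk, Z adds 0.4."""
    return 0.1 + 0.2 * x + 0.4 * z


def p_x_given_z(x, z):
    """Pr(X=x | Z=z)."""
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]


def crude_risk(x):
    """Pr(Y=1 | X=x): implicitly weights each stratum by Pr(Z=z | X=x)."""
    p_x = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    joint = sum(p_y1(x, z) * p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return joint / p_x


def standardized_risk(x):
    """Sum over z of Pr(Y=1 | X=x, Z=z) Pr(Z=z)."""
    return sum(p_y1(x, z) * P_Z[z] for z in P_Z)


crude_rd = crude_risk(1) - crude_risk(0)
standardized_rd = standardized_risk(1) - standardized_risk(0)
print(f"crude risk difference:        {crude_rd:.2f}")
print(f"standardized risk difference: {standardized_rd:.2f}")
```

Under these assumed numbers the crude contrast is more than double the effect that was built in, while the standardized contrast recovers it.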

Read Pr(Y|SET[X=x]) as: Pr(Y|SET[X=x1]) versus Pr(Y|SET[X=x2]), where x1 and x2 are the levels at which you intervene to set the treatment; the contrast is usually a difference or ratio.
Causal inference from passively observed data requires not just structural identification, but also:
• positivity (there are sufficient data available on the treatment and outcome in the range of interest)
• consistency (the way that people came to be treated in the data set is comparable to the way that you plan to treat them in your intervention)
• correct specification of statistical models

Three main structural threats to validity:
• Confounding bias
• Selection bias
• Information bias
[Figure: one causal diagram per threat. Node labels: Z, X, Y (confounding); X, Y, Z (selection); X, Y, X*, Y*, U (information).]

EPIB 648 Kaufman 5 Sept 2013
If you didn’t get to Z by one pathway, you are more likely to have gotten there via the other pathway.
[Figure: A → Z ← B. Marginally, A and B are independent (A ⊥ B); conditional on Z, they are dependent (A not ⊥ B | Z).]
Z is lost weight. If you didn’t get here by caloric reduction (A), you are more likely to have gotten there from exercise (B).
Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology 2004 Sep;15(5):615-25.
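The collider structure can be made concrete by enumerating a small joint distribution. This is a sketch with invented probabilities, not something shown in the presentation: A and B are independent by construction, and each raises the chance of Z (lost weight). The helper names are hypothetical.

```python
# Invented probabilities: A (caloric reduction) and B (exercise) are independent causes of Z.
P_A1 = 0.5   # Pr(A=1)
P_B1 = 0.5   # Pr(B=1)


def p_z1(a, b):
    """Pr(Z=1 | A=a, B=b): each cause adds 0.4 to the chance of weight loss."""
    return 0.1 + 0.4 * a + 0.4 * b


def joint(a, b, z):
    """Pr(A=a, B=b, Z=z) under the assumed structure A -> Z <- B."""
    pa = P_A1 if a else 1 - P_A1
    pb = P_B1 if b else 1 - P_B1
    pz = p_z1(a, b) if z else 1 - p_z1(a, b)
    return pa * pb * pz


def p_b1_given_a(a, z=None):
    """Pr(B=1 | A=a), or Pr(B=1 | A=a, Z=z) when z is given."""
    zs = (0, 1) if z is None else (z,)
    num = sum(joint(a, 1, zz) for zz in zs)
    den = sum(joint(a, b, zz) for b in (0, 1) for zz in zs)
    return num / den


for a in (0, 1):
    print(f"A={a}: Pr(B=1|A) = {p_b1_given_a(a):.3f}, "
          f"Pr(B=1|A,Z=1) = {p_b1_given_a(a, z=1):.3f}")
```

Marginally, knowing A says nothing about B. Among those with Z=1, people without A are more likely to have B than people with A, which is the dependence the slide describes.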

If you didn’t get to Z by one pathway, you are more likely to have gotten there via the other pathway. Here Z is clinical diagnosis, with smoking and a genetic mutation as its two causes.
[Figure: smoking → Clinical Diagnosis ← genetic mutation, shown without and with conditioning on Clinical Diagnosis.]
Cole SR, et al. Illustrating bias due to conditioning on a collider. Int J Epidemiol. 2010 Apr;39(2):417-20.

Banack HR, Kaufman JS. The "obesity paradox" explained. Epidemiology 2013; 24: 461-2.
Banack HR, Kaufman JS. The obesity paradox: understanding the effect of obesity on mortality among individuals with CVD. Prev Med 2014; 62: 96-102.
Banack HR, Kaufman JS. Does selection bias explain the obesity paradox among individuals with cardiovascular disease? Ann Epidemiol 2015 May;25(5):342-9.
Lajous M, Banack HR, Kaufman JS, Hernán MA. Should patients with chronic disease be told to gain weight? Am J Med 2015;128(4):334-6.

Canto et al. JAMA 2011;306(19):2120-2127.

5 major CHD risk factors: hypertension, smoking, dyslipidemia, diabetes, family hx of CHD.
[Figure annotation: 25% of original cohort]

Pre-hospital MI mortality is a collider, making risk factors associated with every U among those selected (S). Effect estimators will thus be biased if any U is not controlled. So, in contrast to previous descriptions, bias will exist even if MI hospitalization is not confounded.
[Figure: causal diagram with nodes Risk Factors, MI, Pre-Hosp Mortality, Hospitalization, Post Hosp Mortality, S, and U.]
Flanders WD, et al. A Nearly Unavoidable Mechanism for Collider Bias with Index-Event Studies. Epidemiology 2014 Sep;25(5):762-4.

Therefore, selection bias results in this example from:
• Recruiting MI hospitalized patients into the study when there are common unmeasured causes of MI hospitalization and mortality
• Removal of the frailest people via pre-hospital mortality (maybe around 30%?)
• Removal of 75% of the hospitalized cohort with prior CVD diagnosis or transfer within 30 days
This is easily enough to produce a paradoxical reverse association in which the risk factors erroneously appear protective, even if there is no individual whose risk is lowered by the presence of one of these factors.
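A paradoxical reversal of this kind can be reproduced with a toy calculation. The sketch below uses invented probabilities and hypothetical function names; it is not the analysis from the cited studies. A binary risk factor R raises mortality by 0.02 for everyone, an unmeasured U is independent of R in the full population, and both R and U raise the chance of being selected (S) into the hospitalized cohort. Mortality is assumed to depend on selection only through R and U.

```python
# Invented probabilities for illustration only.
P_U1 = 0.5   # Pr(U=1); U is independent of R before selection


def p_selected(r, u):
    """Pr(S=1 | R=r, U=u): chance of entering the hospitalized-MI cohort."""
    return 0.02 + 0.3 * r + 0.3 * u


def p_death(r, u):
    """Pr(Y=1 | R=r, U=u): R raises mortality by 0.02 in every stratum of U."""
    return 0.05 + 0.02 * r + 0.3 * u


def p_u1_among_selected(r):
    """Pr(U=1 | S=1, R=r), by Bayes' rule."""
    num = p_selected(r, 1) * P_U1
    den = num + p_selected(r, 0) * (1 - P_U1)
    return num / den


def mortality_among_selected(r):
    """Pr(Y=1 | S=1, R=r), averaging over U as distributed among the selected."""
    p_u1 = p_u1_among_selected(r)
    return p_death(r, 1) * p_u1 + p_death(r, 0) * (1 - p_u1)


true_rd = p_death(1, 0) - p_death(0, 0)   # same in both strata of U
apparent_rd = mortality_among_selected(1) - mortality_among_selected(0)
print(f"effect of R on mortality for every individual stratum: {true_rd:+.3f}")
print(f"risk difference observed among the selected:           {apparent_rd:+.3f}")
```

Among the selected, people without the risk factor are much more likely to carry U, so the observed contrast turns negative even though R is harmful in every stratum.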

Some additional selection bias structures:
[Figure: causal diagram with nodes Treatment, Symptoms, Censoring, Death, and U.]
Unmeasured variable U represents underlying disease severity, and those with more severe disease have a greater risk of death. Patients with more severe disease are more likely to be censored because they are unwell. Patients receiving treatment are at a greater risk of experiencing side effects, which also lead to drop-out.

Some additional selection bias structures:
[Figure: causal diagram with nodes Treatment, Symptoms, U, Censoring, and Death.]
In this variation of the previous structure, treatment and underlying severity both affect symptoms, which in turn affect drop-out. Censoring as a function of symptoms, which are affected by both treatment and U, creates the same conditional dependency.

Other mechanisms of selection bias:
• Differential loss to follow-up, also known as “informative censoring”
• Missing data bias, nonresponse bias: Censoring can represent missing data on the outcome for any reason, not just as a result of loss to follow-up.
• Healthy worker bias: Effect of an occupational chemical in a factory. Unmeasured illness is predictive of death and of missing work, but only subjects at work are recruited.
• Self-selection bias, volunteer bias
• Selection affected by treatment received before study entry (left-truncation)

Survival Produces an Unavoidable Selection Bias:
Start out with a randomized trial so that all covariates are balanced at time 0. Once events occur, if you condition your estimate on having survived to the next time point, every other cause of disease must now be correlated with exposure.
[Figure: two causal diagrams, at time 1 and time 2, each with nodes Treatment, Genetic Variant, and Death. The association between Treatment and Genetic Variant is marked "= 0" at time 1 and "= ?" at time 2.]
Flanders WD, Klein M. Properties of 2 counterfactual effect definitions of a point exposure. Epidemiology 2007; 18(4):453-60.

This is exactly why the HAZARD RATIO (the parameter estimated by a Cox Proportional Hazards Model) should not be used (unless the outcome is rare). The hazard of death at time 1 is the probability of dying at time 1. But the hazard at time 2 is the probability of dying at time 2 among those who survived past time 1. Treated survivors of time 1 differ in their distribution of U compared to untreated survivors of time 1, making this conditional measure confounded by U in a way that a marginal measure is not. This concern applies to both observational studies and randomized experiments.
[Figure: causal diagram with nodes Treatment, Y1, Y2, and U.]
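The drift in the period-specific hazard ratio can be seen in a two-period calculation for a randomized trial. This is a sketch under invented assumptions, not a result from the presentation: treatment doubles the per-period hazard in every stratum of an unmeasured binary U, and U quadruples it. The function names are hypothetical.

```python
# Invented numbers: randomized treatment X, unmeasured frailty U, two discrete time periods.
P_U1 = 0.5   # Pr(U=1); independent of X at time 0 because X is randomized


def hazard(x, u):
    """Per-period probability of death among those alive: doubled by X, quadrupled by U."""
    return 0.1 * (2 ** x) * (4 ** u)


def period_hazards(x):
    """Return (hazard at time 1, hazard at time 2 among survivors, Pr(U=1 | survived time 1))."""
    p_u = {0: 1 - P_U1, 1: P_U1}
    h1 = sum(hazard(x, u) * p_u[u] for u in p_u)
    alive = {u: (1 - hazard(x, u)) * p_u[u] for u in p_u}   # joint Pr(U=u, survived time 1)
    total_alive = sum(alive.values())
    h2 = sum(hazard(x, u) * alive[u] for u in alive) / total_alive
    return h1, h2, alive[1] / total_alive


h1_treated, h2_treated, u_treated = period_hazards(1)
h1_untreated, h2_untreated, u_untreated = period_hazards(0)
hr_time1 = h1_treated / h1_untreated
hr_time2 = h2_treated / h2_untreated
print(f"hazard ratio at time 1: {hr_time1:.2f}")
print(f"hazard ratio at time 2: {hr_time2:.2f}")
print(f"Pr(U=1) among survivors: treated {u_treated:.2f}, untreated {u_untreated:.2f}")
```

The stratum-specific hazard ratio is 2 at both times by construction, yet the time-2 ratio among survivors is attenuated because treated survivors carry less U than untreated survivors.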

Why do we continue to base inference on so many confusing studies that use highly selected samples, such as diagnosed patients? There is a simple design concept to avoid this mess…

An important step in eliminating “obesity paradox” and similar selection biases is just to ensure that the start of exposure and the start of follow-up coincide. That is exactly how we analyze randomized clinical trials: Nobody would ever propose an RCT that would select individuals free of disease 5 years after randomization and then compare the disease incidence between arms only from that point forward.

A simple rule of ensuring that the start of follow-up and initiation of treatment coincide is natural in RCTs, but often overlooked when analyzing observational studies. For example, widespread confusion about the cardiovascular effects of hormone therapy resulted from observational analyses that effectively ignored the first few years of follow-up by comparing prevalent users versus never users. Admittedly, this rule is hard to apply to exposures like obesity that lack a clear onset, but should be very clear for medical and pharmacological interventions. Hernán MA, Robins JM. Observational Studies Analyzed Like Randomized Experiments: Best of Both Worlds. Epidemiology 2008;19:789-92.

Then estimate risk of outcome at each follow-up time, without conditioning on survival up to that point (just comparing to the baseline denominator). Causal effect estimate is the difference between covariate-standardized survival curves at time t.
Hernán MA. The hazards of hazard ratios. Epidemiology 2010;21(1):13-5.
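One way to sketch this estimator is with a small table of counts. The cohort below is invented, has a single binary baseline covariate C, and has no censoring; the function name is hypothetical. Cumulative risk by time t always uses the number at baseline as the denominator, and the strata are standardized to the baseline distribution of C. The survival curve is one minus this risk, so the difference in risks equals the difference in survival up to sign.

```python
# Invented cohort: (treatment x, covariate c) -> (n at baseline, deaths in period 1, deaths in period 2)
COHORT = {
    (0, 0): (80, 8, 8),
    (1, 0): (20, 4, 4),
    (0, 1): (20, 8, 4),
    (1, 1): (80, 48, 16),
}


def standardized_risk(x, t):
    """Cumulative risk of death by time t under treatment x, standardized to baseline C."""
    n_total = sum(n for n, *_ in COHORT.values())
    risk = 0.0
    for c in (0, 1):
        n_c = sum(COHORT[(xx, c)][0] for xx in (0, 1))   # everyone with C=c at baseline
        n, *deaths = COHORT[(x, c)]
        cum_risk = sum(deaths[:t]) / n                   # baseline denominator, not survivors
        risk += cum_risk * n_c / n_total
    return risk


risk_differences = {t: standardized_risk(1, t) - standardized_risk(0, t) for t in (1, 2)}
for t, rd in risk_differences.items():
    print(f"time {t}: standardized risk difference = {rd:.2f}")
```

Because nobody is dropped from the denominator after time 1, the contrast at time 2 compares the same baseline populations as the contrast at time 1.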

To the extent that confounding and selection bias are due to measured covariates C, these can be handled by inverse probability weighting (IPTW, IPCW). This is especially convenient for longitudinal data in which the confounder C may be affected by previous treatment X_t and may in turn influence the next dose of treatment X_{t+1}. It is also helpful in the longitudinal setting where the remaining cohort at each time t becomes increasingly selected. Reweighting the cohort by measured characteristics allows remaining subjects to proxy for the ones that are missing.
[Figure: two causal diagrams, one with nodes C, X, Y, Z and one with nodes C, X, Y.]
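For a single time point, inverse probability of treatment weighting can be sketched directly from stratum counts. The counts and function names below are invented for illustration, and the example covers only a point treatment with one measured binary confounder C, not the longitudinal or censoring-weight (IPCW) case. Each subject is weighted by 1 / Pr(X=x | C=c) for the treatment actually received.

```python
# Invented counts: (covariate c, treatment x) -> (number of subjects, number with Y=1)
COUNTS = {
    (0, 0): (80, 8),
    (0, 1): (20, 6),
    (1, 0): (20, 10),
    (1, 1): (80, 56),
}


def propensity(x, c):
    """Pr(X=x | C=c), estimated nonparametrically from the counts."""
    n_c = COUNTS[(c, 0)][0] + COUNTS[(c, 1)][0]
    return COUNTS[(c, x)][0] / n_c


def iptw_risk(x):
    """Weighted risk of Y=1 among those with X=x, using weights 1 / Pr(X=x | C=c)."""
    weighted_events = weighted_n = 0.0
    for c in (0, 1):
        n, events = COUNTS[(c, x)]
        w = 1.0 / propensity(x, c)
        weighted_events += w * events
        weighted_n += w * n
    return weighted_events / weighted_n


def unweighted_risk(x):
    """Crude risk of Y=1 among those with X=x."""
    n = sum(COUNTS[(c, x)][0] for c in (0, 1))
    events = sum(COUNTS[(c, x)][1] for c in (0, 1))
    return events / n


iptw_rd = iptw_risk(1) - iptw_risk(0)
unweighted_rd = unweighted_risk(1) - unweighted_risk(0)
print(f"unweighted risk difference: {unweighted_rd:.2f}")
print(f"IPTW risk difference:       {iptw_rd:.2f}")
```

The weights create a pseudo-population in which C is balanced across treatment groups, so the weighted contrast matches what standardizing over C would give for these counts.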

Summary:
• Models are used to parameterize associations between treatment and response variables.
• Often, we want to interpret these associations causally (i.e., predicting the change in Y that would occur under some specific intervention on X).
• The validity of this causal interpretation is threatened by systematic and random errors.
• The systematic errors include confounding bias, which gets a lot of attention in training and practice.
• Information bias and selection bias are other important sources of systematic error, and should be considered more frequently and thoughtfully in design and analysis.