Sample vs. Actual Vote
                 Gallup Poll*   Actual vote
                 N = 2,263      N = 125,040,818
Obama            50%            53%
McCain           46%            46%
Other             4%             1%
*Based on a national survey of likely voters conducted in October
Which was present in the sample?
–Selection bias
–Measurement bias
–Sampling error
Perhaps we should conduct our elections by sampling?
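Sampling error can be sized with the usual normal-approximation margin of error for a proportion. This is a minimal sketch that assumes simple random sampling with no weighting or design effect. Gallup's published margin reflects its own methods, so treat the result only as a rough yardstick for comparing the sample with the actual vote.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.
    Assumes simple random sampling (no weighting, no design effect)."""
    return z * math.sqrt(p * (1 - p) / n)

# Obama's share in the Gallup sample of N = 2,263 likely voters
moe = margin_of_error(0.50, 2263)
print(f"50% +/- {100 * moe:.1f} percentage points")  # 50% +/- 2.1 percentage points
```

Each candidate's poll-versus-vote difference can be compared against this margin. Note that the margin shrinks only with the square root of N.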
Confounding and Interaction I
Confounding: one of the central problems in observational clinical research
–What is it? What does it do? What is its origin?
–What kinds of variables act as confounders?
–Which variables are not confounders (colliders and intermediary variables)?
–Use of causal diagrams (DAGs) to conceptualize confounding and to decide what to adjust for
»Emphasis on specifying the research question and understanding the underlying biological/clinical/behavioral system
»Confounding is a substantive, not a statistical, issue
Matches and Lung Cancer
A tobacco company researcher believes that exposure to matches is a cause of lung cancer
He conducts a large case-control study to test this hypothesis
Exposure odds ratio = (820/180) / (340/660) = disease odds ratio
OR = 8.8 (95% CI 7.2, 10.9)
Should we remove matches from the environment?
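The slide's arithmetic can be reproduced in a few lines. This sketch assumes the 2-by-2 layout implied by the formula: 820 exposed and 180 unexposed cases, and 340 exposed and 660 unexposed controls. It uses the Woolf (logit) confidence interval. The slide does not say which interval method was used, but this one reproduces the reported interval.

```python
import math

def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (logit) 95% CI.
    a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log_or)
    hi = math.exp(math.log(or_) + z * se_log_or)
    return or_, lo, hi

or_, lo, hi = odds_ratio_woolf(820, 180, 340, 660)
print(f"OR = {or_:.1f} (95% CI {lo:.1f}, {hi:.1f})")  # OR = 8.8 (95% CI 7.2, 10.9)
```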
Smoking, Matches, and Lung Cancer
[Tables: crude 2-by-2 table of matches and lung cancer, and the same table stratified into smokers and non-smokers; OR CF+ = OR among smokers, OR CF- = OR among non-smokers]
Stratification produces two 2-by-2 tables
In each stratum, all subjects are homogeneous with respect to smoking
We have adjusted, or controlled, for smoking
OR crude = 8.8 (7.2, 10.9)
OR smokers = 1.0 (0.6, 1.5)
OR non-smokers = 1.0 (0.5, 2.0)
OR adjusted = 1.0 (0.5, 2.0)
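The Mantel-Haenszel estimator is one common way to combine stratum-specific 2-by-2 tables into a single adjusted OR. The deck does not state how its adjusted OR was computed. The counts below are hypothetical, chosen so that matches have no effect within either smoking stratum. They show how a crude OR well above 1 can coexist with stratum-specific and pooled ORs of exactly 1.

```python
from typing import Sequence, Tuple

# Each table is (a, b, c, d):
#   a = cases exposed to matches,    b = cases not exposed
#   c = controls exposed to matches, d = controls not exposed
Table = Tuple[int, int, int, int]

def odds_ratio(t: Table) -> float:
    a, b, c, d = t
    return (a * d) / (b * c)

def mantel_haenszel_or(strata: Sequence[Table]) -> float:
    """Pooled (adjusted) OR: sum(a*d/n) / sum(b*c/n) across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical counts: smokers both use matches more often and are more often cases
smokers: Table = (90, 10, 45, 5)
non_smokers: Table = (5, 45, 10, 90)
crude = tuple(x + y for x, y in zip(smokers, non_smokers))  # collapse over smoking

print(f"crude OR    = {odds_ratio(crude):.2f}")                               # 2.98
print(f"smokers     = {odds_ratio(smokers):.2f}")                             # 1.00
print(f"non-smokers = {odds_ratio(non_smokers):.2f}")                         # 1.00
print(f"MH adjusted = {mantel_haenszel_or([smokers, non_smokers]):.2f}")      # 1.00
```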
Confounding: Smoking, Matches, and Lung Cancer
Illustrates how confounding can create an apparent effect even when no true effect exists
–The opposite can also occur: confounding can mask an effect that is truly present
Proper terminology
–In the relationship between matches and lung cancer, smoking is a confounding factor, or confounder
–Smoking confounds the relationship between matches and lung cancer
In clinical research, confounding has a very specific meaning
Estes continues to be confounding puzzle
Ray RATTO
SHAWN ESTES seemed loath to analyze his own performance last night, for fear that people would see the first three innings and use them to obscure the last four. But that's what made his outing so perfectly Estes-like -- an ongoing argument with himself that he eventually won. Well, an argument in which he held his own and his teammates won for him in the bottom of the ninth. Ramon Martinez lined a game-tying single with two outs, and Jeff Kent followed two batters later with a bases-loaded walk off Juan Acevedo to give the Giants a 2-1 victory against Colorado and move them to within 4 1/2 games of division leader Arizona. It was in many ways an eye-opening victory for a team that hadn't had one of this type for a while.
Finding: “After an initial course of post-exposure prophylactic (PEP) medication following a sexual exposure to HIV infection, gay men reported a decrease in the practice of high-risk behavior over the following year.” Reviewer: “Perhaps the men simply withheld the real amount of high-risk behavior they had in order to be eligible for future courses of PEP. How do you account for this confounding?”
The study is not over! To be complete, you decide to examine the relationship between smoking and lung cancer independent of the use of matches.
[Tables: crude 2-by-2 table of smoking and lung cancer, and the same table stratified into matches present (OR CF+) and matches absent (OR CF-)]
OR crude = 21.0 (16.4, 26.9)
OR matches present = 21.0 (10.7, 41.3)
OR matches absent = 21.0 (13.1, 33.6)
Confounding: Smoking, Matches, and Lung Cancer
Interpretation? What is the effect of matches on the relationship between smoking and lung cancer?
Matches have no effect on the relationship
This could have been predicted from the relationship between matches and lung cancer: once smoking is accounted for, matches are not associated with lung cancer
–Illustrates one important requirement for a confounder (aka a confounding factor): a confounder must be causally related to the outcome
Confounding: Examples of Magnitude and Direction
[Table: crude (unadjusted) OR compared with stratified (adjusted) ORs, with the potential confounder present (OR CF+) and absent (OR CF-)]
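Comparing the crude and adjusted estimates on the log scale shows both the magnitude and the direction of confounding. This sketch follows the labeling used by the examples in this deck. Confounding is called "positive" when the crude estimate is farther from the null than the adjusted one, as with matches. It is called "negative" when the crude estimate is closer to the null, as with AZT. The `tol` threshold is an arbitrary illustrative assumption, not a standard cutoff.

```python
import math

def confounding_direction(crude: float, adjusted: float, tol: float = 0.05) -> str:
    """Classify confounding by comparing crude and adjusted ratio measures on the log scale.
    tol is an arbitrary illustrative threshold, not a standard cutoff."""
    log_crude, log_adj = math.log(crude), math.log(adjusted)
    if abs(log_crude - log_adj) <= tol:
        return "none"
    if log_crude * log_adj < 0:
        # crude and adjusted estimates lie on opposite sides of the null (1.0)
        return "qualitative (crosses the null)"
    # positive: crude is farther from the null; negative: crude is closer to it
    return "positive" if abs(log_crude) > abs(log_adj) else "negative"

# Crude vs. adjusted ORs reported in this deck
for crude, adjusted in [(8.8, 1.0), (0.61, 0.30), (21.0, 21.0), (0.65, 0.80)]:
    print(crude, adjusted, confounding_direction(crude, adjusted))
```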
Nightlights Let there be light!
Nightlights and Myopia Quinn et al. Nature 1999 Prevalence Ratio =
Lights are off and the stumbling around begins.
Nightlights and Myopia
Two subsequent studies found no association and explained the prior result by confounding
–Zadnik et al. and Gwiazda et al., Nature 2000
[DAG: Night light → Child’s myopia (?)]
How might confounding account for this finding?
[DAG: Parental myopia → Night light; Parental myopia → Child’s myopia; the Night light → Child’s myopia arrow is marked X]
Positive or negative confounding? Positive
Let there be light (again)!
AZT to Prevent HIV After Needlesticks
Case-control study of whether post-exposure AZT use can prevent HIV seroconversion after a needlestick (NEJM 1997)
Crude: OR crude = 0.61 (95% CI: )
Interpretation? Could confounding be present?
[DAG: AZT use → HIV (?)]
[DAG: Severity of exposure → AZT use; Severity of exposure → HIV; AZT use → HIV (?)]
Positive or negative confounding?
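Adjustment for Confounder
Potential confounder: severity of exposure
[Tables: crude 2-by-2 table, and tables stratified into minor-severity and major-severity exposures]
OR crude = 0.61
OR adjusted = 0.30 (95% CI: 0.12 – 0.79)
Negative confounding: the crude OR is closer to the null than the adjusted OR, so confounding masked part of the protective effect
Confounding by indication: more severe exposures were more likely both to prompt AZT use and to lead to seroconversion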
Classification Schemes for Error in Clinical Research
Szklo and Nieto
–Bias
»Selection bias
»Information/measurement bias
–Confounding
–Chance
Other common approach
–Bias
»Selection bias
»Information/measurement bias
»Confounding
–Chance
Counterfactuals: Conceptualizing Why Confounding Occurs
Night lights and myopia
Ideal study: evaluate children exposed to night lights for several years and directly compare them to the SAME children not exposed to night lights
–The result (e.g., a risk ratio) is called the “effect measure” of night lights
–Assuming no measurement error, the effect measure must be true
However, because time has passed and the children are older, it is impossible to assess them without night lights
Hence, the ideal is “counterfactual” (contrary to fact). It is unobservable. It cannot happen.
[Figure: timeline of the same children, exposed to night lights vs. unexposed to night lights]
Counterfactuals: Conceptualizing Why Confounding Occurs
Gender and heart disease
Ideal study: follow men for several years for the occurrence of heart disease, and compare them directly to the SAME individuals who are now women
However, you cannot change a man into a woman, and you cannot go back in time
This “effect measure” is preposterous. It cannot be observed. It is counterfactual.
[Figure: timeline of the same individuals as men vs. as women]
Counterfactuals: Conceptualizing Why Confounding Occurs
Night lights and myopia
Because we cannot perform the counterfactual ideal (the SAME population studied under two conditions), we must evaluate TWO distinct populations (exposed to a night light and unexposed) to study the problem
–The result (e.g., a risk ratio) is a “measure of association”
The TWO distinct populations may be subject to influences OTHER than the night light
If these influences cause the disease under study, the difference between the SAME-population result (effect measure) and the TWO-population result (measure of association) is what is known as confounding
Confounding occurs because of a mixing of effects
[Figure: timeline of children exposed and unexposed to night lights, each group subject to other influences]
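A toy calculation with expected counts makes the gap between the effect measure and the measure of association concrete. Every number below is a hypothetical assumption. Parental myopia (the confounder proposed later for this example) is assumed to raise a child's risk and to make night-light use more likely, while night lights themselves have no causal effect. Each child therefore has the same risk under both conditions.

```python
# Hypothetical population (all numbers invented for illustration).
# Each stratum: (number of children, P(night light), risk if exposed, risk if unexposed)
# Risk depends only on parental myopia, so night lights have NO causal effect.
POPULATION = [
    (500, 0.8, 0.40, 0.40),  # children with myopic parents
    (500, 0.2, 0.10, 0.10),  # children without myopic parents
]

def effect_measure(pop):
    """Counterfactual risk ratio: the SAME children, all exposed vs. all unexposed."""
    n = sum(size for size, _, _, _ in pop)
    risk_all_exposed = sum(size * r1 for size, _, r1, _ in pop) / n
    risk_none_exposed = sum(size * r0 for size, _, _, r0 in pop) / n
    return risk_all_exposed / risk_none_exposed

def measure_of_association(pop):
    """Observed risk ratio: TWO distinct groups, children with vs. without night lights."""
    exposed_n = sum(size * p for size, p, _, _ in pop)
    exposed_cases = sum(size * p * r1 for size, p, r1, _ in pop)
    unexposed_n = sum(size * (1 - p) for size, p, _, _ in pop)
    unexposed_cases = sum(size * (1 - p) * r0 for size, p, _, r0 in pop)
    return (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)

print(f"effect measure         = {effect_measure(POPULATION):.2f}")          # 1.00
print(f"measure of association = {measure_of_association(POPULATION):.3f}")  # 2.125
```

The two groups are not exchangeable: the exposed group contains more children with myopic parents. That imbalance alone produces the spurious risk ratio.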
Striving for the Counterfactual
In the real (observable) world, all of our strategies in analytic studies strive to simulate the counterfactual
We strive for our TWO distinct populations (exposed and unexposed) to be “exchangeable”
Whenever the TWO distinct populations are “non-exchangeable”, confounding will be present
Our strategies to manage confounding are attempts to make our populations exchangeable
Back to the Observable (Factual) World: Criteria for Confounding
Confounding occurs because of mixing between the exposure of interest and unwanted extraneous effects
Extraneous effects are termed confounders
Criteria for a confounder
–Must be causally associated with the outcome, or be a surrogate for a causally related variable
–Must be associated with the exposure under study, but cannot be caused by the exposure
–Must not be on the causal pathway under study (i.e., must not be an intermediary variable)
Causal Diagrams: DAGs
[DAG: C → E; C → D; E → D (?)]
Research question: Does E cause D?
DAGs = directed acyclic graphs (a special case of the more general chain graphs)
–Consist of nodes (variables) and arrows
–“Directed”: every arrow points in one direction and depicts a causal relationship
–“Acyclic”: there is never a complete circle (i.e., no factor can cause itself)
Better than the rough criteria for confounding when planning studies and analyses
–Identify the pitfalls of adjusting and of not adjusting for certain variables
–Force the investigator to conceptualize the system
Frontier of epidemiologic theory
Confounding in a DAG
[DAG: C → E; C → D; E → D (?)]
Confounding occurs if there is a factor C that is a “common cause” of both E and D
C is the origin of a “backdoor path” between E and D (E ← C → D)
Adjusting/controlling for C closes the backdoor path and eliminates confounding
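Once the DAG is written down, searching for common causes is mechanical. This sketch uses hypothetical helpers, not a published tool. It represents a DAG as a list of (cause, effect) arrows and drops the arrows leaving the exposure, so that variables affecting the outcome only through the exposure are not counted. It returns every shared ancestor, including causes of a common cause. It does not say whether a measured variable is available to block the path. That judgment stays with the investigator.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)

def ancestors(edges: List[Edge], node: str) -> Set[str]:
    """All variables with a directed path into `node`."""
    parents: Dict[str, Set[str]] = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    found: Set[str] = set()
    stack = [node]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def common_causes(edges: List[Edge], exposure: str, outcome: str) -> Set[str]:
    """Shared ancestors of exposure and outcome, ignoring arrows out of the exposure."""
    backdoor_graph = [(c, e) for c, e in edges if c != exposure]
    return ancestors(backdoor_graph, exposure) & ancestors(backdoor_graph, outcome)

matches_dag = [("Smoking", "Matches"), ("Smoking", "Lung cancer"), ("Matches", "Lung cancer")]
mvi_dag = [
    ("Genetic factor", "History of birth defects"),
    ("History of birth defects", "Multivitamin use"),
    ("Genetic factor", "Birth defects"),
    ("Multivitamin use", "Birth defects"),
]
print(common_causes(matches_dag, "Matches", "Lung cancer"))        # {'Smoking'}
print(common_causes(mvi_dag, "Multivitamin use", "Birth defects"))  # {'Genetic factor'}
```

In the multivitamin DAG the common cause is unmeasured. The path is instead blocked by adjusting for history of birth defects, which lies on it.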
RQ: Do matches cause lung cancer?
[DAG: Smoking → Matches; Smoking → Lung cancer; Matches → Lung cancer (?)]
Smoking is a “common cause” of matches and lung cancer. It therefore confounds the relationship (positive confounding)
Controlling for smoking blocks the backdoor path and unconfounds the relationship
[DAG: Genetic factor (not measured) → History of birth defects → Multivitamin use; Genetic factor → Birth defects; Multivitamin use → Birth defects (?)]
The genetic factor is the “common cause” but cannot be measured or adjusted for
Adjusting for history of birth defects, which can be measured, blocks the path between the genetic factor and multivitamin (MVI) use, and prevents confounding
Threat: negative confounding
Hernan AJE 2002
[DAG: Safety-oriented personality (not measured) → Use of helmets in motorcyclists; Safety-oriented personality → Safe driving practices → Serious head injury; Use of helmets → Serious head injury (?)]
Threat: positive confounding
Adjusting for safe driving practices, which can (theoretically) be measured, blocks the path from safety-oriented personality to head injury
Attraction of DAGs
Abstract: the criteria
–Must be causally associated with the outcome, or be a surrogate for a causally related variable
–Must be associated with the exposure under study, but cannot be caused by the exposure
–Must not be on the causal pathway under study (i.e., must not be an intermediary variable)
More tangible: DAGs
–Draw the system
–Look for “common causes” of exposure and disease
[DAG: Genetic factor (not measured) → History of birth defects → Multivitamin use; Genetic factor → Birth defects; Multivitamin use → Birth defects (?)]
The Challenge
DAGs provide the framework
However, to identify the confounders, you need to be a subject matter expert
RQ: Does sexual activity increase lifespan?
[DAG: Sexual activity → Mortality (?)]
RQ: Does sexual activity increase lifespan?
[DAG nodes: Sexual activity, Mortality, Self-reported general health, Unknown biologic factor(s) (not measured); Sexual activity → Mortality (?)]
RQ: Do calcium channel blockers cause GI bleeding?
[DAG: Ca channel blockers → GI bleeding (?)]
RQ: Do calcium channel blockers cause GI bleeding?
[DAG nodes: Coronary artery disease, Other meds (e.g., aspirin), Ca channel blockers, GI bleeding; Ca channel blockers → GI bleeding (?)]
RQ: Does lack of folate cause birth defects?
[DAG: Folate intake → Birth defects (?)]
What should we do with stillbirths (spontaneous abortions)?
–Stillbirths are associated with folate intake, even among infants without birth defects: OR = 0.50
–Stillbirths are associated with birth defects: OR =
–Stillbirths are not on the causal pathway between folate and birth defects
In the past, other investigators have commonly adjusted for stillbirths in their analyses, or have limited their analyses to live births. Should we adjust for stillbirths here?
Slone Epidemiology Unit Birth Defects Study; Hernan AJE 2002
Adjustment for Stillbirths
[Tables: crude 2-by-2 table, and tables stratified into stillbirth and no stillbirth]
OR crude = 0.65 (95% CI 0.45 – 0.95)
OR adjusted = 0.80 (95% CI 0.53 – 1.2)
Apparent positive confounding
Public health implication: no reason for women to supplement their diet with folate
Slone Epidemiology Unit Birth Defects Study; Hernan AJE 2002
Use of DAGs to Identify What Is Not Confounding
RQ: Does lack of folate intake cause birth defects?
[DAG: Folate intake → Stillbirths ← Birth defects; Folate intake → Birth defects (?)]
Stillbirths are a “common effect” of both the exposure and the disease, not a common cause
Common effects are called “colliders”
Adjusting for colliders OPENS paths. This actually results in bias. It is harmful.
Hernan AJE 2002
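Collider bias can be demonstrated with exact probabilities instead of data. The numbers below are hypothetical and are not from the Slone study. Folate and birth defects are made independent, so the true OR is 1. Stillbirth is made more likely with a birth defect and less likely with folate. Restricting to live births (one stratum of the collider) then manufactures an association.

```python
# Hypothetical probabilities (invented for illustration, not from the Slone study).
P_FOLATE = 0.5   # P(adequate folate intake)
P_DEFECT = 0.1   # P(birth defect); independent of folate, so the true OR is 1
# P(stillbirth | folate, defect): defects raise the risk, folate lowers it
P_STILLBIRTH = {(0, 0): 0.05, (1, 0): 0.025, (0, 1): 0.50, (1, 1): 0.25}

def joint(folate: int, defect: int, stillbirth: int) -> float:
    """Joint probability of one (folate, defect, stillbirth) combination."""
    p_f = P_FOLATE if folate else 1 - P_FOLATE
    p_d = P_DEFECT if defect else 1 - P_DEFECT
    p_s = P_STILLBIRTH[(folate, defect)]
    return p_f * p_d * (p_s if stillbirth else 1 - p_s)

def folate_defect_or(stillbirth_values) -> float:
    """Folate-birth defect OR among pregnancies whose stillbirth status is in stillbirth_values."""
    def cell(f: int, d: int) -> float:
        return sum(joint(f, d, s) for s in stillbirth_values)
    return (cell(1, 1) * cell(0, 0)) / (cell(1, 0) * cell(0, 1))

print(f"all pregnancies  OR = {folate_defect_or([0, 1]):.2f}")  # 1.00
print(f"live births only OR = {folate_defect_or([0]):.2f}")     # 1.46
```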
DAGs to Identify What Is Not Confounding
[DAG: Behavioral factors (not measured) → Multivitamin use; Behavioral factors → Maternal weight gain; Genetic factor (not measured) → Maternal weight gain; Genetic factor → Birth defects; Multivitamin use → Birth defects (?)]
No common causes of exposure and disease
Maternal weight gain is a collider
Adjusting for colliders will OPEN the path. This actually results in bias. It is harmful.
Hernan AJE 2002
DAGs Force Investigators to First Conceptualize the System
Study of sunlight exposure and melanoma
A college intern is given a dataset and asked to estimate the relationship between sunlight exposure and melanoma, adjusted for “everything”
He analyzes the data and finds that gum chewing is associated with both melanoma and sunlight exposure
After adjusting for gum chewing, there is an appreciable difference between the crude and adjusted measures of association
Should gum chewing be controlled for?
No. By chance alone, there can be the appearance of confounding
Based on our a priori understanding of the role of gum chewing in melanoma, chance, as opposed to truth, is the more likely cause of the apparent confounding
Control for a variable only if there is strong subject matter evidence for doing so, i.e., if it is not in your DAG, don’t control for it
Rules for Reading DAGs
A path is blocked if
–a collider (“common effect”) is present and has not been adjusted for (by stratification, mathematical regression, or other techniques)
Or
–a non-collider (“common cause”) is adjusted for
To prevent confounding, block all of the non-causal paths between exposure and disease
[DAGs: Folate → Stillbirths ← Birth defects, Folate → Birth defects (?); Nightlights ← Parental myopia → Child’s myopia, Nightlights → Child’s myopia (?)]
Rules for Reading DAGs
A path is open only if
–every collider (“common effect”) on it has been adjusted for
And
–no non-collider (“common cause”) on it has been adjusted for
Open paths produce bias
[DAGs: Folate → Stillbirths ← Birth defects, Folate → Birth defects (?); Nightlights ← Parental myopia → Child’s myopia, Nightlights → Child’s myopia (?)]
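The blocking rules can be applied mechanically to every path between exposure and disease. This minimal sketch uses hypothetical helper names. A path counts as blocked when it has an unadjusted collider or an adjusted non-collider. One simplification is assumed: full d-separation also treats adjustment for a descendant of a collider as opening the path, and this sketch ignores that case. The hypothesized exposure-to-disease arrow (the "?") is left out of each DAG, so only the extraneous paths are examined.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)

def all_paths(edges: List[Edge], start: str, end: str) -> List[List[str]]:
    """Every path from start to end, ignoring arrow direction and never revisiting a node."""
    neighbours: Dict[str, Set[str]] = {}
    for cause, effect in edges:
        neighbours.setdefault(cause, set()).add(effect)
        neighbours.setdefault(effect, set()).add(cause)
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == end:
            paths.append(path)
            return
        for nxt in sorted(neighbours.get(node, set())):
            if nxt not in path:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths

def is_blocked(path: List[str], edges: List[Edge], adjusted: Set[str]) -> bool:
    """Apply the slide's rules to each interior node (descendants of colliders ignored)."""
    arrows = set(edges)
    for before, node, after in zip(path, path[1:], path[2:]):
        collider = (before, node) in arrows and (after, node) in arrows
        if collider and node not in adjusted:
            return True  # an unadjusted collider blocks the path
        if not collider and node in adjusted:
            return True  # an adjusted non-collider blocks the path
    return False

def open_paths(edges: List[Edge], exposure: str, outcome: str,
               adjusted: Iterable[str] = ()) -> List[List[str]]:
    """Paths between exposure and outcome that remain open given the adjustment set."""
    adj = set(adjusted)
    return [p for p in all_paths(edges, exposure, outcome) if not is_blocked(p, edges, adj)]

nightlight_dag = [("Parental myopia", "Nightlights"), ("Parental myopia", "Child's myopia")]
folate_dag = [("Folate", "Stillbirths"), ("Birth defects", "Stillbirths")]

print(open_paths(nightlight_dag, "Nightlights", "Child's myopia"))                       # one open path
print(open_paths(nightlight_dag, "Nightlights", "Child's myopia", ["Parental myopia"]))  # []
print(open_paths(folate_dag, "Folate", "Birth defects"))                                 # []
print(open_paths(folate_dag, "Folate", "Birth defects", ["Stillbirths"]))                # one open path
```

Adjusting for the common cause closes the nightlight path. Adjusting for the collider opens the folate path, which is the harm described for stillbirths.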
What Other Variables Are NOT Confounders?
“Must not be on the causal pathway under study (i.e., must not be an intermediary variable)”
A variable that you conceive of as an intermediate step in the causal pathway under study, between the exposure in question and the disease, is not a confounding variable
[DAG: E → Factor I → D]
Despite being associated with both exposure and outcome, Factor I is not a confounder
It is on the pathway under study. It is an intermediary variable
CCR5 and HIV Disease Progression
[DAG: CCR5 (receptor) defect → CD4 count → AIDS; CCR5 defect → AIDS (?)]
How should CD4 count be handled in assessing the association between CCR5 defect status and progression of HIV disease to AIDS?
–CCR5: a human cellular co-receptor for HIV, found on CD4 cells
–Genetic defects in CCR5 have now been described
–CD4 count is a potent predictor of time to AIDS
It depends upon the research question
#1: Do CCR5 defects reduce progression to AIDS, irrespective of mechanism?
[DAG: CCR5 defect → [CD4 count] → AIDS; CCR5 defect → [other mechanisms] → AIDS]
Do not adjust for CD4 count!
#2: Do CCR5 defects reduce progression to AIDS, independent of CD4 count?
[DAG: CCR5 defect → CD4 count → AIDS, stratified into low and high CD4 count; CCR5 defect → AIDS (?)]
Do adjust!
RQ 1: What if you did adjust for CD4 count?
#1: Is CCR5 associated with progression to AIDS, irrespective of mechanism?
[DAG: CCR5 defect → [CD4 count] → AIDS (?), stratified into low and high CD4 count]
If the pathway via CD4 count were the only pathway, no effect of CCR5 would be observed after stratification
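A toy model can check the slide's claim. The probabilities below are hypothetical and are not Taylor et al.'s data. The CCR5 defect raises the chance of a high CD4 count, and AIDS risk depends only on CD4 count, so the pathway via CD4 is the only pathway. The crude risk ratio shows a real protective effect. Stratifying on CD4 count erases it, which is why research question #1 must not adjust for CD4 count.

```python
# Hypothetical probabilities (invented; not Taylor et al.'s data).
# The CCR5 defect acts on AIDS risk ONLY through CD4 count.
P_DEFECT = 0.2
P_HIGH_CD4 = {1: 0.7, 0: 0.4}  # P(high CD4 | defect status)
P_AIDS = {1: 0.10, 0: 0.30}    # P(AIDS | high CD4 = 1, low CD4 = 0); no direct CCR5 term

def joint(defect: int, high_cd4: int, aids: int) -> float:
    """Joint probability of one (defect, CD4 stratum, AIDS) combination."""
    p_d = P_DEFECT if defect else 1 - P_DEFECT
    p_c = P_HIGH_CD4[defect] if high_cd4 else 1 - P_HIGH_CD4[defect]
    p_a = P_AIDS[high_cd4] if aids else 1 - P_AIDS[high_cd4]
    return p_d * p_c * p_a

def aids_risk(defect: int, cd4_strata=(0, 1)) -> float:
    """Risk of AIDS in one defect group, pooled over the given CD4 strata."""
    cases = sum(joint(defect, c, 1) for c in cd4_strata)
    total = sum(joint(defect, c, a) for c in cd4_strata for a in (0, 1))
    return cases / total

crude_rr = aids_risk(1) / aids_risk(0)
stratum_rr = {c: aids_risk(1, (c,)) / aids_risk(0, (c,)) for c in (0, 1)}
print(f"crude RR = {crude_rr:.2f}")  # 0.73
print(f"CD4-stratified RRs = {stratum_rr}")
```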
Taylor et al. JAIDS 2003
#1: CD4 not adjusted for
[DAG: CCR5 defect → AIDS (?)]
Crude (unadjusted) association: rate ratio 0.71
#2: CD4 adjusted for
[DAG: CCR5 defect → CD4 count → AIDS; CCR5 defect → other mechanism → AIDS (?)]
Stratified (adjusted) by CD4 count: rate ratio 0.93
–Conclusion: no mechanism other than via CD4 count
RQ: Does injection drug use (IDU) cause earlier mortality independent of its effect on hepatitis infections?
[DAG: IDU → Hep B and C virus infection → Early mortality; IDU → Early mortality (?) via bacterial infections]
Hep B/C infections are not “common causes”, but they do form another extraneous path from IDU to mortality; adjusting for Hep B/C blocks that path
RQ: Does poverty cause early mortality independent of its effects on diet?
[DAG: Poverty → Poor diet → Mortality; Poverty → Mortality (?) via access to care]
Adjust for diet to remove the extraneous pathway
Exercise and Coronary Heart Disease
RQ: Does exercise prevent coronary heart disease?
When evaluating the relationship between exercise and CHD, what should be done with HDL cholesterol?
[DAG: Exercise → HDL cholesterol → Coronary heart disease; Exercise → Coronary heart disease (?)]
It depends on the pathway under investigation
If interest is in a pathway other than through HDL, then HDL should be adjusted for
–Termed the “direct effect, independent of HDL”
[DAG: Exercise → HDL → CAD; Exercise → CAD (?) via a not-yet-specified mechanism]
Adjust for HDL to remove the extraneous pathway
Exercise and CAD
If no particular mechanistic pathway is being studied
–e.g., Does exercise influence CAD risk in a newly studied population (elderly Asians)?
Here, HDL as well as a variety of other mechanistic explanations are on the pathway in question
Therefore, HDL is an intermediary variable
[DAG: Exercise → [HDL + other mechanisms] → CAD]
Do not adjust for HDL
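DAGs Point Out a Special Issue When Estimating Direct Effects
RQ: Does aspirin prevent CHD through a pathway other than platelet aggregation?
–Assuming no common cause of platelet aggregation and CHD, it would be correct to adjust for platelet aggregation
[DAG: Aspirin → Platelet aggregation → Coronary heart disease; Aspirin → Coronary heart disease (?)]
But if
–we assume a common cause (e.g., a genetic component) of platelet aggregation and CHD, it would be incorrect to adjust OR not to adjust for platelet aggregation
–Adjusting conditions on a collider and opens Aspirin → Platelet aggregation ← Genetic factors → CHD; not adjusting leaves the indirect path through platelet aggregation in the estimate
–Other statistical methods are needed to resolve this
[DAG: as above, plus Genetic factors (not measured) → Platelet aggregation and Genetic factors → Coronary heart disease]
Cole and Hernan IJE 2002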
When Planning a Study, Which Factors Should Be Measured as Potential Confounders or Extraneous Pathways?
Draw a DAG
With previously studied exposures and diseases:
–consider/measure any factor that prior evidence indicates is a confounder
»e.g., effect of diet on CAD? Must deal with smoking as a potential confounder
When studying new exposures about which little is known:
–plan on measuring ALL factors associated with the disease
–i.e., if you don’t, you may regret it later
Confounding can be dealt with in the analysis phase of a study, but NOT if the factor was not measured
Seeking cause of high Marin cancer rates
Activists canvass residents to search for trends
Thousands of volunteers scattered across Marin County under baleful skies Saturday in an unprecedented grassroots campaign against the region's soaring cancer rate. Armed with surveys, some 2,000 volunteers went door to door in every neighborhood in the county.... The volunteers hope to collect enough money to hire an epidemiologist...