1
Causal Diagrams for Epidemiological Research
Eyal Shahar, MD, MPH
Professor, Division of Epidemiology & Biostatistics
Mel and Enid Zuckerman College of Public Health
The University of Arizona
2
What is it and why does it matter?
A tool (method) that:
- clarifies our wordy or vague causal thoughts about the research topic
- helps us to decide which covariates should enter the statistical model, and which should not
- unifies our understanding of confounding bias, selection bias, and information bias
3
What is the key question in a non-randomized study?
When estimating the effect of E (“exposure”) on D (“disease”), what should we adjust for?
or
Confounder selection strategy
4
Adjusting for Confounders: Common Practice
The “change-in-estimate” method:
1. List “potential confounders”
2. Adjust for (condition on) potential confounders
3. Compare adjusted estimate to crude estimate (or “fully adjusted” to “partially adjusted”)
4. Decide whether “potential confounders” were “real confounders”
5. Decide how much confounding existed
Premise: The data informs us about confounding.
Are we asking too much from the data?
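The decision rule in the steps above can be sketched in a few lines. This is only an illustration of the rule, not an argument for it: the function name `change_in_estimate`, the 10% cutoff, and the example numbers are assumptions that do not appear in the slides.

```python
def change_in_estimate(crude: float, adjusted: float, cutoff: float = 0.10):
    """Relative change from the crude to the adjusted estimate, plus the usual verdict.

    The 10% cutoff is an assumed convention, not a value taken from the slides.
    """
    if crude == 0:
        raise ValueError("relative change is undefined when the crude estimate is 0")
    relative_change = (adjusted - crude) / crude
    called_confounder = abs(relative_change) > cutoff
    return relative_change, called_confounder


# Hypothetical mean differences: crude vs. adjusted for one covariate.
change, verdict = change_in_estimate(crude=2.0, adjusted=1.5)
print(f"relative change: {change:+.0%}; labelled a 'real confounder': {verdict}")
```

The rule only reports that the estimate moved. It cannot say whether the adjusted estimate moved toward the truth or away from it.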
5
Adjusting for Confounders: Common Practice
What is “a potential confounder”?
Typically, “a cause of the disease that is associated with the exposure”
[Diagram: the triangle formed by Confounder, E, and D]
What is the effect of a confounder?
Contributes to the crude (observed, marginal) association between E and D
6
Adjusting for Confounders: Common Practice
Extension to multiple confounders
[Diagrams: six separate confounder triangles, one for each of C1 to C6, each drawn over its own E and D]
7
Adjusting for Confounders: Common Practice (Problems)
- A sequence of isolated, independent, causal diagrams, but C1, C2, C3, C4, C5, ... might be connected causally
- Unidirectional arrow = a causal direction, but what is the meaning of the bidirectional arrow?
- Even with a single confounder, the “change-in-estimate” method could fail
8
Adjusting for Confounders: Problems
An example where the “change-in-estimate” method fails
[Diagram with variables: U1, U2, C, E, D]
The crude estimate may be closer to the truth than the C-adjusted estimate
(To be explained)
9
Alternative: A Causal Diagram
- A method for selecting covariates
- Extension of the confounder triangle
- Premises displayed in the diagram
- New terms:
  - Path
  - Collider on a path
  - Confounding path
10
Selected references
- Pearl J. Causality: models, reasoning, and inference. Cambridge University Press
- Greenland S et al. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37-48
- Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology 2001;11:
- Hernan MA et al. A structural approach to selection bias. Epidemiology 2004;15:
- Shahar E. Causal diagrams for encoding and evaluation of information bias. J Eval Clin Pract (forthcoming)
11
A Causal Diagram: Notation and Terms
An arrow = causal direction between two variables
[Diagram: E → D]
An arrow could abbreviate both direct and indirect effects
[Diagram: the single arrow E → D could summarize a larger diagram that also contains U1, U2, and U3]
12
A Causal Diagram: Notation and Terms
A path between E and D: any sequence of causal arrows that connects E to D
[Diagrams: E and D joined directly, and three versions of a path joining E and D through U1 and U2]
13
A Causal Diagram: Notation and Terms
Circularity (self-causation) does not exist: Directed Acyclic Graph
[Diagram with variables: E, U1, D, U2]
A collider on the path between E and D
[Diagram: a path with E, U1, U2, D]
E and U2 collide at U1
14
A Causal Diagram: Notation and Terms
A confounding path for the effect of E on D: any path between E and D that meets the following criteria:
- The arrow next to E points to E
- There are no colliders on the path
[Diagram with variables: C, U1, V1, U2, V2, U3, E, D]
In short: a path showing a common cause of E and D
15
The paths below are NOT confounding paths for the effect of E on D
[Diagrams: three variants of the diagram with variables C, U1, V1, U2, V2, U3, E, D]
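The two criteria for a confounding path can be checked mechanically once the arrows are written down. The sketch below is a minimal illustration: the arrow list is an assumption, because the transcript lost the drawing (here C → U1 → U2 → U3 → E, C → V1 → V2 → D, and E → D), and the helper names are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (cause, effect)

# Assumed arrows for the slide's diagram; the transcript lost the drawing.
EDGES: List[Edge] = [
    ("C", "U1"), ("U1", "U2"), ("U2", "U3"), ("U3", "E"),
    ("C", "V1"), ("V1", "V2"), ("V2", "D"),
    ("E", "D"),
]


def all_paths(edges: List[Edge], start: str, end: str) -> List[List[str]]:
    """Every path from start to end that visits no variable twice,
    following arrows in either direction."""
    neighbours = {}
    for cause, effect in edges:
        neighbours.setdefault(cause, set()).add(effect)
        neighbours.setdefault(effect, set()).add(cause)
    found: List[List[str]] = []

    def walk(path: List[str]) -> None:
        if path[-1] == end:
            found.append(list(path))
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                walk(path + [nxt])

    walk([start])
    return found


def colliders_on(path: List[str], edges: List[Edge]) -> List[str]:
    """Variables on the path where both neighbouring arrows point inward."""
    arrows = set(edges)
    return [mid for left, mid, right in zip(path, path[1:], path[2:])
            if (left, mid) in arrows and (right, mid) in arrows]


def is_confounding_path(path: List[str], edges: List[Edge]) -> bool:
    """Criterion 1: the arrow next to the start points to it. Criterion 2: no colliders."""
    points_to_start = (path[1], path[0]) in set(edges)
    return points_to_start and not colliders_on(path, edges)


paths = all_paths(EDGES, "E", "D")
confounding = [p for p in paths if is_confounding_path(p, EDGES)]
for p in paths:
    kind = "confounding" if p in confounding else "not confounding"
    print(" - ".join(p), "->", kind)
```

Reversing the two arrows that leave C, so that U1 → C ← V1, makes C a collider, and the same path then fails the second criterion.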
16
What can affect the association between E and D?
(Why do we observe an association between two variables?)
- Causal path: E causes D (E → D)
- Causal path: D causes E (D → E)
- Confounding paths (C above E and D)
- Adjustment for colliders on a path from E to D (Later…)
17
Why does a confounding path affect the crude (marginal) association between E and D?
Intuitively:
- Association = being able to “guess” the value of one variable (D) from the value of another (E)
- E → D allows us to guess D from E (and E from D)
- A confounding path allows for sequential guesses along the path
[Diagram with variables: C, U1, V1, U2, V2, U3, E, D]
18
How can we block a confounding path between E and D?
Condition on a variable on the path (on any variable)
Methods for conditioning:
- Restriction
- Stratification
- Regression
[Diagram with variables: C, U1, V1, U2, V2, U3, E, D]
19
A point to remember
We don’t need to adjust for confounders (the top of the triangle). Adjustment for any U or V below will do.
U and V are surrogates for the confounder C
[Diagram with variables: C, U1, V1, U2, V2, U3, E, D]
20
Example
If the diagram below corresponds to reality, then we have several options for conditioning. For example:
- On C and U2
- Only on U2
- Only on U3
[Diagram with variables: C, U1, V1, U2, V2, U3, E, D]
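The options above can be checked with a one-line rule: a path without colliders is blocked as soon as any variable between its two ends is conditioned on. The sketch assumes the confounding path runs E, U3, U2, U1, C, V1, V2, D (the drawing is lost in the transcript); `is_blocked` is a hypothetical helper name.

```python
# The confounding path from the assumed diagram, written from E to D.
CONFOUNDING_PATH = ["E", "U3", "U2", "U1", "C", "V1", "V2", "D"]


def is_blocked(path, conditioned):
    """A collider-free path is blocked once any variable between its ends is conditioned on."""
    return any(variable in conditioned for variable in path[1:-1])


options = {
    "C and U2": {"C", "U2"},
    "only U2": {"U2"},
    "only U3": {"U3"},
    "nothing": set(),
}
results = {label: is_blocked(CONFOUNDING_PATH, chosen) for label, chosen in options.items()}
for label, blocked in results.items():
    print(f"conditioning on {label}: path blocked = {blocked}")
```

This shortcut applies only to paths with no colliders; a path that contains a collider needs the fuller rule discussed on the later slides.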
21
What can affect the association between E and D?
- Causal path: E causes D (E → D)
- Causal path: D causes E (D → E)
- Confounding paths (C above E and D)
- Adjustment for colliders on a path from E to D (NOW!)
22
Conditioning on a Collider: A Trap
A collider may be viewed as the opposite of a confounder
Collider and confounder are symmetrical entities, like matter and anti-matter
[Diagrams: a Confounder and a Collider, drawn with variables C, U1, V1, U2, V2, U3, E, D]
23
Conditioning on a Collider: A Trap
- A path from E to D that contains a collider is NOT a confounding path. There is no transfer of “guesses” across a collider.
- A path from E to D that contains a collider does NOT generate an association between E and D
- Conditioning on the collider, however, will turn that path into a confounding path.
Why?
24
Conditioning on a Collider: A Trap
[Diagram with variables: V1, U1, U2, V2, U3, E, D]
The horizontal line indicates an association (the possibility of “guesses”) that was induced by conditioning on a collider
25
Properties of a Collider: Intuitive Explanation
A dataset contains three variables for N cars:
- Brake condition (good/bad)
- Street condition in the owner’s town (good/bad)
- Involved in an accident in the owner’s town? (yes/no)
[Diagram: Brake condition (good, bad) → Accident (yes, no) ← Street condition (good, bad)]
Accident is a collider.
Brake condition and street condition are not associated in the dataset. We cannot use the data to guess one from the other.
26
Properties of a Collider: Intuitive Explanation
Why can’t we make a guess from the data? Let’s try.
Suppose we are told: Car A has good brakes and car B has bad brakes. This information tells us nothing about the street condition in each owner’s town.
Columns: Brake condition | Street condition
Car A: Good | ?
Car B: Bad
Intuition: a common effect (collider) does not induce an association between its causes (colliding variables)
27
Properties of a Collider: Intuitive Explanation
If, however, we condition (stratify) on the collider “accident”, we can make some guesses about the street condition from the brake condition.
Stratum #1: Accident = yes
Columns: Brake condition | Accident | Street condition
Car A: Good | Yes | Bad (a guess)
Car B: ?
28
Properties of a Collider: Intuitive Explanation
Similarly, in the other stratum
Stratum #2: Accident = no
Columns: Brake condition | Accident | Street condition
Car A: Good | No | (a guess)
Car B: Bad | ?
29
Properties of a Collider
In summary: conditioning on a collider creates an association between the colliding variables and, therefore, may open a confounding path
[Diagrams: before conditioning on C and after conditioning on C, each with variables U1, U2, C, E, D]
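The car example can be tried out with a small simulation. The sketch below is an illustration under invented assumptions: the probabilities (30% bad brakes, 30% bad streets, and the accident risks), the sample size, and the seed are all hypothetical and are not taken from the slides. Brake and street conditions are drawn independently, and both raise the chance of an accident.

```python
import random

rng = random.Random(2024)  # fixed seed so the run is repeatable
N = 200_000

cars = []
for _ in range(N):
    bad_brakes = rng.random() < 0.3
    bad_street = rng.random() < 0.3  # drawn independently of the brakes
    p_accident = 0.05 + 0.30 * bad_brakes + 0.30 * bad_street
    accident = rng.random() < p_accident  # accident is the common effect (collider)
    cars.append((bad_brakes, bad_street, accident))


def share_bad_street(rows, bad_brakes):
    """Proportion of cars with bad streets among cars with the given brake condition."""
    chosen = [street for brakes, street, _ in rows if brakes == bad_brakes]
    return sum(chosen) / len(chosen)


def brake_street_difference(rows):
    """P(bad street | bad brakes) - P(bad street | good brakes)."""
    return share_bad_street(rows, True) - share_bad_street(rows, False)


crude = brake_street_difference(cars)
in_accident = brake_street_difference([c for c in cars if c[2]])
no_accident = brake_street_difference([c for c in cars if not c[2]])

print(f"all cars:           {crude:+.3f}")
print(f"accident = yes:     {in_accident:+.3f}")
print(f"accident = no:      {no_accident:+.3f}")
```

In the full dataset the difference should sit near zero, while within each accident stratum it should be clearly negative: among cars that had an accident, good brakes make bad streets the more likely explanation, which is the "guess" described on the slides.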
30
Derivations
- The “change-in-estimate” method could fail if we condition on colliders, and thereby open confounding paths
- To (rationally) select covariates for adjustment, we must commit to a causal diagram (premises)
  (But we often say that we don’t know and can’t commit, and hope that the change-in-estimate method will work.)
- Causal inference, like all scientific inference, is conditional on premises (which may be false), not on ignorance
31
Derivations
Do not condition on colliders, if possible
If you condition on a collider:
- Connect the colliding variables by a line
- Check if you opened a new confounding path
- Condition on another variable to block that new path
[Diagrams: conditioning on C alone, and conditioning on C and (U1 or U2), each with variables U1, U2, C, E, D]
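The advice above follows from two blocking rules, which the sketch below applies to the path E, U1, C, U2, D. The arrows are an assumption, since the drawing is lost in the transcript: U1 → E, U1 → C, U2 → C, U2 → D, and E → D. The helper `is_blocked` is hypothetical and deliberately simplified: it does not handle conditioning on a descendant of a collider.

```python
# Assumed arrows for the slide's diagram (the drawing is lost in the transcript).
EDGES = {("U1", "E"), ("U1", "C"), ("U2", "C"), ("U2", "D"), ("E", "D")}
PATH = ["E", "U1", "C", "U2", "D"]


def is_blocked(path, edges, conditioned):
    """Apply the two blocking rules to every variable between the ends of the path.

    Simplification: conditioning on a descendant of a collider is not handled.
    """
    for left, mid, right in zip(path, path[1:], path[2:]):
        collider = (left, mid) in edges and (right, mid) in edges
        if collider and mid not in conditioned:
            return True  # an unconditioned collider blocks the path
        if not collider and mid in conditioned:
            return True  # a conditioned non-collider blocks the path
    return False


checks = {
    "nothing": set(),
    "C alone": {"C"},
    "C and U1": {"C", "U1"},
    "C and U2": {"C", "U2"},
}
status = {label: is_blocked(PATH, EDGES, chosen) for label, chosen in checks.items()}
for label, blocked in status.items():
    print(f"conditioning on {label}: path blocked = {blocked}")
```

Under these assumed arrows the path is blocked when nothing is conditioned on, opens when C alone is conditioned on, and is blocked again once U1 or U2 is added.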
32
Practical advice
- Study one exposure at a time
  A model that may be good for exposure A might not be good for exposure B (even if B is in the model)
- Never adjust for an effect of the exposure
- Never adjust for an effect of the disease
- Never select covariates by stepwise regression
- Never look at p-values to decide on confounding (actually, never look at p-values…)
33
Extension to other problems of causal inquiry
Causation always remains uncertain, even if we deal with a single confounder
Unbeknown to us, the reality happens to be: [Diagram with variables: U1, U2, C, E, D]
We draw: [Diagram with variables: C, E, D]
And naively condition on C
And our adjustment may fail
34
Extension to other problems of causal inquiry
Estimating the “direct” effect by conditioning on an intermediary variable, I
[Diagram with variables: I, E, D]
We should remember that variable I may be a collider
[Diagrams with variables: E, I, D, U]
35
Extension to other problems of causal inquiry
Causal diagrams explain the mechanism of selection bias
Example: What happens if we estimate the effect of marital status on dementia in a sample of nursing home residents?
Assume:
- no effect
- both variables affect “place of residence” (home, or nursing home)
36
Extension to other problems of causal inquiry
[Diagram: Marital status → Place of residence (home, nursing home) ← Dementia]
By studying a sample of nursing home residents, we are conditioning on a collider (on a “sampling collider”) and might create an association between marital status and dementia in that stratum
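A small exact calculation shows how such an association can appear. Every number below is invented for illustration (the prevalence of marriage and dementia, and the chance of living in a nursing home for each combination); none comes from the slides. Marital status and dementia are independent in the whole population, and both affect place of residence.

```python
from itertools import product

# Hypothetical population: all numbers below are invented for illustration.
P_MARRIED = 0.6
P_DEMENTIA = 0.1  # the same for married and unmarried people: no effect, no association
P_NURSING_HOME = {  # P(nursing home | married, dementia)
    (True, False): 0.02,
    (True, True): 0.30,
    (False, False): 0.10,
    (False, True): 0.40,
}


def cell_probability(married, dementia, in_nursing_home):
    """Joint probability of one combination of marital status, dementia, and residence."""
    p = P_MARRIED if married else 1 - P_MARRIED
    p *= P_DEMENTIA if dementia else 1 - P_DEMENTIA
    p_home = P_NURSING_HOME[(married, dementia)]
    return p * (p_home if in_nursing_home else 1 - p_home)


def odds_ratio(strata):
    """Odds ratio of dementia, married vs. unmarried, pooling the given residence strata."""
    cell = {
        (m, d): sum(cell_probability(m, d, s) for s in strata)
        for m, d in product([True, False], repeat=2)
    }
    return (cell[(True, True)] * cell[(False, False)]) / (
        cell[(True, False)] * cell[(False, True)]
    )


or_everyone = odds_ratio([True, False])  # whole population
or_nursing_home = odds_ratio([True])     # nursing home residents only
or_home = odds_ratio([False])            # people living at home only

print(f"whole population: {or_everyone:.2f}")
print(f"nursing home:     {or_nursing_home:.2f}")
print(f"home:             {or_home:.2f}")
```

With these invented numbers the odds ratio is 1 in the whole population and moves away from 1 inside each residence stratum, even though marital status has no effect on dementia.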
37
Extension to other problems of causal inquiry
[Diagram: Marital status, Dementia, and Place of residence (home, nursing home)]
“Stratification”
[Diagrams: one for the Home stratum and one for the Nursing home stratum, each with Marital status and Dementia]
38
Extensions: control selection bias (Source: Hernan et al, Epidemiology 2004)
Source cohort: no effect of estrogen (E) on MI (D)
S (0,1): selection into a case-control sample (S=1)
D → S, because disease status affects selection. Diseased members of the cohort are over-sampled (cases) relative to non-diseased (controls)
Suppose: F is hip fracture
Suppose: E → F
Suppose: Controls preferentially selected from women with hip fracture
[Diagram with variables: E (Estrogen), D (MI), F, S (0,1)]
39
Extensions: control selection bias (Source: Hernan et al, Epidemiology 2004)
[Diagram with variables: E (Estrogen), D (MI), F, S (0,1)]
S=0 (remainder of the source cohort)
S=1 (our case-control sample)
[Diagram: E (HRT) and D (MI)]
Association of E and D was created
40
Extensions: information bias (LAST EXAMPLE)
[Diagram with labels: Estrogen use; D (Endometrial cancer); D* (Diagnosed endometrial cancer); Z; Frequency of exams; Vaginal bleeding; one element is marked “?”]
41
Summary Points
- The “change-in-estimate” method could fail if we condition on colliders, and thereby open confounding paths
- The theory of causal diagrams extends the idea of a confounder to the multi-confounder case
- Unification of confounding bias, selection bias, and information bias under a single theoretical framework
42
“Back-door algorithm”
- Sufficient set for adjustment
- Minimally sufficient set
- Differential losses to follow-up
- Time-dependent confounders
- Interpretation of hazard ratios
- Conditioning on a common effect always induces an association between its causes, but this association could be restricted to some levels of the common effect
43
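[Diagram with variables: Age (young, old); Smoking drive (low, high); Sex; Physical activity; Asthma (yes, no); Smoking status; FEV1; two elements are marked “?”]
44
[Diagram with variables: Age (young, old); Smoking drive (low, high); Sex; Physical activity; Asthma (yes, no); Smoking status; FEV1; two elements are marked “?”]
45
[Diagram with variables: Age (young, old); Smoking drive (low, high); Sex; Physical activity; Asthma (yes, no); Smoking status; FEV1; two elements are marked “?”]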
46
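[Diagram with variables: Pneumonia; Ulcer; Hospitalization status (hospitalized, not hospitalized); Coughing; Abdominal Pain; several elements are marked “?”]
Stratification: other patients, hospitalized patients
[Diagram with variables: Pneumonia; Ulcer; Coughing; Abdominal Pain; a “?” appears between Coughing and Abdominal Pain]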
47
Example: Do men have higher systolic blood pressure than women?
(In other words: estimate the gender effect on systolic blood pressure)
The following table summarizes the answer to this question from two regression models

Association with gender | Mean SBP in men | Mean SBP in women | Mean difference (coefficient) | Inference
Marginal (crude) | 123.8 mmHg | 122.1 mmHg | +1.7 mmHg | Men’s SBP is higher
Conditional on (“adjusted for”) BMI and WHR | 121.2 mmHg | 124.3 mmHg | -3.1 mmHg | Women’s SBP is higher

So, which is the true estimate and which is biased?
48
[Diagram with variables: WHR, Gender, SBP, BMI, Z1, Z2]
49
[Diagram with variables: U, WHR, Gender, SBP, BMI, Z1, Z2]
50
[Diagram with variables: U, WHR, Gender, SBP, BMI, Z1, Z2]