1
DAGs intro with exercises 6h
DAGs = Directed Acyclic Graphs Hein Stigum courses Nov-18 H.S.
2
Agenda
Background
DAG concepts: association and cause; confounder and collider; paths
Analyzing DAGs: examples
More on DAGs
Exercises
Notes:
DAG concepts: define a few main concepts. Paths: surprisingly few rules are needed.
Analyzing DAGs: examples of confounders, intermediate variables and colliders; selection bias and information bias; randomization and Mendelian randomization.
These two parts serve as a manual for use.
More on DAGs: deeper thoughts and problems.
With exercises, the time needed is difficult to predict.
3
Why causal graphs?
Problem: association measures are biased.
Causal graphs help with:
Understanding: confounding, mediation, selection bias
Analysis: adjust or not
Discussion: precise statement of prior assumptions
4
Causal versus casual CONCEPTS
5
Precision and validity
Measures of populations:
precision - random error - statistics
validity - systematic error - epidemiology
(Figure: true value versus estimate, illustrating precision and bias.)
Notes: ignore errors at the individual level; we measure a population through a sample. Aim: measure as precisely and as correctly as possible. DAGs only consider bias.
6
DAG=Directed Acyclic Graph
god-DAG
Causal graph: node = variable, arrow = cause (leads to or causes)
E = exposure, D = disease, C and V = cofactor or variable, U = unmeasured
Directed = arrows; acyclic = nothing can cause itself
(Figure: the graph is drawn along a time axis.)
Reading off the DAG: causality = arrows; associations = paths; independencies = no paths
Estimation: the E-D association has two parts:
E→D causal effect: keep open
E←C→U→D bias: try to close
Conditioning (adjusting): E←[C]→U→D
7
Association and cause
An association has 3 possible causal structures (assume E precedes D in time).
Association: observed. Cause: inferred (requires extra knowledge).
A causal structure is imposed on the data.
These are the basic structures; with many more variables they generalize through paths and more complicated structures.
8
Confounder idea
A common cause: smoking causes (+) both yellow fingers and lung cancer.
Yellow fingers and lung cancer are therefore associated (+); adjusting for smoking removes the association.
A confounder induces an association between its effects.
Conditioning on a confounder removes the association.
Condition = restrict, stratify or adjust.
Simplest form: "+" (assume monotonic effects).
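The confounder idea can be checked by exact enumeration. The sketch below is a minimal illustration and not part of the original slides: all probabilities are hypothetical, chosen only so that smoking raises both yellow fingers and lung cancer while yellow fingers have no effect on lung cancer.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical probabilities (assumptions for illustration only).
P_SMOKE = F(3, 10)
P_YELLOW = {1: F(8, 10), 0: F(1, 10)}   # P(yellow fingers | smoking status)
P_CANCER = {1: F(2, 10), 0: F(2, 100)}  # P(lung cancer | smoking status)

def bern(p, value):
    """Probability that a binary variable with P(1) = p takes `value`."""
    return p if value == 1 else 1 - p

# Joint distribution over (smoke, yellow, cancer); yellow and cancer
# depend on smoking only, never on each other.
joint = {
    (s, y, c): bern(P_SMOKE, s) * bern(P_YELLOW[s], y) * bern(P_CANCER[s], c)
    for s, y, c in product((0, 1), repeat=3)
}

def risk(yellow, smoke=None):
    """P(cancer = 1 | yellow), optionally restricted to one smoking stratum."""
    rows = {k: p for k, p in joint.items()
            if k[1] == yellow and (smoke is None or k[0] == smoke)}
    return sum(p for k, p in rows.items() if k[2] == 1) / sum(rows.values())

crude_rd = risk(1) - risk(0)                               # unadjusted
adjusted_rd = {s: risk(1, s) - risk(0, s) for s in (0, 1)}  # stratified on smoking

print("crude RD:", float(crude_rd))
print("RD within smoking strata:", adjusted_rd)
```

The crude risk difference is positive although yellow fingers do nothing to lung cancer here, and it is exactly zero within each smoking stratum.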
9
Collider idea
Two causes for selection to study: yellow fingers (+) and lung cancer (+) both lead to being selected.
Among the selected subjects, yellow fingers and lung cancer become associated.
Conditioning on a collider induces an association between its causes.
"And" and "or" selection lead to different bias.
Simplest form: "+" (assume monotonic effects).
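The collider idea can also be verified by exact enumeration. The following sketch is an illustration under stated assumptions, not the author's example: two causes that are independent in the population, hypothetical prevalences, and a deterministic "or" selection rule.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical prevalences; X and Y are independent in the population.
P_X = F(3, 10)  # first cause of selection
P_Y = F(1, 10)  # second cause of selection

def bern(p, value):
    """Probability that a binary variable with P(1) = p takes `value`."""
    return p if value == 1 else 1 - p

population = {(x, y): bern(P_X, x) * bern(P_Y, y)
              for x, y in product((0, 1), repeat=2)}

def selected_or(x, y):
    # "Or"-selection: either cause alone is enough to enter the study.
    return x == 1 or y == 1

def risk_difference(cells):
    """P(Y=1 | X=1) - P(Y=1 | X=0) within the given cells."""
    def risk(x_value):
        total = sum(p for (x, _), p in cells.items() if x == x_value)
        cases = sum(p for (x, y), p in cells.items() if x == x_value and y == 1)
        return cases / total
    return risk(1) - risk(0)

# Conditioning on the collider: keep only the selected subjects.
study = {cell: p for cell, p in population.items() if selected_or(*cell)}

rd_population = risk_difference(population)
rd_selected = risk_difference(study)
print(rd_population, rd_selected)
```

With these numbers the causes are unassociated in the population, while among the selected subjects a strong negative association appears.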
10
Data driven analysis
- Want the effect of E on D (E precedes D)
- Observe the two associations C-E and C-D
- Assume the criteria dictate adjusting for C (likelihood ratio, Akaike (Hirotugu Akaike, 赤池 弘次) or 10% change in estimate)
The undirected graph of C, E and D is compatible with three DAGs:
Confounder: 1. adjust
Mediator: 2. direct effect: adjust; 3. total effect: do not adjust
Collider: 4. do not adjust
Conclusion: the data driven method is correct in 2 out of 4 situations.
We need information from outside the data to do a proper analysis.
11
Paths
The Path of the Righteous
Ezekiel 25:17. "The Path of the Righteous Man Is Beset on All Sides by The inequities of the Selfish and the Tyranny of Evil Men." (Pulp Fiction version)
12
Path definitions
Path: any trail from E to D (without repeating itself)
Type: causal, non-causal
State: open, closed
Four paths:
1 E→D
2 E→M→D
3 E←C→D
4 E→K←D
Notice: a path may go with or against the arrows.
Paths show potential association.
Goal: keep causal paths of interest open; close all non-causal paths.
13
Four rules
1. Causal path: E→D (all arrows in the same direction), otherwise non-causal
Before conditioning:
2. Closed path: →K← (closed at a collider), otherwise open
Conditioning on:
3. a non-collider closes the path: [M] or [C]
4. a collider opens the path: [K] (or a descendant of a collider)
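The four rules are mechanical enough to write down as code. The sketch below is one possible implementation, not taken from the slides: the function names are hypothetical, and the graph is the four-path example from the path-definitions slide (E→D, E→M→D, E←C→D, E→K←D).

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)

def descendants(node: str, edges: Set[Edge]) -> Set[str]:
    """All variables reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found

def simple_paths(start: str, goal: str, edges: Set[Edge]) -> List[List[str]]:
    """Every trail from start to goal, with or against the arrows, no repeats."""
    neighbours = {}
    for parent, child in edges:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)
    paths = []
    def walk(path):
        if path[-1] == goal:
            paths.append(path)
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                walk(path + [nxt])
    walk([start])
    return paths

def is_causal(path: List[str], edges: Set[Edge]) -> bool:
    """Rule 1: causal if all arrows point from start towards goal."""
    return all((a, b) in edges for a, b in zip(path, path[1:]))

def is_open(path: List[str], edges: Set[Edge], conditioned: Set[str]) -> bool:
    """Rules 2-4: check every intermediate variable on the path."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edges and (nxt, mid) in edges
        if collider:
            # Closed unless the collider or one of its descendants is conditioned on.
            if mid not in conditioned and not (descendants(mid, edges) & conditioned):
                return False
        elif mid in conditioned:
            # Conditioning on a non-collider closes the path.
            return False
    return True

DAG = {("E", "D"), ("E", "M"), ("M", "D"), ("C", "E"), ("C", "D"),
       ("E", "K"), ("D", "K")}

def analyze(conditioned=frozenset()):
    """Type and state of every E-D path for a given conditioning set."""
    return {"-".join(p): ("causal" if is_causal(p, DAG) else "non-causal",
                          "open" if is_open(p, DAG, set(conditioned)) else "closed")
            for p in simple_paths("E", "D", DAG)}

for name, (kind, state) in analyze().items():
    print(name, kind, state)
```

Calling `analyze({"C"})` closes the path through C, while `analyze({"K"})` opens the path through the collider K, in line with rules 3 and 4.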
14
ANALYZING DAGs
15
Confounding examples
Notes: informal, no strict notation or definitions. Casual about the causal!
16
Vitamin and birth defects
Is there a bias in the crude E-D effect? Should we adjust for C? What happens if age also has a direct effect on D?
Unconditional:
Path  Type  Status
1 E→D  Causal  Open
2 E←C→U→D  Non-causal  Open
Result: bias. This is an example of confounding.
Conditioning on C:
Path  Type  Status
1 E→D  Causal  Open
2 E←[C]→U→D  Non-causal  Closed
Result: no bias.
Notes: a non-causal open path is a biasing path. Both C and U are confounders. Is it a problem that we have "forgotten" the arrow C→D?
17
Exercise: Physical activity and Coronary Heart Disease (CHD)
We want the total effect of physical activity on CHD. What should we adjust for?
Note: a non-causal open path is a biasing path.
5 minutes
18
Intermediate variables
Direct and indirect effects
19
Exercise: Tea and depression
Write down the paths.
You want the total effect of tea on depression. What would you adjust for?
You want the direct effect of tea on depression. What would you adjust for?
Is caffeine an intermediate variable or a confounder?
Background: tea and depression: Finnish study. Caffeine reduces depression: Nurses' Health Study.
10 minutes
20
Exercise: Statin and CHD
Write down the paths.
You want the total effect of statin on CHD. What would you adjust for?
If lifestyle is unmeasured, can we estimate the direct effect of statin on CHD (not mediated through cholesterol)?
Is cholesterol an intermediate variable or a collider?
Variables: E = statin, C = cholesterol, U = lifestyle, D = CHD
Statin: lipid (cholesterol) lowering drug
10 minutes
21
Direct and indirect effects
(Figure: DAG with exposure E, mediator M, disease D and the variables U1, U2 and U3.)
Total effect: no unmeasured U1
Direct and indirect effects (linear model): no unmeasured U2, no unmeasured U3
22
Mixed
Confounder, collider and mediator
23
Diabetes and Fractures
We want the total effect of diabetes (type 2) on fractures.
Unconditional:
Path  Type  Status
1 E→D  Causal  Open
2 E→F→D  Causal  Open
3 E→B→D  Causal  Open
4 E←V→B→D  Non-causal  Open
5 E←P→B→D  Non-causal  Open
Conditional:
Path  Type  Status
1 E→D  Causal  Open
2 E→F→D  Causal  Open
3 E→B→D  Causal  Open
4 E←[V]→B→D  Non-causal  Closed
5 E←[P]→B→D  Non-causal  Closed
Paths 2 and 3 run through mediators; paths 4 and 5 run through confounders.
Questions: Paths ←→? More paths (E-B-P-E-D)? Two (or three) arrows collide in B; is B a collider? Are V and P independent?
Notes: diabetes → eye disease → fall; could also have eye disease → physical activity. Type 2 diabetes reduces bone density; BMI increases bone density.
24
Selection bias
25
Selection bias: concept 1
(Figure: variables age, smoke and CHD.)
Properties:
- Needs heterogeneity of the smoke effect
- Cannot be adjusted for
- True RR = weighted average of stratum effects
Name: interaction based?
26
Selection bias: concept 2
(Figure: variables sex, age, smoke and CHD, with selection S.)
Path  Type  Status
smoke→CHD  Causal  Open
smoke←sex→[S]←age→CHD  Non-causal  Open
Properties:
- Does not need heterogeneity
- Can be adjusted for (sex and age)
- (Some counterintuitive results)
Name: collider stratification bias
Hernan et al, A structural approach to selection bias, Epidemiology 2004
27
Diabetes and Fractures
1. Population based:
Path  Type  Status
1 E→D  Causal  Open
2 E→H←D  Non-causal  Closed
OK
2. Convenience: conduct the study among hospital patients?
Path  Type  Status
1 E→D  Causal  Open
2 E→[H]←D  Non-causal  Open
3. Homogeneous sample: exclude hospital patients
Collider stratification bias: at least one stratum is biased
28
Selection examples (collider stratification)
"Or"-selection: diabetes, fractures → hospital (figure: sufficient causes for hospital)
"And"-selection: diabetes, fractures → hospital (figure: sufficient causes for hospital)
(Figure: the resulting bias between E and D.)
29
Exercise: Survivor bias
Study the effect of exposure early in life (E) on later disease (D) among survivors (S).
Early exposure decreases survival.
A risk factor (R) increases later disease (D) and reduces survival (S).
Draw and analyze the DAG.
10 minutes
30
Milk intake and bone density
Study the effect of milk on bone density. Exclude users of calcium supplements?
E = milk, D = bone density, S = calcium supplements
Path  Type  Status
1 E→D  Causal  Open
2 E→[S]←D  Non-causal  Open
With C = family history added:
2 E→[S]←[C]→D  Non-causal  Closed
Structure: collider stratification
Lessons learned: the biological effect is not protected; we may adjust for selection bias.
31
Examples
Differential response. Survey: alcohol and CHD. E = alcohol, D = CHD, S = respond, C = education
Differential loss to follow up. Randomized trial: drug and disease. E = drug, D = disease, S = loss to follow up, C = smoking
Notes:
1. Survey of alcohol on CHD: heavy drinkers do not respond; sex, age and education are common factors for response and disease.
2. Randomized drug trial: side effects of the drug cause loss to follow up; smoking is common to loss and disease.
3. Cross-sectional study: dust sensitive workers change work; health is common to work status and (less) lung disease. The effect found among the workers is not true!
No confounding: for simplicity (1 and 3) or by randomization (2).
Major selection biases: all have the collider structure! All cases could be adjusted to remove selection bias.
Hernan et al, A structural approach to selection bias, Epidemiology 2004
32
Exercise: Dust and lung disease
Background: We are interested in the effect of dust and heat exposure on lung disease, and do a cross-sectional study among workers in aluminum melt halls.
Assumptions: Workers who are sensitive to dust and heat are more likely to leave melt hall work. Subjects with general good health (genes) are more likely to keep the job, and less likely to develop lung disease.
Draw and analyze the DAG.
What type of bias do you get?
Could the "interaction based" selection bias also operate here?
15 minutes
33
Information bias
34
Depicting measurement error
a) Error in E: E = true exposure, E* = measured exposure, UE = process giving error in E
b) Error in E and D: D = true disease, D* = diagnosed disease, UD = process giving error in D
Example: E = fat intake, E* = fat intake from questionnaire; D = infarctions, D* = diagnosed infarctions
a) and b): can test H0
b) shows independent non-differential errors
35
Dependent errors, differential errors
a) Dependent errors: the E and D measures both depend on the temperature in the lab
b) Differential error: alcohol in pregnancy and malformations (recall bias in a case-control study)
c) Differential error: air pollution and asthma (investigator bias in a cohort study)
No assumption of additive error or of a linear effect of E on D is needed to explain the concepts, but such assumptions are needed to estimate the effect of the errors.
Hernan and Cole, Causal Diagrams and Measurement Bias, AJE 2009
36
Randomized experiments Mendelian randomization
37
Strength of arrow
Not deterministic: (figure: C1, C2 and C3 point into E, and E → D; C1, C2, C3 are exogenous; C collects them)
Randomization with full compliance: R → E → D, where R → E is deterministic; no E-D confounding
Randomization without full compliance: R → E → D with U affecting E and D; weak E-D confounding, but R-D is unconfounded
Path  Type  Status
1 R→E→D  Causal  Open
2 R→E←U→D  Non-causal  Closed
Sub-analysis conditioning on E may lead to bias
38
Randomized experiments
Observational study
Randomized experiment with full compliance: R = randomized treatment, E = actual treatment, R = E
Randomized experiment with less than full compliance (c)
(Figure labels: c on R → E, b on E → D, a for the effect of R on D.)
Intention To Treat (a): effect of R on D (unconfounded); population
Per Protocol: crude effect of E on D (confounded by U)
Instrumental Variable (b): adjusted effect of E on D (if c is known, 2SLS); individual
If linear model: a = c*b, c < 1
Notes: There is no way of drawing that R = E or that R has a strong effect on E. If R has a strong effect on E, then U must have a weak effect, that is, little confounding. The ITT effect is weaker than the IV effect; it is a combination of behavior (compliance) and biology.
39
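Randomized Controlled Trial example
Table 1 (by R; columns +, -, RD): row 1: 93, 7; row 2: 100; RD = 0.93 = c
Table 2 (D by R; columns +, -, RD): row 1: 43, 57; row 2: 62, 38; RD = -0.20 = ITT
Table 3 (D by E; columns +, -, RD): row 1: 41, 59; row 2: 63, 37; RD = -0.22 = PP
IV = ITT / c = -0.21
(Figure labels: c, IV, ITT; values -0.12 and 0.09.)
Stata command for IV: ivregress (2SLS)
True value
Negative bias from confounding
ITT always weaker than IV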
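The relation a = c*b from the linear model can be turned around to give the IV estimate b = a / c. The short sketch below only illustrates that arithmetic; the function name is hypothetical, and the inputs are the rounded ITT and c values printed on the slide, so the result differs slightly from the slide's -0.21.

```python
def wald_iv(itt: float, compliance: float) -> float:
    """IV estimate b from a = c * b, i.e. b = a / c (linear model assumed)."""
    if compliance == 0:
        raise ValueError("compliance c must be non-zero")
    return itt / compliance

itt = -0.20        # rounded slide value: risk difference of D by R
compliance = 0.93  # rounded slide value: c
iv = wald_iv(itt, compliance)
print(round(iv, 3))  # -0.215
```

Because c < 1, dividing by c always moves the estimate away from zero, which is why the ITT effect is weaker than the IV effect.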
40
Mendelian randomization
Observational study: suffers from unmeasured confounding U
Randomized trial: 3 conditions
1. R affects E: balanced, strong effect
2. No direct R-D effect: R independent of D | E
3. R and D have no common causes: R independent of U
Notes: 5% non-compliance gives RR=20 for the R-E association. (With full compliance, we can draw the do-operator.) What happens with the R-D association if compliance drops?
Mendelian randomization: 3 conditions
1. G must affect E: unbalanced, weak; large N
2. No direct G-D effect: depends on gene function
3. G and D have no common causes: Mendel's 2nd law
Sheehan et al, Mendelian Randomisation and Causal Inference in Observational Epidemiology, PLoS Med 2008
41
Ex: Alcohol and blood pressure
Observational study: alcohol use increases blood pressure (BP); many "lifestyle" confounders (U)
Gene: ALDH2 (aldehyde dehydrogenase 2), 2 alleles. The 2,2 type suffers nausea and headache after alcohol, so alcohol use is low regardless of lifestyle (U).
Mendelian randomization:
1. Gene ALDH2 is highly associated with alcohol
2. OK, gene function is known
3. Mendel's 2nd law, no association with observed confounders
Result: 1,1 type BP +7.4 mmHg. Alcohol increases blood pressure.
Notes: The gene is common in Japan, RR=6-7 for alcohol use. The gene is not associated with age, smoking, BMI or cholesterol; it could be associated with coffee. Gene function does not affect blood pressure. Result: meta-analysis by Chen et al. 2008, alcohol leads to higher blood pressure (1,1 versus 2,2 = +7.4 mmHg).
Chen et al, Alcohol Intake and Blood Pressure, PLoS Med 2008
42
MORE ON DAGs
43
Methods of adjusting
Method: action (the effect of each method on the DAG of E, D and C was shown in figures)
Conditioning, stratification: close path
Matching in cohort: remove arrow
Matching in case-control: remove arrow
Inverse probability weighting: ~ matching
Notes: stratification is non-parametric adjustment; regression is parametric adjustment. MSM, NSM ← parametric → regression
44
Confounding versus selection bias
Path: any trail from E to D (without repeating itself)
Open non-causal path = biasing path
Confounding and selection bias are not always distinct.
We may use the DAG to give distinct definitions (figures: DAGs with variables A, B, C, E and D):
Causal: causal path from E to D
Confounding: non-causal path without colliders
Selection bias: non-causal path open due to conditioning on a collider
Note: interaction based selection bias is not included.
Hernan et al, A structural approach to selection bias, Epidemiology 2004
45
Exercise: M-structure
(Figure: DAG with variables A, B, C, E and D.)
Show the paths.
Should we adjust for C?
If the design implies a selection on C, what would you call the resulting bias: selection bias or confounding?
5 minutes
46
Variables and arrows
Variable: at least two values
Arrow (E → D): cause; almost any causal definition will work. Usually on the individual level: "at least one subject with an effect of the exposure".
Arrow from age (?): only possible on group level
Arrow (E → D): the dose response can be linear, threshold, U-shaped or any other (DAGs are non-parametric)
47
Causal graphs: definitions
Graph showing causal relations and conditional independencies between variables
G = {V, E}
Vertices = random variables
Edges = associations or cause; edges are undirected or directed (→)
Path: sequence of connected edges, for example [(L,A),(A,Y)]
Parent → child
Ancestors → descendants
Exogenous: variables with no parents (L, U)
(Figure: graph with variables L, A, Y and U.)
48
Directed Acyclic Graphs
Ordinary DAG: arrows = associations
Causal DAG: arrows = cause; all common causes of any pair of variables in the DAG are included
Two types of variables:
Immutable: sex, age
Mutable: exposure (actions), smoking
Mixing the two types of variables in a DAG is OK: all dependence and independence conclusions remain valid.
(Figure: graph with variables L, A, Y and U.)
49
DAGs and probability theory
Two assumptions:
Causal Markov assumption: factorize distributions
Faithfulness assumption: no perfect cancellations of effects
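"Factorize distributions" can be made concrete. Under the causal Markov assumption, the joint distribution splits into one factor per variable given its parents in the DAG. The notation below (V_i for the variables, pa(V_i) for the parents of V_i) is introduced here for illustration and does not appear on the slide:

```latex
P(V_1, \dots, V_n) = \prod_{i=1}^{n} P\bigl(V_i \mid \mathrm{pa}(V_i)\bigr)
```

For a chain L → A → Y, for example, this reads P(L, A, Y) = P(L) P(A | L) P(Y | A).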
50
D-separation, moralization
D-separation = directed graph-separation
Two variables are d-separated if there is no open path between them, otherwise they are d-connected.
2 DAG analyses: paths (Pearl) and moralization (Lauritzen); the two are equivalent.
51
3 strategies for estimating causal effects
Back-door criterion: condition to close all non-causal paths (between E and D). (Figure: E, D and C.)
Front-door criterion: condition on all intermediate variables (between E and D). (Figure: E, D, U, M1 and M2.)
Instrumental variables: use an IV to control the effect (of E on D). (Figure: IV, E, D and U.)
IV criteria:
1. IV must affect E
2. No direct IV-D effect
3. IV and D have no common causes
Pearl 2009, Glymour and Greenland, 2008
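For reference, the first two strategies correspond to standard identification formulas from the causal-graph literature (Pearl). They are written here with the slide's symbols E, D, C and a single mediator set M; the do-notation and the summation indices are additions for illustration, and the formulas hold only when the respective criterion is satisfied.

```latex
% Back-door: C closes all non-causal paths between E and D
P\bigl(D \mid \mathrm{do}(E=e)\bigr) = \sum_{c} P(D \mid E=e, C=c)\, P(C=c)

% Front-door: M intercepts all causal paths from E to D
P\bigl(D \mid \mathrm{do}(E=e)\bigr) = \sum_{m} P(M=m \mid E=e) \sum_{e'} P(D \mid E=e', M=m)\, P(E=e')
```

The back-door formula is ordinary standardization over the confounder distribution; the front-door formula chains the E-M effect with the M-D effect adjusted for E.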
52
Example: front-door criterion
Weight and Coronary Heart Disease
Variables: E = weight, C = cholesterol, B = blood pressure, U = lifestyle, D = CHD
Assume:
- adjusted for sex, age and smoke
- lifestyle is unmeasured
- no other mediators (between E and D)
Can estimate the effect of E on D
Weight is not a good "action"
53
DAGs and causal pies
DAGs are less specific than causal pies
DAGs are scale free, interaction is scale dependent
Greenland and Brumback, Causal modeling methods, Int J Epid 2002
54
Exercise: causal pies
Variables: H = hospital, E = diabetes, D = fractures
Write down the causal pies for getting into hospital based on the DAG.
Show that the DAG is compatible with at least 3 different combinations of sufficient causes.
Selection bias: discuss how the different combinations of sufficient causes for getting into hospital might affect the estimate of E on D among hospital patients (perhaps difficult).
10 minutes
55
Limitations and problems of DAGs
New tool: relevance debated, focus on causality
Focus on bias: precision also important
Bias or not: magnitude and direction
Interaction: scale dependent
Drawing: capture reality; large enough to be realistic, small enough to be useful
VanderWeele, The sign of the bias of unmeasured confounding, Biometrics 2008
VanderWeele, Causal directed acyclic graphs and the direction of unmeasured confounding bias, Epidemiology 2008
VanderWeele, Directed acyclic graphs, sufficient causes, and the properties of conditioning on a common effect, Am J Epid 2007
56
Drawing DAGs
57
Technical note on drawing DAGs
Drawing tools in Word (Add > Figure)
Use Dia
Use DAGitty
Hand-drawn figure
58
Direction of arrow
Does physical activity reduce smoking, or does smoking reduce physical activity?
Variables: E = physical activity, C = smoking, D = type 2 diabetes
Maybe another variable (H = health consciousness) is causing both?
59
Time
Does physical activity reduce smoking, or does smoking reduce physical activity?
Variables: C = smoking, measured 5 years ago (-5); E = physical activity, measured 1 year ago (-1); D = type 2 diabetes
60
Drawing a causal DAG
Start: E and D (1 exposure, 1 disease)
add: [S], variables conditioned on by design
add: C-s, all common causes of 2 or more variables in the DAG
C (common cause): must be included
V (exogenous): may be excluded
M (mediator): may be excluded
K: may be excluded unless we condition on it
61
Summing up
Data driven analyses do not work. We need (causal) information from outside the data.
DAGs are intuitive and accurate tools to display that information.
Paths show the flow of causality and of bias and guide the analysis.
DAGs clarify concepts like confounding and selection bias, and show that we can adjust for both.
Better discussion based on DAGs.
Draw your assumptions before your conclusions.
62
Recommended reading
Books
Hernan, M. A. and J. Robins. Causal Inference. Web:
Rothman, K. J., S. Greenland, and T. L. Lash. Modern Epidemiology
Veierød, M. B., S. Lydersen, and P. Laake. Medical Statistics. 2012
Papers
Greenland, S., J. Pearl, and J. M. Robins. "Causal diagrams for epidemiologic research." Epidemiology 1999
Hernandez-Diaz, S., E. F. Schisterman, and M. A. Hernan. "The birth weight "paradox" uncovered?" Am J Epidemiol 2006
Hernan, M. A., S. Hernandez-Diaz, and J. M. Robins. "A structural approach to selection bias." Epidemiology 2004
Greenland, S. and B. Brumback. "An overview of relations among causal modeling methods." Int J Epidemiol 2002
Weinberg, C. R. "Can DAGs clarify effect modification?" Epidemiology 2007
63
References
Chen, L., et al. "Alcohol intake and blood pressure: a systematic review implementing a Mendelian randomization approach." PLoS Med 2008
Greenland, S. and B. Brumback. "An overview of relations among causal modelling methods." Int J Epidemiol 2002
Hernan, M. A., S. Hernandez-Diaz, and J. M. Robins. "A structural approach to selection bias." Epidemiology 2004
Hernan, M. A. and S. R. Cole. "Causal diagrams and measurement bias." Am J Epidemiol 2009
Sheehan, N. A., et al. "Mendelian randomisation and causal inference in observational epidemiology." PLoS Med 2008
VanderWeele, T. J. and J. M. Robins. "Directed acyclic graphs, sufficient causes, and the properties of conditioning on a common effect." Am J Epidemiol 2007
VanderWeele, T. J., M. A. Hernan, and J. M. Robins. "Causal directed acyclic graphs and the direction of unmeasured confounding bias." Epidemiology 2008
VanderWeele, T. J. "The sign of the bias of unmeasured confounding." Biometrics 2008
64
Exercise: Collider stratification
Hospital risk:
65
Exercise: Collider stratification
Selection:
Confirm that exposure doubles the risk of hospitalization (hint: RREH = 0.6/0.3 = 2).
Confirm that disease triples the risk of hospitalization.
Confirm (roughly) that the hospital 2*2 table is correct (hint: 35*0.6 = 22, ...).
Relative risk of exposure on disease:
What is the RR (of E on D) in the hospital group?
What is the RR (of E on D) in the non-hospital group?
Are the RRs biased?
10 minutes