Published by Charles Anderson. Modified over 9 years ago.
1 REASONING WITH CAUSES AND COUNTERFACTUALS
Judea Pearl, UCLA (www.cs.ucla.edu/~judea/)
2 OUTLINE
1. Unified conceptualization of counterfactuals, structural equations, and graphs
2. Confounding demystified
3. Direct and indirect effects (mediation)
4. Transportability and external validity mathematized
3 TRADITIONAL STATISTICAL INFERENCE PARADIGM
Joint distribution P → Data → Inference → Q(P) (aspects of P)
e.g., infer whether customers who bought product A would also buy product B: Q = P(B | A).
4 THE STRUCTURAL MODEL PARADIGM
Data-generating model M → Joint distribution → Data → Inference → Q(M) (aspects of M)
M: the invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to the variables in the analysis.
“Think Nature, not experiment!”
5 FAMILIAR CAUSAL MODEL: ORACLE FOR MANIPULATION
[Figure: a system with input X, internal variable Z, and output Y]
6 WHY PHYSICS DESERVES A NEW ALGEBRA
Scientific equations (e.g., Hooke's law) are non-algebraic. For example, length (Y) equals a constant (2) times the weight (X): Y = 2X. The correct notation is the assignment Y := 2X.
Process information: X = 1. The solution: Y = 2.
Counterfactual: had X been 3, Y would be 6. Intervention: if we raise X to 3, Y would be 6.
In both cases we must “wipe out” X = 1; combining X = 1 and X = 3 as algebraic equations would be a contradiction.
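A minimal Python sketch of the “wipe out” step, assuming each mechanism is written as a function of already-computed values and the equations are listed in causal order. The helper names `solve` and `do` are illustrative only. Instead of adding X = 3 as a second equation, `do` replaces the assignment for X, and solving the mutilated model gives Y = 6.

```python
def solve(model):
    """Evaluate the assignments in order (model assumed acyclic, listed in causal order)."""
    values = {}
    for var, mechanism in model.items():
        values[var] = mechanism(values)
    return values


def do(model, var, value):
    """Wipe out the equation for `var` and replace it with var := value (original untouched)."""
    mutilated = dict(model)
    mutilated[var] = lambda v: value
    return mutilated


model = {
    "X": lambda v: 1,           # process information: X = 1
    "Y": lambda v: 2 * v["X"],  # Hooke-style law written as an assignment: Y := 2X
}

print(solve(model))              # {'X': 1, 'Y': 2}
print(solve(do(model, "X", 3)))  # {'X': 3, 'Y': 6}
```

Because `do` copies the dictionary, the original model still yields X = 1, Y = 2 after the intervention is analyzed.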
7 STRUCTURAL CAUSAL MODELS
Definition: A structural causal model is a 4-tuple ⟨V, U, F, P(u)⟩, where
V = {V1, ..., Vn} are endogenous variables,
U = {U1, ..., Um} are background variables,
F = {f1, ..., fn} are functions determining V, $v_i = f_i(v, u)$,
P(u) is a distribution over U.
P(u) and F induce a distribution P(v) over the observable variables.
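To illustrate how P(u) and F induce P(v), the sketch below enumerates every background state u of a small hypothetical SCM (two independent binary background variables, with U2 flipping Y), solves the equations for each u, and accumulates P(u) into P(v). The model and its numbers are assumptions chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical toy SCM <V, U, F, P(u)>: independent binary background variables.
P_U1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P_U2 = {0: Fraction(3, 4), 1: Fraction(1, 4)}

# F: one function per endogenous variable, v_i = f_i(v, u), listed in causal order.
F = {
    "X": lambda v, u: u["U1"],
    "Y": lambda v, u: v["X"] if u["U2"] == 0 else 1 - v["X"],  # U2 flips Y
}


def solve(F, u):
    """Solve the structural equations for one background state u."""
    v = {}
    for name, f in F.items():
        v[name] = f(v, u)
    return v


def induced_distribution(F):
    """P(v) = sum of P(u) over all u whose solution is v."""
    p_v = defaultdict(Fraction)
    for u1, u2 in product(P_U1, P_U2):
        v = solve(F, {"U1": u1, "U2": u2})
        p_v[(v["X"], v["Y"])] += P_U1[u1] * P_U2[u2]
    return dict(p_v)


P_V = induced_distribution(F)
print(P_V[(1, 1)], P_V[(1, 0)])  # 3/8 1/8
```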
8 CAUSAL MODELS AND COUNTERFACTUALS
Definition: The sentence “Y would be y (in situation u), had X been x,” denoted $Y_x(u) = y$, means: the solution for Y in a mutilated model $M_x$ (i.e., the equations for X replaced by X = x) with input U = u is equal to y.
The Fundamental Equation of Counterfactuals: $Y_x(u) = Y_{M_x}(u)$
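9 READING COUNTERFACTUALS FROM SEM
Model (treatment X, study time Z, score Y): $X = \varepsilon_1$, $Z = \beta X + \varepsilon_2$, $Y = \alpha X + \gamma Z + \varepsilon_3$
Data shows: $\alpha = 0.7$, $\beta = 0.5$, $\gamma = 0.4$.
A student named Joe measured X = 0.5, Z = 1, Y = 1.5.
Q1: What would Joe's score be had he doubled his study time?
10 READING COUNTERFACTUALS
Q1: What would Joe's score be had he doubled his study time?
Answer: Joe's score would be 1.9. His measurements fix his background factors at $\varepsilon_2 = 1 - 0.5 \cdot 0.5 = 0.75$ and $\varepsilon_3 = 1.5 - 0.7 \cdot 0.5 - 0.4 \cdot 1 = 0.75$; setting Z = 2 then gives $Y = 0.7 \cdot 0.5 + 0.4 \cdot 2 + 0.75 = 1.9$.
Or, in counterfactual notation: $Y_{Z=2}(u) = 1.9$.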
11 READING COUNTERFACTUALS
Q2: What would Joe's score be, had the treatment been 0 and had he studied at whatever level he would have studied had the treatment been 1?
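Both of Joe's questions can be answered mechanically by abduction (recover u from Joe's data), action (replace equations), and prediction (solve). The sketch assumes the linear model $X = \varepsilon_1$, $Z = \beta X + \varepsilon_2$, $Y = \alpha X + \gamma Z + \varepsilon_3$ with α = 0.7, β = 0.5, γ = 0.4, and takes Joe's measurements as X = 0.5, Z = 1, Y = 1.5 (the score under which the Q1 answer of 1.9 follows). The function names are hypothetical.

```python
# Assumed linear SEM: X = e1,  Z = BETA*X + e2,  Y = ALPHA*X + GAMMA*Z + e3
ALPHA, BETA, GAMMA = 0.7, 0.5, 0.4


def abduce(x, z, y):
    """Abduction: recover Joe's background factors u = (e1, e2, e3) from his data."""
    return x, z - BETA * x, y - ALPHA * x - GAMMA * z


def predict(u, x=None, z=None):
    """Action + prediction: solve the model, overriding X and/or Z when given."""
    e1, e2, e3 = u
    x_val = e1 if x is None else x
    z_val = BETA * x_val + e2 if z is None else z
    return ALPHA * x_val + GAMMA * z_val + e3


u_joe = abduce(x=0.5, z=1.0, y=1.5)

# Q1: Y_{Z=2}(u), study time doubled
q1 = predict(u_joe, z=2.0)

# Q2: Y_{X=0, Z_{X=1}}(u), treatment off, study time at its treated level
z_if_treated = BETA * 1.0 + u_joe[1]
q2 = predict(u_joe, x=0.0, z=z_if_treated)

print(round(q1, 2), round(q2, 2))  # 1.9 1.25
```

Under these assumptions Q2 evaluates to 1.25: Joe would have studied 1.25 units under treatment, and the direct path contributes nothing when X = 0.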
12 POTENTIAL AND OBSERVED OUTCOMES PREDICTED BY A STRUCTURAL MODEL
13 CAUSAL MODELS AND COUNTERFACTUALS
Definition: The sentence “Y would be y (in situation u), had X been x,” denoted $Y_x(u) = y$, means: the solution for Y in a mutilated model $M_x$ (i.e., the equations for X replaced by X = x) with input U = u is equal to y.
Joint probabilities of counterfactuals: $P(Y_x = y, Z_w = z) = \sum_{u:\, Y_x(u) = y,\ Z_w(u) = z} P(u)$
In particular: $P(y \mid do(x)) \triangleq P(Y_x = y) = \sum_{u:\, Y_x(u) = y} P(u)$
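The joint-probability formula can be evaluated by brute force when U is small and discrete: enumerate every u, solve the mutilated models, and add P(u) whenever all the counterfactual events hold. The SCM below (X := U1, Y := (X and U2) or U3) and its probabilities are hypothetical; the query $P(Y_{x=1} = 1, Y_{x=0} = 0)$ is a joint probability of two counterfactuals that no single experiment reveals.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy SCM: X := U1,  Y := (X and U2) or U3; independent binary U's.
P_U = {"U1": Fraction(1, 2), "U2": Fraction(1, 2), "U3": Fraction(1, 4)}  # P(U_i = 1)


def prob_u(u):
    """P(u) for one joint background state (independence assumed)."""
    p = Fraction(1)
    for name, bit in u.items():
        p *= P_U[name] if bit else 1 - P_U[name]
    return p


def Y_do_x(x, u):
    """Y_x(u): the solution for Y in the mutilated model where X := x."""
    return int((x and u["U2"]) or u["U3"])


def joint_counterfactual(event):
    """Sum P(u) over all background states u for which event(u) holds."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(P_U)):
        u = dict(zip(P_U, bits))
        if event(u):
            total += prob_u(u)
    return total


# P(Y_{x=1} = 1, Y_{x=0} = 0): X is both necessary and sufficient for Y
pns = joint_counterfactual(lambda u: Y_do_x(1, u) == 1 and Y_do_x(0, u) == 0)
print(pns)  # 3/8
```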
14 THE FIVE NECESSARY STEPS OF CAUSAL ANALYSIS
Define: Express the target quantity Q as a function Q(M) that can be computed from any model M.
Assume: Formulate causal assumptions A using some formal language.
Identify: Determine if Q is identifiable given A.
Estimate: Estimate Q if it is identifiable; approximate it if it is not.
Test: Test the testable implications of A (if any).
16 THE LOGIC OF CAUSAL ANALYSIS
[Figure: flow diagram]
A (causal assumptions) defines the causal model ($M_A$), whose logical implications are A*.
Causal inference: from the queries of interest Q and the model, derive Q(P), the identified estimands, and T($M_A$), the testable implications.
Statistical inference: combine Q(P) with data (D) to obtain $\hat{Q}$, estimates of Q(P), which are provisional claims; test T($M_A$) against D (goodness of fit, model testing).
17 IDENTIFICATION IN SCM
Find the effect of X on Y, P(y | do(x)), given the causal assumptions shown in G, where Z1, ..., Zk are auxiliary variables.
[Figure: graph G over X, Y, Z1, ..., Z6]
Can P(y | do(x)) be estimated if only a subset, Z, can be measured?
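18 ELIMINATING CONFOUNDING BIAS: THE BACK-DOOR CRITERION
P(y | do(x)) is estimable if there is a set Z of variables such that Z d-separates X from Y in $G_{\underline{X}}$ (G with all arrows emanating from X removed).
[Figure: G and $G_{\underline{X}}$ over X, Y, Z1, ..., Z6, with an admissible set Z marked]
Moreover, $P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z)$ (“adjusting” for Z), the graphical counterpart of ignorability.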
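A sketch of back-door adjustment on a hypothetical discrete model in which a single binary Z confounds X and Y, so {Z} is admissible. The joint P(z, x, y) is generated from assumed conditional probabilities; the adjusted estimate recovers P(Y = 1 | do(X = 1)) = 0.56, whereas the naive conditional P(Y = 1 | X = 1) is about 0.691.

```python
from itertools import product


# Hypothetical model used only to generate a joint P(z, x, y); Z -> X, Z -> Y, X -> Y.
def p_z(z):
    return 0.4 if z == 1 else 0.6


def p_x_given_z(x, z):
    q = 0.2 + 0.6 * z
    return q if x == 1 else 1 - q


def p_y_given_xz(y, x, z):
    q = 0.1 + 0.3 * x + 0.4 * z
    return q if y == 1 else 1 - q


joint = {(z, x, y): p_z(z) * p_x_given_z(x, z) * p_y_given_xz(y, x, z)
         for z, x, y in product((0, 1), repeat=3)}
NAMES = ("z", "x", "y")


def marginal(**fixed):
    """Sum the joint over all cells that agree with the fixed values."""
    return sum(p for cell, p in joint.items()
               if all(cell[NAMES.index(k)] == v for k, v in fixed.items()))


def backdoor_adjust(x, y=1):
    """P(y | do(x)) = sum_z P(y | x, z) P(z)."""
    return sum(marginal(x=x, y=y, z=z) / marginal(x=x, z=z) * marginal(z=z)
               for z in (0, 1))


naive = marginal(x=1, y=1) / marginal(x=1)
print(round(backdoor_adjust(1), 3), round(naive, 3))  # 0.56 0.691
```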
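19 EFFECT OF WARM-UP ON INJURY (after Shrier & Platt, 2008)
[Figure: causal diagram connecting warm-up exercises (X) and injury (Y) through several covariates; annotations: “Front door”, “No, no!”, “Watch out!”]
Effect of warm-up exercises (X) on injury (Y): ???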
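20 PROPENSITY SCORE ESTIMATOR (Rosenbaum & Rubin, 1983)
[Figure: graph G over X, Y, Z1, ..., Z6]
Can L replace {Z1, Z2, Z3, Z4, Z5}? Define the propensity score $L(z) = P(X = 1 \mid Z = z)$.
Theorem: Adjustment for L replaces adjustment for Z: $P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z) = \sum_l P(y \mid x, l)\, P(l)$.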
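The theorem can be checked numerically on a small hypothetical model where Z = (Z1, Z2) is admissible and the propensity $L(z) = P(X = 1 \mid z)$ depends on z only through Z1 + Z2, so L coarsens Z into three strata. Under these assumptions the two adjustments agree, as the theorem predicts; the sketch does not address finite-sample estimation of L.

```python
from collections import defaultdict
from itertools import product

# Hypothetical model: Z = (Z1, Z2) is an admissible (back-door) set for X -> Y.
P_Z = {z: 0.25 for z in product((0, 1), repeat=2)}  # Z1, Z2 ~ Bernoulli(0.5)


def propensity(z):
    """L(z) = P(X = 1 | Z = z); depends on z only through Z1 + Z2."""
    return 0.2 + 0.3 * (z[0] + z[1])


def p_y1(x, z):
    """P(Y = 1 | x, z)."""
    return 0.1 + 0.3 * x + 0.2 * z[0] + 0.3 * z[1]


def adjust_for_z(x):
    return sum(p_y1(x, z) * P_Z[z] for z in P_Z)


def adjust_for_l(x):
    """sum_l P(Y = 1 | x, l) P(l), with strata defined by the propensity score."""
    p_l = defaultdict(float)    # P(L = l)
    p_xl = defaultdict(float)   # P(X = x, L = l)
    p_yxl = defaultdict(float)  # P(Y = 1, X = x, L = l)
    for z, pz in P_Z.items():
        l = round(propensity(z), 10)  # rounding keeps float keys stable
        px = propensity(z) if x == 1 else 1 - propensity(z)
        p_l[l] += pz
        p_xl[l] += pz * px
        p_yxl[l] += pz * px * p_y1(x, z)
    return sum(p_yxl[l] / p_xl[l] * p_l[l] for l in p_l)


print(round(adjust_for_z(1), 6), round(adjust_for_l(1), 6))  # 0.65 0.65
```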
21 WHAT PROPENSITY SCORE (PS) PRACTITIONERS NEED TO KNOW
1. The asymptotic bias of PS is EQUAL to that of ordinary adjustment (for the same Z).
2. Including an additional covariate in the analysis CAN SPOIL the bias-reduction potential of others.
3. In particular, instrumental variables tend to amplify bias.
4. Choosing a sufficient set for PS requires knowledge of the model.
[Figure: four small graphs over X, Y, Z illustrating these cases]
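22 SURPRISING RESULT: Instrumental variables are bias-amplifiers in linear models (Bhattacharya & Vogt, 2007; Wooldridge, 2009)
[Figure: linear model over X, Y, an instrument Z, and an unobserved U affecting X and Y; path coefficients c0, c1, c2, c3]
“Naive” bias vs. adjusted bias (U unobserved)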
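The bias-amplification effect can be reproduced by simulation. The sketch assumes a standardized linear model with its own coefficient names (`z_to_x`, `u_to_x`, `u_to_y`, `x_to_y`, which need not match the slide's c-labels), with Var(Z) = Var(U) = Var(X) = 1. Covariance algebra for this model gives a naive bias of `u_to_x * u_to_y` and a Z-adjusted bias of `u_to_x * u_to_y / (1 - z_to_x**2)`, which is larger whenever Z affects X.

```python
import random


def simulate(x_to_y, u_to_x, u_to_y, z_to_x, n=100_000, seed=0):
    """Draw (x, y, z) from X = z_to_x*Z + u_to_x*U + e_X, Y = x_to_y*X + u_to_y*U + e_Y,
    with Z, U ~ N(0, 1) independent and Var(X) = 1. U is not returned (unobserved)."""
    rng = random.Random(seed)
    sd_ex = (1 - u_to_x**2 - z_to_x**2) ** 0.5
    xs, ys, zs = [], [], []
    for _ in range(n):
        z, u = rng.gauss(0, 1), rng.gauss(0, 1)
        x = z_to_x * z + u_to_x * u + rng.gauss(0, sd_ex)
        xs.append(x)
        ys.append(x_to_y * x + u_to_y * u + rng.gauss(0, 1))
        zs.append(z)
    return xs, ys, zs


def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def coef_y_on_x(xs, ys):
    """Coefficient of X when regressing Y on X alone."""
    return cov(xs, ys) / cov(xs, xs)


def coef_y_on_x_given_z(xs, ys, zs):
    """Coefficient of X when regressing Y on X and Z (closed form of the 2x2 normal equations)."""
    sxx, szz, sxz = cov(xs, xs), cov(zs, zs), cov(xs, zs)
    sxy, szy = cov(xs, ys), cov(zs, ys)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz**2)


x_to_y, u_to_x, u_to_y, z_to_x = 1.0, 0.5, 0.5, 0.6
xs, ys, zs = simulate(x_to_y, u_to_x, u_to_y, z_to_x)
naive_bias = coef_y_on_x(xs, ys) - x_to_y
adjusted_bias = coef_y_on_x_given_z(xs, ys, zs) - x_to_y
# Simulated bias next to the covariance-algebra prediction
print(round(naive_bias, 3), round(u_to_x * u_to_y, 3))
print(round(adjusted_bias, 3), round(u_to_x * u_to_y / (1 - z_to_x**2), 3))
```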
23 INTUITION: When Z is allowed to vary, it absorbs (or explains) some of the changes in X. When Z is fixed, the burden falls on U alone and is transmitted to Y (resulting in a higher bias).
[Figure: three copies of the linear model over X, Y, Z, and U]
24 WHAT'S BETWEEN AN INSTRUMENT AND A CONFOUNDER?
[Figure: model over X, Y, Z, T1, T2, and unobserved U, with path coefficients c0, ..., c4]
Should we adjust for Z?
ANSWER: Yes, if ...; no, otherwise.
CONCLUSION: Adjusting for a parent of Y is safer than adjusting for a parent of X.
25 WHICH SET TO ADJUST FOR
[Figure: graph over X, Y, Z, T]
Should we adjust for {T}, {Z}, or {T, Z}?
Answer 1 (from bias-amplification considerations): {T} is better than {T, Z}, which is the same as {Z}.
Answer 2 (from variance considerations): {T} is better than {T, Z}, which is better than {Z}.
26 CONCLUSIONS
The prevailing practice of adjusting for all covariates, especially those that are good predictors of X (the “treatment assignment,” Rubin, 2009), is totally misguided.
The “outcome mechanism” is as important, and much safer, from both bias and variance viewpoints.
As X-rays are to the surgeon, graphs are to causation.
27 REGRESSION VS. STRUCTURAL EQUATIONS (THE CONFUSION OF THE CENTURY)
Regression (claimless, nonfalsifiable): $Y = ax + \varepsilon_Y$
Structural (empirical, falsifiable): $Y = bx + u_Y$
Claim (regardless of distributions): $E(Y \mid do(x)) = E(Y \mid do(x), do(z)) = bx$
Q. When is b estimable by regression methods?
A. Graphical criteria are available.
The mothers of all questions:
Q. When would b equal a?
A. When all back-door paths are blocked ($u_Y \perp\!\!\!\perp X$).
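A small simulation separates the regression coefficient a from the structural coefficient b. In the hypothetical model below, a common cause W opens a back-door path X ← W → Y (through $u_Y$); covariance algebra gives $a = b + \mathrm{Cov}(X, u_Y)/\mathrm{Var}(X) = b + 0.5$ when the path is open, and a = b once it is removed. The model and numbers are assumptions for illustration.

```python
import random


def regression_slope(xs, ys):
    """OLS slope a in the regression of Y on X."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def draw(b, confounding, n=50_000, seed=1):
    """Structural model Y = b*X + u_Y; u_Y shares the common cause W with X
    whenever `confounding` != 0 (an open back-door path X <- W -> Y)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        w = rng.gauss(0, 1)
        x = w + rng.gauss(0, 1)                  # Var(X) = 2
        u_y = confounding * w + rng.gauss(0, 1)  # Cov(X, u_Y) = confounding
        xs.append(x)
        ys.append(b * x + u_y)
    return xs, ys


b = 2.0
a_confounded = regression_slope(*draw(b, confounding=1.0))  # theory: a = b + 0.5
a_clean = regression_slope(*draw(b, confounding=0.0))       # theory: a = b
print(round(a_confounded, 2), round(a_clean, 2))
```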
28 TWO PARADIGMS FOR CAUSAL INFERENCE
Observed: P(X, Y, Z, ...)
Conclusions needed: $P(Y_x = y)$, $P(X_y = x \mid Z = z)$, ...
How do we connect observables, X, Y, Z, ..., to counterfactuals $Y_x$, $X_z$, $Z_y$, ...?
N-R model: counterfactuals are primitives, new variables; super-distribution.
Structural model: counterfactuals are derived quantities; subscripts modify the model and distribution.
29 “SUPER” DISTRIBUTION IN N-R MODEL
U    X  Y  Z  Y_{x=0}  Y_{x=1}  X_{z=0}  X_{z=1}  X_{y=0}
u1   0  0  0     0        1        0        0        0
u2   0  1  1     1        0        1        0        1
u3   0  0  0     1        0        0        1        1
u4   1  0  0     1        0        0        1        0
Inconsistency: consistency requires $x = 0 \Rightarrow Y_{x=0} = Y$ (in general, $Y = xY_1 + (1 - x)Y_0$), yet row u3 has X = 0, Y = 0, and $Y_{x=0} = 1$.
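A quick mechanical check of the consistency constraint on the Y-related columns of the super-distribution table (X, Y, $Y_{x=0}$, $Y_{x=1}$); the dictionary layout is just one convenient encoding.

```python
# Y-related columns of the "super" distribution table, one row per unit u.
rows = {
    "u1": {"X": 0, "Y": 0, "Y0": 0, "Y1": 1},
    "u2": {"X": 0, "Y": 1, "Y0": 1, "Y1": 0},
    "u3": {"X": 0, "Y": 0, "Y0": 1, "Y1": 0},
    "u4": {"X": 1, "Y": 0, "Y0": 1, "Y1": 0},
}


def violates_consistency(r):
    """Consistency requires Y = x*Y1 + (1 - x)*Y0 for the observed x."""
    return r["Y"] != r["X"] * r["Y1"] + (1 - r["X"]) * r["Y0"]


print([u for u, r in rows.items() if violates_consistency(r)])  # ['u3']
```

In the N-R paradigm nothing prevents such a row unless consistency is imposed as an extra assumption; in an SCM the row could not arise, because $Y_{x=0}(u)$ and Y(u) come from the same equations.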
30 ARE THE TWO PARADIGMS EQUIVALENT?
Yes (Galles and Pearl, 1998; Halpern, 1998).
In the N-R paradigm, $Y_x$ is defined by consistency: $Y = xY_1 + (1 - x)Y_0$.
In SCM, consistency is a theorem.
Moreover, a theorem in one approach is a theorem in the other.
Difference: clarity of assumptions and their implications.
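31 AXIOMS OF STRUCTURAL COUNTERFACTUALS (Galles, Pearl, Halpern, 1998)
$Y_x(u) = y$: Y would be y, had X been x (in state U = u).
1. Definiteness: $\exists x \in X$ such that $X_y(u) = x$
2. Uniqueness: $(X_y(u) = x) \,\&\, (X_y(u) = x') \Rightarrow x = x'$
3. Effectiveness: $X_{xw}(u) = x$
4. Composition (generalized consistency): $W_x(u) = w \Rightarrow Y_{xw}(u) = Y_x(u)$
5. Reversibility: $(Y_{xw}(u) = y) \,\&\, (W_{xy}(u) = w) \Rightarrow Y_x(u) = y$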
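32 FORMULATING ASSUMPTIONS: THREE LANGUAGES
1. English: Smoking (X), Cancer (Y), Tar (Z), Genotypes (U)
[Figure: X → Z → Y, with unobserved U affecting both X and Y]
2. Counterfactuals: $Z_x(u) = Z_{yx}(u)$, $X_y(u) = X_{zy}(u) = X_z(u) = X(u)$, $Y_z(u) = Y_{zx}(u)$, $Z_x \perp\!\!\!\perp \{Y_z, X\}$
3. Structural: $x = f_1(u, \varepsilon_1)$, $z = f_2(x, \varepsilon_2)$, $y = f_3(z, u, \varepsilon_3)$, with $\varepsilon_1, \varepsilon_2, \varepsilon_3$ mutually independent and independent of U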
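Assuming the smoking/tar/cancer structure (X → Z → Y, with an unobserved U affecting X and Y), the effect of X on Y is identified by the standard front-door formula $P(y \mid do(x)) = \sum_z P(z \mid x) \sum_{x'} P(y \mid x', z) P(x')$. The sketch below uses hypothetical conditional probabilities, hides U after generating the observational joint, and compares the front-door estimate with the ground truth computed from the full model.

```python
from itertools import product


def bern(p, v):
    """P(V = v) for a binary V with P(V = 1) = p."""
    return p if v == 1 else 1 - p


P_U1 = 0.5  # P(U = 1): genotype, unobserved (hypothetical)


def px(u):      # P(X = 1 | u): smoking
    return 0.2 + 0.6 * u


def pz(x):      # P(Z = 1 | x): tar deposits
    return 0.1 + 0.7 * x


def py(z, u):   # P(Y = 1 | z, u): cancer
    return 0.1 + 0.5 * z + 0.3 * u


# Observational joint P(x, z, y), with U summed out: all an analyst would see.
obs = {(x, z, y): sum(bern(P_U1, u) * bern(px(u), x) * bern(pz(x), z) * bern(py(z, u), y)
                      for u in (0, 1))
       for x, z, y in product((0, 1), repeat=3)}


def P(x=None, z=None, y=None):
    """Marginal probability of the specified values under the observational joint."""
    return sum(p for (xi, zi, yi), p in obs.items()
               if (x is None or xi == x) and (z is None or zi == z)
               and (y is None or yi == y))


def front_door(x):
    """P(Y=1 | do(x)) from observational quantities only."""
    return sum(P(x=x, z=z) / P(x=x)
               * sum(P(x=xp, z=z, y=1) / P(x=xp, z=z) * P(x=xp) for xp in (0, 1))
               for z in (0, 1))


def truth(x):
    """Ground truth from the full model: sum_u P(u) sum_z P(z|x) P(Y=1|z,u)."""
    return sum(bern(P_U1, u) * bern(pz(x), z) * py(z, u) for u in (0, 1) for z in (0, 1))


naive = P(x=1, y=1) / P(x=1)
print(round(front_door(1), 4), round(truth(1), 4), round(naive, 4))  # 0.65 0.65 0.74
```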
33 COMPARISON BETWEEN THE N-R AND SCM LANGUAGES
1. Expressing scientific knowledge
2. Recognizing the testable implications of one's assumptions
3. Locating instrumental variables in a system of equations
4. Deciding if two models are equivalent or nested
5. Deciding if two counterfactuals are independent given another
6. Algebraic derivations of identifiable estimands
34 GRAPHICAL-COUNTERFACTUALS SYMBIOSIS
Every causal graph expresses counterfactual assumptions, e.g., in a graph over X, Y, Z:
1. Missing arrows (e.g., no Z → Y): $Y_{x,z}(u) = Y_x(u)$
2. Missing arcs (e.g., no Y ↔ Z): $Y_x \perp\!\!\!\perp Z_y$
These assumptions are consistent, and readable from the graph.
Express assumptions in graphs; derive estimands by graphical or algebraic methods.
35 EFFECT DECOMPOSITION (direct vs. indirect effects)
1. Why decompose effects?
2. What is the definition of direct and indirect effects?
3. What are the policy implications of direct and indirect effects?
4. When can direct and indirect effects be estimated consistently from experimental and nonexperimental data?
36 WHY DECOMPOSE EFFECTS?
1. To understand how Nature works
2. To comply with legal requirements
3. To predict the effects of new types of interventions: signal routing, rather than variable fixing
37 LEGAL IMPLICATIONS OF DIRECT EFFECT
[Figure: X (Gender) → Z (Qualifications) → Y (Hiring), and X → Y]
Can data prove an employer guilty of hiring discrimination?
What is the direct effect of X on Y? $E(Y \mid do(x_1), do(z)) - E(Y \mid do(x_0), do(z))$ (averaged over z)
Adjust for Z? No! No!
38 FISHER'S GRAVE MISTAKE (after Rubin, 2005)
[Figure: X (Soil treatment) → Z (Plant density) → Y (Yield), X → Y, and a latent factor affecting Z and Y]
What is the direct effect of treatment on yield?
Compare treated and untreated lots of the same density? No!
Proposed solution (?): “Principal strata”
39 NATURAL INTERPRETATION OF AVERAGE DIRECT EFFECTS
Model: z = f(x, u), y = g(x, z, u) (X → Z → Y, X → Y)
Natural Direct Effect of X on Y, $DE(x_0, x_1; Y) = E[Y_{x_1, Z_{x_0}}] - E[Y_{x_0}]$: the expected change in Y when we change X from $x_0$ to $x_1$ and, for each u, keep Z constant at whatever value it attained before the change.
In linear models, DE = Controlled Direct Effect.
Robins and Greenland (1992): “Pure” direct effect.
40 DEFINITION AND IDENTIFICATION OF NESTED COUNTERFACTUALS
Consider the quantity $Q = E_u[Y_{x, Z_{x^*}(u)}(u)]$.
Given M and P(u), Q is well defined: given u, $Z_{x^*}(u)$ is the solution for Z in $M_{x^*}$, call it z; $Y_{xz}(u)$ is the solution for Y in $M_{xz}$.
Can Q be estimated from data?
Experimental: a nest-free expression.
Nonexperimental: a subscript-free expression.
41 DEFINITION OF INDIRECT EFFECTS
Model: z = f(x, u), y = g(x, z, u) (X → Z → Y, X → Y)
Indirect Effect of X on Y, $IE(x_0, x_1; Y) = E[Y_{x_0, Z_{x_1}}] - E[Y_{x_0}]$: the expected change in Y when we keep X constant, say at $x_0$, and let Z change to whatever value it would have attained had X changed to $x_1$.
In linear models, IE = TE - DE.
There is no controlled indirect effect.
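In a linear model the nested-counterfactual definitions can be evaluated unit by unit. The sketch assumes $z = \beta x + u_z$ and $y = \alpha x + \gamma z + u_y$ with hypothetical coefficients and units, computes $Y_{x, Z_{x^*}}(u)$ directly, and confirms IE = TE - DE (here DE = α and IE = βγ).

```python
# Hypothetical linear SCM:  z = BETA*x + u_z,   y = ALPHA*x + GAMMA*z + u_y
ALPHA, BETA, GAMMA = 0.7, 0.5, 0.4


def Z(x, u):
    return BETA * x + u["z"]


def Y(x, z, u):
    return ALPHA * x + GAMMA * z + u["y"]


def nested(x, x_star, u):
    """Y_{x, Z_{x*}}(u): set X = x, but hold Z at the value it takes under X = x*."""
    return Y(x, Z(x_star, u), u)


def effects(x0, x1, units):
    """Average TE, natural DE, and natural IE over a list of background states."""
    n = len(units)
    te = sum(nested(x1, x1, u) - nested(x0, x0, u) for u in units) / n
    de = sum(nested(x1, x0, u) - nested(x0, x0, u) for u in units) / n
    ie = sum(nested(x0, x1, u) - nested(x0, x0, u) for u in units) / n
    return te, de, ie


units = [{"z": 0.3, "y": -0.2}, {"z": -1.0, "y": 0.5}, {"z": 0.0, "y": 1.1}]
te, de, ie = effects(0.0, 1.0, units)
print(round(te, 3), round(de, 3), round(ie, 3))  # 0.9 0.7 0.2
```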
42 POLICY IMPLICATIONS OF INDIRECT EFFECTS
[Figure: X (Gender) → Z (Qualification) → Y (Hiring), with the direct link X → Y marked IGNORE]
What is the indirect effect of X on Y? The effect of gender on hiring if sex discrimination is eliminated.
Deactivating a link: a new type of intervention.
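43 MEDIATION FORMULAS
1. The natural direct and indirect effects are identifiable in Markovian models (no confounding),
2. and are given by:
$DE = \sum_z [E(Y \mid x_1, z) - E(Y \mid x_0, z)]\, P(z \mid x_0)$
$IE = \sum_z E(Y \mid x_0, z)\, [P(z \mid x_1) - P(z \mid x_0)]$
3. They apply to linear and nonlinear models, continuous and discrete variables, regardless of distributional form.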
44 [Figure: mediation model over X, Z, Y with path coefficients m1 and m2, shown for linear systems and for a linear model with an x·z interaction term]
45 MEDIATION FORMULAS IN UNCONFOUNDED MODELS
[Figure: X → Z → Y, X → Y]
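46 [Figure: mediation model over X, Z, Y with path coefficients m1 and m2]
Disabling mediation: the effect drops from TE to DE (a loss of TE - DE).
Disabling the direct path: the effect drops to IE.
In linear systems, TE - DE = IE; in general, TE - DE is NOT equal to IE.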
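47 MEDIATION FORMULA FOR BINARY VARIABLES
[Figure: X → Z → Y, X → Y]
Count  X  Z  Y
n1     0  0  0
n2     0  0  1
n3     0  1  0
n4     0  1  1
n5     1  0  0
n6     1  0  1
n7     1  1  0
n8     1  1  1
$E(Y \mid x, z) = g_{xz}$ (e.g., $g_{00} = n_2/(n_1 + n_2)$), $E(Z \mid x) = h_x$ (e.g., $h_0 = (n_3 + n_4)/(n_1 + n_2 + n_3 + n_4)$)
$DE = (g_{10} - g_{00})(1 - h_0) + (g_{11} - g_{01})\, h_0$
$IE = (h_1 - h_0)(g_{01} - g_{00})$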
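A sketch that turns the eight cell counts into $g_{xz}$, $h_x$, and the DE/IE expressions for binary variables (specializations of the mediation formulas for an unconfounded model). The counts are made up for illustration; with them TE - DE ≈ 0.017 while IE = 0.1, a concrete case where TE - DE differs from IE.

```python
def mediation_binary(n):
    """n = [n1, ..., n8]: counts for (X, Z, Y) = 000, 001, 010, 011, 100, 101, 110, 111."""
    n1, n2, n3, n4, n5, n6, n7, n8 = n
    g = {  # g_xz = E(Y | x, z)
        (0, 0): n2 / (n1 + n2), (0, 1): n4 / (n3 + n4),
        (1, 0): n6 / (n5 + n6), (1, 1): n8 / (n7 + n8),
    }
    h = {  # h_x = E(Z | x)
        0: (n3 + n4) / (n1 + n2 + n3 + n4),
        1: (n7 + n8) / (n5 + n6 + n7 + n8),
    }
    de = (g[1, 0] - g[0, 0]) * (1 - h[0]) + (g[1, 1] - g[0, 1]) * h[0]
    ie = (h[1] - h[0]) * (g[0, 1] - g[0, 0])
    # Total effect E(Y | x1) - E(Y | x0), valid because the model is unconfounded
    te = (g[1, 0] * (1 - h[1]) + g[1, 1] * h[1]) - (g[0, 0] * (1 - h[0]) + g[0, 1] * h[0])
    return de, ie, te


de, ie, te = mediation_binary([40, 10, 20, 30, 10, 15, 25, 50])
print(round(de, 4), round(ie, 4), round(te, 4), round(te - de, 4))  # 0.2333 0.1 0.25 0.0167
```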
48 RAMIFICATIONS OF THE MEDIATION FORMULA
DE should be averaged over mediator levels; IE should NOT be averaged over intervention levels.
TE - DE need not equal IE.
TE - DE = proportion for whom mediation is necessary.
IE = proportion for whom mediation is sufficient.
TE - DE informs interventions on indirect pathways.
IE informs interventions on direct pathways.
49 RECENT PROGRESS
1. A theory of causal transportability: When can causal relations learned from experiments be transferred to a different environment in which no experiment can be conducted?
2. A theory of statistical transportability: When can statistical information learned in one domain be transferred to a different domain in which
   a. only a subset of variables can be observed? Or,
   b. only a few samples are available?
3. A theory of selection bias: Selection bias occurs when samples used in learning are chosen differently than those used in testing. When can this bias be eliminated?
50 TRANSPORTABILITY: WHEN CAN WE EXTRAPOLATE EXPERIMENTAL FINDINGS TO A DIFFERENT ENVIRONMENT?
[Figure: X (intervention) → Y (outcome), with Z (observation), in each environment]
Experimental study in LA. Measured: $P(x, y, z)$, $P(y \mid do(x), z)$
Observational study in NYC. Measured: $P^*(x, y, z)$
Needed: $P^*(y \mid do(x))$
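51 TRANSPORTABILITY: WHEN CAN WE EXTRAPOLATE EXPERIMENTAL FINDINGS TO DIFFERENT POPULATIONS?
[Figure: X → Y, with Z = age]
Experimental study in LA. Measured: $P(x, y, z)$, $P(y \mid do(x), z)$
Observational study in NYC. Measured: $P^*(x, y, z)$
Problem: We find $P(z) \neq P^*(z)$ (the LA population is younger).
What can we say about $P^*(y \mid do(x))$?
Intuition: $P^*(y \mid do(x)) = \sum_z P(y \mid do(x), z)\, P^*(z)$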
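The intuitive transport formula is a reweighting of age-specific experimental results by the target population's age distribution. The sketch assumes age is the only source of disparity and that age-specific effects are the same in both cities; all numbers and names are hypothetical.

```python
# Hypothetical age-specific experimental results from LA: P(Y=1 | do(X=1), z)
p_y_do_x_given_age = {"young": 0.30, "middle": 0.45, "old": 0.60}

# Hypothetical age distributions: LA (study population) vs. NYC (target, older)
p_age_la = {"young": 0.5, "middle": 0.3, "old": 0.2}
p_age_nyc = {"young": 0.2, "middle": 0.3, "old": 0.5}


def reweight(z_specific, p_z):
    """sum_z P(y | do(x), z) * p(z)."""
    return sum(z_specific[z] * p_z[z] for z in p_z)


print(round(reweight(p_y_do_x_given_age, p_age_la), 3),   # effect in LA: 0.405
      round(reweight(p_y_do_x_given_age, p_age_nyc), 3))  # transported to NYC: 0.495
```

Whether this reweighting is licensed depends on the story behind Z: it fits Z = age, but, as the following slides stress, other roles for Z call for different transport formulas.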
52 TRANSPORT FORMULAS DEPEND ON THE STORY
[Figure: three graphs (a), (b), (c) over X, Y, Z]
a) Z represents age
b) Z represents language skill
c) Z represents a bio-marker
53 TRANSPORT FORMULAS DEPEND ON THE STORY
[Figure: graphs (a), (b), (c) over X, Y, Z, each with a selection node S]
a) Z represents age
b) Z represents language skill
c) Z represents a bio-marker
54 TRANSPORTABILITY (Pearl and Bareinboim, 2011)
Definition 1 (Transportability): Given two environments, Π and Π*, characterized by graphs G and G*, a causal relation R is transportable from Π to Π* if
1. R(Π) is estimable from the set I of interventional studies on Π, and
2. R(Π*) is identified from I, P*, G, and G*.
55 RESULT: ALGORITHM TO DETERMINE IF AN EFFECT IS TRANSPORTABLE
[Figure: causal graph over X, Y, Z, W, U, V, T, with selection nodes S marking factors creating differences]
INPUT: Causal graph
OUTPUT:
1. Transportable or not?
2. Measurements to be taken in the experimental study
3. Measurements to be taken in the target population
4. A transport formula
56 WHICH MODEL LICENSES THE TRANSPORT OF THE CAUSAL EFFECT X → Y?
[Figure: six candidate models (a) to (f) over X, Y, Z, W, each with a selection node S (external factors creating disparities), each labeled Yes or No]
57 STATISTICAL TRANSPORTABILITY
Why should we transport statistical information? Why not re-learn things from scratch?
1. Measurements are costly. Limit measurements to a subset V* of variables called “scope”.
2. Samples are costly. Share samples from diverse domains to illuminate their common components.
58 STATISTICAL TRANSPORTABILITY
Definition (Statistical Transportability): A statistical relation R(P) is said to be transportable from Π to Π* over V* if R(P*) is identified from P, P*(V*), and G*, where P*(V*) is the marginal distribution of P* over a subset of variables V*.
[Figure: two graphs over X, Y, Z with selection node S]
R = P*(y | x) is estimable without re-measuring Y.
If few samples (N2) are available from Π* and many samples (N1) from Π, then estimating R = P*(y | x) by decomposition achieves much higher precision.
59 THE ORIGIN OF SELECTION BIAS
[Figure: X → Y (coefficient c0), background factors U_X and U_Y, and a selection node S]
No selection bias
Selection bias activated by a virtual collider
Selection bias activated by both a virtual collider and a real collider
60 CONTROLLING SELECTION BIAS BY ADJUSTMENT
[Figure: X → Y with background factors U_Y, U1, U2 and selection nodes S1, S2, S3]
Can be eliminated by randomization or adjustment.
61 CONTROLLING SELECTION BIAS BY ADJUSTMENT
[Figure: X → Y with background factors U_Y, U1, U2 and selection nodes S1, S2, S3]
Cannot be eliminated by randomization; requires adjustment for U2.
62 CONTROLLING SELECTION BIAS BY ADJUSTMENT
[Figure: X → Y with background factors U_Y, U1, U2 and selection nodes S1, S2, S3]
Cannot be eliminated by randomization or adjustment.
63 CONTROLLING SELECTION BIAS BY ADJUSTMENT
[Figure: X → Y (coefficient c0), background factors U_X and U_Y, and a selection node S]
Cannot be eliminated by adjustment or by randomization.
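64 CONTROLLING BY ADJUSTMENT MAY REQUIRE EXTERNAL INFORMATION
[Figure: X → Y with background factors U_Y, U1, U2 and selection nodes S1, S2, S3]
Adjustment for U2 gives $P(y \mid do(x)) = \sum_{u_2} P(y \mid x, u_2, S_2 = 1)\, P(u_2)$.
If all we have is $P(u_2 \mid S_2 = 1)$, not $P(u_2)$, then only the U2-specific effect is recoverable.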
65 WHEN WOULD THE ODDS RATIO BE RECOVERABLE?
[Figure: three graphs: (a) over X, Y, S; (b) over X, Y, Z, S; (c) over X, Y, W, C, S]
(a) OR is recoverable, despite the virtual collider at Y, whenever $(X \perp\!\!\!\perp S \mid Y, Z)_G$ or $(Y \perp\!\!\!\perp S \mid X, Z)_G$, giving $OR(X, Y \mid S = 1) = OR(X, Y)$ (Cornfield, 1951; Whittemore, 1978; Geng, 1992).
(b) The Z-specific OR is recoverable, but is meaningless: $OR(X, Y \mid Z) = OR(X, Y \mid Z, S = 1)$.
(c) The C-specific OR is recoverable, which is meaningful: $OR(X, Y \mid C) = OR(X, Y \mid W, C, S = 1)$.
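Case (a) covers classic case-control sampling, where selection depends on Y alone, so X is independent of S given Y. The sketch applies a hypothetical outcome-dependent selection probability to a hypothetical 2×2 joint distribution: the odds ratio is unchanged (3.0 in both), even though the selected sample badly distorts P(Y = 1 | x).

```python
def odds_ratio(p):
    """OR(X, Y) for a 2x2 table p[(x, y)] of probabilities or counts."""
    return (p[1, 1] * p[0, 0]) / (p[1, 0] * p[0, 1])


def risk(p, x):
    """P(Y = 1 | X = x), normalizing within the x row."""
    return p[x, 1] / (p[x, 0] + p[x, 1])


# Hypothetical population joint P(x, y)
joint = {(0, 0): 0.50, (0, 1): 0.10, (1, 0): 0.25, (1, 1): 0.15}

# Selection that depends on Y only: P(S = 1 | y)
p_select = {0: 0.05, 1: 0.90}

# Unnormalized P(x, y, S = 1); normalization cancels in both OR and risk
selected = {(x, y): p * p_select[y] for (x, y), p in joint.items()}

print(round(odds_ratio(joint), 6), round(odds_ratio(selected), 6))  # 3.0 3.0
print(round(risk(joint, 1), 3), round(risk(selected, 1), 3))        # 0.375 0.915
```

The selection factor for each y multiplies one cell in the numerator and one in the denominator of the odds ratio, so it cancels; no such cancellation occurs for the risk.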
66 GRAPHICAL CONDITION FOR ODDS-RATIO RECOVERABILITY
Theorem (Bareinboim and Pearl, 2011): A necessary condition for the recoverability of OR(Y, X) is that every ancestor $A_i$ of S that is also a descendant of X have a separating set that either:
1. d-separates $A_i$ from X given Y, or
2. d-separates $A_i$ from Y given X.
Result: a polynomial algorithm to decide recoverability.
67 EXAMPLES OF ODDS RATIO RECOVERABILITY
[Figure: (a) recoverable and (b) non-recoverable graphs over X, Y, S, W1, W2, W3, W4; separators shown: {W1, W2, W4} and {W4, W3, W1}]
68 EXAMPLES OF ODDS RATIO RECOVERABILITY
[Figure: (a) recoverable and (b) non-recoverable graphs over X, Y, S, W1, W2, W3, W4, with a separator highlighted]
69 EXAMPLES OF ODDS RATIO RECOVERABILITY
[Figure: (a) recoverable and (b) non-recoverable graphs over X, Y, S, W1, W2, W3, W4; separator shown: {W1, W2}]
70 EXAMPLES OF ODDS RATIO RECOVERABILITY
[Figure: (a) recoverable and (b) non-recoverable graphs over X, Y, S, W1, W2, W3, W4, with a separator highlighted]
71 EXAMPLES OF ODDS RATIO RECOVERABILITY
[Figure: (a) recoverable and (b) non-recoverable graphs over X, Y, S, W1, W2, W3, W4, with a separator highlighted]
72 CONCLUSIONS: I TOLD YOU CAUSALITY IS SIMPLE
Formal basis for causal and counterfactual inference (complete)
Unification of the graphical, potential-outcome, and structural equation approaches
Friendly and formal solutions to century-old problems and confusions
No other method can do better (theorem)
73 Thank you for agreeing with everything I said.