1 WHAT'S NEW IN CAUSAL INFERENCE: From Propensity Scores And Mediation To External Validity Judea Pearl University of California Los Angeles (www.cs.ucla.edu/~judea/)

Slides:



Advertisements
Similar presentations
Department of Computer Science
Advertisements

From Propensity Scores And Mediation To External Validity
RELATED CLASS CS 262 Z – SEMINAR IN CAUSAL MODELING CURRENT TOPICS IN COGNITIVE SYSTEMS INSTRUCTOR: JUDEA PEARL Spring Quarter Monday and Wednesday, 2-4pm.
Improving health worldwide George B. Ploubidis The role of sensitivity analysis in the estimation of causal pathways from observational.
1 WHAT'S NEW IN CAUSAL INFERENCE: From Propensity Scores And Mediation To External Validity Judea Pearl University of California Los Angeles (
Omitted Variable Bias Methods of Economic Investigation Lecture 7 1.
CAUSES AND COUNTERFACTUALS Judea Pearl University of California Los Angeles (
1 THE SYMBIOTIC APPROACH TO CAUSAL INFERENCE Judea Pearl University of California Los Angeles (
Linear Regression and Binary Variables The independent variable does not necessarily need to be continuous. If the independent variable is binary (e.g.,
TRYGVE HAAVELMO AND THE EMERGENCE OF CAUSAL CALCULUS Judea Pearl University of California Los Angeles (
THE MATHEMATICS OF CAUSAL MODELING Judea Pearl Department of Computer Science UCLA.
RECONSIDERING DEFINITIONS OF DIRECT AND INDIRECT EFFECTS IN MEDIATION ANALYSIS WITH A SOLUTION FOR A CONTINUOUS FACTOR Ilya Novikov, Michal Benderly, Laurence.
COMMENTS ON By Judea Pearl (UCLA). notation 1990’s Artificial Intelligence Hoover.
Judea Pearl University of California Los Angeles CAUSAL REASONING FOR DECISION AIDING SYSTEMS.
Sampling Theory Determining the distribution of Sample statistics.
1 CAUSAL INFERENCE: MATHEMATICAL FOUNDATIONS AND PRACTICAL APPLICATIONS Judea Pearl University of California Los Angeles (
SIMPSON’S PARADOX, ACTIONS, DECISIONS, AND FREE WILL Judea Pearl UCLA
CAUSES AND COUNTERFACTUALS OR THE SUBTLE WISDOM OF BRAINLESS ROBOTS.
CAUSAL INFERENCE IN THE EMPIRICAL SCIENCES Judea Pearl University of California Los Angeles (
1 REASONING WITH CAUSES AND COUNTERFACTUALS Judea Pearl UCLA (
Judea Pearl Computer Science Department UCLA DIRECT AND INDIRECT EFFECTS.
EVAL 6970: Cost Analysis for Evaluation Dr. Chris L. S. Coryn Nick Saxton Fall 2014.
Judea Pearl University of California Los Angeles ( THE MATHEMATICS OF CAUSE AND EFFECT.
Estimating parameters in a statistical model Likelihood and Maximum likelihood estimation Bayesian point estimates Maximum a posteriori point.
Judea Pearl University of California Los Angeles ( THE MATHEMATICS OF CAUSE AND EFFECT.
THE MATHEMATICS OF CAUSE AND EFFECT: With Reflections on Machine Learning Judea Pearl Departments of Computer Science and Statistics UCLA.
Estimating Causal Effects from Large Data Sets Using Propensity Scores Hal V. Barron, MD TICR 5/06.
SUPA Advanced Data Analysis Course, Jan 6th – 7th 2009 Advanced Data Analysis for the Physical Sciences Dr Martin Hendry Dept of Physics and Astronomy.
V13: Causality Aims: (1) understand the causal relationships between the variables of a network (2) interpret a Bayesian network as a causal model whose.
ECON 3039 Labor Economics By Elliott Fan Economics, NTU Elliott Fan: Labor 2015 Fall Lecture 21.
REASONING WITH CAUSE AND EFFECT Judea Pearl Department of Computer Science UCLA.
Propensity Score Matching for Causal Inference: Possibilities, Limitations, and an Example sean f. reardon MAPSS colloquium March 6, 2007.
Public Policy Analysis ECON 3386 Anant Nyshadham.
BIOST 536 Lecture 11 1 Lecture 11 – Additional topics in Logistic Regression C-statistic (“concordance statistic”)  Same as Area under the curve (AUC)
1 G Lect 14M Review of topics covered in course Mediation/Moderation Statistical power for interactions What topics were not covered? G Multiple.
MARKETING RESEARCH CHAPTER 18 :Correlation and Regression.
CAUSES AND COUNTERFACTIALS IN THE EMPIRICAL SCIENCES Judea Pearl University of California Los Angeles (
REASONING WITH CAUSE AND EFFECT Judea Pearl Department of Computer Science UCLA.
1 Multivariable Modeling. 2 nAdjustment by statistical model for the relationships of predictors to the outcome. nRepresents the frequency or magnitude.
THE MATHEMATICS OF CAUSE AND COUNTERFACTUALS Judea Pearl University of California Los Angeles (
Chapter 8: Simple Linear Regression Yang Zhenlin.
Review of Probability. Important Topics 1 Random Variables and Probability Distributions 2 Expected Values, Mean, and Variance 3 Two Random Variables.
Impact Evaluation Sebastian Galiani November 2006 Causal Inference.
Applying Causal Inference Methods to Improve Identification of Health and Healthcare Disparities, and the Underlying Mediators and Moderators of Disparities.
Random Variables Numerical Quantities whose values are determine by the outcome of a random experiment.
Judea Pearl Computer Science Department UCLA ROBUSTNESS OF CAUSAL CLAIMS.
CAUSAL REASONING FOR DECISION AIDING SYSTEMS COGNITIVE SYSTEMS LABORATORY UCLA Judea Pearl, Mark Hopkins, Blai Bonet, Chen Avin, Ilya Shpitser.
Mediation: The Causal Inference Approach David A. Kenny.
1 CONFOUNDING EQUIVALENCE Judea Pearl – UCLA, USA Azaria Paz – Technion, Israel (
Summary: connecting the question to the analysis(es) Jay S. Kaufman, PhD McGill University, Montreal QC 26 February :40 PM – 4:20 PM National Academy.
Variable selection in Regression modelling Simon Thornley.
Experimental Evaluations Methods of Economic Investigation Lecture 4.
THE MATHEMATICS OF CAUSE AND EFFECT: Thinking Nature and Talking Counterfactuals Judea Pearl Departments of Computer Science and Statistics UCLA.
CAUSAL INFERENCE IN STATISTICS: A Gentle Introduction Judea Pearl Departments of Computer Science and Statistics UCLA.
Methods of Presenting and Interpreting Information Class 9.
University of California
Department of Computer Science
Chen Avin Ilya Shpitser Judea Pearl Computer Science Department UCLA
Center for Causal Discovery: Summer Short Course/Datathon
Department of Computer Science
A MACHINE LEARNING EXERCISE
Computer Science and Statistics
CAUSAL INFERENCE IN STATISTICS
From Propensity Scores And Mediation To External Validity
THE MATHEMATICS OF PROGRAM EVALUATION
Counterfactual models Time dependent confounding
Department of Computer Science
CAUSAL REASONING FOR DECISION AIDING SYSTEMS
Björn Bornkamp, Georgina Bermann
Presentation transcript:

1 WHAT'S NEW IN CAUSAL INFERENCE: From Propensity Scores And Mediation To External Validity Judea Pearl University of California Los Angeles (

2 1.Unified conceptualization of counterfactuals, structural-equations, and graphs 2.Propensity scores demystified 3.Direct and indirect effects (Mediation) 4.External validity mathematized OUTLINE

3 TRADITIONAL STATISTICAL INFERENCE PARADIGM Data Inference Q(P) (Aspects of P ) P Joint Distribution e.g., Infer whether customers who bought product A would also buy product B. Q = P(B | A)

4 Data Inference Q(M) (Aspects of M ) Data Generating Model M – Invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to variables in the analysis. Joint Distribution THE STRUCTURAL MODEL PARADIGM M “Think Nature, not experiment!”

5 Z Y X INPUTOUTPUT FAMILIAR CAUSAL MODEL ORACLE FOR MANIPILATION

6 STRUCTURAL CAUSAL MODELS Definition: A structural causal model is a 4-tuple  V,U, F, P(u) , where V = {V 1,...,V n } are endogeneas variables U = {U 1,...,U m } are background variables F = {f 1,..., f n } are functions determining V, v i = f i (v, u) P(u) is a distribution over U P(u) and F induce a distribution P(v) over observable variables e.g.,

7 CAUSAL MODELS AND COUNTERFACTUALS Definition: The sentence: “ Y would be y (in situation u ), had X been x,” denoted Y x (u) = y, means: The solution for Y in a mutilated model M x, (i.e., the equations for X replaced by X = x ) with input U=u, is equal to y. The Fundamental Equation of Counterfactuals:

8 READING COUNTERFACTUALS FROM SEM Data shows:  = 0.7,  = 0.5,  = 0.4 A student named Joe, measured X =0.5, Z =1, Y =1.9 Q 1 : What would Joe’s score be had he doubled his study time?

9 Answer: Joe’s score would be 1.9 Or, In counterfactual notaion: READING COUNTERFACTUALS

10 Q 2 : What would Joe’s score be, had the treatment been 0 and had he studied at whatever level he would have studied had the treatment been 1? READING COUNTERFACTUALS

11 In particular: CAUSAL MODELS AND COUNTERFACTUALS Definition: The sentence: “ Y would be y (in situation u ), had X been x,” denoted Y x (u) = y, means: The solution for Y in a mutilated model M x, (i.e., the equations for X replaced by X = x ) with input U=u, is equal to y. Joint probabilities of counterfactuals:

12 Define: Assume: Identify: Estimate: Test: THE FIVE NECESSARY STEPS OF CAUSAL ANALYSIS Express the target quantity Q as a function Q ( M ) that can be computed from any model M. Formulate causal assumptions A using some formal language. Determine if Q is identifiable given A. Estimate Q if it is identifiable; approximate it, if it is not. Test the testable implications of A (if any).

13 CAUSAL MODEL (M A ) A - CAUSAL ASSUMPTIONS Q Queries of interest Q(P) - Identified estimands Data ( D ) Q - Estimates of Q(P) Causal inference T(M A ) - Testable implications Statistical inference Goodness of fit Model testingProvisional claims A* - Logical implications of A CAUSAL MODEL (M A ) THE LOGIC OF CAUSAL ANALYSIS

14 IDENTIFICATION IN SCM Find the effect of X on Y, P ( y | do ( x )), given the causal assumptions shown in G, where Z 1,..., Z k are auxiliary variables. Z6Z6 Z3Z3 Z2Z2 Z5Z5 Z1Z1 X Y Z4Z4 G Can P ( y | do ( x )) be estimated if only a subset, Z, can be measured?

15 ELIMINATING CONFOUNDING BIAS THE BACK-DOOR CRITERION P(y | do(x)) is estimable if there is a set Z of variables such that Z d -separates X from Y in G x. Z6Z6 Z3Z3 Z2Z2 Z5Z5 Z1Z1 X Y Z4Z4 Z6Z6 Z3Z3 Z2Z2 Z5Z5 Z1Z1 X Y Z4Z4 Z GxGx G Moreover, (“adjusting” for Z )  Ignorability

16 Front Door EFFECT OF WARM-UP ON INJURY (After Shrier & Platt, 2008) No, no! Watch out! Warm-up Exercises ( X ) Injury ( Y ) ???

17 PROPENSITY SCORE ESTIMATOR (Rosenbaum & Rubin, 1983) Z6Z6 Z3Z3 Z2Z2 Z5Z5 Z1Z1 X Y Z4Z4 Adjustment for L replaces Adjustment for Z Theorem: P(y | do(x)) = ? L Can L replace {Z 1, Z 2, Z 3, Z 4, Z 5 } ?

18 WHAT PROPENSITY SCORE (PS) PRACTITIONERS NEED TO KNOW 1.The asymptotic bias of PS is EQUAL to that of ordinary adjustment (for same Z ). 2.Including an additional covariate in the analysis CAN SPOIL the bias-reduction potential of others. 3.In particular, instrumental variables tend to amplify bias. 4.Choosing sufficient set for PS, requires knowledge of the model. Z XYXYXY Z XY Z Z

19 U c1c1 XY Z c2c2 c3c3 c0c0 SURPRISING RESULT: Instrumental variables are Bias-Amplifiers in linear models (Bhattarcharya & Vogt 2007; Wooldridge 2009) “Naive” bias Adjusted bias (Unobserved)

20 INTUTION: When Z is allowed to vary, it absorbs (or explains) some of the changes in X. When Z is fixed the burden falls on U alone, and transmitted to Y (resulting in a higher bias) U c1c1 XY Z c2c2 c3c3 c0c0 U c1c1 XY Z c2c2 c3c3 c0c0 U c1c1 XY Z c2c2 c3c3 c0c0

21 c0c0 c2c2 Z c3c3 U Y X c4c4 T1T1 c1c1 WHAT’S BETWEEN AN INSTRUMENT AND A CONFOUNDER? Should we adjust for Z ? T2T2 ANSWER: CONCLUSION: Yes, if No, otherwise Adjusting for a parent of Y is safer than a parent of X

22 WHICH SET TO ADJUST FOR Should we adjust for {T}, {Z}, or {T, Z} ? Answer 1: (From bias-amplification considerations) {T} is better than {T, Z} which is the same as {Z} Answer 2: (From variance considerations) {T} is better than {T, Z} which is better than {Z} ZT XY

23 CONCLUSIONS The prevailing practice of adjusting for all covariates, especially those that are good predictors of X (the “treatment assignment,” Rubin, 2009) is totally misguided. The “outcome mechanism” is as important, and much safer, from both bias and variance viewpoints As X-rays are to the surgeon, graphs are for causation

24 REGRESSION VS. STRUCTURAL EQUATIONS (THE CONFUSION OF THE CENTURY) Regression (claimless, nonfalsifiable): Y = ax +  Y Structural (empirical, falsifiable): Y = bx + u Y Claim: (regardless of distributions): E(Y | do(x)) = E(Y | do(x), do(z)) = bx Q. When is b estimable by regression methods? A. Graphical criteria available  The mothers of all questions: Q. When would b equal a ? A. When all back-door paths are blocked, ( u Y X )

25 TWO PARADIGMS FOR CAUSAL INFERENCE Observed: P(X, Y, Z,...) Conclusions needed: P(Y x =y), P(X y =x | Z=z)... How do we connect observables, X,Y,Z,… to counterfactuals Y x, X z, Z y,… ? N-R model Counterfactuals are primitives, new variables Super-distribution Structural model Counterfactuals are derived quantities Subscripts modify the model and distribution

26 “SUPER” DISTRIBUTION IN N-R MODEL X X Y Y Y x= Z Z Y x= X z= X z= X y=0  0  1  1  0  Uu1 u2 u3 u4 Uu1 u2 u3 u4 inconsistency: x = 0  Y x=0 = Y Y = xY 1 + (1-x) Y 0

27 ARE THE TWO PARADIGMS EQUIVALENT? Yes (Galles and Pearl, 1998; Halpern 1998) In the N-R paradigm, Y x is defined by consistency: In SCM, consistency is a theorem. Moreover, a theorem in one approach is a theorem in the other. Difference: Clarity of assumptions and their implications

28 AXIOMS OF STRUCTURAL COUNTERFACTUALS 1.Definiteness 2.Uniqueness 3.Effectiveness 4.Composition (generalized consistency) 5.Reversibility Y x (u)=y: Y would be y, had X been x (in state U = u ) (Galles, Pearl, Halpern, 1998):

29 FORMULATING ASSUMPTIONS THREE LANGUAGES 2. Counterfactuals: 1. English: Smoking ( X ), Cancer ( Y ), Tar ( Z ), Genotypes ( U ) X Y Z U ZXY 3. Structural:

30 1.Expressing scientific knowledge 2.Recognizing the testable implications of one's assumptions 3.Locating instrumental variables in a system of equations 4.Deciding if two models are equivalent or nested 5.Deciding if two counterfactuals are independent given another 6.Algebraic derivations of identifiable estimands COMPARISON BETWEEN THE N-R AND SCM LANGUAGES

31 GRAPHICAL – COUNTERFACTUALS SYMBIOSIS Every causal graph expresses counterfactuals assumptions, e.g., X  Y  Z consistent, and readable from the graph. Express assumption in graphs Derive estimands by graphical or algebraic methods 1.Missing arrows Y  Z 2. Missing arcs YZ

32 EFFECT DECOMPOSITION (direct vs. indirect effects) 1.Why decompose effects? 2.What is the definition of direct and indirect effects? 3.What are the policy implications of direct and indirect effects? 4.When can direct and indirect effect be estimated consistently from experimental and nonexperimental data?

33 WHY DECOMPOSE EFFECTS? 1.To understand how Nature works 2.To comply with legal requirements 3.To predict the effects of new type of interventions: Signal routing, rather than variable fixing

34 XZ Y LEGAL IMPLICATIONS OF DIRECT EFFECT What is the direct effect of X on Y ? (averaged over z ) (Qualifications) (Hiring) (Gender) Can data prove an employer guilty of hiring discrimination? Adjust for Z ? No! No!

35 XZ Y FISHER’S GRAVE MISTAKE (after Rubin, 2005) Compare treated and untreated lots of same density No! (Plant density) (Yield) (Soil treatment) What is the direct effect of treatment on yield? (Latent factor) Proposed solution (?): “Principal strata”

36 z = f (x, u) y = g (x, z, u) XZ Y NATURAL INTERPRETATION OF AVERAGE DIRECT EFFECTS Natural Direct Effect of X on Y : The expected change in Y, when we change X from x 0 to x 1 and, for each u, we keep Z constant at whatever value it attained before the change. In linear models, DE = Natural Direct Effect Robins and Greenland (1992) – “Pure”

37 DEFINITION AND IDENTIFICATION OF NESTED COUNTERFACTUALS Consider the quantity Given  M, P(u) , Q is well defined Given u, Z x * (u) is the solution for Z in M x *, call it z is the solution for Y in M xz Can Q be estimated from data? Experimental: nest-free expression Nonexperimental: subscript-free expression

38 z = f (x, u) y = g (x, z, u) XZ Y DEFINITION OF INDIRECT EFFECTS Indirect Effect of X on Y : The expected change in Y when we keep X constant, say at x 0, and let Z change to whatever value it would have attained had X changed to x 1. In linear models, IE = TE - DE No Controlled Indirect Effect

39 POLICY IMPLICATIONS OF INDIRECT EFFECTS f GENDERQUALIFICATION HIRING What is the indirect effect of X on Y? The effect of Gender on Hiring if sex discrimination is eliminated. XZ Y IGNORE Deactivating a link – a new type of intervention

40 1.The natural direct and indirect effects are identifiable in Markovian models (no confounding), 2.And are given by: 3.Applicable to linear and non-linear models, continuous and discrete variables, regardless of distributional form. MEDIATION FORMULAS

41 Z m2m2 XY m1m1 In linear systems  xz Linear + interaction

42 MEDIATION FORMULAS IN UNCONFOUNDED MODELS X Z Y

43 Z m2m2 XY m1m1 Disabling mediation Disabling direct path DE TE - DE TE IE In linear systems Is NOT equal to:

44 MEDIATION FORMULA FOR BINARY VARIABLES X Z YXZY E(Y|x,z)= E(Y|x,z)=g xz E(Z|x)=h x n1n1n1n1000 n2n2n2n2001 n3n3n3n3010 n4n4n4n4011 n5n5n5n5100 n6n6n6n6101 n7n7n7n7110 n8n8n8n8111

45 RAMIFICATION OF THE MEDIATION FORMULA DE should be averaged over mediator levels, IE should NOT be averaged over exposure levels. TE-DE need not equal IE TE-DE = proportion for whom mediation is necessary IE = proportion for whom mediation is sufficient TE-DE informs interventions on indirect pathways IE informs intervention on direct pathways.

46 TRANSPORTABILITY -- WHEN CAN WE EXTRPOLATE EXPERIMENTAL FINDINGS TO DIFFERENT POPULATIONS? Experimental study in LA Measured: Problem: We find (LA population is younger) What can we say about Intuition: Observational study in NYC Measured: X Y Z = age X Y

47 TRANSPORT FORMULAS DEPEND ON THE STORY X Y Z (a) X Y Z (b) a) Z represents age b) Z represents language skill c) Z represents a bio-marker X Y (c) Z

48 TRANSPORTABILITY (Pearl and Bareinboim, 2010) Definition 1 (Transportability) Given two populations, denoted  and  *, characterized by models M = and M* =, respectively, a causal relation R is said to be transportable from  to  * if 1. R (  ) is estimable from the set I of interventional studies on , and 2. R (  *) is identified from I, P *, G, and G + S. S = external factors responsible for M  M*

49 TRANSPORT FORMULAS DEPEND ON THE STORY a) Z represents age b) Z represents language skill c) Z represents a bio-marker X Y Z (b) S X Y Z (a) S X Y (c) Z S ? ?

50 XY (d) Z S W WHICH MODEL LICENSES THE TRANSPORT OF THE CAUSAL EFFECT XY (e) Z S W (c) XYZ S XY ( Z S XYZ S W XY (f) Z S XYZ S W (b) YX S (a) YX S

51 U W DETERMINE IF THE CAUSAL EFFECT IS TRANSPORTABLE X Y Z V S T S The transport formula What measurements need to be taken in the study and in the target population?

52 SURROGATE ENDPOINTS – A CAUSAL PERSPECTIVE The problem: Infer effects of (randomized) treatment (X) on outcome (Y) from measurements taken on a surrogate variable (Z), which is more readily measurable, and is sufficiently well correlated with the first (Ellenberg and Hamilton 1989). Prentice 1989: "strong correlation is not sufficient," there should be "no pathways that bypass the surrogate" (2005) Everyone agrees that correlation is not sufficient, and no one explains why.

53 WHY STRONG CORRELATION IS NOT SUFFICIENT FOR SURROGACY Joffe and Green (2009): "A surrogate outcome is an outcome for which knowing the effect of treatment on the surrogate allows prediction of the effect of treatment on the more clinically relevant outcome.'' Two effects = Two experiments conducted under two different conditions. Surrogacy = ``Strong correlation,'' + robustness to the new conditions. New condition = Interventions to change the surrogate Z.

54 WHO IS A GOOD SURROGATE? S S S X S XY (a) Z S XY (b) Z Y (c) Z X (d) ZY XY (e) Z S XY (f) Z U W

55 Definition (Pearl and Bareinboim, 2011): A variable Z is said to be a surrogate endpoint relative the effect of X on Y if and only if: 1.P(y|do(x), z) is highly sensitive to Z in the experimental study, and 2.P(y|do(x), z, s) = P(y|do(x), z, s ) where S is a selection variable added to G and directed towards Z. In words, the causal effect of X on Y can be reliably predicted from measurements of Z, regardless of the mechanism responsible for variations in Z. SURROGACY: CORRELATIONS AND ROBUSTNESS

56 Z and X d -separate S from Y. SURROGACY: A GRAPHICAL CRITERION S S S X S XY (a) Z S XY (b) Z Y (c) Z X (d) ZY XY (e) Z S XY (f) Z U W

57 I TOLD YOU CAUSALITY IS SIMPLE Formal basis for causal and counterfactual inference (complete) Unification of the graphical, potential-outcome and structural equation approaches Friendly and formal solutions to century-old problems and confusions. No other method can do better (theorem) CONCLUSIONS

58 Thank you for agreeing with everything I said.

59 WHAT PROPENSITY SCORE (PS) PRACTITIONERS NEED TO KNOW 1.The asymptotic bias of PS is EQUAL to that of ordinary adjustment (for same Z ). 2.Including an additional covariate in the analysis CAN SPOIL the bias-reduction potential of others. 3.In particular, instrumental variables tend to amplify bias. 4.Choosing sufficient set for PS, requires knowledge of the model.

60 Q 1 : What would Joe’s score be had he doubled his study time? Answer: Joe’s score would be 1.9 Or, In counterfactual notaion: READING COUNTERFACTUALS

61 Q 2 : What would Joe’s score be, had the treatment been 0 and had he studied at whatever level he would have studied had the treatment been 1? READING COUNTERFACTUALS Q 1 : What would Joe’s score be had he doubled his study time?