ON MEASUREMENT BIAS IN CAUSAL INFERENCE
Judea Pearl
University of California, Los Angeles
(www.cs.ucla.edu/~judea/)
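THE MEASUREMENT BIAS PROBLEM
[Diagram: Z is a common cause of X and Y, X affects Y, and W is a noisy measurement of Z.]
Given: the causal diagram, with Z unobserved (latent).
Find: P(y | do(x)).
We know that $P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z)$.
But Z is not measured, and adjusting for W in its place is generally biased: $\sum_w P(y \mid x, w)\, P(w) \neq P(y \mid do(x))$.
Goal: minimize the bias.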
OUTLINE
1. Effect restoration using matrix inversion
2. Example: restoration in binary models
3. Extension to multivariate confounders
4. Effect restoration in linear models
5. The 3-proxy principle and its variations
6. Model testing with measurement errors
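MEASUREMENT BIAS AND EFFECT RESTORATION
[Diagram: unobserved Z, its proxy W with known P(w | z), and X, Y.]
P(y | do(x)) is identifiable from measurements of W if P(w | z) is given (Selén, 1986; Greenland & Lash, 2008).
Assume: $P(w \mid x, y, z) = P(w \mid z)$ (local independence).
Solution: $P(x, y, z) = \sum_w I(z, w)\, P(x, y, w)$, where I is the inverse of the matrix $M(w, z) = P(w \mid z)$.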
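
The restoration step can be spelled out as a short derivation. This is a sketch under the local independence assumption stated on the slide; the symbols M and I for the error matrix and its inverse are notation introduced here, and the derivation assumes W and Z have the same number of states so that M is square and invertible.

```latex
\begin{align*}
P(x,y,w) &= \sum_z P(w \mid x,y,z)\,P(x,y,z)
          = \sum_z P(w \mid z)\,P(x,y,z)
          && \text{(local independence)} \\
P(x,y,z) &= \sum_w I(z,w)\,P(x,y,w),
          \qquad I = M^{-1},\quad M(w,z) = P(w \mid z) \\
P(y \mid \mathrm{do}(x)) &= \sum_z P(y \mid x,z)\,P(z)
          = \sum_z \frac{P(x,y,z)\,P(z)}{P(x,z)}
\end{align*}
```

Here P(z) and P(x, z) are obtained from the restored P(x, y, z) by summing out the remaining variables, so every term on the right is computable from P(x, y, w) and P(w | z).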
EFFECT RESTORATION IN BINARY MODELS
[Figure: binary Z with binary proxy W, X and Y. The weight of each cell (x, y) is distributed to cell (x, y, z0) and to cell (x, y, z1); two regions of the figure are labeled "undefined".]
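
A minimal sketch of the binary case in Python, assuming the two misclassification rates are known. The names `eps` = P(W=0 | Z=1) and `delta` = P(W=1 | Z=0), the helper functions, and all numbers in the toy model are illustrative assumptions, not taken from the slides. With binary Z and W, inverting the 2x2 error matrix gives P(x, y, z1) = [P(x, y, w1) - delta P(x, y)] / (1 - eps - delta), and similarly for z0.

```python
from itertools import product

def bern(p1, value):
    """Probability of `value` for a binary variable with P(1) = p1."""
    return p1 if value == 1 else 1.0 - p1

def noisy_joint(p_xyz, eps, delta):
    """P(x, y, w) implied by P(x, y, z) under local independence."""
    p_w_given_z = {(0, 0): 1 - delta, (1, 0): delta,   # keys are (w, z)
                   (0, 1): eps,       (1, 1): 1 - eps}
    return {(x, y, w): sum(p_w_given_z[(w, z)] * p_xyz[(x, y, z)] for z in (0, 1))
            for x, y, w in product((0, 1), repeat=3)}

def restore_joint(p_xyw, eps, delta):
    """Recover P(x, y, z) from P(x, y, w) by inverting the 2x2 error matrix."""
    denom = 1.0 - eps - delta
    if abs(denom) < 1e-12:
        raise ValueError("eps + delta = 1: W carries no information about Z")
    p_xyz = {}
    for x, y in product((0, 1), repeat=2):
        p_xy = p_xyw[(x, y, 0)] + p_xyw[(x, y, 1)]
        p_xyz[(x, y, 1)] = (p_xyw[(x, y, 1)] - delta * p_xy) / denom
        p_xyz[(x, y, 0)] = (p_xyw[(x, y, 0)] - eps * p_xy) / denom
    return p_xyz

def adjusted_effect(p, x, y):
    """sum_s P(y | x, s) P(s), where s is the third index of the joint."""
    total = 0.0
    for s in (0, 1):
        p_s = sum(pr for (_, _, ss), pr in p.items() if ss == s)
        p_xs = p[(x, 0, s)] + p[(x, 1, s)]
        total += p[(x, y, s)] / p_xs * p_s
    return total

# Hypothetical toy model (all numbers are made up for illustration).
P_Z1 = 0.4
P_X1_GIVEN_Z = {0: 0.2, 1: 0.7}
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.8}
EPS, DELTA = 0.1, 0.2

p_true = {(x, y, z): bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x)
          * bern(P_Y1_GIVEN_XZ[(x, z)], y)
          for x, y, z in product((0, 1), repeat=3)}
p_obs = noisy_joint(p_true, EPS, DELTA)

true_effect = adjusted_effect(p_true, 1, 1)          # adjusts for Z
naive_effect = adjusted_effect(p_obs, 1, 1)          # adjusts for W instead
restored_effect = adjusted_effect(restore_joint(p_obs, EPS, DELTA), 1, 1)
```

In this toy model the restored effect matches the one computed from the true joint, while adjusting for W directly does not. The guard on `eps + delta = 1` marks the case where the error matrix is singular and restoration is impossible.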
WHAT IF Z IS MULTIVARIATE?
[Diagram G: latent covariates Z1, ..., Z6, proxies W1, ..., W5, treatment X and outcome Y.]
If Z is high-dimensional, most cells will be empty of samples, and P(x, w, y) cannot be estimated even if P(w | z) is known.
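PROPENSITY SCORE ESTIMATOR (Rosenbaum & Rubin, 1983)
[Diagram: covariates Z1, ..., Z6, treatment X, outcome Y, and the propensity score L.]
P(y | do(x)) = ?
Adjustment for L replaces adjustment for Z.
Theorem: $\sum_z P(y \mid z, x)\, P(z) = \sum_l P(y \mid L = l, x)\, P(L = l)$, where $L(z) = P(X = 1 \mid z)$.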
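
The claim that adjusting for L can stand in for adjusting for Z can be checked numerically on a small discrete model. The sketch below is an illustration only: the two binary covariates, the probability tables, and the function names are hypothetical, chosen so that two different values of Z share the same propensity score.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy model with Z = (z1, z2); all numbers are made up.
P_Z = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.2}
PROPENSITY = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.8}  # L(z) = P(X=1 | z)

def p_y1(x, z):
    """P(Y=1 | x, z) for the toy model."""
    return 0.1 + 0.3 * x + 0.2 * z[0] + 0.3 * z[1]

def joint():
    """Full joint P(x, y, z) implied by the tables above."""
    p = {}
    for z, pz in P_Z.items():
        for x, y in product((0, 1), repeat=2):
            px = PROPENSITY[z] if x == 1 else 1.0 - PROPENSITY[z]
            py = p_y1(x, z) if y == 1 else 1.0 - p_y1(x, z)
            p[(x, y, z)] = pz * px * py
    return p

def adjust(p, x, y, key):
    """sum_s P(y | x, s) P(s), where s = key(z) is the adjustment variable."""
    p_s, p_xs, p_xys = defaultdict(float), defaultdict(float), defaultdict(float)
    for (xx, yy, z), pr in p.items():
        s = key(z)
        p_s[s] += pr
        if xx == x:
            p_xs[s] += pr
            if yy == y:
                p_xys[s] += pr
    return sum(p_xys[s] / p_xs[s] * p_s[s] for s in p_s)

p = joint()
by_z = adjust(p, 1, 1, key=lambda z: z)               # adjust for all of Z
by_l = adjust(p, 1, 1, key=lambda z: PROPENSITY[z])   # adjust for L(z) only
unadjusted = adjust(p, 1, 1, key=lambda z: 0)         # plain P(y | x)
```

Here `by_l` pools the two strata with L = 0.5 and still agrees with `by_z`, whereas `unadjusted` does not. The practical appeal is that L is a scalar even when Z has many dimensions.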
PROPENSITY SCORE RESTORATION
[Diagram G: latent covariates Z1, ..., Z6, proxies W1, ..., W5, treatment X and outcome Y.]
From observed samples (x, w, y), to synthetic samples (x, z, y), to L(z), to P(y | do(x)).
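EFFECT RESTORATION IN LINEAR MODELS
[Diagram: linear model over Z, W, X, Y, with path coefficients c1, c2, c3 on the arrows out of Z and c0 on X → Y.]
The pivotal parameter needed is $k = c^2\,\mathrm{var}(Z)$, where c is the coefficient of Z in the equation for W.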
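
A worked derivation shows why a single quantity k suffices. It is a sketch that assumes a particular labeling of the diagram, which the extracted slide does not confirm: c1 on Z → X, c2 on Z → Y, c3 on Z → W, and c0 on X → Y, with mutually independent noise terms.

```latex
% Assumed structural equations (labeling is an assumption):
%   X = c_1 Z + \epsilon_X,  Y = c_0 X + c_2 Z + \epsilon_Y,  W = c_3 Z + \epsilon_W
\begin{align*}
\operatorname{cov}(X,W) &= c_1 c_3 \operatorname{var}(Z) \\
\operatorname{cov}(W,Y) &= c_3\,(c_0 c_1 + c_2) \operatorname{var}(Z) \\
\operatorname{cov}(X,Y) &= c_0 \operatorname{var}(X) + c_1 c_2 \operatorname{var}(Z) \\[4pt]
c_0 &= \frac{\operatorname{cov}(X,Y) - \operatorname{cov}(X,W)\operatorname{cov}(W,Y)/k}
            {\operatorname{var}(X) - \operatorname{cov}(X,W)^2/k},
\qquad k = c_3^2 \operatorname{var}(Z)
\end{align*}
```

Substituting the three covariances into the last line, the numerator reduces to c0 (var(X) - c1^2 var(Z)) and the denominator to var(X) - c1^2 var(Z), so the ratio is c0. Everything except k is estimable from the observed X, Y and W.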
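EFFECT RESTORATION FROM A SECOND PROXY
[Diagrams: (a) nodes W, Z, X, Y and a second proxy V, with path coefficients c0, c1, c2, c3; (b) nodes V, Z, X, Y, with path coefficient c1.]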
THE THREE-PROXIES PRINCIPLE
[Diagram: latent Z, proxies W and V, treatment X and outcome Y, with path coefficients c0, c1, c2, c3, c4.]
Cai and Kuroki (2008): c0 is identifiable.
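
One way to see why c0 becomes identifiable is to compute it from population covariances. The sketch below assumes a specific linear model that the extracted slide does not spell out: X = c1 Z + e_X, Y = c0 X + c2 Z + e_Y, W = c3 Z + e_W, V = c4 Z + e_V, with independent noise terms. Under that assumption X, W and V each depend on Z with independent noise, so k = c3^2 var(Z) equals cov(X,W) cov(W,V) / cov(X,V). Function names and parameter values are hypothetical.

```python
def model_covariances(c0, c1, c2, c3, c4, var_z=1.0, noise=1.0):
    """Population covariances implied by the assumed linear model.

    X = c1*Z + e_X,  Y = c0*X + c2*Z + e_Y,  W = c3*Z + e_W,  V = c4*Z + e_V,
    with mutually independent noise terms, each of variance `noise`.
    """
    var_x = c1 ** 2 * var_z + noise
    return {
        "xx": var_x,
        "xy": c0 * var_x + c1 * c2 * var_z,
        "xw": c1 * c3 * var_z,
        "xv": c1 * c4 * var_z,
        "wv": c3 * c4 * var_z,
        "wy": c3 * (c0 * c1 + c2) * var_z,
    }

def restore_c0(cov):
    """Recover c0 from observable covariances, using V to estimate k."""
    k = cov["xw"] * cov["wv"] / cov["xv"]          # equals c3^2 * var(Z)
    numerator = cov["xy"] - cov["xw"] * cov["wy"] / k
    denominator = cov["xx"] - cov["xw"] ** 2 / k
    return numerator / denominator

# Illustrative parameter values (assumptions, not from the slides).
cov = model_covariances(c0=0.5, c1=1.0, c2=0.7, c3=0.8, c4=1.2)
c0_restored = restore_c0(cov)
c0_naive = cov["xy"] / cov["xx"]   # regression of Y on X, ignoring Z
```

With these values the restored coefficient equals the true c0, while the naive regression slope is inflated by confounding. In practice the covariances would be sample estimates, so the restored value would carry sampling error.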
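MODEL TESTING WITH MEASUREMENT ERRORS
[Diagram: unobserved Z with proxies W and V, X and Y, with path coefficients c0, c1, c2, c3, c4.]
Problem: Test whether Z d-separates X and Y when Z is unobserved.
Solution: Test if c0 = 0.
Theorem 1: If a latent variable Z d-separates two measured variables, X and Y, and Z has a proxy W, W = cZ + ε, then cov(XY) must satisfy:
$\mathrm{cov}(XY) = \dfrac{\mathrm{cov}(XW)\,\mathrm{cov}(WY)}{c^2\,\mathrm{var}(Z)}$
Corollary: c0 = 0 is testable if $k = c^2\,\mathrm{var}(Z)$ is estimable.
Example: Given W and V, k is estimable, and the test of c0 = 0 can be applied.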
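
The constraint in Theorem 1 suggests a simple diagnostic: estimate k from the two proxies and check how far cov(XY) is from cov(XW) cov(WY) / k. The simulation below is a rough sketch under assumed linear-Gaussian equations and made-up coefficients. It reports the raw discrepancy only; it is not a formal significance test and it does not implement the 2SLS procedure mentioned in the conclusions.

```python
import random

def cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

def simulate(c0, n=50000, seed=0):
    """Draw samples of (X, Y, W, V) from an assumed linear model; Z is discarded."""
    rng = random.Random(seed)
    xs, ys, ws, vs = [], [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                      # latent confounder
        x = 1.0 * z + rng.gauss(0, 1)
        y = c0 * x + 0.7 * z + rng.gauss(0, 1)
        w = 0.8 * z + rng.gauss(0, 1)            # first proxy
        v = 1.2 * z + rng.gauss(0, 1)            # second proxy
        xs.append(x); ys.append(y); ws.append(w); vs.append(v)
    return xs, ys, ws, vs

def discrepancy(xs, ys, ws, vs):
    """cov(XY) - cov(XW) cov(WY) / k, with k estimated from X, W and V."""
    k = cov(xs, ws) * cov(ws, vs) / cov(xs, vs)
    return cov(xs, ys) - cov(xs, ws) * cov(ws, ys) / k

d_null = discrepancy(*simulate(c0=0.0))   # Z d-separates X and Y
d_alt = discrepancy(*simulate(c0=0.5))    # direct effect present
```

Under these assumptions the discrepancy should be close to zero when c0 = 0 and clearly away from zero when c0 = 0.5. Turning it into a test would require a standard error for the statistic, for example by bootstrapping.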
CONCLUSIONS
1. Effect restoration is feasible.
2. It rests on two principles:
2.1 Matrix inversion in discrete models. Requires a synthetic population and propensity score estimation.
2.2 Three proxies per latent variable in linear models. Proxies can be decoupled by clever conditioning.
3. Conditional independence tests can be replaced by tetrad-like tests (using 2SLS) for model testing.