The TETRAD Project: Computational Aids to Causal Discovery. Peter Spirtes, Clark Glymour, Richard Scheines, and many others. Department of Philosophy, Carnegie Mellon University.
Agenda: 1. Morning I: Theoretical Overview (Representation, Axioms, Search); 2. Morning II: Research Problems; 3. Afternoon: TETRAD Demo and Workshop.
Part I Agenda: 1. Motivation; 2. Representation; 3. Connecting Causation to Probability (Independence); 4. Searching for Causal Models; 5. Improving on Regression for Causal Inference.
Motivation: Non-experimental Evidence. Typical predictive question: can we predict aggressiveness from the amount of violent TV watched? Causal question: does watching violent TV cause aggression? I.e., if we intervene to change TV watching, will the level of aggression change?
Causal Estimation. When and how can we use non-experimental data to tell us about the effect of an intervention? That is, when can we obtain the manipulated probability P(Y | X set= x, Z = z) from the unmanipulated probability P(Y, X, Z)?
Spartina in the Cape Fear Estuary
What Factors Directly Influence Spartina Growth in the Cape Fear Estuary? pH, salinity, sodium, phosphorus, magnesium, ammonia, zinc, potassium..., what? 14 variables for 45 samples of Spartina from the Cape Fear Estuary. The biologist concluded salinity must be a factor. Bayes net analysis says only pH directly affects Spartina biomass. The biologist's subsequent greenhouse experiment says: if pH is controlled for, variations in salinity do not affect growth; but if salinity is controlled for, variations in pH do affect growth.
Representation: 1. Association & causal structure, qualitatively; 2. Interventions; 3. Statistical causal models: (a) Bayes networks, (b) Structural equation models.
Causation & Association. X is a cause of Y iff ∃ x1 ≠ x2 such that P(Y | X set= x1) ≠ P(Y | X set= x2). Causation is asymmetric: X → Y does not imply Y → X. X and Y are associated (i.e., not X _||_ Y) iff ∃ x1 ≠ x2 such that P(Y | X = x1) ≠ P(Y | X = x2). Association is symmetric: X is associated with Y iff Y is associated with X.
Direct Causation. X is a direct cause of Y relative to S iff ∃ z, x1 ≠ x2 such that P(Y | X set= x1, Z set= z) ≠ P(Y | X set= x2, Z set= z), where Z = S - {X, Y}.
Causal Graphs. A causal graph G = {V, E}. Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V. [Example graph: Chicken Pox.]
Causal Graphs. [Two example graphs: one Not Cause Complete, one Common Cause Complete.]
Modeling Ideal Interventions: Interventions on the Effect. [Example: Sweaters On and Room Temperature; pre-experimental system vs. post-intervention.]
Modeling Ideal Interventions: Interventions on the Cause. [Example: Sweaters On and Room Temperature; pre-experimental system vs. post-intervention.]
Ideal Interventions & Causal Graphs. Model an ideal intervention by adding an "intervention" variable outside the original system, and erase all arrows pointing into the variable intervened upon. [Example: intervene to change Inf; pre-intervention vs. post-intervention graph.] A small code sketch of this "surgery" follows.
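The graph surgery can be sketched in a few lines of plain Python (this is an illustration, not TETRAD code). The helper name `intervene` and the dictionary representation are my own; the variables are the chicken-pox example used on a later slide.

```python
# Minimal sketch: an ideal intervention as graph "surgery".
# A graph is represented as a dict mapping each variable to its list of parents.

def intervene(parents, target):
    """Return the post-intervention graph: erase all arrows into `target`."""
    surgered = {v: list(ps) for v, ps in parents.items()}
    surgered[target] = []  # after the intervention, nothing in the system causes `target`
    return surgered

# Pre-intervention graph: Exposure -> Infection -> Symptoms
pre = {"Exposure": [], "Infection": ["Exposure"], "Symptoms": ["Infection"]}

post = intervene(pre, "Infection")
print(post)  # {'Exposure': [], 'Infection': [], 'Symptoms': ['Infection']}
```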
Conditioning vs. Intervening: P(Y | X = x1) vs. P(Y | X set= x1). [Teeth slides.]
Causal Bayes Networks
P(S = 0) = .7    P(S = 1) = .3
P(YF = 0 | S = 0) = .99    P(YF = 1 | S = 0) = .01
P(YF = 0 | S = 1) = .20    P(YF = 1 | S = 1) = .80
P(LC = 0 | S = 0) = .95    P(LC = 1 | S = 0) = .05
P(LC = 0 | S = 1) = .80    P(LC = 1 | S = 1) = .20
The joint distribution factors according to the causal graph: P(S, YF, LC) = P(S) P(YF | S) P(LC | S). In general, for every X in V, P(V) = Π_{X in V} P(X | immediate causes of X).
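To make the factorization concrete, here is a short sketch in plain Python (not part of TETRAD) that builds the joint distribution from the tables above and checks the entailed independence YF _||_ LC | S:

```python
# Build P(S, YF, LC) = P(S) P(YF|S) P(LC|S) from the slide's tables and
# confirm that YF and LC are independent conditional on S.

P_S = {0: 0.7, 1: 0.3}
P_YF_S = {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.20, (1, 1): 0.80}  # keyed by (yf, s)
P_LC_S = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.80, (1, 1): 0.20}  # keyed by (lc, s)

joint = {(s, yf, lc): P_S[s] * P_YF_S[(yf, s)] * P_LC_S[(lc, s)]
         for s in (0, 1) for yf in (0, 1) for lc in (0, 1)}

# Check YF _||_ LC | S: P(YF, LC | S) should equal P(YF | S) * P(LC | S).
for s in (0, 1):
    p_s = sum(p for (s2, _, _), p in joint.items() if s2 == s)
    for yf in (0, 1):
        for lc in (0, 1):
            lhs = joint[(s, yf, lc)] / p_s
            rhs = P_YF_S[(yf, s)] * P_LC_S[(lc, s)]
            assert abs(lhs - rhs) < 1e-12
print("YF _||_ LC | S holds in the factored joint.")
```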
Structural Equation Models: 1. Structural Equations; 2. Statistical Constraints. [Diagram: Statistical Model and Causal Graph.]
Structural Equation Models. Structural Equations: one equation for each variable V in the graph, V = f(parents(V), error_V); for a linear SEM (linear regression), f is a linear function. Statistical Constraints: a joint distribution over the error terms. [Causal Graph.]
Structural Equation Models. Equations: Education = ε_ed; Income = β1·Education + ε_Income; Longevity = β2·Education + ε_Longevity. Statistical Constraints: (ε_ed, ε_Income, ε_Longevity) ~ N(0, Σ²), with Σ² diagonal and no variance equal to zero. [Causal Graph and SEM Graph (path diagram).]
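A minimal simulation of this SEM can be written directly from the equations. In the sketch below the coefficient values 0.6 and 0.3 and the unit error variances are illustrative assumptions, since the slide does not fix them:

```python
# Simulate the Education -> Income, Education -> Longevity linear SEM with
# independent Gaussian errors (all numeric values are made up for illustration).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

e_ed, e_inc, e_lon = (rng.normal(0.0, 1.0, n) for _ in range(3))  # independent errors
education = e_ed
income    = 0.6 * education + e_inc
longevity = 0.3 * education + e_lon

# Income and Longevity are correlated only through their common cause Education:
print(np.corrcoef(income, longevity)[0, 1])      # clearly positive
resid_inc = income    - 0.6 * education
resid_lon = longevity - 0.3 * education
print(np.corrcoef(resid_inc, resid_lon)[0, 1])   # ~ 0 once Education is accounted for
```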
Connecting Causation to Probability
Causal Structure → Statistical Predictions. The Markov Condition: Causal Graph plus the Causal Markov Axiom entail independence relations, e.g., X _||_ Z | Y, i.e., P(X | Y) = P(X | Y, Z).
Causal Markov Axiom. If G is a causal graph, and P a probability distribution over the variables in G, then in P: every variable V is independent of its non-effects, conditional on its immediate causes.
Causal Markov Condition. Two intuitions: 1) Immediate causes make effects independent of remote causes (Markov). 2) Common causes make their effects independent (Salmon).
Causal Markov Condition. 1) Immediate causes make effects independent of remote causes (Markov): E _||_ S | I, where E = Exposure to Chicken Pox, I = Infected, S = Symptoms. (Markov Condition.)
Causal Markov Condition. 2) Effects are independent conditional on their common causes: YF _||_ LC | S. (Markov Condition.)
Causal Structure → Statistical Data
Causal Markov Axiom. In SEMs, d-separation follows from assuming independence among error terms that have no connection in the path diagram, i.e., assuming that the model is common cause complete.
Causal Markov and D-Separation. In acyclic graphs: equivalent. Cyclic linear SEMs with uncorrelated errors: d-separation correct, Markov condition incorrect. Cyclic discrete-variable Bayes nets: if at equilibrium, d-separation correct, Markov incorrect.
D-separation: Conditioning vs. Intervening. Conditioning: P(X3 | X2) ≠ P(X3 | X2, X1), so X3 is not independent of X1 given X2. Intervening: P(X3 | X2 set= x2) = P(X3 | X2 set= x2, X1), so X3 _||_ X1 | X2 set= x2.
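The contrast can be checked numerically. The sketch below assumes, purely for illustration, the collider structure X1 → X2 ← X3 (the slide's figure is not reproduced in this text): conditioning on X2 induces a dependence between X1 and X3, while setting X2 by intervention leaves them independent.

```python
# Conditioning on a common effect vs. intervening on it.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x1 = rng.normal(size=n)
x3 = rng.normal(size=n)
x2 = x1 + x3 + 0.5 * rng.normal(size=n)     # assumed collider: X1 -> X2 <- X3

# Conditioning on X2: within a narrow slice of X2, X1 and X3 become dependent.
sel = np.abs(x2 - 1.0) < 0.1
print("corr(X1, X3 | X2 near 1):", np.corrcoef(x1[sel], x3[sel])[0, 1])  # clearly negative

# Intervening on X2 (set X2 := 1 for everyone): X1 and X3 are untouched,
# so they remain independent under the manipulated distribution.
print("corr(X1, X3 | do(X2 = 1)):", np.corrcoef(x1, x3)[0, 1])           # ~ 0
```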
Search: From Statistical Data to Probability to Causation
Causal Discovery: Statistical Data → Causal Structure, via statistical inference plus background knowledge (e.g., X2 before X3; no unmeasured common causes).
Faithfulness
Faithfulness Assumption: statistical constraints arise from causal structure, not coincidence. All independence relations holding in a probability distribution P generated by a causal structure G are entailed by d-separation applied to G.
Faithfulness Assumption. Revenues = a·Rate + c·Economy + ε_Rev; Economy = b·Rate + ε_Econ. Faithfulness is violated if a = -bc: the direct and indirect effects of Rate on Revenues then cancel, producing an independence not entailed by d-separation.
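A quick simulation makes the cancellation vivid. The values b = 2.0 and c = 0.5 below are arbitrary choices; a is then set to -bc:

```python
# With a = -b*c, Rate and Revenues are uncorrelated even though Rate causes Revenues.
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
b, c = 2.0, 0.5
a = -b * c                       # the "unfaithful" parameter choice

rate     = rng.normal(size=n)
economy  = b * rate + rng.normal(size=n)
revenues = a * rate + c * economy + rng.normal(size=n)

print(np.corrcoef(rate, revenues)[0, 1])   # ~ 0: an independence not entailed by d-separation
```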
Representations of D-separation Equivalence Classes. We want the representations to: (1) characterize the independence relations entailed by the equivalence class; (2) represent causal features that are shared by every member of the equivalence class.
Patterns & PAGs. Patterns (Verma and Pearl, 1990): graphical representation of an acyclic d-separation equivalence class, with no latent variables. PAGs (Richardson, 1994): graphical representation of an equivalence class of models, including latent variable models and models with sample selection bias, that are d-separation equivalent over a set of measured variables X.
Patterns
Patterns: What the Edges Mean
Patterns
PAGs: Partial Ancestral Graphs. What PAG edges mean.
PAGs: Partial Ancestral Graphs
Overview of Search Methods. Constraint-based searches: TETRAD. Scoring searches: scores (BIC, AIC, etc.); search (hill climbing, genetic algorithms, simulated annealing); difficult to extend to latent variable models. Reference: Heckerman, Meek, and Cooper (1999), "A Bayesian Approach to Causal Discovery," ch. 4 in Computation, Causation, and Discovery, ed. C. Glymour and G. Cooper, MIT Press, pp.
Search - Illustration
Search: Adjacency
Search: Orientation in Patterns
Search: Orientation. X1 _||_ X2; X1 _||_ X4 | X3; X2 _||_ X4 | X3. [Pattern after the orientation phase; a toy code sketch of these steps follows.]
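The adjacency and collider-orientation steps can be sketched with the slide's three independence facts treated as an oracle. This is a simplified toy version of a PC-style constraint-based search, not the TETRAD implementation:

```python
# Toy PC-style search over four variables, using the slide's independencies as an oracle.
from itertools import combinations

vars_ = ["X1", "X2", "X3", "X4"]

# Oracle: the conditional independencies reported on the slide.
independencies = {
    (frozenset({"X1", "X2"}), frozenset()),
    (frozenset({"X1", "X4"}), frozenset({"X3"})),
    (frozenset({"X2", "X4"}), frozenset({"X3"})),
}

def indep(a, b, cond):
    return (frozenset({a, b}), frozenset(cond)) in independencies

# Adjacency phase: start with the complete graph; remove an edge when some
# subset of the remaining variables separates its endpoints.
adjacent = {frozenset(pair) for pair in combinations(vars_, 2)}
sepset = {}
for pair in list(adjacent):
    a, b = sorted(pair)
    others = [v for v in vars_ if v not in pair]
    for size in range(len(others) + 1):
        for cond in combinations(others, size):
            if indep(a, b, cond):
                adjacent.discard(pair)
                sepset[pair] = set(cond)
                break
        if pair not in adjacent:
            break
print("Adjacencies:", sorted(tuple(sorted(p)) for p in adjacent))  # X1-X3, X2-X3, X3-X4

# Orientation phase (colliders): X1 - X3 - X2 with X1, X2 non-adjacent and X3
# not in their separating set is oriented X1 -> X3 <- X2.
if "X3" not in sepset[frozenset({"X1", "X2"})]:
    print("Collider: X1 -> X3 <- X2")
```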
The theory of interventions, simplified. Start with a graphical causal model, without feedback. Simplest problem: to predict the probability distribution of the other represented variables resulting from an intervention that forces a value x on a variable X (e.g., everybody has to smoke) but does not otherwise alter the causal structure.
First Thing. Remember: the probability distribution for values of Y conditional on X = x is not in general the same as the probability distribution for values of Y on an intervention that sets X = x. Recent work by Waldmann gives evidence that adults are sensitive to the difference.
Example. [Graphs over X, Y, Z, W: before and after an intervention on Y.] Because X influences Y, the value of X gives information about the value of Y, and vice versa: X and Y are dependent in probability. But an intervention that forces a value Y = y on Y, and otherwise does not disturb the system, should not change the probability distribution for values of X. It should, necessarily, make the value of Y independent of X: informally, the value of Y should give no information about the value of X, and vice versa.
Representing a Simple Manipulation. [Observed structure vs. structure upon manipulating Yellow Fingers.]
Intervention Calculations. [Graph over X, Y, Z, W.] 1. Set Y = y. 2. Do "surgery" on the graph: eliminate edges into Y. 3. Use the Markov factorization of the resulting graph and probability distribution to compute the probability distribution for X, Z, W; various effective rules are incorporated in what Pearl calls the "Do" calculus.
Intervention Calculations. [Graph with Y set to y.] Original Markov factorization: Pr(X, Y, Z, W) = Pr(W | X, Z) Pr(Z | Y) Pr(Y | X) Pr(X). The factorization after intervention: Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y = y) Pr(X).
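The surgery computation can be carried out mechanically once the conditional probability tables are known. The sketch below uses made-up CPT values for the graph implied by the factorization (X → Y → Z → W, plus X → W); only the form of the factorization comes from the slide. It also contrasts the post-intervention distribution with ordinary conditioning on Y.

```python
# Post-intervention ("truncated") factorization vs. ordinary conditioning.
from itertools import product

pX = {0: 0.6, 1: 0.4}
pY_X = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}      # P(Y=y | X=x), keyed (y, x)
pZ_Y = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.25, (1, 1): 0.75}    # P(Z=z | Y=y), keyed (z, y)
pW_XZ = {(0, 0, 0): 0.7, (1, 0, 0): 0.3, (0, 0, 1): 0.4, (1, 0, 1): 0.6,
         (0, 1, 0): 0.5, (1, 1, 0): 0.5, (0, 1, 1): 0.1, (1, 1, 1): 0.9}  # P(W=w | X=x, Z=z), keyed (w, x, z)

y_star = 1  # the intervened value: Do(Y = 1)

# Truncated factorization: Pr(X, Z, W | Do(Y=y)) = Pr(W | X, Z) Pr(Z | Y=y) Pr(X)
do_dist = {(x, z, w): pW_XZ[(w, x, z)] * pZ_Y[(z, y_star)] * pX[x]
           for x, z, w in product((0, 1), repeat=3)}
print("do-distribution sums to", sum(do_dist.values()))  # ~ 1.0: a proper distribution

# Ordinary conditioning on Y = y re-weights X through Pr(Y = y | X):
joint = {(x, y, z, w): pX[x] * pY_X[(y, x)] * pZ_Y[(z, y)] * pW_XZ[(w, x, z)]
         for x, y, z, w in product((0, 1), repeat=4)}
pY1 = sum(p for (x, y, z, w), p in joint.items() if y == y_star)
cond_dist = {(x, z, w): joint[(x, y_star, z, w)] / pY1 for x, z, w in product((0, 1), repeat=3)}

print("P(X=1 | do(Y=1)) =", sum(p for (x, z, w), p in do_dist.items() if x == 1))    # = P(X=1) = 0.4
print("P(X=1 | Y=1)     =", sum(p for (x, z, w), p in cond_dist.items() if x == 1))  # different
```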
What's The Point? Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y = y) Pr(X). The probability distribution on the left-hand side is a prediction of the effects of an intervention. The probabilities on the right are all known before the intervention. So causal structure plus probabilities yield predictions of intervention effects, which provide a basis for planning.
Surprising Results. [Graph over X, Y, Z, W, with X unobserved.] Suppose we know the causal structure and the joint probability for Y, Z, W only, not for X. We CAN predict the effect on Z of an intervention on Y, even though Y and W are confounded by the unobserved X.
Surprising Result. [Graph over X, Y, Z, W.] The effect of Z on W is confounded by the probabilistic effect of Y on Z, X on Y, and X on W. But the probabilistic effect on W of an intervention on Z CAN be computed from the probabilities before the intervention. How? By conditioning on Y.
Surprising Result. [Graph over X, Y, Z, W, with X unobserved.]
Pr(W, Y, X | Do(Z = z)) = Pr(W | Z = z, X) Pr(Y | X) Pr(X)    (by surgery)
Pr(W, X | Do(Z = z), Y) = Pr(W | Z = z, X) Pr(X | Y)    (condition on Y)
Pr(W | Do(Z = z), Y) = Σ_x Pr(W | Z = z, X = x, Y) Pr(X = x | Y)    (marginalize out X)
Pr(W | Do(Z = z), Y) = Pr(W | Z = z, Y)    (obscure probability theorem)
Pr(W | Do(Z = z)) = Σ_y Pr(W | Z = z, Y = y) Pr(Y = y)    (marginalize out Y)
The right-hand side is composed entirely of observed probabilities.
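The derivation can be checked numerically: build a full model that includes the unobserved X, compute Pr(W | Do(Z = z)) by surgery, and compare it with the last formula, which uses only the observed distribution over Y, Z, and W. The CPT values below are made up for illustration; only the identity being checked comes from the slide.

```python
# Check: Pr(W | Do(Z=z)) = sum_y Pr(W | Z=z, Y=y) Pr(Y=y), with X unobserved.
from itertools import product

pX = {0: 0.5, 1: 0.5}
pY_X = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}   # P(Y=y | X=x), keyed (y, x)
pZ_Y = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.6}   # P(Z=z | Y=y), keyed (z, y)
pW_XZ = {(0, 0, 0): 0.8, (1, 0, 0): 0.2, (0, 0, 1): 0.3, (1, 0, 1): 0.7,
         (0, 1, 0): 0.6, (1, 1, 0): 0.4, (0, 1, 1): 0.1, (1, 1, 1): 0.9}  # P(W=w | X=x, Z=z), keyed (w, x, z)

joint = {(x, y, z, w): pX[x] * pY_X[(y, x)] * pZ_Y[(z, y)] * pW_XZ[(w, x, z)]
         for x, y, z, w in product((0, 1), repeat=4)}

def marg(keep, **fixed):
    """Sum the joint over everything not in `keep`, with some values fixed."""
    out = {}
    for (x, y, z, w), p in joint.items():
        vals = dict(x=x, y=y, z=z, w=w)
        if all(vals[k] == v for k, v in fixed.items()):
            key = tuple(vals[k] for k in keep)
            out[key] = out.get(key, 0.0) + p
    return out

z_star, w_star = 1, 1

# Ground truth by surgery (uses the unobserved X):
truth = sum(pW_XZ[(w_star, x, z_star)] * pY_X[(y, x)] * pX[x]
            for x in (0, 1) for y in (0, 1))

# The slide's formula, using only observed probabilities over Y, Z, W:
estimate = 0.0
for y in (0, 1):
    pY = marg(["y"])[(y,)]
    pZY = marg(["z", "y"])[(z_star, y)]
    pWZY = marg(["w", "z", "y"])[(w_star, z_star, y)]
    estimate += (pWZY / pZY) * pY

print(truth, estimate)   # the two agree
```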
Pearl's "Do" Calculus provides rules that permit one to avoid the probability calculations we just went through: graphical properties determine whether the effects of an intervention can be predicted.
Applications: Spartina Grass; Parenting among Single, Black Mothers; Pneumonia; Photosynthesis; Lead - IQ; College Retention; Corn Exports; Rock Classification; College Plans; Political Exclusion; Satellite Calibration; Naval Readiness.
References
Causation, Prediction, and Search, 2nd Edition (2000), by P. Spirtes, C. Glymour, and R. Scheines, MIT Press.
Computation, Causation, & Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press.
Causality in Crisis? (1997), V. McKim and S. Turner (eds.), Univ. of Notre Dame Press.
TETRAD IV:
Web Course on Causal and Statistical Reasoning:
Causality Lab: