The TETRAD Project: Computational Aids to Causal Discovery. Peter Spirtes, Clark Glymour, Richard Scheines, and many others. Department of Philosophy, Carnegie Mellon University.
Agenda: 1. Morning I: Theoretical Overview (Representation, Axioms, Search); 2. Morning II: Research Problems; 3. Afternoon: TETRAD Demo and Workshop.
Part I Agenda: 1. Motivation; 2. Representation; 3. Connecting Causation to Probability (Independence); 4. Searching for Causal Models; 5. Improving on Regression for Causal Inference.
Motivation: Non-experimental Evidence. Typical predictive question: can we predict aggressiveness from the amount of violent TV watched? Causal question: does watching violent TV cause aggression? I.e., if we intervene to change TV watching, will the level of aggression change?
Causal Estimation. When and how can we use non-experimental data to tell us about the effect of an intervention? That is, when can we obtain the manipulated probability P(Y | X set= x, Z = z) from the unmanipulated probability P(Y, X, Z)?
Spartina in the Cape Fear Estuary
What Factors Directly Influence Spartina Growth in the Cape Fear Estuary? pH, salinity, sodium, phosphorus, magnesium, ammonia, zinc, potassium..., what? 14 variables for 45 samples of Spartina from the Cape Fear Estuary. The biologist concluded salinity must be a factor. Bayes net analysis says only pH directly affects Spartina biomass. The biologist's subsequent greenhouse experiment says: if pH is controlled for, variations in salinity do not affect growth; but if salinity is controlled for, variations in pH do affect growth.
Representation: 1. Association & causal structure, qualitatively; 2. Interventions; 3. Statistical causal models: (a) Bayes networks, (b) Structural equation models.
Causation & Association. X is a cause of Y iff ∃ x1 ≠ x2 such that P(Y | X set= x1) ≠ P(Y | X set= x2). Causation is asymmetric: X → Y does not imply Y → X. X and Y are associated (i.e., not X _||_ Y) iff ∃ x1 ≠ x2 such that P(Y | X = x1) ≠ P(Y | X = x2). Association is symmetric: X is associated with Y iff Y is associated with X.
Direct Causation. X is a direct cause of Y relative to S iff ∃ z, x1 ≠ x2 such that P(Y | X set= x1, Z set= z) ≠ P(Y | X set= x2, Z set= z), where Z = S - {X, Y}.
Causal Graphs. A causal graph G = {V, E}. Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V. [Example graph: Chicken Pox.]
Causal Graphs. [Two example graphs: one Not Cause Complete, one Common Cause Complete.]
Modeling Ideal Interventions: Interventions on the Effect. [Example: Sweaters On and Room Temperature; pre-experimental system vs. post-intervention.]
Modeling Ideal Interventions: Interventions on the Cause. [Example: Sweaters On and Room Temperature; pre-experimental system vs. post-intervention.]
Ideal Interventions & Causal Graphs. Model an ideal intervention by adding an "intervention" variable outside the original system, and erase all arrows pointing into the variable intervened upon. [Example: intervene to change Inf; pre-intervention vs. post-intervention graph.] A small code sketch of this "surgery" follows.
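The graph surgery can be sketched in a few lines of plain Python (this is an illustration, not TETRAD code). The helper name `intervene` and the dictionary representation are my own; the variables are the chicken-pox example used on a later slide.

```python
# Minimal sketch: an ideal intervention as graph "surgery".
# A graph is represented as a dict mapping each variable to its list of parents.

def intervene(parents, target):
    """Return the post-intervention graph: erase all arrows into `target`."""
    surgered = {v: list(ps) for v, ps in parents.items()}
    surgered[target] = []  # after the intervention, nothing in the system causes `target`
    return surgered

# Pre-intervention graph: Exposure -> Infection -> Symptoms
pre = {"Exposure": [], "Infection": ["Exposure"], "Symptoms": ["Infection"]}

post = intervene(pre, "Infection")
print(post)  # {'Exposure': [], 'Infection': [], 'Symptoms': ['Infection']}
```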
Conditioning vs. Intervening: P(Y | X = x1) vs. P(Y | X set= x1). [Teeth slides.]
Causal Bayes Networks
P(S = 0) = .7    P(S = 1) = .3
P(YF = 0 | S = 0) = .99    P(YF = 1 | S = 0) = .01
P(YF = 0 | S = 1) = .20    P(YF = 1 | S = 1) = .80
P(LC = 0 | S = 0) = .95    P(LC = 1 | S = 0) = .05
P(LC = 0 | S = 1) = .80    P(LC = 1 | S = 1) = .20
The joint distribution factors according to the causal graph: P(S, YF, LC) = P(S) P(YF | S) P(LC | S). In general, for every X in V, P(V) = Π_{X in V} P(X | immediate causes of X).
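To make the factorization concrete, here is a short sketch in plain Python (not part of TETRAD) that builds the joint distribution from the tables above and checks the entailed independence YF _||_ LC | S:

```python
# Build P(S, YF, LC) = P(S) P(YF|S) P(LC|S) from the slide's tables and
# confirm that YF and LC are independent conditional on S.

P_S = {0: 0.7, 1: 0.3}
P_YF_S = {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.20, (1, 1): 0.80}  # keyed by (yf, s)
P_LC_S = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.80, (1, 1): 0.20}  # keyed by (lc, s)

joint = {(s, yf, lc): P_S[s] * P_YF_S[(yf, s)] * P_LC_S[(lc, s)]
         for s in (0, 1) for yf in (0, 1) for lc in (0, 1)}

# Check YF _||_ LC | S: P(YF, LC | S) should equal P(YF | S) * P(LC | S).
for s in (0, 1):
    p_s = sum(p for (s2, _, _), p in joint.items() if s2 == s)
    for yf in (0, 1):
        for lc in (0, 1):
            lhs = joint[(s, yf, lc)] / p_s
            rhs = P_YF_S[(yf, s)] * P_LC_S[(lc, s)]
            assert abs(lhs - rhs) < 1e-12
print("YF _||_ LC | S holds in the factored joint.")
```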
Structural Equation Models: 1. Structural Equations; 2. Statistical Constraints. [Diagram: Statistical Model and Causal Graph.]
Structural Equation Models. Structural Equations: one equation for each variable V in the graph, V = f(parents(V), error_V); for a linear SEM (linear regression), f is a linear function. Statistical Constraints: a joint distribution over the error terms. [Causal Graph.]
Structural Equation Models. Equations: Education = ε_ed; Income = β1·Education + ε_Income; Longevity = β2·Education + ε_Longevity. Statistical Constraints: (ε_ed, ε_Income, ε_Longevity) ~ N(0, Σ²), with Σ² diagonal and no variance equal to zero. [Causal Graph and SEM Graph (path diagram).]
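A minimal simulation of this SEM can be written directly from the equations. In the sketch below the coefficient values 0.6 and 0.3 and the unit error variances are illustrative assumptions, since the slide does not fix them:

```python
# Simulate the Education -> Income, Education -> Longevity linear SEM with
# independent Gaussian errors (all numeric values are made up for illustration).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

e_ed, e_inc, e_lon = (rng.normal(0.0, 1.0, n) for _ in range(3))  # independent errors
education = e_ed
income    = 0.6 * education + e_inc
longevity = 0.3 * education + e_lon

# Income and Longevity are correlated only through their common cause Education:
print(np.corrcoef(income, longevity)[0, 1])      # clearly positive
resid_inc = income    - 0.6 * education
resid_lon = longevity - 0.3 * education
print(np.corrcoef(resid_inc, resid_lon)[0, 1])   # ~ 0 once Education is accounted for
```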
Connecting Causation to Probability
Causal Structure → Statistical Predictions. The Markov Condition: Causal Graph plus the Causal Markov Axiom entail independence relations, e.g., X _||_ Z | Y, i.e., P(X | Y) = P(X | Y, Z).
Causal Markov Axiom. If G is a causal graph, and P a probability distribution over the variables in G, then in P: every variable V is independent of its non-effects, conditional on its immediate causes.
Causal Markov Condition. Two intuitions: 1) Immediate causes make effects independent of remote causes (Markov). 2) Common causes make their effects independent (Salmon).
Causal Markov Condition. 1) Immediate causes make effects independent of remote causes (Markov): E _||_ S | I, where E = Exposure to Chicken Pox, I = Infected, S = Symptoms. (Markov Condition.)
Causal Markov Condition. 2) Effects are independent conditional on their common causes: YF _||_ LC | S. (Markov Condition.)
Causal Structure → Statistical Data
Causal Markov Axiom. In SEMs, d-separation follows from assuming independence among error terms that have no connection in the path diagram, i.e., assuming that the model is common cause complete.
Causal Markov and D-Separation. In acyclic graphs: equivalent. Cyclic linear SEMs with uncorrelated errors: d-separation correct, Markov condition incorrect. Cyclic discrete-variable Bayes nets: if at equilibrium, d-separation correct, Markov incorrect.
D-separation: Conditioning vs. Intervening. Conditioning: P(X3 | X2) ≠ P(X3 | X2, X1), so X3 is not independent of X1 given X2. Intervening: P(X3 | X2 set= x2) = P(X3 | X2 set= x2, X1), so X3 _||_ X1 | X2 set= x2.
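The contrast can be checked numerically. The sketch below assumes, purely for illustration, the collider structure X1 → X2 ← X3 (the slide's figure is not reproduced in this text): conditioning on X2 induces a dependence between X1 and X3, while setting X2 by intervention leaves them independent.

```python
# Conditioning on a common effect vs. intervening on it.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x1 = rng.normal(size=n)
x3 = rng.normal(size=n)
x2 = x1 + x3 + 0.5 * rng.normal(size=n)     # assumed collider: X1 -> X2 <- X3

# Conditioning on X2: within a narrow slice of X2, X1 and X3 become dependent.
sel = np.abs(x2 - 1.0) < 0.1
print("corr(X1, X3 | X2 near 1):", np.corrcoef(x1[sel], x3[sel])[0, 1])  # clearly negative

# Intervening on X2 (set X2 := 1 for everyone): X1 and X3 are untouched,
# so they remain independent under the manipulated distribution.
print("corr(X1, X3 | do(X2 = 1)):", np.corrcoef(x1, x3)[0, 1])           # ~ 0
```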
Search: From Statistical Data to Probability to Causation
Causal Discovery: Statistical Data → Causal Structure, via statistical inference plus background knowledge (e.g., X2 before X3; no unmeasured common causes).
Faithfulness
Faithfulness Assumption: statistical constraints arise from causal structure, not coincidence. All independence relations holding in a probability distribution P generated by a causal structure G are entailed by d-separation applied to G.
Faithfulness Assumption. Revenues = a·Rate + c·Economy + ε_Rev; Economy = b·Rate + ε_Econ. Faithfulness is violated if a = -bc: the direct and indirect effects of Rate on Revenues then cancel, producing an independence not entailed by d-separation.
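A quick simulation makes the cancellation vivid. The values b = 2.0 and c = 0.5 below are arbitrary choices; a is then set to -bc:

```python
# With a = -b*c, Rate and Revenues are uncorrelated even though Rate causes Revenues.
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
b, c = 2.0, 0.5
a = -b * c                       # the "unfaithful" parameter choice

rate     = rng.normal(size=n)
economy  = b * rate + rng.normal(size=n)
revenues = a * rate + c * economy + rng.normal(size=n)

print(np.corrcoef(rate, revenues)[0, 1])   # ~ 0: an independence not entailed by d-separation
```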
Representations of D-separation Equivalence Classes. We want the representations to: (1) characterize the independence relations entailed by the equivalence class; (2) represent causal features that are shared by every member of the equivalence class.
Patterns & PAGs. Patterns (Verma and Pearl, 1990): graphical representation of an acyclic d-separation equivalence class, with no latent variables. PAGs (Richardson, 1994): graphical representation of an equivalence class of models, including latent variable models and models with sample selection bias, that are d-separation equivalent over a set of measured variables X.
Patterns
Patterns: What the Edges Mean
Patterns
PAGs: Partial Ancestral Graphs. What PAG edges mean.
PAGs: Partial Ancestral Graphs
Overview of Search Methods. Constraint-based searches: TETRAD. Scoring searches: scores (BIC, AIC, etc.); search (hill climbing, genetic algorithms, simulated annealing); difficult to extend to latent variable models. Reference: Heckerman, Meek, and Cooper (1999), "A Bayesian Approach to Causal Discovery," ch. 4 in Computation, Causation, and Discovery, ed. C. Glymour and G. Cooper, MIT Press, pp.
Search - Illustration
Search: Adjacency
Search: Orientation in Patterns
Search: Orientation. X1 _||_ X2; X1 _||_ X4 | X3; X2 _||_ X4 | X3. [Pattern after the orientation phase; a toy code sketch of these steps follows.]
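The adjacency and collider-orientation steps can be sketched with the slide's three independence facts treated as an oracle. This is a simplified toy version of a PC-style constraint-based search, not the TETRAD implementation:

```python
# Toy PC-style search over four variables, using the slide's independencies as an oracle.
from itertools import combinations

vars_ = ["X1", "X2", "X3", "X4"]

# Oracle: the conditional independencies reported on the slide.
independencies = {
    (frozenset({"X1", "X2"}), frozenset()),
    (frozenset({"X1", "X4"}), frozenset({"X3"})),
    (frozenset({"X2", "X4"}), frozenset({"X3"})),
}

def indep(a, b, cond):
    return (frozenset({a, b}), frozenset(cond)) in independencies

# Adjacency phase: start with the complete graph; remove an edge when some
# subset of the remaining variables separates its endpoints.
adjacent = {frozenset(pair) for pair in combinations(vars_, 2)}
sepset = {}
for pair in list(adjacent):
    a, b = sorted(pair)
    others = [v for v in vars_ if v not in pair]
    for size in range(len(others) + 1):
        for cond in combinations(others, size):
            if indep(a, b, cond):
                adjacent.discard(pair)
                sepset[pair] = set(cond)
                break
        if pair not in adjacent:
            break
print("Adjacencies:", sorted(tuple(sorted(p)) for p in adjacent))  # X1-X3, X2-X3, X3-X4

# Orientation phase (colliders): X1 - X3 - X2 with X1, X2 non-adjacent and X3
# not in their separating set is oriented X1 -> X3 <- X2.
if "X3" not in sepset[frozenset({"X1", "X2"})]:
    print("Collider: X1 -> X3 <- X2")
```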
The theory of interventions, simplified. Start with a graphical causal model, without feedback. Simplest problem: to predict the probability distribution of the other represented variables resulting from an intervention that forces a value x on a variable X (e.g., everybody has to smoke) but does not otherwise alter the causal structure.
First Thing. Remember: the probability distribution for values of Y conditional on X = x is not in general the same as the probability distribution for values of Y on an intervention that sets X = x. Recent work by Waldmann gives evidence that adults are sensitive to the difference.
Example. [Graphs over X, Y, Z, W: before and after an intervention on Y.] Because X influences Y, the value of X gives information about the value of Y, and vice versa: X and Y are dependent in probability. But an intervention that forces a value Y = y on Y, and otherwise does not disturb the system, should not change the probability distribution for values of X. It should, necessarily, make the value of Y independent of X: informally, the value of Y should give no information about the value of X, and vice versa.
Representing a Simple Manipulation. [Observed structure vs. structure upon manipulating Yellow Fingers.]
Intervention Calculations. [Graph over X, Y, Z, W.] 1. Set Y = y. 2. Do "surgery" on the graph: eliminate edges into Y. 3. Use the Markov factorization of the resulting graph and probability distribution to compute the probability distribution for X, Z, W; various effective rules are incorporated in what Pearl calls the "Do" calculus.
Intervention Calculations. [Graph with Y set to y.] Original Markov factorization: Pr(X, Y, Z, W) = Pr(W | X, Z) Pr(Z | Y) Pr(Y | X) Pr(X). The factorization after intervention: Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y = y) Pr(X).
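The surgery computation can be carried out mechanically once the conditional probability tables are known. The sketch below uses made-up CPT values for the graph implied by the factorization (X → Y → Z → W, plus X → W); only the form of the factorization comes from the slide. It also contrasts the post-intervention distribution with ordinary conditioning on Y.

```python
# Post-intervention ("truncated") factorization vs. ordinary conditioning.
from itertools import product

pX = {0: 0.6, 1: 0.4}
pY_X = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}      # P(Y=y | X=x), keyed (y, x)
pZ_Y = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.25, (1, 1): 0.75}    # P(Z=z | Y=y), keyed (z, y)
pW_XZ = {(0, 0, 0): 0.7, (1, 0, 0): 0.3, (0, 0, 1): 0.4, (1, 0, 1): 0.6,
         (0, 1, 0): 0.5, (1, 1, 0): 0.5, (0, 1, 1): 0.1, (1, 1, 1): 0.9}  # P(W=w | X=x, Z=z), keyed (w, x, z)

y_star = 1  # the intervened value: Do(Y = 1)

# Truncated factorization: Pr(X, Z, W | Do(Y=y)) = Pr(W | X, Z) Pr(Z | Y=y) Pr(X)
do_dist = {(x, z, w): pW_XZ[(w, x, z)] * pZ_Y[(z, y_star)] * pX[x]
           for x, z, w in product((0, 1), repeat=3)}
print("do-distribution sums to", sum(do_dist.values()))  # ~ 1.0: a proper distribution

# Ordinary conditioning on Y = y re-weights X through Pr(Y = y | X):
joint = {(x, y, z, w): pX[x] * pY_X[(y, x)] * pZ_Y[(z, y)] * pW_XZ[(w, x, z)]
         for x, y, z, w in product((0, 1), repeat=4)}
pY1 = sum(p for (x, y, z, w), p in joint.items() if y == y_star)
cond_dist = {(x, z, w): joint[(x, y_star, z, w)] / pY1 for x, z, w in product((0, 1), repeat=3)}

print("P(X=1 | do(Y=1)) =", sum(p for (x, z, w), p in do_dist.items() if x == 1))    # = P(X=1) = 0.4
print("P(X=1 | Y=1)     =", sum(p for (x, z, w), p in cond_dist.items() if x == 1))  # different
```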
What's The Point? Pr(X, Z, W | Do(Y = y)) = Pr(W | X, Z) Pr(Z | Y = y) Pr(X). The probability distribution on the left-hand side is a prediction of the effects of an intervention. The probabilities on the right are all known before the intervention. So causal structure plus probabilities yield predictions of intervention effects, which provide a basis for planning.
Surprising Results. [Graph over X, Y, Z, W, with X unobserved.] Suppose we know the causal structure and the joint probability for Y, Z, W only, not for X. We CAN predict the effect on Z of an intervention on Y, even though Y and W are confounded by the unobserved X.
Surprising Result. [Graph over X, Y, Z, W.] The effect of Z on W is confounded by the probabilistic effect of Y on Z, X on Y, and X on W. But the probabilistic effect on W of an intervention on Z CAN be computed from the probabilities before the intervention. How? By conditioning on Y.
Surprising Result. [Graph over X, Y, Z, W, with X unobserved.]
Pr(W, Y, X | Do(Z = z)) = Pr(W | Z = z, X) Pr(Y | X) Pr(X)    (by surgery)
Pr(W, X | Do(Z = z), Y) = Pr(W | Z = z, X) Pr(X | Y)    (condition on Y)
Pr(W | Do(Z = z), Y) = Σ_x Pr(W | Z = z, X = x, Y) Pr(X = x | Y)    (marginalize out X)
Pr(W | Do(Z = z), Y) = Pr(W | Z = z, Y)    (obscure probability theorem)
Pr(W | Do(Z = z)) = Σ_y Pr(W | Z = z, Y = y) Pr(Y = y)    (marginalize out Y)
The right-hand side is composed entirely of observed probabilities.
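The derivation can be checked numerically: build a full model that includes the unobserved X, compute Pr(W | Do(Z = z)) by surgery, and compare it with the last formula, which uses only the observed distribution over Y, Z, and W. The CPT values below are made up for illustration; only the identity being checked comes from the slide.

```python
# Check: Pr(W | Do(Z=z)) = sum_y Pr(W | Z=z, Y=y) Pr(Y=y), with X unobserved.
from itertools import product

pX = {0: 0.5, 1: 0.5}
pY_X = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}   # P(Y=y | X=x), keyed (y, x)
pZ_Y = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.6}   # P(Z=z | Y=y), keyed (z, y)
pW_XZ = {(0, 0, 0): 0.8, (1, 0, 0): 0.2, (0, 0, 1): 0.3, (1, 0, 1): 0.7,
         (0, 1, 0): 0.6, (1, 1, 0): 0.4, (0, 1, 1): 0.1, (1, 1, 1): 0.9}  # P(W=w | X=x, Z=z), keyed (w, x, z)

joint = {(x, y, z, w): pX[x] * pY_X[(y, x)] * pZ_Y[(z, y)] * pW_XZ[(w, x, z)]
         for x, y, z, w in product((0, 1), repeat=4)}

def marg(keep, **fixed):
    """Sum the joint over everything not in `keep`, with some values fixed."""
    out = {}
    for (x, y, z, w), p in joint.items():
        vals = dict(x=x, y=y, z=z, w=w)
        if all(vals[k] == v for k, v in fixed.items()):
            key = tuple(vals[k] for k in keep)
            out[key] = out.get(key, 0.0) + p
    return out

z_star, w_star = 1, 1

# Ground truth by surgery (uses the unobserved X):
truth = sum(pW_XZ[(w_star, x, z_star)] * pY_X[(y, x)] * pX[x]
            for x in (0, 1) for y in (0, 1))

# The slide's formula, using only observed probabilities over Y, Z, W:
estimate = 0.0
for y in (0, 1):
    pY = marg(["y"])[(y,)]
    pZY = marg(["z", "y"])[(z_star, y)]
    pWZY = marg(["w", "z", "y"])[(w_star, z_star, y)]
    estimate += (pWZY / pZY) * pY

print(truth, estimate)   # the two agree
```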
Pearl's "Do" Calculus provides rules that permit one to avoid the probability calculations we just went through: graphical properties determine whether the effects of an intervention can be predicted.
Applications: Spartina Grass; Parenting among Single, Black Mothers; Pneumonia; Photosynthesis; Lead - IQ; College Retention; Corn Exports; Rock Classification; College Plans; Political Exclusion; Satellite Calibration; Naval Readiness.
References
Causation, Prediction, and Search, 2nd Edition (2000), by P. Spirtes, C. Glymour, and R. Scheines, MIT Press.
Computation, Causation, & Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press.
Causality in Crisis? (1997), V. McKim and S. Turner (eds.), Univ. of Notre Dame Press.
TETRAD IV:
Web Course on Causal and Statistical Reasoning:
Causality Lab: