Day 2: Search
June 9, 2015
Carnegie Mellon University, Center for Causal Discovery

Outline, Day 2: Search
1. Bridge Principles: Causation → Probability
2. D-separation
3. Model Equivalence
4. Search Basics (PC, GES)
5. Latent Variable Model Search (FCI)
6. Examples

Causal Structure → Testable Statistical Predictions
A causal graph entails testable predictions, e.g., conditional independence:
X _||_ Z | Y means: for all x, y, z, P(X = x, Z = z | Y = y) = P(X = x | Y = y) · P(Z = z | Y = y)
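Such a conditional-independence prediction is what a search algorithm actually tests against data. For linear-Gaussian variables a standard choice is the Fisher z test on the partial correlation of X and Z given Y. The sketch below is a minimal Python/numpy version, run on data simulated from an assumed chain X → Y → Z; the simulation is only for illustration and is not from the slides.

import numpy as np
from scipy import stats

def fisher_z(data, i, j, cond=()):
    # Partial correlation of columns i and j given the columns in cond,
    # turned into a Fisher z statistic and a two-sided p-value.
    sub = data[:, [i, j] + list(cond)]
    prec = np.linalg.inv(np.corrcoef(sub, rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(data.shape[0] - len(cond) - 3)
    return 2 * (1 - stats.norm.cdf(abs(z)))      # large p-value: independence not rejected

# Data simulated from an assumed chain X -> Y -> Z (illustration only).
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=n)
Y = 0.8 * X + rng.normal(size=n)
Z = 0.8 * Y + rng.normal(size=n)
data = np.column_stack([X, Y, Z])
print("p-value for X _||_ Z     :", fisher_z(data, 0, 2))        # tiny: dependent
print("p-value for X _||_ Z | Y :", fisher_z(data, 0, 2, (1,)))  # large: independence holds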

Bridge Principles: Acyclic Causal Graph over V → Constraints on P(V)
Weak Causal Markov Assumption: V1, V2 causally disconnected ⇒ V1 _||_ V2
V1, V2 are causally disconnected when:
i. V1 is not a cause of V2, and
ii. V1 is not an effect of V2, and
iii. there is no common cause Z of V1 and V2

Bridge Principles: Acyclic Causal Graph over V → Constraints on P(V)
Weak Causal Markov Assumption: V1, V2 causally disconnected ⇒ V1 _||_ V2
Causal Markov Axiom: If G is a causal graph and P a probability distribution over the variables in G, then <G, P> satisfies the Markov Axiom iff every variable V is independent of its non-effects, conditional on its immediate causes.
Determinism (Structural Equations)
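For acyclic graphs, the local condition just stated is equivalent to the familiar Markov factorization of the joint distribution, writing Parents(V_i) for the direct causes of V_i in G:

P(V_1, \ldots, V_n) \;=\; \prod_{i=1}^{n} P\bigl(V_i \mid \mathrm{Parents}(V_i)\bigr)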

Bridge Principles: Acyclic Causal Graph over V → Constraints on P(V)
Causal Markov Axiom + Acyclicity ⇒ d-separation criterion ⇒ Graphical Independence Oracle
Example causal graph over Z, X, Y1, Y2 (shown as a figure); the oracle returns exactly:
Z _||_ Y1 | X
Z _||_ Y2 | X
Z _||_ Y1 | X, Y2
Z _||_ Y2 | X, Y1
Y1 _||_ Y2 | X
Y1 _||_ Y2 | X, Z
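The graphical independence oracle can be implemented directly from the graph: to decide whether X is d-separated from Y given Z, restrict the DAG to X, Y, Z and their ancestors, moralize (marry the parents of every common child and drop directions), delete the nodes in Z, and check whether X and Y are still connected. Below is a hand-rolled sketch using networkx; the example graph uses the orientation Z → X → Y1, X → Y2, one orientation consistent with the independencies listed above (the figure itself is not reproduced in the transcript).

import networkx as nx

def d_separated(G, xs, ys, zs):
    # Moralization test: are all of xs d-separated from all of ys given zs in DAG G?
    relevant = set(xs) | set(ys) | set(zs)
    anc = set(relevant)
    for v in relevant:
        anc |= nx.ancestors(G, v)
    H = G.subgraph(anc)
    moral = nx.Graph(H.to_undirected())
    for child in H.nodes:                          # marry the parents of every common child
        parents = list(H.predecessors(child))
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                moral.add_edge(parents[i], parents[j])
    moral.remove_nodes_from(zs)                    # the conditioning set blocks these nodes
    return not any(nx.has_path(moral, x, y)
                   for x in xs for y in ys if x in moral and y in moral)

G = nx.DiGraph([("Z", "X"), ("X", "Y1"), ("X", "Y2")])   # assumed orientation (see above)
print(d_separated(G, {"Z"}, {"Y1"}, {"X"}))    # True:  Z _||_ Y1 | X
print(d_separated(G, {"Y1"}, {"Y2"}, set()))   # False: Y1 and Y2 share the cause X
print(d_separated(G, {"Y1"}, {"Y2"}, {"X"}))   # True:  Y1 _||_ Y2 | X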

D-separation: Undirected Paths; Colliders vs. Non-Colliders

D-separation: Undirected Paths
Example graph over X, Y, V, W, U, Z1, Z2 (shown as a figure; edge orientations appear only in the figure).
Undirected path from X to Y: any sequence of edges beginning with X and ending at Y in which no edge repeats.
Paths from X to Y:
1) X - V - Y
2) X - Y
3) X - Z1 - W - Y
4) X - Z1 - W - U - Y
5) X - Z1 - Z2 - U - Y
6) X - Z1 - Z2 - U - W - Y

D-separation: Undirected Paths
Illegal path from X to Y (edges repeat): X - Z1 - Z2 - U - W - Z1 - Z2 - U - Y
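Listing a graph's undirected paths is just path enumeration on its skeleton. The sketch below rebuilds only the adjacencies implied by the six paths above (the arrowheads live in the slide's figure and are not reproduced here) and enumerates simple paths; a simple path never revisits a node, which is slightly stricter than the slide's "no edge repeats" rule but recovers exactly the same six paths and automatically excludes the illegal one.

import networkx as nx

# Adjacencies read off the six paths above; the orientations are only in the slide's figure.
skeleton = nx.Graph([
    ("X", "V"), ("V", "Y"), ("X", "Y"),
    ("X", "Z1"), ("Z1", "W"), ("W", "Y"),
    ("W", "U"), ("U", "Y"), ("Z1", "Z2"), ("Z2", "U"),
])

for path in sorted(nx.all_simple_paths(skeleton, "X", "Y"), key=len):
    print(" - ".join(path))       # prints the same six paths listed on the slides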

Colliders
Y is a collider on a path X - Y - Z when both edges point into Y: X → Y ← Z.
Y is a non-collider otherwise (a chain X → Y → Z or X ← Y ← Z, or a common cause X ← Y → Z).
A collider is shielded when X and Z are also adjacent, and unshielded when they are not.

Colliders: a variable is, or is not, a collider relative to a path
Same example graph. Variable: U. Paths from X to Y through U:
Paths on which U is a non-collider: X - Z1 - W - U - Y and X - Z1 - Z2 - U - W - Y
Path on which U is a collider: X - Z1 - Z2 - U - Y

Conditioning on colliders induces association; conditioning on non-colliders screens off association
Collider: Gas [y, n] → Car Starts [y, n] ← Battery [live, dead]. Gas _||_ Battery, but Gas and Battery become associated given Car Starts = no.
Non-collider: Exp [y, n] → Infection [y, n] → Symptoms [y, n]. Exp and Symptoms are associated, but Exp _||_ Symptoms | Infection.
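The car-starting example is easy to check numerically: simulate two independent causes and their common effect, then compare the marginal association of the causes with their association within the stratum where the effect is "no". A small sketch follows (binary variables; the 0.9 base rates are arbitrary choices for illustration). The dual fact, that conditioning on the non-collider Infection screens off Exp from Symptoms, can be checked the same way.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
gas = rng.random(n) < 0.9            # gas in the tank? (base rate is an arbitrary choice)
battery = rng.random(n) < 0.9        # battery live? (simulated independently of gas)
starts = gas & battery               # the car starts only if both hold: a collider

def corr(a, b):
    return np.corrcoef(a.astype(float), b.astype(float))[0, 1]

print("corr(Gas, Battery)              :", round(corr(gas, battery), 3))                       # ~0
no_start = ~starts
print("corr(Gas, Battery | Starts = no):", round(corr(gas[no_start], battery[no_start]), 3))   # clearly negative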

D-separation
X is d-separated from Y by Z in G iff every undirected path between X and Y in G is inactive relative to Z.
An undirected path is inactive relative to Z iff any node on the path is inactive relative to Z.
A node N (on a path) is inactive relative to Z iff
a) N is a non-collider that is in Z, or
b) N is a collider that is not in Z and has no descendant in Z.
Equivalently, a node N (on a path) is active relative to Z iff
a) N is a non-collider not in Z, or
b) N is a collider that is in Z or has a descendant in Z.

Worked example (graph over X, Y, V, W, Z1, Z2 shown as a figure). Undirected paths between X and Y:
1) X - Z1 - W - Y
2) X - V - Y

Is X d-separated from Y relative to Z = ∅?
Path X - Z1 - W - Y: Z1 active? No. W active? Yes. The path is inactive.
Path X - V - Y: V active? Yes. The path is active, so X is not d-separated from Y relative to ∅.

Is X d-separated from Y relative to Z = {W, Z2}?
Path X - Z1 - W - Y: Z1 active? Yes. W active? No. The path is inactive.
Path X - V - Y: V is a non-collider not in Z, so this path is still active, and X is not d-separated from Y relative to {W, Z2} either.
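The activity rules above translate directly into code: classify each interior node of a path as a collider or non-collider, apply the two clauses, and call the path active only if every interior node is active. The sketch below is generic and is demonstrated on a small hypothetical DAG (A → B → C plus A → D ← C) rather than the slide's figure, whose arrowheads the transcript does not reproduce.

import networkx as nx

def path_is_active(G, path, Z):
    # Apply the activity clauses to every interior node of an undirected path of DAG G.
    Z = set(Z)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = G.has_edge(prev, node) and G.has_edge(nxt, node)      # both edges point into node
        if collider:
            active = node in Z or bool(nx.descendants(G, node) & Z)      # clause (b)
        else:
            active = node not in Z                                       # clause (a)
        if not active:
            return False
    return True

# Hypothetical DAG: a chain A -> B -> C plus a collider A -> D <- C.
G = nx.DiGraph([("A", "B"), ("B", "C"), ("A", "D"), ("C", "D")])
print(path_is_active(G, ["A", "B", "C"], set()))   # True:  nothing is conditioned on the chain
print(path_is_active(G, ["A", "B", "C"], {"B"}))   # False: the non-collider B is in Z
print(path_is_active(G, ["A", "D", "C"], set()))   # False: the collider D is not in Z
print(path_is_active(G, ["A", "D", "C"], {"D"}))   # True:  conditioning on the collider D activates the path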

D-separation exercises (two small graphs over X, Y, Z1, Z2, shown as figures):
Is X d-separated from Z2 given ∅? Given {Z1}?
Is X d-separated from Y given ∅? Given {Z1}? No.

Statistical Control ≠ Experimental Control
D-separation + intervention: experimentally control for X2 (intervention variable I; graph shown as a figure).
Question: Does X1 directly cause X3? Truth: no, X2 mediates. How to find out?

Statistical Control ≠ Experimental Control
Statistically control for X2: Is X3 d-separated from X1 by {X2}? No! X3 is not independent of X1 given X2.
Experimentally control for X2: Is X3 d-separated from X1 by {X2 set}? Yes: X3 _||_ X1 | X2 (set).
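A simulation makes the contrast concrete. It assumes, as the slide's figure appears to, that X1 → X2 → X3 and that an unmeasured variable L is a common cause of X2 and X3; this structural assumption is my reconstruction, not stated in the transcript. Conditioning on X2 opens the X2 ← L → X3 path, so X1 and X3 stay associated given X2; randomizing X2 cuts X2 off from both X1 and L, and the conditional association disappears.

import numpy as np

rng = np.random.default_rng(1)
n = 100_000

def corr_x1_x3_given_x2(x1, x2, x3):
    # Residualize X1 and X3 on X2, then correlate the residuals (a simple partial correlation).
    r1 = x1 - np.polyval(np.polyfit(x2, x1, 1), x2)
    r3 = x3 - np.polyval(np.polyfit(x2, x3, 1), x2)
    return np.corrcoef(r1, r3)[0, 1]

L = rng.normal(size=n)                       # unmeasured common cause of X2 and X3 (my assumption)
x1 = rng.normal(size=n)

# Passive observation: X1 -> X2 -> X3, with L -> X2 and L -> X3.
x2 = 0.8 * x1 + 0.8 * L + rng.normal(size=n)
x3 = 0.8 * x2 + 0.8 * L + rng.normal(size=n)
print("statistical control :", round(corr_x1_x3_given_x2(x1, x2, x3), 3))           # clearly nonzero

# Experimental control: X2 is set by randomization, so its former causes are cut off.
x2_set = rng.normal(size=n)
x3_exp = 0.8 * x2_set + 0.8 * L + rng.normal(size=n)
print("experimental control:", round(corr_x1_x3_given_x2(x1, x2_set, x3_exp), 3))   # ~0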

Break

Equivalence Classes
Independence (d-separation) equivalence:
DAGs: Patterns
PAGs: Partial Ancestral Graphs
Also: intervention equivalence classes, measurement model equivalence classes, linear non-Gaussian model equivalence classes, etc.
Independence equivalence: M1 ╞ (X _||_ Y | Z) ⇔ M2 ╞ (X _||_ Y | Z)
Distribution equivalence: for every θ1 there is a θ2 such that M1(θ1) = M2(θ2), and vice versa.

d-separation / Independence Equivalence
D-separation Equivalence Theorem (Verma and Pearl, 1988): Two acyclic graphs over the same set of variables are d-separation equivalent iff they have the same adjacencies and the same unshielded colliders.

Colliders (recap): Y is a collider on X - Y - Z when both edges point into Y (X → Y ← Z); the collider is unshielded when X and Z are not adjacent, shielded when they are.

d-separation / Independence Equivalence (Verma and Pearl, 1988): same adjacencies, same unshielded colliders.
Exercises:
1) Create a 4-variable DAG.
2) Specify a 1-edge variant that is equivalent.
3) Specify a 1-edge variant that is not.
4) Show with the IM (instantiated model) and estimators that you have succeeded in steps 2 and 3.
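The theorem gives a mechanical equivalence test, which can also be used to check answers to the exercise: two DAGs over the same variables are d-separation equivalent exactly when they share a skeleton and their unshielded colliders coincide. A sketch follows, with three hypothetical 4-variable DAGs chosen so that one 1-edge variant is equivalent and one is not.

import networkx as nx

def skeleton(G):
    return {frozenset(e) for e in G.edges}

def unshielded_colliders(G):
    found = set()
    for b in G.nodes:
        parents = list(G.predecessors(b))
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                a, c = parents[i], parents[j]
                if not (G.has_edge(a, c) or G.has_edge(c, a)):   # a and c are not adjacent
                    found.add((frozenset({a, c}), b))            # unshielded collider a -> b <- c
    return found

def d_separation_equivalent(G1, G2):
    return skeleton(G1) == skeleton(G2) and unshielded_colliders(G1) == unshielded_colliders(G2)

G1 = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "D")])   # A -> B -> C -> D
G2 = nx.DiGraph([("B", "A"), ("B", "C"), ("C", "D")])   # reverse A-B: no unshielded collider appears
G3 = nx.DiGraph([("A", "B"), ("C", "B"), ("C", "D")])   # reverse B-C: creates the collider A -> B <- C
print(d_separation_equivalent(G1, G2))   # True
print(d_separation_equivalent(G1, G3))   # False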

Independence Equivalence Classes: Patterns & PAGs
Patterns (Verma and Pearl, 1990): a graphical representation of a d-separation equivalence class (among models with no latent common causes).
PAGs (Richardson, 1994): a graphical representation of a d-separation equivalence class that includes models with latent common causes and sample selection bias, all d-separation equivalent over a set of measured variables X.

Patterns (example patterns shown as figures)

Patterns: What the Edges Mean
X → Y: the edge is oriented X → Y in every DAG in the equivalence class.
X - Y: the edge is oriented X → Y in some DAGs in the class and X ← Y in others.

Patterns exercise: specify all the causal graphs represented by the Pattern (the patterns are shown as figures).

Tetrad Demo: Generating Patterns

Break

Causal Search Spaces are Large

Causal Search as a Method (schematic shown as a figure)
Experimental setup over V = {Obs, Manip}, with P(Manip) and P_Manip(V) → Data → Statistical Inference + Discovery Algorithm → Causal Knowledge (e.g., a Markov equivalence class of causal graphs)
General assumptions: Markov, Faithfulness, linearity, Gaussianity, acyclicity
Background knowledge: e.g., constraints on the Salary-Gender and Infection-Symptoms relationships

For example:
Passive observation → Data → Statistical Inference
General assumptions: Markov, Faithfulness, no latents, no cycles
Background knowledge: X2 prior in time to X3

Faithfulness
Constraints on a probability distribution P generated by a causal structure G hold for all parameterizations of G.
Example (Tax Rate → Economy → Tax Revenues, and Tax Rate → Tax Revenues directly):
Revenues := β1·Rate + β2·Economy + ε_Rev
Economy := β3·Rate + ε_Econ
Faithfulness requires: β1 ≠ -β3·β2 and β2 ≠ -β3·β1
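The first inequality is a no-path-cancellation condition: with standardized variables the total Rate-Revenues association is roughly β1 + β3·β2, so setting β1 = -β3·β2 makes Rate and Revenues uncorrelated even though Rate causes Revenues along two routes. Such parameterizations form a measure-zero surface in parameter space, which is what faithfulness rules out. A sketch follows; the coefficient values are mine, picked only to force the cancellation.

import numpy as np

rng = np.random.default_rng(2)
n = 500_000
b3 = 0.5            # Rate -> Economy
b2 = 0.8            # Economy -> Revenues
b1 = -b3 * b2       # Rate -> Revenues, tuned so the direct path cancels the indirect one (unfaithful)

rate = rng.normal(size=n)
economy = b3 * rate + rng.normal(size=n)
revenues = b1 * rate + b2 * economy + rng.normal(size=n)

print("corr(Rate, Economy)  :", round(np.corrcoef(rate, economy)[0, 1], 3))    # clearly nonzero
print("corr(Rate, Revenues) :", round(np.corrcoef(rate, revenues)[0, 1], 3))   # ~0 despite two causal paths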

Faithfulness
Constraints on a probability distribution P generated by a causal structure G hold for all parameterizations of G: all and only the constraints that hold in P(V) are entailed by the causal structure G(V), rather than by lower-dimensional surfaces in the parameter space.
Causal Markov Axiom: X and Y causally disconnected ⇒ X _||_ Y
Faithfulness: X _||_ Y ⇒ X and Y causally disconnected

Challenges to Faithfulness
Gene regulation (Gene A, Gene B, Protein 24; graph shown as a figure): by evolutionary design, Gene A _||_ Protein 24.
Homeostasis (Air Temp, Homeostatic Regulator, Core Body Temp): by evolutionary design, Air Temp _||_ Core Body Temp.
Also: sampling rate vs. equilibration rate.

Search Methods
Constraint-based searches (PC, FCI): pointwise, but not uniformly, consistent.
Scoring searches: scores such as BIC, AIC, etc.; search by hill climbing, genetic algorithms, simulated annealing; difficult to extend to latent variable models.
Greedy Equivalence Search (GES; Meek, Chickering): pointwise, but not uniformly, consistent.
Latent variable psychometric model search: BPC, MIMBuild, etc.
Linear non-Gaussian models (LiNGAM).
Models with cycles.
And more!
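For a flavor of the constraint-based family, here is a stripped-down sketch of the adjacency phase of PC only: start from a complete undirected graph and delete an edge as soon as some small conditioning set drawn from the current neighbors renders the endpoints independent (Fisher z test, linear-Gaussian assumption). The collider-orientation and Meek rules that complete PC, and PC's exact neighbor-set bookkeeping, are omitted; the toy data-generating graph is my own choice.

import itertools
import numpy as np
from scipy import stats

def independent(data, i, j, cond, alpha=0.01):
    # Fisher z test of X_i _||_ X_j | X_cond; True means independence is not rejected.
    sub = data[:, [i, j] + list(cond)]
    prec = np.linalg.inv(np.corrcoef(sub, rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(data.shape[0] - len(cond) - 3)
    return 2 * (1 - stats.norm.cdf(abs(z))) > alpha

def pc_skeleton(data, max_depth=2):
    p = data.shape[1]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    for depth in range(max_depth + 1):
        for i in range(p):
            for j in list(adj[i]):
                if j < i:
                    continue
                others = adj[i] - {j}
                for cond in itertools.combinations(sorted(others), depth):
                    if independent(data, i, j, cond):
                        adj[i].discard(j)          # some conditioning set separates i and j:
                        adj[j].discard(i)          # remove the edge and move on
                        break
    return {(i, j) for i in adj for j in adj[i] if i < j}

# Toy data from the assumed graph 0 -> 1 -> 2 <- 3.
rng = np.random.default_rng(3)
n = 20_000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x3 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.8 * x3 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2, x3])
print(pc_skeleton(data))      # expected skeleton: {(0, 1), (1, 2), (2, 3)}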

Score-Based Search
Model + background knowledge (e.g., X2 prior in time to X3) → model score. Model scores: AIC, BIC, etc.
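Score-based search needs a score it can compute for any candidate DAG. For linear-Gaussian models a common choice is BIC: regress each variable on its parents, add up the Gaussian log-likelihoods, and penalize each parameter by (log n)/2. The sketch below uses that convention on toy data from an assumed chain; it is illustrative, not Tetrad's exact scoring.

import numpy as np

def bic_score(data, dag):
    # BIC of a linear-Gaussian DAG given as {child: [parents]} over column indices.
    n = data.shape[0]
    total = 0.0
    for child, parents in dag.items():
        y = data[:, child]
        X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        sigma2 = np.var(y - X @ beta)
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)    # Gaussian log-likelihood at the MLE
        k = len(parents) + 2                                    # slopes + intercept + residual variance
        total += loglik - 0.5 * k * np.log(n)
    return total

# Toy data from an assumed chain 0 -> 1 -> 2; score the true chain against a wrong structure.
rng = np.random.default_rng(4)
n = 5_000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2])

chain = {0: [], 1: [0], 2: [1]}        # 0 -> 1 -> 2
collider = {0: [], 1: [0, 2], 2: []}   # 0 -> 1 <- 2, which wrongly implies 0 _||_ 2
print("BIC, chain   :", round(bic_score(data, chain), 1))
print("BIC, collider:", round(bic_score(data, collider), 1))    # lower (worse): 0 and 2 are in fact dependent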