Outline: 1) Motivation 2) Representing/Modeling Causal Systems 3) Estimation and Updating 4) Model Search 5) Linear Latent Variable Models 6) Case Study: fMRI

Outline for Search I: Causal Bayes Nets. 1) Bridge Principles: Causal Structure → Testable Statistical Constraints 2) Equivalence Classes 3) Pattern Search 4) PAG Search 5) Variants 6) Simulation Studies on the Tetrad workbench

Bridge Principles: Acyclic Causal Graph over V → Constraints on P(V). Weak Causal Markov Assumption: V1, V2 causally disconnected ⇒ V1 _||_ V2, where V1 _||_ V2 means that for all v1, v2, P(V1 = v1 | V2 = v2) = P(V1 = v1).

Bridge Principles: Acyclic Causal Graph over V → Constraints on P(V). Weak Causal Markov Assumption: V1, V2 causally disconnected ⇒ V1 _||_ V2. Causal Markov Axiom: if G is a causal graph and P a probability distribution over the variables in G, then <G, P> satisfies the Markov Axiom iff every variable V is independent of its non-effects, conditional on its immediate causes. Determinism (Structural Equations).

Causal Markov Axiom + Acyclicity ⇒ d-separation criterion ⇒ Independence Oracle. Causal Graph: X adjacent to each of Z, Y1, and Y2, with no collider at X (edge directions did not survive transcription). Independence Oracle: Z _||_ Y1 | X; Z _||_ Y2 | X; Z _||_ Y1 | X, Y2; Z _||_ Y2 | X, Y1; Y1 _||_ Y2 | X; Y1 _||_ Y2 | X, Z.

Faithfulness: constraints on a probability distribution P generated by a causal structure G hold for all parameterizations of G. Example (graph: Tax Rate → Economy, Economy → Tax Revenues, Tax Rate → Tax Revenues, with coefficients β3, β2, β1 respectively): Revenues := β1·Rate + β2·Economy + εRev; Economy := β3·Rate + εEcon. Faithfulness requires β1 ≠ −β3·β2 and β2 ≠ −β3·β1.
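
To make the cancellation concrete, here is a quick numeric illustration (a sketch, not from the slides; the coefficient values are arbitrary): setting β1 exactly to −β3·β2 makes the direct and indirect paths from Rate to Revenues cancel, so Rate and Revenues end up uncorrelated despite the direct edge. Faithfulness rules such parameterizations out.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta3, beta2 = 0.8, 1.5            # Rate -> Economy, Economy -> Revenues
beta1 = -beta3 * beta2             # Rate -> Revenues chosen to cancel exactly

rate = rng.normal(size=n)
economy = beta3 * rate + rng.normal(size=n)
revenues = beta1 * rate + beta2 * economy + rng.normal(size=n)

# Near 0: Rate _||_ Revenues even though Rate -> Revenues is in the graph,
# so this (measure-zero) parameterization violates Faithfulness.
print(np.corrcoef(rate, revenues)[0, 1])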

Equivalence Classes: Independence (d-separation) equivalence (DAGs: Patterns; PAGs: Partial Ancestral Graphs), Intervention Equivalence Classes, Measurement Model Equivalence Classes, Linear Non-Gaussian Model Equivalence Classes, etc. Independence Equivalence: M1 ⊨ (X _||_ Y | Z) ⇔ M2 ⊨ (X _||_ Y | Z). Distribution Equivalence: ∀θ1 ∃θ2 such that M1(θ1) = M2(θ2), and vice versa.

D-separation / Independence Equivalence. Theorem (Verma and Pearl, 1988): two acyclic graphs over the same set of variables are d-separation equivalent iff they have the same adjacencies and the same unshielded colliders.

Colliders: Y is a collider on a path when both edges on the path point into Y (X → Y ← Z); the collider is shielded if X and Z are adjacent, and unshielded otherwise. Y is a non-collider in the remaining configurations (X → Y → Z, X ← Y ← Z, X ← Y → Z).

D-separation: X is d-separated from Y by Z in G iff every undirected path between X and Y in G is inactive relative to Z. An undirected path is inactive relative to Z iff some node on the path is inactive relative to Z. A node N is inactive relative to Z iff (a) N is a non-collider that is in Z, or (b) N is a collider that is not in Z and has no descendant in Z. Example graph over X, Y, Z1, Z2, V, W with two undirected paths between X and Y (the figure and the path listing did not survive transcription).

D-separation, continued: for the example graph on the previous slide, is X d-separated from Y relative to Z = {V}? No. Relative to Z = {V, Z1}? Yes. Relative to Z = {W, Z2}? No. Relative to Z = ∅? Yes.
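
The definition above translates almost line for line into code. The following is a minimal sketch (not Tetrad's implementation; the helper names are illustrative) that enumerates undirected paths and applies the activity rules, run on a DAG consistent with the independence oracle from the earlier slide (Z → X → Y1, X → Y2; the edge directions are an illustrative choice).

def parents(g, v):
    return {a for a, b in g if b == v}

def children(g, v):
    return {b for a, b in g if a == v}

def descendants(g, v):
    out, stack = set(), [v]
    while stack:
        for c in children(g, stack.pop()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def undirected_paths(g, x, y, path=None):
    path = path or [x]
    if path[-1] == y:
        yield path
        return
    for nbr in parents(g, path[-1]) | children(g, path[-1]):
        if nbr not in path:
            yield from undirected_paths(g, x, y, path + [nbr])

def path_active(g, path, z):
    # A path is active iff every intermediate node on it is active relative to z.
    for i in range(1, len(path) - 1):
        a, n, b = path[i - 1], path[i], path[i + 1]
        collider = n in children(g, a) and n in children(g, b)
        if collider and n not in z and not (descendants(g, n) & z):
            return False           # collider not in z and no descendant in z
        if not collider and n in z:
            return False           # non-collider that is in z
    return True

def d_separated(g, x, y, z):
    return not any(path_active(g, p, set(z)) for p in undirected_paths(g, x, y))

g = {("Z", "X"), ("X", "Y1"), ("X", "Y2")}     # directed edges (tail, head)
print(d_separated(g, "Z", "Y1", {"X"}))        # True:  Z _||_ Y1 | X
print(d_separated(g, "Y1", "Y2", set()))       # False: active path Y1 <- X -> Y2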

D-separation: X3 and X1 d-separated by X2? Yes: X3 _||_ X1 | X2. In a second graph: X3 and X1 d-separated by X2? No: X3 is not independent of X1 given X2.

Statistical Control ≠ Experimental Control: X3 _||_ X1 | X2 (statistically conditioning on X2) is not the same claim as X3 _||_ X1 | X2(set) (experimentally setting X2); the two need not agree.

Independence Equivalence Classes: Patterns & PAGs. Patterns (Verma and Pearl, 1990): graphical representation of d-separation equivalence among models with no latent common causes (i.e., causally sufficient models). PAGs (Richardson, 1994): graphical representation of a d-separation equivalence class that includes models with latent common causes and sample selection bias, all Markov equivalent over a set of measured variables X.

Patterns (figure not transcribed).

Patterns: What the Edges Mean (edge legend not transcribed).

Patterns (figure not transcribed).

Tetrad Demo: 1) Load session patterns1.tet. 2) Change Graph3 minimally so as to maximally reduce the number of equivalent DAGs. 3) Compute the DAGs that are equivalent to your original 3-variable DAG.

Constraint-Based Search: combines background knowledge (e.g., X2 is prior in time to X3) with statistical inference.

Score-Based Search: combines background knowledge (e.g., X2 is prior in time to X3) with a model score.

Overview of Search Methods. Constraint-based searches (TETRAD: PC, FCI): very fast, capable of handling 1,000 variables; pointwise but not uniformly consistent. Score-based searches: scores such as BIC, AIC, etc.; search by hill climbing, genetic algorithms, or simulated annealing; difficult to extend to latent variable models. Meek and Chickering's Greedy Equivalence Search (GES): very slow (max N ~ [value missing from transcript]); pointwise but not uniformly consistent.

Tetrad Demo: 1) Open a new session. 2) Template: Search from Simulated Data. 3) Create a graph, parameterize it, instantiate it, and generate data with N = 50. 4) Choose the PC search and execute it. 5) Attach a new search node, choose GES, and execute it. 6) Play with the sample size, parameters, alpha value, etc.
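
For readers following along outside Tetrad, step 3 (parameterize a linear SEM over a small DAG and generate N = 50 rows) has a simple numpy analogue; the chain graph X1 → X2 → X3, the coefficient range, and the seed below are illustrative choices, not what the Tetrad template produces.

import numpy as np

rng = np.random.default_rng(42)
N = 50
b12, b23 = rng.uniform(0.5, 1.5, size=2)    # random linear edge coefficients

x1 = rng.normal(size=N)                     # exogenous variable
x2 = b12 * x1 + rng.normal(size=N)
x3 = b23 * x2 + rng.normal(size=N)
data = np.column_stack([x1, x2, x3])        # hand this to a search such as PC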

Tetrad Demo: 1) Open a new session. 2) Load Charity.txt. 3) Create knowledge: (a) Tangibility is exogenous; (b) AmountDonate is last; (c) Tangibility is a direct cause of Imaginability. 4) Perform the search. 5) Estimate the output.

PAGs: Partial Ancestral Graphs (figure not transcribed).

PAGs: Partial Ancestral Graphs (figure not transcribed).

PAGs: Partial Ancestral Graphs: what PAG edges mean (edge legend not transcribed).

Constraint-based Search: 1) Adjacency, 2) Orientation.

Constraint-based Search, Adjacency: 1) X and Y are adjacent if they are dependent conditional on all subsets that don't include them. 2) X and Y are not adjacent if they are independent conditional on some subset that doesn't include them.
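
As a rough sketch of this adjacency phase (assuming a hypothetical independence-test callable indep(x, y, cond); Tetrad's implementation adds many refinements, e.g., it also conditions on subsets of Y's neighbors):

from itertools import combinations

def neighbors(adj, x):
    return {v for e in adj if x in e for v in e} - {x}

def pc_adjacency(variables, indep):
    # Start fully connected; remove X--Y as soon as some conditioning set drawn
    # from X's other neighbors renders X and Y independent.
    adj = {frozenset(p) for p in combinations(variables, 2)}
    depth = 0
    while any(len(neighbors(adj, v)) - 1 >= depth for v in variables):
        for edge in list(adj):
            x, y = sorted(edge)
            for cond in combinations(sorted(neighbors(adj, x) - {y}), depth):
                if indep(x, y, set(cond)):
                    adj.discard(edge)          # independent given some subset
                    break
        depth += 1
    return adj

# Example with a perfect independence oracle for the chain X1 -> X2 -> X3,
# whose only independence is X1 _||_ X3 | X2:
oracle = lambda x, y, cond: {x, y} == {"X1", "X3"} and "X2" in cond
print(pc_adjacency(["X1", "X2", "X3"], oracle))   # keeps X1--X2 and X2--X3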

Search: Orientation (Patterns). For an unshielded triple X - Y - Z, test X _||_ Z | Y: if X and Z are dependent given Y, orient the triple as a collider, X → Y ← Z; if X _||_ Z | Y, Y is a non-collider and the triple is left unoriented in the pattern (the underlying DAG may be X → Y → Z, X ← Y ← Z, or X ← Y → Z).
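
A compact sketch of the unshielded-collider rule just stated (pattern-style output; adj and sepset are assumed to come from an adjacency phase such as the one sketched above, and all names are illustrative):

def orient_colliders(adj, sepset, variables):
    # Orient X -> Y <- Z whenever X - Y - Z is unshielded (X, Z not adjacent)
    # and Y is not in the separating set recorded for X and Z.
    arrows = set()                                    # directed marks (tail, head)
    for y in variables:
        nbrs = sorted({v for e in adj if y in e for v in e} - {y})
        for i, x in enumerate(nbrs):
            for z in nbrs[i + 1:]:
                unshielded = frozenset((x, z)) not in adj
                if unshielded and y not in sepset.get(frozenset((x, z)), set()):
                    arrows.add((x, y))
                    arrows.add((z, y))
    return arrows

# Example: skeleton X1 - X3 - X2 with sepset(X1, X2) = {} (empty), so X3 is not
# in the separating set and gets oriented as a collider.
adj = {frozenset(("X1", "X3")), frozenset(("X2", "X3"))}
print(orient_colliders(adj, {frozenset(("X1", "X2")): set()}, ["X1", "X2", "X3"]))
# {('X1', 'X3'), ('X2', 'X3')}, i.e. X1 -> X3 <- X2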

Search: Orientation (PAGs). For an unshielded triple X - Y - Z, test X _||_ Z | Y: if X and Z are dependent given Y, orient arrowheads into Y (a collider); if X _||_ Z | Y, record Y as a definite non-collider.

Search: Orientation Away from Collider (figure not transcribed): if X → Y, Y - Z, and X and Z are not adjacent, orient Y → Z, since otherwise Y would be a new unshielded collider that the collider step would already have found.

Search: Orientation, example. Given X1 _||_ X2, X1 _||_ X4 | X3, and X2 _||_ X4 | X3, the adjacency phase keeps X1 - X3, X2 - X3, and X3 - X4; after the orientation phase, the unshielded triple X1 - X3 - X2 becomes the collider X1 → X3 ← X2, and X3 - X4 is oriented away from the collider as X3 → X4.

Interesting Cases: three example models, M1, M2, and M3, involving latent common causes over measured variables such as X, Y, Z1, Z2 (graphs not transcribed; used in the next demo).

Tetrad Demo: 1) Open a new session. 2) Create the graphs for M1, M2, and M3 from the previous slide. 3) Run PC and FCI on each graph and compare the results.

Tetrad Demo: 1) Open a new session. 2) Load data: regression_data. 3) X is the putative cause, Y the putative effect, and Z1, Z2 are prior to both (potential confounders). 4) Use regression to estimate the effect of X on Y. 5) Apply the FCI search to the data.

Variants: 1) CPC, CFCI; 2) LiNGAM.

LiNGAM: Most of the algorithms included in Tetrad (other than KPC) assume causal graphs are to be inferred from conditional independence tests, usually tests that assume linearity and Gaussianity. LiNGAM uses a different approach: it assumes linearity and non-Gaussianity, runs Independent Components Analysis (ICA) to estimate the coefficient matrix, rearranges the coefficient matrix to get a causal order, and prunes weak coefficients by setting them to zero.

ICA: Although complicated, the basic idea is very simple. Write a11·X1 + … + a1n·Xn = e1, …, an1·X1 + … + ann·Xn = en, and assume e1, …, en are i.i.d. Try to maximize the non-Gaussianity of w1·X1 + … + wn·Xn: there are n ways to do it, up to symmetry (cf. the Central Limit Theorem; Hyvärinen et al., 2002). You can use the coefficients for e1, or for e2, or for any other ei; all other linear combinations of e1, …, en are more Gaussian.

ICA: This equation is usually written Wx = e (the vector of independent components is also often denoted s). We also have x = Bx + e, where B is the coefficient matrix, so W = I − B and Wx = (I − B)x = e. Here e is the vector of independent components (the error terms) and x is the vector of variables. We just showed that under strong conditions we can estimate W, so we can estimate B (though with unknown row order), using assumptions of linearity and non-Gaussianity (of all but one variable) alone. More sophisticated analyses allow the errors to be non-i.i.d.

LiNGAM: LiNGAM runs ICA to estimate the coefficient matrix B. The order of the errors is not fixed by ICA, so some rearranging of the B matrix needs to be done: rows of the B matrix are swapped so that it is lower triangular. Then b[i][j] should be non-zero (representing an edge from Xj to Xi) just in case Xj is a direct cause of Xi; typically, a cutoff is used to determine whether a matrix element is zero. The rearranged matrix corresponds to the idea of a causal order.

LiNGAM: Once you know which nodes are adjacent in the graph and what the causal order is, you can infer a complete DAG. Review: use data from a linear, non-Gaussian model (all but one variable non-Gaussian) and infer a complete DAG (more than a pattern!).
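
Here is a rough end-to-end sketch of the ICA-LiNGAM recipe from the last few slides, using scikit-learn's FastICA; the 0.1 pruning threshold, the greedy ordering step, and the two-variable demo are illustrative simplifications, not Tetrad's actual implementation.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.decomposition import FastICA

def lingam(X, threshold=0.1):
    n_vars = X.shape[1]
    ica = FastICA(n_components=n_vars, max_iter=2000, random_state=0)
    ica.fit(X)
    W = ica.components_                      # estimate of W = I - B, rows permuted/scaled

    # Rearrange: permute rows so the diagonal is as large as possible
    # (ICA recovers rows only up to order and scale).
    row_ind, col_ind = linear_sum_assignment(-np.abs(W))
    W_perm = np.zeros_like(W)
    W_perm[col_ind] = W[row_ind]

    # Rescale so the diagonal is 1, then read off B from W = I - B.
    W_perm = W_perm / np.diag(W_perm)[:, None]
    B = np.eye(n_vars) - W_perm

    # Prune weak coefficients by setting them to zero.
    B[np.abs(B) < threshold] = 0.0
    return B

def causal_order(B):
    # Greedily peel off variables whose remaining row is all zero (no remaining parents).
    order, remaining = [], list(range(B.shape[0]))
    while remaining:
        root = next((i for i in remaining if np.allclose(B[i, remaining], 0)), None)
        if root is None:
            break                            # estimation noise: no clean order found
        order.append(root)
        remaining.remove(root)
    return order

# Toy demo: X1 -> X2 with uniform (hence non-Gaussian) errors.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 5000)
x2 = 1.5 * x1 + rng.uniform(-1, 1, 5000)
B_hat = lingam(np.column_stack([x1, x2]))
print(np.round(B_hat, 2))                    # expect roughly [[0, 0], [1.5, 0]]
print(causal_order(B_hat))                   # expect [0, 1]: X1 before X2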

Hands On: 1) Attach a Generalized SEM IM. 2) Attach a data set and simulate 1000 points. 3) Attach a search box and run LiNGAM. 4) Attach another search box to the data and run PC. 5) Compare PC to LiNGAM.

Special Variants of Algorithms. PC Pattern: enforces the requirement that the output of the algorithm will be a pattern. PCD: adds corrective code to PC for the case where some variables stand in deterministic relationships; this results in fewer edges being removed from the graph. For example, if X _||_ Y | Z but Z determines Y, the edge X---Y is not taken out.

Special Variants of Algorithms. CPC: the PC algorithm may jump too quickly to the conclusion that a triple should be oriented as a collider (X -> Y <- Z) or as a non-collider (X --- Y --- Z). The CPC algorithm uses a much more conservative test for colliders and non-colliders, double- and triple-checking the decision against different adjacents of X and of Z. The result is a graph with fewer, but more accurate, orientations.

Hands On: 1) Simulate data from a "complicated" DAG using a SEM IM: (a) choose the Search from Simulated Data item from the Templates menu; (b) make a random 20-node, 20-edge DAG; (c) parameterize it as a linear SEM, accepting the defaults. 2) Run CPC. 3) Attach another search box to the data. 4) Run PC. 5) Lay out the PC graph using Fruchterman-Reingold. 6) Copy the layout to the CPC graph. 7) Open PC and CPC simultaneously and note the differences.

Special Variants of Algorithms. CFCI: the same idea as CPC, but for FCI. KPC: the PC algorithm typically uses independence tests that assume linearity; KPC makes two changes: (a) it uses a non-parametric independence test, and (b) it adds steps to orient edges that are left unoriented in the PC pattern.

Special Variants of Algorithms. PcLiNGAM: applies when some variables (more than one) are Gaussian and others are non-Gaussian; it runs PC, then orients the remaining unoriented edges (where possible) using non-Gaussianity. LiNG: extends LiNGAM to orient cycles using non-Gaussianity.

Special Variants of Algorithms. JCPC: uses a Markov-blanket-style test to add or remove individual edges, with CPC-style orientation; this allows individual adjacencies in the graph to be revised from the initial estimate produced by the PC adjacency search.

Simulation Studies with Tetrad