Bayes Net Perspectives on Causation and Causal Inference

Presentation on theme: "Bayes Net Perspectives on Causation and Causal Inference"— Presentation transcript:

1 Bayes Net Perspectives on Causation and Causal Inference
Peter Spirtes
Notes: Thank organizers.

2 Example Problems
Genetic regulatory networks
Yeast: ~5000 genes, ~2,500,000 potential edges
[Figure: a gene regulatory network in mouse embryonic stem cells]
Notes: point out Rcor2 and Oct4; change yeast to mouse. Other example domains: fMRI in the brain, climate science, social networks.

3 Causal Models → Predictions
Probabilistic: Among the cells that have active Oct4, what percentage have active Rcor2?
Causal: If I experimentally set a cell to have active Oct4, what percentage will have active Rcor2?
Notes: 3 levels of prediction.

4 Causal Models → Predictions
Counterfactual – Among the cells that did not have active Oct4 at t-1, what percentage would have active Rcor2 if I had experimentally set a cell to have active Oct4 at t-1?

5 Data → Causal Models
Large number of variables
Small observed sample size
Overlapping variables
Small number of experiments
Feedback
Hidden common causes
Selection bias
Many kinds of entities causally interacting

6 Outline
Bayesian Networks
Search
Limitations and Extensions of Bayesian Networks
  Dynamic
  Relational
  Cycles
  Counterfactual
Notes: what Bayesian networks are for; search problems; limitations of standard Bayesian networks, what needs work, and recent research.

7 Directed Acyclic Graph (DAG)
Outline bar (repeated on each slide): Bayesian Networks, Search, Limitations and Extensions, Dynamic, Relational, Cycles, Counterfactual
[Figure: DAG over SES, SEX, PE, IQ, CP]
SES: Socioeconomic Status
SEX: Sex
PE: Parental Encouragement
CP: College Plans
IQ: Intelligence Quotient
The vertices are random variables. All edges are directed. There are no directed cycles.
Notes: Point to the DAG terminology. State first that the vertices are random variables (random variables in italics, sets of random variables in bold), then give the list of random variables. Data: Sewell and Shah, 1968, high school seniors.

8 Population
[Figure: three copies of the DAG over SES, SEX, PE, IQ, CP, one per unit in the population]
Independent, identically distributed
Notes: connect to set of probability distributions and causal relations.

9 P Factoring According to G
[Figure: DAG G over SES, SEX, PE, IQ, CP]
If
P(SES,SEX,PE,IQ,CP) = P(SEX) P(SES) P(IQ|SES) P(PE|SES,SEX,IQ) P(CP|PE,SES,IQ)
then P factors according to G.
G represents all of the distributions that factor according to G.
Notes: Say what the parameters are. We can't measure the causal relations directly. We could do experiments, but when we can't, we give the relationship between the causal and probabilistic interpretations so that we can infer as much as possible about the DAG from samples from the probability distribution.
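As an illustration of what "P factors according to G" means computationally, the following minimal sketch builds a joint distribution as a product of one conditional factor per vertex. It assumes binary variables and uses randomly generated, purely hypothetical parameters (`make_cpts` and `joint` are illustrative names, not part of the talk); the parent sets are the ones that appear in the factorization above.

```python
import itertools
import random

# Parent sets read off the factorization on the slide.
PARENTS = {
    "SEX": [],
    "SES": [],
    "IQ": ["SES"],
    "PE": ["SES", "SEX", "IQ"],
    "CP": ["PE", "SES", "IQ"],
}
VARS = list(PARENTS)


def make_cpts(seed=0):
    """Hypothetical parameters: P(v = 1 | parent values) for binary variables."""
    rng = random.Random(seed)
    return {
        v: {pa: rng.uniform(0.1, 0.9)
            for pa in itertools.product([0, 1], repeat=len(PARENTS[v]))}
        for v in VARS
    }


def joint(assignment, cpts):
    """P(assignment) as the product of one conditional factor per vertex."""
    p = 1.0
    for v in VARS:
        p_one = cpts[v][tuple(assignment[u] for u in PARENTS[v])]
        p *= p_one if assignment[v] == 1 else 1.0 - p_one
    return p


cpts = make_cpts()
# Any choice of parameters yields a proper distribution that factors according to G.
total = sum(joint(dict(zip(VARS, vals)), cpts)
            for vals in itertools.product([0, 1], repeat=len(VARS)))
print(round(total, 12))
```

Changing the seed changes the distribution but not the graph, which is the sense in which G represents a whole set of distributions.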

10 Conditional Independence
X is independent of Y conditional on Z (denoted IP(X,Y|Z)) iff P(X|Y,Z) = P(X|Z).
Example: IP(CP,SEX|{SES,IQ,PE}) iff P(CP|{SES,IQ,PE,SEX}) = P(CP|{SES,IQ,PE}).
Notes: notation.
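The definition can be checked numerically on a small joint table. The sketch below is only an illustration under assumptions: binary variables, hypothetical random parameters, and helper names (`build_joint`, `marginal`, `cond_indep`) invented here. It tests the equivalent product form P(x,y,z) P(z) = P(x,z) P(y,z), which avoids dividing by zero.

```python
import itertools
import random

VARS = ["SEX", "SES", "IQ", "PE", "CP"]
PARENTS = {"SEX": [], "SES": [], "IQ": ["SES"],
           "PE": ["SES", "SEX", "IQ"], "CP": ["PE", "SES", "IQ"]}


def build_joint(seed=1):
    """Joint table over binary variables from hypothetical random parameters."""
    rng = random.Random(seed)
    cpt = {v: {pa: rng.uniform(0.1, 0.9)
               for pa in itertools.product([0, 1], repeat=len(PARENTS[v]))}
           for v in VARS}
    table = {}
    for vals in itertools.product([0, 1], repeat=len(VARS)):
        a = dict(zip(VARS, vals))
        p = 1.0
        for v in VARS:
            p_one = cpt[v][tuple(a[u] for u in PARENTS[v])]
            p *= p_one if a[v] else 1.0 - p_one
        table[vals] = p
    return table


def marginal(table, names):
    """Sum the joint table down to the variables in `names`."""
    idx = [VARS.index(n) for n in names]
    out = {}
    for vals, p in table.items():
        key = tuple(vals[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_indep(table, x, y, z, tol=1e-9):
    """True iff P(x,y,z) P(z) = P(x,z) P(y,z) for all values (z is a list)."""
    pxyz = marginal(table, [x, y] + z)
    pxz = marginal(table, [x] + z)
    pyz = marginal(table, [y] + z)
    pz = marginal(table, z)
    for key, p in pxyz.items():
        xv, yv, zv = key[0], key[1], key[2:]
        if abs(p * pz[zv] - pxz[(xv,) + zv] * pyz[(yv,) + zv]) > tol:
            return False
    return True


table = build_joint()
print(cond_indep(table, "CP", "SEX", ["SES", "IQ", "PE"]))
print(cond_indep(table, "SES", "SEX", ["PE"]))
```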

11 Graphical Entailment
If IP(X,Y|Z) holds for every P that factors according to G, then G entails I(X,Y|Z).
Examples: G entails I(IQ,SEX|∅) and I(IQ,SEX|SES).
Entailments can be read off the graph through d-separation.
[Figure: DAG over SES, SEX, PE, IQ, CP]
Notes: The first is local Markov. The second is entailed by local Markov. Can read off of graph.

12 D-separation and D-connection
X is d-separated from Y conditional on Z in G iff G entails that X is independent of Y conditional on Z.
D-separation between X and Y conditional on Z holds when certain kinds of paths do not exist between X and Y.
D-connection (the negation of d-separation) between X and Y conditional on Z holds when certain kinds of paths do exist between X and Y.
[Figure: DAG over SES, SEX, PE, IQ, CP]
Examples: SES and SEX are d-separated conditional on the empty set, because when conditioning on the empty set there is no path between them without a collider. SES and SEX are d-connected conditional on PE, because there is a path whose collider is in the conditioning set.
Notes: won't give the full definition here because it is too complicated; trust me.

13 Definition of D-connection
A node X is active on a path U conditional on Z iff
  X is a collider (→ X ←) and either X is in Z or there is a directed path from X to a member of Z; or
  X is not a collider and X is not in Z.
[Figure: DAG over SES, SEX, PE, IQ, CP]
Example: SES → IQ → PE ← SEX is a path U. PE is active on U conditional on {CP, IQ}. IQ is inactive on U conditional on {CP, IQ}.

14 Definition of D-connection
A path U is active conditional on Z iff every vertex on U is active relative to Z.
X is d-connected to Y conditional on Z iff there is an active path between X and Y conditional on Z.
[Figure: DAG over SES, SEX, PE, IQ, CP]
Examples: SES → IQ → PE ← SEX is inactive conditional on {CP, IQ}. SES is d-connected to SEX conditional on {CP, IQ} because SES → PE ← SEX is active conditional on {CP, IQ}.
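The definition of d-connection above translates fairly directly into code. The following is a brute-force sketch, not an efficient algorithm: it enumerates every simple path and checks each interior vertex, assuming X and Y are not in Z. The edge list is an assumption read off the factorization given earlier, and all function names are illustrative.

```python
# Edges of the example DAG, assumed from the factorization of P.
EDGES = [("SES", "IQ"), ("SES", "PE"), ("SEX", "PE"), ("IQ", "PE"),
         ("PE", "CP"), ("SES", "CP"), ("IQ", "CP")]


def descendants(v):
    """All vertices reachable from v by a directed path."""
    seen, stack = set(), [v]
    while stack:
        cur = stack.pop()
        for a, b in EDGES:
            if a == cur and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def simple_paths(x, y, path=None):
    """Yield every path from x to y that ignores edge direction and repeats no vertex."""
    path = path or [x]
    if path[-1] == y:
        yield path
        return
    for a, b in EDGES:
        for u, w in ((a, b), (b, a)):
            if u == path[-1] and w not in path:
                yield from simple_paths(x, y, path + [w])


def is_active(path, z):
    """A path is active iff every interior vertex is active conditional on the set z."""
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        collider = (prev, node) in EDGES and (nxt, node) in EDGES
        if collider:
            if node not in z and not (descendants(node) & z):
                return False
        elif node in z:
            return False
    return True


def d_connected(x, y, z):
    z = set(z)
    return any(is_active(p, z) for p in simple_paths(x, y))


print(d_connected("SES", "SEX", set()))
print(d_connected("SES", "SEX", {"CP", "IQ"}))
```

Path enumeration is exponential in the worst case, so this is only suitable for small graphs like the five-variable example.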

15 If I is Not Entailed by G
[Figure: DAG over SES, SEX, PE, IQ, CP]
If a conditional independence relation I is not entailed by G, then I may hold in some (but not every) distribution P that factors according to G: it holds for some values of the parameters, but not for others.
Example: There are P and P' that factor according to G such that ~IP(SES,CP|∅) and IP'(SES,CP|∅). P' is said to be unfaithful to G.
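One standard way an unfaithful distribution arises is path cancellation. The sketch below is a hypothetical three-variable linear model (X → Y → Z plus X → Z, with invented coefficients a, b, c), not the five-variable example from the slide: when c = -a*b the two paths cancel and X and Z are uncorrelated even though the graph does not entail their independence.

```python
import random


def cov_x_z(a, b, c, var_x=1.0):
    """Cov(X, Z) implied by X -> Y -> Z (coefficients a, b) plus X -> Z (coefficient c)."""
    # Z = b*(a*X + e_Y) + c*X + e_Z, with independent noise terms.
    return (a * b + c) * var_x


def sample_cov_x_z(a, b, c, n=20000, seed=0):
    """Sample covariance of X and Z from simulated data with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = a * x + rng.gauss(0, 1)
        z = b * y + c * x + rng.gauss(0, 1)
        xs.append(x)
        zs.append(z)
    mx, mz = sum(xs) / n, sum(zs) / n
    return sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / (n - 1)


# Same graph, two parameter settings: generic (faithful) and cancelling (unfaithful).
faithful = sample_cov_x_z(0.5, 0.8, 0.3)
unfaithful = sample_cov_x_z(0.5, 0.8, -0.4)
print(faithful, unfaithful)
```

The cancelling parameter values form a measure-zero set, which is the usual argument for why unfaithfulness is treated as exceptional.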

16 Manipulations
An ideal manipulation assigns a density to a set X of properties (random variables) as a function of the values of a set Z of properties (random variables).
Directly affects only the variables in X
Successful
Example: randomized experiment
Notes: can't tell if a particular action is an ideal manipulation.

17 Manipulations and Causal Graph
There is an edge SES → CP in G because there are two ways of manipulating {SES,SEX,IQ,PE} that differ only in the value they assign to SES and that yield different probabilities of CP.
[Figure: DAG over SES, SEX, PE, IQ, CP]
Notes: This is not defining causal terms in terms of non-causal ones, but giving relationships among causal terms and their relation to experiments. Stable Unit Treatment Value Assumption.

18 Causal Sufficiency
[Figure: DAG over SES, SEX, PE, IQ, CP]
A set S of variables is causally sufficient if there are no variables not in S that are direct causes of more than one variable in S.
S = {SES,IQ} is causally sufficient.
S = {SES,PE,CP} is not causally sufficient.
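The two examples can be checked mechanically. This sketch rests on an assumption about how to read "direct cause": a variable C outside S counts as a direct cause of a member of S when there is a directed path from C to that member with no intermediate vertex in S. The edge list is taken from the factorization given earlier, and the function names are illustrative.

```python
# Edges of the example DAG, assumed from the factorization of P.
EDGES = [("SES", "IQ"), ("SES", "PE"), ("SEX", "PE"), ("IQ", "PE"),
         ("PE", "CP"), ("SES", "CP"), ("IQ", "CP")]
ALL_VARS = {v for edge in EDGES for v in edge}


def direct_effects_in(c, s):
    """Members of s reachable from c by a directed path with no intermediate vertex in s."""
    found, seen, stack = set(), {c}, [c]
    while stack:
        v = stack.pop()
        for a, b in EDGES:
            if a == v and b not in seen:
                seen.add(b)
                if b in s:
                    found.add(b)   # stop here: longer paths would pass through s
                else:
                    stack.append(b)
    return found


def is_causally_sufficient(s):
    """True iff no variable outside s is a direct cause of more than one member of s."""
    s = set(s)
    return all(len(direct_effects_in(c, s)) <= 1 for c in ALL_VARS - s)


print(is_causally_sufficient({"SES", "IQ"}))
print(is_causally_sufficient({"SES", "PE", "CP"}))
```

In the second case the offending variable is IQ, which is outside S and is a direct cause of both PE and CP.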

19 Causal Markov Assumption
In a population Pop with distribution P and causal graph G, if V is causally sufficient, P(V) factors according to G.
P(SES,SEX,PE,IQ,CP) = P(SEX) P(SES) P(IQ|SES) P(PE|SES,SEX,IQ) P(CP|PE,SES,IQ)
[Figure: DAG over SES, SEX, PE, IQ, CP]
Notes: Why it is called a Markov assumption (in an equivalent form); related to Reichenbach's common cause principle. This relates the causal interpretation of the graph to the probabilistic interpretation of the graph.

20 Representation of Manipulation
P(SES,SEX,PE=1,IQ,CP||PE=1)
= P(SEX) P(SES) P(IQ|SES) * 1 * P(CP|PE=1,SES,IQ)   (truncation)
= P(SES,SEX,PE=1,IQ,CP) / P(PE=1|SEX,SES,IQ)   (division)
[Figure: DAG over SES, SEX, PE, IQ, CP]
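The truncation rule can be sketched in a few lines: the factor for the manipulated variable is replaced by the assigned distribution (here a point mass at PE=1) and every other factor is left alone. As before, the binary variables, the random parameters, and the function names are assumptions made for illustration. The sketch also contrasts conditioning on PE=1 with manipulating PE to 1.

```python
import itertools
import random

PARENTS = {"SEX": [], "SES": [], "IQ": ["SES"],
           "PE": ["SES", "SEX", "IQ"], "CP": ["PE", "SES", "IQ"]}
VARS = list(PARENTS)


def make_cpts(seed=0):
    """Hypothetical parameters: P(v = 1 | parent values) for binary variables."""
    rng = random.Random(seed)
    return {v: {pa: rng.uniform(0.1, 0.9)
                for pa in itertools.product([0, 1], repeat=len(PARENTS[v]))}
            for v in VARS}


def joint(a, cpts, do=None):
    """Joint probability; factors of manipulated variables are replaced by a point mass."""
    do = do or {}
    p = 1.0
    for v in VARS:
        if v in do:
            p *= 1.0 if a[v] == do[v] else 0.0      # truncated factor
        else:
            p_one = cpts[v][tuple(a[u] for u in PARENTS[v])]
            p *= p_one if a[v] else 1.0 - p_one
    return p


def prob(event, cpts, given=None, do=None):
    """P(event | given || do), computed by summing the (manipulated) joint."""
    given = given or {}
    num = den = 0.0
    for vals in itertools.product([0, 1], repeat=len(VARS)):
        a = dict(zip(VARS, vals))
        if any(a[k] != v for k, v in given.items()):
            continue
        p = joint(a, cpts, do)
        den += p
        if all(a[k] == v for k, v in event.items()):
            num += p
    return num / den


cpts = make_cpts()
observed = prob({"CP": 1}, cpts, given={"PE": 1})    # P(CP=1 | PE=1)
manipulated = prob({"CP": 1}, cpts, do={"PE": 1})    # P(CP=1 || PE=1)
print(observed, manipulated)
```

The two numbers generally differ because SES and IQ are common causes of PE and CP; conditioning changes their distribution, while an ideal manipulation of PE does not.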

21 FCI Algorithm
Looks for the set of DAGs (possibly with latent variables and selection bias) that entail all and only the conditional independence relations that hold in the data according to statistical tests.

22 Markov Equivalence
Two DAGs G1 and G2 are Markov equivalent when they contain the same variables, and for all disjoint X, Y, Z, X is entailed to be independent of Y conditional on Z in G1 if and only if X is entailed to be independent of Y conditional on Z in G2.
Notes: This does not mean that they represent the same set of distributions in general, although it does in certain cases. Equivalently, every distribution that factors according to G1 also factors according to G2 and vice versa.
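The slide defines Markov equivalence through entailed independencies. A well-known graphical characterization, which is not stated in the talk and is swapped in here for illustration, says that two DAGs are Markov equivalent iff they have the same adjacencies and the same unshielded colliders. The sketch below applies it to the example DAG and two hypothetical variants (`G_REV_IQ` and `G_REV_SEX` are invented here and are not the G' shown in the slides); it assumes both inputs are acyclic.

```python
G = [("SES", "IQ"), ("SES", "PE"), ("SEX", "PE"), ("IQ", "PE"),
     ("PE", "CP"), ("SES", "CP"), ("IQ", "CP")]
# Hypothetical variants: reverse SES -> IQ, or reverse SEX -> PE.
G_REV_IQ = [("IQ", "SES")] + G[1:]
G_REV_SEX = [e for e in G if e != ("SEX", "PE")] + [("PE", "SEX")]


def skeleton(edges):
    """Adjacencies with edge direction ignored."""
    return {frozenset(e) for e in edges}


def unshielded_colliders(edges):
    """Triples a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    found = set()
    for a, c in edges:
        for b, c2 in edges:
            if c2 == c and a < b and frozenset((a, b)) not in skel:
                found.add((a, c, b))
    return found


def markov_equivalent(e1, e2):
    return (skeleton(e1) == skeleton(e2)
            and unshielded_colliders(e1) == unshielded_colliders(e2))


print(markov_equivalent(G, G_REV_IQ))
print(markov_equivalent(G, G_REV_SEX))
```

Reversing SES → IQ creates or destroys no unshielded collider, so conditional independence alone cannot distinguish the two graphs; reversing SEX → PE removes the colliders at PE and changes the entailed independencies.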

23 Markov Equivalence Class
[Figure: two DAGs, G and G', each over SES, SEX, PE, IQ, CP]
Can't tell the difference between DAG G and DAG G' using conditional independence.

24 Causal Faithfulness Assumption
In a population Pop with causal graph G and distribution P(V), if V is causally sufficient, IP(X,Y|Z) only if G entails I(X,Y|Z).
~IP(SES,CP|∅) because I(SES,CP|∅) is not entailed by G.
[Figure: DAG over SES, SEX, PE, IQ, CP, with annotation "+…"]

25 Causal Faithfulness Assumption
Causal Faithfulness is too strong because
  consistency can be proved with assumptions about fewer conditional independencies;
  it is unlikely to hold, especially when there are many variables.
Causal Faithfulness is too weak because it is not sufficient to prove uniform consistency (putting error bounds at finite sample sizes).
[Figure: DAG over SES, SEX, PE, IQ, CP]

26 Good Features of FCI Algorithm
Is pointwise consistent: as sample size → ∞, P(error in output pattern) → 0.
Can be applied to distributions where tests of conditional independence are known
Can be applied to hidden variable models (and selection bias models)

27 Bad Features of FCI Algorithm
There is no reliable way to set error bounds on the pattern without making stronger assumptions.
Can only get a set of Markov equivalent DAGs, not a single DAG
Doesn't allow for comparing how much better one model is than another
Need to assume some version of the Causal Faithfulness Assumption

28 Non Independence Constraints
Depending on the parametric family, a DAG can entail constraints that are not conditional independence constraints.
Assuming linearity and non-Gaussian error terms, if a distribution is compatible with X → Y it is not compatible with X ← Y, even though they are Markov equivalent.

29 Score-Based Search Strategy
Assign a score to graph and sample based on
  maximum likelihood of data given graph
  simplicity of model
Do search over graph space for highest score
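One common score that combines maximum likelihood with a simplicity penalty is BIC. The talk does not name a specific score, so BIC is an assumption here, as are the binary variables, the simulated two-variable data, and the function names. The sketch scores three candidate graphs; it does not implement the search over graph space.

```python
import math
import random
from collections import Counter


def bic_score(data, parents):
    """Maximized log-likelihood minus (number of parameters / 2) * log(n), binary variables."""
    n = len(data)
    score = 0.0
    for v, pa in parents.items():
        pa_counts = Counter(tuple(row[p] for p in pa) for row in data)
        joint_counts = Counter((tuple(row[p] for p in pa), row[v]) for row in data)
        # Log-likelihood at the maximum-likelihood estimates of P(v | parents).
        score += sum(c * math.log(c / pa_counts[k])
                     for (k, _), c in joint_counts.items())
        # Simplicity penalty: one free parameter per parent configuration.
        score -= 0.5 * math.log(n) * (2 ** len(pa))
    return score


def simulate(n=500, seed=0):
    """Hypothetical data in which Y copies X with probability 0.9."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        y = x if rng.random() < 0.9 else 1 - x
        rows.append({"X": x, "Y": y})
    return rows


data = simulate()
with_edge = bic_score(data, {"X": [], "Y": ["X"]})
no_edge = bic_score(data, {"X": [], "Y": []})
reversed_edge = bic_score(data, {"Y": [], "X": ["Y"]})
print(with_edge, no_edge, reversed_edge)
```

With discrete data and this score, X → Y and X ← Y receive the same value, so the score alone does not separate Markov equivalent DAGs without further parametric assumptions.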

30 Advantages of Score-Based Search Strategy
Get more information about graph
  Additive noise models, unique DAG
Doesn't rely on binary decisions
Local mistakes don't propagate

31 Disadvantages of Score-Based Search Strategy
Often slower to calculate, or not known how to calculate exactly, if the model includes
  unmeasured variables
  selection bias
  unusual distributions
Search over graph space is often heuristic

32 Dynamic Bayes Nets
If the same variable is measured at different times, then the samples from the variable are not i.i.d.
Solution: index each variable by time (time series)

33 Dynamic Bayes Nets
Make a template for the causal structure that can be filled in with actual times
[Figure: template with variables Xt and Yt repeated over three time steps]
Continuous time or differential equations?
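Filling in a template with actual times amounts to unrolling it. In the sketch below the template edges (X and Y each depending on their own previous value, and Y on the previous X) are hypothetical, since the transcript does not preserve the edges of the figure; `unroll` is an illustrative name.

```python
def unroll(template, n_steps):
    """Instantiate (cause, effect, lag) template edges at times 0 .. n_steps - 1."""
    edges = []
    for t in range(n_steps):
        for cause, effect, lag in template:
            if t - lag >= 0:   # skip edges whose cause would precede time 0
                edges.append((f"{cause}_{t - lag}", f"{effect}_{t}"))
    return edges


# Hypothetical template: each entry is (cause, effect, lag in time steps).
TEMPLATE = [("X", "X", 1), ("X", "Y", 1), ("Y", "Y", 1)]

edges = unroll(TEMPLATE, 3)
for cause, effect in edges:
    print(cause, "->", effect)
```

Because every edge points forward in time, the unrolled graph is acyclic even when the template lets X and Y influence each other across time steps.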

34 Population
[Figure: three copies of the DAG over SES, SEX, PE, IQ, CP, linked by parent-of relations]

35 Population
[Figure: copies of the DAG over SES, SEX, PE, IQ, CP, linked by parent-of relations]
Not i.i.d. distribution
Violations of SUTVA
Causal relations between relations (e.g. sibling causes rivalry)

36 Extended Manipulation Specification
A manipulation assigns a density to a set of properties or relations at a set of times (measurable set of times T) for a set of units as a function of the values of a set of properties or relations.

37 Extended Factorization Assumption
[Figure: relational model in which Alice&Jim is linked by parent-of to Sue and to Bob; variables SES, SEX, PE, CP, IQ]
P(Alice&Jim.SES, Sue.SEX, Sue.PE, Sue.IQ, Sue.CP, Bob.SEX, Bob.PE, Bob.IQ, Bob.CP) =

38 Extended Factorization Assumption
P(Alice&Jim.SES)
P(Sue.SEX) P(Sue.IQ|Alice&Jim.SES) P(Sue.PE|Alice&Jim.SES, Sue.SEX, Sue.IQ) P(Sue.CP|Sue.PE, Alice&Jim.SES, Sue.IQ)
P(Bob.SEX) P(Bob.IQ|Alice&Jim.SES) P(Bob.PE|Alice&Jim.SES, Bob.SEX, Bob.IQ) P(Bob.CP|Bob.PE, Alice&Jim.SES, Bob.IQ)
The shared variable Alice&Jim.SES contributes a single factor, because it is one random variable that is a parent of variables in both Sue's and Bob's copies of the template.

39 3 Interpretations of Cycles: PE ⇆ CP
Equilibrium values of PE and CP cause each other.
Averages of the values of PE and CP while reaching equilibrium influence each other.
Mixture of PE → CP and CP → PE
[Figure: graph over SES, SEX, PE, IQ, CP]
Notes: relation to experiments; other kinds of equilibrium with different representation.

