Exploratory studies: you have empirical data and you want to know what sorts of causal models are consistent with it. Confirmatory tests: you have a causal.

2 Exploratory studies: you have empirical data and you want to know what sorts of causal models are consistent with it. "My data have this pattern of correlation within them. What causal processes could have generated this pattern?" Confirmatory tests: you have a causal hypothesis and you want to see if the empirical data agree with it. "I think that: A B C (hypothesis). Do the data agree with my hypothesis?"

3 Hypothesis generation. [Figure: a “3-D” causal process over A, B, C, D and E and its “2-D” correlational shadow.] The shadow consists of statements such as: B & C independent given A; A & D independent given B & C; B & E independent given D; and so on...

4 Besides the notion of d-separation, we need one other notion: faithfulness of data to a causal graph. Is there one bighorn sheep in this picture, or are there two, with the second hidden behind the first? Both cases are possible, but the second case requires a very special combination of factors, i.e. that the second animal is positioned so that it gives the illusion of being absent. If the second case happens, then we can say that the picture is unfaithful to our normal experience.

5 Unfaithfulness. [Figure: graph over A, B and C with path coefficients +10, -2 and +5.] Overall effect of A on B: +10 + (-2*5) = 0. Because the two paths exactly cancel out, the overall correlation between A and B is zero, i.e. they are uncorrelated! The joint probability distribution over A, B & C is unfaithful to the graph because it gives the illusion of independence between A and B, contrary to d-separation. This will only occur when positive and negative values exactly cancel out (very special conditions), like seeing one sheep because the other one is hiding behind the first!
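
The cancellation on this slide can be checked by summing the products of path coefficients over every directed path from A to B. The sketch below is illustrative only: the assignment of -2 to A -> C and +5 to C -> B is an assumption (the slide's arithmetic only fixes the direct effect at +10 and the indirect product at -2*5), and `total_effect` is a hypothetical helper name.

```python
def total_effect(edges, source, target):
    """Sum, over every directed path from source to target,
    of the product of the path coefficients along that path."""
    if source == target:
        return 1.0
    return sum(coef * total_effect(edges, child, target)
               for (parent, child), coef in edges.items()
               if parent == source)

# Assumed arrow labels: A -> B (+10), A -> C (-2), C -> B (+5).
EDGES = {("A", "B"): 10.0, ("A", "C"): -2.0, ("C", "B"): 5.0}

direct = EDGES[("A", "B")]
indirect = EDGES[("A", "C")] * EDGES[("C", "B")]
overall = total_effect(EDGES, "A", "B")
print(direct, indirect, overall)  # 10.0 -10.0 0.0

# Any small change to one coefficient breaks the exact cancellation.
PERTURBED = dict(EDGES)
PERTURBED[("C", "B")] = 5.1
perturbed_overall = total_effect(PERTURBED, "A", "B")
```

With the perturbed coefficient the overall effect is no longer zero, which is why exact cancellation counts as a "very special condition".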

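6 Obtaining the undirected dependency graph. [Figure: the true process over A, B, C, D and E, which we can’t see!] Step 1: create a saturated undirected dependency graph, i.e. one with a line between every pair of variables. [Figure: saturated undirected graph over A, B, C, D and E.]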

7 Obtaining the undirected dependency graph. Step 2: let the order n (i.e. the number) of conditioning variables be zero (i.e. no conditioning variables).
- For each unique pair of variables (X,Y) that are still adjacent in the graph...
- For each unique set Q of n other variables in the graph (in this case none)...
Test the data to see if variables X and Y are independent given the conditioning set Q. If X and Y are independent in the data, remove the line between them in the graph.
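
One common way to carry out the independence test in this step is a Fisher z-test on the partial correlation. This assumes linear relationships and roughly normal variables, which the slides do not require. The sketch below is illustrative: `partial_corr`, `fisher_z_pvalue`, the variable names, the example correlations and the sample size are all hypothetical.

```python
import math
from statistics import NormalDist

def partial_corr(corr, x, y, cond):
    """Partial correlation of x and y given the variables in cond.

    corr maps frozenset({u, v}) to the Pearson correlation of u and v.
    Uses the standard recursion that removes one conditioning variable at a time.
    """
    if not cond:
        return corr[frozenset((x, y))]
    z, rest = cond[0], cond[1:]
    r_xy = partial_corr(corr, x, y, rest)
    r_xz = partial_corr(corr, x, z, rest)
    r_yz = partial_corr(corr, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: (partial) correlation is zero,
    with n observations and k conditioning variables."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical correlations for a chain X -> Q1 -> Y, where r(X,Y) = r(X,Q1) * r(Q1,Y).
CORR = {
    frozenset(("X", "Q1")): 0.6,
    frozenset(("Q1", "Y")): 0.5,
    frozenset(("X", "Y")): 0.3,
}
N = 200  # assumed sample size

p_order0 = fisher_z_pvalue(partial_corr(CORR, "X", "Y", ()), N, 0)
p_order1 = fisher_z_pvalue(partial_corr(CORR, "X", "Y", ("Q1",)), N, 1)
print(p_order0 < 0.05)  # True: X and Y are dependent, so the line stays at order 0
print(p_order1 > 0.05)  # True: X and Y are independent given Q1, so the line is removed
```

Other tests (for example for categorical or non-linear data) can be swapped in, as long as they answer the same question: are X and Y independent given Q?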

8 Obtaining the undirected dependency graph. Step 2: let the order n (i.e. the number) of conditioning variables be zero (i.e. no conditioning variables). Are A & B independent given no others? No; don’t remove the line. Are A & C independent given no others? No; don’t remove the line. And so on... Result: we don’t remove any lines at this stage.

9 Obtaining the undirected dependency graph. Step 3: let the order n (i.e. the number) of conditioning variables be one (i.e. one conditioning variable). Are A & B independent given C? No. Are A & B independent given D? E? No. Are A & C independent given B? Yes. Therefore, remove the line between A and C and go to the next pair (A,D).

10 Obtaining the undirected dependency graph. Step 3: let the order n (i.e. the number) of conditioning variables be one (i.e. one conditioning variable). Are A & B independent given C? No. Are A & B independent given D? E? No. Are A & C independent given B? Yes. Therefore, remove the line between A and C and go to the next pair (A,D). Are A & D independent given B? Yes. Therefore, remove the line between A and D and go to the next pair (A,E).

11 Obtaining the undirected dependency graph. Step 3: let the order n (i.e. the number) of conditioning variables be one (i.e. one conditioning variable). Are A & B independent given C? No. Are A & B independent given D? E? No. Are A & C independent given B? Yes. Therefore, remove the line between A and C and go to the next pair (A,D). Are A & D independent given B? Yes. Therefore, remove the line between A and D and go to the next pair (A,E). And so on for each unique pair of variables and each unique conditioning set.

12 Obtaining the undirected dependency graph. Step 4: let the order n (i.e. the number) of conditioning variables be two (i.e. two conditioning variables). Are A & B independent given any two others? No. Are B & C independent given any two others? No. Are B & D independent given any two others? No. Are B & E independent given any two others? Yes (C & D). Therefore, remove the line between B and E and go to the next pair.

13 Obtaining the undirected dependency graph. Step 4: let the order n (i.e. the number) of conditioning variables be two (i.e. two conditioning variables). Are A & B independent given any two others? No. Are A & E independent given any two others? No. Are B & C independent given any two others? No. Are B & D independent given any two others? No. Are B & E independent given any two others? Yes (C & D). Therefore, remove the line between B and E and go to the next pair.
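
The steps above can be sketched as a loop over increasing orders of conditioning. The sketch below is not the author's implementation. It replaces the statistical test on data with a d-separation "oracle" on an assumed true process (A -> B, B -> C, B -> D, C -> E, D -> E), chosen to agree with the removals reported in the walk-through: A-C and A-D given B, and B-E given C & D. The names `PARENTS`, `d_separated` and `skeleton` are hypothetical.

```python
from itertools import combinations

# Assumed true process (hypothetical): A -> B, B -> C, B -> D, C -> E, D -> E.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["C", "D"]}

def ancestors(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(x, y, cond, parents=PARENTS):
    """Oracle test: is x d-separated from y given cond in the assumed DAG?
    Uses the moralised ancestral graph criterion."""
    keep = ancestors({x, y, *cond}, parents)
    nbrs = {v: set() for v in keep}
    for child in keep:
        for p in parents[child]:                      # parent-child links
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(parents[child], 2):  # link co-parents
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = {x}, [x]
    while stack:                                      # search that never enters cond
        for w in nbrs[stack.pop()]:
            if w not in seen and w not in cond:
                seen.add(w)
                stack.append(w)
    return y not in seen

def skeleton(variables, independent):
    """Start saturated; remove a line as soon as some set Q separates the pair."""
    edges = {frozenset(pair) for pair in combinations(variables, 2)}
    sepset = {}
    for n in range(len(variables) - 1):               # order of conditioning
        for x, y in combinations(variables, 2):
            if frozenset((x, y)) not in edges:
                continue
            others = [v for v in variables if v not in (x, y)]
            for q in combinations(others, n):
                if independent(x, y, q):
                    edges.discard(frozenset((x, y)))
                    sepset[frozenset((x, y))] = set(q)
                    break
    return edges, sepset

edges, sepset = skeleton("ABCDE", d_separated)
print(sorted("".join(sorted(e)) for e in edges))  # ['AB', 'BC', 'BD', 'CE', 'DE']
```

With real data, `independent` would be a statistical test, and the result would then depend on sample size and on the significance level chosen.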

14 Obtaining the undirected dependency graph. This algorithm is provably correct for any probability distribution, for any functional relationship between variables, and for both cyclic and acyclic causal structures, assuming:
1. Faithfulness;
2. All data are generated by the same causal process;
3. No incorrect statistical decisions have been made when deciding upon statistical independence between variables in the data (i.e. lots of data and tests appropriate to the variables in question).
The fewer data you have, the greater the chance of missing small, but real, statistical dependencies (statistical power).

15 Interpreting the undirected dependency graph. If there is a line between two variables in this undirected dependency graph then:
1. There is a direct causal relationship between the two and/or...
2. There is a latent variable that is a common cause of the two and/or...
3. There is a more complicated type of undirected path between the two (an inducing path).
[Figure: the undirected dependency graph over A, B, C, D and E, with example graphs over A, B, C and D, one including a latent variable.]

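16 Orienting the undirected dependency graph. [Figure: example triples of C, D and E showing shielded colliders, an unshielded collider and unshielded non-colliders. An unshielded pattern has the form X-Y-Z, with no line between X and Z.] [Figure: the true process over A, B, C, D and E (we can’t see this!) next to the undirected dependency graph (we’ve learned this!).]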

17 Orienting the undirected dependency graph. In an unshielded collider with E as the middle variable, C & D will never be independent conditional on E plus any possible combination of remaining variables: C & D are dependent given all of Q = {E, E+A, E+B, E+A+B}. In an unshielded non-collider, C & D must be independent conditional on E plus (possibly) some other combination of remaining variables: C & D are independent given one of Q = {E, E+A, E+B, E+A+B}. This is why the line between C & D was removed in the undirected dependency graph!

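18 Orienting the undirected dependency graph. [Figure: the true process over A, B, C, D and E (we can’t see this!) next to the undirected dependency graph (we’ve learned this!).] Unshielded patterns in the undirected dependency graph: A-B-C, A-B-D, C-B-D, B-C-E, B-D-E, C-E-D. [Figure: graph over A, B, C, D and E.]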
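
A minimal sketch of how the unshielded patterns can be listed and classified. It uses a common shortcut rather than testing every conditioning set: the middle variable of a non-collider appears in the set that separated the two outer variables, and the middle variable of a collider does not. The skeleton and separating sets below are assumptions. They are what the search would give if the true process were A -> B, B -> C, B -> D, C -> E, D -> E, and only A-C, A-D and B-E are stated explicitly in the walk-through. All names are hypothetical.

```python
from itertools import combinations

# Assumed undirected dependency graph and separating sets (see note above).
SKELETON = {frozenset(e) for e in ("AB", "BC", "BD", "CE", "DE")}
SEPSET = {
    frozenset("AC"): {"B"},
    frozenset("AD"): {"B"},
    frozenset("AE"): {"B"},
    frozenset("BE"): {"C", "D"},
    frozenset("CD"): {"B"},
}

def unshielded_triples(skeleton):
    """Yield (x, middle, y) where x-middle and middle-y are lines but x-y is not."""
    nodes = sorted({v for edge in skeleton for v in edge})
    for middle in nodes:
        nbrs = sorted(v for v in nodes if frozenset((v, middle)) in skeleton)
        for x, y in combinations(nbrs, 2):
            if frozenset((x, y)) not in skeleton:
                yield x, middle, y

def classify(skeleton, sepset):
    """Split unshielded triples into colliders and non-colliders."""
    colliders, non_colliders = [], []
    for x, middle, y in unshielded_triples(skeleton):
        if middle in sepset[frozenset((x, y))]:
            non_colliders.append((x, middle, y))
        else:
            colliders.append((x, middle, y))
    return colliders, non_colliders

colliders, non_colliders = classify(SKELETON, SEPSET)
print(colliders)      # [('C', 'E', 'D')]
print(non_colliders)  # the other five unshielded patterns
```

Under these assumptions only C-E-D is a collider, so the only lines that can be oriented from the data are C -> E and D -> E.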

19 Orienting the undirected dependency graph. [Figure: the true process over A, B, C, D and E (we can’t see this!), the undirected dependency graph (we’ve learned this!) and the partially-oriented acyclic graph.] We can’t learn any more by just looking at the data. We can orient the rest of the edges any way we want, so long as we don’t:
- create or destroy any unshielded colliders that are found in the partially-oriented graph
- create any cycles in the graph.
All such graphs are statistically equivalent and we can’t test between them.
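
The two rules above can be applied mechanically to list every statistically equivalent graph. The sketch below assumes the partially-oriented graph has C -> E and D -> E oriented and A-B, B-C and B-D still undirected. That is an assumption consistent with the unshielded patterns listed earlier, not something read off the figure. All names are hypothetical.

```python
from itertools import combinations, product

FIXED = [("C", "E"), ("D", "E")]                # assumed oriented edges (tail, head)
UNDIRECTED = [("A", "B"), ("B", "C"), ("B", "D")]  # assumed unoriented lines

def unshielded_colliders(arrows, skeleton):
    """Return triples (p, head, q) with p -> head <- q and no line between p and q."""
    parents = {}
    for tail, head in arrows:
        parents.setdefault(head, set()).add(tail)
    return {(p, head, q)
            for head, ps in parents.items()
            for p, q in combinations(sorted(ps), 2)
            if frozenset((p, q)) not in skeleton}

def is_acyclic(arrows):
    """Repeatedly strip variables with no incoming arrow; fail if none exists."""
    remaining = set(arrows)
    nodes = {v for arrow in arrows for v in arrow}
    while nodes:
        sources = {v for v in nodes if all(head != v for _, head in remaining)}
        if not sources:
            return False
        nodes -= sources
        remaining = {(t, h) for t, h in remaining if t not in sources}
    return True

def equivalent_dags(fixed, undirected):
    """Try both directions for each undirected line and keep the valid graphs."""
    skeleton = {frozenset(e) for e in fixed + undirected}
    target = unshielded_colliders(fixed, skeleton)
    dags = []
    for flips in product((False, True), repeat=len(undirected)):
        oriented = [(b, a) if flip else (a, b)
                    for (a, b), flip in zip(undirected, flips)]
        arrows = fixed + oriented
        if unshielded_colliders(arrows, skeleton) == target and is_acyclic(arrows):
            dags.append(sorted(arrows))
    return dags

DAGS = equivalent_dags(FIXED, UNDIRECTED)
print(len(DAGS))  # 4
for dag in DAGS:
    print(dag)
```

Under these assumptions four graphs survive: at most one of A, C and D may point into B, since two would create a new unshielded collider at B.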

21 There are some further algorithms that can sometimes allow us to orient more lines, but they are more complicated and require more specialized patterns. There are also algorithms for orienting cyclic causal processes, but these are even more complicated and require stronger assumptions (linearity of relationships and continuous variables). There are also algorithms for detecting latent variables, but these assume both linearity and normality.
The TETRAD Project: Causal Models and Statistical Data: http://www.phil.cmu.edu/projects/tetrad/
Causal toolbox: http://callisto.si.usherb.ca:8080/bshipley/

