1 Causal Data Mining
Richard Scheines, Dept. of Philosophy, Machine Learning, & Human-Computer Interaction, Carnegie Mellon

2 Causal Graphs
Causal graph G = {V, E}. Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V.
[Figure: example causal graph with a node labeled Chicken Pox]
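A minimal sketch of G = {V, E} as plain Python sets, assuming each edge is stored as a pair (X, Y) meaning X → Y. The variables S, YF and LC are borrowed from the next slide, and the helper names `parents` and `is_acyclic` are hypothetical. The "immediate causes" of a variable are its parents in G, and causal Bayes networks assume G has no directed cycles, which the sketch checks with the standard-library `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter

V = {"S", "YF", "LC"}
E = {("S", "YF"), ("S", "LC")}  # (X, Y) means X -> Y: X is a direct cause of Y relative to V


def parents(v, edges):
    """Immediate causes of v: every X with an edge X -> v."""
    return {x for (x, y) in edges if y == v}


def is_acyclic(vertices, edges):
    """True if the directed graph has no directed cycle."""
    sorter = TopologicalSorter({v: parents(v, edges) for v in vertices})
    try:
        list(sorter.static_order())  # raises CycleError if a cycle exists
        return True
    except CycleError:
        return False
```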

3 Causal Bayes Networks
P(S = 0) = .7, P(S = 1) = .3

                 S = 0   S = 1
P(YF = 0 | S)     .99     .20
P(YF = 1 | S)     .01     .80
P(LC = 0 | S)     .95     .80
P(LC = 1 | S)     .05     .20

$P(S, YF, LC) = P(S)\,P(YF \mid S)\,P(LC \mid S)$
The joint distribution factors according to the causal graph, i.e.,
$P(V) = \prod_{X \in V} P(X \mid \text{Immediate Causes of}(X))$
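The factorization can be checked numerically. This sketch builds the full joint table from the three conditional tables on the slide and then computes a marginal and a conditional probability by summing over the table; the helper names `joint` and `prob` are illustrative, not from the source.

```python
from itertools import product

# Conditional probability tables from the slide.
P_S = {0: 0.7, 1: 0.3}
P_YF = {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.20, (1, 1): 0.80}  # key (yf, s): P(YF = yf | S = s)
P_LC = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.80, (1, 1): 0.20}  # key (lc, s): P(LC = lc | S = s)


def joint(s, yf, lc):
    # P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
    return P_S[s] * P_YF[(yf, s)] * P_LC[(lc, s)]


# Joint probability of all 2^3 assignments.
table = {(s, yf, lc): joint(s, yf, lc) for s, yf, lc in product((0, 1), repeat=3)}


def prob(event):
    """Sum the joint probability of every assignment (s, yf, lc) where event holds."""
    return sum(p for (s, yf, lc), p in table.items() if event(s, yf, lc))


p_lc = prob(lambda s, yf, lc: lc == 1)
p_lc_given_yf = prob(lambda s, yf, lc: lc == 1 and yf == 1) / prob(lambda s, yf, lc: yf == 1)
print(round(p_lc, 3), round(p_lc_given_yf, 3))
```

Observing YF = 1 raises the probability of LC = 1 from about .095 to about .196, even though the graph has no edge between YF and LC; the association comes entirely through their common cause S.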

4 Structural Equation Models
Structural equations: one equation for each variable V in the graph, $V = f(\text{parents}(V), \varepsilon_V)$. For a linear SEM (linear regression), f is a linear function.
Statistical constraints: a joint distribution over the error terms.
[Figure: causal graph]

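5 Structural Equation Models
Equations:
$\text{Education} = \varepsilon_{ed}$
$\text{Income} = \beta_1\,\text{Education} + \varepsilon_{Income}$
$\text{Longevity} = \beta_2\,\text{Education} + \varepsilon_{Longevity}$
Statistical constraints: $(\varepsilon_{ed}, \varepsilon_{Income}, \varepsilon_{Longevity}) \sim N(0, \Sigma^2)$, with $\Sigma^2$ diagonal and no variance equal to zero.
[Figures: causal graph; SEM graph (path diagram)]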
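Because Σ² is diagonal, the covariances implied by this model follow directly from the equations: two variables covary only through error terms they share. The sketch below uses exact fractions and hypothetical values for β₁, β₂ and the error variances (none are given on the slide) to show that Income and Longevity covary, yet their partial covariance given Education is exactly zero. Vanishing partial correlations of this kind are the statistical constraints that causal search can test.

```python
from fractions import Fraction as F

b1, b2 = F(3, 2), F(1, 4)  # hypothetical beta_1, beta_2
err_var = {"ed": F(1), "income": F(2), "longevity": F(1, 2)}  # hypothetical, all nonzero

# Each variable written as a linear combination of the error terms.
loadings = {
    "Education": {"ed": F(1)},
    "Income": {"ed": b1, "income": F(1)},
    "Longevity": {"ed": b2, "longevity": F(1)},
}


def cov(x, y):
    # Errors are independent (diagonal Sigma^2), so only shared error terms contribute.
    lx, ly = loadings[x], loadings[y]
    return sum(lx[k] * ly[k] * err_var[k] for k in lx.keys() & ly.keys())


# Partial covariance of Income and Longevity given Education.
partial = (cov("Income", "Longevity")
           - cov("Income", "Education") * cov("Longevity", "Education") / cov("Education", "Education"))
```

Any nonzero β values and error variances give the same zero partial covariance, since it equals β₁β₂Var(ε_ed) − (β₁Var(ε_ed))(β₂Var(ε_ed))/Var(ε_ed).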

6 Tetrad 4: Demo (www.phil.cmu.edu/projects/tetrad)

7 Causal Data Mining in Educational Research
1. Collect raw data
2. Build meaningful variables
3. Constrain the model space with background knowledge
4. Search for models
5. Estimate and test
6. Interpret

8 CSR Online
Are online students learning as much? What features of online behavior matter?

9 CSR Online
Are online students learning as much?
Raw data: Pitt 2001, 87 students.
For everyone: pre-test, recitation attendance, final exam.
For online students (logged): voluntary question attempts, online quizzes, requests to print modules.

10 CSR Online
Build meaningful variables:
1. Online [0/1 indicator]
2. Pre-test [%]
3. Recitation attendance [%]
4. Final exam [%]

11 CSR Online
Data: correlation matrix (corrs.dat, N = 83)

          Pre     Online   Rec     Final
Pre       1.0
Online    .023    1.0
Rec      -.004   -.255     1.0
Final     .287    .182     .297    1.0
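Conditional-independence tests for Gaussian data, such as those used by PC-style search, are built on partial correlations, which can be computed directly from this matrix. The sketch below uses the standard first-order formula; the function names are illustrative. Controlling for recitation attendance raises the Online–Final correlation, consistent with online students attending fewer recitations (negative Online–Rec) while attendance correlates positively with the final.

```python
import math

# Off-diagonal entries of the slide's correlation matrix (corrs.dat, N = 83),
# keyed by the alphabetically sorted pair of variable names.
R = {
    ("Online", "Pre"): 0.023,
    ("Pre", "Rec"): -0.004,
    ("Online", "Rec"): -0.255,
    ("Final", "Pre"): 0.287,
    ("Final", "Online"): 0.182,
    ("Final", "Rec"): 0.297,
}


def r(a, b):
    """Correlation between two variables (1.0 on the diagonal)."""
    return 1.0 if a == b else R[tuple(sorted((a, b)))]


def partial_corr(a, b, c):
    """First-order partial correlation of a and b controlling for c."""
    num = r(a, b) - r(a, c) * r(b, c)
    return num / math.sqrt((1 - r(a, c) ** 2) * (1 - r(b, c) ** 2))


print(round(r("Online", "Final"), 3))                    # 0.182
print(round(partial_corr("Online", "Final", "Rec"), 3))  # 0.279
```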

12 CSR Online
Background knowledge: temporal tiers
1. Online, Pre
2. Rec
3. Final

13 CSR Online
Model search:
No latents (patterns, with PC or GES)
- no time order: 729 models
- temporal tiers: 96 models
With latents (PAGs, with FCI search)
- no time order: 4,096 models
- temporal tiers: 2,916 models
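The no-latent counts can be reproduced under one assumed counting rule (the slide does not state how it counts): each of the 6 pairs among the 4 variables is either unconnected or joined by an edge in one of two directions, giving 3^6 = 729, and temporal tiers forbid any edge pointing into an earlier tier, giving 3 × 2^5 = 96 (only the Online–Pre pair, which shares a tier, keeps all three options). This rule counts orientation assignments, so it includes some cyclic graphs. The sketch enumerates them by brute force and covers only the no-latent figures; `count_edge_assignments` is a hypothetical helper.

```python
from itertools import combinations, product

VARS = ["Online", "Pre", "Rec", "Final"]
TIER = {"Online": 1, "Pre": 1, "Rec": 2, "Final": 3}  # temporal tiers from the previous slide


def count_edge_assignments(variables, tier=None):
    """Count ways to mark each pair as none, a -> b, or a <- b.

    If tier is given, an arrow may not point from a later tier into an earlier one.
    """
    pairs = list(combinations(variables, 2))
    count = 0
    for marks in product(("none", "->", "<-"), repeat=len(pairs)):
        allowed = True
        if tier is not None:
            for (a, b), mark in zip(pairs, marks):
                if mark == "->" and tier[a] > tier[b]:
                    allowed = False  # a is later than b, so a cannot cause b
                if mark == "<-" and tier[b] > tier[a]:
                    allowed = False  # b is later than a, so b cannot cause a
        if allowed:
            count += 1
    return count


print(count_edge_assignments(VARS))        # 729
print(count_edge_assignments(VARS, TIER))  # 96
```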

14 Tetrad Demo: Online vs. Lecture
Data file: corrs.dat

15 Estimate and Test: Results
Model fit: excellent.
Online students attended 10% fewer recitations.
Each recitation gives an increase of 2% on the final exam.
Online students did half a standard deviation better than lecture students (p = .059).
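As a sketch of the estimation step only, standardized least-squares coefficients for regressing Final on Pre, Online and Rec can be computed from the correlation matrix by solving R_xx b = r_xy. This regression specification is an assumption for illustration, not necessarily the model Tetrad selected, and standardized coefficients are not directly comparable to the results above, which are stated in percentage points and standard deviations of the final. The small Gaussian-elimination `solve` helper is included only to keep the sketch dependency-free.

```python
# Correlations among the predictors (Pre, Online, Rec), from the slide's matrix
R_xx = [
    [1.0, 0.023, -0.004],
    [0.023, 1.0, -0.255],
    [-0.004, -0.255, 1.0],
]
# Correlation of each predictor with Final
r_xy = [0.287, 0.182, 0.297]


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


beta = solve(R_xx, r_xy)  # standardized coefficients for Pre, Online, Rec
r_squared = sum(b * c for b, c in zip(beta, r_xy))
for name, b in zip(["Pre", "Online", "Rec"], beta):
    print(f"{name:>6}: {b:.3f}")
print(f"R^2: {r_squared:.3f}")
```

With these inputs the coefficients come out at roughly .28 (Pre), .27 (Online) and .37 (Rec), with R² of about .24; the positive Online coefficient with Rec held fixed matches the direction of the slide's finding.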



Download ppt "1 Causal Data Mining Richard Scheines Dept. of Philosophy, Machine Learning, & Human-Computer Interaction Carnegie Mellon."

Similar presentations


Ads by Google