Searching for Graphical Causal Models of Education Data

Richard Scheines Carnegie Mellon University

Outline
- Causal model learning
- Non-experimental studies: learning causes
- Experimental studies: learning mechanisms

Causal Learning is Harder than Prediction
[Diagram labels: Data(X,Y); Causal Structure Learning Algorithm; Statistical Estimation / Machine Learning; Equivalence Class of Causal Structure(s); P(Y,X); P(Y | Xset); P(Y | X); Maybe]

Prediction
Markov Blanket for Y: MB(Y) ⊆ {X}
Y _||_ X | MB(Y), i.e., P(Y | X) = P(Y | MB(Y))
Estimate P(Y | X) with P(Y | MB(Y))
No confounding: MB(Y) = Parents of Y & Children of Y & Parents of Children of Y
[Diagram labels: Data(X,Y); Statistical Estimation / Machine Learning; P(Y,X); P(Y | X); graph over Z1, X1, Y, X2, X3, X4, Z2]
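The Markov blanket definition (parents, children, and parents of children) can be sketched in a few lines. This is a minimal illustration only: the `markov_blanket` helper and the example graph are hypothetical, and the graph is not the one drawn on the slide.

```python
def markov_blanket(parents, target):
    """Return parents, children, and parents of children of `target`.

    `parents` maps each node of a DAG to the set of its direct causes.
    """
    children = {node for node, pa in parents.items() if target in pa}
    spouses = {p for child in children for p in parents[child]}
    return (set(parents[target]) | children | spouses) - {target}


# Hypothetical DAG, chosen only to exercise all three parts of the blanket.
dag = {
    "Z1": set(),
    "X1": {"Z1"},        # Z1 -> X1
    "X2": set(),
    "Y": {"X1"},         # X1 -> Y
    "X3": {"Y", "X2"},   # Y -> X3 <- X2
    "X4": {"X3"},        # X3 -> X4
}

mb = markov_blanket(dag, "Y")
print(sorted(mb))
```

In this assumed graph, Z1 and X4 fall outside the blanket: once X1, X2, and X3 are known, they carry no further information about Y.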

Causation := Causal Bayes Networks
Causal Graph: P(S, YF, L) = P(S) P(YF | S) P(L | S)
The joint distribution factors according to the causal graph, i.e., for all X in V:
P(V) = ∏ over X in V of P(X | Immediate Causes of(X))

Interventions
Pre-manipulation: P(YF, S, L) = P(S) P(YF | S) P(L | S)
Replace pre-manipulation causes with manipulation.
Post-manipulation: P(YF, S, L)m = P(S) P(YF | Manip) P(L | S)
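A small numeric sketch may help separate conditioning on YF from manipulating YF. All probabilities below are hypothetical, chosen only for illustration, and the variables are assumed to be binary; only the two factorizations come from the slides.

```python
from itertools import product

# Hypothetical conditional probability tables (binary variables).
p_s = {1: 0.3, 0: 0.7}                           # P(S)
p_yf_given_s = {1: {1: 0.8, 0: 0.2},             # P(YF | S=1)
                0: {1: 0.1, 0: 0.9}}             # P(YF | S=0)
p_l_given_s = {1: {1: 0.2, 0: 0.8},              # P(L | S=1)
               0: {1: 0.02, 0: 0.98}}            # P(L | S=0)


def joint(s, yf, l):
    """Pre-manipulation joint: P(S) P(YF | S) P(L | S)."""
    return p_s[s] * p_yf_given_s[s][yf] * p_l_given_s[s][l]


def joint_manip(s, yf, l, yf_set=1):
    """Post-manipulation joint: P(S) P(YF | Manip) P(L | S), YF forced to yf_set."""
    p_yf = 1.0 if yf == yf_set else 0.0
    return p_s[s] * p_yf * p_l_given_s[s][l]


def prob_l(dist, yf=None):
    """P(L=1), or P(L=1 | YF=yf) when yf is given, under the joint `dist`."""
    states = [(s, y, l) for s, y, l in product((0, 1), repeat=3)
              if yf is None or y == yf]
    numerator = sum(dist(s, y, l) for s, y, l in states if l == 1)
    denominator = sum(dist(s, y, l) for s, y, l in states)
    return numerator / denominator


baseline = prob_l(joint)                   # P(L=1) with no information on YF
observed = prob_l(joint, yf=1)             # seeing YF=1
manipulated = prob_l(joint_manip, yf=1)    # setting YF=1
print(baseline, observed, manipulated)
```

Under these assumed numbers, observing YF=1 raises the probability of L because it is evidence about S, while setting YF=1 leaves it at the baseline, since the manipulation cuts the S to YF dependence.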

Causal Learning
Statistical Data → Causal Structure, via:
- Statistical Inference
- Background Knowledge, e.g., X2 prior in time to X3

Causal BN Search Methods
Constraint Based Searches
- TETRAD (SGS, PC, FCI)
- Very fast: capable of handling >1,000 variables
- Pointwise, but not uniformly consistent
Scoring Searches
- Scores: BIC, AIC, etc.
- Search: Hill Climb, Simulated Annealing, etc.
- Difficult to extend to latent variable models
- Meek and Chickering: Greedy Equivalence Search (GES)
- Slower: ~100
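To make the constraint-based idea concrete, here is a toy sketch, not TETRAD's implementation. It removes an edge whenever some conditioning set separates the pair (an exhaustive search over subsets, in the spirit of SGS), then marks colliders using the recorded separating set (the PC-style rule). The independence oracle is a hypothetical hard-coded table for the graph X -> Z <- Y, Z -> W; in practice it would be replaced by statistical tests on data.

```python
from itertools import combinations

# Hypothetical oracle: each entry says the pair is independent given the set.
INDEPENDENCIES = {
    (frozenset({"X", "Y"}), frozenset()),
    (frozenset({"X", "W"}), frozenset({"Z"})),
    (frozenset({"X", "W"}), frozenset({"Z", "Y"})),
    (frozenset({"Y", "W"}), frozenset({"Z"})),
    (frozenset({"Y", "W"}), frozenset({"Z", "X"})),
}


def independent(a, b, cond):
    return (frozenset({a, b}), frozenset(cond)) in INDEPENDENCIES


def search_skeleton(variables):
    """Start from the complete graph; drop a - b if some set separates them."""
    edges = {frozenset(pair) for pair in combinations(variables, 2)}
    sepset = {}
    for a, b in combinations(variables, 2):
        others = [v for v in variables if v not in (a, b)]
        for size in range(len(others) + 1):
            found = next((set(c) for c in combinations(others, size)
                          if independent(a, b, c)), None)
            if found is not None:
                edges.discard(frozenset({a, b}))
                sepset[frozenset({a, b})] = found
                break
    return edges, sepset


def find_colliders(variables, edges, sepset):
    """Mark a -> c <- b when a, b are non-adjacent and c is not in their sepset."""
    colliders = set()
    for a, b in combinations(variables, 2):
        if frozenset({a, b}) in edges:
            continue
        for c in variables:
            if (c not in (a, b)
                    and frozenset({a, c}) in edges
                    and frozenset({b, c}) in edges
                    and c not in sepset[frozenset({a, b})]):
                colliders.add((a, c, b))
    return colliders


variables = ["X", "Y", "Z", "W"]
edges, sepset = search_skeleton(variables)
colliders = find_colliders(variables, edges, sepset)
print(sorted(tuple(sorted(e)) for e in edges), colliders)
```

The exhaustive subset search is exponential in the number of variables; it is used here only because it keeps the sketch short.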

Equivalence Classes of Causal Structures: Patterns & PAGs
Patterns (Verma and Pearl, 1990): graphical representation of a d-separation (Markov) equivalence class, with no latent variables.
PAGs (Richardson, 1994): graphical representation of an equivalence class, including latent variable models and sample selection bias, of structures that are d-separation equivalent over a set of measured variables X.

Patterns
1. Represent a set of conditional independence and distribution equivalent graphs
2. All graphs in the set have the same adjacencies
3. An undirected edge means some graphs contain the edge one way, some the other way
4. A directed edge means the edge goes the same way in all graphs
5. Due to Pearl and Verma; complete rules for generating them from Meek; Andersson, Perlman, and Madigan; and Chickering
6. An instance of a chain graph
7. Since data can't distinguish among the graphs, a pattern is the right output for search in the absence of background knowledge
8. What are they good for?
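Whether two DAGs without latent variables belong to the same pattern can be checked with the standard criterion: same adjacencies and same unshielded colliders. The sketch below applies it to three hypothetical three-variable graphs; the helper names are illustrative.

```python
from itertools import combinations


def skeleton(parents):
    """Undirected adjacencies of a DAG given as node -> set of parents."""
    return {frozenset({child, p}) for child, pa in parents.items() for p in pa}


def v_structures(parents):
    """Unshielded colliders a -> c <- b, with a and b non-adjacent."""
    adjacent = skeleton(parents)
    return {(frozenset({a, b}), c)
            for c, pa in parents.items()
            for a, b in combinations(sorted(pa), 2)
            if frozenset({a, b}) not in adjacent}


def markov_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)


chain = {"X": set(), "Z": {"X"}, "Y": {"Z"}}           # X -> Z -> Y
fork = {"Z": set(), "X": {"Z"}, "Y": {"Z"}}            # X <- Z -> Y
collider = {"X": set(), "Y": set(), "Z": {"X", "Y"}}   # X -> Z <- Y

print(markov_equivalent(chain, fork))      # same pattern: X - Z - Y
print(markov_equivalent(chain, collider))  # different patterns
```

The chain and the fork share the pattern X - Z - Y with both edges undirected, which is why data alone cannot choose between them; the collider forms its own class.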

A PAG and its Super-DAGs
[Figure: four graphs over Z, X1, Y, X2]

Regression Example
Markov Blanket for Y: MB(Y) = {X1, X2, X3}
[Figure: two graphs, labeled "Truth" and "PAG"]
- X1 ?? a cause ?? of Y; X2 not a cause of Y; X3 not a cause of Y
- X1 is a cause of Y; X2 not a cause of Y; X3 not a cause of Y
- X2 and X3: latent common cause
- X3 and Y: latent common cause

Full Semester Online Course in Causal & Statistical Reasoning

Data
2000: Online vs. Lecture, UCSD
- Winter (N = 180)
- Spring (N = 120)
2001: Online vs. Lecture, Pitt & UCSD
- UCSD, winter (N = 190)
- Pitt (N = 80)
- UCSD, spring (N = 110)

Variables Recitation attendance (%) Online Lecture attendance (%)
Gender Printing Voluntary Exercise Completion Computer comfort Etc. (9 others) Online Pre-test Midterm Final Exam

Online students only: 2002 --> 2003

Learning the Mechanisms of Teaching Strategies
[Diagram: Teaching Condition, Pre-test, M1, M2, ..., Mn, Post-test]
Mechanisms: more helpful and inviting hints, increased time on task, lower irrelevant cognitive load
If variables are appropriate: Condition _||_ Post-test | {Pre-test, M1, M2, ..., Mn}
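The screening-off claim can be illustrated on simulated data. The sketch below is a simplification under stated assumptions: a single mediator, no pre-test, linear effects with hypothetical coefficients, and a first-order partial correlation as the independence check. It is not the analysis used in the studies.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly controlling for a single z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulated chain Condition -> Mediator -> Post-test (hypothetical coefficients).
rng = random.Random(0)
n = 5000
condition = [rng.randint(0, 1) for _ in range(n)]
mediator = [c + rng.gauss(0, 1) for c in condition]
post_test = [m + rng.gauss(0, 1) for m in mediator]

marginal = pearson(condition, post_test)
given_mediator = partial_corr(condition, post_test, mediator)
print(round(marginal, 3), round(given_mediator, 3))
```

When the mediator captures the whole mechanism, the marginal correlation is clearly nonzero while the partial correlation is close to zero. A partial correlation that stays far from zero would suggest a path from condition to post-test that the measured mediators miss.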

Elida Laski, PhD Thesis (CMU PIER/Psychology): Internal and External Influences on Learning: A Microgenetic Analysis of the Acquisition of Numerical Knowledge from Board Games
df = 30, p = .75

Entire effect of condition through learning in training
Effect maximal at the beginning of training
Effect of condition on posttest linearity through encoding important

Model Search: Fractions Tutor Log Data
- Error-rate mediates negative effect
- Positive direct effects
- No mechanism through hints or time
χ² = 6.89, df = 10, p = .74
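The reported p-value can be checked from the chi-square statistic (6.89) and its degrees of freedom (10). The sketch below uses the closed-form upper-tail probability that holds for even degrees of freedom; the function name is illustrative, and for general df a statistics library would be the usual choice.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-square_df >= x) for even df.

    Uses sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


p_value = chi2_sf_even_df(6.89, 10)
print(round(p_value, 2))
```

The result agrees with the p = .74 on the slide. In model search a large p-value here means the data give no grounds to reject the model's implied constraints.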

Learning the Mechanisms of Teaching Strategies: Ongoing/Future Work
[Diagram: Teaching Condition, Pre-test, Logged Student Events, Learned Features, Post-test]
If Learned Features and Pre-test are appropriate: Condition _||_ Post-test | {Pre-test, Learned Features}

Thanks
Software: Tetrad, www.phil.cmu.edu/projects/tetrad

Example 3: Log Data What Drives Learning in a Cognitive Tutor?
"Assistance" studies: what is "gaming" – is it always bad?

Example 3: Log Data Micro-Learning Study on Assistance in a Cognitive Tutor
Hints:
- Level 1: General (your goal is X, try strategy Y)
- Level 2: Mid-level (compare angle A-B-C to X-Y-Z)
- Level 3: Answer (the angle is 45°)

Example 3: Log Data Micro-Learning Study on Assistance
Time-indexed sequences (events in time order):
1) Hint L1, Answer attempt, Hint L2, Next Problem
2) Hint L1, Hint L3, Hint L2, Answer attempt, Next Problem
3) Hint L1, Hint L3, Answer attempt, Hint L2, Next Problem
4) Hint L1, Hint L3, Hint L2, Answer attempt, Next Problem

Example 3: Log Data Micro-Learning Study on Assistance in a Cognitive Tutor
Results:
- Bottom-out Hint frequency not correlated with learning
- Bottom-out Hints with reflection time = self-requested worked examples; frequency highly correlated with learning gain (r ≈ .6)
