Searching for Graphical Causal Models of Education Data

Presentation transcript:

Searching for Graphical Causal Models of Education Data Richard Scheines Carnegie Mellon University

Outline Causal Model Learning Non-experimental studies: learning causes Experimental studies: learning mechanisms

Causal Learning is Harder than Prediction
Prediction: Data(X,Y) --> Statistical Estimation / Machine Learning --> P(Y,X) --> P(Y | X)
Causal learning: Data(X,Y) --> Causal Structure Learning Algorithm --> Equivalence Class of Causal Structure(s) --> (maybe) P(Y | X set)

Statistical Estimation/Prediction
Data(X,Y) --> Statistical Estimation / Machine Learning --> P(Y,X) --> P(Y | X)
Markov Blanket for Y: MB(Y) ⊆ {X}
Y _||_ X | MB(Y), i.e., P(Y | X) = P(Y | MB(Y))
Estimate P(Y | X) with P(Y | MB(Y))
No confounding: MB(Y) = Parents of Y & Children of Y & Parents of Children of Y
(Figure: example graph over Z1, Z2, X1, X2, X3, X4, Y)
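The no-confounding definition of the Markov blanket can be sketched in a few lines. This is a minimal illustration only: the DAG below is hypothetical (it is not the graph drawn on the slide), and the `node -> parents` dictionary is just one convenient encoding.

```python
def markov_blanket(parents, target):
    """Parents, children, and parents of children of `target` in a DAG.

    `parents` maps every node to the list of its direct causes.
    """
    pa = set(parents.get(target, ()))
    children = {v for v, ps in parents.items() if target in ps}
    # "Spouses": other parents of the target's children.
    spouses = {p for c in children for p in parents[c]} - {target}
    return pa | children | spouses


# Hypothetical DAG (an assumption for illustration): X1 -> Y -> X2 <- X3 -> X4 -> Z2
dag = {
    "Z1": [],
    "X1": [],
    "X3": [],
    "Y": ["X1"],
    "X2": ["Y", "X3"],
    "X4": ["X3"],
    "Z2": ["X4"],
}

print(sorted(markov_blanket(dag, "Y")))
```

In this toy graph X3 enters the blanket only as a parent of Y's child X2, which is why a variable can be predictive of Y without being a cause of Y.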

Causation := Causal Bayes Networks
Causal Graph
The Joint Distribution Factors According to the Causal Graph, i.e., for all X in V: P(V) = ∏ P(X | Immediate Causes of(X))
Example: P(S, YF, L) = P(S) P(YF | S) P(L | S)
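As a small worked instance of the factorization, the joint over three binary variables can be built from one table per variable given its immediate causes. The probabilities below are invented for illustration and do not come from the talk.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
P_S = {1: 0.3, 0: 0.7}                                      # P(S)
P_YF_GIVEN_S = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}   # P(YF | S), indexed [s][yf]
P_L_GIVEN_S = {1: {1: 0.2, 0: 0.8}, 0: {1: 0.02, 0: 0.98}}  # P(L | S), indexed [s][l]


def joint(s, yf, l):
    """P(S, YF, L) as the product of each variable given its immediate causes."""
    return P_S[s] * P_YF_GIVEN_S[s][yf] * P_L_GIVEN_S[s][l]


# The factored joint is a proper distribution: it sums to 1 over all 8 cells.
total = sum(joint(s, yf, l) for s, yf, l in product((0, 1), repeat=3))
print(total)
```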

Interventions
Pre-manipulation: P(YF, S, L) = P(S) P(YF | S) P(L | S)
Replace pre-manipulation causes with manipulation
Post-manipulation: P_m(YF, S, L) = P(S) P(YF | Manip) P(L | S)
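The difference between conditioning and manipulating can be checked numerically. The sketch below assumes binary variables, invented probabilities, and a manipulation that forces YF = 1; none of these numbers are from the talk.

```python
# Hypothetical tables (illustrative numbers only).
P_S = {1: 0.3, 0: 0.7}                                      # P(S)
P_YF_GIVEN_S = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}   # P(YF | S)
P_L_GIVEN_S = {1: {1: 0.2, 0: 0.8}, 0: {1: 0.02, 0: 0.98}}  # P(L | S)


def joint(s, yf, l, manipulated=False):
    """Joint probability; under manipulation, P(YF | S) is replaced by P(YF | Manip)."""
    if manipulated:
        p_yf = 1.0 if yf == 1 else 0.0   # assumed manipulation: force YF = 1
    else:
        p_yf = P_YF_GIVEN_S[s][yf]
    return P_S[s] * p_yf * P_L_GIVEN_S[s][l]


def prob_l_given_yf1(manipulated):
    """P(L = 1 | YF = 1) in the pre- or post-manipulation distribution."""
    num = sum(joint(s, 1, 1, manipulated) for s in (0, 1))
    den = sum(joint(s, 1, l, manipulated) for s in (0, 1) for l in (0, 1))
    return num / den


print(prob_l_given_yf1(manipulated=False))  # observing YF = 1
print(prob_l_given_yf1(manipulated=True))   # setting YF = 1
```

Observing YF = 1 raises the probability of L because it is evidence about S; setting YF = 1 leaves it at the marginal P(L = 1), since the manipulation cuts the S --> YF dependence.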

Causal Learning
Statistical Data --> Causal Structure, via Statistical Inference and Background Knowledge (e.g., X2 prior in time to X3)

Causal BN Search Methods
Constraint Based Searches
  TETRAD (SGS, PC, FCI)
  Very fast – capable of handling >1,000 variables
  Pointwise, but not uniformly consistent
Scoring Searches
  Scores: BIC, AIC, etc.
  Search: Hill Climb, Simulated Annealing, etc.
  Difficult to extend to latent variable models
  Meek and Chickering: Greedy Equivalence Search (GES)
  Slower – ~100
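Constraint-based searches are built on repeated conditional independence tests. For linear Gaussian data a common choice is a Fisher z test on a partial correlation; the sketch below shows that one building block on simulated data. It is not Tetrad's implementation, and the data-generating model (Z a common cause of X and Y) is an assumption for illustration.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """Correlation of x and y after conditioning on a single variable z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: correlation is 0, with k conditioning variables."""
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Simulated data (assumed model): X <- Z -> Y, so X _||_ Y | Z but not marginally.
rng = random.Random(0)
n = 2000
Z = [rng.gauss(0, 1) for _ in range(n)]
X = [z + rng.gauss(0, 1) for z in Z]
Y = [z + rng.gauss(0, 1) for z in Z]

p_marginal = fisher_z_pvalue(corr(X, Y), n, k=0)
p_given_z = fisher_z_pvalue(partial_corr(X, Y, Z), n, k=1)
print(p_marginal, p_given_z)
```

A search such as PC would remove the X–Y adjacency once a conditioning set is found for which independence is not rejected.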

Equivalence Classes of Causal Structures: Patterns & PAGs
Patterns (Verma and Pearl, 1990): graphical representation of a d-separation (Markov) equivalence class, with no latent variables.
PAGs (Richardson, 1994): graphical representation of an equivalence class, including latent variable models and sample selection bias, of structures that are d-separation equivalent over a set of measured variables X.

Patterns
1. represents a set of conditional independence and distribution equivalent graphs
2. same adjacencies
3. undirected edges mean some contain the edge one way, some contain it the other way
4. directed edge means they all go the same way
5. Pearl and Verma; complete rules for generating from Meek; Andersson, Perlman, and Madigan; and Chickering
6. instance of chain graph
7. since data can’t distinguish, in the absence of background knowledge it is the right output for search
8. what are they good for?
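Points 2 to 4 reflect the standard characterization due to Verma and Pearl: with no latent variables, two DAGs are d-separation equivalent exactly when they share adjacencies and unshielded colliders. A minimal sketch of that check, assuming both inputs are DAGs over the same nodes, given as lists of (cause, effect) pairs:

```python
from itertools import combinations


def skeleton(edges):
    """Adjacencies of a DAG, ignoring edge direction."""
    return {frozenset(e) for e in edges}


def unshielded_colliders(edges):
    """Triples (a, c, b) with a -> c <- b and a, b not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """True if the two DAGs belong to the same pattern."""
    return (skeleton(edges1) == skeleton(edges2)
            and unshielded_colliders(edges1) == unshielded_colliders(edges2))


chain = [("X", "Y"), ("Y", "Z")]      # X -> Y -> Z
fork = [("Y", "X"), ("Y", "Z")]       # X <- Y -> Z
collider = [("X", "Y"), ("Z", "Y")]   # X -> Y <- Z

print(markov_equivalent(chain, fork))      # same pattern: X - Y - Z
print(markov_equivalent(chain, collider))  # different patterns
```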

A PAG and its Super-DAGs
(Figure: a PAG over Z, X1, X2, Y and the DAGs it represents)

Regression Example Markov Blanket for Y: MB(Y) = {X1, X2, X3} Truth PAG X1 ?? a cause ?? of Y X2 not a cause of Y X3 not a cause of Y X1 is a cause of Y X2 not a cause of Y X3 not a cause of Y X2 and X3 : latent common cause X3 and Y : latent common cause

Outline Causal Model Learning Non-experimental studies: learning causes Experimental studies: learning mechanisms

Full Semester Online Course in Causal & Statistical Reasoning

Data
2000: Online vs. Lecture, UCSD
  Winter (N = 180)
  Spring (N = 120)
2001: Online vs. Lecture, Pitt & UCSD
  UCSD - winter (N = 190)
  Pitt (N = 80)
  UCSD - spring (N = 110)

Variables Recitation attendance (%) Online Lecture attendance (%) Gender Printing Voluntary Exercise Completion Computer comfort Etc. (9 others) Online Pre-test Midterm Final Exam

Online students only: 2002 --> 2003

Outline Causal Model Learning Non-experimental studies: learning causes Experimental studies: learning mechanisms

Learning the Mechanisms of Teaching Strategies
Variables: Teaching Condition, Pre-test, M1, M2, ..., Mn, Post-test
Mechanisms: more helpful and inviting hints, increased time on task, lower irrelevant cognitive load
If variables are appropriate: Condition _||_ Post-test | {Pre-test, M1, M2, ..., Mn}
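The independence claim can be read as a d-separation statement about the causal graph. The sketch below checks it with the usual ancestral moral graph construction, on a hypothetical graph with two mechanism variables; the edges are assumptions for illustration, not the models fitted in the talk.

```python
def ancestors(parents, nodes):
    """All nodes in `nodes` together with their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def d_separated(parents, x, y, given):
    """True if x and y are d-separated by `given` in the DAG `parents`."""
    keep = ancestors(parents, {x, y} | set(given))
    # Moralize the ancestral subgraph: link each node to its parents
    # and link every pair of parents that share a child.
    nbrs = {v: set() for v in keep}
    for v in keep:
        ps = list(parents.get(v, ()))
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # x and y are d-separated iff removing `given` disconnects them.
    blocked, seen, stack = set(given), {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in nbrs[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return True


# Hypothetical graph: the condition affects the post-test only through M1 and M2.
mediated = {
    "Condition": [],
    "Pre-test": [],
    "M1": ["Condition", "Pre-test"],
    "M2": ["Condition", "Pre-test"],
    "Post-test": ["Pre-test", "M1", "M2"],
}
# Same graph plus a direct effect not carried by any measured mechanism.
direct = dict(mediated, **{"Post-test": ["Pre-test", "M1", "M2", "Condition"]})

print(d_separated(mediated, "Condition", "Post-test", {"Pre-test", "M1", "M2"}))
print(d_separated(mediated, "Condition", "Post-test", {"Pre-test", "M1"}))
print(d_separated(direct, "Condition", "Post-test", {"Pre-test", "M1", "M2"}))
```

The second call drops M2 from the conditioning set, which corresponds to a mechanism that was not measured: the independence then fails even though the condition has no direct effect.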

Elida Laski, PhD Thesis (CMU PIER/Psychology): Internal and External Influences on Learning: A Microgenetic Analysis of the Acquisition of Numerical Knowledge from Board Games
Model fit: df = 30, p = .75

Entire effect of condition through learning in training
Effect maximal at the beginning of training
Effect of condition on posttest linearity through encoding important

Model Search: Fractions Tutor Log Data
Error-rate mediates negative effect
Positive direct effects
No mechanism through hints or time
χ² = 6.89, df = 10, p = .74
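The reported fit statistic can be checked by hand. For an even number of degrees of freedom the chi-square upper tail has a closed form, so the standard library is enough; the helper name below is hypothetical. In this kind of model fit test a large p-value means the data do not reject the model.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df >= x), valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1.
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p = chi2_sf_even_df(6.89, 10)
print(round(p, 2))  # 0.74, matching the value on the slide
```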

Learning the Mechanisms of Teaching Strategies: Ongoing/Future Work
Variables: Teaching Condition, Pre-test, Logged Student Events, Learned Features, Post-test
If Learned Features and Pre-test are appropriate: Condition _||_ Post-test | {Pre-test, Learned Features}

Software
Tetrad: www.phil.cmu.edu/projects/tetrad
Thanks

Example 3: Log Data What Drives Learning in a Cognitive Tutor? "Assistance" studies: what is "gaming" – is it always bad?

Example 3: Log Data
Micro-Learning Study on Assistance in a Cognitive Tutor
Hints:
Level 1: General (your goal is X, try strategy Y)
Level 2: Mid-level (compare angle A-B-C to X-Y-Z)
Level 3: Answer (the angle is 45°)

Example 3: Log Data
Micro-Learning Study on Assistance
Time-indexed sequences (time runs left to right):
1) Hint L1, Answer attempt, Hint L2, Next Problem
2) Hint L1, Hint L3, Hint L2, Answer attempt, Next Problem
3) Hint L1, Hint L3, Answer attempt, Hint L2, Next Problem
4) Hint L1, Hint L3, Hint L2, Answer attempt, Next Problem

Example 3: Log Data
Micro-Learning Study on Assistance in a Cognitive Tutor
Results:
Bottom-out Hint frequency not correlated with learning
Bottom-out Hints with reflection time = self-requested worked examples; frequency highly correlated with learning gain (r ≈ .6)