Causal Data Mining Richard Scheines

Presentation transcript:

Causal Data Mining Richard Scheines Dept. of Philosophy, Machine Learning, & Human-Computer Interaction Carnegie Mellon

Causal Graphs
Causal graph G = {V, E}. Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V.
[Figure: example causal graph; recoverable node label: Chicken Pox]
Notes:
1. We do not define causality, but will introduce axioms that connect probability to causality.
2. Many fields proceed without agreement on a definition: probability, "force" in mechanics, the interpretation of quantum mechanics, etc.
3. A number of different kinds of graphs represent probability distributions and independence; the advantage of directed graphs is that they also represent causal relations.
4. Several extensions will be introduced.

Causal Bayes Networks
The joint distribution factors according to the causal graph, i.e., with one factor for each X in V:
$$P(V) = \prod_{X \in V} P(X \mid \text{Immediate Causes of}(X))$$
P(S = 0) = .7
P(S = 1) = .3
P(YF = 0 | S = 0) = .99    P(LC = 0 | S = 0) = .95
P(YF = 1 | S = 0) = .01    P(LC = 1 | S = 0) = .05
P(YF = 0 | S = 1) = .20    P(LC = 0 | S = 1) = .80
P(YF = 1 | S = 1) = .80    P(LC = 1 | S = 1) = .20
P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
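The factorization on this slide can be checked numerically. The sketch below is illustrative only: it copies the conditional probabilities from the slide, builds the joint P(S, YF, LC) as the product P(S) P(YF | S) P(LC | S), and then compares P(LC = 1) with P(LC = 1 | YF = 1). The dictionary layout and the function name `joint` are choices made for this example, not part of the original talk.

```python
from itertools import product

# Conditional probability tables copied from the slide.
p_s = {0: 0.7, 1: 0.3}
# Keys are (child value, S value).
p_yf_given_s = {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.20, (1, 1): 0.80}
p_lc_given_s = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.80, (1, 1): 0.20}


def joint(s, yf, lc):
    """P(S, YF, LC) = P(S) * P(YF | S) * P(LC | S)."""
    return p_s[s] * p_yf_given_s[(yf, s)] * p_lc_given_s[(lc, s)]


# Full joint table over the eight value combinations.
table = {(s, yf, lc): joint(s, yf, lc) for s, yf, lc in product((0, 1), repeat=3)}
total = sum(table.values())

# Marginal and conditional probabilities obtained by summing the joint.
p_lc1 = sum(p for (s, yf, lc), p in table.items() if lc == 1)
p_yf1 = sum(p for (s, yf, lc), p in table.items() if yf == 1)
p_yf1_lc1 = sum(p for (s, yf, lc), p in table.items() if yf == 1 and lc == 1)
p_lc1_given_yf1 = p_yf1_lc1 / p_yf1

print(f"sum of joint       = {total:.6f}")
print(f"P(LC=1)            = {p_lc1:.4f}")
print(f"P(LC=1 | YF=1)     = {p_lc1_given_yf1:.4f}")
```

YF and LC have no edge between them, yet observing YF = 1 raises the probability of LC = 1, because both depend on S. Conditioning on S removes that dependence, which is what the factorization encodes.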

Structural Equation Models
Causal graph
Structural equations: one equation for each variable V in the graph: V = f(parents(V), error_V). For a linear SEM (linear regression), f is a linear function.
Statistical constraints: a joint distribution over the error terms.
Notes:
1. This is an example of a recursive structural equation model without correlated errors.
2. One can show that assuming independent errors guarantees the correctness of the probabilistic interpretation.
3. This represents both probability and causality.

Structural Equation Models
Causal graph and SEM graph (path diagram)
Equations:
$$\text{Education} = \varepsilon_{ed}$$
$$\text{Income} = \beta_1\,\text{Education} + \varepsilon_{Income}$$
$$\text{Longevity} = \beta_2\,\text{Education} + \varepsilon_{Longevity}$$
Statistical constraints: $(\varepsilon_{ed}, \varepsilon_{Income}, \varepsilon_{Longevity}) \sim N(0, \Sigma^2)$, with $\Sigma^2$ diagonal and no variance equal to zero.
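To make the structure of this example concrete, the following sketch simulates a linear SEM in which Education is a common cause of Income and Longevity, with independent Gaussian errors. The coefficient values (0.8 and 0.5), the unit error variances, the sample size, and the helper names `pearson` and `partial_corr` are all assumptions chosen for illustration; the slide does not give numerical values.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


random.seed(0)
n = 20000
b_income, b_longevity = 0.8, 0.5  # hypothetical coefficients

# One structural equation per variable, each with its own independent error.
education = [random.gauss(0, 1) for _ in range(n)]
income = [b_income * e + random.gauss(0, 1) for e in education]
longevity = [b_longevity * e + random.gauss(0, 1) for e in education]

r = pearson(income, longevity)
r_partial = partial_corr(income, longevity, education)

print(f"corr(Income, Longevity)             = {r:.3f}")
print(f"corr(Income, Longevity | Education) = {r_partial:.3f}")
```

Income and Longevity are correlated even though neither causes the other, and the partial correlation given Education is close to zero. Vanishing partial correlations of this kind are the statistical constraints that the independent-error assumption implies for the graph.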

Tetrad 4: Demo
www.phil.cmu.edu/projects/tetrad

Causal Datamining in Ed. Research
1. Collect raw data
2. Build meaningful variables
3. Constrain model space with background knowledge
4. Search for models
5. Estimate and test
6. Interpret

CSR Online
- Are online students learning as much?
- What features of online behavior matter?

CSR Online: Are online students learning as much?
Raw data: Pitt 2001, 87 students
- For everyone: pre-test, recitation attendance, final exam
- For online students (logged): voluntary question attempts, online quizzes, requests to print modules

CSR Online
Build meaningful variables:
- Online [0,1]
- Pre-test [%]
- Recitation Attendance [%]
- Final Exam [%]

CSR Online
Data: correlation matrix (corrs.dat, N=83), lower triangle

          Pre     Online   Rec     Final
Pre       1.0
Online    .023    1.0
Rec       -.004   -.255    1.0
Final     .287    .182     .297    1.0
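Searches over correlation data typically begin by asking which correlations are distinguishable from zero. The sketch below applies the standard Fisher z test to two entries of the matrix, .023 and .297, using N = 83 from the slide. It is a generic illustration of such a test, not a reproduction of what Tetrad computed in the demo; the function name `fisher_z_test` and its parameter `k` (the number of conditioning variables) are assumptions made for this example.

```python
import math


def fisher_z_test(r, n, k=0):
    """Two-sided Fisher z test of the hypothesis that a (partial) correlation is zero.

    r: sample correlation, n: sample size, k: number of conditioning variables.
    Returns the test statistic and its two-sided p-value under a standard normal.
    """
    z = math.atanh(r) * math.sqrt(n - k - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


n_students = 83
z_small, p_small = fisher_z_test(0.023, n_students)
z_large, p_large = fisher_z_test(0.297, n_students)

print(f"r = .023: z = {z_small:.3f}, p = {p_small:.3f}")
print(f"r = .297: z = {z_large:.3f}, p = {p_large:.4f}")
```

At this sample size a correlation of .023 is statistically indistinguishable from zero, while .297 is not. A constraint-based search would remove an edge in the first case and keep it in the second, subject to further conditional tests.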

CSR Online
Background knowledge: temporal tiers: {Online, Pre} → Rec → Final

CSR Online
Model search:
No latents (patterns, with PC or GES search)
- no time order: 729 models
- temporal tiers: 96 models
With latents (PAGs, with FCI search)
- no time order: 4,096 models
- temporal tiers: 2,916 models
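Three of the four counts on this slide can be reproduced with a simple per-pair tally. The sketch below is one reading, offered as an assumption rather than the author's stated derivation: each of the six variable pairs independently takes one of a fixed number of edge states (3 without latents: no edge, forward, backward; 4 with latents, adding one more edge type), and a temporal tier ordering removes the backward option for pairs in different tiers. The tier assignment places Online and Pre together in the earliest tier. This tally ignores acyclicity, and it is not claimed to reproduce the 2,916 figure.

```python
from itertools import combinations

variables = ["Online", "Pre", "Rec", "Final"]
# Assumed tier assignment: Online and Pre first, then Rec, then Final.
tier = {"Online": 0, "Pre": 0, "Rec": 1, "Final": 2}
pairs = list(combinations(variables, 2))  # six unordered pairs


def count_models(states_free, states_time_ordered, use_tiers):
    """Multiply the number of allowed edge states over all variable pairs.

    states_free: states for a pair with no known time order.
    states_time_ordered: states for a pair whose time order is fixed by the tiers.
    """
    total = 1
    for a, b in pairs:
        if use_tiers and tier[a] != tier[b]:
            total *= states_time_ordered
        else:
            total *= states_free
    return total


no_latents_free = count_models(3, 2, use_tiers=False)
no_latents_tiers = count_models(3, 2, use_tiers=True)
latents_free = count_models(4, 3, use_tiers=False)

print("no latents, no time order:", no_latents_free)
print("no latents, temporal tiers:", no_latents_tiers)
print("latents, no time order:", latents_free)
```

Under this reading, background knowledge about time order shrinks the no-latent search space from 729 to 96 candidates, which is the point of constraining the model space before searching.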

Tetrad Demo Online vs. Lecture Data file: corrs.dat

Estimate and Test: Results
- Model fit is excellent.
- Online students attended 10% fewer recitations.
- Each recitation gives an increase of 2% on the final exam.
- Online students did half a standard deviation better than lecture students (p = .059).

References
- Scheines, R. (1997). "An Introduction to Causal Inference," in Causality in Crisis?, V. McKim and S. Turner (eds.), Univ. of Notre Dame Press, pp. 185-200.
- Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, 2nd Edition, MIT Press.
- Pearl, J. (2000). Causality: Models, Reasoning, and Inference, Cambridge Univ. Press.
- Spirtes, P., Scheines, R., Glymour, C., Richardson, T., and Meek, C. (2004). "Causal Inference," in Handbook of Quantitative Methodology in the Social Sciences, ed. David Kaplan, Sage Publications, pp. 447-478.
- Glymour, C., and Cooper, G. (eds.) (1999). Computation, Causation, & Discovery, MIT Press.