Educational Data Mining: Discovery with Models
Ryan S.J.d. Baker, PSLC/HCII, Carnegie Mellon University
Ken Koedinger, CMU Director of PSLC, Professor of Human-Computer Interaction & Psychology, Carnegie Mellon University

In this segment… We will discuss Discovery with Models in (some) detail

Last time… We gave a very simple example of Discovery with Models using Bayesian Knowledge Tracing

Uses of Knowledge Tracing The fitted parameters can be interpreted to learn about skills

Skills from the Algebra Tutor [Table: per-skill Bayesian Knowledge Tracing parameters, with columns skill, L0 (initial knowledge), and T (learning rate)]

Which skills could probably be removed from the tutor? [Same per-skill parameter table]

Which skills could use better instruction? [Same per-skill parameter table]

Why do Discovery with Models? We have a model of some construct of interest or importance:
- Knowledge
- Meta-Cognition
- Motivation
- Affect
- Collaborative Behavior (helping acts, insults)
- Etc.

Why do Discovery with Models? We can now use that model to:
- Find outliers of interest, by finding out where the model makes extreme predictions
- Inspect the model, to learn what factors are involved in predicting the construct
- Find out the construct’s relationship to other constructs of interest, by studying its correlations/associations/causal relationships with data/models on the other constructs
- Study the construct across contexts or students, by applying the model within data from those contexts or students
- And more…

Finding Outliers of Interest Finding outliers of interest by finding out where the model makes extreme predictions:
- As in the example from Bayesian Knowledge Tracing
- As in Ken’s example yesterday of finding upward spikes in learning curves

Model Inspection By looking at the features in the gaming detector, Baker, Corbett, & Koedinger (2004, in press) were able to see that:
- Students who game the system and have poor learning game the system on steps they don’t know
- Students who game the system and have good learning game the system on steps they already know

Model Inspection: A tip The simpler the model, the easier this is to do. Decision trees and linear/stepwise regression: easy. Neural networks and support vector machines: fuhgeddaboudit!
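
As a concrete illustration, here is a minimal sketch of what inspection can look like, assuming a scikit-learn workflow with hypothetical feature names and synthetic data; the detectors discussed in this talk were not necessarily built this way:

```python
# A minimal sketch of model inspection. Feature names and data are
# hypothetical; this is not the original gaming detector.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["hint_requests", "response_time", "errors"]  # hypothetical
X = rng.random((200, 3))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)  # synthetic "gaming" label

# A decision tree can be printed and read directly
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=feature_names))

# A linear model exposes one coefficient per feature
lr = LogisticRegression().fit(X, y)
for name, coef in zip(feature_names, lr.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```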

Correlations to Other Constructs

Take a Model of a Construct And see whether it co-occurs with other constructs of interest

Example A detector of gaming the system (in the fashion associated with poorer learning) was correlated with questionnaire items assessing various motivations and attitudes (Baker et al., 2008). Surprise: nothing correlated very well (correlations between gaming and some attitudes were statistically significant, but very weak: r < 0.2)
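
A minimal sketch of this kind of analysis, assuming one detector estimate and one score per questionnaire item per student (all names and data below are hypothetical):

```python
# A minimal sketch of correlating detector output with questionnaire items.
# Variable names and data are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
gaming_frequency = rng.random(100)  # detector output, one value per student
questionnaire = {
    "likes_math": rng.random(100),
    "performance_goals": rng.random(100),
}

for item, scores in questionnaire.items():
    r, p = pearsonr(gaming_frequency, scores)
    print(f"{item}: r = {r:+.2f}, p = {p:.3f}")
```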

Example More on this in a minute…

Studying a Construct Across Contexts Often, but not always, involves:

Model Transfer

Richard said that prediction assumes that the sample where the predictions are made is “the same as” the sample where the prediction model was built. Not entirely true

Model Transfer It’s more that prediction assumes the differences “aren’t important”. So how do we know that’s the case?

Model Transfer You can use a classifier in contexts beyond where it was trained, with proper validation. This can be really nice:
- you may only have to train on data from 100 students and 4 lessons
- and then you can use your classifier in cases where there is data from 1000 students and 35 lessons
Especially nice if you have some unlabeled data set with nice properties:
- Additional data such as questionnaire data (cf. Baker, 2007; Baker, Walonoski, Heffernan, Roll, Corbett, & Koedinger, 2008)

Validate the Transfer You should make sure your model is valid in the new context (cf. Roll et al., 2005; Baker et al., 2006). Depending on the type of model, and what features go into it, your model may or may not be valid for data taken:
- From a different system
- In a different context of use
- With a different population

Validate the Transfer For example Will an off-task detector trained in schools work in dorm rooms?

Validate the Transfer For example Will a gaming detector trained in a tutor where {gaming = systematic guessing, hint abuse} work in a tutor where {gaming = point cartels}?

Validate the Transfer However Will a gaming detector trained in a tutor unit where {gaming=systematic guessing, hint abuse} Work in a different tutor unit where {gaming=systematic guessing, hint abuse}?

Maybe…

Baker, Corbett, Koedinger, & Roll (2006) We tested whether A gaming detector trained in a tutor unit where {gaming=systematic guessing, hint abuse} Would work in a different tutor unit where {gaming=systematic guessing, hint abuse}

Scheme Train on data from three lessons, test on a fourth lesson For all possible combinations of 4 lessons (4 combinations)

Transfer lessons vs. training lessons Ability to distinguish students who game from non-gaming students:
Overall performance in training lessons: A’ = 0.85
Overall performance in test lessons: A’ = 0.80
Difference is NOT significant, Z = 1.17, p = 0.24 (using Strube’s Adjusted Z)
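
A minimal sketch of the train-on-three, test-on-one scheme, assuming scikit-learn and synthetic per-lesson data; A’ is computed here as AUC, which is equivalent for two classes, though this is not necessarily how the original analysis was implemented:

```python
# A minimal sketch of leave-one-lesson-out validation: train on three
# lessons, test on the held-out fourth. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Hypothetical per-lesson data: (features, gaming labels) per lesson
lessons = {name: (rng.random((80, 5)), rng.integers(0, 2, 80))
           for name in "ABCD"}

for held_out in lessons:
    train = [name for name in lessons if name != held_out]
    X_train = np.vstack([lessons[name][0] for name in train])
    y_train = np.concatenate([lessons[name][1] for name in train])
    X_test, y_test = lessons[held_out]

    model = LogisticRegression().fit(X_train, y_train)
    a_prime = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"held-out lesson {held_out}: A' = {a_prime:.2f}")
```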

So transfer is possible… Of course 4 successes over 4 lessons from the same tutor isn’t enough to conclude that any model trained on 3 lessons will transfer to any new lesson

What we can say is…

If… If we posit that these four cases are “successful transfer”, and assume they were randomly sampled from lessons in the middle school tutor…

Maximum Likelihood Estimation

Studying a Construct Across Contexts Using this detector (Baker, 2007)

Research Question Do students game the system because of state or trait factors?
- If trait factors are the main explanation, differences between students will explain much of the variance in gaming
- If state factors are the main explanation, differences between lessons could account for many (but not all) state factors, and explain much of the variance in gaming
So: is the student or the lesson a better predictor of gaming?

Application of Detector After validating its transfer We applied the gaming detector across 35 lessons, used by 240 students, from a single Cognitive Tutor Giving us, for each student in each lesson, a gaming frequency

Model Linear regression models:
Gaming frequency = Lesson + α0
Gaming frequency = Student + α0
(where α0 is an intercept term)

Model Categorical variables transformed to a set of binaries, i.e. Lesson = Scatterplot becomes:
3DGeometry = 0
Percents = 0
Probability = 0
Scatterplot = 1
Boxplot = 0
Etc…
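
A minimal sketch of this dummy coding and the two regressions, assuming pandas and statsmodels, with hypothetical data:

```python
# A minimal sketch of dummy-coding a categorical predictor and fitting
# the two models. Column names and values are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "gaming_frequency": [0.05, 0.20, 0.01, 0.12, 0.08, 0.15],
    "lesson": ["Scatterplot", "Percents", "Scatterplot",
               "Probability", "Percents", "Probability"],
    "student": ["s1", "s2", "s1", "s3", "s2", "s3"],
})

# C(...) expands each categorical variable into 0/1 dummy columns,
# with one level absorbed into the intercept
lesson_model = smf.ols("gaming_frequency ~ C(lesson)", data=df).fit()
student_model = smf.ols("gaming_frequency ~ C(student)", data=df).fit()
print(lesson_model.rsquared, student_model.rsquared)
```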

Metrics

r² The correlation, squared. The proportion of variability in the data set that is accounted for by a statistical model.

r² However, a limitation: the more variables you have, the more variance you should expect to predict, just by chance

r² We should expect 240 students to predict gaming better than 35 lessons, just by overfitting

So what can we do?

Our good friend BiC Bayesian Information Criterion (Raftery, 1995) Makes a trade-off between goodness of fit and flexibility of fit (number of parameters)
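
A minimal sketch, assuming the regression formulation of BiC’ often attributed to Raftery (1995), BiC’ = n ln(1 - r²) + p ln(n) with p predictors; check the exact formula against the paper before relying on it:

```python
# A minimal sketch of BiC' for a regression model. The exact formula is
# an assumption based on Raftery (1995); verify before relying on it.
import math

def bic_prime(r2: float, n: int, p: int) -> float:
    """Negative values: better than chance given model and data set size."""
    return n * math.log(1 - r2) + p * math.log(n)

# Rough shape of the comparison in these slides; n (one observation per
# student-lesson pair) is illustrative, not the study's actual n.
print(bic_prime(r2=0.55, n=240 * 35, p=35))   # lesson model: strongly negative
print(bic_prime(r2=0.16, n=240 * 35, p=240))  # student model: positive
```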

Predictors

The Lesson
Gaming frequency = Lesson + α0
35 parameters
r² = 0.55
BiC’ = …
Model is significantly better than chance would predict given model size & data set size

The Student
Gaming frequency = Student + α0
240 parameters
r² = 0.16
BiC’ = 1382
Model is worse than chance would predict given model size & data set size!

[Chart omitted; standard deviation bars, not standard error bars]

In this talk… Discovery with Models to:
- Find outliers of interest by finding out where the model makes extreme predictions
- Inspect the model to learn what factors are involved in predicting the construct
- Find out the construct’s relationship to other constructs of interest, by studying its correlations/associations/causal relationships with data/models on the other constructs
- Study the construct across contexts or students, by applying the model within data from those contexts or students

Necessarily… Only a few examples given in this talk

An area of increasing importance within EDM…

In the last 3 days we have discussed (or at least mentioned) 5 broad areas of EDM:
- Prediction
- Clustering
- Relationship Mining
- Discovery with Models
- Distillation of Data for Human Judgment

Now it’s your turn To use these techniques to answer important questions about learners and learning To improve these techniques, moving forward

To learn more
Baker, R.S.J.d. (under review) Data Mining in Education. Under review for inclusion in the International Encyclopedia of Education. (Available upon request)
Baker, R.S.J.d., Barnes, T., Beck, J.E. (2008) Proceedings of the First International Conference on Educational Data Mining.
Romero, C., Ventura, S. (2007) Educational Data Mining: A Survey from 1995 to 2005. Expert Systems with Applications, 33(1), 135-146.

END

[Backup slides: table of values for cases a–k, comparing real data with random numbers]

[Backup slide: plot of r² against the number of variables]

r² Nine variables of random junk successfully got an r² of 1 on ten data points. And that’s what we call overfitting
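
A minimal sketch reproducing that demonstration with synthetic data:

```python
# A minimal sketch of the backup-slide demonstration: with 9 random
# predictors (plus an intercept) and only 10 data points, ordinary least
# squares fits an unrelated target exactly. Purely synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(10), rng.random((10, 9))])  # intercept + 9 junk vars
y = rng.random(10)                                       # unrelated target

beta, *_ = np.linalg.lstsq(X, y, rcond=None)             # 10 parameters, 10 points
y_hat = X @ beta
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
print(f"r^2 = {r2:.6f}")  # ~1.0: pure overfitting
```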