Educational Data Mining. Ryan S.J.d. Baker, PSLC/HCII, Carnegie Mellon University; Richard Scheines, Professor of Statistics, Machine Learning, and Human-Computer Interaction, Carnegie Mellon University; Ken Koedinger, CMU Director of PSLC, Professor of Human-Computer Interaction & Psychology, Carnegie Mellon University

In this segment… We will give a brief overview of classes of Educational Data Mining methods, discussing in detail: Causal Data Mining, an important Educational Data Mining method, and Bayesian Knowledge Tracing, one of the key building blocks of many Educational Data Mining analyses.

EDM Methods (Baker, under review): Prediction; Clustering; Relationship Mining; Discovery with Models; Distillation of Data for Human Judgment.

Coverage at EDM2008 (of 31 papers; not mutually exclusive): Prediction – 45%; Clustering – 6%; Relationship Mining – 19%; Discovery with Models – 13%; Distillation of Data for Human Judgment – 16%; None of the Above – 6%.

We will talk about three approaches now: 2 types of Prediction and 1 type of Relationship Mining. Tomorrow, 9:30am: Discovery with Models. Yesterday: some examples of Distillation of Data for Human Judgment.

Prediction Pretty much what it says A student is using a tutor right now. Is he gaming the system or not? (“attempting to succeed in an interactive learning environment by exploiting properties of the system rather than by learning the material”) A student has used the tutor for the last half hour. How likely is it that she knows the knowledge component in the next step? A student has completed three years of high school. What will be her score on the SAT-Math exam?

Two Key Types of Prediction This slide adapted from slide by Andrew W. Moore, Google

Classification There is something you want to predict (“the label”) The thing you want to predict is categorical  The answer is one of a set of categories, not a number  CORRECT/WRONG (sometimes expressed as 0,1)  HELP REQUEST/WORKED EXAMPLE REQUEST/ATTEMPT TO SOLVE  WILL DROP OUT/WON’T DROP OUT  WILL SELECT PROBLEM A,B,C,D,E,F, or G

Classification. Associated with each label are a set of "features", which maybe you can use to predict the label. (The slide shows an example data table with columns KnowledgeComp, pknow, time, totalactions, and right; rows for knowledge components such as ENTERINGGIVEN, USEDIFFNUM, and REMOVECOEFF, each labeled RIGHT or WRONG; the numeric feature values are omitted here.)

Classification. The basic idea of a classifier is to determine which features, in which combination, can predict the label. (Same example data table.)

Many algorithms you can use Decision Trees (e.g. C4.5, J48, etc.) Logistic Regression Etc, etc In your favorite Machine Learning package  WEKA  RapidMiner  KEEL
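Before reaching for one of the packages above, the classifier idea can be sketched in a few lines: a one-feature "decision stump", the simplest possible decision tree. This is only an illustration of the idea, not any of the tools named on the slide, and the feature values and labels below are invented.

```python
# A minimal sketch of classification: a one-feature decision stump
# that predicts the RIGHT/WRONG label from the pknow feature.
# The training data here is invented for illustration.

def train_stump(rows, feature, label):
    """Pick the threshold on `feature` that best separates the labels."""
    best = None
    for threshold in sorted(set(r[feature] for r in rows)):
        # Predict RIGHT when feature >= threshold, WRONG otherwise,
        # and count how many training rows that gets right.
        correct = sum(
            (r[feature] >= threshold) == (r[label] == "RIGHT") for r in rows
        )
        if best is None or correct > best[1]:
            best = (threshold, correct)
    return best[0]

training = [
    {"pknow": 0.2, "right": "WRONG"},
    {"pknow": 0.9, "right": "RIGHT"},
    {"pknow": 0.3, "right": "WRONG"},
    {"pknow": 0.8, "right": "RIGHT"},
]
threshold = train_stump(training, "pknow", "right")

def predict(pknow):
    return "RIGHT" if pknow >= threshold else "WRONG"
```

Real analyses would combine many features and use a full algorithm (C4.5/J48, logistic regression, etc.) in a package such as WEKA or RapidMiner; the structure, though, is the same: learn a rule over features that predicts the label.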

Regression There is something you want to predict (“the label”) The thing you want to predict is numerical  Number of hints student requests (0, 1, 2, 3...)  How long student takes to answer (4.7 s., 8.9 s., 88.2 s., 0.3 s.)  What will the student’s test score be (95%, 84%, 33%, 100%)

Regression. Associated with each label are a set of "features", which maybe you can use to predict the label. (The slide shows an example data table with columns KnowledgeComp, pknow, time, totalactions, and numhints; rows for knowledge components such as ENTERINGGIVEN, USEDIFFNUM, and REMOVECOEFF; the numeric values are omitted here.)

Regression. The basic idea of regression is to determine which features, in which combination, can predict the label's value. (Same example data table.)

Linear Regression. The most classic form of regression is linear regression, e.g. Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions
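A model of this linear form can be fit by ordinary least squares. Below is a minimal one-feature sketch with the closed-form solution; the (pknow, numhints) pairs are invented, and the slide's actual model used more features (Pknow, Time, Totalactions).

```python
# A minimal sketch of linear regression on one feature, using the
# closed-form least-squares solution for y = a + b*x.
# The data points are invented for illustration.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Toy pattern: students with higher estimated knowledge request fewer hints.
pknow = [0.1, 0.3, 0.5, 0.7, 0.9]
numhints = [4.0, 3.0, 2.0, 1.0, 0.0]
a, b = fit_line(pknow, numhints)
predicted = a + b * 0.5  # predicted hint count at pknow = 0.5
```

With several features the same idea generalizes to solving the normal equations (or calling any standard regression routine).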

Many more complex algorithms… Neural Networks Support Vector Machines Surprisingly, Linear Regression performs quite well in many cases despite being overly simple Particularly when you have a lot of data Which increasingly is not a problem in EDM…

Relationship Mining Richard Scheines will now talk about one type of relationship mining, Causal Data Mining

Bayesian Knowledge-Tracing The algorithm behind the skill bars … Being improved by Educational Data Mining Key in many EDM analyses and models

Bayesian Knowledge Tracing. Goal: For each knowledge component (KC), infer the student's knowledge state from performance. Suppose a student has six opportunities to apply a KC and makes the following sequence of correct (1) and incorrect (0) responses. Has the student learned the rule?

Model Learning Assumptions Two-state learning model  Each skill is either learned or unlearned In problem-solving, the student can learn a skill at each opportunity to apply the skill A student does not forget a skill, once he or she knows it Only one skill per action

Model Performance Assumptions If the student knows a skill, there is still some chance the student will slip and make a mistake. If the student does not know a skill, there is still some chance the student will guess correctly.

Corbett and Anderson’s Model. Two Learning Parameters: p(L0), the probability the skill is already known before the first opportunity to use the skill in problem solving; p(T), the probability the skill will be learned at each opportunity to use the skill. Two Performance Parameters: p(G), the probability the student will guess correctly if the skill is not known; p(S), the probability the student will slip (make a mistake) if the skill is known. (The slide's diagram shows two states, Not learned and Learned, with transition probability p(T); a correct response occurs with probability p(G) in the Not learned state and 1-p(S) in the Learned state, and the initial probability of the Learned state is p(L0).)

Bayesian Knowledge Tracing Whenever the student has an opportunity to use a skill, the probability that the student knows the skill is updated using formulas derived from Bayes’ Theorem.

Formulas
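The formulas on this slide are the standard Bayesian Knowledge Tracing update (Corbett & Anderson, 1995): condition P(L) on the observed response using Bayes' Theorem, then account for the chance of learning at this opportunity. A sketch in code, with invented parameter values for the trace:

```python
# The standard Bayesian Knowledge Tracing update (Corbett & Anderson, 1995).
# Parameter values in the trace below are invented for illustration.

def bkt_update(p_know, correct, p_guess, p_slip, p_transit):
    """Update P(L) after one observed response, then apply learning."""
    if correct:
        # P(student knew the skill | correct response)
        posterior = p_know * (1 - p_slip) / (
            p_know * (1 - p_slip) + (1 - p_know) * p_guess)
    else:
        # P(student knew the skill | incorrect response)
        posterior = p_know * p_slip / (
            p_know * p_slip + (1 - p_know) * (1 - p_guess))
    # Chance the skill was learned at this opportunity
    return posterior + (1 - posterior) * p_transit

# Trace a student over the response sequence wrong, wrong, right, right.
p = 0.4  # p(L0)
for correct in [False, False, True, True]:
    p = bkt_update(p, correct, p_guess=0.2, p_slip=0.1, p_transit=0.15)
```

After the two correct responses, P(L) has risen well above its starting value, which is exactly the behavior the skill bars reflect.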

Knowledge Tracing How do we know if a knowledge tracing model is any good? Our primary goal is to predict knowledge But knowledge is a latent trait But we can check those knowledge predictions by checking how well the model predicts performance

Fitting a Knowledge-Tracing Model In principle, any set of four parameters can be used by knowledge-tracing But parameters that predict student performance better are preferred

Knowledge Tracing So, we pick the knowledge tracing parameters that best predict performance Defined as whether a student’s action will be correct or wrong at a given time Effectively a classifier
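The parameter-picking step above can be sketched as a brute-force grid search: try combinations of the four parameters and keep the one that best predicts correctness. The toy response sequences and coarse grid below are invented, and real fitting would use a finer search or EM and metrics such as RMSE or A'.

```python
# A sketch of fitting BKT parameters by grid search, choosing the four
# parameters (L0, G, S, T) that best predict whether each response is
# correct. Data and grid values are invented for illustration.
import itertools

def predict_and_update(p, correct, g, s, t):
    """Predicted P(correct) before observing, and updated P(L) after."""
    p_correct = p * (1 - s) + (1 - p) * g
    if correct:
        post = p * (1 - s) / p_correct
    else:
        post = p * s / (1 - p_correct)
    return p_correct, post + (1 - post) * t

def sse(sequences, l0, g, s, t):
    """Sum of squared errors between predicted and actual correctness."""
    total = 0.0
    for seq in sequences:
        p = l0
        for correct in seq:
            p_correct, p = predict_and_update(p, correct, g, s, t)
            total += (correct - p_correct) ** 2
    return total

# Toy data: each list is one student's right(1)/wrong(0) sequence on a skill.
data = [[0, 0, 1, 1, 1], [0, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
best = min(itertools.product(grid, repeat=4),
           key=lambda params: sse(data, *params))
l0, g, s, t = best
```

Note that scoring on predicted correctness is what makes the fitted model "effectively a classifier": the latent knowledge estimate is evaluated only through the performance it predicts.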

Recent Advances Recently, there has been work towards contextualizing the guess and slip parameters (Baker, Corbett, & Aleven, 2008a, 2008b) The intuition: Do we really think the chance that an incorrect response was a slip is equal when  Student has never gotten action right; spends 78 seconds thinking; answers; gets it wrong  Student has gotten action right 3 times in a row; spends 1.2 seconds thinking; answers; gets it wrong

Recent Advances. In this work, P(G) and P(S) are determined by a model that looks at time, previous history, the type of action, etc. This significantly improves the predictive power of the method: the probability of distinguishing correct from incorrect responses increases by about 15% of the potential gain, to 71%, so there is still room for improvement.

Uses Outside of EDM, can be used to drive tutorial decisions Within educational data mining, there are several things you can do with these models

Uses of Knowledge Tracing Often key components in models of other constructs  Help-Seeking and Metacognition (Aleven et al, 2004, 2008)  Gaming the System (Baker et al, 2004, in press)  Off-Task Behavior (Baker, 2007)

Uses of Knowledge Tracing If you want to understand a student’s strategic/meta-cognitive choices, it is helpful to know whether the student knew the skill Gaming the system means something different if a student already knows the step, versus if the student doesn’t know it A student who doesn’t know a skill should ask for help; a student who does, shouldn’t

Uses of Knowledge Tracing Can be interpreted to learn about skills

Skills from the Algebra Tutor. (The slide shows a table of skills with their fitted L0 and T parameters; the skills include AddSubtractTypeinSkillIsolatepositiveIso, ApplyExponentExpandExponentsevalradicalE, CalculateEliminateParensTypeinSkillElimi, CalculatenegativecoefficientTypeinSkillM, Changingaxisbounds, Changingaxisintervals, ChooseGraphicala, and combineliketermssp; most numeric values are omitted here.)

Which skills could probably be removed from the tutor? (Refer to the same skill table.)

Which skills could use better instruction? (Refer to the same skill table.)

END This last example is a simple example of Discovery with Models Tomorrow at 9:30am, we’ll discuss some more complex examples