Bayesian Knowledge Tracing and Other Predictive Models in Educational Data Mining. Zachary A. Pardos, PSLC Summer School 2011.

Bayesian Knowledge Tracing and Other Predictive Models in Educational Data Mining
Zachary A. Pardos, PSLC Summer School 2011

Outline of Talk
- Introduction to Knowledge Tracing: history, intuition, model, demo, variations (and other models), evaluations (Baker work / KDD)
- Random Forests: description, evaluations (KDD)
- Time left? Vote on next topic

Intro to Knowledge Tracing: History
- Introduced in 1995 (Corbett & Anderson, UMUAI)
- Based on the ACT-R theory of skill knowledge (Anderson, 1993)
- Computations based on a variation of Bayesian calculations proposed in 1972 (Atkinson)

Intro to Knowledge Tracing: Intuition
- Based on the idea that practice on a skill leads to mastery of that skill
- Has four parameters used to describe student performance
- Relies on a KC (knowledge component) model
- Tracks student knowledge over time

Intro to Knowledge Tracing
Given a student's response sequence 1 to n, predict response n + 1. For some skill K, the data are the chronological response sequence for student Y (0 = incorrect response, 1 = correct response): responses 1 … n, with response n + 1 to be predicted.

Intro to Knowledge Tracing
Track knowledge over time (a model of learning).

Intro to Knowledge Tracing
Knowledge Tracing (KT) can be represented as a simple HMM.
Node representations: K = knowledge node (latent), Q = question node (observed).
Node states: K = two state (0 or 1), Q = two state (0 or 1).

Intro to Knowledge Tracing
Four parameters of the KT model:
- P(L0) = probability of initial knowledge
- P(T) = probability of learning
- P(G) = probability of guess
- P(S) = probability of slip
The probability of forgetting is assumed to be zero (fixed).

Intro to Knowledge Tracing
Formulas for inference and prediction (derivation: Reye, JAIED 2004). The formulas use Bayes' Theorem to make inferences about the latent variable.
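The formulas themselves did not survive transcription, but they are the standard knowledge-tracing update equations: Bayes' Theorem conditions P(L) on the observed response, then the learning transition P(T) is applied. A minimal Python sketch, with parameter defaults taken from the example values on a later slide:

```python
def bkt_update(pL, correct, pT=0.2, pG=0.14, pS=0.09):
    """One KT step: condition P(L) on the observed response via Bayes'
    Theorem, then apply the learning transition P(T)."""
    if correct:
        post = pL * (1 - pS) / (pL * (1 - pS) + (1 - pL) * pG)
    else:
        post = pL * pS / (pL * pS + (1 - pL) * (1 - pG))
    return post + (1 - post) * pT

def predict_correct(pL, pG=0.14, pS=0.09):
    """Probability the next response is correct given the current P(L)."""
    return pL * (1 - pS) + (1 - pL) * pG
```

A correct response raises the knowledge estimate, an incorrect one lowers it, and the P(T) term then nudges it upward to model learning from the opportunity.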

Intro to Knowledge Tracing: Model Training
Values of the parameters P(T), P(G), P(S), and P(L0) are used to predict student responses (e.g., sequences such as 0 0 1 1 1 for Students A, B, and C). Ad-hoc values could be used, but they would likely not be the best fitting. Goal: find a set of values for the parameters that minimizes prediction error.

Intro to Knowledge Tracing: Model Prediction
Model tracing step for the skill Subtraction: the student's last three responses to Subtraction questions in the unit update the latent knowledge estimate P(K), and P(Q) gives the predicted probability of a correct response on the test-set questions (the slide's figure shows P(K) climbing, e.g. 10%, 45%, 75%, with corresponding P(Q) predictions such as 71% and 74%).

Intro to Knowledge Tracing: Influence of Parameter Values
Estimate of knowledge for a student with a given response sequence, using P(L0) = 0.50, P(T) = 0.20, P(G) = 0.14, P(S) = 0.09: the student reached a 95% probability of knowledge after the 4th opportunity.

Intro to Knowledge Tracing: Influence of Parameter Values
For the same response sequence, compare P(L0) = 0.50, P(T) = 0.20, P(G) = 0.14, P(S) = 0.09 with P(L0) = 0.50, P(T) = 0.20, P(G) = 0.64, P(S) = 0.03. With the second (high-guess) parameter set, the student did not reach a 95% probability of knowledge until after the 8th opportunity.

Intro to Knowledge Tracing
(Demo)

Intro to Knowledge Tracing
Variations on Knowledge Tracing (and other models)

Prior Individualization Approach
Do all students enter a lesson with the same background knowledge?
Node representations: K = knowledge node, Q = question node, S = student node (observed).
Node states: K = two state (0 or 1), Q = two state (0 or 1), S = multi state (1 to N).
The prior becomes conditional on the student: P(L0|S).

Prior Individualization Approach
Conditional probability table (CPT) of the Student node: each S value 1, 2, 3, …, N has P(S = value) = 1/N.
- The CPT of the observed student node is fixed.
- It is possible to have an S value for every student ID.
- This raises an initialization issue (where do these prior values come from?).
- The S value can represent a cluster or type of student instead of an ID.

Prior Individualization Approach
CPT of the individualized prior node, P(L0|S): each S value 1 … N maps to its own prior (e.g., P(L0|S=N) = 0.92).
- Individualized L0 values need to be seeded.
- This CPT can be fixed, or the values can be learned.
- Fixing this CPT and seeding it with values based on a student's first response can be an effective strategy.
- The model that individualizes only L0 is the Prior Per Student (PPS) model.

Prior Individualization Approach
Bootstrapping the prior from the individualized prior node's CPT, P(L0|S):
- If a student answers incorrectly on the first question, she gets a low prior.
- If a student answers correctly on the first question, she gets a higher prior.

Prior Individualization Approach
What values to use for the two priors, P(L0|S=0) and P(L0|S=1)?
1. Use ad-hoc values
2. Learn the values (via EM)
3. Link them with the guess/slip CPT: P(L0|S=0) = Slip, P(L0|S=1) = 1 - Guess
With ASSISTments, PPS (ad-hoc) achieved a higher R² than standard KT (0.176 with KT) (Pardos & Heffernan, UMAP 2010).
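A sketch of options 1 and 3 in Python; the ad-hoc prior values in the first function are invented for illustration, while the second follows the slide's guess/slip linkage:

```python
def pps_prior(first_correct, prior_if_wrong=0.10, prior_if_right=0.85):
    """Option 1: bootstrap the individualized prior P(L0|S) from the
    student's first response using ad-hoc values (placeholders here)."""
    return prior_if_right if first_correct else prior_if_wrong

def pps_prior_linked(first_correct, pG=0.14, pS=0.09):
    """Option 3: link the bootstrap prior to the guess/slip CPT, i.e.
    P(L0|S) = 1 - P(G) after a correct first response and P(S) after
    an incorrect one."""
    return 1 - pG if first_correct else pS
```

The linked variant is appealing because a correct first answer is, by the model's own assumptions, either knowledge or a guess, so 1 - P(G) is a natural seed.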

Intro to Knowledge Tracing
Variations on Knowledge Tracing (and other models)

BKT-BF (Baker et al., 2010)
Learns values for P(L0), P(T), P(G), and P(S) by performing a grid search (0.01 granularity) and choosing the set of parameters with the best squared error.
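A sketch of what BKT-BF's brute-force search looks like, assuming squared error on next-response predictions as the fit metric; the grid step is coarsened to 0.1 here so the example runs quickly (the slide uses 0.01):

```python
import itertools

def bkt_sse(params, sequences):
    """Sum of squared errors of BKT next-response predictions over a
    set of 0/1 response sequences."""
    pL0, pT, pG, pS = params
    sse = 0.0
    for seq in sequences:
        pL = pL0
        for obs in seq:
            pred = pL * (1 - pS) + (1 - pL) * pG  # P(correct) before seeing obs
            sse += (pred - obs) ** 2
            if obs:  # Bayesian update on the observation
                post = pL * (1 - pS) / (pL * (1 - pS) + (1 - pL) * pG)
            else:
                post = pL * pS / (pL * pS + (1 - pL) * (1 - pG))
            pL = post + (1 - post) * pT  # learning transition
    return sse

def brute_force_fit(sequences, step=0.1):
    """BKT-BF style grid search: try every parameter combination on the
    grid and keep the one with the lowest squared error."""
    grid = [round(x * step, 2) for x in range(1, int(round(1 / step)))]
    return min(itertools.product(grid, repeat=4),
               key=lambda p: bkt_sse(p, sequences))
```

At 0.01 granularity the grid has roughly 10^8 combinations, which is why the real search is expensive compared to EM fitting.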

BKT-EM (Chang et al., 2006)
Learns values for the parameters with Expectation Maximization (EM), maximizing the log likelihood fit to the data.

BKT-CGS (Baker, Corbett, & Aleven, 2008)
Guess and slip parameters are assessed contextually, using a regression on features generated from student performance in the tutor.

BKT-CSlip (Baker, Corbett, & Aleven, 2008)
Uses the student's averaged contextual slip parameter, learned across all incorrect actions.

BKT-LessData (Nooraiei et al., 2011)
Limits each student's response sequence to the most recent 15 responses (max) during EM training.

BKT-PPS (Pardos & Heffernan, 2010)
Prior Per Student (PPS) model, which individualizes the prior parameter: students are assigned a prior based on their response to the first question.

CFAR (Yu et al., 2010)
Correct on First Attempt Rate (CFAR) calculates the student's percent correct on the current skill up until the question being predicted, and uses that rate as the prediction; a student who has been correct on half of their prior responses for skill X would get a predicted next response of 0.50.
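CFAR is simple enough to state in a few lines; the 0.5 fallback for an empty history is an assumption of this sketch, not something from the slide:

```python
def cfar_predict(prior_responses):
    """CFAR: percent correct on this skill so far is the prediction for
    the next response. With no history, fall back to 0.5 (assumed)."""
    if not prior_responses:
        return 0.5
    return sum(prior_responses) / len(prior_responses)
```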

Tabling (Wang et al., 2011)
Uses the student's response sequence (max length 3) to predict the next response, by looking up the average next response among training-set students with the same sequence. Example: given the sequences of training students A, B, and C, a matching test-set student's predicted next response would be 0.66. With the max table length set to 3, the table size was 1 + 2 + 4 + 8 = 15.
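A sketch of the tabling lookup, assuming training sequences are lists of 0/1 responses:

```python
from collections import defaultdict

def build_table(training_sequences, max_len=3):
    """Tabling: for every observed response pattern of length max_len,
    record the average response that followed it in the training set."""
    table = defaultdict(list)
    for seq in training_sequences:
        for i in range(len(seq) - max_len):
            key = tuple(seq[i:i + max_len])
            table[key].append(seq[i + max_len])
    return {k: sum(v) / len(v) for k, v in table.items()}
```

A test-set student's last three responses are then used as the lookup key; if three training students with pattern (0, 1, 1) next answered 1, 0, 1, the prediction is 2/3 ≈ 0.66, matching the slide's example.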

PFA (Pavlik et al., 2009)
Performance Factors Analysis (PFA): a logistic regression model that elaborates on the Rasch IRT model. It predicts performance based on the counts of the student's prior failures and successes on the current skill.
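A hedged sketch of the PFA prediction step; the coefficient values below are invented placeholders (in practice the skill difficulty β and the success/failure weights γ and ρ are fit to data per skill):

```python
import math

def pfa_predict(successes, failures, beta=0.0, gamma=0.2, rho=-0.1):
    """PFA: logistic regression on the counts of prior successes and
    failures for the current skill (coefficients are placeholders)."""
    m = beta + gamma * successes + rho * failures
    return 1 / (1 + math.exp(-m))
```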

Evaluation Study: Dataset
Cognitive Tutor for Genetics
- 76 CMU undergraduate students
- 9 skills (no multi-skill steps)
- 23,706 problem-solving attempts
- 11,582 problem steps in the tutor
- 152 average problem steps completed per student (SD = 50)
- Pre- and post-tests were administered with this assignment

Evaluation Study: Methodology (in-tutor model prediction)
Predictions were made by the 9 models using 5-fold cross-validation by student. Accuracy was calculated with A' for each student, and those values were averaged across students to report each model's A' (higher is better).
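A' can be computed as the probability that a randomly chosen correct response receives a higher prediction than a randomly chosen incorrect one, with ties counted as half, which makes it equivalent to an AUC computed by pairwise comparison. A sketch:

```python
def a_prime(predictions, labels):
    """A': probability the model ranks a random positive (correct
    response) above a random negative, counting ties as 0.5."""
    pos = [p for p, y in zip(predictions, labels) if y == 1]
    neg = [p for p, y in zip(predictions, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # undefined with one class; 0.5 is an assumed fallback
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In the study's setup this would be run once per student and the resulting values averaged.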

Evaluation Study: Results (in-tutor model prediction)
A' results averaged across students; models listed from the table: BKT-PPS, BKT-BF, BKT-EM, BKT-LessData, PFA, Tabling, BKT-CSlip, CFAR, BKT-CGS.

Evaluation Study: Results (in-tutor model prediction)
A' results averaged across students. There were no significant differences among the leading BKT variants (BKT-PPS, BKT-BF, BKT-EM, BKT-LessData), but there were significant differences between those BKT variants and PFA.

Evaluation Study: Methodology (in-tutor ensemble prediction)
5 ensemble methods were used, trained on the same 5-fold cross-validation folds. The ensembles were trained using the 9 model predictions as the features and the actual response as the label.

Evaluation Study: Methodology (in-tutor ensemble prediction)
Ensemble methods used:
1. Linear regression with no feature selection (predictions bounded between {0,1})
2. Linear regression with feature selection (stepwise regression)
3. Linear regression with only BKT-PPS & BKT-EM
4. Linear regression with only BKT-PPS, BKT-EM & BKT-CSlip
5. Logistic regression
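As a sketch of ensemble method 1 (a linear blend of the base-model predictions with the output bounded to [0, 1]), here is a hypothetical implementation fit by gradient descent rather than by a stats package's least-squares routine:

```python
def fit_blend(preds, labels, lr=0.1, epochs=2000):
    """Fit weights and bias for a linear blend of base-model prediction
    vectors by gradient descent on mean squared error."""
    k = len(preds[0])
    n = len(preds)
    w = [1.0 / k] * k  # start from a simple average of the base models
    b = 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for x, y in zip(preds, labels):
            err = b + sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(k):
                gw[j] += 2 * err * x[j]
            gb += 2 * err
        w = [wi - lr * gj / n for wi, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def blend_predict(x, w, b):
    """Blended prediction, clipped to [0, 1] as the slide specifies."""
    return min(1.0, max(0.0, b + sum(wi * xi for wi, xi in zip(w, x))))
```

Each row of `preds` would hold the 9 base-model predictions for one response, and the label is the actual 0/1 response.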

Evaluation Study: Results (in-tutor ensemble prediction)
A' results averaged across students; ensembles listed from the table: LinReg with BKT-PPS, BKT-EM & BKT-CSlip; LinReg with BKT-PPS & BKT-EM; LinReg without feature selection; LinReg with feature selection (stepwise); logistic regression without feature selection. There was no significant difference between the ensembles.

Evaluation Study: Results (in-tutor ensemble & model prediction)
A' results averaged across students; full listing from the table: BKT-PPS; LinReg ensemble with BKT-PPS, BKT-EM & BKT-CSlip; LinReg ensemble with BKT-PPS & BKT-EM; BKT-BF; BKT-EM; LinReg ensemble without feature selection; LinReg ensemble with feature selection (stepwise); logistic regression ensemble without feature selection; BKT-LessData; PFA; Tabling; BKT-CSlip; CFAR; BKT-CGS.

Evaluation Study: Results (in-tutor ensemble & model prediction)
A' results calculated across all actions (rather than averaged per student); full listing from the table: LinReg ensemble with BKT-PPS, BKT-EM & BKT-CSlip; LinReg ensemble without feature selection; LinReg ensemble with feature selection (stepwise); logistic regression ensemble without feature selection; LinReg ensemble with BKT-PPS & BKT-EM; BKT-EM; BKT-BF; BKT-PPS; PFA; BKT-LessData; CFAR; Tabling; BKT-CSlip (contextual slip); BKT-CGS.

Random Forests: In the KDD Cup
Motivation for trying a non-KT approach: the Bayesian method uses only KC, opportunity count, and student as features, so much information is left unutilized; another machine learning method is required.
Strategy: engineer additional features from the dataset and use Random Forests to train a model.

Random Forests
Strategy: create rich feature datasets, including features derived from columns that are not included in the test set.

Random Forests
Created by Leo Breiman. The method trains T separate decision tree classifiers (50-800 here). Each decision tree selects a random 1/P portion of the available features (1/3 here), and each tree is grown until there are at least M observations in the leaf (1-100 here). When classifying unseen data, each tree votes on the class; the popular vote wins (or the votes are averaged, for regression).
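A toy illustration of the bagging-and-voting scheme just described, with depth-1 stumps standing in for full decision trees (real random forests grow much deeper trees; everything here is a simplified sketch, not the contest implementation):

```python
import random
from collections import Counter

def fit_stump(X, y, feats):
    """Fit a depth-1 'tree': the best (feature, threshold) split among
    a random feature subset; each leaf predicts its majority class."""
    best = None
    for f in feats:
        for t in sorted({x[f] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            lmaj = Counter(left).most_common(1)[0][0]
            rmaj = Counter(right).most_common(1)[0][0]
            err = sum(yi != lmaj for yi in left) + sum(yi != rmaj for yi in right)
            if best is None or err < best[0]:
                best = (err, f, t, lmaj, rmaj)
    if best is None:  # degenerate bootstrap sample: predict the majority
        maj = Counter(y).most_common(1)[0][0]
        return lambda x: maj
    _, f, t, lmaj, rmaj = best
    return lambda x: lmaj if x[f] <= t else rmaj

def random_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample with a random
    1/3 of the features; classify by majority vote across the trees."""
    rng = random.Random(seed)
    p = len(X[0])
    k = max(1, p // 3)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feats = rng.sample(range(p), k)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda x: Counter(t(x) for t in trees).most_common(1)[0][0]
```

The two sources of randomness, bootstrap resampling of rows and random feature subsets per tree, are what decorrelate the trees and make the vote robust.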

Random Forests: Feature Importance
Features extracted from the training set:
Student progress features (avg. importance: 1.67)
- Number of data points [today, since the start of unit]
- Number of correct responses out of the last [3, 5, 10]
- Z-score sum for step duration, hint requests, incorrects
- Skill-specific versions of all these features
Percent correct features (avg. importance: 1.60)
- % correct of unit, section, problem, and step, and totals for each skill and also for each student (10 features)
Student modeling approach features (avg. importance: 1.32)
- The predicted probability of correct for the test row
- The number of data points used in training the parameters
- The final EM log likelihood fit of the parameters / data points

Random Forests
- Features of the user were more important in Bridge to Algebra than in Algebra.
- Student progress / gaming-the-system features (Baker et al., UMUAI 2008) were important in both datasets.

Random Forests
Feature-set rankings (by RMSE; coverage was also reported):
- Algebra: 1. All features, 2. Percent correct, 3. All features (fill)
- Bridge to Algebra: 1. All features, 2. All features (fill), 3. Percent correct

Random Forests
The Random Forest RMSE achieved here on Bridge to Algebra was exceptional, comparing favorably with the best Bridge to Algebra RMSE on the KDD Cup leaderboard.

Random Forests
Skill data for a student was not always available for each test row; because of this, many skill-related feature sets had only 92% coverage.

Conclusions from the KDD Cup
- Combining user features with skill features was very powerful in both the modeling and classification approaches.
- Model-tracing-based predictions performed formidably against pure machine learning techniques.
- Random Forests also performed very well on this educational dataset compared to other approaches such as neural networks and SVMs; this method could significantly boost accuracy on other EDM datasets.

Hardware / Software
Software:
- MATLAB used for all analysis (Bayes Net Toolbox for the Bayesian network models; Statistics Toolbox for the Random Forests classifier)
- Perl used for pre-processing
Hardware:
- Two Rocks clusters (178 CPUs in total) used for skill model training; training the KT models took ~48 hours when utilizing all CPUs
- Two 32 GB RAM systems for Random Forests; the RF models took ~16 hours to train with 800 trees

Time left? Choose the next topic:
- Knowledge Tracing
- Prediction
- Evaluation / significance tests
- Regression / significance tests

Intro to Knowledge Tracing
Individualize Everything?

Fully Individualized Model (Pardos & Heffernan, JMLR 2011)

Fully Individualized Model (Pardos & Heffernan, JMLR 2011)
S identifies the student.

Fully Individualized Model (Pardos & Heffernan, JMLR 2011)
T contains the CPT lookup table of individual student learn rates.

Fully Individualized Model (Pardos & Heffernan, JMLR 2011)
P(T) is trained for each skill, which gives a learn rate for P(T|T=1) [high learner] and P(T|T=0) [low learner].
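The per-student learn rate can be sketched by parameterizing the standard KT update with a looked-up P(T); the table values and student labels below are hypothetical placeholders for what the model would learn:

```python
def bkt_update_individual(pL, correct, pT, pG=0.14, pS=0.09):
    """Standard KT update, but with the learn rate pT passed in so it
    can differ per student (the individualized-T idea)."""
    if correct:
        post = pL * (1 - pS) / (pL * (1 - pS) + (1 - pL) * pG)
    else:
        post = pL * pS / (pL * pS + (1 - pL) * (1 - pG))
    return post + (1 - post) * pT

# Hypothetical per-student learn-rate lookup (the T node's CPT);
# the two values stand in for P(T|T=1) and P(T|T=0).
learn_rate = {"high_learner": 0.35, "low_learner": 0.05}
```

With the same evidence, the high-learner entry drives the knowledge estimate up faster than the low-learner entry, which is exactly the behavior the individualized T node is meant to capture.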

SSI Model Results (Pardos & Heffernan, JMLR 2011)
Datasets: Algebra and Bridge to Algebra (new RMSE vs. previous RMSE, with the improvement averaged across the two).
- The average improvement is the difference between 1st and 3rd place; it is also the difference between 3rd and 4th place.
- The differences between PPS and SSI are significant in each dataset at the p < 0.01 level (t-test of squared errors).