Slide 1. Individualizing BKT: Are Skill Parameters More Important Than Student Parameters?
Michael V. Yudelson, Carnegie Mellon University
Speaker note: "Important" is a provocation; the intended sense is "capturing more variance of performance" and hence, if accounted for, being more useful for modeling.
Slide 2. Modeling Student Learning (1)
Sources of performance variability:
- Time – learning happens with repetition
- Knowledge quanta – learning is better (or only) visible when aggregated by skill
- Students – people learn differently
- Content – there are easier and harder chunks of material
Slide 3. Modeling Student Learning (2)
Accounting for variability:
- Time – implicitly present in time-series data
- Knowledge quanta – skills are frequently used as units of transfer
- Content – many models address modality and/or instances of content
- Students – significant attention is given to accounting for individual differences
Slide 4. Of Students and Skills
- Without accounting for skills (the what), there is little chance to see learning
- Component theory of transfer reliably defeats faculty theory and item-based models (Koedinger et al., 2016)
- The student (the who) is, arguably, the runner-up contender for the most potent factor
- Skill-level and student-level factors in models of learning: which is more influential when predicting performance?
Slide 5. Focus of This Work
- Subject: mathematics
- Model: Bayesian Knowledge Tracing (BKT)
- Investigation: adding per-student parameters
- Extension: [globally] weighting skill vs. student parameters
- Question: which [global] weight is larger – per-skill or per-student?
Speaker note: If I jump to the last slide now, we will be done, but let me still go through the slides in the middle.
Slide 6. Bayesian Knowledge Tracing
[Figure: unrolled view of single-skill BKT]
- Parameters take values in [0, 1]; the rows of each probability table sum to 1; forgetting is fixed at zero (pF = 0)
- Why 4 parameters: because rows sum to 1, the last value of every row can be omitted, and because forgetting is 0, there is no 5th parameter (see the update sketch below)
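Below is a minimal Python sketch of the 4-parameter update this slide describes. It is an illustration, not the author's implementation; all function and variable names here are mine.

```python
# Minimal sketch of one standard BKT step; forgetting is fixed at pF = 0.
def bkt_update(p_know, correct, p_learn, p_slip, p_guess):
    """Condition the knowledge estimate on one response, then apply learning."""
    # Likelihood of the observed response if the skill is known vs. not known.
    if correct:
        lik_known, lik_unknown = 1.0 - p_slip, p_guess
    else:
        lik_known, lik_unknown = p_slip, 1.0 - p_guess
    # Bayes rule: posterior probability that the skill was already known.
    evidence = p_know * lik_known + (1.0 - p_know) * lik_unknown
    posterior = p_know * lik_known / evidence
    # Learning transition; no forgetting, hence no 5th parameter.
    return posterior + (1.0 - posterior) * p_learn

def bkt_predict(p_know, p_slip, p_guess):
    """Predicted probability of a correct response at the next opportunity."""
    return p_know * (1.0 - p_slip) + (1.0 - p_know) * p_guess
```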
Slide 7. Individualizing BKT
Individualization via student-level parameters:
- 1PL IRT and AFM – a "student ability" intercept
- Splitting BKT parameters into student/skill components (Corbett & Anderson, 1995; Yudelson et al., 2013)
- Multiplexing Init across different student cohorts (Pardos & Heffernan, 2010)
- Parameters set only within a student, not across students (Lee & Brunskill, 2012)
Slide 8. Additive/Compensatory BKT Individualization
BKT parameter P ∈ {Init, Learn, Slip, Guess}
- iBKT: splitting parameter P (Yudelson et al., 2013)
  P = f(Puser, Pskill) = Sigmoid(logit(Puser) + logit(Pskill))
  Per-student and per-skill parameters are added on the logit scale and converted back to the probability scale.
  Setting all Puser = 0.5 turns iBKT into standard BKT.
  The iBKT model is fit using block coordinate descent.
- iBKT-W: making the parameter split more interesting
  P = f(Pu, Pk, W0, Wu, Wk, Wuk) = Sigmoid(W0 + Wu·logit(Pu) + Wk·logit(Pk) + Wuk·logit(Pu)·logit(Pk))
  W0 – bias, hopefully low
  Wu vs. Wk – student vs. skill weight
  Wuk – interaction of the student and skill components
  (A code sketch of both combination functions follows this slide.)
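A minimal sketch of the two combination functions above, in plain Python (the names are illustrative):

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# iBKT: per-student and per-skill components added on the logit scale.
def ibkt_param(p_user, p_skill):
    return sigmoid(logit(p_user) + logit(p_skill))

# iBKT-W: globally weighted split with bias and interaction terms.
def ibkt_w_param(p_user, p_skill, w0, w_u, w_k, w_uk):
    lu, lk = logit(p_user), logit(p_skill)
    return sigmoid(w0 + w_u * lu + w_k * lk + w_uk * lu * lk)

# With p_user = 0.5, logit(p_user) = 0 and iBKT reduces to standard BKT.
assert abs(ibkt_param(0.5, 0.3) - 0.3) < 1e-9
```

Note that iBKT-W reduces to iBKT when W0 = Wuk = 0 and Wu = Wk = 1, so the weighted model strictly generalizes the additive one.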
Slide 9. Fitting BKT Models
- iBKT: hmm-scalable (the public version supports standard BKT only)
  - Fits standard and individualized BKT models using a suite of gradient-based solvers
  - Exact inference of student/skill parameters (via block coordinate descent)
- iBKT-W: JAGS/WinBUGS via R's rjags package
  - Hierarchical Bayesian Model (HBM) with flexible hyper-parameterization
  - Skill parameters drawn from a uniform distribution
  - Student parameters drawn from a Gaussian distribution
  - Only the Init and Learn parameters are individualized (a toy sketch of this prior structure follows below)
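As a toy illustration of the prior structure just listed (not the actual JAGS model), one could draw the parameters as follows; placing the student offsets on the logit scale and using a unit Gaussian are my assumptions, since the deck does not state them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_skills, n_students = 30, 336  # sizes from the data slide that follows

# Skill-level Init/Learn drawn from uniform priors on the probability scale.
skill_init  = rng.uniform(0.0, 1.0, size=n_skills)
skill_learn = rng.uniform(0.0, 1.0, size=n_skills)

# Student-level offsets drawn from Gaussian priors (logit scale and unit
# variance are assumptions); only Init and Learn are individualized.
student_init_offset  = rng.normal(0.0, 1.0, size=n_students)
student_learn_offset = rng.normal(0.0, 1.0, size=n_students)
```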
Slide 10. Data
- KDD Cup 2010 Educational Data Mining Challenge: Carnegie Learning's Cognitive Tutor data
- One curriculum unit, Linear Inequalities (JAGS/WinBUGS is less computationally efficient, hence the restriction to one unit)
- 336 students, 66,307 transactions, 30 skills
Slide 11. BKT Models: Statistical Fit

Model                            | Fit via      | Parameters | Hyper-parameters | Accuracy
Majority class (predict correct) |              |            |                  | 0.7242
Standard BKT                     | hmm-scalable | 4N*        |                  | 0.7561
Standard BKT                     | HBM          | 4N         |                  | 0.7569
iBKT                             | hmm-scalable | 4N+2M**    |                  | 0.7680
iBKT                             | HBM          | 4N+2M      | 4                | 0.7692
iBKT-W                           | HBM          | 4N+2M+4    | 12               | 0.7687
iBKT-W-2G***                     | HBM          |            | 16               | 0.7689

* N – number of skills
** M – number of students
*** Init and Learn fit as a mixture of 2 Gaussians
Slide 12. The Story of Two Gaussians
[Figure: per-student parameter distributions, iBKT-W (HBM) vs. iBKT-W-2G* (HBM)]
* Fitting a mixture of 3 Gaussians results in a bimodal distribution as well (a mixture-fitting sketch follows below)
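A hedged sketch of how such bimodality can be probed: fit a Gaussian mixture to the per-student (Init, Learn) estimates and inspect the component assignments. `student_params` is a hypothetical (n_students, 2) array of estimates; it is not provided by the deck.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_mixture(student_params: np.ndarray, n_components: int = 2):
    """Fit a GMM to per-student parameter estimates; return model and labels."""
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    labels = gmm.fit_predict(student_params)
    return gmm, labels
```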
Slide 13. Student vs. Skill

Model           | W0    | Wskill | Wstudent | Wstudent·skill
iBKT-W (HBM)    | 0.012 | 0.565  | 1.420    | 0.004
iBKT-W-2G (HBM) | 0.019 | 0.700  | 1.274    | 0.007

- The [global] bias W0 is low
- The [global] interaction term Wstudent·skill is even lower
- The student [global] weight in the additive function is visibly higher
Slide 14. Discussion (1): Student Parameters vs. Skill Parameters
- The bias and interaction terms are effectively 0 (the interaction term is a little disappointing)
- Student parameters are weighted higher (2 reported models + 7 additional models tried)
- Only a small chance of over-fit, given the random-factor treatment: 30 skills (uniform distribution), 336 students (Gaussian distribution)
- The Wk and Wu weights could be compensating for, or shifting, the individual student/skill parameters
Slide 15. Discussion (2)
[Figure: per-student Init (x) vs. Learn (y) scatter plots; BKT's Init and Learn correspond to a logistic model's Sigmoid(intercept) and Sigmoid(slope)]
- iBKT via hmm-scalable: exact inference (fixed-effect treatment)
- iBKT-W-2G via HBM: regularization via setting priors
Speaker note: Wouldn't it be nice to have students here?
Slide 16. Discussion (3): Small Differences in Statistical Fits
- Models with similar accuracies could be vastly different
- There are significant differences in the amount of practice they would prescribe (see the simulation sketch below)
[Figure: prescribed practice time per model, hh:mm]
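A minimal simulation sketch (not the author's analysis) of why this happens: tutors typically prescribe practice until the BKT knowledge estimate crosses a mastery threshold, conventionally 0.95, so modest parameter differences compound across opportunities. This reuses `bkt_update` from the earlier sketch; the parameter values below are made up.

```python
import random

def simulate_practice(p_init, p_learn, p_slip, p_guess,
                      threshold=0.95, cap=200, seed=0):
    """Simulate one model-generated student; count opportunities until the
    tracked BKT estimate crosses the mastery threshold."""
    rng = random.Random(seed)
    known = rng.random() < p_init   # latent state of the simulated student
    p_know, n = p_init, 0           # tracked BKT knowledge estimate
    while p_know < threshold and n < cap:
        p_correct = (1.0 - p_slip) if known else p_guess
        correct = rng.random() < p_correct
        p_know = bkt_update(p_know, correct, p_learn, p_slip, p_guess)
        if not known:
            known = rng.random() < p_learn  # latent learning transition
        n += 1
    return n

# Two made-up parameter sets with similar predictive behavior on average
# can still prescribe noticeably different amounts of practice:
for params in [(0.40, 0.10, 0.10, 0.20), (0.40, 0.25, 0.10, 0.20)]:
    runs = [simulate_practice(*params, seed=s) for s in range(1000)]
    print(params, sum(runs) / len(runs))
```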
Slide 17. Discussion (4): iBKT-W-2G – What Do the 2 Gaussians Represent?
- Problems, time, hints, errors, % correct, {time, errors, hints}/problem – none of these explain the membership
- Lower Init & Learn vs. higher Init & Learn does explain membership: a latent uni-dimensional student ability
Slide 18. Thank You!