Michael V. Yudelson Carnegie Mellon University

1 Individualizing BKT. Are Skill Parameters More Important Than Student Parameters?
Michael V. Yudelson, Carnegie Mellon University
"Important" here is a provocation; the intended meaning is "capturing more variance of performance" and hence, if accounted for, being more useful for modeling.

2 Modeling Student Learning (1)
Sources of performance variability:
- Time: learning happens with repetition
- Knowledge quanta: learning is better (or even only) visible when aggregated by skill
- Students: people learn differently
- Content: there are easier and harder chunks of material
Michael V. Yudelson (C) 2016

3 Modeling Student Learning (2)
Accounting for variability:
- Time: implicitly present in time-series data
- Knowledge quanta: skills are frequently used as units of transfer
- Content: many models address modality and/or instances of content
- Students: significant attention is given to accounting for individual differences

4 Of Students and Skills
- Without accounting for skills (the what), there is little chance to see learning
- Component theory of transfer reliably defeats faculty theory and item-based models (Koedinger et al., 2016)
- The student (the who) is, arguably, the runner-up for the most potent factor
- Skill-level and student-level factors in models of learning: which one is more influential when predicting performance?

5 Focus of this work
- Subject: mathematics
- Model: Bayesian Knowledge Tracing (BKT)
- Investigation: adding per-student parameters
- Extension: [globally] weighting skill vs. student parameters
- Question: which [global] weight is larger, per-skill or per-student?
If I jump to the last slide now, we will be done, but let me still go through the slides in the middle.

6 Bayesian Knowledge Tracing
Unrolled view of single-skill BKT.
Parameters:
- Values ∈ [0,1]
- Rows sum to 1
- Forgetting (pF) = 0
Why 4 parameters? Rows sum to 1, so the last value of every row can be omitted; forgetting is fixed at 0, so there is no 5th parameter.
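The four-parameter setup on this slide can be sketched as a generic single-skill BKT step (a minimal illustration, not the author's implementation; parameter values below are made up), with forgetting fixed at 0 as stated above:

```python
def bkt_update(p_know, correct, learn, slip, guess):
    """One BKT step: Bayes update on the observed response,
    then a learning transition. Forgetting is fixed at 0."""
    if correct:
        evidence = p_know * (1 - slip) + (1 - p_know) * guess
        posterior = p_know * (1 - slip) / evidence
    else:
        evidence = p_know * slip + (1 - p_know) * (1 - guess)
        posterior = p_know * slip / evidence
    # Learning transition; no forgetting term, so mastery never decreases
    return posterior + (1 - posterior) * learn

# Illustrative trace of one student on one skill (1 = correct, 0 = incorrect)
p = 0.3  # Init
for obs in [1, 0, 1, 1]:
    p = bkt_update(p, obs, learn=0.2, slip=0.1, guess=0.2)
```

A correct response raises the mastery estimate through the Bayes update, and the learning transition raises it further on every step.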

7 Individualizing BKT
Individualization via student-level parameters:
- 1PL IRT, AFM: a "student ability" intercept
- Splitting BKT parameters into student/skill components (Corbett & Anderson, 1995; Yudelson et al., 2013)
- Multiplexing Init for different student cohorts (Pardos & Heffernan, 2010)
- Parameters set only within a student, not across students (Lee & Brunskill, 2012)

8 Additive/Compensatory BKT Individualization
BKT parameter P ∈ {Init, Learn, Slip, Guess}
iBKT: splitting parameter P (Yudelson et al., 2013)
- P = f(Puser, Pskill) = Sigmoid( logit(Puser) + logit(Pskill) )
- Per-student and per-skill parameters are added on the logit scale and converted back to the probability scale
- Setting all Puser = 0.5 reduces iBKT to standard BKT
- The iBKT model is fit using block coordinate descent
iBKT-W: making the parameter split more interesting
- P = f(Pu, Pk, W0, Wu, Wk, Wuk) = Sigmoid( W0 + Wu·logit(Pu) + Wk·logit(Pk) + Wuk·logit(Pu)·logit(Pk) )
- W0: bias, hopefully low
- Wu vs. Wk: student vs. skill weight
- Wuk: interaction of the student and skill components

9 Fitting BKT Models
iBKT: hmm-scalable
- Public version supports standard BKT only
- Fits standard and individualized BKT models using a suite of gradient-based solvers
- Exact inference of student/skill parameters (via block coordinate descent)
iBKT-W: JAGS/WinBUGS via R's rjags package
- Hierarchical Bayesian Model (HBM) with flexible hyper-parameterization
- Skill parameters drawn from a uniform distribution
- Student parameters drawn from a Gaussian distribution
- Only the Init and Learn parameters are individualized
Michael V. Yudelson (C) 2016
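The hyper-parameterization described above (uniform priors over skill parameters, Gaussian priors over per-student offsets, Init and Learn only) could be sampled as follows; this is a sketch of the prior structure, and the Gaussian mean/sd here are placeholders, not the fitted hyper-parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_skills, n_students = 30, 336  # sizes of the data set used in this work

# Skill-level Init/Learn drawn from uniform priors on the probability scale
skill_init = rng.uniform(0.0, 1.0, n_skills)
skill_learn = rng.uniform(0.0, 1.0, n_skills)

# Student-level Init/Learn offsets drawn from Gaussian priors on the
# logit scale (mean and sd are placeholder hyper-parameters)
student_init = rng.normal(0.0, 1.0, n_students)
student_learn = rng.normal(0.0, 1.0, n_students)
```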

10 Data
KDD Cup 2010 Educational Data Mining Challenge: Carnegie Learning's Cognitive Tutor data.
One curriculum unit, Linear Inequalities (JAGS/WinBUGS is less computationally efficient, hence the limited scope):
- 336 students
- 66,307 transactions
- 30 skills

11 BKT Models. Statistical Fit.

Model                              Parameters   Hyper-params   Accuracy
Majority class (predict correct)   –            –              0.7242
Standard BKT (hmm-scalable)        4N*          –              0.7561
Standard BKT (HBM)                 4N           –              0.7569
iBKT (hmm-scalable)                4N+2M**      –              0.7680
iBKT (HBM)                         4N+2M        4              0.7692
iBKT-W (HBM)                       4N+2M+4      12             0.7687
iBKT-W-2G (HBM***)                 –            16             0.7689

* N – number of skills; ** M – number of students; *** Init and Learn fit as a mixture of 2 Gaussians.

12 The Story of Two Gaussians
[Figure panels: iBKT-W HBM; iBKT-W-2G* HBM]
* Fitting a mixture of 3 Gaussians results in a bimodal distribution as well.
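The "2G" idea, fitting the per-student parameter distribution as a mixture of two Gaussians, can be illustrated with a tiny 1-D EM routine (a stand-alone sketch on synthetic data, not the hierarchical JAGS fit used in this work):

```python
import numpy as np

def em_two_gaussians(x, n_iter=200):
    """Minimal EM for a 1-D mixture of two Gaussians."""
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])            # spread-out initial means
    sigma = np.array([x.std(), x.std()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and standard deviations
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return pi, mu, sigma

# Synthetic bimodal data standing in for per-student logit parameters
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(1, 0.5, 500)])
pi, mu, sigma = em_two_gaussians(data)
```

On well-separated data like this, the recovered component means land near the two generating modes.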

13 Student vs. Skill

Model           W0      Wskill   Wstudent   Wstudent·skill
iBKT-W HBM      0.012   0.565    1.420      0.004
iBKT-W-2G HBM   0.019   0.700    1.274      0.007

- The [global] bias W0 is low
- The [global] interaction term Wstudent·skill is even lower
- The student [global] weight in the additive function is visibly higher

14 Discussion (1)
Student parameters vs. skill parameters:
- Bias and interaction terms are effectively 0 (a little disappointing regarding the interaction)
- Student parameters are weighted higher (2 reported + 7 additional models tried)
- Only a small chance of over-fit, despite the random-factor treatment: 30 skills (uniform distr.), 336 students (Gaussian distr.)
- The Wk and Wu weights could be compensating for/shifting the individual student/skill parameters

15 Discussion (2)
- iBKT via hmm-scalable: per-student Init (x) vs. Learn (y); exact inference (fixed effect)
- iBKT-W-2G via HBM: per-student Init (x) vs. Learn (y); regularization via setting priors
Wouldn't it be nice to have students here?
[Figure: per-student scatter plots; BKT axes: Init, Learn; logistic axes: Sigmoid(intercept), Sigmoid(slope)]

16 Discussion (3)
Small differences in statistical fit:
- Models with similar accuracies can be vastly different
- Significant differences in the amount of practice they would prescribe
[Figure: prescribed practice time, hh:mm]
Michael V. Yudelson (C) 2016
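One way to see how similar statistical fits can prescribe very different amounts of practice: count BKT learning transitions until a mastery threshold is reached (0.95 is a common convention; the parameter values below are illustrative, not the fitted ones from this work):

```python
def steps_to_mastery(init, learn, threshold=0.95, max_steps=500):
    """Count learning transitions until P(known) reaches the mastery
    threshold, ignoring evidence updates (a lower-bound sketch)."""
    p, steps = init, 0
    while p < threshold and steps < max_steps:
        p = p + (1 - p) * learn  # BKT learning transition, no forgetting
        steps += 1
    return steps

# Two parameterizations that may fit observed responses similarly well
# can still prescribe very different practice lengths
fast = steps_to_mastery(0.4, 0.25)
slow = steps_to_mastery(0.3, 0.10)
```

Here the second parameterization prescribes roughly three times as much practice as the first, even though both could score similarly on one-step-ahead prediction.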

17 Discussion (4)
iBKT-W-2G: what do the 2 Gaussians represent?
- Problems, time, hints, errors, % correct, {time, errors, hints}/problem: none of these explain the membership
- Lower Init & Learn vs. higher Init & Learn does explain membership, suggesting a latent uni-dimensional student ability

18 Thank you! Michael V. Yudelson (C) 2016

