1. A Machine Learning Approach for Automatic Student Model Discovery
Nan Li, Noboru Matsuda, William Cohen, and Kenneth Koedinger
Computer Science Department, Carnegie Mellon University

2. Student Model
- A set of knowledge components (KCs)
- Encoded in intelligent tutors to model how students solve problems
- Example: what to do next on problems like 3x = 12
- A key factor behind instructional decisions in automated tutoring systems

3. Student Model Construction
- Traditional methods (require expert input; highly subjective)
  - Structured interviews
  - Think-aloud protocols
  - Rational analysis
- Previous automated methods (within the search space of human-provided factors)
  - Learning factor analysis (LFA)
- Proposed approach (independent of human-provided factors)
  - Use a machine-learning agent, SimStudent, to acquire knowledge
  - 1 production rule acquired => 1 KC in student model (Q matrix)

4. A Brief Review of SimStudent
- A machine-learning agent that
  - acquires production rules from examples and problem-solving experience
  - given a set of feature predicates and functions

5. Production Rules
Skill divide (e.g. -3x = 6)
  What: Left side (-3x), Right side (6)
  When: Left side (-3x) does not have a constant term
  =>
  How: Get-coefficient (-3) of left side (-3x); divide both sides by the coefficient
- Each production rule is associated with one KC
- Each step (-3x = 6) is labeled with one KC, decided by the production applied to that step
- The original model required strong domain-specific operators, like Get-coefficient
  -> It does not capture important distinctions in learning (e.g., -x = 3 vs -3x = 6)

6. Deep Feature Learning
- Expert vs novice (Chi et al., 1981)
  - Example: What's the coefficient of -3x?
  - Expert uses deep functional features to reply -3
  - Novice may use shallow perceptual features to reply 3
- Model deep feature learning using machine learning techniques
- Integrate acquired knowledge into SimStudent learning
- Remove dependence on strong operators and split KCs into finer grain sizes

10. Example of Production Rules Before and After Integration
Extend the "What" part in the production rule.

Original:
Skill divide (e.g. -3x = 6)
  What: Left side (-3x), Right side (6)
  When: Left side (-3x) does not have constant term
  =>
  How: Get coefficient (-3) of left side (-3x); Divide both sides with the coefficient (-3)

Extended:
Skill divide (e.g. -3x = 6)
  What: Left side (-3, -3x), Right side (6)
  When: Left side (-3x) does not have constant term
  =>
  How: Get coefficient (-3) of left side (-3x); Divide both sides with the coefficient (-3)

- Fewer operators
- Eliminate need for domain-specific operators

12. Experiment Method
- SimStudent vs. human-generated model
- Code real student data
  - 71 students used a Carnegie Learning Algebra I Tutor on equation solving
- SimStudent:
  - Tutored by a Carnegie Learning Algebra I Tutor
  - Coded each step by the applicable production rule
  - Used human-generated coding in case of no applicable production
- Human-generated model:
  - Coded manually based on expertise

13. Human-generated vs SimStudent KCs

                                           Human-generated   SimStudent   Comment
Total # of KCs                             12                21
# of Basic Arithmetic Operation KCs        4                 13           Split into finer grain sizes based on different problem forms
# of Typein KCs                            4                 4            Approximately the same
# of Other Transformation Operation KCs    4                 4            Approximately the same
  (e.g. combine like terms)

14. How Well the Two Models Fit Real Student Data
- Used the Additive Factor Model (AFM)
  - An instance of logistic regression that
  - uses each student, each KC, and the KC-by-opportunity interaction as independent variables
  - to predict the probability of a student making an error on a specific step
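As a rough illustration of the AFM bullets above, the sketch below uses the usual AFM form, in which the log-odds of a correct step are the student's proficiency plus, for every KC on the step, an easiness term and a learning-rate term multiplied by the number of prior opportunities. The function name, parameter names, and numeric values are hypothetical and are not taken from the study; the error probability is simply one minus the success probability.

```python
import math

def afm_error_probability(student_proficiency, kc_params, step_kcs, opportunities):
    """Error probability for one step under an AFM-style logistic model.

    kc_params:     dict KC -> (easiness beta, learning rate gamma)  [hypothetical values]
    step_kcs:      KCs that label this step (one row of the Q matrix)
    opportunities: dict KC -> number of prior practice opportunities for this student
    """
    logit_correct = student_proficiency
    for kc in step_kcs:
        beta, gamma = kc_params[kc]
        logit_correct += beta + gamma * opportunities.get(kc, 0)
    p_correct = 1.0 / (1.0 + math.exp(-logit_correct))
    return 1.0 - p_correct

# Illustrative parameters only; real values come from fitting the regression.
example_params = {"simSt-divide": (0.5, 0.1)}
p_first = afm_error_probability(0.0, example_params, ["simSt-divide"], {"simSt-divide": 0})
p_later = afm_error_probability(0.0, example_params, ["simSt-divide"], {"simSt-divide": 5})
```

With a positive learning rate, the predicted error probability falls as opportunities accumulate, which is why the choice of KC labels changes how well the model fits a student's error curve.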

15. An Example of Split in Division
- Human-generated model: divide covers Ax=B and -x=A
- SimStudent: simSt-divide covers Ax=B; simSt-divide-1 covers -x=A

KC               Ax=B steps   -x=A steps
divide           1111111      111
simSt-divide     1111111      000
simSt-divide-1   0000000      111
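The coding on this slide can be read as rows of a Q matrix over ten steps. The sketch below rebuilds those rows under the assumption that the first seven steps have the form Ax=B and the last three have the form -x=A; the helper name `q_matrix` is hypothetical.

```python
# Ten illustrative steps: seven of the form Ax=B followed by three of the form -x=A.
steps = ["Ax=B"] * 7 + ["-x=A"] * 3

def q_matrix(step_forms, kc_forms):
    """Return dict KC -> list of 0/1 flags, one per step, given the forms each KC covers."""
    return {
        kc: [1 if form in forms else 0 for form in step_forms]
        for kc, forms in kc_forms.items()
    }

human_q = q_matrix(steps, {"divide": {"Ax=B", "-x=A"}})
simst_q = q_matrix(steps, {"simSt-divide": {"Ax=B"}, "simSt-divide-1": {"-x=A"}})

for kc, row in {**human_q, **simst_q}.items():
    print(kc.ljust(16), "".join(map(str, row)))
```

The human-generated model treats all ten steps as practice on one KC, while the SimStudent model counts opportunities for the two forms separately.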

16. Production Rules for Division
Skill simSt-divide (e.g. -3x = 6)
  What: Left side (-3, -3x), Right side (6)
  When: Left side (-3x) does not have constant term
  How: Divide both sides with the coefficient (-3)

Skill simSt-divide-1 (e.g. -x = 3)
  What: Left side (-x), Right side (3)
  When: Left side (-x) is of the form -v
  How: Generate one (1); Divide both sides with -1
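The two skills above can be mimicked with a small rule matcher. This is only a sketch of the what/when/how structure, not SimStudent's actual rule representation: the function `divide_step`, the regular expression, and the restriction to integer coefficients with a single-letter variable are all assumptions made for illustration.

```python
import re
from fractions import Fraction

def divide_step(left, right):
    """Return (skill, new_left, new_right) for the equation left = right, or None.

    left is a string such as "-3x" or "-x"; right is an integer.
    """
    # "When" test: the left side is a single term with no constant part.
    match = re.fullmatch(r"(-?\d*)([a-z])", left)
    if match is None:
        return None
    coeff_text, variable = match.groups()
    if coeff_text == "-":
        # Form -v: the coefficient is not visible, so the skill generates 1 and divides by -1.
        skill, coeff = "simSt-divide-1", -1
    elif coeff_text == "":
        return None  # coefficient is already 1, nothing to divide
    else:
        # The coefficient is read directly from the "what" part (e.g. -3 in -3x).
        skill, coeff = "simSt-divide", int(coeff_text)
        if coeff == 0:
            return None
    return skill, variable, Fraction(right) / coeff

print(divide_step("-3x", 6))   # labeled with simSt-divide
print(divide_step("-x", 3))    # labeled with simSt-divide-1
```

Each step is labeled by whichever rule fires, which is how the two problem forms end up as separate KCs.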

17. An Example without Split in Divide Typein
- Human-generated model: divide-typein
- SimStudent: simSt-divide-typein

KC                     Step coding
divide-typein          111111111
simSt-divide-typein    111111111

18. SimStudent vs SimStudent + Feature Learning
- SimStudent
  - Needs strong operators
  - Constructs student models similar to the human-generated model
- Extended SimStudent
  - Only requires weak operators
  - Splits KCs into finer grain sizes based on different parse trees
- Does Extended SimStudent produce a KC model that better fits student learning data?

19. Results

                                  Human-generated Model   SimStudent
AIC                               6529                    6448
3-Fold Cross Validation RMSE      0.4034                  0.3997

Significance test
- SimStudent outperforms the human-generated model in 4260 out of 6494 steps (p < 0.001)
- SimStudent outperforms the human-generated model across 20 runs of cross validation (p < 0.001)
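The slide does not say which significance test was used. As a plausibility check on the step-level count only, the sketch below assumes a two-sided sign test with a normal approximation: under the null hypothesis each model is equally likely to give the better prediction on a step. The function names are hypothetical.

```python
import math

def sign_test_z(wins, n):
    """z statistic for H0: each of the n steps is a fair coin flip between the two models."""
    mean = n / 2
    std_dev = math.sqrt(n) / 2
    return (wins - mean) / std_dev

def two_sided_p(z):
    """Two-sided tail probability of a standard normal variable."""
    return math.erfc(abs(z) / math.sqrt(2))

z = sign_test_z(4260, 6494)
p = two_sided_p(z)
print(f"z = {z:.2f}, p < 0.001: {p < 0.001}")
```

Under this assumed test, winning 4260 of 6494 steps lies far outside what chance would produce, which is consistent with the reported p < 0.001.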

20. Summary
- Presented an innovative application of a machine-learning agent, SimStudent, to the automatic discovery of student models.
- Showed that a SimStudent-generated student model was a better predictor of real student learning behavior than a human-generated model.

21. Future Studies
- Test generality on other datasets in DataShop
- Apply the proposed approach in other domains
  - Stoichiometry
  - Fraction addition


24. Feature Recognition as PCFG Induction
- Underlying structure in the problem -> Grammar
- Feature -> Non-terminal symbol in a grammar rule
- Feature learning task -> Grammar induction
- Student errors -> Incorrect parsing

25. Learning Problem
- Input: a set of feature recognition records, each consisting of
  - An original problem (e.g. -3x)
  - The feature to be recognized (e.g. -3 in -3x)
- Output
  - A probabilistic context-free grammar (PCFG)
  - A non-terminal symbol in a grammar rule that represents the target feature

26. A Computational Model of Deep Feature Learning
- Extended a PCFG learning algorithm (Li et al., 2009)
  - Feature learning
  - Stronger prior knowledge: transfer learning using prior knowledge

27. A Two-Step PCFG Learning Algorithm
- Greedy Structure Hypothesizer (GSH):
  - Hypothesizes grammar rules in a bottom-up fashion
  - Creates non-terminal symbols for frequently occurring sequences
  - E.g. "-" and 3, SignedNumber and Variable
- Viterbi training phase:
  - Refines rule probabilities
  - Rules that occur more frequently -> higher probabilities
- Generalizes the Inside-Outside Algorithm (Lari & Young, 1990)
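The bottom-up idea behind the structure hypothesizer can be illustrated with a much simpler stand-in: repeatedly turn the most frequent adjacent pair of symbols into a new non-terminal. This is not the algorithm of Li et al. (2009); the function names, the `min_count` threshold, and the generated symbol names N1, N2 are assumptions, and the Viterbi training phase is not shown.

```python
from collections import Counter

def merge_pair(sequence, pair, symbol):
    """Replace each left-to-right occurrence of pair in sequence with symbol."""
    merged, i = [], 0
    while i < len(sequence):
        if tuple(sequence[i:i + 2]) == pair:
            merged.append(symbol)
            i += 2
        else:
            merged.append(sequence[i])
            i += 1
    return merged

def hypothesize_rules(sequences, min_count=2):
    """Greedily create a non-terminal for the most frequent adjacent pair, bottom-up."""
    sequences = [list(seq) for seq in sequences]
    rules = {}
    while True:
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        pair, count = pair_counts.most_common(1)[0]
        if count < min_count:
            break
        symbol = f"N{len(rules) + 1}"
        rules[symbol] = pair
        sequences = [merge_pair(seq, pair, symbol) for seq in sequences]
    return rules, sequences

observations = [["-", "Number", "Variable"], ["-", "Number", "Variable"], ["-", "Number"]]
rules, reduced = hypothesize_rules(observations)
print(rules)
```

On these toy observations, the first non-terminal created groups "-" with Number (playing the role of SignedNumber), and the second groups that symbol with Variable.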

28. Feature Learning
- Build the most probable parse trees
  - For all observation sequences
- Select a non-terminal symbol that
  - Matches the most training records as the target feature
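The selection step can be sketched as a vote over parse trees. The example assumes the most probable parse trees are already available and represents each one as a nested tuple `(symbol, child, ...)` with strings as leaves; that representation, the symbol names, and the function names are hypothetical.

```python
from collections import Counter

def tree_yield(node):
    """Concatenate the leaf strings under a node."""
    if isinstance(node, str):
        return node
    return "".join(tree_yield(child) for child in node[1:])

def matching_symbols(node, target):
    """Set of non-terminal symbols in the tree whose yield equals the target feature."""
    if isinstance(node, str):
        return set()
    found = {node[0]} if tree_yield(node) == target else set()
    for child in node[1:]:
        found |= matching_symbols(child, target)
    return found

def select_feature_symbol(records):
    """Pick the non-terminal that matches the target feature in the most records."""
    votes = Counter()
    for tree, target in records:
        votes.update(matching_symbols(tree, target))
    return votes.most_common(1)[0][0] if votes else None

records = [
    (("Expression", ("SignedNumber", ("Sign", "-"), ("Number", "3")), ("Variable", "x")), "-3"),
    (("Expression", ("SignedNumber", ("Sign", "-"), ("Number", "5")), ("Variable", "y")), "-5"),
    (("Expression", ("Number", "4"), ("Variable", "x")), "4"),
]
print(select_feature_symbol(records))
```

Here SignedNumber matches the coefficient in two of the three records, so it is chosen as the symbol that represents the target feature.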

29. Transfer Learning Using Prior Knowledge
- GSH phase:
  - Build parse trees based on some previously acquired grammar rules
  - Then call the original GSH
- Viterbi training:
  - Add rule frequency in the previous task to the current task
(Figure: example rule probabilities 0.66, 0.33, 0.5)
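The Viterbi-training bullet can be read as pooling rule counts across tasks before normalizing. The sketch below assumes that rule probabilities are relative frequencies among rules sharing a left-hand side; the function name, the rule encoding as `(lhs, rhs)` tuples, and the counts are made up for illustration and do not reproduce the numbers in the slide's figure.

```python
from collections import Counter, defaultdict

def rule_probabilities(current_counts, prior_counts=None):
    """Normalize rule counts per left-hand side, optionally adding counts from a prior task."""
    totals = Counter(current_counts)
    if prior_counts:
        totals.update(prior_counts)  # add frequencies carried over from the previous task
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), count in totals.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]] for rule, count in totals.items()}

# Hypothetical counts: the current task alone cannot tell the two expansions apart.
current = {
    ("SignedNumber", ("-", "Number")): 1,
    ("SignedNumber", ("Number",)): 1,
}
prior = {("SignedNumber", ("-", "Number")): 2}

without_transfer = rule_probabilities(current)
with_transfer = rule_probabilities(current, prior)
print(with_transfer)
```

Counts carried over from the earlier task shift probability toward rules that were already useful, so the learner starts the new task with a stronger prior.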

