Choosing Sample Size for Knowledge Tracing Models DERRICK COETZEE
Motivation
◦ BKT parameters are inferred from data
◦ But the best solution for a given data set may not quite match the parameters that actually generated it (sampling error)
◦ Example data: 0,0,0,0,0   0,1,1,0,1   0,1,0,0,0   0,0,1,1,0 (5 students, 5 problems each, 25 bits of data)
◦ Example parameters: prior = 0.205, learning = 0.010, guess = 0.142, slip = 0.031 (4 parameters, 3 decimal digits each, 39.9 bits of data)
◦ Not even possible for all parameter sets to be represented!
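A quick check of the bit counts above, as a minimal sketch that only restates the slide's arithmetic:

```python
import math

# 5 students x 5 binary responses = 25 bits of observed data
data_bits = 5 * 5

# 4 parameters specified to 3 decimal digits each ~= 39.9 bits
param_bits = 4 * math.log2(10 ** 3)

print(data_bits, round(param_bits, 1))  # 25 39.9
```

Since the parameters take more bits to specify than the data provides, a data set this small cannot distinguish every representable parameter setting.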
Questions
◦ So how much data is needed for accurate estimates?
◦ And do the parameter values affect how much you need?
◦ Can we give confidence intervals for parameters?
Normal distribution over samples
◦ Mean is almost always near true generating value
◦ Standard deviation can be used to describe variation of estimates
◦ Can use 68–95–99.7 rule for confidence intervals
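A minimal sketch of how the 68–95–99.7 rule turns the spread of repeated estimates into confidence intervals; the numbers below are illustrative, not results from the slides:

```python
import statistics

# Illustrative only: "guess" estimates from repeatedly fitting BKT to
# synthetic data sets generated from the same true parameters.
guess_estimates = [0.139, 0.145, 0.151, 0.138, 0.142, 0.147, 0.136, 0.144]

mean = statistics.mean(guess_estimates)
sd = statistics.stdev(guess_estimates)

# Normal-approximation intervals from the 68-95-99.7 rule.
ci_68 = (mean - sd, mean + sd)
ci_95 = (mean - 2 * sd, mean + 2 * sd)
ci_997 = (mean - 3 * sd, mean + 3 * sd)
```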
Variation does depend on parameter values
◦ Each parameter behaves differently
◦ Best estimates for parameters near zero/one, worst in the 0.5–0.8 range
There are interactions between parameter values
◦ Can’t just precompute a table of stddevs for each parameter
◦ Complex relationship, analytical approach probably infeasible
◦ But at least there is continuity with small rates of change
Sample size recommendations
◦ Stddev proportional to 1/sqrt(n)
◦ Must increase sample size by factor of 4 to improve error by factor of 2
◦ Small data sets (<1000 students) will not give even one sigfig in all parameters
◦ Question systems based on small classes!
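A small sketch of the 1/sqrt(n) scaling; the measured stddev, current sample size, and target precision below are illustrative assumptions:

```python
import math

def required_sample_size(sd_now, n_now, sd_target):
    """Sample size needed to reach a target stddev, assuming stddev ~ 1/sqrt(n)."""
    return math.ceil(n_now * (sd_now / sd_target) ** 2)

# Halving the error takes 4x the students: e.g. 250 -> 1000 here.
print(required_sample_size(sd_now=0.04, n_now=250, sd_target=0.02))  # 1000
```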
No interaction between sample size and parameters
◦ Change sample size without changing parameters → predictable variation in error
◦ Gives an approach to estimate error on real-world data sets:
◦ Take samples with replacement, infer parameters for each, compute stddev
◦ Scale using 1/sqrt(n) to estimate stddevs at other sample sizes
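A minimal sketch of the resampling procedure above. The fit_bkt function is a hypothetical stand-in for an EM-based BKT fitter that takes per-student response sequences and returns a dict of parameter estimates:

```python
import math
import random
import statistics

def bootstrap_param_stddev(student_sequences, fit_bkt, n_boot=100):
    """Resample students with replacement, refit BKT each time, and report
    the standard deviation of each parameter across the refits."""
    fits = []
    for _ in range(n_boot):
        sample = random.choices(student_sequences, k=len(student_sequences))
        fits.append(fit_bkt(sample))  # fit_bkt is assumed, e.g. an EM routine
    return {p: statistics.stdev(f[p] for f in fits) for p in fits[0]}

def scale_stddev(sd, n_current, n_target):
    """Predict the stddev at another sample size via the 1/sqrt(n) relationship."""
    return sd * math.sqrt(n_current / n_target)
```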
Knowledge Tracing for Interacting Student Pairs DERRICK COETZEE
Motivation
◦ Standard Bayesian knowledge tracing uses fixed learning rate parameter to capture all learning
Motivation
◦ One way to improve: use information on course materials viewed
Motivation
◦ What about peer interaction (e.g. forums/chat)?
◦ Not fixed/static like instructional materials
◦ The level of knowledge of the other student is important
◦ Use our BKT model of the other student’s knowledge!
Pair interaction scenario
◦ Simple case of student interaction
◦ Two students are paired and always interact between consecutive items (no interactions with others)
◦ Cycle: do exercise → learn independently → interact with partner → repeat
Pair interaction scenario
◦ Model independent learning and interaction stages
◦ New parameters: teach, mislead

Knows   Other student knows   Probability knows after interaction
No      No                    0
Yes     Yes                   1
No      Yes                   teach
Yes     No                    1 − mislead
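A minimal sketch (not from the slides) of how the interaction stage could update the probability that a student knows the skill, marginalizing over the partner's knowledge according to the table above:

```python
def interaction_update(p_know, p_partner_knows, teach, mislead):
    """Probability the student knows the skill after one interaction stage,
    marginalizing over both students' binary knowledge states."""
    # Already knows: stays knowing unless misled by a partner who does not know.
    stays = p_know * (p_partner_knows + (1 - p_partner_knows) * (1 - mislead))
    # Does not know: learns only from a partner who knows, with probability teach.
    learns = (1 - p_know) * p_partner_knows * teach
    return stays + learns
```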
Results: Preliminary simulations
◦ 5-parameter system (prior, learn, guess, slip, teach)
◦ forget, mislead parameters fixed at zero
◦ Generate synthetic data, run EM from generating values
◦ Same behavior as classic system when teach = 0
◦ Unstable when teach > 0
◦ Converges to trivial solution: prior = learn = teach = 1, slip = proportion of incorrect responses
◦ Occurs for both small and large teach parameters
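A hedged note on why that trivial solution fits the data at all: with prior = 1 (and no forgetting), every student is always in the known state, so the model reduces to a single Bernoulli in which each response is correct with probability 1 − slip. A minimal sketch of that degenerate likelihood:

```python
import math

def trivial_solution_loglik(responses, slip):
    """Log-likelihood when prior = learn = teach = 1: every response is
    correct with probability 1 - slip, independent of history."""
    correct = sum(responses)            # responses: list of 0/1 per item
    incorrect = len(responses) - correct
    return correct * math.log(1 - slip) + incorrect * math.log(slip)

# This Bernoulli likelihood is maximized when slip equals the overall
# proportion of incorrect responses (illustration; assumes 0 < slip < 1).
```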
Results: Preliminary simulations
◦ 4-parameter system (learn, guess, slip, teach)
◦ forget, mislead, prior fixed at zero
◦ For small teach values (e.g. 0.05), teach converges to zero
◦ Yields nontrivial solutions for large teach values, but other parameters absorb some of the teach:
◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 100 students → learn=0.1586, guess=0.1648, slip=0.0856, teach=0.6481
◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 1000 students → learn=0.1643, guess=0.1940, slip=0.1102, teach=0.7225
Results: Preliminary simulations
◦ 4-parameter system (learn, guess, slip, teach) with 10000 students and high teach
◦ prior=0.0000, learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000 → prior=0.2184, learn=0.0841, guess=0.1239, slip=0.2658, teach=0.8793
◦ prior and slip have high error, but learn/guess/teach are good
◦ teach accuracy increases dramatically with sample size
Possible solutions
◦ Answer items between independent learning and interaction (more observed data)
◦ Mentor/mentee model: knowledge flows in only one direction
◦ Eliminate different parameters, or combine parameters to create lower-dimensional space
Future work
◦ Determine whether interaction model produces better predictions on synthetic data
◦ Gather real-world pair interaction data using MOOCchat tool
◦ Determine whether pair interaction produces better predictions
◦ Typical values, appropriate interpretations for teach and mislead parameters?
◦ Generalize to more complex interactions