Detecting the Learning Value of Items In a Randomized Problem Set

Similar presentations
On Comparing Classifiers : Pitfalls to Avoid and Recommended Approach
Navigating the parameter space of Bayesian Knowledge Tracing models Visualizations of the convergence of the Expectation Maximization algorithm Zachary.
Modeling Student Knowledge Using Bayesian Networks to Predict Student Performance By Zach Pardos, Neil Heffernan, Brigham Anderson and Cristina Heffernan.
An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.
Effective Skill Assessment Using Expectation Maximization in a Multi Network Temporal Bayesian Network By Zach Pardos, Advisors: Neil Heffernan, Carolina.
Matching Experiment Class Results. Experiment Analyzed 114 subjects after removal of subjects who completed fewer than 4 problems 8 problems 2.
Computer Science Department Jeff Johns Autonomous Learning Laboratory A Dynamic Mixture Model to Detect Student Motivation and Proficiency Beverly Woolf.
1 Lecture 5: Automatic cluster detection Lecture 6: Artificial neural networks Lecture 7: Evaluation of discovered knowledge Brief introduction to lectures.
1 Validation and Verification of Simulation Models.
Berkeley Parlab 1. INTRODUCTION A Comparison of Error Metrics for Learning Model Parameters in Bayesian Knowledge Tracing 2. CORRELATIONS TO THE GROUND.
Experimental Evaluation
Determining the Significance of Item Order In Randomized Problem Sets Zachary A. Pardos, Neil T. Heffernan Worcester Polytechnic Institute Department of.
Chapter Nine: Evaluating Results from Samples Review of concepts of testing a null hypothesis. Test statistic and its null distribution Type I and Type.
The Formative Assessment Cycle Solve a selection of problems of a given skill Analysis Students are instantly told if their answers on ASSISTment are correct.
Topics: Statistics & Experimental Design The Human Visual System Color Science Light Sources: Radiometry/Photometry Geometric Optics Tone-transfer Function.
1 CS 391L: Machine Learning: Experimental Evaluation Raymond J. Mooney University of Texas at Austin.
Student Preferences For Learning College Algebra in a Web Enhanced Environment Dr. Laura J. Pyzdrowski, Pre-Collegiate Mathematics Coordinator Institute.
Reserve Variability – Session II: Who Is Doing What? Mark R. Shapland, FCAS, ASA, MAAA Casualty Actuarial Society Spring Meeting San Juan, Puerto Rico.
Statistical Inference for the Mean Objectives: (Chapter 9, DeCoursey) -To understand the terms: Null Hypothesis, Rejection Region, and Type I and II errors.
An Investigation of Commercial Data Mining Presented by Emily Davis Supervisor: John Ebden.
Sampling and estimation Petter Mostad
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Test of a Population Median. The Population Median (  ) The population median ( , P 50 ) is defined for population T as the value for which the following.
Data Science Credibility: Evaluating What’s Been Learned
Chapter 8: Estimating with Confidence
One-Sample Tests of Hypothesis
How to interact with the system?
CS4311 Spring 2011 Process Improvement Dr
Rule Induction for Classification Using
Binomial Heaps On the surface it looks like Binomial Heaps are great if you have no remove mins. But, in this case you need only keep track of the current.
Using Bayesian Networks to Predict Test Scores
Statistical Process Control
CONCEPTS OF HYPOTHESIS TESTING
Professor S K Dubey,VSM Amity School of Business
Mingyu Feng Neil Heffernan Joseph Beck
Hidden Markov Models Part 2: Algorithms
Probability Probability underlies statistical inference - the drawing of conclusions from a sample of data. If samples are drawn at random, their characteristics.
Daniela Stan Raicu School of CTI, DePaul University
Discrete Event Simulation - 4
Jeffrey E. Korte, PhD BMTRY 747: Foundations of Epidemiology II
Addressing the Assessing Challenge with the ASSISTment System
Knowledge Tracing Parameters can be learned with the EM algorithm!
Four-Cut: An Approximate Sampling Procedure for Election Audits
The Behavior of Tutoring Systems
Neil T. Heffernan, Joseph E. Beck & Kenneth R. Koedinger
Reasoning in Psychology Using Statistics
Lecture 7 Sampling and Sampling Distributions
Lecture Slides Elementary Statistics Twelfth Edition
Inductive and Deductive Reasoning
pairing data values (before-after, method1 vs
Matching Experiment Class Results.
Machine Learning: Lecture 5
Presentation transcript:

Detecting the Learning Value of Items In a Randomized Problem Set Zachary A. Pardos, Neil T. Heffernan Worcester Polytechnic Institute Department of Computer Science

The Problem
What is the learning value of content in an ITS? Does it promote learning? Ways to find out:
- Run a randomized controlled experiment (RCE)
- Data mine responses (using this method)
The scale: 100s of items of learning content and 1000s of students' responses. The pile of learning content (100s of ASSISTments) was created by IQP students, Cris, Leena, and me, not by teachers in schools. Is it any good? It could all be great or all terrible; we are looking for the best content relative to the rest. What content is at the top of the heap? (conical two triangles example)

Dataset
- Student main-problem responses (correct/incorrect) to 25 problem sets of 2, 3, and 4 questions
- Questions within a problem set relate to the same skill
- 160-800 students completed each problem set in the 2006-2007 school year data
- 2,400 students total with 54,000 responses (14-16 year olds)
- Questions in the problem sets were presented in a randomized order (required for this analysis)
[Figure: a main problem and its hint]

Confound
Since only main question responses are being analyzed, the learning from the main question is confounded with the learning from the scaffolding and hints of the problem. Learning could be attributed to:
- The immediate feedback on the main problem of question 1
- The scaffolding of question 1
- Applying concepts from question 1 to the next main problem
[Figure: a main problem and its hint]

Model
- Modeling or measuring learning requires modeling knowledge
- Knowledge Tracing is used to model learning
- Parameters: the probability of learning P(Skill: 0 → 1) and the guess/slip probabilities P(correct | Skill = 0) and P(incorrect | Skill = 1)
- Parameters can be learned with the EM algorithm! (a sketch of the forward pass follows this slide)
[Figure: a chain of latent, dichotomous skill-knowledge nodes S linked by learning transitions P(Skill: 0 → 1), each emitting an observed question answer, e.g. incorrect, correct, correct]
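
A minimal sketch of the standard knowledge-tracing forward pass described on this slide, written in Python; the function names and parameter values (prior, learn, guess, slip) are illustrative assumptions, not the authors' code:

```python
# Minimal sketch of standard knowledge tracing (illustrative values, not the authors' code).

def kt_posterior(p_know, correct, guess, slip):
    """Bayesian update of P(skill known) after one observed response."""
    if correct:
        return p_know * (1 - slip) / (p_know * (1 - slip) + (1 - p_know) * guess)
    return p_know * slip / (p_know * slip + (1 - p_know) * (1 - guess))

def trace(responses, prior=0.6, learn=0.1, guess=0.14, slip=0.1):
    """Forward pass over a sequence of correct/incorrect main-question responses."""
    p_know = prior
    for correct in responses:
        p_know = kt_posterior(p_know, correct, guess, slip)
        # Learning transition P(Skill: 0 -> 1) between opportunities.
        p_know = p_know + (1 - p_know) * learn
    return p_know

# Example matching the slide's observation sequence: incorrect, correct, correct.
print(trace([False, True, True]))
```

EM would fit the prior, learn, guess, and slip values that make observed response sequences like this most likely; the forward pass above is the piece such a likelihood is built from.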

Model
- Knowledge Tracing assumes that the learning rate is the same between each opportunity
- Our model associates the learning rate with the particular problem that was encountered (see the sketch below)
- Knowledge Tracing: the learning rate between opportunities is the same regardless of which problem the student saw (e.g. 0.12, 0.12)
- Our Item Effect Model: learning rates are an attribute of specific problems and must be associated with the problem across all permutations (e.g. 0.11, 0.15)
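
A sketch of the item-effect idea under the same assumed parameterization: the only change from standard knowledge tracing is that the transition uses a per-item learning rate (the dictionary below is hypothetical):

```python
# Sketch of the Item Effect Model transition: the learning rate is an attribute
# of the specific problem just encountered (illustrative, not the authors' code).

def trace_item_effect(item_sequence, responses, item_learn_rates,
                      prior=0.6, guess=0.14, slip=0.1):
    p_know = prior
    for item, correct in zip(item_sequence, responses):
        # Bayesian update of P(skill known) given the observed response.
        if correct:
            p_know = p_know * (1 - slip) / (p_know * (1 - slip) + (1 - p_know) * guess)
        else:
            p_know = p_know * slip / (p_know * slip + (1 - p_know) * (1 - guess))
        # Item-specific learning transition, in the spirit of the 0.11 / 0.15 example.
        p_know = p_know + (1 - p_know) * item_learn_rates[item]
    return p_know

rates = {"q1": 0.11, "q2": 0.15, "q3": 0.12}  # hypothetical per-item learning rates
print(trace_item_effect(["q2", "q1", "q3"], [False, True, True], rates))
```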

Model
The permutations of a three-question sequence are modeled with shared Bayesian parameters, also known as equivalence classes of CPTs (conditional probability tables).
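
A small illustration of that parameter tying: every ordering of the problem set is modeled as its own chain, but the learning-rate CPT that follows a given question is shared by all chains containing that question (names here are illustrative):

```python
from itertools import permutations

# Each permutation of a three-question problem set is its own sequence network,
# but the transition CPT after a given question is tied to that question everywhere.
items = ["q1", "q2", "q3"]
for order in permutations(items):
    tied = " -> ".join(f"learn[{item}]" for item in order)
    print(list(order), "shares CPTs:", tied)
```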

Reliability measure
- Data for a problem set was randomly split into 20 equal-size bins by student
- Each bin was evaluated separately by the model
- A binomial test was used to estimate the probability of the null hypothesis that each item is equally likely to have the highest learning rate, i.e. binopdf(best_choice_mode, 20, 0.25) (a sketch of this check appears after the table)

Item learning rates
Split     Item 1   Item 2   Item 3   Item 4
Split 1   0.0732   0.0267   0.0837   0.0701
...
Split 20  0.0849   0.0512   0.0550   0.0710
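
A sketch of that reliability check using scipy's binomial pmf as a stand-in for the slide's MATLAB-style binopdf call; the per-split winners below are hypothetical, and in practice each entry would come from a separate model fit:

```python
from collections import Counter
from scipy.stats import binom

# Hypothetical "best item" per split; in practice, one value per EM fit on each of the 20 bins.
best_per_split = [1, 1, 3, 1, 1, 2, 1, 1, 1, 4, 1, 1, 3, 1, 1, 1, 2, 1, 1, 1]

best_item, wins = Counter(best_per_split).most_common(1)[0]

# Probability under the null hypothesis that every item is equally likely (0.25 for a
# four-item set) to win a split; mirrors binopdf(best_choice_mode, 20, 0.25).
p_null = binom.pmf(wins, 20, 0.25)
print(best_item, wins, p_null)
```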

Method Application
Compute the learning rates of the three questions in the problem set. Which one is BEST?
Definition of BEST in this analysis: the question in a problem set that has the highest probability of learning (sketched below).
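
That definition, as a one-liner over hypothetical learned rates:

```python
# BEST = the question with the highest learned probability of learning (rates are hypothetical).
learning_rates = {"q1": 0.09, "q2": 0.13, "q3": 0.10}
best = max(learning_rates, key=learning_rates.get)
print(best)  # -> q2
```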

Results
Problem sets with four questions were analyzed; the parameters of prior, guess/slip, and learning rates were learned using the described method, and the question with the highest probability of learning was identified.

Problem set  Number of users  Best question  p value  prior   q1 rate  q2 rate  q3 rate  q4 rate
16           800              2              0.0652   0.6738  0.1100   0.1115   0.1017   0.1011
11           560              4              0.0170   0.5909  0.0958   0.0916   0.0930   0.1039
14           480              3              -        0.6499  0.1365   0.0977   0.1169   0.1063
25           440              1              -        0.7821  0.1392   0.0848   0.1157   0.1242
282          220              -              0.0039   0.7365  0.1574   0.0999   0.0991   0.1004
33           200              -              0.4394   0.7205  0.1124   0.1028   0.1237   0.1225
39           160              -              -        0.6180  0.0853   0.1192   0.1015   0.0819

The method needs validation: another method may report different results and reliability, and ground truth of the parameters is necessary to validate the method and its results.

Simulation Validation
- Since the ground truth of learning rates in the real world is impossible to know, a simulation study was run
- The simulation set a variety of values for the parameters of prior, guess/slip, and learning rates and then simulated user responses (sketched below)
- These responses could then be analyzed by the method using the same technique as was used on the real data
- An error analysis could be done since the underlying simulation parameters were known (did the method pick the right best question?)
- This is an opportunity to learn what the method can and can't do
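
A sketch of the response-generation step under the assumed parameterization (prior, guess/slip, per-item learning rates); the values and function are illustrative, not the authors' simulation code:

```python
import random

def simulate_student(order, item_learn_rates, prior=0.6, guess=0.14, slip=0.1):
    """Draw one student's responses from known ('ground truth') parameters."""
    known = random.random() < prior
    responses = []
    for item in order:
        p_correct = (1 - slip) if known else guess
        responses.append(random.random() < p_correct)
        # The student may learn from this item, with that item's true learning rate.
        if not known and random.random() < item_learn_rates[item]:
            known = True
    return responses

true_rates = {"q1": 0.08, "q2": 0.16, "q3": 0.10, "q4": 0.09}  # illustrative ground truth
order = random.sample(list(true_rates), k=len(true_rates))     # randomized presentation order
print(order, simulate_student(order, true_rates))
```

The best item recovered by the analysis can then be compared against the item whose true learning rate was highest.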

Simulation Results
- More students increases the chance of a reliable result
- A larger learning difference between questions also increases the chance of a result
- Of the 160 experiments evaluated, 89 were reported as reliable (56%)
- Of the 89 reported reliable results (using p < 0.05), seven were incorrect (a 7.8% false positive rate)

Limitations
- Only problem sets of five questions or fewer can be reasonably evaluated; larger problem sets become intractable to compute due to the exponential increase in nodes and permutations as the question count increases (see the node-count sketch below)
- For a four-question set: (4+4) * 24 = 192 nodes
- For a five-question set: (5+5) * 120 = 1,200 nodes
- A possible optimization is to model only the sequences for which there is data
- Randomization of question order must be present to control for factors including problem difficulty and to allow detecting learning rates of all item pairs in the problem set
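
The node-count arithmetic from this slide, written out as a short check: each of the q! orderings gets its own chain of q knowledge nodes plus q question nodes, i.e. (q + q) * q! nodes in total:

```python
from math import factorial

# (q + q) * q! nodes: q knowledge nodes + q question nodes per ordering, q! orderings.
for q in range(2, 7):
    print(q, "questions ->", (q + q) * factorial(q), "nodes")
```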

Contribution
- No previous method estimates learning rates per problem
- Allows the best (and worst) content to be identified without RCEs
- Extends knowledge tracing to support randomization of problem order
- Uses permutations of sequences to estimate stable Bayesian parameters with EM

Conclusions & Future Work
- We think that this method, and ones built on it, will facilitate better tutoring systems
- Randomization gives many of the properties of an RCE; this method can perform a similar function, in the form of data mining, to find what content works best
- The method could also be applied to help improve the accuracy of question skill tagging