Core Methods in Educational Data Mining


Core Methods in Educational Data Mining EDUC545 Spring 2017

What is the Goal of Knowledge Inference?
Measuring what a student knows at a specific time
Measuring what relevant knowledge components a student knows at a specific time

Why is it useful to measure student knowledge?

Key assumptions of BKT
Assess a student’s knowledge of skill/KC X
Based on a sequence of items that are scored between 0 and 1
    Classically 0 or 1, but there are variants that relax this
Where each item corresponds to a single skill
Where the student can learn on each item, due to help, feedback, scaffolding, etc.

Key assumptions of BKT
Each skill has four parameters
From these parameters, and the pattern of successes and failures the student has had on each relevant skill so far, we can compute
    Latent knowledge P(Ln)
    The probability P(CORR) that the learner will get the item correct

Key assumptions of BKT
Two-state learning model
    Each skill is either learned or unlearned
In problem-solving, the student can learn a skill at each opportunity to apply the skill
A student does not forget a skill, once he or she knows it

Model Performance Assumptions
If the student knows a skill, there is still some chance the student will slip and make a mistake.
If the student does not know a skill, there is still some chance the student will guess correctly.
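
The two performance assumptions can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `p_correct` is hypothetical, and the example values (0.3, 0.25, 0.2) are the ones typed into the spreadsheet in the assignment later in the deck.

```python
def p_correct(p_known: float, guess: float, slip: float) -> float:
    """Chance of a correct response, given the current knowledge estimate.

    A student who knows the skill answers correctly unless they slip;
    a student who does not know it answers correctly only by guessing.
    """
    return p_known * (1.0 - slip) + (1.0 - p_known) * guess


# Knowledge estimate 0.3, guess 0.25, slip 0.2 (assignment values).
print(p_correct(0.3, guess=0.25, slip=0.2))
```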

Classical BKT
[Diagram: two states, Not learned and Learned. p(L0) is the initial probability of being in Learned, p(T) is the transition from Not learned to Learned, and a correct response occurs with probability p(G) from Not learned and 1-p(S) from Learned.]
Two Learning Parameters
    p(L0): Probability the skill is already known before the first opportunity to use the skill in problem solving.
    p(T): Probability the skill will be learned at each opportunity to use the skill.
Two Performance Parameters
    p(G): Probability the student will guess correctly if the skill is not known.
    p(S): Probability the student will slip (make a mistake) if the skill is known.
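
For reference, the four parameters combine in the update equations below. They are written here as they are usually stated for classical BKT (following Corbett & Anderson, 1995), not copied from the slide. The subscript convention, with P(L_{n-1}) as the estimate before the n-th response, is an assumption chosen to match the spreadsheet column names used in the assignment.

```latex
\begin{align*}
P(L_{n-1}\mid \text{correct}) &=
  \frac{P(L_{n-1})\,(1-P(S))}
       {P(L_{n-1})\,(1-P(S)) + (1-P(L_{n-1}))\,P(G)} \\[4pt]
P(L_{n-1}\mid \text{incorrect}) &=
  \frac{P(L_{n-1})\,P(S)}
       {P(L_{n-1})\,P(S) + (1-P(L_{n-1}))\,(1-P(G))} \\[4pt]
P(L_n) &= P(L_{n-1}\mid \text{RESULT})
  + \bigl(1-P(L_{n-1}\mid \text{RESULT})\bigr)\,P(T) \\[4pt]
P(\text{CORR}) &= P(L_{n-1})\,(1-P(S)) + (1-P(L_{n-1}))\,P(G)
\end{align*}
```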

Assignment B5
Let’s go through the assignment together

Filter out all actions from (a copy of) the data set, until you only have actions for KC “VALUING-CAT-FEATURES”. How many rows of data remain?
Correct answer: 2473
Other known answer: 2474 (“Almost. You have also included the header row. What is the total when you eliminate that?”)
Other known answer: 124370 or 124371 (“You haven’t removed anything.”)
Other known answer: 121897 or 121898 (“Oops! You deleted VALUING-CAT-FEATURES instead of keeping that.”)

We need to delete some rows, based on the assumptions of Bayesian Knowledge Tracing. With reference to the firstattempt column, which rows do we need to delete?
    Firstattempt = 1
    Firstattempt = 0
    No rows
    All rows

Go ahead and delete the rows you indicated in question 2. How many rows of data remain?
Correct answer: 1791
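
Outside Excel, the two filtering steps could be sketched as follows. This is only an illustration on a four-row toy table: the column names `KC`, `firstattempt`, `student` and `right` are assumptions about the real file's header. The sketch also assumes the rows to drop are the non-first attempts (firstattempt = 0), which is how BKT is usually applied.

```python
import csv
import io

# Toy stand-in for the assignment data (NOT the real data set).
RAW = """student,KC,firstattempt,right
s1,VALUING-CAT-FEATURES,1,1
s1,VALUING-CAT-FEATURES,0,1
s1,OTHER-SKILL,1,0
s2,VALUING-CAT-FEATURES,1,0
"""

# DictReader consumes the header row, so it is never counted as data.
rows = list(csv.DictReader(io.StringIO(RAW)))

# Step 1: keep only actions for the target knowledge component.
kc_rows = [r for r in rows if r["KC"] == "VALUING-CAT-FEATURES"]

# Step 2: keep only first attempts (assumed answer to question 2).
first_rows = [r for r in kc_rows if r["firstattempt"] == "1"]

print(len(rows), len(kc_rows), len(first_rows))
```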

We’re going to create a Bayesian Knowledge Tracing model for VALUING-CAT-FEATURES.
Create variable columns P(Ln-1) (cell I1), P(Ln-1|RESULT) (cell J1), and P(Ln) (cell K1), and leave the columns below them empty for now. (If you’re not sure what these represent, re-watch the lecture.)
To the right of this, type into four cells, (cell M2) L0, (M3) T, (M4) S, and (M5) G.
Now type 0.3, 0.1, 0.2, and 0.25 to the right of (respectively) L0, T, S, and G (e.g. cells N2, N3, N4, N5).
What is your slip parameter?
Correct answer: 0.2

Just temporarily, set K3 to have = I2+0.1, and propagate that formula all the way down (using copy-and-paste, for example), so that K4 has = I3+0.1, and so on (this pretends that the student always gets 10% better each time, even going over 100%, which is clearly wrong… we’ll fix it later).
What should the formula be for Column I, P(Ln-1)? If you’re not sure which of these is right, try them each in Excel.
Now, what should the formula for cell I2 be?

Propagate the correct formula for column I all the way down (using copy-and-paste). Just temporarily, set J2 to have =I2, and propagate that formula all the way down (this eliminates Bayesian updating, which is not correct within BKT… we’ll fix it later). Now, what should the formula for cell K2 be, to correctly represent learning based on the P(T) parameter?

What should the formula for cell K2 be?
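
One way to read the three columns is sketched below in Python rather than Excel. The logic is an assumption about the intended answers, not taken from the slides. Column I restarts at L0 whenever a new student begins and otherwise carries the previous row's K forward. Column J is the temporary no-update stand-in (J = I). Column K applies the learning step J + (1 - J) * T. The student IDs and responses are toy data.

```python
L0, T = 0.3, 0.1  # the values typed into N2 and N3

# (student, correct) pairs in file order; toy data, not the real data set.
rows = [("s1", 1), ("s1", 0), ("s1", 1), ("s2", 0), ("s2", 1)]

table = []
prev_student, prev_k = None, None
for student, _correct in rows:
    # Column I, P(Ln-1): L0 for a student's first row, else the previous K.
    i_val = L0 if student != prev_student else prev_k
    # Column J, P(Ln-1|RESULT): temporary stand-in, no Bayesian update yet.
    j_val = i_val
    # Column K, P(Ln): learning step; unlike I+0.1 it can never exceed 1.
    k_val = j_val + (1.0 - j_val) * T
    table.append((student, i_val, j_val, k_val))
    prev_student, prev_k = student, k_val

for row in table:
    print(row)
```

Because the step adds only a fraction T of the remaining distance to 1, repeated application approaches 1 without passing it, which is what the temporary "+0.1" formula gets wrong.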

If a student starts the tutor and then gets 3 problems right in a row for the skill, what is his/her final P(Ln) after these three problems?

If a student starts the tutor and then gets 3 problems wrong in a row for the skill, what is his/her final P(Ln)?
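
These two questions can be checked with a short Python sketch of the full update (Bayesian step on the response, then the learning step). The function names `posterior` and `trace` are hypothetical. The equations are the usual classical BKT ones, and the default parameters are the assignment's values (L0 = 0.3, T = 0.1, S = 0.2, G = 0.25).

```python
def posterior(p_known, correct, guess, slip):
    """P(Ln-1 | RESULT): update the knowledge estimate on one response."""
    if correct:
        num = p_known * (1.0 - slip)
        den = num + (1.0 - p_known) * guess
    else:
        num = p_known * slip
        den = num + (1.0 - p_known) * (1.0 - guess)
    return num / den


def trace(responses, l0=0.3, t=0.1, s=0.2, g=0.25):
    """Return P(Ln) after a sequence of 1 (right) / 0 (wrong) responses."""
    p = l0
    for correct in responses:
        p_post = posterior(p, correct, g, s)
        p = p_post + (1.0 - p_post) * t  # chance to learn at this opportunity
    return p


print("three right:", round(trace([1, 1, 1]), 3))
print("three wrong:", round(trace([0, 0, 0]), 3))
```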

Assignment B5 Any questions?

Parameter Fitting
Picking the parameters that best predict future performance
Any questions or comments on this?
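
As a rough illustration of what "picking the parameters" can mean, here is a minimal grid-search sketch in Python. It is one simple approach under stated assumptions, not the course's fitting procedure. The response sequences are invented toy data, the helper names are hypothetical, and the fit minimizes squared error on the same data it was fit to. A real evaluation would score predictions on held-out or future responses.

```python
import itertools


def predict_sequence(responses, l0, t, s, g):
    """P(CORR) before each response, updating knowledge after each one."""
    p, preds = l0, []
    for correct in responses:
        preds.append(p * (1 - s) + (1 - p) * g)
        if correct:
            post = p * (1 - s) / (p * (1 - s) + (1 - p) * g)
        else:
            post = p * s / (p * s + (1 - p) * (1 - g))
        p = post + (1 - post) * t
    return preds


def sse(data, params):
    """Sum of squared errors between responses and predicted P(CORR)."""
    return sum((c - q) ** 2
               for seq in data
               for c, q in zip(seq, predict_sequence(seq, *params)))


def grid_fit(data, step=0.1):
    """Try every (L0, T, S, G) combination on a coarse grid inside (0, 1)."""
    grid = [k * step for k in range(1, round(1 / step))]
    return min(itertools.product(grid, repeat=4), key=lambda ps: sse(data, ps))


# Toy first-attempt sequences for four students (1 = right, 0 = wrong).
data = [[0, 0, 1, 1, 1], [0, 1, 1, 1, 1], [1, 0, 1, 1, 1], [0, 0, 0, 1, 1]]
best = grid_fit(data)
print("L0, T, S, G =", best, "SSE =", round(sse(data, best), 3))
```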

Overparameterization
BKT is thought to be overparameterized (Beck et al., 2008), which means there are multiple sets of parameters that can fit any data

Degenerate Space (Pardos et al., 2010)

Parameter Constraints Proposed
Beck: P(G)+P(S)<1.0
Baker, Corbett, & Aleven (2008): P(G)<0.5, P(S)<0.5
Corbett & Anderson (1995): P(G)<0.3, P(S)<0.1
Your thoughts?
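
To make the three proposals concrete, a small Python check can report which of them a given guess/slip pair satisfies. The function name `constraint_report` is hypothetical. The example pair is the assignment's G = 0.25, S = 0.2.

```python
def constraint_report(guess: float, slip: float) -> dict:
    """Report which proposed BKT parameter constraints a (G, S) pair meets."""
    return {
        "Beck: P(G)+P(S)<1.0": guess + slip < 1.0,
        "Baker, Corbett, & Aleven (2008)": guess < 0.5 and slip < 0.5,
        "Corbett & Anderson (1995)": guess < 0.3 and slip < 0.1,
    }


for name, ok in constraint_report(guess=0.25, slip=0.2).items():
    print(f"{name}: {'satisfied' if ok else 'violated'}")
```

With these values only the strictest proposal is violated, because a slip of 0.2 is above its 0.1 bound.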

Does it matter what algorithm you use to select parameters?
EM better than CGD
    Chang et al., 2006      ΔA’ = 0.05
CGD better than EM
    Baker et al., 2008      ΔA’ = 0.01
EM better than BF
    Pavlik et al., 2009     ΔA’ = 0.003, ΔA’ = 0.01
    Gong et al., 2010       ΔA’ = 0.005
    Pardos et al., 2011     ΔRMSE = 0.005
    Gowda et al., 2011      ΔA’ = 0.02
BF better than EM
    Pavlik et al., 2009     ΔA’ = 0.01, ΔA’ = 0.005
    Baker et al., 2011      ΔA’ = 0.001
BF better than CGD
    Baker et al., 2010      ΔA’ = 0.02

Other questions, comments, concerns about BKT?

Next Assignment Basic assignment 6

Final Projects
Let’s discuss final projects
Final project presentations 5/2 9am-11am

Next Class
Wednesday, April 5
B6: Performance Factors Analysis and Deep Knowledge Tracing
Baker, R.S. (2015) Big Data and Education. Ch. 4, V3.
Pavlik, P.I., Cen, H., Koedinger, K.R. (2009) Performance Factors Analysis -- A New Alternative to Knowledge Tracing. Proceedings of AIED2009.
Pavlik, P.I., Cen, H., Koedinger, K.R. (2009) Learning Factors Transfer Analysis: Using Learning Curve Analysis to Automatically Generate Domain Models. Proceedings of the 2nd International Conference on Educational Data Mining.
Khajah, M., Lindsey, R. V., & Mozer, M. C. (2016) How Deep is Knowledge Tracing? Proceedings of the International Conference on Educational Data Mining.

The End