Addressing the Assessing Challenge with the ASSISTment System

Similar presentations
Mastery Learning is a style of assessment in which the student must demonstrate mastery of the assignment by correctly answering a certain number of problems.

Addressing the Testing Challenge with a Web-Based E-Assessment System that Tutors as it Assesses Mingyu Feng, Worcester Polytechnic Institute (WPI) Neil.
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Evaluation Kleanthous Styliani
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 January 23, 2012.
Topic Outline Motivation Representing/Modeling Causal Systems
Managerial Economics Estimation of Demand
Knowledge Inference: Advanced BKT Week 4 Video 5.
Improving learning by improving the cognitive model: A data- driven approach Cen, H., Koedinger, K., Junker, B. Learning Factors Analysis - A General Method.
Collaborative Warrior Tutoring Tom Livak Neil Heffernan 8/24/06.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 13 Nonlinear and Multiple Regression.
Brian Junker Carnegie Mellon 2006 MSDE / MARCES Conference 1 Using On-line Tutoring Records to Predict End-of-Year Exam Scores Experience with the Assistments.
©2012 Carnegie Learning, Inc. In-vivo Experimentation Steve Ritter Founder and Chief Scientist Carnegie Learning.
Modeling Student Knowledge Using Bayesian Networks to Predict Student Performance By Zach Pardos, Neil Heffernan, Brigham Anderson and Cristina Heffernan.
Effective Skill Assessment Using Expectation Maximization in a Multi Network Temporal Bayesian Network By Zach Pardos, Advisors: Neil Heffernan, Carolina.
Computer Science Department Jeff Johns Autonomous Learning Laboratory A Dynamic Mixture Model to Detect Student Motivation and Proficiency Beverly Woolf.
The ASSISTment Project Trying to Reduce Bottom-out hinting: Will telling students how many hints they have left help? By Yu Guo, Joseph E. Beck& Neil T.
Towards Designing a User-Adaptive E-Learning System By Leena Razzaq, Neil Heffernan & Robert Lindeman This work-in-progress presents the groundwork for.
Addressing the Testing Challenge with a Web-Based E - Assessment System that Tutors as it Assesses Nidhi Goel Course: CS 590 Instructor: Prof. Abbott.
Brian Junker Carnegie Mellon 2007 NCME Symposium on Learning-Embedded Assessment 1 All Papers for this Session are available at
Conclusion Our prediction model did a good job at predict 8 th grade math proficiency. It can be used to estimate 10 th grade score fairly well, too. But.
On-demand learning-embedded benchmark assessment using classroom-accessible technology Discussant Remarks: Mark Wilson UC, Berkeley.
Searching for Patterns: Sean Early PSLC Summer School 2007 Question: Which is a better predictor of performance in a cognitive tutor, error rate or assistance.
Using Mixed-Effects Modeling to Compare Different Grain-Sized Skill Models Mingyu Feng, Worcester Polytechnic Institute Neil T. Heffernan, Worcester Polytechnic.
Educational Data Mining Overview John Stamper PSLC Summer School /25/2011 1PSLC Summer School 2011.
A Value-Based Approach for Quantifying Scientific Problem Solving Effectiveness Within and Across Educational Systems Ron Stevens, Ph.D. IMMEX Project.
Educational data mining overview & Introduction to Exploratory Data Analysis with DataShop Ken Koedinger CMU Director of PSLC Professor of Human-Computer.
Worcester Polytechnic Institute Towards Assessing Students’ Fine Grained Knowledge: Using an Intelligent Tutor for Assessing Mingyu Feng August 18 th,
Determining the Significance of Item Order In Randomized Problem Sets Zachary A. Pardos, Neil T. Heffernan Worcester Polytechnic Institute Department of.
Integrating Assessment with Instruction: A Look Forward Ken Koedinger Pittsburgh Science of Learning Center Human-Computer Interaction and Psychology.
John Stamper Human-Computer Interaction Institute Carnegie Mellon University Technical Director Pittsburgh Science of Learning Center DataShop.
Assessing Students’ Performance Longitudinally: Item Difficulty Parameter vs. Skill Learning Tracking Mingyu Feng, Worcester Polytechnic Institute Neil.
Modern Test Theory Item Response Theory (IRT). Limitations of classical test theory An examinee’s ability is defined in terms of a particular test The.
Case Study – San Pedro Week 1, Video 6. Case Study of Classification  San Pedro, M.O.Z., Baker, R.S.J.d., Bowers, A.J., Heffernan, N.T. (2013) Predicting.
A database describing the student’s knowledge of the domain topics.
The Formative Assessment Cycle Solve a selection of problems of a given skill Analysis Students are instantly told if their answers on ASSISTment are correct.
CJT 765: Structural Equation Modeling Class 7: fitting a model, fit indices, comparingmodels, statistical power.
Rhythmic Transcription of MIDI Signals Carmine Casciato MUMT 611 Thursday, February 10, 2005.
Educational Data Mining: Discovery with Models Ryan S.J.d. Baker PSLC/HCII Carnegie Mellon University Ken Koedinger CMU Director of PSLC Professor of Human-Computer.
Pearson Copyright 2010 Some Perspectives on CAT for K-12 Assessments Denny Way, Ph.D. Presented at the 2010 National Conference on Student Assessment June.
University of Ostrava Czech republic 26-31, March, 2012.
Data mining with DataShop Ken Koedinger CMU Director of PSLC Professor of Human-Computer Interaction & Psychology Carnegie Mellon University.
Core Methods in Educational Data Mining HUDK4050 Fall 2015.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 January 25, 2012.
Core Methods in Educational Data Mining HUDK4050 Fall 2014.
Assessing Students with Disabilities Using Computer Technologies Presented to: Dr. Manuel Johnican EDU 510 – Educational Assessment By: Dorian Webb September.
Data-Driven Education
Core Methods in Educational Data Mining
Rhythmic Transcription of MIDI Signals
Michael V. Yudelson Carnegie Mellon University
Automated feedback in statistics education
How to interact with the system?
General principles in building a predictive model
Special Topics in Educational Data Mining
CJT 765: Structural Equation Modeling
Using Bayesian Networks to Predict Test Scores
Towards building a better cognitive model
Mingyu Feng Neil Heffernan Joseph Beck
Towards Assessing Students’ Fine Grained Knowledge: Using an Intelligent Tutor for Assessing Mingyu Feng August 18th, 2009 Where are these people from/
Detecting the Learning Value of Items In a Randomized Problem Set
Welcome to Second Grade
The Behavior of Tutoring Systems
Neil T. Heffernan, Joseph E. Beck & Kenneth R. Koedinger
Jack Mostow* and Joseph Beck Project LISTEN (
Presentation transcript:

Addressing the Assessing Challenge with the ASSISTment System
Mingyu Feng, Worcester Polytechnic Institute (WPI)

The “ASSISTment” Project
Teachers are being asked to use formative assessment data to inform their instruction. The ASSISTment project integrates assistance and assessment.
Main goals:
- Help student learning [1][2][3]
- Assess precisely and present results to teachers
[Slide figure: an original question tagged with three skills (a. Congruence, b. Perimeter, c. Equation-Solving), a 1st scaffolding question (Congruence), a 2nd scaffolding question (Perimeter), a buggy message, and a hint message.]
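The slide's figure implies a simple item structure: an original question tagged with skills, scaffolding questions presented when the original is missed, and per-question buggy messages and hints. A minimal, hypothetical Python sketch of that structure (all names are illustrative, not taken from the actual system's code):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ScaffoldingQuestion:
    skill: str                     # e.g. "Congruence" or "Perimeter"
    prompt: str
    buggy_messages: Dict[str, str] = field(default_factory=dict)  # wrong answer -> targeted feedback
    hints: List[str] = field(default_factory=list)                # ordered hints, given on request

@dataclass
class Assistment:
    original_question: str
    skills: List[str]              # e.g. ["Congruence", "Perimeter", "Equation-Solving"]
    scaffolds: List[ScaffoldingQuestion] = field(default_factory=list)  # shown after a wrong answer
```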

Online Report [4]: the “Grade book” report
The current prediction of MCAS score is primitive: it is based entirely on percent correct on the original questions, and it shows a big offset from the real MCAS score (avg. 7.5 points, 2004-2005). How do we improve the prediction?
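For concreteness, a minimal sketch of the percent-correct baseline described above. The 34-point full raw score is an assumption inferred from the %Error figures later in the deck, not something this slide states:

```python
def naive_mcas_prediction(n_correct: int, n_questions: int,
                          full_score: int = 34) -> float:
    # Scale percent correct on the original questions to the raw score range.
    return n_correct / n_questions * full_score

print(naive_mcas_prediction(20, 30))  # ~22.67 predicted points
```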

Thoughts on Improving our Prediction
- What if we take performance on scaffolding questions into consideration?
- What if we assess students longitudinally over time?
- Questions have been tagged with skills in different skill models. How do we take advantage of that? Do students learn different skills at the same rate? Can we make better predictions by incorporating skills? Which skill model does better at predicting students' scores?
- Questions differ in difficulty. Will our prediction be improved further if we take the difficulty of each single question into consideration?


Research Questions
Can we do a more accurate job of predicting students' MCAS scores using the online assistance information (time spent, performance on scaffolding questions, number of attempts, number of hints)? Yes! The more hints, attempts, and time a student needs to solve a problem, the worse his or her predicted score will be. [Feng, Heffernan & Koedinger, 2006a]
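A finding of this shape corresponds to a regression with negative weights on the assistance metrics. A minimal sketch (not the authors' actual model; the file and column names are hypothetical placeholders):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical per-student aggregates exported from the tutor logs.
df = pd.read_csv("assistment_logs.csv")
X = df[["percent_correct", "avg_hints", "avg_attempts", "avg_seconds"]]
y = df["mcas_score"]

model = LinearRegression().fit(X, y)
# The reported finding corresponds to negative coefficients on the
# assistance metrics: more hints, attempts, and time -> lower predicted score.
print(dict(zip(X.columns, model.coef_)))
```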

Thoughts on Improving our Prediction (revisited)
Next: what if we assess students longitudinally over time?

Research Questions
Track students' learning longitudinally:
- Can our system detect performance improving over time?
- Can we tell differences in learning rate between students from different schools? Different teachers?
- Can we do a better job of predicting students' performance by modeling longitudinally?

[Figure: predicted scores (y-axis) over months (x-axis) for students from one class]

Research Questions
Track learning longitudinally:
- Can our system detect performance improving over time?
- Can we tell differences in learning rate between students from different schools? Different teachers?
- Can we do a better job of predicting students' performance by modeling longitudinally?
Yes, we can. (Feng, Heffernan & Koedinger, 2006a, 2006b)

Thoughts on Improving our Prediction (revisited)
Next: questions have been tagged with skills in different skill models. How do we take advantage of that?

Skill Models

Research Questions Can we track the learning of different skills?

Research Questions
Can we track the learning of different skills? Yes, we can! In (Feng, Heffernan & Koedinger, 2006a), we constructed models with 5 skills and showed that students started out comparatively good at Algebra but weak at Measurement. They learned Data Analysis at approximately the same speed as Number Sense, and significantly faster than the other skills.

Research Questions
Can we improve our prediction of student test scores by introducing skills as predictors in our models? The grain sizes of the skill models differ. How does the finer-grained skill model (WPI-78) do at estimating test scores compared to the coarser-grained models? Similar work has been done in (Pardos et al., 2006) using a Bayesian network approach.

Data
Item-level online data: students' binary responses (1/0) to items that are tagged in the different skill models.

Approach
Fit a mixed-effects logistic regression model on the longitudinal online data, using skill as a factor, to predict prob(response = 1) on an item tagged with a certain skill at a certain time. The fitted model gives learning parameters (initial knowledge + learning rate) for each skill of each individual student. All questions in the external test have also been tagged in all skill models, so we can apply the fitted model to get predicted test scores. A sketch of this setup appears below.
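A sketch under stated assumptions: statsmodels' Bayesian mixed GLM stands in for the mixed-effects logistic regression, and the columns correct (0/1), skill, month, and student are assumed names for the item-level data. This illustrates the modeling idea, not the authors' exact specification:

```python
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("item_level_responses.csv")  # hypothetical export

# Fixed effects: a per-skill intercept (initial knowledge) and a per-skill
# slope over time (learning rate). Variance components: a random intercept
# and a random time slope for each student.
model = BinomialBayesMixedGLM.from_formula(
    "correct ~ 0 + C(skill) + C(skill):month",
    {"student": "0 + C(student)",
     "student_slope": "0 + C(student):month"},
    df,
)
result = model.fit_vb()  # variational Bayes fit
print(result.summary())
```

Applying the fitted per-skill and per-student parameters to the skill tags of the external test items yields a predicted probability of success on each item; summing those probabilities gives a predicted test score.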

Result
Real MCAS scores vs. ASSISTment predicted scores under each skill model:

Student | Real MCAS score | WPI-1 | WPI-5 | WPI-78
Mary    | 29              | 28.59 | 27.65 | 27.05
Tom     | 28              | 27.58 | 26.43 | 25.35
…       | …               | …     | …     | …
Sue     | 25              | 26.56 | 24.94 | 24.10
Dick    | 22              | 23.70 | 22.78 | 21.31
Harry   | 33              | 27.54 | 26.37 | 28.12

Absolute difference between real score and ASSISTment predicted score:

Student | WPI-1  | WPI-5  | WPI-78
Mary    | 0.41   | 1.35   | 1.95
Tom     | 0.42   | 1.57   | 2.65
…       | …      | …      | …
Sue     | 1.56   | 0.06   | 0.90
Dick    | 1.70   | 0.78   | 0.69
Harry   | 5.46   | 6.63   | 4.88
MAD     | 4.552  | 4.343  | 4.121
%Error  | 13.39% | 12.77% | 12.12%

P-values of both paired t-tests are below 0.05. Is 12.12% any good for assessment purposes? For comparison, the MCAS-simulation result is 11.12%.
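For concreteness, a small sketch of the error metrics above, computed over the five students shown (the slide's MAD is over all students, not just these five). Dividing by a 34-point full raw score is an assumption inferred from the reported MAD/%Error pairs (e.g., 4.552 / 34 ≈ 13.39%):

```python
# WPI-78 predictions for the five students shown on the slide.
real = {"Mary": 29, "Tom": 28, "Sue": 25, "Dick": 22, "Harry": 33}
pred = {"Mary": 27.05, "Tom": 25.35, "Sue": 24.10, "Dick": 21.31, "Harry": 28.12}

abs_diff = {s: abs(real[s] - pred[s]) for s in real}
mad = sum(abs_diff.values()) / len(abs_diff)   # mean absolute difference
pct_error = mad / 34 * 100                     # assumed 34-point full score
print(abs_diff)                                # per-student errors (~1.95, 2.65, ...)
print(f"MAD={mad:.3f}, %Error={pct_error:.2f}%")
```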

Thoughts on Improving our Prediction (revisited)
Next: questions differ in difficulty. Will our prediction be improved further if we take the difficulty of each single question into consideration?

Research Question Does introducing item difficulty information help to build a better predictive model on top of skills?

Approach
Getting difficulty parameters: fit a one-parameter logistic (1PL) IRT model (also referred to as the Rasch model) on our online data (from a different group of students). The dependent variable: the probability of a correct response by a particular person to a specified item. The independent variables: the person's trait score and the item's difficulty level. The estimated item difficulty parameter was then used as a covariate in the mixed-effects logistic regression models.
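For reference, the standard form of the 1PL/Rasch model named above, with θ_p the trait score of person p and β_i the difficulty of item i (conventional IRT notation, not symbols taken from the slides):

```latex
P(X_{pi} = 1 \mid \theta_p, \beta_i)
  = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)}
```

Equivalently, logit P(X_{pi} = 1) = θ_p − β_i, which is why the estimated β_i can slot directly into the mixed-effects logistic regression as a covariate.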

Results
By introducing the item difficulty parameter as a predicting factor in addition to skills, we can construct statistically significantly better-fitting models on the online data. Yet the prediction of the MCAS scores was not improved (Feng et al., 2006). When used separately, skill learning tracking predicts MCAS scores better than the item difficulty parameter alone (Feng & Heffernan, 2007b).

Summary of Approaches on Improving Assessment
- Take performance on scaffolding questions into consideration.
- Assess students longitudinally over time.
- Take advantage of the skill tagging in the different skill models: do students learn different skills at the same rate? Can we make better predictions by incorporating skills? Which skill model did better at predicting students' scores?
- Take the difficulty of each single question into consideration.

References
[1] Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N.T., Koedinger, K.R., Junker, B., Ritter, S., Knight, A., Aniszczyk, C., Choksey, S., Livak, T., Mercado, E., Turner, T.E., Upalekar, R., Walonoski, J.A., Macasek, M.A., Rasmussen, K.P. (2005). The Assistment Project: Blending Assessment and Assisting. Proceedings of the 12th International Conference on Artificial Intelligence in Education, 555-562. Amsterdam.
[2] Razzaq, L., Heffernan, N.T. (2006). Scaffolding vs. Hints in the Assistment System. In Ikeda, Ashley & Chan (Eds.), Proceedings of the Eighth International Conference on Intelligent Tutoring Systems, pp. 635-644. Springer-Verlag: Berlin.
[3] Razzaq, L., Heffernan, N.T. (2007, in press). What Level of Tutor Feedback Is Best? In Luckin & Koedinger (Eds.), Proceedings of the 13th Conference on Artificial Intelligence in Education. IOS Press.
[4] Feng, M., Heffernan, N.T. (2007a). Towards Live Informing and Automatic Analyzing of Student Learning: Reporting in the Assistment System. Journal of Interactive Learning Research (JILR), special issue "Usage Analysis in Learning Systems: Existing Approaches and Scientific Issues", 18(2).
[5] Pardos, Z.A., Heffernan, N.T., Anderson, B. & Heffernan, C. (2006, in press). Using Fine-Grained Skill Models to Fit Student Performance with Bayesian Networks. Workshop on Educational Data Mining held at the Eighth International Conference on Intelligent Tutoring Systems. Taiwan.
[6] Feng, M., Heffernan, N., Mani, M., & Heffernan, C. (2006). Using Mixed-Effects Modeling to Compare Different Grain-Sized Skill Models. AAAI'06 Workshop on Educational Data Mining, Boston.
[7] Feng, M. & Heffernan, N.T. (2007b). Assessing Students' Performance: Item Difficulty Parameter vs. Skill Learning Tracking. Paper to be presented at the NCME Annual Conference 2007.