
1 Assessing Students’ Performance Longitudinally: Item Difficulty Parameter vs. Skill Learning Tracking Mingyu Feng, Worcester Polytechnic Institute Neil T. Heffernan, Worcester Polytechnic Institute

2 The “ASSISTment” System: a web-based tutoring system that assists students in learning mathematics and gives teachers an assessment of their students’ progress.

3 An ASSISTment
We break multi-step problems into “scaffolding questions”.
“Hint messages”: given on demand, they suggest what step to do next.
“Buggy messages”: context-sensitive feedback on common wrong answers (Feng, Heffernan & Koedinger, 2006a).
Skills
–The state reports to teachers on 5 areas.
–We seek to report on more, finer grain-sized skills. (Demo/movie)
(Figure labels: the original question, tagged with a. Congruence, b. Perimeter, c. Equation-Solving; the 1st scaffolding question, Congruence; the 2nd scaffolding question, Perimeter; a buggy message; a hint message; Geometry.)
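As a structural sketch of the item format described above (the class and field names are hypothetical illustrations, not the system's actual schema), an ASSISTment could be represented like this:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an ASSISTment item: an original question broken
# into scaffolding questions, each with on-demand hints and per-wrong-answer
# buggy messages. Field names are illustrative, not the real schema.
@dataclass
class ScaffoldingQuestion:
    prompt: str
    answer: str
    skill: str                                        # e.g. "Congruence"
    hints: list = field(default_factory=list)         # last hint is "bottom out"
    buggy_messages: dict = field(default_factory=dict)  # wrong answer -> feedback

@dataclass
class AssistmentItem:
    original_question: str
    skills: list                                      # skills tagged on the item
    scaffolds: list = field(default_factory=list)

item = AssistmentItem(
    original_question="Triangles ABC and DEF are congruent. The perimeter of "
                      "triangle ABC is 23 inches. What is the length of side "
                      "DF in triangle DEF?",
    skills=["Congruence", "Perimeter", "Equation-Solving"],
    scaffolds=[
        ScaffoldingQuestion(
            prompt="Which side of triangle ABC has the same length as side DF?",
            answer="AC",
            skill="Congruence",
            hints=["Congruent triangles have matching sides.",
                   "Side DF matches side AC."],       # bottom-out hint
            buggy_messages={"AB": "AB matches DE, not DF. Look again."},
        ),
    ],
)
print(len(item.scaffolds))  # 1
```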

4 The ASSISTment Project: What Level of Tutor Interaction is Best? By Leena Razzaq, Neil Heffernan & Robert Lindeman

Goal: to determine the best level of tutor interaction to help students learn the mathematics required for a state exam, based on their math proficiency.

Background on ASSISTments: The ASSISTment System is a web-based assessment system that tutors students on math problems. The system is freely available at www.assistment.org. As of March 2007, thousands of Worcester middle school students use ASSISTments every two weeks as part of their math class. Teachers use the fine-grained reporting that the system provides to inform their instruction.

The Interaction Hypothesis: when one-on-one tutoring, either by a human tutor or a computer tutor, is compared to a less interactive control condition that covers the same content, students will learn more in the interactive condition than in the control condition. Is this hypothesis true? We found evidence to support it in some cases but not in others. Based on the results of Razzaq & Heffernan (2006), we believe the difficulty of the material influences how effective interactive tutoring will be.

Our Hypothesis: more interactive intelligent tutoring will lead to more learning (based on post-test gains) than less interactive tutoring. Differences in learning will be more significant for less-proficient students than for more-proficient students.

Experiment Design: 3 levels of interaction. Scaffolding + hints is the most interactive experience: students must answer scaffolding questions, i.e., learning by doing. Hints on demand are less interactive: students do not have to respond to hints, but they can get the same information as in the scaffolding questions by requesting hints. Delayed feedback is the least interactive condition: students must wait until the end of the assignment to get any feedback. 2 levels of math proficiency: students in Honors math classes and students in Regular math classes. 566 8th-grade students participated.

Analysis and Conclusions: Results showed a significant interaction between condition and math proficiency (p < 0.05), a good case for tailoring tutor interaction to types of students. Regular students learned more with scaffolding + hints (p < 0.05): less-proficient students benefit from more interaction and coaching through each step of solving a problem. Honors students learned more with delayed feedback (p = 0.075): more-proficient students benefit from seeing problems worked out and getting the big picture. Delayed feedback performed better than hints on demand (p = 0.048) for both more- and less-proficient students: students don't do as well when we depend on student initiative.

Experiment Screen Shots (figure labels): hints on scaffolding questions; scaffolding questions #1 through #4; hints #1 through #7. Students in the scaffolding condition interact with the tutor by answering scaffolding questions. Students in the hints condition can get hints when they ask for them by pressing the hint button. Students in the delayed-feedback condition get no feedback until the end of the assignment, when they get answers and solutions; they see the solutions after they finish all of the problems.

This work has been accepted for publication at the 2007 Artificial Intelligence in Education Conference in Los Angeles. Collaborators / Sponsors

5 CAREER: Learning about Learning: Using Intelligent Tutoring Systems as a Research Platform to Investigate Human Learning

By tagging items with skills, teachers can 1) get reports on which skills students are doing poorly on, and 2) track them over time. Free researcher, teacher, and student accounts for 7th–10th grade math preparation at www.assistment.org.

Item report for this ASSISTment (per question: correct answer, % correct, hint requests, number of responses, standard):
–Original question (“Triangles ABC and DEF are congruent. The perimeter of triangle ABC is 23 inches. What is the length of side DF in triangle DEF?”): answer 10; 16% correct; 49% hint requests; 188 responses; Geometry.
–Scaffold 1 (“Which side of triangle ABC has the same length as side DF of triangle DEF?”): answer AC; 23% correct; 44% hint requests; 144 responses; common error AB; Geometry.
–Scaffold 2 (“What is the perimeter of triangle ABC?”): answer 2x + x + 8; 41% correct; 20% hint requests; 143 responses; common error 2x + 8; Measurement.
–Scaffold 3 (“Now, given the perimeter of triangle ABC equals 23 inches, you can write the equation 2x + x + 8 = 23 and solve it for x. What is the value of x?”): answer 5; 36% correct; 44% hint requests; 140 responses; Algebra & Number Sense.
–Scaffold 4 (“Remember, we are looking for side DF. Enter the length of side DF:”): answer 10; 43% correct; 34% hint requests; 135 responses; Algebra & Geometry.

(Figure labels: the three hint messages for the second scaffold; the “bottom out” hint; the second scaffold; the first scaffold; uploaded image; what a student sees.)

This project has 5 main research thrusts. 1) For the designing cognitive models thrust, we report that we can do a better job of modeling students by using finer-grained models (i.e., models that track more knowledge components) than coarser-grained models (Pardos et al., 2006; Feng et al., 2006). 2) For the research thrust of inferring what students know and are learning, we can report two new results. First, we can do a better job of assessing students (as measured by predicting state test scores) by seeing how much tutoring they need to solve a question (Feng et al., 2006a).
Second, we have shown that we can do a better job of modeling students' learning over time by building models that allow us to model different rates of learning for different skills (Feng et al., 2006a). 3) For the optimizing learning thrust, we have new empirical results showing that students learn more from the type of tutoring we provide than from a traditional Computer-Aided Instruction (CAI) control (Razzaq & Heffernan, 2006). 4) For the thrust of informing educators, we have recent publications on the types of feedback we give educators (Feng & Heffernan, 2005 & 2006). Additionally, we have work showing that we can track student motivation and then inform educators in novel ways that increase student motivation (Walonoski & Heffernan, 2006a & 2006b). 5) Finally, for the thrust of allowing user adaptation, we have shown that the authoring tools we have built can be used by teachers to quickly create content for their classes (Heffernan, Turner et al., 2006). References are at www.assistment.org.

(Figure labels: What the teacher who builds the tutoring sees. This shows a student who first guessed 16 (the real answer is 24), then got the first scaffolding question correct with “AC”. The student then clicked on “½*8x” and the system produced the “bug” message in red. The student, twice in a row, asked for the hint shown in the green box; the author wrote that hint message by typing it in here. This dialog shows the author has tagged the third scaffold with three different grain-sized models. Recent Results: 2006. What the state MCAS test provides. Teacher Reports: teachers get reports per student, per skill, and per item.)

Project goals: 1) to help researchers learn about student learning; 2) to help students learn math and to report valuable information to teachers about their students' knowledge.
Summary: The Assistment System is a web-based assessment system that tutors students on items they get wrong. The system is freely available at www.assistment.org. Thousands of students in Worcester and surrounding towns use it every two weeks as part of their math class or for homework. The system tracks 98 skills for 8th grade math and reports on those skills to teachers. Teachers, schools, and researchers can use our web-based tools to create their own content quickly.

Do Students Learn from Assistments? Yes! We compared 19 pairs of items that address the same concept with 681 students and got significant results (p < .05). See Razzaq et al. (2005) and Razzaq & Heffernan (2006).

Do Assistments Assess Accurately? Yes, the Assistment System can predict a student's MCAS score quite reliably and can track different rates of learning for different skills. See Feng, Heffernan & Koedinger (2006).

Funding/People: PI Neil Heffernan at WPI, with collaborator Kenneth Koedinger at Carnegie Mellon. Over 50 people have helped contribute. Thanks for $3 million in funding from the National Science Foundation (NSF) CAREER program, the US Department of Education, the Office of Naval Research, the Spencer Foundation, and the US Army. Contact: Professor Neil T. Heffernan, (508) 831-5569, nth@wpi.edu

* Feng, M., Heffernan, N.T., & Koedinger, K.R. (2006). Predicting State Test Scores Better with Intelligent Tutoring Systems: Developing Metrics to Measure Assistance Required. The 8th International Conference on Intelligent Tutoring Systems, 2006, Taiwan.
* Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N.T., Koedinger, K.R., Junker, B., Ritter, S., Knight, A., Aniszczyk, C., Choksey, S., Livak, T., Mercado, E., Turner, T.E., Upalekar, R., Walonoski, J.A., Macasek, M.A., & Rasmussen, K.P. (2005). The Assistment Project: Blending Assessment and Assisting. The 12th Annual Conference on Artificial Intelligence in Education 2005, Amsterdam.
* Razzaq, L., & Heffernan, N.T. (2006). Scaffolding vs. Hint in the Assistment System. The 8th International Conference on Intelligent Tutoring Systems, 2006, Taiwan.

(Figure labels: the original question; the first, second, third, and fourth scaffolds.)

6 Scaling up a Server-Based Web Tutor. Jozsef Patvarczki & Neil Heffernan

Introduction: Our research team has built a web-based tutor, located at www.ASSISTment.org [1], that is used by hundreds of students a day in Worcester and surrounding towns. The system's focus is to teach 8th and 10th grade mathematics and MCAS preparation. Because it is easily accessible, it helps lower the entry barrier for teachers and enables both teachers and researchers to collect data and generate reports. Scaling up a server-based intelligent tutoring system requires developers to care about speed and reliability. We present how the Assistment system improves performance and reliability with a fault-tolerant, scalable architecture.

System Scalability and Reliability: Two concerns when running the intelligent tutor on a central server are: 1) building a scalable server architecture; 2) providing reliable service to researchers, teachers, and students. We address several research questions: 1) can we reduce the cost of authoring an ITS; 2) how can we improve performance and reliability with a better server architecture? To serve thousands of users, we must achieve high reliability and scalability at different levels: scalability at our first entry point through a virtual IP for www.assistment.org, provided by the CARP protocol; random and round-robin redirection algorithms, which provide very effective load-sharing as the load-balancer distributes load over multiple application servers, allowing us to redirect incoming web requests and run the web portal application in a multiple-server environment; a monitoring system built on Selenium that sends text messages to our administrators when the system goes down; and multiple database servers with automatic synchronization, pooling, and fail-over detection.

Results: Since each public school class has about 20 students, we noticed clusters (shown in ovals in the bottom left) of intervals where a single class was logged on. The log-on procedure is the most expensive step in the process, and this data shows that it might be a good place for us to improve. We noticed a second cluster of around 40 users, which most likely represents instances where two classes of students were using the system simultaneously. There was no appreciable trend toward slower page-creation times with more users. We ran three simulated scenarios with a 10 s random delay between student actions: in the first, 50 threads simulated 50 students working with no load-balancer, one application server, and one database; the second added a load-balancer and two application servers; the third added a web-cache technique on top of the load-balancer.

Test Type                     Number of Unique Users   Response Time [ms]
Scenario 1 / Normal           50                       8955
Scenario 2 / Load-Balancer    50                       3624
Scenario 3 / Web-Cache        50                       8073

We were able to get a linear speed-up with the help of the load-balancer and an additional application server. We may also be able to reduce the execution time of computation-intensive applications with the help of GRID computing.

Assistment Features: This problem uses a pseudo-tutor (a state-based implementation) with pre-made scaffolding and hint questions selected based upon student input. Incorrect responses are in red, and hints are in green.

Architecture (figure labels): horizontally scaled configuration: scalable, fault-tolerant, dynamically configurable; HTTP server as load balancer; clients' actions represent the system's load; users begin interacting with our system through the “Portal” that manages all activities; example of a state-based pseudo-tutor; additional application servers for load balancing; GRID computing: Bayesian Network application; WPI P-GRADE GRID Portal, http://pgrade.wpi.edu; Workflow Editor and Manager; Visualization and Resource Information System.

Reference: 1. Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N.T., et al. (2005). The Assistment Project: Blending Assessment and Assisting. 12th Annual Conference on Artificial Intelligence in Education 2005, Amsterdam.

Contact: Neil Heffernan, nth@wpi.edu
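The random and round-robin redirection schemes described above can be sketched in a few lines. This is an illustrative sketch only: the server names are hypothetical, and the real load-sharing runs inside the HTTP server, not in application code.

```python
import random
from itertools import cycle

# Hypothetical application-server names; the real system's topology differs.
APP_SERVERS = ["app-server-1", "app-server-2"]

_round_robin = cycle(APP_SERVERS)

def pick_round_robin() -> str:
    """Send each successive request to the next server in order."""
    return next(_round_robin)

def pick_random() -> str:
    """Send each request to a uniformly random server."""
    return random.choice(APP_SERVERS)

# Round-robin alternates between the two application servers:
requests = [pick_round_robin() for _ in range(4)]
print(requests)  # ['app-server-1', 'app-server-2', 'app-server-1', 'app-server-2']
```

Round-robin spreads load evenly when requests are similar in cost; random selection achieves roughly the same balance without shared state between front-ends.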

7 How Were the Skill Models Created?

8

9 Fine-grained skill models in reporting
–Teachers get reports that they find credible and useful (Feng & Heffernan, 2005, 2006, 2007).

10

11

12 Research Question: In the ASSISTment project, which approach works better at assessing students' performance longitudinally?
–Skill learning tracking?
–Or using an item difficulty parameter? (unidimensional)

13 Data Source: 497 students from two middle schools. Students used the ASSISTment system every other week from Sep. 2004 to May 2005. Real state test scores from May 2005. Item-level online data: students' binary responses (1/0) to items tagged in different skill models. Some statistics: average usage 7.3 days; average questions answered: 250; 138,000 data points.

14 Data Source

15 Item Difficulty Parameter: We fit a one-parameter logistic (1PL) IRT model (the Rasch model) on our online data. The dependent variable is the probability of a correct response by student i to item n; the independent variables are the person's trait score and the item's difficulty level.
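The 1PL response probability can be written out directly. This is a sketch of the model form only, not the fitting code; the trait score and difficulty values below are illustrative.

```python
import math

def p_correct(theta: float, b: float) -> float:
    """1PL (Rasch) probability that a student with trait score `theta`
    answers an item with difficulty `b` correctly:
    P = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When the trait score equals the item difficulty, the model gives a
# 50% chance of a correct response.
print(p_correct(0.5, 0.5))   # 0.5
# A student with a higher trait score on the same item:
print(p_correct(1.5, 0.5))
```

Fitting the model means estimating one trait score per student and one difficulty per item from the observed binary responses.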

16 Longitudinal Modeling: Mixed-effects logistic regression models. Models we fitted:
–Model-Beta: time + beta -> item response
–Model-WPI5: time + skills in WPI5 -> item response
–Model-WPI78: time + skills in WPI78 -> item response
Evaluation: the accuracy of the predicted MCAS test score was used to compare the approaches.
Singer & Willett (2003). Applied Longitudinal Data Analysis. Oxford University Press: New York.
Hedeker & Gibbons (in preparation). Longitudinal Data Analysis.
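The form of these models, with a separate learning rate per skill, can be sketched as follows. This is an illustration of the model structure only: the skill intercepts, learning rates, and student effect are made-up values, not estimates from our data.

```python
import math

# Illustrative (not fitted) parameters: each skill gets its own intercept
# and its own learning rate, on top of a per-student random effect.
SKILL_INTERCEPT = {"congruence": -0.4, "perimeter": 0.2}
SKILL_RATE = {"congruence": 0.15, "perimeter": 0.05}  # learning per month

def response_prob(student_effect: float, skill: str, months: float) -> float:
    """P(correct) under a mixed-effects logistic model:
    logit(p) = student effect + skill intercept + skill rate * time,
    so each skill is allowed a different rate of learning."""
    logit = student_effect + SKILL_INTERCEPT[skill] + SKILL_RATE[skill] * months
    return 1.0 / (1.0 + math.exp(-logit))

# The model predicts improvement over the school year at a skill-specific rate:
start = response_prob(0.0, "congruence", 0)
later = response_prob(0.0, "congruence", 8)
print(start, later)
```

The fitted version estimates these intercepts, rates, and student effects jointly from the longitudinal response data.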

17 Results

Students   Real MCAS   Predicted MCAS score                Absolute difference (real vs. predicted)
           score       Model-Beta  Model-WPI-5  Model-WPI-78   Model-Beta  Model-WPI-5  Model-WPI-78
Tom        22          20.91       19.86        17.28          1.09        2.14         3.72
Dick       26          24.15       23.76        20.96          1.85        2.24         5.04
Harry      25          19.08       17.76        16.21          5.92        7.24         7.79
Mary       25          20.44       19.18        18.38          4.56        5.82         5.62
…
Lisa       9           17.04       17.35        15.87          8.04        8.35         6.87
%Error                                                         13.63%      13.15%       11.97%

%Error: Model-Beta > Model-WPI-5 > Model-WPI-78. P-values of both paired t-tests are below 0.05.

18 Conclusion: We have found evidence that skill learning tracking predicts MCAS scores better than simply using an item difficulty parameter, and that fine-grained models did even better than the coarse-grained model. Our skill mapping is good (though maybe not optimal). We are considering using these skill models to select the next best problem to present to a student. Although we used the Rasch model to train the item difficulty parameter, we were not modeling students' responses with IRT; an interesting piece of future work will be comparing our results to predictions made through an item response modeling approach.

19 Modeling Student Knowledge: Using Bayesian Networks to Predict Student Performance. By Zach Pardos (Neil Heffernan, Advisor, Computer Science). Joint work with Brigham Anderson and Cristina Heffernan.

Goal: to evaluate the predictive performance of various fine-grained student skill models in the ASSISTment tutoring system using Bayesian networks.

Background on ASSISTment: ASSISTment is a web-based assessment system for 8th–10th grade math that tutors students on items they get wrong. There are 1,443 items in the system. The system is freely available at www.assistment.org. Question responses from 600 students using the system during the 2004–2005 school year were used; each student completed around 260 items.

The Skill Models: The skill models were created for use in the online tutoring system ASSISTment, founded at WPI. They consist of skill names and associations (or taggings) of those skill names with math questions in the system. Models with 1, 5, 39 and 106 skills were evaluated to represent varying degrees of concept generality. The skill models' ability to predict student performance on the system as well as on a standardized state test was evaluated. The skill models used: WPI-106: 106 skill names drafted and tagged to items in the tutoring system and to the questions on the state test by our subject matter expert, Cristina. WPI-5 and WPI-39: 5 and 39 skill names drafted by the Massachusetts Department of Education. WPI-1: represents unidimensional assessment.

Bayesian Networks: A Bayesian network is a probabilistic machine learning method. It is well suited for making predictions about unobserved variables by incorporating prior probabilities with new evidence. Arrows represent associations of skills with question items; they also represent conditional dependence in the Bayesian belief network.

Student Test Score Prediction Process:
1. Skill probabilities are inferred from a student's responses to questions on the system (the Bayesian belief network). The probability of a guess is set to 10% (tutor questions are fill in the blank); the probability of getting an item wrong even when the student knows it is set to 5%.
2. The inferred skill probabilities are used to predict the probability that the student will answer each test question correctly, and these probabilities are summed to generate a total test score. The probability of a guess is set to 25% (MCAS questions are multiple choice); the probability of getting an item wrong even when the student knows it is set to 5%.

Predicting student responses within the ASSISTment tutoring system. Result: the finer-grained the model, the better the prediction accuracy. The finest-grained model, the WPI-106, performed best, with an average of only 5.5% error in predicting student answers within the system.

Predicting student state test scores. Result: the finest-grained model, the WPI-106, came in 2nd to the WPI-39, which may have performed better because 50% of its skills are sampled on the MCAS test vs. only 25% of the WPI-106's.

Conclusions: The ASSISTment fine-grained skill models excel at assessment of student skills (see Ming Feng's poster for a mixed-effects comparison). Accurate prediction means teachers can know when their students have attained certain competencies.

This work has been accepted for publication at the 2007 User Modeling Conference in Corfu, Greece. Sponsors / Collaborators
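The scoring step of this process can be sketched numerically. The guess and slip values come from the poster text above; the skill probabilities below are illustrative, not values inferred from real student data.

```python
# Numeric sketch of step 2: turning inferred skill probabilities into a
# predicted test score. GUESS and SLIP are the poster's MCAS settings;
# the skill probabilities passed in below are made-up examples.
GUESS = 0.25  # MCAS questions are multiple choice
SLIP = 0.05   # chance of missing an item despite knowing the skill

def p_item_correct(p_know: float) -> float:
    """P(correct) = P(know) * (1 - slip) + P(not know) * guess."""
    return p_know * (1.0 - SLIP) + (1.0 - p_know) * GUESS

def predicted_score(skill_probs):
    """Sum per-item correctness probabilities into a total test score
    (assuming, for simplicity, one test item per skill probability)."""
    return sum(p_item_correct(p) for p in skill_probs)

# A student who has likely mastered two skills and is shaky on a third:
print(predicted_score([0.9, 0.8, 0.3]))  # ≈ 2.15 items expected correct
```

Note that even a student who knows nothing scores above zero under this model, because every multiple-choice item carries a 25% guessing floor.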

20 Tracking skill learning longitudinally

