Slide 1 / 12
PSLC Summer School, June 21, 2007
Identifying Students’ Gradual Understanding of Physics Concepts Using TagHelper Tools
Nava L. Livne (nlivne@aoce.utah.edu)
Oren E. Livne (olivne@aoce.utah.edu)
University of Utah
Slide 2 / 12: Driving Research Question
Can a machine identify students’ gradual understanding of physics concepts?
Hypothesis (IBAT learning model): students learn in four stages:
1. Ignoring irrelevant data
2. Basic notions
3. Advanced principles
4. Transfer of principles to complex scenarios
[Figure: student conceptual learning vs. time, rising through the four stages]
Slide 3 / 12: Outline
Data Collection
- Students’ constructed responses to physics questions
- Human teachers’ classification of the responses = the reference for analysis
Data Analysis
- TagHelper Tools
- Discriminatory classifiers: Naïve Bayes, SMO
- User-defined features
Results
Discussion
- How well do TagHelper Tools delineate the four stages of students’ conceptual understanding?
Lessons Learned from the Summer School & TagHelper Tools
Slide 4 / 12: Data Collection
Data unit = a student’s constructed response to an open-ended physics question, e.g.:
“Acceleration is defined as the final amount subtracted from the initial amount divided by the time.”
840 student responses collected:
- Development set: 420 randomly selected responses
- Validation set: the remaining 420 responses
Responses were classified by human teachers into 55 concepts, aggregated into four main categories:
1. Irrelevant
2. Basic notions (e.g., no gravity in a vacuum, definition of force)
3. Advanced principles (e.g., zero net force implies no acceleration)
4. Transfer to complex scenarios (e.g., a man drops keys in an elevator)
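A minimal sketch of one way to produce such a random half/half split, assuming the responses are held in a Python list of (text, label) pairs; the function name and seed are illustrative, not from the original study:

```python
import random

def split_dev_validation(responses, seed=2007):
    """Randomly split the responses in half: a development set and a
    validation set (420 / 420 for the 840 responses on this slide)."""
    shuffled = list(responses)          # copy so the original order is kept
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (development, validation)
```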
Slide 5 / 12: Data Analysis: Rationale
TagHelper Tools can analyze any text response; which algorithm and option set is best for this type of data set?
Objective: detect four ordered stages, so use a discriminatory classifier:
- Naïve Bayes: uses cumulative evidence to distinguish among records
- Support Vector Machines (SMO): finds distinguished groups in the data
Models must exhibit reasonable predictions on both the training and validation sets to ensure reliability.
User features should mainly delineate among scenarios, e.g.:
- ANY(EGG, CLOWN)
- ALL(PUMPKIN, ANY(PERSON, MAN))
- ANY(KEYS, ELEVATOR)
Shooting for reliability index κ ≈ 0.6–0.7
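These ANY/ALL patterns are boolean keyword matchers over the response text. Below is a minimal Python sketch of how such user-defined features could work; any_of and all_of are our illustrative names, not TagHelper’s actual API, and the case-insensitive substring matching is an assumption about its semantics:

```python
def any_of(*terms):
    """Feature fires if the response mentions at least one of the terms."""
    return lambda text: any(t.lower() in text.lower() for t in terms)

def all_of(*parts):
    """Feature fires only if every part fires; a part is either a bare
    term or a nested matcher built with any_of/all_of."""
    def feature(text):
        return all(p(text) if callable(p) else any_of(p)(text) for p in parts)
    return feature

# The three scenario features from this slide:
egg_clown   = any_of("egg", "clown")
pumpkin_man = all_of("pumpkin", any_of("person", "man"))
keys_lift   = any_of("keys", "elevator")

print(pumpkin_man("The man threw a pumpkin off the roof"))  # True
print(keys_lift("A clown juggles three eggs"))              # False
```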
Slide 6 / 12: Data Analysis: Models
Best models:
- Model A: Naïve Bayes, no POS, no user-defined features
- Model B: Naïve Bayes, no POS, with user-defined features
- Model C: SMO, no POS, exponent = 2.0, no user-defined features
- Model D: SMO, no POS, exponent = 2.0, with user-defined features
Procedure:
- Models were trained on the development set using cross-validation
- Evaluation measures: κ (> 0.5) and % Correctly Classified Instances (> 60%)
- If both measures were reasonable, the model was further tested on the validation set
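A sketch of this evaluation procedure using scikit-learn stand-ins (an assumption on our part: MultinomialNB for Naïve Bayes and a degree-2 polynomial-kernel SVC in place of Weka’s SMO with exponent 2.0); dev_texts and dev_labels are hypothetical variables holding the development set:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

def evaluate(texts, labels, clf):
    """10-fold cross-validation, returning the two measures on this slide:
    % Correctly Classified Instances and kappa."""
    X = CountVectorizer().fit_transform(texts)      # unigram features, no POS
    pred = cross_val_predict(clf, X, labels, cv=10)
    return accuracy_score(labels, pred), cohen_kappa_score(labels, pred)

models = {
    "A (NB)":             MultinomialNB(),
    "C (SMO, exp = 2.0)": SVC(kernel="poly", degree=2, coef0=1.0),
}
# Models B and D would append the user-defined scenario features as extra columns.
# acc, kappa = evaluate(dev_texts, dev_labels, models["A (NB)"])
# Keep a model only if acc > 0.60 and kappa > 0.5, then retest it on the validation set.
```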
Slide 7 / 12: Results on Development Set*

Model                  | Correctly Classified Instances | Kappa (κ) reliability index
A (NB)                 | 71% | 0.544
B (NB + user features) | 72% | 0.570
C (SMO)                | 73% | 0.598
D (SMO + user features)| 76% | 0.636

* Each model was trained on the development set by dividing it into 10 chunks and running cross-validation among the chunks.
Slide 8 / 12: Results: Development vs. Validation Set

Model                  | Correctly Classified, Development Set | Correctly Classified, Validation Set
A (NB)                 | 71% | 67%
B (NB + user features) | 72% | 50%
C (SMO)                | 73% | 48%
D (SMO + user features)| 76% | 35%
Slide 9 / 12: Discussion #1
The best model was Naïve Bayes with no user-defined features (Model A): it had the lowest κ on the development set, but the highest accuracy on the validation set and the most uniform performance overall.
Watch out for, and optimize, the development/validation trade-off.
Why didn’t the models generalize well? Likely because of the large skew in the data, which causes large variability even between the development and validation sets. The skew is evident when optimizing the SMO exponent (for non-skewed data the optimal exponent is 1; here it is 2), and it may also be why SMO was not superior to NB.
Check for data skew (indicated by an optimal SMO exponent different from 1); a quick check is sketched below.
Analysis on the non-aggregated 55 concepts yielded a higher κ = 0.61, but the confusion matrix is much larger and the errors are difficult to interpret.
Strive for a small number of distinct categories.
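One quick way to check for the skew mentioned above is to compare the category distributions of the two sets; a sketch, assuming dev_labels and val_labels are hypothetical lists of the four IBAT category names:

```python
from collections import Counter

def label_distribution(labels):
    """Fraction of responses in each of the four IBAT categories."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: round(counts[cat] / total, 3) for cat in sorted(counts)}

# A large gap between the two distributions is the kind of skew/variability
# that lets a model fit the development set yet fail to generalize:
# print(label_distribution(dev_labels))
# print(label_distribution(val_labels))
```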
Slide 10 / 12: Discussion #2: Error Analysis
Error analysis provides a fine-grained perspective on the data and sheds light on the characteristic error patterns made by TagHelper:
- Identify large entries in the confusion matrix
- Look at response examples that represent dominant error types
- Design user features to eliminate the errors

Confusion matrix (rows and columns ordered I, B, A, T; cell boundaries lost in extraction):
I: 332515
B: 21207
A: 015320
T: 2882695

Notation: I = Irrelevant responses, B = Basic notions, A = Advanced principles, T = Transfer to complex scenarios
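The first step of this loop, finding the large confusion-matrix entries, can be mechanized; a sketch using scikit-learn, where y_true and y_pred are hypothetical arrays of IBAT labels:

```python
from sklearn.metrics import confusion_matrix

CATS = ["I", "B", "A", "T"]  # Irrelevant, Basic, Advanced, Transfer

def dominant_errors(actual, predicted, top=3):
    """Return the largest off-diagonal confusion-matrix entries, i.e. the
    dominant error types to inspect and target with new user features."""
    cm = confusion_matrix(actual, predicted, labels=CATS)
    errors = [(int(cm[i, j]), CATS[i], CATS[j])
              for i in range(len(CATS)) for j in range(len(CATS)) if i != j]
    return sorted(errors, reverse=True)[:top]

# for count, true_cat, pred_cat in dominant_errors(y_true, y_pred):
#     print(f"{count} responses in {true_cat} were misclassified as {pred_cat}")
```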
Slide 11 / 12: Summary
In short, the answer to the driving research question is YES: a machine can identify students’ gradual learning in physics.
Students develop their conceptual understanding in physics in four stages that correspond to the four categories found in the data (see slide 2):
1. Learning to ignore irrelevant data and focus on the relevant knowledge components
2. Becoming familiar with basic notions
3. Learning advanced principles that build on the basic notions
4. Transferring the principles to complex real-life scenarios; each scenario is likely to involve multiple principles
Slide 12 / 12: Lessons Learned
- TagHelper Tools can distinguish between data categories that represent different knowledge components.
- There is a trade-off between fit to the training set and performance on the validation set; we chose the model that optimized this trade-off.
- The quality of conclusions is limited by the quality of the data. In our case the model validation was reasonable because the responses were drawn from multiple students, although individual students were not identified in the data.
- TagHelper Tools is a state-of-the-art machine learning framework, but its analysis is limited to identifying structured patterns within its feature space. The default feature space includes only simple patterns; adding creative user features is the key to making TagHelper Tools even more powerful.
- Future directions may generalize TagHelper Tools to more flexible types of structural text patterns and incorporate data imported from other parsers (e.g., mathematical expression parsers).