1 / 12 PSLC Summer School, June 21, 2007
Identifying Students’ Gradual Understanding of Physics Concepts Using TagHelper Tools
Nava L., Oren E.
University of Utah

2 / 12 PSLC Summer School, June 21, 2007
Driving Research Question
Can a machine identify students’ gradual understanding of physics concepts?
Hypothesis - the IBAT learning model: students learn in four stages:
1. Ignoring irrelevant data
2. Basic notions
3. Advanced principles
4. Transfer of principles to complex scenarios
[Figure: the four stages plotted as Student Conceptual Learning vs. Time]

3 / 12 PSLC Summer School, June 21, 2007
Outline
Data Collection
- Students’ constructed responses to physics questions
- Human teacher response classification = the reference for analysis
Data Analysis
- TagHelper Tools
- Discriminatory classifiers: Naïve Bayes, SMO
- User-defined features
Results
Discussion
- How well do TagHelper Tools delineate the four stages of students’ conceptual understanding?
Lessons Learned from the Summer School & TagHelper Tools

4 / 12 PSLC Summer School, June 21, 2007
Data Collection
Data unit = student constructed response to an open-ended physics question, e.g.: “Acceleration is defined as the final amount subtracted from the initial amount divided by the time.”
840 student responses collected:
- Development set: 420 randomly selected responses
- Validation set: the other 420 responses
Responses were classified by human teachers into 55 concepts, aggregated into four main categories:
- Irrelevant
- Basic notions: e.g. no gravity in vacuum, definition of force
- Advanced principles: e.g. zero net force [implies body at rest]
- Complex scenarios: e.g. man drops keys in an elevator
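The development/validation split is just a random halving of the 840 responses; a minimal sketch of such a split (the `responses` list is hypothetical, not the authors’ code):

```python
import random

def split_responses(responses, seed=0):
    """Randomly halve the labeled responses into development and
    validation sets (420/420 for the 840 responses in the study)."""
    shuffled = list(responses)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (development, validation)

# dev_set, val_set = split_responses(all_responses)
```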

5 / 12 PSLC Summer School, June 21, 2007
Data Analysis: Rationale
TagHelper Tools can analyze any text response; which algorithm and option set is best for this type of data set?
Objective: detect four ordered stages, hence use a discriminatory classifier:
- Naïve Bayes: uses cumulative evidence to distinguish among records
- Support Vector Machines (SMO): finds boundaries that separate groups in the data
Models must exhibit reasonable predictions on both the training and validation sets to ensure reliability.
User features should mainly delineate among scenarios, e.g.:
- ANY(EGG, CLOWN)
- ALL(PUMPKIN, ANY(PERSON, MAN))
- ANY(KEYS, ELEVATOR)
Shooting for reliability index κ ~
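To make the ANY/ALL feature syntax concrete, here is a minimal Python sketch of how such boolean keyword features could be evaluated against a tokenized response (my reading of the syntax, not TagHelper’s actual implementation):

```python
def any_of(*terms):
    """Feature fires if any term appears among the response tokens."""
    return lambda tokens: any(t.lower() in tokens for t in terms)

def all_of(*parts):
    """Feature fires only if every sub-feature (or bare term) fires."""
    preds = [p if callable(p) else any_of(p) for p in parts]
    return lambda tokens: all(p(tokens) for p in preds)

# The three example features from the slide
features = {
    "egg_or_clown": any_of("egg", "clown"),
    "pumpkin_person": all_of("pumpkin", any_of("person", "man")),
    "keys_elevator": any_of("keys", "elevator"),
}

tokens = set("the man drops his keys in an elevator".split())
fired = {name: f(tokens) for name, f in features.items()}
print(fired)  # {'egg_or_clown': False, 'pumpkin_person': False, 'keys_elevator': True}
```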

6 / 12 PSLC Summer School, June 21, 2007
Data Analysis: Models
Best models:
- Model A: Naïve Bayes, no POS, no user-defined features
- Model B: Naïve Bayes, no POS, with user-defined features
- Model C: SMO, no POS, exponent = 2.0, no user-defined features
- Model D: SMO, no POS, exponent = 2.0, with user-defined features
Procedure:
- Models were trained on the development set using cross-validation
- Evaluation measures: κ (> 0.5) and % Correctly Classified Instances (> 60%)
- If both measures were reasonable, the model was further tested on the validation set
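TagHelper wraps standard classifiers (Weka’s Naïve Bayes and SMO); as a rough analogue of this procedure, here is a scikit-learn sketch of 10-fold cross-validation with the two acceptance thresholds (the library, `texts`, and `labels` are stand-ins, not the original toolchain):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, cohen_kappa_score

def evaluate(texts, labels, use_svm=False):
    """10-fold CV on the development set, reporting the two
    acceptance measures from the slide: accuracy and kappa."""
    clf = (SVC(kernel="poly", degree=2)  # rough analogue of SMO, exponent = 2.0
           if use_svm else MultinomialNB())
    model = make_pipeline(CountVectorizer(), clf)
    preds = cross_val_predict(model, texts, labels, cv=10)
    acc = accuracy_score(labels, preds)
    kappa = cohen_kappa_score(labels, preds)
    return acc, kappa  # keep the model only if acc > 0.60 and kappa > 0.5
```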

7 / 12 PSLC Summer School, June 21, 2007
Results on Development Set*

Model                   | Correctly Classified Instances | Kappa (κ) reliability index
A (NB)                  | 71%                            | 0.544
B (NB + user features)  | 72%                            | 0.570
C (SMO)                 | 73%                            | 0.598
D (SMO + user features) | 76%                            | 0.636

* The model was trained on the development set by dividing it into 10 chunks and running cross-validation among the chunks.

8 / 12 PSLC Summer School, June 21, 2007
Results: Development vs. Validation Set

Model                   | Correctly Classified Instances – Development Set | Correctly Classified Instances – Validation Set
A (NB)                  | 71%                                              | 67%
B (NB + user features)  | 72%                                              | 50%
C (SMO)                 | 73%                                              | 48%
D (SMO + user features) | 76%                                              | 35%

9 / 12 PSLC Summer School, June 21, 2007
Discussion #1
The best model was Naïve Bayes with no user-defined features: it had the lowest κ on the development set, but the highest accuracy on the validation set and the most uniform performance overall.
- Lesson: watch out for, and optimize, the development/validation tradeoff
Why didn’t the models generalize well? This may be due to the large skew of the data, which causes large variability even between the development and validation sets. The skew is evident when optimizing the SMO exponent (for non-skewed data the optimal exponent is 1; here it is 2), and may also be why SMO was not superior to NB.
- Lesson: check for data skew (indicated by an optimal SMO exponent different from 1)
Analysis on the non-aggregated 55 concepts resulted in a higher κ = 0.61; however, the confusion matrix is much larger, making errors difficult to interpret.
- Lesson: strive for a small number of distinct categories
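A quick way to quantify the skew discussed above (a generic sketch; `dev_labels` is a hypothetical list of the four category tags):

```python
from collections import Counter

def majority_share(labels):
    """Print the class distribution and return the majority-class
    share; a share far above 0.25 (for 4 categories) flags skew."""
    counts = Counter(labels)
    total = sum(counts.values())
    for category, n in counts.most_common():
        print(f"{category}: {n} ({n / total:.0%})")
    return counts.most_common(1)[0][1] / total

# majority_share(dev_labels)  # e.g. 0.55 -> a heavily skewed data set
```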

10 / 12 PSLC Summer School, June 21, 2007
Discussion #2: Error Analysis
Error analysis provides a fine-grained perspective on the data and sheds light on the characteristic error patterns made by TagHelper:
- Identify large entries in the confusion matrix
- Look at response examples that represent the dominant error types
- Design user features to eliminate the errors
[Confusion matrix: rows = actual category, columns = predicted category, over I, B, A, T; the cell values were garbled in the transcript]
Notation: I = Irrelevant responses, B = Basic notions, A = Advanced principles, T = Transfer to complex scenarios
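The first step, identifying large entries in the confusion matrix, can be automated; a small sketch (NumPy assumed; the example output is illustrative, since the slide’s actual cell values were lost):

```python
import numpy as np

CATS = ["I", "B", "A", "T"]  # Irrelevant, Basic, Advanced, Transfer

def dominant_errors(conf, top=3):
    """Return the largest off-diagonal entries of a confusion
    matrix, i.e. the dominant error types to inspect first."""
    conf = np.asarray(conf, dtype=float)
    errors = conf.copy()
    np.fill_diagonal(errors, 0)  # ignore correct classifications
    flat = np.argsort(errors, axis=None)[::-1][:top]
    return [(CATS[i], CATS[j], int(conf[i, j]))
            for i, j in zip(*np.unravel_index(flat, conf.shape))]

# dominant_errors(conf_matrix)  # e.g. [('B', 'A', 21), ...] (values illustrative)
```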

11 / 12 PSLC Summer School, June 21, 2007
Summary
In short, the answer to the driving research question is YES, A MACHINE CAN IDENTIFY STUDENTS’ GRADUAL LEARNING IN PHYSICS.
Students develop their conceptual understanding of physics in four stages that correspond to the four categories found in the data (see slide 2):
1. Learning to ignore irrelevant data and focus on the relevant knowledge components
2. Getting familiar with basic notions
3. Learning advanced principles that build on the basic notions
4. Transferring the principles to complex real-life scenarios; each scenario is likely to involve multiple principles

12 / 12 PSLC Summer School, June 21, 2007
Lessons Learned
TagHelper Tools can distinguish between data categories that represent different knowledge components.
There is a trade-off between fit to the training set and performance on the validation set; we chose the model that optimized this trade-off.
The quality of the conclusions is limited by the quality of the data. In our case the model validation was reasonable, because the responses were drawn from multiple students but the individual students were not indicated.
TagHelper Tools is a state-of-the-art machine learning framework, but its analysis is limited to identifying structured patterns within its feature space. The default feature space includes only simple patterns; adding creative user features is the key to making TagHelper Tools even more powerful.
Future directions may generalize TagHelper Tools to more flexible types of structural text patterns and to incorporating data imported from other parsers (e.g. mathematical expression parsers).