Using NLP to Support Scalable Assessment of Short Free Text Responses
Alistair Willis, Department of Computing and Communications, The Open University, UK

e-assessment at the Open University
– Short answer questions
  – assess detail and specific knowledge
  – ability to verbalise knowledge
  – cognitively different from multiple choice questions: the student is unprompted
– Valuable assessment method for distance learning
  – e.g. the Open University, MOOCs
– The Open University has large answer sets for training data
  – high-population courses

e-assessment at the Open University
– Online, automatically marked questions in use for Level 1 OU Science
  – successful in formative assessment
  – positive student feedback
  – moving towards summative assessment
– BUT the existing system uses (simplified) regular expressions
  – these need to be written by a computing expert
  – effective marking rules are difficult to write, and (near) impossible for a non-computing expert

Automatic assessment at the OU

e-assessment architecture
[Diagram: student input passes through the Virtual Learning Environment to an answer server, which applies the question's mark scheme and returns a mark and feedback]
– The mark scheme maps student input onto a mark
– Can we simplify mark scheme authoring?

Requirements for automatic assessment system
– Award of marks should be:
  – consistent
  – explanatory: why was this mark (not) awarded?
  – maintainable by a domain expert, not necessarily a computer scientist
  – "traceable" (e.g. a QAA requirement)

NLP for e-assessment
– How much linguistic analysis is actually needed?
– Bag-of-words/keyword analysis is not good enough:
  – "high temperature and pressure" (✓)
  – "high pressure and temperature" (✓)
  – "temperature and high pressure" (✗)
– But deep parsing can overcommit (e.g. the IAT system of Mitchell et al., 2002):
  – (high pressure) and temperature vs. high (pressure and temperature)
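The word-order problem is easy to see concretely: a bag-of-words representation maps all three responses to the same set of tokens, so no keyword-only marker can separate the two correct answers from the incorrect one. A minimal illustration (not any system's actual code):

```python
# Bag-of-words collapses word order: all three responses produce the
# identical token set, so a keyword-only marker treats them alike.
responses = [
    "high temperature and pressure",   # correct
    "high pressure and temperature",   # correct
    "temperature and high pressure",   # incorrect: only the pressure is high
]

bags = [frozenset(r.split()) for r in responses]
print(len(set(bags)))  # → 1: all three bags are identical
```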

Marking as Information Extraction
– Treat marking as a rule-based Information Extraction task
  – matching rules express the mark scheme (Junker et al. 1999)
  – keyword recognition + linear word order
  – simple spelling correction (edit distance ≤ 1)
– Rule predicates:
  term(R, Term, I)
  template(R, Template, I)
  precedes(I1, I2)
  closely_precedes(I1, I2)

Example
term(R, oil, I) ∧ term(R, less, J) ∧ template(R, dens, K) ∧ precedes(I, J) → correct(R)
– "oil is less dense than water" (✓)
– "oil has less density than water" (✓)
– "water is less dense than oil" (✗)
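The rule above can be read as executable matching code. The following is a hedged sketch of the matcher semantics, not the Amati implementation: `term` matches a word up to edit distance 1 (the spelling correction), `template` matches a word by its stem prefix (so `dens` covers both "dense" and "density"), and `precedes` is a linear word-order check.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def term(words, t):
    """Positions whose word is within edit distance 1 of t (spelling correction)."""
    return [i for i, w in enumerate(words) if edit_distance(w, t) <= 1]

def template(words, stem):
    """Positions whose word starts with the stem, e.g. 'dens' matches 'dense', 'density'."""
    return [i for i, w in enumerate(words) if w.startswith(stem)]

def correct(response):
    """term(R, oil, I) ∧ term(R, less, J) ∧ template(R, dens, K) ∧ precedes(I, J) → correct(R)"""
    words = response.lower().split()
    return (any(i < j                      # precedes(I, J): linear word order
                for i in term(words, "oil")
                for j in term(words, "less"))
            and bool(template(words, "dens")))

print(correct("oil is less dense than water"))     # → True
print(correct("oil has less density than water"))  # → True
print(correct("water is less dense than oil"))     # → False
```

The third response fails only because "oil" follows "less" rather than preceding it, which is exactly the distinction a bag-of-words marker cannot draw.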

Amati
– The Amati system makes mark scheme authoring easy for tutors
  – domain specialists, not (necessarily) computing specialists
– Bootstrapping model:
  load an initial set of responses
  mark the responses (by hand)
  repeat:
    write/edit rules to reflect the marks
    extend the set of responses
    apply the rules to the extended set
    confirm/change the assigned marks
  until no further improvement
  apply the rules to the remaining responses

Evaluation
– Is the representation language expressive enough?
– How good are the Amati mark schemes?
  – How do they compare to a gold standard?
  – How do they compare to a human marker?
– Tested on 6 questions
  – student responses from two presentations (2008 and 2009)
  – mark schemes built on the 2008 responses
  – tested on the 2009 responses (unseen)

Test Set Construction
– How do we get a gold standard/ground truth?
  – human marks can be unreliable: low inter-marker agreement
– Multiple-pass annotation
  – initially marked by two subject-specialist tutors, allowed to confer during marking
  – module chair as final authority: a third marker who also adjudicates on disagreements

Results
[Table: number of responses, accuracy (%) and α (%) for each of the six questions: Sandstone, Snowflake, Charge, Rocks, Sentence, Oil; the figures were not preserved in the transcript]
– High accuracy, although the task is highly constrained
– α higher than human inter-marker agreement on the same questions (71.2% ≤ α ≤ 88.2%)
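The α figures are Krippendorff's alpha. For two markers assigning nominal (correct/incorrect) marks with no missing data, it reduces to a short coincidence-matrix computation. The sketch below uses invented marks, since the evaluation data itself is not reproduced here.

```python
from collections import Counter

def krippendorff_alpha_nominal(marks_a, marks_b):
    """Krippendorff's alpha for two coders, nominal categories, no missing data."""
    coincidence = Counter()               # coincidence matrix o_ck
    for a, b in zip(marks_a, marks_b):
        coincidence[(a, b)] += 1          # each unit contributes both
        coincidence[(b, a)] += 1          # ordered value pairs
    n_total = 2 * len(marks_a)            # total pairable values
    marginals = Counter()                 # n_c: pairable values per category
    for (c, _), o in coincidence.items():
        marginals[c] += o
    observed = sum(o for (c, k), o in coincidence.items() if c != k)
    expected = sum(n * (n_total - n) for n in marginals.values())
    if expected == 0:
        return 1.0                        # only one category used: no chance disagreement
    return 1.0 - (n_total - 1) * observed / expected

print(krippendorff_alpha_nominal([1, 1, 0, 0], [1, 1, 0, 0]))  # → 1.0
print(krippendorff_alpha_nominal([1, 0], [0, 1]))              # → -0.5
```

Unlike raw percentage agreement, α corrects for chance, which is why it is the fairer yardstick when comparing the system against human inter-marker agreement.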

Rule Induction
– To induce mark schemes, try Inductive Logic Programming: Aleph (Srinivasan 2004)
– Amati proposes rules from the marked responses
– Still requires manual post-editing, e.g. the predicted rule
  term(R, high, A) ∧ term(R, pressure, B) ∧ term(R, and, C) → correct(R)
  matches "high pressure and temperature" as well as "high pressure and heat"
– The question author can make rules more intuitively plausible
– Repeated runs improve the overall coverage
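The induction step can be illustrated with a much simpler stand-in for Aleph: enumerate candidate conjunctions of unordered keyword tests, reject any that match an incorrectly marked response, and keep the one covering the most correct responses. This toy precision-first search is not ILP proper, and the example responses are invented.

```python
from itertools import combinations

def matches(response, terms):
    """Unordered keyword test: every term must appear somewhere in the response."""
    words = response.split()
    return all(t in words for t in terms)

def induce_rule(marked_correct, marked_incorrect, max_terms=2):
    """Toy rule proposer: search conjunctions of keyword tests, reject any
    that cover an incorrectly marked response, keep the best coverage."""
    vocab = sorted({w for r in marked_correct for w in r.split()})
    best, best_cov = None, 0
    for n in range(1, max_terms + 1):
        for terms in combinations(vocab, n):
            if any(matches(r, terms) for r in marked_incorrect):
                continue                  # rule overgenerates: reject it
            cov = sum(matches(r, terms) for r in marked_correct)
            if cov > best_cov:
                best, best_cov = terms, cov
    return best, best_cov

marked_correct = ["high pressure and high temperature",
                  "high temperature and pressure",
                  "high pressure and heat"]
marked_incorrect = ["low pressure and no heat"]

rule, coverage = induce_rule(marked_correct, marked_incorrect)
print(rule, coverage)  # → ('high',) 3
```

Like the Aleph-proposed rule on the slide, the winning conjunction is valid on the data but linguistically thin (it ignores word order entirely), which is exactly why induced mark schemes still need post-editing by the question author.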

Some difficult responses
– Lack of linguistic knowledge is not necessarily a drawback:
  – "The oil floats on the water because it is lighter"
  – "The oil floats on the water because it is heavier"
– Applying benefit of the doubt, mark both correct
  – better than attempting to resolve the pronoun
  – easier with linear word order than with syntactic analysis

Some difficult responses
– Hard to mark where students can give particular examples:
  "A snowflake falls vertically with a constant speed. What can you say about the forces acting on the snowflake?"
– Abstract answers are easy to mark:
  – "They are equal and opposite"
  – "no net forces"
  – "all forces cancel out", etc.
– Much harder to predict all the specific examples:
  – "the gravity and air resistance are the same"
  – "gravity is counteracted by drag", etc.

Discussion: ongoing work
– User studies
  – formal comparison of responses marked by hand and with Amati
  – comparing time to mark and inter-marker agreement
– Linguistic complexity
  – multiple-mark responses
  – better synonym handling
– Feedback generation

Thank you!