Annie Louis (University of Pennsylvania) · Derrick Higgins (Educational Testing Service)


• Limit the opportunity to game educational software
• Compare with previously written essays for the question
◦ But such essays are not always available
• Compare with the question text
◦ Requires no training data
◦ Sometimes question texts are very short (Higgins et al.)

• Example prompt: ‘In the past, people were more friendly than they are today’
• Task: write an essay for/against the opinion expressed by the prompt

• Two methods for better comparison of essay and prompt text
◦ Expansion of the prompt text
◦ Spelling correction of the essay text
• Questions with approx. content words
• Lower error rates for off-topic essay detection


• English writing tasks in high-stakes tests
◦ GRE, TOEFL

Test    Skill level                     Task                                                      Avg. prompt length
TOEFL   Learners of English             Write a summary of a passage                              –
GRE     Applicants to graduate school   Write an argument for/against the opinion in the prompt   –

• 7 prompts for each task
• 350 essays written to each prompt
◦ Positive examples
• 350 essays randomly sampled from responses to other prompts
◦ Pseudo-negative examples

• Vector space similarity (Higgins et al.)
◦ Dimensions ~ prompt and essay content words
◦ Values ~ tf*idf
• Compare the essay with the target prompt and 9 reference prompts
◦ If similarity with the target prompt is highest: on-topic; otherwise: off-topic
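The check above can be sketched as a small self-contained program. The tf*idf weighting and cosine similarity below are standard textbook formulations rather than ETS's actual implementation, and the example documents in the usage are invented for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tf*idf vectors (as dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                     # document frequency per word
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def is_on_topic(essay, target_prompt, reference_prompts):
    """Flag the essay on-topic iff its similarity to the target prompt
    is at least as high as to every reference prompt."""
    docs = [essay, target_prompt] + reference_prompts
    vecs = tfidf_vectors(docs)
    essay_vec, target_vec, ref_vecs = vecs[0], vecs[1], vecs[2:]
    target_sim = cosine(essay_vec, target_vec)
    return all(target_sim >= cosine(essay_vec, rv) for rv in ref_vecs)
```

For example, an essay sharing content words with the target prompt but none with a reference prompt is flagged on-topic, and vice versa.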

• Two error rates (actual vs. predicted)
◦ False positive: an on-topic essay predicted as off-topic
◦ False negative: an off-topic essay predicted as on-topic
• Important to keep the false positive rate low
◦ Incorrectly flagging an on-topic essay is undesirable
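Given a list of gold labels and system predictions, the two rates can be computed as in the following sketch; the string labels "on"/"off" are hypothetical, and both classes are assumed to be present.

```python
def error_rates(actual, predicted):
    """Compute (false positive rate, false negative rate) for off-topic
    detection.  "off" = flagged off-topic, "on" = on-topic.
    False positive: an actually on-topic essay flagged as off-topic.
    False negative: an actually off-topic essay passed as on-topic."""
    on = [p for a, p in zip(actual, predicted) if a == "on"]
    off = [p for a, p in zip(actual, predicted) if a == "off"]
    fp_rate = sum(p == "off" for p in on) / len(on)
    fn_rate = sum(p == "on" for p in off) / len(off)
    return fp_rate, fn_rate
```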

• Baseline results by skill level (* Learners – TOEFL, Advanced – GRE)
◦ Error rates are higher for essays written by learners
◦ In particular, learners' essays have higher false positive rates

• Prompt length
◦ Shorter prompts provide little evidence
• Skill level of the test taker
◦ Essays written by learners are harder to classify
• Spelling correction of the essay text
◦ Reduces problems due to spelling errors
• Expansion of the prompt text before comparison
◦ Adds related words to provide more context

• Inflected forms
• Synonyms
• Distributionally similar words
• Word association norms

• friendly ~ friend, friendlier, friendliness, …
• Simplest and most restrictive expansion
• Tool from Leacock and Chodorow, 2003
◦ Adds/modifies prefixes and suffixes of words
◦ Rule-based
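A minimal sketch of rule-based inflectional expansion, assuming only two toy suffix rules; the actual Leacock and Chodorow tool uses a far richer rule set.

```python
def inflected_forms(word):
    """Toy rule-based expansion: generate plausible inflected variants
    of a word by suffix manipulation (illustrative rules only)."""
    forms = {word}
    if word.endswith("y"):
        stem = word[:-1]
        # adjective-like -y words: comparative, superlative, adverb, noun
        forms.update({stem + "ier", stem + "iest", stem + "ily", stem + "iness"})
    else:
        # default: common verbal/nominal suffixes
        forms.update({word + "s", word + "ed", word + "ing", word + "er"})
    return forms
```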

• friendly ~ favorable, well-disposed, …
• Synonyms from WordNet for a chosen sense
◦ Involves a word sense disambiguation step
• In-house tool
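A crude stand-in for this step, assuming a hypothetical hand-built sense inventory and a Lesk-style context-overlap heuristic in place of the in-house disambiguator.

```python
def choose_synonyms(word, context, sense_inventory):
    """Return the synonyms of the sense whose synonym set overlaps the
    surrounding context the most (a simplified Lesk-style choice).
    `sense_inventory` maps word -> sense name -> list of synonyms."""
    best_sense = max(
        sense_inventory[word],
        key=lambda s: len(set(sense_inventory[word][s]) & set(context)),
    )
    return sense_inventory[word][best_sense]
```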

• friendly ~ cordial, cheerful, hostile, calm, …
• Words appearing in the same contexts as prompt words
◦ Corpus-driven; implemented by Lin, 1998
◦ Less restrictive: also yields loosely related words, antonyms, …
◦ Cutoff on the similarity value
• Tool from Leacock and Chodorow
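Distributional similarity can be sketched with plain co-occurrence counts and a cosine cutoff. Lin (1998) actually uses dependency triples and an information-theoretic measure, so this is only an approximation; note how it happily returns an antonym that shares the same contexts, as the slide warns.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the context words within +/-window positions."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs

def dist_similar(word, vecs, cutoff=0.2):
    """Words whose context vectors have cosine >= cutoff with `word`."""
    def cos(u, v):
        dot = sum(u[k] * v.get(k, 0) for k in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0
    target = vecs[word]
    return {w for w in vecs if w != word and cos(target, vecs[w]) >= cutoff}
```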

• friendly ~ smile, amiable, greet, mean, …
• Produced by human participants in psycholinguistic experiments
◦ Given a target word, participants mention related words
• Lists collected by the University of South Florida
◦ 5000 target words, 6000 participants
◦ Nelson et al.

• Original prompts ~ content words
• After expansion
◦ 87 (WAN) – 229 (distributional similarity) content words
• Simple heuristic weights

Prompt word type   Weight
Original           20
Expansions         1
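The heuristic weighting amounts to merging the two word lists into one weighted bag of words, as in this sketch (the function name is illustrative; the default weights are the ones from the table above).

```python
def expanded_prompt_vector(original_words, expansion_words,
                           w_original=20, w_expansion=1):
    """Merge original prompt words and expansion words into a single
    weighted bag of words: originals count 20x as much as expansions."""
    vec = {}
    for w in original_words:
        vec[w] = vec.get(w, 0) + w_original
    for w in expansion_words:
        vec[w] = vec.get(w, 0) + w_expansion
    return vec
```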

• ‘Directed’ spelling correction
◦ Focus on correcting essay words similar to a ‘target’ word list
• Target list = prompt words
◦ Either the original or the expanded prompt
• Tool from Leacock and Chodorow
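A minimal sketch of directed correction, using the standard library's `difflib.get_close_matches` as a stand-in for the actual correction tool; the 0.8 similarity cutoff is an arbitrary illustrative choice.

```python
import difflib

def directed_spell_correct(essay_tokens, target_words):
    """Correct an essay token only when it closely resembles a word on
    the target (prompt) list; all other tokens are left untouched."""
    corrected = []
    for tok in essay_tokens:
        if tok in target_words:
            corrected.append(tok)          # already a target word
            continue
        match = difflib.get_close_matches(tok, target_words, n=1, cutoff=0.8)
        corrected.append(match[0] if match else tok)
    return corrected
```

The point of directing the correction at the prompt vocabulary is that only misspellings of topical words matter for the similarity comparison.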

• Results for learner essays (false neg. / false pos. rates for: prompt only, synonym, dist. similarity, WAN, inflected forms, Infl + WAN, spelling correction, Spell + Infl + WAN)
◦ All expansions lower the false positive rate
◦ Spelling correction works even better
◦ Overall best: Spell + Infl + WAN

• Results for advanced essays (false neg. / false pos. rates for: prompt only, synonym, dist. similarity, WAN, inflected forms, spelling correction, Spell + WAN)
◦ Best expansion: WAN
◦ No benefits from spelling correction
◦ Overall best: Spell + WAN

• Overall best = expansion + spelling correction
◦ Holds for both essay types (essays by learners and by advanced users)
• Usefulness of the individual methods is population-dependent

• Accuracy of prompt-based off-topic detection depends on
◦ Prompt length
◦ Essay properties
• We have introduced two methods to improve performance
◦ Prompt expansion
◦ Spelling correction
• Performance improved with each method individually and with both in combination
