Detecting Missing Hyphens in Learner Text Aoife Cahill, Susanne Wolff, Nitin Madnani Educational Testing Service ACL 2013 Martin Chodorow Hunter College.

Presentation transcript:

Detecting Missing Hyphens in Learner Text. Aoife Cahill, Susanne Wolff, Nitin Madnani (Educational Testing Service); Martin Chodorow (Hunter College and the Graduate Center). ACL 2013.

Outline - Introduction - Baselines - System Description - Evaluation - Conclusions

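Introduction Missing hyphens: (1) Schools may have more after school sports. (2) I went to the dentist after school today. (3) My father like play basketball with me. In (1), "after school" modifies the following noun and should be hyphenated ("after-school sports"); in (2), the same two words need no hyphen.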

Baselines Predict a missing hyphen when: (1) the hyphenated form appears in the Collins Dictionary (2) the hyphenated form occurs more than 1,000 times in Wikipedia (3) the probability of the hyphenated form, as estimated from Wikipedia, is greater than 0.66
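The third baseline can be sketched as follows. The slides do not say how the probability is estimated, so this sketch assumes a simple relative frequency (hyphenated count divided by the total of hyphenated and unhyphenated counts). The function names and the example counts are hypothetical.

```python
def hyphenation_probability(hyphenated_count, open_count):
    """Relative frequency of the hyphenated form (assumed estimator)."""
    total = hyphenated_count + open_count
    if total == 0:
        return 0.0  # unseen word pair: no evidence for a hyphen
    return hyphenated_count / total


def baseline_predicts_hyphen(hyphenated_count, open_count, threshold=0.66):
    """Flag a missing hyphen when the estimated probability exceeds the threshold."""
    return hyphenation_probability(hyphenated_count, open_count) > threshold


# Hypothetical counts, for illustration only (not taken from Wikipedia).
print(baseline_predicts_hyphen(700, 300))
print(baseline_predicts_hyphen(50, 950))
```

A baseline like this ignores the sentence around the word pair, so it makes the same decision for every occurrence of the pair.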

System Description Learner text: Schools may have more after school sports.

System Description Model: Logistic regression Probability: A missing hyphen error is predicted only when the probability of the prediction is greater than 0.99
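The high-confidence decision rule might look like the sketch below. Only the logistic regression model and the 0.99 threshold come from the slides; the feature name, weights and bias are hypothetical placeholders, not the authors' feature set.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_missing_hyphen(features, weights, bias, threshold=0.99):
    """Return (flag, probability); flag is True only above the threshold."""
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
    probability = sigmoid(score)
    return probability > threshold, probability


# Hypothetical weights and features, for illustration only.
weights = {"next_is_noun": 6.0}
flag, probability = predict_missing_hyphen({"next_is_noun": 1.0}, weights, bias=-1.0)
print(flag, round(probability, 4))
```

Setting the threshold this high trades recall for precision: the system stays silent unless it is nearly certain.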

System Description SJM-trained: - San Jose Mercury News corpus - For training, hyphenated words are automatically split (e.g. well-known becomes well known) - The training data contains 1% of the positive examples and 3% of the negative examples
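Splitting hyphenated words to create labelled examples could be sketched as below. This assumes whitespace tokenisation and treats every adjacent token pair as a candidate, with pairs that originally carried a hyphen as positives; the function name and these details are assumptions, not the authors' pipeline.

```python
def make_examples(sentence):
    """Split hyphenated words and label each adjacent token pair.

    Returns (tokens, examples), where each example is
    (left_token, right_token, had_hyphen).
    """
    tokens = []
    hyphen_positions = set()  # index i means a hyphen joined tokens[i] and tokens[i + 1]
    for word in sentence.split():
        parts = [part for part in word.split("-") if part]
        if len(parts) > 1 and all(part.isalpha() for part in parts):
            for offset in range(len(parts) - 1):
                hyphen_positions.add(len(tokens) + offset)
            tokens.extend(parts)
        else:
            tokens.append(word)  # leave anything else untouched
    examples = [(tokens[i], tokens[i + 1], i in hyphen_positions)
                for i in range(len(tokens) - 1)]
    return tokens, examples


tokens, examples = make_examples("a well-known author")
print(tokens)
print(examples)
```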

System Description Negative examples selected: Only contexts that occur more than 20 times are selected during training.
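The frequency cut-off can be illustrated with a short sketch. The slides do not define what counts as a "context", so here it is assumed to be any hashable value, such as a word pair; the function name is hypothetical.

```python
from collections import Counter


def filter_frequent_contexts(contexts, min_count=20):
    """Keep only contexts that occur more than min_count times."""
    counts = Counter(contexts)
    return [context for context in contexts if counts[context] > min_count]


# Hypothetical data: one context seen 21 times, another seen exactly 20 times.
contexts = [("after", "school")] * 21 + [("rare", "pair")] * 20
kept = filter_frequent_contexts(contexts)
print(len(kept), set(kept))
```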

System Description Wiki-revision-trained: - Wikipedia articles

System Description Combined: - Combine both data sources

Evaluation Artificial Data: - Brown corpus - taking 24,243 sentences - 2,072 hyphenated words

Evaluation 1 Learner Text: - CLC-FCE - The corpus contains 1,244 exam scripts - A total of 173 instances of missing hyphen errors

Evaluation Inspection of the 131 true positives for the learner data reveals that 87 of these are cases of a single type, the word “make-up”.

Evaluation 2 Learner Text: - A data set of 1,000 student GRE and TOEFL essays - Drawn from 295 prompts - Ranging in length from 1 to 50 sentences - Average of 378 words per essay

Evaluation 2 Learner Text (Cont.): - Manual inspection of a random sample of 100 instances where each system detected a missing hyphen - Judged by two native English speakers - Using the Chicago Manual of Style as a guide - High agreement
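A manual check of sampled detections can be summarised as in the sketch below. The slides do not say how agreement was measured, so raw percent agreement is an assumption, and the judgment lists are made-up placeholders rather than the study's data.

```python
def percent_agreement(judge_a, judge_b):
    """Fraction of instances on which two annotators gave the same label."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(a == b for a, b in zip(judge_a, judge_b)) / len(judge_a)


def sample_precision(judgments):
    """Share of sampled detections judged to be real missing-hyphen errors."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)


# Hypothetical labels for four sampled detections (True = real error).
judge_a = [True, True, False, True]
judge_b = [True, False, False, True]
print(percent_agreement(judge_a, judge_b), sample_precision(judge_a))
```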

Conclusions 1) A system for automatically detecting missing hyphen errors in learner text 2) The classifiers generally performed better than the baseline systems 3) Taking context into account when detecting the errors is important.