Correcting Comma Errors in Learner Essays, and Restoring Commas in Newswire Text Ross Israel Indiana University Joel Tetreault Educational Testing Service.

Slides:



Advertisements
Similar presentations
1 Train to Gain Continuing professional development ILP Targets for Company Classes EHWL College This resource was produced as part of the Effective Practice.
Advertisements

The English House of Commas
Changes to the Provincial Exam. Previously, the Provincial was written in 2 sittings: a morning session (2 ½ hours) and an afternoon session (2 hours)
INTRODUCTION TO L3 P1 AND P2 MATERIALS A training session for Senior Mentors.
A learner corpus of students’ examination work in English language (a project) Sylwia Twardo Centre for Foreign Language Teaching, Warsaw University, Poland.
M. George Physics Dept. Southwestern College
SAT Preparation Writing Section: The Essay. Today’s Agenda minutes: Review SAT essay scoring rubric, strategies, and sample essays minutes:
Event Extraction Using Distant Supervision Kevin Reschke, Martin Jankowiak, Mihai Surdeanu, Christopher D. Manning, Daniel Jurafsky 30 May 2014 Language.
Methods in Computational Linguistics II Queens College Lecture 1: Introduction.
Rethinking Grammatical Error Detection and Evaluation with the Amazon Mechanical Turk Joel Tetreault[Educational Testing Service] Elena Filatova[Fordham.
Writing the report.
College Entrance Exams An overview of the SAT I, SAT II, and ACT.
Shallow Processing: Summary Shallow Processing Techniques for NLP Ling570 December 7, 2011.
Procedural Writing Writing a How-To Paper.
Inducing Information Extraction Systems for New Languages via Cross-Language Projection Ellen Riloff University of Utah Charles Schafer, David Yarowksy.
WHO WAS CONFUCIUS? Westfield State College Computers In Education Fall 2008 Megan M Banks.
Page 1 NAACL-HLT BEA Los Angeles, CA Annotating ESL Errors: Challenges and Rewards Alla Rozovskaya and Dan Roth University of Illinois at Urbana-Champaign.
Part of speech (POS) tagging
Introduction to Machine Learning Approach Lecture 5.
Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.
Albert Gatt Corpora and Statistical Methods Lecture 9.
Preposition Usage Errors by English as a Second Language (ESL) learners: “ They ate by* their hands.”  The writer used by instead of with. This work is.
Na-Rae Han (University of Pittsburgh), Joel Tetreault (ETS), Soo-Hwa Lee (Chungdahm Learning, Inc.), Jin-Young Ha (Kangwon University) May , LREC.
ELN – Natural Language Processing Giuseppe Attardi
Language Identification of Search Engine Queries Hakan Ceylan Yookyung Kim Department of Computer Science Yahoo! Inc. University of North Texas 2821 Mission.
BTANT 129 w5 Introduction to corpus linguistics. BTANT 129 w5 Corpus The old school concept – A collection of texts especially if complete and self-contained:
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Purdue University Writing Lab 1 Global Business Writing Powerful Business Writing Skills for ESL Writers February 10, 2013.
Welcome to Grade 1 Second Semester! Teacher Debbie & Teacher Iris.
Section I Concept Development in Mathematics and Science Unit 4 Assessing the Child’s Developmental Level ©2013 Cengage Learning. All Rights Reserved.
Lily  It is the kind of writing used in high school and college classes.  Academic writing is different from creative writing, which is the kind.
The CoNLL-2013 Shared Task on Grammatical Error Correction Hwee Tou Ng, Yuanbin Wu, and Christian Hadiwinoto 1 Siew.
1 Project of Reading Course Development Designer: Erin M Instructor: Mavis Shang Date: 06/09/2008.
GRAMMAR That nasty little part of writing…... What is it? Grammar is the study of the structure of language including its: Syntactic structures Patterns.
Tentative Unit 1 Schedule Week 2 1/19- MLK Day-No Class 1/21-Using library databases (bring computer to class) 1/23- Intro to Exploratory Narrative & Source.
Introducing the SILS Learner Corpus Victoria Muehleisen Waseda University.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
Lexical Analysis Hira Waseem Lecture
Our assessment objectives
Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop Nizar Habash and Owen Rambow Center for Computational Learning.
Cambridge IGCSE -GCSEs taken in England at the age of Taken in a variety of subjects -Graded from A* - G -At the EABJM: English Language and Literature.
Introduction the TKT teacher certificates
Prototype-Driven Learning for Sequence Models Aria Haghighi and Dan Klein University of California Berkeley Slides prepared by Andrew Carlson for the Semi-
Tokenization & POS-Tagging
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging I Introduction Tagsets Approaches.
Sheffield -- Victims of Mad Cow Disease???? Or is it really possible to develop a named entity recognition system in 4 days on a surprise language with.
Page 1 NAACL-HLT 2010 Los Angeles, CA Training Paradigms for Correcting Errors in Grammar and Usage Alla Rozovskaya and Dan Roth University of Illinois.
Number Sense Disambiguation Stuart Moore Supervised by: Anna Korhonen (Computer Lab)‏ Sabine Buchholz (Toshiba CRL)‏
Communicative and Academic English for the EFL Professional.
March 2006Introduction to Computational Linguistics 1 CLINT Tokenisation.
Shallow Parsing for South Asian Languages -Himanshu Agrawal.
Detecting Missing Hyphens in Learner Text Aoife Cahill, SusanneWolff, Nitin Madnani Educational Testing Service ACL 2013 Martin Chodorow Hunter College.
How to Write Lesson Plan Using the Project-Based Instructional Model.
Exploiting Named Entity Taggers in a Second Language Thamar Solorio Computer Science Department National Institute of Astrophysics, Optics and Electronics.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
CiSELT Module 6.1: EVP. 1. Introduction v a n r t i g o a l t c a i n i n o Vocational training Did you receive training for a job? What job?When? Is.
A classifier-based approach to preposition and determiner error correction in L2 English Rachele De Felice, Stephen G. Pulman Oxford University Computing.
Monitoring and Assessment Presented by: Wedad Al –Blwi Supervised by: Prof. Antar Abdellah.
English Course Structure National 3 to National 5.
T HE W RITING P ROCESS :D RAFTING, R EVISING, AND E DITING Paola Álvarez Ezqueda English 6th semester.
International English Language Testing System (IELTS) Listening WritingSpeaking.
Unit 1 Language testing. About test: What is testing?  A procedure for critical evaluation; a means of determining the presence, quality, or truth of.
The University of Illinois System in the CoNLL-2013 Shared Task Alla RozovskayaKai-Wei ChangMark SammonsDan Roth Cognitive Computation Group University.
Language Identification and Part-of-Speech Tagging
ACT English Test Preparation
Sentiment analysis algorithms and applications: A survey
PSSA Parent University
Exploring the BNC Corpus
Using GOLD to Tracking L2 Development
The Nature of learner language
Presentation transcript:

Correcting Comma Errors in Learner Essays, and Restoring Commas in Newswire Text Ross Israel Indiana University Joel Tetreault Educational Testing Service Martin Chodorow Hunter College of CUNY 2012 NAACL

Outline  [1. Introduction ]  [2.Comma Usage]  [3.Classifier and Features]  [4.Comma Restoration]  [5.Annotation]  [6.Error Correction] Error Analysis

1.Introduction Proper comma placement can lead to faster reading times and reduce the need to re-read entire sentences. Commas also help remove or reduce problems arising from difficult ambiguities; the garden path effect. e.g., The old man the boat.

1.Introduction A quick examination of English learner essays reveals a variety of errors, with writers both overusing and underusing commas in certain contexts. Consider examples (1) and (2):

Outline  [1. Introduction ]  [2.Comma Usage]  [3.Classifier and Features]  [4.Comma Restoration]  [5.Annotation]  [6.Error Correction] Error Analysis

2.Comma Usage

Outline  [1. Introduction ]  [2.Comma Usage]  [3.Classifier and Features]  [4.Comma Restoration]  [5.Annotation]  [6.Error Correction] Error Analysis

3.Classifier and Features (Classifier) We use CRF as the basis for our system and treat the task of comma insertion as a sequence labeling task; each space between words is considered by the classifier, and a comma is either inserted or not.

3.Classifier and Features (Features) If the teacher easily gets mad, then the child will always fear going to school and class.

Outline  [1. Introduction ]  [2.Comma Usage]  [3.Classifier and Features]  [4.Comma Restoration]  [5.Annotation]  [6.Error Correction] Error Analysis

4.Comma Restoration Before applying our system to the task of error correction, we tested its utility in restoring commas in newswire texts. Specifically, we evaluate on section 23 of the WSJ, training on sections

4.Comma Restoration After stripping all commas from our test data, the text is tokenized and POS tagged using a maximum entropy tagger (Ratnaparkhi, 1996).

4.Comma Restoration

However, they evaluate on the entire WSJ section of the Penn Treebank, so it is not totally fair to compare results.

Outline  [1. Introduction ]  [2.Comma Usage]  [3.Classifier and Features]  [4.Comma Restoration]  [5.Annotation]  [6.Error Correction] Error Analysis

5.Annotation For the comma restoration task, we needed only to obtain well-formed text and remove the commas to produce a test set. After a one-hour training session on comma usage rules, three native English speakers were given a set of ten learner essays comprising 3,665 tokens to annotate for comma errors.

5.Annotation

After completing the training phase, we assigned one annotator the task of annotating our development and test data from two different corpora: 1)essays written by English as a foreign language learners (EFL) 2)essays written by native speakers of English (Native). For both data sets we selected 60 essays for development and 60 essays for test.

5.Annotation

Outline  [1. Introduction ]  [2.Comma Usage]  [3.Classifier and Features]  [4.Comma Restoration]  [5.Annotation]  [6.Error Correction] Error Analysis

6.Error Correction In these experiments, we use the annotated essays described in section 5 for evaluation and train on 40,000 sentences taken from essays written by both native and non-native high level college students.

6.Error Correction We also augment the system with three postprocessing filters that we tuned on the development set. 1) The first filter requires that the classifier be at least 100% confident in a decision to insert a new comma. 2) The second filter requires that the classifier be at least 90% confident in a decision to insert a new comma. 3) The final filter, which overrides any other information provided by the system, does not allow commas to be inserted before the word because.

6.Error Correction

6.Error Correction (Error Analysis) 1) Here we can get specific knowledge in the science that we like the most. 2) I entered college, I could learn it and make an effort to achieve my goal. 3) They wants to see their portfolio, and what kind of skill do they have for company. 4) I have many things but the best is my parents. 5) erroneous: In the other hand, having just one specific subject, which represents a great downfall for many students. corrected: On the other hand, knowing only one subject is a downfall for many students.