The CoNLL-2013 Shared Task on Grammatical Error Correction
Hwee Tou Ng (1), Yuanbin Wu (1), Christian Hadiwinoto (1), Siew Mei Wu (2), Joel Tetreault (3)
(1) Department of Computer Science, NUS (2) Centre for English Language Communication, NUS (3) Nuance Communications, Inc.
CoNLL-2013 Shared Task

Introduction
Grammatical error correction is the shared task of the Seventeenth Conference on Computational Natural Language Learning in 2013 (CoNLL-2013). In this task, given an English essay written by a learner of English as a second language, the goal is to detect and correct the grammatical errors present in the essay, and return the corrected essay.

Task Definition
To illustrate, consider the following sentence:
– Although we have to admit some bad effect which is brought by the new technology, still the advantages of the new technologies cannot be simply discarded.
The noun number error effect needs to be corrected (effect → effects). This necessitates the correction of a subject-verb agreement error (is → are).

Task Definition (Cont’d)
The shared task focuses on five error types: article or determiner (ArtOrDet), noun number (Nn), preposition (Prep), verb form (Vform), and subject-verb agreement (SVA).

Training Data
The training data provided in our shared task is the NUCLE corpus, the NUS Corpus of Learner English (Dahlmeier et al., 2013). The essays were written in response to prompts and cover a wide range of topics, such as environmental pollution and health care. The grammatical errors in these essays have been hand-corrected by professional English instructors at NUS.

Error Annotation
An error annotation is also called a correction or an edit. To illustrate, consider the following sentence at the start of the first paragraph of an essay:
– From past to the present, many important innovations have surfaced.
There is an article/determiner error (past → the past) in this sentence.

Error Annotation (Cont’d)
The error annotations are saved as stand-off annotations, in SGML format.
– start_par (end_par) denotes the paragraph ID of the start (end) of the erroneous text span.
– start_off (end_off) denotes the character offset of the start (end) of the erroneous text span.
– The error tag is ArtOrDet, and the correction string is the past.
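Putting these fields together, the stand-off annotation for the example edit would look roughly as follows (a reconstruction from the fields described above; the offset values assume 0-indexed character positions of "past" in the example sentence):

    <MISTAKE start_par="0" start_off="5" end_par="0" end_off="9">
    <TYPE>ArtOrDet</TYPE>
    <CORRECTION>the past</CORRECTION>
    </MISTAKE>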

Preprocessed form of NUCLE
To help participating teams prepare for the shared task, a preprocessed form of the NUCLE essays is also provided. The preprocessing operations include:
1. Sentence segmentation and word tokenization using the NLTK toolkit (Bird et al., 2009)
2. Part-of-speech (POS) tagging, and constituency and dependency tree parsing using the Stanford parser (Klein and Manning, 2003; de Marneffe et al., 2006)
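An illustrative sketch of step 1 using NLTK (the essay text and variable names here are made up for illustration; this is not the shared task's actual pipeline, and step 2's Stanford parser is not shown):

    import nltk

    # Assumes the Punkt sentence models are installed:
    # nltk.download('punkt')

    essay = ("From past to the present, many important innovations "
             "have surfaced. Tracking system has brought many benefits.")

    # 1a. Sentence segmentation
    sentences = nltk.sent_tokenize(essay)

    # 1b. Word tokenization, one token list per sentence
    tokenized = [nltk.word_tokenize(s) for s in sentences]

    for tokens in tokenized:
        print(tokens)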

NUCLE Corpus (release 2.3 version)
[Corpus statistics table not captured in the transcript.]

Test Data
The test essays were written in response to two prompts. Essays written using the first prompt are present in the NUCLE training data, while the second prompt is a new prompt not used previously.

Evaluation Metric (1/3)
A grammatical error correction system is evaluated by how well its proposed corrections or edits match the gold-standard edits. To illustrate, consider the following tokenized sentence S written by an English learner:
– There is no a doubt, tracking system has brought many benefits in this information age.
The set of gold-standard edits of a human annotator is g = {a doubt → doubt, system → systems, has → have}.

Evaluation Metric (2/3)
Suppose the tokenized output sentence H of a grammatical error correction system given the above sentence is:
– There is no doubt, tracking system has brought many benefits in this information age.
That is, the set of system edits is e = {a doubt → doubt}. The performance of the grammatical error correction system is measured by how well the two sets g and e match, in the form of recall R, precision P, and F1 measure:
– R = 1/3, P = 1/1, F1 = 2RP/(R + P) = 1/2.
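A minimal sketch of this computation, representing each edit as a (source, correction) pair (a simplification; the real scorer must first extract the system edits from the raw system output, as described below):

    gold = {("a doubt", "doubt"), ("system", "systems"), ("has", "have")}
    system = {("a doubt", "doubt")}

    matched = gold & system                              # edits both proposed and correct
    recall = len(matched) / len(gold)                    # 1/3
    precision = len(matched) / len(system)               # 1/1
    f1 = 2 * recall * precision / (recall + precision)   # 1/2

    print(recall, precision, f1)                         # 0.333..., 1.0, 0.5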

Evaluation Metric (3/3)
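The formulas on this slide did not survive transcription; the standard definitions from the shared task paper, summing over all test sentences i with gold edit set g_i and system edit set e_i, are:

    R = Σi |g_i ∩ e_i| / Σi |g_i|
    P = Σi |g_i ∩ e_i| / Σi |e_i|
    F1 = 2 × R × P / (R + P)

For the single-sentence example above, these reduce to R = 1/3 and P = 1/1.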

Scorer
The key idea is that the set of system edits automatically computed and used in scoring should be the set that maximally matches the set of gold-standard edits specified by the annotator. The MaxMatch (M2) scorer (Dahlmeier and Ng, 2012b) uses an efficient algorithm to search for such a set of system edits using an edit lattice.
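A toy approximation of edit extraction using difflib (this is not the M2 lattice algorithm; it is a naive single alignment, shown to motivate why the lattice search is needed):

    import difflib

    def extract_edits(source_tokens, hyp_tokens):
        """Naive edit extraction from one token-level alignment."""
        matcher = difflib.SequenceMatcher(None, source_tokens, hyp_tokens)
        edits = set()
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op != "equal":
                edits.add((" ".join(source_tokens[i1:i2]),
                           " ".join(hyp_tokens[j1:j2])))
        return edits

    src = "There is no a doubt , tracking system has brought many benefits".split()
    hyp = "There is no doubt , tracking system has brought many benefits".split()
    print(extract_edits(src, hyp))   # {('a', '')}

Note that the naive alignment yields the edit (a → "") rather than the gold-standard edit (a doubt → doubt), so scoring against it would count the system's correct fix as a miss. The M2 scorer's lattice search avoids this by choosing, among all alignment-equivalent edit sets, the one that maximally matches the gold edits.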

Participating Teams
[List of the 17 participating teams not captured in the transcript.]

Approaches
[Summary table of the teams' approaches not captured in the transcript.]

Results (1/5)
In order to allow the participating teams to raise their disagreement with the original gold-standard annotations provided by the annotator, and not to understate the performance of the teams, we allow the teams to submit their proposed alternative answers.
– In all, five teams (NTHU, STEL, TOR, UIUC, UMC) submitted alternative answers.
– The F1 measure of every team improves when evaluated with alternative answers.

Results (2/5)
[Results table not captured in the transcript.]

Results (3/5)
In order to determine the error type of system edits, we first perform POS tagging on the submitted system output using the Stanford parser (Klein and Manning, 2003). We then assign an error type to a system edit based on the automatically determined POS tags.
– The verb form and subject-verb agreement error types are grouped together into one category, since it is difficult to automatically distinguish the two in a reliable way.
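A rough sketch of such an assignment (a hypothetical heuristic; the slide does not specify the actual classification rules, and the tag checks and word list below are illustrative only):

    ARTICLES = {"a", "an", "the"}

    def classify_edit(orig_tokens, corr_tokens, orig_tags, corr_tags):
        """Map one system edit to an error type using Penn Treebank POS tags.
        Hypothetical heuristic for illustration only."""
        words = {w.lower() for w in orig_tokens + corr_tokens}
        tags = set(orig_tags) | set(corr_tags)
        if words & ARTICLES or "DT" in tags:
            return "ArtOrDet"
        if "IN" in tags:
            return "Prep"
        if tags & {"NN", "NNS"}:
            return "Nn"
        if tags & {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}:
            return "Vform/SVA"   # the two types are merged, as noted above
        return "other"

    print(classify_edit(["effect"], ["effects"], ["NN"], ["NNS"]))  # Nn
    print(classify_edit(["is"], ["are"], ["VBZ"], ["VBP"]))         # Vform/SVA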

Results (4/5)
[Slide content not captured in the transcript.]

Results (5/5)
[Slide content not captured in the transcript.]

Conclusions
The CoNLL-2013 shared task saw the participation of 17 teams worldwide, who evaluated their grammatical error correction systems on a common test set, using a common evaluation metric and scorer. The five error types included in the shared task account for at least one-third to close to one-half of all errors in English learners’ essays.

Conclusions (Cont’d)
The best system in the shared task achieves an F1 score of 42% when scored with multiple acceptable answers.
– UIUC team (Rozovskaya et al., 2013)
There is still much room for improvement, both in the accuracy of grammatical error correction systems, and in the coverage of systems to deal with a more comprehensive set of error types.

THANKS
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task, pages 1–12, Sofia, Bulgaria, August 2013. © 2013 Association for Computational Linguistics