Page 1 NAACL-HLT BEA-5 2010 Los Angeles, CA Annotating ESL Errors: Challenges and Rewards Alla Rozovskaya and Dan Roth University of Illinois at Urbana-Champaign.

Presentation transcript:

Page 1 NAACL-HLT BEA Los Angeles, CA
Annotating ESL Errors: Challenges and Rewards
Alla Rozovskaya and Dan Roth
University of Illinois at Urbana-Champaign

Page 2 Annotating a corpus of English as a Second Language (ESL) writing: Motivation
Many non-native English speakers
ESL learners make a variety of mistakes in grammar and usage
Conventional proofing tools do not detect many ESL mistakes – they target native English speakers and do not address many mistakes of ESL writers
We are not restricting ourselves to ESL mistakes

Page 3 Goals
Developing automated techniques for detecting and correcting context-sensitive mistakes
- Paving the way for better proofing tools for ESL writers
  E.g., providing instructional feedback
- Developing automated scoring techniques
  E.g., automated evaluation of student essays
Annotation is an important part of that process

Page 4 Annotating ESL errors: a hard problem
A sentence usually contains multiple errors
- In Western countries prisson conditions are more better than in Russia, and this fact helps to change criminals in better way of life.
Not always clear how to mark the type of a mistake
- “…which reflect a traditional female role and a traditional attitude to a woman…”
  “…which reflect a traditional female role and a traditional attitude towards women…”

Page 5 Annotating ESL errors: a hard problem
Distinction between acceptable/unacceptable usage is fuzzy
- Women were indignant at inequality from men.
  Women were indignant at the inequality from men.

Page 6 Common ESL mistakes
English as a Second Language (ESL) mistakes
- Mistakes involving prepositions
  We even do good to*/for other people */by spending money on this and asking */for nothing in return.
- Mistakes involving articles
  The main idea of their speeches is that a*/the romantic period of music was too short.
  Laziness is the engine of the*/ progress.
  Do you think anyone will help you? There are not many people who are willing to give their*/a hands*/hand.

Page 7 Purpose of the annotation
To have a gold standard set for the development and evaluation of an automated system that corrects ESL mistakes
There is currently no gold standard data set available for researchers
- Systems are evaluated on different data sets – performance comparison across different systems is hard
  Results depend on the source language of the speakers and proficiency level
- The annotation of this corpus is available and can be used by researchers who gain access to the ICLE and the CLEC corpora.
This corpus is used in the experiments described in [Rozovskaya and Roth, NAACL, ’10]

Page 8 Outline
Annotating ESL mistakes: Motivation
Annotation
- Data selection
- Annotation procedure
- Error classification
Annotation tool
Annotation statistics
Statistics on article corrections
Statistics on preposition corrections
Inter-annotator agreement

Page 9 Annotation: Overview
Annotated a corpus of ESL sentences (63K words)
Extracted from two corpora of ESL essays:
- International Corpus of Learner English (ICLE) [Granger et al.,’02]
- Chinese Learner English Corpus (CLEC) [Gui and Yang,’03]
Sentences written by ESL students of 9 first language backgrounds
Each sentence is fully corrected and error tagged
Annotated by native English speakers

Page 10 Annotation: focus of the annotation
Focus of the annotation: Mistakes in article and preposition usage
- These mistakes have been shown to be very common for learners of different first language backgrounds [Dagneaux et al, ’98; Gamon et al., ’08; Tetreault et al., ’08; others]

Page 11 Annotation: data selection
Sentences for annotation extracted from two corpora of ESL essays
- International Corpus of Learner English (ICLE)
  Essays by advanced learners of English
  First language backgrounds: Bulgarian, Czech, French, German, Italian, Polish, Russian, Spanish
- Chinese Learner English Corpus (CLEC)
  Essays by Chinese learners of different proficiency levels
Garbled sentences and sentences with near-native fluency excluded with a 4-gram language model
50% of sentences for annotation randomly sampled from the two corpora
50% of sentences selected manually to collect more preposition errors

Page 12 Annotation: procedure
Annotation performed by three native English speakers
- Graduate and undergraduate students in Linguistics/foreign languages
- With previous experience in natural language annotation
Annotation performed at the sentence level – all errors in the sentence are corrected and tagged
The annotators were encouraged to propose multiple alternative corrections
- Useful for the evaluation of an automated error correction system
  “They contribute money to the building of hospitals” (to: to/towards)
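One way alternative corrections can feed into evaluation is to count a system's choice as correct whenever it matches any correction an annotator accepted. The following is a minimal sketch of that idea, not the authors' evaluation code; the function names `is_acceptable` and `accuracy` and the representation of gold corrections as sets are assumptions made for illustration.

```python
def is_acceptable(system_choice, gold_alternatives):
    """True if the system's word matches any annotator-approved alternative."""
    # Case-insensitive comparison; gold_alternatives is a set of strings
    # (a hypothetical representation, not the corpus format).
    accepted = {alt.lower() for alt in gold_alternatives}
    return system_choice.lower() in accepted


def accuracy(predictions, gold):
    """Fraction of positions where the prediction is an accepted alternative."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("gold must not be empty")
    hits = sum(is_acceptable(p, g) for p, g in zip(predictions, gold))
    return hits / len(gold)


# The slide's example: both "to" and "towards" are acceptable in
# "They contribute money ___ the building of hospitals".
gold = [{"to", "towards"}]
print(accuracy(["towards"], gold))  # 1.0
print(accuracy(["for"], gold))      # 0.0
```

With a single gold correction per position, a system that proposes "towards" here would be penalized even though the annotators consider it fine; keeping the full set avoids that.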

Page 13 Annotation: error classification
Focus of the annotation: mistakes in article and preposition usage
Error classification (inspired by [Tetreault and Chodorow,’08])
- developed with the focus on article and preposition errors
  “…which reflect a traditional female role and a traditional attitude to a woman…”
  “…which reflect a traditional female role and a traditional attitude towards a*/ woman*/women…”
- was intended to give a general idea about the types of mistakes ESL students make

Page 14 Annotation: error classification
Error type | Example
Article error | Women were indignant at */the inequality from men.
Preposition error | …to change their views to*/for the better.
Noun number | Science is surviving by overcoming the mistakes not by uttering the truths*/truth.
Verb form | He write*/writes poetry.
Word form | It is not simply*/simple to make professional army.
Spelling | …if a person commited*/committed a crime…
Word replacement (lexical error) | There is a probability*/possibility that today’s fantasies will not be fantasies tomorrow.
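The examples in these slides mark each correction inline as source*/target, where an empty source means a missing word and an empty target means an extraneous one. The sketch below shows one way such marked-up sentences could be parsed; it is an illustration only, not the annotation tool's format handling, and the names `EDIT`, `parse_edits`, `edit_kind` and `apply_edits` are hypothetical. It assumes both sides of an edit are single words made of letters or apostrophes.

```python
import re

# Matches "source*/target"; either side may be empty:
#   "to*/for" -> replace "to" with "for"
#   "*/the"   -> insert "the"
#   "the*/"   -> delete "the"
EDIT = re.compile(r"([A-Za-z']*)\*/([A-Za-z']*)")


def parse_edits(marked):
    """Return the (source, target) pairs found in a marked-up sentence."""
    return [(m.group(1), m.group(2)) for m in EDIT.finditer(marked)]


def edit_kind(source, target):
    """Classify an edit as an insertion, deletion, or replacement."""
    if not source:
        return "insertion"
    if not target:
        return "deletion"
    return "replacement"


def apply_edits(marked):
    """Return the corrected sentence with every marked edit applied."""
    corrected = EDIT.sub(lambda m: m.group(2), marked)
    # A deletion leaves two adjacent spaces behind; collapse them.
    return re.sub(r"\s+", " ", corrected).strip()


sentence = "Laziness is the engine of the*/ progress."
print(parse_edits(sentence))  # [('the', '')]
print(apply_edits(sentence))  # Laziness is the engine of progress.
```

Multi-word or hyphenated corrections would need a richer pattern than the single-word one assumed here.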

Page 15 Outline
Annotating ESL mistakes: Motivation
Annotation
- Data selection
- Annotation procedure
- Error classification
The annotation tool
Annotation statistics
Statistics on article corrections
Statistics on preposition corrections
Inter-annotator agreement

Page 16 The annotated ESL corpus
Annotating ESL sentences with an annotation tool
Sentence for annotation
Flexible infrastructure allows for an easy adaptation to a different domain

Page 17 Example of an annotated sentence
Before annotation: “This time asks for looking at things with our eyes opened.”
With annotation comments: “This age, asks $us$ for looking *look* at things with our eyes opened.”
After annotation: “This period asks us to look at things with our eyes opened.”
Annotation rate: sentences per hour

Page 18 Outline
Annotating ESL mistakes: Motivation
Annotation
- Data selection
- Annotation procedure
- Error classification
Annotation tool
Annotation statistics
Statistics on article corrections
Statistics on preposition corrections
Inter-annotator agreement

Page 19 Annotation statistics

Page 20 Common article and preposition mistakes
Article mistakes
- Missing articles
  But this, as such, is already */a new subject for discussion.
- Extraneous articles
  Laziness is the engine of the*/ progress.
Preposition mistakes
- Confusing different prepositions
  Education gives a person a better appreciation of*/for such fields as art, literature, history, human relations, and science

Page 21 Statistics on article corrections
Source language | Errors total | Errors per hundred words
Bulgarian | 76 | 1.2
Chinese | |
Czech | |
French | 22 | 0.4
German | 23 | 0.5
Italian | 43 | 0.6
Polish | 71 | 1.5
Russian | |
Spanish | |
All | 957 | 1.5
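The "errors per hundred words" column is a simple normalized rate. As a rough consistency check, the sketch below recomputes the overall figure from the 957 article errors in the "All" row and the corpus size of about 63K words given on the overview slide. It assumes the "All" rate is taken over the whole annotated corpus and that 63K is a rounded count; the function name is hypothetical.

```python
def errors_per_hundred_words(num_errors, num_words):
    """Error rate normalized to a 100-word window."""
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    return 100.0 * num_errors / num_words


# 957 article errors over roughly 63,000 words (rounded corpus size).
rate = errors_per_hundred_words(957, 63_000)
print(round(rate, 1))  # 1.5
```

The result agrees with the 1.5 reported for all source languages combined. Per-language rates cannot be recomputed this way because the slides do not give per-language word counts.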

Page 22 Distribution of article errors by error type
Not all confusions are equally likely
Errors are dependent on the first language of the writer

Page 23 Statistics on preposition corrections
Unlike with articles, preposition confusions account for over 50% of all preposition errors
Many contexts license multiple prepositions [Tetreault and Chodorow, ’08]

Page 24 Inter-annotator agreement

Page 25 Inter-annotator agreement

Page 26 Conclusions
We presented the annotation of a corpus of ESL sentences
Annotating ESL mistakes is an important but challenging task
- Interacting mistakes in a sentence
- Fuzzy distinction between acceptable/unacceptable usage
We have described an annotation tool that facilitates the error-tagging of a corpus of text
The inter-annotator agreement on the task is low, which shows that this is a difficult problem
The annotated data can be used by other researchers for the evaluation of their systems

Page 27
- Annotation tool
- ESL annotation
Thank you! Questions?