The Montclair Electronic Language Learner Database (MELD) www.chss.montclair.edu/linguistics/MELD/ Eileen Fitzpatrick & Steve Seegmiller Montclair State.

The Montclair Electronic Language Learner Database (MELD) Eileen Fitzpatrick & Steve Seegmiller Montclair State University

2 Non-native speaker (NNS) corpora
Begun in early 1990’s
Data
–written performance only
–essays of students of English as a foreign language
Corpus development (academic)
–in Europe: Louvain, Lodz, Uppsala
–in Asia: Tokyo Gakugei University, Hong Kong Univ of Science and Technology
Annotation
–Lodz: part of speech
–HKUST, Lodz: error tags

3 Gaps in NNS Corpus Creation
No NNS Corpus in America, so no corpus of English as a Second Language (ESL)
No NNS corpus is publicly available
No NNS corpus annotates errors without a predetermined list of error types

4 MELD Goals
Initial Goals
–Collect ESL student writing
–Tag writing for error
–Provide publicly available NNS data
Initial Goals support
–2nd language pedagogy
–Language acquisition research
–tool building (grammar checkers, student editing aids, parallel texts from NS and NNS)

5 MELD Overview
Data
–44,477 words of text annotated
–53,826 more words of raw data
–language, education data for each student author
–upper level ESL students
Tools written to
–link essays to student background data
–produce an error-free version from tagged text
–allow fast entry of background data

6 Annotation
Annotators “reconstruct” a grammatical form
{error/reconstruction}
school systems {is/are}
since children {0/are} usually inspired
becoming {a/0} good citizens
Agreement between annotators is an issue
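The {error/reconstruction} notation lends itself to simple processing. As a minimal sketch (not MELD's actual tools), the Python below extracts the tagged pairs and derives a reconstructed version of a sentence. The regular expression, the function names, and the reading of `0` as "no word on this side" are assumptions inferred from the three examples on this slide.

```python
import re

# Assumed syntax, inferred from the slide examples: {error/reconstruction},
# with "0" standing for an empty side (a missing or superfluous word).
TAG = re.compile(r"\{([^{}/]*)/([^{}/]*)\}")


def extract_tags(text):
    """Return the (error, reconstruction) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in TAG.finditer(text)]


def reconstruct(text):
    """Replace every tag with its reconstruction and normalize spacing."""
    def pick(match):
        fix = match.group(2)
        return "" if fix == "0" else fix

    return re.sub(r"\s+", " ", TAG.sub(pick, text)).strip()


if __name__ == "__main__":
    for sample in (
        "school systems {is/are}",
        "since children {0/are} usually inspired",
        "becoming {a/0} good citizens",
    ):
        print(extract_tags(sample), "->", reconstruct(sample))
```

Keeping the error and its reconstruction in one inline tag means both the learner's original text and a corrected text can be derived from the same file.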

7 Error Classification from a Predetermined List
Benefit
–annotators agree on what an error is: only those items in the classification scheme
Problems
–annotators have to learn a classification scheme
–the existence of a classification scheme means that the annotators can misclassify
–errors not in the scheme will be missed

8 Error Identification & Reconstruction
Benefits
–speed in annotating since there is no classification scheme to learn
–no chance of misclassifying
–less common errors will be captured
–a reconstructed text can be more easily parsed and tagged for part of speech
Question
–How well can we agree on what is an error?

9 Agreement Measures
Reliability: What percentage of the errors do both taggers tag?
$\text{Reliability} = \dfrac{|T_1 \cap T_2|}{(|T_1| + |T_2|)/2}$
Precision: What percentage of the non-expert’s (T2) tags are accurate?
$\text{Precision} = \dfrac{|T_1 \cap T_2|}{|T_2|}$
Recall: What percent of true errors did the non-expert (T2) find?
$\text{Recall} = \dfrac{|T_1 \cap T_2|}{|T_1|}$
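The three measures can be computed directly from two annotators' sets of tags. The sketch below is illustrative only: it assumes each tag can be reduced to a hashable identifier (for example a token position), which the slides do not specify, and the function name and sample positions are invented.

```python
def agreement(expert_tags, nonexpert_tags):
    """Compute reliability, precision, and recall for two sets of error tags.

    T1 is the expert's tag set and T2 the non-expert's, as on the slide.
    Each tag is assumed to be a hashable identifier, e.g. a token position.
    """
    t1, t2 = set(expert_tags), set(nonexpert_tags)
    shared = len(t1 & t2)
    mean_size = (len(t1) + len(t2)) / 2
    return {
        "reliability": shared / mean_size if mean_size else 0.0,
        "precision": shared / len(t2) if t2 else 0.0,
        "recall": shared / len(t1) if t1 else 0.0,
    }


if __name__ == "__main__":
    # Invented token positions, for illustration only.
    expert = {3, 7, 12, 20}
    nonexpert = {7, 12, 31}
    for name, value in agreement(expert, nonexpert).items():
        print(f"{name}: {value:.2f}")
```

Empty tag sets return 0.0 rather than raising a division error; that convention is a choice made for this sketch, not something the slides define.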

10 Agreement Measures
[Figure: Expert vs. Non-expert tags; labels: High precision, Low recall, Low reliability]

11 Agreement Measures
[Table: Recall, Precision, and Reliability per essay for annotator pairs J&L, J&N, and L&N]

12 Conclusions on Tagging Agreement
Unsatisfactory level of agreement as to what is an error
Disagreements resolved through regular meetings
There are now 2 types of tags: one for lexico-syntactic errors and one for stylistic
The tags are transparent to the user and can be deleted or ignored

13 The Future
Immediate
–Internet access to data and tools
–an error concordancer
–automatic part of speech and syntactic markup
–data from different ESL skill levels
Long Range
–statistical tool to correlate error frequency with student background
–student editing aid
–grammar checker
–NNS speech data

14 Some Possible Applications
Preparation of instructional materials
Studies of progress over a semester
Research on error types by L1
Research on writing characteristics by L1

15 Writing Characteristics by L1
L1 Spanish tense
1 {would/will}
1 {went/go}
1 {stay/stayed}
1 {gave/give}
1 {cannot/could}
1 {can/could}
TOTAL: 6
Word Ct: 2305
L1 Gujarati tense
5 {was/is}
1 {passes/passed}
3 {were/are}
1 {love/loved}
2 {would/will}
1 {left/leave}
2 {is/was}
1 {kept/keeps}
2 {have/had}
1 {involved/involves}
2 {had/have}
1 {get/got}
1 {would start/started}
1 {do/did}
1 {will/0}
1 {can/could}
1 {will/were to}
1 {are/were}
1 {was/were}
1 {wanted/want}
1 {spend/spent}
TOTAL: 31
Word Ct: 2500
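Because the two samples differ in length, the raw totals are easier to compare once normalized by word count. The sketch below uses only the totals and word counts reported on this slide; normalizing to errors per 1,000 words and the function name are assumptions made here, not something the slides state.

```python
def errors_per_thousand(error_count, word_count):
    """Normalize an error count to errors per 1,000 words."""
    return 1000 * error_count / word_count


# Tense-error totals and word counts as reported on the slide.
samples = {
    "Spanish": (6, 2305),
    "Gujarati": (31, 2500),
}

if __name__ == "__main__":
    for l1, (errors, words) in samples.items():
        rate = errors_per_thousand(errors, words)
        print(f"L1 {l1}: {rate:.1f} tense errors per 1,000 words")
```

On these figures the Gujarati sample comes to roughly 12.4 tense errors per 1,000 words, against roughly 2.6 for the Spanish sample. With samples this small, the comparison is indicative only.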

16 Acknowledgments
Jacqueline Cassidy
Jennifer Higgins
Norma Pravec
Lenore Rosenbluth
Donna Samko
Jory Samkoff
Kae Shigeta