1 The Montclair Electronic Language Learner Database (MELD)
www.chss.montclair.edu/linguistics/MELD/
Eileen Fitzpatrick & Steve Seegmiller
Montclair State University
2 Non-native speaker (NNS) corpora
Begun in the early 1990s
Data
– written performance only
– essays of students of English as a foreign language
Corpus development (academic)
– in Europe: Louvain, Lodz, Uppsala
– in Asia: Tokyo Gakugei University, Hong Kong University of Science and Technology
Annotation
– Lodz: part of speech
– HKUST, Lodz: error tags
3 Gaps in NNS Corpus Creation
No NNS corpus in America, so no corpus of English as a Second Language (ESL)
No NNS corpus is publicly available
No NNS corpus annotates errors without a predetermined list of error types
4 MELD Goals
Initial goals
– collect ESL student writing
– tag the writing for error
– provide publicly available NNS data
The initial goals support
– second-language pedagogy
– language acquisition research
– tool building (grammar checkers, student editing aids, parallel texts from NS and NNS)
5 MELD Overview
Data
– 44,477 words of text annotated
– 53,826 more words of raw data
– language and education data for each student author
– upper-level ESL students
Tools written to
– link essays to student background data
– produce an error-free version from tagged text
– allow fast entry of background data
6 Annotation
Annotators "reconstruct" a grammatical form: {error/reconstruction}
– school systems {is/are}
– since children {0/are} usually inspired
– becoming {a/0} good citizens
Agreement between annotators is an issue
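The MELD overview above lists a tool that produces an error-free version from the tagged text. As an illustration of how the {error/reconstruction} notation can be processed, here is a minimal Python sketch assuming a simple regular-expression pass over the annotated text; it is not the actual MELD tool.

import re

# {error/reconstruction} tags; "0" marks a zero form:
# {0/are} = a missing word, {a/0} = a superfluous word.
TAG = re.compile(r"\{([^{}/]*)/([^{}/]*)\}")

def _pick(m, which):
    token = m.group(which)
    return "" if token == "0" else token

def learner_version(text):
    """Text as the student wrote it: keep the error half of each tag."""
    return re.sub(r"\s{2,}", " ", TAG.sub(lambda m: _pick(m, 1), text)).strip()

def reconstructed_version(text):
    """Error-free text: keep the reconstruction half of each tag."""
    return re.sub(r"\s{2,}", " ", TAG.sub(lambda m: _pick(m, 2), text)).strip()

print(reconstructed_version("school systems {is/are}"))        # school systems are
print(reconstructed_version("becoming {a/0} good citizens"))   # becoming good citizens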
7 Error Classification from a Predetermined List
Benefit
– annotators agree on what an error is: only those items in the classification scheme
Problems
– annotators have to learn a classification scheme
– the existence of a classification scheme means that annotators can misclassify
– errors not in the scheme will be missed
8 Error Identification & Reconstruction
Benefits
– speed in annotating, since there is no classification scheme to learn
– no chance of misclassifying
– less common errors will be captured
– a reconstructed text can be more easily parsed and tagged for part of speech
Question
– How well can we agree on what is an error?
9 Agreement Measures
Reliability: what percentage of the errors do both taggers tag?
  Reliability = |T1 ∩ T2| / ((|T1| + |T2|) / 2)
Precision: what percentage of the non-expert's (T2) tags are accurate?
  Precision = |T1 ∩ T2| / |T2|
Recall: what percentage of the true errors did the non-expert (T2) find?
  Recall = |T1 ∩ T2| / |T1|
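Reading T1 as the expert's set of tagged errors and T2 as the non-expert's, all three measures follow from the size of the overlap. A minimal sketch, assuming each tag is represented as an (essay, location) pair; this representation is an illustration, not MELD's internal format.

# Agreement measures between two taggers' sets of error tags.
def agreement(t1, t2):
    both = len(t1 & t2)  # errors tagged by both annotators
    return {
        "reliability": both / ((len(t1) + len(t2)) / 2),
        "precision":   both / len(t2),  # share of T2's tags confirmed by T1
        "recall":      both / len(t1),  # share of T1's errors found by T2
    }

expert     = {("essay1", 4), ("essay1", 9), ("essay2", 3), ("essay2", 7)}
non_expert = {("essay1", 4), ("essay2", 3), ("essay2", 12)}
print(agreement(expert, non_expert))
# {'reliability': 0.571..., 'precision': 0.666..., 'recall': 0.5}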
10 Agreement Measures
Expert vs. non-expert comparison:
– high precision
– low recall
– low reliability
11 Agreement Measures

Taggers   Essays   Recall   Precision   Reliability
J&L       1-10     .54      .58         .39
J&L       11-22    .57      .78         .49
J&N       1-10     .58      .48         .23
J&N       11-22    .37      .54         .27
L&N       1-10     .65      .70         .37
L&N       11-22    .60      .78         .36
12 Conclusions on Tagging Agreement
Unsatisfactory level of agreement as to what is an error
Disagreements resolved through regular meetings
There are now two types of tags: one for lexico-syntactic errors and one for stylistic errors
The tags are transparent to the user and can be deleted or ignored
13 The Future
Immediate
– Internet access to data and tools
– an error concordancer
– automatic part-of-speech and syntactic markup
– data from different ESL skill levels
Long range
– statistical tool to correlate error frequency with student background
– student editing aid
– grammar checker
– NNS speech data
14 Some Possible Applications
Preparation of instructional materials
Studies of progress over a semester
Research on error types by L1
Research on writing characteristics by L1
15 Writing Characteristics by L1

L1 Spanish — tense errors (word count: 2,305)
1 {would/will}   1 {went/go}      1 {stay/stayed}
1 {gave/give}    1 {cannot/could} 1 {can/could}
TOTAL: 6

L1 Gujarati — tense errors (word count: 2,500)
5 {was/is}               1 {passes/passed}
3 {were/are}             1 {love/loved}
2 {would/will}           1 {left/leave}
2 {is/was}               1 {kept/keeps}
2 {have/had}             1 {involved/involves}
2 {had/have}             1 {get/got}
1 {would start/started}  1 {do/did}
1 {will/0}               1 {can/could}
1 {will/were to}         1 {are/were}
1 {was/were}             1 {wanted/want}
1 {spend/spent}
TOTAL: 31
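Because the two samples differ in length, the raw totals are easier to compare as per-1,000-word rates. A small sketch of that normalization using the counts above; the per-1,000-word presentation is an illustrative choice, not a figure reported in the presentation.

# Tense-error rates per 1,000 words from the counts on this slide.
counts = {"Spanish": (6, 2305), "Gujarati": (31, 2500)}
for l1, (errors, words) in counts.items():
    print(f"{l1}: {errors / words * 1000:.1f} tense errors per 1,000 words")
# Spanish: 2.6 tense errors per 1,000 words
# Gujarati: 12.4 tense errors per 1,000 words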
16 Acknowledgments
Jacqueline Cassidy
Jennifer Higgins
Norma Pravec
Lenore Rosenbluth
Donna Samko
Jory Samkoff
Kae Shigeta