Published by Jewel Welch. Modified over 9 years ago.
1
Learner corpus analysis and error annotation
  Xiaofei Lu
  CALPER 2010 Summer Workshop
  July 13, 2010
2
Overview
  Analyzing raw corpora
  Error annotation
    Issues in corpus annotation
    Granger (2003)
3
Analyzing raw corpora
  Concordancing software
    GOLD
    AntConc
  Other software
    CLAN
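To make concrete what concordancing software does with a raw corpus, here is a minimal keyword-in-context (KWIC) sketch. It is not how GOLD, AntConc, or CLAN are implemented; the function name `kwic`, the `width` parameter, and the sample learner sentences are all assumptions made up for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    lines = []
    # Whole-word, case-insensitive match; the keyword is escaped so it is
    # treated as a literal string rather than as a regular expression.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented learner sentences, used only to exercise the function.
sample = (
    "She have many informations about the city. "
    "The informations was very useful for the students."
)

for line in kwic(sample, "informations", width=20):
    print(line)
```

Each output line shows one hit with a fixed window of characters on either side, which is the basic view a concordancer offers before any sorting or filtering is applied.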
4
Issues in corpus annotation
  Annotation scheme and format
  Annotation procedure
  Annotation quality
5
Annotation scheme and format
  What are the categories you are using?
    Linguistically consensual
    Overspecification vs. underspecification
    Use short, meaningful codes for your categories
  Annotation format considerations
    Compatible with annotation scheme
    Facilitates corpus query
6
Annotation procedure and quality
  Annotator training
    Scheme and format
    Problematic cases and disagreements
  Computer-assisted manual annotation
    Stanford annotation tool
    UAM Corpus Tool and NoteTab
  Inter-annotator agreement
    Cohen’s Kappa
    Online Kappa calculator
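Cohen's kappa corrects the observed agreement between two annotators for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the proportion of items both annotators labelled identically and p_e is the chance agreement derived from each annotator's label frequencies. The sketch below computes it for two annotators; the function name and the sample labels (G, L, F standing in for error domains) are illustrative assumptions, not data from the workshop.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: for each label, multiply the two annotators'
    # relative frequencies for that label, then sum over labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels assigned by two annotators to the same ten errors.
annotator_a = ["G", "G", "L", "F", "G", "L", "F", "G", "L", "G"]
annotator_b = ["G", "L", "L", "F", "G", "L", "G", "G", "L", "G"]

print(round(cohens_kappa(annotator_a, annotator_b), 3))
```

In this invented example the annotators agree on 8 of 10 items (p_o = 0.8) and chance agreement is p_e = 0.39, so kappa is 0.41 / 0.61, noticeably lower than the raw agreement rate.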
7
Granger (2003)
  Learner corpora
  Error annotation
  Error statistics and analysis
  Integration of results into CALL
  Conclusion
8
Learner corpora
  What is a learner corpus?
    Difference from traditional data in SLA
    Difference from native language data
      Frequencies
      Errors
  From error annotation to error detection
9
Computer-aided error annotation
  Dagneaux, Denness and Granger (1998)
    Manual correction of L2 French corpus
    Elaboration of an error tagging system
    Insertion of error tags and corrections
    Retrieval of lists of error types and statistics
    Concordance-based error analysis
  Tagging system
    Informative but manageable
    Reusable, flexible, consistent
10
Error tagging system
  Dulay, Burt & Krashen (1982)
    System based on linguistic categories (e.g., syntax)
    Surface structure alterations (e.g., omission)
  Granger’s (2003) three-dimensional taxonomy
    Error domain
    Error category
    Word category
11
Error tagging system (cont.)
  Error domain and category
    General level: grammatical, lexical, etc.
    Domains subdivided into error categories
    Table 1, page 468
  Word category
    A POS tagset with 11 major and 54 sub-categories
    Makes it possible to sort errors by POS categories
12
Error tagging system (cont.)
  Correct forms inserted next to erroneous forms
    Facilitates interpretation of error annotations
    Allows for automatic sorting on correct forms
  Tag insertion using a menu-driven editor
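The following sketch shows why keeping the correct form next to the erroneous form is convenient for automatic processing. The inline tag format used here (`<err dom="…" cat="…" pos="…" corr="…">…</err>`) is a hypothetical stand-in for the three dimensions of domain, error category, and word category; it is not the notation used in Granger (2003), and the codes and sample sentence are invented.

```python
import re
from typing import NamedTuple

class ErrorTag(NamedTuple):
    domain: str      # error domain, e.g. "G" for grammar (hypothetical code)
    category: str    # error category within the domain (hypothetical code)
    pos: str         # word category of the erroneous form
    correction: str  # correct form stored alongside the error
    erroneous: str   # the form the learner actually wrote

# Hypothetical inline format; the attribute order is fixed for simplicity.
TAG_RE = re.compile(
    r'<err dom="(?P<dom>[^"]+)" cat="(?P<cat>[^"]+)" pos="(?P<pos>[^"]+)" '
    r'corr="(?P<corr>[^"]*)">(?P<form>[^<]*)</err>'
)

def extract_errors(annotated):
    """Collect every tagged error in an annotated text, in document order."""
    return [
        ErrorTag(m["dom"], m["cat"], m["pos"], m["corr"], m["form"])
        for m in TAG_RE.finditer(annotated)
    ]

def strip_tags(annotated):
    """Recover the original learner text by keeping only erroneous forms."""
    return TAG_RE.sub(lambda m: m["form"], annotated)

annotated = (
    'She <err dom="G" cat="AGR" pos="VERB" corr="has">have</err> many '
    '<err dom="G" cat="NUM" pos="NOUN" corr="information">informations</err> '
    'about the <err dom="F" cat="SPE" pos="NOUN" corr="city">citty</err>.'
)

# Sorting on the correct form groups different misspellings or misuses
# of the same target item together.
for err in sorted(extract_errors(annotated), key=lambda e: e.correction):
    print(err.correction, "<-", err.erroneous, err.domain, err.category, err.pos)
```

Because each tag carries all three dimensions plus the correction, the same list can be re-sorted by domain, category, or word category without re-reading the corpus.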
13
Error statistics and analysis
  Error frequency by domain or (word) category
    Highest ranked domains: grammar and form
  Error trigrams
  Concordancers for searching error codes
    AntConc
    WordSmith Tools
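A rough sketch of the two statistics named on this slide follows. The slide does not spell out how an error trigram is built, so the code assumes one plausible reading: an error code together with the tokens immediately before and after it. The token list, the `DOMAIN-CATEGORY` code format, and the function names are likewise assumptions for illustration only.

```python
from collections import Counter

# Hypothetical pre-tokenized learner sentence: (word, error code or None).
# Codes follow an invented "DOMAIN-CATEGORY" pattern.
tokens = [
    ("She", None), ("have", "G-AGR"), ("many", None),
    ("informations", "G-NUM"), ("about", None), ("the", None),
    ("citty", "F-SPE"), (".", None),
]

def domain_frequencies(tokens):
    """Count errors per domain (the part of the code before the hyphen)."""
    return Counter(code.split("-")[0] for _, code in tokens if code)

def error_trigrams(tokens):
    """Count (previous word, error code, next word) triples.

    Assumed definition of an error trigram; sentence edges are padded
    with the placeholder symbols <s> and </s>.
    """
    trigrams = Counter()
    for i, (_, code) in enumerate(tokens):
        if code is None:
            continue
        prev_word = tokens[i - 1][0].lower() if i > 0 else "<s>"
        next_word = tokens[i + 1][0].lower() if i + 1 < len(tokens) else "</s>"
        trigrams[(prev_word, code, next_word)] += 1
    return trigrams

print(domain_frequencies(tokens).most_common())
for trigram, count in error_trigrams(tokens).most_common():
    print(trigram, count)
```

Over a full corpus, the domain counts give the ranking of error-prone areas, while recurring trigrams point to the contexts in which a given error type tends to occur.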
14
Integrating results into CALL
  Goal: a hypermedia CALL program
    Using NLP and communicative approaches to SLA
    Traditional and NLP-enabled exercises
    Automatic error diagnosis and feedback generation
  Error statistics and analysis used to
    Select linguistic areas to focus on
    Adapt exercises as a function of attested error types
    Adapt NLP tools for error diagnosis
15
Integrating results into CALL (cont.)
  Most error-prone linguistic areas
    Tense and mood, agreement
    Articles, complementation, prepositions
  Adapting exercises
    Exercises reflect type of error-prone context
    Formal errors through dictation and exercises targeting specific difficulties
    Attention to punctuation
16
Integrating results into CALL (cont.)
  Adapting NLP tools for error diagnosis
    Spell checker and parser
    Handles orthographic, grammatical, syntactic, and lexical errors
    Does not handle punctuation, semantic, or tense errors
17
Granger (2003) summary
  Effective 3-tier error annotation system
    Limited number of categories per tier
    Versatile automated data manipulation
  Limitations of error-tagging
    Element of subjectivity in annotation
    Focuses on misuse
  Usefulness of error-tagged learner corpus
    Error statistics help understand learner interlanguage
    Helps adapt pedagogical materials and programs
18
Activity
  Using the Stanford annotation tool
    Annotate a short text using your own scheme, or
    Annotate a short learner text using Granger’s (2003) scheme
  Query the annotated text using AntConc