On using context for automatic correction of non-word misspellings in student essays
Michael Flor, Yoko Futagi
Educational Testing Service
ACL 2012
Outline
1. Introduction
2. Corpus
3. Annotation
4. Spelling correction systems (ConSpel system)
5. Comparative evaluation
6. Discussion
7. Conclusions
1. Introduction
Non-word misspellings are strings that are not valid words, e.g., ‘Businees’ (business), ‘inthe’ (in the), ‘mor efun’ (more fun).
2. Corpus
High-stakes standardized tests:
- TOEFL
- GRE
The corpus includes 3,000 essays, for a total of 963,428 words.
2. Corpus
                   TOEFL essays   GRE essays
ELL                98.73%         57.86%
English speakers   1.27%          42.14%
3. Annotation
Annotators were asked to identify all non-word misspellings.
Two annotators:
- native English speakers
- experienced in linguistic annotation
3. Annotation
Annotators agreed in 82.6% of the cases (Cohen’s kappa = 0.8, p < .001).
All disagreements were resolved by a third annotator (adjudicator).
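To show how a raw agreement rate and Cohen’s kappa relate, here is a minimal sketch. It assumes, purely for illustration, that agreement is measured over per-token binary labels (misspelling or not); the slides do not say which unit was used. The labels are invented and are not the study’s data, and the function name `cohens_kappa` is hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Return (observed agreement, Cohen's kappa) for two equal-length label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return observed, 1.0  # both annotators used a single identical label
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Toy labels (1 = marked as misspelling, 0 = not marked); NOT the study's data.
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
observed, kappa = cohens_kappa(annotator_1, annotator_2)
```

Kappa is lower than raw agreement whenever some agreement would be expected by chance alone, which is why the two figures are reported together.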
The annotated corpus of 3,000 essays has the following statistics:
- Average essay length is 321 words (the range is 28-798 words)
- 148 essays turned out to have no misspellings at all
- 2.24% of the words in the corpus are non-word misspellings
4. Spelling correction systems (ConSpel system)
The ConSpel system focuses on detecting and correcting non-word misspellings.
4. Spelling correction systems (ConSpel system)
By default, the system will ignore:
- numbers
- dates
- web and email addresses
- mixed alpha-numeric strings (e.g. ‘RV400’)
- capitalized words (e.g. ‘London’)
- all-uppercase words (e.g. ‘ROME’)
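The default skip rules above could be approximated with a small token filter. This is only a sketch: the regular expressions and the function name `should_skip` are assumptions made for illustration, not ConSpel’s actual rules, and the date pattern covers only simple numeric dates.

```python
import re

# Hypothetical patterns approximating the skip rules listed on the slide.
_NUMBER = re.compile(r"^[+-]?\d+(?:[.,]\d+)*$")
_DATE = re.compile(r"^\d{1,4}[/-]\d{1,2}[/-]\d{1,4}$")
_WEB = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)
_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def should_skip(token):
    """Return True if the token falls into one of the ignored categories."""
    if _NUMBER.match(token) or _DATE.match(token):
        return True
    if _WEB.match(token) or _EMAIL.match(token):
        return True
    has_alpha = any(ch.isalpha() for ch in token)
    has_digit = any(ch.isdigit() for ch in token)
    if has_alpha and has_digit:   # mixed alpha-numeric, e.g. 'RV400'
        return True
    if token[:1].isupper():       # capitalized ('London') or all uppercase ('ROME')
        return True
    return False

skipped = [t for t in ["RV400", "London", "ROME", "2012", "inthe"] if should_skip(t)]
```

Note that this sketch would also skip a capitalized misspelling at the start of a sentence; it makes no attempt to handle that case.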
4. Spelling correction systems (ConSpel system)
ConSpel spelling dictionaries include about 360,000 entries:
- all inflectional variants (e.g. ‘love’, ‘loved’, ‘loves’, ‘loving’)
- international spelling variants (e.g. American and British English)
The core set includes 245,000 entries (modern English vocabulary).
Additional dictionaries include about 120,000 entries:
- international surnames and first names
- names of geographical places
4. Spelling correction systems (ConSpel system)
Detection of Misspellings
A string is flagged as a misspelling when it is not in the system dictionaries.
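Dictionary-based detection amounts to a set-membership test. The sketch below assumes a pre-tokenized input and case-insensitive lookup, neither of which is stated on the slide; `find_non_words` and the toy dictionary are hypothetical stand-ins.

```python
def find_non_words(tokens, dictionary):
    """Return (index, token) pairs for tokens that are absent from the dictionary."""
    flagged = []
    for i, tok in enumerate(tokens):
        # Assumption: lookup is case-insensitive.
        if tok.lower() not in dictionary:
            flagged.append((i, tok))
    return flagged

# Tiny stand-in dictionary; the real dictionaries hold about 360,000 entries.
toy_dictionary = {"they", "received", "fresh", "air", "solved", "problems"}
tokens = ["They", "received", "fresh", "air", "interacte", "solved", "problems"]
flagged = find_non_words(tokens, toy_dictionary)
```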
4. Spelling correction systems (ConSpel system)
Correction of Misspellings
Dictionaries are also the source of suggested corrections.
Candidate suggestions: dictionary entries within a given edit distance of the misspelling (default threshold: 5).
Problem: this can easily produce hundreds of correction candidates.
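Candidate generation by edit distance can be sketched as follows. The slide does not say which edit-distance variant ConSpel uses; this example assumes plain Levenshtein distance and a brute-force scan of a toy dictionary. The names `edit_distance` and `candidates` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca from a
                            curr[j - 1] + 1,      # insert cb into a
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]

def candidates(misspelling, dictionary, max_distance=5):
    """Return (distance, word) pairs within max_distance, closest first."""
    scored = [(edit_distance(misspelling, w), w) for w in dictionary]
    return sorted((d, w) for d, w in scored if d <= max_distance)

# Toy dictionary for illustration only.
toy_dictionary = {"interact", "interacts", "interacted", "internet", "air"}
result = candidates("interacte", toy_dictionary)
```

With a threshold as generous as 5, most short and medium-length dictionary words qualify for a typical misspelling, which is consistent with the problem noted above and motivates a ranking step.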
4. Spelling correction systems (ConSpel system)
Candidate suggestions are ranked using a set of algorithms:
- edit distance
- phonetic similarity
- word frequency
- local context
- context-sensitive
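One simple way to combine several ranking signals is a weighted sum. The slides do not describe how ConSpel combines its algorithms, so everything below is an assumption made for illustration: the three signals used (edit distance, word frequency, and a following-word bigram count as a stand-in for local context), the scoring formulas, the weights, and the counts. Phonetic similarity is left out of the sketch.

```python
import math

def rank_candidates(cands, next_word, unigram_counts, bigram_counts,
                    weights=(1.0, 0.5, 1.0)):
    """Rank (distance, word) candidates by a weighted sum of three toy signals."""
    w_edit, w_freq, w_ctx = weights
    ranked = []
    for distance, word in cands:
        edit_score = 1.0 / (1.0 + distance)                   # closer spelling scores higher
        freq_score = math.log1p(unigram_counts.get(word, 0))  # frequent words score higher
        ctx_score = math.log1p(bigram_counts.get((word, next_word), 0))  # fit with next word
        total = w_edit * edit_score + w_freq * freq_score + w_ctx * ctx_score
        ranked.append((total, word))
    # Highest score first; ties are broken alphabetically.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in ranked]

# Invented counts, for illustration only.
cands = [(1, "interact"), (1, "interacted"), (1, "interacts")]
unigram_counts = {"interact": 50, "interacted": 20, "interacts": 10}
bigram_counts = {("interact", "with"): 30, ("interacted", "with"): 15,
                 ("interacts", "with"): 5}
order = rank_candidates(cands, "with", unigram_counts, bigram_counts)
```

In this toy setup the ordering is driven by the frequency and bigram counts, since all three candidates share the same edit distance.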
5. Comparative evaluation
All evaluations were performed in “full context” (rather than word-by-word).
[Results slides: Error Detection; Error Correction; Error Detection (native and non-native English speakers); Error Correction (native and non-native English speakers)]
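6. Discussion
Absence of grammatical errors. For example: “They received fresh air, interacte with other youth their age, solved problems...”.
Ranked candidates:
- Rank 1: interacts
- Rank 2: interact
- Rank 3: interacted
The neighboring past-tense verbs (‘received’, ‘solved’) suggest that ‘interacted’ is the intended word, yet it is ranked only third.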
7. Conclusions
Results with the ConSpel system demonstrate that utilizing contextual information helps improve automatic correction of non-word misspellings, for both native and non-native speakers of English.