On using context for automatic correction of non-word misspellings in student essays Michael Flor Yoko Futagi Educational Testing Service 2012 ACL.


1 On using context for automatic correction of non-word misspellings in student essays Michael Flor Yoko Futagi Educational Testing Service 2012 ACL

2 Outline
1. Introduction
2. Corpus
3. Annotation
4. Spelling correction systems
   - ConSpel system
5. Comparative evaluation
6. Discussion
7. Conclusions

3 1. Introduction Non-word misspellings are strings that are not words of the language, e.g., "Businees" (business), "inthe" (in the), "mor efun" (more fun).
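The three examples have different shapes: a letter-level typo ("Businees"), a missing space ("inthe") and a misplaced space ("mor efun"). The slides do not say how ConSpel handles spacing errors, so the following is only a minimal sketch, assuming a plain word set; TOY_DICTIONARY, split_merged and rejoin_split are hypothetical names, and a full dictionary would produce many more candidate splits.

```python
from __future__ import annotations

# Hypothetical toy dictionary; a real system would consult its full word lists.
TOY_DICTIONARY = {"business", "in", "the", "more", "fun"}

def split_merged(token: str, dictionary: set[str]) -> list[tuple[str, str]]:
    """Every way to cut a token into two dictionary words (missing space, e.g. 'inthe')."""
    return [
        (token[:i], token[i:])
        for i in range(1, len(token))
        if token[:i] in dictionary and token[i:] in dictionary
    ]

def rejoin_split(left: str, right: str, dictionary: set[str]) -> list[tuple[str, str]]:
    """Re-segment two adjacent tokens whose word boundary is misplaced (e.g. 'mor efun')."""
    return [pair for pair in split_merged(left + right, dictionary) if pair != (left, right)]

print(split_merged("inthe", TOY_DICTIONARY))        # [('in', 'the')]
print(rejoin_split("mor", "efun", TOY_DICTIONARY))  # [('more', 'fun')]
```

A plain letter-level typo such as "Businees" yields no split here; it needs edit-distance candidates instead.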

5 2. Corpus
The essays come from two high-stakes standardized tests:
- TOEFL
- GRE
The corpus includes 3,000 essays, totaling 963,428 words.

6 2. Corpus
                           TOEFL essays   GRE essays
ELL                        98.73%         57.86%
Native English speakers     1.27%         42.14%
(ELL = English language learners)

8 3. Annotation
Annotators were asked to identify all non-word misspellings. There were two annotators, both:
- native English speakers
- experienced in linguistic annotation

9 3. Annotation The annotators agreed on 82.6% of cases (Cohen’s kappa = 0.8, p < .001). All disagreements were resolved by a third annotator (adjudicator).
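Cohen's kappa corrects raw agreement for the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e), which is why it sits below the raw 82.6% figure. The slide does not say over which units agreement was counted, so this minimal sketch simply takes two equal-length label sequences; the function name and the toy "err"/"ok" labels are illustrative assumptions.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Toy labels: p_o = 0.75, p_e = 0.5, so kappa = 0.5.
print(cohens_kappa(["err", "err", "ok", "ok"], ["err", "ok", "ok", "ok"]))  # 0.5
```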

10 3. Annotation

11 The annotated corpus of 3,000 essays has the following statistics:
- Average essay length is 321 words (range: 28-798 words)
- 148 essays have no misspellings at all
- 2.24% of the words in the corpus are non-word misspellings
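The reported figures are mutually consistent: 963,428 words / 3,000 essays is about 321 words per essay. The sketch below shows how such summary statistics can be derived from per-essay counts; the (word count, misspelling count) input format and the corpus_stats name are assumptions for illustration, and the toy input is not data from the corpus.

```python
from __future__ import annotations

def corpus_stats(essays: list[tuple[int, int]]) -> dict[str, float]:
    """Summarize a corpus given (word_count, misspelling_count) for each essay."""
    lengths = [words for words, _ in essays]
    total_words = sum(lengths)
    total_errors = sum(errors for _, errors in essays)
    return {
        "essays": len(essays),
        "avg_length": total_words / len(essays),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "error_free_essays": sum(1 for _, errors in essays if errors == 0),
        "misspelling_rate": total_errors / total_words,  # share of words that are misspelled
    }

print(corpus_stats([(100, 2), (300, 0), (200, 7)]))
```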

13 4. Spelling correction systems (ConSpel system) The system focuses on non-word misspellings, both detecting and correcting them.

14 4. Spelling correction systems (ConSpel system)
By default, the system ignores:
- numbers
- dates
- web and email addresses
- mixed alphanumeric strings (e.g. ‘RV400’)
- capitalized words (e.g. ‘London’)
- all-uppercase words (e.g. ‘ROME’)
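The default skip rules can be approximated with a few regular expressions. This is a minimal sketch, not ConSpel's actual rules (the slides do not give them); IGNORE_PATTERNS and should_ignore are hypothetical names, and the naive capitalization test would also skip sentence-initial words, which a real system would likely treat differently.

```python
import re

# Hypothetical patterns approximating the categories on the slide.
IGNORE_PATTERNS = [
    re.compile(r"^\d+([.,]\d+)*$"),                       # numbers: 42, 3.14, 1,000
    re.compile(r"^\d{1,4}[/-]\d{1,2}[/-]\d{1,4}$"),       # dates: 12/31/2011, 2012-06-07
    re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE),  # web addresses
    re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),          # email addresses
    re.compile(r"^(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]+$"),  # mixed alphanumeric: RV400
]

def should_ignore(token: str) -> bool:
    """True if the spell checker should skip this token under the default rules."""
    if any(pattern.match(token) for pattern in IGNORE_PATTERNS):
        return True
    # All-uppercase (ROME) and capitalized (London) words are skipped too.
    return token.isupper() or token[:1].isupper()

for tok in ["RV400", "London", "ROME", "1,000", "www.ets.org", "businees"]:
    print(tok, should_ignore(tok))
```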

15 4. Spelling correction systems (ConSpel system)
ConSpel's spelling dictionaries include about 360,000 entries, covering:
- all inflectional variants (e.g. ‘love’, ‘loved’, ‘loves’, ‘loving’)
- international spelling variants (e.g. American and British English)
The core set includes 245,000 entries (modern English vocabulary).
Additional dictionaries include about 120,000 entries:
- international surnames and first names
- names of geographical places

16 4. Spelling correction systems (ConSpel system) Detection of misspellings: a string is flagged as misspelled if it is not found in the system dictionaries.
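Slides 15 and 16 together describe lookup-based detection: merge the core and additional dictionaries, then flag any token that is absent. A minimal sketch, assuming case-insensitive lookup and pre-tokenized input; build_lexicon, detect_misspellings and the toy word lists are hypothetical stand-ins for the real dictionaries.

```python
from __future__ import annotations
from typing import Callable, Iterable

def build_lexicon(*word_lists: Iterable[str]) -> set[str]:
    """Merge core and additional word lists into one case-insensitive lookup set."""
    lexicon: set[str] = set()
    for words in word_lists:
        lexicon.update(w.lower() for w in words)
    return lexicon

def detect_misspellings(
    tokens: list[str],
    lexicon: set[str],
    ignore: Callable[[str], bool] = lambda t: False,
) -> list[tuple[int, str]]:
    """Return (position, token) for every non-ignored token missing from the lexicon."""
    return [(i, t) for i, t in enumerate(tokens) if not ignore(t) and t.lower() not in lexicon]

# Hypothetical toy lists standing in for the core and additional dictionaries.
CORE = ["they", "love", "loved", "loves", "loving", "color", "colour", "the", "business"]
NAMES = ["london", "rome"]
LEXICON = build_lexicon(CORE, NAMES)

print(detect_misspellings(["They", "loved", "the", "businees"], LEXICON))  # [(3, 'businees')]
```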

17 4. Spelling correction systems (ConSpel system)
Correction of misspellings: the dictionaries are also the source of suggested corrections. Candidate suggestions are retrieved by edit distance, with a default threshold of 5.
Problem: this can easily yield hundreds of correction candidates.
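Candidate retrieval by edit distance can be sketched as below. The slide does not name the edit-distance variant, so plain Levenshtein distance is assumed here; a brute-force scan over the whole lexicon is shown for clarity only, and edit_distance_candidates is a hypothetical name.

```python
from __future__ import annotations

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def edit_distance_candidates(word: str, lexicon: set[str], max_distance: int = 5) -> list[tuple[int, str]]:
    """All lexicon entries within max_distance of word, closest first."""
    scored = ((levenshtein(word, entry), entry) for entry in lexicon)
    return sorted((d, entry) for d, entry in scored if d <= max_distance)

print(edit_distance_candidates("businees", {"business", "busy", "bus"}))
# [(1, 'business'), (5, 'bus'), (5, 'busy')]
```

Even in this three-word toy lexicon, a threshold of 5 admits loosely related short words such as "bus", which illustrates why a full dictionary can return hundreds of candidates.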

18 4. Spelling correction systems (ConSpel system)
Candidate suggestions are ranked using a set of algorithms:
- edit distance
- phonetic similarity
- word frequency
- local context
- context-sensitive
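The slide lists the ranking signals but not how they are combined. One plausible design is a weighted linear score, sketched here under that assumption; the Candidate fields, the single context_score standing in for the context algorithms, and the WEIGHTS values are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Candidate:
    word: str
    edit_distance: int     # from candidate generation
    phonetic_sim: float    # 0..1, assumed precomputed by some phonetic matcher
    log_freq: float        # assumed log frequency in a reference corpus
    context_score: float   # assumed n-gram support from the surrounding words

# Hypothetical weights; lower edit distance is better, so its weight is negative.
WEIGHTS = {"edit_distance": -1.0, "phonetic_sim": 2.0, "log_freq": 0.5, "context_score": 1.0}

def combined_score(c: Candidate) -> float:
    """Weighted sum of the individual ranking signals."""
    return (WEIGHTS["edit_distance"] * c.edit_distance
            + WEIGHTS["phonetic_sim"] * c.phonetic_sim
            + WEIGHTS["log_freq"] * c.log_freq
            + WEIGHTS["context_score"] * c.context_score)

def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Best-scoring candidate first."""
    return sorted(candidates, key=combined_score, reverse=True)
```

With equal edit distance, phonetic similarity and frequency, the context score alone decides the order, which is the kind of tie that contextual information is meant to break.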

20 5. Comparative evaluation All evaluations were performed in "full context" (rather than word by word).

21 5. Comparative evaluation Error Detection

22 5. Comparative evaluation Error Correction
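The slides do not show which metrics were used. A common way to score detection and correction against an adjudicated annotation is precision, recall and F1, sketched here under that assumption: detection compares flagged token positions, and correction compares (position, suggested word) pairs. The function name and toy sets are illustrative.

```python
from __future__ import annotations

def precision_recall_f1(system: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall and F1 of a system's output set against a gold set."""
    true_pos = len(system & gold)
    precision = true_pos / len(system) if system else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Detection: token positions flagged by the system vs. positions marked by annotators.
print(precision_recall_f1({3, 7, 12}, {3, 7, 9}))
# Correction: a pair counts only if the top suggestion matches the gold correction.
print(precision_recall_f1({(3, "business"), (7, "moor")}, {(3, "business"), (7, "more")}))  # (0.5, 0.5, 0.5)
```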

23 5. Comparative evaluation

24 Error Detection (native and non-native English speakers)

25 5. Comparative evaluation Error Correction (native and non-native English speakers)

26 5. Comparative evaluation

30 6. Discussion
Absence of grammatical errors. For example: "They received fresh air, interacte with other youth their age, solved problems..."
Ranked candidates:
Rank 1: interacts
Rank 2: interact
Rank 3: interacted
The intended word, "interacted" (parallel to "received" and "solved"), is ranked only third.

32 7. Conclusions Results with the ConSpel system demonstrate that using contextual information helps improve automatic correction of non-word misspellings, for both native and non-native speakers of English.
