
1 The CoNLL-2014 Shared Task on Grammatical Error Correction
Rule-based/statistical parts of different teams
David Ling

2 Contents Rule-based/statistical parts of different teams Suggestions
Team CAMB, Team POST, Team UFC
Suggestions:
Different approaches for different errors (RB/LM/MT)
Annotated student scripts of HK students in XML
Problematic patterns from non-annotated scripts using language models

3 Grammar checking approaches
Rule-based: usually refers to handcrafted rules; low recall but high accuracy.
Statistical: corpus-driven, usually n-gram or parser features (a parser may give incorrect tags if the sentence is problematic).
Phrase-based statistical machine translation: higher recall, but lower accuracy.
Deep learning translation.
Recent common approach: different methods for different error types.

4 CoNLL-2014 approaches overview
Many hybrid systems with rule-based (RB) components. LM: language model; MT: machine translation.
Simple hand-crafted rules are used as a preliminary step.
Reason: some error types are more regular than others, e.g. subject-verb agreement (SVA); some are hard, e.g. wrong collocation (Wci), such as 'thick rain' instead of 'heavy rain'.

5 Team CAMB (University of Cambridge) (1st)
Pipeline: rule-based (RB) -> phrase-based statistical machine translation (SMT).
RB part: n-gram corpus-driven rules, derived from the Cambridge Learner Corpus (2003): 6M words, with errors and corrections annotated, non-public.
Extract unigrams, bigrams, and trigrams as errors which (1) are marked as incorrect for > 90% of the times they occur and (2) appear more than 5 times.
Many mistakes are still not detected, but precision has been found to be more important in terms of learning effect.
We do not have such a huge annotated corpus.
The SMT part is skipped in this presentation.
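The slide states only the selection criterion, not how it was implemented. Below is a minimal Python sketch of that criterion, assuming (as a hypothetical input format) that annotated occurrences are available as (ngram, is_error) pairs; the Cambridge Learner Corpus itself is not public.

```python
from collections import Counter

def extract_error_ngrams(annotated_ngrams, min_count=5, min_error_rate=0.9):
    """Select n-grams that occur more than min_count times and are marked as
    errors in more than min_error_rate of their occurrences."""
    total = Counter()
    errors = Counter()
    for ngram, is_error in annotated_ngrams:
        total[ngram] += 1
        if is_error:
            errors[ngram] += 1
    return {ng for ng, n in total.items()
            if n > min_count and errors[ng] / n > min_error_rate}

# Toy usage: ("a", "informations") occurs 6 times, always marked as an error.
occurrences = [(("a", "informations"), True)] * 6 + [(("the", "information"), False)] * 10
print(extract_error_ngrams(occurrences))  # {('a', 'informations')}
```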

6 Team POST (Pohang University of Science and Technology)(4th)
Using language model + hand-crafted rules (without SMT). Pipeline (quite similar to LanguageTool): hand-crafted rules -> n-gram frequency for replacement.
Language model: noun number errors (cat -> cats).
Hand-crafted rules: insertion (only articles: a, an, the); subject-verb agreement; prepositions.
N-gram frequency: deletion, replacement (raise -> rise).

7 Team POST (4th) - The hand-crafted rules
Only a few simple hand-crafted rules, using word and POS-tag patterns as the trigger condition.
VBZ (verb, 3rd person singular present): eats, jumps, believes, is, has
NN (singular noun)
VBG (verb, gerund/present participle): eating, jumping, believing
VBP (verb, non-3rd person singular present): eat, jump, believe, am
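The slide lists the POS tags involved but not the rules themselves. Below is a minimal sketch of one plausible word/POS-tag trigger, using NLTK's tagger purely as a stand-in (an assumption, not the team's implementation); it flags a singular noun directly followed by a non-3rd-person verb as a subject-verb-agreement candidate.

```python
# Requires the NLTK data packages "punkt" and "averaged_perceptron_tagger".
import nltk

def flag_sva_candidates(sentence):
    """Flag positions where a singular noun (NN) is immediately followed by a
    non-3rd-person-singular verb (VBP), a typical trigger condition for a
    subject-verb-agreement rule."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return [(w1, w2) for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
            if t1 == "NN" and t2 == "VBP"]

print(flag_sva_candidates("The dog eat the bone."))  # may print [('dog', 'eat')], depending on the tagger
```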

8 Team POST (4th) – ngram frequency for deletion, replacement
A list of candidate pairs (only 215 pairs), extracted from NUCLE (about 1,000 essays, from the conference; see the table).
Different candidate pairs have different window sizes, e.g. (1;1), (2;1), (1;3).
Use Google N-gram corpus frequencies to decide whether to replace.
Schematic example: (too -> to) with window (2;1). Sentence: 'I go too school by bus'. 'I go too school': 0 times vs 'I go to school': 5 times -> replace 'too' by 'to'.
The window size of a pair is the one with the highest accuracy in training (max: 5-gram).
However, I think the neural network approach from LanguageTool is better (faster, with less memory).
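A minimal sketch of the frequency comparison is below; the count table is a toy stand-in for the Google N-gram corpus, and prefer_replacement is a hypothetical helper, not the team's code.

```python
# Toy counts standing in for the Google N-gram corpus.
NGRAM_COUNTS = {
    ("I", "go", "too", "school"): 0,
    ("I", "go", "to", "school"): 5,
}

def prefer_replacement(tokens, idx, candidate, left=2, right=1):
    """Return True if replacing tokens[idx] by `candidate` yields a more
    frequent n-gram within the (left; right) window."""
    def window(mid):
        return tuple(tokens[idx - left:idx] + [mid] + tokens[idx + 1:idx + 1 + right])
    original = NGRAM_COUNTS.get(window(tokens[idx]), 0)
    replaced = NGRAM_COUNTS.get(window(candidate), 0)
    return replaced > original

tokens = ["I", "go", "too", "school", "by", "bus"]
print(prefer_replacement(tokens, 2, "to"))  # True: "I go to school" beats "I go too school"
```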

9 Team UFC (11th) - Université de Franche-Comté
Title: Grammatical Error Detection and Correction Using Tagger Disagreement.
Pipeline: statistical parsers -> handcrafted rules.
Statistical parsers: Stanford Parser (statistical probability parser); TreeTagger (decision-tree based).
Deals with 3 types of errors only: (1) subject-verb agreement (by Stanford Parser), (2) verb form (by Stanford Parser), (3) verb form or word form (by disagreement between Stanford Parser and TreeTagger).

10 Team UFC (11th) - Université de Franche-Comté
(1) Subject-verb agreement (Stanford Parser): the parser provides POS tags and word dependencies.
Example: 'Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas.' -> nsubjpass(submitted, Bills).
A singular verb (VBZ) is changed to plural if it is the root of the dependency tree and is referred to by a plural noun (NNS): They submits -> They submit.
(2) Verb form (Stanford Parser, hand-crafted rules): use the POS tag of the preceding token. Modal verb (MD) + singular verb (VBZ) -> change the VBZ to the bare infinitive form: David can eats. -> He can eat.
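A minimal sketch of rule (1) is below, using spaCy's dependency parser as a stand-in for the Stanford Parser (my substitution; the team used the Stanford tools).

```python
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def flag_sva(sentence):
    """Flag root verbs tagged VBZ (3rd person singular) whose nominal subject
    is tagged NNS (plural noun), mirroring the rule described on the slide."""
    doc = nlp(sentence)
    flags = []
    for token in doc:
        if token.dep_ in ("nsubj", "nsubjpass") and token.tag_ == "NNS":
            head = token.head
            if head.dep_ == "ROOT" and head.tag_ == "VBZ":
                flags.append((token.text, head.text))
    return flags

print(flag_sva("The dogs barks loudly."))  # may print [('dogs', 'barks')], depending on the parse
```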

11 Team UFC (11th) - Université de Franche-Comté
(3) By disagreement between Stanford Parser and TreeTagger.
Vform error: JJ (Stanford) & VB (TreeTagger) -> change to past participle: a develop country -> a developed country.
Wform error: JJ (Stanford) & adverb (TreeTagger) -> remove 'ly': lead to a completely failure -> lead to a complete failure.
(JJ: adjective; VB: verb)
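A minimal sketch of the disagreement check is below; the tag sequences are toy inputs (wiring up the actual Stanford Parser and TreeTagger is outside this sketch), so the example only illustrates the comparison logic.

```python
def flag_vform_by_disagreement(tokens, stanford_tags, treetagger_tags):
    """Flag tokens tagged JJ by the first tagger but VB by the second; the
    slide maps this disagreement to a missing past participle."""
    return [tok for tok, t1, t2 in zip(tokens, stanford_tags, treetagger_tags)
            if t1 == "JJ" and t2 == "VB"]

# Toy tags for "a develop country" (illustrative, not real tagger output).
print(flag_vform_by_disagreement(
    ["a", "develop", "country"],
    ["DT", "JJ", "NN"],   # Stanford-style tags (assumed)
    ["DT", "VB", "NN"],   # TreeTagger-style tags (assumed)
))  # ['develop']
```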

12 Conclusions Different methods for different error types
Errors with a regular pattern -> simple handcrafted rules: SVA (subject-verb agreement), Vform (verb form), Mec (spelling, punctuation), ArtOrDet, Nn (noun number).
Errors that depend on nearby words -> n-gram frequency / LanguageTool neural network: Prep (preposition), Vform (verb form), Wform (word form), Wci (wrong collocation).
Errors that depend on context -> machine translation: Um (unclear meaning), Rloc- (redundancy).

13 Suggested next steps We need: 1) An annotated corpus (HK students)
For performance evaluation. We suggest following the XML format of NUCLE (28 error types), plus a program for teachers to mark scripts and output them in that XML format. Statistical findings would also add value to the literature.
2) Extract error patterns from non-annotated scripts with low n-gram probability: an alternative, since we don't have a large annotated corpus.
For a 5-gram, possible language models are:
Bigram model: p(x1,x2,x3,x4,x5) = p(x1) p(x2|x1) p(x3|x2) ... p(x5|x4)
Neural network: p(x1,x2,x3,x4,x5) = f(x1,x2,x3,x4,x5), where the xi are trained word vectors and f is the neural network.
A toy sketch of the bigram scoring is shown below.
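A toy sketch of scoring a 5-gram under an (unsmoothed) bigram model estimated from counts; the counts here are made up for illustration.

```python
import math
from collections import Counter

def bigram_log_prob(tokens, unigram_counts, bigram_counts):
    """log p(x1..xn) = log p(x1) + sum_i log p(x_i | x_{i-1}) under a bigram
    model with maximum-likelihood estimates (no smoothing)."""
    total = sum(unigram_counts.values())
    logp = math.log(unigram_counts[tokens[0]] / total)
    for prev, cur in zip(tokens, tokens[1:]):
        logp += math.log(bigram_counts[(prev, cur)] / unigram_counts[prev])
    return logp

# Made-up counts, just to make the sketch runnable.
unigrams = Counter({"he": 10, "explains": 4, "the": 20, "idea": 3, "clearly": 2})
bigrams = Counter({("he", "explains"): 3, ("explains", "the"): 2,
                   ("the", "idea"): 3, ("idea", "clearly"): 1})
print(bigram_log_prob(["he", "explains", "the", "idea", "clearly"], unigrams, bigrams))
```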

14 Suggested next steps - Extract error patterns
Example sentence from an HSMC student: 'Hence , he explain lots of people upload a video to YouTube .'
N-grams with low frequencies are (1) problematic, (2) bad style (for a second-language learner), or (3) correct but uncommon usage.
3-gram frequencies from a Wikipedia corpus: 'Hence , he': 61, ', he explain': 1, 'he explain lots': 0, 'explain lots of': 1, 'lots of people': 2383, 'of people upload': 1, 'people upload a': 0, 'upload a video': 49, 'a video to': 138, 'video to YouTube': 401, 'to YouTube .': 105
2-gram frequencies from a Wikipedia corpus: 'Hence ,': 3044, ', he': , 'he explain': 10, 'explain lots': 0, 'lots of': 3700, 'of people': 21564, 'people upload': 1, 'upload a': 33, 'a video': 6769, 'video to': 744, 'to YouTube': 55, 'YouTube .': 338
We may try to group the extracted n-grams and make rules. Example: a noun clause is preferred after the word 'explain'.
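A minimal sketch of flagging low-frequency n-grams, with a toy dictionary standing in for the Wikipedia counts quoted above.

```python
def flag_rare_ngrams(tokens, counts, n=3, threshold=2):
    """Return the n-grams of `tokens` whose corpus count is below `threshold`."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return [(g, counts.get(g, 0)) for g in grams if counts.get(g, 0) < threshold]

WIKI_3GRAM = {  # toy counts echoing the slide's numbers
    ("Hence", ",", "he"): 61, (",", "he", "explain"): 1,
    ("he", "explain", "lots"): 0, ("explain", "lots", "of"): 1,
    ("lots", "of", "people"): 2383,
}
tokens = "Hence , he explain lots of people".split()
print(flag_rare_ngrams(tokens, WIKI_3GRAM))
# [((',', 'he', 'explain'), 1), (('he', 'explain', 'lots'), 0), (('explain', 'lots', 'of'), 1)]
```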

15 Suggested next steps 3) The rule-based part
Similar to LanguageTool: allow English teachers to write handcrafted rules in XML.
Also provide a function to search for false alarms while designing the handcrafted rules: assuming texts in Wikipedia are correct, search the corpus for any patterns matching the designed rule. This helps a lot in finding exceptions (actually LanguageTool has this feature online). A toy sketch follows below.
4) Language model and the machine translation part.
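A toy sketch of such a false-alarm search, using a regular expression as the rule trigger and a tiny "assumed-correct" corpus (both are illustrative assumptions, not LanguageTool's actual mechanism).

```python
import re

def find_false_alarms(rule_pattern, correct_sentences):
    """Return sentences from the assumed-correct corpus that the rule would flag;
    each hit is a potential false alarm or exception to add to the rule."""
    regex = re.compile(rule_pattern)
    return [s for s in correct_sentences if regex.search(s)]

# Toy rule: flag "explain" when it is not followed by a that-clause.
rule = r"\bexplain\b(?! that\b)"
corpus = ["He went on to explain that the plan had changed.",
          "It is hard to explain why the plan changed."]
print(find_false_alarms(rule, corpus))  # the second sentence is a (correct) exception the rule would flag
```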

16 End The CoNLL-2014 Shared Task on Grammatical Error Correction
Google Books N-gram Corpus used as a Grammar Checker (EACL 2012)
N-gram based Statistical Grammar Checker for Bangla and English
Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction (EMNLP 2016), pages 1546–1556

17 Google Books N-gram Corpus used as a Grammar Checker (EACL 2012)
Google Books N-gram Corpus (up to 5-grams; only n-grams occurring more than 40 times are recorded).
Uses n-gram frequency for 4 types of tasks. Error detection: flag an error if a 2-gram is not found in the database. Other tasks: word multiple choice, inflection, fill in the blanks.
N-gram error detection for French (non-native speakers); n-gram error detection performs well on lexical selection and prepositions.
(TP: true positive; FN: false negative)

