1 The CoNLL-2014 Shared Task on Grammatical Error Correction
Rule-based/statistical parts of different teams and our next steps
David Ling

2 Contents
Rule-based/statistical parts of different teams:
- Team CAMB, Team POST, Team UFC
- Different approaches for different error types (RB/LM/MT)
To do:
- Annotated student scripts of HK students in XML
- Extract problematic patterns from non-annotated scripts using language models (LM)
- Develop a rule-based system
- N-gram / machine translation system

3 Grammar checking approaches
- Rule-based: usually refers to handcrafted rules. Low recall but high precision.
- Statistical: corpus-driven rules, usually n-gram or parser features (a parser may give incorrect tags if the sentence itself is problematic).
- Phrase-based statistical machine translation: higher recall, but lower precision.
- Deep learning: machine translation.
Recent common approach: different methods for different error types.

4 CoNLL-2014 approaches overview
Many hybrid systems with rule-based (RB) parts (LM: language model; MT: machine translation).
Simple hand-crafted rules are used as a preliminary step. Reasons:
- Some error types are more regular than others: subject-verb agreement (SVA)
- Some are hard: wrong collocation, e.g. "heavy rain" written as "thick rain"

5 Team CAMB (University of Cambridge) (1st)
Rule-based + machine translation. Pipeline: input text → rule-based (RB) → phrase-based statistical machine translation (SMT) → output text.
N-gram corpus-driven rules:
- Cambridge Learner Corpus: 6M words, with errors and corrections annotated; non-public
- Extract unigrams, bigrams, and trigrams as errors which (1) are marked as incorrect in more than 90% of their occurrences and (2) appear more than 5 times, as sketched below
- E.g. "thick rain" is marked as problematic every time it occurs in the corpus
- Low recall, but high precision
We do not have such a large annotated corpus, so an alternative is proposed later. The SMT part is skipped in this presentation.
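A minimal sketch of this style of filter, assuming a precomputed table mapping each n-gram to (times flagged as an error, total occurrences) in an annotated corpus; the table and thresholds here are illustrative, not CAMB's actual data:

```python
# Select n-grams that behave like reliable error patterns: flagged as
# incorrect in over 90% of occurrences and seen more than 5 times.
# `stats` is a hypothetical table built from an annotated corpus.

def error_ngrams(stats, min_count=5, min_error_rate=0.9):
    rules = []
    for ngram, (error_count, total_count) in stats.items():
        if total_count > min_count and error_count / total_count > min_error_rate:
            rules.append(ngram)
    return rules

stats = {
    "thick rain": (12, 12),   # always marked incorrect -> becomes a rule
    "in the end": (3, 950),   # mostly correct -> ignored
}
print(error_ngrams(stats))    # ['thick rain']
```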

6 Team POST (Pohang University of Science and Technology) (4th)
Language model + hand-crafted rules (without SMT); only a few simple hand-crafted rules.
Pipeline over the input text (flattened from the slide diagram):
- Hand-crafted rules (RB): insertion (only articles: a, an, the)
- Hand-crafted rules (RB): subject-verb agreement, prepositions
- N-gram frequency (LM): deletion, replacement (raise → rise)
- Language model (LM): noun number errors (cat → cats)
Subject-verb agreement rule: check whether a singular noun, 'this', 'it', 'one', or a gerund appears in front of a singular verb (within the 5 tokens ahead of it); see the sketch below.
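A rough sketch of the spirit of that rule, using NLTK's off-the-shelf tagger in place of whatever tagger POST actually used; as the earlier slide notes, taggers can mislabel ungrammatical input, so the output should be treated as candidates only:

```python
# Needs: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
import nltk

SINGULAR_SUBJECTS = {"this", "it", "one"}

def flag_sva(sentence):
    """Flag a singular present verb (VBZ) unless a singular noun (NN),
    'this', 'it', 'one', or a gerund (VBG) occurs within the 5
    preceding tokens."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    flags = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "VBZ":
            continue
        window = tagged[max(0, i - 5):i]
        ok = any(t in ("NN", "VBG") or w.lower() in SINGULAR_SUBJECTS
                 for w, t in window)
        if not ok:
            flags.append(word)
    return flags

print(flag_sva("The students walks to school."))  # likely flags 'walks'
```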

7 Team POST (4th): n-gram frequency for deletion and replacement
- A list of candidate pairs with different windows, extracted from NUCLE (the NUS Corpus of Learner English)
- Use Google N-gram corpus frequencies to decide whether to replace
Schematic example: the pair (too → to) with window (2, 1), i.e. two tokens left and one right, on the sentence "I go too school by bus":
- "I go too school": 0 occurrences vs. "I go to school": 5 occurrences → replace "too" with "to"
- The window size for each pair is the one with the highest accuracy in training
(However, I think the neural network approach from LanguageTool is better: faster, with less memory. A sketch of the frequency check follows.)
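A small sketch of that windowed check, assuming a hypothetical ngram_count() lookup; real access to the Google N-gram corpus needs its own lookup layer:

```python
# Compare corpus frequency of the original window against the window
# with the target token swapped for its candidate; replace only if the
# candidate window is strictly more frequent.

def should_replace(tokens, i, candidate, left=2, right=1, ngram_count=None):
    window = tokens[max(0, i - left):i + right + 1]
    swapped = list(window)
    swapped[min(i, left)] = candidate  # position of tokens[i] inside window
    return ngram_count(" ".join(swapped)) > ngram_count(" ".join(window))

tokens = "I go too school by bus".split()
counts = {"I go to school": 5, "I go too school": 0}  # toy counts from the slide
lookup = lambda s: counts.get(s, 0)
print(should_replace(tokens, 2, "to", ngram_count=lookup))  # True -> replace
```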

8 Team UFC (Université de Franche-Comté) (11th)
Statistical parsers + rule-based.
Statistical parsers: Stanford Parser, TreeTagger.
- The Stanford Parser provides POS tags and word dependencies
- Example: "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas."
Handcrafted rules based on parser output: subject-verb agreement, verb form, word form.
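The team used the Stanford Parser; as an accessible stand-in, this sketch applies one parser-based rule (subject-verb agreement over dependency links) with spaCy, assuming the en_core_web_sm model is installed. Since parsers can mis-tag ungrammatical text, hits are only candidates for review:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def sva_candidates(text):
    """Flag plural subjects (NNS/NNPS) attached by a subject dependency
    to a singular present-tense verb (VBZ)."""
    doc = nlp(text)
    return [
        (tok.text, tok.head.text)
        for tok in doc
        if tok.dep_ in ("nsubj", "nsubjpass")
        and tok.tag_ in ("NNS", "NNPS")
        and tok.head.tag_ == "VBZ"
    ]

print(sva_candidates("The senators from Kansas agrees with the bill."))
```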

9 Conclusions: different methods for different error types
Errors with a regular pattern → simple handcrafted rules:
- SVA (subject-verb agreement)
- Vform (verb form: "a study in 2010 [shown → showed] that")
- Mec (spelling, punctuation), ArtOrDet (articles and determiners), Nn (noun number)
Errors depending on nearby words → n-gram frequency / LanguageTool neural network:
- Prep (preposition)
- Vform, Wform (word form: "the sense of [guilty → guilt] can")
- Wci (wrong collocation)
Errors depending on wider context → machine translation:
- Um (unclear meaning)
- Rloc- (redundancy)

10 Suggested next steps (1): an annotated corpus
- From HK students, annotated manually; target: 1k marked essays
- Enables performance evaluation and training; the statistical findings also add value to the literature
- Suggest following the XML format of NUCLE (28 error types): more data, and results can be compared with others'
- Saves the time and effort of building a new framework
- Develop an input interface (ref)
NUCLE XML format: each correction records the start/end paragraph and character positions, the error type, and the correction (see the sketch below).
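A minimal sketch of reading one such annotation with the standard library; the element and attribute names follow NUCLE's published SGML layout as I recall it, but treat them as assumptions to verify against the actual files:

```python
import xml.etree.ElementTree as ET

# Hypothetical NUCLE-style annotation: paragraph/character offsets,
# error type, and the suggested correction.
sample = """<ANNOTATION>
  <MISTAKE start_par="0" start_off="21" end_par="0" end_off="26">
    <TYPE>Vform</TYPE>
    <CORRECTION>showed</CORRECTION>
  </MISTAKE>
</ANNOTATION>"""

root = ET.fromstring(sample)
for m in root.iter("MISTAKE"):
    print(m.attrib["start_par"], m.attrib["start_off"],
          m.findtext("TYPE"), m.findtext("CORRECTION"))
```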

11 Suggested next steps (2): extract error patterns from non-annotated scripts
- Extract n-grams with low probability and compare against a good-style corpus (Wikipedia or native speakers' essays)
- N-grams with low probability are (1) problematic, (2) bad style, or (3) correct but uncommon usage
- Non-annotated HK students' essays; target: 5k or more, in txt or Word format
- We may try to group the extracted n-grams and make rules (a sketch of the frequency scan follows)
Example sentence from an HSMC student: "Hence , he explain lots of people upload a video to YouTube ."
3-gram frequencies in the Wiki corpus:
"Hence , he": 61; ", he explain": 1; "he explain lots": 0; "explain lots of": 1; "lots of people": 2383; "of people upload": 1; "people upload a": 0; "upload a video": 49; "a video to": 138; "video to YouTube": 401; "to YouTube .": 105
2-gram frequencies in the Wiki corpus:
"Hence ,": 3044; ", he": (count missing); "he explain": 10; "explain lots": 0; "lots of": 3700; "of people": 21564; "people upload": 1; "upload a": 33; "a video": 6769; "video to": 744; "to YouTube": 55; "YouTube .": 338
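A minimal sketch of such a scan, with wiki_counts standing in for a precomputed Wikipedia n-gram frequency table (the illustrative counts are taken from the slide above):

```python
# Flag n-grams whose reference-corpus count falls below a threshold;
# these are candidates for problematic, bad-style, or unusual usage.

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def flag_rare(sentence, wiki_counts, n=3, threshold=2):
    toks = sentence.split()
    return [(g, wiki_counts.get(g, 0))
            for g in ngrams(toks, n)
            if wiki_counts.get(g, 0) < threshold]

wiki_counts = {"lots of people": 2383, "upload a video": 49, "a video to": 138}
s = "Hence , he explain lots of people upload a video to YouTube ."
print(flag_rare(s, wiki_counts))  # e.g. ('he explain lots', 0) and similar
```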

12 Suggested next steps
If we further obtain scripts from other countries, we can compare them to identify the unique style and problems of HK students:
- Good essays (books / Wiki corpus) vs. HK student essays → HK style and mistakes
- Good essays vs. non-HK student essays → non-HK style and mistakes
- Compare the two to isolate the styles and mistakes unique to HK students

13 Suggested next steps (3): the rule-based part
- Similar to LanguageTool: let English teachers write handcrafted rules in XML
- A function to search for false alarms while a handcrafted rule is being designed (see the sketch below):
  - Assume texts in Wikipedia are correct
  - Search the corpus for any patterns matching the designed rule
  - This helps a lot in finding exceptions (LanguageTool actually has this feature online)
(4) The language model and machine translation parts
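A small sketch of the false-alarm search: run a draft rule's pattern over text assumed to be correct (e.g. Wikipedia) and inspect the hits; frequent hits mean the rule needs exceptions. The rule here is a toy pattern, not a real LanguageTool rule:

```python
import re

# Draft rule: flag "explain" right after a singular pronoun.
pattern = re.compile(r"\b(he|she|it)\s+explain\b", re.IGNORECASE)

def false_alarm_scan(corpus_lines, pattern, max_hits=10):
    """Return lines from a presumed-correct corpus that the rule would
    flag; each hit is a potential false alarm to review."""
    hits = []
    for line in corpus_lines:
        if pattern.search(line):
            hits.append(line.strip())
            if len(hits) >= max_hits:
                break
    return hits

corpus = ["Nor did he explain the delay.", "He explains the rule clearly."]
print(false_alarm_scan(corpus, pattern))
# The first (grammatical) line is flagged, so the rule needs an
# exception for auxiliary constructions like "did he explain".
```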

14 Computing English (Charles): timeline
1 month(?):
- Develop a script-marking interface
- Collect and organize non-annotated student scripts
3 months(?):
- Develop a rule-based system
- Mark scripts using the interface
- Extract problematic n-grams and update the system
- Test the rule-based system
N months(?):
- Statistical and translation parts
- Analyze the problematic n-grams
Meeting with Charles?

15 End: The CoNLL-2014 Shared Task on Grammatical Error Correction
References:
- Google Books N-gram Corpus Used as a Grammar Checker. EACL 2012.
- N-gram Based Statistical Grammar Checker for Bangla and English.
- Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction. EMNLP 2016, pages 1546–1556.

