Transformer results, convolutional encoder-decoder
8-5-2018, David Ling
Contents
- Another trial on the transformer
  - Seen data
  - Unseen data
  - F-score on the CoNLL 2014 shared task
- A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
  - Performance
  - Rescoring with a language model
Trial on the transformer

Data:
- 5M Wikipedia lines (a very small portion of Wikipedia)
- 1M parallel lines (e.g. NUCLE, Lang8, FCE)

Training (1 step = 4096 lines, so 75,000 steps ~ 300M lines):
- Steps 0 to 75k: trained on all data; the output is often identical to the input
- Steps 75k to 525k: trained on the parallel lines only, with a larger learning rate (i.e. a change of both training data and learning rate)

Result:
- More fluent, but the output is still often identical to the input
- Some generalization becomes visible around step 450k
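As a quick check of the data-volume figure above (assuming, as the slide states, 1 step = 4096 lines):

```python
LINES_PER_STEP = 4096  # batch size stated on the slide

def lines_seen(steps: int) -> int:
    """Total training lines consumed after a given number of steps."""
    return steps * LINES_PER_STEP

print(lines_seen(75_000))   # 307,200,000, i.e. roughly 300M lines
```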
Results on the CoNLL 2014 shared task's test set

Scored with the "m2scorer"; our trial results are at 450k steps and at 525k steps.

Successfully corrected examples:
IN:  For an example , if exercising is helpful for family potential …
OUT: For example , if exercising is helpful for family potential …
IN:  … and let others known about the risk so that …
OUT: … and let others know about the risk so that …
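The m2scorer reports F0.5, which weights precision twice as heavily as recall. A minimal sketch of just the metric (the scorer's edit-matching step is omitted; `tp`, `fp`, `fn` are assumed to be counts of matched, spurious, and missed edits):

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F_beta over edit counts; beta = 0.5 favours precision, as in m2scorer."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# e.g. 30 correct edits, 10 spurious, 20 missed:
print(round(f_beta(30, 10, 20), 4))  # 0.7143
```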
Results – SEEN DATA

Example 1 (not ok):
Original:  I had always wanted to swim in the middle of sharks .
Reference: I had always wanted to swim with sharks .
At 225k:   I had always wanted to swim in the middle of rain .
At 450k:   I had always wanted to swim in the middle of it .
At 525k:   I had always wanted to swim in the middle of her .

Example 2 (was ok, then regressed):
Original:  We ’re lucky mum is such a good cooker .
Reference: We ’re lucky mum is such a good cook .
At 225k:   We 're lucky mum is such a good cook .
At 450k:   Our mum is such a good cook .
At 525k:   We a lucky mum is such a good cooker .

Example 3 (ok):
Original:  however , I hoped rain because it is too hot .
Reference: However , I hoped it would rain today because it is too hot .
At 225k:   However , I hoped rain because it is too hot .
At 450k:   However , I hoped it would rain because it is too hot .
At 525k:   (not shown)
How general is the correction on “I hoped rain”?

SEEN:
IN:  however , I hoped rain because it is too hot .
OUT: However , I hoped it would rain because it is too hot .

UNSEEN (modified):
IN:  anyway , I hoped rain seriously .
OUT: Anyway , I hoped it would rain seriously .        (OK)

IN:  however , John hoped rain because it is too hot .
OUT: However , John hoped rain because it is too hot . (not OK)

IN:  By the way , I wished rain .
OUT: By the way , I wished rain .                      (not OK)
Results – UNSEEN sentences

Example 1 (corrected subject-verb disagreement):
Original:  He go to school by bus .
Reference: He goes to school by bus .
At 225k / 450k / 525k: (not shown)

Example 2 (getting identical):
Original:  John give me the apple .
Reference: John gives me the apple .
At 225k:   John give me an apple .
At 450k:   John gives me an apple .
At 525k:   John give me the apple

Example 3 (corrected noun number):
Original:  The second speaker had more emotioned expression during his presentation .
Reference: The second speaker had more emotional expressions during his presentation .
At 225k:   The second speaker had more emotioned expressions during his presentation .
At 450k / 525k: (not shown)
A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
By NUS (the data provider of the CoNLL 2014 shared task on English correction)
- Convolutional encoder-decoder + attention
- AAAI-18 (2018); the source code is available on GitHub
- 3-word window convolution filters
Paper’s result and baseline

Components:
- Multilayer convolutional neural network (MLConv)
- Edit operation (EO) feature
- Language model (LM) feature

Corpora:
- Parallel corpora: NUCLE, Lang8 v2
- Wikipedia (for word embeddings)
- Common Crawl (for the language model)
Rescoring

After beam search, the convolutional + attention model's candidates are rescored by combining the model's score of each target sentence given the source sentence with two feature scores:
- Feature f1 = edit operations (the number of deletions, insertions, and replacements between the source and the candidate)
- Feature f2 = language model (the sum of the log probabilities of the candidate's 5-grams)
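The rescoring step can be sketched as a log-linear combination of the model score and the two features. This is a toy illustration, not the paper's implementation: the feature weights `w1`, `w2`, the beam candidates, and their scores below are all hypothetical (the paper tunes such weights on development data).

```python
def edit_ops(source: list, cand: list) -> int:
    """f1: Levenshtein edit operations (insertions, deletions, replacements)
    between the source and candidate token sequences."""
    m, n = len(source), len(cand)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # delete a source token
                        dp[j - 1] + 1,    # insert a candidate token
                        prev + (source[i - 1] != cand[j - 1]))  # replace
            prev = cur
    return dp[n]

def rescore(model_logprob: float, f1: int, f2: float,
            w1: float = -0.5, w2: float = 0.5) -> float:
    """Log-linear combination: model score + weighted edit-op and LM features."""
    return model_logprob + w1 * f1 + w2 * f2

# Pick the best candidate from a (hypothetical) beam of
# (sentence, model log-prob, LM log-prob) triples:
source = "He go to school by bus .".split()
beam = [("He go to school by bus .", -1.0, -12.0),
        ("He goes to school by bus .", -1.2, -9.0)]
best = max(beam, key=lambda c: rescore(c[1], edit_ops(source, c[0].split()), c[2]))
print(best[0])  # He goes to school by bus .
```

With these made-up weights, the language-model feature outweighs the slightly lower model score and the one edit operation, so the corrected candidate wins.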
Language model - KenLM

- A language-model library (https://github.com/kpu/kenlm)
- Fast and widely used (e.g. it uses hashed data structures and is the default LM in Moses)
- Back-off smoothed language model

Example: the 5-gram “today is a sunny day” appears 0 times in the corpus, and the longest matched n-gram is the 3-gram “a sunny day”. The 5-gram is then scored as the discounted probability of the matched n-gram multiplied by a back-off penalty for each unmatched longer context. The discounts and penalties are obtained by minimizing the perplexity of the development data.
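A toy sketch of back-off scoring (this is not KenLM's implementation, and the probability and back-off tables below are invented, ARPA-style log10 values):

```python
# Hypothetical tables: log10 probabilities for seen n-grams and
# log10 back-off penalties for contexts that fail to match.
LOGPROB = {
    ("a", "sunny", "day"): -0.5,   # discounted P(day | a sunny), the longest match
    ("sunny", "day"): -1.0,
    ("day",): -2.0,
}
BACKOFF = {
    ("today", "is", "a", "sunny"): -0.3,  # penalty for abandoning this context
    ("is", "a", "sunny"): -0.2,
}

def score(ngram: tuple) -> float:
    """Back off to shorter n-grams until a match is found, adding the
    back-off penalty of each abandoned context (log10 domain)."""
    if not ngram:
        return float("-inf")  # out-of-vocabulary word
    if ngram in LOGPROB:
        return LOGPROB[ngram]
    context = ngram[:-1]
    return BACKOFF.get(context, 0.0) + score(ngram[1:])

# "today is a sunny day" is unseen as a 5-gram; scoring backs off
# twice (-0.3 and -0.2) before hitting the 3-gram "a sunny day" (-0.5):
print(score(("today", "is", "a", "sunny", "day")))  # -1.0
```

With the real library, `kenlm.Model("lm.arpa").score(sentence)` performs this lookup and returns the total log10 probability of the sentence.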