Published by Helen Bradley. Modified over 6 years ago.
1
Transformer results and a convolutional encoder-decoder
David Ling
2
Contents
Another trial on the transformer
- Seen data
- Unseen data
- F-score on the CoNLL 2014 shared task
A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
- Performance
- Rescoring with a language model
3
Trial on transformer
Data:
- 5M Wikipedia lines (a very small portion of Wikipedia)
- 1M parallel lines (e.g. NUCLE, Lang8, FCE)
Training (1 step = 4096 lines, so 75,000 steps ≈ 300M lines):
- Step 0 to 75k: trained on all data; the output was often identical to the input
- Step 75k to 525k: trained on parallel lines only, with a larger learning rate (change of training data and change of learning rate)
Result:
- More fluent output, but often still identical to the input
- Some generalization is visible at around step 450k
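To make the step counts above concrete, here is a small arithmetic sketch. It takes "1 step = 4096 lines" from the slide at face value and treats the parallel data as roughly 1M lines; the function and variable names are illustrative only.

```python
# Assumption (from the slide): one training step consumes a batch of 4096 lines.
LINES_PER_STEP = 4096


def lines_seen(start_step: int, end_step: int, lines_per_step: int = LINES_PER_STEP) -> int:
    """Number of training lines consumed between two step counts."""
    return (end_step - start_step) * lines_per_step


# The two training phases described on the slide.
phase1 = lines_seen(0, 75_000)        # all data (Wikipedia + parallel)
phase2 = lines_seen(75_000, 525_000)  # parallel lines only, larger learning rate

# Approximate number of passes over the ~1M parallel lines in phase 2.
passes = phase2 / 1_000_000

print(f"phase 1: {phase1:,} lines")   # phase 1: 307,200,000 lines
print(f"phase 2: {phase2:,} lines")   # phase 2: 1,843,200,000 lines
print(f"~{passes:,.0f} passes over the parallel data in phase 2")
```

Under these assumptions the model sees each parallel line on the order of 1,800 times during the second phase, which may be relevant when reading the "often identical" and "some generalization" observations.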
4
Results on CoNLL 2014 shared task’s test set
Scored with the “m2scorer”.
Our trial result:
At 450k steps
At 525k steps
Successfully corrected examples:
IN: For an example , if exercising is helpful for family potential …
OUT: For example , if exercising is helpful for family potential …
IN: … and let others known about the risk so that …
OUT: … and let others know about the risk so that …
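The CoNLL 2014 shared task reports F0.5, which weights precision more heavily than recall, computed over edits at corpus level. The sketch below shows only that final arithmetic under a simplifying assumption: system and gold edits are already given as exact `(start, end, replacement)` token spans, so matching is plain set intersection. The real m2scorer additionally searches for the system edit segmentation that best matches the gold annotation, which is not reproduced here. The third sentence pair is a hypothetical miss added for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical edit representation: (start_token, end_token, replacement).
Edit = Tuple[int, int, str]


def corpus_prf(pairs: List[Tuple[Set[Edit], Set[Edit]]],
               beta: float = 0.5) -> Tuple[float, float, float]:
    """Corpus-level precision, recall and F_beta from (system_edits, gold_edits) per sentence."""
    correct = proposed = gold = 0
    for system_edits, gold_edits in pairs:
        correct += len(system_edits & gold_edits)  # exact span + replacement match
        proposed += len(system_edits)
        gold += len(gold_edits)
    # Convention chosen for this sketch: an empty denominator counts as perfect.
    precision = correct / proposed if proposed else 1.0
    recall = correct / gold if gold else 1.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f


pairs = [
    ({(1, 2, "")}, {(1, 2, "")}),            # "For an example" -> "For example": delete "an"
    ({(4, 5, "know")}, {(4, 5, "know")}),    # "… and let others known" -> "know"
    (set(), {(1, 2, "goes")}),               # hypothetical missed edit: "He go" -> "He goes"
]
p, r, f = corpus_prf(pairs)
print(f"P={p:.3f} R={r:.3f} F0.5={f:.3f}")
```

Because counts are summed before dividing, one sentence with many edits can move the corpus score more than several sentences with one edit each.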
5
Results – SEEN DATA

Original: I had always wanted to swim in the middle of sharks .
Reference: I had always wanted to swim with sharks .
At 225k: I had always wanted to swim in the middle of rain .
At 450k: I had always wanted to swim in the middle of it .
At 525k: I had always wanted to swim in the middle of her .
Verdict: Not ok

Original: We ’re lucky mum is such a good cooker .
Reference: We ’re lucky mum is such a good cook .
At 225k: We 're lucky mum is such a good cook .
At 450k: Our mum is such a good cook .
At 525k: We a lucky mum is such a good cooker .
Verdict: Was ok

Original: however , I hoped rain because it is too hot .
Reference: However , I hoped it would rain today because it is too hot .
At 225k: However , I hoped rain because it is too hot .
At 450k: However , I hoped it would rain because it is too hot .
At 525k:
Verdict: Ok
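Comparing checkpoints by hand, as in the seen-data examples, can be complemented by a coarse automatic label per output. The sketch below is a hypothetical helper using exact string match only; it uses the "I hoped rain" example from the slide as data.

```python
def classify(source: str, output: str, reference: str) -> str:
    """Coarse label for one checkpoint output, compared by exact string match."""
    if output == reference:
        return "matches reference"
    if output == source:
        return "unchanged (copied input)"
    return "changed, differs from reference"


source = "however , I hoped rain because it is too hot ."
reference = "However , I hoped it would rain today because it is too hot ."
outputs = {
    "225k": "However , I hoped rain because it is too hot .",
    "450k": "However , I hoped it would rain because it is too hot .",
}
for step, out in outputs.items():
    print(step, classify(source, out, reference))
```

Both checkpoints land in "changed, differs from reference", even though the 450k output was judged acceptable: the reference also adds "today". Exact match is therefore only a first filter. In the "cooker" example, the output's `'re` and the reference's `’re` also differ by apostrophe character alone, so normalizing quotes before comparing would be needed in practice.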
6
How general is the correction on “I hoped rain”?
SEEN
IN: however , I hoped rain because it is too hot .
OUT: However , I hoped it would rain because it is too hot .

UNSEEN (modified)
IN: anyway , I hoped rain seriously .
OUT: Anyway , I hoped it would rain seriously . (OK)
IN: however , John hoped rain because it is too hot .
OUT: However , John hoped rain because it is too hot . (Not OK)
IN: By the way , I wished rain .
OUT: By the way , I wished rain . (Not OK)
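The probes above change one part of a seen sentence (subject, verb, opener) and check whether the correction still fires. A hypothetical helper for generating such probes systematically, changing exactly one token at a time, could look like this; the substitution lists are illustrative, not taken from the experiment.

```python
from typing import Dict, List


def one_change_variants(sentence: str, substitutions: Dict[str, List[str]]) -> List[str]:
    """Return variants of a tokenized sentence with exactly one token substituted."""
    tokens = sentence.split(" ")
    variants = []
    for i, tok in enumerate(tokens):
        for alt in substitutions.get(tok, []):
            variants.append(" ".join(tokens[:i] + [alt] + tokens[i + 1:]))
    return variants


seen = "however , I hoped rain because it is too hot ."
probes = one_change_variants(seen, {"however": ["anyway"], "I": ["John"], "hoped": ["wished"]})
for probe in probes:
    print(probe)
```

Varying one token per probe makes it easier to attribute a failure (for example, the subject "John" versus the verb "wished") than the multi-token edits used in the hand-made probes.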
7
Results – unseen sentences
Original: He go to school by bus .
Reference: He goes to school by bus .
At 225k:
At 450k:
At 525k:
Corrected subject-verb disagreement

Original: John give me the apple .
Reference: John gives me the apple .
At 225k: John give me an apple .
At 450k: John gives me an apple .
At 525k: John give me the apple
Output becomes identical to the input again

Original: The second speaker had more emotioned expression during his presentation .
Reference: The second speaker had more emotional expressions during his presentation .
At 225k: The second speaker had more emotioned expressions during his presentation .
At 450k:
At 525k:
Corrected noun number
8
By NUS (data provider of the CoNLL 2014 shared task on English grammatical error correction)
- Convolutional + attention
- 2018, AAAI-18
- Source code is available on GitHub
- Convolution filters with a 3-word window
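A "3-word window" convolution means each output position is computed from the embeddings of the previous, current and next word. The NumPy sketch below shows a single such filter layer with zero padding so the output length matches the input; it is a generic illustration, not the paper's implementation, and omits the stacking of multiple layers and the attention mechanism. Sizes and random weights are arbitrary.

```python
import numpy as np


def conv1d_window3(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Apply width-3 convolution filters over a sequence of word vectors.

    x: (seq_len, d_in) embeddings; W: (3, d_in, d_out) filter weights; b: (d_out,) bias.
    Output position t sees words t-1, t, t+1; zero padding at both ends keeps the length.
    """
    seq_len, _ = x.shape
    padded = np.pad(x, ((1, 1), (0, 0)))  # one zero vector before and after the sentence
    out = np.empty((seq_len, W.shape[2]))
    for t in range(seq_len):
        window = padded[t:t + 3]  # (3, d_in): words t-1, t, t+1
        out[t] = np.einsum("kd,kdo->o", window, W) + b
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))   # 5 words, 4-dim embeddings (toy sizes)
W = rng.normal(size=(3, 4, 6))  # 6 filters, each spanning 3 words
b = np.zeros(6)
print(conv1d_window3(x, W, b).shape)  # (5, 6)
```

Stacking such layers widens the receptive field: two width-3 layers let each position depend on 5 words, three layers on 7.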
9
Paper’s result and baseline
Components:
- Multilayer convolutional neural network (MLConv)
- Edit operation features (EO)
- Language model (LM)
Parallel corpora: NUCLE, Lang8 v2
Other corpora:
- Wikipedia (for word embeddings)
- Common Crawl (for the language model)
10
Rescoring
Convolutional + attention model; the candidates from beam search are rescored with:
- Feature f1 = edit operations (number of deletions, insertions, replacements)
- Feature f2 = language model (sum of log probabilities of 5-grams)
[Figure: source sentence, target sentence, feature score]
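A minimal sketch of rescoring after beam search, assuming each candidate comes with a model score and the final score is a weighted sum of that score, the three edit-operation counts (f1) and a language-model log probability (f2). The weights and the toy language model are hypothetical stand-ins; the paper's exact feature definitions and weight tuning are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def edit_ops(src: Sequence[str], tgt: Sequence[str]) -> Tuple[int, int, int]:
    """Token-level (deletions, insertions, replacements) on a minimum-cost alignment."""
    n, m = len(src), len(tgt)
    # dp[i][j] = (cost, deletions, insertions, replacements) for src[:i] -> tgt[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, i, 0, 0)
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, j, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, d, a, r = dp[i - 1][j - 1]
            diag = (c, d, a, r) if src[i - 1] == tgt[j - 1] else (c + 1, d, a, r + 1)
            c, d, a, r = dp[i - 1][j]
            delete = (c + 1, d + 1, a, r)
            c, d, a, r = dp[i][j - 1]
            insert = (c + 1, d, a + 1, r)
            dp[i][j] = min(diag, delete, insert)  # lowest cost first
    _, d, a, r = dp[n][m]
    return d, a, r


def rescore(source: str, candidates: List[Tuple[str, float]],
            lm_logprob: Callable[[List[str]], float], weights: Dict[str, float]) -> str:
    """Pick the best (text, model_score) candidate by a weighted sum of features."""
    src = source.split()
    best, best_score = candidates[0][0], float("-inf")
    for text, model_score in candidates:
        tgt = text.split()
        d, a, r = edit_ops(src, tgt)
        score = (model_score
                 + weights["del"] * d + weights["ins"] * a + weights["rep"] * r  # f1
                 + weights["lm"] * lm_logprob(tgt))                                # f2
        if score > best_score:
            best, best_score = text, score
    return best


def toy_lm(tokens: List[str]) -> float:
    """Hypothetical stand-in for a 5-gram LM log probability."""
    return 0.0 if "goes" in tokens else -2.0


weights = {"del": -0.1, "ins": -0.1, "rep": -0.1, "lm": 1.0}  # hypothetical values
chosen = rescore("He go to school by bus .",
                 [("He go to school by bus .", -1.0), ("He goes to school by bus .", -1.2)],
                 toy_lm, weights)
print(chosen)  # He goes to school by bus .
```

Here the language model overturns the beam-search ranking: the unchanged sentence has the better model score, but the corrected one wins once f2 is added. Negative edit weights penalize unnecessary changes; in practice all weights would be tuned on development data.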
11
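Language model: KenLM
- A language model library that is fast (e.g. hash-table lookup) and widely used (e.g. it is the default language model in Moses)
- Backoff-smoothed language model
- Example: the 5-gram "today is a sunny day" appears 0 times in the corpus. The longest matching n-gram is the 3-gram "a sunny day", so its discounted probability is combined with the backoff penalties of the longer contexts that were not matched:
  $p(\text{day} \mid \text{today is a sunny}) = b(\text{today is a sunny}) \cdot b(\text{is a sunny}) \cdot \tilde{p}(\text{day} \mid \text{a sunny})$
  where $\tilde{p}$ is the discounted probability and $b$ is the backoff penalty.
- The discounts are estimated from the training counts (KenLM's lmplz uses modified Kneser-Ney with closed-form discount estimates), and the backoff penalties are set so that each conditional distribution sums to one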
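The backoff lookup in the "today is a sunny day" example can be sketched with a toy ARPA-style table mapping each n-gram to a (log10 probability, log10 backoff penalty) pair. The numbers below are made up for illustration; a real model would be built and queried with KenLM itself (its Python binding's `Model.score` returns a sentence's total log10 probability).

```python
from typing import Dict, Optional, Sequence, Tuple

# Toy ARPA-style table: n-gram -> (log10 probability, log10 backoff penalty). Values are made up.
NGRAMS: Dict[Tuple[str, ...], Tuple[float, float]] = {
    ("a", "sunny", "day"): (-0.5, 0.0),
    ("today", "is", "a", "sunny"): (-1.0, -0.2),
    ("is", "a", "sunny"): (-0.8, -0.1),
}


def backoff_log10_prob(context: Sequence[str], word: str,
                       table: Dict[Tuple[str, ...], Tuple[float, float]] = NGRAMS) -> Optional[float]:
    """log10 p(word | context) with backoff, as in ARPA-format models."""
    context = tuple(context)
    penalty = 0.0
    for start in range(len(context) + 1):
        ngram = context[start:] + (word,)
        if ngram in table:
            # Longest match found: its discounted probability plus accumulated penalties.
            return table[ngram][0] + penalty
        # Not found: pay the backoff penalty of the context being shortened (0 if it is absent).
        if context[start:] in table:
            penalty += table[context[start:]][1]
    return None  # word unknown even as a unigram (a real model would use <unk>)


lp = backoff_log10_prob(["today", "is", "a", "sunny"], "day")
print(lp)  # about -0.8 = -0.2 + (-0.1) + (-0.5)
```

Summing such per-word log probabilities over a sentence gives the kind of sentence-level score used as feature f2 in rescoring.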