
1 Transformer result, convolutional encoder-decoder
David Ling

2 Contents
Another trial on the transformer
- Seen data
- Unseen data
- F-score on the CoNLL 2014 shared task
A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
- Performance
- Rescoring with a language model

3 Trial on the transformer
Data:
- 5M Wikipedia lines (a very small portion)
- 1M parallel lines (e.g. NUCLE, Lang8, FCE)
Training (1 step = 4096 lines; 75,000 steps ≈ 300M lines):
- Steps 0 to 75k: trained on all data; output is often identical to the input
- Steps 75k to 525k: change of training data (parallel lines only) and a larger learning rate
Result:
- More fluent, but the output is often identical to the input
- Some generalization becomes visible at around step 450k

4 Results on CoNLL 2014 shared task’s test set
Scored with the “m2scorer”.
Our trial results: evaluated at 450k steps and at 525k steps.
Successfully corrected examples:
IN:  For an example , if exercising is helpful for family potential …
OUT: For example , if exercising is helpful for family potential …
IN:  … and let others known about the risk so that …
OUT: … and let others know about the risk so that …
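The m2scorer reports precision, recall, and F0.5, which weights precision twice as heavily as recall. As a quick reference, a minimal Python sketch of the F0.5 computation follows; the numbers in the usage line are placeholders, not scores from this trial.

def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F_beta score; the CoNLL 2014 shared task uses beta = 0.5,
    i.e. precision counts twice as much as recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Placeholder numbers only, not results from this presentation.
print(round(f_beta(0.40, 0.20), 4))  # 0.3333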

5 Results – SEEN DATA
Original:  I had always wanted to swim in the middle of sharks .
Reference: I had always wanted to swim with sharks .
At 225k:   I had always wanted to swim in the middle of rain .
At 450k:   I had always wanted to swim in the middle of it .
At 525k:   I had always wanted to swim in the middle of her .
→ Not ok

Original:  We ’re lucky mum is such a good cooker .
Reference: We ’re lucky mum is such a good cook .
At 225k:   We 're lucky mum is such a good cook .
At 450k:   Our mum is such a good cook .
At 525k:   We a lucky mum is such a good cooker .
→ Was ok

Original:  however , I hoped rain because it is too hot .
Reference: However , I hoped it would rain today because it is too hot .
At 225k:   However , I hoped rain because it is too hot .
At 450k:   However , I hoped it would rain because it is too hot .
At 525k:   (not shown)
→ Ok

6 How general is the correction on “I hoped rain”?
SEEN
IN:  however , I hoped rain because it is too hot .
OUT: However , I hoped it would rain because it is too hot .

UNSEEN (modified)
IN:  anyway , I hoped rain seriously .
OUT: Anyway , I hoped it would rain seriously .   → OK
IN:  however , John hoped rain because it is too hot .
OUT: However , John hoped rain because it is too hot .   → Not OK
IN:  By the way , I wished rain .
OUT: By the way , I wished rain .   → Not OK

7 Results – unseen sentences
Original:  He go to school by bus .
Reference: He goes to school by bus .
At 225k / 450k / 525k: (outputs not shown)
→ Corrected subject–verb disagreement

Original:  John give me the apple .
Reference: John gives me the apple .
At 225k:   John give me an apple .
At 450k:   John gives me an apple .
At 525k:   John give me the apple
→ Getting identical to the input

Original:  The second speaker had more emotioned expression during his presentation .
Reference: The second speaker had more emotional expressions during his presentation .
At 225k:   The second speaker had more emotioned expressions during his presentation .
At 450k / 525k: (outputs not shown)
→ Corrected noun number

8 By NUS (data provider of the CoNLL 2014 shared task on English correction)
Convolutional encoder-decoder with attention (AAAI-18, 2018); source code is available on GitHub.
Convolution filters use a 3-word window.
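As a rough illustration of what a 3-word-window convolution looks like in an encoder layer, here is a minimal PyTorch sketch. The dimensions, the gated linear unit, and the residual connection are assumptions for illustration based on the general MLConv design, not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoderLayer(nn.Module):
    """One encoder layer: a width-3 convolution over word embeddings,
    a GLU gate, and a residual connection. Dimensions are illustrative."""
    def __init__(self, dim: int = 512, kernel_size: int = 3):
        super().__init__()
        # 2*dim output channels: half carry values, half the GLU gate.
        self.conv = nn.Conv1d(dim, 2 * dim, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); Conv1d expects (batch, dim, seq_len)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return x + F.glu(h, dim=-1)  # gated output plus residual

# Toy usage: a batch of 2 "sentences" of 7 embedded tokens each.
layer = ConvEncoderLayer(dim=512)
out = layer(torch.randn(2, 7, 512))
print(out.shape)  # torch.Size([2, 7, 512])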

9 Paper’s result and baseline
Components:
- Convolutional neural network (MLConv)
- Edit operation (EO)
- Language model (LM)
Parallel corpora: NUCLE, Lang8 v2
Other corpora: Wikipedia (for word embeddings), Common Crawl (for the language model)

10 Rescoring
Convolutional + attention, with rescoring after beam search:
- Feature f1 = edit operations (number of deletions, insertions, and replacements between the source and target sentence)
- Feature f2 = language model (sum of the log probabilities of the target sentence's 5-grams)
Each target-sentence candidate from the beam is rescored against the source sentence by adding a weighted feature score to the model score, as sketched below.
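A minimal sketch of this kind of rescoring follows. The feature weights, helper names, and the simple word-level edit counting are illustrative assumptions, not the paper's exact formulation.

def edit_ops(source: str, target: str) -> int:
    """f1: word-level Levenshtein distance, i.e. the number of deletions,
    insertions, and replacements needed to turn the source into the target."""
    s, t = source.split(), target.split()
    prev = list(range(len(t) + 1))
    for i, sw in enumerate(s, 1):
        cur = [i]
        for j, tw in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (sw != tw)))   # replacement
        prev = cur
    return prev[-1]

def rescore(source, candidates, lm_logprob, w_eo=-0.1, w_lm=0.2):
    """Re-rank beam-search candidates: model score plus weighted features.
    `candidates` is a list of (target_sentence, model_log_prob) pairs;
    `lm_logprob(sentence)` returns the total n-gram log probability (f2)."""
    scored = []
    for target, model_score in candidates:
        f1 = edit_ops(source, target)
        f2 = lm_logprob(target)
        scored.append((model_score + w_eo * f1 + w_lm * f2, target))
    return max(scored)  # best (score, sentence) pair

# Toy usage with a dummy language model (placeholder scores throughout).
dummy_lm = lambda sent: -0.5 * len(sent.split())
best = rescore("He go to school by bus .",
               [("He go to school by bus .", -4.0),
                ("He goes to school by bus .", -3.8)],
               dummy_lm)
print(best)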

11 Language model – KenLM
A language model library; fast and widely used (e.g. hash-based lookup, and the default LM in Moses).
Backoff-smoothed language model.
Example: the 5-gram “today is a sunny day” appears 0 times in the corpus; the longest matched n-gram is the 3-gram “a sunny day”. The model therefore backs off: the discounted probability of the matched lower-order n-gram is multiplied by a backoff penalty for the unmatched longer context.
The discounts and backoff penalties are obtained by minimizing the perplexity of the development data.
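For reference, querying such a model from KenLM's Python bindings could look like the sketch below. The model file name is a placeholder and assumes a 5-gram model has already been built (e.g. with lmplz and build_binary).

import kenlm  # Python bindings for KenLM (github.com/kpu/kenlm)

# Placeholder path: a previously trained 5-gram model.
model = kenlm.Model("5gram.binary")

sentence = "today is a sunny day"

# Total log10 probability of the sentence (an f2-style feature for rescoring).
print(model.score(sentence, bos=True, eos=True))

# Per-token breakdown (including the end-of-sentence token): each tuple is
# (log10 prob, length of the matched n-gram, is_oov). A matched length shorter
# than 5 means the model backed off to a lower-order n-gram, e.g. only the
# 3-gram "a sunny day" was found when scoring the word "day".
for prob, ngram_len, oov in model.full_scores(sentence, bos=True, eos=True):
    print(prob, ngram_len, oov)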

