Transformer result, convolutional encoder-decoder

Presentation transcript:

Transformer result, convolutional encoder-decoder (8-5-2018, David Ling)

Contents
- Another trial on the transformer
  - Seen data
  - Unseen data
  - F-score on the CoNLL 2014 shared task
- A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
  - Performance
  - Rescoring with a language model

Trial on the transformer
Data:
- 5M Wikipedia lines (a very small portion of Wikipedia)
- 1M parallel lines (e.g. NUCLE, Lang8, FCE)
Training (1 step = 4096 lines; 75,000 steps ~ 300M lines):
- Steps 0 to 75k: trained on all data; the output is often identical to the input
- Steps 75k to 525k: trained on the parallel lines only (change of training data) and with a larger learning rate (change of learning rate)
Result:
- More fluent, but the output is often identical to the input
- Some generalization becomes visible at around 450k steps
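As a rough illustration of this two-stage schedule, here is a minimal Python sketch; the trainer interface, the batch helpers, and the learning-rate values are hypothetical, and only the step boundaries and the 4096-line batch size come from the slide.

```python
# Minimal sketch of the two-stage schedule (hypothetical trainer API;
# learning-rate values are illustrative assumptions, not the actual settings).
def train(model, all_data, parallel_data, total_steps=525_000):
    for step in range(total_steps):
        if step < 75_000:
            # Stage 1: all data (Wikipedia lines + parallel lines)
            batch, lr = all_data.next_batch(4096), 1e-4
        else:
            # Stage 2: parallel lines only, with a larger learning rate
            batch, lr = parallel_data.next_batch(4096), 5e-4
        model.train_step(batch, learning_rate=lr)
```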

Results on the CoNLL 2014 shared task's test set (scored with the "m2scorer")
Our trial results: F-scores at 450k steps and at 525k steps (slide table)
Successfully corrected examples:
- IN: For an example , if exercising is helpful for family potential …
  OUT: For example , if exercising is helpful for family potential …
- IN: … and let others known about the risk so that …
  OUT: … and let others know about the risk so that …
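For reference, a hedged example of scoring system output with the official m2scorer (https://github.com/nusnlp/m2scorer); the scorer takes the corrected sentences and the gold .m2 file as positional arguments, and the paths and file names below are assumptions.

```python
import subprocess

# Assumed paths: the m2scorer checkout and the CoNLL-2014 gold .m2 file.
subprocess.run(
    ["./m2scorer/m2scorer",
     "system_output.txt",             # system output, one corrected sentence per line
     "official-2014.combined.m2"],    # gold-standard edits in .m2 format
    check=True,
)
```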

Results – seen data

Example 1 (not OK):
- Original: I had always wanted to swim in the middle of sharks .
- Reference: I had always wanted to swim with sharks .
- At 225k: I had always wanted to swim in the middle of rain .
- At 450k: I had always wanted to swim in the middle of it .
- At 525k: I had always wanted to swim in the middle of her .

Example 2 (was OK at 225k):
- Original: We 're lucky mum is such a good cooker .
- Reference: We 're lucky mum is such a good cook .
- At 225k: We 're lucky mum is such a good cook .
- At 450k: Our mum is such a good cook .
- At 525k: We a lucky mum is such a good cooker .

Example 3 (OK):
- Original: however , I hoped rain because it is too hot .
- Reference: However , I hoped it would rain today because it is too hot .
- At 225k: However , I hoped rain because it is too hot .
- At 450k: However , I hoped it would rain because it is too hot .
- At 525k:

How general is the correction on "I hoped rain"?
Seen:
- IN: however , I hoped rain because it is too hot .
  OUT: However , I hoped it would rain because it is too hot .
Unseen (modified):
- IN: anyway , I hoped rain seriously .
  OUT: Anyway , I hoped it would rain seriously . (OK)
- IN: however , John hoped rain because it is too hot .
  OUT: However , John hoped rain because it is too hot . (not OK)
- IN: By the way , I wished rain .
  OUT: By the way , I wished rain . (not OK)

Results – unseen sentences

Example 1 (corrected subject-verb disagreement):
- Original: He go to school by bus .
- Reference: He goes to school by bus .
- At 225k:
- At 450k:
- At 525k:

Example 2 (getting identical):
- Original: John give me the apple .
- Reference: John gives me the apple .
- At 225k: John give me an apple .
- At 450k: John gives me an apple .
- At 525k: John give me the apple .

Example 3 (corrected noun number):
- Original: The second speaker had more emotioned expression during his presentation .
- Reference: The second speaker had more emotional expressions during his presentation .
- At 225k: The second speaker had more emotioned expressions during his presentation .
- At 450k:
- At 525k:

A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
- By NUS (the data provider of the CoNLL 2014 shared task on English correction)
- Convolutional encoder-decoder with attention
- AAAI-18 (2018); source code is available on GitHub
- 3-word-window convolution filters
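A minimal PyTorch sketch of the kind of 3-word-window convolution such an encoder uses; the dimensions, GLU gating, and residual connection follow the general convolutional seq2seq recipe and are illustrative assumptions, not the paper's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoderLayer(nn.Module):
    def __init__(self, dim=512, kernel_size=3):
        super().__init__()
        # Output 2*dim channels so the GLU gate can halve them back to dim.
        self.conv = nn.Conv1d(dim, 2 * dim, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                    # x: (batch, seq_len, dim)
        residual = x
        h = self.conv(x.transpose(1, 2))     # (batch, 2*dim, seq_len), 3-word window
        h = F.glu(h, dim=1)                  # gated linear unit -> (batch, dim, seq_len)
        return h.transpose(1, 2) + residual  # back to (batch, seq_len, dim)

# Usage: embed the source sentence, then stack several such layers as the encoder.
layer = ConvEncoderLayer()
out = layer(torch.randn(2, 10, 512))         # -> (2, 10, 512)
```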

Paper's result and baseline
Components:
- Multilayer convolutional neural network (MLConv)
- Edit operation (EO)
- Language model (LM)
Parallel corpora: NUCLE, Lang8 v2
Other corpora:
- Wikipedia (for word embeddings)
- Common Crawl (for the language model)

Rescoring (convolutional + attention)
Rescoring after beam search:
- Feature f1 = edit operations (number of deletions, insertions, and replacements)
- Feature f2 = language model (sum of log probabilities of 5-grams)
The feature scores are computed from the target sentence and the source sentence.
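A hedged sketch of this rescoring step: the feature weights, the difflib-based edit counting, and the KenLM model path are assumptions used only to show how the two features can be combined with the decoder score.

```python
import difflib
import kenlm  # Python bindings for https://github.com/kpu/kenlm

lm = kenlm.Model("common_crawl.5gram.binary")   # assumed 5-gram model path

def edit_op_count(source, target):
    """f1: number of word-level insert/delete/replace operations from source to target."""
    matcher = difflib.SequenceMatcher(None, source.split(), target.split())
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal")

def rescore(source, candidates, w_dec=1.0, w_f1=-0.1, w_f2=0.2):
    """Re-rank beam-search candidates given as (decoder_logprob, target) pairs.

    The weights are illustrative; in practice they are tuned on development data.
    """
    best_score, best_target = float("-inf"), None
    for dec_logprob, target in candidates:
        f1 = edit_op_count(source, target)   # edit-operation feature
        f2 = lm.score(target)                # sum of log10 n-gram probabilities
        score = w_dec * dec_logprob + w_f1 * f1 + w_f2 * f2
        if score > best_score:
            best_score, best_target = score, target
    return best_target
```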

Language model - KenLM
- A language-model library (https://github.com/kpu/kenlm)
- Fast and widely used (e.g. hashed lookup; the default in Moses)
- Backoff-smoothed language model
- Example: the 5-gram "today is a sunny day" appears 0 times in the corpus; the longest matched n-gram is the 3-gram "a sunny day", so the probability is estimated from the discounted 3-gram probability and a backoff penalty
- Discounts and penalties are obtained by minimizing the perplexity of the development data
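To make the backoff behaviour concrete, here is a small sketch using the kenlm Python bindings; the model path is an assumption. full_scores() reports, for each word, the log10 probability and the length of the n-gram that was actually matched (e.g. only a trigram when the full 5-gram was never seen).

```python
import kenlm

model = kenlm.Model("wiki.5gram.binary")   # assumed path to a 5-gram model
sentence = "today is a sunny day"

# full_scores() yields (log10 prob, matched n-gram length, is_oov) per word,
# including the end-of-sentence token.
for (logprob, ngram_len, oov), word in zip(model.full_scores(sentence),
                                           sentence.split() + ["</s>"]):
    print(f"{word:>8}  log10 p = {logprob:7.3f}  matched {ngram_len}-gram  oov={oov}")
```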