Neural Machine Translation by Jointly Learning to Align and Translate

Presentation transcript:

Neural Machine Translation by Jointly Learning to Align and Translate Bahdanau et al., ICLR 2015 Presented by İhsan Utlu

Outline Neural Machine Translation overview Relevant studies Encoder/Decoder framework Attention mechanism Results Conclusion

Neural Machine Translation Massive improvements in recent years: Google Translate, Skype Translator Compare: phrase-based vs. end-to-end trainable systems Example parallel corpora: Europarl FR-EN, 2M aligned sentences; Yandex RU-EN, 1M aligned sentences

Neural Machine Translation Basic framework: Encoder/Decoder Encoder: Vector representation of source sentence Decoder: A (conditional) language model https://research.googleblog.com/2015/11/computer-respond-to-this-email.html
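A minimal sketch of this encoder/decoder setup, to make the division of labour concrete. This is illustrative PyTorch, not the paper's implementation; the vocabulary and dimension constants (SRC_VOCAB, TRG_VOCAB, EMB_DIM, HID_DIM) are placeholder assumptions.

```python
# Minimal encoder/decoder sketch (illustrative only, not the paper's code).
import torch
import torch.nn as nn

SRC_VOCAB, TRG_VOCAB, EMB_DIM, HID_DIM = 30000, 30000, 620, 1000  # assumed sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB_DIM)
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len)
        _, h = self.rnn(self.embed(src))         # h: (1, batch, HID_DIM)
        return h                                 # fixed-size sentence representation

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TRG_VOCAB, EMB_DIM)
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)
        self.out = nn.Linear(HID_DIM, TRG_VOCAB)

    def forward(self, prev_tokens, h):           # a conditional language model
        out, h = self.rnn(self.embed(prev_tokens), h)
        return self.out(out), h                  # logits over the target vocabulary
```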

NMT: Preceding studies Kalchbrenner & Blunsom, 2013: Recurrent Continuous Translation Models Encoder: convolutional sequence model Cho et al., 2014: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation GRUs introduced Sutskever et al., 2014: Sequence to Sequence Learning with Neural Networks Multi-layer LSTMs

RNN Encoder/Decoder (Cho et al., 2014; Sutskever et al., 2014) LSTM/GRU units used Word embeddings also learnt <EoS>, <UNK> tokens: <UNK> replaces words outside the top-30000 frequency rank

RNN Units: GRU vs LSTM Basic LSTM Unit Basic GRU Unit
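For reference, the GRU update equations (Cho et al., 2014); gate conventions vary slightly between papers, this is one common form. The LSTM replaces these two gates with separate input, forget and output gates plus an explicit memory cell.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) && \text{(update gate)} \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\big(W x_t + U (r_t \odot h_{t-1})\big) && \text{(candidate state)} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
```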

Decoder: RNN-based LM Chain rule: RNN implementation Could also condition on the previous target word (Cho et al., 2014)
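The chain-rule factorization behind the RNN language model, with the encoder representation c entering every conditional:

```latex
p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_1, \dots, y_{t-1}, x), \qquad
p(y_t \mid y_{<t}, x) = g(y_{t-1}, s_t, c)
```

where s_t is the decoder hidden state and g ends in a softmax over the target vocabulary.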

Decoder: Sentence generation Greedy search Beam search Keep a collection of B translation candidates at time t Calculate conditional distributions at t+1 Prune down to B Repeat until <EoS>
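A compact sketch of the beam search loop described above. The step function is a hypothetical stand-in for one decoder step returning (token, log-probability) pairs; B = 12 matches the setting used later.

```python
def beam_search(step, start_token, eos_token, B=12, max_len=50):
    """step(prefix) -> list of (token, log_prob) pairs for the next position.
    Keeps the B best partial translations at every time step."""
    beams = [([start_token], 0.0)]                 # (prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in step(prefix):       # conditional distribution at t+1
                candidates.append((prefix + [token], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:B]:       # prune down to B
            (finished if prefix[-1] == eos_token else beams).append((prefix, score))
        if not beams:                              # every hypothesis ended with <EoS>
            break
    return max(finished + beams, key=lambda c: c[1])[0]
```

Greedy search is the special case B = 1.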

Limitations on the Encoder Encoding of long sentences is an issue, even with LSTM/GRU units A fixed-size vector is restrictive Encoded representations are biased towards the most recent words (Sutskever et al.: process the input in reverse) Need to 'attend' to each individual word

Proposed Solutions Convolutional encoder (Kalchbrenner & Blunsom, 2013): represent the input as a matrix, use convnet architectures Attention-based models: use an adaptive weighted sum of individual word vectors

Attention Model Introduce BiRNNs into the encoder Adaptive source embedding Weights depend on the target hidden state Alignments inferred with end-to-end training
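Written out, the adaptive weighted sum: each target position i gets its own context vector c_i built from the encoder annotations h_j, with weights computed from the previous decoder state s_{i-1} by a small feed-forward alignment model (notation follows the paper):

```latex
e_{ij} = v_a^{\top} \tanh\!\big(W_a s_{i-1} + U_a h_j\big), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j
```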

Attention Model e.g. Google Translate (currently deployed) https://research.googleblog.com/2016/09/a-neural-network-for-machine.html

BiRNN Encoder with Attention One-hot input vectors, GRU update equations, BiRNN with GRUs

Decoder implementation GRU update eqns with feedback from target sentence

Decoder implementation Attention model
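A sketch of that attention step as a standalone module (illustrative PyTorch, not the authors' code; the dimensions are assumptions, with the 2000-dimensional annotations standing for concatenated forward/backward encoder states):

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style (additive) attention: scores each encoder annotation
    against the previous decoder state and returns a weighted context vector."""
    def __init__(self, dec_dim=1000, enc_dim=2000, attn_dim=1000):
        super().__init__()
        self.W_a = nn.Linear(dec_dim, attn_dim, bias=False)   # projects s_{i-1}
        self.U_a = nn.Linear(enc_dim, attn_dim, bias=False)   # projects annotations h_j
        self.v_a = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, s_prev, annotations):
        # s_prev: (batch, dec_dim); annotations: (batch, src_len, enc_dim)
        scores = self.v_a(torch.tanh(
            self.W_a(s_prev).unsqueeze(1) + self.U_a(annotations)))   # (batch, src_len, 1)
        alpha = torch.softmax(scores, dim=1)                          # alignment weights
        context = (alpha * annotations).sum(dim=1)                    # (batch, enc_dim)
        return context, alpha.squeeze(-1)
```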

Decoder implementation Output layer (with Maxout neurons) Output embedding matrix Similar to word2vec algorithms Sampled with beam search
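Roughly, the deep output layer with maxout units, paraphrasing the formulation in the paper's appendix (treat the exact arguments as an approximation): a linear pre-activation is computed from the decoder state, the previous target embedding and the context vector, maxout takes pairwise maxima, and the output embedding matrix W_o scores the target words.

```latex
\tilde{t}_i = U_o s_i + V_o E y_{i-1} + C_o c_i, \qquad
t_{i,k} = \max\{\tilde{t}_{i,2k-1},\, \tilde{t}_{i,2k}\}, \qquad
p(y_i \mid s_i, y_{i-1}, c_i) \propto \exp\!\big(y_i^{\top} W_o t_i\big)
```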

Training Objective: maximize the log-probability of the correct translation Training dataset: WMT 14 corpora, 384M words after denoising Test dataset: 3003 sentences Frequency rank threshold: 30000 1000 hidden units Embedding dimensions: 620 (input), 500 (output) Beam size 12
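The objective spelled out: maximize the average log-probability of the reference translations over the parallel training corpus.

```latex
\theta^{\star} = \arg\max_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}\big(y^{(n)} \mid x^{(n)}\big)
```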

Learnt Alignments

Results The BLEU scores of the generated translations on the test set with respect to the lengths of the sentences.

Results BLEU scores BLEU-n: a metric for automated scoring of translations Based on precision: the percentage of n-grams in the candidate translation that also appear in one of the reference translations Further modifications are applied to the precision criterion to account for abuses RNNencdec: Cho et al., 2014 RNNsearch: the proposed method Moses: phrase-based MT
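For reference, the standard BLEU definition underlying these scores: a brevity penalty times the geometric mean of the modified (clipped) n-gram precisions p_n, usually with uniform weights w_n up to n = 4; c and r are the candidate and reference lengths.

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big(\sum_{n=1}^{N} w_n \log p_n\Big), \qquad
\mathrm{BP} = \begin{cases} 1 & c > r \\ e^{\,1 - r/c} & c \le r \end{cases}
```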

Conclusion The concept of attention introduced in the context of neural machine translation The restriction of fixed-length encoding for variable-length source sequences lifted Improvements obtained in BLEU scores Rare words seen to cause performance problems

References
K. Cho, B. van Merriënboer, Ç. Gülçehre, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," CoRR, vol. abs/1406.1078, 2014. [Online]. Available: http://arxiv.org/abs/1406.1078
I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," CoRR, vol. abs/1409.3215, 2014. [Online]. Available: http://arxiv.org/abs/1409.3215
M. Luong, H. Pham, and C. D. Manning, "Effective approaches to attention-based neural machine translation," CoRR, vol. abs/1508.04025, 2015. [Online]. Available: http://arxiv.org/abs/1508.04025
N. Kalchbrenner and P. Blunsom, "Recurrent continuous translation models," in EMNLP, 2013.