Neural Machine Translation by Jointly Learning to Align and Translate


Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau, KyungHyun Cho, Yoshua Bengio

What is Neural Machine Translation? Simply put, it is the automated translation of human text from one language to another. It’s neural because we are using deep neural networks to do it, of course! The usual gold standard for this task is a huge, freely available parallel corpus of English and French sentences.

The “Old” Way of Doing Neural Machine Translation Sutskever et al. (2014) and Cho et al. (2014) showed that neural networks could effectively be trained to do this task. These models relied on an encoder-decoder architecture, whereby sentences in a source language were encoded into a fixed-length vector and then decoded from that vector into the target language. You can think of an encoder as a neural network architecture that achieves dimensionality reduction. Kind of like PCA, but even more powerful due to its ability to capture nonlinearities.

Encode → Decode

Visualizing the Encoder-Decoder Framework Input: A B C Output: W X Y Z
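To make the fixed-length bottleneck concrete, here is a minimal sketch of that kind of encoder-decoder in PyTorch. It is not the authors’ exact model; the dimensions, vocabulary sizes, and token ids are made up for illustration. The key point is that the encoder’s final hidden state is the single vector from which every target word must be decoded.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal fixed-length-vector encoder-decoder (illustrative only)."""
    def __init__(self, src_vocab=10000, tgt_vocab=10000, emb=256, hid=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the whole source sentence; keep only the final hidden state c.
        _, c = self.encoder(self.src_emb(src_ids))            # c: (1, batch, hid)
        # Decode every target word from that single fixed-length vector.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_ids), c)
        return self.out(dec_states)                           # (batch, tgt_len, tgt_vocab)

# Toy usage: "A B C" -> "W X Y Z" as integer token ids (teacher forcing).
model = Seq2Seq()
src = torch.tensor([[1, 2, 3]])        # A B C
tgt = torch.tensor([[4, 5, 6, 7]])     # W X Y Z
logits = model(src, tgt)
print(logits.shape)                    # torch.Size([1, 4, 10000])
```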

Problems with this method? The problem with this generic encoder-decoder framework is that all sentences, regardless of length, are compressed into a fixed-length vector. So it is reasonable to think that for longer sentences you may start losing semantic information with this fixed-length compression. The authors of this paper showed that translations of longer sentences do indeed suffer because of this constraint. They then introduced a novel enhancement to the encoder-decoder framework that works around the fixed-length vector constraint and found that it handles longer sentences much more elegantly.

Enhanced Decoder They modified the decoder by having an individualized context vector c(i) for each target word, rather than one single context vector c for all the target words. Intuitively, this represents a way of having the neural net focus its attention on different parts of the source sentence when translating, rather than having to compress all the information into one fixed-length vector.

Enhanced Decoder Equations
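Written out in the paper’s notation, the enhanced decoder predicts each target word y_i from its own context vector c_i, which is a softmax-weighted sum of the encoder’s hidden states (annotations) h_j:

\[
\begin{aligned}
p(y_i \mid y_1,\ldots,y_{i-1}, \mathbf{x}) &= g(y_{i-1}, s_i, c_i)\\
s_i &= f(s_{i-1}, y_{i-1}, c_i)\\
c_i &= \sum_{j=1}^{T_x} \alpha_{ij}\, h_j\\
\alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}\\
e_{ij} &= a(s_{i-1}, h_j)
\end{aligned}
\]

Here s_i is the decoder hidden state, h_j is the annotation of source position j produced by the encoder, and a is a small feedforward alignment model that scores how well the inputs around position j match the output at position i; it is trained jointly with the rest of the network.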

Alignment Function So let’s say we have an English sentence: “The dog is red.” And the French sentence is: “Le chien est rouge.”
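For this pair, a well-learned alignment is nearly diagonal: when the decoder emits “chien” it should put most of its weight on “dog”, and so on. The attention weights α(i,j) make this soft and learned rather than hard-coded. The numbers below are purely illustrative (not from a trained model):

          The    dog    is     red    .
Le        0.85   0.05   0.05   0.03   0.02
chien     0.06   0.88   0.03   0.02   0.01
est       0.04   0.04   0.86   0.04   0.02
rouge     0.02   0.03   0.05   0.88   0.02
.         0.02   0.02   0.03   0.05   0.88

Each row is a softmax over the source positions, so it sums to 1; when generating “chien” the decoder attends mostly to “dog”.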

Enhanced Encoder Instead of the usual RNN encoder, they modified the RNN so that the hidden states h(t) are bidirectional. They did this because they wanted the hidden state at time t to represent not only the sequence up to that point, but the sequence after it as well. In this way, the hidden state at time t captures a lot of information about the input word at time t, but also information about the words to its right and left, with emphasis that decreases with distance in the sentence.
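A minimal sketch of such a bidirectional encoder in PyTorch (dimensions and token ids are arbitrary, for illustration only): the annotation for each source position is the concatenation of the forward and backward hidden states at that position.

```python
import torch
import torch.nn as nn

emb, hid, vocab = 256, 512, 10000
embed = nn.Embedding(vocab, emb)
# bidirectional=True runs one GRU left-to-right and another right-to-left.
birnn = nn.GRU(emb, hid, batch_first=True, bidirectional=True)

src = torch.tensor([[1, 2, 3, 4]])               # e.g. "The dog is red"
annotations, _ = birnn(embed(src))               # (batch, src_len, 2*hid)
# annotations[:, j] = [forward state at j ; backward state at j], so it
# summarizes the words before and after position j, centered on position j.
print(annotations.shape)                         # torch.Size([1, 4, 1024])
```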

Putting it all together… As you can see, there are two big differences between this model and the older neural machine translation model: (1) the hidden states in the encoder RNN are now bidirectional instead of simply flowing in the forward direction, and (2) the current target word y_i is predicted based on an individualized context vector c_i that is a weighted linear combination of the encoder’s hidden states, instead of one single fixed context vector.
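Here is a rough numpy sketch of a single decoding step under those two changes, using the paper’s additive alignment model e(i,j) = v·tanh(W s_{i-1} + U h_j). All weights and annotations are random here, purely for illustration; the sizes are tiny and made up.

```python
import numpy as np

rng = np.random.default_rng(0)
src_len, enc_dim, dec_dim, att_dim = 4, 8, 8, 8   # tiny, illustrative sizes

H = rng.normal(size=(src_len, enc_dim))   # bidirectional annotations h_1..h_Tx
s_prev = rng.normal(size=dec_dim)         # previous decoder state s_{i-1}
W = rng.normal(size=(att_dim, dec_dim))
U = rng.normal(size=(att_dim, enc_dim))
v = rng.normal(size=att_dim)

# Alignment scores e_ij = v . tanh(W s_{i-1} + U h_j), one per source position.
e = np.array([v @ np.tanh(W @ s_prev + U @ h) for h in H])

# Attention weights: softmax over source positions.
alpha = np.exp(e - e.max())
alpha /= alpha.sum()

# Individualized context vector c_i: weighted sum of the annotations.
c_i = alpha @ H                            # shape (enc_dim,)

print(alpha)        # how much the decoder attends to each source word
print(c_i.shape)    # the context used to predict the next target word y_i
```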

Results They tested their new neural machine translation model against the older, simpler model on the standard ACL WMT ’14 dataset, which is a massive parallel corpus of French and English translations. As we will see on the next slide, the new model outperforms the old one significantly, especially for longer input sentences.

Results (figures: BLEU score comparisons between the two models, including performance as sentence length grows)

Here’s An Example Test Sentence An admitting privilege is the right of a doctor to admit a patient to a hospital or a medical centre to carry out a diagnosis or a procedure, based on his status as a health care worker at a hospital.

Here’s A Translation from the First Neural Machine Translation Model Un privilège d’admission est le droit d’un médecin de reconnaître un patient à l’hôpital ou un centre médical d’un diagnostic ou de prendre un diagnostic en fonction de son état de santé. The underlined part, “de prendre un diagnostic en fonction de son état de santé”, means “carry out a diagnosis based on his health status”. This is wrong! The original meaning was “carry out a diagnosis, based on his status as a health care worker”.

Here’s A Translation From the New Machine Translation Model Un privilège d’admission est le droit d’un médecin d’admettre un patient à un hôpital ou un centre médical pour effectuer un diagnostic ou une procédure, selon son statut de travailleur des soins de santé à l’hôpital. This is correct! (more or less)

Conclusion This paper introduced novel enhancements to the encoder-decoder framework that make neural machine translation significantly more robust to longer input sentences. The authors achieved BLEU scores on the English-to-French translation task that are comparable to the state-of-the-art marks set by phrase-based translation systems and superior to those achieved by previous neural network based models.