Addressing the Rare Word Problem in Neural Machine Translation

Addressing the Rare Word Problem in Neural Machine Translation
Minh-Thang Luong (Stanford University), Ilya Sutskever (Google), Quoc V. Le (Google), Oriol Vinyals (Google), Wojciech Zaremba (New York University)

Abstract Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results comparable to traditional approaches. A significant weakness of conventional NMT systems is their inability to correctly translate very rare words.

Neural Machine Translation A neural machine translation system is any neural network that maps a source sentence s1, . . . , sn to a target sentence t1, . . . , tm. More concretely, an NMT system uses a neural network to parameterize the conditional distributions p(tj | t1, . . . , tj−1, s1, . . . , sn) for 1 ≤ j ≤ m, written out in full below.
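Multiplying these conditionals together gives the full translation probability. This is the standard left-to-right chain-rule factorization (notation follows the slide):

```latex
p(t_1, \dots, t_m \mid s_1, \dots, s_n)
  = \prod_{j=1}^{m} p(t_j \mid t_1, \dots, t_{j-1}, s_1, \dots, s_n)
```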

Neural Machine Translation They use a deep LSTM to encode the input sequence and a separate deep LSTM to output the translation. The encoder reads the source sentence, one word at a time, and produces a large vector that represents the entire source sentence. The decoder is initialized with this vector and generates a translation, one word at a time, until it emits the end-of-sentence symbol <eos>.

Rare Word Models They treat the NMT system as a black box and train it on a corpus annotated by one of the models described below. First, alignments are produced with an unsupervised aligner. Next, they use the alignment links to construct a word dictionary that is used for word translations in the post-processing step. If a word does not appear in the dictionary, they apply the identity translation (the source word is copied unchanged).
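A hedged sketch of that dictionary-based post-processing step; the function name and data shapes are assumptions, not the paper's code:

```python
# For each unknown token in the decoder output, look up the aligned source
# word in the alignment-derived dictionary; fall back to copying the source
# word verbatim (the "identity translation").
def postprocess(output_tokens, aligned_source, dictionary):
    """output_tokens: decoded target tokens, with '<unk>' placeholders.
    aligned_source: for each position, the aligned source word (or None).
    dictionary: source-word -> target-word map built from alignment links."""
    result = []
    for token, src_word in zip(output_tokens, aligned_source):
        if token == "<unk>" and src_word is not None:
            result.append(dictionary.get(src_word, src_word))
        else:
            result.append(token)
    return result

# Hypothetical example (English -> French):
# postprocess(["le", "<unk>", "est", "ici"],
#             [None, "portico", None, None],
#             {"portico": "portique"})
# -> ["le", "portique", "est", "ici"]
```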

The paper proposes three annotation strategies (a sketch of the PosUnk annotation follows below):
1. Copyable Model: unknown words in the source and target are given matching indexed tokens (unk1, unk2, ...), so a target unknown points back to the source word it copies.
2. Positional All Model (PosAll): a positional token after every target word encodes its alignment offset to the source.
3. Positional Unknown Model (PosUnk): only unknown target words are annotated, with a token encoding their relative alignment position.
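An illustrative sketch of PosUnk-style target annotation; the token names, the alignment representation, and the sign convention of the offset d are assumptions, not the paper's exact scheme:

```python
# Replace each out-of-vocabulary target word with a positional unknown
# token unkpos_d, where d is the (capped) offset between the target word
# and its aligned source word; unaligned unknowns get unkpos_null.
def annotate_posunk(tgt_tokens, tgt_vocab, align, max_d=7):
    """tgt_tokens: target-side tokens of one training sentence.
    tgt_vocab: set of in-vocabulary target words.
    align: dict mapping target index -> aligned source index."""
    out = []
    for j, word in enumerate(tgt_tokens):
        if word in tgt_vocab:
            out.append(word)                  # known word: keep as-is
        elif j not in align:
            out.append("unkpos_null")         # unknown and unaligned
        else:
            d = align[j] - j                  # relative position offset
            if abs(d) > max_d:
                out.append("unkpos_null")     # too far: treat as unaligned
            else:
                out.append(f"unkpos_{d}")     # unknown, aligned at offset d
    return out
```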

Training Data Training data consisted of 12M parallel sentences (348M French words and 304M English words). Because the naive softmax is computationally intensive, they limited the French vocabulary to either the 40K or the 80K most frequent French words. On the source side they could afford a much larger vocabulary, so they used the 200K most frequent English words. The model treats all other words as unknowns.
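A minimal sketch of such a frequency-based vocabulary cap (the helper names are assumptions):

```python
# Keep the K most frequent words and map everything else to <unk>.
from collections import Counter

def build_vocab(tokens, k=40_000):
    """tokens: flat list of all words on one side of the corpus."""
    return {w for w, _ in Counter(tokens).most_common(k)}

def apply_vocab(sentence, vocab):
    return [w if w in vocab else "<unk>" for w in sentence]
```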

Main result

Comparison of different alignment models

Effect of depth

Sample Translation

Conclusions A simple alignment-based technique can mitigate and even overcome the main weakness of current NMT systems: their inability to translate words that are not in their vocabulary. A key advantage is that the technique is applicable to any NMT system, not only the deep LSTM model. It yielded consistent and substantial improvements of up to 2.8 BLEU points over various NMT systems. With 37.5 BLEU points, they established the first NMT system to outperform the best MT system on a WMT'14 contest dataset.

Thank You!