Addressing the Rare Word Problem in Neural Machine Translation

Addressing the Rare Word Problem in Neural Machine Translation
Minh-Thang Luong (Stanford University), Ilya Sutskever (Google), Quoc V. Le (Google), Oriol Vinyals (Google), Wojciech Zaremba (New York University)

Abstract Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results comparable to traditional approaches. A significant weakness of conventional NMT systems is their inability to correctly translate very rare words.

Neural Machine Translation A neural machine translation system is any neural network that maps a source sentence s1, . . . , sn to a target sentence t1, . . . , tm. More concretely, an NMT system uses a neural network to parameterize the conditional distributions p(tj | t1, . . . , tj−1, s1, . . . , sn) for 1 ≤ j ≤ m, written out in full below.
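Multiplying these conditionals together gives the full translation probability. This is the standard left-to-right chain-rule factorization (notation follows the slide):

```latex
p(t_1, \dots, t_m \mid s_1, \dots, s_n)
  = \prod_{j=1}^{m} p(t_j \mid t_1, \dots, t_{j-1}, s_1, \dots, s_n)
```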

Neural Machine Translation They use a deep LSTM to encode the input sequence and a separate deep LSTM to output the translation. The encoder reads the source sentence, one word at a time, and produces a large vector that represents the entire source sentence. The decoder is initialized with this vector and generates a translation, one word at a time, until it emits the end-of-sentence symbol <eos>.

Rare Word Models They treat the NMT system as a black box and train it on a corpus annotated by one of the models described below. First, alignments are produced with an unsupervised aligner. Next, they use the alignment links to construct a word dictionary that is used for word translations in the post-processing step. If a word does not appear in the dictionary, they apply the identity translation (the source word is copied unchanged).
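A hedged sketch of that dictionary-based post-processing step; the function name and data shapes are assumptions, not the paper's code:

```python
# For each unknown token in the decoder output, look up the aligned source
# word in the alignment-derived dictionary; fall back to copying the source
# word verbatim (the "identity translation").
def postprocess(output_tokens, aligned_source, dictionary):
    """output_tokens: decoded target tokens, with '<unk>' placeholders.
    aligned_source: for each position, the aligned source word (or None).
    dictionary: source-word -> target-word map built from alignment links."""
    result = []
    for token, src_word in zip(output_tokens, aligned_source):
        if token == "<unk>" and src_word is not None:
            result.append(dictionary.get(src_word, src_word))
        else:
            result.append(token)
    return result

# Hypothetical example (English -> French):
# postprocess(["le", "<unk>", "est", "ici"],
#             [None, "portico", None, None],
#             {"portico": "portique"})
# -> ["le", "portique", "est", "ici"]
```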

The paper proposes three annotation strategies (a sketch of the PosUnk annotation follows below):
1. Copyable Model: unknown words in the source and target are given matching indexed tokens (unk1, unk2, ...), so a target unknown points back to the source word it copies.
2. Positional All Model (PosAll): a positional token after every target word encodes its alignment offset to the source.
3. Positional Unknown Model (PosUnk): only unknown target words are annotated, with a token encoding their relative alignment position.
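An illustrative sketch of PosUnk-style target annotation; the token names, the alignment representation, and the sign convention of the offset d are assumptions, not the paper's exact scheme:

```python
# Replace each out-of-vocabulary target word with a positional unknown
# token unkpos_d, where d is the (capped) offset between the target word
# and its aligned source word; unaligned unknowns get unkpos_null.
def annotate_posunk(tgt_tokens, tgt_vocab, align, max_d=7):
    """tgt_tokens: target-side tokens of one training sentence.
    tgt_vocab: set of in-vocabulary target words.
    align: dict mapping target index -> aligned source index."""
    out = []
    for j, word in enumerate(tgt_tokens):
        if word in tgt_vocab:
            out.append(word)                  # known word: keep as-is
        elif j not in align:
            out.append("unkpos_null")         # unknown and unaligned
        else:
            d = align[j] - j                  # relative position offset
            if abs(d) > max_d:
                out.append("unkpos_null")     # too far: treat as unaligned
            else:
                out.append(f"unkpos_{d}")     # unknown, aligned at offset d
    return out
```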

Training Data Training data consisted of 12M parallel sentences (348M French words and 304M English words). Because the naive softmax is computationally intensive, they limited the French vocabulary to either the 40K or the 80K most frequent French words. On the source side they could afford a much larger vocabulary, so they used the 200K most frequent English words. The model treats all other words as unknowns.
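A minimal sketch of such a frequency-based vocabulary cap (the helper names are assumptions):

```python
# Keep the K most frequent words and map everything else to <unk>.
from collections import Counter

def build_vocab(tokens, k=40_000):
    """tokens: flat list of all words on one side of the corpus."""
    return {w for w, _ in Counter(tokens).most_common(k)}

def apply_vocab(sentence, vocab):
    return [w if w in vocab else "<unk>" for w in sentence]
```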

Main result

Comparison of different alignment models

Effect of depth

Sample Translation

Conclusions A simple alignment-based technique can mitigate and even overcome the main weakness of current NMT systems: their inability to translate words that are not in their vocabulary. A key advantage is that the technique is applicable to any NMT system, not only the deep LSTM model. It yielded consistent and substantial improvements of up to 2.8 BLEU points over various NMT systems. With 37.5 BLEU points, they established the first NMT system to outperform the best MT system on a WMT'14 contest dataset.

Thank You!