Memory-augmented Chinese-Uyghur Neural Machine Translation

Presentation transcript:

Memory-augmented Chinese-Uyghur Neural Machine Translation
Shiyue Zhang, CSLT, Tsinghua University; Xinjiang University
Joint work with Gulnigar Mahmut, Dong Wang, Askar Hamdulla

Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work Reference

Introduction
Uyghur is a minority language in China, mainly used in Xinjiang.
Task: Chinese-Uyghur / Uyghur-Chinese translation.
Challenges (common for minority languages):
- Low resource
- Large vocabulary
- Syntactic order: Uyghur is subject-object-verb (SOV); Chinese is subject-verb-object (SVO)
- Agglutinative nature of Uyghur: about 30,000 root words, 8 prefixes, and more than 100 suffixes (common for many languages)

Introduction
Previous works: Statistical Machine Translation (SMT)
- Suited to low-resource settings and small datasets
- Phrase-based machine translation: phrase mappings + a language model
- Not perfect!
Our choice: Neural Machine Translation (NMT)
- Attention-based NMT, a meaning-oriented method
- Not perfect either!

Introduction
Out-of-vocabulary (OOV) words
Let's say the training set contains 130,000 Chinese words.
- SMT: vocabulary = 130,000
- NMT: vocabulary is limited to, say, 30,000; every other word becomes "UNK"
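As an illustration of the vocabulary cut described above, here is a minimal sketch of truncating an NMT vocabulary to the most frequent words and mapping everything else to "UNK"; the 30,000 cutoff follows the slide, and the function names are hypothetical rather than the authors' code.

```python
from collections import Counter

def build_vocab(corpus_tokens, max_size=30000):
    """Keep the max_size most frequent words; everything else will map to UNK."""
    counts = Counter(corpus_tokens)
    return {w for w, _ in counts.most_common(max_size)}

def map_to_vocab(tokens, vocab, unk="UNK"):
    """Replace every out-of-vocabulary token with the UNK symbol."""
    return [w if w in vocab else unk for w in tokens]

# toy usage
train_tokens = "我 爱 北京 我 喜欢 上海".split()
vocab = build_vocab(train_tokens, max_size=4)
print(map_to_vocab("我 爱 上海 啊".split(), vocab))
```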

Introduction
Rare words: NMT gives a reasonable translation, but the meaning drifts away; the model overfits to frequent observations while overlooking rare ones.
An experiment: after decoding the training set, the 30,000-word English vocabulary shrinks to 26,911, i.e. about 3,000 rare words are smoothed out.
Chinese-Uyghur / Uyghur-Chinese translation aggravates the OOV and rare-word problems.
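A minimal sketch of the kind of check behind that number: count how many vocabulary words ever appear in the decoded output; the file name and whitespace tokenization are illustrative assumptions.

```python
def vocab_coverage(decoded_sentences, vocab):
    """Return the vocabulary words that appear at least once in the decoded output."""
    used = set()
    for sent in decoded_sentences:
        used.update(w for w in sent.split() if w in vocab)
    return used

# hypothetical usage: decoded.txt holds the decoded training set, one sentence per line
# with open("decoded.txt", encoding="utf-8") as f:
#     used = vocab_coverage(f, vocab)
#     print(len(vocab), "->", len(used))  # e.g. 30000 -> 26911 as reported above
```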

Introduction
Our aim: to address the rare-word and OOV problems.
Our method: augment NMT with a memory component that memorizes source-target word mappings. It is like equipping a translator with a dictionary.

Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work

Attention-based NMT
Encoder-decoder architecture.
Encoder: a bi-directional RNN producing annotations $h_1, h_2, \dots$
Attention mechanism:
  $e_{ij} = a(s_{i-1}, h_j)$
  $\alpha_{ij} = \exp(e_{ij}) / \sum_k \exp(e_{ik})$  (attention weights)
  $c_i = \sum_j \alpha_{ij} h_j$  (context vector)
Decoder: an RNN with states $s_1, s_2, \dots$ emitting outputs $y_1, y_2, \dots$
  $s_i = f_d(y_{i-1}, s_{i-1}, c_i)$
  $z_i = g(y_{i-1}, s_{i-1}, c_i)$
  $p(y_i) = \sigma(y_i^T W z_i)$
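A minimal numpy sketch of one attention step matching the formulas above; the additive scoring function a(s, h) = v^T tanh(W s + U h) and the toy dimensions are assumptions for illustration, not the exact configuration used in this work.

```python
import numpy as np

def attention_step(s_prev, H, Wa, Ua, va):
    """One attention step: scores e_ij = v^T tanh(Wa s_{i-1} + Ua h_j),
    softmax weights alpha_ij, and the context vector c_i = sum_j alpha_ij h_j."""
    e = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h) for h in H])
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()
    c = (alpha[:, None] * H).sum(axis=0)
    return alpha, c

# toy usage: 4 encoder annotations of size 8, previous decoder state of size 8
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))
s_prev = rng.normal(size=8)
Wa, Ua, va = rng.normal(size=(8, 8)), rng.normal(size=(8, 8)), rng.normal(size=8)
alpha, c = attention_step(s_prev, H, Wa, Ua, va)
print(alpha.round(2), c.shape)
```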

Attention-based NMT
Example: attention weights $\alpha = [0.05, 0.1, 0.05, 0.8]$
Context vector: $c = 0.05\,h_1 + 0.1\,h_2 + 0.05\,h_3 + 0.8\,h_4$

Attention-based NMT
Example: attention weights $\alpha = [0.8, 0.05, 0.1, 0.05]$
Context vector: $c = 0.8\,h_1 + 0.05\,h_2 + 0.1\,h_3 + 0.05\,h_4$

Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work

Memory-augmented NMT
Memory construction: 3 steps.
Global memory: source-target word mappings, obtained from SMT or a human-defined dictionary.
Local memory: select elements from the global memory for each input sentence, in the order of $p(y_j \mid x_i)$, and replace each source word $x_i$ by its encoder annotation $h_i$.
Merged memory: merge repeated target words in the local memory:
  $u_k = (y_j, \tilde h_j)$, where $\tilde h_j = \sum_i p(x_i \mid y_j)\, h_i$
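A minimal sketch of the local and merged memory construction under stated assumptions: the global memory is represented as a dict from each source word to its target candidates with lexical probabilities (e.g. taken from SMT word alignments), and repeated target words are merged with weights proportional to those probabilities, as in the formula above. All names are illustrative.

```python
import numpy as np

def build_local_memory(src_tokens, H, global_memory, top_k=3):
    """Local memory: for each source word, take its top-k target candidates
    from the global memory, ordered by p(y|x), paired with the word's
    encoder annotation h_i."""
    local = []
    for i, x in enumerate(src_tokens):
        cands = sorted(global_memory.get(x, {}).items(), key=lambda kv: -kv[1])[:top_k]
        for y, p in cands:
            local.append((y, H[i], p))
    return local

def merge_memory(local):
    """Merged memory: collapse repeated target words; combine their annotations
    with weights proportional to the alignment probabilities."""
    grouped = {}
    for y, h, p in local:
        hs, ps = grouped.setdefault(y, ([], []))
        hs.append(h)
        ps.append(p)
    merged = []
    for y, (hs, ps) in grouped.items():
        w = np.array(ps) / sum(ps)
        merged.append((y, (w[:, None] * np.array(hs)).sum(axis=0)))
    return merged  # list of (target word y_j, merged annotation u_k)
```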

Memory-augmented NMT
(Slide figure: the global memory maps source words such as 我, 你, 爱, 北京, 上海, 啊 to target candidates i / me / my, you / your, love / like, Beijing, Shanghai; the local memory keeps only the entries relevant to the input sentence, attached to the encoder annotations h1..h4; the merged memory collapses repeated target words, e.g. one entry carries the combined annotation a*h1 + (1-a)*h4.)

Memory-augmented NMT
Memory attention: similar to the original attention mechanism; it decides which memory slots to attend to.
  $e^m_{ik} = a(s_{i-1}, y_{i-1}, u_k)$
  $\alpha^m_{ik} = \exp(e^m_{ik}) / \sum_{k'=1}^{K} \exp(e^m_{ik'})$
We directly take $\alpha^m_{ik}$ as a posterior over target words and combine it with the posterior produced by the neural model, with $\beta = 1/3$:
  $p'(y_i) = \beta\, \alpha^m_{ik} + (1-\beta)\, p(y_i)$
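A minimal sketch of the interpolation step, assuming the memory attention weights have already been mapped onto vocabulary indices; beta = 1/3 follows the slide, the rest is illustrative.

```python
import numpy as np

def combine_posteriors(p_nmt, alpha_mem, mem_words, vocab_index, beta=1.0 / 3.0):
    """Interpolate the NMT softmax over the vocabulary with the memory
    attention weights: p'(y) = beta * alpha_m(y) + (1 - beta) * p_nmt(y)."""
    p_mem = np.zeros_like(p_nmt)
    for w, a in zip(mem_words, alpha_mem):
        p_mem[vocab_index[w]] += a
    return beta * p_mem + (1.0 - beta) * p_nmt

# toy usage with a 5-word vocabulary and two memory slots
vocab_index = {"i": 0, "love": 1, "Beijing": 2, "Shanghai": 3, "UNK": 4}
p_nmt = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
p_final = combine_posteriors(p_nmt, [0.7, 0.3], ["Beijing", "i"], vocab_index)
print(p_final, p_final.sum())  # still sums to 1
```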

Memory-augmented NMT
(Slide figure: a worked example showing, for one decoding step, the memory attention weights over the merged-memory slots, the NMT output probabilities over candidate target words, and the combined posterior.)

Memory-augmented NMT
OOV treatment: represent an OOV word by a similar word that is in the vocabulary.
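The slide does not specify how "similar" is measured; a minimal sketch assuming cosine similarity between pretrained word embeddings, which is one common choice:

```python
import numpy as np

def replace_oov(tokens, vocab, embeddings, unk="UNK"):
    """Replace each out-of-vocabulary token with its nearest in-vocabulary
    neighbour by cosine similarity; tokens without an embedding become UNK."""
    in_vocab = [w for w in vocab if w in embeddings]
    V = np.array([embeddings[w] for w in in_vocab])
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    out = []
    for w in tokens:
        if w in vocab:
            out.append(w)
        elif w in embeddings:
            v = embeddings[w] / np.linalg.norm(embeddings[w])
            out.append(in_vocab[int(np.argmax(V @ v))])
        else:
            out.append(unk)
    return out
```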

Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work

Experiments
Data: 180k sentence pairs, with ~170,000 distinct Uyghur words and ~130,000 distinct Chinese words.
The biggest Chinese-Uyghur parallel dataset so far.

Experiments
Systems: SMT (Moses), NMT, M-NMT
Evaluation metric: BLEU, the geometric mean of 1-4 gram precisions multiplied by a brevity penalty
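For reference, a minimal single-sentence BLEU sketch (geometric mean of modified 1-4 gram precisions times the brevity penalty, no smoothing); the reported results were presumably computed with a standard corpus-level tool, so this is illustrative only.

```python
import math
from collections import Counter

def bleu(hypothesis, reference, max_n=4):
    """Single-sentence BLEU: geometric mean of modified 1..max_n-gram
    precisions multiplied by the brevity penalty (smoothing omitted)."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    precisions = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / max(sum(hyp.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = 1.0 if len(hypothesis) > len(reference) \
        else math.exp(1 - len(reference) / max(len(hypothesis), 1))
    return bp * geo_mean

print(bleu("i love Beijing very much".split(),
           "i love Beijing very much indeed".split()))  # about 0.82
```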

Experiments

Experiments Consistent improvement across different amounts of data

Experiments
Recalls more rare words:

Systems   Recalled words in test
SMT       3680 / 6666
NMT       3509 / 6666
M-NMT     3560 / 6666

*6666 is the number of words in the reference.

Experiments
The OOV treatment cannot yet be applied to the whole dataset, but it performs very well for OOV named entities.

Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work

Conclusions
- The biggest Chinese-Uyghur parallel dataset and the best Chinese-Uyghur / Uyghur-Chinese translation performance so far.
- M-NMT alleviates the rare-word and under-translation problems in NMT.
- M-NMT provides a way to address the OOV problem.
- M-NMT brings stable improvements on different datasets; on small datasets in particular, the improvement is significant and consistent.

Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work

Future work
- Better OOV treatment: remove the need for similar-word replacement, and apply it to the whole dataset.
- Phrase-based memory?

Thanks! Q&A

OOV treatment (supplementary slide: example figure)