Memory-augmented Chinese-Uyghur Neural Machine Translation
Shiyue Zhang, CSLT, Tsinghua University; Xinjiang University
Joint work with Gulnigar Mahmut, Dong Wang, Askar Hamdulla
Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work
Introduction
- Uyghur is a minority language in China, mainly spoken in Xinjiang
- Task: Chinese-Uyghur / Uyghur-Chinese translation
- Challenges, common for minority languages:
  - Low resource
  - Large vocabulary
  - Different syntactic order: Uyghur is subject-object-verb (SOV), Chinese is subject-verb-object (SVO)
  - Agglutinative nature of Uyghur (shared by many languages): about 30,000 root words, 8 prefixes, and more than 100 suffixes
Introduction
- Previous works: Statistical Machine Translation (SMT)
  - Suits low-resource, small datasets
  - Phrase-based translation: phrase mappings + a language model
  - Not perfect!
- Our choice: Neural Machine Translation (NMT)
  - Attention-based NMT, a meaning-oriented method
  - Not perfect either!
Introduction: Out-of-vocabulary (OOV) words
- Suppose the training set contains 130,000 distinct Chinese words
- SMT: vocabulary = 130,000
- NMT: the vocabulary must be truncated, e.g. to 30,000; every word outside it becomes "UNK"
Introduction: Rare words
- NMT gives a reasonable translation, but the meaning drifts away
- It overfits to frequent observations while overlooking rare ones
- An experiment: after decoding the training set, the 30,000-word English vocabulary shrinks to 26,911; about 3,000 rare words are smoothed out
- Chinese-Uyghur / Uyghur-Chinese translation aggravates the OOV and rare-word problems
Introduction
- Our aim: address the rare-word and OOV-word problems
- Our method: augment NMT with a memory component that memorizes source-target word mappings
- It is like equipping a translator with a dictionary
Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work
Attention-based NMT
- Encoder-decoder architecture
- Encoder: a bi-directional RNN produces hidden states $h_1, h_2, \dots$
- Attention mechanism: alignment scores, attention weights, and context vector
  $e_{ij} = a(s_{i-1}, h_j)$, $\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})}$, $c_i = \sum_j \alpha_{ij} h_j$
- Decoder: an RNN produces states $s_1, s_2, \dots$ and outputs $y_1, y_2, \dots$
  $s_i = f_d(y_{i-1}, s_{i-1}, c_i)$, $z_i = g(y_{i-1}, s_{i-1}, c_i)$, $p(y_i) = \sigma(y_i^\top W z_i)$
Attention-based NMT
- Attention weights: $\alpha = [0.05, 0.1, 0.05, 0.8]$
- Context vector: $c = 0.05\, h_1 + 0.1\, h_2 + 0.05\, h_3 + 0.8\, h_4$
Attention-based NMT
- Attention weights: $\alpha = [0.8, 0.05, 0.1, 0.05]$
- Context vector: $c = 0.8\, h_1 + 0.05\, h_2 + 0.1\, h_3 + 0.05\, h_4$
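The two toy examples above amount to a softmax over alignment scores followed by a weighted sum of hidden states. A minimal numpy sketch of that step (the hidden states and scores below are made-up values, not outputs of the real model):

```python
import numpy as np

# Made-up encoder hidden states h1..h4 (4 source positions, dimension 3)
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

def context_vector(scores, H):
    """Softmax the alignment scores e_ij into attention weights alpha_ij,
    then return the weighted sum of hidden states (the context vector c_i)."""
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()
    return alpha, alpha @ H

# Attention focused on the 4th source word, as in the first example above
alpha, c = context_vector(np.array([0.0, 0.7, 0.0, 2.8]), H)
print(alpha.round(2), c)   # weights roughly [0.05, 0.1, 0.05, 0.8]
```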
Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work
Memory-augmented NMT
- Memory construction: 3 steps (a toy sketch follows the example figure below)
  - Global memory: source-target word mappings, obtained from SMT or a human-defined dictionary
  - Local memory: select elements from the global memory for each input sentence, in the order of $p(y_j \mid x_i)$, and replace $x_i$ by its hidden state $h_i$
  - Merged memory: merge repeated target words in the local memory,
    $u_k = \big(y_j,\ \textstyle\sum_i p(x_i \mid y_j)\, h_i\big)$
[Figure: memory construction example. The global memory maps source words (我, 你, 爱, 北京, 上海, 啊, …) to candidate target words (i, me, my, you, your, love, like, Beijing, Shanghai, …). The local memory keeps only the entries relevant to the input sentence, pairing each target word with a source hidden state (h1-h4). The merged memory merges slots that share a target word, combining their hidden states, e.g. a·h1 + (1-a)·h4.]
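To make the three steps concrete, here is a rough Python sketch that builds a local memory for one sentence from a toy global memory and then merges repeated target words by averaging their source hidden states (uniform weights stand in for the p(x_i | y_j) weighting on the slide); every word, probability, and vector below is invented for illustration and is not part of the actual system:

```python
import numpy as np

# Toy global memory: source word -> candidate target words with p(y|x).
global_memory = {
    "我":  [("i", 0.6), ("me", 0.3), ("my", 0.1)],
    "爱":  [("love", 0.7), ("like", 0.3)],
    "北京": [("Beijing", 1.0)],
}

def build_memory(src_words, hidden_states, global_memory):
    """Local memory: collect (target word, source hidden state) pairs for the
    sentence. Merged memory: merge slots sharing the same target word by
    combining their source hidden states."""
    local = []
    for x, h in zip(src_words, hidden_states):
        for y, _p in global_memory.get(x, []):
            local.append((y, h))

    merged = {}
    for y, h in local:
        merged.setdefault(y, []).append(h)
    # e.g. "i" collected from two source positions becomes 0.5*h1 + 0.5*h4
    return {y: np.mean(hs, axis=0) for y, hs in merged.items()}

src = ["我", "爱", "北京", "我"]                     # "我" occurs twice
H = [np.full(3, k) for k in (1.0, 2.0, 3.0, 4.0)]   # fake encoder states h1..h4
memory = build_memory(src, H, global_memory)
print(memory["i"])   # -> [2.5 2.5 2.5], the average of h1 and h4
```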
Memory-augmented NMT
- Memory attention: similar to the original attention mechanism; it decides which memory slots to attend to
  $e^m_{ik} = a(s_{i-1}, y_{i-1}, u_k)$, $\alpha^m_{ik} = \frac{\exp(e^m_{ik})}{\sum_{k'=1}^{K} \exp(e^m_{ik'})}$
- We directly take $\alpha^m_{ik}$ as a posterior and combine it with the posterior produced by the neural model, with $\beta = 1/3$ (a toy sketch follows the figure below):
  $p(y_i) = \beta\, \alpha^m_{ik} + (1-\beta)\, p_{\mathrm{NMT}}(y_i)$
[Figure: memory attention example. At one decoding step, the memory attention produces a distribution over the merged memory slots (i, me, my, love, like, Beijing, …), the neural model produces its own posterior over the vocabulary, and the two distributions are interpolated into the final output probabilities.]
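A minimal sketch of the interpolation at one decoding step, with a toy vocabulary and invented probabilities (β fixed to 1/3 as on the previous slide):

```python
import numpy as np

vocab = ["i", "me", "my", "love", "like", "Beijing", "UNK"]
word2id = {w: k for k, w in enumerate(vocab)}

def combine(memory_slots, alpha_m, p_nmt, beta=1.0 / 3.0):
    """p(y_i) = beta * alpha^m_ik + (1 - beta) * p_nmt(y_i).
    alpha_m is the memory attention over the merged memory slots;
    p_nmt is the neural model's posterior over the vocabulary."""
    p_mem = np.zeros(len(vocab))
    for word, a in zip(memory_slots, alpha_m):
        p_mem[word2id[word]] += a
    return beta * p_mem + (1.0 - beta) * p_nmt

memory_slots = ["i", "love", "Beijing"]
alpha_m = np.array([0.1, 0.1, 0.8])                 # memory attention (sums to 1)
p_nmt = np.array([0.02, 0.01, 0.01, 0.05, 0.05, 0.80, 0.06])
p = combine(memory_slots, alpha_m, p_nmt)
print(vocab[int(p.argmax())], p.round(3))           # "Beijing" wins in both
```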
Memory-augmented NMT: OOV treatment
- Represent an OOV word by a similar in-vocabulary word
- [Figure: an example source sentence containing OOV words]
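The slides do not spell out how the similar word is found; one plausible way, sketched below under the assumption that word embeddings covering the OOV words are available, is a nearest-neighbour lookup by cosine similarity (all words and vectors here are hypothetical):

```python
import numpy as np

def most_similar_in_vocab(oov_vec, vocab_words, vocab_vecs):
    """Return the in-vocabulary word whose embedding has the highest cosine
    similarity with the OOV word's embedding."""
    v = oov_vec / np.linalg.norm(oov_vec)
    M = vocab_vecs / np.linalg.norm(vocab_vecs, axis=1, keepdims=True)
    return vocab_words[int((M @ v).argmax())]

# Hypothetical 4-dimensional embeddings for three in-vocabulary words
vocab_words = ["city", "river", "mountain"]
vocab_vecs = np.array([[0.9, 0.1, 0.0, 0.2],
                       [0.1, 0.8, 0.3, 0.0],
                       [0.0, 0.2, 0.9, 0.1]])
oov_vec = np.array([0.85, 0.15, 0.05, 0.25])        # embedding of an OOV word
print(most_similar_in_vocab(oov_vec, vocab_words, vocab_vecs))   # -> "city"
```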
Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work
Experiments
- Data: 180k sentence pairs, ~170,000 distinct Uyghur words, ~130,000 distinct Chinese words
- The biggest Chinese-Uyghur parallel dataset so far
Experiments
- Systems: SMT (Moses), NMT, M-NMT
- Evaluation metric: BLEU, the geometric mean of 1-4 gram precisions multiplied by a brevity penalty
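For reference, the standard BLEU definition with uniform weights over the 1-4 gram precisions $p_n$, where $c$ is the candidate length and $r$ the reference length:

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big(\sum_{n=1}^{4} \tfrac{1}{4} \log p_n\Big),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```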
Experiments
Experiments: consistent improvement across different amounts of training data
Experiments: M-NMT recalls more rare words

System   Recalled words in test
SMT      3680 / 6666
NMT      3509 / 6666
M-NMT    3560 / 6666

*6666 is the number of words in the reference
Experiments Cannot apply to the whole dataset, but performs very well for OOV name entities
Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work
Conclusions
- We built the biggest Chinese-Uyghur parallel dataset so far
- We achieved the best Chinese-Uyghur / Uyghur-Chinese translation performance
- M-NMT alleviates the rare-word and under-translation problems in NMT
- M-NMT provides a way to address the OOV problem
- M-NMT brings stable improvement across datasets; on small datasets the improvement is especially significant and consistent
Outline Introduction Attention-based NMT Memory-augmented NMT Experiments Conclusions Future work
Future work
- Better OOV treatment?
  - No need for similar-word replacement
  - Apply it to the whole dataset
- Phrase-based memory?
Thanks! Q&A