An Empirical Comparison of Simple Domain Adaptation Methods for Neural Machine Translation
Chenhui Chu(1), Raj Dabre(2), Sadao Kurohashi(2)
chu@pa.jst.jp, raj@nlp.ist.i.kyoto-u.ac.jp, kuro@i.kyoto-u.ac.jp
(1) Japan Science and Technology Agency, (2) Kyoto University
NLP 2017, Tsukuba, Mar. 15, 2017

1. Attention-based Neural Machine Translation [Bahdanau+ 2015]

[Figure: attention-based encoder-decoder. The Japanese input "新たな 翻訳 手法 を 提案 する" is embedded and encoded by forward and backward LSTMs; an attention mechanism feeds a decoder LSTM whose softmax layer outputs "We propose a novel translation method <EOS>".]

2. The Low Resource Domain Problem

NMT requires large-scale data to learn its large number of parameters.

BLEU-4 scores for translation performance:

System   NTCIR-CE   ASPEC-CJ   IWSLT-CE   WIKI-CJ
SMT      29.54      36.39      14.31      36.83
NMT      37.11      42.92       7.87      18.29

Resource rich: NTCIR-CE patent (train: 1M), ASPEC-CJ scientific (train: 672k)
Resource poor: IWSLT-CE spoken (train: 209k), WIKI-CJ automatically extracted Wikipedia (train: 136k)

3. Overview of This Study

We conduct an empirical study on different domain adaptation methods for NMT.

Feature        Fine tuning [Luong+ 2015] etc.   Multi domain [Kobus+ 2016]   Proposed: mixed fine tuning
Performance    Good                             Good                         Best
Speed          Fast                             Slowest                      Slower
Overfitting    Yes                              No                           No

4. Domain Adaptation Methods for Comparison

4.1. Fine tuning [Luong+ 2015; Sennrich+ 2016; Servan+ 2016; Freitag+ 2016]

[Figure: an NMT model is first trained on the out-of-domain source-target corpus and then used to initialize continued training on the in-domain corpus.]

Fine tuning tends to overfit quickly due to the small size of the in-domain data.

4.2. Multi domain [Kobus+ 2016]

[Figure: the out-of-domain corpus (with the out-of-domain tag <2out-of-domain> appended) and the in-domain corpus (with the in-domain tag <2in-domain> appended) are merged, oversampling the smaller in-domain corpus, and a single NMT model is trained on the mixed corpus.]

Multi domain w/o tags: the same method without appending the domain tags.
[Kobus+ 2016] studies multi domain translation, but it is not aimed at domain adaptation in particular.

4.3. Proposed: mixed fine tuning

[Figure: an NMT model is first trained on the out-of-domain corpus (with the out-of-domain tag <2out-of-domain> appended); it then initializes continued training on the merged (mixed) corpus, in which the in-domain corpus (with the in-domain tag <2in-domain> appended) is oversampled and combined with the out-of-domain corpus.]

Mixed fine tuning w/o tags: the same method without appending the domain tags.
Training on the mixed corpus prevents overfitting to the small in-domain data.

5. Experimental Results

High quality in-domain corpus setting, results (BLEU-4):

System                            NTCIR-CE   IWSLT-CE
IWSLT-CE SMT                      -          14.31
IWSLT-CE NMT                      -           7.87
NTCIR-CE SMT                      29.54       4.33
NTCIR-CE NMT                      37.11       2.60
Fine tuning                       17.37      16.41
Multi domain                      36.40      16.34
Multi domain w/o tags             37.32      14.97
Multi domain + Fine tuning        14.47      15.82
Mixed fine tuning                 37.01      18.01
Mixed fine tuning w/o tags        39.67      17.43
Mixed fine tuning + Fine tuning   32.03      17.11

Low quality in-domain corpus setting, results (BLEU-4):

System                            ASPEC-CJ   WIKI-CJ
WIKI-CJ SMT                       -          36.83
WIKI-CJ NMT                       -          18.29
ASPEC-CJ SMT                      36.39      17.43
ASPEC-CJ NMT                      42.92      20.01
Fine tuning                       22.10      37.66
Multi domain                      42.52      35.79
Multi domain w/o tags             40.78      33.74
Multi domain + Fine tuning        22.78      34.61
Mixed fine tuning                 42.56      37.57
Mixed fine tuning w/o tags        41.86      37.23
Mixed fine tuning + Fine tuning   31.63      37.77

(In the original poster, red numbers indicated the best system and all systems that were not significantly different from the best system.)
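
As a supplement to Section 1, which shows the attention-based encoder-decoder only as a figure, the following NumPy sketch spells out one step of additive (Bahdanau-style) attention. The function, its parameter names (W_a, U_a, v_a), and the shapes are illustrative assumptions, not taken from the poster or the authors' code.

```python
import numpy as np

def bahdanau_attention(prev_decoder_state, encoder_states, W_a, U_a, v_a):
    """One attention step in the style of [Bahdanau+ 2015] (additive attention).

    prev_decoder_state: (d,)    previous decoder LSTM state s_{i-1}
    encoder_states:     (T, 2d) concatenated forward/backward LSTM states h_j
    W_a: (a, d), U_a: (a, 2d), v_a: (a,)  learned attention parameters
    """
    # Alignment scores e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)
    scores = np.tanh(prev_decoder_state @ W_a.T + encoder_states @ U_a.T) @ v_a
    # Softmax over source positions gives attention weights alpha_ij
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector c_i: attention-weighted sum of encoder states
    context = weights @ encoder_states
    return context, weights
```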
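Sections 4.2 and 4.3 both rely on appending domain tags and oversampling the smaller in-domain corpus before merging. Below is a minimal sketch of that corpus preparation, assuming plain-text parallel files with one sentence per line; all function and file names are illustrative rather than the authors' tooling.

```python
import random

IN_TAG = "<2in-domain>"
OUT_TAG = "<2out-of-domain>"

def read_parallel(src_path, tgt_path):
    """Read a parallel corpus into (source, target) sentence pairs."""
    with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
        return list(zip((l.rstrip("\n") for l in fs), (l.rstrip("\n") for l in ft)))

def tag_source(pairs, domain_tag):
    """Prepend a domain tag to every source sentence ([Kobus+ 2016] style)."""
    return [(f"{domain_tag} {src}", tgt) for src, tgt in pairs]

def oversample(pairs, target_size):
    """Repeat the smaller in-domain corpus until it matches the out-of-domain size."""
    repeated = pairs * (target_size // len(pairs) + 1)
    return repeated[:target_size]

def build_mixed_corpus(out_pairs, in_pairs, use_tags=True):
    """Merge corpora for 'multi domain' and the mixed stage of 'mixed fine tuning'."""
    if use_tags:  # the 'w/o tags' variants simply skip this step
        out_pairs = tag_source(out_pairs, OUT_TAG)
        in_pairs = tag_source(in_pairs, IN_TAG)
    mixed = out_pairs + oversample(in_pairs, len(out_pairs))
    random.shuffle(mixed)
    return mixed
```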
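The three adaptation methods differ mainly in which corpus each training stage sees. The sketch below outlines the schedules of Sections 4.1-4.3, reusing tag_source and build_mixed_corpus from the previous block and a placeholder train(model, corpus) callable that stands in for whatever NMT toolkit is used; it illustrates the recipes under those assumptions and is not the authors' implementation.

```python
def fine_tuning(train, out_corpus, in_corpus):
    """4.1: train on out-of-domain data, then resume training on in-domain data only.
    Fast, but tends to overfit the small in-domain corpus."""
    model = train(None, out_corpus)   # train from scratch until convergence
    return train(model, in_corpus)    # continue training from that model

def multi_domain(train, out_corpus, in_corpus, use_tags=True):
    """4.2: a single model trained from scratch on the tagged, oversampled mix."""
    mixed = build_mixed_corpus(out_corpus, in_corpus, use_tags)
    return train(None, mixed)

def mixed_fine_tuning(train, out_corpus, in_corpus, use_tags=True):
    """4.3 (proposed): pretrain on the (tagged) out-of-domain corpus, then resume
    on the mixed corpus; keeping out-of-domain data in the second stage is what
    prevents overfitting to the small in-domain corpus."""
    out_tagged = tag_source(out_corpus, OUT_TAG) if use_tags else out_corpus
    model = train(None, out_tagged)
    mixed = build_mixed_corpus(out_corpus, in_corpus, use_tags)
    return train(model, mixed)
```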