An Empirical Comparison of Simple Domain Adaptation Methods for Neural Machine Translation
Chenhui Chu(1), Raj Dabre(2), Sadao Kurohashi(2)
chu@pa.jst.jp, raj@nlp.ist.i.kyoto-u.ac.jp, kuro@i.kyoto-u.ac.jp
(1) Japan Science and Technology Agency, (2) Kyoto University
NLP 2017, Tsukuba, Mar. 15, 2017

1. Attention-based Neural Machine Translation [Bahdanau+ 2015]

[Figure: attention-based encoder-decoder. The Japanese input "新たな 翻訳 手法 を 提案 する" is embedded and encoded by forward and backward LSTMs; an attention mechanism feeds a decoder LSTM whose softmax layer outputs "We propose a novel translation method <EOS>".]

2. The Low Resource Domain Problem

NMT requires large-scale data to learn its large number of parameters.

BLEU-4 scores for translation performance:

System   NTCIR-CE   ASPEC-CJ   IWSLT-CE   WIKI-CJ
SMT      29.54      36.39      14.31      36.83
NMT      37.11      42.92       7.87      18.29

Resource rich: NTCIR-CE patent (train: 1M), ASPEC-CJ scientific (train: 672k)
Resource poor: IWSLT-CE spoken (train: 209k), WIKI-CJ automatically extracted Wikipedia (train: 136k)

3. Overview of This Study

We conduct an empirical study on different domain adaptation methods for NMT.

Feature        Fine tuning [Luong+ 2015] etc.   Multi domain [Kobus+ 2016]   Proposed: mixed fine tuning
Performance    Good                             Good                         Best
Speed          Fast                             Slowest                      Slower
Overfitting    Yes                              No                           No

4. Domain Adaptation Methods for Comparison

4.1. Fine tuning [Luong+ 2015; Sennrich+ 2016; Servan+ 2016; Freitag+ 2016]

[Figure: an NMT model is first trained on the out-of-domain source-target corpus and then used to initialize continued training on the in-domain corpus.]

Fine tuning tends to overfit quickly due to the small size of the in-domain data.

4.2. Multi domain [Kobus+ 2016]

[Figure: the out-of-domain corpus (with the out-of-domain tag <2out-of-domain> appended) and the in-domain corpus (with the in-domain tag <2in-domain> appended) are merged, oversampling the smaller in-domain corpus, and a single NMT model is trained on the mixed corpus.]

Multi domain w/o tags: the same method without appending the domain tags.
[Kobus+ 2016] studies multi domain translation, but it is not aimed at domain adaptation in particular.

4.3. Proposed: mixed fine tuning

[Figure: an NMT model is first trained on the out-of-domain corpus (with the out-of-domain tag <2out-of-domain> appended); it then initializes continued training on the merged (mixed) corpus, in which the in-domain corpus (with the in-domain tag <2in-domain> appended) is oversampled and combined with the out-of-domain corpus.]

Mixed fine tuning w/o tags: the same method without appending the domain tags.
Training on the mixed corpus prevents overfitting to the small in-domain data.

5. Experimental Results

High quality in-domain corpus setting, results (BLEU-4):

System                            NTCIR-CE   IWSLT-CE
IWSLT-CE SMT                      -          14.31
IWSLT-CE NMT                      -           7.87
NTCIR-CE SMT                      29.54       4.33
NTCIR-CE NMT                      37.11       2.60
Fine tuning                       17.37      16.41
Multi domain                      36.40      16.34
Multi domain w/o tags             37.32      14.97
Multi domain + Fine tuning        14.47      15.82
Mixed fine tuning                 37.01      18.01
Mixed fine tuning w/o tags        39.67      17.43
Mixed fine tuning + Fine tuning   32.03      17.11

Low quality in-domain corpus setting, results (BLEU-4):

System                            ASPEC-CJ   WIKI-CJ
WIKI-CJ SMT                       -          36.83
WIKI-CJ NMT                       -          18.29
ASPEC-CJ SMT                      36.39      17.43
ASPEC-CJ NMT                      42.92      20.01
Fine tuning                       22.10      37.66
Multi domain                      42.52      35.79
Multi domain w/o tags             40.78      33.74
Multi domain + Fine tuning        22.78      34.61
Mixed fine tuning                 42.56      37.57
Mixed fine tuning w/o tags        41.86      37.23
Mixed fine tuning + Fine tuning   31.63      37.77

(In the original poster, red numbers indicated the best system and all systems that were not significantly different from the best system.)
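
As a supplement to Section 1, which shows the attention-based encoder-decoder only as a figure, the following NumPy sketch spells out one step of additive (Bahdanau-style) attention. The function, its parameter names (W_a, U_a, v_a), and the shapes are illustrative assumptions, not taken from the poster or the authors' code.

```python
import numpy as np

def bahdanau_attention(prev_decoder_state, encoder_states, W_a, U_a, v_a):
    """One attention step in the style of [Bahdanau+ 2015] (additive attention).

    prev_decoder_state: (d,)    previous decoder LSTM state s_{i-1}
    encoder_states:     (T, 2d) concatenated forward/backward LSTM states h_j
    W_a: (a, d), U_a: (a, 2d), v_a: (a,)  learned attention parameters
    """
    # Alignment scores e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)
    scores = np.tanh(prev_decoder_state @ W_a.T + encoder_states @ U_a.T) @ v_a
    # Softmax over source positions gives attention weights alpha_ij
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector c_i: attention-weighted sum of encoder states
    context = weights @ encoder_states
    return context, weights
```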
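Sections 4.2 and 4.3 both rely on appending domain tags and oversampling the smaller in-domain corpus before merging. Below is a minimal sketch of that corpus preparation, assuming plain-text parallel files with one sentence per line; all function and file names are illustrative rather than the authors' tooling.

```python
import random

IN_TAG = "<2in-domain>"
OUT_TAG = "<2out-of-domain>"

def read_parallel(src_path, tgt_path):
    """Read a parallel corpus into (source, target) sentence pairs."""
    with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
        return list(zip((l.rstrip("\n") for l in fs), (l.rstrip("\n") for l in ft)))

def tag_source(pairs, domain_tag):
    """Prepend a domain tag to every source sentence ([Kobus+ 2016] style)."""
    return [(f"{domain_tag} {src}", tgt) for src, tgt in pairs]

def oversample(pairs, target_size):
    """Repeat the smaller in-domain corpus until it matches the out-of-domain size."""
    repeated = pairs * (target_size // len(pairs) + 1)
    return repeated[:target_size]

def build_mixed_corpus(out_pairs, in_pairs, use_tags=True):
    """Merge corpora for 'multi domain' and the mixed stage of 'mixed fine tuning'."""
    if use_tags:  # the 'w/o tags' variants simply skip this step
        out_pairs = tag_source(out_pairs, OUT_TAG)
        in_pairs = tag_source(in_pairs, IN_TAG)
    mixed = out_pairs + oversample(in_pairs, len(out_pairs))
    random.shuffle(mixed)
    return mixed
```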
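The three adaptation methods differ mainly in which corpus each training stage sees. The sketch below outlines the schedules of Sections 4.1-4.3, reusing tag_source and build_mixed_corpus from the previous block and a placeholder train(model, corpus) callable that stands in for whatever NMT toolkit is used; it illustrates the recipes under those assumptions and is not the authors' implementation.

```python
def fine_tuning(train, out_corpus, in_corpus):
    """4.1: train on out-of-domain data, then resume training on in-domain data only.
    Fast, but tends to overfit the small in-domain corpus."""
    model = train(None, out_corpus)   # train from scratch until convergence
    return train(model, in_corpus)    # continue training from that model

def multi_domain(train, out_corpus, in_corpus, use_tags=True):
    """4.2: a single model trained from scratch on the tagged, oversampled mix."""
    mixed = build_mixed_corpus(out_corpus, in_corpus, use_tags)
    return train(None, mixed)

def mixed_fine_tuning(train, out_corpus, in_corpus, use_tags=True):
    """4.3 (proposed): pretrain on the (tagged) out-of-domain corpus, then resume
    on the mixed corpus; keeping out-of-domain data in the second stage is what
    prevents overfitting to the small in-domain corpus."""
    out_tagged = tag_source(out_corpus, OUT_TAG) if use_tags else out_corpus
    model = train(None, out_tagged)
    mixed = build_mixed_corpus(out_corpus, in_corpus, use_tags)
    return train(model, mixed)
```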