Kyoto University Participation to WAT 2016

Fabien Cromieres, Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi

Outline: KyotoEBMT, KyotoNMT, Results

KyotoEBMT
Example-Based Machine Translation.
Tree-to-tree: uses dependency trees on both the source and the target side.

KyotoNMT
Essentially an implementation of the attention-based model of (Bahdanau et al., 2015).
Implemented in Python with the Chainer library.
Training was done on an NVIDIA Titan X (Maxwell): from 2 days for a single-layer model on ASPEC Ja-Zh to 2 weeks for a multi-layer model on ASPEC Ja-En.
[Architecture diagram: an LSTM encoder reads the source word embeddings (example input: 学生 です) and produces the encoding of the input; an attention model turns it into the current context; the decoder combines the current context, the previous state, and the previously generated word through a concatenation, then a maxout layer and a softmax produce the new word (example output: "I am a student"). Layer sizes shown in the diagram: embeddings 620, LSTM states 1000, maxout 500, softmax over about 30,000 target words, intermediate concatenations of size 2620 and 3620.]
(A simplified sketch of one attention/decoding step is given at the end of this transcript.)

Translation example
Input: ウイスキーはオオムギから製造される
Output: whisky is produced from barley
[Figure: attention weights aligning the output words to the source words, for this example and for related sentences, e.g. one about hydrogen being produced at present from natural gas and petroleum, and one beginning "We investigated ...".]

Network settings
We mostly used the network sizes of the original paper, as shown in the diagram above.
Depending on the experiment, we changed (see the result tables for details):
- multi-layer LSTM
- larger source and target vocabulary sizes

Results

Ja -> En
System   BLEU    AM-FM   Pairwise      JPO Adequacy
EBMT     21.22   59.52   -             -
NMT 1    24.71   56.27   47.0 (3/9)    3.89 (1/3)
NMT 2    26.22   55.85   44.25 (4/9)   -

System   # layers   Source vocabulary   Target vocabulary   Ensembling
NMT 1    2          200k (JUMAN)        52k (BPE)           -
NMT 2    1          30k (JUMAN)         30k (words)         x4

En -> Ja
System   BLEU    AM-FM   Pairwise      JPO Adequacy
EBMT     31.03   74.75   -             -
NMT 1    36.19   73.87   55.25 (1/10)  4.02 (1/4)

System   # layers   Source vocabulary   Target vocabulary   Ensembling
NMT 1    2          52k (BPE)           52k (BPE)           -

Ja -> Zh
System   BLEU    AM-FM   Pairwise      JPO Adequacy
EBMT     30.27   76.42   30.75 (3/5)   -
NMT 1    31.98   76.33   58.75 (1/5)   3.88 (1/3)

System   # layers   Source vocabulary   Target vocabulary   Ensembling
NMT 1    2          30k (JUMAN)         30k (KyotoMorph)    -

Zh -> Ja
System   BLEU    AM-FM   Pairwise      JPO Adequacy
EBMT     36.63   76.71   -             -
NMT 1    46.04   78.59   63.75 (1/9)   3.94 (1/3)
NMT 2    44.29   78.44   56.00 (2/9)   -

System   # layers   Source vocabulary   Target vocabulary   Ensembling
NMT 1    2          30k (KyotoMorph)    30k (JUMAN)         x2
NMT 2    -          200k (KyotoMorph)   50k (JUMAN)         -

EBMT vs NMT
EBMT: less fluent.
NMT: more under-/over-translation issues.

Example (Ja -> En):
Src:  本フローセンサーの型式と基本構成,規格を図示, 紹介。
Ref:  Shown here are type and basic configuration and standards of this flow with some diagrams.
EBMT: This flow sensor type and the basic composition, standard is illustrated, and introduced.
NMT:  This paper introduces the type, basic configuration, and standards of this flow sensor.

The important details
During our experiments, we found that using the following settings appropriately had a significant impact on the final results (simplified sketches of some of them are given at the end of this transcript):
- Regularization: weight decay, dropout, early stopping, random noise on the previous word embedding
- Training algorithm: ADAM
- Beam search: normalizing the loss by length
- Ensembling: ensembling of several models, or self-ensembling
- Segmentation: automatic segmentation with JUMAN or KyotoMorph, or subword units with BPE

Random noise on the previous word embedding
In the hope of reducing cascading errors at translation time, we add noise to the target word embedding at training time. It works well, but this may just be a regularization effect.

Conclusion and Future Work
Very good results with Neural Machine Translation, especially for Zh -> Ja.
Long training times mean that we could not test every combination of settings for each language pair.
Some possible future improvements:
- adding more linguistic aspects
- adding newly proposed mechanisms (copy mechanism, etc.)

Code available (GPL)
KyotoEBMT:
KyotoNMT:
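Sketch 1: one attention/decoding step. This is a minimal, framework-agnostic illustration (plain NumPy rather than Chainer, with hypothetical parameter names W_score, W_out, b_out) of the kind of step described in the architecture slide: attention weights are a softmax over scores between the decoder state and the encoder states, the current context is their weighted sum, and the next-word distribution is a softmax over the target vocabulary. The LSTM updates, the maxout layer and all training machinery are omitted.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(encoder_states, decoder_state, prev_word_emb,
                   W_score, W_out, b_out):
    """One simplified decoding step of an attention model (illustrative only).

    encoder_states: (src_len, enc_dim) encodings of the source words
    decoder_state:  (dec_dim,) current decoder hidden state
    prev_word_emb:  (emb_dim,) embedding of the previously generated word
    W_score, W_out, b_out: hypothetical parameter matrices / bias
    """
    # Attention: score each source position against the decoder state.
    scores = encoder_states @ (W_score @ decoder_state)      # (src_len,)
    attn_weights = softmax(scores)
    # Current context: weighted sum of the encoder states.
    context = attn_weights @ encoder_states                   # (enc_dim,)
    # Concatenate context, decoder state and previous word, then predict.
    features = np.concatenate([context, decoder_state, prev_word_emb])
    logits = W_out @ features + b_out                         # (target_vocab,)
    return softmax(logits), attn_weights
```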
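Sketch 2: random noise on the previous word embedding. The transcript only says that noise is added to the target word embedding at training time; the Gaussian form and the noise_stddev value below are illustrative assumptions.

```python
import numpy as np

def noisy_prev_word_embedding(prev_word_emb, training, noise_stddev=0.1):
    """Perturb the previous target-word embedding during training only.

    Idea: the decoder should not rely too heavily on its own previous
    output, so that errors cascade less at translation time.
    The Gaussian noise and noise_stddev are assumptions for illustration.
    """
    if not training:
        return prev_word_emb
    noise = np.random.normal(0.0, noise_stddev, size=prev_word_emb.shape)
    return prev_word_emb + noise
```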
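Sketch 3: length normalization in beam search. Ranking hypotheses by the raw sum of log-probabilities favours short outputs; normalizing by length compares hypotheses by their per-word log-probability instead. The function below is a hypothetical scoring helper, not the system's actual decoder code.

```python
import math

def normalized_score(log_probs):
    """Score a hypothesis by its average log-probability per generated word.

    log_probs: list of log P(word_t | ...) for each generated word.
    Without the division, beam search systematically prefers shorter outputs.
    """
    return sum(log_probs) / max(len(log_probs), 1)

# Example: without normalization the 5-word hypothesis would always lose to
# the 2-word one, even though their per-word probabilities are identical.
short = [math.log(0.5)] * 2
long_ = [math.log(0.5)] * 5
print(normalized_score(short), normalized_score(long_))  # roughly equal
```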
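Sketch 4: ensembling at decoding time. This assumes each model exposes a hypothetical next_word_distribution() method returning a probability vector over the target vocabulary; the ensemble prediction is the average of the individual predictions. Self-ensembling would simply use several snapshots of the same training run as the "models".

```python
import numpy as np

def ensemble_next_word_probs(models, decoder_states):
    """Average the next-word distributions of several models (illustrative).

    models / decoder_states: one entry per ensembled model;
    next_word_distribution() is a hypothetical per-model method returning a
    probability vector over the target vocabulary.
    """
    probs = [m.next_word_distribution(s)
             for m, s in zip(models, decoder_states)]
    # Arithmetic average shown here; averaging log-probabilities
    # (a geometric mean) is a common alternative.
    return np.mean(probs, axis=0)
```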

