Kyoto University Participation to WAT 2016

Fabien Cromieres, Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi
fabien@pa.jst.jp  chu@pa.jst.jp  nakazawa@pa.jst.jp  kuro@i.Kyoto-u.ac.jp

KyotoEBMT
- Example-Based Machine Translation.
- Tree-to-tree: uses dependency trees for both the source and the target side.
- [Figure: translation example. The input ウイスキーはオオムギから製造される is translated as "whisky is produced from barley" by combining fragments of existing translation examples, such as 水素 は 現在 天然ガス や 石油 から 製造 さ れる ("hydrogen ... natural gas and petroleum ... at present"), ウイスキー を 調査 した ("We investigated ..."), and オオムギ ("barley").]

KyotoNMT
- Essentially an implementation of (Bahdanau et al., 2015).
- Implemented in Python with the Chainer library.
- [Figure: attention-based LSTM encoder-decoder. The source sentence (e.g. 私 は 学生 です) passes through the source embedding into the encoder; the decoder combines the current context from the attention model, the previous state and the previously generated word, then applies a maxout layer and a softmax to produce the new word (e.g. "I am a student"). Layer sizes annotated in the figure: source embedding 620, encoder state 1000, maxout 500, softmax 30000, plus 3620 and 2620 on the decoder side.]
- We mostly used the network sizes of the original paper, as shown in the figure above. Depending on the experiment, we changed (see the settings tables in the Results section for details): multi-layer LSTM; larger source and target vocabulary sizes.
- Training was done on an NVIDIA Titan X (Maxwell), taking from 2 days for a single-layer model on ASPEC Ja-Zh to 2 weeks for a multi-layer model on ASPEC Ja-En.

The important details
During our experiments, we found that using these settings appropriately had a significant impact on the final results:
- Regularization: weight decay, dropout, early stopping, random noise on the previous word embedding.
- Training algorithm: ADAM.
- Beam search: normalizing the loss by length (a minimal sketch is given after the Conclusion).
- Ensembling: ensembling of several models, or self-ensembling.
- Segmentation: automatic segmentation with JUMAN or KyotoMorph, or subword units with BPE.

Results

Ja -> En
          BLEU    AM-FM   Pairwise      JPO Adequacy
  EBMT    21.22   59.52   -             -
  NMT 1   24.71   56.27   47.0 (3/9)    3.89 (1/3)
  NMT 2   26.22   55.85   44.25 (4/9)   -

          # layers  Source vocabulary   Target vocabulary  Ensembling
  NMT 1   2         200k (JUMAN)        52k (BPE)          -
  NMT 2   1         30k (JUMAN)         30k (words)        x4

En -> Ja
          BLEU    AM-FM   Pairwise      JPO Adequacy
  EBMT    31.03   74.75   -             -
  NMT 1   36.19   73.87   55.25 (1/10)  4.02 (1/4)

          # layers  Source vocabulary   Target vocabulary  Ensembling
  NMT 1   2         52k (BPE)           52k (BPE)          -

Ja -> Zh
          BLEU    AM-FM   Pairwise      JPO Adequacy
  EBMT    30.27   76.42   30.75 (3/5)   -
  NMT 1   31.98   76.33   58.75 (1/5)   3.88 (1/3)

          # layers  Source vocabulary   Target vocabulary  Ensembling
  NMT 1   2         30k (JUMAN)         30k (KyotoMorph)   -

Zh -> Ja
          BLEU    AM-FM   Pairwise      JPO Adequacy
  EBMT    36.63   76.71   -             -
  NMT 1   46.04   78.59   63.75 (1/9)   3.94 (1/3)
  NMT 2   44.29   78.44   56.00 (2/9)   -

          # layers  Source vocabulary   Target vocabulary  Ensembling
  NMT 1   2         30k (KyotoMorph)    30k (JUMAN)        x2
  NMT 2   -         200k (KyotoMorph)   50k (JUMAN)        -

EBMT vs. NMT
- EBMT: less fluent.
- NMT: more under-/over-translation issues.

Translation example (Ja -> En)
  Src:  本フローセンサーの型式と基本構成,規格を図示, 紹介。
  Ref:  Shown here are type and basic configuration and standards of this flow with some diagrams.
  EBMT: This flow sensor type and the basic composition, standard is illustrated, and introduced.
  NMT:  This paper introduces the type, basic configuration, and standards of this flow sensor.

Conclusion and Future Work
- Very good results with Neural Machine Translation, especially for Zh -> Ja.
- Long training times mean that we could not test every combination of settings for each language pair.
- Some possible future improvements: adding more linguistic aspects; adding newly proposed mechanisms (copy mechanism, etc.).
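As an illustration of the beam-search detail mentioned above, here is a minimal, self-contained Python sketch of beam search whose final ranking normalizes the accumulated score by hypothesis length. It is not the KyotoNMT code: `next_log_probs` is a hypothetical stand-in for the NMT decoder, and the toy distribution in the usage example is invented purely for demonstration.

```python
EOS = "</s>"

def beam_search(next_log_probs, beam_size=5, max_len=50):
    """Simple beam search; the final ranking divides the score by length."""
    beams = [([], 0.0)]           # (partial hypothesis, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            # expand every live hypothesis with every possible next token
            for tok, tok_logp in next_log_probs(tokens).items():
                candidates.append((tokens + [tok], logp + tok_logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, logp in candidates[:beam_size]:
            (finished if tokens[-1] == EOS else beams).append((tokens, logp))
        if not beams:
            break
    # length normalization: rank by average log-probability per token,
    # so longer translations are not unfairly penalized
    pool = finished or beams
    return max(pool, key=lambda c: c[1] / len(c[0]))

# toy usage: a fake "decoder" that tends to produce "a b </s>"
def toy_step(prefix):
    table = {0: {"a": -0.1, "b": -2.3},
             1: {"b": -0.2, "</s>": -1.6},
             2: {"</s>": -0.1}}
    return table.get(len(prefix), {"</s>": -0.05})

print(beam_search(toy_step, beam_size=2))   # -> (['a', 'b', '</s>'], -0.4)
```

With length normalization, the longer hypothesis "a b </s>" (average log-probability -0.13) is preferred over the shorter "a </s>" (average -0.85), which is the behaviour the normalization is meant to encourage.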
Code available (GPL)
- KyotoEBMT: http://lotus.kuee.kyoto-u.ac.jp/~john/kyotoebmt.html
- KyotoNMT: https://github.com/fabiencro/knmt

Random noise on previous word embedding
- In the hope of reducing cascading errors at translation time, we add noise to the target word embedding at training time.
- This works well, but the gain may just be a regularization effect.
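The noise trick described above can be summarized in a few lines. The following is a minimal NumPy sketch, assuming Gaussian noise with a fixed standard deviation; the function name, the `sigma` value and the use of NumPy are illustrative choices, not the actual KyotoNMT implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_prev_embedding(embedding_table, prev_word_id, sigma=0.1, training=True):
    """Look up the embedding of the previously generated target word and,
    at training time only, perturb it with Gaussian noise (assumed form)."""
    emb = embedding_table[prev_word_id]
    if training:
        emb = emb + rng.normal(scale=sigma, size=emb.shape)
    return emb

# toy usage: a 30k x 620 table (30k matches the target vocabulary size in the
# settings tables; 620 is just an illustrative embedding size)
table = rng.normal(size=(30000, 620))
vec_train = noisy_prev_embedding(table, prev_word_id=42)                   # perturbed
vec_test = noisy_prev_embedding(table, prev_word_id=42, training=False)   # clean
```

At translation time the decoder is fed its own, possibly imperfect, previous outputs, so training with perturbed previous-word embeddings is intended to make it more robust to that mismatch; as noted above, the improvement may simply come from the extra regularization.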