
1 Deep Learning based Machine Translation
Zhiwei Yu

2 Content
- Introduction
- Basic Structure
  - Encoder-Decoder
  - Attention
- Current Problem
- Advancing NMT
  - Translation Quality
  - Range of Application
  - Efficiency
- Future Work

3 Introduction
Commercial Use
- Google translates over 100 billion words a day
- Facebook has just rolled out its new homegrown MT
- eBay uses MT to enable cross-border trade
Academic Influence (NMT papers / accepted papers)
- ACL: long 14/195; short 12/107
- EMNLP: 26/322
- NAACL: long 13/207; short 8/125
- ACL: long 12/258; short 10/126
- Other papers appear in IJCAI, AAAI, NIPS, ICLR, TACL, TASLP, etc.

4 Introduction (Junczys-Dowmunt et al., 2016)

5 Content
Introduction; Basic Structure (Encoder-Decoder, Attention); Current Problem; Advancing NMT (Translation Quality, Range of Application, Efficiency); Future Work

6 Encoder-Decoder Model
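The slide shows the encoder-decoder figure only. As a minimal illustrative sketch of the idea in PyTorch (layer names and sizes are my own choices, not from the talk):

```python
# A minimal GRU encoder-decoder sketch; hyperparameters are illustrative.
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                        # src: (batch, src_len) of token ids
        outputs, hidden = self.rnn(self.embed(src))
        return outputs, hidden                     # hidden summarizes the source sentence

class Decoder(nn.Module):
    def __init__(self, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, prev_tokens, hidden):        # one target word at a time
        output, hidden = self.rnn(self.embed(prev_tokens), hidden)
        return self.out(output), hidden            # logits over the target vocabulary
```

The encoder compresses the source sentence into hidden states; the decoder then emits target words step by step, each conditioned on the previous word and the encoder's summary.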

7 Attention Mechanism (Bahdanau et al., 2015; Luong et al., 2015)

8 Attention Mechanism

9 Attention Mechanism

10 Attention Mechanism

11 Attention Mechanism

12 Attention Mechanism
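Slides 8-12 step through the attention computation over figures. A minimal sketch of additive (Bahdanau-style) attention; dimensions and names are illustrative, not taken from the slides:

```python
# Additive (Bahdanau-style) attention over encoder states; an illustrative sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, hid_dim=512):
        super().__init__()
        self.W_enc = nn.Linear(hid_dim, hid_dim, bias=False)
        self.W_dec = nn.Linear(hid_dim, hid_dim, bias=False)
        self.v = nn.Linear(hid_dim, 1, bias=False)

    def forward(self, dec_state, enc_outputs):
        # dec_state: (batch, hid); enc_outputs: (batch, src_len, hid)
        scores = self.v(torch.tanh(self.W_enc(enc_outputs)
                                   + self.W_dec(dec_state).unsqueeze(1)))     # (batch, src_len, 1)
        weights = F.softmax(scores.squeeze(-1), dim=-1)                       # soft alignment
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)     # weighted source summary
        return context, weights
```

At each decoding step the weights form a soft alignment over source positions, and the context vector (the weighted sum of encoder states) feeds into the prediction of the next target word.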

13 Content
Introduction; Basic Structure (Encoder-Decoder, Attention); Current Problem; Advancing NMT (Translation Quality, Range of Application, Efficiency); Future Work

14 Current Problem
- Limited vocabulary, e.g., UNK tokens
- Incomplete coverage of the source sentence, e.g., over-translation and under-translation
- Unfaithful translation, e.g., of low-frequency words

15 Content
Introduction; Basic Structure (Encoder-Decoder, Attention); Current Problem; Advancing NMT (Translation Quality, Range of Application, Efficiency); Future Work

16 Advancing NMT
Translation Quality
- Add Linguistic Knowledge
- SMT+NMT
- Model Structure
- ...
Range of Application
- Unsupervised Learning
- Multilingual Translation
- Multimodal Translation
- ...
Efficiency
- Parallel Processing
- Decoding Efficiency
- ...

17 Content
Introduction; Basic Structure (Encoder-Decoder, Attention); Current Problem; Advancing NMT (Translation Quality, Range of Application, Efficiency); Future Work

18 Add Linguistic Knowledge
Incorporate syntactic information into the encoder or decoder to strengthen structural knowledge:
- syntactic trees can be used to model the grammatical validity of a translation
- partial syntactic structures can be used as additional context to facilitate future target-word prediction
(Wu et al., 2017)
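One common way this is realized in practice (not necessarily the method of Wu et al., 2017) is to linearize a parse tree into the token sequence so that an ordinary sequence encoder or decoder sees syntactic structure. A toy sketch; the tree and labels below are made up for illustration:

```python
# Linearize a constituency tree into a flat token sequence with bracket tokens,
# so a standard sequence model can consume or produce syntax. Illustrative only.
def linearize(tree):
    """tree: (label, children) where children is a list of subtrees or a word string."""
    label, body = tree
    if isinstance(body, str):                  # leaf: tag + word
        return [f"({label}", body, ")"]
    tokens = [f"({label}"]
    for child in body:
        tokens += linearize(child)
    return tokens + [")"]

tree = ("S", [("NP", [("PRP", "she")]),
              ("VP", [("VBZ", "reads"), ("NP", [("NNS", "books")])])])
print(" ".join(linearize(tree)))
# (S (NP (PRP she ) ) (VP (VBZ reads ) (NP (NNS books ) ) ) )
```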

19 Add Linguistic Knowledge
ACL17
- Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder
- Modeling Source Syntax for Neural Machine Translation
- Sequence-to-Dependency Neural Machine Translation
- Chunk-based Decoder for Neural Machine Translation
- Chunk-Based Bi-Scale Decoder for Neural Machine Translation (short paper)
- Learning to Parse and Translate Improves Neural Machine Translation (short paper)
- Towards String-To-Tree Neural Machine Translation (short paper)
EMNLP17
- Graph Convolutional Encoders for Syntax-aware Neural Machine Translation
- Neural Machine Translation with Source-Side Latent Graph Parsing
- Neural Machine Translation with Source Dependency Representation (short paper)
ACL18
- Forest-Based Neural Machine Translation
- Practical Target Syntax for Neural Machine Translation (short paper)

20 SMT+NMT
Attempts to integrate SMT methods, models, and outputs into NMT,
e.g., structural bias, position bias, fertility, Markov condition, bilingual symmetry.
BLEU: +2.1 (Zhang et al., 2017)

21 SMT+NMT
ACL17
- Incorporating Word Reordering Knowledge into Attention-based Neural Machine Translation
- Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization
- Neural System Combination for Machine Translation
EMNLP17
- Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search
- Translating Phrases in Neural Machine Translation
ICLR18
- Towards Neural Phrase-based Machine Translation

22 Model Structure
- Coverage model: over-translation, under-translation
- Context gate: faithful translation
- External memory (Neural Turing Machine, Memory Network): long dependencies and memory space
- Character/Subword-Level NMT: OOV words
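For the OOV point, character- and subword-level models avoid UNK by splitting rare words into smaller known units. A toy illustration with a made-up subword vocabulary and a greedy longest-match rule (a simplification of BPE/wordpiece-style segmentation):

```python
# Toy subword segmentation: split an out-of-vocabulary word into known subword units.
# The vocabulary and greedy longest-match rule are simplifications for illustration.
def segment(word, subword_vocab):
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):   # try the longest remaining piece first
            if word[start:end] in subword_vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:                                     # nothing matched: fall back to a single character
            pieces.append(word[start])
            start += 1
    return pieces

vocab = {"trans", "lat", "ion", "un", "faith", "ful"}
print(segment("translation", vocab))   # ['trans', 'lat', 'ion']
print(segment("unfaithful", vocab))    # ['un', 'faith', 'ful']
```

The rare full word never has to appear in the output vocabulary, so the UNK problem from slide 14 largely disappears.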

23 Model Structure
ACL16
- Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
- Pointing the Unknown Words
- Modeling Coverage for Neural Machine Translation
EMNLP16
- Sequence-Level Knowledge Distillation
ACL18
- Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings
- Sparse and Constrained Attention for Neural Machine Translation

24 Content
Introduction; Basic Structure (Encoder-Decoder, Attention); Current Problem; Advancing NMT (Translation Quality, Range of Application, Efficiency); Future Work

25 Unsupervised Learning
Machine translation without (enough) parallel corpora.
Recipe: a good initial model + denoising autoencoder + back-translation, iterated.
(Yang et al., 2018)
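A high-level sketch of that recipe as a training loop. The model, noise, and train_step interfaces below are hypothetical placeholders, not an implementation from Yang et al. (2018):

```python
# Sketch of the unsupervised NMT recipe: denoising autoencoding + iterative back-translation.
# `model`, `noise`, and `train_step` are hypothetical stubs standing in for a real system.
def unsupervised_nmt(model, mono_src, mono_tgt, noise, train_step, rounds=3):
    for _ in range(rounds):
        # 1) Denoising autoencoding: reconstruct monolingual sentences from corrupted copies.
        for s in mono_src:
            train_step(model, inputs=noise(s), targets=s, direction="src->src")
        for t in mono_tgt:
            train_step(model, inputs=noise(t), targets=t, direction="tgt->tgt")
        # 2) Back-translation: build synthetic parallel pairs with the current model,
        #    then train the reverse direction on them; iterating improves both directions.
        for s in mono_src:
            t_hat = model.translate(s, direction="src->tgt")
            train_step(model, inputs=t_hat, targets=s, direction="tgt->src")
        for t in mono_tgt:
            s_hat = model.translate(t, direction="tgt->src")
            train_step(model, inputs=s_hat, targets=t, direction="src->tgt")
    return model
```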

26 Unsupervised Learning
ACL17
- Data Augmentation for Low-Resource Neural Machine Translation (short paper)
ICLR18
- Word Translation Without Parallel Data
- Unsupervised Machine Translation Using Monolingual Corpora Only
- Unsupervised Neural Machine Translation
ACL18
- Unsupervised Neural Machine Translation with Weight Sharing
- Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation (short paper)

27 Multilingual Translation
ACL15
- Multi-Task Learning for Multiple Language Translation (en-fr/du/sp)
NAACL16
- Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
- Multi-Source Neural Translation

28 Multimodal Translation
Use the information contained in an accompanying image to improve translation quality.

29 Multimodal Translation
ACL17
- Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
EMNLP17
- Incorporating Global Visual Features into Attention-based Neural Machine Translation
- An Empirical Study of the Effectiveness of Images on Multi-modal Neural Machine Translation
ACL18
- Learning Translations via Images: A Large Multilingual Dataset and Comprehensive Study

30 Content
Introduction; Basic Structure (Encoder-Decoder, Attention); Current Problem; Advancing NMT (Translation Quality, Range of Application, Efficiency); Future Work

31 Parallel Processing
Speed up the training process (Gehring et al., 2017)
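The speed-up comes from removing the sequential dependence across time steps; a small PyTorch comparison with illustrative shapes:

```python
# Why convolutional (and self-attention) encoders train faster than RNNs:
# every position can be computed in parallel instead of in a sequential loop.
import torch
import torch.nn as nn

batch, seq_len, dim = 32, 50, 256
x = torch.randn(batch, seq_len, dim)

# RNN: each step depends on the previous hidden state, so time steps run sequentially.
rnn = nn.GRU(dim, dim, batch_first=True)
rnn_out, _ = rnn(x)                                   # internally a loop over 50 positions

# Convolution: each position depends only on a fixed window of the input,
# so the whole sequence is processed in one parallel call (as in Gehring et al., 2017).
conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
conv_out = conv(x.transpose(1, 2)).transpose(1, 2)    # (batch, seq_len, dim)

assert rnn_out.shape == conv_out.shape
```

Self-attention (the NIPS17 paper on the next slide) removes the sequential dependence in the same way, which is what makes these architectures fast to train on GPUs.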

32 Parallel Processing
ACL17
- A Convolutional Encoder Model for Neural Machine Translation (Convolutional Sequence to Sequence Learning)
NIPS17
- Attention Is All You Need
ICLR18
- Non-Autoregressive Neural Machine Translation

33 Decoding Efficiency
Improve decoding efficiency by shrinking the vocabulary, adopting distilled models, and training the decoder independently.
ACL17
- Neural Machine Translation via Binary Code Prediction
- Speeding up Neural Machine Translation Decoding by Shrinking Run-time Vocabulary (short paper)
EMNLP17
- Trainable Greedy Decoding for Neural Machine Translation
- Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU (short paper)
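The run-time vocabulary idea can be sketched as restricting the output layer to a small per-sentence candidate shortlist; the way candidates are chosen here is a simplified stand-in for the lexicon-based selection used in the papers above:

```python
# Sketch of run-time vocabulary shrinking: score only a shortlist of candidate target
# words per sentence instead of the full output vocabulary. Illustrative values only.
import torch

def shortlist_logits(decoder_hidden, out_weight, out_bias, candidate_ids):
    """decoder_hidden: (batch, hid); out_weight: (vocab, hid); candidate_ids: (k,)."""
    w = out_weight[candidate_ids]             # (k, hid): only the shortlisted rows
    b = out_bias[candidate_ids]               # (k,)
    return decoder_hidden @ w.t() + b         # (batch, k) logits over the shortlist

vocab, hid, k = 50000, 512, 800
out_weight = torch.randn(vocab, hid)
out_bias = torch.zeros(vocab)
hidden = torch.randn(4, hid)
candidates = torch.randint(0, vocab, (k,))    # in practice: lexicon matches + frequent words
print(shortlist_logits(hidden, out_weight, out_bias, candidates).shape)  # (4, 800), not (4, 50000)
```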

34 Content
Introduction; Basic Structure (Encoder-Decoder, Attention); Current Problem; Advancing NMT (Translation Quality, Range of Application, Efficiency); Future Work

35 Future Work
- Interpretability: use linguistic knowledge to explain the behavior of the models.
- External knowledge: puns, metaphor, metonymy, allegory, paradox, etc. (e.g., "Make hay while the sun shines.")
- Larger-context NMT: paragraphs, articles, books, etc. This needs effective attention mechanisms for long sequences and tracking of state across many sentences (as in dialogue systems).
- Unsupervised learning

36 Future Work
Maximum Likelihood Estimation: disadvantages
- Exposure bias
- Weak correlation with the true reward
Potential solutions
- Maximize a sequence-wise global loss
- Incorporate inference into training
  - Stochastic inference: policy gradient (Ranzato et al., ICLR 2016; Bahdanau et al., arXiv 2016); minimum risk training (Shen et al., ACL 2016)
  - Deterministic inference: learning to search (Wiseman & Rush, arXiv 2016)
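As a sketch of the sequence-wise alternative, minimum risk training (Shen et al., 2016) replaces the per-token MLE loss with an expected risk over sampled translations. The sample_translations, sequence_log_prob, and risk functions below are hypothetical stubs:

```python
# Minimum risk training sketch: sample candidate translations, weight each by its
# renormalized model probability, and minimize the expected risk (e.g. 1 - sentence BLEU).
# `sample_translations`, `sequence_log_prob`, and `risk` are hypothetical stubs.
import torch

def mrt_loss(model, src, ref, sample_translations, sequence_log_prob, risk,
             n_samples=20, alpha=5e-3):
    candidates = sample_translations(model, src, n_samples)
    log_probs = torch.stack([sequence_log_prob(model, src, c) for c in candidates])
    q = torch.softmax(alpha * log_probs, dim=0)               # distribution over the sample
    risks = torch.tensor([risk(c, ref) for c in candidates])  # e.g. 1 - sentence BLEU
    return (q * risks).sum()                                   # expected risk to minimize
```

Because the loss is defined on whole output sequences that the model itself produced, it directly targets both exposure bias and the mismatch between the training objective and the evaluation reward.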

37 Thank you

