Deep Learning based Machine Translation

Presentation transcript:

Deep Learning based Machine Translation Zhiwei Yu

Content
  Introduction
  Basic Structure (Encoder-Decoder, Attention)
  Current Problem
  Advancing NMT (Translation Quality, Range of Application, Efficiency)
  Future Work

Introduction
  Commercial use
    Google translates over 100 billion words a day
    Facebook has just rolled out its own homegrown MT
    eBay uses MT to enable cross-border trade
  Academic influence (MT papers at major venues)
    ACL17: 8.6% (long 14/195; short 12/107)
    EMNLP17: 8.0% (26/322)
    NAACL18: 6.3% (long 13/207; short 8/125)
    ACL18: 5.7% (long 12/258; short 10/126)
    Other papers appear in IJCAI, AAAI, NIPS, ICLR, TACL, TASLP, etc.

Introduction (Junczys-Dowmunt et al., 2016)

Content: Introduction | Basic Structure (Encoder-Decoder, Attention) | Current Problem | Advancing NMT (Translation Quality, Range of Application, Efficiency) | Future Work

Encoder-Decoder Model
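As a rough illustration of the encoder-decoder model (a minimal sketch assuming PyTorch; all sizes and names below are arbitrary, not from the slides): an encoder RNN compresses the source sentence into a context vector, and a decoder RNN generates the target sentence conditioned on that vector.

```python
# Minimal encoder-decoder sketch (assumes PyTorch); teacher forcing at training time.
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source sentence into a single context vector (last hidden state).
        _, context = self.encoder(self.src_emb(src_ids))
        # Decode the target sentence conditioned on that context.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), context)
        return self.out(dec_out)                 # logits over the target vocabulary

model = EncoderDecoder(src_vocab=10000, tgt_vocab=10000)
logits = model(torch.randint(0, 10000, (2, 7)),  # toy source batch
               torch.randint(0, 10000, (2, 9)))  # toy target batch
print(logits.shape)                              # torch.Size([2, 9, 10000])
```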

Attention Mechanism (Bahdanau et al., 2015; Luong et al., 2015)

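For reference (these equations are added here, not taken from the slides), the additive attention of Bahdanau et al. (2015) scores each source annotation h_i against the previous decoder state s_{t-1}, normalizes the scores, and forms a context vector c_t for the next target word:

```latex
% Additive attention (Bahdanau et al., 2015)
e_{t,i} = v_a^{\top} \tanh\!\left( W_a s_{t-1} + U_a h_i \right), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{k}\exp(e_{t,k})}, \qquad
c_t = \sum_{i} \alpha_{t,i}\, h_i
```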

Content: Introduction | Basic Structure (Encoder-Decoder, Attention) | Current Problem | Advancing NMT (Translation Quality, Range of Application, Efficiency) | Future Work

Current Problem
  Limited vocabulary (e.g., UNK tokens for out-of-vocabulary words)
  Source coverage issues (e.g., over-translation and under-translation)
  Unfaithful translation (e.g., low-frequency words translated incorrectly)

Content: Introduction | Basic Structure (Encoder-Decoder, Attention) | Current Problem | Advancing NMT (Translation Quality, Range of Application, Efficiency) | Future Work

Advancing NMT
  Translation Quality: Add Linguistic Knowledge, SMT+NMT, Model Structure, ...
  Range of Application: Unsupervised Learning, Multilingual Translation, Multimodal Translation, ...
  Efficiency: Parallel Processing, Decoding Efficiency, ...

Content: Introduction | Basic Structure (Encoder-Decoder, Attention) | Current Problem | Advancing NMT (Translation Quality, Range of Application, Efficiency) | Future Work

Add Linguistic Knowledge
  Incorporate syntactic information into the encoder or decoder to strengthen structural knowledge:
    syntactic trees can be used to model the grammatical validity of a translation
    partial syntactic structures can serve as additional context for predicting the next target word
  (Wu et al., 2017)
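One simple way to expose source syntax to a sequence encoder, in the spirit of work that linearizes parse trees into the token stream, is sketched below. This is an illustration only, not any specific paper's implementation; the toy tree and the `linearize` helper are my own.

```python
# Toy sketch: flatten a nested (label, children) constituency tree into a token
# sequence, so a standard sequence encoder can see the bracketing structure.
def linearize(tree):
    label, children = tree
    if isinstance(children, str):            # leaf: (POS tag, word)
        return [f"({label}", children, ")"]
    tokens = [f"({label}"]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")
    return tokens

# "the cat sleeps" with a toy parse
tree = ("S", [("NP", [("DT", "the"), ("NN", "cat")]),
              ("VP", [("VBZ", "sleeps")])])
print(" ".join(linearize(tree)))
# (S (NP (DT the ) (NN cat ) ) (VP (VBZ sleeps ) ) )
```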

Add Linguistic Knowledge
• ACL17
  – Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder
  – Modeling Source Syntax for Neural Machine Translation
  – Sequence-to-Dependency Neural Machine Translation
  – Chunk-based Decoder for Neural Machine Translation
  – Chunk-Based Bi-Scale Decoder for Neural Machine Translation (short paper)
  – Learning to Parse and Translate Improves Neural Machine Translation (short paper)
  – Towards String-To-Tree Neural Machine Translation (short paper)
• EMNLP17
  – Graph Convolutional Encoders for Syntax-aware Neural Machine Translation
  – Neural Machine Translation with Source-Side Latent Graph Parsing
  – Neural Machine Translation with Source Dependency Representation (short paper)
• ACL18
  – Forest-Based Neural Machine Translation
  – Practical Target Syntax for Neural Machine Translation (short paper)

SMT+NMT
  Attempts to bring SMT methods, models, and outputs into NMT
  e.g.: structural bias, position bias, fertility, Markov condition, bilingual symmetry
  BLEU: 2.1 ↑ (Zhang et al., 2017)
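As a toy illustration of one way SMT knowledge can be combined with an NMT model (a hedged sketch of the general idea, not the method of Zhang et al., 2017): interpolate the NMT next-word distribution with probabilities from an SMT-style lexical translation table.

```python
# Illustrative only: mix NMT and SMT-style word probabilities and renormalize.
def interpolate(nmt_probs, smt_probs, lam=0.7):
    """nmt_probs, smt_probs: dicts mapping target word -> probability."""
    words = set(nmt_probs) | set(smt_probs)
    mixed = {w: lam * nmt_probs.get(w, 0.0) + (1 - lam) * smt_probs.get(w, 0.0)
             for w in words}
    z = sum(mixed.values())                     # renormalize the mixture
    return {w: p / z for w, p in mixed.items()}

print(interpolate({"house": 0.6, "home": 0.4}, {"house": 0.9, "home": 0.1}))
# {'house': 0.69, 'home': 0.31}  (key order may vary)
```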

SMT+NMT
• ACL17
  – Incorporating Word Reordering Knowledge into Attention-based Neural Machine Translation
  – Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization
  – Neural System Combination for Machine Translation
• EMNLP17
  – Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search
  – Translating Phrases in Neural Machine Translation
• ICLR18
  – Towards Neural Phrase-based Machine Translation

Model Structure
  Coverage → over-translation, under-translation
  Context Gate → faithful translation
  External Memory (Neural Turing Machine, Memory Network) → long-range dependencies & memory space
  Character/Subword-Level NMT → OOV
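The coverage idea can be made concrete with a small sketch (illustrative only; the exact penalty varies across papers): accumulate the attention weights assigned to each source position over decoding steps, and penalize positions whose total attention exceeds one, which discourages over-translation.

```python
# Coverage sketch (assumes PyTorch): running sum of attention plus a simple penalty.
import torch

def update_coverage(coverage, attn_weights):
    """coverage, attn_weights: tensors of shape (batch, src_len)."""
    return coverage + attn_weights

def coverage_penalty(coverage):
    # Penalize source positions attended more than once in total
    # (a simplified form of penalties used in several NMT systems).
    return torch.clamp(coverage - 1.0, min=0.0).sum(dim=-1)

cov = torch.zeros(1, 4)
for step_attn in (torch.tensor([[0.7, 0.2, 0.1, 0.0]]),
                  torch.tensor([[0.6, 0.3, 0.1, 0.0]])):
    cov = update_coverage(cov, step_attn)
print(cov, coverage_penalty(cov))   # word 0 is over-attended -> positive penalty
```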

Model Structure
• ACL16
  – Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
  – Pointing the Unknown Words
  – Modeling Coverage for Neural Machine Translation
• EMNLP16
  – Sequence-Level Knowledge Distillation
• ACL18
  – Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings
  – Sparse and Constrained Attention for Neural Machine Translation

Content: Introduction | Basic Structure (Encoder-Decoder, Attention) | Current Problem | Advancing NMT (Translation Quality, Range of Application, Efficiency) | Future Work

Unsupervised Learning
  Machine translation without enough parallel corpora
  Recipe: a good initial model + denoising autoencoder + back-translation & iteration
  (Yang et al., 2018)
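The recipe on this slide can be summarized as the loop below. This is schematic pseudocode: the two translation models and their methods (train_denoising, translate, train_parallel) are placeholders I introduce for illustration, not a real library API.

```python
# Schematic unsupervised NMT loop: denoise, back-translate, iterate.
def unsupervised_training(src_mono, tgt_mono, src2tgt, tgt2src, n_iters=10):
    for _ in range(n_iters):
        # 1) Denoising autoencoding: each model learns to reconstruct sentences
        #    of its output language from noised copies.
        src2tgt.train_denoising(tgt_mono)
        tgt2src.train_denoising(src_mono)
        # 2) Back-translation: build synthetic parallel pairs with the current
        #    models, then train each direction on the pairs made by the other.
        src2tgt.train_parallel(zip(tgt2src.translate(tgt_mono), tgt_mono))
        tgt2src.train_parallel(zip(src2tgt.translate(src_mono), src_mono))
    return src2tgt, tgt2src
```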

Unsupervised Learning
• ACL17
  – Data Augmentation for Low-Resource Neural Machine Translation (short paper)
• ICLR18
  – Word Translation Without Parallel Data
  – Unsupervised Machine Translation Using Monolingual Corpora Only
  – Unsupervised Neural Machine Translation
• ACL18
  – Unsupervised Neural Machine Translation with Weight Sharing
  – Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation (short paper)

Multilingual Translation
• ACL15
  – Multi-Task Learning for Multiple Language Translation (en-fr/du/sp)
• NAACL16
  – Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
  – Multi-Source Neural Translation

Multimodal Translation
  Use the information contained in an accompanying image to improve translation quality

Multimodal Translation
• ACL17
  – Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
• EMNLP17
  – Incorporating Global Visual Features into Attention-based Neural Machine Translation
  – An Empirical Study of the Effectiveness of Images on Multi-modal Neural Machine Translation
• ACL18
  – Learning Translations via Images: A Large Multilingual Dataset and Comprehensive Study

Content: Introduction | Basic Structure (Encoder-Decoder, Attention) | Current Problem | Advancing NMT (Translation Quality, Range of Application, Efficiency) | Future Work

Parallel Processing
  Speed up the training process (Gehring et al., 2017)

Parallel Processing
• ACL17
  – A Convolutional Encoder Model for Neural Machine Translation (Convolutional Sequence to Sequence Learning)
• NIPS17
  – Attention Is All You Need
• ICLR18
  – Non-Autoregressive Neural Machine Translation
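For reference (added here, not taken from the slides), the Transformer of "Attention Is All You Need" replaces recurrence with self-attention computed for all positions in one matrix product, which is what enables the parallel training mentioned on the previous slide:

```latex
% Scaled dot-product attention (Vaswani et al., 2017); no sequential recurrence.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V
```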

Decoding Efficiency
  Improve decoding efficiency by shrinking the run-time vocabulary, distilling into smaller models, or training the decoder independently
• ACL17
  – Neural Machine Translation via Binary Code Prediction
  – Speeding up Neural Machine Translation Decoding by Shrinking Run-time Vocabulary (short paper)
• EMNLP17
  – Trainable Greedy Decoding for Neural Machine Translation
  – Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU (short paper)
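As a toy illustration of run-time vocabulary shrinking (a sketch of the general idea, not any specific paper's method; the helper below is hypothetical): compute the output logits only over a per-sentence candidate list, so the softmax covers a few hundred words instead of the full target vocabulary.

```python
# Sketch (assumes PyTorch): restrict the output projection to candidate words.
import torch

def shortlist_logits(decoder_state, out_weight, out_bias, candidate_ids):
    """decoder_state: (batch, hid); out_weight: (vocab, hid); out_bias: (vocab,);
    candidate_ids: (n_cand,) indices of the allowed target words."""
    w = out_weight[candidate_ids]          # (n_cand, hid): only candidate rows
    b = out_bias[candidate_ids]            # (n_cand,)
    return decoder_state @ w.t() + b       # (batch, n_cand)

state = torch.randn(2, 512)                               # toy decoder states
W, b = torch.randn(50_000, 512), torch.randn(50_000)      # full output layer
cands = torch.tensor([5, 17, 999, 4203])                  # e.g. from a lexical table
print(shortlist_logits(state, W, b, cands).shape)         # torch.Size([2, 4])
```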

Content: Introduction | Basic Structure (Encoder-Decoder, Attention) | Current Problem | Advancing NMT (Translation Quality, Range of Application, Efficiency) | Future Work

Future Work
• Interpretability
  Use linguistic knowledge to explain the performance of the models.
• External Knowledge
  Pun, metaphor, metonymy, allegory, paradox, etc.
  e.g.: "Make hay while the sun shines."
• Larger-Context NMT
  Paragraphs, articles, books, etc.
  Needs: an effective attention mechanism for long sequences; tracking state over many sentences (as in dialogue systems)
• Unsupervised Learning

Future Work
• Maximum Likelihood Estimation
  Disadvantages
    Exposure bias
    Weak correlation with the true reward
  Potential solutions: optimize a sequence-wise global loss, incorporate inference into training
    Stochastic inference
      Policy gradient (Ranzato et al., ICLR 2016; Bahdanau et al., arXiv 2016)
      Minimum risk training (Shen et al., ACL 2016)
    Deterministic inference
      Learning to search (Wiseman & Rush, arXiv 2016)
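As a concrete instance of optimizing a sequence-wise global loss, minimum risk training (Shen et al., ACL 2016) minimizes the expected sequence-level cost over a sampled set of candidate translations. A standard rendering of the objective (my summary, not copied from the slides) is:

```latex
% Minimum risk training: expected cost Delta (e.g. 1 - sentence BLEU)
% over a sampled candidate set S(x), with sharpness parameter alpha.
\mathcal{R}(\theta) = \sum_{(x,\,y^{*})} \sum_{y \in S(x)} Q(y \mid x; \theta, \alpha)\, \Delta(y, y^{*}),
\qquad
Q(y \mid x; \theta, \alpha) = \frac{P(y \mid x; \theta)^{\alpha}}{\sum_{y' \in S(x)} P(y' \mid x; \theta)^{\alpha}}
```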

Thank you