Neural Machine Translation using CNN


Neural Machine Translation using CNN - Shashank Rajput

Introduction Neural machine translation has recently caught worldwide attention, with Google claiming to have achieved almost human-level translation accuracy [11]. A team from Facebook claims to have surpassed Google's accuracy by using CNNs for machine translation [1]. The encoder-decoder neural machine translation architecture comprises two units: Encoder: processes the source sentence and stores its understanding in some vector(s). Decoder: uses the vector(s) produced by the encoder to output the translated sentence word by word (one word at a time).
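As a concrete illustration of the two units, here is a minimal recurrent encoder-decoder sketch in PyTorch; the class names, GRU units, and dimensions are our own illustrative choices, not the architecture of [1] or [3]. The CNN variant is discussed on the next slide.

```python
# Minimal encoder-decoder sketch (illustrative sizes, not the exact models in [1] or [3]).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src_tokens):
        # src_tokens: (batch, src_len) -> per-word outputs plus a summary state
        embedded = self.embed(src_tokens)
        outputs, hidden = self.rnn(embedded)
        return outputs, hidden          # hidden acts as the "understanding" vector(s)

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_tokens, hidden):
        # Emits one distribution over target words per step, conditioned on the encoder state.
        embedded = self.embed(prev_tokens)
        outputs, hidden = self.rnn(embedded, hidden)
        return self.out(outputs), hidden
```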

CNN-based architecture Works similarly to a CNN on images: the sentence is treated as a 1D image. It is an order of magnitude faster than an RNN for this task because of parallelization, and it has achieved (and surpassed) RNN scores in translation. See [12] for an explanation with animations.
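A minimal sketch of this idea in PyTorch, assuming a plain stack of 1D convolutions over the embedded word positions; the real model in [1] additionally uses gated linear units, residual connections, and attention, which are omitted here.

```python
# Sketch: convolutions over word positions, treating the sentence as a 1D "image".
# Hyperparameters are illustrative; the model in [1] adds GLUs, residuals, and attention.
import torch
import torch.nn as nn

class ConvSentenceEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, channels=512, kernel_size=3, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        layers = [nn.Conv1d(emb_dim, channels, kernel_size, padding=kernel_size // 2)]
        for _ in range(num_layers - 1):
            layers.append(nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2))
        self.convs = nn.ModuleList(layers)

    def forward(self, src_tokens):
        # (batch, src_len) -> (batch, channels, src_len); all positions computed in parallel,
        # unlike an RNN, which must process the sentence left to right.
        x = self.embed(src_tokens).transpose(1, 2)   # channels-first layout for Conv1d
        for conv in self.convs:
            x = torch.relu(conv(x))
        return x
```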

Proposal Can we vary the number of encoder layers at run time? That is, can we decide at run time how much processing an input needs? It turns out that varying the number of encoder layers does not affect the result much [1]. But that led to an intriguing question: can we learn the relative importance of each layer's output among the outputs of all the layers? This is similar to a committee machine; we did not find any existing work that does something similar. A gated formulation of the idea is written out below.
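In equation form (the notation is ours, for illustration): if $h_l$ denotes the output of encoder layer $l$ and $g_l(x)$ a gating weight computed from the input sentence $x$, the combined encoder output would be

$$
y \;=\; \sum_{l=1}^{L} g_l(x)\, h_l,
\qquad g_l(x) \ge 0, \quad \sum_{l=1}^{L} g_l(x) = 1 .
$$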

Challenges The number of words varies, so the output size of each layer is variable. How do we provide a fixed-size input to the gating network when the output size of each layer is variable? We tried mean, max, and sum pooling across the words; sum worked best. A more detailed analysis is still needed of how to summarize the state of a layer and why the sum worked. Positional embeddings were added for each layer so that the gating network knows the index of the layer it is looking at. Random initialization of the gating weights helped a lot (see next slide). A sketch of this wiring follows below.
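A minimal sketch of such a gating network, assuming a single linear scorer over the pooled layer states; the shapes, the softmax combination, and the class name LayerGate are illustrative assumptions rather than the exact code used in these experiments.

```python
# Sketch of the gating network described above (shapes and sizes are assumptions).
# Each layer's output is summed across words to get a fixed-size vector, a learned
# layer-index embedding is added, and a linear scorer turns the states into softmax weights.
import torch
import torch.nn as nn

class LayerGate(nn.Module):
    def __init__(self, num_layers, channels):
        super().__init__()
        self.layer_pos = nn.Embedding(num_layers, channels)   # positional embedding per layer
        self.score = nn.Linear(channels, 1)

    def forward(self, layer_outputs):
        # layer_outputs: list of num_layers tensors, each (batch, channels, src_len)
        scores = []
        for idx, h in enumerate(layer_outputs):
            pooled = h.sum(dim=2)                              # sum across words -> (batch, channels)
            pooled = pooled + self.layer_pos.weight[idx]       # tell the gate which layer this is
            scores.append(self.score(pooled))                  # (batch, 1)
        weights = torch.softmax(torch.cat(scores, dim=1), dim=1)   # (batch, num_layers)
        # Weighted sum of layer outputs; the weights are computed per input sentence.
        stacked = torch.stack(layer_outputs, dim=1)            # (batch, num_layers, channels, src_len)
        return (weights[:, :, None, None] * stacked).sum(dim=1)
```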

Results The slide compares three validation-error curves: (1) gating network weights initialized to 0, which got stuck in a local minimum (it gave most of the weight to the first layer only); (2) gating network weights initialized randomly; and (3) the original network (state of the art). Although there is not much difference in error rates, our model did achieve the same performance. With some tuning and experimenting, we might improve upon the baseline (which is currently state of the art).

Analysis and future work Relation to ResNet: in a ResNet, the output of each layer is added into the final output with fixed, input-independent weights. In our case the output of each layer is also weighted and added into the final output, but the weights are computed dynamically, depending on the input and the layer outputs. The network did compute different weights for different input sentences, though it is difficult to interpret what they mean. Try on other networks: we could not find similar work for general neural networks. We could apply this approach in place of ResNet-style skip connections and compare the results. The contrast is written out below.
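In the notation introduced earlier, the contrast can be written loosely (treating the residual path as an additive combination of layer outputs) as

$$
y_{\text{resnet}} \;=\; \sum_{l=1}^{L} h_l
\qquad\text{vs.}\qquad
y_{\text{ours}} \;=\; \sum_{l=1}^{L} g_l(x)\, h_l ,
$$

where the first combination uses fixed, input-independent weights and the second computes $g_l(x)$ from the input sentence.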

References
[1] Convolutional Sequence to Sequence Learning - https://arxiv.org/pdf/1705.03122.pdf
[2] Adaptive Computation Time in RNNs - https://arxiv.org/pdf/1603.08983.pdf
[3] Current state-of-the-art RNN NMT - https://arxiv.org/pdf/1609.08144.pdf
[4] Attention in RNN encoders - https://arxiv.org/pdf/1409.0473.pdf
[5] Attention walks on images - http://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf (some follow-up papers on images have also been published)
[6] Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation - http://arxiv.org/abs/1606.04199
[7] Addressing the Rare Word Problem in Neural Machine Translation - https://arxiv.org/pdf/1410.8206.pdf
[8] On Using Very Large Target Vocabulary for Neural Machine Translation - http://www.aclweb.org/anthology/P15-1001
[9] BLEU metric - https://en.wikipedia.org/wiki/BLEU
[10] WMT site - http://www.statmt.org/wmt14/translation-task.html
[11] Google blog - https://research.googleblog.com/2016/09/a-neural-network-for-machine.html
[12] Source code for convolutional sequence to sequence learning - https://github.com/facebookresearch/fairseq