Machine Translation (MT)


Machine Translation (MT). Doha Abd El-Fattah Anwer Rizk & Mennat Allah Abd El Rahman Massoud, Mathematics and Computer Science Department, Faculty of Science, Alexandria University.

What is MT? It is the use of computers to automate some or all of the process of translating from one language to another. An early view treated translation as a decoding problem, as if the source text were an encrypted version of the target text (the cryptography analogy). Even so, translation is difficult.

Why Is MT Difficult?

Cont. Why Is MT Difficult? Languages differ structurally and lexically (e.g. Chinese has fewer articles than English). Word order can be very different (the arrangement of words in a sentence differs from one language to another). There are cultural differences (e.g. translating people's names). Translation requires a deep and rich understanding of the source language.

Breakthroughs in MT: Rule Based (RBMT). Systems use a combination of language and grammar rules plus dictionaries for common words. Specialist dictionaries are created to focus on certain industries or disciplines. Rule-based systems typically deliver consistent translations with accurate terminology when trained with specialist dictionaries.

Breakthroughs in MT: Statistical (SMT). Types of statistical MT: word-based SMT.

Breakthroughs in MT: Statistical (SMT). Types of statistical MT: phrase-based SMT.

IBM Translation Models

Breakthroughs in MT: Statistical. IBM Model 1: Current statistical machine translation builds upon the IBM Models. Whole word sequences, such as "The cat", are called phrases. Translating with phrases has advantages: non-compositional units can be translated as a whole, without translating their words separately. The resulting models are called phrase-based.

Breakthroughs in MT: Statistical. Cont. IBM Model 1: Alignment. Probability of a sentence: P(S) = P(w1, w2, w3, ..., wn). The source language is represented by the symbol "e", the target language by the symbol "f".

Breakthroughs in MT: Statistical. Cont. Alignment: The English sentence "e" has "l" words e1, ..., el. The French sentence "f" has "m" words f1, ..., fm. An alignment "a" identifies which English word each French word originated from. P(e) is the language model; P(f|e) is the translation model.

Breakthroughs in MT: Statistical. Cont. Alignment, example: e = "And the program has been implemented". f = "Le programme a ete mis en application" (one of many possible translations). One alignment is {2, 3, 4, 5, 6, 6, 6}; another (bad) alignment is {1, 1, 1, 1, 1, 1, 1}.
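To make the alignment notation concrete, here is a minimal Python sketch using the sentences and indices from the example above (the 1-based indexing convention, with 0 reserved for the NULL word, is the usual one for the IBM models):

```python
# Minimal illustration of the alignment from the example above.
# Indices are 1-based into the English sentence (0 would denote the NULL word).
e = ["And", "the", "program", "has", "been", "implemented"]
f = ["Le", "programme", "a", "ete", "mis", "en", "application"]
a = [2, 3, 4, 5, 6, 6, 6]  # a[j-1] = English position that generated French word j

for j, (fj, aj) in enumerate(zip(f, a), start=1):
    print(f"f{j} = {fj!r:15} aligned to e{aj} = {e[aj - 1]!r}")
```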

Breakthroughs in MT: Statistical. Cont. Alignment: The probability of a target phrase, P(f|e) = freq(e, f) / freq(e), is difficult to estimate directly at the sentence level. Instead, define a model for P(f, a | e, m) = P(a | e, m) P(f | a, e, m). Then P(f | e, m) = Σ_{a∈A} P(a | e, m) P(f | a, e, m), where A is the set of all possible alignments.

Breakthroughs in MT: Statistical. Cont. Alignment: Once we have a model for P(f, a | e, m), the alignment posterior is P(a | f, e, m) = P(f, a | e, m) / Σ_{a∈A} P(f, a | e, m). For a given (e, f) pair we can then compute the most likely alignment. Nowadays the original IBM models are rarely (if ever) used for translation, but they still play an important role in recovering alignments between sentences.

Breakthroughs in MT: Cont. Alignment (figure).

Breakthroughs in MT: Statistical. Cont. IBM Model 1: In IBM Model 1, all alignments a are equally likely: P(a | e, m) = 1 / (l+1)^m, where e = NULL, e1, e2, ..., el and f = f1, f2, f3, ..., fm.

Breakthroughs in MT: Statistical. Cont. IBM Model 1: We also need an estimate for P(f | a, e, m). In Model 1: P(f | a, e, m) = ∏_{j=1..m} t(f_j | e_{a_j}) = t(le | the) * t(programme | program) * ...

Breakthroughs in MT: Statistical. Cont. IBM Model 1: To generate a French string f from an English string e: Step 1: pick an alignment a with probability 1 / (l+1)^m. Step 2: pick the French words with probability P(f | a, e, m) = ∏_{j=1..m} t(f_j | e_{a_j}). Step 3: the final result is P(f, a | e, m) = P(a | e, m) * P(f | a, e, m) = (1 / (l+1)^m) ∏_{j=1..m} t(f_j | e_{a_j}).
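The three steps above can be written out directly in code. The following is a minimal Python sketch, assuming a tiny hand-written translation table t(f | e) with invented probabilities; in a real system t would be estimated from parallel data (e.g. with EM), not hard-coded:

```python
import itertools

# Toy translation table t(f | e); the probabilities are invented for illustration only.
t = {
    ("le", "the"): 0.7, ("le", "cat"): 0.05,
    ("chat", "the"): 0.1, ("chat", "cat"): 0.8,
}

def joint_prob(f, a, e):
    """P(f, a | e, m) = 1/(l+1)^m * prod_j t(f_j | e_{a_j}), with position 0 = NULL."""
    e = ["NULL"] + e
    l, m = len(e) - 1, len(f)
    p = 1.0 / (l + 1) ** m
    for fj, aj in zip(f, a):
        p *= t.get((fj, e[aj]), 1e-12)   # tiny floor for unseen word pairs
    return p

def best_alignment(f, e):
    """Most likely alignment. Model 1 factorises per word, so the argmax could be taken
    word by word; brute force over all (l+1)^m alignments is used here for clarity."""
    l, m = len(e), len(f)
    return max(itertools.product(range(l + 1), repeat=m),
               key=lambda a: joint_prob(f, a, e))

e = ["the", "cat"]
f = ["le", "chat"]
a = best_alignment(f, e)
print("best alignment:", a, "P(f,a|e,m) =", joint_prob(f, a, e))
```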

Breakthroughs in MT: Neural (NMT). It is a newer approach in which machines learn to translate through one large neural network (multiple processing units loosely modeled on the brain). The approach has become increasingly popular among MT researchers and developers, as trained NMT systems have begun to show better translation performance in many language pairs compared to the phrase-based statistical approach.

Neural Machine Translation (NMT)

Bilingual NMT

Multilingual NMT - Previously

Google Multilingual NMT

Cont. Google Multilingual NMT

Cont. Google Multilingual NMT

Cont. Google Multilingual NMT Test (Zero-shot)

Artificial Token: An artificial token is added at the beginning of the input sentence to indicate the target language.
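As a hedged illustration of that idea, prepending a target-language token could look like the sketch below; the exact "<2xx>" spelling is an assumption modeled on the tokens used in Google's multilingual system, not a guaranteed format:

```python
# Illustrative only: the "<2xx>" token format is an assumption, not the exact
# string any particular system requires.
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial token telling the model which language to translate into."""
    return f"<2{target_lang}> {source_sentence}"

print(add_target_token("Hello, how are you?", "es"))   # -> "<2es> Hello, how are you?"
```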

Recurrent Neural Network (RNN) Why not Feedforward NN?

Recurrent Neural Network (RNN) Why RNN?

Recurrent Neural Network (RNN) Cont. Why RNN?

Recurrent Neural Network (RNN) What is RNN?

Long Short Term Memory Network (LSTM): LSTMs are a building unit for layers of a recurrent neural network (RNN). An RNN composed of LSTM units is often called an LSTM network. A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell is responsible for "remembering" values over arbitrary time intervals.
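As a rough sketch of those four components, here is one LSTM step in NumPy; the weight shapes, gate ordering and random parameters are illustrative choices, not a reference implementation:

```python
import numpy as np

# One LSTM step; W, U, b hold the stacked parameters for the input, forget,
# output and candidate-cell transforms (in that order).
def lstm_step(x, h_prev, c_prev, W, U, b):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = W @ x + U @ h_prev + b          # shape (4*n,)
    n = h_prev.shape[0]
    i = sigmoid(z[0*n:1*n])             # input gate: how much new information to write
    f = sigmoid(z[1*n:2*n])             # forget gate: how much of the old cell to keep
    o = sigmoid(z[2*n:3*n])             # output gate: how much of the cell to expose
    g = np.tanh(z[3*n:4*n])             # candidate cell content
    c = f * c_prev + i * g              # the cell "remembers" values across time steps
    h = o * np.tanh(c)                  # hidden state passed to the next step
    return h, c

n, d = 4, 3                             # toy sizes: hidden 4, input 3
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4*n, d)), rng.normal(size=(4*n, n)), np.zeros(4*n)
h, c = np.zeros(n), np.zeros(n)
h, c = lstm_step(rng.normal(size=d), h, c, W, U, b)
print(h)
```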

Cont. Long Short Term Memory Network (LSTM)

Attention: Seq2Seq The Seq2Seq framework relies on the encoder-decoder paradigm. The encoder encodes the input sequence, while the decoder produces the target sequence.
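A minimal sketch of that paradigm with a plain tanh RNN, random untrained weights and greedy decoding for a fixed number of steps (toy sizes, purely illustrative):

```python
import numpy as np

# Encoder-decoder skeleton: the encoder compresses the source into one vector,
# the decoder unrolls from that vector to emit target word ids.
rng = np.random.default_rng(0)
V_src, V_tgt, d = 10, 10, 8                 # toy vocabulary sizes and hidden size
E_src = rng.normal(size=(V_src, d))         # source word embeddings
E_tgt = rng.normal(size=(V_tgt, d))         # target word embeddings
W_enc = rng.normal(size=(d, 2 * d))         # encoder RNN weights
W_dec = rng.normal(size=(d, 2 * d))         # decoder RNN weights
W_out = rng.normal(size=(V_tgt, d))         # output projection to target vocabulary

def rnn_step(W, x, h):
    return np.tanh(W @ np.concatenate([x, h]))

def encode(src_ids):
    h = np.zeros(d)
    for i in src_ids:                       # read the whole source sentence ...
        h = rnn_step(W_enc, E_src[i], h)
    return h                                # ... compressed into a single vector

def decode(h, max_len=5, bos=0):
    out, y = [], bos
    for _ in range(max_len):                # emit target word ids one at a time
        h = rnn_step(W_dec, E_tgt[y], h)
        y = int(np.argmax(W_out @ h))       # greedy choice of the next word id
        out.append(y)
    return out

print(decode(encode([1, 2, 3, 4])))
```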

Cont. Attention: [figure: encoder hidden states h0-h6 over the source sentence "The cat eats the mouse", with decoder states producing "Die Katze ..."]

Problem in the LSTM architecture: It seems somewhat unreasonable to assume that we can encode all information about a potentially very long sentence into a single vector and then have the decoder produce a good translation based on only that. Say your source sentence is 50 words long. The first word of the English translation is probably highly correlated with the first word of the source sentence, but that means the decoder has to consider information from 50 steps ago, and that information needs to be somehow encoded in the vector.

Cont. Attention: Solution: Random Access Memory.

Cont. Attention: To predict the next translated word, combine all of the encoder's hidden states, weighted by how much attention we are paying to each position in the sentence (see the sketch after the score-function slide below).

Cont. Attention: Scoring

Cont. Attention: Normalization

Cont. Attention: Context vector

Cont. Attention: Score Function
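Putting the last four slides together, here is a minimal NumPy sketch of the attention computation: scoring, softmax normalization, and the context vector, using dot-product scoring as one possible score function (all values are random placeholders):

```python
import numpy as np

# Score each encoder state against the current decoder state, normalise the scores
# with a softmax, and take the weighted sum of encoder states as the context vector.
def attention(decoder_state, encoder_states):
    scores = encoder_states @ decoder_state            # scoring (dot product)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                           # normalisation (softmax)
    context = weights @ encoder_states                 # context vector (weighted sum)
    return context, weights

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 8))                            # 6 encoder hidden states, size 8
s = rng.normal(size=8)                                 # current decoder hidden state
context, weights = attention(s, H)
print(weights.round(3), context.shape)
```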

References:
Daniel Jurafsky and James H. Martin, 2000, Speech and Language Processing.
Machine Translation Advanced Methods | NLP | University of Michigan, https://www.youtube.com/watch?v=owSClMuxQTY&t=28s
Lecture 10: Neural Machine Translation and Models with Attention, Stanford University, https://www.youtube.com/watch?v=IxQtK2SjWWM&t=3927s
Attention Is All You Need, https://www.youtube.com/watch?v=iDulhoQ2pro&t=553s
IBM Model 1 - Part I, https://www.youtube.com/watch?v=PEvRedAiF-E
IBM Model 1 - Part II, https://www.youtube.com/watch?v=n58-akQIlQ4
IBM Model 2, https://www.youtube.com/watch?v=CT4ScNnl3Tk

Thank You!