1
Machine Translation (MT)
Doha Abd El-Fattah Anwer Rizk & Mennat Allah Abd El Rahman Massoud
Mathematics and Computer Science Department, Faculty of Science, Alexandria University.
2
Translation is difficult
What is MT? It is the use of computers to automate some or all of the process of translating from one language to another. One classic view treats translation as decoding: the foreign text is handled like an encrypted message in cryptography, to be deciphered into the target language.
3
Why Is MT Difficult?
4
Cont. Why Is MT Difficult?
Languages differ structurally and lexically (e.g., Chinese does not use articles the way English does).
Word order can differ greatly between two languages (e.g., English is typically subject-verb-object, whereas Japanese is subject-object-verb).
Cultural differences (e.g., translating people's names).
Translation requires a deep and rich understanding of the source language.
5
Breakthroughs in MT (Rule based, Statistical, Neural): Rule Based (RBMT)
Rule-based systems use a combination of language and grammar rules plus dictionaries for common words. Specialist dictionaries are created to focus on certain industries or disciplines. Rule-based systems typically deliver consistent translations with accurate terminology when equipped with specialist dictionaries.
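The following minimal Python sketch makes the "dictionary plus grammar rules" idea concrete. It assumes a hypothetical four-word English-French lexicon (`LEXICON`) and a single reordering rule: most French adjectives follow the noun. Real RBMT systems use large dictionaries and many morphological and syntactic rules. This only illustrates the structure.

```python
# Toy rule-based English-to-French translator: a bilingual dictionary plus
# one hand-written reordering rule (ADJ + NOUN -> NOUN + ADJ).
# The lexicon and the rule are illustrative assumptions, not a real RBMT system.

LEXICON = {
    "the": ("le", "DET"),
    "black": ("noir", "ADJ"),
    "cat": ("chat", "NOUN"),
    "sleeps": ("dort", "VERB"),
}

def translate(sentence: str) -> str:
    # 1. Dictionary lookup: each English word yields (French word, part of speech).
    tagged = []
    for word in sentence.lower().split():
        if word not in LEXICON:
            raise KeyError(f"no dictionary entry for {word!r}")
        tagged.append(LEXICON[word])
    # 2. Grammar rule: move an adjective after the noun it precedes.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in tagged)

print(translate("The black cat sleeps"))  # le chat noir dort
```

The rules are explicit and deterministic, which is why such systems give consistent output. An unknown word raises an error instead of being guessed.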
6
Breakthroughs in MT: Statistical (SMT)
Types of statistical MT: word based, where the basic translation unit is a single word.
7
Breakthroughs in MT: Statistical (SMT)
Types of statistical MT: phrase based, where the basic translation unit is a phrase (a sequence of words).
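The difference between word-based and phrase-based units can be sketched in Python. The tables `WORD_TABLE` and `PHRASE_TABLE` are hypothetical. The greedy longest-match lookup is a simplification: real phrase-based decoders use probabilities to score many segmentations and reorderings. The idiom "of course" shows why phrases help, since word by word it becomes the wrong "de cours".

```python
# Toy comparison of word-based vs phrase-based lookup (English -> French).
# Tables are hypothetical; real SMT stores probabilities and searches over
# many segmentations and reorderings rather than a greedy longest match.

WORD_TABLE = {"of": "de", "course": "cours", "yes": "oui"}
PHRASE_TABLE = {("of", "course"): "bien sûr", ("yes",): "oui"}

def word_based(words):
    # Each word is translated independently of its neighbours.
    return " ".join(WORD_TABLE[w] for w in words)

def phrase_based(words, max_len=3):
    # At each position, use the longest phrase found in the table.
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in PHRASE_TABLE:
                out.append(PHRASE_TABLE[phrase])
                i += n
                break
        else:  # no break: nothing in the table covers words[i]
            raise KeyError(f"no phrase covers {words[i]!r}")
    return " ".join(out)

sentence = "yes of course".split()
print(word_based(sentence))    # oui de cours
print(phrase_based(sentence))  # oui bien sûr
```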
8
IBM Translation Models
9
Breakthroughs in MT: Statistical, IBM Model 1
Current statistical machine translation builds upon the IBM Models. Sequences of consecutive words are called phrases (e.g., "the cat"). Translating using phrases has advantages:
– Non-compositional units (e.g., idioms) can be translated as a whole.
– The individual words inside such a phrase do not need to be translated one by one.
The resulting models are called phrase-based.
10
Breakthroughs in MT: Statistical, Cont. IBM Model 1: Alignment
The probability of a sentence is $P(S) = P(w_1, w_2, w_3, \ldots, w_n)$. The source language is represented by the symbol “e” and the target language by the symbol “f”.
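By the chain rule, $P(S) = \prod_i P(w_i \mid w_1, \ldots, w_{i-1})$. The slide does not say how this is estimated. The Python sketch below assumes the common bigram approximation $P(w_i \mid w_{i-1})$ with maximum-likelihood counts over a tiny made-up corpus. The `<s>`/`</s>` boundary markers are also an assumption of the sketch.

```python
# Sentence probability P(S) = P(w1, ..., wn), approximated with a bigram
# model: P(S) ~= prod_i P(w_i | w_{i-1}). Corpus and markers are hypothetical.
from collections import Counter

corpus = [
    "the program has been implemented",
    "the program has been tested",
]

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(words[:-1])                 # counts of each context word
        bigrams.update(zip(words[:-1], words[1:]))  # counts of adjacent pairs
    return unigrams, bigrams

def sentence_prob(sentence, unigrams, bigrams):
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words[:-1], words[1:]):
        # Maximum-likelihood estimate; unseen contexts or pairs give 0.
        p *= bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0
    return p

unigrams, bigrams = train_bigram(corpus)
print(sentence_prob("the program has been implemented", unigrams, bigrams))  # 0.5
```

Only "been" has two possible continuations in this corpus, so the whole sentence gets probability 0.5. Real language models add smoothing so that unseen pairs do not zero out a sentence.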
11
Breakthroughs in MT: Statistical, Cont. Alignment
The English sentence e has l words $e_1, \ldots, e_l$. The French sentence f has m words $f_1, \ldots, f_m$. An alignment ɑ identifies which English word each French word originates from. P(e) is the language model, and P(f|e) is the translation model.
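The language model and translation model are combined by the noisy-channel decision rule. To recover English from French f, choose $\hat e = \arg\max_e P(e)\,P(f \mid e)$, which by Bayes' rule also maximizes $P(e \mid f)$. In this Python sketch the candidate sentences and all probabilities are made-up toy numbers.

```python
# Noisy-channel decision rule: e_hat = argmax_e P(e) * P(f | e).
# Candidates and probabilities are hypothetical toy values.

f = "le programme a été mis en application"

candidates = {
    # e: (P(e) from the language model, P(f | e) from the translation model)
    "and the program has been implemented": (0.020, 0.30),
    "the program has been implemented": (0.030, 0.25),
    "program the implemented been has": (0.0001, 0.35),
}

def noisy_channel_best(cands):
    # Score each candidate by language model times translation model.
    return max(cands, key=lambda e: cands[e][0] * cands[e][1])

print(noisy_channel_best(candidates))  # the program has been implemented
```

The scrambled candidate has the highest P(f|e). It still loses because the language model gives it very low probability, which is the division of labour between the two models.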
12
Breakthroughs in MT: Statistical, Cont. Alignment: Example
e = And the program has been implemented.
f = Le programme a été mis en application. (one of many possible translations)
One alignment is {2, 3, 4, 5, 6, 6, 6}: the j-th number is the position in e of the English word that generated the j-th French word. Another (bad) alignment is {1, 1, 1, 1, 1, 1, 1}.
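The alignment notation can be made concrete in a few lines of Python. Each alignment entry is read as a 1-based position in the English sentence. The helper name `aligned_pairs` is ours, not from the slides.

```python
# Read an alignment vector for the example sentence pair:
# a_j is the 1-based position in e of the word that generated f_j.
e = "And the program has been implemented".split()
f = "Le programme a été mis en application".split()

def aligned_pairs(alignment):
    """Pair each French word f_j with the English word e_{a_j}."""
    return [(f_j, e[a_j - 1]) for f_j, a_j in zip(f, alignment)]

good = [2, 3, 4, 5, 6, 6, 6]
bad = [1, 1, 1, 1, 1, 1, 1]

for pair in aligned_pairs(good):
    print(pair)  # ('Le', 'the'), ('programme', 'program'), ... ('application', 'implemented')
```

In the bad alignment every French word is attributed to "And".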
13
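Breakthroughs in MT: Statistical, Cont. Alignment
The translation probability could in principle be estimated by relative frequency:
$$P(f \mid e) = \frac{\mathrm{freq}(e, f)}{\mathrm{freq}(e)}$$
For whole sentences, however, this is difficult to get directly, because most sentence pairs never occur in the training data. Instead, define a model that introduces the alignment $a$:
$$P(f, a \mid e, m) = P(a \mid e, m)\, P(f \mid a, e, m)$$
and sum over all alignments:
$$P(f \mid e, m) = \sum_{a \in A} P(a \mid e, m)\, P(f \mid a, e, m)$$
where A is the set of all possible alignments.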
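Here is a minimal Python sketch of the relative-frequency estimate $P(f \mid e) = \mathrm{freq}(e, f) / \mathrm{freq}(e)$. It assumes a hypothetical list of already-aligned phrase pairs. This estimate is usable for short phrases, which recur in a corpus, but not for whole sentences.

```python
# Relative-frequency estimate P(f | e) = freq(e, f) / freq(e)
# over a hypothetical list of aligned (e, f) phrase pairs.
from collections import Counter

phrase_pairs = [
    ("the program", "le programme"),
    ("the program", "le programme"),
    ("the program", "ce programme"),
    ("has been", "a été"),
]

pair_counts = Counter(phrase_pairs)
e_counts = Counter(e for e, _ in phrase_pairs)

def relative_frequency(f, e):
    """P(f | e) = freq(e, f) / freq(e)."""
    return pair_counts[(e, f)] / e_counts[e]

print(relative_frequency("le programme", "the program"))  # 2/3
```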
14
Breakthroughs in MT: Statistical, Cont. Alignment
Once such a model is defined, the probability of any alignment given the sentence pair is
$$P(a \mid f, e, m) = \frac{P(f, a \mid e, m)}{\sum_{a' \in A} P(f, a' \mid e, m)}$$
For a given (e, f) pair, we can therefore compute the most likely alignment. Nowadays, the original IBM models are rarely (if ever) used for translation, but they play an important role in recovering word alignments between parallel sentences.
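The Python sketch below computes the alignment posterior by brute force. It enumerates every alignment in A and normalizes the joint probabilities. For the joint model it assumes IBM Model 1, as defined on the following slides, with a hypothetical lexical table `t` and a NULL word at position 0. Enumeration is exponential in sentence length, so it only suits toy inputs. For Model 1 the posterior factorizes per French word, which is why practical aligners do not enumerate.

```python
# Brute-force alignment posterior P(a | f, e, m) under IBM Model 1.
from itertools import product

# Hypothetical lexical translation table t(f | e); "NULL" is e_0.
t = {
    ("la", "NULL"): 0.2, ("la", "the"): 0.7, ("la", "house"): 0.1,
    ("maison", "NULL"): 0.1, ("maison", "the"): 0.1, ("maison", "house"): 0.8,
}

def joint(f, a, e):
    """P(f, a | e, m) = 1/(l+1)^m * prod_j t(f_j | e_{a_j}); e[0] is NULL."""
    l, m = len(e) - 1, len(f)
    p = 1.0 / (l + 1) ** m
    for f_j, a_j in zip(f, a):
        p *= t[(f_j, e[a_j])]
    return p

def alignment_posterior(f, e):
    """Return {a: P(a | f, e, m)} over all of A, plus P(f | e, m)."""
    A = product(range(len(e)), repeat=len(f))  # each f_j picks a position 0..l
    joints = {a: joint(f, a, e) for a in A}
    total = sum(joints.values())               # P(f | e, m)
    return {a: p / total for a, p in joints.items()}, total

e = ["NULL", "the", "house"]
f = ["la", "maison"]
posterior, p_f = alignment_posterior(f, e)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 2))  # (1, 2) 0.56
```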
15
Breakthroughs in MT: Cont. Alignment
16
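Breakthroughs in MT: Statistical, Cont. IBM Model 1
In IBM Model 1, all alignments $a$ are equally likely:
$$P(a \mid e, m) = \frac{1}{(l+1)^m}$$
where e = null, $e_1, e_2, \ldots, e_l$ and f = $f_1, f_2, f_3, \ldots, f_m$. The extra null word $e_0$ lets a French word align to no English word, so each of the m French words has l + 1 choices.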
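As a worked count, take the example pair from the earlier slide, which has 6 English words and 7 French words. The uniform prior is then spread over a large set of alignments:

```latex
l = 6,\; m = 7 \;\Rightarrow\; |A| = (l+1)^m = 7^7 = 823\,543,
\qquad P(a \mid e, m) = \frac{1}{823\,543} \quad \text{for every } a \in A.
```

The good alignment {2, 3, 4, 5, 6, 6, 6} and the bad one {1, 1, 1, 1, 1, 1, 1} therefore receive the same prior. Only the translation probabilities can tell them apart.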
17
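Breakthroughs in MT: Statistical, Cont. IBM Model 1
Next, we need an estimate for $P(f \mid a, e, m)$. In Model 1:
$$P(f \mid a, e, m) = \prod_{j=1}^{m} t(f_j \mid e_{a_j}) = t(\text{le} \mid \text{the}) \cdot t(\text{programme} \mid \text{program}) \cdots$$
where $t(f_j \mid e_{a_j})$ is the probability that the English word $e_{a_j}$ is translated as the French word $f_j$.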
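The following Python sketch evaluates the Model 1 product $\prod_j t(f_j \mid e_{a_j})$ for the example sentence pair under the alignment {2, 3, 4, 5, 6, 6, 6}. Every value in the table `t` is hypothetical. Trained values would come from a parallel corpus. Putting NULL at index 0 means the 1-based alignment positions index `e` directly.

```python
# P(f | a, e, m) = prod_j t(f_j | e_{a_j}) for the example sentence pair.
import math

e = ["NULL", "and", "the", "program", "has", "been", "implemented"]
f = ["le", "programme", "a", "été", "mis", "en", "application"]
a = [2, 3, 4, 5, 6, 6, 6]

# Hypothetical lexical translation probabilities t(f | e).
t = {
    ("le", "the"): 0.5,
    ("programme", "program"): 0.8,
    ("a", "has"): 0.4,
    ("été", "been"): 0.6,
    ("mis", "implemented"): 0.2,
    ("en", "implemented"): 0.1,
    ("application", "implemented"): 0.3,
}

def lexical_prob(f, a, e, t):
    """Multiply t(f_j | e_{a_j}) over all French positions j."""
    return math.prod(t[(f_j, e[a_j])] for f_j, a_j in zip(f, a))

print(lexical_prob(f, a, e, t))  # approximately 0.000576
```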
18
Breakthroughs in MT: Statistical, Cont. IBM Model 1
To generate a French string f from an English string e:
Step 1: Pick an alignment $a$ with probability $\frac{1}{(l+1)^m}$.
Step 2: Pick the French words with probability $P(f \mid a, e, m) = \prod_{j=1}^{m} t(f_j \mid e_{a_j})$.
Step 3: The final result is
$$P(f, a \mid e, m) = P(a \mid e, m) \cdot P(f \mid a, e, m) = \frac{1}{(l+1)^m} \prod_{j=1}^{m} t(f_j \mid e_{a_j})$$
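The three generation steps translate almost line for line into a sampler. This Python sketch assumes the French length m is given, as in the slides' conditioning on m. It also assumes a hypothetical nested table `t_table[e][f]` = t(f | e) that includes a NULL entry.

```python
# Sample (f, a) from IBM Model 1's generative story and score the result.
import random

# Hypothetical translation table t(f | e), stored as t_table[e][f].
t_table = {
    "NULL": {"la": 0.5, "de": 0.5},
    "the": {"la": 0.9, "le": 0.1},
    "house": {"maison": 1.0},
}

def generate(e, m, rng):
    """Sample f and a for English words e (e[0] == "NULL"); return f, a, P(f, a | e, m)."""
    l = len(e) - 1
    # Step 1: each a_j is uniform over 0..l, so P(a | e, m) = 1 / (l + 1) ** m.
    a = [rng.randrange(l + 1) for _ in range(m)]
    # Step 2: each f_j is drawn from t(. | e_{a_j}).
    f = []
    for a_j in a:
        words, probs = zip(*t_table[e[a_j]].items())
        f.append(rng.choices(words, weights=probs, k=1)[0])
    # Step 3: P(f, a | e, m) = 1 / (l + 1) ** m * prod_j t(f_j | e_{a_j}).
    p = 1.0 / (l + 1) ** m
    for f_j, a_j in zip(f, a):
        p *= t_table[e[a_j]][f_j]
    return f, a, p

f, a, p = generate(["NULL", "the", "house"], 2, random.Random(0))
print(f, a, p)
```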
19
Breakthroughs in MT: Neural (NMT)
NMT is a newer approach in which a machine learns to translate with one large neural network (many simple processing units loosely modeled on the brain). The approach has become increasingly popular among MT researchers and developers, as trained NMT systems have begun to outperform the phrase-based statistical approach on many language pairs.
20
Neural Machine Translation (NMT)
21
Bilingual NMT
22
Multilingual NMT - Previously
23
Google Multilingual NMT
24
Cont. Google Multilingual NMT
25
Cont. Google Multilingual NMT
26
Cont. Google Multilingual NMT
Test (Zero-shot)
27
Artificial token: a special token added at the beginning of the input sentence to indicate the target language.
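Preparing multilingual training data with this trick only requires rewriting the source side. In this Python sketch the `<2xx>` token spelling follows the example in Google's multilingual NMT paper as we recall it; treat it as an assumption. The function name and sentences are illustrative.

```python
def add_target_token(sentence: str, target_lang: str) -> str:
    """Prepend an artificial target-language token (assumed "<2xx>" spelling)."""
    return f"<2{target_lang}> {sentence}"

# One shared model sees mixed examples; only the token selects the output language.
batch = [
    add_target_token("Hello, how are you?", "es"),
    add_target_token("Hello, how are you?", "fr"),
]
print(batch)  # ['<2es> Hello, how are you?', '<2fr> Hello, how are you?']
```

The model architecture is unchanged. The token simply becomes part of the input vocabulary.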
28
Recurrent Neural Network (RNN)
Why not a feedforward NN?
29
Recurrent Neural Network (RNN)
Why RNN?
30
Recurrent Neural Network (RNN)
Cont. Why RNN?
31
Recurrent Neural Network (RNN)
What is an RNN?
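The slide's figure did not survive extraction. A common definition of a simple (Elman) RNN is $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b)$. The same weights are applied at every time step, so the network handles inputs of any length, and each state carries information from all earlier inputs. This Python sketch uses plain lists to stay dependency-free. All sizes and weights are hypothetical.

```python
# Simple (Elman) RNN forward pass with plain Python lists.
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rnn_step(x: Vector, h_prev: Vector, W_xh: Matrix, W_hh: Matrix, b: Vector) -> Vector:
    """One time step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    pre = [u + v + bi for u, v, bi in zip(matvec(W_xh, x), matvec(W_hh, h_prev), b)]
    return [math.tanh(z) for z in pre]

def run_rnn(inputs, W_xh, W_hh, b, h0):
    """Apply the same weights at every step and collect all hidden states."""
    states, h = [], h0
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states

# Hypothetical sizes: input dimension 1, hidden dimension 2.
W_xh = [[1.0], [0.5]]
W_hh = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]
states = run_rnn([[1.0], [0.0], [0.0]], W_xh, W_hh, b, h0=[0.0, 0.0])
print(states[-1])  # still nonzero: the first input influences the last state
```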
32
Long Short Term Memory Network (LSTM)
LSTMs are building units for the layers of a recurrent neural network (RNN). An RNN composed of LSTM units is often called an LSTM network. A common LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell is responsible for "remembering" values over arbitrary time intervals, while the three gates regulate the flow of information into and out of the cell.
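The following Python sketch implements one step of the LSTM unit just described, using the standard gate equations. Input, forget and output gates use a sigmoid, and the candidate uses tanh. The cell update is $c_t = f \odot c_{t-1} + i \odot g$ and the output is $h_t = o \odot \tanh(c_t)$. The parameter layout (`W`, `U`, `b` per gate) and all weight values are assumptions for illustration. The example saturates the forget gate open and the input gate shut, to show the cell keeping a value across steps.

```python
# One LSTM step (cell + input, forget, output gates) with plain Python lists.
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]  # gate name -> (W, U, b)

def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def gate(params: Params, name: str, x: Vector, h_prev: Vector, act) -> Vector:
    W, U, b = params[name]
    return [act(u + v + bi) for u, v, bi in zip(matvec(W, x), matvec(U, h_prev), b)]

def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector, params: Params):
    i = gate(params, "input", x, h_prev, sigmoid)        # how much new content to write
    f = gate(params, "forget", x, h_prev, sigmoid)       # how much of the old cell to keep
    o = gate(params, "output", x, h_prev, sigmoid)       # how much of the cell to expose
    g = gate(params, "candidate", x, h_prev, math.tanh)  # candidate new content
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Hypothetical 1-dimensional cell: forget gate saturated open, input gate shut.
params: Params = {
    "input": ([[0.0]], [[0.0]], [-100.0]),
    "forget": ([[0.0]], [[0.0]], [100.0]),
    "output": ([[0.0]], [[0.0]], [0.0]),
    "candidate": ([[1.0]], [[0.0]], [0.0]),
}
h, c = [0.0], [0.7]
for x in ([1.0], [-2.0], [3.0]):
    h, c = lstm_step(x, h, c, params)
print(c)  # the stored value 0.7 survives all three inputs
```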
33
Cont. Long Short Term Memory Network (LSTM)
34
Attention: Seq2Seq
The Seq2Seq framework relies on the encoder-decoder paradigm. The encoder encodes the input sequence into a fixed-size vector, and the decoder produces the target sequence from that vector.
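The encoder-decoder interface can be sketched in Python. The encoder folds the entire input into one vector, and the decoder emits tokens one at a time until an end marker. The embeddings, the tanh update and the lookup-table decoder are all hypothetical toys. A trained decoder would compute each next token from its state, whereas this one ignores it, so the sketch only shows the data flow.

```python
# Encoder-decoder (Seq2Seq) data flow with hypothetical toy components.
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

def encode(tokens: Sequence[str], embed: Dict[str, Vector],
           step: Callable[[Vector, Vector], Vector], h0: Vector) -> Vector:
    """Encoder: fold the whole input into one fixed-size vector (the last state)."""
    h = h0
    for tok in tokens:
        h = step(embed[tok], h)
    return h

def greedy_decode(h: Vector, decoder_step: Callable[[str, Vector], Tuple[Vector, str]],
                  bos: str = "<s>", eos: str = "</s>", max_len: int = 20) -> List[str]:
    """Decoder: emit one token at a time, feeding back the previous token."""
    out, prev = [], bos
    for _ in range(max_len):
        h, tok = decoder_step(prev, h)
        if tok == eos:
            break
        out.append(tok)
        prev = tok
    return out

embed = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}

def toy_step(x: Vector, h: Vector) -> Vector:
    return [math.tanh(xi + hi) for xi, hi in zip(x, h)]

NEXT = {"<s>": "die", "die": "katze", "katze": "</s>"}

def toy_decoder_step(prev: str, h: Vector) -> Tuple[Vector, str]:
    return h, NEXT[prev]  # ignores h; a trained decoder would condition on it

summary = encode(["the", "cat"], embed, toy_step, [0.0, 0.0])
print(len(summary), greedy_decode(summary, toy_decoder_step))  # 2 ['die', 'katze']
```

However long the input, `summary` has the same size. That fixed-size bottleneck is the concern behind the attention mechanism.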
35
Cont. Attention
[Figure: encoder-decoder example translating "The cat eats the mouse" into German ("Die Katze …"), with encoder hidden states h0–h6 and decoder states d.]
36
Problem with the LSTM Encoder-Decoder Architecture
It seems somewhat unreasonable to assume that we can encode all the information about a potentially very long sentence into a single vector and then have the decoder produce a good translation based only on that. Say the source sentence is 50 words long. The first word of the English translation is probably highly correlated with the first word of the source sentence. That means the decoder has to consider information from 50 steps ago, and that information must somehow be encoded in the vector.
37
Cont. Attention
Solution: Random Access Memory. Instead of relying on one fixed vector, the decoder can look back at any of the encoder's hidden states.
38
Cont. Attention
To decide which word to produce next, combine all of the encoder's hidden states, each weighted by how much attention we are paying to the corresponding position in the source sentence.
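The weighted combination breaks into the three steps named in the slide titles that follow: scoring, normalization and context vector. This Python sketch assumes the dot product as the score function, which is only one of several options. The toy vectors are illustrative.

```python
# Attention over encoder states: score -> softmax -> weighted sum.
import math
from typing import List

Vector = List[float]

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def attention(decoder_state: Vector, encoder_states: List[Vector]):
    # 1. Scoring: compare the decoder state with every encoder hidden state.
    scores = [dot(decoder_state, h) for h in encoder_states]
    # 2. Normalization: softmax turns scores into weights that sum to 1.
    weights = softmax(scores)
    # 3. Context vector: attention-weighted sum of the encoder hidden states.
    dim = len(encoder_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention([2.0, 0.0], encoder_states)
print(weights, context)  # most weight goes to states aligned with the query
```

The context vector is recomputed at every decoder step. Each output word can therefore draw on different source positions, rather than relying on a single final encoder state.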
39
Cont. Attention: Scoring
40
Cont. Attention: Normalization
41
Cont. Attention: Context vector
42
Cont. Attention: Score Function
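The content of the scoring, normalization, context-vector and score-function slides did not survive extraction, so which variant the slides used is not recoverable. For reference, these are the commonly cited score functions: the dot, general and concat forms of Luong et al. (2015), and the scaled dot product of "Attention Is All You Need". Here $h_t$ is the current decoder state, $\bar h_s$ the encoder state at source position $s$, $W_a$ and $v_a$ learned parameters, and $d$ the state dimension.

```latex
\operatorname{score}(h_t, \bar h_s) =
\begin{cases}
h_t^\top \bar h_s & \text{dot} \\
h_t^\top W_a \bar h_s & \text{general} \\
v_a^\top \tanh\left(W_a [h_t; \bar h_s]\right) & \text{concat} \\
h_t^\top \bar h_s / \sqrt{d} & \text{scaled dot product}
\end{cases}
\qquad
\alpha_{ts} = \frac{\exp\left(\operatorname{score}(h_t, \bar h_s)\right)}{\sum_{s'} \exp\left(\operatorname{score}(h_t, \bar h_{s'})\right)},
\qquad
c_t = \sum_s \alpha_{ts}\, \bar h_s
```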
43
44
Thank You!