1 COMP791A: Statistical Language Processing Machine Translation and Statistical Alignment Chap. 13.


2 Contents 1- Machine Translation 2- Statistical Machine Translation 3- Text Alignment  Length-based methods  Offset alignment by signal processing techniques  Lexical methods of sentence alignment 4- Word Alignment

3 Goal of MT Text1 in source language --> Text2 in target language Where:  meaning(text2) == meaning(text1) i.e. faithful  text2 is perfectly grammatical and idiomatic i.e. fluent MT is very hard  translation programs available today do not perform very well

4 Little history of MT 1950’s  inspired by the code-breakers of WWII  Russian is just an encoded version of English  “We’ll have this up and running in a few years, it’ll be great, just give us lots of money” 1964  ALPAC report (Automatic Language Processing Advisory Committee) “…we do not have useful machine translation…” “…there is no immediate or predictable prospect of useful machine translation…”  Nearly sank funding for all of AI. 1990’s  DARPA funds research in MT  2 “competitive” approaches Statistical MT (IBM at TJ Watson Research Center) Rule-based MT(CMU, ISI, NMSU)  Regular competitions  And the winner was… Systran!

5 Difficulties in MT Different word order (SVO vs VSO vs SOV languages) “the black cat” (DT ADJ N) --> “le chat noir” (DT N ADJ) Many-to-many mapping between words in different languages “John knows Bill.” --> “John connaît Bill.” “John knows Bill will be late.” --> “John sait que Bill sera en retard.” Overlapping of word senses [Figure: English leg, foot, paw linked to French patte, étape, jambe, pied, with each link labelled by a sense: human, journey, chair, animal, bird]

6 The Transfer Metaphor analysis --> transfer --> generation Each arrow can be implemented with rule-based methods or probabilistically
Interlingua: attraction(NamedJohn, NamedMary, high) (knowledge transfer)
English Semantics: loves(John, Mary) <--> French Semantics: aime(Jean, Marie) (semantic transfer)
English Syntax: S(NP(John) VP(loves, NP(Mary))) <--> French Syntax: S(NP(Jean) VP(aime, NP(Marie))) (syntactic transfer)
English Words: John loves Mary <--> French Words: Jean aime Marie (word transfer, memory-based translation)

7 Syntactic transfer Solves some problems…  Word order  Some cases of lexical choice Ex:  Dictionary of analysis know: verb ; transitive ; subj: human ; obj: NP || Sentence  Dictionary of transfer know + obj [NP] --> connaître know + obj [sentence] --> savoir But syntax is not enough…  No one-to-one correspondence between syntactic structures in different languages (syntactic mismatch)

8 2-Statistical MT: Being faithful & fluent Often impossible to have a true translation; one that is:  Faithful to the source language, and  Fluent in the target language  Ex: Japanese: “fukaku hansei shite orimasu” Fluent translation: “we apologize” Faithful translation: “we are deeply reflecting (on our past behaviour, and what we did wrong, and how to avoid the problem next time)” So need to compromise between faithfulness & fluency Statistical MT tries to maximise some function that represents the importance of faithfulness and fluency  Best-translation T*= argmax T fluency(T) x faithfulness(T, S)

9 The Noisy Channel Model Statistical MT is based on the noisy channel model Developed by Shannon to model communication (ex. over a phone line) Noisy channel model in SMT (ex. en|fr):  Assume that the true text is in English  But when it was transmitted over the noisy channel, it somehow got corrupted and came out in French i.e. the noisy channel has deformed/corrupted the original English input into French So really… French is a form of noisy English  The task is to recover the original English sentence (or to decode the French into English)

10 Fundamental Equation for SMT Assume we are translating from FR-->EN (en|fr) Intuitively we saw that: e* = argmax e fluency(e) x faithfulness(e, f) More formally: e* = argmax e P(e|f) By Bayes theorem: e* = argmax e [P(f|e) x P(e)] / P(f) But P(f) is the same for all e, so: e* = argmax e P(f|e) x P(e) This may seem circular… why not just model P(e|f) directly?  P(f|e) x P(e) allows us to have a sloppy translation model  Hopefully P(e) will correct the mistakes of the translation model

11 Example of SMT (en|jp) Source sentence (Japanese): “2000men taio” Translation model candidates for each source word, from more probable (top) to less probable (bottom):
2000men        taio
2000           correspondence
Year 2000      corresponding
Y2K            equivalent
200 years      tackle
200 year       deal with
…              …
From the Translation model: “2000 correspondence” is the best translation But the Language model: “2000 correspondence” is not frequent at all so overall: “dealing with Y2K” is the best translation! (maximizes their product)
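The interplay between the two models on this slide can be sketched in a few lines of Python. This is only an illustration: the candidate strings echo the slide, but every probability below is a made-up number, and `decode` is a hypothetical helper that scores an explicit candidate list rather than searching.

```python
# Toy noisy-channel scoring: pick the e that maximizes P(f|e) * P(e).
# All probabilities are invented for illustration; they are not from the slides.

translation_model = {  # assumed P(f | e) for the Japanese source "2000men taio"
    "2000 correspondence": 0.30,
    "dealing with Y2K": 0.10,
    "200 years tackle": 0.05,
}

language_model = {  # assumed P(e): how fluent each English candidate is
    "2000 correspondence": 0.0001,
    "dealing with Y2K": 0.01,
    "200 years tackle": 0.00001,
}


def decode(candidates, tm, lm):
    """Return the candidate with the highest P(f|e) * P(e)."""
    return max(candidates, key=lambda e: tm[e] * lm[e])


best_by_tm = max(translation_model, key=translation_model.get)
best_overall = decode(translation_model, translation_model, language_model)
print(best_by_tm)    # favoured by the translation model alone
print(best_overall)  # favoured once the language model is included
```

With these assumed numbers the translation model alone prefers "2000 correspondence", while the product prefers "dealing with Y2K", which is the behaviour the slide describes.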

12 We need 3 things (for en|fr): 1. A Language Model of English: P(e)  Measures fluency  Probability of an English sentence  We can do this with an n-gram or PCFG  ~ Provides the right word ordering and collocations  ~ Provides a set of fluent sentences to test for potential translation 2. A Translation Model: P(f|e)  Measures faithfulness  Probability of a (French, English) pair  We can do this with text (word) alignment of parallel corpora  ~ Provides the right bag of words  ~ Tests if a given fluent sentence is a translation 3. A Decoder: argmax  An effective and efficient search technique to find e*  Usually we use a heuristic search

13 We need a Language Model P(e) seen in class…

14 We need 3 things (for en|fr): 1. A Language Model of English: P(e)  Measures fluency  Probability of an English sentence  We can do this with an n-gram or PCFG  ~ Provides the right word ordering and collocations  ~ Provides a set of fluent sentences to test for potential translation 2. --> A Translation Model: P(f|e)  Measures faithfulness  Probability of a (French, English) pair  We can do this with text (word) alignment of parallel corpora  ~ Provides the right bag of words  ~ Tests if a given fluent sentence is a translation 3. A Decoder: argmax  An effective and efficient search technique to find e*  Usually we use a heuristic search

15 We need a translation model P(f|e) ex: IBM model 3 Probability of an FR sentence being a translation of an EN sentence  ~ the product of the probabilities that each FR word is the translation of some EN word  unigram translation model  ex: P(le chien est mort | the dog is dead) = P(le|the) x P(chien|dog) x P(est|is) x P(mort|dead) So we need to know, for each FR word, the probability of it mapping to each possible EN word But where do we get these probabilities?
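The unigram translation model above is just a product of word-level probabilities, which the following Python sketch makes concrete. The table `t` and its values are hypothetical, and the function assumes the simplest possible case: both sentences have the same length and word i translates word i (fertility and reordering are handled on later slides).

```python
from math import prod

# Hypothetical word translation table: t[(fr, en)] = P(fr | en).
# The values are invented for illustration only.
t = {
    ("le", "the"): 0.4,
    ("chien", "dog"): 0.8,
    ("est", "is"): 0.7,
    ("mort", "dead"): 0.6,
}


def unigram_translation_prob(fr_words, en_words, table):
    """P(f | e) as a product of per-word probabilities, word i paired with word i."""
    if len(fr_words) != len(en_words):
        raise ValueError("this sketch assumes a 1-to-1, same-order alignment")
    # Unseen word pairs get probability 0.0 (no smoothing in this sketch).
    return prod(table.get((f, e), 0.0) for f, e in zip(fr_words, en_words))


p = unigram_translation_prob(
    "le chien est mort".split(), "the dog is dead".split(), t
)
print(p)  # 0.4 * 0.8 * 0.7 * 0.6
```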

16 Parallel Texts Parallel texts or bitexts  Same content is available in several languages  Official documents of countries with multiple official languages -> literal, consistent Alignment  Paragraph to paragraph, sentence to sentence, word to word Language2 Section k Paragraph k Sentence k Phrase k Word k … Word m Language1 Section i Paragraph i Sentence i Phrase i Word i … Word j

17 Problem 1: Fertility word choice is not 1-to-1  ex: Je mange à la maison.--> I eat home. solution:  a word with fertility n gets copied n times, and for each of these n times, gets translated independently  ex: à la maison --> home à --> fertility 0 la --> fertility 0 maison --> fertility 1 use unigram translation model to translate maison-->home  ex: home --> à la maison home --> fertility 3 home home home --> à la maison note: the translation model will give the same probability to: home home home --> maison à la… it is up to the language model to select the correct word order

18 Problem 2: Word order word order is not the same in both languages  ex: le chien brun --> the brown dog solution:  assign an offset to move words from their original position to their final position  ex: chien brun --> brown dog brown --> offset +1 dog --> offset -1  Making the offset dependent on the words would be too costly… so in IBM model 3, the offset depends only on:  the position of the word within the sentence  the lengths of the sentences in both languages P(offset=o | Position = p, EngLen = m, FrLen = n)  ex: brown dog  offset of brown = P(offset| 1,2,2)  ex: P(+1| 1,2,2) =.3 P(0| 1,2,2) =.6 P(-1| 1,2,2) =.1

19 An Example (en|fr)
Given the English:      The brown dog did not go home
Fertility Model:        1 1 1 1 2 1 3
Transformed English:    The brown dog did not not go home home home
Translation Model:      Le brun chien est n' pas allé à la maison
Offset Model:           0 +1 -1 +1 -1 0 0 0 0 0
A possible Translation: Le chien brun n' est pas allé à la maison
Then use Language Model P(e) to evaluate fluency of all possible translations
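The three steps of this example (fertility, word-for-word translation, offsets) can be traced with a small deterministic Python sketch. It is not IBM model 3 itself: the real model samples each step from learned distributions, whereas here the fertilities, the French words and the offsets are simply supplied by hand. The helper names `expand_by_fertility` and `apply_offsets` are hypothetical, and the offsets are chosen so that the output reads "Le chien brun n' est pas allé à la maison".

```python
def expand_by_fertility(words, fertilities):
    """Copy each word as many times as its fertility says (0 drops the word)."""
    expanded = []
    for word, n in zip(words, fertilities):
        expanded.extend([word] * n)
    return expanded


def apply_offsets(words, offsets):
    """Move the word at position i to position i + offsets[i]."""
    result = [None] * len(words)
    for i, (word, offset) in enumerate(zip(words, offsets)):
        j = i + offset
        if not 0 <= j < len(words) or result[j] is not None:
            raise ValueError(f"offset {offset} at position {i} is not valid")
        result[j] = word
    return result


english = "The brown dog did not go home".split()
fertilities = [1, 1, 1, 1, 2, 1, 3]
expanded = expand_by_fertility(english, fertilities)

# One French word per expanded English word, picked by hand for this sketch.
french = ["Le", "brun", "chien", "est", "n'", "pas", "allé", "à", "la", "maison"]
offsets = [0, +1, -1, +1, -1, 0, 0, 0, 0, 0]

translation = apply_offsets(french, offsets)
print(" ".join(expanded))
print(" ".join(translation))
```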

21 We need 3 things (for en|fr): 1. A Language Model of English: P(e)  Measures fluency  Probability of an English sentence  We can do this with an n-gram or PCFG  ~ Provides the right word ordering and collocations  ~ Provides a set of fluent sentences to test for potential translation 2. --> A Translation Model: P(f|e)  Measures faithfulness  Probability of a (French, English) pair  We can do this with text (word) alignment of parallel corpora  ~ Provides the right bag of words  ~ Tests if a given fluent sentence is a translation 3. --> A Decoder: argmax  An effective and efficient search technique to find e*  Usually we use a heuristic search

22 We needed a decoder  we can compute P(e|f) for any given pair of (en,fr) sentences… that's nice  but: what we really want is to find the English sentence that maximises P(e|f) given a French sentence  assume a vocabulary of 100,000 words in English: there are 10^(5n) possible English sentences of length n, and many alignments of each one, and many possible offsets …  we need a search algorithm (ex. A*)

23 3- Text alignment used to find P(f|e) not a trivial task Problems:  not always one sentence to one sentence: translators do not always translate one sentence in the input into one sentence in the output, although this is true in 90% of the cases.  crossing dependencies: the order of sentences is changed in the translation.  Large pieces of material can disappear

24 The Rosetta Stone Three scripts: Egyptian hieroglyphs, Egyptian Demotic, Greek. Carved in 196 BC, found in 1799, decoded in 1822

25 A modern Rosetta Stone: TransSearch

26 Example French: Quant aux eaux minérales et aux limonades, elles rencontrent toujours plus d’adeptes. En effet notre sondage fait ressortir des ventes nettement supérieures à celles de 1987, pour les boissons à base de cola notamment. English: According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting the growing popularity of these products. Cola drink manufacturers in particular achieved above-average growth rates. Note:  Re-ordering of phrases  Disappearance of phrases (they are implied in the French version)

27 Aligning sentence and paragraph A bead is an n:m grouping of sentences  S, T : text in two languages  S = (s1, s2, …, si)  T = (t1, t2, …, tj)  Each sentence can occur in only one bead  Assume no crossing (but crossing occurs in reality)  Most common (90%): 1:1  But also: 0:1, 1:0, 2:1, 1:2, 2:2, 2:3, 3:2 … [Figure: S = s1 … si and T = t1 … tj partitioned into beads b1, b2, b3, b4, b5, …, bk]

28 Example: 2:2 alignment French: Quant aux eaux minérales et aux limonades, elles rencontrent toujours plus d’adeptes. En effet notre sondage fait ressortir des ventes nettement supérieures à celles de 1987, pour les boissons à base de cola notamment. English: According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting the growing popularity of these products. Cola drink manufacturers in particular achieved above-average growth rates.

29 Approaches to Text Alignment Length-Based Methods  short sentences will be translated with short sentences  long sentences will be translated with long sentences Offset Alignment by Signal Processing Techniques  do not attempt to align beads of sentences  just try to align position offsets in the two parallel texts Lexical Methods  use lexical information to align beads of sentences

30 Approaches to Text Alignment --> Length-Based Methods Offset Alignment by Signal Processing Techniques Lexical Methods

31 Rationale  Short sentence -> short sentence  Long sentence -> long sentence Length  number of words or number of characters Advantages:  Efficient (for similar languages)  Fast !

32 Length-based method Rationale: Short sentence -> short sentence / Long sentence -> long sentence Length: number of words or number of characters Advantages: Efficient (for similar languages) and fast! Gale and Church (1993):  Find alignment A with highest probability given the two parallel texts S and T.  Union Bank of Switzerland Corpus (English, French, German)  Let D(i,j) be the cost of the lowest-cost alignment (the distance) between sentences s1,…,si and t1,…,tj

33 Example (L1 = s1 s2 s3 s4, L2 = t1 t2 t3)
alignment 1: cost(align([s1, s2], t1)) + cost(align(s3, t2)) + cost(align(s4, t3))
alignment 2: cost(align(s1, t1)) + cost(align(s2, t2)) + cost(align(s3, ∅)) + cost(align(s4, t3))
Mean length ratio of sentences (number of characters) in a bead is ~1  German/English = 1.1  French/English = 1.06 Cost of an alignment  Calculate the difference (distance) between lengths of sentences in the beads  So as to minimize this distance  i.e. try to align beads so that the lengths of the sentences from the 2 languages in each bead are as similar as possible.

34 Results Gale and Church (1993)  use Dynamic Programming to efficiently consider all possible alignments and find the minimum cost alignment  method performs well (at least on related languages): 4% error rate overall; only 2% error rate on 1:1 alignments; higher error rate on more difficult alignments  Assumes paragraph alignment  Without a paragraph alignment, error rate triples
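The dynamic-programming idea behind length-based alignment can be sketched as follows. This is a simplified stand-in, not Gale and Church's actual cost function: their cost is probabilistic, whereas this sketch assumes a cost equal to the absolute difference in character lengths plus a fixed, made-up penalty per bead type. The names `BEAD_PENALTY` and `align_by_length` and the sentence lengths are all hypothetical.

```python
import math

# Assumed penalties per bead type (source sentences, target sentences).
# 1:1 beads are free; rarer bead types cost more. Values are illustrative only.
BEAD_PENALTY = {
    (1, 1): 0.0, (1, 0): 10.0, (0, 1): 10.0,
    (2, 1): 3.0, (1, 2): 3.0, (2, 2): 5.0,
}


def align_by_length(src_lens, tgt_lens, ratio=1.0, penalty=BEAD_PENALTY):
    """Return (cost, beads) of the cheapest non-crossing alignment.

    D[i][j] is the cost of the best alignment of the first i source
    sentences with the first j target sentences.
    """
    n, m = len(src_lens), len(tgt_lens)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (a, b), pen in penalty.items():
                if i < a or j < b or D[i - a][j - b] == math.inf:
                    continue
                s = sum(src_lens[i - a:i])
                t = sum(tgt_lens[j - b:j])
                cost = D[i - a][j - b] + abs(ratio * s - t) + pen
                if cost < D[i][j]:
                    D[i][j] = cost
                    back[i][j] = (a, b)
    # Follow the back-pointers from the end to recover the beads.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        a, b = back[i][j]
        beads.append((list(range(i - a, i)), list(range(j - b, j))))
        i, j = i - a, j - b
    beads.reverse()
    return D[n][m], beads


# Character lengths of four source and three target sentences (made up).
cost, beads = align_by_length([20, 15, 40, 30], [36, 41, 29])
print(cost, beads)
```

On these made-up lengths the cheapest path groups the first two source sentences with the first target sentence (a 2:1 bead) and aligns the rest 1:1, mirroring alignment 1 of the example slide. The `ratio` parameter is where a mean length ratio such as 1.1 or 1.06 would go.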

35 Approaches to Text Alignment Length-Based Methods --> Offset Alignment Lexical Methods

36 Offset alignment Length-based methods work well on clean texts but may break down in real-world situations  Ex: noisy text (OCR output with no clear sentence or paragraph boundaries,…) Church (1993)  Goal: Showing roughly what offset in one text aligns with what offset in the other.  uses cognates (words that are similar across languages) Ex: proper names, numbers, common ancestors…  Ex: Smith, 848-3000, superior/supérieur  But: uses cognates at the level of character sequences NOT at the word level  Build a dot-plot

37 Sample Dot Plot  the source text (S) and the translated text (T) are concatenated  a square graph is made with this text on both axes  a dot is placed at (x,y) when there is a match [Unit = character 4-grams] [Figure: the main diagonal is the perfect match of a text with itself; the off-diagonal quadrants show the match of a text with its translation (cognates)] The small diagonals provide an alignment in terms of offsets in the two texts
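A bare-bones version of the dot-plot construction can be written in Python as below. It only collects the raw matches of character 4-grams; any weighting, filtering or smoothing that a full system would apply to the dots is left out. The function name `dot_plot` and the two sample strings are hypothetical.

```python
from collections import defaultdict


def dot_plot(text_x, text_y, n=4):
    """Return (x, y) pairs where the n-gram at text_x[x:] equals the one at text_y[y:]."""
    positions = defaultdict(list)
    for y in range(len(text_y) - n + 1):
        positions[text_y[y:y + n]].append(y)
    dots = []
    for x in range(len(text_x) - n + 1):
        for y in positions.get(text_x[x:x + n], ()):
            dots.append((x, y))
    return dots


source = "Mr. Smith called 848-3000"
target = "M. Smith a appelé le 848-3000"

# Cross-language quadrant: cognates such as "Smith" and "848-3000" produce dots.
cross = dot_plot(source, target)

# As on the slide: concatenate both texts and plot the result against itself.
combined = source + target
full = dot_plot(combined, combined)
print(len(cross), len(full))
```

In the `full` plot every position matches itself, which gives the main diagonal; the dots contributed by `cross` appear shifted by `len(source)` in the off-diagonal quadrant.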

38 Approaches to Text Alignment Length-Based Methods Offset Alignment by Signal Processing Techniques --> Lexical Methods

39 Lexical methods Align beads of sentences using lexical information Kay and Röscheisen (1993)  Idea: Use word alignment to help determine sentence alignment Then use sentence alignment to refine word alignment,…  Method: 1. Begin with start and end of text as anchors 2. Form an envelope of all possible alignments (no crossing of anchors) where: Possible alignments must be at a certain distance away from the anchors The distance increases as we get further away from the anchors 3. Choose pairs of words that co-occur in these potential alignments 4. Pick the best sentences involved in step 3 (having the most lexical correspondences) and use them as new anchors 5. Repeat steps 2-4

40 Example [Figure: grid of sentences of language 1 (rows 1-16) against sentences of language 2 (columns 1-19); dots mark the candidate sentence pairs inside the envelope]

41-44 Example (con’t) [Figures: the same grid on four further slides; the total number of dots falls from 72 on slide 40 to 34 on slide 44]

45 Word Alignment Usually done in two steps: 1. Do sentence/text alignment 2. Select words from aligned pairs and use frequency or chi-square to see if they co-occur more frequently English: In the beginning God created the heavens and the earth. Vietnamese: Ban dâu Ðúc Chúa Tròi dung nên tròi dât. English: God called the expanse heaven. Vietnamese: Ðúc Chúa Tròi dat tên khoang không la tròi. English: … you are this day like the stars of heaven in number. Vietnamese: … các nguoi dông nhu sao trên tròi. Can also use an existing bilingual dictionary to start the word-alignment
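Step 2 can be illustrated with a chi-square score computed over aligned sentence pairs. The sketch below builds the usual 2x2 contingency table (both words present, only one present, neither present) and applies the standard chi-square formula for such a table. The tiny English/French corpus and the function name `chi_square` are made up for the example; they are not the slide's data.

```python
def chi_square(pairs, src_word, tgt_word):
    """Chi-square association between two words over aligned sentence pairs."""
    a = b = c = d = 0
    for src, tgt in pairs:
        in_src = src_word in src
        in_tgt = tgt_word in tgt
        if in_src and in_tgt:
            a += 1  # both words occur in this aligned pair
        elif in_src:
            b += 1  # only the source word occurs
        elif in_tgt:
            c += 1  # only the target word occurs
        else:
            d += 1  # neither occurs
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a word occurs in all pairs or in none: no evidence
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical aligned corpus, each sentence stored as a set of tokens.
corpus = [
    ("the dog sleeps", "le chien dort"),
    ("the cat sleeps", "le chat dort"),
    ("a dog barks", "un chien aboie"),
    ("a cat eats", "un chat mange"),
]
pairs = [(set(en.split()), set(fr.split())) for en, fr in corpus]

print(chi_square(pairs, "dog", "chien"))  # strongly associated
print(chi_square(pairs, "dog", "le"))     # no association
```

One caveat on the design: the statistic is symmetric, so a pair that systematically avoids each other (here "dog" and "chat") scores as high as a true translation pair. A practical filter would also require that the words co-occur more often than chance, i.e. a*d > b*c.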
