
1 COMP791A: Statistical Language Processing Machine Translation and Statistical Alignment Chap. 13

2 Contents 1- Machine Translation 2- Statistical Machine Translation 3- Text Alignment  Length-based methods  Offset alignment by signal processing techniques  Lexical methods of sentence alignment 4- Word Alignment

3 Goal of MT: translate Text1 in the source language into Text2 in the target language, where:  meaning(text2) == meaning(text1), i.e. faithful  text2 is perfectly grammatical and idiomatic, i.e. fluent MT is very hard  translation programs available today do not perform very well

4 A little history of MT 1950’s  inspired by the code-breakers of WWII  “Russian is just an encoded version of English”  “We’ll have this up and running in a few years, it’ll be great, just give us lots of money” 1964  ALPAC report (Automatic Language Processing Advisory Committee): “…we do not have useful machine translation…” “…there is no immediate or predictable prospect of useful machine translation…”  Nearly sank funding for all of AI. 1990’s  DARPA funds research in MT  2 “competitive” approaches: Statistical MT (IBM at TJ Watson Research Center) and Rule-based MT (CMU, ISI, NMSU)  Regular competitions  And the winner was… Systran!

5 Difficulties in MT Different word order (SVO vs VSO vs SOV languages)  “the black cat” (DT ADJ N) --> “le chat noir” (DT N ADJ) Many-to-many mapping between words in different languages  “John knows Bill.” --> “John connaît Bill.”  “John knows Bill will be late.” --> “John sait que Bill sera en retard.” Overlapping of word senses [figure: the senses of English leg / foot / paw (human, journey, chair, animal, bird) map many-to-many onto French jambe / étape / patte / pied]

6 The Transfer Metaphor: analysis --> transfer --> generation. Each arrow can be implemented with rule-based methods or probabilistically. [figure: transfer pyramid]
Interlingua: attraction(NamedJohn, NamedMary, high)   [knowledge transfer]
English Semantics: loves(John, Mary) <--> French Semantics: aime(Jean, Marie)   [semantic transfer]
English Syntax: S(NP(John) VP(loves, NP(Mary))) <--> French Syntax: S(NP(Jean) VP(aime, NP(Marie)))   [syntactic transfer]
English Words: John loves Mary <--> French Words: Jean aime Marie   [word transfer (memory-based translation)]

7 Syntactic transfer Solves some problems…  Word order  Some cases of lexical choice Ex:  Dictionary of analysis: know: verb; transitive; subj: human; obj: NP || Sentence  Dictionary of transfer: know + obj[NP] --> connaître ; know + obj[Sentence] --> savoir (see the sketch below) But syntax is not enough…  No one-to-one correspondence between syntactic structures in different languages (syntactic mismatch)
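The transfer dictionary above amounts to a lookup keyed on the verb and the syntactic category of its object. A minimal Python sketch of that idea; the rule format, categories, and function names here are illustrative, not taken from any actual MT system:

# Minimal sketch of dictionary-based syntactic transfer for "know"
# (illustrative only: the rule table below is invented for this example).

TRANSFER_RULES = {
    # (English verb, syntactic category of its object) -> French verb
    ("know", "NP"): "connaître",       # "John knows Bill" -> "John connaît Bill"
    ("know", "Sentence"): "savoir",    # "John knows Bill will be late" -> "John sait que ..."
}

def transfer_verb(verb: str, obj_category: str) -> str:
    """Pick the French verb based on the category of the English object."""
    return TRANSFER_RULES.get((verb, obj_category), verb)  # fall back to the source word

print(transfer_verb("know", "NP"))        # connaître
print(transfer_verb("know", "Sentence"))  # savoir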

8 2- Statistical MT: Being faithful & fluent Often impossible to have a true translation; one that is:  Faithful to the source language, and  Fluent in the target language  Ex: Japanese: “fukaku hansei shite orimasu” Fluent translation: “we apologize” Faithful translation: “we are deeply reflecting (on our past behaviour, and what we did wrong, and how to avoid the problem next time)” So we need to compromise between faithfulness & fluency Statistical MT tries to maximise some function that represents the importance of faithfulness and fluency  Best translation: T* = argmax_T fluency(T) x faithfulness(T, S)

9 The Noisy Channel Model Statistical MT is based on the noisy channel model Developed by Shannon to model communication (ex. over a phone line) Noisy channel model in SMT (ex. en|fr):  Assume that the true text is in English  But when it was transmitted over the noisy channel, it somehow got corrupted and came out in French i.e. the noisy channel has deformed/corrupted the original English input into French So really… French is a form of noisy English  The task is to recover the original English sentence (or to decode the French into English)

10 Fundamental Equation for SMT Assume we are translating from FR --> EN (en|fr). Intuitively we saw that: e* = argmax_e fluency(e) x faithfulness(e, f) More formally: e* = argmax_e P(e|f) By Bayes’ theorem: P(e|f) = P(f|e) x P(e) / P(f). But P(f) is the same for all e, so: e* = argmax_e P(f|e) x P(e) This may seem circular… why not just P(e|f)?  P(f|e) x P(e) allows us to have a sloppy translation model  Hopefully P(e) will correct the mistakes of the translation model
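The slide’s derivation, written out in LaTeX (nothing new, just a restatement of the equation above):

e^{*} = \operatorname*{argmax}_{e} P(e \mid f)
      = \operatorname*{argmax}_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
      = \operatorname*{argmax}_{e} P(f \mid e)\, P(e)
      \quad \text{since } P(f) \text{ does not depend on } e.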

11 Example of SMT (en|jp) Source sentence (Japanese): “2000men taio” Translation model candidates (from more probable to less probable):
2000men: 2000, Year 2000, Y2K, 200 years, 200 year, …
taio: correspondence, corresponding, equivalent, tackle, deal with, …
From the Translation model: “2000 correspondence” is the best translation. But from the Language model: “2000 correspondence” is not frequent at all. So overall: “dealing with Y2K” is the best translation! (maximizes their product)

12 We need 3 things (for en|fr): 1. A Language Model of English: P(e)  Measures fluency  Probability of an English sentence  We can do this with an n-gram or PCFG  ~ Provides the right word ordering and collocations  ~ Provides a set of fluent sentences to test for potential translation 2. A Translation Model: P(f|e)  Measures faithfulness  Probability of an (French, English) pair  We can do this with text (word) alignment of parallel corpora  ~ Provides the right bag of words  ~Tests if a given fluent sentence is a translation 3. A Decoder: argmax  An effective and efficient search technique to find e*  Usually we use a heuristic search

13 We need a Language Model P(e): seen in class…

14 We need 3 things (for en|fr): 1. A Language Model of English: P(e)  Measures fluency  Probability of an English sentence  We can do this with an n-gram or PCFG  ~ Provides the right word ordering and collocations  ~ Provides a set of fluent sentences to test for potential translation 2. --> A Translation Model: P(f|e)  Measures faithfulness  Probability of an (French, English) pair  We can do this with text (word) alignment of parallel corpora  ~ Provides the right bag of words  ~Tests if a given fluent sentence is a translation 3. A Decoder: argmax  An effective and efficient search technique to find e*  Usually we use a heuristic search

15 We need a translation model P(f|e) (ex: IBM Model 3) Probability of an FR sentence being a translation of an EN sentence  ~ the product of the probabilities that each FR word is the translation of some EN word  unigram translation model  ex: P(le chien est mort | the dog is dead) = P(le|the) x P(chien|dog) x P(est|is) x P(mort|dead) So we need to know, for each FR word, the probability of it mapping to each possible EN word. But where do we get these probabilities? (see the sketch below)
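A minimal Python sketch of this unigram translation model. The probability values below are invented for illustration; real values would be estimated from an aligned parallel corpus, as the following slides describe:

from math import prod

# Invented word-translation probabilities P(f_word | e_word), for illustration only.
T = {
    ("le", "the"): 0.4, ("chien", "dog"): 0.5,
    ("est", "is"): 0.6, ("mort", "dead"): 0.4,
}

def p_f_given_e(french, english):
    """Unigram translation model: product over aligned word pairs.
    Assumes a 1-to-1, in-order alignment for simplicity."""
    return prod(T.get((f, e), 1e-6) for f, e in zip(french, english))

p = p_f_given_e("le chien est mort".split(), "the dog is dead".split())
print(p)  # 0.4 * 0.5 * 0.6 * 0.4 = 0.048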

16 Parallel Texts Parallel texts or bitexts  Same content is available in several languages  Official documents of countries with multiple official languages -> literal, consistent translations Alignment  Paragraph to paragraph, sentence to sentence, word to word [figure: Language1 section i / paragraph i / sentence i / phrase i / word i … word j aligned to Language2 section k / paragraph k / sentence k / phrase k / word k … word m]

17 Problem 1: Fertility Word choice is not 1-to-1  ex: Je mange à la maison. --> I eat home. Solution:  a word with fertility n gets copied n times, and each of these n copies gets translated independently  ex: à la maison --> home: à --> fertility 0, la --> fertility 0, maison --> fertility 1; use the unigram translation model to translate maison --> home  ex: home --> à la maison: home --> fertility 3; home home home --> à la maison Note: the translation model will give the same probability to: home home home --> maison à la… it is up to the language model to select the correct word order

18 Problem 2: Word order Word order is not the same in both languages  ex: le chien brun --> the brown dog Solution:  assign an offset to move words from their original position to their final position  ex: chien brun --> brown dog: brown --> offset +1, dog --> offset -1  Making the offset dependent on the words themselves would be too costly… so in IBM Model 3, the offset depends only on: the position of the word within the sentence, and the lengths of the sentences in both languages: P(Offset = o | Position = p, EngLen = m, FrLen = n)  ex: brown dog: offset of brown ~ P(offset | 1, 2, 2), e.g. P(+1 | 1,2,2) = .3, P(0 | 1,2,2) = .6, P(-1 | 1,2,2) = .1

19 An Example (en|fr) Given the English: The brown dog did not go home --> [Fertility Model] --> Transformed English: The brown dog did not not go home home home --> [Translation Model] --> Le brun chien est n’ pas allé à la maison --> [Offset Model] --> A possible translation: Le chien brun n’est pas allé à la maison Then use the Language Model P(e) to evaluate the fluency of all possible translations

20 Summary: IBM Model 3 for (en|fr) To find P(e|f), we need:
1. A language model for English P(e): P(wordE_i | wordE_i-1)
2. A translation model P(f|e), made up of:
   a. the word translation model per se: P(wordF | wordE)
   b. the fertility model for English: P(Fertility = n | wordE)
   c. the offset model for French: P(Offset = o | pos, lenF, lenE)
(see the sketch below)
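Putting the sub-models together, here is a minimal sketch of how one fixed (English, French, alignment) triple could be scored in the style of this simplified Model 3. All probability tables are invented toy values, and real Model 3 has machinery (NULL-generated words, normalization) that is omitted here:

# Score P(f | e) for ONE given alignment, in the style of the simplified
# IBM Model 3 of these slides. All probabilities are invented; real models
# estimate them from aligned parallel text (via EM).

fertility = {"brown": {1: 0.9}, "dog": {1: 0.9}}             # P(n | wordE)
translate = {("brun", "brown"): 0.8, ("chien", "dog"): 0.7}  # P(wordF | wordE)
offset = {
    (+1, 1, 2, 2): 0.3, (0, 1, 2, 2): 0.6, (-1, 1, 2, 2): 0.1,  # from the slide
    (-1, 2, 2, 2): 0.3, (0, 2, 2, 2): 0.6, (+1, 2, 2, 2): 0.1,  # invented
}
# offset key: (o, position, lenE, lenF), as in P(Offset=o | pos, lenE, lenF)

def score(english, french, alignment):
    """alignment[j] = index i of the English word that produced french[j]."""
    p = 1.0
    counts = {}                              # how many French words each e-word produced
    for j, i in enumerate(alignment):
        e, f = english[i], french[j]
        p *= translate.get((f, e), 1e-6)     # word translation probability
        o = j - i                            # offset moving e-position i to f-position j
        p *= offset.get((o, i + 1, len(english), len(french)), 1e-6)
        counts[i] = counts.get(i, 0) + 1
    for i, e in enumerate(english):          # fertility of each English word
        p *= fertility.get(e, {}).get(counts.get(i, 0), 1e-6)
    return p

# "brown dog" -> "chien brun": brown moves +1, dog moves -1
print(score(["brown", "dog"], ["chien", "brun"], alignment=[1, 0]))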

21 We need 3 things (for en|fr): 1. A Language Model of English: P(e)  Measures fluency  Probability of an English sentence  We can do this with an n-gram or PCFG  ~ Provides the right word ordering and collocations  ~ Provides a set of fluent sentences to test for potential translation 2. --> A Translation Model: P(f|e)  Measures faithfulness  Probability of an (French, English) pair  We can do this with text (word) alignment of parallel corpora  ~ Provides the right bag of words  ~Tests if a given fluent sentence is a translation 3. --> A Decoder: argmax  An effective and efficient search technique to find e*  Usually we use a heuristic search

22 We needed a decoder  we can compute P(e|f) for any given pair of (en, fr) sentences… that’s nice  but what we really want is to find the English sentence that maximises P(e|f) given a French sentence  assume a vocabulary of 100,000 words in English: there are 100,000^n = 10^(5n) possible English sentences of length n… and many alignments of each one, and many possible offsets…  we need a search algorithm (ex. A*) (see the toy sketch below)
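A toy illustration of what the decoder computes, brute-forcing the argmax over a tiny hand-built candidate set. Real decoders search an astronomically larger space with heuristics such as A* or beam search; the two stand-in models and all probabilities below are invented:

# Toy "decoder": brute-force argmax of P(f|e) x P(e) over a tiny candidate set.

def lm(e):    # stand-in language model P(e): prefers fluent word order
    return {"the dog is dead": 0.01, "dead is the dog": 0.0001}.get(e, 1e-8)

def tm(f, e):  # stand-in translation model P(f|e): bag-of-words check only
    return 0.05 if set(e.split()) == {"the", "dog", "is", "dead"} else 1e-8

def decode(f, candidates):
    return max(candidates, key=lambda e: tm(f, e) * lm(e))

# Both candidates are equally "faithful" (same bag of words), so the
# language model breaks the tie, exactly as the fertility slide promised.
print(decode("le chien est mort", ["the dog is dead", "dead is the dog"]))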

23 3- Text alignment Used to find P(f|e)… not a trivial task. Problems:  not always one sentence to one sentence: translators do not always translate one sentence in the input into one sentence in the output (although they do in 90% of the cases)  crossing dependencies: the order of sentences is changed in the translation  large pieces of material can disappear

24 The Rosetta Stone: Egyptian hieroglyphs, Egyptian Demotic, Greek; carved in 196 BC, found in 1799, decoded in 1822

25 A modern Rosetta Stone: TransSearch

26 Example Note:  Re-ordering of phrases  Disappearance of phrases (they are implied in the French version) French: Quand aux eaux minérales et aux limonades, elles rencontrent toujours plus d’adeptes. En effet notre sondage fait ressortir des ventes nettement supérieures à celles de 1987, pour les boissons à base de cola notamment. English: According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting their growing popularity of these products. Cola drink manufacturers in particular achieved above average growth rate.

27 Aligning sentences and paragraphs A bead is an n:m grouping  S, T: texts in two languages  S = (s_1, s_2, …, s_i)  T = (t_1, t_2, …, t_j)  Each sentence can occur in only one bead  Assume no crossing (but it occurs in reality)  Most common (90%): 1:1  But also: 0:1, 1:0, 2:1, 1:2, 2:2, 2:3, 3:2 … [figure: texts S and T segmented into a sequence of beads b_1 b_2 b_3 b_4 b_5 … b_k]

28 Example: a 2:2 alignment French: Quand aux eaux minérales et aux limonades, elles rencontrent toujours plus d’adeptes. En effet notre sondage fait ressortir des ventes nettement supérieures à celles de 1987, pour les boissons à base de cola notamment. English: According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting their growing popularity of these products. Cola drink manufacturers in particular achieved above average growth rate.

29 Approaches to Text Alignment Length-Based Methods  short sentences will be translated with short sentences  long sentences will be translated with long sentences Offset Alignment by Signal Processing Techniques  do not attempt to align beads of sentences  just try to align position offsets in the two parallel texts Lexical Methods  use lexical information to align beads of sentences

30 Approaches to Text Alignment --> Length-Based Methods Offset Alignment by Signal Processing Techniques Lexical Methods

31 Rationale  Short sentence -> short sentence  Long sentence -> long sentence Length  nb of words or nb of characters Advantages:  Efficient (for similar languages)  Fast!

32 Length-based method Rationale: short sentence -> short sentence / long sentence -> long sentence Length: nb of words or nb of characters Advantages: efficient (for similar languages) and fast! Gale and Church (1993):  Find the alignment A with the highest probability given the two parallel texts S and T  Union Bank of Switzerland Corpus (English, French, German)  Let D(i,j) be the lowest-cost alignment (the distance) between sentences s_1,…,s_i and t_1,…,t_j

33 Example Two possible alignments of s_1…s_4 (language L1) with t_1…t_3 (language L2):
alignment 1: cost(align(s_1, t_1)) + cost(align(s_2, t_2)) + cost(align(s_3, ∅)) + cost(align(s_4, t_3))
alignment 2: cost(align([s_1, s_2], t_1)) + cost(align(s_3, t_2)) + cost(align(s_4, t_3))
Mean length ratio of sentences (nb of characters) in a bead is ~1  German/English = 1.1, French/English = 1.06 Cost of an alignment  calculate the difference (distance) between the lengths of the sentences in the beads  so as to minimize this distance  i.e. try to align beads so that the lengths of the sentences from the 2 languages in each bead are as similar as possible (see the sketch below)
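A minimal sketch of the dynamic-programming search the next slide refers to. The cost function here is a crude length-difference penalty, not Gale and Church’s actual probabilistic cost (which models length ratios with a normal distribution), so treat it as illustrative only:

# Simplified length-based sentence alignment by dynamic programming,
# in the spirit of Gale & Church (1993). The cost function is a crude
# stand-in for their probabilistic length-ratio cost.

BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2), (2, 2)]  # allowed bead shapes
SKIP_PENALTY = 10.0   # invented penalty for 1:0 / 0:1 beads

def cost(len_s, len_t):
    """Toy cost: absolute difference of total character lengths in the bead."""
    if len_s == 0 or len_t == 0:
        return SKIP_PENALTY
    return abs(len_s - len_t)

def align_cost(src_lens, tgt_lens):
    """D(i,j) = lowest cost of aligning the first i source and j target sentences."""
    INF = float("inf")
    I, J = len(src_lens), len(tgt_lens)
    D = [[INF] * (J + 1) for _ in range(I + 1)]
    D[0][0] = 0.0
    for i in range(I + 1):
        for j in range(J + 1):
            for di, dj in BEADS:
                if i >= di and j >= dj and D[i - di][j - dj] < INF:
                    c = cost(sum(src_lens[i - di:i]), sum(tgt_lens[j - dj:j]))
                    if D[i - di][j - dj] + c < D[i][j]:
                        D[i][j] = D[i - di][j - dj] + c
    return D[I][J]

# sentence lengths in characters (invented)
print(align_cost([20, 22, 41], [21, 60]))  # 4.0: s1~t1 (1:1), then s2+s3~t2 (2:1)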

34 Results Gale and Church (1993)  use dynamic programming to efficiently consider all possible alignments and find the minimum-cost alignment  the method performs well (at least on related languages): 4% error rate overall, only 2% error rate on 1:1 alignments, higher error rate on the more difficult alignments  assumes paragraph alignment: without a paragraph alignment, the error rate triples

35 Approaches to Text Alignment Length-Based Methods --> Offset Alignment by Signal Processing Techniques Lexical Methods

36 Offset alignment Length-based methods work well on clean texts but may break down in real-world situations  ex: noisy text (OCR output with no clear sentence or paragraph boundaries, …) Church (1993)  Goal: show roughly what offset in one text aligns with what offset in the other  uses cognates (words that are similar across languages): proper names, numbers, common ancestors…  ex: Smith, …, superior/supérieur  but: uses cognates at the level of character sequences, NOT at the word level  build a dot-plot (see next slide)

37 Sample Dot Plot  the source and translated text are concatenated (S then T)  a square graph is made with this text on both axes  a dot is placed at (x, y) when there is a match [unit = character 4-grams]  the main diagonal is the perfect match of the text with itself  the off-diagonal squares (S vs T) show the match of a text with its translation (cognates)  the small diagonals there provide an alignment in terms of offsets in the two texts (see the sketch below)
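A minimal sketch of building such a dot plot from character 4-grams. Plotting is left out; the function just returns the dot coordinates, and the texts below are invented:

# Build dot-plot coordinates from character 4-grams, as on the slide:
# concatenate source and translation, then put a dot at (x, y) whenever
# the 4-grams starting at positions x and y are identical.

from collections import defaultdict

def dot_plot(source: str, translation: str, n: int = 4):
    text = source + translation
    positions = defaultdict(list)       # 4-gram -> all positions where it occurs
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    dots = []
    for pos_list in positions.values():
        for x in pos_list:              # every pair of matching positions is a dot
            for y in pos_list:
                dots.append((x, y))
    return dots

dots = dot_plot("the official languages", "les langues officielles")
print(len(dots))  # cognate 4-grams like "angu" and "ffic" produce off-diagonal dots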

38 Approaches to Text Alignment Length-Based Methods Offset Alignment by Signal Processing Techniques --> Lexical Methods

39 Lexical methods Align beads of sentences using lexical information Kay and Röscheisen (1993)  Idea: use word alignment to help determine sentence alignment, then use sentence alignment to refine word alignment, …  Method:
1. Begin with the start and end of the text as anchors
2. Form an envelope of all possible alignments (no crossing of anchors), where possible alignments must be within a certain distance of the anchors, and this distance increases as we get further away from the anchors
3. Choose pairs of words that co-occur in these potential alignments
4. Pick the best sentences involved in step 3 (those with the most lexical correspondences) and use them as new anchors
5. Repeat steps 2-4

40 Example [figure: matrix of sentences of language 1 vs sentences of language 2; dots mark the envelope of possible alignments around the diagonal]

41 Example (con’t) [figure: the same matrix with the envelope of candidate alignments]

42 Example (con’t) [figure: the envelope narrows after word correspondences are found]

43 Example (con’t) [figure: the best-matching sentence pairs are selected as new anchors]

44 Example (con’t) [figure: a narrower envelope is formed around the new anchors and the process repeats]

45 Word Alignment Usually done in two steps: 1. Do sentence/text alignment 2. Select words from the aligned pairs and use frequency or chi-square to see if they co-occur more often than expected English: In the beginning God created the heavens and the earth. Vietnamese: Ban dâu Ðúc Chúa Tròi dung nên tròi dât. English: God called the expanse heaven. Vietnamese: Ðúc Chúa Tròi dat tên khoang không la tròi. English: … you are this day like the stars of heaven in number. Vietnamese: … các nguoi dông nhu sao trên tròi. Can also use an existing bilingual dictionary to start the word alignment (see the sketch below)
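A minimal sketch of step 2, using a chi-square score over a 2x2 contingency table of co-occurrence counts in the aligned sentence pairs. The tiny bitext below is invented (loosely echoing the slide’s example); with real data you would use the aligned corpus itself:

# Chi-square association between a source word and a target word, computed
# from aligned sentence pairs. High scores suggest the words are associated
# (for translation pairs, we also expect them to actually co-occur, a > 0).

def chi_square(pairs, e_word, f_word):
    """pairs: list of (english_sentence_tokens, foreign_sentence_tokens)."""
    a = b = c = d = 0
    for e_sent, f_sent in pairs:
        e_in, f_in = e_word in e_sent, f_word in f_sent
        if e_in and f_in:   a += 1   # both occur
        elif e_in:          b += 1   # only the English word
        elif f_in:          c += 1   # only the foreign word
        else:               d += 1   # neither
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

# Invented toy bitext: "heaven" and "tròi" co-occur whenever either appears.
pairs = [
    ("like the stars of heaven".split(), "nhu sao trên tròi".split()),
    ("God called it heaven".split(), "Chúa Tròi dat tên la tròi".split()),
    ("the earth was empty".split(), "dât trông không".split()),
    ("God created the earth".split(), "Chúa Tròi dung nên dât".split()),
]
print(chi_square(pairs, "heaven", "tròi"))  # 4.0 -> strongly associated
print(chi_square(pairs, "heaven", "Chúa"))  # 0.0 -> no association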