
1 Machine Translation (MT)
Definition – Automatic translation of text or speech from one language to another
Goal – Produce close-to-error-free output that reads fluently in the target language – current systems are still far from it
Current status – Existing systems (e.g., Babel Fish Translation) are generally not good in quality – typically a mix of probabilistic and non-probabilistic components

2 Issues
– Build high-quality semantic-based MT systems in circumscribed domains
– Abandon fully automatic MT and build software to assist human translators instead – post-edit the output of a rough translation
– Develop automatic knowledge-acquisition techniques for improving general-purpose MT – supervised or unsupervised learning

3 Different Strategies for MT
– Word-for-word: English text (word string) ↔ French text (word string)
– Syntactic transfer: English syntactic parse ↔ French syntactic parse
– Semantic transfer: English semantic representation ↔ French semantic representation
– Knowledge-based translation: via an interlingua (knowledge representation)

4 Word-for-Word MT
Translate words one by one from one language to another (the earliest approach, ca. 1950)
– Problems
1. No one-to-one correspondence between words in different languages (lexical ambiguity) – need to look at context larger than the individual word (→ phrase or clause). E.g., English "suit" has two French translations, one with the meaning "lawsuit" and one with the meaning "set of garments"
2. Languages have different word orders
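To make the problems above concrete, here is a toy word-for-word translator (the lexicon is invented for illustration): it blindly picks the first listed translation, so "suit" always becomes "costume" (garment) even when "procès" (lawsuit) is meant, and it never reorders words.

```python
# Hypothetical toy English->French lexicon; ambiguous words list several senses.
LEXICON = {
    "the": ["le"],
    "suit": ["costume", "procès"],  # garment vs. lawsuit: lexically ambiguous
    "is": ["est"],
    "red": ["rouge"],
}

def word_for_word(sentence):
    """Translate by naively taking the first lexicon entry for each word."""
    return " ".join(LEXICON.get(w, [w])[0] for w in sentence.split())
```

For "the suit is red" this happens to produce grammatical French, but in a legal context the choice of "costume" over "procès" would be wrong — exactly the context sensitivity a word-for-word system cannot capture.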

5 Syntactic Transfer MT
Parse the source text, transfer the parse tree into a syntactic tree in the target language, and generate the translation from that tree
– Solves the problem of word ordering
– Problems
  Syntactic ambiguity
  The target syntax will likely mirror that of the source text. E.g., German "Ich esse gern" (N V Adv, "I like to eat") transfers to the unnatural English "I eat readily/gladly"

6 Semantic Transfer MT
Represent the meaning of the source sentence and generate the translation from that meaning
– Fixes cases of syntactic mismatch
– Problems
  The output can still be unnatural to the point of being unintelligible
  It is difficult to build translation systems for all pairs of languages
Example – Spanish: "La botella entró a la cueva flotando" (The bottle floated into the cave) → literal English: "The bottle entered the cave floating" (in Spanish the direction is expressed by the verb and the manner by a separate phrase)

7 Knowledge-Based MT
The translation is performed by way of a knowledge representation formalism called an "interlingua"
– Independent of the way particular languages express meaning
Problems
– Difficult to design an efficient and comprehensive knowledge representation formalism
– A large amount of ambiguity must be resolved to translate from a natural language into a knowledge representation language

8 Text Alignment: Definition
Definition – Align paragraphs, sentences, or words in one language with paragraphs, sentences, or words in another language
– From alignments we can learn which words tend to be translated by which words in the other language
– Alignment is not part of the MT process per se, but it is the obligatory first step for making use of multilingual text corpora
Applications – Bilingual lexicography – Machine translation – Multilingual information retrieval – Parallel grammars – …

9 Text Alignment: Sources and Granularities
Sources of parallel texts (bitexts)
– Parliamentary proceedings (Hansards)
– Newspapers and magazines
– Religious and literary works
Two levels of alignment
– Gross large-scale alignment: learn which paragraphs or sentences correspond to which paragraphs or sentences in the other language
– Word alignment: learn which words tend to be translated by which words in the other language; the necessary step for acquiring a bilingual dictionary from less literal translations
Note that the order of words or sentences might not be preserved in translation.

10 Text Alignment: Example 1 – a 2:2 alignment (figure not reproduced in this transcript)

11 Text Alignment: Example 2 – 2:2, 1:1, and 2:1 alignments; each aligned group of sentences is called a bead. Studies show that around 90% of alignments are 1:1 sentence alignments.

12 Sentence Alignment
Crossing dependencies are not allowed here – sentence ordering is assumed to be preserved
Related work is surveyed on the following slides

13 Sentence Alignment – three families of methods
– Length-based
– Lexicon-guided
– Offset-based

14 Sentence Alignment: Length-based Method
Rationale: short sentences will be translated as short sentences, and long sentences as long sentences
– Length is defined as the number of words or the number of characters
Approach 1 (Gale & Church 1993)
– Assumptions
  The paragraph structure is clearly marked in the corpus; confusions are checked by hand
  Sentence lengths are measured in characters
  Crossing dependencies are not handled – the order of sentences is not changed in the translation
– Evaluated on the Union Bank of Switzerland (UBS) corpus: English, French, and German
– Limitation: ignores the rich lexical information available in the text

15 Sentence Alignment: Length-based Method – Most cases are 1:1 alignments (alignment-type statistics table not reproduced in this transcript)

16 Sentence Alignment: Length-based Method
– The source sentences s1 … sI and the target sentences t1 … tJ are grouped into beads B1, B2, …, Bk
– Possible alignments for a bead: {1:1, 1:0, 0:1, 2:1, 1:2, 2:2, …}
– Assuming independence between beads, the probability of an alignment is the product of its bead probabilities

17 Sentence Alignment: Length-based Method
– Dynamic programming with a cost function (distance measure); the sentence is the unit of alignment
– Statistical modeling of character lengths: the scaled difference in length between two aligned portions is assumed to be normally distributed
  δ(l1, l2) = (l2 − l1·c) / sqrt(l1·s²)
  where c is the expected ratio of target-text length to source-text length and s² is the variance of that ratio
– By Bayes' law, the cost of a bead is −log P(align) P(δ | align), where P(δ | align) is obtained from the standard normal distribution
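A minimal sketch of this length cost in Python, assuming Gale & Church's reported parameter values c = 1 and s² = 6.8 (treated as assumptions here), with the bead-type prior left out:

```python
import math

C, S2 = 1.0, 6.8  # assumed expected length ratio and its variance

def delta(l1, l2, c=C, s2=S2):
    """Standardised length difference: (l2 - l1*c) / sqrt(l1*s2)."""
    return (l2 - l1 * c) / math.sqrt(max(l1, 1) * s2)  # guard l1 = 0 beads

def match_cost(l1, l2):
    """-log probability that portions of lengths l1 and l2 align (length term only)."""
    d = abs(delta(l1, l2))
    # two-sided tail probability of the standard normal, via erf
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(d / math.sqrt(2.0))))
    return -math.log(max(p, 1e-12))
```

Equal lengths give cost 0, and the cost grows as the lengths diverge, which is exactly the behaviour the normal-distribution assumption encodes.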

18 Sentence Alignment: Length-based Method
– The prior probability P(align) of each bead type is combined with the length cost in a dynamic-programming recurrence: the best alignment ending at (s_i, t_j) is extended from (s_{i−1}, t_{j−1}), (s_{i−2}, t_{j−1}), (s_{i−1}, t_{j−2}), and so on, one predecessor per bead type

19 Sentence Alignment: Length-based Method
– A simple example: aligning s1, s2, s3, s4 with t1, t2, t3
  Alignment 1: cost(align(s1, t1)) + cost(align(s2, t2)) + cost(align(s3, Ø)) + cost(align(s4, t3))
  Alignment 2: cost(align(s1, s2, t1)) + cost(align(s3, t2)) + cost(align(s4, t3))
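The comparison between such candidate alignments is carried out by dynamic programming over bead types. Below is a self-contained sketch; the cost function `toy_cost` and its bead-type penalties are invented for illustration, and any −log-probability cost of the kind described above can be plugged in instead.

```python
# Candidate bead types: (number of source sentences, number of target sentences)
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2), (2, 2)]

def align(src_lens, tgt_lens, cost):
    """DP sentence alignment.

    src_lens, tgt_lens: character lengths of the sentences.
    cost(l1, l2, bead): cost of aligning total lengths l1, l2 as bead type `bead`.
    Returns a list of beads ((i0, i1), (j0, j1)) covering both texts.
    """
    I, J = len(src_lens), len(tgt_lens)
    INF = float("inf")
    D = [[INF] * (J + 1) for _ in range(I + 1)]
    back = [[None] * (J + 1) for _ in range(I + 1)]
    D[0][0] = 0.0
    for i in range(I + 1):
        for j in range(J + 1):
            if D[i][j] == INF:
                continue
            for di, dj in BEADS:          # extend by every possible bead
                ni, nj = i + di, j + dj
                if ni > I or nj > J:
                    continue
                c = D[i][j] + cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), (di, dj))
                if c < D[ni][nj]:
                    D[ni][nj], back[ni][nj] = c, (i, j)
    path, i, j = [], I, J                 # trace back the best bead sequence
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        path.append(((pi, i), (pj, j)))
        i, j = pi, pj
    return path[::-1]

def toy_cost(l1, l2, bead):
    # illustrative only: length mismatch plus a penalty for rarer bead types
    penalty = {(1, 1): 0.0, (1, 0): 5.0, (0, 1): 5.0,
               (2, 1): 2.0, (1, 2): 2.0, (2, 2): 2.0}
    return abs(l1 - l2) + penalty[bead]
```

For instance, `align([10, 10, 20], [21, 20], toy_cost)` merges the first two source sentences into a 2:1 bead, then aligns the remaining pair 1:1.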

20 Sentence Alignment: Length-based Method – The experimental results (table not reproduced in this transcript)

21 Sentence Alignment: Length-based Method
– A 4% error rate was achieved
– Problems:
  Cannot handle noisy and imperfect input, e.g., OCR output or files containing unknown markup conventions; finding paragraph or sentence boundaries is difficult (solution: just align text-position offsets in the two parallel texts, Church 1993)
  Questionable for language pairs with few cognates or different writing systems, e.g., English ↔ Chinese, eastern European ↔ Asian languages

22 Sentence Alignment: Length-based Method
Approach 2 (Brown 1991)
– Compares sentence length in words rather than characters; however, the variance in the number of words is greater than that in the number of characters
– EM training for the model parameters
Approach 3 (Wu 1994)
– Applies the method of Gale and Church (1993) to a corpus of parallel English and Cantonese text
– Also explores the use of lexical cues

23 Sentence Alignment: Lexical Method
Rationale: lexical information gives strong confirmation of alignments
– Use a partial alignment of lexical items to induce the sentence alignment; that is, a partial alignment at the word level induces a maximum-likelihood alignment at the sentence level
– The resulting sentence alignment can in turn be used to refine the word-level alignment

24 Sentence Alignment: Lexical Method
Approach 1 (Kay and Röscheisen 1993)
– First assume the first and last sentences of the texts are aligned; these serve as the initial anchors
– Form an envelope of possible alignments: alignments are excluded when sentences cross anchors or when their respective distances from an anchor differ greatly
– Choose word pairs whose distributions are similar across most of the sentences
– Find pairs of source and target sentences that contain many possible lexical correspondences; the most reliable pairs are used to induce a set of partial alignments (added to the list of anchors)
– Iterate

25 Sentence Alignment: Lexical Method
Approach 1 – Experiments
– On Scientific American articles: 96% coverage achieved after 4 iterations; the remainder are 1:0 and 0:1 matches
– On 1,000 Hansard sentences: only 7 errors (5 of them due to errors in sentence-boundary detection) after 5 iterations
– Problem: if a large text comes with only its endpoints as anchors, the envelope ("pillow") must be set large enough or correct alignments will be lost; the pillow acts as a constraint

26 Sentence Alignment: Lexical Method
Approach 2 (Chen 1993)
– Sentence alignment is done by constructing a simple word-to-word translation model
– The best alignment is the one that maximizes the likelihood of the corpus given the translation model
– Like Gale and Church (1993), except that a translation model is used to estimate the cost of a given alignment

27 Sentence Alignment: Lexical Method
Approach 3 (Haruno and Yamazaki 1996)
– Function words are left out; only content words are used for lexical matching (part-of-speech taggers are needed)
– For short texts, an online dictionary is used instead of the corpus-derived word correspondences of Kay and Röscheisen (1993)

28 Offset Alignment
Perspective – Do not attempt to align beads of sentences; just align position offsets in the two parallel texts
– Avoids the influence of noise and confusion in the texts, and alleviates the problems caused by the absence of sentence markup
Approach 1 (Church 1993)
– Induce an alignment from cognates, proper nouns, numbers, etc. Cognate words (words similar across languages) supply ample identical character sequences shared between source and target
– Use dynamic programming to find an alignment for the occurrences of matched character 4-grams along the diagonal
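The first step of this approach can be sketched as collecting, for every character 4-gram shared by the two texts, the pairs of offsets at which it occurs; plotting these pairs gives the dot-plot whose diagonal the DP then traces. (The function below is a simplified illustration, not Church's actual char_align implementation.)

```python
def matched_ngrams(src, tgt, n=4):
    """Return (src_offset, tgt_offset) pairs that share a character n-gram."""
    tgt_index = {}
    for j in range(len(tgt) - n + 1):          # index every target n-gram
        tgt_index.setdefault(tgt[j:j + n], []).append(j)
    pairs = []
    for i in range(len(src) - n + 1):          # look up each source n-gram
        for j in tgt_index.get(src[i:i + n], []):
            pairs.append((i, j))
    return pairs
```

For the cognate pair English "government" / French "gouvernement", the shared 4-grams "vern" and "ment" yield offset pairs that fall near the diagonal, which is exactly the signal the method exploits.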

29 Offset Alignment
Approach 1 – Problem: fails completely for language pairs with different character sets (e.g., English ↔ Chinese). (Dot-plot of matched n-grams between source and target text not reproduced.)

30 Offset Alignment
Approach 2 (Fung and McKeown 1993)
– Two-stage processing
– First stage (infer a small bilingual dictionary):
  For each word, produce a signal: an arrival vector of the integer numbers of words between its successive occurrences. E.g., a word appearing at offsets (1, 263, 267, 519) has the arrival vector (262, 4, 252)
  Perform Dynamic Time Warping to match the arrival vectors of English and Cantonese words and determine their similarity
  Pairs of an English word and a Cantonese word with very similar signals are retained in the dictionary
– Properties: genuinely language-independent; sensitive to lexical content
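The first-stage signal computation can be sketched as follows; the plain dynamic-time-warping distance here is an assumed stand-in for Fung & McKeown's signal-matching step, not their exact formulation.

```python
def arrival_vector(offsets):
    """Gaps between successive occurrences of a word."""
    return [b - a for a, b in zip(offsets, offsets[1:])]

def dtw(u, v):
    """Dynamic-time-warping distance between two integer signals."""
    INF = float("inf")
    D = [[INF] * (len(v) + 1) for _ in range(len(u) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(u) + 1):
        for j in range(1, len(v) + 1):
            # local mismatch plus the cheapest of match / insertion / deletion
            D[i][j] = abs(u[i - 1] - v[j - 1]) + min(D[i - 1][j - 1],
                                                     D[i - 1][j],
                                                     D[i][j - 1])
    return D[len(u)][len(v)]
```

Using the slide's example, `arrival_vector([1, 263, 267, 519])` gives `[262, 4, 252]`; two words whose arrival vectors have a small DTW distance are candidate translation pairs.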

31 Offset Alignment
Approach 2 (Fung and McKeown 1993) – Second stage
– Use dynamic programming to find an alignment for the occurrences of strongly related word pairs along the diagonal. (Dot-plot of matched word pairs between source and target text not reproduced.)

32 Sentence/Offset Alignment: Summary

33 Word Alignment
A sentence/offset alignment can be extended to a word alignment
– Criteria are then used to select aligned word pairs for inclusion in the bilingual dictionary:
  Frequency of word correspondences
  Association measures
  …

34 Statistical Machine Translation
The noisy channel model: translate a French sentence f (length |f| = m) into English e (length |e| = l) by searching for ê = argmax_e P(e) P(f | e), where P(e) is the language model, P(f | e) is the translation model, and the search is carried out by the decoder
– Assumptions: an English word can be aligned with multiple French words, while each French word is aligned with at most one English word; the individual word-to-word translations are independent

35 Statistical Machine Translation
Three important components
– Language model: gives the probability P(e)
– Translation model: gives P(f | e) by summing over all possible alignments, using the translation probability t(f_j | e_i) of each French word f_j given the English word e_i it is aligned with, together with a normalization constant
– Decoder: searches for the English sentence that maximizes P(e) P(f | e)
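The translation-model formula the slide refers to is that of IBM Model 1; reconstructed here (not taken verbatim from the slide), with ε the normalization constant and e_0 the empty word:

```latex
P(f \mid e) \;=\; \frac{\varepsilon}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
```

The sum over i enumerates, for each French word f_j, every English position it could be aligned with, which is how the model marginalizes over all possible alignments.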

36 Statistical Machine Translation
EM training of the translation probabilities
– E-step: for each sentence pair, compute the expected number of times each French word was generated by each word of the corresponding English sentence, weighting each possible alignment by its probability
– M-step: re-estimate each translation probability by normalizing these expected counts
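The E-step/M-step above can be sketched as IBM Model 1 EM training. This is a compact illustration (no empty word, uniform initialisation); the toy German-English corpus is invented for the example.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """corpus: list of (french_tokens, english_tokens) pairs. Returns t[(f, e)] = P(f | e)."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)                # expected counts c(f, e)  (E-step)
        total = defaultdict(float)                # expected counts c(e)
        for fs, es in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)    # normalisation over alignments of f
                for e in es:
                    c = t[(f, e)] / z             # P(f aligned to e | sentence pair)
                    count[(f, e)] += c
                    total[e] += c
        for (f, e) in count:                      # M-step: renormalise
            t[(f, e)] = count[(f, e)] / total[e]
    return t

corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_model1(corpus)
```

After a few iterations the co-occurrence evidence concentrates the probability mass, so t("das" | "the"), t("haus" | "house"), and t("buch" | "book") dominate their competitors even though every word pair starts out equally likely.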