Statistical NLP: Lecture 13. Statistical Alignment and Machine Translation


1 Statistical NLP: Lecture 13 Statistical Alignment and Machine Translation

2 Overview
• MT is very hard: translation programs available today do not perform very well.
• Different approaches to MT:
– Word-for-word
– Syntactic transfer approaches
– Semantic transfer approaches
– Interlingua
• Most MT systems are a mix of probabilistic and non-probabilistic components, though there are a few completely statistical translation systems.

3 Overview (Cont'd)
• A large part of implementing an MT system (e.g., probabilistic parsing, word sense disambiguation) is not specific to MT and is discussed in other, more general chapters.
• Nonetheless, two parts of MT are specific to it: text alignment and word alignment.
• Definition: in the sentence alignment problem, one seeks to say that some group of sentences in one language corresponds in content to some other group of sentences in another language. Such a grouping is referred to as a bead of sentences.

4 Overview of the Lecture
• Text Alignment
• Word Alignment
• Fully Statistical Attempt at MT

5 Text Alignment: Aligning Sentences and Paragraphs
• Text alignment is useful for bilingual lexicography and MT, but also as a first step toward using bilingual corpora for other tasks.
• Text alignment is not trivial because translators do not always translate one sentence in the input into one sentence in the output, although they do so in about 90% of cases.
• Another problem is that of crossing dependencies, where the order of sentences is changed in the translation.

6 Different Approaches to Text Alignment
• Length-Based Approaches: short sentences will be translated as short sentences and long sentences as long sentences.
• Offset Alignment by Signal Processing Techniques: these approaches do not attempt to align beads of sentences but rather just to align position offsets in the two parallel texts.
• Lexical Methods: use lexical information to align beads of sentences.

7 Length-Based Methods I: General Approach
• Goal: find the alignment A with the highest probability given the two parallel texts S and T: argmax_A P(A | S, T) = argmax_A P(A, S, T).
• To estimate this probability, the aligned text is decomposed into a sequence of aligned beads, where each bead is assumed to be independent of the others. Then P(A, S, T) ≈ ∏_{k=1}^{K} P(B_k).
• The question, then, is how to estimate the probability of a certain type of alignment bead given the sentences in that bead.
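Under the independence assumption, the bead decomposition turns into a sum of log probabilities. A minimal sketch in Python; the bead priors below are illustrative assumptions (loosely reflecting the dominance of 1:1 beads), not measured values:

```python
import math

# Illustrative prior probabilities for bead types ("1:1" means one source
# sentence aligns to one target sentence). The numbers are assumptions
# for this sketch, not measured figures.
BEAD_PRIORS = {"1:1": 0.89, "1:0": 0.005, "0:1": 0.005,
               "2:1": 0.045, "1:2": 0.045, "2:2": 0.01}

def alignment_log_prob(beads):
    """log P(A, S, T) as a sum of independent bead log probabilities."""
    return sum(math.log(BEAD_PRIORS[b]) for b in beads)
```

Working in log space avoids underflow when the product runs over thousands of beads.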

8 Length-Based Methods II: Gale and Church, 1993
• The algorithm uses sentence length (measured in characters) to evaluate how likely an alignment of some number of sentences in L1 is with some number of sentences in L2.
• The algorithm uses dynamic programming, which allows the system to efficiently consider all possible alignments and find the minimum-cost alignment.
• The method performs well, at least on related languages: it achieves a 4% error rate overall and works best on 1:1 alignments (only a 2% error rate), but has a high error rate on more difficult alignments.
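The dynamic program can be sketched as follows. The bead priors and the quadratic length penalty are simplified stand-ins for Gale and Church's Gaussian cost model; c (expected target/source character ratio) and s2 (variance parameter) are illustrative values, not estimated ones:

```python
import math

# Sketch of a Gale & Church-style length-based aligner.
PRIORS = {(1, 1): 0.89, (1, 0): 0.005, (0, 1): 0.005,
          (2, 1): 0.05, (1, 2): 0.05}

def length_cost(x, y, c=1.0, s2=6.8):
    """-log of an (unnormalised) Gaussian penalty for aligning a group
    of x source characters with y target characters."""
    delta = (y - x * c) / math.sqrt(max(x, 1) * s2)
    return delta * delta / 2

def align(src_lens, tgt_lens):
    """Minimum-cost bead sequence over two lists of sentence lengths,
    found by dynamic programming over all bead types in PRIORS."""
    n, m = len(src_lens), len(tgt_lens)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for (di, dj), p in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = (cost[i][j] - math.log(p)
                        + length_cost(sum(src_lens[i:ni]),
                                      sum(tgt_lens[j:nj])))
                if step < cost[ni][nj]:
                    cost[ni][nj] = step
                    back[ni][nj] = (di, dj)
    beads, i, j = [], n, m          # trace back the best bead sequence
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((di, dj))
        i, j = i - di, j - dj
    return beads[::-1]
```

With roughly matching lengths the aligner prefers cheap 1:1 beads; merging two short source sentences into a 2:1 bead only pays off when the combined length fits the target sentence better.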

9 Length-Based Methods III: Other Approaches
• Brown et al. (1991): the same approach as Gale and Church, except that sentence lengths are compared in terms of words rather than characters. The goal also differs: Brown et al. did not want to align entire articles but just a subset of the corpus suitable for further research.
• Wu (1994): Wu applies Gale and Church's method to a corpus of parallel English and Cantonese text. The results are not much worse than on related languages. To improve accuracy, Wu uses lexical cues.

10 Offset Alignment by Signal Processing Techniques I: Church, 1993
• Church argues that length-based methods work well on clean text but may break down in real-world situations (noisy OCR output or unknown markup conventions).
• Church's method is to induce an alignment by using cognates (words that are similar across languages) at the level of character sequences.
• The method consists of building a dot-plot: the source and translated texts are concatenated, and a square graph is made with this text on both axes. A dot is placed at (x, y) whenever there is a match; the unit of matching is character 4-grams.

11 Offset Alignment by Signal Processing Techniques II: Church, 1993 (Cont'd)
• Signal processing methods are then used to compress the resulting plot.
• The interesting parts of a dot-plot are the bitext maps; these maps show the correspondence between the two languages.
• In the bitext maps one can find faint, roughly straight diagonals corresponding to cognates.
• A heuristic search along these diagonals provides an alignment in terms of offsets in the two texts.
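The dot-plot construction step can be sketched directly; the compression and diagonal search that follow are the harder signal-processing parts and are omitted here:

```python
def dot_plot(text1, text2, n=4):
    """Dots (x, y) where the concatenation of the two texts shares a
    character n-gram: the first step of a Church-style aligner. Dots
    pairing an offset inside text1 with an offset past len(text1) lie
    in the bitext-map region of the plot."""
    text = text1 + text2
    index = {}                       # n-gram -> list of start offsets
    for i in range(len(text) - n + 1):
        index.setdefault(text[i:i + n], []).append(i)
    dots = set()
    for positions in index.values():
        for x in positions:
            for y in positions:
                dots.add((x, y))
    return dots
```

For the cognate pair "government"/"gouvernement", the shared 4-grams "vern" and "ment" produce off-diagonal dots linking the two texts, exactly the faint diagonal the heuristic search exploits.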

12 Offset Alignment by Signal Processing Techniques III: Fung & McKeown, 1994
• Fung and McKeown's algorithm works:
– without having found sentence boundaries;
– on only roughly parallel texts (with certain sections missing in one language);
– with unrelated language pairs.
• The technique is to infer a small bilingual dictionary that will give points of alignment.
• For each word, a signal is produced, as an arrival vector of integers giving the number of words between successive occurrences of the word at hand.
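The arrival vector for a word takes only a few lines to compute (a sketch; Fung and McKeown then compare such vectors across the two texts to find likely translation pairs):

```python
def arrival_vector(tokens, word):
    """Number of words between successive occurrences of `word`:
    the 'signal' compared across languages to spot translation pairs."""
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    return [b - a for a, b in zip(positions, positions[1:])]
```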

13 Lexical Methods of Sentence Alignment I: Kay & Roscheisen, 1993
• Assume the first and last sentences of the texts align; these are the initial anchors.
• Then, until most sentences are aligned:
1. Form an envelope of possible alignments.
2. Choose pairs of words that tend to co-occur in these potential partial alignments.
3. Find pairs of source and target sentences which contain many possible lexical correspondences. The most reliable of these pairs are used to induce a set of partial alignments which will be part of the final result.

14 Lexical Methods of Sentence Alignment II: Chen, 1993
• Chen does sentence alignment by constructing a simple word-to-word translation model as he goes along.
• The best alignment is the one that maximizes the likelihood of generating the corpus given the translation model.
• This best alignment is found using dynamic programming.

15 Lexical Methods of Sentence Alignment III: Haruno & Yamazaki, 1996
• Their method is a variant of Kay & Roscheisen (1993), with the following differences:
– For structurally very different languages, function words impede alignment, so they eliminate function words using a POS tagger.
– When aligning short texts, there are not enough repeated words for reliable alignment using Kay & Roscheisen (1993), so they use an online dictionary to find matching word pairs.

16 Word Alignment
• A common use of aligned texts is the derivation of bilingual dictionaries and terminology databases.
• This is usually done in two steps. First, the text alignment is extended to a word alignment. Then some criterion, such as frequency, is used to select aligned pairs for which there is enough evidence to include them in the bilingual dictionary.
• Using a χ² measure works well unless one word in L1 occurs with more than one word in L2; then it is useful to assume a one-to-one correspondence.
• Future work is likely to use existing bilingual dictionaries.
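The χ² score for a candidate word pair can be computed from a 2×2 contingency table over aligned beads. A sketch (the cell naming is one common convention, assumed here):

```python
def chi_squared(n11, n12, n21, n22):
    """Chi-squared association score for a candidate word pair:
    n11 = beads containing both words, n12 = beads with only the L1 word,
    n21 = beads with only the L2 word, n22 = beads with neither."""
    n = n11 + n12 + n21 + n22
    num = n * (n11 * n22 - n12 * n21) ** 2
    den = (n11 + n12) * (n11 + n21) * (n12 + n22) * (n21 + n22)
    return num / den
```

Independent words score 0; word pairs that co-occur far more often than chance predicts score high and become dictionary candidates.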

17 Fully Statistical MT I
• MT has been attempted using a noisy channel model. Such a model requires:
– A Language Model
– A Translation Model
– A Decoder
– Translation Probabilities
• An evaluation of the model found that only 48% of French sentences were translated correctly. The errors were either incorrect decodings or ungrammatical decodings.
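In the noisy channel formulation the decoder's job is to find ê = argmax_e P(e) · P(f | e). A toy sketch over an explicit candidate list, with invented probabilities (the models here are stand-ins, not trained components; real decoders search a vast space rather than enumerate candidates):

```python
import math

def decode(f, candidates, lm, tm):
    """Pick the candidate translation e maximising P(e) * P(f | e),
    where lm maps e -> P(e) and tm maps (f, e) -> P(f | e)."""
    return max(candidates,
               key=lambda e: math.log(lm[e]) + math.log(tm[(f, e)]))

# Toy models (invented numbers, for illustration only). Here the
# translation model cannot distinguish the word orders, so the
# language model decides in favour of the grammatical string.
lm = {"the dog sleeps": 0.02, "dog the sleeps": 0.0001}
tm = {("le chien dort", "the dog sleeps"): 0.3,
      ("le chien dort", "dog the sleeps"): 0.3}
```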

18 Fully Statistical MT II: Problems with the Model
• Fertility is Asymmetric
• Independence Assumptions
• Sensitivity to Training Data
• Efficiency
• No Notion of Phrases
• Non-Local Dependencies
• Morphology
• Sparse Data Problems
• In summary, non-linguistic models are fairly successful for word alignments, but they fail for MT.