Corpora and Translation Parallel corpora Statistical MT (not to mention: Corpus of translated text, for translation studies)

2/38 Parallel corpora
Corpora of texts and their translations
Basic idea that such parallel corpora implicitly contain lots of information about translation equivalence
Nowadays many such "bitexts" are available
– bilingual countries have laws, parliamentary proceedings, and other documents
– large multinational organizations (UN, EU [Europarl corpus], etc.)
– multinational commercial organizations produce multilingual texts

3/38 Bilingual concordance (figure)
Source: TransSearch, Laboratoire de Recherche Appliquée en Linguistique Informatique, Université de Montréal

4/38 Parallel corpora
Usually not corpora in the strict sense (planned, annotated, etc.)
Usefulness may depend on
– the quality of translation
– the closeness of translation
– whether we have a text and its translation, or a multilingually authored text
– the language pair
Parallel corpus needs to be aligned

5/38 Alignment
Means annotating the bilingual corpus to show explicitly the correspondences
– at sentence level
– at word and phrase level
Main difficulty for sentence alignment is that translations do not always keep sentence boundaries, or even sentence order
In addition, translation may be "localized" and therefore not especially faithful

6/38 Sentence-level alignment
If the parallel corpus is quite a literal translation, this can be done using quite low-level information
– sentence length
– looking for anchors: proper names, dates, figures; e.g. in a parliamentary debate, speakers' names
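
Length-based alignment can be made concrete with a small dynamic program. The sketch below is a toy, Gale & Church-flavoured aligner: it considers only 1:1, 1:0, 0:1, 2:1 and 1:2 beads, and scores a bead by the raw difference in character counts; the fixed bead penalties and the cost function are illustrative assumptions, not the published probabilistic model.

    def align_by_length(src_lens, tgt_lens):
        # src_lens / tgt_lens: character counts of the source and target
        # sentences, in document order.  Returns a list of beads, each a
        # pair (source sentence indices, target sentence indices).
        INF = float("inf")
        n, m = len(src_lens), len(tgt_lens)
        cost = [[INF] * (m + 1) for _ in range(n + 1)]
        back = [[None] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        # (source sentences, target sentences, fixed penalty) per bead type
        beads = [(1, 1, 0), (1, 0, 50), (0, 1, 50), (2, 1, 20), (1, 2, 20)]
        for i in range(n + 1):
            for j in range(m + 1):
                for di, dj, penalty in beads:
                    if i >= di and j >= dj and cost[i - di][j - dj] < INF:
                        diff = abs(sum(src_lens[i - di:i]) - sum(tgt_lens[j - dj:j]))
                        c = cost[i - di][j - dj] + diff + penalty
                        if c < cost[i][j]:
                            cost[i][j], back[i][j] = c, (di, dj)
        path, i, j = [], n, m          # trace the best bead sequence back
        while (i, j) != (0, 0):
            di, dj = back[i][j]
            path.append((list(range(i - di, i)), list(range(j - dj, j))))
            i, j = i - di, j - dj
        return list(reversed(path))

Anchors (names, dates, figures) can then be used to confirm or correct the beads such an aligner proposes.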

7/38 Alignment tools

8/38 Corpus-based MT
Translation memory (tool for translators)
– database of previous translations
– find close matching examples to current translation unit
– translator decides what to do with it
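
As a rough illustration of the "find close matching examples" step, here is a minimal fuzzy lookup over a translation memory. Python's difflib similarity stands in for a real fuzzy-match score, and the 0.7 threshold is an arbitrary assumption.

    import difflib

    def tm_lookup(tm, sentence, threshold=0.7):
        # tm: list of (source, target) pairs from previous translations.
        # Return close source-side matches, best first; the translator
        # still decides what to do with each match.
        matches = []
        for src, tgt in tm:
            score = difflib.SequenceMatcher(None, sentence, src).ratio()
            if score >= threshold:
                matches.append((score, src, tgt))
        return sorted(matches, reverse=True)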

9/38 Note that the translator has to know/decide which bits of the target sentence to change

10/38 Corpus-based MT
Translation memory (tool for translators)
– database of previous translations
– find close matching examples to current translation unit
– translator decides what to do with it
Example-based translation
– similar idea, but computer program tries to manipulate example(s)
– may involve "learning" general rules from multiple examples

11/38 Statistical MT
Pioneered by IBM in early 1990s
Spurred on by better success in speech recognition of statistical over linguistic rule-based approaches
Idea that translation can be modelled as a statistical process
Seems to work best in limited domain where given data is a good model of future translations

12/38 Translation as a probabilistic problem
For a given SL sentence S_i, there is an unbounded number of "translations" T of varying probability
Task is to find for S_i the sentence T_j for which the probability P(T_j | S_i) is the highest

13/38 Two models
P(T_j | S_i) is a function of two models:
– the probabilities of the individual words that make up T_j given the individual words in S_i – the "translation model"
– the probability that the individual words that make up T_j are in the appropriate order – the "language model"

14/38 Expressed in mathematical terms:

    argmax_T P(T | S) = argmax_T P(S | T) · P(T) / P(S)

Since S is a given, and constant, P(S) can be dropped, simplifying this to

    argmax_T P(S | T) · P(T)

where P(S | T) is the translation model and P(T) is the language model.

15/38 So how do we translate?
For a given input sentence S_i we have to have a practical way to find the T_j that maximizes the formula
We have to start somewhere, so we start with the translation model: which words look most likely to help us?
In a systematic way we can keep trying different combinations together with the language model until we stop getting improvements

16/38 (diagram) Input sentence → translation model → bag of possible words → scored with the language model → most probable translation; seek improvement by trying other combinations
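
To make the picture concrete, here is a deliberately naive decoder in that spirit: the translation model proposes the single best target word for each source word (the "bag"), and the language model scores every ordering of the bag. Exhaustive permutation search is only feasible for very short sentences; real decoders use heuristic (e.g. beam) search. The table formats t_table and lm are assumptions made for the sketch.

    import itertools, math

    def lm_logprob(words, lm):
        # Bigram log-probability under lm[(previous word, word)]
        seq = ("<s>",) + tuple(words) + ("</s>",)
        return sum(math.log(lm.get((a, b), 1e-6)) for a, b in zip(seq, seq[1:]))

    def decode(src_words, t_table, lm):
        # t_table: source word -> list of (target word, probability)
        best_pairs = [max(t_table[w], key=lambda c: c[1]) for w in src_words]
        bag = [w for w, _ in best_pairs]
        # Translation-model score (constant here, since the bag is fixed;
        # kept to show the combined score the slide describes)
        tm_score = sum(math.log(p) for _, p in best_pairs)
        return max(itertools.permutations(bag),
                   key=lambda perm: tm_score + lm_logprob(perm, lm))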

17/38 Where do the models come from?
All the statistical parameters are pre-computed ("learned"), based on a parallel corpus
Language model is probabilities of word sequences (n-grams)
Translation model is derived from aligned parallel corpus
This approach is attractive to some as an example of "machine learning"
– the computer learns to translate (just) from seeing previous examples of translation

18/38 The translation model
Take sentence-aligned parallel corpus
Extract entire vocabulary for both languages
For every word-pair, calculate probability that they correspond – e.g. by comparing distributions
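
One concrete way to estimate these word-correspondence probabilities is the EM procedure of IBM Model 1, sketched below in simplified form (no NULL word, no smoothing). The corpus is assumed to be a list of (source tokens, target tokens) sentence pairs.

    from collections import defaultdict

    def ibm_model1(corpus, iterations=10):
        # Estimate t[(f, e)] ≈ P(f | e): the probability that target-language
        # word e translates as source-language word f
        f_vocab = {f for fs, es in corpus for f in fs}
        t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform start
        for _ in range(iterations):
            count = defaultdict(float)   # expected counts c(f, e)
            total = defaultdict(float)   # expected counts c(e)
            for fs, es in corpus:
                for f in fs:
                    norm = sum(t[(f, e)] for e in es)
                    for e in es:                     # E-step: fractional counts
                        c = t[(f, e)] / norm
                        count[(f, e)] += c
                        total[e] += c
            for (f, e), c in count.items():          # M-step: renormalize
                t[(f, e)] = c / total[e]
        return t

Run over enough sentence pairs, frequently co-occurring words accumulate probability mass, which is exactly the "comparing distributions" intuition on this slide.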

19/38 Problem: fertility
"Fertility": not all word correspondences are 1:1
– some words have multiple possible translations, e.g. the → {le, la, l', les}
– some words have no translation, e.g. in il se rase 'he shaves', se → ∅
– some words are translated by several words, e.g. cheap → peu cher
– not always obvious how to align

20/38 Problem: distortion
Notice that corresponding words do not appear in the same order
The translation model includes probabilities for "distortion"
– e.g. P(5|2): the probability that w_s in position 2 will produce a w_t in position 5
– can be more complex: P(5|2,4,6): the probability that w_s in position 2 will produce a w_t in position 5 when S has 4 words and T has 6

21/38 The language model
Impractical to calculate probability of every word sequence:
– many will be very improbable...
– because they are ungrammatical
– or because they happen not to occur in the data
Probabilities of sequences of n words ("n-grams") more practical
– Bigram model: P(w_i | w_{i-1}) ≈ f(w_{i-1}, w_i) / f(w_{i-1})
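
A minimal maximum-likelihood bigram model, matching the formula above; the <s> and </s> boundary markers are an assumption added for the sketch:

    from collections import Counter

    def train_bigram_lm(sentences):
        # sentences: lists of tokens.  Returns a function P(w | prev).
        unigrams, bigrams = Counter(), Counter()
        for sent in sentences:
            words = ["<s>"] + sent + ["</s>"]
            unigrams.update(words[:-1])            # history counts f(w_{i-1})
            bigrams.update(zip(words, words[1:]))  # pair counts f(w_{i-1}, w_i)
        def prob(prev, w):
            return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
        return prob

    # P(T) for a candidate sentence is then the product of its bigram
    # probabilities: P(the | <s>) · P(house | the) · ... · P(</s> | house)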

22/38 Sparse data
Relying on n-grams with a large n risks zero probabilities
Bigrams are less risky but sometimes not discriminatory enough
– e.g. I hire men who is good pilots
3- or 4-grams allow a nice compromise, and if a 3-gram is previously unseen, we can give it a score based on the component bigrams ("smoothing")
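
The smoothing idea on this slide can be written as simple linear interpolation: an unseen trigram backs off to its component bigram and unigram estimates. The counter arguments and the 0.6/0.3/0.1 weights are assumptions for illustration; in practice the weights are tuned on held-out data.

    def smoothed_trigram_prob(w1, w2, w3, tri, bi, uni, n_tokens,
                              l3=0.6, l2=0.3, l1=0.1):
        # tri, bi, uni: Counters of trigrams, bigrams and unigrams;
        # n_tokens: total token count.  Returns interpolated P(w3 | w1 w2).
        p3 = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
        p2 = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
        p1 = uni[w3] / n_tokens
        return l3 * p3 + l2 * p2 + l1 * p1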

23/38 Put it all together and...?
To build a statistical MT system we need:
– aligned bilingual corpus
– "training programs" which will extract from the corpora all the statistical data for the models
– a "decoder" which takes a given input, and seeks the output that maximizes the magic argmax formula, based on a heuristic search algorithm
Software for this purpose is freely available
Claim is that an MT system for a new language pair can be built in a matter of hours

24/38 SMT latest developments
Nevertheless, quality is limited
SMT researchers quickly learned that this crude approach can get them so far (quite far actually), but that to go the extra distance you need linguistic knowledge (e.g. morphology, "phrases", constituents)
Latest developments aim to incorporate this
Big difference is that it too can be LEARNED (automatically) from corpora
So SMT still contrasts with traditional RBMT where rules are "hand-coded" by linguists

25/38 Direct phrase alignment
(Wang & Waibel 1998, Och et al. 1999, Marcu & Wong 2002)
Enhance word translation model by adding joint probabilities, i.e. probabilities for phrases
Phrase probabilities compensate for missing lexical probabilities
Easy to integrate probabilities from different sources/methods, allows for mutual compensation

26/38 Word alignment induced model
Koehn et al. 2003; example stolen from Knight & Koehn
Maria did not slap the green witch
Maria no daba una bofetada a la bruja verde
Start with all phrase pairs justified by the word alignment

27/38 Word alignment induced model
Koehn et al. 2003; example stolen from Knight & Koehn
(Maria, Maria), (no, did not), (daba una bofetada, slap), (a la, the), (verde, green), (bruja, witch)

28/38 Word alignment induced model
Koehn et al. 2003; example stolen from Knight & Koehn
(Maria, Maria), (no, did not), (daba una bofetada, slap), (a la, the), (verde, green), (bruja, witch),
(Maria no, Maria did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the), (bruja verde, green witch), etc.

29/38 Word alignment induced model
Koehn et al. 2003; example stolen from Knight & Koehn
(Maria, Maria), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch), (verde, green),
(Maria no, Maria did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the), (bruja verde, green witch),
(Maria no daba una bofetada, Maria did not slap), (no daba una bofetada a la, did not slap the), (a la bruja verde, the green witch),
(Maria no daba una bofetada a la, Maria did not slap the), (daba una bofetada a la bruja verde, slap the green witch),
(no daba una bofetada a la bruja verde, did not slap the green witch), (Maria no daba una bofetada a la bruja verde, Maria did not slap the green witch)
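
The "all phrase pairs justified by the word alignment" idea amounts to a short consistency check, sketched below in simplified form (it omits the expansion over unaligned boundary words that the full Koehn et al. 2003 method performs). alignment is assumed to be a set of (source index, target index) links.

    def extract_phrases(src, tgt, alignment, max_len=7):
        # A (source span, target span) pair is extracted iff at least one
        # link falls inside the box and no link crosses its border.
        pairs = []
        for i1 in range(len(src)):
            for i2 in range(i1, min(i1 + max_len, len(src))):
                js = [j for i, j in alignment if i1 <= i <= i2]
                if not js:
                    continue
                j1, j2 = min(js), max(js)
                # consistency: nothing in the target span may link to a
                # source word outside the source span
                if all(i1 <= i <= i2 for i, j in alignment if j1 <= j <= j2):
                    pairs.append((" ".join(src[i1:i2 + 1]),
                                  " ".join(tgt[j1:j2 + 1])))
        return pairs

Run on the Maria sentence with its word alignment, this produces exactly the kind of phrase-pair list shown on this slide.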

30/38 Alignment templates
(Och et al. 1999; further developed by Marcu and Wong 2002, Koehn and Knight 2003, Koehn et al. 2003)
Problem of sparse data worse for phrases
So use word classes instead of words
– alignment templates instead of phrases
– more reliable statistics for translation table
– smaller translation table
– more complex decoding
Word classes are induced (by distributional statistics), so may not correspond to intuitive (linguistic) classes
Takes context into account

31/38 Problems with phrase-based models
Still do not handle very well...
– dependencies (especially long-distance)
– distortion
– discontinuities (e.g. bought = habe... gekauft)
More promising seems to be...

32/38 Syntax-based SMT
Better able to handle
– constituents
– function words
– grammatical context (e.g. case marking)
Inversion Transduction Grammars
Hierarchical transduction model
Tree-to-string translation
Tree-to-tree translation

33/38 Inversion transduction grammars
Wu and colleagues (1997 onwards)
Grammar generates two trees in parallel and mappings between them
Rules can specify order changes
Restriction to binary rules limits complexity

34/38 Inversion transduction grammars

35/38 Inversion transduction grammars
Grammar is trained on a word-aligned bilingual corpus
Note that all the rules are learned automatically
Translation uses a decoder which effectively works like traditional RBMT:
– parser uses source side of transduction rules to build a parse tree
– transduction rules are applied to transform the tree
– the target text is generated by linearizing the tree
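
The characteristic ITG operation, straight versus inverted concatenation of a node's two children, fits in a few lines. The derivation below is a hand-built toy (in a real system the grammar and its probabilities are induced from the word-aligned corpus, as the slide says): leaves are translated words in source order, and each internal node says whether the target keeps or inverts that order.

    def linearize(node):
        # node: either a target word (str) or (orientation, left, right),
        # where orientation "[]" keeps source order and "<>" inverts it
        if isinstance(node, str):
            return [node]
        orientation, left, right = node
        a, b = linearize(left), linearize(right)
        return a + b if orientation == "[]" else b + a

    # Spanish "la bruja verde" -> English "the green witch": the
    # noun-adjective pair is inverted, the determiner is not
    tree = ("[]", "the", ("<>", "witch", "green"))
    print(linearize(tree))   # ['the', 'green', 'witch']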

36/38 (figure)

37/38 (figure)

38/38 Other approaches
Other approaches use more and more "linguistic" information
In each case automatically learned, especially from treebanks
Traditional ("rule-based") MT used (hand-written) grammars and lexicons
State-of-the-art MT is moving back in this direction, except that linguistic rules are machine learned