Machine Translation III. Empirical approaches to MT: Example-based MT; Statistical MT (LELA30431/chapter50.pdf)


2/30 Introduction
Empirical approaches: what does that mean?
– Empirical vs rationalist
– Data-driven vs rule-driven
Pure empiricism: Statistical MT
Hybrid empiricism: Example-based MT

3/30 Empirical approaches
Approaches based on pure data
Contrast with “rationalist” approach: rule-based systems of the “2nd generation”
Larger storage, faster processors, and the availability of textual data in huge quantities suggest a data-driven approach may be possible
“Data” here means just raw text

4/30 Flashback
Early thoughts on MT (Warren Weaver 1949) included the possibility that translation was like code-breaking (cryptanalysis)
Weaver – with Claude Shannon – invented “information theory”
Given enough data, patterns could be identified and applied to new text

5/30 Back to the future
Data-driven approach encouraged by availability of machine-readable parallel text, notably at first the Canadian and Hong Kong Hansards, then EU documents, and dual-language web pages
Two basic approaches:
– Statistical MT
– Example-based MT

6/30 Example-based MT
“Translation by analogy”
First proposed by Nagao (1984) but not implemented until the early 1990s
Very intuitive: translate text on the basis of recognising bits that have been previously translated, and sticking them together
– Cf. the tourist phrasebook approach

7/30 Example-based MT
Like an extension of Translation Memory
Based on a database of translation examples
The system finds closely matching previous example(s), then (unlike TM) identifies the corresponding fragments in the target text(s) (alignment), and recombines them to give the target text

8/30 Example (Sato & Nagao 1990)
Input: He buys a book on international politics
Matches:
He buys a notebook. → Kare wa nōto o kau.
I read a book on international politics. → Watashi wa kokusai seiji nitsuite kakareta hon o yomu.
Result: Kare wa kokusai seiji nitsuite kakareta hon o kau.

9/30 Learning templates
The monkey ate a peach. → saru wa momo o tabeta.
The man ate a peach. → hito wa momo o tabeta.
⇒ monkey → saru; man → hito
⇒ The … ate a peach. → … wa momo o tabeta.
The dog ate a rabbit. → inu wa usagi o tabeta.
⇒ dog → inu; rabbit → usagi
⇒ The … ate a … → … wa … o tabeta.
⇒ The dog ate a peach. → inu wa momo o tabeta.
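
The template-learning step above can be sketched in code: given two example pairs that differ in a single slot, extract the lexical correspondences and the shared template. This is a minimal illustration, not Nagao's actual algorithm; the function names are invented, and it assumes whitespace tokenisation and exactly one contiguous differing span on each side.

```python
def diff_slot(a, b):
    """Split two token lists that differ in one contiguous span into
    (common_prefix, slot_a, slot_b, common_suffix)."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    j = 0
    while j < min(len(a), len(b)) - i and a[len(a) - 1 - j] == b[len(b) - 1 - j]:
        j += 1
    return a[:i], a[i:len(a) - j], b[i:len(b) - j], a[len(a) - j:]

def learn_template(pair1, pair2):
    """From two (source, target) examples, extract a one-slot template
    plus the two lexical correspondences it generalises over."""
    (s1, t1), (s2, t2) = pair1, pair2
    sp, sa, sb, ss = diff_slot(s1.split(), s2.split())
    tp, ta, tb, ts = diff_slot(t1.split(), t2.split())
    template = (" ".join(sp + ["X"] + ss), " ".join(tp + ["X"] + ts))
    lexicon = {" ".join(sa): " ".join(ta), " ".join(sb): " ".join(tb)}
    return template, lexicon

template, lexicon = learn_template(
    ("the monkey ate a peach", "saru wa momo o tabeta"),
    ("the man ate a peach", "hito wa momo o tabeta"))
print(template)   # one-slot template over both languages
print(lexicon)    # the word pairs that filled the slot
```

Applying the learned template in the other direction (filling X with a new lexicon entry such as dog → inu) yields the novel translation on the slide.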

10/30 Some problems include…
Source of examples
– genuine text or hand-crafted?
Identifying matching fragments
– preprocessed? storage implication; prejudges what will be useful
– “on the fly”? needs a dictionary
Partial matching
Sticking fragments together (boundary friction)
Conflicting/multiple examples

11/30 Partial matching
Input: The operation was interrupted because the file was hidden.
a. The operation was interrupted because the Ctrl-c key was pressed.
b. The specified method failed because the file is hidden.
c. The operation was interrupted by the application.
d. The requested operation cannot be completed because the disk is full.
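
The ranking of partial matches can be illustrated with a simple string-similarity measure. This sketch uses Python's difflib ratio purely as a stand-in: real EBMT systems use more linguistically informed matching metrics.

```python
import difflib

query = "The operation was interrupted because the file was hidden."
examples = [
    "The operation was interrupted because the Ctrl-c key was pressed.",
    "The specified method failed because the file is hidden.",
    "The operation was interrupted by the application.",
    "The requested operation cannot be completed because the disk is full.",
]

def similarity(a, b):
    """Surface similarity in [0, 1] (difflib's longest-matching-block ratio)."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# rank stored examples by similarity to the input, best match first
ranked = sorted(examples, key=lambda e: similarity(query, e), reverse=True)
for e in ranked:
    print(round(similarity(query, e), 2), e)
```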

12/30 Boundary friction (1)
Consider again: He buys a book on politics
Matches:
He buys a notebook. → Kare wa nōto o kau.
I read a book on politics. → Watashi wa seiji nitsuite kakareta hon o yomu.
He buys a pen. → Kare wa pen o kau.
She wrote a book on politics. → Kanojo wa seiji nitsuite kakareta hon o kaita.
Result: the template Kare wa … o kau. must be combined with the fragment … wa seiji nitsuite kakareta hon o …, and the particles clash at the fragment boundaries

13/30 Boundary friction (2)
Input: The handsome boy entered the room
Matches:
The handsome boy ate his breakfast. → Der schöne Junge aß sein Frühstück.
I saw the handsome boy. → Ich sah den schönen Jungen.
The matches give two different German forms of the same fragment (der schöne Junge vs den schönen Jungen), only one of which is correct here

14/30 Competing examples
In closing, I will say that I am sad for workers in the airline industry. → En terminant, je dirai que c’est triste pour les travailleurs et les travailleuses du secteur de l’aviation.
My colleague spoke about the airline industry. → Mon collègue a parlé de l’industrie du transport aérien.
People in the airline industry have become unemployed. → Des gens de l’industrie aérienne sont devenus chômeurs.
This tax will cripple some of the small companies in the airline industry. → Cette surtaxe va nuire aux petits transporteurs aériens.
Results from Canadian Hansard using TransSearch

15/30 Statistical MT
Pioneered by IBM in the early 1990s
Spurred on by the greater success in speech recognition of statistical over linguistic rule-based approaches
Idea that translation can be modelled as a statistical process
Seems to work best in a limited domain where the given data is a good model of future translations

16/30 Translation as a probabilistic problem
For a given SL sentence S, there is an unlimited number of candidate “translations” T of varying probability
The task is to find for S the sentence T for which the probability P(T|S) is highest

17/30 Two models
P(T|S) is a function of two models:
– the probabilities of the individual words that make up T given the individual words in S – the “translation model”
– the probability that the individual words that make up T are in the appropriate order – the “language model”

18/30 Expressed in mathematical terms:
T* = argmax_T P(T|S) = argmax_T P(S|T) · P(T) / P(S)
Since S is a given, and constant, this can be simplified as:
T* = argmax_T P(S|T) · P(T)
where P(S|T) is the translation model and P(T) is the language model

19/30 So how do we translate?
For a given input sentence S we have to have a practical way to find the T that maximizes the formula
We have to start somewhere, so we start with the translation model: which words look most likely to help us?
In a systematic way we can keep trying different combinations together with the language model until we stop getting improvements

20/30 [flow diagram]
Input sentence → Translation model → bag of possible words → Language model → most probable translation
↺ seek improvement by trying other combinations
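
The loop above can be caricatured in a few lines: score every ordering of the “bag of possible words” with the two models and keep the best. All probability tables here are made-up toy values, and exhaustive permutation only works for tiny sentences; real decoders use heuristic (e.g. beam) search.

```python
import itertools
import math

# toy P(f | e) translation table and toy bigram LM (values invented)
tm = {("la", "the"): 0.9, ("maison", "house"): 0.8, ("bleue", "blue"): 0.7}
lm = {("<s>", "the"): 0.5, ("the", "blue"): 0.2, ("blue", "house"): 0.3,
      ("the", "house"): 0.3, ("house", "blue"): 0.01}

def score(source, target):
    """log P(S|T) + log P(T): translation model plus language model."""
    s = 0.0
    for sw in source:  # each source word explained by its best target word
        s += math.log(max(tm.get((sw, tw), 1e-9) for tw in target))
    for prev, w in zip(("<s>",) + tuple(target), target):
        s += math.log(lm.get((prev, w), 1e-9))  # bigram chain probability
    return s

source = ["la", "maison", "bleue"]
bag = ["the", "blue", "house"]       # "bag of possible words" from the TM
# try every ordering of the bag; keep the one the two models like best
best = max(itertools.permutations(bag), key=lambda t: score(source, t))
print(best)
```

Note how the translation model alone cannot choose between orderings here; it is the language model that prefers the blue house over, say, the house blue.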

21/30 Where do the models come from?
All the statistical parameters are pre-computed, based on a parallel corpus
The language model is probabilities of word sequences (n-grams)
The translation model is derived from an aligned parallel corpus

22/30 The translation model
Take a sentence-aligned parallel corpus
Extract the entire vocabulary for both languages
For every word pair, calculate the probability that they correspond – e.g. by comparing distributions
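
The “comparing distributions” idea can be sketched as a simple co-occurrence count over a toy sentence-aligned corpus. This is only a caricature of the real estimation procedure (the IBM-style models use EM over alignments, covered elsewhere in this course); the corpus and the estimator here are invented for illustration.

```python
from collections import Counter

# toy sentence-aligned English-French corpus
corpus = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the flower", "la fleur"),
    ("a house", "une maison"),
]

cooc = Counter()     # (english word, french word) co-occurrence counts
totals = Counter()   # total co-occurrences per english word
for e_sent, f_sent in corpus:
    for e in e_sent.split():
        for f in f_sent.split():
            cooc[(e, f)] += 1
            totals[e] += 1

def p(f, e):
    """Crude P(f | e): the share of e's co-occurrences accounted for by f."""
    return cooc[(e, f)] / totals[e]

print(p("la", "the"))       # "la" is the best candidate for "the"
print(p("maison", "house")) # "maison" wins for "house"
```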

23/30 Some obvious problems
“Fertility”: not all word correspondences are 1:1
– some words have multiple possible translations, e.g. the → {le, la, l’, les}
– some words have no translation, e.g. in il se rase ‘he shaves’, se → ∅
– some words are translated by several words, e.g. cheap → peu cher
– not always obvious how to align

24/30 The proposal will not now be implemented
Les propositions ne seront pas mises en application maintenant
The ~ Les · proposal ~ propositions · will ~ seront · now ~ maintenant · implemented ~ mises en application · be ~ ∅ · not ~ ne…pas
(will not ~ ne seront pas: many:many not allowed; only 1:n (n ≥ 0), and in practice n < 3)

25/30 Some word-pair probabilities from Canadian Hansard
‘the’: le, la, l’, les (.023), ce (.013), il (.012), de (.009), à (.007), que (.007)
‘not’: pas, ne, non, faux (.006), plus (.002), ce (.002), que (.002), jamais (.002)
‘hear’: bravo, entendre, entendu (.002), entende (.001)

26/30 Another problem: distortion
Notice that corresponding words do not appear in the same order
The translation model includes probabilities for “distortion”
– e.g. P(2|5): the probability that the source word in position 2 will produce a target word in position 5
– can be more complex: P(5|2,4,6): the probability that the source word in position 2 will produce a target word in position 5 when S has 4 words and T has 6

27/30 The language model
Impractical to calculate the probability of every word sequence:
– many will be very improbable…
– because they are ungrammatical
– or because they happen not to occur in the data
Probabilities of sequences of n words (“n-grams”) are more practical
– bigram model: P(w_i | w_(i–1)) ≈ f(w_(i–1), w_i) / f(w_(i–1))
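
The bigram estimate above is just relative frequency over counts, which can be sketched directly (toy corpus; "&lt;s&gt;" marks the sentence start):

```python
from collections import Counter

sentences = ["the dog ate a peach", "the dog ate a rabbit",
             "the monkey ate a peach"]

unigrams, bigrams = Counter(), Counter()
for s in sentences:
    tokens = ["<s>"] + s.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_bigram(w, prev):
    """P(w | prev) estimated as f(prev, w) / f(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p_bigram("dog", "the"))   # "the" is followed by "dog" in 2 of 3 cases
```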

28/30 Sparse data
Relying on n-grams with a large n risks zero probabilities
Bigrams are less risky but sometimes not discriminatory enough
– e.g. I hire men who is good pilots: every bigram is plausible even though the sentence is ungrammatical
3- or 4-grams allow a nice compromise, and if a 3-gram is previously unseen we can give it a score based on the component bigrams (“smoothing”)
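
The back-off idea above can be sketched as simple linear interpolation of trigram, bigram and unigram relative frequencies. The lambda weights here are illustrative, not tuned; real systems estimate them, and use more refined schemes (e.g. Kneser-Ney smoothing).

```python
from collections import Counter

sentences = ["the dog ate a peach", "the dog ate a rabbit",
             "the monkey ate a peach"]

uni, bi, tri = Counter(), Counter(), Counter()
n = 0
for s in sentences:
    t = s.split()
    n += len(t)
    uni.update(t)
    bi.update(zip(t, t[1:]))
    tri.update(zip(t, t[1:], t[2:]))

def smoothed_p(w, w1, w2, l3=0.6, l2=0.3, l1=0.1):
    """Interpolated estimate of P(w | w1 w2): mixes trigram, bigram and
    unigram relative frequencies, so unseen trigrams still score > 0."""
    p3 = tri[(w1, w2, w)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    p2 = bi[(w2, w)] / uni[w2] if uni[w2] else 0.0
    p1 = uni[w] / n
    return l3 * p3 + l2 * p2 + l1 * p1

# "monkey ate rabbit" was never seen as a trigram, yet gets a nonzero score
print(smoothed_p("rabbit", "monkey", "ate"))
```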

29/30 Put it all together and…?
To build a statistical MT system we need:
– an aligned bilingual corpus
– “training programs” which will extract from the corpora all the statistical data for the models
– a “decoder” which takes a given input and seeks the output that maximizes the argmax formula, based on a heuristic search algorithm
Software for this purpose is freely available
The claim is that an MT system for a new language pair can be built in a matter of hours

30/30 SMT: latest developments
Nevertheless, quality is limited
SMT researchers quickly learned (just as in the 1960s) that this crude approach can only get them so far (quite far, actually), but that to go the extra distance you need linguistic knowledge (e.g. morphology, “phrases”, constituents)
The latest developments aim to incorporate this
The big difference is that it too can be LEARNED (automatically) from corpora
So SMT still contrasts with traditional RBMT, where rules are “hand-coded” by linguists