Translation Model Parameters (adapted from notes from Philipp Koehn & Mary Hearne) 24th March 2011 Dr. Declan Groves, CNGL, DCU


Translation Model

Lexical (Word) Translation
 How to translate a word?
 Dictionary look-up for Haus: house, building, home, household, shell
 Multiple translations: some are more frequent than others
 How do we determine probabilities for possible candidate translations?
 Collect statistics from a parallel corpus:

Translation of Haus    Count
house                  8,000
building               1,600
home                     200
household                150
shell                     50

Estimate Translation Probabilities
 Use relative frequencies to estimate probabilities:

P(e|Haus) = 0.8     if e = house
            0.16    if e = building
            0.02    if e = home
            0.015   if e = household
            0.005   if e = shell

Translation of Haus    Count
house                  8,000
building               1,600
home                     200
household                150
shell                     50
Total                 10,000
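A minimal sketch of this relative-frequency (maximum-likelihood) estimation in Python, using the counts from the table above (the function name is my own):

```python
from collections import Counter

def relative_frequencies(counts):
    """Turn raw translation counts into maximum-likelihood probabilities."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Counts of English translations of "Haus", from the slide.
haus_counts = Counter({"house": 8000, "building": 1600, "home": 200,
                       "household": 150, "shell": 50})

print(relative_frequencies(haus_counts))
# {'house': 0.8, 'building': 0.16, 'home': 0.02, 'household': 0.015, 'shell': 0.005}
```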

Alignment
 das Haus ist klein → the house is small

Reordering
 klein ist das Haus → the house is small

One-to-many, one-to-none
 One-to-many: das Haus ist klitzeklein → the house is very small (klitzeklein produces two English words)
 One-to-none: das Haus ist klein → house is small (das produces no English word)

Inserting words
 NULL das Haus ist klein → the house is just small (just is generated by NULL)
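In code, alignments like these are commonly represented as (source position, target position) index pairs, with source position 0 reserved for the invisible NULL word; a small sketch of that convention (the convention is standard, the variable names are mine):

```python
# Positions are 1-based; source position 0 is the invisible NULL word.
# Example from the slide: NULL das Haus ist klein -> the house is just small
src = ["NULL", "das", "Haus", "ist", "klein"]   # src[0] is NULL
tgt = ["the", "house", "is", "just", "small"]

# (source position, target position) pairs; "just" is generated by NULL.
alignment = [(1, 1), (2, 2), (3, 3), (0, 4), (4, 5)]

for s, t in alignment:
    print(f"{src[s]} -> {tgt[t - 1]}")
```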

Translation Process as String Re-Writing (IBM Model 3)
The SMT Translation Model takes these alignment characteristics into account:
 John did not slap the green witch
 John not slap slap slap the green witch        FERTILITY (one-to-many, many-to-none)
 John no daba una bofetada la verde bruja       TRANSLATION
 John no daba una bofetada a la verde bruja     INSERTION
 John no daba una bofetada a la bruja verde     DISTORTION (reordering)
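The four rewriting steps can be re-enacted mechanically. The sketch below hard-codes the fertility, translation, insertion and distortion choices that produce the slide's derivation; in the real model each of these choices would be scored by the n, t, p1 and d parameters rather than fixed by hand:

```python
# Re-enact the IBM Model 3 generative story for the slide's example.
source = "John did not slap the green witch".split()

# FERTILITY: each source word is copied n times ("did" has fertility 0,
# "slap" has fertility 3).
fertility = {"John": 1, "did": 0, "not": 1, "slap": 3,
             "the": 1, "green": 1, "witch": 1}
fertile = [w for w in source for _ in range(fertility[w])]
print(" ".join(fertile))      # John not slap slap slap the green witch

# TRANSLATION: each copy is translated; the three copies of "slap"
# become "daba", "una", "bofetada" in turn.
translations = {"John": ["John"], "not": ["no"],
                "slap": ["daba", "una", "bofetada"],
                "the": ["la"], "green": ["verde"], "witch": ["bruja"]}
queues = {w: list(ts) for w, ts in translations.items()}
translated = [queues[w].pop(0) for w in fertile]
print(" ".join(translated))   # John no daba una bofetada la verde bruja

# INSERTION: NULL generates "a", inserted after "bofetada".
inserted = translated[:5] + ["a"] + translated[5:]
print(" ".join(inserted))     # John no daba una bofetada a la verde bruja

# DISTORTION: "verde" and "bruja" swap positions.
inserted[7], inserted[8] = inserted[8], inserted[7]
print(" ".join(inserted))     # John no daba una bofetada a la bruja verde
```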

Translation Model Parameters (1/3)
 The Translation Model takes these characteristics into account, modelling them using different parameters.
 t: lexical / word-to-word translation parameters
    t(house|Haus), t(building|Haus), …
    i.e. what is the probability that “Haus” will produce the English word house/building whenever “Haus” appears?
 n: fertility parameters
    n(1|klitzeklein), n(2|klitzeklein), …
    i.e. what is the probability that “klitzeklein” will produce exactly 1/2/… English words?

Translation Model Parameters (2/3)
 d: distortion parameters
    d(2|2), d(3|2), …
    i.e. what is the probability that the German word in position 2 of the German sentence will generate an English word that ends up in position 2/3 of the English translation?
 An enhanced distortion scheme takes into account the lengths of the German and English sentences:
    d(3|2,4,6): same as d(3|2), except we also specify that the given German string has 4 words and the given English string has 6 words
 We also have word-translation parameters corresponding to insertions:
    t(just|NULL) = ?
    i.e. what is the probability that the English word just is inserted into the English string?
 Insertion strategy: pretend that each German sentence begins with the invisible word NULL

Translation Model Parameters: Insertion
 p: set a single parameter p1 and use it as follows:
    Assign fertilities to each word in the German string
    At this point we are ready to start translating the German words into English words
    As each word is translated, we insert an English word into the target string with probability p1
    The probability p0 of not inserting an extra word is given by p0 = 1 – p1
 What about distortion parameters for inserted words?
    It would be overly simplistic to say that NULL generates a word at position X rather than somewhere else – insertions are unpredictable. Instead:
    Generate all English words predicted by actually occurring German words (i.e. not NULL)
    Position these English words according to the distortion parameters
    Then generate possible insertion words and position them in the spaces left over
    i.e. if there are 3 NULL-generated words and 3 left-over slots, then there are 3! = 6 ways of inserting them, all of which we assign an equal probability of 1/6
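A quick check of the arithmetic in the last bullet (the value of p1 below is illustrative, and the helper name is mine):

```python
from math import factorial

p1 = 0.02        # illustrative: probability of inserting an extra English word
p0 = 1 - p1      # probability of not inserting one

def null_placement_probability(k):
    """k NULL-generated words dropped into k left-over slots:
    each of the k! orderings gets equal probability 1/k!."""
    return 1 / factorial(k)

print(null_placement_probability(3))   # 3! = 6 orderings -> 1/6 ~ 0.167
```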

Summary of Translation Model Parameters

Parameter   Models        Form
n           FERTILITY     Table plotting source words against fertilities
t           TRANSLATION   Table plotting source words against target words
p1          INSERTION     Single number indicating the probability of insertion
d           DISTORTION    Table plotting source string positions against target string positions
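These four parameter sets map naturally onto simple lookup tables; a sketch of how they might be stored in Python (only the t values for Haus come from the earlier slide, the other numbers are illustrative):

```python
# t: source word -> target word -> probability (values for "Haus" from above)
t = {"Haus": {"house": 0.8, "building": 0.16, "home": 0.02,
              "household": 0.015, "shell": 0.005}}

# n: source word -> fertility -> probability (illustrative values)
n = {"klitzeklein": {1: 0.4, 2: 0.6}}

# d: (target position, source position, source length, target length) -> probability
d = {(3, 2, 4, 6): 0.2}     # d(3|2,4,6), illustrative value

# p1: a single number, the probability of inserting an extra word
p1 = 0.02
```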

Learning Translation Models
 How can we automatically acquire parameter values for t, n, d and p from data?
 If we had a set of source language strings (e.g. German) and, for each of those strings, a sequence of step-by-step rewritings into English… problem solved! But we are fairly unlikely to have this type of data.
 If we had a set of word alignments, we could estimate the parameters of our generative translation model; if we had the parameters, we could estimate the alignments: a chicken & egg problem.
 How can we collect estimates from non-aligned data?
 The Expectation Maximization (EM) algorithm: we can gather information incrementally, each new piece helping us build the next.

Expectation Maximization Algorithm
 Incomplete data:
    If we had complete data, we could estimate the model
    If we had a model, we could fill in the gaps in the data
    i.e. if we had a rough idea about which words correspond, then we could use this knowledge to infer more data
 EM in a nutshell:
    Initialise the model parameters (e.g. uniformly)
    Assign probabilities to the missing data
    Estimate the model parameters from the completed data
    Iterate
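The slides describe EM in the abstract; the classic concrete instantiation for word alignment is the EM loop of IBM Model 1, sketched below on a toy two-sentence corpus (a standard textbook formulation with the NULL word omitted for brevity, not code from this lecture):

```python
from collections import defaultdict

# Toy parallel corpus of (German, English) sentence pairs.
corpus = [("das Haus".split(), "the house".split()),
          ("das Buch".split(), "the book".split())]

src_vocab = {f for f_sent, _ in corpus for f in f_sent}
tgt_vocab = {e for _, e_sent in corpus for e in e_sent}

# Initialise t(e|f) uniformly over the target vocabulary.
t = {f: {e: 1 / len(tgt_vocab) for e in tgt_vocab} for f in src_vocab}

for _ in range(10):
    count = defaultdict(lambda: defaultdict(float))   # expected counts c(e, f)
    total = defaultdict(float)                        # expected counts c(f)
    # E-step: distribute each English word's count over the German words
    # that could have generated it, in proportion to the current t(e|f).
    for f_sent, e_sent in corpus:
        for e in e_sent:
            norm = sum(t[f][e] for f in f_sent)
            for f in f_sent:
                frac = t[f][e] / norm
                count[f][e] += frac
                total[f] += frac
    # M-step: re-estimate t(e|f) from the expected counts.
    for f in count:
        for e in count[f]:
            t[f][e] = count[f][e] / total[f]

for f in sorted(t):
    best = max(t[f], key=t[f].get)
    print(f, "->", best, round(t[f][best], 3))
# das -> the, Buch -> book, Haus -> house, with probabilities approaching 1
```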