Translation Model Parameters & Expectation Maximization Algorithm Lecture 2 (adapted from notes from Philipp Koehn & Mary Hearne) Dr. Declan Groves, CNGL, DCU

Translation Model & Parameters: Recap 

How to learn TM parameters?
- Answer: from alignments.
- For French – English translation, the translation model is English–French (i.e. P(F|E)).
- What if we don't have alignments?
- Infer alignments automatically: if we have a rough idea about which words correspond, we can use this knowledge to infer more alignments (Centauri exercise).

Example sentence pairs:
- La chambre bleue est très petite – The blue room is very small
- Le salon rouge est à côté de la cuisine – The red room is beside the kitchen

Parameters to estimate (a counting sketch follows below):

  TRANSLATION            FERTILITY            DISTORTION
  t(chambre | room) =    n(1 | is) =          d(2 | 2) =
  t(salon | room) =      n(2 | is) =          d(3 | 2) =
  t(le | the) =          n(1 | beside) =      d(5 | 5) =
  t(la | the) =          n(2 | beside) =      d(6 | 5) =

e.g. t(chambre | room) = number of times "room" is aligned with "chambre" / number of times "room" occurs
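If word alignments are available, the relative-frequency estimate above is a simple counting exercise. Here is a minimal Python sketch; the corpus format and the single hand-aligned example are illustrative assumptions, not taken from the slides:

```python
from collections import defaultdict

def estimate_t(aligned_corpus):
    """Estimate t(f|e) = count(e aligned to f) / count(e) from word-aligned pairs.

    aligned_corpus: list of (english_words, french_words, links), where links
    is a list of (e_index, f_index) alignment points.
    """
    link_count = defaultdict(float)  # how often e is aligned to f
    e_count = defaultdict(float)     # how often e occurs at all

    for e_words, f_words, links in aligned_corpus:
        for e in e_words:
            e_count[e] += 1
        for i, j in links:
            link_count[(e_words[i], f_words[j])] += 1

    return {(e, f): c / e_count[e] for (e, f), c in link_count.items()}

# Toy example: the slide's first sentence pair with an assumed one-to-one alignment.
corpus = [(
    ["the", "blue", "room", "is", "very", "small"],
    ["la", "chambre", "bleue", "est", "très", "petite"],
    [(0, 0), (1, 2), (2, 1), (3, 3), (4, 4), (5, 5)],
)]
t = estimate_t(corpus)
print(t[("room", "chambre")])  # 1.0: "room" occurs once and is aligned to "chambre"
```

The fertility and distortion parameters in the table above would be collected from the same alignments in the same relative-frequency fashion.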

Expectation Maximization Algorithm (EM)
- The EM algorithm consists of two main steps:
  - Expectation step: apply the model to the data
    - Parts of the model are hidden (here: the alignments)
    - Using the model, assign probabilities to possible values
  - Maximization step: estimate the model from the data
    - Take the currently assigned values as fact
    - Collect counts (weighted by probabilities)
    - Estimate the model from the counts
- Iterate these steps until convergence.
- To apply EM we need to be able to:
  - Expectation step: compute the probability of alignments
  - Maximization step: collect counts

Expectation Maximization Algorithm (EM)
- EM in a nutshell:
  - Initialise the model parameters (i.e. uniformly) – initialisation
  - Assign probabilities to the missing data – calculate P(a,f|e) and P(a|e,f)
  - Estimate the model parameters from the completed data – calculate new values of t from the fractional counts
  - Iterate
- Start off ignoring the fertility, distortion and insertion parameters and estimate only the translation (lexical) parameters: this is IBM Model 1 (of IBM Models 1 – 5). A runnable sketch of the loop follows below.
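The steps above correspond to IBM Model 1 training. Below is a minimal, self-contained Python sketch over the three sentence pairs used on the following slides. One assumption to note: it uses the standard Model 1 E-step, in which each French word may align to any English word independently, whereas the worked example on the next slides enumerates only one-to-one alignments, so the exact intermediate numbers differ slightly.

```python
from collections import defaultdict

# The three sentence pairs from the next slide.
corpus = [
    ("the blue house".split(), "la maison bleue".split()),
    ("the house".split(),      "la maison".split()),
    ("the".split(),            "la".split()),
]

# Step 1: initialise t(f|e) uniformly over the French vocabulary (1/3 here).
f_vocab = {f for _, f_sent in corpus for f in f_sent}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for iteration in range(5):
    count = defaultdict(float)  # fractional counts c(f|e)
    total = defaultdict(float)  # normalisation term per English word e

    # E-step: distribute each French word's count over the English words,
    # weighted by the current t values (this sums over alignments implicitly).
    for e_sent, f_sent in corpus:
        for f in f_sent:
            norm = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                frac = t[(f, e)] / norm
                count[(f, e)] += frac
                total[e] += frac

    # M-step: re-estimate t(f|e) from the fractional counts.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("la", "the")], 3))  # rises towards 1.0 as the iterations proceed
```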

EM Step 1: Initialise Parameters 

- Given 3 sentence pairs:
  - the blue house – la maison bleue
  - the house – la maison
  - the – la
- As we have no seed alignments, we have to consider all possible alignments.
- Set the t parameters uniformly:

  t(la | the) = 1/3        t(la | blue) = 1/3        t(la | house) = 1/3
  t(maison | the) = 1/3    t(maison | blue) = 1/3    t(maison | house) = 1/3
  t(bleue | the) = 1/3     t(bleue | blue) = 1/3     t(bleue | house) = 1/3
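The 1/3 values follow from initialising uniformly over the observed French vocabulary, which here contains three words (la, maison, bleue). This is the usual convention, shown here for clarity:

```latex
t(f \mid e) \;=\; \frac{1}{|V_F|} \;=\; \frac{1}{3}
\qquad \text{for every co-occurring pair } (f, e)
```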

EM Step 2: Compute P(a,f|e)

  t(la | the) = 1/3        t(la | blue) = 1/3        t(la | house) = 1/3
  t(maison | the) = 1/3    t(maison | blue) = 1/3    t(maison | house) = 1/3
  t(bleue | the) = 1/3     t(bleue | blue) = 1/3     t(bleue | house) = 1/3
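Under IBM Model 1, the joint probability of an alignment a and the French sentence f given the English sentence e factorises over the individual word translations (the constant alignment prior of the full model is dropped here, which matches the values on the next slide):

```latex
P(a, f \mid e) \;=\; \prod_{j=1}^{m} t\bigl(f_j \mid e_{a_j}\bigr)
```

With every t value at 1/3, each enumerated alignment of "the blue house – la maison bleue" scores (1/3)^3 = 1/27, each alignment of "the house – la maison" scores (1/3)^2 = 1/9, and the single alignment of "the – la" scores 1/3.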

EM Step 2: Normalise P(a,f|e) to yield P(a|e,f)
- From the previous step:

  P(a1,f|e) = 1/27    P(a4,f|e) = 1/27    P(a7,f|e) = 1/9
  P(a2,f|e) = 1/27    P(a5,f|e) = 1/27    P(a8,f|e) = 1/9
  P(a3,f|e) = 1/27    P(a6,f|e) = 1/27    P(a9,f|e) = 1/3

- Normalise the P(a,f|e) values to yield P(a|e,f) (normalise by the sum of the probabilities of the possible alignments for the source string in question).
- "the – la" has only 1 possible alignment, therefore P(a9|e,f) will always be 1.
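The normalisation is a division by the per-sentence-pair total; assuming, as the listed values suggest, that a1–a6 belong to the first sentence pair and a7–a8 to the second:

```latex
P(a \mid e, f) \;=\; \frac{P(a, f \mid e)}{\sum_{a'} P(a', f \mid e)}
```

So P(a|e,f) = (1/27)/(6/27) = 1/6 for each of a1–a6, (1/9)/(2/9) = 1/2 for a7 and a8, and 1 for a9.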

EM Step 3: Collect Fractional Counts
- Collect fractional counts for each translation pair (i.e. for each translation pair, sum the values of P(a|e,f) over the alignments in which the word pair occurs):
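In symbols, the fractional count accumulated for a word pair is (counting how often the pair is linked in each alignment):

```latex
c(f \mid e) \;=\; \sum_{\text{sentence pairs}} \; \sum_{a} P(a \mid e, f)
\;\cdot\; \#\{\, j : f_j = f,\ e_{a_j} = e \,\}
```

For example, under the alignment sets assumed above, c(la|the) = 2 · (1/6) + 1 · (1/2) + 1 = 11/6.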

EM Step 3: Normalise Fractional Counts
- Normalise the fractional counts to get revised parameters for t:
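The re-estimation normalises each English word's counts so that its translation probabilities sum to one:

```latex
t(f \mid e) \;=\; \frac{c(f \mid e)}{\sum_{f'} c(f' \mid e)}
```

For instance, t(la|the) = c(la|the) / (c(la|the) + c(maison|the) + c(bleue|the)).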

Step 4 Iterate: Repeat Step 2
- Given our new parameter values, re-compute the probability of each of the possible alignments, P(a,f|e):

2nd Iteration, Step 2: Normalise P(a,f|e)
- Normalise the P(a,f|e) values to yield P(a|e,f):

2nd Iteration, Step 3: Collect Fractional Counts
- Collect fractional counts for each translation pair:

2nd Iteration, Step 3: Normalise Fractional Counts
- Normalise the fractional counts to get revised parameters for t:

EM: Convergence
- After the second iteration, our t values are:
- We continue EM until our t values converge.
- It is already clear, after only 2 iterations, how some translation candidates are (correctly) becoming more likely than others.

Beyond IBM Model 1
- The previous example shows how EM can be used to calculate the lexical (translation) parameters: IBM Model 1.
- But what about the fertility, distortion and insertion parameters?
- We need more complex models (IBM Models 2 – 5):
  - IBM Model 2: word translations plus an absolute reordering (distortion) model
  - IBM Model 3: adds a fertility model
  - IBM Model 4: relative reordering model
  - IBM Model 5: fixes deficiency