Machine Translation and MT tools: Giza++ and Moses


Machine Translation and MT tools: Giza++ and Moses -Nirdesh Chauhan

Outline
- Problem statement in SMT
- Translation models
- Using Giza++ and Moses

Introduction to SMT
Given a sentence F in a foreign language, find the most appropriate translation E in English.
- P(F|E) – Translation model
- P(E) – Language model
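The two models combine via the standard noisy-channel decomposition (a reconstruction; the slide's own equation is not in the transcript):

  \hat{E} = \arg\max_{E} P(E \mid F) = \arg\max_{E} \frac{P(F \mid E)\,P(E)}{P(F)} = \arg\max_{E} P(F \mid E)\,P(E)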

The Generation Process
- Partition: think of all possible partitions of the source-language sentence into phrases
- Lexicalization: for a given partition, translate each phrase into the foreign language
- Reordering: permute the set of all foreign words; words may move across phrase boundaries
We need the notion of alignment to better explain the mathematics behind the generation process.

Alignment

Word-based alignment
- For each word in the source language, align the words from the target language that this word possibly produces
- Based on IBM Models 1-5; Model 1 is the simplest
- As we go from Model 1 to Model 5, the models get more complex but also more realistic
- This is all that Giza++ does

Alignment
- An alignment is a function from target position to source position; e.g. the alignment sequence 2, 3, 4, 5, 6, 6, 6 corresponds to an alignment function A with A(1) = 2, A(2) = 3, ...
- A different alignment function would give the sequence 1, 2, 1, 2, 3, 4, 3, 4 for A(1), A(2), ...
- To allow spurious insertion, we also allow alignment with word 0 (NULL)
- Number of possible alignments: (I+1)^J, where I is the source length and J is the target length
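A minimal Python sketch (my own illustration, not from the slide) of how an alignment function can be represented and where the (I+1)^J count comes from:

# Sketch: a word alignment as a list of source positions, one entry per
# target word; position 0 is the special NULL word.
alignment = [2, 3, 4, 5, 6, 6, 6]   # A(1)=2, A(2)=3, ..., A(7)=6

I = 6   # number of source words
J = 7   # number of target words

# Every target position may align to any of the I source words or to NULL,
# so the number of possible alignments is (I + 1) ** J.
num_alignments = (I + 1) ** J
print(num_alignments)   # 823543 for I=6, J=7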

IBM Model 1: Generative Process

IBM Model 1: Details
- No assumptions: the formula from the previous slide is exact
- Choosing length: P(J|E) = P(J|E,I) = P(J|I) = ε, a small constant independent of the actual words
- Choosing alignment: all (I+1)^J alignments are equiprobable, so each has probability 1/(I+1)^J
- Translation probability: t(f_j | e_{a_j}) for each target word given the source word it is aligned to
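Putting the three pieces together gives the standard IBM Model 1 formula (reconstructed from the usual presentation, since the slide's equation is not in the transcript):

  P(F, A \mid E) = \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} t(f_j \mid e_{a_j}),
  \qquad
  P(F \mid E) = \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j \mid e_i)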

Training Alignment Models
- Given a parallel corpus, for each sentence pair (F,E) learn the best alignment A and the component probabilities:
  - t(f|e) for Model 1
  - lexicon probability P(f|e) and alignment probability P(a_i | a_{i-1}, I) for the higher models (e.g. the HMM alignment model)
- How do we compute these probabilities if all we have is a parallel corpus?

Intuition: Interdependence of Probabilities
- If you knew which words are probable translations of each other, you could guess which alignments are probable and which are improbable
- If you were given alignments with probabilities, you could compute the translation probabilities
- This looks like a chicken-and-egg problem; the EM algorithm comes to the rescue

Expectation Maximization (EM) Algorithm
- Used when we want a maximum-likelihood estimate of the parameters of a model that depends on hidden variables
- In the present case, the parameters are the translation probabilities and the hidden variables are the alignments
- Init: start with an arbitrary estimate of the parameters
- E-step: compute the expected values of the hidden variables under the current parameters
- M-step: re-estimate the parameters to maximize the likelihood of the data, given the expected values of the hidden variables from the E-step

Example of EM Algorithm
Toy parallel corpus: green house ↔ casa verde, the house ↔ la casa
Init: assume that any English word can generate any Spanish word with equal probability, e.g. P(la|house) = 1/3

E-Step (the worked computation is shown as a figure on the slide and is not reproduced in the transcript)

M-Step

E-Step again: repeat the E-step with the re-estimated translation probabilities (the slide's figure shows updated values of 1/3 and 2/3); repeat till convergence
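A minimal runnable Python sketch of this EM loop for IBM Model 1 on the toy corpus above (my own illustrative code; it omits the NULL word and the length probability for brevity):

from collections import defaultdict

# Toy parallel corpus from the example: (English words, Spanish words)
corpus = [
    (["green", "house"], ["casa", "verde"]),
    (["the", "house"], ["la", "casa"]),
]

e_vocab = {e for es, _ in corpus for e in es}
f_vocab = {f for _, fs in corpus for f in fs}

# Init: uniform translation probabilities t(f|e)
t = {e: {f: 1.0 / len(f_vocab) for f in f_vocab} for e in e_vocab}

for iteration in range(10):
    # E-step: collect expected counts of e generating f
    count = defaultdict(float)      # expected count of (e, f)
    total = defaultdict(float)      # expected count of e generating anything
    for es, fs in corpus:
        for f in fs:
            norm = sum(t[e][f] for e in es)
            for e in es:
                c = t[e][f] / norm          # expected fraction of f produced by e
                count[(e, f)] += c
                total[e] += c
    # M-step: re-estimate t(f|e) from the expected counts
    for e in e_vocab:
        for f in f_vocab:
            t[e][f] = count[(e, f)] / total[e] if total[e] > 0 else 0.0

print(t["house"])   # t(casa|house) comes to dominate after a few iterations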

Limitation: Only 1->Many Alignments allowed

Phrase-based alignment More natural Many-to-one mappings allowed

Generating Bi-directional Alignments
- Existing models only generate uni-directional alignments
- Combine two uni-directional alignments (source→target and target→source) to get many-to-many bi-directional alignments

Hindi-Eng Alignment (alignment links are shown as a figure on the slide)
Hindi: छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है
English: Goa is a premier beach vacation destination

Eng-Hindi Alignment (alignment links are shown as a figure on the slide)
Hindi: छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है
English: Goa is a premier beach vacation destination

Combining Alignments
The slide overlays the two directional alignments for the Goa example above. Measured against a reference alignment, the individual directional alignments score P = 4/5 = .8, R = 4/7 = .6 and P = 2/3 = .67, R = 2/7 = .3; combining them gives, for example, P = 5/6 = .83, R = 5/7 = .7 and P = 6/9 = .67, R = 6/7 = .86.
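A small Python sketch (with made-up alignment points) of how such precision/recall figures over alignment links are computed:

# Alignment points as (source_index, target_index) pairs; "gold" is the
# reference alignment. The data below is invented for illustration and
# happens to reproduce the P = 4/5, R = 4/7 case from the slide.
gold = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}
predicted = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 5)}

correct = predicted & gold
precision = len(correct) / len(predicted)   # fraction of predicted points that are right
recall = len(correct) / len(gold)           # fraction of gold points that were found
print(precision, recall)                    # 0.8 and 4/7 ≈ 0.57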

A Different Heuristic from the Moses Site

GROW-DIAG-FINAL(e2f, f2e):
  neighboring = ((-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1))
  alignment = intersect(e2f, f2e)
  GROW-DIAG()
  FINAL(e2f)
  FINAL(f2e)

GROW-DIAG():
  iterate until no new points added
    for english word e = 0 ... en
      for foreign word f = 0 ... fn
        if ( e aligned with f )
          for each neighboring point ( e-new, f-new ):
            if ( ( e-new, f-new ) in union( e2f, f2e ) and
                 ( e-new not aligned or f-new not aligned ) )
              add alignment point ( e-new, f-new )

FINAL(a):
  for english word e-new = 0 ... en
    for foreign word f-new = 0 ... fn
      if ( ( e-new, f-new ) in alignment a and
           ( e-new not aligned or f-new not aligned ) )
        add alignment point ( e-new, f-new )

Proposed changes: after growing the diagonal, align the shorter sentence first, and use alignments only from the corresponding directional alignment.
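A runnable Python sketch of this symmetrization heuristic, following the pseudocode above; alignments are represented as sets of (e, f) index pairs, and the example data is made up:

def grow_diag_final(e2f, f2e, e_len, f_len):
    """Symmetrize two directional word alignments (sets of (e, f) index pairs)."""
    neighboring = [(-1, 0), (0, -1), (1, 0), (0, 1),
                   (-1, -1), (-1, 1), (1, -1), (1, 1)]
    union = e2f | f2e
    alignment = e2f & f2e                      # start from the intersection
    aligned_e = {e for e, f in alignment}
    aligned_f = {f for e, f in alignment}

    def add(e, f):
        alignment.add((e, f))
        aligned_e.add(e)
        aligned_f.add(f)

    # GROW-DIAG: repeatedly add neighbors of existing points that are in the union
    added = True
    while added:
        added = False
        for (e, f) in sorted(alignment):
            for de, df in neighboring:
                e_new, f_new = e + de, f + df
                if (e_new, f_new) in union and \
                   (e_new not in aligned_e or f_new not in aligned_f) and \
                   (e_new, f_new) not in alignment:
                    add(e_new, f_new)
                    added = True

    # FINAL: add remaining directional points that attach still-unaligned words
    for directional in (e2f, f2e):
        for e_new in range(e_len):
            for f_new in range(f_len):
                if (e_new, f_new) in directional and \
                   (e_new not in aligned_e or f_new not in aligned_f):
                    add(e_new, f_new)

    return alignment

# Tiny made-up example: two directional alignments over a 3x3 sentence pair
e2f = {(0, 0), (1, 1), (2, 2), (1, 2)}
f2e = {(0, 0), (1, 1), (2, 1)}
print(sorted(grow_diag_final(e2f, f2e, 3, 3)))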

Generating Phrase Alignments
From the symmetrized alignment of the sentence pair (छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है / Goa is a premier beach vacation destination), phrase pairs consistent with the alignment are extracted, e.g.:
- a premier beach vacation destination ↔ एक प्रमुख समुद्र-तटीय गंतव्य है
- premier beach vacation ↔ प्रमुख समुद्र-तटीय
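A short Python sketch (my own, not the slide's) of the standard consistency check used to extract such phrase pairs from a symmetrized alignment; it omits the usual expansion to unaligned boundary words for brevity:

def extract_phrase_pairs(alignment, e_len, f_len, max_len=7):
    """Extract phrase spans consistent with a set of (e, f) alignment points."""
    pairs = []
    for e_start in range(e_len):
        for e_end in range(e_start, min(e_start + max_len, e_len)):
            # Foreign positions covered by alignment points inside the English span
            f_points = [f for (e, f) in alignment if e_start <= e <= e_end]
            if not f_points:
                continue
            f_start, f_end = min(f_points), max(f_points)
            if f_end - f_start >= max_len:
                continue
            # Consistency: no alignment point may link the foreign span
            # to an English word outside the English span
            consistent = all(e_start <= e <= e_end
                             for (e, f) in alignment if f_start <= f <= f_end)
            if consistent:
                pairs.append(((e_start, e_end), (f_start, f_end)))
    return pairs

# Made-up example: 3 English words, 3 foreign words, monotone alignment
alignment = {(0, 0), (1, 1), (2, 2)}
print(extract_phrase_pairs(alignment, 3, 3))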

Using Moses and Giza++ Refer to http://www.statmt.org/moses_steps.html

Steps
- Install all required packages for Moses
- Input: sentence-aligned parallel corpus
- Training
- Tuning
- Generate output on the test corpus (decoding)

Example

train.en (letter sequences, one word per line):
h e l l o
h e l l o
w o r l d
c o m p o u n d w o r d
h y p h e n a t e d
o n e
b o o m
k w e e z l e b o t t e r

train.pr (corresponding phoneme sequences):
hh eh l ow
hh ah l ow
w er l d
k aa m p aw n d w er d
hh ay f ah n ey t ih d
ow eh n iy
b uw m
k w iy z l ah b aa t ah r

Sample from Phrase-table
b o ||| b aa ||| (0) (1) ||| (0) (1) ||| 1 0.666667 1 0.181818 2.718
b ||| b ||| (0) ||| (0) ||| 1 1 1 1 2.718
c o m p o ||| aa m p ||| (2) (0,1) (1) (0) (1) ||| (1,3) (1,2,4) (0) ||| 1 0.0486111 1 0.154959 2.718
c ||| p ||| (0) ||| (0) ||| 1 1 1 1 2.718
d w ||| d w ||| (0) (1) ||| (0) (1) ||| 1 0.75 1 1 2.718
d ||| d ||| (0) ||| (0) ||| 1 1 1 1 2.718
e b ||| ah b ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
e l l ||| ah l ||| (0) (1) (1) ||| (0) (1,2) ||| 1 1 0.5 0.5 2.718
e l l ||| eh l ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.111111 0.5 0.111111 2.718
e l ||| eh ||| (0) (0) ||| (0,1) ||| 1 0.111111 1 0.133333 2.718
e ||| ah ||| (0) ||| (0) ||| 1 1 0.666667 0.6 2.718
h e ||| hh ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
h ||| hh ||| (0) ||| (0) ||| 1 1 1 1 2.718
l e b ||| l ah b ||| (0) (1) (2) ||| (0) (1) (2) ||| 1 1 1 0.5 2.718
l e ||| l ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.5 2.718
l l o ||| l ow ||| (0) (0) (1) ||| (0,1) (2) ||| 0.5 1 1 0.227273 2.718
l l ||| l ||| (0) (0) ||| (0,1) ||| 0.25 1 1 0.833333 2.718
l o ||| l ow ||| (0) (1) ||| (0) (1) ||| 0.5 1 1 0.227273 2.718
l ||| l ||| (0) ||| (0) ||| 0.75 1 1 0.833333 2.718
m ||| m ||| (0) ||| (0) ||| 1 0.5 1 1 2.718
n d ||| n d ||| (0) (1) ||| (0) (1) ||| 1 1 1 1 2.718
n e ||| eh n iy ||| (1) (2) ||| () (0) (1) ||| 1 1 0.5 0.3 2.718
n e ||| n iy ||| (0) (1) ||| (0) (1) ||| 1 1 0.5 0.3 2.718
n ||| eh n ||| (1) ||| () (0) ||| 1 1 0.25 1 2.718
o o m ||| uw m ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.5 1 0.181818 2.718
o o ||| uw ||| (0) (0) ||| (0,1) ||| 1 1 1 0.181818 2.718
o ||| aa ||| (0) ||| (0) ||| 1 0.666667 0.2 0.181818 2.718
o ||| ow eh ||| (0) ||| (0) () ||| 1 1 0.2 0.272727 2.718
o ||| ow ||| (0) ||| (0) ||| 1 1 0.6 0.272727 2.718
w o r ||| w er ||| (0) (1) (1) ||| (0) (1,2) ||| 1 0.1875 1 0.424242 2.718
w ||| w ||| (0) ||| (0) ||| 1 0.75 1 1 2.718
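A small Python sketch for reading such entries; the interpretation of the fields (source phrase, target phrase, two word-alignment fields, then the feature scores, with the constant phrase penalty e ≈ 2.718 last) follows the usual Moses convention and should be treated as an assumption about this particular table:

def parse_phrase_table_line(line):
    """Split one phrase-table entry into its ||| -separated fields."""
    fields = [f.strip() for f in line.split("|||")]
    source, target = fields[0], fields[1]
    scores = [float(s) for s in fields[-1].split()]
    return source, target, scores

line = "l o ||| l ow ||| (0) (1) ||| (0) (1) ||| 0.5 1 1 0.227273 2.718"
print(parse_phrase_table_line(line))
# ('l o', 'l ow', [0.5, 1.0, 1.0, 0.227273, 2.718])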

Testing output
h o t → hh aa t
p h o n e → p|UNK hh ow eh n iy
b o o k → b uw k