MACHINE TRANSLATION AND MT TOOLS: GIZA++ AND MOSES – Nirdesh Chauhan

Outline
- Problem statement in SMT
- Translation models
- Using Giza++ and Moses

Introduction to SMT
- Given a sentence F in a foreign language, find the most appropriate translation E in English
- P(F|E) – Translation model
- P(E) – Language model
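The two models are combined under the noisy-channel formulation. As a reminder, this is the standard Bayes-rule step that the slide leaves implicit:

\hat{E} = \arg\max_E P(E \mid F) = \arg\max_E \frac{P(F \mid E)\, P(E)}{P(F)} = \arg\max_E P(F \mid E)\, P(E)

since P(F) is constant with respect to E; P(F|E) is the translation model and P(E) the language model.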

The Generation Process
- Partition: think of all possible partitions of the source sentence
- Lexicalization: for a given partition, translate each phrase into the foreign language
- Reordering: permute the set of all foreign words, with words possibly moving across phrase boundaries
- We need the notion of alignment to better explain the mathematics behind the generation process

Alignment

Word-based alignment
- For each word in the source language, align the words from the target language that this word possibly produces
- Based on IBM Models 1-5
- Model 1 is the simplest
- As we go from Model 1 to Model 5, the models get more complex but more realistic
- This is all that Giza++ does

Alignment
- An alignment is a function from target position to source position
- For the example sentence pair on the original slide the alignment sequence is 2,3,4,5,6,6,6, i.e. alignment function A with A(1) = 2, A(2) = 3, ...
- A different alignment function gives the sequence 1,2,1,2,3,4,3,4 for A(1), A(2), ...
- To allow spurious insertion, also allow alignment with word 0 (NULL)
- Number of possible alignments: (I+1)^J
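In symbols (a restatement of the bullets above, not a new modelling assumption):

A = (a_1, \ldots, a_J), \qquad a_j \in \{0, 1, \ldots, I\}

where f_j is aligned to e_{a_j} and a_j = 0 denotes the NULL word. Each of the J target positions independently picks one of the I + 1 source positions, which is why there are (I+1)^J possible alignments.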

IBM Model 1: Generative Process (illustrated as a figure on the original slide)

IBM Model 1: Details
- Start from the exact chain-rule decomposition of P(F,A|E); no assumptions are needed for that step
- Choosing length: P(J|E) = P(J|E,I) = P(J|I) = ε, a constant
- Choosing alignment: all alignments are equiprobable, so P(A|J,E) = 1/(I+1)^J
- Translation probability: each target word f_j is generated from the source word it is aligned to with probability t(f_j|e_{a_j})
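Putting the three choices together gives the usual Model 1 equations (reconstructed here in standard notation, since the formula on the original slide is a figure that did not survive):

P(F, A \mid E) = \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} t(f_j \mid e_{a_j})

P(F \mid E) = \sum_{A} P(F, A \mid E) = \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j \mid e_i)

where e_0 is the NULL word.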

Training Alignment Models
- Given a parallel corpus, for each sentence pair (F,E) learn the best alignment A and the component probabilities:
  - t(f|e) for Model 1
  - lexicon probability P(f|e) and alignment probability P(a_i | a_{i-1}, I) for the HMM alignment model
- How do we compute these probabilities if all we have is a parallel corpus?

Intuition: Interdependence of Probabilities
- If you knew which words are probable translations of each other, then you could guess which alignments are probable and which are improbable
- If you were given alignments with probabilities, then you could compute the translation probabilities
- This looks like a chicken-and-egg problem
- The EM algorithm comes to the rescue

Expectation Maximization (EM) Algorithm
- Used when we want a maximum-likelihood estimate of the parameters of a model that depends on hidden variables
- In the present case, the parameters are the translation probabilities and the hidden variables are the word alignments
- Init: start with an arbitrary estimate of the parameters
- E-step: compute the expected values of the hidden variables
- M-step: re-estimate the parameters to maximize the likelihood of the data, given the expected values of the hidden variables from the E-step
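For Model 1 both steps have a closed form (the standard equations, added here for reference):

E-step, expected (fractional) counts for a sentence pair (F, E):

c(f \mid e; F, E) = \frac{t(f \mid e)}{\sum_{i=0}^{I} t(f \mid e_i)} \; \sum_{j=1}^{J} \delta(f, f_j) \; \sum_{i=0}^{I} \delta(e, e_i)

M-step, re-normalization over the whole corpus:

t(f \mid e) = \frac{\sum_{(F,E)} c(f \mid e; F, E)}{\sum_{f'} \sum_{(F,E)} c(f' \mid e; F, E)}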

Example of EM Algorithm
- Toy parallel corpus: "green house" / "casa verde" and "the house" / "la casa"
- Init: assume that any English word can generate any Spanish word with equal probability, e.g. P(la|house) = 1/3

E-Step (the expected counts for the toy corpus are worked out in a table on the original slide)

M-Step (the re-normalized translation probabilities are worked out on the original slide)

E-Step again: repeat the E and M steps till convergence
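For concreteness, here is a minimal Python sketch of Model 1 EM on the two toy sentence pairs above (no NULL word, matching the slide's P(la|house) = 1/3 initialization); the variable names are mine, not from the slides:

from collections import defaultdict

# Toy parallel corpus from the slides: (English, Spanish) sentence pairs.
corpus = [
    (["green", "house"], ["casa", "verde"]),
    (["the", "house"], ["la", "casa"]),
]

e_vocab = {e for es, _ in corpus for e in es}
f_vocab = {f for _, fs in corpus for f in fs}

# Init: every English word can generate every Spanish word with equal probability.
t = {(f, e): 1.0 / len(f_vocab) for e in e_vocab for f in f_vocab}

for _ in range(20):
    count = defaultdict(float)   # expected count of the pair (f, e)
    total = defaultdict(float)   # expected count of e being used
    # E-step: distribute each Spanish word's count over the English words
    # in the same sentence, in proportion to the current t(f|e).
    for es, fs in corpus:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / norm
                count[(f, e)] += c
                total[e] += c
    # M-step: re-normalize the expected counts into new translation probabilities.
    for (f, e) in t:
        t[(f, e)] = count[(f, e)] / total[e]

for f in sorted(f_vocab):
    print(f, "| house =", round(t[(f, "house")], 2))
# t(casa|house) climbs toward 1 while t(verde|house) and t(la|house) shrink.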

Limitation: only 1-to-many alignments are allowed

Phrase-based alignment
- More natural
- Many-to-one mappings allowed

Generating Bi-directional Alignments
- Existing models only generate uni-directional alignments
- Combine two uni-directional alignments (source-to-target and target-to-source) to get many-to-many bi-directional alignments, as sketched below
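A minimal sketch of the combination step in Python; the function name symmetrize and the example link sets are made up for illustration. The intersection gives high-precision links, the union high-recall links, and heuristics such as grow-diag-final (shown later) operate between the two:

def symmetrize(e2f, f2e):
    """Combine two uni-directional word alignments.

    e2f, f2e: sets of (e_index, f_index) link pairs produced by aligning
    in each direction (e.g. by running GIZA++ twice).
    Returns the high-precision intersection and the high-recall union.
    """
    intersection = e2f & f2e
    union = e2f | f2e
    return intersection, union

# Hypothetical example links for a 3-word / 3-word sentence pair.
e2f = {(0, 0), (1, 1), (2, 2)}
f2e = {(0, 0), (1, 1), (1, 2)}
inter, uni = symmetrize(e2f, f2e)
print(sorted(inter))  # [(0, 0), (1, 1)]
print(sorted(uni))    # [(0, 0), (1, 1), (1, 2), (2, 2)]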

Hindi-Eng Alignment (the alignment grid between the Hindi sentence "छुट्टियों के लिए गोवा एक प्रमुख समुद्र-तटीय गंतव्य है" and the English sentence "Goa is a premier beach vacation destination" is shown as a figure on the original slide)

Eng-Hindi Alignment (the reverse-direction alignment grid for the same sentence pair, shown as a figure on the original slide)

Combining Alignments (the combined grid is shown as a figure on the original slide). Precision/recall figures from the slide: P=2/3=.67, R=2/7=.3; P=4/5=.8, R=4/7=.6; P=5/6=.83, R=5/7=.7; P=6/9=.67, R=6/7=.85
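For reference, the precision and recall of a candidate alignment A against a reference (gold) alignment G are computed as (standard definitions, assumed here since the slide only shows the resulting numbers):

P = \frac{|A \cap G|}{|A|}, \qquad R = \frac{|A \cap G|}{|G|}

For example, the first pair of numbers corresponds to 2 correct links out of 3 proposed, against 7 reference links: P = 2/3, R = 2/7.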

A Different Heuristic from the Moses Site

GROW-DIAG-FINAL(e2f,f2e):
    neighboring = ((-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1))
    alignment = intersect(e2f,f2e);
    GROW-DIAG(); FINAL(e2f); FINAL(f2e);

GROW-DIAG():
    iterate until no new points added
        for english word e = 0 ... en
            for foreign word f = 0 ... fn
                if ( e aligned with f )
                    for each neighboring point ( e-new, f-new ):
                        if (( e-new, f-new ) in union( e2f, f2e ) and
                            ( e-new not aligned and f-new not aligned ))
                            add alignment point ( e-new, f-new )

FINAL(a):
    for english word e-new = 0 ... en
        for foreign word f-new = 0 ... fn
            if (( ( e-new, f-new ) in alignment a ) and
                ( e-new not aligned or f-new not aligned ))
                add alignment point ( e-new, f-new )

Proposed changes: after growing the diagonal, align the shorter sentence first, and use alignments only from the corresponding directional alignment.

Generating Phrase Alignments (the alignment grid is shown as a figure on the original slide). Example phrase pairs read off the grid: "a premier beach vacation destination" ↔ "एक प्रमुख समुद्र-तटीय गंतव्य है", "premier beach vacation" ↔ "प्रमुख समुद्र-तटीय"
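Phrase pairs like those above are read off the symmetrized word alignment by keeping every source span whose aligned target words form a span that aligns back inside the source span (the consistency criterion). Below is a simplified Python sketch of that extraction; the function extract_phrases and the toy example are illustrative, not the exact Moses extractor (in particular it does not extend phrases over unaligned boundary words):

def extract_phrases(src, tgt, alignment, max_len=5):
    """Extract consistent phrase pairs from a word-aligned sentence pair.

    src, tgt: lists of tokens; alignment: set of (src_index, tgt_index) links.
    A phrase pair is consistent if no alignment link crosses its boundary.
    """
    phrases = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            # Target positions linked to the source span [s1, s2].
            linked = {t for (s, t) in alignment if s1 <= s <= s2}
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no link from inside the target span to outside the source span.
            if any(not (s1 <= s <= s2) for (s, t) in alignment if t1 <= t <= t2):
                continue
            phrases.append((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return phrases

# Hypothetical toy example.
src = ["green", "house"]
tgt = ["casa", "verde"]
alignment = {(0, 1), (1, 0)}
print(extract_phrases(src, tgt, alignment))
# [('green', 'verde'), ('green house', 'casa verde'), ('house', 'casa')]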

Using Moses and Giza++: refer to the link on the original slide (not preserved in this transcript)

Steps
- Install all packages required by Moses
- Input: a sentence-aligned parallel corpus
- Training
- Tuning
- Generate output on the test corpus (decoding)

Example: a toy letter-to-phoneme parallel corpus (characters on one side, ARPAbet-style phonemes on the other; line breaks between training sentences are not preserved in this transcript)

train.en:
h e l l o w o r l d c o m p o u n d w o r d h y p h e n a t e d o n e b o o m k w e e z l e b o t t e r

train.pr:
hh eh l ow hh ah l ow w er l d k aa m p aw n d w er d hh ay f ah n ey t ih d ow eh n iy b uw m k w iy z l ah b aa t ah r

Sample from Phrase-table
b o ||| b aa ||| (0) (1) ||| (0) (1) |||
b ||| b ||| (0) ||| (0) |||
c o m p o ||| aa m p ||| (2) (0,1) (1) (0) (1) ||| (1,3) (1,2,4) (0) |||
c ||| p ||| (0) ||| (0) |||
d w ||| d w ||| (0) (1) ||| (0) (1) |||
d ||| d ||| (0) ||| (0) |||
e b ||| ah b ||| (0) (1) ||| (0) (1) |||
e l l ||| ah l ||| (0) (1) (1) ||| (0) (1,2) |||
e l l ||| eh l ||| (0) (0) (1) ||| (0,1) (2) |||
e l ||| eh ||| (0) (0) ||| (0,1) |||
e ||| ah ||| (0) ||| (0) |||
h e ||| hh ah ||| (0) (1) ||| (0) (1) |||
h ||| hh ||| (0) ||| (0) |||
l e b ||| l ah b ||| (0) (1) (2) ||| (0) (1) (2) |||
l e ||| l ah ||| (0) (1) ||| (0) (1) |||
l l o ||| l ow ||| (0) (0) (1) ||| (0,1) (2) |||
l l ||| l ||| (0) (0) ||| (0,1) |||
l o ||| l ow ||| (0) (1) ||| (0) (1) |||
l ||| l ||| (0) ||| (0) |||
m ||| m ||| (0) ||| (0) |||
n d ||| n d ||| (0) (1) ||| (0) (1) |||
n e ||| eh n iy ||| (1) (2) ||| () (0) (1) |||
n e ||| n iy ||| (0) (1) ||| (0) (1) |||
n ||| eh n ||| (1) ||| () (0) |||
o o m ||| uw m ||| (0) (0) (1) ||| (0,1) (2) |||
o o ||| uw ||| (0) (0) ||| (0,1) |||
o ||| aa ||| (0) ||| (0) |||
o ||| ow eh ||| (0) ||| (0) () |||
o ||| ow ||| (0) ||| (0) |||
w o r ||| w er ||| (0) (1) (1) ||| (0) (1,2) |||
w ||| w ||| (0) ||| (0) |||
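Each entry in this sample has four fields: source ||| target ||| source-to-target alignment (one group of target positions per source token) ||| target-to-source alignment (one group of source positions per target token). The score columns of a full Moses phrase table do not appear in this sample. A small Python parsing sketch under that assumption (parse_phrase_table_line is a made-up helper):

def parse_phrase_table_line(line):
    """Parse one entry of the phrase-table sample above.

    Assumed layout, as seen in the sample: source ||| target |||
    source-to-target alignment ||| target-to-source alignment |||
    """
    fields = [f.strip() for f in line.split("|||")]
    source, target, s2t, t2s = fields[0], fields[1], fields[2], fields[3]
    return {
        "source": source.split(),
        "target": target.split(),
        "s2t": s2t.split(),   # one "(...)" group per source token
        "t2s": t2s.split(),   # one "(...)" group per target token
    }

entry = parse_phrase_table_line("w o r ||| w er ||| (0) (1) (1) ||| (0) (1,2) |||")
print(entry["source"], "->", entry["target"])   # ['w', 'o', 'r'] -> ['w', 'er']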

Testing output
- h o t → hh aa t
- p h o n e → p|UNK hh ow eh n iy (p|UNK marks a token the decoder could not translate and passed through as an unknown word)
- b o o k → b uw k