Statistical Machine Translation IBM Model 1 CS626/CS460 Anoop Kunchukuttan Under the guidance of Prof. Pushpak Bhattacharyya.

Why Statistical Machine Translation? 
– Building rule-based systems between every pair of languages, as in transfer-based MT, is not scalable. Can translation models instead be learnt from data? 
– Many language phenomena and language divergences cannot be encoded in rules. Can translation patterns be memorized from data?

Noisy Channel Model 
e → [ Noisy Channel ] → f 
The model depicts translation as transmission through a noisy channel: sentence e is distorted into sentence f, and the task is to recover e from the noisy f. 
– P(f|e): translation model, addresses adequacy 
– P(e): language model, addresses fluency
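In symbols, this is the standard Bayes decomposition (the slide states it only in words):

$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)$$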

Three Aspects 
– Modelling: propose a probabilistic model for sentence translation 
– Training: learn the model parameters from data 
– Decoding: given a new sentence, use the learnt model to translate it 
IBM Models 1 to 5 [1] define various generative models and their training procedures.

Generative Process 1 
Given sentence e of length l: 
– Select the length of the sentence f, say m 
– For each position j in f: 
  – Choose the position a_j to align to in sentence e 
  – Choose the word f_j 
Example sentence pair from the slide figure (each f_j linked to a position a_j in e, with j = 0 for the NULL position): e = "The Indian Railways is one of the largest employers in the world", f = "भारतीय रेल दुनिया के सबसे बड़े नियोक्ताओं में एक है". 
This process serves as the basis for IBM Models 1 and 2.
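Written out, the chain-rule decomposition underlying this process (following [1]; not spelled out on the slide) is

$$P(f, a \mid e) = P(m \mid e) \prod_{j=1}^{m} P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e)\; P(f_j \mid a_1^{j}, f_1^{j-1}, m, e)$$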

Alignments 
– The generative process explains only one way of generating a sentence pair; each way corresponds to an alignment. 
– The total probability of the sentence pair is the sum of the probabilities over all alignments. 
Input: parallel sentences 1…S in languages E and F, but the alignments are not known. 
Goal: learn the model P(f|e).
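That is,

$$P(f \mid e) = \sum_{a} P(f, a \mid e)$$

where the sum runs over all (l+1)^m possible alignments of f to e (including alignment to the NULL word).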

IBM Model 1 
A special case of Generative Process 1, under two assumptions: 
– Uniform distribution for the length of f 
– All alignments are equally likely 
Goal: learn the parameters t(f|e) of the model P(f|e) for all f ∈ F and e ∈ E. 
This is a chicken-and-egg situation with respect to: 
– Alignments 
– Word translations
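Under these assumptions the model collapses to the familiar Model 1 form (as derived in [1]):

$$P(f \mid e) = \frac{\epsilon}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)$$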

Model 1 Training 
If the alignments were known, the translation probabilities could be calculated simply by counting the aligned words. Conversely, if the translation probabilities were known, the alignments could be estimated. We know neither! This suggests an iterative method in which the alignments and translation probabilities are refined over time: the Expectation-Maximization (EM) algorithm.

Model 1 Training Algorithm 
Initialize all t(f|e) to any value in [0,1]. 
E-Step: for each sentence in the training corpus, for each f,e pair, compute c(f|e; f(s), e(s)) using the t(f|e) values from the previous iteration. 
M-Step: for each f,e pair, compute t(f|e) using the c(f|e) values computed in the E-step. 
Repeat the E-step and M-step till the t(f|e) values converge. 
Here c(f|e) is the expected count that f and e are aligned.
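As a concrete illustration, here is a minimal Python sketch of this EM loop (illustrative code, not the course's reference implementation):

```python
# Minimal IBM Model 1 EM training sketch.
# corpus: list of (f_words, e_words) pairs, each a list of tokens.
from collections import defaultdict

def train_model1(corpus, iterations=20):
    f_vocab = {f for f_words, _ in corpus for f in f_words}
    # Initialize t(f|e) uniformly; any value in [0,1] would do.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f|e)
        total = defaultdict(float)  # normalizers sum_f' c(f'|e)
        # E-step: Model 1 alignments factorize per position, so
        # P(f_j aligns to e_i) = t(f_j|e_i) / sum_i' t(f_j|e_i').
        for f_words, e_words in corpus:
            e_words = ["NULL"] + e_words  # position 0 is the NULL word
            for f in f_words:
                denom = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    c = t[(f, e)] / denom
                    count[(f, e)] += c
                    total[e] += c
        # M-step: relative-frequency re-estimate t(f|e) = c(f|e) / sum_f' c(f'|e)
        for f, e in count:
            t[(f, e)] = count[(f, e)] / total[e]
    return t
```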

Let's train Model 1 
Corpus: 
– आकाश बैंक जाने के रस्ते पर चला / Akash walked on the road to the bank 
– श्याम नदी तट पर चला / Shyam walked on the river bank 
– आकाश द्वारा नदी तट से रेत की चोरी हो रही है / Sand on the banks of the river is being stolen by Akash 
Stats: 3 sentences; English (e) vocabulary size: 15; Hindi (f) vocabulary size: 18.
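For concreteness, the sketch above could be run on this toy corpus as follows (hypothetical snippet; the slides contain no code):

```python
corpus = [
    ("आकाश बैंक जाने के रस्ते पर चला".split(),
     "Akash walked on the road to the bank".split()),
    ("श्याम नदी तट पर चला".split(),
     "Shyam walked on the river bank".split()),
    ("आकाश द्वारा नदी तट से रेत की चोरी हो रही है".split(),
     "Sand on the banks of the river is being stolen by Akash".split()),
]
t = train_model1(corpus, iterations=20)
for pair in [("आकाश", "Akash"), ("बैंक", "bank"), ("तट", "bank"), ("तट", "river")]:
    print(pair, round(t[pair], 3))
```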

Model 1 in Action 
[Table from the slide: per-sentence expected counts c(f|e) and translation probabilities t(f|e) for the pairs आकाश|akash, बैंक|bank, तट|bank and तट|river, tracked across iterations 1, 2, 5, 19 and 20; the numeric cell values are not recoverable from the transcript.]

Where did we get the Model 1 equations from? See the presentation model1_derivation.pdf for more on parameter training.

IBM Model 2 
A special case of Generative Process 1, under the assumptions: 
– Uniform distribution for the length of f 
– The alignment of position j depends on the positions and sentence lengths, via a parameter a(i|j,m,l); unlike Model 1, alignments are no longer equally likely
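Model 2 therefore scores a sentence pair as (standard form, per [1]):

$$P(f \mid e) = \epsilon \prod_{j=1}^{m} \sum_{i=0}^{l} a(i \mid j, m, l)\; t(f_j \mid e_i)$$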

Model 2 Training Algorithm 
Initialize all t(f|e) and a(i|j,m,l) to any value in [0,1]. 
E-Step: for each sentence in the training corpus, for each f,e pair, compute c(f|e; f(s), e(s)) and c(i|j,m,l), using the t(f|e) and a(i|j,m,l) values from the previous iteration. 
M-Step: for each f,e pair, compute t(f|e) and a(i|j,m,l), using the c(f|e) and c(i|j,m,l) values computed in the E-step. 
Repeat the E-step and M-step till the values converge. 
The training process is as in Model 1, except that the equations become messier!
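The M-step re-estimates are the usual relative-frequency normalizations of the expected counts (spelled out here for reference):

$$t(f \mid e) = \frac{c(f \mid e)}{\sum_{f'} c(f' \mid e)}, \qquad a(i \mid j, m, l) = \frac{c(i \mid j, m, l)}{\sum_{i'} c(i' \mid j, m, l)}$$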

References 
1. Peter Brown, Stephen Della Pietra, Vincent Della Pietra, Robert Mercer. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 1993. 
2. Kevin Knight. A Statistical MT Tutorial Workbook. 1999. 
3. Philipp Koehn. Statistical Machine Translation. Cambridge University Press, 2010.

Generative Process 2 
For each word e_i in sentence e: 
– Select the number of words to generate (its fertility) 
– Select the words to generate 
– Permute the words 
Then choose the number of words in f for which there are no alignments in e: 
– Choose the words 
– Insert them into proper locations
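Each of these steps corresponds to a parameter family in Model 3: fertility n(φ|e), translation t(f|e), and distortion d(j|i,m,l). Omitting the NULL-word and combinatorial terms (see [1] for the exact form), an alignment is scored roughly as

$$P(f, a \mid e) \;\sim\; \prod_{i=1}^{l} n(\phi_i \mid e_i)\; \prod_{j=1}^{m} t(f_j \mid e_{a_j}) \prod_{j:\, a_j \neq 0} d(j \mid a_j, m, l)$$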

Generative Process 2 
[Figure from the slide: the English sentence "The Indian Railways is one of the largest employers in the world" generating the Hindi words भारतीय, रेल, निगम, दुनिया, के, सबसे, बड़े, नियोक्ताओं, में, एक, है through fertility, translation and permutation steps.] 
This process serves as the basis for IBM Models 3 to 5.

Generative Process 2 (Contd …)