Slide 1: Machine Translation: Word Alignment
Stephan Vogel, Spring Semester 2011

Slide 2: Overview
- IBM 3: Fertility
- IBM 4: Relative Distortion

Acknowledgement: These slides are based on slides by Hermann Ney and Franz Josef Och.

Slide 3: Fertility Models
- Basic concept: each word in one language can generate multiple words in the other language:
  deseo – I would like
  übermorgen – the day after tomorrow
  departed – fuhr ab
  The same word can generate a different number of words -> probability distribution over fertilities
- Alignment is a function -> fertility only on one side
- In my terminology: target words have fertility, i.e. each target word can cover multiple source words
- Others say a source word generates multiple target words
- Some source words are aligned to the NULL word, i.e. the NULL word has fertility
- Many target words are not aligned, i.e. have fertility 0
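To make the fertility table concrete, here is a minimal Python sketch of what n(\phi | e) might look like and how a fertility would be drawn from it; the probabilities are invented for illustration, not trained values:

```python
import random

# Hypothetical fertility tables n(phi | e); the numbers are invented
# for illustration, not trained values.
fertility = {
    "deseo":      {3: 0.7, 2: 0.2, 1: 0.1},   # deseo -> "I would like"
    "übermorgen": {4: 0.6, 5: 0.2, 3: 0.2},   # -> "the day after tomorrow"
    "the":        {0: 0.5, 1: 0.4, 2: 0.1},   # often unaligned: fertility 0
}

def sample_fertility(word, rng=random):
    """Draw a fertility phi according to n(phi | word)."""
    dist = fertility[word]
    return rng.choices(list(dist), weights=list(dist.values()))[0]

print(sample_fertility("deseo"))   # 3 most of the time
```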

Slide 4: The Generative Story
[Diagram: the English words e_0 ... e_5 generate the French words f_1 ... f_7 in three stages: fertility generation, word generation, permutation generation]

Slide 5: Fertility Model
Alignment model with hidden fertilities \phi, tablets \tau, and permutations \pi:
- Select a fertility for each English word: \Pr(\phi_1^I \mid e_1^I)
- For each English word select a tablet of French words: \Pr(\tau_1^I \mid \phi_1^I, e_1^I), where \tau_i = (\tau_{i1}, \ldots, \tau_{i\phi_i})
- Select a permutation for the entire sequence of French words: \Pr(\pi_1^I \mid \tau_1^I, \phi_1^I, e_1^I)
- Sum over all realizations that yield f_1^J:
  \Pr(f_1^J \mid e_1^I) = \sum_{(\tau,\pi) \to f_1^J} \Pr(\phi_1^I \mid e_1^I) \, \Pr(\tau_1^I \mid \phi_1^I, e_1^I) \, \Pr(\pi_1^I \mid \tau_1^I, \phi_1^I, e_1^I)

Slide 6: Fertility Model: Constraints
- Fertility bound to alignment: \phi_i = \sum_{j=1}^{J} \delta(a_j, i), i.e. the fertility of e_i is the number of French positions aligned to it
- Permutation: \{\pi_{ik} : 0 \le i \le I, \ 1 \le k \le \phi_i\} = \{1, \ldots, J\}, i.e. each French position is covered exactly once
- French words: f_{\pi_{ik}} = \tau_{ik}

Slide 7: Fertility Model
Decomposition into factors:
\Pr(\phi, \tau, \pi \mid e) = \Pr(\phi \mid e) \, \Pr(\tau \mid \phi, e) \, \Pr(\pi \mid \tau, \phi, e)
Apply the chain rule to each factor and limit the dependencies:
- Fertility generation (IBM 3, 4, 5): \Pr(\phi_i \mid \phi_1^{i-1}, e_1^I) \approx n(\phi_i \mid e_i)
- Word generation (IBM 3, 4, 5): \Pr(\tau_{ik} \mid \tau_{i1}^{k-1}, \tau_1^{i-1}, \phi_1^I, e_1^I) \approx t(\tau_{ik} \mid e_i)
- Permutation generation (only IBM 3): \Pr(\pi_{ik} \mid \pi_{i1}^{k-1}, \pi_1^{i-1}, \tau_1^I, \phi_1^I, e_1^I) \approx d(\pi_{ik} \mid i, I, J)
Note: a factor 1/\phi_0! results from the special model for i = 0.
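Multiplying these factors gives the well-known closed form for \Pr(f, a \mid e) (Brown et al. 1993, eq. 32), where the \phi_i! tablet orderings cancel the empty word's 1/\phi_0! placement term. Below is a minimal Python sketch that scores a sentence pair and alignment; the parameter tables n, t, d and the value of p0 are hypothetical stand-ins, not trained models:

```python
from math import comb, factorial
from collections import defaultdict

def ibm3_prob(f, e, a, n, t, d, p0):
    """Score Pr(f, a | e) under IBM Model 3.

    f: French words f_1..f_J; e: English words e_1..e_I (without the
    empty word); a[j-1] = i in 0..I aligns f_j to e_i (0 = empty word).
    n[(phi, e_i)], t[(f_j, e_i)], d[(j, i, I, J)] are probability tables;
    p0 is the probability of NOT attaching an extra empty-word position.
    All tables here are hypothetical stand-ins for trained parameters.
    """
    I, J = len(e), len(f)
    # Fertilities induced by the alignment: phi_i = #{j : a_j = i}
    phi = [sum(1 for aj in a if aj == i) for i in range(I + 1)]
    # Empty-word fertility: binomial over the J - phi_0 real-word positions
    p = comb(J - phi[0], phi[0]) * (1 - p0) ** phi[0] * p0 ** (J - 2 * phi[0])
    # Fertility generation; phi_i! counts the tablet orderings that yield
    # the same (f, a); the empty word's phi_0! cancels its 1/phi_0! placement
    for i in range(1, I + 1):
        p *= factorial(phi[i]) * n[(phi[i], e[i - 1])]
    # Word generation
    for j, aj in enumerate(a, start=1):
        p *= t[(f[j - 1], "NULL" if aj == 0 else e[aj - 1])]
    # Permutation generation, only for words not aligned to the empty word
    for j, aj in enumerate(a, start=1):
        if aj > 0:
            p *= d[(j, aj, I, J)]
    return p

# Toy call with uniform stand-in tables (illustration only):
n = defaultdict(lambda: 0.5); t = defaultdict(lambda: 0.1); d = defaultdict(lambda: 0.2)
print(ibm3_prob(["je", "vous", "aime"], ["i", "love", "you"], [1, 3, 2], n, t, d, p0=0.9))
```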

Slide 8: Fertility Model: Some Issues
- The permutation model cannot guarantee that \pi is a permutation
  -> words can be stacked on top of each other
  -> this leads to deficiency
- Position i = 0 is not a real position
  -> special alignment and fertility model for the empty word

Slide 9: Fertility Model: Empty Position
- Alignment assumptions for the empty position i = 0:
  - Uniform position distribution for each of the \phi_0 French words generated from e_0
  - Place these French words only after all other words have been placed
- Alignment model for the positions aligned to the empty position:
  - One position: the k-th word from e_0 picks uniformly among the remaining vacant positions, \Pr(\pi_{0k}) = \frac{1}{\phi_0 - k + 1}
  - All positions: \prod_{k=1}^{\phi_0} \frac{1}{\phi_0 - k + 1} = \frac{1}{\phi_0!}

Slide 10: Fertility Model: Empty Position
- Fertility model for the words generated by e_0, i.e. by the empty position
- We assume that each word from f_1^J requires the empty word with probability p_1 = 1 - p_0
- Probability that exactly \phi_0 of the J words in f_1^J require the empty word:
  \Pr(\phi_0 \mid J) = \binom{J - \phi_0}{\phi_0} \, p_1^{\phi_0} \, p_0^{J - 2\phi_0}
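As a small worked example (with an illustrative p_0 = 0.9, not a value from the slides): for J = 10 and \phi_0 = 2,

\Pr(\phi_0 = 2 \mid J = 10) = \binom{8}{2} \, (0.1)^2 \, (0.9)^6 = 28 \cdot 0.01 \cdot 0.531441 \approx 0.149

so even two empty-word alignments in a ten-word sentence are already fairly unlikely under a high p_0.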

Slide 11: Fertility Model: Generative Process
1. Select a fertility for each position i > 0: \phi_i \sim n(\phi_i \mid e_i)
2. Select a fertility for position i = 0: \Pr(\phi_0 \mid \phi_1^I) = \binom{m}{\phi_0} \, p_1^{\phi_0} \, p_0^{m - \phi_0}, with m = \sum_{i=1}^{I} \phi_i
3. Define the length J of the French string: J = m + \phi_0
4. Select a French word for each pair (i, k), i >= 0: \tau_{ik} \sim t(\tau_{ik} \mid e_i)
5. Select a position for each pair (i, k), i > 0: \pi_{ik} \sim d(\pi_{ik} \mid i, I, J)


Slide 13: Fertility Model: Generative Process
6. Check: if any position was chosen more than once, return FAILURE
7. Select positions in f for the empty position from the remaining vacant positions
Result:
- Alignment: a_j = i iff \pi_{ik} = j for some k
- French words: f_{\pi_{ik}} = \tau_{ik}, yielding the French string f_1^J
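The seven steps map directly onto a sampler. A minimal Python sketch under the same assumptions (hypothetical parameter tables; a uniform distortion in the toy call); note how step 6 can genuinely fail, which is exactly the deficiency discussed on the next slide:

```python
import random

def sample_ibm3(e, n, t, d_probs, p0, rng=random):
    """Sample a French string from English e under the IBM-3 generative story.
    n[e_i] and t[e_i] map to {value: prob} dicts; d_probs(i, I, J) returns a
    {position: prob} dict. All hypothetical stand-ins for trained tables."""
    I = len(e)
    draw = lambda dist: rng.choices(list(dist), weights=list(dist.values()))[0]
    # 1. Fertility for each real position i > 0
    phi = [draw(n[ei]) for ei in e]
    m = sum(phi)
    # 2. Fertility for position i = 0: each real word requires an
    #    empty-word partner with probability p1 = 1 - p0
    phi0 = sum(rng.random() < (1 - p0) for _ in range(m))
    # 3. Length of the French string
    J = m + phi0
    # 4. A French word for each pair (i, k), i >= 0
    tablets = [[draw(t["NULL"]) for _ in range(phi0)]]
    tablets += [[draw(t[ei]) for _ in range(phi[i])] for i, ei in enumerate(e)]
    # 5. A position for each pair (i, k), i > 0
    f = [None] * J
    for i in range(1, I + 1):
        for word in tablets[i]:
            j = draw(d_probs(i, I, J))
            # 6. Position chosen more than once -> FAILURE (deficiency)
            if f[j - 1] is not None:
                return None
            f[j - 1] = word
    # 7. Empty-word words fill the remaining vacant positions uniformly
    vacant = [j for j, w in enumerate(f) if w is None]
    rng.shuffle(vacant)
    for word, j in zip(tablets[0], vacant):
        f[j] = word
    return f

# Toy call with made-up tables; may print None, i.e. FAILURE in step 6:
n = {"deseo": {3: 1.0}}
t = {"NULL": {"uh": 1.0}, "deseo": {"i": 0.4, "would": 0.3, "like": 0.3}}
unif = lambda i, I, J: {j: 1.0 / J for j in range(1, J + 1)}
print(sample_ibm3(["deseo"], n, t, unif, p0=0.9))
```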

Slide 14: Deficiency
- The distortion model for real words is deficient
- The distortion model for the empty word is non-deficient
- Deficiency can be reduced by aligning more words to the empty word
- The training corpus likelihood can be increased by aligning more words with the empty word
- Play with p_0!

Slide 15: IBM 4: 1st-Order Distortion Model
- Introduce more detailed dependencies into the alignment (permutation) model
- First-order dependency along the e-axis
[Diagram: comparison of alignment jump dependencies in the HMM and in IBM 4]

Slide 16: Inverted Alignment
- Consider inverted alignments B: i -> B_i \subseteq \{1, \ldots, J\}, mapping each English position to a set of French positions
- Dependency along the I axis: jumps along the J axis
- Two first-order models: one for aligning the first word in a set, one for aligning the remaining words
- We skip the math :-)

Slide 17: Characteristics of Alignment Models

Model | Alignment | Fertility | E-step  | Deficient
IBM1  | Uniform   | No        | Exact   | No
IBM2  | 0-order   | No        | Exact   | No
HMM   | 1-order   | No        | Exact   | No
IBM3  | 0-order   | Yes       | Approx. | Yes
IBM4  | 1-order   | Yes       | Approx. | Yes
IBM5  | 1-order   | Yes       | Approx. | No

Slide 18: Consideration: Overfitting
- Training on data always carries the danger of overfitting
  - The model describes the training data in too much detail
  - But does not perform well on unseen test data
- Solution: smoothing
  - Lexicon: distribute some of the probability mass from seen events to unseen events (for p(f | e), do this for each e)
    - For unseen e: uniform distribution or ???
  - Distortion: interpolate with a uniform distribution
  - Fertility: for many languages 'longer word' = 'more content'
    - E.g. compounds or agglutinative morphology
    - Train a model for fertility given word length and interpolate it with the word-based fertility model
    - Interpolate the fertility estimates based on word frequency: for a frequent word use the word model, for a low-frequency word bias towards the length model (see the sketch below)
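The frequency-based interpolation in the last bullet might look like the following sketch; the weight count/(count + k) and the constant k are illustrative assumptions, the slide does not fix a formula:

```python
def smoothed_fertility(phi, word, count, n_word, n_length, k=10.0):
    """Interpolate a word-specific fertility estimate with a length-based
    one, trusting the word model more for frequent words.

    n_word:   p(phi | word)      -- per-word fertility model
    n_length: p(phi | len(word)) -- fertility model given word length
    count:    training frequency of `word`
    k:        hypothetical smoothing constant controlling the crossover
    """
    lam = count / (count + k)   # -> 1 for frequent words, -> 0 for rare ones
    return lam * n_word.get((phi, word), 0.0) + \
           (1 - lam) * n_length.get((phi, len(word)), 0.0)
```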

Slide 19: Extension: Using Manual Dictionaries
- Adding manual dictionaries
  - Simple method 1: add as bilingual data
  - Simple method 2: interpolate the manual with the trained dictionary
  - Use constrained GIZA (Gao, Nguyen, Vogel, WMT 2010)
  - Can put a higher weight on word pairs from the dictionary (Och, ACL 2000)
  - Not so simple: "But dictionaries are data too" (Brown et al, HLT 93)
- Problem: manual dictionaries do not contain inflected forms
- Possible solution: generate additional word forms (Vogel and Monson, LREC 04)

Slide 20: Extension: Using POS
- Use POS in the distortion model
  - We had: p(a_j \mid a_{j-1}, I)
  - Now we condition on the word class of the previous aligned target word: p(a_j \mid a_{j-1}, C(e_{a_{j-1}}), I)
  - Available in GIZA++
  - Automatic clustering of the vocabulary into word classes with mkcls (default: 50 classes)
- Use POS as a 2nd 'lexicon' model (e.g. Zhao et al, ACL 2005)
  - Train p(C(f) | C(e)), starting with an initial model trained with IBM1 just on word classes
  - Align sentence pairs using p(C(f) | C(e)) and p(f | e)
  - Update both distributions from the Viterbi path (see the sketch below)
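A minimal sketch of the class-lexicon idea: one IBM1-style EM step estimating p(C(f) | C(e)) from class-tagged sentence pairs. Function and variable names are hypothetical; this is not the GIZA++ or Zhao et al. implementation:

```python
from collections import defaultdict

def em_step_class_lexicon(corpus, C, p):
    """One IBM1-style EM step for the class lexicon p(C(f) | C(e)).
    corpus: (f_sentence, e_sentence) pairs (lists of words); C maps words
    to class ids (e.g. from mkcls or a POS tagger); p[(cf, ce)] is the
    current model, e.g. a defaultdict with a uniform initial value."""
    counts, totals = defaultdict(float), defaultdict(float)
    for f_sent, e_sent in corpus:
        e_classes = ["C0"] + [C[w] for w in e_sent]   # C0 = empty-word class
        for f_word in f_sent:
            cf = C[f_word]
            norm = sum(p[(cf, ce)] for ce in e_classes)
            for ce in e_classes:
                frac = p[(cf, ce)] / norm        # E-step: fractional counts
                counts[(cf, ce)] += frac
                totals[ce] += frac
    # M-step: renormalize per English class
    return defaultdict(lambda: 1e-9,
                       {k: v / totals[k[1]] for k, v in counts.items()})
```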

Slide 21: And Much More ...
- Add fertilities to the HMM model
- Symmetrize during training, i.e. update lexicon probabilities based on the symmetrized alignment
- Benefit from shorter sentence pairs
- Split long sentences based on an initial alignment and retrain
- Extract phrase pairs and add reliable ones to the training data
- And then all the work on discriminative word alignment

Slide 22: Alignment Results
- Unbalanced between wrong and missing links -> unbalanced between precision and recall
- Chinese is harder: many missing links -> low recall
- One direction seems harder: related to which side has more words
- Alignment models generate one link per source word

Alignment | Correct | Wrong   | Missing | Precision | Recall | AER
Arabic-English
IBM4 S2T  | 202,898 |  72,488 | 134,... | ...       | ...    | ...
IBM4 T2S  | 232,840 | 106,441 | 104,... | ...       | ...    | ...
Combined  | 244,814 |  89,652 |  92,... | ...       | ...    | ...
Chinese-English
IBM4 S2T  | 186,620 | 172,865 | 341,183 | 52...     | ...    | ...
IBM4 T2S  | 299,744 | 151,478 | 228,... | ...       | ...    | ...
Combined  | 296,... | ...     | ...     | ...       | ...    | ...
(values marked "..." were truncated or lost in the transcript)
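For reference, the metrics in the table are computed from hypothesis links A against the sure links S and possible links P of a manual alignment (Och and Ney's AER definition). A minimal sketch:

```python
def alignment_error_rate(A, S, P=None):
    """Precision, recall, and AER (Och & Ney) for hypothesis links A
    against sure links S and possible links P (P is a superset of S;
    if no sure/possible distinction is annotated, pass P=S)."""
    if P is None:
        P = S
    A, S, P = set(A), set(S), set(P)
    precision = len(A & P) / len(A)
    recall = len(A & S) / len(S)
    aer = 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))
    return precision, recall, aer

# Toy usage with made-up (source_pos, target_pos) links:
A = {(1, 1), (2, 3), (3, 2)}
S = {(1, 1), (3, 2), (4, 3)}
print(alignment_error_rate(A, S))   # (0.667, 0.667, 0.333)
```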

Slide 23: Unaligned Words
[Table: counts of NULL-aligned and not-aligned words for manual alignment, IBM4 S2T, IBM4 T2S, and combined, on Arabic-English and Chinese-English; the numeric values were lost in the transcript]
- NULL alignment is explicit, part of the model; non-alignment just happens
- This is serious: the alignment model neglects 1/3 of the target words
- Alignment is very asymmetric, therefore combination

Slide 24: Alignment Errors for Most Frequent Words (CH-EN)
[Table of per-word alignment errors not recoverable from the transcript]

Slide 25: Sentence Length Distribution
- Sentences are often unbalanced
  - Wrong sentence alignment
  - Bad translations
  - But also language divergences
- May want to remove unbalanced sentence pairs (see the sketch below)
- The sentence length model is very weak
[Table: target sentence length distribution for source sentence length 10; values lost in the transcript]
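A common consequence of this observation is to filter out pairs with implausible length ratios before training. A minimal sketch; the threshold of 2.5 is an illustrative choice, not from the slide:

```python
def filter_unbalanced(pairs, max_ratio=2.5, min_len=1):
    """Remove sentence pairs (lists of tokens) whose length ratio exceeds
    max_ratio in either direction; max_ratio is an illustrative choice."""
    kept = []
    for src, tgt in pairs:
        if len(src) >= min_len and len(tgt) >= min_len and \
           max(len(src), len(tgt)) / min(len(src), len(tgt)) <= max_ratio:
            kept.append((src, tgt))
    return kept
```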

Slide 26: Summary
- Word alignment models
  - Alignment is (mathematically) a function, i.e. many source words map to 1 target word, but not the other way round
  - Symmetry by training in both directions
- Model IBM1
  - Word-word probabilities
  - Simple training with Expectation-Maximization
- Model IBM2
  - Position alignment
  - Training also with EM
- Model HMM
  - Relative positions (first-order model)
  - Training with the Viterbi or Forward-Backward algorithm
- Alignment errors reflect restrictions in the generative alignment models