Statistical Machine Translation

Slides:

Advertisements

Similar presentations

Statistical Machine Translation

Advertisements

Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser Institute for Natural Language Processing University of Stuttgart

Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.

Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart

Statistical Machine Translation IBM Model 1 CS626/CS460 Anoop Kunchukuttan Under the guidance of Prof. Pushpak Bhattacharyya.

Segmentation via Maximum Entropy Model. Goals Is it possible to learn the segmentation problem automatically? Using a model which is frequently used in.

Translation Model Parameters & Expectation Maximization Algorithm Lecture 2 (adapted from notes from Philipp Koehn & Mary Hearne) Dr. Declan Groves, CNGL,

Hidden Markov Models Bonnie Dorr Christof Monz CMSC 723: Introduction to Computational Linguistics Lecture 5 October 6, 2004.

Page 1 Hidden Markov Models for Automatic Speech Recognition Dr. Mike Johnson Marquette University, EECE Dept.

Statistical NLP: Lecture 11

Statistical NLP: Hidden Markov Models Updated 8/12/2005.

GS 540 week 6. HMM basics Given a sequence, and state parameters: – Each possible path through the states has a certain probability of emitting the sequence.

Part II. Statistical NLP Advanced Artificial Intelligence Probabilistic Context Free Grammars Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme.

Machine Translation (II): Word-based SMT Ling 571 Fei Xia Week 10: 12/1/05-12/6/05.

A Phrase-Based, Joint Probability Model for Statistical Machine Translation Daniel Marcu, William Wong(2002) Presented by Ping Yu 01/17/2006.

Statistical Phrase-Based Translation Authors: Koehn, Och, Marcu Presented by Albert Bertram Titles, charts, graphs, figures and tables were extracted from.

Forward-backward algorithm LING 572 Fei Xia 02/23/06.

Machine Translation A Presentation by: Julie Conlonova, Rob Chase, and Eric Pomerleau.

Maximum Entropy Model LING 572 Fei Xia 02/07-02/09/06.

Parameter estimate in IBM Models: Ling 572 Fei Xia Week ??

Maximum Entropy Model LING 572 Fei Xia 02/08/07. Topics in LING 572 Easy: –kNN, Rocchio, DT, DL –Feature selection, binarization, system combination –Bagging.

Statistical Machine Translation Part IX – Better Word Alignment, Morphology and Syntax Alexander Fraser ICL, U. Heidelberg CIS, LMU München

THE MATHEMATICS OF STATISTICAL MACHINE TRANSLATION Sriraman M Tallam.

Natural Language Processing Expectation Maximization.

Statistical Machine Translation Part VIII – Log-Linear Models Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.

Probability theory: (lecture 2 on AMLbook.com)

Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alex Fraser Institute for Natural Language Processing University of Stuttgart

Statistical Machine Translation Part IV – Log-Linear Models Alex Fraser Institute for Natural Language Processing University of Stuttgart Seminar:

Advanced Signal Processing 05/06 Reinisch Bernhard Statistical Machine Translation Phrase Based Model.

Statistical Machine Translation Part IV – Log-Linear Models Alexander Fraser Institute for Natural Language Processing University of Stuttgart

12/08/1999 JHU CS /Jan Hajic 1 Introduction to Natural Language Processing ( ) Statistical Translation: Alignment and Parameter Estimation.

Statistical Machine Translation Part III – Phrase-based SMT Alexander Fraser CIS, LMU München WSD and MT.

Martin KayTranslation—Meaning1 Martin Kay Stanford University with thanks to Kevin Knight.

Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alexander Fraser Institute for Natural Language Processing Universität Stuttgart.

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Reestimation Equations Continuous Distributions.

Active learning Haidong Shi, Nanyi Zeng Nov,12,2008.

NLP. Machine Translation Source-channel model of communication Parametric probabilistic models of language and translation.

Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart

Getting the structure right for word alignment: LEAF Alexander Fraser and Daniel Marcu Presenter Qin Gao.

Computational Linguistics Seminar LING-696G Week 6.

Graphical Models for Segmenting and Labeling Sequence Data Manoj Kumar Chinnakotla NLP-AI Seminar.

Ling 575: Machine Translation Yuval Marton Winter 2016 January 19: Spill-over from last class, some prob+stats, word alignment, phrase-based and hierarchical.

Correlation Clustering

Learning, Uncertainty, and Information: Learning Parameters

Statistical Machine Translation Part II: Word Alignments and EM

MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.

HADI Tutorial Templates

Statistical Machine Translation Part IV – Log-Linear Models

Alexander Fraser CIS, LMU München Machine Translation

Statistical Machine Translation Part III – Phrase-based SMT / Decoding

Training Tree Transducers

Hidden Markov Models Part 2: Algorithms

Bayesian Models in Machine Learning

CSCI 5832 Natural Language Processing

COMSOC ’06 6 December 2006 Rob LeGrand

Yield Optimization: Divide and Conquer Method

Expectation-Maximization Algorithm

- Viterbi training - Baum-Welch algorithm - Maximum Likelihood

Word-based SMT Ling 580 Fei Xia Week 1: 1/3/06.

Machine Translation and MT tools: Giza++ and Moses

Statistical Machine Translation Part IIIb – Phrase-based Model

Word Alignment David Kauchak CS159 – Fall 2019 Philipp Koehn

Machine Translation and MT tools: Giza++ and Moses

Attention for translation

Valentin I. Spitkovsky April 16, 2010

Statistical Machine Translation Part VI – Phrase-based Decoding

OUTLINE Questions? Quiz Go over homework Next homework Forecasting.

Pushpak Bhattacharyya CSE Dept., IIT Bombay 31st Jan, 2011

MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.

CS224N Section 2: PA2 & EM Shrey Gupta January 21,2011.

Presentation transcript:

2014.05.22 Statistical Machine Translation Statistical Machine Translation Part III: Word Alignment and Noisy Channel Alexander Fraser ICL, U. Heidelberg CIS, LMU München 2014.05.22 Statistical Machine Translation

Outline Measuring alignment quality Types of alignments IBM Model 1 Training IBM Model 1 with Expectation Maximization Noisy channel The higher IBM Models Approximate Expectation Maximization for IBM Models 3 and 4 Heuristics for improving IBM alignments

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2009

Slide from Koehn 2008

Slide from Koehn 2009

Training IBM Models 3/4/5 Approximate Expectation Maximization Focusing probability on small set of most probable alignments

Maximum Approximation Mathematically, P(e| f) = ∑ P(e, a | f) An alignment represents one way e could be generated from f But for IBM models 3, 4 and 5 we approximate Maximum approximation: P(e| f) = argmax P(e , a | f) Another approximation close to this will be discussed in a few slides a a

Model 3/4/5 training: Approx. EM Bootstrap Viterbi alignments Translation Model Initial parameters E-Step Viterbi alignments Refined parameters M-Step

Model 3/4/5 E-Step E-Step: search for Viterbi alignments Solved using local hillclimbing search Given a starting alignment we can permute the alignment by making small changes such as swapping the incoming links for two words Algorithm: Begin: Given a starting alignment, make list of possible small changes (e.g. list every possible swap of the incoming links for two words) for each possible small change Create new alignment A2 by copying A and applying small change If score(A2) > score(best) then best = A2 end for Choose best alignment as starting point, goto Begin:

Model 3/4/5 M-Step M-Step: reestimate parameters Count events in the neighborhood of the Viterbi Neighborhood approximation: consider only those alignments reachable by one change to the alignment Calculate p(e,a|f) only over this neighborhood, then divide by the sum over alignments in the neighborhood to get p(a|e,f) All alignments outside neighborhood are not considered! Sum counts over sentences, weighted by p(a|e,f) Normalize counts to sum to 1

Search Example

Slide from Koehn 2009

Thank you for your attention!