Machine Translation, Course 5. Diana Trandabăț, academic year 2014-2015.

Overview: the translation model; the Machine Translation pyramid; statistical modeling and the IBM Models; the EM algorithm; word alignment; flaws of word-based translation; phrase-based translation; syntax-based translation.

Expectation Maximization: initialize the model parameters (e.g. uniformly); assign probabilities to the missing data; estimate the model parameters from the completed data; iterate.
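A minimal sketch of this loop in Python, assuming hypothetical e_step and m_step functions for some model; the names, the fixed iteration count, and the parameter representation are illustrative only:

```python
def em(corpus, init_params, e_step, m_step, iterations=10):
    """Generic EM loop: alternately complete the hidden data (E-step)
    and re-estimate the parameters from the completed data (M-step)."""
    params = init_params                          # e.g. uniform probabilities
    for _ in range(iterations):                   # or loop until convergence
        expected_counts = e_step(corpus, params)  # probabilities for missing data
        params = m_step(expected_counts)          # maximum-likelihood re-estimate
    return params
```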

EM algorithm. Toy parallel corpus: la maison / the house; la maison bleu / the blue house; la fleur / the flower. Initial step: all alignments are equally likely. The model learns that, e.g., la is often aligned with the.

EM algorithm (same corpus). After one iteration: alignments, e.g. between la and the, are more likely.

EM algorithm (same corpus). After another iteration: it becomes apparent that alignments, e.g. between fleur and flower, are more likely.

EM algorithm (same corpus). Convergence: the inherent hidden structure is revealed by EM.

EM algorithm (same corpus). Parameter estimation from the aligned corpus, e.g.: p(la|the) = 0.453, p(le|the) = 0.334, p(maison|house) = 0.876, p(bleu|blue) = 0.563.
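For IBM Model 1 the whole procedure fits in a few lines. Below is a compact sketch that trains on the toy corpus above; the NULL word and any smoothing are omitted for brevity, so the resulting numbers will differ from the illustrative figures on the slide:

```python
from collections import defaultdict

# Toy parallel corpus from the slides (French, English)
corpus = [
    ("la maison".split(),      "the house".split()),
    ("la maison bleu".split(), "the blue house".split()),
    ("la fleur".split(),       "the flower".split()),
]

# Initialize t(f|e) uniformly over the French vocabulary
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):                          # EM iterations
    count = defaultdict(float)               # expected counts c(f, e)
    total = defaultdict(float)               # expected counts c(e)
    for fs, es in corpus:
        for f in fs:
            # E-step: posterior that f aligns to each e in this sentence pair
            z = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    # M-step: re-estimate t(f|e) from the expected counts
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("la", "the")], 3))            # tends towards 1.0 on this corpus
```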

IBM Model 1 and EM. The EM algorithm consists of two steps. Expectation step: apply the model to the data; parts of the model are hidden (here: the alignments); using the model, assign probabilities to their possible values. Maximization step: estimate the model from the data; take the assigned values as fact; collect counts (weighted by the probabilities); estimate the model from the counts. Iterate these steps until convergence.

IBM Model 1 and EM. We need to be able to compute: for the expectation step, the probability of the alignments; for the maximization step, the count collection.
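Under Model 1 both quantities have closed forms, because the alignment posterior factorizes over French positions (the standard derivation; t(f|e) is the lexical translation probability):

```latex
% E-step: posterior probability that French position j aligns to English position i
p(a_j = i \mid \mathbf{f}, \mathbf{e}) = \frac{t(f_j \mid e_i)}{\sum_{i'=0}^{l_e} t(f_j \mid e_{i'})}

% E-step: expected count of the pair (f, e) in one sentence pair
c(f \mid e; \mathbf{f}, \mathbf{e}) = \frac{t(f \mid e)}{\sum_{i=0}^{l_e} t(f \mid e_i)}
    \sum_{j=1}^{l_f} \delta(f, f_j) \sum_{i=0}^{l_e} \delta(e, e_i)

% M-step: re-normalize the counts collected over the whole corpus
t(f \mid e) = \frac{\sum_{(\mathbf{f},\mathbf{e})} c(f \mid e; \mathbf{f}, \mathbf{e})}
                   {\sum_{f'} \sum_{(\mathbf{f},\mathbf{e})} c(f' \mid e; \mathbf{f}, \mathbf{e})}
```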

Higher IBM Models. Only IBM Model 1 has a global maximum; training of a higher IBM model builds on the previous model. Computationally, the biggest change comes with Model 3: the trick used to simplify estimation no longer works, exhaustive count collection becomes too expensive, and sampling over high-probability alignments is used instead. IBM Model 1: lexical translation. IBM Model 2: adds an absolute reordering model. IBM Model 3: adds a fertility model. IBM Model 4: relative reordering model. IBM Model 5: fixes deficiency.

Word alignment with IBM models. The IBM Models create a one-to-many mapping: words are aligned using an alignment function; a function may return the same value for different inputs (one-to-many mapping); a function cannot return multiple values for one input (no many-to-one mapping). But we need many-to-many mappings.
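One way to see the restriction is to write the alignment function as a mapping from each French position to a single English position (a small illustrative sketch, not taken from the slides):

```python
# Alignment function a: French position -> English position.
# Several French positions may share an English position
# (one-to-many seen from the English side):
a = {1: 1, 2: 3, 3: 3}   # f2 and f3 both align to e3: allowed

# But one French position cannot map to two English positions,
# since each key holds exactly one value, so many-to-one
# (and hence many-to-many) alignments cannot be expressed.
```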

Symmetrizing word alignments (Spanish to English): Maria no daba una bofetada a la bruja verde / Mary did not slap the green witch (the Spanish-to-English alignment matrix is shown on the slide).

Symmetrizing word alignments (English to Spanish): Maria no daba una bofetada a la bruja verde / Mary did not slap the green witch (the English-to-Spanish alignment matrix is shown on the slide).

Improved word alignments: intersect the Spanish-to-English and English-to-Spanish alignments (the intersection matrix is shown on the slide).

Grow additional alignment points: Maria no daba una bofetada a la bruja verde / Mary did not slap the green witch (the grown alignment matrix is shown on the slide).

Improved word alignments. Heuristics for adding alignment points: only to directly neighboring points; also to diagonally neighboring points; also to non-neighboring points; prefer English-to-foreign or foreign-to-English; use lexical probabilities or frequencies; extend only to unaligned words. There is no clear advantage to any one strategy: it depends on the corpus size and on the language pair.
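A sketch of how intersection, union, and a grow step fit together, over alignments represented as sets of (French position, English position) pairs; this is a simplified rendering of the heuristic family, not the exact grow-diag-final procedure used in Moses:

```python
def grow_diag(fe, ef):
    """Symmetrize two directional alignments (sets of (f, e) index pairs):
    start from their intersection (high precision) and grow towards their
    union (high recall) through neighboring points, including diagonals."""
    neighbors = [(-1, 0), (1, 0), (0, -1), (0, 1),
                 (-1, -1), (-1, 1), (1, -1), (1, 1)]
    alignment = fe & ef                  # start from the intersection
    union = fe | ef                      # candidate points to grow into
    added = True
    while added:                         # repeat until nothing changes
        added = False
        for f, e in sorted(alignment):
            for df, de in neighbors:
                cf, ce = f + df, e + de
                # add a neighboring union point only if it attaches a word
                # that is still unaligned on at least one side
                if (cf, ce) in union and (cf, ce) not in alignment and \
                   (all(x[0] != cf for x in alignment) or
                        all(x[1] != ce for x in alignment)):
                    alignment.add((cf, ce))
                    added = True
    return alignment

# fe and ef would come from two directional word-alignment runs,
# e.g. Spanish-to-English and English-to-Spanish as on the slides.
```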

Flaws of word-based MT
Multiple English words for one German word:
German: Zeitmangel erschwert das Problem.
Gloss: LACK OF TIME MAKES MORE DIFFICULT THE PROBLEM
Correct translation: Lack of time makes the problem more difficult.
MT output: Time makes the problem.
Phrasal translation:
German: Eine Diskussion eruebrigt sich demnach.
Gloss: A DISCUSSION IS MADE UNNECESSARY ITSELF THEREFORE
Correct translation: Therefore, there is no point in a discussion.
MT output: A debate turned therefore.

Flaws of word-based MT
Syntactic transformations:
German: Das ist der Sache nicht angemessen.
Gloss: THAT IS THE MATTER NOT APPROPRIATE
Correct translation: That is not appropriate for this matter.
MT output: That is the thing is not appropriate.
German: Den Vorschlag lehnt die Kommission ab.
Gloss: THE PROPOSAL REJECTS THE COMMISSION OFF
Correct translation: The commission rejects the proposal.
MT output: The proposal rejects the commission.

“One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ ” Warren Weaver (1955:18, quoting a letter he wrote in 1947)