CSCI 5832 Natural Language Processing


CSCI 5832 Natural Language Processing, Lecture 26, Jim Martin (Spring 2007)

Today (4/26)
- Review: machine translation framework, word-based models
- Worked-out EM alignment example
- Phrase-based translation
- Syntax-based translation…

Sample from Today's Al Jazeera (the sample itself is not preserved in this transcript)

MT Pyramid (figure): translation between SOURCE and TARGET at increasing levels of abstraction, from words and phrases up through syntax and semantics to an interlingua at the apex.

Bayes Rule / Noisy Channel
(Figure: English passes through a noisy channel, described by the translation model P(s|e), and comes out as "garbled" Spanish; the language model P(e) scores English fluency. Example: "Que hambre tengo yo" / "I am so hungry". The decoding algorithm computes argmax_e P(e) * P(s|e).)
- Given a source sentence s, the decoder should consider many possible translations and return the target string e that maximizes P(e | s).
- By Bayes' rule, we can also write this as P(e) * P(s | e) / P(s) and maximize that instead.
- P(s) never changes while we compare different e's, so we can equivalently maximize P(e) * P(s | e).
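Written out as a formula, this restates the slide's derivation in standard notation:

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid s)
       \;=\; \arg\max_{e} \frac{P(e)\,P(s \mid e)}{P(s)}
       \;=\; \arg\max_{e} P(e)\,P(s \mid e)
```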

Parts List
We need probabilities for:
- n(x | y): the probability that word y will yield x output words in the translation (fertility)
- p: the probability of a null insertion
- t: the actual word translation probability table
- d(j | i): the probability that a word at position i will make an appearance at position j in the translation
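A rough sketch of what these parameter tables look like as data structures; this is a hypothetical Python illustration with made-up values, not the actual model files:

```python
# Parameter tables for a word-based (IBM-style) translation model.
n = {"house": {0: 0.1, 1: 0.8, 2: 0.1}}    # n(x|y): fertility, illustrative values only
p_null = 0.02                               # p: probability of a null insertion
t = {"house": {"maison": 0.8, "la": 0.2}}   # t: word translation table, P(f | e)
d = {(1, 1): 0.9, (2, 1): 0.1}              # d(j|i): position i surfaces at position j
```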

Parts List
Every one of these can be learned from a sentence-aligned corpus (i.e., a corpus where sentences are paired but nothing else is specified) together with the EM algorithm.

Word Alignment
Parallel corpus:
… la maison … / … the house …
… la maison bleue … / … the blue house …
… la fleur … / … the flower …
Inherent hidden structure revealed by EM training!

EM: Worked-Out Example (Knight's example)
Focus only on the word translation probabilities. Restricting to one-to-one alignments, how many alignments are there for each pair? (An n-word pair has n! of them.)
la maison / the house: 2
la maison bleue / the blue house: 6
la fleur / the flower: 2

EM: Step 1
Make up some numbers for the parameters of interest; in this case, just the word translation probabilities. Start uniform over the words that co-occur:
(la | the) = 1/4, (maison | the) = 1/4, (bleue | the) = 1/4, (fleur | the) = 1/4
(la | house) = 1/3, (maison | house) = 1/3, (bleue | house) = 1/3
(la | blue) = 1/3, (maison | blue) = 1/3, (bleue | blue) = 1/3
(la | flower) = 1/2, (fleur | flower) = 1/2
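A minimal Python sketch of Step 1 (an illustration, not from the slides): build the toy corpus above and initialize the word translation table uniformly over the French words that co-occur with each English word.

```python
from collections import defaultdict

# Toy corpus from the slides: (French words, English words) sentence pairs.
corpus = [(["la", "maison"], ["the", "house"]),
          (["la", "maison", "bleue"], ["the", "blue", "house"]),
          (["la", "fleur"], ["the", "flower"])]

# t[e][f] = P(f | e): start uniform over the French words that co-occur
# with each English word anywhere in the corpus.
cooc = defaultdict(set)
for fr, en in corpus:
    for e in en:
        cooc[e].update(fr)
t = {e: {f: 1.0 / len(fs) for f in fs} for e, fs in cooc.items()}
# t["the"] -> {la: 1/4, maison: 1/4, bleue: 1/4, fleur: 1/4}, and so on.
```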

EM: Step 2 (normalize)
Make some simple assumptions (only one-to-one alignments, each scored as the product of its word translation probabilities) and produce normalized alignment probabilities:
la maison / the house, alignment la-the, maison-house: 1/4 * 1/3 = 1/12; normalized: (1/12) / (2/12) = 1/2
la maison / the house, alignment la-house, maison-the: 1/3 * 1/4 = 1/12; normalized: 1/2
la maison bleue / the blue house: each of the 6 alignments scores 1/4 * 1/3 * 1/3 = 1/36; normalized: 1/6 for each
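A sketch of Step 2 under the same simplifying assumptions the slide makes (only one-to-one alignments, each scored as the product of its word translation probabilities), reusing the table `t` from the previous sketch:

```python
from itertools import permutations

def alignment_posteriors(fr, en, t):
    """Score every one-to-one alignment of fr to en as the product of its
    word translation probabilities, then normalize over the sentence pair.
    Assumes equal-length pairs, as in the worked example."""
    scored = []
    for perm in permutations(range(len(en))):
        # perm[i] is the index of the English word that fr[i] aligns to
        score = 1.0
        for i, j in enumerate(perm):
            score *= t[en[j]][fr[i]]
        scored.append((perm, score))
    total = sum(s for _, s in scored)
    return [(perm, s / total) for perm, s in scored]

# For ("la maison", "the house") this gives two alignments at 1/2 each;
# for the three-word pair, six alignments at 1/6 each.
```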

EM: Step 3
Now that we have the probability of each alignment, we can go back, count the evidence in the alignments for each translation pair, and prorate it by the probability of the alignment it comes from.
Huh? Let's just look at (la | the). What evidence do we have?

EM: Step 3
Evidence for (la | the):
la maison / the house: (la, the) appears in 1 alignment, weight 1 * 1/2
la maison bleue / the blue house: (la, the) appears in 2 alignments, weight 2 * 1/6
la fleur / the flower: (la, the) appears in 1 alignment, weight 1 * 1/2
Total: 1/2 + 2/6 + 1/2 = 8/6 = the discounted (fractional) count for (la | the)

EM: Step 3
Do the same for the other (? | the) pairs and normalize:
(la | the) = 8/6, (maison | the) = 5/6, (bleue | the) = 2/6, (fleur | the) = 3/6
Divide each by the total, 18/6:
(la | the) = (8/6) / (18/6) ≈ .44
(maison | the) = (5/6) / (18/6) ≈ .28
(bleue | the) = (2/6) / (18/6) ≈ .11
(fleur | the) = (3/6) / (18/6) ≈ .17
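A sketch of Step 3 for all word pairs at once, reusing `alignment_posteriors` and the toy `corpus` from the earlier sketches: accumulate fractional counts weighted by the alignment posteriors, then renormalize per English word. After one round it should reproduce the ≈ .44 / .28 / .11 / .17 row for "the".

```python
from collections import defaultdict

def collect_counts(corpus, t):
    """One EM round: accumulate fractional counts for each (English, French)
    word pair, weighted by the alignment posteriors (E-step), then
    renormalize per English word (M-step)."""
    counts = defaultdict(lambda: defaultdict(float))
    for fr, en in corpus:
        for perm, p in alignment_posteriors(fr, en, t):
            for i, j in enumerate(perm):
                counts[en[j]][fr[i]] += p
    # new t(f | e) = count(e, f) / sum over f' of count(e, f')
    return {e: {f: c / sum(fs.values()) for f, c in fs.items()}
            for e, fs in counts.items()}
```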

EM: Step 4
- Do that for all the words in the table.
- Recompute the alignment probabilities using the new word translation probabilities.
- Recompute the fractional counts.
- Normalize to get new word translation probabilities.
- Iterate until done.
When you're done, you can use the numbers to get the most probable alignment, from which you can read off all the parameters you need.
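Step 4 is then just iterating the pieces above; a sketch:

```python
def em(corpus, t, iterations=10):
    """Iterate: recompute alignment posteriors with the current table,
    collect fractional counts, renormalize, repeat."""
    for _ in range(iterations):
        t = collect_counts(corpus, t)
    return t

t = em(corpus, t)

# Read off the most probable (Viterbi) alignment for a sentence pair:
best_alignment, best_prob = max(
    alignment_posteriors(["la", "maison"], ["the", "house"], t),
    key=lambda pair: pair[1])
```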

EM/Alignment
This is totally weird voodoo science: we need parameters we don't have. We could get them from a word-aligned corpus, but we only have a sentence-aligned one. So we make up some parameters, use them to get alignments, and then use those alignments to re-estimate the right numbers.

Parts List
- Given a sentence alignment, we can induce a word alignment.
- Given that word alignment, we can get the p, t, d, and n parameters we need for the model.
- That is, we can find argmax_e P(e | f) by maximizing P(f | e) * P(e), and we can do that by searching over a large space of candidate translations e.

Break: Quiz Coverage
- Semantics (compositional and IE): Chapter 16, pages 1-29; Chapter 19, sections 19.1-19.4; review the bioinformatics slides
- Discourse: Chapter 21, pages 1-31 (skip 20.6.2)
- MT: Chapter 24, pages 1-41 (skip 24.2)

Decoding
- Remember Viterbi? This is just a fancier Viterbi.
- Given a foreign sentence f, find the English sentence e that maximizes P(e) * P(f | e).
- The search space is defined by the model (fertility, distortion, word translation model, etc.).
- The space is large, so efficient decoding is required.

Decoding
Que hambre tengo yo
Candidate word-for-word translations (the decoder searches paths through this lattice):
what hunger have I
that hungry am me
so make where

Decoder: Actually Translates New Sentences
(Figure: the search expands hypotheses left to right, choosing the 1st, 2nd, 3rd, 4th… target word, from a start state to an end state in which all source words are covered.)
Each partial translation hypothesis contains:
- The last English word chosen, plus the source words covered by it
- The entire coverage vector (so far) of the source sentence
- Language model and translation model scores (so far)
[Jelinek, 1969; Brown et al., 1996 US Patent; Och, Ueffing, and Ney, 2001]

Dynamic Programming Beam Search
The same search organized as a dynamic-programming beam search: each hypothesis additionally keeps a best-predecessor link, so the best path can be read off at the end.
[Jelinek, 1969; Brown et al., 1996 US Patent; Och, Ueffing, and Ney, 2001]
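A minimal Python sketch of the partial-hypothesis record described above; the field names are assumptions for illustration, not from the slides or any particular decoder:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Hypothesis:
    last_english_word: str         # last English word chosen (+ the source words it covers)
    covered: Tuple[bool, ...]      # coverage vector over the source positions (so far)
    lm_score: float                # language model score so far
    tm_score: float                # translation model score so far
    back: Optional["Hypothesis"]   # best predecessor link, for reading off the best path
```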

Flaws of Word-Based MT
- Multiple English words for one French word: IBM models can do one-to-many (fertility) but not many-to-one.
- Phrasal translation: "real estate", "note that", "interest in".
- Syntactic transformations: for languages with differing word orders (SVO vs. VSO), the translation model penalizes any proposed re-ordering, and the language model is not strong enough to force the verb to move to the right place.

Phrase-Based Statistical MT

Phrase-Based Statistical MT
Example: Morgen fliege ich nach Kanada zur Konferenz / Tomorrow I will fly to the conference in Canada
- The source (foreign) input is segmented into phrases; a "phrase" is any sequence of words.
- Each phrase is probabilistically translated into English, e.g. P(to the conference | zur Konferenz), P(into the meeting | zur Konferenz). HUGE TABLE!!
- Phrases are probabilistically re-ordered.

Advantages of Phrase-Based
- Many-to-many mappings can handle non-compositional phrases (e.g., "real estate").
- Local context is very useful for disambiguating: "interest rate" → …, "interest in" → … (the neighboring word selects the right translation).
- The more data, the longer the learned phrases; sometimes whole sentences.
- Interesting parallel to concatenative synthesis for TTS.

How to Learn the Phrase Translation Table?
One method: "alignment templates" (Och et al., 1999). Start with a word alignment and build phrases from that.
Maria no dió una bofetada a la bruja verde
Mary did not slap the green witch
This word-to-word alignment is a by-product of training a translation model like IBM Model 3; it is the best (or "Viterbi") alignment.

Word Alignment Induced Phrases
Maria no dió una bofetada a la bruja verde
Mary did not slap the green witch
Reading phrase pairs off the word alignment, from small to large:
(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)
(a la, the) (dió una bofetada a, slap the)
(Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the) (bruja verde, green witch)
(Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch) …
(Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)
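A sketch of the consistency-based phrase extraction idea the slides illustrate (simplified: it only checks that no alignment link crosses the span boundary and skips the usual extension over unaligned words; the function name and `max_len` limit are assumptions):

```python
def extract_phrases(src, tgt, alignment, max_len=7):
    """Collect phrase pairs consistent with a word alignment.
    alignment is a set of (i, j) pairs meaning src[i] is aligned to tgt[j]."""
    phrases = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Project the source span [i1, i2] onto the target side.
            tgt_positions = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_positions:
                continue
            j1, j2 = min(tgt_positions), max(tgt_positions)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistent only if no target word inside [j1, j2] is aligned
            # to a source word outside [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                phrases.add((" ".join(src[i1:i2 + 1]),
                             " ".join(tgt[j1:j2 + 1])))
    return phrases
```

Applied to the Maria / Mary pair with the alignment drawn on the slide, this should yield phrase pairs like the ones listed above.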

Phrase Pair Probabilities
- A given phrase pair (f-f-f, e-e-e) may appear many times across the bilingual corpus. We hope so!
- We can calculate phrase substitution probabilities P(f-f-f | e-e-e).
- We can use these in decoding.
- Much better results than word-based translation!
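One common way to turn those counts into a table is relative-frequency estimation; a sketch (the function name is an assumption, and `phrase_pairs` stands for the list of (f, e) phrase pairs collected from the whole corpus):

```python
from collections import Counter, defaultdict

def phrase_translation_probs(phrase_pairs):
    """Relative-frequency estimate P(f | e) = count(e, f) / count(e),
    where phrase_pairs is a list of (f_phrase, e_phrase) pairs."""
    pair_counts = Counter(phrase_pairs)
    e_counts = Counter(e for _f, e in phrase_pairs)
    table = defaultdict(dict)
    for (f, e), c in pair_counts.items():
        table[e][f] = c / e_counts[e]
    return table
```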