Lecture 12: Machine Translation (II) November 4, 2004 Dan Jurafsky


1 Lecture 12: Machine Translation (II) November 4, 2004 Dan Jurafsky
LING 138/238 / SYMBSYS 138: Intro to Computer Speech and Language Processing
Lecture 12: Machine Translation (II), November 4, 2004, Dan Jurafsky
Thanks to Kevin Knight for much of this material!

2 Outline for MT Week
Intro and a little history
Language Similarities and Divergences
Four main MT Approaches: Transfer, Interlingua, Direct, Statistical
Evaluation

3 Thanks to Bonnie Dorr! Next ten slides draw from her slides on BLEU

4 How do we evaluate MT? Human evaluation
Fluency:
  Overall fluency: human rating of sentences read out loud
  Cohesion (lexical chains, anaphora, ellipsis): hand-checking for cohesion
  Well-formedness: 5-point scale of syntactic correctness
Fidelity (same information as source?): hand rating of the target text on a 100-point scale
Clarity
Comprehensibility: noise test, multiple-choice questionnaire
Readability: cloze test

5 Evaluating MT: Problems
Asking humans to judge sentences on a 5-point scale for 10 factors takes time and $$$ (weeks or months!). We can't build language engineering systems if we can only evaluate them once every quarter! We need a metric that we can run every time we change our algorithm. It would be OK if it weren't perfect, as long as it tended to correlate with the expensive human metrics, which we could still run quarterly.

6 BiLingual Evaluation Understudy (BLEU —Papineni, 2001)
Automatic technique, but it requires the pre-existence of human (reference) translations.
Approach:
Produce a corpus of high-quality human translations
Judge "closeness" numerically (word-error rate)
Compare n-gram matches between the candidate translation and one or more reference translations

7 Bleu Comparison Chinese-English Translation Example:
Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.
Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

8 How Do We Compute Bleu Scores?
Intuition: "What percentage of words in the candidate occurred in some human translation?"
Proposal: count up the number of candidate translation words (unigrams) that occur in any reference translation, and divide by the total number of words in the candidate translation.
But we can't just count the total number of overlapping n-grams!
Candidate: the the the the the the the
Reference 1: The cat is on the mat
Solution: a reference word should be considered exhausted after a matching candidate word is identified.

9 “Modified n-gram precision”
For each word compute: (1) the total number of times it occurs in any single reference translation, and (2) the number of times it occurs in the candidate translation.
Instead of using count #2 directly, use the minimum of #1 and #2, i.e., clip the counts at the maximum for the reference translation.
Now use that modified (clipped) count, and divide by the number of candidate words.
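A minimal Python sketch of this clipped-count computation (the function and variable names are illustrative, not from the lecture):

from collections import Counter

def modified_unigram_precision(candidate, references):
    """Clip each candidate word's count at its maximum count in any single
    reference translation, then divide by the candidate length."""
    clipped = 0
    for word, count in Counter(candidate).items():
        max_ref = max(ref.count(word) for ref in references)
        clipped += min(count, max_ref)
    return clipped / len(candidate)

candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()
print(modified_unigram_precision(candidate, [reference]))  # 2/7, about 0.286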

10 Modified Unigram Precision: Candidate #1
It(1) is(1) a(1) guide(1) to(1) action(1) which(1) ensures(1) that(2) the(4) military(1) always(1) obeys(0) the commands(1) of(1) the party(1)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer? 17/18

11 Modified Unigram Precision: Candidate #2
It(1) is(1) to(1) insure(0) the(4) troops(0) forever(1) hearing(0) the activity(0) guidebook(0) that(2) party(1) direct(0)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer? 8/14

12 Modified Bigram Precision: Candidate #1
It is(1) is a(1) a guide(1) guide to(1) to action(1) action which(0) which ensures(0) ensures that(1) that the(1) the military(1) military always(0) always obeys(0) obeys the(0) the commands(0) commands of(0) of the(1) the party(1)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer? 10/17

13 Modified Bigram Precision: Candidate #2
It is(1) is to(0) to insure(0) insure the(0) the troops(0) troops forever(0) forever hearing(0) hearing the(0) the activity(0) activity guidebook(0) guidebook that(0) that party(0) party direct(0)
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
What's the answer? 1/13

14 Catching Cheaters
Candidate: the(2) the(0) the(0) the(0) the(0) the(0) the(0)
Reference 1: The cat is on the mat
Reference 2: There is a cat on the mat
What's the unigram answer? 2/7
What's the bigram answer? 0/7

15 Bleu distinguishes human from machine translations

16 Bleu problems with sentence length
Candidate: of the
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
Problem: modified unigram precision is 2/2, bigram precision 1/1!
Solution: a brevity penalty, which prefers candidate translations that are the same length as one of the references.
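For reference, the standard BLEU brevity penalty and score (Papineni et al.), which the slide alludes to but does not spell out:

BP = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}
\qquad
\mathrm{BLEU} = BP \cdot \exp\!\Big(\sum_{n=1}^{N} w_n \log p_n\Big)

where c is the candidate length, r the reference length, p_n the modified n-gram precisions, and w_n their weights (typically uniform with N = 4).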

17 Statistical MT: Fidelity and fluency
Best translation: the one that is both maximally faithful and maximally fluent (see the formula below).
Developed by researchers at IBM who were originally in speech recognition; called the IBM model.
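Stated as a formula (the standard noisy-channel objective; the slide's own notation may differ):

\hat{T} = \operatorname*{argmax}_{T} \; \text{fluency}(T) \times \text{faithfulness}(S, T)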

18 The IBM model Hmm, those two factors might look familiar…
Yup, it's Bayes rule:
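Written out (the standard Bayes-rule decomposition; the slide's own rendering may differ):

P(T \mid S) = \frac{P(S \mid T)\, P(T)}{P(S)}
\qquad\Longrightarrow\qquad
\hat{T} = \operatorname*{argmax}_{T} P(T \mid S) = \operatorname*{argmax}_{T} P(S \mid T)\, P(T)

P(S) is the same for every candidate T, so it drops out of the argmax.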

19 Fluency: P(T)
How do we measure that this sentence:
That car was almost crash onto me
is less fluent than this one:
That car almost hit me
Answer: language models (N-grams!). For example, P(hit | almost) > P(was | almost).
But we can use any other more sophisticated model of grammar.
Advantage: this is monolingual knowledge!
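A toy illustration of how an N-gram (here bigram) language model would prefer the fluent sentence; all probability values below are invented for the example:

import math

# Invented bigram probabilities P(next | previous), for illustration only.
bigram_p = {("that", "car"): 0.10, ("car", "almost"): 0.05,
            ("almost", "hit"): 0.02, ("hit", "me"): 0.10,
            ("car", "was"): 0.05, ("was", "almost"): 0.03,
            ("almost", "crash"): 0.0001, ("crash", "onto"): 0.001,
            ("onto", "me"): 0.01}

def log_p(sentence, floor=1e-8):
    """Sum of log bigram probabilities; unseen bigrams get a tiny floor."""
    words = sentence.lower().split()
    return sum(math.log(bigram_p.get(pair, floor))
               for pair in zip(words, words[1:]))

print(log_p("That car almost hit me"))             # higher: more fluent
print(log_p("That car was almost crash onto me"))  # much lower: less fluent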

20 Faithfulness: P(S|T)
French: ça me plaît [that me pleases]
English:
that pleases me - most fluent
I like it
I'll take that one
How to quantify this?
Intuition: the degree to which words in one sentence are plausible translations of words in the other sentence.
Product of the probabilities that each word in the target sentence would generate each word in the source sentence.
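One common way to make this "product of probabilities" precise, assuming a word alignment a that links each source word s_j to the target word t_{a_j} that generated it (alignments are introduced on the next slides):

P(S \mid T, a) \;=\; \prod_{j=1}^{J} t\big(s_j \mid t_{a_j}\big)

This is the alignment-conditioned form used in the IBM models; the slide itself states only the intuition.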

21 Faithfulness P(S|T)
We need to know, for every target language word, the probability of it mapping to every source language word. How do we learn these probabilities? Parallel texts! We often have two texts that are translations of each other. If we knew which word in the source text mapped to each word in the target text, we could just count!

22 Faithfulness P(S|T)
Sentence alignment: figuring out which source language sentence maps to which target language sentence.
Word alignment: figuring out which source language word maps to which target language word.

23 Big Point about Faithfulness and Fluency
The job of the faithfulness model P(S|T) is just to model a "bag of words": which words map from, say, English to Spanish. P(S|T) doesn't have to worry about internal facts about Spanish word order: that's the job of P(T).
P(T) can do bag generation: put the following words in order:
Have programming a seen never I language better
Actual the hashing is since not collision-free usually the is less perfectly the of somewhat capacity table

24 P(T) and bag generation: the answer
"Usually the actual capacity of the table is somewhat less, since the hashing is not collision-free"

25 A motivating example
Japanese phrase: 2000nen taio
Candidate translations for 2000nen: Y2K, 2000 years, 2000 year, 2000
Candidate translations for taio: correspondence, corresponding, equivalent, tackle, dealing with, deal with
P(S|T) alone prefers: "2000 Correspondence"
Adding P(T) might produce the correct "Dealing with Y2K"

26 More formally: The IBM Model
Let's flesh out these intuitions about P(S|T) and P(T) a bit. Many of the next slides are drawn from Kevin Knight's fantastic "A Statistical MT Tutorial Workbook"!

27 IBM Model 3 as probabilistic version of Direct MT
We translate English into Spanish as follows:
Replace the words in the English sentence by Spanish words.
Scramble around the words to look like Spanish order.
But we can't propose that English words are replaced by Spanish words one-for-one, because translations aren't the same length.

28 IBM Model 3 (from Knight 1999)
For each word ei in the English sentence, choose a fertility φi. The choice of φi depends only on ei, not on other words or other φ's.
For each word ei, generate φi Spanish words. The choice of Spanish word depends only on the English word ei, not on the English context or on any other Spanish words.
Permute all the Spanish words. Each Spanish word gets assigned an absolute target position slot (1, 2, 3, etc.). The choice of Spanish word position depends only on the absolute position of the English word generating it.

29 Translation as String rewriting (from Knight 1999)
Mary did not slap the green witch
Assign fertilities: 1 = copy the word over once, 2 = copy it twice, etc.; 0 = delete it:
Mary not slap slap slap the the green witch
Replace the English words with Spanish words, one-for-one:
Mary no daba una bofetada a la verde bruja
Permute the words:
Mary no daba una bofetada a la bruja verde

30 Model 3: P(S|T) training parameters
What are the parameters for this model? Just look at the dependencies:
Words: P(casa|house)
Fertilities: n(1|house), the probability that "house" will produce exactly 1 Spanish word whenever "house" appears.
Distortions: d(5|2), the probability that the English word in position 2 of the English sentence generates a Spanish word in position 5 of the Spanish translation.
Actually, distortions are d(5|2,4,6), where 4 is the length of the English sentence and 6 is the Spanish length.
Remember, P(S|T) doesn't have to model fluency.
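A sketch of how these three parameter tables might be represented; the numbers are invented placeholders, not estimates from data:

# Word translation probabilities t(spanish_word | english_word)
t = {("casa", "house"): 0.8, ("hogar", "house"): 0.15}

# Fertility probabilities n(phi | english_word):
# the probability that "house" produces exactly phi Spanish words
n = {(0, "house"): 0.05, (1, "house"): 0.9, (2, "house"): 0.05}

# Distortion probabilities d(spanish_pos | english_pos, english_len, spanish_len)
d = {(5, 2, 4, 6): 0.1}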

31 Model 3: last twist
Imagine some Spanish words are "spurious": they appear in the Spanish even though they weren't in the English original.
Like function words: we generated "a la" from "the" by giving "the" fertility 2.
Instead, we could give "the" fertility 1 and generate "a" spuriously.
Do this by pretending every English sentence contains the invisible word NULL as word 0. Then parameters like t(a|NULL) give the probability of the word "a" being generated spuriously from NULL.

32 Spurious words
We could imagine having n(3|NULL) (the probability of there being exactly 3 spurious words in a Spanish translation).
Instead of n(0|NULL), n(1|NULL), ..., n(25|NULL), we have a single parameter p1.
After assigning fertilities to the non-NULL English words, we want to generate (say) z Spanish words. As we generate each of the z words, we optionally toss in a spurious Spanish word with probability p1.
The probability of not tossing in a spurious word is p0 = 1 - p1.

33 Distortion probabilities for spurious words
We can't just have d(5|0,4,6), i.e., the chance that a NULL-generated word will end up in position 5. Why? These are spurious words! They could occur anywhere, so it's too hard to predict.
Instead:
Use the normal-word distortion parameters to choose positions for the normally-generated Spanish words.
Put the NULL-generated words into the empty slots left over.
If there are three NULL-generated words and three empty slots, then there are 3!, or six, ways of slotting them all in, and we assign a probability of 1/6 to each way.

34 Real Model 3
1. For each word ei in the English sentence (i = 1, 2, ..., L), choose a fertility φi with probability n(φi|ei).
2. Choose the number φ0 of spurious Spanish words to be generated from e0 = NULL, using p1 and the sum of the fertilities from step 1.
3. Let m be the sum of the fertilities for all words, including NULL.
4. For each i = 0, 1, 2, ..., L and each k = 1, 2, ..., φi: choose a Spanish word τik with probability t(τik|ei).
5. For each i = 1, 2, ..., L and each k = 1, 2, ..., φi: choose a target Spanish position πik with probability d(πik|i, L, m).
6. For each k = 1, 2, ..., φ0: choose a position π0k from the φ0 - k + 1 remaining vacant positions in 1, 2, ..., m, for a total probability of 1/φ0!.
7. Output the Spanish sentence with words τik in positions πik (0 ≤ i ≤ L, 1 ≤ k ≤ φi).
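A rough Python sketch of this generative story. The parameter tables are tiny invented toys and the distortion step is replaced by a plain shuffle, so this illustrates the structure of Model 3 rather than reproducing it exactly:

import random

# Toy parameter tables: all numbers invented for illustration.
n = {"Mary": {1: 1.0}, "did": {0: 1.0}, "not": {1: 1.0},
     "slap": {3: 1.0}, "the": {1: 0.6, 2: 0.4},
     "green": {1: 1.0}, "witch": {1: 1.0}}
t = {"Mary": {"Mary": 1.0}, "not": {"no": 1.0},
     "slap": {"daba": 0.4, "una": 0.3, "bofetada": 0.3},
     "the": {"la": 0.7, "a": 0.3},
     "green": {"verde": 1.0}, "witch": {"bruja": 1.0},
     "NULL": {"a": 1.0}}
p1 = 0.1  # chance of adding one spurious (NULL-generated) word per real word

def sample(dist):
    """Draw a key from a {value: probability} dict."""
    r, acc = random.random(), 0.0
    for key, p in dist.items():
        acc += p
        if r <= acc:
            return key
    return key  # guard against floating-point rounding

def model3_generate(english):
    # 1. Choose a fertility phi_i for each English word from n(phi|e).
    fertilities = [sample(n[e]) for e in english]
    # 2. Choose the number of spurious NULL-generated words using p1.
    real_total = sum(fertilities)
    spurious = sum(random.random() < p1 for _ in range(real_total))
    # 3. Translate: each English word e generates phi_i words drawn from t(.|e);
    #    NULL generates the spurious ones.
    spanish = [w for e, phi in zip(english, fertilities)
               for w in (sample(t[e]) for _ in range(phi))]
    spanish += [sample(t["NULL"]) for _ in range(spurious)]
    # 4. Permute. Real Model 3 places each word with d(j|i, L, m) and drops
    #    NULL-generated words into leftover slots; a shuffle stands in here.
    random.shuffle(spanish)
    return " ".join(spanish)

print(model3_generate("Mary did not slap the green witch".split()))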

35 String rewriting
Mary did not slap the green witch (input)
Mary not slap slap slap the green witch (choose fertilities)
Mary not slap slap slap NULL the green witch (choose number of spurious words)
Mary no daba una bofetada a la verde bruja (choose translations)
Mary no daba una bofetada a la bruja verde (choose target positions)

36 Model 3 parameters: n, t, p, d
If we had English strings and their step-by-step rewritings into Spanish, we could:
Compute n(0|did) by locating every instance of "did" and seeing what happens to it during the first rewriting step.
If "did" appeared 15,000 times and was deleted during the first rewriting step 13,000 times, then n(0|did) = 13/15.

37 Alignments
NULL And the program has been implemented
Le programme a ete mis en application
(the bars in the original alignment diagram link each English word to the French word(s) it generated; the three-way link connects "implemented" to "mis en application")
If we had lots of alignments like this, we could estimate:
n(0|did): how many times "did" connects to no French words
t(maison|house): how many of all the French words generated by "house" were "maison"
d(5|2,4,6): out of all the times an English word in position 2 moved somewhere, how many times it moved to position 5
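With a corpus of word-aligned sentence pairs like this, estimating t(·|·) is just counting and normalizing. A minimal sketch, where the alignment representation and names are assumptions:

from collections import Counter, defaultdict

def estimate_t(aligned_pairs):
    """aligned_pairs: list of (english_words, french_words, links), where
    links is a list of (english_index, french_index) connections."""
    counts = defaultdict(Counter)
    for eng, fra, links in aligned_pairs:
        for i, j in links:
            counts[eng[i]][fra[j]] += 1
    # Normalize: t(f | e) = count(e -> f) / count(e -> anything)
    return {e: {f: c / sum(fc.values()) for f, c in fc.items()}
            for e, fc in counts.items()}

corpus = [("the house".split(), "la maison".split(), [(0, 0), (1, 1)])]
print(estimate_t(corpus))  # {'the': {'la': 1.0}, 'house': {'maison': 1.0}}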

38 Where to get alignments
It turns out we can bootstrap alignments if we just have a bilingual corpus:
1. Assume some startup values for n, d, t, etc.
2. Use the values of n, d, t, etc. to run Model 3 in "forced alignment" mode, i.e., to pick the best word alignments between sentences.
3. Use these alignments to retrain n, d, t, etc.
4. Go to step 2.
This is called the Expectation-Maximization or EM algorithm.
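A compact sketch of the EM idea. For simplicity it uses the IBM Model 1 E-step (word-translation probabilities only, no fertilities or distortions), so it illustrates the bootstrap loop rather than full Model 3 training:

from collections import defaultdict

def em_model1(pairs, iterations=10):
    """pairs: list of (source_words, target_words).
    Learns t(s | t) by expectation-maximization."""
    # Start from uniform translation probabilities.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for s in src:
                # E-step: distribute one count for s over all target words,
                # in proportion to the current t(s | t).
                z = sum(t[(s, w)] for w in tgt)
                for w in tgt:
                    c = t[(s, w)] / z
                    count[(s, w)] += c
                    total[w] += c
        # M-step: renormalize the expected counts into new probabilities.
        t = defaultdict(lambda: 1.0,
                        {(s, w): count[(s, w)] / total[w] for (s, w) in count})
    return t

pairs = [("la maison".split(), "the house".split()),
         ("la fleur".split(), "the flower".split())]
t = em_model1(pairs)
print(round(t[("maison", "house")], 2))  # moves toward 1.0 over iterations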

39 Summary
Intro and a little history
Language Similarities and Divergences
Four main MT Approaches: Transfer, Interlingua, Direct, Statistical
Evaluation

40 Classes
LINGUIST 139M/239M: Human and Machine Translation (Martin Kay)
CSCI 224N: Natural Language Processing (Chris Manning)

