CSCI 5582 Artificial Intelligence

Slides:



Advertisements
Similar presentations
Statistical Machine Translation
Advertisements

Problems for Statistical MT Preprocessing Language modeling Translation modeling Decoding Parameter optimization Evaluation.
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser Institute for Natural Language Processing University of Stuttgart
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Arthur Chan Prepared for Advanced MT Seminar
Evaluation of Text Generation: Automatic Evaluation vs. Variation Amanda Stent, Mohit Singhai, Matthew Marge.
Machine Translation- 4 Autumn 2008 Lecture Sep 2008.
Machine Translation Introduction to Statistical MT.
MT Evaluation CA446 Week 9 Andy Way School of Computing Dublin City University, Dublin 9, Ireland
BLEU, Its Variants & Its Critics Arthur Chan Prepared for Advanced MT Seminar.
Orange: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation Chin-Yew Lin & Franz Josef Och (presented by Bilmes) or Orange: a.
CMSC 723 / LING 645: Intro to Computational Linguistics September 8, 2004: Dorr MT (continued), MT Evaluation Prof. Bonnie J. Dorr Dr. Christof Monz TA:
A Phrase-Based, Joint Probability Model for Statistical Machine Translation Daniel Marcu, William Wong(2002) Presented by Ping Yu 01/17/2006.
CSCI 5582 Fall 2006 CSCI 5582 Artificial Intelligence Lecture 17 Jim Martin.
Statistical Phrase-Based Translation Authors: Koehn, Och, Marcu Presented by Albert Bertram Titles, charts, graphs, figures and tables were extracted from.
Course Summary LING 575 Fei Xia 03/06/07. Outline Introduction to MT: 1 Major approaches –SMT: 3 –Transfer-based MT: 2 –Hybrid systems: 2 Other topics.
Machine Translation A Presentation by: Julie Conlonova, Rob Chase, and Eric Pomerleau.
C SC 620 Advanced Topics in Natural Language Processing Lecture 24 4/22.
Symmetric Probabilistic Alignment Jae Dong Kim Committee: Jaime G. Carbonell Ralf D. Brown Peter J. Jansen.
Application of RNNs to Language Processing Andrey Malinin, Shixiang Gu CUED Division F Speech Group.
CSCI 5582 Fall 2006 CSCI 5582 Artificial Intelligence Lecture 26 Jim Martin.
1 The Web as a Parallel Corpus  Parallel corpora are useful  Training data for statistical MT  Lexical correspondences for cross-lingual IR  Early.
1 Statistical NLP: Lecture 13 Statistical Alignment and Machine Translation.
Jan 2005Statistical MT1 CSA4050: Advanced Techniques in NLP Machine Translation III Statistical MT.
Automatic Evaluation Philipp Koehn Computer Science and Artificial Intelligence Lab Massachusetts Institute of Technology.
Natural Language Processing Expectation Maximization.
SI485i : NLP Set 3 Language Models Fall 2012 : Chambers.
Machine Translation- 5 Autumn 2008 Lecture Sep 2008.
Matthew Snover (UMD) Bonnie Dorr (UMD) Richard Schwartz (BBN) Linnea Micciulla (BBN) John Makhoul (BBN) Study of Translation Edit Rate with Targeted Human.
An Introduction to SMT Andy Way, DCU. Statistical Machine Translation (SMT) Translation Model Language Model Bilingual and Monolingual Data* Decoder:
English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2007 Lecture 16 5 September 2007.
Statistical Machine Translation Part IV – Log-Linear Models Alex Fraser Institute for Natural Language Processing University of Stuttgart Seminar:
An Integrated Approach for Arabic-English Named Entity Translation Hany Hassan IBM Cairo Technology Development Center Jeffrey Sorensen IBM T.J. Watson.
Arthur Chan Prepared for Advanced MT Seminar
Advanced Signal Processing 05/06 Reinisch Bernhard Statistical Machine Translation Phrase Based Model.
SMT – Final thoughts Philipp Koehn USC/Information Sciences Institute USC/Computer Science Department School of Informatics University of Edinburgh Some.
Statistical Machine Translation Part IV – Log-Linear Models Alexander Fraser Institute for Natural Language Processing University of Stuttgart
Active Learning for Statistical Phrase-based Machine Translation Gholamreza Haffari Joint work with: Maxim Roy, Anoop Sarkar Simon Fraser University NAACL.
Computing & Information Sciences Kansas State University Rabat, MoroccoHuman Language Technologies Workshop Human Language Technologies (HLT) Workshop.
Machine Translation Course 5 Diana Trandab ă ț Academic year:
Evaluating What’s Been Learned. Cross-Validation Foundation is a simple idea – “ holdout ” – holds out a certain amount for testing and uses rest for.
Statistical Machine Translation Part III – Phrase-based SMT Alexander Fraser CIS, LMU München WSD and MT.
Machine Translation- 2 Autumn 2008 Lecture 17 4 Sep 2008.
Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alexander Fraser Institute for Natural Language Processing Universität Stuttgart.
NRC Report Conclusion Tu Zhaopeng NIST06  The Portage System  For Chinese large-track entry, used simple, but carefully- tuned, phrase-based.
Estimating N-gram Probabilities Language Modeling.
1 Minimum Error Rate Training in Statistical Machine Translation Franz Josef Och Information Sciences Institute University of Southern California ACL 2003.
A Statistical Approach to Machine Translation ( Brown et al CL ) POSTECH, NLP lab 김 지 협.
NLP. Machine Translation Source-channel model of communication Parametric probabilistic models of language and translation.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Computational Linguistics Seminar LING-696G Week 6.
Computing & Information Sciences Kansas State University Friday, 05 Dec 2008CIS 530 / 730: Artificial Intelligence Lecture 39 of 42 Friday, 05 December.
Statistical Machine Translation Part II: Word Alignments and EM
METEOR: Metric for Evaluation of Translation with Explicit Ordering An Improved Automatic Metric for MT Evaluation Alon Lavie Joint work with: Satanjeev.
Alexander Fraser CIS, LMU München Machine Translation
CSCI 5832 Natural Language Processing
CSCI 5832 Natural Language Processing
Statistical Machine Translation Part III – Phrase-based SMT / Decoding
CSCI 5832 Natural Language Processing
CSCI 5832 Natural Language Processing
Statistical Machine Translation Papers from COLING 2004
Statistical Machine Translation Part IIIb – Phrase-based Model
Machine Translation(MT)
Lecture 12: Machine Translation (II) November 4, 2004 Dan Jurafsky
SMT – Final thoughts David Kauchak CS159 – Spring 2019
Johns Hopkins 2003 Summer Workshop on Syntax and Statistical Machine Translation Chapters 5-8 Ethan Phelps-Goodman.
Presented By: Sparsh Gupta Anmol Popli Hammad Abdullah Ayyubi
CSCI 5582 Artificial Intelligence
Presentation transcript:

CSCI 5582 Artificial Intelligence Lecture 25 Jim Martin CSCI 5582 Fall 2006

Today 12/5 Machine Translation Review MT Phrase-based models Training Decoding Phrase-based models Evaluation CSCI 5582 Fall 2006

Readings Chapters 22 and 23 in Russell and Norvig Chapter 24 of Jurafsky and Martin CSCI 5582 Fall 2006

Machine Translation CSCI 5582 Fall 2006

Statistical MT Systems Spanish Broken English Spanish/English Bilingual Text Text Statistical Analysis Statistical Analysis What hunger have I, Hungry I am so, I am so hungry, Have I that hunger … Que hambre tengo yo I am so hungry CSCI 5582 Fall 2006

Statistical MT Systems Spanish/English Bilingual Text English Text Statistical Analysis Statistical Analysis Broken English Spanish English Translation Model P(s|e) Language Model P(e) Que hambre tengo yo I am so hungry Decoding algorithm argmax P(e) * P(s|e) e CSCI 5582 Fall 2006

Bayes Rule Que hambre tengo yo I am so hungry Broken English Spanish English Translation Model P(s|e) Language Model P(e) Que hambre tengo yo I am so hungry Decoding algorithm argmax P(e) * P(s|e) e Given a source sentence s, the decoder should consider many possible translations … and return the target string e that maximizes P(e | s) By Bayes Rule, we can also write this as: P(e) x P(s | e) / P(s) and maximize that instead. P(s) never changes while we compare different e’s, so we can equivalently maximize this: P(e) x P(s | e) CSCI 5582 Fall 2006

Four Problems for Statistical MT Language model Given an English string e, assigns P(e) by the usual methods we’ve been using sequence modeling. Translation model Given a pair of strings <f,e>, assigns P(f | e) again by making the usual markov assumptions Training Getting the numbers needed for the models Decoding algorithm Given a language model, a translation model, and a new sentence f … find translation e maximizing P(e) * P(f | e) CSCI 5582 Fall 2006

Language Model Trivia Google Ngrams data Number of tokens: 1,024,908,267,229 Number of sentences: 95,119,665,584 Number of unigrams: 13,588,391 Number of bigrams: 314,843,401 Number of trigrams: 977,069,902 Number of fourgrams: 1,313,818,354 Number of fivegrams: 1,176,470,663 CSCI 5582 Fall 2006

3 Models IBM Model 1 IBM Model 3 Phrase-Based Models (Google/ISI) Dumb word to word IBM Model 3 Handles deletions, insertions and 1-to-N translations Phrase-Based Models (Google/ISI) Basically Model 1 with phrases instead of words CSCI 5582 Fall 2006

Alignment Probabilities Recall what of all of the models are doing Argmax P(e|f) = P(f|e)P(e) In the simplest models P(f|e) is just direct word-to-word translation probs. So let’s start with how to get those, since they’re used directly or indirectly in all the models. CSCI 5582 Fall 2006

Training alignment probabilities Step 1: Get a parallel corpus Hansards Canadian parliamentary proceedings, in French and English Hong Kong Hansards: English and Chinese Step 2: Align sentences Step 3: Use EM to train word alignments. Word alignments give us the counts we need for the word to word P(f|e) probs CSCI 5582 Fall 2006

Step 3: Word Alignments Of course, sentence alignments aren’t what we need. We need word alignments to get the stats we need. It turns out we can bootstrap word alignments from raw sentence aligned data (no dictionaries) Using EM Recall the basic idea of EM. A model predicts the way the world should look. We have raw data about how the world looks. Start somewhere and adjust the numbers so that the model is doing a better job of predicting how the world looks. CSCI 5582 Fall 2006

EM Training: Word Alignment Probs … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … All word alignments equally likely All P(french-word | english-word) equally likely. CSCI 5582 Fall 2006

EM Training Constraint Recall what we’re doing here… Each English word has to translate to some french word. But its still true that CSCI 5582 Fall 2006

EM for training alignment probs … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … “la” and “the” observed to co-occur frequently, so P(la | the) is increased. CSCI 5582 Fall 2006 Slide from Kevin Knight

EM for training alignment probs … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … “house” co-occurs with both “la” and “maison”, but P(maison | house) can be raised without limit, to 1.0, while P(la | house) is limited because of “the” (pigeonhole principle) CSCI 5582 Fall 2006 Slide from Kevin Knight

EM for training alignment probs … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … settling down after another iteration CSCI 5582 Fall 2006 Slide from Kevin Knight

EM for training alignment probs … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … Inherent hidden structure revealed by EM training! For details, see: Section 24.6.1 in the chapter “A Statistical MT Tutorial Workbook” (Knight, 1999). “The Mathematics of Statistical Machine Translation” (Brown et al, 1993) Free Alignment Software: GIZA++ CSCI 5582 Fall 2006 Slide from Kevin Knight

Direct Translation … la maison … la maison bleue … la fleur … … the house … the blue house … the flower … P(juste | fair) = 0.411 P(juste | correct) = 0.027 P(juste | right) = 0.020 … Possible English translations, rescored by language model New French sentence CSCI 5582 Fall 2006

Phrase-Based Translation Generative story here has three steps Discover and align phrases during training Align and translate phrases during decoding Finally move the phrases around CSCI 5582 Fall 2006

Phrase-based MT Language model P(E) Translation model P(F|E) How to train the model Decoder: finding the sentence E that is most probable CSCI 5582 Fall 2006

Generative story again Group English source words into phrases e1, e2, …, en Translate each English phrase ei into a Spanish phrase fj. The probability of doing this is (fj|ei) Then (optionally) reorder each Spanish phrase We do this with a distortion probability A measure of distance between positions of a corresponding phrase in the 2 languages “What is the probability that a phrase in position X in the English sentences moves to position Y in the Spanish sentence?” CSCI 5582 Fall 2006

Distortion probability The distortion probability is parameterized by The start position of the foreign (Spanish) phrase generated by the ith English phrase ei. The end position of the foreign (Spanish) phrase generated by the I-1th English phrase ei-1. We’ll call the distortion probability d(.) CSCI 5582 Fall 2006

Final translation model for phrase-based MT Let’s look at a simple example with no distortion CSCI 5582 Fall 2006

Training P(F|E) What we mainly need to train is (fj|ei) Assume as before we have a large bilingual training corpus And suppose we knew exactly which phrase in Spanish was the translation of which phrase in the English We call this a phrase alignment If we had this, we could just count-and-divide: CSCI 5582 Fall 2006

But we don’t have phrase alignments What we have instead are word alignments: CSCI 5582 Fall 2006

Getting phrase alignments To get phrase alignments: We first get word alignments How? EM as before… Then we “symmetrize” the word alignments into phrase alignments CSCI 5582 Fall 2006

Final Problem Decoding… Given a trained model and a foreign sentence produce Argmax P(e|f) Can’t use Viterbi it’s too restrictive Need a reasonable efficient search technique that explores the sequence space based on how good the options look… A* CSCI 5582 Fall 2006

A* Recall for A* we need Goal State Operators Heuristic CSCI 5582 Fall 2006

A* Recall for A* we need Goal State Good coverage of source Operators Translation of phrases/words distortions deletions/insertions Heuristic Probabilities (tweaked) CSCI 5582 Fall 2006

A* Decoding Why not just use the probability as we go along? Turns it into Uniform-cost not A* That favors shorter sequences over longer ones. Need to counter-balance the probability of the translation so far with its “progress towards the goal”. CSCI 5582 Fall 2006

A*/Beam Sorry… Even that doesn’t work because the space is too large So as we go we’ll prune the space as paths fall below some threshold CSCI 5582 Fall 2006

A* Decoding CSCI 5582 Fall 2006

A* Decoding CSCI 5582 Fall 2006

A* Decoding CSCI 5582 Fall 2006

Break Homework Last quiz… Average over the quizzes I’m going to send out an additional test set Last quiz… Next Average over the quizzes 81% with a sd of 11… That’s (q1/55 + q2/50 + q3/50)/3 CSCI 5582 Fall 2006

Break Quiz True Forward EM W,W,D Yes Anything CSCI 5582 Fall 2006

WWD Hard way WWD WDD DWD DDD CSCI 5582 Fall 2006

YES Red, Square YES NO P(Red|Yes)P(Square|Yes)P(Yes) = .5*.5*.6 = .15 NO P(Red|No)P(Square|No) = = .5*.5*.4=.1 CSCI 5582 Fall 2006

Anything All three features give .6 accuracy. Doesn’t matter which is chosen it’s arbitrary F1 R: 1,2,3,7,10: 3Y,2N 3 Right G: 5,6,8: 2Y, 1N 2 Right B: 4,9: 1Y, 1N 1 Right CSCI 5582 Fall 2006

Evaluation There are 2 dimensions along which MT systems can be evaluated Fluency How good is the output text as an example of the target language Fidelity How well does the output text convey the source text Information content and style CSCI 5582 Fall 2006

Evaluating MT: Human tests for fluency Rating tests: Give the raters a scale (1 to 5) and ask them to rate Or distinct scales for Clarity, Naturalness, Style Or check for specific problems Cohesion (Lexical chains, anaphora, ellipsis) Hand-checking for cohesion. Well-formedness 5-point scale of syntactic correctness CSCI 5582 Fall 2006

Evaluating MT: Human tests for fidelity Adequacy Does it convey the information in the original? Ask raters to rate on a scale Bilingual raters: give them source and target sentence, ask how much information is preserved Monolingual raters: give them target + a good human translation CSCI 5582 Fall 2006

Evaluating MT: Human tests for fidelity Informativeness Task based: is there enough info to do some task? CSCI 5582 Fall 2006

Evaluating MT: Problems Asking humans to judge sentences on a 5-point scale for 10 factors takes time and $$$ (weeks or months!) Can’t build language engineering systems if we can only evaluate them once every quarter!!!! Need a metric that we can run every time we change our algorithm. It’s OK if it isn’t perfect, just needs to correlate with the human metrics, which we could still run in periodically. CSCI 5582 Fall 2006 Bonnie Dorr

Automatic evaluation Assume we have one or more human translations of the source passage Compare the automatic translation to these human translations using some simple metric Bleu NIST Meteor Precision/Recall CSCI 5582 Fall 2006

BiLingual Evaluation Understudy (BLEU) Automatic Technique Requires the pre-existence of Human (Reference) Translations Approach: Produce corpus of high-quality human translations Judge “closeness” numerically (word-error rate) Compare n-gram matches between candidate translation and 1 or more reference translations CSCI 5582 Fall 2006 Slide from Bonnie Dorr

BLEU Evaluation Metric Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport . N-gram precision (score is between 0 & 1) What percentage of machine n-grams can be found in the reference translation? An n-gram is an sequence of n words Not allowed to use same portion of reference translation twice (can’t cheat by typing out “the the the the the”) Brevity penalty Can’t just type out single word “the” (precision 1.0!) *** Amazingly hard to “game” the system (i.e., find a way to change machine output so that BLEU goes up, but quality doesn’t) Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance. CSCI 5582 Fall 2006 Slide from Bonnie Dorr

BLEU Evaluation Metric Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport . BLEU4 formula (counts n-grams up to length 4) exp (1.0 * log p1 + 0.5 * log p2 + 0.25 * log p3 + 0.125 * log p4 – max(words-in-reference / words-in-machine – 1, 0) p1 = 1-gram precision P2 = 2-gram precision P3 = 3-gram precision P4 = 4-gram precision Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance. CSCI 5582 Fall 2006 Slide from Bonnie Dorr

Multiple Reference Translations Reference translation 1: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport . Reference translation 3: The US International Airport of Guam and its office has received an email from a self-claimed Arabian millionaire named Laden , which threatens to launch a biochemical attack on such public places as airport . Guam authority has been on alert . Reference translation 4: US Guam International Airport and its office received an email from Mr. Bin Laden and other rich businessman from Saudi Arabia . They said there would be biochemistry air raid to Guam Airport and other public places . Guam needs to be in high precaution about this matter . Reference translation 2: Guam International Airport and its offices are maintaining a high state of alert after receiving an e-mail that was from a person claiming to be the wealthy Saudi Arabian businessman Bin Laden and that threatened to launch a biological and chemical attack on the airport and other public places . Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance. Reference translation 1: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport . Reference translation 3: The US International Airport of Guam and its office has received an email from a self-claimed Arabian millionaire named Laden , which threatens to launch a biochemical attack on such public places as airport . Guam authority has been on alert . Reference translation 4: US Guam International Airport and its office received an email from Mr. Bin Laden and other rich businessman from Saudi Arabia . They said there would be biochemistry air raid to Guam Airport and other public places . Guam needs to be in high precaution about this matter . Reference translation 2: Guam International Airport and its offices are maintaining a high state of alert after receiving an e-mail that was from a person claiming to be the wealthy Saudi Arabian businessman Bin Laden and that threatened to launch a biological and chemical attack on the airport and other public places . Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance. CSCI 5582 Fall 2006 Slide from Bonnie Dorr

BLEU in Action 枪手被警方击毙。 (Foreign Original) the gunman was shot to death by the police . (Reference Translation) the gunman was police kill . #1 wounded police jaya of #2 the gunman was shot dead by the police . #3 the gunman arrested by police kill . #4 the gunmen were killed . #5 the gunman was shot to death by the police . #6 gunmen were killed by police ?SUB>0 ?SUB>0 #7 al by the police . #8 the ringer is killed by the police . #9 police killed the gunman . #10 CSCI 5582 Fall 2006 Slide from Bonnie Dorr

BLEU in Action 枪手被警方击毙。 (Foreign Original) the gunman was shot to death by the police . (Reference Translation) the gunman was police kill . #1 wounded police jaya of #2 the gunman was shot dead by the police . #3 the gunman arrested by police kill . #4 the gunmen were killed . #5 the gunman was shot to death by the police . #6 gunmen were killed by police ?SUB>0 ?SUB>0 #7 al by the police . #8 the ringer is killed by the police . #9 police killed the gunman . #10 green = 4-gram match (good!) red = word not matched (bad!) CSCI 5582 Fall 2006 Slide from Bonnie Dorr

Bleu Comparison Chinese-English Translation Example: Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party. Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct. Reference 1: It is a guide to action that ensures that the military will forever heed Party commands. Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party. Reference 3: It is the practical guide for the army always to heed the directions of the party. CSCI 5582 Fall 2006 Slide from Bonnie Dorr

BLEU Tends to Predict Human Judgments (variant of BLEU) CSCI 5582 Fall 2006