
Translation—Meaning
Martin Kay, Stanford University
with thanks to Kevin Knight

Slide 2: Elementary Probability
The probability of an event e occurring in a given trial: a number in the range 0 to 1 giving the proportion of the trials in which e is expected to occur.
0 — never; 0.5 — half the time; 1 — always.

Slide 3: Elementary Probability
P(e) — a priori probability: the chance that e happens.
P(f | e) — conditional probability: the probability of f happening in a situation where e happens.
P(e, f) — joint probability: the probability of e and f happening together.
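A standard concrete illustration, not from the slides: roll one fair die, and let e be "the number is even" and f be "the number is a six". Then P(e) = 1/2, P(f) = 1/6, P(f | e) = 1/3, and the joint P(e, f) = 1/6, since a six is necessarily even.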

Slide 4: Elementary Probability
If e and f are independent:
P(e, f) = P(e) P(f), so P(e | f) = P(e) and P(f | e) = P(f).

Slide 5: Elementary Probability — towards Bayes' Rule
If e and f are dependent:
P(e, f) = P(e | f) P(f) = P(f | e) P(e), and P(e, f) = P(f, e).

Slide 6: Bayes' Rule
P(e | f) = P(f | e) P(e) / P(f)
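The step from slide 5 to slide 6, written out: the two factorizations of the joint probability are equal, so

P(e | f) P(f) = P(e, f) = P(f | e) P(e)
P(e | f) = P(f | e) P(e) / P(f)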

Slide 7: Statistical Machine Translation
P(e | f) — the probability that e is an English translation of the given French sentence f.
argmax_e P(e | f) — the e that gives the maximum value of P(e | f).
By Bayes' Rule, P(e | f) = P(f | e) P(e) / P(f). The denominator P(f) is fixed for a given f, so it does not affect the maximum:
argmax_e P(e | f) = argmax_e P(f | e) P(e)
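A minimal sketch of this decision rule in Python; the candidate list and both probability tables are invented toy values, not anything from the slides, and f is a single fixed French sentence left implicit:

# Pick argmax_e P(f | e) * P(e) for one fixed French sentence f.
candidates = ["he went into the dining room", "he did go into the dining room"]

lm = {  # P(e): invented language-model probabilities
    "he went into the dining room": 0.004,
    "he did go into the dining room": 0.0005,
}
tm = {  # P(f | e): invented translation-model probabilities
    "he went into the dining room": 0.02,
    "he did go into the dining room": 0.03,
}

best = max(candidates, key=lambda e: tm[e] * lm[e])
print(best)  # -> he went into the dining room

Note that the translation model alone prefers the other candidate; the language model overrules it, which is the point of slide 12 below.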

Slide 8: Statistical Machine Translation
P(e | f) = P(f | e) P(e) / P(f)
P(e) is the language model; P(f | e) is the translation model.

Slide 9: The Noisy Channel Model
A person wants to say e but, by the time it comes out, it has been corrupted by noise to become f. To make our best guess as to what was intended, we reason about:
1. the things English speakers are likely to say, and
2. the statistics of the corruption process.

Slide 10: P(e) as a program
[diagram: a box that takes an English sentence e as input and outputs its probability P(e)]

Slide 11: P(f | e) as a program
[diagram: a box that takes an English sentence e as input and outputs a French sentence f with probability P(f | e)]
Note: arrows point to the right because this is a theory of how French sentences are generated.

Slide 12: Two models are better than one...
P(e | f) = P(f | e) P(e) / P(f)  (translation model × language model)
...because they constrain one another, so neither has to take as much responsibility.

Slide 13: For example
P(f | e): e and f contain words that are translations of one another, in any order.
P(e): gives a high value to e iff it is grammatical.

Slide 14: Language Models — 1
Find all n English sentences on the web. If a sentence occurs k times, assign it a probability of k/n.
Problems: the database would be enormous, and it would still fail to contain many perfectly good sentences.

Slide 15: Language Models — 2
Use a probabilistic grammar: morphology, syntax, ...

Slide 16: Language Models — 3
Use N-grams. From a text such as this (the opening lines of Goethe's Faust, left in German because the n-grams are drawn from it; roughly: "Again you draw near, wavering shapes, who once appeared to my clouded gaze. Shall I try, this time, to hold you fast? Do I feel my heart still inclined to that delusion?"):

Ihr naht euch wieder, schwankende Gestalten
Die früh sich einst dem trüben Blick gezeigt.
Versuch ich wohl, euch diesmal festzuhalten?
Fühl ich mein Herz noch jenem Wahn geneigt?

extract the trigrams:

Ihr naht euch
naht euch wieder
euch wieder, schwankende
wieder, schwankende Gestalten
schwankende Gestalten Die
Gestalten Die früh
Die früh sich
früh sich einst
sich einst dem
...

Other models can be used in place of this.
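As a sketch, extraction and maximum-likelihood estimation of such trigrams in Python (the names and the tiny corpus are just for illustration; the zero returned for unseen histories is exactly the problem the next slide addresses):

from collections import Counter

def ngrams(tokens, n):
    # all n-token windows of a sequence, as tuples
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ("Ihr naht euch wieder, schwankende Gestalten "
          "Die früh sich einst dem trüben Blick gezeigt.").split()

trigrams = Counter(ngrams(tokens, 3))
bigrams = Counter(ngrams(tokens, 2))

def p(w3, w1, w2):
    # maximum-likelihood P(w3 | w1 w2); 0.0 for an unseen history
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)] if bigrams[(w1, w2)] else 0.0

print(p("euch", "Ihr", "naht"))  # 1.0 in this tiny corpus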

Slide 17: Smoothing
What happens when N-grams appear that have never been seen before? Answer: smoothing.
Construct, say, 3-gram, 2-gram, and 1-gram (2nd-order, 1st-order, and 0th-order) models and take a certain proportion of the probability estimate from each. Let n_k(s) be the probability estimate of s in the k-gram (i.e., (k−1)-st order) model. Estimate the probability of "abc" as
λ_3 n_3("abc") + λ_2 n_2("bc") + λ_1 n_1("c") + λ_0, where λ_3 + λ_2 + λ_1 + λ_0 = 1.
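A sketch of that interpolation in Python, assuming n3, n2, and n1 are dicts from token tuples to model estimates as in the previous block; the function name and the weights are invented, and the weights must sum to 1:

def smoothed(w1, w2, w3, n3, n2, n1, lambdas=(0.6, 0.3, 0.09, 0.01)):
    # lambda_3*n3("abc") + lambda_2*n2("bc") + lambda_1*n1("c") + lambda_0
    l3, l2, l1, l0 = lambdas  # invented weights summing to 1
    return (l3 * n3.get((w1, w2, w3), 0.0)
            + l2 * n2.get((w2, w3), 0.0)
            + l1 * n1.get((w3,), 0.0)
            + l0)

In practice the λ weights are not set by hand but tuned on held-out data.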

Slide 18: How does English become French? IBM Model 3
Replace each English word by the French words that appear opposite it in a bilingual dictionary, and then scramble their order.

Slide 19: Translation can change length
Each English word e_i has a fertility φ_i, which gives the number of French words that will be generated for it. Each French word has a target position in its sentence, which is a function of the position in the English sentence of the word it translates.

Slide 20: Translation as string rewriting
Hans ist nicht in dem Esszimmer gegangen
    (assign fertilities: "ist" gets 0, "Esszimmer" and "gegangen" get 2, the rest get 1)
Hans nicht in dem Esszimmer Esszimmer gegangen gegangen
    (apply fertilities)
Hans not into the dining room did go
    (translate words)
Hans did not go into the dining room
    (permute)
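The same four steps as a runnable toy; the fertility table, dictionary, and permutation are invented by hand for just this sentence:

from collections import deque

source = "Hans ist nicht in dem Esszimmer gegangen".split()
fertility = {"Hans": 1, "ist": 0, "nicht": 1, "in": 1,
             "dem": 1, "Esszimmer": 2, "gegangen": 2}
dictionary = {"Hans": ["Hans"], "nicht": ["not"], "in": ["into"],
              "dem": ["the"], "Esszimmer": ["dining", "room"],
              "gegangen": ["did", "go"]}

# Apply fertilities: copy each word phi times (phi = 0 drops "ist").
expanded = [w for w in source for _ in range(fertility[w])]

# Translate the copies word by word.
queues = {w: deque(ws) for w, ws in dictionary.items()}
translated = [queues[w].popleft() for w in expanded]

# Permute into English order (the permutation is given by hand here).
permutation = [0, 6, 1, 7, 2, 3, 4, 5]
print(" ".join(translated[i] for i in permutation))
# -> Hans did not go into the dining room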

Slide 21: Parameters
t(not | nicht): the probability that German "nicht" will become English "not".
d(5 | 2): the probability that the English for a German word in position 2 of the sentence will be placed in position 5.
p_1: the probability of adding a spurious word. Add a word NULL at the beginning of the source sentence that can give rise to new (spurious) words in the target; these can be inserted anywhere after the other words have been arranged.

Slide 22: The Model-3 procedure
1. For each English word e_i, choose a fertility φ_i with probability n(φ_i | e_i).
2. Choose the number φ_0 of NULL-generated words to insert: each word produced by a real fertility may add a spurious word with probability p_1.
3. Let m = the sum of all fertilities (including φ_0).
4. For i in 1..n and k in 1..φ_i, choose a French word τ_ik with probability t(τ_ik | e_i).
5. Choose French positions for the φ_0 NULL translations among the remaining slots, with total probability 1/φ_0!.
6. Arrange and output the sentence.
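A loose, runnable sketch of this generative story in Python; the function name and toy tables are invented, real systems learn n, t, p_1 (and the distortion d, omitted here) from data, and step 5's placement is simplified to random insertion:

import random

def sample_french(english, n, t, p1):
    # n[e]: dict fertility -> probability; t[e]: dict French word -> probability
    def draw(dist):
        r, acc = random.random(), 0.0
        for value, prob in dist.items():
            acc += prob
            if r < acc:
                return value
        return value  # guard against rounding slack

    words = []
    for e in english:                      # steps 1 and 4
        for _ in range(draw(n[e])):
            words.append(draw(t[e]))
    spurious = [draw(t["NULL"])            # step 2
                for w in words if random.random() < p1]
    for s in spurious:                     # step 5, simplified
        words.insert(random.randrange(len(words) + 1), s)
    return " ".join(words)                 # step 6 (no distortion model here)

toy_n = {"house": {1: 1.0}}
toy_t = {"house": {"maison": 1.0}, "NULL": {"de": 1.0}}
print(sample_french(["house"], toy_n, toy_t, p1=0.1))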

Slide 23: Parameters
n — fertilities (2 dimensions)
t — translations (2 dimensions)
d — position (1 dimension)
p — NULL translations (a scalar)

Slide 24: Parameter Values
Parameter values could be estimated easily on the basis of an English text and its translation into French where corresponding sentences have been aligned. An alignment of a pair of word strings is simply a mapping of the words of one string onto the words of the other. It can be represented by a vector A where A_i = j if the i-th English word is translated by the j-th French word. The frequency of a translation, ordering, etc. is simply the number of alignments in which it is observed.
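If the alignment vectors were given, estimation really would be mere counting, as in this sketch (the two-pair data set is invented):

from collections import Counter

# Each item: (English words, French words, A) with A[i] = j as above.
data = [(["the", "house"], ["la", "maison"], [0, 1]),
        (["the", "book"], ["le", "livre"], [0, 1])]

pair_count, word_count = Counter(), Counter()
for e_words, f_words, A in data:
    for i, j in enumerate(A):
        pair_count[(e_words[i], f_words[j])] += 1
        word_count[e_words[i]] += 1

# relative-frequency estimate of t(f | e)
t = {(e, f): c / word_count[e] for (e, f), c in pair_count.items()}
print(t[("the", "la")])  # 0.5: "the" aligned once to "la", once to "le"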

Slide 25: Alignments
Since alignments are not given, consider all k alignments for a given sentence pair, but add 1/k instead of 1 to the count for a given parameter.
Given values for the parameters, we can estimate probabilities of the alignments. Given alignments, we can make better estimates of the parameters.
This is the EM algorithm (Expectation Maximization).
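The slides describe EM for Model 3; as a concrete illustration of the same alternation, here is EM for the much simpler IBM Model 1 (uniform initialization, fractional counts, re-estimation), where the expected counts can be computed without enumerating alignments. The toy sentence pairs are invented:

from collections import defaultdict

def model1_em(pairs, iterations=20):
    # pairs: list of (English word list, French word list); returns t[(f, e)]
    f_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)      # t(f | e), uniform to start

    for _ in range(iterations):
        count = defaultdict(float)        # expected count of (f, e) pairs
        total = defaultdict(float)        # expected count of e
        for es, fs in pairs:              # E-step: fractional counts
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    count[(f, e)] += t[(f, e)] / z
                    total[e] += t[(f, e)] / z
        for (f, e), c in count.items():   # M-step: re-estimate t
            t[(f, e)] = c / total[e]
    return t

pairs = [("the house".split(), "la maison".split()),
         ("the book".split(), "le livre".split()),
         ("a book".split(), "un livre".split())]
t = model1_em(pairs)
print(t[("maison", "house")])  # approaches 1.0 as the iterations proceed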

Slide 26: Problems with Model 3
Distortions are a lousy model of word order.
Long sentences have too many alignments for training. Maybe use only good alignments; but how to find them (and only them)?