Recap: WordNet
A very large lexical database of English (George Miller, Cognitive Science Laboratory, Princeton University, 1985):
- 117K nouns, 11K verbs, 22K adjectives, 4.5K adverbs
- Word senses are grouped into synonym sets ("synsets") linked into a conceptual-semantic hierarchy: 82K noun synsets, 13K verb synsets, 18K adjective synsets, 3.6K adverb synsets
- Avg. # of senses: 1.23 per noun, 2.16 per verb, 1.41 per adjective, 1.24 per adverb
- Conceptual-semantic relations: hypernym/hyponym

Recap: WordNet similarity
Path-based similarity measures between words:
- Shortest path between two concepts (Leacock & Chodorow, 1998): sim = 1 / |shortest path|
- Path length to the root node from the least common subsumer (LCS) of the two concepts, i.e., the most specific concept that is an ancestor of both (Wu & Palmer, 1994): sim = 2 * depth(LCS) / (depth(w1) + depth(w2))
Implementations: http://wn-similarity.sourceforge.net/
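A quick way to experiment with these measures is NLTK's WordNet interface; a minimal sketch, assuming nltk is installed and the wordnet corpus downloaded (note NLTK's path_similarity is 1/(1 + shortest path length), a slight variant of the formula above):

```python
# Sketch using NLTK's WordNet interface; requires nltk.download('wordnet').
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

# Path-based measure: 1 / (1 + length of shortest path between the synsets)
print(dog.path_similarity(cat))
# Wu & Palmer: 2 * depth(LCS) / (depth(s1) + depth(s2))
print(dog.wup_similarity(cat))
```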

Recap: distributional semantics
Use the contexts in which words appear to measure their similarity.
- Assumption: similar contexts => similar meanings
- Approach: represent each word w as a vector of its contexts c
Vector space representation:
- Each dimension corresponds to a particular context c_n
- Each element in the vector of w captures the degree to which w is associated with the context c_n
Similarity metric: cosine similarity
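As a toy illustration of this representation, the sketch below builds two hypothetical context-count vectors and compares them with cosine similarity (the context words and counts are made up):

```python
import math
from collections import Counter

# Hypothetical context counts for two words (invented numbers).
w1 = Counter({'drink': 8, 'bottle': 5, 'table': 2})
w2 = Counter({'drink': 6, 'bottle': 3, 'party': 1})

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(cosine(w1, w2))  # close to 1.0 => similar contexts => similar meanings
```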

Recap: signature of target word
"The bank refused to give me a loan."
- Simplified Lesk uses the words in the context: Signature(bank) = {refuse, give, loan}
- Original Lesk uses an augmented signature of the target word: Signature(bank) = {refuse, reject, request, ..., give, gift, donate, ..., loan, money, borrow, ...}
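A minimal simplified-Lesk sketch, assuming NLTK's WordNet glosses as the sense inventory (the scoring here is bare overlap counting, with no stemming or stopword removal):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def simplified_lesk(word, context_words):
    # Return the sense whose gloss overlaps most with the context words.
    best, best_overlap = None, -1
    for sense in wn.synsets(word):
        signature = set(sense.definition().lower().split())
        overlap = len(signature & set(context_words))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

print(simplified_lesk('bank', {'refuse', 'give', 'loan'}))
```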

Statistical Machine Translation
Hongning Wang
CS@UVa

Machine translation

How do humans translate languages? Is a bilingual dictionary sufficient?
- John loves Mary. => Jean aime Marie.
- John told Mary a story. => Jean a raconté une histoire à Marie.
- John is a computer scientist. => Jean est informaticien.
- John swam across the lake. => Jean a traversé le lac à la nage.

Correspondences
A bilingual dictionary is clearly insufficient!
- One-to-one: John = Jean, loves = aime, Mary = Marie
- One-to-many / many-to-one: Mary = [à Marie]; [a computer scientist] = informaticien
- Many-to-many: [swam across __] = [a traversé __ à la nage]
- Reordering required: told Mary1 [a story]2 = a raconté [une histoire]2 [à Marie]1

Lexical divergences
Different senses of homonymous words generally have different translations, e.g., English - German:
- (river) bank - Ufer
- (financial) bank - Bank
Different senses of polysemous words may also have different translations:
- I know that he bought the book: Je sais qu’il a acheté le livre.
- I know Peter: Je connais Peter.
- I know math: Je m’y connais en maths.

Syntactic divergences
- Word order: SVO (Subject-Verb-Object), SOV, VSO, ...; fixed or free?
- Head-marking vs. dependent-marking: dependent-marking (English) "the man's house" vs. head-marking (Hungarian) "the man house-his"
- Pro-drop languages can omit pronouns: Italian (with inflection): I eat = mangio, he eats = mangia; Chinese (without inflection): I/he eat = chīfàn

Semantic divergences
Aspect: English has a progressive aspect ('Peter swims' vs. 'Peter is swimming'), while German can only express this with an adverb ('Peter schwimmt' vs. 'Peter schwimmt gerade').
Clearly, a bilingual dictionary is insufficient, and machine translation is difficult!

Machine translation approaches
The Vauquois triangle: approaches range from direct word-by-word translation, through syntactic and semantic transfer, up to an interlingua.

Statistical machine translation
The mainstream of the current machine translation paradigm:
- The idea was introduced by Warren Weaver in 1949
- 1966: the ALPAC report concluded human translation was far cheaper and better, which killed MT research for a long time
- Re-introduced in 1993 by researchers at IBM's Thomas J. Watson Research Center
- Now the most widely studied/used machine translation method

Noisy-channel framework [Shannon, 1948]
Translating French to English:
Eng* = argmax_{Eng} p(Eng|Fre)
[Channel diagram: Source → Transmitter (encoder) → Noisy Channel → Receiver (decoder) → Destination. The English sentence Eng, with prior p(Eng) (the language model), passes through the channel, which emits the observed French sentence Fre via p(Fre|Eng) (the translation model); the receiver must guess the input: p(Eng'|Fre) = ?]

Translation with a noisy channel model
By Bayes rule:
Eng* = argmax_{Eng} p(Eng|Fre) = argmax_{Eng} p(Fre|Eng) p(Eng)
where Fre is observed (given), p(Fre|Eng) is the translation model, and p(Eng) is the language model.
- The translation model p(Fre|Eng) should capture the faithfulness of the translation. It needs to be trained on a parallel corpus.
- The language model p(Eng) should capture the fluency of the translation. It can be trained on a very large monolingual corpus.
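To make the two factors concrete, here is a toy sketch of noisy-channel scoring; all probabilities are invented for illustration, and a real decoder searches a vastly larger candidate space:

```python
import math

# Invented scores: tm = p(Fre|Eng) (faithfulness), lm = p(Eng) (fluency).
candidates = {
    'John swam across the lake.':   {'tm': 1e-4, 'lm': 1e-3},  # faithful and fluent
    'John flies across the river.': {'tm': 1e-9, 'lm': 1e-3},  # fluent but unfaithful
    'Lake the across swam John.':   {'tm': 1e-4, 'lm': 1e-9},  # faithful but disfluent
}

def score(c):
    p = candidates[c]
    return math.log(p['tm']) + math.log(p['lm'])  # log p(Fre|Eng) + log p(Eng)

print(max(candidates, key=score))  # 'John swam across the lake.'
```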

Parallel corpora
The same text in two (or more) languages; high-quality manually crafted translations, e.g., the European Parliament Proceedings Parallel Corpus.

Translation model p(Fre|Eng)
Specifying translation probabilities; estimating this probability requires word alignments. Example entries:

English       French        Frequency
green witch   grüne Hexe    …
at home       zuhause       10534
at home       daheim        9890
is            ist           598012
this week     diese Woche   …

Recap: machine translation approaches
The Vauquois triangle: approaches range from direct word-by-word translation, through syntactic and semantic transfer, up to an interlingua.

Recap: translation with a noisy channel model
By Bayes rule:
Eng* = argmax_{Eng} p(Eng|Fre) = argmax_{Eng} p(Fre|Eng) p(Eng)
where Fre is observed (given), p(Fre|Eng) is the translation model, and p(Eng) is the language model.
- The translation model p(Fre|Eng) should capture the faithfulness of the translation. It needs to be trained on a parallel corpus.
- The language model p(Eng) should capture the fluency of the translation. It can be trained on a very large monolingual corpus.

Estimation of translation probability
If we have ground-truth word alignments in the parallel corpus, the maximum likelihood estimator is sufficient:
p(f|e) = c(e → f) / Σ_w c(e → w)
Example: John told Mary a story. / Jean a raconté une histoire à Marie.
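A sketch of this estimator on a toy set of (English, French) aligned word pairs, following the count-and-normalize formula above (aligning 'told' to both 'a' and 'raconté' is one plausible reading of the example):

```python
from collections import defaultdict

# Toy gold alignments from "John told Mary a story." / "Jean a raconté une histoire à Marie."
aligned_pairs = [('John', 'Jean'), ('told', 'a'), ('told', 'raconté'),
                 ('a', 'une'), ('story', 'histoire'),
                 ('Mary', 'à'), ('Mary', 'Marie')]

counts = defaultdict(lambda: defaultdict(int))
for e, f in aligned_pairs:
    counts[e][f] += 1                      # c(e -> f)

def p(f, e):
    total = sum(counts[e].values())        # sum_w c(e -> w)
    return counts[e][f] / total if total else 0.0

print(p('raconté', 'told'))  # 0.5: 'told' aligns to 'a' and 'raconté' equally often
```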

Language model p(Eng)
Specifying the likelihood of observing a sentence in the target language.
N-gram language model: relax the language complexity by assuming the occurrence of the current word depends only on the previous N-1 words:
p(w_1 ... w_n) = Π_i p(w_i | w_{i-1}, ..., w_{i-N+1})
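A minimal unsmoothed bigram (N = 2) sketch of this factorization; real language models add smoothing (e.g., Kneser-Ney) so unseen n-grams do not zero out the product:

```python
from collections import defaultdict

corpus = [['<s>', 'john', 'swam', 'across', 'the', 'lake', '</s>'],
          ['<s>', 'john', 'told', 'mary', 'a', 'story', '</s>']]

bigram, history = defaultdict(int), defaultdict(int)
for sent in corpus:
    for prev, cur in zip(sent, sent[1:]):
        bigram[(prev, cur)] += 1
        history[prev] += 1

def p_sentence(words):
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        prob *= bigram[(prev, cur)] / history[prev] if history[prev] else 0.0
    return prob

print(p_sentence(['<s>', 'john', 'swam', 'across', 'the', 'lake', '</s>']))  # 0.5
```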

Language model p(Eng)
Specifying the likelihood of observing a sentence in the target language.
Google (2007) uses 5-grams to 7-grams, which results in huge models, but the effect on translation quality levels off quickly.

Statistical machine translation

IBM translation models
A generative model based on the noisy channel framework: generate the observed French sentence f from the hypothesized English sentence e by a stochastic process:
1. Generate the length of f
2. Generate the alignment of e to the target sentence f
3. Generate the words of f
Eng* = argmax_{Eng} p(Fre|Eng) p(Eng)

Word alignment
One-to-one, one-to-many, and reordering:
John told Mary a story. / Jean a raconté une histoire à Marie.
[Alignment figure: John → Jean; told → a, raconté; a → une; story → histoire; Mary → à, Marie]

Word alignment
Many-to-one and missing words:
John swam across the lake. / Jean a traversé le lac à la nage.
A special symbol NULL is added to the source sentence to account for target words that correspond to no source word.
[Alignment figure: source "NULL John swam across the lake", target "Jean a traversé le lac à la nage"]

Representing word alignments
Alignment table for source "NULL John swam across the lake" (positions 0-5, with 0 = NULL) and target "Jean a traversé le lac à la nage":

Target Position   1     2   3         4    5    6   7    8
Target Word       Jean  a   traversé  le   lac  à   la   nage
Source Position   1     2   3         4    5    2   2    2
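In code, such an alignment is naturally a list with one source position per target position; the sketch below encodes the table above (position 0 reserved for NULL):

```python
# Encoding of the alignment table above (source position 0 = NULL).
source = ['NULL', 'John', 'swam', 'across', 'the', 'lake']
target = ['Jean', 'a', 'traversé', 'le', 'lac', 'à', 'la', 'nage']
alignment = [1, 2, 3, 4, 5, 2, 2, 2]      # a_j for target positions j = 1..8

for f_word, a_j in zip(target, alignment):
    print(f'{f_word} -> {source[a_j]}')   # e.g., "à -> swam" (many-to-one)
```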

IBM translation models
Translation model with word alignment, marginalizing over all possible alignments a:
p(Fre|Eng) = Σ_{a ∈ A(Eng,Fre)} p(Fre, a|Eng)
The words of f are generated with respect to an alignment a:
p(f, a|e) = p(m|e) Π_{j=1}^{m} p(a_j | a_{1..j-1}, f_{1..j-1}, m, e) p(f_j | a_{1..j}, f_{1..j-1}, m, e)
where p(m|e) models the length of the target sentence f, p(a_j | ...) the word alignment a_j, and p(f_j | ...) the translation of f_j.

IBM translation models
A sequence of 5 translation models with different assumptions and realizations of the components, i.e., the length model, the alignment model, and the translation model. Model 1 is the simplest and is the basis of the follow-up IBM translation models.

Parameters in Model 1
- Length probability p(m|e): the probability of generating a French sentence of length m given the English sentence e; assumed to be a constant, p(m|e) = ε
- Alignment probability p(a|e): the probability that French position j is aligned to English position a_j; assumed to be uniform over the n+1 English positions (the n words plus NULL, where n is the length of e), i.e., 1/(n+1) per position

Parameters in Model 1
- Translation probability p(f|a, e): the probability that English word e_{a_j} is translated to French word f_j, i.e., p(f_j | e_{a_j})
After these simplifications (with the NULL word added to e), Model 1 becomes:
p(f, a|e) = p(m|e) Π_{j=1}^{m} p(a_j | a_{1..j-1}, f_{1..j-1}, m, e) p(f_j | a_{1..j}, f_{1..j-1}, m, e)
          = ε / (n+1)^m Π_{j=1}^{m} p(f_j | e_{a_j})
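Putting the pieces together, a sketch of the simplified Model 1 likelihood; the lexical probabilities t are invented purely for illustration:

```python
# p(f, a | e) = eps / (n+1)^m * prod_j p(f_j | e_{a_j})
t = {('Jean', 'John'): 0.9, ('traversé', 'across'): 0.3, ('lac', 'lake'): 0.8}  # made up

def model1_likelihood(f_words, e_words, alignment, t, eps=1.0):
    n, m = len(e_words) - 1, len(f_words)          # e_words[0] is the NULL word
    prob = eps / (n + 1) ** m
    for f_word, a_j in zip(f_words, alignment):
        prob *= t.get((f_word, e_words[a_j]), 1e-6)  # tiny floor for unseen pairs
    return prob

e = ['NULL', 'John', 'swam', 'across', 'the', 'lake']
print(model1_likelihood(['Jean', 'traversé', 'lac'], e, [1, 3, 5], t))
```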

Generative process in Model 1
For a particular English sentence e = e_1 .. e_n of length n, e.g., "NULL John swam across the lake" (n = 5):
1. Choose a length m for the French sentence (e.g., m = 8)
2. Choose an alignment a = a_1 ... a_m over the English positions
3. Translate each aligned English word e_{a_j} into French
[Figure: the transmitter encodes "NULL John swam across the lake", via the chosen alignment positions, into "Jean a traversé le lac à la nage".]

Decoding process in Model 1
Example candidate: e = "NULL John flies across the river" gives p(e|f) = 1e-55.
Search through all English sentences e = e_1 .. e_n, scored by the language model p(e), and through all possible alignments:
1. Choose a length m for the French sentence (e.g., m = 8): p(m|e) = ε
2. Choose an alignment a = a_1 ... a_m: uniform alignment probability
3. Translate each aligned word e_{a_j}: Π_{j=1}^{m} p(f_j | e_{a_j})
[Figure: the receiver decodes the observed "Jean a traversé le lac à la nage" into "John flies across the river" via a guessed alignment.]

Decoding process in Model 1
Example candidate: e = "NULL John swam across the lake" gives p(e|f) = 1e-15.
Search through all English sentences e = e_1 .. e_n, scored by the language model p(e), and through all possible alignments:
1. Choose a length m for the French sentence (e.g., m = 8): p(m|e) = ε
2. Choose an alignment a = a_1 ... a_m: uniform alignment probability
3. Translate each aligned word e_{a_j}: Π_{j=1}^{m} p(f_j | e_{a_j})
[Figure: the receiver decodes the observed "Jean a traversé le lac à la nage" into "John swam across the lake".]
The faithful translation receives a much higher score (1e-15 vs. 1e-55).

Decoding process in Model 1
The search space is huge: presumably all "sentences" in English, with unknown sentence length, i.e., all permutations of words in the vocabulary. Heuristics are needed to reduce the search space, trading off translation accuracy against efficiency.

Estimation of translation probability
If we do not have ground-truth word alignments, appeal to the Expectation Maximization (EM) algorithm: intuitively, guess the alignments based on the current translation probabilities first, and then update the translation probabilities. The EM algorithm will be carefully discussed in our later lecture on "Text Clustering".
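A compact sketch of this guess-then-update loop, following the standard Model 1 EM updates on an invented toy corpus (no NULL word and no smoothing, purely to show the E-step/M-step structure):

```python
from collections import defaultdict

corpus = [(['the', 'house'], ['la', 'maison']),
          (['the', 'book'],  ['le', 'livre']),
          (['a', 'book'],    ['un', 'livre'])]

e_vocab = {e for es, _ in corpus for e in es}
t = defaultdict(lambda: 1.0 / len(e_vocab))           # uniform init of p(f|e)

for _ in range(10):
    count, total = defaultdict(float), defaultdict(float)
    for es, fs in corpus:
        for f in fs:                                  # E-step: soft alignment guesses
            z = sum(t[(f, e)] for e in es)
            for e in es:
                delta = t[(f, e)] / z                 # expected count of e aligned to f
                count[(e, f)] += delta
                total[e] += delta
    for (e, f), c in count.items():                   # M-step: re-estimate p(f|e)
        t[(f, e)] = c / total[e]

print(round(t[('maison', 'house')], 2))  # converges toward 1.0
```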

Other translation models
IBM Models 2-5 are more complex:
- Word order and string position of the aligned words
- Phrase-based translation in the source and target languages
- Incorporating syntax or quasi-syntactic structures
- Greatly reduced search space

What you should know
- Challenges in machine translation: lexical/syntactic/semantic divergences
- Statistical machine translation: the source-channel framework
- IBM Model 1: the generative process and the idea of word alignment

Today’s reading
Speech and Language Processing, Chapter 25: Machine Translation