Word Translation Disambiguation Using Bilingual Bootstrapping
Paper by Hang Li and Cong Li, Microsoft Research Asia
Presented by Sarah Hunter
Introduction
● Word translation disambiguation: word sense disambiguation, dealing with the special case of translation
● Example: "plant"
    ● Flora sense, ex: "plant and animal life"; translations: French "flore", Chinese "zhiwu"
    ● Factory sense, ex: "Nissan car and truck plant"; translations: French "usine", Chinese "gongchang"
● Supervised learning? Expensive!
● Better: bootstrapping: start with a small amount of training data and, as new data is classified, use it as training data as well
● Monolingual bootstrapping:
    ● Train on a few English sentences containing an ambiguous word, each labeled with the Chinese translation of that word
    ● Create a classifier and classify new sentences
    ● Further train with these newly classified sentences
● Bilingual bootstrapping:
    ● Similar to monolingual, but builds a classifier for each language using classified data from BOTH languages
    ● How? We'll see!
Monolingual Bootstrapping
● One-sense-per-discourse heuristic: when an ambiguous word appears in the same text many times, it usually has the same sense

Step 1: Create the classifier (a sketch in code follows below)
context = the words surrounding the ambiguous word in a sentence

    for each (ambiguous word) {
        for each (possible sense) {
            use the classified data to create a binary classifier*
                with the classes "this sense" and "not this sense"
            (each class contains (context, sense, probability) triples)
        }
    }

● * The binary classifier is a naïve Bayesian ensemble: a linear combination of naïve Bayes classifiers; for each word in the context, calculate the probability of this sense given that word
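A minimal Python sketch of Step 1, assuming bag-of-words contexts. For compactness it trains one multi-class naïve Bayes model with Laplace smoothing rather than the paper's one-vs-rest binary classifiers and its ensemble over context windows; all names (train_sense_classifiers, classify, alpha) are illustrative, not from the paper.

    from collections import Counter, defaultdict
    import math

    def train_sense_classifiers(classified, alpha=1.0):
        """Train {sense: (log_prior, {word: log P(word|sense)})} from
        (context_words, sense) pairs for one ambiguous word."""
        by_sense = defaultdict(Counter)
        sense_counts = Counter()
        vocab = set()
        for context, sense in classified:
            sense_counts[sense] += 1
            by_sense[sense].update(context)
            vocab.update(context)
        total = sum(sense_counts.values())
        models = {}
        for sense, word_counts in by_sense.items():
            n = sum(word_counts.values())
            log_prior = math.log(sense_counts[sense] / total)
            # Laplace-smoothed per-word likelihoods
            log_like = {w: math.log((word_counts[w] + alpha) /
                                    (n + alpha * len(vocab)))
                        for w in vocab}
            models[sense] = (log_prior, log_like)
        return models

    def classify(models, context):
        """Return (most probable sense, normalized probability)."""
        log_scores = {}
        for sense, (log_prior, log_like) in models.items():
            unseen = math.log(1e-9)  # crude floor for out-of-vocabulary words
            log_scores[sense] = log_prior + sum(log_like.get(w, unseen)
                                                for w in context)
        # normalize in log space so the caller can threshold on a probability
        m = max(log_scores.values())
        z = sum(math.exp(s - m) for s in log_scores.values())
        best = max(log_scores, key=log_scores.get)
        return best, math.exp(log_scores[best] - m) / z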
Step 2: Classify new data (a sketch in code follows below)
context = the words surrounding the ambiguous word in a sentence
aWord = an ambiguous word
C = aWord's classified data
U = aWord's unclassified data

    for each (aWord) {
        for each (context in U) {
            calculate the most probable sense given this context
            if (probability is above a threshold)
                store this (context, sense, probability)
        }
        C = C + the stored triples with the top b probabilities
        U = U - the corresponding contexts
    }
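Building on the Step 1 sketch above, here is a hedged sketch of the Step 2 loop. THRESHOLD and TOP_B stand in for the paper's confidence threshold and per-iteration cap b; the values here are arbitrary placeholders.

    THRESHOLD = 0.8   # assumed confidence threshold; the paper's value may differ
    TOP_B = 5         # assumed per-iteration cap "b"

    def bootstrap(classified, unclassified, iterations=20):
        """Iteratively label the most confident contexts and retrain."""
        unclassified = list(unclassified)
        for _ in range(iterations):
            models = train_sense_classifiers(classified)
            scored = []
            for context in unclassified:
                sense, prob = classify(models, context)
                if prob > THRESHOLD:
                    scored.append((prob, context, sense))
            if not scored:
                break                        # nothing confident enough; stop
            scored.sort(key=lambda t: t[0], reverse=True)
            for prob, context, sense in scored[:TOP_B]:
                classified.append((context, sense))   # C = C + (context, sense)
                unclassified.remove(context)          # U = U - context
        return classified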
Bilingual Bootstrapping
● Similar to monolingual bootstrapping
● Adds these extensions:
    ● Repeatedly constructs classifiers in both languages in parallel
    ● Boosts classifier performance by exchanging information between the languages
Initially: some classified data
After classifying some new data
The Classifier
● Appropriate Chinese classified data is transformed (translated) into English and included in the corresponding English classifier (a sketch follows below)

aWord = an ambiguous word
E = aWord's classified data for English
Ce = classified data for the Chinese words that can be translations of aWord, i.e., links 1 and 2 in the diagram

● The classifier is built as in monolingual bootstrapping, only with classified data = E + Ce
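A sketch of how the translated Chinese data might be folded into the English training set. translate_zh_to_en is a hypothetical word-by-word dictionary lookup (returning an English word or None), not an API from the paper.

    def english_training_data(E, Ce, translate_zh_to_en):
        """Merge English classified data E with word-by-word translations
        of the Chinese classified data Ce."""
        data = list(E)
        for zh_context, sense in Ce:
            en_context = [t for t in (translate_zh_to_en(w) for w in zh_context)
                          if t is not None]
            data.append((en_context, sense))
        return data

    # The English classifier is then trained on the combined set, e.g.:
    #   models = train_sense_classifiers(english_training_data(E, Ce, lookup))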
Monolingual vs. Bilingual Bootstrapping
● Bilingual can always perform better
● Why? The asymmetric relationship (many-to-many mapping) between the ambiguous words in the two languages
● Classes A and D are equivalent:
    ● Instances classified into D can be transformed and used to boost classification performance on A
    ● Instances misclassified into D (the 'x's) should belong to C, which is not related to A, so they have little negative effect
● Monolingual bootstrapping can only use instances in A and B; as the number of misclassified instances increases, performance stops improving
Experiment: Experimental Settings
● Resolves ambiguities for only selected ambiguous words such as "line" and "interest"
● Bilingual bootstrapping uses pre-classified data only in English (this is fine: it shares the data with the Chinese side)
● Two implementations of the monolingual classifier are considered:
    1. using a naïve Bayesian ensemble (MB-B)
    2. using decision lists (MB-D)
Experiment
● Apply all three implementations to certain words ("line", "interest") using a benchmark data set (mostly Wall Street Journal data); part serves as training data, the rest as test data
● Collect candidate translation words from the HIT dictionary
● For each sense, the authors used their intuition to pick an English word describing it (a seed word)
● These seed words are viewed as classified "sentences"
● Unclassified data: from the web (news sites); the distribution of senses was roughly balanced
Results
● A baseline method (Major) always chooses the most frequent sense
● BB consistently and significantly outperforms all other unsupervised methods
● BB performs well even against supervised methods, with the added advantage of being unsupervised and therefore less expensive
Conclusion
● Bilingual bootstrapping is pretty good!
● It has the advantages of being unsupervised, without the usual performance loss
● Future work:
    ● theoretical analysis (ex: generalization error)
    ● extension to more complicated machine translation tasks