Word Translation Disambiguation Using Bilingual Bootstrapping. Paper by Hang Li and Cong Li, Microsoft Research Asia. Presented by Sarah Hunter.


1 Word Translation Disambiguation Using Bilingual Bootstrapping. Paper by Hang Li and Cong Li, Microsoft Research Asia. Presented by Sarah Hunter

2 Introduction ● Word translation disambiguation: word sense disambiguation in the special case of translation. Example: "plant" ● Flora sense, ex: "plant and animal life". Translations: French = "flore", Chinese = "zhiwu" ● Factory sense, ex: "Nissan car and truck plant". Translations: French = "usine", Chinese = "gongchang"

3 ● Supervised learning? Expensive! ● Better: bootstrapping: start with a small amount of training data and, as new data is classified, use it as training data too ● Monolingual bootstrapping: ● Train with a few English sentences containing an ambiguous word, each labeled with the Chinese translation of that word ● Create a classifier and classify new sentences ● Further train with these ● Bilingual bootstrapping: ● Similar to monolingual, but builds a classifier for each language using classified data in BOTH languages ● How? We'll see!

4 Monolingual Bootstrapping ● One-sense-per-discourse heuristic: when an ambiguous word appears in the same text many times, it usually has the same sense. Step 1: Create the classifier (context = the words surrounding the ambiguous word in a sentence):
for each (ambiguous word) {
  for each (possible sense) {
    use the classified data to create a binary classifier
    with the classes "this sense" and "not this sense"
    // each class contains (context, sense, probability) trios
  }
}
● The classifier is a naïve Bayesian ensemble: a linear combination, over each word in the context, of the probability of this sense given that word.
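The naïve Bayes step above can be sketched in Python. This is a minimal illustration, not the paper's actual ensemble: the function names, the toy "plant" contexts, and the add-one smoothing are all assumptions made for the sketch.

```python
from collections import Counter, defaultdict
import math

def train_sense_classifier(labeled):
    """Count word/sense co-occurrences from (context_words, sense) pairs."""
    word_counts = defaultdict(Counter)  # sense -> Counter of context words
    sense_counts = Counter()            # sense -> number of examples
    vocab = set()
    for context, sense in labeled:
        sense_counts[sense] += 1
        for w in context:
            word_counts[sense][w] += 1
            vocab.add(w)
    return word_counts, sense_counts, vocab

def score_sense(model, context, sense):
    """log P(sense) + sum over context words of log P(word | sense),
    with add-one smoothing so unseen words do not zero out the score."""
    word_counts, sense_counts, vocab = model
    logp = math.log(sense_counts[sense] / sum(sense_counts.values()))
    denom = sum(word_counts[sense].values()) + len(vocab)
    for w in context:
        logp += math.log((word_counts[sense][w] + 1) / denom)
    return logp

# Toy classified data for the ambiguous word "plant" (invented examples).
labeled = [
    (["animal", "life", "species"], "flora"),
    (["leaf", "grow", "life"], "flora"),
    (["car", "truck", "worker"], "factory"),
    (["assembly", "car", "nissan"], "factory"),
]
model = train_sense_classifier(labeled)
best = max(["flora", "factory"],
           key=lambda s: score_sense(model, ["truck", "worker"], s))
# best == "factory"
```

Each binary "this sense vs. not this sense" decision reduces to comparing these per-sense scores, which is the spirit of the slide's classifier.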

5 Step 2: Classify new data (context = the words surrounding the ambiguous word in a sentence):
aWord = an ambiguous word
C = aWord's classified data
U = aWord's unclassified data
for each (aWord) {
  for each (context in U) {
    calculate the most probable word sense given this context
    if (probability is above a threshold)
      store this context and sense
  }
  C = C + (context, sense, probability) for the top b probabilities
  U = U - those contexts
}
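The bootstrapping iteration above can be sketched as follows. A minimal sketch under assumed names: `classify` stands in for the naïve Bayes scorer and is assumed to return a (sense, probability) pair, and the threshold and `b` values are illustrative, not the paper's settings.

```python
def bootstrap_step(classify, labeled, unlabeled, threshold=0.8, b=2):
    """One bootstrapping iteration: label the unclassified contexts,
    keep those whose best sense beats `threshold`, and move the top-b
    most confident ones from the unlabeled pool to the labeled data."""
    candidates = []
    for context in unlabeled:
        sense, prob = classify(context)
        if prob > threshold:
            candidates.append((prob, context, sense))
    candidates.sort(reverse=True)  # most confident first
    for prob, context, sense in candidates[:b]:
        labeled.append((context, sense, prob))
        unlabeled.remove(context)
    return labeled, unlabeled

# Toy usage (illustrative only): "car" contexts are confidently "factory".
classify = lambda ctx: ("factory", 0.9) if "car" in ctx else ("flora", 0.6)
labeled, unlabeled = bootstrap_step(
    classify, [], [["car", "plant"], ["green", "plant"]])
# labeled gains the confident context; the uncertain one stays unlabeled
```

Repeating this step grows C and shrinks U, retraining the classifier on the enlarged C each round, as the slide's loop describes.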

6 Bilingual Bootstrapping ● Similar to monolingual bootstrapping ● Adds these extensions: ● repeatedly constructs classifiers in both languages in parallel ● boosts classifier performance by exchanging information between the languages

7 Initially: some classified data

8 After classifying some new data

9 The Classifier ● Appropriate Chinese classifications are transformed (translated) into English and included in the corresponding English classifier. aWord = an ambiguous word; E = aWord's classified data for English; Ce = classified data for Chinese words that can be translations of aWord, i.e., the links 1 and 2 in the diagram ● The classifier is similar to monolingual bootstrapping, only classified data = E + Ce
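The pooling of E with translated Ce could look like this in outline. This is an assumption-laden sketch: `translate` is a hypothetical Chinese-to-English dictionary lookup, and word-by-word translation is a simplification of the paper's actual transformation of classified data.

```python
def bilingual_training_data(english_data, chinese_data, translate):
    """Pool English classified data E with Chinese classified data Ce,
    translating each Chinese context word into English first.
    Words missing from the dictionary are kept as-is."""
    pooled = list(english_data)
    for context, sense in chinese_data:
        eng_context = [translate.get(w, w) for w in context]
        pooled.append((eng_context, sense))
    return pooled

# Toy usage with an invented one-entry dictionary ("qiche" -> "car").
translate = {"qiche": "car"}
pooled = bilingual_training_data(
    [(["life"], "flora")], [(["qiche"], "factory")], translate)
# pooled == [(["life"], "flora"), (["car"], "factory")]
```

The pooled data then feeds the same per-sense classifier as in the monolingual case, which is the only change the slide describes.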

10 Monolingual vs. Bilingual Bootstrapping ● Bilingual can always perform better ● Why? The asymmetric relationship (many-to-many mapping) between the ambiguous words in the two languages ● Classes A and D are equivalent ● Bilingual can transform instances from D and use them to boost performance on classifying A ● Instances misclassified into D (the 'x's) should belong to C, which is not related to A, so they have little negative effect ● Monolingual can only use instances in A and B; as the number of misclassified instances increases, performance stops improving

11 Experiment: Experimental Settings ● Resolves ambiguities only for selected ambiguous words such as "line" and "interest" ● Bilingual uses only pre-classified data in English (this is OK; it will share these with the Chinese side) ● Consider two implementations of the monolingual classifier: 1. using a naïve Bayesian ensemble (MB-B) 2. using decision lists (MB-D)

12 Experiment ● Apply all three implementations to certain words (line, interest) using a benchmark data set (mostly Wall Street Journal data); parts served as training data, the rest as test data ● Collect words that could be translations from the HIT dictionary ● For each sense, use intuition to pick an English word that describes it (the seed word) ● View these seed words as a classified "sentence" ● Unclassified data: from the web (news sites); the distribution of senses was roughly balanced

13 Results ● Used a baseline method (Major): always choose the most frequent sense ● BB consistently and significantly outperforms all other unsupervised methods ● BB performs well even against supervised methods, with the additional advantage of being unsupervised and therefore less expensive

14 Conclusion ● Bilingual bootstrapping is pretty good! ● It has the advantages of being unsupervised, without the usual performance loss ● Future work: ● theoretical analysis (e.g., generalization error) ● extension to more complicated machine translation tasks

