1 Scaling Up Word Sense Disambiguation via Parallel Texts Yee Seng Chan Hwee Tou Ng Department of Computer Science National University of Singapore
2 Supervised WSD Word Sense Disambiguation (WSD) –Identifying the correct meaning, or sense, of a word in context Supervised learning –Successful approach –Collect corpus where each ambiguous word is annotated with the correct sense –Current systems usually rely on SEMCOR, a relatively small manually annotated corpus, affecting scalability
3 Data Acquisition Need to tackle data acquisition bottleneck Manually annotated corpora: –DSO corpus (Ng & Lee, 1996) –Open Mind Word Expert (OMWE) (Chklovski & Mihalcea, 2002) Parallel texts: –Our prior work (Ng, Wang, & Chan, 2003) exploited English-Chinese parallel texts for WSD
4 WordNet Senses of channel Sense 1: A path over which electrical signals can pass Sense 2: A passage for water Sense 3: A long narrow furrow Sense 4: A relatively narrow body of water Sense 5: A means of communication or access Sense 6: A bodily passage or tube Sense 7: A television station and its programs
5 Chinese Translations of channel Sense 1: 频道 (pin dao) Sense 2: 水道 (shui dao), 水渠 (shui qu), 排水渠 (pai shui qu) Sense 3: 沟 (gou) Sense 4: 海峡 (hai xia) Sense 5: 途径 (tu jing) Sense 6: 导管 (dao guan) Sense 7: 频道 (pin dao)
6 Parallel Texts for WSD … The institutions have already consulted the staff concerned through various channels, including discussion with the staff representatives. … 有关院校已透过不同的途径征询校内有关员工的意见,包括与有关的职员代表磋商 … 途径 (tu jing): “sense tag”
7 Approach 1. Use manually translated English-Chinese parallel texts 2. Parallel text alignment 3. Manually provide Chinese translations for WordNet senses of a word (serve as “sense-tags”) 4. Gather training examples from the English portion of parallel texts 5. Train WSD classifiers to disambiguate English words in new contexts
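Steps 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the input layout (token lists plus (English index, Chinese index) alignment pairs, as a word aligner might produce), the crude plural handling, and the names `gather_examples` and `sense_translations` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sense-tag table: Chinese translation -> WordNet sense number.
# The entries follow the "channel" slides; the table layout is an assumption.
sense_translations: Dict[str, int] = {
    "频道": 1, "水道": 2, "水渠": 2, "排水渠": 2,
    "沟": 3, "海峡": 4, "途径": 5, "导管": 6,
}

def gather_examples(
    target: str,
    english: List[str],
    chinese: List[str],
    alignment: List[Tuple[int, int]],
    translations: Dict[str, int],
) -> List[Tuple[int, List[str]]]:
    """Return (sense, English sentence) pairs for aligned occurrences of target."""
    examples = []
    for en_idx, zh_idx in alignment:
        word = english[en_idx].lower()
        # Assumption: match the singular or a plain "-s" plural only.
        if word in (target, target + "s"):
            sense = translations.get(chinese[zh_idx])
            if sense is not None:
                # The aligned Chinese word acts as the sense tag.
                examples.append((sense, english))
    return examples

# Abbreviated, pre-segmented version of the sentence pair on slide 6.
english = ["consulted", "the", "staff", "through", "various", "channels"]
chinese = ["透过", "不同", "的", "途径", "征询", "员工"]
alignment = [(0, 4), (2, 5), (3, 0), (4, 1), (5, 3)]

examples = gather_examples("channel", english, chinese, alignment, sense_translations)
print(examples)
```

Here the occurrence of "channels" is aligned to 途径, so the English sentence becomes a training example for sense 5 without any manual annotation of the sentence itself.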
8 Issues (Ng, Wang, & Chan 2003) evaluated on 22 nouns. Can this approach scale up to a large set of nouns? Previous evaluation was on lumped senses. How would it perform in a fine-grained disambiguation setting? In practice, would any difficulties arise in the gathering of training examples from parallel texts?
9 Size of Parallel Corpora
Columns: Parallel Corpora; English (Mwords/MB); Chinese (Mchars/MB)
Hong Kong Hansards: 39.9 / /
Hong Kong News: 16.8 / / 67.6
Hong Kong Laws: 9.9 / / 37.5
Sinorama: 3.8 / / 13.5
Xinhua News: 2.1 / / 8.9
English Translation of Chinese Treebank: 0.1 / / 0.4
Sub-total: 72.6 / /
Total: 138 / 681.1
10 Parallel Text Alignment Sentence alignment: –Corpora available in sentence-aligned form Pre-processing: –English: tokenization –Chinese: word segmentation Word alignment: –GIZA++ (Och & Ney, 2000)
11 Selection of Translations WordNet 1.7 as sense inventory Chinese translations from 2 sources: –Oxford Advanced Learner’s English-Chinese dictionary –Kingsoft Powerword 2003 (Chinese translation of the American Heritage dictionary) Providing Chinese translations for all the WordNet senses of a word takes 15 minutes on average. If the same Chinese translation is assigned to several senses, only the lowest-numbered sense will have a valid translation. [Figure: Oxford definition entries, Kingsoft Powerword definition entries, and WordNet sense entries for channel]
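The lowest-numbered-sense rule can be illustrated with the translations of channel listed earlier. The sketch below is an assumption about how such a rule might be coded; the function name `valid_translations` and the dictionary layout are hypothetical.

```python
from typing import Dict, List

def valid_translations(sense_to_translations: Dict[int, List[str]]) -> Dict[str, int]:
    """Map each Chinese translation to the lowest-numbered sense that lists it."""
    owner: Dict[str, int] = {}
    for sense in sorted(sense_to_translations):
        for zh in sense_to_translations[sense]:
            # setdefault keeps the first (lowest-numbered) sense seen.
            owner.setdefault(zh, sense)
    return owner

# Translations of "channel" as given on the earlier slide.
channel = {
    1: ["频道"],
    2: ["水道", "水渠", "排水渠"],
    3: ["沟"],
    4: ["海峡"],
    5: ["途径"],
    6: ["导管"],
    7: ["频道"],
}

owner = valid_translations(channel)
senses_with_translation = sorted(set(owner.values()))
print(owner["频道"], senses_with_translation)
```

Because senses 1 and 7 share 频道, only sense 1 keeps a valid translation, so no parallel text examples can be gathered for sense 7 through this translation.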
12 Scope of Experiments Aim: scale up to a large set of nouns Frequently occurring nouns are highly ambiguous. Maximize benefits: –Select 800 most frequent noun types in the Brown corpus (BC) –Represents 60% of noun tokens in BC
13 WSD Used the WSD program of (Lee & Ng, 2002) Knowledge sources: parts-of-speech, surrounding words, local collocations Learning algorithm: Naïve Bayes Achieves state-of-the-art WSD accuracy
14 Evaluation Set Suitable evaluation data set: set of nouns in the SENSEVAL-2 English all-words task
15 Summary Figures
Columns: Noun set; No. of noun types; No. of noun tokens; WNs1 accuracy (%); Avg. no. of senses
Rows: All nouns; MFSet; All − MFSet
16 Evaluation on MFSet Gather parallel text examples for nouns in MFSet For comparison, what is the accuracy of training on manually annotated examples? –SEMCOR (SC) –SEMCOR + OMWE (SC+OM)
17 Evaluation Results (in %)
Columns: System; Evaluation set MFSet
S1 (best SE2 system): 72.9
S2: 65.4
S3: 64.4
WNs1 (WordNet sense 1): 61.1
SC (SEMCOR): 67.8
SC+OM (SEMCOR + OMWE): 68.4
P1 (parallel text): 69.6
18 Evaluation on All Nouns Want an indication of P1 performance on all nouns Expanded evaluation set to all nouns in SENSEVAL-2 English all-words task Used WNs1 strategy for nouns where parallel text examples are not available
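The back-off to WNs1 amounts to a simple dispatch. A minimal sketch, assuming classifiers are stored per noun as callables; `predict_with_backoff` and the `classifiers` mapping are hypothetical names, not part of the original system.

```python
from typing import Callable, Dict, List

def predict_with_backoff(
    noun: str,
    context: List[str],
    classifiers: Dict[str, Callable[[List[str]], int]],
) -> int:
    """Use the parallel-text classifier if one exists, else WordNet sense 1."""
    classifier = classifiers.get(noun)
    if classifier is None:
        return 1  # WNs1 strategy: always predict the first WordNet sense
    return classifier(context)

# Stand-in classifier for illustration only; it always answers sense 5.
classifiers = {"channel": lambda context: 5}

print(predict_with_backoff("channel", ["through", "various"], classifiers))
print(predict_with_backoff("zebra", ["striped"], classifiers))
```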
19 Evaluation Results (in %)
Columns: System; Evaluation set MFSet; Evaluation set All nouns
Rows: S1 (best SE2 system); S2; S3; WNs1 (WordNet sense 1); SC (SEMCOR); SC+OM (SEMCOR + OMWE); P1 (parallel text)
20 Lack of Matches Lack of matching English occurrences for some Chinese translations: –Sense 7 of noun report: »“the general estimation that the public has for a person” »assigned translation “名声” (ming sheng) –In parallel corpus, no occurrences of report aligned to “名声” (ming sheng) –No examples gathered for sense 7 of report –Affects recall
21 Examples from other Nouns Can gather examples for sense 7 of report from other English nouns having the same corresponding Chinese translations: 名声 (ming sheng) Sense 7 of report: “the general estimation that the public has for a person” Sense 3 of name: “a person’s reputation”
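Noun substitution can be sketched as a lookup keyed on the Chinese translation. The code below is an illustration under assumptions: whether substituted examples are added only when the target noun has none (as here) or in all cases is not stated on the slide, the sentences are placeholders, the use of 报告 as a translation of report is assumed for the example, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Example = Tuple[str, str]  # (English noun, English sentence)

def index_by_translation(aligned: Iterable[Tuple[str, str, str]]) -> Dict[str, List[Example]]:
    """Group (noun, Chinese word, sentence) triples by the aligned Chinese word."""
    index: Dict[str, List[Example]] = defaultdict(list)
    for noun, zh, sentence in aligned:
        index[zh].append((noun, sentence))
    return index

def examples_for_sense(target: str, translation: str, index: Dict[str, List[Example]]) -> List[Example]:
    """Prefer the target noun's own examples; otherwise borrow from other nouns."""
    candidates = index.get(translation, [])
    own = [(n, s) for n, s in candidates if n == target]
    if own:
        return own
    # Assumption: substitute only when the target has no aligned occurrence.
    return [(n, s) for n, s in candidates if n != target]

# Placeholder data: sentence strings stand in for real corpus sentences.
aligned = [
    ("name", "名声", "<sentence containing 'name'>"),
    ("report", "报告", "<sentence containing 'report'>"),
]
index = index_by_translation(aligned)

borrowed = examples_for_sense("report", "名声", index)
own = examples_for_sense("report", "报告", index)
print(borrowed, own)
```

With this data, sense 7 of report receives the example gathered from name, since both senses share the translation 名声.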
22 Evaluation Results (in %)
Columns: System; Evaluation set MFSet; Evaluation set All nouns
Rows: S1 (best SE2 system); S2; S3; WNs1 (WordNet sense 1); SC (SEMCOR); SC+OM (SEMCOR + OMWE); P1 (parallel text); P2 (P1 + noun substitution)
23 JCN Measure The semantic distance measure of Jiang & Conrath (1997), JCN, provides a reliable estimate of the distance between two WordNet synsets, Dist(s1,s2). It is built from three quantities: –Information content (IC) of a concept c –Link strength LS(c,p) of the edge between c and its parent p –Distance between two synsets
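For reference, the three quantities named on this slide are usually written as below. These are the standard Jiang & Conrath definitions as commonly stated, given here as an assumption about what the slide displayed; the symbols $P(c)$ (probability of encountering an instance of concept $c$ in a corpus), $\mathrm{parent}(c)$, $\mathrm{path}(s_1,s_2)$ and $\mathrm{LSuper}(s_1,s_2)$ (the lowest super-ordinate of the two synsets) are notation introduced for this block.

```latex
\begin{align*}
IC(c) &= -\log P(c) \\
LS(c,p) &= -\log P(c \mid p) = IC(c) - IC(p) \\
Dist(s_1,s_2) &= \sum_{c \,\in\, \mathrm{path}(s_1,s_2) \setminus \mathrm{LSuper}(s_1,s_2)} LS\bigl(c, \mathrm{parent}(c)\bigr) \\
&= IC(s_1) + IC(s_2) - 2\, IC\bigl(\mathrm{LSuper}(s_1,s_2)\bigr)
\end{align*}
```

The last line follows because the link strengths telescope along each branch from a synset up to the lowest super-ordinate.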
24 Similarity Measure We used the WordNet Similarity package (Pedersen, Patwardhan & Michelizzi, 2004): –provides a similarity score between WordNet synsets based on the jcn measure: jcn(s1,s2) = 1/Dist(s1,s2) –In the earlier example, we obtain the similarity score jcn(s1,s2), where: »s1 = sense 7 of report »s2 = sense 3 of name
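The relation jcn(s1,s2) = 1/Dist(s1,s2) can be checked on a toy taxonomy. The sketch below does not call the WordNet Similarity package; the four-node taxonomy, its probabilities, the base-2 logarithm, and the choice to return infinity for identical synsets are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional

# Toy taxonomy (child -> parent) with made-up concept probabilities.
parent: Dict[str, Optional[str]] = {"root": None, "a": "root", "b": "a", "c": "a"}
prob: Dict[str, float] = {"root": 1.0, "a": 0.5, "b": 0.125, "c": 0.25}

def ic(concept: str) -> float:
    """Information content: negative log probability of the concept."""
    return -math.log2(prob[concept])

def ancestors(concept: Optional[str]) -> List[str]:
    """The concept followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = parent[concept]
    return chain

def lowest_common_subsumer(s1: str, s2: str) -> str:
    seen = set(ancestors(s1))
    for concept in ancestors(s2):
        if concept in seen:
            return concept
    raise ValueError("concepts share no ancestor")

def link_strength(concept: str) -> float:
    """LS(c, p) = IC(c) - IC(p) for the edge from c to its parent p."""
    return ic(concept) - ic(parent[concept])

def dist(s1: str, s2: str) -> float:
    """Sum link strengths on the path between s1 and s2 via their subsumer."""
    lcs = lowest_common_subsumer(s1, s2)
    total = 0.0
    for start in (s1, s2):
        concept = start
        while concept != lcs:
            total += link_strength(concept)
            concept = parent[concept]
    return total

def jcn(s1: str, s2: str) -> float:
    d = dist(s1, s2)
    return math.inf if d == 0 else 1.0 / d

print(dist("b", "c"), jcn("b", "c"))
```

In this toy setting the summed link strengths equal IC(b) + IC(c) − 2·IC(a), which is the closed form commonly used for the Jiang & Conrath distance.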
25 Incorporating JCN Measure In performing WSD with a naïve Bayes classifier, the sense s assigned to an example with features f_1, …, f_n is chosen so as to maximize $p(s)\prod_{j=1}^{n} p(f_j \mid s)$, where the probabilities are estimated from Count(s) and Count(f_j,s). A training example gathered from another English noun based on a common Chinese translation contributes a fractional count to Count(s) and Count(f_j,s), based on jcn(s1,s2).
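Fractional counts fit naturally into a count-based naïve Bayes classifier. The following is a minimal sketch, not the WSD program used in the talk: the class name, the add-alpha smoothing, and the example weight of 0.4 are assumptions, and the slide does not say how jcn(s1,s2) is converted into the fractional count.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

class WeightedNaiveBayes:
    """Naive Bayes over Count(s) and Count(f_j, s) with per-example weights."""

    def __init__(self, smoothing: float = 1.0) -> None:
        self.smoothing = smoothing  # assumption: add-alpha smoothing
        self.sense_count: Dict[int, float] = defaultdict(float)
        self.feature_count: Dict[Tuple[str, int], float] = defaultdict(float)

    def add_example(self, sense: int, features: Iterable[str], weight: float = 1.0) -> None:
        # A substituted-noun example passes weight < 1 (a fractional count).
        self.sense_count[sense] += weight
        for feature in features:
            self.feature_count[(feature, sense)] += weight

    def score(self, sense: int, features: Iterable[str]) -> float:
        """log p(s) + sum_j log p(f_j | s), estimated from the weighted counts."""
        total = sum(self.sense_count.values())
        count_s = self.sense_count[sense]
        log_p = math.log(count_s / total)
        for feature in features:
            count_fs = self.feature_count.get((feature, sense), 0.0)
            log_p += math.log((count_fs + self.smoothing) / (count_s + 2 * self.smoothing))
        return log_p

    def classify(self, features: Iterable[str]) -> int:
        features = list(features)
        return max(self.sense_count, key=lambda s: self.score(s, features))

nb = WeightedNaiveBayes()
nb.add_example(5, ["through", "various"])
nb.add_example(5, ["through", "official"])
nb.add_example(7, ["television", "watch"])
# Hypothetical substituted-noun example with a fractional weight.
nb.add_example(7, ["television", "news"], weight=0.4)

print(nb.classify(["television"]), nb.classify(["through"]))
```

Down-weighting the borrowed example lets it inform the classifier while counting for less than an example gathered from the target noun itself.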
26 Evaluation Results (in %)
Columns: System; Evaluation set MFSet; Evaluation set All nouns
Rows: S1 (best SE2 system); S2; S3; WNs1 (WordNet sense 1); SC (SEMCOR); SC+OM (SEMCOR + OMWE); P1 (parallel texts); P2 (P1 + noun substitution); P2jcn (P2 + jcn)
27 Paired t-test for MFSet
Columns: System; S1; P1; P2; P2jcn; SC; SC+OM; WNs1 (each row starts at its own diagonal cell, marked *)
S1: *~~~>>>
P1: *~<<~~>>
P2: *<>~>>
P2jcn: *>>>
SC: *~>>
SC+OM: *>>
WNs1: *
“>>”, “<<”: p-value ≤ 0.01; “>”, “<”: p-value in (0.01, 0.05]; “~”: p-value > 0.05
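28 Paired t-test for All Nouns
Columns: System; S1; P1; P2; P2jcn; SC; SC+OM; WNs1 (each row starts at its own diagonal cell, marked *)
S1: * > ~ ~ ~ ~ >>
P1: * ~ < ~ ~ >>
P2: * ~ ~ ~ >>
P2jcn: * ~ ~ >>
SC: * ~ >>
SC+OM: * >>
WNs1: *
“>>”, “<<”: p-value ≤ 0.01; “>”, “<”: p-value in (0.01, 0.05]; “~”: p-value > 0.05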
29 Conclusion Tackling the data acquisition bottleneck is crucial Gathering examples for WSD from parallel texts is scalable to a large set of nouns Training on parallel text examples can outperform training on manually annotated data, and achieves performance comparable to the best system of the SENSEVAL-2 English all-words task