1 Scaling Up Word Sense Disambiguation via Parallel Texts
Yee Seng Chan, Hwee Tou Ng
Department of Computer Science, National University of Singapore

2 Supervised WSD
Word Sense Disambiguation (WSD)
– Identifying the correct meaning, or sense, of a word in context
Supervised learning
– A successful approach
– Collect a corpus in which each occurrence of an ambiguous word is annotated with its correct sense
– Current systems usually rely on SEMCOR, a relatively small manually annotated corpus, affecting scalability

3 Data Acquisition
Need to tackle the data acquisition bottleneck
Manually annotated corpora:
– DSO corpus (Ng & Lee, 1996)
– Open Mind Word Expert (OMWE) (Chklovski & Mihalcea, 2002)
Parallel texts:
– Our prior work (Ng, Wang, & Chan, 2003) exploited English-Chinese parallel texts for WSD

4 WordNet Senses of channel
Sense 1: A path over which electrical signals can pass
Sense 2: A passage for water
Sense 3: A long narrow furrow
Sense 4: A relatively narrow body of water
Sense 5: A means of communication or access
Sense 6: A bodily passage or tube
Sense 7: A television station and its programs

5 Chinese Translations of channel
Sense 1: 频道 (pin dao)
Sense 2: 水道 (shui dao), 水渠 (shui qu), 排水渠 (pai shui qu)
Sense 3: 沟 (gou)
Sense 4: 海峡 (hai xia)
Sense 5: 途径 (tu jing)
Sense 6: 导管 (dao guan)
Sense 7: 频道 (pin dao)

6 Parallel Texts for WSD
English: … The institutions have already consulted the staff concerned through various channels, including discussion with the staff representatives. …
Chinese: … 有关院校已透过不同的途径征询校内有关员工的意见，包括与有关的职员代表磋商 …
途径 (tu jing) serves as the "sense tag" for this occurrence of channel

7 Approach
1. Use manually translated English-Chinese parallel texts
2. Perform parallel text alignment
3. Manually provide Chinese translations for the WordNet senses of a word (these serve as "sense tags")
4. Gather training examples from the English portion of the parallel texts
5. Train WSD classifiers to disambiguate English words in new contexts
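The example-gathering step (steps 3 and 4) can be pictured with a short sketch. This is a minimal illustration under assumed data structures, not the authors' actual pipeline: the sense-to-translation table, the alignment representation, and the function names are all assumptions.

```python
# Minimal sketch of gathering sense-tagged English examples from a
# word-aligned parallel corpus. All data structures and names are
# illustrative; they are not the authors' code.

# Hand-assigned Chinese translations for the WordNet senses of "channel"
# (from the earlier slide). Inverting the table maps a translation back
# to its candidate senses.
SENSE_TO_TRANS = {
    1: ["频道"], 2: ["水道", "水渠", "排水渠"], 3: ["沟"],
    4: ["海峡"], 5: ["途径"], 6: ["导管"], 7: ["频道"],
}

def invert(sense_to_trans):
    """Map each Chinese translation to the senses it was assigned to."""
    trans_to_senses = {}
    for sense, translations in sense_to_trans.items():
        for t in translations:
            trans_to_senses.setdefault(t, []).append(sense)
    return trans_to_senses

def gather_examples(aligned_pairs, target="channel"):
    """Collect (English sentence, sense) training examples for the target noun.

    aligned_pairs: iterable of (english_tokens, chinese_words, alignment),
    where alignment maps an English token index to a Chinese word index.
    """
    trans_to_senses = invert(SENSE_TO_TRANS)
    examples = []
    for en_tokens, zh_words, alignment in aligned_pairs:
        for i, tok in enumerate(en_tokens):
            if tok.lower() != target or i not in alignment:
                continue
            zh = zh_words[alignment[i]]
            senses = trans_to_senses.get(zh)
            # If the translation maps to several senses, take the
            # least-numbered one (the convention described on slide 11).
            if senses:
                examples.append((en_tokens, min(senses)))
    return examples
```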

8 Issues
(Ng, Wang, & Chan, 2003) evaluated on 22 nouns. Can this approach scale up to a large set of nouns?
The previous evaluation used lumped senses. How would the approach perform in a fine-grained disambiguation setting?
In practice, would any difficulties arise in gathering training examples from parallel texts?

9 Size of Parallel Corpora
Corpus: English (Mwords / MB); Chinese (Mchars / MB)
Hong Kong Hansards: 39.9 / … ; … / …
Hong Kong News: 16.8 / … ; … / 67.6
Hong Kong Laws: 9.9 / … ; … / 37.5
Sinorama: 3.8 / … ; … / 13.5
Xinhua News: 2.1 / … ; … / 8.9
English Translation of Chinese Treebank: 0.1 / … ; … / 0.4
Sub-total: 72.6 / … ; … / …
Total: 138 / 681.1

10 Parallel Text Alignment
Sentence alignment:
– Corpora are available in sentence-aligned form
Pre-processing:
– English: tokenization
– Chinese: word segmentation
Word alignment:
– GIZA++ (Och & Ney, 2000)
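As a concrete picture of what the word-alignment output gives us, here is a hedged sketch that reads alignments in the common Pharaoh-style "i-j" format, which GIZA++ output is often converted to. The format choice, the example sentence pair, and the function name are assumptions, not the authors' tooling.

```python
# Illustrative only: read word alignments in Pharaoh-style "i-j" pairs
# (English index - Chinese index). This format is an assumption; the
# slides only state that GIZA++ was used.

def read_alignment(line):
    """Parse '0-0 1-2 3-1' into a dict {english_index: chinese_index}."""
    alignment = {}
    for pair in line.split():
        en_i, zh_j = pair.split("-")
        alignment[int(en_i)] = int(zh_j)
    return alignment

en_tokens = "the institutions have consulted the staff through various channels".split()
zh_words = ["有关", "院校", "已", "透过", "不同", "的", "途径", "征询", "员工"]
alignment = read_alignment("0-0 1-1 2-2 3-7 5-8 7-4 8-6")

# The English noun "channels" (index 8) is aligned to 途径 (index 6),
# so 途径 acts as its sense tag.
print(en_tokens[8], "->", zh_words[alignment[8]])
```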

11 Selection of Translations
WordNet 1.7 as the sense inventory
Chinese translations taken from 2 sources:
– Oxford Advanced Learner's English-Chinese Dictionary
– Kingsoft Powerword 2003 (Chinese translation of the American Heritage Dictionary)
Providing Chinese translations for all the WordNet senses of a word takes 15 minutes on average
If the same Chinese translation is assigned to several senses, only the least-numbered sense keeps it as a valid translation
[Slide shows screenshots: Oxford definition entries for channel; Kingsoft Powerword definition entries for channel; WordNet sense entries for channel]
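The duplicate-translation rule can be written down in a few lines. A minimal sketch, assuming the same sense-to-translation dictionaries as above; the function name is illustrative.

```python
# Sketch of the duplicate-translation rule: if the same Chinese
# translation is assigned to several senses, only the least-numbered
# sense keeps it. Names are illustrative, not the authors' code.

def resolve_duplicates(sense_to_trans):
    """Return a copy in which each translation is kept only for the
    lowest-numbered sense it was assigned to."""
    seen = set()
    resolved = {}
    for sense in sorted(sense_to_trans):          # ascending sense number
        kept = [t for t in sense_to_trans[sense] if t not in seen]
        seen.update(kept)
        resolved[sense] = kept
    return resolved

# For "channel", 频道 is assigned to senses 1 and 7, so sense 7 is left
# with no valid translation after resolution.
print(resolve_duplicates({1: ["频道"], 5: ["途径"], 7: ["频道"]}))
# -> {1: ['频道'], 5: ['途径'], 7: []}
```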

12 Scope of Experiments
Aim: scale up to a large set of nouns
Frequently occurring nouns are highly ambiguous
To maximize benefits:
– Select the 800 most frequent noun types in the Brown corpus (BC)
– These represent 60% of the noun tokens in BC
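A quick way to reproduce this kind of frequency cut is sketched below, using NLTK's tagged Brown corpus as a stand-in for whatever counting the authors actually did; the 60% figure comes from the slide, not from this snippet.

```python
# Sketch: select the most frequent noun types in the Brown corpus via
# NLTK (requires the 'brown' and 'universal_tagset' data packages).
from collections import Counter
from nltk.corpus import brown

noun_counts = Counter(
    word.lower()
    for word, tag in brown.tagged_words(tagset="universal")
    if tag == "NOUN"
)

top_800 = [w for w, _ in noun_counts.most_common(800)]
coverage = sum(noun_counts[w] for w in top_800) / sum(noun_counts.values())
print(f"Top 800 noun types cover {coverage:.0%} of Brown noun tokens")
```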

13 WSD
Used the WSD program of Lee & Ng (2002)
Knowledge sources: parts-of-speech, surrounding words, local collocations
Learning algorithm: naïve Bayes
Achieves state-of-the-art WSD accuracy
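To make the three knowledge sources concrete, here is a minimal feature-extraction sketch. The window sizes, feature names, and collocation patterns are assumptions for illustration, not Lee & Ng's (2002) exact configuration.

```python
# Minimal sketch of the three knowledge sources: POS tags, surrounding
# words, and local collocations around a target word.

def extract_features(tokens, pos_tags, i, window=3):
    """Feature dict for the target word at position i in a tokenized,
    POS-tagged sentence."""
    feats = {}
    # Parts-of-speech of the target word and its immediate neighbours
    for off in (-1, 0, 1):
        j = i + off
        if 0 <= j < len(tokens):
            feats[f"pos_{off}"] = pos_tags[j]
    # Surrounding words: bag of words in a window, position-independent
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            feats[f"surround_{tokens[j].lower()}"] = 1
    # Local collocations: short ordered patterns around the target
    if i >= 1:
        feats["colloc_-1"] = tokens[i - 1].lower()
    if i + 1 < len(tokens):
        feats["colloc_+1"] = tokens[i + 1].lower()
    if i >= 1 and i + 1 < len(tokens):
        feats["colloc_-1_+1"] = f"{tokens[i-1].lower()}_{tokens[i+1].lower()}"
    return feats
```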

14 Evaluation Set
Suitable evaluation data set: the set of nouns in the SENSEVAL-2 English all-words task

15 Summary Figures
Table columns: Noun set | No. of noun types | No. of noun tokens | WNs1 accuracy (%) | Avg. no. of senses
Rows: All nouns; MFSet (the evaluation nouns that are among the 800 most frequent noun types); All − MFSet

16 Evaluation on MFSet
Gather parallel text examples for the nouns in MFSet
For comparison, what is the accuracy of training on manually annotated examples?
– SEMCOR (SC)
– SEMCOR + OMWE (SC+OM)

17 Evaluation Results (in %)
Accuracy on the MFSet evaluation set:
S1 (best SE2 system): 72.9
S2: 65.4
S3: 64.4
WNs1 (WordNet sense 1): 61.1
SC (SEMCOR): 67.8
SC+OM (SEMCOR + OMWE): 68.4
P1 (parallel text): 69.6

18 Evaluation on All Nouns
Want an indication of P1 performance on all nouns
Expanded the evaluation set to all nouns in the SENSEVAL-2 English all-words task
Used the WNs1 strategy for nouns for which parallel text examples are not available

19 Evaluation Results (in %)
Table columns: System | MFSet | All nouns
Systems: S1 (best SE2 system), S2, S3, WNs1 (WordNet sense 1), SC (SEMCOR), SC+OM (SEMCOR + OMWE), P1 (parallel text)

20 Lack of Matches
Lack of matching English occurrences for some Chinese translations:
– Sense 7 of the noun report:
» "the general estimation that the public has for a person"
» assigned translation 名声 (ming sheng)
– In the parallel corpus, there are no occurrences of report aligned to 名声 (ming sheng)
– So no examples are gathered for sense 7 of report
– This affects recall

21 Examples from other Nouns
Can gather examples for sense 7 of report from other English nouns having the same corresponding Chinese translation, 名声 (ming sheng):
– Sense 7 of report: "the general estimation that the public has for a person"
– Sense 3 of name: "a person's reputation"
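A hedged sketch of this substitution step: given a sense that received no examples, look up other (noun, sense) pairs whose assigned Chinese translations overlap. The translation tables here (including the "fame" entry) and the function name are made up for illustration.

```python
# Sketch of the noun-substitution idea: for a sense with no gathered
# examples, find senses of other nouns that share the same assigned
# Chinese translation, and borrow their examples. Data are illustrative.

# Translation tables for a few nouns: noun -> {sense number: [translations]}
TABLES = {
    "report": {7: ["名声"]},
    "name":   {3: ["名声"]},
    "fame":   {1: ["名声"]},   # hypothetical extra entry
}

def substitution_candidates(target_noun, target_sense, tables):
    """Return (noun, sense) pairs, other than the target, whose assigned
    Chinese translations overlap with the target sense's translations."""
    wanted = set(tables[target_noun][target_sense])
    candidates = []
    for noun, senses in tables.items():
        for sense, translations in senses.items():
            if (noun, sense) != (target_noun, target_sense) and wanted & set(translations):
                candidates.append((noun, sense))
    return candidates

print(substitution_candidates("report", 7, TABLES))
# -> [('name', 3), ('fame', 1)]
```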

22 Evaluation Results (in %)
Table columns: System | MFSet | All nouns
Systems: S1 (best SE2 system), S2, S3, WNs1 (WordNet sense 1), SC (SEMCOR), SC+OM (SEMCOR + OMWE), P1 (parallel text), P2 (P1 + noun substitution)

23 JCN Measure
The semantic distance measure Dist(s1, s2) of Jiang & Conrath (1997) provides a reliable estimate of the distance between two WordNet synsets:
– Information content (IC) of concept c: IC(c) = −log P(c)
– Link strength LS(c, p) of the edge between a child concept c and its parent p: LS(c, p) = IC(c) − IC(p)
– Distance between two synsets: Dist(s1, s2) = IC(s1) + IC(s2) − 2 × IC(lcs(s1, s2)), where lcs(s1, s2) is the lowest common subsumer of s1 and s2 in WordNet

24 Similarity Measure
We used the WordNet::Similarity package (Pedersen, Patwardhan & Michelizzi, 2004):
– It provides a similarity score between WordNet synsets based on the jcn measure: jcn(s1, s2) = 1 / Dist(s1, s2)
– For the earlier example, we obtain the similarity score jcn(s1, s2), where:
» s1 = sense 7 of report
» s2 = sense 3 of name
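A rough modern analogue of this lookup is sketched below using NLTK's WordNet interface instead of the Perl WordNet::Similarity package used in the paper. Since NLTK ships WordNet 3.0 rather than WordNet 1.7, sense numbers and scores will not match the slides; the gloss-keyword selection helper is an assumption made for illustration.

```python
# Rough analogue of the jcn similarity lookup with NLTK (WordNet 3.0);
# the paper used the Perl WordNet::Similarity package with WordNet 1.7.
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")   # information content counts from Brown

def synset_by_gloss(word, keyword):
    """Pick the first noun synset of `word` whose gloss mentions `keyword`
    (a rough stand-in for selecting a specific WordNet 1.7 sense)."""
    for s in wn.synsets(word, pos=wn.NOUN):
        if keyword in s.definition():
            return s
    return None

# Stand-ins for sense 7 of report and sense 3 of name
s1 = synset_by_gloss("report", "estimation")
s2 = synset_by_gloss("name", "reputation")
if s1 and s2:
    # jcn_similarity returns 1 / Dist(s1, s2)
    print(s1.name(), s2.name(), s1.jcn_similarity(s2, brown_ic))
```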

25 Incorporating JCN Measure
In performing WSD with a naïve Bayes classifier, the sense s assigned to an example with features f1, …, fn is chosen so as to maximize P(s) × Π_{j=1..n} P(fj | s), where P(s) and P(fj | s) are estimated from the counts Count(s) and Count(fj, s).
A training example gathered from another English noun based on a common Chinese translation contributes a fractional count to Count(s) and Count(fj, s), based on jcn(s1, s2).
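A minimal sketch of naïve Bayes training with such fractional counts is given below. The add-alpha smoothing and the exact weighting scheme are assumptions, not the paper's precise formulation.

```python
# Sketch: naive Bayes with fractional counts. An example borrowed from
# another noun is weighted by its jcn similarity score instead of 1.
import math
from collections import defaultdict

class FractionalNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # add-alpha smoothing (assumed)
        self.sense_count = defaultdict(float)   # Count(s)
        self.feat_count = defaultdict(float)    # Count(f, s)
        self.features = set()

    def add_example(self, features, sense, weight=1.0):
        """Native examples use weight 1.0; borrowed examples use the
        jcn(s1, s2) score as a fractional weight."""
        self.sense_count[sense] += weight
        for f in features:
            self.feat_count[(f, sense)] += weight
            self.features.add(f)

    def classify(self, features):
        total = sum(self.sense_count.values())
        best, best_score = None, float("-inf")
        for s, c in self.sense_count.items():
            score = math.log(c / total)                      # log P(s)
            denom = c + self.alpha * len(self.features)
            for f in features:
                # log P(f | s) with smoothed fractional counts
                score += math.log((self.feat_count[(f, s)] + self.alpha) / denom)
            if score > best_score:
                best, best_score = s, score
        return best
```

With this class, a native parallel-text example of the target noun would be added via add_example(feats, sense, 1.0), while an example borrowed from another noun (e.g., via the shared translation 名声) would be added with weight jcn(s1, s2).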

26 Evaluation Results (in %)
Table columns: System | MFSet | All nouns
Systems: S1 (best SE2 system), S2, S3, WNs1 (WordNet sense 1), SC (SEMCOR), SC+OM (SEMCOR + OMWE), P1 (parallel texts), P2 (P1 + noun substitution), P2jcn (P2 + jcn)

27 Paired t-test for MFSet
Significance of row system vs. column system; columns in order S1, P1, P2, P2jcn, SC, SC+OM, WNs1 (upper triangle only):
S1:     *   ~   ~   ~    >   >    >
P1:         *   ~   <<   ~   ~    >>
P2:             *   <    >   ~    >>
P2jcn:              *    >   >    >
SC:                      *   ~    >>
SC+OM:                       *    >>
WNs1:                             *
Legend: ">>", "<<": p-value ≤ 0.01; ">", "<": p-value in (0.01, 0.05]; "~": p-value > 0.05

28 Paired t-test for All Nouns
Significance of row system vs. column system; columns in order S1, P1, P2, P2jcn, SC, SC+OM, WNs1 (upper triangle only):
S1:     *   >   ~   ~    ~   ~    >>
P1:         *   ~   <    ~   ~    >>
P2:             *   ~    ~   ~    >>
P2jcn:              *    ~   ~    >>
SC:                      *   ~    >>
SC+OM:                       *    >>
WNs1:                             *
Legend: ">>", "<<": p-value ≤ 0.01; ">", "<": p-value in (0.01, 0.05]; "~": p-value > 0.05

29 Conclusion
Tackling the data acquisition bottleneck is crucial
Gathering WSD examples from parallel texts scales to a large set of nouns
Training on parallel text examples can outperform training on manually annotated data, and achieves performance comparable to the best system in the SENSEVAL-2 English all-words task