Gloss-based Semantic Similarity Metrics for Predominant Sense Acquisition. Ryu Iida (Nara Institute of Science and Technology), Diana McCarthy and Rob Koeling (University of Sussex).

Similar presentations
Multi-Document Person Name Resolution Michael Ben Fleischman (MIT), Eduard Hovy (USC) From Proceedings of ACL-42 Reference Resolution workshop 2004.

Specialized models and ranking for coreference resolution Pascal Denis ALPAGE Project Team INRIA Rocquencourt F Le Chesnay, France Jason Baldridge.
A new Machine Learning algorithm for Neoposy: coining new Parts of Speech Eric Atwell Computer Vision and Language group School of Computing University.
A Robust Approach to Aligning Heterogeneous Lexical Resources Mohammad Taher Pilehvar Roberto Navigli MultiJEDI ERC
How dominant is the commonest sense of a word? Adam Kilgarriff Lexicography MasterClass Univ of Brighton.
Semantic News Recommendation Using WordNet and Bing Similarities 28th Symposium On Applied Computing 2013 (SAC 2013) March 21, 2013 Michel Capelle
1 Extended Gloss Overlaps as a Measure of Semantic Relatedness Satanjeev Banerjee Ted Pedersen Carnegie Mellon University University of Minnesota Duluth.
A Linguistic Approach for Semantic Web Service Discovery International Symposium on Management Intelligent Systems 2012 (IS-MiS 2012) July 13, 2012 Jordy.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
CS Word Sense Disambiguation. 2 Overview A problem for semantic attachment approaches: what happens when a given lexeme has multiple ‘meanings’?
CS 4705 Lecture 19 Word Sense Disambiguation. Overview Selectional restriction based approaches Robust techniques –Machine Learning Supervised Unsupervised.
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
Gimme’ The Context: Context- driven Automatic Semantic Annotation with CPANKOW Philipp Cimiano et al.
Designing clustering methods for ontology building: The Mo’K workbench Authors: Gilles Bisson, Claire Nédellec and Dolores Cañamero Presenter: Ovidiu Fortu.
Semantic Video Classification Based on Subtitles and Domain Terminologies Polyxeni Katsiouli, Vassileios Tsetsos, Stathes Hadjiefthymiades P ervasive C.
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval Paul Clough 1 and Mark Stevenson 2 Department.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
A Fully Unsupervised Word Sense Disambiguation Method Using Dependency Knowledge Ping Chen University of Houston-Downtown Wei Ding University of Massachusetts-Boston.
“How much context do you need?” An experiment about context size in Interactive Cross-language Question Answering B. Navarro, L. Moreno-Monteagudo, E.
Jiuling Zhang  Why perform query expansion?  WordNet based Word Sense Disambiguation WordNet Word Sense Disambiguation  Conceptual Query.
Distributional Part-of-Speech Tagging Hinrich Schütze CSLI, Ventura Hall Stanford, CA , USA NLP Applications.
Word Sense Disambiguation UIUC - 06/10/2004 Word Sense Disambiguation Another NLP working problem for learning with constraints… Lluís Màrquez TALP, LSI,
On the Issue of Combining Anaphoricity Determination and Antecedent Identification in Anaphora Resolution Ryu Iida, Kentaro Inui, Yuji Matsumoto Nara Institute.
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng Computer Science Department, Stanford University, Stanford, CA 94305, USA Improving Word.
Annotating Words using WordNet Semantic Glosses Julian Szymański Department of Computer Systems Architecture, Faculty of Electronics, Telecommunications.
1 Statistical NLP: Lecture 9 Word Sense Disambiguation.
Paper Review by Utsav Sinha August, 2015 Part of assignment in CS 671: Natural Language Processing, IIT Kanpur.
WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G.
Word Sense Disambiguation By Mahmood Soltani Tehran University 2009/12/24.
Improving Subcategorization Acquisition using Word Sense Disambiguation Anna Korhonen and Judith Preiss University of Cambridge, Computer Laboratory 15.
SYMPOSIUM ON SEMANTICS IN SYSTEMS FOR TEXT PROCESSING September 22-24, Venice, Italy Combining Knowledge-based Methods and Supervised Learning for.
An Effective Word Sense Disambiguation Model Using Automatic Sense Tagging Based on Dictionary Information Yong-Gu Lee
2014 EMNLP Xinxiong Chen, Zhiyuan Liu, Maosong Sun State Key Laboratory of Intelligent Technology and Systems Tsinghua National Laboratory for Information.
A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources Author: Carmen Banea, Rada Mihalcea, Janyce Wiebe Source:
Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms EMNLP /5/27 Mamoru Komachi †, Taku Kudo ‡, Masashi Shimbo † and.
1 Scaling Up Word Sense Disambiguation via Parallel Texts Yee Seng Chan Hwee Tou Ng Department of Computer Science National University of Singapore.
Unsupervised Word Sense Disambiguation REU, Summer, 2009.
Wikipedia as Sense Inventory to Improve Diversity in Web Search Results Celina Santamaria, Julio Gonzalo, Javier Artiles nlp.uned.es UNED, c/Juan del Rosal,
Semantics-Based News Recommendation with SF-IDF+ International Conference on Web Intelligence, Mining, and Semantics (WIMS 2013) June 13, 2013 Marnix Moerland.
Detecting a Continuum of Compositionality in Phrasal Verbs Diana McCarthy & Bill Keller & John Carroll University of Sussex This research was supported.
Lecture 21 Computational Lexical Semantics Topics Features in NLTK III Computational Lexical Semantics Semantic Web USC Readings: NLTK book Chapter 10 Text.
Semantic v.s. Positions: Utilizing Balanced Proximity in Language Model Smoothing for Information Retrieval Rui Yan†, ♮, Han Jiang†, ♮, Mirella Lapata‡,
Using Semantic Relatedness for Word Sense Disambiguation
Improving Named Entity Translation Combining Phonetic and Semantic Similarities Fei Huang, Stephan Vogel, Alex Waibel Language Technologies Institute School.
Number Sense Disambiguation Stuart Moore Supervised by: Anna Korhonen (Computer Lab), Sabine Buchholz (Toshiba CRL)
Information Retrieval using Word Senses: Root Sense Tagging Approach Sang-Bum Kim, Hee-Cheol Seo and Hae-Chang Rim Natural Language Processing Lab., Department.
Multi-level Bootstrapping for Extracting Parallel Sentence from a Quasi-Comparable Corpus Pascale Fung and Percy Cheung Human Language Technology Center,
Finding document topics for improving topic segmentation Source: ACL2007 Authors: Olivier Ferret (18 route du Panorama, BP6) Reporter:Yong-Xiang Chen.
1 Measuring the Semantic Similarity of Texts Author : Courtney Corley and Rada Mihalcea Source : ACL-2005 Reporter : Yong-Xiang Chen.
1 Fine-grained and Coarse-grained Word Sense Disambiguation Jinying Chen, Hoa Trang Dang, Martha Palmer August 22, 2003.
Semantics-Based News Recommendation International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012) June 14, 2012 Michel Capelle
Exploiting Named Entity Taggers in a Second Language Thamar Solorio Computer Science Department National Institute of Astrophysics, Optics and Electronics.
From Words to Senses: A Case Study of Subjectivity Recognition Author: Fangzhong Su & Katja Markert (University of Leeds, UK) Source: COLING 2008 Reporter:
Event-Based Extractive Summarization E. Filatova and V. Hatzivassiloglou Department of Computer Science Columbia University (ACL 2004)
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Learning Event Durations from Event Descriptions Feng Pan, Rutu Mulkar, Jerry R. Hobbs University of Southern California ACL ’ 06.
Second Language Learning From News Websites Word Sense Disambiguation using Word Embeddings.
Word Sense and Subjectivity (Coling/ACL 2006) Janyce Wiebe Rada Mihalcea University of Pittsburgh University of North Texas Acknowledgements: This slide.
Semantic Evaluation of Machine Translation Billy Wong, City University of Hong Kong 21 st May 2010.
Finding Predominant Word Senses in Untagged Text Diana McCarthy & Rob Koeling & Julie Weeds & Carroll Department of Informatics, University of Sussex {dianam,
Graph-based WSD (continued) DMLA /7/10 Mamoru Komachi.
SENSEVAL: Evaluating WSD Systems
Statistical NLP: Lecture 9
WordNet WordNet, WSD.
CS 620 Class Presentation Using WordNet to Improve User Modelling in a Web Document Recommender System Using WordNet to Improve User Modelling in a Web.
A method for WSD on Unrestricted Text
Unsupervised Word Sense Disambiguation Using Lesk algorithm
Statistical NLP : Lecture 9 Word Sense Disambiguation
Presentation transcript:

1 Gloss-based Semantic Similarity Metrics for Predominant Sense Acquisition. Ryu Iida (Nara Institute of Science and Technology), Diana McCarthy and Rob Koeling (University of Sussex). IJCNLP 2008, Jan 10, 2008.

2 Word Sense Disambiguation. Predominant sense acquisition is exploited as a powerful back-off strategy for word sense disambiguation: McCarthy et al. (2004) achieved 64% precision on the Senseval-2 all-words task. However, the method relies strongly on linguistic resources such as WordNet for calculating the semantic similarity, which makes it difficult to port to other languages.

3 Focus. How to calculate the semantic similarity score without semantic relations such as hyponymy: we explore the potential use of word definitions (glosses) instead of WordNet-style resources for porting McCarthy et al.'s method to other languages.

4 Table of contents. 1. Task; 2. Related work: McCarthy et al. (2004); 3. Gloss-based semantic similarity metrics; 4. Experiments: WSD on two datasets, EDR and the Japanese Senseval-2 task; 5. Conclusion and future directions.

5 Word Sense Disambiguation (WSD) task: select the correct sense of a word appearing in context, e.g. "I ate fried chicken last Sunday."

sense id  gloss
1         a common farm bird that is kept for its meat and eggs
2         the meat from this bird eaten as food
3         (informal) someone who is not at all brave
4         a game in which children must do something dangerous to show that they are brave

Supervised approaches have mainly been applied, learning the sense from the surrounding context.
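A sense inventory such as the one above can also be inspected programmatically. Below is a minimal sketch using NLTK's WordNet interface; this is an illustration only, and WordNet's senses for "chicken" will differ from the dictionary glosses shown on the slide:

```python
# A minimal sketch: list the noun senses of "chicken" from WordNet.
# Requires nltk plus the 'wordnet' corpus (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

for synset in wn.synsets('chicken', pos=wn.NOUN):
    print(synset.name(), '-', synset.definition())
```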

6 WSD task (cont'd): estimate the predominant sense of a word regardless of its context. In the English coarse-grained all-words task (2007), choosing the most frequent sense scores 78.9%, while the best performing system scores 82.5%. Systems using a first-sense heuristic have relied on sense-tagged data; however, sense-tagged data is expensive to produce.

7 McCarthy et al. (2004)'s unsupervised approach. 1. Extract the top N neighbour words of the target word according to a distributional similarity score (sim_ds). 2. Calculate the prevalence score of each sense: weight each neighbour's sim_ds by the semantic similarity score (sim_ss) between the neighbour and the sense, and sum the weighted sim_ds over the top N neighbours; the semantic similarity is estimated from linguistic resources (e.g. WordNet). 3. Output the sense with the maximum prevalence score.
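Spelled out, the prevalence score described above takes roughly this form (a reconstruction from the slide's wording; McCarthy et al. (2004) additionally normalise sim_ss over all senses of the target word, which the slide glosses over):

```latex
\mathrm{prevalence}(s_i) \;=\; \sum_{n_j \in N_w} \mathrm{sim}_{ds}(w, n_j) \cdot \mathrm{sim}_{ss}(s_i, n_j)
```

where $N_w$ is the set of top-$N$ distributional neighbours of the target word $w$.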

8 McCarthy et al. (2004)'s approach: an example for the target word chicken, with sense2 "the meat from this bird eaten as food" and sense3 "informal: someone who is not at all brave". Each neighbour (turkey, meat, tomato) has a distributional similarity score sim_ds; each sim_ds is weighted by the semantic similarity score sim_ss(neighbour, sense2) obtained from WordNet, and the weighted scores are summed to give prevalence(sense2).

9 McCarthy et al. (2004)'s approach: an example (cont'd). The same neighbours (turkey, meat, tomato) weighted by sim_ss(neighbour, sense3) give prevalence(sense3). Since prevalence(sense2) > prevalence(sense3), the predominant sense of chicken is sense2.
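A toy re-implementation of this ranking in Python; the neighbour list mirrors the slide's example, but every similarity value below is a hypothetical placeholder, not a number from the slides:

```python
# A toy sketch of the prevalence ranking sketched above.
# All sim_ds and sim_ss values are hypothetical placeholders.

def prevalence(sense, neighbours, sim_ds, sim_ss):
    """Sum each neighbour's distributional similarity, weighted by the
    semantic similarity between that neighbour and the candidate sense."""
    return sum(sim_ds[n] * sim_ss[(n, sense)] for n in neighbours)

neighbours = ['turkey', 'meat', 'tomato']
sim_ds = {'turkey': 0.3, 'meat': 0.2, 'tomato': 0.1}           # hypothetical
sim_ss = {('turkey', 'sense2'): 0.9, ('meat', 'sense2'): 0.8,   # hypothetical
          ('tomato', 'sense2'): 0.1, ('turkey', 'sense3'): 0.1,
          ('meat', 'sense3'): 0.1, ('tomato', 'sense3'): 0.05}

scores = {s: prevalence(s, neighbours, sim_ds, sim_ss)
          for s in ('sense2', 'sense3')}
print(max(scores, key=scores.get))  # -> sense2
```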

10 Problem. While McCarthy et al.'s method works well for English, other sense inventories do not always have WordNet-style resources with which to tie the nearest neighbours to the sense inventory. However, while traditional dictionaries do not organise senses into synsets, they do typically have sense definitions (glosses) associated with the senses.

11 Gloss-based similarity: calculate the similarity between two glosses in a dictionary and use it as the semantic similarity. sim_lesk simply counts the overlap of the content words in the glosses of the two word senses; sim_DSlesk uses distributional similarity as an approximation of the semantic distance between the words in the two glosses.

12 lesk: example.

word     gloss
chicken  the meat from this bird eaten as food
turkey   the meat from a turkey eaten as food

sim_lesk(chicken, turkey) = 2, since "meat" and "food" overlap in the two glosses.

13 lesk: example.

word     gloss
chicken  the meat from this bird eaten as food
tomato   a round soft red fruit eaten raw or cooked as a vegetable

sim_lesk(chicken, tomato) = 0, since there is no overlap between the two glosses.
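A minimal sketch of this overlap count; since the slide's examples count only the shared nouns ("meat", "food"), the sketch filters glosses through a hand-written noun list rather than a real POS tagger (an assumption for illustration):

```python
# A minimal sketch of the gloss-overlap (lesk) score described above.
# The "noun filter" is a hypothetical hand-listed set, standing in for
# a POS tagger.
NOUNS = {'meat', 'bird', 'food', 'turkey', 'fruit', 'vegetable'}

def sim_lesk(gloss1: str, gloss2: str) -> int:
    nouns = lambda g: {w for w in g.lower().split() if w in NOUNS}
    return len(nouns(gloss1) & nouns(gloss2))

chicken = "the meat from this bird eaten as food"
turkey  = "the meat from a turkey eaten as food"
tomato  = "a round soft red fruit eaten raw or cooked as a vegetable"
print(sim_lesk(chicken, turkey))  # 2: 'meat' and 'food'
print(sim_lesk(chicken, tomato))  # 0: no shared nouns
```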

14 DSlesk: calculate distributional similarity scores for all pairs of nouns in the two glosses (sim_ds(meat, fruit), sim_ds(meat, vegetable), sim_ds(bird, fruit), sim_ds(bird, vegetable), sim_ds(food, fruit), sim_ds(food, vegetable)), then output the average, over the nouns in the target word's gloss, of the maximum distributional similarity: sim_DSlesk(chicken, tomato) = 1/3 of the sum of the three per-noun maxima.

15 DSlesk, formally:

\mathrm{sim}_{DSlesk}(s_1, s_2) \;=\; \frac{1}{|N_1|} \sum_{n_1 \in N_1} \max_{n_2 \in N_2} \mathrm{sim}_{ds}(n_1, n_2)

where N_i is the set of nouns appearing in g_i, the gloss of word sense s_i.
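A direct transcription of this definition into Python; the noun lists come from the chicken/tomato glosses above, while all sim_ds values are hypothetical placeholders:

```python
# A sketch of the DSlesk metric defined above: for each noun in the
# first gloss, take its maximum distributional similarity to any noun
# in the second gloss, then average over the first gloss's nouns.

def sim_dslesk(nouns1, nouns2, sim_ds):
    return sum(max(sim_ds[(n1, n2)] for n2 in nouns2)
               for n1 in nouns1) / len(nouns1)

# nouns from the glosses of 'chicken' (sense 2) and 'tomato'
nouns_chicken = ['meat', 'bird', 'food']
nouns_tomato = ['fruit', 'vegetable']
sim_ds = {('meat', 'fruit'): 0.2, ('meat', 'vegetable'): 0.4,  # hypothetical
          ('bird', 'fruit'): 0.1, ('bird', 'vegetable'): 0.1,
          ('food', 'fruit'): 0.3, ('food', 'vegetable'): 0.5}

print(sim_dslesk(nouns_chicken, nouns_tomato, sim_ds))
# (0.4 + 0.1 + 0.5) / 3 ≈ 0.33
```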

16 Applying gloss-based similarity in McCarthy et al.'s approach: the WordNet-based sim_ss is replaced by sim_DSlesk. For the chicken example, each neighbour's sim_ds (turkey, meat, tomato) is weighted by sim_DSlesk(neighbour, sense2) and the weighted scores are summed to give prevalence(sense2).

17 Table of contents. 1. Task; 2. Related work: McCarthy et al. (2004); 3. Gloss-based semantic similarity metrics; 4. Experiments: WSD on two datasets, EDR and the Japanese Senseval-2 task; 5. Conclusion and future directions.

18 Experiment 1: EDR. Dataset: the EDR corpus, 3,836 polysemous nouns (183,502 instances). We adopt the similarity score proposed by Lin (1998) as the distributional similarity score, computed from 9 years of Mainichi newspaper articles and 10 years of Nikkei newspaper articles parsed with the Japanese dependency parser CaboCha (Kudo and Matsumoto, 2002), and use the 50 nearest neighbours in line with McCarthy et al. (2004).
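For reference, Lin's (1998) distributional similarity is commonly written as below; this is the standard formulation from Lin's paper, not something stated on the slide. $T(w)$ denotes the set of dependency-triple features $(r, w')$ of word $w$ with positive mutual information $I$:

```latex
\mathrm{sim}_{ds}(w_1, w_2) \;=\;
\frac{\sum_{(r,w') \in T(w_1) \cap T(w_2)} \bigl(I(w_1, r, w') + I(w_2, r, w')\bigr)}
     {\sum_{(r,w') \in T(w_1)} I(w_1, r, w') \;+\; \sum_{(r,w') \in T(w_2)} I(w_2, r, w')}
```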

19 Methods. Baseline: select one word sense at random for each word token and average the precision over 100 trials. Unsupervised: McCarthy et al. (2004), with the semantic similarity given by Jiang and Conrath (1997) (jcn), lesk, or DSlesk. Supervised (majority): use hand-labelled training data to obtain the predominant sense of the test words.
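The jcn measure mentioned above is standardly defined through corpus-derived information content $IC(s) = -\log p(s)$ and the lowest common subsumer (lcs, the most specific shared hypernym) in the sense hierarchy; again this is the textbook formulation, not something spelled out on the slide:

```latex
\mathrm{dist}_{jcn}(s_1, s_2) = IC(s_1) + IC(s_2) - 2\,IC\bigl(\mathrm{lcs}(s_1, s_2)\bigr),
\qquad
\mathrm{sim}_{jcn}(s_1, s_2) = \frac{1}{\mathrm{dist}_{jcn}(s_1, s_2)}
```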

20 Results: EDR. DSlesk is comparable to jcn, without the requirement for semantic relations such as hyponymy.

method       recall  precision
baseline     0.402
jcn
lesk
DSlesk
upper-bound  0.745
supervised   0.731

21 Results: EDR (cont'd). The slide compares baseline, jcn, lesk, DSlesk, upper-bound and supervised over all items and over low-frequency items (freq ≤ 10 and freq ≤ 5). All methods for finding a predominant sense outperform the supervised one for items with little data (freq ≤ 5), indicating that these methods work robustly even for low-frequency words where hand-tagged data is unreliable.

22 Experiment 2 and results: Senseval-2 in Japanese. 50 nouns (5,000 instances); four methods compared (lesk, DSlesk, baseline, supervised) plus an upper bound, evaluated at both the fine-grained and the coarse-grained level of the sense inventory (precision = recall; sense ids are organised into fine-grained and coarse-grained levels).

23 Conclusion. We examined different measures of semantic similarity for automatically finding a first-sense heuristic for WSD in Japanese. We defined a new gloss-based similarity (DSlesk) and evaluated it on two Japanese WSD datasets (EDR and Senseval-2); it outperforms lesk and achieves performance comparable to the jcn method, which relies on hyponym links that are not always available.

24 Future directions. Explore other information in the glosses, such as words of other parts of speech and predicate-argument relations. Group fine-grained word senses into clusters, making the task suitable for NLP applications (Ide and Wilks, 2006). Use the results of predominant sense acquisition as prior knowledge for other approaches, such as graph-based approaches (Mihalcea 2005; Nastase 2008).