Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures


Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures
Alexander Budanitsky and Graeme Hirst, Department of Computer Science, University of Toronto
Presenter: Cosmin Adrian Bejan

2 Overview
• The purpose of the paper is to compare the performance of several measures of semantic relatedness that have been proposed for use in NLP applications.
• There are three kinds of approaches to evaluating measures of similarity or semantic distance:
  - theoretical examination of a given measure for properties thought desirable;
  - comparison with human judgments;
  - evaluation of the measures with respect to their performance within a particular NLP application.

3 Network-based measures of semantic distance
• Hirst-St-Onge: two lexicalized concepts are semantically close if their WordNet synsets are connected by a path that is not too long and that “does not change direction too often”.
• Leacock-Chodorow: also relies on the length len(c1, c2) of the shortest path between two synsets, but considers only IS-A links and scales the path length by the overall depth D of the taxonomy.
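The two path-based scores can be sketched as follows, assuming the formulations usually given in the literature: rel_HS(c1, c2) = C - path length - k × d, where d is the number of direction changes, and sim_LC(c1, c2) = -log(len(c1, c2) / 2D). The function names and the default constants C = 8 and k = 1 are assumptions for illustration. The sketch takes path lengths as inputs and does not search WordNet itself.

```python
import math


def hirst_st_onge(path_length, direction_changes, C=8.0, k=1.0):
    """rel_HS = C - path_length - k * direction_changes.

    C and k are constants; the defaults are assumed values, not taken
    from the slides.
    """
    return C - path_length - k * direction_changes


def leacock_chodorow(path_length, taxonomy_depth):
    """sim_LC = -log(len / (2 * D)) over IS-A links only."""
    if path_length <= 0 or taxonomy_depth <= 0:
        raise ValueError("path_length and taxonomy_depth must be positive")
    return -math.log(path_length / (2.0 * taxonomy_depth))
```

Under both scores a shorter path gives a higher value; Hirst-St-Onge additionally penalizes each change of direction along the path.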

4 Network-based measures of semantic distance
• Resnik: defines the similarity between two concepts lexicalized in WordNet to be the information content of their most specific common subsumer lso(c1, c2).
• Jiang-Conrath: also uses information content, but in the form of the conditional probability of encountering an instance of a child synset given an instance of a parent synset.
• Lin: also based on information content; normalizes the information content of lso(c1, c2) by the information content of c1 and c2.
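A minimal sketch of the three information-content measures, assuming their standard formulations: sim_R = -log p(lso), dist_JC = 2 log p(lso) - (log p(c1) + log p(c2)), and sim_L = 2 log p(lso) / (log p(c1) + log p(c2)). The concept probabilities are taken as inputs here; estimating them from a corpus is outside the sketch, and all function names are hypothetical.

```python
import math


def information_content(p):
    """IC(c) = -log p(c), for a concept probability 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p)


def resnik_similarity(p_lso):
    """Information content of the most specific common subsumer."""
    return information_content(p_lso)


def jiang_conrath_distance(p_c1, p_c2, p_lso):
    """IC(c1) + IC(c2) - 2 * IC(lso); a distance, so smaller means closer."""
    return (information_content(p_c1) + information_content(p_c2)
            - 2.0 * information_content(p_lso))


def lin_similarity(p_c1, p_c2, p_lso):
    """2 * IC(lso) / (IC(c1) + IC(c2))."""
    denominator = information_content(p_c1) + information_content(p_c2)
    if denominator == 0.0:
        raise ValueError("both concepts have zero information content")
    return 2.0 * information_content(p_lso) / denominator
```

Note that Jiang-Conrath yields a distance while Resnik and Lin yield similarities, so the sign convention has to be handled before the measures are compared.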

5 Comparison with human ratings of similarity
• Rubenstein and Goodenough: 65 pairs of words ranging from “highly synonymous” to “semantically unrelated”; 51 subjects were asked to rate them on a scale of 0.0 to 4.0.
• Miller and Charles: extracted 30 pairs from the original 65 (10 from the high level, 3-4; 10 from the intermediate level, 1-3; and 10 from the low level, 0-1).
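The slide does not say which statistic is used for the comparison. One common choice, assumed here, is the correlation coefficient between the mean human rating of each word pair and the score a measure assigns to the same pair. The function name is hypothetical.

```python
import math


def pearson_correlation(human_ratings, measure_scores):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(human_ratings)
    if n != len(measure_scores) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_h = sum(human_ratings) / n
    mean_m = sum(measure_scores) / n
    covariance = sum((h - mean_h) * (m - mean_m)
                     for h, m in zip(human_ratings, measure_scores))
    spread_h = math.sqrt(sum((h - mean_h) ** 2 for h in human_ratings))
    spread_m = math.sqrt(sum((m - mean_m) ** 2 for m in measure_scores))
    if spread_h == 0.0 or spread_m == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_h * spread_m)
```

For a distance measure such as Jiang-Conrath, a strong negative correlation with the human similarity ratings indicates good agreement.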

6 An application-based evaluation of measures of relatedness
• Evaluate the measures with respect to their performance within a particular NLP application: detection and correction of real-word spelling errors in open-class words, that is, malapropisms.
• Malapropism detection is viewed as a retrieval task, evaluated in terms of precision, recall and F-measure, and divided into two stages:
  - In the first stage, a word is suspected of being a malapropism (and the word is a suspect) if it is judged to be unrelated to other words nearby; the word is a true suspect if it is indeed a malapropism.
  - In the second stage, an alarm is raised when a spelling variation of a suspect is judged to be related to a nearby word; if an alarm word is a malapropism, then the alarm is a true alarm and the malapropism has been detected.
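The two stages and the retrieval-style scoring can be sketched as below. This is a simplified illustration, not the authors' implementation: `related` (a yes/no relatedness judgment derived from one of the measures) and `variations` (a spelling-variation generator) are hypothetical callables supplied by the caller, and "nearby" is taken to mean every other word in the list passed in.

```python
def find_suspects(words, related):
    """Stage 1: indices of words unrelated to every other nearby word."""
    suspects = []
    for i, word in enumerate(words):
        nearby = [w for j, w in enumerate(words) if j != i]
        if not any(related(word, other) for other in nearby):
            suspects.append(i)
    return suspects


def raise_alarms(words, suspects, variations, related):
    """Stage 2: suspects with a spelling variation related to a nearby word."""
    alarms = []
    for i in suspects:
        nearby = [w for j, w in enumerate(words) if j != i]
        if any(related(v, other)
               for v in variations(words[i]) for other in nearby):
            alarms.append(i)
    return alarms


def precision_recall_f(flagged, actual):
    """Score flagged positions against the true malapropism positions."""
    flagged, actual = set(flagged), set(actual)
    true_positives = len(flagged & actual)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f_measure = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Passing the suspects from stage 1 to `precision_recall_f` scores suspicion, and passing the alarms from stage 2 scores detection.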

7 Malapropism detection
• Method:
  - 500 articles from the Wall Street Journal corpus
  - remove proper nouns and stop-list words
  - replace one word in every 200 with a spelling variation
• For each measure, use four different search scopes:
  - scope 1: just the paragraph containing the target word
  - scopes 3 and 5: the paragraph plus one or two adjacent paragraphs on each side
  - scope MAX: the entire article
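The four search scopes can be read as a window of paragraphs around the target word. The sketch below is one possible reading, with a hypothetical function name; clipping the window at the start and end of the article is an assumption, since the slide does not say how edge paragraphs are handled.

```python
def paragraphs_in_scope(num_paragraphs, target, scope):
    """Return the paragraph indices searched for a target paragraph.

    scope is 1, 3, 5, or the string "MAX" (the entire article).
    """
    if not 0 <= target < num_paragraphs:
        raise ValueError("target paragraph index out of range")
    if scope == "MAX":
        return list(range(num_paragraphs))
    if scope not in (1, 3, 5):
        raise ValueError("scope must be 1, 3, 5 or 'MAX'")
    radius = (scope - 1) // 2  # 0, 1 or 2 paragraphs on each side
    first = max(0, target - radius)  # assumed: clip at article start
    last = min(num_paragraphs - 1, target + radius)  # and at article end
    return list(range(first, last + 1))
```

For example, in a ten-paragraph article a target in paragraph index 4 with scope 3 is searched against paragraphs 3, 4 and 5.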

8 Suspicion

9 Detection