ACM WIDM'2005 (10/22/2015): Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web. Giannis Varelas, Epimenidis Voutsakis.

Presentation transcript:

Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web
Giannis Varelas, Epimenidis Voutsakis, Paraskevi Raftopoulou, Euripides G.M. Petrakis, Evangelos Milios
ACM WIDM'2005 (10/22/2015)

Semantic Similarity
• Semantic similarity relates to computing the conceptual similarity between terms that are not lexicographically similar, e.g., “car” and “automobile”
• Map two terms to an ontology and compute their relationship in that ontology

Objectives
• We investigate several semantic similarity methods and evaluate their performance
• We propose the Semantic Similarity Retrieval Model (SSRM) for computing similarity between documents containing semantically similar but not necessarily lexicographically similar terms

Ontologies
• Tools of information representation on a subject
• Hierarchical categorization of terms from general to most specific terms: object → artifact → construction → stadium
• Domain ontologies represent knowledge of a domain, e.g., the MeSH medical ontology
• General ontologies represent common sense knowledge about the world, e.g., WordNet

WordNet
• A vocabulary and a thesaurus offering a hierarchical categorization of natural language terms
• More than 100,000 terms
• An ontology of natural language terms
• Nouns, verbs, adjectives and adverbs are grouped into synonym sets (synsets)
• Synsets represent terms or concepts, e.g., stadium, bowl, arena, sports stadium: a large structure for open-air sports or entertainments

WordNet Hierarchies
• The synsets are also organized into senses
• Senses: different meanings of the same term
• The synsets are related to other synsets higher or lower in the hierarchy by different types of relationships, e.g.
  ◦ Hyponym/Hypernym (Is-A relationships)
  ◦ Meronym/Holonym (Part-Of relationships)
• Nine noun and several verb Is-A hierarchies

A Fragment of the WordNet Is-A Hierarchy

Semantic Similarity Methods
• Map terms to an ontology and compute their relationship in that ontology
• Four main categories of methods:
  ◦ Edge counting: path length between terms
  ◦ Information content: as a function of their probability of occurrence in a corpus
  ◦ Feature based: similarity between their properties (e.g., definitions) or based on their relationships to other similar terms
  ◦ Hybrid: combine the above ideas
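As a rough illustration of the edge-counting idea, the sketch below counts Is-A edges between two terms in a tiny hand-made taxonomy. The taxonomy entries other than the object/artifact/construction/stadium chain, and the function names, are assumptions made for illustration; this is not WordNet data and not any one of the evaluated methods.

```python
# Toy Is-A taxonomy stored as child -> parent.
# Hypothetical fragment for illustration only, not real WordNet data.
PARENT = {
    "artifact": "object",
    "construction": "artifact",
    "stadium": "construction",
    "bridge": "construction",
    "instrumentality": "artifact",
    "vehicle": "instrumentality",
}


def ancestors(term):
    """Return the chain [term, parent, grandparent, ..., root]."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_length(a, b):
    """Number of Is-A edges on the shortest path between a and b."""
    steps_from_b = {t: i for i, t in enumerate(ancestors(b))}
    for steps_from_a, t in enumerate(ancestors(a)):
        if t in steps_from_b:
            # t is the lowest common ancestor of a and b.
            return steps_from_a + steps_from_b[t]
    raise ValueError("terms share no common ancestor")


print(path_length("stadium", "bridge"))
print(path_length("stadium", "vehicle"))
```

A shorter path means more similar terms; the methods in the evaluation typically turn such a distance (often combined with depth in the hierarchy) into a similarity score.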

Example
• The edge-counting distance between “conveyance” and “ceramic” is 2
• An information content method would associate the two terms with their common subsumer and with their probabilities of occurrence in a corpus
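The information content idea can be sketched in the style of Resnik's measure, where the similarity of two terms is the information content, -log p(c), of their most informative common subsumer. The taxonomy and the probabilities below are made-up values for illustration, not corpus statistics.

```python
import math

# Toy Is-A taxonomy (child -> parent); hypothetical, not real WordNet data.
PARENT = {
    "artifact": "object",
    "construction": "artifact",
    "stadium": "construction",
    "bridge": "construction",
    "instrumentality": "artifact",
    "vehicle": "instrumentality",
}

# Made-up probabilities of encountering each concept (or any of its
# descendants) in a corpus. A parent is at least as probable as its children.
PROB = {
    "object": 1.0,
    "artifact": 0.5,
    "construction": 0.1,
    "instrumentality": 0.3,
    "stadium": 0.01,
    "bridge": 0.02,
    "vehicle": 0.05,
}


def ancestors(term):
    """Return the set containing the term and all of its subsumers."""
    found = {term}
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def information_content(concept):
    """Rare concepts carry more information than frequent ones."""
    return -math.log(PROB[concept])


def resnik_similarity(a, b):
    """Information content of the most informative common subsumer."""
    shared = ancestors(a) & ancestors(b)
    return max(information_content(c) for c in shared)


print(resnik_similarity("stadium", "bridge"))   # subsumer: construction
print(resnik_similarity("stadium", "vehicle"))  # subsumer: artifact
```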

Semantic Similarity on WordNet
• The most popular methods are evaluated
• All methods are applied on a set of 38 term pairs
• Their similarity values are correlated with scores obtained by humans
• The higher the correlation of a method, the better the method is
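The evaluation procedure can be sketched as a correlation between method scores and human ratings over the same term pairs. The slides do not say which correlation coefficient is used; the sketch assumes Pearson correlation, and the scores below are invented placeholders rather than the 38 term pairs of the experiment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented example scores for four term pairs (not the experimental data).
human_scores = [3.9, 3.5, 0.4, 1.1]
method_scores = [0.95, 0.80, 0.10, 0.35]

print(round(pearson(human_scores, method_scores), 2))
```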

Evaluation

Method           Type            Correlation
Rada 1989        Edge Counting   0.59
Wu 1994          Edge Counting   0.74
Li 2003          Edge Counting   0.82
Leacock 1998     Edge Counting   0.82
Richardson 1994  Edge Counting   0.63
Resnik 1999      Info. Content   0.79
Lin 1993         Info. Content   0.82
Lord 2003        Info. Content   0.79
Jiang 1998       Info. Content   0.83
Tversky 1977     Feature Based   0.73
Rodriguez 2003   Hybrid          0.71

Observations
• Edge counting and information content methods work by exploiting structure information
• Good methods take the position of the terms into account
• Higher similarity for terms which are close together but lower in the hierarchy, e.g., [Li et al. 2003]
• Information content is measured on WordNet rather than on a corpus [Seco2002]
• Similarity only for nouns and verbs
• No taxonomic structure for other parts of speech
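Measuring information content on the taxonomy itself, rather than on a corpus, can be sketched as follows. The slides do not give the formula; the sketch assumes the commonly cited intrinsic form IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of hyponyms of c and N is the number of concepts in the taxonomy. The toy taxonomy is hypothetical.

```python
import math

# Toy Is-A taxonomy stored as parent -> children; hypothetical, not WordNet.
CHILDREN = {
    "object": ["artifact"],
    "artifact": ["construction", "instrumentality"],
    "construction": ["stadium", "bridge"],
    "instrumentality": ["vehicle"],
}

# Every concept that appears either as a parent or as a child.
ALL_CONCEPTS = set(CHILDREN) | {c for kids in CHILDREN.values() for c in kids}


def count_hyponyms(concept):
    """Number of concepts below `concept` in the taxonomy."""
    return sum(1 + count_hyponyms(child) for child in CHILDREN.get(concept, []))


def intrinsic_ic(concept):
    """Assumed intrinsic IC: leaves score 1, the root scores 0."""
    return 1.0 - math.log(count_hyponyms(concept) + 1) / math.log(len(ALL_CONCEPTS))


for concept in ("object", "construction", "stadium"):
    print(concept, round(intrinsic_ic(concept), 3))
```

Under this assumption, concepts with many hyponyms are treated as less informative, which mirrors the role that high corpus probability plays in the corpus-based variant.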

Semantic Similarity Retrieval Model (SSRM)
• Classic retrieval models retrieve documents with the same query terms
• SSRM will retrieve documents which also contain semantically similar terms
• Queries and documents are initially assigned tf × idf weights
• $q = (q_1, q_2, \ldots, q_N)$, $d = (d_1, d_2, \ldots, d_N)$
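The initial tf × idf weighting can be sketched as below. Many tf-idf variants exist and the slides do not say which one is used; this sketch assumes term frequency normalized by the most frequent term in the document and idf = log(N / df). The function name and the example documents are illustrative.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        if not counts:
            vectors.append({})
            continue
        max_count = max(counts.values())
        vectors.append({
            term: (count / max_count) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = [["car", "engine"], ["automobile", "engine"]]
for vector in tf_idf_vectors(docs):
    print(vector)
```

In this example "engine" occurs in every document and therefore gets weight 0, while "car" and "automobile" receive the same positive weight even though a purely lexical model treats them as unrelated terms.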

SSRM
I. Query term re-weighting: similar terms reinforce each other
II. Query term expansion with synonyms and similar terms
III. Document similarity
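The three steps can be sketched as follows. The slides name the steps but do not give formulas, so the specific formulas, the thresholds `t` and `T`, the similarity table, and all function names below are assumptions made for illustration rather than the authors' definitions.

```python
# Hypothetical term-to-term similarities in [0, 1]; a real system would
# compute these from an ontology such as WordNet.
SIM = {
    frozenset(["car", "automobile"]): 1.0,
    frozenset(["car", "vehicle"]): 0.9,
    frozenset(["automobile", "vehicle"]): 0.9,
}


def sim(a, b):
    return 1.0 if a == b else SIM.get(frozenset([a, b]), 0.0)


def reweight(query, t=0.8):
    """Step I: each query term gains weight from other, similar query terms."""
    return {
        i: qi + sum(qj * sim(i, j) for j, qj in query.items()
                    if j != i and sim(i, j) >= t)
        for i, qi in query.items()
    }


def expand(query, vocabulary, T=0.9):
    """Step II: add vocabulary terms at least T-similar to some query term."""
    expanded = dict(query)
    for term in vocabulary:
        if term in query:
            continue
        gain = sum(qj * sim(term, j) for j, qj in query.items()
                   if sim(term, j) >= T)
        if gain > 0:
            expanded[term] = gain
    return expanded


def document_similarity(query, doc):
    """Step III: similarity-weighted match over all query/document term pairs."""
    num = sum(qi * dj * sim(i, j)
              for i, qi in query.items() for j, dj in doc.items())
    den = sum(qi * dj for qi in query.values() for dj in doc.values())
    return num / den if den else 0.0


query = {"car": 1.0}
doc = {"automobile": 0.5, "engine": 0.5}
expanded = expand(reweight(query), vocabulary=["automobile", "vehicle", "engine"])
print(expanded)
print(document_similarity(expanded, doc))
```

Note that the document shares no term with the original query, yet it receives a non-zero score because "automobile" is similar to "car". A plain vector space match would score it 0.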

Query Term Expansion

Observations
• Specification of T?
• Large T may lead to topic drift
• Word sense disambiguation for expanding with the correct sense
• Expansion with co-occurring terms? SVD, local/global analysis
• Semantic similarity between terms of different parts of speech?
• Work with compound terms (phrases)

Evaluation of SSRM
• SSRM is evaluated through intellisearch, a system for information retrieval on the WWW
• 1.5 million Web pages with images
• Images are described by surrounding text
• The problem of image retrieval is transformed into a problem of text retrieval

Methods
• Vector Space Model (VSM)
• SSRM
• Each method is represented by a precision/recall plot
• Each point is the average precision/recall over 20 queries
• 20 queries from the list of the most frequent Google image queries
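One way to obtain the points of such a plot is sketched below. The slides do not say how each point is computed; the sketch assumes precision and recall are measured at fixed answer-set cutoffs and averaged over the queries. Function names and the example rankings are illustrative.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the top-k answers for one query."""
    top = ranked[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def averaged_curve(runs, cutoffs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query.

    Returns one (cutoff, mean precision, mean recall) tuple per cutoff.
    """
    curve = []
    for k in cutoffs:
        points = [precision_recall_at_k(ranked, relevant, k)
                  for ranked, relevant in runs]
        mean_precision = sum(p for p, _ in points) / len(points)
        mean_recall = sum(r for _, r in points) / len(points)
        curve.append((k, mean_precision, mean_recall))
    return curve


# Invented rankings and relevance judgments for two queries.
runs = [
    (["a", "b", "c"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
for point in averaged_curve(runs, cutoffs=[1, 2, 3]):
    print(point)
```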

Experimental Results

MeSH and MedLine
• MeSH: ontology for medical and biological terms by the N.L.M.
  ◦ 22,000 terms
• MedLine: the premier bibliographic medical database of the N.L.M.
  ◦ 13 million references

Evaluation on MedLine

Conclusions
• Semantic similarity methods approximated the human notion of similarity, reaching correlation up to 83%
• SSRM exploits this information for improving the performance of retrieval
• SSRM can work with any semantic similarity method and any ontology

Future Work
• Experimentation with more data sets (TREC) and ontologies
• Extend SSRM to work with
  ◦ Compound terms
  ◦ More parts of speech (e.g., adverbs)
  ◦ Co-occurring terms
  ◦ More term relationships in WordNet
  ◦ More elaborate methods for specification of thresholds

Try our system on the Web
• Semantic Similarity System:
• SSRM: