Creating a Similarity Graph from WordNet


Creating a Similarity Graph from WordNet Lubomir Stanchev

Example Similarity Graph [Figure: nodes Dog, Cat, and Animal connected by weighted edges; the edge weights shown are 0.3, 0.2, and 0.8.]

Applications If we type automobile into an Internet search engine such as Google or Bing, all of the top results will contain the word automobile. Most search engines will not rank a web page among the top results if it contains the word car but not the word automobile. The similarity graph allows us not only to perform semantic search (i.e., search based on the meaning of the words) but also to rank the results. We can also use the similarity graph to partition a set of documents based on the meaning of the words in them. Finally, the similarity graph can be used as part of a query-answering system, such as the IBM Watson computer that competed on the Jeopardy! game show or the Siri system for the iPhone.
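The automobile/car example can be sketched as a simple query expansion step. This is only an illustration in Python: the edge weights, the `expand_query` helper, and the 0.5 threshold are assumptions made for the example and do not come from the presentation.

```python
# Hypothetical directed edge weights between word forms (not real output
# of the system described in the slides).
graph = {
    "automobile": {"car": 0.8, "vehicle": 0.3},
    "car": {"automobile": 0.8},
}

def expand_query(term, graph, threshold=0.5):
    """Return the query term plus neighbours whose edge weight is at
    least `threshold`, each paired with a score usable for ranking."""
    related = graph.get(term, {})
    expanded = [(term, 1.0)]  # the original term gets the top score
    expanded += sorted(
        ((word, weight) for word, weight in related.items() if weight >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return expanded

print(expand_query("automobile", graph))
```

A page that mentions only car can then be retrieved for the query automobile, and the edge weight gives a natural way to rank it below pages that contain the exact term.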

About WordNet WordNet gives us information about the words in the English language. In our study, we use WordNet 3.0, which contains approximately 150,000 different words. WordNet also contains phrases (or word forms), such as sports utility vehicle. The meaning of a word form is not precise. For example, spring can mean the season after winter, an elastic metal device, or a natural flow of ground water, among others. WordNet captures this with the concept of a sense: each of these three meanings of spring is a separate sense. Every word form has one or more senses, and every sense is represented by one or more word forms. A human can usually determine which of its senses a word form represents from the context in which it is used.
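The many-to-many relation between word forms and senses can be sketched with two lookup tables. This is a minimal Python illustration; the sense identifiers, the extra word forms springtime and fountain, and the `add_sense` helper are assumptions for the example and are not WordNet's own identifiers or API.

```python
from collections import defaultdict

senses_of = defaultdict(set)  # word form -> set of sense ids
forms_of = defaultdict(set)   # sense id  -> set of word forms

def add_sense(sense_id, word_forms):
    """Register a sense together with the word forms that express it."""
    for form in word_forms:
        senses_of[form].add(sense_id)
        forms_of[sense_id].add(form)

# Illustrative entries only; the identifiers are made up.
add_sense("spring#season", ["spring", "springtime"])
add_sense("spring#device", ["spring"])
add_sense("spring#water", ["spring", "fountain"])

print(sorted(senses_of["spring"]))         # every sense of one word form
print(sorted(forms_of["spring#season"]))   # every word form of one sense
```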

About WordNet (cont'd) WordNet contains the definition and an example use of each sense. It also contains information about the relationships between senses. The senses in WordNet are divided into four categories: nouns, verbs, adjectives, and adverbs. For example, WordNet stores information about the hyponym and meronym relationships for nouns. The hyponym relationship corresponds to the "kind-of" relationship (for example, dog is a hyponym of canine). The meronym relationship corresponds to the "part-of" relationship (for example, window is a meronym of building). Similar relationships are also defined for verbs, adjectives, and adverbs.

Our System [Diagram: inputs are WordNet (structured data and natural language descriptions), the University of Oxford British National Corpus (word frequencies), and a list of noise words; the output of the system is the similarity graph.]

Initial Similarity Graph Create a node for every word form. Create a node for every sense.
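A minimal sketch of this first step, assuming the word forms and their senses are available as a dictionary. The `initial_nodes` function, the node tagging scheme, and the sample data are hypothetical and only illustrate the idea.

```python
def initial_nodes(senses_of):
    """Return one node per word form and one node per sense.

    Nodes are tagged with their kind so that a word form and a sense
    that happen to share a label remain distinct.
    """
    nodes = set()
    for form, sense_ids in senses_of.items():
        nodes.add(("form", form))
        for sense_id in sense_ids:
            nodes.add(("sense", sense_id))
    return nodes

# Made-up sample input: word form -> set of sense identifiers.
example = {
    "dog": {"dog#animal"},
    "cat": {"cat#animal"},
    "spring": {"spring#season", "spring#device"},
}

nodes = initial_nodes(example)
print(len(nodes))
```

Edges are not created at this stage; in the presentation they are added by the later steps.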

Processing the Senses Frequency of use of each sense is given in WordNet.

Adding Definition Edges Position is the first word of the definition, so we give it greater importance. Forward edge: computeMinMax(0, 0.6, ratio). If position appears in the definitions of only three word forms, then we compute the backward edge as computeMinMax(0, 0.3, 1/3).
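The transcript does not give the definition of computeMinMax, only its three arguments (a minimum, a maximum, and a ratio). To show the role of each argument, the sketch below uses plain linear interpolation as a stand-in. This is an assumption, not the author's formula, and the function name `compute_min_max_linear` is hypothetical.

```python
def compute_min_max_linear(min_value, max_value, ratio):
    """Stand-in for computeMinMax: map a ratio in [0, 1] onto the
    interval [min_value, max_value] by linear interpolation.

    NOTE: an assumed placeholder; the real function may be non-linear.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return min_value + (max_value - min_value) * ratio

# Backward edge from the slide: the word appears in three definitions,
# so the ratio is 1/3 and the weight is capped at 0.3.
backward = compute_min_max_linear(0, 0.3, 1 / 3)
print(round(backward, 3))
```

Whatever the exact formula, the second argument bounds the edge weight: forward definition edges can reach 0.6, while backward edges are capped at 0.3.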

Processing Hyponyms In the British National Corpus, the frequency of armchair is 657 and the frequency of wheelchair is 551.

Validating the Algorithm Miller and Charles study: 28 pairs of words. Study performed in 1991. Asked humans to rate the similarity of each pair of words and recorded the results. WordSimilarity-353 study: 353 pairs of words. Study performed in 2002. Again, asked humans to rate the similarity of each of the 353 pairs. We will use these benchmarks to validate our system. To do so, we need a way to measure the similarity between two words.
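One common way to compare a system against human ratings such as these is a correlation coefficient. The transcript does not say which statistic was used, so the Pearson correlation below is an assumption, and the scores are made-up numbers, not values from either benchmark.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical ratings for four word pairs (illustration only).
human_scores = [3.9, 3.0, 1.2, 0.1]
system_scores = [0.90, 0.60, 0.30, 0.05]

print(round(pearson(human_scores, system_scores), 3))
```

A value close to 1 would mean the system orders and spaces the word pairs much as the human raters did.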

Measuring Semantic Similarity Between Words

Experimental Results [Tables: Miller and Charles; WordSimilarity-353]

Conclusion and Future Research We presented an algorithm for building a similarity graph from WordNet. We verified the quality of the graph by showing that it can be used to compute the semantic similarity between word forms. We also verified experimentally that the algorithm produces better results than existing algorithms on the Miller and Charles and WordSimilarity-353 word-pair benchmarks. We believe that we outperform existing algorithms because our algorithm processes not only structured data, but also natural language. We will present a paper on how to extend the system to use data from Wikipedia at the Eighth IEEE International Conference on Semantic Computing in Newport Beach, California later this month.