WordNet–Based Collaborative Weighting for Ranking Web Pages Hyoungil Kim, Juntae Kim Dongguk University, Seoul, Korea Kyeonah Yu Duksung Women's University,

Presentation transcript:

WordNet–Based Collaborative Weighting for Ranking Web Pages
Hyoungil Kim, Juntae Kim, Dongguk University, Seoul, Korea
Kyeonah Yu, Duksung Women's University, Seoul, Korea

Dept. of Computer Engineering, Dongguk University
Agenda
1. Introduction
2. Background
   – Next Generation Search Engines
   – WordNet
3. The Proposed Method
   – Sense Determination
   – Query Expansion
   – Sense-specific Collaborative Weighting
4. Experiments
5. Conclusion

Introduction
It is hard to extract information from the Web using a general-purpose search engine.
– Query words suffer from word sense ambiguity.
– The number of search results is very large.
– The keyword-based method cannot discriminate important pages.
Suggestion
– Use sense-specific collaborative weighting.
– Disambiguate the query word using WordNet.

Next Generation Search Engines
Issue
– The problems of the keyword-based method
New weighting and ranking schemes
– Static reference information (hyperlink structure): shows the global authority
– Dynamic reference information (user response): shows the global popularity

WordNet
Development
– Started in 1985 at Princeton
– Cognitive science team and psycholinguists
Contents
– Vocabulary database
– Classifies the English vocabulary according to the meaning of each word
   95,600 different words
   70,100 different meanings
   Words with the same meaning → synset

WordNet
Relationships between words
– Synonym / Antonym: the similar / opposite meaning between words
   Ex) rise = ascend, rise ↔ fall
– Hyponym / Hypernym: the hierarchical relationship of meanings
   Ex) maple => tree => plant
– Meronym / Holonym: the inclusive (part-whole) relationship
   Ex) leaf ⊂ tree

Synset hierarchy of WordNet
The synsets in WordNet are hierarchically organized according to their hypernym relationships.
Example
{Cattle, Cows, Oxen} → {Bovine} → … → {Mammal} → {Vertebrate, Craniate} → {Chordate} → {Animal}

3 senses of java

Sense 1
Java -- (an island in Indonesia S of Borneo; one of the world's most densely populated regions)
   => island -- (a land mass (smaller than a continent) that is surrounded by water...
      => land, dry land, earth, ground, solid ground, terra firma -- (the solid part...
         => object, physical object -- (a physical (tangible and visible) entity;...
            => entity, something -- (anything having existence (living or nonliving))

Sense 2
coffee, java -- (a beverage consisting of an infusion of ground coffee beans;...
   => beverage, drink, drinkable, potable -- (any liquid suitable for drinking:...
      => food, nutrient -- (any substance that can be metabolized by an organism...
         => substance, matter -- (that which has mass and occupies space;...
            => object, physical object -- (a physical (tangible and visible)...
               => entity, something -- (anything having existence (living or nonliving))

Sense 3
Java -- (a simple platform-independent object-oriented programming language...
   => object-oriented programming language, object-oriented programing language...
      => programming language, programing language -- ((computer science) a language...
         => artificial language -- (a language that is deliberately created for...
            => language, linguistic communication -- (a systematic means of...
               => communication -- (something that is communicated between...
                  => social relation -- (a relation between living organisms;...
                     => relation -- (an abstraction belonging to or...
                        => abstraction -- (a general concept formed by...

Sense Determination
To determine the sense of the query
– Use the synset hierarchy of WordNet.
State of the sense of the query
– The query is either ambiguous or unambiguous.
Strategy
– Provide a user interface through which the user can select one of the synsets.
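
The sense-determination step can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the `SENSES` table is a hand-copied excerpt of the "3 senses of java" slide (only the first word of each hypernym synset is kept), and the function names `list_senses`, `is_ambiguous`, `format_choices`, and `select_sense` are hypothetical. A real system would presumably query WordNet itself instead of a hard-coded table.

```python
# Toy excerpt of the hypernym chains from the "3 senses of java" slide.
# Each sense is (synset words, hypernyms ordered from nearest to most general).
# Assumption: only the first word of each hypernym synset is kept for brevity.
SENSES = {
    "java": [
        (("Java",), ["island", "land", "object", "entity"]),
        (("coffee", "java"),
         ["beverage", "food", "substance", "object", "entity"]),
        (("Java",),
         ["object-oriented programming language", "programming language",
          "artificial language", "language", "communication",
          "social relation", "relation", "abstraction"]),
    ],
}


def list_senses(word):
    """Return the candidate senses of a query word (empty list if unknown)."""
    return SENSES.get(word.lower(), [])


def is_ambiguous(word):
    """A query word is ambiguous when it has more than one candidate sense."""
    return len(list_senses(word)) > 1


def format_choices(word):
    """Build one display line per sense, as a selection interface might show."""
    lines = []
    for i, (synset, hypernyms) in enumerate(list_senses(word), start=1):
        chain = " => ".join(hypernyms)
        lines.append(f"Sense {i}: {{{', '.join(synset)}}} => {chain}")
    return lines


def select_sense(word, choice):
    """Return the sense the user picked; `choice` is 1-based."""
    senses = list_senses(word)
    if not 1 <= choice <= len(senses):
        raise ValueError(f"no sense {choice} for {word!r}")
    return senses[choice - 1]


if __name__ == "__main__":
    if is_ambiguous("java"):
        for line in format_choices("java"):
            print(line)
    print(select_sense("java", 2))
```

Keeping the whole hypernym chain with each sense means the same record can later supply both the hypernym expansion terms and the top-level category of the selected sense.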

[Figure: sense-selection interface, with labeled parts: the search query, hypernym, synonym, annotation]

Query Expansion
Expand the query by using:
– synonym, hypernym, or annotation
– Words from each part are extracted and added (OR)
If the user selected sense 2 of “Java”:
– Using the synonym: {Java} → {Java, coffee}
– Using the hypernym: {Java} → {Java, beverage, drink}
– Using the annotation: {Java} → {Java, beverage, infusion, coffee, bean}
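
A minimal sketch of the expansion step, assuming the expansion words for each sense are already available. The `EXPANSIONS` table simply copies the three examples given for sense 2 of "Java"; how the words are extracted from the annotation text is not described on the slide, so it is not reproduced here. The names `expand_query` and `to_or_query`, and the literal `OR` syntax of the final query string, are assumptions for illustration.

```python
# Expansion words for (query word, sense number), copied from the slide's
# example for sense 2 of "Java". Other senses would need their own entries.
EXPANSIONS = {
    ("java", 2): {
        "synonym": ["coffee"],
        "hypernym": ["beverage", "drink"],
        "annotation": ["beverage", "infusion", "coffee", "bean"],
    },
}


def expand_query(word, sense, source):
    """Return the original word followed by the expansion words.

    `source` is one of "synonym", "hypernym", or "annotation".
    Duplicates (case-insensitive) are dropped, order is preserved.
    """
    extra = EXPANSIONS[(word.lower(), sense)][source]
    terms = [word]
    seen = {word.lower()}
    for term in extra:
        if term.lower() not in seen:
            terms.append(term)
            seen.add(term.lower())
    return terms


def to_or_query(terms):
    """Join the terms with OR (assumed query syntax of the search engine)."""
    return " OR ".join(terms)


if __name__ == "__main__":
    for source in ("synonym", "hypernym", "annotation"):
        print(source, "->", to_or_query(expand_query("Java", 2, source)))
```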

Sense-Specific Collaborative Weighting
Weighting of Web pages
– Use the 26 top-level categories of the noun hierarchy to store 26 sense-specific weights for each Web page

Web page   Count for {Food}   Count for {Location}   Count for {Comm.}   Total count
URL
URL
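
One way to picture the per-page weight table is the sketch below. It assumes that a page's weight for a category is simply the number of user clicks recorded while a sense in that category was selected; the slides do not give the exact weighting formula, so treat this as one plausible reading. The class name `SenseSpecificWeights`, its methods, and the example URLs are all hypothetical.

```python
from collections import defaultdict


class SenseSpecificWeights:
    """Per-page click counts, kept separately for each top-level category."""

    def __init__(self):
        # url -> category -> click count
        self._counts = defaultdict(lambda: defaultdict(int))

    def record_click(self, url, category):
        """Count one user click on `url` under the selected sense's category."""
        self._counts[url][category] += 1

    def sense_weight(self, url, category):
        """Clicks on `url` under one category (0 if never clicked)."""
        return self._counts.get(url, {}).get(category, 0)

    def total_weight(self, url):
        """Clicks on `url` summed over all categories."""
        return sum(self._counts.get(url, {}).values())

    def rank(self, urls, category=None):
        """Order pages by sense-specific weight, or by total if no category."""
        if category is None:
            key = self.total_weight
        else:
            def key(url):
                return self.sense_weight(url, category)
        # sorted() is stable, so ties keep the engine's original order.
        return sorted(urls, key=key, reverse=True)


if __name__ == "__main__":
    weights = SenseSpecificWeights()
    lang, cafe = "http://example.com/language", "http://example.com/coffee"
    for _ in range(5):
        weights.record_click(lang, "Communication")
    weights.record_click(lang, "Food")
    for _ in range(3):
        weights.record_click(cafe, "Food")
    print(weights.rank([lang, cafe]))          # by total click count
    print(weights.rank([lang, cafe], "Food"))  # by sense-specific count
```

In this toy run the total count favors the programming-language page, while the {Food} count favors the coffee page, which is the contrast between total and sense-specific weighting.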

[Figure: The Experimental System]

Experiments
Data set
– For each query word, 200 Web pages were collected from AltaVista.
To obtain the collaborative weighting
– 200 Computer Engineering undergraduate students
Evaluation
– Compare the number of relevant pages in the top 30
– Experimental system vs. AltaVista
– Total click count weighting vs. sense-specific weighting
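
The evaluation measure can be sketched as below, assuming that "accuracy" means the fraction of the top 30 results that are relevant to the intended sense. The slides state only that the number of relevant pages in the top 30 is compared, so the division by k and the function names `relevant_in_top_k` and `accuracy_at_k` are assumptions, and the page identifiers in the example are made up.

```python
def relevant_in_top_k(ranked_urls, relevant, k=30):
    """Count how many of the first k ranked pages are in the relevant set."""
    return sum(1 for url in ranked_urls[:k] if url in relevant)


def accuracy_at_k(ranked_urls, relevant, k=30):
    """Fraction of the first k positions filled by relevant pages."""
    return relevant_in_top_k(ranked_urls, relevant, k) / k


if __name__ == "__main__":
    # Hypothetical ranking of 200 collected pages and a judged relevant set.
    ranked = [f"page{i}" for i in range(200)]
    relevant = {"page0", "page3", "page29", "page30", "page150"}
    print(relevant_in_top_k(ranked, relevant))
    print(accuracy_at_k(ranked, relevant))
```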

The query words used for the experiments

Word        Synset Si                 Top-level category Ck
Java        {coffee, java}            {Food}
            {Java}                    {Location}
            {Java}                    {Communication}
Character   {character, role,…}       {Action}
            {character, symbol,…}     {Communication}
Custom      {custom, import,…}        {Possession}
            {custom, tradition,…}     {Cognition}
Horse       {horse, heroin,…}         {Artifact}
            {horse, equus…}           {Animal}

Test results of 9 queries
Number of important pages among top-30

Query                      Experimental System     Alta Vista
Word        Meaning        (SW/SWT)                MW        SW
Java        Island         9         5             4         0
            Coffee         13        7             8         0
            Language
Character   Role           12        9             6         1
            Symbol
Custom      Trade          8         5             4         0
            Tradition      12        7             6         4
Horse       Drug           11        8             6         0
            Animal
Average accuracy           49.6%     39.3%         34.4%     23.7%
Improvement to
Alta Vista MW              +15.2%    +4.9%         -         -

Conclusion
– An interface that uses WordNet to resolve the ambiguity of the search query is presented.
– Sense-specific collaborative evaluation is proposed for ranking Web pages.
– The method improves the performance of a Web search engine.
