Measuring Semantic Similarity between Words Using Web Search Engines WWW 07.



Abstract
Semantic web-related applications
–Community chain mining
–Relation extraction
–Automatic metadata extraction
–Entity disambiguation results
The proposed method combines four page-count-based similarity scores with lexico-syntactic patterns automatically extracted from text snippets.

Introduction – 1/2
Page counts and snippets are two useful information sources provided by most Web search engines.
Some problems
–Page-count analyses ignore the position of a word in a page: even when two words appear in the same page, they might not be related
–Polysemous words (words with multiple senses): "apple" as a fruit vs. "apple" as a company

Introduction – 2/2
Lexico-syntactic patterns
–various semantic relations: "also known as", "is a", "part of", "is an example of"

Method (Page-count-based)
Page-count-based similarity scores (co-occurrence measures)
C = 5
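The formulas on this slide did not survive in the transcript. As a rough sketch, the co-occurrence measures named in the Experiments slide (WebJaccard, WebOverlap, WebPMI) can be written in their usual textbook form over page counts. The fourth score (a Dice coefficient), the reading of C = 5 as a cut-off below which a score is set to zero, and the index size `n` are assumptions made for illustration, not details confirmed by the slides.

```python
import math

# h_p, h_q: page counts for each word alone; h_pq: page count for both together.
# c: assumed cut-off; co-occurrence counts at or below it are treated as noise.

def web_jaccard(h_p, h_q, h_pq, c=5):
    if h_pq <= c:
        return 0.0
    return h_pq / (h_p + h_q - h_pq)

def web_overlap(h_p, h_q, h_pq, c=5):
    if h_pq <= c:
        return 0.0
    return h_pq / min(h_p, h_q)

def web_dice(h_p, h_q, h_pq, c=5):
    if h_pq <= c:
        return 0.0
    return 2 * h_pq / (h_p + h_q)

def web_pmi(h_p, h_q, h_pq, n=10**10, c=5):
    # n is an assumed number of indexed pages (hypothetical value).
    if h_pq <= c:
        return 0.0
    # log2( P(p,q) / (P(p) * P(q)) ) with probabilities estimated as count / n
    return math.log2(h_pq * n / (h_p * h_q))
```

For example, with hypothetical counts `h_p=100`, `h_q=200`, `h_pq=50`, the Jaccard score is 50 / 250 = 0.2 and the overlap score is 50 / 100 = 0.5.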

Method (Lexico-Syntactic Patterns) – 1/4
Extracting lexico-syntactic patterns from snippets
–is a (X is a Y)
–and (X and Y)

Method (Lexico-Syntactic Patterns) – 2/4
Given a set S of synonymous word pairs
–n-grams: n = 2, 3, 4, and 5
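One way to read the two slides above: the query words in a snippet are replaced by the placeholders X and Y, and every n-gram (n = 2 to 5) that contains both placeholders becomes a candidate pattern. The sketch below follows that reading. The function name `extract_patterns`, the simple word tokenizer, and the "exactly one X and one Y" filter are assumptions made for illustration.

```python
import re

def extract_patterns(snippet, word_x, word_y, n_values=(2, 3, 4, 5)):
    """Return the set of n-gram patterns linking word_x and word_y in a snippet."""
    # Replace the two query words with the placeholders X and Y.
    text = re.sub(rf"\b{re.escape(word_x)}\b", "X", snippet, flags=re.IGNORECASE)
    text = re.sub(rf"\b{re.escape(word_y)}\b", "Y", text, flags=re.IGNORECASE)
    # Simplistic tokenizer (assumption): keep runs of word characters only.
    tokens = re.findall(r"\w+", text)
    patterns = set()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Keep only n-grams that mention each placeholder exactly once.
            if gram.count("X") == 1 and gram.count("Y") == 1:
                patterns.add(" ".join(gram))
    return patterns
```

For the snippet "Cricket is a sport played between two teams" with the pair (cricket, sport), this yields the patterns "X is a Y" and "X is a Y played".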

Method (Lexico-Syntactic Patterns) – 3/4
A set S of synonymous word pairs
–5000 word pairs of synonymous nouns from WordNet
–4,562,471 unique patterns
–80% occur fewer than 10 times
A set of non-synonymous word pairs
–5000 word pairs of non-synonymous nouns from WordNet

Method (Lexico-Syntactic Patterns) – 4/4

Integrating Patterns and Page Counts

Experiments
WebOverlap (rank=18, weight=2.45)
Web-Jaccard (rank=66, weight=0.618)
WebPMI (rank=138, weight=0.0001)

Benchmark Dataset
–Miller-Charles 28 word pairs (a subset of the Rubenstein-Goodenough dataset)

Experiments

Semantic Similarity

Taxonomy-Based Methods

Community Mining
50 personal names from 5 communities:
–tennis players, golfers, actors, politicians, scientists
–10 names from each community
–B-CUBED
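The slide only names B-CUBED as the evaluation measure. As a minimal sketch of the standard definition: for each name, precision is the fraction of its cluster that shares its true community, and recall is the fraction of its true community found in its cluster. Both are averaged over all names. The function name `b_cubed` and the dictionary-based input format are assumptions, and how the authors applied the measure is not shown in the transcript.

```python
from collections import Counter

def b_cubed(clusters, classes):
    """clusters: item -> predicted cluster id; classes: item -> gold community."""
    items = list(clusters)
    cluster_sizes = Counter(clusters[i] for i in items)
    class_sizes = Counter(classes[i] for i in items)
    # Number of items sharing both a given cluster and a given gold community.
    joint = Counter((clusters[i], classes[i]) for i in items)

    precision = sum(
        joint[(clusters[i], classes[i])] / cluster_sizes[clusters[i]] for i in items
    ) / len(items)
    recall = sum(
        joint[(clusters[i], classes[i])] / class_sizes[classes[i]] for i in items
    ) / len(items)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

With four hypothetical names, where cluster 0 holds two tennis players and cluster 1 holds one tennis player and one golfer, precision is 0.75 and recall is 2/3.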

Conclusion
Semantic web-related applications
–Community chain mining
–Relation extraction
–Automatic metadata extraction
–Entity disambiguation results
The proposed method combines four page-count-based similarity scores with lexico-syntactic patterns automatically extracted from text snippets.