A Knowledge-Based Search Engine Powered by Wikipedia David Milne, Ian H. Witten, David M. Nichols (CIKM 2007)

Slides:



Advertisements
Similar presentations
ELibrary Topic Search Basics eLibrary topic search allows users to locate articles and multimedia resources –Relevant to K-12 curricula and user.
Advertisements

Introduction to Information Retrieval
Crawling, Ranking and Indexing. Organizing the Web The Web is big. Really big. –Over 3 billion pages, just in the indexable Web The Web is dynamic Problems:
A UTOMATICALLY A CQUIRING A S EMANTIC N ETWORK OF R ELATED C ONCEPTS Date: 2011/11/14 Source: Sean Szumlanski et. al (CIKM’10) Advisor: Jia-ling, Koh Speaker:
Scott Wen-tau Yih (Microsoft Research) Joint work with Vahed Qazvinian (University of Michigan)
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
Introduction Information Management systems are designed to retrieve information efficiently. Such systems typically provide an interface in which users.
PROBLEM BEING ATTEMPTED Privacy -Enhancing Personalized Web Search Based on:  User's Existing Private Data Browsing History s Recent Documents 
1 CS 430 / INFO 430 Information Retrieval Lecture 8 Query Refinement: Relevance Feedback Information Filtering.
6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
Chapter 19: Information Retrieval
Using Information Content to Evaluate Semantic Similarity in a Taxonomy Presenter: Cosmin Adrian Bejan Philip Resnik Sun Microsystems Laboratories.
Information Retrieval
Named Entity Disambiguation Based on Explicit Semantics Martin Jačala and Jozef Tvarožek Špindlerův Mlýn, Czech Republic January 23, 2012 Slovak University.
Chapter 5: Information Retrieval and Web Search
COMP423: Intelligent Agent Text Representation. Menu – Bag of words – Phrase – Semantics – Bag of concepts – Semantic distance between two words.
Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Institute for System Programming of RAS.
C OLLECTIVE ANNOTATION OF WIKIPEDIA ENTITIES IN WEB TEXT - Presented by Avinash S Bharadwaj ( )
Exploiting Wikipedia as External Knowledge for Document Clustering Sakyasingha Dasgupta, Pradeep Ghosh Data Mining and Exploration-Presentation School.
1 Wikification CSE 6339 (Section 002) Abhijit Tendulkar.
A Framework for Examning Topical Locality in Object- Oriented Software 2012 IEEE International Conference on Computer Software and Applications p
1 Chapter 19: Information Retrieval Chapter 19: Information Retrieval Relevance Ranking Using Terms Relevance Using Hyperlinks Synonyms., Homonyms,
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
Query Expansion By: Sean McGettrick. What is Query Expansion? Query Expansion is the term given when a search engine adding search terms to a user’s weighted.
© Paul Buitelaar – November 2007, Busan, South-Korea Evaluating Ontology Search Towards Benchmarking in Ontology Search Paul Buitelaar, Thomas.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Péter Schönhofen – Ad Hoc Hungarian → English – CLEF Workshop 20 Sep 2007 Performing Cross-Language Retrieval with Wikipedia Participation report for Ad.
SYMPOSIUM ON SEMANTICS IN SYSTEMS FOR TEXT PROCESSING September 22-24, Venice, Italy Combining Knowledge-based Methods and Supervised Learning for.
Learning to Link with Wikipedia David Milne and Ian H. Witten Department of Computer Science, University of Waikato CIKM 2008 (Best Paper Award) Presented.
Math Information Retrieval Zhao Jin. Zhao Jin. Math Information Retrieval Examples: –Looking for formulas –Collect teaching resources –Keeping updated.
Chapter 6: Information Retrieval and Web Search
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
1 Automatic Classification of Bookmarked Web Pages Chris Staff Second Talk February 2007.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
CONCLUSION & FUTURE WORK Normally, users perform search tasks using multiple applications in concert: a search engine interface presents lists of potentially.
Comparing and Ranking Documents Once our search engine has retrieved a set of documents, we may want to Rank them by relevance –Which are the best fit.
Publication Spider Wang Xuan 07/14/2006. What is publication spider Gathering publication pages Using focused crawling With the help of Search Engine.
Searching Tutorial By: Lola L. Introduction:  When you are using a topic, you might want to use “keyword topics.” Using this might help you find better.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Retrieval of Highly Related Biomedical References by Key Passages of Citations Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan.
Query Expansion By: Sean McGettrick. What is Query Expansion? Query Expansion is the term given when a search engine adding search terms to a user’s weighted.
Evgeniy Gabrilovich and Shaul Markovitch
1 CSC 594 Topics in AI – Text Mining and Analytics Fall 2015/16 3. Word Association.
Intelligent Database Systems Lab Presenter : Chang,Chun-Chih Authors : David Milne *, Ian H. Witten 2012, AI An open-source toolkit for mining Wikipedia.
Three basic areas for consideration: 1.Searching, reading and critically evaluating your literature. 2.Managing your literature – organizing and documenting.
Introduction to Data Mining by Yen-Hsien Lee Department of Information Management College of Management National Sun Yat-Sen University March 4, 2003.
Acceso a la información mediante exploración de sintagmas Anselmo Peñas, Julio Gonzalo y Felisa Verdejo Dpto. Lenguajes y Sistemas Informáticos UNED III.
A System for Automatic Personalized Tracking of Scientific Literature on the Web Tzachi Perlstein Yael Nir.
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Query Refinement and Relevance Feedback.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Navigation Aided Retrieval Shashank Pandit & Christopher Olston Carnegie Mellon & Yahoo.
COMP423: Intelligent Agent Text Representation. Menu – Bag of words – Phrase – Semantics Semantic distance between two words.
SEMINAR ON INTERNET SEARCHING PRESENTED BY:- AVIPSA PUROHIT REGD NO GUIDED BY:- Lect. ANANYA MISHRA.
WEB STRUCTURE MINING SUBMITTED BY: BLESSY JOHN R7A ROLL NO:18.
2016/9/301 Exploiting Wikipedia as External Knowledge for Document Clustering Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou Proceeding.
Best pTree organization? level-1 gives te, tf (term level)
Exploiting Wikipedia as External Knowledge for Document Clustering
Guangbing Yang Presentation for Xerox Docushare Symposium in 2011
Personalized Social Image Recommendation
Multimedia Information Retrieval
Information Retrieval
Extracting Semantic Concept Relations
Searching with context
Citation-based Extraction of Core Contents from Biomedical Articles
Information Retrieval and Web Design
Presentation transcript:

A Knowledge-Based Search Engine Powered by Wikipedia David Milne, Ian H. Witten, David M. Nichols (CIKM 2007)

 Whenever we seek out new knowledge — how can one describe the unknown?  What knowledge seekers need is a bridge between what they know and what they wish to know.  Knowledge seekers can benefit from a thesaurus that covers the terminology of both documents and potential queries, and describes relations that bridge between them. Introduction

KORU

Creating A Relevant Knowledge Base  To work well, Koru relies on a large and comprehensive thesaurus.  Our own technique automatically extracts thesauri from a huge manually defined information structure.  From Wikipedia, we derive a thesaurus that is specific to each particular document collection.

Creating A Relevant Knowledge Base  The basic idea is to use Wikipedia’s articles as building blocks for the thesaurus.  Each article describes a single concept.  Concepts are often referred to by multiple terms and Wikipedia handles these using “redirects”.

Measuring semantic relatedness  Semantic relatedness concerns the strength of the relations between concepts.  The measure that we use quantifies the strength of the relation between two Wikipedia articles by weighting and comparing the links found within them.

 Links are weighted by their probability of occurrence.  Links are less significant for judging the similarity between articles if many other articles also link to the same target.  We simply sum the weights of the links that are common to both articles. Measuring semantic relatedness

Disambiguating unrestricted text  To identify the concepts relevant to a particular document collection, we work through each document in turn, identifying the significant terms and matching them to individual Wikipedia articles.  Disambiguate each term using the context surrounding it.

Identifying relations between concepts  Wikipedia contains many more links than the redirects we use to identify synonymy.  We gather all relations from article and category links, but weight them so that only the strongest are emphasized.

Weighing topics, occurrences and relations  Every occurrence of every topic is weighted within the thesaurus.  1. tf-idf : A significant topic for a document should occur many times.  2. average semantic relatedness measure : A significant topic should relate strongly to other topics in the document.

KORU in action

 2 種版本: Topic browsing :有使用 thesaurus Keyword Searching :沒有使用 thesaurus  12 名參加者、 10 個 task Evaluation