Linking Wikipedia to the Web. Antonio Flores Bernal, Department of Computer Sciences, San Pablo Catholic University, 2010.

Presentation transcript:

Linking Wikipedia to the Web. Antonio Flores Bernal, Department of Computer Sciences, San Pablo Catholic University, 2010.

Authors: Rianne Kaptein, Pavel Serdyukov, Jaap Kamps

Linking Wikipedia to the Web. Outline: Introduction; External Link Detection; Conclusion.

Introduction. Wikipedia is a natural starting point for information on almost any topic. Where do we go if we want more information? Only 45% of all Wikipedia pages have an “External Links” section.

Can we automatically find external links for Wikipedia pages? The INEX Link-the-Wiki task consists of finding links between Wikipedia pages. The task here is to find links from Wikipedia pages to external web pages.

Clueweb category B. 2009 TREC entity ranking task.

External Link Detection. Task and test collection: given a topic (a Wikipedia page), return the external Web pages. A topic set was created. The URLs of the external links are matched with the URLs in the Clueweb collection.

External Link Detection. External links on entity pages are split into two parts: a home page, and informational pages.

Link Detection Approaches. There are three approaches. The baseline approach is a language model with a full-text index. The second approach uses an anchor text index, which has proved to work well for home page finding. The third approach exploits information from Delicious.
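
To make the language-model baseline concrete, here is a minimal sketch of query-likelihood scoring with Dirichlet smoothing (the smoothing named later in the slides). It is not the Indri implementation: the function name, the toy token lists, and the value of `mu` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2500.0):
    """Log query likelihood of one document under Dirichlet smoothing.

    Hypothetical helper for illustration; `mu` is an assumed value,
    not a setting reported in the slides.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Background probability of the term in the whole collection.
        p_coll = collection_tf.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term unseen in the collection: skipped in this sketch
        # Smoothed estimate: document counts mixed with the collection model.
        p = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p)
    return score


# Toy usage: the document that contains the query term scores higher.
docs = {
    "a": ["wikipedia", "home", "page"],
    "b": ["other", "text", "page"],
}
collection_tf = Counter(t for terms in docs.values() for t in terms)
collection_len = sum(collection_tf.values())
scores = {
    name: dirichlet_score(["wikipedia"], terms, collection_tf, collection_len)
    for name, terms in docs.items()
}
```

The same scoring function could be applied to a full-text index or to an anchor text index; only the token lists passed as `doc_terms` would differ.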

A search request was sent to Delicious, and the first 250 results were matched with the URLs in the Clueweb collection to create a ranking. Tools: Indri toolkit, Krovetz stemmer, and Dirichlet document smoothing.
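
The matching step can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `normalize_url` and `rank_from_results` are hypothetical names, the normalization rules (lowercase host, drop a leading "www.", drop a trailing slash) are assumptions, and no real Delicious API call is shown. The input is simply a list of result URLs in the order returned, assumed to be absolute URLs.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to a comparable key (assumed rules, for illustration)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def rank_from_results(result_urls, collection_urls, max_results=250):
    """Keep, in order, the results whose URL also occurs in the collection."""
    known = {normalize_url(u): u for u in collection_urls}
    ranking, seen = [], set()
    for url in result_urls[:max_results]:
        key = normalize_url(url)
        # Skip URLs outside the collection and results already ranked.
        if key in known and key not in seen:
            seen.add(key)
            ranking.append(known[key])
    return ranking


# Toy usage with made-up URLs.
collection = ["http://example.org/", "http://www.test.com/about"]
results = [
    "http://nomatch.net",
    "http://test.com/about/",
    "http://www.example.org",
    "http://EXAMPLE.org/",
]
ranking = rank_from_results(results, collection)
```

Dropping repeated keys is a design choice of this sketch, so that the same page cannot occupy several ranks.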

Mean Reciprocal Rank (MRR). Success at 5.
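
Both measures can be computed from a ranking per topic and a set of relevant URLs per topic. The sketch below uses the standard definitions; the function names and the toy data are assumptions for illustration, and topics with no relevant result contribute zero.

```python
def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, url in enumerate(ranking, start=1):
        if url in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs, judgments):
    """Average reciprocal rank over all judged topics."""
    total = sum(reciprocal_rank(runs.get(topic, []), relevant)
                for topic, relevant in judgments.items())
    return total / len(judgments)


def success_at_k(runs, judgments, k=5):
    """Fraction of topics with at least one relevant item in the top k."""
    hits = sum(
        1 for topic, relevant in judgments.items()
        if any(url in relevant for url in runs.get(topic, [])[:k])
    )
    return hits / len(judgments)


# Toy usage: three topics with made-up rankings and judgments.
runs = {"t1": ["a", "b"], "t2": ["x", "y", "z", "w", "v", "r"], "t3": []}
judgments = {"t1": {"b"}, "t2": {"r"}, "t3": {"q"}}
mrr = mean_reciprocal_rank(runs, judgments)
s5 = success_at_k(runs, judgments, k=5)
```

In the toy data, topic t2 has its relevant page at rank 6, so it adds to MRR but does not count as a success at 5.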

Link Detection Results. The anchor text index leads to much better results than the full-text index. Modern home pages contain less relevant text.

Three causes for not finding a relevant page: the external link in Wikipedia isn't a home page; the home page is redirected; the Wikipedia title contains ambiguous words.

Using Delicious. It does not return results for all topics. Long queries don't return any results. The results contain duplicate pages.

Conclusion. The anchor text index is a very effective method to retrieve home pages. Using Delicious on its own does not lead to very good results, but it does contain valuable information. This kind of system is effective at predicting the external links for Wikipedia pages.
