Linking Wikipedia to the Web Antonio Flores Bernal Department of Computer Sciences San Pablo Catholic University 2010.

1 Linking Wikipedia to the Web Antonio Flores Bernal antoniofb3@gmail.com Department of Computer Sciences San Pablo Catholic University 2010

2 Authors: Rianne Kaptein (kaptein@uva.nl), Pavel Serdyukov (p.serdyukov@tudelf.nl), Jaap Kamps (kamps@uva.nl)

3 Linking Wikipedia to the Web. Outline: Introduction; External Link Detection; Conclusion. Antonio Flores Bernal, Linking Wikipedia to the Web

4 Introduction. Wikipedia is a natural starting point for information on almost any topic. But where do we go if we want more information? Only 45% of all Wikipedia pages have an “External Links” section.

5 Can we automatically find external links for Wikipedia pages? The INEX Link-the-Wiki task consists of finding links between Wikipedia pages. Here, the task is to find links from Wikipedia pages to external web pages.

6 ClueWeb Category B; 2009 TREC entity ranking task

7 External Link Detection. Task and test collection: given a topic (a Wikipedia page), return the relevant external web pages. A topic set is created, and the URLs of the external links are matched with the URLs in the ClueWeb collection.
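The URL matching step above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the normalization rules (dropping the scheme, fragment, a leading "www." and trailing slashes) and the choice to keep only topics with at least one match are assumptions, and `normalize_url` and `build_qrels` are hypothetical names.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization: drop the scheme and fragment, lowercase the
    host, strip a leading 'www.' and any trailing slash on the path."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def build_qrels(external_links, collection_urls):
    """Map each topic (a Wikipedia page) to the set of collection URLs that
    match one of its external links after normalization.

    Assumption: topics without any matching URL are left out of the topic set.
    """
    by_normal_form = {normalize_url(u): u for u in collection_urls}
    qrels = {}
    for topic, links in external_links.items():
        matched = {
            by_normal_form[n]
            for n in (normalize_url(link) for link in links)
            if n in by_normal_form
        }
        if matched:
            qrels[topic] = matched
    return qrels


links = {"Example": ["http://www.example.com/", "http://other.org/page#history"]}
print(build_qrels(links, {"http://example.com", "http://unrelated.net/"}))
# prints {'Example': {'http://example.com'}}
```

Real link data would need more care (redirects, session parameters, mirrors); the point here is only that relevance judgments come from the intersection of a page's external links with the collection's URLs.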

8 External Link Detection. External links on entity pages fall into two types: a home page and informational pages.

9 Link Detection Approaches. There are three approaches: the baseline is a language model with a full-text index; the second uses an anchor text index, which has proved to work well for home page finding; the third exploits information from Delicious.
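To illustrate the second approach, here is a toy sketch of an anchor text index, in which each target page is represented by the anchor texts of the links pointing to it. The function names, ignoring self-links, and the plain term-count scoring are assumptions made for illustration; an actual system would rank these anchor text representations with a retrieval model such as the baseline's language model rather than with raw counts.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """links: iterable of (source_url, target_url, anchor_text) triples.

    Returns target_url -> Counter of lowercased anchor-text terms gathered
    from all links pointing to that page.
    """
    index = defaultdict(Counter)
    for source, target, anchor in links:
        if source == target:
            continue  # assumption: a page's links to itself are ignored
        index[target].update(anchor.lower().split())
    return dict(index)


def rank_by_anchor_text(index, query):
    """Toy ranking: score each page by how often the query terms occur in
    its aggregated anchor text; ties are broken alphabetically by URL."""
    terms = query.lower().split()
    scores = {url: sum(counts[t] for t in terms) for url, counts in index.items()}
    return sorted((url for url, s in scores.items() if s > 0),
                  key=lambda url: (-scores[url], url))


links = [
    ("a.org", "ibm.com", "IBM"),
    ("b.org", "ibm.com", "IBM homepage"),
    ("c.org", "news.org/ibm", "IBM news"),
    ("ibm.com", "ibm.com", "IBM"),
]
index = build_anchor_index(links)
print(rank_by_anchor_text(index, "IBM"))  # ['ibm.com', 'news.org/ibm']
```

The example hints at why anchor text suits home page finding: many pages refer to a home page by the entity's name, so that name accumulates in the home page's anchor text even when the page itself contains little text.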

10 A search request is sent to Delicious, and the first 250 results are matched with the URLs in the ClueWeb collection to create a ranking. Setup: Indri toolkit, Krovetz stemmer, and Dirichlet document smoothing.
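Dirichlet smoothing, as listed in the setup above, is commonly used with query likelihood scoring: log P(q|d) = Σ_{t∈q} log((tf(t,d) + μ·P(t|C)) / (|d| + μ)), where P(t|C) is the term's relative frequency in the collection. The Python sketch below is not Indri's implementation. It assumes terms are already tokenized and stemmed (Krovetz stemming is not reproduced), uses μ = 2500 only as an illustrative value since the slides do not give one, and skips query terms unseen in the collection.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, doc_terms, collection_counts, collection_length, mu=2500.0):
    """Query-likelihood score log P(q|d) with Dirichlet smoothing.

    query_terms, doc_terms: lists of already tokenized (and stemmed) terms.
    collection_counts: term -> frequency in the whole collection.
    collection_length: total number of term occurrences in the collection.
    mu: smoothing parameter (illustrative default, not taken from the slides).
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_collection = collection_counts.get(term, 0) / collection_length
        if p_collection == 0.0:
            continue  # unseen in the collection: skipped in this sketch
        score += math.log((doc_counts[term] + mu * p_collection) / (doc_len + mu))
    return score


docs = {
    "d1": ["wikipedia", "external", "links"],
    "d2": ["cooking", "recipes", "links"],
}
collection = Counter(t for terms in docs.values() for t in terms)
total = sum(collection.values())
query = ["wikipedia", "links"]
ranking = sorted(docs, key=lambda d: dirichlet_score(query, docs[d], collection, total), reverse=True)
print(ranking)  # ['d1', 'd2']
```

A larger μ pulls every document's term distribution more strongly toward the collection model, which matters most for short documents.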

11 Evaluation measures: Mean Reciprocal Rank (MRR) and Success at 5
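Both measures can be computed as in the sketch below. Reciprocal rank is 1/r for the rank r of the first relevant page (0 if none is retrieved), and MRR averages it over topics. Success at 5 is 1 if a relevant page appears in the top 5 and 0 otherwise, also averaged over topics. The function names and the convention that a topic with no returned results scores 0 are assumptions of this sketch.

```python
def reciprocal_rank(ranking, relevant):
    """1/r for the rank r of the first relevant result, 0.0 if none is found."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def success_at_k(ranking, relevant, k=5):
    """1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranking[:k]) else 0.0


def evaluate(runs, qrels, k=5):
    """Mean reciprocal rank and Success at k over all judged topics.

    runs: topic -> ranked list of URLs; qrels: topic -> set of relevant URLs.
    A topic missing from runs counts as 0 for both measures.
    """
    topics = list(qrels)
    mrr = sum(reciprocal_rank(runs.get(t, []), qrels[t]) for t in topics) / len(topics)
    success = sum(success_at_k(runs.get(t, []), qrels[t], k) for t in topics) / len(topics)
    return mrr, success


qrels = {"t1": {"a"}, "t2": {"z"}}
runs = {"t1": ["x", "a", "y"], "t2": ["b", "c", "d", "e", "f", "z"]}
# Mathematically: MRR = (1/2 + 1/6) / 2 = 1/3 and Success at 5 = 0.5,
# because the relevant page for t2 only appears at rank 6.
print(evaluate(runs, qrels))
```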

12 Link Detection Results. The anchor text index leads to much better results than the full-text index. Modern home pages contain less relevant text.

13 Three causes for not finding a relevant page: the external link in Wikipedia is not a home page; the home page is redirected; the Wikipedia title contains ambiguous words.

14 Using Delicious. Problems: it does not return results for all topics; long queries do not return any results; it returns duplicate pages.

15 Conclusion. The anchor text index is a very effective method for retrieving home pages. Using Delicious on its own does not lead to very good results, but Delicious does contain valuable information. This kind of system is effective at predicting the external links for Wikipedia pages.

16 Linking Wikipedia to the Web Antonio Flores Bernal antoniofb3@gmail.com Department of Computer Sciences San Pablo Catholic University 2010

