DBpedia: A Nucleus for a Web of Open Data


DBpedia: A Nucleus for a Web of Open Data Original presentation by Christian Bizer, Freie Universität Berlin Sören Auer , Universität Leipzig Georgi Kobilarov, Freie Universität Berlin Jens Lehmann, Universität Leipzig Richard Cyganiak, Freie Universität Berlin Edited by Sangkeun Lee

DBpedia.org is an effort to:
- extract structured information from Wikipedia
- make this information available on the Web under an open license
- interlink the DBpedia dataset with other datasets on the Web

Outline:
1. Extracting Structured Information from Wikipedia
2. The DBpedia Dataset
3. Accessing the DBpedia Dataset over the Web
4. Use Cases:
   - Improving Wikipedia Search
   - Royalty-Free Data Source for other Applications
   - Nucleus for the Emerging Web of Data

Elements of a Wikipedia article page: Title, Abstract, Infoboxes, Geo-coordinates, Categories, Images, Links (other languages, other wiki pages, to the web), Redirects, Disambiguations

Extracting Structured Information from Wikipedia
- Wikipedia consists of
  - 6.9 million articles
  - in 251 languages
  - monthly growth rate: 4%
- Wikipedia articles contain structured information
  - infoboxes, which use a template mechanism
  - images depicting the article's topic
  - categorization of the article
  - links to external webpages
  - intra-wiki links to other articles
  - inter-language links to articles about the same topic in different languages

Overview of the DBpedia component

[Figure: DBpedia architecture. Wikipedia dumps (article texts, DB tables) pass through extraction into the DBpedia datasets (articles, infobox, categories). The datasets are loaded into Virtuoso and MySQL and published via a SPARQL endpoint and Linked Data, which serve traditional Web browsers, Web 2.0 mashups, Semantic Web browsers, the SNORQL browser, and the query builder.]

Wikitext Syntax:

Extracting Infobox Data (RDF Representation):
http://en.wikipedia.org/wiki/Calgary
http://dbpedia.org/resource/Calgary
    dbpedia:native_name "Calgary" ;
    dbpedia:altitude "1048" ;
    dbpedia:population_city "988193" ;
    dbpedia:population_metro "1079310" ;
    mayor_name dbpedia:Dave_Bronconnier ;
    governing_body dbpedia:Calgary_City_Council ;
    ...
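The infobox-to-RDF mapping shown for Calgary can be sketched in a few lines of Python. This is a minimal illustration, not the DBpedia extraction framework: the sample wikitext, the helper name `infobox_to_triples`, the property namespace, and the rule that a `[[wiki link]]` value becomes a resource URI are all assumptions made for the example.

```python
import re

RESOURCE = "http://dbpedia.org/resource/"
PROPERTY = "http://dbpedia.org/property/"  # assumed property namespace

# Hypothetical, heavily simplified infobox wikitext (illustration only).
SAMPLE = """{{Infobox City
| native_name = Calgary
| altitude = 1048
| mayor_name = [[Dave Bronconnier]]
}}"""

def infobox_to_triples(page_title, wikitext):
    """Turn '| key = value' infobox lines into (subject, predicate, object) triples."""
    subject = RESOURCE + page_title.replace(" ", "_")
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"\s*\|\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if not match:
            continue  # skip the template's opening and closing lines
        key, value = match.groups()
        link = re.fullmatch(r"\[\[([^\]|]+)\]\]", value)
        if link:
            # An intra-wiki link becomes a link to another resource.
            obj = RESOURCE + link.group(1).replace(" ", "_")
        else:
            obj = value  # everything else is kept as a plain literal
        triples.append((subject, PROPERTY + key, obj))
    return triples

for triple in infobox_to_triples("Calgary", SAMPLE):
    print(triple)
```

Real infoboxes contain nested templates, units, and multi-valued fields, which this sketch deliberately ignores.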

Question: How good is the extraction from the markup in Wiki pages?

- Short and long abstracts in 10 different languages
    dbpedia:Calgary
        dbpedia:abstract "Calgary is the largest ..."@en ;
        dbpedia:abstract "Calgary ist eine Stadt ..."@de .
- Categorization information
        skos:subject dbpedia:Category_Cities_in_Alberta ;
        skos:subject dbpedia:Host_cities_Olympic_Games .
- Links to the original Wikipedia articles, pictures and relevant external web pages
        foaf:page <http://en.wikipedia.org/wiki/Calgary> ;
        dbpedia:wikipage-de <http://de.wikipedia.org/wiki/Calgary> ;
        foaf:depiction <http://upload.wikimedia.org/thumb/3/32> ;
        dbpedia:reference <http://www.calgary.ca> ;
        dbpedia:reference <http://www.tourismcalgary.com> .

DBpedia Basics: Structured information can be extracted from Wikipedia and can serve as a basis for sophisticated queries against Wikipedia content. The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing the extracted information and for publishing it on the Web, and it uses the SPARQL query language to query this data. The Developers Guide to Semantic Web Toolkits helps you find a development toolkit in your preferred programming language for processing DBpedia data.

The DBpedia Dataset
- 1,600,000 concepts, including
  - 58,000 persons
  - 70,000 places
  - 35,000 music albums
  - 12,000 films
- described by 91 million triples
- using 8,141 different properties
- 557,000 links to pictures
- 1,300,000 links to external web pages
- 207,000 Wikipedia categories
- 75,000 YAGO categories

Accessing the DBpedia Dataset over the Web
1. SPARQL Endpoint
2. Linked Data Interface
3. DB Dumps for Download

SPARQL : SPARQL is a query language for RDF. RDF is a directed, labeled graph data format for representing information in the Web. This specification defines the syntax and semantics of the SPARQL query language for RDF. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.
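To make the "directed, labeled graph" idea concrete, the sketch below stores a few triples as labeled edges and matches a single triple pattern against them, which is the basic operation a SPARQL engine builds on. It is an illustrative toy, not a SPARQL implementation: the `match` helper is hypothetical, and the prefixed names are plain strings borrowed from the Calgary example.

```python
# Each triple is one labeled edge: subject --predicate--> object.
TRIPLES = [
    ("dbpedia:Calgary", "dbpedia:population_city", "988193"),
    ("dbpedia:Calgary", "dbpedia:mayor_name", "dbpedia:Dave_Bronconnier"),
    ("dbpedia:Calgary", "skos:subject", "dbpedia:Category_Cities_in_Alberta"),
]

def match(pattern, triples):
    """Return one variable-binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable would need two different values
            elif term != value:
                break  # a constant term does not match this triple
        else:
            results.append(binding)
    return results

# "Which properties and values does Calgary have?"
for row in match(("dbpedia:Calgary", "?p", "?o"), TRIPLES):
    print(row)
```

A full query joins several such patterns on their shared variables, as in the endpoint examples that follow.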

The DBpedia SPARQL Endpoint
- http://dbpedia.org/sparql
- hosted on an OpenLink Virtuoso server
- can answer SPARQL queries like
  - Give me all sitcoms that are set in NYC.
  - All tennis players from Moscow?
  - All films by Quentin Tarantino?
  - All German musicians that were born in Berlin in the 19th century?

Interesting Example:
To find everything Bart wrote on the blackboard in season 12 of The Simpsons: the Wikipedia pages for the Simpsons episodes are the identified "things" that we treat as the subjects of our RDF triples. The bottom of the Wikipedia page for the "Tennis the Menace" episode tells us that it is a member of the Wikipedia category "The Simpsons episodes, season 12". The episode's DBpedia page tells us that p:blackboard is the property name for the Wikipedia infobox "Chalkboard" field.

SELECT ?episode ?chalkboard_gag WHERE {
  ?episode skos:subject <http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12> .
  ?episode dbpedia2:blackboard ?chalkboard_gag
}
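A query like the chalkboard example is sent to the endpoint as an ordinary HTTP request. The sketch below only builds the GET URL, using the `query` parameter defined by the SPARQL Protocol, and does not contact the server. The PREFIX declarations (in particular mapping `dbpedia2:` to http://dbpedia.org/property/) and the `build_query_url` helper are assumptions for illustration, and the live endpoint's vocabulary may have changed since these slides were written.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

# Prefix bindings are assumptions; the slides do not show them.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbpedia2: <http://dbpedia.org/property/>
SELECT ?episode ?chalkboard_gag WHERE {
  ?episode skos:subject
    <http://dbpedia.org/resource/Category:The_Simpsons_episodes%2C_season_12> .
  ?episode dbpedia2:blackboard ?chalkboard_gag
}
"""

def build_query_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL via the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = build_query_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY.strip())
```

Passing the resulting URL to `urllib.request.urlopen` would execute the query; the result format is usually chosen with an HTTP `Accept` header.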

The Linked Data Interface: A large body of information and knowledge is often already available in structured form, yet not accessible as such on the Web. Integrating open data provides real value. It saves the time and effort of re-entering data that is already out there, and it leaves the data and its editing where they belong: at the origin. Linked Data on the Web can be accessed using Semantic Web browsers, just as the traditional Web of documents is accessed using HTML browsers. Semantic Web browsers enable users to navigate between different data sources by following RDF links. The same links allow the robots of Semantic Web search engines to crawl the Semantic Web.

The Linked Data Interface
- The project follows the Linked Data principles.
- All concepts are identified using Uniform Resource Identifier (URI) references. A URI is a compact string of characters used to identify or name a resource.
- The Linked Data interface can be used by
  - Semantic Web browsers, like
    - DISCO Hyperdata Browser
    - Tabulator Browser
    - OpenLink RDF Browser
  - Semantic Web crawlers, like
    - Zitgist (Zitgist LLC, USA)
    - SWSE (DERI, Ireland)
    - Swoogle (UMBC, USA)
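Semantic Web browsers and crawlers use this interface by dereferencing a concept's URI over HTTP and asking for RDF instead of HTML through content negotiation. The sketch below prepares such a request without sending it. The helper name `linked_data_request` and the choice of `application/rdf+xml` as the requested media type are assumptions for illustration.

```python
from urllib.request import Request

def linked_data_request(resource_uri, media_type="application/rdf+xml"):
    """Prepare (but do not send) an HTTP GET asking for an RDF description of a URI."""
    return Request(resource_uri, headers={"Accept": media_type}, method="GET")

req = linked_data_request("http://dbpedia.org/resource/Calgary")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the lookup; a Linked Data server typically answers with a redirect to a document describing the resource.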

DBpedia Use Cases
1. Improving Wikipedia Search
2. Royalty-Free Data Source for other Applications
3. Nucleus for the Emerging Web of Data

Improving Wikipedia Search (Various Interfaces)

Query to find all web browser software at http://wikipedia.askw.org:

Improving Wikipedia Search

Royalty-Free Data Source for other Applications
- DBpedia is published under the GNU Free Documentation License
- Example use case: SPARQL-generated tables within webpages

Nucleus for the Emerging Web of Data
- W3C SWEO Linking Open Data Project
- Overall size of the dataset: over 1 billion RDF triples
- Outbound RDF links within DBpedia: 75,000

Proposed Improvements:
- Better data cleansing required
- Improvement in the classification
- Interlink DBpedia with more datasets
- Improvement in the user interfaces
- Performance
- Scalability
- More expressiveness

Discussion
- DBpedia is the first and largest source of structured data on the Internet covering topics of general knowledge.
- DBpedia gains new information when it extracts data from the latest Wikipedia dump, whereas Freebase, in addition to Wikipedia extractions, gains new information through its user base of editors. Which is the better approach?
- Can Freebase or DBpedia be a substitute for Wikipedia?
  - Freebase: a drawback is that we would have two similar things, Wikipedia and Freebase.
  - DBpedia: a drawback is that it extracts its data from dumps.
- How can we interlink Freebase and DBpedia?
- What could be killer applications using DBpedia?
  - If there are some, fine.
  - If there are none, do we really need a large, general structured knowledge base?