Georgi Kobilarov, Chris Bizer, Sören Auer, Jens Lehmann (Freie Universität Berlin, Universität Leipzig)
Querying Wikipedia like a Database
What a Wikipedia article offers: title, description, languages, web links, categorization, domain-specific data, images, infoboxes
Infobox Extraction
dbpedia:Albert_Einstein  p:name         "Albert Einstein"
dbpedia:Albert_Einstein  p:birth_place  dbpedia:Ulm
dbpedia:Albert_Einstein  p:birth_date   "1879-03-14"
Property Synonyms
Structuring Wikipedia's Knowledge
Structuring the actual data, not modeling the world
Bound to Wikipedia templates; parsers handle template values based on rules (property splitting, merging, transformation)
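To make this rule-based parsing concrete, here is a toy sketch in Python. The template snippet, the property names (p:name, p:birth_place, p:birth_date), and the two transformation rules are simplifications for illustration, not the extraction framework's actual code.

```python
import re

# Simplified infobox source, as it might appear in Wikipedia markup.
WIKITEXT = """{{Infobox scientist
| name        = Albert Einstein
| birth_date  = {{birth date|1879|3|14}}
| birth_place = [[Ulm]]
}}"""

def extract_triples(wikitext, subject):
    """Turn 'key = value' template lines into (subject, property, object) triples."""
    triples = []
    for key, value in re.findall(r"\|\s*(\w+)\s*=\s*(.+)", wikitext):
        value = value.strip()
        # Transformation rule: a {{birth date|Y|M|D}} template becomes an ISO date literal.
        m = re.match(r"\{\{birth date\|(\d+)\|(\d+)\|(\d+)\}\}", value)
        if m:
            y, mo, d = m.groups()
            value = f'"{y}-{int(mo):02d}-{int(d):02d}"^^xsd:date'
        # Transformation rule: an internal [[link]] becomes a resource URI, not a string.
        elif value.startswith("[["):
            value = "dbpedia:" + value.strip("[]").replace(" ", "_")
        else:
            value = f'"{value}"'
        triples.append((subject, f"p:{key}", value))
    return triples

for t in extract_triples(WIKITEXT, "dbpedia:Albert_Einstein"):
    print(*t)
```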
DBpedia Ontology: built from scratch; 170 classes, 900 properties
No living things
Class Hierarchy: "Select all TV episodes …"
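With the class hierarchy in place, "all TV episodes" becomes a single query against the SPARQL endpoint (introduced later in this deck), regardless of which episode template an article uses. A minimal sketch with the SPARQLWrapper library; the ontology namespace and the TelevisionEpisode class name are assumptions about the DBpedia ontology:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# One query covers all episode templates, thanks to the
# template-to-class mapping shown on the next slides.
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?episode WHERE { ?episode a dbo:TelevisionEpisode } LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["episode"]["value"])
```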
Template Mapping: Wikipedia templates Television Episode, UK Office Episode, Simpsons Episode, DoctorWhoBox => class TV Episode (Work)
Template Mapping: Infobox Cricketer, Infobox Historic Cricketer, Infobox Recent Cricketer, Infobox Old Cricketer, Infobox Cricketer Biography => class Cricketer (Athlete)
People: Actor, Athlete, Journalist, MusicalArtist, Politician, Scientist, Writer
Places: Airport, City, Country, Island, Mountain, River
Organisations: Band, Company, Educational Institution, Radio Station, Sports Team
Event: Convention, Military Conflict, Music Event, Sport Event
Work: Book, Broadcast, Film, Software, Television
More structured data: categories in SKOS, intra-wiki links, disambiguations, redirects, links to images (and Flickr), links to external web pages
Data about 2.6 million “things”
274 million pieces of information (RDF triples)
Multilingual Abstracts
– English: 2,613,000
– German: 391,000
– French: 383,000
– Dutch: 284,000
– Polish: 256,000
– Italian: 286,000
– Spanish: 226,000
– Japanese: 199,000
– Portuguese: 246,000
– Swedish: 144,000
– Chinese: 101,000
DBpedia as Linked Data Hub
Semantic Web
"My document can point at your document on the Web, but my database can't point at something in your database without writing special purpose code. The Semantic Web aims at fixing that." (Prof. James Hendler)
Web of Documents: web browsers and search engines traverse HTML documents connected by hyperlinks over HTTP.
Web of Data: Linked Data browsers, mashups, and search engines traverse descriptions of things connected by data links over HTTP.
Linked Data principles:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs, so that they can discover more things.
Wikipedia article URI → DBpedia resource URI
HTTP URIs
Information resources: HTTP GET → 200 OK
Real-world resources: HTTP GET → 303 See Other → 200 OK
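This content negotiation is easy to observe. A small sketch with the requests library, fetching the DBpedia resource URI for Albert Einstein with redirects disabled so the 303 stays visible:

```python
import requests

# A real-world thing (Albert Einstein himself, not a document about him).
uri = "http://dbpedia.org/resource/Albert_Einstein"

# Disable automatic redirect handling so the 303 is observable.
resp = requests.get(uri, allow_redirects=False,
                    headers={"Accept": "application/rdf+xml"})
print(resp.status_code)               # expected: 303 (See Other)
print(resp.headers.get("Location"))   # a document describing the thing
```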
Linked Open Data cloud domains: life sciences, publications, online activities, music, geographic data, cross-domain
4.5 billion triples, 180 million data links
Use Cases
1. Data source for web applications
2. Querying Wikipedia like a database
3. Tagging web content with concepts instead of free-text tags
4. Vocabulary and semantic backbone for enterprise linked data integration
DBpedia as Data Source
Embed structured information from Wikipedia into your web applications
Build (mobile) map applications using DBpedia data about places
Display multilingual titles and descriptions in 15 languages
DBpedia Mobile
SPARQL Endpoint
Wikipedia Query
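Putting the two previous slides together: a sketch of "querying Wikipedia like a database" with the SPARQLWrapper library, reusing the raw infobox properties from the extraction example; the exact property names in the live dataset may differ.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Everyone born in Ulm, expressed over the extracted infobox triples.
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX p: <http://dbpedia.org/property/>
    SELECT ?person ?name WHERE {
        ?person p:birth_place <http://dbpedia.org/resource/Ulm> .
        ?person p:name ?name .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["person"]["value"], "-", row["name"]["value"])
```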
Annotating Documents
Use DBpedia concepts to annotate documents instead of free-text tags
Named-entity extraction systems already use DBpedia URIs (OpenCalais, Muddy Boots)
Social bookmarking with DBpedia URIs as tags
"Apple"
Annotating Documents
BBC editors tag news articles with DBpedia concepts
DBpedia Lookup Service
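A sketch of resolving a free-text tag to candidate DBpedia URIs via the Lookup service; the endpoint path, parameters, and response shape shown here are assumptions about the service's keyword-search interface:

```python
import requests

# Map a free-text tag like "Apple" to candidate DBpedia concepts.
# Endpoint path, parameters, and JSON layout are assumptions.
resp = requests.get(
    "http://lookup.dbpedia.org/api/search.asmx/KeywordSearch",
    params={"QueryString": "Apple", "MaxHits": 5},
    headers={"Accept": "application/json"},
)
for result in resp.json().get("results", []):
    print(result["uri"], "-", result["label"])
```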
Linking Enterprise Data
Take the Linking Open Data approach into the enterprise
Linking Enterprise Data
Connect data sets with DBpedia as shared vocabulary
Enable meaningful navigation paths across BBC websites: browsing Madonna-related information across BBC News, BBC Music, BBC Programmes, …
Make use of the rich background information: relate the release of a music album to a news article about the artist
The Future of DBpedia
Improve Information Extraction
Crowd-Source Information Extraction
Crowd-Sourced Extraction: Where's the user benefit?
Data Fusion
Cross-Language Data Fusion
264 Wikipedia editions in different languages
– Italian Wikipedians know more about Italian villages
– The German Wikipedia contains more person infoboxes
Augment the infobox dataset with facts from other Wikipedia editions.
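An illustrative sketch of the fusion step, assuming per-edition extractions are available as property-value maps; the sample facts are made up, and "first edition with a value wins" is just one possible resolution rule:

```python
# Hypothetical extractions of the same Italian village from two editions.
en = {"name": "Positano", "region": "Campania"}
it = {"name": "Positano", "region": "Campania",
      "population": 3900, "province": "Salerno"}  # made-up sample values

def fuse(*editions):
    """Merge infobox facts: the first edition that has a value for a property wins."""
    fused = {}
    for edition in editions:
        for prop, value in edition.items():
            fused.setdefault(prop, value)
    return fused

# English-first fusion, augmented with facts only the Italian edition has.
print(fuse(en, it))
```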
Augment DBpedia with External Data
The Linking Open Data cloud provides more data than Wikipedia:
– EuroStat provides additional statistical information about countries.
– MusicBrainz contains additional information about bands.
– Geonames provides additional information about locations.
Idea: augment DBpedia with additional data from external sources.
Contribute Back to Wikipedia
Opportunity: feed data back to Wikipedia.
Extend the Wikipedia authoring environment with:
– suggestions for infobox values
– cross-language consistency checking for infoboxes
Currently underway: new maps in Wikipedia based on DBpedia Mobile code (OpenStreetMap)
Contribute Back to Wikipedia
Initiate Wikipedia clean-up cycles:
– Data-driven search interfaces expose the weaknesses of the Wikipedia template system.
– Preferred items not showing up in end-user interfaces may motivate Wikipedia editors to use templates more consistently.
Live Update
Current situation: the DBpedia update cycle is 3 months; Wikipedia provides us with access to the live update stream.
Opportunity: increase the currency of the DBpedia dataset using this update stream.
Result: DBpedia in synchronization with Wikipedia.
Open Source
Open Data
What is the Wikipedia for Data?
Wikipedia is the Wikipedia for Data
Summary