Finding, Linking and Organizing Resources with Linked Data & Natural Language Processing. Paul Buitelaar, Unit for Natural Language Processing, Digital Enterprise Research Institute, National University of Ireland, Galway. Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Finding, Linking & Organizing Resources: What does that mean? What is a (re)source? What is a link? What resources can we link – and how? How to find and organize resources and links? Let’s go through an example…
Linking van Gogh (resources)
Linking van Gogh (links)
Linking van Gogh (objects). Diagram nodes: personObj-1; personRepresentationObj-1 (Vincent van Gogh); personBirthObj-1 (March 30th, 1853); locationObj-1 (Zundert); personDeathObj-1 (July 29th, 1890); locationObj-2 (Auvers-sur-Oise); personObj-2; personRepresentationObj-2 (Theo van Gogh).
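The object diagram can be read as a small graph of (subject, property, value) statements. The following is a minimal sketch in Python, assuming plain tuples rather than an RDF library. The object identifiers come from the slide; the property names (`hasRepresentation`, `hasBirth`, `hasDeath`, `date`, `location`, `name`) and the helpers `values` and `follow` are hypothetical and only serve the illustration.

```python
# Each statement is a (subject, property, value) triple.
# Object identifiers follow the slide; property names are hypothetical.
TRIPLES = [
    ("personObj-1", "hasRepresentation", "personRepresentationObj-1"),
    ("personRepresentationObj-1", "name", "Vincent van Gogh"),
    ("personObj-1", "hasBirth", "personBirthObj-1"),
    ("personBirthObj-1", "date", "1853-03-30"),
    ("personBirthObj-1", "location", "locationObj-1"),
    ("locationObj-1", "name", "Zundert"),
    ("personObj-1", "hasDeath", "personDeathObj-1"),
    ("personDeathObj-1", "date", "1890-07-29"),
    ("personDeathObj-1", "location", "locationObj-2"),
    ("locationObj-2", "name", "Auvers-sur-Oise"),
    ("personObj-2", "hasRepresentation", "personRepresentationObj-2"),
    ("personRepresentationObj-2", "name", "Theo van Gogh"),
]


def values(subject, prop, triples=TRIPLES):
    """Return all values of one property for one subject."""
    return [o for s, p, o in triples if s == subject and p == prop]


def follow(subject, path, triples=TRIPLES):
    """Follow a chain of properties from subject and return the end values."""
    frontier = [subject]
    for prop in path:
        frontier = [o for node in frontier for o in values(node, prop, triples)]
    return frontier


# Walk from the person object to the name of the birth location.
birthplace = follow("personObj-1", ["hasBirth", "location", "name"])
print(birthplace)
```

Keeping dates and places as separate objects, rather than as strings on the person, is what lets other resources link to the same location or event.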
Finding Resources (and Links). Structured data: proprietary databases, thesauri, etc.; open-domain databases, thesauri, etc.; increasingly turned into ‘Linked Open Data’. Unstructured data: proprietary textual descriptions, images, videos, etc.; open-domain textual descriptions, images, videos, etc.; to be turned into and connected with ‘Linked Open Data’.
Finding Links in Text
Linking van Gogh - continued. Diagram nodes: personObj-1; personRepresentationObj-1 (Vincent van Gogh); personRepresentationObj-2 (Diego Velazquez); artistObj-1; artistObj-2; artistTechniqueObj-1 (bold brushstrokes).
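As a rough illustration of finding a link in text, the sketch below turns one sentence into a (subject, link, value) statement using a single cue phrase. Everything in it is an assumption made for illustration: the example sentence, the cue phrase, the link name `artistTechnique`, and the function `extract_link` do not come from the slides, and a real system would use proper natural language processing rather than string splitting.

```python
# Hypothetical cue phrase that signals an artist-technique link.
CUE = " is known for "


def extract_link(sentence):
    """Return (artist, link, technique) if the cue phrase occurs, else None."""
    if CUE not in sentence:
        return None
    left, right = sentence.split(CUE, 1)
    return (left.strip(), "artistTechnique", right.strip().rstrip("."))


# Hypothetical input sentence, not taken from the talk.
link = extract_link("Vincent van Gogh is known for bold brushstrokes.")
print(link)
```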
The Remainder of this Talk: Linked Open Data (LOD); some LOD applications & tools; LOD and Natural Language Processing.
Linked Open Data: turning the web of documents into a Web of Data; uniquely identifying web objects (documents, images, named entities, facts, …); enabling the discovery & interlinking of web objects through semantic metadata; open access to data.
Linked Open Data ‘cloud’
Linked Open Media Data
LOD Applications. Search engine for the Web of Data: SIGMA (builds on …), contact: Giovanni Tummarello, DERI. Music recommendation, contact: Alexandre Passant, DERI. Research collaboration support and expert finding, contact: Paul Buitelaar, DERI.
Search the Web of Data with SIGMA
More Data – but also more issues…
dbrec : the Web of Data recommends…
Mary Black is related to Frances Black …
… and this is why
Saffron : Expert Finding
Expertise Topic Extraction
Publication Browsing
Expert Browsing
Publication Details: Abstract/PDF
Publication Details: Authors/Topics
Expertise Topic Details
Personalization
Personalized Expert Recommendation
Linking Objects in Saffron. Diagram labels: Author, Document, Title, PDF, Topic, Affiliation, Researcher, Picture, Researcher, ExpertiseTopic.
Other LOD Application Areas: Linked Open Drug Data (Matthias Samwald, DERI), W3C WG includes participation by Johnson & Johnson, AstraZeneca; Open Government Data (Richard Cyganiak, DERI), includes data sets from USA, UK, Australia, Canada, Sweden, New Zealand; Library Linked Data (Jodi Schneider, DERI); Financial Linked Data (Sean O’Riain, DERI); Linking Enterprise Data for Business Intelligence.
Financial Linked Data: linked with extracted financial facts (amounts), annotated with semantic metadata (financial meaning) according to the eXtensible Business Reporting Language (XBRL).
Some LOD Tools: ‘RDB2RDF’, mapping relational DBs to RDF (incl. Survey Report); ‘Silk’ (Freie Universitaet Berlin), specify links to use in discovering relationships between LOD data items; Semantic Drupal, ‘sparqlviews’ (Lin Clark, DERI), easy integration of Linked Data in the Drupal CMS. EU Projects.
Open LOD Issues. (1) How to integrate new LOD into the LOD cloud, with addition of information rather than duplication? Entity consolidation: dbpedia:JohnSmith owl:sameAs bbcmusic:JohnSmith. Vocabulary alignment: geonames:location owl:sameAs dbpedia:place. (2) How to identify the most fitting LOD resources for a particular application/domain? Estimate application/domain semantics; match application/domain semantics with LOD semantics.
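Entity consolidation, adding information rather than duplicating it, can be sketched as grouping identifiers connected by owl:sameAs and merging what is known about each group. The sketch below uses a union-find structure in plain Python; this is one possible approach and not necessarily the one used by any LOD tool. The function `consolidate`, the property names, and the `hypothetical:` values are assumptions made for illustration; only the two JohnSmith identifiers come from the slide.

```python
from collections import defaultdict


def consolidate(same_as, facts):
    """Group identifiers linked by owl:sameAs and merge their facts.

    same_as: iterable of (id_a, id_b) pairs asserted to be the same entity
    facts:   dict mapping identifier -> set of (property, value) pairs
    Returns a dict mapping one canonical identifier per group to the
    union of the facts known about any identifier in that group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller root so results are stable.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for a, b in same_as:
        union(a, b)

    merged = defaultdict(set)
    for ident, props in facts.items():
        merged[find(ident)] |= props  # set union drops duplicate facts
    return dict(merged)


same_as = [("dbpedia:JohnSmith", "bbcmusic:JohnSmith")]
facts = {
    "dbpedia:JohnSmith": {("name", "John Smith"), ("birthPlace", "hypothetical:Town")},
    "bbcmusic:JohnSmith": {("name", "John Smith"), ("genre", "hypothetical:Folk")},
}
merged = consolidate(same_as, facts)
print(merged)
```

The shared `name` fact is stored once in the merged record, while the facts unique to each source are both kept.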
LOD and Natural Language Processing. Diagram labels: Linked Open Data; Domain/Application Semantics; Linked Open Data for Domain/Application; Domain Corpus; LOD vocabularies; LOD instances from domain corpus (nodes X, Y, Z, Y1, Z1).
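One very simple way to match the semantics of a domain corpus against LOD vocabularies is to count how many labels of each vocabulary actually occur in the corpus and rank the vocabularies by that count. The sketch below assumes exact single-word matching, which is far cruder than real term extraction. The corpus sentences, the vocabulary names and labels, and the functions `corpus_words` and `rank_vocabularies` are all hypothetical.

```python
import re


def corpus_words(corpus):
    """Return the set of lowercase words found in a list of texts."""
    return set(re.findall(r"[a-z]+", " ".join(corpus).lower()))


def rank_vocabularies(corpus, vocabularies):
    """Rank vocabularies by how many of their labels occur in the corpus.

    Returns a list of (hit_count, vocabulary_name, matched_labels),
    best match first; ties are broken by vocabulary name.
    """
    words = corpus_words(corpus)
    scores = []
    for name, labels in vocabularies.items():
        hits = sorted(label for label in labels if label.lower() in words)
        scores.append((len(hits), name, hits))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return scores


# Hypothetical domain corpus and vocabularies.
corpus = [
    "The painter used bold brushstrokes in the painting.",
    "The museum shows each painting by the artist.",
]
vocabularies = {
    "hypothetical:art": ["painter", "painting", "museum", "artist"],
    "hypothetical:finance": ["asset", "revenue", "artist"],
}
ranked = rank_vocabularies(corpus, vocabularies)
print(ranked)
```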
Acknowledgements & Further Info. DERI colleagues on all things ‘linked open data’, for more info … The Saffron team (in alphabetical order): Georgeta Bordea, Fergal Monaghan, Krystian Samp. Grant support: Science Foundation Ireland Grant No. SFI/08/CE/I1380 for Lion; EU FP7 Grant No … for the Monnet project on Multilingual Ontologies for Networked Knowledge.