URI Disambiguation in the Context of Linked Data Afraz Jaffri, Hugh Glaser, Ian MillardECS, University of Southampton

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

Also By The Same Author: AKTiveAuthor, A Citation Graph Approach To Name Disambiguation AKT DTA Colloquium January 23, 2006 Duncan McRae-Spencer.
Digital Repositories – Linked Open Data – the possible Role of D4Science Workshop, December 2010, FAO use cases A tool to create Linked Data providers.
Afraz Jaffri, Hugh Glaser, Ian Millard Electronics and Computer Science University of Southampton.
RKBExplorer.com: A Knowledge Driven Infrastructure for Linked Data Providers Hugh Glaser, Ian C. Millard, Afraz Jaffri ESWC 2008 Demo.
RKBExplorer: Repositories, Linked Data and Research Support Hugh Glaser, Ian Millard & Les Carr At Eprints User Group, Open Repositories 2009.
Prototype Knowledge Base: an on-line information service in dependability and security Hugh Glaser Electronics & Computer Science University of Southampton.
Last update: (2) (3) The Dutch airline.
Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,
1 gStore: Answering SPARQL Queries Via Subgraph Matching Presented by Guan Wang Kent State University October 24, 2011.
How to Create an MLA citation for a web document....
Kunal Narsinghani Ashwini Lahane Ontology Mapping and link discovery.
Current Trends in Databases Network Effects (co-presentation) Gert Nelissen UHasselt
Building and Analyzing Social Networks Web Data and Semantics in Social Network Applications Dr. Bhavani Thuraisingham February 15, 2013.
SciVal Experts & SciVal Funding Information Sessions.
You Cannot ReSIST Hugh Glaser Electronics & Computer Science University of Southampton DSSE, 28th February 2007.
RDF Kitty Turner. Current Situation there is hardly any metadata on the Web search engine sites do the equivalent of going through a library, reading.
Behshid Behkamal Ferdowsi University of Mashhad Web Technology Lab.
Cloud based linked data platform for Structural Engineering Experiment Xiaohui Zhang
A Privacy Preserving Efficient Protocol for Semantic Similarity Join Using Long String Attributes Bilal Hawashin, Farshad Fotouhi Traian Marius Truta Department.
Research on Linked Data and Co-reference Resolution Hugh Glaser, Ian Millard: University of Southampton, UK 성원경, 이승우, 김평, 류범종 Won-Kyung Sung, Seungwoo.
How to Use Google Scholar An Educator’s Guide
Entity Recognition via Querying DBpedia ElShaimaa Ali.
Data Quality on the Semantic Web Behshid Behkamal Date: 1388/11/14.
Shared innovation Linking Distributed Data across the Web Dr Tom Heath Researcher, Platform Division Talis Information Ltd t
Citation Searching with Web of Knowledge Roger Mills Catherine Dockerty OULS Bio- and Environmental.
Ontology-Driven Automatic Entity Disambiguation in Unstructured Text Jed Hassell.
Resource Curation and Automated Resource Discovery.
Learning Patterns on the World Wide Web Andrew Hogue Advisor: David Karger October 17, 2003.
University of Florida CTSI: Consuming and disambiguating publications data from Microsoft Academic Search in VIVO. Nicholas Rejack 1, Erik Schmidt 1, Michael.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
Researching the Internet and Plagiarism. When you are given a research project, where is the first place you look for your information?
RDF and XML 인공지능 연구실 한기덕. 2 개요  1. Basic of RDF  2. Example of RDF  3. How XML Namespaces Work  4. The Abbreviated RDF Syntax  5. RDF Resource Collections.
RELATORS, ROLES AND DATA… … similarities and differences.
© Copyright 2013 STI INNSBRUCK “How to put an annotation in HTML?” Ioannis Stavrakantonakis.
Author Name Disambiguation in Medline Vetle I. Torvik and Neil R. Smalheiser August 31, 2006.
UCSD Libraries Portal Project: Building a Database-Driven Web Content Management System Sharecase, 3/28/2001 Esmé Cowles and Laura Galvan-Estrada.
Introduction to the Semantic Web and Linked Data
Using linked data to interpret tables Varish Mulwad September 14,
Shridhar Bhalerao CMSC 601 Finding Implicit Relations in the Semantic Web.
SamePlaceAs.org Linking Locations since 2011 dc:creatorTodd Pehle foaf:mboxtpehle at orbistechnologies.com foaf:memberOrbis Technologies dc:dateDecember.
THE SEMANTIC WEB By Conrad Williams. Contents  What is the Semantic Web?  Technologies  XML  RDF  OWL  Implementations  Social Networking  Scholarly.
1 Open Ontology Repository initiative - Planning Meeting - Thu Co-conveners: PeterYim, LeoObrst & MikeDean ref.:
Metadata – A Bedrock for Official Statistics Dr. S. M. Tam, Chief Methodologist, ABS.
Citation Searching Isabel Holowaty Juliet Ralph
KAnOE: Research Centre for Knowledge Analytics and Ontological Engineering Managing Semantic Data NACLIN-2014, 10 Dec 2014 Dr. Kavi Mahesh Dean of Research,
The Semantic Web. What is the Semantic Web? The Semantic Web is an extension of the current Web in which information is given well-defined meaning, enabling.
University of Florida’s dchecker: Software for ensuring semantic data integrity Nicholas Rejack, MS 1, Christopher P. Barnes 1, Michael Conlon, PhD 2
Refined Online Citation Matching and Adaptive Canonical Metadata Construction CSE 598B Course Project Report Huajing Li.
© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Uncertainty reasoning for Linked.
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
CIS750 – Seminar in Advanced Topics in Computer Science Advanced topics in databases – Multimedia Databases V. Megalooikonomou Link mining ( based on slides.
GoRelations: an Intuitive Query System for DBPedia Lushan Han and Tim Finin 15 November 2011
Shared innovation Linking Distributed Data across the Web Dr Tom Heath Researcher, Platform Division Talis Information Ltd t
Setting the stage: linked data concepts Moving-Away-From-MARC-a-thon.
Warren Shen, Xin Li, AnHai Doan Database & AI Groups University of Illinois, Urbana Constraint-Based Entity Matching.
Shared innovation Linking Distributed Data across the Web Dr Tom Heath Researcher, Platform Division Talis Information Ltd t
Open Research Data and Open Access publications: How do they sit in the Web of Science? Guillaume Rivalle, Manager, Europe solution specialists
Cloud based linked data platform for Structural Engineering Experiment
Elsevier Activity Range
Big Data Quality the next semantic challenge
9/21 Find and cite a website source
PREMIS Tools and Services
Rui Yang, Wei Hu and Yuzhong Qu
Big Data Quality the next semantic challenge
Introduction to Information Retrieval
WISER: Citiation searching
CS639: Data Management for Data Science
Big Data Quality the next semantic challenge
Presentation transcript:

URI Disambiguation in the Context of Linked Data Afraz Jaffri, Hugh Glaser, Ian MillardECS, University of Southampton

Presentation Outline Linked Data Repositories Coreference on the Semantic Web Author Disambiguation DBLP Linked Data DBLP Author Disambiguation Disambiguation Results DBpedia Possible Solutions Summary URI Disambiguation in the Context of Linked Data LDOW Beijing, China2

RKBexplorer.com Contains URIs for more than 10 million entities Over 25 Linked Data sites, including: Data relating to people, projects, papers and institutions A single entity has a number of URIs (even within the same repository) Entities are linked using CRSes URI Disambiguation in the Context of Linked Data LDOW Beijing, China3 DBLP

Linked Data Repositories Existing databases on the Web are being exposed as Linked Data (D2R, Virtuoso) Databases contain inconsistencies and require constant curation Datasets such as Wikipedia are being continually checked and updated, especially in the case of disambiguation (WikiProject_Disambiguation) Linked Data repositories should also provide consistent data URI Disambiguation in the Context of Linked Data LDOW Beijing, China4

Disambiguation on the Semantic Web Coreference on the Semantic Web is defined as being the situation where two or more URIs are used for a single non-information resource URI usage can change with context Non-Information resource equality is hard to define precisely Examples ‘Hugh Glaser’ at Southampton vs. ‘Hugh Glaser’ at Imperial ‘Harry Potter and the Order of the Phoenix’ in Hardback vs. Softback ISBN: URI Disambiguation in the Context of Linked Data 5 LDOW Beijing, China

URI Multiplicity URIs for ‘Spain’: URIs for ‘Hugh Glaser’: URI Disambiguation in the Context of Linked Data 6 LDOW Beijing, China

Author Disambiguation A known problem in the Information Science field How to determine: Hugh Glaser/H. Glaser/Glaser, H. are the same person? How to determine: Tom Anderson – Newcastle University Tom Anderson – University of Washington are different people? URI Disambiguation in the Context of Linked Data 7 LDOW Beijing, China

Existing Approaches String Metrics - Name Equivalence identification - Record Linkage - Citation Matching Web Assisted - Look up publications on author’s home page - Use search engine results on publication title Machine Learning - k-way spectral clustering - Use author name, co-author frequency and publication venue URI Disambiguation in the Context of Linked Data 8 LDOW Beijing, China

DBLP Linked Data Converted from an XML dump of DBLP database Publications Authors 28 million triples Updated Weekly Linked to other datasets including RDF Book Mashup and RKBExplorer.com URI Disambiguation in the Context of Linked Data 9 LDOW Beijing, China

DBLP Author Disambiguation 49 names - 10 most common English surnames with 5 common first names Authors disambiguated by looking at homepage, web publication, search engine results and institution When in doubt, authors assumed to be the same if: - The co-authors of any publication are the same - The publication venue was the same - The area of research was the same URI Disambiguation in the Context of Linked Data 10 LDOW Beijing, China

It’s all about Identity 8LDOW2008 – Beijing, China URI Disambiguation in the Context of Linked Data Tom Anderson – Is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dc:creator of dc:creator is dblp:editor of dblp:editor Vice President O-in Design Automation inc. USAProfessor, University of NewcastleProfessor, Heriot Watt UniversityUniversity of WashingtonUniversity of California, BerkelyTom Andersen - University of DenmarkLucent Technologies, Illinois

DBLP Author Disambiguation Results 92% of authors with common names had publications incorrectly merged Worst case - 15 different authors with 1 URI Many authors who are the same have publications under different names (Cliff Jones, C.B. Jones) Inconsistency in data means inconsistency with linked data It is incorrect to use owl:sameAs to link different authors who have the same URI URI Disambiguation in the Context of Linked Data 12 LDOW Beijing, China

DBpedia DBpedia 3.0 improves disambiguation management by including the ‘disambiguates’ property owl:sameAs linkage still inconsistent: owl:sameAs. owl:sameAs. URI Disambiguation in the Context of Linked Data 13 LDOW Beijing, China

Possible Solutions CRS: Consistent Reference Service - Groups similar URIs into ‘bundles’ - Bundles can be made according to context - Each KB can have one or more CRSes OKKAM - Coming up soon! URI Disambiguation in the Context of Linked Data 14 LDOW Beijing, China

Summary Linked Data providers need to think about data consistency in the same way as database providers Failure to manage coreference within datasets leads to incorrect linkage with other datasets The network effect of the Web of Data means coreference needs to be even more carefully managed than in the Web of Documents Systems are being developed to help manage coreference, the community needs to decide how to handle the problem URI Disambiguation in the Context of Linked Data 15 LDOW Beijing, China

Questions? Further questions: a.o.jaffri icm URI Disambiguation in the Context of Linked Data 16 LDOW Beijing, China