Aidan Hogan, Antoine Zimmermann, Jürgen Umbrich, Axel Polleres, Stefan Decker Presented by Joseph Park SCALABLE AND DISTRIBUTED METHODS FOR ENTITY MATCHING,


SCALABLE AND DISTRIBUTED METHODS FOR ENTITY MATCHING, CONSOLIDATION AND DISAMBIGUATION OVER LINKED DATA CORPORA
Aidan Hogan, Antoine Zimmermann, Jürgen Umbrich, Axel Polleres, Stefan Decker
Presented by Joseph Park

INTRODUCTION
- Linked Data best practices:
  - Use URIs as names for things (not just documents)
  - Make those URIs dereferenceable via HTTP
  - Return useful and relevant RDF content upon lookup of those URIs
  - Include links to other datasets
- Linked Open Data project
  - Goal of providing dereferenceable, machine-readable data in RDF
  - Emphasis on reuse of URIs and inter-linkage between remote datasets
- Web of Data
  - 30 billion published RDF triples

AIMS & GOALS
- Focus on finding equivalent entities
  - E.g. people, places, musicians, proteins
- Two entities are equivalent if they are coreferent
- Interest in identifying coreferences and merging the knowledge contributions provided by distinct parties (consolidation)

OWL:SAMEAS
- owl:sameAs
  - A core OWL property that defines equivalences between individuals
  - Two individuals related by owl:sameAs are coreferent
- Inferring new owl:sameAs relations:
  - Inverse-functional properties (e.g. :biologicalMotherOf)
  - Functional properties (e.g. :hasBiologicalMother)
  - Cardinality and max-cardinality restrictions
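The two property-based inference patterns on this slide can be illustrated with a small in-memory sketch. This is not the distributed method from the presentation: the function names, the `ex:` identifiers and the tiny triple list are all hypothetical, and the sketch assumes every value of a functional property is a URI rather than a literal. It shows how subjects sharing a value for an inverse-functional property, and values sharing a subject for a functional property, yield owl:sameAs pairs, which are then closed into equivalence classes with a union-find.

```python
from collections import defaultdict

def infer_same_as(triples, functional, inverse_functional):
    """Return inferred owl:sameAs pairs as (a, b) tuples with a < b."""
    by_subject = defaultdict(set)  # (s, p) -> values of a functional property
    by_object = defaultdict(set)   # (p, o) -> subjects of an inverse-functional property
    for s, p, o in triples:
        if p in functional:
            by_subject[(s, p)].add(o)
        if p in inverse_functional:
            by_object[(p, o)].add(s)
    pairs = set()
    for group in list(by_subject.values()) + list(by_object.values()):
        members = sorted(group)
        # Linking every member to the first one is enough; closure comes later.
        for other in members[1:]:
            pairs.add((members[0], other))
    return pairs

def equivalence_classes(pairs):
    """Close sameAs pairs under symmetry and transitivity with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    classes = defaultdict(set)
    for x in parent:
        classes[find(x)].add(x)
    return {frozenset(c) for c in classes.values()}

# Hypothetical data: three identifiers for the same mother.
triples = [
    ("ex:alice1", ":biologicalMotherOf", "ex:bob"),
    ("ex:alice2", ":biologicalMotherOf", "ex:bob"),
    ("ex:bob", ":hasBiologicalMother", "ex:alice2"),
    ("ex:bob", ":hasBiologicalMother", "ex:alice3"),
]
pairs = infer_same_as(
    triples,
    functional={":hasBiologicalMother"},
    inverse_functional={":biologicalMotherOf"},
)
classes = equivalence_classes(pairs)
print(sorted(pairs))
print([sorted(c) for c in classes])
```

Cardinality and max-cardinality restrictions are left out of the sketch; they would need class-membership information that this toy triple list does not carry.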

CONSTRAINTS TO OWL:SAMEAS

EXPERIMENT
- […] billion quadruples
  - Crawled from […] million web documents
  - […] billion are unique
  - 947 million are unique triples
- 9 machines linked by Gigabit Ethernet

BASELINE – OWL:SAMEAS
- Extracted […] million raw owl:sameAs quadruples
  - Only 3.77 million unique triples
- 1000 randomly chosen pairs hand-checked:
  - Trivially same (661 times)
  - Same (301 times)
  - Different (28 times)
  - Unclear (10 times)

CONSTRAINT COUNTS
- No documents used owl:maxQualifiedCardinality
- 434 functional properties
- 57 inverse-functional properties
- 109 cardinality restrictions with a value of 1
- […] million memberships of inverse-functional properties
  - […] million asserted
- […] million memberships of functional properties
  - 1.17 million asserted
- 2.56 million cardinality triples
  - 533 thousand asserted

REASONING USING CONSTRAINTS
- Zero owl:sameAs inferences through cardinality rules
- […] thousand owl:sameAs through functional-property reasoning
- 8.7 million owl:sameAs through inverse-functional-property reasoning
- Resulted in a total of […] million owl:sameAs statements

RESULTS FROM CONSTRAINTS
- From the […] million owl:sameAs quadruples
- 1000 randomly chosen and hand-checked:
  - Trivially same (145 times)
  - Same (823 times)
  - Different (23 times)
  - Unclear (9 times)
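The two hand-checked samples (the baseline owl:sameAs sample and this constraint-based one) can be turned into rough proportions with a few lines of arithmetic. The slides report only the raw counts; the grouping below, which adds "trivially same" and "same" together and keeps "unclear" in the denominator, is an assumption made for illustration, and the function and key names are hypothetical.

```python
def sample_proportions(counts):
    """Summarise a hand-checked sample as fractions of all checked pairs."""
    total = sum(counts.values())
    same = counts["trivially_same"] + counts["same"]
    return {
        "total": total,
        "same_incl_trivial": same / total,
        "different": counts["different"] / total,
        "unclear": counts["unclear"] / total,
    }

# Counts copied from the two evaluation slides.
baseline = {"trivially_same": 661, "same": 301, "different": 28, "unclear": 10}
constraints = {"trivially_same": 145, "same": 823, "different": 23, "unclear": 9}

baseline_summary = sample_proportions(baseline)
constraints_summary = sample_proportions(constraints)
print(baseline_summary)
print(constraints_summary)
```

Under this grouping both samples come out at roughly 96 to 97 percent judged coreferent, but the constraint-based sample has far fewer trivially identical pairs (145 against 661).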

STATISTICAL CONCURRENCE
- Entity concurrence: sharing of outlinks, inlinks, and attribute values
- Higher score means more discriminating shared characteristics
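As a rough illustration of what "sharing" means here, the sketch below collects, for each pair of entities, the outlinks or attribute values (same property and value) and the inlinks (same subject and property) they have in common. It deliberately computes no score, since the scoring formulas are not reproduced in this transcript. The function name and the small triple list are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def shared_characteristics(triples):
    """Map each unordered entity pair to the set of characteristics they share."""
    out_groups = defaultdict(set)  # (p, o) -> subjects with that outlink or attribute value
    in_groups = defaultdict(set)   # (s, p) -> objects with that inlink
    for s, p, o in triples:
        out_groups[(p, o)].add(s)
        in_groups[(s, p)].add(o)

    shared = defaultdict(set)
    for (p, o), subjects in out_groups.items():
        for a, b in combinations(sorted(subjects), 2):
            shared[(a, b)].add(("out", p, o))
    for (s, p), objects in in_groups.items():
        for a, b in combinations(sorted(objects), 2):
            shared[(a, b)].add(("in", s, p))
    return dict(shared)

# Hypothetical graph, not the running example from the slides.
triples = [
    ("ex:Alice", "foaf:gender", "female"),
    ("ex:Claire", "foaf:gender", "female"),
    ("ex:Bob", "foaf:gender", "male"),
    ("dblp:AliceB10", "foaf:maker", "ex:Alice"),
    ("dblp:AliceB10", "foaf:maker", "ex:Bob"),
]
shared = shared_characteristics(triples)
for pair, characteristics in sorted(shared.items()):
    print(pair, sorted(characteristics))
```

Sharing a gender is far less discriminating than sharing a paper, which is the intuition a concurrence score has to capture.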

RUNNING EXAMPLE

QUANTIFYING CONCURRENCE
- Observed cardinality, e.g. Card_G_ex(foaf:maker, dblp:AliceB10) = 2
- Observed inverse-cardinality, e.g. ICard_G_ex(foaf:gender, "female") = 2
- Average inverse-cardinality, e.g. AIC_G_ex(foaf:gender) = 1.5
  - Can also be viewed as the average of the non-zero inverse-cardinalities
  - For example, foaf:gender has inverse-cardinality 1 for "male" and 2 for "female", giving (1 + 2) / 2 = 1.5
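The three quantities can be computed directly from a list of triples. The definitions used below are assumptions inferred from the examples on the slide: cardinality counts the distinct values a subject has for a property, inverse-cardinality counts the distinct subjects with a given value, and the average inverse-cardinality divides the number of distinct subject-value pairs by the number of distinct values. The graph `g_ex` is a hypothetical stand-in chosen to reproduce the slide's three numbers; it is not the running example from the deck.

```python
def card(triples, p, s):
    """Observed cardinality: distinct values of property p for subject s."""
    return len({o for s2, p2, o in triples if s2 == s and p2 == p})

def icard(triples, p, o):
    """Observed inverse-cardinality: distinct subjects with value o for property p."""
    return len({s for s, p2, o2 in triples if p2 == p and o2 == o})

def aic(triples, p):
    """Average inverse-cardinality of p over the values that actually occur."""
    pairs = {(s, o) for s, p2, o in triples if p2 == p}
    values = {o for _, o in pairs}
    if not values:
        raise ValueError(f"property {p!r} does not occur in the graph")
    return len(pairs) / len(values)

# Hypothetical graph chosen to match the numbers quoted on the slide.
g_ex = [
    ("ex:Alice", "foaf:gender", "female"),
    ("ex:Claire", "foaf:gender", "female"),
    ("ex:Bob", "foaf:gender", "male"),
    ("dblp:AliceB10", "foaf:maker", "ex:Alice"),
    ("dblp:AliceB10", "foaf:maker", "ex:Bob"),
]

print(card(g_ex, "foaf:maker", "dblp:AliceB10"))
print(icard(g_ex, "foaf:gender", "female"))
print(aic(g_ex, "foaf:gender"))
```

Only values that occur at least once enter the average, which is why it can be read as a mean over non-zero inverse-cardinalities.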

ADJUSTED AVERAGE INVERSE- CARDINALITY

CONCURRENCE COEFFICIENTS

COEFFICIENT EXAMPLE

AGGREGATED CONCURRENCE SCORE

RESULTS FROM CONCURRENCE
- Average cardinality of about 1.5
- Average inverse-cardinality of about 2.64
- Total of […] million weighted concurrence pairs
- Mean concurrence weight of about […]
- Highly concurring entities were in many cases not coreferent

EXAMPLE OF CONCURRENCE