Published by Alexander Gregory. Modified over 9 years ago.
1
SCALABLE AND DISTRIBUTED METHODS FOR ENTITY MATCHING, CONSOLIDATION AND DISAMBIGUATION OVER LINKED DATA CORPORA
Aidan Hogan, Antoine Zimmermann, Jürgen Umbrich, Axel Polleres, Stefan Decker
Presented by Joseph Park
2
INTRODUCTION
Linked Data best practices:
  Use URIs as names for things (not just documents)
  Make those URIs dereferenceable via HTTP
  Return useful and relevant RDF content upon lookup of those URIs
  Include links to other datasets
Linked Open Data project:
  Goal of providing dereferenceable, machine-readable data in RDF
  Emphasis on reuse of URIs and inter-linkage between remote datasets
Web of Data:
  30 billion published RDF triples
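The second and third best practices (dereferenceable URIs that return RDF on lookup) can be sketched as an HTTP request with content negotiation. This is a minimal sketch, not part of the presentation: the URI `http://example.org/resource/Galway` and the helper name `build_rdf_lookup` are hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request

# Media types for two common RDF serializations.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_rdf_lookup(uri):
    """Build an HTTP GET request that asks the server for RDF about `uri`.

    Hypothetical helper: it only prepares the request. Passing the result
    to urllib.request.urlopen() would perform the actual lookup.
    """
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


# Illustrative URI (an assumption, not taken from the slides).
request = build_rdf_lookup("http://example.org/resource/Galway")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

A server that follows the best practices would answer such a request with RDF describing the named thing, typically including links to URIs in other datasets.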
3
AIMS & GOALS
Focus on finding equivalent entities
  E.g. people, places, musicians, proteins
Two entities are equivalent if they are coreferent
Interest in identifying coreferences and merging the knowledge contributions provided by distinct parties (consolidation)
4
OWL:SAMEAS
owl:sameAs
  A core OWL property that defines equivalences between individuals
  Two individuals related by owl:sameAs are coreferent
Inferring new owl:sameAs relations:
  Inverse-functional properties (e.g. :biologicalMotherOf)
  Functional properties (e.g. :hasBiologicalMother)
  Cardinality and max-cardinality restrictions
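To illustrate how an inverse-functional property yields new owl:sameAs relations, here is a minimal single-machine sketch. It is not the distributed method from the presentation: the function name `infer_same_as`, the union-find bookkeeping, and the `ex:` identifiers are assumptions made for the example. Two subjects that share a value for an inverse-functional property are placed in the same equivalence class, and explicit owl:sameAs triples are merged in as well.

```python
from collections import defaultdict


def infer_same_as(triples, inverse_functional):
    """Return equivalence classes (sets of names) with more than one member."""
    parent = {}

    def find(x):
        # Find the canonical representative of x, creating it if unseen.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # The lexically smaller name becomes the canonical one.
            parent[max(ra, rb)] = min(ra, rb)

    first_subject = {}  # (property, value) -> first subject seen with it
    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)
        elif p in inverse_functional:
            key = (p, o)
            if key in first_subject:
                union(first_subject[key], s)  # shared value => coreferent
            else:
                first_subject[key] = s

    classes = defaultdict(set)
    for node in parent:
        classes[find(node)].add(node)
    return [members for members in classes.values() if len(members) > 1]


# Hypothetical data: two names for the biological mother of ex:tom.
triples = [
    ("ex:mary", ":biologicalMotherOf", "ex:tom"),
    ("ex:m_smith", ":biologicalMotherOf", "ex:tom"),
    ("ex:m_smith", "owl:sameAs", "ex:mary_smith"),
    ("ex:jane", ":biologicalMotherOf", "ex:ann"),
]
classes = infer_same_as(triples, {":biologicalMotherOf"})
print(classes)
```

Functional properties can be handled symmetrically by grouping on (property, subject) and merging the objects instead.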
5
CONSTRAINTS TO OWL:SAMEAS
6
EXPERIMENT
1.118 billion quadruples
  Crawled from 3.985 million web documents
  1.106 billion are unique
  947 million are unique triples
9 machines linked by Gigabit Ethernet
7
BASELINE – OWL:SAMEAS
Extracted 11.93 million raw owl:sameAs quadruples
  Only 3.77 million unique triples
1000 randomly chosen pairs hand-checked:
  Trivially same (661 times)
  Same (301 times)
  Different (28 times)
  Unclear (10 times)
8
CONSTRAINT COUNTS
No documents used owl:maxQualifiedCardinality
434 functional properties
57 inverse-functional properties
109 cardinality restrictions with a value of 1
52.93 million memberships of inverse-functional properties
  22.14 million asserted
11.09 million memberships of functional properties
  1.17 million asserted
2.56 million cardinality triples
  533 thousand asserted
9
REASONING USING CONSTRAINTS
Zero owl:sameAs inferences through cardinality rules
106.8 thousand owl:sameAs through functional-property reasoning
8.7 million owl:sameAs through inverse-functional-property reasoning
Resulted in a total of 12.03 million owl:sameAs statements
10
RESULTS FROM CONSTRAINTS
From the 12.03 million owl:sameAs quadruples, 1000 randomly chosen and hand-checked:
  Trivially same (145 times)
  Same (823 times)
  Different (23 times)
  Unclear (9 times)
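The two hand-checked samples (the baseline owl:sameAs sample and the constraint-reasoning sample above) can be compared with a little arithmetic. The sketch below is an illustration only: the slides report raw counts, and treating "trivially same" plus "same" as correct while counting "unclear" against precision is an assumption made here, not a definition from the presentation.

```python
# Hand-check counts as reported on the two result slides.
samples = {
    "baseline owl:sameAs": {
        "trivially same": 661, "same": 301, "different": 28, "unclear": 10,
    },
    "constraint reasoning": {
        "trivially same": 145, "same": 823, "different": 23, "unclear": 9,
    },
}


def summarize(counts):
    """Summarize one sample (assumption: unclear pairs count as not correct)."""
    total = sum(counts.values())
    correct = counts["trivially same"] + counts["same"]
    return {
        "total": total,
        "correct": correct,
        "precision": correct / total,
        "non_trivial_share": counts["same"] / total,
    }


for name, counts in samples.items():
    summary = summarize(counts)
    print(f"{name}: {summary['correct']}/{summary['total']} judged the same, "
          f"{summary['non_trivial_share']:.1%} non-trivially")
```

Under this reading both samples show a similar share of pairs judged the same, while the share of non-trivial matches is much higher in the constraint-reasoning sample (823 of 1000 versus 301 of 1000).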
11
STATISTICAL CONCURRENCE
Entity concurrence: sharing of outlinks, inlinks, and attribute values
Higher score means more discriminating shared characteristics
12
RUNNING EXAMPLE
13
QUANTIFYING CONCURRENCE
Observed cardinality (e.g. Card_G_ex(foaf:maker; dblp:AliceB10) = 2)
Observed inverse-cardinality (e.g. ICard_G_ex(foaf:gender; "female") = 2)
Average inverse-cardinality (e.g. AIC_G_ex(foaf:gender) = 1.5)
  Can also be viewed as average non-zero cardinalities
  For example, foaf:gender: 1 for “male”, 2 for “female”
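The three quantities above can be sketched on a small graph. The running-example graph G_ex is not reproduced in this transcript, so the triples below are a hypothetical stand-in chosen only to be consistent with the values quoted on the slide; the function names `card`, `icard` and `aic` and all `ex:` names are likewise assumptions. Cardinality counts distinct objects for a (property, subject) pair, inverse-cardinality counts distinct subjects for a (property, object) pair, and the average inverse-cardinality divides the number of distinct (subject, object) pairs by the number of distinct objects.

```python
# Hypothetical stand-in for the running example: (subject, property, object).
graph = {
    ("dblp:AliceB10", "foaf:maker", "ex:Alice"),
    ("dblp:AliceB10", "foaf:maker", "ex:Bob"),
    ("ex:Alice", "foaf:gender", "female"),
    ("ex:Claire", "foaf:gender", "female"),
    ("ex:Bob", "foaf:gender", "male"),
}


def card(graph, prop, subject):
    """Observed cardinality: distinct objects for (prop, subject)."""
    return len({o for (s, p, o) in graph if s == subject and p == prop})


def icard(graph, prop, obj):
    """Observed inverse-cardinality: distinct subjects for (prop, obj)."""
    return len({s for (s, p, o) in graph if p == prop and o == obj})


def aic(graph, prop):
    """Average inverse-cardinality over the objects that occur with prop."""
    pairs = {(s, o) for (s, p, o) in graph if p == prop}
    objects = {o for (_, o) in pairs}
    return len(pairs) / len(objects)


print(card(graph, "foaf:maker", "dblp:AliceB10"))  # 2
print(icard(graph, "foaf:gender", "female"))       # 2
print(aic(graph, "foaf:gender"))                   # 1.5
```

In this toy graph foaf:gender has inverse-cardinality 1 for "male" and 2 for "female", which averages to 1.5, matching the slide's example.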
14
ADJUSTED AVERAGE INVERSE-CARDINALITY
15
CONCURRENCE COEFFICIENTS
16
COEFFICIENT EXAMPLE
17
AGGREGATED CONCURRENCE SCORE
18
RESULTS FROM CONCURRENCE
Average cardinality of about 1.5
Average inverse-cardinality of about 2.64
Total of 636.9 million weighted concurrence pairs
Mean concurrence weight of about 0.0159
Highly concurring entities were in many cases not coreferent
19
EXAMPLE OF CONCURRENCE