An Empirical Study of Property Collocation on Large Scale of Knowledge Base
龚赛赛
2016-04-18
Contents
Introduction
Survey
Measure
Introduction
In the Semantic Web, properties are used to describe various entities.
Like word collocation in natural language, collocation also exists among these properties.
Property collocation: a combination of properties that occurs very often, and more frequently than would be expected by chance.
Example
Wikidata browser; Freebase browser
Property collocation is natural and common in real usage!
Introduction
Various applications
  Entity browsing
  Vocabulary search and recommendation
  Ontology analysis and assessment
  …
Previous work lacks a comprehensive and dedicated investigation of property collocation
  Equivalent properties
  General relatedness of concepts or vocabularies
In this paper, we present an empirical study of property collocation in a large-scale knowledge base.
Survey
Q1: Does property collocation exist in the ontology, and how common is it?
Q2: To what extent do people agree on property collocation?
Survey
31 users
Setup: select 10 popular classes each for DBpedia, Wikidata and Freebase.
Each user randomly browses 2 entities of each class and identifies collocated properties.
Survey: basic statistics

                                       DBpedia                Wikidata               Freebase
Entity num                             321                    316                    315
Covered property num                   18.2% (513 of 2819)    21.7% (461 of 2128)    12.1% (810 of 6679)
Covered property num (with direction)  832                    694                    1541
Group num                              713                    440                    644
Cumulative percentage of properties for Q1 (figure panels: DBpedia, Wikidata, Freebase)
Cumulative number of groups for Q2 (figure panels: DBpedia, Wikidata, Freebase)
For 3+: DBpedia 11.7%, Wikidata 8%, Freebase 10%
Cumulative number of collocated pairs for Q2
DBpedia: 6265 pairs
Wikidata: 4758 pairs
Freebase: 5907 pairs
Measure 1: Statistical association
P(p_i): the probability that a resource is described by p_i in some context
Phi coefficient
Symmetrical uncertainty coefficient
Jaccard coefficient
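As a rough illustration of the three statistical measures, the sketch below computes them from the textbook definitions over a 2x2 co-occurrence table of two properties. It assumes the "context" is simply a set of entities, and that each entity is given as a set of property names. The input format, the function name `association_measures`, and the sample property names are hypothetical and not taken from the study.

```python
import math

def association_measures(entities, p, q):
    """Phi, symmetrical uncertainty and Jaccard for properties p and q.

    `entities` maps an entity id to the set of properties describing it
    (a hypothetical input format chosen for this sketch).
    """
    n = len(entities)
    # 2x2 contingency table: does an entity use p and/or q?
    n11 = sum(1 for props in entities.values() if p in props and q in props)
    n10 = sum(1 for props in entities.values() if p in props and q not in props)
    n01 = sum(1 for props in entities.values() if p not in props and q in props)
    n00 = n - n11 - n10 - n01

    # Phi coefficient of the contingency table.
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    phi = (n11 * n00 - n10 * n01) / denom if denom else 0.0

    # Jaccard coefficient: entities with both / entities with either.
    either = n11 + n10 + n01
    jaccard = n11 / either if either else 0.0

    # Symmetrical uncertainty: 2 * I(P;Q) / (H(P) + H(Q)).
    def entropy(counts):
        return -sum(c / n * math.log2(c / n) for c in counts if c)

    h_p = entropy([n11 + n10, n01 + n00])
    h_q = entropy([n11 + n01, n10 + n00])
    h_pq = entropy([n11, n10, n01, n00])
    mutual_info = h_p + h_q - h_pq
    su = 2 * mutual_info / (h_p + h_q) if (h_p + h_q) else 0.0

    return {"phi": phi, "su": su, "jaccard": jaccard}

# Toy data with made-up property names.
sample = {
    "e1": {"birthDate", "birthPlace"},
    "e2": {"birthDate", "birthPlace"},
    "e3": {"populationTotal"},
    "e4": {"populationTotal"},
}
together = association_measures(sample, "birthDate", "birthPlace")
apart = association_measures(sample, "birthDate", "populationTotal")
```

In this toy data the first pair always co-occurs and the second pair never does. Note that symmetrical uncertainty measures dependence in either direction, so it cannot by itself distinguish co-occurrence from mutual exclusion, while phi carries a sign.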
Measure 2: Semantic collocation
Based on domain, range, and property hierarchy
d_min / r_min: minimal classes in the domain / range
SetSim_d, SetSim_r: similarity of the minimal class sets
HRel: relatedness based on the property hierarchy (shortest path)
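The slide does not give the exact formulas, so the following is only a sketch of two ingredients under stated assumptions. It reads "minimal classes" as the most specific classes with respect to the subclass relation. It treats the property hierarchy as an undirected graph of sub-property edges. It maps the shortest-path length d to a relatedness score 1 / (1 + d), which is an illustrative choice and not necessarily the definition of HRel used in the study. All class and property names are made up.

```python
from collections import deque

def minimal_classes(classes, subclass_of):
    """Classes in `classes` that have no proper subclass in `classes`.

    `subclass_of` maps a class to the set of its direct superclasses.
    """
    classes = set(classes)

    def ancestors(c):
        seen, stack = set(), list(subclass_of.get(c, ()))
        while stack:
            s = stack.pop()
            if s not in seen:
                seen.add(s)
                stack.extend(subclass_of.get(s, ()))
        return seen - {c}

    # Any class that is an ancestor of another class in the set is not minimal.
    non_minimal = set()
    for c in classes:
        non_minimal |= ancestors(c) & classes
    return classes - non_minimal

def hierarchy_distance(p, q, subproperty_of):
    """Shortest path length between p and q over sub-property edges,
    ignoring edge direction; None if they are not connected."""
    graph = {}
    for child, parents in subproperty_of.items():
        for parent in parents:
            graph.setdefault(child, set()).add(parent)
            graph.setdefault(parent, set()).add(child)
    if p == q:
        return 0
    queue, seen = deque([(p, 0)]), {p}
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == q:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def hrel(p, q, subproperty_of):
    """Assumed relatedness score: 1 / (1 + shortest path length)."""
    dist = hierarchy_distance(p, q, subproperty_of)
    return 0.0 if dist is None else 1.0 / (1 + dist)

# Hypothetical hierarchies for illustration.
class_tree = {"Scientist": {"Person"}, "Person": {"Agent"}}
prop_tree = {"birthPlace": {"relatedPlace"}, "deathPlace": {"relatedPlace"}}
domain_min = minimal_classes({"Person", "Scientist", "Place"}, class_tree)
score = hrel("birthPlace", "deathPlace", prop_tree)
```

Here `Person` is dropped from the domain because its subclass `Scientist` is also present, and the two sibling properties are two hops apart through their shared parent.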
Measure 3: Lexical similarity
I-Sub
Jaro-Winkler
Levenshtein similarity
WordNet
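Of the lexical measures listed, Levenshtein similarity is the simplest to sketch. The version below assumes the common normalization 1 - distance / max(length), applied to property labels. The slide does not state which normalization was used, and the labels are made-up examples.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = cur
    return prev[-1]

def levenshtein_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

# Two hypothetical property labels.
sim = levenshtein_similarity("birthPlace", "birthDate")
```

The two labels share the prefix "birth" and differ by three edits over ten characters, so the similarity comes out at about 0.7.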
Measure evaluation
Sort collocated property pairs by the number of users who identified them
Sort the same property pairs by each measure's values
Compute Spearman's rank correlation coefficient between the two rankings
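The evaluation procedure above can be sketched as follows: rank the pairs once by user count and once by a measure's score, then take the Pearson correlation of the two rank vectors. Tied values receive their average rank, which is the usual convention; whether the study handled ties this way is not stated. The numbers are invented for illustration.

```python
import math

def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

# Invented data: users agreeing on each pair, and one measure's score per pair.
user_counts = [5, 4, 3, 2, 1]
measure_scores = [0.9, 0.8, 0.4, 0.5, 0.1]
rho = spearman(user_counts, measure_scores)
```

A coefficient close to 1 would indicate that the measure orders property pairs much as the surveyed users do. In this invented data only one adjacent pair is swapped, giving a coefficient of 0.9.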