An Interactive Approach to Collectively Resolving URI Coreference Saisai Gong, Wei Hu, Gong Cheng, Yuzhong Qu
Contents Background Related Work Overview of our Approach Evolvement of Individual Partition Computing Consensus Partition Evaluation Conclusion
Background owl:sameAs URICoreference …… http://advogato.org/person/timbl/foaf.rdf#me http://www.w3.org/People/Berners-Lee/card#i URICoreference http://data.semanticweb.org/person/tim-berners-lee …… http://dbpedia.org/resource/Tim_Berners-Lee http://dblp.l3s.de/d2r/resource/authors/Tim_Berners-Lee
Related Work Fully automatic approaches: OWL semantics, similarities between descriptions, self-training, … Automatic approaches remain far from perfect (see Ferrara et al. 2013)
Related Work (Cont.) Semi-automatic approaches: active learning, micro-task crowdsourcing, … Assumptions made by semi-automatic approaches: users act as an “oracle”; there is one single right answer. These assumptions do not always hold: users may have different opinions, and disagreement among users happens. We need to distinguish a user's individual URI coreference from that of the masses, and to resolve disagreement among users.
Our Approach: iReC. iReC: an interactive approach to collectively resolve URI coreference with user involvement. Basic idea: achieve a good partition of the URI universe. Maintain an individual partition for each user; form a consensus partition aggregated from the individual ones; evolve partitions through user interaction. Two goals: reduce user involvement; reflect the collective power of the masses.
Overview of our Approach
Candidate Selector: Generating Candidates. Find potential coreference from various sources: owl:sameAs links; existing resolution services such as sameas.org; keyword-based entity search engines such as Falcons Object Search; the user's individual partition; the consensus partition. Merge URIs belonging to the same equivalence class into a candidate entity.
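The merging step can be illustrated with a union-find structure. This is a minimal sketch, not the authors' implementation: the function name `merge_into_candidates` and the input format (a list of URI pairs gathered from the sources above) are assumptions made for illustration.

```python
from collections import defaultdict


def merge_into_candidates(coreference_pairs):
    """Group URIs connected by potential-coreference pairs into classes.

    `coreference_pairs` is a hypothetical input format: an iterable of
    (uri_a, uri_b) tuples collected from any of the candidate sources.
    """
    parent = {}

    def find(uri):
        # Register unseen URIs as their own root, then walk up with path halving.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in coreference_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # union the two classes

    classes = defaultdict(set)
    for uri in list(parent):
        classes[find(uri)].add(uri)
    # Each sorted list is one candidate entity.
    return sorted(sorted(members) for members in classes.values())


# Illustrative pairs only; which source asserts which link is not from the slides.
pairs = [
    ("http://www.w3.org/People/Berners-Lee/card#i",
     "http://dbpedia.org/resource/Tim_Berners-Lee"),
    ("http://dbpedia.org/resource/Tim_Berners-Lee",
     "http://data.semanticweb.org/person/tim-berners-lee"),
]
candidates = merge_into_candidates(pairs)
```

Here the two pairs share a URI, so all three URIs end up in a single candidate entity.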
Learning Binary Classifier To reduce user involvement. Learning model: averaged perceptron (see Collins 2002), an online learning algorithm. Learn the individual classifier both online and offline; learn the global one offline.
Learning Binary Classifier: Training data. Online: latest URI pairs from user feedback. Offline training examples. Positive: URI pairs from equivalence classes. Negative: URI pairs from user feedback; URI pairs from different equivalence classes sharing types; URI pairs from Falcons search results.
Learning Binary Classifier: Training algorithm. Features: the Cartesian product of the two candidates' properties. Feature value: for each property pair, the maximum similarity (vsim) between the two properties’ values. URIs: vsim = 1 iff identical or in the same equivalence class. Numeric literals: vsim = 1 iff the difference is less than a threshold. Boolean literals: vsim = 1 iff the values are equal. Other literals: Jaccard similarity.
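The feature computation can be sketched as follows. Several details are assumptions not stated on the slide: values are represented as hypothetical `(kind, value)` tuples, Jaccard similarity is computed over lowercased whitespace tokens, values of different kinds score 0, and the numeric threshold defaults to an arbitrary small number.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity (tokenisation is an assumption)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def vsim(v1, v2, same_class=lambda a, b: False, threshold=1e-6):
    """Similarity of two typed values, following the per-type rules."""
    kind1, val1 = v1
    kind2, val2 = v2
    if kind1 != kind2:
        return 0.0  # assumption: values of different kinds never match
    if kind1 == "uri":
        return 1.0 if val1 == val2 or same_class(val1, val2) else 0.0
    if kind1 == "number":
        return 1.0 if abs(val1 - val2) < threshold else 0.0
    if kind1 == "boolean":
        return 1.0 if val1 == val2 else 0.0
    return jaccard(str(val1), str(val2))


def feature_vector(desc1, desc2, **vsim_options):
    """One feature per property pair: the maximum vsim over their values."""
    features = {}
    for prop1, values1 in desc1.items():
        for prop2, values2 in desc2.items():
            features[(prop1, prop2)] = max(
                (vsim(a, b, **vsim_options) for a in values1 for b in values2),
                default=0.0,
            )
    return features


# Hypothetical descriptions of two candidate entities.
desc_a = {"name": [("text", "Tim Berners-Lee")], "born": [("number", 1955)]}
desc_b = {"label": [("text", "Tim Lee")], "birthYear": [("number", 1955.0)]}
features = feature_vector(desc_a, desc_b)
```

The `same_class` callback stands in for a lookup against the current equivalence classes, which the slide mentions but does not specify.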
Learning Binary Classifier Training algorithm
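For reference, a generic averaged perceptron for binary labels can be sketched as below. This is a textbook-style sketch rather than the training procedure used in iReC: the class name, the explicit bias feature, and the naive averaging (summing the weight vector after every example) are choices made here for brevity.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Binary averaged perceptron over sparse feature dictionaries."""

    BIAS = "__bias__"  # hypothetical name for the constant bias feature

    def __init__(self):
        self.weights = defaultdict(float)
        self.totals = defaultdict(float)  # running sum of weights over all steps
        self.steps = 0

    def _with_bias(self, features):
        extended = dict(features)
        extended[self.BIAS] = 1.0
        return extended

    def margin(self, features, averaged=False):
        weights = self.averaged_weights() if averaged else self.weights
        extended = self._with_bias(features)
        return sum(weights.get(f, 0.0) * v for f, v in extended.items())

    def update(self, features, label):
        """Process one example online; `label` is +1 (coreferent) or -1."""
        if label * self.margin(features) <= 0:  # mistake-driven update
            for f, v in self._with_bias(features).items():
                self.weights[f] += label * v
        self.steps += 1
        for f, w in self.weights.items():
            self.totals[f] += w

    def averaged_weights(self):
        if self.steps == 0:
            return {}
        return {f: total / self.steps for f, total in self.totals.items()}


# Toy data: one positive and one negative URI pair, repeated for ten epochs.
model = AveragedPerceptron()
examples = [({"name": 1.0}, +1), ({"name": 0.0}, -1)]
for _ in range(10):
    for feats, label in examples:
        model.update(feats, label)
```

Because each call to `update` handles a single example, the same class can be fed user feedback one pair at a time (online) or looped over a stored training set (offline).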
Selecting Most Beneficial Candidate. Combine the individual classifier and the global one by their weights α and β (α + β = 1). Confidence of coreference is based on the margin: the larger the absolute value of the margin, the higher the confidence. Uncertainty is measured by the absolute value of the margin: select the candidate with the minimum absolute value of margin.
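The selection rule can be sketched in a few lines. The function names, the dictionary representation of weight vectors, and the default α = 0.5 are assumptions for illustration; since both classifiers are linear, combining their margins is equivalent to combining their weight vectors.

```python
def linear_score(weights, features):
    """Dot product of a sparse weight vector and a sparse feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())


def combined_margin(features, individual_w, global_w, alpha=0.5):
    """Margin of the combined classifier, with beta = 1 - alpha."""
    beta = 1.0 - alpha
    return (alpha * linear_score(individual_w, features)
            + beta * linear_score(global_w, features))


def most_uncertain(candidates, individual_w, global_w, alpha=0.5):
    """Return the candidate whose combined margin is closest to zero."""
    return min(
        candidates,
        key=lambda name: abs(
            combined_margin(candidates[name], individual_w, global_w, alpha)
        ),
    )


# Hypothetical weights and candidate feature vectors.
individual_w = {"name": 2.0, "bias": -1.0}
global_w = {"name": 1.0, "bias": -1.0}
candidates = {
    "c1": {"name": 1.0, "bias": 1.0},
    "c2": {"name": 0.0, "bias": 1.0},
    "c3": {"name": 0.7, "bias": 1.0},
}
choice = most_uncertain(candidates, individual_w, global_w)
```

In this toy setup the combined margins are roughly 0.5, -1.0 and 0.05, so `c3` is the candidate presented to the user.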
Comparative Snippets To facilitate user interaction. Coreferent (resp. non-coreferent): values of discriminative property pairs are significantly similar (resp. dissimilar). Discriminability of a property pair: the absolute value of its weight in the combined classifier.
Comparative Snippets Compute a maximum weighted matching on the bipartite graph built from property pairs. Get the top-k property value pairs based on the maximum similarity of property values.
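A small sketch of the two steps follows, under stated assumptions: edge weights are taken to be the non-negative discriminability of each property pair, the matching is found by exhaustive search (only practical for a handful of properties; a production system would use a polynomial-time algorithm), and the top-k step simply sorts matched pairs by similarity in descending order. None of these choices is confirmed by the slides.

```python
from itertools import permutations


def max_weight_matching(left, right, weight):
    """Exhaustive maximum weighted bipartite matching for small inputs.

    Assumes non-negative weights, so matching every node on the smaller
    side is always optimal.
    """
    if len(left) > len(right):
        flipped = max_weight_matching(right, left, lambda r, l: weight(l, r))
        return [(l, r) for r, l in flipped]
    best, best_total = [], float("-inf")
    for perm in permutations(right, len(left)):
        pairs = list(zip(left, perm))
        total = sum(weight(l, r) for l, r in pairs)
        if total > best_total:
            best, best_total = pairs, total
    return best


def top_k_pairs(matching, similarity, k):
    """Keep the k matched property pairs with the highest value similarity."""
    ranked = sorted(matching, key=lambda p: similarity.get(p, 0.0), reverse=True)
    return ranked[:k]


# Hypothetical discriminability weights and value similarities.
disc = {("name", "label"): 2.0, ("name", "birthYear"): 0.1,
        ("born", "birthYear"): 1.5, ("born", "label"): 0.2}
matching = max_weight_matching(
    ["name", "born"], ["label", "birthYear", "homepage"],
    lambda l, r: disc.get((l, r), 0.0),
)
sims = {("name", "label"): 0.9, ("born", "birthYear"): 1.0}
snippet = top_k_pairs(matching, sims, k=1)
```

For a non-coreferent snippet, the same ranking could be reversed to surface the most dissimilar pairs; the slides do not say which variant is used.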
Computing Consensus Partition. Minimize disagreements between individual partitions; in our approach, disagreement is measured with the symmetric difference distance. Finding the optimal consensus partition is NP-complete.
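The symmetric difference distance between two partitions counts the element pairs that one partition places together and the other does not. A minimal sketch, assuming partitions are given as lists of sets and that a URI missing from a partition is treated as a singleton there (an assumption, since individual partitions may cover different domains):

```python
from itertools import combinations


def coreferent_pairs(partition):
    """All unordered URI pairs that a partition places in the same class."""
    return {
        frozenset(pair)
        for cls in partition
        for pair in combinations(sorted(cls), 2)
    }


def symmetric_difference_distance(p1, p2):
    """Number of URI pairs on which the two partitions disagree."""
    return len(coreferent_pairs(p1) ^ coreferent_pairs(p2))


def total_disagreement(consensus, individual_partitions):
    """Objective to minimise: summed distance to every individual partition."""
    return sum(
        symmetric_difference_distance(consensus, p) for p in individual_partitions
    )


p1 = [{"a", "b", "c"}, {"d"}]
p2 = [{"a", "b"}, {"c", "d"}]
distance = symmetric_difference_distance(p1, p2)
```

Here the partitions disagree on the pairs (a, c), (b, c) and (c, d), so the distance is 3.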
Computing Consensus Partition Approximation algorithm (clustering-based): compute a partition on the union of the individual partitions’ domains. First initialize a similarity matrix Mtrx = (Mtrx_ij); begin with each URI forming an equivalence class by itself; for each class pair (i, j) where Mtrx_ij > 0, merge classes i and j together and update Mtrx.
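One way to realise a clustering-based approximation of this kind is greedy agglomerative merging. The sketch below rests on assumptions that the slides do not confirm: the similarity of two URIs is the number of individual partitions that place them together minus the number that place them apart (counting only partitions containing both), class similarity is the sum over cross-class URI pairs, and the pair with the largest positive similarity is merged first. For brevity it recomputes similarities instead of maintaining an explicit matrix.

```python
from itertools import combinations


def consensus_partition(individual_partitions):
    """Greedy agglomerative consensus over partitions given as lists of sets."""
    # For every partition, map each URI to the index of its class.
    memberships = [
        {uri: idx for idx, cls in enumerate(partition) for uri in cls}
        for partition in individual_partitions
    ]
    universe = sorted(set().union(*memberships))

    def uri_similarity(u, v):
        # Assumed score: votes for "together" minus votes for "apart".
        score = 0
        for member_of in memberships:
            if u in member_of and v in member_of:
                score += 1 if member_of[u] == member_of[v] else -1
        return score

    def class_similarity(a, b):
        return sum(uri_similarity(u, v) for u in a for v in b)

    classes = [{uri} for uri in universe]  # start from singletons
    while True:
        best_score, best_pair = 0, None
        for i, j in combinations(range(len(classes)), 2):
            score = class_similarity(classes[i], classes[j])
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if best_pair is None:
            break  # no class pair with positive similarity remains
        i, j = best_pair
        classes[i] |= classes[j]
        del classes[j]
    return sorted(sorted(cls) for cls in classes)


users = [
    [{"a", "b"}, {"c"}],
    [{"a", "b", "c"}],
    [{"a", "b"}, {"c", "d"}],
]
consensus = consensus_partition(users)
```

With these three users, `a` and `b` are merged unanimously, `c` and `d` are merged on the strength of the only user who saw both, and the two resulting classes stay apart because most users separate them.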
Computing Consensus Partition
Evaluation Build links between NYT and DBpedia from the OAEI benchmark; 10-fold cross-validation.
Evaluation F-Measure
Evaluation Examination: choose 50 popular URIs from Falcons; invite 10 people to resolve URI coreference on the 50 URIs using SView. On average: 290.1 verifications, 32.0 accepted as positive; 53.9 pairs of URIs in individual partitions.
Evaluation User study SUS Vs sigma 72 vs 68
Conclusion Averaged Perceptron is feasible User involvement is reduced
References A. Ferrara, A. Nikolov, J. Noessner, and F. Scharffe. Evaluation of instance matching tools: the experience of OAEI. Journal of Web Semantics, 21:49-60, 2013. M. Collins. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In Proc. of EMNLP, pages 1-8, 2002.