TRank: Ranking Entity Types Using the Web of Data
Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer
Motivation
Types are important for entity-centric applications.
Entities can have more than one type.
Goal: identify the most relevant types.
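Task Definition
Task: entity type ranking. Given an entity e mentioned in a document d and its set of types T_e = {t_1, …, t_n} (obtained from <rdf:type> statements, also across <owl:sameAs> links), rank the types in T_e by their relevance to the textual context c_e of e in d.
Context types: three paragraphs, one paragraph, sentence, entity itself (no context).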
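The task can be pictured as two steps: cut a context c_e out of d at one of the four granularities, then sort T_e by some relevance score. The sketch below is a hypothetical illustration, not TRank's code: the document model (paragraphs as lists of sentences), the names `context_for`, `rank_types` and `label_count_score`, and the reading of "three paragraphs" as the mention's paragraph plus its two neighbours are all assumptions. The scorer is a deliberately naive placeholder for the approaches on the next slide.

```python
from typing import Callable, List, Sequence

# Hypothetical document model: a list of paragraphs, each a list of sentences.
Document = List[List[str]]


def context_for(doc: Document, para_idx: int, sent_idx: int, granularity: str) -> str:
    """Return the textual context c_e of a mention found in sentence
    `sent_idx` of paragraph `para_idx`, at one of the four granularities."""
    if granularity == "entity":
        # Entity itself: no surrounding text is used.
        return ""
    if granularity == "sentence":
        return doc[para_idx][sent_idx]
    if granularity == "paragraph":
        return " ".join(doc[para_idx])
    if granularity == "three_paragraphs":
        # Assumption: the mention's paragraph plus the one before and after it.
        lo, hi = max(0, para_idx - 1), min(len(doc), para_idx + 2)
        return " ".join(s for para in doc[lo:hi] for s in para)
    raise ValueError(f"unknown granularity: {granularity}")


def rank_types(types: Sequence[str], context: str,
               score: Callable[[str, str], float]) -> List[str]:
    """Order the types T_e by decreasing score(t, c_e); ties keep input order."""
    return sorted(types, key=lambda t: score(t, context), reverse=True)


def label_count_score(t: str, context: str) -> float:
    # Hypothetical toy scorer: occurrences of the lower-cased type label in c_e.
    return context.lower().count(t.lower())


if __name__ == "__main__":
    doc = [["Intro sentence."], ["The senator, a former lawyer, spoke.", "She left."], ["Outro."]]
    c_e = context_for(doc, 1, 0, "sentence")
    print(rank_types(["Person", "Lawyer", "Senator"], c_e, label_count_score))
    # ['Lawyer', 'Senator', 'Person']
```

Keeping the scorer as a parameter mirrors the slides' framing: the task stays fixed while the approaches differ only in how they score a type.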
Approaches
Three families of approaches:
Entity-centric: FREQ, WIKILINK, LABEL
Context-aware: SAMETYPE, PATH
Hierarchy-based: DEPTH, ANCESTORS, ANC_DEPTH
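The slide only names the methods. As a rough illustration, here is a minimal sketch of three hierarchy-based scorers and one context-aware scorer under our own reading of their names: DEPTH favours deeper (more specific) types, ANCESTORS counts how many ancestors of t also belong to T_e, ANC_DEPTH sums the depths of those ancestors, and SAMETYPE counts the other entities in c_e that share type t. These definitions, the toy type tree `PARENT` and all function names are assumptions and may differ from the paper; FREQ, WIKILINK, LABEL and PATH are not sketched.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical toy type tree (child -> parent), rooted at "Thing".
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Agent": "Thing",
    "Person": "Agent",
    "Politician": "Person",
    "President": "Politician",
    "Organisation": "Agent",
}


def ancestors(t: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """All proper ancestors of t, nearest first (unknown types have none)."""
    out: List[str] = []
    p = parent.get(t)
    while p is not None:
        out.append(p)
        p = parent.get(p)
    return out


def depth(t: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of edges from t up to the root (the root has depth 0)."""
    return len(ancestors(t, parent))


# Hierarchy-based scores (assumed definitions) for a type t of an entity with types T_e.
def score_depth(t: str, t_e: Set[str], parent: Dict[str, Optional[str]]) -> float:
    return depth(t, parent)  # deeper = more specific


def score_ancestors(t: str, t_e: Set[str], parent: Dict[str, Optional[str]]) -> float:
    return sum(1 for a in ancestors(t, parent) if a in t_e)


def score_anc_depth(t: str, t_e: Set[str], parent: Dict[str, Optional[str]]) -> float:
    return sum(depth(a, parent) for a in ancestors(t, parent) if a in t_e)


# Context-aware score (assumed definition): other entities in c_e sharing type t.
def score_sametype(t: str, context_types: Iterable[Set[str]]) -> float:
    return sum(1 for types in context_types if t in types)


def rank(t_e: Set[str], score: Callable[[str], float]) -> List[str]:
    # Sort by decreasing score; break ties alphabetically for determinism.
    return sorted(t_e, key=lambda t: (-score(t), t))


if __name__ == "__main__":
    t_e = {"Thing", "Agent", "Person", "Politician", "President"}
    print(rank(t_e, lambda t: score_depth(t, t_e, PARENT)))
    # ['President', 'Politician', 'Person', 'Agent', 'Thing']
    print(score_sametype("Person", [{"Person", "Thing"}, {"Organisation"}]))  # 1
```

Representing the hierarchy as a child-to-parent map keeps ancestor walks trivial but assumes a tree; a real type graph with multiple parents would need a DAG traversal instead.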
Experiments
Corpus: New York Times (NYT) articles published Feb 21 to Mar 7, 2013: 128 articles, 12 entities per article and, on average, 10.2 types per entity.
Ground truth: relevance scores of the types, judged via crowdsourcing.
Evaluation measures: MAP and NDCG (Cumulated gain-based evaluation of IR techniques)
4 datasets, 770 distinct entities:
Sentence: 419 sentences, 32 words and 2.45 entities on average
Paragraph: 339 paragraphs, 66 words and 2.72 entities on average
3-Paragraph: 339 three-paragraph contexts, 165 words and 11.8 entities on average
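For reference, a minimal sketch of the two measures as they are commonly computed: MAP treats the relevant types as a binary set, while NDCG uses graded relevance, such as the crowdsourced relevance scores. This uses the widespread log2(i + 1) discount; the original cumulated-gain paper defines the discount slightly differently, and the slides state neither the exact variant nor any rank cutoff. The function names and the grade values in the usage example are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """AP of one ranked type list against the set of relevant types."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, t in enumerate(ranking, start=1):
        if t in relevant:
            hits += 1
            total += hits / i  # precision at this relevant position
    return total / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: mean of AP over all (ranking, relevant set) pairs, e.g. one per entity."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def dcg(gains: Sequence[float]) -> float:
    # Gain at 1-based rank i is discounted by log2(i + 1).
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(ranking: Sequence[str], grades: Dict[str, float], k: Optional[int] = None) -> float:
    """DCG of the ranking divided by the DCG of the ideal (grade-sorted) ranking."""
    gains = [grades.get(t, 0.0) for t in ranking]
    ideal = sorted(grades.values(), reverse=True)
    if k is not None:
        gains, ideal = gains[:k], ideal[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0


if __name__ == "__main__":
    grades = {"Politician": 2.0, "Person": 1.0, "Thing": 0.0}  # hypothetical judgments
    print(round(ndcg(["Person", "Politician", "Thing"], grades), 3))  # 0.86
    print(mean_average_precision([(["Politician", "Person"], {"Politician"}),
                                  (["Person", "Politician"], {"Politician"})]))  # 0.75
```

Because NDCG normalises by the ideal ordering, scores stay comparable across entities with different numbers of types, which matters here since the number of types varies from entity to entity.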
Results
TRank on MapReduce (71 TB): cluster of 8 servers, each with 12 cores at 2.33 GHz and 32 GB of RAM.
Conclusions
Type hierarchy, regression model, interaction among entities, user impact.
Thanks!