Type-directed Topic Segmentation of Entity Descriptions 龚赛赛 2015-03-16
Contents
- Background and Motivation
- Related Work
- Method Framework
Background
- The descriptions of an entity in Linked Data usually cover more than one topic, e.g. the family and academic information of a researcher
- Topic segmentation: split entity descriptions into a sequence of topically coherent segments
- Useful for many tasks, including entity browsing and entity summarization
Background
- Manual segmentation costs people a large amount of time and energy
- Several works leverage a variety of cues for automatic segmentation, e.g. property name relatedness, property value overlap, relatedness derived from property axioms, and distributional relatedness
Motivation
- Existing works mainly follow one paradigm: first characterize the relatedness between descriptions using various measures, then derive segmentations with a clustering algorithm
- Their performance relies heavily on the settings of the clustering algorithm and remains far from perfect
Motivation
- Entity types contain important cues for segmentation, e.g. the types of Arnold Schwarzenegger: person, politician, artist, body builder, etc.
- In our work, we propose a new approach to topic segmentation for entity browsing, which uses the entity types to guide segmentation
  - Select a subset of entity types that are sensible and have a high coverage rate of the descriptions, and allocate the entity descriptions to these types to form the initial segmentations
  - Split the large segmentations among the initial ones
Related work
- Manual segmentation
  - Haystack, Marble, Fresnel
  - Costs users a lot of time and energy
- Various property relatedness measures
  - E.g. property name based, WordNet based, Wikipedia based, search engine based, value overlap, property axiom based, and so on
- Cues to determine segmentation
  - Lloyd Rutledge et al. [1] mainly use property value overlap and formal concept analysis to generate segmentations
  - FACES [2] uses topic segmentation to improve entity summarization
[1] Making RDF Presentable. WWW.
[2] FACES: Diversity-Aware Entity Summarization using Incremental Hierarchical Conceptual Clustering. AAAI.
Method Framework
- Entity description: a property with its value set
- Segmentation: a set of descriptions
- Given the set of entity descriptions of an entity, get a sequence of segmentations
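The definitions on this slide can be written down as a small data model. The following is only a sketch: the class name Description, the helper as_descriptions, and the ex: identifiers are illustrative assumptions, not names from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """One entity description: a property together with its value set."""
    prop: str
    values: frozenset


def as_descriptions(pairs):
    """Group the (property, value) pairs of one entity into descriptions."""
    grouped = {}
    for prop, value in pairs:
        grouped.setdefault(prop, set()).add(value)
    return [Description(p, frozenset(vs)) for p, vs in grouped.items()]


# Hypothetical (property, value) pairs of a single entity.
pairs = [
    ("ex:spouse", "ex:PersonA"),
    ("ex:child", "ex:PersonB"),
    ("ex:child", "ex:PersonC"),
    ("ex:party", "ex:PartyX"),
]
descs = as_descriptions(pairs)

# A segmentation is a set of descriptions, e.g. a "family" segment.
family = frozenset(d for d in descs if d.prop in {"ex:spouse", "ex:child"})
```

Making Description immutable lets descriptions be collected into sets, which matches the slide's definition of a segmentation as a set of descriptions.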
Method Framework
- Cover: a type covers a property if the property's domain is a superclass of the type
- Select a sensible subset of types with high coverage
  - The size of the subset is limited to at most k
  - A type covering a suitable number of properties is preferred, i.e. about 1/k of the properties
  - A type at a deeper position in the type hierarchy is preferred
- Each property is allocated to a type covering it
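The selection criteria on this slide can be illustrated with a plain greedy maximum-coverage heuristic. This is a stand-in, not necessarily the optimization the slides use: it honours the size limit k and prefers deeper types on ties, but it does not model the preference for types covering about 1/k of the properties. The function names, the single-parent hierarchy, and the reflexive reading of "superclass" (a type also covers properties whose domain is the type itself) are assumptions.

```python
def superclasses(t, parent):
    """Return t together with all of its superclasses (acyclic single-parent hierarchy assumed)."""
    chain = [t]
    while t in parent:
        t = parent[t]
        chain.append(t)
    return chain


def covers(t, prop, domain, parent):
    """A type covers a property if the property's domain is the type or one of its superclasses."""
    return domain[prop] in superclasses(t, parent)


def select_types(types, props, domain, parent, k):
    """Greedily pick at most k types; ties on coverage gain go to the deeper type."""
    uncovered, chosen = set(props), []
    while uncovered and len(chosen) < k:
        def score(t):
            gain = sum(covers(t, p, domain, parent) for p in uncovered)
            return (gain, len(superclasses(t, parent)))

        candidates = [t for t in types if t not in chosen]
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best)[0] == 0:  # no remaining type covers anything new
            break
        chosen.append(best)
        uncovered = {p for p in uncovered if not covers(best, p, domain, parent)}
    coverage = 1 - len(uncovered) / len(props) if props else 0.0
    return chosen, coverage


# Hypothetical toy hierarchy and property domains.
parent = {"Politician": "Person", "Artist": "Person"}
domain = {"birthDate": "Person", "party": "Politician", "film": "Artist"}
types = ["Person", "Politician", "Artist"]

chosen, coverage = select_types(types, list(domain), domain, parent, k=2)
```

In this toy input, the two subtypes together cover all three properties, so the general type Person is never needed.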
Method Framework
- EBMC
  - Property weight: 1
  - Grade of membership: based on the distance between the property domain and the type
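The slide does not give the formula for the grade of membership, so the following sketch assumes one: grade = 1 / (1 + d), where d is the number of subclass steps from the type up to the property's domain, and 0 if the domain is not a superclass of the type. The functions distance, grade and allocate and the toy hierarchy are hypothetical; the sketch only shows how a distance-based grade could drive the allocation of properties to selected types.

```python
def distance(t, dom, parent):
    """Subclass steps from type t up to dom, or None if dom is not t or a superclass of t."""
    steps = 0
    while t != dom:
        if t not in parent:
            return None
        t = parent[t]
        steps += 1
    return steps


def grade(t, dom, parent):
    """Assumed grade of membership: closer domains get grades nearer to 1."""
    d = distance(t, dom, parent)
    return 0.0 if d is None else 1.0 / (1 + d)


def allocate(props, domain, selected, parent):
    """Assign each property to the selected type with the highest grade."""
    segments = {t: [] for t in selected}
    for p in props:
        best = max(selected, key=lambda t: grade(t, domain[p], parent))
        if grade(best, domain[p], parent) > 0:  # skip properties no selected type covers
            segments[best].append(p)
    return segments


# Hypothetical toy hierarchy, property domains, and selected types.
parent = {"Politician": "Person", "Artist": "Person"}
domain = {"birthDate": "Person", "party": "Politician", "film": "Artist"}
selected = ["Politician", "Artist"]

segments = allocate(list(domain), domain, selected, parent)
```

When two selected types have the same grade for a property, as with birthDate here, this sketch keeps the first type in the list; a real implementation would need an explicit tie-breaking rule.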
Method Framework
- Split large initial segmentations
  - Linear combination of measures, followed by clustering
  - Measures: property name I-Sub, WordNet relatedness, property value overlap, distributional relatedness
  - Clustering algorithm: DBSCAN
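The splitting step can be sketched as a weighted sum of relatedness measures turned into a distance and fed to DBSCAN. Several simplifications are assumed here: only two of the four measures are included, the name measure uses difflib's ratio as a stand-in for I-Sub, value overlap is taken to be Jaccard overlap, and the weights, eps and min_pts are arbitrary illustrative values. The DBSCAN below is a compact textbook version written out so the example has no external dependencies.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Stand-in for I-Sub: difflib's similarity ratio, NOT the I-Sub metric itself."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_overlap(va, vb):
    """Jaccard overlap of two value sets (an assumed definition of value overlap)."""
    union = va | vb
    return len(va & vb) / len(union) if union else 0.0


def relatedness(d1, d2, w_name=0.5, w_value=0.5):
    """Linear combination of the measures; the weights are hypothetical."""
    (p1, v1), (p2, v2) = d1, d2
    return w_name * name_similarity(p1, p2) + w_value * value_overlap(v1, v2)


def dbscan(items, dist, eps, min_pts):
    """Textbook DBSCAN; returns one cluster label per item, -1 meaning noise."""
    labels = [None] * len(items)

    def neighbours(i):
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical descriptions of one large initial segmentation: (property, value set).
descs = [
    ("birthPlace", frozenset({"ex:CityA"})),
    ("homeTown", frozenset({"ex:CityA"})),
    ("party", frozenset({"ex:PartyX"})),
]
labels = dbscan(descs, lambda a, b: 1 - relatedness(a, b), eps=0.6, min_pts=2)
```

In this toy input the first two descriptions share a value and end up in the same cluster, while the third is left as noise; how noise points are turned into segments is a design choice the slides do not specify.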