1
Generating and Characterizing Gold-Standard Entity Summaries: A Study of DBpedia
Danyun Xu, Gong Cheng*, Yuzhong Qu
2
Introduction
Related Work
Data Set
Generating Gold-Standard Entity Summaries
Characterizing Gold-Standard Entity Summaries
Conclusion
3
Introduction
Why: Entity-centric structured data (Google’s Knowledge Graph); entity summarization; lack of gold-standard entity summaries in evaluation
What: Present and evaluate several algorithms for automatically generating (near-)gold-standard summaries; characterize the generated gold-standard summaries
5
Related Work
Algorithms: rank properties; rank features
Evaluation: intrinsic method; extrinsic method
7
Data Set
DBpedia: English version of DBpedia 3.7 (wiki.dbpedia.org/Downloads37); 42.3 million RDF triples, 3.77 million entities
Classes: 10 classes, almost pairwise disjoint
9
Generating Gold-Standard Entity Summaries
Basic idea (extended abstracts): automatically identify the features of an entity that are mentioned in its textual abstract
Algorithms
Evaluation
10
Generating Gold-Standard Entity Summaries
Algorithms
Preprocess: remove “;” … and “the” …; split phrases (PopulatedPlace → Populated Place); lowercase; optional stemming
Identify: SEQ (a sequence); SET_ALL (a set, all the words); SET_ANY (a set, any word)
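The preprocessing steps and the three matching modes can be illustrated with a minimal Python sketch. The slide only names the modes, so the semantics below are an assumed reading: SEQ requires the feature's words to appear as a contiguous sequence in the abstract, SET_ALL requires all of them to appear anywhere, and SET_ANY requires at least one. The stop-word list, the function names `preprocess` and `matches`, and the sample sentence are hypothetical; optional stemming is omitted.

```python
import re

# Assumed stop-word list; the slide only shows "the" followed by an ellipsis.
STOP_WORDS = {"the", "a", "an", "of"}

def preprocess(text):
    """Split camel-case phrases, drop punctuation, lowercase, remove stop words."""
    # "PopulatedPlace" -> "Populated Place"
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    # Replace punctuation such as ";" with spaces, then lowercase.
    text = re.sub(r"[^\w\s]", " ", text).lower()
    return [word for word in text.split() if word not in STOP_WORDS]

def matches(feature_words, abstract_words, mode):
    """Return True if the feature is considered mentioned in the abstract."""
    if not feature_words:
        return False
    if mode == "SEQ":
        # The feature words must occur as a contiguous run, in order.
        n = len(feature_words)
        return any(
            abstract_words[i:i + n] == feature_words
            for i in range(len(abstract_words) - n + 1)
        )
    if mode == "SET_ALL":
        # Every feature word must occur somewhere in the abstract.
        return set(feature_words) <= set(abstract_words)
    if mode == "SET_ANY":
        # At least one feature word must occur in the abstract.
        return bool(set(feature_words) & set(abstract_words))
    raise ValueError(f"unknown mode: {mode}")

# Toy abstract, not taken from DBpedia.
abstract = preprocess("Nanjing is the capital of Jiangsu province; a PopulatedPlace in China.")
print(matches(preprocess("capital of Jiangsu"), abstract, "SEQ"))
print(matches(preprocess("Jiangsu capital"), abstract, "SET_ALL"))
print(matches(preprocess("capital city"), abstract, "SET_ANY"))
```

Under this reading the modes are progressively more permissive: any feature matched by SEQ is also matched by SET_ALL, and any feature matched by SET_ALL is also matched by SET_ANY.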
11
Generating Gold-Standard Entity Summaries
Evaluation
10 entities from each class; each entity has more than 10 features
Manually constructed gold-standard entity summaries
13
Characterizing Gold-Standard Entity Summaries
Lengths
Preference for Properties
Preference for Diverse Properties
Preference for Property Pairs
Preference for Property Values
14
Characterizing Gold-Standard Entity Summaries
Length
Set maximum length
Length varies widely
Ratio in a narrower range
15
Characterizing Gold-Standard Entity Summaries
Preference for Properties: name length; popularity; variety
16
Characterizing Gold-Standard Entity Summaries
Preference for Properties
Name length: properties with short names are preferable
17
Characterizing Gold-Standard Entity Summaries
Preference for Properties
Popularity: measured on the Web (Bing) and in the data set
Properties frequently seen in the data set are considerably preferable, whereas the Web-based popularity of a property does not seem to be a strong indicator of preference.
(Chart panels: Data set; Web)
18
Characterizing Gold-Standard Entity Summaries
Preference for Properties
Variety: “familyName” vs “gender”
The variety and popularity of a property in the data set are equally effective indicators of preference.
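As a rough illustration of the two data-set indicators, the sketch below counts, for each property, how many triples use it and how many distinct values it takes. These definitions are assumptions, since the slides do not give the exact measures: popularity is taken as triple frequency and variety as the number of distinct values. The function name `property_statistics` and the toy triples are hypothetical.

```python
from collections import Counter, defaultdict

def property_statistics(triples):
    """Return (popularity, variety) per property for (subject, property, value) triples."""
    popularity = Counter()            # assumed: number of triples using the property
    distinct_values = defaultdict(set)
    for _subject, prop, value in triples:
        popularity[prop] += 1
        distinct_values[prop].add(value)
    # Assumed: variety is the number of distinct values the property takes.
    variety = {prop: len(values) for prop, values in distinct_values.items()}
    return popularity, variety

# Toy data: "gender" takes few distinct values, "familyName" takes many.
triples = [
    ("e1", "gender", "male"), ("e2", "gender", "male"), ("e3", "gender", "female"),
    ("e1", "familyName", "Smith"), ("e2", "familyName", "Jones"), ("e3", "familyName", "Lee"),
]
popularity, variety = property_statistics(triples)
print(popularity["gender"], variety["gender"])
print(popularity["familyName"], variety["familyName"])
```

In this toy data both properties are equally popular, but “familyName” shows higher variety than “gender”, which is the contrast the slide's example pair suggests.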
19
Characterizing Gold-Standard Entity Summaries
Preference for Diverse Properties
Diversify a summary
Measures: (1) the number of distinct properties in the summary / the number of distinct properties in the original description; (2) the number of distinct properties in the summary / the number of features in the summary
Gold-standard entity summaries are highly diversified.
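The two diversity ratios on this slide can be computed directly once a summary and a description are represented as lists of (property, value) features. The sketch below assumes that representation; the function name `diversity_ratios` and the sample features are hypothetical.

```python
def diversity_ratios(summary, description):
    """Compute the two diversity ratios for lists of (property, value) features."""
    summary_props = {prop for prop, _value in summary}
    description_props = {prop for prop, _value in description}
    # Distinct properties in the summary / distinct properties in the description.
    property_coverage = len(summary_props) / len(description_props)
    # Distinct properties in the summary / number of features in the summary.
    distinctness = len(summary_props) / len(summary)
    return property_coverage, distinctness

# Toy description with three distinct properties; the summary picks two features.
description = [
    ("birthPlace", "Nanjing"), ("birthPlace", "China"),
    ("occupation", "Writer"), ("gender", "male"),
]
summary = [("birthPlace", "Nanjing"), ("occupation", "Writer")]
coverage, distinctness = diversity_ratios(summary, description)
print(coverage, distinctness)
```

A second ratio of 1.0 means that no property is repeated within the summary, which is what a highly diversified summary looks like under this measure.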
20
Characterizing Gold-Standard Entity Summaries
Preference for Property Pairs: string similarity; co-occurrence
21
Characterizing Gold-Standard Entity Summaries
Preference for Property Pairs
String similarity: string similarity is not an effective indicator.
Co-occurrence: a pair of properties that frequently co-occur in the data set also tends to be selected; the Web-based degree of co-occurrence is a notable indicator of preference.
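One plausible way to measure data-set co-occurrence is to count, for each pair of properties, the number of entities described by both. The slides do not specify the measure, so this is an assumed definition; the function name `cooccurrence_counts` and the toy triples are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_counts(triples):
    """Count, per property pair, the entities whose descriptions use both properties."""
    properties_by_entity = defaultdict(set)
    for subject, prop, _value in triples:
        properties_by_entity[subject].add(prop)
    counts = defaultdict(int)
    for props in properties_by_entity.values():
        # Sort so that each unordered pair has a single canonical key.
        for pair in combinations(sorted(props), 2):
            counts[pair] += 1
    return dict(counts)

# Toy data: birthDate and birthPlace appear together on two entities.
triples = [
    ("e1", "birthDate", "1980"), ("e1", "birthPlace", "Nanjing"),
    ("e2", "birthDate", "1975"), ("e2", "birthPlace", "Beijing"), ("e2", "team", "A"),
    ("e3", "team", "B"),
]
counts = cooccurrence_counts(triples)
print(counts[("birthDate", "birthPlace")])
```

Raw counts favor popular properties, so a normalized variant (for example, dividing by the number of entities that use either property) may be a better fit depending on the analysis.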
22
Characterizing Gold-Standard Entity Summaries
Preference for Property Values
Informativeness: the results confirm the effectiveness of selecting rarely seen features for a summary.
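A common way to quantify how rarely a feature is seen is its self-information: the logarithm of the inverse fraction of entities that carry the feature. The slides do not state the formula used, so the sketch below is an assumption; the function name `informativeness` and the toy triples are hypothetical.

```python
import math
from collections import Counter

def informativeness(triples):
    """Score each (property, value) feature by log(N / n), an assumed rarity measure.

    N is the number of entities and n the number of entities carrying the feature.
    """
    unique_triples = set(triples)  # count each feature at most once per entity
    entity_count = len({subject for subject, _prop, _value in unique_triples})
    feature_counts = Counter((prop, value) for _subject, prop, value in unique_triples)
    return {
        feature: math.log(entity_count / count)
        for feature, count in feature_counts.items()
    }

# Toy data: every entity is male, but only one has the family name "Smith".
triples = [
    ("e1", "gender", "male"), ("e2", "gender", "male"),
    ("e3", "gender", "male"), ("e4", "gender", "male"),
    ("e1", "familyName", "Smith"),
]
scores = informativeness(triples)
print(scores[("gender", "male")], scores[("familyName", "Smith")])
```

A feature shared by every entity scores zero, while a feature seen on a single entity scores highest, so ranking by this score favors rarely seen features.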
24
Conclusion
Contribution
Limitation: the approach can hardly be applied to another data set that provides no textual abstract
Future work: optimize the algorithm to generate more natural summaries; explore other factors
25
Thanks!