Danyun Xu, Gong Cheng*, Yuzhong Qu


1 Generating and Characterizing Gold-Standard Entity Summaries: A Study of DBpedia
Danyun Xu, Gong Cheng*, Yuzhong Qu

2 Introduction Related Work Data Set Generating Gold-Standard Entity Summaries Characterizing Gold-Standard Entity Summaries Conclusion

3 Introduction
Why: Entity-centric structured data (Google’s Knowledge Graph); entity summarization; lack of gold-standard entity summaries for evaluation.
What: Present and evaluate several algorithms for automatically generating (near-)gold-standard summaries; characterize the generated gold-standard summaries.

4 Introduction Related Work Data Set Generating Gold-Standard Entity Summaries Characterizing Gold-Standard Entity Summaries Conclusion

5 Related Work
Algorithms: rank properties; rank features.
Evaluation: intrinsic method; extrinsic method.

6 Introduction Related Work Data Set Generating Gold-Standard Entity Summaries Characterizing Gold-Standard Entity Summaries Conclusion

7 Data Set
DBpedia: English version of DBpedia 3.7 (wiki.dbpedia.org/Downloads37); 42.3 million RDF triples, 3.77 million entities.
Class: 10 classes, almost pairwise disjoint.

8 Introduction Related Work Data Set Generating Gold-Standard Entity Summaries Characterizing Gold-Standard Entity Summaries Conclusion

9 Generating Gold-Standard Entity Summaries
Basic Idea (Extended Abstracts): Automatically identify the features of an entity that are mentioned in its textual abstract.
Algorithms
Evaluation

10 Generating Gold-Standard Entity Summaries
Algorithms
Preprocess: remove “;” … and “the” …; split phrases (PopulatedPlace → Populated Place); lowercase; optional stemming.
Identify: SEQ (a sequence); SET_ALL (a set, all the words); SET_ANY (a set, any word).
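The preprocessing and matching steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `preprocess` and `matches`, the stop-word list, and the choice to match a feature's text against the abstract are all assumptions, and the optional stemming step is omitted.

```python
import re

# Assumed stop-word list; the slide only names "the" as an example.
STOPWORDS = {"the", "a", "an"}


def preprocess(text):
    """Split camel-case phrases, drop punctuation, lowercase, remove stop words."""
    # "PopulatedPlace" -> "Populated Place"
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    # Replace punctuation such as ";" with spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    return [w for w in text.lower().split() if w not in STOPWORDS]


def matches(feature_text, abstract, mode):
    """Decide whether a feature is mentioned in an abstract.

    SEQ:     the feature's words appear as a contiguous sequence.
    SET_ALL: all of the feature's words appear, in any order.
    SET_ANY: at least one of the feature's words appears.
    """
    f = preprocess(feature_text)
    a = preprocess(abstract)
    if not f:
        return False
    if mode == "SEQ":
        n = len(f)
        return any(a[i:i + n] == f for i in range(len(a) - n + 1))
    if mode == "SET_ALL":
        return set(f) <= set(a)
    if mode == "SET_ANY":
        return bool(set(f) & set(a))
    raise ValueError(f"unknown mode: {mode}")
```

Under this reading, SEQ is the strictest variant and SET_ANY the most permissive, so the three trade precision against recall when compared with manually built summaries.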

11 Generating Gold-Standard Entity Summaries
Evaluation: 10 entities from each class, each entity with more than 10 features; gold-standard entity summaries constructed manually.

12 Introduction Related Work Data Set Generating Gold-Standard Entity Summaries Characterizing Gold-Standard Entity Summaries Conclusion

13 Characterizing Gold-Standard Entity Summaries
Lengths; Preference for Properties; Preference for Diverse Properties; Preference for Property Pairs; Preference for Property Values

14 Characterizing Gold-Standard Entity Summaries
Length: Set maximum length. Length varies widely; ratio in a narrower range.

15 Characterizing Gold-Standard Entity Summaries
Preference for Properties: name length; popularity; variety.

16 Characterizing Gold-Standard Entity Summaries
Preference for Properties: Name length. Properties with short names are preferable.

17 Characterizing Gold-Standard Entity Summaries
Preference for Properties: Popularity, measured on the Web (Bing) and in the data set. Properties frequently seen in the data set are considerably preferable; the Web-based popularity of a property does not seem to be a strong indicator of preference.

18 Characterizing Gold-Standard Entity Summaries
Preference for Properties: Variety (“familyName” vs. “gender”). The variety and popularity of a property in the data set are equally effective indicators of preference.

19 Characterizing Gold-Standard Entity Summaries
Preference for Diverse Properties: diversify a summary. Two ratios: (1) the number of distinct properties in the summary / the number of distinct properties in the original description; (2) the number of distinct properties / the number of features in the summary. Gold-standard entity summaries are highly diversified.
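The two diversity ratios named on this slide can be computed as in the sketch below. It assumes that a feature is represented as a `(property, value)` tuple; that representation and the function name `diversity_ratios` are hypothetical, chosen only for illustration.

```python
def diversity_ratios(summary, description):
    """Return the two diversity ratios for one entity.

    summary, description: non-empty lists of (property, value) tuples,
    where the summary is a subset of the entity's full description.
    """
    summary_props = {prop for prop, _ in summary}
    description_props = {prop for prop, _ in description}
    # Ratio 1: distinct properties in the summary over distinct
    # properties in the original description.
    coverage = len(summary_props) / len(description_props)
    # Ratio 2: distinct properties in the summary over the number
    # of features in the summary.
    distinctness = len(summary_props) / len(summary)
    return coverage, distinctness
```

A summary in which every feature uses a different property has a second ratio of 1.0, which is the sense in which a summary is fully diversified.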

20 Characterizing Gold-Standard Entity Summaries
Preference for Property Pairs: string similarity; co-occurrence.

21 Characterizing Gold-Standard Entity Summaries
Preference for Property Pairs
String similarity: string similarity is not an effective indicator.
Co-occurrence: a pair of properties that frequently co-occur in the data set also tends to be selected; the Web-based degree of co-occurrence is a notable indicator of preference.

22 Characterizing Gold-Standard Entity Summaries
Preference for Property Values: Informativeness. The results confirm the effectiveness of selecting rarely seen features into a summary.
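The slide does not give a formula for informativeness. One common way to score how rarely a feature is seen is its self-information over the data set, sketched below as an assumption rather than as the measure used in the study; `feature_counts` and `informativeness` are hypothetical names.

```python
import math
from collections import Counter


def feature_counts(descriptions):
    """Count, for each (property, value) feature, how many entities have it.

    descriptions: dict mapping an entity id to its list of features.
    """
    counts = Counter()
    for features in descriptions.values():
        counts.update(set(features))  # count each feature once per entity
    return counts


def informativeness(feature, counts, num_entities):
    """Self-information of a feature: higher means more rarely seen.

    Assumes the feature occurs at least once in the data set.
    """
    return -math.log(counts[feature] / num_entities)
```

With such a score, a feature shared by most entities of a class (a common gender value, say) ranks below a feature that only a few entities have.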

23 Introduction Related Work Data Set Generating Gold-Standard Entity Summaries Characterizing Gold-Standard Entity Summaries Conclusion

24 Conclusion
Contribution
Shortage: can hardly be applied to another data set that provides no textual abstract.
Future work: optimize the algorithm to generate more natural summaries; explore other factors.

25 Thanks!

