Danyun Xu, Gong Cheng*, Yuzhong Qu


Generating and Characterizing Gold-Standard Entity Summaries: A Study of DBpedia
Danyun Xu, Gong Cheng*, Yuzhong Qu

Introduction
Related Work
Data Set
Generating Gold-Standard Entity Summaries
Characterizing Gold-Standard Entity Summaries
Conclusion

Introduction
  Why
    Entity-centric structured data: Google's Knowledge Graph
    Entity summarization
    Gold-standard entity summaries are lacking for evaluation
  What
    Present and evaluate several algorithms for automatically generating (near-)gold-standard summaries
    Characterize the generated gold-standard summaries

Related Work
  Algorithms
    Rank properties
    Rank features
  Evaluation
    Intrinsic method
    Extrinsic method

Data Set
  DBpedia
    English version of DBpedia 3.7 (wiki.dbpedia.org/Downloads37)
    42.3 million RDF triples, 3.77 million entities
  Classes
    10 classes
    Almost pairwise disjoint

Generating Gold-Standard Entity Summaries
  Basic idea (extended abstracts)
    Automatically identify the features of an entity that are mentioned in its textual abstract
  Algorithms
  Evaluation

Generating Gold-Standard Entity Summaries
  Algorithms
    Preprocess
      Remove “;” … and “the” …
      Split phrases: PopulatedPlace → Populated Place
      Lowercase
      Optional stemming
    Identify
      SEQ: a sequence
      SET_ALL: a set, all the words
      SET_ANY: a set, any word
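
The preprocessing and identification steps above can be sketched roughly as follows. This is only one reading of the slide, not the authors' implementation: it assumes SEQ means the feature's words occur as a contiguous sequence in the abstract, SET_ALL means every word occurs somewhere, and SET_ANY means at least one word occurs. The names `preprocess`, `is_mentioned`, and `STOP_WORDS` are hypothetical, the stop-word list is invented beyond "the", and stemming is left as an optional caller-supplied function.

```python
import re

# Assumed stop-word list; the slide only names "the" explicitly.
STOP_WORDS = {"the", "a", "an", "of"}


def preprocess(text, stem=None):
    """Split camel-case phrases, lowercase, drop punctuation and stop words."""
    # "PopulatedPlace" -> "Populated Place"
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    # Lowercase and keep alphanumeric tokens only (this drops ";" and similar).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem is not None:  # optional stemming, supplied by the caller
        tokens = [stem(t) for t in tokens]
    return tokens


def contains_sequence(haystack, needle):
    """True if `needle` occurs as a contiguous run inside `haystack`."""
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


def is_mentioned(feature_text, abstract, mode="SEQ", stem=None):
    """Decide whether a feature is mentioned in an abstract under one mode."""
    feature = preprocess(feature_text, stem)
    words = preprocess(abstract, stem)
    if not feature:
        return False
    if mode == "SEQ":
        return contains_sequence(words, feature)
    if mode == "SET_ALL":
        return set(feature) <= set(words)
    if mode == "SET_ANY":
        return bool(set(feature) & set(words))
    raise ValueError(f"unknown mode: {mode}")


abstract = "Nanjing is the capital of Jiangsu; it is a populated place in China."
print(is_mentioned("PopulatedPlace", abstract, "SEQ"))
print(is_mentioned("Jiangsu Province", abstract, "SET_ANY"))
```

Under this reading SEQ is the strictest mode and SET_ANY the most permissive, so the three modes trade precision against recall.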

Generating Gold-Standard Entity Summaries
  Evaluation
    10 entities from each class; each entity has more than 10 features
    Manually constructed gold-standard entity summaries

Characterizing Gold-Standard Entity Summaries
  Lengths
  Preference for Properties
  Preference for Diverse Properties
  Preference for Property Pairs
  Preference for Property Values

Characterizing Gold-Standard Entity Summaries
  Length
    Set maximum length
    Length varies widely
    Ratio falls in a narrower range

Characterizing Gold-Standard Entity Summaries
  Preference for Properties
    Name length
    Popularity
    Variety

Characterizing Gold-Standard Entity Summaries
  Preference for Properties
    Name length
      Properties with short names are preferable

Characterizing Gold-Standard Entity Summaries
  Preference for Properties
    Popularity, measured on the Web (Bing) and in the data set
      Properties frequently seen in the data set are considerably preferable
      Web-based popularity of a property does not seem to be a strong indicator of preference
    (Figure panels: Data set; Web)

Characterizing Gold-Standard Entity Summaries
  Preference for Properties
    Variety
      “familyName” vs “gender”
      The variety and popularity of a property in the data set are equally effective indicators of preference
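
As a rough illustration of the two data-set indicators, the sketch below counts them over a list of triples. The slides do not define either measure, so the definitions here are assumptions: popularity is taken as the number of distinct entities that use a property, and variety as the number of distinct values the property takes (which is why "familyName" would score higher than "gender"). The function name `property_stats` and the sample triples are hypothetical.

```python
from collections import defaultdict


def property_stats(triples):
    """Return assumed popularity and variety counts for each property.

    popularity: number of distinct subjects that use the property (assumed)
    variety:    number of distinct values of the property (assumed)
    """
    subjects = defaultdict(set)
    values = defaultdict(set)
    for subject, prop, value in triples:
        subjects[prop].add(subject)
        values[prop].add(value)
    return {
        prop: {"popularity": len(subjects[prop]), "variety": len(values[prop])}
        for prop in subjects
    }


# Invented sample data, for illustration only.
triples = [
    ("e1", "gender", "male"),
    ("e2", "gender", "male"),
    ("e3", "gender", "female"),
    ("e1", "familyName", "Xu"),
    ("e2", "familyName", "Cheng"),
    ("e3", "familyName", "Qu"),
]
stats = property_stats(triples)
print(stats["gender"], stats["familyName"])
```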

Characterizing Gold-Standard Entity Summaries
  Preference for Diverse Properties
    Diversify a summary
      Number of distinct properties in the summary / number of distinct properties in the original description
      Number of distinct properties in the summary / number of features in the summary
    Gold-standard entity summaries are highly diversified
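
The two diversity ratios can be computed as in the minimal sketch below. It assumes a feature is a (property, value) pair and that a summary is a subset of the entity's original description; the function name `diversity_ratios` and the sample data are hypothetical.

```python
def diversity_ratios(summary, description):
    """Compute the two diversity ratios for a summary.

    Both arguments are lists of (property, value) features.
    Returns (coverage, distinctness):
      coverage     = distinct properties in summary / distinct properties in description
      distinctness = distinct properties in summary / number of features in summary
    """
    summary_props = {prop for prop, _ in summary}
    description_props = {prop for prop, _ in description}
    coverage = len(summary_props) / len(description_props)
    distinctness = len(summary_props) / len(summary)
    return coverage, distinctness


# Invented sample data, for illustration only.
description = [
    ("type", "City"),
    ("country", "China"),
    ("populationTotal", "8000000"),
    ("leaderName", "A"),
    ("leaderName", "B"),
]
summary = [("type", "City"), ("country", "China"), ("leaderName", "A")]
coverage, distinctness = diversity_ratios(summary, description)
print(coverage, distinctness)
```

A distinctness of 1.0 means no property is repeated within the summary, which is what a highly diversified summary would look like under this measure.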

Characterizing Gold-Standard Entity Summaries
  Preference for Property Pairs
    String similarity
    Co-occurrence

Characterizing Gold-Standard Entity Summaries
  Preference for Property Pairs
    String similarity
      String similarity is not an effective indicator
    Co-occurrence
      A pair of properties that frequently co-occur in the data set also tend to be selected
      Web-based degree of co-occurrence is a notable indicator of preference

Characterizing Gold-Standard Entity Summaries
  Preference for Property Values
    Informativeness
      The results confirm the effectiveness of selecting rarely seen features into a summary
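
The slide does not say how informativeness is computed. One common formulation, used here purely as an assumption, is self-information: the negative log of the fraction of entities whose description contains the feature, so that rarely seen features score higher. The function name `informativeness` and the sample descriptions are hypothetical.

```python
import math


def informativeness(feature, descriptions):
    """Self-information of a feature (an assumed measure, not from the slides).

    `descriptions` maps each entity to the set of its (property, value) features.
    A feature held by fewer entities gets a higher score.
    """
    holders = sum(1 for features in descriptions.values() if feature in features)
    if holders == 0:
        raise ValueError("feature does not occur in any description")
    return -math.log(holders / len(descriptions))


# Invented sample data, for illustration only.
descriptions = {
    "e1": {("type", "Person"), ("birthPlace", "Nanjing")},
    "e2": {("type", "Person"), ("birthPlace", "Beijing")},
    "e3": {("type", "Person"), ("birthPlace", "Beijing")},
    "e4": {("type", "Person"), ("birthPlace", "Shanghai")},
}
common = informativeness(("type", "Person"), descriptions)
rare = informativeness(("birthPlace", "Nanjing"), descriptions)
print(common, rare)
```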

Conclusion
  Contribution
  Limitation
    The approach can hardly be applied to another data set that provides no textual abstract
  Future work
    Optimize the algorithm to generate more natural summaries
    Explore other factors

Thanks!