Published by Roland Knight. Modified over 9 years ago.
1
Illuminate Knowledge Elements in Geoscience Literature
Xiaogang (Marshall) Ma, Jin Guang Zheng, Han Wang, Peter Fox
Tetherless World Constellation (TWC), Rensselaer Polytechnic Institute
@MarshallXMa, max7@rpi.edu
2
Challenge and objective
Vast amounts of dark data are hidden in geoscience literature
Illuminate the knowledge framework in documents
  – Entities and relationships
Use knowledge bases to facilitate entity recognition and linking
  – Ontologies and vocabularies
Images from: sciencemag.org and gravity.com
3
Approach
An unsupervised collective inference approach
  – Link entity mentions in texts to entities in a knowledge base
[Figure: processing pipeline with the components Document, Mention Extraction, Entity Mentions, Knowledge Base, Surface Form Dictionary, Candidate Retrieval, Candidate Entities, Non-collective Ranking, Context Analysis, Document Graph, Collective Inference, Final Entities (Zheng et al., 2014)]
4
Extract entity mentions
Mention Extraction
  – Uses a publicly available name tagger and regular expressions to extract entity mentions
[Figure: pipeline components shown: Document, Mention Extraction, Entity Mentions]
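The regular-expression half of mention extraction can be sketched as follows. The slides do not give the actual expressions or name the tagger, so the pattern below (runs of capitalized words) and the function name `extract_mentions` are assumptions made purely for illustration.

```python
import re

# Hypothetical pattern: one or more consecutive capitalized words,
# e.g. "Jurassic" or "Colorado Plateau". Not the pattern used in the talk.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_mentions(text):
    """Return (surface_form, start_offset) pairs for every regex match."""
    return [(m.group(0), m.start()) for m in CAPITALIZED_RUN.finditer(text)]


sample = "sandstone of the Colorado Plateau was deposited during the Jurassic."
mentions = extract_mentions(sample)
```

A pattern this naive would also pick up sentence-initial words, which is one reason a pipeline like the one described would pair regular expressions with a trained name tagger.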
5
Retrieve entity candidates
Surface form
  – Textual appearance of entities / mentions
Surface Form Dictionary
  – Structure: f: a surface form; {e1, e2, e3 …}: the entities that have that surface form
  – e.g. [example not preserved]
Candidate Retrieval
  – Retrieve all entities with a surface form similar to the mention’s surface form
[Figure: pipeline components shown: Document, Mention Extraction, Entity Mentions, Knowledge Base, Surface Form Dictionary, Candidate Retrieval, Candidate Entities]
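A minimal sketch of a surface form dictionary and similarity-based candidate retrieval is given here. The entity identifiers, the dictionary contents, and the use of `difflib` string similarity with a 0.8 cutoff are all assumptions; the slides do not say which similarity measure was used.

```python
from difflib import get_close_matches

# Hypothetical dictionary: surface form f -> set of entities {e1, e2, ...}.
SURFACE_FORMS = {
    "jurassic": {"gts:Jurassic"},
    "cambrian": {"gts:Cambrian", "place:Cambrian_Mountains"},
    "permian": {"gts:Permian"},
}


def retrieve_candidates(mention, cutoff=0.8):
    """Collect the entities of every surface form similar to the mention."""
    key = mention.lower()
    similar = get_close_matches(key, list(SURFACE_FORMS), n=5, cutoff=cutoff)
    candidates = set()
    for form in similar:
        candidates |= SURFACE_FORMS[form]
    return candidates


exact = retrieve_candidates("Cambrian")   # ambiguous: two candidate entities
fuzzy = retrieve_candidates("Jurasic")    # misspelling still finds a candidate
```

The ambiguous result for "Cambrian" is what the later ranking and collective inference steps have to resolve.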
6
Non-collective ranking of candidate entities
Pre-rank candidate entities retrieved from the knowledge base
  – An entropy-based non-collective approach
Use properties and objects associated with the candidate entities
  – Assign entities with higher popularity a higher score
[Figure: pipeline components shown: Document, Mention Extraction, Entity Mentions, Knowledge Base, Surface Form Dictionary, Candidate Retrieval, Candidate Entities, Non-collective Ranking]
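The slide does not give the scoring formula, so the following is only one possible reading of an entropy-based popularity score, not the method of Zheng et al. As an assumption, each property is weighted by the Shannon entropy of its object values across the knowledge base, and every triple attached to a candidate adds `1 + entropy` to its score, so entities with more (and more informative) properties rank higher. The triples and identifiers are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (entity, property, object) triples standing in for a knowledge base.
TRIPLES = [
    ("gts:Cambrian", "rank", "Period"),
    ("gts:Cambrian", "partOf", "gts:Paleozoic"),
    ("gts:Cambrian", "follows", "gts:Ediacaran"),
    ("place:Cambrian_Mountains", "rank", "Place"),
    ("gts:Permian", "rank", "Period"),
    ("gts:Permian", "partOf", "gts:Paleozoic"),
]


def property_entropy(triples):
    """Shannon entropy (in bits) of the object distribution of each property."""
    objects_by_property = defaultdict(Counter)
    for _, prop, obj in triples:
        objects_by_property[prop][obj] += 1
    entropy = {}
    for prop, counts in objects_by_property.items():
        total = sum(counts.values())
        entropy[prop] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def prerank(candidates, triples):
    """Score each candidate independently and sort by descending score."""
    entropy = property_entropy(triples)
    scores = {entity: 0.0 for entity in candidates}
    for entity, prop, _ in triples:
        if entity in scores:
            scores[entity] += 1.0 + entropy[prop]  # assumed weighting
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = prerank({"gts:Cambrian", "place:Cambrian_Mountains"}, TRIPLES)
```

The ranking is non-collective in the sense that each candidate's score depends only on its own triples, never on the other mentions in the document.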
7
Collective inference of candidate entities
Context Analysis
  – Sentence level: Terms appearing in the same sentence are related to each other
  – Paragraph level: Terms appearing in the same paragraph are related to each other
Collective Approach
  – Analyze several mentions in a context simultaneously to determine the best reference entities
  – Both the document graph and the graph of candidate entities contain important contextual information about mentions and entities
[Figure: full pipeline, adding Context Analysis, Document Graph, Collective Inference, and Final Entities]
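A toy sketch of sentence-level context analysis and collective inference follows. It is a simplification under stated assumptions, not the authors' algorithm: the document graph links mentions that share a sentence, and an exhaustive search picks the joint assignment that maximizes the sum of per-entity prior scores plus a bonus of 1.0 for each linked mention pair whose chosen entities are related in the knowledge base. The priors, the `related` set, and the bonus value are hypothetical.

```python
from itertools import combinations, product


def document_graph(sentences):
    """Link two mentions when they appear in the same sentence."""
    edges = set()
    for mentions in sentences:
        for pair in combinations(sorted(set(mentions)), 2):
            edges.add(frozenset(pair))
    return edges


def collective_link(candidates, prior, related, edges):
    """Choose one entity per mention, scoring all mentions jointly."""
    mentions = sorted(candidates)
    best, best_score = None, float("-inf")
    for choice in product(*(candidates[m] for m in mentions)):
        assignment = dict(zip(mentions, choice))
        score = sum(prior[entity] for entity in choice)
        for edge in edges:
            a, b = tuple(edge)
            if frozenset((assignment[a], assignment[b])) in related:
                score += 1.0  # assumed coherence bonus
        if score > best_score:
            best, best_score = assignment, score
    return best


# Hypothetical inputs: mentions per sentence, candidates, priors, KB relations.
sentences = [["Cambrian", "Paleozoic"]]
candidates = {
    "Cambrian": ["place:Cambrian_Mountains", "gts:Cambrian"],
    "Paleozoic": ["gts:Paleozoic"],
}
prior = {"place:Cambrian_Mountains": 0.6, "gts:Cambrian": 0.4, "gts:Paleozoic": 1.0}
related = {frozenset(("gts:Cambrian", "gts:Paleozoic"))}

linked = collective_link(candidates, prior, related, document_graph(sentences))
```

In this example the prior alone would favor the place name, but the co-occurring mention "Paleozoic" tips the joint decision toward the geologic period. Exhaustive search is only feasible for a handful of mentions; a realistic system would need an approximate graph-based method.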
8
A recent review article on entity linking studies (Shen et al., 2015)
Candidate entity generation
  – Name dictionary based techniques
  – Surface form expansion from the local document
  – Methods based on search engines
Candidate entity ranking
  – Supervised ranking methods
  – Unsupervised ranking methods
  – Independent ranking methods
  – Collective ranking methods
  – Collaborative ranking methods
Unlinkable mention prediction
9
Summary and Future Work
Highlight
  – The work automatically identifies and links prominent entity mentions in unstructured texts to a knowledge base
Future work
  – Semantic parsing: to improve the result of collective inference
  – Semantic reasoning: to improve the quality of linking
Needs
  – Enrich the knowledge base: more ontologies and vocabularies in the field of Earth and environmental sciences
Thanks for listening
@MarshallXMa, max7@rpi.edu