Illuminate Knowledge Elements in Geoscience Literature
Xiaogang (Marshall) Ma, Jin Guang Zheng, Han Wang, Peter Fox
Tetherless World Constellation

Presentation transcript:

Illuminate Knowledge Elements in Geoscience Literature
Xiaogang (Marshall) Ma, Jin Guang Zheng, Han Wang, Peter Fox
Tetherless World Constellation, Rensselaer Polytechnic Institute

Challenge and objective
Vast amounts of dark data are hidden in geoscience literature
Illuminate the knowledge framework in documents
– Entities and relationships
Use knowledge bases to facilitate entity recognition and linking
– Ontologies and vocabularies
Images from: sciencemag.org and gravity.com

Approach
An unsupervised collective inference approach
– Link entity mentions in texts to entities in a knowledge base
[Figure: pipeline diagram with the components Document, Mention Extraction, Entity Mentions, Candidate Retrieval, Candidate Entities, Non-collective Ranking, Collective Inference, Final Entities, Knowledge Base, Surface Form Dictionary, Context Analysis, and Document Graph]
(Zheng et al., 2014)

Extract entity mentions
Mention Extraction
– Uses a publicly available name tagger and regular expressions to extract entity mentions
[Figure: pipeline diagram showing Document, Mention Extraction, Entity Mentions]
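The regular-expression half of this step can be sketched as follows. This is a minimal illustration, not the authors' extractor: the pattern `CAPITALIZED_RUN`, the function `extract_mentions`, and the sample sentence are all assumptions, and the name tagger mentioned on the slide is not reproduced here.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: a run of one or more capitalized words,
# e.g. "Deccan Traps". Sentence-initial words also match, so a real
# system would need a tagger or further filtering on top of this.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_mentions(text: str) -> List[Tuple[str, int, int]]:
    """Return (surface form, start offset, end offset) for each match."""
    return [(m.group(0), m.start(), m.end())
            for m in CAPITALIZED_RUN.finditer(text)]


sample = "Basalt samples from the Deccan Traps were dated to the Late Cretaceous."
for surface, start, end in extract_mentions(sample):
    print(surface, start, end)
```

Keeping the character offsets alongside each surface form lets later steps recover which sentence or paragraph a mention came from.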

Retrieve entity candidates
Surface form
– Textual appearance of entities / mentions
Surface Form Dictionary
– Structure:
  f: a surface form
  {e1, e2, e3 …}: the entities that have that surface form
  e.g. [example not preserved]
Candidate Retrieval
– Retrieve all entities with a surface form similar to the mention's surface form
[Figure: pipeline diagram showing Document, Mention Extraction, Entity Mentions, Candidate Retrieval, Candidate Entities, Knowledge Base, Surface Form Dictionary]
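A surface form dictionary and a similarity-based lookup might look like the sketch below. The entity identifiers (`ex:...`), the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions made for illustration; the slides do not say which similarity function is used.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Set

# Hypothetical dictionary: surface form f -> {e1, e2, ...}.
surface_form_dictionary: Dict[str, Set[str]] = {
    "olivine": {"ex:Olivine_mineral"},
    "mantle": {"ex:Earth_mantle", "ex:Mantle_mollusc"},
    "earth's mantle": {"ex:Earth_mantle"},
}


def retrieve_candidates(mention: str,
                        dictionary: Dict[str, Set[str]],
                        threshold: float = 0.8) -> List[str]:
    """Collect entities whose surface form is similar to the mention."""
    key = mention.lower()
    candidates: Set[str] = set()
    for surface_form, entities in dictionary.items():
        # Ratio in [0, 1]; 1.0 means the two strings are identical.
        if SequenceMatcher(None, key, surface_form).ratio() >= threshold:
            candidates |= entities
    return sorted(candidates)


print(retrieve_candidates("Mantle", surface_form_dictionary))
```

An ambiguous surface form such as "mantle" returns several candidates, which is why the ranking steps that follow are needed.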

Non-collective ranking of candidate entities
Pre-rank candidate entities retrieved from the knowledge base
– An entropy-based non-collective approach
Use properties and objects associated with the candidate entities
– Assign entities with higher popularity a higher score
[Figure: pipeline diagram showing Document, Mention Extraction, Entity Mentions, Candidate Retrieval, Candidate Entities, Non-collective Ranking, Knowledge Base, Surface Form Dictionary]
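The slides do not give the scoring formula, so the following is only one possible reading of "entropy-based": each property is weighted by the Shannon entropy of its object values across the knowledge base, and a candidate's score is the normalized sum of the weights of its statements. The weighting scheme, the function names, and the example triples are all assumptions, not the measure from Zheng et al. (2014).

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (entity, property, object)


def property_entropy(triples: List[Triple]) -> Dict[str, float]:
    """Shannon entropy (in bits) of the object distribution per property."""
    objects_by_property: Dict[str, Counter] = defaultdict(Counter)
    for _, prop, obj in triples:
        objects_by_property[prop][obj] += 1
    entropy: Dict[str, float] = {}
    for prop, counts in objects_by_property.items():
        total = sum(counts.values())
        entropy[prop] = -sum((c / total) * math.log2(c / total)
                             for c in counts.values())
    return entropy


def rank_candidates(candidates: List[str],
                    triples: List[Triple]) -> List[Tuple[str, float]]:
    """Score each candidate independently, then sort by descending score."""
    weights = property_entropy(triples)
    raw = {c: 0.0 for c in candidates}
    for entity, prop, _ in triples:
        if entity in raw:
            raw[entity] += weights[prop]
    total = sum(raw.values())
    scored = [(c, raw[c] / total if total else 0.0) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical knowledge-base statements.
kb_triples: List[Triple] = [
    ("ex:Earth_mantle", "ex:partOf", "ex:Earth"),
    ("ex:Earth_mantle", "ex:composedOf", "ex:Peridotite"),
    ("ex:Earth_mantle", "ex:overlies", "ex:Outer_core"),
    ("ex:Mantle_mollusc", "ex:partOf", "ex:Mollusc"),
    ("ex:Crust", "ex:composedOf", "ex:Granite"),
]
print(rank_candidates(["ex:Mantle_mollusc", "ex:Earth_mantle"], kb_triples))
```

The ranking is non-collective because each candidate is scored from its own statements only, without looking at the other mentions in the document.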

Collective inference of candidate entities
Context Analysis
– Sentence level: terms appearing in the same sentence are related to each other
– Paragraph level: terms appearing in the same paragraph are related to each other
Collective Approach
– Analyze several mentions in a context simultaneously to determine the best reference entities
– Both the document graph and the graph of candidate entities contain important contextual information about mentions and entities
[Figure: pipeline diagram showing Document, Mention Extraction, Entity Mentions, Candidate Retrieval, Candidate Entities, Non-collective Ranking, Collective Inference, Final Entities, Knowledge Base, Surface Form Dictionary, Context Analysis, Document Graph]
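The idea of combining the document graph with the graph of candidate entities can be illustrated with a small sketch. It builds a sentence-level document graph and adds one point of support to a candidate for every neighbouring mention that has a related candidate in the knowledge base. The additive scoring rule, the function names, and the example data are assumptions; the actual inference procedure is described in Zheng et al. (2014) and is not reproduced here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Edge = FrozenSet[str]


def build_document_graph(sentences: List[List[str]]) -> Set[Edge]:
    """Link every pair of distinct mentions that share a sentence."""
    edges: Set[Edge] = set()
    for mentions in sentences:
        for a, b in combinations(sorted(set(mentions)), 2):
            edges.add(frozenset((a, b)))
    return edges


def support(entity: str, neighbours: List[str],
            candidates: Dict[str, Dict[str, float]],
            kb_edges: Set[Edge]) -> int:
    """Count neighbouring mentions with a candidate related to the entity."""
    return sum(
        any(frozenset((entity, other)) in kb_edges
            for other in candidates.get(n, {}))
        for n in neighbours
    )


def collective_link(candidates: Dict[str, Dict[str, float]],
                    doc_edges: Set[Edge],
                    kb_edges: Set[Edge]) -> Dict[str, str]:
    """Pick, per mention, the candidate with the best prior plus support."""
    result: Dict[str, str] = {}
    for mention, scored in candidates.items():
        neighbours = [next(iter(e - {mention}))
                      for e in doc_edges if mention in e]
        result[mention] = max(
            scored,
            key=lambda ent: scored[ent] + support(ent, neighbours,
                                                  candidates, kb_edges),
        )
    return result


# Hypothetical input: the prior (non-collective) scores favour the mollusc.
candidates = {
    "mantle": {"ex:Earth_mantle": 0.4, "ex:Mantle_mollusc": 0.6},
    "olivine": {"ex:Olivine_mineral": 1.0},
}
doc_edges = build_document_graph([["mantle", "olivine"]])
kb_edges = {frozenset(("ex:Earth_mantle", "ex:Olivine_mineral"))}
print(collective_link(candidates, doc_edges, kb_edges))
```

In this example the co-occurring mention "olivine" overrides the prior score and selects the geological sense of "mantle". A paragraph-level document graph could be built the same way by passing paragraphs instead of sentences.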

A recent review article on entity linking studies
Candidate entity generation
– Name dictionary based techniques
– Surface form expansion from the local document
– Methods based on search engines
Candidate entity ranking
– Supervised ranking methods
– Unsupervised ranking methods
– Independent ranking methods
– Collective ranking methods
– Collaborative ranking methods
Unlinkable mention prediction
(Shen et al., 2015)

Summary and Future Work
Highlight
– The work automatically identifies and links prominent entity mentions in unstructured texts to a knowledge base
Future work
– Semantic parsing: to improve the result of collective inference
– Semantic reasoning: to improve the quality of linking
Needs
– Enrich the knowledge base: more ontologies and vocabularies in the field of Earth and environmental sciences
Thanks for listening