Mapping the NCI Thesaurus and the Collaborative Inter-Lingual Index
Amanda Hicks, University of Florida
HealthInsight Workshop, Oslo, Norway, 20 May 2016
Overview
Ontologies versus wordnets
Inter-Lingual Index
Future work: mapping the National Cancer Institute Thesaurus to the Collaborative Inter-Lingual Index
In collaboration with
– Francis Bond, Nanyang Technological University, Singapore
– Selja Seppälä, University of Florida, USA
ONTOLOGIES VERSUS WORDNETS
Semantic Networks
Words, concepts, or classes arranged in a network
Provide a framework for machine-readable meaning and context for what would otherwise be uninterpreted syntax
Simultaneously logical objects and mathematical objects, subject to inference and graph-theoretical analysis
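As a rough illustration of the "inference" side of this point, the sketch below treats a semantic network as a directed graph of is-a edges and derives implied relations by following edges transitively. The node labels and the `IS_A` table are a hypothetical toy network assumed for the example, not data taken from WordNet or NCIt.

```python
# Hypothetical toy network: each edge points from a more specific node to a more general one.
IS_A = {
    "patella": ["sesamoid bone"],
    "sesamoid bone": ["bone"],
    "bone": ["anatomical structure"],
}

def ancestors(node: str, graph: dict) -> set:
    """Return every node reachable by following is-a edges (the transitive closure from node)."""
    seen = set()
    stack = list(graph.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, []))
    return seen

def entails_is_a(child: str, parent: str, graph: dict) -> bool:
    """Inferred relation: child is-a parent holds if parent is among child's ancestors."""
    return parent in ancestors(child, graph)

print(sorted(ancestors("patella", IS_A)))  # ['anatomical structure', 'bone', 'sesamoid bone']
print(entails_is_a("patella", "bone", IS_A))  # True
```

Nothing in the toy table states "patella is-a bone" directly; the relation is inferred from the chain of edges, which is what makes a network more than a list of labels.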
Wordnets versus Ontologies
Wordnets are semantic networks that represent how we use language.
– Meanings are stated in natural-language definitions and relatively sparse semantic relations.
– Example: the word 'cat' in context
Ontologies are semantic networks that represent properties of things in the world.
– Meanings are encoded in logical form.
– Example: what it is to be a cat
Comparative Strengths
Wordnets
– NLP applications that require word sense discrimination
– Cross-lingual comparison of lexical categories
– Distance and relatedness measures between concepts
Ontologies
– Provide a coherent, stable and unified frame of reference for interpreting concepts and specifying classes
– May support interoperability of data sets
– Support deductive reasoning over structured data
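One common family of wordnet distance measures scores two concepts by the length of the shortest path between them in the hypernym hierarchy. The sketch below follows the same formula as the path-based score offered by NLTK's WordNet interface, 1 / (1 + path length), but runs over a small hypothetical hypernym table invented for the example rather than over WordNet itself.

```python
from collections import deque

# Hypothetical toy hypernym edges (child -> parent); not extracted from WordNet.
HYPERNYMS = {
    "patella": "sesamoid bone",
    "sesamoid bone": "bone",
    "femur": "long bone",
    "long bone": "bone",
}

def undirected(hypernyms: dict) -> dict:
    """Adjacency sets in both directions, so a path may go up and then down the hierarchy."""
    adj = {}
    for child, parent in hypernyms.items():
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj

def shortest_path_length(a: str, b: str, adj: dict):
    """Breadth-first search; returns None when a and b are not connected."""
    if a == b:
        return 0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def path_similarity(a: str, b: str, hypernyms: dict) -> float:
    """1 / (1 + shortest path length); 0.0 when no path exists."""
    d = shortest_path_length(a, b, undirected(hypernyms))
    return 0.0 if d is None else 1.0 / (1 + d)

print(path_similarity("patella", "femur", HYPERNYMS))  # 0.2  (4 edges via "bone")
```

Path length is the simplest such measure; it treats every edge as equally long, which is one reason richer measures exist.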
INTER-LINGUAL INDEX
Current State of Mapping Wordnets
Wordnets exist for many languages.
– 33 open wordnets in the Global Wordnet Grid
Mapping often occurs through English WordNet.
– English-centric
– English does not have a word for every concept.
Some wordnets are mapped to each other directly.
Mapping wordnets directly to each other gets messy. Vossen, Global WordNet Conference,
The Collaborative Inter-Lingual Index (CILI)
[Figure: wordnets labeled en, es, no and pt linked through CILI]
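A simple count shows why direct mapping scales badly and a shared index helps. Assuming, as a simplification, that every alignment is a separate bilateral mapping effort, n wordnets mapped directly to each other need n(n-1)/2 alignments, while a hub such as CILI needs one per wordnet. The values 4 and 33 below correspond to the four languages in the figure and the 33 open wordnets in the Global Wordnet Grid.

```python
def pairwise_links(n: int) -> int:
    """Alignments needed if every pair of wordnets is mapped directly."""
    return n * (n - 1) // 2

def hub_links(n: int) -> int:
    """Alignments needed if each wordnet is mapped once to a shared index."""
    return n

for n in (4, 33):
    print(n, pairwise_links(n), hub_links(n))
# 4 6 4
# 33 528 33
```

The gap grows quadratically: adding a 34th wordnet costs 33 new pairwise alignments, but only one new hub alignment.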
CILI
Flat list of concepts, each with a persistent, Semantic Web-compliant IRI
Synsets from individual wordnets are mapped directly to the ILI IRI
English WordNet 3.0 and 3.1 and the Dutch Open Wordnet are currently mapped
Each ILI entry has a unique English definition to support mapping (but no English words or labels)
Not imposed on linked wordnets
Open: anyone can contribute
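The properties listed above can be sketched as a small data structure: a flat dictionary of entries, each with a persistent IRI and a unique English definition but no English lemma, to which synsets from any wordnet link directly. This is a minimal sketch, not the CILI implementation: the base IRI, the entry identifier `i00001`, the synset identifiers, and the one-synset-per-wordnet simplification are all hypothetical. The Dutch lemma "knieschijf" (kneecap) is used only to make the example readable.

```python
from dataclasses import dataclass, field
from typing import Optional

ILI_BASE = "http://example.org/ili/"  # hypothetical base IRI, not the real CILI namespace

@dataclass
class IliEntry:
    ili_id: str       # persistent identifier (hypothetical format)
    definition: str   # unique English definition; no English word or label is stored
    synsets: dict = field(default_factory=dict)  # wordnet code -> synset id (simplified to one)

    @property
    def iri(self) -> str:
        return ILI_BASE + self.ili_id

class Ili:
    """A flat index: entries keyed by identifier, with no hierarchy between them."""

    def __init__(self) -> None:
        self.entries = {}
        self._definitions = set()

    def add_concept(self, ili_id: str, definition: str) -> IliEntry:
        if ili_id in self.entries:
            raise ValueError(f"duplicate ILI id: {ili_id}")
        if definition in self._definitions:
            raise ValueError(f"definition already used by another entry: {definition!r}")
        entry = IliEntry(ili_id, definition)
        self.entries[ili_id] = entry
        self._definitions.add(definition)
        return entry

    def link(self, ili_id: str, wordnet: str, synset_id: str) -> None:
        """Map a synset from any wordnet directly to an ILI entry."""
        self.entries[ili_id].synsets[wordnet] = synset_id

    def translate(self, wordnet: str, synset_id: str, target: str) -> Optional[str]:
        """Find the synset in `target` that shares an ILI entry with the given synset."""
        for entry in self.entries.values():
            if entry.synsets.get(wordnet) == synset_id:
                return entry.synsets.get(target)
        return None

DEF = "A small flat triangular bone in front of the knee that protects the knee joint"
ili = Ili()
ili.add_concept("i00001", DEF)
ili.link("i00001", "en", "en-patella-n")     # hypothetical synset ids
ili.link("i00001", "nl", "nl-knieschijf-n")
print(ili.translate("en", "en-patella-n", "nl"))  # nl-knieschijf-n
print(ili.entries["i00001"].iri)                  # http://example.org/ili/i00001
```

Because every wordnet links to the index rather than to English WordNet, the English-to-Dutch lookup goes through the shared entry, and a wordnet could link a concept that has no English word at all.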
NCI THESAURUS AND CILI
National Cancer Institute Thesaurus (NCIt)
An English medical reference terminology
Definitions crafted by teams of medical experts and terminologists
Covers vocabulary for clinical care, translational and basic research, public information, and administrative activities
Widely used in biomedical and health informatics in the USA
Why map NCIt to CILI?
Specialized terminology in CILI should be defined by subject-matter experts, not linguists.
There is currently no prototype for mapping specialized vocabulary to CILI.
More resources may lead to improved formal semantics that can be integrated with CILI.
To support integration of health knowledge extracted from linguistically heterogeneous sources:
– multilingual sources
– layperson and specialized vocabularies
“Patella” in WordNet and NCIt
WordNet
– Synonyms: patella, kneecap, kneepan
– Definition: A small flat triangular bone in front of the knee that protects the knee joint
– Part holonym: knee
– Hypernym: sesamoid bone
NCIt
– Name: BONE, PATELLA
– Definition: A small flat triangular bone in front of the knee that articulates with the femur and protects the knee joint.
– subClassOf: Bone of the Lower Extremity; Short Bone
– Semantic Type: Anatomical Structure
Annotations on the comparison: additional synonyms; additional definitional knowledge, potentially useful for formalizing semantics; additional formal semantic information.
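The extra definitional knowledge can be isolated mechanically by comparing the word sets of the two definitions quoted above. This is only a crude bag-of-words sketch (lowercase word sets, punctuation dropped), chosen for illustration, but it already shows that the NCIt definition contains everything in the WordNet gloss plus the clause about the femur.

```python
import re

# Definitions as quoted on the slide.
WN_DEF = "A small flat triangular bone in front of the knee that protects the knee joint"
NCIT_DEF = ("A small flat triangular bone in front of the knee that articulates "
            "with the femur and protects the knee joint.")

def words(text: str) -> set:
    """Lowercase word set; punctuation and repeated words are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))

wn, ncit = words(WN_DEF), words(NCIT_DEF)
print(sorted(ncit - wn))  # ['and', 'articulates', 'femur', 'with']
print(sorted(wn - ncit))  # []
```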
Semantic Modeling
One of the goals of CILI is to have ontologies that provide formal semantics for the indexed concepts.
Different semantic resources encode different semantic information.
NCIt can be used to enrich the common semantic model.
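A minimal sketch of what such enrichment could look like: the relations each resource states for patella (as listed on the comparison slide) are pooled as triples about one shared concept, and each triple keeps a record of which resource asserted it. The triple representation, the predicate labels, and the use of the string "patella" to stand for the shared index concept are assumptions for illustration; the talk does not prescribe a data model.

```python
from collections import defaultdict

# Relations from the patella slide; "patella" stands in for the shared ILI concept.
WORDNET_FACTS = [
    ("patella", "part_holonym", "knee"),
    ("patella", "hypernym", "sesamoid bone"),
]
NCIT_FACTS = [
    ("patella", "subClassOf", "Bone of the Lower Extremity"),
    ("patella", "subClassOf", "Short Bone"),
    ("patella", "semantic_type", "Anatomical Structure"),
]

def merge(**sources) -> dict:
    """Pool triples from several resources, recording the source(s) of each triple."""
    model = defaultdict(set)
    for source, facts in sources.items():
        for triple in facts:
            model[triple].add(source)
    return model

model = merge(wordnet=WORDNET_FACTS, ncit=NCIT_FACTS)
for (s, p, o), provenance in sorted(model.items()):
    print(s, p, o, sorted(provenance))
```

Keeping provenance matters here because the two resources use different kinds of relations (lexical versus ontological); a later formalization step can decide which ones to carry into the formal model.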
Our Planned Approach to the Project
Map NCI Thesaurus to CILI
– Convert NCI Thesaurus to Lexical Markup Framework
– Partially automate mapping NCIt to CILI using string matching on WordNet synsets and NCIt names and similarity measures on definitions
– There will still be false negatives that will need to be identified by hand.
Formalize the semantics of NCIt-related CILI entries with other ontologies
– KYOTO
– BFO
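The partially automated step could look roughly like the sketch below: string-match the NCIt name against synset lemmas, score the definitions, accept candidates that pass both checks, and send candidates that pass only one to manual review. Every concrete choice here is an assumption for illustration rather than the project's method: the handling of inverted names such as "BONE, PATELLA", Jaccard word overlap as the definition measure, the 0.5 threshold, and the synset identifier `wn:patella-n`.

```python
import re

def name_forms(name: str) -> set:
    """Candidate match strings (lowercased). Assumption: an inverted name such as
    "BONE, PATELLA" is also tried in natural order and by its final component."""
    name = name.lower().strip()
    forms = {name}
    if "," in name:
        parts = [part.strip() for part in name.split(",")]
        forms.add(" ".join(reversed(parts)))
        forms.add(parts[-1])
    return forms

def def_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (a deliberately simple stand-in measure)."""
    ta = set(re.findall(r"[a-z]+", a.lower()))
    tb = set(re.findall(r"[a-z]+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def propose_mappings(ncit_name, ncit_def, synsets, threshold=0.5):
    """Split synsets into automatic matches and candidates for manual review.
    Synsets failing both checks are not reported, so any true matches among them
    (false negatives) still have to be found by hand."""
    forms = name_forms(ncit_name)
    matched, review = [], []
    for synset_id, (lemmas, definition) in synsets.items():
        name_hit = bool(forms & {lemma.lower() for lemma in lemmas})
        score = round(def_similarity(ncit_def, definition), 2)
        if name_hit and score >= threshold:
            matched.append((synset_id, score))
        elif name_hit or score >= threshold:
            review.append((synset_id, score))
    return matched, review

# Hypothetical synset id; lemmas and definitions are those quoted on the patella slide.
SYNSETS = {
    "wn:patella-n": (["patella", "kneecap", "kneepan"],
                     "A small flat triangular bone in front of the knee that protects the knee joint"),
}
NCIT_DEF = ("A small flat triangular bone in front of the knee that articulates "
            "with the femur and protects the knee joint.")
print(propose_mappings("BONE, PATELLA", NCIT_DEF, SYNSETS))  # ([('wn:patella-n', 0.76)], [])
```

Requiring both a name hit and a definition score keeps automatic matches conservative; the review list is where a curator would confirm or reject the rest.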
ILI Concept Use for Knowledge Integration
[Figure: the lay phrase “… smokes hubbly-bubbly on weekends …” linked through an ILI concept to “Smoking status”]
THANK YOU