Betsy L. Humphreys
Associate Director for Library Operations
National Library of Medicine, NIH, HHS
CENDI Staff Workshop
Knowledge Organization Systems: Current and Future Uses
September 16, 2004
2 NLM “Knowledge Organization Systems”
- Name and Series/Journal Authority Files
- Library Materials Classification
- Individual Controlled Vocabularies
  - MeSH, MedlinePlus Health Topics, NCBI Taxonomy, RxNorm clinical drug vocabulary
- Unified Medical Language System (UMLS) Knowledge Sources
  - Metathesaurus – many vocabularies in a common, integrated format
  - Semantic Network
  - Lexicon
  - Associated tools
3 NLM “Knowledge Organization Systems”
- Common Characteristics
  - Searchable on the Web, often interlinked with other NLM resources
  - Distributed in one or more electronic formats
  - Used within NLM for:
    - Information retrieval and display
    - Data creation
    - Natural language interpretation
  - Heavily used outside NLM for a wide range of applications
  - Most built and maintained with custom systems
6 Medical Subject Headings (MeSH)
- Structure of MeSH upgraded in 2000
  - Descriptor Class – closely related concepts grouped to enhance retrieval
  - Concept – distinct meaning
  - Term – concept name
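The three-level structure on this slide (descriptor class, concept, term) can be pictured as nested records. The following is a minimal sketch only, not NLM's actual schema: the class names, fields, and sample headings are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A name for a concept."""
    name: str


@dataclass
class Concept:
    """A distinct meaning, named by one preferred term and any synonyms."""
    preferred: Term
    synonyms: list[Term] = field(default_factory=list)

    def names(self) -> list[str]:
        return [self.preferred.name] + [t.name for t in self.synonyms]


@dataclass
class Descriptor:
    """A descriptor class: closely related concepts grouped for retrieval."""
    heading: str
    concepts: list[Concept] = field(default_factory=list)

    def entry_terms(self) -> list[str]:
        # Every term of every grouped concept leads to this descriptor.
        return [name for c in self.concepts for name in c.names()]


# Hypothetical sample data, chosen only to show the nesting.
headache = Descriptor(
    heading="Headache",
    concepts=[
        Concept(Term("Headache"), [Term("Cephalgia")]),
        Concept(Term("Bilateral Headache")),
    ],
)
print(headache.entry_terms())
```

Grouping concepts under one descriptor means a search on any of the listed names can be routed to the same heading, which is the retrieval benefit the slide mentions.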
7 Known Translations of MeSH
- In UMLS
  - Dutch, Finnish, French, German, Italian, Japanese, Portuguese, Russian, Spanish, Swedish
- Other Complete Translations
  - Arabic, Chinese, Czech, Greek, Thai, Turkish
- In Progress or Planned or Hoped For
  - Korean, Slovenian, Vietnamese, Lithuanian, Polish, Slovakian, Norwegian, Kiswahili
8 Coordinating Translations: How?
- Single Database, Web Interface
- Add Language as a Term Property
- Translated Terms added to Concept
- Non-English Concepts added to Descriptor
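One way to read the approach on this slide is that a translation is an ordinary term carrying a language property, attached to the concept it names. The sketch below illustrates that reading under stated assumptions: the class names, the three-letter language codes, and the sample terms are all hypothetical and do not describe the real translation system.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    language: str = "eng"  # language stored as a property of the term


@dataclass
class Concept:
    terms: list[Term] = field(default_factory=list)

    def add_translation(self, name: str, language: str) -> None:
        # A translated term joins the existing concept, not a copy of it.
        self.terms.append(Term(name, language))

    def names_in(self, language: str) -> list[str]:
        return [t.name for t in self.terms if t.language == language]


@dataclass
class Descriptor:
    heading: str
    concepts: list[Concept] = field(default_factory=list)


# Hypothetical data: one English concept receiving translated terms.
concept = Concept([Term("Headache")])
concept.add_translation("Kopfschmerz", "ger")
concept.add_translation("Cefalea", "ita")

descriptor = Descriptor("Headache", [concept])
# A concept that has no English term can still be added to the descriptor.
descriptor.concepts.append(Concept([Term("Spannungskopfschmerz", "ger")]))

print(concept.names_in("ger"))
```

Keeping every language in one structure means a change to the English concept is immediately visible to each translation group, which is presumably the point of the single shared database.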
11 Status of Use
- Current Active Groups
  - German, French, Italian, Vietnamese
- Groups Beginning Work with MTMS
  - Dutch, Finnish, Japanese, Polish, Slovakian
- Groups Starting Soon
  - Czech, Portuguese, Korean, Norwegian, Russian, Spanish
19 The UMLS in practice
- Database
  - Series of relational files
- Interfaces
  - Web interface: Knowledge Source Server (UMLSKS)
  - Application programming interfaces (Java and XML-based)
- Applications
  - lvg (lexical programs)
  - MetamorphoSys (installation and customization)
  - SOON: Metathesaurus browser
The UMLS is not an end-user application
20 UMLS: 3 components
- Metathesaurus
  - Concepts
  - Inter-concept relationships
- Semantic Network
  - Semantic types
  - Semantic network relationships
- Lexical resources
  - SPECIALIST Lexicon
  - Lexical tools
21 Metathesaurus Source Vocabularies
- 134 source vocabularies
  - 126 contributing concept names
- 73 families of vocabularies
  - multiple translations (e.g., MeSH, ICPC, ICD-10)
  - variants (American-English equivalents, Australian extension/adaptation)
  - subsequent editions usually considered distinct families (ICD: 9-10; DSM: IIIR-IV)
- Broad coverage of biomedicine
- Common presentation
(2004AB)
22 Metathesaurus Concepts
- Concept (> 1M), identifier CUI
  - Set of synonymous concept names
- Term (> 3.8M), identifier LUI
  - Set of normalized names
- String (> 4.3M), identifier SUI
  - Distinct concept name
- Atom (> 5.1M), identifier AUI
  - Concept name in a given source
(2004AB)
Example (A = atom, S = string, L = term, C = concept):
C
  L
    S
      A headache (source 1)
      A headache (source 2)
    S
      A Headache (source 1)
      A Headache (source 2)
  L
    S
      A Cephalgia (source 1)
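The headache example on this slide can be traced in a few lines of code: atoms with the same exact spelling share a string, strings with the same normalized form share a term, and terms judged synonymous share a concept. This is a rough sketch under stated assumptions, not the Metathesaurus build process: `normalize` is a hypothetical stand-in that only folds case, whereas real lexical normalization does considerably more.

```python
from collections import defaultdict

# Atoms from the slide's example: (name as it appears, source).
atoms = [
    ("headache", "source 1"),
    ("headache", "source 2"),
    ("Headache", "source 1"),
    ("Headache", "source 2"),
    ("Cephalgia", "source 1"),
]


def normalize(name: str) -> str:
    # Hypothetical stand-in for lexical normalization: case folding only.
    return name.lower()


# String level: atoms that share an exact spelling.
strings = defaultdict(list)
for name, source in atoms:
    strings[name].append(source)

# Term level: strings that share a normalized form.
terms = defaultdict(set)
for name in strings:
    terms[normalize(name)].add(name)

# Concept level: synonymy is an editorial judgment, not something derived
# from spelling, so the grouping is declared by hand here.
concepts = [{"headache", "cephalgia"}]

print(len(atoms), len(strings), len(terms), len(concepts))
```

The counts shrink at each level (5 atoms, 3 strings, 2 terms, 1 concept), mirroring on a small scale the totals the slide reports for atoms, strings, terms, and concepts.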
23 Metathesaurus Relationships
- Symbolic relations: ~9M pairs of concepts
- Statistical relations: ~7M pairs of concepts (co-occurring concepts)
- Mapping relations: 100,000 pairs of concepts
- Categorization: relationships between concepts and semantic types from the Semantic Network
24 Why you might care about the UMLS
- Content with applicability outside of biomedicine
- Tools generally useful in NLP and data mining
- New Metathesaurus Rich Release Format
  - Potentially useful as a format for distributing any set of vocabularies/ontologies and for robust purpose-specific mappings between such systems
  - May well lead to development of a variety of tools that can output or ingest the format