Published by Alexina Stokes. Modified over 9 years ago.
2
An Introduction to Ontology for Scientists
Barry Smith, University at Buffalo
http://ontology.buffalo.edu/smith
3
Multiple kinds of data in multiple kinds of silos: lab / pathology data, electronic health record data, clinical trial data, patient histories, medical imaging, microarray data, protein chip data, flow cytometry, mass spec, genotype / SNP data
4
How to find your data? How to find other people’s data? How to reason with data when you find it? How to work out what data does not yet exist?
5
How to solve the problem of data re-use to address NIH mandates? Part of the solution must involve standardized terminologies and coding schemes.
6
NLM’s proposal: the Unified Medical Language System (UMLS), a collection of separate terminologies built by trained experts, useful for legacy information retrieval and information integration. UMLS Metathesaurus: a system of post hoc mappings between overlapping source vocabularies.
7
[Figure labels: SNOMED, DEMONS, UMLS]
8
New York State Center of Excellence in Bioinformatics & Life Sciences R T U
9
For UMLS: local usage respected; regimentation frowned upon; cross-framework consistency not important; no concern to establish consistency with basic science; different grades of formal rigor, different degrees of completeness, different update policies.
10
In the olden days people measured lengths using inches, ulnas, perches, king’s feet, Swiss feet, leagues of Paris, etc., etc.
11
Data was not comparable
12
On June 22, 1799, everything changed
13
We now have the International System of Units
14
UMLS: Can we create something like the SI system of units for biomedical terminology?
15
Uses of ‘ontology’ in PubMed abstracts
17
By far the most successful: GO (Gene Ontology)
19
Hierarchical view of GO representing relations between represented types
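A hierarchical view of this kind can be sketched in a few lines of Python. This is only an illustration: the is_a links below are a tiny hand-written fragment, not loaded from the GO files, and the names `IS_A` and `ancestors` are hypothetical.

```python
# Illustrative is_a links: child term -> list of parent terms.
# Hand-written for this sketch; not loaded from the real Gene Ontology.
IS_A = {
    "sphingolipid transporter activity": ["lipid transporter activity"],
    "lipid transporter activity": ["transporter activity"],
    "transporter activity": ["molecular function"],
    "molecular function": [],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a links upward."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


print(sorted(ancestors("sphingolipid transporter activity")))
```

Because the relations are explicit, data annotated with a specific term can also be retrieved by a query for any of its more general ancestors.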
20
Gene Ontology: $100 million invested in literature and database curation using the Gene Ontology (GO), based on the idea of annotation. Over 11 million annotations relating gene products (proteins) described in the UniProt, Ensembl and other databases to terms in the GO. Multiple secondary uses, because the ontology was not built to meet one specific set of requirements.
21
GO provides a controlled system of terms for use in annotating (describing, tagging) data: multi-species, multi-disciplinary, open source; contributing to the cumulativity of scientific results obtained by distinct research communities.
22
Semantic annotation of data: Where in the cell? What kind of molecular function? What kind of biological process?
23
Natural language labels + definitions, to make the data cognitively accessible to human beings and algorithmically accessible to computers
24
OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)
Columns, by RELATION TO TIME: CONTINUANT (INDEPENDENT | DEPENDENT) | OCCURRENT
Rows, by GRANULARITY:
ORGAN AND ORGANISM | Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO), Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL), Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
25
Compare: legends for maps
26
Ontologies are legends for data
27
Ontologies are legends for databases [Figure labels: MouseEcotope, GlyProt, DiabetInGene, GluChem; sphingolipid transporter activity]
28
Annotation using common ontologies yields integration of databases [Figure labels: MouseEcotope, GlyProt, DiabetInGene, GluChem; Holliday junction helicase complex]
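As a rough illustration of how shared annotation yields integration, the sketch below groups records from several databases under the ontology term they have in common. The database names and term labels are taken from the slides; the record identifiers and the names `ANNOTATIONS` and `integrate` are made up for the example.

```python
from collections import defaultdict

# Hypothetical annotation records: (database, record id, ontology term).
# Database names and terms come from the slides; record ids are invented.
ANNOTATIONS = [
    ("MouseEcotope", "ME:001", "Holliday junction helicase complex"),
    ("GlyProt", "GP:417", "Holliday junction helicase complex"),
    ("DiabetInGene", "DG:032", "sphingolipid transporter activity"),
    ("GluChem", "GC:908", "sphingolipid transporter activity"),
    ("GlyProt", "GP:555", "sphingolipid transporter activity"),
]


def integrate(annotations):
    """Group records from all databases under the shared term they carry."""
    by_term = defaultdict(list)
    for database, record_id, term in annotations:
        by_term[term].append((database, record_id))
    return dict(by_term)


index = integrate(ANNOTATIONS)
for term, records in index.items():
    print(term, "->", records)
```

The only thing linking the databases here is the common term. No post hoc mapping between database-specific vocabularies is needed.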
29
Annotation using common ontologies can support comparison of data
30
The goal: virtual science. Consistent (non-redundant) annotation and cumulative (additive) annotation, yielding, by incremental steps, a virtual map of the entirety of reality that is accessible to computational reasoning.
31
This goal is realizable if we have a common ontology framework. Data is retrievable, comparable, error-checkable, integratable, and capable of being reasoned with only to the degree that it is annotated using a common controlled vocabulary.
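One way to picture "error-checkable" is to validate annotations against the controlled vocabulary. This is a minimal sketch under stated assumptions: the two-term vocabulary, the record ids, and the names `CONTROLLED_VOCABULARY` and `check_annotations` are all hypothetical.

```python
# Hypothetical controlled vocabulary: the only terms annotators may use.
CONTROLLED_VOCABULARY = {
    "sphingolipid transporter activity",
    "Holliday junction helicase complex",
}


def check_annotations(annotations, vocabulary=CONTROLLED_VOCABULARY):
    """Split (record id, term) pairs into those using a known term and the rest."""
    valid, errors = [], []
    for record_id, term in annotations:
        (valid if term in vocabulary else errors).append((record_id, term))
    return valid, errors


valid, errors = check_annotations([
    ("rec-1", "sphingolipid transporter activity"),
    ("rec-2", "sphingolipid transport"),  # free-text variant, not in the vocabulary
])
print("valid:", valid)
print("errors:", errors)
```

A free-text label such as the second one cannot be checked, compared, or integrated automatically, which is the sense in which these properties hold only to the degree that a common vocabulary is used.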