What is an ontology and why should you care?
Barry Smith
with thanks to Jane Lomax, Gene Ontology Consortium
A good solution to the silo / stovepipe problem must:
- be modular
- be incremental
- be bottom-up (not all standards are equal)
- be evidence-based (thoroughly tested)
- be revisable and evolutionary
- incorporate a strategy for motivating potential developers and users
- be cost-effective
- work with existing ways of collecting data
You’re interested in which genes control heart muscle development
17,536 results
Microarray data show changed expression of thousands of genes. How will you spot the patterns?
[Figure: microarray results (labels: attacked, time, control) with gene groups: puparial adhesion, molting cycle, hemocyanin, defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism]
You’re interested in which of your hospital’s patient data is relevant to understanding how genes control heart muscle development

- Lab / pathology data
- EHR data
- Clinical trial data
- Family history data
- Medical imaging
- Microarray data
- Model organism data
- Flow cytometry
- Mass spec
- Genotype / SNP data
How will you spot the patterns? How will you find the data you need?
Controlled Vocabularies and Common Data Elements will provide a way to capture and represent some of this knowledge in a form that is usable
- by other clinicians and researchers
- by you yourself
but not by computers

EPIC will provide a way to capture and represent some of this knowledge in a form that is usable
- by computers (somewhat)
- by you yourself
but not by other clinicians and researchers

Ontologies will (prospectively) provide a way to capture and represent all this knowledge in a form that is usable
- by other clinicians and researchers
- by you yourself
- and by computers
Ontologies provide semantic interoperability
Uses of ‘ontology’ in PubMed abstracts
By far the most successful: GO (Gene Ontology)
Definitions
Gene products involved in cardiac muscle development in humans
How does the Gene Ontology work?
1. It provides a controlled vocabulary
- contributing to the cumulativity of scientific results achieved by distinct research communities
- multi-national, multi-disciplinary, open source
(if we all use kilograms, meters, seconds …, our results are calibrated)

2. It provides a tool for algorithmic reasoning
Hierarchical view representing relations between represented types
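As a rough illustration of what algorithmic reasoning over a type hierarchy can look like, the sketch below follows is_a links upward to collect every more general type. The `IS_A` table, its term labels, and the function names are invented for this example and are not copied from GO.

```python
from collections import deque

# Toy is_a hierarchy (child -> list of parents).
# Labels are illustrative assumptions, not actual GO terms or identifiers.
IS_A = {
    "cardiac muscle development": ["muscle development", "heart development"],
    "muscle development": ["developmental process"],
    "heart development": ["developmental process"],
    "developmental process": ["biological process"],
    "biological process": [],
}

def ancestors(term, is_a=IS_A):
    """Return every type reachable from `term` by following is_a upward."""
    seen = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen

def is_subtype(child, parent, is_a=IS_A):
    """True if `child` is `parent` or lies below it in the hierarchy."""
    return child == parent or parent in ancestors(child, is_a)
```

Because a term may have more than one parent, the structure is a directed acyclic graph rather than a tree, and a query for a general type can be answered by checking `is_subtype` against each annotated term.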
The massive quantities of annotations linking GO terms to gene products (proteins) are allowing a new kind of clinical research
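One simple way annotations help in spotting patterns is to count which terms recur among the gene products whose expression changed. The sketch below does only that. The gene names and the `ANNOTATIONS` table are hypothetical placeholders, and real analyses would normally add a statistical test, which is omitted here.

```python
from collections import Counter

# Hypothetical annotation table: gene product -> set of annotated terms.
ANNOTATIONS = {
    "gene_a": {"immune response", "defense response"},
    "gene_b": {"immune response"},
    "gene_c": {"lipid metabolism"},
    "gene_d": {"immune response", "peptidase activity"},
}

def term_counts(changed_genes, annotations=ANNOTATIONS):
    """Count how many of the changed genes carry each annotation term."""
    counts = Counter()
    for gene in changed_genes:
        # Genes without annotations contribute nothing.
        counts.update(annotations.get(gene, ()))
    return counts

counts = term_counts(["gene_a", "gene_b", "gene_d"])
most_common_term, hits = counts.most_common(1)[0]
```

With this toy data the most frequent term among the three changed genes is "immune response", shared by all three.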
Uses of GO in studies of
- pathways associated with heart failure development correlated with cardiac remodeling (PMID )
- molecular signature of cardiomyocyte clusters derived from human embryonic stem cells (PMID )
- contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism (PMID )
- immune system involvement in abdominal aortic aneurysms in humans (PMID )

A value proposition for the clinical terminologies of the future
- using the GO terminology standard enables you to do better, fundable, translational research
- champions of good practice in terminology will test the tools and resources to the point where they will become reliable and easily usable by those who follow

GO is amazingly successful, but covers only three sorts of biological entities:
– cellular components
– molecular functions
– biological processes
and does not provide representations of disease-related phenomena

People are extending the GO methodology to other domains of biology and of clinical and translational medicine
The Open Biomedical Ontologies (OBO) Foundry
(columns: RELATION TO TIME; rows: GRANULARITY)
ORGAN AND ORGANISM
- CONTINUANT, INDEPENDENT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
- CONTINUANT, DEPENDENT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
- OCCURRENT: Biological Process (GO)
CELL AND CELLULAR COMPONENT
- CONTINUANT, INDEPENDENT: Cell (CL); Cellular Component (FMA, GO)
- CONTINUANT, DEPENDENT: Cellular Function (GO)
MOLECULE
- CONTINUANT, INDEPENDENT: Molecule (ChEBI, SO, RnaO, PrO)
- CONTINUANT, DEPENDENT: Molecular Function (GO)
- OCCURRENT: Molecular Process (GO)

initial OBO Foundry coverage, ontologies automatically semantically coupled
(columns: RELATION TO TIME; rows: GRANULARITY)
ORGAN AND ORGANISM
- CONTINUANT, INDEPENDENT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
- CONTINUANT, DEPENDENT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
- OCCURRENT: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
- CONTINUANT, INDEPENDENT: Cell (CL); Cellular Component (FMA, GO)
- CONTINUANT, DEPENDENT: Cellular Function (GO)
- OCCURRENT: Cellular Process (GO)
MOLECULE
- CONTINUANT, INDEPENDENT: Molecule (ChEBI, SO, RnaO, PrO)
- CONTINUANT, DEPENDENT: Molecular Function (GO)
- OCCURRENT: Molecular Process (GO)
Jeff Rose: “if you have the structure and the model correct you don’t have to do it all at once”
start with some test disease domains:
– Cardiovascular Gene Ontology
– Infectious Disease Ontology
– Congenital Heart Defect Ontology

OBO Foundry provides
- tested guidelines enabling new groups to develop the ontologies they need in ways which counteract forking and dispersion of effort
- an incremental bottom-up approach to evidence-based terminology practices in medicine that is rooted in basic biology
- automatic web-based linkage between medical terminologies and biological knowledge resources
But there are multiple kinds of standardization for biomedical data, and they do not work well together
- Terminologies (SNOMED, UMLS)
- CDEs (Clinical research)
- Information Exchange Standards (HL7 RIM)
- LIMS (LOINC)
- MGED standards for microarray data, etc.
- top-down grid frameworks (caBIG)

most successful, thus far: UMLS
Unified Medical Language System
- collection of separate terminologies built by trained experts
- massively useful for information retrieval and information integration
UMLS Metathesaurus
- a system of post hoc mappings between overlapping source vocabularies developed according to different and sometimes conflicting standards

for UMLS
- local usage respected
- regimentation frowned upon
- cross-framework consistency not important
- no concern to establish consistency with basic science
- different grades of formal rigor, different degrees of completeness, different update policies, capricious policies for empirical testing
A good solution to the silo problem must be:
- modular
- incremental
- bottom-up
- evidence-based
- revisable
and must incorporate a strategy for motivating potential developers and users

The standard engineering methodology
It is easier to write useful software if one works with a simplified model
(“…we can’t know what reality is like in any case; we only have our concepts…”)
This looks like a useful model to me
(One week goes by:)
This other thing looks like a useful model to him
Data in Pittsburgh does not interoperate with data in Vancouver
Science is siloed
an analogue of the UMLS problem
proliferation of tiny ontologies by different groups with urgent annotation needs

the solution
establish common rules governing best practices for creating ontologies in coordinated fashion, with an evidence-based pathway to incremental improvement

a shared portal for (so far) 58 ontologies (low regimentation)
NCBO BioPortal
First step (2001)

OBO builds on the principles successfully implemented by the GO, recognizing that ontologies need to be developed in tandem
The methodology of cross-products
compound terms in ontologies to be defined as cross-products of simpler terms:
E.g. elevated blood glucose is a cross-product of PATO: increased concentration with FMA: blood and ChEBI: glucose.
= factoring out of ontologies into discipline-specific modules (orthogonality)

The methodology of cross-products
- enforcing use of common relations in linking terms drawn from Foundry ontologies serves to ensure that the ontologies are maintained and revised in tandem
- logically defined relations serve to bind terms in different ontologies together to create a network
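The cross-product idea can be sketched as a small data structure in which a compound term stores references to simpler terms, each owned by a different ontology. The class names and the field names `quality`, `location` and `substance` are assumptions made for this sketch and are not relation names defined by the OBO Foundry.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    ontology: str  # prefix of the ontology that owns the term
    label: str

@dataclass(frozen=True)
class CrossProduct:
    label: str
    quality: Term    # hypothetical slot: what is being asserted
    location: Term   # hypothetical slot: where it holds
    substance: Term  # hypothetical slot: what it is about

    def components(self):
        return (self.quality, self.location, self.substance)

    def is_orthogonal(self):
        """True if every component comes from a different ontology."""
        sources = [t.ontology for t in self.components()]
        return len(set(sources)) == len(sources)

# The example from the slide, expressed with the sketch's slots.
elevated_blood_glucose = CrossProduct(
    label="elevated blood glucose",
    quality=Term("PATO", "increased concentration"),
    location=Term("FMA", "blood"),
    substance=Term("ChEBI", "glucose"),
)
```

Because the compound term only points at its components, a revision to the blood or glucose term in its home ontology is picked up without editing the compound definition.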
The OBO Foundry
Third step (2006)

Building out from the original GO
(columns: RELATION TO TIME; rows: GRANULARITY)
ORGAN AND ORGANISM
- CONTINUANT, INDEPENDENT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
- CONTINUANT, DEPENDENT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
- OCCURRENT: Biological Process (GO)
CELL AND CELLULAR COMPONENT
- CONTINUANT, INDEPENDENT: Cell (CL); Cellular Component (FMA, GO)
- CONTINUANT, DEPENDENT: Cellular Function (GO)
MOLECULE
- CONTINUANT, INDEPENDENT: Molecule (ChEBI, SO, RnaO, PrO)
- CONTINUANT, DEPENDENT: Molecular Function (GO)
- OCCURRENT: Molecular Process (GO)

initial OBO Foundry coverage
(columns: RELATION TO TIME; rows: GRANULARITY)
ORGAN AND ORGANISM
- CONTINUANT, INDEPENDENT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
- CONTINUANT, DEPENDENT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
- OCCURRENT: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
- CONTINUANT, INDEPENDENT: Cell (CL); Cellular Component (FMA, GO)
- CONTINUANT, DEPENDENT: Cellular Function (GO)
- OCCURRENT: Cellular Process (GO)
MOLECULE
- CONTINUANT, INDEPENDENT: Molecule (ChEBI, SO, RnaO, PrO)
- CONTINUANT, DEPENDENT: Molecular Function (GO)
- OCCURRENT: Molecular Process (GO)
CRITERIA
- openness
- common formal language
- collaborative development
- evidence-based maintenance
- identifiers
- versioning
- textual and formal definitions

Orthogonality = modularity
- one ontology for each domain
- no need for mappings (which are in any case too expensive, too fragile, too difficult to keep up-to-date as mapped ontologies change)
- everyone knows where to look to find out how to annotate each kind of data

CRITERIA
COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the Basic Formal Ontology (BFO)
OBO Foundry provides guidelines (traffic laws) to new groups of ontology developers in ways which can counteract current dispersion of effort
Basic Formal Ontology
- continuant
  - independent continuant: cellular component
  - dependent continuant: molecular function
- occurrent: biological processes

BFO: The Very Top
- continuant
  - independent continuant
  - dependent continuant
    - quality
    - function
    - role
    - disposition
- occurrent
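The top-level types above can be held in a small parent table, which makes it possible to ask where any type, or any of the three GO branches, sits. This is a sketch only: placing quality, function, role and disposition under dependent continuant is a reading of the flattened slide, and the table and function names are invented for illustration.

```python
# Parent links for the top-level types named on the slide.
# Nesting of quality/function/role/disposition is an assumption (see lead-in).
PARENT = {
    "continuant": None,
    "occurrent": None,
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "quality": "dependent continuant",
    "function": "dependent continuant",
    "role": "dependent continuant",
    "disposition": "dependent continuant",
}

# Placement of the three GO branches, as given on the Basic Formal Ontology slide.
GO_BRANCH = {
    "cellular component": "independent continuant",
    "molecular function": "dependent continuant",
    "biological process": "occurrent",
}

def path_to_top(bfo_type, parent=PARENT):
    """Return the chain from `bfo_type` up to its top-level category."""
    path = [bfo_type]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def top_category(go_branch):
    """Return 'continuant' or 'occurrent' for one of the three GO branches."""
    return path_to_top(GO_BRANCH[go_branch])[-1]
```

For example, `path_to_top("role")` walks from role through dependent continuant to continuant, and `top_category("biological process")` returns the occurrent side of the split.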
function
- of liver: to store glycogen
- of birth canal: to enable transport
- of eye: to see
- of mitochondrion: to produce ATP
not optional; reflection of physical makeup of bearer

role
optional: exists because the bearer is in some special natural, social, or institutional set of circumstances in which the bearer does not have to be

role
- bearers can have more than one role: person as student and staff member
- roles often form systems of mutual dependence: husband / wife, first in queue / last in queue, doctor / patient, host / pathogen

role
- of some chemical compound: to serve as analyte in an experiment
- of a dose of penicillin in this human child: to treat a disease
- of this bacterium in a primary host: to cause infection
A good solution to the silo problem must be:
- modular
- incremental
- bottom-up
- evidence-based
- revisable
and must incorporate a strategy for motivating potential developers and users

Because the ontologies in the Foundry are built as orthogonal modules which form an incrementally evolving network
- scientists are motivated to commit to developing ontologies because they will need in their own work ontologies that fit into this network
- users are motivated by the assurance that the ontologies they turn to are maintained by experts

More benefits of orthogonality
- helps those new to ontology to find what they need and to find models of good practice
- ensures mutual consistency of ontologies (trivially) and thereby ensures additivity of annotations

More benefits of orthogonality
- it rules out the sorts of simplification and partiality which may be acceptable under more pluralistic regimes
- it thereby brings an obligation on the part of ontology developers to commit to scientific accuracy and domain-completeness