OBO Foundry Principles
BFO, RO
Barry Smith

OBO Foundry Principles
– open
– common formal language (OBO Format, OWL DL, CL)
– commitment to collaboration
– maintenance in light of scientific advance
– unique identifier space (Alan)
– naming conventions (Susanna / EBI)
– metadata for changes
– versioning

OBO Foundry Principles
– common architecture (= RO + BFO)
– clearly delineated content (redundant: overlaps with orthogonality)
– the ontology is well documented (overlaps with rules for definitions; needs expanding: for developers, for users, minimal metadata)
– plurality of independent users
– single locus of authority, trackers, help desk

OBO Foundry Principles
– textual definitions plus formal definitions
– all definitions should be of the genus-species form, utilizing cross-products
– therefore: single is_a inheritance (= each ontology should be conceived as consisting of a core of asserted single inheritance, with further is_a relations inferred)

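A minimal sketch of how an asserted single-inheritance core plus genus-species (cross-product) definitions could yield further inferred is_a relations. The term names, the has_part relation and the subsumption rule used here are illustrative assumptions, not taken from any Foundry ontology or reasoner.

```python
from typing import Dict, Optional, Set, Tuple

Differentia = Tuple[Tuple[str, str], ...]  # (relation, filler) pairs

# Asserted core: every term has exactly one asserted is_a parent (None = root).
# All term names are hypothetical examples.
ASSERTED: Dict[str, Optional[str]] = {
    "cell": None,
    "blood cell": "cell",
    "nucleate cell": "cell",
    "nucleate blood cell": "blood cell",
}

# Genus-species definitions: term -> (genus, differentia).
DEFINITIONS: Dict[str, Tuple[str, Differentia]] = {
    "nucleate cell": ("cell", (("has_part", "nucleus"),)),
    "nucleate blood cell": ("blood cell", (("has_part", "nucleus"),)),
}


def asserted_ancestors(term: str) -> Set[str]:
    """Return the term itself plus everything above it in the asserted core."""
    seen: Set[str] = set()
    current: Optional[str] = term
    while current is not None:
        seen.add(current)
        current = ASSERTED[current]
    return seen


def inferred_is_a() -> Set[Tuple[str, str]]:
    """Infer child is_a parent when the parent's genus subsumes the child's
    genus and the child carries all of the parent's differentia."""
    inferred: Set[Tuple[str, str]] = set()
    for child, (c_genus, c_diff) in DEFINITIONS.items():
        for parent, (p_genus, p_diff) in DEFINITIONS.items():
            if child == parent:
                continue
            genus_ok = p_genus in asserted_ancestors(c_genus)
            diff_ok = set(p_diff) <= set(c_diff)
            already_asserted = parent in asserted_ancestors(child)
            if genus_ok and diff_ok and not already_asserted:
                inferred.add((child, parent))
    return inferred


INFERRED = inferred_is_a()
```

In this toy example the only asserted parent of "nucleate blood cell" is "blood cell"; the link to "nucleate cell" is not asserted but falls out of the two definitions.
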
Orthogonality
For each domain, there should be convergence upon a single ontology that is recommended for use by those who wish to become involved with the Foundry initiative.
Compare what happens in other parts of science: for each domain, there should be convergence upon a single theory.
Preventing silos on the side of annotated data = preventing forking of the ontologies used for annotation.

Strategy to ensure orthogonality
If the Foundry already has an ontology O1 covering a domain D, and an outside group creates a second ontology O2 covering D (or part of D), we need to ask:
– is it in every respect better? (then replace O1 with O2)
– is it in some respects better? (then negotiate an improved synthesis, O3)
ASSUMPTION: ontologies are always comparable
PROBLEM: we need better measures of ontology quality

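The decision rule above can be sketched as a small function. This is only an illustration under the slide's own ASSUMPTION that ontologies are comparable: the per-criterion quality scores and the criterion names are hypothetical, since the slide notes that better quality measures are still needed.

```python
from typing import Dict


def resolve_overlap(o1_scores: Dict[str, float], o2_scores: Dict[str, float]) -> str:
    """Compare O2 against the incumbent O1 on the criteria both were scored on."""
    criteria = o1_scores.keys() & o2_scores.keys()
    if not criteria:
        raise ValueError("no shared criteria: the ontologies are not comparable")
    better = {c for c in criteria if o2_scores[c] > o1_scores[c]}
    if better == criteria:
        return "replace O1 with O2"        # better in every respect
    if better:
        return "negotiate synthesis O3"    # better in some respects
    return "keep O1"


# Hypothetical scores on hypothetical criteria.
o1 = {"coverage": 0.6, "definitions": 0.5}
decision_all = resolve_overlap(o1, {"coverage": 0.9, "definitions": 0.8})
decision_some = resolve_overlap(o1, {"coverage": 0.9, "definitions": 0.4})
decision_none = resolve_overlap(o1, {"coverage": 0.5, "definitions": 0.4})
```
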
Benefits of orthogonality
Offers a solution to the problem of silos that
– is modular
– is incremental
– is empirically based
– incorporates a strategy for motivating potential developers and users

Orthogonality = non-redundancy for the reference ontologies inside the Foundry
– CARO-Mammal will not be orthogonal to CARO
– IDO-Malaria will not be orthogonal to IDO
– IDO will not be orthogonal to DO
– DO will be orthogonal to CL

Absolute redundancy for application ontologies = all terms in application ontologies should be taken from orthogonal reference ontologies within the Foundry

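A minimal sketch of how the rule for application ontologies might be checked: every term of the application ontology should be found in some reference ontology. The ontology keys and term labels below are made-up placeholders, not actual ontology contents.

```python
from typing import Dict, Iterable, Set


def foreign_terms(application_terms: Iterable[str],
                  reference_ontologies: Dict[str, Set[str]]) -> Set[str]:
    """Return the application terms that no reference ontology supplies."""
    known: Set[str] = set().union(*reference_ontologies.values())
    return set(application_terms) - known


# Hypothetical reference ontologies and application ontology.
reference = {
    "REF_CELL": {"dendritic cell", "cell"},
    "REF_PROCESS": {"immune response"},
}
application = ["dendritic cell", "immune response", "lab-specific cell type"]

missing = foreign_terms(application, reference)
```

Any term reported in `missing` would, on the slide's approach, either be composed from existing reference terms or contributed back to the appropriate reference ontology.
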
Benefits of orthogonality
Modularity brings the benefits of division of labor and division of authority, and minimizes redundancy.

Benefits of orthogonality
Scientists become motivated to commit themselves to developing an ontology falling within their domain of expertise because they themselves will need to use this ontology in their own work in the future.
Forking would erode this motivation.

Benefits of orthogonality
Incrementality means that the strategy will still work even if ontologies are still only partial.
This allows adoption and application at early stages.

Benefits of orthogonality
Empirically based means that we can always go back and start again if some ontology module does not work (compare the problem of non-modular approaches like SNOMED CT, where it is all or nothing).

Benefits of orthogonality
Modularity brings ownership, which motivates scientist-developers to commit themselves long term to developing the ontology.
This in turn motivates users to commit themselves to adoption:
– they see strong positive network effects from use of the ontology
– they gain reassurance from long-term commitment

Benefits of orthogonality
It helps those new to ontology who need to know where to look in finding an ontology relating to their subject-matter.
It obviates the need for ‘mappings’ between ontologies, which are
– difficult to create and use
– error-prone
– hard to keep up-to-date when mapped ontologies change

Benefits of orthogonality
Modularity (orthogonality) ensures the mutual consistency of ontologies, and thereby also the additivity of the annotations created with their aid by different groups of annotators describing common bodies of data.
It thereby contributes to the cumulativity of science and allows new forms of unmanaged collaboration.

Benefits of orthogonality
It brings grave responsibilities to those in charge of ensuring, for each domain, that the Foundry includes an ontology for that domain.
They must commit to perpetual striving for scientific accuracy and domain-completeness in their work.
Orthogonality rules out the sorts of simplification and partiality which may be acceptable under more pluralistic regimes.

Benefits of orthogonality
It supports the strategy of utilizing cross-products in composing terms and definitions.
This strategy will work only if we can
– minimize the degree of arbitrariness involved in selecting the terms to be composed
– and thereby maximize the degree to which the Foundry ontologies are networked together through the cross-product links

Misunderstandings of Orthogonality
Orthogonality does not mean that all ontologies must be developed within the Foundry framework.
We welcome the development of competing approaches to open-access ontology development, which can only make the Foundry stronger.

Problems with Orthogonality
What if researchers need purpose-built ontologies to meet their own specific needs?
OBO Foundry provides orthogonal reference ontologies, so that they can as far as possible build their application ontologies using terms composed as cross-products, thereby avoiding silos and contributing new terms back to the Foundry in case of need.

Problems with Orthogonality
For each domain, there should be convergence upon a single ontology that is recommended for use by those who wish to become involved with the Foundry initiative.
Q: WHAT DOES ORTHOGONALITY MEAN?
Minimally: two ontologies are not orthogonal if they share a single term with the same meaning.
Q: WHAT DOES DOMAIN MEAN?

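The minimal criterion above (two ontologies are not orthogonal if they share a single term with the same meaning) can be sketched as a pairwise check. How "same meaning" is decided is exactly the open question; here it is assumed, purely for illustration, that each term carries a normalized meaning key. The ontology names O1 to O3 and all terms are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# ontology name -> {term label -> meaning key}; all values are made up.
Ontology = Dict[str, str]


def shared_meanings(a: Ontology, b: Ontology) -> Set[str]:
    """Meaning keys represented in both ontologies, whatever the labels."""
    return set(a.values()) & set(b.values())


def non_orthogonal_pairs(ontologies: Dict[str, Ontology]) -> List[Tuple[str, str]]:
    """Pairs of ontologies that share at least one term with the same meaning."""
    return [
        (x, y)
        for x, y in combinations(sorted(ontologies), 2)
        if shared_meanings(ontologies[x], ontologies[y])
    ]


ontologies = {
    "O1": {"cell": "def:cell", "neuron": "def:neuron"},
    "O2": {"nerve cell": "def:neuron"},   # same meaning as O1's "neuron"
    "O3": {"mitosis": "def:mitosis"},
}
overlaps = non_orthogonal_pairs(ontologies)
```

Note that the check compares meanings rather than labels, so O1 and O2 are flagged even though they use different labels for the shared term.
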
Initial OBO Foundry Reference Ontologies (jigsaw)
Columns (RELATION TO TIME): CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
Rows (GRANULARITY):
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

Homesteading
Recommendation: Ontology developers should register their claim on territory not yet occupied as soon as possible, because the Foundry is designed to serve as an attractor for collaboration.

Orthogonality = Westphalian principles of national sovereignty for reference ontologies: no shared territory
[Figure: OBO Foundry coverage grid (granularity by relation to time)]

Varieties of application ontology
– cross-border national parks
– Slims
– Fractal ontologies
– Cross-product ontologies
– Template ontologies (CARO, IDO, GDO …)

cross-border national parks: an ontology for studying the effects of viral infection on cell function in shrimp
[Figure: OBO Foundry coverage grid (granularity by relation to time)]

Slims = an ontology of dendritic cells
[Figure: OBO Foundry coverage grid (granularity by relation to time)]

Slims = an ontology of dendritic cells, with definitions composed using terms from other ontologies
[Figure: OBO Foundry coverage grid (granularity by relation to time)]

fractal ontologies, employing small portions of many ontologies (e.g. MSO Multiple Sclerosis Ontology)
[Figure: OBO Foundry coverage grid (granularity by relation to time)]

rationale of OBO Foundry coverage + BFO
[Figure: OBO Foundry coverage grid (granularity by relation to time), with the occurrent column listing Organism-Level Process (GO), Cellular Process (GO) and Molecular Process (GO)]

types plus instances
[Figure: OBO Foundry coverage grid (granularity by relation to time)]

Fundamental Dichotomy
Continuants (aka endurants)
– have continuous existence in time
– preserve their identity through change
– exist in toto whenever they exist at all
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive phases
– exist only in their phases

Functions are continuants
Functionings are occurrents

ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: (ChEBI, SO, RNAO, PRO); Molecular Function (GO); Molecular Process (GO)
Biomedical Investigations (OBI)

ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: (ChEBI, SO, RNAO, PRO); Molecular Function (GO); Molecular Process (GO)

ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO); Cellular Pathology ????
MOLECULE: (ChEBI, SO, RNAO, PRO); Molecular Function (GO); Molecular Process (GO)

ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function ???? (GO???); Cellular Pathology ????
MOLECULE: (ChEBI, SO, RNAO, PRO); Molecular Function (GO); Molecular Process (GO)

ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: 2- and 3-D Structure (RNAO) (PRO); Small Molecule (ChEBI); 1-D Sequence (SO); Molecular Function (GO); Molecular Process (GO)

ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: 2- and 3-D Structure (RNAO) (PRO); Small Molecule (ChEBI); 1-D Sequence (SO); Molecular Function (GO); Molecular Process (GO); ?????; Molecular Pathway

ORGAN AND ORGANISM: Organism (NCBI Taxonomy / placeholder); Anatomical Entity (FMA, CARO); Organ Function (placeholder); Phenotypic Quality (PATO); Disease (DO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: 2- and 3-D Structure (RNAO) (PRO); Small Molecule (ChEBI); 1-D Sequence (SO); Molecular Function (GO); Phenotypic Quality of Molecule ????; Molecular Process (GO); ?????; Reactome

Orthogonality can be preserved by expanding the territory (land reclamation)

GO already started to deal with biological processes involving multiple organisms
[Figure: OBO Foundry coverage grid (granularity by relation to time)]

[Figure: OBO Foundry coverage grid (granularity by relation to time), with a new cell added: Family, Community, Deme, Population]

[Figure: OBO Foundry coverage grid (granularity by relation to time), extended with a new granularity level]
COMPLEX OF ORGANISMS: Family, Community, Deme, Population; Population Phenotype; Population Process
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
CELL AND CELLULAR COMPONENT: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)

[Figure: the extended coverage grid (COMPLEX OF ORGANISMS, ORGAN AND ORGANISM, CELL AND CELLULAR COMPONENT, MOLECULE), with an additional ENVIRONMENT column]

[Figure: independent continuants by granularity, with an ENVIRONMENT column]
COMPLEX OF ORGANISMS: Family, Community, Deme, Population | Environment of population
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); (FMA, CARO) | Environment of single organism
CELL AND CELLULAR COMPONENT: Cell (CL); Cell Component (FMA, GO) | Environment of cell
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO) | Molecular environment

[Figure: independent continuants by granularity, with an ENVIRONMENT column]
COMPLEX OF ORGANISMS: Family, Community, Deme, Population | Environment of population
ORGAN AND ORGANISM: Organism (NCBI Taxonomy); (FMA, CARO) | Environment of single organism*
CELL AND CELLULAR COMPONENT: Cell (CL); Cell Component (FMA, GO) | Environment of cell
MOLECULE: Molecule (ChEBI, SO, RnaO, PrO) | Molecular environment
* The sum total of the conditions and elements that make up the surroundings and influence the development and actions of an individual.

[Figure: examples for the ENVIRONMENT column, by granularity]
COMPLEX OF ORGANISMS: biome / biotope, territory, habitat, neighborhood, ...
ORGAN AND ORGANISM: work environment, home environment; host/symbiont environment; ...
CELL AND CELLULAR COMPONENT: extracellular matrix; chemokine gradient; ...
MOLECULE: hydrophobic surface; virus localized to cellular substructure; active site on protein; pharmacophore ...

Template ontologies (CARO, IDO, CL?) X Organism Taxonomy
[Figure: OBO Foundry coverage grid (continuant/occurrent, independent/dependent)]

The case of IDO
Human Disease Ontology
– unitary hierarchy with root node: human disease
– refers only to dependent realizable continuants
Infectious Disease Ontology
– draws terms from all BFO categories
– template exists in many copies, specializing to different hosts, pathogens, vectors, etc.

We have data
– TBDB: Tuberculosis Database, including microarray data
– VFDB: Virulence Factor DB
– TropNetEurop Dengue Case Data
– ISD: Influenza Sequence Database at LANL
– PathPort: Pathogen Portal Project
– ...

We need common controlled vocabularies to describe these data in ways that will assure comparability and cumulation
What content is needed to adequately cover the infectious domain?
– Host-related terms (e.g. carrier, susceptibility)
– Pathogen-related terms (e.g. virulence)
– Vector-related terms (e.g. reservoir)
– Terms for the biology of disease pathogenesis (e.g. evasion of host defense)
– Population-level terms (e.g. epidemic, endemic, pandemic)

We need to annotate these data to allow retrieval and integration of
– sequence and protein data for pathogens
– case report data for patients
– clinical trial data for drugs, vaccines
– epidemiological data for surveillance, prevention
– ...
Goal: to make data deriving from different sources comparable and computable

IDO needs to work with
– Disease Ontology (DO) + SNOMED CT
– Gene Ontology Immunology Branch
– Phenotypic Quality Ontology (PATO)
– Protein Ontology (PRO)
– Sequence Ontology (SO)
– ...

IDO provides a common template
IDO works like CARO. It contains terms (like ‘pathogen’, ‘vector’, ‘host’) which apply to organisms of all species involved in infectious disease and its transmission.
Disease- and organism-specific ontologies are then built as specifications of the IDO core.

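A minimal sketch of the template idea: a disease-specific ontology is derived by specializing each core term. Only the three core terms come from the slide; the labelling scheme ("<disease> <core term>") and the function name are assumptions made for illustration.

```python
from typing import Dict, Iterable

# Core terms named on the slide.
IDO_CORE = ("pathogen", "vector", "host")


def specialize(core_terms: Iterable[str], disease: str) -> Dict[str, str]:
    """Map each hypothetical specialized label to the core term it is_a."""
    return {f"{disease} {term}": term for term in core_terms}


malaria_terms = specialize(IDO_CORE, "malaria")
```

Because every specialized term points back to a core term, data annotated with different disease-specific copies of the template remain comparable at the level of the shared core.
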
Proposed additions to list of OBO Foundry Principles
INSTANTIABILITY: Terms in an ontology should correspond to instances in reality
– Even disposition terms correspond to instances in reality
– There are no absent nipples
– There are no cancelled studies

Proposed additions to list of OBO Foundry Principles
INSTANTIABILITY: Terms in an ontology should represent types all of which have instances in reality
– types = what are described in textbooks
– instances = (roughly) what are described in data

Proposed additions to list of OBO Foundry Principles
Ontologies consist of representations of types in reality; therefore, their terms should consist entirely of singular nouns (preferred terms blah blah).
Ontologies should use singular nouns and noun phrases belonging to ordinary English as extended by technical terms already established in the relevant discipline. They should not use phrases like ‘EV-EXP-IGI’, lab slang, or ellipses.

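Part of this naming rule can be checked mechanically. The sketch below is a rough heuristic under stated assumptions: it flags code-like labels such as ‘EV-EXP-IGI’ and labels containing ellipses, but it does not attempt to detect plurals or lab slang, which would need linguistic resources. The regular expression and the function name are illustrative choices.

```python
import re
from typing import List

# Upper-case/digit chunks joined by hyphens or underscores, e.g. EV-EXP-IGI.
CODE_LIKE = re.compile(r"^[A-Z0-9]+(?:[-_][A-Z0-9]+)+$")


def label_problems(label: str) -> List[str]:
    """Return the naming problems this heuristic can see in a term label."""
    problems: List[str] = []
    if CODE_LIKE.match(label):
        problems.append("looks like a code, not an English noun phrase")
    if "..." in label or "\u2026" in label:
        problems.append("contains an ellipsis")
    return problems


code_check = label_problems("EV-EXP-IGI")
ok_check = label_problems("dendritic cell")
ellipsis_check = label_problems("cell ...")
```
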
Proposed additions to list of OBO Foundry Principles
EVALUATION: each ontology should be subject to evaluation (as far as possible quantitative):
– software (conversion OBO format to OWL)
– specialist review (OWL to natural language)
– when one version is used for a given purpose, later versions should be applied to the same purpose and the results compared

Proposed additions to list of OBO Foundry Principles
Each ontology should be built on the basis of BFO top-level distinctions (common top level):
– continuants vs. occurrents
– independent continuants (molecules, cells, organisms …)
– specifically dependent continuants (qualities, functions, roles …)
– generically dependent continuants (information artifacts, sequences …)

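The top-level distinctions listed above can be sketched as a small parent table, with the slide's examples attached to their categories. The root "entity" and the helper names are assumptions added for illustration; this is not the BFO OWL file or its identifiers.

```python
from typing import Dict, List, Optional

# Category -> parent category. "entity" is an assumed common root.
BFO_PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "continuant": "entity",
    "occurrent": "entity",
    "independent continuant": "continuant",
    "specifically dependent continuant": "continuant",
    "generically dependent continuant": "continuant",
}

# Examples given on the slide, placed under their categories.
EXAMPLES: Dict[str, str] = {
    "molecule": "independent continuant",
    "cell": "independent continuant",
    "organism": "independent continuant",
    "quality": "specifically dependent continuant",
    "function": "specifically dependent continuant",
    "role": "specifically dependent continuant",
    "information artifact": "generically dependent continuant",
    "sequence": "generically dependent continuant",
}


def bfo_path(kind: str) -> List[str]:
    """Walk from an example's category up to the root."""
    path = [EXAMPLES[kind]]
    while BFO_PARENT[path[-1]] is not None:
        path.append(BFO_PARENT[path[-1]])
    return path


def is_continuant(kind: str) -> bool:
    return "continuant" in bfo_path(kind)


function_path = bfo_path("function")
```
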