The OBO Foundry approach to ontologies and standards with special reference to cytokines Barry Smith ImmPort Science Talk / Discussion June 17, 2014.

Slides:



Advertisements
Similar presentations
Species-Neutral vs. Multi-Species Ontologies Barry Smith.
Advertisements

PRIME Program for Research on Immune Modeling and Experimentation PI: Stuart Sealfon, Mount Sinai School of Medicine.
On the Future of the NeuroBehavior Ontology and Its Relation to the Mental Functioning Ontology Barry Smith
Goal and Status of the OBO Foundry Barry Smith. 2 Semantic Web, Moby, wikis, crowd sourcing, NLP, etc.  let a million flowers (and weeds) bloom  to.
1 Introduction to Biomedical Ontology Barry Smith University at Buffalo
1 The OBO Foundry Towards Gold Standard Terminology Resources in the Biomedical Domain Thomas Bittner (based on a presentation by Barry Smith)
1 How Ontologies Create Research Communities Barry Smith
1 The OBO Foundry Barry Smith University at Buffalo
1 How Ontologies Create Research Communities Barry Smith University at Buffalo
What is an ontology and Why should you care? Barry Smith with thanks to Jane Lomax, Gene Ontology Consortium 1.
1 The OBO Foundry 2 A prospective standard designed to guarantee interoperability of ontologies from the very start (contrast.
The Problem of Reusability of Biomedical Data OBO Foundry & HL7 RIM Barry Smith.
What is an ontology and Why should you care? Barry Smith with thanks to Jane Lomax, Gene Ontology Consortium 1.
National center for ontological research. Part One: The History of NCOR and ECOR Part Two: How to Establish JCOR: The Japanese Consortium.
Using Ontologies to Represent Immunological Networks Lindsay G. Cowell, Anne Lieberman, Anna Maria Masci Duke University Center for Computational Immunology.
1 Logical Tools and Theories in Contemporary Bioinformatics Barry Smith
The Future of Ontology in Buffalo Barry Smith 1.
Room for Lunch: Arlington Room Room for Evening Reception: Grand Prairie Room.
Why a Credit Card Number is Not a Number Barry Smith 1.
The RNA Ontology RNAO Colin Batchelor Neocles Leontis May 2009 Eckart, Colin and Jane In Cambridge.
1 BIOLOGICAL DOMAIN ONTOLOGIES & BASIC FORMAL ONTOLOGY Barry Smith.
1 The OBO Foundry Barry Smith Center of Excellence in Bioinformatics & Life Sciences, University at Buffalo IFOMIS, Saarland University
CoE Ontology Research Group (ORG) Barry Smith Center of Excellence in Bioinformatics and Life Sciences Ontology Research Group Department of Philosophy.
How to Organize the World of Ontologies Barry Smith 1.
Comprehensive Annotation System for Infectious Disease Data Alexander Diehl University at Buffalo/The Jackson Laboratory IDO Workshop /9/2010.
New York State Center of Excellence in Bioinformatics & Life Sciences Biomedical Ontology in Buffalo Part I: The Gene Ontology Barry Smith and Werner Ceusters.
1 How Ontologies Create Research Communities Barry Smith
NEUROSCIENCE INFORMATION FRAMEWORK antibodyregistry.org An antibody registry for biological sciences Illuminating dark data, one antibody at a time Image.
NEUROSCIENCE INFORMATION FRAMEWORK scicrun.ch/resources The case of the missing research resources Illuminating dark data, one antibody at a time Image.
Immune Cell Ontology for Networks (ICON) Immunology Ontologies and Their Applications in Processing Clinical Data June 11-13, Buffalo, NY.
Building the Ontology Landscape for Cancer Big Data Research Barry Smith May 12, 2015.
Limning the CTS Ontology Landscape Barry Smith 1.
Computational Biology and Informatics Laboratory Development of an Application Ontology for Beta Cell Genomics Based On the Ontology for Biomedical Investigations.
The CROP (Common Reference Ontologies for Plants) Initiative Barry Smith September 13,
Ontology of Sensors: Some Examples from Biology
Ontological realism as a strategy for integrating ontologies Ontology Summit February 7, 2013 Barry Smith 1.
GO and OBO: an introduction. Jane Lomax EMBL-EBI What is the Gene Ontology? What is OBO? OBO-Edit demo & practical What is the Gene Ontology? What is.
Identifying research resources should be easy Anita Bandrowski, UCSD.
Intelligence Ontology A Strategy for the Future Barry Smith University at Buffalo
Ontology for General Medical Science Overview and OBO Foundry Criteria Albert Goldfain Blue Highway / University at Buffalo ICBO.
Imports, MIREOT Contributors: Carlo Torniai, Melanie Courtot, Chris Mungall, Allen Xiang.
Open Biomedical Ontologies. Open Biomedical Ontologies (OBO) An umbrella project for grouping different ontologies in biological/medical field –a repository.
Resurrecting SOWG BS, Baltimore, CTS Ontology Workshop April
1 How Ontologies Create Research Communities Barry Smith University at Buffalo
Building Ontologies with Basic Formal Ontology Barry Smith May 27, 2015.
Leveraging Ontologies for Human Immunology Research Barry Smith, Alexander Diehl, Anna- Maria Masci Presented at Leveraging Standards and Ontologies to.
Alan Ruttenberg PONS R&D Task force Alan Ruttenberg Science Commons.
Ontologies GO Workshop 3-6 August Ontologies  What are ontologies?  Why use ontologies?  Open Biological Ontologies (OBO), National Center for.
Introduction to Biomedical Ontology for Imaging Informatics Barry Smith, PhD, FACMI University at Buffalo May 11, 2015.
Sharing Ontologies in the Biomedical Domain Alexa T. McCray National Library of Medicine National Institutes of Health Department of Health & Human Services.
How to integrate data Barry Smith. The problem: many, many silos DoD spends more than $6B annually developing a portfolio of more than 2,000 business.
2 3 where in the body ? where in the cell ?
Ontology and the Semantic Web Barry Smith August 26,
PRO and the NIF / ImmPort Antibody Registries Alexander Diehl Protein Ontology Workshop 6/18/14.
Need for common standard upper ontology
Towards a Top-Domain Ontology for Linking Biomedical Ontologies Holger Stenzhorn a,b Elena Beißwanger c Stefan Schulz a a Department of Medical Informatics,
New Foundations for Social Ontology Barry Smith 1.
Introduction to Biomedical Ontology for Imaging Informatics Barry Smith, PhD, FACMI University at Buffalo May 11, 2015.
OBO Foundry Workshop 2009 Cell Ontology (CL) Preliminary review.
1 An Introduction to Ontology for Scientists Barry Smith University at Buffalo
Immunology Ontology Rho Meeting October 10, 2013.
OBO Foundry Principles BFO RO Barry Smith 1. OBO Foundry Principles  open  common formal language (OBO Format, OWL DL, CL)  commitment to collaboration.
Basic Formal Ontology Barry Smith August 26, 2013.
Immunology Ontology Workshop Buffalo, NY June 11-13, 2012.
Building Ontologies with Basic Formal Ontology Barry Smith May 27, 2015.
Intelligence Ontology: A Strategy for the Future
Why do we need upper ontologies? What are their purported benefits?
OBO Foundry Principles
Ontology of biomedical investigations (OBI)
OBO Foundry Update: April 2010
Presentation transcript:

The OBO Foundry approach to ontologies and standards with special reference to cytokines Barry Smith ImmPort Science Talk / Discussion June 17, 2014

A common problem: Use of terminology in immunology research is poorly standardized Two approaches to this problem: text-mining, creation of synonym lists ontology-based standardization Both have the same goal: common IDs for entity types described in immunology literature and data Thesis: the two approaches can benefit from collaboration within ImmPort

Synonym lists: three examples NIF Antibody Registry ImmPort Antibody Registry Cytokine Names and Synonyms List

ImmPort Science talk, 1/16/2014 Anita Bandrowski The NIF Antibody Registry, fighting dark data, one antibody at a time

The problem Paz et al, J Neurosci, 2010 with thanks to Anita Bendrowski

Now, go find the antibody Nov 12, 2010 with thanks to Anita Bendrowski

Classes of problems Insufficient data –Which of the antibodies in the catalog was used in the research reported in the paper? Time dependency of data –only some antibodies are listed in the catalog, others that were sold last year no longer listed Vendor transition –If the vendor goes out of business tomorrow, will anyone be able to reproduce the findings in the paper? Text mining tools of little use under these conditions with thanks to Anita Bendrowski

To solve these problems Hitherto: authors have identified antibodies by means of company name, city and state Authors need to change their ways and identify the antibodies themselves! Publishers, journal editors, funders need to change their ways by requiring such identification (bullying) But what does it mean to identify the antibodies themselves? with thanks to Anita Bendrowski

antibodyregistry.org Gather all available data from vendors Assign unique identifiers and keep them stable antibodyregistry.org/AB_12345 Remove redundant identifiers Propagandize to bring about a situation where authors use AB_ IDs in papers with thanks to Anita Bendrowski

Organisms Antibodies Software Tools How to extend this idea across all areas of immunology of importance to ImmPort? For example: cytokines

Problems to avoid redundancy – how many cytokine lists are being created? roach motel – how do we ensure the work we’ve done so far does not lose its value over time denetworking – A builds a protein list …; B builds a cytokine list – can’t reason from one to the other; can’t use data from one to help you build / validate the other forking – A builds a tissue list for cancer research, B builds a tissue list for enzyme research …

Thesis If the Antibody Registry is build against an ontology background, then these problems will be mitigated An ontology will tell us what it means ‘to identify the antibodies themselves’ But not just any ontology will do. Many ontologies are themselves based on term mining – and they just recreate the same problems (of redundancy, forking, …)

15 This holds for many of the ontologies in the NCBO Bioportal and in the UMLS Metathesaurus Each justifies its acceptance of high levels of redundancy by arguing that redundant entries will be linked by post-hoc mappings  but maintaining mappings is expensive  where the sources on either side of the mapping develop independently mappings are fragile and over time forking is inevitable

An alternative approach the OBO (Open Biomedical Ontologies) Foundry rooted in the Gene Ontology 16

By far the most successful: GO (Gene Ontology) 19

GO shares some of the features of the SI System of Units: it provides the base for coordination Gene Ontology (b. 1998) only covers three kinds of entity biological processes molecular functions cellular components How build on the success of the GO to other domains? 20

21

RELATION TO TIME GRANULARITY CONTINUANTOCCURRENT INDEPENDENTDEPENDENT ORGAN AND ORGANISM Organism (NCBI Taxonomy) Anatomical Entity (FMA, CARO) Organ Function (FMP, CPRO) Phenotypic Quality (PaTO) Biological Process (GO) CELL AND CELLULAR COMPONENT Cell (CL) Cellular Component (FMA, GO) Cellular Function (GO) MOLECULE Molecule (ChEBI, SO, RnaO, PrO) Molecular Function (GO) Molecular Process (GO) Building out from the GO (2005)

CONTINUANTOCCURRENT INDEPENDENTDEPENDENT ORGAN AND ORGANISM Organism (NCBI Taxonomy) Anatomical Entity (FMA, CARO) Organ Function (FMP, CPRO) Phenotypic Quality (PaTO) Organism-Level Process (GO) CELL AND CELLULAR COMPONENT Cell (CL) Cellular Component (FMA, GO) Cellular Function (GO) Cellular Process (GO) MOLECULE Molecule (ChEBI, SO, RnaO, PrO) Molecular Function (GO) Molecular Process (GO) Hierarchical organization along two dimensions GRANULARITY RELATION TO TIME

The strategy for creating the OBO Foundry First: Establish a small number of starter ontologies built in tandem with each other around the GO = the initial OBO Library (GO, CL, …) Second: advertize the existence of this library as an attractor for re-use Third: invite others to join, but only if they accept certain principles 24

25 OBO Foundry Principles 1. Must be open 2. Formulated in a recognized formal syntax (OWL …) 3. Have a unique ID space. 4. Terms are logically defined. 5. Must be orthogonal to other Foundry ontologies to ensuring community convergence on a single controlled vocabulary for each domain

26 Principles (contd.) 6. Collaborative development (treaty negotations to resolve border disputes) 7. Consistent versioning principles based on permanent URLs 8. Common top-level architecture (BFO) 9. Locus of authority for editorial decisions, help desk, submission of term requests and error and bug reports

Anatomy Ontology (FMA*, CARO) Environment Ontology (EnvO) Disease, Disorder and Treatment (OGMS) Biological Process Ontology (GO*) Cell Ontology (CL) Cellular Component Ontology (FMA*, GO*) Phenotypic Quality Ontology (PaTO) CHEBI Sequence Ontology (SO*) Molecular Function (GO*) Protein Ontology (PRO*) Strategy of Downward Population 27 top level mid-level domain level Information Artifact Ontology (IAO) Ontology for Biomedical Investigations (OBI) Basic Formal Ontology (BFO)

OGMS Cardiovascular Disease Ontology Genetic Disease Ontology Cancer Disease Ontology Genetic Disease Ontology Immune Disease Ontology Environmental Disease Ontology Oral Disease Ontology Infectious Disease Ontology IDO Staph Aureus IDO MRSA IDO Australian MRSA IDO Australian Hospital MRSA …

Minimum Information Checklists  MIBBI: ‘a common resource for minimum information checklists’ analogous to OBO / NCBO BioPortal  MIBBI Foundry: will create ‘a suite of self- consistent, clearly bounded, orthogonal, integrable checklist modules’ * Taylor, et al. Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project, Nature Biotechnology 26 (8),Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project

30 Transcriptomics (MIAME Working Group / MGED) Proteomics (Proteomics Standards Initiative) Metabolomics (Metabolomics Standards Initiative) Genomics and Metagenomics (Genomic Standards Consortium) In Situ Hybridization and Immunohistochemistry (MISFISHIE Working Group) Phylogenetics (Phylogenetics Community) RNA Interference (RNAi Community) Toxicogenomics (Toxicogenomics WG) Environmental Genomics (Environmental Genomics WG) Nutrigenomics (Nutrigenomics WG) Flow Cytometry (Flow Cytometry Community) MIBBI Foundry communities

cROP (Common Reference Ontologies for Plants)

How reproduce this approach to building registries to support terminology standardization in immunology research First, survey the field and create a draft list of primary targets – Antibodies – Cells – Proteins Cytokines – what else?

Principles for inclusion 1.Public domain. 2.Unique ID space. 3.Orthogonality with other registries. 4.Collaborative development. 5.Consistent versioning principles based on permanent URLs. 6.Include IDs from overarching ontologies. 7.Include IDs from official sources where they exist 8.Include immunology-relevant synonyms 9.Include PMIDs documenting usage 10.Establish locus of authority for editorial decisions, help desk, submission of term requests and error and bug reports 11.Include links to ImmPort Study Numbers 12.Advertise the existence of the registry as widely as possible

Problems to avoid redundancy – for each type of biological entity there must be at most one registry and one ID for that type roach motel – all researchers must be able to easily identify what registries exist; registries must provide a single target for bullying, bargaining, training; versioning denetworking – sibling types under a given parent are maintained together; representations of child types build on representations of parent types forking – registries must be maintained consistently over time under a single authority, the approach to registries must be consistent

The Cytokine Names and Synonyms List

Principles 1.Public domain. 2.Unique ID space. 3.Orthogonality with other registries. 4.Collaborative development. 5.Consistent versioning principles based on permanent URIs. 6.Include IDs from overarching ontologies. 7.Include IDs from official sources where they exist 8.Include immunology-relevant synonyms 9.Include PMIDs documenting usage 10.Establish locus of authority for editorial decisions, help desk, submission of term requests and error and bug reports 11.Include links to ImmPort Study Numbers 12.Advertise the existence of the registry as widely as possible

4. Collaborative Development Benefits already from interaction with the PRO – PRO cytokine representation improved – PRO helped to identify other cytokine lists How many cytokine synonym lists have been created so far?

4. Orthogonality with other registries

SUBDICTIONARIES: Angiogenesis | Apoptosis | CD Antigens | Cell lines | Eukaryotic cell types | Chemokines | CytokineTopics | Cytokine Concentrations in Body Fluids | Cytokines Interspecies Reactivities | Dual identity proteins | Hematology | Innate Immunity Defense Proteins | Metalloproteinases | Modulins | Protein domains | Regulatory peptide factors | Virokines | Viroceptors | Virulence Factors SUBDICTIONARIES:

5. Consistent versioning principles based on permanent URIs. “CID_6” needs to be part of a URI that points always to the most updated version – Cytokine pages? All previous versions should continue to exist and to be retrievable via version-specific URIs

6. Include IDs from overarching ontologies Consider developing an Inter-Cellular Signaling Ontology as a joint effort of Buffalo and Tel Aviv

ideally via reuse – as ImmPort Antibody ontology reuses the Reagent Ontology

7. Include IDs from official sources where they exist Recommendation: Use HGNC and MGI rather than Entrez Gene HGNC provides official names: including cytokines from Copewithcytokines resource, tumor necrosis factor superfamily cytokines, etc. MGI provides coordinated official nomenclature for mouse Protein names not standardized in the same way – PRO / UniProt collaboration on-going

12. Advertise the existence of the registry as widely as possible Give the artifact a name which will give people confidence that this is a target which will survive through time (e.g. ‘Registry’ rather than ‘List’) How to create a vehicle suitable for bullying publishers, editors, authors …?