Linking Multiple Ontologies: The OBO Foundry Approach Chris Mungall NIAID Cell Ontology Workshop May 2008.

Outline Introduction to ontologies –The OBO perspective –Case study in the Gene Ontology The OBO Foundry: goals and principles The OBO relation ontology Organization of ontologies in OBO Modularity –An example from CL Linking CL to the OBO Foundry

What is an ontology? A computable representation of some domain –What kinds of things exist –What are the relations that hold between them? [Figure: example graph linking Mitral valve, Aortic valve, Heart, Cavitated organ and Cardiovascular System with part_of and is_a edges]

Aspects of an ontology Identifiers –Uniquely identify a class / term E.g. CL: is the ID for the term “hematopoietic stem cell” –Identifier metadata Terminological aspects –Names and synonyms/alternate labels CL: has “hemopoietic progenitor cell” as a related synonym and “hemopoietic stem cell” as an exact synonym Logical aspects –Relations –Definitions Provenance

Some ontologies and their uses The Gene Ontology –Annotation of gene products –Analyzing high-throughput datasets Anatomical ontologies (including CL) –Experimental metadata –Image annotation –Indicating location of gene expression –Creating Phenotypic descriptions Others –NLP –Annotating information models –Database integration

Origins of OBO: The Gene Ontology (GO) 3 ontologies for annotating genes and gene products These ontologies are organised as a collection of related terms, constituting nodes in a graph –Gradually incorporating other logical axioms [Table columns: Ontology, # terms, # links; rows: Molecular function, Biological process, Cellular component]

Annotation and GO GO Annotations: –Associations between genes and GO terms, with evidence –Met17 : “methionine metabolism” GO: ,000 genes and gene products have high quality annotations to GO terms –3.4m including automated predictions –66,000 publications curated Variety of analysis tools –

GO::TermFinder Sherlock et al GO and high-throughput biology: Over-representation of GO terms for gene sets

GO and the need for OBO GO terms implicitly reference kinds of entities outwith the scope of GO –Methionine biosynthesis –Neural crest cell migration –Cardiac muscle morphogenesis –Regulation of vascular permeability OBO was born from the need to create source ontologies for GO term ‘cross-products’ –Define composite classes in terms of simpler ones chemical cell anatomy quality

The Open Biomedical Ontologies (OBO) Foundry A collection of orthogonal reference ontologies in the biological/biomedical domain The OBO Foundry: Each is committed to an agreed upon set of principles governing best practices in ontology development

Some OBO ontologies Gene Ontology ChEBI - chemical entities OBI - investigations PATO, MP - phenotypes CL - cells ENVO - environment and habitat DO - Human diseases CARO - common anatomy FMA - human anatomy SO - sequence features Model organism anatomy –ZFA –Fly_anat –Dicty_anat –Mouse_anat –… OBO Relation Ontology

OBO Foundry: criteria, v1 Open Well-defined exchange format E.g. OBO or OWL Uses identifiers according to OBO ID policy Ontology Life-cycle / versioning Has clearly specified and delineated content Has unambiguous definitions Uses or extends relations in the OBO Relation Ontology Well documented Has a plurality of users (and a mail list & issue tracker) Developed collaboratively Orthogonal, modular

OBO Relation Ontology Edges can link nodes… –Within ontologies –Across ontologies The precise meaning of the relation is important –Relations have formal definitions –Rules for composing relations together –

Is_a X is_a Y –If something is an instance of X (at time t), then it is also an instance of Y (at t) Transitive –B1 B cell is_a B cell –B cell is_a lymphocyte –Therefore B1 B cell is_a lymphocyte
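The transitivity of is_a can be sketched in a few lines of Python. This is a minimal illustration rather than anything shown in the talk: the `IS_A` dictionary and the `ancestors` helper are hypothetical names, and only the two links from the slide are encoded.

```python
# Asserted is_a links from the slide (child -> list of parents).
IS_A = {
    "B1 B cell": ["B cell"],
    "B cell": ["lymphocyte"],
}

def ancestors(term, is_a=IS_A):
    """Return every class reachable from `term` by following is_a links."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen

print(sorted(ancestors("B1 B cell")))
```

The link from B1 B cell to lymphocyte is never asserted; it falls out of following the two asserted links in sequence.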

Part_of Instance level part_of relation is primitive Between classes: –X part_of Y : Every instance of X is part_of some instance of Y Paneth cell part_of intestine : YES Nucleus part_of Cell : YES Neuron part_of brain : NO –(there are some neurons that are part of others parts of the nervous system) Transitive –X part_of Y, Y part_of Z Therefore, X part_of Z
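The "every instance of X is part_of some instance of Y" reading can be checked against instance data. The sketch below assumes a toy table of instances (all instance names are invented for illustration) and only looks at direct instance-level part_of links, so it is a simplification of the relation as defined in the OBO Relation Ontology.

```python
# Hypothetical instance data: instance name -> (class, instance it is directly part of).
INSTANCES = {
    "paneth_1": ("Paneth cell", "intestine_1"),
    "paneth_2": ("Paneth cell", "intestine_1"),
    "neuron_1": ("neuron", "brain_1"),
    "neuron_2": ("neuron", "spinal_cord_1"),
    "intestine_1": ("intestine", None),
    "brain_1": ("brain", None),
    "spinal_cord_1": ("spinal cord", None),
}

def class_part_of(x, y, instances=INSTANCES):
    """True if class x has instances and each one is part_of an instance of class y."""
    wholes = [whole for cls, whole in instances.values() if cls == x]
    return bool(wholes) and all(
        whole is not None and instances[whole][0] == y for whole in wholes
    )

print(class_part_of("Paneth cell", "intestine"))
print(class_part_of("neuron", "brain"))
```

A single neuron located in the spinal cord is enough to make the class-level statement "neuron part_of brain" false, which is the point of the NO example on the slide.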

Has_part Instance level inverse of part_of X has_part Y –Every X has some Y as part –Cell has_part nucleus : NO –Nucleate erythrocyte has_part nucleus : YES

Develops_from X develops_from Y –Every instance of X was once a Y, or inherited a significant portion of its matter from a Y Example: erythrocyte develops_from reticulocyte Transitive –erythrocyte develops_from reticulocyte –reticulocyte develops_from orthochromatic erythroblast => –erythrocyte develops_from orthochromatic erythroblast

Transformation and derivation Develops_from relation can be refined into two cases: –Transformation_of X transformation_of Y: any instance of X was previously an instance of Y. Example: erythrocyte transformation_of reticulocyte –Derives_from X derives_from Y: holds between distinct instances where X inherits matter from Y Most OBO ontologies just use the develops_from relation

Other relations Inherence –Between a quality and an object –E.g. between a specific shape and a cell Participation –Between a process and an object –E.g. between a B cell and an immune process

Definitions state necessary and sufficient conditions Links in the ontology graph state necessary conditions for a class E.g. erythroid progenitor cell develops_from megakaryocyte erythroid progenitor –These characteristics may not be unique A definition should state necessary and sufficient conditions for a class –The characteristics must be unique to the defined class E.g. “progenitor cell that is committed to the erythroid lineage” Definition should be precise and (as far as possible) translated / translatable to logical computable form

Genus differentia definitions Of the form –An X is a G that D –G should be in the same ontology –D is the set of discriminating characteristics that differentiate (in the classification sense) Xs from other Gs. These are relations to terms in an ontology (the same ontology or a different one) Example: –A B cell is a lymphocyte that expresses an immunoglobulin complex
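One way to hold a genus-differentia definition in a computable form is as a genus plus a list of (relation, filler) pairs. The `GenusDifferentia` class below is a hypothetical sketch of that idea, not a structure taken from any OBO tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GenusDifferentia:
    term: str
    genus: str
    differentia: tuple  # (relation, filler) pairs

    def to_text(self):
        """Render the definition in the 'X = G that D' pattern."""
        clauses = " and ".join(f"{rel} {filler}" for rel, filler in self.differentia)
        return f"{self.term} = {self.genus} that {clauses}"

b_cell = GenusDifferentia(
    term="B cell",
    genus="lymphocyte",
    differentia=(("expresses", "immunoglobulin complex"),),
)
print(b_cell.to_text())
```

Keeping the relation and filler separate is what lets a filler point at a term in another ontology.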

Orthogonality of ontologies No two ontologies should represent the same kind of entity –E.g. “B-cell” should only be represented in one ontology –Related entities should be coordinated across ontologies GO: “B-cell differentiation” Exceptions: –The term “cell” connects GO Cellular Component (cell parts) and CL (cells) Advantages: –Reduces redundancy and work –Easier to make the union consistent

oenocyte hepatocyte liver fat body glycogen glucose hepatic artery bile insulin obesity carbohydrate metabolism liver development increased circulating glucose level oenocyte differentiation hepatoma Some OBO terms..

oenocyte hepatocyte liver fat body glycogen glucose hepatic artery bile insulin obesity carbohydrate metabolism liver development increased circulating glucose level CHEBI FBbt CL PRO MA (mouse)(fly) FMA (adult human) MP (mammal phenotype) GO (biological process) oenocyte differentiation hepatoma DO

oenocyte hepatocyte liver fat body glycogen glucose hepatic artery bile insulin obesity carbohydrate metabolism liver development increased circulating glucose level CHEBI FBbt CL PRO MA (mouse)(fly) FMA (adult human) MP (mammal phenotype) GO (biological process) oenocyte differentiation hepatoma DO How should we organize this?

Top-level organisation (BFO: Basic Formal Ontology) General categories –3D things (continuants) Independent –Cells, organs, molecules Dependent –Shapes, sizes, concentrations, … –4D things (processes) Processes Useful organisational principle for OBO is_a and part_of should not cross top level categories Levels of granularity (scale) –Population –Organism –Organ –Cell –Molecule part_of relations can cross levels
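The rule that is_a and part_of should not cross top-level categories lends itself to an automated check. The sketch below assumes each term has already been assigned a top-level category; the `CATEGORY` table and the deliberately bad link are invented for illustration.

```python
# Hypothetical assignment of terms to top-level categories.
CATEGORY = {
    "hepatocyte": "independent continuant",
    "liver": "independent continuant",
    "liver development": "process",
    "carbohydrate metabolism": "process",
}

# Candidate links as (subject, relation, object); the last one is deliberately wrong.
LINKS = [
    ("hepatocyte", "part_of", "liver"),
    ("liver development", "part_of", "liver"),
]

def crossing_links(links, category=CATEGORY):
    """Return is_a/part_of links whose two ends sit in different top-level categories."""
    return [
        (s, rel, o)
        for s, rel, o in links
        if rel in {"is_a", "part_of"} and category[s] != category[o]
    ]

print(crossing_links(LINKS))
```

A process such as liver development relates to the liver through a different relation, not through part_of.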

oenocyte hepatocyte liver fat body glycogen glucose hepatic artery bile insulin obesity carbohydrate metabolism liver development increased circulating glucose level CHEBI FBbt CL PRO MA (mouse)(fly) FMA (adult human) MP (mammal phenotype) GO (biological process) oenocyte differentiation hepatoma DO Objects | Qualities etc | Processes

The OBO Foundry can help with modular ontology design Biology is complex –So our ontologies will be complex –Multiple purposes –Multiple means of classifying Separate out different aspects –Modular approach –Avoid multiple inheritance (>1 is_a parent) Don’t over-use is_a Don’t cross aspects with is_a Make complex descriptions from simpler parts –Polyhierarchies arise from composition

Cysteine biosynthesis (trimmed) GO Tangled polyhierarchy

Cysteine biosynthesis (trimmed) Process axis

Cysteine biosynthesis (trimmed) Chemical structure axis

Cysteine biosynthesis (trimmed) ChEBI (trimmed)

Cysteine biosynthesis (trimmed) ChEBI (trimmed) We can do more than simply link terms: Cross-products (aka logical definitions, Computable genus- differentia definitions)

Cysteine biosynthesis (trimmed) ChEBI (trimmed) Cysteine biosynthesis GO: = a biosynthetic process GO: [genus] that results_in_creation_of cysteine CHEBI:13536 [differentia]

Cysteine biosynthetic process = biosynthetic process that results_in_change_to cysteine

Let the computer do the work.. Given cross-products, A reasoner can add all links Underlying representation is normalized
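A toy version of "a reasoner can add all links" is sketched below. It assumes every cross-product has exactly one genus and one differentia, and it treats cysteine as a direct is_a child of amino acid (a trimmed, illustrative hierarchy). Real reasoners handle far richer definitions; the names `IS_A`, `DEFS`, `subsumed` and `inferred_is_a` are hypothetical.

```python
# Asserted is_a links in the simpler source ontology (trimmed, illustrative).
IS_A = {"cysteine": {"amino acid"}}

# Cross-product definitions: term -> (genus, relation, filler).
DEFS = {
    "cysteine biosynthesis": ("biosynthetic process", "results_in_creation_of", "cysteine"),
    "amino acid biosynthesis": ("biosynthetic process", "results_in_creation_of", "amino acid"),
}

def subsumed(a, b):
    """True if a is b, or a reaches b through asserted is_a links."""
    if a == b:
        return True
    return any(subsumed(parent, b) for parent in IS_A.get(a, ()))

def inferred_is_a(defs=DEFS):
    """Infer child is_a parent when genus and filler are both subsumed and relations match."""
    links = set()
    for child, (c_genus, c_rel, c_filler) in defs.items():
        for parent, (p_genus, p_rel, p_filler) in defs.items():
            if (child != parent and c_rel == p_rel
                    and subsumed(c_genus, p_genus) and subsumed(c_filler, p_filler)):
                links.add((child, parent))
    return links

print(inferred_is_a())
```

The process hierarchy is never edited by hand here: the link between the two biosynthesis terms is derived from the chemical hierarchy, so the asserted representation stays normalized.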

Example of is_a-overloading: OBO Cell Ontology (current) CL

Try not to assert too many is_a parents X CL

Reuse existing ontologies Non-is_a relation X ? CL GO Has function

How CL can use other OBO ontologies GO Cellular component –Mononuclear phagocyte –B cell (expresses immunoglobulin complex) GO Biological process –Photosynthetic cell PATO Qualities –Spiny neuron CHEBI Chemical entities –X secreting cell Anatomy Ontologies –CNS neuron Molecular function, PRO - CD4 positive cell

How CL is used by other ontologies
Ontology | Example | Genus | Differentia
GO-BP | T cell differentiation | Cell differentiation | Results_in_acquisition_of_features_of T cell
GO-CC | Germ cell nucleus | Nucleus | Part_of germ cell
MP | Abnormal macrophage morphology | Abnormal morphology | Inheres_in macrophage
ZFA (zebrafish) | erythrocyte | | In_organism Danio, Has_part nucleus
OBI | | |
DO (disease) | | |
Ontology | Example | Relationship
Fly anatomy | R8 photoreceptor cell | Part_of ommatidium

Results Biological process x CL –http://wiki.geneontology.org/index.php?XP:biological_process_xp_cell –Uncovered inconsistencies between GO and CL –Oenocyte differentiation is_a columnar/cuboidal epithelial cell differentiation MP x CL –http://wiki.geneontology.org/index.php/XP:mammalian_phenotype_xp –Resulted in various fixes to MP

OBD: Ontology Annotation Database

Summary The cell ontology is a representation of the types of cell that exist The OBO Foundry provides –Principles –A framework for connecting ontologies There are many points of coordination between CL and other OBO ontologies CL could benefit from the gradual introduction of a modular approach

The Gene Ontology; and beyond Curation of genes and gene products –Molecular function –Biological process –Cellular component GO Multiple databases using the same ontology

The Gene Ontology; and beyond Curation of genes and gene products –Molecular function –Biological process –Cellular component What about curation of other data types? –Expression, transcriptomics –Genetics, phenotypes and disease –Many others.. OBO –Open Bio-Ontologies –Arose partly in response to requirements outside scope of GO GO

Islands of biological data GO Anatomy ontologies Phenotype ontologies

Connecting the islands

Bada et al : GO to ChEBI Amino acid cross-products in GO:

GO approach is retrospective –Text based approaches to ‘decompose’ terms Obol Bada/Hunter –Born of necessity OBO did not exist when GO started –Hard work New ontologies should take the prospective approach –Separate out aspects from the outset –No heuristic parsing necessary

Prospective approach: Sequence Ontology Separate hierarchies created from the outset - cross-products made from the beginning

OBI: Ontology for Biomedical Investigations Successor to MGED/FuGO Represents the realm of investigations –Biomaterials –Equipment –Protocols –Data transformations Makes maximal use of OBO –PATO: –ChEBI: Primary representation language is OWL –Uses OWL translations at

Social Insect Behavior Ontology 4 distinct hierarchies –Anatomical entity –Behavior –Chemical entity –Species Links –derives_from, between chemical and anatomical entity Future plans –Submit chemical terms to ChEBI –Upper level behavior ontology?

Anatomy GO is relevant for all kingdoms of life Development of anatomical ontologies has been less coordinated –Cell & subcellular: one ontology applicable to all –Gross Anatomy: multiple ontologies Vertebrate: –MA + EMAP: Mouse –FMA: Human (adult) –EHDA: Human –ZFA: Zebrafish –TAO: teleost anatomy –XAO: Xenopus Invertebrate: –FBbt: Drosophila anatomy –Tick anatomy –Mosquito anatomy

Anatomy: Ongoing work CARO –Upper level shared anatomical ontology –Very general terms Teleost anatomy ontology –Broader than zebrafish anatomy ontology –Will include homology links Linking cells to gross anatomical entity –Purkinje cell part_of cerebellum –Spans ontologies (CL + ssAO) BIRNLex Stages and development poster talk

Using multiple ontologies: Pre vs post composition Complex descriptions (aka cross-products) can be composed from 2 or more terms –By ontology editors (pre) –By curators (post) Example: –Liver hyperplasia Precomposed phenotype ontology – MP: “liver hyperplasia” increased size of liver due to increased hepatocyte cell number Post-composition at time of genotype curation –PATO: “hyperplastic” –MA: “liver” Which strategy to choose?
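Pre- and post-composed descriptions become comparable once the pre-composed term carries a cross-product definition. The sketch below assumes a single (quality, entity) pair per description; the `XP_DEFS` table and `normalize` helper are hypothetical names.

```python
# Hypothetical cross-product definition for one pre-composed MP term:
# term label -> (PATO quality, MA entity).
XP_DEFS = {"liver hyperplasia": ("hyperplastic", "liver")}

def normalize(annotation):
    """Expand a pre-composed term to its (quality, entity) pair; pass pairs through."""
    if isinstance(annotation, str):
        return XP_DEFS[annotation]
    return tuple(annotation)

pre_composed = "liver hyperplasia"          # chosen by an ontology editor
post_composed = ("hyperplastic", "liver")   # assembled by a curator

print(normalize(pre_composed) == normalize(post_composed))
```

Without the entry in `XP_DEFS`, the two annotations would look unrelated to any software comparing them.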

Either strategy can be used Or mixed and matched –Caveat: Pre-composed terms must have computable definitions (cross-products) Currently created retrospectively Current progress : –MP (Mammalian Phenotype): 4136/5760 xp defs, partially vetted Caveat: species-specificity –WormPhenotype: 350/1569 xp defs –PlantTrait: 340/765 xp defs, partially vetted

Other ontologies Envo + GAZ –Environmental ontology and gazetteer –Habitats: Host (anatomy) Geographical features (eg hydrothermal vents) –Qualities, chemical entities BIRNLex Protein Ontology –Links to/from GO Complexes Functions of ancestral proteins

Envo-based annotation in Phenote

Technical consequences of modular approach Dependencies –Technical issues Dependence on network? Formats - converters –Social & management issues –Change and versioning Managing dependencies –Stable URLs for downloading ontologies in obo or owl –OBO Identifier policy

Conclusions Be modular –Distinct hierarchies –Avoid is_a overloading –Link to existing ontologies Rewards –Standards –Increases value of curated data –Reduces duplication of effort and maximises curation effort –Ontologies are long term infrastructure It’s worth getting them right

Learning more –National Center for Biomedical Ontology –Browse and search OBO –Coming soon: inter-ontology links –Principles and recommendations –Participation Mailing lists Trackers

Restructuring Cell.obo

OBO Cell Ontology Current version –Overloading of is_a hierarchy –Difficult to maintain –Leads to “true path” violations Refactoring –Replace is_a links with has_function –Keep main axis structure-based (but not religiously so)

For every term immediately under cell-by-function, we made a new function term propagation of genome to circulate to secrete to metabolise to contract Electrical absorption Barrier Motility Structural to accumulate stuff signaling (mitogenic) to die Defense Transport to photosynthesize to support Valve to fix nitrogen Also create grouping terms

Replaced is_a links to cell-by-function terms with has_function links to corresponding function terms

What do we do about the old cell-by-function terms? We can eliminate them.. OR we can support them, but infer the ‘tangled DAG’ Requires xp defs: –Nitrogen fixing cell = cell THAT has_function nitrogen-fixing
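Inferring the 'tangled DAG' from has_function links might look like the sketch below. The cell types and their function assignments are illustrative assumptions, not content from cell.obo, and each grouping term is assumed to be defined by exactly one function.

```python
# Asserted has_function links (illustrative assumptions).
HAS_FUNCTION = {
    "heterocyst": {"nitrogen-fixing"},
    "muscle cell": {"to contract"},
}

# Cross-product definitions: grouping term = cell THAT has_function <function>.
XP_DEFS = {
    "nitrogen fixing cell": "nitrogen-fixing",
    "contractile cell": "to contract",
}

def inferred_function_parents(cell):
    """Return the cell-by-function terms that `cell` falls under, given its functions."""
    functions = HAS_FUNCTION.get(cell, set())
    return sorted(name for name, fn in XP_DEFS.items() if fn in functions)

print(inferred_function_parents("heterocyst"))
```

The cell-by-function parents are computed on demand, so editors only maintain the has_function links.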

Future work / ongoing issues: Redundancy between cell functions & GO biological process? Cell-by-lineage

Synchronizing ssAOs and CL Fly_anat, zfa, plant_anat all represent cell types –Part_of links from cells to gross anatomy E.g. purkinje_cell part_of cerebellum Methodology –Xrefs from ssAOs to CL IDs –Treat as ss subtypes –Use reasoner to stay in sync – es-specific_anatomy_ontologies_with_CLhttp:// es-specific_anatomy_ontologies_with_CL –Examples:
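A rough sketch of the synchronisation check: map species-specific terms to CL through xrefs, then flag is_a links that CL does not mirror. All identifiers below are placeholders, not real ZFA or CL IDs, and only directly asserted links are compared, whereas a reasoner would also consider entailed ones.

```python
# Placeholder identifiers; real ZFA and CL IDs are not reproduced here.
XREF = {"ZFA:a": "CL:a", "ZFA:b": "CL:b", "ZFA:c": "CL:c"}

SSAO_IS_A = {("ZFA:a", "ZFA:b"), ("ZFA:a", "ZFA:c")}  # links in the species-specific ontology
CL_IS_A = {("CL:a", "CL:b")}                          # links asserted in CL

def out_of_sync(ssao_is_a, cl_is_a, xref):
    """Return ssAO is_a links whose xref-mapped counterparts are not asserted in CL."""
    return sorted(
        (child, parent)
        for child, parent in ssao_is_a
        if child in xref and parent in xref
        and (xref[child], xref[parent]) not in cl_is_a
    )

print(out_of_sync(SSAO_IS_A, CL_IS_A, XREF))
```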

Transformation_of Class-level relation between continuant types. Transitive. Relation between two classes, in which instances retain their identity yet change their classification by virtue of some kind of transformation. Formally: C transformation_of C' if and only if given any c and any t, if c instantiates C at time t, then for some t', c instantiates C' at t' and t' is earlier than t, and there is no t2 such that c instantiates C at t2 and c instantiates C' at t2
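The formal definition can be tested against recorded instance histories. The sketch below assumes each instance has a timeline mapping time points to the set of classes it instantiates; the `HISTORY` data is invented, and the check only covers the recorded instances rather than all possible ones.

```python
# Hypothetical timelines: instance -> {time point: set of classes instantiated}.
HISTORY = {
    "cell_1": {1: {"reticulocyte"}, 2: {"erythrocyte"}},
    "cell_2": {1: {"reticulocyte"}, 2: {"reticulocyte"}, 3: {"erythrocyte"}},
}

def transformation_of(c, c_prime, history=HISTORY):
    """Check 'c transformation_of c_prime' against every recorded timeline."""
    for timeline in history.values():
        for t, classes in timeline.items():
            if c not in classes:
                continue
            # The instance must never be both classes at the same time.
            if c_prime in classes:
                return False
            # It must have instantiated c_prime at some earlier time.
            if not any(t2 < t and c_prime in cs for t2, cs in timeline.items()):
                return False
    return True

print(transformation_of("erythrocyte", "reticulocyte"))
print(transformation_of("reticulocyte", "erythrocyte"))
```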

Derives_from Holds between continuants. Transitive. Derivation on the instance level (*derives_from*) holds between distinct material continuants when one succeeds the other across a temporal divide in such a way that at least a biologically significant portion of the matter of the earlier continuant is inherited by the later. We say that one class C derives_from class C' if instances of C are connected to instances of C' via some chain of instance-level derivation relations. Examples: –osteocyte derives_from osteoblast