What kinds of things exist? What are the relationships between these things?
[Figure: the terms eye, ommatidium, sense organ and eye disc, connected by the relations is_a, part_of and develops_from]
A biological ontology is a machine-interpretable representation of some aspect of biological reality.
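The "things plus relationships" idea can be sketched as a set of (subject, relation, object) triples that a program can traverse. This is a minimal illustration, not any published ontology: the edges are one plausible reading of the figure, and the edge ("sense organ", "is_a", "organ") is a hypothetical addition used only to show transitive traversal. The function name `reachable` is also an assumption.

```python
# Toy ontology stored as (subject, relation, object) triples.
# The edges are an illustrative reading of the figure; the
# ("sense organ", "is_a", "organ") edge is hypothetical and exists
# only to demonstrate transitive traversal.
TRIPLES = [
    ("eye", "is_a", "sense organ"),
    ("sense organ", "is_a", "organ"),
    ("ommatidium", "part_of", "eye"),
    ("eye", "develops_from", "eye disc"),
]


def reachable(term, relation, triples=TRIPLES):
    """Follow `relation` edges from `term` transitively and return the terms reached.

    Treating the relation as transitive is appropriate for is_a and part_of.
    """
    found = []
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in triples:
            if subj == current and rel == relation and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


print(reachable("eye", "is_a"))            # ['sense organ', 'organ']
print(reachable("ommatidium", "part_of"))  # ['eye']
```

Because the relations are explicit data rather than free text, a machine can answer "what is an eye a kind of?" without any domain-specific code.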
Following basic rules helps make better ontologies
- Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking)
- Unintuitive rules for classification lead to entry errors (problematic links)
- Basic rules facilitate training of curators
- They overcome obstacles to alignment with other ontology and terminology systems
- They enhance harvesting of content through automatic reasoning systems
Animal disease models
Humans: mutant gene → mutant or missing protein → mutant phenotype (disease)
Animal models: mutant gene → mutant or missing protein → mutant phenotype (disease model)
[Figure panels: SHH -/+ and SHH -/- (human); shh -/+ and shh -/- (zebrafish)]
Phenotype (clinical sign) = entity + attribute
P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied
Attributes (PATO): hypoteloric, hypoplastic, hypertrophied
Entities (ZFIN): eye, midface, kidney
Phenotype (clinical sign) = entity + attribute
Entity: anatomical ontology; cell & tissue ontology; developmental ontology; Gene Ontology (biological process, molecular function, cellular component)
Attribute: PATO (phenotype and trait ontology)
Phenotype (clinical sign) = entity + attribute
P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied
Syndrome (disease) = P1 + P2 + P3 = holoprosencephaly
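The entity + attribute decomposition can be sketched as a small immutable record, with a syndrome treated as a set of such records. This is an illustration under assumptions: plain labels stand in for ontology class IDs, the names `Phenotype` and `shared_phenotypes` are hypothetical, and the model profile in the usage line is invented for the example.

```python
from typing import NamedTuple


class Phenotype(NamedTuple):
    """An EA phenotype: an entity term (e.g. from ZFIN anatomy) plus a PATO attribute term."""
    entity: str
    attribute: str


P1 = Phenotype("eye", "hypoteloric")
P2 = Phenotype("midface", "hypoplastic")
P3 = Phenotype("kidney", "hypertrophied")

# A syndrome modelled as a set of EA phenotypes: Syndrome = P1 + P2 + P3.
HOLOPROSENCEPHALY = frozenset({P1, P2, P3})


def shared_phenotypes(syndrome, observed):
    """Return the EA phenotypes that an observed profile shares with a syndrome."""
    return syndrome & frozenset(observed)


# Hypothetical model-organism profile, for illustration only.
model_profile = [P1, Phenotype("fin", "irregular shape")]
print(shared_phenotypes(HOLOPROSENCEPHALY, model_profile))
# frozenset({Phenotype(entity='eye', attribute='hypoteloric')})
```

Because both species are described with the same attribute vocabulary, comparing a human syndrome with an animal model reduces to comparing sets of (entity, attribute) pairs, provided the entity terms can be mapped across species.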
[Figure panels: human holoprosencephaly; zebrafish shh; zebrafish oep]
EA model
entity | attribute | value
fin | shape | irregular shape
eye | color hue | blue
mesenchyme | relative thickness | thin
brain | structure | fused
retinal cells | relative orientation | disoriented
Proposed schema
Association = Genotype Phenotype Environment Assay
Phenotype = Stage* Entity Attribute Value
Entity = OBOClassID
Attribute = PATOVersion2ClassID
Monadic and relational attributes
- Monadic: the quality/attribute inheres in a single entity
- Relational: the quality/attribute inheres in two or more entities, e.g. the sensitivity of an organism to a kind of drug, or the sensitivity of an eye to a wavelength of light
- Relational attributes can be turned into cross-product monadic attributes, e.g. sensitivityToRedLight
- It is better to use relational attributes, because this avoids redundancy with existing ontologies
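The redundancy argument can be made concrete with a small sketch. Assumptions: the `Quality` class, the `cross_product_names` helper and the labels "red light" and "blue light" are hypothetical. The relational form reuses one attribute plus an entity that another ontology already defines, whereas the cross-product form needs a new pre-composed attribute for every target.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Quality:
    """A quality inhering in `bearer`; `towards` is set only for relational qualities."""
    bearer: str
    attribute: str
    towards: Optional[str] = None  # the second entity, e.g. a drug or a kind of light

    def is_relational(self) -> bool:
        return self.towards is not None


def cross_product_names(attribute: str, targets: List[str]) -> List[str]:
    """Pre-composed monadic attribute names a cross-product approach would need, one per target."""
    return [attribute + "To" + t.title().replace(" ", "") for t in targets]


fused_brain = Quality("brain", "fused")                         # monadic
red_sensitive_eye = Quality("eye", "sensitivity", "red light")  # relational

print(cross_product_names("sensitivity", ["red light", "blue light"]))
# ['sensitivityToRedLight', 'sensitivityToBlueLight']
```

Each new target adds one more cross-product term to the attribute ontology, duplicating distinctions that the ontology of drugs or wavelengths already makes; the relational form adds nothing.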
Incorporating relational attributes
Association = Genotype Phenotype Environment Assay
Phenotype = Stage* Entity Attribute Entity*
Entity = OBOClassID
Attribute = PATOVersion2ClassID
Example data record: Phenotype = “organism” sensitiveTo “puromycin”
Measurable attributes
- Some attributes are inexact and implicitly relative to a wild-type or normal attribute: relatively short, relatively long, relatively reduced
- This is easier than explicitly representing: this tail length shorter-than ‘canonical mouse’ wild-type tail length
- Some attributes are determinable: use a measure function with unit, value and {time}, e.g. this tail has length L and measure(L, cm) = 2
- Keep measurements separate from (but linked to) the attribute ontology
Incorporating measurements
Association = Genotype Phenotype Environment Assay
Phenotype = Stage* Entity Attribute Entity* Measurement*
Measurement = Unit Value (Time)
Entity = OBOClassID
Attribute = PATOVersion2ClassID
Example data record: Phenotype = “gut” “acidic”; Measurement = “pH” 5
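The proposed schema can be sketched as Python dataclasses. This is one possible encoding under stated assumptions: `*` is read as "zero or more" and mapped to a list, `(Time)` is read as optional, and plain labels stand in for OBO and PATO class IDs. The class and field names are hypothetical; only the structure comes from the schema above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    unit: str                   # e.g. "pH" or "cm"
    value: float
    time: Optional[str] = None  # optional: Measurement = Unit Value (Time)


@dataclass
class Phenotype:
    entity: str     # an OBO class ID in the proposal; a plain label here
    attribute: str  # a PATO version 2 class ID in the proposal; a plain label here
    stages: List[str] = field(default_factory=list)                # Stage*
    related_entities: List[str] = field(default_factory=list)      # Entity* (relational attributes)
    measurements: List[Measurement] = field(default_factory=list)  # Measurement*


@dataclass
class Association:
    genotype: str
    phenotype: Phenotype
    environment: str
    assay: str


# The example data records, with the measurement kept separate from the attribute.
sensitive = Phenotype("organism", "sensitiveTo", related_entities=["puromycin"])
acidic_gut = Phenotype("gut", "acidic", measurements=[Measurement("pH", 5)])
# measure(L, cm) = 2 for the tail-length example
tail_length = Phenotype("tail", "length", measurements=[Measurement("cm", 2)])
```

Keeping `Measurement` as its own record means the attribute term ("acidic", "length") stays in the attribute ontology while the numeric value and unit are attached only where an assay actually produced one.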
Composite phenotype classes
- The Mammalian Phenotype ontology has composite phenotype classes, e.g. “reduced B cell number”
- Compose at annotation time or at ontology curation time? This is a false dichotomy
- Core 2 will help map between composite-class-based annotation and EA annotation
Interpreting annotations
- Annotations are data records: they typically use class IDs but implicitly refer to instances
- How do we map an annotation to instances?
- This is important for using annotations computationally
Interpreting annotations (I)
What does an EA (or EAV) annotation mean?
Annotation: Genotype=“FBal00123” E=“brain” A=“fused”
Presumed implied meaning: this organism has_part x, where x instance_of “brain” and x has_quality “fused”
In natural language: “this organism has a fused brain”
This reading carries various built-in assumptions.
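The presumed mapping from a class-level annotation to instance-level statements can be sketched as a function that returns triples. Assumptions: the function name, the organism identifier "fly1" and the scheme for naming the implied part instance are hypothetical; only the three statements come from the reading above.

```python
def naive_interpretation(organism, entity, attribute):
    """Read an EA annotation the 'annotation I' way, as instance-level triples:
    organism has_part x; x instance_of entity; x has_quality attribute."""
    x = f"{organism}:{entity}"  # hypothetical identifier for the implied part instance
    return [
        (organism, "has_part", x),
        (x, "instance_of", entity),
        (x, "has_quality", attribute),
    ]


for triple in naive_interpretation("fly1", "brain", "fused"):
    print(triple)
# ('fly1', 'has_part', 'fly1:brain')
# ('fly1:brain', 'instance_of', 'brain')
# ('fly1:brain', 'has_quality', 'fused')
```

Applied to E=“wing” A=“absent”, the same function would assert that a wing instance exists, which is exactly the problem raised for annotation II.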
Interpreting annotations (II)
What does this mean?
Annotation: Genotype=“FBal00123” E=“wing” A=“absent”
Using the same mapping as annotation I: fly98 has_part x, where x instance_of “wing” and x has_quality “absent”
In natural language: this fly has a wing which is not there!
What we really intend: NOT (this organism has_part x, where x instance_of “wing”)
Interpreting annotations (II)
What does this mean?
Annotation: Genotype=“FBal00123” E=“wing” A=“absent”
Using the same mapping as annotation I: this organism has_part x, where x instance_of “wing” and x has_quality “absent”
In natural language: this fly has a wing which is not there!
What we really intend: this organism has_quality “wingless”, where “wingless” = the property of having count(has_part “wing”) = 0
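The intended "wingless" reading puts the quality on the organism and defines it by a count of parts. A minimal sketch follows, with loudly stated assumptions: the instance data are triples, the function names are hypothetical, the eye instance of fly98 is invented for the example, and counting asserted parts is a closed-world assumption. Standard OWL reasoners make an open-world assumption, so they cannot conclude a count of zero merely because no wing was asserted.

```python
def part_count(triples, organism, entity):
    """Count parts of `organism` asserted to be instances of `entity`."""
    parts = {o for s, r, o in triples if s == organism and r == "has_part"}
    return sum(1 for s, r, o in triples
               if s in parts and r == "instance_of" and o == entity)


def lacks_part(triples, organism, entity):
    """'wingless'-style quality of the whole organism: count(has_part entity) == 0.
    Closed-world assumption: only asserted parts are counted."""
    return part_count(triples, organism, entity) == 0


# Hypothetical instance data for fly98: one asserted eye, no wing.
fly98 = [
    ("fly98", "has_part", "fly98:eye"),
    ("fly98:eye", "instance_of", "eye"),
]
print(lacks_part(fly98, "fly98", "wing"))  # True
print(lacks_part(fly98, "fly98", "eye"))   # False
```

No wing instance is ever created, so nothing in the data claims that a non-existent wing has a quality.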
Are our computational representations intended to capture linguistic statements or reality?
Does this matter?
- Logical reasoners will compute incorrect results unless explicitly provided with specific rules for certain attributes such as “absent”
What are the consequences?
- Basic search will be fine, e.g. “find all wing phenotypes”
- But computers will not be able to reason correctly
Interpreting annotations (III)
What does this mean?
Annotation: E=“digit” A=“supernumerary”
Using the same interpretation as annotation I: this organism has_part x, where x instance_of “digit” and x has_quality “supernumerary”
In natural language: this organism has a particular finger which is supernumerary
What we really intend: this person has_quality “supernumerary finger”, where “supernumerary finger” = the property of having count(has_part “digit”) > the wild-type count
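The same counting idea covers "supernumerary": the quality belongs to the whole organism and compares a part count with a wild-type count. In this sketch the function name and the part lists are hypothetical, parts are given simply as a list of entity-type labels, and the wild-type value of 20 digits (fingers plus toes) for a human is supplied by the caller rather than looked up from any ontology.

```python
from collections import Counter


def is_supernumerary(part_types, entity, wild_type_count):
    """'Supernumerary <entity>' as a quality of the whole organism:
    count(has_part entity) > wild-type count."""
    return Counter(part_types)[entity] > wild_type_count


WILD_TYPE_HUMAN_DIGITS = 20  # fingers plus toes

polydactyl = ["digit"] * 21 + ["eye"] * 2  # hypothetical part inventory
typical = ["digit"] * 20 + ["eye"] * 2     # hypothetical part inventory
print(is_supernumerary(polydactyl, "digit", WILD_TYPE_HUMAN_DIGITS))  # True
print(is_supernumerary(typical, "digit", WILD_TYPE_HUMAN_DIGITS))     # False
```

Note that no individual digit is marked as "the supernumerary one"; the comparison is only meaningful at the level of the organism.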
Interpreting annotations (IV)
What does this mean?
Annotation: Gt=“mp001” E=“brown fat cell” A=“increased quantity”
Using the same mapping as annotation I: this organism has_part x, where x instance_of “brown fat cell” and x has_quality “increased quantity”
In natural language: this organism has a particular brown fat cell which is increased in quantity
What we really intend: this organism has_part population_of(“brown fat cell”), which has_quality increased size
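The intended reading introduces a population as the bearer of the quality. A minimal sketch, assuming a population is represented by its cell type and a size, and that "increased" is judged against a reference (e.g. wild-type) population of the same cell type. The `Population` class, the `increased_size` function and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Population:
    """A population_of(cell_type) instance; the quality inheres in the population, not in any one cell."""
    cell_type: str
    size: int


def increased_size(population, reference):
    """True when the population is larger than a reference population of the same cell type."""
    if population.cell_type != reference.cell_type:
        raise ValueError("populations must be of the same cell type")
    return population.size > reference.size


mutant = Population("brown fat cell", 150)     # hypothetical count
wild_type = Population("brown fat cell", 100)  # hypothetical count
print(increased_size(mutant, wild_type))  # True
```

The guard against comparing different cell types reflects that "increased quantity" is only meaningful relative to a population of the same kind.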
Other use cases
- spermatocyte devoid of asters
- homeotic transformations
- increased distance between wing veins
- some vs all
Alternate perspectives
- process vs state (regulatory processes): acidification of midgut has_quality reduced rate; midgut has_quality low acidity
- development vs behavior: wing development has_quality abnormal; flight has_quality intermittent
- granularity (scale): chemical vs molecular vs cell vs tissue vs anatomical part
Summary
- Define attributes in terms of instances
- Evaluate the proposed new schema: the measurement proposal and the relational attribute proposal
- Complexity trade-off: create a library of use cases
- Core 2 will create tools to present a user-friendly layer
- Alternate-perspective annotations are useful
Before: domain knowledge is embedded in the db schema
[Figure: separate Gene, RNA, Exon and Protein tables]
After: domain knowledge is embedded in the ontology
[Figure: a single feature table]
An ontology-driven db schema is less expensive to maintain
- The logical description and the physical database description of the biology are developed independently
- Therefore new biological knowledge will only require ontology changes (e.g. new terms) and GUI changes (e.g. display)
- No schema changes, no query changes, no middleware changes
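The contrast between per-type tables and a single feature table can be sketched with Python's built-in `sqlite3` module. Assumptions: the table and column names, the helper functions and the rows (including the "ncRNA" type added "later") are hypothetical. The point shown is that the biological kind of each row lives in a `type_term` column holding an ontology term, so a new kind of feature needs new data but no new table and no new query; each `uid` acts as a proxy for a biological instance.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # Generic schema: the kind of each feature is an ontology term in `type_term`,
    # not the name of a dedicated Gene/RNA/Exon/Protein table.
    conn.execute(
        "CREATE TABLE feature (uid TEXT PRIMARY KEY, name TEXT NOT NULL, type_term TEXT NOT NULL)"
    )
    return conn


def add_feature(conn, uid, name, type_term):
    conn.execute("INSERT INTO feature VALUES (?, ?, ?)", (uid, name, type_term))


def features_of_type(conn, type_term):
    """One query serves every feature type, including types added later."""
    rows = conn.execute(
        "SELECT uid FROM feature WHERE type_term = ? ORDER BY uid", (type_term,)
    )
    return [uid for (uid,) in rows]


conn = make_db()
add_feature(conn, "F1", "geneA", "gene")
add_feature(conn, "F2", "geneA-exon1", "exon")
# A newly recognised kind of feature: only a new term, no schema change.
add_feature(conn, "F3", "rnaX", "ncRNA")
print(features_of_type(conn, "exon"))   # ['F2']
print(features_of_type(conn, "ncRNA"))  # ['F3']
```

In a fuller design the `type_term` values would be validated against the ontology, and subtype queries would follow is_a links; both are left out to keep the sketch short.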
Database: UIDs serving as proxies for instances
- Step 1: Build an ontology that reflects reality
- Step 2: Data capture
- Step 3: Classify data using the ontology
Ontologies must adapt over time
Getting it right
- It is impossible to get it right the 1st (or 2nd, or 3rd, …) time
- What we know about biology is continually growing
- This “standard” requires versioning
Improve, collaborate and learn
Image Ontologies: From RadLex to RadiO (Matthew Fielding)
- A unified language for radiology information sources (e.g. teaching files, research data, and radiology reports)
- Will describe all the salient aspects of an imaging examination (e.g. modality, technique, visual features, anatomy, and pathology)
- Will emphasize adoption of, or linkage to, established terminology and standards when possible, such as the ACR Index, SNOMED, the Unified Medical Language System (UMLS), the Fleischner Society Glossaries, and DICOM
- Will be used to organize and retrieve radiology images
Image Ontologies: Experibase (C. Forbes Dewey)
- A common technology that will capture data from all of the major experimental systems generating biological data
- Being implemented for gel electrophoresis, microarrays, fluorescence-activated cell sorting, mass spectrometry and optical microscopy
- Coordinating with the Interoperable Informatics Infrastructure Consortium (I3C)
- Will be used to organize and interrogate these experimental data
Image Ontologies (Bill Lorensen)
Image Ontologies: Image Ontology Requirements (William Bug)
- Linking databases created at multiple centers concerned with human disease and associated animal models
- The BIRN Ontology Task Force (OTF) reviews how the different groups in its audience interpret ontological references: anatomists, clinicians, genomics researchers, pathologists, diagnosticians, and neurologists
- Use existing ontologies, tools, and formalisms wherever possible, and extend them only as necessary
- Any ontology work performed by BIRN should be aligned with other efforts and provided back to the maintainers
- Develop a set of ontologies that are approved for use, and a set of policies and procedures for extensions
Image Ontologies: On Reasoning with Images (Louis Goldberg)
- What approaches are available for spatial, temporal, and spatio-temporal representation and reasoning in computer applications?
- What is the expressive power of those formalisms?
- Formalizations for commonsense reasoning about space and time
- Formalisms for the representation of vagueness