Download presentation
Presentation is loading. Please wait.
Published byBernice Booker Modified over 9 years ago
1
eye what kinds of things exist? what are the relationships between these things? ommatidium sense organeye disc is_a part_of develops from A biological ontology is: A machine interpretable representation of some aspect of biological reality
2
Three practical considerations Acknowledge the substantive issues of sociology and operations to build community Follow the principles of ontology construction hygiene Annotation is the rate-limiting step. It can be facilitated by improved ontologies
3
Drosophila Meeting the goal: Drawing inferences ABCD SP:1234SP:8723SP:19345 ? PMID:5555PMID:4444 B SP:48392 BC SP:48291SP:38921 Direct evidence Indirect evidence PMID:8976 PMID:9550 PMID:3924 Human Xenopus
4
Overview Following basic rules helps make better ontologies We will work through the principles- based treatment of relations in ontologies, to show how ontologies can become more reliable and more powerful
5
Why do we need rules for good ontology? Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error- checking) Unintuitive rules for classification lead to entry errors (problematic links) Facilitate training of curators Overcome obstacles to alignment with other ontology and terminology systems Enhance harvesting of content through automatic reasoning systems
6
Tactile sense Taction Tactition ? The Challenge of Univocity: People call the same thing by different names
7
Tactile sense Taction Tactition perception of touch ; GO:0050975 Univocity: GO uses 1 term and many characterized synonyms
8
= bud initiation The Challenge of Univocity: People use the same words to describe different things
9
Bud initiation? How is a computer to know?
10
= bud initiation sensu Dentists = bud initiation sensu yeasties = bud initiation sensu agronomists Univocity: GO adds “sensu” descriptors to discriminate among organisms
11
The Importance of synonyms for utility: How do we represent the function of tRNA? Biologically, what does the tRNA do? Identifies the codon and inserts the amino acid in the growing polypeptide Molecular_function Triplet_codon amino acid adaptor activity GO Definition: Mediates the insertion of an amino acid at the correct point in the sequence of a nascent polypeptide chain during protein synthesis. Synonym: tRNA
12
The Challenge of Positivity Some organelles are membrane-bound. A centrosome is not a membrane bound organelle, but it still may be considered an organelle.
13
The Challenge of Positivity: Sometimes absence is a distinction in a Biologist’s mind non-membrane-bound organelle GO:0043228 membrane-bound organelle GO:0043227
14
Positivity Note the logical difference between “non-membrane-bound organelle” and “not a membrane-bound organelle” The latter includes everything that is not a membrane bound organelle!
15
The Challenge of Objectivity: Database users want to know if we don’t know anything (Exhaustiveness with respect to knowledge) We don’t know anything about a gene product with respect to these We don’t know anything about the ligand that binds this type of GPCR
16
Objectivity How can we use GO to annotate gene products when we know that we don’t have any information about them? Currently GO has terms in each ontology to describe unknown An alternative might be to annotate genes to root nodes and use an evidence code to describe that we have no data. Similar strategies could be used for things like receptors where the ligand is unknown.
17
Single Inheritance GO has a lot of is_a diamonds Some are due to incompleteness of the graph Some are due to a mixture of dissimilar classes within the graph at the same level
18
Is_a diamond in GO Process behavior locomotory behaviorlarval behavior larval locomotory behavior
19
Is_a diamond in GO Function enzyme regulator activity GTPase regulator activity enzyme activator activity GTPase activator acivity
20
Is_a diamond in GO Cellular Component organelle non-membrane bound organelle intracellular organelle non-membrane bound intracellular organelle
21
Technically the diamonds are correct, but could be eliminated locomotory behaviorlarval behavior GTPase regulator activity enzyme activator activity non-membrane bound organelle intracellular organelle What do these pairs have in common?
22
They are all differentiated from the parent term by a different factor locomotory behaviorlarval behavior GTPase regulator activity enzyme activator activity non-membrane bound organelle intracellular organelle Type of behavior vs. what is behaving What is regulated vs. type of regulator Type of organelle vs. location of organelle
23
Insert an intermediate grouping term behavior locomotory behavior larval behavior larval locomotory behavior behavior of a thing descriptive behavior
24
Why insert terms that no one would use? behavior locomotory behavior larval behaviorrhythmic behavioradult behavior By the structure of this graph, locomotory behavior has the same relationship to larval behavior as to rhythmic behavior
25
Why insert terms that no one would use? behavior locomotory behavior larval behaviorrhythmic behavioradult behavior But actually, locomotory behavior/rhythmic behavior and larval behavior/adult behavior group naturally Descriptive behavior Behavior of a thing This type of single step differentiation of terms between levels would allow us to use distances between nodes and levels to compare similarity.
26
GO Definitions A definition written by a biologist: necessary & sufficient conditions written definition (not computable) Graph structure: necessary conditions formal (computable)
27
Relationships and definitions The set of necessary conditions is determined by the graph This can be considered a partial definition Important considerations: Placement in the graph- selecting parents Appropriate relationships to different parents True path violation
28
The importance of relationships Cyclin dependent protein kinase Complex has a catalytic and a regulatory subunit How do we represent these activities (function) in the ontology? Do we need a new relationship type (regulates)? Catalytic activity protein kinase activity protein Ser/Thr kinase activity Cyclin dependent protein kinase activity Cyclin dependent protein kinase regulator activity Molecular_function Enzyme regulator activity Protein kinase regulator activity
29
Relationships provide the computable definitions The definition of a class lower down in the hierarchy is provided by specifying the parent of the class together with the relevant differentia. Differentia tells us what marks out instances of the defined class within the wider parent class.
30
Structured definitions contain both genus and differentiae Essence = Genus + Differentiae neuron cell differentiation = Genus: differentiation (processes whereby a relatively unspecialized cell acquires the specialized features of..) Differentiae: acquires features of a neuron
31
Alignment of the Two Ontologies will permit the generation of consistent and complete definitions id: CL:0000062 name: osteoblast def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629] is_a: CL:0000055 relationship: develops_from CL:0000008 relationship: develops_from CL:0000375 GO Cell type New Definition + = Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
32
Alignment of the Two Ontologies will permit the generation of consistent and complete definitions id: GO:0001649 name: osteoblast differentiation synonym: osteoblast cell differentiation genus: differentiation GO:0030154 (differentiation) differentium: acquires_features_of CL:0000062 (osteoblast) definition (text): Processes whereby a relatively unspecialized cell acquires the specialized features of an osteoblast, the mesodermal cell that gives rise to bone Formal definitions with necessary and sufficient conditions, in both human readable and computer readable forms
33
Basis in Reality GO is designed by a consortium As long as egos don’t get in the way, GO represents universals rather than concepts Large-scale developments of the GO are a result of compromise Gene Annotators have a large say in GO content Annotators are experts in their fields Annotators constantly read the scientific literature But, since GO is representing a science, GO actually represents paradigms. Therefore, it is essential that GO is able to change!
34
Classes and Instances When should we create a new class as opposed to multiple annotations? When the the biology represents a universal principal. Receptor signaling protein tyrosine kinase activity does not represent receptor signaling protein activity and tyrosine kinase activity independently.
35
Consequences of inconsistencies Hard to synchronize manually can be automated currently requires text mining Inconsistent user-search results Problems likely to resurface with other ontologies embedded in the GO chemical, protein multiple species-specific anatomical ontologies OBO ontologies should inform GO
36
Combinatorial annotation Allow combinatorial annotations in appropriate context phenotypic annotation eg “tail fin placement ventralized at stage pharyngula:prim25” combinatorial GO annotation eg “activation of MAPK in mesenchymal cell” New AmiGO will allow for these annotations
37
Animal models Mutant Gene Mutant or missing Protein Mutant Phenotype Animal disease models
38
HumansAnimal models Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Mutant Gene Mutant or missing Protein Mutant Phenotype (disease model) Animal disease models
39
HumansAnimal models Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Mutant Gene Mutant or missing Protein Mutant Phenotype (disease model) Animal disease models
40
HumansAnimal models Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Mutant Gene Mutant or missing Protein Mutant Phenotype (disease model) Animal disease models
41
SHH -/+ SHH -/- shh -/+ shh -/-
42
Phenotype (clinical sign) = entity + attribute
43
Phenotype (clinical sign) = entity + attribute P 1 = eye + hypoteloric
44
Phenotype (clinical sign) = entity + attribute P 1 = eye + hypoteloric P 2 = midface + hypoplastic
45
Phenotype (clinical sign) = entity + attribute P 1 = eye + hypoteloric P 2 = midface + hypoplastic P 3 = kidney + hypertrophied
46
Phenotype (clinical sign) = entity + attribute P 1 = eye + hypoteloric P 2 = midface + hypoplastic P 3 = kidney + hypertrophied PATO: hypoteloric hypoplastic hypertrophied ZFIN: eye midface kidney +
47
Phenotype (clinical sign) = entity + attribute Anatomical ontology Cell & tissue ontology Developmental ontology Gene ontology biological process molecular function cellular component + PATO (phenotype and trait ontology)
48
Phenotype (clinical sign) = entity + attribute P 1 = eye + hypoteloric P 2 = midface + hypoplastic P 3 = kidney + hypertrophied Syndrome = P 1 + P 2 + P 3 (disease) = holoprosencephaly
49
Human holo- prosencephaly Zebrafish shh Zebrafish oep
50
OMIM gene ZFIN gene FlyBase gene FlyBase mut pub ZFIN mut pub mouseratSNO MED OMIM disease LAMB1lamb1LanB151539 - FECHfechFerro- chelatase 25229 Protoporphyria, Erythropoietic GLI2gli2aci3884122 - SLC4A1slc4a1CG81777719 Renal Tubular Acidosis, RTADR MYO7Amyo7ack8459316 Deafness; DFNB2; DFNA11 ALAS2alas2Alas1714 Anemia, Sideroblastic, X-Linked KCNH2kcnh2sei27312 - MYH6myh6Mhc1663112 Cardiomyopathy, Familial Hypertrophic; CMH TP53tp53p5364331911 Breast Cancer ATP2A1atp2a1Ca-P60A326111 Brody Myopathy EYA1eya1eya251546 Branchiootorenal Dysplasia SOX10sox10Sox100B11744 Waardenburg-Shah Syndrome
51
CB i O Bioinformatics Goals 1.Apply the ontology Software toolkit for “annotation” applications 2.Manage the data Database and interface to store & view annotations 3.Investigate and compare Linking human diseases to genetic models 4.Maintain Ongoing reconciliation of ontologies with annotations
53
Ontology requirements Ontologies attribute ontology GO anatomical ontologies cell species-specific Ontology integration technical semantic
54
Review of proposed (2003) EAV model A phenotype is described using an Entity-Attribute-Value triple similiar “slot-value” frame-based systems Entities are drawn from various OBO ontologies Attributes and Values in one ontology - PATO
55
Separation of concerns Not phenotypes: Genotype Environment Assay, measurement systems Images Association = Genotype Phenotype Environment Assay Phenotype = Entity Attribute Value Entity = OBOClassID Attribute | Value = PATOClassID schema:
56
2003 Pilot study Trial of EAV model on small collection of genotypes FlyBase ZFIN Genes were non-orthologous New curations - in progress orthologous genes with clinical relevance Use the same data model and exchange format?
57
Example data records GenotypeEntityAttributeValue npogutstructuredysplastic gutrelative sizesmall r210retinapatternirregular brainstructurefused
58
ZFIN schema extension: stages GenotypeStageEntityAttributeValue npoHatching:Pec-fingutstructuredysplastic Hatching:Pec-fingutrelative sizesmall r210Hatching:Long-pecretinapatternirregular Larval:Protruding-mouthbrainstructurefused
59
Stages Association = Genotype Phenotype Environment Assay Phenotype = Stage* Entity Attribute Value Entity = OBOClassID Stage = OBOAnatomicalStageClassID Attribute | Value = PATOClassID * means zero or more
60
Interpretation of the schema How should EAV data records be interpreted by a computer? What are the instances? Is the EAV schema just to improve database searching? Can it be used for meaningful cross- species comparisons?
61
What is the Entity slot used for? GenotypeEntityAttributeValue npogutstructuredysplastic gutrelative sizesmall r210retinapatternirregular brainstructurefused tm84d/v pattern formation qualitativeabnormal blood islandsrelative numbernumber increased Bsb[2]elongation of arista literal processarrested C-alpha[1D]adult behaviourbehavioral activityuncoordinated 2003 trial data: FB & ZFIN
62
What is the Entity slot used for? In practical terms: An ID from one of the following ontologies GO CC GO BP GO MF Species-specific anatomical ontology OBO Cell Or a cross-product e.g. acidification GO:0045851 which has_location OBORE L midgut FBbt00005383 [example from FBal0062296: Acidification in the midgut of homozygous larvae is often less than in wild-type larvae] But what does it mean in the context of an annotation?
63
Universals and particulars An ontology consist of universals (classes) fruitfly wing flight Experimental data generally concerns particulars (instances) that instantiate universals this particular wing of this particular fruitfly this particular fruitfly participating in this particular flight from here to there in annotation we often use a class ID as a proxy for an (unnamed) instance (or collection of instances) It is important to always keep this distinction in mind
64
What is an Entity? UBO - Upper Bio Ontology BFO - Basic Formal Ontology
65
What is the Attribute slot used for? GenotypeEntityAttributeValue npogutstructuredysplastic gutrelative sizesmall r210retinapatternirregular brainstructurefused tm84d/v pattern formation qualitativeabnormal blood islandsrelative numbernumber increased Bsb[2]elongation of arista literal processarrested C-alpha[1D]adult behaviourbehavioral activityuncoordinated 2003 trial data: FB & ZFIN
66
Attributes: a proposal Treat as Qualities a dependent entity a quality must have independent entity(s) as bearer the quality inheres_in the bearer Examples The particular shape of this ball The particular structure of this wing The particular length of this tail The particular rate of synaptic transmission between these two neurons
67
Attribute universals vs Attribute particulars In an EAV annotation, a PATO class ID typically serves as a proxy for an unnamed attribute instance Universals (classes) must always be defined in terms of their instances A Formal Theory of Substances, Qualities, and Universals Fabian NEUHAUS Pierre GRENON Barry SMITH
68
How is the value slot used? GenotypeEntityAttributeValue npogutstructuredysplastic gutrelative sizesmall r210retinapatternirregular brainstructurefused tm84d/v pattern formation qualitativeabnormal blood islandsrelative numbernumber increased Bsb[2]elongation of arista literal processarrested C-alpha[1D]adult behaviourbehavioral activityuncoordinated
69
Redundancy in EAV triples entityattributevalue finshapeirregular shape eyecolor hueblue mesenchymerelative thicknessthin brainstructurefused retinal cellsrelative orientationdisoriented
70
Proposal: EA model entityattributevalue finshapeirregular shape eyecolor hueblue mesenchymerelative thicknessthin brainstructurefused retinal cellsrelative orientationdisoriented
71
Proposal: EA model entityattribute finshapeirregular shape eyecolor hueblue mesenchymerelative thicknessthin brainstructurefused retinal cellsrelative orientationdisoriented
72
Association = Genotype Phenotype Environment Assay Phenotype = Stage* Entity Attribute Value Entity = OBOClassID Attribute = PATOVersion2ClassID Proposed schema modification
73
Monadic and relational attributes Monadic: the quality/attribute inheres in a single entity Relational: the quality/attribute inheres in two or more entities sensitivity of an organism to a kind of drug sensitivity of an eye to a wavelength of light can turn relational attributes into cross-product monadic attributes e.g. sensitivityToRedLight better to use relational attributes avoids redundancy with existing ontologies
74
Association = Genotype Phenotype Environment Assay Phenotype = Stage* Entity Attribute Entity* Entity = OBOClassID Attribute = PATOVersion2ClassID Incorporating relational attributes Example data record: Phenotype = “organism” sensitiveTo “puromycin”
75
Measurable attributes Some attributes are inexact and implicitly relative to a wild-type or normal attribute relatively short, relatively long, relatively reduced easier than explicitly representing: this tail length shorter-than ‘canonical mouse’ wild-type tail length Some attributes are determinable use a measure function unit, value, {time} this tail has length L measure(L, cm) = 2 Keep measurements separate from (but linked to) attribute ontology
76
Incorporating measurements Association = Genotype Phenotype Environment Assay Phenotype = Stage* Entity Attribute Entity* Measurement* Measurement = Unit Value (Time) Entity = OBOClassID Attribute = PATOVersion2ClassID Example data record: Phenotype = “gut” “acidic” Measurement = “pH” 5
77
Composite phenotype classes Mammalian phenotype has composite phenotype classes e.g. “reduced B cell number” Compose at annotation time or ontology curation time? False dichotomy Core 2 will help map between composite class based annotation and EA annotation
78
Interpreting annotations Annotations are data records typically use class IDs implicitly refer to instances How do we map an annotation to instances? Important for using annotations computationally
79
Interpreting annotations (1) What does an EA (or EAV) annotation mean? Annotation: Genotype=“FBal00123” E=“brain” A=“fused” presumed implied meaning: this organism has_part x, where x instance_of “brain” x has_quality “fused” or in natural language: “this organism has a fused brain” Various built-in assumptions
81
Interpreting annotations (II) What does this mean: annotation: Genotype=“FBal00123” E=“wing” A=“absent” using same mapping as annotation I: fly98 has_part x, where x instance_of “wing” x has_quality “absent” or in natural language: this fly has a wing which is not there ! What we really intend: NOT(this organism has_part x, where x instance_of “wing”)
82
Interpreting annotations (II) What does this mean: annotation: Genotype=“FBal00123” E=“wing” A=“absent” using same mapping as annotation I: this organism has_part x, where x instance_of “wing” x has_quality “absent” or in natural language: this fly has a wing which is not there ! What we really intend: this organism has_quality “wingless” “wingless” = the property of having count(has_part “wing”)=0
83
Are our computational representations intended to capture linguistic statements or reality?
84
Does this matter? Logical reasoners will compute incorrect results unless explicitly provided with specific rules for certain attributes such as “absent” What are the consequences? Basic search will be fine e.g. “find all wing phenotypes” But computers will not be able to reason correctly
85
Interpreting annotations (III) What does this mean: annotation: E=“digit” A=“supernumery” using same interpretation as annotation I: this organism has_part x, where x instance_of “digit” x has_quality “supernumery” or in natural language: this organism has a particular finger which is supernumery What we really intend: this person has_quality “supernumery finger” “supernumery finger” = the property of having count(has_part “digit”) > wild-type” !!!
86
Interpreting annotations (IV) What does this mean: annotation: Gt=“mp001” E=“brown fat cell” A=“increased quantity” using same mapping as annotation I: this organism has_part x, where x instance_of “brown fat cell” x has_quality “increased quantity” or in natural language: this organism has a particular brown fat cell which is increased in quantity What we really intend: this organism has_part population_of(“brown fat cell”) which has_quality increased size
87
Other use cases spermatocyte devoid of asters Homeotic transformations increased distance between wing veins Some vs all
88
Alternate perspectives process vs state regulatory processes: acidification of midgut has_quality reduced rate midgut has_quality low acidity development vs behavior wing development has_quality abnormal flight has_quality intermittent granularity (scale) chemical vs molecular vs cell vs tissue vs anatomical part
89
Summary Define attributes in terms of instances Evaluate proposed new schema measurement proposal relational attribute proposal Complexity trade-off create library of use cases Core2 will create tools to present user-friendly layer Alternate perspective annotations are useful
90
Ontologies must adapt over time Getting it right It is impossible to get it right the 1st (or 2nd, or 3rd, …) time. What we know about biology is continually growing This “standard” requires versioning. Improve Collaborate and Learn
91
Our Outlook “The needs of the many outweigh the needs of the few, or of the one.”
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.