Presentation is loading. Please wait.

Presentation is loading. Please wait.

Eye  what kinds of things exist?  what are the relationships between these things? ommatidium sense organeye disc is_a part_of develops from A biological.

Similar presentations


Presentation on theme: "Eye  what kinds of things exist?  what are the relationships between these things? ommatidium sense organeye disc is_a part_of develops from A biological."— Presentation transcript:

1 eye  what kinds of things exist?  what are the relationships between these things? ommatidium sense organeye disc is_a part_of develops from A biological ontology is:  A machine interpretable representation of some aspect of biological reality

2 Three practical considerations  Acknowledge the substantive issues of sociology and operations to build community  Follow the principles of ontology construction hygiene  Annotation is the rate-limiting step. It can be facilitated by improved ontologies

3 Drosophila Meeting the goal: Drawing inferences ABCD SP:1234SP:8723SP:19345 ? PMID:5555PMID:4444 B SP:48392 BC SP:48291SP:38921 Direct evidence Indirect evidence PMID:8976 PMID:9550 PMID:3924 Human Xenopus

4 Overview  Following basic rules helps make better ontologies  We will work through the principles- based treatment of relations in ontologies, to show how ontologies can become more reliable and more powerful

5 Why do we need rules for good ontology?  Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error- checking)  Unintuitive rules for classification lead to entry errors (problematic links)  Facilitate training of curators  Overcome obstacles to alignment with other ontology and terminology systems  Enhance harvesting of content through automatic reasoning systems

6 Tactile sense Taction Tactition ? The Challenge of Univocity: People call the same thing by different names

7 Tactile sense Taction Tactition perception of touch ; GO:0050975 Univocity: GO uses 1 term and many characterized synonyms

8 = bud initiation The Challenge of Univocity: People use the same words to describe different things

9 Bud initiation? How is a computer to know?

10 = bud initiation sensu Dentists = bud initiation sensu yeasties = bud initiation sensu agronomists Univocity: GO adds “sensu” descriptors to discriminate among organisms

11 The Importance of synonyms for utility: How do we represent the function of tRNA? Biologically, what does the tRNA do? Identifies the codon and inserts the amino acid in the growing polypeptide Molecular_function Triplet_codon amino acid adaptor activity GO Definition: Mediates the insertion of an amino acid at the correct point in the sequence of a nascent polypeptide chain during protein synthesis. Synonym: tRNA

12 The Challenge of Positivity Some organelles are membrane-bound. A centrosome is not a membrane bound organelle, but it still may be considered an organelle.

13 The Challenge of Positivity: Sometimes absence is a distinction in a Biologist’s mind non-membrane-bound organelle GO:0043228 membrane-bound organelle GO:0043227

14 Positivity  Note the logical difference between  “non-membrane-bound organelle” and  “not a membrane-bound organelle”  The latter includes everything that is not a membrane bound organelle!

15 The Challenge of Objectivity: Database users want to know if we don’t know anything (Exhaustiveness with respect to knowledge) We don’t know anything about a gene product with respect to these We don’t know anything about the ligand that binds this type of GPCR

16 Objectivity  How can we use GO to annotate gene products when we know that we don’t have any information about them?  Currently GO has terms in each ontology to describe unknown  An alternative might be to annotate genes to root nodes and use an evidence code to describe that we have no data.  Similar strategies could be used for things like receptors where the ligand is unknown.

17 Single Inheritance  GO has a lot of is_a diamonds  Some are due to incompleteness of the graph  Some are due to a mixture of dissimilar classes within the graph at the same level

18 Is_a diamond in GO Process behavior locomotory behaviorlarval behavior larval locomotory behavior

19 Is_a diamond in GO Function enzyme regulator activity GTPase regulator activity enzyme activator activity GTPase activator acivity

20 Is_a diamond in GO Cellular Component organelle non-membrane bound organelle intracellular organelle non-membrane bound intracellular organelle

21 Technically the diamonds are correct, but could be eliminated locomotory behaviorlarval behavior GTPase regulator activity enzyme activator activity non-membrane bound organelle intracellular organelle What do these pairs have in common?

22 They are all differentiated from the parent term by a different factor locomotory behaviorlarval behavior GTPase regulator activity enzyme activator activity non-membrane bound organelle intracellular organelle Type of behavior vs. what is behaving What is regulated vs. type of regulator Type of organelle vs. location of organelle

23 Insert an intermediate grouping term behavior locomotory behavior larval behavior larval locomotory behavior behavior of a thing descriptive behavior

24 Why insert terms that no one would use? behavior locomotory behavior larval behaviorrhythmic behavioradult behavior By the structure of this graph, locomotory behavior has the same relationship to larval behavior as to rhythmic behavior

25 Why insert terms that no one would use? behavior locomotory behavior larval behaviorrhythmic behavioradult behavior But actually, locomotory behavior/rhythmic behavior and larval behavior/adult behavior group naturally Descriptive behavior Behavior of a thing This type of single step differentiation of terms between levels would allow us to use distances between nodes and levels to compare similarity.

26 GO Definitions A definition written by a biologist: necessary & sufficient conditions written definition (not computable) Graph structure: necessary conditions formal (computable)

27 Relationships and definitions  The set of necessary conditions is determined by the graph  This can be considered a partial definition  Important considerations:  Placement in the graph- selecting parents  Appropriate relationships to different parents  True path violation

28 The importance of relationships  Cyclin dependent protein kinase  Complex has a catalytic and a regulatory subunit  How do we represent these activities (function) in the ontology?  Do we need a new relationship type (regulates)? Catalytic activity protein kinase activity protein Ser/Thr kinase activity Cyclin dependent protein kinase activity Cyclin dependent protein kinase regulator activity Molecular_function Enzyme regulator activity Protein kinase regulator activity

29 Relationships provide the computable definitions  The definition of a class lower down in the hierarchy is provided by specifying the parent of the class together with the relevant differentia.  Differentia tells us what marks out instances of the defined class within the wider parent class.

30 Structured definitions contain both genus and differentiae Essence = Genus + Differentiae neuron cell differentiation = Genus: differentiation (processes whereby a relatively unspecialized cell acquires the specialized features of..) Differentiae: acquires features of a neuron

31 Alignment of the Two Ontologies will permit the generation of consistent and complete definitions id: CL:0000062 name: osteoblast def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629] is_a: CL:0000055 relationship: develops_from CL:0000008 relationship: develops_from CL:0000375 GO Cell type New Definition + = Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.

32 Alignment of the Two Ontologies will permit the generation of consistent and complete definitions id: GO:0001649 name: osteoblast differentiation synonym: osteoblast cell differentiation genus: differentiation GO:0030154 (differentiation) differentium: acquires_features_of CL:0000062 (osteoblast) definition (text): Processes whereby a relatively unspecialized cell acquires the specialized features of an osteoblast, the mesodermal cell that gives rise to bone Formal definitions with necessary and sufficient conditions, in both human readable and computer readable forms

33 Basis in Reality  GO is designed by a consortium  As long as egos don’t get in the way, GO represents universals rather than concepts  Large-scale developments of the GO are a result of compromise  Gene Annotators have a large say in GO content  Annotators are experts in their fields  Annotators constantly read the scientific literature But, since GO is representing a science, GO actually represents paradigms. Therefore, it is essential that GO is able to change!

34 Classes and Instances  When should we create a new class as opposed to multiple annotations?  When the the biology represents a universal principal. Receptor signaling protein tyrosine kinase activity does not represent receptor signaling protein activity and tyrosine kinase activity independently.

35 Consequences of inconsistencies  Hard to synchronize manually  can be automated  currently requires text mining  Inconsistent user-search results  Problems likely to resurface with other ontologies embedded in the GO  chemical, protein  multiple species-specific anatomical ontologies  OBO ontologies should inform GO

36 Combinatorial annotation  Allow combinatorial annotations in appropriate context  phenotypic annotation  eg “tail fin placement ventralized at stage pharyngula:prim25”  combinatorial GO annotation  eg “activation of MAPK in mesenchymal cell”  New AmiGO will allow for these annotations

37 Animal models Mutant Gene Mutant or missing Protein Mutant Phenotype Animal disease models

38 HumansAnimal models Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Mutant Gene Mutant or missing Protein Mutant Phenotype (disease model) Animal disease models

39 HumansAnimal models Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Mutant Gene Mutant or missing Protein Mutant Phenotype (disease model) Animal disease models

40 HumansAnimal models Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Mutant Gene Mutant or missing Protein Mutant Phenotype (disease model) Animal disease models

41 SHH -/+ SHH -/- shh -/+ shh -/-

42 Phenotype (clinical sign) = entity + attribute

43 Phenotype (clinical sign) = entity + attribute P 1 = eye + hypoteloric

44 Phenotype (clinical sign) = entity + attribute P 1 = eye + hypoteloric P 2 = midface + hypoplastic

45 Phenotype (clinical sign) = entity + attribute P 1 = eye + hypoteloric P 2 = midface + hypoplastic P 3 = kidney + hypertrophied

46 Phenotype (clinical sign) = entity + attribute P 1 = eye + hypoteloric P 2 = midface + hypoplastic P 3 = kidney + hypertrophied PATO: hypoteloric hypoplastic hypertrophied ZFIN: eye midface kidney +

47 Phenotype (clinical sign) = entity + attribute Anatomical ontology Cell & tissue ontology Developmental ontology Gene ontology biological process molecular function cellular component + PATO (phenotype and trait ontology)

48 Phenotype (clinical sign) = entity + attribute P 1 = eye + hypoteloric P 2 = midface + hypoplastic P 3 = kidney + hypertrophied Syndrome = P 1 + P 2 + P 3 (disease) = holoprosencephaly

49 Human holo- prosencephaly Zebrafish shh Zebrafish oep

50 OMIM gene ZFIN gene FlyBase gene FlyBase mut pub ZFIN mut pub mouseratSNO MED OMIM disease LAMB1lamb1LanB151539 - FECHfechFerro- chelatase 25229 Protoporphyria, Erythropoietic GLI2gli2aci3884122 - SLC4A1slc4a1CG81777719 Renal Tubular Acidosis, RTADR MYO7Amyo7ack8459316 Deafness; DFNB2; DFNA11 ALAS2alas2Alas1714 Anemia, Sideroblastic, X-Linked KCNH2kcnh2sei27312 - MYH6myh6Mhc1663112 Cardiomyopathy, Familial Hypertrophic; CMH TP53tp53p5364331911 Breast Cancer ATP2A1atp2a1Ca-P60A326111 Brody Myopathy EYA1eya1eya251546 Branchiootorenal Dysplasia SOX10sox10Sox100B11744 Waardenburg-Shah Syndrome

51 CB i O Bioinformatics Goals 1.Apply the ontology  Software toolkit for “annotation” applications 2.Manage the data  Database and interface to store & view annotations 3.Investigate and compare  Linking human diseases to genetic models 4.Maintain  Ongoing reconciliation of ontologies with annotations

52

53 Ontology requirements  Ontologies  attribute ontology  GO  anatomical ontologies  cell  species-specific  Ontology integration  technical  semantic

54 Review of proposed (2003) EAV model  A phenotype is described using an Entity-Attribute-Value triple  similiar “slot-value” frame-based systems  Entities are drawn from various OBO ontologies  Attributes and Values in one ontology - PATO

55 Separation of concerns  Not phenotypes:  Genotype  Environment  Assay, measurement systems  Images Association = Genotype Phenotype Environment Assay Phenotype = Entity Attribute Value Entity = OBOClassID Attribute | Value = PATOClassID schema:

56 2003 Pilot study  Trial of EAV model on small collection of genotypes  FlyBase  ZFIN  Genes were non-orthologous  New curations - in progress  orthologous genes with clinical relevance  Use the same data model and exchange format?

57 Example data records GenotypeEntityAttributeValue npogutstructuredysplastic gutrelative sizesmall r210retinapatternirregular brainstructurefused

58 ZFIN schema extension: stages GenotypeStageEntityAttributeValue npoHatching:Pec-fingutstructuredysplastic Hatching:Pec-fingutrelative sizesmall r210Hatching:Long-pecretinapatternirregular Larval:Protruding-mouthbrainstructurefused

59 Stages Association = Genotype Phenotype Environment Assay Phenotype = Stage* Entity Attribute Value Entity = OBOClassID Stage = OBOAnatomicalStageClassID Attribute | Value = PATOClassID * means zero or more

60 Interpretation of the schema  How should EAV data records be interpreted by a computer?  What are the instances?  Is the EAV schema just to improve database searching?  Can it be used for meaningful cross- species comparisons?

61 What is the Entity slot used for? GenotypeEntityAttributeValue npogutstructuredysplastic gutrelative sizesmall r210retinapatternirregular brainstructurefused tm84d/v pattern formation qualitativeabnormal blood islandsrelative numbernumber increased Bsb[2]elongation of arista literal processarrested C-alpha[1D]adult behaviourbehavioral activityuncoordinated 2003 trial data: FB & ZFIN

62 What is the Entity slot used for?  In practical terms:  An ID from one of the following ontologies  GO CC  GO BP  GO MF  Species-specific anatomical ontology  OBO Cell  Or a cross-product  e.g. acidification GO:0045851  which has_location OBORE L midgut FBbt00005383  [example from FBal0062296: Acidification in the midgut of homozygous larvae is often less than in wild-type larvae]  But what does it mean in the context of an annotation?

63 Universals and particulars  An ontology consist of universals (classes)  fruitfly  wing  flight  Experimental data generally concerns particulars (instances) that instantiate universals  this particular wing of this particular fruitfly  this particular fruitfly participating in this particular flight from here to there  in annotation we often use a class ID as a proxy for an (unnamed) instance (or collection of instances)  It is important to always keep this distinction in mind

64 What is an Entity? UBO - Upper Bio Ontology BFO - Basic Formal Ontology

65 What is the Attribute slot used for? GenotypeEntityAttributeValue npogutstructuredysplastic gutrelative sizesmall r210retinapatternirregular brainstructurefused tm84d/v pattern formation qualitativeabnormal blood islandsrelative numbernumber increased Bsb[2]elongation of arista literal processarrested C-alpha[1D]adult behaviourbehavioral activityuncoordinated 2003 trial data: FB & ZFIN

66 Attributes: a proposal  Treat as Qualities  a dependent entity  a quality must have independent entity(s) as bearer  the quality inheres_in the bearer  Examples  The particular shape of this ball  The particular structure of this wing  The particular length of this tail  The particular rate of synaptic transmission between these two neurons

67 Attribute universals vs Attribute particulars  In an EAV annotation, a PATO class ID typically serves as a proxy for an unnamed attribute instance  Universals (classes) must always be defined in terms of their instances A Formal Theory of Substances, Qualities, and Universals Fabian NEUHAUS Pierre GRENON Barry SMITH

68 How is the value slot used? GenotypeEntityAttributeValue npogutstructuredysplastic gutrelative sizesmall r210retinapatternirregular brainstructurefused tm84d/v pattern formation qualitativeabnormal blood islandsrelative numbernumber increased Bsb[2]elongation of arista literal processarrested C-alpha[1D]adult behaviourbehavioral activityuncoordinated

69 Redundancy in EAV triples entityattributevalue finshapeirregular shape eyecolor hueblue mesenchymerelative thicknessthin brainstructurefused retinal cellsrelative orientationdisoriented

70 Proposal: EA model entityattributevalue finshapeirregular shape eyecolor hueblue mesenchymerelative thicknessthin brainstructurefused retinal cellsrelative orientationdisoriented

71 Proposal: EA model entityattribute finshapeirregular shape eyecolor hueblue mesenchymerelative thicknessthin brainstructurefused retinal cellsrelative orientationdisoriented

72 Association = Genotype Phenotype Environment Assay Phenotype = Stage* Entity Attribute Value Entity = OBOClassID Attribute = PATOVersion2ClassID Proposed schema modification

73 Monadic and relational attributes  Monadic:  the quality/attribute inheres in a single entity  Relational:  the quality/attribute inheres in two or more entities  sensitivity of an organism to a kind of drug  sensitivity of an eye to a wavelength of light  can turn relational attributes into cross-product monadic attributes  e.g. sensitivityToRedLight  better to use relational attributes  avoids redundancy with existing ontologies

74 Association = Genotype Phenotype Environment Assay Phenotype = Stage* Entity Attribute Entity* Entity = OBOClassID Attribute = PATOVersion2ClassID Incorporating relational attributes Example data record: Phenotype = “organism” sensitiveTo “puromycin”

75 Measurable attributes  Some attributes are inexact and implicitly relative to a wild-type or normal attribute  relatively short, relatively long, relatively reduced  easier than explicitly representing:  this tail length shorter-than ‘canonical mouse’ wild-type tail length  Some attributes are determinable  use a measure function  unit, value, {time}  this tail has length L  measure(L, cm) = 2  Keep measurements separate from (but linked to) attribute ontology

76 Incorporating measurements Association = Genotype Phenotype Environment Assay Phenotype = Stage* Entity Attribute Entity* Measurement* Measurement = Unit Value (Time) Entity = OBOClassID Attribute = PATOVersion2ClassID Example data record: Phenotype = “gut” “acidic” Measurement = “pH” 5

77 Composite phenotype classes  Mammalian phenotype has composite phenotype classes  e.g. “reduced B cell number”  Compose at annotation time or ontology curation time?  False dichotomy  Core 2 will help map between composite class based annotation and EA annotation

78 Interpreting annotations  Annotations are data records  typically use class IDs  implicitly refer to instances  How do we map an annotation to instances?  Important for using annotations computationally

79 Interpreting annotations (1)  What does an EA (or EAV) annotation mean?  Annotation:  Genotype=“FBal00123” E=“brain” A=“fused”  presumed implied meaning:  this organism  has_part x, where x instance_of “brain” x has_quality “fused”  or in natural language:  “this organism has a fused brain”  Various built-in assumptions

80

81 Interpreting annotations (II)  What does this mean:  annotation:  Genotype=“FBal00123” E=“wing” A=“absent”  using same mapping as annotation I:  fly98 has_part x, where  x instance_of “wing”  x has_quality “absent”  or in natural language:  this fly has a wing which is not there  !  What we really intend:  NOT(this organism has_part x, where x instance_of “wing”)

82 Interpreting annotations (II)  What does this mean:  annotation:  Genotype=“FBal00123” E=“wing” A=“absent”  using same mapping as annotation I:  this organism has_part x, where  x instance_of “wing”  x has_quality “absent”  or in natural language:  this fly has a wing which is not there  !  What we really intend:  this organism has_quality “wingless”  “wingless” = the property of having count(has_part “wing”)=0

83  Are our computational representations intended to capture linguistic statements or reality?

84 Does this matter?  Logical reasoners will compute incorrect results  unless explicitly provided with specific rules for certain attributes such as “absent”  What are the consequences?  Basic search will be fine  e.g. “find all wing phenotypes”  But computers will not be able to reason correctly

85 Interpreting annotations (III)  What does this mean:  annotation:  E=“digit” A=“supernumery”  using same interpretation as annotation I:  this organism has_part x, where  x instance_of “digit”  x has_quality “supernumery”  or in natural language:  this organism has a particular finger which is supernumery  What we really intend:  this person has_quality “supernumery finger”  “supernumery finger” = the property of having count(has_part “digit”) > wild-type”  !!!

86 Interpreting annotations (IV)  What does this mean:  annotation:  Gt=“mp001” E=“brown fat cell” A=“increased quantity”  using same mapping as annotation I:  this organism has_part x, where  x instance_of “brown fat cell”  x has_quality “increased quantity”  or in natural language:  this organism has a particular brown fat cell which is increased in quantity  What we really intend:  this organism has_part population_of(“brown fat cell”) which has_quality increased size

87 Other use cases  spermatocyte devoid of asters  Homeotic transformations  increased distance between wing veins  Some vs all

88 Alternate perspectives  process vs state  regulatory processes:  acidification of midgut has_quality reduced rate  midgut has_quality low acidity  development vs behavior  wing development has_quality abnormal  flight has_quality intermittent  granularity (scale)  chemical vs molecular vs cell vs tissue vs anatomical part

89 Summary  Define attributes in terms of instances  Evaluate proposed new schema  measurement proposal  relational attribute proposal  Complexity trade-off  create library of use cases  Core2 will create tools to present user-friendly layer  Alternate perspective annotations are useful

90 Ontologies must adapt over time  Getting it right  It is impossible to get it right the 1st (or 2nd, or 3rd, …) time.  What we know about biology is continually growing  This “standard” requires versioning. Improve Collaborate and Learn

91 Our Outlook  “The needs of the many outweigh the needs of the few, or of the one.”


Download ppt "Eye  what kinds of things exist?  what are the relationships between these things? ommatidium sense organeye disc is_a part_of develops from A biological."

Similar presentations


Ads by Google