A statistical method for comparing phenotypes in the OBD

A statistical method for comparing phenotypes in the OBD Suzanna Lewis Data Round-up 2008

OBD model: Requirements
  Generic: We can’t define a rigid schema for all of biomedicine. Let the domain ontologies do the domain modeling.
  Expressive: Use cases vary from simple ‘tagging’ to complex descriptions of biological phenomena.
  Formal semantics: Amenable to logical reasoning (first-order logic and/or OWL 1.1).
  Standards-compatible: Remain open to the possibility of integration with the semantic web.

OBD Model: overview
  Graph-based: nodes and links
    Nodes: classes, instances, relations
    Links: relation instances; each connects a subject and an object via a relation, plus additional properties
    Annotations: posited links with attribution / evidence
  Expressivity equivalent to RDF and OWL
    Links are known as axioms and facts in OWL
    Attributed links: named graphs, reification, n-ary relation pattern
  Supports construction of complex descriptions through the graph model
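To make the graph model above concrete, here is a minimal sketch in Python. It is not the OBD schema or API: the class names (Link, Annotation, Graph), the method names, and the source and evidence values are all hypothetical, chosen only to show how a link (subject, relation, object) differs from an annotation (a link plus attribution and evidence).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Link:
    """A relation instance connecting a subject node to an object node."""
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class Annotation:
    """A posited link together with its attribution and evidence."""
    link: Link
    source: str                     # who asserted the link (hypothetical field)
    evidence: Optional[str] = None  # supporting evidence (hypothetical field)


@dataclass
class Graph:
    links: Set[Link] = field(default_factory=set)
    annotations: List[Annotation] = field(default_factory=list)

    def add_link(self, subject: str, relation: str, obj: str) -> Link:
        link = Link(subject, relation, obj)
        self.links.add(link)
        return link

    def annotate(self, subject: str, relation: str, obj: str,
                 source: str, evidence: Optional[str] = None) -> Annotation:
        # An annotation is a link plus who posited it and on what evidence.
        ann = Annotation(self.add_link(subject, relation, obj), source, evidence)
        self.annotations.append(ann)
        return ann

    def objects(self, subject: str, relation: str) -> Set[str]:
        return {l.obj for l in self.links
                if l.subject == subject and l.relation == relation}


g = Graph()
# Ontology link between two classes (class names taken from the talk's example).
g.add_link("curvature-of-tibia", "is_a", "morphology-of-bone")
# Attributed link; the source and evidence values are placeholders.
g.annotate("sox9", "influences", "curvature-of-tibia",
           source="curator-1", evidence="placeholder-evidence")
```

Keeping attribution on a separate Annotation object, rather than on the link itself, is one simple way to mimic the reification pattern the slide mentions.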

Experimental Design
  Annotate 11 human disease genes, and their homologs
  Develop search algorithm that utilizes the ontologies for comparison
  Test search algorithm by asking, “given a set of phenotypic descriptions (EQ stmts), can we find…”
    alleles of the same gene
    homologs in different organisms
    members of a pathway (same organism)
    members of a pathway (other organisms)

Testing the methodology
Annotated 11 gene-linked human diseases described in OMIM, and their homologs in zebrafish and fruitfly:

  Gene     Disease
  ATP2A1   Brody Myopathy
  EPB41    Elliptocytosis
  EXT2     Multiple Exostoses
  EYA1     BOR syndrome
  FECH     Protoporphyria
  PAX2     Renal-Coloboma Syndrome
  SHH      Holoprosencephaly
  SOX9     Campomelic Dysplasia
  SOX10    Peripheral Demyelinating Neuropathy
  TNNT2    Familial Hypertrophic Cardiomyopathy
  TTN      Muscular Dystrophy

Incomplete list of “syndromes”!

An OMIM Record

Annotation Results

  Gene         # genotypes   phenotype statements (total)   average/allele
  ATP2A1       5             16                             3
  EPB41        4             18
  EXT2                       35                             7
  EYA1*                      335                            19
  FECH         14            37
  PAX2*        24            183                            8
  SHH                        207                            9
  SOX9*        13            321                            23
  SOX10*       15            192                            12
  TNNT2        10            36
  TTN          21            63
  Total (11)   146           1443

This shows the results of the annotation effort. For the 11 genes we annotated 146 genotypes with a total of 1443 annotation statements. We annotated 4 of the genes in triplicate (marked with an asterisk) to check for consistency. For those genes, the annotators produced more than 75% similar annotations; the details are beyond what fits in a 15-minute talk.


Ontology-based similarity scoring
  Measure the information content (IC) of any node
  Compute ‘similarity’ by finding IC ratios between any genotypes, genes, classes, etc.

First, we need to discuss the scoring metrics: information content, and the IC ratios between things. Nodes are deemed similar on the basis of what they have in common. We are looking for similarity on the basis of shared annotations to classes in an ontology, or to compositional description classes. In these cases we used inferred annotations. For example, if geneA is annotated to Leg and geneB to Wing, they have Appendage in common.

Scoring is typically a measure of what the nodes have in common versus what one node has that the other does not. The basicSimilarityScore (also known as class overlap) is the ratio of nodesInCommon to nodesInUnion. Recall that this includes inferred annotations. This is desirable for two reasons: it allows approximate matching for non-exact classes, and it penalises general matches in favour of specific matches.

The information content of a class is a measure of how "surprised" we are to see it in an annotation.

The pre-reasoned results are essential for finding nodesInCommon: annotations do not necessarily match exactly, and may instead match further up the graph. For the same reason, we do not report or double-count nodes that subsume existing nodes.
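The scoring ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the OBD implementation: the toy ontology, the function names, and in particular the IC formula are assumptions. The slide's own IC formula did not survive in this transcript, so the sketch uses the common definition IC(c) = -log2(fraction of profiles containing c), and takes the IC ratio to be the summed IC of the shared classes over the summed IC of the union.

```python
import math

# Hypothetical toy ontology: each class maps to its direct parents.
PARENTS = {
    "Leg": {"Appendage"},
    "Wing": {"Appendage"},
    "Eye": {"AnatomicalStructure"},
    "Appendage": {"AnatomicalStructure"},
    "AnatomicalStructure": set(),
}


def with_ancestors(cls):
    """Return the class together with all of its ancestors."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def profile(direct_classes):
    """Inferred annotations: direct classes plus everything above them."""
    result = set()
    for cls in direct_classes:
        result |= with_ancestors(cls)
    return result


def basic_similarity_score(a, b):
    """Class overlap: nodesInCommon / nodesInUnion."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def information_content(cls, profiles):
    """Assumed definition: -log2 of the fraction of profiles containing cls.
    Only meaningful for classes that occur in at least one profile."""
    hits = sum(1 for p in profiles if cls in p)
    return -math.log2(hits / len(profiles))


def ic_ratio(a, b, profiles):
    """Assumed definition: IC of the shared classes over IC of the union."""
    common = sum(information_content(c, profiles) for c in a & b)
    total = sum(information_content(c, profiles) for c in a | b)
    return common / total if total else 0.0


gene_a = profile({"Leg"})    # Leg, Appendage, AnatomicalStructure
gene_b = profile({"Wing"})   # Wing, Appendage, AnatomicalStructure
gene_c = profile({"Eye"})    # Eye, AnatomicalStructure
all_profiles = [gene_a, gene_b, gene_c]

overlap = basic_similarity_score(gene_a, gene_b)   # 2 shared of 4 classes
weighted = ic_ratio(gene_a, gene_b, all_profiles)
```

In this toy setup geneA and geneB share Appendage and AnatomicalStructure, so the class overlap is 2/4. The IC-weighted ratio comes out lower, because the shared classes are general (AnatomicalStructure occurs in every profile and carries no information), which is the "penalise general matches" behaviour described above.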

Ontology-based Search Algorithm
  Given a query node q, we try to find hits h1, h2, ... that are of the same type as q, and are similar to q in terms of their annotation profile, A(q).
  First step: create an annotation profile for the thing to be searched (i.e., a gene).
  The annotation profile is the set of classes used to annotate that entity, and their ancestors.
  Annotation profiles are compared using the same IC similarity metric.
  c ∈ A(q) iff link(r,q,c)
  link(influences,sox9,curvature-of-tibia) → link(influences,sox9,morphology-of-bone)

Given that we can compute the IC ratios between any two things, we can certainly do this for the phenotypic profiles of any two genes. The annotation profile is the set of classes used to annotate an entity, and their ancestors, via some relevant relation(s): c ∈ A(q) iff link(r,q,c), where link(r,q,c) may be computed via reasoning, as in the sox9 example above.

Candidate hits are prioritized according to how close they are to the query profile. They are sorted in descending order by | A(h) ∩ A(q) |, and the first N are chosen as the final set.
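The search procedure above can be sketched as follows. This is a minimal illustration, not the OBD code: the is_a table, the candidate genes geneX, geneY and geneZ, and the function names are hypothetical, and only the sox9 / curvature-of-tibia / morphology-of-bone example comes from the talk. Hits are ranked by the size of the profile intersection, as described in the notes.

```python
# Hypothetical is_a hierarchy; only the first entry comes from the talk.
IS_A = {
    "curvature-of-tibia": {"morphology-of-bone"},
    "morphology-of-bone": {"morphology"},
    "shape-of-fin": {"morphology"},
    "morphology": set(),
}

# Direct annotations per gene (all but sox9 are made-up candidates).
DIRECT = {
    "sox9": {"curvature-of-tibia"},
    "geneX": {"morphology-of-bone"},
    "geneY": {"shape-of-fin"},
    "geneZ": {"curvature-of-tibia"},
}


def annotation_profile(entity):
    """A(entity): the classes used to annotate it, plus their ancestors."""
    result = set()
    stack = list(DIRECT[entity])
    while stack:
        cls = stack.pop()
        if cls not in result:
            result.add(cls)
            stack.extend(IS_A[cls])
    return result


def search(query, n):
    """Return the top-n other entities ranked by |A(h) ∩ A(q)|, descending."""
    a_q = annotation_profile(query)
    scored = [
        (hit, len(annotation_profile(hit) & a_q))
        for hit in DIRECT
        if hit != query
    ]
    # Sort by overlap (largest first); break ties by name for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


top_hits = search("sox9", 2)
```

Here geneZ shares all three classes in the sox9 profile and geneX shares two, so they are returned ahead of geneY. Swapping the raw intersection size for an IC-weighted score would only change the key used for sorting.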

Yes, we can find alleles of same gene

Columns: gene; # genotypes; allelic phenotype profiles (# alleles with sim ratio > 0, average sim ratio, average IC ratio); phenotype statements (total, average/allele)

  ATP2A1      5  0.8  0.799  16  3
  EPB41       4  0.315  0.422  18
  EXT2        1  35  7
  EYA1*       0.226  0.229  335  19
  FECH        14  0.365  0.364  37
  PAX2*       24  0.068  0.063  183  8
  SHH         0.457  0.414  207  9
  SOX9*       13  0.207  0.197  321  23
  SOX10*      15  0.038  0.031  192  12
  TNNT2       10  0.517  0.505  36
  TTN         21  0.106  0.1  63
  Total (11)  146  142  1443

Those with asterisks (*) were done in triplicate.

The take-home message is that for all 11 genes tested, nearly all alleles (with the exception of two) could be used in a pairwise search to find the other alleles of the same gene (shown in bold on the slide). Yes, we can!


UBERON: an anatomical linking ontology
  Each organism has its own anatomical ontology
  To connect annotations across species, we need a way to link the anatomies
  Wanted an ontology that incorporated both functional homology and anatomical similarity
  Created an ontology linking anatomies from ZFA, FMA, XAO, MA, MIAA, WBbt, FBbt

To enable queries across annotations made with different anatomical ontologies, we needed a way to connect those ontologies together. We created an “uber” anatomy ontology that brings together the anatomical parts from the different anatomy ontologies. When it is used in our searches, annotations to individual anatomy terms, such as fish eye and human eye, can be linked together through a common “uber” eye.

UBERON connects phenotype entities from separate anatomy ontologies
The entities that annotations were made to in mouse, human, and zebrafish are shown in orange. The links between the ontology terms were made with the aid of the UBERON ontology: each of the annotated entities can be linked through the UBERON:forebrain term.
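The linking idea can be illustrated with a small Python sketch. The mapping below is hypothetical: the species-prefixed labels and the "UBERON:eye" label are placeholders rather than real ontology identifiers, and a plain dictionary stands in for whatever relations UBERON actually uses to connect the species ontologies. Only the forebrain and eye examples come from the talk.

```python
# Hypothetical bridge from species-specific anatomy terms to shared terms.
UBER_MAP = {
    "zebrafish:forebrain": "UBERON:forebrain",
    "mouse:forebrain": "UBERON:forebrain",
    "human:forebrain": "UBERON:forebrain",
    "zebrafish:eye": "UBERON:eye",
    "human:eye": "UBERON:eye",
}


def to_uber(terms):
    """Replace each species-specific term by its shared term, if it has one."""
    return {UBER_MAP.get(term, term) for term in terms}


def shared_uber_terms(terms_a, terms_b):
    """Anatomy the two annotation sets have in common after bridging."""
    return to_uber(terms_a) & to_uber(terms_b)


zebrafish_terms = {"zebrafish:forebrain", "zebrafish:eye"}
human_terms = {"human:forebrain"}

direct_overlap = zebrafish_terms & human_terms            # empty without a bridge
bridged_overlap = shared_uber_terms(zebrafish_terms, human_terms)
```

Without the bridge the two annotation sets share nothing, because each species uses its own ontology; after mapping, both reach the common forebrain term and can be compared by the same similarity scoring.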

Homologs are found by similarity search

  Gene     simIC human/mouse   simIC human/zebrafish
  ATP2A1   0.047               0.177
  EPB41    0.328               0.141
  EXT2     0.067               0.050
  EYA1     0.264               0.495
  FECH     0.430               0.101
  PAX2     0.157               0.375
  SHH      0.091               0.253
  SOX9     0.226               0.383
  SOX10    0.380               0.443
  TNNT2    0.000               0.118
  TTN      0.248               0.567

Using the UBERON connections, we are able to find homologs of each of the human disease genes in mouse and zebrafish. Here, we show the similarity ratio based on information content between the human-mouse and human-zebrafish homologous gene pairs. The phenotypic profile for each gene represents a consolidation (promotion) of the phenotypic description EQ statements.

Interesting things are suggested here. It's possible that some of the zebrafish homologs (EYA1, PAX2, SHH, SOX9, SOX10, TTN) might make better models than the mouse homologs for the diseases caused by the human genes.
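The comparison in the table can be reproduced with a short Python sketch. The scores are copied from the table; the selection rule is an assumption, since the talk does not state one. Here a zebrafish homolog is flagged when its simIC exceeds the mouse value and also exceeds a hypothetical cutoff of 0.2, which happens to yield the six genes named in the notes.

```python
# simIC scores from the table: gene -> (human/mouse, human/zebrafish)
SIM_IC = {
    "ATP2A1": (0.047, 0.177),
    "EPB41": (0.328, 0.141),
    "EXT2": (0.067, 0.050),
    "EYA1": (0.264, 0.495),
    "FECH": (0.430, 0.101),
    "PAX2": (0.157, 0.375),
    "SHH": (0.091, 0.253),
    "SOX9": (0.226, 0.383),
    "SOX10": (0.380, 0.443),
    "TNNT2": (0.000, 0.118),
    "TTN": (0.248, 0.567),
}

MIN_SCORE = 0.2  # hypothetical cutoff, not taken from the talk


def zebrafish_preferred(scores, min_score=MIN_SCORE):
    """Genes whose zebrafish homolog scores above both the mouse homolog
    and the cutoff."""
    return [
        gene
        for gene, (mouse, zebrafish) in scores.items()
        if zebrafish > mouse and zebrafish > min_score
    ]


candidates = zebrafish_preferred(SIM_IC)
```

Without the cutoff, ATP2A1 and TNNT2 would also be selected, since their zebrafish scores are higher than their (very low) mouse scores; whether such weak matches are meaningful is a judgment the talk leaves open.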


shha is phenotypically similar to homologous pathway members

  zebrafish shh pathway   mouse homologs           human homologs
  shha                    Shh                      SHH
  smo                     Smo
  disp1                   Disp1
  prdm1a                  Prdm1
  hdac1                                            HDAC4
  scube2
  wnt11                   Wnt1, 7b, 3a, 9b, 10b    WNT6
  gli1,2a                 Gli2, Gli3               GLI2
  bmp2b                   Bmp4
  ndr1,2                                           NDRG1
  hhip                    Hhip
  ptc1,ptc2               Ptch1,2
                          Rab23
                          Gas1
                          Nck1
                          Zic2
  notch1a                 Notch1,2
                          Gsk3b

This table shows the list of genes known to be involved in the shh pathway that were retrieved with a similarity search using the zebrafish shh as bait. The list of zebrafish genes is like that in the earlier slide. The mouse and human homologs are also indicated. For some, the mouse/human homologs were retrieved when the zebrafish genes were not. This could be for several reasons. The biggest is that much of the knowledge of the zebrafish pathway members comes from morpholino experiments, and the morpholino data was not included in our initial analyses. One of our next steps is to include the morpholino data and redo this search.

Many of the human homologs are also not annotated. The missing annotations for the human disease genes are therefore a significant deficiency, and filling them in would provide a much-needed resource for biological research. The next slide shows how these genes fall in the shh pathway.

Zebrafish SHH signaling pathway
The picture is from KEGG. Their model includes the known members of the human HH signaling pathway. Additional genes known to be involved in the zebrafish signaling pathway have been added (gli1, gli2a, hdac1, prdm1a, bmp2b, disp1, ndr2, scube2). Ptc and Smo are transmembrane proteins thought to form a receptor complex for the Hh ligand (7, 8), and the Gli zinc-finger transcription factors have been demonstrated to have both activating and inhibitory roles in the Hh pathway (9–13). A second Ptc gene has been isolated, Ptch-2, which encodes a putative receptor for Shh (14, 15).

Potential candidates also found

  Gene     Similarity  Characterization
  dharma   0.483       Paired-type homeodomain protein that has dorsal organizer inducing activity and is regulated by wnt signaling.
  tbx16    0.401       T-box transcription factor that regulates mesenchyme to epithelial transition and LR patterning.
  plod3    0.387       Lysyl hydroxylase and glycosyltransferase important for axonal growth cone migration.
  ntl      0.382       T-box transcription factor important for notochord and mesoderm development.
  kny      0.374       Glypican component of the wnt/PCP pathway.
  tll1     0.372       Metalloprotease that can cleave Chordin and increase Bmp activity.
  copa                 Coatomer vesicular coat complex important for maintenance of the Golgi and ER transport. Important for notochord differentiation.
  sfpq     0.369       RNA splicing factor required for cell survival and neuronal development.
  lama1                Basement membrane protein important for eye and body axis development.
  lamc1    0.367       Basement membrane protein important for eye development.
  atp7a    0.365       Copper transporting ATPase.
  atp2a1   0.363       Sarcoplasmic reticulum transmembrane ATPase that mediates calcium re-uptake.
  flh      0.358       Homeobox gene important for notochord and epiphysis development. Anterior/posterior expression determined by wnt activity.
  wnt5b    0.327       Extracellular cysteine-rich glycoprotein required for convergent extension movements during posterior segmentation.

In addition to the known pathway members, many as-yet-unlinked genes were found with phenotypes similar to shha. These represent potential pathway candidates. Here we’ve summarized some likely candidates based on their characterization. This is where the real power of this method comes in: discovery!

Results thus far
  Annotate 11 human disease genes, and their homologs
  Develop search algorithm that utilizes the ontologies for comparison
  Test search algorithm by asking, “given a set of phenotypic descriptions (EQ stmts), can we find…”
    alleles of the same gene
    homologs in different organisms
    members of a pathway (same organism)
    members of a pathway (other organisms)

Conclusions
  Ontologies help
  Promising new directions for ontology-based phenotype annotation
  Promising ways for identifying novel pathway members, generating hypotheses to test at the bench

Acknowledgements
  NCBO-Berkeley: Christopher Mungall, Nicole Washington, Mark Gibson, Rob Bruggner
  U of Oregon: Monte Westerfield, Melissa Haendel
  Cambridge: Michael Ashburner, George Gkoutos (PATO), David Osumi-Sutherland
  National Institutes of Health