Linking Animal Models and Human Diseases Supported by NIH P41 HG002659 and U54 HG004028 Cambridge University & the University of Oregon
Yvonne Bradford Melissa Haendel Kevin Schaper Erik Segerdell Amy Singer Sierra Taylor Michael Ashburner Rachel Drysdale George Gkoutos David Sutherland BBOP Mark Gibson Suzi Lewis Chris Mungall Nicole Washington
Linking Animal Models and Human Diseases Develop methods to: Describe phenotypes Compare descriptions (annotations) Search phenotypes within and across species
EYA gene mutants zebrafish zebrafish fly human
Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Animal disease models Humans Animal models Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Mutant Gene Mutant or missing Protein Mutant Phenotype (disease model)
Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Sequence analysis (BLAST) can connect animal genes to human genes Humans Animal models Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Mutant Gene Mutant or missing Protein Mutant Phenotype (disease model)
Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Shared ontologies and syntax can connect mutant phenotypes to candidate human disease genes Humans Animal models Mutant Gene Mutant or missing Protein Mutant Phenotype (disease) Mutant Gene Mutant or missing Protein Mutant Phenotype (disease model)
Annotation of eya mutant phenotype = entity + quality EQ1 = eye + small
Linking Animal Models and Human Diseases Develop methods to: Describe phenotypes Compare descriptions (annotations) Search phenotypes within and across species
Strategy: use shared genes as proof of principle OMIM genes ZFIN mutant genes FlyBase mutant genes
Data type Total ZFIN genes 20,385 Genes with assigned human orthologs 2,884 Genes with OMIM links 2,174 ZFIN mutants 3,188 ZFIN mutants with OMIM links Corresponding human genes Drosophila homologs of these 271 720 271 187
Experimental design Annotate phenotypes in human, zebrafish, and fly Annotate human phenotypes triple blind Compare annotations + FlyBase OBD BBOP + ZFIN
Results: Number of annotations added to OBD Human Curate OMIM gene and disease data into OBD using EQ syntax Zebrafish Curate zebrafish mutant phenotype data into OBD using EQ syntax
Results: Number of annotations added to OBD Human Curate OMIM gene and disease data into OBD using EQ syntax 10 genes 677 EQ annotations from ZFIN 314 EQ annotations from Flybase 507 EQ annotations from BBOP Zebrafish Curate zebrafish mutant phenotype data into OBD using EQ syntax 4,355 genes and genotypes into OBD 17,782 EQ annotations into OBD
~10% of the annotations for 1 gene We need a quantitative method to calculate the similarity of annotations
Annotations vary among curators
Example phenotype annotation E = facial ganglion + Q = morphology influences Query: Gene variant E = facial ganglion + Q = morphology
Terms are related by ontologies epibranchial ganglion is_a influences Query: Gene variant E = facial ganglion + Q = morphology
Terms are related by ontologies cranial ganglion is_a epibranchial ganglion is_a influences Query: Gene variant E = facial ganglion + Q = morphology
Terms are related by ontologies ganglion Similar annotations have the same or more general entity types is_a cranial ganglion is_a epibranchial ganglion is_a influences Query: Gene variant E = facial ganglion + Q = morphology
Similarity calculated by reasoning across ontologies ganglion Similar annotations have the same or more general entity types is_a cranial ganglion is_a epibranchial ganglion is_a influences Query: Gene variant E = facial ganglion + Q = morphology is_a shape structure size Similar annotations have the same or more specific PATO qualities Similarity between two genotypes is weighted by: the total number and rarity of similar annotations, and the degree of relatedness within each ontology. is_a convex folded wavy
Ontologies support comparisons Curator 1 Curator 2 Curator 3 At the intersection of different curators’ term choices, colors merge, with white indicating consensus among all 3
Ontologies support comparisons Curator 1 Curator 2 Curator 3 At the intersection of different curators’ term choices, colors merge, with white indicating consensus among all 3
Subsumption in similarity scoring Gene Genotype Quality Anatomical Entity Morphology Organ Shape Part of organ Thickness Cranial nerve Increased thickness Vestibulo- cochlear nerve Trigeminal nerve Perforated Wall of heart Deformed Retina GeneA A-001 + A-002 A-003 Gene B B-001 B-002 B-003 p value 0.99 0.6 0.3 1.1e-4 3.2e-5 6.1e-6 2.0e-7 1.9e-6 Unlikely to match by chance Black = actual annotations; red = subsumption
Average annotation consistency among curators total annotations # Annotations similar annotations We can quantitate similarity and, thus, optimize consistency EYA1 SOX10 SOX9 PAX2 congruence 0.78 0.71 0.61 0.72
Linking Animal Models and Human Diseases Develop methods to: Describe phenotypes Compare descriptions (annotations) Search phenotypes within and across species
Phenotypes identify other alleles of the same gene Example: A search for phenotypes similar to each human EYA1 allele returns other human EYA1 alleles Similarity between two genotypes is weighted by: the total number and rarity of similar annotations, and the degree of relatedness within each ontology.
Phenotypes identify other alleles of the same gene Example: A search for phenotypes similar to each human EYA1 allele returns other human EYA1 alleles Allele number Allele number Similarity between two genotypes is weighted by: the total number and rarity of similar annotations, and the degree of relatedness within each ontology. (The smaller the number, the more similar)
Human phenotypes identify mutations in orthologous model organism genes A search for phenotypes similar to: Human EYA1 variant OMIM:601653 MP:deafness = E = Sensory perception of sound Q = absent These similarities are based on the same GO term entity. Anatomical cross-species queries require classification of anatomical structures, in the different species, based on function and/or homology.
Human phenotypes identify mutations in orthologous model organism genes A search for phenotypes similar to: Human EYA1 variant OMIM:601653 MP:deafness = E = Sensory perception of sound Q = absent returns: Mouse Eya1 bor/bor and Eya1tm1Rilm/tm1Rilm E = Sensory perception of sound Q = decreased These similarities are based on the same GO term entity. Anatomical cross-species queries require classification of anatomical structures, in the different species, based on function and/or homology.
Problem: different terms are used in different species mouse:EYA1 MGI MP:abnormal vestibulocochlear nerve morphology MGI:nnn eya1 variant nn Homologene: EYA1 human:EYA1 NCBO inheres_in PATO:decreased thickness OMIM:601653.nn EYA1 variant FMA:cochlear nerve zebrafish:eya1 ZFIN inheres_in PATO:decreased thickness ZFIN:geno-nnn Eya1 variant nn ZFA:cranial nerve VIII
Searches across species utilize homology annotations mouse:EYA1 MGI MP:abnormal vestibulocochlear nerve morphology MGI:nnn eya1 variant nn mammalian phenotype decomposed into EQ = Homologene: EYA1 inheres_in PATO:morphology MA:vestibulocochlear VIII nerve human:EYA1 NCBO inheres_in PATO:decreased thickness FMA:vestibulocochlear nerve OMIM:601653.nn EYA1 variant FMA:cochlear nerve zebrafish:eya1 ZFIN inheres_in PATO:decreased thickness homology annotations ZFIN:geno-nnn Eya1 variant nn ZFA:cranial nerve VIII
Animal annotations can identify human diseases (Campomelic dysplasia) Human, SOX9 (Campomelic dysplasia) Zebrafish, sox9a (jellyfish) Scapula: hypoplastic Scapulocorocoid: aplastic Curation of mutant phenotypes and human diseases using common ontologies & syntax can provide candidate genes and animal models of disease Lower jaw: decreased size Cranial cartilage: hypoplastic Heart: malformed or edematous Heart: edematous Phalanges: decreased length Pectoral fin: decreased length Long bones: bowed Cartilage development: disrupted
Annotations can identify other pathway members Similarity search for zebrafish shhat4/t4 identifies pathway members Genotype Congruence # of alleles Function disp1ty60 9.6E-19 6 regulates long range Shh signaling gli1ts269 6.6E-13 2 downstream transcriptional repressor wnt5bte1c 4.3E-11 5 downstream target gene smohi1640Tg 4.1E-11 4 membrane protein mediates Shh intracellular signaling pathway hhiphu540a 3.1E-5 binds Shh in membrane Shha Disp1 Gli1
Linking Animal Models and Human Diseases Develop methods to: Describe phenotypes Compare descriptions (annotations) Search phenotypes within and across species