Phenoscape Data Jamboree 2 Introduction and Goals: Paula Mabee Sept. 28, 2008
Phenoscape Data Roundup Introduction and Goals: Paula Mabee Sept. 28, 2008
Ontologies Data
Cowboys & cowgirls:
Jamboree goals: curate data; evaluate web interface prototypes; hold project personnel meeting.
Curation goals: Work with taxon experts to annotate characters using ontologies. Collect feedback on the curation workflow and interface: how can it be made more efficient, and more consistent? Set priorities for the next 3-6 months of curation in relation to prioritized use cases, the number of available character x taxon matrices, and the original taxonomic scope.
Phenoscape.org. History: communication between the zebrafish model organism community and the Cypriniformes Tree of Life group through NESCent workshops (2005-2006); NSF-DBI award to Mabee, Vision, and Westerfield (June 1, 2007). Goal: create a curated, ontology-based evolutionary phenotype database that maps to genetic databases. Generalizable system: prototype with ostariophysan fishes.
Shared ontologies & syntax connect models to humans. Diagram: two parallel columns, Model organism and Human, each running Mutant Gene, Mutant or Missing Protein, Mutant Phenotype (in humans, disease); ontologies link the two columns.
Shared ontologies & syntax connect models to other species. Diagram: Zebrafish (Mutant Gene, Mutant or Missing Protein, Mutant Phenotype) alongside Multiple fish species (Candidate genes, ?, Natural Phenotypes); ontologies link the two columns.
Zebrafish: Mutagenesis produces phenotypes. Example: no tail mutant (Halpern et al. 1993, Cell).
Phenotypes mapped to genes. Gene and mutant labels from the figure: sox9ahi1134 edn1 sox9ahi1134; lockjaw val she, stu, edn1-MO. Phenotypes: Maxilla: size reduction. Dentary: size reduction. Retroarticular: loss. Opercle: size reduction; loss. Ceratohyal: shape change. Branchiostegals: number decrease. Branchiostegals: shape change.
Evolutionary phenotypes: mutation, gene flow, selection, and drift give rise to the differing morphologies of Species A, Species B, and Species C.
Evolutionary phenotypes: genetic bases of morphology unknown. Mutation, gene flow, selection, and drift shape the morphologies of Species A, Species B, and Species C, but the genes involved remain a question mark (?).
Needs analysis use cases: Phenoscape is designed to meet the top-priority questions and needs of the community concerning development and evolution of morphology, e.g.: find genes underlying morphological characters (Which ones? How many?); discover patterns of correlation across genes and morphology; formulate models of morphological evolution (data mining and discovery); phenotypic BLAST to discover similar phenotypes and taxa.
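One way to picture a "phenotypic BLAST" is as a ranked similarity search over sets of phenotype annotations. The sketch below is an illustration under stated assumptions, not Phenoscape's method: it uses plain Jaccard similarity, ignores ontology structure entirely, and the function names, taxa, and annotation sets are all hypothetical (the phenotype labels are borrowed from the zebrafish examples in this talk).

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of annotations common to both sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_similar(query: frozenset, profiles: dict) -> list:
    """Return (taxon, score) pairs sorted from most to least similar."""
    scores = [(name, jaccard(query, eqs)) for name, eqs in profiles.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical taxa with made-up (entity, quality) annotations.
profiles = {
    "Species A": frozenset({("maxilla", "size reduction"),
                            ("opercle", "size reduction")}),
    "Species B": frozenset({("maxilla", "size reduction"),
                            ("ceratohyal", "shape change")}),
    "Species C": frozenset({("branchiostegals", "number decrease")}),
}

# Hypothetical query phenotype, e.g. from a mutant of interest.
query = frozenset({("maxilla", "size reduction"),
                   ("opercle", "size reduction"),
                   ("dentary", "size reduction")})

ranked = rank_similar(query, profiles)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

A real search would presumably also credit near matches through the ontology hierarchy; exact-match set overlap is only the simplest possible baseline.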
Prototype with ostariophysan fishes (zebrafish). Figure: Ostariophysi & outgroups, Mayden et al. 2008 (unpublished).
Phenoscape priorities (yr. 1): develop ontologies for evolutionary work; develop curation tools (Phenote -> Phenex); refine syntax for evolutionary characters; curate phenotypes (characters).
Phenoscape ontologies (19 June 2008). Cloned: New: Teleost Anatomy Ontology (2,233 terms; 387 skeletal); Teleost Taxonomy Ontology (36,060 terms; 38,000 synonyms); Taxonomic Rank Ontology (8->31 terms); Zebrafish Anatomy Ontology (2,196 terms; 310 skeletal). Existing: Phenotype and Trait Ontology (1,075 terms); Evidence Code Ontology; Spatial Ontology (106 terms).
Entity-Quality (EQ) syntax. Example (no tail mutant): Entity = caudal fin, from the Anatomy Ontology (AO); Quality = reduced (size), from the Quality Ontology (PATO).
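Systematic characters can also be described using Entity-Quality (EQ) syntax. The character corresponds to Entity + Attribute and the state to the Value. Example: Entity = caudal fin (AO); Attribute = size and Value = reduced (PATO), which together form the Quality.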
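To make the character/state versus entity/quality correspondence concrete, the caudal fin example can be held in a small record type. This is a minimal sketch: the class and field names are hypothetical (not the Phenex data model), and terms are written as plain labels rather than real AO or PATO identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQAnnotation:
    """One Entity-Quality statement (hypothetical structure for illustration)."""
    entity: str     # anatomy term label, e.g. from the AO
    attribute: str  # quality attribute label, e.g. from PATO
    value: str      # quality value label, e.g. from PATO

    @property
    def character(self) -> str:
        # Systematic "character" = entity + attribute
        return f"{self.entity} {self.attribute}"

    @property
    def state(self) -> str:
        # Systematic "state" = value
        return self.value

    @property
    def quality(self) -> str:
        # EQ "quality" = attribute and value taken together
        return f"{self.value} ({self.attribute})"


no_tail = EQAnnotation(entity="caudal fin", attribute="size", value="reduced")
print(no_tail.character)  # character view
print(no_tail.state)      # state view
print(no_tail.quality)    # quality view
```

Because the record is frozen and compared by value, two annotations built from the same terms are equal and hashable, which is what allows them to be compared across taxa or collected into sets.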
Curate evolutionary phenotypes from free text. Free text is not a computable format: these data cannot be easily compared across taxa, cannot be linked to developmental genetics, and cannot be reasoned across.
Curation needs: 1. Data entry tool: Phenex. 2. List of ranked papers: 76 “A” papers. 3. Manual data entry (non-expert): PDF (1/2 scanned); taxon list (manual entry); character matrix (manual entry); free-text character description (manual entry).
Curation needs: 4. E/Q ontology recoding of characters and states (this workshop).
Curation of ichthyological data: proposed (original NSF grant estimate, 2006)
Taxon                           # Species   # Papers   # Characters
Cypriniformes (Mayden; Coburn)      3,268         70          1,125
Siluriformes (Lundberg)             2,867         87          1,200
Characiformes (Dahdul)              1,674        124            800
Gymnotiformes (Arratia)               134          2            200
Gonorynchiformes                       37         80             75
Clupeiformes (Hilton)                 364         60            380
TOTAL                               8,344        423          3,780
Jamboree goal: ~1,530,165 EQ annotations
Taxon                       # A Papers   # Species   # Characters
Cypriniformes (Mayden)               3         299            293
Siluriformes (Lundberg)                        733            477
Gonorynchiformes (Grande)                       69            187
Clupeiformes (Hilton)                          114            282
Ostariophysi                         1          20              ?
TOTAL                               13       1,235          1,239
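The ~1,530,165 figure appears to be the size of the full taxon-by-character grid, that is, the TOTAL species count multiplied by the TOTAL character count. A quick check using only the numbers on this slide (the variable names are illustrative, and the reading of the goal as a product is an inference from the arithmetic):

```python
# Species and character counts per taxon, as listed on the slide.
species = {
    "Cypriniformes": 299,
    "Siluriformes": 733,
    "Gonorynchiformes": 69,
    "Clupeiformes": 114,
    "Ostariophysi": 20,
}
characters = {
    "Cypriniformes": 293,
    "Siluriformes": 477,
    "Gonorynchiformes": 187,
    "Clupeiformes": 282,
    # Ostariophysi is "?" on the slide, so it is left out of the sum.
}

total_species = sum(species.values())
total_characters = sum(characters.values())

# One potential EQ annotation per (species, character) cell.
goal = total_species * total_characters
print(total_species, total_characters, goal)
```

The sums reproduce the TOTAL row (1,235 species and 1,239 characters), and their product is exactly 1,530,165.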
Acknowledgements: NSF for Phenoscape funding (Mabee, Vision, Westerfield), NSF DBI-0641025; National Evolutionary Synthesis Center, NSF EF-0423641; NIH HG002659 (to Monte Westerfield). Phenoscape project: Hilmar Lapp, Wasila Dahdul, Peter Midford, Jim Balhoff. ZFIN: Monte Westerfield, Melissa Haendel. Cypriniformes Tree of Life (NSF 0431290), colleagues and students (Mayden, Miya, Saitoh, He, Coburn, Arratia, Simon, Conway, Grey, Engeman, Bogutskya, Hilton, Aspinwall). Deep Fin RCN. Suzanna Lewis, Chris Mungall (Lawrence Berkeley National Laboratory). National Center for Biomedical Ontology.