WormBase: Recent and Future Developments
Anthony Rogers*, WormBase Consortium
*Wellcome Trust Sanger Institute
California Institute of Technology
Cold Spring Harbor Laboratory
Washington University at St. Louis
What you told us to do!
User survey in Nov/Dec 2005 had 761 respondents.
1) Website navigation and speed
2) Gene structures (see poster P159)
3) Genetic map (see poster P157)
4) Phenotypes
5) Literature search
6) Use of other nematode genomes
7) Community forum / wiki
Web site speed improvements
Extra server hardware
Restructured architecture and load balancing
Pre-caching of popular pages (e.g. gene pages)
Another European mirror site: wormbase.sanger.ac.uk
WormBase wiki
* Example 1: List all synonyms for the following genes: bli-1, egl-43, lag-1.
* Example 2: From all genes in C. elegans that have an ortholog in C. briggsae, are located on chromosome III, are sterile in an RNAi screen, and have annotated UTRs, provide a FASTA file containing the peptide sequences.
* Example 3: Download the set of all RNAi experiments that resulted in an Emb phenotype and in which the target genes are classified as serine/threonine kinases.
WormMart
Based on the BioMart software:
Originally developed at EBI/WTSI for Ensembl.
Various deployments – WormBase, UniProt, Gramene.
WormMart:
Launched in April 2005.
Replacement for “Batch Genes” and (eventually) “Batch Sequences” pages.
Seven WormBase objects are currently described: “Gene”, “GO_term”, “Expression pattern”, “Phenotype”, “RNAi”, “Variation” and “Paper”.
Development is driven largely by user feedback.
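Example query: upstream and downstream sequences for all miRNA genes that lie on C. elegans chromosome II
[Screenshot: dataset selection (WS140, WS144; Gene, Expression Pattern, Phenotype, RNAi)]
[Screenshot: filters (gene type: Coding, miRNA, mRNA, ncRNA, Pseudo; chromosome: I, II, III, IV, X)]
[Screenshot: output options (Features, Structures, Sequences)]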
Search WormBase on
Search for “egl mutants related to hormones”
Comparative genomics
Which species?
What will we do with them?
When will this happen?
Nematode phylogeny
What we’ll do...
Semi-curated gene set based on various predictors
Protein set
Protein annotation (Pfam, InterPro, TMHMM, SignalP)
blastp
blastx
Whole genome alignment *
Ortholog assignment *
* Pretty much the same as we have with C. briggsae
C. briggsae gene page
KOGs / InParanoid
Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Remm M, Storm CE, Sonnhammer EL. J Mol Biol.
The COG database: new developments in phylogenetic classification of proteins from complete genomes. Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV. Nucleic Acids Research :22-28.
TreeFam worm genes “TreeFam is a database of phylogenetic trees of animal genes. It fits a gene tree into the universal species tree and finds historical duplications, speciations and losses event”
Compara
“The Ensembl Compara multi-species database stores the results of genome-wide species comparisons calculated for each data release. The database includes
Comparative genomics: Whole genome alignments; Synteny regions
Comparative proteomics: Orthologue predictions; Paralogue predictions; Protein family clusters”
Phenotype ontology
Developing a controlled vocabulary for the description of phenotypes.
Will allow high-level and more detailed descriptions to be contained in a hierarchical, browsable structure.
Fine-grained enough to distinguish between specific experimental definitions of a phenotype if required.
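To make the idea of a hierarchical, browsable vocabulary concrete, here is a minimal Python sketch. It is an illustration only: the term names, the gene names and the parent links are all hypothetical, and nothing here reflects the actual WormBase phenotype ontology or its storage format. It shows how an annotation made with a detailed term can still be found when browsing from a high-level term.

```python
# Hypothetical phenotype terms, each mapped to its parent terms.
# These names are placeholders, not real ontology terms.
PARENTS = {
    "phenotype": [],
    "lethal": ["phenotype"],
    "embryonic lethal": ["lethal"],
    "embryonic lethal, early arrest": ["embryonic lethal"],
    "egg laying variant": ["phenotype"],
}


def ancestors(term, parents=PARENTS):
    """Return every term above `term` in the hierarchy."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def annotated_under(term, annotations, parents=PARENTS):
    """List genes annotated with `term` or with any more specific term."""
    return sorted(
        gene
        for gene, annotated_term in annotations.items()
        if annotated_term == term or term in ancestors(annotated_term, parents)
    )


# Hypothetical annotations: gene name -> most specific phenotype term.
ANNOTATIONS = {
    "gene-a": "embryonic lethal, early arrest",
    "gene-b": "egg laying variant",
    "gene-c": "lethal",
}

lethal_genes = annotated_under("lethal", ANNOTATIONS)
```

Querying the high-level term "lethal" picks up both the gene annotated directly with it and the gene annotated with the finer-grained embryonic term, while leaving the unrelated gene out.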
Fosmids
Sequence report page lists transcripts and microarray assays falling within the span of each fosmid.
Interpolated genetic map position
DNA sequence
Options to expand lists of EST, WABA and BLAST alignments, repeats and RNAi experiments.
Link to order from Vancouver
WormBook
“WormBook is a comprehensive, open-access collection of original, peer-reviewed chapters covering topics related to the biology of Caenorhabditis elegans (C. elegans). WormBook also includes WormMethods, an up-to-date collection of methods and protocols for C. elegans researchers.”
Currently holds 107 chapters.
wormbook.sanger.ac.uk
Useful files you may not know about

best_blastp_hits.WS157.gz -- Best blastP hit for each worm protein
    CE00081,WP:CE24153,4.8e-121,ENSEMBL:ENSP ,4e-07,BP:CBP23671,4e-117,FLYBASE:CG7971-PD,2.3e-06

*oligo_mapping.gz for affy, agilent and gsc chips (3 files)
    Columns: Oligo_set, WBGeneID, Gene_sequence_name, Gene_type, Microarray_type
    cea WBGene AC3.9 CDS GSC at WashU

cdna2orf.WS157.gz
    Columns: cDNA, CDS
    yk1288c01.3,H22K11.1

confirmed_genes.WS157.gz - FASTA format file of CDSs with full transcript evidence

geneIDs.WS157.gz
    Columns: Gene_id, CGC name, Seq name
    WBGene ,abf-1,C50F2.9

pcr_product2gene.WS157.gz
    Columns: pcr_product, Gene_id (cgc_name), Seq name
    sjj_C55H1.2 WBGene (gpa-10),C55H1.2

intergenic_sequences.dna.gz
    Header: >Gene_id_Gene_id, Chromosome, Start coord, length
    >WBGene _WBGene CHROMOSOME_I 16832, len: 687
    atgttggcaggttttttcagtagtttttgagtgaaaatagaggtaaaaagacagaaaatc
    aataaaaaatgaaaacaaaactatgaaaaatggttgaaaatcgagcaaaaatcgttcaaa
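As a rough guide to working with these files, the Python sketch below parses two of the comma-separated layouts shown above: geneIDs (Gene_id, CGC name, Seq name) and best_blastp_hits (a protein ID followed by database:accession and E-value pairs). The function names are made up for this example, the layouts are inferred only from the sample rows on the slide, and the gene ID `WBGene00000000` is a placeholder rather than a real identifier.

```python
import csv
import io


def parse_gene_ids(text):
    """Map each gene ID to its (CGC name, sequence name) pair.

    Assumes three comma-separated fields per line, as in the
    geneIDs sample row on the slide.
    """
    genes = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        gene_id, cgc_name, seq_name = (field.strip() for field in row)
        genes[gene_id] = (cgc_name, seq_name)
    return genes


def parse_best_blastp(line):
    """Split one best_blastp_hits row into the protein ID and its hits.

    Assumes the row is: protein, then repeated (DATABASE:accession, E-value)
    pairs. Returns (protein, {database: (accession, evalue)}).
    """
    fields = line.strip().split(",")
    protein, rest = fields[0], fields[1:]
    hits = {}
    for hit, evalue in zip(rest[0::2], rest[1::2]):
        database, accession = hit.split(":", 1)
        hits[database] = (accession.strip(), float(evalue))
    return protein, hits


# Placeholder gene ID (hypothetical); names taken from the slide's sample row.
genes = parse_gene_ids("WBGene00000000,abf-1,C50F2.9\n")

# Sample row from the slide, with the incomplete ENSEMBL entry left out.
protein, hits = parse_best_blastp(
    "CE00081,WP:CE24153,4.8e-121,BP:CBP23671,4e-117,FLYBASE:CG7971-PD,2.3e-06"
)
```

The files themselves are gzip-compressed, so in practice the text would come from something like `gzip.open(path, "rt")` from the standard library rather than from an inline string.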
Why isn’t data from paper X in WormBase?
List of … and forms where data can be submitted.
User-submitted data is PRIORITISED over normal curation pipelines.
For large or novel data sets contact us ASAP, before publication; confidentiality agreed.
Wellcome Trust Sanger Institute: Paul Davis, Richard Durbin, Michael Han, Anthony Rogers, Mary Ann Tuli, Gary Williams
Cold Spring Harbor Laboratory: Payan Canaran, Jack Chen, Tristan Fiedler, Todd Harris, Sheldon McKay, Will Spooner, Lincoln Stein
California Institute of Technology: Igor Antoshechkin, Carol Bastiani, Juancarlos Chan, Wen Chen, Ranjana Kishore, Raymond Lee, Hans-Michael Mueller, Cecilia Nakamura, Andrei Petcherski, Gary Schindelman, Erich Schwarz, Paul Sternberg, Kimberly Van Auken, Daniel Wang
Washington University at St. Louis: Tamberlyn Bieri, Darin Blasiar, Phil Ozersky, John Spieth