Anthony Rogers* WormBase Consortium *Wellcome Trust Sanger Institute California Institute of Technology Cold Spring Harbor Laboratory Washington University.

Presentation transcript:

WormBase: Recent and Future Developments
Anthony Rogers*
WormBase Consortium
*Wellcome Trust Sanger Institute
California Institute of Technology
Cold Spring Harbor Laboratory
Washington University at St. Louis

What you told us to do!
User survey in Nov/Dec 2005 had 761 respondents
1) Website navigation and speed
2) Gene structures (see poster P159)
3) Genetic map (see poster P157)
4) Phenotypes
5) Literature search
6) Use of other nematode genomes
7) Community forum / wiki

Web site speed improvements
Extra server hardware
Restructured architecture and load balancing
Pre-caching popular changes (i.e. gene pages)
Another European mirror site! wormbase.sanger.ac.uk


WormBase wiki


* Example 1: List all synonyms for the following genes: bli-1, egl-43, lag-1.
* Example 2: From all genes in C. elegans that have an ortholog in C. briggsae, are located on chromosome III, are sterile in an RNAi screen, and have annotated UTRs, provide a FASTA file containing the peptide sequences.
* Example 3: Download the set of all RNAi experiments that resulted in an Emb phenotype and in which the target genes are classified as serine/threonine kinases.
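Example 2 is a conjunction of filters followed by a sequence export. The sketch below shows that logic on in-memory records. It is only an illustration: the `Gene` record, its field names, and the helper functions are hypothetical and are not part of WormMart or BioMart, which expose these filters through a web interface.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    # Hypothetical record layout, invented for this illustration.
    name: str
    chromosome: str
    has_briggsae_ortholog: bool
    rnai_phenotypes: Tuple[str, ...]
    has_annotated_utrs: bool
    peptide: str


def select_genes(genes: List[Gene]) -> List[Gene]:
    """Keep genes that pass every filter named in Example 2."""
    return [
        g for g in genes
        if g.has_briggsae_ortholog
        and g.chromosome == "III"
        and "sterile" in g.rnai_phenotypes
        and g.has_annotated_utrs
    ]


def to_fasta(genes: List[Gene], width: int = 60) -> str:
    """Format peptide sequences as FASTA, wrapping sequence lines at `width`."""
    lines = []
    for g in genes:
        lines.append(f">{g.name}")
        for start in range(0, len(g.peptide), width):
            lines.append(g.peptide[start:start + width])
    return "".join(line + "\n" for line in lines)
```

Each condition maps to one filter a user would tick in the query form, and the export step corresponds to choosing peptide sequence as the output.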

WormMart

Based on the BioMart software
  Originally developed at EBI/WTSI for Ensembl
  Various deployments: WormBase, UniProt, Gramene

WormMart
  Launched in April 2005
  Replacement for “Batch Genes” and (eventually) “Batch Sequences” pages
  Seven WormBase objects are currently described: “Gene”, “GO_term”, “Expression pattern”, “Phenotype”, “RNAi”, “Variation” and “Paper”
  Development is driven largely by user feedback

[Screenshot: WormMart dataset selection, showing WS140 and WS144 with the Gene, Expression Pattern, Phenotype and RNAi datasets]
Example query: upstream and downstream sequences for all miRNA genes that lie on C. elegans chromosome II

[Screenshot: WormMart filters. Gene type: Coding, miRNA, mRNA, ncRNA, Pseudo (miRNA chosen). Chromosome: I, II, III, IV, X (II chosen)]

[Screenshot: WormMart output options: Features, Structures, Sequences]

Search WormBase on
Search for “egl mutants related to hormones”


Comparative genomics
Which species?
What will we do with them?
When will this happen?

Nematode phylogeny

What we’ll do...
Semi-curated gene set based on various predictors
Protein set
Protein annotation (PFAM, InterPro, tmhmm, signalp)
blastp
blastx
Whole genome alignment *
Ortholog assignment *
* Pretty much the same as we have with C. briggsae

C. briggsae gene page

KOGS / InParanoid
Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Remm M, Storm CE, Sonnhammer EL. J Mol Biol :
The COG database: new developments in phylogenetic classification of proteins from complete genomes. Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV. Nucleic Acids Research :22-28
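Ortholog assignment from pairwise species comparisons usually starts from reciprocal best hits: two proteins are paired when each is the other's top-scoring match. The sketch below shows only that seed step, as a simplification. It is not InParanoid's or KOG's implementation (InParanoid goes on to add in-paralogs around each seed pair), and the function names and the score dictionaries are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pair proteins from species A and B that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (a, b) for a, b in best_ab.items() if best_ba.get(b) == a
    )
```

The scores would typically come from an all-against-all blastp run in each direction, with higher values meaning better matches.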

TreeFam worm genes “TreeFam is a database of phylogenetic trees of animal genes. It fits a gene tree into the universal species tree and finds historical duplications, speciations and losses event”

Compara “The Ensembl Compara multi-species database stores the results of genome-wide species comparisons calculated for each data release. The database includes Comparative genomics: Whole genome alignments Synteny regions Comparative proteomics: Orthologue predictions Paralogue predictions Protein family clusters”


Phenotype ontology
Developing a controlled vocabulary for the description of phenotypes.
Will allow high-level and more detailed descriptions to be contained in a hierarchical, browsable structure.
Fine-grained enough to distinguish between specific experimental definitions of a phenotype if required.
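A hierarchical vocabulary lets a query on a high-level term also return genes annotated with any of its more specific descendants. The sketch below illustrates the idea with a child-to-parents map. The term names, the gene names, and the functions are hypothetical and do not reflect the actual WormBase phenotype ontology.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical vocabulary: each term maps to its direct parent terms.
PARENTS: Dict[str, List[str]] = {
    "phenotype": [],
    "lethal": ["phenotype"],
    "sterile": ["phenotype"],
    "embryonic lethal": ["lethal"],
    "larval lethal": ["lethal"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every term reachable by following parent links upwards."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def genes_annotated_to(
    term: str,
    annotations: Iterable[Tuple[str, str]],
    parents: Dict[str, List[str]],
) -> List[str]:
    """Genes annotated with `term` itself or with any more specific term."""
    return sorted(
        {
            gene
            for gene, annotated in annotations
            if annotated == term or term in ancestors(annotated, parents)
        }
    )
```

Storing parents as a list allows a term to sit under more than one broader term, which a strict tree would not.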

Fosmids
Sequence report page lists Transcripts and Microarray assays falling within the span of each fosmid.
Interpolated genetic map position
DNA sequence
Options to expand lists of EST, waba and blast alignments, Repeats and RNAi expts.
Link to order from Vancouver

WormBook
“WormBook is a comprehensive, open-access collection of original, peer-reviewed chapters covering topics related to the biology of Caenorhabditis elegans (C. elegans). WormBook also includes WormMethods, an up-to-date collection of methods and protocols for C. elegans researchers.”
Currently holds 107 chapters.
wormbook.sanger.ac.uk

Useful files you may not know about

best_blastp_hits.WS157.gz -- Best blastP hit for each worm protein
  CE00081,WP:CE24153,4.8e-121,ENSEMBL:ENSP ,4e-07,BP:CBP23671,4e-117,FLYBASE:CG7971-PD,2.3e-06

*oligo_mapping.gz for affy, agilent and gsc chips (3 files)
  Oligo_set  WBGeneID  Gene_sequence_name  Gene_type  Microarray_type
  cea  WBGene  AC3.9  CDS  GSC at WashU

cdna2orf.WS157.gz
  cDNA  CDS
  yk1288c01.3,H22K11.1

confirmed_genes.WS157.gz - FASTA format file of CDSs with full transcript evidence

geneIDs.WS157.gz
  Gene_id  CGC name  Seq name
  WBGene ,abf-1,C50F2.9

pcr_product2gene.WS157.gz
  pcr_product  Gene_id (cgc_name)  Seq name
  sjj_C55H1.2 WBGene (gpa-10),C55H1.2

intergenic_sequences.dna.gz
  >Gene_id_Gene_id  Chromosome  Start coord  length
  >WBGene _WBGene CHROMOSOME_I 16832, len: 687
  atgttggcaggttttttcagtagtttttgagtgaaaatagaggtaaaaagacagaaaatc
  aataaaaaatgaaaacaaaactatgaaaaatggttgaaaatcgagcaaaaatcgttcaaa
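Files such as geneIDs.WS157.gz can be read directly with a few lines of script. The sketch below builds a lookup from CGC name or sequence name to gene ID. It assumes the file is gzip-compressed, comma-separated text with three fields per line (gene ID, CGC name, sequence name) and no header row, as the example line on the slide suggests. The function name is hypothetical.

```python
import csv
import gzip
from typing import Dict


def load_gene_ids(path: str) -> Dict[str, str]:
    """Map each CGC name and sequence name to its gene ID.

    Assumes comma-separated lines of: gene ID, CGC name, sequence name.
    Lines with fewer than three fields are skipped.
    """
    by_name: Dict[str, str] = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 3:
                continue
            gene_id, cgc_name, seq_name = (field.strip() for field in row[:3])
            for name in (cgc_name, seq_name):
                if name:  # genes without a CGC name have an empty field
                    by_name[name] = gene_id
    return by_name
```

With the file downloaded locally, `load_gene_ids("geneIDs.WS157.gz")["abf-1"]` would return the gene ID listed for abf-1.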

Why isn’t data from paper X in WormBase?
List of s and forms where data can be submitted.
User submitted data is PRIORITISED over normal curation pipelines
For large or novel data sets contact us asap - before publication - confidentiality agreed or

Wellcome Trust Sanger Institute: Paul Davis, Richard Durbin, Michael Han, Anthony Rogers, Mary Ann Tuli, Gary Williams
Cold Spring Harbor Laboratory: Payan Canaran, Jack Chen, Tristan Fiedler, Todd Harris, Sheldon McKay, Will Spooner, Lincoln Stein
California Institute of Technology: Igor Antoshechkin, Carol Bastiani, Juancarlos Chan, Wen Chen, Ranjana Kishore, Raymond Lee, Hans-Michael Mueller, Cecilia Nakamura, Andrei Petcherski, Gary Schindelman, Erich Schwarz, Paul Sternberg, Kimberly Van Auken, Daniel Wang
Washington University at St. Louis: Tamberlyn Bieri, Darin Blasiar, Phil Ozersky, John Spieth