
WormBase, A Resource for Nematode Biology
Paul Davis – The Sanger Institute

ABSTRACT

WormBase is a freely available information resource primarily for the nematode Caenorhabditis elegans, which progressively includes data from other related nematodes (C. briggsae, for example). WormBase operates a fortnightly release cycle in which each build is made available on the website. Over the past year we have continued to improve the data resource and to incorporate new data sets, large and small. WormBase encourages users to submit their data and observations by e-mail and through online submission forms as we strive to improve the usability and content of the data resource. WormBase is constantly evolving and has recently introduced Frozen Release Versions to better accommodate the different needs of the scientific community. WormBase works very closely with the CGC (Caenorhabditis Genetics Center) to adhere to the established naming conventions and to improve genetic data. WormBase's user community continues to grow as more data resources are incorporated into the database. To maintain the consistent browsing experience users enjoy, a second mirror site has been established to relieve load on the main site.

WHAT'S NEW?

- 24 new releases last year: WS97 (07 Mar 2003) to WS121 (12 Mar 2004).
- Three 'frozen' releases (WS100/110/120).
- More and more data has been integrated into WormBase. This includes:
  - ~42,000 new C. elegans ESTs
  - ~60,000 new ESTs from other nematodes
  - ORFeome data (Reboul et al. 2003, Nature Genetics)
  - Brigpep (Stein et al. 2003, PLoS Biol.)
  - Anatomy names & terms
  - Antibody
  - Gene regulation data
  - New set of deletion alleles (National Bioresource Project, Japan)
  - Protein 3D structures (NESG - Northeast Structural Genomics Consortium)
- Extra stability: the build process is much more robust, with more testing of data on the development website.
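The release numbering can be made concrete with a small sketch. It assumes only what the poster states: release names have the form WS followed by a number, and frozen releases take place every 10 releases since WS100. The function names and constants are illustrative and are not part of any WormBase tool.

```python
import re
from datetime import date
from typing import List

FIRST_FROZEN = 100      # first frozen release named on the poster (WS100)
FROZEN_INTERVAL = 10    # "every 10 releases since WS100"


def release_number(name: str) -> int:
    """Return the numeric part of a release name such as 'WS97'."""
    match = re.fullmatch(r"WS(\d+)", name)
    if match is None:
        raise ValueError(f"not a release name: {name!r}")
    return int(match.group(1))


def is_frozen(name: str) -> bool:
    """True if the release falls on the every-10th-release schedule."""
    number = release_number(name)
    return number >= FIRST_FROZEN and (number - FIRST_FROZEN) % FROZEN_INTERVAL == 0


def frozen_between(first: str, last: str) -> List[str]:
    """List the frozen releases in the inclusive range first..last."""
    start, stop = release_number(first), release_number(last)
    return [f"WS{n}" for n in range(start, stop + 1) if is_frozen(f"WS{n}")]


def mean_release_interval(first_date: date, last_date: date, new_releases: int) -> float:
    """Average number of days per release over a period."""
    return (last_date - first_date).days / new_releases
```

Calling frozen_between("WS97", "WS121") yields the three frozen releases listed above, and mean_release_interval(date(2003, 3, 7), date(2004, 3, 12), 24) gives an average of roughly 15 days per release, consistent with the fortnightly cycle.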
[Figure: Growth of locus names connected to sequence, with points at WS75 (02), WS97 (03) and WS120 (04).]

How Has the Genome Changed

New sequence data is derived from a variety of sources:
- 3rd party clone data
- Repeat assembly data
- Transcript data
- Resolution of N's (St Louis)

Genomic Sequence Errors

Gene predictions may have an incorrect structure compared to available experimental data. Curators check through generated lists of potential problems:
- Introns confirmed by transcript data but not in a prediction
- Small introns
- Transcript data matching introns

WormBase Users

WormBase users send in structure corrections, observations, reports of falsely assigned pseudogenes and family structure studies, some of which help to identify sequencing errors.

Data Increase since last year:

Needs of Different Users

Within the worm community there are different needs from the sequenced genome:
- Researchers requiring the latest, accurate sets of gene predictions
- Bioinformatics groups wanting stability to perform global analyses

Catering for Different Needs

WormBase 2 week release cycle:
- Good for research groups interested in subsets of genes, as it allows quick turnaround of corrections and data

Introduction of WormBase "Frozen" release versions:
- Take place every 10 releases since WS100
- Hosted on a separate website
- Remain available on the ftp site
- Provide stability and insulation from constant changes
- Allow groups to coordinate research and reference a specific archived version

Contact WormBase:

TRACKING GENES IN WormBase

The Problem:
- No way of tracking gene name changes
- No unique, stable identifier for a 'gene'
- Worm genes first existed as Locus objects, e.g. dpy-1
- Then genes existed as Sequence objects, e.g. F31D4.3
- Some genes exist as both Locus and Sequence objects
- Gene names change frequently

[Figure: Genomic Change. Resolution of repeat misassemblies; Frozen Release; latest sequence update + other sequence corrections.]

Sequence Updates

A list of potential sequencing errors was compiled over the last 2 years.
Archived projects were checked (Sanger) or genomic PCR was conducted over the problem region (GSC). New sequence files were created and incorporated into clone linkage groups, and the database was rebuilt. Resolving the sequencing errors meant that a large number of gene predictions were revised.

Known sequencing errors still exist in problematic clones:
- Incomplete archive: some early clone projects are not available.
- Poor quality, unfinished projects.

Strategy for resolving this issue: genomic PCR of known problems.

WHAT WE HAVE DONE

- New Gene database class to store genes
- New Gene_name class to aid querying
- Devolution of the Sequence class (this required a large number of changes to acedb):
  - RNA sequences transferred to new Transcript class
  - Protein-coding sequences to go in new CDS class
  - Pseudogenes get their own class
- Introduced Gene objects (April):
  - Part 1: replaced use of Locus objects (WS124). WS124/125 will remain in testing.
  - Part 2: replace links to CDS where relevant (WS126). WS126 will be the 1st release to be available on the main site.
- All genes will have a stable ID.

FUTURE

- Transfer Gene IDs to a MySQL database to store details on genes: track merges, splits etc.
- Add history information to genes.
- Located at Sanger, accessible to all in WormBase.
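The idea of a stable gene ID with tracked renames, merges and splits can be illustrated with a minimal in-memory sketch. It is not the WormBase implementation: the poster plans a MySQL database and gives no schema, so the class layout, the ID format and the gene names used with it (xyz-1, ZK0000.1) are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Gene:
    gene_id: str                 # stable identifier, never changed or reused
    names: List[str]             # names attached to the gene (locus and/or sequence names)
    live: bool = True            # False once the gene has been merged into another
    history: List[str] = field(default_factory=list)


class GeneRegistry:
    """Hypothetical stand-in for a gene ID database."""

    def __init__(self) -> None:
        self._genes: Dict[str, Gene] = {}

    def create(self, *names: str) -> Gene:
        # ID format is made up for this sketch; genes are never deleted,
        # so a simple counter keeps IDs unique.
        gene_id = f"Gene{len(self._genes) + 1:08d}"
        gene = Gene(gene_id, list(names), history=[f"created as {', '.join(names)}"])
        self._genes[gene_id] = gene
        return gene

    def rename(self, gene_id: str, old: str, new: str) -> None:
        # The name changes, the ID does not.
        gene = self._genes[gene_id]
        gene.names[gene.names.index(old)] = new
        gene.history.append(f"renamed {old} -> {new}")

    def merge(self, keep_id: str, retire_id: str) -> None:
        # The surviving gene inherits the names; the retired ID stays on record.
        keep, retired = self._genes[keep_id], self._genes[retire_id]
        keep.names.extend(retired.names)
        keep.history.append(f"merged in {retire_id}")
        retired.live = False
        retired.history.append(f"merged into {keep_id}")

    def split(self, gene_id: str, *new_names: str) -> Gene:
        # A split creates a new gene with its own ID and links both histories.
        child = self.create(*new_names)
        child.history.append(f"split from {gene_id}")
        self._genes[gene_id].history.append(f"split off {child.gene_id}")
        return child

    def find(self, name: str) -> List[Gene]:
        # Look up live genes by any of their current names.
        return [g for g in self._genes.values() if g.live and name in g.names]
```

In this sketch a query such as registry.find("xyz-2") keeps returning the same gene ID after xyz-1 has been renamed to xyz-2, and the history list on each gene records what happened to it, which is the kind of tracking the Gene class and the planned gene ID database are meant to provide.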