BASys: A Web Server for Automated Bacterial Genome Annotation

Gary Van Domselaar †, Paul Stothard, Savita Shrivastava, Joseph A. Cruz, AnChi Guo, Xiaoli Dong, Paul Lu, Duane Szafron, Russ Greiner, and David S. Wishart ‡
Departments of Computing Science and Biological Sciences, University of Alberta, Edmonton AB T6E 2E9

Abstract
BASys (Bacterial Annotation System) is a web server that supports automated, in-depth annotation of bacterial genomic (chromosomal, plasmid, and contig) sequences. It accepts raw DNA sequence data and an optional list of gene identification information and provides extensive textual and hyperlinked image output. BASys uses more than 30 programs to determine nearly 60 annotation subfields for each gene, including gene/protein name, GO function, COG function, possible paralogues and orthologues, molecular weight, isoelectric point, operon structure, subcellular localization, signal peptides, transmembrane regions, secondary structure, 3-D structure, reactions, and pathways. The depth and detail of a BASys annotation match or exceed those found in a standard SwissProt entry. BASys also generates colourful, clickable, and fully zoomable maps of each query chromosome to permit rapid navigation and detailed visual analysis of all resulting gene annotations. The textual annotations and images provided by BASys can be generated in approximately 24 hours for an average bacterial chromosome (5 megabases). BASys annotations may be viewed and downloaded anonymously or through a password-protected access system. The BASys server and databases can also be downloaded and run locally. BASys is accessible at:

[Figure: system diagram. Labels: Genomic Sequence Data; (Optional) Gene Identification Data; Head Node; Reference DB Similarity Search (SWISS-PROT, CCDB); Model Organism Similarity Search (E. coli, D. melanogaster, H. sapiens, C. elegans, S. cerevisiae); Compute Node; Metabolite Analysis (KEGG); Sequence Analysis (Pfam, PROSITE, PredictSPTM, etc.)]

Data Submission
BASys supplies a web form for uploading chromosome, plasmid, or contig sequence data. Optional gene identification data can be provided, or BASys can predict protein-coding regions from the genomic data using Glimmer [1].

Data Scheduling
BASys is implemented as a distributed system. The head node monitors and manages job scheduling. Annotation and report generation are carried out by the compute nodes.

Annotation Reports
BASys uses CGView [3] to generate clickable genome maps for navigating the genome data. An HTML-formatted tabular summary is also provided. The genome maps are pre-rendered as a series of hyperlinked PNG image files. Each gene label is hyperlinked to its corresponding HTML-formatted “gene card”. The card is hyperlinked, where applicable, to external references. Text-only gene cards are also provided. BASys also supplies an “evidence card” describing how each annotation was generated. The gene cards, evidence cards, and graphical genome maps are downloadable for offline viewing.

References
1. Delcher AL et al. (1999) Nucleic Acids Res. 27:
2. Iliopoulos I et al. (2003) Bioinformatics 19:
3. Stothard P and Wishart DS (2005) Bioinformatics 21:

[Figure: diagram labels: Report Generation (CGView); Annotation Reports]

Search Capability
BASys supports online keyword searches and sequence similarity searches. Search results contain hyperlinks to their gene cards and graphical genome maps.

BASys Annotation Pipeline
The BASys annotation engine combines database comparison and computational sequence analysis in its annotation pipeline. Translated coding sequences are first compared, using BLAST, to two expertly annotated reference databases: UniProt and the CyberCell comprehensive molecular database on Escherichia coli. The similarity score between the query and the database sequence is compared to the threshold value for each annotation type, and qualifying annotations are transitively applied to the query sequence.

BASys attempts to fill the remaining annotations with additional similarity searches and sequence analyses. BLAST searches are conducted against the protein sequences of C. elegans, human, yeast, and Drosophila; a non-redundant database of bacterial protein sequences; and the PDB, KEGG, and COG databases. Various sequence analyses are also performed, including Pfam and PROSITE searches, signal peptide and transmembrane domain predictions, and secondary structure prediction with PSIPRED. If the sequence has sufficient similarity to a sequence represented in the PDB, BASys may use HOMODELLER to generate a homology model and subsequently perform a structural analysis using VADAR.

Several additional annotations, such as protein molecular weight, isoelectric point, and operon structure, are calculated directly from the chromosomal, protein-coding nucleotide, and translated protein sequence data. In all, a collection of nearly 60 distinct annotations is generated for each gene.
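
The threshold-based transfer step in the pipeline can be sketched roughly as follows. This is a minimal illustration, not the BASys implementation: the poster does not say which similarity score is used or what the cutoffs are, so the sketch assumes a BLAST E-value as the score, and the field names, threshold values, and the `Hit` and `transfer_annotations` names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical per-annotation-type E-value cutoffs (illustrative values only;
# the real BASys thresholds are not given in the poster).
THRESHOLDS = {"protein_name": 1e-30, "go_function": 1e-20, "cog_function": 1e-10}


@dataclass
class Hit:
    """One similarity-search hit against a reference database (assumed shape)."""
    subject_id: str
    evalue: float
    annotations: dict  # annotation type -> value recorded on the reference entry


def transfer_annotations(hits, thresholds=THRESHOLDS):
    """Copy each annotation type from the best hit that passes that type's cutoff.

    Returns (transferred, missing): transferred maps annotation type to
    (value, source subject id); missing lists the types still unfilled, which
    a later stage would try to fill with other searches or analyses.
    """
    transferred = {}
    # Visit hits from most to least significant so the best qualifying hit wins.
    for hit in sorted(hits, key=lambda h: h.evalue):
        for field, value in hit.annotations.items():
            cutoff = thresholds.get(field)
            if cutoff is None or field in transferred:
                continue
            if hit.evalue <= cutoff:
                transferred[field] = (value, hit.subject_id)
    missing = [field for field in thresholds if field not in transferred]
    return transferred, missing


# Made-up hits for illustration.
hits = [
    Hit("ref_A", 1e-25, {"protein_name": "example kinase", "go_function": "kinase activity"}),
    Hit("ref_B", 1e-12, {"cog_function": "Signal transduction"}),
]
transferred, missing = transfer_annotations(hits)
```

Here the hit to `ref_A` is strong enough to donate a GO function but not a protein name, so `protein_name` is left in `missing` for the follow-up searches. Keeping the source identifier alongside each value is one way to record the kind of provenance that the evidence cards report.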

Validation
BASys annotations were compared to a set of expertly annotated proteins from C. trachomatis [2]. BASys annotations agreed with the expert annotations 762 times out of 894. The sensitivity is 94%; the specificity is 73%.
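
For reference, the agreement figure above can be turned into a rate, and the two other metrics follow the usual definitions. The poster does not give the underlying confusion-matrix counts or say exactly how sensitivity and specificity were computed, so the functions below use the standard textbook formulas as an assumption, and no attempt is made to reconstruct the 94% and 73% values.

```python
def agreement_rate(agreements: int, total: int) -> float:
    """Fraction of compared annotations that match the expert annotation."""
    return agreements / total


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Standard definition: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Standard definition: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Counts reported in the Validation section: 762 agreements out of 894.
rate = agreement_rate(762, 894)
print(f"Agreement with expert annotations: {rate:.1%}")
```

The reported counts correspond to an overall agreement of roughly 85%.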

[Figure: Structure Analysis box of the pipeline diagram. Labels: Homodeller; VADAR; PDB]
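
The pipeline description notes that some annotations, such as protein molecular weight, are calculated directly from the translated protein sequence. A common way to do this, shown here as a sketch rather than as the method BASys necessarily uses, is to sum average residue masses and add one water for the free termini. The `molecular_weight` function name is hypothetical.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # added once for the N-terminal H and C-terminal OH


def molecular_weight(protein: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified protein."""
    seq = protein.strip().upper().rstrip("*")  # drop a trailing stop symbol
    if not seq:
        raise ValueError("empty protein sequence")
    unknown = sorted(set(seq) - set(RESIDUE_MASS))
    if unknown:
        raise ValueError(f"unsupported residue codes: {unknown}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


# A single glycine should come out near the mass of free glycine (about 75.07 Da).
glycine_mw = molecular_weight("G")
```

The sketch ignores post-translational modifications, signal peptide cleavage, and ambiguous residue codes, all of which a production annotation system would need to consider.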