BASys: A Web Server for Automated Bacterial Annotation
Lab Presentation, Savita Shrivastava, Feb 25th, 2005

BASys: Introduction
- A web server for automated, in-depth annotation of bacterial genomic sequences
- BASys uses more than 30 programs to determine ~60 annotation subfields for each gene
- BASys also generates colorful, clickable and fully zoomable maps of each query chromosome
- Annotations and maps can be generated in ~24 hours for an average bacterial chromosome (5 Mb) or 3000 genes
- BASys annotations may be viewed or downloaded anonymously or through a password-protected access system
- The BASys server and databases can also be downloaded and run locally
- BASys is available at: http://wishart.biology.ualberta.ca/basys

Automated genome annotation: why?
- Complete published genomes
  - 21 Archaeal
  - 205 Bacterial
  - 32 Eukaryal
- Ongoing projects
  - 655 Prokaryotic genomes
  - 474 Eukaryotic genomes

Challenges
- The sheer volume of data
- The heterogeneous and growing types of annotations
- The time sensitivity of searches
- Computing power can be expensive
- The need to present the information in an integrated, graphical fashion

Existing automated genome annotation systems
- GeneQuiz (from protein sequence to biochemical function, using a variety of search and analysis methods)
- PEDANT (focuses on protein-based annotation as well as many DNA-based analyses)
- Genotator (gene prediction, and searches for homologs, promoters, splice sites and ORFs)
- MAGPIE and Bluejay (gene description, gene taxonomic information, similarity searches, metabolic pathways, GO, etc.)
- GAIA (structural annotation)
- TIGR CMR (gene and protein names, GO, molecular weight, pI and taxonomic information about the organism)

BASys in detail
- Data submission and scheduling
- The BASys annotation engine
- The BASys report generator

Data submission and scheduling
A front-end web interface for:
- Submitting the raw genomic data
- Scheduling the annotations
- Monitoring and reporting the annotation progress

Submitting the raw genomic data
- Anonymous access
  - The user is emailed a secure URL for monitoring the progress of their annotations and retrieving the results
  - For single-chromosome submissions
- Login-based access
  - Register with BASys
  - Password-protected
  - Allows users to submit and monitor multiple chromosome and plasmid annotations
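
For anonymous submissions the results are reached only through the emailed URL, so that URL has to be hard to guess. A minimal Python sketch of minting such a link, assuming a hypothetical `BASE_URL` and a token-in-path scheme (the slides do not describe how BASys builds its URLs):

```python
import secrets

# Hypothetical base URL; the actual BASys URL scheme is not described in the slides.
BASE_URL = "https://basys.example.org/results"

def make_secure_url(nbytes: int = 32) -> tuple[str, str]:
    """Return (token, url); the token is a random, URL-safe secret."""
    # secrets.token_urlsafe draws from the OS CSPRNG, so tokens are not guessable.
    token = secrets.token_urlsafe(nbytes)
    return token, f"{BASE_URL}/{token}"

token, url = make_secure_url()
```

The server would store the token with the job record, so only the holder of the emailed link can view or download that job's annotations.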

Submitting the raw genomic data
BASys provides a web-based form for submitting:
- Chromosome data as a FASTA-formatted file
- Chromosome topology (circular or linear)
- Gram stain subtype
- Chromosome identifier
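
The submission form above could be validated server-side along these lines. This is a sketch only: the `Submission` fields mirror the form, but the accepted Gram stain values (`positive`/`negative`), the one-record-per-file rule and the allowed base alphabet are assumptions, not documented BASys behaviour.

```python
from dataclasses import dataclass

VALID_TOPOLOGIES = {"circular", "linear"}
VALID_GRAM_STAINS = {"positive", "negative"}  # hypothetical encoding of the form field
VALID_BASES = set("ACGTN")                    # assumption: other IUPAC codes are rejected

@dataclass
class Submission:
    chromosome_id: str
    topology: str
    gram_stain: str
    sequence: str

def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record name: sequence}; the name is the first word after '>'."""
    records: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = (line[1:].split() or [""])[0]
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            records[current].append(line.upper())
    return {name: "".join(chunks) for name, chunks in records.items()}

def build_submission(fasta_text: str, chromosome_id: str,
                     topology: str, gram_stain: str) -> Submission:
    """Validate the form fields and return a Submission, or raise ValueError."""
    records = parse_fasta(fasta_text)
    if len(records) != 1:
        raise ValueError("expected exactly one chromosome record")
    if topology not in VALID_TOPOLOGIES:
        raise ValueError(f"unknown topology: {topology!r}")
    if gram_stain not in VALID_GRAM_STAINS:
        raise ValueError(f"unknown Gram stain: {gram_stain!r}")
    (sequence,) = records.values()
    invalid = set(sequence) - VALID_BASES
    if invalid:
        raise ValueError(f"invalid characters in sequence: {sorted(invalid)}")
    return Submission(chromosome_id, topology, gram_stain, sequence)

sub = build_submission(">chr1 test genome\nacgtac\nGGTN\n", "chr1", "circular", "negative")
```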

Submitting the raw genomic data
- Genes are predicted using "Glimmer", a popular gene prediction program
- If gene positions are already known, they can be supplied to BASys in a simple TAB-delimited format or as an NCBI ".ffn"-formatted FASTA file
  - An ".ffn" file includes the nucleotide coding sequences along with their location and direction on the chromosome
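
Supplying known gene positions amounts to giving each gene a location and a direction; extracting its coding sequence then means reverse-complementing genes on the reverse strand. A sketch assuming a hypothetical four-column TAB-delimited layout (gene ID, start, end, strand) with 1-based inclusive coordinates; the column layout BASys actually accepts is not given in the slides:

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

@dataclass
class Gene:
    gene_id: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # 1-based, inclusive; start <= end after parsing
    strand: str  # "+" forward, "-" reverse

def parse_gene_table(text: str) -> list[Gene]:
    """Parse lines of: gene_id <TAB> start <TAB> end <TAB> strand (hypothetical layout)."""
    genes = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        gene_id, start, end, strand = line.split("\t")
        if strand not in ("+", "-"):
            raise ValueError(f"bad strand {strand!r} for {gene_id}")
        lo, hi = sorted((int(start), int(end)))  # accept coordinates in either order
        genes.append(Gene(gene_id, lo, hi, strand))
    return genes

def coding_sequence(chromosome: str, gene: Gene) -> str:
    """Return the gene's nucleotide sequence in its reading direction."""
    seq = chromosome[gene.start - 1 : gene.end]
    if gene.strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    return seq

chromosome = "ATGAAACCCTTTGGG"
genes = parse_gene_table("g1\t1\t6\t+\ng2\t15\t10\t-\n")
```

Genes that span the origin of a circular chromosome would need extra handling, which this sketch omits.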

Overview
- BASys is a distributed system that operates in a clustered computing environment
- It accommodates multiple users simultaneously performing long-running, resource-intensive genome annotations
[Figure: BASys system architecture. Labels: user genome data; master node; slave nodes running the BASys annotation engine (similarity search: SwissProt, CCDB, reference DB of model organisms; sequence analysis: Pfam, PROSITE, predictSPTM, etc.; structure analysis: PDB, Homodeller, VADAR); email]
- Master node
  - Hosts the web server and runs the queuing and scheduling system
  - Can also issue directives to suspend, resume, restart, and remove genome annotation jobs on the slave nodes
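
The master node's control over jobs on the slave nodes can be pictured as a small state machine. The four directives come from the slide; the state names, the `dispatch`/`complete` events and the transition table are hypothetical:

```python
from enum import Enum

class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUSPENDED = "suspended"
    FINISHED = "finished"
    REMOVED = "removed"

# (directive, current state) -> next state. "dispatch" and "complete" are
# hypothetical internal events; the other four are the directives on the slide.
TRANSITIONS = {
    ("dispatch", JobState.QUEUED): JobState.RUNNING,
    ("complete", JobState.RUNNING): JobState.FINISHED,
    ("suspend", JobState.RUNNING): JobState.SUSPENDED,
    ("resume", JobState.SUSPENDED): JobState.RUNNING,
    ("restart", JobState.RUNNING): JobState.QUEUED,
    ("restart", JobState.SUSPENDED): JobState.QUEUED,
    ("restart", JobState.FINISHED): JobState.QUEUED,
}
# Any job that has not already been removed can be removed.
TRANSITIONS.update({("remove", s): JobState.REMOVED for s in JobState if s is not JobState.REMOVED})

class Job:
    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.state = JobState.QUEUED

    def apply(self, directive: str) -> JobState:
        """Apply a directive from the master node, rejecting invalid transitions."""
        try:
            self.state = TRANSITIONS[(directive, self.state)]
        except KeyError:
            raise ValueError(
                f"cannot {directive} job {self.job_id} in state {self.state.value}"
            ) from None
        return self.state
```

Keeping the transitions in a table makes illegal requests (for example resuming a job that was never suspended) fail loudly instead of silently corrupting the job's state.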

Monitoring and reporting the annotation progress
- Each slave node continually reports its progress to the master node while generating the annotations and reports
- When the annotation job completes, the submitter is notified by email that the annotations are ready
- The MySQL client/server protocol is used to communicate directives and status
- The Apache web server (HTTP protocol) is used to transfer the sequence data and reports
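
The status channel can be sketched as a shared table that slaves update and the master polls. Here Python's built-in `sqlite3` stands in for the MySQL server mentioned above, and the `job_status` schema and `send_email` callback are hypothetical; in a real deployment the master and slaves would be separate processes on different machines:

```python
import sqlite3

def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE job_status ("
        " job_id TEXT PRIMARY KEY,"
        " email TEXT NOT NULL,"
        " percent INTEGER NOT NULL DEFAULT 0,"
        " notified INTEGER NOT NULL DEFAULT 0)"
    )

def submit(conn: sqlite3.Connection, job_id: str, email: str) -> None:
    conn.execute("INSERT INTO job_status (job_id, email) VALUES (?, ?)", (job_id, email))

def report_progress(conn: sqlite3.Connection, job_id: str, percent: int) -> None:
    """Slave side: record how far the annotation job has progressed."""
    conn.execute("UPDATE job_status SET percent = ? WHERE job_id = ?", (percent, job_id))
    conn.commit()

def notify_finished(conn: sqlite3.Connection, send_email) -> list[str]:
    """Master side: email each finished job's submitter exactly once."""
    rows = conn.execute(
        "SELECT job_id, email FROM job_status WHERE percent >= 100 AND notified = 0"
    ).fetchall()
    for job_id, email in rows:
        send_email(email, f"Annotations for {job_id} are ready")
        conn.execute("UPDATE job_status SET notified = 1 WHERE job_id = ?", (job_id,))
    conn.commit()
    return [job_id for job_id, _ in rows]

conn = sqlite3.connect(":memory:")
init_db(conn)
submit(conn, "job1", "user@example.org")
submit(conn, "job2", "other@example.org")
report_progress(conn, "job1", 100)
report_progress(conn, "job2", 40)
sent = []
finished = notify_finished(conn, lambda to, body: sent.append((to, body)))
```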

BASys in detail: The BASys annotation engine

BASys annotation engine
- Function prediction
- Comparative annotations
- Structural annotations
- Secondary structure analysis
- Metabolic annotations
- General properties prediction

BASys annotation pipeline
[Figure: flowchart of the BASys annotation pipeline; recoverable steps and labels below]
- Genomic sequence data → gene identification → translation → proteomic sequence data
- BLAST against SwissProt and CCDB for exact homologs (E-value e-10) → annotation parser → annotations + features
  - No hit: BLAST against the nr database for protein function prediction
  - No hit: hypothetical protein
- Annotations from other sources: Pfam, predictSPTM, PROSITE, PSORTB
- Homologues and paralogues
- Structural analysis: BLAST against the PDB database, Homodeller, VADAR, PsiPred
  - Modification of the secondary structure if transmembrane regions are present
  - Structure class
- General properties, operon structure, preceding and following gene
- TargetDB status and availability
- KEGG (metabolic information), COG information
- Annotation collection → output in CCDB format, evidence cards and HTML format
- Check for missing annotations
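
The first branch of the pipeline (exact homolog, then nr, then "Hypothetical protein") is a fall-through cascade. A sketch over precomputed BLAST hits, assuming the "e-10" label means an E-value cut-off of 1e-10 for exact homologs; the nr cut-off (`NR_EVALUE`) is a hypothetical placeholder, since the diagram gives none:

```python
from typing import NamedTuple, Optional

EXACT_HOMOLOG_EVALUE = 1e-10  # our reading of the "e-10" label in the diagram
NR_EVALUE = 1e-3              # hypothetical cut-off; none is given for nr

class Hit(NamedTuple):
    database: str     # "SwissProt", "CCDB" or "nr"
    subject: str
    description: str
    evalue: float

def best_hit(hits: list[Hit], databases: set[str], max_evalue: float) -> Optional[Hit]:
    """Lowest-E-value hit from the given databases that passes the cut-off."""
    passing = [h for h in hits if h.database in databases and h.evalue <= max_evalue]
    return min(passing, key=lambda h: h.evalue, default=None)

def assign_function(hits: list[Hit]) -> tuple[str, Optional[Hit]]:
    # 1. Exact homolog in SwissProt/CCDB: its annotations go to the parser.
    hit = best_hit(hits, {"SwissProt", "CCDB"}, EXACT_HOMOLOG_EVALUE)
    if hit is not None:
        return "exact homolog", hit
    # 2. Otherwise use nr for a function prediction.
    hit = best_hit(hits, {"nr"}, NR_EVALUE)
    if hit is not None:
        return "nr homolog", hit
    # 3. No usable hit anywhere.
    return "Hypothetical protein", None
```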

Annotations from multiple sources
Example: Subcellular location
Sources: SwissProt, CCDB, PSORTB
- If the gene ontology (GO) terms are associated with hydrolase, nuclease, endonuclease or ribonuclease activity, or with nucleic acid or RNA binding, the subcellular location is "Cytoplasmic"
- If the protein name relates to transcriptional activity, the subcellular location is "Cytoplasmic"
- If transmembrane regions are present, the subcellular location is "Membrane"
- If none of the above applies, the subcellular location is assigned as "Cytoplasmic"
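
The subcellular-location rules translate directly into a small function. This sketch applies them in the order listed, which is our assumption about precedence, and covers only the rules, not the SwissProt, CCDB or PSORTB lookups themselves; plain keyword matching is a simplification:

```python
CYTOPLASMIC_GO_KEYWORDS = (
    "hydrolase", "nuclease", "endonuclease", "ribonuclease",
    "nucleic acid binding", "rna binding",
)

def subcellular_location(go_terms: list[str], protein_name: str, has_tm_regions: bool) -> str:
    go_text = " ".join(go_terms).lower()
    # Rule 1: GO terms suggesting nucleic-acid processing or binding.
    if any(keyword in go_text for keyword in CYTOPLASMIC_GO_KEYWORDS):
        return "Cytoplasmic"
    # Rule 2: transcription-related protein names.
    if "transcription" in protein_name.lower():
        return "Cytoplasmic"
    # Rule 3: predicted transmembrane regions.
    if has_tm_regions:
        return "Membrane"
    # Rule 4: default.
    return "Cytoplasmic"
```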

Annotations from multiple sources
Example: Enzyme Classification (EC) number and its related fields
Sources: SwissProt, CCDB, KEGG database
Metabolic information from CCDB is transferred:
- When the EC number from SwissProt/KEGG matches the EC number from CCDB, or
- If no EC number is available from SwissProt/KEGG
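
The EC-number transfer condition is a two-case rule. A sketch, with a hypothetical `normalize_ec` helper so that "EC 2.7.1.1" and "2.7.1.1" compare equal:

```python
from typing import Optional

def normalize_ec(ec: str) -> str:
    """Strip an optional 'EC' prefix and surrounding punctuation/whitespace (hypothetical helper)."""
    ec = ec.strip()
    if ec.upper().startswith("EC"):
        ec = ec[2:].lstrip(": ").strip()
    return ec

def should_transfer_ccdb_metabolic_info(ec_swissprot_kegg: Optional[str],
                                        ec_ccdb: Optional[str]) -> bool:
    # Case 2: no EC number from SwissProt/KEGG, so CCDB's information is used.
    if ec_swissprot_kegg is None:
        return True
    # Case 1: both sources give an EC number and they agree.
    return ec_ccdb is not None and normalize_ec(ec_swissprot_kegg) == normalize_ec(ec_ccdb)
```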

Annotation parsing
- CCDB format (annotations)
- Text format (annotations and evidence)
- HTML format (annotation table)

CCDB format
- Clean view
- Annotations are marked with:
  - [S] if an exact match to SwissProt
  - [H] if homologous to a SwissProt entry
  - [C] if homologous to a CCDB entry
- Annotations are linked to online sources, e.g. Pfam, PROSITE and InterPro accession numbers, GI numbers of homologues, etc.
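
The three provenance markers can be modelled as an enum. In this sketch the marker is appended after the annotation value; that placement, and the helper names, are assumptions:

```python
import re
from enum import Enum
from typing import Optional

class Provenance(Enum):
    SWISSPROT_EXACT = "S"    # exact match to SwissProt
    SWISSPROT_HOMOLOG = "H"  # homology to a SwissProt entry
    CCDB_HOMOLOG = "C"       # homology to a CCDB entry

TAG_RE = re.compile(r"^(.*?)\s*\[([SHC])\]$")

def tag_annotation(value: str, provenance: Provenance) -> str:
    """Append the provenance marker (assumed placement: after the value)."""
    return f"{value} [{provenance.value}]"

def split_tag(text: str) -> tuple[str, Optional[Provenance]]:
    """Inverse of tag_annotation; untagged text returns (text, None)."""
    match = TAG_RE.match(text)
    if match is None:
        return text, None
    return match.group(1), Provenance(match.group(2))
```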

Text format
Provides evidence:
- Source of the annotation, e.g. database name and version
- Evidence used to support the annotation, e.g. the BLAST report for a similarity search
- A quality indicator such as "marginal", "strong" or "clear"
- Time at which the annotations were generated
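
An evidence record bundles the four items listed above. A sketch of such a record and its plain-text rendering; the field names, layout and the SwissProt version string in the example are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class Quality(Enum):
    MARGINAL = "marginal"
    STRONG = "strong"
    CLEAR = "clear"

@dataclass
class Evidence:
    annotation: str      # annotation field name (hypothetical)
    value: str
    source_db: str       # database name
    source_version: str  # database version
    support: str         # e.g. the BLAST report behind a similarity-based annotation
    quality: Quality
    generated_at: datetime

    def to_text(self) -> str:
        """Render the evidence as plain text, one item per line."""
        return "\n".join([
            f"Annotation: {self.annotation} = {self.value}",
            f"Source: {self.source_db} (version {self.source_version})",
            f"Quality: {self.quality.value}",
            f"Generated: {self.generated_at.isoformat(timespec='seconds')}",
            "Supporting evidence:",
            self.support,
        ])

ev = Evidence("Protein function", "DNA-binding transcriptional regulator",
              "SwissProt", "45.0", "<BLAST report excerpt>", Quality.STRONG,
              datetime(2005, 2, 25, 12, 0, tzinfo=timezone.utc))
text = ev.to_text()
```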

Table format
- For a quick view of the annotations
- Shows the start and end positions and direction of the gene, accession number, gene name, COG ID and protein function
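
The annotation table is a straightforward HTML rendering of these columns. A minimal sketch, assuming each gene arrives as a dictionary with hypothetical keys; values are HTML-escaped and missing fields become empty cells:

```python
from html import escape

# (dictionary key, column header) pairs for the columns listed on the slide.
COLUMNS = [
    ("start", "Start"), ("end", "End"), ("direction", "Direction"),
    ("accession", "Accession"), ("gene_name", "Gene name"),
    ("cog_id", "COG ID"), ("function", "Protein function"),
]

def annotation_table_html(genes: list[dict]) -> str:
    """Render one HTML table row per gene."""
    header = "".join(f"<th>{label}</th>" for _, label in COLUMNS)
    rows = [
        "<tr>" + "".join(f"<td>{escape(str(g.get(key, '')))}</td>" for key, _ in COLUMNS) + "</tr>"
        for g in genes
    ]
    return "\n".join(["<table>", f"<tr>{header}</tr>", *rows, "</table>"])

# Hypothetical example gene with no COG assignment.
html_text = annotation_table_html([
    {"start": 190, "end": 255, "direction": "+", "accession": "ACC0001",
     "gene_name": "geneA", "function": "leader <peptide>"},
])
```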

BASys annotation pipeline
- Each analysis program is written in object-oriented Perl and also uses the BioPerl library
- The annotation API is fully compatible with the BioPerl project
- The BASys system currently contains nearly 54 Perl modules and many small scripts, with more than 60,000 lines of fully object-oriented code defining classes
- We aimed to write fully documented code
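
BASys itself is object-oriented Perl built on BioPerl; the general pattern of interchangeable analysis modules sharing a common interface can be illustrated in Python as below. This is an analogue, not the BASys API: the class and method names are hypothetical, and the motif module hard-codes PROSITE's N-glycosylation pattern N-{P}-[ST]-{P} as a regular expression rather than calling ps_scan:

```python
import re
from abc import ABC, abstractmethod

class AnalysisModule(ABC):
    """Common interface for one analysis step (hypothetical Python analogue)."""
    name = "base"

    @abstractmethod
    def annotate(self, sequence: str) -> dict[str, str]:
        """Return annotation fields for one protein sequence."""

class LengthModule(AnalysisModule):
    name = "length"

    def annotate(self, sequence: str) -> dict[str, str]:
        return {"residues": str(len(sequence))}

class MotifModule(AnalysisModule):
    name = "n_glyc"
    # Lookahead so that overlapping occurrences are all reported.
    PATTERN = re.compile(r"(?=N[^P][ST][^P])")

    def annotate(self, sequence: str) -> dict[str, str]:
        positions = [m.start() + 1 for m in self.PATTERN.finditer(sequence)]  # 1-based
        return {"n_glycosylation_sites": ",".join(map(str, positions))}

def run_pipeline(modules: list[AnalysisModule], sequence: str) -> dict[str, str]:
    """Run every module and namespace its fields by module name."""
    annotations = {}
    for module in modules:
        for key, value in module.annotate(sequence).items():
            annotations[f"{module.name}.{key}"] = value
    return annotations
```

New analyses plug in by subclassing, without changes to `run_pipeline`.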

BASys annotation pipeline
- ~8 external tools to analyze the data
  - Glimmer, HMMER, BLAST, Homodeller, VADAR, predictSPTM, ps_scan, etc.
- ~20 databases as sources of annotation
  - SwissProt, CCDB, nr, COG, KEGG, PROSITE, reference database of model organisms, PDB, PSORTB, TargetDB, Gene Ontology, etc.

BASys and BacMap
- The BASys annotation engine is used in BacMap to generate annotations of bacterial genomes
- It has successfully annotated 200 bacterial and archaeal genomes from NCBI

BASys in detail: The BASys report generator

- A navigable circular genome map is automatically generated once the annotations are done, for genome visualization and exploration
- BASys uses the CGView application to produce the navigable circular genome map
- BASys passes the annotations to CGView as an XML document
- CGView then renders this information as a series of hyperlinked PNG image files
- The map shows the annotated genes and their COG category classification
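
The XML handed to CGView could be produced as below. The element and attribute names (`cgview`, `sequenceLength`, `backboneRadius`, `featureSlot`, `feature`, `featureRange`, `decoration`, `hyperlink`) follow our understanding of CGView's XML input format and should be checked against the CGView documentation; the COG colour map and gene data are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical colours for two COG category letters; everything else is grey.
COG_COLOURS = {"J": "rgb(255,0,0)", "K": "rgb(0,0,255)"}
DEFAULT_COLOUR = "rgb(128,128,128)"

def cgview_xml(sequence_length: int, genes: list[dict]) -> str:
    root = ET.Element("cgview", sequenceLength=str(sequence_length), backboneRadius="160")
    # One feature slot per strand, as on a typical circular genome map.
    slots = {
        "+": ET.SubElement(root, "featureSlot", strand="direct"),
        "-": ET.SubElement(root, "featureSlot", strand="reverse"),
    }
    for g in genes:
        feature = ET.SubElement(
            slots[g["strand"]], "feature",
            color=COG_COLOURS.get(g["cog"], DEFAULT_COLOUR),
            decoration="clockwise" if g["strand"] == "+" else "counterclockwise",
            label=g["name"],
            hyperlink=g["url"],  # e.g. the gene's card, so the rendered map is clickable
        )
        ET.SubElement(feature, "featureRange", start=str(g["start"]), stop=str(g["end"]))
    return ET.tostring(root, encoding="unicode")

xml_text = cgview_xml(1000, [
    {"name": "geneA", "start": 10, "end": 300, "strand": "+", "cog": "J", "url": "geneA.html"},
    {"name": "geneB", "start": 400, "end": 900, "strand": "-", "cog": "S", "url": "geneB.html"},
])
```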

The BASys report generator
- Each identified gene is displayed and labeled on the map
- Each gene is hyperlinked to a gene card containing the annotations for that gene
- Each gene card contains hyperlinks to an evidence card, with a more detailed description of the source and quality of the annotation, and to an annotation table with brief annotations

Future work
- BLAST and text searching
- Manual annotation
- TIGRFAMs
- BLOCKS
- PRINTS

Publications
- G. H. Van Domselaar, P. Stothard, S. Shrivastava, J. Cruz, A. Guo, X. Dong, P. Lu, D. Szafron, R. Greiner, and D. S. Wishart (2005) BASys: a web server for automated bacterial genome annotation. Nucleic Acids Research (accepted).
- P. Stothard, G. Van Domselaar, S. Shrivastava, A. Guo, B. O'Neill, J. Cruz, M. Ellison, and D. S. Wishart (2005) BacMap: an interactive picture atlas of annotated bacterial genomes. Nucleic Acids Research 33: D317-D320.

Acknowledgements
- Prof. David Wishart
- Dr. Gary Van Domselaar
- Dr. Paul Stothard
- Anchi Guo
- Joseph Cruz
- Xiaoli Dong
- Nelson Young
- All the lab members and Dr. Warren Gallin