Bacteriophage Gene Functions

Slides:



Advertisements
Similar presentations
Unknown Phage 11 Leah Liston Cluster Identification Unknown 11 is identified with Cluster Q Other Phage in Cluster Q – Long time singleton: Giles – Evanesce.
Advertisements

Bacterial viruses. Very complex shape, requiring 20 gene products for assembly - Capsid (head), contains linear dsDNA genome - Tail, consists of sheath.
The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology.
Psi-BLAST, Prosite, UCSC Genome Browser Lecture 3.
Biology 224 Dr. Tom Peavy Sept 27 & 29 Protein Structure & Analysis.
Bioinformatics and Phylogenetic Analysis
PSI (position-specific iterated) BLAST The NCBI page described PSI blast as follows: “Position-Specific Iterated BLAST (PSI-BLAST) provides an automated,
Psi-Blast: Detecting structural homologs Psi-Blast was designed to detect homology for highly divergent amino acid sequences Psi = position-specific iterated.
Sequence alignment, E-value & Extreme value distribution
Bacteriophage Gene Functions Welkin Pope SEA-PHAGES In Silico Workshop, 2014.
Printing: This poster is 48” wide by 36” high. It’s designed to be printed on a large-format printer. Customizing the Content: The placeholders in this.
Pattern databasesPattern databasesPattern databasesPattern databases Gopalan Vivek.
Bioinformatics for biomedicine Protein domains and 3D structure Lecture 4, Per Kraulis
Wellcome Trust Workshop Working with Pathogen Genomes Module 3 Sequence and Protein Analysis (Using web-based tools)
Databases in Bioinformatics and Systems Biology Carsten O. Daub Omics Science Center RIKEN, Japan May 2008.
Advancing Science with DNA Sequence Data Curation in IMG-ER Natalia Ivanova MGM Workshop May 16, 2012.
NCBI Review Concepts Chuong Huynh. NCBI Pairwise Sequence Alignments Purpose: identification of sequences with significant similarity to (a)
Biology 224 Instructor: Tom Peavy Feb 21 & 26, Protein Structure & Analysis.
Sequence analysis: Macromolecular motif recognition Sylvia Nagl.
Multiple Alignments Motifs/Profiles What is multiple alignment? HOW does one do this? WHY does one do this? What do we mean by a motif or profile? BIO520.
Searching Molecular Databases with BLAST. Basic Local Alignment Search Tool How BLAST works Interpreting search results The NCBI Web BLAST interface Demonstration.
Module 3 Sequence and Protein Analysis (Using web-based tools) Working with Pathogen Genomes - Uruguay 2008.
Functional Annotation of Proteins via the CAFA Challenge Lee Tien Duncan Renfrow-Symon Shilpa Nadimpalli Mengfei Cao COMP150PBT | Fall 2010.
NCBI resources II: web-based tools and ftp resources Yanbin Yin Fall 2014 Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/ 1.
BLOCKS Multiply aligned ungapped segments corresponding to most highly conserved regions of proteins- represented in profile.
Bioinformatics Ayesha M. Khan 9 th April, What’s in a secondary database?  It should be noted that within multiple alignments can be found conserved.
Sackler Medical School
Protein and RNA Families
Basic Local Alignment Search Tool BLAST Why Use BLAST?
Genome annotation and search for homologs. Genome of the week Discuss the diversity and features of selected microbial genomes. Link to the paper describing.
PROTEIN PATTERN DATABASES. PROTEIN SEQUENCES SUPERFAMILY FAMILY DOMAIN MOTIF SITE RESIDUE.
Bacteriophage Gene Functions Welkin Pope SEA-PHAGES Bioinformatics Workshop, 2015.
Exercises Pairwise alignment Homology search (BLAST) Multiple alignment (CLUSTAL W) Iterative Profile Search: Profile Search –Pfam –Prosite –PSI-BLAST.
Bioinformatics Workshops 1 & 2 1. use of public database/search sites - range of data and access methods - interpretation of search results - understanding.
David Wishart February 18th, 2004 Lecture 3 BLAST (c) 2004 CGDN.
Bacillus Phage Annotation Phage Lab II, Spring 2015.
MCB 3421 class 25. student evaluations Please go to husky CT and complete student evaluations !
What is BLAST? Basic BLAST search What is BLAST?
Welcome to the Protein Database Tutorial. This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
Bioinformatics Shared Resource Bioinformatics : How to… Bioinformatics Shared Resource Kutbuddin Doctor, PhD.
Structure of T4 Bacteriophage
BLAST: Basic Local Alignment Search Tool Robert (R.J.) Sperazza BLAST is a software used to analyze genetic information It can identify existing genes.
Sequence: PFAM Used example: Database of protein domain families. It is based on manually curated alignments.
What is BLAST? Basic BLAST search What is BLAST?
Bacterial infection by lytic virus
bacteria and eukaryotes
Bacterial infection by lytic virus
Bio/Chem-informatics
Protein Families, Motifs & Domains.
Basics of BLAST Basic BLAST Search - What is BLAST?
Demo: Protein Information Resource
Sequence based searches:
[Rz/Rz1, LysB/LysC, gp u/v] proteins of Lytic Cassette
SEA-PHAGES Bioinformatics Workshop Overview
Modified from slides from Jim Hu and Suzi Aleksander Spring 2016
Genome Annotation Continued
Genome Center of Wisconsin, UW-Madison
Bioinformatics and BLAST
Predicting Genes in Actinobacteriophages
BLAST.
BLAST.
Identify D. melanogaster ortholog
Basic Local Alignment Search Tool
Basic Local Alignment Search Tool
Sequence alignment, E-value & Extreme value distribution
GOMASHI ANNOTATION GENES 1-13
TF candidate selection pipeline.
Figure 1a. Insertion of sequence into Claudi capsid gene
Welkin Pope SEA-PHAGES Bioinformatics Workshop, 2017
SUBMITTED BY: DEEPTI SHARMA BIOLOGICAL DATABASE AND SEQUENCE ANALYSIS.
Presentation transcript:

Bacteriophage Gene Functions Welkin Pope SEA-PHAGES Bioinformatics Workshop, 2016

Anaya Virion structural and assembly genes, i.e. those encoding proteins that are either components of virion particles or assist in their formation. These include genes encoding the terminase, portal, capsid maturation protease, scaffolding protein, major capsid protein, head to tail connectors, major tail subunit, tail assembly chaperones, tape measure protein, and minor tail proteins. Genes involved in phage DNA replication. These include DNA polymerase, DNA primase, DNA helicase, nucleotide metabolism genes, and ssDNA binding proteins. Genes involved in life cycle regulation. These include various regulators such as repressors and activators, integrases, recombination directionality factors, etc. Genes involved in lysis, including endolysins (referred to as Lysin A in the mycobacteriophages), Lysin B, and Holins. Other genes, including transcription factors, toxin/anti-toxin systems, peptidases, phosphatases, host gene homologues, methylases, nucleases, and DNA binding proteins, among others.

Structural genes

Bob Duda

Functional Assignments with Alignment Tools Databases: Phagesdb nr at GenBank PDB Conserved domain pFam Alignment Tools BLASTp HHpred Browsers Website-phagesdb Website-NCBI Website-HHpred Phamerator DNA Master PECAAN

Databases Phagesdb.org: Phamerator nr at GenBank Contains Actinobacteriophage genes and genomes Maintained and curated by University of Pittsburgh Phamerator Contains phage genomes and gene sequences, gene phamilies Maintained and curated by JMU and Pitt (linked to phagesdb) nr at GenBank Contains all nucleotide and protein sequences Additional db are RefSeq and Conserved Domain Database Curated by overworked GenBank staff (not phage experts) Anyone can submit anything to nr

Databases, cont. Conserved domain database (CDD) Actually five databases of multiple sequence alignments (housed at NCBI): pFam, SMART, COGS, PRK, TIGRFAM pFam (protein families) Multiple sequence alignments of conserved protein sequences (housed at EBI) Protein data bank Consists of crystal structure coordinates. Data is very high quality, however it is limited by what can be crystallized.

Alignment tools BLASTp – pairwise local word-based alignments of protein sequences. You can control your scoring matrix and word size (defaults are usually fine) Sacrifices optimal alignment for speed Scores are influenced by length of the alignment (longer alignments score more highly) and database size E value and % alignment You may need to manually check to see if critical residues are present for enzymatic activity

Conserved Domain Database

HHpred HHPred Probability and % alignment uses a combination of multiple sequence alignments and Profile Hidden Markov Models to find remote homologs: proteins that may be distantly related to your query sequence and share the same function. Uses iterative alignments, somewhat like PSI-BLAST Control number of iterations and databases you want to search Probability and % alignment You may still need to manually check and see if critical conserved residues are present

HHPred

Synteny Phage gene order is *somewhat* conserved Use Phamerator or DNA Master Genome Comparison Tool Terminase  Portal  Capsid Maturation Protease  Scaffolding  Major Capsid Subunit  Major Tail Subunit  Tail Assembly Chaperones  Tape measure  Minor Tail Proteins Lysis (lysins, holins) Integration cassette DNA metabolism/Replication

Additional secondary structure prediction tools Phagesdb.org/links

An advanced exploration RosiePosie gene 5

Assigning Functions and function sources in your Annotation File   First: Phamerator/Phagesdb. Include phagename and gene number, e value, and % alignment Most of these will be supported by HHpred and BLAST on NCBI.  Second: If you find a new function NOT in Phamerator/Phagesdb or in conflict with the Phamerator/Phagesdb assignment, include in your notes: HHpred, Include probability score and approximate % of match length. Or -- BLASTp on NCBI. Include e value, species/phagename, and approximate % of match length.  Finally: Include any other support you would like to. (Run TMHMM on a putative holin, and find two transmembrane domains? Write it down! Find one unlabeled gene between the portal and the major capsid protein? Sounds like a good candidate for the capsid maturation protease, assigned via synteny!)

SEA-PHAGES functional assignments a standardized SEA-PHAGES function list SEA-PHAGES functional assignments USE Do NOT use Example terminase, small subunit TerS Sisi_1 terminase, large subunit TerL Sisi_2 terminase If there are not two obvious large and small terminase genes in the same genome, just assign the function "terminase". TM4_4 portal protein head to tail connector TM4_5 scaffolding protein Scaffold Sisi_5 capsid maturation protease Protease, prohead protease Sisi_4 major capsid protein capsid Sisi_6 head-to-tail connector protein   Sisi_7,8,9,10 major tail protein major tail subunit Sisi_11 tail assembly chaperone Tail scaffolding protein TM4_15; 16 Note: case matters. GenBank wants functions written all lower-case (except when using conventional protein labels derived from genes eg “LacZ”)

Bacteriophage HK97 (gp3) (gp4) (gp5, mcp) Conway, Duda, and Hendrix