Sequence-based Similarity Module (BLAST & CDD only) & Horizontal Gene Transfer Module (Ortholog Neighborhood & GC content only)

Phylogenetic tree of Bacteria
→ Recall: Planctomycetes are one of the GEBA genomes, representing an under-represented phylum within domain Bacteria
GEBA: Genomic Encyclopedia of Bacteria & Archaea
Insert Figure 1 from Handelsman (2004) Microbiol. Mol. Biol. Rev. 68:

Recent phylogenetic analysis using 23S rRNA gene supports the monophyletic grouping and branch order for these four bacterial phyla Insert Figure 4A from Pilhofer et al. (2008) Characterization and Evolution of Cell Division and Cell Wall Synthesis Genes in the Bacterial Phyla Verrucomicrobia, Lentisphaerae, Chlamydiae, and Planctomycetes and Phylogenetic Comparison with rRNA Genes. J Bacteriology 190:

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between two sequences. A Conserved Domain Database (CDD) search finds sequence similarity with genes in Clusters of Orthologous Groups (COGs).

Verifying Function Based on Sequence Conservation
Different types of BLAST searches:
– blastp
– blastn
– blastx
– tblastn
– tblastx
>35% identity to an experimentally characterized protein (especially in conserved regions) can be considered good evidence for function.
E-value → less than is significant → equal to or less than may indicate good match
Be cautious of auto-annotated gene function: GenBank is not a curated database.
Beware of mindless BLAST! Similarity score and E-value do not tell the whole story. You must also consider length of match (query coverage) & biological function (organismal context).
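The checks on this slide can be sketched as a small filter. This is only an illustration: the `BlastHit` class and its field names are hypothetical, the 35% identity cutoff comes from the slide, and the E-value and query-coverage cutoffs are placeholder assumptions because the slide's own threshold values are not legible here.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical record holding values read off a BLAST results page."""
    accession: str
    percent_identity: float   # identities / alignment length * 100
    evalue: float
    query_coverage: float     # percent of the query covered by the alignment
    characterized: bool       # True only if function was shown experimentally


def supports_function(hit, min_identity=35.0, max_evalue=1e-5, min_coverage=70.0):
    """Return True if a hit passes all four checks.

    min_identity follows the slide (>35%). max_evalue and min_coverage are
    ASSUMED placeholder values; replace them with the course thresholds.
    """
    return (
        hit.characterized
        and hit.percent_identity > min_identity
        and hit.evalue <= max_evalue
        and hit.query_coverage >= min_coverage
    )


# Made-up hits for illustration only.
hits = [
    BlastHit("HIT_A", 48.0, 1e-40, 95.0, True),
    BlastHit("HIT_B", 90.0, 1e-80, 20.0, True),    # short match: low coverage
    BlastHit("HIT_C", 60.0, 1e-50, 98.0, False),   # auto-annotated only
]
for hit in hits:
    print(hit.accession, supports_function(hit))
```

HIT_B shows why score and E-value alone are not enough: a very strong local match that covers only a small part of the query fails the coverage check.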

→ Follow this link from the lab notebook
BLAST: Altschul et al. (1997) Nucleic Acids Research 25:
GenBank: Benson et al. (2006) Nucleic Acids Research 35: D21–D25.

Retrieve query sequence from first module in imgACT Lab Notebook

Copy the amino acid sequence in FASTA format from the imgACT Lab Notebook
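For readers unfamiliar with FASTA format: a record is a header line starting with ">" followed by one or more lines of sequence. A minimal parsing sketch follows; the header and sequence in the example are made up for illustration and are not from the lab notebook.

```python
def parse_fasta(text):
    """Return {header: sequence} for every record in FASTA-formatted text."""
    records = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins; the header is everything after ">".
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            # Sequence lines are joined later, so line wrapping does not matter.
            records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


# Hypothetical record, wrapped over two lines as FASTA files often are.
example = """>hypothetical_gene_1 example protein
MATWLKQ
GHIE
"""
sequences = parse_fasta(example)
print(sequences)
```

When pasting into the BLAST query box, include the ">" header line together with the sequence so the result page is labelled with your gene.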

Paste query sequence into box “Click”

WHAT YOU SHOULD SEE... BLAST RESULTS Scroll down

Accession ID Top significant hit Start with first hit... → Click on Accession ID

NOTE: Top hit is from class organism; do not include results for P. limnophilus in lab notebook

Accession ID Next significant hit → Click on Accession ID

NOTE: Function assigned by automatic Gene Caller (not experimentally verified) Copy/paste this information into imgACT notebook

Reminder: Make sure you are in EDIT mode when making changes to imgACT notebook and SAVE your work along the way Return to BLAST results for this information

“Click” on Bit score

→ Copy/paste into imgACT notebook:
– Length of alignment
– Score
– Expect (E-value)
– Identities
– Positives
– Gaps
Pair-wise alignment between “Query” and “Sbjct” sequences. Pair-wise alignment with statistics (including E-value). Sequence length of database hit (not alignment length).
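To make the alignment statistics concrete, here is a small sketch that derives alignment length, Identities, and Gaps from a pair of aligned "Query" and "Sbjct" strings. The function name and the example strings are illustrative assumptions. Positives are left out because they depend on a substitution matrix, and Score and E-value come from BLAST's own statistics, which are not reproduced here.

```python
def alignment_stats(query, sbjct):
    """Count identities and gaps in a pair-wise alignment.

    Both strings must be the aligned rows as shown by BLAST, with "-"
    marking a gap, so they have the same length.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")
    length = len(query)
    if length == 0:
        raise ValueError("alignment is empty")
    pairs = list(zip(query, sbjct))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return {
        "length": length,                        # alignment length, gaps included
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / length,
    }


# Made-up alignment: one mismatch (W/P) and one gap in the query.
stats = alignment_stats("MATW-L", "MATPAL")
print(stats)
```

Note that the alignment length counts aligned columns, which is why it can differ from the sequence length of the database hit.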

NOTE: You need to modify your notebook for requested info (statistics include E-value) → REPEAT procedure with second BLAST hit.

“Click” on Bit score “Click” on Accession ID Copy/paste requested information in lab notebook

CDD: Conserved Domain Database
Bi-directional best hit in curated database
COG genes have sequence similarity & functional conservation
COG 1 – ion transport
COG 2 – energy production
COG 3 – cell division
etc.
Figure from Sanders-Lorenz and Miller (2010)
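The idea of a bi-directional best hit can be sketched as follows: gene a in genome A and gene b in genome B form a pair only if b is the top-scoring hit for a and a is the top-scoring hit for b. This is a general illustration of the concept with made-up gene names and scores, not the actual procedure used to build COGs or CDD.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict of {(query, subject): bit_score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical bit scores from searching genome A against B and B against A.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b1"): 120, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 198, ("b1", "a2"): 118, ("b2", "a2"): 88, ("b2", "a1"): 45}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this example a2's best hit is b1, but b1's best hit is a1, so only a1 and b1 are paired. The reciprocal requirement is what filters out one-sided matches.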

→ Return to top of BLAST Results page
CDD: Marchler-Bauer et al. (2006) Nucleic Acids Research 35: D237–D240.

→ “Click” on Conserved Domain image “Click”

If there are no hits, write “no significant hits” in notebook. If there are hits, scroll down & click the + sign next to the top hit. Click here

→ Copy top COG hit and COG name into notebook
Modify BOX to include length, bit score, and E-value
COG hit; COG name; Length, bit score, and E-value; COG description

→ Change headings and enter COG information as shown for top hit
→ If you obtain more than one significant hit, record this info for at least the top 2 hits
→ Hint: Look at Score & E-value

Retrieve from Gene Detail page

How do I return to the Gene Detail page for my proposed gene? “Click” on URL saved for your gene during first module (week 2)

Then what? Keep the Gene Detail page open in separate tab while working on imgACT Lab Notebook modules Scroll down

“Click” here on Gene Detail page

Change to 40

Note the red arrow corresponds to your gene
→ Plus strand genes on top (left to right)
→ Minus strand genes on bottom (right to left)
Is your gene a stand-alone ORF or is it clustered with other genes on same DNA strand and in same orientation?
→ Could be evidence that your gene is part of an operon
→ What are the functions of adjacent genes? Do they have related function?
How conserved is the gene neighborhood?
→ Are there similar patterns in other organisms that contain a gene from same orthologous group?
→ If considerably different, may be evidence for HGT
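The "stand-alone or clustered" question can be sketched in code: starting from your gene, collect neighbors that sit on the same strand with only a short gap between them. The `Gene` class, the coordinates, and the 100 bp `max_gap` cutoff are all assumptions for illustration. A cluster found this way is only a hint of an operon, not proof.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record; coordinates are 1-based positions."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def same_strand_cluster(genes, target_name, max_gap=100):
    """Return names of contiguous same-strand neighbors around the target.

    max_gap is an ASSUMED cutoff (in bp) for how far apart two adjacent
    genes may be and still count as clustered.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.name == target_name)
    strand = ordered[idx].strand
    cluster = [ordered[idx]]

    # Walk left while the neighbor shares the strand and the gap is small.
    i = idx - 1
    while (i >= 0 and ordered[i].strand == strand
           and ordered[i + 1].start - ordered[i].end <= max_gap):
        cluster.insert(0, ordered[i])
        i -= 1

    # Walk right under the same conditions.
    i = idx + 1
    while (i < len(ordered) and ordered[i].strand == strand
           and ordered[i].start - ordered[i - 1].end <= max_gap):
        cluster.append(ordered[i])
        i += 1

    return [g.name for g in cluster]


# Made-up neighborhood: g1-g3 are close together on the plus strand.
neighborhood = [
    Gene("g1", 100, 500, "+"),
    Gene("g2", 550, 1200, "+"),
    Gene("g3", 1210, 1800, "+"),
    Gene("g4", 1900, 2500, "-"),
    Gene("g5", 3000, 3500, "+"),
]
print(same_strand_cluster(neighborhood, "g2"))
print(same_strand_cluster(neighborhood, "g5"))
```

Here g2 clusters with g1 and g3, while g5 stands alone because its nearest neighbor is on the opposite strand. Comparing such clusters across organisms is the manual step the slide asks for.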

Need to save individual panels as JPEG or PNG files. Include P. limnophilus as well as 4-5 different organisms in imgACT notebook.

“Click” here to insert images into notebook Delete ‘gene neighborhood images’ and place cursor in the box

1- Click “Browse” to find image file. 2- Press “Attach” button. Thumbnail image should appear in window. 3- Repeat for each individual neighborhood panel until all are loaded in the window prompt.

4- Next, select one image at a time and press [OK] to insert them into imgACT notebook at cursor position. NOTE: The images should be inserted in same order that the organisms were listed in img/edu Insert next image

Results: Ortholog Neighborhood Scroll down

Enter comments about homology & context:
Is your gene a stand-alone ORF or is it clustered with other genes on same DNA strand and in same orientation?
→ Could be evidence that your gene is part of an operon
→ What are the functions of adjacent genes? Do they have related function?
How conserved is the gene neighborhood?
→ Are there similar patterns in other organisms that contain a gene from same orthologous group?
→ If considerably different, may be evidence for HGT

Retrieve from Organism Details page Retrieve from Gene Detail page

On Gene Detail page, you will find the GC content for your gene.

To find GC content for the entire P. limnophilus genome, select “Find Genomes” tab from the Gene Detail page.

Search for Planctomyces limnophilus and click on the corresponding hyperlink.

Scroll down WHAT YOU SHOULD SEE...

GC content will be listed under Genome Statistics.

NOTE: A gene with a GC content that is more than a few percentage points above or below the average GC content in the genome may have originated from another organism by HGT. Add a comment box & make note of this if your gene meets this criterion.
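The GC comparison can be sketched as below. GC content is the percentage of G and C bases among all A, C, G, and T bases. The 5-point `threshold` is an assumed stand-in for "a few percentage points", and the sequences and percentages in the example are made up; use the values from the Gene Detail and Genome Statistics pages.

```python
def gc_content(seq):
    """Return GC content as a percentage of the A/C/G/T bases in seq."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / total


def gc_deviates(gene_gc, genome_gc, threshold=5.0):
    """True if the gene's GC% differs from the genome average by more
    than threshold percentage points (threshold is an ASSUMED cutoff)."""
    return abs(gene_gc - genome_gc) > threshold


# Made-up values for illustration only.
print(gc_content("ATGCGC"))
print(gc_deviates(45.0, 53.0))   # 8 points below the genome average
print(gc_deviates(52.0, 53.0))   # within 1 point of the genome average
```

A flagged gene is only a candidate for HGT. GC content also varies within a genome for other reasons, so weigh it together with the ortholog neighborhood evidence.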