Sequence-based Similarity Module (BLAST & CDD only ) & Horizontal Gene Transfer Module (Ortholog Neighborhood & GC content only)


1 Sequence-based Similarity Module (BLAST & CDD only ) & Horizontal Gene Transfer Module (Ortholog Neighborhood & GC content only)

2 Phylogenetic tree of Bacteria  Recall: Planctomycetes are one of the GEBA genomes, representing an under-represented phylum within domain Bacteria GEBA: Genomic Encyclopedia of Bacteria & Archaea Insert Figure 1 from Handelsman (2004) Microbiol. Mol. Biol. Rev. 68: 669-685.

3 Recent phylogenetic analysis using 23S rRNA gene supports the monophyletic grouping and branch order for these four bacterial phyla Insert Figure 4A from Pilhofer et al. (2008) Characterization and Evolution of Cell Division and Cell Wall Synthesis Genes in the Bacterial Phyla Verrucomicrobia, Lentisphaerae, Chlamydiae, and Planctomycetes and Phylogenetic Comparison with rRNA Genes. J Bacteriology 190: 3192-3202.

4 http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=126

5 The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between two sequences. Conserved Domain Database Search (CDD) finds sequence similarity with genes in conserved orthologous groups (COGs).

6 Verifying Function Based on Sequence Conservation
Different types of BLAST searches: blastp, blastn, blastx, tblastn, tblastx (http://www.ncbi.nlm.nih.gov/)
– >35% identity to an experimentally characterized protein (especially in conserved regions) can be considered good evidence for function
– E-value: less than 10^-3 is significant; equal to or less than 10^-15 may indicate a good match
– Be cautious of auto-annotated gene function: GenBank is not a curated database
– Beware of mindless BLAST! Similarity score and E-value do not tell the whole story; also consider length of match (query coverage) and biological function (organismal context)
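
The rules of thumb on this slide can be sketched as a small Python check. This is only an illustration: the function name `assess_blast_hit` and the 70% default for `min_coverage` are assumptions made for the example and do not come from the slides, which give no numeric cutoff for query coverage.

```python
def assess_blast_hit(percent_identity, e_value, query_coverage, min_coverage=70.0):
    """Return short notes on a BLAST hit using the slide's rules of thumb.

    min_coverage is a hypothetical cutoff chosen for this sketch; the slides
    only say that the length of the match must be considered.
    """
    notes = []

    # E-value: <= 1e-15 may indicate a good match, < 1e-3 is significant.
    if e_value <= 1e-15:
        notes.append("E-value suggests a good match")
    elif e_value < 1e-3:
        notes.append("E-value is significant")
    else:
        notes.append("E-value is not significant")

    # Identity: > 35% to a characterized protein is treated as good evidence.
    if percent_identity > 35.0:
        notes.append("identity above 35%")
    else:
        notes.append("identity at or below 35%")

    # Coverage: a strong score over a short stretch can be misleading.
    if query_coverage < min_coverage:
        notes.append("low query coverage: check length of match")

    return notes


print(assess_blast_hit(42.0, 1e-40, 95.0))
```

A check like this does not replace looking at whether the hit was experimentally characterized or whether its function makes sense for the organism.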

7 Follow this link from the lab notebook. BLAST: Altschul et al. (1997) Nucleic Acids Research 25: 3389-3402. GenBank: Benson et al. (2006) Nucleic Acids Research 35: D21-D25.

8

9 Retrieve query sequence from first module in imgACT Lab Notebook

10 Copy amino acid sequence in FASTA format from the imgACT Lab Notebook
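
Before pasting, it can help to confirm that the copied text really is in FASTA format (a `>` header line followed by sequence lines). Below is a minimal sketch of such a check; `parse_fasta` and the example record are hypothetical and are not part of imgACT or BLAST.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {h: "".join(parts) for h, parts in records.items()}


# Made-up record for illustration only.
example = ">gene1 hypothetical protein\nMKT\nLVA\n"
print(parse_fasta(example))
```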

11 Paste query sequence into box “Click”

12 WHAT YOU SHOULD SEE... BLAST RESULTS Scroll down

13 Accession ID Top significant hit Start with first hit...  Click on Accession ID

14 NOTE: Top hit is from the class organism; do not include results from P. limnophilus in lab notebook

15 Accession ID Next significant hit  Click on Accession ID

16 NOTE: Function assigned by automatic Gene Caller (not experimentally verified) Copy/paste this information into imgACT notebook

17 Reminder: Make sure you are in EDIT mode when making changes to imgACT notebook and SAVE your work along the way Return to BLAST results for this information

18 “Click” on Bit score

19  Copy/paste into imgACT notebook: Length of alignment Score Expect (E-value) Identities Positives Gaps Pair-wise alignment between “Query” and “Sbjct” sequences. Pair-wise alignment with statistics (including E-value) Sequence length of database hit (not alignment length)
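
To see where the Identities and Gaps figures come from, the sketch below recomputes them from a pair of aligned strings like the "Query" and "Sbjct" lines. It is an illustration only: `alignment_stats` is a hypothetical helper, the example strings are made up, and Positives are not computed because they depend on the scoring matrix used by BLAST.

```python
def alignment_stats(query_aln, sbjct_aln):
    """Count identities and gaps in two aligned sequences of equal length.

    Gaps are written as '-' in either string, as in BLAST pairwise output.
    """
    if len(query_aln) != len(sbjct_aln):
        raise ValueError("aligned sequences must have the same length")

    length = len(query_aln)
    pairs = list(zip(query_aln, sbjct_aln))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")

    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": round(100.0 * identities / length, 1),
    }


# Made-up six-column alignment with one gap in the query.
print(alignment_stats("MKT-LV", "MKSALV"))
```

Note that the alignment length counted here is the number of aligned columns, which is different from the sequence length of the database hit.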

20 NOTE: You need to modify your notebook for requested info (statistics include E-value)  REPEAT procedure with second BLAST hit. 725

21 “Click” on Bit score “Click” on Accession ID Copy/paste requested information in lab notebook 733

22 CDD: Conserved Domain Database. Bi-directional best hit in curated database. COG genes have sequence similarity & functional conservation: COG 1 – ion transport; COG 2 – energy production; COG 3 – cell division; etc. Figure from Sanders-Lorenz and Miller (2010)

23  Return to top of BLAST Results page CDD: Marchler-Bauer et al. (2006) Nucleic Acids Research 35: D237-D240.

24  “Click” on Conserved Domain image “Click”

25 + If there are no hits, write “no significant hits” in notebook If there are hits, scroll down & click the + sign next to the top hit Click here

26  Copy top COG hit and COG name into notebook Modify BOX to include length, bit score, and E-value COG hit COG name Length, bit score, and E-value COG description

27 Change headings and enter COG information as shown for top hit. If you obtain more than one significant hit, record this info for at least the top 2 hits. Hint: Look at Score & E-value

28 Retrieve from Gene Detail page

29 How do I return to the Gene Detail page for my proposed gene? “Click” on URL saved for your gene during first module (week 2)

30 Then what? Keep the Gene Detail page open in separate tab while working on imgACT Lab Notebook modules Scroll down

31 “Click” here on Gene Detail page

32 Change to 40

33 Note: the red arrow corresponds to your gene. Plus strand genes on top (left to right); minus strand genes on bottom (right to left). Is your gene a stand-alone ORF, or is it clustered with other genes on the same DNA strand and in the same orientation? If clustered, this could be evidence that your gene is part of an operon. What are the functions of adjacent genes? Do they have related function? How conserved is the gene neighborhood? Are there similar patterns in other organisms that contain a gene from the same orthologous group? If considerably different, this may be evidence for HGT
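
The "stand-alone ORF or cluster" question can be sketched in code as follows. Everything here is an assumption made for illustration: the `Gene` record, the function `same_strand_cluster`, the made-up coordinates, and especially the 150 bp `max_gap` cutoff, which the slides do not specify. A same-strand cluster is only a hint of an operon, not proof.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def same_strand_cluster(genes, target_name, max_gap=150):
    """Return names of genes contiguous with the target on the same strand.

    Neighbors are added while they lie on the target's strand and the gap
    to the growing cluster is at most max_gap bp (hypothetical cutoff).
    """
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.name == target_name)
    strand = ordered[idx].strand
    cluster = [ordered[idx]]

    # Extend toward lower coordinates.
    i = idx - 1
    while i >= 0 and ordered[i].strand == strand \
            and cluster[0].start - ordered[i].end <= max_gap:
        cluster.insert(0, ordered[i])
        i -= 1

    # Extend toward higher coordinates.
    i = idx + 1
    while i < len(ordered) and ordered[i].strand == strand \
            and ordered[i].start - cluster[-1].end <= max_gap:
        cluster.append(ordered[i])
        i += 1

    return [g.name for g in cluster]


# Made-up neighborhood for illustration only.
neighborhood = [
    Gene("a", 100, 900, "+"),
    Gene("b", 950, 1800, "+"),
    Gene("c", 1850, 2500, "+"),
    Gene("d", 2600, 3300, "-"),
    Gene("e", 5000, 6000, "+"),
]
print(same_strand_cluster(neighborhood, "b"))
print(same_strand_cluster(neighborhood, "e"))
```

A result containing only the target gene corresponds to a stand-alone ORF under these assumptions.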

34 Need to save individual panels as JPEG or PNG files. Include P. limnophilus as well as 4-5 different organisms in imgACT notebook.

35 “Click” here to insert images into notebook Delete ‘gene neighborhood images’ and place cursor in the box

36 1- Click “Browse” to find image file. 2- Press “Attach” button. Thumbnail image should appear in window. 3- Repeat for each individual neighborhood panel until all are loaded in the window prompt.

37 4- Next, select one image at a time and press [OK] to insert them into imgACT notebook at cursor position. NOTE: The images should be inserted in same order that the organisms were listed in img/edu Insert next image

38 Results: Ortholog Neighborhood Scroll down

39 Enter comments about homology & context: Is your gene a stand-alone ORF, or is it clustered with other genes on the same DNA strand and in the same orientation? If clustered, this could be evidence that your gene is part of an operon. What are the functions of adjacent genes? Do they have related function? How conserved is the gene neighborhood? Are there similar patterns in other organisms that contain a gene from the same orthologous group? If considerably different, this may be evidence for HGT

40 Retrieve from Organism Details page Retrieve from Gene Detail page

41 On Gene Detail page, you will find the GC content for your gene.

42 To find GC content for the entire P. limnophilus genome, select “Find Genomes” tab from the Gene Detail page.

43 Search for Planctomyces limnophilus and click on the corresponding hyperlink.

44 Scroll down WHAT YOU SHOULD SEE...

45 GC content will be listed under Genome Statistics.

46 NOTE: A gene with a GC content that is more than a few percentage points above or below the average GC content of the genome may have originated from another organism by HGT. Add a comment box & make note of this if your gene meets this criterion.
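
The comparison described in this note can be sketched as below. The function names and the 5-point default `threshold` are assumptions made for the example; the slides only say "more than a few percentage points". The GC values in the usage lines are made up and are not the real figures for P. limnophilus.

```python
def gc_content(seq):
    """Return percent GC of a DNA sequence, ignoring non-ACGT characters."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def gc_deviates(gene_gc, genome_gc, threshold=5.0):
    """True if the gene's GC% differs from the genome's by more than threshold.

    threshold is a hypothetical stand-in for "a few percentage points".
    """
    return abs(gene_gc - genome_gc) > threshold


print(gc_content("ATGCGGCC"))
# Made-up percentages for illustration only.
print(gc_deviates(45.0, 54.0))
print(gc_deviates(52.5, 54.0))
```

In the module itself both percentages are read directly from the Gene Detail and Organism Details pages, so only the comparison step is needed.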

