Testing Bacterial Proteins for Evidence of Horizontal Gene Transfer
James Godde, John Iverson, Kabi Neupane, and Sara Penhale
Repetitive DNA
– Found in abundance in eukaryotes: only 1% of the human genome encodes protein, while more than half of the genome consists of repetitive DNA.
– Relatively rare in prokaryotes: nearly 89% of the E. coli genome encodes protein, while less than 1% consists of repetitive DNA.
Different Classes of Repetitive DNA
CRISPRs
What is a CRISPR?
– Clustered Regularly Interspaced Short Palindromic Repeats
– A class of repeats found exclusively in prokaryotes
How widespread are they? Frequency of occurrence unknown.
What is their function? Function unknown.
How did they get there in the first place? Mode of transmission unknown.
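To make the "regularly interspaced" structure concrete, here is a minimal Python sketch that splits a CRISPR-like array into the spacers lying between consecutive copies of a known repeat. It assumes the repeat sequence is already known and occurs as exact copies (real arrays tolerate mismatches); `split_crispr_array` is a hypothetical helper, and the repeat and spacer strings are invented toy data, not real CRISPR sequences.

```python
def split_crispr_array(array: str, repeat: str) -> list[str]:
    """Return the spacer sequences found between consecutive copies of `repeat`.

    Assumes exact repeat copies and ignores any sequence before the
    first repeat or after the last one.
    """
    positions = []
    start = array.find(repeat)
    while start != -1:
        positions.append(start)
        # Resume searching after this copy so copies are not double-counted.
        start = array.find(repeat, start + len(repeat))
    # A spacer is whatever lies between the end of one repeat and the start of the next.
    return [array[p + len(repeat):q] for p, q in zip(positions, positions[1:])]


# Toy array: one invented 10-nt repeat interleaved with three invented spacers.
repeat = "GTTTCAAACG"
array = repeat + "ACGTACGTAC" + repeat + "TTGGCCAATT" + repeat + "CAGTCAGTCA" + repeat
print(split_crispr_array(array, repeat))  # ['ACGTACGTAC', 'TTGGCCAATT', 'CAGTCAGTCA']
```

Equal spacer lengths between identical repeats are what "regularly interspaced" refers to; the sketch does not attempt to find the repeat itself, which is the harder detection problem.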
What are Cas genes?
In addition to the CRISPR sequences themselves, a number of genes are usually found in close association with these regions of repetitive DNA. These genes were termed Cas (CRISPR-associated) genes. Four Cas genes have been characterized to date, and the function of each can be inferred from its similarity to known genes:
– Cas 1 is homologous to a DNA repair gene
– Cas 2 is homologous to a transposase
– Cas 3 is homologous to a helicase
– Cas 4 is homologous to the RecB exonuclease
Finding Cas genes
Cas genes were identified by using NCBI BLAST to search for homologs of previously characterized Cas genes (Jansen et al., 2002) and of any newly characterized ones. In addition to showing homology with known Cas genes, a candidate had to be located near a CRISPR sequence.
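The two criteria above (BLAST homology plus proximity to a CRISPR array) can be combined into a simple filter. This is a minimal Python sketch, not the authors' pipeline: it assumes BLAST hits have already been reduced to hypothetical `Hit` records with genome coordinates, and the 10 kb `max_distance` window is an arbitrary illustrative value, not a threshold used in the study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical BLAST hit already mapped to genome coordinates (bp)."""
    gene_id: str
    cas_family: str  # e.g. "Cas 1"
    start: int
    end: int


@dataclass
class CrisprArray:
    """Hypothetical CRISPR array location on the same genome (bp)."""
    start: int
    end: int


def gap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bp separating two intervals; 0 if they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def candidate_cas_genes(hits, arrays, max_distance=10_000):
    """Keep homology hits lying within `max_distance` bp of any CRISPR array.

    The 10 kb default is an illustrative assumption, not the study's cutoff.
    """
    return [
        h for h in hits
        if any(gap_bp(h.start, h.end, a.start, a.end) <= max_distance for a in arrays)
    ]


hits = [Hit("g1", "Cas 1", 1_000, 2_000), Hit("g2", "Cas 3", 500_000, 503_000)]
arrays = [CrisprArray(5_000, 6_500)]
print([h.gene_id for h in candidate_cas_genes(hits, arrays)])  # ['g1']
```

Here `g1` is kept because it lies 3,000 bp from the array, while `g2` is a homolog far from any CRISPR and is discarded, which is the point of the proximity requirement.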
[Figure panels: individual trees for Cas 1, Cas 2, Cas 3, and Cas 4]
Formation of a total evidence tree
– Cas genes have been found in 115 different species of prokaryotes.
– Analysis was limited to the 58 species for which sequence data were available for all four Cas genes.
– Protein sequences for all four Cas genes were concatenated and aligned using ClustalW.
– The combined dataset was used to draw a neighbor-joining tree with MacVector.
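The species filter and concatenation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the input is a hypothetical mapping from species to its Cas protein sequences, the four genes are joined in a fixed order (Cas 1 to Cas 4), and alignment (ClustalW in the study) is treated as a separate later step.

```python
CAS_FAMILIES = ("Cas 1", "Cas 2", "Cas 3", "Cas 4")


def build_concatenated_dataset(proteins_by_species):
    """Concatenate Cas 1-4 protein sequences for each species that has all four.

    `proteins_by_species` maps species name -> {Cas family -> protein sequence}.
    Species missing any of the four families are dropped, mirroring the
    restriction to species with complete data. Output sequences are unaligned.
    """
    dataset = {}
    for species, proteins in proteins_by_species.items():
        if all(family in proteins for family in CAS_FAMILIES):
            # Fixed family order so every concatenated sequence has the same layout.
            dataset[species] = "".join(proteins[f] for f in CAS_FAMILIES)
    return dataset


# Invented toy sequences: species B lacks Cas 2 and Cas 4, so it is excluded.
toy = {
    "Species A": {"Cas 1": "MKV", "Cas 2": "LL", "Cas 3": "GHT", "Cas 4": "PQ"},
    "Species B": {"Cas 1": "MKI", "Cas 3": "GHS"},
}
print(build_concatenated_dataset(toy))  # {'Species A': 'MKVLLGHTPQ'}
```

A common alternative is to align each Cas family separately and then concatenate the alignments, which keeps homologous positions of different genes from being aligned against each other.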
Classical rRNA-based Phylogeny (Yang et al., 2005)
[Figure: rRNA-based tree of the three domains, Archaea, Eukarya, and Bacteria; a second view shows Archaea and Bacteria only]
[Figure: total evidence neighbor-joining tree, shown twice (once labelled "Archaea", once labelled "Proteobacteria"). Method: neighbor joining; best tree; tie breaking = random. Distance: absolute (# differences); gaps distributed proportionally.]
Taxa shown: Nanoarchaeum, Pyrococcus hor 1, Archaeoglobus 2, Methanobacterium, Thermotoga, Rubrobacter, Clostridium ther, Desulfobacterium 2, Thermoanaerobacter, Fusobacterium, Moorella 2, Porphyromonas, Bacteroides, Methanosarcina bar, Methanosarcina acet, Methanococcus, Pyrococcus hor 2, Pyrococcus fur, Chloroflexus, Corynebacterium, Chlorobium 2, Desulfovibrio desul, Rhodospirillum 1, Salmonella typhi CT18, Salmonella typhimurium, E. coli K12, E. coli O157, Geobacter sulf, Photobacterium (mega), Sulfolobus tok, Sulfolobus sol, Archaeoglobus 1, Methanosarcina maz, Leptospira (lai), Streptococcus pyo 1, Streptococcus aga 2603, Streptococcus aga NEM316, Streptococcus pyo 2, Streptococcus mut, Moorella 1, Geobacter meta, Methylococcus, Magnetococcus, Chlorobium 1, Desulfovibrio vul (mega), Shewanella (Sargasso Sea), Rhodospirillum 2, Xanthomonas, Chromobacterium, Azotobacter, Bacillus halo, Desulfobacterium 1, Pyrobaculum aero, Thermus HB8 (mega), Synechocystis (mega), Nostoc pun, Nostoc
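The tree above was built in MacVector; the following is an independent, minimal Python sketch of the two steps named in its header: an absolute distance (number of differing positions) followed by Saitou-Nei neighbor joining. Assumptions: the sequences are already aligned; columns containing a gap are simply skipped, which is not the same as MacVector's "gaps distributed proportionally" option; ties are broken by iteration order rather than randomly; and branch lengths are omitted from the output. Function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def absolute_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences of equal length.

    Simplification: columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != "-" and b != "-" and a != b)


def neighbor_joining(names, dist):
    """Build an unrooted tree with neighbor joining; return its Newick topology.

    `dist` maps frozenset({x, y}) -> distance. Branch lengths are not reported,
    and ties are broken by iteration order, not randomly.
    """
    nodes = list(names)
    d = dict(dist)
    labels = {n: n for n in nodes}  # node -> Newick subtree string
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r(i): sum of distances from i to every other node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        counter += 1
        u = f"_node{counter}"
        labels[u] = f"({labels[i]},{labels[j]})"
        dij = d[frozenset((i, j))]
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node u to each remaining node.
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({labels[a]},{labels[b]});"


# Invented aligned toy sequences for four taxa.
seqs = {"A": "MKVLLA", "B": "MKVLLG", "C": "MRIFLG", "D": "MRIFLA"}
dist = {
    frozenset((x, y)): absolute_distance(seqs[x], seqs[y])
    for x, y in combinations(seqs, 2)
}
print(neighbor_joining(list(seqs), dist))  # ((A,B),(C,D));
```

Neighbor joining is used here because it is the method named in the tree header; it is fast and needs only pairwise distances, which is why it scales easily to a 58-species concatenated dataset.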
Conclusions
– The total evidence tree is a good representation of the individual Cas gene trees and supports the same conclusions.
– The trees support the hypothesis that Cas genes have been passed between species via horizontal gene transfer.
– More work is required to eliminate the alternative hypothesis that the trees reflect convergent evolution in response to similar environments.
References
Jansen, R., van Embden, J. D., Gaastra, W., and Schouls, L. M. (2002). Identification of genes that are associated with DNA repeats in prokaryotes. Mol. Microbiol. 43.
Yang, S., Doolittle, R. F., and Bourne, P. E. (2005). Phylogeny determined by protein domain content. PNAS 102.