Published by Georgina Clark. Modified over 8 years ago.
IslandPath: A computational aid for identifying genomic islands that may play a role in microbial pathogenicity

William Hsiao 1*, Nancy Price 2, Ivan Wan 3, Steven J. Jones 3, and Fiona S. L. Brinkman 1

1 Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby; 2 Department of Medical Genetics, University of British Columbia, Vancouver; 3 Genome Sequence Centre, B.C. Cancer Agency, British Columbia, Canada

Abstract

As more genomes from bacterial pathogens are sequenced, it is becoming apparent that a significant proportion of virulence factors are encoded in clusters of genes, termed Pathogenicity Islands (reviewed in [1]). These islands, and other genomic islands, tend to have atypical guanine and cytosine content (%G+C), contain mobility genes (e.g. transposases and integrases), and are associated with tRNA sequences. We have developed a web-based computational tool, IslandPath, to aid the visualization of these features in a full genome display, in order to facilitate the identification of genes in new genome sequences that may be involved in virulence or have horizontal origins. Visualizing these features within their genomic context can improve detection of genomic island borders and neighbouring genes. Atypical %G+C by itself does not indicate a horizontal origin for the sequence involved; however, the predictive power increases when such regions are associated with mobile elements or direct repeats, or contain genes with similarity to known virulence factors. Therefore, we are incorporating into IslandPath algorithms to detect partial tRNAs in new genomic sequences, which are likely to be remnants of phage insertion events, and we are also comparing the genomic sequences to a custom-built database of a subset of known virulence factors.
Preliminary results are encouraging: IslandPath visualizes known Pathogenicity Islands as distinct regions within genomes. The tool also permitted a more in-depth analysis of %G+C variance in genomes and enabled us to detect correlations not previously reported. As more genome data become available, tools like IslandPath, which can be updated in an automated fashion, will become valuable for genomic research.

Acknowledgements

This project is funded by the Peter Wall Institute for Advanced Studies. We wish to thank Tatiana Tatusov of NCBI for providing helpful files for IslandPath, and we acknowledge the efforts of the many genome projects that have made our analysis possible.

www.pathogenomics.bc.ca/brinkman

Methods:

- Core scripts written in Perl and CGI/Perl
- Sequence data: NCBI Genome FTP site
- Potential mobility elements: COG analysis [2,3] plus keyword scan
- RNA locations: NCBI data plus tRNAscan-SE [4]
- %G+C calculated for each ORF
- Mean and standard deviation calculated for all ORFs in the genome
- File containing all ORF information used to generate a graphical representation
- Virulence Gene Subset (VGS) database developed through literature analysis of genes identified as virulence factors using the "Molecular Koch's Postulates" (i.e. gene knockout affects virulence)

%G+C Analysis for Complete Genome Sequences:

Values in parentheses or square brackets refer to the additional strains named in the same style in the first column. For V. cholerae, I and II denote the two chromosomes; for A. tumefaciens, the labels c and l are as given on the poster.

Bacterial Pathogens | Primary Diseases | Cellular Localization | # of ORFs | %G+C Mean (ORFs >300bp) | %G+C S.D. (ORFs >300bp)
Neisseria meningitidis serogroup B strain MC58 | meningitis | extracellular | 2025 | 52.4 | 6.9
Neisseria meningitidis serogroup A strain Z2491 | meningitis | extracellular | 2121 | 52.6 | 6.5
Xylella fastidiosa | citrus variegated chlorosis | extracellular | 2766 | 53.4 | 5.4
Escherichia coli O157:H7 (E. coli O157:H7_EDL933) | diarrhoea | facultative intracellular | 5361 (5349) | 51.1 (51.9) | 5.3 (5.3)
Mycoplasma pneumoniae M129 | mycoplasmal pneumonia ("walking pneumonia") | extracellular | 677 | 40.3 | 4.9
Yersinia pestis strain CO92 | bubonic plague and pneumonic plague | facultative intracellular | 3885 | 48.3 | 4.7
Streptococcus pneumoniae TIGR4 (S. pneumoniae R6) | bacterial pneumonia, meningitis, sepsis, and otitis media | extracellular | 2094 (2043) | 40.3 (40.4) | 4.4 (4.3)
Treponema pallidum Nichols | syphilis | extracellular | 1031 | 51.4 | 4.2
Mycoplasma pulmonis | murine respiratory mycoplasmosis | extracellular | 782 | 27.2 | 3.8
Pseudomonas aeruginosa PAO1 | variety of mucosal infections (opportunistic) | extracellular | 5565 | 67.0 | 3.8
Rickettsia conorii Malish 7 | Mediterranean spotted fever | obligate intracellular | 1374 | 32.4 | 3.8
Ureaplasma urealyticum serovar 3 | urethritis | extracellular | 613 | 25.8 | 3.8
Vibrio cholerae N16961 | cholera | extracellular | I: 2736, II: 1092 | I: 48.1, II: 46.9 | I: 3.7, II: 4.3
Borrelia burgdorferi B31 | Lyme disease | facultative intracellular | 851 | 28.7 | 3.6
Streptococcus pyogenes | scarlet fever, toxic shock-like syndrome | extracellular | 1696 | 38.9 | 3.6
Mycoplasma genitalium G37 | urethritis (opportunistic, usually HIV patients) | extracellular | 484 | 31.4 | 3.5
Campylobacter jejuni NCTC11168 | gastroenteritis | extracellular | 1654 | 30.6 | 3.5
Helicobacter pylori 26695 (H. pylori J99) | peptic ulcers and gastritis | extracellular | 1566 (1491) | 39.4 (39.7) | 3.4 (3.3)
Haemophilus influenzae Rd-KW20 | upper respiratory infection, meningitis | extracellular | 1709 | 38.5 | 3.4
Mycobacterium tuberculosis CDC1551 (M. tuberculosis H37Rv) | tuberculosis | facultative intracellular | 4187 (3918) | 65.5 (65.6) | 3.3 (3.3)
Pasteurella multocida PM70 | fowl cholera, cattle septicemia, etc. | extracellular | 2014 | 40.8 | 3.3
Rickettsia prowazekii Madrid E | epidemic typhus | obligate intracellular | 834 | 30.1 | 3.3
Staphylococcus aureus Mu50 (S. aureus N315) | food poisoning, toxic shock syndrome, necrotizing fasciitis | extracellular | 2714 (2595) | 33.3 (32.2) | 3.0 (3.0)
Mycobacterium leprae | leprosy | obligate intracellular | 2720 | 60.0 | 2.9
Agrobacterium tumefaciens C58 (Cereon) | crown gall (in plants) | extracellular | c: 2721, l: 1833 | c: 59.8, l: 59.7 | c: 2.7, l: 2.9
Chlamydophila pneumoniae AR39 (C. pneumoniae J138) [C. pneumoniae CWL029] | chlamydial pneumonia | obligate intracellular | 1110 (1070) [1052] | 41.1 (41.1) [41.1] | 2.6 (2.6) [2.6]
Chlamydia trachomatis D | chlamydia | obligate intracellular | 894 | 41.5 | 2.3
Chlamydia muridarum MoPn | chlamydia | obligate intracellular | 909 | 40.8 | 2.2

Non-pathogens | # of ORFs | %G+C Mean (ORFs >300bp) | %G+C S.D. (ORFs >300bp)
Escherichia coli K12 | 4289 | 51.3 | 4.7

Discussion:

IslandPath appears to be an effective automated tool to visualize and detect genomic islands. Previous reports have expressed concern about the use of %G+C to detect horizontal gene transfer (HGT); however, those reports examined %G+C for individual genes. We propose that %G+C analysis is effective if clusters of genes containing motifs associated with mobility elements are considered. Foreign genes with %G+C similar to that of the host genome are not detected and, because of gene amelioration, only "recent" HGT can be detected. This tool represents one approach that can be complemented with others to prioritize particular genomic islands that merit further research.

Future developments:

- Virulence factor homology search (based on comparison to our VGS dataset)
- Alternative DNA signatures (e.g. codon usage)
- Allow users to input their own sequences for analysis

%G+C Analysis General Observations:

- High %G+C variance is associated with species with evidence of recent horizontal gene transfers (e.g. N. meningitidis).
- Low %G+C variance is associated with highly clonal species and species with no evidence of horizontal gene transfers (e.g. Chlamydia species, which are obligate intracellular microbes thought to have been ecologically isolated from other bacteria for a longer period than other obligate intracellular bacteria).
- %G+C variance is similar within a single species, with the exception of the two V. cholerae chromosomes and the two E. coli strains. However, chromosome II of V. cholerae appears to have originated from a megaplasmid captured by Vibrio [5]. For E. coli, the pathogenic strain O157:H7 has higher %G+C variance than K12. This might be due to the presence of pathogenicity islands (PAIs) and other potentially horizontally transferred genetic elements.

Frequencies of ORF %G+C in Genomes:

Histograms of the frequencies of ORF %G+C values were plotted for several organisms.

Observations:

- The lowest kurtosis occurs most commonly with a mode of 33.33% for the %G+C values of ORFs in a genome (e.g. M. jannaschii DSM2661). This G+C value corresponds to maximum A/T in synonymous sites for the standard codon usage table.
- Long tails in the frequency plots occur more frequently downward (e.g. H. pylori J99 and N. meningitidis) than upward.
- These observations likely reflect either a bias in gene identification in high-G+C genomes or selection toward higher A+T content.

Detection of Proposed or Potential Genomic Islands:

Escherichia coli O157:H7: The area displayed in the white rectangle is ~28 kb in size (from 3708 kbp to 3736 kbp) and contains Type III secretion proteins (Epr, Epa, and Eiv proteins) and numerous hypothetical proteins with unknown functions.

Vibrio cholerae chromosome I: The area displayed in the red rectangle is ~34 kb in size (from 1896 kbp to 1930 kbp) and contains a tRNA-Ser in the same orientation as the phage integrase downstream of it. The ORFs comprise one putative helicase, one chemotaxis protein MotB-related protein, one putative type I restriction enzyme HsdR, one putative DNA methylase, one putative N-acetylneuraminate lyase, one C4-dicarboxylate-binding periplasmic protein, and numerous hypothetical and conserved hypothetical proteins.
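The idea of looking at clusters of atypical ORFs, rather than single genes, can be sketched in a few lines. This is a minimal illustration in Python, not the poster's Perl code and not a description of how IslandPath itself works: the function name `find_atypical_runs`, the `min_run` parameter, and the rule "consecutive ORFs deviating in the same direction" are all assumptions made for the example. The sample values are the %G+C figures reported on the poster for the Yersinia pestis CO92 High Pathogenicity Island core, with that display's mean (47.9) and standard deviation (4.9).

```python
def find_atypical_runs(gc_values, mean_gc, cutoff, min_run=4):
    """Return (start, end) index pairs, end exclusive, for runs of at least
    min_run consecutive ORFs whose %G+C lies more than `cutoff` above the
    mean (or more than `cutoff` below it). Hypothetical helper."""
    runs = []
    start = None
    direction = 0  # +1 = run of high %G+C, -1 = run of low %G+C, 0 = no run
    for i, gc in enumerate(gc_values):
        if gc > mean_gc + cutoff:
            d = 1
        elif gc < mean_gc - cutoff:
            d = -1
        else:
            d = 0
        if d != direction:
            # The previous run (if any) ends here; keep it if long enough.
            if direction != 0 and i - start >= min_run:
                runs.append((start, i))
            start = i
            direction = d
    # Close a run that extends to the last ORF.
    if direction != 0 and len(gc_values) - start >= min_run:
        runs.append((start, len(gc_values)))
    return runs


# %G+C of consecutive ORFs in the Y. pestis CO92 HPI core (from the poster).
hpi_gc = [56.48, 58.81, 58.33, 60.40, 60.79, 60.15,
          56.35, 57.29, 58.62, 59.48, 55.25, 52.65]

# Using 1 S.D. (4.9) as the cutoff: the first 11 ORFs form one high-%G+C run;
# the integrase (52.65) falls just inside the cutoff.
print(find_atypical_runs(hpi_gc, mean_gc=47.9, cutoff=4.9))
```

A run found this way is only a candidate; as the Discussion notes, the signal becomes more convincing when mobility genes or tRNAs sit in or next to the run.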
A tRNA adjacent to a region of abnormal %G+C is often observed to be in the same orientation as that region. This might be an artefact of phage insertion and excision events, as the 3' ends of tRNAs are common phage attachment (att) sites.

Horizontal Gene Transfer and Bacterial Pathogenicity:

Several types of mobile elements have been shown to carry virulence factors:

- Transposons: ST enterotoxin genes in E. coli
- Prophages: Shiga-like toxins in EHEC; diphtheria toxin gene; cholera toxin; botulinum toxins
- Plasmids: Shigella, Salmonella, Yersinia
- Pathogenicity Islands: uro/entero-pathogenic E. coli; Salmonella typhimurium; Yersinia spp.; Helicobacter pylori; Vibrio cholerae

References

1. Hacker J and Kaper JB, 2000, Annu Rev Microbiol. 54:641-79
2. Tatusov RL, et al., 1997, Science 278(5338):631-7
3. Tatusov RL, et al., 2001, Nucleic Acids Res. 29(1):22-8
4. Lowe TM and Eddy SR, 1997, Nucleic Acids Res. 25(5):955-64
5. Heidelberg JF, et al., 2000, Nature 406:477-84

Whole Genome (predicted) ORF Display:

Genome ORFs are displayed so that interesting regions (rich in mobility genes, with abnormal %G+C, or close to structural RNAs) can be viewed in a genome context.

E.g. H. pylori 26695 genome. Several low-%G+C regions can be seen in the graphic display. The regions marked in the figure are:

- CAG island
- plasticity zone (contains different genes in J99 and 26695)
- region containing virB homologues; not present in strain J99

Detection of Known Pathogenicity Islands:

Vibrio cholerae chromosome I: VPI (toxin regulated pili). The VPI is delineated as a stretch of low %G+C flanked by mobility genes.

Yersinia pestis strain CO92: High Pathogenicity Island core (in red rectangle). Mean: 47.9; Std. Dev.: 4.9

%G+C | S.D. | Location | Orientation | Product
56.48 | +1 | 2140840..2142861 | - | pesticin/yersiniabactin receptor protein
58.81 | +2 | 2142992..2144569 | - | yersiniabactin siderophore biosynthetic protein
58.33 | +2 | 2144573..2145376 | - | yersiniabactin biosynthetic protein YbtT
60.40 | +2 | 2145373..2146473 | - | yersiniabactin biosynthetic protein YbtU
60.79 | +2 | 2146470..2155961 | - | yersiniabactin biosynthetic protein
60.15 | +2 | 2156049..2162156 | - | yersiniabactin biosynthetic protein
56.35 | +1 | 2162347..2163306 | - | transcriptional regulator YbtA
57.29 | +1 | 2163473..2165275 | + | lipoprotein inner membrane ABC-transporter
58.62 | +2 | 2165262..2167064 | + | inner membrane ABC-transporter YbtQ
59.48 | +2 | 2167057..2168337 | + | putative signal transducer
55.25 | +1 | 2168365..2169669 | + | putative salicylate synthetase
52.65 |  | 2169863..2171125 | - | integrase

IslandPath Graphical Display:

Each dot in a graphic corresponds to a predicted protein-coding ORF in the genome. Dot colours indicate whether an ORF has a higher or lower %G+C than the cutoffs you set (the default setting is +/- 3.48* from the mean %G+C). You may click on a dot to view a portion of an annotation table presented below the graphic.

* 3.48 = 1.5 S.D. of the mean for Chlamydia genomes, which are proposed to have undergone no recent horizontal gene transfer (data not shown).
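The per-ORF calculation behind the display can be sketched as follows. This is an illustrative Python version written under stated assumptions, not the authors' Perl scripts: the function names are hypothetical, the population standard deviation is assumed (the poster does not say which estimator was used), and the truncation rule in `sd_units` is inferred only because it reproduces the S.D. column of the Yersinia pestis table. The default cutoff of 3.48 is the one stated on the poster.

```python
from statistics import mean, pstdev


def gc_percent(seq):
    """Percent G+C of a nucleotide sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def genome_stats(orf_seqs):
    """Mean and (population) standard deviation of %G+C over all ORFs."""
    values = [gc_percent(s) for s in orf_seqs]
    return mean(values), pstdev(values)


def sd_units(gc, mean_gc, sd_gc):
    """Whole standard deviations between an ORF's %G+C and the genome mean,
    truncated toward zero (assumed rule; matches the +1/+2 table column)."""
    return int((gc - mean_gc) / sd_gc)


def dot_class(gc, mean_gc, cutoff=3.48):
    """Classify an ORF as 'high', 'low', or 'typical' relative to mean +/- cutoff."""
    if gc > mean_gc + cutoff:
        return "high"
    if gc < mean_gc - cutoff:
        return "low"
    return "typical"


# Rows from the Y. pestis CO92 table (mean 47.9, S.D. 4.9 as displayed).
for gc in (56.48, 58.81, 52.65):
    print(gc, sd_units(gc, 47.9, 4.9), dot_class(gc, 47.9))
```

Keeping the cutoff as an absolute %G+C offset rather than a multiple of each genome's own standard deviation mirrors the poster's default, which is derived from the Chlamydia genomes; a per-genome multiple would be an equally reasonable design and is easy to substitute.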