Mastering Microbes with Microchips. Fiona Brinkman, Department of Molecular Biology and Biochemistry.

1 Mastering Microbes with Microchips. Fiona Brinkman, Department of Molecular Biology and Biochemistry, Simon Fraser University, Greater Vancouver, British Columbia, Canada

2 What I won’t talk about!
1. Pseudomonas Genome Database: Model for continually updated genome annotation and analysis
2. Microarray analysis software development for the Pathogenomics (FPMI) Project

3 How can we best combat infectious disease-causing bacteria?

7 + =
Rank  Name   Kills
1.    Fiona  54
2.    Ryan   0

8 + =

9 Pathogens and The Art of War “What is of supreme importance in war is to attack the enemy's strategy. Next best is to disrupt his alliances by diplomacy. The next best is to attack his army. And the worst policy is to attack cities.”

10 Pathogens and The Art of War “And the worst policy is to attack cities.”

11 Infectious Diseases – There must be a better way…
→ Leading cause of productivity loss
→ Responsible for two thirds of deaths of persons under age 40

12 Pathogens and The Art of War
“What is of supreme importance in war is to attack the enemy's strategy.”
strategy = virulence factors

13 Pathogens and The Art of War
“Attack your enemy where he is unprepared”
→ Boost innate immune system

14 How can we best combat pathogens?
A. Identify pathogen proteins more likely to be…
1. …virulence factors - VGS Database and IslandPath
2. …quickly accessible to drugs/immune system (cell surface) - PSORT-B
B. Identify human genes involved in boosting our innate immune system
Summary of insights and lessons learned…

15 Virulence Gene Subset (VGS) Database
Based on literature analysis
Experimentally determined virulence factors
Extensive information in separate fields
– Species information
– Gene/Protein information
– Gene knockout information relevant to virulence studies
– Infection assay information
– References

16 Horizontal Gene Transfer and Virulence Factors
Transposons: ST enterotoxin genes in E. coli
Prophages: Shiga-like toxins in EHEC; Diphtheria toxin gene, Cholera toxin; Botulinum toxins
Plasmids: Shigella, Salmonella, Yersinia
Pathogenicity Islands: Uro/Entero-pathogenic E. coli, Salmonella typhimurium, Yersinia spp., Helicobacter pylori, Vibrio cholerae

17 Pathogenicity Islands Associated with –Atypical %G+C –tRNA sequences –Transposases, Integrases and other mobility genes –Flanking repeats

18 IslandPath: Aiding identification of Pathogenicity Islands and other Genomic Islands
Legend:
Yellow circle = high %G+C
Pink circle = low %G+C
Region of unusual dinucleotide bias
tRNA gene lies between the two dots
rRNA gene lies between the two dots
Both tRNA and rRNA lie between the two dots
Dot is named a transposase
Dot is named an integrase
Hsiao et al. (2003) Bioinformatics 19: 418-420

19 Dinucleotide bias analysis
Genome divided into “ORF-clusters” of 6 consecutive ORFs
Dinucleotide relative abundance is calculated for the region as
$$\rho^*_{XY} = \frac{f^*_{XY}}{f^*_X \, f^*_Y}$$
where $f^*_X$ denotes the frequency of the mononucleotide X and $f^*_{XY}$ the frequency of the dinucleotide XY
For each ORF cluster, the average absolute dinucleotide relative abundance difference is
$$\delta^*(f, g) = \frac{1}{16} \sum_{XY} \left| \rho^*_{XY}(f) - \rho^*_{XY}(g) \right|$$
where f (fragment) is derived from sequences in an ORF-cluster and g (genome) is derived from all predicted ORFs in the genome
Hsiao et al. (2003) Bioinformatics 19: 418-420
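
The calculation on this slide can be sketched in Python as follows. This is a minimal illustration, not the IslandPath source: the function names are hypothetical, and it assumes that the asterisk means frequencies are pooled over both strands (the sequence plus its reverse complement) and that the difference is averaged over the 16 dinucleotides.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(pair) for pair in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def both_strands(seq):
    """Return the sequence and its reverse complement (assumed meaning of '*')."""
    seq = seq.upper()
    return [seq, seq.translate(COMPLEMENT)[::-1]]


def relative_abundance(seqs):
    """rho*_XY = f*_XY / (f*_X * f*_Y), pooled over a list of sequences."""
    mono, di = Counter(), Counter()
    for seq in seqs:
        for strand in both_strands(seq):
            mono.update(base for base in strand if base in BASES)
            for i in range(len(strand) - 1):
                pair = strand[i:i + 2]
                if pair[0] in BASES and pair[1] in BASES:
                    di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("need at least one dinucleotide of A/C/G/T")
    rho = {}
    for xy in DINUCLEOTIDES:
        f_x, f_y = mono[xy[0]] / n_mono, mono[xy[1]] / n_mono
        f_xy = di[xy] / n_di
        # A base absent from the sequence gives an undefined ratio; use 0.0 here.
        rho[xy] = f_xy / (f_x * f_y) if f_x and f_y else 0.0
    return rho


def dinucleotide_bias(fragment_seqs, genome_seqs):
    """Average absolute difference in rho* between fragment f and genome g."""
    rho_f = relative_abundance(fragment_seqs)
    rho_g = relative_abundance(genome_seqs)
    return sum(abs(rho_f[xy] - rho_g[xy]) for xy in DINUCLEOTIDES) / 16
```

Here `fragment_seqs` would hold the ORF sequences of one ORF-cluster and `genome_seqs` all predicted ORFs in the genome.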

20 Dinucleotide bias analysis
“ORF-clusters” sampled in an overlapping manner (shift by one ORF at a time)
The mean is calculated by averaging the results from all ORF-clusters in the genome
Regions more than 1 standard deviation away from the mean are marked on the IslandPath graphical display with strikethrough lines
Why did we use 6 ORFs per cluster?
- Not enough bp in a single ORF to get a good estimate
- 4.5 kb (corresponding to approximately 6-8 ORFs) is required for “reliable estimation of nucleotide composition” (Lawrence and Ochman, J Mol Evolution 1997 44:383-97)
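
The overlapping sampling and the one-standard-deviation rule can be sketched as below. The function names are hypothetical, the scores are whatever per-cluster bias value is being used, and two details are assumptions not stated on the slide: the sample standard deviation is used, and a cluster is flagged when it deviates in either direction.

```python
from statistics import mean, stdev


def orf_clusters(orfs, size=6):
    """Overlapping windows of `size` consecutive ORFs, shifted one ORF at a time."""
    return [orfs[i:i + size] for i in range(len(orfs) - size + 1)]


def flag_unusual(scores, n_sd=1.0):
    """Mark each cluster whose score lies more than n_sd SDs from the genome mean."""
    mu = mean(scores)
    sd = stdev(scores)  # assumption: sample SD over all ORF-clusters
    return [abs(score - mu) > n_sd * sd for score in scores]
```

For a genome with 8 ORFs, `orf_clusters` yields 3 clusters (ORFs 1-6, 2-7 and 3-8), so every ORF except those at the ends contributes to several clusters.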

21 Figure. Boxes: Known islands in the Salmonella typhi genome

22 What features best predict Islands?
Examined prevalence of features in over 200 known islands
94% of islands contain >25% dinucleotide bias (majority have >75% dinucleotide bias coverage)
Mobility genes identified in >75% (but ID recently improved)
Atypical %G+C (above cutoff used in Brinkman et al., 2002) not over 50% coverage on average, and tRNA genes not observed with >50% of known islands

23 Figure. Boxes: “Insertions” in the Salmonella typhi genome versus Salmonella typhimurium

24 Properties of genes in these islands?
Defined a “putative island” as
– 8 or more genes in a row with dinucleotide bias
Functional category analysis → Any difference for genes in islands versus genome?
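
The “putative island” definition amounts to finding runs of consecutive flagged genes. A minimal sketch, assuming each gene already carries a True/False dinucleotide-bias flag (the function name and return format are hypothetical):

```python
from itertools import groupby


def putative_islands(biased_flags, min_run=8):
    """Return (start, end) gene index pairs, end exclusive, for each run of
    at least `min_run` consecutive genes flagged as dinucleotide-biased."""
    islands = []
    position = 0
    for flag, run in groupby(biased_flags):
        length = len(list(run))
        if flag and length >= min_run:
            islands.append((position, position + length))
        position += length
    return islands
```

With 3 unflagged genes, then 8 flagged, 2 unflagged and 5 flagged, only the run of 8 qualifies; the run of 5 is too short.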

25 P value of Paired T test (66 organisms): 4e-19 Hypothetical genes are more common in putative islands vs the rest of the genome
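
A paired T test of this kind compares, organism by organism, the fraction of hypothetical genes inside putative islands with the fraction in the rest of the genome. The sketch below computes only the t statistic using the standard library; the function name and the example fractions are hypothetical, not data from the talk. Turning the statistic into a P value needs a t-distribution, for example via SciPy's `scipy.stats.ttest_rel`.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(in_islands, rest_of_genome):
    """t statistic for paired samples: one (island, rest) pair per organism."""
    if len(in_islands) != len(rest_of_genome):
        raise ValueError("each organism needs one value in each list")
    differences = [a - b for a, b in zip(in_islands, rest_of_genome)]
    n = len(differences)
    return mean(differences) / (stdev(differences) / sqrt(n))


# Hypothetical fractions of hypothetical genes for three organisms.
islands = [0.5, 0.6, 0.7]
rest = [0.4, 0.4, 0.5]
t_value = paired_t_statistic(islands, rest)  # degrees of freedom = n - 1 = 2
```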

26 Why are hypothetical genes more common within putative islands/dinucleotide biased regions?
1. Genes being horizontally acquired in bacteria come from a large pool of as yet unstudied genes?
2. Genes are being mispredicted within these regions because of the region’s different genomic composition?
→ Testing hypothesis 2:
- Genes <300 bp in size are more likely to be false positives
- Therefore, remove genes less than 300 bp and reanalyze
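
The reanalysis for hypothesis 2 can be sketched as a length filter applied before the island-versus-genome comparison. The `Gene` record and its field names are hypothetical; the only value taken from the slide is the 300 bp cutoff.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    length_bp: int
    is_hypothetical: bool
    in_island: bool


def fraction_hypothetical(genes, in_island, min_length_bp=300):
    """Fraction of hypothetical genes among genes of at least min_length_bp,
    restricted to genes inside (or outside) putative islands."""
    kept = [g for g in genes
            if g.in_island == in_island and g.length_bp >= min_length_bp]
    if not kept:
        raise ValueError("no genes left after filtering")
    return sum(g.is_hypothetical for g in kept) / len(kept)


# Hypothetical toy genome: short predicted genes are removed by the filter.
genes = [
    Gene(200, True, True), Gene(900, False, True), Gene(1200, True, True),
    Gene(250, True, False), Gene(1500, False, False),
]
```

Running the comparison once with `min_length_bp=0` and once with `min_length_bp=300` shows how much of the island signal depends on short, possibly false-positive gene predictions.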

27 P value of Paired T test (55 organisms): 0.027

28 P value of Paired T test (66 organisms): 3e-17

29 Other categories more common in islands
COG functional category | Paired T test p value | Hypothesis to test
Translation, ribosomal structure and biogenesis | 4.6e-8 | Ribosome operons highly expressed and so have unusual bp composition and falsely ID’d as islands
Cell motility | 6e-3 | Mix of above and below hypotheses
Secretion | 0.02 | Reflects nature of acquired subnetworks and how they must interact with the environment?

30 Acquiring genes = Acquiring subnetworks
Most functional categories involve cytoplasmic proteins
Secretion category more associated with subcellular localization and possible subnetworks that would be easy to add to an existing cell network
(Figure label: bacterial cell)

31 What does all this mean?
1. Acquired genes may come from a large pool of genes of which many are still uncharacterized?
2. Acquired genes = acquired subnetworks …that involve interactions that cross cell membranes?
3. Which predicted gene dataset you use can have a significant effect on downstream analyses.
4. Analyzing correlations is difficult! Keep testing those hypotheses!

32 Future studies
1. Vary the analysis approach
- Same result with other functional category classification systems
- More precise criteria for identifying islands
- Different dinucleotide bias calculation?
2. Examine in the context of gene expression data
3. Statistical modeling of the data (Dana Aeschliman and Jenny Bryan)

33 How can we best combat pathogens?
A. Identify pathogen proteins more likely to be…
1. …virulence factors - VGS Database and IslandPath
2. …quickly accessible to drugs/immune system (cell surface) - PSORT-B
B. Identify human genes involved in boosting our innate immune system
Summary of insights and lessons learned…

34 Subcellular Localization Prediction Annotation Experimental design Functions Drug/vaccine targets

35 www.psort.org/psortb
Web-based subcellular localization prediction tool
Score for each of 5 primary Gram-negative localization sites
– PSORT I does not predict extracellular proteins
– Also returns “unknown” (PSORT I forces a prediction)
Trained and tested using a dataset of proteins of experimentally verified subcellular localization
– Constructed manually through literature review
– Largest dataset of its kind
Analyzes 6 biological features using 6 modules
– More comprehensive than existing tools

36 PSORT-B Modules
Signal peptides: Non-cytoplasmic
Amino acid composition/patterns: Cytoplasmic → All localizations
- Support Vector Machines trained with aa composition → subsequences
Transmembrane helices: Inner membrane
- HMMTOP
PROSITE motifs: All localizations
Outer membrane motifs: Outer membrane
- Association-rule mining to identify
Homology to proteins of experimentally known localization: All localizations
- “SCL-BLAST” against database of proteins of known localizations
- E=10e-10 and length restriction of 80-120% vs both subject and query
Integration with a Bayesian Network
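
The SCL-BLAST acceptance criteria can be illustrated with a small filter. This is a sketch, not PSORT-B code: the function name and parameters are hypothetical, and it assumes that the “80-120%” restriction compares the alignment length with both the query length and the subject length, and that the E-value cutoff is inclusive.

```python
def passes_scl_blast_filter(evalue, alignment_len, query_len, subject_len,
                            evalue_cutoff=10e-10, low=0.80, high=1.20):
    """Accept a BLAST hit only if it is significant and roughly full-length.

    evalue_cutoff is written as on the slide (10e-10); the length rule is an
    assumed reading of 'length restriction of 80-120% vs both subject and query'.
    """
    if evalue > evalue_cutoff:
        return False
    for reference_len in (query_len, subject_len):
        ratio = alignment_len / reference_len
        if not (low <= ratio <= high):
            return False
    return True
```

Under this reading, a strong hit that covers only half of a much longer subject protein is rejected, which guards against transferring a localization on the basis of a single shared domain.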

37 Of Precision, Recall and Accuracy…
PSORT-B designed for high precision (97% specificity, TP/(TP+FP))
– PSORT I’s specificity measured at 59%
However, recall lower (75% sensitivity, TP/(TP+FN)), which affects overall measure of accuracy
– PSORT I recall 60%
New version to be released this year
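
The two ratios on this slide, TP/(TP+FP) and TP/(TP+FN), can be illustrated with a predictor that is allowed to answer “unknown”. The sketch below is a simplified overall tally with hypothetical names and toy data: an “unknown” (`None`) is never a false positive, but the protein still counts as missed, which is one way precision can stay high while recall drops.

```python
def precision_and_recall(predicted, actual):
    """Overall precision = TP/(TP+FP) and recall = TP/(TP+FN).

    A prediction of None means 'unknown': no false positive is produced,
    but the protein's true localization is counted as a false negative.
    """
    tp = sum(p is not None and p == a for p, a in zip(predicted, actual))
    fp = sum(p is not None and p != a for p, a in zip(predicted, actual))
    fn = len(actual) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: C = cytoplasmic, CM = cytoplasmic membrane,
# P = periplasmic, OM = outer membrane.
predicted = ["C", "C", None, "OM", "P"]
actual = ["C", "CM", "P", "OM", "P"]
precision, recall = precision_and_recall(predicted, actual)
```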

38 Insights Gained During Development
Localization is a highly evolutionarily conserved trait
– Conserved between Gram-positives and Gram-negatives (for localizations present in both classes)
– Reflection of the:
Need for cell to conserve subcellular networks?
Different environments of each localization?

39 Insights Gained During Development
Identified motifs characteristic of outer membrane proteins through a data mining approach (Martin Ester, Ke Wang, and others)
– Motifs (~6 aa long) map primarily to periplasmic turn regions of known 3D structures
– May reflect importance of periplasmic turns in a transmembrane beta-barrel structure vs. other similar non-membrane barrel structures
(Figure label: Periplasmic turns)

40 Analysis of bacterial proteomes
What proportion of proteins are of a particular subcellular localization?
Investigating the hypothesis:
– The proportion of membrane proteins increases in those organisms inhabiting a greater variety of environments
Analysis of the deduced proteomes from 77 bacterial genome projects.

45
PSORT-B prediction | Proportion of total predicted proteins | St. dev.
Cytoplasmic | 30% | 5.9%
Cytoplasmic Membrane | 57% | 5.8%
Periplasmic | 7.6% | 3.1%
Outer Membrane | 3.8% | 1.9%
Extracellular | 1.3% | 0.8%
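
A table of this form can be produced by computing, for each genome, the percentage of predicted proteins assigned to each localization, then taking the mean and standard deviation across genomes. The sketch below uses hypothetical names and toy data. It assumes that proteins with no prediction (`None`) are left out of the denominator, which is consistent with the rows above summing to roughly 100%, and that the sample standard deviation is used.

```python
from collections import Counter
from statistics import mean, stdev


def localization_percentages(predictions):
    """Percentage of predicted proteins per localization for one genome."""
    counts = Counter(p for p in predictions if p is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("genome has no predicted localizations")
    return {site: 100.0 * n / total for site, n in counts.items()}


def summarize(genomes):
    """Mean and standard deviation of each localization's percentage
    across genomes (requires at least two genomes)."""
    per_genome = [localization_percentages(p) for p in genomes.values()]
    sites = sorted({site for pct in per_genome for site in pct})
    summary = {}
    for site in sites:
        values = [pct.get(site, 0.0) for pct in per_genome]
        summary[site] = (mean(values), stdev(values))
    return summary


# Hypothetical toy input: genome name -> one predicted site per protein.
toy = {
    "genome_1": ["Cytoplasmic"] * 3 + ["Periplasmic", None],
    "genome_2": ["Cytoplasmic"] * 2 + ["Periplasmic"] * 2,
}
table = summarize(toy)
```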

46 What does this mean?
1. Protein localization is very conserved
2. Increased genome size = increase in networks. Therefore, conservation in localization proportions indicates that new networks being added tend to traverse localizations
3. Note: Can’t discount biases in unpredicted proteins, but new PSORT-B version will help confirm results

47 Summary
Converting pathogens and boosting rapid defenses may be the way to win the war against pathogens
Identifying virulence factors is critical
Acquired genes, including virulence factors, may come from a large pool of genes that are predominantly uncharacterized.
Acquired genes = acquired subnetworks that involve interactions that tend to traverse subcellular boundaries.

48 www.pathogenomics.sfu.ca/brinkman The Brinkman Lab Genome Prairie Genome BC Inimex NSERC Ray Karsten Geoff Sébastien Matt Jenn Will Mike Fiona Anastasia “The other Alison Fiona” Dana Aeschliman Jenny Bryan Martin Ester Ke Wang Rong She Christopher Walsh All Software freely available and open source

49 FPMI INDUSTRY Inimex Pharma Inc ACADEMIA VIDO, U Sask UBC, SFU, BCGSC GOVERNMENT Genome Canada Genome Prairie Genome BC Govt of Saskatchewan Functional Pathogenomics of Mucosal Immunity www.pathogenomics.ca
