Mastering Microbes with Microchips
Fiona Brinkman, Department of Molecular Biology and Biochemistry


Fiona Brinkman, Department of Molecular Biology and Biochemistry, Simon Fraser University, Greater Vancouver, British Columbia, Canada

What I won’t talk about!
1. Pseudomonas Genome Database: a model for continually updated genome annotation and analysis
2. Microarray analysis software development for the Pathogenomics (FPMI) Project

How can we best combat infectious-disease-causing bacteria?


Pathogens and The Art of War “What is of supreme importance in war is to attack the enemy's strategy. Next best is to disrupt his alliances by diplomacy. The next best is to attack his army. And the worst policy is to attack cities.”

Pathogens and The Art of War “And the worst policy is to attack cities.”

Infectious Diseases – There must be a better way…
• Leading cause of productivity loss
• Responsible for two thirds of deaths of persons under age 40

Pathogens and The Art of War: “What is of supreme importance in war is to attack the enemy's strategy.” Strategy = virulence factors

Pathogens and The Art of War: “Attack your enemy where he is unprepared” = boost the innate immune system

How can we best combat pathogens?
A. Identify pathogen proteins more likely to be…
1. …virulence factors: VGS Database and IslandPath
2. …quickly accessible to drugs/the immune system (cell surface): PSORT-B
B. Identify human genes involved in boosting our innate immune system
Summary of insights and lessons learned…

Virulence Gene Subset (VGS) Database
• Based on literature analysis
• Experimentally determined virulence factors
• Extensive information in separate fields:
  – Species information
  – Gene/protein information
  – Gene knockout information relevant to virulence studies
  – Infection assay information
  – References

Horizontal Gene Transfer and Virulence Factors
• Transposons: ST enterotoxin genes in E. coli
• Prophages: Shiga-like toxins in EHEC, diphtheria toxin gene, cholera toxin, botulinum toxins
• Plasmids: Shigella, Salmonella, Yersinia
• Pathogenicity islands: uropathogenic/enteropathogenic E. coli, Salmonella typhimurium, Yersinia spp., Helicobacter pylori, Vibrio cholerae

Pathogenicity islands are associated with:
• Atypical %G+C
• tRNA sequences
• Transposases, integrases and other mobility genes
• Flanking repeats

IslandPath: Aiding identification of pathogenicity islands and other genomic islands
Legend of the graphical display:
• Yellow circle = high %G+C
• Pink circle = low %G+C
• Strikethrough line = region of unusual dinucleotide bias
• tRNA gene lies between the two dots
• rRNA gene lies between the two dots
• Both tRNA and rRNA lie between the two dots
• Dot is annotated as a transposase
• Dot is annotated as an integrase
Hsiao et al. (2003) Bioinformatics 19:
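The %G+C colouring can be sketched as follows. The slides do not state the cutoff (only that one was used in Brinkman et al., 2002), so this sketch assumes a hypothetical rule: an ORF is "high" or "low" when its %G+C lies more than `n_sd` population standard deviations from the mean over all ORFs. `gc_percent` and `classify_gc` are illustrative names, not IslandPath functions.

```python
from statistics import mean, pstdev


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def classify_gc(orfs: list[str], n_sd: float = 1.0) -> list[str]:
    """Label each ORF 'high', 'low' or 'typical' relative to the mean over all ORFs.

    n_sd is a hypothetical stand-in for the published %G+C cutoff.
    """
    values = [gc_percent(orf) for orf in orfs]
    mu, sd = mean(values), pstdev(values)
    labels = []
    for v in values:
        if v > mu + n_sd * sd:
            labels.append("high")  # yellow circle in the legend
        elif v < mu - n_sd * sd:
            labels.append("low")  # pink circle in the legend
        else:
            labels.append("typical")
    return labels


print(classify_gc(["GCGC", "ATAT", "ACGT", "ACGT", "ACGT"]))
# → ['high', 'low', 'typical', 'typical', 'typical']
```

With real data the list would hold every predicted ORF in the genome, so the mean and standard deviation describe the genome's own composition.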

Dinucleotide bias analysis
The genome is divided into “ORF-clusters” of 6 consecutive ORFs. The dinucleotide relative abundance of a region is calculated as

$$\rho^*_{XY} = \frac{f^*_{XY}}{f^*_X \, f^*_Y}$$

where $f^*_X$ denotes the frequency of the mononucleotide X and $f^*_{XY}$ the frequency of the dinucleotide XY. For each ORF-cluster, the average absolute dinucleotide relative abundance difference is

$$\delta^*(f, g) = \frac{1}{16} \sum_{XY} \left| \rho^*_{XY}(f) - \rho^*_{XY}(g) \right|$$

where the sum runs over all 16 dinucleotides, f (fragment) is derived from the sequences in an ORF-cluster, and g (genome) is derived from all predicted ORFs in the genome. Hsiao et al. (2003) Bioinformatics 19:
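A minimal sketch of the two formulas for ρ* and δ*, assuming Karlin's convention that the starred frequencies are symmetrized by counting both the sequence and its reverse complement. Characters other than A, C, G, T are skipped. `symmetrized_rho` and `delta_star` are hypothetical names, not IslandPath's code.

```python
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def symmetrized_rho(seq: str) -> dict[str, float]:
    """rho*_XY = f*_XY / (f*_X f*_Y), counting the sequence and its reverse complement."""
    mono = dict.fromkeys(BASES, 0)
    di = dict.fromkeys(DINUCLEOTIDES, 0)
    for strand in (seq.upper(), reverse_complement(seq)):
        for base in strand:
            if base in mono:
                mono[base] += 1
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:  # skips pairs containing N or other symbols
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for pair in DINUCLEOTIDES:
        f_x, f_y = mono[pair[0]] / n_mono, mono[pair[1]] / n_mono
        f_xy = di[pair] / n_di
        # rho is undefined if a base never occurs; this sketch uses 0.0 then.
        rho[pair] = f_xy / (f_x * f_y) if f_x and f_y else 0.0
    return rho


def delta_star(fragment: str, genome: str) -> float:
    """delta*(f, g): mean absolute rho* difference over the 16 dinucleotides."""
    rho_f, rho_g = symmetrized_rho(fragment), symmetrized_rho(genome)
    return sum(abs(rho_f[p] - rho_g[p]) for p in DINUCLEOTIDES) / 16
```

Both inputs must contain at least one ACGT dinucleotide, otherwise the frequency denominators are zero. Because both strands are counted, a sequence and its reverse complement give identical ρ* values.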

Dinucleotide bias analysis
• “ORF-clusters” are sampled in an overlapping manner (shifted by one ORF at a time)
• The mean is calculated by averaging the results from all ORF-clusters in the genome
• Regions more than 1 standard deviation away from the mean are marked on the IslandPath graphical display with strikethrough lines
Why did we use 6 ORFs per cluster?
• There are not enough bp in a single ORF to get a good estimate
• 4.5 kb (corresponding to approximately 6-8 ORFs) is required for “reliable estimation of nucleotide composition” (Lawrence and Ochman, J Mol Evolution :383-97)
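The sliding ORF-cluster procedure can be sketched as below. `score` is any function mapping a cluster's concatenated sequence to a bias value, for example δ*(f, g) computed against the genome. Two assumptions are not in the slides: only clusters above the mean (more atypical than average) are flagged, and the population standard deviation is used. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence


def orf_clusters(orfs: Sequence[str], size: int = 6) -> list[list[str]]:
    """Overlapping windows of `size` consecutive ORFs, shifted by one ORF at a time."""
    return [list(orfs[i:i + size]) for i in range(len(orfs) - size + 1)]


def flag_biased_clusters(
    orfs: Sequence[str],
    score: Callable[[str], float],
    size: int = 6,
    n_sd: float = 1.0,
) -> list[int]:
    """Indices of ORF-clusters scoring more than n_sd SDs above the genome-wide mean."""
    scores = [score("".join(cluster)) for cluster in orf_clusters(orfs, size)]
    mu, sd = mean(scores), pstdev(scores)
    # Assumption: only the upper tail (unusually high bias) is flagged.
    return [i for i, s in enumerate(scores) if s > mu + n_sd * sd]


# Toy run with sequence length as a stand-in score: every cluster that
# contains the single long ORF (index 10) is flagged.
orfs = ["A"] * 10 + ["A" * 50] + ["A"] * 10
print(flag_biased_clusters(orfs, score=len))  # → [5, 6, 7, 8, 9, 10]
```

A genome needs at least `size` ORFs for any cluster to exist; with fewer, `mean` raises `StatisticsError`.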

[Figure: boxes labelled I to X mark known islands in the Salmonella typhi genome]

What features best predict islands? We examined the prevalence of features in over 200 known islands:
• 94% of islands have >25% dinucleotide bias coverage (the majority have >75% dinucleotide bias coverage)
• Mobility genes were identified in >75% of islands (but their identification has recently improved)
• Atypical %G+C (above the cutoff used in Brinkman et al., 2002) did not exceed 50% coverage on average, and tRNA genes were not observed with >50% of known islands
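The coverage figures above can be computed along these lines. The slides do not define how coverage is measured, so this sketch assumes it is the fraction of an island's base pairs overlapped by feature regions (for example, dinucleotide-biased stretches), with half-open (start, end) coordinates. The coordinates below are made up for illustration.

```python
def merge_intervals(regions: list[tuple[int, int]]) -> list[list[int]]:
    """Merge overlapping half-open (start, end) intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coverage(island: tuple[int, int], regions: list[tuple[int, int]]) -> float:
    """Fraction of an island's bp covered by the (merged) feature regions."""
    start, end = island
    covered = sum(
        max(0, min(end, r_end) - max(start, r_start))
        for r_start, r_end in merge_intervals(regions)
    )
    return covered / (end - start)


def fraction_of_islands_above(islands, regions, threshold: float = 0.25) -> float:
    """Share of islands whose feature coverage exceeds `threshold`."""
    return sum(coverage(isl, regions) > threshold for isl in islands) / len(islands)


islands = [(0, 100), (200, 300)]
biased = [(0, 30), (20, 50), (250, 260)]
print(fraction_of_islands_above(islands, biased))  # → 0.5
```

Merging first keeps overlapping feature regions from being counted twice.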

[Figure: boxes labelled I to X mark “insertions” in the Salmonella typhi genome versus Salmonella typhimurium]

Properties of genes in these islands?
• Defined a “putative island” as:
  – 8 or more genes in a row with dinucleotide bias
• Functional category analysis
• Any difference for genes in islands versus the rest of the genome?
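The “putative island” definition (8 or more consecutive genes with dinucleotide bias) amounts to finding runs in a per-gene flag list. A minimal sketch, assuming genes are given in genome order with one boolean bias flag each; `putative_islands` is a hypothetical name.

```python
def putative_islands(biased: list[bool], min_run: int = 8) -> list[tuple[int, int]]:
    """(start, end) gene-index ranges, end exclusive, of runs of >= min_run biased genes."""
    runs = []
    start = None
    for i, flag in enumerate(biased):
        if flag and start is None:
            start = i  # a run of biased genes begins
        elif not flag and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    # A run may continue to the last gene.
    if start is not None and len(biased) - start >= min_run:
        runs.append((start, len(biased)))
    return runs


flags = [False] * 3 + [True] * 8 + [False] + [True] * 5 + [False] + [True] * 9
print(putative_islands(flags))  # → [(3, 11), (18, 27)]
```

The run of 5 biased genes is not reported because it is shorter than the 8-gene threshold.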

Hypothetical genes are more common in putative islands than in the rest of the genome (paired t-test across 66 organisms: p = 4e-19)
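The paired t-test compares, organism by organism, the proportion of hypothetical genes inside putative islands with the proportion in the rest of the genome. This sketch computes only the t statistic and degrees of freedom with the standard library; the input values are invented for illustration.

```python
import math
from statistics import mean, stdev


def paired_t(island: list[float], rest: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-organism proportions."""
    if len(island) != len(rest):
        raise ValueError("need one island and one rest-of-genome value per organism")
    diffs = [a - b for a, b in zip(island, rest)]
    n = len(diffs)
    # t = mean difference / standard error of the differences
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


t, df = paired_t([0.30, 0.40, 0.50], [0.20, 0.20, 0.20])
print(round(t, 2), df)  # t ≈ 3.46 with 2 degrees of freedom
```

If SciPy is available, `scipy.stats.ttest_rel(island, rest)` returns the same statistic together with a two-sided p value.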

Why are hypothetical genes more common within putative islands/dinucleotide-biased regions?
1. Genes being horizontally acquired in bacteria come from a large pool of as-yet unstudied genes?
2. Genes are being mispredicted within these regions because of the region’s different genomic composition?
Testing hypothesis 2:
• Genes <300 bp in size are more likely to be false positives
• Therefore, remove genes less than 300 bp and reanalyze
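The reanalysis for hypothesis 2 can be sketched for one organism: drop genes shorter than 300 bp, then recompute the proportion of hypothetical genes inside and outside putative islands. The `Gene` record and its fields are hypothetical; the resulting pairs, one per organism, are what a paired t-test would compare.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    length_bp: int
    hypothetical: bool  # annotated as a hypothetical protein
    in_island: bool  # lies in a putative island


def _proportion_hypothetical(genes: list[Gene]) -> float:
    return sum(g.hypothetical for g in genes) / len(genes) if genes else float("nan")


def hypothetical_proportions(genes: list[Gene], min_length: int = 300) -> tuple[float, float]:
    """(island, rest-of-genome) proportions of hypothetical genes, ignoring short genes."""
    kept = [g for g in genes if g.length_bp >= min_length]
    inside = [g for g in kept if g.in_island]
    outside = [g for g in kept if not g.in_island]
    return _proportion_hypothetical(inside), _proportion_hypothetical(outside)


genes = [
    Gene(150, True, True), Gene(900, True, True), Gene(600, False, True),
    Gene(1200, False, False), Gene(450, False, False), Gene(200, True, False),
]
print(hypothetical_proportions(genes, min_length=0))  # all genes kept
print(hypothetical_proportions(genes))  # → (0.5, 0.0)
```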

P value of Paired T test (55 organisms): 0.027

P value of Paired T test (66 organisms): 3e-17

Other categories more common in islands (COG functional category: paired t-test p value; hypothesis to test)
• Translation, ribosomal structure and biogenesis: p = 4.6e-8; ribosome operons are highly expressed, so they have unusual base composition and are falsely identified as islands
• Cell motility: p = 6e-3; a mix of the above and below hypotheses
• Secretion: p = 0.02; reflects the nature of acquired subnetworks and how they must interact with the environment?

Acquiring genes = acquiring subnetworks
• Most functional categories involve cytoplasmic proteins
• The secretion category is more associated with subcellular localization, and possibly with subnetworks that would be easy to add to an existing cell network
[Figure: bacterial cell]

What does all this mean?
1. Acquired genes may come from a large pool of genes, many of which are still uncharacterized?
2. Acquired genes = acquired subnetworks… that involve interactions that cross cell membranes?
3. Which predicted gene dataset you use can have a significant effect on downstream analyses.
4. Analyzing correlations is difficult! Keep testing those hypotheses!

Future studies
1. Vary the analysis approach:
  – Check for the same result with other functional category classification systems
  – More precise criteria for identifying islands
  – Different dinucleotide bias calculation?
2. Examine in the context of gene expression data
3. Statistical modeling of the data (Dana Aeschliman and Jenny Bryan)

How can we best combat pathogens?
A. Identify pathogen proteins more likely to be…
1. …virulence factors: VGS Database and IslandPath
2. …quickly accessible to drugs/the immune system (cell surface): PSORT-B
B. Identify human genes involved in boosting our innate immune system
Summary of insights and lessons learned…

Subcellular Localization Prediction informs:
• Annotation
• Experimental design
• Functions
• Drug/vaccine targets

PSORT-B
• Web-based subcellular localization prediction tool
• Score for each of 5 primary Gram-negative localization sites
  – PSORT I does not predict extracellular proteins
  – Also returns “unknown” (PSORT I forces a prediction)
• Trained and tested using a dataset of proteins of experimentally verified subcellular localization
  – Constructed manually through literature review
  – Largest dataset of its kind
• Analyzes 6 biological features using 6 modules
  – More comprehensive than existing tools

PSORT-B Modules (feature: localizations predicted, method)
• Signal peptides: non-cytoplasmic
• Amino acid composition/patterns: cytoplasmic; all localizations (Support Vector Machines trained with amino acid composition and subsequences)
• Transmembrane helices: inner membrane (HMMTOP)
• PROSITE motifs: all localizations
• Outer membrane motifs: outer membrane (identified by association-rule mining)
• Homology to proteins of experimentally known localization: all localizations (“SCL-BLAST” against a database of proteins of known localization; E=10e-10 and a length restriction of % vs both subject and query)
Module outputs are integrated with a Bayesian network.
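An SCL-BLAST-style call can be sketched as a filter over precomputed BLAST hits: keep hits that pass an E-value cutoff and whose alignment covers a minimum fraction of both query and subject, then report the localization of the best remaining hit. The `Hit` record is hypothetical, and both thresholds are parameters because the slide's length percentage is missing; the values in the example are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    localization: str  # known localization of the subject protein
    evalue: float
    aln_length: int
    query_length: int
    subject_length: int


def scl_blast_call(hits: list[Hit], max_evalue: float, min_coverage: float) -> Optional[str]:
    """Localization of the lowest-E-value hit passing both filters, or None (unknown)."""
    passing = [
        h for h in hits
        if h.evalue <= max_evalue
        and h.aln_length / h.query_length >= min_coverage
        and h.aln_length / h.subject_length >= min_coverage
    ]
    if not passing:
        return None
    return min(passing, key=lambda h: h.evalue).localization


hits = [
    Hit("Cytoplasmic", 1e-5, 300, 310, 320),  # fails the E-value cutoff
    Hit("OuterMembrane", 1e-30, 280, 300, 290),  # passes both filters
    Hit("Periplasmic", 1e-40, 50, 300, 60),  # alignment too short for the query
]
print(scl_blast_call(hits, max_evalue=1e-10, min_coverage=0.8))  # → OuterMembrane
```

Returning None rather than a guess mirrors PSORT-B's option to answer “unknown”.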

Of Precision, Recall and Accuracy…
• PSORT-B was designed for high precision (97% specificity, TP/(TP+FP))
  – PSORT I’s specificity was measured at 59%
• However, recall is lower (75% sensitivity, TP/(TP+FN)), which affects the overall measure of accuracy
  – PSORT I recall: 60%
• New version to be released this year
(TP = true positives, FP = false positives, FN = false negatives)
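The trade-off can be seen by computing per-localization precision and recall when the predictor may answer “Unknown”. An abstention is never a false positive, so it leaves precision untouched, but it is a false negative for the protein's true class and so lowers recall. Labels and data here are invented for illustration.

```python
def per_class_metrics(true_labels: list[str], predicted: list[str], cls: str) -> tuple[float, float]:
    """Precision TP/(TP+FP) and recall TP/(TP+FN) for one localization class."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if p == cls and t == cls)
    fp = sum(1 for t, p in pairs if p == cls and t != cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)  # includes "Unknown"
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


true = ["OM", "OM", "OM", "OM", "CP", "CP"]
pred = ["OM", "OM", "OM", "Unknown", "CP", "Unknown"]
print(per_class_metrics(true, pred, "OM"))  # → (1.0, 0.75)
```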

Insights Gained During Development
• Localization is a highly evolutionarily conserved trait
  – Conserved between Gram-positives and Gram-negatives (for localizations present in both classes)
  – A reflection of the need for the cell to conserve subcellular networks? Or of the different environments of each localization?

Insights Gained During Development
• Identified motifs characteristic of outer membrane proteins through a data mining approach (Martin Ester, Ke Wang, and others)
  – Motifs (~6 aa long) map primarily to periplasmic turn regions of known 3D structures
  – May reflect the importance of periplasmic turns in a transmembrane beta-barrel structure vs. other similar non-membrane barrel structures
[Figure: periplasmic turns]
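The idea of mining motifs that characterize outer membrane proteins can be illustrated with a much simpler stand-in than the association-rule approach named above: keep 6-residue substrings present in at least `min_support` of outer membrane sequences and in at most `max_background` of other sequences. This is not the method used for PSORT-B; thresholds and sequences are hypothetical.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Distinct length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def support(sequences: list[str], k: int) -> dict[str, float]:
    """Fraction of sequences containing each k-mer at least once."""
    counts: Counter[str] = Counter()
    for seq in sequences:
        counts.update(kmers(seq, k))
    return {motif: n / len(sequences) for motif, n in counts.items()}


def om_characteristic_motifs(om_seqs, other_seqs, k=6, min_support=0.5, max_background=0.1):
    """k-mers frequent in outer membrane sequences but rare in the others."""
    om, background = support(om_seqs, k), support(other_seqs, k)
    return sorted(
        motif for motif, s in om.items()
        if s >= min_support and background.get(motif, 0.0) <= max_background
    )


om = ["MKYGNAGWVK", "AAYGNAGWLL", "QQQQQQQQQQ"]
other = ["MKYGNAGAAA", "LLLLLLLLLL"]
print(om_characteristic_motifs(om, other))  # → ['YGNAGW']
```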

Analysis of bacterial proteomes
• What proportion of proteins are of a particular subcellular localization?
• Investigating the hypothesis:
  – The proportion of membrane proteins increases in those organisms inhabiting a greater variety of environments
• Analysis of the deduced proteomes from 77 bacterial genome projects

PSORT-B prediction: proportion of total predicted proteins (standard deviation)
• Cytoplasmic: 30% (5.9%)
• Cytoplasmic membrane: 57% (5.8%)
• Periplasmic: 7.6% (3.1%)
• Outer membrane: 3.8% (1.9%)
• Extracellular: 1.3% (0.8%)
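Summary figures of this kind can be produced by computing localization proportions for each proteome and then the mean and standard deviation across genomes. Assumptions: proportions are taken over proteins that received a localization call (so “Unknown” is excluded), the sample standard deviation is used, and the category strings are illustrative labels.

```python
from collections import Counter
from statistics import mean, stdev

CATEGORIES = ["Cytoplasmic", "CytoplasmicMembrane", "Periplasmic", "OuterMembrane", "Extracellular"]


def localization_proportions(predictions: list[str]) -> dict[str, float]:
    """Per-proteome proportions, using only proteins with a localization call."""
    called = [p for p in predictions if p in CATEGORIES]  # drops "Unknown"
    counts = Counter(called)
    return {c: counts[c] / len(called) for c in CATEGORIES}


def summarize(proteomes: list[list[str]]) -> dict[str, tuple[float, float]]:
    """Mean and sample standard deviation of each proportion across proteomes."""
    per_genome = [localization_proportions(p) for p in proteomes]
    return {
        c: (mean([g[c] for g in per_genome]), stdev([g[c] for g in per_genome]))
        for c in CATEGORIES
    }


proteomes = [
    ["Cytoplasmic", "Cytoplasmic", "CytoplasmicMembrane", "Unknown"],
    ["Cytoplasmic", "CytoplasmicMembrane"],
]
for category, (avg, sd) in summarize(proteomes).items():
    print(f"{category}: {avg:.1%} (SD {sd:.1%})")
```

`stdev` needs at least two proteomes, and each proteome needs at least one protein with a localization call.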

What does this mean?
1. Protein localization is very conserved
2. Increased genome size = increase in networks. Therefore, conservation in localization proportions indicates that new networks being added tend to traverse localizations
3. Note: we can’t discount biases in unpredicted proteins, but the new PSORT-B version will help confirm the results

Summary
• Converting pathogens and boosting rapid defenses may be the way to win the war against pathogens
• Identifying virulence factors is critical
• Acquired genes, including virulence factors, may come from a large pool of genes that are predominantly uncharacterized
• Acquired genes = acquired subnetworks that involve interactions that tend to traverse subcellular boundaries

The Brinkman Lab Genome Prairie Genome BC Inimex NSERC Ray Karsten Geoff Sébastien Matt Jenn Will Mike Fiona Anastasia “The other Alison Fiona” Dana Aeschliman Jenny Bryan Martin Ester Ke Wang Rong She Christopher Walsh
All software is freely available and open source.

Functional Pathogenomics of Mucosal Immunity (FPMI)
• Industry: Inimex Pharma Inc
• Academia: VIDO, U Sask, UBC, SFU, BCGSC
• Government: Genome Canada, Genome Prairie, Genome BC, Govt of Saskatchewan