Published by Estella Morton. Modified over 9 years ago.
1
What's going on in the environment? Getting a grip on microbial physiology with genomics and metagenomics
Rob Edwards
http://phage.sdsu.edu/~rob
Fellowship for Interpretation of Genomes, San Diego State University, Burnham Institute for Medical Research, IMEC, LLC
SIO, San Diego, May 2006
2
Outline
Sequencing statistics scare skeptics
The SEED database
Some simply stunning Subsystems
Mysterious missing methionine metabolism
Marine metabolism mined from metagenomics
Fabulous four-five-four for facile functional findings
Marine phage most puzzling
3
The Players
FIG: Fellowship for Interpretation of Genomes
NMPDR: Natl. Microbial Pathogen Data Resource
BRC: NIH Bioinformatics Resource Centers
SEED: The SEED database
7
How Many Genomes Have Been Sequenced?
Archaea: 26 complete, 12 draft, 38 total
Bacteria: 342 complete, 238 draft, 580 total
Eukarya: 29 complete, 533 draft, 562 total
8
When will the 1,000th microbial genome be sequenced?
[Figure: complete genomes (y-axis, 1,000 to 5,000) plotted against year (x-axis, 1996 to 2008)]
10
The SEED database developed by FIG
http://theseed.uchicago.edu/FIG/index.cgi
Current version:
580 Bacteria (342 complete)
38 Archaea (26 complete)
562 Eukarya (29 complete)
1335 Viruses
2 Environmental Genomes
11
The problem: How do you generate consistent annotations for 1,000 genomes?
12
Basic biology
[Figure labels: lacZ, lacI, lacY, lacA]
13
Different types of clustering < 80 %
14
Occurrence of clustering in different genomes
[Figure: chart by taxonomic group (Actinobacteria, Aquificae, Bacteroidetes, Chlamydiae, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Spirochaetes, Thermotogae, Proteobacteria, Average). Axes: fraction of genes in clusters (0 to 1); number of genomes (0 to 120). Series: clusters of genes w/ maximum 80% identity; genes in subsystems in clusters; total number of genomes in group.]
16
The Subsystems Approach to Annotation
Subsystem is a generalization of “pathway”
–collection of functional roles jointly involved in a biological process or complex
Functional Role is the abstract biological function of a gene product
–atomic, or user-defined, examples:
6-phosphofructokinase (EC 2.7.1.11)
LSU ribosomal protein L31p
Streptococcal virulence factors
–Does not contain “putative”, “thermostable”, etc.
Populated subsystem is complete spreadsheet of functions and roles
17
Subsystems developed based on
Wet lab
Chromosomal context
Metabolic context
Phylogenetic context
Microarray data
Proteomics data
…
18
Example Subsystem: Histidine Degradation
Conversion of histidine to glutamate
Functional roles defined in table
Inclusion in subsystem is only by functional role
Controlled vocabulary
…
19
Subsystem Spreadsheet
Column headers taken from table of functional roles
Rows are selected genomes or organisms
Cells are populated with specific, annotated genes
Functional variants defined by the annotated roles
Variant code -1 indicates subsystem is not functional
Clustering shown by color
Role columns: HutH, HutU, HutI, GluF, HutG, NfoD, ForI
Rows (organism, variant, annotated genes in slide order):
Bacteroides thetaiotaomicron, variant 1: Q8A4B3, Q8A4A9, Q8A4B1, Q8A4B0
Desulfotalea psychrophila, variant 1: gi51246205, gi51246204, gi51246203, gi51246202
Halobacterium sp., variant 2: Q9HQD5, Q9HQD8, Q9HQD6, Q9HQD7
Deinococcus radiodurans, variant 2: Q9RZ06, Q9RZ02, Q9RZ05, Q9RZ04
Bacillus subtilis, variant 2: P10944, P25503, P42084, P42068
Caulobacter crescentus, variant 3: P58082, Q9A9MI, P58079, Q9A9M0, Q9A9L9
Pseudomonas putida, variant 3: Q88CZ7, Q88CZ6, Q88CZ9, Q88D00, Q88CZ3
Xanthomonas campestris, variant 3: Q8PAA7, P58988, Q8PAA6, Q8PAA8, Q8PAA5
Listeria monocytogenes
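The spreadsheet layout described on this slide can be sketched as a small data structure. This is an illustrative sketch only, not the SEED's actual schema: the class name `SubsystemRow`, the helpers `roles_present` and `missing_roles`, the organism names, and the gene identifiers are hypothetical placeholders. Only the role names and the meaning of variant code -1 come from the slide.

```python
from dataclasses import dataclass, field

# Functional roles of the histidine degradation subsystem (the column headers).
ROLES = ["HutH", "HutU", "HutI", "GluF", "HutG", "NfoD", "ForI"]


@dataclass
class SubsystemRow:
    """One spreadsheet row: a genome, its variant code, and role -> genes cells."""
    organism: str
    variant: int                      # -1 means the subsystem is not functional
    cells: dict = field(default_factory=dict)

    def roles_present(self):
        """Roles with at least one annotated gene, in column order."""
        return [role for role in ROLES if self.cells.get(role)]

    def is_functional(self):
        return self.variant != -1


def missing_roles(row, expected_roles):
    """Roles expected for a variant that have no annotated gene in this row."""
    present = set(row.roles_present())
    return [role for role in expected_roles if role not in present]


# Hypothetical rows: placeholder gene identifiers, not real annotations.
rows = [
    SubsystemRow("organism A", 1, {
        "HutH": ["gene_a1"], "HutU": ["gene_a2"],
        "HutI": ["gene_a3"], "HutG": ["gene_a4"],
    }),
    SubsystemRow("organism B", -1),
]

for row in rows:
    print(row.organism, row.variant, row.roles_present())
```

Keeping the role list separate from the rows mirrors the slide's point that inclusion in a subsystem is by functional role: an empty cell in an otherwise functional variant is a candidate missing gene.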
20
“The Populated Subsystem”
21
Subsystem Diagram
Three functional variants
Universal subset has three roles, followed by three alternative paths from IV to VI
No ForI known experimentally
22
Subsystem Spreadsheet
Prediction from subsystems confirmed experimentally
24
How do bacteria make methionine?
acquire homoserine
convert cysteine to cystathionine
convert cystathionine to homocysteine
acquire met or convert homocysteine to methionine
sulfur and acetylhomoserine sulfhydrylase
26
? ? Missing genes
27
Cyanoseed: http://cyanoseed.theFIG.info
28
Marineseed: http://theseed.uchicago.edu/FIG/organisms.cgi?show=marine
29
Subsystems are not just for gene clusters
predicted or measured co-regulation
genome context (virulence islands, prophages, conserved gene clusters)
virulence mechanism
cellular localization
enzymatic activity
common phenotype
combinations of criteria
30
How much progress has been made?
541 subsystems encoded
80-85% of the genes in core machinery are contained in subsystems
30-35% of genes in NMPDR organism genomes and 20-30% of genes in other genomes are contained in subsystems
32
Metagenomics
200 liters water or 5-500 g fresh fecal matter
Concentrate and purify viruses (epifluorescent microscopy)
Extract nucleic acids (DNA/RNA)
LASL
Sequence
Breitbart et al., multiple papers
33
Control datasets for metagenome comparisons
Number of proteins in different datasets
Bacteria: 952,758
Archaea: 49,694
Eukarya: 259,653
Acid mine: 7,588
Sargasso (without Shewanella, Burkholderia): 960,561
Sorcerer II: ~13,000,000
34
Subsystems per million CDS
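The normalization named in this slide title can be sketched as follows. This is a minimal sketch, assuming the figure scales raw subsystem hit counts by the number of coding sequences (CDS) in each dataset so that datasets of very different sizes can be compared. The function name `per_million` and the example counts are hypothetical.

```python
def per_million(subsystem_counts, total_cds):
    """Scale raw subsystem hit counts to hits per million CDS."""
    if total_cds <= 0:
        raise ValueError("total_cds must be positive")
    return {
        name: count * 1_000_000 / total_cds
        for name, count in subsystem_counts.items()
    }


# Hypothetical counts for a dataset of 500,000 CDS.
example = per_million({"Histidine Degradation": 250}, 500_000)
print(example)
```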
35
Determination of Statistical Differences Between Metagenomes
Take 10,000 proteins from sample 1
Count frequency of each subsystem
Repeat 20,000 times
Repeat for sample 2
Combine both samples
Sample 10,000 proteins 20,000 times
Build 95% CI
Compare medians from samples 1 and 2 with 95% CI
Rodriguez-Brito (2006). BMC Bioinformatics
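The resampling steps on this slide can be sketched for a single subsystem as below. This is a hedged sketch, not the published implementation: it assumes each protein is represented by its subsystem label, that sampling is done with replacement, and that the 95% CI is an empirical percentile interval from the pooled resamples. The names `resample_counts`, `percentile_interval`, and `compare_subsystem`, and the example labels, are hypothetical.

```python
import random
import statistics


def resample_counts(proteins, subsystem, sample_size, repeats, rng):
    """Count `subsystem` in each of `repeats` random draws of `sample_size` proteins."""
    return [
        rng.choices(proteins, k=sample_size).count(subsystem)
        for _ in range(repeats)
    ]


def percentile_interval(values, level=0.95):
    """Empirical central interval covering `level` of the resampled values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[int(tail * last)], ordered[int(round((1.0 - tail) * last))]


def compare_subsystem(sample1, sample2, subsystem,
                      sample_size=10_000, repeats=20_000, seed=0):
    """Compare per-sample medians with a 95% CI built from the pooled samples."""
    rng = random.Random(seed)
    median1 = statistics.median(
        resample_counts(sample1, subsystem, sample_size, repeats, rng))
    median2 = statistics.median(
        resample_counts(sample2, subsystem, sample_size, repeats, rng))
    pooled = list(sample1) + list(sample2)      # "Combine both samples"
    low, high = percentile_interval(
        resample_counts(pooled, subsystem, sample_size, repeats, rng))
    return {
        "median1": median1,
        "median2": median2,
        "ci": (low, high),
        "sample1_differs": not (low <= median1 <= high),
        "sample2_differs": not (low <= median2 <= high),
    }


# Hypothetical label lists; small sizes keep the demonstration fast.
site1 = ["A"] * 900 + ["B"] * 100
site2 = ["A"] * 100 + ["B"] * 900
result = compare_subsystem(site1, site2, "A", sample_size=200, repeats=300)
print(result)
```

The defaults match the slide (10,000 proteins, 20,000 repeats), but in pure Python that is slow, so the example passes much smaller values. A median falling outside the pooled interval is read as a subsystem that is over- or under-represented in that sample.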
36
Sampling Sargasso and “SEED” metagenomes
37
Comparison of all Subsystems
[Figure labels: More in Sargasso; More in SEED]
38
Is serine being used as an osmolyte?
Few trehalose, proline, sucrose synthetic genes
Serine is most abundant amino acid in ocean (Suttle, Keil)
Serine is more effective osmoprotectant than glycine betaine (Yancey)
40
Metagenomics
200 liters water or 5-500 g fresh fecal matter
Concentrate and purify viruses (epifluorescent microscopy)
Extract nucleic acids (DNA/RNA)
LASL (So 2004), now 454
Sequence
Breitbart et al., multiple papers
41
454 Sequence Data (Only from Rohwer Lab, in one year)
42 libraries
–22 microbial, 20 phage
1,028,563,420 bp total
–33% of the human genome
–95% of all complete and partial bacterial genomes
–10% of community sequencing of JGI per year
9,933,184 sequences
–Average 236,511 per library
Average read length 103.5 bp
–Av. read length has not increased in 12 months
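The averages on this slide follow from the totals, as the short check below shows. It is a sketch under one stated assumption: the human genome size of about 3.1 billion bp is not given on the slide. Dividing the total reads evenly by 42 libraries gives roughly 236,504, slightly below the slide's 236,511, which may reflect how the per-library average was computed.

```python
# Totals taken from the slide.
TOTAL_BP = 1_028_563_420
TOTAL_READS = 9_933_184
LIBRARIES = 42

# Assumption (not on the slide): approximate human genome size in bp.
HUMAN_GENOME_BP = 3_100_000_000

mean_read_length = TOTAL_BP / TOTAL_READS
reads_per_library = TOTAL_READS / LIBRARIES
fraction_of_human = TOTAL_BP / HUMAN_GENOME_BP

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"reads per library: {reads_per_library:,.0f}")
print(f"fraction of human genome: {fraction_of_human:.0%}")
```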
42
The Soudan Mine, Minnesota
Red Stuff: Oxidized
Black Stuff: Reduced
43
Red and Black Samples Are Different
Cloned and 454 sequenced 16S are indistinguishable
[Figure labels: Black stuff; Red; Cloned Red]
44
There are different amounts of metabolism in each environment
45
There are different amounts of substrates in each environment
[Figure panels: Black Stuff; Red Stuff]
46
But are the differences significant?
Sample 10,000 proteins from site 1
Count frequency of each “subsystem”
Repeat 20,000 times
Repeat for site 2
Combine both samples
Sample 10,000 proteins 20,000 times
Build 95% CI
Compare medians from sites 1 and 2 with 95% CI
Rodriguez-Brito (2006). BMC Bioinformatics
47
Subsystem differences & metabolism
Iron acquisition
Black Stuff:
Siderophore enterobactin biosynthesis
ferric enterobactin transport
ABC transporter ferrichrome
ABC transporter heme
Black stuff: ferrous iron (Fe2+, ferroan [(Mg,Fe)6(Si,Al)4O10(OH)8])
Red stuff: ferric iron (goethite [FeO(OH)])
48
Nitrification differentiates the samples Edwards (2006) BMC Genomics
49
The challenge is explaining the differences between samples
Red Sample:
Arg, Trp, His
Ubiquinone
FA oxidation
Chemotaxis, Flagella
Methylglyoxal metabolism
Black Sample:
Ile, Leu, Val
Siderophores
Glycerolipids
NiFe hydrogenase
Phenylpropionate degradation
50
We can cheaply compare the important biochemistry happening in different environments
We don’t care which organisms are doing the metabolism, but we know what organisms are there
52
Phages In The World's Oceans
GOM: 41 samples, 13 sites, 5 years
SAR: 1 sample, 1 site, 1 year
BBC: 85 samples, 38 sites, 8 years
ARC: 56 samples, 16 sites, 1 year
LI: 4 sites, 1 year
53
Phages, Reefs, and Human Disturbance
The Northern Line Islands Expedition, 2005
[Figure labels: Christmas, Kingman, Palmyra, Washington, Fanning]
54
16S rDNA at each island
55
16S rDNA of the Proteobacteria
56
Phages at each island
57
Christmas to Kingman Bias in No. Phage Hosts
Negative numbers mean relatively more phage hosts at Kingman
59
Most Marine Phage Sequences are Novel
60
Phages are specific to environments
Phage Proteomic Tree v. 5 (Edwards, Rohwer)
[Figure labels: ssDNA, λ-like, T7-like, T4-like]
Thanks: Mya Breitbart
61
Marine Single-Stranded DNA Viruses
6% of SAR sequences ssDNA phage (Chlamydia-like Microviridae)
40% viral particles in SAR are ssDNA phage
Several full-genome sequences were recovered via de novo assembly of these fragments
Confirmed by PCR and sequencing
62
SAR Aligned Against the Chlamydia phi 4
12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome
[Figure labels: Individual sequence reads; Chlamydia phi 4 genome; Coverage; Concatenated hits]
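The coverage track in this figure can be approximated from hit coordinates, as sketched below. This is an assumption-laden sketch: it takes each hit as a 1-based inclusive (start, end) interval on the reference genome, such as could be parsed from tabular TBLASTX output, and the function names and example intervals are hypothetical.

```python
def coverage_profile(hits, genome_length):
    """Per-base depth from 1-based inclusive (start, end) hit intervals."""
    delta = [0] * (genome_length + 1)        # difference array
    for start, end in hits:
        lo, hi = sorted((start, end))        # reverse-strand hits may be flipped
        lo, hi = max(lo, 1), min(hi, genome_length)
        if lo > hi:
            continue                         # hit lies entirely off the genome
        delta[lo - 1] += 1
        delta[hi] -= 1
    depth, running = [], 0
    for change in delta[:genome_length]:
        running += change
        depth.append(running)
    return depth


def fraction_covered(depth):
    """Fraction of genome positions hit by at least one sequence fragment."""
    return sum(1 for d in depth if d > 0) / len(depth)


# Hypothetical hits on a 10 bp toy genome; the third is on the reverse strand.
depth = coverage_profile([(1, 3), (2, 5), (9, 8)], 10)
print(depth, fraction_covered(depth))
```

The difference array keeps the work proportional to the number of hits plus the genome length, which is comfortable for about 12,000 fragments over a genome of roughly 4.5 kb.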
63
Summary
You only need to remember:
Subsystems are the best way to annotate genomes
454 generates lots of data
We can use subsystems to find out what is going on in the environment
64
SDSU: Forest Rohwer, Beltran Rodriguez-Brito, Linda Wegley
USF: Mya Breitbart
University of Bielefeld: Folker Meyer, Lutz Krause
FIG: Veronika Vonstein, Ross Overbeek, Gordon Pusch
ANL: Rick Stevens, Bob Olsen, Terry Disz
Annotators: Gary Olsen, Andrei Osterman, Olga Zagnitko, Olga Vassieva, Svetlana Gerdes, Ramy Aziz
UBC: Curtis Suttle, Amy Chan