What's going on in the environment? Getting a grip on microbial physiology with genomics and metagenomics Rob Edwards Fellowship.

1 What's going on in the environment? Getting a grip on microbial physiology with genomics and metagenomics. Rob Edwards (http://phage.sdsu.edu/~rob). Fellowship for Interpretation of Genomes; San Diego State University; Burnham Institute for Medical Research; IMEC, LLC. SIO, San Diego, May 2006.

2 Outline: Sequencing statistics scare skeptics; The SEED database; Some simply stunning Subsystems; Mysterious missing methionine metabolism; Marine metabolism mined from metagenomics; Fabulous four-five-four for facile functional findings; Marine phage most puzzling

3 The Players: FIG, Fellowship for Interpretation of Genomes; NMPDR, National Microbial Pathogen Data Resource; BRC, NIH Bioinformatics Resource Centers; SEED, the SEED database.

4-7 How Many Genomes Have Been Sequenced? (the table is built up one row per slide)
Archaea: 26 complete, 12 draft, 38 total
Bacteria: 342 complete, 238 draft, 580 total
Eukarya: 29 complete, 533 draft, 562 total

8 When will the 1,000th microbial genome be sequenced? [Figure: number of complete genomes (y axis, 1,000 to 5,000) plotted by year (x axis, 1996 to 2008).]
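A common way to approach the question on slide 8 is to fit exponential growth to the yearly count of complete genomes and solve for the year at which the fit crosses 1,000. The sketch below does this with an ordinary least-squares fit of log(count) against year. The function names are hypothetical, and because the plotted yearly counts cannot be recovered from the slide, the usage runs on toy numbers, not the talk's data.

```python
import math


def fit_exponential(years, counts):
    """Least-squares fit of log(count) = a + b * year; returns (a, b)."""
    if len(years) != len(counts) or len(years) < 2:
        raise ValueError("need at least two (year, count) pairs")
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("need at least two distinct years")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx            # growth rate per year, on the log scale
    a = mean_y - b * mean_x  # intercept
    return a, b


def year_reaching(target, years, counts):
    """Year at which the fitted exponential reaches `target` genomes."""
    a, b = fit_exponential(years, counts)
    if b <= 0:
        raise ValueError("counts are not growing, so the target is never reached")
    return (math.log(target) - a) / b


if __name__ == "__main__":
    # Toy series that doubles every year; NOT the counts plotted in the talk.
    toy_years = [2000, 2001, 2002, 2003]
    toy_counts = [10, 20, 40, 80]
    print(round(year_reaching(1000, toy_years, toy_counts), 2))
```

A straight line on a log scale is the simplest growth model; if the real curve bends, the extrapolated year should be read as a rough guide only.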

9 Outline: Sequencing statistics scare skeptics; The SEED database; Some simply stunning Subsystems; Mysterious missing methionine metabolism; Marine metabolism mined from metagenomics; Fabulous four-five-four for facile functional findings; Marine phage most puzzling

10 The SEED database, developed by FIG (http://theseed.uchicago.edu/FIG/index.cgi). Current version: 580 Bacteria (342 complete), 38 Archaea (26 complete), 562 Eukarya (29 complete), 1,335 viruses, and 2 environmental genomes.

11 The problem: How do you generate consistent annotations for 1,000 genomes?

12 Basic biology [Figure: the lac genes lacZ, lacI, lacY, lacA]

13 Different types of clustering (< 80% identity)

14 Average occurrence of clustering in different genomes. [Figure: for each group (Actinobacteria, Aquificae, Bacteroidetes, Chlamydiae, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Spirochaetes, Thermotogae, Proteobacteria), the fraction of genes in clusters (0 to 1) for clusters of genes with at most 80% identity and for genes in subsystems, plotted alongside the total number of genomes in the group (0 to 120).]
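The clustering signal on slides 12 to 14 rests on functionally related genes, such as the lac genes, sitting next to each other in many genomes. A minimal sketch of that signal, assuming each genome is a list of hypothetical (function, start, strand) tuples: it collects function pairs found within `max_gap` bp on the same strand and keeps pairs supported by at least `min_genomes` genomes. The 5,000 bp window and two-genome threshold are illustrative choices, and the slide's 80% identity criterion is not modeled; the sketch assumes the input genomes were already chosen to be sufficiently distinct.

```python
from collections import defaultdict

# Hypothetical data model: a gene is (function, start, strand) and a
# genome is a list of genes.


def close_pairs(genome, max_gap):
    """Function pairs whose genes start within max_gap bp on the same strand."""
    genes = sorted(genome, key=lambda g: g[1])
    pairs = set()
    for i, (f1, s1, st1) in enumerate(genes):
        for f2, s2, st2 in genes[i + 1:]:
            if s2 - s1 > max_gap:
                break  # genes are sorted by start, so later ones are farther away
            if st1 == st2 and f1 != f2:
                pairs.add(tuple(sorted((f1, f2))))
    return pairs


def conserved_clusters(genomes, max_gap=5000, min_genomes=2):
    """Function pairs clustered in at least min_genomes genomes."""
    support = defaultdict(set)
    for name, genome in genomes.items():
        for pair in close_pairs(genome, max_gap):
            support[pair].add(name)
    return {pair: sorted(names) for pair, names in support.items()
            if len(names) >= min_genomes}


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    genomes = {
        "genome_A": [("lacZ", 100, "+"), ("lacY", 3200, "+"),
                     ("lacA", 4500, "+"), ("recA", 90000, "+")],
        "genome_B": [("lacZ", 500, "-"), ("lacY", 3600, "-"),
                     ("recA", 20000, "+")],
    }
    print(conserved_clusters(genomes))
```

Only the lacY/lacZ pair is close together in both toy genomes, so it is the only pair reported.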

15 Outline: Sequencing statistics scare skeptics; The SEED database; Some simply stunning Subsystems; Mysterious missing methionine metabolism; Marine metabolism mined from metagenomics; Fabulous four-five-four for facile functional findings; Marine phage most puzzling

16 The Subsystems Approach to Annotation. A subsystem is a generalization of a “pathway”: a collection of functional roles jointly involved in a biological process or complex. A functional role is the abstract biological function of a gene product; it may be atomic or user-defined, for example 6-phosphofructokinase (EC 2.7.1.11), LSU ribosomal protein L31p, or Streptococcal virulence factors. A functional role does not contain qualifiers such as “putative” or “thermostable”. A populated subsystem is a complete spreadsheet of functions and roles.
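These definitions map naturally onto a small data model: roles are the columns, organisms are the rows, and the cells hold gene IDs. A sketch under those assumptions, with hypothetical class and method names (this is not the SEED's own code), including a check that rejects role names carrying qualifiers like “putative” or “thermostable”:

```python
from dataclasses import dataclass, field

# Qualifiers that the slide says a functional role must not contain.
FORBIDDEN_QUALIFIERS = ("putative", "thermostable")


@dataclass(frozen=True)
class FunctionalRole:
    name: str    # e.g. "6-phosphofructokinase (EC 2.7.1.11)"
    abbrev: str  # spreadsheet column header, e.g. "HutH"

    def __post_init__(self):
        lowered = self.name.lower()
        for word in FORBIDDEN_QUALIFIERS:
            if word in lowered:
                raise ValueError(f"role {self.name!r} contains qualifier {word!r}")


@dataclass
class Subsystem:
    name: str
    roles: list[FunctionalRole]  # the spreadsheet columns, in order
    # Populated spreadsheet: organism -> role abbreviation -> gene IDs
    cells: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def add_gene(self, organism: str, abbrev: str, gene_id: str) -> None:
        """Place an annotated gene in the cell for (organism, role)."""
        if abbrev not in {r.abbrev for r in self.roles}:
            raise KeyError(f"{abbrev!r} is not a role of {self.name!r}")
        self.cells.setdefault(organism, {}).setdefault(abbrev, []).append(gene_id)

    def row(self, organism: str) -> list[list[str]]:
        """One spreadsheet row in column order; an empty list means no gene."""
        genes = self.cells.get(organism, {})
        return [genes.get(r.abbrev, []) for r in self.roles]
```

Keying the cells by role rather than by gene reflects the rule that inclusion in a subsystem is only by functional role.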

17 Subsystems are developed based on: wet lab data; chromosomal context; metabolic context; phylogenetic context; microarray data; proteomics data; …

18 Example Subsystem: Histidine Degradation. Conversion of histidine to glutamate. Functional roles are defined in a table. Inclusion in the subsystem is only by functional role. Controlled vocabulary …

19 Subsystem Spreadsheet. Column headers are taken from the table of functional roles. Rows are selected genomes or organisms. Cells are populated with specific, annotated genes. Functional variants are defined by the annotated roles; variant code -1 indicates that the subsystem is not functional. Clustering is shown by color.
Role columns: HutH, HutU, HutI, GluF, HutG, NfoD, ForI (each row below lists only its filled cells)
Bacteroides thetaiotaomicron (variant 1): Q8A4B3, Q8A4A9, Q8A4B1, Q8A4B0
Desulfotalea psychrophila (variant 1): gi51246205, gi51246204, gi51246203, gi51246202
Halobacterium sp. (variant 2): Q9HQD5, Q9HQD8, Q9HQD6, Q9HQD7
Deinococcus radiodurans (variant 2): Q9RZ06, Q9RZ02, Q9RZ05, Q9RZ04
Bacillus subtilis (variant 2): P10944, P25503, P42084, P42068
Caulobacter crescentus (variant 3): P58082, Q9A9M1, P58079, Q9A9M0, Q9A9L9
Pseudomonas putida (variant 3): Q88CZ7, Q88CZ6, Q88CZ9, Q88D00, Q88CZ3
Xanthomonas campestris (variant 3): Q8PAA7, P58988, Q8PAA6, Q8PAA8, Q8PAA5
Listeria monocytogenes (no genes listed)

20 “The Populated Subsystem”: the subsystem spreadsheet from slide 19, repeated.

21 Subsystem Diagram: three functional variants. A universal subset of three roles is followed by three alternative paths from IV to VI. No ForI is known experimentally.
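One way to read the variant codes is as rules over which roles a genome carries: all three universal roles plus one complete alternative path, or -1 if none applies. The mapping of paths to variants below is an assumption based on the standard histidine-utilization chemistry (GluF, HutG, or ForI followed by NfoD converting formiminoglutamate to glutamate), chosen to be consistent with the variant 3 rows listing five genes; the function name is hypothetical.

```python
# ASSUMPTION: role sets per variant are inferred from histidine-utilization
# chemistry, not read off the slide.
UNIVERSAL = {"HutH", "HutU", "HutI"}
VARIANT_PATHS = {
    1: {"GluF"},          # formiminoglutamate -> glutamate via GluF
    2: {"HutG"},          # ... via HutG
    3: {"ForI", "NfoD"},  # ... via ForI, then NfoD (two steps)
}


def assign_variant(roles_present):
    """Variant code for one genome, or -1 if the subsystem is not functional."""
    roles = set(roles_present)
    if not UNIVERSAL <= roles:
        return -1
    # If several paths are complete, this sketch reports the lowest code.
    for code, path in sorted(VARIANT_PATHS.items()):
        if path <= roles:
            return code
    return -1
```

Note that a genome annotated with NfoD but not ForI falls to -1 here, which is exactly the kind of gap the slide flags by noting that no ForI is known experimentally.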

22 Subsystem Spreadsheet: prediction from subsystems confirmed experimentally (the spreadsheet from slide 19, repeated).

23 Outline: Sequencing statistics scare skeptics; The SEED database; Some simply stunning Subsystems; Mysterious missing methionine metabolism; Marine metabolism mined from metagenomics; Fabulous four-five-four for facile functional findings; Marine phage most puzzling

24 How do bacteria make methionine? Acquire homoserine; convert cysteine to cystathionine; convert cystathionine to homocysteine; then acquire Met, or convert homocysteine to methionine. Alternative route: sulfur and acetylhomoserine sulfhydrylase.

25

26 [Figure: pathway diagram with steps marked “?”: missing genes.]
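The “?” marks are steps whose genes cannot be found in a genome that evidently makes methionine. A sketch of how such gaps can be flagged, assuming each step is listed with its alternative sets of roles, so that a step counts as filled if every role in at least one alternative is annotated. The role names are common enzyme names used for illustration, not the SEED's exact controlled vocabulary, the constant and function names are hypothetical, and uptake routes (“acquire homoserine”, “acquire Met”) are not modeled.

```python
# Hypothetical step list: (step name, alternative sets of required roles).
METHIONINE_STEPS = [
    ("make homoserine", [["Homoserine dehydrogenase"]]),
    ("activate homoserine", [["Homoserine O-acetyltransferase"],
                             ["Homoserine O-succinyltransferase"]]),
    ("make homocysteine", [
        # transsulfuration via cystathionine (two enzymes)
        ["Cystathionine gamma-synthase", "Cystathionine beta-lyase"],
        # direct sulfhydrylation (one enzyme)
        ["O-acetylhomoserine sulfhydrylase"],
    ]),
    ("make methionine", [["Methionine synthase (MetE)"],
                         ["Methionine synthase (MetH)"]]),
]


def missing_steps(steps, annotated_roles):
    """Names of steps for which no alternative is fully annotated."""
    have = set(annotated_roles)
    return [name for name, alternatives in steps
            if not any(set(alt) <= have for alt in alternatives)]
```

Listing alternatives per step keeps a genome that uses the sulfhydrylase route from being reported as missing the cystathionine enzymes.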

27 Cyanoseed: http://cyanoseed.theFIG.info

28 Marineseed: http://theseed.uchicago.edu/FIG/organisms.cgi?show=marine

29 Subsystems are not just for gene clusters. They can also be based on: predicted or measured co-regulation; genome context (virulence islands, prophages, conserved gene clusters); virulence mechanism; cellular localization; enzymatic activity; common phenotype; or combinations of criteria.

30 How much progress has been made? 541 subsystems encoded; 80-85% of the genes in core machinery are contained in subsystems; 30-35% of the genes in NMPDR organism genomes, and 20-30% of the genes in other genomes, are contained in subsystems.

31 Outline: Sequencing statistics scare skeptics; The SEED database; Some simply stunning Subsystems; Mysterious missing methionine metabolism; Marine metabolism mined from metagenomics; Fabulous four-five-four for facile functional findings; Marine phage most puzzling

32 Metagenomics (Breitbart et al., multiple papers). Starting material: 200 liters of water or 5-500 g of fresh fecal matter. Steps: concentrate and purify viruses; epifluorescent microscopy; extract nucleic acids (DNA/RNA); LASL (linker-amplified shotgun library); sequence.

33 Control datasets for metagenome comparisons. Number of proteins in different datasets:
Bacteria: 952,758
Archaea: 49,694
Eukarya: 259,653
Acid mine: 7,588
Sargasso (without Shewanella, Burkholderia): 960,561
Sorcerer II: ~13,000,000

34 Subsystems per million CDS
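Because the datasets on slide 33 differ in size by orders of magnitude (7,588 to roughly 13,000,000 proteins), raw subsystem counts are not comparable; scaling to hits per million CDS, count × 10^6 / total CDS, puts them on one axis. A minimal sketch, assuming each CDS carries at most one subsystem label and that `total_cds` also counts CDS with no subsystem; the function name is hypothetical.

```python
from collections import Counter


def per_million_cds(subsystem_labels, total_cds):
    """Subsystem counts scaled to hits per million CDS in the dataset."""
    if total_cds <= 0:
        raise ValueError("total_cds must be positive")
    counts = Counter(subsystem_labels)
    return {s: n * 1_000_000 / total_cds for s, n in counts.items()}
```

For example, two hits to one subsystem in a dataset of 1,000 CDS become 2,000 per million.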

35 Determination of Statistical Differences Between Metagenomes: take 10,000 proteins from sample 1 and count the frequency of each subsystem; repeat 20,000 times. Repeat for sample 2. Combine both samples, sample 10,000 proteins from the combined set 20,000 times, and build a 95% CI. Compare the medians from samples 1 and 2 with the 95% CI. Rodriguez-Brito (2006). BMC Bioinformatics
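The resampling procedure above can be sketched in Python. This is a minimal illustration, not Rodriguez-Brito et al.'s implementation: it assumes sampling with replacement, treats the combined-sample distribution as the null, and flags a subsystem when either sample's median falls outside the pooled 95% interval (one reading of the slide). Function names are hypothetical. Pure Python is slow at the slide's 10,000 × 20,000 scale, so the usage uses smaller values.

```python
import random
from collections import Counter
from statistics import median


def bootstrap_freqs(proteins, n, reps, rng):
    """Resample n proteins `reps` times (with replacement) and record each
    subsystem's relative frequency in every replicate."""
    labels = sorted(set(proteins))
    freqs = {s: [] for s in labels}
    for _ in range(reps):
        counts = Counter(rng.choices(proteins, k=n))
        for s in labels:
            freqs[s].append(counts[s] / n)
    return freqs


def compare_metagenomes(sample1, sample2, n=10_000, reps=20_000,
                        alpha=0.05, seed=0):
    """Per subsystem: (median1, median2, ci_low, ci_high, differs).

    sample1 and sample2 are lists with one subsystem label per protein."""
    rng = random.Random(seed)
    f1 = bootstrap_freqs(sample1, n, reps, rng)
    f2 = bootstrap_freqs(sample2, n, reps, rng)
    pooled = bootstrap_freqs(sample1 + sample2, n, reps, rng)  # null model
    results = {}
    for s, dist in pooled.items():
        null = sorted(dist)
        lo = null[int(alpha / 2 * reps)]
        hi = null[int((1 - alpha / 2) * reps) - 1]
        m1 = median(f1.get(s, [0.0]))  # absent from a sample -> frequency 0
        m2 = median(f2.get(s, [0.0]))
        differs = not (lo <= m1 <= hi and lo <= m2 <= hi)
        results[s] = (m1, m2, lo, hi, differs)
    return results


if __name__ == "__main__":
    a = ["A"] * 900 + ["B"] * 100
    b = ["A"] * 100 + ["B"] * 900
    for s, row in compare_metagenomes(a, b, n=500, reps=200).items():
        print(s, row)
```

Drawing the null from the combined samples asks whether each sample looks like a random draw from a population in which the two are indistinguishable.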

36 Sampling Sargasso and “SEED” metagenomes

37 Comparison of all subsystems. [Figure: subsystems more abundant in Sargasso vs. subsystems more abundant in SEED.]

38 Is serine being used as an osmolyte? There are few trehalose, proline, and sucrose synthesis genes. Serine is the most abundant amino acid in the ocean (Suttle, Keil). Serine is a more effective osmoprotectant than glycine betaine (Yancey).

39 Outline: Sequencing statistics scare skeptics; The SEED database; Some simply stunning Subsystems; Mysterious missing methionine metabolism; Marine metabolism mined from metagenomics; Fabulous four-five-four for facile functional findings; Marine phage most puzzling

40 Metagenomics: the workflow from slide 32 (Breitbart et al., multiple papers), now with an added “454 So 2004” label.

41 454 Sequence Data (only from the Rohwer Lab, in one year): 42 libraries (22 microbial, 20 phage); 1,028,563,420 bp total, which is 33% of the human genome, 95% of all complete and partial bacterial genomes, and 10% of JGI's community sequencing per year; 9,933,184 sequences, an average of 236,511 per library; average read length 103.5 bp, which has not increased in 12 months.
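Two of the slide's derived figures can be checked directly from its totals. The human genome size used below (about 3.1 Gb) is an assumption, not a number stated on the slide.

```python
# Totals copied from the slide.
TOTAL_BP = 1_028_563_420
N_READS = 9_933_184

# ASSUMPTION (not on the slide): a haploid human genome of about 3.1 Gb.
HUMAN_GENOME_BP = 3_100_000_000

mean_read_length = TOTAL_BP / N_READS        # bp per read
human_fraction = TOTAL_BP / HUMAN_GENOME_BP  # share of one human genome

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"fraction of a human genome: {human_fraction:.0%}")
```

This prints a mean read length of 103.5 bp and 33%, matching the slide.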

42 The Soudan Mine, Minnesota: red stuff (oxidized) and black stuff (reduced).

43 Red and Black Samples Are Different. Cloned and 454-sequenced 16S are indistinguishable. [Figure panels: black stuff; red; cloned red.]

44 There are different amounts of metabolism in each environment

45 There are different amounts of substrates in each environment. [Figure panels: black stuff; red stuff.]

46 But are the differences significant? Sample 10,000 proteins from site 1 and count the frequency of each “subsystem”; repeat 20,000 times. Repeat for site 2. Combine both samples, sample 10,000 proteins 20,000 times, and build a 95% CI. Compare the medians from sites 1 and 2 with the 95% CI. Rodriguez-Brito (2006). BMC Bioinformatics

47 Subsystem differences and metabolism: iron acquisition. Iron-acquisition subsystems in the black stuff: siderophore enterobactin biosynthesis; ferric enterobactin transport; ABC transporter ferrichrome; ABC transporter heme. Black stuff: ferrous iron (Fe²⁺; ferroan, (Mg,Fe)₆(Si,Al)₄O₁₀(OH)₈). Red stuff: ferric iron (goethite, FeO(OH)).

48 Nitrification differentiates the samples. Edwards (2006). BMC Genomics

49 The challenge is explaining the differences between samples. Red sample: Arg, Trp, His; ubiquinone; fatty acid oxidation; chemotaxis, flagella; methylglyoxal metabolism. Black sample: Ile, Leu, Val; siderophores; glycerolipids; NiFe hydrogenase; phenylpropionate degradation.

50 We can cheaply compare the important biochemistry happening in different environments. We don’t care which organisms are doing the metabolism, but we know what organisms are there.

51 Outline: Sequencing statistics scare skeptics; The SEED database; Some simply stunning Subsystems; Mysterious missing methionine metabolism; Marine metabolism mined from metagenomics; Fabulous four-five-four for facile functional findings; Marine phage most puzzling

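52 Phages in the World's Oceans: GOM (Gulf of Mexico), 41 samples, 13 sites, 5 years; SAR (Sargasso Sea), 1 sample, 1 site, 1 year; BBC (British Columbia), 85 samples, 38 sites, 8 years; ARC (Arctic Ocean), 56 samples, 16 sites, 1 year; LI (Line Islands), 4 sites, 1 year.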

53 Phages, Reefs, and Human Disturbance: the Northern Line Islands Expedition, 2005. [Map: Christmas, Kingman, Palmyra, Washington, Fanning.]

54 16S rDNA at each island

55 16S rDNA of the Proteobacteria

56 Phages at each island

57 Christmas to Kingman: bias in the number of phage hosts. Negative numbers mean relatively more phage hosts at Kingman.

58 Phages in the World's Oceans (the sampling overview from slide 52, repeated).

59 Most Marine Phage Sequences are Novel

60 Phages are specific to environments. [Figure: Phage Proteomic Tree v. 5 (Edwards, Rohwer), with ssDNA, λ-like, T7-like, and T4-like groups labeled.] Thanks: Mya Breitbart.

61 Marine Single-Stranded DNA Viruses: 6% of SAR sequences are from ssDNA phages (Chlamydia-like Microviridae), and 40% of the viral particles in SAR are ssDNA phages. Several full-genome sequences were recovered by de novo assembly of these fragments and confirmed by PCR and sequencing.

62 SAR Aligned Against the Chlamydia phi 4 Genome: 12,297 sequence fragments hit the ~4.5 kb genome using TBLASTX. [Figure panels: individual sequence reads; Chlamydia phi 4 genome; coverage; concatenated hits.]
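The coverage panel summarizes how the hits pile up along the ~4.5 kb genome. A minimal sketch of per-base depth using a difference array, assuming the hits have already been converted to 0-based, half-open genome coordinates (reversed pairs are flipped); the function names are hypothetical.

```python
def coverage(hits, genome_length):
    """Per-base depth from (start, end) hits in 0-based, half-open coordinates."""
    diff = [0] * (genome_length + 1)
    for start, end in hits:
        lo, hi = sorted((start, end))  # accept reversed (minus-strand) pairs
        lo, hi = max(lo, 0), min(hi, genome_length)
        if lo < hi:
            diff[lo] += 1  # depth rises where a hit starts...
            diff[hi] -= 1  # ...and falls just past where it ends
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def breadth(depth):
    """Fraction of positions covered by at least one hit."""
    return sum(1 for d in depth if d > 0) / len(depth) if depth else 0.0


if __name__ == "__main__":
    depth = coverage([(0, 5), (8, 3)], 10)
    print(depth, breadth(depth))
```

The difference array makes each hit cost constant time, so thousands of fragments over a few kilobases are cheap to summarize.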

63 Summary. You only need to remember: subsystems are the best way to annotate genomes; 454 generates lots of data; and we can use subsystems to find out what is going on in the environment.

64 Acknowledgments. SDSU: Forest Rohwer, Beltran Rodriguez-Brito, Linda Wegley. USF: Mya Breitbart. University of Bielefeld: Folker Meyer, Lutz Krause. FIG: Veronika Vonstein, Ross Overbeek, Gordon Pusch. ANL: Rick Stevens, Bob Olsen, Terry Disz. Annotators: Gary Olsen, Andrei Osterman, Olga Zagnitko, Olga Vassieva, Svetlana Gerdes, Ramy Aziz. UBC: Curtis Suttle, Amy Chan.

