Published by Hendri Kusuma. Modified over 6 years ago.
2
Life on Earth
Thought to be ~3.8 billion years old
For the first 1.5 billion years, it was all aquatic microbes
3
Diversity of bacteria and archaea
Only ~1% of all microbial species can be cultured
97% of prokaryotic isolates in stock centers are from just 4 phyla:
  Proteobacteria (Escherichia, Helicobacter, Pseudomonas)
  Firmicutes (Bacillus, Streptococcus, Staphylococcus)
  Actinobacteria (Mycobacterium)
  Bacteroidetes
4
Unculturable microbes
Hugenholtz P (2002) Genome Biology 3, 1-8
5
Environmental sampling: two approaches
SSU rRNA
  Common to all cells
  Can be amplified with universal PCR primers
  Does not reveal metabolic diversity
Random genomic DNA
  Obtained by blindly cloning mixed DNA samples
  Difficult to know what you have got!
  But this is where the genes underlying metabolic diversity are
6
SSU rRNA Phylogeny
Pace NR (1997) Science 276, 734
7
Genome sequencing
The first microbe (Haemophilus influenzae) was sequenced in 1995
Since then, well over 100 genomes have been completed
Over 500 genome projects are ongoing
8
Diversity of sequenced genomes
[Figure: Representation of completed genome sequences over time (x-axis) and size (y-axis, in Mb, logarithmic scale), labeled according to their social impact. Genomes from Archaea (squares), Bacteria (circles) and Eukarya (triangles) are colored according to their academic (blue), medical (pink), agricultural (light green), ecological (dark green) and industrial (black) relevance.]
Janssen et al. (2003) Genome Biology 4, 402
9
Genome projects online
10
60K protein families and counting
Clustering of 311,256 proteins from 83 complete genomes
Kunin et al. (2003) Genome Biology 4, 401
11
Shotgun sequencing
Random size-selected clones are sequenced from both ends (mate pairs)
Overlapping sequences are assembled in contigs
Contigs connected by mate pairs are assembled in scaffolds
The total number of bases sequenced divided by the length of the genome is the coverage
Coverage must be at least 8X to avoid large gaps
[Figure: chromosome, mate pair reads, contigs & scaffolds]
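The coverage definition on this slide can be worked through as a short sketch. The genome size, read length and read count below are hypothetical, and the gap estimate assumes the idealized Lander-Waterman model (reads placed uniformly at random), which is not stated on the slide.

```python
import math

def coverage(total_bases_sequenced: float, genome_length: float) -> float:
    """Coverage = total bases sequenced / genome length (as defined on the slide)."""
    return total_bases_sequenced / genome_length

def expected_fraction_uncovered(c: float) -> float:
    """Assumption: under uniformly random read placement (Lander-Waterman),
    the chance that a given base is hit by no read is about exp(-c)."""
    return math.exp(-c)

# Hypothetical numbers, chosen only for illustration.
genome_length = 2_000_000   # 2 Mb genome
read_length = 500           # bases per read
n_reads = 32_000

c = coverage(n_reads * read_length, genome_length)   # 16,000,000 / 2,000,000 = 8.0
uncovered = expected_fraction_uncovered(c)
print(f"coverage = {c:.1f}X, expected uncovered fraction = {uncovered:.5f}")
```

Under this model the uncovered fraction falls off exponentially with coverage, which is one way to see why a threshold such as 8X leaves few large gaps.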
12
What if we combined environmental sampling and shotgun sequencing?
How many genomes would be sampled and from what organisms?
How many novel genes would be discovered?
How many genomes could we completely assemble?
13
Current estimates of microbial community diversity
Curtis et al. (2002, PNAS 99) estimated that there are up to 160 species in a typical millilitre of seawater, and somewhere between 6,400 and 38,000 in a typical gram of soil.
14
The Sargasso Sea
A sea with no coastline (bounded by ocean currents)
It moves! Generally between the West Indies and the Azores
Water is very placid (the ‘doldrums’)
Covered by a lens of warm, nutrient-poor water and a vast mat of algae (Sargassum)
As simple a microbial community as is likely to be found in the ocean
15
The Sargasso Sea
16
Sargassum (can you spot the nudibranchs?)
© 2000 by Image Quest 3-D
17
Institute for Biological Energy Alternatives
Founded by J. Craig Venter, who also founded TIGR and Celera
The IBEA
  “is dedicated to exploring solutions for carbon sequestration using microbes, microbial pathways, and plants.”
  “will develop and use microbial pathways and microbial metabolism to produce fuels with higher energy content in an environmentally sound fashion”.
  “will undertake genome engineering to better understand the evolution of cellular life and how these cell components function together in a living system”.
For example, genomics could be applied to enhance the ability of terrestrial and oceanic microbial communities to remove carbon from the atmosphere.
18
Sampling scheme
1700 liters of surface water sampled from four different sites
  Most during season of winter nutrient upwelling in February
  Some during nutrient-poor season in May
Filters allowed only cells in the micron range
  Excluded dissolved DNA and free virus
  Excluded most eukaryotes
19
Sampling sites and ocean chlorophyll levels (February)
20
Lots and lots of sequences
2 million cloned fragments 2-6 kbp in size were sequenced
This yielded 1.6 billion base pairs total
  1 billion bp non-redundant
For comparison, the human genome is 3 billion bp
21
Assembly issues
Organisms differ in
  abundance
  genome size
Cannot rely on assumption that coverage is uniformly random
Some contigs will have extremely deep coverage, which is a challenge for assembly algorithms
Requires an iterative assembly process with lots of manual intervention and many unassembled sequences
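Why coverage cannot be uniform in a mixed sample can be illustrated with a small calculation. This is a sketch under simplifying assumptions that are not on the slide: DNA is sampled in proportion to each organism's share of total community DNA, and the two-member community with its abundances and genome sizes is invented for the example.

```python
def per_genome_coverage(total_bases, community):
    """Expected coverage of each genome in a mixed sample.

    community maps a name to (fraction of cells, genome size in bp).
    Assumption: an organism's share of sequenced bases equals its share of
    community DNA, i.e. cell fraction * genome size, normalized over all members.
    """
    dna_total = sum(frac * size for frac, size in community.values())
    result = {}
    for name, (frac, size) in community.items():
        bases_from_organism = total_bases * (frac * size) / dna_total
        result[name] = bases_from_organism / size
    return result

# Hypothetical community: one abundant small genome, one rare large genome.
community = {
    "abundant_small": (0.90, 2_000_000),
    "rare_large": (0.10, 6_000_000),
}
cov = per_genome_coverage(24_000_000, community)
for name, c in cov.items():
    print(f"{name}: {c:.1f}X")
```

In this toy case the same sequencing effort covers the abundant organism nine times more deeply than the rare one, so a single assembly pass sees both very deep and very shallow regions at once.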
22
Assembly results
Assembly was only successful in February sample
64,000 scaffolds, most less than 10 kbp
500,000 clones did not assemble
Of those with 3X or greater coverage
  About ½ could be classified taxonomically
  21 scaffolds with greater than 14X coverage
SNPs occur at 1/10 kbp, suggesting genetic diversity within ‘species’
Only two genomes were fully assembled, and then only with the aid of an existing reference sequence for both
23
Genome diversity in Prochlorococcus
[Figure: outer circle: Prochlorococcus marinus; colors represent role categories]
There are up to four structurally distinct scaffolds aligning with a given interval on the reference genome
24
Unexpected sequences
Relatives of Burkholderia and Shewanella, typical of much more nutrient-rich environments, possibly living off of ‘marine snow’
At least two abundant Archaeal organisms, typical of much greater depths (200 meters)
At least 10 mega-plasmids, many with genes related to trace metal utilization or toxicity
Not too surprising
  Some phage genomes, presumably integrated
  About 70 different eukaryote species (based mainly on the presence of 18S rDNA)
25
Homology of four scaffolds to reference crenarchaeal sequence 4B7
26
How many new genes are discovered?
1.2 million genes identified
  Equal to the number of genes submitted to the Swissprot/TrEMBL database over the last 8 years!
Interesting findings
  Ammonium oxidation in Archaea, which was previously unknown
  Widespread presence of genes allowing unconventional forms of phosphate uptake
  Only ~37 Rubisco sequences were found, but ~800 proteorhodopsin-like genes
27
Can we estimate the relative abundance of the taxa?
How good of a marker is rDNA?
  Universal primers may miss some taxa
  Genomes vary in rDNA copy number
Examined five other gene markers to measure relative abundance of different taxa
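The copy-number caveat can be made concrete with a short sketch. The taxon names, sequence counts and copy numbers are hypothetical, and dividing by copy number is a generic correction, not necessarily the procedure used in the study.

```python
def relative_abundance(rdna_counts, copy_number):
    """Turn rDNA sequence counts into relative cell abundances.

    Each count is divided by that taxon's rDNA copies per genome, and the
    results are normalized so the fractions sum to 1.
    """
    cells = {taxon: rdna_counts[taxon] / copy_number[taxon] for taxon in rdna_counts}
    total = sum(cells.values())
    return {taxon: n / total for taxon, n in cells.items()}

# Hypothetical data: equal raw rDNA counts, unequal copy numbers.
counts = {"taxon_A": 100, "taxon_B": 100}
copies = {"taxon_A": 1, "taxon_B": 4}

abundance = relative_abundance(counts, copies)
for taxon, frac in abundance.items():
    print(f"{taxon}: {frac:.2f}")
```

Raw counts would suggest a 50/50 split, while the corrected estimate is 80/20, which is why single-copy marker genes are an attractive cross-check.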
29
How many species are there?
Number of distinct SSU genes
  1164 in February
  248 in May
  Dominated by proteobacteria
  148 are new phylotypes (at 97% identity)
Can we estimate how many remain to be sampled?
  From a model of assembly completeness, one can estimate 1,800 to 48,000 species in the combined sample
  With 5-10 fold deeper coverage, one can estimate that 50 genomes could be fully assembled
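Counting phylotypes at a 97% identity cutoff can be sketched as a greedy clustering over sequences. This is an illustration only: it assumes the sequences are already aligned to equal length, uses toy 100-nt strings, and is a simple order-dependent heuristic rather than the method used in the study.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_phylotypes(seqs, threshold=0.97):
    """Keep one representative per phylotype.

    A sequence starts a new phylotype only if it is below the identity
    threshold against every representative chosen so far.
    """
    representatives = []
    for s in seqs:
        if not any(identity(s, r) >= threshold for r in representatives):
            representatives.append(s)
    return representatives

# Toy sequences (hypothetical): a 100-nt reference and two variants.
base = "ACGT" * 25
near = "CA" + base[2:]      # 2 mismatches, 98% identity to base
far = "CATGC" + base[5:]    # 5 mismatches, 95% identity to base

reps = greedy_phylotypes([base, near, far])
print(len(reps), "phylotypes")
```

With these inputs the 98% variant joins the reference phylotype and the 95% variant founds a new one; a different input order can change which sequences end up as representatives.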
30
Patchiness
Patchiness is well documented in marine macro-organisms in the open ocean, but is not known for microbial communities
Of the species represented by assemblies with more than 50 fragments, more than half differed in abundance among sites
31
Proteorhodopsin
Proteorhodopsin was recently discovered in an uncultured lineage of planktonic Gammaproteobacteria
  Related to the light-driven proton pump bacteriorhodopsins
  Is part of a non-chlorophyll based photosynthetic pathway
  About 70 sequences are known, mostly from proteobacteria
~800 proteorhodopsins were sequenced
  Can be grouped into 13 subfamilies
  Only four of which are found in cultured organisms
  Seven are only known from the Sargasso Sea
32
Rhodopsin-like sequences
33
Is this the future?
Cost of sequencing decreases steadily
The more reference sequences exist, the easier data analysis will become
Many proteins can be expressed even when organisms cannot be cultured
How else could we sample the genetic diversity of unculturable microbes?