Life on Earth: thought to be ~3.8 billion years old


1

2 Life on Earth: thought to be ~3.8 billion years old
For the first 1.5 billion years, it was all aquatic microbes

3 Diversity of bacteria and archaea
Only ~1% of all microbial species can be cultured. 97% of prokaryotic isolates in stock centers come from just four phyla: Proteobacteria (Escherichia, Helicobacter, Pseudomonas), Firmicutes (Bacillus, Streptococcus, Staphylococcus), Actinobacteria (Mycobacterium) and Bacteroidetes.

4 Unculturable microbes
Hugenholtz P (2002) Genome Biology 3, 1-8

5 Environmental sampling: two approaches
SSU rRNA: common to all cells; can be amplified with universal PCR primers; does not reveal metabolic diversity. Random genomic DNA: obtained by blindly cloning mixed DNA samples; it is difficult to know what you have got, but this is where the genes underlying metabolic diversity are.
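Universal primers work because they target conserved stretches of the SSU rRNA gene, and they are often degenerate: an IUPAC ambiguity code such as M (A or C) lets a single primer match several sequence variants. The minimal sketch below scans a template for primer binding sites under that rule. The primer used in the example, 27F (AGAGTTTGATCMTGGCTCAG), is a commonly used bacterial 16S primer chosen only for illustration; the templates and function names are hypothetical. A real primer design tool would also check the reverse strand, melting temperature and where mismatches fall (mismatches near the 3' end matter most).

```python
# IUPAC nucleotide codes -> the bases each code is allowed to match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches_at(primer: str, template: str, pos: int,
                      max_mismatches: int = 0) -> bool:
    """True if the (possibly degenerate) primer binds the template at pos,
    allowing up to max_mismatches non-matching positions."""
    window = template[pos:pos + len(primer)]
    if len(window) < len(primer):
        return False
    mismatches = sum(1 for p, t in zip(primer, window) if t not in IUPAC[p])
    return mismatches <= max_mismatches


def find_primer_sites(primer: str, template: str,
                      max_mismatches: int = 0) -> list:
    """All forward-strand start positions where the primer binds."""
    primer, template = primer.upper(), template.upper()
    return [i for i in range(len(template) - len(primer) + 1)
            if primer_matches_at(primer, template, i, max_mismatches)]


if __name__ == "__main__":
    primer_27f = "AGAGTTTGATCMTGGCTCAG"  # illustrative degenerate primer
    # Hypothetical templates that differ only at the primer's M position.
    for template in ("GGAGAGTTTGATCCTGGCTCAGAA",   # C: matched by M
                     "GGAGAGTTTGATCATGGCTCAGAA",   # A: matched by M
                     "GGAGAGTTTGATCTTGGCTCAGAA"):  # T: a true mismatch
        print(template, find_primer_sites(primer_27f, template))
```

The third template shows how a "universal" primer can still miss a taxon whose target site carries an unexpected base, unless some mismatches are tolerated.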

6 SSU rRNA phylogeny. Pace NR (1997) Science 276, 734

7 Genome sequencing
The first microbial genome (Haemophilus influenzae) was sequenced in 1995. Since then, well over 100 genomes have been completed, and over 500 genome projects are ongoing.

8 Diversity of sequenced genomes
Figure: completed genome sequences plotted over time (x-axis) and by size (y-axis, Mb, logarithmic scale), labeled according to their social impact. Genomes from Archaea (squares), Bacteria (circles) and Eukarya (triangles) are colored according to their academic (blue), medical (pink), agricultural (light green), ecological (dark green) and industrial (black) relevance. Janssen et al. (2003) Genome Biology 4, 402

9 Genome projects online

10 60K protein families and counting
Clustering of 311,256 proteins from 83 complete genomes. Kunin et al. (2003) Genome Biology 4, 401

11 Shotgun sequencing
Random size-selected clones are sequenced from both ends (mate pairs). Overlapping sequences are assembled into contigs, and contigs connected by mate pairs are assembled into scaffolds. Coverage is the total number of bases sequenced divided by the length of the genome; it must be at least 8X to avoid large gaps.
(Figure: mate pair reads assembled into contigs and scaffolds along the chromosome.)
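Why 8X? Under the classic Lander-Waterman model, which assumes reads land uniformly at random, a given base is missed by every read with probability about e^(-c) at coverage c, and the expected number of contigs is about N·e^(-c) for N reads. The sketch below computes these quantities; the genome size, read length and read counts are hypothetical, and the contig formula ignores the minimum overlap an assembler needs to detect a join.

```python
import math


def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Coverage c: total bases sequenced divided by genome length."""
    return n_reads * read_len / genome_len


def fraction_uncovered(c: float) -> float:
    """Expected fraction of the genome hit by no read, assuming reads land
    uniformly at random (Lander-Waterman / Poisson model): e^(-c)."""
    return math.exp(-c)


def expected_contigs(n_reads: int, c: float) -> float:
    """Lander-Waterman expected number of contigs, N * e^(-c), ignoring the
    minimum overlap needed to detect a join."""
    return n_reads * math.exp(-c)


if __name__ == "__main__":
    # Hypothetical project: 2 Mbp genome, 800 bp reads.
    for n_reads in (2_500, 7_500, 20_000):
        c = coverage(n_reads, 800, 2_000_000)
        print(f"{c:4.1f}X  uncovered={fraction_uncovered(c):.4%}  "
              f"contigs~{expected_contigs(n_reads, c):.0f}")
```

Under these assumptions, 3X coverage leaves roughly 5% of bases unsequenced, while 8X leaves about 0.03% and only a handful of gaps.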

12 What if we combined environmental sampling and shotgun sequencing?
How many genomes would be sampled and from what organisms? How many novel genes would be discovered? How many genomes could we completely assemble?

13 Current estimates of microbial community diversity
Curtis et al. (2002, PNAS 99) estimated that there are up to 160 species in a typical millilitre of seawater, and somewhere between 6,400 and 38,000 in a typical gram of soil.

14 The Sargasso Sea
A sea with no coastline (bounded by ocean currents). It moves! It lies generally between the West Indies and the Azores. The water is very placid (the ‘doldrums’). It is covered by a lens of warm, nutrient-poor water and a vast mat of algae (Sargassum). As simple a microbial community as is likely to be found in the ocean.

15 The Sargasso Sea

16 Sargassum (can you spot the nudibranchs?)
© 2000 by Image Quest 3-D

17 Institute for Biological Energy Alternatives
Founded by J. Craig Venter, who also founded TIGR and Celera. The IBEA “is dedicated to exploring solutions for carbon sequestration using microbes, microbial pathways, and plants.” It “will develop and use microbial pathways and microbial metabolism to produce fuels with higher energy content in an environmentally sound fashion” and “will undertake genome engineering to better understand the evolution of cellular life and how these cell components function together in a living system”. For example, genomics could be applied to enhance the ability of terrestrial and oceanic microbial communities to remove carbon from the atmosphere.

18 Sampling scheme
1,700 liters of surface water were sampled from four different sites: most during the season of winter nutrient upwelling in February, some during the nutrient-poor season in May. Filters retained only cells in the micron size range, which excluded dissolved DNA, free viruses and most eukaryotes.

19 Sampling sites and ocean chlorophyll levels (February)

20 Lots and lots of sequences
2 million cloned fragments, 2-6 kbp in size, were sequenced. This yielded 1.6 billion base pairs in total, of which 1 billion bp were non-redundant. For comparison, the human genome is 3 billion bp.

21 Assembly issues
Organisms differ in abundance and genome size, so we cannot rely on the assumption that coverage is uniformly random. Some contigs will have extremely deep coverage, which is a challenge for assembly algorithms, since many assemblers treat unusually deep coverage as a sign of collapsed repeats. Assembly therefore requires an iterative process with lots of manual intervention, and leaves many sequences unassembled.
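To see why coverage is far from uniform in a mixed sample, assume (hypothetically) that sequenced DNA is drawn in proportion to each organism's cell abundance times its genome size. Organism i then receives total_bases · a_i / Σ(a_j · G_j) coverage, so a common organism can be covered hundreds of times while a rare one is barely sampled. The community below is invented for illustration; only the 1.6 billion bp total comes from the study, and the model ignores extraction, filtering and cloning bias.

```python
import math
from typing import Dict, Tuple


def per_organism_coverage(total_bases: float,
                          community: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """community maps a name to (relative cell abundance, genome size in bp).
    Assumes sequenced DNA is drawn in proportion to abundance * genome size,
    i.e. no extraction, filtering or cloning bias."""
    total_dna = sum(abundance * size for abundance, size in community.values())
    return {name: total_bases * (abundance * size / total_dna) / size
            for name, (abundance, size) in community.items()}


# Hypothetical three-member community: (relative abundance, genome size in bp).
EXAMPLE_COMMUNITY = {
    "dominant": (1000, 2_000_000),
    "moderate": (10, 4_000_000),
    "rare": (1, 6_000_000),
}

if __name__ == "__main__":
    cov = per_organism_coverage(1.6e9, EXAMPLE_COMMUNITY)
    for name, c in cov.items():
        # e^(-c): Lander-Waterman estimate of the fraction left unsequenced.
        print(f"{name:9s} {c:8.2f}X  unsequenced~{math.exp(-c):.1%}")
```

In this toy community the dominant genome gets about 780X coverage, the moderate one about 7.8X, and the rare one under 1X, leaving nearly half of it unsequenced.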

22 Assembly results
Assembly was successful only for the February sample. It produced 64,000 scaffolds, most shorter than 10 kbp; 500,000 clones did not assemble. Of the scaffolds with 3X or greater coverage, about half could be classified taxonomically. 21 scaffolds had greater than 14X coverage. SNPs occur at about 1 per 10 kbp, suggesting genetic diversity within ‘species’. Only two genomes were fully assembled, and then only with the aid of an existing reference sequence for both.

23 Genome diversity in Prochlorococcus
Figure: the outer circle is the Prochlorococcus marinus reference genome, with colors representing gene role categories. Up to four structurally distinct scaffolds align with a given interval on the reference genome.

24 Unexpected sequences
Relatives of Burkholderia and Shewanella, typical of much more nutrient-rich environments, possibly living off ‘marine snow’. At least two abundant archaeal organisms, typical of much greater depths (200 meters). At least 10 mega-plasmids, many with genes related to trace metal utilization or toxicity.
Not too surprising: some phage genomes, presumably integrated; about 70 different eukaryote species (based mainly on the presence of 18S rDNA).

25 Homology of four scaffolds to reference crenarchaeal sequence 4B7

26 How many new genes were discovered?
1.2 million genes were identified, equal to the number of genes submitted to the Swiss-Prot/TrEMBL database over the last 8 years! Interesting findings: ammonium oxidation in Archaea, which was previously unknown; widespread presence of genes allowing unconventional forms of phosphate uptake; only ~37 Rubisco sequences, but ~800 proteorhodopsin-like genes.

27 Can we estimate the relative abundance of the taxa?
How good a marker is rDNA? Universal primers may miss some taxa, and genomes vary in rDNA copy number. Five other gene markers were therefore examined to measure the relative abundance of different taxa.
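The copy-number problem can be made concrete. If a taxon carries k copies of the rRNA operon, it contributes roughly k times as many rDNA reads per genome, so raw read counts overstate its abundance. The sketch below divides each count by an assumed copy number and renormalizes; the taxa, counts and copy numbers are hypothetical. With a single-copy marker gene every divisor is 1 and the correction drops out, which is presumably part of the appeal of using other markers.

```python
from typing import Dict


def copy_number_corrected(read_counts: Dict[str, float],
                          copies_per_genome: Dict[str, float]) -> Dict[str, float]:
    """Convert marker-gene read counts into relative genome (cell) abundances.
    Each count is divided by the marker's copy number in that taxon, then the
    results are renormalized to sum to 1."""
    genome_equivalents = {taxon: read_counts[taxon] / copies_per_genome[taxon]
                          for taxon in read_counts}
    total = sum(genome_equivalents.values())
    return {taxon: g / total for taxon, g in genome_equivalents.items()}


if __name__ == "__main__":
    # Hypothetical: equal rRNA read counts, but taxon B carries 4 rRNA operons.
    print(copy_number_corrected({"A": 100, "B": 100}, {"A": 1, "B": 4}))
```

Raw counts suggest a 50/50 split; after correction taxon A makes up 80% of the genomes and taxon B 20%.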

28

29 How many species are there?
Number of distinct SSU genes: 1,164 in February and 248 in May, dominated by proteobacteria; 148 are new phylotypes (at 97% identity). Can we estimate how many remain to be sampled? From a model of assembly completeness, one can estimate 1,800 to 48,000 species in the combined sample. With 5-10 fold deeper coverage, an estimated 50 genomes could be fully assembled.
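A "phylotype at 97% identity" groups SSU sequences that are at least 97% identical. The sketch below shows one simple way to do that: greedy clustering of pre-aligned sequences, where each sequence joins the first existing centroid it matches at the threshold or else founds a new phylotype. This is an illustrative technique, not necessarily the procedure used in the study; the identity convention (columns where both sequences are gapped are skipped, a gap opposite a base counts as a mismatch) and the toy sequences are assumptions, and greedy results depend on input order.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.
    Columns where both have a gap ('-') are skipped; a gap opposite a base
    counts as a mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def greedy_phylotypes(seqs: List[str],
                      threshold: float = 97.0) -> Tuple[List[str], List[int]]:
    """Assign each sequence to the first centroid it matches at
    >= threshold % identity; otherwise it becomes a new centroid."""
    centroids: List[str] = []
    labels: List[int] = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if percent_identity(s, c) >= threshold:
                labels.append(i)
                break
        else:
            centroids.append(s)
            labels.append(len(centroids) - 1)
    return centroids, labels


if __name__ == "__main__":
    base = "ACGT" * 25            # toy 100-column "alignment"
    near = "TT" + base[2:]        # 2 mismatches -> 98% identical to base
    far = base[:90] + "T" * 10    # 7 mismatches -> 93% identical to base
    centroids, labels = greedy_phylotypes([base, near, far])
    print(len(centroids), labels)
```

Here `near` falls in the same phylotype as `base`, while `far` starts a second one.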

30 Patchiness
Patchiness is well documented in marine macro-organisms in the open ocean, but had not been shown for microbial communities. Of the species represented by assemblies with more than 50 fragments, more than half differed in abundance among sites.

31 Proteorhodopsin
Proteorhodopsin was recently discovered in an uncultured lineage of planktonic Gammaproteobacteria. It is related to the bacteriorhodopsins, which are light-driven proton pumps, and is part of a non-chlorophyll-based photosynthetic pathway. About 70 sequences were previously known, mostly from proteobacteria. ~800 proteorhodopsins were sequenced here; they can be grouped into 13 subfamilies, only four of which are found in cultured organisms. Seven are known only from the Sargasso Sea.

32 Rhodopsin-like sequences

33 Is this the future?
The cost of sequencing decreases steadily. The more reference sequences exist, the easier data analysis will become. Many proteins can be expressed even when the organisms cannot be cultured. How else could we sample the genetic diversity of unculturable microbes?

