Life on Earth Thought to be ~3.8 billion years old


Life on Earth
- Thought to be ~3.8 billion years old
- For the first 1.5 billion years, it was all aquatic microbes

Diversity of bacteria and archaea
- Only ~1% of all microbial species can be cultured
- 97% of prokaryotic isolates in stock centers are from just 4 phyla:
  - Proteobacteria (Escherichia, Helicobacter, Pseudomonas)
  - Firmicutes (Bacillus, Streptococcus, Staphylococcus)
  - Actinobacteria (Mycobacterium)
  - Bacteroidetes

Unculturable microbes Hugenholtz P (2002) Genome Biology 3, 1-8

Environmental sampling: two approaches
- SSU rRNA
  - Common to all cells
  - Can be amplified with universal PCR primers
  - Does not reveal metabolic diversity
- Random genomic DNA
  - Obtained by blindly cloning mixed DNA samples
  - Difficult to know what you have got!
  - But this is where the genes underlying metabolic diversity are

SSU rRNA Phylogeny Pace NR (1997) Science 276, 734

Genome sequencing
- The first microbe (Haemophilus influenzae) was sequenced in 1995
- Since then, well over 100 genomes have been completed
- Over 500 genome projects are ongoing

Diversity of sequenced genomes
Figure: Representation of completed genome sequences over time (x-axis) and size (y-axis, in Mb, logarithmic scale) labeled according to their social impact. Genomes from Archaea (squares), Bacteria (circles) and Eukarya (triangles) are colored according to their academic (blue), medical (pink), agricultural (light green), ecological (dark green) and industrial (black) relevance.
Janssen et al. (2003) Genome Biology 4, 402

Genome projects online

60K protein families and counting
- Clustering of 311,256 proteins from 83 complete genomes
- Kunin et al. (2003) Genome Biology 4, 401

Shotgun sequencing
- Random size-selected clones are sequenced from both ends (mate pairs)
- Overlapping sequences are assembled into contigs
- Contigs connected by mate pairs are assembled into scaffolds
- The total number of bases sequenced divided by the length of the genome is the coverage
- Coverage must be at least 8X to avoid large gaps
- Figure labels: chromosome, mate pair reads, contigs & scaffolds
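The coverage definition on this slide can be made concrete with a short Python sketch. It assumes the idealized Lander-Waterman model (reads land uniformly at random, so per-base depth is roughly Poisson), which the slide does not state; under that assumption the expected fraction of bases never sequenced is e^(-coverage). The function names and the 2 Mb genome are illustrative only.

```python
import math


def coverage(total_bases_sequenced: float, genome_length: float) -> float:
    """Coverage as defined on the slide: bases sequenced / genome length."""
    return total_bases_sequenced / genome_length


def expected_uncovered_fraction(cov: float) -> float:
    """Expected fraction of the genome hit by no read at all.

    Assumption (not from the slides): reads are placed uniformly at random,
    so the number of reads covering a base is Poisson with mean `cov`.
    """
    return math.exp(-cov)


if __name__ == "__main__":
    genome_length = 2_000_000  # hypothetical 2 Mb microbial genome
    for cov in (1, 3, 8):
        missing = expected_uncovered_fraction(cov) * genome_length
        print(f"{cov}X coverage: about {missing:,.0f} bp expected to be unsequenced")
```

Under this simplified model, 8X coverage leaves roughly 0.03% of bases unsequenced, which is in line with the slide's rule of thumb. Real projects deviate from the model because of cloning bias and repeats.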

What if we combined environmental sampling and shotgun sequencing?
- How many genomes would be sampled and from what organisms?
- How many novel genes would be discovered?
- How many genomes could we completely assemble?

Current estimates of microbial community diversity
Curtis et al. (2002) PNAS 99, 10494 estimated that there are up to 160 species in a typical millilitre of seawater, and somewhere between 6,400 and 38,000 in a typical gram of soil.

The Sargasso Sea
- A sea with no coastline (bounded by ocean currents)
- It moves! Generally between the West Indies and the Azores
- Water is very placid (the ‘doldrums’)
- Covered by a lens of warm, nutrient-poor water and a vast mat of algae (Sargassum)
- As simple a microbial community as is likely to be found in the ocean

The Sargasso Sea

Sargassum (can you spot the nudibranchs?) © 2000 by Image Quest 3-D

Institute for Biological Energy Alternatives
- Founded by J. Craig Venter, who also founded TIGR and Celera
- The IBEA
  - “is dedicated to exploring solutions for carbon sequestration using microbes, microbial pathways, and plants.”
  - “will develop and use microbial pathways and microbial metabolism to produce fuels with higher energy content in an environmentally sound fashion”.
  - “will undertake genome engineering to better understand the evolution of cellular life and how these cell components function together in a living system”.
- For example, genomics could be applied to enhance the ability of terrestrial and oceanic microbial communities to remove carbon from the atmosphere.

Sampling scheme
- 1700 liters of surface water sampled from four different sites
  - Most during season of winter nutrient upwelling in February
  - Some during nutrient-poor season in May
- Filters allowed only cells in the 0.1-3.0 micron range
  - Excluded dissolved DNA and free virus
  - Excluded most eukaryotes

Sampling sites and ocean chlorophyll levels (February)

Lots and lots of sequences
- 2 million cloned fragments 2-6 kbp in size were sequenced
- This yielded 1.6 billion base pairs total
  - 1 billion bp non-redundant
- For comparison, the human genome is 3 billion bp

Assembly issues
- Organisms differ in abundance and genome size
- Cannot rely on assumption that coverage is uniformly random
- Some contigs will have extremely deep coverage, which is a challenge for assembly algorithms
- Requires an iterative assembly process with lots of manual intervention and many unassembled sequences
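A minimal Python sketch of why coverage is not uniform in a mixed sample. It assumes that each cell contributes DNA in proportion to its genome size and that reads are drawn at random from the pooled DNA; the community composition, the total number of bases, and the function name are hypothetical and not taken from the slides.

```python
def per_genome_coverage(total_bases, community):
    """Expected coverage of each genome in a mixed (metagenomic) sample.

    community maps an organism name to a tuple of
    (relative cell abundance, genome size in bp).
    Assumption: reads are sampled from the pooled DNA, so an organism's
    share of the reads is abundance * genome size / pooled DNA.
    """
    dna_pool = sum(abundance * size for abundance, size in community.values())
    result = {}
    for name, (abundance, size) in community.items():
        bases_from_organism = total_bases * abundance * size / dna_pool
        result[name] = bases_from_organism / size
    return result


if __name__ == "__main__":
    # Hypothetical two-member community: one dominant, one rare organism.
    community = {
        "dominant": (0.9, 2_000_000),
        "rare": (0.1, 5_000_000),
    }
    for name, cov in per_genome_coverage(23_000_000, community).items():
        print(f"{name}: about {cov:.1f}X")
```

In this toy case the dominant organism ends up at about nine times the coverage of the rare one, so a single sequencing effort can over-sample one genome while leaving another too shallow to assemble.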

Assembly results
- Assembly was only successful in February sample
- 64,000 scaffolds, most less than 10 kbp
- 500,000 clones did not assemble
- Of those with 3X or greater coverage
  - About ½ could be classified taxonomically
- 21 scaffolds with greater than 14X coverage
- SNPs occur at 1/10 kbp, suggesting genetic diversity within ‘species’
- Only two genomes were fully assembled, and then only with the aid of an existing reference sequence for both

Genome diversity in Prochlorococcus
- Outer circle: Prochlorococcus marinus
- Colors represent role categories
- There are up to four structurally distinct scaffolds aligning with a given interval on the reference genome

Unexpected sequences
- Relatives of Burkholderia and Shewanella, typical of much more nutrient-rich environments, possibly living off of ‘marine snow’
- At least two abundant Archaeal organisms, typical of much greater depths (200 meters)
- At least 10 mega-plasmids, many with genes related to trace metal utilization or toxicity
- Not too surprising
- Some phage genomes, presumably integrated
- About 70 different eukaryote species (based mainly on the presence of 18S rDNA)

Homology of four scaffolds to reference crenarchaeal sequence 4B7

How many new genes are discovered?
- 1.2 million genes identified
  - Equal to the number of genes submitted to the Swiss-Prot/TrEMBL database over the last 8 years!
- Interesting findings
  - Ammonium oxidation in Archaea, which was previously unknown
  - Widespread presence of genes allowing unconventional forms of phosphate uptake
  - Only ~37 Rubisco sequences were found, but ~800 proteorhodopsin-like genes

Can we estimate the relative abundance of the taxa?
- How good of a marker is rDNA?
  - Universal primers may miss some taxa
  - Genomes vary in rDNA copy number
- Examined five other gene markers to measure relative abundance of different taxa
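The copy-number problem can be illustrated with a small Python sketch. It assumes the rDNA copy number per genome is known for each taxon, which is rarely true for uncultured organisms; the taxon names, counts, and copy numbers below are invented for illustration and do not come from the Sargasso Sea data.

```python
def naive_abundance(rdna_counts):
    """Relative abundance taken directly from rDNA sequence counts."""
    total = sum(rdna_counts.values())
    return {taxon: count / total for taxon, count in rdna_counts.items()}


def copy_corrected_abundance(rdna_counts, copies_per_genome):
    """Relative abundance after dividing counts by rDNA copies per genome.

    Assumption: copies_per_genome is known for every taxon counted.
    """
    genomes = {
        taxon: count / copies_per_genome[taxon]
        for taxon, count in rdna_counts.items()
    }
    total = sum(genomes.values())
    return {taxon: value / total for taxon, value in genomes.items()}


if __name__ == "__main__":
    counts = {"taxon_X": 70, "taxon_Y": 30}   # hypothetical rDNA sequence counts
    copies = {"taxon_X": 7, "taxon_Y": 1}     # hypothetical rDNA copies per genome
    print("naive:    ", naive_abundance(counts))
    print("corrected:", copy_corrected_abundance(counts, copies))
```

Here the taxon that looks dominant by raw rDNA counts turns out to be the minority once its seven copies per genome are accounted for, which is one reason for cross-checking against other marker genes.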

How many species are there?
- Number of distinct SSU genes
  - 1164 in February
  - 248 in May
  - Dominated by proteobacteria
  - 148 are new phylotypes (at 97% identity)
- Can we estimate how many remain to be sampled?
  - From a model of assembly completeness, one can estimate 1,800 to 48,000 species in the combined sample
  - With 5-10 fold deeper coverage, one can estimate that 50 genomes could be fully assembled

Patchiness
- Patchiness is well documented in marine macro-organisms in the open ocean, but is not known for microbial communities
- Of the species represented by assemblies with more than 50 fragments, more than half differed in abundance among sites

Proteorhodopsin
- Proteorhodopsin was recently discovered in an uncultured lineage of planktonic Gammaproteobacteria
  - Related to the light-driven proton pump bacteriorhodopsins
  - Is part of a non-chlorophyll based photosynthetic pathway
  - About 70 sequences are known, mostly from proteobacteria
- ~800 proteorhodopsins were sequenced
  - Can be grouped into 13 subfamilies
  - Only four of which are found in cultured organisms
  - Seven are only known from the Sargasso Sea

Rhodopsin-like sequences

Is this the future?
- Cost of sequencing decreases steadily
- The more reference sequences exist, the easier data analysis will become
- Many proteins can be expressed even when organisms cannot be cultured
- How else could we sample the genetic diversity of unculturable microbes?