1 Measuring microbial diversity by massively parallel sequencing Richard Christen CNRS UMR 6543 & Université de Nice

2 Genome Res: The “classic” approach. Use PCR with “universal” primers. Clone. Sequence randomly chosen clones.

3 Biodiversity analyses, classic approach (PCR, clone, sequence): too tedious for most labs!

4 30 years: roadmap to «Global Sequencing»
- 1975 First complete DNA genome: bacteriophage φX
- Maxam and Gilbert: "DNA sequencing by chemical degradation"
- 1977 Sanger: "DNA sequencing by enzymatic synthesis"
- GenBank starts as a public repository of DNA sequences
- PCR
- 1986 First semi-automated DNA sequencing machine
- BLAST algorithm for sequence retrieval
- Capillary electrophoresis
- Venter: expressed genes with ESTs
- 1992 Venter leaves NIH to set up The Institute for Genomic Research (TIGR)
- BACs (Bacterial Artificial Chromosomes) for cloning
- First chromosome physical maps published: Y & 21
- Complete mouse genetic map
- Complete human genetic map
- 1993 Wellcome Trust and MRC open Sanger Centre, near Cambridge, UK
- The GenBank database migrates from Los Alamos (DOE) to NCBI (NIH)
- Haemophilus influenzae
- S. cerevisiae
- RIKEN: first set of full-length mouse cDNAs
- ABI: the ABI310 sequence analyzer
- E. coli
- Venter starts “Celera”
- Applied Biosystems introduces the 3700 capillary sequencing machine
- C. elegans
- Human chromosome
- Drosophila melanogaster
- H.s. chromosome 21
- Arabidopsis thaliana
- 2001 HGP consortium: Human Genome Sequence
- Celera: the Human Genome sequence
- ,000 human sequences (Applied VariantSEQr)
- Pyrosequencing machine
- 2007 A set of closely related species (12 Drosophilidae) sequenced
- Craig Venter publishes his full diploid genome: the first human genome to be sequenced completely

5 High-throughput sequencing. High-throughput sequencing technologies are intended to lower the cost of sequencing DNA libraries. Many of the new methods parallelize the sequencing process, producing thousands or millions of sequences at once. No cloning! A one-day experiment!

6 Advantages and disadvantages of 454 sequencing
- Runs at 20 megabases per 4.5-hour run (1 day: from sampling to sequences).
- G-C rich content is not as much of a problem.
- Unclonable segments are not skipped.
- Detection of mutations in an amplicon pool at a low sensitivity level.
- Each read of the GS20 is only 100 base pairs long ( ).
- The new FLX system does base pairs (2007); 454 has said they expect 500 in '08.
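The GS20 figures on this slide imply a read count per run. A minimal sketch of that arithmetic, assuming every read is exactly 100 bp and the full 20 megabases are usable (both are simplifications, not claims from the slide):

```python
# Back-of-the-envelope read count for one GS20 run, from the slide's figures.
BASES_PER_RUN = 20_000_000  # 20 megabases per run (slide)
READ_LENGTH = 100           # base pairs per GS20 read (slide)
RUN_HOURS = 4.5             # duration of one run (slide)

# Assumption: all bases come from full-length reads.
reads_per_run = BASES_PER_RUN // READ_LENGTH
reads_per_hour = reads_per_run / RUN_HOURS

print(f"{reads_per_run:,} reads per run")
print(f"{reads_per_hour:,.0f} reads per hour")
```

Under these assumptions one run yields about 200,000 tags, which is the scale behind the "no cloning, one-day experiment" argument.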

7 Biodiversity, examples
- Huber, J. A., D. B. Mark Welch, et al. (2007). "Microbial population structures in the deep marine biosphere." Science 318(5847).
- Sogin, M. L., H. G. Morrison, et al. (2006). "Microbial diversity in the deep sea and the underexplored "rare biosphere"." Proc. Natl. Acad. Sci. U S A 103(32).
- Roesch, L. F., R. R. Fulthorpe, et al. (2007). "Pyrosequencing enumerates and contrasts soil microbial diversity." ISME J. 1(4).

8 Possible variable domains in the 16S rRNA gene sequences

9 New tags as a function of sequencing effort. MPS will sequence every PCR product present. But has PCR amplified every gene present in the sample?
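The "new tags as a function of sequencing effort" plot is a collector's (rarefaction-style) curve. A minimal sketch of how such a curve can be computed, assuming tags are plain strings and that reads are drawn in random order; the function name `collectors_curve` and the toy data are hypothetical, not from the slides:

```python
import random
from typing import List, Sequence, Tuple


def collectors_curve(tags: Sequence[str], step: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Return (reads sampled, distinct tags seen) every `step` reads."""
    rng = random.Random(seed)
    shuffled = list(tags)
    rng.shuffle(shuffled)  # simulate reads arriving in random order

    seen = set()
    curve = []
    for i, tag in enumerate(shuffled, start=1):
        seen.add(tag)
        if i % step == 0 or i == len(shuffled):
            curve.append((i, len(seen)))
    return curve


# Toy community: two dominant tags and a few rare ones.
tags = ["ACGT"] * 50 + ["TTGA"] * 30 + ["GGCC"] * 15 + ["CATG"] * 4 + ["AAAA"]
curve = collectors_curve(tags, step=20)
for n_reads, n_distinct in curve:
    print(n_reads, n_distinct)
```

A curve that flattens only says that the PCR products have been sampled exhaustively. It cannot show whether the primers amplified every gene in the sample, which is the slide's point.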

10 Tag dereplication
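Dereplication collapses identical tags into one representative with an abundance count. A minimal sketch, assuming exact string identity after upper-casing is the criterion (the slide does not say how identity is defined); `dereplicate` is a hypothetical name:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dereplicate(reads: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse identical reads; return (sequence, count), most abundant first."""
    counts = Counter(read.upper() for read in reads)
    return counts.most_common()


reads = ["ACGT", "acgt", "TTGA", "ACGT", "GGCC", "TTGA"]
unique_tags = dereplicate(reads)
print(unique_tags)  # [('ACGT', 3), ('TTGA', 2), ('GGCC', 1)]
```

Only the unique tags then need to be clustered or searched against a database, with the counts carried along.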

11 MPS can sequence them all !

12 Clustering tags into OTUs. Total calculation time: 7 minutes. Usual approach: align (Muscle), compute distances, then build a phylogeny or cluster. Better: cluster according to word frequencies. No alignment, much faster, much better.
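One way to picture alignment-free clustering by word frequencies is sketched below. It is an illustration only: the word size `k`, the Bray-Curtis-style distance, the greedy seeding and the threshold are all assumptions, not the method used for the slide.

```python
from collections import Counter
from typing import List


def kmer_profile(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def profile_distance(a: Counter, b: Counter) -> float:
    """Bray-Curtis-style distance: 0 for identical profiles, 1 for disjoint ones."""
    shared = sum((a & b).values())  # multiset intersection of word counts
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def greedy_cluster(tags: List[str], k: int = 4, max_dist: float = 0.3) -> List[List[str]]:
    """Assign each tag to the first cluster whose seed is close enough."""
    seeds: List[Counter] = []
    clusters: List[List[str]] = []
    for tag in tags:
        profile = kmer_profile(tag, k)
        for seed, members in zip(seeds, clusters):
            if profile_distance(profile, seed) <= max_dist:
                members.append(tag)
                break
        else:  # no seed was close enough: this tag starts a new cluster
            seeds.append(profile)
            clusters.append([tag])
    return clusters


tag_a = "ACGTACGTACGTACGTACGT"
tag_b = "ACGTACGTACGAACGTACGT"  # one substitution relative to tag_a
tag_c = "TTTTGGGGCCCCAAAATTTT"
clusters = greedy_cluster([tag_a, tag_b, tag_c], k=4, max_dist=0.3)
print(clusters)
```

No pairwise alignment is computed, so the cost per comparison is a word count and a multiset intersection. Feeding tags in order of decreasing abundance (for example after dereplication) makes the most common tags act as seeds.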

13 Assign each tag to a taxon
- GreenGenes. The greengenes web application provides access to 16S rRNA gene sequence alignments for browsing, blasting, probing, and downloading. URL:
- RDP. The Ribosomal Database Project (RDP) provides ribosome related data services to the scientific community, including online data analysis, rRNA derived phylogenetic trees, and aligned and annotated rRNA sequences. URL:
- Silva. SILVA provides comprehensive, quality checked and regularly updated databases of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya). URL:
Assignments done using the first hit of BLAST.
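Assignment by first BLAST hit can be sketched as follows, assuming BLAST was run with tabular output (`-outfmt 6`, where the first two columns are query id and subject id and hits for a query are listed best first). The ids, the taxonomy table and the function name `first_hit_taxonomy` are made up for illustration:

```python
from typing import Dict, Iterable


def first_hit_taxonomy(blast_lines: Iterable[str], taxonomy: Dict[str, str]) -> Dict[str, str]:
    """Map each query tag to the taxonomy of its first (best) BLAST hit."""
    assignments: Dict[str, str] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query_id, subject_id = fields[0], fields[1]
        if query_id not in assignments:  # keep only the first hit per query
            assignments[query_id] = taxonomy.get(subject_id, "unclassified")
    return assignments


# Hypothetical reference taxonomy and tabular BLAST lines.
taxonomy = {
    "ref1": "Bacteria;Proteobacteria;Gammaproteobacteria",
    "ref2": "Bacteria;Firmicutes;Bacilli",
}
blast_lines = [
    "tag1\tref1\t99.0\t100\t1\t0\t1\t100\t5\t104\t1e-45\t180",
    "tag1\tref2\t90.0\t100\t10\t0\t1\t100\t5\t104\t1e-30\t130",
    "tag2\tref3\t97.0\t100\t3\t0\t1\t100\t8\t107\t1e-40\t165",
]
assignments = first_hit_taxonomy(blast_lines, taxonomy)
print(assignments)
```

The result is only as good as the annotation of the first hit. A reference with no taxonomy (here `ref3`) falls back to "unclassified" instead of borrowing a label from a weaker hit.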

14 BMC Microbiology 2007, 7:108 Assign each tag to a taxon

15 BMC Microbiology 2007, 7:108 Assign each tag to a taxon Simulated read resolution for varying read-lengths

16 Problems How many good 16S sequences for how many species ? How many long sequences annotated with a good taxonomy ? How good are the universal primers ? How to assign a taxonomy to a tag ? What are 16S sequences worth in ecology ?

17 Identify 16S rRNA sequences: LOL ?
- 16S (LSU) RIBOSOMAL RNA
- 16S LARGE RIBOSOMAL RNA
- 16S LARGE SUBUNIT RIBOOSMAL RNA
- 16S LARGE SUBUNIT RIBOSOMAL RNA ?

18 The 16S “gold standard”. Some long sequences correspond to badly annotated sequences such as Z94013, annotated with keywords "16S ribosomal RNA; 16S rRNA gene" when in fact it is a 23S rRNA sequence...

19 Mostly PCR derived sequences!
GenBank release 165 (June 2008), bacteria:
- 728,358 16S rRNA seqs
- “named”: 59,128 seqs >99 nt
- 49,678 seqs >500 nt
- 39,217 seqs >1000 nt

20 Numbers of 16S rRNA sequences per species. Most species are known from a single sequence!
- The taxonomic specificity of tags is overestimated.
- Most species have not been sequenced at all.

21 Main taxa that were not amplified. Primers need to be better designed!

22 Conclusions
- Lack of complete sequences to evaluate primers; only a single sequence is available for a majority of species.
- Most sequences have a poorly annotated taxonomy.
- Only 112,509 (16.8 %) of the 670,401 bacterial 16S rRNA gene sequences of length >100 nt presently deposited have a taxonomic description down to the genus level, while 383,570 sequences (57 %) have "environmental samples" as sole description.
- No links of 16S sequences to phenotypes.
- MPS technologies have not been validated against samples of known composition.
- MPS machines are not calibrated before, during or after a run.
- MPS experiments to estimate diversity are not reproduced (duplicated)!
- Primers have to be improved. Degenerate primers should NOT be mixed (competition).

23 Final conclusions
- MPS technologies enable researchers to undertake the process of global sequencing in a single operation using bench-top instruments.
- The term ‘post-genomics’ has been prematurely coined; we are in fact at the beginning of a global sequencing era, which opens a long journey that will occupy a broad spectrum of the scientific community for decades.
- Global (pyro)sequencing will replace any other method for estimating the biodiversity of microbes.
- 16S rRNA sequences do not offer a direct link to phenotypes or functions.
- A wide and generalized effort to sequence, and build ontologies for, well-identified strains deposited in collections worldwide is required to form the basis of derived annotations of environmental sequences.
- Developing ecosystem predictive models is fundamental, but this is still a long-term objective, as the connection of taxonomy to functions is still missing in most cases.
A work not supported by ANR.
