Primer selection, information content, alignment and length
- 16S rRNA gene marker
- Intra-gene variability
- Primer selection
- Size & information content
16S rRNA gene marker
- Conserved secondary structure
- Natural gene amplification
- Genealogy reconstruction
Ludwig and Schleifer, 1994 FEMS Rev 15:
Intra-gene variability
- The secondary structure shows differences in the conservation of homologous sites
- Highly conserved zones give information on deep genealogies (higher resolution for distantly related organisms)
- Hypervariable zones give information on recent events (higher resolution for close relatives)
Anderson et al., 2008 PLoS ONE, 3: e2836
Stahl and Amann, 1991 John Wiley and Sons
Primer selection: universality
- Universal primers target highly conserved regions
- Universality depends on the known dataset
- Different phyla may have differences in the "universal" regions (e.g. EUB338)
- Primers used for rRNA cloning may give biased results
- Metagenomics without amplification steps may reveal hidden diversity

Probe        Target               Sequence
EUB338 I     Most Bacteria        GCTGCCTCCCGTAGGAGT
EUB338 II    Planctomycetales     GCAGCCACCCGTAGGTGT
EUB338 III   Verrucomicrobiales   GCTGCCACCCGTAGGTGT
Daims et al System Appl Microbiol 22,
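The point that "universal" regions differ between phyla can be made concrete by comparing the three EUB338 probe variants position by position. The following is a minimal sketch, assuming the stray space before the final "GT" of each sequence on the slide is a layout artifact and that all three probes are 18-mers; the function names are hypothetical and chosen for illustration only.

```python
from itertools import combinations

# Probe sequences as listed on the slide (Daims et al.).
# Assumption: the space before the trailing "GT" is an extraction artifact.
PROBES = {
    "EUB338 I": "GCTGCCTCCCGTAGGAGT",    # most Bacteria
    "EUB338 II": "GCAGCCACCCGTAGGTGT",   # Planctomycetales
    "EUB338 III": "GCTGCCACCCGTAGGTGT",  # Verrucomicrobiales
}


def mismatches(a, b):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def pairwise_mismatches(probes):
    """Map every pair of probe names to the list of differing positions."""
    return {
        (n1, n2): mismatches(probes[n1], probes[n2])
        for n1, n2 in combinations(probes, 2)
    }


if __name__ == "__main__":
    for (n1, n2), positions in pairwise_mismatches(PROBES).items():
        print(f"{n1} vs {n2}: {len(positions)} mismatch(es) at {positions}")
```

With the sequences as transcribed here, the variants differ from one another at only one to three positions, which illustrates how a few substitutions in a conserved region are enough for a single "universal" probe to miss whole groups.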
Primer selection: size of the amplicon
[Figure: primer map along the 16S rRNA gene; labels as extracted from the slide: GM Valt 8 GM5 GM5-clamp F R 518 GM R F 945 Bac1055F R 1529 S 1505]
- Ideally the almost complete gene (~1520 nucleotides) should be sequenced
- Many amplifications skip sequencing helix 50 (~1490 nucleotides)
- Many clone libraries are based on just partial amplicons (~900 nucleotides)
- The pair GM3 (8) – GM4 (1492) is the most widely used
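The relation between primer positions and amplicon size can be sketched with simple arithmetic. This is a rough illustration only: it assumes the numbers in parentheses (8 and 1492) are the outermost gene positions covered by the primer pair, and it uses the slide's approximate full length of 1520 nucleotides. The function names are hypothetical.

```python
# Approximate length of the almost complete 16S rRNA gene, from the slide.
FULL_GENE_LENGTH = 1520


def amplicon_length(forward_pos, reverse_pos):
    """Length of the stretch spanned by a primer pair, both ends included."""
    if reverse_pos <= forward_pos:
        raise ValueError("reverse position must lie downstream of forward position")
    return reverse_pos - forward_pos + 1


def gene_coverage(forward_pos, reverse_pos, gene_length=FULL_GENE_LENGTH):
    """Fraction of the gene covered by the amplicon."""
    return amplicon_length(forward_pos, reverse_pos) / gene_length


if __name__ == "__main__":
    length = amplicon_length(8, 1492)   # GM3 (8) - GM4 (1492)
    coverage = gene_coverage(8, 1492)
    print(f"GM3-GM4 amplicon: {length} nt, {coverage:.1%} of the gene")
```

Under these assumptions the GM3/GM4 pair spans slightly fewer than 1490 positions, in line with the slide's remark that many amplifications stop short of the complete gene.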
16S rRNA sequencing has grown exponentially in parallel to the development of sequencing techniques
Yarza et al., Nature Revs :
Tamames & Rosselló-Móra 2012 TIM 20:
[Figure: timeline of techniques: rRNA cataloguing, radioactive Sanger sequencing, non-radioactive Sanger sequencing, reverse transcription sequencing, NGS]
The database is increasing exponentially
- 99% environmental sequences
- 1% cultured organisms
- 3.8 × 10^6 sequences
- 700,000 / year (last three years)
Sources of sequences and quality
- rRNA cataloguing (up to late 80's): bad quality
- Reverse transcription sequencing (up to late 90's): bad quality
- Sanger methods (radioactive, biotin-labelled, terminal-dye… still in use)
  - cloning DNA: good quality
  - direct amplification: good quality
- DGGE/TGGE: short sequences, bad quality
- NGS: short sequences
  - 454 technology (now up to 800 nuc, mean of 500 nuc): moderate quality
  - Illumina (now 2 × 250 nuc): too short
16S rRNA sequencing has grown exponentially in parallel to the development of sequencing techniques
Quast et al., 2013, Nuc Acid Res. 41: D590-D596
SILVA release 119 (July 2014)
- Rejection rate of about 30% of the existing sequences
- Short sequences are generally of worse quality than long stretches
We divided the 16S rRNA gene into 6 regions of 250 nucleotides
- Calculated taxa recovery in each stretch
- Compared it with that of the full sequence
[Figure panels: Regions V1 & V2; Regions V3 & V4; Regions V5 & V6]

Category   Minimum
Species    98.7%
Genus      94.5%
Family     86.5%
Order      82.0%
Class      78.5%
Phylum     75.0%
Yarza et al., Nature Revs :
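The two ingredients of this analysis, cutting the gene into 250-nucleotide stretches and applying the per-category minimum values, can be sketched as follows. The sketch assumes that the listed minima are pairwise sequence identity thresholds, and that the six regions are consecutive, non-overlapping windows starting at the first position; neither detail is spelled out on the slide. The function names are hypothetical.

```python
# Minimum values per taxonomic category, as listed on the slide.
# Assumption: these are minimum pairwise 16S rRNA sequence identities (%).
THRESHOLDS = [
    ("Species", 98.7),
    ("Genus", 94.5),
    ("Family", 86.5),
    ("Order", 82.0),
    ("Class", 78.5),
    ("Phylum", 75.0),
]


def lowest_shared_category(identity_percent):
    """Return the most specific category whose minimum is met, or None."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    for category, minimum in THRESHOLDS:  # ordered from most to least specific
        if identity_percent >= minimum:
            return category
    return None


def split_regions(sequence, size=250, count=6):
    """Cut a sequence into `count` consecutive windows of `size` nucleotides."""
    return [sequence[i * size:(i + 1) * size] for i in range(count)]


if __name__ == "__main__":
    for identity in (99.0, 95.0, 80.0, 70.0):
        print(identity, lowest_shared_category(identity))
    regions = split_regions("A" * 1500)  # placeholder sequence, not real data
    print([len(r) for r in regions])
```

Taxa recovery for a stretch would then be obtained by computing identities on each window separately and comparing the resulting category assignments with those from the full-length sequence.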
- 77% of the 16S rRNA gene sequences are < 900 bp
- The 5' region (V1-V2) overestimates species
- The remaining regions tend to underestimate all taxa
- As sequence length increases, taxa recovery tends to mirror that of the full sequence
Yarza et al., Nature Revs :
Size & information content
- Complete sequences give complete information
- Partial sequences lose phylogenetic signal
- Short sequences lose resolution
[Figure panels: 1500 nuc; 900 nuc; 300 nuc]
Primer selection & size of amplicons
- Selection of primers is important for representative results
- The length of the amplified/sequenced gene must provide adequate phylogenetic signal
- Short sequences may lose resolution