16S rRNA Experimental Design


16S rRNA Experimental Design
2016 Metagenomics Course
Dave Baker and Tom Barker, Platforms and Pipelines

Considerations
Before embarking on an experiment there are many things to consider.
- Choose your region(s): there is no consensus on which subsets to use for community analysis.
- PCR bias
- Community bias
- Size: some products are as large as 2.5 kb (smaller products amplify preferentially)
- Cycles
- Chimeras
- Initial starting template
- Normalisation across all samples is important!
- Cost
- Replicates: biological and technical
If time and costs permit, do a small pilot or trial using several different strategies looking at different variable regions. Depending on the community being studied and the hypotheses posed, different target regions and multiplexing strategies can be employed.

Bias
[Figure: mock communities of 7 species pooled at three stages: (1) mock pooling of cells, (2) mock pooling of gDNA, (3) mock pooling of PCR product. Bias is marked at each stage.]
The observed community composition can be a severe distortion of the quantities of bacteria actually present in the microbiome, hampering analysis and threatening the validity of conclusions from metagenomic studies (Brooks et al 2015).

DNA Extraction Kit Conclusions
This study demonstrates important differences in the yield and relative abundance of key bacterial families for kits used to isolate bacterial DNA from stool. This highlights the importance of ensuring that all samples to be analyzed together are prepared with the same DNA extraction method, and the need for caution when comparing studies that have used different methods.

Choosing your platform

Platform  | Type              | Reads per run                | Size       | Common regions          | Comments
Roche 454 | Standard PCR      | Up to 1 million              | 400-700 bp | V1-V3                   | No longer supported and expensive
Illumina  | Nextera           | 25 000 000 / MiSeq PE300/250 | 300-500 bp | V1-V3, V3-V4, V4, V4-V5 | Tom to go into more detail
PacBio    | PCR (1.5 kb)      | 75 000/cell                  | 1.5 kb     | Entire region and less  | Next slides…
MinION    | Barcoded (12) PCR | 50 000                       |            | Entire region           | Not reproducible; good for quick assessment

PacBio Single Molecule Real Time Sequencing
- Average amplicon polymerase reads are around 15 kb and increasing.
- 15% raw error rate.

Circular Consensus Sequence (CCS) Reads
Based on 33% loading of 150 000 wells (ZMWs):

CCS accuracy | Full-length 16S reads per cell
90.0%        | 50 000
99.0%        | 25 000
99.9%        | 10 000

92 000 reads; 2.5 Gbp raw data; average read length of 27.4 kb.
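The figures on this slide are consistent with each other, which a few lines of Python can check. This is only a sketch of the arithmetic: the variable names are illustrative, and the number of passes assumes that the quoted polymerase reads came from the 1.5 kb full-length 16S amplicon listed in the platform comparison, which the slide does not state.

```python
# Back-of-envelope check of the PacBio numbers quoted on the slide.
zmw_per_cell = 150_000       # wells (ZMWs) per SMRT cell, from the slide
loading_fraction = 0.33      # fraction of ZMWs that give a read, from the slide

# Reads expected per cell before any accuracy filtering (close to 50 000).
productive_zmws = zmw_per_cell * loading_fraction

reads = 92_000               # reads in the quoted run
mean_read_length_bp = 27_400 # average read length in the quoted run

# Raw yield in Gbp (the slide rounds this to 2.5 Gbp).
raw_yield_gbp = reads * mean_read_length_bp / 1e9

# ASSUMPTION: a 1.5 kb amplicon, so each polymerase read covers it many times.
amplicon_bp = 1_500
mean_passes = mean_read_length_bp / amplicon_bp

print(round(productive_zmws), round(raw_yield_gbp, 2), round(mean_passes, 1))
```

More passes over the same molecule give a more accurate consensus, which is why the number of usable reads per cell falls as the required CCS accuracy rises.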

PacBio Read length Improvement
- 384 barcodes with symmetric barcoding, and a potential 73 536 using asymmetric barcodes (16 bp).
- Although cost per base is higher than Illumina, supplementing databases with full-length 16S sequences continues to be important, especially in generating niche-specific databases.
- Significant increases in several orders of taxa from full-length (FL) PacBio data compared to short-read Illumina data.
- Better taxonomic resolution.
- Less ambiguous classification.
- New Sequel platform increases reads from one cell by 7X.
- New Sequel cells are scalable: currently 1 million ZMWs, going up to 5 million in 2017.
[Figure: read length improvement, P6-C4 to P7-C5 (???)]
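One way to read the 73 536 figure is as the number of unordered pairs of two different barcodes drawn from the set of 384. That interpretation is an assumption, not something the slide states, but the arithmetic matches:

```python
from math import comb

n_barcodes = 384

# Symmetric design: the same barcode on both ends, so one sample per barcode.
symmetric = n_barcodes

# Asymmetric design (assumed reading): two different barcodes per sample,
# where the order of the pair does not matter.
asymmetric = comb(n_barcodes, 2)

print(symmetric, asymmetric)  # 384 73536
```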

PacBio Primer Design Considerations
The ability to classify sequences to genus or species level is a function of:
- Read length
- Sample type
- Reference database

PacBio Further reading
Because recent technologies have focused on particular regions, increased read lengths increase the accuracy and sensitivity of classification against databases.

Illumina’s Two PCR Protocol
[Diagram: 1st PCR: 16S V4 (hypervariable region 4, flanked by highly conserved regions) with TruSeq adapter overhangs. 2nd PCR: P5, i5, i7 and P7 added to the overhangs, giving the amplicon library.]
Pros
- Only need to design a primer set for your region of interest.
- Use the same indexed primers for any region of interest.
- You can use Illumina’s Nextera XT Index kits for the second PCR.
Cons
- Requires two PCRs and clean-ups, making it more expensive.
- Sequencing through the region of interest primer loses ~20 bp of sequencing from each read.

Kozich et al Dual Index Strategy
[Diagram: single PCR giving the amplicon library; labels: P5, i5, pad, highly conserved region, 16S V4 (hypervariable region 4), i7, P7.]
Pros
- Single PCR and clean-up, making it cheaper than the two PCR approach.
- Uses custom sequencing primers so you don’t have to sequence through the region of interest primer.
Cons
- All primers are specific to the region of interest, so a whole new set of primers needs to be ordered for each different region.
- Custom sequencing primers are required.

Phasing, Pre-Phasing, and Colour Matrix
Empirical phasing correction algorithm
- Old versions of MCS/HCS calculated phasing and pre-phasing corrections for the first 12 cycles and applied this value to the rest of the run.
- Current versions of the software optimise phasing and pre-phasing correction for every cycle.
Cross-talk matrix
- There are two lasers that excite four dyes, one for each base.
- The emission spectra of the four dyes overlap slightly.
- Frequency cross-talk needs to be deconvolved using a frequency cross-talk calibration.
- Older versions of MCS/HCS used the first 4 cycles, but newer versions use 11, improving estimates for low diversity samples.
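The deconvolution step can be pictured as solving a small linear system: the four observed channel intensities are modelled as a cross-talk matrix times the true dye signals. The sketch below is illustrative only and is not Illumina's algorithm. The matrix values are made up, the helper names `solve` and `call_base` are hypothetical, and the block structure (A with C, G with T) simply assumes that the two dyes excited by the same laser are the ones that overlap.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix is square and invertible.
    """
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


BASES = "ACGT"

# HYPOTHETICAL cross-talk matrix: entry [i][j] is the signal seen in
# channel i per unit of dye j. Real values come from calibration cycles.
CROSSTALK = [
    [1.00, 0.60, 0.00, 0.00],
    [0.30, 1.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.50],
    [0.00, 0.00, 0.25, 1.00],
]


def call_base(observed):
    """Remove cross-talk, then call the base with the strongest dye signal."""
    dye = solve(CROSSTALK, observed)
    return BASES[max(range(4), key=lambda i: dye[i])]


# A cluster carrying only the C dye at intensity 100 also lights up channel A.
print(call_base([60.0, 100.0, 0.0, 0.0]))
```

Estimating the matrix needs all four bases to be well represented in the calibration cycles, which is why low diversity amplicon libraries make this step harder.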

Sequencing Low Diversity Libraries

Yield   | >=Q30 | PhiX   | Data once PhiX is removed
6.0 GB  | 85.5% | 25.45% | 10.03 M reads (4.5 GB)
10.1 GB | 89.2% | 1.72%  | 21.73 M reads (9.9 GB)

[Figure: A, C, G and T cluster images.] Same sequence for first 5 cycles, so multiple clusters are called as one. Sequence diverges at later cycles and clusters do not pass filter.
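The usable yield quoted for each run follows from the total yield and the PhiX percentage. A minimal sketch, assuming the PhiX percentage applies directly to the yield in GB (the function name is illustrative):

```python
def yield_after_phix(total_gb, phix_fraction):
    """Return the yield in GB that remains once PhiX reads are discarded."""
    return total_gb * (1.0 - phix_fraction)


# (total yield in GB, PhiX fraction) for the two runs on the slide.
runs = {
    "25.45% PhiX": (6.0, 0.2545),
    "1.72% PhiX": (10.1, 0.0172),
}

for label, (total_gb, phix) in runs.items():
    print(label, round(yield_after_phix(total_gb, phix), 1), "GB")
```

Rounded to one decimal place this gives 4.5 GB and 9.9 GB, matching the slide: the run with less PhiX returns roughly twice the usable data.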

Spacers, Molecular Barcodes, and Chimeras
Spacers (Wu et al 2015)
[Diagram: Illumina sequencing primer, spacer (0-7 bp), index]
- Two-step PCR method reduces PCR biases caused by long barcoded primers.
- Spacers on each end totalling 7 bp shift sequencing phases, increasing sequence diversity.
- Single 12 bp index.
- On average 10% more bases >Q30 and ~15% more raw reads.
Molecular barcodes
- Randomers used to distinguish PCR duplicates from unique template molecules.
- Can be used to identify sequencing errors or true variation.
Chimeras
- Amount of template DNA and number of PCR cycles should be optimised to reduce formation of chimeras.
- Too much DNA or too many PCR cycles can increase the occurrence of chimeras.
- Use of a polymerase with high processivity has been shown to reduce chimera formation.
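The effect of spacers on base diversity can be shown with a toy example. In the sketch below every read starts with the same primer sequence, and the number of distinct bases seen at each of the first five cycles is counted with and without staggered spacers. The primer and spacer sequences are invented for illustration and are not taken from Wu et al.

```python
from collections import Counter


def per_cycle_composition(reads, cycles):
    """Count which bases are seen at each of the first `cycles` positions."""
    return [Counter(r[i] for r in reads if len(r) > i) for i in range(cycles)]


# HYPOTHETICAL sequences, for illustration only.
PRIMER = "GTGCCAGCAGCCGCGGTAA"
SPACERS = ["", "T", "GT", "CGA", "ATGA", "TGCGA", "GAGTGG", "CCTGTGG"]  # 0-7 bp

without_spacers = [PRIMER] * len(SPACERS)
with_spacers = [spacer + PRIMER for spacer in SPACERS]

for name, reads in (("no spacers", without_spacers), ("spacers", with_spacers)):
    distinct = [len(c) for c in per_cycle_composition(reads, 5)]
    print(name, distinct)
```

Without spacers every cycle sees a single base across all clusters. With staggered spacers several bases appear at each cycle, which is the kind of diversity the early calibration cycles rely on.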

Further Reading