Next-Generation Sequencing of Microbial Genomes and Metagenomes

1 Next-Generation Sequencing of Microbial Genomes and Metagenomes
  Christine King
  Farncombe Metagenomics Facility
  Human Microbiome Journal Club
  July 13, 2012

2 Overview
  Next-generation sequencing
    Applications
    Instruments
    Library prep and sequencing chemistry
    Sequence quality
  Project overview
    Microbial genomes
    Microbial communities

3 DNA Sequencing
  1st generation
    Sanger chain termination
    Capillary electrophoresis
  2nd generation (NGS)
    High throughput, “massively parallel”
    Shorter reads
    Sequencing-by-synthesis
  3rd generation
    Single molecule
    Nanopores

4 Applications
  DNA sequencing
    De novo genomes
    Resequencing
      Shotgun (e.g. mutant strains)
      Amplicon (e.g. HLA, cancer)
      Sequence capture (e.g. exome)
    Metagenome
      Amplicon (e.g. 16S, COI, viral)
      Shotgun
    ChIP
  RNA sequencing
    Gene expression
    Gene annotation, splice variants
    Metatranscriptome

5 Instruments

6 Instruments
  Instrument      # of reads   Read length (bp)   Total output (Gb)   Cost per base   Run time   Technology
  GS FLX          1M           450                0.5                 $$$$            ++         emPCR, SBS, light detection
  GS FLX+                      650                0.6
  GS Jr           100K                            0.05
  GAIIx           640M         2x150              90                  $$              +++        Bridge PCR, SBS, fluorophore
  HiSeq 2000      6B           2x100              600                 $
  MiSeq           12M                             2
  PacBio RS       >10K         >1000              0.01                                +          Single-molecule seq, fluorophore
  SOLiD 5500xl    1.4B                            155                                            emPCR, probe ligation, fluorophore
  Ion PGM - 316                >100               0.1                 $$$                        emPCR, SBS, pH change
  Ion PGM - 318   6M                              1

7 Which instrument(s) to use?
  Read length vs number of reads
  Cost per base, per sample, per project (multiplexing?)
  Accuracy
  Run time, wait time

  Application       Length / # Reads / Accuracy   Instruments             Considerations
  De novo (small)   +++ ++                        MiSeq, 454, Ion         Mix lengths
  De novo (large)                                 HiSeq, 454, SOLiD       Mix lengths, MP
  Re-seq (small)                                  MiSeq, Ion              Multiplex?
  Re-seq (large)                                  HiSeq, SOLiD            Enrichment?
  RNA-seq (count)   +                             Illumina, SOLiD, Ion    Ref? Size? Rare?
  Amplicons                                       454, MiSeq              Size? Multiplex?
  Metagenomics                                    Illumina, 454, SOLiD    Length vs depth

8 Library Preparation
  Goal: fragments of DNA, each end flanked by adaptor sequences
  Adaptors contain amplification- and sequencing-primer binding sites; platform- and chemistry-specific
  Optional: sample-specific barcodes/indexes/MIDs/tags allow multiplexing during sequencing
  Library QC: quantity, size

9 Library Preparation
  Library types:
    Shotgun (DNA)
      May begin with ChIP
      May follow with sequence capture
    Mate pair (DNA)
    Amplicon (DNA)
    Total RNA
      May enrich for mRNA (poly-A enrichment, rRNA depletion)
      Convert to cDNA (then similar to DNA protocols)
    Small RNA
      RNA ligations, convert to cDNA after

10 Library Preparation: Shotgun
  Fragmentation
    Sonication
    Nebulization
    Enzymatic
  End repair
    3’ overhangs digested
    5’ overhangs filled
    5’ phosphate added

11 Library Preparation: Shotgun
  Adapter ligation
    T-overhangs
    Forked structure controls orientation
  Library amplification
    Few cycles
    Enrich for correctly-adapted fragments
    Required to complete adapter structure in some protocols
  Size selection
    Gel excision, AMPure beads
    Limit insert size as needed, remove artifacts

12 Library Preparation: Amplicon
  Amplify region of interest using PCR
  Primers contain adapter sequences

13 Library Preparation: Mate Pair
  Begin with large fragments (e.g. 3 kb, 20 kb)
  Circularize and fragment again
    Illumina: direct ligation
    454: Cre/Lox recombination
  Enrich for fragments containing the junction
  Proceed with shotgun library prep

14 Library Preparation: Mate Pair
  Why? Paired sequences are a known distance apart; improves genome assembly
  Note: 454 calls these “paired end libraries”, not to be confused with Illumina’s “paired end sequencing”!

15 Sequencing: Illumina
  Cluster generation
    Library fragments hybridize to oligos on the flow cell
    New strand synthesized; original denatured and removed
    Free end binds to adjacent oligos (bridge formation)
    Complementary strand synthesized, then denatured (both strands tethered to flow cell)
    Repeat to form clonal cluster
    Cleave one oligo, denature to leave ssDNA clusters
    ~800K clusters/mm^2

16 Sequencing: Illumina
  Variety of workflows:
    Single- or paired-end reads
    0, 1, or 2 index reads

17 Sequencing: Illumina
  At each cycle, all 4 fluorescently-labeled nucleotides pass over the flow cell
  Each cluster incorporates one nt (terminator) per cycle
  Fluor is imaged, then cleaved
  De-block and repeat

18 Sequencing: Illumina
  Other terminology:
    cBot – accessory instrument that performs cluster generation
    Lanes – divisions (8) of HiSeq and GAIIx flow cells
    PhiX – bacteriophage with small, balanced genome; PhiX library spiked in with samples for QC
    Phasing/pre-phasing – nt incorporation falls behind or jumps ahead on a portion of strands in the cluster and contributes to noise
    Chastity filter – measures signal purity (after intensity corrections); if the background signal is high, the cluster will be discarded
    BaseSpace – cloud computing site for processing MiSeq data
    File format: fastq
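
The fastq file format named above stores each read as four lines: a header, the sequence, a "+" separator, and one quality character per base. The sketch below is a minimal reader, assuming the Phred+33 character encoding (an assumption; older Illumina pipelines used a different offset) and a made-up example record.

```python
from io import StringIO

def read_fastq(handle):
    """Yield (name, sequence, qualities) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        # Assumption: Phred+33 encoding (quality = ASCII code minus 33).
        yield header[1:], seq, [ord(c) - 33 for c in qual]

# Hypothetical single-read example.
example = StringIO("@read1\nACGT\n+\nII5#\n")
records = list(read_fastq(example))
print(records)
```

For real data a maintained parser is the safer choice; this only illustrates the record layout.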

19 Sequencing: 454
  emPCR: clonal amplification of bead-bound library in microdroplets
  Library input amounts critical!
    One molecule per bead
    Titration procedure

20 Sequencing: 454
  Library capture: beads coated with complementary oligo
  Amplification: droplet contains PCR reagents and the other oligo
  Post-PCR: millions of identical fragments attached to the bead

21 Sequencing: 454
  Bead recovery: physical and chemical disruption
  Enrichment: capture successfully amplified beads using biotinylated primers + magnetic streptavidin beads

22 Sequencing: 454
  Deposit bead layers onto PicoTiterPlate:
    Enzyme beads
    Enriched DNA beads
    More enzyme beads
    PPiase beads

23 Sequencing: 454

24 Sequencing: 454
  Pyrosequencing
    4 nucleotides flow separately
    If nt incorporation → PPi → light
      APS + PPi → ATP (sulfurylase)
      Luciferin + ATP → light + oxyluciferin (luciferase)
    Amount of light proportional to # nt incorporated
    Rinse and repeat with next nt

25 Sequencing: 454 Camera captures light emitted from every well during every nucleotide flow

26 Sequencing: 454 Flowgram: representation of a sequence, based on the pattern of light emitted from a single well

27 Sequencing: 454
  Other terminology:
    Lib-L/Lib-A: adapter variants, “ligated” or “annealed”
    Titanium chemistry: ~450 bp reads on all instruments
    XL+ chemistry: ~700 bp reads on the FLX+ instrument
    Flow: one of the four nucleotides flows over the PTP
    Cycle: a set of four flows, in order
    Valley flow: the number of bases incorporated in a given read during that flow is uncertain, e.g. 1.5 units of light (background signal, homopolymers)
    File format: sff (standard flowgram format)
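
To make the flow, cycle and valley-flow terms concrete, here is a rough sketch of turning a flowgram into bases by rounding each light signal to the nearest whole number of incorporated nucleotides. The flow order "TACG", the 0.2 margin used to flag valley flows, and the signal values are all assumptions for illustration; real base callers are considerably more careful.

```python
FLOW_ORDER = "TACG"  # assumption: nucleotide order within one cycle of four flows

def flowgram_to_sequence(flow_values, flow_order=FLOW_ORDER):
    """Round each flow signal to a homopolymer length and emit that many bases."""
    bases = []
    for i, signal in enumerate(flow_values):
        nt = flow_order[i % len(flow_order)]
        count = int(signal + 0.5)  # round to nearest integer (signals are >= 0)
        bases.append(nt * count)
    return "".join(bases)

def valley_flows(flow_values, margin=0.2):
    """Indices of flows whose signal sits near a half-integer, e.g. 1.5."""
    return [i for i, s in enumerate(flow_values)
            if abs((s % 1.0) - 0.5) <= margin]

# Hypothetical signals for two cycles (eight flows).
flows = [1.02, 0.05, 2.10, 0.96, 0.03, 1.48, 0.00, 3.05]
print(flowgram_to_sequence(flows))  # TCCGAGGG
print(valley_flows(flows))          # [5]
```

The sixth flow (1.48 units) is called as a single A but is flagged as a valley flow, which is exactly the ambiguity that makes homopolymer lengths error-prone on this platform.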

28 Sequencing: Ion Torrent
  Procedures and chemistry similar to 454
  Instead of PPi, measure H+ release (pH change) via semiconductor chip
  No expensive camera or laser required, no modified nucleotides

29 Sequence Quality
  Error probabilities determined using training sets, platform-specific biases
  Expressed as a quality value (QV or Q score) per base
  Similar to PHRED scores:
    Q = -10 log10(P)
    P = 10^(-Q/10)

  Phred (Q) Score   Probability of Error (P)   Base Call Accuracy
  10                1 in 10                    90%
  20                1 in 100                   99%
  30                1 in 1K                    99.9%
  40                1 in 10K                   99.99%
  50                1 in 100K                  99.999%
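
The two Phred formulas on this slide are inverses of each other. A small sketch (function names are made up for illustration) that reproduces the table rows:

```python
import math

def error_prob_to_phred(p):
    """Q = -10 * log10(P)"""
    return -10 * math.log10(p)

def phred_to_error_prob(q):
    """P = 10^(-Q/10)"""
    return 10 ** (-q / 10)

for q in (10, 20, 30, 40, 50):
    p = phred_to_error_prob(q)
    accuracy = (1 - p) * 100
    print(f"Q{q}: P = {p:g}, accuracy = {accuracy:.3f}%")
```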

30 Project 1: Microbial Genome
  Considerations:
    Reference genome?
    How much coverage do I want?
    How big is the genome?
    How much data do I need?
      bp needed = genome size x coverage
    Which instrument/chemistry configuration to use?
  Coverage
    Depth: number of times a particular base is “covered” by a read (e.g. 25X)
    Breadth: % of genome with at least 1X coverage
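
As a worked example of "bp needed = genome size x coverage", the sketch below estimates the data and read count for a hypothetical 5 Mb genome at 25X with 150 bp reads, and computes breadth from a made-up list of per-base depths. The genome size, read length and depth values are illustrative assumptions, not figures from the slides.

```python
import math

def bases_needed(genome_size_bp, depth):
    """bp needed = genome size x coverage (depth)."""
    return genome_size_bp * depth

def reads_needed(genome_size_bp, depth, read_length_bp):
    """Number of reads of a given length that supply the required bases."""
    return math.ceil(bases_needed(genome_size_bp, depth) / read_length_bp)

def breadth_of_coverage(per_base_depth):
    """Percent of positions covered by at least one read (>= 1X)."""
    covered = sum(1 for d in per_base_depth if d >= 1)
    return 100.0 * covered / len(per_base_depth)

# Hypothetical project: 5 Mb genome, 25X depth, 150 bp reads.
print(bases_needed(5_000_000, 25))        # 125000000
print(reads_needed(5_000_000, 25, 150))   # 833334
print(breadth_of_coverage([0, 3, 5, 0, 2, 1, 4, 2]))  # 75.0
```

In practice some reads are lost to filtering and trimming, so planning for more than the bare minimum is reasonable.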

31 Project 1: Microbial Genome
  Sample preparation
    Isolate high quality (not degraded) and high purity (no RNA) gDNA
    Verify on a gel
    Quantify using dsDNA-specific dye
  Library preparation
    Can do this yourself if you like
    ~$200 per sample for Nextera
    Cheaper protocols
    Cheaper in bulk
    Barcode compatibility

32 Project 1: Microbial Genome
  Library QC
    Insert size confirmed on BioAnalyzer (within range, no artifacts)
    Pool barcoded libraries (normalize based on PicoGreen quantification)
    Absolute quantification of library pools using qPCR
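
One way to normalize barcoded libraries before pooling is to convert each mass concentration to molarity and then take the volume that contributes the same number of molecules per library. The sketch below assumes the common approximation of 660 g/mol per base pair of dsDNA, and the library names, concentrations and fragment sizes are invented for illustration.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA concentration from ng/µL to nM.

    Assumption: average mass of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)

def equimolar_volumes(libraries, fmol_each=10.0):
    """Volume (µL) of each library that contributes `fmol_each` femtomoles.

    `libraries` maps name -> (concentration in ng/µL, mean fragment size in bp).
    Note that 1 nM equals 1 fmol/µL.
    """
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, size)
        for name, (conc, size) in libraries.items()
    }

# Hypothetical libraries: (ng/µL from a dsDNA dye assay, mean size in bp).
libs = {"lib1": (6.6, 500), "lib2": (3.3, 500)}
volumes = equimolar_volumes(libs)
print(volumes)  # roughly {'lib1': 0.5, 'lib2': 1.0}
```

The qPCR step on the slide would then check the molarity of the final pool, since dye-based estimates count all dsDNA rather than only correctly adapted fragments.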

33 Project 1: Microbial Genome
  MiSeq sequencing
    Dilute and denature library pool (optimal concentration requires titration...)
    Spike in PhiX library as needed (e.g. 1%)
    Prepare and load reagents, flow cell
    Basic filtering and de-multiplexing performed automatically
    Download fastq files from BaseSpace

34 Project 1: Microbial Genome
  Data processing
    Additional filtering
    Trim the ends
    Remove PCR duplicates
  Assembly: overlapping reads are assembled to each other based on sequence similarity = contigs
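
The idea of joining overlapping reads into a contig can be illustrated with a toy exact-match overlap merge. This is only a sketch of the principle: real assemblers tolerate sequencing errors, handle both strands, and work on millions of reads. The reads and the minimum overlap of 3 bases are made-up values.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that exactly matches a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def merge_reads(left, right, min_overlap=3):
    """Join two reads into one contig if they overlap; otherwise return None."""
    size = overlap_length(left, right, min_overlap)
    return left + right[size:] if size else None

# Hypothetical reads sharing the 4-base overlap "TGCA".
print(overlap_length("ACGTTGCA", "TGCAGGTC"))  # 4
print(merge_reads("ACGTTGCA", "TGCAGGTC"))     # ACGTTGCAGGTC
print(merge_reads("AAAA", "CCCC"))             # None
```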

35 Project 1: Microbial Genome
  What’s next?
    Polish the genome (hybrid assemblies, mate pair libraries)
    Annotate (ORFs, RNA-seq)
    Compare

36 Project 2: Microbial Community
  Shotgun metagenomics
    Unbiased survey of community content
    Random library fragments may provide very little taxonomic resolution (e.g. conserved, unknown)
    Identify genes, classify by function
  Targeted metagenomics
    Limited survey of community content
    Targeted loci provide excellent taxonomic resolution, but may exclude certain taxa
    Identify OTUs, classify by taxonomy

37 Project 2: Microbial Community
  16S rRNA
    Multi-copy gene (1.5 kb)
    Conserved and hypervariable regions
    Extensive databases from known species

38 Project 2: Microbial Community
  Considerations:
    Biases in sampling methods, culturing, DNA isolation, PCR... replicate
    Available SOPs
    How many reads per sample?
    Read length matters!
  Sample preparation:
    Isolate DNA
    PCR amplify, purify
      High-fidelity polymerase
      Barcoded primers
      No primer dimers!
    Normalize PCR products and pool

39 Project 2: Microbial Community
  454 Sequencing
    emPCR titrations with different library input
    Bulk emPCR
    Sequence
    Basic filtering
    Collect sff files
  Data processing
    De-multiplexing
    Additional filtering
    Trim the barcodes, primers
    Check for chimeras
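
De-multiplexing and barcode/primer trimming can be sketched as a prefix match on each read. The example assumes exact matches only, with barcodes at the 5' end immediately followed by the primer; the barcode sequences, sample names and the unrealistically short primer are hypothetical. Production pipelines usually allow mismatches and also check the reverse primer.

```python
def demultiplex(reads, barcodes, primer):
    """Assign reads to samples by exact 5' barcode match, then strip barcode and primer.

    `barcodes` maps barcode sequence -> sample name.
    Returns (reads grouped by sample, list of unassigned reads).
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        for barcode, sample in barcodes.items():
            prefix = barcode + primer
            if read.startswith(prefix):
                by_sample[sample].append(read[len(prefix):])
                break
        else:
            unassigned.append(read)  # no barcode + primer matched
    return by_sample, unassigned

# Hypothetical barcodes, primer and reads.
barcodes = {"ACGT": "sampleA", "TGCA": "sampleB"}
reads = ["ACGTGGTTTAAA", "TGCAGGCCCGGG", "NNNNGGAAAA"]
by_sample, unassigned = demultiplex(reads, barcodes, primer="GG")
print(by_sample)    # {'sampleA': ['TTTAAA'], 'sampleB': ['CCCGGG']}
print(unassigned)   # ['NNNNGGAAAA']
```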

40 Project 2: Microbial Community
  Clustering
    Sequences grouped by similarity = OTUs
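
Grouping sequences by similarity can be illustrated with a greedy, centroid-based clustering sketch: each sequence joins the first OTU whose representative it matches above an identity threshold, or founds a new OTU. This is not necessarily the method used in the presentation. It assumes the sequences are already aligned and of equal length, and the toy example uses a 90% threshold on 10-base sequences; a 97% cutoff is a common convention for 16S data.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(sequences, threshold=0.97):
    """Greedy clustering: returns a list of (centroid, members) pairs."""
    otus = []
    for seq in sequences:
        for centroid, members in otus:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            otus.append((seq, [seq]))  # seq becomes the centroid of a new OTU
    return otus

# Hypothetical toy sequences; the first two differ at one of ten positions.
seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
otus = greedy_otus(seqs, threshold=0.9)
print(len(otus))                       # 2
print([len(m) for _, m in otus])       # [2, 1]
```

The result of greedy clustering depends on input order, which is one reason tools often sort sequences by abundance or length first.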

41 Project 2: Microbial Community
  Taxonomic identification
    OTUs are classified by comparing to known 16S sequences
    Level of classification (e.g. family vs genus)?
  Diversity
    Within sample
    Between samples
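
The slide does not name specific diversity measures. As one possible illustration, the sketch below uses the Shannon index for within-sample diversity and Bray-Curtis dissimilarity for between-sample comparison, both computed from OTU count vectors. The choice of metrics and the counts are assumptions made for this example.

```python
import math

def shannon_index(counts):
    """Within-sample diversity: H = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(counts_a, counts_b):
    """Between-sample dissimilarity: 0 = identical composition, 1 = no shared OTUs.

    Both vectors must list the same OTUs in the same order.
    """
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / (sum(counts_a) + sum(counts_b))

# Hypothetical OTU counts for two samples.
sample_1 = [10, 10, 0]
sample_2 = [5, 10, 5]
print(shannon_index([10, 10, 10, 10]))   # ln(4), about 1.386
print(bray_curtis(sample_1, sample_2))   # 0.25
```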

