Short Read Sequencing Analysis Workshop

1 Short Read Sequencing Analysis Workshop
Day 1 Considerations for Sequencing

2 Different types of sequencing libraries
Whole genome sequencing
RNA Sequencing/GRO-Seq
ChIP-seq
DNase I, ATAC-seq
Exome sequencing
Methyl-Seq
Metagenomic/Amplicon (low diversity)

3 Platform Comparison

4 Platform Comparison
Platform: MiniSeq | MiSeq | NextSeq | HiSeq 2500 | HiSeq 3000/4000 | HiSeq X
Output per run: 1.65 Gb – 7.5 Gb | 0.5 Gb – 15 Gb | 16 Gb – 120 Gb | 9 Gb – 500 Gb | 105 Gb – 750 Gb | 800 Gb – 900 Gb
Reads per run: 7M – 25M | 12M – 25M | 130M – 400M | 300M – 4B | 2.1B – 2.5B | 2.6B – 3B
Max read length: 2 x 150, 2 x 300, 2 x 250
Time per run: 7h – 24h | 5h – 56h | 11h – 30h | 7h – 6d | 1d – 3.5d | <3d
2 color/4 color: 2 color, 4 color
Flowcell: PE, SR / PE, Pattern
Samples/FC: 1, 2 or 8, 8

5 How does Illumina sequencing work?
Library generation and affixing library to flow cell

6 How does Illumina sequencing work?
Cluster Generation

7 How does Illumina sequencing work?
Sequencing by synthesis with reversible terminators

8 How does Illumina sequencing work?

9 Output: Millions of short read sequences
Read 1: ATCGACGGTTAACTGATCG… ATGCGTGCTGCAGTGCCAC… CGTGGACCAAATGGCACAT… CTGTGAAACAATTGGGGAT…
Index Read 1 (i7): TCAGTGCT ACGTTCTA TCAGTGGG CTCGGCGA ACGTTCTC
Index Read 2 (i5): ACGTTCAT CAACGTTC ATTCAGTG GCCTCGGC
Read 2: CTGGTGACAACTGATGCTT… TGACCATTGGGTACAACCC… CCAGTGAACGTGAGCAAGT… GGTTGACCATTGGGGTGAC…

10 Demultiplexing
Read 1: ATCGACGGTTAACTGATCG… ATGCGTGCTGCAGTGCCAC… CGTGGACCAAATGGCACAT… CTGTGAAACAATTGGGGAT…
Index Read 1 (i7): TCAGTGCT ACGTTCTA TCAGTGGG CTCGGCGA ACGTTCTC
Index Read 2 (i5): ACGTTCAT CAACGTTC ATTCAGTG GCCTCGGC
Read 2: CTGGTGACAACTGATGCTT… TGACCATTGGGTACAACCC… CCAGTGAACGTGAGCAAGT… GGTTGACCATTGGGGTGAC…
Current Illumina kits allow up to 384 unique indexes to be pooled

11 Demultiplexing
Read 1: ATCGACGGTTAACTGATCG… ATGCGTGCTGCAGTGCCAC… CGTGGACCAAATGGCACAT… CTGTGAAACAATTGGGGAT…
Index Read 1 (i7): TCAGTGCT ACGTTCTA TCAGTGGG CTCGGCGA
Index Read 2 (i5): ACGTTCAT CAACGTTC ATTCAGTG GCCTCGGC
Read 2: CTGGTGACAACTGATGCTT… TGACCATTGGGTACAACCC… CCAGTGAACGTGAGCAAGT… GGTTGACCATTGGGGTGAC…
Sample 1 (Read 1 / Read 2):
ATCGACGGTTAACTGATCG… / CTGGTGACAACTGATGCTT…
CGTGGACCAAATGGCACAT… / CCAGTGAACGTGAGCAAGT…
Sample 2 (Read 1 / Read 2):
ATGCGTGCTGCAGTGCCAC… / TGACCATTGGGTACAACCC…
Sample 3 (Read 1 / Read 2):
CTGTGAAACAATTGGGGAT… / GGTTGACCATTGGGGTGAC…
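
The binning shown on the demultiplexing slide can be sketched in a few lines of Python. This is only an illustration, not how Illumina's software does it: the sample sheet below is hypothetical (it is chosen so that the result matches the grouping on the slide), it assumes the n-th index pair listed on the slide belongs to the n-th read pair, and it uses exact index matching with no mismatch tolerance.

```python
from collections import defaultdict

# (read 1, read 2, i7 index, i5 index), in the order listed on the slide.
# Assumption: the n-th index pair belongs to the n-th read pair.
READS = [
    ("ATCGACGGTTAACTGATCG", "CTGGTGACAACTGATGCTT", "TCAGTGCT", "ACGTTCAT"),
    ("ATGCGTGCTGCAGTGCCAC", "TGACCATTGGGTACAACCC", "ACGTTCTA", "CAACGTTC"),
    ("CGTGGACCAAATGGCACAT", "CCAGTGAACGTGAGCAAGT", "TCAGTGGG", "ATTCAGTG"),
    ("CTGTGAAACAATTGGGGAT", "GGTTGACCATTGGGGTGAC", "CTCGGCGA", "GCCTCGGC"),
]

# Hypothetical sample sheet: (i7, i5) -> sample name.
SAMPLE_SHEET = {
    ("TCAGTGCT", "ACGTTCAT"): "Sample 1",
    ("TCAGTGGG", "ATTCAGTG"): "Sample 1",
    ("ACGTTCTA", "CAACGTTC"): "Sample 2",
    ("CTCGGCGA", "GCCTCGGC"): "Sample 3",
}


def demultiplex(reads, sample_sheet):
    """Group read pairs by sample using exact matches on the (i7, i5) pair."""
    bins = defaultdict(list)
    for read1, read2, i7, i5 in reads:
        # Index pairs missing from the sheet go to an "Undetermined" bin.
        sample = sample_sheet.get((i7, i5), "Undetermined")
        bins[sample].append((read1, read2))
    return dict(bins)


demultiplexed = demultiplex(READS, SAMPLE_SHEET)
for sample in sorted(demultiplexed):
    print(sample, demultiplexed[sample])
```

Keying on the index pair rather than on a single index is what lets dual-indexed libraries share an i7 or an i5 sequence and still be told apart.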

12 What to do with the data? Short Read Sequencing
Quality Metrics & Trimming
Assembly
Align to reference genome
Variant Calling
Expression/Read Depth
Alternative splicing
Peak/Region identification
Metagenomics

13 Quality Assessment & Trimming
Pinpoint problems with library prep/sequencing
Identify possible biases
Improve mapping through trimming
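
As a rough illustration of what trimming does, the sketch below removes low-quality bases from the 3' end of a single read. It assumes Phred+33 quality encoding (the usual FASTQ convention on current Illumina instruments) and a hypothetical cutoff of Q20. The function names are made up for this example, and dedicated trimming tools apply more refined rules than this simple end-clipping.

```python
def phred33_scores(quality_string):
    """Convert a FASTQ quality string (Phred+33) into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20):
    """Clip bases from the 3' end until the last base has quality >= min_quality."""
    scores = phred33_scores(quality_string)
    keep = len(scores)
    while keep > 0 and scores[keep - 1] < min_quality:
        keep -= 1
    return sequence[:keep], quality_string[:keep]


# 'I' encodes Q40, '#' encodes Q2 and '!' encodes Q0.
seq, qual = trim_3prime("ATCGACGGTT", "IIIIIIII#!")
print(seq, qual)
```

Only the tail of the read is examined here, so a single poor base in the middle of an otherwise good read is left in place.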

14 Align to reference genome
Figure: Sample 1, Sample 2, and Sample 3 reads aligned along a reference chromosome (Chr)
Aligners: Bowtie2, Tophat2, BWA

15 Variant Calling
Figure: reads aligned to the reference chromosome (Reference Chr); base labels: A C C C C C C

16 Differential Expression
Reference Chr

17 Alternative Splicing

18 Peak/Region identification
Reference Chr Peak

19 Experimental Design considerations
Genome Size
Read Length
Sequencing Depth
# of Replicates
Single-end vs. Paired-end
Insert Size

20 Coverage & Read-depth
Coverage = estimate of average number of reads covering a single base
Avg Coverage = (# reads) x (read length) / (size of genome)
Figure: read depth along the reference
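
The average coverage formula translates directly into code. The sketch below is a minimal version; the function name and the example numbers (one million 150 bp reads on a 4.6 Mb genome) are illustrative assumptions, not values from the slides.

```python
def average_coverage(num_reads, read_length, genome_size):
    """Average coverage = (# reads) x (read length) / (size of genome)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Illustrative input: 1,000,000 single reads of 150 bp on a 4.6 Mb genome.
coverage = average_coverage(1_000_000, 150, 4_600_000)
print(f"{coverage:.1f}X")
```

For paired-end data, count each mate as a read (or double the read length per pair) so that all sequenced bases are included.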

21 Typical Coverage Requirements
DNA-Resequencing (SNPs, small indels): 30X with paired-end reads
De novo DNA-Seq: 100X minimum, longest paired-end, multiple insert size runs
Exome: X of the exome

22 What that means in reads...
30X Coverage with 2 x 150 bp reads
For E. coli, ~4.6 Mb: 138 Mbp, 0.46 Million reads (read pairs); ~3% of a MiSeq run
For Human, ~3.2 Gb: 96 Gbp, 320 Million reads (read pairs); 80% of a NextSeq High Output run or 1.3 lanes of HiSeq 2500 run
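
The arithmetic on this slide can be reproduced with a short sketch. The function name is hypothetical. The run capacities are taken from elsewhere in the deck (15 M reads for a MiSeq run, the low end of its range; 400 M for a NextSeq High Output run; 250 M per HiSeq 2500 lane), and each 2 x 150 read pair is assumed to contribute 300 bp.

```python
import math


def read_pairs_needed(coverage, genome_size, read_length, paired=True):
    """Number of reads (pairs, if paired) needed to reach a target average coverage."""
    bases_needed = coverage * genome_size
    bases_per_fragment = read_length * (2 if paired else 1)
    return math.ceil(bases_needed / bases_per_fragment)


# 30X with 2 x 150 bp reads, genome sizes as given on the slide.
ecoli_pairs = read_pairs_needed(30, 4_600_000, 150)
human_pairs = read_pairs_needed(30, 3_200_000_000, 150)

# Run capacities assumed from the deck's sequencer slides.
MISEQ_READS_PER_RUN = 15_000_000
NEXTSEQ_READS_PER_RUN = 400_000_000
HISEQ2500_READS_PER_LANE = 250_000_000

print(f"E. coli: {ecoli_pairs:,} pairs, "
      f"{ecoli_pairs / MISEQ_READS_PER_RUN:.1%} of a MiSeq run")
print(f"Human: {human_pairs:,} pairs, "
      f"{human_pairs / NEXTSEQ_READS_PER_RUN:.0%} of a NextSeq run, "
      f"{human_pairs / HISEQ2500_READS_PER_LANE:.2f} HiSeq 2500 lanes")
```

These are averages over the whole genome; real coverage is uneven, so it is common to plan for some margin above the target.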

23 RNA-Seq Requirements
Can’t use coverage as a measure
Differential Expression (highly expressed)
Small genomes: 5 Million reads
Large genomes: Million reads
De novo Assembly/DE (lowly expressed)
Small genomes: Million reads
Large genomes: Million reads
***For RNA-Seq, replicates typically more powerful than read depth, read length

24 Which Sequencer should I use?
MiSeq: 15-25 M reads/run; 8h – 4 days/run; 1x50 to 2x300; $$$/bp
NextSeq: M reads/run; 12 – 30 h/run; 1x75 to 2x150; $$/bp
HiSeq 2500: 250 M reads/lane, 8 lanes/run; 7h – 3 d/run; 1x36 to 2x125; $$/bp
HiSeq 4000: 312 M reads/lane, 8 lanes/run; 1 – 3.5 d/run; 1x50 to 2x150; $/bp
HiSeq X Ten: 350 M reads/lane, 8 lanes/run; 3 d/run; 2x150; $/bp BUT minimums on orders

25 Other considerations
Base diversity (at each position)
Custom versus kitted libraries – kit biases
PCR/PCR-free libraries
How unique is the run-type you want
Queue times/Data delivery times
Many more....

26 Questions?

