1
Short Read Sequencing Analysis Workshop
Day 1 Considerations for Sequencing
2
Different types of sequencing libraries
Whole genome sequencing
RNA Sequencing/GRO-Seq
ChIP-seq
DNase I, ATAC-seq
Exome sequencing
Methyl-Seq
Metagenomic/Amplicon (low diversity)
3
Platform Comparison
4
Platform Comparison
                 MiniSeq          MiSeq          NextSeq        HiSeq 2500     HiSeq 3000/4000   HiSeq X
Output per run   1.65Gb – 7.5Gb   0.5Gb – 15Gb   16Gb – 120Gb   9Gb – 500Gb    105Gb – 750Gb     800Gb – 900Gb
Reads per run    7M – 25M         12M – 25M      130M – 400M    300M – 4B      2.1B – 2.5B       2.6B – 3B
Time per run     7h – 24h         5h – 56h       11h – 30h      7h – 6d        1d – 3.5d         <3d
Rows whose cells span several platforms (values in slide order):
Max read length: 2 x 150; 2 x 300; 2 x 250
2 color/4 color: 2 color; 4 color
Flowcell: PE; SR / PE; Pattern
Samples/FC: 1; 2 or 8; 8
5
How does Illumina sequencing work?
Library generation and affixing library to flow cell
6
How does Illumina sequencing work?
Cluster Generation
7
How does Illumina sequencing work?
Sequencing by synthesis with reversible terminators
8
How does Illumina sequencing work?
9
Output: Millions of short read sequences
Read 1: ATCGACGGTTAACTGATCG… ATGCGTGCTGCAGTGCCAC… CGTGGACCAAATGGCACAT… CTGTGAAACAATTGGGGAT…
Index Read 1 (i7): TCAGTGCT ACGTTCTA TCAGTGGG CTCGGCGA ACGTTCTC
Index Read 2 (i5): ACGTTCAT CAACGTTC ATTCAGTG GCCTCGGC
Read 2: CTGGTGACAACTGATGCTT… TGACCATTGGGTACAACCC… CCAGTGAACGTGAGCAAGT… GGTTGACCATTGGGGTGAC…
10
Demultiplexing
Read 1: ATCGACGGTTAACTGATCG… ATGCGTGCTGCAGTGCCAC… CGTGGACCAAATGGCACAT… CTGTGAAACAATTGGGGAT…
Index Read 1 (i7): TCAGTGCT ACGTTCTA TCAGTGGG CTCGGCGA ACGTTCTC
Index Read 2 (i5): ACGTTCAT CAACGTTC ATTCAGTG GCCTCGGC
Read 2: CTGGTGACAACTGATGCTT… TGACCATTGGGTACAACCC… CCAGTGAACGTGAGCAAGT… GGTTGACCATTGGGGTGAC…
Current Illumina kits allow up to 384 unique indexes to be pooled
11
Demultiplexing
Read 1: ATCGACGGTTAACTGATCG… ATGCGTGCTGCAGTGCCAC… CGTGGACCAAATGGCACAT… CTGTGAAACAATTGGGGAT…
Index Read 1 (i7): TCAGTGCT ACGTTCTA TCAGTGGG CTCGGCGA
Index Read 2 (i5): ACGTTCAT CAACGTTC ATTCAGTG GCCTCGGC
Read 2: CTGGTGACAACTGATGCTT… TGACCATTGGGTACAACCC… CCAGTGAACGTGAGCAAGT… GGTTGACCATTGGGGTGAC…
Sample 1
  Read 1: ATCGACGGTTAACTGATCG…   Read 2: CTGGTGACAACTGATGCTT…
  Read 1: CGTGGACCAAATGGCACAT…   Read 2: CCAGTGAACGTGAGCAAGT…
Sample 2
  Read 1: ATGCGTGCTGCAGTGCCAC…   Read 2: TGACCATTGGGTACAACCC…
Sample 3
  Read 1: CTGTGAAACAATTGGGGAT…   Read 2: GGTTGACCATTGGGGTGAC…
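The demultiplexing step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample sheet below is hypothetical (the slide does not say which index pair belongs to which sample, so the mapping does not reproduce the slide's grouping), the function names are invented for this example, and the mismatch tolerance is a common option in demultiplexing tools rather than something the slide specifies.

```python
from collections import defaultdict

# Hypothetical sample sheet: (i7 index, i5 index) -> sample name.
# The index sequences come from the slide; the assignments are invented.
SAMPLE_SHEET = {
    ("TCAGTGCT", "ACGTTCAT"): "Sample 1",
    ("ACGTTCTA", "CAACGTTC"): "Sample 2",
    ("CTCGGCGA", "GCCTCGGC"): "Sample 3",
}


def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(i7, i5, sheet=SAMPLE_SHEET, max_mismatches=0):
    """Return the single sample whose index pair matches, else None."""
    hits = [
        name
        for (s7, s5), name in sheet.items()
        if len(s7) == len(i7) and len(s5) == len(i5)
        and hamming(s7, i7) <= max_mismatches
        and hamming(s5, i5) <= max_mismatches
    ]
    # No hit or an ambiguous hit both leave the read unassigned.
    return hits[0] if len(hits) == 1 else None


def demultiplex(records, max_mismatches=0):
    """Group (read1, i7, i5, read2) records into per-sample read pairs."""
    bins = defaultdict(list)
    for read1, i7, i5, read2 in records:
        sample = assign_sample(i7, i5, max_mismatches=max_mismatches)
        bins[sample or "Undetermined"].append((read1, read2))
    return dict(bins)


# Read pairs and indexes from the slide (truncated sequences).
records = [
    ("ATCGACGGTTAACTGATCG", "TCAGTGCT", "ACGTTCAT", "CTGGTGACAACTGATGCTT"),
    ("ATGCGTGCTGCAGTGCCAC", "ACGTTCTA", "CAACGTTC", "TGACCATTGGGTACAACCC"),
    ("CGTGGACCAAATGGCACAT", "TCAGTGGG", "ATTCAGTG", "CCAGTGAACGTGAGCAAGT"),
    ("CTGTGAAACAATTGGGGAT", "CTCGGCGA", "GCCTCGGC", "GGTTGACCATTGGGGTGAC"),
]
bins = demultiplex(records)
```

With this invented sheet, the third read pair has no matching index pair and lands in the "Undetermined" bin, which is how unmatched reads are usually set aside.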
12
What to do with the data? Short Read Sequencing
Quality Metrics & Trimming
Assembly
Align to reference genome
Variant Calling
Expression/Read Depth
Alternative splicing
Peak/Region identification
Metagenomics
13
Quality Assessment & Trimming
Pinpoint problems with library prep/sequencing
Identify possible biases
Improve mapping through trimming
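As a rough illustration of what trimming means, the sketch below removes low-quality bases from the 3' end of a read. It assumes Phred+33 quality encoding (the usual encoding in current Illumina FASTQ files) and a cutoff of Q20; both the cutoff and the function names are assumptions for this example, and real trimming tools use more elaborate rules than this simple threshold.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20, offset=33):
    """Cut bases off the 3' end until one meets the quality cutoff."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string differ in length")
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    # Walk backwards from the 3' end while bases are below the cutoff.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII###")
```

Here the last three bases fall below the cutoff, so only the first five bases of the read are kept.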
14
Align to reference genome
[Figure: Sample 1, Sample 2 and Sample 3 reads aligned along a reference chromosome (Chr)]
Aligners: Bowtie2, TopHat2, BWA
15
Variant Calling
Reference Chr: A
Reads: C C C C C C
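The figure on this slide appears to show a reference base A with six aligned reads that all carry C. A toy version of that decision at a single position might look like the following. It is a deliberately naive sketch: the depth and allele-fraction thresholds are assumptions chosen for illustration, the function name is hypothetical, and production variant callers use statistical models rather than a fixed cutoff.

```python
from collections import Counter


def call_site(ref_base, read_bases, min_depth=4, min_fraction=0.8):
    """Return the alternate base if most reads disagree with the reference.

    Returns None when depth is too low, when the most common base is the
    reference base, or when the alternate base is not frequent enough.
    """
    depth = len(read_bases)
    if depth < min_depth:
        return None
    base, count = Counter(read_bases).most_common(1)[0]
    if base != ref_base and count / depth >= min_fraction:
        return base
    return None


# The situation drawn on the slide: reference A, six reads showing C.
variant = call_site("A", ["C", "C", "C", "C", "C", "C"])
```

A single mismatching read among many matching ones would not pass the fraction threshold, which is why sequencing depth matters for variant calling.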
16
Differential Expression
Reference Chr
17
Alternative Splicing
18
Peak/Region identification
Reference Chr Peak
19
Experimental Design considerations
Genome Size
Read Length
Sequencing Depth
# of Replicates
Single-end vs. Paired-end
Insert Size
20
Coverage & Read-depth
Coverage = estimate of the average number of reads covering a single base
Avg Coverage = (# reads) x (read length) / (size of genome)
[Figure labels: Reference, Depth]
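The average coverage formula on this slide translates directly into code. The function name and the example numbers are chosen for illustration; for paired-end data this sketch assumes each read pair is counted once with the combined length of both mates.

```python
def average_coverage(num_reads, read_length, genome_size):
    """Average coverage = (# reads) x (read length) / (size of genome)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Example: 460,000 read pairs of 2 x 150 bp (300 bp per pair)
# on a 4.6 Mb genome.
coverage = average_coverage(460_000, 2 * 150, 4_600_000)
```

This is an average only; actual depth varies along the genome, which is why the slide calls coverage an estimate.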
21
Typical Coverage Requirements
DNA-Resequencing (SNPs, small indels): 30X with paired-end reads
De novo DNA-Seq: 100X minimum, longest paired-end, multiple insert size runs
Exome: X of the exome
22
What that means in reads...
30X Coverage with 2 x 150 bp reads
For E. coli, ~4.6 Mb: 138 Mbp, 0.46 Million reads, ~3% of a MiSeq run
For Human, ~3.2 Gb: 96 Gbp, 320 Million reads, 80% of a NextSeq High Output run or 1.3 lanes of a HiSeq 2500 run
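The numbers on this slide follow from rearranging the coverage formula to solve for the number of reads. The sketch below reproduces them; it assumes, as the slide's arithmetic implies, that one "read" means one 2 x 150 bp pair (300 bp). The per-run and per-lane capacities are the figures quoted elsewhere in this deck, and the function name is hypothetical.

```python
import math


def reads_needed(genome_size, coverage, bases_per_read):
    """Reads required so that reads x bases_per_read / genome_size = coverage."""
    return math.ceil(genome_size * coverage / bases_per_read)


BASES_PER_PAIR = 2 * 150  # one 2 x 150 bp read pair

ecoli_reads = reads_needed(4_600_000, 30, BASES_PER_PAIR)      # 138 Mbp total
human_reads = reads_needed(3_200_000_000, 30, BASES_PER_PAIR)  # 96 Gbp total

# Capacities quoted in this deck (assumed here, not measured):
MISEQ_READS_PER_RUN = 15_000_000        # low end of 15-25 M reads/run
NEXTSEQ_HIGH_OUTPUT_READS = 400_000_000
HISEQ_2500_READS_PER_LANE = 250_000_000

ecoli_fraction_of_miseq = ecoli_reads / MISEQ_READS_PER_RUN
human_fraction_of_nextseq = human_reads / NEXTSEQ_HIGH_OUTPUT_READS
human_hiseq_2500_lanes = human_reads / HISEQ_2500_READS_PER_LANE
```

Swapping in a different genome size, target coverage, or read length gives a quick first estimate of how much sequencing to order.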
23
RNA-Seq Requirements
Can’t use coverage as a measure
Differential Expression (highly expressed)
  Small genomes: 5 Million reads
  Large genomes: Million reads
De novo Assembly/DE (lowly expressed)
  Small genomes: Million reads
  Large genomes: Million reads
***For RNA-Seq, replicates are typically more powerful than read depth or read length
24
Which Sequencer should I use?
MiSeq: 15-25 M reads/run; 8h – 4 days/run; 1x50 to 2x300; $$$/bp
NextSeq: M reads/run; 12 – 30 h/run; 1x75 to 2x150; $$/bp
HiSeq 2500: 250 M reads/lane, 8 lanes/run; 7h – 3 d/run; 1x36 to 2x125; $$/bp
HiSeq 4000: 312 M reads/lane, 8 lanes/run; 1 – 3.5 d/run; 1x50 to 2x150; $/bp
HiSeq X Ten: 350 M reads/lane, 8 lanes/run; 3 d/run; 2x150; $/bp BUT minimums on orders
25
Other considerations
Base diversity (at each position)
Custom versus kitted libraries – kit biases
PCR/PCR-free libraries
How unique is the run-type you want
Queue times/Data delivery times
Many more....
26
Questions?