1
Next Generation DNA Sequencing
Illumina HiSeq X: 1.8 Tbp (3 billion reads) in ~3 days (as of 11/6/2014)
2
Whole Genome Shotgun Sequencing
Randomly Fragment Genomic DNA
Sequence Fragments
Genome Assembly
...ATCCGTAAATGGGCTGATACTACTAATGC
            TGGGCTGATACTACTAATGCCAAACTGTACTAGTCCTG...
...ATCCGTAAATGGGCTGATACTACTAATGCCAAACTGTACTAGTCCTG...
Contiguous Sequence (Contig)
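The assembly step on this slide can be illustrated with a minimal sketch that merges two reads on their longest exact suffix/prefix overlap. The function names and the `min_len` threshold are assumptions made for illustration; real assemblers also handle sequencing errors, reverse complements, and repeats, which this toy version ignores.

```python
def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for k in range(max_len, min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_reads(left, right, min_len=3):
    """Merge two reads into one contig, or return None if they do not overlap."""
    k = longest_overlap(left, right, min_len)
    if k == 0:
        return None
    return left + right[k:]


if __name__ == "__main__":
    # The two fragments shown on the slide (without the "..." ellipses).
    read1 = "ATCCGTAAATGGGCTGATACTACTAATGC"
    read2 = "TGGGCTGATACTACTAATGCCAAACTGTACTAGTCCTG"
    print(longest_overlap(read1, read2))
    print(merge_reads(read1, read2))
```

Applied to the two fragments above, the merge reproduces the contig shown on the slide.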
3
RNA Sequencing (RNA-Seq)
cDNA made from RNA
Sequence Fragments of cDNA
Figure: Garber et al, Nat Methods (2011)
Characterize all RNA in sample
Gene expression level proportional to number of reads
Detect alternatively spliced transcripts
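Because expression level is taken to be proportional to read count, raw counts are usually rescaled by sequencing depth before samples are compared. The sketch below uses counts per million (CPM), which is one common choice and not necessarily the normalization the presenter used. The gene names and counts are made up.

```python
def counts_per_million(counts):
    """Scale raw per-gene read counts to counts per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads were counted")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical read counts for one sample.
    sample = {"geneA": 750, "geneB": 250}
    print(counts_per_million(sample))
```

CPM corrects only for library size. Comparing different genes within one sample would also require accounting for transcript length.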
4
Typical Next Gen Experiments
Genome sequencing
  Novel genomes
  Resequencing
Transcriptome sequencing (RNA-seq)
  Characterize transcripts with or without reference genome
  Typical length
  Short (microRNAs, …)
  Find differentially expressed transcripts
Other
  Methyl-seq
  ChIP-seq
6
Illumina Sequencing
Sequencing by Synthesis
DNA Sample -> Construct Library -> Cluster Generation in Flow Cell
Batching effect per lane, per flow cell. Design your experiment so that the samples of any one sample group are not all run in the same lane.
200+ million reads per lane (>100 bp reads)
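One way to follow the design advice above is to spread each sample group across lanes in round-robin order. This is a minimal sketch. The function name, the sample identifiers, and the assumption that several barcoded samples can share a lane are illustrative and not taken from the slides.

```python
from collections import defaultdict


def assign_lanes(samples, n_lanes):
    """Distribute (sample_id, group) pairs over lanes so groups are spread out.

    Samples are dealt to lanes in round-robin order, one group after another,
    so no lane ends up holding a single group when group sizes allow it.
    """
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    lanes = {lane: [] for lane in range(1, n_lanes + 1)}
    slot = 0
    for group in sorted(by_group):
        for sample_id in by_group[group]:
            lanes[slot % n_lanes + 1].append(sample_id)
            slot += 1
    return lanes


if __name__ == "__main__":
    # Hypothetical experiment: 4 controls and 4 treated samples on 2 lanes.
    samples = [(f"c{i}", "control") for i in range(1, 5)]
    samples += [(f"t{i}", "treated") for i in range(1, 5)]
    for lane, ids in assign_lanes(samples, 2).items():
        print(lane, ids)
```

With this layout a lane effect affects both groups equally instead of being confounded with the group comparison.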
7
Types of Sequencing Libraries
Single-End Reads - 5’ or 3’ (random)
Paired-End Reads - 5’ and 3’ (insert size in bp)
Mate-Pair Reads - 5’ and 3’ (2-5 kbp)
Analysis software takes parameters of how the library was constructed into account.
8
dUTP replaces dTTP in second strand synthesis
Taken from GIGA Newsletter 13 – Université de Liège
9
What Does the Data Look Like? FASTQ File Format
Sequence
Quality (ASCII character for each base)
> 200 million reads in one lane
Files are so big that they are broken up into 40 million reads per file
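To make the four-line record layout concrete, here is a minimal sketch of a FASTQ reader that also decodes the quality string. It assumes the Phred+33 encoding used by current Illumina pipelines (older files may use Phred+64) and unwrapped four-line records. The example read is made up.

```python
def parse_fastq(text, offset=33):
    """Yield (read_id, sequence, qualities) from FASTQ text.

    Assumes each record is exactly four lines: '@' header, sequence,
    '+' separator, and one quality character per base.
    """
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of 4 lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Each ASCII character encodes one Phred score: Q = ord(char) - offset.
        yield header[1:], seq, [ord(c) - offset for c in qual]


if __name__ == "__main__":
    record = "@read1\nACGT\n+\nII5!\n"  # hypothetical read
    for read_id, seq, quals in parse_fastq(record):
        print(read_id, seq, quals)
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1% chance that the base call is wrong.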
10
Example Analysis Workflow
Paired-End FASTQ Files: FASTQ (_R1.txt), FASTQ (_R2.txt)
-> FastQC (Diagnostics)
-> Trim Reads (if needed)
-> Align Reads to Genome
-> SAM File
-> BAM File
11
Sequence Composition Diagnostics
Unbiased Reads
Biased Reads: First Position Nearly Always “T”
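The per-position composition plot behind this diagnostic can be approximated with a few lines of code. This is only a sketch of the idea, not how FastQC itself is implemented, and the reads are made up to mimic the "first position nearly always T" case.

```python
from collections import Counter


def base_composition(reads):
    """Return, for each read position, the fraction of each base observed there."""
    if not reads:
        return []
    length = max(len(r) for r in reads)
    composition = []
    for pos in range(length):
        counts = Counter(r[pos].upper() for r in reads if pos < len(r))
        total = sum(counts.values())
        composition.append(
            {base: counts[base] / total for base in "ACGTN" if counts[base]}
        )
    return composition


if __name__ == "__main__":
    # Hypothetical biased reads: every read starts with T.
    reads = ["TACG", "TCGA", "TGAC", "TTTT"]
    for pos, fractions in enumerate(base_composition(reads), start=1):
        print(pos, fractions)
```

In an unbiased library each position should show roughly the same base fractions. A position dominated by one base points to a library construction or adapter artifact.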
12
GC Bias in First ~15 bp Due to Random Hexamer Priming
13
Trim Sequences Prior To Analysis
Make sure sequencing adapters are removed
Trim ends of sequence based on quality scores
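The two trimming steps can be sketched as follows. This is a simplified illustration, not the algorithm used by the FastX Toolkit or Trimmomatic: adapter matching is exact, and quality trimming simply drops trailing bases below a threshold. The adapter string, the threshold of Q20, and the example read are assumptions.

```python
def trim_adapter(seq, quals, adapter, min_overlap=3):
    """Remove a 3' adapter: a full match anywhere, or a partial match at the end."""
    idx = seq.find(adapter)
    if idx == -1:
        # Look for the longest adapter prefix that ends the read.
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                idx = len(seq) - k
                break
    if idx == -1:
        return seq, quals
    return seq[:idx], quals[:idx]


def trim_low_quality_tail(seq, quals, min_q=20):
    """Drop bases from the 3' end until one with quality >= min_q is reached."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"          # example adapter prefix (assumption)
    seq = "ACGTACGT" + "AGATCGG"       # hypothetical read ending in partial adapter
    quals = [40, 40, 40, 40, 40, 30, 10, 5] + [40] * 7
    seq, quals = trim_adapter(seq, quals, adapter)
    seq, quals = trim_low_quality_tail(seq, quals)
    print(seq, quals)
```

Adapter removal is done first here so that high-quality adapter bases at the end of a read do not shield low-quality bases from the quality trim.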
14
FastX Toolkit – Hannon Lab at CSHL
Trimmomatic
15
Example Analysis Workflow
Paired-End FASTQ Files: FASTQ (_R1.txt), FASTQ (_R2.txt)
-> FastQC (Diagnostics)
-> Trim Reads (if needed)
-> Align Reads to Genome
-> SAM File
-> BAM File
16
Sequence Alignment/Map (SAM) Format
Common file format to store:
- Reads
- Quality of each base
- How reads align to a reference sequence
Generated by most next gen analysis software
samtools software package
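As an illustration of what a SAM alignment line holds, the sketch below splits one tab-separated record into the eleven mandatory fields defined by the SAM specification and decodes three of the FLAG bits. The example record is made up, and in practice a library such as pysam or the samtools command line would normally be used instead.

```python
SAM_FIELDS = (
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
)
INT_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line):
    """Parse one SAM alignment line (not a '@' header line) into a dict."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 fields")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    record["TAGS"] = parts[11:]                      # optional TAG:TYPE:VALUE fields
    record["is_paired"] = bool(record["FLAG"] & 0x1)
    record["is_unmapped"] = bool(record["FLAG"] & 0x4)
    record["is_reverse"] = bool(record["FLAG"] & 0x10)
    return record


if __name__ == "__main__":
    # Hypothetical alignment of an 8 bp read to chr1 at position 272.
    line = "read1\t99\tchr1\t272\t60\t8M\t=\t400\t136\tACGTACGT\tIIIIIIII\tNM:i:0"
    print(parse_sam_line(line))
```

The SEQ and QUAL columns carry the read and its per-base qualities, while RNAME, POS, and CIGAR describe how the read aligns to the reference.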
17
samtools Used to Manipulate SAM Files
BAM File -> PileUp File -> Call Variants …
Pileup output file
chr1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
chr1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
chr1 274 T 23 ,.$....,,.,.,...,,,., <7;<;<<<<<<<<<=<;<;<<6
chr1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
chr1 276 G 22 TTTTTTTTTTTTTTTTTTTTTTT 33;+<<7=7<<7<&<<1;<<6<
chr1 277 T ,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
chr1 278 G ,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
chr1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
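The read-bases column of a pileup line can be tallied with a short parser. In that column `.` and `,` mean a match to the reference on the forward and reverse strand, a letter is a mismatch, `^` plus one character marks a read start with its mapping quality, `$` marks a read end, and `+n`/`-n` introduce indels of length n. The sketch below is a simplified reading of that encoding under those assumptions, not a variant caller.

```python
import re
from collections import Counter


def count_pileup_bases(ref_base, bases):
    """Count the bases observed at one position from a pileup read-bases string."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2                      # skip '^' and the mapping-quality character
        elif c == "$":
            i += 1                      # end-of-read marker carries no base
        elif c in "+-":
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                raise ValueError("indel marker without a length")
            i += 1 + len(m.group()) + int(m.group())   # skip the inserted/deleted bases
        elif c in ".,":
            counts[ref_base.upper()] += 1
            i += 1
        elif c == "*":
            counts["*"] += 1            # placeholder for a deleted base
            i += 1
        elif c.isalpha():
            counts[c.upper()] += 1      # mismatch on either strand
            i += 1
        else:
            i += 1                      # ignore symbols this sketch does not model
    return counts


if __name__ == "__main__":
    # Read-bases column of the chr1:273 row shown above (reference base T).
    print(count_pileup_bases("T", ",.....,,.,.,...,,,.,..A"))
```

A position such as chr1:276, where the reference is G but the reads all show T, is the kind of site a variant caller would report.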
18
Binary Alignment (BAM) Files
Common file format to store reads and their alignment to a reference sequence
Generated by most next gen analysis software
samtools software package
UCSC Genome Browser and Ensembl can display them as a custom track
IGV from the Broad Institute is very useful
19
UCSC Genome Browser with 1,000 Genomes Project Data
20
Integrative Genomics Viewer (IGV)
21
LookSeq at Sanger Mouse Genomes Project
22
Glo1 CNV Present in Mouse Genomes Data for A/J
Proximal Flank, Chr17: 30.5Mb, Max ~50x coverage, 50kb
Glo1 Locus, Chr17: 30.7Mb, Max >100x coverage, 50kb
Distal Flank, Chr17: 31.2Mb, Max ~50x coverage, 50kb
23
Glo1 CNV Not Present in Mouse Genomes Data for NZO
Proximal Flank, Chr17: 30.5Mb, Max ~25x coverage, 50kb
Glo1 Locus, Chr17: 30.7Mb, Max ~25x coverage, 50kb
Distal Flank, Chr17: 31.2Mb, Max ~25x coverage, 50kb
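The reasoning behind the two Glo1 slides is a comparison of read depth at the locus with read depth in the flanking regions. A rough sketch of that comparison follows. It uses the approximate maximum coverage values quoted on the slides (with >100x entered as 100, so the A/J result is a lower bound), and the function name is an assumption. Real CNV callers work on normalized depth in many windows.

```python
def relative_coverage(locus_depth, flank_depths):
    """Ratio of locus read depth to the mean read depth of the flanking regions."""
    if not flank_depths:
        raise ValueError("at least one flanking depth is required")
    mean_flank = sum(flank_depths) / len(flank_depths)
    if mean_flank == 0:
        raise ValueError("flanking regions have no coverage")
    return locus_depth / mean_flank


if __name__ == "__main__":
    # Approximate values from the slides; A/J locus depth is ">100x".
    print("A/J:", relative_coverage(100, [50, 50]))
    print("NZO:", relative_coverage(25, [25, 25]))
```

A ratio near 1 is consistent with no copy number change, while a ratio of about 2 or more at the locus suggests a duplication relative to the flanks.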
24
Galaxy (http://main.g2.bx.psu.edu)
25
Public Data Repositories
NCBI: SRA Formatted Files -> SRA ToolKit -> FASTQ Files
EBI: FASTQ Files -> Automatically Forward FASTQ Files to Galaxy
26
NCBI BioProject
27
NCBI Gene Expression Omnibus
28
Overall Analysis Workflow
FASTQ Files
Secondary Analysis
  Read Preprocessing & Diagnostics
  Align Reads to Reference
  Analysis of Aligned Reads, e.g., Read counts per gene from RNA-Seq
Tertiary Analysis
  Analysis of Read Counts, e.g., Differentially expressed genes
  Analysis of Gene Lists
    Enrichment
    Pathway and networks
  Analysis of Expression Patterns
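For the "Enrichment" step under Analysis of Gene Lists, one widely used approach is a one-sided hypergeometric test: given a list of differentially expressed genes, how surprising is its overlap with a pathway or annotation set? The slides do not say which test was intended, so the sketch below is an assumption, and the parameter names and numbers are illustrative.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_background: genes tested in total
    n_in_set:     background genes that belong to the pathway/gene set
    n_selected:   genes in the list of interest (e.g. differentially expressed)
    n_overlap:    genes in the list that also belong to the set
    """
    if not (0 <= n_in_set <= n_background and 0 <= n_selected <= n_background):
        raise ValueError("set and list sizes must lie within the background size")
    upper = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, i) * comb(n_background - n_in_set, n_selected - i)
        for i in range(max(n_overlap, 0), upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 genes, a 100-gene pathway,
    # 500 differentially expressed genes, 10 of them in the pathway.
    print(enrichment_p_value(20000, 100, 500, 10))
```

When many gene sets are tested, the resulting p-values need a multiple-testing correction before any set is called enriched.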
29
Push-Button Bioinformatics … Be Careful
30
Third Generation Sequencing