Next Generation DNA Sequencing


1 Next Generation DNA Sequencing
Illumina HiSeq X: 1.8 Tbp (3 billion reads) in ~3 days (as of 11/6/2014)

2 Whole Genome Shotgun Sequencing
Randomly Fragment Genomic DNA -> Sequence Fragments -> Genome Assembly
Fragment 1: ...ATCCGTAAATGGGCTGATACTACTAATGC
Fragment 2: TGGGCTGATACTACTAATGCCAAACTGTACTAGTCCTG...
Contiguous Sequence (Contig): ...ATCCGTAAATGGGCTGATACTACTAATGCCAAACTGTACTAGTCCTG...
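The assembly step on this slide can be sketched as a suffix-prefix overlap merge of the two example fragments. This is only a minimal illustration, not how production assemblers work; the names `longest_overlap`, `merge_fragments`, and the `min_len` parameter are hypothetical.

```python
def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_fragments(left, right, min_len=3):
    """Merge two fragments into one contig, or return None if they do not overlap."""
    k = longest_overlap(left, right, min_len)
    if k == 0:
        return None
    return left + right[k:]


# The two fragments shown on the slide (leading/trailing "..." dropped).
frag1 = "ATCCGTAAATGGGCTGATACTACTAATGC"
frag2 = "TGGGCTGATACTACTAATGCCAAACTGTACTAGTCCTG"

contig = merge_fragments(frag1, frag2)
print(longest_overlap(frag1, frag2), contig)
```

Real assemblers also have to cope with sequencing errors, repeats, and reverse-complement reads, which this sketch ignores.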

3 RNA Sequencing (RNA-Seq)
cDNA made from RNA -> Sequence cDNA Fragments (figure: Garber et al., Nat Methods (2011))
- Characterize all RNA in sample
- Gene expression level proportional to number of reads
- Detect alternatively spliced transcripts
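Because expression level is taken to be proportional to the number of reads, raw counts are usually scaled by sequencing depth before samples are compared. A minimal sketch using counts per million (CPM) follows; the gene names and counts are made up, and the function name `counts_per_million` is hypothetical. CPM does not correct for transcript length.

```python
def counts_per_million(read_counts):
    """Scale raw per-gene read counts to counts per million mapped reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {gene: count * 1_000_000 / total for gene, count in read_counts.items()}


# Made-up counts for illustration only.
example_counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm = counts_per_million(example_counts)
for gene, value in cpm.items():
    print(gene, value)
```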

4 Typical Next Gen Experiments
Genome sequencing
- Novel genomes
- Resequencing
Transcriptome sequencing (RNA-seq)
- Characterize transcripts with or without reference genome
  - Typical length
  - Short (microRNAs, …)
- Find differentially expressed transcripts
Other
- Methyl-seq
- ChIP-seq

5

6 Illumina Sequencing
Sequencing by Synthesis
DNA Sample -> Construct Library -> Cluster Generation in Flow Cell
- Batch effects occur per lane and per flow cell. Design your experiment so that the samples of one sample group are not all run in the same lane.
- 200+ million reads per lane (>100 bp reads)
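One way to follow the batch-effect advice above is to spread each sample group across lanes rather than filling a lane with a single group. The sketch below assumes the samples are barcoded so that several can share a lane; the function name `assign_lanes`, the group names, and the sample IDs are all hypothetical.

```python
from itertools import cycle


def assign_lanes(groups, n_lanes):
    """Distribute samples round-robin over lanes so each group is spread out."""
    lanes = {lane: [] for lane in range(1, n_lanes + 1)}
    lane_ids = cycle(range(1, n_lanes + 1))
    for group, samples in groups.items():
        for sample in samples:
            lanes[next(lane_ids)].append((group, sample))
    return lanes


# Hypothetical two-group experiment with four samples per group.
design = assign_lanes(
    {"control": ["c1", "c2", "c3", "c4"], "treated": ["t1", "t2", "t3", "t4"]},
    n_lanes=2,
)
for lane, members in design.items():
    print(lane, members)
```

A randomized assignment within each group would serve the same purpose; round-robin is used here only because it is easy to check by eye.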

7 Types of Sequencing Libraries
- Single-End Reads: 5’ or 3’ end (random)
- Paired-End Reads: 5’ and 3’ ends (insert size in bp)
- Mate-Pair Reads: 5’ and 3’ ends (2-5 kbp)
Analysis software takes the parameters of how the library was constructed into account.

8 dUTP replaces dTTP in second strand synthesis
Taken from GIGA Newsletter 13 – Université de Liège

9 What Does the Data Look Like? FASTQ File Format
Each read has a Sequence and a Quality string (one ASCII character for each base)
- More than 200 million reads in one lane
- Files are so big that they are broken up into 40 million reads per file
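To make the "ASCII character for each base" encoding concrete, here is a minimal FASTQ reader that converts each quality character to a Phred score. It assumes four-line records and the Phred+33 offset (older Illumina pipelines used a different offset); the record shown and the name `parse_fastq` are made up for illustration.

```python
def parse_fastq(lines, offset=33):
    """Yield (read_id, sequence, phred_scores) from four-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines if line.strip()]
    if len(records) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per read")
    for i in range(0, len(records), 4):
        header, seq, plus, qual = records[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record at line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Each quality character encodes one Phred score as ASCII code minus offset.
        yield header[1:], seq, [ord(ch) - offset for ch in qual]


# Made-up record for illustration only.
example = ["@read1\n", "GATTACA\n", "+\n", "IIII#5!\n"]
reads = list(parse_fastq(example))
print(reads)
```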

10 Example Analysis Workflow
Paired-End FASTQ Files: FASTQ (_R1.txt), FASTQ (_R2.txt)
-> FastQC (Diagnostics) -> Trim Reads (if needed) -> Align Reads to Genome -> SAM File -> BAM File

11 Sequence Composition Diagnostics
Unbiased Reads vs. Biased Reads (First Position Nearly Always “T”)

12 GC Bias in First ~15 bp Due to Random Hexamer Priming

13 Trim Sequences Prior To Analysis
- Make sure sequencing adapters are removed
- Trim ends of sequence based on quality scores
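Quality-based end trimming can be sketched as follows: drop bases from the 3' end until one meets a minimum Phred score. This is a deliberately simple rule for illustration, not the algorithm used by any particular trimming tool; the name `trim_3prime` and the default threshold of 20 are assumptions.

```python
def trim_3prime(seq, quals, min_q=20):
    """Remove trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


# Made-up read: the last base has quality 0 and is trimmed.
trimmed_seq, trimmed_quals = trim_3prime("GATTACA", [40, 40, 40, 40, 2, 20, 0])
print(trimmed_seq, trimmed_quals)
```

Note that the low-quality base in the middle of the read is kept; this rule only looks at the 3' end.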

14 FastX Toolkit – Hannon Lab at CSHL
Trimmomatic

15 Example Analysis Workflow
Paired-End FASTQ Files: FASTQ (_R1.txt), FASTQ (_R2.txt)
-> FastQC (Diagnostics) -> Trim Reads (if needed) -> Align Reads to Genome -> SAM File -> BAM File

16 Sequence Alignment/Map (SAM) Format
Common file format to store:
- Reads
- Quality of each base
- How reads align to a reference sequence
Generated by most next gen analysis software
samtools software package
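A SAM alignment line is tab-separated with eleven mandatory fields, which is how it carries the read, its base qualities, and its alignment together. The sketch below splits one line into those fields and decodes two bits of the FLAG field; the example line is made up and the function name `parse_sam_line` is hypothetical.

```python
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]


def parse_sam_line(line):
    """Parse the eleven mandatory fields of one SAM alignment line."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    # FLAG is a bit field: 0x4 = read unmapped, 0x10 = read on reverse strand.
    record["unmapped"] = bool(record["flag"] & 0x4)
    record["reverse_strand"] = bool(record["flag"] & 0x10)
    return record


# Made-up alignment line for illustration only.
example_line = "read1\t16\tchr1\t272\t60\t7M\t*\t0\t0\tGATTACA\tIIIIIII"
rec = parse_sam_line(example_line)
print(rec["rname"], rec["pos"], rec["cigar"], rec["reverse_strand"])
```

In practice samtools or a library built on it would be used for this; the sketch is only meant to show what the format contains.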

17 samtools Used to Manipulate SAM Files
BAM File -> PileUp File -> Call Variants
Pileup output file:
chr1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
chr1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
chr1 274 T 23 ,.$....,,.,.,...,,,., <7;<;<<<<<<<<<=<;<;<<6
chr1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
chr1 276 G 22 TTTTTTTTTTTTTTTTTTTTTTT 33;+<<7=7<<7<&<<1;<<6<
chr1 277 T ,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
chr1 278 G ,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
chr1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
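The read-bases column of the pileup output above can be summarized with a short parser. In this column `.` and `,` mean a match to the reference on the forward and reverse strand, a letter means a mismatch, `^` followed by one character marks the start of a read, `$` marks the end of a read, and `+n`/`-n` introduce an indel of n bases. The sketch below counts matches and mismatches only; the function name `summarize_pileup_bases` is hypothetical, and this is not a variant caller.

```python
from collections import Counter


def summarize_pileup_bases(bases):
    """Return (match_count, Counter of mismatched bases) for one pileup column."""
    matches = 0
    mismatches = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # skip '^' and the mapping-quality character after it
            continue
        if c == "$":
            i += 1  # end-of-read marker carries no base
            continue
        if c in "+-":
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            indel_len = int(bases[i + 1:j]) if j > i + 1 else 0
            i = j + indel_len  # skip the inserted or deleted bases
            continue
        if c in ".,":
            matches += 1
        elif c.upper() in "ACGTN":
            mismatches[c.upper()] += 1
        i += 1
    return matches, mismatches


# Read-base strings copied from the chr1:272 and chr1:273 rows of the slide.
print(summarize_pileup_bases(",.$.....,,.,.,...,,,.,..^+."))
print(summarize_pileup_bases(",.....,,.,.,...,,,.,..A"))
```

A position such as chr1:276, where every read shows `T` against a reference `G`, is the kind of column a variant caller would report.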

18 Binary Alignment/Map (BAM) Files
- Common file format to store reads and their alignment to a reference sequence
- Generated by most next gen analysis software
- samtools software package
- UCSC Genome Browser and Ensembl can display them as a custom track
- IGV from Broad is very useful

19 UCSC Genome Browser with 1,000 Genomes Project Data

20 Integrative Genomics Viewer (IGV)

21 LookSeq at Sanger Mouse Genomes Project

22 Glo1 CNV Present in Mouse Genomes Data for A/J
- Proximal Flank (Chr17: 30.5Mb; 50kb): Max ~50x coverage
- Glo1 Locus (Chr17: 30.7Mb; 50kb): Max >100x coverage
- Distal Flank (Chr17: 31.2Mb; 50kb): Max ~50x coverage

23 Glo1 CNV Not Present in Mouse Genomes Data for NZO
- Proximal Flank (Chr17: 30.5Mb; 50kb): Max ~25x coverage
- Glo1 Locus (Chr17: 30.7Mb; 50kb): Max ~25x coverage
- Distal Flank (Chr17: 31.2Mb; 50kb): Max ~25x coverage
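The comparison between the two strains comes down to the ratio of coverage at the Glo1 locus to coverage in the flanking regions. The arithmetic can be sketched as below, using the approximate maximum coverage values from these two slides; the function name `copy_ratio` is hypothetical, and 100 stands in for ">100x", so the A/J ratio is a lower bound.

```python
def copy_ratio(locus_coverage, flank_coverages):
    """Ratio of locus coverage to the mean coverage of the flanking regions."""
    baseline = sum(flank_coverages) / len(flank_coverages)
    return locus_coverage / baseline


# Approximate values read off the slides; 100 is a lower bound for A/J.
aj_ratio = copy_ratio(100, [50, 50])
nzo_ratio = copy_ratio(25, [25, 25])
print("A/J:", aj_ratio, "NZO:", nzo_ratio)
```

A ratio near 1 (NZO) is consistent with no copy number change, while a ratio of 2 or more (A/J) is consistent with a duplication of the locus.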

24 Galaxy (http://main.g2.bx.psu.edu)

25 Public Data Repositories
- NCBI: SRA Formatted Files -> SRA ToolKit -> FASTQ Files
- EBI: FASTQ Files -> Automatically Forward FASTQ Files to Galaxy

26 NCBI BioProject

27 NCBI Gene Expression Omnibus

28 Overall Analysis Workflow
FASTQ Files
Secondary Analysis
- Read Preprocessing & Diagnostics
- Align Reads to Reference
- Analysis of Aligned Reads, e.g., read counts per gene from RNA-Seq
Tertiary Analysis
- Analysis of Read Counts, e.g., differentially expressed genes
- Analysis of Gene Lists: enrichment, pathways and networks
- Analysis of Expression Patterns

29 Push-Button Bioinformatics … Be Careful

30 Third Generation Sequencing

