Next Generation Sequencing Lenka Veselovská Laboratory of Developmental Biology and Genomics.

1 Next Generation Sequencing Lenka Veselovská Laboratory of Developmental Biology and Genomics

2 Next Generation Sequencing (NGS) Modern high-throughput DNA sequencing technologies: parallel, rapid. Decreasing: price, time, workflow complexity, error rate. Increasing: data quantity and quality, read length (data storage capacity), repertoire of bioinformatics tools. Wide range of applications. Third Generation Sequencing (single molecule, real time, in situ...)

3 Next Generation Sequencing (NGS) Starting material: - DNA (DNA-seq) - RNA (RNA-seq) - DNA fragments bound to a selected protein, used to analyse the sequences of the DNA-binding sites of a protein of interest or the localisation of histone modifications (ChIP-seq)

4 DNA sequencing De novo genome sequencing and assembly: chromosomal, eukaryotic, viral, prokaryotic. 2000: draft human genome sequence. 2003: completed (kind of); equivalent to 3300 books of 1000 pages with 1000 bp per page. Ensembl genomes: - 69 higher animals + other model animals - 55 insects and lower metazoans - 39 plants - 563 fungi - Over 200 protist species and subspecies - Over 20 000 bacteria species and subspecies + regular updates

5 DNA sequencing Sequencing of microbial diversity: Sorcerer II expedition. Microbial communities in oceans, deserts, hot springs, inside bodies

6 DNA sequencing Sequencing of extinct species Neanderthal toe bone

7 DNA sequencing Polymorphisms and associations with diseases

8 Genomics Area of genetics that concerns the sequencing and analysis of an organism’s genetic information. DNA sequencing + bioinformatics => sequence, assemble and analyze the function and structure of genomes (the complete set of DNA within a single cell of an organism). Bacterial genome. Human genome.

9 Where can I find genome sequences? Websites: "genome browsers" (include annotations of genes): Ensembl genome browser, UCSC genome browser, NCBI genome browser

10 Where can I find genome sequences?

11 RNA sequencing = sequencing of RNAs present in the cell. mRNA-seq: polyA selection. Whole RNA-seq with ribosomal depletion. Sequencing of other RNA species. Nuclear RNA-seq. Total RNA: coding RNA (4 % of total): pre-mRNA (hnRNA), mRNA; functional RNA (96 % of total): pre-rRNA, rRNA, pre-tRNA, tRNA, snRNA, snoRNA, miRNA, siRNA. Figure labels: all organisms; eukaryotes only.

12 Transcriptomics Study of the transcriptome: the complete set of RNA transcripts produced from the genome under specific circumstances, at a particular place and time. Methods: microarrays, RNA-seq. Comparison of gene expression under different conditions (before/after treatment, during development, cancer vs normal cells, ...). Transcriptome assembly: discovery of novel genes, non-coding RNAs and novel splicing variants of known genes; an alternative to genome sequencing and assembly for a species with an unknown genome, when we are interested only in expressed genes.

13 NGS workflow Sample -> Library preparation -> Sequencing -> Bioinformatics. What you get out is never better than what you put in!!

14 Sample Use a sample of as high quality as possible: freshly harvested cells/tissues, or cells/tissues stored at -80°C. Final data from old and degraded samples (e.g. Neanderthal bone, RNAs from dead soft tissues or cells) or from formalin/paraffin-fixed samples (fixed to solidify tissues so that sections can be made) are never as nice as data from fresh samples. DNA/RNA extraction -> processing. Quality of the starting total RNA: RNA integrity number (RIN). Low RIN: unequal read distribution along 5’ and 3’ ends => bad sequencing results. [Figure: number of reads, RIN < 7 vs RIN > 9; Agilent Bioanalyzer profiles]

15 RNA Quality Quality of the starting total RNA: RNA integrity number (RIN). Low RIN: unequal read distribution along 5’ and 3’ ends => bad sequencing results. [Figure: number of reads, RIN < 7 vs RIN > 9; 454 reads distribution; Agilent Bioanalyzer traces]

16 Sample DNA Sonication (using the energy of sound): usually results in fragments of ~700 bp. If a suitable fragment size is not achieved after shearing, gel size-selection can be used.

17 RNA Sample Total RNA -> (polyA mRNA selection or rRNA depletion) -> mRNA -> (temperature-based fragmentation) -> fragmented mRNA -> (reverse transcription) -> cDNA

18 RNA Sample polyA selection rRNA depletion

19 RNA Sample Total RNA -> (polyA mRNA selection or rRNA depletion) -> mRNA -> (temperature-based fragmentation) -> fragmented mRNA -> (reverse transcription) -> cDNA

20 Library preparation PCR Similar for DNA and RNA (=cDNA) sequencing

21 Library preparation Suboptimal size range results in suboptimal sequencing results (Illumina sequencing – ideal size range of fragments is 300-700 bp)

22 Sequencing Principles Sequencing by synthesis: Sanger/dideoxy chain termination (Life Technologies, Applied Biosystems); pyrosequencing (Roche/454); reversible terminator (Illumina); Ion Torrent (Life Technologies); Zero Mode Waveguide (Pacific Biosciences), 3rd generation sequencing. Sequencing by oligo ligation detection: SOLiD (Applied Biosystems). Direct reading of DNA sequence: nanopore sequencing, 3rd generation sequencing; electron microscope, 3rd generation sequencing.

23 Current Sequencing Platforms Roche/454 (GS FLX+/GS Junior); Illumina Genome Analyzer (HiSeq/MiSeq/NextSeq); Life Technologies (3500 Genetic Analyzer, Ion Torrent Proton/PGM); Pacific Biosciences (PACBIO RSII); Applied Biosystems (SOLiD, 3730xl DNA Analyzer). First developed in 1986

24 Sequencing Matrices Sanger, 96-well, 8 capillaries: 96 x 600 bp / 24 h, 1400 €. Pyrosequencing, 2 regions: 1,000,000 x 600 bp / 20 h, 5500 €. Reversible terminator, MiSeq: 10,000,000 x 250 bp / 40 h, 1150 €.
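The three runs on this slide are easier to compare once they are converted to total bases per run and price per megabase. The sketch below does only that arithmetic on the slide's figures; it assumes that each quoted price is the cost of one complete run, which the slide does not state, and the function and dictionary names are illustrative.

```python
# Figures copied from the slide:
# (reads per run, read length in bp, run time in hours, price in EUR).
# Assumption: the price covers one complete run.
RUNS = {
    "Sanger, 96-well": (96, 600, 24, 1400),
    "Pyrosequencing, 2 regions": (1_000_000, 600, 20, 5500),
    "Reversible terminator, MiSeq": (10_000_000, 250, 40, 1150),
}


def run_summary(reads, read_len_bp, hours, price_eur):
    """Return total yield, yield per hour and price per megabase for one run."""
    total_bp = reads * read_len_bp
    megabases = total_bp / 1_000_000
    return {
        "total_bp": total_bp,
        "bp_per_hour": total_bp / hours,
        "eur_per_megabase": price_eur / megabases,
    }


for name, figures in RUNS.items():
    s = run_summary(*figures)
    print(f"{name}: {s['total_bp']:,} bp per run, "
          f"{s['eur_per_megabase']:,.2f} EUR per Mb")
```

Under that assumption, the price per megabase drops by several orders of magnitude from the Sanger run to the MiSeq run, even though the MiSeq reads are the shortest of the three.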

25 Sanger (3500 GA, 3730xl DNA Analyzer) Sequencing by synthesis Long individual reads (one 500-1000bp long sequence per clone), not practical for whole genomes

26 Sanger (3500 GA, 3730xl DNA Analyzer)
Reaction mix: dGTP, dTTP, dATP, dCTP, ddGTP; target DNA, oligonucleotide primer & DNApol
Template: 3’-GGACCCTATGACATGATCGATGAATTGGAAACTAGCTAGATCGGCAC-5’
Primer: 5’-CTGGGATACTGTACTAGC-3’
Extension products (primer + newly synthesised bases):
5’-CTGGGATACTGTACTAGC TACTTAACCTTTG
5’-CTGGGATACTGTACTAGC TACTTAACCTTTGATCG
5’-CTGGGATACTGTACTAGC TACTTAACCTTTGATCGATCTAG
5’-CTGGGATACTGTACTAGC TACTTAACCTTTGATCGATCTAGCCG
Generation of a series of differently sized fragments synthesised from the target DNA molecule that all end with radio-labelled dideoxy-G (specified by C in the target DNA). ddGTP is radioactively labelled.
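The ddGTP reaction on this slide can be reproduced in a few lines: extend the primer along the template and record every product that stops at a G. This is a minimal sketch using the slide's template and primer; the function name is illustrative, and the model ignores reaction kinetics (it simply lists every possible termination point).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Sequences copied from the slide.
TEMPLATE_3TO5 = "GGACCCTATGACATGATCGATGAATTGGAAACTAGCTAGATCGGCAC"
PRIMER_5TO3 = "CTGGGATACTGTACTAGC"


def ddg_fragments(template_3to5, primer_5to3):
    """List the newly synthesised stretches that end in a dideoxy-G.

    Complementing the template as written 3'->5' gives the new strand
    written 5'->3'. Synthesis starts right after the primer, and a ddGTP
    can terminate the chain wherever the new strand needs a G.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    start = new_strand.find(primer_5to3)
    if start < 0:
        raise ValueError("primer does not anneal to the template")
    extension = new_strand[start + len(primer_5to3):]
    return [extension[:i + 1] for i, base in enumerate(extension) if base == "G"]


for fragment in ddg_fragments(TEMPLATE_3TO5, PRIMER_5TO3):
    print(f"5'-{PRIMER_5TO3} {fragment}  ({len(fragment)} nt added)")
```

The first four products match the fragments drawn on the slide. The sketch also reports a fifth, longer product, because the template shown carries one more C near its 5’ end.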

27 Reversible Terminator (HiSeq, MiSeq, NextSeq) Sequencing by synthesis. The cluster contains copies of both strands of the original DNA (i.e. the strands are complementary to each other). Therefore, prior to cluster sequencing, one strand is removed by cleaving with a restriction enzyme that recognises a sequence within either the pink or the blue adapter.

28 Reversible Terminator (HiSeq, MiSeq, NextSeq) Sequencing DNA clusters one base at a time. A mix of sequencing primers (complementary to one of the adapter sequences), DNA polymerase and differentially fluorescently labelled reversible chain terminator dNTPs (A, C, T and G) is added to the flow cell. Depending on the first nucleotide in the cluster, a specific fluorescent reversible chain terminator dNTP is incorporated, leading to a stop in DNA synthesis! After unincorporated nucleotides are washed away, a laser excites the flow cell and the instrument detects which of the four fluorescent chain terminator dNTPs was incorporated in each cluster, i.e. it decodes the first sequenced base. Once an image recording the first nucleotide incorporated in each cluster has been taken, both the fluorescent dyes and the blocking group that prevents extension of the DNA are removed (hence ‘reversible’ chain terminator dNTPs) and the cycle is repeated.
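The cycle described on this slide (incorporate one blocked base, image, unblock, repeat) can be mimicked with a toy model in which every cluster gains exactly one base per cycle. This is only a sketch of the logic: the cluster names, template strings and function name are made up for illustration, and fluorescence, phasing and base-calling errors are not modelled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_clusters(templates, cycles):
    """Simulate sequencing by synthesis with reversible terminators.

    templates maps a (hypothetical) cluster id to its template strand,
    written in the order the polymerase reads it (3'->5'), starting
    right after the sequencing primer site.
    """
    reads = {cluster_id: "" for cluster_id in templates}
    for cycle in range(cycles):
        # Incorporation: each cluster adds the one complementary,
        # blocked nucleotide, so synthesis stops after a single base.
        image = {}
        for cluster_id, template in templates.items():
            if cycle < len(template):
                image[cluster_id] = COMPLEMENT[template[cycle]]
        # Imaging: the "colour" seen at each cluster gives one base call.
        for cluster_id, base in image.items():
            reads[cluster_id] += base
        # Dye and blocking group are then removed; in this model that
        # simply means the loop moves on to the next template position.
    return reads


print(sequence_clusters({"cluster_1": "GGAC", "cluster_2": "TTAG"}, cycles=3))
```

Because all clusters advance in lockstep, the number of cycles sets the read length, which is why read length on these instruments is chosen when the run is set up.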

29 http://www.illumina.com/technology/next-generation-sequencing.html (video on the right side of the website) http://www.youtube.com/watch?v=77r5p8IBwJk

30 Pyrosequencing (GS FLX, GS Junior) Sequencing by synthesis

31 Pyrosequencing (GS FLX, GS Junior) Sequencing by synthesis

32 Ion torrent sequencing Sequencing by synthesis. In each step, the chip is flooded with a single type of nucleotide. If the nucleotide matches the template sequence, it is incorporated, H+ is released and the pH changes. If it does not match, the pH does not change. The change in pH is measured.
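The flow-by-flow logic on this slide can be sketched as a loop over nucleotide flows, recording how many bases are incorporated in each flow. Two assumptions are labelled in the code: the flow order "TACG" is an arbitrary choice for illustration, and the signal is taken to be exactly the number of incorporated bases, whereas a real instrument measures a noisy pH change.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flowgram(template_3to5, flow_order="TACG", max_flows=100):
    """Return (nucleotide, bases incorporated) for each flow.

    flow_order is an assumed, illustrative order of nucleotide flows.
    A count of 0 means "no pH change"; a count above 1 means the flowed
    nucleotide filled a homopolymer run in a single step.
    """
    position = 0
    signals = []
    for flow in range(max_flows):
        if position >= len(template_3to5):
            break
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        while (position < len(template_3to5)
               and COMPLEMENT[template_3to5[position]] == nucleotide):
            count += 1
            position += 1
        signals.append((nucleotide, count))
    return signals


def call_bases(signals):
    """Rebuild the synthesised strand from idealised flow signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = flowgram("AACG")
print(signals)
print(call_bases(signals))
```

A run of identical bases is incorporated within one flow and read from the size of a single signal, so telling, say, 5 from 6 bases is difficult; this is where the homopolymer errors listed for this method come from.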

33 Oligo Ligation Detection (SOLiD) Sequencing by ligation

34 Zero Mode Waveguide (Single molecule real time seq) Sequencing by synthesis, also 3rd generation sequencing

35 Nanopore sequencing (direct reading) 3rd generation sequencing

36 Comparison of next-generation sequencing methods

Single-molecule real-time sequencing (Pacific Biosciences)
- Read length: 10,000 bp to 15,000 bp avg (14,000 bp N50); maximum read length >40,000 bases [61][62][63]
- Accuracy (single read, not consensus): 87% single-read accuracy [64]
- Reads per run: 50,000 per SMRT cell, or 500–1000 megabases [65][66]
- Time per run: 30 minutes to 4 hours [67]
- Cost per 1 million bases (in US$): $0.13–$0.60
- Advantages: Longest read length. Fast. Detects 4mC, 5mC, 6mA. [68]
- Disadvantages: Moderate throughput. Equipment can be very expensive. Low accuracy.

Ion semiconductor (Ion Torrent sequencing)
- Read length: up to 400 bp
- Accuracy (single read, not consensus): 98%
- Reads per run: up to 80 million
- Time per run: 2 hours
- Cost per 1 million bases (in US$): $1
- Advantages: Less expensive equipment. Fast.
- Disadvantages: Homopolymer errors.

Pyrosequencing (454)
- Read length: 700 bp
- Accuracy (single read, not consensus): 99.9%
- Reads per run: 1 million
- Time per run: 24 hours
- Cost per 1 million bases (in US$): $10
- Advantages: Long read size. Fast.
- Disadvantages: Runs are expensive. Homopolymer errors.

Sequencing by synthesis (Illumina)
- Read length: 50 to 300 bp
- Accuracy (single read, not consensus): 99.9% (Phred30)
- Reads per run: up to 6 billion (TruSeq paired-end)
- Time per run: 1 to 11 days, depending upon sequencer and specified read length [69]
- Cost per 1 million bases (in US$): $0.05 to $0.15
- Advantages: Potential for high sequence yield, depending upon sequencer model and desired application.
- Disadvantages: Equipment can be very expensive. Requires high concentrations of DNA.

Sequencing by ligation (SOLiD sequencing)
- Read length: 50+35 or 50+50 bp
- Accuracy (single read, not consensus): 99.9%
- Reads per run: 1.2 to 1.4 billion
- Time per run: 1 to 2 weeks
- Cost per 1 million bases (in US$): $0.13
- Advantages: Low cost per base.
- Disadvantages: Slower than other methods. Has issues sequencing palindromic sequences. [70]

Chain termination (Sanger sequencing)
- Read length: 400 to 900 bp
- Accuracy (single read, not consensus): 99.9%
- Reads per run: N/A
- Time per run: 20 minutes to 3 hours
- Cost per 1 million bases (in US$): $2400
- Advantages: Long individual reads. Useful for many applications.
- Disadvantages: More expensive and impractical for larger sequencing projects. This method also requires the time consuming step of plasmid cloning or PCR.
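The comparison quotes Illumina accuracy as "99.9% (Phred30)". The Phred scale is a logarithmic way of writing the per-base error probability, Q = -10 * log10(P_error), so the accuracies listed for the different methods can be put on the same scale. A small sketch of the conversion in both directions (the function names are illustrative):

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q):
    """Per-base accuracy corresponding to Phred quality q."""
    return 1 - phred_to_error_probability(q)


def accuracy_to_phred(accuracy):
    """Phred quality corresponding to a per-base accuracy below 1."""
    return -10 * math.log10(1 - accuracy)


print(f"Q30 -> accuracy {phred_to_accuracy(30):.3%}")
for label, accuracy in [("87% single-read accuracy", 0.87),
                        ("98% accuracy", 0.98),
                        ("99.9% accuracy", 0.999)]:
    print(f"{label} -> Q{accuracy_to_phred(accuracy):.1f}")
```

Each step of 10 on the Phred scale is a tenfold change in error rate, so the gap between 87% and 99.9% single-read accuracy is larger than the percentages suggest.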

37 Next generation sequencing vocabulary
Base-pair - basic building block of double-stranded DNA, unit of DNA segment length (bp)
Read - continuous sequence produced by sequencer
Coverage - the number of short reads that overlap each other within a specific genomic region (how many times the particular base or region is read)
Consensus sequence - idealised sequence in which each position represents the base most often found when many sequences are compared
Contig - set of overlapping segments (reads) of DNA sequences forming continuous consensus sequence
Assembly - aligning and merging fragments of DNA sequence (reads, contigs) in order to reconstruct the original sequence
Scaffold - set of linked non-contiguous series of genomic sequences, consisting of contigs separated by gaps of roughly known length
Single vs paired-end sequencing
Directional vs undirectional libraries/reads
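Two of the terms above, coverage and consensus sequence, are easy to make concrete once reads have been placed on a reference. The sketch below assumes the reads are already aligned without gaps and are given as hypothetical (start position, sequence) pairs; it counts the bases seen at every position, reports the depth, and picks the most frequent base.

```python
from collections import Counter


def pileup(aligned_reads, reference_length):
    """Count the bases observed at every reference position.

    aligned_reads is a list of (start, sequence) pairs, where start is
    the 0-based reference position of the first base of the read.
    """
    columns = [Counter() for _ in range(reference_length)]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < reference_length:
                columns[position][base] += 1
    return columns


def coverage(columns):
    """How many reads cover each position."""
    return [sum(column.values()) for column in columns]


def consensus(columns, uncovered="N"):
    """Most frequent base per position; 'N' where no read covers it.

    Ties go to the base that was seen first at that position.
    """
    return "".join(
        column.most_common(1)[0][0] if column else uncovered
        for column in columns
    )


reads = [(0, "ACGT"), (2, "GTTA"), (2, "GATA")]
columns = pileup(reads, reference_length=7)
print(coverage(columns))
print(consensus(columns))
```

In the example, position 3 is covered by three reads that disagree (T, T, A), and the consensus keeps the majority base, which is how higher coverage compensates for errors in individual reads.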

38 Bioinformatics Analysis of direct sequencing output (e.g. image analysis) to obtain read sequences. De novo genome (or transcriptome) assembly. Mapping of reads to the known genome sequence (RNA-seq data, ChIP-seq, DNA-seq when looking for polymorphisms etc.)

39 Genome assembly
Read - continuous sequence produced by sequencer
Coverage - the number of short reads that overlap each other within a specific genomic region (how many times the particular base or region is read)
Contig - set of overlapping segments (reads) of DNA sequences forming continuous consensus sequence
Scaffold - set of linked non-contiguous series of genomic sequences, consisting of contigs separated by gaps of roughly known length
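How overlapping reads turn into contigs can be illustrated with a toy greedy assembler: repeatedly merge the two sequences with the longest suffix-prefix overlap until no overlap of a minimum length remains. This is a deliberately simplified sketch, not how production assemblers work; it assumes error-free reads from one strand, and the function names and the minimum overlap of 3 bases are arbitrary choices for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads into contigs, best overlap first."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


reads = ["ATGGCGT", "GCGTACC", "ACCTTGA", "GGGGGGG"]
print(greedy_assemble(reads))
```

The first three reads chain into a single contig, while the fourth shares no overlap with the others and stays separate. Linking such unconnected contigs into a scaffold needs extra information, for example read pairs of known spacing.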

40 Genome assembly

41 Paired-end vs mate-pair Paired-end: sequencing from both fragment ends (< 1 kb). Mate-pair: longer (3-20 kb) molecules circularized via an internal adapter.

42 RNA-seq analysis

43 Alternative splicing Differences in gene expression

44 Take-home message NGS: high-throughput, parallel, rapid DNA sequencing. Third generation: single molecule, real time, reduced chemistry. Basic NGS principles: synthesis, ligation. Basic workflow: sample - fragmentation - library prep - seq run - data analysis. Applications: de novo DNA-seq, RNA-seq. Choose the right application and prepare the sample appropriately. Basic data analysis pipeline: image acquisition, quality metrics - filtering - contig building - annotation.

