Published by Garey Nichols. Modified over 9 years ago.
1
Intro to Next Generation Sequencing
DM Church, Last Updated: 7 May 2012
2
http://omicsmaps.com/
Nick Loman and James Hadfield
4
Koboldt et al., 2010 (Figure 3)
6
Bench work to build libraries and sequence
Clean up and QA reads
Alignments to Genome or Transcriptome
Analysis of Alignments
7
Koboldt et al., 2010
Sample Contamination
Library chimeras
Sample mix-ups
Tumor-normal switches
Run quality
8
Koboldt et al. (Figure 4A)
10
Chor et al., 2009
11
CCL Bio
12
GCTACGGCATTCAGGCATCAGGCATTAGCAG
GGCATTCAGGGATCAGGCATTAGC->
<-CATGGCATTCAGGGATCAGGCATT
<-GCCATGGCATTCAGGGATCAGGC
CATTCAGGGATCAGGCATTAGCAG->
GGCATTCAGGGATCAGGCATTAGC->
CATTCAGGGATCAGGCATTAGCAG->
GGCATTCAGGGATCAGGCATT->
<-GGATCAGGCATTAGCAG
<-GATCAGGCATTAGCAG
<-GGATCAGGCATTAGCAG
13
High Coverage: qualities may not be needed
14
Low Coverage: qualities are important
15
Custodia-Lora et al., 2003
16
FASTQ Example
FASTQ example from: Cock et al. (2009). Nuc Acids Res 38:1767-1771.
For analysis, it may be necessary to convert to the Sanger form of FASTQ… For example, Illumina stores quality scores ranging from 0-62; Sanger quality scores range from 0-93. Solexa quality scores have to be converted to PHRED quality scores.
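The conversions mentioned on this slide can be sketched in Python. This is a minimal illustration, not a replacement for an established converter. It assumes the encodings described by Cock et al.: Sanger FASTQ stores PHRED scores with an ASCII offset of 33, Illumina 1.3+ stores PHRED scores with an offset of 64, and Solexa FASTQ stores Solexa scores with an offset of 64. The function names are hypothetical.

```python
import math

SANGER_OFFSET = 33    # Sanger FASTQ: PHRED 0-93 stored as ASCII 33-126
ILLUMINA_OFFSET = 64  # Illumina 1.3+ and Solexa FASTQ: scores stored from ASCII 64


def solexa_to_phred(q_solexa):
    """Map a Solexa score to the nearest integer PHRED score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def illumina_to_sanger(qual):
    """Re-encode an Illumina 1.3+ quality string (offset 64) as Sanger (offset 33)."""
    return "".join(chr(ord(c) - ILLUMINA_OFFSET + SANGER_OFFSET) for c in qual)


def solexa_to_sanger(qual):
    """Re-encode a Solexa quality string: convert each score to PHRED, then shift."""
    return "".join(
        chr(solexa_to_phred(ord(c) - ILLUMINA_OFFSET) + SANGER_OFFSET) for c in qual
    )


# An Illumina 1.3+ 'h' (PHRED 40) becomes a Sanger 'I' (PHRED 40).
print(illumina_to_sanger("h"))
```

The Illumina-to-Sanger step is a pure character shift because both encodings hold PHRED scores. Solexa scores use a different scale, so they need the logarithmic mapping before the shift.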
17
SAM (Sequence Alignment/Map)
It may not be necessary to align reads from scratch… you can instead use existing alignments in SAM format
– SAM is the output of aligners that map reads to a reference genome
– Tab delimited, with a header section and an alignment section
    Header lines begin with @ (the header section is optional)
    Alignment section has 11 mandatory fields
– BAM is the binary format of SAM
http://samtools.sourceforge.net/
18
Mandatory Alignment Fields
http://samtools.sourceforge.net/SAM1.pdf
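As a rough illustration of the 11 mandatory fields, the sketch below splits one tab-delimited SAM alignment line into named values. The field names follow the SAM specification. The function and type names are hypothetical, and the sample line is only an illustrative record in the style of the specification's examples. For real work, a library such as pysam or the samtools command line is the safer choice.

```python
from collections import namedtuple

# The 11 mandatory fields, in column order, as named in the SAM specification.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}

SamRecord = namedtuple("SamRecord", [f.lower() for f in SAM_FIELDS] + ["tags"])


def parse_sam_line(line):
    """Parse one alignment line; return None for header lines starting with '@'."""
    if line.startswith("@"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    values = [int(v) if name in INT_FIELDS else v
              for name, v in zip(SAM_FIELDS, cols)]
    # Anything after column 11 is an optional TAG:TYPE:VALUE field.
    return SamRecord(*values, tags=cols[11:])


example = "r001\t163\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar)
```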
19
Alignment Examples
Alignments in SAM format
http://samtools.sourceforge.net/SAM1.pdf
20
Valid BED files

chr1   86114265  86116346  nsv433165
chr2   1841774   1846089   nsv433166
chr16  2950446   2955264   nsv433167
chr17  14350387  14351933  nsv433168
chr17  32831694  32832761  nsv433169
chr17  32831694  32832761  nsv433170
chr18  61880550  61881930  nsv433171

chr1  16759829  16778548  chr1:21667704   270866  -
chr1  16763194  16784844  chr1:146691804  407277  +
chr1  16763194  16784844  chr1:144004664  408925  -
chr1  16763194  16779513  chr1:142857141  291416  -
chr1  16763194  16779513  chr1:143522082  293473  -
chr1  16763194  16778548  chr1:146844175  284555  -
chr1  16763194  16778548  chr1:147006260  284948  -
chr1  16763411  16784844  chr1:144747517  405362  +
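A short sketch of how BED lines like the ones on this slide might be read. It assumes the usual BED column order (chrom, start, end, then optional name, score, strand) and that the first row of the slide splits as chr1, 86114265, 86116346, nsv433165. The function name is hypothetical.

```python
def parse_bed_line(line):
    """Parse one BED line with 3 to 6 whitespace-separated columns into a dict."""
    cols = line.split()
    if len(cols) < 3:
        raise ValueError("BED needs at least chrom, start, end")
    rec = {"chrom": cols[0], "start": int(cols[1]), "end": int(cols[2])}
    if rec["start"] > rec["end"]:
        raise ValueError("chromStart must not exceed chromEnd")
    # Optional columns are kept as strings; only the first three are required.
    for key, value in zip(("name", "score", "strand"), cols[3:6]):
        rec[key] = value
    # BED starts are 0-based and ends are exclusive, so length is a plain difference.
    rec["length"] = rec["end"] - rec["start"]
    return rec


print(parse_bed_line("chr1 86114265 86116346 nsv433165"))
```

Because BED intervals are 0-based and half-open, the same feature written in a 1-based format such as GFF or GVF would start one position higher.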
21
GTF
22
GVF format

##gff-version 3
##gvf-version 1.02
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090
##genome-build NCBI MGSCv36
##assembly-name MGSCv36
##assembly-accession GCF_000001635.15
##file-date 2011-11-18
# Study_accession: Combined studies on MGSCv36
# Display_name: Combined studies on MGSCv36
# Study_description: Combined studies on MGSCv36
chr1 dbVar copy_number_variation 90044442 90114410 . . . ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,.
chr4 dbVar copy_number_variation 121483931 121646639 . . . ID=nsv433534;Name=nsv433534;Start_range=.,121483931;End_range=121646639,.
chr9 dbVar copy_number_variation 109128634 109146964 . . . ID=nsv433535;Name=nsv433535;Start_range=.,109128634;End_range=109146964,.
chr17 dbVar copy_number_variation 30240627 30614866 . . . ID=nsv433536;Name=nsv433536;Start_range=.,30240627;End_range=30614866,.
chr17 dbVar copy_number_variation 30983722 31036099 . . . ID=nsv433537;Name=nsv433537;Start_range=.,30983722;End_range=31036099,.
chr17 dbVar copy_number_variation 34907088 34962504 . . . ID=nsv433538;Name=nsv433538;Start_range=.,34907088;End_range=34962504,.
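GVF builds on GFF3, so each feature line has nine tab-separated columns, with tag=value attributes in the last one. The sketch below reads the first record of the slide under that assumption. The function name is hypothetical, and the parser skips details such as URL-escaped characters and multi-valued attributes.

```python
GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]


def parse_gvf_line(line):
    """Parse one GVF/GFF3 feature line; return None for pragmas and comments."""
    if line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns")
    rec = dict(zip(GFF3_COLUMNS, cols))
    rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])
    # Column 9 holds semicolon-separated tag=value pairs.
    rec["attributes"] = dict(
        pair.split("=", 1) for pair in rec["attributes"].split(";") if pair
    )
    return rec


line = ("chr1\tdbVar\tcopy_number_variation\t90044442\t90114410\t.\t.\t.\t"
        "ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,.")
record = parse_gvf_line(line)
print(record["type"], record["attributes"]["ID"])
```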
23
Derived data
http://www.ncbi.nlm.nih.gov/dbvar
http://www.ebi.uk/dgva
http://www.ncbi.nlm.nih.gov/snp
24
Derived data
25
Actual data
26
Getting exponential growth under control
27
Trace Organization
seq1: FASTA, Quality, Chromatogram, Experimental info, Sample
seq2: FASTA, Quality, Chromatogram, Experimental info, Sample
SRA Organization
Experiments, Samples, Sequences and Qualities
28
Era of NGS Explosion
FASTQ Era
Bits/Base Era
As of April 10, 2012, SRA contains fewer bytes than bases
29
New Cycle
Decision Circle
– What data series to store
– Redundancy removal
– Normalization
– Lossy vs Lossless
– Compression tuning
Practical Application
– BAM and similar formats containing both raw reads and alignments become primary output of raw sequencing
– Increases the number of data series
– Compression By Reference reduces sizes of other data series
– New sets of tradeoffs
– New compression algorithms
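To make "Compression By Reference" concrete, here is a toy sketch of the general idea: an aligned read is stored as its position, its length, and only the bases that differ from the reference. This illustrates the principle only and is not the method used by SRA or by any BAM tool. It ignores indels, clipping, and quality scores, and the function names are hypothetical. The sequences are the reference and the first read from slide 12, with the alignment position (0-based offset 5) worked out by hand.

```python
def encode_read(reference, read, pos):
    """Describe an aligned read as (pos, length, mismatches) against the reference."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends beyond the reference")
    mismatches = [(i, base) for i, (base, ref_base) in enumerate(zip(read, window))
                  if base != ref_base]
    return pos, len(read), mismatches


def decode_read(reference, encoded):
    """Rebuild the read from the reference plus the stored mismatches."""
    pos, length, mismatches = encoded
    bases = list(reference[pos:pos + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)


reference = "GCTACGGCATTCAGGCATCAGGCATTAGCAG"
read = "GGCATTCAGGGATCAGGCATTAGC"
encoded = encode_read(reference, read, 5)
print(encoded)
assert decode_read(reference, encoded) == read
```

The 24-base read collapses to a position, a length, and one mismatch, which is where the size reduction comes from when most reads agree with the reference.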
30
Analyzing New Compression Method
BAMs from 1000 Genomes Project
– Data from 1000 Genomes Project
– All available combinations of samples, platforms, and aligners
– 3114 files
– 27 Tb of disk space after compression
BAM treatment
– Names are dropped after restoring mates
– Only sequencing quality score is saved
– No non-redundant optional tags are preserved
Correction of BAM inconsistencies
– Occasional alignments to stretches of Ns on the reference and beyond the reference were converted to unaligned
– Different PCR duplicate flags for mates
31
Changes To SRA Run Browser
32
http://aws.amazon.com/datasets/4383
33
https://main.g2.bx.psu.edu/
34
http://www.genomespace.org/
35
Science 1 July 2011: Vol. 333 no. 6038 pp. 53-58 DOI: 10.1126/science.1207018
36
Li et al., 2011, Fig. 1
37
Li et al., 2011, Fig. 2
38
Kleinman et al., 2012, Fig. 1
39
Kleinman et al., 2012, Table 1
40
Lin et al., 2012, Fig. 1
41
Lin et al., 2012, Fig. 2
42
Pickrell et al., 2012, Fig. 1
43
Li et al., 2012, Fig. 1
44
Li et al., 2012, Fig. 2
45
Li et al., 2012, Fig. 3
46
Li et al., 2012, Fig. 4