DM Church. Last Updated: 7 May 2012. Intro to Next Generation Sequencing.

1 Intro to Next Generation Sequencing

2 http://omicsmaps.com/ Nick Loman and James Hadfield

3

4 Koboldt et al., 2010 (Figure 3)

5

6 Bench work to build libraries and sequence; Clean up and QA reads; Alignments to Genome or Transcriptome; Analysis of Alignments

7 Koboldt et al., 2010. Sample Contamination; Library chimeras; Sample mix-ups; Tumor-normal switches; Run quality

8 Koboldt et al., 2010 (Fig 4A)

9

10 Chor et al., 2009

11 CLC bio

12 GCTACGGCATTCAGGCATCAGGCATTAGCAG
GGCATTCAGGGATCAGGCATTAGC->
<-CATGGCATTCAGGGATCAGGCATT
<-GCCATGGCATTCAGGGATCAGGC
CATTCAGGGATCAGGCATTAGCAG->
GGCATTCAGGGATCAGGCATTAGC->
CATTCAGGGATCAGGCATTAGCAG->
GGCATTCAGGGATCAGGCATT->
<-GGATCAGGCATTAGCAG
<-GATCAGGCATTAGCAG
<-GGATCAGGCATTAGCAG

13 High Coverage: qualities may not be needed

14 Low Coverage: qualities are important
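The contrast between the high-coverage and low-coverage slides can be illustrated with a toy base caller. This is only a sketch: it assumes a pileup column is available as (base, Phred quality) pairs, the function names and example values are hypothetical, and summing Phred scores is a deliberate simplification of what real variant callers do.

```python
from collections import Counter, defaultdict


def majority_call(column):
    """Call the most frequent base, ignoring qualities."""
    counts = Counter(base for base, _ in column)
    return counts.most_common(1)[0][0]


def quality_weighted_call(column):
    """Call the base with the largest summed Phred quality."""
    weights = defaultdict(int)
    for base, qual in column:
        weights[base] += qual
    return max(weights, key=weights.get)


# Hypothetical low-coverage column: two low-quality G calls, one high-quality C.
low_cov = [("G", 5), ("G", 6), ("C", 40)]
# Hypothetical high-coverage column: the majority is clear with or without qualities.
high_cov = [("G", 30)] * 28 + [("C", 30)] * 2
```

With only three reads the two callers disagree on `low_cov`, while on `high_cov` both return the same base, which is the point the slides make.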

15 Custodia-Lora et al., 2003

16 FASTQ Example. FASTQ example from: Cock et al. (2009). Nuc Acids Res 38:1767-1771. For analysis, it may be necessary to convert to the Sanger form of FASTQ. For example, Illumina stores quality scores ranging from 0-62; Sanger quality scores range from 0-93. Solexa quality scores have to be converted to PHRED quality scores.
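The conversions mentioned on this slide can be sketched in a few lines. The Solexa-to-PHRED formula is the one given in Cock et al.; the ASCII offsets (64 for Solexa and Illumina 1.3+, 33 for Sanger) follow the same paper. The function names are hypothetical, and this is an illustration rather than a replacement for an established converter.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa quality score to a PHRED score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def illumina13_to_sanger(qual):
    """Re-encode an Illumina 1.3+ quality string (PHRED, offset 64) as Sanger (offset 33)."""
    return "".join(chr(ord(c) - 64 + 33) for c in qual)


def solexa_to_sanger(qual):
    """Re-encode a Solexa quality string (Solexa scores, offset 64) as Sanger (offset 33)."""
    return "".join(chr(round(solexa_to_phred(ord(c) - 64)) + 33) for c in qual)
```

For high scores the Solexa and PHRED scales are nearly identical; they diverge only at the low end, which is why the Illumina 1.3+ case needs just an offset shift while the Solexa case needs the logarithmic conversion.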

17 SAM (Sequence Alignment/Map). It may not be necessary to align reads from scratch; you can instead use existing alignments in SAM format.
– SAM is the output of aligners that map reads to a reference genome
– Tab delimited, with a header section and an alignment section
  Header lines begin with @ (the header section is optional)
  Alignment section has 11 mandatory fields
– BAM is the binary format of SAM
http://samtools.sourceforge.net/

18 http://samtools.sourceforge.net/SAM1.pdf Mandatory Alignment Fields
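As a rough illustration of the 11 mandatory fields, the sketch below splits one SAM alignment line into named fields. The field names follow the SAM specification (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL); `SamRecord`, `parse_sam_line`, and the example record are hypothetical and made up for this note. In practice a library such as samtools or pysam would normally be used instead.

```python
from collections import namedtuple

SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual tags",
)


def parse_sam_line(line):
    """Parse one alignment line; return None for header lines starting with '@'."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-delimited fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    # Anything after the 11th column is an optional TAG:TYPE:VALUE field.
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual, fields[11:])


# Hypothetical record, invented for illustration only.
example = "read1\t99\tchr1\t100\t60\t8M\t=\t150\t58\tGGCATTCA\tIIIIIIII\tNM:i:0"
rec = parse_sam_line(example)
```

FLAG is a bit field, so individual properties are tested with bit masks; for example `rec.flag & 0x10` is non-zero when the read is aligned to the reverse strand.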

19 http://samtools.sourceforge.net/SAM1.pdf Alignment Examples: Alignments in SAM format

20 Valid BED files
chr1 86114265 86116346 nsv433165
chr2 1841774 1846089 nsv433166
chr16 2950446 2955264 nsv433167
chr17 14350387 14351933 nsv433168
chr17 32831694 32832761 nsv433169
chr17 32831694 32832761 nsv433170
chr18 61880550 61881930 nsv433171
chr1 16759829 16778548 chr1:21667704 270866 -
chr1 16763194 16784844 chr1:146691804 407277 +
chr1 16763194 16784844 chr1:144004664 408925 -
chr1 16763194 16779513 chr1:142857141 291416 -
chr1 16763194 16779513 chr1:143522082 293473 -
chr1 16763194 16778548 chr1:146844175 284555 -
chr1 16763194 16778548 chr1:147006260 284948 -
chr1 16763411 16784844 chr1:144747517 405362 +
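A small sketch of how BED rows like these can be read. It assumes the usual BED conventions (whitespace-delimited columns, a 0-based start and an exclusive end, with only the first three columns required); `parse_bed_line` is a hypothetical helper, and the example row uses column boundaries reconstructed from the slide text.

```python
def parse_bed_line(line):
    """Parse the first six BED columns; only the first three are required."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("BED needs at least chrom, chromStart, chromEnd")
    start, end = int(fields[1]), int(fields[2])
    if not 0 <= start <= end:
        raise ValueError("need 0 <= chromStart <= chromEnd")
    record = {"chrom": fields[0], "start": start, "end": end}
    # Optional columns are kept as strings and only added when present.
    for key, value in zip(("name", "score", "strand"), fields[3:6]):
        record[key] = value
    return record


row = parse_bed_line("chr1 86114265 86116346 nsv433165")
length = row["end"] - row["start"]  # half-open interval, so no +1
```

Because the interval is half-open, the feature length is simply `end - start`, which differs from the 1-based inclusive convention used by GFF-style formats.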

21 GTF

22 GVF format
##gff-version 3
##gvf-version 1.02
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090
##genome-build NCBI MGSCv36
##assembly-name MGSCv36
##assembly-accession GCF_000001635.15
##file-date 2011-11-18
# Study_accession: Combined studies on MGSCv36
# Display_name: Combined studies on MGSCv36
# Study_description: Combined studies on MGSCv36
chr1 dbVar copy_number_variation 90044442 90114410 . . . ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,.
chr4 dbVar copy_number_variation 121483931 121646639 . . . ID=nsv433534;Name=nsv433534;Start_range=.,121483931;End_range=121646639,.
chr9 dbVar copy_number_variation 109128634 109146964 . . . ID=nsv433535;Name=nsv433535;Start_range=.,109128634;End_range=109146964,.
chr17 dbVar copy_number_variation 30240627 30614866 . . . ID=nsv433536;Name=nsv433536;Start_range=.,30240627;End_range=30614866,.
chr17 dbVar copy_number_variation 30983722 31036099 . . . ID=nsv433537;Name=nsv433537;Start_range=.,30983722;End_range=31036099,.
chr17 dbVar copy_number_variation 34907088 34962504 . . . ID=nsv433538;Name=nsv433538;Start_range=.,34907088;End_range=34962504,.
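GVF builds on GFF3, so each feature line has nine tab-delimited columns with semicolon-separated tag=value attributes in the last one. The sketch below parses the first feature line from this slide under that assumption; `parse_gvf_line` and `COLUMNS` are hypothetical names, and the parser ignores details such as URL-escaped characters and multi-valued attributes.

```python
COLUMNS = ("seqid", "source", "type", "start", "end", "score", "strand", "phase")


def parse_gvf_line(line):
    """Return a dict for a feature line, or None for pragmas, comments, and blanks."""
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-delimited columns")
    record = dict(zip(COLUMNS, fields[:8]))
    record["start"], record["end"] = int(record["start"]), int(record["end"])
    # Column 9 holds semicolon-separated tag=value pairs.
    record["attributes"] = dict(
        pair.split("=", 1) for pair in fields[8].split(";") if pair
    )
    return record


line = ("chr1\tdbVar\tcopy_number_variation\t90044442\t90114410\t.\t.\t.\t"
        "ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,.")
rec = parse_gvf_line(line)
```

Coordinates here are 1-based and inclusive, so the length of this feature is `rec["end"] - rec["start"] + 1`.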

23 http://www.ncbi.nlm.nih.gov/dbvar http://www.ebi.ac.uk/dgva http://www.ncbi.nlm.nih.gov/snp Derived data

24 Derived data

25 Actual data

26 Getting exponential growth under control

27 Trace Organization: seq1, seq2 (each with FASTA, Quality, Chromatogram, Experimental info, Sample). SRA Organization: Experiments; Samples; Sequences and Qualities

28 Era of NGS Explosion; FASTQ Era; Bits/Base Era. As of April 10, 2012, SRA contains fewer bytes than bases.

29 New Cycle
– Decision Circle: What data series to store; Redundancy removal; Normalization; Lossy vs Lossless; Compression tuning
– Practical Application: BAM and similar formats containing both raw reads and alignments become primary output of raw sequencing; Increases the number of data series; Compression By Reference reduces sizes of other data series
– New sets of tradeoffs
– New compression algorithms
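The idea behind compression by reference can be shown with a toy encoder: an aligned read is stored as its position, its length, and only the bases that differ from the reference. This is a simplified illustration, not the scheme used by SRA or any specific tool; the function names are hypothetical, the two sequences are taken from the alignment slide earlier in the deck, and the 0-based offset of 5 is an assumption chosen so that the read lines up with the reference.

```python
def encode_read(reference, pos, read):
    """Store a read as its start position, length, and mismatches vs the reference."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends beyond the reference")
    diffs = [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]
    return pos, len(read), diffs


def decode_read(reference, encoded):
    """Rebuild the read from the reference and the stored mismatches."""
    pos, length, diffs = encoded
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "GCTACGGCATTCAGGCATCAGGCATTAGCAG"
read = "GGCATTCAGGGATCAGGCATTAGC"
encoded = encode_read(reference, 5, read)  # offset 5 is an assumed alignment
```

Here the 24-base read collapses to one position, one length, and a single mismatch, and `decode_read(reference, encoded)` restores it exactly. Insertions, deletions, and quality scores are left out of this sketch.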

30 Analyzing New Compression Method: Data from 1000 Genomes Project
– BAMs from 1000 Genomes Project: All available combinations of samples, platforms, and aligners; 3114 files; 27 Tb of disk space after compression
– BAM treatment: Names are dropped after restoring mates; Only sequencing quality score is saved; None of the non-redundant optional tags are preserved
– Correction of BAM inconsistencies: Occasional alignments to stretches of Ns on the reference and beyond the reference were converted to unaligned; Different PCR duplicate flags for mates

31 Changes To SRA Run Browser

32 http://aws.amazon.com/datasets/4383

33 https://main.g2.bx.psu.edu/

34 http://www.genomespace.org/

35 Science 1 July 2011: Vol. 333 no. 6038 pp. 53-58 DOI: 10.1126/science.1207018

36 Li et al., 2011, Fig 1

37 Li et al., 2011, Fig 2

38 Kleinman et al., 2012, Fig 1

39 Kleinman et al., 2012, Table 1

40 Lin et al., 2012, Fig 1

41 Lin et al., 2012, Fig 2

42 Pickrell et al., 2012, Fig 1

43 Li et al., 2012, Fig 1

44 Li et al., 2012, Fig 2

45 Li et al., 2012, Fig 3

46 Li et al., 2012, Fig 4

