Next generation sequencing Xusheng Wang 4/29/2010.


1 Next generation sequencing Xusheng Wang 4/29/2010

2 Outline: Background; Technologies; Data analysis and applications

3 Why sequence DNA?

4 De novo genome sequencing: Homo sapiens, Mus musculus, Rattus norvegicus, Pan troglodytes, Macaca mulatta, Drosophila melanogaster, Danio rerio, Takifugu rubripes, Arabidopsis thaliana, Oryza sativa, Caenorhabditis elegans

5 Individual human genome sequencing

6 Interests to me… BXD RI strain set [Figure: C57BL/6J (B) × DBA/2J (D), both fully inbred → F1 (isogenic) → F2 (heterogeneous) → 20 generations of brother-sister matings → BXD1, BXD2, …, BXD80 (fully inbred; inbred isogenic siblings). Legend: female, male, chromosome pair.] Recombined genomes are needed for mapping.

7 Cancer genome sequencing

8 Map-and-count experiments: RNA-seq, ChIP-seq

9 History of DNA sequencing Messing & Llaca, PNAS (1998) Sanger Sequencer

10 Next generation sequencing technologies: ABI SOLiD, Illumina GA2, Roche 454

11 Single-molecule sequencing technologies: Helicos; Single Molecule Real Time (SMRT) DNA sequencing

12

13 Comparison of NGS platforms Michael L. Metzker Nature Reviews Genetics 11, 31-46 (Jan 2010)

14 Roche (454) GS FLX sequencer

15

16

17 HiSeq 2000

18 Illumina Genome Analyzer: Prepare library [figure labels: A A T T A A T T]

19 Illumina Genome Analyzer: Prepare clusters

20 Illumina Genome Analyzer: Prepare clusters

21 Illumina Genome Analyzer: Sequencing [figure labels: A T C G]

22 Illumina Genome Analyzer: Sequencing [figure labels: A T C G]

23 Illumina Genome Analyzer: Sequencing [figure labels: A]

24 Illumina Genome Analyzer: Sequencing [figure labels: A T C G A]

25 Illumina Genome Analyzer: Sequencing [figure labels: A T]

26

27 Data from Illumina GA2
Field labels: machine name; lane; tile; X; Y; index; read type (1: single; 2: paired end); filter (0: no; 1: yes)
14-06-2009 xwang39 2 21 0 144 0 2 TGGAAGAATATAGAGCCTGTCACAATCCTCCCTTTGAGCAGCATTAGTCTACAAAGGAAAAGAAAGTTCTCATGACTCTAGTGCCACCCTCACATACTTAC `_ab_ZaabbbaY`\_\a[[_a`aaa]`_aa\`aa[a\\aW]``a`VW\aa`aZ__Y]Z_aWZV_a][]][a`Y^X[\\[FKT``F\[^W`^TVZTVXODD 0
14-06-2009 xwang39 2 21 0 151 0 2 GGGCACTCTTGTGCGGCAACGGCTGGGTGAGGACTCAACGGGGCCCCGTCCTGTCTAGCCTCGCCCTCGCTTGCGGGACCAGACCGGACACTGGCGAAGTA X\\V_aa_aaa_a_R[[aa^U^`aV^_HXT[NMYPU_\PU]VTRZU[PK`_HIV[GG]IHSDG\_XYGPDW_LFHIOJT`ROJGTDDTZPIGVKJIJDGMD 0
14-06-2009 xwang39 2 21 0 182 0 2 CTTGCAGCAGATGTCTGGACTCCTCCAAAATACATGCCTAGGCGTCAACGCAGTTACCACCTGCTTTCCGCCAGTGATGCGTCCTCCTGGG.TCGCGTCTC VFHMMZOMZKMMZRGFFTHDZRFYMFHFDOTFJDDMEWKYHMMOHDDJZIDIGDDV]QIPNFGDODHHMFDDFDIFKDP_DHWDFDRHXHFDRGGGGHYDZ 0
14-06-2009 xwang39 2 21 0 209 0 2 ATGCTGACCAATCCGGAAC.CTCGGCTAGAAAACGCCAGGGGTCGAGAGAAGAATAATCTACAATCCGAAACAGCCAGGGAGTGAAACAGTATACGTGTAT [MRD[WKIPGSMJDQUVRDDJJPSDDMGPPYY_GGMDRFNFDDMHHDJJHLPZMKDMOMJJDJDWIDRMNHIDHHDHLDNMDDRMMMDSKDDKHHNOFHDR 0

28 SOLiD system

29 SOLiD sequencing

30 Ligation-based sequencing

31 Decoding color space. Raw error rate = ~3%; corrected error rate = ~0.1%

32 Single SNP detection

33 Data from SOLiD (.csfasta)
>4_27_99_F3
T23102012303131123113023203111122120111212222221212
>4_27_1062_F3
T10031230200330020110103303123302223021310021101121
>4_27_1570_F3
T10010101232211103131321213132002223221311002230222
>4_28_935_F3
T10012212103120133333312233123230232102201222200222
>4_28_1306_F3
T32132100010000102200020013302020303331122123002203
>4_29_429_F3
T01111120122121111203111221111111121122112112121212
>4_29_506_F3
T02333113221132010233032202322221300033200222232222
>4_29_636_F3
T10123212011001211211101021013300100021201111220212
>4_29_940_F3
T31100100201111232200321212002103322333232202200222
>4_29_1957_F3
T10022212230311301132100201012012221332220021210222
>4_31_522_F3
T10031122210120122201300211123321220302200222130202
>4_31_1523_F3
T10023101111111301210313312012303131123223022233213
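
A minimal sketch of how a .csfasta read like the ones above can be translated into bases, assuming the standard di-base encoding (color 0 = same base; with A=0, C=1, G=2, T=3 the color is the XOR of two adjacent base codes). The function name `decode_colorspace` is illustrative and not taken from any SOLiD tool. The leading letter is the known primer base and is not part of the decoded read.

```python
# Sketch only: assumes the standard SOLiD di-base encoding.
# With A=0, C=1, G=2, T=3, color = code(previous base) XOR code(next base).
BASES = "ACGT"

def decode_colorspace(read: str) -> str:
    """Translate a csfasta read (primer base followed by colors) into bases."""
    prev = BASES.index(read[0])      # known primer base, not part of the read
    decoded = []
    for color in read[1:]:
        if color not in "0123":
            raise ValueError(f"cannot decode color {color!r}")
        prev ^= int(color)           # next base code = previous code XOR color
        decoded.append(BASES[prev])
    return "".join(decoded)

# First five colors of read 4_27_99_F3 from the slide.
print(decode_colorspace("T23102"))
```

Because every base depends on the one before it, a single miscalled color changes all downstream bases, which is one reason reads are usually aligned in color space and converted afterwards.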

34 Quality value (_QV.qual)
>4_27_99_F3
8 8 3 5 17 16 18 2 5 23 2 14 2 21 25 14 12 7 6 5 25 8 12 9 10 7 20 9 5 25 14 5 19 15 8 3 10 16 9 9 5 4 9 9 8 2 7 9 17 13
>4_27_1062_F3
27 25 12 29 27 28 32 26 26 25 27 29 25 18 21 26 27 27 16 23 26 23 14 14 20 29 23 16 12 21 29 23 8 19 20 18 25 16 8 25 24 19 11 11 11 18 18 5 26 7
>4_27_1570_F3
18 10 5 13 8 12 12 8 9 21 14 15 24 22 15 16 25 7 7 8 13 20 5 24 22 11 16 5 16 13 20 25 7 7 5 14 19 10 5 11 8 25 10 3 5 4 24 17 8 3
>4_28_935_F3
12 3 5 17 26 11 7 11 11 23 2 17 12 13 11 4 5 12 10 21 7 2 11 2 14 19 9 7 5 20 8 4 2 10 16 12 10 16 3 6 5 2 14 17 9 3 4 11 20 17
>4_28_1306_F3
14 14 8 23 13 18 31 18 20 20 3 22 17 11 21 8 22 26 20 28 16 22 21 26 25 5 24 26 19 28 4 11 10 6 19 2 22 7 12 20 5 5 6 12 22 4 28 21 11 14
>4_29_429_F3
23 29 26 26 23 26 25 14 25 16 27 26 22 26 23 24 25 24 5 22 27 25 24 9 21 27 24 25 27 26 23 18 16 4 8 16 7 4 10 19 18 18 16 5 6 13 10 16 13 8
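
A small sketch of reading such quality records and converting a quality value to an error probability. It assumes the values follow the Phred convention, Q = -10 log10(p); the function names are illustrative only.

```python
def parse_qual(text: str) -> dict:
    """Parse FASTA-style quality text into {read name: [quality values]}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank and comment lines
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].extend(int(v) for v in line.split())
    return records

def phred_to_error_prob(q: int) -> float:
    """Assumed Phred relation: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

# First five quality values of read 4_27_99_F3 from the slide.
quals = parse_qual(">4_27_99_F3\n8 8 3 5 17\n")
for q in quals["4_27_99_F3"]:
    print(q, round(phred_to_error_prob(q), 4))
```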

35 Comparing Sequencers [Figure: bases per machine run (1 Mb to 100 Gb) versus read length (10 bp to 1,000 bp)] ABI capillary sequencer; 454 pyrosequencer (20-100 Mb in 300-400 bp reads); Illumina, AB/SOLiD short-read sequencers (100 Gb in 25-50 bp reads)

36 Comparing Sequencers
                   Roche (454)      Illumina GA             SOLiD
Chemistry          Pyrosequencing   Reversible terminator   Ligation-based
Amplification      Emulsion PCR     Bridge amplification    Emulsion PCR
Paired ends/sep    Yes/3 kb         Yes/200 bp              Yes/3 kb
Output/run         100 Mb           20 Gb                   100 Gb
Time/run           7 h              4 days / 8 days         7 days / 14 days
Read length        400 bp           32-100 bp               50 bp

37 Data volume

38 Read mapping. Read mapping is like doing a jigsaw puzzle… …you get the pieces… … and they give you the picture on the box. Problem is, some pieces are easier to place than others…

39 Alignment of reads. Reads generated from sequencing are mapped to a reference genome. Conventional tools like BLAST or BLAT do not work well with short sequence reads. Existing alignment algorithms are modified to handle short reads.
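
To make the idea of short-read mapping concrete, here is a toy sketch of one common strategy: index the reference by k-mers, look up exact seed matches from the read, then verify the full read while allowing a few mismatches. It is an illustration under simplifying assumptions (tiny reference, no indels, no reverse strand) and is not the algorithm of any specific tool named in this talk; all names are hypothetical.

```python
from collections import defaultdict

def build_index(reference: str, k: int) -> dict:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read: str, reference: str, index: dict, k: int, max_mismatches: int = 2):
    """Return sorted (start, mismatches) hits found by seed-and-verify."""
    hits = set()
    # Non-overlapping seeds: a mismatch in one seed can be rescued by another.
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)

reference = "ACGTTGCATAGGCCTAAGTCGGATCCA"   # made-up toy reference
index = build_index(reference, k=4)
read = reference[5:17]                      # a 12-base read taken from the reference
print(map_read(read, reference, index, k=4))
```

Short-read aligners differ mainly in how they build and compress this kind of index and in how they score mismatches and gaps.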

40 Alignment Tools: ELAND, MAQ, Bowtie, SOAP, Bioscope/Corona Lite pipeline, TopHat, BFAST

41

42 SAM/BAM format [Figure annotations on the header: file format version; sequence name; sequence length; read group]
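
The header items annotated on this slide correspond to SAM header lines: `@HD` with `VN` (format version), `@SQ` with `SN` and `LN` (sequence name and length), and `@RG` with `ID` (read group). A minimal sketch of pulling them out of a SAM text header follows; the example header and read are invented for illustration.

```python
def parse_sam_header(lines):
    """Collect format version, reference sequences, and read groups from SAM header lines."""
    header = {"version": None, "sequences": {}, "read_groups": []}
    for line in lines:
        if not line.startswith("@"):
            break                         # header lines precede alignment records
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0]
        if record_type == "@CO":
            continue                      # free-text comment, no TAG:VALUE pairs
        tags = dict(field.split(":", 1) for field in fields[1:])
        if record_type == "@HD":
            header["version"] = tags.get("VN")
        elif record_type == "@SQ":
            header["sequences"][tags["SN"]] = int(tags["LN"])
        elif record_type == "@RG":
            header["read_groups"].append(tags["ID"])
    return header

# Hypothetical header; names and lengths are placeholders.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr2\tLN:800",
    "@RG\tID:lane1\tSM:sampleA",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(parse_sam_header(example))
```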

43 Sequence variations: SNPs, Insertion, Deletion, Medium Insertion

44 SNPs between C57BL/6J and DBA/2J: 4,553,000 [Figure labels: SOLiD, Illumina]

45 Evaluating SNP calls [Figure labels: 5% adjacent SNPs; 20% adjacent SNPs]

46 Medium indels detection (1 base - 1 million bases)
D 321 ChrID 0 56173880 56174202 Supports: 15 70 130.916
TAAGAATGAGTTGGCAAATAAAGAGTTTGGTGAGTTTATAGAAATATAGGggccg ataggACAAGGTACAAGGAATGGCTGAAGGAGAGAGGTTG
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG + 56173670
GTGAGTTTATAGAAATATAGG ACAAGGTACAAGGAA + 56173677
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG + 56173681
TGGTGAGTTTATAGAAATATAGG ACAAGGTACAAGG + 56173687
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG + 56173690
AGTTTGGTGAGTTTATAGAAATATAGG ACAAGGTACAAGGA + 56173697
GTGAGTTTATAGAAATATAGG ACAAGGTACAAGGAA + 56173700
AGTTTATAGAAATATAGG ACAAGGTACAAGGAATGG + 56173710
TTTGGTGAGTTTATAGAAATATAGG ACAAGGTACAA - 56174339
TGAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG - 56174356
TGAGTTTATAGAAATATAGG ACAAGGTACAAGGAAT - 56174357
GTTTATAGAAATATAGG ACAAGGTACAAGGAATGGC - 56174358
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG - 56174365
AGTTTATAGAAATATAGG ACAAGGTACAAGGAATGG - 56174373

47 Large indels detection. K. Chen et al., Nature Methods 6: 677-81 (2009). [Figure labels: Concordance, Insertion, Deletion, Clone insert size]
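
The figure labels suggest the usual paired-end logic: a pair whose mapped span matches the library insert size is concordant, a span that is much larger points to a deletion in the sample, and a span that is much smaller points to an insertion. The sketch below is a deliberate simplification of that idea, not the method of Chen et al.; the threshold of three standard deviations and all names are assumptions.

```python
def classify_pair(span: float, mean_insert: float, sd_insert: float, n_sd: float = 3.0) -> str:
    """Classify one read pair by how its mapped span compares with the library insert size."""
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"      # reads map farther apart than the clone insert size
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"     # reads map closer together than expected
    return "concordant"

# Hypothetical library: 3 kb mate pairs with a 300 bp standard deviation.
for span in (3100, 5200, 1400):
    print(span, classify_pair(span, mean_insert=3000, sd_insert=300))
```

In practice a call would need support from several independent pairs at the same locus before being reported.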

48 InDels between the C57BL/6J and DBA/2J

49 Inversion detected by paired-end data
Total inversions           291
Span exon(s) or gene(s)    50
Introns                    78
Intergenic                 163

50 Copy Number Variations (CNVs)
Total CNVs    21,739
Gains         7,182
Losses        14,557
Graubert, et al. 2007 PLoS Genetics; Anderson, et al. 2005 Genes & Immunity. Several members of the Klra gene family are deleted in DBA/2J.

51 De novo assembly: ABySS, ALLPATHS, Euler-SR, SHRAP, SSAKE, Velvet, SOAP

52 Variation viewed at a genome scale

53 RNA sequencing

54 RNA sequencing analysis pipeline: .csfasta → Filtering (ribosomal RNA, tRNA, TTTT / AAAA, adapters) → Alignments (reference sequences) → Merging and sorting → Counting reads → Novel transcripts
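
The "counting reads" step of the pipeline can be pictured with a small sketch: given aligned read start positions and gene intervals, tally how many reads fall in each gene. The gene names and coordinates below are placeholders, and a real pipeline would use an interval index instead of this nested loop.

```python
def count_reads_per_gene(read_starts, genes):
    """Count reads whose start position falls inside each gene interval (inclusive)."""
    counts = {name: 0 for name in genes}
    for chrom, pos in read_starts:
        for name, (g_chrom, g_start, g_end) in genes.items():
            if chrom == g_chrom and g_start <= pos <= g_end:
                counts[name] += 1
    return counts

# Hypothetical annotation and alignments, for illustration only.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 500, 900)}
reads = [("chr1", 150), ("chr1", 199), ("chr1", 650), ("chr2", 150), ("chr1", 300)]
print(count_reads_per_gene(reads, genes))
```

Reads that land outside every annotated interval, such as the last two here, are the raw material for the "novel transcripts" step.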

55 Alignment methods Transcriptome reads that cross splice junctions Anchor Extend method

56 Alternative splicing Novel Transcribed Region (NTR) Definition: a segment of genomic sequence that is transcribed but is not currently annotated as an exon in a database

57 Our RNA-seq data on the UCSC Genome Browser http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Williamslab&hgS_otherUserSessionName=eye_RNAseq

58 Finding the new SNP data http://www.genenetwork.org/webqtl/snpBrowser.py

59 Finding the indel data

60 Using the new sequence data

61

