Published by Alyson Wood. Modified over 9 years ago.
1
Next-generation sequencing. Xusheng Wang, 4/29/2010
2
Outline: background; technologies; data analysis and applications
3
Why sequence DNA?
4
De novo genome sequencing: Homo sapiens, Mus musculus, Rattus norvegicus, Pan troglodytes, Macaca mulatta, Drosophila melanogaster, Danio rerio, Takifugu rubripes, Arabidopsis thaliana, Oryza sativa, Caenorhabditis elegans
5
Individual human genome sequencing
6
What interests me: the BXD recombinant inbred (RI) strain set. The parental strains C57BL/6J (B) and DBA/2J (D) are fully inbred, and their F1 offspring are isogenic. The F2 generation, like the BXD RI strain set as a whole, is genetically heterogeneous. After 20 generations of brother-sister matings, each BXD strain (BXD1, BXD2, …, BXD80) is inbred, with isogenic siblings. Recombined genomes are needed for mapping. (Figure legend: female, male, chromosome pair.)
7
Cancer genome sequencing
8
Map-and-count experiments: RNA-seq, ChIP-seq
9
History of DNA sequencing (Messing & Llaca, PNAS, 1998). Sanger sequencer
10
Next-generation sequencing technologies: ABI SOLiD, Illumina GA2, Roche 454
11
Single-molecule sequencing technologies: Helicos; Single Molecule Real Time (SMRT) DNA sequencing
13
Comparison of NGS platforms (Michael L. Metzker, Nature Reviews Genetics 11, 31-46, Jan 2010)
14
Roche (454) GS FLX sequencer
17
HiSeq 2000
18
Illumina Genome Analyzer: prepare library (figure: A A T T A A T T)
19
Illumina Genome Analyzer: prepare clusters
20
Illumina Genome Analyzer: prepare clusters
21
Illumina Genome Analyzer: sequencing (figure: A, T, C, G)
22
Illumina Genome Analyzer: sequencing (figure: A, T, C, G)
23
Illumina Genome Analyzer: sequencing (figure: A)
24
Illumina Genome Analyzer: sequencing (figure: A, T, C, G, A)
25
Illumina Genome Analyzer: sequencing (figure: A, T)
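The cycle-by-cycle process on these slides (one labeled, reversibly terminated nucleotide incorporated per cycle, imaged in four colors, then the next cycle) can be illustrated with a toy base caller. This is a sketch under simplifying assumptions, not Illumina's software: it assumes one intensity value per dye channel per cycle, calls the brightest channel, and emits '.' (a no-call, like the dots in the read data on the next slide) when the brightest channel is not clearly dominant. The `min_ratio` threshold of 0.6 is a hypothetical choice.

```python
# Toy sequencing-by-synthesis base caller (illustrative only).
# Assumption: each cycle yields one intensity per dye channel (A, C, G, T).
CHANNELS = "ACGT"

def call_cycle(intensities: dict[str, float], min_ratio: float = 0.6) -> str:
    """Call the brightest channel; return '.' if it is not clearly dominant."""
    ranked = sorted(CHANNELS, key=lambda b: intensities.get(b, 0.0), reverse=True)
    best = intensities.get(ranked[0], 0.0)
    second = intensities.get(ranked[1], 0.0)
    if best + second == 0:
        return "."
    # Brightest / (brightest + second brightest): a chastity-like purity score.
    return ranked[0] if best / (best + second) >= min_ratio else "."

def call_read(cycles: list[dict[str, float]], min_ratio: float = 0.6) -> str:
    """One base per cycle: incorporate, image, call, cleave, repeat."""
    return "".join(call_cycle(c, min_ratio) for c in cycles)

cycles = [
    {"A": 950, "C": 40, "G": 55, "T": 30},   # cycle 1: A dominates
    {"A": 60, "C": 35, "G": 20, "T": 880},   # cycle 2: T dominates
    {"A": 400, "C": 380, "G": 30, "T": 20},  # cycle 3: ambiguous
]
print(call_read(cycles))  # AT.
```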
27
Data from Illumina GA2 (one read per line, whitespace-separated fields):
14-06-2009 xwang39 2 21 0 144 0 2 TGGAAGAATATAGAGCCTGTCACAATCCTCCCTTTGAGCAGCATTAGTCTACAAAGGAAAAGAAAGTTCTCATGACTCTAGTGCCACCCTCACATACTTAC `_ab_ZaabbbaY`\_\a[[_a`aaa]`_aa\`aa[a\\aW]``a`VW\aa`aZ__Y]Z_aWZV_a][]][a`Y^X[\\[FKT``F\[^W`^TVZTVXODD 0
14-06-2009 xwang39 2 21 0 151 0 2 GGGCACTCTTGTGCGGCAACGGCTGGGTGAGGACTCAACGGGGCCCCGTCCTGTCTAGCCTCGCCCTCGCTTGCGGGACCAGACCGGACACTGGCGAAGTA X\\V_aa_aaa_a_R[[aa^U^`aV^_HXT[NMYPU_\PU]VTRZU[PK`_HIV[GG]IHSDG\_XYGPDW_LFHIOJT`ROJGTDDTZPIGVKJIJDGMD 0
14-06-2009 xwang39 2 21 0 182 0 2 CTTGCAGCAGATGTCTGGACTCCTCCAAAATACATGCCTAGGCGTCAACGCAGTTACCACCTGCTTTCCGCCAGTGATGCGTCCTCCTGGG.TCGCGTCTC VFHMMZOMZKMMZRGFFTHDZRFYMFHFDOTFJDDMEWKYHMMOHDDJZIDIGDDV]QIPNFGDODHHMFDDFDIFKDP_DHWDFDRHXHFDRGGGGHYDZ 0
14-06-2009 xwang39 2 21 0 209 0 2 ATGCTGACCAATCCGGAAC.CTCGGCTAGAAAACGCCAGGGGTCGAGAGAAGAATAATCTACAATCCGAAACAGCCAGGGAGTGAAACAGTATACGTGTAT [MRD[WKIPGSMJDQUVRDDJJPSDDMGPPYY_GGMDRFNFDDMHHDJJHLPZMKDMOMJJDJDWIDRMNHIDHHDHLDNMDDRMMMDSKDDKHHNOFHDR 0
Labeled fields include the machine name, lane, tile, X and Y coordinates, index, read type (1: single; 2: paired end), and filter flag (0: no; 1: yes).
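Each line above is one read. A minimal parser is sketched below, assuming the field order of Illumina's qseq layout (machine, run, lane, tile, X, Y, index, read number, sequence, quality, filter) and Phred+64 quality encoding, which the letters and punctuation in the quality strings are consistent with. Both are assumptions, as is reading filter value 1 as "passed"; the example record is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class QseqRecord:
    machine: str
    run: str
    lane: int
    tile: int
    x: int
    y: int
    index: str
    read_number: int      # 1 = single/first read, 2 = second read of a pair
    sequence: str         # '.' marks a no-call
    quality: str
    passed_filter: bool   # assumption: 1 = passed, 0 = failed

    def phred_scores(self, offset: int = 64) -> list[int]:
        """Decode quality characters, assuming ASCII offset 64 (Phred+64)."""
        return [ord(ch) - offset for ch in self.quality]

def parse_qseq_line(line: str) -> QseqRecord:
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    m, run, lane, tile, x, y, idx, read_no, seq, qual, filt = fields
    return QseqRecord(m, run, int(lane), int(tile), int(x), int(y),
                      idx, int(read_no), seq, qual, filt == "1")

# Hypothetical short record in the same layout as the slide's data.
rec = parse_qseq_line("machine1 7 2 21 0 144 0 2 TGGA.GA hhhhDBh 1")
print(rec.lane, rec.tile, rec.read_number)   # 2 21 2
print(rec.phred_scores())                    # [40, 40, 40, 40, 4, 2, 40]
```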
28
SOLiD system
29
SOLiD sequencing
30
Ligation-based sequencing
31
Decoding color space: raw error rate ~3%; corrected error rate ~0.1%
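In SOLiD's two-base encoding each color reports a pair of adjacent bases, so a read can only be turned back into bases starting from a known base (the leading T in the .csfasta records). The standard color table can be written compactly: with A=0, C=1, G=2, T=3, the color of a dinucleotide is the XOR of the two codes. A minimal sketch; because decoding runs left to right, one wrong color changes every base after it, which is one reason color-space reads are usually aligned in color space rather than decoded first.

```python
# SOLiD two-base ("color space") encoding via the 2-bit XOR trick.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def encode(bases: str, primer_base: str = "T") -> str:
    """Return a csfasta-style read: primer base, then one color per base transition."""
    seq = primer_base + bases
    colors = (str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))
    return primer_base + "".join(colors)

def decode(csread: str) -> str:
    """Translate a color-space read such as 'T0123' back to bases."""
    prev = CODE[csread[0]]
    out = []
    for color in csread[1:]:
        if color not in "0123":   # '.' or other missing color: stop decoding
            break
        prev ^= int(color)
        out.append(BASE[prev])
    return "".join(out)

print(decode("T0123"))   # TGAT
print(encode("TGAT"))    # T0123
```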
32
Single SNP detection
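Two-base encoding is what makes the SNP versus error distinction possible: a true single-base change alters both colors that overlap it, and both by the same XOR value, whereas a measurement error usually alters only one color. A toy classifier comparing reference and read colors is sketched below (SNPs at the very ends of a read, which touch only one color, are not handled).

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def to_colors(bases: str) -> list[int]:
    """Colors for consecutive base pairs: XOR of the 2-bit base codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(bases, bases[1:])]

def classify_color_diffs(ref_colors: list[int], read_colors: list[int]) -> str:
    diffs = [i for i, (r, q) in enumerate(zip(ref_colors, read_colors)) if r != q]
    if not diffs:
        return "match"
    if len(diffs) == 1:
        return "isolated"   # one color off: most likely a measurement error
    if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
        i, j = diffs
        # A SNP changes both overlapping colors by the same XOR value.
        if ref_colors[i] ^ read_colors[i] == ref_colors[j] ^ read_colors[j]:
            return "snp"
    return "other"

ref = to_colors("ACGTA")                              # [1, 3, 1, 3]
print(classify_color_diffs(ref, to_colors("ACATA")))  # snp (G>A at the middle base)
print(classify_color_diffs(ref, [1, 3, 0, 3]))        # isolated
```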
33
Data from SOLiD (.csfasta):
>4_27_99_F3
T23102012303131123113023203111122120111212222221212
>4_27_1062_F3
T10031230200330020110103303123302223021310021101121
>4_27_1570_F3
T10010101232211103131321213132002223221311002230222
>4_28_935_F3
T10012212103120133333312233123230232102201222200222
>4_28_1306_F3
T32132100010000102200020013302020303331122123002203
>4_29_429_F3
T01111120122121111203111221111111121122112112121212
>4_29_506_F3
T02333113221132010233032202322221300033200222232222
>4_29_636_F3
T10123212011001211211101021013300100021201111220212
>4_29_940_F3
T31100100201111232200321212002103322333232202200222
>4_29_1957_F3
T10022212230311301132100201012012221332220021210222
>4_31_522_F3
T10031122210120122201300211123321220302200222130202
>4_31_1523_F3
T10023101111111301210313312012303131123223022233213
34
Quality values (_QV.qual):
>4_27_99_F3
8 8 3 5 17 16 18 2 5 23 2 14 2 21 25 14 12 7 6 5 25 8 12 9 10 7 20 9 5 25 14 5 19 15 8 3 10 16 9 9 5 4 9 9 8 2 7 9 17 13
>4_27_1062_F3
27 25 12 29 27 28 32 26 26 25 27 29 25 18 21 26 27 27 16 23 26 23 14 14 20 29 23 16 12 21 29 23 8 19 20 18 25 16 8 25 24 19 11 11 11 18 18 5 26 7
>4_27_1570_F3
18 10 5 13 8 12 12 8 9 21 14 15 24 22 15 16 25 7 7 8 13 20 5 24 22 11 16 5 16 13 20 25 7 7 5 14 19 10 5 11 8 25 10 3 5 4 24 17 8 3
>4_28_935_F3
12 3 5 17 26 11 7 11 11 23 2 17 12 13 11 4 5 12 10 21 7 2 11 2 14 19 9 7 5 20 8 4 2 10 16 12 10 16 3 6 5 2 14 17 9 3 4 11 20 17
>4_28_1306_F3
14 14 8 23 13 18 31 18 20 20 3 22 17 11 21 8 22 26 20 28 16 22 21 26 25 5 24 26 19 28 4 11 10 6 19 2 22 7 12 20 5 5 6 12 22 4 28 21 11 14
>4_29_429_F3
23 29 26 26 23 26 25 14 25 16 27 26 22 26 23 24 25 24 5 22 27 25 24 9 21 27 24 25 27 26 23 18 16 4 8 16 7 4 10 19 18 18 16 5 6 13 10 16 13 8
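The .csfasta and _QV.qual files share read names, with one quality value per color (the leading primer base has none: 50 colors, 50 values in the records shown). A minimal sketch that pairs the two files and masks low-quality colors as '.', assuming a hypothetical threshold of QV 10; lines starting with '#' are skipped as comments.

```python
def parse_records(text: str, sep: str = "") -> dict[str, str]:
    """Parse '>name' headers and their data lines into {name: data}."""
    records: dict[str, str] = {}
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = sep.join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = sep.join(chunks)
    return records

def mask_low_quality(csread: str, quals: list[int], min_qv: int = 10) -> str:
    """Keep the primer base; replace colors with QV below min_qv by '.'."""
    colors = csread[1:]
    if len(colors) != len(quals):
        raise ValueError("color and quality lengths differ")
    return csread[0] + "".join(c if q >= min_qv else "." for c, q in zip(colors, quals))

csfasta = ">r1\nT0123\n"          # hypothetical short read
qual = ">r1\n20 5 30 12\n"
reads = parse_records(csfasta)
quals = {k: [int(v) for v in s.split()] for k, s in parse_records(qual, sep=" ").items()}
for name, cs in reads.items():
    print(name, mask_low_quality(cs, quals[name]))   # r1 T0.23
```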
35
Comparing sequencers (figure: bases per machine run, 1 Mb to 100 Gb, versus read length, 10 bp to 1,000 bp). ABI capillary sequencer; 454 pyrosequencer (20-100 Mb in 300-400 bp reads); Illumina and AB/SOLiD short-read sequencers (100 Gb in 25-50 bp reads).
36
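Comparing sequencers

| | Roche (454) | Illumina GA | SOLiD |
|---|---|---|---|
| Chemistry | Pyrosequencing | Reversible terminator | Ligation-based |
| Amplification | Emulsion PCR | Bridge amplification | Emulsion PCR |
| Paired ends / separation | Yes / 3 kb | Yes / 200 bp | Yes / 3 kb |
| Output per run | 100 Mb | 20 Gb | 100 Gb |
| Time per run | 7 h | 4 days / 8 days | 7 days / 14 days |
| Read length | 400 bp | 32-100 bp | 50 bp |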
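The "output per run" row translates directly into expected depth of coverage: total sequenced bases divided by genome size (the Lander-Waterman estimate), with the fraction of the genome touched by at least one read given by 1 - e^(-c) under a Poisson model. A rough sketch, assuming an approximate mouse genome size of 2.7 Gb (not from the slides) and ignoring unmappable reads, duplicates and quality filtering, so real usable coverage would be lower.

```python
import math

GENOME_BP = 2.7e9  # approximate mouse genome size in bp (assumption)

def mean_coverage(bases_per_run: float, runs: int = 1, genome_bp: float = GENOME_BP) -> float:
    """Lander-Waterman style estimate: total sequenced bases / genome size."""
    return bases_per_run * runs / genome_bp

def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-coverage)

for platform, output in [("Roche (454)", 100e6), ("Illumina GA", 20e9), ("SOLiD", 100e9)]:
    c = mean_coverage(output)
    print(f"{platform}: {c:.2f}x, ~{fraction_covered(c):.3f} of bases covered")
```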
37
Data volume
38
Read mapping. Read mapping is like doing a jigsaw puzzle: you get the pieces, and they give you the picture on the box (the reference genome). The problem is that some pieces are easier to place than others.
39
Alignment of reads: reads generated by sequencing are mapped to a reference genome. Conventional tools such as BLAST or BLAT do not work well with short sequence reads, so existing alignment algorithms have been modified to handle them.
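Short-read aligners of this generation gain speed by indexing k-mers in hash tables (built from the reads or the reference) or, in Bowtie's case, a Burrows-Wheeler index, and by tolerating only a few mismatches. The toy below illustrates the hash-table idea only and is not the algorithm of any listed tool: with at most m mismatches, at least one of m+1 non-overlapping seeds must match exactly (pigeonhole principle), so exact seed lookups find every candidate, which is then verified by Hamming distance. It also shows the jigsaw problem: a read from a repeated segment has more than one equally good placement.

```python
from collections import defaultdict

def build_index(ref: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def map_read(read: str, ref: str, index: dict[str, list[int]], k: int, max_mm: int) -> list[int]:
    """All 0-based starts where the read aligns with <= max_mm mismatches (no gaps)."""
    if k * (max_mm + 1) > len(read):
        raise ValueError("seeds would overlap; use a smaller k")
    hits = set()
    for s in range(max_mm + 1):
        offset = s * k
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if 0 <= start and start + len(read) <= len(ref):
                if hamming(read, ref[start:start + len(read)]) <= max_mm:
                    hits.add(start)
    return sorted(hits)

ref = "GATTACAGATTACATTTGGCCA"
read, max_mm = "GATTACA", 1
k = len(read) // (max_mm + 1)        # 3
idx = build_index(ref, k)
print(map_read(read, ref, idx, k, max_mm))  # [0, 7]: a repeat, two placements
```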
40
Alignment tools: ELAND, MAQ, Bowtie, SOAP, Bioscope/Corona Lite pipeline, TopHat, BFAST
42
SAM/BAM format (header fields shown: file format version; sequence name and sequence length; read group)
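The header items labeled on this slide correspond to SAM header records: `@HD` carries the format version (`VN`), `@SQ` carries each reference sequence name and length (`SN`, `LN`), and `@RG` declares read groups (`ID`). The sketch below parses those records and the 11 mandatory tab-separated alignment columns, decoding two FLAG bits (0x4 unmapped, 0x10 reverse strand). The example lines are hypothetical; for real files a maintained library such as pysam or samtools would normally be preferable.

```python
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

def parse_header(lines: list[str]) -> dict:
    header = {"version": None, "sequences": {}, "read_groups": []}
    for line in lines:
        if not line.startswith("@"):
            continue
        record_type, *fields = line.rstrip("\n").split("\t")
        tags = dict(f.split(":", 1) for f in fields if ":" in f)
        if record_type == "@HD":
            header["version"] = tags.get("VN")
        elif record_type == "@SQ":
            header["sequences"][tags["SN"]] = int(tags["LN"])
        elif record_type == "@RG":
            header["read_groups"].append(tags["ID"])
    return header

def parse_alignment(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("SAM alignment lines have 11 mandatory columns")
    rec = dict(zip(SAM_COLUMNS, cols[:11]))
    flag = int(rec["FLAG"])
    rec["unmapped"] = bool(flag & 0x4)
    rec["reverse_strand"] = bool(flag & 0x10)
    rec["POS"] = int(rec["POS"])   # 1-based leftmost mapping position
    rec["tags"] = cols[11:]        # optional TAG:TYPE:VALUE fields
    return rec

sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chrTest\tLN:1000",
    "@RG\tID:lane2",
    "read1\t16\tchrTest\t101\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tRG:Z:lane2",
]
print(parse_header(sam))
print(parse_alignment(sam[-1])["reverse_strand"])  # True
```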
43
Sequence variations: SNPs, insertions, deletions, medium-sized insertions
44
SNPs between C57BL/6J and DBA/2J: 4,553,000 (SOLiD, Illumina)
45
Evaluating SNP calls: 5% adjacent SNPs; 20% adjacent SNPs
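The slide reports the proportion of SNP calls that sit next to another SNP call. One plausible way to compute such a metric is sketched below; the definition used here (a call counts as adjacent if another call lies at position +/-1 on the same chromosome) is an assumption, and the example calls are hypothetical. An unusually high fraction may point to systematic artifacts rather than true variation.

```python
from collections import defaultdict

def adjacent_snp_fraction(calls: list[tuple[str, int]]) -> float:
    """Fraction of distinct calls with another call at pos-1 or pos+1 on the same chromosome."""
    positions: dict[str, set[int]] = defaultdict(set)
    for chrom, pos in calls:
        positions[chrom].add(pos)   # duplicate calls are counted once
    total = sum(len(p) for p in positions.values())
    if total == 0:
        return 0.0
    adjacent = sum(
        1
        for chrom_pos in positions.values()
        for p in chrom_pos
        if p - 1 in chrom_pos or p + 1 in chrom_pos
    )
    return adjacent / total

calls = [("chr1", 100), ("chr1", 101), ("chr1", 200), ("chr2", 101), ("chr2", 305)]
print(adjacent_snp_fraction(calls))  # 0.4
```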
46
Medium indels detection (size range shown: 1 base to 1 million bases)
D 321 ChrID 0 56173880 56174202 Supports: 15 70 130.916
TAAGAATGAGTTGGCAAATAAAGAGTTTGGTGAGTTTATAGAAATATAGGggccg ataggACAAGGTACAAGGAATGGCTGAAGGAGAGAGGTTG
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG + 56173670
GTGAGTTTATAGAAATATAGG ACAAGGTACAAGGAA + 56173677
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG + 56173681
TGGTGAGTTTATAGAAATATAGG ACAAGGTACAAGG + 56173687
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG + 56173690
AGTTTGGTGAGTTTATAGAAATATAGG ACAAGGTACAAGGA + 56173697
GTGAGTTTATAGAAATATAGG ACAAGGTACAAGGAA + 56173700
AGTTTATAGAAATATAGG ACAAGGTACAAGGAATGG + 56173710
TTTGGTGAGTTTATAGAAATATAGG ACAAGGTACAA - 56174339
TGAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG - 56174356
TGAGTTTATAGAAATATAGG ACAAGGTACAAGGAAT - 56174357
GTTTATAGAAATATAGG ACAAGGTACAAGGAATGGC - 56174358
GAGTTTATAGAAATATAGG ACAAGGTACAAGGAATG - 56174365
AGTTTATAGAAATATAGG ACAAGGTACAAGGAATGG - 56174373
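The supporting reads on this slide are split reads: each is divided into a left part that aligns before the deleted segment and a right part that aligns after it. If the two coordinates on the first line are read as the last retained base on each side (an assumption about this output), the reported size checks out: 56174202 - 56173880 - 1 = 321. The toy function below shows the split-read idea on a synthetic reference; it is not the algorithm of the tool that produced this output, and in repetitive sequence the exact breakpoint can be ambiguous (this returns the leftmost split it finds).

```python
def find_unique(ref: str, s: str, start: int = 0) -> int:
    """Return the only occurrence of s in ref[start:], or -1 if absent or ambiguous."""
    first = ref.find(s, start)
    if first == -1 or ref.find(s, first + 1) != -1:
        return -1
    return first

def split_read_deletion(read: str, ref: str, min_part: int = 5):
    """Return (left_end, right_start, deleted_length), 0-based, or None."""
    for cut in range(min_part, len(read) - min_part + 1):
        left, right = read[:cut], read[cut:]
        lpos = find_unique(ref, left)
        if lpos == -1:
            continue
        left_end = lpos + len(left)              # first reference base after the left part
        rpos = find_unique(ref, right, left_end)
        if rpos > left_end:                      # gap between the two parts = deletion
            return left_end, rpos, rpos - left_end
    return None

left, deleted, right = "ACGTTGCATG", "CCCCCAAAAA", "TACGATCAGT"
ref = "GG" + left + deleted + right + "AA"       # synthetic reference
print(split_read_deletion(left + right, ref))   # (12, 22, 10)
```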
47
Large indels detection (K. Chen et al., Nature Methods 6: 677-81, 2009). Read-pair classes: concordant, insertion, deletion, judged against the clone insert size.
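Paired reads come from fragments of known approximate size. When the mates map farther apart on the reference than the library allows, the sample likely lacks sequence that the reference has (a deletion); when they map closer together, the sample likely carries extra sequence (an insertion). The sketch classifies single pairs against a hypothetical library with mean 200 bp and standard deviation 20 bp using a +/-3 SD window. Approaches such as the one cited on the slide additionally require clusters of several discordant pairs before calling an event.

```python
def classify_pair(mapped_span: int, lib_mean: float, lib_sd: float, k: float = 3.0) -> str:
    """Classify a same-chromosome, correctly oriented pair by its mapped span."""
    if mapped_span > lib_mean + k * lib_sd:
        return "deletion"     # mates too far apart on the reference
    if mapped_span < lib_mean - k * lib_sd:
        return "insertion"    # mates too close together on the reference
    return "concordant"

def rough_event_size(mapped_span: int, lib_mean: float) -> float:
    """Crude size estimate: deviation of the mapped span from the library mean."""
    return abs(mapped_span - lib_mean)

for span in (205, 650, 120):   # hypothetical mapped spans in bp
    print(span, classify_pair(span, 200, 20), rough_event_size(span, 200))
```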
48
Indels between C57BL/6J and DBA/2J
49
Inversions detected by paired-end data

| Total inversions | Spanning exon(s) or gene(s) | Intronic | Intergenic |
|---|---|---|---|
| 291 | 50 | 78 | 163 |
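Inversions leave a different paired-end signature: when one mate falls inside the inverted segment, both mates map to the same strand instead of the expected forward/reverse arrangement. A sketch, assuming a standard paired-end library whose normal orientation is FR (long-insert mate-pair libraries can have a different expected orientation); an actual inversion call would normally need clusters of such pairs at both breakpoints. The example pairs are hypothetical.

```python
def pair_orientation(pos1: int, reverse1: bool, pos2: int, reverse2: bool) -> str:
    """Strand pattern of two mates on one chromosome, ordered by mapping position."""
    (_, rev_left), (_, rev_right) = sorted([(pos1, reverse1), (pos2, reverse2)])
    return ("R" if rev_left else "F") + ("R" if rev_right else "F")

def suggests_inversion(orientation: str) -> bool:
    """For an FR library, same-strand pairs (FF or RR) are the inversion signature."""
    return orientation in ("FF", "RR")

pairs = [(1000, False, 1300, True), (5000, False, 5310, False), (9000, True, 9290, True)]
for p in pairs:
    o = pair_orientation(*p)
    print(o, suggests_inversion(o))   # FR False, FF True, RR True
```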
50
Copy number variations (CNVs)

| Total CNVs | Gains | Losses |
|---|---|---|
| 21,739 | 7,182 | 14,557 |

Sources: Graubert et al., 2007, PLoS Genetics; Anderson et al., 2005, Genes & Immunity. Several members of the Klra gene family are deleted in DBA/2J.
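For sequencing data, one common way to look for CNVs is read depth: count reads in fixed windows for the sample and a reference, normalize each by its total read count, and examine the log2 ratio. A minimal sketch with hypothetical counts and hypothetical +/-0.4 thresholds (for a diploid genome, one extra copy gives log2(3/2) of about 0.58 and one lost copy gives log2(1/2) = -1). Real callers also segment neighboring windows rather than trusting single windows.

```python
import math

def log2_ratios(sample: list[int], reference: list[int], pseudocount: float = 0.5) -> list[float]:
    """Per-window log2(sample/reference) after normalizing each by its total count."""
    s_total, r_total = sum(sample), sum(reference)
    return [
        math.log2(((s + pseudocount) / s_total) / ((r + pseudocount) / r_total))
        for s, r in zip(sample, reference)
    ]

def call_windows(ratios: list[float], gain: float = 0.4, loss: float = -0.4) -> list[str]:
    return ["gain" if x >= gain else "loss" if x <= loss else "neutral" for x in ratios]

sample = [100, 100, 200, 0, 100]       # hypothetical reads per window
reference = [100, 100, 100, 100, 100]
print(call_windows(log2_ratios(sample, reference)))
# ['neutral', 'neutral', 'gain', 'loss', 'neutral']
```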
51
De novo assembly: ABySS, ALLPATHS, Euler-SR, SHRAP, SSAKE, Velvet, SOAP
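Several of the listed assemblers (Velvet, ABySS and Euler-SR, for example) are built on de Bruijn graphs: reads are cut into k-mers, each k-mer links its (k-1)-base prefix to its (k-1)-base suffix, and contigs are read off unbranched paths. The toy below assumes error-free reads from a repeat-free sequence, which real assemblers cannot assume; it only shows the graph idea.

```python
from collections import defaultdict

def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_unitig(graph: dict[str, set[str]]) -> str:
    """Follow the unbranched path from the single node with no incoming edges."""
    indegree: dict[str, int] = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indegree[s] += 1
    starts = [n for n in graph if indegree[n] == 0]
    if len(starts) != 1:
        raise ValueError("toy walker expects exactly one start node")
    node = contig = starts[0]
    seen = {node}
    while len(graph.get(node, ())) == 1:
        nxt = next(iter(graph[node]))
        if nxt in seen or indegree[nxt] != 1:
            break                      # cycle or merge point: end the unitig
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

genome = "ATGCGTACCTTGAGC"
reads = [genome[i:i + 8] for i in range(len(genome) - 8 + 1)]  # error-free overlapping reads
print(walk_unitig(de_bruijn(reads, k=5)))  # ATGCGTACCTTGAGC
```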
52
Variation viewed at a genome scale
53
RNA sequencing
54
RNA sequencing analysis pipeline: .csfasta reads → filtering (ribosomal RNA, tRNA, TTTT/AAAA, adapters) → alignment to reference sequences → merging and sorting → counting reads → novel transcripts
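The "counting reads" step can be as simple as assigning each aligned read to the annotated gene that contains its start position. A sketch assuming non-overlapping, half-open gene intervals on a single chromosome (hypothetical names and coordinates); reads outside every annotated gene are tallied separately, since those are the reads a novel-transcript search would examine.

```python
import bisect

def count_reads_per_gene(read_starts: list[int], genes: list[tuple[str, int, int]]):
    """Assign each read start to the gene interval [start, end) that contains it."""
    genes = sorted(genes, key=lambda g: g[1])
    gene_starts = [start for _, start, _ in genes]
    counts = {name: 0 for name, _, _ in genes}
    unassigned = 0
    for pos in read_starts:
        i = bisect.bisect_right(gene_starts, pos) - 1   # last gene starting at or before pos
        if i >= 0 and pos < genes[i][2]:
            counts[genes[i][0]] += 1
        else:
            unassigned += 1
    return counts, unassigned

genes = [("geneB", 300, 400), ("geneA", 100, 200)]
print(count_reads_per_gene([150, 199, 200, 350, 50, 399], genes))
# ({'geneA': 2, 'geneB': 2}, 2)
```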
55
Alignment methods for transcriptome reads that cross splice junctions: the anchor-and-extend method
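A read that crosses a splice junction does not align contiguously to the genome. The anchor-and-extend idea can be sketched as: anchor a short exact seed from the start of the read, extend it base by base while read and genome agree, then search for the remaining tail downstream within a maximum intron length. The seed length, minimum tail length and maximum intron size are hypothetical parameters, and this toy uses only the first anchor hit and ignores mismatches, splice-site motifs and ambiguous junction positions.

```python
def anchor_extend(read: str, genome: str, anchor_len: int = 8,
                  min_tail: int = 5, max_intron: int = 10_000):
    """Return aligned blocks [(start, end), ...] (0-based, half-open) or None."""
    a = genome.find(read[:anchor_len])           # first exact hit of the anchor
    if a == -1:
        return None
    n = anchor_len
    while n < len(read) and a + n < len(genome) and read[n] == genome[a + n]:
        n += 1                                   # extend while bases agree
    if n == len(read):
        return [(a, a + n)]                      # unspliced alignment
    tail = read[n:]
    if len(tail) < min_tail:
        return None                              # too short to place reliably
    search_end = a + n + max_intron + len(tail)
    b = genome.find(tail, a + n + 1, search_end)
    if b == -1:
        return None
    return [(a, a + n), (b, b + len(tail))]      # intron spans (a + n, b)

exon1, intron, exon2 = "ACGTACGGTCAT", "GTAAGTTTTTTTTTTTTAG", "CCATGGCAAT"
genome = "TTTT" + exon1 + intron + exon2 + "GGGG"   # synthetic gene
print(anchor_extend(exon1 + exon2[:8], genome))
```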
56
Alternative splicing and novel transcribed regions (NTRs). Definition: an NTR is a segment of genomic sequence that is transcribed but is not currently annotated as an exon in a database.
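Following the definition above, candidate NTRs can be found by turning per-base read depth into covered segments and keeping those that overlap no annotated exon. A minimal sketch with hypothetical depth values, exon coordinates and thresholds; segments that only partly overlap an exon are dropped here, although they might instead indicate exon extensions.

```python
def covered_segments(depth: list[int], min_depth: int = 1) -> list[tuple[int, int]]:
    """Half-open [start, end) runs where depth >= min_depth."""
    segments, start = [], None
    for i, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = i
        elif d < min_depth and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(depth)))
    return segments

def novel_transcribed_regions(depth, exons, min_depth=1, min_len=1):
    """Transcribed segments that do not overlap any annotated exon."""
    ntrs = []
    for s, e in covered_segments(depth, min_depth):
        overlaps_exon = any(s < ee and es < e for es, ee in exons)
        if not overlaps_exon and e - s >= min_len:
            ntrs.append((s, e))
    return ntrs

depth = [0, 3, 3, 3, 0, 0, 5, 5, 0, 2, 2, 2, 2, 0]
exons = [(1, 4)]
print(novel_transcribed_regions(depth, exons))             # [(6, 8), (9, 13)]
print(novel_transcribed_regions(depth, exons, min_len=3))  # [(9, 13)]
```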
57
Our RNA-seq data on the UCSC genome browser: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Williamslab&hgS_otherUserSessionName=eye_RNAseq
58
Finding the new SNP data: http://www.genenetwork.org/webqtl/snpBrowser.py
59
Finding the indel data
60
Using the new sequence data