Published by Frederick Arnold. Modified over 9 years ago.
1
NGS sequencing and Genome Assemblies from Animals and Large Plants
Zemin Ning
The Wellcome Trust Sanger Institute
2
Outline of the Talk:
- NGS sequencing technologies
- Oxford Nanopore
- Assembly algorithms and Assemblers
- Phusion2 pipeline
- Tasmanian Devil genome project
- Assemblies of Large plant genomes
- Future work
3
Next-Generation Sequencing
4
NGS Platforms & Performances
5
Oxford Nanopore: End of Short Read Sequencing?
- Read length: up to 100 Kb
- Human genome 50x in 15 minutes
- $10 per GB
6
Can we really trust Single Molecule Sequencing?
[Figure panels: PacBio, Capillary, Illumina]
7
Kmer Size and Assemblability
8
Assembly Method
Sequencing reads:
1 ACCTGATC
2 CTGATCAA
3 TGATCAAT
4 AGCGATCA
5 CGATCAAT
6 GATCAATG
7 TCAATGTG
8 CAATGTGA
1. Overlap graph
2. de Bruijn graph
3. String graph
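The first two graph types named on this slide can be illustrated with the eight example reads. The following is a minimal sketch, not the method used by any particular assembler: the minimum overlap length (5) and the k-mer size (5) are assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# The eight example reads from the slide, keyed by read number.
READS = {
    1: "ACCTGATC",
    2: "CTGATCAA",
    3: "TGATCAAT",
    4: "AGCGATCA",
    5: "CGATCAAT",
    6: "GATCAATG",
    7: "TCAATGTG",
    8: "CAATGTGA",
}


def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest proper suffix of a that is a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def overlap_graph(reads, min_len=5):
    """Directed edges (i, j) -> overlap length, for every ordered read pair."""
    edges = {}
    for i, a in reads.items():
        for j, b in reads.items():
            if i == j:
                continue
            overlap = suffix_prefix_overlap(a, b, min_len)
            if overlap:
                edges[(i, j)] = overlap
    return edges


def de_bruijn_graph(reads, k=5):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for seq in reads.values():
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    for (i, j), overlap in sorted(overlap_graph(READS).items()):
        print(f"read {i} -> read {j}: overlap {overlap}")
    for node, successors in sorted(de_bruijn_graph(READS).items()):
        print(node, "->", sorted(successors))
```

In the de Bruijn graph, the nodes TGAT (from read 1) and CGAT (from reads 4 and 5) both lead into GATC, which is where the two read groups converge onto a shared sequence.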
9
Various Assembly Pipelines
10
Phusion2 Assembly Pipeline
[Flow diagram; boxes: Illumina Reads (2x75 or 2x100bp), Data Process, Base Correction, Reads Group, Assembly, Contigs, Consensus Generation]
11
Phusion2 Assembly Pipeline
[Flow diagram; boxes: Illumina Reads (2x75 or 2x100bp), Assembly, Contigs, Mate Pair Reads, BAC Ends, Supercontig, Flow-sorting Reads, Map Markers, AGPcontig]
12
Spinner – a scaffolding tool
Spinner uses mate pair data to scaffold contigs. Contigs, and pairs of contigs connected by mate pairs, define a bi-directional graph. Using the expected insert size, an estimate of the gap size can be given for each pair of linked contigs.
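The gap estimate described here can be sketched as follows. This is an illustration under stated assumptions, not Spinner's actual implementation: each mate pair linking two contigs is assumed to be summarised by the distance from the first read to the end of the left contig and from the start of the right contig to the second read, and the median over all linking pairs is an assumed choice of estimator. All names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def build_scaffold_graph(pair_links):
    """Group mate-pair links by the (left_contig, right_contig) edge they support.

    pair_links: iterable of (left_contig, right_contig, left_dist, right_dist),
    where left_dist is the number of bases from the start of the first read to
    the end of the left contig, and right_dist is the number of bases from the
    start of the right contig to the far end of the second read.
    """
    graph = defaultdict(list)
    for left, right, left_dist, right_dist in pair_links:
        graph[(left, right)].append((left_dist, right_dist))
    return graph


def estimate_gap(insert_size, links):
    """Median of (insert size - bases covered on both contigs) over all links."""
    return median(insert_size - left - right for left, right in links)


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    links = [
        ("contig1", "contig2", 500, 700),
        ("contig1", "contig2", 900, 320),
        ("contig1", "contig2", 100, 1090),
    ]
    graph = build_scaffold_graph(links)
    for edge, support in graph.items():
        print(edge, "pairs:", len(support), "gap:", estimate_gap(2000, support))
```

A negative estimate would suggest that the two contigs overlap rather than being separated by a gap.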
13
Spinner – still to do
These techniques alone produce useful results. Further stages will be used to resolve repeats, using pairs that “jump over” repeats and graph flow concepts.
14
[Images: Tasmanian devil; Tasmanian tiger. Labels: Tasmanian, Australian]
16
[Figure labels: Tasmanian devil, Opossum, Wallaby, Tasmanian devil]
17
Tasmanian devil facial tumour disease (DFTD)
- Transmissible cancer characterised by the growth of large tumours on the face, neck and mouth of Tasmanian devils
- Transmitted by biting
- Commonly metastasises
- First observed in 1996
- Primarily affects adults >1yr
- Death in 4 – 6 months
18
DFTD samples for sequencing
[Map; sampling sites: Reedy Marsh 2007, Mangalore 2007, Mt William 2007 or 2008, Coles Bay, Upper Natone 2007, Narawntapu 2007, Forestier 2007]
Legend: Strain 1, tetraploid; Strain 2; Strain 3; Unknown strain; “Evolved”
Annotations: DFTD originated here c.1996; Area still DFTD free
19
Devil Genomes Sequenced
[Map; sites: Reedy Marsh 2007, Mangalore 2007, Mt William, Coles Bay, Upper Natone 2007, Narawntapu 2007, Forestier 2007]
Tumour 1 (87T)
Tumour 2 (53T)
Salem: a female Tasmanian devil that lived at Taronga Zoo in Sydney.
21
Sequencing T. Devil on Illumina: Strategy
Tumour or normal genomic DNA
Fragments of defined size: 0.5, 2, 5, 7, 8, 10 kb
Sequencing: 2x100bp reads (short insert), 2x50bp mate pairs
Sequencing performed at Illumina
Sample            Read Coverage
Salem (91H)       85x
Joey (31H)        40x
Cancer 1 (87T)    56x
Cancer 2 (53T)    84x
22
Devil – Opossum Homology Map
Based on Hybridisation Results of Devil Paints onto Opossum Chromosomes
[Figure: opossum chromosomes 1 to 8 and X, painted with devil chromosome segments 1, 2a, 2b, 3a, 3b, 4, 5, 6 and X]
Opossum chromosome images were taken from Duke et al. 2007, Chromosome Res 15:361-370
23
Flow cytometry analysis of chromosomal mixture of devil and opossum
[Flow karyotype panels. Opossum peaks: 1, 2, 3, 4, 5+8, 7, 6, X. Tasmanian devil peaks: 1, 2, 3, 4, 5, 6, X]
Genome size
Chr      Opossum Seq   Opossum FC   Devil
1        748           611          571
2        541           484          610
3        526           483          556
4        430           423          450
5        309           321          341
6        245           296          277
7        263           264
8        308           321
X        61            116          121
Total    3431          3319         2926
24
Table 1: Run ID, Template names, Number of reads and Chromosome size
Run ID    Chr     Template name    Number of reads   Chromosome size
4972_1    chr1    IL20_4972:1      19.8              571
4967_1    chr2    IL21_4967:1      20.0              610
4971_1    chr3    IL30_4971:1      21.7              556
4964_1    chr4    IL14_4964:1      7.26              450
4969_1    chr5    IL17_4969:1      7.06              341
4969_2    chr6    IL17_4969:2      8.59              277
4969_3    chrx    IL17_4969:3      9.43              122
Read mapping coefficient: e = Size_of_Chr/Num_reads_in_lane
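As a worked example, the read mapping coefficient can be computed for each lane of Table 1. This sketch only applies the formula as stated on the slide; the units are not given there, so treating the read counts as millions of reads and the chromosome sizes as Mb is an assumption. The function name is hypothetical.

```python
# (run_id, chromosome, number_of_reads, chromosome_size) copied from Table 1.
# Units are assumed (millions of reads, Mb); the slide does not state them.
LANES = [
    ("4972_1", "chr1", 19.8, 571),
    ("4967_1", "chr2", 20.0, 610),
    ("4971_1", "chr3", 21.7, 556),
    ("4964_1", "chr4", 7.26, 450),
    ("4969_1", "chr5", 7.06, 341),
    ("4969_2", "chr6", 8.59, 277),
    ("4969_3", "chrx", 9.43, 122),
]


def read_mapping_coefficient(chromosome_size, num_reads_in_lane):
    """e = Size_of_Chr / Num_reads_in_lane, as defined on the slide."""
    return chromosome_size / num_reads_in_lane


if __name__ == "__main__":
    for run_id, chrom, num_reads, size in LANES:
        e = read_mapping_coefficient(size, num_reads)
        print(f"{run_id} {chrom}: e = {e:.2f}")
```

The coefficient differs considerably between lanes (for example, chr4 and chr5 have far fewer reads per unit of chromosome size than chr1), which is presumably why a per-lane normalisation is defined.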
25
Perfect - All reads mapped to the contig came from the same library
26
Acceptable - The majority of the reads were from the same library, but there were some reads from other libraries
27
Bad – mis-assembly error
The majority of the reads in one region were from one library, but there is a transition point after which the reads come from a different library, i.e. a switch to another chromosome.
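The three categories above (perfect, acceptable, bad) can be expressed as a simple rule over the library labels of the reads mapped along a contig. This is a minimal sketch and not the procedure used in the project: the fixed window size and the majority rule per window are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def classify_contig(read_libraries, window=4):
    """Classify a contig from the flow-sorted library of each mapped read.

    read_libraries: library labels (e.g. "chr1"), ordered by mapping position.
    Returns "perfect" if every read comes from one library, "bad" if the
    majority library changes between windows along the contig (a likely
    chromosome switch), and "acceptable" otherwise.
    """
    if not read_libraries:
        raise ValueError("contig has no mapped reads")
    if len(set(read_libraries)) == 1:
        return "perfect"

    majorities = []
    for start in range(0, len(read_libraries), window):
        chunk = read_libraries[start:start + window]
        # Ignore a short trailing chunk so a single stray read cannot decide.
        if len(chunk) < window and majorities:
            break
        majorities.append(Counter(chunk).most_common(1)[0][0])

    return "bad" if len(set(majorities)) > 1 else "acceptable"


if __name__ == "__main__":
    print(classify_contig(["chr1"] * 10))
    print(classify_contig(["chr1"] * 5 + ["chr3"] + ["chr1"] * 6))
    print(classify_contig(["chr1"] * 8 + ["chr4"] * 8))
```

A contig flagged as bad would be a candidate for breaking at the transition point before scaffolds are assigned to chromosomes.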
28
Unassigned contigs were placed by supercontigs using mate pairs
29
Scaffolds Assigned to Chromosomes using Flow-sorting Data
Chr_ID        Chr_size   Scaffolds_assigned   Bases_assigned Mb
Chr1          571        6729                 684
Chr2          610        8381                 740
Chr3          556        7197                 641
Chr4          450        4817                 487
Chr5          341        3188                 300
Chr6          277        2844                 263
Chrx          122        2378                 86.6
Unassigned               440                  1.23
30
Genome Assembly Normal – T. Devil
Solexa reads:
Number of read pairs:         1130 Million
Finished genome size:         3.1 Gb
Read length:                  2x100bp
Estimated read coverage:      ~80X
Insert size:                  410/50-600 bp
Mate pair data:               2k, 4k, 5k, 6k, 8k, 10k
Number of reads clustered:    1010 Million
Assembly features - stats:
                              Contigs     Supercontigs
Total number of contigs:      178,711     26,954
Total bases of contigs:       2.95 Gb     3.08 Gb
N50 contig size:              28,921      2,244,460
Largest contig:               214,456     6,014,864
Averaged contig size:         16,511      114,451
Contig coverage on genome:    ~94%        >99%
Ratio of placed PE reads:     ~92%        ?
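The N50 values reported in these statistics follow the usual definition: the length of the contig (or supercontig) at which the cumulative length, counted from the largest downwards, first reaches half of the total assembled bases. A minimal sketch of that calculation, using made-up contig lengths rather than data from the talk:

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Illustrative lengths only; not taken from the devil assembly.
    example = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print("N50 =", n50(example))
```

Because N50 is weighted by length, it is much less sensitive to large numbers of very short contigs than the averaged contig size listed alongside it.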
31
Devil Tumour Genome Assemblies
Solexa reads:
                              Tumour_53T     Tumour_87T
Number of read pairs:         760 Million    669 M
Finished genome size:         3.1 Gb         3.1 Gb
Read length:                  2x100          2x100
Estimated read coverage:      ~75X           ~56X
Insert size:                  300bp          300bp
Number of reads clustered:    710 Million    603 M
Assembly features - stats:
                              Tumour_53T     Tumour_87T
Total number of contigs:      335,215        335,531
Total bases of contigs:       3.05 Gb        2.98 Gb
N50 contig size:              21,582         19,346
Largest contig:               175,353        139,414
Averaged contig size:         9,096          8,892
Contig coverage on genome:    ~95%           ~95%
Ratio of placed PE reads:     ~92%           ~92%
32
Variant calling: catalogue of variants in all 4 genomes
                Salem (91H)   Joey (31H)          Cancer 1 (87T)   Cancer 2 (53T)
Coverage        35.58         28.80               40.49            33.14
Total SNPs      615,084       646,186             758,023          738,793
Het SNPs        524,040       371,412             465,630          462,722
Hom SNPs        91,044        274,774             292,393          276,071
Total indels    235,632       262,461             320,820          312,287
Het indels      183,978       146,299             186,094          183,747
Hom indels      51,654        81,120 / 116,162    134,726          128,540
*Data source: Illumina. Variants removed within 500bp of a contig end, Q(indel) < 30 and Q(GT) < 5.
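The filtering rule in the footnote can be sketched as a small predicate. The footnote is terse, so the reading used here is an assumption: a variant is removed if it lies within 500 bp of either contig end, or if it is an indel with quality below 30, or if its genotype quality is below 5. The `Variant` record and its field names are hypothetical and do not correspond to any particular variant-call format.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    contig_length: int       # length of the contig carrying the variant
    position: int            # 1-based position on the contig
    kind: str                # "snp" or "indel"
    quality: float           # call quality; Q(indel) for indels
    genotype_quality: float  # Q(GT)


def passes_filters(v, end_margin=500, min_indel_q=30, min_gt_q=5):
    """Apply the three footnote filters under the assumed interpretation."""
    near_end = (v.position <= end_margin
                or v.position > v.contig_length - end_margin)
    if near_end:
        return False
    if v.kind == "indel" and v.quality < min_indel_q:
        return False
    if v.genotype_quality < min_gt_q:
        return False
    return True


if __name__ == "__main__":
    # Made-up variants, for illustration only.
    calls = [
        Variant(20000, 10000, "snp", 50, 40),
        Variant(20000, 300, "snp", 50, 40),
        Variant(20000, 10000, "indel", 20, 40),
        Variant(20000, 10000, "snp", 50, 3),
    ]
    kept = [v for v in calls if passes_filters(v)]
    print(f"kept {len(kept)} of {len(calls)} variants")
```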
33
Homozygous SNPs
35
Homozygous Base Corrections
46039 Candidates
40689 Base changed
36
Homozygous Indel Corrections
51654 Candidates
45337 Del changed
37
DFTD1
[Figure: tumour chromosome diagram with labelled chromosomes 1 to 6 and X, derivative chromosomes (der1, der2, der5, der6), markers (M1, M2?, M3, M4) and segment labels A to K]
38
DFTD2
[Figure: tumour chromosome diagram with labelled chromosomes 1 to 6 and X (Xp, Xq), derivative chromosomes (der1, der5, der6), markers (M1, M2, M3) and segment labels B to L]
39
[Images: Bamboo, Grass carp, Miscanthus, Wild rice]
N_scaffolds:      358,998    61,232
N_bases:          2.08 Gb    0.88 Gb
N50 contigs:      11,882     40,353
N50 scaffolds:    321,729    2.37 Mb
40
Acknowledgements: Elizabeth Murchison, Joe Henson, German Tischler, Fengtang Yang, Mike Stratton, Han Bin, Feng Qi, Zhao Qiang, Ole Schulz-Trieglaff, David Bentley
41
BGI - FINISHED SPECIES
Legend: fish, bird, mammal
SPECIES #   SPECIES                        COMMON NAME           SEQUENCING DEPTH          DETAIL
18          Cynoglossus semilaevis         Tongue sole           female:145X male:141X     contigN50=37K, scaffoldN50=734K; contigN50=24.5K, scaffoldN50=577K
19          Paralichthys olivaceus         Bastard halibut       119X                      contigN50=20K, scaffoldN50=1.2M
55          Anas platyrhynchos domestica   Peking duck           80X                       contigN50=26K, scaffoldN50=1.2M
74          Ailuropoda melanoleuca         Giant panda           56X                       contigN50=39.9K, scaffoldN50=1.3M
75          Ursus maritimus                Polar bear            102X                      contigN50=32.4K, scaffoldN50=15.9M
78          Bos grunniens                  Domestic yak          119X                      contigN50=20.4K, scaffoldN50=1.5M
79          Pantholops hodgsonii           Chiru                 88X                       contigN50=18K, scaffoldN50=2.76M
80          Capra aegagrus hircus          Goat                  93X                       contigN50=18.7K, scaffoldN50=3.06M
81          Ovis aries                     Sheep                 80X                       contigN50=17.4K, scaffoldN50=5.67M
83          Camelus dromedarius            Arabian camel         78X                       contigN50=54K, scaffoldN50=4.12M
97          Macaca fascicularis            Crab-eating macaque   54X                       contigN50=12.7K, scaffoldN50=652K
42
Preliminary assembled species
Legend: mammal, reptile, fish, bird
SPECIES #   SPECIES                         COMMON NAME                SEQUENCING DEPTH   DETAIL
11          Hypophthalmichthys molitrix     Silver carp                152X               contigN50=19.9K, scaffoldN50=972.8K
17          Pseudosciaena crocea            Large yellow croaker       61X                contigN50=922bp, scaffoldN50=15K
21          Epinephelus coioides            Grouper                    34X                contigN50=20K, scaffoldN50=700K
24          Monopterus albus                Finless eel                55X                contigN50=1.3K, scaffoldN50=21K
39          Alligator sinensis              Chinese alligator          53X                contigN50=5.6K, scaffoldN50=24.7K
48          Trionyx (Pelodiscus) sinensis   Chinese softshell turtle   30X                contigN50=1.1K, scaffoldN50=10K
56          Anser anser domesticus          Domestic goose             47X                contigN50=6.6K, scaffoldN50=23.2K
58          Nipponia nippon                 Crested ibis               106X               contigN50=22K, scaffoldN50=5M
60          Falco peregrinus                Peregrine falcon           130X               contigN50=28.6K, scaffoldN50=4.47M
61          Falco cherrug                   Saker falcon               41X                contigN50=9.2K, scaffoldN50=42.7K
66          Pygoscelis adeliae              Adelie penguin             90X                contigN50=19K, scaffoldN50=5M
67          Aptenodytes forsteri            Emperor penguin            67X                contigN50=30K, scaffoldN50=5M
70          Panthera tigris altaica         Amur tiger                 39X                contigN50=4.1K, scaffoldN50=27.7K
71          Acinonyx jubatus                Cheetah                    61X                contigN50=30K, scaffoldN50=3M
72          Panthera leo                    Lion                       70X                contigN50=11.6K, scaffoldN50=1.32M
82          Camelus bactrianus              Bactrian camel             62X                contigN50=8.4K, scaffoldN50=61.5K
43
Sequencing of species
Legend: mammal, reptile, fish, bird
SPECIES #   SPECIES                  COMMON NAME            DETAIL
4           Polypterus senegalus     Bichir                 sequencing
9           Aristichthys nobilis     Bighead carp           sequencing
13          Hippocampus comes        Tiger tail seahorse    sequencing
15          Scleropages formosus     Golden arowana         sequencing
25          Mola mola                Sunfish                sequencing
50          Chelonia mydas           Green turtle           sequencing
53          Calypte anna             Anna's hummingbird     sample arrived
68          Struthio camelus         Ostrich                sequencing
84          Elaphurus davidianus     Pere David's deer      sequencing
94          Tachyglossus aculeatus   Short-beaked echidna   sequencing
47
Dipus Genome Project