NGS sequencing and Genome Assemblies from Animals and Large Plants Zemin Ning The Wellcome Trust Sanger Institute.


1 NGS sequencing and Genome Assemblies from Animals and Large Plants Zemin Ning The Wellcome Trust Sanger Institute

2 Outline of the Talk:
- NGS sequencing technologies
- Oxford Nanopore
- Assembly algorithms and Assemblers
- Phusion2 pipeline
- Tasmanian Devil genome project
- Assemblies of Large plant genomes
- Future work

3 Next-Generation Sequencing

4 NGS Platforms & Performances

5 Oxford Nanopore: End of Short Read Sequencing? Read length: up to 100 kb. Human genome at 50x in 15 minutes. $10 per GB.

6 Can we really trust Single Molecule Sequencing? (Figure panels: PacBio, Capillary, Illumina)

7 Kmer Size and Assemblability

8 Assembly Method
Sequencing reads:
1 ACCTGATC
2 CTGATCAA
3 TGATCAAT
4 AGCGATCA
5 CGATCAAT
6 GATCAATG
7 TCAATGTG
8 CAATGTGA
Graph types: 1. Overlap graph 2. de Bruijn graph 3. String graph
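The first two graph types on this slide can be sketched with the eight example reads. This is a minimal illustration, not the code of any assembler named in the talk: the function names, the minimum overlap of 3 bases, and the choice of k = 4 are all assumptions made for the example.

```python
from collections import defaultdict

# The eight example reads from the slide, in slide order.
READS = [
    "ACCTGATC", "CTGATCAA", "TGATCAAT", "AGCGATCA",
    "CGATCAAT", "GATCAATG", "TCAATGTG", "CAATGTGA",
]

def overlap(a, b, min_len=3):
    """Longest suffix of `a` that is also a prefix of `b` (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def overlap_graph(reads, min_len=3):
    """Edges (i, j) -> overlap length, using 1-based read numbers as on the slide."""
    edges = {}
    for i, a in enumerate(reads, start=1):
        for j, b in enumerate(reads, start=1):
            if i != j:
                n = overlap(a, b, min_len)
                if n:
                    edges[(i, j)] = n
    return edges

def de_bruijn_graph(reads, k=4):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

edges = overlap_graph(READS)
graph = de_bruijn_graph(READS, k=4)
```

In the overlap graph each read is a node, so the graph grows with the number of reads; in the de Bruijn graph the nodes are (k-1)-mers, so identical k-mers from different reads collapse onto the same edge.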

9 Various Assembly Pipelines

10 Phusion2 Assembly Pipeline (diagram labels): Illumina Reads; Assembly; Contigs; Consensus Generation; 2x75 or 2x100bp; Base Correction; Data Process; Reads Group

11 Phusion2 Assembly Pipeline (diagram labels): Illumina Reads; Assembly; Mate Pair Reads; BAC Ends; Supercontig; Contigs; 2x75 or 2x100bp; AGPcontig; Flow-sorting Reads; Map Markers

12 Spinner – a scaffolding tool. Spinner uses mate pair data to scaffold contigs. Contigs, and pairs of contigs connected by mate pairs, define a bi-directional graph. Using the expected insert size, an estimate of the gap size can be given for each contig.
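The gap estimate described on this slide can be illustrated with a short sketch. It is not Spinner's implementation: the function name, the coordinate convention, and the use of a median over all linking pairs are assumptions made for the example.

```python
from statistics import median

def estimate_gap(len_a, links, insert_size):
    """Estimate the gap between contig A and the contig B that follows it.

    len_a       -- length of contig A in bases
    links       -- (pos_a, end_b) per mate pair: pos_a is where the first read
                   starts on A, end_b is where its mate ends on B
                   (hypothetical convention, forward orientation assumed)
    insert_size -- expected insert size of the mate pair library
    """
    gaps = [
        # The insert spans the tail of A, the gap, and the head of B.
        insert_size - (len_a - pos_a) - end_b
        for pos_a, end_b in links
    ]
    # The median damps the effect of a few chimeric or mis-mapped pairs.
    return median(gaps)

# Three pairs from a 2 kb library linking a 10 kb contig to its neighbour.
gap = estimate_gap(10_000, [(9_000, 500), (9_200, 700), (8_900, 450)], 2_000)
```

A negative result would suggest that the two contigs overlap rather than being separated by a gap.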

13 Spinner – still to do. These techniques alone produce useful results. Further stages will resolve repeats using pairs that “jump over” repeats, and graph flow concepts.

14 Tasmanian devil Tasmanian tiger Tasmanian Australian

15

16 Tasmanian devil Opossum Wallaby Tasmanian devil

17 Tasmanian devil facial tumour disease (DFTD)
- Transmissible cancer characterised by the growth of large tumours on the face, neck and mouth of Tasmanian devils
- Transmitted by biting
- Commonly metastasises
- First observed in 1996
- Primarily affects adults >1yr
- Death in 4 – 6 months

18 DFTD samples for sequencing
Map sites: Reedy Marsh 2007; Mangalore 2007; Mt William 2007 or 2008; Coles Bay; Upper Natone 2007; Narawntapu 2007; Forestier 2007
Legend: Strain 1, tetraploid; Strain 2; Strain 3; Unknown strain; “Evolved”
Annotations: DFTD originated here c.1996; Area still DFTD free

19 Devil Genomes Sequenced
Map sites: Reedy Marsh 2007; Mangalore 2007; Mt William; Coles Bay; Upper Natone 2007; Narawntapu 2007; Forestier 2007
Samples: Tumour 1 (87T); Tumour 2 (53T); Salem, a female Tasmanian devil that lived at Taronga Zoo in Sydney.

20

21 Sequencing T. Devil on Illumina: Strategy
Tumour or normal genomic DNA
Fragments of defined size: 0.5, 2, 5, 7, 8, 10 kb
Sequencing: 2x100bp reads (short insert), 2x50bp mate pairs
Sequencing performed at Illumina
Sample          Read coverage
Salem (91H)     85x
Joey (31H)      40x
Cancer 1 (87T)  56x
Cancer 2 (53T)  84x

22 Devil – Opossum Homology Map Based on Hybridisation Results of Devil Paints onto Opossum Chromosomes
Figure labels: Opossum chromosomes 1 2 3 4 5 6 7 8 X; Devil paints 1, 4, 2a, 3a, 6, 2b, 3b, 5, X
Opossum chromosome images were taken from Duke et al. 2007, Chromosome Res 15:361-370

23 Flow cytometry analysis of chromosomal mixture of devil and opossum
Flow karyotype peak labels: Tasmanian devil 1 2 3 4 5 6 X; Opossum 1 2 3 4 5+8 7 6 X
Genome size (Mb)
Chr    Opossum Seq  Opossum FC  Devil
1      748          611         571
2      541          484         610
3      526          483         556
4      430          423         450
5      309          321         341
6      245          296         277
7      263          264
8      308          321
X      61           116         121
Total  3431         3319        2926

24 Table 1: Run ID, Template names, Number of reads and Chromosome size
Run ID  Template  Lane         Number of reads  Chromosome size
4972_1  chr1      IL20_4972:1  19.8             571
4967_1  chr2      IL21_4967:1  20.0             610
4971_1  chr3      IL30_4971:1  21.7             556
4964_1  chr4      IL14_4964:1  7.26             450
4969_1  chr5      IL17_4969:1  7.06             341
4969_2  chr6      IL17_4969:2  8.59             277
4969_3  chrx      IL17_4969:3  9.43             122
Read mapping coefficient: e = Size_of_Chr/Num_reads_in_lane
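As a worked example, the read mapping coefficient can be computed directly from Table 1. The slide does not state units for the two columns, so the values are used exactly as printed; the variable and function names are chosen for this sketch only.

```python
# (run_id, template, number_of_reads, chromosome_size) as printed in Table 1.
# Units are not given on the slide, so the numbers are taken at face value.
TABLE_1 = [
    ("4972_1", "chr1", 19.8, 571),
    ("4967_1", "chr2", 20.0, 610),
    ("4971_1", "chr3", 21.7, 556),
    ("4964_1", "chr4", 7.26, 450),
    ("4969_1", "chr5", 7.06, 341),
    ("4969_2", "chr6", 8.59, 277),
    ("4969_3", "chrx", 9.43, 122),
]

def mapping_coefficient(chr_size, num_reads):
    """e = Size_of_Chr / Num_reads_in_lane, as defined on the slide."""
    return chr_size / num_reads

coefficients = {
    template: mapping_coefficient(size, reads)
    for _run, template, reads, size in TABLE_1
}

for template, e in coefficients.items():
    print(f"{template}: e = {e:.2f}")
```

A larger e means each read from that lane stands for a larger share of its chromosome, which is one way to put read counts from lanes of different depth on a comparable scale.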

25 Perfect - Reads from the same library were mapped to the contig

26 Acceptable - Majority of the reads were from the same library, but there were reads from other libraries

27 Bad – mis-assembly error. The majority of the reads in one region were from one library, but there is a transition after which reads from a new library appear, i.e. a switch to another chromosome.
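The three categories on slides 25 to 27 can be expressed as a simple rule over the library labels of the reads mapped along a contig. This is a sketch of the idea only: the window size, the 80% majority threshold, and the function name are assumptions, not values from the talk.

```python
from collections import Counter

def classify_contig(read_libs, window=50, majority=0.8):
    """Classify a contig from the flow-sorted library of each mapped read.

    read_libs -- library labels (e.g. "chr1") ordered by read position
    window    -- reads per window (assumed value)
    majority  -- fraction needed for a library to dominate a window (assumed)
    """
    if not read_libs:
        raise ValueError("no reads mapped to this contig")
    if len(set(read_libs)) == 1:
        return "perfect"          # every read comes from the same library

    dominant = []
    for start in range(0, len(read_libs), window):
        chunk = read_libs[start:start + window]
        lib, count = Counter(chunk).most_common(1)[0]
        if count / len(chunk) >= majority:
            dominant.append(lib)

    # Different dominant libraries along the contig indicate a transition,
    # i.e. a likely join between two chromosomes.
    if len(set(dominant)) > 1:
        return "bad"
    return "acceptable"           # one main library plus scattered others
```

For example, `classify_contig(["chr1"] * 100 + ["chr3"] * 100)` returns `"bad"`, while a contig with 95 chr1 reads and 5 stray chr2 reads is `"acceptable"`.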

28 Unassigned contigs were placed by supercontigs using mate pairs

29 Scaffolds Assigned to Chromosomes using Flow-sorting Data
Chr_ID      Chr_size  Scaffolds_assigned  Bases_assigned (Mb)
Chr1        571       6729                684
Chr2        610       8381                740
Chr3        556       7197                641
Chr4        450       4817                487
Chr5        341       3188                300
Chr6        277       2844                263
Chrx        122       2378                86.6
Unassigned            440                 1.23

30 Genome Assembly Normal – T. Devil
Solexa reads:
Number of read pairs: 1130 Million
Finished genome size: 3.1 Gb
Read length: 2x100bp
Estimated read coverage: ~80X
Insert size: 410/50-600 bp
Mate pair data: 2k, 4k, 5k, 6k, 8k, 10k
Number of reads clustered: 1010 Million
Assembly features (stats):
                            Contigs   Supercontigs
Total number of contigs:    178,711   26,954
Total bases of contigs:     2.95 Gb   3.08 Gb
N50 contig size:            28,921    2,244,460
Largest contig:             214,456   6,014,864
Averaged contig size:       16,511    114,451
Contig coverage on genome:  ~94%      >99%
Ratio of placed PE reads:   ~92%      ?
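The N50 figures in this table follow the standard definition: the length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch, with toy lengths rather than the devil assembly data:

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

toy_contigs = [2, 3, 4, 5, 6]   # toy values, total = 20
toy_n50 = n50(toy_contigs)
```

Here the two longest contigs (6 and 5) already hold 11 of 20 bases, so the N50 is 5. This is why scaffolding with mate pairs raises N50 so sharply between the two columns: joining contigs moves bases into far fewer, much longer sequences.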

31 Devil Tumour Genome Assemblies
Solexa reads:
                            Tumour_53T   Tumour_87T
Number of read pairs:       760 Million  669 Million
Finished genome size:       3.1 Gb       3.1 Gb
Read length:                2x100        2x100
Estimated read coverage:    ~75X         ~56X
Insert size:                300bp        300bp
Number of reads clustered:  710 Million  603 Million
Assembly features (stats):
                            Tumour_53T   Tumour_87T
Total number of contigs:    335,215      335,531
Total bases of contigs:     3.05 Gb      2.98 Gb
N50 contig size:            21,582       19,346
Largest contig:             175,353      139,414
Averaged contig size:       9,096        8,892
Contig coverage on genome:  ~95%         ~95%
Ratio of placed PE reads:   ~92%         ~92%

32 Variant calling: catalogue of variants in all 4 genomes
              Salem (91H)  Joey (31H)        Cancer 1 (87T)  Cancer 2 (53T)
Coverage      35.58        28.80             40.49           33.14
Total SNPs    615,084      646,186           758,023         738,793
Het SNPs      524,040      371,412           465,630         462,722
Hom SNPs      91,044       274,774           292,393         276,071
Total indels  235,632      262,461           320,820         312,287
Het indels    183,978      146,299           186,094         183,747
Hom indels    51,654       81,120 / 116,162  134,726         128,540
*Data source: Illumina. Variants removed within 500bp of a contig end, Q(indel) < 30 and Q(GT) < 5.
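The footnote's filter can be read as three independent removal rules, which is the interpretation assumed in the sketch below. The record layout (a dict with `pos`, `type`, `q_indel`, `q_gt`) is hypothetical and does not correspond to any specific variant caller's output format.

```python
def keep_variant(variant, contig_len, edge=500, min_q_indel=30, min_q_gt=5):
    """Return True if a variant call survives the three filters.

    variant    -- dict with keys "pos" (1-based position on the contig),
                  "type" ("snp" or "indel"), "q_gt", and for indels "q_indel"
                  (hypothetical record layout for this example)
    contig_len -- length of the contig the variant lies on
    """
    pos = variant["pos"]
    # Rule 1: drop calls within `edge` bases of either contig end.
    if pos <= edge or pos > contig_len - edge:
        return False
    # Rule 2: drop indels with low indel quality.
    if variant["type"] == "indel" and variant["q_indel"] < min_q_indel:
        return False
    # Rule 3: drop calls with low genotype quality.
    if variant["q_gt"] < min_q_gt:
        return False
    return True

calls = [
    {"pos": 1000, "type": "snp", "q_gt": 20},
    {"pos": 100, "type": "snp", "q_gt": 20},                     # near contig end
    {"pos": 2000, "type": "indel", "q_indel": 10, "q_gt": 20},   # weak indel
    {"pos": 3000, "type": "snp", "q_gt": 3},                     # weak genotype
]
kept = [c for c in calls if keep_variant(c, contig_len=5000)]
```

Only the first call is kept. Filtering near contig ends is a common precaution because read alignments there are truncated and tend to produce spurious calls.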

33 Homozygous SNPs

34

35 Homozygous Base Corrections: 46,039 candidates; 40,689 bases changed

36 Homozygous Indel Corrections: 51,654 candidates; 45,337 deletions changed

37 DFTD1 (karyotype figure; chromosome and marker labels include 1 to 6, X, der1, der2, der5, der6, M1 to M4)

38 DFTD2 (karyotype figure; chromosome and marker labels include 1 to 6, X, Xp, Xq, der1, der5, der6, M1 to M3)

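39 Species shown: Bamboo, Grass carp, Miscanthus, Wild rice
Assembly statistics (two columns; the transcript does not label which species each belongs to):
N_scaffolds:    358,998   61,232
N_bases:        2.08 Gb   0.88 Gb
N50 contigs:    11,882    40,353
N50 scaffolds:  321,729   2.37 Mb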

40 Acknowledgements:
- Elizabeth Murchison
- Joe Henson
- German Tischler
- Fengtang Yang
- Mike Stratton
- Han Bin
- Feng Qi
- Zhao Qiang
- Ole Schulz-Trieglaff
- David Bentley

41 BGI - FINISHED SPECIES (groups: fish, bird, mammal)
#   SPECIES                       COMMON NAME          DEPTH         DETAIL
18  Cynoglossus semilaevis        Tongue sole          female: 145X  contigN50=37K, scaffoldN50=734K
                                                       male: 141X    contigN50=24.5K, scaffoldN50=577K
19  Paralichthys olivaceus        Bastard halibut      119X          contigN50=20K, scaffoldN50=1.2M
55  Anas platyrhynchos domestica  Peking duck          80X           contigN50=26K, scaffoldN50=1.2M
74  Ailuropoda melanoleuca        Giant panda          56X           contigN50=39.9K, scaffoldN50=1.3M
75  Ursus maritimus               Polar bear           102X          contigN50=32.4K, scaffoldN50=15.9M
78  Bos grunniens                 Domestic yak         119X          contigN50=20.4K, scaffoldN50=1.5M
79  Pantholops hodgsonii          Chiru                88X           contigN50=18K, scaffoldN50=2.76M
80  Capra aegagrus hircus         Goat                 93X           contigN50=18.7K, scaffoldN50=3.06M
81  Ovis aries                    Sheep                80X           contigN50=17.4K, scaffoldN50=5.67M
83  Camelus dromedarius           Arabian camel        78X           contigN50=54K, scaffoldN50=4.12M
97  Macaca fascicularis           Crab-eating macaque  54X           contigN50=12.7K, scaffoldN50=652K

42 Preliminary assembled species (groups: mammal, reptile, fish, bird)
#   SPECIES                        COMMON NAME               DEPTH  DETAIL
11  Hypophthalmichthys molitrix    Silver carp               152X   contigN50=19.9K, scaffoldN50=972.8K
17  Pseudosciaena crocea           Large yellow croaker      61X    contigN50=922bp, scaffoldN50=15K
21  Epinephelus coioides           Grouper                   34X    contigN50=20K, scaffoldN50=700K
24  Monopterus albus               Finless eel               55X    contigN50=1.3K, scaffoldN50=21K
39  Alligator sinensis             Chinese alligator         53X    contigN50=5.6K, scaffoldN50=24.7K
48  Trionyx (Pelodiscus) sinensis  Chinese softshell turtle  30X    contigN50=1.1K, scaffoldN50=10K
56  Anser anser domesticus         Domestic goose            47X    contigN50=6.6K, scaffoldN50=23.2K
58  Nipponia nippon                Crested ibis              106X   contigN50=22K, scaffoldN50=5M
60  Falco peregrinus               Peregrine falcon          130X   contigN50=28.6K, scaffoldN50=4.47M
61  Falco cherrug                  Saker falcon              41X    contigN50=9.2K, scaffoldN50=42.7K
66  Pygoscelis adeliae             Adelie penguin            90X    contigN50=19K, scaffoldN50=5M
67  Aptenodytes forsteri           Emperor penguin           67X    contigN50=30K, scaffoldN50=5M
70  Panthera tigris altaica        Amur tiger                39X    contigN50=4.1K, scaffoldN50=27.7K
71  Acinonyx jubatus               Cheetah                   61X    contigN50=30K, scaffoldN50=3M
72  Panthera leo                   Lion                      70X    contigN50=11.6K, scaffoldN50=1.32M
82  Camelus bactrianus             Bactrian camel            62X    contigN50=8.4K, scaffoldN50=61.5K

43 Sequencing of species (groups: mammal, reptile, fish, bird)
#   SPECIES                 COMMON NAME           DETAIL
4   Polypterus senegalus    Bichir                sequencing
9   Aristichthys nobilis    Bighead carp          sequencing
13  Hippocampus comes       Tiger tail seahorse   sequencing
15  Scleropages formosus    Golden arowana        sequencing
25  Mola mola               Sunfish               sequencing
50  Chelonia mydas          Green turtle          sequencing
53  Calypte anna            Anna's hummingbird    sample arrived
68  Struthio camelus        Ostrich               sequencing
84  Elaphurus davidianus    Pere David's deer     sequencing
94  Tachyglossus aculeatus  Short-beaked echidna  sequencing

44

45

46

47 Dipus Genome Project

