De novo genome assembly of Moniliophthora roreri


1 De novo genome assembly of Moniliophthora roreri
Group 4. Chen, Demeke, Habte, Namrata, Rajdeep, Xu

2 Introduction
M. roreri is a fungal pathogen that causes frosty pod rot in cacao (Theobroma cacao), mainly in Central and South America.
Genomic information is important for improving our understanding of the pathogen's biology.
Genome assembly is both more important and more challenging when there is no reference sequence.

3 Assembly Pipeline
Quality control (FastQC)
Adapter removal (Trimmomatic)
Contaminant cleaning (Bowtie)
Contig assembly (Minia)
Scaffolding (SSPACE)
Gap filling (GapFiller)
Gene prediction (QUAST)

4 Introduction to Minia
An ultra-low-memory DNA sequence assembler.
A human genome can be assembled using 4 GB of memory.
Produces results of similar contiguity and accuracy to other de Bruijn graph assemblers such as Velvet.
Takes a set of short genomic reads as input (typically from an Illumina sequencer).
Version used: Minia maxk128
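As a rough illustration of what a de Bruijn graph assembler works with, the sketch below builds a toy graph from two made-up reads: every k-mer becomes an edge between its (k-1)-mer prefix and suffix. This is not Minia's implementation (Minia relies on a much more compact graph representation, and real assemblers also handle reverse complements and sequencing errors); the function names and reads are hypothetical.

```python
from collections import defaultdict

def kmers(read, k):
    """Yield every k-length substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph

reads = ["ACGTAC", "CGTACG"]  # toy reads, not real data
graph = de_bruijn_edges(reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Contigs correspond to unbranched paths through such a graph, which is why the choice of k matters so much for contiguity.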

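5 Recommended k-mer based contig assembly
K-mer estimation (KmerGenie): recommended k, minimum coverage, predicted assembly size. Minia assembly: N50, longest contig, # contigs, total length. Sizes and lengths in bp.

| Library | Recommended k | Minimum coverage | Predicted assembly size | N50 | Longest | # Contigs | Total length |
|---|---|---|---|---|---|---|---|
| PE reads | 77 | 4 | 55,981,704 | 5,879 | 54,716 | 14,236 | 47,131,570 |
| MP reads | 93 | 7 | 56,496,136 | 1,384 | 21,495 | 35,033 | 43,387,625 |
| Unpaired reads | 71 | - | 56,004,620 | 7,431 | 54,409 | 11,989 | 45,327,642 |
| PE and MP reads | 87 | 16 | 57,157,776 | 6,488 | 45,213 | 13,355 | 48,323,321 |
| All | 93 | 11 | 56,949,824 | 4,017 | 35,607 | 18,455 | 48,488,928 |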

6 Optimizing the k-mer selection for final assembly
All libraries (PE, MP and unpaired); minimum coverage set to 4. Minia assembly results:

| K-mer size | N50 (bp) | Longest (bp) | # Contigs | Total length (bp) |
|---|---|---|---|---|
| 51 | 16,767 | 187,673 | 10,050 | 47,937,871 |
| 61 | 18,316 | 255,017 | 9,593 | 50,039,743 |
| 71 | 19,720 | 189,801 | 9,008 | 51,488,056 |
| 81 | 20,068 | 155,025 | 8,476 | 52,624,232 |
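The k-mer sweep on this slide can be compared programmatically. The sketch below copies the four rows from the slide and picks the k with the highest N50 and the k with the longest contig; the tuple layout and variable names are illustrative choices, not part of the original pipeline.

```python
# (k, N50 bp, longest contig bp, number of contigs, total length bp),
# copied from the slide's k-mer sweep with minimum coverage 4.
results = [
    (51, 16767, 187673, 10050, 47937871),
    (61, 18316, 255017, 9593, 50039743),
    (71, 19720, 189801, 9008, 51488056),
    (81, 20068, 155025, 8476, 52624232),
]

best_by_n50 = max(results, key=lambda row: row[1])
best_by_longest = max(results, key=lambda row: row[2])

print("k with highest N50:", best_by_n50[0])
print("k with longest contig:", best_by_longest[0])
```

The two criteria disagree here (k=81 for N50, k=61 for the longest contig), so the choice of k depends on which metric is prioritized; the later slides continue with k=81.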

7 Effect of k-mer, data type on the assembly

| Data used | k-mer | Abundance threshold | N50 (kb) | Longest contig (kb) | # Contigs | Assembly length (Mb) |
|---|---|---|---|---|---|---|
| All | 81 | 9 | 19.7 | 155.0 | 8595 | 52.4 |
| All | 81 | 12 | 18.8 | 147.9 | 8571 | 51.4 |
| All | 81 | 19 | 17.3 | 114.2 | 8458 | 48.2 |
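The abundance threshold controls which k-mers are trusted: k-mers seen fewer times than the threshold are discarded before the graph is built. A minimal sketch of that filtering step, using toy reads and a hypothetical function name (this is not Minia's code), is:

```python
from collections import Counter

def solid_kmers(reads, k, min_abundance):
    """Return the k-mers that occur at least min_abundance times across all reads."""
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_abundance}

reads = ["ACGTA", "ACGTA", "ACGTT"]  # toy reads, not real data
print(sorted(solid_kmers(reads, k=3, min_abundance=2)))
print(sorted(solid_kmers(reads, k=3, min_abundance=3)))
```

Raising the threshold removes more likely-erroneous k-mers but also drops genuine low-coverage sequence, which fits the shrinking assembly length in the table (52.4 Mb at threshold 9 down to 48.2 Mb at threshold 19).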

8 Scaffolding: SSPACE used
Standalone scaffolding program.
Extends and scaffolds pre-assembled contigs.
Uses Bowtie to map paired libraries to the pre-assembled contigs.
Uses the mapped positions and orientations of the read pairs for scaffolding.
Pairs found within the allowed distance, together with their orientations, are used for contig pairing and ordering.
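The pairing idea can be sketched as counting how many read pairs connect each pair of contigs and keeping only well-supported connections. This is a deliberately simplified illustration: it ignores the orientation and distance checks described above, and the function name, input format, and `min_links` parameter are assumptions for the example rather than SSPACE's actual interface.

```python
from collections import Counter

def contig_links(pair_hits, min_links):
    """Count read pairs linking two different contigs.

    pair_hits: iterable of (contig_of_read1, contig_of_read2) tuples.
    Returns {(contig_a, contig_b): link_count} for pairs with enough support.
    """
    links = Counter()
    for a, b in pair_hits:
        if a != b:  # pairs mapping within one contig carry no scaffolding signal
            links[tuple(sorted((a, b)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}

# Toy mapping results, not real data.
pair_hits = [("c1", "c2"), ("c2", "c1"), ("c1", "c1"), ("c2", "c3"), ("c1", "c2")]
supported = contig_links(pair_hits, min_links=2)
print(supported)
```

With these toy pairs only the c1/c2 connection has enough support to be used for joining contigs.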

9 Effect of the library and insert size on scaffolding

| Library (insert size) | # Scaffolds | N50 (kb) | Longest scaffold (Mb) | Total length, Mb (actual sequence, Mb) | Ns/100 kb (total Ns) |
|---|---|---|---|---|---|
| MP (3500) | 899 | 217.2 | 1.32 | 65.1 (52.4) | 19.5 kb (12.7 Mb) |
| PE (400) | 3417 | 50.6 | 0.58 | 52.24 (52.23) | 10.25 b (5354 b) |
| MP (2500) | 763 | 233.3 | 1.91 | 58.4 (52.4) | 10.3 kb (6.02 Mb) |
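The "Ns/100 kb" column follows directly from the total number of Ns and the total scaffold length. The short check below recomputes it from the values on the slide (rounded inputs, so the results only match approximately); the function name is an illustrative choice.

```python
def ns_per_100kb(total_ns_bp, total_length_bp):
    """Number of unknown bases (N) per 100 kb of scaffold sequence."""
    return total_ns_bp / total_length_bp * 100_000

# (total Ns in bp, total scaffold length in bp), taken from the slide.
rows = {
    "MP (3500)": (12.7e6, 65.1e6),
    "PE (400)": (5354, 52.24e6),
    "MP (2500)": (6.02e6, 58.4e6),
}

for name, (total_ns, total_length) in rows.items():
    print(name, round(ns_per_100kb(total_ns, total_length), 2), "Ns per 100 kb")
```

The mate-pair libraries give far fewer and longer scaffolds, but at the cost of thousands of Ns per 100 kb, which is what motivates the gap-filling step.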

10 Introduction to GapFiller v 1.11 (Boetzer et al. 2012, Genome Biology)
Closes gaps within previously created scaffolds.
Gaps within scaffolds are defined as runs of unknown nucleotides (N's).
The unknown nucleotides are replaced with true nucleotides in an attempt to close the gap.
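Since a gap is simply a run of N's inside a scaffold, gaps can be located with a regular expression. The sketch below only finds and measures gaps in a made-up sequence; it does not attempt to fill them, and the function name is hypothetical.

```python
import re

def find_gaps(scaffold):
    """Return (start, length) for every run of N's in a scaffold sequence."""
    return [
        (match.start(), match.end() - match.start())
        for match in re.finditer(r"N+", scaffold.upper())
    ]

scaffold = "ACGTNNNNACGGTNNAC"  # toy scaffold, not real data
gaps = find_gaps(scaffold)
print("gaps (start, length):", gaps)
print("total Ns:", sum(length for _, length in gaps))
```

Counting gaps and total Ns before and after each gap-filling cycle is one way to track how many gaps were closed.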

11 Gap filling pipeline

After 1st cycle of gap filling (3 iterations, PE and MP libraries)
# scaffolds = 763
Total Ns: 737 kb (1280/100 kb)
Total length (with/without Ns): / Mb
N50: kb
Longest scaffold: 1.90 Mb
Gaps closed: 3481

After 2nd cycle of scaffolding (MP libraries)
# scaffolds = 459
Total Ns: 892 kb (1546/100 kb)
Total length: / Mb
N50: kb
Longest scaffold: 3.76 Mb

After 2nd cycle of gap filling (8 iterations, PE and MP libraries)
# scaffolds = 459
Total Ns: kb (1138/100 kb)
Total length: 57.9 / 57.2 Mb
N50: kb
Longest scaffold: Mb
Gaps closed: 215

12 Gene prediction using QUAST

13 Summary
Fairly good genome assembly pipeline.
Longest N50 and longest scaffold (3.7 Mb).
Lowest # scaffolds (< 500).
Fairly low # Ns.

14 Assembly: All data set; K=93; M=11; SSPACE
Statistics without ref. (columns: Minia, 1st Scaffold, 2nd Scaffold)
# contigs: 18455, 831, 472
# contigs (>= 1000 bp): 13394, 734, 410
# contigs (>= bp): 290 225
Largest contig: 35607
Total length:
T. length (>= bp):
T. length (>= bp):
N50: 4017, 182599, 320808
L50: 3663, 76, 47
GC (%): 46.71, 46.81, 46.8
# N's:
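For reference, N50 and L50 can be computed from a list of contig or scaffold lengths: sort the lengths from longest to shortest and accumulate them until at least half of the total assembly length is covered. N50 is the length of the sequence that crosses that point and L50 is how many sequences were needed. The sketch below uses this common definition with made-up lengths; the function name is an illustrative choice.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no contig lengths given")

lengths = [80, 70, 50, 40, 30, 20, 10]  # toy lengths, not real data
n50, l50 = n50_l50(lengths)
print("N50:", n50, "L50:", l50)
```

A higher N50 together with a lower L50, as in the progression from the Minia contigs to the second scaffolding round, indicates a more contiguous assembly.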

15 Thanks

