M. roreri de novo genome assembly using abyss/1.9.0-maxk96

ABySS 1.9.0 introduces Sealer, a new tool for closing scaffold gaps. It also includes Konnector, a fast and memory-efficient tool that fills the gap between paired-end reads.

GROUP 5
Hyeim Jung
Pedro Pablo Parra
Diana Vanessa Sarria Zuniga
Jacob Shoemake
HOW ABySS WORKS…

Required
  Select an ABySS build compiled for a sufficient maximum k-mer size
  K-mer size: chosen with KmerGenie
  Input library files: paired-end, unpaired (single-end), mate-pair

Contains
  Konnector: fills the gap between paired-end reads
  Sealer: closes scaffold gaps

Assembly algorithm: two major steps
  1. Construction of contigs: ambiguities are resolved and contigs merged without using the paired-end information. Contigs are extended until they either cannot be extended unambiguously or reach a blunt end due to a lack of coverage.
  2. Ambiguities are resolved and contigs merged using the paired-end information. The paired-end information identifies contigs that can be linked together: two contigs are considered linked if at least p read pairs (by default p = 5) join them.
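The linking rule in step 2 can be illustrated with a minimal Python sketch. This is not ABySS code: the input format (one tuple of contig names per read pair) and the function name `linked_contigs` are assumptions made for illustration, and the real assembler also takes read orientation and estimated distances into account.

```python
from collections import Counter


def linked_contigs(pair_hits, p=5):
    """Return contig pairs joined by at least p read pairs.

    pair_hits: iterable of (contig_of_read1, contig_of_read2) tuples,
    one per read pair (hypothetical input format for this sketch).
    """
    counts = Counter()
    for a, b in pair_hits:
        if a != b:  # a pair that maps inside one contig links nothing
            counts[tuple(sorted((a, b)))] += 1
    # keep only the contig pairs supported by at least p read pairs
    return {pair: n for pair, n in counts.items() if n >= p}


hits = (
    [("c1", "c2")] * 5      # five pairs join c1 and c2
    + [("c2", "c1")]        # a sixth pair, seen in the other order
    + [("c2", "c3")] * 4    # only four pairs: below the default threshold
    + [("c3", "c3")] * 10   # both mates in the same contig: ignored
)
print(linked_contigs(hits))  # {('c1', 'c2'): 6}
```

With the default p = 5, only c1 and c2 are considered linked; lowering p to 4 would also link c2 and c3.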
OUR ASSEMBLY STRATEGIES…

Two assembly types

Assembly    Reads used for contigs               k
3           Paired PE-MP and unpaired PE-MP      81
5           Paired PE and unpaired PE-MP         87

Assembly 5
    abyss-pe k=87 name=assembly5 lib='pe1' mp='mp1' pe1='paired PE.1.fq paired PE2.fq' se='unpaired PE-MP' mp1='paired MP.1.fq paired MP.2.fq'

Assembly 3
    abyss-pe k=81 name=assembly3 lib='pe1 pe2' mp='mp1' pe1='paired PE.1.fq paired PE2.fq' pe2='paired MP.1.fq paired MP.2.fq' se='unpaired PE-MP' mp1='paired MP.1.fq paired MP.2.fq'

Note: libraries listed under mp (here mp1) are used only for scaffolding. They do not contribute to the consensus sequence.
[Figure: read sets used at each stage. Assembly 3: contigs from paired PE, paired MP, and unpaired PE&MP reads; scaffolds from paired MP reads. Assembly 5: contigs from paired PE and unpaired PE&MP reads; scaffolds from paired MP reads.]
Evaluation of best assemblies

QUAST report without reference genome. Each scaffolds file has two values: the ABySS scaffolds and the broken scaffolds.

                                 assembly_5   assembly_5   assembly_5   assembly_3   assembly_3   assembly_3
Metric                           contigs.fa scaffolds.fa       broken   contigs.fa scaffolds.fa       broken
# contigs
  total (--min-contig 500bp)           4328         3061         3987         4816         3632         4820
  >= 0 bp                              9711         8242         9654        40245        38169        40049
  >= 1000 bp                           3544         2404         3162         3514         2629         3402
  >= 5000 bp                           1887         1182         1724         1642         1158         1573
  >= 10000 bp                          1181          809         1142         1078          773         1037
  >= 25000 bp                           604          503          600          567          467          552
  >= 50000 bp                           268          301          278          256          276          254
Largest (bp)                        553,471    1,036,496      587,564    1,035,772    1,771,018      701,868
Total length (Mb)
  total (--min-contig 500bp)          57.68        57.84        57.15        56.36        57.87        56.05
  >= 0 bp                             58.59        58.70        58.13        61.10        62.38        60.78
  >= 1000 bp                          57.12        57.37        56.56        55.45        57.17        55.06
  >= 5000 bp                          52.99        54.52        53.09        50.87        53.63        50.73
  >= 10000 bp                         47.96        51.82        48.90        46.79        50.85        46.82
  >= 25000 bp                         38.75        46.94        40.24        38.70        46.16        39.16
  >= 50000 bp                         27.02        39.60        28.82        27.77        39.24        28.51
N50 (bp)                             45,432       99,290       51,001       48,947      102,079       51,480
# N's                                46,124      568,877          945      247,454    1,600,849          806
Predicted genes
  unique                              17734        17465        17507        17570        17398        17578
  >= 0 bp                            104288       103878       103379       103123       103404       103318
  >= 300 bp                           21553        21545        21414        21274        21315        21317
  >= 1500 bp                           1189         1198         1192         1171         1182         1172
  >= 3000 bp                              6           66           66           63           63           63

                                           assembly_5   assembly_5   assembly_3   assembly_3
Mapped PE reads (Bowtie2)                  contigs.fa scaffolds.fa   contigs.fa scaffolds.fa
  aligned concordantly exactly 1 time          60.40%       60.41%       58.95%       58.97%
  aligned concordantly >1 times                22.51%       22.54%       22.55%       22.62%
  total                                        82.91%       82.95%        81.5%       81.59%

PE: 126-662, peak 301
MP: 832-6140, peak 1700
QUAST options: quast/3.2 --gene-finding --eukaryote
Bowtie2 options: bowtie2/2.2.9 --very-sensitive-local --no-unal --phred33 -p 20
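For readers unfamiliar with the N50 values in the QUAST report, the following Python sketch shows the usual definition: the length L such that sequences of length L or longer hold at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative. QUAST computes the statistic on the real contig or scaffold lengths, and it is assumed here that it does so after applying the --min-contig threshold.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # half of the assembly is now covered
            return length
    raise ValueError("no sequence lengths given")


toy = [80, 70, 50, 40, 30, 20, 10]  # total 300, so half is 150
print(n50(toy))  # 70, because 80 + 70 reaches 150
```

Because a few long sequences dominate the running sum, joining contigs into scaffolds raises N50 sharply, which is consistent with the scaffold N50 values being about twice the contig values in the report.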
CONCLUSIONS

Better assembly per metric, for the ABySS assemblies and for the broken scaffolds ("~" marks no clear difference).

Total length of assembly
  ABySS assembly: ~
  Broken: Assembly 5
  Comment: the ABySS assemblies are the same. Broken: Assembly 3 has 1.1 Mb less.

# Scaffolds
  Comment: Assembly 3 has many more scaffolds < 500 bp than Assembly 5.

Largest scaffold
  Assembly 3

N50
  ABySS assembly: Assembly 3
  Broken: ~
  Comment: ABySS: Assembly 3 has 2,789 bp more. Broken: Assembly 3 has 479 bp more.

# N's
  Comment: ABySS: Assembly 3 has 1 Mb more N's. Broken: Assembly 5 has 139 more N's.

# Unique predicted genes
  ABySS assembly: Assembly 5
  Broken: ~
  Comment: ABySS: Assembly 5 has 67 more genes. Broken: Assembly 3 has 71 more genes.

Mapped paired-end reads
  Comment: Assembly 5 has 1.36 percentage points more (82.95% vs 81.59%).
25298314 reads; of these:
  25298314 (100.00%) were paired; of these:
    4322365 (17.09%) aligned concordantly 0 times
    15280202 (60.40%) aligned concordantly exactly 1 time
    5695747 (22.51%) aligned concordantly >1 times
    ----
    4322365 pairs aligned concordantly 0 times; of these:
      2648376 (61.27%) aligned discordantly 1 time
    ----
    1673989 pairs aligned 0 times concordantly or discordantly; of these:
      3347978 mates make up the pairs; of these:
        37310 (1.11%) aligned 0 times
        725071 (21.66%) aligned exactly 1 time
        2585597 (77.23%) aligned >1 times
99.93% overall alignment rate
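The "total" mapped percentage used in the evaluation is the share of pairs that aligned concordantly at least once. A minimal Python sketch of that arithmetic, applied to the first lines of the Bowtie2 summary above, follows. The function name `concordant_rate` and the regular expressions are assumptions written for this summary layout, not part of Bowtie2.

```python
import re

# First lines of the Bowtie2 summary shown above.
summary = """25298314 reads; of these:
  25298314 (100.00%) were paired; of these:
    4322365 (17.09%) aligned concordantly 0 times
    15280202 (60.40%) aligned concordantly exactly 1 time
    5695747 (22.51%) aligned concordantly >1 times
"""


def concordant_rate(text):
    """Percent of read pairs that aligned concordantly at least once."""
    def count(pattern):
        # each summary line starts with a read-pair count and a percentage
        return int(re.search(r"(\d+) \([\d.]+%\) " + pattern, text).group(1))

    paired = count(r"were paired")
    once = count(r"aligned concordantly exactly 1 time")
    multi = count(r"aligned concordantly >1 times")
    return 100.0 * (once + multi) / paired


print(round(concordant_rate(summary), 2))  # 82.91
```

The result, 82.91%, is the sum of the 60.40% and 22.51% lines in the summary.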