AGRY-600 Genomics: Genome Assembly
Professors: Dr. Gribskov and Dr. Weil
Group 3: Brett Lane, Amanpreet Kaur, Stefanie Griebel, Yulu Chen, Rupesh Gaire, Akanksha Singh
Genome Assembly Pipeline
Data Cleaning → Cleaned Data as Input Files → KmerGenie (k-mer size) → SPAdes (assembly) → SSPACE → GapFiller → REAPR, QUAST → Gene finding
Results: Cleaning Steps for MP Reads
KmerGenie
KmerGenie was used to choose the k-mer size for the assembly. All reads (paired-end and mate-pair) were used to predict the k-mer size.
Best k-mer size predicted: 87
Predicted assembly size: 56.6 Mb
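KmerGenie builds k-mer abundance histograms for many values of k and picks the k that maximizes the predicted number of distinct genomic k-mers. As a rough illustration of that idea only, the sketch below scores each candidate k by the number of distinct canonical k-mers seen at least twice (a crude stand-in for "genomic", since most k-mers created by sequencing errors occur once). The function names and the fixed `min_abundance=2` cutoff are hypothetical; KmerGenie itself fits a statistical model to each histogram instead of using a fixed cutoff.

```python
from collections import Counter

# Complement table used to build canonical k-mers (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def solid_kmer_count(reads, k, min_abundance=2):
    """Count distinct canonical k-mers occurring at least `min_abundance` times.

    Hypothetical proxy for genomic k-mers: k-mers seen once are treated
    as likely sequencing errors and ignored.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return sum(1 for c in counts.values() if c >= min_abundance)


def best_k(reads, candidate_ks, min_abundance=2):
    """Return the candidate k with the most solid k-mers (ties go to the first k)."""
    return max(candidate_ks, key=lambda k: solid_kmer_count(reads, k, min_abundance))


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "ACGTAC"]  # hypothetical toy reads
    print(best_k(toy_reads, [5, 6]))
```

On real data this would be far too slow in pure Python; it only shows why the choice of k is a trade-off between repeat resolution (larger k) and coverage of each k-mer (smaller k).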
SPAdes Genome Assembler - Why?
SPAdes is suitable for:
- Illumina reads
- Bacterial and fungal data
- Small genomes (not large genomes)
- Paired-end, mate-pair, and unpaired reads
Assembly Statistics
Program: SPAdes, run with multiple k-mer values (51, 61, 71, 81, 83, 87)
Data: PE, MP, and both unpaired read sets
N50: 70,639 bp (contigs); 74,736 bp (scaffolds)
Longest: 608,380 bp
Contigs/Scaffolds: 3,293 contigs; 3,187 scaffolds
Total length: 55.9 Mb
#N's: 2,873 (5.14 N's per 100 kbp)
Comments: SPAdes builds de Bruijn graph assemblies at multiple k-mer sizes, not just a single fixed one, and merges the different k-mer assemblies.
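The N50 and N-density figures in these statistics can be computed as in the minimal sketch below. The function names are illustrative; assembly-evaluation tools such as QUAST report both values directly. Plugging in the SPAdes figures (2,873 N's over 55.9 Mb) reproduces the 5.14 N's per 100 kbp.

```python
def n50(lengths):
    """Length L such that contigs/scaffolds of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def ns_per_100kbp(n_count, total_length):
    """Number of N bases per 100,000 assembled bases."""
    return n_count / total_length * 100_000


if __name__ == "__main__":
    # SPAdes figures from the statistics above (55.9 Mb is itself a rounded total).
    print(round(ns_per_100kbp(2873, 55_900_000), 2))  # 5.14
```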
Scaffolding using SSPACE v3.0
Scaffolding was done with SSPACE v3.0 using the mate-pair reads.
Program | Scaffolds | N50 | Longest contig | Total length | #N's per 100 kb
SPAdes | 8,073 (3,187 ≥ 500 bp) | 74.7 kb | 608 kb | 55.9 Mb | 5.14
SSPACE v3.0 | 5,183 (967 ≥ 500 bp) | 193 kb | 1.56 Mb | 61.6 Mb |
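Mate-pair scaffolders such as SSPACE join two contigs when enough read pairs have one mate on each contig. The sketch below shows only that link-counting step: it takes hypothetical (contig of mate 1, contig of mate 2) placements, counts links per contig pair, and keeps pairs supported by at least `min_links` pairs. The input format and the threshold of 5 are assumptions for illustration; orientation, gap-size estimation from the insert size, and resolving conflicting links, all of which a real scaffolder handles, are left out.

```python
from collections import Counter


def count_links(pair_placements):
    """Count mate pairs whose two mates map to two different contigs.

    `pair_placements` is an iterable of (contig_of_mate1, contig_of_mate2)
    tuples (hypothetical input, e.g. derived from mate-pair alignments).
    Each contig pair is stored sorted, so A-B and B-A count as the same link.
    """
    links = Counter()
    for a, b in pair_placements:
        if a != b:  # both mates on one contig: no scaffolding information
            links[tuple(sorted((a, b)))] += 1
    return links


def supported_joins(pair_placements, min_links=5):
    """Return contig pairs backed by at least `min_links` mate pairs."""
    return {pair: n for pair, n in count_links(pair_placements).items() if n >= min_links}


if __name__ == "__main__":
    # Hypothetical placements: 6 pairs link ctg1-ctg2, 2 pairs link ctg2-ctg3.
    placements = [("ctg1", "ctg2")] * 4 + [("ctg2", "ctg1")] * 2 + [("ctg2", "ctg3")] * 2
    print(supported_joins(placements))  # {('ctg1', 'ctg2'): 6}
```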
Bridging the gaps using GapFiller
The number of N's per 100 kb was very high after scaffolding. GapFiller was used to fill the gaps using the mate-pair reads, and it reduced the number of N's per 100 kb.
Stage | Scaffolds | N50 | Longest contig | Total length | #N's per 100 kb
Before gap filling | 5,183 (967 ≥ 500 bp) | 193 kb | 1.56 Mb | 61.6 Mb |
After gap filling | | | | |
GapFiller greatly reduced the gaps.
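To see what gap filling changes, one can list the runs of N's (gaps) in the scaffolds and count the N bases before and after GapFiller. The helpers below are a hypothetical sketch that works on in-memory sequence strings; reading the FASTA files is left out, and the function names are illustrative only.

```python
import re

# A gap is a maximal run of N/n characters in a scaffold sequence.
_GAP = re.compile(r"[Nn]+")


def gap_summary(sequences):
    """Return (number_of_gaps, total_N_bases, total_length) over all sequences."""
    n_gaps = n_bases = total = 0
    for seq in sequences:
        total += len(seq)
        for match in _GAP.finditer(seq):
            n_gaps += 1
            n_bases += match.end() - match.start()
    return n_gaps, n_bases, total


def ns_per_100kb(sequences):
    """N bases per 100 kb of assembly (the #N's per 100 kb metric)."""
    _, n_bases, total = gap_summary(sequences)
    return n_bases / total * 100_000 if total else 0.0


if __name__ == "__main__":
    before = ["ACGTNNNNNACGT", "ACGTACGTNNACGT"]  # hypothetical scaffolds
    after = ["ACGTTTGCAACGT", "ACGTACGTNNACGT"]   # first gap closed
    print(gap_summary(before), gap_summary(after))
```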
Mapping the PE reads to the Assembly
We used Bowtie 2 to map the paired-end reads to our final assembly:
- 59.58% aligned concordantly exactly 1 time
- 24.01% aligned concordantly >1 times
- Total: %
- Overall alignment rate: %
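Bowtie 2 prints its alignment summary to standard error. A minimal sketch for pulling the percentages quoted here out of a saved copy of that summary is below; it assumes the standard paired-end summary wording ("aligned concordantly exactly 1 time", "aligned concordantly >1 times", "overall alignment rate"), and the function name is hypothetical.

```python
import re

# Regexes for lines of Bowtie 2's paired-end alignment summary.
_PATTERNS = {
    "concordant_1": r"\(([\d.]+)%\) aligned concordantly exactly 1 time",
    "concordant_multi": r"\(([\d.]+)%\) aligned concordantly >1 times",
    "overall": r"([\d.]+)% overall alignment rate",
}


def parse_bowtie2_summary(text):
    """Return the percentages found in a Bowtie 2 summary (missing lines are skipped)."""
    stats = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = float(match.group(1))
    if "concordant_1" in stats and "concordant_multi" in stats:
        # Pairs that aligned concordantly at least once.
        stats["concordant_total"] = round(stats["concordant_1"] + stats["concordant_multi"], 2)
    return stats
```

Typical use is to redirect Bowtie 2's standard error to a log file, read that file into a string, and pass it to `parse_bowtie2_summary`.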
REAPR
Error-free bases: 85.75%
Total number of errors: 2,652
- FCD (fragment coverage distribution) errors within a contig: 615
- FCD errors over a gap: 46
- Low fragment coverage within a contig: 146
- Low fragment coverage over a gap: 1,845
85.75% of the bases are error free, which is good.
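The four REAPR error categories add up to the reported total (615 + 46 + 146 + 1,845 = 2,652), and most of them are low fragment coverage over gaps. The small tally below, using only the numbers from this slide, makes that breakdown explicit; the dictionary keys are plain labels, not REAPR output field names.

```python
# Error counts reported by REAPR for the final assembly (from the slide).
reapr_errors = {
    "FCD error within a contig": 615,
    "FCD error over a gap": 46,
    "Low fragment coverage within a contig": 146,
    "Low fragment coverage over a gap": 1845,
}

total = sum(reapr_errors.values())
print(f"Total errors: {total}")  # Total errors: 2652

# Share of each category, largest first.
for name, count in sorted(reapr_errors.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {count} ({100 * count / total:.1f}%)")

# Errors located at gaps versus inside contigs.
over_gaps = sum(n for name, n in reapr_errors.items() if "over a gap" in name)
print(f"Errors over gaps: {over_gaps} ({100 * over_gaps / total:.1f}%)")
```

Since roughly seven in ten reported errors sit over gaps, further gap closing would likely address most of them.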
Gene Prediction
Genes were predicted using QUAST.
Number of predicted genes:
- Unique: 17,455
- ≥ 0 bp: 107,219
- ≥ 300 bp: 22,348
- ≥ 1500 bp: 1,241
- ≥ 3000 bp: 77
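The length rows are cumulative: each one counts every predicted gene at least that long, which is why the counts shrink as the threshold grows. A minimal sketch of that tally, assuming a plain list of predicted gene lengths in bp (the function name is hypothetical):

```python
def genes_by_min_length(gene_lengths, thresholds=(0, 300, 1500, 3000)):
    """Map each threshold to the number of genes whose length is >= threshold."""
    return {t: sum(1 for length in gene_lengths if length >= t) for t in thresholds}


if __name__ == "__main__":
    lengths = [120, 450, 900, 1600, 3200]  # hypothetical gene lengths in bp
    print(genes_by_min_length(lengths))  # {0: 5, 300: 4, 1500: 2, 3000: 1}
```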
Conclusion
- Good assembly: N50 193 kb; longest contig 1.56 Mb
- Fewer gaps
- 85.75% of bases are error free
- The only concern is the total length of the genome (61 Mb), which is larger than the 56.6 Mb predicted by KmerGenie and the 55.9 Mb of the SPAdes assembly
Thanks