Final Results
Genome Assembly Team: Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick
Original Pipeline (figure)
Pre-processing: 454 and Illumina raw reads; NGS QC with FastQC and PRINSEQ; statistical analysis (read stats).
Reference selection: published genomes from public databases (V. vulnificus YJ016, CMCP6, MO6-24/O); Illumina reads aligned against each reference with bwa; mapping statistics compared with samstats; chosen reference genome.
De novo assembly: hybrid (Ray, MIRA); 454 (Newbler, CABOG, SUTTA); Illumina (ALLPATHS-LG, SOAPdenovo, Velvet, Taipan, SUTTA); parameter optimization; Illumina reads aligned against 454 contigs (contigs and unmapped reads as outputs); MacVector and CLC Workbench.
Reference-based assembly: Illumina (and possibly 454) reads with AMOScmp.
Evaluation: GAGE, Hawk-eye; reference evaluation with dnadiff and MUMmer.
Contig merging: all possible combinations of the best 3 assemblies, using Minimus, MAIA, PAGIT, and Mauve.
Genome finishing: scaffolds, gap filling, GRASS; nucleotide identity with MUMmer; draft/finished genome.
Read Visualization – spot the differences Pipeline / Read Processing / Assembler Results / Contig Merging / Assembler Review / Pipeline / Final Results Comparison of 454 Reads for (low coverage) and (improved coverage)
Read Visualization - more is better! Nav reads compared to Nav Illumina reads.
Read Visualization – cousins or siblings? Nav_ and Vul_ (454 and Illumina reads) coverage comparison.
Data Quality: effect of pre-processing the data (using PRINSEQ)
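PRINSEQ supports quality-based trimming and filtering of reads. As a minimal sketch of what such pre-processing does (not PRINSEQ itself, and not the settings the team used), the code below trims the low-quality 3' tail of each read and then drops reads that are too short or have a low mean quality. It assumes Phred+33 quality strings; the thresholds (Q20, minimum length 50) and all function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, Phred+33 quality string)

def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(c) - 33 for c in qual]

def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Drop bases from the 3' end while their quality is below min_q."""
    scores = phred33(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

def passes_filter(seq: str, qual: str, min_len: int = 50, min_mean_q: float = 20) -> bool:
    """Keep a read only if it is non-empty, long enough, and of high enough mean quality."""
    if not seq or len(seq) < min_len:
        return False  # also rejects reads trimmed down to length 0
    scores = phred33(qual)
    return sum(scores) / len(scores) >= min_mean_q

def preprocess(records: Iterable[Record], min_q: int = 20, min_len: int = 50,
               min_mean_q: float = 20) -> Iterator[Record]:
    """Trim each read's low-quality 3' tail, then filter by length and mean quality."""
    for name, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if passes_filter(seq, qual, min_len, min_mean_q):
            yield name, seq, qual

# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
reads = [("r1", "ACGTACGT", "IIIIII##"), ("r2", "ACGT", "####")]
print(list(preprocess(reads, min_len=4)))  # [('r1', 'ACGTAC', 'IIIIII')]
```

Trimming before filtering means a read is judged on the bases that survive, so a read with a long bad tail can still be kept if its trimmed body is good.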
V. navarensis (454; non-preprocessed | pre-processed)
FastQC metrics: Per Base Seq. Quality, Per Seq. Quality Score, Per Base Seq. Content, Per Base GC Content, Per Seq. GC Content, Per Base N Content, Seq. Length Dist., Seq. Dup. Levels, Overrepresented Seqs., Kmer Content
V. vulnificus (454; non-preprocessed | pre-processed)
FastQC metrics (2009 V_): Per Base Seq. Quality, Per Seq. Quality Score, Per Base Seq. Content, Per Base GC Content, Per Seq. GC Content, Per Base N Content, Seq. Length Dist., Seq. Dup. Levels, Overrepresented Seqs., Kmer Content
V. navarensis (Illumina; non-preprocessed | pre-processed)
FastQC metrics: Per Base Seq. Quality, Per Seq. Quality Score, Per Base Seq. Content, Per Base GC Content, Per Seq. GC Content, Per Base N Content, Seq. Length Dist., Seq. Dup. Levels, Overrepresented Seqs., Kmer Content
V. vulnificus (Illumina; non-preprocessed | pre-processed)
FastQC metrics (2009 V_): Per Base Seq. Quality, Per Seq. Quality Score, Per Base Seq. Content, Per Base GC Content, Per Seq. GC Content, Per Base N Content, Seq. Length Dist., Seq. Dup. Levels, Overrepresented Seqs., Kmer Content
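Two of the FastQC metrics listed for each dataset, Per Base Seq. Quality and Per Seq. GC Content, can be illustrated with a short sketch. FastQC draws full per-position quality distributions (box plots); the hypothetical helpers below compute only the per-position mean quality (assuming Phred+33 encoding) and the GC fraction of a single read, which are the raw values those plots summarize.

```python
def per_base_mean_quality(quals):
    """Mean Phred+33 quality at each read position; reads may differ in length."""
    totals, counts = [], []
    for qual in quals:
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - 33
            counts[pos] += 1
    return [t / n for t, n in zip(totals, counts)]

def gc_fraction(seq):
    """Fraction of G or C bases in one read (0.0 for an empty read)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

print(per_base_mean_quality(["II", "+"]))  # [25.0, 40.0]
print(gc_fraction("GGCA"))  # 0.75
```

Comparing these values before and after pre-processing is one way to see why the tail of 454 and Illumina reads tends to improve most after 3' trimming.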
Assembly: Reference-guided and de novo
Reference-guided assembly: comparison of reference-guided vs. de novo assembly
ARE – Assembly Score
Reference-guided vs. de novo assembly: ARE
Summary of reference-guided assembly: using the V. vulnificus CMCP6 reference strain, the reference-guided assembly reached 84% coverage. Overall, the de novo assemblers achieved higher assembly scores than the reference-based assembly.
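The slide does not define "coverage". Assuming it means breadth of coverage (the fraction of reference bases covered by at least one aligned read), it can be computed by merging overlapping alignment intervals. The sketch below assumes 0-based, half-open (start, end) intervals, for example extracted from a BAM file beforehand; the function name is hypothetical.

```python
def breadth_of_coverage(intervals, ref_len):
    """Fraction of reference positions covered by at least one aligned interval.

    intervals: (start, end) pairs, 0-based and half-open, assumed to lie
    within [0, ref_len).
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: bank it and start a new run.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_len

print(breadth_of_coverage([(0, 10), (5, 20), (30, 40)], 50))  # 0.6
```

Sorting first lets a single pass merge the intervals, so the cost is dominated by the sort rather than by the reference length.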
De Novo Assembly
De Novo Assembler Comparison (Optimal Parameters): ARE
Final Results – V. vulnificus. Graph comparing assemblers on three criteria: assembly score, span ratio, and 1/(break points). Higher values are preferable for all three criteria. Newbler (dn) has been removed to show the variance among the other tools.
Final Results – V. vulnificus. Graph comparing assemblers on three criteria: assembly score, span ratio, and 1/(break points). Higher values are preferable for all three criteria. The break-point axis is plotted as 1000/(break points).
Summary of de novo results: the OLC assemblers (CABOG, Newbler) showed considerable differences in ARE compared with the de Bruijn graph assemblers (SOAPdenovo, Velvet). The hybrid assembler, Ray, did not perform as well in terms of assembly score.
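To make the OLC vs. de Bruijn distinction concrete, here is a toy sketch of de Bruijn graph construction: every k-mer in every read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and in the ideal error-free case a walk using every edge spells the genome. This is only an illustration of the graph, not how Velvet, SOAPdenovo, or Ray are implemented; real assemblers also handle sequencing errors, reverse complements, repeats, and graph simplification. The function name is hypothetical.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Node = (k-1)-mer; one edge per k-mer occurrence, prefix -> suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

print(de_bruijn_graph(["ACGT", "CGTA"], 3))
# {'AC': ['CG'], 'CG': ['GT', 'GT'], 'GT': ['TA']}
```

The repeated CG -> GT edge comes from the k-mer CGT appearing in both reads; real assemblers usually collapse such repeats into a single edge with a multiplicity count. The choice of k is the main parameter of de Bruijn assemblers such as Velvet: longer k resolves more repeats but needs higher coverage to keep the graph connected.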
Merging – Vul_
Assemblers compared (rows and columns): AMOScmp, CABOG, Newbler (dn;454), Newbler (ref;454), Newbler (ref;Illumina), Ray (454), Ray (Illumina), Ray (hybrid), SOAPdn, Velvet
Newbler (dn;454): ND, 311.98, ND
Newbler (ref;Illumina): ND, ND
Ray (Illumina): ND, 64.99, ND, 33.81, ND
Ray (hybrid): ND, 49.94, ND
SOAPdn: ND, 22.92, ND
Velvet: ND, 37.68, ND
Merging – Nav_
Assemblers compared (rows and columns): AMOScmp, CABOG, Newbler (dn), Newbler (ref;454), Newbler (ref;Illumina), Ray (454), Ray (Illumina), Ray (hybrid), SOAPdenovo, Velvet
AMOScmp: ND
CABOG: ND
Newbler (dn): ND
Newbler (ref;Illumina): ND
Ray (hybrid): ND
Velvet: ND, 8.31
Assembler Review
Assembler | Status | Algorithm
ALLPATHS-LG | Paired-end only | DBG
AMOScmp | | BB
CABOG | | OLC
MIRA | | ZEBRA
Newbler | | OLC
Ray | | DBG
SOAPdenovo | | DBG
SUTTA | Unresolved errors | BB
Velvet | | DBG
BB = branch-and-bound; OLC = overlap-layout-consensus; DBG = de Bruijn graph.
MIRA performed as well as our merged contigs, but it is impractical (40 h run time).
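The "overlap" step of OLC assemblers such as CABOG and Newbler can be sketched as finding, for every ordered pair of reads, the longest suffix of one that matches a prefix of the other. The naive all-pairs version below is only an illustration under that simplification: it ignores sequencing errors and reverse complements, and it omits the layout and consensus steps entirely. Production overlappers are far more efficient. Function names and the min_len threshold are hypothetical.

```python
def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, if >= min_len; else 0."""
    start = 0
    while True:
        # Candidate start positions are where b's first min_len bases occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The earliest match that runs to the end of a gives the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def overlap_graph(reads, min_len):
    """All-pairs overlap step of OLC: maps (i, j) to the overlap length of read i onto read j."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = suffix_prefix_overlap(a, b, min_len)
                if olen > 0:
                    edges[(i, j)] = olen
    return edges

print(overlap_graph(["ACGTTGCA", "TGCAAT", "AATCC"], 3))  # {(0, 1): 4, (1, 2): 3}
```

A layout step would then chain reads 0 -> 1 -> 2 along these edges, and consensus would spell ACGTTGCAATCC. The quadratic number of read pairs is one reason OLC assemblers scale less comfortably to high-coverage short-read data than de Bruijn assemblers do.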
Final Pipeline (figure)
Pre-processing: 454 and Illumina raw reads; quality control with FastQC and PRINSEQ; statistical analysis (read stats).
De novo assembly: hybrid (Ray, MIRA); 454 (Newbler, CABOG); Illumina (Velvet); Illumina reads aligned against 454 contigs.
Contig merging: merge Ray (hybrid) with Newbler; merge CABOG with Velvet; MIRA (hybrid); merging with Minimus.
Output: draft genome.
Splinter: Pipeline 1 vs. Pipeline 2
Metrics (per pipeline): NUM, AVG, N50, Assembly Size, Assembly Score
Datasets (per pipeline): Nav_ (4 rows), Vul_2009 v, Vul_ (4 rows)
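N50 is the contig length L such that contigs of length L or longer together contain at least half of the total assembly length. The sketch below computes it together with assembly size; it also assumes, without confirmation from the slide, that NUM is the number of contigs and AVG is the mean contig length. The helper names are hypothetical.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs

def contig_stats(lengths):
    """NUM (contig count), AVG (mean length), N50, and total size for one assembly."""
    num = len(lengths)
    size = sum(lengths)
    return {"num": num, "avg": size / num if num else 0.0, "n50": n50(lengths), "size": size}

print(contig_stats([2, 3, 4, 5, 6, 7, 8, 9, 10]))
# {'num': 9, 'avg': 6.0, 'n50': 8, 'size': 54}
```

Comparing `2 * running >= total` keeps the test in integer arithmetic, so odd totals need no rounding.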
Visualization: Merged | Newbler | Ray (hybrid)
Demo