Final Results Genome Assembly Team Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington,


1 Final Results Genome Assembly Team Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick

2 Original Pipeline (flowchart)
Legend: process; info; chosen reference; assemblers (Illumina, 454, hybrid).
PRE-PROCESSING: 454 raw reads and Illumina raw reads; pre-processing into 454 reads and Illumina reads; statistical analysis (read stats). Tools: FastQC, PRINSEQ, NGS QC.
REFERENCE SELECTION: published genomes from public databases (V. vulnificus YJ016, V. vulnificus CMCP6, V. vulnificus MO6-24/O); align Illumina reads against the reference; compare mapping statistics; reference genome. Tools: bwa, samstats.
DE NOVO ASSEMBLY: hybrid de novo (Ray, MIRA); 454 de novo (Newbler, CABOG, SUTTA); Illumina de novo (Allpaths LG, SOAPdenovo, Velvet, Taipan, SUTTA); contigs * 3; align Illumina reads against 454 contigs; contigs and unmapped reads. Tools: MacVector, CLC wb. Evaluation: GAGE, Hawk-eye.
REFERENCE BASED ASSEMBLY: Illumina/(454?) reference based assembly (AMOScmp); contigs and unmapped reads; draft/finished genome; reference evaluation (DNA Diff, MUMmer); parameter optimization.
CONTIG MERGING: all possible combinations of the best 3. Tools: Minimus, MAIA, PAGIT, Mauve.
GENOME FINISHING: gap filling; nucleotide identity; scaffolds; finished genome. Tools: MUMmer, GRASS, built-in, GAGE.

3 Read Visualization – spot the differences Pipeline / Read Processing / Assembler Results / Contig Merging / Assembler Review / Pipeline / Final Results Comparison of 454 Reads for 08-2462 (low coverage) and 2541-90 (improved coverage)

4 Read Visualization - more is better! Nav 08-2462 454 reads compared to Nav 08-2462 Illumina reads.

5 Read Visualization – cousins or siblings? Nav_2541-90 and Vul_06-2432 (454 and Illumina reads) coverage comparison.

6 Data Quality Effect of pre-processing data (using prinseq)

7 V. navarensis (454; non-preprocessed | pre-processed)
Samples: 2423-01, 08-2462, 2541-90, 2756-81
Metrics: Per Base Seq. Quality; Per Seq. Quality Score; Per Base Seq. Content; Per Base GC Content; Per Seq. GC Content; Per Base N Content; Seq. Length Dist.; Seq. Dup. Levels; Overrepresented Seqs.; Kmer Content

8 V. vulnificus (454; non-preprocessed | pre-processed)
Samples: 2009V_1368, 06-2432, 08-2435, 08-2439, 07-2444
Metrics: Per Base Seq. Quality; Per Seq. Quality Score; Per Base Seq. Content; Per Base GC Content; Per Seq. GC Content; Per Base N Content; Seq. Length Dist.; Seq. Dup. Levels; Overrepresented Seqs.; Kmer Content

9 V. navarensis (Illumina; non-preprocessed | pre-processed)
Samples: 2423-01, 08-2462, 2541-90, 2756-81
Metrics: Per Base Seq. Quality; Per Seq. Quality Score; Per Base Seq. Content; Per Base GC Content; Per Seq. GC Content; Per Base N Content; Seq. Length Dist.; Seq. Dup. Levels; Overrepresented Seqs.; Kmer Content

10 V. vulnificus (Illumina; non-preprocessed | pre-processed)
Samples: 2009V_1368, 06-2432, 08-2435, 08-2439, 07-2444
Metrics: Per Base Seq. Quality; Per Seq. Quality Score; Per Base Seq. Content; Per Base GC Content; Per Seq. GC Content; Per Base N Content; Seq. Length Dist.; Seq. Dup. Levels; Overrepresented Seqs.; Kmer Content

11 Assembly Reference-guided and de-Novo

12 Reference guided assembly Comparison of reference guided assembly vs de-novo assembly

13 ARE – Assembly Score

14 Reference-guided vs de-Novo assembly ARE

15 Summary of Reference-guided assembly
- Reference strain used: V. vulnificus CMCP6
- 84% coverage
- De novo assemblers overall gave higher assembly scores than reference-based assembly

16 De Novo Assembly

17 De Novo Assembly

18 De Novo Assembly

19 De Novo Assembly

20 De-Novo Assembler Comparison (Optimal Parameters) ARE

21 Final Results – V. vulnificus
Graph comparing assemblers on 3 criteria: Assembly Score, Span Ratio, 1/(Break Points). Higher scores on all criteria are preferable. Newbler (dn) has been removed to show the variance among the other tools. Chart labels: Span Ratio, CABOG.

22 Final Results – V. vulnificus
Graph comparing assemblers on 3 criteria: Assembly Score, Span Ratio, 1/(Break Points). Higher scores on all criteria are preferable. Axis label: 1000/(Break Points).

23 Summary of de-Novo results
- OLC assemblers (CABOG, Newbler) differed considerably in ARE from de Bruijn graph based assemblers (SOAPdenovo, Velvet)
- The hybrid assembler, Ray, did not perform as well in terms of assembly score

24 Merging-Vul_06-2432
Assembler | AMOScmp | CABOG | Newbler (dn;454) | Newbler (ref;454) | Newbler (ref;Illumina) | Ray (454) | Ray (Illumina) | Ray (hybrid) | SOAPdn | Velvet
AMOScmp | - | 164.00 | 234.69 | 6.35 | 4.69 | 63.51 | 55.13 | 64.51 | 44.38 | 67.22
CABOG | 164.00 | - | 225.12 | 101.30 | 62.66 | 73.23 | 93.88 | 98.11 | 75.98 | 113.08
Newbler (dn;454) | 234.69 | 221.89 | - | 5.48 | ND | 311.98 | ND | 419.76 | 104.46 | 127.01
Newbler (ref;454) | 6.35 | 99.30 | 5.48 | - | 1.44 | 67.72 | 64.99 | 72.79 | 35.07 | 72.34
Newbler (ref;Illumina) | 4.69 | 62.66 | ND | 1.44 | - | 35.28 | ND | ND | ND | ND
Ray (454) | 63.50 | 72.56 | 311.99 | 67.72 | 35.28 | - | 33.81 | 49.94 | 22.92 | 37.68
Ray (Illumina) | 55.13 | 93.88 | ND | 64.99 | ND | 33.81 | - | ND | ND | ND
Ray (hybrid) | 64.51 | 97.17 | 419.76 | 72.79 | ND | 49.94 | ND | - | ND | ND
SOAPdn | 44.38 | 75.98 | 104.46 | 35.07 | ND | 22.92 | ND | ND | - | ND
Velvet | 67.22 | 113.08 | 127.01 | 72.34 | ND | 37.68 | ND | ND | ND | -
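A minimal sketch of how a pairwise merging table like the one above could be scanned for the highest-scoring assembler pair. The dictionary holds only a small subset of the Vul_06-2432 scores, ND cells are represented as None, and the function name best_merge is hypothetical; none of this is taken from the team's actual scripts.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]

# Subset of the Vul_06-2432 merge scores from the slide; None stands for ND.
scores: Dict[Pair, Optional[float]] = {
    ("AMOScmp", "CABOG"): 164.00,
    ("AMOScmp", "Newbler (dn;454)"): 234.69,
    ("CABOG", "Newbler (dn;454)"): 225.12,
    ("CABOG", "Velvet"): 113.08,
    ("Newbler (dn;454)", "Ray (454)"): 311.98,
    ("Newbler (dn;454)", "Ray (Illumina)"): None,
    ("Newbler (dn;454)", "Ray (hybrid)"): 419.76,
    ("Newbler (dn;454)", "Velvet"): 127.01,
}


def best_merge(table: Dict[Pair, Optional[float]]) -> Tuple[Pair, float]:
    """Return (pair, score) for the highest-scoring merge, skipping ND cells."""
    scored = {pair: s for pair, s in table.items() if s is not None}
    if not scored:
        raise ValueError("no scored pairs in table")
    pair = max(scored, key=scored.get)
    return pair, scored[pair]


pair, score = best_merge(scores)
print(pair, score)
```

On this subset the top pair is Newbler (dn;454) merged with Ray (hybrid), at 419.76.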

25 Merging-Nav_2541-90
Assembler | AMOScmp | Cabog | Newblerdn | Newbler (ref;454) | Newbler (ref;Illumina) | Ray (454) | Ray (Illumina) | Ray (hybrid) | SOAPdn | Velvet
AMOScmp | - | 133.95 | ND | 0.03 | | 15.26 | 14.00 | 15.77 | 11.23 | 45.32
Cabog | 133.95 | - | ND | 107.60 | 114.60 | 82.62 | 92.44 | 92.53 | 80.73 | 123.02
Newblerdn | ND | ND | - | | | 54.21 | 59.81 | 60.47 | 33.17 | 94.89
Newbler (ref;454) | 0.03 | 107.60 | 59.94 | - | 0.11 | 11.6 | 11.78 | 11.86 | 10.17 | 39.2
Newbler (ref;Illumina) | 0.03 | 114.60 | ND | 0.28 | - | 12.66 | 12.15 | 12.41 | 9.6 | 39.60
Ray (454) | 15.26 | 82.62 | 54.21 | 11.60 | 12.66 | - | 59.19 | 76.36 | 13.65 | 63.75
Ray (Illumina) | 14.01 | 92.44 | 59.81 | 11.78 | 12.15 | 33.79 | - | 24.21 | 11.54 | 39.84
Ray (hybrid) | 15.77 | 92.53 | 60.47 | 11.86 | 12.41 | 40.33 | 36.79 | - | 14.06 | ND
SOAPdenovo | 11.22 | 80.73 | 33.17 | 10.04 | 9.54 | 13.61 | 11.40 | 13.91 | - | 8.47
Velvet | 45.32 | 123.02 | 94.89 | 39.20 | 39.84 | 64.54 | 39.84 | ND | 8.31 | -

26 Assembler Review
Assembler | Status | 454 | Illumina | Hybrid | Algorithm
Allpaths LG | Paired-end only | | | | DBG
AMOScmp | | | | | BB
CABOG | | | | | OLC
MIRA | | | | | ZEBRA
Newbler | | | | | OLC
Ray | | | | | DBG
SOAPdenovo | | | | | DBG
SUTTA | Unresolved errors | | | | BB
Velvet | | | | | DBG
Legend: BB = branch-and-bound; OLC = overlap-layout-consensus; DBG = de Bruijn graph; ZEBRA
MIRA performed as well as our merged contigs, but its 40 hr run time makes it impractical.
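For readers unfamiliar with the DBG label in the table above, here is a toy sketch of how a de Bruijn graph can be built from reads: each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name de_bruijn_edges and the example reads are illustrative assumptions; real DBG assemblers such as Velvet or SOAPdenovo add error correction, graph simplification and much more.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def de_bruijn_edges(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two made-up overlapping reads; repeated edges reflect k-mers seen twice.
graph = de_bruijn_edges(["ACGTC", "CGTCA"], k=3)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

An OLC assembler, by contrast, compares whole reads pairwise for overlaps instead of breaking them into k-mers first.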

27 Final Pipeline (flowchart)
Legend: process; info; assemblers (Illumina, 454, hybrid).
PRE-PROCESSING: 454 raw reads and Illumina raw reads; pre-processing into 454 reads and Illumina reads; statistical analysis (read stats). Tools: FastQC, PRINSEQ.
DE NOVO ASSEMBLY: hybrid de novo (Ray, MIRA); 454 de novo (Newbler, CABOG); Illumina de novo (Velvet); contigs; align Illumina reads against 454 contigs; contigs.
CONTIG MERGING: merge Ray-hyb/Newbler; merge CABOG/Velvet; MIRA-hyb; Minimus; draft genome.

28 Splinter

Pipeline 1
Sample | NUM | AVG | N50 | Assembly Size | Assembly Score
Nav_2423-01 | 106 | 42657.2 | 156064 | 4.52 | 136.53
Nav_08-2462 | 149 | 25736.8 | 51230 | 3.83 | 19.48
Nav_2541-90 | 166 | 26172.5 | 130386 | 4.34 | 62.57
Nav_2756-81 | 107 | 42939.4 | 131591 | 4.59 | 122.31
Vul_2009v-1368 | 83 | 57787.2 | 401973 | 4.80 | 345.03
Vul_06-2432 | 57 | 85122.7 | 322525 | 4.85 | 419.76
Vul_08-2435 | 111 | 42872.9 | 230373 | 4.76 | 144.01
Vul_08-2439 | 98 | 50885.7 | 250789 | 4.99 | 210.94
Vul_07-2444 | 70 | 73255.1 | 492706 | 5.13 | 656.10

Pipeline 2
Sample | NUM | AVG | N50 | Assembly Size | Assembly Score
Nav_2423-01 | 125 | 35357.0 | 164305 | 4.42 | 111.36
Nav_08-2462 | 451 | 311.9 | 2253 | 0.14 | 0.09
Nav_2541-90 | 106 | 40547.5 | 169781 | 4.30 | 123.02
Nav_2756-81 | 111 | 41840.8 | 132119 | 4.64 | 124.55
Vul_2009v-1368 | 97 | 49705.8 | 228408 | 4.82 | 170.81
Vul_06-2432 | 167 | 28489.7 | 78353 | 4.76 | 32.53
Vul_08-2435 | 193 | 24903.7 | 204178 | 4.85 | 75.19
Vul_08-2439 | 114 | 44047.9 | 180889 | 5.02 | 134.64
Vul_07-2444 | 143 | 35905.1 | 130942 | 5.13 | 85.93
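The slide does not define its column headers. The sketch below assumes that NUM is the number of contigs, AVG the mean contig length, and N50 the usual statistic: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name assembly_stats and the toy lengths are made up for illustration.

```python
from typing import List, Tuple


def assembly_stats(contig_lengths: List[int]) -> Tuple[int, float, int]:
    """Return (NUM, AVG, N50) for a list of contig lengths in bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = lengths[-1]
    for length in lengths:
        running += length
        # The first contig that brings the running sum to half the total is the N50 contig.
        if 2 * running >= total:
            n50 = length
            break
    return len(lengths), total / len(lengths), n50


toy_contigs = [100, 80, 60, 40, 20]  # made-up lengths, not from the slides
num, avg, n50 = assembly_stats(toy_contigs)
print(num, avg, n50)
```

Under the same assumptions, NUM multiplied by AVG gives the total assembly length, which is a quick sanity check on rows of a table like the ones above.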

29 Visualization Merged Newbler Ray Hybrid

30 Demo

