Final Results: Genome Assembly Team
Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick

Original Pipeline
(Flowchart. Legend: 454, Illumina, hybrid; process, info, chosen reference, assemblers, built-in.)
- Pre-processing: 454 and Illumina raw reads; statistical analysis of read stats; NGS QC with FastQC and PRINSEQ.
- Reference selection: published genomes from public databases (V. vulnificus YJ016, V. vulnificus CMCP6, V. vulnificus MO6-24/O); align Illumina reads against each reference with bwa; compare mapping statistics with samstats.
- De novo assembly:
  - 454: Newbler, CABOG, SUTTA
  - Illumina: ALLPATHS-LG, SOAPdenovo, Velvet, Taipan, SUTTA
  - Hybrid (Illumina/454): Ray, MIRA
  - Align Illumina reads against 454 contigs (MacVector, CLC Workbench); collect unmapped reads.
- Reference-based assembly: Illumina/(454?) reads with AMOScmp; contigs and unmapped reads.
- Evaluation: GAGE, Hawkeye; reference evaluation with dnadiff (MUMmer); parameter optimization.
- Contig merging: all possible combinations of the best 3; Minimus, MAIA, PAGIT, Mauve.
- Genome finishing: gap filling; scaffolds evaluated with GAGE; nucleotide identity with MUMmer; GRASS. Output: draft/finished genome.
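The reference-selection step aligns the Illumina reads against each published V. vulnificus genome and compares mapping statistics (the team used bwa and samstats). As a minimal sketch, assuming the comparison comes down to the fraction of reads that map, the functions below compute that fraction directly from SAM text and pick the best-scoring reference. `mapping_rate` and `best_reference` are hypothetical helpers, not part of bwa or samstats.

```python
def mapping_rate(sam_lines):
    """Return (mapped, total, rate) over primary alignment records in SAM text lines."""
    mapped = total = 0
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        total += 1
        if not flag & 0x4:  # 0x4 = segment unmapped
            mapped += 1
    rate = mapped / total if total else 0.0
    return mapped, total, rate


def best_reference(sam_by_reference):
    """Map reference name -> SAM lines; return the name with the highest mapping rate."""
    return max(sam_by_reference, key=lambda ref: mapping_rate(sam_by_reference[ref])[2])
```

For example, reading one SAM file per candidate reference (YJ016, CMCP6, MO6-24/O) into a dict of line lists and passing it to `best_reference` returns the name of the reference with the highest rate. A fuller comparison would also weigh mapping quality and proper-pair rates.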

Read Visualization: spot the differences
Comparison of 454 reads at low coverage and at improved coverage.

Read Visualization: more is better!
Nav reads compared to Nav Illumina reads.

Read Visualization: cousins or siblings?
Coverage comparison of Nav_ and Vul_ (454 and Illumina reads).

Data Quality
Effect of pre-processing the data (using PRINSEQ)
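PRINSEQ filters and trims reads according to user-chosen criteria. The sketch below illustrates the general idea of quality-based 3' trimming followed by a minimum-length filter; it is not PRINSEQ's implementation, and the thresholds (`min_q=20`, `min_len=50`) and Phred+33 quality encoding are illustrative assumptions.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Trim bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def preprocess(records, min_q=20, min_len=50):
    """Yield (name, seq, qual) records that are still >= min_len bases after trimming."""
    for name, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            yield name, seq, qual
```

Trimming before the length filter matters: a read that starts long can fall below `min_len` once its low-quality tail is removed.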

V. navarrensis (454; non-preprocessed | pre-processed)
Metrics (FastQC modules): Per Base Seq. Quality, Per Seq. Quality Score, Per Base Seq. Content, Per Base GC Content, Per Seq. GC Content, Per Base N Content, Seq. Length Dist., Seq. Dup. Levels, Overrepresented Seqs., Kmer Content

V. vulnificus (454; non-preprocessed | pre-processed; column 2009 V_)
Metrics: as above

V. navarrensis (Illumina; non-preprocessed | pre-processed)
Metrics: as above

V. vulnificus (Illumina; non-preprocessed | pre-processed; column 2009 V_)
Metrics: as above
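The metrics above are computed either per base position or per read. A minimal sketch of two of them, assuming Phred+33 quality strings: the mean quality at each read position (FastQC draws the mean as a line alongside its median and quartile box plot in "Per Base Sequence Quality") and the GC fraction of a single read (the per-read value that "Per Sequence GC Content" summarizes as a distribution). FastQC additionally applies its own pass/warn/fail thresholds, which are not reproduced here.

```python
def per_base_mean_quality(quals, offset=33):
    """Mean Phred score at each read position; reads may differ in length."""
    sums, counts = [], []
    for q in quals:
        for i, ch in enumerate(q):
            if i == len(sums):  # first read long enough to reach position i
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def gc_fraction(seq):
    """Fraction of G and C bases in one read (0.0 for an empty read)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```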

Assembly: Reference-guided and de novo

Reference-guided assembly
Comparison of reference-guided vs. de novo assembly

ARE: Assembly Score

Reference-guided vs. de novo assembly: ARE

Summary of reference-guided assembly
- Used the V. vulnificus CMCP6 reference strain
- 84% coverage
- Overall, the de novo assemblers achieved higher assembly scores than reference-based assembly
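The slide does not define "84% coverage". Assuming it means breadth of coverage, that is, the fraction of the CMCP6 reference covered by at least one aligned read or contig, it can be computed by merging overlapping alignment intervals and summing their lengths. A minimal sketch; the function name is hypothetical and intervals are taken as 0-based, half-open reference coordinates.

```python
def breadth_of_coverage(intervals, ref_length):
    """Fraction of reference bases covered by at least one (start, end) interval."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        start, end = max(start, 0), min(end, ref_length)  # clip to the reference
        if start >= end:
            continue  # empty after clipping
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_length
```

Merging first avoids double-counting bases covered by several overlapping alignments, which would otherwise inflate the figure above 100%.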

De Novo Assembly

De novo assembler comparison (optimal parameters): ARE

Final Results: V. vulnificus
Graph comparing assemblers on three criteria: assembly score, span ratio, and 1/(break points). Higher values are preferable for all criteria. Newbler (dn) has been removed to show the variance among the other tools.

Final Results: V. vulnificus
Graph comparing assemblers on three criteria: assembly score, span ratio, and break points (plotted as 1000/(break points)). Higher values are preferable for all criteria.
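Plotting 1000/(break points) turns a "lower is better" count into a "higher is better" criterion, so all three axes point the same way. One way to compare assemblers on three such criteria at once, which the slides do not claim to use, is to keep only the assemblers that no other assembler beats on every axis (a Pareto front). The sketch below assumes per-assembler tuples of (assembly score, span ratio, break points); the function names and any values passed in are hypothetical.

```python
def criteria(assembly_score, span_ratio, break_points, scale=1000):
    """Return the three criteria oriented so that higher is better for each."""
    inv_bp = scale / break_points if break_points else float("inf")
    return (assembly_score, span_ratio, inv_bp)


def dominates(a, b):
    """True if a is at least as good as b on every criterion and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(results):
    """results maps an assembler name to (assembly_score, span_ratio, break_points)."""
    scored = {name: criteria(*vals) for name, vals in results.items()}
    return sorted(
        name
        for name, s in scored.items()
        if not any(dominates(t, s) for other, t in scored.items() if other != name)
    )
```

A Pareto front avoids choosing weights between the criteria; it can leave several assemblers tied, which is where a visual comparison like the graph above helps.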

Summary of de novo results
- OLC assemblers produced considerably different ARE scores from de Bruijn graph-based assemblers
  - CABOG/Newbler vs. SOAPdenovo/Velvet
- The hybrid assembler, Ray, did not perform as well in terms of assembly score
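To make the OLC vs. de Bruijn graph distinction concrete: a de Bruijn graph assembler such as Velvet or SOAPdenovo splits reads into k-mers and links consecutive (k-1)-mers, instead of computing read-to-read overlaps as OLC assemblers do. The toy sketch below builds such a graph from distinct k-mers and extends a contig by following nodes that have a single successor. It ignores reverse complements, sequencing errors, coverage, and in-degree, all of which real assemblers handle; the function names are illustrative.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Spell a contig by following nodes that have exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        node = next(iter(graph[node]))
        if node in seen:
            break  # stop at a cycle
        contig += node[-1]
        seen.add(node)
    return contig
```

With reads `ACGTC` and `CGTCA` and k = 3, the overlapping reads share k-mers, so the walk from `AC` reconstructs `ACGTCA`; if two reads diverge after `GT`, the walk stops at the branch.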

Merging: Vul_
Pairwise merging matrix; rows and columns: AMOScmp, CABOG, Newbler (dn;454), Newbler (ref;454), Newbler (ref;Illumina), Ray (454), Ray (Illumina), Ray (hybrid), SOAPdenovo, Velvet.
Non-empty entries by row, in order:
- Newbler (dn;454): ND, 311.98, ND
- Newbler (ref;Illumina): ND, ND
- Ray (Illumina): ND, 64.99, ND, 33.81, ND
- Ray (hybrid): ND, 49.94, ND
- SOAPdenovo: ND, 22.92, ND
- Velvet: ND, 37.68, ND

Merging: Nav_
Pairwise merging matrix; rows and columns: AMOScmp, CABOG, Newbler (dn), Newbler (ref;454), Newbler (ref;Illumina), Ray (454), Ray (Illumina), Ray (hybrid), SOAPdenovo, Velvet.
Non-empty entries by row, in order:
- AMOScmp: ND
- CABOG: ND
- Newbler (dn): ND
- Newbler (ref;Illumina): ND
- Ray (hybrid): ND
- Velvet: ND, 8.31
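Contig-merging tools such as Minimus join contigs from different assemblies where their ends overlap. A minimal sketch of that core idea, assuming exact suffix-prefix overlaps on the forward strand only (real tools tolerate mismatches and consider both strands); `merge_contigs` and the `min_overlap` value are illustrative, and `min_overlap` must be at least 1.

```python
def suffix_prefix_overlap(a, b, min_overlap=20):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_overlap)."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def merge_contigs(a, b, min_overlap=20):
    """Join two contigs on an exact suffix-prefix overlap; return None if they don't overlap."""
    olap = suffix_prefix_overlap(a, b, min_overlap)
    if olap == 0:
        return None
    return a + b[olap:]  # keep the shared bases only once
```

Checking the longest candidate overlap first ensures the merge uses the maximal shared region rather than a short, possibly spurious match.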

Assembler Review
Table columns: Assembler, Status, 454, Illumina, Hybrid, Algorithm. Recoverable entries (status; algorithm):
- ALLPATHS-LG: paired-end only; DBG
- AMOScmp: BB
- CABOG: OLC
- MIRA: ZEBRA
- Newbler: OLC
- Ray: DBG
- SOAPdenovo: DBG
- SUTTA: unresolved errors; BB
- Velvet: DBG
BB = branch-and-bound; OLC = overlap-layout-consensus; DBG = de Bruijn graph.
MIRA performed as well as our merged contigs, but it is impractical: 40-hour run time.

Final Pipeline
(Flowchart. Legend: 454, Illumina, hybrid; process, info, assemblers.)
- Pre-processing: 454 and Illumina raw reads; statistical analysis of read stats; FastQC and PRINSEQ.
- De novo assembly: hybrid (Ray, MIRA); 454 (Newbler, CABOG); Illumina (Velvet); Illumina reads aligned against 454 contigs.
- Contig merging: merge Ray (hybrid)/Newbler; merge CABOG/Velvet; MIRA (hybrid); Minimus.
- Output: draft genome.

Splinter: Pipeline 1 vs. Pipeline 2
For each pipeline, a table reports NUM, AVG, N50, Assembly Size, and Assembly Score for the Nav_ samples and the Vul_ samples (including Vul_2009).
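The table columns are standard contig statistics. Assuming NUM is the number of contigs, AVG the mean contig length, and Assembly Size the total number of assembled bases (the slides do not define them), they can be computed from contig lengths as below. N50 is the length L such that contigs of length at least L contain at least half of the assembly. The assembly score (ARE) is not reproduced because its definition is not given here.

```python
def contig_stats(lengths):
    """NUM, AVG, N50 and total assembly size from a list of contig lengths."""
    if not lengths:
        return {"NUM": 0, "AVG": 0.0, "N50": 0, "Assembly Size": 0}
    total = sum(lengths)
    running = 0
    for n50 in sorted(lengths, reverse=True):  # longest contigs first
        running += n50
        if running * 2 >= total:  # reached half of the assembly size
            break
    return {"NUM": len(lengths), "AVG": total / len(lengths), "N50": n50, "Assembly Size": total}
```

For contigs of 100, 200, 300, and 400 bp, the two longest contigs reach half of the 1,000 bp total at the 300 bp contig, so N50 is 300.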

Visualization
Merged, Newbler, Ray (hybrid)

Demo