M. roreri de novo genome assembly using abyss/1.9.0-maxk96

ABySS 1.9.0 introduces a new tool, Sealer, for closing scaffold gaps. It also includes Konnector, a fast and memory-efficient tool for filling the gap between paired-end reads.
GROUP 5: Hyeim Jung, Pedro Pablo Parra, Diana Vanessa Sarria Zuniga, Jacob Shoemake

HOW ABySS WORKS…
Requirements:
- An ABySS build compiled for a maximum k-mer size at least as large as the chosen k (here, the abyss/1.9.0-maxk96 module).
- A k-mer size, chosen with KmerGenie.
- Input library files: paired-end, unpaired (single-end) and mate-pair reads.
Assembly algorithm: two major steps
1. Construction of contigs without using the paired-end information: contigs are extended until either they cannot be unambiguously extended or they come to a blunt end due to lack of coverage.
2. Resolving ambiguities and merging contigs using the paired-end information: read pairs identify contigs that can be linked together. Two contigs are considered linked if at least p pairs (by default p = 5) join them.
ABySS 1.9.0 also contains:
- Konnector: fills the gap between paired-end reads.
- Sealer: closes scaffold gaps.
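
The linking rule of the second step can be sketched in a few lines of Python. This is a simplified illustration, not ABySS's own implementation: it assumes each read pair has already been reduced to a (contig of mate 1, contig of mate 2) tuple, ignores orientation and gap-size estimation, and the name `link_contigs` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical input: for each read pair, the contig each mate mapped to
# (None if that mate did not map). Orientation and distance are ignored.
PairHit = Tuple[Optional[str], Optional[str]]


def link_contigs(pair_hits: Iterable[PairHit], p: int = 5) -> List[Tuple[str, str, int]]:
    """Return (contig_a, contig_b, support) for contig pairs joined by >= p read pairs."""
    support: Counter = Counter()
    for a, b in pair_hits:
        # Skip pairs with an unmapped mate or with both mates in the same contig.
        if a is None or b is None or a == b:
            continue
        # Count the link as undirected, so (a, b) and (b, a) add to the same key.
        support[tuple(sorted((a, b)))] += 1
    return sorted((a, b, n) for (a, b), n in support.items() if n >= p)


if __name__ == "__main__":
    hits = [("c1", "c2")] * 6 + [("c2", "c3")] * 3 + [("c1", None), ("c4", "c4")]
    # c1-c2 is supported by 6 pairs; c2-c3 by only 3, below the default p = 5.
    print(link_contigs(hits))  # -> [('c1', 'c2', 6)]
```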

OUR ASSEMBLY STRATEGIES…
Two assembly types:
Assembly 5 (paired PE and unpaired PE-MP, k=87):
    abyss-pe k=87 name=assembly5 lib='pe1' mp='mp1' pe1='paired PE.1.fq paired PE2.fq' se='unpaired PE-MP' mp1='paired MP.1.fq paired MP.2.fq'
Assembly 3 (paired PE-MP and unpaired PE-MP, k=81):
    abyss-pe k=81 name=assembly3 lib='pe1 pe2' mp='mp1' pe1='paired PE.1.fq paired PE2.fq' pe2='paired MP.1.fq paired MP.2.fq' se='unpaired PE-MP' mp1='paired MP.1.fq paired MP.2.fq'
Note: mp1 is used only for scaffolding; it does not contribute to the consensus sequence.
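
abyss-pe receives each library as one space-separated argument, so the quoting must be valid POSIX shell (straight ASCII quotes). A minimal sketch that builds such a command line in Python with `shlex.join` is below. The function name `abyss_pe_command` and the underscore file names are hypothetical placeholders, and the k ≤ 96 check is an assumption based on the module name abyss/1.9.0-maxk96.

```python
import shlex
from typing import Dict, List

# Assumption: the abyss/1.9.0-maxk96 build supports k values up to 96.
MAXK = 96


def abyss_pe_command(k: int, name: str, pe: Dict[str, List[str]],
                     mp: Dict[str, List[str]], se: List[str]) -> str:
    """Build an abyss-pe command line with correctly quoted library lists."""
    if not 0 < k <= MAXK:
        raise ValueError(f"k={k} is outside the supported range 1..{MAXK}")
    args = ["abyss-pe", f"k={k}", f"name={name}",
            "lib=" + " ".join(pe), "mp=" + " ".join(mp)]
    # Each library variable lists its read files, separated by spaces.
    for lib, files in {**pe, **mp}.items():
        args.append(f"{lib}={' '.join(files)}")
    if se:
        args.append("se=" + " ".join(se))
    # shlex.join quotes every argument that contains spaces or shell metacharacters.
    return shlex.join(args)


if __name__ == "__main__":
    # Hypothetical file names standing in for the PE, MP and unpaired reads.
    print(abyss_pe_command(
        k=87, name="assembly5",
        pe={"pe1": ["paired_PE.1.fq", "paired_PE.2.fq"]},
        mp={"mp1": ["paired_MP.1.fq", "paired_MP.2.fq"]},
        se=["unpaired_PE-MP.fq"],
    ))
```

For Assembly 3, the same helper would be called with k=81 and an extra `pe2` entry holding the MP files.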

[Figure: Assembly 3 vs Assembly 5, showing which read libraries (paired PE, paired MP, unpaired PE&MP) feed the contig and scaffold stages.]

Evaluation of best assemblies
QUAST report (without reference genome) and Bowtie2 mapping of the PE reads. For scaffolds.fa, the first column is for the ABySS scaffolds and the second for the broken scaffolds ("Broken").

| Metric | assembly_5 contigs.fa | assembly_5 scaffolds.fa (ABySS) | assembly_5 scaffolds.fa (Broken) | assembly_3 contigs.fa | assembly_3 scaffolds.fa (ABySS) | assembly_3 scaffolds.fa (Broken) |
|---|---|---|---|---|---|---|
| # contigs (total, --min-contig 500 bp) | 4328 | 3061 | 3987 | 4816 | 3632 | 4820 |
| # contigs (>= 0 bp) | 9711 | 8242 | 9654 | 40245 | 38169 | 40049 |
| # contigs (>= 1000 bp) | 3544 | 2404 | 3162 | 3514 | 2629 | 3402 |
| # contigs (>= 5000 bp) | 1887 | 1182 | 1724 | 1642 | 1158 | 1573 |
| # contigs (>= 10000 bp) | 1181 | 809 | 1142 | 1078 | 773 | 1037 |
| # contigs (>= 25000 bp) | 604 | 503 | 600 | 567 | 467 | 552 |
| # contigs (>= 50000 bp) | 268 | 301 | 278 | 256 | 276 | 254 |
| Largest (bp) | 553,471 | 1,036,496 | 587,564 | 1,035,772 | 1,771,018 | 701,868 |
| Total length, Mb (total, --min-contig 500 bp) | 57.68 | 57.84 | 57.15 | 56.36 | 57.87 | 56.05 |
| Total length, Mb (>= 0 bp) | 58.59 | 58.70 | 58.13 | 61.10 | 62.38 | 60.78 |
| Total length, Mb (>= 1000 bp) | 57.12 | 57.37 | 56.56 | 55.45 | 57.17 | 55.06 |
| Total length, Mb (>= 5000 bp) | 52.99 | 54.52 | 53.09 | 50.87 | 53.63 | 50.73 |
| Total length, Mb (>= 10000 bp) | 47.96 | 51.82 | 48.90 | 46.79 | 50.85 | 46.82 |
| Total length, Mb (>= 25000 bp) | 38.75 | 46.94 | 40.24 | 38.70 | 46.16 | 39.16 |
| Total length, Mb (>= 50000 bp) | 27.02 | 39.60 | 28.82 | 27.77 | 39.24 | 28.51 |
| N50 (bp) | 45,432 | 99,290 | 51,001 | 48,947 | 102,079 | 51,480 |
| # N's | 46,124 | 568,877 | 945 | 247,454 | 1,600,849 | 806 |
| Predicted genes (unique) | 17734 | 17465 | 17507 | 17570 | 17398 | 17578 |
| Predicted genes (>= 0 bp) | 104288 | 103878 | 103379 | 103123 | 103404 | 103318 |
| Predicted genes (>= 300 bp) | 21553 | 21545 | 21414 | 21274 | 21315 | 21317 |
| Predicted genes (>= 1500 bp) | 1189 | 1198 | 1192 | 1171 | 1182 | 1172 |
| Predicted genes (>= 3000 bp) | 6 | 66 | 66 | 63 | 63 | 63 |

Mapped PE reads (Bowtie2; aligned concordantly exactly 1 time / aligned concordantly >1 times / total):
- assembly_5 contigs.fa: 60.40% / 22.51% / 82.91%
- assembly_5 scaffolds.fa: 60.41% / 22.54% / 82.95%
- assembly_3 contigs.fa: 58.95% / 22.55% / 81.5%
- assembly_3 scaffolds.fa: 58.97% / 22.62% / 81.59%

Insert sizes: PE 126-662 bp (peak 301); MP 832-6140 bp (peak 1700).
QUAST options: quast/3.2 --gene-finding --eukaryote
Bowtie2 options: bowtie2/2.2.9 --very-sensitive-local --no-unal --phred33 -p 20
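
N50 is the length L such that contigs of length L or longer together hold at least half of the assembly's bases. A minimal Python sketch of that definition, plus the "# contigs (>= x bp)" and "Total length (>= x bp)" counts, is below. It works on a plain list of lengths (the toy lengths are made up) and does not reproduce QUAST's other processing; the 500 bp filter only mimics the --min-contig cutoff.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered: List[int] = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid floating-point division.
        if 2 * running >= total:
            return length
    return 0  # no contigs


def threshold_stats(lengths: Iterable[int],
                    thresholds: Sequence[int] = (0, 1000, 5000, 10000, 25000, 50000)
                    ) -> Dict[int, Tuple[int, int]]:
    """Map each threshold to (number of contigs, total length) for contigs >= threshold."""
    lengths = list(lengths)
    return {
        t: (sum(1 for x in lengths if x >= t), sum(x for x in lengths if x >= t))
        for t in thresholds
    }


if __name__ == "__main__":
    toy = [300, 800, 2500, 6000, 12000, 30000, 40000]  # made-up contig lengths in bp
    kept = [x for x in toy if x >= 500]  # mimic a 500 bp minimum contig length
    print("N50:", n50(kept))  # -> N50: 30000
    print(threshold_stats(toy))
```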

Conclusions (ABySS = scaffolds as produced by ABySS; Broken = broken scaffolds)
- Total length of assembly: Assembly 5 (~). ABySS: the same in both assemblies. Broken: Assembly 3 has 1.1 Mb less.
- # Scaffolds: Assembly 3 has many more scaffolds < 500 bp than Assembly 5.
- Largest scaffold: Assembly 3.
- N50: Assembly 3 (~). ABySS: Assembly 3 has 2,789 bp more. Broken: Assembly 3 has 479 bp more.
- # N's: ABySS: Assembly 3 has ~1 Mb more N's. Broken: Assembly 5 has 139 more N's.
- # Unique predicted genes: Assembly 5 (~). ABySS: Assembly 5 has 67 more genes. Broken: Assembly 3 has 71 more genes.
- Mapped paired-end reads: Assembly 5 has 1.36 percentage points more (82.95% vs 81.59%).
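
The differences quoted in the conclusions can be re-derived from the scaffold values in the evaluation table. The short Python check below uses only those reported numbers; the dictionary keys and the `diff` helper are illustrative names.

```python
# Scaffold values copied from the evaluation table (ABySS and broken scaffolds).
ASSEMBLY_5 = {
    "total_broken_mb": 57.15, "n50": 99_290, "n50_broken": 51_001,
    "ns": 568_877, "ns_broken": 945, "genes": 17_465, "genes_broken": 17_507,
    "mapped_pct": 82.95,
}
ASSEMBLY_3 = {
    "total_broken_mb": 56.05, "n50": 102_079, "n50_broken": 51_480,
    "ns": 1_600_849, "ns_broken": 806, "genes": 17_398, "genes_broken": 17_578,
    "mapped_pct": 81.59,
}


def diff(metric: str) -> float:
    """Assembly 3 minus Assembly 5 (positive means Assembly 3 is higher)."""
    return ASSEMBLY_3[metric] - ASSEMBLY_5[metric]


if __name__ == "__main__":
    # Round to 2 decimals to hide floating-point noise in the Mb and % values.
    for metric in ASSEMBLY_5:
        print(f"{metric}: {round(diff(metric), 2)}")
```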

Bowtie2 alignment summary for the PE reads (the concordant rates, 60.40% and 22.51%, are those reported for assembly_5 contigs.fa):
25298314 reads; of these:
  25298314 (100.00%) were paired; of these:
    4322365 (17.09%) aligned concordantly 0 times
    15280202 (60.40%) aligned concordantly exactly 1 time
    5695747 (22.51%) aligned concordantly >1 times
    ----
    4322365 pairs aligned concordantly 0 times; of these:
      2648376 (61.27%) aligned discordantly 1 time
    1673989 pairs aligned 0 times concordantly or discordantly; of these:
      3347978 mates make up the pairs; of these:
        37310 (1.11%) aligned 0 times
        725071 (21.66%) aligned exactly 1 time
        2585597 (77.23%) aligned >1 times
99.93% overall alignment rate
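
The overall alignment rate counts mates rather than pairs: both mates of every concordantly or discordantly aligned pair count as aligned, plus the mates that aligned individually from the remaining pairs. The Python sketch below re-derives the 99.93% figure from the counts in the summary above; its regular expressions assume the exact wording of this summary and are not a general Bowtie2 log parser.

```python
import re

SUMMARY = """\
25298314 reads; of these:
  25298314 (100.00%) were paired; of these:
    4322365 (17.09%) aligned concordantly 0 times
    15280202 (60.40%) aligned concordantly exactly 1 time
    5695747 (22.51%) aligned concordantly >1 times
    ----
    4322365 pairs aligned concordantly 0 times; of these:
      2648376 (61.27%) aligned discordantly 1 time
    1673989 pairs aligned 0 times concordantly or discordantly; of these:
      3347978 mates make up the pairs; of these:
        37310 (1.11%) aligned 0 times
        725071 (21.66%) aligned exactly 1 time
        2585597 (77.23%) aligned >1 times
99.93% overall alignment rate
"""


def count(pattern: str, text: str) -> int:
    """Return the integer captured by the first match of pattern."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1))


def overall_alignment_rate(text: str) -> float:
    """Percentage of mates aligned, recomputed from the summary counts."""
    pairs = count(r"(\d+) \([\d.]+%\) were paired", text)
    conc_once = count(r"(\d+) \([\d.]+%\) aligned concordantly exactly 1 time", text)
    conc_multi = count(r"(\d+) \([\d.]+%\) aligned concordantly >1 times", text)
    discordant = count(r"(\d+) \([\d.]+%\) aligned discordantly 1 time", text)
    mates_once = count(r"(\d+) \([\d.]+%\) aligned exactly 1 time", text)
    mates_multi = count(r"(\d+) \([\d.]+%\) aligned >1 times", text)
    # Both mates count for concordant and discordant pairs; leftover mates count singly.
    aligned_mates = 2 * (conc_once + conc_multi + discordant) + mates_once + mates_multi
    return 100.0 * aligned_mates / (2 * pairs)


if __name__ == "__main__":
    print(f"{overall_alignment_rate(SUMMARY):.2f}%")  # -> 99.93%
```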