De novo genome assembly of Moniliophthora roreri

Group 4: Chen, Demeke, Habte, Namrata, Rajdeep, Xu

Introduction
- M. roreri is a fungal pathogen that causes frosty pod rot in cacao (Theobroma cacao), mainly in Central and South America.
- Genomic information is important for understanding the pathogen's biology.
- Genome assembly is both more important and more challenging when there is no reference sequence.

Assembly pipeline
1. Quality control (FastQC)
2. Adapter removal (Trimmomatic)
3. Contaminant cleaning (Bowtie)
4. Contig assembly (Minia)
5. Scaffolding (SSPACE)
6. Gap filling (GapFiller)
7. Gene prediction (QUAST)

Introduction to Minia
- An ultra-low-memory DNA sequence assembler
- A human genome can be assembled using 4 GB of memory
- Produces results of similar contiguity and accuracy to other de Bruijn assemblers such as Velvet
- Takes as input a set of short genomic reads (typically from an Illumina sequencer)
- Version used: Minia 1.5418-maxk128
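To make the de Bruijn idea concrete, here is a minimal sketch that builds a toy graph from reads in plain Python. It is not how Minia is implemented (Minia's low memory use comes from a compact Bloom-filter-based representation of the graph). The function name `build_de_bruijn` and the example reads are illustrative assumptions, and reverse complements are ignored for brevity.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# Hypothetical toy reads, overlapping by five bases.
reads = ["ACGTAC", "CGTACG"]
graph = build_de_bruijn(reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Contigs correspond to unbranched paths in such a graph; a larger k resolves more repeats but needs higher coverage, which is why k is tuned in the slides that follow.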

Recommended k-mer based contig assembly
Columns 2 to 4 are the k-mer estimation (KmerGenie); columns 5 to 8 are the resulting Minia assembly.

| Library | Recommended K | Minimum coverage | Predicted assembly size (bp) | N50 (bp) | Longest (bp) | # Contigs | Total length (bp) |
|---|---|---|---|---|---|---|---|
| PE reads | 77 | 4 | 55,981,704 | 5879 | 54,716 | 14,236 | 47,131,570 |
| MP reads | 93 | 7 | 56,496,136 | 1384 | 21,495 | 35,033 | 43,387,625 |
| Unpaired reads | 71 | | 56,004,620 | 7431 | 54,409 | 11,989 | 45,327,642 |
| PE and MP reads | 87 | 16 | 57,157,776 | 6488 | 45,213 | 13,355 | 48,323,321 |
| All | | 11 | 56,949,824 | 4017 | 35,607 | 18,455 | 48,488,928 |

Optimizing the k-mer selection for final assembly
- Library: all (PE, MP and unpaired)
- Minimum coverage set to 4

Minia assembly:

| K-mer size | N50 (bp) | Longest (bp) | # Contigs | Total length (bp) |
|---|---|---|---|---|
| 51 | 16,767 | 187,673 | 10,050 | 47,937,871 |
| 61 | 18,316 | 255,017 | 9,593 | 50,039,743 |
| 71 | 19,720 | 189,801 | 9,008 | 51,488,056 |
| 81 | 20,068 | 155,025 | 8,476 | 52,624,232 |
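Since N50 is the main metric compared across these assemblies, a short sketch of how it is usually computed may help. N50 is the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly length; L50 is the number of contigs needed to get there. The function name `n50_l50` and the toy lengths are assumptions for illustration, not values from the slides.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that pushes the running total to half the assembly.
        if running >= half:
            return length, count
    raise ValueError("no contigs given")

# Hypothetical toy assembly: total 300, so the halfway point is 150.
toy_lengths = [100, 80, 60, 40, 20]
print(n50_l50(toy_lengths))  # (80, 2)
```

Because N50 depends only on the length distribution, it rewards contiguity but says nothing about correctness, which is why total length and contig counts are reported alongside it.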

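Effect of k-mer, data type on the assembly

| Data used | k-mer | Abundance threshold | N50 (kb) | Longest contig (kb) | # Contigs | Assembly length (Mb) |
|---|---|---|---|---|---|---|
| All | 81 | 9 | 19.7 | 155.0 | 8595 | 52.4 |
| All | 81 | 12 | 18.8 | 147.9 | 8571 | 51.4 |
| All | 81 | 19 | 17.3 | 114.2 | 8458 | 48.2 |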
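The abundance threshold (minimum coverage) discards k-mers seen fewer than a given number of times, on the grounds that rare k-mers are mostly sequencing errors. The sketch below shows the filtering step only; `solid_kmers` and the toy reads are hypothetical names and data, and reverse complements are ignored for brevity.

```python
from collections import Counter

def solid_kmers(reads, k, min_abundance):
    """Return the k-mers that occur at least min_abundance times across all reads."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_abundance}

# Hypothetical reads: the third one ends in a likely error (T instead of A).
reads = ["ACGTA", "ACGTA", "ACGTT"]
print(sorted(solid_kmers(reads, k=4, min_abundance=2)))  # ['ACGT', 'CGTA']
```

Raising the threshold also removes genuine k-mers from low-coverage regions, which is consistent with the shrinking assembly length and N50 in the table as the threshold goes from 9 to 19.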

Scaffolding with SSPACE
- Standalone scaffolding program
- Extends and scaffolds pre-assembled contigs
- Uses Bowtie to map paired libraries to the pre-assembled contigs
- Uses the mapped positions and orientations of the reads for scaffolding
- Read pairs found within the allowed distance, together with their orientations, are used for contig pairing and ordering
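The pairing idea can be illustrated with a toy sketch: count the read pairs whose two mates map to different contigs, and keep only contig pairs supported by a minimum number of such links. This is a simplified illustration, not SSPACE's actual algorithm; it ignores orientation and insert-size checks, and `contig_links`, `min_links` and the mapping data are assumptions.

```python
from collections import Counter

def contig_links(pair_mappings, min_links):
    """Count read pairs joining two different contigs; keep well-supported joins."""
    links = Counter()
    for contig_a, contig_b in pair_mappings:
        if contig_a != contig_b:
            # Sort so that (c1, c2) and (c2, c1) count as the same link.
            links[tuple(sorted((contig_a, contig_b)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}

# Hypothetical mappings: each tuple holds the contigs hit by the two mates of one pair.
mappings = [("c1", "c2"), ("c2", "c1"), ("c1", "c1"), ("c2", "c3"), ("c1", "c2")]
print(contig_links(mappings, min_links=2))  # {('c1', 'c2'): 3}
```

Requiring several supporting pairs guards against joins caused by a single mis-mapped read, at the cost of leaving weakly linked contigs unscaffolded.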

Effect of the library and insert size on scaffolding

| Library (insert size, bp) | # scaffolds | N50 (kb) | Longest scaffold (Mb) | Total length, Mb (actual sequence, Mb) | Ns per 100 kb (total Ns) |
|---|---|---|---|---|---|
| MP (3500) | 899 | 217.2 | 1.32 | 65.1 (52.4) | 19.5 kb (12.7 Mb) |
| PE (400) | 3417 | 50.6 | 0.58 | 52.24 (52.23) | 10.25 bp (5354 bp) |
| MP (2500) | 763 | 233.3 | 1.91 | 58.4 (52.4) | 10.3 kb (6.02 Mb) |
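The "Ns per 100 kb" column follows directly from the total length and the total number of Ns. As a quick check of the arithmetic, the sketch below recomputes two rows of the table; the function name `ns_per_100kb` is an assumption.

```python
def ns_per_100kb(total_length_bp, total_ns_bp):
    """Number of N bases per 100,000 bases of scaffold sequence."""
    return total_ns_bp / total_length_bp * 100_000

# MP (2500) row: 58.4 Mb total length with 6.02 Mb of Ns, about 10.3 kb per 100 kb.
mp_2500 = ns_per_100kb(58.4e6, 6.02e6)

# PE (400) row: 52.24 Mb total length with 5354 Ns, about 10.25 bases per 100 kb.
pe_400 = ns_per_100kb(52.24e6, 5354)

print(f"MP (2500): {mp_2500 / 1000:.1f} kb per 100 kb")
print(f"PE (400): {pe_400:.2f} bp per 100 kb")
```

The same subtraction explains the "actual sequence" column: for MP (3500), 65.1 Mb of scaffold minus 12.7 Mb of Ns leaves 52.4 Mb of real sequence.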

Introduction to GapFiller v1.11 (Boetzer et al. 2012, Genome Biology)
- Closes gaps within previously created scaffolds
- Gaps within scaffolds are runs of unknown nucleotides (Ns)
- The unknown nucleotides are replaced with actual nucleotides in an attempt to close each gap
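Because a gap is defined as a run of Ns, the gap count and the total number of Ns can be read straight from a scaffold sequence. The following is a minimal sketch of that bookkeeping, not part of GapFiller; `gap_stats` and the example scaffold are hypothetical.

```python
import re

def gap_stats(sequence):
    """Return (number of gaps, total Ns), where a gap is a maximal run of N or n."""
    gaps = re.findall(r"[Nn]+", sequence)
    return len(gaps), sum(len(g) for g in gaps)

# Hypothetical scaffold with two gaps of 4 and 2 unknown bases.
scaffold = "ACGTNNNNACGTNNACGT"
print(gap_stats(scaffold))  # (2, 6)
```

Running such a count before and after gap filling gives "gaps closed" style figures: the difference in gap count is the number of gaps fully closed, while a drop in total Ns also reflects gaps that were only partly filled.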

Gap filling pipeline

1st cycle of gap filling (3 iterations, PE and MP libraries)
- # scaffolds: 763
- Total Ns: 737 kb (1280 per 100 kb)
- Total length (with / without Ns): 57.57 / 56.83 Mb
- N50: 232.19 kb
- Longest scaffold: 1.90 Mb
- Gaps closed: 7831 - 4350 = 3481

2nd cycle of scaffolding (MP libraries)
- # scaffolds: 459
- Total Ns: 892 kb (1546 per 100 kb)
- Total length (with / without Ns): 57.72 / 56.83 Mb
- N50: 477.4 kb
- Longest scaffold: 3.76 Mb

2nd cycle of gap filling (8 iterations, PE and MP libraries)
- # scaffolds: 459
- Total Ns: 659.5 kb (1138 per 100 kb)
- Total length (with / without Ns): 57.9 / 57.2 Mb
- N50: 478.6 kb
- Longest scaffold: 3.76 Mb
- Gaps closed: 4635 - 4420 = 215

Gene prediction using QUAST

Summary
- Fairly good genome assembly pipeline
- Highest N50 and longest scaffold (3.7 Mb)
- Lowest # scaffolds (< 500)
- Fairly low # Ns

Assembly: all data set; K=93; M=11; SSPACE

| Statistic (no reference) | Minia | 1st scaffold | 2nd scaffold |
|---|---|---|---|
| # contigs | 18455 | 831 | 472 |
| # contigs (>= 1000 bp) | 13394 | 734 | 410 |
| # contigs (>= 50000 bp) | | 290 | 225 |
| Largest contig | 35607 | 2104580 | 3259578 |
| Total length | 48488928 | 57810380 | 58311298 |
| Total length (>= 10000 bp) | 4904746 | 56841305 | 57975028 |
| Total length (>= 50000 bp) | | 50546624 | 54752354 |
| N50 | 4017 | 182599 | 320808 |
| L50 | 3663 | 76 | 47 |
| GC (%) | 46.71 | 46.81 | 46.8 |
| # N's | | 5898443 | 6428349 |
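The thresholded rows and the GC row of this table are simple summaries over contig lengths and sequence. The sketch below shows one plausible way to compute them; it is not QUAST's code, the function names are assumptions, and excluding Ns from the GC denominator is also an assumption.

```python
def length_stats(lengths, min_len):
    """Return (# contigs, total length) counting only contigs of at least min_len."""
    kept = [n for n in lengths if n >= min_len]
    return len(kept), sum(kept)

def gc_percent(sequence):
    """GC content as a percentage of the A/C/G/T bases (Ns are ignored)."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100 * (seq.count("G") + seq.count("C")) / acgt

# Hypothetical contig lengths and sequence, for illustration only.
print(length_stats([500, 1500, 60000], min_len=1000))  # (2, 61500)
print(gc_percent("ACGTNN"))  # 50.0
```

Reading the thresholded totals side by side shows where the sequence sits: after scaffolding, almost all of the assembly is in pieces of at least 10,000 bp, whereas the Minia contigs are mostly shorter than that.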

Thanks