Professors: Dr. Gribskov and Dr. Weil

Presentation transcript:

AGRY-600 Genomics: Genome Assembly
Professors: Dr. Gribskov and Dr. Weil
Group 3: Brett Lane, Amanpreet Kaur, Stefanie Griebel, Yulu Chen, Rupesh Gaire, Akanksha Singh

Genome Assembly Pipeline
Data cleaning -> cleaned data as input files -> Kmergenie (k-mer size) -> SPAdes (assembly) -> SSPACE -> GapFiller -> REAPR -> QUAST (gene finding)

Results: Cleaning Steps (MP reads)

Kmergenie
Kmergenie was used to determine the k-mer size for assembly.
All reads, both paired-end and mate-pair, were used to predict the k-mer size.
Best k-mer size predicted: 87
Predicted assembly size: 56.6 Mb
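
To make the k-mer idea concrete, the sketch below counts the distinct k-mers in a set of reads for several values of k. It is only an illustration of what a k-mer is and how the count changes with k; it is not Kmergenie's algorithm, and the function name `kmer_counts` and the toy reads are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy reads, not data from this project.
    reads = ["ACGTACGTGA", "CGTACGTGAC", "GTACGTGACC"]
    for k in (3, 5, 7):
        counts = kmer_counts(reads, k)
        print(f"k={k}: {len(counts)} distinct k-mers")
```

Small k collapses many positions onto the same k-mer, while large k leaves fewer k-mers per read, which is why the choice of k matters for a de Bruijn graph assembler.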

SPAdes Genome Assembler: Why?
SPAdes is suitable for:
Illumina reads
Bacterial and fungal data
Small genomes rather than large genomes
Paired-end, mate-pair, and unpaired reads

Assembly Statistics
Program: SPAdes, run with multiple k-mer values (51, 61, 71, 81, 83, 87)
Data: PE, MP, both unpaired
N50: 70,639 bp (contigs); 74,736 bp (scaffolds)
Longest: 608,380 bp
Contigs/scaffolds: 3,293 contigs; 3,187 scaffolds
Total length: 55.9 Mb
Comments: #N's = 2,873 (5.14 N's per 100 kbp)
SPAdes builds de Bruijn graph assemblies at multiple k-mer sizes rather than a single fixed one, and merges the assemblies from the different k-mer sizes.
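
As a rough illustration of what a de Bruijn graph at a given k-mer size looks like, the sketch below builds one from toy reads at two values of k. It shows only the basic graph construction; how SPAdes actually combines graphs across k-mer sizes is considerably more involved and is not reproduced here. The function name `de_bruijn_edges` and the reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it.

    Every k-mer in the reads contributes one edge: prefix -> suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]  # hypothetical toy reads
    for k in (3, 4):
        graph = de_bruijn_edges(reads, k)
        n_edges = sum(len(targets) for targets in graph.values())
        print(f"k={k}: {len(graph)} source nodes, {n_edges} edges")
```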

Scaffolding using SSPACE v3.0
Scaffolding was done with SSPACE v3.0 using the mate-pair reads.

Program      Scaffolds              N50      Longest contig  Total length  #N's per 100 kb
SPAdes       8073 (>=500 bp: 3187)  74.7 Kb  608 Kb          55.9 Mb       5.14
SSPACE v3.0  5183 (>=500 bp: 967)   193 Kb   1.56 Mb         61.6 Mb       7855.43
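
Since N50 is the main statistic compared before and after scaffolding, a minimal sketch of how it is usually computed may help: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly. The function name `n50` and the example lengths are illustrative, not taken from this project.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")


if __name__ == "__main__":
    # Total is 300, so half is 150; 100 + 80 = 180 is the first sum >= 150.
    print(n50([100, 80, 60, 40, 20]))  # prints 80
```

Joining contigs into longer scaffolds moves more of the total length into fewer, longer sequences, which is why N50 rises after scaffolding.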

Bridging the gaps using GapFiller
The number of N's was very high after scaffolding: 4,783,167 (7855.43 per 100 Kb).
GapFiller was used to fill the gaps using the mate-pair reads.
It reduced the number of N's to 1254.29 per 100 Kb.

                    Scaffolds             N50     Longest contig  Total length  #N's per 100 kb
Before gap filling  5183 (>=500 bp: 967)  193 Kb  1.56 Mb         61.6 Mb       7855.43
After gap filling                                                               1254.29

GapFiller greatly reduced the gaps (about an 84% drop in N's per 100 Kb).
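
The "N's per 100 kb" figure is a simple normalization: the number of N characters divided by the total assembly length, scaled to 100,000 bases. The sketch below shows the arithmetic; the function names are hypothetical, and the check uses the SPAdes figures reported earlier (2,873 N's in a total length rounded to 55.9 Mb).

```python
def rate_per_100kbp(n_count, total_length):
    """Scale a count of N's to a rate per 100,000 bases."""
    return 100_000 * n_count / total_length


def ns_per_100kbp(sequences):
    """Compute N's per 100 kbp directly from a list of sequences."""
    n_count = sum(seq.upper().count("N") for seq in sequences)
    total_length = sum(len(seq) for seq in sequences)
    return rate_per_100kbp(n_count, total_length)


if __name__ == "__main__":
    # SPAdes assembly: 2,873 N's over roughly 55.9 Mb.
    print(round(rate_per_100kbp(2873, 55_900_000), 2))  # prints 5.14
```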

Mapping the PE reads to the Assembly
We used Bowtie 2 to map the paired-end reads to our final assembly.
Aligned concordantly exactly 1 time: 59.58%
Aligned concordantly >1 times: 24.01%
Total aligned concordantly: 83.59%
Overall alignment rate: 99.94%

REAPR
Error-free bases: 85.75%
Total number of errors: 2652
FCD (fragment coverage distribution) errors within a contig: 615
FCD errors over a gap: 46
Low fragment coverage within a contig: 146
Low fragment coverage over a gap: 1845
85.75% of the bases are error free, which is good.

Gene Prediction
Genes were predicted using QUAST.

No. of predicted genes
Unique        17455
>= 0 bp       107219
>= 300 bp     22348
>= 1500 bp    1241
>= 3000 bp    77

Conclusion
Good assembly
N50: 193 Kb
Longest contig: 1.56 Mb
Fewer gaps after gap filling
85.75% of bases are error free
The only concern is the total length of the genome (about 61 Mb, versus the 56.6 Mb predicted by Kmergenie).

Thanks