Sequencing technologies and Velvet assembly Lecturer : Du Shengyang September 29 , 2012.

The Advances of DNA Sequencing Technology. First-generation sequencing technologies: the chemical degradation method (Maxam-Gilbert), the Sanger method, and automated fluorescent sequencing. Second-generation sequencing technologies: 454, Solexa, and SOLiD.

The third generation of sequencing. 1. Helicos BioSciences single-molecule sequencing. 2. Pacific Biosciences SMRT technology. 3. Oxford Nanopore Technologies nanopore single-molecule sequencing.

Advantages of third-generation sequencing. High throughput, low cost, long read length, and short sequencing time. It also avoids the PCR amplification step of second-generation sequencing, which reduces the sequencing error rate and achieves true single-molecule sequencing.

The key to sequencing success. 1. Sample preparation. 2. Choosing the right sequencing platform. 3. Downstream bioinformatics analysis.

Bioinformatics analysis: Introduction. Some sequencing techniques are commercially available (e.g. 454 Sequencing, Solexa). 454 Sequencing produces reads of about 100-200 bp; Solexa produces reads of about 30 bp.

Introduction. The Euler assembler (Pevzner 2001) used k-mers as the nodes of a de Bruijn graph. Reads are mapped as paths through the de Bruijn graph. High redundancy does not affect the number of nodes. “Velvet” deals effectively with experimental errors and repeats by using de Bruijn graphs built from k-mers.

De Bruijn Graphs - structure

De Bruijn Graphs – construction. Adjacent k-mers overlap by k-1 nucleotides. Each node is attached to a twin node, which holds the reverse series of reverse-complement k-mers; this captures overlaps between reads from opposite strands. The union of a node and its twin node is called a “block”.
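
The k-mer overlap and twin-node relationship described on this slide can be illustrated with a minimal Python sketch. The function names (`kmers`, `twin_kmers`) and the toy sequence are illustrative assumptions, not part of Velvet itself.

```python
# Illustrative sketch only: names and the toy sequence are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def kmers(seq, k):
    """List every k-mer of seq in order; neighbours overlap by k-1 bases."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def twin_kmers(seq, k):
    """k-mers of the twin node: reverse complements, in reverse order."""
    return [reverse_complement(km) for km in reversed(kmers(seq, k))]

forward = kmers("ACGTAG", 3)
twin = twin_kmers("ACGTAG", 3)
print(forward)
print(twin)
```

The twin series is the same list of k-mers one would get by reading the opposite strand, which is why a node and its twin can be treated as one block.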

De Bruijn Graphs – construction. For each k-mer, a hash table records the ID of the first read that contains it and its position in that read. Each k-mer is recorded together with its reverse complement. Reads are then traced through the graph, and a directed arc is created wherever one is needed.
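
As a rough sketch of this construction step, the Python below uses a dictionary as the hash table and a set of k-mer pairs as the arcs. The function name `build_graph`, the `"+"`/`"-"` strand flag, and the toy reads are assumptions made for illustration; Velvet's actual data structures are more compact.

```python
# Illustrative sketch only: build_graph and the strand flag are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def build_graph(reads, k):
    first_seen = {}  # k-mer -> (ID of first read, offset in that read, strand)
    arcs = set()     # directed arcs between consecutive k-mers of a read
    for read_id, read in enumerate(reads):
        previous = None
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            # Record the k-mer and its reverse complement, first occurrence only.
            first_seen.setdefault(kmer, (read_id, pos, "+"))
            first_seen.setdefault(reverse_complement(kmer), (read_id, pos, "-"))
            # Trace the read through the graph; the set ignores repeated arcs.
            if previous is not None:
                arcs.add((previous, kmer))
            previous = kmer
    return first_seen, arcs

first_seen, arcs = build_graph(["AAGTC", "AGTCC"], k=3)
print(first_seen["AGT"], len(first_seen), sorted(arcs))
```

The second read shares two k-mers with the first, so it adds only one new k-mer (plus its reverse complement) and one new arc, which illustrates why redundancy does not inflate the number of nodes.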

De Bruijn Graphs – simplification. Chains of blocks are simplified with no loss of information: if node A has only one outgoing arc, to node B, and node B has only one incoming arc, the two nodes are merged.
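
The merge rule can be sketched in Python as follows. This is a simplified illustration that identifies each node by its sequence string and ignores twin nodes; the function name `simplify_chains` and the toy graph are assumptions.

```python
# Illustrative sketch only: nodes are identified by their sequence strings.
def simplify_chains(nodes, arcs, k):
    nodes, arcs = set(nodes), set(arcs)
    while True:
        out_deg, in_deg = {}, {}
        for a, b in arcs:
            out_deg[a] = out_deg.get(a, 0) + 1
            in_deg[b] = in_deg.get(b, 0) + 1
        # Find A -> B where A has one outgoing arc and B has one incoming arc.
        pair = next(((a, b) for a, b in arcs
                     if a != b and out_deg[a] == 1 and in_deg[b] == 1), None)
        if pair is None:
            return nodes, arcs
        a, b = pair
        merged = a + b[k - 1:]  # B adds only the bases past the k-1 overlap
        rename = lambda n: merged if n in (a, b) else n
        nodes = {rename(n) for n in nodes}
        arcs = {(rename(x), rename(y)) for x, y in arcs if (x, y) != (a, b)}

nodes = {"AAG", "AGT", "GTC", "TCA", "TCG"}
arcs = {("AAG", "AGT"), ("AGT", "GTC"), ("GTC", "TCA"), ("GTC", "TCG")}
simple_nodes, simple_arcs = simplify_chains(nodes, arcs, k=3)
print(simple_nodes, simple_arcs)
```

The linear chain collapses into one node, while the branch after `GTC` is preserved because that node has two outgoing arcs.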

De Bruijn Graphs – error removal. Velvet focuses on “topological features” of the graph. First step: remove tips. A tip is a chain of nodes that is disconnected at one end. Two criteria are used: (1) length and (2) minority count. Length: a tip is removed if it is shorter than 2k bp, since two nearby errors can create a tip of up to 2k bp.

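De Bruijn Graphs – error removal. Minority count: the arc leading to the tip has multiplicity m, which is lower than the multiplicity n of another arc leaving the same node (m < n). In other words, starting from node B, going through the tip is an alternative to a more common path.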
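
The two tip criteria (length below 2k and minority count) can be combined in a short Python sketch. It assumes an already simplified graph in which a tip is a single dead-end node; the function name `removable_tips`, the node names, and the numbers are hypothetical.

```python
# Illustrative sketch only: assumes each tip is a single dead-end node.
def removable_tips(lengths, out_arcs, k):
    """lengths: node -> length in bp; out_arcs: node -> {successor: multiplicity}."""
    tips = []
    for junction, successors in out_arcs.items():
        for node, m in successors.items():
            dead_end = not out_arcs.get(node)   # nothing leaves this node
            short = lengths[node] < 2 * k       # length criterion
            # Minority count: some other arc from the same junction is more common.
            minority = any(m < n for other, n in successors.items() if other != node)
            if dead_end and short and minority:
                tips.append(node)
    return sorted(tips)

lengths = {"B": 300, "tip": 40, "C": 800, "A": 500}
out_arcs = {"B": {"tip": 2, "C": 25}, "C": {"A": 25}}
tips = removable_tips(lengths, out_arcs, k=31)
print(tips)
```

Node `A` is also a dead end, but it is long and has no competing arc, so only the short, rarely traversed branch is flagged.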

De Bruijn Graphs – error removal. Second step: remove bubbles using Tour Bus. A bubble consists of redundant paths that start and end at the same nodes. Bubbles are created by errors or by biological variants such as SNPs.

De Bruijn Graphs – error removal. Tour Bus: 1. Detect redundant paths. 2. Compare them using dynamic programming methods. 3. If similar, merge them.
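
Steps 2 and 3 can be sketched in Python with a standard edit-distance dynamic program. The function names, the `max_divergence` threshold of 0.2, and the choice to keep the higher-coverage path are assumptions of this sketch, not a description of Velvet's exact rules.

```python
# Illustrative sketch only: threshold and tie-breaking rule are assumptions.
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]

def merge_bubble(path_a, path_b, cov_a, cov_b, max_divergence=0.2):
    """Return the path to keep if the two paths are similar, otherwise None."""
    divergence = edit_distance(path_a, path_b) / max(len(path_a), len(path_b))
    if divergence > max_divergence:
        return None  # too different: leave both paths in the graph
    return path_a if cov_a >= cov_b else path_b

kept = merge_bubble("ACGTACGT", "ACGAACGT", cov_a=30, cov_b=2)
print(kept)
```

Two paths that differ by a single substitution are merged, while paths that differ everywhere are left alone because they may represent genuinely different sequence.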

De Bruijn Graphs – error removal. Third step: remove erroneous connections. This is done after the Tour Bus algorithm, using a basic coverage cutoff. Genuine short nodes that cannot be simplified in the graph should have high coverage.
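
A coverage cutoff of this kind amounts to a simple filter, sketched below in Python. The function name, node names, coverage values, and the cutoff of 5 are hypothetical.

```python
# Illustrative sketch only: names, values, and the cutoff are hypothetical.
def apply_coverage_cutoff(coverage, cutoff):
    """Keep only nodes whose coverage reaches the cutoff."""
    return {node: cov for node, cov in coverage.items() if cov >= cutoff}

coverage = {"n1": 28.0, "n2": 1.5, "n3": 31.2}
kept_nodes = apply_coverage_cutoff(coverage, cutoff=5)
print(sorted(kept_nodes))
```

Running this step after tip and bubble removal matters: by then, low coverage is a much stronger sign of an erroneous connection than it would be on the raw graph.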

Breadcrumb: resolution of repeats. 1. Using read pairs, pair up the long nodes. 2. Flag paired reads using unambiguous long nodes.

Breadcrumb: resolution of repeats. The nodes are extended as far as possible using the flagged paired reads. All nodes between A and B are paired up to either A or B.

Experimental Results. The error removal pipeline was tested on simulated data. Simulated reads were drawn from E. coli, S. cerevisiae, C. elegans, and H. sapiens.

Experimental Results. The error removal pipeline was tested on experimental data. A 173,428 bp human BAC was sequenced using Solexa machines. Reads were 35 bp long, and k = 31. Tour Bus increased sensitivity by correcting errors while preserving the integrity of the graph structure.

Experimental Results (cont)

Conclusions. Velvet is a de Bruijn graph based sequence assembly method for short reads. Errors are handled by removing tips and by the Tour Bus algorithm. A large number of repeats are resolved by the Breadcrumb algorithm. Velvet was assessed using simulated and real datasets, and it performed well.