CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow.


Nature Biotechnology, volume 29, number 11, November 2011

Supplementary Figure 1. De Bruijn graph from reads with sequencing errors. (a) A de Bruijn graph E on our set of reads with k = 4. Finding an Eulerian cycle is already a straightforward task, but for this value of k, it is trivial. (b) If TGGAGTG is incorrectly sequenced as a sixth read (in addition to the correct TGGCGTG read), then the result is a bulge in the de Bruijn graph, which complicates assembly. (Supplementary materials from the Compeau, Pevzner, and Tesler paper, Nature Biotech, 2011)
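
The bulge in panel (b) can be reproduced with a minimal sketch. It uses only the two reads named in the caption (TGGCGTG and TGGAGTG) with k = 4; the other reads in the figure are not listed in this transcript, so they are left out. The function names `de_bruijn_edges` and `branching_nodes` are illustrative, not taken from the paper.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers it points to.

    Every k-mer in a read contributes one edge from its prefix
    (first k-1 characters) to its suffix (last k-1 characters).
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branching_nodes(graph):
    """Return the nodes with more than one distinct successor."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)

correct = de_bruijn_edges(["TGGCGTG"], k=4)
with_error = de_bruijn_edges(["TGGCGTG", "TGGAGTG"], k=4)

print(branching_nodes(correct))     # []
print(branching_nodes(with_error))  # ['TGG']
```

With the erroneous read added, node TGG gains a second successor (GGA), and the two paths TGG, GGC, GCG, CGT, GTG and TGG, GGA, GAG, AGT, GTG rejoin at GTG. That pair of parallel paths is the bulge.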

(c) An illustration of a de Bruijn graph E with many bulges. The process of bulge removal should leave only the red edges remaining, yielding an Eulerian path in the resulting graph. (Supplementary materials from the Compeau, Pevzner, and Tesler paper, Nature Biotech, 2011)
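
The slide does not say how bulges are removed. One simple heuristic, shown here purely as an assumption and not as the procedure from the paper, is to discard k-mers (edges) that are seen fewer than some minimum number of times, on the idea that an erroneous read is rarer than a correct one. The read multiplicities and the `min_count` threshold below are made up for illustration.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count how often each k-mer (graph edge) occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def drop_low_coverage_edges(counts, min_count):
    """Keep only the k-mers observed at least min_count times (assumed heuristic)."""
    return {kmer for kmer, c in counts.items() if c >= min_count}

# Hypothetical data: the correct read seen three times, the erroneous read once.
reads = ["TGGCGTG"] * 3 + ["TGGAGTG"]
kept = drop_low_coverage_edges(kmer_counts(reads, k=4), min_count=2)
print(sorted(kept))  # ['CGTG', 'GCGT', 'GGCG', 'TGGC']
```

This filters edges by coverage only; it does not compare the two parallel paths of a bulge directly, which a fuller bulge-removal step would need to do.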

(Supplementary materials from the Compeau, Pevzner, and Tesler paper, Nature Biotech, 2011)

N50 The N50 value is the size of the smallest contig (or scaffold) such that 50% of the genome is contained in contigs of size N50 or larger. This is the standard metric used to evaluate the quality of an assembly. Salzberg et al. computed “corrected N50” values by splitting contigs (or scaffolds) where errors are identified.
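
The definition above can be sketched in a few lines. The contig lengths are invented for illustration, and `genome_size` defaults to the total assembled length, which is an assumption made when the true genome size is not known. The `split_at` helper is a hypothetical stand-in for the "corrected N50" idea of breaking a contig wherever an error is identified.

```python
def n50(contig_lengths, genome_size=None):
    """Smallest contig length L such that contigs of length >= L cover
    at least half of genome_size (default: the total assembled length)."""
    total = genome_size if genome_size is not None else sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # the contigs never reach half of genome_size

def split_at(length, breakpoints):
    """Lengths of the pieces left after cutting a contig at the given positions."""
    cuts = [0] + sorted(breakpoints) + [length]
    return [b - a for a, b in zip(cuts, cuts[1:]) if b > a]

lengths = [80, 70, 50, 40, 30, 20, 10]       # made-up contig lengths, total 300
print(n50(lengths))                          # 70

# Suppose an error is identified at position 30 of the 80-long contig.
corrected = split_at(80, [30]) + lengths[1:]
print(n50(corrected))                        # 50
```

Splitting the longest contig at a single error drops the value from 70 to 50, which shows why a corrected N50 can be much lower than the raw one.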

From Mihai Pop’s paper

Differing Conclusions Compeau et al.: “De Bruijn graphs are not a cure-all … Short read sequencing technologies … favor the use of de Bruijn graphs … and are also well suited to representing genomes with repeats. However, if a future sequencing technology produces high quality reads with tens of thousands of bases, …, the pendulum could swing back toward favoring overlap-based approaches for assembly.”

Mihai Pop’s conclusion

Salzberg’s conclusions