OPERA: Reconstructing optimal genomic scaffolds with high-throughput paired-end sequences



Overview: Preliminaries, Methods, Results

Preliminaries

Schematic of the process

Assembly in Brief
Contig construction: Overlapping reads are merged into longer segments called “contigs”.
Mapping: Aligning paired-end reads to the contigs yields a graph whose nodes are contigs and whose edges are read pairs linking them.
Filtering: Inconsistent edges are removed.
Scaffolding: The genome is reconstructed by determining the order, orientation, and relative distances of the contigs.
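
The mapping and filtering steps can be sketched in Python under simplifying assumptions. Each mapped read pair is reduced to the pair of contig names its two mates align to, and an edge is kept only if enough read pairs support it. The function name `build_scaffold_graph`, the `min_support` threshold, and the support-count filter are illustrative assumptions, not OPERA's actual filtering rule.

```python
from collections import Counter


def build_scaffold_graph(read_pairs, min_support=2):
    """Bundle read pairs into contig-to-contig edges (hypothetical helper).

    read_pairs: iterable of (contig_of_mate1, contig_of_mate2).
    Returns {(contig_u, contig_v): number_of_supporting_pairs}.
    """
    support = Counter()
    for c1, c2 in read_pairs:
        if c1 == c2:
            # Both mates fall in the same contig: no linking information.
            continue
        # Treat the edge as undirected by sorting its endpoints.
        support[tuple(sorted((c1, c2)))] += 1
    # Assumed filter: drop edges backed by too few read pairs.
    return {edge: n for edge, n in support.items() if n >= min_support}


pairs = [("A", "B"), ("B", "A"), ("A", "A"), ("B", "C")]
graph = build_scaffold_graph(pairs, min_support=2)
```

With the sample input, only the A-B edge survives because it is the only one supported by two read pairs.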

Sequence Assembly

Related Works

Methods

Concordance and Scaffold Graph

Concordance and Scaffold Graph (Cont’d)
A paired read is concordant in a scaffold if the orientation it suggests is satisfied and the distance between the two reads is less than a specified maximum library size T.
Given a set of contigs and a mapping of paired reads onto the contigs, a scaffold graph G is a graph in which contigs are nodes, connected by scaffold edges that each represent multiple paired reads.
Scaffolding Problem: Given a scaffold graph G, find a scaffold S of the contigs that maximizes the number of concordant edges in the graph.
The decision version of the scaffolding problem is NP-complete.
OPERA proposes a dynamic programming method to solve the scaffolding problem.
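
The concordance test and the objective of the scaffolding problem can be illustrated with a small Python sketch. It makes several assumptions that are not in the slides: a scaffold is an ordered list of (contig, orientation) pairs, gaps between contigs have length zero, the distance spanned by an edge is approximated by the total length of the contigs lying between its endpoints, and the reverse-complemented scaffold is not treated as equivalent. The names `Edge`, `is_concordant`, and `count_concordant` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    a: str         # contig expected to come first
    b: str         # contig expected to come second
    a_orient: str  # orientation of a suggested by the read pairs, '+' or '-'
    b_orient: str  # orientation of b suggested by the read pairs


def is_concordant(edge, scaffold, lengths, T):
    """Return True if the edge is concordant in the given scaffold.

    scaffold: list of (contig, orientation) in left-to-right order.
    lengths:  dict mapping contig name to its length.
    T:        maximum library size.
    """
    pos = {c: i for i, (c, _) in enumerate(scaffold)}
    orient = dict(scaffold)
    if edge.a not in pos or edge.b not in pos:
        return False
    i, j = pos[edge.a], pos[edge.b]
    if i >= j:
        return False  # suggested order is violated
    if orient[edge.a] != edge.a_orient or orient[edge.b] != edge.b_orient:
        return False  # suggested orientation is violated
    # Assumed distance model: sum of the lengths of the contigs in between.
    distance = sum(lengths[c] for c, _ in scaffold[i + 1:j])
    return distance < T


def count_concordant(edges, scaffold, lengths, T):
    """Objective value of a scaffold: its number of concordant edges."""
    return sum(is_concordant(e, scaffold, lengths, T) for e in edges)


lengths = {"A": 100, "B": 500, "C": 100}
scaffold = [("A", "+"), ("B", "+"), ("C", "+")]
edges = [
    Edge("A", "B", "+", "+"),  # adjacent, orientations agree
    Edge("A", "C", "+", "+"),  # spans contig B (500), too far for T = 300
    Edge("B", "C", "+", "-"),  # orientation of C disagrees
]
score = count_concordant(edges, scaffold, lengths, T=300)
```

Only the first edge is concordant in this example; the second exceeds T and the third has a conflicting orientation.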

Scaffolding Problem
For a scaffold graph G=(V,E), a partial scaffold S’ is a scaffold on a subset of the contigs (vertices).
For a partial scaffold S’, the dangling set D(S’) is the set of edges from S’ to V-S’.
The active region A(S’) is the shortest suffix of S’ such that all dangling edges are adjacent to a contig in A(S’).
A partial scaffold S’ is said to be valid if all edges in the induced subgraph are concordant.
If S’1 and S’2 are two valid partial scaffolds of G with the same active region and dangling set, then they contain the same set of contigs, and either both or neither of them can be extended to a solution.
Given a scaffold graph G=(V,E) and an empty scaffold, the algorithm “Scaffold-Bounded-Width” returns a scaffold S of G with no discordant edges and runs in [running-time bound not preserved], where w is the library width.
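
The dangling set and active region definitions translate directly into code. The following Python sketch assumes a partial scaffold is an ordered list of contig names and edges are plain (u, v) pairs, ignoring orientations; the function names and the `state_key` helper are hypothetical. The key shows how two partial scaffolds with the same active region and dangling set could be treated as one state in a dynamic program, which is the idea behind the equivalence stated above.

```python
def dangling_set(partial, edges):
    """Edges with exactly one endpoint inside the partial scaffold."""
    inside = set(partial)
    return {e for e in edges if (e[0] in inside) != (e[1] in inside)}


def active_region(partial, edges):
    """Shortest suffix of `partial` touching every dangling edge."""
    inside = set(partial)
    touched = {c for e in dangling_set(partial, edges) for c in e if c in inside}
    if not touched:
        return []  # no dangling edges, so the empty suffix suffices
    # The suffix must start at the leftmost contig that has a dangling edge.
    first = min(i for i, c in enumerate(partial) if c in touched)
    return partial[first:]


def state_key(partial, edges):
    """Hashable (active region, dangling set) pair, usable as a memo key."""
    return (tuple(active_region(partial, edges)),
            frozenset(dangling_set(partial, edges)))


edges = [("A", "B"), ("B", "D"), ("C", "E")]
s1 = ["X", "A", "B", "C"]
s2 = ["A", "X", "B", "C"]
region = active_region(s1, edges)
same_state = state_key(s1, edges) == state_key(s2, edges)
```

Here `s1` and `s2` order their first two contigs differently but share the active region B, C and the same dangling edges, so a memoized search would need to explore only one of them.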

Scaffolding Problem (Cont’d)

Consider a graph G=(V,E) and let p be the maximum allowed number of discordant edges. The algorithm “Scaffold” returns a scaffold S of G with at most p discordant edges and runs in [running-time bound not preserved].

Scaffolding Problem (Cont’d)

Results

Run Time Comparison

Scaffold Contiguity

Scaffold Correctness

Scaffold Correctness (Cont’d)