OPERA: Reconstructing optimal genomic scaffolds with high-throughput paired-end sequences
Overview: Preliminaries, Methods, Results
Preliminaries
Schematic of the process
Assembly in Brief
Contig construction: overlapping reads are merged into longer segments called “contigs”.
Mapping: aligning paired-end reads to the contigs yields a graph whose nodes are contigs and whose edges are paired reads.
Filtering: inconsistent edges are removed.
Scaffolding: the whole genome is reconstructed by determining the order, orientation, and relative distances of the contigs.
Sequence Assembly
Related Works
Methods
Concordance and Scaffold Graph
Concordance and Scaffold Graph (Cont’d)
A paired read is concordant in a scaffold if its suggested orientation is satisfied and the distance between its two reads is less than a specified maximum library size T.
Given a set of contigs and a mapping of paired reads onto the contigs, a scaffold graph G is a graph whose nodes are contigs, connected by scaffold edges that each represent multiple paired reads.
Scaffolding Problem: given a scaffold graph G, find a scaffold S of the contigs that maximizes the number of concordant edges in the graph.
The decision version of the scaffolding problem is NP-complete.
OPERA proposes a dynamic programming method to solve the scaffolding problem.
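The concordance definition above can be illustrated with a minimal Python sketch. It is not OPERA's implementation: the names `Placed`, `Edge`, `is_concordant`, and `count_concordant` are hypothetical, each scaffold edge is assumed to suggest "a precedes b with the given orientations", and the gap between the two contigs is used as a simplified stand-in for the distance between the reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placed:
    """A contig placed in a scaffold (hypothetical model)."""
    name: str
    forward: bool  # orientation of the contig in the scaffold
    start: int     # start coordinate in the scaffold
    length: int


@dataclass(frozen=True)
class Edge:
    """Scaffold edge suggesting: a (a_forward) precedes b (b_forward)."""
    a: str
    b: str
    a_forward: bool
    b_forward: bool


def is_concordant(edge, scaffold, T):
    pa, pb = scaffold[edge.a], scaffold[edge.b]
    if pa.start <= pb.start:
        # a precedes b: both orientations must match the suggestion.
        orient_ok = pa.forward == edge.a_forward and pb.forward == edge.b_forward
        first, second = pa, pb
    else:
        # b precedes a: equivalent only if both contigs are flipped
        # (the same scaffold read from the opposite strand).
        orient_ok = pa.forward != edge.a_forward and pb.forward != edge.b_forward
        first, second = pb, pa
    # Assumption: contig gap approximates the distance between the reads.
    gap = second.start - (first.start + first.length)
    return orient_ok and gap < T


def count_concordant(edges, scaffold, T):
    return sum(is_concordant(e, scaffold, T) for e in edges)


example_scaffold = {
    "c1": Placed("c1", True, 0, 1000),
    "c2": Placed("c2", True, 1200, 800),
    "c3": Placed("c3", False, 2500, 500),
}
example_edges = [
    Edge("c1", "c2", True, True),
    Edge("c2", "c3", True, False),
    Edge("c1", "c3", True, True),  # c3 is placed reversed, so this one fails
]
print(count_concordant(example_edges, example_scaffold, T=1000))
```

The scaffolding problem then asks for the placement that maximizes the value returned by `count_concordant`.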
Scaffolding Problem
For a scaffold graph G=(V,E), a partial scaffold S’ is a scaffold on a subset of the contigs (vertices).
For a partial scaffold S’, the dangling set D(S’) is the set of edges from S’ to V-S’.
The active region A(S’) is the shortest suffix of S’ such that all dangling edges are adjacent to a contig in A(S’).
A partial scaffold S’ is said to be valid if all edges in the induced subgraph are concordant.
If S’1 and S’2 are two valid partial scaffolds of G with the same active region and dangling set, then they contain the same set of contigs, and either both or neither of them can be extended to a solution.
Given a scaffold graph G=(V,E) and an empty scaffold, the algorithm “Scaffold-Bounded-Width” returns a scaffold S of G with no discordant edges, with a running time bounded in terms of w, the library width.
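The dangling set and active region can be computed directly from their definitions. The sketch below is illustrative only: it ignores contig orientation, represents a partial scaffold as an ordered list of contig names and edges as name pairs, and the function names (`dangling_set`, `active_region`, `state_key`) are hypothetical. The key returned by `state_key` reflects the equivalence stated above, which is what lets a dynamic program keep one representative per (active region, dangling set) pair.

```python
def dangling_set(partial, edges):
    """Edges with exactly one endpoint inside the partial scaffold."""
    inside = set(partial)
    return {(u, v) for u, v in edges if (u in inside) != (v in inside)}


def active_region(partial, edges):
    """Shortest suffix of `partial` containing every contig that is
    adjacent to a dangling edge (empty if nothing dangles)."""
    inside = set(partial)
    touched = {u if u in inside else v for u, v in dangling_set(partial, edges)}
    if not touched:
        return []
    first = min(partial.index(c) for c in touched)
    return partial[first:]


def state_key(partial, edges):
    """Hashable key: partial scaffolds sharing it are interchangeable."""
    return tuple(active_region(partial, edges)), frozenset(dangling_set(partial, edges))


example_partial = ["c1", "c2", "c3"]
example_graph = [("c1", "c2"), ("c2", "c3"), ("c2", "c4"), ("c3", "c5")]
print(dangling_set(example_partial, example_graph))
print(active_region(example_partial, example_graph))
```

Here c1 has no edge leaving the partial scaffold, so it falls outside the active region.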
Scaffolding Problem (Cont’d)
Consider a scaffold graph G=(V,E) and let p be the maximum allowed number of discordant edges. The algorithm “Scaffold” returns a scaffold S of G with at most p discordant edges, with a running time that also depends on p.
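To make the role of p concrete, here is a brute-force baseline in Python. It is not the “Scaffold” algorithm: it simply tries every contig ordering, which is exponential in the number of contigs. The model is a deliberate simplification (no orientations, contigs placed back to back, an edge (a, b) counted as concordant when a precedes b with a gap below T), and the names `discordant_count` and `brute_force_scaffold` are hypothetical.

```python
from itertools import permutations


def discordant_count(order, edges, lengths, T):
    """Number of edges violated by an ordering (simplified model)."""
    pos, offset = {}, 0
    for c in order:
        pos[c] = offset          # contigs are placed back to back
        offset += lengths[c]
    bad = 0
    for a, b in edges:
        gap = pos[b] - (pos[a] + lengths[a])
        if pos[a] >= pos[b] or gap >= T:
            bad += 1
    return bad


def brute_force_scaffold(contigs, edges, lengths, T, p):
    """First ordering with at most p discordant edges, or None."""
    for order in permutations(contigs):
        if discordant_count(order, edges, lengths, T) <= p:
            return list(order)
    return None


example_lengths = {"c1": 1000, "c2": 800, "c3": 500}
# The three edges form a cycle, so no ordering satisfies all of them.
example_cycle = [("c1", "c2"), ("c2", "c3"), ("c3", "c1")]
print(brute_force_scaffold(["c1", "c2", "c3"], example_cycle, example_lengths, T=1000, p=0))
print(brute_force_scaffold(["c1", "c2", "c3"], example_cycle, example_lengths, T=1000, p=1))
```

With p=0 the search fails because of the cycle; allowing one discordant edge makes the instance solvable.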
Scaffolding Problem (Cont’d)
Results
Run Time Comparison
Scaffold Contiguity
Scaffold Correctness
Scaffold Correctness (Cont’d)