1
CSCI2950-C Genomes, Networks, and Cancer
Computability of Models for Sequence Assembly
2
Outline
- Some Terminology
- Other Algorithms
- Assembly of Double-Stranded DNA with Bidirected Flow
- Chinese Postman Problem ~ Eulerization Problem
- Bidirected de Bruijn Graph
- Discussion
3
String Terminology
Let v and w be two strings over the alphabet Σ.
- v.w : concatenation of v and w
- |v| : length of v
- v[i] : the ith character of v
- v[i,j] : substring of v beginning at the ith character and ending at the jth character
- v^k : v concatenated with itself k times
- ∃ i,j s.t. v = w[i,j] : v is a substring of w
4
String Terminology
- A string of length k is called a k-mer.
- The set of all k-mers that are substrings of v is called the k-spectrum of v.
- A pair of reverse-complement k-mers is called a k-molecule.
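The definitions above can be sketched in a few lines of Python. This is only an illustration for the DNA alphabet {A, C, G, T}; the function names are assumptions made for this example and do not come from the slides.

```python
from typing import FrozenSet, Set

# Watson-Crick complement of each base (DNA alphabet assumed).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(s: str) -> str:
    """Complement every base and reverse the result."""
    return "".join(COMPLEMENT[c] for c in reversed(s))


def k_spectrum(v: str, k: int) -> Set[str]:
    """Set of all k-mers that are substrings of v."""
    return {v[i:i + k] for i in range(len(v) - k + 1)}


def k_molecule_spectrum(v: str, k: int) -> Set[FrozenSet[str]]:
    """Each k-molecule is the unordered pair {k-mer, reverse complement}."""
    return {frozenset((kmer, reverse_complement(kmer)))
            for kmer in k_spectrum(v, k)}


if __name__ == "__main__":
    genome = "ATTGCCAAC"  # the example genome used later in the slides
    print(sorted(k_spectrum(genome, 3)))
    # TTG and CAA are reverse complements of each other, so under this
    # definition they collapse into a single k-molecule.
    print(len(k_molecule_spectrum(genome, 3)))
```

Storing a k-molecule as a `frozenset` makes the pair unordered, so a k-mer and its reverse complement always map to the same object.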
5
Graph Terminology
7
The String Graph Framework & The de Bruijn Graph Framework: NP-hard
8
Assembly of Double-Stranded DNA with Bidirected Flow
9
Recall that... Given a weighted bidirected graph G:
- Chinese Walk ~ a cyclical walk that traverses each edge at least once
- Chinese Postman Problem (CPP) ~ finding a minimum-weight Chinese Walk of G, or reporting that no such walk exists
- Eulerization Problem (EP) ~ finding a minimum-weight Eulerian extension of G, or reporting that no such extension exists
10
Theorem 1: Given a bidirected graph G,
G contains an Eulerian tour if and only if it is connected and balanced.
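The two conditions of Theorem 1 are easy to test directly. The sketch below assumes a representation that is not given in the slides: each edge is a tuple `(u, sign_u, v, sign_v)`, where the sign is +1 for a positive-incident end and -1 for a negative-incident end, and "balanced" is taken to mean that every vertex has as many positive-incident as negative-incident edge ends.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple

# (u, sign at u, v, sign at v); signs are +1 or -1 (assumed encoding).
Edge = Tuple[Hashable, int, Hashable, int]


def is_balanced(edges: List[Edge]) -> bool:
    """True if the incidence signs sum to zero at every vertex."""
    total = defaultdict(int)
    for u, su, v, sv in edges:
        total[u] += su
        total[v] += sv
    return all(t == 0 for t in total.values())


def is_connected(vertices: Iterable[Hashable], edges: List[Edge]) -> bool:
    """True if the underlying undirected graph is connected."""
    vertices = set(vertices)
    if not vertices:
        return True
    adj = defaultdict(set)
    for u, _, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:  # depth-first search from an arbitrary vertex
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return vertices <= seen


def has_eulerian_tour(vertices: Iterable[Hashable], edges: List[Edge]) -> bool:
    """Criterion of Theorem 1: connected and balanced."""
    vertices = set(vertices)
    return is_connected(vertices, edges) and is_balanced(edges)


if __name__ == "__main__":
    triangle = [("A", 1, "B", -1), ("B", 1, "C", -1), ("C", 1, "A", -1)]
    print(has_eulerian_tour({"A", "B", "C"}, triangle))
    print(has_eulerian_tour({"A", "B", "C"}, triangle[:2]))
```

The triangle is a hypothetical test graph; dropping its last edge leaves A and C unbalanced, so the criterion fails.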
11
Theorem 2: Given a weighted bidirected graph G, the following are equivalent:
(1) There exists a Chinese walk of weight i.
(2) There exists an Eulerian extension of weight i.
12
Proof (1 ⇒ 2)
W : a Chinese walk in G
Construct a new graph W2, induced by W, in which the multiplicity of each edge is the number of times it is traversed by W.
15
Proof (1 ⇒ 2)
W visits every edge of G at least once ⇒ W2 is an extension of G
+ W visits every edge of W2 exactly once ⇒ W is an Eulerian circuit of W2
⇒ W2 is an Eulerian extension of G
16
Proof (2 ⇒ 1)
G2 : an Eulerian extension of G
17
Proof (2 ⇒ 1)
W2 : an Eulerian circuit in G2 with weight i
W2 : AaBbCcBbCfDeCgAdDeCgA
18
Proof (2 ⇒ 1)
Construct W from W2 by replacing every edge e’ ∉ G by an edge e ∈ G such that e’ is a duplicate of e.
W : AaBbCcBbCfDeCgAdDeCgA
19
Proof (2 ⇒ 1)
W : AaBbCcBbCfDeCgAdDeCgA
W is a cyclical walk in G that traverses every edge at least once, and its weight is the same as the weight of W2, namely i.
⇒ W is a Chinese Walk with weight i
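The correspondence between a Chinese walk and an Eulerian extension can be checked on the walk from the proof. The sketch below assumes the slides' notation, in which uppercase letters are vertices and lowercase letters are edges; the helper names are made up for this example. Counting how often each edge is traversed gives its multiplicity in the induced graph W2, and every traversal beyond the first is one added copy.

```python
from collections import Counter
from typing import Dict


def edge_multiplicities(walk: str) -> Counter:
    """Number of times each edge (lowercase letter) is traversed."""
    return Counter(ch for ch in walk if ch.islower())


def extra_copies(walk: str) -> Dict[str, int]:
    """Copies that must be added to G so the walk becomes Eulerian."""
    counts = edge_multiplicities(walk)
    return {e: m - 1 for e, m in sorted(counts.items()) if m > 1}


if __name__ == "__main__":
    walk = "AaBbCcBbCfDeCgAdDeCgA"
    assert walk[0] == walk[-1]  # the walk is cyclical
    print(extra_copies(walk))   # {'b': 1, 'e': 1, 'g': 1}
```

For this walk, edges b, e and g are each traversed twice, so the induced extension adds exactly one copy of each.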
20
Given a weighted bidirected graph G,
Theorem 1: G contains an Eulerian tour if and only if it is connected and balanced.
Theorem 2: There exists a Chinese walk of weight i if and only if there exists an Eulerian extension of weight i.
21
A Polynomial Time Algorithm for CPP
~ based on Theorems 1 & 2
Given a weighted bidirected graph G,
- If G is not connected, any extension of G is also not connected ⇒ no Chinese Walk exists
22
A Polynomial Time Algorithm for CPP
- If G is connected, formulate EP as a min-cost bidirected flow problem as follows (G’ is the desired extension of G):
Constants: w_e : weight of edge e
Variables: f_e : number of additional copies of edge e required to extend G to G’
23
A Polynomial Time Algorithm for CPP
Constraints ~ using Theorem 1
(1) for each vertex x: [equation not preserved]
(2) for each edge e: [equation not preserved]
24
A Polynomial Time Algorithm for CPP
Integer Programming Model:
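The model itself did not survive in the text. One formulation consistent with the constants, variables and constraints described on the preceding slides is sketched below. The symbol I_{x,e} is an assumption introduced here: it denotes the sum of the incidence signs of edge e at vertex x (+1 for each positive-incident end, -1 for each negative-incident end, 0 if e does not touch x).

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{e \in E} w_e\, f_e \\
\text{subject to}\quad & \sum_{e \in E} I_{x,e}\,(1 + f_e) = 0
                         && \text{for each vertex } x \in V, \\
                       & f_e \ge 0,\ f_e \in \mathbb{Z}
                         && \text{for each edge } e \in E.
\end{aligned}
```

Under this reading, the first constraint says that G’, which contains 1 + f_e copies of each edge e, is balanced at every vertex, and the objective is the total weight of the inserted copies.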
25
A Polynomial Time Algorithm for CPP
Soundness of the Algorithm
G is connected ⇒ G’ is connected
+ Constraint 1 ⇒ G’ is balanced
⇒ G’ is Eulerian
26
A Polynomial Time Algorithm for CPP
Is G’ a min-weight Eulerian extension?
Yes! The objective function minimizes the total weight of the inserted edges.
28
A Polynomial Time Algorithm for CPP
Pseudo-code
IF G is not connected
    RETURN “no Chinese walk exists”
ELSE
    Solve EP as a Min-Cost Flow Problem
    IF there is no feasible solution
        RETURN “no Chinese walk exists”
    ELSE
        Construct G’
        Find an Eulerian circuit of G’
        Find the corresponding Chinese walk
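To see what the flow step computes, the Eulerization Problem can be solved by brute force on a tiny graph. This is not the polynomial-time min-cost bidirected flow algorithm from the slides: it is an exponential search over bounded copy counts, meant only to illustrate the objective and the balance constraint. The edge encoding `(name, u, sign_u, v, sign_v, weight)`, the `max_copies` bound and the example graph are all assumptions of this sketch, and connectivity is assumed to have been checked beforehand.

```python
from itertools import product
from typing import Dict, Hashable, List, Optional, Tuple

# (name, u, sign at u, v, sign at v, weight); signs are +1 or -1.
Edge = Tuple[str, Hashable, int, Hashable, int, float]


def imbalance(edges: List[Edge], copies: Dict[str, int]) -> Dict[Hashable, int]:
    """Signed degree of every vertex in the extended graph G'."""
    total: Dict[Hashable, int] = {}
    for name, u, su, v, sv, _w in edges:
        m = 1 + copies[name]  # original edge plus inserted copies
        total[u] = total.get(u, 0) + su * m
        total[v] = total.get(v, 0) + sv * m
    return total


def min_weight_eulerization(edges: List[Edge],
                            max_copies: int = 2) -> Optional[Dict[str, int]]:
    """Cheapest copy counts f_e that balance every vertex, or None."""
    names = [e[0] for e in edges]
    best, best_cost = None, None
    for choice in product(range(max_copies + 1), repeat=len(edges)):
        copies = dict(zip(names, choice))
        if any(imbalance(edges, copies).values()):
            continue  # some vertex is unbalanced
        cost = sum(e[5] * copies[e[0]] for e in edges)
        if best_cost is None or cost < best_cost:
            best, best_cost = copies, cost
    return best


if __name__ == "__main__":
    # Hypothetical graph: two cheap antiparallel edges and one costly edge.
    g = [("a", "A", 1, "B", -1, 1.0),
         ("b", "B", 1, "A", -1, 1.0),
         ("c", "A", 1, "B", -1, 5.0)]
    print(min_weight_eulerization(g))  # {'a': 0, 'b': 1, 'c': 0}
```

A return value of `None` corresponds to the "no feasible solution" branch of the pseudo-code, within the chosen `max_copies` bound.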
30
A Polynomial Time Algorithm for CPP
Running Time? O(|E| log (|V|))
33
A Polynomial Time Algorithm for CPP
Optimal Solution: f_b = 1, f_e = 1, f_g = 1; all other variables are zero
34
A Polynomial Time Algorithm for CPP
? A Polynomial Time Algorithm for Sequence Assembly
35
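A Polynomial Time Algorithm for Sequence Assembly
Input: k-molecule spectrum of the genome (each column is one k-molecule; the lower k-mer is the complement of the upper one, so reading it right to left gives the reverse complement)
ATT   TTG   TGC   GCC   CCA   CAA   AAC
TAA   AAC   ACG   CGG   GGT   GTT   TTG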
36
A Polynomial Time Algorithm for Sequence Assembly
In each k-molecule, arbitrarily label one k-mer as positive and the other as negative:
-ATT   -TTG   +TGC   +GCC   +CCA   +CAA   +AAC
 TAA+   AAC+   ACG-   CGG-   GGT-   GTT-   TTG-
37
A Polynomial Time Algorithm for Sequence Assembly
Construct nodes from all possible (k-1)-molecules:
-AT   -TT   +TG   +GC   +CC   +CA   +AC
 TA+   AA+   AC-   CG-   GG-   GT-   TG-
For every k-molecule in the spectrum, let z be one of its two k-mers.
Let x and y be the (k-1)-mers corresponding to z[1..k-1] and z[2..k], respectively.
38
A Polynomial Time Algorithm for Sequence Assembly
Insert edges according to the following criteria:
- An edge is positive incident to x if x is from the positive strand, and negative incident otherwise.
- An edge is negative incident to y if y is from the positive strand, and positive incident otherwise.
Nodes:
-AT   -TT   +TG   +GC   +CC   +CA   +AC
 TA+   AA+   AC-   CG-   GG-   GT-   TG-
One edge is inserted per k-molecule, in this order:
-ATT/TAA+, -TTG/AAC+, +TGC/ACG-, +GCC/CGG-, +CCA/GGT-, +CAA/GTT-, +AAC/TTG-
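The edge-insertion criteria can be sketched as follows. The slides label strands arbitrarily; this example instead assumes a fixed convention (the lexicographically smaller of a (k-1)-mer and its reverse complement is the positive strand and names the node), so its signs need not match the labels on the slides. Palindromic (k-1)-mers such as AT and GC equal their own reverse complement and are simply treated as positive here.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(s: str) -> str:
    return "".join(COMPLEMENT[c] for c in reversed(s))


def canonical(s: str) -> str:
    """Positive strand of the molecule containing s (assumed convention)."""
    return min(s, reverse_complement(s))


# (node of x, sign at x, node of y, sign at y, k-mer that created the edge)
BiEdge = Tuple[str, int, str, int, str]


def build_edges(kmers: List[str]) -> List[BiEdge]:
    """One bidirected edge per k-mer z, from z[1..k-1] to z[2..k]."""
    edges = []
    for z in kmers:
        x, y = z[:-1], z[1:]
        # Positive incident to x if x is the positive strand of its node.
        sign_x = 1 if x == canonical(x) else -1
        # Negative incident to y if y is the positive strand of its node.
        sign_y = -1 if y == canonical(y) else 1
        edges.append((canonical(x), sign_x, canonical(y), sign_y, z))
    return edges


if __name__ == "__main__":
    for edge in build_edges(["ATT", "TTG", "TGC", "GCC", "CCA", "CAA", "AAC"]):
        print(edge)
```

With this convention TTG and CAA, which are reverse complements of each other, produce the same bidirected edge between nodes AA and CA, written once from each end.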
45
A Polynomial Time Algorithm for Sequence Assembly
[Figure: the completed bidirected graph on the nodes -AT/TA+, -TT/AA+, +TG/AC-, +GC/CG-, +CC/GG-, +CA/GT-, +AC/TG-]
46
A Polynomial Time Algorithm for Sequence Assembly
How to read the sequence?
- If a positive incident edge is used to enter the node, read the negative (k-1)-mer.
- If a negative incident edge is used to enter the node, read the positive (k-1)-mer.
47
A Polynomial Time Algorithm for Sequence Assembly
Sequence read from the graph: ATTGCCAAC
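The reading rule can be sketched as a small function that spells a sequence from a walk. The walk format is an assumption of this example: a list of pairs (positive (k-1)-mer of the node, sign of the edge end used to enter it). The first node has no entering edge, so it is given the sign it would have been entered with. The example walk uses the lexicographic labeling convention (smaller of a (k-1)-mer and its reverse complement is positive), not the arbitrary labels shown on the slides.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(s: str) -> str:
    return "".join(COMPLEMENT[c] for c in reversed(s))


def read_node(positive_mer: str, entered_by: int) -> str:
    """Entered via a negative-incident end: read the positive strand.
    Entered via a positive-incident end: read the negative strand."""
    if entered_by == -1:
        return positive_mer
    return reverse_complement(positive_mer)


def spell(walk: List[Tuple[str, int]]) -> str:
    """Concatenate the strands read along the walk, merging overlaps."""
    mers = [read_node(p, s) for p, s in walk]
    seq = mers[0]
    for m in mers[1:]:
        if not seq.endswith(m[:-1]):
            raise ValueError(f"{m} does not overlap the sequence so far")
        seq += m[-1]  # each step contributes one new character
    return seq


if __name__ == "__main__":
    walk = [("AT", -1), ("AA", 1), ("CA", 1), ("GC", -1),
            ("CC", -1), ("CA", -1), ("AA", -1), ("AC", -1)]
    print(spell(walk))  # ATTGCCAAC
```

Nodes AA and CA are each visited twice, once in each orientation, which is how the walk reads TT and TG on the first visit and AA and CA on the second.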
48
Future Work ...
- NP-hardness?
- Optimal solution? ~ parsimony assumption
49
Thanks... Any questions / comments?