CSCI2950-C Genomes, Networks, and Cancer


1 CSCI2950-C Genomes, Networks, and Cancer
Computability of Models for Sequence Assembly

2 Outline
Some Terminology
Other Algorithms
Assembly of Double-Stranded DNA with Bidirected Flow
Chinese Postman Problem ~ Eulerization Problem
Bidirected De Bruijn Graph
Discussion

3 String Terminology
Let v and w be two strings over the alphabet Σ
v.w : concatenation of v and w
|v| : length of v
v[i] : ith character of v
v[i,j] : substring of v, beginning at the ith character and ending at the jth character
v^k : v concatenated with itself k times
v is a substring of w : ∃ i,j s.t. v = w[i,j]
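The notation above maps fairly directly onto Python string operations. The following is only an illustrative sketch: the helper names (`char_at`, `substring`, `power`, `is_substring`) are made up for this example, and the only subtlety is that the slide's indices are 1-based and inclusive while Python slices are 0-based and half-open.

```python
def char_at(v: str, i: int) -> str:
    """v[i] in the slide's notation: the ith character, counting from 1."""
    return v[i - 1]


def substring(v: str, i: int, j: int) -> str:
    """v[i,j]: characters i through j inclusive, counting from 1."""
    return v[i - 1:j]


def power(v: str, k: int) -> str:
    """v^k: v concatenated with itself k times."""
    return v * k


def is_substring(v: str, w: str) -> bool:
    """True when some i, j exist with v = w[i,j]."""
    return v in w


w = "ATTGCCAAC"
print(len(w))              # |w|
print(substring(w, 4, 6))  # w[4,6]
print("ATT" + "GCC")       # concatenation v.w
```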

4 String Terminology
A string of length k is called a k-mer
The set of all k-mers that are substrings of v is called the k-spectrum of v
A pair of reverse complement k-mers is called a k-molecule
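A small sketch of these three definitions may help. It assumes the DNA alphabet A, C, G, T and represents a k-molecule as an unordered pair (a `frozenset`) of a k-mer and its reverse complement; the function names and the short example string are chosen for illustration only.

```python
# Assumption: strings are over the DNA alphabet {A, C, G, T}.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Complement every base, then reverse the string."""
    return s.translate(COMPLEMENT)[::-1]


def k_spectrum(v: str, k: int) -> set:
    """All k-mers that occur as substrings of v."""
    return {v[i:i + k] for i in range(len(v) - k + 1)}


def k_molecule_spectrum(v: str, k: int) -> set:
    """Each k-mer of v paired with its reverse complement."""
    return {frozenset((m, reverse_complement(m))) for m in k_spectrum(v, k)}


print(sorted(k_spectrum("ATTGC", 3)))
print(len(k_molecule_spectrum("ATTGC", 3)))
```

Note that a k-mer and its reverse complement collapse into the same k-molecule, so the k-molecule spectrum can be smaller than the k-spectrum when both orientations occur in v.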

5 Graph Terminology

6 Graph Terminology

7 The String Graph Framework & The De Bruijn Graph Framework: NP-hard

8 Assembly of Double-Stranded DNA with Bidirected Flow

9 Recall that...
Given a weighted bidirected graph G:
Chinese Walk ~ a cyclical walk that traverses each edge at least once
Chinese Postman Problem (CPP) ~ finding a minimum weight Chinese Walk of G, or reporting that no such walk exists
Eulerization Problem (EP) ~ finding a minimum weight Eulerian Extension of G, or reporting that no such extension exists

10 Theorem - 1 Given a bidirected graph G,
G contains an Eulerian tour if and only if it is connected and balanced
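Theorem 1 reduces the existence question to two simple checks. The sketch below is not taken from the slides: it assumes a bidirected edge is stored as a tuple `(u, sign_u, v, sign_v)` where each sign is `+1` (positive incident) or `-1` (negative incident), and it assumes "balanced" means that the signs of the edge ends at every vertex sum to zero. The Graph Terminology slides that would fix these definitions did not survive in the transcript, so treat both as assumptions.

```python
from collections import defaultdict


def is_connected(vertices, edges):
    """True if the underlying undirected graph is connected."""
    vertices = set(vertices)
    if not vertices:
        return True
    adj = defaultdict(set)
    for u, _su, v, _sv in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:  # depth-first search over the undirected adjacency
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == vertices


def is_balanced(vertices, edges):
    """Assumed definition: edge-end signs at every vertex sum to zero."""
    imbalance = {v: 0 for v in vertices}
    for u, su, v, sv in edges:
        imbalance[u] += su
        imbalance[v] += sv
    return all(d == 0 for d in imbalance.values())


def has_eulerian_tour(vertices, edges):
    """Theorem 1: Eulerian tour exists iff connected and balanced."""
    return is_connected(vertices, edges) and is_balanced(vertices, edges)


triangle = [("A", +1, "B", -1), ("B", +1, "C", -1), ("C", +1, "A", -1)]
print(has_eulerian_tour("ABC", triangle))
```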

11 Theorem - 2 Given a weighted bidirected graph G,
(1) there exists a Chinese walk of weight i if and only if (2) there exists an Eulerian extension of weight i

12 Proof (1 ⇒ 2) W : a Chinese walk in G
Construct a new graph W2, induced by W, in which the multiplicity of each edge is the number of times it is traversed by W

13 Proof (1 ⇒ 2), continued [figure: the graph G]

14 Proof (1 ⇒ 2), continued [figure: G and the induced graph W2]

15 Proof (1 ⇒ 2)
W visits every edge of G at least once ⇒ W2 is an extension of G
W visits every edge of W2 exactly once ⇒ W is an Eulerian circuit of W2
Together: W2 is an Eulerian extension of G

16 Proof (2 ⇒ 1) G2 : an Eulerian extension of G

17 Proof (2 ⇒ 1)
W2 : an Eulerian circuit in G2 with weight i
W2 : AaBbCcBbCfDeCgAdDeCgA

18 Proof (2 ⇒ 1)
Construct W from W2 by replacing every edge e’ ∉ G by an edge e ∈ G such that e’ is a duplicate of e.
W : AaBbCcBbCfDeCgAdDeCgA

19 Proof (2 ⇒ 1)
W : AaBbCcBbCfDeCgAdDeCgA
W is a cyclical walk in G which traverses every edge at least once, and its weight is the same as the weight of W2, i.
W is a Chinese Walk with weight i
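The correspondence used in both directions of the proof can be sketched on the example walk itself. The code assumes the encoding that the slides appear to use, with uppercase letters for vertices and lowercase letters for edges; the function names are illustrative. Counting how often each edge appears gives the edge multiplicities of the induced graph, and every traversal beyond the first is an extra copy that an Eulerian extension would have to add.

```python
from collections import Counter


def edge_multiplicities(walk: str) -> Counter:
    """Number of times each edge (lowercase label) is traversed by the walk."""
    return Counter(ch for ch in walk if ch.islower())


def extra_copies(walk: str) -> dict:
    """Copies beyond the first traversal, i.e. edges an extension must duplicate."""
    return {e: m - 1 for e, m in edge_multiplicities(walk).items() if m > 1}


walk = "AaBbCcBbCfDeCgAdDeCgA"
print(edge_multiplicities(walk))
print(extra_copies(walk))
```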

20 Given a weighted bidirected graph G,
Theorem - 1 G contains an Eulerian tour if and only if it is connected and balanced
Theorem - 2 There exists a Chinese walk of weight i if and only if there exists an Eulerian extension of weight i

21 A Polynomial Time Algorithm for CPP
~ based on Theorems 1 & 2
Given a weighted bidirected graph G,
- If G is not connected, any extension will also be disconnected ⇒ No Chinese Walk exists

22 A Polynomial Time Algorithm for CPP
If G is connected, formulate EP as a min-cost bidirected flow problem as follows (G’ is the desired extension of G):
Constants w_e : weight of edge e
Variables f_e : number of additional copies of edge e required to extend G to G’

23 A Polynomial Time Algorithm for CPP
Constraints ~ using Theorem - 1: one constraint for each vertex x and one for each edge e [formulas not preserved in the transcript]

24 A Polynomial Time Algorithm for CPP
Integer Programming Model:
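The formulas on these slides were images and are missing from the transcript. One formulation that is consistent with the surrounding text (constants w_e, variables f_e, a balance constraint per vertex from Theorem 1, a non-negativity constraint per edge, and an objective that minimizes the weight of inserted edges) is sketched below. The incidence coefficient a_{x,e} is an assumption introduced here, not notation from the slides.

```latex
% Assumption: a_{x,e} is the sum of the signs of the ends of edge e at vertex x
% (+1 for a positive incident end, -1 for a negative incident end, 0 if e does not touch x).
% G' contains 1 + f_e copies of each edge e of G.
\begin{aligned}
\text{minimize}\quad   & \sum_{e \in E} w_e \, f_e \\
\text{subject to}\quad & \sum_{e \in E} a_{x,e}\,(1 + f_e) = 0 && \text{for each vertex } x \\
                       & f_e \ge 0,\ f_e \in \mathbb{Z}        && \text{for each edge } e
\end{aligned}
```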

25 A Polynomial Time Algorithm for CPP
Soundness of the Algorithm
G is connected ⇒ G’ is connected
Constraint - 1 ⇒ G’ is balanced
Together ⇒ G’ is Eulerian

26 A Polynomial Time Algorithm for CPP
Is G’ a min-weight Eulerian-extension?

27 A Polynomial Time Algorithm for CPP
Is G’ a min-weight Eulerian-extension? Yes! Objective Function minimizes total weight of inserted edges

28 A Polynomial Time Algorithm for CPP
Pseudo-code
IF G is not connected
    RETURN “no Chinese walk exists”
ELSE
    Solve it as a Min-Cost Flow Problem
    IF there is no feasible solution
        RETURN “no Chinese walk exists”
    ...

29 A Polynomial Time Algorithm for CPP
    ELSE
        Construct G’
        Find an Eulerian circuit of G’
        Find the corresponding Chinese walk

30 A Polynomial Time Algorithm for CPP
Running Time? O(|E| log (|V|))

31 A Polynomial Time Algorithm for CPP
Integer Programming Model:

32 A Polynomial Time Algorithm for CPP

33 A Polynomial Time Algorithm for CPP
Optimal Solution: f_b = 1, f_e = 1, f_g = 1; all other variables are zero

34 A Polynomial Time Algorithm for CPP
? A Polynomial Time Algorithm for Sequence Alignment

35 A Polynomial Time Algorithm for Sequence Alignment
Input: k-molecule spectrum of the genome (k = 3; each column is one k-molecule)
ATT TTG TGC GCC CCA CAA AAC
TAA AAC ACG CGG GGT GTT TTG

36 A Polynomial Time Algorithm for Sequence Alignment
Within each k-molecule, arbitrarily label one k-mer as positive and the other as negative
-ATT -TTG +TGC +GCC +CCA +CAA +AAC
TAA+ AAC+ ACG- CGG- GGT- GTT- TTG-

37 A Polynomial Time Algorithm for Sequence Alignment
Construct nodes from all possible (k-1)-molecules
-AT -TT +TG +GC +CC +CA +AC
TA+ AA+ AC- CG- GG- GT- TG-
For every k-molecule in the spectrum, let z be one of its two k-mers
Let x and y be the (k-1)-mers corresponding to z[1..k-1] and z[2..k] respectively

38 A Polynomial Time Algorithm for Sequence Alignment
Insert edges according to the following criteria:
An edge is positive incident to x if x is from the positive strand, and negative incident otherwise
An edge is negative incident to y if y is from the positive strand, and positive incident otherwise
k-molecule being inserted: -ATT / TAA+
-AT -TT +TG +GC +CC +CA +AC
TA+ AA+ AC- CG- GG- GT- TG-

39 A Polynomial Time Algorithm for Sequence Alignment
Insert edges according to the same criteria. k-molecule being inserted: -TTG / AAC+

40 A Polynomial Time Algorithm for Sequence Alignment
Insert edges according to the same criteria. k-molecule being inserted: +TGC / ACG-

41 A Polynomial Time Algorithm for Sequence Alignment
Insert edges according to the same criteria. k-molecule being inserted: +GCC / CGG-

42 A Polynomial Time Algorithm for Sequence Alignment
Insert edges according to the same criteria. k-molecule being inserted: +CCA / GGT-

43 A Polynomial Time Algorithm for Sequence Alignment
Insert edges according to the same criteria. k-molecule being inserted: +CAA / GTT-

44 A Polynomial Time Algorithm for Sequence Alignment
Insert edges according to the same criteria. k-molecule being inserted: +AAC / TTG-

45 A Polynomial Time Algorithm for Sequence Alignment
-AT -TT +TG +GC +CC +CA +AC
TA+ AA+ AC- CG- GG- GT- TG-

46 A Polynomial Time Algorithm for Sequence Alignment
How to read the sequence?
If a positive incident edge is used to enter the node, read the negative k-mer
If a negative incident edge is used to enter the node, read the positive k-mer
-AT -TT +TG +GC +CC +CA +AC
TA+ AA+ AC- CG- GG- GT- TG-

47 A Polynomial Time Algorithm for Sequence Alignment
-AT -TT +TG +GC +CC +CA +AC
TA+ AA+ AC- CG- GG- GT- TG-
Sequence read: ATTGCCAAC
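As a simplified illustration of the final read-out step, the sketch below spells a sequence from an ordered list of overlapping k-mers. It deliberately ignores the positive and negative strand labels and the bidirected traversal rules described above, and it assumes the k-mers are already given in walk order; `spell_walk` is a hypothetical helper name.

```python
def spell_walk(kmers):
    """Merge consecutive k-mers that overlap in k-1 characters into one string."""
    if not kmers:
        return ""
    seq = kmers[0]
    for prev, cur in zip(kmers, kmers[1:]):
        # Consecutive k-mers on a walk must share a (k-1)-mer: suffix of prev = prefix of cur.
        if prev[1:] != cur[:-1]:
            raise ValueError(f"{prev} and {cur} do not overlap in k-1 characters")
        seq += cur[-1]
    return seq


# The seven k-mers from the top row of the input spectrum, in order.
print(spell_walk(["ATT", "TTG", "TGC", "GCC", "CCA", "CAA", "AAC"]))
```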

48 Future Work ...
NP-hardness?
Optimal solution? ~ parsimony assumption

49 Thanks... Any questions / comments?

