Published by Bernard Lane. Modified over 9 years ago.
1
Fixed-parameter algorithms for protein similarity search under mRNA structure constraints
Joint work by G. Blin, G. Fertin, D. Hermelin, and S. Vialette.
2
Outline
Biological motivation: mRNA molecules; the mRNA-to-protein process; selenocysteine insertion.
The MRSO problem: implied structure graph; known results.
Two natural parameters: the parameters; nice edge bipartition; a general algorithm for both parameters.
3
Outline (continued)
The cutwidth parameter: an efficient algorithm for small cutwidth; implications of this algorithm.
Binary similarity functions.
Closing remarks.
4
Biological motivation: mRNA molecules
mRNA molecules can be considered as strings over {A, C, G, U}.
Complementary bases (A-U, G-C) may pair to form a folding structure (the secondary structure) of the mRNA.
mRNA molecules encode genetic information that is later translated into proteins.
5
The mRNA-to-protein process:
6
Biological motivation: the mRNA-to-protein process
Standard assumption: each codon encodes a single amino acid.
Recently, biologists found that this is not necessarily true: depending on the folding structure of the mRNA, a single codon might encode different amino acids.
Example application: selenocysteine insertion.
7
Biological motivation: selenocysteine insertion
Selenocysteine is a rare amino acid that was only recently discovered.
It is generated by the UGA codon, which usually encodes a stop signal.
The presence of the SECIS element forces the generation of selenocysteine rather than the termination of translation.
8
Biological motivation: selenocysteine insertion (continued)
In certain cases, modifying existing proteins by inserting the SECIS element yields enhanced proteins.
Is this application only the tip of the iceberg?
9
The MRSO problem
Given a specified secondary structure S and an mRNA sequence R, construct an mRNA sequence R' whose nucleotides are complementary according to S and which is as similar as possible to R.
Figure: an input sequence R = CGG CGA CUA AAU together with a secondary structure S.
10
The MRSO problem (continued)
Figure: an output sequence R' folded according to the secondary structure S.
11
The MRSO problem: scoring
The score of a solution is given by n similarity functions $f_1, \dots, f_n$, one per codon. Given $f_1, \dots, f_n$, no additional information on the source mRNA sequence R is needed.
Example: for R' = CGU CGA CUA GCG, $s(R') = f_1(\mathrm{CGU}) + f_2(\mathrm{CGA}) + f_3(\mathrm{CUA}) + f_4(\mathrm{GCG})$.
12
The MRSO problem: implied structure graph
The implied structure graph G is a linear graph with maximum degree 3.
Complementarity constraints between nucleotides are labeled on the edges of G.
Figure: a secondary structure S and its implied structure graph G on vertices 1 to 4.
13
The MRSO problem: a more formal definition [Backofen et al.'02]
Given an implied structure graph G with n vertices and similarity functions $f_1, \dots, f_n$, find an assignment of codons $c_1, \dots, c_n$ to the vertices of G that:
1. Maximizes $\sum_{i=1}^{n} f_i(c_i)$.
2. Is compatible with respect to G.
The definition can be adapted to different applications. It also allows a certain degree of combinatorial leverage, as we shall soon see.
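To make the two conditions concrete, here is a minimal brute-force sketch that is only practical for very small n. The edge encoding `(i, p, j, q)` (nucleotide p of codon i must pair with nucleotide q of codon j, all 0-based) and the names `is_compatible` and `brute_force` are assumptions made for this illustration; only the A-U and G-C pairings mentioned earlier in the talk are allowed.

```python
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def is_compatible(codons, edges):
    """True if every labeled edge joins two complementary nucleotides."""
    return all(COMPLEMENT[codons[i][p]] == codons[j][q] for i, p, j, q in edges)


def brute_force(fs, edges, alphabet="ACGU"):
    """Try all 64^n codon assignments; return (best score, assignment) or None."""
    all_codons = ["".join(t) for t in product(alphabet, repeat=3)]
    best = None
    for assignment in product(all_codons, repeat=len(fs)):
        if not is_compatible(assignment, edges):
            continue
        s = sum(f(c) for f, c in zip(fs, assignment))
        if best is None or s > best[0]:
            best = (s, assignment)
    return best
```

The exponential running time of this sketch is what the parameterized algorithms later in the talk aim to avoid.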
14
The MRSO problem: known results [Backofen et al.'02 and Bongartz'04]
NP-complete (APX-hard) for general implied structure graphs.
Constant-factor approximation algorithms exist, but they cannot handle $-\infty$ scores well.
In P when the implied structure graph G is outer-planar, in other words, when the vertices of G can be permuted so that no two edges of G cross.
[Backofen et al.'02] give an $O(n)$ algorithm for outer-planar implied structure graphs. We call this algorithm $A_{op}$ in this talk.
15
Two natural parameters
The first parameter is the number of degree-3 vertices in G.
The second parameter is the number of edge crossings in G.
Figure: an example graph on vertices 1 to 8.
16
Two natural parameters: modifying the similarity functions
We can modify the similarity functions so that some vertices are assigned specific codons in any feasible solution.
For example, to ensure that the first vertex is assigned AAA, set $f^*_1(\mathrm{AAA}) = f_1(\mathrm{AAA})$ and $f^*_1(C) = -\infty$ for all $C \neq \mathrm{AAA}$.
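The pinning trick is easy to express as a wrapper around a similarity function. The helper name `pin_codon` is hypothetical; the sketch simply returns the required codon's original score and minus infinity for every other codon.

```python
import math


def pin_codon(f, required):
    """Return f* with f*(required) = f(required) and f*(c) = -inf otherwise."""
    def f_star(codon):
        return f(codon) if codon == required else -math.inf
    return f_star
```

Any assignment with a finite total score must then use `required` at that vertex, so an unmodified solver can be reused to enforce the constraint.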
17
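Two natural parameters: nice edge bipartition
A nice edge bipartition of G splits the edges of G into an upper part and a bottom part such that the upper part induces an outer-planar graph.
Figure: an example graph on vertices 1 to 8 with the upper part and the bottom part marked.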
18
Two natural parameters: a general algorithm
Enumerate all assignments that are compatible with respect to the bottom part.
Invoke $A_{op}$ with each such assignment.
Time complexity: $O(2^{O(b)} n)$, where b is the number of bottom edges.
Figure: an example graph on vertices 1 to 8.
19
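Two natural parameters: edge crossings
The general algorithm can be applied to both of our natural parameters.
Parameter: the number of edge crossings in G.
Time: $O(2^{O(\text{crossings})} n)$, hence polynomial when the number of crossings is $O(\lg n)$.
Figure: an example graph on vertices 1 to 8.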
20
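Two natural parameters: degree-3 vertices
The general algorithm can be applied to both of our natural parameters.
Parameter: the number of degree-3 vertices in G.
Every graph with maximum degree 2 is outer-planar.
Time: $O(2^{O(\text{degree-3 vertices})} n)$, hence polynomial when the number of degree-3 vertices is $O(\lg n)$.
Figure: an example graph on vertices 1 to 8, drawn in two layouts.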
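The step from an exponential dependence on the parameter to a polynomial bound can be spelled out as follows. The symbols t (the parameter) and a, c, d (constants hidden in the O-notation) are placeholders introduced only for this derivation.

```latex
% Assume the running time is at most a * 2^{c t} * n and that t <= d * lg n.
\[
  a \cdot 2^{c t} \cdot n
  \;\le\; a \cdot 2^{c d \lg n} \cdot n
  \;=\; a \cdot n^{c d} \cdot n
  \;=\; a \cdot n^{c d + 1}.
\]
```

Since c and d are constants, the exponent cd + 1 is a constant as well, so the bound is polynomial in n.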
21
The cutwidth parameter: the cutwidth of G
For $p \in \{1, \dots, n-1\}$, let $E_p$ denote the set of edges connecting vertices from $\{1, \dots, p\}$ to $\{p+1, \dots, n\}$, and let $V_p$ denote the vertices of G that are incident to edges of $E_p$.
The cutwidth of G is $\max_p |E_p|$.
Figure: an example graph on vertices 1 to 8 with the cut at p = 2, showing $E_p$ and $V_p$.
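As a small illustration of this definition, the sketch below computes the cut at a position p and the resulting cutwidth for vertices numbered 1 to n in their given order. The function names and the representation of edges as `(u, v)` pairs are assumptions made for the example.

```python
def cut(p, edges):
    """Return (E_p, V_p): the edges crossing the cut after vertex p, and their endpoints."""
    e_p = [(u, v) for u, v in edges if min(u, v) <= p < max(u, v)]
    v_p = sorted({x for edge in e_p for x in edge})
    return e_p, v_p


def cutwidth(n, edges):
    """Maximum of |E_p| over p = 1, ..., n-1 for the given vertex order."""
    return max((len(cut(p, edges)[0]) for p in range(1, n)), default=0)
```

This runs in time proportional to n times the number of edges, which is enough for an illustration; it measures the cutwidth of the given ordering only and does not search over other orderings.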
22
The cutwidth parameter: algorithm outline
Pick any $p \in \{1, \dots, n-1\}$.
For each assignment for $V_p$ that is compatible with $E_p$: recursively find the optimal solution for the subgraphs of G induced by $\{1, \dots, p\}$ and $\{p+1, \dots, n\}$ under this assignment.
Return the highest-scoring solution found in the previous step.
Figure: an example graph on vertices 1 to 8, annotated with the sequence CGAUAACGGAUAGUUCGC.
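One way to organize this recursion is a left-to-right dynamic program that always splits off the next vertex and remembers only the codons of vertices that still have an edge crossing the current cut. The sketch below follows that idea; it is not necessarily the authors' formulation. The edge encoding `(i, p, j, q)` with 0-based indices, the name `solve_by_cuts`, and the requirement that every edge joins two distinct vertices are assumptions of this example.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def solve_by_cuts(fs, edges, codons):
    """Best total score over compatible assignments (-inf if none exists).

    fs: one similarity function per vertex 0..n-1.
    edges: (i, p, j, q) with i != j: nucleotide p of codon i pairs with
           nucleotide q of codon j.
    codons: candidate codons for every vertex (all 64 in the full problem).
    """
    n = len(fs)
    # Orient every edge from its smaller endpoint to its larger endpoint.
    norm = [(i, p, j, q) if i < j else (j, q, i, p) for i, p, j, q in edges]
    last_nbr = list(range(n))  # furthest later neighbor of each vertex
    for i, _, j, _ in norm:
        last_nbr[i] = max(last_nbr[i], j)
    # State: codons of processed vertices that still touch a crossing edge.
    table = {(): 0.0}
    for v in range(n):
        back = [(i, p, q) for i, p, j, q in norm if j == v]
        new_table = {}
        for state, total in table.items():
            assigned = dict(state)
            for c in codons:
                if any(COMPLEMENT[assigned[i][p]] != c[q] for i, p, q in back):
                    continue
                extended = {**assigned, v: c}
                key = tuple(sorted((u, cu) for u, cu in extended.items()
                                   if last_nbr[u] > v))
                value = total + fs[v](c)
                if value > new_table.get(key, float("-inf")):
                    new_table[key] = value
        table = new_table
    return max(table.values(), default=float("-inf"))
```

Each state fixes the codons of at most one vertex per crossing edge, so the number of states per cut is at most 64 raised to the cutwidth, which matches the shape of a running time that is exponential only in the cutwidth and linear in n.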
23
The cutwidth parameter: implications
Time: $O(2^{O(\text{cutwidth})} n)$, hence polynomial when the cutwidth is $O(\lg n)$.
Theorem [Korach & Solel'93 via Chung & Seymour'89]: any graph G with n vertices and constant treewidth has a vertex ordering under which G has cutwidth $O(\lg n)$.
Theorem [Bodlaender'95]: if G is either a chordal graph or a circular-arc graph with constant maximum clique size, then G has constant treewidth. If G is k-outerplanar for any constant k, then G has constant treewidth.
Combining all the above, we get: MRSO is polynomial-time solvable if G is a chordal graph, a circular-arc graph, or k-outerplanar.
24
Binary similarity functions
Suppose we are only interested in the number of "correct" codons in a solution. In this case we can restrict ourselves to binary similarity functions, that is, $f_i : \{A, C, G, U\}^3 \to \{0, 1\}$ for all i.
Unfortunately, MRSO is NP-hard even when restricted to instances with binary similarity functions.
Figure: a source sequence CGG CGA CUA AAU and a target sequence CUA GGA CGG UGA.
25
Binary similarity functions: fixed-parameter tractability
MRSO restricted to binary similarity functions is in FPT for the parameter k = score of the optimal solution. More precisely, it is solvable in $O(2^{9.25k} n)$ time.
Proof sketch:
We can assume w.l.o.g. that for all i there exists a codon C such that $f_i(C) = 1$.
Any maximal independent set in G has size at least n/4, since G is at most cubic.
We treat the cases $k \le n/4$ and $k > n/4$ separately.
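The n/4 bound comes from a simple greedy argument: each chosen vertex rules out itself and at most three neighbors. A minimal sketch of that greedy procedure follows; the function name, the 0-based vertex numbering, and the optional `limit` argument for stopping early are assumptions of this example.

```python
def greedy_independent_set(n, edges, limit=None):
    """Greedily pick pairwise non-adjacent vertices from 0..n-1.

    Without `limit` the result is a maximal independent set. In a graph of
    maximum degree 3 each pick blocks at most 4 vertices, so the result has
    at least n/4 vertices.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    chosen, blocked = [], set()
    for v in range(n):
        if limit is not None and len(chosen) >= limit:
            break
        if v not in blocked:
            chosen.append(v)
            blocked.add(v)
            blocked |= adj[v]
    return chosen
```

Building the adjacency lists here takes time linear in the size of the graph; a tighter time bound would require adjacency lists to be given as input.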
26
Binary similarity functions: proof sketch (continued)
Here k denotes the score of the optimal solution.
Suppose $k \le n/4$: find an independent set of size k in $O(k)$ time. Since for all i there exists a codon C such that $f_i(C) = 1$, there is an assignment to this independent set that guarantees a score of at least k. Since $f_i \ge 0$ for all i, this assignment can be extended to all vertices of G to obtain an assignment with score at least k.
Suppose $k > n/4$: try all k-subsets of the vertices of G. There are at most $\binom{4k}{k} \le 2^{3.25k}$ such subsets. Enumerating all possible codon assignments for each subset requires $O(2^{6k})$ time.
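The counting step can be sanity-checked numerically. Writing k for the score parameter, the sketch below compares base-2 logarithms to confirm that the number of k-subsets of 4k vertices stays below 2 to the power 3.25k, and that 64 codon choices per chosen vertex give 2 to the power 6k assignments. The function name is hypothetical, and the check is a numerical illustration, not a proof.

```python
import math


def subset_bound_holds(k: int) -> bool:
    """Check C(4k, k) <= 2^(3.25 k) by comparing base-2 logarithms."""
    return math.log2(math.comb(4 * k, k)) <= 3.25 * k


# 64 codons per chosen vertex: 64^k = 2^(6k) assignments per subset.
assignments_exponent_per_vertex = math.log2(64)
```

Multiplying the two counts gives the exponent 3.25k + 6k = 9.25k that appears in the stated running time.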
27
Closing remarks
Extending our results:
Finding a practical algorithm for the cutwidth problem restricted to cubic graphs with fixed cutwidth.
More interesting parameters? Hardness results?
Applying our techniques to a similar variation of the problem that has been studied in the literature [Backofen'04].
Thank you!