On realizing shapes in the theory of RNA neutral networks Speaker: Leszek Gąsieniec, U of Liverpool, UK Joint work with: Peter Clote, Boston College, USA.


1 On realizing shapes in the theory of RNA neutral networks Speaker: Leszek Gąsieniec, U of Liverpool, UK Joint work with: Peter Clote, Boston College, USA Roman Kolpakov, U of Moscow, Russia Evangelos Kranakis, Carleton U, Canada Danny Krizanc, Wesleyan U, USA

2 Realizing shapes RNA sequences RNA secondary structures - shapes Problems: –Finding all shapes related to a single RNA sequence –Realizing a number of shapes based on a single RNA sequence Solutions –NP-hardness result –Exact/approximate solutions

3 Secondary structure: “shape” For an integer n>0, a length-n RNA nucleotide sequence is considered as a word in the space C_n = {A,C,G,U}^n. For a = a_1 a_2 … a_n ∈ C_n, a secondary structure S ∈ S_n is a collection of pairs (i,j) such that: –a_i a_j ∈ {AU,UA,CG,GC} –if (i,j) and (k,l) ∈ S then the ordering i<k<j<l is not permitted, i.e., pseudo-knots are disallowed –each position occurs in at most one pair of S
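The three conditions on this slide can be checked mechanically. The following is a minimal sketch, not code from the talk: the function name `is_secondary_structure` and the representation of a structure as a list of 1-based pairs (i, j) with i < j are assumptions made for illustration.

```python
ALLOWED = {"AU", "UA", "CG", "GC"}  # base pairs permitted on the slide


def is_secondary_structure(seq, pairs):
    """Return True if `pairs` (1-based, i < j) is a valid structure on `seq`."""
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(seq)):
            return False
        # Condition 1: the two letters must form an allowed base pair.
        if seq[i - 1] + seq[j - 1] not in ALLOWED:
            return False
        # Condition 3: every position takes part in at most one pair.
        if i in used or j in used:
            return False
        used.update((i, j))
    # Condition 2: no pseudo-knots, i.e. no two pairs with i < k < j < l.
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True
```

The quadratic pseudo-knot test is chosen for clarity only; a stack-based scan would do the same check in linear time.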

4 Shapes [Figure: two diagrams of the sequence ACGUCGGUACCAGUUGAGGUCCGAGGACG, each labeled NO]

5 Shapes Secondary structures can be identified with balanced parenthesis expressions padded with ‘dots’, where –a dot (°) corresponds to an unpaired nucleotide position, and –a matching pair of parentheses that opens at nucleotide position i and closes at nucleotide position j corresponds to a base pair (i,j) AC°UCGGUA°CAGUU°A°°UC°GAG°°C°
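The correspondence between a dotted parenthesis expression and a set of base pairs can be made concrete with a stack. This is an illustrative sketch: the name `parse_shape` is hypothetical, and accepting both "." and "°" as the dot symbol is an assumption for convenience.

```python
def parse_shape(shape):
    """Convert a dotted parenthesis string into a sorted list of 1-based pairs."""
    stack, pairs = [], []
    for pos, ch in enumerate(shape, start=1):
        if ch == "(":
            stack.append(pos)            # remember where the pair opens
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif ch not in (".", "°"):
            raise ValueError("unexpected symbol %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)
```

For instance, `parse_shape("((.))")` yields the pairs (1,5) and (2,4). Because the parentheses are balanced, the resulting pairs can never form a pseudo-knot.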

6 Realizing Shapes Given a shape S ∈ S_n and a word a ∈ C_n, we say that a realizes S if every base pair (i,j) of S satisfies a_i a_j ∈ {AU,UA,CG,GC}, i.e., padding with dots is feasible S: AC°UCGGUA°CAGUU°A°°UC°GAG°°C° a: ACGUCGGUACCAGUUGAGGUCCGAGGACG
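A small sketch of the realizability test for a single shape, assuming the shape is written as a dotted parenthesis string of the same length as the sequence. The name `realizes` is hypothetical.

```python
ALLOWED = {"AU", "UA", "CG", "GC"}


def realizes(seq, shape):
    """Return True if `seq` realizes the dotted parenthesis string `shape`."""
    if len(seq) != len(shape):
        return False
    stack = []
    for pos, ch in enumerate(shape):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                return False             # unbalanced shape
            # The letters at the two ends of the pair must be complementary.
            if seq[stack.pop()] + seq[pos] not in ALLOWED:
                return False
    return not stack                     # every '(' must have been closed
```

A sequence realizes a set of shapes when this test succeeds for each shape in the set.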

7 Decision Problem “Given a finite set of secondary structures (shapes) {S_1,S_2,…,S_k}, under what conditions does there exist a single RNA sequence which can realize each of the given structures?” What can be done if such a realization is not feasible?

8 Optimization Problem M*RP We add a “don’t care” symbol * which matches any symbol in {A,C,G,U}. Given a set of secondary structures (shapes) {S_1,S_2,…,S_k} to be realized by a sequence in C_n, find the minimum number of positions N(S_1,…,S_k) such that, after removal of all base pairs incident to these positions (which are replaced by *), there exists a sequence a ∈ C_n which realizes each of the structures S_i. We call this the Min * Realizability Problem and refer to it as M*RP.

9 Results O(nk) algorithm for the decision problem, i.e., when N(S_1,…,S_k)=0 Proof that the M*RP problem is NP-hard for k > 3 (the case k=3 is unclear) We also study a bounded version of M*RP with a limited number of *s. E.g., we show that the case limited to a single * is also solvable in time O(nk).

10 M*RP Simplification We observe that a string a realizing the shapes S_1,…,S_k over the four-letter alphabet {A,C,G,U} exists if and only if there is a binary string b realizing the same set of shapes (here realizing means that the endpoints of each edge/pair must carry different bits 0/1).
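One way to see both directions of this equivalence is through two explicit letter maps. The particular maps below are an assumption chosen for illustration (the slides only state that the equivalence holds): A and C go to 0, G and U go to 1, and in the other direction 0 goes to A and 1 goes to U.

```python
TO_BIT = {"A": "0", "C": "0", "G": "1", "U": "1"}
TO_BASE = {"0": "A", "1": "U"}


def to_binary(seq):
    """Map an RNA word to bits; AU, UA, CG and GC all get different bits at their ends."""
    return "".join(TO_BIT[ch] for ch in seq)


def from_binary(bits):
    """Map bits to an RNA word; any pair with different bits becomes AU or UA."""
    return "".join(TO_BASE[b] for b in bits)
```

Every allowed base pair joins a letter from {A,C} to a letter from {G,U}, so `to_binary` preserves realizability, and `from_binary` turns every 0/1 edge into a valid AU or UA pair.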

11 M*RP Simplification AC°UCGGUA°CAGUU°A°°UC°GAG°°C° 10°010101°0110U°A°°UC°GAG°°C° 10°010101°01101°1°°00°100°°1°

12 Graph of shapes G(S_1,…,S_k) = (V,E) is a graph with: –the set of vertices V consisting of the consecutive positions 1,…,n of the bases (binary symbols in the simplified version) of the sequence in C_n –the set of edges E being the union of the sets of edges (base pairs) appearing in the shapes S_1,…,S_k

13 Graph of shapes [Figure: the sequence ACGUCGGUACCAGUUGAGGUCCGAGGACG with positions labeled 1, 2, …, n]

14 An observation Lemma: –Any set of shapes S_1,S_2,…,S_k of size n can be realized by a single binary string b if and only if the graph G(S_1,S_2,…,S_k) has no odd cycles (i.e., it is 2-colorable). –Moreover, one can check the existence of b and, if b exists, construct it in O(nk) time
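The lemma suggests a standard breadth-first 2-coloring of the graph of shapes. The sketch below is one possible reading, not the authors' code; `realize_binary` and the input format (a list of shapes, each a list of 1-based pairs) are assumptions.

```python
from collections import deque


def realize_binary(n, shapes):
    """Return a binary string realizing all shapes, or None if an odd cycle exists."""
    adj = [[] for _ in range(n + 1)]
    for pairs in shapes:                 # E is the union of the edges of all shapes
        for i, j in pairs:
            adj[i].append(j)
            adj[j].append(i)
    color = [None] * (n + 1)
    for start in range(1, n + 1):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None          # same bit at both ends: odd cycle
    return "".join(str(c) for c in color[1:])
```

Each shape contributes at most n/2 edges, so the graph has O(nk) edges and the traversal stays within the O(nk) bound stated in the lemma.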

15 M*RP[m] Problem M*RP[m] problem: for any set of shapes S_1,…,S_k, compute a string over the alphabet {0,1,*} which realizes all shapes and contains no more than m occurrences of the don’t care symbol * Lemma: the M*RP[m] problem can be solved in time O(C(n,m) · ||G(S_1,…,S_k)||), where C(n,m) is the binomial coefficient “n choose m”
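A bound of this form is what a direct enumeration gives: try every choice of positions for the * symbols, delete the edges incident to them, and 2-color what remains. The sketch below follows that idea under stated assumptions; `solve_mrp_bounded` and `two_color` are hypothetical names, and the slides do not say that the lemma is proved this way.

```python
from collections import deque
from itertools import combinations


def two_color(n, edges, removed):
    """2-color the graph after dropping edges incident to `removed`; None if impossible."""
    adj = [[] for _ in range(n + 1)]
    for u, v in edges:
        if u not in removed and v not in removed:
            adj[u].append(v)
            adj[v].append(u)
    color = [None] * (n + 1)
    for start in range(1, n + 1):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None
    return color


def solve_mrp_bounded(n, shapes, m):
    """Return a string over {0,1,*} with at most m stars, or None if none exists."""
    edges = {tuple(sorted(p)) for pairs in shapes for p in pairs}
    for size in range(m + 1):            # fewest stars first
        for stars in combinations(range(1, n + 1), size):
            color = two_color(n, edges, set(stars))
            if color is not None:
                return "".join("*" if v in stars else str(color[v])
                               for v in range(1, n + 1))
    return None
```

Because subsets are tried in order of increasing size, the first string found also uses the smallest possible number of stars not exceeding m.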

16 Solving M*RP[1] problem Using the formula from the previous slide, we know that the M*RP[1] problem can be solved in time O(n · ||G(S_1,…,S_k)||). In what follows we give some details of the algorithm solving M*RP[1] in time O(||G(S_1,…,S_k)||)

17 Critical vertices A vertex of a graph G is called critical if it is contained in all odd cycles in G. Lemma: All critical vertices of an arbitrary graph G can be found in time O(||G||). Theorem: M*RP[1] can be solved in O(||G(S_1,…,S_k)||) time.
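For intuition, the definition can be tested directly: a vertex lies on every odd cycle exactly when deleting it leaves a graph with no odd cycle. The sketch below is this naive test, running in O(n · ||G||) time; it is not the linear-time algorithm of the lemma. The names `critical_vertices` and `is_bipartite_without` are assumptions.

```python
from collections import deque


def is_bipartite_without(n, edges, skip):
    """Check 2-colorability of the graph with vertex `skip` deleted."""
    adj = [[] for _ in range(n + 1)]
    for u, v in edges:
        if skip not in (u, v):
            adj[u].append(v)
            adj[v].append(u)
    color = [None] * (n + 1)
    for start in range(1, n + 1):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True


def critical_vertices(n, edges):
    """Vertices whose deletion destroys all odd cycles (all vertices if there are none)."""
    return [v for v in range(1, n + 1) if is_bipartite_without(n, edges, v)]
```

Placing the single * at any critical vertex removes every odd cycle, which is why finding critical vertices solves M*RP[1]. If the graph has no odd cycle at all, every vertex is vacuously critical and no * is needed.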

18 Sketch of the algorithm Find any odd cycle without chords –this can be done by first finding any odd cycle C, e.g., with the help of BFS and a parity test –having an odd cycle, we “chop off” (one after another) its even sub-cycles based on chords –all done in time O(||G||)
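The two steps can be sketched as follows. This is an illustration under assumptions, not the talk's implementation: `find_odd_cycle` uses BFS depths as the parity test, and `chordless_odd_cycle` chops chords in a simple way that is not linear-time, unlike the O(||G||) procedure claimed on the slide.

```python
from collections import deque


def find_odd_cycle(n, edges):
    """Return the vertices of some odd cycle in order, or None if the graph is bipartite."""
    adj = [[] for _ in range(n + 1)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    depth, parent = [None] * (n + 1), [0] * (n + 1)
    for start in range(1, n + 1):
        if depth[start] is not None:
            continue
        depth[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if depth[v] is None:
                    depth[v], parent[v] = depth[u] + 1, u
                    queue.append(v)
                elif depth[v] % 2 == depth[u] % 2:   # parity test fails
                    left, right, a, b = [u], [v], u, v
                    while a != b:                    # climb to the common ancestor
                        if depth[a] >= depth[b]:
                            a = parent[a]
                            left.append(a)
                        else:
                            b = parent[b]
                            right.append(b)
                    return left + right[-2::-1]
    return None


def chordless_odd_cycle(cycle, edges):
    """Repeatedly split an odd cycle along a chord and keep the odd half."""
    edge_set = {frozenset(e) for e in edges}
    changed = True
    while changed:
        changed = False
        c = len(cycle)
        for a in range(c):
            for b in range(a + 2, c):
                if a == 0 and b == c - 1:
                    continue                         # neighbors on the cycle
                if frozenset((cycle[a], cycle[b])) in edge_set:
                    inner = cycle[a:b + 1]
                    outer = cycle[b:] + cycle[:a + 1]
                    cycle = inner if len(inner) % 2 else outer
                    changed = True
                    break
            if changed:
                break
    return cycle
```

A chord splits a cycle of odd length c into two cycles whose lengths sum to c + 2, so exactly one of them is odd; discarding the even one is the "chop off" step.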

19 External connected components K_1,K_2,…,K_e [Figure: odd cycle C and external connected components K_1, K_2, K_i]

20 Odd neighbor pairs [Figure: the territory of a connected component K_i and the territory of the odd cycle C, with labeled vertices x and y and a length L; the example shown has L + l(x) + l(y) = 5]

21 Some properties of external connected components The external components must not contain an odd cycle, i.e., each component is 2-colorable For any K_i –the number of odd neighbor pairs of K_i must be odd, –and it cannot be larger than 2 Which means that each K_i must have exactly one odd neighbor pair, which defines a segment L_i on the odd cycle C

22 Critical vertices Let R be the intersection of all segments L_i. One can prove that: –all critical vertices are contained in R –and every vertex in R is critical, i.e., any cycle in G which does not contain vertices from R must be even The content of the set R can be computed in time linear in ||C||.

23 Conclusion Theorem: M*RP[1] can be solved in O(||G(S_1,…,S_k)||) time –what is the complexity of M*RP[i]? Theorem: M*RP is NP-hard for k>3 –the case with k=2 is always realizable, and –the complexity of the case with k=3 is not yet established

24 Thank you
