1
Finding Common RNA Pseudoknot Structures in Polynomial Time
Patricia Evans
University of New Brunswick
2
6-Jul-2006
Ribonucleic Acid (RNA)
RNA is an organic molecule that forms long chains
Each position in the chain can be one of 4 types (bases): A, G, C, U
RNA can code gene information (messenger RNA, viral RNA)
RNA can also form structures and take on many functions within a cell (e.g. tRNA, rRNA and other RNA-protein complexes)
3
RNA Bonds and Structures
RNA bases can form bonds, in a largely pairwise fashion (A-U, G-C, some exceptions)
RNA is single-stranded; its bonds form mostly within a single chain, folding it into a complex structure held together by its bonds
RNA function is affected by its structure
If two bases are paired, it often does not matter what they are; only unpaired bases are 'available'
Common substructures can help investigate functional relationships
4
RNA Structural Complexity
Deceptively simple, since bases are usually paired
Stems are formed from two bonded strands, in an antiparallel orientation
These simple bonds can, however, combine to form complex structures
Some are nested (stems within loops)
Some are knotted (stems effectively crossing)
RNA molecules can be very long (e.g. > 1000 bases), confounding exhaustive comparison techniques
5
Arc Representation
At the bond level, the bond structure of an RNA molecule can be represented as arcs superimposed onto the "stretched" RNA sequence.
Each arc represents a bonded pair, and the structure is a set of pairs.
[Figure: arc diagrams of a nested structure and a pseudoknot]
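The set-of-pairs view above can be sketched in a few lines. This is a minimal illustration, not part of the original talk: the function name `parse_arcs` is hypothetical, and the bracket string with `.` for an unpaired position is an assumed input encoding (the slides only use the `( [ ) ]` pattern to depict crossing arcs).

```python
def parse_arcs(text):
    """Return the set of bonded pairs (i, j), with i < j, encoded by a bracket string.

    Assumed encoding: '(' / ')' and '[' / ']' are two kinds of arcs,
    '.' is an unpaired position.
    """
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = set()
    for pos, ch in enumerate(text):
        if ch in stacks:
            # Remember where this arc opens.
            stacks[ch].append(pos)
        elif ch in closers:
            opener = closers[ch]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            # Close the most recent open arc of the same bracket kind.
            pairs.add((stacks[opener].pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unclosed bracket")
    return pairs


nested = parse_arcs("((..))")     # two nested arcs
pseudoknot = parse_arcs("([)]")   # two crossing arcs
```

Using one stack per bracket kind lets arcs of different kinds cross while arcs of the same kind stay nested.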
6
Maximum Common Ordered Substructure
Input: Structures S_1 and S_2, where each structure is a set of pairs over n_1 and n_2 positions (resp.)
Output: max. substructure S_c with n_c positions, such that there exist 1-1 functions f_1 and f_2 where:
7
General Structures are Hard
The general MCOS problem, allowing positions to bond multiple times, is NP-hard (Goldman et al., 1999)
Comparing two RNA (pair-bond) structures is polynomial if they do not have knots (Bafna et al., 1995)
A structure S has a knot if and only if there are pairs (i_1, j_1) and (i_2, j_2) in S where i_1 < i_2 < j_1 < j_2, i.e. the arcs follow the pattern ( [ ) ]
Comparing knotted arc structures is NP-hard for arbitrary pair-bond structures (Evans 1999, and others)
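The knot condition above translates directly into a pairwise test. The following is a minimal sketch; the names `crosses` and `has_knot` are hypothetical, and a structure is assumed to be a set of `(i, j)` tuples with `i < j`.

```python
from itertools import combinations


def crosses(a, b):
    """True if arcs a and b satisfy i_1 < i_2 < j_1 < j_2 (in either order)."""
    (i1, j1), (i2, j2) = sorted((a, b))  # order the arcs by left endpoint
    return i1 < i2 < j1 < j2


def has_knot(structure):
    """True if any two arcs of the structure cross."""
    return any(crosses(a, b) for a, b in combinations(structure, 2))


print(has_knot({(0, 5), (1, 4)}))  # nested arcs
print(has_knot({(0, 2), (1, 3)}))  # the ( [ ) ] pattern
```

This checks every pair of arcs, so it is quadratic in the number of arcs; it is meant to mirror the definition rather than to be fast.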
8
Comparing Knot-Free Structures
If the two structures are composed only of nested bonds, they can be compared in O(n^4) time using a dynamic programming algorithm that computes:
M[i_1, j_1, i_2, j_2] = max {
  M[i_1, j_1 - 1, i_2, j_2],
  M[i_1, j_1, i_2, j_2 - 1],
  M[i_1, k_1 - 1, i_2, k_2 - 1] + M[k_1 + 1, j_1 - 1, k_2 + 1, j_2 - 1] + 1   if (k_1, j_1) is in S_1 and (k_2, j_2) is in S_2
}
The answer is in M[0, |A| - 1, 0, |B| - 1], where |A| = n_1 and |B| = n_2 are the numbers of positions in the two structures
(result: Bafna et al. 1995)
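The recurrence above can be written as a memoised recursion. This is a sketch under stated assumptions, not the authors' implementation: the function name `max_common_nested` is hypothetical, each structure is a set of `(k, j)` pairs with `k < j` in which every position bonds at most once, and an arc is only used when both of its endpoints lie inside the current segment (a boundary check the slide leaves implicit).

```python
from functools import lru_cache


def max_common_nested(s1, n1, s2, n2):
    """Largest number of arcs that can be matched, in order, between two
    knot-free structures s1 (over n1 positions) and s2 (over n2 positions)."""
    # Map each right endpoint j to its left partner k.
    left1 = {j: k for k, j in s1}
    left2 = {j: k for k, j in s2}

    @lru_cache(maxsize=None)
    def m(i1, j1, i2, j2):
        if i1 > j1 or i2 > j2:
            return 0  # an empty segment matches nothing
        # Cases 1 and 2: drop the last position of one segment.
        best = max(m(i1, j1 - 1, i2, j2), m(i1, j1, i2, j2 - 1))
        # Case 3: match arc (k1, j1) with arc (k2, j2), if both lie in range.
        k1, k2 = left1.get(j1), left2.get(j2)
        if k1 is not None and k2 is not None and k1 >= i1 and k2 >= i2:
            best = max(
                best,
                m(i1, k1 - 1, i2, k2 - 1)
                + m(k1 + 1, j1 - 1, k2 + 1, j2 - 1)
                + 1,
            )
        return best

    return m(0, n1 - 1, 0, n2 - 1)


stacked = {(0, 5), (1, 4)}       # two nested arcs
side_by_side = {(0, 3), (4, 7)}  # two adjacent arcs
print(max_common_nested(stacked, 6, side_by_side, 8))
```

Because the recursion depth grows with n1 + n2, long inputs would need an iterative table fill or a raised recursion limit.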
9
Limited Context
The polynomial time DP algorithm for nested bond structures works due to the context-free nature of segments in the nested structures.
Knotted structures have segments that are not context-free, but we can limit the context that they need if we consider special cases that cover most known RNA structures.
10
Pseudoknot Observations
Three mutually crossing arcs (a 3-knot) generally do not occur in RNA structures
A structure without 3-knots can be separated into 2 layers of non-crossing arcs (2-colourable)
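The two-layer separation can be illustrated by attempting to 2-colour the arcs so that crossing arcs land in different layers. The sketch below is an assumption-laden illustration: the name `two_layer_split` is hypothetical, arcs are `(i, j)` tuples with `i < j`, and the function simply reports `None` when no split exists rather than relying on any structural guarantee.

```python
from collections import deque
from itertools import combinations


def two_layer_split(structure):
    """Try to split arcs into two layers with no crossing inside a layer.

    Returns (layer0, layer1) as sorted lists, or None if no such split exists.
    """
    arcs = sorted(structure)
    neighbours = {a: [] for a in arcs}
    # arcs is sorted, so each pair (a, b) arrives with a's left endpoint first.
    for a, b in combinations(arcs, 2):
        (i1, j1), (i2, j2) = a, b
        if i1 < i2 < j1 < j2:  # the two arcs cross
            neighbours[a].append(b)
            neighbours[b].append(a)

    layer = {}
    for start in arcs:
        if start in layer:
            continue
        layer[start] = 0
        queue = deque([start])
        while queue:  # breadth-first colouring of one group of crossing arcs
            a = queue.popleft()
            for b in neighbours[a]:
                if b not in layer:
                    layer[b] = 1 - layer[a]
                    queue.append(b)
                elif layer[b] == layer[a]:
                    return None  # two crossing arcs forced into one layer
    return (
        [a for a in arcs if layer[a] == 0],
        [a for a in arcs if layer[a] == 1],
    )


print(two_layer_split({(0, 2), (1, 3)}))          # a simple pseudoknot
print(two_layer_split({(0, 3), (1, 4), (2, 5)}))  # three mutually crossing arcs
```

Three mutually crossing arcs cannot be split this way, which is why the second call reports failure.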
11
Pseudoknot Observations
Crossing arcs tend to be grouped into crossing stems, though there can be some nesting
Interleaving between left and right endpoints does not usually occur, and would be biochemically unstable
12
Forming LSPs
To take advantage of these restrictions, we will consider that bonds group into stems, and that a stem can break the RNA sequence into linked segment pairs (LSPs): a matched pair of segments that are, or may be, linked by bonds.
[Figure: a segment spanning positions i to j, and an LSP (an ordered segment pair) spanning h to l and i to j]
13
Merging LSPs
The key to the use of LSPs is our ability to merge them to construct a larger LSP, as shown.
The restrictions allow us to consider only pairwise LSP merges: we can always fill at least one existing "hole" when we merge.
14
Structure Pieces
We can then consider two types of comparison cases, and build up our results from them:
Segment-to-segment (4 dimensions)
LSP-to-LSP (8 dimensions)
We do not need to match LSPs to segments, as long as we allow both segments and LSPs to be broken into parts.
15
Segment Cases
Segment cases are based on the BMR95 algorithm.
s1: value of matching segment (i_1, j_1 - 1) to (i_2, j_2)
s2: value of matching segment (i_1, j_1) to (i_2, j_2 - 1)
s3: if j_1 links to k_1 and j_2 links to k_2: 1 + (value of matching segment (i_1, k_1 - 1) to (i_2, k_2 - 1)) + (value of matching segment (k_1 + 1, j_1 - 1) to (k_2 + 1, j_2 - 1))
16
Creating an LSP
While a matched arc can break a segment into two (as in case s3), it can also create an LSP, if we allow the segments to be linked.
s4: 1 + (value of matching LSP (i_1, k_1 - 1, k_1 + 1, j_1 - 1) to (i_2, k_2 - 1, k_2 + 1, j_2 - 1))
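As a small illustration of how case s4 turns a segment and an arc into an LSP, the sketch below uses hypothetical names (`Segment`, `LSP`, `lsp_from_arc`); the index layout (h, l, i, j) follows the slides, while the choice to allow single-position segments in `is_ordered` is an assumption.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    i: int  # first position
    j: int  # last position


class LSP(NamedTuple):
    h: int  # left segment: first position
    l: int  # left segment: last position
    i: int  # right segment: first position
    j: int  # right segment: last position

    def left(self):
        return Segment(self.h, self.l)

    def right(self):
        return Segment(self.i, self.j)

    def is_ordered(self):
        # Assumption: segments may be a single position, so <= is allowed
        # inside a segment; the left segment must end before the right begins.
        return self.h <= self.l < self.i <= self.j


def lsp_from_arc(i, k, j):
    """Case s4: an arc (k, j) ending segment (i, j) leaves the LSP
    (i, k - 1, k + 1, j - 1), with position k as the hole between its halves."""
    return LSP(i, k - 1, k + 1, j - 1)


pair = lsp_from_arc(0, 3, 9)
print(pair, pair.left(), pair.right(), pair.is_ordered())
```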
17
LSP Cases – Simple
The first cases for matching LSPs are based on the segment matching: two paring and one split.
a1: value of matching LSP (h_1, l_1, i_1, j_1 - 1) to (h_2, l_2, i_2, j_2)
a2: value of matching LSP (h_1, l_1, i_1, j_1) to (h_2, l_2, i_2, j_2 - 1)
a3: (value of matching segment (h_1, l_1) to (h_2, l_2)) + (value of matching segment (i_1, j_1) to (i_2, j_2))
Case a3 can be used with s4 to allow new LSPs to be made from right segments of matched LSPs.
18
LSP Cases – Within Right
If the arcs link to positions within the right side of the LSPs, then the segments within the arcs can be the right sides of new LSPs.
a4: 1 + (value of matching LSP (h_1, l_1, k_1 + 1, j_1 - 1) to (h_2, l_2, k_2 + 1, j_2 - 1)) + (value of matching segment (i_1, k_1 - 1) to (i_2, k_2 - 1))
19
LSP Cases – Within Right
Alternatively, the arcs could bound segments that are within the structure of the right side of the LSPs.
a5: 1 + (value of matching LSP (h_1, l_1, i_1, k_1 - 1) to (h_2, l_2, i_2, k_2 - 1)) + (value of matching segment (k_1 + 1, j_1 - 1) to (k_2 + 1, j_2 - 1))
20
LSP Cases – Cross Left
If the arcs cross to the left side of the LSPs, then their left endpoints (k) can form a hole to start new LSPs.
a6: 1 + (value of matching LSP (h_1, k_1 - 1, k_1 + 1, l_1) to (h_2, k_2 - 1, k_2 + 1, l_2)) + (value of matching segment (i_1, j_1 - 1) to (i_2, j_2 - 1))
21
LSP Cases – Cross Left
The arcs can instead separate the LSP within them from initial segments.
a7: 1 + (value of matching LSP (k_1 + 1, l_1, i_1, j_1 - 1) to (k_2 + 1, l_2, i_2, j_2 - 1)) + (value of matching segment (h_1, k_1 - 1) to (h_2, k_2 - 1))
We do not try to link the first and third segments as they would form part of a 3-knot.
22
LSP Cases – Cross Left
Matched arcs can break the LSPs into three segments.
a8: 1 + (value of matching segment (h_1, k_1 - 1) to (h_2, k_2 - 1)) + (value of matching segment (k_1 + 1, l_1) to (k_2 + 1, l_2)) + (value of matching segment (i_1, j_1 - 1) to (i_2, j_2 - 1))
23
LSP Cases – Crossed LSPs
Arcs crossing existing LSPs could need a merging of the LSP types in a6 and a7, but then we need to consider all places for the split to occur.
a9: 1 + max over all s_1, s_2 with k_1 < s_1 < l_1 and k_2 < s_2 < l_2 of [ (value of matching LSP (h_1, k_1 - 1, s_1 + 1, l_1) to (h_2, k_2 - 1, s_2 + 1, l_2)) + (value of matching LSP (k_1 + 1, s_1, i_1, j_1 - 1) to (k_2 + 1, s_2, i_2, j_2 - 1)) ]
24
Dynamic Programming
These cases take care of all possibilities for how LSPs and segments can be broken down, and their results merged.
They can be turned straightforwardly into a dynamic programming algorithm that uses two tables (one for segments, one for LSPs)
The algorithm will need to weave between these two tables in a way consistent with the data
25
Making It Feasible
This algorithm makes very heavy use of multidimensional dynamic programming tables, and looks more of theoretical interest than practical use.
Time complexity is high at O(n^10)
Space complexity is even more crucial at O(n^8)
Careful implementation is needed to avoid these theoretical worst cases.
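To give a feel for the space bound above, here is a rough calculation. The values n = 100 positions and 4 bytes per entry are hypothetical, chosen only for illustration, and `dense_table_entries` is a made-up helper name; the calculation ignores constant factors and any ordering constraints on the indices.

```python
def dense_table_entries(n, dims=8):
    """Number of cells in a dense table with `dims` indices, each ranging over n."""
    return n ** dims


n = 100              # hypothetical structure length
bytes_per_entry = 4  # hypothetical entry size

entries = dense_table_entries(n)       # 100**8 = 10**16 cells
total_bytes = entries * bytes_per_entry
print(f"{entries:.1e} entries, {total_bytes:.1e} bytes")
```

Even at this modest length a fully allocated table would need on the order of 4 * 10**16 bytes, which is why allocating only the entries that are actually reached matters so much.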
26
Engineering Space and Time
Space and time usage can be minimised by eliminating those computations that are not needed.
The recurrence should be computed recursively (using memoisation) to enable the data to help this pruning
Note that most segment pairs will not correspond to LSPs consistent with a given arc structure
The table can be allocated dynamically, in layers, so that a hyperplane of the table is only allocated if it will contain an entry (and note h < l < i < j)
We can reduce this further by limiting hyperplane sizes to the corresponding segment within an arc
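The idea of allocating a layer of the table only when it receives an entry can be sketched with nested dictionaries. This is a simple stand-in, not the layered allocation scheme used in the talk: the class name `LayeredTable` and the choice to key layers by the first structure's LSP indices are assumptions made for illustration.

```python
class LayeredTable:
    """Sparse memo table: a layer (inner dict) exists only once it holds an entry."""

    def __init__(self):
        self._layers = {}  # LSP in structure 1 -> {LSP in structure 2: value}

    def get(self, lsp1, lsp2):
        """Return the stored value, or None if it has not been computed."""
        return self._layers.get(lsp1, {}).get(lsp2)

    def put(self, lsp1, lsp2, value):
        """Store a value, creating the layer for lsp1 on first use."""
        self._layers.setdefault(lsp1, {})[lsp2] = value

    def allocated_layers(self):
        return len(self._layers)

    def stored_entries(self):
        return sum(len(layer) for layer in self._layers.values())


table = LayeredTable()
table.put((0, 2, 4, 8), (0, 1, 3, 5), 2)  # (h, l, i, j) tuples for each structure
print(table.get((0, 2, 4, 8), (0, 1, 3, 5)), table.allocated_layers())
```

A memoised recursion would call `get` before recursing and `put` afterwards, so entries that the data never reaches are never allocated.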
27
Experiments
Having reduced the space, experiments were run on a variety of RNA structural data to determine if the algorithm is of practical use:
Large Subunit ribosomal RNA structures
RNAse P structures
Mosaic Virus structures
Structures of up to 400 arcs were compared effectively in 4 GB of space, with correct substructures found, allocating about 10^-14 of the theoretical table
Even the O(n^4) recurrence for unknotted structures would need too much space without the space-saving technique
28
Conclusion and Future Work
Under these restrictions, RNA bond structures can be compared in polynomial time
With careful case pruning, the algorithm is feasible and produces useful results
The problem of comparing general 2-colourable bond structures (allowing endpoint interleaving) is still open
Extensions to pattern discovery for multiple structures can be explored
Weights can be added to model RNA more accurately