1
Comparative RNA Structural Analysis
2
Overview Comparative RNA Structural Analysis
Method 1: Align, then fold Method 2: Fold, then compare
4
Comparative RNA Structural Analysis Problem Definition
Input: A set of sequences with assumed structural similarities. Output: Alignment, and common structural elements.
5
Possible approaches
Approach 1: homologous RNA sequences → sequence alignment → aligned sequences → fold alignments → aligned structures.
6
Possible approaches
Example homologous RNA sequences: AUCCCCGUAUCGAUC, CUCGGCGUAUCGGUC
Approach 1: homologous RNA sequences → sequence alignment → aligned sequences → fold alignments → aligned structures.
Approach 2: homologous RNA sequences → fold sequences → homologous RNA secondary structures → structure alignment → aligned structures.
7
Simultaneous Fold and Alignment
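Possible approaches
Example homologous RNA sequences: AUCCCCGUAUCGAUC, CUCGGCGUAUCGGUC
Approach 1: homologous RNA sequences → sequence alignment → aligned sequences → fold alignments → aligned structures.
Approach 2: homologous RNA sequences → fold sequences → homologous RNA secondary structures → structure alignment → aligned structures.
Approach 3 (Sankoff): homologous RNA sequences → simultaneous fold and alignment → aligned structures.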
8
Align, then fold First step: multiple alignment
We want to fold our aligned sequences with an algorithm we already know. How can we modify the Nussinov algorithm to fold multiple alignments?
A C G T G G A G A A C G G A C C C T A A A G G G G A T A T A G C A A T T A T C C G G A T T A G T T C C G G A T T G G A C G A A T A G G G C T A A A T G C C A
9
Align, then fold We need a new scoring function
Scoring a base pair is different from scoring a pair of columns in our alignment. Using the new scoring function, we can apply the Nussinov algorithm to the converted input (with slight changes).
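As a rough illustration of the idea above, the sketch below runs a Nussinov-style recursion over alignment columns instead of single bases. The column score used here (the fraction of sequences whose two bases can pair, with G-U wobble counted) is an assumption chosen for illustration and is not the scoring function from the slides; the names `column_pair_score` and `fold_alignment` are likewise hypothetical, and no minimum hairpin-loop size is enforced.

```python
from typing import List

# Assumption: Watson-Crick and G-U wobble pairs count as complementary.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column_pair_score(alignment: List[str], i: int, j: int) -> float:
    """Hypothetical column score: fraction of sequences whose bases in
    columns i and j (0-based) can form a base pair."""
    hits = sum((seq[i], seq[j]) in PAIRS for seq in alignment)
    return hits / len(alignment)


def fold_alignment(alignment: List[str]) -> float:
    """Nussinov-style DP over columns: best total score of a nested
    (non-crossing) set of column pairs."""
    n = len(alignment[0])
    # best[i][j] = best score using columns i..j (inclusive)
    best = [[0.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # column i unpaired, or column j unpaired
            score = max(best[i + 1][j], best[i][j - 1])
            # columns i and j paired with each other
            inner = best[i + 1][j - 1] if j - i > 1 else 0.0
            score = max(score, inner + column_pair_score(alignment, i, j))
            # bifurcation into two independent sub-intervals
            for k in range(i + 1, j):
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]


print(fold_alignment(["GGAACC", "GCAAGC"]))
```

In the toy alignment, columns 0/5 and 1/4 pair in both sequences, so the only change from the single-sequence algorithm is that the pair score comes from a column-level function.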
10
Covariation Columns that "change together" form a stem
A C G U G G A G A A C G G A C C C U A A A G G G G A U A U A G C A A U U A U C C G G A U U A G U U C C G G A U U G G A C G A A U A G G G C U A A A U G C C A
11
The Mixy algorithm
For each column $i$ in the alignment, define $f_i(x)$, $x \in \{A, U, C, G\}$, to be the frequency of $x$ in column $i$.
A C G U G A A C G G
A C C C U G G G G G
A A U A G U U A U G
$f_2(C) = \frac{2}{3}$
12
The Mixy algorithm
For each column $i$ in the alignment, define $f_i(x)$, $x \in \{A, U, C, G\}$, to be the frequency of $x$ in column $i$.
For each $i$ and $j$, define $f_{i,j}(x,y)$, $x, y \in \{A, U, C, G\}$, to be the frequency of $x$ in column $i$ and $y$ in column $j$ on the same sequence.
A C G U G A A C G G
A C C C U G G G G G
A A U A G U U A U G
$f_{2,9}(C,G) = \frac{2}{3}$, $f_{2,9}(A,G) = 0$
13
The Mixy algorithm
For each column $i$ in the alignment, define $f_i(x)$, $x \in \{A, U, C, G\}$, to be the frequency of $x$ in column $i$.
For each $i$ and $j$, define $f_{i,j}(x,y)$, $x, y \in \{A, U, C, G\}$, to be the frequency of $x$ in column $i$ and $y$ in column $j$ on the same sequence.
Clearly, if $x$ and $y$ are independent, $\frac{f_{i,j}(x,y)}{f_i(x) \cdot f_j(y)} \approx 1$.
A C G U G A A C G G
A C C C U G G G G G
A A U A G U U A U G
$f_{2,9}(C,G) = \frac{2}{3}$, $f_{2,9}(A,G) = 0$
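The two frequency definitions can be checked directly on the three-sequence example alignment. The following is a minimal sketch; the function names `column_freq` and `pair_freq` are hypothetical, columns are numbered from 1 as on the slides, and exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction

# The three-sequence example alignment from the slide, one row per sequence.
ALIGNMENT = ["ACGUGAACGG", "ACCCUGGGGG", "AAUAGUUAUG"]


def column_freq(alignment, i):
    """f_i(x): fraction of sequences that have base x in column i (1-based)."""
    counts = Counter(seq[i - 1] for seq in alignment)
    return {x: Fraction(c, len(alignment)) for x, c in counts.items()}


def pair_freq(alignment, i, j):
    """f_{i,j}(x, y): fraction of sequences with x in column i and y in
    column j (both 1-based) on the same sequence."""
    counts = Counter((seq[i - 1], seq[j - 1]) for seq in alignment)
    return {xy: Fraction(c, len(alignment)) for xy, c in counts.items()}


print(column_freq(ALIGNMENT, 2))   # frequencies in column 2
print(pair_freq(ALIGNMENT, 2, 9))  # joint frequencies for columns 2 and 9
```

Pairs that never occur together, such as (A, G) for columns 2 and 9, are simply absent from the returned dictionary and have frequency 0.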
14
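The Mixy algorithm
Now, to measure mutual information between columns $i$ and $j$ we'll define:
$H_{i,j} = \sum_{x,y} f_{i,j}(x,y) \log_2 \frac{f_{i,j}(x,y)}{f_i(x) \cdot f_j(y)}$
A C G U G A A C G G
A C C C U G G G G G
A A U A G U U A U G
$H_{2,9} = f_{2,9}(C,G) \log_2 \frac{f_{2,9}(C,G)}{f_2(C) \cdot f_9(G)} + f_{2,9}(A,U) \log_2 \frac{f_{2,9}(A,U)}{f_2(A) \cdot f_9(U)}$
$= \frac{2}{3} \cdot \log_2 \frac{2/3}{(2/3)(2/3)} + \frac{1}{3} \cdot \log_2 \frac{1/3}{(1/3)(1/3)} = \frac{2}{3} \cdot \log_2 \frac{3}{2} + \frac{1}{3} \cdot \log_2 3 \approx \frac{2}{3} \cdot 0.585 + \frac{1}{3} \cdot 1.585 \approx 0.918$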
15
The Mixy algorithm
Now, to measure mutual information between columns $i$ and $j$ we'll define:
$H_{i,j} = \sum_{x,y} f_{i,j}(x,y) \log_2 \frac{f_{i,j}(x,y)}{f_i(x) \cdot f_j(y)}$
A C G U G A A C G G
A C C C U G G G G G
A A U A G U U A U G
$H_{1,10} = f_{1,10}(A,G) \log_2 \frac{f_{1,10}(A,G)}{f_1(A) \cdot f_{10}(G)} = \frac{3}{3} \cdot \log_2 \frac{1}{1 \cdot 1} = 1 \cdot 0 = 0$
16
The Mixy algorithm
Now, to measure mutual information between columns $i$ and $j$ we'll define:
$H_{i,j} = \sum_{x,y} f_{i,j}(x,y) \log_2 \frac{f_{i,j}(x,y)}{f_i(x) \cdot f_j(y)}$
A C G U G A A C G G
A C C C U G G G G G
A A U A G U U A U G
A A A A G U C U U G
$H_{3,7} = f_{3,7}(G,A) \log_2 \frac{f_{3,7}(G,A)}{f_3(G) \cdot f_7(A)} + f_{3,7}(C,G) \log_2 \frac{f_{3,7}(C,G)}{f_3(C) \cdot f_7(G)} + f_{3,7}(U,U) \log_2 \frac{f_{3,7}(U,U)}{f_3(U) \cdot f_7(U)} + f_{3,7}(A,C) \log_2 \frac{f_{3,7}(A,C)}{f_3(A) \cdot f_7(C)}$
$= 4 \cdot \frac{1}{4} \cdot \log_2 \frac{1/4}{(1/4)(1/4)} = 1 \cdot \log_2 4 = 2$
17
The Mixy algorithm
Now, to measure mutual information between columns $i$ and $j$ we'll define:
$H_{i,j} = \sum_{x,y} f_{i,j}(x,y) \log_2 \frac{f_{i,j}(x,y)}{f_i(x) \cdot f_j(y)}$
$0 \le H_{i,j} \le 2$
A C G U G A A C G G
A C C C U G G G G G
A A U A G U U A U G
A A A A G U C U U G
A higher value means that columns $i$ and $j$ are correlated.
A lower value means that columns $i$ and $j$ are not correlated.
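The mutual information formula can be evaluated on the example alignments to see the two extremes of the $0 \le H_{i,j} \le 2$ range. This is a minimal sketch under the definition above; the function name `mutual_information` is hypothetical, and columns are 1-based as on the slides.

```python
import math
from collections import Counter

THREE_SEQS = ["ACGUGAACGG", "ACCCUGGGGG", "AAUAGUUAUG"]
FOUR_SEQS = THREE_SEQS + ["AAAAGUCUUG"]


def mutual_information(alignment, i, j):
    """H_{i,j} = sum over observed (x, y) of
    f_ij(x, y) * log2( f_ij(x, y) / (f_i(x) * f_j(y)) ), columns 1-based."""
    n = len(alignment)
    col_i = Counter(seq[i - 1] for seq in alignment)
    col_j = Counter(seq[j - 1] for seq in alignment)
    joint = Counter((seq[i - 1], seq[j - 1]) for seq in alignment)
    h = 0.0
    # Pairs with zero joint frequency contribute nothing, so only
    # observed pairs are summed.
    for (x, y), count in joint.items():
        f_xy = count / n
        h += f_xy * math.log2(f_xy / ((col_i[x] / n) * (col_j[y] / n)))
    return h


print(round(mutual_information(THREE_SEQS, 2, 9), 3))  # covarying columns
print(mutual_information(THREE_SEQS, 1, 10))           # fully conserved columns
print(mutual_information(FOUR_SEQS, 3, 7))             # four distinct pairs
```

Fully conserved columns score 0 even if they do pair, which is why mutual information detects covariation rather than conservation.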
18
Overview Comparative RNA Structural Analysis
Method 1: Align, then fold Method 2: Fold, then compare
19
Ordered rooted tree representation
Shapiro, 1988: nodes - elements of secondary structure (hairpin loop, bulge, internal loop or multi-loop). edges - base-paired (stem) regions.
20
Ordered rooted tree representation
Shapiro, 1988: nodes - elements of secondary structure (hairpin loop, bulge, internal loop or multi-loop). edges - base-paired (stem) regions. Zhang, 1998: nodes - unpaired bases (leaves) or paired bases (internal nodes). Each node is labeled with a base or a pair of bases. edges - connecting consecutive stem base-pairs or a leaf base with the last base-pair in the corresponding stem.
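To make the base-level (Zhang-style) representation concrete, the sketch below builds an ordered rooted tree in which paired bases become internal nodes and unpaired bases become leaves. Two things here are assumptions and not part of the slides: the input is given in dot-bracket notation, and a virtual root node is added so that structures with several top-level elements still form one tree. The names `Node` and `structure_to_tree` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                                   # a base, or a pair of bases
    children: List["Node"] = field(default_factory=list)


def structure_to_tree(seq: str, dot_bracket: str) -> Node:
    """Build an ordered rooted tree: '(' opens a base-pair node,
    ')' closes it, '.' adds an unpaired-base leaf."""
    root = Node("root")          # assumed virtual root
    stack = [root]
    for base, sym in zip(seq, dot_bracket):
        if sym == "(":
            node = Node(base)    # 5' base now; partner appended on ')'
            stack[-1].children.append(node)
            stack.append(node)
        elif sym == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure")
            stack.pop().label += base
        else:
            stack[-1].children.append(Node(base))
    if len(stack) != 1:
        raise ValueError("unbalanced structure")
    return root


tree = structure_to_tree("GGAAACC", "((...))")
outer = tree.children[0]
print(outer.label, [leaf.label for leaf in outer.children[0].children])
```

Consecutive stem base pairs become a parent-child chain, and the hairpin-loop bases hang as leaves under the last pair of the stem, matching the edge description above.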
21
Problem definition
The subtree isomorphism problem [Matula, 1968, 1978]: given a pattern tree P and a text tree T, find a subtree of T which is isomorphic to P. In other words: decide whether some subtree of T is identical in structure to P.
The subtree homeomorphism problem [Chung, 1987; Reyner, 1977; Pinter et al., 2004]: similar to the isomorphism problem, except that degree-2 nodes can be deleted from the text tree.
22
Subtree homeomorphism problem
Let $P$ and $T$ be two ordered, rooted trees.
Let $t$ be a subtree of $T$, rooted at node $v \in T$.
A mapping $\alpha: P \to t$ is a one-to-one matching of the nodes of $P$ to nodes of $t$. The mapping must preserve the ancestor relations of the nodes and their relative order.
The subtree homeomorphism score of a mapping, denoted $S(\alpha, v)$, combines:
the node-to-node similarity score function, for $u \in P$, $v \in t$;
the edge-to-edge similarity score function, for $e_u \in P$, $e_v \in t$;
the penalty for deleting a degree-2 node from $T$;
the penalty for deleting any other node in $T$.
23
Subtree homeomorphism problem
Given $P$ and $T$, we want to find a subtree $t$ of $T$ such that the score $S(\alpha, v)$ is maximal. How can we do that? How can we solve this problem efficiently? Dynamic programming!
24
Subtree homeomorphism problem
Isomorphism Homeomorphism
25
Rooted Ordered Subtree Isomorphism
Given trees $P$ and $T$ and a scoring table, compute the Labeled Ordered Rooted Subtree Isomorphism of $P$ and $T$.
No deletions are allowed from $P$.
Only deletions of complete subtrees from $T$ are allowed, with penalty = 0.
[Figure: pattern tree $P$ (nodes a, b, c, d, e, f), text tree $T$ (nodes a', b', c', d', e', f', g', h'), and the scoring table.]
26
Rooted Ordered Subtree Isomorphism
Rows are the post-ordered $P$ nodes: b, e, f, c, d, a.
Columns are the post-ordered $T$ nodes: b', e', f', g', c', h', d', a'.
[Figure: trees $P$ and $T$, the scoring table, and the empty large DP table.]
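The row and column order of the DP table is a post-order traversal, which guarantees that every child is processed before its parent. The sketch below reproduces the two orderings; the tree shapes are an assumption inferred from the post-orders and the small DP tables shown on the slides (a has children b, c, d; c has children e, f; c' has children e', f', g'; d' has the single child h').

```python
# Each tree is a (label, children) pair; the shapes are assumed, see above.
P = ("a", [("b", []), ("c", [("e", []), ("f", [])]), ("d", [])])
T = ("a'", [("b'", []),
            ("c'", [("e'", []), ("f'", []), ("g'", [])]),
            ("d'", [("h'", [])])])


def postorder(tree):
    """Return node labels with all children listed before their parent,
    siblings kept in left-to-right order."""
    label, children = tree
    order = []
    for child in children:
        order.extend(postorder(child))
    order.append(label)
    return order


print(postorder(P))  # rows of the large DP table
print(postorder(T))  # columns of the large DP table
```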
27
Rooted Ordered Subtree Isomorphism
Each cell of the large DP table holds $S(u,v)$ for a node $u \in P$ and a node $v \in T$. $S(u,v)$ is defined by two cases: one if $u$ and $v$ are leaves, and one otherwise.
A cell is set to $-\infty$ when the height of the $P$ node is greater than the height of the $T$ node; the first such entry appears in row c.
[Figure: trees $P$ and $T$, the scoring table, and the large DP table being filled row by row.]
34
Rooted Ordered Subtree Isomorphism
Small DP table for the cell (c, c'): rows are the children of c (e, f), columns are the children of c' (e', f', g').
$S(c, c') = 3 + 5 = 8$
[Figure: the small DP table filled step by step; the value 8 is then written into cell (c, c') of the large DP table.]
45
Rooted Ordered Subtree Isomorphism
Small DP table for the cell (c, d'): rows are the children of c (e, f), the only column is the child of d' (h'). The two children of c cannot both be mapped to the single child of d', so the cell is $-\infty$.
Further cells are pruned by comparing depths and heights: a cell is $-\infty$ when the depth or the height of the $P$ node is greater than that of the $T$ node.
[Figure: the large DP table with $-\infty$ and 8 in row c.]
53
Rooted Ordered Subtree Isomorphism
Small DP table for the cell (a, a'): rows are the children of a (b, c, d), columns are the children of a' (b', c', d').
$S(a, a') = 4 + 16 = 20$
[Figure: the small DP table for (a, a'); the value 20 is then written into cell (a, a') of the large DP table.]
56
Rooted Ordered Subtree Isomorphism
Where is the solution?
No deletions are allowed from $P$, so the root of $P$ must be mapped: the solution is the best value in row a, here $S(a, a') = 20$.
[Figure: the completed large DP table together with the small DP tables for (c, c') and (a, a').]
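The whole procedure (large table over post-ordered node pairs, one small table per cell to match children in order) can be sketched as follows. This is a minimal sketch under stated assumptions, not the slides' exact computation: the tree shapes are inferred as above, node labels are assumed unique, and the scoring function (4 for equal letters ignoring the prime, 1 otherwise) is hypothetical because the scoring table did not survive extraction. The resulting numbers therefore differ from the 8 and 20 on the slides.

```python
import math

P = ("a", [("b", []), ("c", [("e", []), ("f", [])]), ("d", [])])
T = ("a'", [("b'", []),
            ("c'", [("e'", []), ("f'", []), ("g'", [])]),
            ("d'", [("h'", [])])])


def node_score(u, v):
    # Hypothetical scoring table: 4 for the same letter, 1 otherwise.
    return 4 if u == v.rstrip("'") else 1


def nodes_postorder(tree):
    for child in tree[1]:
        yield from nodes_postorder(child)
    yield tree


def small_dp(p_children, t_children, table):
    """Best ordered matching of ALL children of u into the children of v.
    Unmatched children of v are deleted as whole subtrees at penalty 0."""
    k, l = len(p_children), len(t_children)
    d = [[-math.inf] * (l + 1) for _ in range(k + 1)]
    for j in range(l + 1):
        d[0][j] = 0.0                      # nothing left to match in P
    for i in range(1, k + 1):
        for j in range(i, l + 1):
            skip = d[i][j - 1]             # delete the j-th child subtree of v
            take = d[i - 1][j - 1] + table[(p_children[i - 1][0],
                                            t_children[j - 1][0])]
            d[i][j] = max(skip, take)
    return d[k][l]


def subtree_isomorphism(p, t):
    """Large DP table: S(u, v) for every node pair, filled in post-order."""
    table = {}
    t_nodes = list(nodes_postorder(t))
    for u in nodes_postorder(p):
        for v in t_nodes:
            table[(u[0], v[0])] = (node_score(u[0], v[0])
                                   + small_dp(u[1], v[1], table))
    return table


S = subtree_isomorphism(P, T)
best = max(S[("a", v[0])] for v in nodes_postorder(T))  # row of P's root
print(S[("c", "c'")], S[("c", "d'")], best)
```

Cells where the $P$ subtree cannot fit come out as $-\infty$ automatically, because the small DP table has no way to place all children of $u$; the explicit height and depth checks on the slides would simply skip that work.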
59
Running time complexity
If $P$ has $m$ nodes and $T$ has $n$ nodes:
There are $mn$ cells in the large DP table.
In the worst case, for each cell we will compute a small DP table with $mn$ cells.
Resulting in $O(m^2 n^2)$ running time.
Is there a tighter bound?
[Figure: trees $P$ and $T$ and the completed large DP table.]
60
Running time complexity
If $P$ has $m$ nodes and $T$ has $n$ nodes:
Each node $u$ in $P$ will be in a small DP table only when its father is compared to a node in $T$.
The father of a node in $P$ is compared at most $n$ times $\Rightarrow O(mn)$.
Symmetrically for a node $v$ in $T$.
Overall: $O(mn + mn + mn) = O(mn)$ (large DP, small DP for a node in $P$, small DP for a node in $T$).
[Figure: trees $P$ and $T$ and the completed large DP table.]
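The same bound can be obtained by counting small-DP cells directly. The derivation below is a sketch that assumes each small-DP cell costs $O(1)$ and writes $c(x)$ for the number of children of node $x$ (a symbol introduced here, not used on the slides). The small table for the pair $(u, v)$ has $c(u) \cdot c(v)$ cells, and every non-root node is a child of exactly one node.

```latex
\begin{align*}
\text{small-DP cells}
  &= \sum_{u \in P} \sum_{v \in T} c(u)\, c(v)
   = \Big( \sum_{u \in P} c(u) \Big) \Big( \sum_{v \in T} c(v) \Big) \\
  &= (m - 1)(n - 1) = O(mn), \\
\text{total work}
  &= \underbrace{mn}_{\text{large DP}} + \underbrace{(m-1)(n-1)}_{\text{all small DPs}} = O(mn).
\end{align*}
```

The $O(m^2 n^2)$ estimate is loose because it charges every cell for a full-size small table, while the small tables together are no larger than the large table.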