Comparative RNA Structural Analysis
Overview
Comparative RNA Structural Analysis
Method 1: Align, then fold
Method 2: Fold, then compare
Comparative RNA Structural Analysis
Problem Definition
Input: A set of sequences with assumed structural similarities.
Output: Alignment, and common structural elements.
Possible approaches
Starting point: homologous RNA sequences, e.g. AUCCCCGUAUCGAUC and CUCGGCGUAUCGGUC.
1. Sequence alignment gives aligned sequences; folding the alignment gives aligned structures.
2. Folding each sequence gives homologous RNA secondary structures; structure alignment gives aligned structures.
3. Sankoff: simultaneous fold and alignment, going directly from the sequences to aligned structures.
Align, then fold
First step: multiple alignment.
We want to fold the aligned sequences with an algorithm we already know. How can we modify the Nussinov algorithm to fold multiple alignments?
[Figure: example multiple sequence alignment]
Align, then fold
We need a new scoring function: scoring a base pair is different from scoring a pair of columns in the alignment.
With the new scoring function, we can apply the Nussinov algorithm to the converted input (with slight changes).
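The idea of running Nussinov over columns instead of bases can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function name `fold_columns`, the `pair_score(i, j)` callback and the `min_loop` parameter are assumptions introduced here, and columns are 0-based.

```python
from typing import Callable, List, Tuple


def fold_columns(n_cols: int,
                 pair_score: Callable[[int, int], float],
                 min_loop: int = 0) -> Tuple[float, List[Tuple[int, int]]]:
    """Nussinov-style DP over alignment columns 0..n_cols-1.

    pair_score(i, j) scores pairing column i with column j (hypothetical
    interface); min_loop is the minimum number of columns a pair encloses.
    Returns the best total score and the chosen column pairs.
    """
    if n_cols == 0:
        return 0.0, []

    # best[i][j] = best score for the column range i..j (inclusive)
    best = [[0.0] * n_cols for _ in range(n_cols)]

    def get(i: int, j: int) -> float:
        return best[i][j] if i <= j else 0.0  # an empty range scores 0

    def paired(i: int, k: int, j: int) -> float:
        # column j pairs with column k; k splits the range into two parts
        return get(i, k - 1) + get(k + 1, j - 1) + pair_score(k, j)

    for span in range(1, n_cols):
        for i in range(n_cols - span):
            j = i + span
            cand = best[i][j - 1]                 # column j stays unpaired
            for k in range(i, j - min_loop):      # column j pairs with k
                cand = max(cand, paired(i, k, j))
            best[i][j] = cand

    # Traceback: recover one optimal set of column pairs.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n_cols - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if best[i][j] == paired(i, k, j):
                pairs.append((k, j))
                stack.extend([(i, k - 1), (k + 1, j - 1)])
                break
    return best[0][n_cols - 1], sorted(pairs)
```

Any column-pair score can be plugged in as `pair_score`; the only change relative to single-sequence Nussinov is that the score is looked up per pair of columns rather than per pair of bases.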
Covariation
Columns that “change together” form a stem.
[Figure: example alignment with covarying columns]
The Mixy algorithm
For each column $i$ in the alignment, define $f_i(x)$, $x \in \{A, U, C, G\}$, to be the frequency of $x$ in column $i$.
For each pair of columns $i$ and $j$, define $f_{i,j}(x, y)$, $x, y \in \{A, U, C, G\}$, to be the frequency of sequences that have $x$ in column $i$ and $y$ in column $j$.
Clearly, if $x$ and $y$ are independent, $\frac{f_{i,j}(x, y)}{f_i(x) \, f_j(y)} \approx 1$.
Example alignment (columns 1 to 10):
ACGUGAACGG
ACCCUGGGGG
AAUAGUUAUG
$f_2(C) = \frac{2}{3}$, $f_{2,9}(C, G) = \frac{2}{3}$, $f_{2,9}(A, G) = 0$
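The two frequency definitions can be checked on the three-sequence example with a short sketch. The helper names `column_freq` and `pair_freq` are introduced here for illustration; columns are 1-based to match the slides, and exact fractions are used so the values can be compared directly.

```python
from collections import Counter
from fractions import Fraction

# The three-sequence example alignment from the slides.
ALIGNMENT = ["ACGUGAACGG", "ACCCUGGGGG", "AAUAGUUAUG"]


def column_freq(alignment, i):
    """f_i(x): frequency of each base x in column i (1-based)."""
    n = len(alignment)
    counts = Counter(seq[i - 1] for seq in alignment)
    return {x: Fraction(c, n) for x, c in counts.items()}


def pair_freq(alignment, i, j):
    """f_{i,j}(x, y): frequency of sequences with x in column i and y in column j."""
    n = len(alignment)
    counts = Counter((seq[i - 1], seq[j - 1]) for seq in alignment)
    return {xy: Fraction(c, n) for xy, c in counts.items()}


f2 = column_freq(ALIGNMENT, 2)
f9 = column_freq(ALIGNMENT, 9)
f29 = pair_freq(ALIGNMENT, 2, 9)

# Ratio used in the independence check; it is 3/2 for (C, G) in columns 2 and 9.
ratio_cg = f29[("C", "G")] / (f2["C"] * f9["G"])
```

Pairs that never occur together, such as (A, G) in columns 2 and 9, are simply absent from the dictionary, which corresponds to a frequency of 0.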
The Mixy algorithm
To measure the mutual information between columns $i$ and $j$, define:
$H_{i,j} = \sum_{x,y} f_{i,j}(x, y) \log_2 \frac{f_{i,j}(x, y)}{f_i(x) \, f_j(y)}$
Example alignment (columns 1 to 10):
ACGUGAACGG
ACCCUGGGGG
AAUAGUUAUG
Columns 2 and 9:
$H_{2,9} = f_{2,9}(C, G) \log_2 \frac{f_{2,9}(C, G)}{f_2(C) \, f_9(G)} + f_{2,9}(A, U) \log_2 \frac{f_{2,9}(A, U)}{f_2(A) \, f_9(U)} = \frac{2}{3} \log_2 \frac{2/3}{(2/3)(2/3)} + \frac{1}{3} \log_2 \frac{1/3}{(1/3)(1/3)} = \frac{2}{3} \log_2 1.5 + \frac{1}{3} \log_2 3 \approx \frac{2}{3}(0.585) + \frac{1}{3}(1.585) \approx 0.918$
Columns 1 and 10:
$H_{1,10} = f_{1,10}(A, G) \log_2 \frac{f_{1,10}(A, G)}{f_1(A) \, f_{10}(G)} = \frac{3}{3} \log_2 \frac{3/3}{(3/3)(3/3)} = 1 \cdot 0 = 0$
Columns 3 and 7, after adding a fourth sequence AAAAGUCUUG to the alignment:
$H_{3,7} = f_{3,7}(G, A) \log_2 \frac{f_{3,7}(G, A)}{f_3(G) \, f_7(A)} + f_{3,7}(C, G) \log_2 \frac{f_{3,7}(C, G)}{f_3(C) \, f_7(G)} + f_{3,7}(U, U) \log_2 \frac{f_{3,7}(U, U)}{f_3(U) \, f_7(U)} + f_{3,7}(A, C) \log_2 \frac{f_{3,7}(A, C)}{f_3(A) \, f_7(C)} = 4 \cdot \frac{1}{4} \log_2 \frac{1/4}{(1/4)(1/4)} = \log_2 4 = 2$
$0 \le H_{i,j} \le 2$
A higher value means that columns $i$ and $j$ are correlated; a lower value means that they are not.
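The mutual information score can be computed directly from the definition. The sketch below is an illustration only: the function name `mutual_information` is introduced here, columns are 1-based, and pairs with zero frequency are skipped because they contribute nothing to the sum.

```python
from collections import Counter
from math import log2

# Example alignment from the slides, plus the fourth sequence used for H_{3,7}.
ALIGNMENT_3 = ["ACGUGAACGG", "ACCCUGGGGG", "AAUAGUUAUG"]
ALIGNMENT_4 = ALIGNMENT_3 + ["AAAAGUCUUG"]


def mutual_information(alignment, i, j):
    """H_{i,j} = sum over (x, y) of f_ij(x,y) * log2(f_ij(x,y) / (f_i(x) * f_j(y)))."""
    n = len(alignment)
    col_i = Counter(seq[i - 1] for seq in alignment)
    col_j = Counter(seq[j - 1] for seq in alignment)
    pairs = Counter((seq[i - 1], seq[j - 1]) for seq in alignment)
    h = 0.0
    for (x, y), count in pairs.items():  # only pairs that actually occur
        f_xy = count / n
        f_x = col_i[x] / n
        f_y = col_j[y] / n
        h += f_xy * log2(f_xy / (f_x * f_y))
    return h


h_2_9 = mutual_information(ALIGNMENT_3, 2, 9)    # about 0.918
h_1_10 = mutual_information(ALIGNMENT_3, 1, 10)  # conserved columns give 0
h_3_7 = mutual_information(ALIGNMENT_4, 3, 7)    # maximal covariation gives 2
```

Note that two fully conserved columns score 0 even if they form a base pair in every sequence: mutual information rewards covariation, not conservation.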
Overview Comparative RNA Structural Analysis Method 1: Align, then fold Method 2: Fold, then compare
Ordered rooted tree representation
Shapiro, 1988:
nodes - elements of secondary structure (hairpin loop, bulge, internal loop or multi-loop).
edges - base-paired (stem) regions.
Zhang, 1998:
nodes - unpaired bases (leaves) or paired bases (internal nodes). Each node is labeled with a base or a pair of bases.
edges - connect consecutive stem base pairs, or a leaf base with the last base pair in the corresponding stem.
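A base-level tree in the spirit of the second representation can be built with a stack. This is a sketch under assumptions that are not in the slides: the structure is given in dot-bracket notation, a virtual root node labeled "root" collects the top-level elements, and the names `Node` and `structure_to_tree` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                       # one base (leaf) or a base pair
    children: List["Node"] = field(default_factory=list)


def structure_to_tree(seq: str, dot_bracket: str) -> Node:
    """Paired bases become internal nodes, unpaired bases become leaves."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure differ in length")
    root = Node("root")              # virtual root (an assumption of this sketch)
    stack = [root]
    for base, symbol in zip(seq, dot_bracket):
        if symbol == "(":
            node = Node(base)        # label is completed when the pair closes
            stack[-1].children.append(node)
            stack.append(node)
        elif symbol == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure")
            stack.pop().label += base
        elif symbol == ".":
            stack[-1].children.append(Node(base))
        else:
            raise ValueError(f"unexpected symbol {symbol!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure")
    return root
```

Children are appended in sequence order, so the resulting tree is ordered as well as rooted, which is what the tree comparison problems that follow require.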
Problem definition
The subtree isomorphism problem [Matula, 1968, 1978]: given a pattern tree P and a text tree T, find a subtree of T that is isomorphic to P. In other words, determine whether some subtree of T is identical in structure to P.
The subtree homeomorphism problem [Chung, 1987; Reyner, 1977; Pinter et al., 2004]: similar to the isomorphism problem, except that degree-2 nodes can be deleted from the text tree.
Subtree homeomorphism problem
Let P and T be two ordered, rooted trees, and let t be a subtree of T rooted at node v ∈ T.
A mapping α: P → t is a one-to-one matching of the nodes of P to nodes of t. The mapping must preserve the ancestor relations of the nodes and their relative order.
The subtree homeomorphism score of a mapping, denoted S(α, v), is built from four terms:
a node-to-node similarity score for matched nodes u ∈ P, v ∈ t;
an edge-to-edge similarity score for matched edges e_u ∈ P, e_v ∈ t;
the penalty for deleting a degree-2 node from T;
the penalty for deleting any other node in T.
Subtree homeomorphism problem
Given P and T, we want to find a subtree t of T such that the score S(α, v) is maximal.
How can we do that? How can we solve this problem efficiently?
Dynamic programming!
Subtree homeomorphism problem
[Figure: isomorphism vs. homeomorphism]
Rooted Ordered Subtree Isomorphism
Given trees P and T and a scoring table, compute the labeled ordered rooted subtree isomorphism of P and T.
No deletions are allowed from P.
Only deletions of complete subtrees from T are allowed, with penalty = 0.
[Figure: pattern tree P with nodes a, b, c, d, e, f; text tree T with nodes a', b', c', d', e', f', g', h'; scoring table]
Rooted Ordered Subtree Isomorphism: large DP table
Rows are the nodes of P in post-order: b, e, f, c, d, a.
Columns are the nodes of T in post-order: b', e', f', g', c', h', d', a'.
Each cell S(u, v) is filled by cases: one case if u and v are leaves, another otherwise.
For the cell (c, b'): height(c) > height(b'), so S(c, b') = −∞.
Rooted Ordered Subtree Isomorphism: small DP table for (c, c')
To fill the cell (c, c'), a small DP table is built with rows e, f and columns e', f', g'.
Result: S(c, c') = 3 + 5 = 8.
Large DP table so far: S(c, b') = −∞, S(c, c') = 8.
Rooted Ordered Subtree Isomorphism: remaining cells
The remaining cells of the large DP table are filled with the same rule, S(u, v) by cases (u and v are leaves; otherwise).
A small DP table with rows e, f and the single column h' is built in the same way.
Two comparisons are noted along the way: depth(c) > depth(a') and height(a) > height(c').
Large DP table so far: S(c, b') = −∞, S(c, c') = 8.
Rooted Ordered Subtree Isomorphism: small DP table for (a, a')
To fill the cell (a, a'), a small DP table is built with rows b, c, d and columns b', c', d'.
Result: S(a, a') = 4 + 16 = 20.
Where is the solution? In the row of a, the root of P: here S(a, a') = 20.
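The two-level scheme of the worked example (a large table over node pairs, and a small table over their children) can be sketched as below. This is an illustration under stated assumptions, not the lecture's code: the names `Tree`, `best_ordered_matching` and `solve` are introduced here, the tree shapes are read off the post-orders and small tables above, and the score is a hypothetical uniform 1 per matched node pair rather than the slides' scoring table, so the result is 6 rather than 20.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

NEG_INF = float("-inf")


@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)


def post_order(t: Tree):
    for child in t.children:
        yield from post_order(child)
    yield t


def best_ordered_matching(us, vs, S) -> float:
    """Small DP: match every node of us, in order, to distinct nodes of vs.

    Unmatched nodes of vs are deleted with their subtrees at penalty 0;
    nodes of us may not be deleted.
    """
    # D[i][j] = best score for matching the first i of us into the first j of vs
    D = [[NEG_INF] * (len(vs) + 1) for _ in range(len(us) + 1)]
    D[0] = [0.0] * (len(vs) + 1)
    for i in range(1, len(us) + 1):
        for j in range(1, len(vs) + 1):
            skip = D[i][j - 1]                                   # delete vs[j-1]
            match = D[i - 1][j - 1] + S[(id(us[i - 1]), id(vs[j - 1]))]
            D[i][j] = max(skip, match)
    return D[len(us)][len(vs)]


def solve(P: Tree, T: Tree, score: Callable[[str, str], float]) -> Tuple[float, str]:
    """Large DP over post-ordered node pairs; returns (best score, T node label)."""
    S: Dict[Tuple[int, int], float] = {}
    for u in post_order(P):
        for v in post_order(T):
            S[(id(u), id(v))] = score(u.label, v.label) + best_ordered_matching(
                u.children, v.children, S)
    # The root of P may be mapped to any node of T: take the best of its row.
    return max((S[(id(P), id(v))], v.label) for v in post_order(T))


# Tree shapes assumed from the example: post-orders b e f c d a and b' e' f' g' c' h' d' a'.
P = Tree("a", [Tree("b"), Tree("c", [Tree("e"), Tree("f")]), Tree("d")])
T = Tree("a'", [Tree("b'"),
                Tree("c'", [Tree("e'"), Tree("f'"), Tree("g'")]),
                Tree("d'", [Tree("h'")])])

best = solve(P, T, lambda x, y: 1.0)   # hypothetical uniform score
```

Cells such as (c, b') come out as −∞ without a separate height test, because the small table cannot place the two children of c into zero columns.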
Running time complexity
Suppose P has m nodes and T has n nodes.
There are nm cells in the large DP table.
In the worst case, for each cell we compute a small DP table with mn cells, resulting in O(n²m²) running time.
Is there a tighter bound?
Running time complexity
Suppose P has m nodes and T has n nodes.
Each node u in P appears in a small DP table only when its parent is compared to a node in T.
The parent of a node in P is compared at most n times ⟹ O(mn) over all nodes of P.
Symmetrically for a node v in T.
Overall: O(mn + mn + mn) = O(mn), where the three terms are the large DP table, the small DP tables counted per node of P, and the small DP tables counted per node of T.
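The same bound can be reached by counting small-table cells directly. As a sketch, assume the small DP table for a pair (u, v) has c(u) · c(v) cells, where c(x) denotes the number of children of x (notation introduced here, not taken from the slides):

```latex
\sum_{u \in P} \sum_{v \in T} c(u)\, c(v)
  = \Bigl( \sum_{u \in P} c(u) \Bigr) \Bigl( \sum_{v \in T} c(v) \Bigr)
  = (m - 1)(n - 1)
  = O(mn)
```

Every non-root node is the child of exactly one parent, so the child counts sum to m − 1 in P and n − 1 in T.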