Comparative RNA Structural Analysis

Overview: Comparative RNA Structural Analysis
Method 1: Align, then fold
Method 2: Fold, then compare

Comparative RNA Structural Analysis: Problem Definition
Input: a set of sequences with assumed structural similarities.
Output: an alignment and the common structural elements.

Possible approaches
Starting point: homologous RNA sequences (for example AUCCCCGUAUCGAUC and CUCGGCGUAUCGGUC).
1. Sequence alignment gives aligned sequences; folding the alignment gives aligned structures.
2. Folding each sequence gives homologous RNA secondary structures; structure alignment gives aligned structures.
3. Sankoff: simultaneous fold and alignment, going from the sequences to aligned structures in one step.

Align, then fold
First step: multiple alignment.
We want to use an algorithm we already know to fold the aligned sequences. How can we modify the Nussinov algorithm to fold multiple alignments?
[Figure: example multiple alignment of the input sequences]

Align, then fold
We need a new scoring function: scoring a base pair is different from scoring a pair of columns in our alignment.
Using the new scoring function, we can apply the Nussinov algorithm to the converted input (with slight changes).
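
As a rough illustration of this idea, here is a minimal Python sketch of a Nussinov-style recursion that runs over alignment columns instead of single bases. The scoring function `column_pair_score` (the fraction of sequences whose bases in the two columns can pair) and the `min_loop` parameter are assumptions made for the example; the slides do not fix a particular scoring function at this point.

```python
# Base pairs accepted by the hypothetical column score (Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column_pair_score(alignment, i, j):
    """Assumed score: fraction of sequences whose bases in columns i and j can pair."""
    hits = sum((row[i], row[j]) in PAIRS for row in alignment)
    return hits / len(alignment)


def nussinov_alignment(alignment, min_loop=3):
    """Maximum total column-pair score of a nested structure over the alignment."""
    n = len(alignment[0])
    best = [[0.0] * n for _ in range(n)]
    # Fill by increasing distance between the two columns; pairs need j - i > min_loop.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            value = max(best[i + 1][j], best[i][j - 1])  # column i or column j unpaired
            value = max(value, best[i + 1][j - 1] + column_pair_score(alignment, i, j))
            for k in range(i + 1, j):  # bifurcation into two independent parts
                value = max(value, best[i][k] + best[k + 1][j])
            best[i][j] = value
    return best[0][n - 1]


if __name__ == "__main__":
    toy = ["GGGAAACCC", "GCGAAACGC"]  # toy alignment, not taken from the slides
    print(nussinov_alignment(toy))
```

The only change from the single-sequence recursion is the pairing term: instead of adding 1 when two bases can pair, it adds a score computed from a pair of columns.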

Covariation
Columns that “change together” form a stem.
[Figure: example multiple alignment]

The Mixy algorithm
For each column $i$ in the alignment, define $f_i(x)$, $x \in \{A, U, C, G\}$, to be the frequency of $x$ in column $i$.
For each pair of columns $i$ and $j$, define $f_{i,j}(x, y)$, $x, y \in \{A, U, C, G\}$, to be the frequency of sequences that have $x$ in column $i$ and $y$ in column $j$.
If $x$ and $y$ are independent, $\frac{f_{i,j}(x,y)}{f_i(x) \, f_j(y)} \approx 1$.

Example alignment:
column   1 2 3 4 5 6 7 8 9 10
seq 1    A C G U G A A C G G
seq 2    A C C C U G G G G G
seq 3    A A U A G U U A U G

$f_2(C) = \frac{2}{3}$
$f_{2,9}(C,G) = \frac{2}{3}$
$f_{2,9}(A,G) = 0$ (no sequence has A in column 2 and G in column 9)

The Mixy algorithm
To measure the mutual information between columns $i$ and $j$, define:

$$H_{i,j} = \sum_{x,y} f_{i,j}(x,y) \log_2 \frac{f_{i,j}(x,y)}{f_i(x) \, f_j(y)}$$

Columns 2 and 9 of the three-sequence alignment:

$$H_{2,9} = f_{2,9}(C,G) \log_2 \frac{f_{2,9}(C,G)}{f_2(C) \, f_9(G)} + f_{2,9}(A,U) \log_2 \frac{f_{2,9}(A,U)}{f_2(A) \, f_9(U)}$$
$$= \frac{2}{3} \log_2 \frac{2/3}{(2/3)(2/3)} + \frac{1}{3} \log_2 \frac{1/3}{(1/3)(1/3)} = \frac{2}{3} \log_2 1.5 + \frac{1}{3} \log_2 3 \approx \frac{2}{3}(0.585) + \frac{1}{3}(1.585) \approx 0.918$$

Columns 1 and 10 of the same alignment (every sequence has A in column 1 and G in column 10):

$$H_{1,10} = f_{1,10}(A,G) \log_2 \frac{f_{1,10}(A,G)}{f_1(A) \, f_{10}(G)} = \frac{3}{3} \log_2 \frac{3/3}{(3/3)(3/3)} = 1 \cdot 0 = 0$$

Columns 3 and 7 after adding a fourth sequence:

column   1 2 3 4 5 6 7 8 9 10
seq 1    A C G U G A A C G G
seq 2    A C C C U G G G G G
seq 3    A A U A G U U A U G
seq 4    A A A A G U C U U G

The pairs (G,A), (C,G), (U,U) and (A,C) each occur with frequency $\frac{1}{4}$, so

$$H_{3,7} = 4 \cdot \frac{1}{4} \log_2 \frac{1/4}{(1/4)(1/4)} = 1 \cdot \log_2 4 = 2$$

In general, $0 \le H_{i,j} \le 2$.
A higher value means that columns $i$ and $j$ are correlated.
A lower value means that columns $i$ and $j$ are not correlated.
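
The three worked values can be checked with a short Python sketch. The function name `mutual_information` and the use of 0-based column indices are choices made for this example; the frequencies follow the definitions of $f_i$ and $f_{i,j}$ given in the slides.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """H_{i,j} for 0-based columns i and j of a gap-free alignment."""
    n = len(alignment)
    f_i = Counter(row[i] for row in alignment)             # single-column counts
    f_j = Counter(row[j] for row in alignment)
    f_ij = Counter((row[i], row[j]) for row in alignment)  # joint counts per sequence
    total = 0.0
    for (x, y), count in f_ij.items():
        joint = count / n
        total += joint * log2(joint / ((f_i[x] / n) * (f_j[y] / n)))
    return total


if __name__ == "__main__":
    three = ["ACGUGAACGG", "ACCCUGGGGG", "AAUAGUUAUG"]
    four = three + ["AAAAGUCUUG"]
    # Slide columns are 1-based, so columns (2, 9) become indices (1, 8), and so on.
    print(round(mutual_information(three, 1, 8), 3))
    print(mutual_information(three, 0, 9))
    print(mutual_information(four, 2, 6))
```

Pairs that never occur together contribute nothing to the sum, which is why the loop only visits the observed joint counts.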

Overview: Comparative RNA Structural Analysis
Method 1: Align, then fold
Method 2: Fold, then compare

Ordered rooted tree representation
Shapiro, 1988:
nodes - elements of secondary structure (hairpin loop, bulge, internal loop or multi-loop).
edges - base-paired (stem) regions.
Zhang, 1998:
nodes - unpaired bases (leaves) or paired bases (internal nodes). Each node is labeled with a base or a pair of bases.
edges - connect consecutive stem base pairs, or a leaf base with the last base pair in the corresponding stem.
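
A base-level tree of the second kind can be sketched in Python as follows. The dot-bracket input format, the dictionary layout of the nodes and the artificial `root` node are assumptions made for this example and are not part of the slides.

```python
def dot_bracket_to_tree(sequence, structure):
    """Build an ordered tree: base pairs are internal nodes, unpaired bases are leaves."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    root = {"label": "root", "children": []}  # artificial root holding the top level
    stack = [root]
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            # Open a base pair; its label is completed when the partner is found.
            node = {"label": None, "children": [], "open": pos}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif symbol == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: unexpected ')'")
            node = stack.pop()
            node["label"] = sequence[node.pop("open")] + sequence[pos]
        else:
            # Unpaired base: a leaf under the enclosing base pair.
            stack[-1]["children"].append({"label": sequence[pos], "children": []})
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")
    return root


if __name__ == "__main__":
    tree = dot_bracket_to_tree("GGAAACC", "((...))")
    outer = tree["children"][0]
    inner = outer["children"][0]
    print(outer["label"], inner["label"], [c["label"] for c in inner["children"]])
```

Children are appended in sequence order, so the resulting tree is ordered, which is what the comparison algorithms in the following slides rely on.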

Problem definition
The subtree isomorphism problem [Matula, 1968, 1978]: given a pattern tree P and a text tree T, find a subtree of T that is isomorphic to P. In other words, decide whether some subtree of T is identical in structure to P.
The subtree homeomorphism problem [Chung, 1987; Reyner, 1977; Pinter et al., 2004]: similar to the isomorphism problem, except that degree-2 nodes can be deleted from the text tree.

Subtree homeomorphism problem
Let $P$ and $T$ be two ordered, rooted trees, and let $t$ be a subtree of $T$ rooted at node $v \in T$.
A mapping $\alpha: P \to t$ is a one-to-one matching of the nodes of $P$ to nodes of $t$. The mapping must preserve the ancestor relations of the nodes and their relative order.
The subtree homeomorphism score of a mapping, denoted $S(\alpha, v)$, combines the following terms:
the node-to-node similarity score function, for $u \in P$, $v \in t$;
the edge-to-edge similarity score function, for $e_u \in P$, $e_v \in t$;
the penalty for deleting a degree-2 node from $T$;
the penalty for deleting any other node in $T$.

Subtree homeomorphism problem
Given $P$ and $T$, we want to find a subtree $t$ of $T$ such that the score $S(\alpha, v)$ is maximal.
How can we solve this problem efficiently? Dynamic programming!

Subtree homeomorphism problem
[Figure: isomorphism versus homeomorphism]

Rooted Ordered Subtree Isomorphism
Given trees $P$ and $T$ and a scoring table, compute the labeled ordered rooted subtree isomorphism of $P$ and $T$.
No deletions are allowed from $P$.
Only deletions of complete subtrees from $T$ are allowed, with penalty = 0.
[Figure: trees P and T and the scoring table (entries shown: 4, 1, 3)]

$P$ has the nodes b, e, f, c, d, a in post-order: the root a has children b, c, d, and c has children e, f.
$T$ has the nodes b', e', f', g', c', h', d', a' in post-order: the root a' has children b', c', d'; c' has children e', f', g'; d' has the child h'.

The large DP table has one row per node of $P$ and one column per node of $T$, both in post-order. Each cell $S(u,v)$ is defined by two cases:

$$S(u,v) = \begin{cases} \ldots & \text{if } u \text{ and } v \text{ are leaves} \\ \ldots & \text{otherwise} \end{cases}$$

Filling the table:
Row b: the first entries are 4, 1, 3.
Cell (c, b'): height(c) > height(b'), so $S(c, b') = -\infty$.
Cell (c, c'): a small DP table is built over the children of c (rows e, f) and the children of c' (columns e', f', g'), with an initial row of zeros. Its entries are 1, 4, $-\infty$, 4, 1, 1, and it gives $S(c, c') = 3 + 5 = 8$.
Cell (c, d'): a small DP table is built over rows e, f and the single column h'. Its entries are 4, $-\infty$, 1.
Cell (c, a'): depth(c) > depth(a').
Cell (a, c'): height(a) > height(c').
Cell (a, a'): a small DP table is built over the children of a (rows b, c, d) and the children of a' (columns b', c', d'):

      b'    c'    d'
b     4     4     3
c    -∞     8    -∞
d     3     3     4

It gives $S(a, a') = 4 + 16 = 20$.

Where is the solution?
In the completed large table, row c holds $-\infty$ and 8, and row a holds 20.
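
The walk-through can be sketched in Python as a large table filled by recursion with memoization, where each non-leaf cell runs a small ordered-matching DP over the children. The tree shapes follow the post-order listings of the example. The `NODE_SCORE` values are hypothetical: they were chosen so that the two worked results, $S(c, c') = 8$ and $S(a, a') = 20$, come out, and they are not the scoring table from the slides. The recurrence itself is one plausible reading of the example and should be treated as an assumption.

```python
from functools import lru_cache

NEG_INF = float("-inf")

# Children lists for the example trees P and T.
P = {"a": ["b", "c", "d"], "b": [], "c": ["e", "f"], "d": [], "e": [], "f": []}
T = {"a'": ["b'", "c'", "d'"], "b'": [], "c'": ["e'", "f'", "g'"],
     "d'": ["h'"], "e'": [], "f'": [], "g'": [], "h'": []}

# HYPOTHETICAL node-to-node scores; any pair not listed scores -infinity.
NODE_SCORE = {
    ("a", "a'"): 4, ("c", "c'"): 3,
    ("b", "b'"): 4, ("b", "c'"): 4, ("b", "d'"): 3,
    ("d", "b'"): 3, ("d", "c'"): 3, ("d", "d'"): 4,
    ("e", "e'"): 1, ("e", "f'"): 4,
    ("f", "e'"): 4, ("f", "f'"): 1, ("f", "g'"): 1,
}


def height(tree, node):
    kids = tree[node]
    return 0 if not kids else 1 + max(height(tree, k) for k in kids)


@lru_cache(maxsize=None)
def S(u, v):
    """Best score for mapping the subtree of P rooted at u onto node v of T."""
    if height(P, u) > height(T, v):
        return NEG_INF  # the pattern subtree is too tall to fit under v
    base = NODE_SCORE.get((u, v), NEG_INF)
    pu, tv = P[u], T[v]
    if not pu:
        return base  # u is a leaf: all subtrees below v are deleted at penalty 0
    # Small DP: D[i][j] = best ordered matching of the first i children of u
    # into the first j children of v (children of v may be skipped, children of u may not).
    D = [[NEG_INF] * (len(tv) + 1) for _ in range(len(pu) + 1)]
    D[0] = [0] * (len(tv) + 1)
    for i in range(1, len(pu) + 1):
        for j in range(1, len(tv) + 1):
            skip = D[i][j - 1]
            match = D[i - 1][j - 1] + S(pu[i - 1], tv[j - 1])
            D[i][j] = max(skip, match)
    return base + D[len(pu)][len(tv)]


if __name__ == "__main__":
    print(S("c", "b'"), S("c", "c'"), S("a", "a'"))
```

Under these assumed scores, the best matching of the children of c is e to f' and f to g' (4 + 1 = 5), and the best matching of the children of a is b to b', c to c' and d to d' (4 + 8 + 4 = 16).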

Running time complexity
Suppose $P$ has $m$ nodes and $T$ has $n$ nodes.
There are $nm$ cells in the large DP table.
In the worst case, for each cell we compute a small DP table with $mn$ cells, resulting in $O(n^2 m^2)$ running time.
Is there a tighter bound?

Running time complexity
Suppose $P$ has $m$ nodes and $T$ has $n$ nodes.
Each node $u$ in $P$ appears in a small DP table only when its father is compared to a node in $T$.
The father of a node in $P$ is compared at most $n$ times $\Rightarrow O(mn)$.
Symmetrically for a node $v$ in $T$.
Overall: $O(mn + mn + mn) = O(mn)$, where the three terms account for the large DP table, the small DP tables counted per node of $P$, and the small DP tables counted per node of $T$.
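
One way to check the tighter bound is to count the small-table cells directly. Assuming the small table for a pair $(u, v)$ has one row per child of $u$ and one column per child of $v$, the total over all pairs is $\sum_u \sum_v \deg(u) \deg(v) = (m-1)(n-1)$, since every non-root node is a child exactly once. The Python sketch below does the count for the example trees; the function name `small_dp_cells` is made up for this illustration.

```python
# Children lists for the example trees P (m = 6 nodes) and T (n = 8 nodes).
P = {"a": ["b", "c", "d"], "b": [], "c": ["e", "f"], "d": [], "e": [], "f": []}
T = {"a'": ["b'", "c'", "d'"], "b'": [], "c'": ["e'", "f'", "g'"],
     "d'": ["h'"], "e'": [], "f'": [], "g'": [], "h'": []}


def small_dp_cells(p_tree, t_tree):
    """Total number of small-table cells if every pair (u, v) built its table."""
    return sum(len(pu) * len(tv) for pu in p_tree.values() for tv in t_tree.values())


if __name__ == "__main__":
    m, n = len(P), len(T)
    cells = small_dp_cells(P, T)
    print(cells, (m - 1) * (n - 1), m * n)
```

This count ignores the initial row of each small table and any cells skipped by the height test, so it is an upper-level estimate rather than an exact operation count.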