Tree Edit Distance
Tree edit distance (TED): the minimum number of edits needed to transform one tree into another.
The edit operations
Delete a node
Relabel a node
Insert a node
[Figures: each operation illustrated on nodes v and w]
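The three operations can be sketched on a small ordered-tree type. This is only an illustration: the `Node` class and the helper names `relabel`, `delete_child`, `insert_child` and `to_tuple` are assumptions made for this example and do not come from the slides. Deleting a node splices its children into its parent in its place; inserting is the inverse.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical ordered, labeled tree node (not from the slides)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def relabel(node: Node, new_label: str) -> None:
    """Relabel a node: only the label changes, the shape stays the same."""
    node.label = new_label


def delete_child(parent: Node, i: int) -> None:
    """Delete parent's i-th child v; v's children take v's place, in order."""
    v = parent.children[i]
    parent.children[i:i + 1] = v.children


def insert_child(parent: Node, label: str, start: int, end: int) -> Node:
    """Insert a new node v under parent; v adopts parent.children[start:end]."""
    v = Node(label, parent.children[start:end])
    parent.children[start:end] = [v]
    return v


def to_tuple(node: Node):
    """Nested (label, [children]) view, convenient for comparing trees."""
    return (node.label, [to_tuple(c) for c in node.children])
```

With this encoding, deleting a node and then inserting a node with the same label over the same run of children restores the original tree.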
Existing Algorithms
Recursive Algorithm [SZ89]
Recurse on the rightmost roots: v is the rightmost root of F and w is the rightmost root of G.
d(F,G) = min of three cases: delete v; delete w; match v and w.
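A minimal sketch of this rightmost-root recursion follows. Several things here are assumptions of the example rather than content of the slides: unit cost for every delete, insert and relabel; the encoding of a tree as `(label, children)` with `children` a tuple of trees; and the names `size` and `forest_dist`. Memoization via `functools.lru_cache` stands in for the dynamic program.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), children is a tuple of trees,
# and a forest is a tuple of trees. Tuples are hashable, so they can be memoized.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(F, G):
    """Unit-cost edit distance between forests F and G (rightmost-root recursion)."""
    if not F or not G:
        # One side is empty: delete (or insert) every remaining node.
        return size(F) + size(G)
    (v_label, v_kids), (w_label, w_kids) = F[-1], G[-1]
    # Delete v: its children become the rightmost trees of F.
    delete_v = forest_dist(F[:-1] + v_kids, G) + 1
    # Delete w: symmetric, on G.
    delete_w = forest_dist(F, G[:-1] + w_kids) + 1
    # Match v with w: solve the children and the remaining forests separately.
    match = (forest_dist(v_kids, w_kids)
             + forest_dist(F[:-1], G[:-1])
             + (v_label != w_label))
    return min(delete_v, delete_w, match)
```

For example, with `F = (("a", (("b", (("c", ()),)),)),)` and `G = (("a", (("c", ()),)),)`, the only edit needed is deleting `b`, so `forest_dist(F, G)` evaluates to 1.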
Time Complexity [SZ89]
A subproblem is relevant if it shows up while computing d(F,G).
Time complexity = #relevant subproblems = $O(n^2 m^2) = O(n^4)$.
A tighter count gives $O(nm \cdot \min\{\mathrm{Depth}(F), \mathrm{Leaves}(F)\} \cdot \min\{\mathrm{Depth}(G), \mathrm{Leaves}(G)\})$.
[Figure: relevant subforests of F and G, with rightmost roots v and w]
Klein98
Same as the previous algorithm, but recurses on a light child in F.
#relevant subproblems = (#relevant subforests of F) $\cdot\, m^2 = O(n \log n \cdot m^2) = O(n^3 \log n)$.
The $O(n \log n)$ count of relevant subforests of F follows from heavy path decomposition [HT84].
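To make the heavy path idea concrete, here is a small sketch that splits a tree into heavy paths: from each node the path continues into the child with the largest subtree (the heavy child), and every other (light) child starts a new path. The `(label, children)` tuple encoding and the names `subtree_size` and `heavy_paths` are assumptions of this example. Sizes are recomputed on every call for brevity, which is not how an efficient implementation would do it.

```python
def subtree_size(tree):
    """Number of nodes in a tree encoded as (label, children)."""
    _, children = tree
    return 1 + sum(subtree_size(c) for c in children)


def heavy_paths(tree):
    """Return the heavy paths of `tree` as lists of labels, root path first."""
    paths = []
    stack = [tree]  # roots of paths that still have to be walked
    while stack:
        node = stack.pop()
        path = []
        while True:
            label, children = node
            path.append(label)
            if not children:
                break
            # Heavy child: the child with the largest subtree (first one on ties).
            heavy_i = max(range(len(children)),
                          key=lambda i: subtree_size(children[i]))
            # Every light child starts its own heavy path later.
            stack.extend(c for i, c in enumerate(children) if i != heavy_i)
            node = children[heavy_i]
        paths.append(path)
    return paths
```

Stepping into a light child at least halves the subtree size, so any root-to-leaf walk crosses at most $\log_2 n$ light edges; this is the property the $O(n \log n)$ count relies on.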
Decomposition strategy [DT03]
For every two subforests (F,G), a strategy says right or left.
Zhang & Shasha’s strategy = right always.
Klein’s strategy = right iff the rightmost tree in F is smaller than the leftmost tree in F.
Lower bound for strategy algorithms = $\Omega(nm \log n \log m)$.
Any strategy algorithm computes the edit distance between every two subtrees of F and G (without their roots).
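The role of a strategy can be sketched by parameterizing the recursion with a function that answers "left" or "right" for each pair of subforests. Everything named here (`make_distance`, `always_right`, `klein_like`, the tuple encoding, unit costs) is an assumption of this example; `klein_like` simply follows the rule as worded above. The returned memo table holds exactly the subproblems the strategy visited, so its size can be compared across strategies.

```python
def size(forest):
    """Number of nodes in a forest of (label, children) trees."""
    return sum(1 + size(children) for _, children in forest)


def make_distance(strategy):
    """Build a unit-cost forest distance driven by strategy(F, G) -> 'left'/'right'."""
    memo = {}

    def d(F, G):
        if not F or not G:
            return size(F) + size(G)
        key = (F, G)
        if key in memo:
            return memo[key]
        if strategy(F, G) == "left":
            (v_label, v_kids), (w_label, w_kids) = F[0], G[0]
            F_rest, G_rest = F[1:], G[1:]
            delete_v = d(v_kids + F_rest, G) + 1
            delete_w = d(F, w_kids + G_rest) + 1
        else:
            (v_label, v_kids), (w_label, w_kids) = F[-1], G[-1]
            F_rest, G_rest = F[:-1], G[:-1]
            delete_v = d(F_rest + v_kids, G) + 1
            delete_w = d(F, G_rest + w_kids) + 1
        match = d(v_kids, w_kids) + d(F_rest, G_rest) + (v_label != w_label)
        memo[key] = min(delete_v, delete_w, match)
        return memo[key]

    return d, memo


def always_right(F, G):
    """Zhang & Shasha's strategy: always recurse on the rightmost roots."""
    return "right"


def klein_like(F, G):
    """Right iff the rightmost tree of F is smaller than the leftmost tree of F."""
    return "right" if size(F[-1:]) < size(F[:1]) else "left"
```

Both strategies return the same distance; only the set of subproblems stored in `memo` differs, which is what the bounds on strategy algorithms count.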
Our Results
An $O(m^2 n (1 + \log\frac{n}{m})) = O(n^3)$ time, $O(nm)$ space algorithm. (Today: $O((nm)^{3/2}) = O(n^3)$ time and space.) [DMRW ICALP07]
A strategy algorithm symmetrically dependent on the two input trees.
A matching lower bound for all strategy algorithms. (Today: a lower bound of $\Omega(nm^2)$.)
Local edit distance and affine gap penalties at the cost of one execution. (Today: local RNA edit distance.) [BHLW CPM06]
Our Algorithm
Our algorithm to compute d(F,G):
1. If F<G, compute d(G,F).
2. Recursively run d(K_i, G) for every K_i.
3. Run Klein’s strategy where the “master” is F (no need to recurse).
[Figure: F with K_1, ..., K_5 marked; G]
Analysis
For the algorithm above, R(F, G) = ?
An $O((nm)^{3/2}) = O(n^3)$ Upper Bound
We show that R(F,G) is $O((nm)^{3/2})$.
Proof by induction: the bound on R(F,G) follows by the inductive assumption, then by (*) and (**), and finally because we know G<F.
An $O(m^2 n (1 + \log\frac{n}{m}))$ Bound
Proof idea: at most $\log(n/m)$ nested recursive calls where F is “master” before all trees are of size ≤ m.
For all trees of size ≤ m, use the previous $O(m^3)$ bound.
At most n/m such trees, so total = $(n/m) \cdot O(m^3) = O(nm^2)$.
[Figure: F with K_1, ..., K_5 marked; G]
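The arithmetic behind the last two steps can be written out as a quick check. It assumes that the "previous" bound is the $O((nm)^{3/2})$ upper bound applied to two trees that both have size at most m; that reading is an interpretation, not something the slide states explicitly.

```latex
\[
  O\big((m \cdot m)^{3/2}\big) = O(m^3),
  \qquad
  \frac{n}{m} \cdot O(m^3) = O(n m^2).
\]
```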
A Matching Lower Bound for all decomposition strategy algorithms
An $\Omega(nm^2)$ lower bound. [Figure: trees F and G]
Consider this computational path: if the strategy says left, delete from F; otherwise delete from G.
For every two internal nodes v in F and w in G we get $\min\{|F_v|, |G_w|\}$ new subproblems ($F_v$ is the tree rooted at v).
Summing over all such v, w: $\sum_{v,w} \min\{|F_v|, |G_w|\} = \Omega(nm^2)$.
The matching lower bound: a careful counting argument on [Figure: trees F and G].
Thank you!