1
Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge
Z.S. Peng, H.F. Ting
2
The Forest Edit Distance
3
Edit distance of two ordered, labeled forests
Edit operations between E and F: relabeling node i in E with the label of node j in F.
[Figure: forests E and F with numbered, labeled nodes]
4
Edit distance of two ordered, labeled forests
Edit operations between E and F: relabeling node i in E with the label of node j in F. Example: Relabel (3,5).
[Figure: forests E and F with numbered, labeled nodes]
5
Edit distance of two ordered, labeled forests
Edit operations between E and F: relabeling node i in E with the label of node j in F. Cost of the operation: (3,5)
[Figure: forests E and F with numbered, labeled nodes]
6
Edit distance of two ordered, labeled forests
Edit operations between E and F: delete node i from E.
[Figure: forests E and F with numbered, labeled nodes]
7
Edit distance of two ordered, labeled forests
Edit operations between E and F: delete node i from E. Example: Delete (2,-).
[Figure: forests E and F with numbered, labeled nodes]
8
Edit distance of two ordered, labeled forests
Edit operations between E and F: delete node i from E. Example: Delete (2,-).
[Figure: forests E and F, with node 2 removed from E]
9
Edit distance of two ordered, labeled forests
Edit operations between E and F: delete node i from E. Cost of the operation: (2,-)
[Figure: forests E and F, with node 2 removed from E]
10
Edit distance of two ordered, labeled forests
Edit operations between E and F: delete node j from F. Cost of the operation: (-,j)
[Figure: forests E and F with numbered, labeled nodes]
11
Edit distance of two ordered, labeled forests
The edit distance (E,F) between E and F is the minimum cost of edit operations that transform E to E' and F to F' such that E' = F'.
[Figure: forests E and F, and the forests obtained from them]
12
Edit distance of two ordered, labeled forests
The edit distance (E,F) between E and F is the minimum cost of edit operations that transform E to E' and F to F' such that E' = F'.
[Figure: forests E and F, and the forests obtained from them]
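The definition of the (unguided) edit distance can be made concrete with a short Python sketch. It is an illustration only, not the algorithm from these slides: it uses the textbook recursion on the rightmost roots of the two forests, and it assumes unit costs (1 per deletion, 1 per relabeling to a different label). The nested-tuple representation and the names `size` and `forest_edit_distance` are hypothetical.

```python
from functools import lru_cache

# Assumed representation: a tree is (label, children), where children is a
# tuple of trees; a forest is a tuple of trees ordered from left to right.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_edit_distance(E, F):
    """Unit-cost edit distance between two ordered, labeled forests."""
    if not E and not F:
        return 0
    if not E:
        return size(F)  # delete every remaining node of F
    if not F:
        return size(E)  # delete every remaining node of E
    (a, e_kids), (b, f_kids) = E[-1], F[-1]
    # Delete the rightmost root of E; its children take its place.
    delete_e = forest_edit_distance(E[:-1] + e_kids, F) + 1
    # Delete the rightmost root of F; its children take its place.
    delete_f = forest_edit_distance(E, F[:-1] + f_kids) + 1
    # Pair the two rightmost roots, relabeling if their labels differ.
    match = (forest_edit_distance(E[:-1], F[:-1])
             + forest_edit_distance(e_kids, f_kids)
             + (0 if a == b else 1))
    return min(delete_e, delete_f, match)
```

For example, with `E = (("a", (("b", ()), ("c", ()))),)` and `F = (("a", (("b", ()),)),)`, a single deletion of the node labeled `c` makes the two forests equal. The memoized recursion is exponential in the worst case; it is meant to show the definition, not the running times discussed later.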
13
Edit distance of two ordered, labeled forests
The guided edit distance (E,F,G) between E and F with respect to a third forest G is the minimum cost of edit operations that transform E to E' and F to F' such that E' = F' and E' includes G as a subforest.
[Figure: forests E and F, the guiding forest G, and the forests obtained from E and F]
14
Application 1: RNA comparisons
Cherry small circular viroid-like RNA (GI:2347024) between base 287 and base 337. The hammerhead motif of the RNA is printed in bold.
15
Application 2: Comparing XML documents
XML documents with the same Document Type Definition (DTD) should be aligned with respect to this DTD to get more accurate results.
16
The algorithms
(E,F)
Tai 1979:
Zhang and Shasha 1989: where
Klein 1998:
(E,F,G)
This paper:
17
Special Cases
[Figure: example with nodes labeled a, b, c and f]
18
[Figure: example with nodes labeled a, b, c and f]
Constrained Longest Common Subsequence
Constrained Sequence Alignment
19
The algorithms
Constrained Longest Common Subsequence
Tsai 2003:
Constrained Sequence Alignment
Chin et al.:
This paper: where
Since G has one leaf, the time becomes
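To make the sequence special case concrete, here is a minimal Python sketch of the constrained longest common subsequence problem: find the length of a longest common subsequence of `a` and `b` that also contains `p` as a subsequence. It uses the standard cubic-time dynamic program for this problem, not the guided forest algorithm from these slides. The function name `constrained_lcs_length` and the convention of returning `None` when no valid subsequence exists are assumptions made for illustration.

```python
def constrained_lcs_length(a, b, p):
    """Length of a longest common subsequence of a and b that contains p
    as a subsequence, or None if no such common subsequence exists."""
    NEG = float("-inf")  # marks "no valid subsequence"
    n, m, r = len(a), len(b), len(p)
    # L[i][j][k]: best length for a[:i], b[:j] with p[:k] already embedded.
    L = [[[NEG] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(r + 1):
                if i == 0 or j == 0:
                    # Empty prefix: only the empty constraint is satisfiable.
                    L[i][j][k] = 0 if k == 0 else NEG
                    continue
                # Skip the last symbol of a or of b.
                best = max(L[i - 1][j][k], L[i][j - 1][k])
                if a[i - 1] == b[j - 1]:
                    # Use the common symbol without advancing the constraint.
                    best = max(best, L[i - 1][j - 1][k] + 1)
                    if k > 0 and a[i - 1] == p[k - 1]:
                        # Use the common symbol to match p[k-1].
                        best = max(best, L[i - 1][j - 1][k - 1] + 1)
                L[i][j][k] = best
    result = L[n][m][r]
    return None if result == NEG else result
```

With `a = "xaby"` and `b = "abxy"`, the unconstrained answer (`p = ""`) is 3 (`aby`), while requiring `p = "x"` lowers it to 2 (`xy`), which shows how the constraint changes the optimum.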
20
Our algorithm for computing (E,F,G) Dynamic Programming
21
The sub-problems
Post-order numbering (naming) of the nodes
[Figure: a forest whose nodes are numbered 1 to 23 in post-order]
22
The sub-problems
: A "consecutive" sub-forest
[Figure: a forest whose nodes are numbered 1 to 23 in post-order]
23
The sub-problems
: A "consecutive" sub-forest
[Figure: a forest whose nodes are numbered 1 to 23 in post-order]
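The post-order numbering used to name the nodes can be sketched in Python as follows. The nested-tuple representation `(label, children)` and the function name `postorder_labels` are assumptions made for illustration and do not come from the slides.

```python
def postorder_labels(forest):
    """Return the node labels of an ordered forest in post-order.

    A tree is (label, children) with children a tuple of trees; a forest is
    a tuple of trees. The node numbered i (1-based) is at index i - 1.
    """
    order = []

    def visit(tree):
        label, children = tree
        for child in children:  # children first, from left to right
            visit(child)
        order.append(label)     # then the node itself

    for tree in forest:         # trees of the forest from left to right
        visit(tree)
    return order
```

In this numbering every node receives a larger number than all of its descendants and than every node to its left, so a range of consecutive numbers picks out a contiguous group of nodes in the drawing.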
24
The sub-problems
[Figure: forests E, F and G with post-order numbered nodes]
25
The sub-problems
[Figure: forests E, F and G with post-order numbered nodes]
26
is equal to the minimum of the following: 1. 2. 3. 4. 5.
27
1.
[Figure: forests E, F and G with post-order numbered nodes]
28
[Figure: forests E, F and G with post-order numbered nodes]
29
2.
[Figure: forests E, F and G with post-order numbered nodes]
30
3.
[Figure: forests E, F and G with post-order numbered nodes]
31
[Figure: forests E, F and G with post-order numbered nodes]
32
4.
[Figure: forests E, F and G with post-order numbered nodes]
33
[Figure: forests E, F and G with post-order numbered nodes]
34
5.
[Figure: forests E, F and G with post-order numbered nodes]
35
[Figure: forests E, F and G with post-order numbered nodes]
36
[Figure: forests E, F and G with post-order numbered nodes]
37
The order for solving the sub-problems
for i = 1 to |E|
  for j = 1 to |F|
    for h = 1 to |G|
      for k = 1 to (|G| - h + 1)
        if k is a leaf then find
38
The time complexity
39
Sparsify the dynamic program using a clever trick of Zhang and Shasha
40
key-root: a node is a key-root if it is the root or has a left sibling
[Figure: forests E, F and G with post-order numbered nodes]
41
[Figure: forests E, F and G with post-order numbered nodes]
No. of key-roots ≤ no. of leaves
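The key-root definition and the bound on their number can be checked with a small Python sketch. It assumes the same nested-tuple forest representation `(label, children)` as a convenience, and it treats the roots of a forest as siblings, so every root except the leftmost counts as having a left sibling. Both choices, and the name `key_roots_and_leaves`, are assumptions for illustration.

```python
def key_roots_and_leaves(forest):
    """Return (key_roots, leaves) as lists of post-order numbers (1-based).

    A node is a key-root if it is the (leftmost) root or has a left sibling.
    """
    key_roots, leaves = [], []
    counter = 0

    def visit(tree, is_first_root, has_left_sibling):
        nonlocal counter
        _, children = tree
        for idx, child in enumerate(children):
            visit(child, False, idx > 0)
        counter += 1  # post-order number of this node
        if not children:
            leaves.append(counter)
        if is_first_root or has_left_sibling:
            key_roots.append(counter)

    for idx, tree in enumerate(forest):
        visit(tree, idx == 0, idx > 0)
    return key_roots, leaves
```

For the tree `a(b, c(d))` the post-order numbers are b=1, d=2, c=3, a=4; the key-roots are nodes 3 and 4 and the leaves are nodes 1 and 2, consistent with the bound stated on the slide.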
42
To compute (E,F,G) = (E|| 1..|E|, F|| 1..|F|, G|| 1..|G|)
for i = 1 to |E|
  for j = 1 to |F|
    for h = 1 to |G|
      for k = 1 to (|G| - h + 1)
        if k is a leaf find
43
To compute (E,F,G) = (E|| 1..|E|, F|| 1..|F|, G|| 1..|G|)
for i = 1 to |E|
  for j = 1 to |F|
    for h = 1 to |G|
      for k = 1 to (|G| - h + 1)
        if k is a leaf and i and j are key-roots find
44
The new running time
45
Thank you