Taku Aratsu1, Kouichi Hirata1 and Tetsuji Kuboyama2

1 Approximating Tree Edit Distance through String Edit Distance for Binary Tree Codes
Taku Aratsu (1), Kouichi Hirata (1) and Tetsuji Kuboyama (2)
(1) Department of Artificial Intelligence, Kyushu Institute of Technology
(2) Computer Center, Gakushuin University

2 Outline of Talk
Tree edit distance and string edit distance
Binary tree code
Lower and upper bounds of tree edit distance through the string edit distance for binary tree codes

3 String Edit Distance (cf. [R.A.Wagner et al. 1974])
Edit operations: deletion, insertion, substitution
String edit distance s(s1,s2) between two strings s1 and s2: the minimum number of edit operations needed to transform s1 into s2
s(s1,s2) is computed in O(n^2) time, where n is the maximum length of the two strings
[Figure: a string s1 transformed step by step into s2 = CGATCCTC by deletion, insertion, and substitution]
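As a minimal sketch of the O(n^2) computation mentioned on this slide, the following Python function fills the classic dynamic-programming table row by row with unit costs for deletion, insertion, and substitution. The function name `edit_distance` is chosen here for illustration and does not come from the talk.

```python
def edit_distance(s1, s2):
    """Unit-cost string edit distance between two sequences (sketch)."""
    n, m = len(s1), len(s2)
    # prev[j] is the distance between the first i-1 symbols of s1
    # and the first j symbols of s2.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            substitution = prev[j - 1] + (s1[i - 1] != s2[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[m]


print(edit_distance("GCGTCGT", "CGATCCTC"))
```

For the strings GCGTCGT and CGATCCTC, which appear in the alignment slide of this talk, the function returns 4, matching the alignment cost stated there. Only two rows of the table are kept, so the memory use is linear in the length of s2.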

4 Tree Edit Distance [K.-C.Tai 1979]
The most famous similarity measure for trees
Edit operations: deletion, insertion, substitution
Tree edit distance t(T1,T2) between two trees T1 and T2: the minimum number of edit operations needed to transform T1 into T2
[Figure: example trees T1 and T2 illustrating deletion, insertion, and substitution of nodes]

5 Time Complexity of Tree Edit Distance for Ordered Trees
Algorithms for computing the tree edit distance for ordered trees have been continuously improved:
O(n^6) [K.-C.Tai 1979]
O(n^4) [K.Zhang et al. 1989]
O(n^3 log n) [P.N.Klein 1998]
O(n^3) [E.D.Demaine et al. 2007]
n is the maximum number of nodes of the trees
Even at O(n^3), computing the tree edit distance is too expensive for large-scale data
Approach: approximate the tree edit distance (O(n^3)) through the string edit distance (O(n^2))

6 String Edit Distance between Euler Strings of Trees [Akutsu 2006]
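Approximating the tree edit distance through the string edit distance between the Euler strings of two trees
Euler string s(T) of a tree T
String edit distance s(s(T1),s(T2)) between the Euler strings of T1 and T2
[Figure: trees T1 and T2 with node labels R, B, C, E, D and their Euler strings, with upward traversals marked]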
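The slide does not spell out how an Euler string is built, so the following Python sketch rests on an assumption: a depth-first traversal emits the child's label when an edge is walked downward and a marked copy of that label when the same edge is walked upward. The "~" marker, the function name `euler_string`, and the (label, children) tree encoding are all illustrative choices, not notation from the talk.

```python
def euler_string(tree):
    """Euler string of tree = (label, [children]) as a list of symbols.

    Assumed construction: one symbol per downward edge traversal and one
    marked symbol ("~" prefix) per upward traversal; the root label itself
    does not appear.
    """
    _, children = tree
    out = []
    for child in children:
        out.append(child[0])             # walking down to the child
        out.extend(euler_string(child))  # traversing the child's subtree
        out.append("~" + child[0])       # walking back up from the child
    return out


# Hypothetical example tree: r has children a and b, and b has child c.
example = ("r", [("a", []), ("b", [("c", [])])])
print(" ".join(euler_string(example)))
```

Under this construction a tree with n nodes yields a string of length 2(n - 1), since each edge is traversed once in each direction.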

7 String Edit Distance between Euler Strings of Trees [Akutsu 2006]
Approximating the tree edit distance through the string edit distance between the Euler strings of two trees
Euler string s(T) of a tree T
t is the tree edit distance
s is the string edit distance
h is the minimum height of the two trees
[Figure: a tree T with node labels R, B, C, E, D and its Euler string, with upward traversals marked]

8 String Edit Distance Between Binary Tree Code of Trees
Approximating the tree edit distance through the string edit distance for binary tree codes
t is the tree edit distance
s is the string edit distance
h is the minimum height of the two trees
[Figure: comparison between the binary tree code and the Euler string]

9 Binary Tree Representation (cf. [D.E.Knuth 1968])
Binary tree representation b(T) of a tree T with root r
For every node v ∈ T - {r}:
The first child of v in T is the left child of v in b(T); the dummy node ⊥ is the left child of v in b(T) if v has no child
The next sibling of v in T is the right child of v in b(T); the dummy node ⊤ is the right child of v in b(T) if v has no next sibling
If v is the root r of T, then r is also the root of b(T) and has just a left child
[Figure: a tree T with nodes r, a, b, c, d, e, f, g and its binary tree representation b(T), with the dummy nodes ⊥ and ⊤]

10 Binary Tree Code
Binary tree code bc(T) of a tree T
bc(T) is the preorder traversal of b(T)
bc(T) can be constructed from a tree T in O(|T|) time
T can be reconstructed from bc(T) in O(|T|) time
|b(T)| = |bc(T)| = 2|T|
Tree edit distance t(T1,T2) = 0 iff string edit distance s(bc(T1),bc(T2)) = 0
[Figure: the tree T with nodes r, a, b, c, d, e, f, g and its binary tree representation b(T)]
bc(T) = r a d ⊥ e ⊥ ⊤ b f ⊥ g ⊥ ⊤ c ⊥ ⊤
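The two linear-time conversions claimed on this slide can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' implementation: trees are encoded as hypothetical (label, children) pairs, the names `binary_tree_code` and `decode_tree` are invented here, and node labels are assumed never to equal the dummy symbols. The encoder never builds b(T) explicitly; it emits a node, then the code of its children (or ⊥), and closes each sibling list with ⊤, which is the preorder of b(T).

```python
BOT, TOP = "⊥", "⊤"  # dummy symbols: no first child / no next sibling


def _code_forest(forest):
    # Code of a non-empty sibling list, closed by the TOP of its last node.
    out = []
    for label, children in forest:
        out.append(label)
        out.extend(_code_forest(children) if children else [BOT])
    out.append(TOP)
    return out


def binary_tree_code(tree):
    """bc(T) as a list of symbols; the root has only a left side."""
    label, children = tree
    return [label] + (_code_forest(children) if children else [BOT])


def decode_tree(code):
    """Rebuild (label, [children]) from a well-formed bc(T)."""
    pos = 1  # code[0] is the root label

    def forest():
        nonlocal pos
        nodes = []
        while True:
            label = code[pos]
            pos += 1
            if code[pos] == BOT:      # no first child
                pos += 1
                children = []
            else:
                children = forest()
            nodes.append((label, children))
            if code[pos] == TOP:      # no next sibling: list is complete
                pos += 1
                return nodes

    return (code[0], [] if code[1] == BOT else forest())


# The tree obtained by decoding the slide's bc(T): r(a(d, e), b(f, g), c).
T = ("r", [("a", [("d", []), ("e", [])]),
           ("b", [("f", []), ("g", [])]),
           ("c", [])])
print(" ".join(binary_tree_code(T)))
```

For this tree the encoder reproduces the 16 symbols shown on the slide, in line with |bc(T)| = 2|T| for the 8 nodes of T. The recursion depth grows with the height of the tree, so very deep trees would need an iterative variant.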

11 String Edit Distance Between Binary Tree Code of Trees
Approximating the tree edit distance through the string edit distance for binary tree codes
t is the tree edit distance
s is the string edit distance
h is the minimum height of the two trees
[Figure: comparison between the binary tree code and the Euler string]

12 Lower bound of Tree Edit Distance
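s(bc(T1),bc(T2)) changes by at most 2 when one edit operation is applied
Substitution: replacing the label v of a node in T1 by w in T2 replaces the single symbol v in bc(T1) by w in bc(T2)
[Figure: trees T1 and T2 differing in one node label, v in T1 and w in T2]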

13 Lower bound of Tree Edit Distance
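s(bc(T1),bc(T2)) changes by at most 2 when one edit operation is applied
Deletion (insertion): a node v with parent p and children v1, ..., vn is deleted from T1 to obtain T2
bc(T1) = s1 p s2 v s3 ⊤ s4
bc(T2) = s1 p s2 s3 s4 (the symbols v and ⊤ are removed)
Since each of the t(T1,T2) edit operations changes the code by at most 2, s(bc(T1),bc(T2)) ≤ 2 t(T1,T2)
[Figure: trees T1 and T2 and their binary tree representations b(T1) and b(T2) before and after the deletion of v]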
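The deletion case can be checked on a small instance. The sketch below is self-contained and uses illustrative names (`bc`, `edit_distance`) and a hypothetical (label, children) tree encoding; T2 is obtained from T1 by deleting the single node b, so their tree edit distance is taken to be 1.

```python
def _code_forest(forest):
    # Code of a non-empty sibling list: each node, then its children's code
    # (or ⊥), with ⊤ closing the list after the last sibling.
    out = []
    for label, children in forest:
        out.append(label)
        out.extend(_code_forest(children) if children else ["⊥"])
    out.append("⊤")
    return out


def bc(tree):
    label, children = tree
    return [label] + (_code_forest(children) if children else ["⊥"])


def edit_distance(s1, s2):
    # Unit-cost dynamic program over two symbol sequences.
    prev = list(range(len(s2) + 1))
    for i in range(1, len(s1) + 1):
        curr = [i] + [0] * len(s2)
        for j in range(1, len(s2) + 1):
            curr[j] = min(prev[j - 1] + (s1[i - 1] != s2[j - 1]),
                          prev[j] + 1,
                          curr[j - 1] + 1)
        prev = curr
    return prev[-1]


T1 = ("r", [("a", [("d", []), ("e", [])]),
            ("b", [("f", []), ("g", [])]),
            ("c", [])])
# T2: node b deleted, so its children f and g become children of r.
T2 = ("r", [("a", [("d", []), ("e", [])]),
            ("f", []), ("g", []), ("c", [])])

t = 1  # one deletion turns T1 into T2, and the trees differ
s = edit_distance(bc(T1), bc(T2))
print(s, s <= 2 * t)
```

Here bc(T2) is bc(T1) with the symbol b and the ⊤ that closed b's child list removed, which is the pattern s1 p s2 v s3 ⊤ s4 versus s1 p s2 s3 s4 from the slide, so the string distance is exactly 2.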

14 String Edit Distance Between Binary Tree Code of Trees
Approximating the tree edit distance through the string edit distance for binary tree codes
t is the tree edit distance
s is the string edit distance
h is the minimum height of the two trees
[Figure: comparison between the binary tree code and the Euler string]

15 Alignment (cf. [R.A.Wagner et al. 1974])
An alignment between two strings s1 and s2 is obtained by inserting the gap symbol '-' into them
The resulting strings s1' and s2' are of the same length
The cost of an alignment is the sum over all positions i of: 0 if s1'[i] = s2'[i] ≠ '-', and 1 otherwise
An optimal alignment is an alignment with the minimum cost
The cost of an optimal alignment is equal to the string edit distance
Example:
s1 = G C G T C G T
s2 = C G A T C C T C
After inserting gaps:
s1' = G C G - T C G T -
s2' = - C G A T C C T C
The cost of this alignment is 4
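The alignment cost defined on this slide can be sketched in a few lines of Python. The function name `alignment_cost` is an illustrative choice; the rule implemented is the one stated above, where a column costs 0 only if both symbols are equal and neither is a gap.

```python
GAP = "-"


def alignment_cost(a1, a2):
    """Cost of an alignment given as two gapped strings of equal length."""
    if len(a1) != len(a2):
        raise ValueError("aligned strings must have the same length")
    # A column is free only when both symbols match and are not gaps.
    return sum(0 if (x == y and x != GAP) else 1 for x, y in zip(a1, a2))


print(alignment_cost("GCG-TCGT-", "-CGATCCTC"))
```

Applied to the aligned strings from the slide, the four costly columns are G/-, -/A, G/C, and -/C, giving the stated cost of 4.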

16 Ordered Edit Distance Mapping [K.-C.Tai 1979]
Ordered edit distance mapping M from T1 to T2 (mapping, for short): a set of pairs (v, w) with v ∈ T1 and w ∈ T2
For every pair (v1,w1),(v2,w2) ∈ M:
v1 = v2 iff w1 = w2
v1 is an ancestor of v2 iff w1 is an ancestor of w2
v1 is to the left of v2 iff w1 is to the left of w2
id(M): the number of pairs in M with identical labels
The cost of a mapping M is |T1|+|T2|-|M|-id(M); a mapping M minimizing this cost realizes the tree edit distance
[Figure: example trees T1 and T2 with a mapping between their nodes]
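The expression |T1|+|T2|-|M|-id(M) counts the |T1|-|M| unmapped nodes of T1 (deletions), the |T2|-|M| unmapped nodes of T2 (insertions), and the |M|-id(M) mapped pairs whose labels differ (substitutions). A minimal Python sketch of this count follows; the function name `mapping_cost`, the dictionaries from node identifiers to labels, and the example trees are hypothetical, and the function does not check the ancestor and left-to-right conditions of a valid mapping.

```python
def mapping_cost(labels1, labels2, mapping):
    """Cost |T1| + |T2| - |M| - id(M) of a mapping (validity not checked).

    labels1, labels2: dicts from node identifiers to labels for T1 and T2.
    mapping: set of (v, w) pairs with v a node of T1 and w a node of T2.
    """
    identical = sum(1 for v, w in mapping if labels1[v] == labels2[w])
    return len(labels1) + len(labels2) - len(mapping) - identical


# Hypothetical example: T1 has nodes labeled a, b, c and T2 has a, d.
labels1 = {1: "a", 2: "b", 3: "c"}
labels2 = {1: "a", 2: "d"}
M = {(1, 1), (2, 2)}
print(mapping_cost(labels1, labels2, M))
```

In this example node 3 of T1 is unmapped (one deletion) and the pair (2, 2) maps b to d (one substitution), so the cost is 3 + 2 - 2 - 1 = 2.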

17 Bottom-up Mapping [G.Valiente 2001]
A bottom-up mapping is a restricted mapping
A bottom-up mapping is a mapping that forms a common complete subforest between two trees if labels are ignored
[Figure: example trees T1 and T2 with a bottom-up mapping between them]

18 Upper Bound of Tree Edit Distance
An alignment between bc(T1)' and bc(T2)' is obtained from bc(T1) and bc(T2)
MSP is the set of maximal substring pairs {(p11,p21),…,(p1d,p2d)}
MSSP is the set of maximal subtree string pairs in MSP {(t11,t21),…,(t1b,t2b)}
A bottom-up mapping is constructed from the nodes in t1i and those in t2i, excluding ⊥ and ⊤
bc(T1)' = - a b c ⊥ d ⊥ ⊤ b c ⊥ d ⊥ ⊤ e ⊥ ⊤
bc(T2)' = a a b c ⊥ d ⊥ ⊤ ⊤ c ⊥ d ⊥ - f ⊥ ⊤
[Figure: trees T1 and T2 with the substrings p11, p12, p13 and p21, p22, p23 and the subtree strings t11, t12, t13 and t21, t22, t23 marked on the alignment]

19 Upper Bound of Tree Edit Distance
M' is the mapping corresponding to the elements of MSSP
s is the string edit distance
h is the minimum height of the two trees
REST(pij): the total number of positions in the substring pij that do not appear in MSSP
For every pij, REST(pij) ≤ h
[Figure: the worst case for REST(pij), illustrated with a path Ph+1 and its code bc(Ph+1); the alignments bc(T1)' and bc(T2)' split into the substrings pi1, pi2, ..., pid-1, pid, with the MSSP parts marked]

20 Upper Bound of Tree Edit Distance
M' is the mapping corresponding to the elements of MSSP
s is the string edit distance
h is the minimum height of the two trees
There are at least d - 1 gaps in the alignment of bc(T1)' and bc(T2)'
[Figure: bc(T1)' and bc(T2)' split into the substrings pi1, pi2, ..., pid-1, pid]

21 Upper Bound of Tree Edit Distance
M' is the mapping corresponding to the elements of MSSP
s is the string edit distance
h is the minimum height of the two trees
|bc(T)'| is the length of the alignment bc(T)'
[Figure: bc(T1)' and bc(T2)' split into the substrings pi1, pi2, ..., pid-1, pid]

22 Upper Bound of Tree Edit Distance
M' is the mapping corresponding to the elements of MSSP
s is the string edit distance
h is the minimum height of the two trees

23 Upper Bound of Tree Edit Distance
M' is defined by the identical substrings in the two binary tree codes

24 String Edit Distance Between Binary Tree Code of Trees
Approximating the tree edit distance through the string edit distance for binary tree codes
t is the tree edit distance
s is the string edit distance
h is the minimum height of the two trees
[Figure: comparison between the binary tree code and the Euler string]

25 Example [Akutsu 2006]
[Figure: trees T1 and T2, each with root r and 2m nodes labeled x that carry leaves labeled a, b, c, d]
bc(T1) = r (x a ⊥ x b ⊥)^m d ⊥ ⊤ (c ⊥ ⊤ d ⊥ ⊤)^(m-1) c ⊥ ⊤ ⊤
bc(T2) = r (x a ⊥ x b ⊥)^m d ⊥ (c ⊥ ⊤ d ⊥ ⊤)^(m-1) c ⊥ ⊤ ⊤ ⊤
s(T1) = (x a a x b b)^m d d x (c c x d d x)^(m-1) c c x
s(T2) = (x a a x b b)^m d d (c c x d d x)^(m-1) c c x x

26 Example
[Figure: trees T1 and T2 with nodes labeled a, b, c, d]
bc(T1) = (a b)^m a ⊥ d ⊥ ⊤ (c ⊥ ⊤ d ⊥ ⊤)^(m-1) ⊤ c ⊥ ⊤
bc(T2) = (a b)^m a ⊥ d ⊥ (c ⊥ ⊤ d ⊥ ⊤)^(m-1) ⊤ c ⊥ ⊤ ⊤
s(T1) = (b a)^(m-1) a d d b (c c a d d b)^(m-1) c c
s(T2) = (b a)^(m-1) a d d (c c b d d a)^(m-1) c c b

27 Conclusion
Binary tree code: a string obtained by traversing, in preorder, the binary tree representation of a tree with two kinds of dummy nodes
Approximation of the tree edit distance through the string edit distance between binary tree codes of trees
Future work
Comparison to other similarity measures
Application to tree-structured data

