Taku Aratsu, Kouichi Hirata and Tetsuji Kuboyama

Presentation transcript:

Approximating Tree Edit Distance through String Edit Distance for Binary Tree Codes
Taku Aratsu (1), Kouichi Hirata (1) and Tetsuji Kuboyama (2)
(1) Department of Artificial Intelligence, Kyushu Institute of Technology
(2) Computer Center, Gakushuin University

Outline of Talk
- Tree edit distance and string edit distance
- Binary tree code
- Lower and upper bounds of the tree edit distance through the string edit distance for binary tree codes

String Edit Distance (cf. [R.A.Wagner et al. 1974])
- Edit operations: deletion, insertion, substitution.
- The string edit distance s(s1,s2) between two strings s1 and s2 is the minimum number of edit operations needed to transform s1 into s2.
- s(s1,s2) can be computed in O(n^2) time, where n is the maximum length of the two strings.
[Figure: a string s1 is transformed into s2 = C G A T C C T C by a deletion, an insertion, a substitution and an insertion.]
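
The O(n^2) computation mentioned on this slide can be sketched with the standard dynamic program. This is a minimal illustration that assumes unit costs for all three operations; the function name `string_edit_distance` is chosen here and does not come from the talk.

```python
def string_edit_distance(s1, s2):
    """Unit-cost edit distance between two sequences (row-by-row DP)."""
    m, n = len(s1), len(s2)
    # prev[j] holds the distance between s1[:i-1] and s2[:j].
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            substitute = prev[j - 1] + (s1[i - 1] != s2[j - 1])
            delete = prev[j] + 1        # drop s1[i-1]
            insert = curr[j - 1] + 1    # add s2[j-1]
            curr[j] = min(substitute, delete, insert)
        prev = curr
    return prev[n]


# The two strings used in the alignment example later in the talk.
print(string_edit_distance("GCGTCGT", "CGATCCTC"))  # 4
```

Only two rows of the table are kept, so the sketch uses O(n) memory while still taking O(n^2) time.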

Tree Edit Distance [K.-C.Tai 1979]
- The most famous similarity measure between trees.
- Edit operations: deletion, insertion, substitution.
- The tree edit distance t(T1,T2) between two trees T1 and T2 is the minimum number of edit operations needed to transform T1 into T2.
[Figure: example trees T1 and T2 illustrating a deletion, an insertion and a substitution.]

Time Complexity of Tree Edit Distance for Ordered Trees
- Algorithms for computing the tree edit distance between ordered trees have been improved continuously:
  O(n^6) [K.-C.Tai 1979]
  O(n^4) [K.Zhang et al. 1989]
  O(n^3 log n) [P.N.Klein 1998]
  O(n^3) [E.D.Demaine et al. 2007]
  where n is the maximum number of nodes of the two trees.
- Even so, the tree edit distance is too expensive to compute for large-scale data.
- Idea: approximate the tree edit distance (O(n^3)) through the string edit distance (O(n^2)).

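String Edit Distance between Euler Strings of Trees [Akutsu 2006]
- The tree edit distance is approximated through the string edit distance between the Euler strings of the two trees.
- The Euler string s(T) of a tree T is obtained by traversing T depth-first and recording each edge twice, once on the way down and once on the upward traversal.
- The approximation uses the string edit distance s(s(T1),s(T2)).
[Figure: trees T1 and T2 with node labels R, B, C, D, E and their Euler strings.]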

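String Edit Distance between Euler Strings of Trees [Akutsu 2006]
- The tree edit distance is approximated through the string edit distance between the Euler strings s(T1) and s(T2) of the two trees:
  (1/2) s(s(T1),s(T2)) <= t(T1,T2) <= (2h+1) s(s(T1),s(T2))
- Here t is the tree edit distance, s is the string edit distance, and h is the minimum height of the two trees.
[Figure: a tree T with node labels R, B, C, D, E and the upward traversal that contributes to its Euler string.]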
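
As a rough illustration of an Euler string, the sketch below walks a tree depth-first and emits one symbol per edge in each direction. The encoding is an assumption made here, not taken from the slides: trees are nested tuples `(label, [children])`, an edge is named after its child node, and the upward symbol is written with a `~` prefix.

```python
def euler_string(tree):
    """Euler string of a tree given as (label, [children]).

    Each edge contributes the child's label on the way down and the
    same label prefixed with '~' on the way back up. The root label
    itself is not emitted (assumed convention).
    """
    _label, children = tree
    out = []
    for child in children:
        out.append(child[0])            # downward traversal of the edge
        out.extend(euler_string(child))
        out.append("~" + child[0])      # upward traversal of the edge
    return out


T = ("r", [("a", []), ("b", [("c", [])])])
print(" ".join(euler_string(T)))  # a ~a b c ~c ~b
```

Under this encoding a tree with |T| nodes yields a string of length 2(|T| - 1), so comparing two Euler strings with a quadratic string algorithm stays within O(n^2).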

String Edit Distance between Binary Tree Codes of Trees
- This work approximates the tree edit distance through the string edit distance between binary tree codes.
- Notation: t is the tree edit distance, s is the string edit distance, and h is the minimum height of the two trees.
- Binary tree code: s/2 <= t <= (h+1) s + h
- Euler string [Akutsu 2006]: s/2 <= t <= (2h+1) s

Binary Tree Representation (cf. [D.E.Knuth 1968])
The binary tree representation b(T) of a tree T with root r is defined as follows.
- For every node v in T - {r}:
  - The first child of v in T is the left child of v in b(T). If v has no child, the dummy node ⊥ is the left child of v in b(T).
  - The next sibling of v in T is the right child of v in b(T). If v has no next sibling, the dummy node ⊤ is the right child of v in b(T).
- The root r of T is also the root of b(T), and it has only a left child.
[Figure: a tree T with nodes r, a, b, c, d, e, f, g and its binary tree representation b(T), in which ⊥ and ⊤ are dummy nodes.]

Binary Tree Code
- The binary tree code bc(T) of a tree T is the string obtained by the preorder traversal of b(T).
- bc(T) can be constructed from T in O(|T|) time, and T can be reconstructed from bc(T) in O(|T|) time.
- |b(T)| = |bc(T)| = 2|T|.
- The tree edit distance t(T1,T2) = 0 iff the string edit distance s(bc(T1),bc(T2)) = 0.
- Example: bc(T) = r a d ⊥ e ⊥ ⊤ b f ⊥ g ⊥ ⊤ c ⊥ ⊤
[Figure: the tree T with nodes r, a, b, c, d, e, f, g and its binary tree representation b(T).]
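
The construction of bc(T) and its inverse can be sketched as below. The data layout is an assumption made for illustration: a tree is a nested tuple `(label, [children])`, the dummy nodes are the strings "⊥" and "⊤", and no real node uses those strings as a label. The function names are hypothetical.

```python
BOTTOM, TOP = "⊥", "⊤"  # dummy symbols; assumed never used as node labels


def _siblings_code(siblings):
    """Preorder code of a sibling chain in b(T)."""
    out = []
    for label, children in siblings:
        out.append(label)
        # Left child in b(T): the first child, or ⊥ for a leaf.
        out.extend(_siblings_code(children) if children else [BOTTOM])
    out.append(TOP)  # the last sibling has ⊤ as its right child
    return out


def binary_tree_code(tree):
    """bc(T): preorder traversal of b(T); the root has only a left child."""
    label, children = tree
    return [label] + (_siblings_code(children) if children else [BOTTOM])


def tree_from_code(code):
    """Rebuild T from bc(T) in one left-to-right pass."""
    pos = 1

    def parse_siblings():
        nonlocal pos
        siblings = []
        while code[pos] != TOP:
            label = code[pos]
            pos += 1
            if code[pos] == BOTTOM:
                pos += 1
                children = []
            else:
                children = parse_siblings()
            siblings.append((label, children))
        pos += 1  # consume ⊤
        return siblings

    return (code[0], [] if code[1] == BOTTOM else parse_siblings())


T = ("r", [("a", [("d", []), ("e", [])]),
           ("b", [("f", []), ("g", [])]),
           ("c", [])])
print(" ".join(binary_tree_code(T)))
# r a d ⊥ e ⊥ ⊤ b f ⊥ g ⊥ ⊤ c ⊥ ⊤
```

The printed code has 16 symbols for a tree of 8 nodes, matching |bc(T)| = 2|T|, and `tree_from_code(binary_tree_code(T))` returns the original tuple.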

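Lower Bound of Tree Edit Distance
- s(bc(T1),bc(T2)) changes by at most 2 when one edit operation is applied.
- Substitution: relabeling a node v of T1 as w in T2 replaces the single symbol v of bc(T1) by w in bc(T2).
  bc(T1) = ... v ...
  bc(T2) = ... w ...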

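Lower Bound of Tree Edit Distance
- s(bc(T1),bc(T2)) changes by at most 2 when one edit operation is applied.
- Deletion (insertion): let v be a node of T1 with children v1, ..., vn, previous sibling p and next sibling n. Deleting v moves v1, ..., vn up to the position of v.
  bc(T1) = s1 p s2 v s3 ⊤ s4
  bc(T2) = s1 p s2 s3 s4
  Only the symbol v and one dummy symbol disappear from the code.
- Since every edit operation changes the string edit distance by at most 2, s(bc(T1),bc(T2)) <= 2 t(T1,T2), that is, t >= s/2.
[Figure: the trees T1 and T2 and their binary tree representations b(T1) and b(T2) before and after deleting v.]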
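
The claim that one deletion moves the binary tree code by at most 2 can be checked on a small example. The sketch below is self-contained and rests on assumptions made here: trees are nested tuples `(label, [children])`, node labels are unique, and the helper names `bc`, `sed` and `delete_node` are hypothetical.

```python
BOTTOM, TOP = "⊥", "⊤"  # dummy symbols; assumed never used as node labels


def bc(tree):
    """Binary tree code of a tree given as (label, [children])."""
    def siblings(nodes):
        out = []
        for label, children in nodes:
            out.append(label)
            out.extend(siblings(children) if children else [BOTTOM])
        out.append(TOP)
        return out
    label, children = tree
    return [label] + (siblings(children) if children else [BOTTOM])


def sed(a, b):
    """Unit-cost string edit distance."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1,
                          prev[j - 1] + (a[i - 1] != b[j - 1]))
        prev = curr
    return prev[-1]


def delete_node(tree, target):
    """Delete the non-root node labeled `target`; its children move up."""
    label, children = tree
    new_children = []
    for child in children:
        if child[0] == target:
            new_children.extend(child[1])   # children take the node's place
        else:
            new_children.append(delete_node(child, target))
    return (label, new_children)


T1 = ("r", [("a", [("d", []), ("e", [])]),
            ("b", [("f", []), ("g", [])]),
            ("c", [])])
T2 = delete_node(T1, "b")        # one edit operation, so t(T1,T2) = 1
print(sed(bc(T1), bc(T2)))       # 2
```

Deleting the inner node b removes the symbol b and the ⊤ that closed its child chain; deleting a leaf such as c removes c and its ⊥ instead. Either way the code changes in exactly two positions.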

Alignment (cf. [R.A.Wagner et al. 1974])
- An alignment between two strings s1 and s2 is obtained by inserting the gap symbol '-' into them so that the resulting strings s1' and s2' have the same length.
- The cost of position i is 0 if s1'[i] = s2'[i] and the symbol is not '-', and 1 otherwise. The cost of the alignment is the sum over all positions.
- An optimal alignment is an alignment with the minimum cost. The cost of an optimal alignment equals the string edit distance.
- Example:
  s1  = G C G T C G T
  s2  = C G A T C C T C
  s1' = G C G - T C G T -
  s2' = - C G A T C C T C
  The cost of this alignment is 4.
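
The cost rule can be written down directly. The following sketch assumes the two aligned strings are given as equal-length sequences that use '-' as the gap symbol; the function name `alignment_cost` is chosen here for illustration.

```python
def alignment_cost(a1, a2, gap="-"):
    """Cost of an alignment: 0 per matching non-gap column, 1 otherwise."""
    if len(a1) != len(a2):
        raise ValueError("aligned strings must have the same length")
    return sum(1 for x, y in zip(a1, a2) if x != y or x == gap)


# The alignment shown on the slide, with spaces removed.
print(alignment_cost("GCG-TCGT-", "-CGATCCTC"))  # 4
```

The four costly columns are the two gaps in s1', the gap in s2', and the G/C mismatch.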

Ordered Edit Distance Mapping [K.-C.Tai 1979]
- An ordered edit distance mapping (a mapping, for short) M from T1 to T2 is a set of pairs (v,w) with v in T1 and w in T2 such that, for every two pairs (v1,w1), (v2,w2) in M:
  - v1 = v2 iff w1 = w2;
  - v1 is an ancestor of v2 iff w1 is an ancestor of w2;
  - v1 is to the left of v2 iff w1 is to the left of w2.
- id(M) is the number of pairs in M whose two nodes have identical labels.
- A mapping M has cost |T1| + |T2| - |M| - id(M), and the tree edit distance is the minimum of this cost over all mappings.
[Figure: example trees T1 and T2 with node labels a, b, c, d, e and a mapping between them.]
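
The cost formula follows from counting the three kinds of operations a mapping implies: unmapped nodes of T1 are deleted, unmapped nodes of T2 are inserted, and mapped pairs with different labels are substituted. A small sketch, assuming the mapping is passed as a list of label pairs and that it already satisfies the three mapping conditions (they are not checked here):

```python
def mapping_cost(size1, size2, mapping):
    """Edit cost implied by a mapping between trees of the given sizes.

    `mapping` is a list of (label_in_T1, label_in_T2) pairs, one per
    mapped node pair. Validity of the mapping is assumed, not verified.
    """
    matched = len(mapping)
    identical = sum(1 for a, b in mapping if a == b)   # id(M)
    deletions = size1 - matched
    insertions = size2 - matched
    substitutions = matched - identical
    return deletions + insertions + substitutions


# Hypothetical sizes and pairs, for illustration only.
print(mapping_cost(3, 4, [("a", "a"), ("b", "c")]))  # 4
```

The sum of the three counts simplifies to |T1| + |T2| - |M| - id(M), which is 3 + 4 - 2 - 1 = 4 in this call.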

Bottom-up Mapping [G.Valiente 2001]
- A bottom-up mapping is a restricted form of mapping.
- If labels are ignored, a bottom-up mapping forms a common complete subforest of the two trees.
[Figure: example trees T1 and T2 with node labels a, b, c, d and a bottom-up mapping between them.]

Upper Bound of Tree Edit Distance
- An alignment bc(T1)', bc(T2)' is obtained from bc(T1) and bc(T2).
- MSP is the set of maximal substring pairs {(p11,p21), ..., (p1d,p2d)}.
- MSSP is the set of maximal subtree string pairs in MSP, {(t11,t21), ..., (t1b,t2b)}.
- A bottom-up mapping is constructed from the nodes in t1i and those in t2i, leaving out the dummy symbols ⊥ and ⊤.
- Example:
  bc(T1)' = - a b c ⊥ d ⊥ ⊤ b c ⊥ d ⊥ ⊤ e ⊥ ⊤
  bc(T2)' = a a b c ⊥ d ⊥ ⊤ ⊤ c ⊥ d ⊥ - f ⊥ ⊤
[Figure: the trees T1 and T2 with the substrings p11, p12, p13, p21, p22, p23 and the subtree strings t11, t12, t13, t21, t22, t23 marked.]

Upper Bound of Tree Edit Distance
- M' corresponds to the elements of MSSP.
- s is the string edit distance, and h is the minimum height of the two trees.
- REST(pij) is the total number of positions in the substring pij that do not appear in MSSP.
- For every pij, REST(pij) <= h.
[Figure: the worst case for REST(pij), drawn with the code bc(P_{h+1}) of a path P_{h+1} on the h+1 nodes a0, a1, a2, ..., ah, and the aligned strings bc(T1)' and bc(T2)' divided into pi1, pi2, ..., pid-1, pid.]

Upper Bound of Tree Edit Distance
- M' corresponds to the elements of MSSP.
- s is the string edit distance, and h is the minimum height of the two trees.
- There are at least d - 1 gaps in the alignment of bc(T1)' and bc(T2)'.
[Figure: the aligned strings bc(T1)' and bc(T2)' divided into pi1, pi2, ..., pid-1, pid.]

Upper Bound of Tree Edit Distance
- M' corresponds to the elements of MSSP.
- s is the string edit distance, and h is the minimum height of the two trees.
- This step uses the length |bc(T)'| of the aligned string bc(T)'.
[Figure: the aligned strings bc(T1)' and bc(T2)' divided into pi1, pi2, ..., pid-1, pid.]

Upper Bound of Tree Edit Distance
- M' corresponds to the elements of MSSP.
- s is the string edit distance, and h is the minimum height of the two trees.

Upper Bound of Tree Edit Distance
- M' is defined by the identical substrings shared by the two binary tree codes.

Example [Akutsu 2006]
[Figure: trees T1 and T2, each with root r and a chain of 2m nodes labeled x that carry leaves labeled a, b, c and d.]
bc(T1) = r (x a ⊥ x b ⊥)^m d ⊥ ⊤ (c ⊥ ⊤ d ⊥ ⊤)^(m-1) c ⊥ ⊤ ⊤
bc(T2) = r (x a ⊥ x b ⊥)^m d ⊥ (c ⊥ ⊤ d ⊥ ⊤)^(m-1) c ⊥ ⊤ ⊤ ⊤
s(T1) = (x a a x b b)^m d d x (c c x d d x)^(m-1) c c x
s(T2) = (x a a x b b)^m d d (c c x d d x)^(m-1) c c x x

Example
[Figure: trees T1 and T2 with node labels a, b, c and d.]
bc(T1) = (a b)^m a ⊥ d ⊥ ⊤ (c ⊥ ⊤ d ⊥ ⊤)^(m-1) ⊤ c ⊥ ⊤
bc(T2) = (a b)^m a ⊥ d ⊥ (c ⊥ ⊤ d ⊥ ⊤)^(m-1) ⊤ c ⊥ ⊤ ⊤
s(T1) = (b a)^(m-1) a d d b (c c a d d b)^(m-1) c c
s(T2) = (b a)^(m-1) a d d (c c b d d a)^(m-1) c c b

Conclusion
- Binary tree code: a string obtained by traversing, in preorder, the binary tree representation of a tree with two kinds of dummy nodes.
- Approximation of the tree edit distance through the string edit distance between the binary tree codes of trees.
Future work
- Comparison to other similarity measures.
- Application to tree-structured data.