Published by Cecil Rodgers. Modified over 9 years ago.
1
A New Top-down Algorithm for Tree Inclusion
Dr. Yangjun Chen
Dept. of Applied Computer Science, University of Winnipeg
515 Portage Ave., Winnipeg, Manitoba, Canada R3B 2E9
2
Outline
-Motivation
-Basic algorithm for the tree inclusion problem
 -Definition
 -Algorithm description
-Improvements
-Summary
3
Motivation
Given two ordered labeled trees $P$ and $T$, called the pattern and the target, respectively, an interesting problem is: can we obtain the pattern $P$ by deleting some nodes from the target $T$? That is, is there a sequence $v_1, \ldots, v_k$ of nodes such that, for $T_0 = T$ and $T_{i+1} = \mathrm{delete}(T_i, v_{i+1})$ for $i = 0, \ldots, k - 1$, we have $T_k = P$? If this is the case, we say that $P$ is included in $T$, that $T$ contains $P$, or that $T$ covers $P$.
[Figure: a target tree T before and after delete(T, c).]
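The deletion operation and the inclusion question can be illustrated with a small brute-force sketch. It is not the algorithm of this presentation; it only makes the problem statement concrete. The tuple encoding `(label, children)`, the function names, and the example tree are assumptions made for illustration, and deleting a node is assumed to splice its children into its place, as in the figure.

```python
# A tree is (label, children), where children is a tuple of trees.

def delete_child(tree, index):
    """Delete the index-th child of the root; its children take its place."""
    label, kids = tree
    removed = kids[index]
    return (label, kids[:index] + removed[1] + kids[index + 1:])

def forest_included(p, t):
    """Brute force: can forest p be obtained from forest t by deleting nodes?"""
    if not p:
        return True
    if not t:
        return False
    (p_label, p_kids), (t_label, t_kids) = p[0], t[0]
    # Option 1: keep the first root of t and map the first root of p onto it.
    if (p_label == t_label
            and forest_included(p_kids, t_kids)
            and forest_included(p[1:], t[1:])):
        return True
    # Option 2: delete the first root of t; its children move up one level.
    return forest_included(p, t_kids + t[1:])

def included(P, T):
    """True if pattern tree P is included in target tree T."""
    return forest_included((P,), (T,))

# Hypothetical example tree: a(b, c(d, e), f).
T = ("a", (("b", ()), ("c", (("d", ()), ("e", ()))), ("f", ())))
P = delete_child(T, 1)   # delete c: a(b, d, e, f)
print(included(P, T))
```

The brute-force search can take exponential time in the worst case, which is why a dedicated algorithm is needed.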
4
Motivation
Linguistic analysis
[Figure: a pattern parse tree (s, vp, v "reads", n "book", adv) and the parse tree of the target sentence "The student reads the interesting book again and again" (s, np, vp, det, n, v, np, adv, det, adj, n).]
5
Tree inclusion algorithm
Definition
Definition 1. Let $F$ and $G$ be labeled ordered forests. We define an ordered embedding $(f, G, F)$ as an injective function $f: V(G) \to V(F)$ such that for all nodes $v, u \in V(G)$,
i) $\mathrm{label}(v) = \mathrm{label}(f(v))$ (label preservation condition);
ii) $v$ is an ancestor of $u$ iff $f(v)$ is an ancestor of $f(u)$ (ancestor condition);
iii) $v$ is to the left of $u$ iff $f(v)$ is to the left of $f(u)$ (sibling condition).
[Figure: example forests G and F.]
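The three conditions of Definition 1 can be checked mechanically for a candidate mapping. The sketch below is an illustration under stated assumptions: nodes are encoded as `(id, label, children)` tuples, the mapping is a dictionary from node ids of G to node ids of F, and the ancestor and left-of relations are derived from preorder and postorder numbers. The function names and the example forests are hypothetical.

```python
def number(forest):
    """Map node id -> (label, preorder number, postorder number)."""
    info, counter = {}, [0, 0]

    def visit(node):
        node_id, label, children = node
        pre = counter[0]
        counter[0] += 1
        for child in children:
            visit(child)
        info[node_id] = (label, pre, counter[1])
        counter[1] += 1

    for tree in forest:
        visit(tree)
    return info

def is_ancestor(a, b):
    # a comes before b in preorder and after b in postorder
    return a[1] < b[1] and a[2] > b[2]

def is_left_of(a, b):
    # a comes before b in both preorder and postorder
    return a[1] < b[1] and a[2] < b[2]

def is_ordered_embedding(f, G, F):
    g, t = number(G), number(F)
    if set(f) != set(g) or any(x not in t for x in f.values()):
        return False
    if len(set(f.values())) != len(f):           # injectivity
        return False
    for v in g:
        if g[v][0] != t[f[v]][0]:                # i) label preservation
            return False
        for u in g:
            if is_ancestor(g[v], g[u]) != is_ancestor(t[f[v]], t[f[u]]):
                return False                     # ii) ancestor condition
            if is_left_of(g[v], g[u]) != is_left_of(t[f[v]], t[f[u]]):
                return False                     # iii) sibling condition
    return True

# Hypothetical example: G = a(b, b), F = a(d(b), e(b), b).
G = [("g1", "a", [("g2", "b", []), ("g3", "b", [])])]
F = [("f1", "a", [("f2", "d", [("f3", "b", [])]),
                  ("f4", "e", [("f5", "b", [])]),
                  ("f6", "b", [])])]
print(is_ordered_embedding({"g1": "f1", "g2": "f3", "g3": "f5"}, G, F))
```

Swapping the images of the two `b` nodes breaks the sibling condition, so that mapping is rejected.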
6
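Tree inclusion algorithm
Algorithm
1. Let $T = \langle t; T_1, \ldots, T_k \rangle$ ($k \ge 1$) be a tree and $G = \langle P_1, \ldots, P_l \rangle$ ($l \ge 1$) be a forest. We handle $G$ as a tree $P = \langle p_v; P_1, \ldots, P_l \rangle$, where $p_v$ represents a virtual node, matching any node in $T$.
2. Consider a node $v$ in $P$ with children $v_1, \ldots, v_j$. We use a pair $\langle i, v \rangle$ ($i \le j$) to represent an ordered forest containing the first $i$ subtrees of $v$: $\langle P[v_1], \ldots, P[v_i] \rangle$. Then, $\langle j, p_v \rangle$ represents the first $j$ trees in $G$.
[Figure: a node v in P with children v_1, ..., v_i, ..., v_k.]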
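One possible way to represent the virtual root and the pair notation in code is sketched below. The `Node` class, the `virtual` flag, and the helper names are assumptions made for illustration; the slides do not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    virtual: bool = False   # True only for the virtual root p_v

def as_tree(forest):
    """Handle a forest G as a tree P whose root is the virtual node p_v."""
    return Node(label="", children=list(forest), virtual=True)

def labels_match(t, v):
    """p_v matches any node in T; every other node matches by label."""
    return v.virtual or t.label == v.label

def first_subtrees(i, v):
    """Expand a pair (i, v) into the ordered forest of the first i subtrees of v."""
    if not 0 <= i <= len(v.children):
        raise ValueError("i must lie between 0 and the outdegree of v")
    return v.children[:i]

G = [Node("a"), Node("b"), Node("c")]
p_v = as_tree(G)
print([n.label for n in first_subtrees(2, p_v)])   # the first two trees of G
```

Storing only the pair, rather than a copy of the sub-forest, keeps each return value at constant size.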
7
Tree inclusion algorithm
Algorithm
3. In addition, $h(v)$ represents the height of $v$ in a tree, and $\delta(v)$ represents a link from $v$ in $P$ to the leaf node on the left-most path in $P[v]$. Let $v'$ be a leaf node in $P$. We denote by $\delta^{-1}(v')$ the set of nodes $x$ such that $\delta(x) = v'$.
Example: $\delta^{-1}(v_3) = \{v_1, v_2, v_3\}$.
[Figure: a tree P with nodes v_1, ..., v_5 and the links δ(v_1) and δ(v_2) pointing to the leaf v_3.]
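The height, the left-most-leaf link, and its inverse can be computed as in the sketch below. The `Node` class and function names are hypothetical, a leaf is assumed to have height 0, and the example tree v_1(v_2(v_3, v_4), v_5) is only a guess at a shape that is consistent with the example set above.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def height(v):
    """h(v): edges on the longest downward path from v (assumed: a leaf has height 0)."""
    return 0 if not v.children else 1 + max(height(c) for c in v.children)

def delta(v):
    """The leaf at the end of the left-most path in the subtree rooted at v."""
    while v.children:
        v = v.children[0]
    return v

def delta_inverse(root, leaf):
    """All nodes x below root (inclusive) whose left-most leaf is `leaf`, top-down."""
    found, stack = [], [root]
    while stack:
        x = stack.pop()
        if delta(x) is leaf:
            found.append(x)
        stack.extend(reversed(x.children))
    return found

v3, v4, v5 = Node("v3"), Node("v4"), Node("v5")
v2 = Node("v2", [v3, v4])
v1 = Node("v1", [v2, v5])
print([x.name for x in delta_inverse(v1, v3)])
```

All nodes in the inverse set lie on a single root-to-leaf path, so any two of them are related as ancestor and descendant.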
8
Tree inclusion algorithm
Algorithm
Tree inclusion checking is done by calling two functions recursively: top-down(T, G) and bottom-up(T', G), where $T$ is a tree, and $T'$ and $G$ are two forests.
Each of the two functions returns a pair $\langle i, v \rangle$ with $v$ being $p_v$ or a node on the left-most path in $P_1$.
[Figure: the shapes of T, T', and G.]
9
Tree inclusion algorithm
Function: top-down(T, G)
In top-down(T, G), two cases will be handled.
Case 1: $G = \langle P_1 \rangle$; or $G = \langle P_1, \ldots, P_l \rangle$ ($l > 1$), but $|T| \le |P_1| + |P_2|$. In this case, we try to find a pair $\langle i, v \rangle$ such that $T$ contains the first $i$ subtrees of $v$, where $v = p_v$, or $v \in \delta^{-1}(v')$ and $v'$ is the leaf node on the left-most path in $P_1$.
[Figure: T with root t against G = ⟨P_1⟩, and against G = ⟨P_1, P_2, ..., P_l⟩ with |T| ≤ |P_1| + |P_2|.]
10
Tree inclusion algorithm
Function: top-down(T, G), case 1
i) If $t$ is a leaf node, we check whether $\mathrm{label}(t) = \mathrm{label}(\delta(p_1))$, where $p_1$ is the root of $P_1$. If it is the case, return $\langle \ldots \rangle$. Otherwise, return $\langle \ldots \rangle$.
[Figure: T consisting of the single leaf t, against G = ⟨P_1⟩ and against G = ⟨P_1, P_2, ..., P_l⟩ with |T| ≤ |P_1| + |P_2|.]
11
Tree inclusion algorithm
Function: top-down(T, G), case 1
ii) If $|T| < |P_1|$ or $\mathrm{height}(t) < \mathrm{height}(p_1)$, we make a recursive call top-down($T$, $\langle P_{11}, \ldots, P_{1j} \rangle$), where $\langle P_{11}, \ldots, P_{1j} \rangle$ is the forest of the subtrees of $p_1$. The return value of this call is used as the return value of top-down(T, G).
[Figure: T with root t and |T| < |P_1|, against G whose first tree has root p_1 and subtrees P_11, ..., P_1i, ..., P_1j.]
12
Tree inclusion algorithm
Function: top-down(T, G), case 1
iii) If $|T| \ge |P_1|$ (but $|T| \le |P_1| + |P_2|$) and $\mathrm{height}(t) \ge \mathrm{height}(p_1)$, two cases need to be considered:
-$\mathrm{label}(t) = \mathrm{label}(p_1)$. Call bottom-up($\langle \ldots \rangle$, $\langle \ldots \rangle$).
-$\mathrm{label}(t) \ne \mathrm{label}(p_1)$. Call bottom-up($\langle \ldots \rangle$, $\langle \ldots \rangle$).
[Figure: for each of the two sub-cases, t with subtrees T_1, ..., T_i, ..., T_k against p_1 with subtrees P_11, ..., P_1i, ..., P_1j.]
13
Tree inclusion algorithm
Function: top-down(T, G), case 1
In both sub-cases, assume that the return value is $\langle i, v \rangle$. A further check needs to be conducted:
If $\mathrm{label}(t) = \mathrm{label}(v)$ and $i$ = the outdegree of $v$, the return value should be $\langle \ldots \rangle$. Otherwise, the return value is the same as $\langle i, v \rangle$.
[Figure: T with root t and P_1 with root p_1 and a node v, for label(t) = label(v) and for label(t) ≠ label(v).]
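The conditions that separate sub-cases i), ii) and iii) of Case 1 can be written as a small dispatcher. This sketch only classifies which branch applies; it does not perform the recursive calls or compute any return pair. The tuple encoding `(label, children)`, the function names, and the convention that a leaf has height 0 are assumptions made for illustration.

```python
def size(tree):
    """Number of nodes in a (label, children) tree."""
    return 1 + sum(size(c) for c in tree[1])

def height(tree):
    """Assumed convention: a leaf has height 0."""
    return 0 if not tree[1] else 1 + max(height(c) for c in tree[1])

def case1_branch(T, P1):
    """Return which Case 1 sub-case applies to target tree T and pattern tree P1."""
    if not T[1]:
        return "i"                      # t is a leaf node
    if size(T) < size(P1) or height(T) < height(P1):
        return "ii"                     # T is too small or too shallow for P1
    if T[0] == P1[0]:
        return "iii, label(t) = label(p1)"
    return "iii, label(t) != label(p1)"

leaf = ("a", [])
small = ("a", [("b", [])])
big = ("a", [("b", [("c", [])]), ("d", [])])
print(case1_branch(leaf, big))
print(case1_branch(small, big))
print(case1_branch(big, small))
```

The size and height tests are cheap to evaluate when both values are precomputed for every node, which is an implementation choice rather than something stated on the slides.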
14
Tree inclusion algorithm
Function: top-down(T, G)
Case 2: $G = \langle P_1, \ldots, P_l \rangle$ ($l > 1$), and $|T| > |P_1| + |P_2|$. In this case, we call bottom-up($\langle T_1, \ldots, T_k \rangle$, $G$). Assume that the return value is $\langle i, v \rangle$. The following checks are then conducted.
[Figure: T with root t and subtrees T_1, T_2, ..., T_k against G = ⟨P_1, P_2, ..., P_l⟩ under p_v, with |T| > |P_1| + |P_2|.]
15
Tree inclusion algorithm
Function: top-down(T, G), case 2
iv) If $v$ = $p_1$'s parent, the return value is the same as $\langle i, v \rangle$.
v) If $v \ne$ $p_1$'s parent, check whether $\mathrm{label}(t) = \mathrm{label}(v)$ and $i$ = the outdegree of $v$. If so, the return value is changed to $\langle \ldots \rangle$. Otherwise, the return value remains $\langle i, v \rangle$.
[Figure: the situation v = p_1's parent = p_v, and the situation v ≠ p_1's parent with v inside P_1.]
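The choice between Case 1 and Case 2 depends only on the number of pattern trees and on three sizes. A minimal sketch of that test, assuming trees are `(label, children)` tuples, G is a non-empty list of such trees, and the function names are hypothetical:

```python
def size(tree):
    """Number of nodes in a (label, children) tree."""
    return 1 + sum(size(c) for c in tree[1])

def top_down_case(T, G):
    """Return 1 or 2, following the conditions stated for top-down(T, G)."""
    if not G:
        raise ValueError("G must contain at least one pattern tree")
    if len(G) == 1:
        return 1                                   # G = <P_1>
    if size(T) <= size(G[0]) + size(G[1]):
        return 1                                   # l > 1 but T is small
    return 2                                       # l > 1 and |T| > |P_1| + |P_2|

T = ("a", [("b", []), ("c", []), ("d", [])])       # |T| = 4
P1, P2 = ("b", []), ("c", [])                      # |P_1| + |P_2| = 2
print(top_down_case(T, [P1]), top_down_case(T, [P1, P2]))
```

In Case 2 the target is large enough that at least two pattern trees might fit below t, so the work is handed to bottom-up on the subtrees of t.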
16
Tree inclusion algorithm
Function: bottom-up(T', G)
bottom-up(T', G) is designed to handle the case that both $T'$ and $G$ are forests. Let $T' = \langle T_1, \ldots, T_k \rangle$ and $G = \langle P_1, \ldots, P_q \rangle$. In bottom-up(T', G), we make a series of calls top-down($T_l$, $\langle \ldots \rangle$), where $l = 1, \ldots, k$, $j_1 = 0$, and $j_1 \le j_2 \le \ldots \le j_h \le q$ (for some $h \le k$), controlled as follows.
[Figure: the forest T' = ⟨T_1, T_2, ..., T_i, ..., T_k⟩ scanned against G = ⟨P_1, ..., P_i, ..., P_q⟩ by calls of top-down.]
17
Tree inclusion algorithm
Function: bottom-up(T', G)
1. Two index variables $l$ and $j$ are used to scan $T_1, \ldots, T_k$ and $P_1, \ldots, P_q$, respectively.
2. Let $\langle i_l, v_l \rangle$ be the return value of top-down($T_l$, $\langle \ldots \rangle$). If $v_l$ = $p_j$'s parent, set $j$ to be $j + i_l - 1$. Otherwise, $j$ is not changed. Set $l$ to be $l + 1$. Go to (2).
3. The loop terminates when all $T_l$'s or all $P_j$'s are examined.
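The two-index scan can be sketched as a loop around a pluggable `top_down` callback. This is a simplified illustration, not the function from the slides: the callback, the string `"p_v"` standing for the virtual root, and the brute-force stand-in `brute_top_down` are all assumptions, and j is assumed here to count the pattern trees covered so far, so that a successful call advances j by the number of trees it matched. The slides' own index arithmetic may differ from this convention.

```python
def forest_included(p, t):
    """Brute force: can forest p be obtained from forest t by deleting nodes?"""
    if not p:
        return True
    if not t:
        return False
    (p_label, p_kids), (t_label, t_kids) = p[0], t[0]
    if (p_label == t_label and forest_included(p_kids, t_kids)
            and forest_included(p[1:], t[1:])):
        return True
    return forest_included(p, t_kids + t[1:])

def brute_top_down(tree, forest):
    """Stand-in for top-down: the largest i such that tree contains forest[:i]."""
    i = 0
    while i < len(forest) and forest_included(forest[:i + 1], (tree,)):
        i += 1
    return (i, "p_v") if i > 0 else (0, None)

def scan(T_forest, G, top_down=brute_top_down):
    """Scan T_1..T_k with index l and P_1..P_q with index j."""
    l, j, results = 0, 0, []
    while l < len(T_forest) and j < len(G):
        i, v = top_down(T_forest[l], G[j:])
        results.append((i, v))
        if v == "p_v":      # whole pattern trees were matched at the top level
            j += i
        l += 1
    return j, results

# Trees are (label, children) with tuple children; the example is hypothetical.
T_forest = (("a", (("b", ()),)), ("c", ()), ("d", (("e", ()),)))
G = (("b", ()), ("c", ()), ("e", ()))
print(scan(T_forest, G))
```

Each target tree is visited once, and j never decreases, which mirrors the monotone sequence of j values described above.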
18
Tree inclusion algorithm
Function: bottom-up(T', G)
If $j > 0$ when the loop terminates, bottom-up(T', G) returns $\langle j, p_v \rangle$.
[Figure: the forest T' = ⟨T_1, T_2, ..., T_i, ..., T_k⟩ covering the first j trees P_1, ..., P_j of G = ⟨P_1, ..., P_q⟩.]
19
Tree inclusion algorithm
Function: bottom-up(T', G)
Otherwise, $j = 0$. In this case, we continue searching for a pair $\langle i, v \rangle$ such that $T'$ contains the first $i$ subtrees of $v$, where $v \in \delta^{-1}(v')$ and $v'$ is the leaf node on the left-most path in $P_1$, as described below.
i) Let $\langle i_1, v_1 \rangle, \langle i_2, v_2 \rangle, \ldots, \langle i_k, v_k \rangle$ be the respective return values of top-down($T_1$, $\langle \ldots \rangle$), top-down($T_2$, $\langle \ldots \rangle$), ..., top-down($T_k$, $\langle \ldots \rangle$). Since $j = 0$, each $v_l \in \delta^{-1}(v')$ ($l = 1, \ldots, k$).
[Figure: the nodes v_1, v_2, ..., v_k on the left-most path of P_1.]
20
Tree inclusion algorithm
Function: bottom-up(T', G)
ii) If each $i_l = 0$, return a pair whose node component is considered to be a descendant of any node in $G$. Otherwise, find the first $v_g$ with children $w_1, \ldots, w_h$ such that $v_g$ is not a descendant of any other $v_j$, and $i_g > 0$. Call bottom-up($\langle \ldots \rangle$, $\langle \ldots \rangle$). Let $\langle \ldots, y \rangle$ be its return value. If $y = v_g$, then the return value of bottom-up(T', G) is set to be $\langle \ldots \rangle$. Otherwise, the return value is $\langle \ldots \rangle$.
[Figure: the forest ⟨T_1, T_2, ..., T_g, T_{g+1}, ..., T_k⟩ against P_1, with the nodes v_1, ..., v_g, ..., v_k on its left-most path and the first i_g subtrees of v_g marked.]
21
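Tree inclusion algorithm
Further improvements
In the case $j = 0$: let $\langle i_1, v_1 \rangle, \ldots, \langle i_k, v_k \rangle$ be the return values of top-down($T_1$, $\langle \ldots \rangle$), ..., top-down($T_k$, $\langle \ldots \rangle$). We find the first $v_g$ such that it is not a descendant of any other $v_j$ and $i_g > 0$. Then, bottom-up($\langle \ldots \rangle$, $\langle \ldots \rangle$) is invoked.
This shows that all the return values except $\langle i_g, v_g \rangle$ are not used in the subsequent computation. Thus, the work spent on computing those values should be avoided.
[Figure: the forest ⟨T_1, T_2, ..., T_g, T_{g+1}, ..., T_k⟩ against P_1, with the nodes v_1, ..., v_g, ..., v_k on its left-most path.]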
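Selecting the one return value that is actually used can be sketched as follows. The sketch rests on two assumptions that go beyond the slide text. First, since every returned node lies on the left-most path of P_1, each node is represented only by its depth on that path, so that "is a descendant of" becomes "has a larger depth". Second, the condition is read as applying among the pairs with a positive first component. The function name and the example pairs are hypothetical.

```python
def pick_g(pairs):
    """pairs[l] = (i_l, depth of v_l on the left-most path of P_1).

    Returns the index g of the first pair with i_g > 0 whose node is not a
    descendant of the node of any other such pair, or None if every i_l is 0.
    """
    candidates = [(idx, depth) for idx, (i, depth) in enumerate(pairs) if i > 0]
    if not candidates:
        return None
    shallowest = min(depth for _, depth in candidates)
    return next(idx for idx, depth in candidates if depth == shallowest)

pairs = [(0, 1), (2, 3), (1, 2), (1, 2)]
print(pick_g(pairs))
```

Only the pair at the returned index feeds the next bottom-up call, which is the observation that motivates the pruning described on the following slide.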
22
Tree inclusion algorithm
Further improvements
Let $\langle i_j, v_j \rangle$ be the return value of top-down($T_j$, $\langle \ldots \rangle$) such that $i_j > 0$ and $v_j$ is $p_1$ or a descendant of $p_1$. Then, during the execution of top-down($T_{j+1}$, $\langle \ldots \rangle$), once we have detected that it can only produce a return value $\langle i_{j+1}, v_{j+1} \rangle$ with $v_{j+1}$ being a descendant of $v_j$, we should stop the corresponding computation immediately, since this return value will not be used in the subsequent search. For this purpose, we extend top-down($T_{j+1}$, $\langle \ldots \rangle$) to top-down($T_{j+1}$, $\langle \ldots \rangle$, $v_j$), where $v_j$ is used to transfer information and is called a controlling node.
Assume that in the execution of top-down($T_{j+1}$, $\langle \ldots \rangle$, $v_j$), we have the following function calls:
top-down($T_{j+1,1}$, $\langle \ldots \rangle$, $u_1$) returns $\langle \ldots \rangle$,
top-down($T_{j+1,2}$, $\langle \ldots \rangle$, $u_2$) returns $\langle \ldots \rangle$,
...
with all $u_i$'s being proper descendants of $v_j$. Then the bottom-up function call bottom-up($\langle \ldots \rangle$, $\langle \ldots \rangle$, $u_i$) with some $u_i$ as a controlling node should not be conducted.
23
Summary
An efficient method for the tree inclusion problem:
-$O(|T| \cdot \min\{D_P, |\mathrm{leaves}(P)|\})$ time and
-$O(|T| + |P|)$ space,
where $D_P$ is the height of $P$ and $\mathrm{leaves}(P)$ is the set of the leaf nodes of $P$.
Future work
-adapt the algorithm to a data stream environment
-adapt the algorithm to an indexing environment
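To make the time bound concrete, the quantities in it can be computed for small example trees. The sketch below only evaluates the expression inside the O-notation; it says nothing about constant factors. The tuple encoding `(label, children)`, the function names, the example trees, and the convention that a leaf has height 0 are assumptions made for illustration.

```python
def size(tree):
    """Number of nodes in a (label, children) tree."""
    return 1 + sum(size(c) for c in tree[1])

def height(tree):
    """Assumed convention: a leaf has height 0."""
    return 0 if not tree[1] else 1 + max(height(c) for c in tree[1])

def leaf_count(tree):
    """Number of leaf nodes, |leaves(P)|."""
    return 1 if not tree[1] else sum(leaf_count(c) for c in tree[1])

def time_bound_expression(T, P):
    """|T| * min(D_P, |leaves(P)|), the expression inside the O-notation."""
    return size(T) * min(height(P), leaf_count(P))

P = ("a", [("b", [("c", []), ("d", [])]), ("e", [])])   # D_P = 2, 3 leaves
T = ("r", [P])                                           # |T| = 6
print(height(P), leaf_count(P), time_bound_expression(T, P))
```

For a deep but narrow pattern the leaf count is the smaller factor, and for a shallow but wide pattern the height is.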
24
Thank you.