Presentation is loading. Please wait.

Presentation is loading. Please wait.

A Linear-Space Top-down Algorithm for Tree Inclusion Problem

Similar presentations


Presentation on theme: "A Linear-Space Top-down Algorithm for Tree Inclusion Problem"— Presentation transcript:

1 A Linear-Space Top-down Algorithm for Tree Inclusion Problem
Prof. Dr. Yangjun Chen Dept. Applied Computer Science, University of Winnipeg 515 Portage Ave. Winnipeg, Manitoba, Canada R3B 2E9

2 Outline Motivation Basic algorithm for tree inclusion problem - Definition - Algorithm description Improvements Summary

3 Motivation Given two ordered labeled trees P and T, called the pattern and the target, respectively. An interesting problem is: Can we obtain pattern P by deleting some nodes from target T? That is, is there a sequence v1 , ..., vk of nodes such that for T0 = T and Ti+1 = delete(Ti, vi +1) for i = 0, ..., k - 1, we have Tk = P. If this is the case, we say, P is included in T, T contains P, or say, T covers P. a b d e f T: c delete(T, c) b d e f T: a By “ordered trees”, we mean those trees, in which the order of siblings is significant.

4 Motivation Linguistic analysis s vp v n adv “reads” “book” s np vp det
“The” “student” “reads” adj “the” “interesting” “book” “again and again”

5 Tree inclusion algorithm
Definition Definition 1 Let F and G be labeled ordered forests. We define an ordered embedding (, G, F) as an injective function  : V(G)  V(F) such that for all nodes v, u  V(G), i) label(v) = label((v)); (label preservation condition) ii) v is an ancestor of u iff (v) is an ancestor of (u);(ancestor condition) iii) v is to the left of u iff (v) is to the left of (u); (Sibling condition) G: F: a a b b d b e b b By “ordered trees”, we mean those trees, in which the order of siblings is significant.

6 Tree inclusion algorithm
Let T = <t; T1, ..., Tk> (k  1) be a tree and G = <P1 , ..., Pl> (l  1) be a forest. We handle G as a tree P = <pv; P1, ..., Pl>, where pv represents a virtual node, matching any node in T. Consider a node v in P with children v1, ..., vj. We use a pair <i, v> (i  j) to represent an ordered forest containing the first i subtrees of v: <P[v1], ..., P[vi]>. Then, <j, pv> represents the first j trees in G. P: v v1 vi vk <i, v> referred to as a left corner

7 Tree inclusion algorithm
In addition, h(v) represents the height of v in a tree; and (v) represents a link from v in P to the leaf node on the left-most path in P[v]. P: (v1) Let v’ be a leaf node in P. We denote by -1(v’) a set of nodes x such that for each v  x (v) = v’. -1(v3) = {v1, v2, v3} v1 (v2) v2 v5 v3 v4

8 Tree inclusion algorithm
The tree inclusion checking is done by calling two functions recursively: top-down(T, G), bottom-up(T’, G), where T is a tree, and T’ and G are two forests. G: P1 P2 pv Pl v T = <t; T1, ..., Tk> T’ = <T1’, ..., Tk’> G = <P1, ..., PL> Each of the two functions returns a pair <i, v> with v being pv or a node on the left-most path in P1. Each node in T will be associated with a data structure, (t). - Initially, (t) is set . - By each call of top-down(<t; T1, ..., Tk>, G), (t) is changed to <i, v>, where <i, v> is the return value of top-down(<t; T1, ..., Tk>, G). ((t) is mainly used in the execution of bottom-up(T’, G) to avoid repeated computation.)

9 Tree inclusion algorithm
Function: top-down(T, G) In top-down(T, G), two cases will be handled. Case 1: G = <P1>; or G = <P1, ..., Pl> (l > 1), but |T|  |P1| + |P2|. In this case, we try to find a pair <i, v> such that T contains the first i subtrees of v, where v = pv , or v  -1(v’) and v’ is the leaf node on the left-most path in P1. T: t G: pv p1 P1 G: pv T: t p1 P2 Pl |T|  |P1| + |P2|. P1

10 Tree inclusion algorithm
Function: top-down(T, G) case 1: i) If t is a leaf node, we will check whether label(t) = label((p1)), where p1 is the root of P1. If it is the case, return <1, parent of (p1)>. Otherwise, return <0, parent of (p1)>. T = <t; T1, ..., Tk>: G: pv t P1 G: pv T = <t; T1, ..., Tk>: t Pl |T|  |P1| + |P2|. P1 P2

11 Tree inclusion algorithm
Function: top-down(T, G) case 1: If |T| < |P1| or height(t) < height(p1), we will make a recursive call top-down(T , <P11, ..., P1j>), where <P11, ..., P1j> be a forest of the subtrees of p1. The return value of top-down(T , <P11, ..., P1j>) is used as the return value of top-down(T, G) pv T: G: t |T| < |P1| p1 P11 P1j P1i Pl

12 Tree inclusion algorithm
Function: top-down(T, G) case 1: If |T|  |P1| (but |T |  |P1| + |P2|) and height(t)  height(p1), two cases need to be considered: label(t) = label(p1). Call bottom-up(<T1, ..., Tk>, <P11, ..., P1j>). p1 P11 P1j P1i t T1 Tk Ti label(t) = label(p1) label(t)  label(p1). Call bottom-up(<T1, ..., Tk>, <P1>). t p1 label(t)  label(p1) T1 Ti Tk P11 P1i P1j

13 Tree inclusion algorithm
Function: top-down(T, G) case 1: In both sub-cases, assume that the return value of the corresponding bottom-up function is <i, v>. A further checking needs to be conducted: If label(t) = label(v) and i = the outdegree of v, the return value should be <1, v’s parent>. Otherwise, the return value is the same as <i, v>. P1: p1 v T: label(t) = label(v) t or label(t)  label(v)

14 Tree inclusion algorithm
Function: top-down(T, G) Case 1: G = <P1>; or G = <P1, ..., Pl> (l > 1), but |T|  |P1| + |P2|. Case 2: G = <P1, ..., Pl> (l > 1), and |T| > |P1| + |P2|. In this case, we will call bottom-up(<T1, ..., Tk>, G). Assume that the return value is <i, v>. The following checkings will be continually conducted. T: t G: pv |T| > |P1| + |P2| Tk Pl T1 T2 P1 P2

15 Tree inclusion algorithm
Function: top-down(T, G) Case 2: G = <P1, ..., Pl> (l > 1), and |T| > |P1| + |P2|. In this case, we will call bottom-up(<T1, ..., Tk>, G). Assume that the return value is <i, v>. The following checkings will be continually conducted. iv) If v = p1’s parent, the return value is the same as <i, v>. v) If v  p1’s parent, check whether label(t) = label(v)) and i = the outdegree of v. If so, the return value will be changed to <1, v’s parent>. Otherwise, the return value remains <i, v>. v = p1’s parent = pv v  p1’s parent G: pv pv Pl v Pl P1 P2 Pi P1 P2

16 Tree inclusion algorithm
Function: bottom-up(T’, G) bottom-up(T’, G) is designed to handle the case that both T’ and G are forests. Let T’ = <T1, ..., Tk> and G = <P1, ..., Pq>. In bottom-up(T’, G), we will make a series of calls top-down(Tl, <Pjl, ..., Pq>), where l = 1, ..., k, j1 = 0, and j1  j2  ...  jh  q (for some h  k), controlled as follows. T’: G: T1 T2 Ti Tk P1 Pi Pq Before each call of top-down(Tl, <Pjl, ..., Pq>), (tl) will be checked to determine whether the call will be conducted. top-down(Tl, <Pjl, ..., Pq>)

17 Tree inclusion algorithm
Function: bottom-up(T’, G) bottom-up(T’, G) is designed to handle the case that both T’ and G are forests. Let T’ = <T1, ..., Tk> and G = <P1, ..., Pq>. In bottom-up(T’, G), we will make a series of calls top-down(Tl, <Pjl, ..., Pq>), where l = 1, ..., k, j1 = 0, and j1  j2  ...  jh  q (for some h  k), controlled as follows. Two index variables l, j are used to scan T1, ..., Tk and P1, ..., Pq, respectively. Let <il, vl> be the return value of top-down(Tl, <Pj, ..., Pq>). If vl = pj’s parent, set j to be j + il - 1. Otherwise, j is not changed. Set l to be l + 1. Go to (2). The loop terminates when all Tl’s or all Pj’s are examined. In the above process, (tl) will be checked before each top-down(Tl, <Pj, ..., Pq>) to determine whether the call will be conducted

18 Tree inclusion algorithm
Function: bottom-up(T’, G) If j > 0 when the loop terminates, bottom-up(T’, G) returns <j, p1’s parent>. T1 T2 Ti Tk P1 Pi Pj Pq

19 Tree inclusion algorithm
Function: bottom-up(T’, G) If j > 0 when the loop terminates, bottom-up(T’, G) returns <j, p1’s parent>. Otherwise, j = 0. In this case, we will continue to search for a pair <i, v> such that T’ contains the first i subtrees of v, where v  -1(v’) and v’ is the leaf node on the left-most path in P1, as described below. Let <i1, v1>, <i2, v2>, ..., <ik, vk> be the respective return values of top-down(T1, <P1, ..., Pq>), top-down(T2, <P1, ..., Pq>), top-down(Tk, <P1, ..., Pq>). Since j = 0, each vl  -1(v’) (l = 1, ..., k). P1 P2 v2 v1 vk

20 Tree inclusion algorithm
Function: bottom-up(T’, G) i) Let <i1, v1>, ..., <ik, vk> be the return values of top-down(T1, <P1, ..., Pq>), ..., top-down(Tk, <P1, ..., Pq>), respectively. Since j = 0, each vl  -1(v’) (l = 1, ..., k). ii) If each il = 0, return <0, >, where  is considered to be a descendant of any node in G. Otherwise, find the first vg with children w1, ..., wh such that vg is not a descendant of any other vj, and ig > 0. P1 T1 T2 Tg Tg+1 Tk vg v1 vk ig

21 Tree inclusion algorithm
Function: bottom-up(T’, G) iii) Check <P[wig+1], ..., P[wh]>) against <Tg+1, ..., Tk>. This can be done by a series of calls top-down(Tl, <P[wjl] ..., P[wh]>), where l = g + 1, …, k, and j1  j2  ...  jh  h (for some h  k), as illustrated below. <Tg+1, ..., Tk>: <P[wig+1], ..., P[wh]>: Tg+1 Tg+2 Tg+i Tk P[wig+1] P[wig+i] P[wh] Let <x, y> be its return value. If y = vg, then the return value of bottom-up(T’, G) is set to be <ig + x, vg>. Otherwise, the return value is <ig, vg>.

22 Tree inclusion algorithm
Further improvements In the case j = 0: Let <i1, v1>, ..., <ik, vk> be the return values of top-down(T1, <P1, ..., Pq>), ..., top-down(Tk, <P1, ..., Pq>). We will find the first vg such that it is not a descendant of any other vj and ig > 0. Then, <Tg+1, ..., Tk> will be checked against <P[wig+1], ..., P[wh]>. This shows that all the return values except <ig, vg> are not used in the subsequent computation. Thus, the work for looking for such values should be avoided. P1 T1 T2 Tg Tg+1 Tk vg v1 vk Any of these subtrees should not be checked against <P1, ..., Pq>).

23 Tree inclusion algorithm
Further improvements Let <ij, vj> be the return value of top-down(Tj, <P1, ..., Pq>) such that ij > 0 and vj is p1 or a descendant of p1. Then, during the execution of top-down(Tj+1, <P1, ..., Pq>), once we have detected that it can only produce a return value <ij+1, vj+1> with vj+1 being a descendant of vj, we should stop the corresponding computation immediately since this return value will not be used in the subsequent searching. For this purpose, we rearrange top-down(Tj+1, <P1, ..., Pq>) to top-down(Tj+1, <P1, ..., Pq>, vj) with vj being used to transfer information, called a controlling-node. Assume that in the execution of top-down(Tj+1, <P1, ..., Pq>, vj), we have the following function calls: top-down(Tj+1, <P1, ..., Pq>, u1) returns <a1, u1>, top-down(Tj+2, <P1, ..., Pq>, u2) returns <a1, u2>, … … with all uj’s being a proper descendant of vj. Then the bottom-up function call with some ui as a controlling node should not be conducted. bottom-up(<Tj+i , ... >, <… …>, ui ).

24 Summary An efficient method for tree inclusion problem
- O|T|leaves(P)|) time and - O(|T| + |P|) space where leaves(P) - set of the leaf nodes of P. Future work - adapt the algorithm to a data stream environment - adapt the algorithm to an indexing environment

25 Thank you.


Download ppt "A Linear-Space Top-down Algorithm for Tree Inclusion Problem"

Similar presentations


Ads by Google