A Linear-Space Top-down Algorithm for Tree Inclusion Problem

Slides:



Advertisements
Similar presentations
Divide and Conquer. Subject Series-Parallel Digraphs Planarity testing.
Advertisements

Binary Searching.
May 2006CLINT-LN Parsing1 Computational Linguistics Introduction Approaches to Parsing.
Transitive Closure Compression Jan. 2013Yangjun Chen ACS Outline: Transitive Closure Compression Motivation DAG decomposition into node-disjoint.
Binary Trees, Binary Search Trees CMPS 2133 Spring 2008.
Binary Trees, Binary Search Trees COMP171 Fall 2006.
B + -Trees Sept. 2012Yangjun Chen ACS B + -Tree Construction and Record Searching in Relational DBs Chapter 6 – 3rd (Chap. 14 – 4 th, 5 th ed.; Chap.
Jan. 2013Dr. Yangjun Chen ACS Outline Signature Files - Signature for attribute values - Signature for records - Searching a signature file Signature.
Greedy method for inferring tandem duplication history Louxin Zhang, Bin Ma, Lusheng Wang and Ying Xu. BIOINFORMATICS 2003 reference: 1.Elemento,O.,(2002)
Implementation of Graph Decomposition and Recursive Closures Graph Decomposition and Recursive Closures was published in 2003 by Professor Chen. The project.
Data Structures: Trees i206 Fall 2010 John Chuang Some slides adapted from Marti Hearst, Brian Hayes, or Glenn Brookshear.
1 B trees Nodes have more than 2 children Each internal node has between k and 2k children and between k-1 and 2k-1 keys A leaf has between k-1 and 2k-1.
An Efficient Algorithm for Answering Graph Reachability Queries Yangjun Chen, Yibin Chen Dept. Applied Computer Science, University of Winnipeg 515 Portage.
1 Trees. 2 Outline –Tree Structures –Tree Node Level and Path Length –Binary Tree Definition –Binary Tree Nodes –Binary Search Trees.
Recursive Graph Deduction and Reachability Queries Yangjun Chen Dept. Applied Computer Science, University of Winnipeg 515 Portage Ave. Winnipeg, Manitoba,
Constructing Signature Graphs for Signature Files Dr. Yangjun Chen Dept. Applied Computer Science University of Winnipeg Canada.
CHAPTER 12 Trees. 2 Tree Definition A tree is a non-linear structure, consisting of nodes and links Links: The links are represented by ordered pairs.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
CSC 2300 Data Structures & Algorithms February 6, 2007 Chapter 4. Trees.
Binary Trees Chapter 6.
CHAPTER 71 TREE. Binary Tree A binary tree T is a finite set of one or more nodes such that: (a) T is empty or (b) There is a specially designated node.
COSC2007 Data Structures II
Introduction Of Tree. Introduction A tree is a non-linear data structure in which items are arranged in sequence. It is used to represent hierarchical.
Lecture 10 Trees –Definiton of trees –Uses of trees –Operations on a tree.
Multiway Trees. Trees with possibly more than two branches at each node are know as Multiway trees. 1. Orchards, Trees, and Binary Trees 2. Lexicographic.
Binary Trees. Binary Tree Finite (possibly empty) collection of elements A nonempty binary tree has a root element The remaining elements (if any) are.
COMP20010: Algorithms and Imperative Programming Lecture 1 Trees.
Module #19: Graph Theory: part II Rosen 5 th ed., chs. 8-9.
DATA STRUCTURE Presented By: Mahmoud Rafeek Alfarra Using C# MINISTRY OF EDUCATION & HIGHER EDUCATION COLLEGE OF SCIENCE AND TECHNOLOGY KHANYOUNIS- PALESTINE.
M180: Data Structures & Algorithms in Java Trees & Binary Trees Arab Open University 1.
24 January Trees CSE 2011 Winter Trees Linear access time of linked lists is prohibitive  Does there exist any simple data structure for.
A New Top-down Algorithm for Tree Inclusion Dr. Yangjun Chen Dept. Applied Computer Science, University of Winnipeg 515 Portage Ave. Winnipeg, Manitoba,
BINARY TREES Objectives Define trees as data structures Define the terms associated with trees Discuss tree traversal algorithms Discuss a binary.
Chapter 6 – Trees. Notice that in a tree, there is exactly one path from the root to each node.
Chapter 11. Chapter Summary  Introduction to trees (11.1)  Application of trees (11.2)  Tree traversal (11.3)  Spanning trees (11.4)
On the Intersection of Inverted Lists Yangjun Chen and Weixin Shen Dept. Applied Computer Science, University of Winnipeg 515 Portage Ave. Winnipeg, Manitoba,
1 Trees. 2 Trees Trees. Binary Trees Tree Traversal.
CSCE 210 Data Structures and Algorithms
File Organization and Processing Week 3
Lecture 22 Binary Search Trees Chapter 10 of textbook
B+ Tree.
Advanced Algorithms Analysis and Design
Trees Trees are a very useful data structure. Many different kinds of trees are used in Computer Science. We shall study just a few of these.
i206: Lecture 13: Recursion, continued Trees
Binary Trees, Binary Search Trees
Chapter 20: Binary Trees.
(edited by Nadia Al-Ghreimil)
Interval Heaps Complete binary tree.
Graph Algorithms Using Depth First Search
Chapter 21: Binary Trees.
Red-Black Trees Bottom-Up Deletion.
Outline: Transitive Closure Compression
(2,4) Trees (2,4) Trees 1 (2,4) Trees (2,4) Trees
Multi-Way Search Trees
(2,4) Trees /26/2018 3:48 PM (2,4) Trees (2,4) Trees
Week nine-ten: Trees Trees.
Data Structures: Trees and Binary Trees
Red Black Trees Top-Down Deletion.
(2,4) Trees /24/2019 7:30 PM (2,4) Trees (2,4) Trees
Binary Trees, Binary Search Trees
(edited by Nadia Al-Ghreimil)
CE 221 Data Structures and Algorithms
CS210- Lecture 9 June 20, 2005 Announcements
On the Graph Decomposition
(2,4) Trees /6/ :26 AM (2,4) Trees (2,4) Trees
Red Black Trees Top-Down Deletion.
Trees Trees are a very useful data structure. Many different kinds of trees are used in Computer Science. We shall study just a few of these.
Binary Trees, Binary Search Trees
Binary Search Trees < > = Dictionaries
Cs212: Data Structures Lecture 7: Tree_Part1
Presentation transcript:

A Linear-Space Top-down Algorithm for Tree Inclusion Problem Prof. Dr. Yangjun Chen Dept. Applied Computer Science, University of Winnipeg 515 Portage Ave. Winnipeg, Manitoba, Canada R3B 2E9

Outline Motivation Basic algorithm for tree inclusion problem - Definition - Algorithm description Improvements Summary

Motivation Given two ordered labeled trees P and T, called the pattern and the target, respectively. An interesting problem is: Can we obtain pattern P by deleting some nodes from target T? That is, is there a sequence v1 , ..., vk of nodes such that for T0 = T and Ti+1 = delete(Ti, vi +1) for i = 0, ..., k - 1, we have Tk = P. If this is the case, we say, P is included in T, T contains P, or say, T covers P. a b d e f T: c delete(T, c) b d e f T: a By “ordered trees”, we mean those trees, in which the order of siblings is significant.

Motivation Linguistic analysis s vp v n adv “reads” “book” s np vp det “The” “student” “reads” adj “the” “interesting” “book” “again and again”

Tree inclusion algorithm Definition Definition 1 Let F and G be labeled ordered forests. We define an ordered embedding (, G, F) as an injective function  : V(G)  V(F) such that for all nodes v, u  V(G), i) label(v) = label((v)); (label preservation condition) ii) v is an ancestor of u iff (v) is an ancestor of (u);(ancestor condition) iii) v is to the left of u iff (v) is to the left of (u); (Sibling condition) G: F: a a b b d b e b b By “ordered trees”, we mean those trees, in which the order of siblings is significant.

Tree inclusion algorithm Let T = <t; T1, ..., Tk> (k  1) be a tree and G = <P1 , ..., Pl> (l  1) be a forest. We handle G as a tree P = <pv; P1, ..., Pl>, where pv represents a virtual node, matching any node in T. Consider a node v in P with children v1, ..., vj. We use a pair <i, v> (i  j) to represent an ordered forest containing the first i subtrees of v: <P[v1], ..., P[vi]>. Then, <j, pv> represents the first j trees in G. P: v v1 vi vk … … <i, v> referred to as a left corner

Tree inclusion algorithm In addition, h(v) represents the height of v in a tree; and (v) represents a link from v in P to the leaf node on the left-most path in P[v]. P: (v1) Let v’ be a leaf node in P. We denote by -1(v’) a set of nodes x such that for each v  x (v) = v’. -1(v3) = {v1, v2, v3} v1 (v2) v2 v5 v3 v4

Tree inclusion algorithm The tree inclusion checking is done by calling two functions recursively: top-down(T, G), bottom-up(T’, G), where T is a tree, and T’ and G are two forests. G: … P1 P2 pv Pl v T = <t; T1, ..., Tk> T’ = <T1’, ..., Tk’> G = <P1, ..., PL> Each of the two functions returns a pair <i, v> with v being pv or a node on the left-most path in P1. Each node in T will be associated with a data structure, (t). - Initially, (t) is set . - By each call of top-down(<t; T1, ..., Tk>, G), (t) is changed to <i, v>, where <i, v> is the return value of top-down(<t; T1, ..., Tk>, G). ((t) is mainly used in the execution of bottom-up(T’, G) to avoid repeated computation.)

Tree inclusion algorithm Function: top-down(T, G) In top-down(T, G), two cases will be handled. Case 1: G = <P1>; or G = <P1, ..., Pl> (l > 1), but |T|  |P1| + |P2|. In this case, we try to find a pair <i, v> such that T contains the first i subtrees of v, where v = pv , or v  -1(v’) and v’ is the leaf node on the left-most path in P1. T: t G: pv p1 P1 G: pv T: t p1 P2 … Pl |T|  |P1| + |P2|. P1

Tree inclusion algorithm Function: top-down(T, G) case 1: i) If t is a leaf node, we will check whether label(t) = label((p1)), where p1 is the root of P1. If it is the case, return <1, parent of (p1)>. Otherwise, return <0, parent of (p1)>. T = <t; T1, ..., Tk>: G: pv t P1 G: pv T = <t; T1, ..., Tk>: t … Pl |T|  |P1| + |P2|. P1 P2

Tree inclusion algorithm Function: top-down(T, G) case 1: If |T| < |P1| or height(t) < height(p1), we will make a recursive call top-down(T , <P11, ..., P1j>), where <P11, ..., P1j> be a forest of the subtrees of p1. The return value of top-down(T , <P11, ..., P1j>) is used as the return value of top-down(T, G) pv T: G: t |T| < |P1| p1 … P11 P1j P1i … Pl

Tree inclusion algorithm Function: top-down(T, G) case 1: If |T|  |P1| (but |T |  |P1| + |P2|) and height(t)  height(p1), two cases need to be considered: label(t) = label(p1). Call bottom-up(<T1, ..., Tk>, <P11, ..., P1j>). p1 … P11 P1j P1i t T1 Tk Ti label(t) = label(p1) label(t)  label(p1). Call bottom-up(<T1, ..., Tk>, <P1>). t p1 label(t)  label(p1) T1 Ti Tk P11 P1i P1j … … … …

Tree inclusion algorithm Function: top-down(T, G) case 1: In both sub-cases, assume that the return value of the corresponding bottom-up function is <i, v>. A further checking needs to be conducted: If label(t) = label(v) and i = the outdegree of v, the return value should be <1, v’s parent>. Otherwise, the return value is the same as <i, v>. P1: p1 v T: label(t) = label(v) t or label(t)  label(v)

Tree inclusion algorithm Function: top-down(T, G) Case 1: G = <P1>; or G = <P1, ..., Pl> (l > 1), but |T|  |P1| + |P2|. Case 2: G = <P1, ..., Pl> (l > 1), and |T| > |P1| + |P2|. In this case, we will call bottom-up(<T1, ..., Tk>, G). Assume that the return value is <i, v>. The following checkings will be continually conducted. T: t G: pv |T| > |P1| + |P2| Tk … Pl … T1 T2 P1 P2

Tree inclusion algorithm Function: top-down(T, G) Case 2: G = <P1, ..., Pl> (l > 1), and |T| > |P1| + |P2|. In this case, we will call bottom-up(<T1, ..., Tk>, G). Assume that the return value is <i, v>. The following checkings will be continually conducted. iv) If v = p1’s parent, the return value is the same as <i, v>. v) If v  p1’s parent, check whether label(t) = label(v)) and i = the outdegree of v. If so, the return value will be changed to <1, v’s parent>. Otherwise, the return value remains <i, v>. v = p1’s parent = pv v  p1’s parent G: pv pv … … Pl v … Pl P1 P2 Pi P1 P2

Tree inclusion algorithm Function: bottom-up(T’, G) bottom-up(T’, G) is designed to handle the case that both T’ and G are forests. Let T’ = <T1, ..., Tk> and G = <P1, ..., Pq>. In bottom-up(T’, G), we will make a series of calls top-down(Tl, <Pjl, ..., Pq>), where l = 1, ..., k, j1 = 0, and j1  j2  ...  jh  q (for some h  k), controlled as follows. T’: G: T1 T2 Ti Tk P1 Pi Pq … … … … Before each call of top-down(Tl, <Pjl, ..., Pq>), (tl) will be checked to determine whether the call will be conducted. top-down(Tl, <Pjl, ..., Pq>)

Tree inclusion algorithm Function: bottom-up(T’, G) bottom-up(T’, G) is designed to handle the case that both T’ and G are forests. Let T’ = <T1, ..., Tk> and G = <P1, ..., Pq>. In bottom-up(T’, G), we will make a series of calls top-down(Tl, <Pjl, ..., Pq>), where l = 1, ..., k, j1 = 0, and j1  j2  ...  jh  q (for some h  k), controlled as follows. Two index variables l, j are used to scan T1, ..., Tk and P1, ..., Pq, respectively. Let <il, vl> be the return value of top-down(Tl, <Pj, ..., Pq>). If vl = pj’s parent, set j to be j + il - 1. Otherwise, j is not changed. Set l to be l + 1. Go to (2). The loop terminates when all Tl’s or all Pj’s are examined. In the above process, (tl) will be checked before each top-down(Tl, <Pj, ..., Pq>) to determine whether the call will be conducted

Tree inclusion algorithm Function: bottom-up(T’, G) If j > 0 when the loop terminates, bottom-up(T’, G) returns <j, p1’s parent>. T1 T2 Ti Tk P1 Pi Pj Pq … … … …

Tree inclusion algorithm Function: bottom-up(T’, G) If j > 0 when the loop terminates, bottom-up(T’, G) returns <j, p1’s parent>. Otherwise, j = 0. In this case, we will continue to search for a pair <i, v> such that T’ contains the first i subtrees of v, where v  -1(v’) and v’ is the leaf node on the left-most path in P1, as described below. Let <i1, v1>, <i2, v2>, ..., <ik, vk> be the respective return values of top-down(T1, <P1, ..., Pq>), top-down(T2, <P1, ..., Pq>), ... ... top-down(Tk, <P1, ..., Pq>). Since j = 0, each vl  -1(v’) (l = 1, ..., k). P1 P2 v2 … v1 … vk

Tree inclusion algorithm Function: bottom-up(T’, G) i) Let <i1, v1>, ..., <ik, vk> be the return values of top-down(T1, <P1, ..., Pq>), ..., top-down(Tk, <P1, ..., Pq>), respectively. Since j = 0, each vl  -1(v’) (l = 1, ..., k). ii) If each il = 0, return <0, >, where  is considered to be a descendant of any node in G. Otherwise, find the first vg with children w1, ..., wh such that vg is not a descendant of any other vj, and ig > 0. P1 T1 T2 Tg Tg+1 Tk vg … … … … v1 vk ig

Tree inclusion algorithm Function: bottom-up(T’, G) iii) Check <P[wig+1], ..., P[wh]>) against <Tg+1, ..., Tk>. This can be done by a series of calls top-down(Tl, <P[wjl] ..., P[wh]>), where l = g + 1, …, k, and j1  j2  ...  jh  h (for some h  k), as illustrated below. <Tg+1, ..., Tk>: <P[wig+1], ..., P[wh]>: Tg+1 Tg+2 Tg+i Tk P[wig+1] P[wig+i] P[wh] … … … … Let <x, y> be its return value. If y = vg, then the return value of bottom-up(T’, G) is set to be <ig + x, vg>. Otherwise, the return value is <ig, vg>.

Tree inclusion algorithm Further improvements In the case j = 0: Let <i1, v1>, ..., <ik, vk> be the return values of top-down(T1, <P1, ..., Pq>), ..., top-down(Tk, <P1, ..., Pq>). We will find the first vg such that it is not a descendant of any other vj and ig > 0. Then, <Tg+1, ..., Tk> will be checked against <P[wig+1], ..., P[wh]>. This shows that all the return values except <ig, vg> are not used in the subsequent computation. Thus, the work for looking for such values should be avoided. P1 T1 T2 Tg Tg+1 Tk vg … … … … v1 vk Any of these subtrees should not be checked against <P1, ..., Pq>).

Tree inclusion algorithm Further improvements Let <ij, vj> be the return value of top-down(Tj, <P1, ..., Pq>) such that ij > 0 and vj is p1 or a descendant of p1. Then, during the execution of top-down(Tj+1, <P1, ..., Pq>), once we have detected that it can only produce a return value <ij+1, vj+1> with vj+1 being a descendant of vj, we should stop the corresponding computation immediately since this return value will not be used in the subsequent searching. For this purpose, we rearrange top-down(Tj+1, <P1, ..., Pq>) to top-down(Tj+1, <P1, ..., Pq>, vj) with vj being used to transfer information, called a controlling-node. Assume that in the execution of top-down(Tj+1, <P1, ..., Pq>, vj), we have the following function calls: top-down(Tj+1, <P1, ..., Pq>, u1) returns <a1, u1>, top-down(Tj+2, <P1, ..., Pq>, u2) returns <a1, u2>, … … with all uj’s being a proper descendant of vj. Then the bottom-up function call with some ui as a controlling node should not be conducted. bottom-up(<Tj+i , ... >, <… …>, ui ).

Summary An efficient method for tree inclusion problem - O|T|leaves(P)|) time and - O(|T| + |P|) space where leaves(P) - set of the leaf nodes of P. Future work - adapt the algorithm to a data stream environment - adapt the algorithm to an indexing environment

Thank you.