Published by Christopher West. Modified over 9 years ago.
1
Requests to Tsong-Li: 1. Related work at end of each section. 2. Screen dumps of treebase at end of treesearch section (you’ll see where). 3. Web addresses at the very end.
2
Searching for and Comparing Trees and Graphs. Dennis Shasha, shasha@cs.nyu.edu, Courant Institute, NYU. Joint work with Kaizhong Zhang and Jason Wang.
3
Philosophy: Trees and graphs represent data in many domains: linguistics, chemistry, and maybe even the web. Question: why can’t I search for trees or graphs at the speed of keyword searches? Why can’t I compare trees (or graphs) as easily as I can compare strings?
4
Tree Searching: Given a small tree t, is it present in a bigger tree T?
5
What does “present” mean? Preserving sibling order or not. Preserving ancestor order. Preserving distance. Mismatches.
6
Sibling Order: Order of the children of a node. A(B, C) ?= A(C, B)
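To make the sibling-order question concrete, here is a minimal sketch. It assumes a tree is written as a `(label, [children])` pair; that representation and the function names are illustrative and do not come from the talk. Sorting the children recursively gives a canonical form, so two trees that differ only in sibling order compare equal.

```python
def canonical(tree):
    """Return a form of the tree in which sibling order no longer matters."""
    label, children = tree
    # Sort the canonical forms of the children so any sibling order gives the same result.
    return (label, tuple(sorted(canonical(child) for child in children)))


def same_ordered(t1, t2):
    """Equality when sibling order matters."""
    return t1 == t2


def same_unordered(t1, t2):
    """Equality when sibling order is ignored."""
    return canonical(t1) == canonical(t2)


# A with children B, C versus A with children C, B.
left = ("A", [("B", []), ("C", [])])
right = ("A", [("C", []), ("B", [])])
```

With these two trees, `same_ordered(left, right)` is false and `same_unordered(left, right)` is true, which is exactly the choice a search package has to make.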
7
Ancestor Order: Order between children and parent. A B C ?= A C B
8
Ancestor Distance: Can children become grandchildren? A(B, C) ?= A(B, X(C))
9
Mismatches: Can there be relabellings, inserts, and deletes (Tolstoy problem)? A B C versus A X C: how far?
10
Bottom Line: There is no one definition of mismatch or subtree (Tolstoy problem). You must choose the package that suits you. I will tell you about three.
11
TreeSearch Query Language: The query language is simply a tree decorated with single-length don’t cares (?) and variable-length don’t cares (*). Diagram labels: A * B C ? D, annotated “>= 0, on each side” and “= 1”.
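A small sketch of how the two don’t cares could behave on a single root-to-leaf path. It assumes that `?` stands for exactly one node and `*` for zero or more nodes, and that the whole path must be matched; the function name `path_matches` is hypothetical and this is not the TreeSearch implementation.

```python
def path_matches(pattern, path):
    """Check whether a list of labels matches a pattern containing '?' and '*'."""
    if not pattern:
        return not path
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Variable-length don't care: skip zero or more labels of the path.
        return any(path_matches(rest, path[i:]) for i in range(len(path) + 1))
    if not path:
        return False
    if head == "?" or head == path[0]:
        # Single-length don't care, or an exact label match: consume one label.
        return path_matches(rest, path[1:])
    return False
```

For example, `path_matches(["A", "*", "B"], ["A", "X", "Y", "B"])` holds because `*` absorbs X and Y, while `path_matches(["C", "?", "D"], ["C", "D"])` fails because `?` must consume exactly one node.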
12
Exact Match: The query matches exactly if it is contained in the data tree, regardless of sibling order or other nodes. Diagram labels: query A * B C ? D = data X Y A W Z C B X Q D U.
13
Inexact Match: The match is inexact if node labels are missing or differ. Higher differences cost more. Diagram labels: query A * B C ? D differs by 1 from data X Y A W Z C B X Q E U.
14
Treesearch Conceptual Algorithm: Take all paths in the query tree. Find where each path occurs in the data tree. The notion of distance is then the number of paths that differ. Higher nodes are more important. Implementation: suffix array. Runs in a few seconds on several thousand trees.
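The path idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: trees are `(label, [children])` pairs, there are no don’t cares, every path counts the same (the extra weight of higher nodes is left out), and a plain scan replaces the suffix array. The function names are hypothetical.

```python
def root_to_leaf_paths(tree):
    """List every root-to-leaf path of a tree as a tuple of labels."""
    label, children = tree
    if not children:
        return [(label,)]
    return [(label,) + path for child in children for path in root_to_leaf_paths(child)]


def occurs(path, data_paths):
    """True if the path appears as a contiguous downward segment of some data path."""
    n = len(path)
    return any(
        data_path[i:i + n] == path
        for data_path in data_paths
        for i in range(len(data_path) - n + 1)
    )


def path_distance(query, data):
    """Count the query paths that are not found anywhere in the data tree."""
    data_paths = root_to_leaf_paths(data)
    return sum(1 for path in root_to_leaf_paths(query) if not occurs(path, data_paths))


query = ("A", [("B", []), ("C", [("D", [])])])
data_exact = ("X", [("A", [("C", [("D", [])]), ("B", [])])])
data_inexact = ("X", [("A", [("C", [("E", [])]), ("B", [])])])
```

Here `path_distance(query, data_exact)` is 0 even though the siblings are swapped, and `path_distance(query, data_inexact)` is 1 because the path A, C, D is missing. Note that this sketch checks each path independently, so it does not verify that all matched paths hang off the same occurrence of A.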
15
Treesearch Review: Ancestor order matters. Sibling order doesn’t. Don’t cares: * and ?. The distance metric is based on the number of path differences. A sister system built by Divesh and Sihem at Bell Labs allows terms to be “generalized”.
16
Tsong-Li: screen dumps of treebase then related work
17
Tree Edit: Order of children matters. A(B, C) to A’(C, B): A->A’, del(B), ins(B)
18
Tree Edit in General: Operations are relabel (A->A’), delete (X), and insert (B). Diagram labels: A X C; A’ C B; A->A’ del(B) ins(B); C C.
19
Review of Tree Edit: Generalizes string edit distance to trees and is computed by a dynamic programming algorithm. Running time: O(|T1| |T2| depth(T1) depth(T2)). The basis for XMLdiff. Also has * and best removal of subtrees.
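As a rough illustration of how string edit distance carries over to ordered trees, here is a memoized recursion over forests with unit costs for relabel, delete, and insert. It assumes trees are nested tuples `(label, (children...))`, and it is a plain textbook-style recursion, not the dynamic program with the running time quoted above; the names are illustrative.

```python
from functools import lru_cache


def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)  # insert everything in g
    if not g:
        return size(f)  # delete everything in f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    f_rest, g_rest = f[:-1], g[:-1]
    return min(
        # Delete the rightmost root v of f; its children move up in its place.
        forest_dist(f_rest + v_kids, g) + 1,
        # Insert the rightmost root w of g.
        forest_dist(f, g_rest + w_kids) + 1,
        # Map v to w (relabel if the labels differ), then solve both parts separately.
        forest_dist(v_kids, w_kids) + forest_dist(f_rest, g_rest) + (v_label != w_label),
    )


def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))


t1 = ("A", (("B", ()), ("C", ())))
t2 = ("A'", (("C", ()), ("B", ())))
```

For these two trees `tree_edit_distance(t1, t2)` is 3, which agrees with the script A->A’, del(B), ins(B): because sibling order matters, only one of B and C can stay in place.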
20
Tsong-Li: related work here
21
Graph Edit: Thesis work of Rosalba Giugno. Find a small graph (with * and ?) in a big graph. It is not fast if the query graph is big, because subgraph isomorphism is NP-complete and known algorithms take exponential time in the worst case.
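To show where the exponential cost comes from, here is a naive backtracking search for a small labeled graph inside a bigger one. It is a sketch under stated assumptions, not the GraphGrep method: graphs are undirected, given as a dict of node labels plus a list of edges, only the `?` label wildcard is handled (no `*`), and extra edges in the data graph are allowed. The function name is hypothetical.

```python
def find_embedding(q_labels, q_edges, g_labels, g_edges):
    """Return one mapping from query nodes to data nodes, or None if there is none."""
    g_adj = {frozenset(edge) for edge in g_edges}
    q_nodes = list(q_labels)

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            return dict(mapping)
        q = q_nodes[len(mapping)]
        for g in g_labels:
            if g in mapping.values():
                continue  # each data node is used at most once
            if q_labels[q] != "?" and q_labels[q] != g_labels[g]:
                continue  # labels must agree unless the query node is a don't care
            # Every query edge from q to an already mapped node must exist in the data graph.
            consistent = all(
                frozenset((g, mapping[other])) in g_adj
                for a, b in q_edges
                for this, other in ((a, b), (b, a))
                if this == q and other in mapping
            )
            if consistent:
                mapping[q] = g
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]  # backtrack and try the next candidate
        return None

    return extend({})


query_labels = {0: "A", 1: "B", 2: "?"}
query_edges = [(0, 1), (1, 2)]
data_labels = {"a": "A", "b": "B", "c": "C", "d": "D"}
data_edges = [("a", "b"), ("b", "c"), ("c", "d")]
```

On this example `find_embedding(query_labels, query_edges, data_labels, data_edges)` returns `{0: "a", 1: "b", 2: "c"}`. In the worst case the search tries every assignment of query nodes to data nodes, which is why a big query graph is slow.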
22
Example of GraphGrep: The query graph has nodes and don’t cares. Diagram labels: A B * D C.
23
Summary of Tools: Why can’t tree and graph search be like keyword search? We are getting there and will provide software if you are interested. About 50 downloads so far.