Searching for and Comparing Trees and Graphs


1 Searching for and Comparing Trees and Graphs
Dennis Shasha, Courant Institute, NYU. Joint work with Kaizhong Zhang and Jason Wang.

2 Philosophy Trees and graphs represent data in many domains, including linguistics, chemistry, and perhaps even the web. Question: why can’t I search for trees or graphs at the speed of keyword searches? Why can’t I compare trees (or graphs) as easily as I can compare strings?

3 Tree Searching Given a small tree t, is it present in a bigger tree T?

4 What does “present” mean?
Preserving sibling order or not. Preserving ancestor order. Preserving distance. Mismatches.

5 Sibling Order Does the order of the children of a node matter? [Figure: two trees, each over the labels A, B, and C, compared with “?=”.]

6 Ancestor Order Does the order between children and parent matter? [Figure: two trees, each over the labels A, B, and C, compared with “?=”.]

7 Ancestor Distance Can children become grandchildren? [Figure: two trees over the labels A, B, and C, one of which also contains a node X, compared with “?=”.]

8 Mismatches Can there be relabellings, inserts, and deletes (Tolstoy problem)? [Figure: two trees over the labels A, B, C, and X, with the question “how far?” between them.]

9 Bottom Line There is no one definition of mismatch or subtree (Tolstoy problem). You must choose the package that suits you. I will tell you about three.

10 TreeSearch Query Language
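A query is simply a tree decorated with single-length don’t cares (?) and variable-length don’t cares (*). A ? stands for exactly one node; a * stands for zero or more nodes, on each side. [Figure: a query tree over the labels A, B, C, and D that contains a ? and a *.]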
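The don’t-care semantics can be illustrated on a single root-to-leaf path. The following is a minimal sketch, not the TreeSearch implementation: the function name `path_matches` and the representation of a path as a list of node labels are assumptions made for illustration.

```python
from functools import lru_cache


def path_matches(query, data):
    """Return True if the label sequence `data` matches `query`.

    '?' stands for exactly one node, '*' for zero or more nodes.
    (Hypothetical helper; paths are plain sequences of labels.)
    """
    query, data = tuple(query), tuple(data)

    @lru_cache(maxsize=None)
    def match(i, j):
        # i indexes the query path, j indexes the data path
        if i == len(query):
            return j == len(data)
        if query[i] == "*":
            # '*' absorbs zero nodes (advance the query)
            # or one more node (advance the data)
            return match(i + 1, j) or (j < len(data) and match(i, j + 1))
        if j < len(data) and query[i] in ("?", data[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```

For example, the query path A ? C matches the data path A B C but not A C, while A * C matches both A C and A X Y C.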

11 Exact Match A query matches exactly if it is contained in the data tree, regardless of sibling order or other nodes. [Figure: a query tree containing ? and * shown next to a larger data tree that contains it.]

12 Inexact Match A match is inexact if node labels are missing or differ. Differences higher in the tree cost more. [Figure: a query tree containing ? and * shown next to a larger data tree; the two differ by 1.]

13 Treesearch Conceptual Algorithm
Take all paths in the query tree. Find where each path occurs in the data tree. The distance is then the number of paths that differ. Higher nodes are more important. Implementation: suffix array. A search over several thousand trees takes a few seconds.
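The conceptual algorithm can be sketched as follows. This is an illustration under stated assumptions, not the TreeSearch code: trees are (label, children) pairs, a plain Python set stands in for the suffix array, every missing path costs 1 (the extra weight of higher nodes is not modeled), and don’t cares are left out. All function names are hypothetical.

```python
def root_to_leaf_paths(tree):
    """All root-to-leaf label sequences of a (label, children) tree."""
    label, children = tree
    if not children:
        return [(label,)]
    return [(label,) + p for c in children for p in root_to_leaf_paths(c)]


def downward_paths(tree):
    """All label sequences along downward paths starting at any node."""
    found = set()

    def visit(node, open_paths):
        label, children = node
        # extend every path ending at the parent, and start a new one here
        here = [p + (label,) for p in open_paths] + [(label,)]
        found.update(here)
        for child in children:
            visit(child, here)

    visit(tree, [])
    return found


def path_distance(query, data):
    """Number of query root-to-leaf paths that do not occur in the data tree."""
    index = downward_paths(data)
    return sum(1 for p in root_to_leaf_paths(query) if p not in index)


data = ("X", [("A", [("B", []), ("C", [])]), ("Y", [])])
same = ("A", [("C", []), ("B", [])])    # siblings swapped
other = ("A", [("C", []), ("E", [])])   # one leaf label differs
```

Here `path_distance(same, data)` is 0, because comparing paths ignores sibling order, and `path_distance(other, data)` is 1, because the path A, E is absent from the data tree.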

14 Treesearch Review Ancestor order matters. Sibling order doesn’t.
Don’t cares: * and ?. The distance metric is based on the number of path differences. A sister system built by Divesh and Sihem at Bell Labs allows terms to be “generalized”.

15 Screenshots of TreeSearch
on TreeBASE

16 Query Screen

17 Query Tree Format

18 Search Results

19 Query Tree

20 One of the Result Trees

21 Related Work S. Amer-Yahia, S. Cho, L.V.S. Lakshmanan, and D. Srivastava. Minimization of tree pattern queries. SIGMOD, 2001. Z. Chen, H. V. Jagadish, F. Korn, N. Koudas, S. Muthukrishnan, R. T. Ng, and D. Srivastava. Counting twig matches in a tree. ICDE, 2001. J. Cracraft and M. Donoghue. Assembling the tree of life: Research needs in phylogenetics and phyloinformatics. NSF Workshop Report, Yale University, 2000.

22 Tree Edit The order of children matters. [Figure: a tree rooted at A is transformed into a tree rooted at A’ by the operations A->A’, del(B), and ins(B).]

23 Tree Edit in General The operations are relabel (A->A’), delete (X), and insert (B). [Figure: a tree rooted at A is transformed into a tree rooted at A’ by the operations A->A’, del(X), and ins(B).]

24 Review of Tree Edit Generalizes string edit distance to trees. Computed by a dynamic programming algorithm in O(|T1| |T2| depth(T1) depth(T2)) time. The basis for XMLdiff. Also supports * and best removal of subtrees.
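The three operations can be made concrete with a small sketch. What follows is a naive memoized recursion over forests with unit costs, written only for illustration: it is not the dynamic programming algorithm with the bound quoted above, it does not handle *, and the representation of a tree as a (label, children-tuple) pair is an assumption.

```python
from functools import lru_cache


def tree_edit_distance(t1, t2):
    """Unit-cost edit distance between ordered trees.

    Trees are (label, (child, child, ...)) tuples. Naive memoized
    recursion, suitable for small trees only.
    """

    @lru_cache(maxsize=None)
    def forest_dist(f1, f2):
        if not f1 and not f2:
            return 0
        if not f1:
            # insert the rightmost root of f2; its children take its place
            (_, kids), rest = f2[-1], f2[:-1]
            return forest_dist(f1, rest + kids) + 1
        if not f2:
            # delete the rightmost root of f1; its children take its place
            (_, kids), rest = f1[-1], f1[:-1]
            return forest_dist(rest + kids, f2) + 1
        (l1, k1), r1 = f1[-1], f1[:-1]
        (l2, k2), r2 = f2[-1], f2[:-1]
        return min(
            forest_dist(r1 + k1, f2) + 1,   # delete rightmost root of f1
            forest_dist(f1, r2 + k2) + 1,   # insert rightmost root of f2
            # map the two rightmost roots onto each other (relabel if needed)
            forest_dist(r1, r2) + forest_dist(k1, k2) + (l1 != l2),
        )

    return forest_dist((t1,), (t2,))


swapped_a = ("A", (("B", ()), ("C", ())))
swapped_b = ("A'", (("C", ()), ("B", ())))
with_x = ("A", (("X", (("C", ()), ("D", ()))),))
without_x = ("A", (("C", ()), ("D", ())))
```

With these inputs, `tree_edit_distance(with_x, without_x)` is 1 (delete X), and `tree_edit_distance(swapped_a, swapped_b)` is 3: one relabel of the root plus two operations to reorder the children, since sibling order matters here.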

25 Related Work IBM XML Diff and Merge Tool. K. Zhang and D. Shasha. Editing distance between trees. SIAM J. Comp., 1989. K. Zhang, D. Shasha and J. T. L. Wang. Approximate tree matching in the presence of variable length don't cares. Journal of Algorithms, 1994.

26 Graph Edit Thesis work of Rosalba Giugno.
Find a small graph (with * and ?) in a big graph. It is slow when the query graph is big, because the known algorithms for subgraph isomorphism take exponential time in the worst case.
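To see where the exponential cost comes from, consider a plain backtracking search for a query graph inside a data graph. This is a generic textbook-style sketch, not GraphGrep’s method: graphs are undirected, node labels are given as dicts, a query label of "?" matches any single node, and variable-length * is not handled. The function name and argument layout are assumptions.

```python
def find_subgraph(q_labels, q_edges, g_labels, g_edges):
    """Return one injective mapping query node -> data node, or None.

    Every query edge must map onto a data edge. In the worst case the
    search tries exponentially many partial mappings.
    """
    g_adj = {v: set() for v in g_labels}
    for u, v in g_edges:
        g_adj[u].add(v)
        g_adj[v].add(u)
    q_adj = {v: set() for v in q_labels}
    for u, v in q_edges:
        q_adj[u].add(v)
        q_adj[v].add(u)
    order = list(q_labels)  # query nodes are assigned in this fixed order

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        q = order[len(mapping)]
        for g in g_labels:
            if g in mapping.values():
                continue  # keep the mapping injective
            if q_labels[q] not in ("?", g_labels[g]):
                continue  # labels must agree unless the query says '?'
            # every already-mapped neighbour of q must be adjacent to g
            if all(mapping[n] in g_adj[g] for n in q_adj[q] if n in mapping):
                mapping[q] = g
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]  # backtrack
        return None

    return extend({})


g_labels = {1: "A", 2: "B", 3: "C", 4: "D"}
g_edges = [(1, 2), (2, 3), (3, 4)]
hit = find_subgraph({"a": "A", "b": "?", "c": "C"},
                    [("a", "b"), ("b", "c")], g_labels, g_edges)
miss = find_subgraph({"a": "A", "c": "C"}, [("a", "c")], g_labels, g_edges)
```

In this example `hit` maps a, b, c to data nodes 1, 2, 3, while `miss` is None because the data graph has no edge directly joining an A node to a C node.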

27 Example of GraphGrep The query graph has nodes and don’t cares. [Figure: a query graph over the labels A, B, C, and D that contains a *.]

28 Related Work P. Buneman, M. F. Fernandez, and D. Suciu. UnQL: a query language and algebra for semistructured data based on structural recursion. VLDB Journal, 2000. A. O. Mendelzon and P. T. Wood. Finding regular simple paths in graph databases. VLDB, 1989.  Daylight Chemical Information Systems. Protein Structure Search. Web Structure Search.

29 Summary of Tools Why can’t tree and graph search be like keyword search? We are getting there and will provide software if you are interested. About 50 downloads so far.

30 URLs for Tools http://www.cs.nyu.edu/shasha/papers/graphgrep

