Searching for and Comparing Trees and Graphs

Searching for and Comparing Trees and Graphs. Dennis Shasha (shasha@cs.nyu.edu), Courant Institute, NYU. Joint work with Kaizhong Zhang and Jason Wang.

Philosophy: Trees and graphs represent data in many domains, including linguistics, chemistry, and perhaps even the web. Question: why can’t I search for trees or graphs at the speed of keyword searches? Why can’t I compare trees (or graphs) as easily as I can compare strings?

Tree Searching: Given a small tree t, is it present in a bigger tree T?

What does “present” mean? Preserving sibling order or not; preserving ancestor order; preserving ancestor distance; allowing mismatches.

Sibling Order: Does the order of the children of a node matter? [Figure: two trees rooted at A, each with children B and C, compared by "?=".]

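Ancestor Order: Does the order between children and parent matter? [Figure: two trees over the nodes A, B, and C with different parent-child arrangements, compared by "?=".]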

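Ancestor Distance: Can children become grandchildren? [Figure: a tree rooted at A with nodes B and C, compared by "?=" with a tree rooted at A that also contains a node X.]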

Mismatches: Can there be relabelings, inserts, and deletes (the Tolstoy problem)? [Figure: two trees rooted at A, over the labels B, C, and X, with the question "how far?" between them.]

Bottom Line: There is no single definition of mismatch or subtree (the Tolstoy problem). You must choose the package that suits you. I will tell you about three.

TreeSearch Query Language: A query is simply a tree decorated with single-length don’t cares (?) and variable-length don’t cares (*). [Figure: a query tree over the nodes A, B, C, and D containing a ? annotated "=1" and a * annotated ">= 0, on each side".]
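As a rough illustration of the two don't-care symbols, the sketch below matches a single query path against a single label path. It assumes that "?" stands for exactly one node and "*" for zero or more nodes, as the figure annotations suggest; the function name `path_matches` and the list-of-labels representation are hypothetical and are not taken from TreeSearch itself.

```python
from functools import lru_cache


def path_matches(pattern, path):
    """Return True if a label path matches a query path.

    Assumed semantics (not confirmed by the slides): "?" stands for exactly
    one node, "*" for zero or more nodes, and any other symbol must equal
    the node label at that position.
    """
    pattern, path = tuple(pattern), tuple(path)

    @lru_cache(maxsize=None)
    def match(i, j):
        # i indexes the pattern, j indexes the path.
        if i == len(pattern):
            return j == len(path)
        if pattern[i] == "*":
            # "*" absorbs nothing, or absorbs one node and stays active.
            return match(i + 1, j) or (j < len(path) and match(i, j + 1))
        if j == len(path):
            return False
        if pattern[i] == "?" or pattern[i] == path[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


if __name__ == "__main__":
    print(path_matches(["A", "*", "C"], ["A", "X", "Y", "C"]))
    print(path_matches(["A", "?", "C"], ["A", "C"]))
```

The memoized recursion keeps the check polynomial in the lengths of the pattern and the path, even when several "*" symbols appear.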

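Exact Match: A query matches exactly if it is contained in the data tree, regardless of sibling order or other nodes. [Figure: a query tree containing ? and * compared by "=" with a larger data tree that contains additional nodes.]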

Inexact Match: A match is inexact if node labels are missing or differ. Differences higher in the tree cost more. [Figure: a query tree containing ? and * compared with a data tree in which one label differs; the trees "differ by 1".]

Treesearch Conceptual Algorithm: Take all paths in the query tree. Find where each path occurs in the data tree. The distance is then the number of query paths that differ. Higher nodes are more important. Implementation: suffix array. Runs in a few seconds on several thousand trees.
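The conceptual algorithm can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the TreeSearch implementation: trees are hypothetical `(label, children)` pairs, the data paths are held in a plain set rather than a suffix array, every path counts equally (no extra weight for higher nodes), don't cares are not handled, and each query path is looked up independently, so two paths may match at different places in the data tree.

```python
def root_to_leaf_paths(tree):
    """Yield every root-to-leaf label path of a (label, children) tree."""
    label, children = tree
    if not children:
        yield (label,)
    for child in children:
        for rest in root_to_leaf_paths(child):
            yield (label,) + rest


def all_downward_paths(tree):
    """Collect every downward label path that starts at any node of the tree."""
    paths = set()

    def visit(node):
        # Every prefix of a root-to-leaf path of this subtree is a downward path.
        for path in root_to_leaf_paths(node):
            for k in range(1, len(path) + 1):
                paths.add(path[:k])
        for child in node[1]:
            visit(child)

    visit(tree)
    return paths


def path_distance(query, data):
    """Count the query's root-to-leaf paths that do not occur in the data tree."""
    data_paths = all_downward_paths(data)
    return sum(1 for path in root_to_leaf_paths(query) if path not in data_paths)


if __name__ == "__main__":
    data = ("X", [("A", [("B", []), ("C", [("D", [])])]), ("Y", [])])
    query = ("A", [("C", [("D", [])]), ("B", [])])
    print(path_distance(query, data))
```

Because only paths are compared, sibling order is ignored while ancestor order is preserved, which is the behavior the slides describe.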

Treesearch Review: Ancestor order matters; sibling order doesn’t. Don’t cares: * and ?. The distance metric is based on the number of path differences. A sister system built by Divesh and Sihem at Bell Labs allows terms to be “generalized”.

Screenshots of TreeSearch on TreeBASE

Query Screen

Query Tree Format

Search Results

Query Tree

One of the Result Trees

Related Work:
S. Amer-Yahia, S. Cho, L.V.S. Lakshmanan, and D. Srivastava. Minimization of tree pattern queries. SIGMOD, 2001.
Z. Chen, H. V. Jagadish, F. Korn, N. Koudas, S. Muthukrishnan, R. T. Ng, and D. Srivastava. Counting twig matches in a tree. ICDE, 2001.
J. Cracraft and M. Donoghue. Assembling the tree of life: Research needs in phylogenetics and phyloinformatics. NSF Workshop Report, Yale University, 2000.

Tree Edit: Order of children matters. [Figure: two trees, rooted at A and A’, related by the edit operations A->A’, del(B), and ins(B).]

Tree Edit in General: Operations are relabel (A->A’), delete (del(X)), and insert (ins(B)). [Figure: two trees, rooted at A and A’, related by the edit operations A->A’, del(X), and ins(B).]

Review of Tree Edit: Generalizes string edit distance to trees, using a dynamic programming algorithm. Running time is O(|T1| |T2| depth(T1) depth(T2)). The basis for XMLdiff. Also supports * and best removal of subtrees.
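To make the edit operations concrete, here is a minimal sketch of ordered tree edit distance with unit costs for relabel, delete, and insert. It uses the textbook rightmost-root recursion over forests with memoization, which is simpler than (and does not achieve the running time of) the dynamic programming algorithm summarized above. The nested-tuple tree representation and the function names are assumptions made for this example, and don't cares and subtree removal are not covered.

```python
from functools import lru_cache


def size(forest):
    """Number of nodes in a forest given as a tuple of (label, children) trees."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (label_v, kids_v), (label_w, kids_w) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + kids_v, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + kids_w) + 1
    # Map the two rightmost roots onto each other, relabeling if needed.
    relabel = (forest_distance(kids_v, kids_w)
               + forest_distance(f[:-1], g[:-1])
               + (label_v != label_w))
    return min(delete, insert, relabel)


def tree_edit_distance(t1, t2):
    """Edit distance between two trees given as nested (label, children) tuples."""
    return forest_distance((t1,), (t2,))


if __name__ == "__main__":
    t1 = ("A", (("B", ()), ("C", ())))
    t2 = ("A", (("C", ()), ("B", ())))
    print(tree_edit_distance(t1, t2))
```

Because the order of children matters here, swapping two sibling leaves is not free: it costs two operations in this model.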

Related Work:
IBM XML Diff and Merge Tool. http://www.alphaworks.ibm.com/aw.nsf/textResearchers/CB2EF938D7532F338825671B0068244F
K. Zhang and D. Shasha. Editing distance between trees. SIAM J. Comp., 1989.
K. Zhang, D. Shasha and J. T. L. Wang. Approximate tree matching in the presence of variable length don't cares. Journal of Algorithms, 1994.

Graph Edit: Thesis work of Rosalba Giugno. Find a small graph (with * and ?) in a big graph. Not fast when the query graph is big, because subgraph isomorphism takes exponential time in the worst case.
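The cost of subgraph search is easy to see in a naive backtracking matcher. The sketch below is not GraphGrep's method; it is a brute-force illustration under stated assumptions: graphs are undirected, node labels are held in hypothetical dictionaries, a query label "?" matches any single node, and variable-length "*" don't cares are not supported. In the worst case it tries every assignment of query nodes to data nodes, which is why large queries are slow.

```python
def find_embedding(q_labels, q_edges, g_labels, g_edges):
    """Map each query node to a distinct data node, preserving labels and edges.

    Returns a dict from query nodes to data nodes, or None if no match exists.
    """
    # Store each undirected data edge in both directions for fast lookup.
    g_adj = {(u, v) for u, v in g_edges} | {(v, u) for u, v in g_edges}
    q_nodes = list(q_labels)

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            return dict(mapping)
        q = q_nodes[len(mapping)]
        for g in g_labels:
            if g in mapping.values():
                continue  # each data node may be used only once
            if q_labels[q] != "?" and q_labels[q] != g_labels[g]:
                continue  # labels must agree unless the query node is "?"
            mapping[q] = g
            # Every query edge between mapped nodes must exist in the data graph.
            if all((mapping[a], mapping[b]) in g_adj
                   for a, b in q_edges if a in mapping and b in mapping):
                found = extend(mapping)
                if found is not None:
                    return found
            del mapping[q]  # backtrack
        return None

    return extend({})


if __name__ == "__main__":
    g_labels = {1: "A", 2: "B", 3: "C", 4: "D"}
    g_edges = [(1, 2), (2, 3), (3, 4)]
    q_labels = {"x": "A", "y": "?", "z": "C"}
    q_edges = [("x", "y"), ("y", "z")]
    print(find_embedding(q_labels, q_edges, g_labels, g_edges))
```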

Example of GraphGrep: The query graph has nodes and don’t cares. [Figure: a query graph over the nodes A, B, C, and D with a * don’t care.]

Related Work:
P. Buneman, M. F. Fernandez, and D. Suciu. UnQL: a query language and algebra for semistructured data based on structural recursion. VLDB Journal, 2000.
A. O. Mendelzon and P. T. Wood. Finding regular simple paths in graph databases. VLDB, 1989.
Daylight Chemical Information Systems. http://www.daylight.com/
Protein Structure Search. http://sss.berkeley.edu/
Web Structure Search. http://www.almaden.ibm.com/cs/k53/clever.html

Summary of Tools: Why can’t tree and graph search be like keyword search? We are getting there and will provide software if you are interested. About 50 downloads so far.

URLs for Tools:
http://www.cs.nyu.edu/shasha/papers/graphgrep
http://cs.nyu.edu/cs/faculty/shasha/papers/treesearch.html
http://web.njit.edu/~wangj/sigmod.html