Structural Joins: A Primitive for Efficient XML Query Pattern Matching


Structural Joins: A Primitive for Efficient XML Query Pattern Matching Al Khalifa et al., ICDE 2002

Element Numbering (documentId, startpos:endpos, level)
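One way to produce such labels is a depth-first walk that increments a counter when an element opens and again when it closes. The sketch below is only illustrative: the function name `number_elements`, the use of a simple tag counter for positions, and starting the root at level 1 are assumptions, not details taken from the slides.

```python
import xml.etree.ElementTree as ET


def number_elements(xml_text, doc_id=1):
    """Return (tag, doc_id, start, end, level) tuples in document order.

    Assumption: positions come from one counter that is bumped on every
    element open and every element close, and the root is at level 1.
    """
    root = ET.fromstring(xml_text)
    counter = 0
    labels = []

    def visit(elem, level):
        nonlocal counter
        counter += 1                  # position of the opening tag
        start = counter
        slot = len(labels)
        labels.append(None)           # reserve the slot to keep document order
        for child in elem:
            visit(child, level + 1)
        counter += 1                  # position of the closing tag
        labels[slot] = (elem.tag, doc_id, start, counter, level)

    visit(root, 1)
    return labels


if __name__ == "__main__":
    for label in number_elements("<a><b/><c><d/></c></a>"):
        print(label)
```

With this numbering, an element's interval strictly contains the intervals of all of its descendants, which is the property the join conditions rely on.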

Join Conditions Using Numbering
Given elements (D1, S1:E1, L1) and (D2, S2:E2, L2):
Ancestor-Descendant: D1 = D2, S1 < S2 < E2 < E1
Parent-Child: D1 = D2, S1 < S2 < E2 < E1, L1 + 1 = L2
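The two conditions translate directly into predicates over the numbered elements. This is a minimal sketch; the `Node` type and the function names are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """Hypothetical record for one numbered element."""
    doc: int
    start: int
    end: int
    level: int


def is_ancestor(a: Node, d: Node) -> bool:
    # D1 = D2 and S1 < S2 < E2 < E1
    return a.doc == d.doc and a.start < d.start < d.end < a.end


def is_parent(a: Node, d: Node) -> bool:
    # Ancestor condition plus L1 + 1 = L2
    return is_ancestor(a, d) and a.level + 1 == d.level


if __name__ == "__main__":
    outer = Node(1, 4, 13, 2)
    inner = Node(1, 5, 12, 3)
    leaf = Node(1, 6, 7, 4)
    print(is_ancestor(outer, leaf), is_parent(outer, leaf), is_parent(inner, leaf))
```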

Tree pattern >> Structural Relationship

Structural Join
Input: 2 element lists (ancestor and descendant, or parent and child), sorted by start position
Output: pairs of ancestor/descendant or parent/child, sorted by first or second element
2 algorithms presented: with and without stacks, both with ordering by ancestor and by descendant

Example of results
[Figure: example tree with interval labels 1,20; 2,11; 12,19; 3,10; 13,18; 4,5; 6,7; 8,9; 14,15; 16,17, marking ancestor, descendant, and parent/child elements in interval representation]

Tree Merge Join ordered by ancestor

FOR each ancestor:
  Skip descendants with START < ancestor.start
  Check/output descendants until START > ancestor.end
[Figure: tree with interval labels 1,26; 2,3; 4,13; 14,15; 16,23; 24,25; 5,12; 17,22; 6,7; 8,9; 10,11; 18,19; 20,21]
Ancestor list: 4,13 5,12 14,15 16,23 17,22
Descendant list: 2,3 6,7 8,9 10,11 18,19 20,21 24,25
4,13: skip 2,3, then loop. Results: [4,13+6,7][4,13+8,9][4,13+10,11]
5,12: skip, then loop. Results: [5,12+6,7][5,12+8,9][5,12+10,11]
14,15: skip, no match
…
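The loop on this slide can be sketched in a few lines. This is an illustrative reading of the slide, not the paper's exact pseudocode: nodes are plain (start, end) tuples from a single document, and the name `tree_merge_anc` is an assumption.

```python
def tree_merge_anc(alist, dlist):
    """Ancestor-descendant join, output ordered by ancestor.

    Both lists hold (start, end) tuples sorted by start position.
    """
    out = []
    mark = 0  # first descendant that can still match the current ancestor
    for a in alist:
        # Skip descendants that start before this ancestor.
        while mark < len(dlist) and dlist[mark][0] < a[0]:
            mark += 1
        # Scan descendants until one starts after the ancestor ends.
        i = mark
        while i < len(dlist) and dlist[i][0] < a[1]:
            if dlist[i][1] < a[1]:
                out.append((a, dlist[i]))
            i += 1
    return out


if __name__ == "__main__":
    alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
    dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
    for pair in tree_merge_anc(alist, dlist):
        print(pair)
```

Note that the inner scan restarts from `mark` for every ancestor, so nested ancestors rescan the same descendants.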

Tree Merge Join ordered by descendant

FOR each descendant:
  Skip ancestors with END < descendant.start
  Check/output ancestors until START > descendant.end
[Figure: tree with interval labels 1,26; 2,3; 4,13; 14,15; 16,23; 24,25; 5,12; 17,22; 6,7; 8,9; 10,11; 18,19; 20,21]
Ancestor list: 4,13 5,12 14,15 16,23 17,22
Descendant list: 2,3 6,7 8,9 10,11 18,19 20,21 24,25
2,3: no match
6,7: skip. Results: [6,7+4,13][6,7+5,12]
8,9: skip. Results: [8,9+4,13][8,9+5,12]
…
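A matching sketch for the descendant-ordered variant follows the two rules on the slide. As before, this is a hedged illustration: nodes are (start, end) tuples from one document, the name `tree_merge_desc` is an assumption, and pairs are returned as (ancestor, descendant) even though they are ordered by descendant.

```python
def tree_merge_desc(alist, dlist):
    """Ancestor-descendant join, output ordered by descendant.

    Both lists hold (start, end) tuples sorted by start position.
    """
    out = []
    mark = 0  # first ancestor that can still match the current descendant
    for d in dlist:
        # Skip ancestors that end before this descendant starts.
        while mark < len(alist) and alist[mark][1] < d[0]:
            mark += 1
        # Scan ancestors until one starts after the descendant ends.
        i = mark
        while i < len(alist) and alist[i][0] < d[1]:
            a = alist[i]
            if a[0] < d[0] and d[1] < a[1]:
                out.append((a, d))
            i += 1
    return out


if __name__ == "__main__":
    alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
    dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
    for pair in tree_merge_desc(alist, dlist):
        print(pair)
```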

Complexity
For ancestor-descendant relationships:
  Tree-Merge-Anc time complexity is optimal: it may be quadratic, but it is proportional to the output size
  But it can have poor IO performance
For parent-child relationships:
  Tree-merge cost may still be quadratic, although the output size can only be linear
Tree-Merge-Desc: time can be quadratic in the output size

Worst-Case Examples
a1 has the whole d list as descendants
a2 has d2 to d(2n-1) as descendants, and so on
Which means: practically quadratic performance (each ancestor has to check the whole descendant list)

Worst-Case Examples
An equivalent situation arises when considering Tree-Merge-Desc

Stack-Tree Algorithm
Basic idea: depth-first traversal of the XML tree
  Linear time with stack size = depth of tree
  All ancestor-descendant relationships appear on the stack during traversal
  Traverse the lists only once
Main problem: do not want to traverse the whole database, just the nodes in A-list/D-list

Stack-Tree-Desc

Print in order of descendants
Keep ancestors in the same path in a stack
When a descendant comes, it is a descendant of the whole stack, so print it with every stack entry
Pop from the stack when a different path is processed, e.g. when 14,15 comes, both previous ancestors are popped
[Figure: tree with interval labels 1,26; 2,3; 4,13; 14,15; 16,23; 24,25; 5,12; 17,22; 6,7; 8,9; 10,11; 18,19; 20,21]
Ancestor list: 4,13 5,12 14,15 16,23 17,22
Descendant list: 2,3 6,7 8,9 10,11 18,19 20,21 24,25
2,3: skip
6,7 with stack 4,13 5,12. Results: [4,13+6,7] [5,12+6,7]
Print 8,9 with the whole stack: [4,13+8,9] [5,12+8,9]
14,15: POP!! and keep going

Example of Stack-Tree-Desc Execution
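The execution can be reproduced with a short sketch of the stack-based merge. It is a simplified illustration under stated assumptions: nodes are (start, end) tuples from a single document, the two lists share no node, and the name `stack_tree_desc` is hypothetical.

```python
def stack_tree_desc(alist, dlist):
    """Stack-based ancestor-descendant join, output ordered by descendant.

    Both lists hold (start, end) tuples sorted by start position.
    """
    out = []
    stack = []  # ancestors on the current root-to-node path
    ai = di = 0
    while di < len(dlist) and (ai < len(alist) or stack):
        a = alist[ai] if ai < len(alist) else None
        d = dlist[di]
        next_start = d[0] if a is None else min(a[0], d[0])
        # Pop ancestors that end before the next node starts:
        # the traversal has moved to a different path.
        while stack and stack[-1][1] < next_start:
            stack.pop()
        if a is not None and a[0] < d[0]:
            stack.append(a)          # next node in document order is an ancestor
            ai += 1
        else:
            for s in stack:          # d is a descendant of the whole stack
                out.append((s, d))
            di += 1
    return out


if __name__ == "__main__":
    alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
    dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
    for pair in stack_tree_desc(alist, dlist):
        print(pair)
```

Each input node is pushed or consumed once, so the work beyond reading the lists is producing the output pairs.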

Stack-Tree-Anc
Basic problem: results from a particular descendant cannot be output immediately
  Later descendants may match an earlier ancestor
Solution: keep lists of matching descendant nodes with each stack node
  Self-list: descendants that match this node
    Add each descendant node to the self-lists of all matching ancestor nodes
  Inherit-list: inherited from descendants already popped from the stack, to be output after the self-list matches are output
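The self-list and inherit-list bookkeeping might look like the following sketch. It is one possible reading of the slide rather than the paper's exact procedure: nodes are (start, end) tuples from one document, lists store finished (ancestor, descendant) pairs, and the names `stack_tree_anc` and `pop_top` are assumptions.

```python
def stack_tree_anc(alist, dlist):
    """Stack-based ancestor-descendant join, output ordered by ancestor.

    Both lists hold (start, end) tuples sorted by start position.
    """
    out = []
    stack = []  # entries are [node, self_list, inherit_list]
    ai = di = 0

    def pop_top():
        node, self_list, inherit_list = stack.pop()
        pairs = self_list + inherit_list   # own matches first, then inherited
        if stack:
            stack[-1][2].extend(pairs)     # the new top inherits them
        else:
            out.extend(pairs)              # bottom of stack: safe to output

    while di < len(dlist) and (ai < len(alist) or stack):
        a = alist[ai] if ai < len(alist) else None
        d = dlist[di]
        next_start = d[0] if a is None else min(a[0], d[0])
        while stack and stack[-1][0][1] < next_start:
            pop_top()
        if a is not None and a[0] < d[0]:
            stack.append([a, [], []])
            ai += 1
        else:
            for entry in stack:            # d matches every node on the stack
                entry[1].append((entry[0], d))
            di += 1
    while stack:                           # flush matches still held on the stack
        pop_top()
    return out


if __name__ == "__main__":
    alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
    dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
    for pair in stack_tree_anc(alist, dlist):
        print(pair)
```

Pairs for a nested ancestor are held back until the bottom stack entry is popped, which is what keeps the output sorted by ancestor.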

Stack-Tree Analysis
Stack-Tree-Desc
  Time complexity (for anc-desc and par-child): O(|Alist| + |Dlist| + |OutputList|)
  IO complexity (for anc-desc and par-child): O(|Alist|/B + |Dlist|/B + |OutputList|/B), where B is the blocking factor
Stack-Tree-Anc
  Requires careful handling of lists
  Complexity is the same as for the Desc case

Performance Study