Structural Joins: A Primitive for Efficient XML Query Pattern Matching Al Khalifa et al., ICDE 2002.

Structural Joins: A Primitive for Efficient XML Query Pattern Matching Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas, Jignesh M. Patel, Divesh Srivastava,
Presentation transcript:

Structural Joins: A Primitive for Efficient XML Query Pattern Matching Al Khalifa et al., ICDE 2002

Element Numbering (documentId, startpos:endpos, level)
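A minimal sketch of how such labels could be assigned with one depth-first pass, assuming a single counter that advances once when an element opens and once when it closes. The `Label` type, the `number_tree` function, and the nested-tuple tree format are hypothetical names introduced for illustration, not taken from the slides.

```python
from typing import NamedTuple


class Label(NamedTuple):
    doc_id: int
    start: int
    end: int
    level: int


def number_tree(tree, doc_id=1):
    """Label every node of a (tag, [children]) tree in document order."""
    labels = []
    counter = 0

    def visit(node, level):
        nonlocal counter
        tag, children = node
        counter += 1                 # position where the element opens
        start = counter
        slot = len(labels)
        labels.append(None)          # reserve the slot to keep document order
        for child in children:
            visit(child, level + 1)
        counter += 1                 # position where the element closes
        labels[slot] = (tag, Label(doc_id, start, counter, level))

    visit(tree, 1)
    return labels


doc = ("book", [("title", []), ("chapter", [("section", [])])])
for tag, label in number_tree(doc):
    print(tag, label)
```

Because a child opens after and closes before its parent, every descendant's interval is strictly contained in its ancestor's interval, which is the property the join conditions rely on.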

Join Conditions Using Numbering
Given elements (D1, S1:E1, L1) and (D2, S2:E2, L2):
Ancestor-Descendant
– D1 = D2, S1 < S2 < E2 < E1
Parent-Child
– D1 = D2, S1 < S2 < E2 < E1, L1 + 1 = L2
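The two conditions translate directly into predicates. The sketch below assumes a hypothetical `Label` tuple holding (documentId, startpos, endpos, level); the function names are illustrative only.

```python
from typing import NamedTuple


class Label(NamedTuple):
    doc_id: int
    start: int
    end: int
    level: int


def is_ancestor(a: Label, d: Label) -> bool:
    # D1 = D2 and S1 < S2 < E2 < E1 (S2 < E2 holds for any valid label)
    return a.doc_id == d.doc_id and a.start < d.start and d.end < a.end


def is_parent(a: Label, d: Label) -> bool:
    # Same containment test, plus the level check L1 + 1 = L2
    return is_ancestor(a, d) and a.level + 1 == d.level


root = Label(1, 1, 26, 1)
inner = Label(1, 4, 13, 2)
leaf = Label(1, 6, 7, 4)
print(is_ancestor(root, leaf), is_parent(root, leaf), is_parent(root, inner))
```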

Tree pattern >> Structural Relationship

Structural Join
Input
– 2 element lists: ancestor and descendant, or parent and child
– Sorted by start position
Output
– Pairs of ancestor/descendant or parent/child
– Sorted by first or second element
2 algorithms presented
– With and without stacks
– Both with ordering by ancestor and by descendant

Example of results
[Figure: interval representation of an example tree (1,20; 2,11; 3,10; 4,5; 6,7; 8,9; 12,19; 13,18; 14,15; ...), with ancestor/descendant and parent/child results marked]

Tree Merge Join ordered by ancestor

[Figure: example tree with root 1,26 and children 2,3, 4,13, 14,15, 16,23 and 24,25; 4,13 contains 5,12, which contains 6,7, 8,9 and 10,11; 16,23 contains 17,22, which contains 18,19 and 20,21]
Ancestor list: 4,13  5,12  14,15  16,23  17,22
Descendant list: 2,3  6,7  8,9  10,11  18,19  20,21  24,25
Results: [4,13+6,7] [4,13+8,9] [4,13+10,11]
Results: [5,12+6,7] [5,12+8,9] [5,12+10,11] …
1. Skip descendants with START < ancestor.start
2. FOR each ancestor: check/output descendants until START > ancestor.end
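The two steps above can be sketched as follows for the ancestor-descendant case. This is an illustrative reading of the slide, assuming a single document so that each element is just a (start, end) pair; `tree_merge_anc` is a hypothetical name.

```python
def tree_merge_anc(alist, dlist):
    """Ancestor-descendant join, output ordered by ancestor.

    Both inputs are lists of (start, end) pairs sorted by start.
    """
    out = []
    begin = 0  # first descendant that can still match the current ancestor
    for a_start, a_end in alist:
        # Step 1: skip descendants that start before this ancestor
        while begin < len(dlist) and dlist[begin][0] < a_start:
            begin += 1
        # Step 2: scan descendants until one starts after the ancestor ends
        i = begin
        while i < len(dlist) and dlist[i][0] < a_end:
            if dlist[i][1] < a_end:
                out.append(((a_start, a_end), dlist[i]))
            i += 1
    return out


alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
for pair in tree_merge_anc(alist, dlist):
    print(pair)
```

Note that the inner scan restarts from `begin` for every ancestor, so nested ancestors rescan the same descendants. That is harmless here because each rescanned descendant is also part of the output.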

Tree Merge Join ordered by descendant

[Figure: same example tree, rooted at 1,26]
Ancestor list: 4,13  5,12  14,15  16,23  17,22
Descendant list: 2,3  6,7  8,9  10,11  18,19  20,21  24,25
2,3: no match
Results: [6,7+4,13] [6,7+5,12] [8,9+4,13] [8,9+5,12] …
1. Skip ancestors with END < descendant.start
2. FOR each descendant: check/output ancestors until START > descendant.end
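A comparable sketch for the descendant-ordered variant, again assuming one document and (start, end) pairs. The function name is hypothetical. As an assumption of this sketch, the inner loop stops at the first ancestor that starts after the descendant starts, which is a slightly tighter stopping point than the one written in step 2 and yields the same matches.

```python
def tree_merge_desc(alist, dlist):
    """Ancestor-descendant join, output ordered by descendant.

    Both inputs are lists of (start, end) pairs sorted by start.
    """
    out = []
    begin = 0  # first ancestor that can still match the current descendant
    for d_start, d_end in dlist:
        # Step 1: skip leading ancestors that end before this descendant starts
        while begin < len(alist) and alist[begin][1] < d_start:
            begin += 1
        # Step 2: scan ancestors that start before the descendant
        i = begin
        while i < len(alist) and alist[i][0] < d_start:
            if d_end < alist[i][1]:
                out.append(((d_start, d_end), alist[i]))
            i += 1
    return out


alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
for pair in tree_merge_desc(alist, dlist):
    print(pair)
```

Step 1 can only discard ancestors from the front of the list. An early ancestor with a large interval therefore blocks the skip, and every later ancestor that has already ended is rescanned for each descendant.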

Complexity
For ancestor-descendant relationships:
– Tree-Merge-Anc time complexity is optimal: it may be quadratic, but it is proportional to the output size
– But it can have poor IO performance
For parent-child relationships:
– Tree-merge cost may still be quadratic, but the output size can only be linear
Tree-Merge-Desc can be quadratic in the output size

Worst-Case Examples
– a1 has the whole D-list as descendants
– a2 has d2 through d(2n-1) as descendants, and so on
Which means: practically quadratic performance (each ancestor has to scan almost the whole descendant list)
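To make the quadratic behaviour concrete, the sketch below builds one input of this shape and counts how many descendants a tree-merge scan by ancestor touches for a parent-child join. The exact layout (n nested ancestors, each with one child leaf before and one after the next nested ancestor) is an assumption reconstructed from the description above, and both function names are hypothetical.

```python
def build_nested_worst_case(n):
    """Return (alist, dlist) of (start, end, level) tuples sorted by start."""
    opens, dlist, pos = [], [], 0
    for level in range(1, n + 1):
        pos += 1
        opens.append((pos, level))                   # a_level opens
        dlist.append((pos + 1, pos + 2, level + 1))  # leaf child before nesting
        pos += 2
    alist = [None] * n
    for start, level in reversed(opens):
        dlist.append((pos + 1, pos + 2, level + 1))  # leaf child after nesting
        pos += 3                                     # leaf (2 positions) + close
        alist[level - 1] = (start, pos, level)
    return alist, dlist


def count_tree_merge_anc_parent_child(alist, dlist):
    """Run a tree-merge scan by ancestor; return (output size, scanned count)."""
    matches = scanned = begin = 0
    for a_start, a_end, a_level in alist:
        while begin < len(dlist) and dlist[begin][0] < a_start:
            begin += 1
        i = begin
        while i < len(dlist) and dlist[i][0] < a_end:
            scanned += 1
            if dlist[i][1] < a_end and dlist[i][2] == a_level + 1:
                matches += 1
            i += 1
    return matches, scanned


for n in (10, 100, 1000):
    alist, dlist = build_nested_worst_case(n)
    print(n, count_tree_merge_anc_parent_child(alist, dlist))
```

On this input the output has 2n pairs, while the number of scanned descendants grows as n² + n.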

Worst-Case Examples
Equivalent situation when considering Tree-Merge-Desc

Stack-Tree Algorithm
Basic idea: depth-first traversal of the XML tree
– Linear time with stack size = depth of tree
– All ancestor-descendant relationships appear on the stack during traversal
– Traverse the lists only once
Main problem: do not want to traverse the whole database, just the nodes in the A-list/D-list

Stack-Tree-Desc

[Figure: same example tree, rooted at 1,26]
Ancestor list: 4,13  5,12  14,15  16,23  17,22
Descendant list: 2,3  6,7  8,9  10,11  18,19  20,21  24,25
Print in order of descendants
1. Keep ancestors on the same path in a stack
2. When a descendant comes, it is a descendant of the whole stack, so print it with every stack entry
3. Pop from the stack when a different path is processed, e.g. when 14,15 comes, both previous ancestors are popped
Trace:
– 2,3: skip (stack is empty)
– Push 4,13, then 5,12
– 6,7: Results: [4,13+6,7] [5,12+6,7]
– 8,9: print with the whole stack: [4,13+8,9] [5,12+8,9]
– 14,15: POP!! and keep going
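The three steps above can be sketched as a single merge pass over both lists. This is a minimal illustration for the ancestor-descendant case, assuming one document, (start, end) pairs, and properly nested intervals; `stack_tree_desc` is a hypothetical name.

```python
def stack_tree_desc(alist, dlist):
    """Ancestor-descendant join, output ordered by descendant.

    Both inputs are lists of (start, end) pairs sorted by start.
    """
    out = []
    stack = []  # ancestors on the current root-to-node path, outermost first
    ai = di = 0
    while di < len(dlist):
        d = dlist[di]
        if ai < len(alist) and alist[ai][0] < d[0]:
            a = alist[ai]
            # Step 3: drop ancestors that ended before the next ancestor starts
            while stack and stack[-1][1] < a[0]:
                stack.pop()
            stack.append(a)  # Step 1: keep the current path on the stack
            ai += 1
        else:
            while stack and stack[-1][1] < d[0]:
                stack.pop()
            # Step 2: everything left on the stack contains this descendant
            for s in stack:
                out.append((s, d))
            di += 1
    return out


alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
for pair in stack_tree_desc(alist, dlist):
    print(pair)
```

Each input element is pushed or consumed once and popped at most once, so the work beyond producing output is linear in the two list sizes.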

Example of Stack-Tree-Desc Execution

Stack-Tree-Anc
Basic problem: results from a particular descendant cannot be output immediately
– Later descendants may match an earlier ancestor
Solution: keep lists of matching descendant nodes with each stack node
– Self-list
  Descendants that match this node
  Add each descendant node to the self-lists of all matching ancestor nodes
– Inherit-list
  Inherited from descendants already popped from the stack, to be output after the self-list matches are output
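A minimal sketch of the self-list and inherit-list bookkeeping, under the same assumptions as before (one document, (start, end) pairs, properly nested intervals). The function name is hypothetical, and the lists are plain Python lists that get copied on each pop, so this sketch illustrates the output order rather than the careful list handling needed for the stated complexity.

```python
def stack_tree_anc(alist, dlist):
    """Ancestor-descendant join, output ordered by ancestor.

    Both inputs are lists of (start, end) pairs sorted by start.
    """
    out = []
    stack = []  # entries: [ancestor, self_list, inherit_list]

    def pop():
        _, self_list, inherit_list = stack.pop()
        pairs = self_list + inherit_list  # own matches first, then inherited
        if stack:
            stack[-1][2].extend(pairs)    # hand over to the enclosing ancestor
        else:
            out.extend(pairs)             # bottom of stack: safe to output

    ai = di = 0
    while di < len(dlist):
        d = dlist[di]
        if ai < len(alist) and alist[ai][0] < d[0]:
            a = alist[ai]
            while stack and stack[-1][0][1] < a[0]:
                pop()
            stack.append([a, [], []])
            ai += 1
        else:
            while stack and stack[-1][0][1] < d[0]:
                pop()
            for entry in stack:
                entry[1].append((entry[0], d))  # add to every self-list
            di += 1
    while stack:  # flush whatever is still pending
        pop()
    return out


alist = [(4, 13), (5, 12), (14, 15), (16, 23), (17, 22)]
dlist = [(2, 3), (6, 7), (8, 9), (10, 11), (18, 19), (20, 21), (24, 25)]
for pair in stack_tree_anc(alist, dlist):
    print(pair)
```

Output is delayed until the bottom stack entry is popped, because only then is it certain that no later descendant can still match an earlier ancestor.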

Stack-Tree Analysis
Stack-Tree-Desc
– Time complexity (for anc-desc and par-child): O(|Alist| + |Dlist| + |OutputList|)
– IO complexity (for anc-desc and par-child): O(|Alist|/B + |Dlist|/B + |OutputList|/B)
– Where B is the blocking factor
Stack-Tree-Anc
– Requires careful handling of lists
– Complexity is the same as for the Desc case

Performance Study