1
On Boosting Holism in XML Twig Pattern Matching Using Structural Indexing Techniques
Ting Chen, Jiaheng Lu, Tok Wang Ling
2
Outline
- Background
  - XML Twig Pattern Query
  - Previous Twig Join algorithms
  - Limit of the original holistic method TwigStack
- Our holistic Twig Pattern Matching algorithms
  - Two Refined Indexing Schemes: Tag+Level and PPS
  - A generalized holistic matching theory
  - iTwigJoin: a generalized holistic matching algorithm
- Experiments
- Conclusion
On Boosting Holism in XML Twig Pattern Matching using Structural Indexing
3
Background: XML and Region coding
- An XML document is modeled as a tree in our work.
- Region coding for the XML document tree: a <start, end, level> label for each element.
- Containment property: a.start < b.start AND a.end > b.end if and only if a is an ancestor of b.
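A minimal sketch of region coding, assuming a simple in-memory tree (the `Node` class and the function names are hypothetical, not from the paper): one depth-first counter assigns `start` when an element opens and `end` when it closes, and `level` is the depth. The parent-child test adds `a.level + 1 == b.level` to the containment property; this is a standard consequence of the <start, end, level> labels rather than something stated on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory XML element with its region label."""
    tag: str
    children: list[Node] = field(default_factory=list)
    start: int = 0
    end: int = 0
    level: int = 0


def assign_regions(root: Node) -> None:
    """Label every element with <start, end, level> in one depth-first pass."""
    counter = 0

    def visit(node: Node, level: int) -> None:
        nonlocal counter
        counter += 1
        node.start = counter  # position when the element opens
        node.level = level    # depth, root at level 1
        for child in node.children:
            visit(child, level + 1)
        counter += 1
        node.end = counter    # position when the element closes

    visit(root, 1)


def is_ancestor(a: Node, b: Node) -> bool:
    """Containment property: a's region strictly encloses b's region."""
    return a.start < b.start and a.end > b.end


def is_parent(a: Node, b: Node) -> bool:
    """Parent-child: containment plus adjacent levels."""
    return is_ancestor(a, b) and a.level + 1 == b.level


# Document A(B(C), D)
c = Node("C")
b = Node("B", [c])
d = Node("D")
a = Node("A", [b, d])
assign_regions(a)
print([(n.tag, n.start, n.end, n.level) for n in (a, b, c, d)])
# -> [('A', 1, 8, 1), ('B', 2, 5, 2), ('C', 3, 4, 3), ('D', 6, 7, 2)]
```

Note that C is a descendant of A but not its child: `is_ancestor(a, c)` holds while `is_parent(a, c)` does not.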
4
Background: XML twig pattern queries
An XML twig query is a small tree whose edges are parent-child (P-C) or ancestor-descendant (A-D) relationships. Given an XML document D and an XML twig query Q, our problem is to find all occurrences of Q in D.
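To make the problem statement concrete, here is a brute-force reference matcher. It is a sketch for illustration only: it is exponential in the query size and is not an algorithm from the paper. It assumes elements already carry region labels, and it represents a query node as a hypothetical `QNode` whose child edges are tagged `"/"` (P-C) or `"//"` (A-D).

```python
from __future__ import annotations

from typing import NamedTuple


class Elem(NamedTuple):
    name: str
    tag: str
    start: int
    end: int
    level: int


class QNode(NamedTuple):
    tag: str
    children: tuple[tuple[str, QNode], ...] = ()  # (axis, child); axis "/" or "//"


def related(a: Elem, d: Elem, axis: str) -> bool:
    """P-C or A-D test using region labels."""
    contained = a.start < d.start and a.end > d.end
    return contained and (axis == "//" or a.level + 1 == d.level)


def match_at(q: QNode, e: Elem, doc: list[Elem]) -> list[list[Elem]]:
    """All matches of the query subtree q whose root is mapped to element e."""
    if e.tag != q.tag:
        return []
    results = [[e]]
    for axis, qc in q.children:
        child_matches = [m for x in doc if related(e, x, axis)
                         for m in match_at(qc, x, doc)]
        # Combine every partial match with every match of this child subtree.
        results = [r + m for r in results for m in child_matches]
    return results


def all_matches(q: QNode, doc: list[Elem]) -> list[list[Elem]]:
    return [m for e in doc for m in match_at(q, e, doc)]


doc = [
    Elem("a1", "A", 1, 12, 1), Elem("b1", "B", 2, 3, 2),
    Elem("e1", "E", 4, 9, 2), Elem("c1", "C", 5, 6, 3),
    Elem("c2", "C", 7, 8, 3), Elem("c3", "C", 10, 11, 2),
]
q = QNode("A", (("/", QNode("B")), ("//", QNode("C"))))  # A[B]//C
print([[x.name for x in m] for m in all_matches(q, doc)])
# -> [['a1', 'b1', 'c1'], ['a1', 'b1', 'c2'], ['a1', 'b1', 'c3']]
```

Each match lists the document elements in query preorder (A, then B, then C).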
5
Previous XML Twig Join algorithms
Techniques:
- Edge based: Binary Structural Join [Al-Khalifa et al. ICDE02]; Join Order Selection [Wu et al. ICDE03]
- Path based: BLAS [Chen et al. SIGMOD04]
- Tree (holistic) based: TwigStack [Bruno et al. SIGMOD02]; TwigStackList [Lu et al. CIKM04]
- Index based: B-tree [Chien et al. VLDB02]; XR-tree [Jiang et al. ICDE02]; TSGeneric+ [Jiang et al. VLDB03]
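Edge-based methods evaluate one query edge at a time. Below is a minimal sketch of a stack-based ancestor-descendant join in the spirit of the Stack-Tree-Desc join of Al-Khalifa et al., simplified for illustration (names are hypothetical). Both inputs are assumed sorted by `start`, and the stack always holds a chain of nested ancestor candidates.

```python
from __future__ import annotations

from typing import NamedTuple


class Elem(NamedTuple):
    name: str
    start: int
    end: int


def structural_join(alist: list[Elem], dlist: list[Elem]) -> list[tuple[str, str]]:
    """Return all (ancestor, descendant) pairs; both inputs sorted by start."""
    out: list[tuple[str, str]] = []
    stack: list[Elem] = []
    i = 0
    for d in dlist:
        # Push every ancestor candidate that opens before d.
        while i < len(alist) and alist[i].start < d.start:
            a = alist[i]
            while stack and stack[-1].end < a.start:
                stack.pop()  # top region closed before a opened
            stack.append(a)  # any remaining top encloses a
            i += 1
        # Drop candidates whose region closed before d opened.
        while stack and stack[-1].end < d.start:
            stack.pop()
        # Every remaining stack entry encloses d.
        out.extend((a.name, d.name) for a in stack)
    return out


alist = [Elem("a1", 1, 10), Elem("a2", 2, 5), Elem("a3", 11, 14)]
dlist = [Elem("d1", 3, 4), Elem("d2", 6, 7), Elem("d3", 12, 13), Elem("d4", 15, 16)]
print(structural_join(alist, dlist))
# -> [('a1', 'd1'), ('a2', 'd1'), ('a1', 'd2'), ('a3', 'd3')]
```

A twig is then answered by chaining such binary joins, which is where join order selection and large intermediate results come in.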
6
Holistic Twig Matching
- TwigStack [Bruno et al. SIGMOD02]: a holistic twig join algorithm.
  - E.g., for query A[.//C]//B, there may be many matches to A//B alone, but TwigStack only outputs results for A elements that have both B and C descendants.
  - No join order selection is required.
- TwigStack is optimal only for ancestor-descendant (A-D only) twig patterns.
- Reordering the elements in a stream does not help [Choi et al. DEXA03].
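A small worked example of the A[.//C]//B point, using hypothetical data: decomposing the twig into the binary joins A//B and A//C produces A//B pairs that can never extend to a full match, while the final answer only involves A elements that have both a B and a C descendant.

```python
from __future__ import annotations

from typing import NamedTuple


class Elem(NamedTuple):
    name: str
    tag: str
    start: int
    end: int


def pairs(doc: list[Elem], anc: str, desc: str) -> list[tuple[str, str]]:
    """Binary A-D join result for the single edge anc//desc."""
    return [(a.name, d.name)
            for a in doc if a.tag == anc
            for d in doc
            if d.tag == desc and a.start < d.start and a.end > d.end]


doc = [
    Elem("a1", "A", 1, 8), Elem("b1", "B", 2, 3), Elem("b2", "B", 4, 5),
    Elem("b3", "B", 6, 7), Elem("a2", "A", 9, 14), Elem("b4", "B", 10, 11),
    Elem("c1", "C", 12, 13),
]

ab = pairs(doc, "A", "B")  # intermediate result for edge A//B
ac = pairs(doc, "A", "C")  # intermediate result for edge A//C
matches = [(a, c, b) for a, b in ab for a2, c in ac if a == a2]
useless = [p for p in ab if p[0] not in {a for a, _ in ac}]
print(len(ab), matches, len(useless))
# -> 4 [('a2', 'c1', 'b4')] 3
```

Three of the four A//B pairs (those under a1) are useless intermediate results, which a holistic method avoids producing.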
7
Sub-optimality of TwigStack
TwigStack is not optimal for twigs with parent-child edges.
[Figure: a query with root A and children B and C, and a document with elements a1…an, b1…bn, c1…cn]
8
Two Refined Streaming Schemes (1)
To extend the optimality of TwigStack to a larger class of queries, our paper proposes two refined streaming schemes.
- Tag+Level: elements with the same tag and the same level are grouped into one stream.
[Figure: the query and document from the previous example, with elements grouped into Tag+Level streams]
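A minimal sketch of Tag+Level partitioning next to plain tag streaming, assuming elements already carry region labels (the `Elem` type and function names are hypothetical). The helper `pc_compatible` illustrates why the extra level key helps with P-C edges: a parent stream at level l can only pair with child streams at level l + 1. This rule follows directly from the level labels; the slide does not show exactly how the paper exploits it.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Elem(NamedTuple):
    name: str
    tag: str
    start: int
    end: int
    level: int


def tag_streams(doc: list[Elem]) -> dict:
    """Original scheme: one stream per tag, in document (start) order."""
    streams = defaultdict(list)
    for e in sorted(doc, key=lambda e: e.start):
        streams[e.tag].append(e.name)
    return dict(streams)


def tag_level_streams(doc: list[Elem]) -> dict:
    """Tag+Level scheme: one stream per (tag, level) pair, in start order."""
    streams = defaultdict(list)
    for e in sorted(doc, key=lambda e: e.start):
        streams[(e.tag, e.level)].append(e.name)
    return dict(streams)


def pc_compatible(parent_key: tuple[str, int], child_key: tuple[str, int]) -> bool:
    """A P-C edge can only join a (P, l) stream with a (C, l + 1) stream."""
    return child_key[1] == parent_key[1] + 1


# a1 contains b1 and a2; a2 contains b2
doc = [
    Elem("a1", "A", 1, 8, 1), Elem("b1", "B", 2, 3, 2),
    Elem("a2", "A", 4, 7, 2), Elem("b2", "B", 5, 6, 3),
]
print(tag_streams(doc))        # {'A': ['a1', 'a2'], 'B': ['b1', 'b2']}
print(tag_level_streams(doc))  # four singleton streams
```

With tag streams, a1 and a2 share one stream; with Tag+Level they are separated, so for a query A/B the stream ('A', 1) only needs to be matched against ('B', 2).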
9
Two Refined Streaming Schemes (1)
For this query, the Tag+Level streaming scheme guarantees optimality.
[Figure: the query and document from the previous example, with elements grouped into Tag+Level streams]
10
Two Refined Streaming Schemes (1)
But for a more complex query and document, Tag+Level cannot guarantee optimality. For example:
[Figure: a query over tags A, B, C, D and a document with elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1]
11
Two Refined Streaming Schemes (2)
- Prefix Path Streaming (PPS): elements with the same root-to-node path are grouped together.
- In this example, every element in the document is stored as an individual stream.
[Figure: the document from the previous example, with each element in its own PPS stream]
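A minimal sketch of Prefix Path Streaming, assuming the document is given as nested `(name, tag, children)` tuples (a hypothetical input format). Each element is keyed by its root-to-node tag path. `relevant_streams` shows one way to prune streams for a query path such as //A/B by matching prefix paths against a regular expression; it is an illustrative filter that checks a necessary condition for one root-to-leaf query path, not the paper's exact pruning procedure.

```python
import re
from collections import defaultdict


def pps_streams(root):
    """Group element names by their root-to-node tag path, in document order."""
    streams = defaultdict(list)

    def visit(node, path):
        name, tag, children = node
        prefix = f"{path}/{tag}"
        streams[prefix].append(name)  # preorder visit = start order
        for child in children:
            visit(child, prefix)

    visit(root, "")
    return dict(streams)


def path_regex(steps):
    """Compile a query path such as [("//", "A"), ("/", "B")] into a regex."""
    parts = []
    for axis, tag in steps:
        gap = r"(?:/[^/]+)*" if axis == "//" else ""  # "//" may skip any tags
        parts.append(gap + "/" + re.escape(tag))
    return re.compile("^" + "".join(parts) + "$")


def relevant_streams(streams, steps):
    """Keep only the streams whose prefix path can match the query path."""
    rx = path_regex(steps)
    return {p: names for p, names in streams.items() if rx.match(p)}


doc = ("a1", "A", [
    ("e1", "E", [("b1", "B", [])]),
    ("b2", "B", []),
    ("a2", "A", [("b3", "B", [])]),
])
streams = pps_streams(doc)
print(relevant_streams(streams, [("//", "A"), ("/", "B")]))
# -> {'/A/B': ['b2'], '/A/A/B': ['b3']}
```

The stream /A/E/B is skipped for //A/B without being scanned, which is the kind of saving the I/O experiments attribute to pruning irrelevant streams.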
12
Two Refined Streaming Schemes (2)
PPS is optimal for the following example: d1, d2, c1 and c2 are separated into different streams.
[Figure: a query over tags A, B, C, D and a document with elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1]
14
Two Refined Streaming Schemes (2)
A natural question: can PPS guarantee optimality for all queries and data?
The answer is NO. For example, c1 and c2 are in the same stream; similarly, e1 and e2 are also in the same stream.
[Figure: a query over tags A, B, C, D, E and a document with elements a1 to a4, b1 to b5, c1, c2, d1, d2, e1, e2, with head elements marked]
15
A general algorithm: iTwigJoin
We propose a general algorithm, called iTwigJoin, which can be used with various data streaming schemes. Our key idea is to classify all current head elements into three classes:
- Subtree-matching
- Useless
- Blocked
16
Classifying Head Elements
- Subtree-matching element: an element e with tag E is called a subtree-matching element for query Q if e is in a match to Q_E (the subtree of Q rooted at E), and NOT in any future match to Q_P, where P is the parent of E in Q.
- Useless element: an element e is called a useless element if e is not in any future match to Q_E.
- Blocked element: an element that is neither subtree-matching nor useless.
26
Example: Classifying Head Elements (Tag+Level Streaming)
[Figure: queries Q1 and Q2 over tags A, B, C, D, and the Tag+Level streams of a document with elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1, with head elements marked]
Q1:
- Subtree-matching: none
- Useless: none
- Blocked: a1, a2, b1, b2, c1, d1
Q2:
- Subtree-matching: d1
- Useless: a1, b2
- Blocked: a2, b1, c1
How each class is handled:
- A useless element can be discarded safely.
- A subtree-matching element is pushed to the corresponding stack.
- A blocked element causes problems: it CANNOT be discarded, because that may cause loss of results, and it CANNOT be pushed to a stack, because that may produce useless results.
- When all head elements are blocked, optimal holistic matching CANNOT be guaranteed.
29
iTwigJoin
To output all correct answers, our algorithm pushes blocked elements into stacks, which may produce useless intermediate results in some cases.
Example (Tag+Level streaming, query Q1): since all head elements are blocked, we have to push a1 to the stack and output one path solution (a1, d1). If there is no c2, then (a1, d1) is a useless path solution.
[Figure: query Q1 over tags A, B, C, D and a document with elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1]
30
iTwigJoin Two Main Components
- Stream Manager: controls the advance of the streams and sends elements to the temporary storage.
- Temporary Storage: pushes elements to stacks and outputs intermediate paths.
[Figure: the Stream Manager, holding streams SA, SB and SC, feeding elements to the Temporary Storage]
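A minimal sketch of the Temporary Storage component, modelled on TwigStack's compact stack encoding: one stack per query node, where each entry remembers the top of its parent's stack at push time, so path solutions can be expanded when a leaf element arrives. The slides do not spell out iTwigJoin's internal data structure, so the class and method names are hypothetical, and the sketch handles A-D edges only (a P-C edge would also need a level check). It assumes the Stream Manager delivers elements in start order.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Elem(NamedTuple):
    name: str
    start: int
    end: int


class TemporaryStorage:
    """One stack per query node; each entry keeps its parent-stack top index."""

    def __init__(self, parent_of: dict[str, Optional[str]]):
        self.parent_of = parent_of
        inner = {p for p in parent_of.values() if p is not None}
        self.leaves = set(parent_of) - inner
        self.stacks: dict[str, list[tuple[Elem, int]]] = {q: [] for q in parent_of}

    def _clean(self, q: str, e: Elem) -> None:
        """Pop entries whose region closed before e opened."""
        stack = self.stacks[q]
        while stack and stack[-1][0].end < e.start:
            stack.pop()

    def process(self, q: str, e: Elem) -> list[tuple[str, ...]]:
        """Push e for query node q; return path solutions if q is a leaf."""
        p = self.parent_of[q]
        if p is not None:
            self._clean(p, e)
            if not self.stacks[p]:
                return []  # no ancestor in storage, so e is not pushed
        self._clean(q, e)
        ptr = len(self.stacks[p]) - 1 if p is not None else -1
        self.stacks[q].append((e, ptr))
        if q not in self.leaves:
            return []
        paths = self._paths(q, len(self.stacks[q]) - 1)
        self.stacks[q].pop()  # a leaf entry is not needed after output
        return paths

    def _paths(self, q: str, idx: int) -> list[tuple[str, ...]]:
        """Expand stack entry idx of q into root-to-q paths (A-D edges only)."""
        e, ptr = self.stacks[q][idx]
        p = self.parent_of[q]
        if p is None:
            return [(e.name,)]
        return [prefix + (e.name,)
                for i in range(ptr + 1) for prefix in self._paths(p, i)]


# Query A//B//C; elements arrive in start order.
storage = TemporaryStorage({"A": None, "B": "A", "C": "B"})
arrivals = [("A", Elem("a1", 1, 20)), ("A", Elem("a2", 2, 15)),
            ("B", Elem("b1", 3, 10)), ("C", Elem("c1", 4, 5)),
            ("C", Elem("c2", 11, 12))]
for q, e in arrivals:
    for path in storage.process(q, e):
        print(path)  # ('a1', 'b1', 'c1') then ('a2', 'b1', 'c1')
```

c2 produces no output: by the time it arrives, b1 has closed, so the B stack is empty and c2 has no ancestor in storage.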
31
Flowchart of iTwigJoin
- Label the current head elements as subtree-matching, useless, or blocked.
- If a useless element is found, discard the useless elements.
- If not all streams have ended, select a subtree-matching or blocked element e.
- Pop some elements from the stacks.
- Push e to its stack, and output intermediate paths if e is a leaf.
35
Optimal classes of iTwigJoin for three streaming schemes
A more refined streaming scheme gives a larger optimal class:
- Tag streaming: A-D only patterns
- Tag+Level streaming: A-D only or P-C only patterns
- Prefix Path Streaming (PPS): A-D only or P-C only patterns, or 1-branch-node patterns
36
Experiments
Benchmarks:
- XMark: synthetic data
- TreeBank: real data from the Wall Street Journal
37
Experiments: I/O Performance
Queries: Tree1 (A-D only), Tree2 (P-C only), Tree3 (P-C only), Tree4 (1-branch-node), Tree5 (1-branch-node).
By pruning irrelevant streams, PPS usually scans the fewest elements.
38
Experiments: Number of Intermediate Paths
Queries: Tree1 (A-D only), Tree2 (P-C only), Tree3 (P-C only), Tree4 (1-branch-node), Tree5 (1-branch-node).
For query Tree5 on TreeBank there are no matching results, so Tag+Level and PPS do not output any intermediate paths.
39
Experiments: Running Time
Queries: XMark1 (path pattern), XMark2 (A-D only), XMark3 (P-C only), XMark4 (1-branch-node), XMark5 (non-optimal).
Tag+Level and PPS perform better than TwigStack and TwigStackList on the XMark data.
40
Experiments: Summary
- Both PPS and Tag+Level help to reduce I/O costs, and PPS saves more.
- PPS may produce too many streams for deep XML data; Tag+Level seems to be a good compromise.
- PPS and Tag+Level completely avoided outputting redundant intermediate paths in all cases we tested, although they cannot guarantee optimality in theory.
41
Conclusions
- We develop a general algorithm to perform holistic twig joins on the Tag+Level and PPS streaming schemes.
- We identify two I/O-optimal classes, for the Tag+Level and PPS streaming schemes respectively.
- Since our experiments show that the Tag+Level streaming scheme produces very few useless intermediate results in most cases, we recommend the Tag+Level scheme for efficient XML twig pattern matching.
42
Thank you! Q & A
43
Backup: iTwigJoin Algorithm
While not all streams have ended:
  - Label the current head elements as matching, useless, or blocked.
  - If any head element is useless, discard it and continue.
  - Let e1 be the matching element with the smallest startPos, and let e2 be the blocked element with the smallest endPos.
  - If e2.endPos < e1.startPos, let e be the blocked element with the smallest startPos; otherwise, let e be e1.
  - Advance the stream that e belongs to.
  - Pop out the elements from e's stack whose endPos < e.startPos.
  - If e has a parent/ancestor in the temporary storage system, push e into its stack.
  - If the tag of e is a leaf node in Q, output all paths involving e.
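A sketch of the selection step of the algorithm above, i.e. the choice between e1 and the blocked elements, assuming the head elements are already labelled (the `Head` type and `select_next` are hypothetical names). The slide does not say what happens when there is no matching element or no blocked element; the sketch assumes the remaining class decides, so when all heads are blocked it takes the blocked element with the smallest startPos, as in the earlier a1 example.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Head(NamedTuple):
    """Hypothetical view of a stream's current head element."""
    name: str
    label: str  # "matching", "useless" or "blocked"
    start: int
    end: int


def select_next(heads: list[Head]) -> Optional[Head]:
    """Choose the head element e to process next (useless heads are ignored)."""
    matching = [h for h in heads if h.label == "matching"]
    blocked = [h for h in heads if h.label == "blocked"]
    if not matching and not blocked:
        return None  # assumption: nothing left to select
    if not blocked:
        return min(matching, key=lambda h: h.start)  # e = e1
    first_blocked = min(blocked, key=lambda h: h.start)
    if not matching:
        return first_blocked  # assumption: all heads are blocked
    e1 = min(matching, key=lambda h: h.start)  # matching, smallest startPos
    e2 = min(blocked, key=lambda h: h.end)     # blocked, smallest endPos
    # Backup-slide rule: if e2 closes before e1 opens, take the blocked
    # element with the smallest startPos; otherwise take e1.
    return first_blocked if e2.end < e1.start else e1


heads = [Head("a1", "blocked", 1, 20), Head("c1", "blocked", 2, 3),
         Head("d1", "matching", 5, 6)]
print(select_next(heads).name)  # c1 ends (3) before d1 starts (5), so a1
```

If no blocked element closes before e1 opens, e1 is processed first, so subtree-matching elements take priority whenever that is safe.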