Ting Chen, Jiaheng Lu, Tok Wang Ling

Presentation on theme: "Ting Chen, Jiaheng Lu, Tok Wang Ling"— Presentation transcript:

1 On Boosting Holism in XML Twig Pattern Matching Using Structural Indexing Techniques
Ting Chen, Jiaheng Lu, Tok Wang Ling

2 Outline
Background
  XML Twig Pattern Query
  Previous Twig Join algorithms
  Limit of the original holistic method TwigStack
Our holistic Twig Pattern Matching algorithms
  Two Refined Indexing Schemes: Tag+Level and PPS
  A generalized holistic matching theory
  iTwigJoin: a generalized holistic matching algorithm
Experiments
Conclusion

3 Background: XML and Region coding
An XML document is modeled as a tree in our work.
Region coding for the XML document tree: each element carries a <start, end, level> label.
Containment property: a.start < b.start AND a.end > b.end if and only if a is an ancestor of b.
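The region coding above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Node` class, the single shared counter, and the helper names `assign_regions`, `is_ancestor`, and `is_parent` are assumptions made for the example. The parent test (containment plus a level difference of one) is the usual way the level field is used.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical in-memory tree node; only the tag and children are given.
    tag: str
    children: list = field(default_factory=list)
    start: int = 0
    end: int = 0
    level: int = 0


def assign_regions(root):
    """Label every node with (start, end, level) in one depth-first pass."""
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        node.start = counter          # position when the element opens
        node.level = level
        for child in node.children:
            visit(child, level + 1)
        counter += 1
        node.end = counter            # position when the element closes

    visit(root, 1)
    return root


def is_ancestor(a, b):
    # Containment property from the slide.
    return a.start < b.start and a.end > b.end


def is_parent(a, b):
    # Assumption: a parent is an ancestor exactly one level above.
    return is_ancestor(a, b) and a.level + 1 == b.level
```

With these labels, structural relationships can be decided from two labels alone, without walking the tree.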

4 Background: XML twig pattern queries
An XML twig query is a small tree whose edges represent parent-child or ancestor-descendant relationships.
Given an XML document D and an XML twig query Q, our problem is to find all occurrences of Q in D.
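To make the problem statement concrete, here is a brute-force sketch that enumerates all occurrences of a twig query over region-coded elements. It is deliberately naive and is not one of the join algorithms discussed in this talk. The names `Elem`, `QNode`, `related`, and `twig_matches` are hypothetical, and the sketch assumes every query node has a distinct tag so that a match can be written as a tag-to-element mapping.

```python
from collections import namedtuple
from itertools import product

# Hypothetical element record: a name for display plus its region code.
Elem = namedtuple("Elem", "name tag start end level")


class QNode:
    """Query node; `axis` is the edge to its parent: "/" (P-C) or "//" (A-D)."""

    def __init__(self, tag, axis="//", children=()):
        self.tag, self.axis, self.children = tag, axis, list(children)


def related(anc, desc, axis):
    contained = anc.start < desc.start and anc.end > desc.end
    if axis == "/":
        return contained and anc.level + 1 == desc.level
    return contained


def matches_below(q, e, elems):
    """All matches of the sub-twig rooted at q when q is bound to element e."""
    per_child = []
    for c in q.children:
        options = []
        for d in elems:
            if d.tag == c.tag and related(e, d, c.axis):
                options.extend(matches_below(c, d, elems))
        per_child.append(options)
    results = []
    # One match per combination of child matches; none if any child has none.
    for combo in product(*per_child):
        match = {q.tag: e.name}
        for part in combo:
            match.update(part)
        results.append(match)
    return results


def twig_matches(q, elems):
    out = []
    for e in elems:
        if e.tag == q.tag:
            out.extend(matches_below(q, e, elems))
    return out
```

For a twig with root A and two descendant branches B and C, only A elements that have both a B and a C descendant produce matches, even when many more A elements have a B descendant alone.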

5 Previous XML Twig Join algorithms
Techniques:
Edge based: Binary Structural Join [Al-Khalifa et al ICDE02]; Join Order Selection [Wu et al ICDE03]
Path based: BLAS [Chen et al SIGMOD04]
Tree (holistic) based: TwigStack [Bruno et al SIGMOD02]; TwigStackList [Lu et al CIKM04]
Index based: B tree [Chien et al VLDB02]; XR tree [Jiang et al ICDE02]; TSGeneric+ [Jiang et al VLDB03]

6 Holistic Twig Matching
TwigStack [Bruno et al SIGMOD02] is a holistic twig join algorithm.
E.g., for query A[.//C]//B there may be many matches to A//B alone, but TwigStack only outputs results for A elements that have both B and C descendants.
No join order selection is required.
TwigStack is optimal only for twig patterns whose edges are all ancestor-descendant.
Reordering of elements in a stream does not help [Choi et al DEXA03].

7 Sub-optimality of TwigStack
TwigStack is not optimal for twigs with parent-child edges.
[Figure: a document tree over elements a1 … an, b1 … bn, c1 … cn, and a query twig over tags A, B, C.]

8 Two Refined Streaming Schemes(1)
To enlarge the class of queries for which holistic matching is optimal, our paper proposes two refined streaming schemes.
Tag+Level: elements with the same tag and level are grouped together.
[Figure: the same document partitioned into Tag+Level streams, and the query twig over tags A, B, C.]
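A Tag+Level partition can be sketched as a grouping by the pair (tag, level). This is only an illustration: the function name `tag_level_streams` and the tuple layout `(name, tag, start, end, level)` are assumptions for the example, and each stream is kept in document order by sorting on the start position.

```python
from collections import defaultdict


def tag_level_streams(elements):
    """Group elements into one stream per (tag, level), in document order.

    `elements` is an iterable of (name, tag, start, end, level) tuples;
    this layout is an assumption made for the sketch.
    """
    streams = defaultdict(list)
    for name, tag, start, end, level in sorted(elements, key=lambda e: e[2]):
        streams[(tag, level)].append(name)
    return dict(streams)
```

Plain tag streaming would put all A elements in one stream regardless of depth. Here an A element at level 1 and an A element at level 2 land in different streams, so a parent-child edge only needs to pair streams whose levels differ by one.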

9 Two Refined Streaming Schemes(1)
For this query, the Tag+Level streaming scheme can guarantee optimality.
[Figure: the same document, Tag+Level streams, and query as on the previous slide.]

10 Two Refined Streaming Schemes(1)
But given a more complex query and document, Tag+Level cannot guarantee optimality. For example:
[Figure: a document tree over elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1, and a query twig over tags A, B, C, D.]

11 Two Refined Streaming Schemes(2)
Prefix Path Streaming (PPS): elements with the same root-to-node path are grouped together.
In this example, every element in the document is stored as an individual stream.
[Figure: document D over elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1, with one stream per element.]
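Prefix Path Streaming can be sketched as a grouping keyed by the tag path from the root to each element. The function name `pps_streams` and the nested-tuple tree layout `(name, tag, children)` are assumptions for this example. A preorder walk visits elements in document order, so each stream comes out ordered.

```python
from collections import defaultdict


def pps_streams(tree):
    """Group elements by their root-to-node tag path.

    `tree` is a nested tuple (name, tag, children); this layout is an
    assumption made for the sketch.
    """
    streams = defaultdict(list)

    def visit(node, prefix):
        name, tag, children = node
        path = prefix + (tag,)
        streams[path].append(name)     # preorder = document order
        for child in children:
            visit(child, path)

    visit(tree, ())
    return dict(streams)
```

Two elements share a stream only when their whole tag paths agree, which is a finer partition than Tag+Level: the same tag at the same level under different ancestors yields different streams.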

12 Two Refined Streaming Schemes(2)
PPS is optimal for the following example: d1, d2, c1, and c2 are separated into different streams.
[Figure: the same document tree and the query twig over tags A, B, C, D.]

13 Two Refined Streaming Schemes(2)
A natural question: can PPS be guaranteed optimal for all queries and data?

14 Two Refined Streaming Schemes(2)
The answer is NO. For example, c1 and c2 are in the same stream; similarly, e1 and e2 are in the same stream.
[Figure: a document tree over elements a1 … a4, b1 … b5, c1, c2, d1, d2, e1, e2 with head elements marked, and a query twig over tags A, B, C, D, E.]

15 A general algorithm: iTwigJoin
We propose a general algorithm, called iTwigJoin, which can be used with various data streaming schemes.
Our key idea is to classify all current head elements into three classes:
  Subtree-matching
  Useless
  Blocked

16 Classifying Head Elements
Subtree-Matching Element: element e of tag E is called a subtree-matching element for query Q if e is in a match to QE (QE is the sub-tree of Q rooted at E) and is NOT in any future match to QP, where P is the parent of E in Q.
Useless Element: element e is called a useless element if e is not in any future match to QE.
Blocked Element: an element which is neither subtree-matching nor useless.

17-25 Example: Classifying Head Elements (Tag+Level Streaming)
[Figure: a document tree over elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1, its Tag+Level streams with the head elements marked, and two query twigs Q1 and Q2 over tags A, B, C, D.]
Slides 17 to 25 fill in the classification of the head elements one step at a time. The completed classification is:
Q1: Subtree-matching: (none); Useless: (none); Blocked: d1, c1, a1, a2, b2, b1
Q2: Subtree-matching: d1; Useless: a1, b2; Blocked: c1, b1, a2

26 Example: Classifying Head Elements (Tag+Level Streaming)
[Figure: the same document and the two query twigs Q1 and Q2 over tags A, B, C, D.]
Q1: Subtree-matching: (none); Useless: (none); Blocked: a1, a2, b1, b2, c1, d1
Q2: Subtree-matching: d1; Useless: a1, b2; Blocked: a2, b1, c1
A useless element can be discarded safely.
A subtree-matching element is pushed to the corresponding stack.
A blocked element causes problems:
  It CANNOT be discarded, because that may cause loss of results.
  It CANNOT be pushed to the stack, because that may cause useless results.
When all head elements are blocked, optimal holistic matching CANNOT be guaranteed.

27-29 iTwigJoin
In our algorithm, in order to output all correct answers, we push blocked elements onto the stack, which may produce useless intermediate results in some cases.
Example (Tag+Level Streaming, query Q1): since all head elements are blocked, we have to push a1 onto the stack and output one path solution (a1,d1). If there is no c2, then (a1,d1) is a useless path solution.
[Figure: the document tree over elements a1, a2, b1, b2, c1, c2, d1, d2, d3, e1, and query twig Q1 over tags A, B, C, D.]

30 iTwigJoin Two Main Components
Stream Manager: controls the advance operation of the streams and sends elements to temporary storage.
Temporary Storage: pushes elements onto stacks and outputs intermediate paths.
[Figure: the Stream Manager feeding elements into the Temporary Storage, whose parts are labeled SA, SB, SC.]

31 Flowchart of iTwigJoin
While not all streams have ended:
  1. Label the current head elements as Subtree-matching, Useless, or Blocked.
  2. If a useless element is found, discard the useless elements and go back to step 1.
  3. Select a subtree-matching or blocked element e.
  4. Pop some elements from the stack.
  5. Push e onto the stack, and output intermediate paths if e is a leaf.

32-35 Optimal classes of iTwigJoin for three streaming schemes
Tag Streaming: A-D only patterns
Tag+Level Streaming: A-D/P-C only patterns
Prefix Path Streaming: A-D/P-C only or 1-branch-node patterns
The more refined the streaming scheme, the larger the optimal class: the A-D only class lies inside the A-D/P-C only class, which lies inside the A-D/P-C only or 1-branch-node class.

36 Experiments
Benchmarks:
XMark: synthetic data
Treebank: real data from the Wall Street Journal

37 Experiments: I/O Performance
Queries: Tree1: A-D only; Tree2: P-C only; Tree3: P-C only; Tree4: 1-branchnode; Tree5: 1-branchnode.
By pruning irrelevant streams, PPS usually scans the fewest elements.
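The slide does not spell out how a stream is judged irrelevant. One plausible reading, sketched here purely as an assumption, is that a PPS stream can be skipped for a query node when its root-to-node tag path cannot match the query's root-to-node path. The function names `path_matches` and `relevant_streams` and the step format `(axis, tag)` are hypothetical.

```python
def path_matches(steps, path):
    """Can the tag path `path` match the query path `steps`?

    `steps` is a list of (axis, tag) pairs, where axis is "/" or "//" and
    relates the step to the previous one (for the first step, to the
    document root). The last step must match the last tag in `path`.
    """
    def go(i, j):
        axis, tag = steps[i]
        if path[j] != tag:
            return False
        if i == 0:
            # "//" lets the first step sit at any depth; "/" pins it to the root.
            return axis == "//" or j == 0
        if axis == "/":
            return j >= 1 and go(i - 1, j - 1)
        return any(go(i - 1, k) for k in range(j))

    return bool(steps) and bool(path) and go(len(steps) - 1, len(path) - 1)


def relevant_streams(stream_paths, steps):
    # Keep only the prefix paths that could contribute to the query path.
    return [p for p in stream_paths if path_matches(steps, p)]
```

Under this reading, streams filtered out this way never need to be scanned, which is consistent with PPS reading fewer elements than coarser schemes.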

38 Experiments: Number of Intermediate Paths
Queries: Tree1: A-D only; Tree2: P-C only; Tree3: P-C only; Tree4: 1-branchnode; Tree5: 1-branchnode.
For Treebank query Tree5 there are no matching results, so Tag+Level and PPS do not output any intermediate results.

39 Experiments: Running Time
Queries: XMark1: path pattern; XMark2: A-D only; XMark3: P-C only; XMark4: 1-branchnode; XMark5: non-optimal.
Tag+Level and PPS perform better than TwigStack and TwigStackList on XMark data.

40 Experiments: Summary
Both PPS and Tag+Level help to reduce I/O costs, and PPS saves more.
PPS may result in too many streams for deep XML data; Tag+Level seems to be a good compromise.
PPS and Tag+Level completely avoid outputting redundant intermediate paths in all cases we tested, though they cannot guarantee optimality in theory.

41 Conclusions
We develop a general algorithm to perform holistic twig joins on the Tag+Level and PPS streaming schemes.
We identify two I/O-optimal classes for the Tag+Level and PPS streaming schemes.
Since our experiments show that the Tag+Level streaming scheme produces very few useless intermediate results in most cases, we recommend the Tag+Level scheme for efficient XML twig pattern matching.

42 END
Thank you! Q & A

43 Backup: iTwigJoin Algorithm
While (not all streams end):
    Label current head elements as either Matching, Useless or Blocked
    If any head element is Useless, discard it and continue
    Let e1 be the matching element with the smallest startPos
    Let e2 be the blocked element with the smallest endPos
    If e2.endPos < e1.startPos, let e be the blocked element with the smallest startPos; else let e be e1
    Advance the stream e belongs to
    Pop out elements from e's stack whose endPos < e.startPos
    If e has a parent/ancestor in the temporary storage system, push e into its stack
    If the tag of e is a leaf node in Q, output all paths involving e
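Two steps of the loop above are mechanical enough to sketch: choosing the next element e from the labeled head elements, and cleaning a stack before a push. This is a partial illustration, not the full algorithm; the labeling step is not shown. The names `Elem`, `choose_next`, and `pop_finished` are hypothetical. The slide does not say what to do when there is no matching element, so falling back to the blocked element with the smallest start position is an assumption.

```python
from collections import namedtuple

# Hypothetical element record carrying just the positions the rule needs.
Elem = namedtuple("Elem", "name start end")


def choose_next(matching, blocked):
    """Pick the next element e; at least one of the two lists must be non-empty."""
    e1 = min(matching, key=lambda e: e.start, default=None)
    e2 = min(blocked, key=lambda e: e.end, default=None)
    if e1 is None:
        # Assumption: with no matching element, take the earliest blocked one.
        return min(blocked, key=lambda e: e.start)
    if e2 is not None and e2.end < e1.start:
        # Some blocked element closes before e1 opens, so handle blocked first.
        return min(blocked, key=lambda e: e.start)
    return e1


def pop_finished(stack, e):
    """Pop elements that closed before e opens; they cannot be ancestors of e."""
    while stack and stack[-1].end < e.start:
        stack.pop()
    return stack
```

Because elements on one stack are nested, the top of the stack always has the smallest end position, so popping from the top until the condition fails removes exactly the elements that ended before e starts.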

