Download presentation
Presentation is loading. Please wait.
Published byDominick Farmer Modified over 9 years ago
1
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data by Tian Yu, Tok Wang Ling, Jiaheng Lu, Presented by: Tian Yu
2
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 2 Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates Notation and data structure A holistic matching algorithm: TwigStackList ¬ Experiments Conclusion
3
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 3 Introduction XML data representation rapidly increases popularity XML documents modeled as ordered trees. XML queries specify patterns of selection predicates on multiple elements having some structural relationships (parent-child, ancestor-descendant)
4
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 4 Introduction: NOT-twig Query Twig query: a small tree whose nodes are tags, attributes or text values and edges are either Parent-Child (P-C) edges or Ancestor-Descendant (A-D) edges. Twig query can have not-predicates Example: Normal-twig query (without not-predicate): Q1:suppliersDatabase/supplier[//store]//part Not-twig query (with not-predicates): Q2:suppliersDatabase/supplier[NOT(//store)]//part Intuitive meaning: selects all part elements which are descendent of supplier elements having no descendant element store.
5
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 5 Naïve Method for evaluating not-twig queries: Decompose the NOT-twig query into several normal-twig queries Evaluate the decomposed queries with existing method (twigStack or twigStackList) final result can be calculated ( using set operators ) based on the results of the individual decomposed quires. For example: Motivation: Q2: Two decomposed Queries:
6
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 6 Disadvantage of the naïve method: Not optimal. Many useless intermediate results are produced Additional disk I/O: Clearly many elements are accessed more than once. Our objective: Efficient processing of XML not-twig queries. Motivation:
7
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 7 Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates Notation and data structure A holistic matching algorithm: TwigStackList ¬ Experiments Conclusion
8
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 8 TwigStack [1]: a holistic approach to match twig query without not- predicates Two-phase algorithm: Phase 1: The intermediate root to leaf path solutions are outputted Phase 2: Merge the intermediate path solutions to get the final results TwigStack: optimal when the query contains only ancester-descendant relationship parent-child relationship: TwigStack may output some intermediate path solutions that cannot contribute to final solutions. Therefore, TwigStack is sub-optimal for queries with parent-child relationships Previous work: TwigStack [1] 1. N. Bruno, D. Srivastava, and N. Koudas. “Holistic twig joins: optimal xml pattern matching.” In Proceedings of ACM SIGMOD, 2002.
9
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 9 The main problem of TwigStack is to assume all edges are ancestor-descendant relationship in the first phase. So it is not efficient for queries with parent-child relationships. Improved method: TwigStackList [2] There is an additional list structure for each query node to cache elements that likely participate in final solutions. TwigStackList is optimal when there is no P-C edge for branching nodes, therefore, it is more efficient than TwigStack. 2. J. Lu, T. Chen, and T. W. Ling. “Efficient processing of xml twig patterns with parent child edges: a look-ahead approach.” In Proceedings of CIKM, pages 533- 542, 2004. Previous work: TwigStackList [2]
10
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 10 Previous work: Query Predicates[3][4][5] XML Query Predicate OR-predicates [3] Ordered-predicates [4] not-predicates on XPath [5] 3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceeding of SIGMOD, pages 59–70, 2004. 4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In Proceeding of DEXA, 2005. 5. E. Jiao, T.W. Ling, C. Y. Chan, and P. S. Yu. “Pathstack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005.
11
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 11 Previous work: Query Predicates[3][4][5] OR-predicate Define OR-block AND/OR-twig can be viewed as an AND-twig with elements and OR-blocks. Ordered XML query matching Use look-ahead list to check for order Support Four axis: following-sibling, preceding-sibling, following, and preceding 3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceeding of SIGMOD, pages 59–70, 2004. 4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In Proceeding of DEXA, 2005. a dc > b
12
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 12 Previous work: not-Predicates on XPath [5] It evaluating path queries with not-predicates, based on PathStack algorithm Define: non-output node, output node in a query non-output node: node that appear below a negative edge output node: node that does not appear below any negative edge XPath query with NOT-predicate Each query node is also associated with a stack which is either a regular stack or a boolean stack Check NOT-predicate by updating the boolean value Cannot match not-twig queries 5. E. Jiao, T.W. Ling, C. Y. Chan, and P. S. Yu. “Pathstack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005. Output node: A, B Non-Output node: C, D
13
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 13 Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates Notation and data structure A holistic matching algorithm: TwigStackList ¬ Experiments Conclusion
14
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 14 Data model An XML document is modeled as a rooted, ordered and tagged tree Labelling scheme: Region Coding (startPos, endPos, LevelNum) Given e 1, e 2 : e 1 is ancestor of e 2 : iff e 1.start e 2. end e 1 is parent of e 2 : iff e 1.start e 2. end, and e 1.level + 1= e 2. level “…” book preface chapter sectiontitle “Data..” “Intro..” “…” (1,123,1) (2,4,2) (3,3,3) (13,24,2) (5,12,2) (9,11,3) (6,8,3) (7,7,4) (10,10,4) sectiontitle “Data..” “…” (17,19,3)(14,16,3) (15,15,4)(18,18,4) e1e1 e2e2 e2e2 section … …chapter
15
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 15 Proposed Subquery Matching Method: Positive/negative children of a query node: Positive children of A: C, D, E Negative children of A: B, F NOT twig query
16
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 16 Proposed Subquery Matching Method: Given a NOT-twig query (with a query node n) and an XML data tree (Data), we say that an element e n (with the tag n) in the XML data tree satisfies the sub-query rooted at n iff: (1) n is a leaf node of NOT-query; OR (2) For each child node m of n: (Four cases) (case i)If m is a positive PC child node of n, there is an element e m in Data such that e m is a child element of e n and satisfies the sub-query rooted at m in Data. (case ii)If m is a positive AD child node of n, there is an element e m in Data such that e m is a descendant element of e n and satisfies the sub-query rooted at m in Data. n m Query e n e m Data Case 1 … m Query e n e m Data Case 2 n
17
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 17 Proposed Subquery Matching Method: (case iii)If m is a negative PC child node of n, there does not exist any element e m in Data such that e m is a child element of e n and satisfies the sub-query rooted at m in Data. (case iv)If m is a negative AD child node of n, there does not exist any element e m in Data such that e m is a descendant element of e n and satisfies the sub query rooted at m in Data. n m Query e n e m Data Case 1 n e m n e m e … … m Query e n e m Data Case 2 n m QueryData Case 3 n Query Case 4 n m e Data
18
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 18 Leaf nodes: B 1, B 2 and D 1 has satisfy the subqueries rooted at B and D respectively. C 1 satisfies subquery rooted at C, since D 1 is a descendant of C 1 Since C 1 satisfies the subquery rooted at C, we can safely say A 1 does not satisfy subquery rooted at A. It is because C 1 is a child of A 1 and in the NOT-twig, node C is a negative child of node A. A 2 satisfies subquery rooted at A since A 2 has a descendent B 2 satisfies subquery rooted at B, and does not have any child with the tag C. Subquery Matching: Example
19
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 19 Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates Notation and data structure A holistic matching algorithm: TwigStackList ¬ Experiments Conclusion
20
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 20 TwigStackList ¬ Algorithm: Data structure Each node n in the twig query has: Stream, List, and Stack Data Stream: T n we partition an XML document into streams All elements in a stream are of the same tag and ordered by their start Position The elements in each stream is read only once from head to tail. a 1, a 2, a 3 b 1, b 2 C 1, C 2 d 1, d 2, d 3 a dcb TaTa TbTb TcTc TdTd Document 2: 3: a1a1 a2a2 a3a3 b2b2 d2d2 b1b1 d3d3 c2c2 d1d1 c1c1 4: Level 1:
21
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 21 TwigStackList ¬ Algorithm: Data structure Each node with tag name n in the twig query has: Stream, List, and Stack Data Stream: T n List: L n The elements in lists help to check for P-C relationship and negative edges Stack: S n Stacks is used for each output node Store elements that are the potential solutions of the XML query. The stack size if bounded by the document root to leaf depth. a 1, a 2 … b 1.. C 1.. d 1,d 3.. a dcb LaLa LbLb LcLc LdLd SaSa SbSb
22
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 22 Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates Notation and data structure A holistic matching algorithm: TwigStackList ¬ Experiments Conclusion
23
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 23 Document: Query: A BC A1A1 B1B1 C1C1 B 1, B 2 A D C1C1 B2B2 D1D1 D1D1 A2A2 TwigStackList¬ Algorithm: An Example A 1, A 2 T B T C T D T
24
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 24 Document: Query: A BC A1A1 B1B1 C1C1 B 1, B 2 D C1C1 B2B2 D1D1 D1D1 A2A2 Create Stacks of every output nodes Next Action: TwigStackList¬ Algorithm: An Example A 1, A 2 A T B T C T D T
25
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 25 Document: Query: A BC A1A1 B1B1 C1C1 B 1, B 2 D C1C1 B2B2 D1D1 D1D1 A2A2 Create lists of every query nodes Next Action: TwigStackList¬ Algorithm: An Example A 1, A 2 A T B T C T D T
26
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 26 Document: Query: A BC A1A1 B1B1 C1C1 B 1, B 2 D C1C1 B2B2 D1D1 D1D1 A2A2 Push C 1 into list of C Next Action: TwigStackList¬ Algorithm: An Example A 1, A 2 C1C1 A T B T C T D T C1 has a child D1
27
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 27 Document: Query: A BC A1A1 B1B1 C1C1 B 1, B 2 D C1C1 B2B2 D1D1 D1D1 A2A2 Push A 1 into list of A Next Action: TwigStackList¬ Algorithm: An Example A 1, A 2 C1C1 A1A1 A T B T C T D T A1 has descendant B1
28
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 28 Document: Query: A BC A1A1 B1B1 C1C1 B 1, B 2 D C1C1 B2B2 D1D1 D1D1 A2A2 Remove A 1 from list of A Next Action: TwigStackList¬ Algorithm: An Example A 1, A 2 C1C1 A1A1 A T B T C T D T C 1 is a child of A 1, thus A 1 does not satisfy the negative edge of the query between A and C.
29
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 29 Document: Query: A BC A1A1 B1B1 C1C1 B 1, B 2 D C1C1 B2B2 D1D1 D1D1 A2A2 Advance A Next Action: TwigStackList¬ Algorithm: An Example A 1, A 2 C1C1 A T B T C T D T
30
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 30 Document: Query: A BC A1A1 B1B1 C1C1 B 1, B 2 D C1C1 B2B2 D1D1 D1D1 A2A2 Advance B Next Action: TwigStackList¬ Algorithm: An Example A 1, A 2 C1C1 A T B T C T D T
31
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 31 Document: Query: A BC A1A1 B1B1 C1C1 B 1, B 2 D C1C1 B2B2 D1D1 D1D1 A2A2 Advance C, reach end of stream Next Action: TwigStackList¬ Algorithm: An Example A 1, A 2 C1C1 A T B T C T D T
32
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 32 Document: Query: A BC A1A1 B1B1 C1C1 B 1, B 2 D C1C1 B2B2 D1D1 D1D1 A2A2 Push A 2 into list of A Next Action: TwigStackList¬ Algorithm: An Example A 1, A 2 A2A2 A 2 has descendant B 2 C1C1 A T B T C T D T
33
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 33 Document: Query: A BC A1A1 B1B1 C1C1 B 1, B 2 D C1C1 B2B2 D1D1 D1D1 A2A2 Final Solution: A 2, B 2 Next Action: TwigStackList¬ Algorithm: An Example A 1, A 2 A2A2 A 2 has descendant B 2 and no child with tag C C1C1 A T B T C T D T
34
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 34 TwigStackList¬ Algorithm: Optimality TwigStack is optimal for A-D only relationship. TwigStackList is optimal when no branching node has P-C relationship. TwigStackList ¬ is optimal if a branching node doesn’t have more than one positive relationships among which at least one is P-C edge. Negative P-C for Branch node A-D only A-D for branching node
35
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 35 Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates Notation and data structure A holistic matching algorithm: TwigStackList ¬ Experiments Conclusion
36
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 36 TwigStackList¬ Algorithm: Experiments Algorithms for comparison: na Ï ve -TwigStack (NTS) na Ï ve-TwigStackList (NTSL) Our proposed TwigStackList¬ Benchmarks XMark: Synthetic Data Treebank: Real Data from Wall Street Journal Random Data set: random uniformly distributed data Evaluation metrics Number of intermediate path solutions Total running time
37
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 37 Testing Queires for TwigStackList ¬ : Q(a)-Q(e) TreeBank, Q(f)-Q(h) Xmark, Q(i)-Q(k) Random Dataset TwigStackList¬ Algorithm: Experiments
38
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 38 TwigStackList¬ has the smallest intermediate results NTS, NTSL match more than one Twig Patterns TwigStackList¬ Algorithm: Intermediate result
39
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 39 Treebank and random data: TwigStackList¬ has the smallest running time 1.They output more intermediate results. 2.They match more than one twig queries. TwigStackList¬ Algorithm: Execution time Treebank testing result Random dataset testing result
40
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 40 TwigStackList¬ Algorithm: Execution time XMark QueriesXMark testing result XMark: TwigStackList¬ has the smallest running time Query Q(f), Q(g), Q(h) have the same query nodes and structure, but the number of not-predicates is increasing. NTS and NTSL execution time increases linearly, because the number of decomposed queries is increasing. TwigStackList¬ execution time is almost constant.
41
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 41 Outline Introduction and motivation Related Work Preliminaries Match Twig Query with Not-predicates Notation and data structure A holistic matching algorithm: TwigStackList ¬ Experiments Conclusion
42
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 42 Conclusions We developed a new algorithm TwigStackList ¬ to match Twig Pattern with not-predicates. Our algorithm can identify a larger query class to guarantee I/O optimally Experimental results showed the effectiveness and efficiency of our algorithm
43
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 43 END Thank you! Q & A
44
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 44 Reference 1. N. Bruno, D. Srivastava, and N. Koudas. “Holistic twig joins: optimal xml pattern matching.” In Proceedings of ACM SIGMOD, 2002. 2. J. Lu, T. Chen, and T. W. Ling. “Efficient processing of XML twig patterns with parent child edges: a look-ahead approach.” In Proceedings of CIKM, pages 533- 542, 2004. 3. H. Jiang, H. Lu, and W. Wang. “Efficient processing of twig queries with OR-predicates.” In Proceeding of SIGMOD, pages 59–70, 2004. 4. J. Lu, T. W. Ling, T. Yu, C. Li, and W. Ni. “Efficient processing of ordered XML twig pattern.” In Proceeding of DEXA, 2005. 5. E. Jiao, T.W. Ling, C. Y. Chan, and P. S. Yu. “Pathstack¬: A holistic path join algorithm for path query with not-predicates on XML data.” In Proceedings of DASFAA, 2005.
45
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 45 1. S. Al-Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu, “Structural Joins: A Primitive for Efficient XML Query Pattern Matching.” In Proceedings of ICDE Conf. 2002. Previous work: binary structural Join TreeMerge and Stack-Merge [1]: Break into binary relationship branches Stitch the branches together A novel stack-based binary join algorithm Disadvantage: large intermediate results Appendix (1)
46
TwigStackList¬: A Holistic Twig Join Algorithm for Twig Query with Not-predicates on XML Data 46 Previous work: TwigStackList v.s. TwigStack TwigStack output the it output the “uesless” intermediate path solution, since it doesn’t check for parent-child relationsihp. TwigStackList has no uesless intermediate output. is not in the output. Twig Pattern s1s1 p1p1 section title paragraph figure p3p3 f1f1 t1t1 An XML tree t2 s2s2 p2p2 t3t3 f2f2 Root s1s1 t1t1 No Parent-child relationship for branching node Appendix (2)
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.