
VLDB'02, Aug 20
Efficient Structural Joins on Indexed XML Documents
Shu-Yao Chien, Zografoula Vagena, Donghui Zhang, Vassilis J. Tsotras, Carlo Zaniolo
VLDB 2002

Overview
- Motivation
- Problem Description
- Structural Joins
- Structural Joins using B+-trees
- Structural Joins using R-trees
- Problem Variations
- Experimental Results

Motivation (1)
Query languages for XML select documents both by their structure and by the values of their elements.
Example (path-expression query): section[title="Overview"]//figure[caption="R-tree"]

Motivation (2)
When an XML document is combined with a numbering scheme, path-expression queries require the computation of structural joins.
Numbering schemes: each node is assigned a unique interval, and the interval of a parent node contains the intervals of all its children.
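
A minimal sketch of one such numbering scheme, assuming the common "counter on entry and on exit" variant: a depth-first traversal gives each element a start number when it is entered and an end number when it is left, so every parent interval encloses its children's intervals. The slides do not say which concrete scheme the paper uses; the function name `number_intervals` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

Entry = Tuple[str, int, int]  # (tag, start, end)


def number_intervals(root: ET.Element) -> List[Entry]:
    """Assign (start, end) to every element; results come back in preorder (start order)."""
    result: List[Optional[Entry]] = []
    counter = 0

    def visit(elem: ET.Element) -> None:
        nonlocal counter
        counter += 1
        start = counter
        slot = len(result)
        result.append(None)          # reserve this element's preorder position
        for child in elem:
            visit(child)
        counter += 1                 # end number is taken after all descendants
        result[slot] = (elem.tag, start, counter)

    visit(root)
    return [e for e in result if e is not None]


doc = ET.fromstring(
    "<book><section><title/><figure/></section><section><figure/></section></book>"
)
for tag, start, end in number_intervals(doc):
    print(tag, start, end)
```

For this document the first section gets (2, 7) and its figure gets (5, 6), so containment of intervals mirrors the parent-child nesting.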

Motivation (2)
From path expressions to structural joins: two nodes qualify for an ancestor-descendant (//) step of a path expression if one is an ancestor of the other. With interval numbering, this is equivalent to interval containment.

Problem Description
Structural join: let A and D be two lists containing the instances of two particular tags in an XML document. Join A and D using their containment (ancestor-descendant) relationships as the join condition.
[Al-Khalifa et al. 2002] proposed non-indexed structural join algorithms. We extend their algorithms to take advantage of existing indices on the two lists.
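
The definition can be read directly as a nested-loop join. This sketch assumes each element is represented only by its (start, end) pair; it is a correctness baseline, not an efficient algorithm (it performs |A|·|D| containment tests). The names `contains` and `naive_structural_join` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) from an interval numbering scheme


def contains(a: Interval, d: Interval) -> bool:
    """True if a's interval strictly encloses d's, i.e. a is an ancestor of d."""
    return a[0] < d[0] and d[1] < a[1]


def naive_structural_join(A: List[Interval], D: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """All (ancestor, descendant) pairs, grouped by descendant in D order."""
    return [(a, d) for d in D for a in A if contains(a, d)]


A = [(1, 20), (2, 5), (6, 15)]
D = [(3, 4), (7, 8), (16, 17), (21, 22)]
print(naive_structural_join(A, D))
```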

Structural Joins, no indices (A and D sorted by start; when A is exhausted, a.start is treated as +∞)
1. Let a, d be the first elements of A and D
2. while (D is not exhausted) do
3.   if (stack is not empty and a.start > stack.top.end and d.start > stack.top.end) then
4.     stack.pop()
5.   else if (a.start < d.start) then
6.     stack.push(a)
7.     let a be the next element in A
8.   else
9.     output d as a descendant of all elements in stack
10.    let d be the next element in D
11.  endif
12. endwhile
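
A runnable Python rendering of the stack-based join on this slide, assuming A and D are lists of (start, end) pairs sorted by start, and that an exhausted A behaves as if a.start were +∞. The pop test compares both list heads against the end of the interval on top of the stack. This follows the slide's pseudocode and is not claimed to be the exact code of [Al-Khalifa et al. 2002]; the function name is hypothetical.

```python
import math
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end)


def stack_tree_join(A: List[Interval], D: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Single merge-like pass over A and D; the stack holds a chain of nested ancestors."""
    out: List[Tuple[Interval, Interval]] = []
    stack: List[Interval] = []
    i = j = 0
    while j < len(D):                       # no output is possible once D is exhausted
        d = D[j]
        a_start = A[i][0] if i < len(A) else math.inf
        if stack and a_start > stack[-1][1] and d[0] > stack[-1][1]:
            stack.pop()                     # top contains neither list head
        elif a_start < d[0]:
            stack.append(A[i])              # a may contain d or a later descendant
            i += 1
        else:
            out.extend((anc, d) for anc in stack)  # every stacked element contains d
            j += 1
    return out


A = [(1, 20), (2, 5), (6, 15)]
D = [(3, 4), (7, 8), (16, 17), (21, 22)]
print(stack_tree_join(A, D))
```

Each element of A and D is pushed, popped or output at most once, so apart from the output size the pass is linear in |A| + |D|; the price is that every element is read, which is what the indexed variants try to avoid.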

Example
(figure: ancestor elements a1, a2, a3, a4 and descendant elements d1, d2, d3)

Structural Joins using B+-trees
Existing structural join algorithms scan the input lists sequentially. Durable numbering schemes have made it possible to index XML files with mainstream indices. Such indices can give sub-linear access time because they let the join skip elements that don't participate in it.

Motivation for using the B+-tree index (1)
(figure: ancestor elements a1 to a12 and descendant elements d1, d2)

Motivation for using the B+-tree index (2)
(figure: descendant elements d1 to d14 and ancestor elements a1, a2)

Structural Joins using B+-trees (A and D sorted and indexed on start; when A is exhausted, a.start is treated as +∞)
1. Put pointers a and d at the beginning of lists A and D
2. while (not at the end of D) do
3.   if (a is an ancestor of d) then
4.     Pop all stack elements that end before d.start; push a into stack; let a = a->next
5.   else if (a.start < d.start) then // a ends before d starts; jump ancestors A
6.     Move a forward by skipping the subtree of a, i.e. to the first A element with start > a.end
7.   else // a is after d; jump descendants D
8.     Pop all stack elements that end before d.start
9.     Join d with all elements in stack
10.    if (stack is empty) then move d forward by skipping all D elements with start < a.start
11.    else let d = d->next
12.  endif
13. endwhile
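
A Python sketch of the indexed join on this slide. Assumptions: sorted Python lists of start numbers searched with `bisect` stand in for the B+-tree's search on start (both find the first entry with start greater than a key in logarithmic time); elements are (start, end) pairs; and the function name is hypothetical. It illustrates the two kinds of skips and is not the paper's implementation.

```python
import bisect
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end)


def indexed_structural_join(A: List[Interval], D: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """A and D sorted by start; bisect over the start lists simulates B+-tree lookups."""
    a_starts = [s for s, _ in A]
    d_starts = [s for s, _ in D]
    out: List[Tuple[Interval, Interval]] = []
    stack: List[Interval] = []
    i = j = 0
    while j < len(D):
        d = D[j]
        a = A[i] if i < len(A) else None
        if a is not None and a[0] < d[0] and d[1] < a[1]:
            # a is an ancestor of d: drop stacked elements that end before d, then push a
            while stack and stack[-1][1] < d[0]:
                stack.pop()
            stack.append(a)
            i += 1
        elif a is not None and a[0] < d[0]:
            # a ends before d starts, so a and its whole subtree are skipped in one lookup
            i = bisect.bisect_right(a_starts, a[1], i)
        else:
            # a is after d (or A is exhausted): join d with the open ancestors
            while stack and stack[-1][1] < d[0]:
                stack.pop()
            out.extend((anc, d) for anc in stack)
            if stack:
                j += 1
            elif a is None:
                break                        # no remaining element can be an ancestor
            else:
                # no open ancestor: every d starting before a has no match, skip them
                j = bisect.bisect_right(d_starts, a[0], j)
    return out


print(indexed_structural_join([(1, 10), (2, 3), (4, 5), (11, 30), (12, 13)],
                              [(6, 7), (8, 9), (14, 15), (40, 41)]))
```

In the printed example, (2, 3) and (4, 5) are skipped as non-joining ancestors, and the final descendant (40, 41) ends the loop without being compared to anything.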

Containment forest
- A structure linking the elements that belong to the same tag.
- Each element corresponds to a node in the structure and is linked to other elements via parent, first-child and right-sibling pointers.
- It can be embedded within the associated B+-tree.
- It improves CPU time.
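
One way to build a containment forest for a single tag, assuming its (start, end) intervals are available: a stack-based pass in start order sets the parent, first-child and right-sibling pointers named on this slide. The helper `next_after_subtree` shows why these pointers help: it finds the first element after a node's subtree by following right-sibling and parent pointers, with no B+-tree search. The class and function names are hypothetical, linking roots as siblings is an assumption of this sketch, and the sample intervals are those of the containment forest example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    start: int
    end: int
    parent: Optional["Node"] = field(default=None, repr=False)
    first_child: Optional["Node"] = field(default=None, repr=False)
    right_sibling: Optional["Node"] = field(default=None, repr=False)


def build_forest(intervals: List[Tuple[int, int]]) -> List[Node]:
    """Build the containment forest of one tag; returns the roots in start order."""
    roots: List[Node] = []
    open_nodes: List[Node] = []          # chain of nodes that may still contain later ones
    last_child: Dict[int, Node] = {}     # id(parent) -> most recently attached child
    for start, end in sorted(intervals):
        node = Node(start, end)
        while open_nodes and open_nodes[-1].end < start:
            open_nodes.pop()             # these end before node starts
        if open_nodes:
            parent = open_nodes[-1]      # innermost enclosing interval
            node.parent = parent
            prev = last_child.get(id(parent))
            if prev is None:
                parent.first_child = node
            else:
                prev.right_sibling = node
            last_child[id(parent)] = node
        else:
            if roots:
                roots[-1].right_sibling = node   # assumption: roots are linked as siblings
            roots.append(node)
        open_nodes.append(node)
    return roots


def next_after_subtree(node: Optional[Node]) -> Optional[Node]:
    """First node after node's subtree in preorder, or None if there is none."""
    while node is not None:
        if node.right_sibling is not None:
            return node.right_sibling
        node = node.parent
    return None


example = [(150, 250), (10, 500), (800, 900), (1400, 2000),
           (300, 400), (830, 860), (1530, 1560), (1700, 1800)]
roots = build_forest(example)
print([(r.start, r.end) for r in roots])
```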

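Containment forest example
Elements of tag A: (10,500), (150,250), (300,400), (800,900), (830,860), (1400,2000), (1530,1560), (1700,1800)
Roots: (10,500), (800,900), (1400,2000). (150,250) and (300,400) are children of (10,500); (830,860) is a child of (800,900); (1530,1560) and (1700,1800) are children of (1400,2000).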

Containment forest properties
- The (start, end) interval of each node contains all intervals in its subtree.
- The start numbers in the forest follow a preorder traversal.
- The start (end) numbers of sibling nodes are in increasing order.
- The containment forest can be dynamically maintained, with efficient algorithms for element insertion and deletion.
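
The slide does not spell out the maintenance algorithms. As an illustration only, here is one simple way to preserve the properties above under updates, assuming the forest is stored as start-ordered child lists (the first list entry plays the role of the first child, the next entry the right sibling) and that every new interval is properly nested in or disjoint from existing ones. `TreeNode`, `insert` and `delete` are hypothetical names, and this is not claimed to be the paper's algorithm.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    start: int
    end: int
    children: List["TreeNode"] = field(default_factory=list)  # kept in start order


def insert(siblings: List[TreeNode], start: int, end: int) -> None:
    """Insert (start, end) into the forest whose top level is `siblings`."""
    for node in siblings:
        if node.start < start and end < node.end:
            insert(node.children, start, end)      # descend into the enclosing node
            return
    # siblings enclosed by the new interval become its children (still in start order)
    new = TreeNode(start, end, [n for n in siblings if start < n.start and n.end < end])
    kept = [n for n in siblings if not (start < n.start and n.end < end)]
    kept.append(new)
    kept.sort(key=lambda n: n.start)
    siblings[:] = kept


def delete(siblings: List[TreeNode], start: int) -> bool:
    """Remove the element with this start number; its children move up one level."""
    for i, node in enumerate(siblings):
        if node.start == start:
            siblings[i:i + 1] = node.children     # children fit exactly in node's slot
            return True
        if node.start < start < node.end:
            return delete(node.children, start)
    return False


forest: List[TreeNode] = []
for s, e in [(10, 500), (800, 900), (150, 250), (300, 400), (100, 450)]:
    insert(forest, s, e)
print([(n.start, n.end) for n in forest[0].children])  # (100, 450) now wraps the two others
```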

Structural Join using R-trees (1)
- The interval (start, end) of an element can be mapped to a point (e.start, e.end) in 2-D space, which is then indexed by an R-tree.
- An R-tree can also be used to index the element (start, end) ranges as 1-D intervals.
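
With the 2-D mapping, a structural join becomes a set of window queries: the ancestors of d are the points with start < d.start and end > d.end, and the descendants of a are the points with a.start < start and end < a.end. The sketch below assumes elements are (start, end) pairs, computes these windows, and answers them with a linear filter standing in for an R-tree window search. No R-tree library is used, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]                                    # (start, end) of an element
Window = Tuple[Tuple[float, float], Tuple[float, float]]   # (x range, y range), open bounds


def ancestor_window(d: Point) -> Window:
    """Region of the (start, end) plane holding the ancestors of d."""
    return (-math.inf, d[0]), (d[1], math.inf)


def descendant_window(a: Point) -> Window:
    """Region holding the descendants of a: both of their numbers fall strictly inside a."""
    return (a[0], a[1]), (a[0], a[1])


def window_query(points: List[Point], window: Window) -> List[Point]:
    """Linear scan standing in for an R-tree window search."""
    (x_lo, x_hi), (y_lo, y_hi) = window
    return [p for p in points if x_lo < p[0] < x_hi and y_lo < p[1] < y_hi]


A = [(1, 20), (2, 5), (6, 15)]
print(window_query(A, ancestor_window((7, 8))))
```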

Structural Join using R-trees (2)
(figure; surviving labels: "two points", "two pages")

Problem Variations
- Self joins: a non-indexed algorithm that traverses the element list exactly once.
- Structural join in a pipelining environment: feedback between modules can help skip elements that don't take part in the join.
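
For a self join (both sides drawn from the same tag, as in section//section), a single pass over the start-ordered list can suffice: each element is first output as a descendant of the elements still open on the stack and is then pushed as a potential ancestor. This is a sketch assuming (start, end) pairs and a hypothetical function name; the slide does not give the paper's algorithm.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end)


def self_structural_join(E: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """All (ancestor, descendant) pairs within one start-ordered list, in one traversal."""
    out: List[Tuple[Interval, Interval]] = []
    stack: List[Interval] = []
    for e in E:
        while stack and stack[-1][1] < e[0]:
            stack.pop()                            # closed before e starts
        out.extend((anc, e) for anc in stack)      # e as descendant of open elements
        stack.append(e)                            # e as potential ancestor
    return out


print(self_structural_join([(1, 20), (2, 5), (3, 4), (6, 15)]))
```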

Performance Analysis (1)
Effect of skipping only ancestors on join performance ("-" marks a cell that did not survive extraction)

Join ancestors | no-index | B+  | B+psp | B+sp | R*  | R*2
90%            | 182      | 180 | -     | 190  | 230 | 228
70%            | 150      | 149 | 150   | 155  | 198 | 196
55%            | 132      | 130 | -     | 140  | 176 | 178
40%            | 109      | 108 | -     | 114  | 160 | 156
25%            | 86       | 84  | -     | 90   | 132 | 130
10%            | 74       | 67  | -     | 70   | 122 | 119

Performance Analysis (2)
Effect of skipping only descendants on join performance

Performance Analysis (3)
Effect of skipping both ancestors and descendants

Performance Analysis (4)
Comparison of the B+-tree and B+psp algorithms

Conclusions
- We presented efficient ways to perform structural joins over XML data that utilize existing indices.
- Experimental results showed that, among the indexed approaches, the B+-tree with sibling pointers performs best.
- It is an easily maintainable solution that provides a drastic improvement over the no-index case.
