Structural Joins: A Primitive for Efficient XML Query Pattern Matching


1 Structural Joins: A Primitive for Efficient XML Query Pattern Matching
Al-Khalifa et al., ICDE 2002

2 Element Numbering (documentId, startpos:endpos, level)
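The slide gives only the triple. A minimal sketch of how it could be assigned, assuming that start and end positions come from one counter incremented on entering and again on leaving each element during a depth-first walk, and that the root is at level 1 (both are our assumptions); `Node`, `Region` and `number_tree` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A bare element: a tag and its child elements (hypothetical type)."""
    tag: str
    children: list[Node] = field(default_factory=list)


@dataclass(frozen=True)
class Region:
    """(documentId, startpos:endpos, level) for one element."""
    doc_id: int
    start: int
    end: int
    level: int
    tag: str


def number_tree(root: Node, doc_id: int = 1) -> list[Region]:
    """Number every element by a depth-first walk, sorted by start position.

    One counter is bumped on entering an element (start) and again on
    leaving it (end), so each descendant's interval nests strictly inside
    the intervals of all its ancestors.
    """
    regions: list[Region] = []
    counter = 0

    def visit(node: Node, level: int) -> None:
        nonlocal counter
        counter += 1
        start = counter
        for child in node.children:
            visit(child, level + 1)
        counter += 1
        regions.append(Region(doc_id, start, counter, level, node.tag))

    visit(root, 1)
    return sorted(regions, key=lambda r: r.start)


# A tree shaped like the one on slide 6.
tree = Node("r", [Node("a", [Node("b", [Node("c"), Node("c"), Node("c")])]),
                  Node("a", [Node("b", [Node("c"), Node("c")])])])
print([(r.start, r.end) for r in number_tree(tree)])
```

For this tree the sketch produces the intervals drawn on slide 6: 1,20; 2,11; 3,10; 4,5; 6,7; 8,9; 12,19; 13,18; 14,15; 16,17.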

3 Join Conditions Using Numbering
Given two elements (D1, S1:E1, L1) and (D2, S2:E2, L2):
Ancestor-Descendant: D1 = D2 and S1 < S2 < E2 < E1
Parent-Child: D1 = D2, S1 < S2 < E2 < E1 and L1 + 1 = L2
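Both join conditions translate directly into predicates. A minimal sketch; `Pos`, `is_ancestor` and `is_parent` are hypothetical names, and the sample levels assume the root is at level 1.

```python
from typing import NamedTuple


class Pos(NamedTuple):
    """(documentId, startpos:endpos, level) of one element."""
    doc_id: int
    start: int
    end: int
    level: int


def is_ancestor(a: Pos, d: Pos) -> bool:
    """Ancestor-Descendant: same document, d's interval strictly inside a's."""
    return a.doc_id == d.doc_id and a.start < d.start < d.end < a.end


def is_parent(a: Pos, d: Pos) -> bool:
    """Parent-Child: ancestor-descendant and exactly one level apart."""
    return is_ancestor(a, d) and a.level + 1 == d.level


# Three nodes of the slide 6 tree (levels assume the root is level 1).
root, inner, leaf = Pos(1, 1, 20, 1), Pos(1, 3, 10, 3), Pos(1, 6, 7, 4)
print(is_ancestor(root, leaf), is_parent(root, leaf), is_parent(inner, leaf))
# prints: True False True
```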

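4 From tree pattern to structural relationships: a query tree pattern is decomposed into binary ancestor-descendant and parent-child relationships between pairs of nodes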

5 Structural Join
Input: 2 element lists, ancestor and descendant (or parent and child), each sorted by start position
Output: pairs of ancestor/descendant (or parent/child) elements, sorted by the first or by the second element
2 algorithms presented: with and without stacks, both in a variant ordered by ancestor and a variant ordered by descendant

6 Example of results
[Figure: a tree whose nodes carry the intervals 1,20; 2,11; 12,19; 3,10; 13,18; 4,5; 6,7; 8,9; 14,15; 16,17, marking ancestor/descendant and parent/child pairs, with an "Interval representation" panel showing 1,20, 2,11 and 12,19]

7 Tree Merge Join ordered by ancestor (Tree-Merge-Anc)

8 Tree-Merge-Anc example
[Figure: tree with root 1,26 and children 2,3, 4,13, 14,15, 16,23, 24,25; 4,13 contains 5,12, which contains 6,7, 8,9, 10,11; 16,23 contains 17,22, which contains 18,19, 20,21. Ancestor list: 4,13, 5,12, 14,15, 16,23, 17,22. Descendant list: 2,3, 6,7, 8,9, 10,11, 18,19, 20,21, 24,25]
FOR each ancestor: skip descendants with START < ancestor.start, then check/output descendants until START > ancestor.end
Ancestor 4,13 skips 2,3; ancestor 14,15 finds no match
Results: [4,13+6,7] [4,13+8,9] [4,13+10,11] [5,12+6,7] [5,12+8,9] [5,12+10,11] …
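A minimal Python sketch of the loop on this slide, assuming both lists come from one document and are sorted by start. The lists are our reading of the slide's figure, the levels are assigned by us (root = level 1), and `Elem` and `tree_merge_anc` are hypothetical names. The optional `parent_child` flag adds the level test from slide 3.

```python
from typing import NamedTuple


class Elem(NamedTuple):
    start: int
    end: int
    level: int


def tree_merge_anc(alist, dlist, parent_child=False):
    """Tree-Merge-Anc sketch: output (a, d) pairs ordered by ancestor."""
    out = []
    begin = 0  # first descendant that can still match the current ancestor
    for a in alist:
        # Descendants starting before a.start cannot match a, nor any later
        # ancestor (those start even later), so they are skipped for good.
        while begin < len(dlist) and dlist[begin].start < a.start:
            begin += 1
        # Check/output descendants until one starts after a ends.
        i = begin
        while i < len(dlist) and dlist[i].start < a.end:
            d = dlist[i]
            if a.start < d.start and d.end < a.end and (
                    not parent_child or a.level + 1 == d.level):
                out.append((a, d))
            i += 1
    return out


# Slide 8's lists as read from the figure; levels assume root = level 1.
ALIST = [Elem(4, 13, 2), Elem(5, 12, 3), Elem(14, 15, 2), Elem(16, 23, 2), Elem(17, 22, 3)]
DLIST = [Elem(2, 3, 2), Elem(6, 7, 4), Elem(8, 9, 4), Elem(10, 11, 4),
         Elem(18, 19, 4), Elem(20, 21, 4), Elem(24, 25, 2)]

for a, d in tree_merge_anc(ALIST, DLIST)[:3]:
    print(f"[{a.start},{a.end}+{d.start},{d.end}]")
# prints [4,13+6,7], [4,13+8,9], [4,13+10,11] on separate lines
```

Because ancestors arrive in start order, the `begin` marker only moves forward; the inner scan, however, restarts from `begin` for every ancestor, which is where nested ancestors re-read the same descendants.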

9 Tree Merge Join ordered by descendant (Tree-Merge-Desc)

10 Tree-Merge-Desc example
[Figure: the same tree, ancestor list and descendant list as slide 8]
FOR each descendant: skip ancestors with END < descendant.start, then check/output ancestors until START > descendant.end
Descendant 2,3: no match
Results: [6,7+4,13] [6,7+5,12] [8,9+4,13] [8,9+5,12] …
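The descendant-ordered variant swaps the roles of the two lists. A minimal sketch under the same kind of assumptions: one document, both lists sorted by start, lists read from the slide 8 figure, levels assigned by us (root = level 1), and hypothetical names `Elem` and `tree_merge_desc`.

```python
from typing import NamedTuple


class Elem(NamedTuple):
    start: int
    end: int
    level: int


def tree_merge_desc(alist, dlist, parent_child=False):
    """Tree-Merge-Desc sketch: output (d, a) pairs ordered by descendant."""
    out = []
    begin = 0  # first ancestor that can still match the current descendant
    for d in dlist:
        # Ancestors ending before d.start also end before every later
        # descendant starts, so they are skipped for good.
        while begin < len(alist) and alist[begin].end < d.start:
            begin += 1
        # Check/output ancestors until one starts after d ends (the slide's
        # stopping rule).
        i = begin
        while i < len(alist) and alist[i].start < d.end:
            a = alist[i]
            if a.start < d.start and d.end < a.end and (
                    not parent_child or a.level + 1 == d.level):
                out.append((d, a))
            i += 1
    return out


ALIST = [Elem(4, 13, 2), Elem(5, 12, 3), Elem(14, 15, 2), Elem(16, 23, 2), Elem(17, 22, 3)]
DLIST = [Elem(2, 3, 2), Elem(6, 7, 4), Elem(8, 9, 4), Elem(10, 11, 4),
         Elem(18, 19, 4), Elem(20, 21, 4), Elem(24, 25, 2)]

for d, a in tree_merge_desc(ALIST, DLIST)[:4]:
    print(f"[{d.start},{d.end}+{a.start},{a.end}]")
# prints [6,7+4,13], [6,7+5,12], [8,9+4,13], [8,9+5,12] on separate lines
```

The skip loop stops at the first ancestor that has not yet ended, so a long-lived ancestor early in the list keeps every later ancestor in each descendant's scan range.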

11 Complexity
For ancestor-descendant relationships:
Tree-Merge-Anc has optimal time complexity, O(|Alist| + |Dlist| + |OutputList|): it may be quadratic, but only because it is proportional to the output size
But it can have poor I/O performance
For parent-child relationships:
Tree-Merge cost may still be quadratic, even though the output size can only be linear
Tree-Merge-Desc can take quadratic time even when the output size is only linear

12 Worst-Case Examples
a1 has the whole d list (d1 through d2n) as descendants, a2 has d2 through d(2n-1) as descendants, and so on
Which means practically quadratic performance: each ancestor has to check (almost) the whole descendant list
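To make the quadratic scan concrete, the sketch below builds one hypothetical instance matching this description (each ai also contains a(i+1), and its only children in the d list are its first and last d) and counts how many descendants Tree-Merge-Anc inspects for a parent-child join. The construction, the levels and the names `worst_case` and `tree_merge_anc_scans` are our assumptions, not taken from the slides.

```python
from typing import NamedTuple


class Elem(NamedTuple):
    start: int
    end: int
    level: int


def worst_case(n):
    """Hypothetical instance: a1 holds d1..d2n, a2 holds d2..d(2n-1), ..."""
    pos = 0
    starts, alist, dlist = [], [], []

    def leaf(level):
        nonlocal pos
        pos += 2
        dlist.append(Elem(pos - 1, pos, level))

    for i in range(n):            # open a(i+1), then its first child d(i+1)
        pos += 1
        starts.append(pos)
        leaf(i + 2)
    for i in reversed(range(n)):  # last child d(2n-i), then close a(i+1)
        leaf(i + 2)
        pos += 1
        alist.append(Elem(starts[i], pos, i + 1))
    return sorted(alist), sorted(dlist)


def tree_merge_anc_scans(alist, dlist):
    """Parent-child Tree-Merge-Anc; returns (pairs, descendants inspected)."""
    pairs, inspected, begin = [], 0, 0
    for a in alist:
        while begin < len(dlist) and dlist[begin].start < a.start:
            begin += 1
        i = begin
        while i < len(dlist) and dlist[i].start < a.end:
            d = dlist[i]
            inspected += 1
            if a.start < d.start and d.end < a.end and a.level + 1 == d.level:
                pairs.append((a, d))
            i += 1
    return pairs, inspected


for n in (10, 100):
    pairs, inspected = tree_merge_anc_scans(*worst_case(n))
    print(n, len(pairs), inspected)
# prints: 10 20 110, then 100 200 10100
```

Ancestor ai inspects 2n - 2i + 2 descendants, so the total is n(n + 1) inspections for only 2n parent-child pairs.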

13 Worst-Case Examples: the equivalent situation when considering Tree-Merge-Desc

14 Stack-Tree Algorithm
Basic idea: depth-first traversal of the XML tree
Linear time, with stack size = depth of the tree
All ancestor-descendant relationships appear on the stack during the traversal
Traverse the lists only once
Main problem: we do not want to traverse the whole database, just the nodes in the A-list and D-list

15 Stack-Tree-Desc

16 Stack-Tree-Desc example: print in order of descendants
Keep the ancestors that lie on the same path in a stack
When a descendant arrives, it is a descendant of every node on the stack, so print it paired with each of them
Pop from the stack when a different path is processed, e.g. when 14,15 arrives, both previous ancestors (4,13 and 5,12) are popped
[Figure: the tree and lists of slide 8, with snapshots of the stack]
Stack 4,13 / 5,12: descendant 6,7 gives [4,13+6,7] [5,12+6,7]; 8,9 is printed with the whole stack: [4,13+8,9] [5,12+8,9]
14,15 arrives: POP!! both and keep going
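A minimal sketch of Stack-Tree-Desc as described above, assuming one document, both lists sorted by start, the slide 8 lists with levels assigned by us (root = level 1), and hypothetical names `Elem` and `stack_tree_desc`. Instead of walking the whole tree, it merges the two lists in document order, which is the concern raised on slide 14.

```python
from typing import NamedTuple


class Elem(NamedTuple):
    start: int
    end: int
    level: int


def stack_tree_desc(alist, dlist, parent_child=False):
    """Stack-Tree-Desc sketch: output (a, d) pairs ordered by descendant."""
    out, stack = [], []
    ai = di = 0
    # Once descendants run out, or ancestors run out with an empty stack,
    # no further pairs are possible.
    while di < len(dlist) and (ai < len(alist) or stack):
        # Take whichever list's next element comes first in document order.
        if ai < len(alist) and alist[ai].start < dlist[di].start:
            nxt, is_anc = alist[ai], True
        else:
            nxt, is_anc = dlist[di], False
        # Pop ancestors that ended before nxt starts: a different path.
        while stack and stack[-1].end < nxt.start:
            stack.pop()
        if is_anc:
            stack.append(nxt)  # nested inside everything still on the stack
            ai += 1
        else:
            # nxt is a descendant of the whole stack: pair it with each node.
            for a in stack:
                if not parent_child or a.level + 1 == nxt.level:
                    out.append((a, nxt))
            di += 1
    return out


ALIST = [Elem(4, 13, 2), Elem(5, 12, 3), Elem(14, 15, 2), Elem(16, 23, 2), Elem(17, 22, 3)]
DLIST = [Elem(2, 3, 2), Elem(6, 7, 4), Elem(8, 9, 4), Elem(10, 11, 4),
         Elem(18, 19, 4), Elem(20, 21, 4), Elem(24, 25, 2)]

for a, d in stack_tree_desc(ALIST, DLIST)[:4]:
    print(f"[{a.start},{a.end}+{d.start},{d.end}]")
# prints [4,13+6,7], [5,12+6,7], [4,13+8,9], [5,12+8,9] on separate lines
```

Each element is pushed and popped at most once and every stack visit for a descendant produces a candidate pair, which is why the cost stays proportional to input plus output.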

17 Example of Stack-Tree-Desc Execution

18 Stack-Tree-Anc
Basic problem: results for a particular descendant cannot be output immediately, because a later descendant may match an earlier ancestor
Solution: keep lists of matching descendant nodes with each stack node
Self-list: descendants that match this node; a new descendant node is added to the self-lists of all matching ancestor nodes
Inherit-list: results inherited from nodes above it on the stack (its own descendants) that have already been popped, to be output after the self-list matches
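A minimal sketch of the self-list/inherit-list bookkeeping, assuming one document, both lists sorted by start, the slide 8 lists with levels assigned by us (root = level 1), and hypothetical names `Elem` and `stack_tree_anc`. When a node is popped and others remain, its self-list and inherit-list are appended to the inherit-list of the new top; when the bottom node is popped, its self-list and then its inherit-list are output. Python list concatenation copies, so this sketch does not achieve the constant-time list appends that the "careful handling of lists" on slide 20 presumably refers to.

```python
from typing import NamedTuple


class Elem(NamedTuple):
    start: int
    end: int
    level: int


def stack_tree_anc(alist, dlist, parent_child=False):
    """Stack-Tree-Anc sketch: output (a, d) pairs ordered by ancestor."""
    out = []
    stack = []  # entries: [ancestor, self_list, inherit_list]

    def pop():
        _, self_list, inherit_list = stack.pop()
        if stack:
            # An ancestor lower on the stack must be output first, so hand
            # this node's results down to it as inherited results.
            stack[-1][2].extend(self_list + inherit_list)
        else:
            # Bottom of the stack: self-list first, then inherit-list.
            out.extend(self_list + inherit_list)

    ai = di = 0
    while ai < len(alist) or di < len(dlist):
        if di == len(dlist) or (ai < len(alist) and alist[ai].start < dlist[di].start):
            nxt, is_anc = alist[ai], True
        else:
            nxt, is_anc = dlist[di], False
        while stack and stack[-1][0].end < nxt.start:
            pop()
        if is_anc:
            stack.append([nxt, [], []])
            ai += 1
        else:
            # Add the descendant to the self-list of every matching ancestor.
            for anc, self_list, _ in stack:
                if not parent_child or anc.level + 1 == nxt.level:
                    self_list.append((anc, nxt))
            di += 1
    while stack:
        pop()
    return out


ALIST = [Elem(4, 13, 2), Elem(5, 12, 3), Elem(14, 15, 2), Elem(16, 23, 2), Elem(17, 22, 3)]
DLIST = [Elem(2, 3, 2), Elem(6, 7, 4), Elem(8, 9, 4), Elem(10, 11, 4),
         Elem(18, 19, 4), Elem(20, 21, 4), Elem(24, 25, 2)]

for a, d in stack_tree_anc(ALIST, DLIST)[:4]:
    print(f"[{a.start},{a.end}+{d.start},{d.end}]")
# prints [4,13+6,7], [4,13+8,9], [4,13+10,11], [5,12+6,7] on separate lines
```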


20 Stack-Tree Analysis (Stack-Tree-Desc and Stack-Tree-Anc)
Time complexity (for anc-desc and par-child): O(|Alist| + |Dlist| + |OutputList|)
I/O complexity (for anc-desc and par-child): O(|Alist|/B + |Dlist|/B + |OutputList|/B), where B is the blocking factor
Stack-Tree-Anc requires careful handling of lists; its complexity is then the same as for the Desc case

21 Performance Study


