Structural Join Algorithms – Examples

1 Structural Join Algorithms – Examples
Key property: y is a descendant (resp., child) of x iff x.docId = y.docId and x.StartPos < y.StartPos <= y.EndPos < x.EndPos (and, for child, y.Level = x.Level + 1). A node n for us is (D, S:E, L), i.e., (docId, StartPos:EndPos, Level); call this the node id for convenience.
What is a structural join?
- Given lists Alist and Dlist of nodes,
- output pairs (x, y) of nodes [x in Alist, y in Dlist] such that x is an ancestor (resp., parent) of y.
- Frequently, assume the input lists are ordered by node id.
- Might want to order the output by the first operand's node id or by the second's. (What difference does it make?)
A TPQ (tree pattern query) = compute several SJs and stitch them together.
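The key property translates directly into a containment test on region encodings. The sketch below is illustrative only: the `Node` type, the function names, and the nested-loop join are assumptions made for this example, not something the slides define. The nested loop is a reference baseline (quadratic by construction), useful for checking the faster algorithms.

```python
from typing import NamedTuple

class Node(NamedTuple):
    """Hypothetical region-encoded node: (docId, StartPos:EndPos, Level)."""
    doc: int
    start: int
    end: int
    level: int

def is_ancestor(x: Node, y: Node) -> bool:
    # x is an ancestor of y iff y's interval lies strictly inside x's interval
    # (y.start <= y.end holds for every well-formed node).
    return x.doc == y.doc and x.start < y.start and y.end < x.end

def is_parent(x: Node, y: Node) -> bool:
    # Parent = ancestor that sits exactly one level above.
    return is_ancestor(x, y) and y.level == x.level + 1

def naive_structural_join(alist, dlist, rel=is_ancestor):
    # Reference implementation: test every (x, y) pair.
    return [(x, y) for x in alist for y in dlist if rel(x, y)]

# Made-up positions: root contains mid, mid contains leaf.
root = Node(1, 1, 10, 1)
mid = Node(1, 2, 9, 2)
leaf = Node(1, 3, 4, 3)
other_doc = Node(2, 3, 4, 3)
```

With these values, `root` is an ancestor but not the parent of `leaf`, while `mid` is its parent; `other_doc` matches nothing because its docId differs.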

2 SJ variants
There is also a so-called holistic join algorithm (Bruno, Koudas, and Srivastava, SIGMOD 2002).
- It extends binary join ideas to finding matches for paths/twigs.
In XML query processing, we also need the following variants of SJ:
- Given Alist and Dlist, whenever x in Alist has a relative y in Dlist, output (x, y); else just output x. (Structural outerjoin.)
- Given Alist and Dlist, output x in Alist whenever there exists y in Dlist such that y is a relative of x. (Structural semijoin.)
- Given Alist and Dlist, output x in Alist whenever it has no relative y in Dlist. (Structural semi-antijoin.)
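The three variants can be pinned down with a deliberately naive sketch. Assumptions made here for illustration: "relative" is taken to mean descendant, nodes are plain `(doc, start, end)` tuples, and an unmatched x in the outerjoin is reported as `(x, None)`. None of the function names come from the slides, and the nested loops ignore efficiency.

```python
def is_ancestor(x, y):
    # x and y are (doc, start, end) tuples; y must lie strictly inside x.
    return x[0] == y[0] and x[1] < y[1] and y[2] < x[2]

def structural_outerjoin(alist, dlist):
    # Every x appears at least once; unmatched x is padded with None.
    out = []
    for x in alist:
        matches = [y for y in dlist if is_ancestor(x, y)]
        if matches:
            out.extend((x, y) for y in matches)
        else:
            out.append((x, None))
    return out

def structural_semijoin(alist, dlist):
    # Keep x if at least one descendant exists in dlist.
    return [x for x in alist if any(is_ancestor(x, y) for y in dlist)]

def structural_antijoin(alist, dlist):
    # Keep x if no descendant exists in dlist.
    return [x for x in alist if not any(is_ancestor(x, y) for y in dlist)]

# Made-up input: x1 contains y1 and y2, x2 contains nothing from dlist.
x1, x2 = (1, 1, 10), (1, 11, 14)
y1, y2 = (1, 2, 3), (1, 4, 5)
```

On this input the outerjoin yields `(x1, y1)`, `(x1, y2)`, `(x2, None)`; the semijoin yields `x1`; the antijoin yields `x2`.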

3 Tree-Merge Join (ordered by ancestor). [Figure: example tree over Alist = (a1, a2, a3, a4) and Dlist = (d1, d2, d3, d4, d5, d6).] Output so far: (a1,d1), …, (a1,d6).

4 Tree-Merge Join (ordered by ancestor). Output so far: (a1,d1), …, (a1,d6), (a2,d1).

5 Tree-Merge Join (ordered by ancestor). Output so far: (a1,d1), …, (a1,d6), (a2,d1), (a3,d2), (a3,d3).

6 Tree-Merge Join (ordered by ancestor). Final output: (a1,d1), …, (a1,d6), (a2,d1), (a3,d2), (a3,d3), (a4,d5), (a4,d6).
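The walkthrough above can be reproduced with a short sketch of a tree-merge join ordered by ancestor. Everything concrete here is an assumption: the start/end positions are invented so that the containment relationships match the pairs on the slides, both lists are taken to be sorted by start position and to come from one document, and the function is one plausible rendering of the idea rather than the exact published algorithm.

```python
from typing import NamedTuple

class Node(NamedTuple):
    start: int
    end: int
    label: str

def tree_merge_anc(alist, dlist):
    """Ancestor-descendant pairs, grouped by ancestor (lists sorted by start)."""
    out = []
    begin = 0  # first Dlist entry that may still match this or a later ancestor
    for a in alist:
        # Descendants starting at or before a cannot match a or anything after it.
        while begin < len(dlist) and dlist[begin].start <= a.start:
            begin += 1
        i = begin
        # Scan forward while the descendant starts inside a's interval.
        while i < len(dlist) and dlist[i].start < a.end:
            if dlist[i].end < a.end:
                out.append((a.label, dlist[i].label))
            i += 1
    return out

# Hypothetical positions: a1 is the root; a2, a3, d4, a4 are its children.
ALIST = [Node(1, 20, "a1"), Node(2, 5, "a2"), Node(6, 11, "a3"), Node(14, 19, "a4")]
DLIST = [Node(3, 4, "d1"), Node(7, 8, "d2"), Node(9, 10, "d3"),
         Node(12, 13, "d4"), Node(15, 16, "d5"), Node(17, 18, "d6")]

pairs = tree_merge_anc(ALIST, DLIST)
```

Note that the inner scan restarts from `begin` for every ancestor, so a descendant nested under several ancestors is visited once per ancestor; each such visit produces an output pair in the ancestor-descendant case.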

7 Tree-Merge Join (ordered by descendant). [Figure: example tree over Alist = (a1, a2, a3, a4) and Dlist = (d1, d2, d3, d4, d5, d6).] Output so far: (a1,d1), (a2,d1).

8 Tree-Merge Join (ordered by descendant). Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2).

9 Tree-Merge Join (ordered by descendant). Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3).

10 Tree-Merge Join (ordered by descendant). Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4).

11 Tree-Merge Join (ordered by descendant). Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4), (a1,d5), (a4,d5).

12 Tree-Merge Join (ordered by descendant). Final output: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4), (a1,d5), (a4,d5), (a1,d6), (a4,d6).
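A matching sketch for the descendant-ordered variant follows. As before, the positions are invented to agree with the pairs on the slides, the lists are assumed sorted by start position within a single document, and the code is an illustrative reading of the technique, not the published pseudocode.

```python
from typing import NamedTuple

class Node(NamedTuple):
    start: int
    end: int
    label: str

def tree_merge_desc(alist, dlist):
    """Ancestor-descendant pairs, grouped by descendant (lists sorted by start)."""
    out = []
    begin = 0  # first Alist entry that may still contain this or a later descendant
    for d in dlist:
        # Leading ancestors that end before d starts can never match again.
        while begin < len(alist) and alist[begin].end < d.start:
            begin += 1
        i = begin
        # Every ancestor starting before d is a candidate and must be checked.
        while i < len(alist) and alist[i].start < d.start:
            if d.end < alist[i].end:
                out.append((alist[i].label, d.label))
            i += 1
    return out

# Hypothetical positions: a1 is the root; a2, a3, d4, a4 are its children.
ALIST = [Node(1, 20, "a1"), Node(2, 5, "a2"), Node(6, 11, "a3"), Node(14, 19, "a4")]
DLIST = [Node(3, 4, "d1"), Node(7, 8, "d2"), Node(9, 10, "d3"),
         Node(12, 13, "d4"), Node(15, 16, "d5"), Node(17, 18, "d6")]

pairs = tree_merge_desc(ALIST, DLIST)
```

Because a1 spans the whole document, `begin` never moves past it, so every descendant rescans a2 and a3 even after they have closed. Those wasted checks produce no output.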

13 Which is more efficient?
Tree-Merge-anc: time and space complexity is O(|Alist| + |Dlist| + |OutputList|). Note: it is not quadratic in the input size. However, Tree-Merge-desc has quadratic worst-case time complexity.
- Saw some evidence in the previous example.
- Here is another "bad" input: what is the amount of work done by Tree-Merge-desc on this input?
[Figure: input tree with ancestor nodes a0, a1, a2, …, an and descendant nodes d1, d2, …, dn.]
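To make the question concrete, the sketch below counts the containment checks a descendant-ordered tree-merge performs. The input shape is an assumption, one plausible reading of the figure: a0 is the root, a1 … an are its children, and each ai has a single child di. The positions and function names are invented for the example.

```python
def tree_merge_desc_work(alist, dlist):
    """Return (output_pairs, containment_checks); entries are (start, end)."""
    pairs = 0
    checks = 0
    begin = 0
    for d_start, d_end in dlist:
        # Drop leading ancestors that end before this descendant starts.
        while begin < len(alist) and alist[begin][1] < d_start:
            begin += 1
        i = begin
        while i < len(alist) and alist[i][0] < d_start:
            checks += 1
            if d_end < alist[i][1]:
                pairs += 1
            i += 1
    return pairs, checks

def bad_input(n):
    # Assumed shape: root a0 spans everything; a_i = (4i-3, 4i) holds d_i = (4i-2, 4i-1).
    alist = [(0, 4 * n + 1)] + [(4 * i - 3, 4 * i) for i in range(1, n + 1)]
    dlist = [(4 * i - 2, 4 * i - 1) for i in range(1, n + 1)]
    return alist, dlist

pairs, checks = tree_merge_desc_work(*bad_input(100))
print(pairs, checks)  # 200 5150
```

Under this reading, d_i is checked against a0 … a_i, so the total is n(n+3)/2 checks for only 2n output pairs: a0 never closes, so the scan always restarts at the front of Alist.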

14 More analysis
What about finding (parent, child) pairs? Does the same upper bound apply to T-M-par? Consider the input below. The size of the output list is O(|Alist| + |Dlist|). What is the amount of work done by T-M-par on this input?
[Figure: input with ancestor nodes a1, a2, …, an and descendant nodes d1, d2, …, dn, dn+1, …, d2n-1, d2n.]
A breed of stack-tree SJ algorithms has been developed to overcome the deficiencies of the T-M algorithms.
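The gap between output size and work for parent-child joins can be illustrated with a small experiment. The input used here is hypothetical and is not claimed to be the slide's figure: a nested chain a1 ⊃ a2 ⊃ … ⊃ an, with n descendants that are all children of the innermost an. The join is an ancestor-ordered tree-merge with an added level test, written only for this illustration.

```python
def tree_merge_par_work(alist, dlist):
    """Return (parent_child_pairs, checks); entries are (start, end, level)."""
    pairs = 0
    checks = 0
    begin = 0
    for a_start, a_end, a_level in alist:
        # Skip descendants that start at or before this ancestor.
        while begin < len(dlist) and dlist[begin][0] <= a_start:
            begin += 1
        i = begin
        while i < len(dlist) and dlist[i][0] < a_end:
            checks += 1
            d_start, d_end, d_level = dlist[i]
            # Parent-child: contained AND exactly one level below.
            if d_end < a_end and d_level == a_level + 1:
                pairs += 1
            i += 1
    return pairs, checks

def chain_input(n):
    # Assumed shape: a_i = (i, 4n+2-i) at level i; d_j are leaves under a_n.
    alist = [(i, 4 * n + 2 - i, i) for i in range(1, n + 1)]
    dlist = [(n + 2 * j - 1, n + 2 * j, n + 1) for j in range(1, n + 1)]
    return alist, dlist

pairs, checks = tree_merge_par_work(*chain_input(50))
print(pairs, checks)  # 50 2500
```

Every ai contains every dj, so each ancestor scans the whole Dlist, yet only an passes the level test. The output is linear in the input while the work is quadratic, which suggests the ancestor-descendant bound does not carry over to the parent-child case.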

15 Stack-Tree Join (ordered by descendant). [Figure: example tree over Alist = (a1, a2, a3, a4) and Dlist = (d1, d2, d3, d4, d5, d6).] Stack: a1. Output so far: (none).

16 Stack-Tree Join (ordered by descendant). Stack: a1, a2. Output so far: (none).

17 Stack-Tree Join (ordered by descendant). Stack: a1, a2. Output so far: (a1,d1), (a2,d1).

18 Stack-Tree Join (ordered by descendant). Stack: a1. Output so far: (a1,d1), (a2,d1).

19 Stack-Tree Join (ordered by descendant). Stack: a1, a3. Output so far: (a1,d1), (a2,d1).

20 Stack-Tree Join (ordered by descendant). Stack: a1, a3. Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2).

21 Stack-Tree Join (ordered by descendant). Stack: a1, a3. Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3).

22 Stack-Tree Join (ordered by descendant). Stack: a1. Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4).

23 Stack-Tree Join (ordered by descendant). Stack: a1, a4. Output so far: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4), (a1,d5), (a4,d5).

24 Stack-Tree Join (ordered by descendant). Stack: a1, a4. Final output: (a1,d1), (a2,d1), (a1,d2), (a3,d2), (a1,d3), (a3,d3), (a1,d4), (a1,d5), (a4,d5), (a1,d6), (a4,d6).
Time and space complexity: O(|Alist| + |Dlist| + |OutputList|), for both ancestor-descendant (ad) and parent-child (pc) relationships! Unlike T-M-anc, the I/O complexity is similarly bounded (modulo the blocking factor). Can handle streaming input lists: it is a non-blocking algorithm. Stack-Tree-anc is similar, with similar bounds.
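The stack-based walkthrough can be sketched as a single merge pass over both lists. The positions below are invented to agree with the pairs on the slides, both lists are assumed sorted by start position within one document, and the function is an illustrative rendering of the stack-tree idea for the ancestor-descendant case rather than the published pseudocode.

```python
from typing import NamedTuple

class Node(NamedTuple):
    start: int
    end: int
    label: str

def stack_tree_desc(alist, dlist):
    """Ancestor-descendant pairs, grouped by descendant, in one pass."""
    out = []
    stack = []  # currently open ancestors, outermost at the bottom
    i = j = 0
    while j < len(dlist):
        d = dlist[j]
        if i < len(alist) and alist[i].start < d.start:
            a = alist[i]
            # Close ancestors that end before the next ancestor opens.
            while stack and stack[-1].end < a.start:
                stack.pop()
            stack.append(a)
            i += 1
        else:
            # Close ancestors that end before this descendant starts.
            while stack and stack[-1].end < d.start:
                stack.pop()
            # Everything left on the stack contains d.
            out.extend((s.label, d.label) for s in stack)
            j += 1
    return out

# Hypothetical positions: a1 is the root; a2, a3, d4, a4 are its children.
ALIST = [Node(1, 20, "a1"), Node(2, 5, "a2"), Node(6, 11, "a3"), Node(14, 19, "a4")]
DLIST = [Node(3, 4, "d1"), Node(7, 8, "d2"), Node(9, 10, "d3"),
         Node(12, 13, "d4"), Node(15, 16, "d5"), Node(17, 18, "d6")]

pairs = stack_tree_desc(ALIST, DLIST)
```

Each node is pushed and popped at most once and each input entry is read once, so the only repeated work is emitting output pairs. That is where the linear bound comes from, and reading both inputs strictly front to back is what makes the approach suitable for streams.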

25 Extensions
- Can you adapt the SJ algorithms to handle the SJ variants mentioned before?
- Can you make the Tree-Merge algorithms more efficient, e.g., by bookkeeping?
- We have seen that a TPQ = a sequence of joins on the results of SJs; what is the best way to order these joins? Can we reuse join order optimization from relational DB optimization? (What is the right cost model?)
- What if (universal) quantifiers are present?
- How can we handle aggregation?

