Presentation is loading. Please wait.

Presentation is loading. Please wait.

Holistic Twig Joins: Optimal XML Pattern Matching Nicholas Bruno, Nick Koudas, Divesh Srivastava ACM SIGMOD 02 Presented by: Li Wei, Dragomir Yankov.

Similar presentations


Presentation on theme: "Holistic Twig Joins: Optimal XML Pattern Matching Nicholas Bruno, Nick Koudas, Divesh Srivastava ACM SIGMOD 02 Presented by: Li Wei, Dragomir Yankov."— Presentation transcript:

1 Holistic Twig Joins: Optimal XML Pattern Matching Nicholas Bruno, Nick Koudas, Divesh Srivastava ACM SIGMOD 02 Presented by: Li Wei, Dragomir Yankov

2 Outline Problem Statement PathStack Algorithm TwigStack Algorithm Experimental Results

3 Problem Statement Given a query twig pattern Q, and a XML database D, compute ALL the answers to Q in D. Example: QueryXML document

4 Binary Structural Joins The approach –Decompose the twig pattern into binary structural relationships –Use structural join algorithms to match the binary relationships against the XML database –Stitch together the basic matches The problem –The intermediate result sizes can get large, even when the input and output sizes are more manageable.

5 Example QueryXML document

6 Example QueryXML document Decomposition author – fn author – ln fn – jane ln – doe

7 Example Decomposition Number of Intermediate Results 3 QueryXML document author – fn author – ln fn – jane ln – doe

8 Example QueryXML document Decomposition Number of Intermediate Results 3333 author – fn author – ln fn – jane ln – doe

9 Example QueryXML document Decomposition Number of Intermediate Results 332332 author – fn author – ln fn – jane ln – doe

10 Example QueryXML document Decomposition author – fn author – ln fn – jane ln – doe Number of Intermediate Results 33223322

11 Example QueryXML document Decomposition author – fn author – ln fn – jane ln – doe Number of Intermediate Results 33223322 Output 1

12 Holistic Twig Joins The approach –Uses linked stacks to compactly represent partial results to query paths –Merges results to query paths to obtain matches for the twig pattern The advantage –It ensures that no intermediate solutions is larger than the final answer to the query.

13 Example QueryXML document

14 Example Decomposition author – fn – jane author – ln – doe Intermediate Results 1111 Output QueryXML document 1 Number of Intermediate Results author3 – fn3 – jane2 author3 – ln3 – doe2

15 Query isLeaf (author) = false isRoot (author) = true parent (fn) = author children (author) = {fn, ln} subtreeNodes (author) = {fn, ln, jane, doe} XML document Streams T a : a 1, a 2, a 3 T fn : fn 1, fn 3 T ln : ln 2, ln 3 T j : j 1, j 2 T d : d 1, d 2 eof (T a ) = false advance (T a ) => T a : a 1, a 2, a 3 next (T a ) = a 1 nextL (T a ) = 6 nextR (T a ) = 20 Notation Stacks empty (S a ) = false pop (S f ) push (S ln, ln 3, pointer to a 3 ) topL (S a ) = LeftPos of a 3 topR (S a ) = RightPos of a 3

16 Algorithm: PathStack While the streams of the leaves are not empty (i.e. a solution could be found) do: - select the node with minimal LeftPos value and push it into stack - if it is a leaf, print the solution A1B1C1A1B2C1A2B2C1A1B1C1A1B2C1A2B2C1 Intuition:

17 T A : A 1, A 2 T B : B 1, B 2 T C : C 1 Stacks Comments Streams q min = A 06) moveStreamToStack(T A, S A, null)

18 T A : A 1, A 2 T B : B 1, B 2 T C : C 1 Stacks Comments Streams q min = B 06) moveStreamToStack(T B, S B, A 1 )

19 T A : A 1, A 2 T B : B 1, B 2 T C : C 1 Stacks Comments Streams q min = A 06) moveStreamToStack(T A, S A, null)

20 T A : A 1, A 2 T B : B 1, B 2 T C : C 1 Stacks Comments Streams q min = B 06) moveStreamToStack(T B, S B, A 2 )

21 T A : A 1, A 2 T B : B 1, B 2 T C : C 1 Stacks Comments Streams q min = C 06) moveStreamToStack(T C, S C, B 2 )

22 T A : A 1, A 2 T B : B 1, B 2 T C : C 1 Stacks Comments Streams 07) isLeaf(C) = true 08) showSolutions(S C, 1) 09) pop(S C )

23 T A : A 1, A 2 T B : B 1, B 2 T C : C 1 Stacks Comments Streams 01) end(q) = true Algorithm ends.

24 Procedure: showSolutions Intuition: - stacks have the compact encodings of the anwers - output is in leaf-to-root order C1B1A1C1B2A1C1B2A2C1B1A1C1B2A1C1B2A2

25 Analysis: PathStack Correctness –(Theorem 3.1) Given a query path pattern Q and an XML database D, Algorithm PathStack correctly returns all answers for Q on D. Optimality –(Theorem 3.2) Algorithm PathStack has worst case I/O and CPU time complexities linear in the sum of sizes of the input lists and the output list.

26 PathMPMJ A naïve extension of MPMGJN could be to backtrack all possible solutions – PathMPMJNaive A much faster approach is to keep “k” pointers on the streams and prune part of the solutions - PathMPMJ T A = A1, A2, A3… T B = B1, B2 … BK… T C = C1, C2, C3 …

27 PathStack Limitations Merging the path queries for twig joins is not optimal Example: Query result: (a3, fn3, ln3, j2, d2) Query: (a1, fn1, j1) (a3, fn3, j3) (a2, ln2, d2) (a3, ln3, d3)

28 TwigStack Intuition: While the streams of the leaves are not empty (i.e. a solution could be found) do: - select a node that could be expanded to a solution - if it is a leaf, print the solution

29 Streams T a : a1, a2, a3 T fn : fn1, fn2, fn3 T ln : ln1, ln2, ln3 T j : j1, j2 T d : d1, d2 Stacks Comments: Phase1 01: while (notEmpty(T j ) || notEmpty(T d )) do: TwigStack: Example

30 Stacks Comments: iteration1 q act = getNext(a) fn getNext(fn) fn getNext(j) j n min =n max =8 (j1) getNext(ln) ln getNext(d) d n min =n max =26 (d1) advance(ln) n min =7(fn1) n max =ln2 advance(T a ) advance(T fn ) TwigStack: Example Streams T a : a1, a2, a3 T fn : fn1, fn2, fn3 T ln : ln1, ln2, ln3 T j : j1, j2 T d : d1, d2

31 Stacks Comments: iteration2 q act = getNext(a) j getNext(fn) j getNext(j) j n min =n max =8 (j1) getNext(ln) ln getNext(d) d n min =n max =26 (d1) n min =8(j1) n max =ln2 advance(T j ) TwigStack: Example Streams T a : a1, a2, a3 T fn : fn1, fn2, fn3 T ln : ln1, ln2, ln3 T j : j1, j2 T d : d1, d2

32 Stacks Comments: iteration3 q act = getNext(a) ln getNext(fn) fn getNext(j) j n min =n max =43 (j2) advance(fn) getNext(ln) ln getNext(d) d n min =n max =26 (d1) n min =ln2 n max =fn3 advance(T a ) advance(T ln ) TwigStack: Example Streams T a : a1, a2, a3 T fn : fn1, fn2, fn3 T ln : ln1, ln2, ln3 T j : j1, j2 T d : d1, d2

33 Stacks Comments: iteration4 q act = getNext(a) d getNext(fn) fn getNext(j) j n min =n max =43 (j2) getNext(ln) d getNext(d) d n min =n max =26 (d1) n min =26(d1) n max =fn3 advance(T d ) TwigStack: Example Streams T a : a1, a2, a3 T fn : fn1, fn2, fn3 T ln : ln1, ln2, ln3 T j : j1, j2 T d : d1, d2

34 Stacks Comments: iteration5 q act = getNext(a) a getNext(fn) fn getNext(j) j n min =n max =43 (j2) getNext(ln) ln getNext(d) d n min =n max =46 (d2) n min =fn3 n max =ln3 moveStreamToStack(T a ) advance(T a ) TwigStack: Example Streams T a : a1, a2, a3 T fn : fn1, fn2, fn3 T ln : ln1, ln2, ln3 T j : j1, j2 T d : d1, d2

35 Stacks Comments: iteration6 q act = getNext(a) fn getNext(fn) fn getNext(j) j n min =n max =43 (j2) getNext(ln) ln getNext(d) d n min =n max =46 (d2) n min =fn3 n max =ln3 moveStreamToStack(T fn ) advance(T fn ) TwigStack: Example Streams T a : a1, a2, a3 T fn : fn1, fn2, fn3 T ln : ln1, ln2, ln3 T j : j1, j2 T d : d1, d2

36 Stacks Comments: iteration7 q act = getNext(a) j getNext(fn) j getNext(j) j n min =n max =43 (j2) getNext(ln) ln getNext(d) d n min =n max =46 (d2) n min =43(j2) n max =ln3 moveStreamToStack(T j ) advance(T j ) pop(S j ) showSolutionsWithBlocking(j) TwigStack: Example “Merge-joinable” root-to-leaf path: (j2, fn3, a3) Streams T a : a1, a2, a3 T fn : fn1, fn2, fn3 T ln : ln1, ln2, ln3 T j : j1, j2 T d : d1, d2

37 Stacks Comments: iteration8 q act = getNext(a) ln3 getNext(fn) nil getNext(j) nil n min =n max =nil getNext(ln) ln getNext(d) d n min =n max =46 (d2) n min =ln3 n max =ln3 moveStreamToStack(T ln ) advance(T ln ) TwigStack: Example “Merge-joinable” root-to-leaf path: (j2, fn3, a3) Streams T a : a1, a2, a3 T fn : fn1, fn2, fn3 T ln : ln1, ln2, ln3 T j : j1, j2 T d : d1, d2

38 Stacks Comments: iteration9 q act = getNext(a) ln3 getNext(fn) nil getNext(j) nil n min =n max =nil getNext(ln) d getNext(d) d n min =n max =46 (d2) n min =d n max =d moveStreamToStack(T d ) advance(T d ) pop(S d ) showSolutionsWithBlocking(d) TwigStack: Example “Merge-joinable” root-to-leaf paths: (j2, fn3, a3) (d2, ln3, a3) Streams T a : a1, a2, a3 T fn : fn1, fn2, fn3 T ln : ln1, ln2, ln3 T j : j1, j2 T d : d1, d2

39 Stacks Comments: Phase2 12: MergeAllPathSolutions() TwigStack: Example TwigStack solution: (j2, fn3, d2, ln3, a3) Streams T a : a1, a2, a3 T fn : fn1, fn2, fn3 T ln : ln1, ln2, ln3 T j : j1, j2 T d : d1, d2

40 Analysis of TwigStack Let getNext(q) = q N –q N has minimum descendant extension –for all qi subtreeNodes(q N ) next(T qi ) = h qi –Either q=q N or parent(q N ) has no min right extension Any ancestor of q N whose extension uses h qn is returned by getNext before q N => correctness (TwigStack finds all solutions to q) TwigStack is time and space optimal for ancestor-descendant edges

41 Suboptimality for parent-child edges Example final solutions TS Phase1 solutions: (A1, B2, C2) (A2, B1, C1) (A1, B1, C1) (A1, B1, C2) Would be optimal for:

42 TwigStack and XB-Trees XB-Trees - B+ trees with some additional features 1 -Internal nodes have the form [L:R], sorted on L -Parent node interval includes child node intervals -Each page P has pointer P.parent TwigStackXB – same as TwigStack with the following modifications -T q for a query node with an index is now the XB tree rather than a stream -The advance operation is modified according to the pointer act=(actPage,actIndex) - The drilldown operation is introduced 1. “An Evaluation of XML indexes for Structural Join” demonstrates that while all – B+, XR and XB trees build the same tree structure, for “highly recursive” XML XB trees outperform the other two

43 Experimental Results PS vs TS for binary twig queryPS vs TS for parent-child query

44 Questions?


Download ppt "Holistic Twig Joins: Optimal XML Pattern Matching Nicholas Bruno, Nick Koudas, Divesh Srivastava ACM SIGMOD 02 Presented by: Li Wei, Dragomir Yankov."

Similar presentations


Ads by Google