1 Rewriting Nested XML Queries Using Nested Views Nicola Onose joint work with Alin Deutsch, Yannis Papakonstantinou, Emiran Curtmola University of California,


1 Rewriting Nested XML Queries Using Nested Views
Nicola Onose, joint work with Alin Deutsch, Yannis Papakonstantinou, Emiran Curtmola
University of California, San Diego

2 The problem
Views are defined by queries V1, …, Vn and materialized as docV1, …, docVn.
Can we answer the query Q using only view access paths?
[Diagram: Input XML data; views V1, …, Vn; materialized docV1, …, docVn; the query Q; query result]
INTRO

3 The problem
Views are defined by queries V1, …, Vn and materialized as docV1, …, docVn.
Is there a query R such that R(V1(Input), …, Vn(Input)) = Q(Input)?
[Diagram: Input XML data; views V1, …, Vn; materialized docV1, …, docVn; the query Q; the rewriting query R; query result]
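
Stated slightly more formally (a sketch only; the symbol I for an arbitrary input document is an assumption, the slide itself writes Input), the question is whether some R satisfies:

```latex
\exists R \;\; \forall I : \quad R\bigl(V_1(I), \dots, V_n(I)\bigr) = Q(I)
```

R may read only the materialized view documents, never the input I itself.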

4 Motivation: caching & indexes
- caching: answer new queries using the results of previously answered ones
- (partial) indexes: materialized references to frequently accessed parts of the data
- materialized views are faster to access than the original input
[Diagram: Input XML data; views V1, …, Vn; materialized docV1, …, docVn; the query Q; the rewriting query R; query result]

5 Motivation: security views
Checking the existence of R addresses the security problem: allow only queries that can be expressed in terms of certain permitted queries, the security views.
[Diagram: Input XML data; security views (permitted queries) V1, …, Vn; docV1, …, docVn; the query Q; the rewriting query R (?); query result]

6 Motivation: data integration
Data integration: given a query expressed in global terms, rewrite it using the descriptions of the particular sources.
[Diagram: Virtual global DB; source1, …, sourcen; local/global mappings expressed as views; the query Q; the rewriting query R; query result]

7 Rewritings enabled by pattern matching
Previous literature: find parts of the query that are precomputed by the views.
How to decide that: match the patterns of the views into the query
– in the relational case, the patterns were tableaux and conjunctive queries
– for XPath: tree patterns
Matching XML queries?
– (until recently) no pattern-based description of XQuery semantics
– Nested XML Tableaux (NEXT) fill the gap
The NEXT Logical Framework for XQuery, A. Deutsch et al., VLDB’04

8 Scope of Our Approach
Nested XML Tableaux (NEXT) extend previous work on tree patterns. NEXT+ extends NEXT to the whole XQuery.
Tree Patterns: cover XPath
NEXT: extends Tree Patterns with:
- nested for-loops
- joins
- element construction etc.
NEXT+: extends NEXT to the whole XQuery language, including:
- function calls
- universal quantification
- disjunction, negation etc.

9 Scope of Our Approach
Tree Patterns: cover XPath
NEXT: extends Tree Patterns with:
- nested for-loops
- joins
- element construction etc.
NEXT+: extends NEXT to the whole XQuery language, including:
- function calls
- universal quantification
- disjunction, negation etc.
Soundness guarantee: if a rewriting is found, it is equivalent to the original query.
Completeness guarantee: if a rewriting exists, we will find one.

10 Rewriting using views example
Query Q: group titles by author. For each distinct author, output the titles of his/her books.
View V: group authors by title. For each book, output its title and the list of authors.
Rewriting R: scan the view and create an entry for each distinct author in the view output; add to it all the titles of the respective author.
The result of the view is cached and has faster access time than getting the data directly from the source.
[Diagram: Data on the Web; bib.xml; book, title, author]

11 Rewriting using views example
Query Q: group titles by author. For each distinct author, output the titles of his/her books.
View V: group authors by title
  for $b1 in $doc//book, $t1 in $b1/title
  return {$t1, $b1/author}
Rewriting R: scan the view and create an entry for each distinct author in the view output; add to it all the titles of the respective author.
Previous work captures:
- XPath navigation

12 Rewriting using views example
View V: group authors by title
  for $b1 in $doc//book, $t1 in $b1/title
  return {$t1, $b1/author}
Query Q: group titles by author
  for $a in distinct-values($doc//book[title]/author)
  return { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }
Previous work captures:
- XPath navigation
NEXT captures:
- XPath navigation
- nested for loops
- joins
- element construction etc.

13 Rewriting using views example
View V: group authors by title
  for $b1 in $doc//book, $t1 in $b1/title
  return {$t1, $b1/author}
Query Q: group titles by author
  for $a in distinct-values($doc//book[title]/author)
  return { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }
[Highlighted variables: $a1, $a]
Previous work captures:
- XPath navigation
NEXT captures:
- XPath navigation
- nested for loops
- joins
- element construction etc.

14 Rewriting using views example
View V: group authors by title
  for $b1 in $doc//book, $t1 in $b1/title
  return {$t1, $b1/author}
Query Q: group titles by author
  for $a in distinct-values($doc//book[title]/author)
  return { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }
Previous work captures:
- XPath navigation
NEXT captures:
- XPath navigation
- nested for loops
- joins
- element construction etc.

15 Rewriting using views example
Query Q: group titles by author
  for $a in distinct-values($doc//book[title]/author)
  return { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }
View V: group authors by title
  for $b1 in $doc//book, $t1 in $b1/title
  return {$t1, $b1/author}
Rewriting R ($docV is bound to the root of the view output; the paths navigate inside the view output)
  for $a3 in distinct-values($docV/authorlist[title]/author)
  return { $a3,
    for $p in $docV/authorlist, $t3 in $p/title
    where some $a4 in $p/author satisfies $a4 eq $a3
    return $t3 }
[Diagram: Data on the Web; bib.xml; book, title, author]
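
The claim R(V(Input)) = Q(Input) can be checked on a toy document. The following is a minimal Python sketch, not the authors' system: the sample data, the function names `view_v` and `group_titles_by_author`, and the `docV` root are all made up for illustration. It assumes that V wraps each entry in an `authorlist` element (as the navigation in R suggests) and represents the query result as plain tuples, since the result element names are not shown on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the third book has no title.
BIB = """<bib>
  <book><title>T1</title><author>A</author><author>B</author></book>
  <book><title>T2</title><author>A</author></book>
  <book><author>C</author></book>
</bib>"""


def view_v(doc):
    """V: for each (book, title) pair, emit the title and the book's authors."""
    out = ET.Element("docV")  # assumed root of the materialized view
    for book in doc.iter("book"):
        for title in book.findall("title"):
            entry = ET.SubElement(out, "authorlist")
            ET.SubElement(entry, "title").text = title.text
            for author in book.findall("author"):
                ET.SubElement(entry, "author").text = author.text
    return out


def group_titles_by_author(root, container):
    """Shape shared by Q and R: distinct authors of titled containers, each with titles."""
    authors = []
    for c in root.iter(container):
        if c.find("title") is not None:          # the [title] predicate
            for a in c.findall("author"):
                if a.text not in authors:        # distinct-values, by value
                    authors.append(a.text)
    result = []
    for a in authors:
        titles = [t.text
                  for c in root.iter(container)
                  for t in c.findall("title")
                  if any(a1.text == a for a1 in c.findall("author"))]
        result.append((a, titles))
    return result


doc = ET.fromstring(BIB)
q_result = group_titles_by_author(doc, "book")               # Q over the input
r_result = group_titles_by_author(view_v(doc), "authorlist")  # R over the view
print(q_result == r_result)
```

Q and R differ only in what they navigate (`book` elements of the input versus `authorlist` elements of the view), which is why one helper serves both here.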

16 Outline
- NEXT (NEsted XML Tableaux)
- Rewriting Algorithm and Extensions
- Experiments
- Previous Work
- Conclusions


18 Architecture of the NEXT framework
[Diagram components: XQuery query and views; Normalization; NEXT patterns (Nested XML Tableaux); Minimization; Rewriting Using Views; Logical Optimization; Logical Plan; Plan Execution Engine; Translate to XQuery; To Any XQuery Processor. Annotations: VLDB’04; presented at this conference]

19 The need for normalization
[Diagram: XQuery query and views; Normalization; NEXT (Nested XML Tableaux)]
  for $a in distinct-values($doc//book[title]/author)
  return { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }

20 Normalization into NEXT
XQuery query and views:
  for $a in distinct-values($doc//book[title]/author)
  return { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }
After normalization (NEXT):
  for $a in distinct-values($doc//book[title]/author)
  return { $a,
    for $b in $doc//book, $a1 in $b/author, $t in $b/title
    where $a1 eq $a
    return $t }

21 Normalization into NEXT
XQuery query and views:
  for $a in distinct-values($doc//book[title]/author)
  return { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }
After normalization (NEXT); cardinality? The groupby clause is added:
  for $a in distinct-values($doc//book[title]/author)
  return { $a,
    for $b in $doc//book, $a1 in $b/author, $t in $b/title
    where $a1 eq $a
    groupby [$b], [$t]
    return $t }
NEXT …
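
The cardinality question can be seen on a small example. This is an illustrative Python sketch with made-up data and hypothetical function names; list positions stand in for node identity, which is an assumption about what the bracketed groupby variables mean. Turning the existential `some` into a for-binding emits a title once per matching author, and grouping on (book, title) restores one output per pair.

```python
# Hypothetical data: the first book lists the same author value twice.
books = [
    {"titles": ["T1"], "authors": ["A", "A"]},
    {"titles": ["T2"], "authors": ["A", "B"]},
]


def titles_existential(books, a):
    """Original form: 'where some $a1 in $b/author satisfies $a1 eq $a'."""
    return [t for b in books for t in b["titles"]
            if any(a1 == a for a1 in b["authors"])]


def titles_unnested(books, a):
    """Existential turned into a for-binding, with no grouping."""
    return [t for b in books for a1 in b["authors"] for t in b["titles"]
            if a1 == a]


def titles_unnested_grouped(books, a):
    """For-binding plus grouping on (book, title); positions model node identity."""
    seen, out = set(), []
    for bi, b in enumerate(books):
        for a1 in b["authors"]:
            for ti, t in enumerate(b["titles"]):
                if a1 == a and (bi, ti) not in seen:
                    seen.add((bi, ti))
                    out.append(t)
    return out


existential = titles_existential(books, "A")
unnested = titles_unnested(books, "A")
grouped = titles_unnested_grouped(books, "A")
print(existential, unnested, grouped)
```

On this data the ungrouped form returns T1 twice, while the grouped form agrees with the original query.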

22 NEXT Patterns
NEXT: an alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns.
View V:
  for $b1 in $doc//book, $t1 in $b1/title
  groupby [$b1], [$t1]
  return {$t1,
    for $a2 in $b1/author
    groupby [$a2]
    return $a2 }
Graphical representation of NEXT: nested patterns (a forest of tree patterns)
  B1(V): $doc, book($b1), title($t1); groupby [$b1], [$t1]; return $t1, B2(V)
  B2(V): book($b1), author($a2); groupby [$a2]; return $a2

23 NEXT Patterns
NEXT: an alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns.
View V:
  for $b1 in $doc//book, $t1 in $b1/title
  groupby [$b1], [$t1]
  return {$t1,
    for $a2 in $b1/author
    groupby [$a2]
    return $a2 }
Graphical representation of NEXT: nested patterns (edges are either descendant navigation, //, or child navigation, /)
  B1(V): $doc // book($b1) / title($t1); groupby [$b1], [$t1]; return $t1, B2(V)
  B2(V): book($b1) / author($a2); groupby [$a2]; return $a2

24 NEXT Patterns
NEXT: an alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns.
View V:
  for $b1 in $doc//book, $t1 in $b1/title
  groupby [$b1], [$t1]
  return {$t1,
    for $a2 in $b1/author
    groupby [$a2]
    return $a2 }
Graphical representation of NEXT: nested patterns (each block carries a return function)
  B1(V): $doc, book($b1), title($t1); groupby [$b1], [$t1]; return function: $t1, B2(V)
  B2(V): book($b1), author($a2); groupby [$a2]; return function: $a2

25 NEXT Patterns
NEXT: an alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns.
View V:
  for $b1 in $doc//book, $t1 in $b1/title
  groupby [$b1], [$t1]
  return {$t1,
    for $a2 in $b1/author
    groupby [$a2]
    return $a2 }
Graphical representation of NEXT: nested patterns (each block carries a list of groupby variables)
  B1(V): $doc, book($b1), title($t1); groupby variables: [$b1], [$t1]; return $t1, B2(V)
  B2(V): book($b1), author($a2); groupby variables: [$a2]; return $a2

26 NEXT Patterns
NEXT: an alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns.
View V:
  for $b1 in $doc//book, $t1 in $b1/title
  groupby [$b1], [$t1]
  return {$t1,
    for $a2 in $b1/author
    groupby [$a2]
    return $a2 }
Query Q:
  for $b0 in $doc//book, $t0 in $b0/title, $a in $b0/author
  groupby $a
  return { $a,
    for $b in $doc//book, $a1 in $b/author, $t in $b/title
    where $a1 eq $a
    groupby [$b],[$t]
    return $t }
Graphical representation of NEXT: nested patterns
  B1(V): $doc, book($b1), title($t1); groupby [$b1], [$t1]; return $t1, B2(V)
  B2(V): book($b1), author($a2); groupby [$a2]; return $a2
  B1(Q): $doc, book($b0), title($t0), author($a); groupby $a; return $a, B2(Q)
  B2(Q): $doc, book($b), title($t), author($a1); groupby [$b], [$t]; return $t

27 NEXT Patterns
NEXT: an alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns.
View V:
  for $b1 in $doc//book, $t1 in $b1/title
  groupby [$b1], [$t1]
  return {$t1,
    for $a2 in $b1/author
    groupby [$a2]
    return $a2 }
Query Q:
  for $b0 in $doc//book, $t0 in $b0/title, $a in $b0/author
  groupby $a
  return { $a,
    for $b in $doc//book, $a1 in $b/author, $t in $b/title
    where $a1 eq $a
    groupby [$b],[$t]
    return $t }
Graphical representation of NEXT: nested patterns
  View V: $doc, book($b1), title($t1) with return $t1, B2(V) and groupby [$b1], [$t1]; book($b1), author($a2) with return $a2 and groupby [$a2]
  Query Q: $doc, book($b0), title($t0), author($a) with return $a, B2(Q) and groupby $a; $doc, book($b), title($t), author($a1) with return $t and groupby [$b], [$t]

28 Outline
- NEXT (NEsted XML Tableaux)
- Rewriting Algorithm and Extensions
- Experiments
- Previous Work
- Conclusions

29 Architecture of the NEXT framework
[Diagram components: XQuery query and views; Normalization; NEXT (Nested XML Tableaux); Minimization; Rewriting Using Views (the rewriting algorithm); Logical Optimization; Logical Plan; Plan Execution Engine; Translate to XQuery; Independent XQuery Processor]

30 Overview of the Rewriting Algorithm
Input: query Q, views V
1. detect alternative access paths towards the variable bindings through the views
2. build a candidate rewriting R that uses only the access paths from phase 1
3. check that R is equivalent to Q
[Diagram: Query Q; access paths through V; access paths (candidate rewriting)]
REWRITING ALGORITHM

31 Step 1: Detect View Access Paths
- access paths: ways of accessing data using the view
- identify matching subqueries (extended tree pattern matching)
- find a mapping and add navigation from the view return
[Diagram: view: $doc, book($b1), title($t1) returning $t1, B2(V); book($b1), author($a2) returning $a2. Query body: $doc, book($b0), title($t0), author($a); $doc, book($b), title($t), author($a1)]

32 Step 1: Detect View Access Paths
- access paths: ways of accessing data using the view
- identify matching subqueries (extended tree pattern matching)
- find a mapping and add navigation from the view return
[Diagram: view: $doc, book($b1), title($t1) returning $t1, B2(V); book($b1), author($a2) returning $a2. Query body: $doc, book($b0), title($t0), author($a); $doc, book($b), title($t), author($a1). Extended query adds: $docV, authorlist($p0), title($t2)]

33 Step 1: Detect View Access Paths
- access paths: ways of accessing data using the view
- identify matching subqueries (extended tree pattern matching)
- find a mapping and add navigation from the view return
- and another one…
[Diagram: view: $doc, book($b1), title($t1) returning $t1, B2(V); book($b1), author($a2) returning $a2. Query body: $doc, book($b0), title($t0), author($a); $doc, book($b), title($t), author($a1). Extended query adds: $docV, authorlist($p0), title($t2), author($a3)]

34 Step 1: Detect View Access Paths
- access paths: ways of accessing data using the view
- identify matching subqueries (extended tree pattern matching)
- find a mapping and add navigation from the view return
- and another one…
- computing all such mappings yields a query extension that uses only view access paths
[Diagram: view: $doc, book($b1), title($t1) returning $t1, B2(V); book($b1), author($a2) returning $a2. Query body: $doc, book($b0), title($t0), author($a); $doc, book($b), title($t), author($a1). Query extension: $docV, authorlist($p0), title($t2), author($a3); $docV, authorlist($p), title($t3), author($a4)]
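
For a single block, finding where a view matches can be pictured as enumerating mappings from the view's tree pattern into the query's tree pattern. The Python sketch below is illustrative only: `Node`, `descendants` and `mappings` are hypothetical names, the enumeration is naive, and it covers neither nested blocks, equality conditions, nor the added navigation into the view output that the slides describe. It assumes a child edge must map onto a child edge, while a descendant edge may map onto any downward path.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                     # element name, e.g. "book"
    var: str                                       # bound variable, e.g. "$b1"
    children: list = field(default_factory=list)   # list of (axis, Node)


def descendants(node):
    """Yield every proper descendant of a pattern node."""
    for _, child in node.children:
        yield child
        yield from descendants(child)


def mappings(p, q):
    """All variable mappings sending the pattern rooted at p onto q and below."""
    if p.label != q.label:
        return []
    results = [{p.var: q.var}]
    for axis, p_child in p.children:
        if axis == "child":
            targets = [qc for q_axis, qc in q.children if q_axis == "child"]
        else:  # "desc": any proper descendant of q is a legal target
            targets = list(descendants(q))
        extended = []
        for target in targets:
            for sub in mappings(p_child, target):
                for partial in results:
                    extended.append({**partial, **sub})
        results = extended  # empty if this child cannot be mapped anywhere
    return results


# First block of the view: $doc // book($b1) / title($t1)
view_b1 = Node("doc", "$doc", [
    ("desc", Node("book", "$b1", [("child", Node("title", "$t1"))])),
])
# First block of the query: $doc // book($b0) with title($t0) and author($a)
query_b1 = Node("doc", "$doc", [
    ("desc", Node("book", "$b0", [
        ("child", Node("title", "$t0")),
        ("child", Node("author", "$a")),
    ])),
])

found = mappings(view_b1, query_b1)
no_match = mappings(Node("doc", "$doc", [("desc", Node("price", "$p"))]), query_b1)
print(found, no_match)
```

Each mapping found this way tells which query variables the view already computes, which is what lets the view's return be navigated in their place.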

35 Step 2: Candidate Rewriting
Same return function as the initial query, but with other variable bindings.
[Diagram: original query: $doc, book($b0), title($t0), author($a); $doc, book($b), title($t), author($a1); B1(Q) returns $a, B2(Q) with groupby $a; B2(Q) returns $t with groupby [$b], [$t]. Extended query: $docV, authorlist($p0), title($t2), author($a3); $docV, authorlist($p), title($t3), author($a4)]

36 Step 2: Candidate Rewriting
Same return function as the initial query, but with other variable bindings.
[Diagram: original query: $doc, book($b0), title($t0), author($a); $doc, book($b), title($t), author($a1); B1(Q) with groupby $a; B2(Q) with groupby [$b], [$t]. Candidate rewriting: $docV, authorlist($p0), title($t2), author($a3); $docV, authorlist($p), title($t3), author($a4); B1(R) returns $a3, B2(R) with groupby $a3; B2(R) returns $t3 with groupby [$t3]]

37 Step 3: Equivalence Check
Check that R ≡ Q: containment mappings defined on the tree of query blocks.
Then (optional step) translate back to XQuery.
[Diagram: $docV, authorlist($p0), title($t2), author($a3); $docV, authorlist($p), title($t3), author($a4); B1(R) returns $a3, B2(R) with groupby $a3; B2(R) returns $t3 with groupby [$t3]]
Rewriting R:
  for $a3 in distinct-values($docV/authorlist[title]/author)
  return { $a3,
    for $p in $docV/authorlist, $t3 in $p/title
    where some $a4 in $p/author satisfies $a4 eq $a3
    return $t3 }

38 Under the Hood
Two types of equality: by value and by node id
– mappings must take this into consideration
– so must the groupby clause
XQuery results have order. We consider rewritings that:
– do not respect order (for DB-centric applications)
– respect order (for text-centric applications)
For rewritings that respect order: look for an ordering of the view access paths that preserves the original query order (details in the paper).
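
The difference between the two kinds of equality shows up as soon as a document repeats a value. A small Python sketch with made-up data, where Python object identity merely stands in for XML node identity (XQuery itself distinguishes the two with `eq` and `is`):

```python
import xml.etree.ElementTree as ET

# Hypothetical book listing the same author value in two distinct nodes.
book = ET.fromstring("<book><author>A</author><author>A</author></book>")
a1, a2 = book.findall("author")

same_value = a1.text == a2.text   # equality by value
same_node = a1 is a2              # equality by node identity

# Grouping by value merges the two authors; grouping by node keeps both.
groups_by_value = {a.text for a in book.findall("author")}
groups_by_node = {id(a) for a in book.findall("author")}
print(same_value, same_node, len(groups_by_value), len(groups_by_node))
```

A mapping or a groupby that confuses the two would change how many results a rewriting returns.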

39 Extensions to NEXT
Extended NEXT to NEXT+:
– extend the pattern-based representation to the whole XQuery
– functions and other expressions (negation, disjunction, aggregates etc.) are modeled as uninterpreted functions
Extended the algorithm to use NEXT+: need to identify maximal subparts that are pure NEXT blocks.
  for $x in $doc/book
  where count( for $a in $x/author where $x/price eq 60 groupby [$a] return $a ) eq count( …)
  groupby $x
  return $x

40 Extensions to NEXT
Extended NEXT to NEXT+:
– extend the pattern-based representation to the whole XQuery
– functions and other expressions (negation, disjunction, aggregates etc.) are modeled as uninterpreted functions
Extended the algorithm to use NEXT+: need to identify maximal subparts that are pure NEXT blocks.
  for $x in $doc/book
  where count( for $a in $x/author where $x/price eq 60 groupby [$a] return $a ) eq count( …)
  groupby $x
  return $x
– rewrite the blocks inside function arguments, with free variables bound in upper blocks
– rewrite the outer block, disregarding function calls

41 Formal Guarantees
The rewriting algorithm is sound and complete for a large fragment of XQuery (the one that can be translated into NEXT), without order.
– Completeness means that if there are any rewritings, we are guaranteed to find at least one.
There is no hope for completeness for:
– ordered rewritings: equivalence is undecidable
– expressions beyond NEXT: negation and universal quantification also lead to undecidability
In these cases, our algorithm is a best-effort approach, with guaranteed soundness.

42 Implementation (considerations)
Completeness guarantees come at a price: we must compute mappings between view and query patterns.
- in general this is NP-complete, but PTIME if the patterns are trees (no equality conditions); based on M. Yannakakis, Algorithms for acyclic database schemes, 1981
- our goal: design an implementation whose running time is polynomial for pure tree patterns and degrades progressively with the number of added joins

43 Implementation in practice
- when computing the query plan, apply techniques from the Yannakakis algorithm: push projections & selections
- performance degrades with the number of equalities: the problem is NP-complete in the width of the view pattern (see the paper) and in PTIME when there are no join equalities
[Diagram: V, compile, query plan (SPJ); Q, compile, XML instance; evaluate; mappings]
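
The flavor of the Yannakakis technique can be shown with semijoins on a two-relation chain. This Python sketch uses invented relations and a hypothetical `semijoin` helper; it illustrates semijoin reduction in general and is not the plan the implementation actually builds. Rows that cannot contribute to any result are dropped before the join is computed, so no intermediate result is wasted.

```python
def semijoin(left, right, key):
    """Keep the rows of `left` that join with at least one row of `right` on `key`."""
    keys = {row[key] for row in right}
    return [row for row in left if row[key] in keys]


# Hypothetical candidate targets for two edges of a tree pattern.
doc_book = [
    {"doc": "d", "book": "b1"},
    {"doc": "d", "book": "b2"},
    {"doc": "d", "book": "b3"},    # no title below it: cannot be part of a mapping
]
book_title = [
    {"book": "b1", "title": "t1"},
    {"book": "b2", "title": "t2"},
    {"book": "b9", "title": "t9"},  # its book is not reachable from the root
]

# Bottom-up pass: prune parents that have no matching child.
doc_book_reduced = semijoin(doc_book, book_title, "book")
# Top-down pass: prune children whose parent was pruned.
book_title_reduced = semijoin(book_title, doc_book_reduced, "book")

# After reduction every remaining row takes part in the final join.
joined = [{**l, **r}
          for l in doc_book_reduced
          for r in book_title_reduced
          if l["book"] == r["book"]]
print(joined)
```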

44 Outline
- NEXT (NEsted XML Tableaux)
- Rewriting Algorithm and Extensions
- Experiments
- Previous Work
- Conclusions

45 Experiments: Design
The running time of the algorithm increases with:
– number of nested levels: mappings are computed block by block
– size of the pattern: the number of mapped and target nodes increases
– number of views: more patterns to match
Our experiments measured how the algorithm scales with these parameters. We designed a configuration where we generated queries and views of increasing size and nesting depth.
EXPERIMENTS

46 Experiments: Implementation
Queries & views with similar basic patterns, in a vertical chain of blocks.
[Diagram: basic pattern: $doc, m_k, a, c_i. Block B_k holds the basic patterns over m_k with c_1, c_2, …; block B_k+1 holds the basic patterns over m_k+1 with c_1, c_2, …]
Irrelevant views don’t matter (they can be quickly discarded), so we create only relevant views (with mappings into the query):
– split the query recursively into fragments = views
– make them overlap on basic patterns

47 Experiments: Good Scalability
d = depth (# of nested levels in a query)
b = breadth (# of basic patterns in a block)
1.25s for d=16, b=16 and 128 views

48 Previous work
- rewriting XPath queries using XPath views: Rewriting XPath Queries Using Materialized Views, W. Xu et al., VLDB 2005
- rewriting XQuery using XPath views: A Framework for Using Materialized XPath Views in XML Query Processing, A. Balmin et al., VLDB 2004
- rewriting an XQuery with only one XQuery view that has to contain the query: ACE-XQ: A CachE-aware XQuery Answering System, L. Chen et al., WebDB 2002
- caching common XQuery subexpressions: Implementing Memoization in a Streaming XQuery Processor, Y. Diao et al., XSym 2004

49 Conclusions
NEXT is a pattern-based representation that describes what the query result is and not how it is computed:
- more opportunities for semantic optimizations
- extensible to all of XQuery, using NEXT+
Rewriting-using-views algorithm:
– sound for the whole language
– complete for a large fragment of XQuery
– good scalability
– independent of the underlying algebra of the query processor

50 Online Demo
http://db.ucsd.edu/reform

