Containment of Nested XML Queries. Xin Dong, Alon Halevy, Igor Tatarinov. Presented by: Orly Goren

1 Containment of Nested XML Queries
  Xin Dong, Alon Halevy, Igor Tatarinov
  Presented by: Orly Goren

2 Query Containment
  The most fundamental relationship between a pair of queries.
  Query Q is contained in Q' if, for any database D, Q(D) is a subset of Q'(D).
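To make the definition concrete, here is a minimal Python sketch for the relational case. The two queries, the (name, age) schema, and the sample databases are made-up illustrations and do not come from the slides. Checking finitely many databases can only refute containment, never prove it, because the definition quantifies over every database D.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Row = Tuple[str, int]                    # hypothetical schema: (name, age)
Database = Set[Row]
Query = Callable[[Database], Set[str]]


def q(db: Database) -> Set[str]:
    """Illustrative query Q: names of people older than 30."""
    return {name for name, age in db if age > 30}


def q_prime(db: Database) -> Set[str]:
    """Illustrative query Q': names of people older than 18."""
    return {name for name, age in db if age > 18}


def find_counterexample(q1: Query, q2: Query,
                        dbs: Iterable[Database]) -> Optional[Database]:
    """Return the first database D where q1(D) is not a subset of q2(D), else None."""
    for db in dbs:
        if not q1(db) <= q2(db):
            return db
    return None


samples = [set(), {("ann", 40)}, {("bob", 20)}, {("ann", 40), ("bob", 20)}]

# No sample refutes "Q is contained in Q'"; one sample refutes the converse.
print(find_counterexample(q, q_prime, samples))
print(find_counterexample(q_prime, q, samples))
```

A result of None is only evidence, not a proof, which is why a decision procedure needs a finite set of representative inputs.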

3 Roadmap
  - Introduction and problem definition
  - Containment of a subset of XML queries
    - Query containment is decidable
    - Complexity of containment (table below)
  - Query containment in practice
  - Relaxing the assumptions
  - Conclusions
  Complexity by fanout (rows) and nesting depth (columns):
  Fanout      Fixed depth      Arbitrary depth
  = 1         PTIME            PTIME
  Arbitrary   coNP-complete    in coNEXPTIME

4 Applications of Query Containment
  - Semantic caching
  - Determining independence of database updates
  - Query answering using views
  - Detecting that a reformulated query is redundant
  - Query minimization
  - Verification of knowledge bases

5 Query Processing in PDMS
  XML query containment in a Peer Data Management System (PDMS):
  - Answering queries using views to extract remote data
  - Removing redundant queries to enhance performance
  [Figure: peers UW, Stanford, Berkeley, and UPenn linked by mappings M_WS, M_PW, M_SB, and M_BW, annotated with queries Q_W, Q_S, Q_P, Q_B1, and Q_B2]

6 Query Containment: Relational vs. XML
                          Relational
  Input D                 Sets of tuples
  Output Q(D)             A set of tuples
  Instance containment    Q(D) ⊆ Q'(D) (subset)
  Query containment       Q ⊆ Q' (for every input D, Q(D) ⊆ Q'(D))

7 Query Containment: Relational vs. XML
                          Relational                                   XML
  Input D                 Sets of tuples                               An XML instance tree
  Output Q(D)             A set of tuples                              An XML instance tree
  Instance containment    Q(D) ⊆ Q'(D) (subset)                        Q(D) ⊑ Q'(D) (tree embedding)
  Query containment       Q ⊆ Q' (for every input D, Q(D) ⊆ Q'(D))     Q ⊑ Q' (for every input D, Q(D) ⊑ Q'(D))

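8 Example – An XML Instance
  D: a project element with member children whose text values are Alice and Bob.
  [Figure: D drawn as a tree with project at the root, member nodes below it, and text leaves Alice and Bob]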

9 Example – An XML Query
  Q: for $x in /project return { for $y in $x/member return { where $y="Alice" return where $y="Bob" return }
  [Figure: the instance D and the answer Q(D); the nodes of Q(D) are labeled group, name, Alice, and Bob]

10 Example – Another XML Query
  Q': for $x in /project return { for $y in /project/member return { where $y="Alice" return where $y="Bob" return }
  [Figure: the instance D and the answer Q'(D); the nodes of Q'(D) are labeled group, name, Alice, and Bob]

11 Tree Embedding
  Given two trees T1 and T2, a node mapping ψ from T1 to T2 is an embedding from T1 to T2 if:
  - ψ maps the root of T1 to the root of T2.
  - If node n2 is a child of node n1 in T1, then ψ(n2) is a child of ψ(n1), and n1 and n2 have the same labels as ψ(n1) and ψ(n2), respectively.
  What is the time complexity of finding an embedding from T1 to T2?
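The definition can be checked recursively. The sketch below is one possible reading, not code from the presentation: it assumes unordered children and does not require the mapping to be injective, since the definition does not ask for that. The Node class and the example trees are hypothetical. Each pair of nodes at the same depth is compared at most once, so the check takes at most |T1|·|T2| label comparisons.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def embeds(n1: Node, n2: Node) -> bool:
    """True if the subtree at n1 embeds into the subtree at n2, mapping n1 to n2.

    Labels must match, and every child of n1 must embed into some child of n2.
    Several children of n1 may map to the same child of n2.
    """
    if n1.label != n2.label:
        return False
    return all(any(embeds(c1, c2) for c2 in n2.children) for c1 in n1.children)


# Hypothetical example trees.
t1 = Node("project", [Node("member", [Node("Alice")])])
t2 = Node("project", [Node("member", [Node("Alice")]),
                      Node("member", [Node("Bob")])])

print(embeds(t1, t2))  # t1 into t2
print(embeds(t2, t1))  # t2 into t1
```

Here t1 embeds into t2, but not the other way around, because the member node with child Bob has no matching image in t1.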

12 XML Instance Containment
  Let e and e' be two XML instances. e is contained in e', denoted e ⊑ e', if the tree of e can be embedded in the tree of e'.
  - Containment is reflexive and transitive.
  - Containment is not antisymmetric: e ⊑ e' and e' ⊑ e do not imply e = e'.
  [Figure: two XML instances over the labels a and b that contain each other but are not equivalent]
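The failure of antisymmetry is easy to reproduce. The pair of trees below is an illustrative choice and not necessarily the pair drawn on the slide. Trees are written as (label, children) tuples, and the embedding test assumes the non-injective reading of the definition.

```python
def embeds(t1, t2):
    """True if tree t1 embeds into tree t2, mapping root to root."""
    label1, kids1 = t1
    label2, kids2 = t2
    return label1 == label2 and all(
        any(embeds(c1, c2) for c2 in kids2) for c1 in kids1
    )


e1 = ("a", [("b", [])])             # a with one b child
e2 = ("a", [("b", []), ("b", [])])  # a with two b children

print(embeds(e1, e2), embeds(e2, e1), e1 == e2)
```

Both b children of e2 map to the single b child of e1, so each instance is contained in the other although the trees differ.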

13 XML Query Containment
  Let Q and Q' be two XML queries. Q is contained in Q', denoted Q ⊑ Q', if for every input XML instance D, Q(D) ⊑ Q'(D).

14 Example – Tree Embedding and Query Containment
  [Figure: the answer trees Q(D) and Q'(D), with nodes labeled group, name, Alice, and Bob, compared in both directions; one of the two attempted embeddings is marked with an X]

15 Query Containment Problem
  From answer containment to query containment.
  Our problems:
  - Given queries Q and Q', decide whether Q ⊑ Q'
  - The complexity of query containment
  [Diagram: the comparisons between Q(D) and Q'(D) on one instance, set against the corresponding comparisons between Q and Q']

16 Previous Work (I)
  Relational query containment:
  - Conjunctive queries [Chandra and Merlin, STOC 1977]
  - Acyclic queries [Yannakakis, VLDB 1981]
  - Queries with union [Sagiv and Yannakakis, JACM 1980]
  - Queries with negation [Levy and Sagiv, VLDB 1993]
  - Queries with arithmetic comparisons [Klug, JACM 1988]
  - Recursive queries [Shmueli, 1993], [Chaudhuri and Vardi, 1992]
  - Queries over bags [Ioannidis and Ramakrishnan, 1995]

17 Previous Work (II)
  XML query containment: two new challenges
  - XPath containment
    - With *, // and […] [Miklau and Suciu, PODS 2002]
    - With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
    - Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
  - Nested query containment

18 Containment Cannot be Determined Solely by Comparing XPath Components
  Q: for $g in /group where $g/gname/text() = "database" return { for $p in $g/person return {$p/text()} {for $q in $g/paper where $q/author/text() = $p/text() return {$q/title/text()} } }
  Q': for $g in /group return { for $p in $g/person return {$p/text()} {$g/gname/text()} {for $q in $g/paper where $q/author/text() = $p/text() return {$q/title/text()} } }

19 Previous Work (II)
  XML query containment: two new challenges
  - XPath containment
    - With *, // and […] [Miklau and Suciu, PODS 2002]
    - With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
    - Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
  - Nested query containment
    - Complex object query containment [Levy and Suciu, PODS 1997]
  Containment of nested XML queries has not been fully studied.

20 Conjunctive XML Queries (c-XQueries)
  - Returned variables are bound to tag names or text values only.
  - Conjunctive: no two sibling query blocks return the same tag.
  - XPath fragment:
    - Allowed: child axis (/), wildcards (*), branches ([…])
    - Not allowed: descendant axis (//), arithmetic comparison, union
  - For this fragment, XPath containment is in PTIME.

21 Conjunctive Queries – cont.
  - A c-XQuery consists of nested query blocks.
  - The fan-out of a query block is the number of its immediate sub-blocks.
  - The nesting depth of a query block is 1 plus the maximal nesting depth of its sub-blocks.
  - The nesting depth of the query is the depth of its outermost block.
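Both measures follow directly from the block structure. The sketch below uses a hypothetical Block class and assumes that a block without sub-blocks has nesting depth 1, which is how the recursive definition is read here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    name: str                                   # illustrative identifier only
    sub_blocks: List["Block"] = field(default_factory=list)


def fan_out(block: Block) -> int:
    """Number of immediate sub-blocks."""
    return len(block.sub_blocks)


def nesting_depth(block: Block) -> int:
    """1 plus the maximal nesting depth of the sub-blocks (1 for a leaf block)."""
    return 1 + max((nesting_depth(b) for b in block.sub_blocks), default=0)


def max_fan_out(block: Block) -> int:
    """Largest fan-out over the block and all of its descendants."""
    return max([fan_out(block)] + [max_fan_out(b) for b in block.sub_blocks])


# Hypothetical query: an outer block with two sibling inner blocks.
query = Block("outer", [Block("inner_1"), Block("inner_2")])
print(fan_out(query), nesting_depth(query), max_fan_out(query))
```

For this example the outer block has fan-out 2 and the query has nesting depth 2.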

22 Query Head Tree
  The structure of an XML query and its answers can be described using a query head tree.
  - Edges represent query blocks. The label of a node n in the head tree is the returned tag of the block corresponding to the incoming edge of n in Q.
  - A head tree is also an XML instance if its variables are substituted with actual values.

23 Query Head Tree
  Example:
  Q: for $x in /project return { for $s in $x/title/text() return {$s} } { for $t in $x/member/text() return {$t} }
  [Figure: the query head tree of Q]
  What are the fan-out and the nesting depth of Q?

24 Constant Conjunctive XML Queries (cc-XQueries) A cc-XQuery is a c-XQuery that does not return tag variables. The head tree of a cc-XQuery has constant labels only.

26 Deciding Q ⊑ Q'?
  How can we establish a property over an infinite number of input XML instances?
  Standard technique:
  - Find a finite set of input representatives: canonical databases.
  - Relational queries: each canonical database is a minimal input that generates the answer template.
  - XML query answers have an infinite number of shapes.
  Find a finite set of answer templates: canonical answers.

27 Answer Shapes Determined by the Head Tree
  Q': for $x in /project return { for $y in /project/member return { where $y="Alice" return where $y="Bob" return }
  [Figure: the head tree of Q' and the answer shapes derived from it; nodes are labeled group, name, Alice, and Bob]

28 An Additional Candidate Answer
  [Figure: the head tree and the candidate answers from the previous slide, plus one additional candidate answer; nodes are labeled group, name, Alice, and Bob]

29 Why Consider the Additional Case
  [Figure: the head tree, an input instance D (project, member, Alice, Bob), and the answers Q(D) and Q'(D)]

30 What can Serve as Canonical Answers?
  - Prefix subtrees of the head tree? Necessary but not sufficient.
  - Trees contained in the head tree? Necessary and sufficient, but too many and too complex.

31 A Head Tree can Have Many Trees Contained in it
  [Figure: the head tree (group, name, Alice, Bob) and several trees contained in it, in which name, Alice, and Bob nodes are repeated]

32 What can Serve as Canonical Answers?
  - Prefix subtrees of the head tree? Necessary but not sufficient.
  - Trees contained in the head tree? Necessary and sufficient, but too many and too complex.
  Solution: consider only minimal trees that are contained in the head tree.

33 Canonical Answer
  - A minimal XML instance: no two sibling subtrees where one is contained in the other.
  - Canonical answer: a minimal XML instance contained in the head tree.
  - Every answer A of query Q corresponds to a unique canonical answer CA such that A ⊑ CA and CA ⊑ A.
  [Figure: an answer with group, name, Alice, and Bob nodes and the canonical answer it corresponds to]
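One way to obtain a minimal instance is to drop, bottom-up, every sibling subtree that is contained in another sibling. The sketch below is an assumption about how this could be done, not the authors' procedure. Trees are (label, children) tuples, and the embedding test follows the non-injective reading of the tree embedding definition.

```python
def embeds(t1, t2):
    """True if tree t1 embeds into tree t2, mapping root to root."""
    return t1[0] == t2[0] and all(
        any(embeds(c1, c2) for c2 in t2[1]) for c1 in t1[1]
    )


def minimize(tree):
    """Return an equivalent tree in which no sibling subtree is contained in another."""
    label, kids = tree
    kids = [minimize(k) for k in kids]          # minimize the subtrees first
    kept = []
    for k in kids:
        if any(embeds(k, other) for other in kept):
            continue                            # k adds nothing beyond a kept sibling
        kept = [other for other in kept if not embeds(other, k)]
        kept.append(k)                          # k subsumes the siblings just removed
    return (label, kept)


# Hypothetical answer: group with three name children, two of them redundant.
answer = ("group", [("name", [("Alice", [])]),
                    ("name", []),
                    ("name", [("Alice", [])])])
print(minimize(answer))
```

The result and the original tree contain each other, which matches the slide's statement that an answer and its canonical answer are mutually contained.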

34 Canonical Database
  Canonical database DB_CA: the minimal XML instance that generates CA.
  Query: for $x in /project return { for $y in /project/member return { where $y="Alice" return where $y="Bob" return }
  [Figure: a canonical answer CA (group, name, Alice, Bob) and its canonical database DB (project and member nodes with text values Alice and Bob)]

35 Canonical Database – Formal Def.
  The canonical database of a cc-XQuery for a canonical answer CA, written DB_CA, is an XML instance such that for each node N of CA, where N's generator query block is q_n, the following holds:
  - Let p_0/p_1/…/p_n be a path expression in q_n, where p_0 is an optional node variable from an ancestor query block.
  - For each p_i, i ∈ [1, n], there is a distinct node, labeled p_i, that is a child of the node for p_(i-1).
  - If p_0 is absent, then the node for p_1 is a child of the root of DB_CA.
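The core of this construction is that every path expression contributes a fresh chain of nodes, even when another expression uses the same labels. The sketch below shows only that step, under the assumption that path steps are plain labels; the helper name add_path, the "doc" root label, and the example paths are hypothetical, and conditions on text values are ignored.

```python
def add_path(parent, steps):
    """Append a fresh chain of nodes, one per path step, below `parent`.

    Nodes are (label, children) pairs with a mutable children list.
    Returns the node created for the last step.
    """
    node = parent
    for step in steps:
        child = (step, [])
        node[1].append(child)    # always a new node, never a reused one
        node = child
    return node


root = ("doc", [])                       # hypothetical root of the canonical database
x = add_path(root, ["project"])          # a path with p_0 absent: /project
add_path(x, ["member"])                  # a path starting at variable $x: $x/member
add_path(root, ["project", "member"])    # an absolute path: /project/member

print(root)
```

The two project nodes are distinct even though they carry the same label, which is what "a distinct node" in the definition requires.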

36 Sound and Complete Conditions for Nested Query Containment
  Let Q and Q' be two cc-XQueries. The following three conditions are equivalent:
  1. Q ⊑ Q'
  2. For every canonical database DB of Q, Q(DB) ⊑ Q'(DB)
  3. For every canonical answer CA of Q:
     a) CA is a canonical answer of Q'
     b) DB'_CA ⊑ DB_CA
  (DB_CA and DB'_CA are the canonical databases for CA of Q and Q', respectively.)

37 Properties of Canonical Answers and Databases
  Lemma 1: Let Q be a cc-XQuery and D be an XML instance. There exists a unique canonical answer CA of Q such that Q(D) ⊑ CA and CA ⊑ Q(D).
  Lemma 2: Let Q be a cc-XQuery, CA be a canonical answer of Q, DB_CA be the canonical database for CA of Q, and D be an XML instance. CA ⊑ Q(D) if and only if DB_CA ⊑ D.

38 Containment of cc-XQueries – Proof (1)
  1) ⇒ 2): Follows from the definition.
  2) ⇒ 3):
  - CA ⊑ Q(DB_CA) (Lemma 2)
  - Q(DB_CA) ⊑ Q'(DB_CA) (by 2)
  - Hence CA ⊑ Q'(DB_CA) (containment is transitive), so a) holds.
  - CA is a canonical answer of Q' (by a) and CA ⊑ Q'(DB_CA), hence DB'_CA ⊑ DB_CA (Lemma 2), so b) holds.

39 Containment of cc-XQueries – Proof (2)
  3) ⇒ 1): To show Q ⊑ Q', we need to show that for every XML instance D, Q(D) ⊑ Q'(D).
  - There exists a unique canonical answer CA of Q such that Q(D) ⊑ CA and CA ⊑ Q(D) (Lemma 1).
  - DB_CA ⊑ D (Lemma 2).
  - DB'_CA ⊑ DB_CA (by 3b).
  - DB'_CA ⊑ D (transitivity).
  - CA ⊑ Q'(D) (Lemma 2).
  - Q(D) ⊑ Q'(D) (transitivity).

40 Query Containment Algorithm
  Algorithm: for every canonical answer CA of Q do
  1. check whether CA is a canonical answer of Q'
  2. generate DB_CA and DB'_CA
  3. check DB'_CA ⊑ DB_CA
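The loop structure of the algorithm can be sketched as follows. Enumerating canonical answers and building canonical databases are passed in as callables because the slides do not spell out their implementation; all parameter names are hypothetical, and the toy stand-ins at the end exist only to make the sketch runnable.

```python
from typing import Callable, Iterable, TypeVar

CA = TypeVar("CA")   # a canonical answer
DB = TypeVar("DB")   # an XML instance used as a canonical database


def is_contained(
    canonical_answers_of_q: Iterable[CA],
    is_canonical_answer_of_q_prime: Callable[[CA], bool],
    canonical_db_for_q: Callable[[CA], DB],
    canonical_db_for_q_prime: Callable[[CA], DB],
    instance_contained: Callable[[DB, DB], bool],
) -> bool:
    """Return True if every canonical answer of Q passes both checks."""
    for ca in canonical_answers_of_q:
        # Step 1: CA must also be a canonical answer of Q'.
        if not is_canonical_answer_of_q_prime(ca):
            return False
        # Step 2: generate both canonical databases for CA.
        db = canonical_db_for_q(ca)
        db_prime = canonical_db_for_q_prime(ca)
        # Step 3: the canonical database of Q' must be contained in that of Q.
        if not instance_contained(db_prime, db):
            return False
    return True


# Toy stand-ins: answers are strings, databases are sets, containment is subset.
result = is_contained(
    ["x", "y"],
    lambda ca: True,
    lambda ca: {ca, "extra"},
    lambda ca: {ca},
    lambda small, big: small <= big,
)
print(result)
```

The running time is driven by how many canonical answers the first argument yields and how large they are.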

41 Roadmap
  - Introduction and problem definition
  - Containment of a subset of XML queries
    - Query containment is decidable
    - Complexity of containment (table below)
  - Query containment in practice
  - Relaxing the assumptions
  - Conclusions
  Complexity by fanout (rows) and nesting depth (columns):
  Fanout      Fixed depth      Arbitrary depth
  = 1         ??               ??
  Arbitrary   ??               ??

42 Query Containment Algorithm
  Algorithm: for every canonical answer CA of Q do
  1. check whether CA is a canonical answer of Q'
  2. generate DB_CA and DB'_CA
  3. check DB'_CA ⊑ DB_CA
  The algorithm is polynomial in the size and number of canonical answers.
  - What are the sizes of canonical answers?
  - What is the number of canonical answers?

43 Containment of XML Queries with Fanout 1
  Example (d = 3 is the depth; m = 1 is the maximum fanout):
  for $x in /project return {for $y in /project/member return {where $y="Alice" return }
  [Figure: the canonical answers of the example, built from the labels group, name, and Alice]
  Canonical answers and complexity:
  - Number: the depth of the query
  - Size: bounded by the depth of the query
  - Complexity: O(d·|Q|·|Q'|)
  Theorem: Testing containment of XML queries with fanout 1 is in PTIME.
  Nesting with fanout 1 does not increase complexity.
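With fanout 1 the head tree is a single chain, so one way to read the counts above is that the canonical answers are the non-empty prefixes of that chain. This reading is an assumption made for illustration. The sketch enumerates the prefixes for a chain with the labels group, name, and Alice.

```python
def chain_prefixes(labels):
    """All non-empty prefixes of a root-to-leaf label chain."""
    return [labels[:k] for k in range(1, len(labels) + 1)]


head_chain = ["group", "name", "Alice"]   # depth d = 3, fanout m = 1
for prefix in chain_prefixes(head_chain):
    print("/".join(prefix))
```

There are d prefixes, each of length at most d, which agrees with the stated number and size bounds.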

44 Roadmap
  - Introduction and problem definition
  - Containment of a subset of XML queries
    - Query containment is decidable
    - Complexity of containment (table below)
  - Query containment in practice
  - Relaxing the assumptions
  - Conclusions
  Complexity by fanout (rows) and nesting depth (columns):
  Fanout      Fixed depth      Arbitrary depth
  = 1         PTIME            PTIME
  Arbitrary   ??               ??

45 Containment of XML Queries with Arbitrary Fanout
  Example: d = 4 is the depth; m = 3 is the maximum fanout.
  [Figure: canonical answers for the example, with nodes numbered 1 to 3; the slide's expressions for the number and size of canonical answers, written in terms of d, are part of the figure]
  Theorem: Testing containment of XML queries with depth 2 and arbitrary fanout is coNP-hard.

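46 Roadmap
  - Introduction and problem definition
  - Containment of a subset of XML queries
    - Query containment is decidable
    - Complexity of containment (table below): NOT TIGHT
  - Query containment in practice
  - Conclusions
  Complexity by fanout (rows) and nesting depth (columns):
  Fanout      Fixed depth      Arbitrary depth
  = 1         PTIME            PTIME
  Arbitrary   coNP-hard        coNP-hard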

47 Effect of the Depth on Containment of XML Queries
  Insight: kernel canonical answers
  - The root node has a single child.
  - In any subtree, a path pattern is repeated no more than cd times, where d is the query depth and c is the maximum number of path steps in a query block.
  The size of kernel canonical answers is:
  - Polynomial in the query size (for fixed nesting depth)
  - Exponential in the query depth (for arbitrary depth)
  Theorem:
  - Testing containment of XML queries with fixed depth is coNP-complete.
  - Testing containment of XML queries with arbitrary depth is in coNEXPTIME.

48 Effect of the Depth on Containment of XML Queries – Cont.
  Lemma 3: Let Q and Q' be two cc-XQueries. Q ⊑ Q' iff for each kernel canonical answer (KCA) of Q:
  1. KCA is a canonical answer of Q'.
  2. DB'_KCA ⊑ DB_KCA.
  The size of a KCA is O((bcd)^d).
  The number of KCAs is O(m^((bcd)^d)), where
  - b is the number of query blocks in Q
  - m is the maximum fanout in Q
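A quick calculation shows why depth matters. The sketch reads the size bound as (b·c·d)^d, which is an interpretation of the flattened exponent in the transcript; the parameter values are arbitrary examples and constant factors are ignored.

```python
def kca_size_bound(b: int, c: int, d: int) -> int:
    """Size bound (b*c*d)**d for a kernel canonical answer, constants ignored."""
    return (b * c * d) ** d


# b = query blocks, c = maximum path steps per block, d = nesting depth.
for depth in (1, 2, 3, 4):
    print(depth, kca_size_bound(b=3, c=2, d=depth))
```

For a fixed depth the bound is a polynomial in b and c, while increasing the depth raises the exponent, which is consistent with the polynomial and exponential cases on the previous slide.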

51 Containment Checking in Practice
  Analyze element cardinality to reduce the number of canonical answers that must be checked.
  - Given the query structure and the underlying XML database schema, we can infer the cardinality of elements in the query answer.
  Canonical answers are pruned according to three kinds of schema information:
  1. (=1) The schema implies that a certain element occurs exactly once under its parent element.
  2. (≥1) The schema implies that a certain element occurs at least once under its parent element.
  3. (≤1) The schema implies that a certain element occurs at most once under its parent element.

52 Containment Checking in Practice – Example
  Q: for $g in /group where $g/gname/text() = "database" return { for $p in $g/person return {$p/text()} {for $q in $g/paper where $q/author/text() = $p/text() return {$q/title/text()} } }
  Q': for $g in /group return { for $p in $g/person return {$p/text()} {$g/gname/text()} {for $q in $g/paper where $q/author/text() = $p/text() return {$q/title/text()} } }
  Number of canonical answers: 71 originally, 2 after cardinality analysis.

54 An Example Query that Returns Tag Variables for $x in dbGrp return { for $y in $x/proj return { for $u in $y/member return $u/text() for $v in $y/paper return $v/text() }

55 Deciding Query Containment
  - Leverage previous results: simulation mapping [Levy and Suciu, PODS 1997]
  - Check the query simulation mapping for every canonical answer
  Complexity:
  - A simulation mapping can be checked in time polynomial in the query size.
  - The complexity of checking containment does not increase.

57 Other Extensions
  Columns (query features): no tag variables; with tag variables; with unions; with negation; with //; with equi-join on tags; with arithmetic comparisons
  Rows (query type; cells listed left to right, some spanning several columns):
  Unnested:     PTIME | coNP-complete | NP-complete | Π2P-complete
  Fanout = 1:   PTIME | coNP-complete | NP-complete | Π2P-complete
  Fixed depth:  coNP-complete | Π2P-complete
  General:      in coNEXPTIME

58 Conclusions
  Contributions:
  - A sound and complete condition for containment of nested XML queries
  - Detailed complexity analysis
  Future work:
  - Evaluate and optimize the containment algorithm with element cardinality analysis
  - Answering nested XML queries using views
