Published by Skyla Comfort. Modified over 9 years ago.
1
Containment of Nested XML Queries Xin (Luna) Dong, Alon Halevy, Igor Tatarinov University of Washington
2
Query Containment
  The most fundamental relationship between a pair of queries
  Query Q is contained in Q’ if, for any database D, Q(D) is a subset of Q’(D)
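The definition quantifies over every database, so testing a handful of databases can only refute containment, never prove it. A minimal Python sketch of that refutation check, assuming queries are modeled as functions from a database (a set of tuples) to a set of tuples; `find_counterexample`, `two_step` and `one_or_two_step` are hypothetical names used only for illustration:

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Database = Set[Tuple[int, int]]            # assumed model: one binary relation
Query = Callable[[Database], Set[Tuple[int, int]]]


def find_counterexample(q: Query, q_prime: Query,
                        databases: Iterable[Database]) -> Optional[Database]:
    """Return the first D with Q(D) not a subset of Q'(D), or None if no sample refutes."""
    for d in databases:
        if not q(d) <= q_prime(d):
            return d
    return None


def two_step(d: Database) -> Set[Tuple[int, int]]:
    """Pairs (a, c) connected by a path of exactly two edges."""
    return {(a, c) for (a, b) in d for (b2, c) in d if b == b2}


def one_or_two_step(d: Database) -> Set[Tuple[int, int]]:
    """Pairs connected by one edge or by two edges."""
    return two_step(d) | set(d)


samples = [set(), {(1, 2)}, {(1, 2), (2, 3)}]
print(find_counterexample(two_step, one_or_two_step, samples))   # no refutation found
print(find_counterexample(one_or_two_step, two_step, samples))   # a refuting database
```

A `None` result here says nothing about databases outside `samples`; that gap is what the canonical-database and canonical-answer constructions later in the talk are meant to close.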
3
Applications of Query Containment
  Semantic caching
  Reasoning about contents of data sources in data integration
  Verification of integrity constraints
  Verification of knowledge bases
  Determining queries independent of updates
  Query answering using views
4
Query Processing in PDMS
  XML query containment in a Peer Data Management System (PDMS)
    Answering queries using views to extract remote data
    Removing redundant queries to enhance performance [Tatarinov and Halevy, SIGMOD 2004]
  [Figure: peers UW, Stanford, Berkeley and UPenn, labelled with mappings M_WS, M_PW, M_SB, M_BW and queries Q_W, Q_P, Q_S, Q_B1, Q_B2]
5
Query Containment: Relational vs. XML
                        Relational
  Input D               Sets of tuples
  Output Q(D)           A set of tuples
  Instance containment  Q(D) ⊆ Q’(D) (subset)
  Query containment     Q ⊆ Q’: for every input D, Q(D) ⊆ Q’(D)
6
Query Containment: Relational vs. XML
                        Relational                     XML
  Input D               Sets of tuples                 An XML instance tree
  Output Q(D)           A set of tuples                An XML instance tree
  Instance containment  Q(D) ⊆ Q’(D) (subset)          Q(D) ⊆ Q’(D) (tree homomorphism)
  Query containment     Q ⊆ Q’: for every input D,     Q ⊆ Q’: for every input D,
                        Q(D) ⊆ Q’(D)                   Q(D) ⊆ Q’(D)
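For XML, instance containment is defined by tree homomorphism rather than by subset. A minimal Python sketch, assuming unordered labeled trees, text values modeled as leaf nodes, and a mapping that sends root to root and preserves labels and parent-child edges; `Node` and `contained` are illustrative names, and the two trees are made-up examples rather than the ones drawn on the slides:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def contained(a: Node, b: Node) -> bool:
    """True if tree a maps homomorphically into tree b.

    The roots must carry the same label, and every child subtree of a
    must map into some child subtree of b (several children of a may
    share the same target in b).
    """
    if a.label != b.label:
        return False
    return all(any(contained(ca, cb) for cb in b.children) for ca in a.children)


# Two groups' worth of names split across siblings versus merged under one name.
split = Node("group", [Node("name", [Node("Alice")]),
                       Node("name", [Node("Bob")])])
merged = Node("group", [Node("name", [Node("Alice"), Node("Bob")])])

print(contained(split, merged))   # both name subtrees map into the merged one
print(contained(merged, split))   # no single name subtree holds Alice and Bob
```

Because several sibling subtrees may map onto the same target, containment in both directions does not force two trees to be identical, only homomorphically equivalent.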
7
Example: An XML Instance
  D: [XML document; element tags not preserved, text content Alice and Bob]
  [Figure: tree of D with nodes project, member, Alice, Bob]
8
Example: An XML Query
  Q: for $x in /project
     return { for $y in $x/member
              return { where $y="Alice" return
                       where $y="Bob" return }
  [Figure: the instance D (nodes project, member, Alice, Bob) and the answer Q(D) (nodes group, name, group, name, Alice, Bob)]
9
Example: Another XML Query
  Q’: for $x in /project
      return { for $y in /project/member
               return { where $y="Alice" return
                        where $y="Bob" return }
  [Figure: the instance D (nodes project, member, Alice, Bob) and the answer Q’(D) (nodes name, group, name, Alice, Bob)]
10
Example: Tree Homomorphism and Query Containment
  [Figure: the answers Q(D) (nodes group, name, group, name, Alice, Bob) and Q’(D) (nodes name, group, name, Alice, Bob) compared by tree homomorphism in both directions, Q(D) against Q’(D) and Q’(D) against Q(D); one direction is marked X]
11
Query Containment Problem
  From answer containment to query containment
  Our problems
    Given queries Q and Q’, decide whether Q ⊆ Q’
    The complexity of query containment
  [Figure: answer containment between Q(D) and Q’(D) related to query containment between Q and Q’]
12
Previous Work (I)
  Relational query containment
    Conjunctive queries [Chandra and Merlin, STOC 1977]
    Acyclic queries [Yannakakis, VLDB 1981]
    Queries with union [Sagiv and Yannakakis, JACM 1980]
    Queries with negation [Levy and Sagiv, VLDB 1993]
    Queries with arithmetic comparisons [Klug, JACM 1988]
    Recursive queries [Shmueli, 1993], [Chaudhuri and Vardi, 1992]
    Queries over bags [Ioannidis and Ramakrishnan, 1995]
13
Previous Work (II)
  XML query containment: two new challenges
    XPath containment
      With *, // and […] [Miklau and Suciu, PODS 2002]
      With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
      Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
    Nested query containment
14
Containment Cannot be Determined Solely by Comparing XPath Components
  Q:  for $g in /group
      where $g/gname/text() = "database"
      return { for $p in $g/person
               return {$p/text()}
                      {for $q in $g/paper
                       where $q/author/text() = $p/text()
                       return {$q/title/text()} } }
  Q’: for $g in /group
      return { for $p in $g/person
               return {$p/text()}
                      {$g/gname/text()}
                      {for $q in $g/paper
                       where $q/author/text() = $p/text()
                       return {$q/title/text()} } }
15
Previous Work (II)
  XML query containment: two new challenges
    XPath containment
      With *, // and […] [Miklau and Suciu, PODS 2002]
      With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
      Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
    Nested query containment
      Complex object query containment [Levy and Suciu, PODS 1997]
  Containment of nested XML queries has not been fully studied
16
Our Focus: Nested XML Queries
  Returned tag constants
  Conjunctive: no two sibling query blocks return the same tag
  XPath
    HAVE: child axis (/), wildcards (*), branches ([…])
    NOT HAVE: descendant axis (//), arithmetic comparison, union
  Here, XPath containment is in PTIME
17
Complexity Result (I)
                     Fixed depth      Arbitrary depth
  Fanout = 1         PTIME            PTIME
  Arbitrary fanout   coNP-complete    in coNEXPTIME
18
Complexity Result (II)
  Query types (columns): no tag variables, with tag variables, with unions, with negation, with //, with equi-join on tags, with arithmetic comparison
  Entries per row, left to right (an entry may span several columns):
    Un-nested:    PTIME, coNP-complete, NP-complete, Π₂ᴾ-complete
    Fanout = 1:   PTIME
    Fixed depth:  coNP-complete
    General:      in coNEXPTIME
20
Complexity Result (II)
  Query types (columns): no tag variables, with tag variables, with unions, with negation, with //, with equi-join on tags, with arithmetic comparison
  Entries per row, left to right (an entry may span several columns):
    Un-nested:    PTIME, coNP-complete, NP-complete, Π₂ᴾ-complete
    Fanout = 1:   PTIME, coNP-complete, NP-complete, Π₂ᴾ-complete
    Fixed depth:  coNP-complete, Π₂ᴾ-complete
    General:      in coNEXPTIME
21
Roadmap
  Introduction and problem definition
  Containment of a subset of XML queries
  Query containment is decidable
  Query containment in practice
  Relaxing the assumptions
  Conclusions
                     Fixed depth      Arbitrary depth
  Fanout = 1         PTIME            PTIME
  Arbitrary fanout   coNP-complete    in coNEXPTIME
22
Deciding Q ⊆ Q’?
  How to find a property for an infinite number of input XML instances
  Standard technique
    Find a finite set of input representatives: Canonical Databases
    Relational query: each canonical database is a minimal input to generate the answer template
  XML query answers have an infinite number of shapes
    Find a finite set of answer templates: Canonical Answers
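For the relational case, the canonical-database idea can be illustrated with the classical test for conjunctive queries [Chandra and Merlin, STOC 1977]: freeze the variables of Q into constants, treat its body as a database, and check that Q’ returns the frozen head. The Python sketch below is only an illustration of that relational technique, not the XML algorithm of the talk; it assumes queries without constants, given as `(head_variables, body_atoms)`, and the names `evaluate` and `is_contained` are hypothetical:

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]            # (relation name, argument variables)
Query = Tuple[Tuple[str, ...], List[Atom]]    # (head variables, body atoms)
Fact = Tuple[str, Tuple[str, ...]]            # (relation name, constant values)


def evaluate(query: Query, db: Set[Fact]) -> Set[Tuple[str, ...]]:
    """Evaluate a conjunctive query by backtracking over its body atoms."""
    head, body = query
    results: Set[Tuple[str, ...]] = set()

    def extend(i: int, binding: Dict[str, str]) -> None:
        if i == len(body):
            results.add(tuple(binding[v] for v in head))
            return
        rel, args = body[i]
        for fact_rel, values in db:
            if fact_rel != rel or len(values) != len(args):
                continue
            new = dict(binding)
            # Bind unbound variables; reject the fact on any conflicting binding.
            if all(new.setdefault(a, v) == v for a, v in zip(args, values)):
                extend(i + 1, new)

    extend(0, {})
    return results


def is_contained(q: Query, q_prime: Query) -> bool:
    """Q is contained in Q' iff Q' returns Q's frozen head on Q's canonical database."""
    head, body = q
    canonical_db = {(rel, tuple(args)) for rel, args in body}  # variables act as constants
    return tuple(head) in evaluate(q_prime, canonical_db)


# Q: x lies on a 2-cycle.  Q': x has an outgoing edge.
q = (("x",), [("E", ("x", "y")), ("E", ("y", "x"))])
q_prime = (("x",), [("E", ("x", "y"))])
print(is_contained(q, q_prime), is_contained(q_prime, q))
```

A single canonical database suffices in this setting because relational answers are flat tuples; the point made on the slide is that nested XML answers come in unboundedly many shapes, so a finite set of answer templates is needed as well.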
23
Answer Shapes Determined by the Head Tree
  Q’: for $x in /project
      return { for $y in /project/member
               return { where $y="Alice" return
                        where $y="Bob" return }
  [Figure: the head tree of Q’ and candidate answer trees, all over the labels group, name, Alice, Bob]
24
An Additional Candidate Answer
  [Figure: the head tree (nodes group, name, Alice, Bob) shown with candidate answer trees over the same labels, including one additional candidate]
25
Why Consider the Additional Case
  [Figure: the head tree (nodes group, name, Alice, Bob), the instance D (nodes project, member, Alice, Bob), and the answers Q(D) and Q’(D)]
26
What can Serve as Canonical Answers?
  Prefix subtrees of the head tree?
    Necessary but not sufficient
  Trees contained in the head tree?
    Necessary and sufficient
    But too many and too complex
27
A Head Tree can Have Many Trees Contained in it
  [Figure: the head tree and several trees contained in it, all over the labels group, name, Alice, Bob]
28
What can Serve as Canonical Answers?
  Prefix subtrees of the head tree?
    Necessary but not sufficient
  Trees contained in the head tree?
    Necessary and sufficient
    But too many and too complex
  Our solution: consider only minimal trees that are contained in the head tree
29
Canonical Answer
  A minimal XML instance: no two sibling subtrees where one is contained in the other
  Canonical answer: a minimal XML instance contained in the head tree
  Every answer A of query Q corresponds to a unique canonical answer CA, s.t. A ⊆ CA and CA ⊆ A
  [Figure: example trees over the labels group, name, Alice, Bob]
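The minimality condition can be made concrete by pruning redundant siblings bottom-up. A minimal Python sketch, assuming unordered labeled trees with text values as leaf nodes and containment by root-preserving homomorphism; `Node`, `contained` and `minimize` are illustrative names, and the pruning rule (drop a sibling when another sibling contains it, keeping the first of any equivalent pair) is one straightforward reading of the definition rather than the authors' procedure:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def contained(a: Node, b: Node) -> bool:
    """True if a maps into b by a label- and edge-preserving, root-to-root mapping."""
    return a.label == b.label and all(
        any(contained(ca, cb) for cb in b.children) for ca in a.children
    )


def minimize(t: Node) -> Node:
    """Return an equivalent tree in which no sibling subtree is contained in another."""
    kids = [minimize(c) for c in t.children]
    kept = []
    for i, c in enumerate(kids):
        # c is redundant if another sibling strictly contains it, or an
        # equivalent sibling appears earlier (so one representative survives).
        redundant = any(
            j != i and contained(c, o) and (not contained(o, c) or j < i)
            for j, o in enumerate(kids)
        )
        if not redundant:
            kept.append(c)
    return Node(t.label, kept)


answer = Node("group", [
    Node("name", [Node("Alice")]),
    Node("name", [Node("Alice"), Node("Bob")]),
    Node("name", [Node("Alice")]),
])
print(minimize(answer))
```

In this made-up example both `name[Alice]` siblings are absorbed by `name[Alice, Bob]`, so the original tree and its minimized form contain each other, which is the relationship the slide states between an answer and its canonical answer.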
30
Canonical Database
  Canonical database DB_CA: the minimal XML instance to generate CA
  for $x in /project
  return { for $y in /project/member
           return { where $y="Alice" return
                    where $y="Bob" return }
  [Figure: a canonical answer CA (nodes group, name, Alice, Bob) and canonical databases DB (nodes project, member, Alice, Bob)]
31
Sound and Complete Conditions for Nested Query Containment
  Theorem 1. Q ⊆ Q’ if and only if, for every canonical database DB of Q, Q(DB) ⊆ Q’(DB)
  Theorem 2. Q ⊆ Q’ if and only if, for every canonical answer CA of Q, CA is a canonical answer of Q’ and DB’_CA ⊆ DB_CA
32
Query Containment Algorithm
  Algorithm: for every canonical answer CA of Q do
    1. check whether CA is a canonical answer of Q’
    2. generate DB_CA and DB’_CA
    3. check DB’_CA ⊆ DB_CA
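The loop above can be sketched as a small Python driver. This is only a skeleton under stated assumptions: the talk does not give code, so the four helpers (`canonical_answers`, `is_canonical_answer`, `canonical_db`, `contained`) are hypothetical callables supplied by the caller, and the stand-ins at the bottom are toy placeholders that exercise the control flow without modeling real XML queries:

```python
from typing import Any, Callable, Iterable


def query_contained(
    q: Any,
    q_prime: Any,
    canonical_answers: Callable[[Any], Iterable[Any]],
    is_canonical_answer: Callable[[Any, Any], bool],
    canonical_db: Callable[[Any, Any], Any],
    contained: Callable[[Any, Any], bool],
) -> bool:
    """Run the three per-answer checks for every canonical answer of q."""
    for ca in canonical_answers(q):
        # Step 1: CA must also be a canonical answer of Q'.
        if not is_canonical_answer(ca, q_prime):
            return False
        # Step 2: build the canonical databases of CA for Q and for Q'.
        db, db_prime = canonical_db(q, ca), canonical_db(q_prime, ca)
        # Step 3: compare DB'_CA against DB_CA.
        if not contained(db_prime, db):
            return False
    return True


# Toy stand-ins: a "query" is a dict from canonical answers to canonical
# databases, and databases are plain sets compared by subset.
toy_q = {"a": {1, 2}}
toy_q_prime = {"a": {1}, "b": {3}}
helpers = (
    lambda query: query.keys(),
    lambda ca, query: ca in query,
    lambda query, ca: query[ca],
    lambda small, big: small <= big,
)
print(query_contained(toy_q, toy_q_prime, *helpers))
print(query_contained(toy_q_prime, toy_q, *helpers))
```

Passing the helpers in as parameters keeps the driver independent of how canonical answers are enumerated, which is where the cost of the algorithm lies.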
33
Roadmap
  Introduction and problem definition
  Containment of a subset of XML queries
  Query containment is decidable
  Query containment in practice
  Relaxing the assumptions
  Conclusions
                     Fixed depth      Arbitrary depth
  Fanout = 1         ?                ?
  Arbitrary fanout   ?                ?
34
Query Containment Algorithm
  Algorithm: for every canonical answer CA of Q do
    1. check whether CA is a canonical answer of Q’
    2. generate DB_CA and DB’_CA
    3. check DB’_CA ⊆ DB_CA
  Polynomial in the size and number of canonical answers
    What are the sizes of canonical answers?
    What is the number of canonical answers?
35
Containment of XML Queries with Fanout 1
  E.g. d=3 (the depth); m=1 (the maximum fanout)
    for $x in /project
    return { for $y in /project/member
             return { where $y="Alice" return }
  Canonical answers and complexity
    Number: the depth of the query
    Size: bounded by the depth of the query
    Complexity: O(d·|Q|·|Q’|)
  Theorem: Testing containment of XML queries with fanout 1 is in PTIME
  Nesting with fanout 1 does not increase complexity
  [Figure: canonical answers over the labels group, name, Alice]
36
Roadmap
  Introduction and problem definition
  Containment of a subset of XML queries
  Query containment is decidable
  Query containment in practice
  Relaxing the assumptions
  Conclusions
                     Fixed depth      Arbitrary depth
  Fanout = 1         PTIME            PTIME
  Arbitrary fanout   ?                ?
37
Containment of XML Queries with Arbitrary Fanout
  E.g. d=4 (the depth); m=3 (the maximum fanout)
  Canonical answers and complexity
    Number: [formula not preserved]
    Size: [formula not preserved]
  Theorem: Testing containment of XML queries with depth 2 and arbitrary fanout is coNP-hard
  [Figure: tree whose nodes are numbered 1, 2, 3 at successive levels, annotated with d-1, d-2, d-1]
38
Roadmap
  Introduction and problem definition
  Containment of a subset of XML queries
  Query containment is decidable
  Query containment in practice
  Relaxing the assumptions
  Conclusions
                     Fixed depth      Arbitrary depth
  Fanout = 1         PTIME            PTIME
  Arbitrary fanout   coNP-hard        coNP-hard
  NOT TIGHT
39
Effect of the Depth on Containment of XML Queries
  Insight: Kernel Canonical Answer
    The root node has a single child
    In any subtree, a path pattern is repeated no more than cd times
      d: query depth
      c: maximum number of path steps in a query block
  The size of kernel canonical answers
    Polynomial in the query size
    Exponential in the query depth
  Theorem:
    Testing containment of XML queries with fixed depth is coNP-complete
    Testing containment of XML queries with arbitrary depth is in coNEXPTIME
41
Containment Checking in Practice
  Q:  for $g in /group
      where $g/gname/text() = "database"
      return { for $p in $g/person
               return {$p/text()}
                      {for $q in $g/paper
                       where $q/author/text() = $p/text()
                       return {$q/title/text()} } }
  Q’: for $g in /group
      return { for $p in $g/person
               return {$p/text()}
                      {$g/gname/text()}
                      {for $q in $g/paper
                       where $q/author/text() = $p/text()
                       return {$q/title/text()} } }
  Analyze element cardinality to reduce the number of canonical answers for containment checking
    Number of canonical answers: originally 71, after analysis 2
43
An Example Query that Returns Tag Variables
  for $x in dbGrp
  return { for $y in $x/proj
           return { for $u in $y/member return $u/text()
                    for $v in $y/paper return $v/text() }
44
Deciding Query Containment
  Leverage previous results: simulation mapping [Levy and Suciu, PODS’97]
  Check query simulation mapping for every canonical answer
  Complexity
    Simulation mapping can be checked in time polynomial in the query size
    The complexity of checking containment does not increase
45
Other Extensions
  Query types (columns): no tag variables, with tag variables, with unions, with negation, with //, with equi-join on tags, with arithmetic comparison
  Entries per row, left to right (an entry may span several columns):
    Un-nested:    PTIME, coNP-complete, NP-complete, Π₂ᴾ-complete
    Fanout = 1:   PTIME, coNP-complete, NP-complete, Π₂ᴾ-complete
    Fixed depth:  coNP-complete, Π₂ᴾ-complete
    General:      in coNEXPTIME
46
Conclusions
  Contributions
    A sound and complete condition for containment of nested XML queries
    Detailed complexity analysis
  Future work
    Fill in the open gap of complexity in the case of queries with arbitrary fanout and arbitrary nesting depth
    Evaluate and optimize the containment algorithm with element cardinality analysis
    Answering nested XML queries using views
47
Containment of Nested XML Queries @VLDB 2004 Xin (Luna) Dong, Alon Halevy, Igor Tatarinov University of Washington www.cs.washington.edu/homes/lunadong