Containment of Nested XML Queries
Xin Dong, Alon Halevy, Igor Tatarinov
Presented by: Orly Goren

Query Containment
The most fundamental relationship between a pair of queries. Query Q is contained in Q’ if, for every database D, Q(D) is a subset of Q’(D).

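Roadmap
- Introduction and problem definition
- Containment of a subset of XML queries
  - Query containment is decidable
  - Query containment in practice
- Relaxing the assumptions
- Conclusions
Complexity of containment, by fanout (rows) and nesting depth (fixed or arbitrary):
- Fanout = 1: PTIME, for fixed and for arbitrary depth
- Arbitrary fanout: coNP-complete for fixed depth; in coNEXPTIME for arbitrary depth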

Applications of Query Containment
- Semantic caching
- Determining independence of database updates
- Query answering using views
- Detecting that a reformulated query is redundant
- Query minimization
- Verification of knowledge bases

Query Processing in PDMS
XML query containment in a Peer Data Management System (PDMS):
- Answering queries using views to extract remote data
- Removing redundant queries to enhance performance
[Figure: peers UW, Stanford, Berkeley and UPenn connected by mappings M_WS, M_PW, M_SB and M_BW; a query Q_W is reformulated into queries Q_P, Q_S, Q_B1 and Q_B2 over the other peers]

Query Containment: Relational vs. XML
- Input D: relational, sets of tuples; XML, an XML instance tree.
- Output Q(D): relational, a set of tuples; XML, an XML instance tree.
- Instance containment: relational, Q(D) ⊆ Q’(D) (subset); XML, Q(D) ⊑ Q’(D) (tree embedding).
- Query containment: relational, Q ⊆ Q’ if Q(D) ⊆ Q’(D) for every input D; XML, Q ⊑ Q’ if Q(D) ⊑ Q’(D) for every input D.

Example – An XML Instance
[Figure: the instance D, built from project and member elements with text values Alice and Bob]

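Example – An XML Query
Q:
for $x in /project return
  { for $y in $x/member return
    { where $y = "Alice" return
      where $y = "Bob" return }
[Figure: the instance D (project and member elements with values Alice and Bob) and the answer Q(D), a tree of group and name elements with values Alice and Bob]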

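Example – Another XML Query
Q’:
for $x in /project return
  { for $y in /project/member return
    { where $y = "Alice" return
      where $y = "Bob" return }
[Figure: the instance D (project and member elements with values Alice and Bob) and the answer Q’(D), a tree of group and name elements with values Alice and Bob]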

Tree Embedding
Given two trees T1 and T2, a node mapping ψ from T1 to T2 is an embedding from T1 to T2 if:
- ψ maps the root of T1 to the root of T2.
- If node n2 is a child of node n1 in T1, then ψ(n2) is a child of ψ(n1), and n1 and n2 have the same labels as ψ(n1) and ψ(n2), respectively.
What is the time complexity of finding an embedding from T1 to T2?

XML Instance Containment
Let e and e’ be two XML instances. e is contained in e’, denoted e ⊑ e’, if the tree of e can be embedded in the tree of e’.
- Containment is reflexive and transitive.
- Containment is not antisymmetric: e ⊑ e’ and e’ ⊑ e do not imply e = e’.
[Figure: two XML instances over labels a and b that contain each other but are not equal]
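
To make the embedding and containment definitions concrete, here is a minimal Python sketch. It assumes a tree is encoded as a `(label, children)` tuple, an encoding chosen for illustration rather than taken from the authors. The check memoizes on pairs of subtrees, and there are at most |T1|·|T2| such pairs, so it runs in polynomial time. That is one possible answer to the question on the Tree Embedding slide. The mapping does not have to be injective, so a(b) and a(b, b) (our own small example) contain each other without being equal.

```python
from functools import lru_cache
from typing import Tuple

# Illustrative encoding (an assumption, not the authors'):
# a tree is a (label, children) pair, where children is a tuple of trees.
Tree = Tuple[str, tuple]


def embeds(t1: Tree, t2: Tree) -> bool:
    """Return True if t1 embeds into t2: the root maps to the root, each child
    maps to a child of its parent's image, and labels are preserved. The
    mapping need not be injective."""

    @lru_cache(maxsize=None)
    def can_map(n1: Tree, n2: Tree) -> bool:
        if n1[0] != n2[0]:
            return False
        # Every child of n1 needs some child of n2 that it can map onto.
        return all(any(can_map(c1, c2) for c2 in n2[1]) for c1 in n1[1])

    return can_map(t1, t2)


def contained(e: Tree, e_prime: Tree) -> bool:
    # Instance containment: e is contained in e' iff e embeds into e'.
    return embeds(e, e_prime)


# a(b) and a(b, b): each is contained in the other, yet they differ.
a_b = ("a", (("b", ()),))
a_bb = ("a", (("b", ()), ("b", ())))
```

Both b children of `a_bb` map onto the single b of `a_b`. That is why `contained(a_bb, a_b)` holds even though the two trees are different.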

XML Query Containment
Let Q and Q’ be two XML queries. Q is contained in Q’, denoted Q ⊑ Q’, if Q(D) ⊑ Q’(D) for every input XML instance D.

Example – Tree Embedding and Query Containment
Q(D) ⊑ Q’(D), but Q’(D) ⋢ Q(D) (marked X).
[Figure: the answers Q(D) and Q’(D), trees of group and name elements with values Alice and Bob, with the embedding between them]

Query Containment Problem
From answer containment to query containment:
- If Q’(D) ⋢ Q(D) for some D, then Q’ ⋢ Q.
- If Q(D) ⊑ Q’(D) for every D, then Q ⊑ Q’.
Our problems:
- Given queries Q and Q’, decide whether Q ⊑ Q’.
- Determine the complexity of query containment.

Previous Work (I)
Relational query containment:
- Conjunctive queries [Chandra and Merlin, STOC 1977]
- Acyclic queries [Yannakakis, VLDB 1981]
- Queries with union [Sagiv and Yannakakis, JACM 1980]
- Queries with negation [Levy and Sagiv, VLDB 1993]
- Queries with arithmetic comparisons [Klug, JACM 1988]
- Recursive queries [Shmueli, 1993], [Chaudhuri and Vardi, 1992]
- Queries over bags [Ioannidis and Ramakrishnan, 1995]
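
In the relational baseline, containment of conjunctive queries [Chandra and Merlin] reduces to finding a containment mapping from Q’ to Q. The Python sketch below searches for that mapping by brute force. It assumes each query is encoded as `(head, body)` and uses variables only, with no constants. This encoding and the name `contained_in` are illustrative choices. The search is exponential, which fits the fact that the problem is NP-complete in general.

```python
from itertools import product

# A conjunctive query is (head, body): head is a tuple of variables and
# body is a list of atoms (predicate, args). Only variables are supported;
# constants are left out to keep the sketch short (an assumption).


def contained_in(q, q_prime):
    """Chandra-Merlin test: Q is contained in Q' iff some mapping h from the
    variables of Q' to those of Q sends the head of Q' onto the head of Q
    and every body atom of Q' onto a body atom of Q."""
    head, body = q
    head_p, body_p = q_prime
    if len(head) != len(head_p):
        return False
    vars_q = sorted({v for _, args in body for v in args} | set(head))
    vars_p = sorted({v for _, args in body_p for v in args} | set(head_p))
    atoms_q = {(pred, tuple(args)) for pred, args in body}
    # Try every mapping (exponential search).
    for image in product(vars_q, repeat=len(vars_p)):
        h = dict(zip(vars_p, image))
        if tuple(h[v] for v in head_p) != tuple(head):
            continue
        if all((pred, tuple(h[v] for v in args)) in atoms_q
               for pred, args in body_p):
            return True
    return False


# Q(x) :- E(x, y), E(y, z)      and      Q'(x) :- E(x, y)
two_hop = (("x",), [("E", ("x", "y")), ("E", ("y", "z"))])
one_hop = (("x",), [("E", ("x", "y"))])
```

The mapping x ↦ x, y ↦ y shows that `two_hop` is contained in `one_hop`. No mapping exists in the other direction.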

Previous Work (II)
XML query containment poses two new challenges:
- XPath containment
  - With *, // and […] [Miklau and Suciu, PODS 2002]
  - With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
  - Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
- Nested query containment

Containment Cannot be Determined Solely by Comparing XPath Components
Q:
for $g in /group
where $g/gname/text() = "database"
return {
  for $p in $g/person
  return {$p/text()}
         {for $q in $g/paper
          where $q/author/text() = $p/text()
          return {$q/title/text()} }
}
Q’:
for $g in /group
return {
  for $p in $g/person
  return {$p/text()}
         {$g/gname/text()}
         {for $q in $g/paper
          where $q/author/text() = $p/text()
          return {$q/title/text()} }
}

Previous Work (II)
XML query containment poses two new challenges:
- XPath containment
  - With *, // and […] [Miklau and Suciu, PODS 2002]
  - With equality testing on tag variables [Deutsch and Tannen, KRDB 2001]
  - Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998]
- Nested query containment
  - Complex object query containment [Levy and Suciu, PODS 1997]
  - Containment of nested XML queries has not been fully studied

Conjunctive XML Queries (c-XQueries)
- Returned variables are bound only to tag names or text values.
- Conjunctive: no two sibling query blocks return the same tag.
- XPath fragment:
  - Has: child axis (/), wildcards (*), branches ([…])
  - Does not have: descendant axis (//), arithmetic comparisons, union
For this fragment, XPath containment is in PTIME.

Conjunctive Queries – cont.
A c-XQuery consists of nested query blocks.
- The fan-out of a query block is the number of its immediate sub-blocks.
- The nesting depth of a query block is 1 plus the maximal nesting depth of its sub-blocks.
- The nesting depth of the query is the nesting depth of its outermost block.

Query Head Tree
The structure of an XML query and of its answers can be described by a query head tree.
- Edges represent query blocks. The label of a node n in the head tree is the tag returned by the block of Q that corresponds to the incoming edge of n.
- A head tree becomes an XML instance once its variables are substituted with actual values.

Query Head Tree Example
Q:
for $x in /project return
  { for $s in $x/title/text() return {$s} }
  { for $t in $x/member/text() return {$t} }
[Figure: the query head tree of Q, with nodes labeled group, name, proj and title and leaves s and t]
What are the fan-out and the nesting depth of Q?
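
Both definitions can be checked mechanically. This Python sketch uses a hypothetical `Block` class that records only the nesting structure of a query, meaning its sub-blocks, and leaves out paths and return clauses. Encode Q from the example as an outer `for $x` block whose two sub-blocks are the `$s` and `$t` blocks. Under that encoding the outer block has fan-out 2 and the query has nesting depth 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """A query block and its immediate sub-blocks (hypothetical encoding)."""
    name: str
    subblocks: List["Block"] = field(default_factory=list)


def fanout(b: Block) -> int:
    # Fan-out of a block: the number of its immediate sub-blocks.
    return len(b.subblocks)


def max_fanout(b: Block) -> int:
    # Largest fan-out over all blocks of the query.
    return max([fanout(b)] + [max_fanout(s) for s in b.subblocks])


def nesting_depth(b: Block) -> int:
    # 1 plus the maximal nesting depth of the sub-blocks (0 if there are none).
    return 1 + max((nesting_depth(s) for s in b.subblocks), default=0)


# The example query Q: an outer block with two sibling sub-blocks.
q = Block("for $x in /project", [
    Block("for $s in $x/title/text()"),
    Block("for $t in $x/member/text()"),
])
```

A leaf block has nesting depth 1. The outer block therefore has depth 1 + 1 = 2, and its fan-out is 2.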

Constant Conjunctive XML Queries (cc-XQueries)
A cc-XQuery is a c-XQuery that does not return tag variables. The head tree of a cc-XQuery has constant labels only.

Deciding Q ⊑ Q’
How can we establish a property over an infinite number of input XML instances?
- Standard technique: find a finite set of representative inputs, the canonical databases.
- Relational queries: each canonical database is a minimal input that generates the answer template.
- XML query answers can take an infinite number of shapes, so we first find a finite set of answer templates, the canonical answers.

Answer Shapes Determined by the Head Tree
Q’:
for $x in /project return
  { for $y in /project/member return
    { where $y = "Alice" return
      where $y = "Bob" return }
[Figure: the head tree of Q’ (group and name nodes over Alice and Bob) and the answer shapes it determines]

An Additional Candidate Answer
[Figure: the head tree (group and name over Alice and Bob) and a further candidate answer built from repeated group and name nodes]

Why Consider the Additional Case
[Figure: the head tree, an instance D of project and member elements with values Alice and Bob, and the answers Q(D) and Q’(D) built from group and name elements]

What can Serve as Canonical Answers?
- Prefix subtrees of the head tree? Necessary, but not sufficient.
- Trees contained in the head tree? Necessary and sufficient, but there are too many of them and they are too complex.

A Head Tree can Have Many Trees Contained in it
[Figure: a head tree (group, name, Alice, Bob) and several different trees contained in it, built from group and name nodes with Alice and Bob leaves]

What can Serve as Canonical Answers?
- Prefix subtrees of the head tree? Necessary, but not sufficient.
- Trees contained in the head tree? Necessary and sufficient, but there are too many of them and they are too complex.
Solution: consider only minimal trees that are contained in the head tree.

Canonical Answer
- Minimal XML instance: an instance with no two sibling subtrees where one is contained in the other.
- Canonical answer: a minimal XML instance contained in the head tree.
- Every answer A of query Q corresponds to a unique canonical answer CA such that A ⊑ CA and CA ⊑ A.
[Figure: an answer with repeated group and name subtrees over Alice and Bob, and the minimal canonical answer equivalent to it]
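
Minimality can be enforced bottom-up. Minimize each child first, then drop any child that is contained in a sibling. The Python sketch below reuses a `(label, children)` tuple encoding as a hypothetical stand-in for XML instances. When two siblings contain each other, it keeps the first one. The root label `answer` in the example is made up for illustration.

```python
from typing import Tuple

Tree = Tuple[str, tuple]  # (label, children): an illustrative encoding


def embeds(t1: Tree, t2: Tree) -> bool:
    # Root to root, child to child, labels preserved; not necessarily injective.
    if t1[0] != t2[0]:
        return False
    return all(any(embeds(c1, c2) for c2 in t2[1]) for c1 in t1[1])


def minimize(t: Tree) -> Tree:
    """Return an equivalent tree with no sibling subtree contained in another."""
    label, children = t
    kids = [minimize(c) for c in children]
    kept = []
    for i, c in enumerate(kids):
        # c is redundant if a sibling strictly contains it, or if an earlier
        # sibling is equivalent to it (contained in both directions).
        redundant = any(
            j != i and embeds(c, other) and (not embeds(other, c) or j < i)
            for j, other in enumerate(kids)
        )
        if not redundant:
            kept.append(c)
    return (label, tuple(kept))


alice, bob = ("Alice", ()), ("Bob", ())
answer = ("answer", (
    ("group", (("name", (alice,)),)),
    ("group", (("name", (alice, bob)),)),
))
```

Here `minimize(answer)` drops the first group, because it embeds into the second, and keeps one group whose name node has the children Alice and Bob.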

Canonical Database
Canonical database DB_CA: the minimal XML instance that generates CA.
Example query:
for $x in /project return
  { for $y in /project/member return
    { where $y = "Alice" return
      where $y = "Bob" return }
[Figure: a canonical answer CA (group and name over Alice and Bob) and its canonical database DB, built from project and member elements with values Alice and Bob]

Canonical Database – Formal Definition
Canonical database of a cc-XQuery: DB_CA is an XML instance such that the following holds for each node N of CA whose generator query block is q_n. Let p_0/p_1/…/p_n be a path expression in q_n, where p_0 is an optional node variable from an ancestor query block. For each p_i, i ∈ [1, n], there is a distinct node labeled p_i that is a child of the node for p_{i-1}. If p_0 is absent, the node for p_1 is a child of the root of DB_CA.
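
The construction can be sketched in Python as follows. `Node`, `add_path` and the `anchor` argument, which stands in for p_0, are hypothetical names. Text values such as Alice are modeled as leaf nodes, and where conditions are not represented. The example assumes the outer block contributes one /project node and that each name in CA gets its own fresh /project/member chain. This reproduces the shape of the canonical database in the preceding example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

    def add_child(self, label: str) -> "Node":
        child = Node(label)
        self.children.append(child)
        return child


def add_path(root: Node, steps: Sequence[str],
             anchor: Optional[Node] = None) -> Node:
    """Materialize p_1/.../p_n with a fresh, distinct node for every step.
    `anchor` plays the role of p_0 (a node bound in an ancestor block);
    when it is absent, the node for p_1 becomes a child of the root."""
    current = anchor if anchor is not None else root
    for step in steps:
        current = current.add_child(step)
    return current


# Hypothetical example: one /project node for the outer block, and one
# fresh /project/member chain for each name value in CA.
db = Node("root")
add_path(db, ["project"])
for value in ["Alice", "Bob"]:
    member = add_path(db, ["project", "member"])
    member.add_child(value)  # text value modeled as a leaf node (assumption)
```

Building a separate chain for every path occurrence, with no sharing of existing nodes, is what keeps the resulting instance minimal for its CA.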

Sound and Complete Conditions for Nested Query Containment
Let Q and Q’ be two cc-XQueries. The following three conditions are equivalent:
1. Q ⊑ Q’
2. For every canonical database DB of Q, Q(DB) ⊑ Q’(DB).
3. For every canonical answer CA of Q:
   a) CA is a canonical answer of Q’, and
   b) DB’_CA ⊑ DB_CA.

Properties of Canonical Answers and Databases
Lemma 1: Let Q be a cc-XQuery and D be an XML instance. There exists a unique canonical answer CA of Q such that Q(D) ⊑ CA and CA ⊑ Q(D).
Lemma 2: Let Q be a cc-XQuery, CA be a canonical answer of Q, DB_CA be the canonical database for CA of Q, and D be an XML instance. Then CA ⊑ Q(D) if and only if DB_CA ⊑ D.

Containment of cc-XQueries – Proof (1)
1) ⇒ 2): Follows from the definition.
2) ⇒ 3): By Lemma 2, CA ⊑ Q(DB_CA). By 2), Q(DB_CA) ⊑ Q’(DB_CA). Since containment is transitive, CA ⊑ Q’(DB_CA), so a) holds. Because CA is a canonical answer of Q’ (by a)) and CA ⊑ Q’(DB_CA), Lemma 2 gives DB’_CA ⊑ DB_CA, so b) holds.

Containment of cc-XQueries – Proof (2)
3) ⇒ 1): To show Q ⊑ Q’, we must show Q(D) ⊑ Q’(D) for every XML instance D.
- By Lemma 1, there is a unique CA of Q such that Q(D) ⊑ CA and CA ⊑ Q(D).
- By Lemma 2, DB_CA ⊑ D.
- By 3) b), DB’_CA ⊑ DB_CA, so DB’_CA ⊑ D by transitivity.
- By Lemma 2 applied to Q’ (CA is a canonical answer of Q’ by 3) a)), CA ⊑ Q’(D).
- By transitivity, Q(D) ⊑ CA ⊑ Q’(D).

Query Containment Algorithm
For every canonical answer CA of Q:
1. Check whether CA is a canonical answer of Q’.
2. Generate DB_CA and DB’_CA.
3. Check whether DB’_CA ⊑ DB_CA.
Q ⊑ Q’ if and only if every check succeeds.
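
The loop below is a Python sketch that rests on strong assumptions. The canonical answers of Q and Q’ and their canonical databases are taken as already generated, so step 2 is not shown. Trees use a `(label, children)` tuple encoding. Canonical answers are assumed to be stored in a normalized form, which turns step 1 into a dictionary lookup. `is_contained` and its arguments are hypothetical names. Condition 3 of the theorem says that Q ⊑ Q’ holds exactly when every check succeeds.

```python
from typing import Dict, Tuple

Tree = Tuple[str, tuple]  # (label, children): an illustrative encoding


def embeds(t1: Tree, t2: Tree) -> bool:
    # Tree embedding: roots to roots, children to children, labels preserved.
    if t1[0] != t2[0]:
        return False
    return all(any(embeds(c1, c2) for c2 in t2[1]) for c1 in t1[1])


def is_contained(cas_q: Dict[Tree, Tree],
                 cas_q_prime: Dict[Tree, Tree]) -> bool:
    """cas_q maps each canonical answer CA of Q to DB_CA;
    cas_q_prime maps each canonical answer of Q' to DB'_CA."""
    for ca, db in cas_q.items():
        # Step 1: CA must also be a canonical answer of Q'.
        if ca not in cas_q_prime:
            return False
        # Step 3: DB'_CA must be contained in DB_CA.
        if not embeds(cas_q_prime[ca], db):
            return False
    return True
```

Returning at the first failed check is safe. Condition 3 has to hold for every canonical answer, so a single counterexample is enough to rule out containment.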

Roadmap
- Introduction and problem definition
- Containment of a subset of XML queries
  - Query containment is decidable
  - Query containment in practice
- Relaxing the assumptions
- Conclusions
Complexity so far:
- Fanout = 1: ?
- Arbitrary fanout: ?

Query Containment Algorithm (cont.)
For every canonical answer CA of Q: check whether CA is a canonical answer of Q’, generate DB_CA and DB’_CA, and check whether DB’_CA ⊑ DB_CA.
The algorithm is polynomial in the size and the number of canonical answers:
- What are the sizes of canonical answers?
- What is the number of canonical answers?

Containment of XML Queries with Fanout 1
Example: d = 3 (the depth), m = 1 (the maximum fanout).
Canonical answers and complexity:
- Number: the depth of the query
- Size: bounded by the depth of the query
- Complexity: O(d·|Q|·|Q’|)
Theorem: Testing containment of XML queries with fanout 1 is in PTIME.
Example query:
for $x in /project return
  {for $y in /project/member return
    {where $y = "Alice" return }
[Figure: the head tree of the example query, a chain of group and name nodes ending in Alice]
Nesting with fanout 1 does not increase complexity.

Roadmap
- Introduction and problem definition
- Containment of a subset of XML queries
  - Query containment is decidable
  - Query containment in practice
- Relaxing the assumptions
- Conclusions
Complexity so far:
- Fanout = 1: PTIME
- Arbitrary fanout: ?

Containment of XML Queries with Arbitrary Fanout
Example: d = 4 (the depth), m = 3 (the maximum fanout).
Canonical answers:
- Number: grows at least exponentially with the depth d
- Size: grows at least exponentially with the depth d
Theorem: Testing containment of XML queries with depth 2 and arbitrary fanout is coNP-hard.

Roadmap
- Introduction and problem definition
- Containment of a subset of XML queries
  - Query containment is decidable (bounds not yet tight)
  - Query containment in practice
- Conclusions
Complexity so far:
- Fanout = 1: PTIME
- Arbitrary fanout: coNP-hard

Effect of the Depth on Containment of XML Queries
Insight: kernel canonical answers
- The root node has a single child.
- In any subtree, a path pattern is repeated no more than cd times (d: query depth; c: maximum number of path steps in a query block).
The size of kernel canonical answers is:
- Polynomial in the query size (for fixed nesting depth)
- Exponential in the query depth (for arbitrary depth)
Theorem:
- Testing containment of XML queries with fixed depth is coNP-complete.
- Testing containment of XML queries with arbitrary depth is in coNEXPTIME.

Effect of the Depth on Containment of XML Queries – Cont.
Lemma 3: Let Q and Q’ be two cc-XQueries. Q ⊑ Q’ if and only if, for each kernel canonical answer (KCA) of Q:
1. KCA is a canonical answer of Q’, and
2. DB’_KCA ⊑ DB_KCA.
The size of a KCA is O((bcd)^d), and the number of KCAs is O(m^((bcd)^d)), where:
- b = number of query blocks in Q
- m = maximum fanout in Q
- c and d are as defined on the previous slide
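
To show how fast these bounds grow, the worked instance below uses small illustrative values, b = 3, c = 2 and m = 2, chosen by us rather than taken from the paper, for depths d = 2 and d = 3. The bounds are asymptotic, so the numbers only indicate growth.

```latex
\begin{aligned}
d = 2:\quad (bcd)^d &= (3 \cdot 2 \cdot 2)^2 = 144, & m^{(bcd)^d} &= 2^{144},\\
d = 3:\quad (bcd)^d &= (3 \cdot 2 \cdot 3)^3 = 5832, & m^{(bcd)^d} &= 2^{5832}.
\end{aligned}
```

Adding one level of depth raises the size bound from 144 to 5832 and the count bound from 2^144 to 2^5832. This matches the coNEXPTIME upper bound for arbitrary depth.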

Containment Checking in Practice
Analyze element cardinality to reduce the number of canonical answers that must be checked.
Given the query structure and the schema of the underlying XML database, we can infer the cardinality of elements in the query answer. Canonical answers are pruned according to three rules:
1. (=1) The schema implies that a certain element occurs exactly once under its parent element.
2. (≥1) The schema implies that a certain element occurs at least once under its parent element.
3. (≤1) The schema implies that a certain element occurs at most once under its parent element.
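
The three rules act as filters on candidate canonical answers. This Python sketch assumes the cardinalities inferred from the query and the schema are already given per (parent label, child label) pair of the answer. The inference step is not shown. The `schema` dictionary format, `respects` and `prune` are hypothetical names, and trees use a `(label, children)` tuple encoding.

```python
from typing import Dict, List, Tuple

Tree = Tuple[str, tuple]  # (label, children): an illustrative encoding

# Hypothetical cardinality facts: (parent label, child label) -> rule,
# where rule is "=1", ">=1" or "<=1".
Cardinality = Dict[Tuple[str, str], str]


def respects(t: Tree, schema: Cardinality) -> bool:
    """False if some node of t has a number of children with a constrained
    label that the cardinality facts rule out."""
    label, children = t
    for (parent, child), rule in schema.items():
        if parent != label:
            continue
        n = sum(1 for c in children if c[0] == child)
        if rule == "=1" and n != 1:
            return False
        if rule == ">=1" and n < 1:
            return False
        if rule == "<=1" and n > 1:
            return False
    return all(respects(c, schema) for c in children)


def prune(canonical_answers: List[Tree], schema: Cardinality) -> List[Tree]:
    # Keep only the candidates that are consistent with the cardinalities.
    return [ca for ca in canonical_answers if respects(ca, schema)]


gname = ("gname", ())
candidates = [("group", (gname,)), ("group", ()), ("group", (gname, gname))]
```

With the rule ("group", "gname") ↦ "=1", only the first candidate survives. With "<=1", the candidate with two gname children is removed and the other two are kept.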

Containment Checking in Practice – Example
Q:
for $g in /group
where $g/gname/text() = "database"
return {
  for $p in $g/person
  return {$p/text()}
         {for $q in $g/paper
          where $q/author/text() = $p/text()
          return {$q/title/text()} }
}
Q’:
for $g in /group
return {
  for $p in $g/person
  return {$p/text()}
         {$g/gname/text()}
         {for $q in $g/paper
          where $q/author/text() = $p/text()
          return {$q/title/text()} }
}
Number of canonical answers: 71 originally, 2 after the cardinality analysis.

An Example Query that Returns Tag Variables
for $x in dbGrp return
  { for $y in $x/proj return
    { for $u in $y/member return $u/text()
      for $v in $y/paper return $v/text() }

Deciding Query Containment
- Leverage previous results: simulation mappings [Levy and Suciu, PODS’97].
- Check for a query simulation mapping for every canonical answer.
Complexity:
- A simulation mapping can be checked in time polynomial in the query size.
- The complexity of checking containment therefore does not increase.

Other Extensions
Complexity of containment by query type (rows) and query features (columns: no tag variables; with tag variables; with unions; with negation; with //; with equi-joins on tags; with arithmetic comparisons). Several entries span more than one column:
- Unnested: PTIME; coNP-complete; NP-complete; Π2P-complete
- Fanout = 1: PTIME; coNP-complete; NP-complete; Π2P-complete
- Fixed depth: coNP-complete; Π2P-complete
- General: in coNEXPTIME

Conclusions
Contributions:
- A sound and complete condition for containment of nested XML queries
- A detailed complexity analysis
Future work:
- Evaluate and optimize the containment algorithm with element cardinality analysis
- Answering nested XML queries using views