1 Decidable Containment of Recursive Queries Diego Calvanese, Giuseppe De Giacomo, Moshe Y. Vardi presented by Axel Polleres


2 Query Containment: Checking whether the result of one query is necessarily a subset of the result of another one, for every database. Important for information integration, query rewriting, verification, cooperative answering, integrity checking, etc.
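
Because containment quantifies over every database, evaluating two queries on one concrete database can only refute containment, never prove it. Below is a minimal Python sketch of that check. It assumes queries are modelled as plain functions over a dictionary of relations. The queries `q1`, `q2` and the helper `refutes_containment` are hypothetical illustrations.

```python
# Hypothetical queries over a database given as {relation name: set of tuples}.
Database = dict[str, set[tuple]]


def q1(db: Database) -> set[tuple]:
    """Edges (x, y) whose target y has at least one outgoing edge."""
    e = db["e"]
    return {(x, y) for (x, y) in e if any(src == y for (src, _) in e)}


def q2(db: Database) -> set[tuple]:
    """All edges (x, y)."""
    return set(db["e"])


def refutes_containment(qa, qb, db: Database) -> bool:
    """True iff db is a counterexample to 'qa is contained in qb'."""
    return not qa(db) <= qb(db)


db = {"e": {(1, 2), (2, 3)}}
print(refutes_containment(q1, q2, db))  # False: q1(db) = {(1, 2)} is a subset of q2(db)
print(refutes_containment(q2, q1, db))  # True: (2, 3) is in q2(db) but not in q1(db)
```

Here q1 is in fact contained in q2 on every database. The first check only confirms that this particular database is not a counterexample.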

3 Conjunctive Queries vs. full Datalog: A conjunctive query (CQ) is a query of the form $\mathit{ans}(X_0) \mathrel{:-} r_1(X_1), r_2(X_2), \ldots, r_n(X_n)$, where each $X_i$ is a tuple of variables ranging over a set $\{u_1, \ldots, u_k\}$, and the variables in $X_0$ are called distinguished variables. In SQL, CQs are often called S(elect)P(roject)J(oin) queries. Containment of conjunctive queries is decidable; in fact, it is NP-complete [14]. Proof sketch (membership in NP): a CQ Q1 is contained in a CQ Q2 iff there is a containment mapping from (the variables in) Q2 to (the variables in) Q1, i.e. a homomorphism that maps the distinguished variables of Q2 to those of Q1 and every body atom of Q2 to a body atom of Q1. Guessing such a homomorphism and checking it is clearly in NP. NP-hardness can also be shown (e.g. by a reduction from "exact cover", cf. []).
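
The containment-mapping test from the proof sketch can be written as a brute-force backtracking search. This is a minimal sketch that assumes CQs whose arguments are all variables (no constants). Each CQ is encoded as a pair of head variables and a list of (predicate, variables) atoms; all names are illustrative. The worst-case running time is exponential, as one would expect for an NP-complete problem.

```python
from typing import Optional

Atom = tuple[str, tuple[str, ...]]        # (predicate, variables)
CQ = tuple[tuple[str, ...], list[Atom]]   # (head variables, body atoms)


def extend(h: dict, src: tuple, dst: tuple) -> Optional[dict]:
    """Extend mapping h so that it maps src to dst position-wise, or return None."""
    if len(src) != len(dst):
        return None
    h = dict(h)
    for s, d in zip(src, dst):
        if h.setdefault(s, d) != d:       # clash with an earlier binding
            return None
    return h


def containment_mapping(q2: CQ, q1: CQ) -> Optional[dict]:
    """Search for a containment mapping from q2 to q1 (witnessing q1 contained in q2)."""
    head2, body2 = q2
    head1, body1 = q1
    start = extend({}, head2, head1)      # distinguished variables go to distinguished ones
    if start is None:
        return None

    def search(i: int, h: dict) -> Optional[dict]:
        if i == len(body2):               # every atom of q2 has been mapped
            return h
        pred, args = body2[i]
        for pred1, args1 in body1:        # try each target atom with the same predicate
            if pred1 == pred:
                h2 = extend(h, args, args1)
                if h2 is not None:
                    found = search(i + 1, h2)
                    if found is not None:
                        return found
        return None

    return search(0, start)


def contained(q1: CQ, q2: CQ) -> bool:
    return containment_mapping(q2, q1) is not None


q1 = (("x", "z"), [("e", ("x", "y")), ("e", ("y", "z"))])  # paths of length two
q2 = (("x", "z"), [("e", ("x", "u")), ("e", ("v", "z"))])  # x has an out-edge, z an in-edge
print(contained(q1, q2), contained(q2, q1))  # True False
```

The witness for q1 ⊆ q2 maps u and v both to y. No mapping exists in the other direction, because q2 does not force the two edges to share a node.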

4 Full Datalog vs. CQ: Full Datalog adds union and recursion to CQs. Containment is undecidable: undecidability can be shown by a reduction from containment of context-free grammars [22]. So CQs and full Datalog span two extremes. But not all is lost: there are interesting classes in between!
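
To see why grammars enter the picture, note that a context-free grammar can be encoded as a (chain) Datalog program over edge-labeled paths. A word is represented as a labeled path from 0 to n, and the start symbol S derives the word iff S(0, n) is in the least fixpoint. The sketch below shows only this standard encoding and assumes a grammar in Chomsky normal form. It is not the reduction of [22] itself, and the grammar and function names are hypothetical.

```python
from collections import defaultdict

# Grammar in Chomsky normal form (an assumption for brevity):
#   binary rule   A -> B C  becomes  A(x, z) :- B(x, y), C(y, z)
#   terminal rule A -> a    becomes  A(x, y) :- a(x, y)
Binary = list[tuple[str, str, str]]   # (A, B, C)
Terminal = list[tuple[str, str]]      # (A, a)


def word_database(word: str) -> dict[str, set[tuple[int, int]]]:
    """Encode a word as a labeled path 0 -a1-> 1 -a2-> ... -an-> n."""
    db = defaultdict(set)
    for i, symbol in enumerate(word):
        db[symbol].add((i, i + 1))
    return db


def chain_datalog_fixpoint(binary: Binary, terminal: Terminal, db) -> dict:
    """Naive bottom-up evaluation of the chain Datalog program for the grammar."""
    idb = defaultdict(set)
    for a, t in terminal:                     # non-recursive rules: copy EDB edges
        idb[a] |= db.get(t, set())
    changed = True
    while changed:                            # iterate until no new fact appears
        changed = False
        for a, b, c in binary:
            new = {(x, z) for (x, y) in idb[b] for (y2, z) in idb[c] if y == y2}
            if not new <= idb[a]:
                idb[a] |= new
                changed = True
    return idb


# Hypothetical grammar for { a^n b^n : n >= 1 }:  S -> A T | A B,  T -> S B,  A -> a,  B -> b
binary = [("S", "A", "T"), ("S", "A", "B"), ("T", "S", "B")]
terminal = [("A", "a"), ("B", "b")]


def accepts(word: str) -> bool:
    idb = chain_datalog_fixpoint(binary, terminal, word_database(word))
    return (0, len(word)) in idb["S"]


print(accepts("aabb"), accepts("aab"))  # True False
```

Under this encoding, containment of two such programs on path databases behaves like language inclusion between their grammars. This gives an intuition for why containment of full Datalog cannot be decidable in general.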

5 Decidable Containment Problems: Containment of monadic Datalog (all rule heads use a single variable) is decidable. Checking containment of non-recursive Datalog in full Datalog is decidable in exponential time. Checking containment of full Datalog in non-recursive Datalog is decidable in triple exponential time, i.e. $O(2^{2^{2^{n}}})$; when the non-recursive query is unfolded (into a union of conjunctive queries), it is "only" double exponential.

6 In this paper: Regular Path Queries. The paper studies query containment in the context of conceptual graphs (e.g. RDF graphs), namely for Regular Path Queries (RPQs). These ask for all pairs of objects in a graph that are connected by a path conforming to a regular expression, i.e. E(x, y), where E is a regular expression over the edge labels of the graph. Refinement: in 2RPQs, "inverse" is allowed in the traversal of edges, i.e. edges may also be traversed backwards.
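
A common way to evaluate a 2RPQ is a search over the product of the graph and an automaton for the regular expression. The sketch below assumes the expression has already been compiled into an NFA (regex parsing is omitted) and writes an inverse symbol as `p-`. This encoding and the name `eval_2rpq` are assumptions for illustration. The example query knows · knows⁻ returns pairs of people who know a common person.

```python
from collections import deque

Edge = tuple[str, str, str]   # (source, label, target)
NFA = tuple[str, set[str], dict[tuple[str, str], set[str]]]
# (initial state, final states, transitions (state, symbol) -> next states);
# symbol "p" follows a p-edge forwards, "p-" follows it backwards
# (edge labels themselves are assumed not to end in "-").


def eval_2rpq(graph: set[Edge], nfa: NFA) -> set[tuple[str, str]]:
    """All pairs (x, y) joined by a path whose two-way label word is accepted by nfa."""
    init, finals, delta = nfa
    # One-step moves: (node, symbol) -> neighbours, including inverse traversals.
    steps: dict[tuple[str, str], set[str]] = {}
    nodes = set()
    for u, p, v in graph:
        nodes |= {u, v}
        steps.setdefault((u, p), set()).add(v)
        steps.setdefault((v, p + "-"), set()).add(u)
    answers = set()
    for x in nodes:                           # BFS in the graph x automaton product
        seen = {(x, init)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in finals:
                answers.add((x, node))
            for (s, symbol), targets in delta.items():
                if s != state:
                    continue
                for nxt_node in steps.get((node, symbol), ()):
                    for nxt_state in targets:
                        if (nxt_node, nxt_state) not in seen:
                            seen.add((nxt_node, nxt_state))
                            queue.append((nxt_node, nxt_state))
    return answers


graph = {("alice", "knows", "bob"), ("bob", "knows", "carol"), ("dave", "knows", "carol")}
nfa = ("q0", {"q2"}, {("q0", "knows"): {"q1"}, ("q1", "knows-"): {"q2"}})
print(sorted(eval_2rpq(graph, nfa)))
```

Each (node, state) pair is visited at most once per start node. The search is therefore polynomial in the sizes of the graph and the automaton.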

7 UC2RPQs: A conjunctive 2-way regular path query (C2RPQ) of arity n is a query of the form $q(x_1, \ldots, x_n) \leftarrow y_1\,E_1\,z_1 \wedge \cdots \wedge y_m\,E_m\,z_m$, where $E_1, \ldots, E_m$ are 2RPQs. UC2RPQs are then unions of C2RPQs with the same arity. Here, the answer set of a UC2RPQ over a database is the union of the answer sets of its C2RPQs. Note that CQs (with only binary body predicates) are just a special case of C2RPQs!
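
Answering a C2RPQ amounts to joining the binary answer relations of its 2RPQ atoms and projecting onto the head variables. In this minimal sketch, each atom's 2RPQ is represented only by its answer relation, which is assumed to be precomputed (for instance by a product-automaton search). The encoding and function names are hypothetical. A CQ with binary atoms is the special case where each relation is simply a stored edge relation.

```python
from typing import Optional

Relation = set[tuple[str, str]]
# A C2RPQ body atom (y, E, z); E is represented here only by its answer relation
# over the database, assumed to be precomputed by some 2RPQ evaluator.
C2Atom = tuple[str, Relation, str]


def bind(h: dict, var: str, val: str) -> Optional[dict]:
    """Extend assignment h with var := val, or return None on a clash."""
    if var in h:
        return h if h[var] == val else None
    return {**h, var: val}


def eval_c2rpq(head: tuple[str, ...], atoms: list[C2Atom]) -> set[tuple[str, ...]]:
    """Answers of q(head) <- y1 E1 z1 AND ... AND ym Em zm:
    join the atom relations, then project onto the head variables."""
    assignments: list[dict] = [{}]
    for y, rel, z in atoms:
        extended = []
        for h in assignments:
            for a, b in rel:
                h1 = bind(h, y, a)
                h2 = bind(h1, z, b) if h1 is not None else None
                if h2 is not None:
                    extended.append(h2)
        assignments = extended
    # Head variables are assumed to occur in some atom.
    return {tuple(h[v] for v in head) for h in assignments}


knows = {("alice", "bob"), ("bob", "carol")}                           # answers of E1 = knows
knows_plus = {("alice", "bob"), ("alice", "carol"), ("bob", "carol")}  # answers of E2 = knows+
print(eval_c2rpq(("x", "z"), [("x", knows, "y"), ("y", knows_plus, "z")]))
```

The variable y is existentially quantified: it takes part in the join but is dropped by the projection.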

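8 Containment of Datalog in a UC2RPQ: For a Datalog program Π, an IDB predicate Q and a database (a set of EDB facts) G, we define $Q^{\Pi}(G)$ as the set of tuples $\vec{c}$ such that $Q(\vec{c})$ is in the least fixpoint of Π over G, i.e. the set of Q facts that can be obtained by applications of rules in Π. Then Π (with goal predicate Q) is contained in a UC2RPQ γ, written $Q^{\Pi} \subseteq \gamma$, if $Q^{\Pi}(G) \subseteq \gamma(G)$ for every database G.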
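
$Q^{\Pi}(G)$ can be computed by naive bottom-up evaluation, which fires all rules until no new fact appears. This is a minimal sketch that assumes safe rules whose arguments are variables only. The encoding and function names are illustrative, and no semi-naive optimisation is attempted. Containment itself quantifies over all G, so it cannot be decided by evaluating a single database.

```python
Atom = tuple[str, tuple[str, ...]]   # (predicate, argument variables)
Rule = tuple[Atom, list[Atom]]       # (head, body); arguments are variables only
DB = dict[str, set[tuple]]


def match(body: list[Atom], facts: DB, h: dict) -> list[dict]:
    """All extensions of substitution h mapping every body atom onto a known fact."""
    if not body:
        return [h]
    (pred, args), rest = body[0], body[1:]
    out = []
    for fact in facts.get(pred, set()):
        h2 = dict(h)
        if all(h2.setdefault(v, c) == c for v, c in zip(args, fact)):
            out.extend(match(rest, facts, h2))
    return out


def least_fixpoint(program: list[Rule], edb: DB) -> DB:
    """Naive bottom-up evaluation: fire all rules until no new fact is derived."""
    facts = {p: set(ts) for p, ts in edb.items()}
    changed = True
    while changed:
        changed = False
        for (hpred, hargs), body in program:
            for h in match(body, facts, {}):
                fact = tuple(h[v] for v in hargs)   # assumes safe rules
                if fact not in facts.setdefault(hpred, set()):
                    facts[hpred].add(fact)
                    changed = True
    return facts


def answers(q: str, program: list[Rule], edb: DB) -> set[tuple]:
    """Q^Π(G): the tuples c with Q(c) in the least fixpoint of Π over G."""
    return least_fixpoint(program, edb).get(q, set())


tc = [  # transitive closure: T(X,Y) :- e(X,Y).  T(X,Y) :- e(X,Z), T(Z,Y).
    (("T", ("X", "Y")), [("e", ("X", "Y"))]),
    (("T", ("X", "Y")), [("e", ("X", "Z")), ("T", ("Z", "Y"))]),
]
print(sorted(answers("T", tc, {"e": {(1, 2), (2, 3), (3, 4)}})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```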

9 Containment of Datalog in Unions of Conjunctive Queries: Idea: reuse of variables is allowed, as long as the variables are not "connected" in the tree. So we can build proof trees with a bounded number of variables, namely twice the maximum number num_var(r) of variables occurring in IDB atoms of a rule r of Π; call this bound num_var(Π). A proof tree is then simply an expansion tree that only uses variables from $\{x_1, \ldots, x_{\mathit{num\_var}(\Pi)}\}$.

10 Containment of Datalog in Unions of Conjunctive Queries: Approach: the notion of a containment mapping is generalized to Datalog and to UC2RPQs via expansions of Datalog programs. The query $Q^{\Pi}$ can be defined via an infinite sequence (union) of conjunctive queries, its expansions. Let trees(Q, Π) be the set of trees for predicate Q that are labeled with a rule at each node, such that:
- the children of a node N are labeled with rules whose head atoms correspond to the IDB atoms in the body of the rule of N, and
- the leaves are labeled with rules having only EDB predicates in their bodies.
Note that trees(Q, Π) can be infinite. Intuition: Π is contained in a union of conjunctive queries if, for each expansion tree in trees(Q, Π), there is a containment mapping from some disjunct of the union to it. This is not yet effective, since the number of variables, and hence the number of node labels, is unbounded.
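
Expansion trees can be enumerated up to a height bound. Each tree yields one conjunctive query: the EDB atoms at its nodes, with non-head variables renamed apart. The sketch assumes rule heads with pairwise distinct variables and no constants, and `expansions` is a hypothetical helper. For the transitive-closure program it produces the path queries of length 1, 2 and 3. Raising the bound keeps producing new ones, which illustrates why trees(Q, Π) is infinite.

```python
from itertools import count, product

Atom = tuple[str, tuple[str, ...]]   # (predicate, argument variables)
Rule = tuple[Atom, list[Atom]]       # (head, body)


def expansions(goal: Atom, program: list[Rule], depth: int) -> list[list[Atom]]:
    """EDB bodies of all expansion trees for `goal` of height at most `depth`.

    Assumes every rule head has pairwise distinct variables and no constants.
    """
    idb = {head[0] for head, _ in program}
    fresh = count()

    def expand(atom: Atom, d: int) -> list[list[Atom]]:
        pred, args = atom
        results: list[list[Atom]] = []
        if d == 0:
            return results                       # the tree would be too high
        for (hpred, hargs), body in program:
            if hpred != pred:
                continue
            n = next(fresh)                      # unique tag for this rule application
            sub = dict(zip(hargs, args))         # head variables -> the atom's terms

            def rename(v: str) -> str:
                return sub.get(v, f"{v}_{n}")    # non-head variables become fresh

            choices = []
            for bpred, bargs in body:
                new_atom = (bpred, tuple(rename(v) for v in bargs))
                if bpred in idb:
                    choices.append(expand(new_atom, d - 1))   # expand the child subtree
                else:
                    choices.append([[new_atom]])              # keep the EDB atom
            for combo in product(*choices):      # one expansion per child combination
                results.append([a for part in combo for a in part])
        return results

    return expand(goal, depth)


tc = [
    (("T", ("X", "Y")), [("e", ("X", "Y"))]),
    (("T", ("X", "Y")), [("e", ("X", "Z")), ("T", ("Z", "Y"))]),
]
for cq in expansions(("T", ("x", "y")), tc, 3):
    print(cq)
```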

11 Connected variables in proof trees: To reconstruct an expansion tree from a given proof tree, we need to distinguish among occurrences of variables. Let g_1, g_2 be nodes in a proof tree. Occurrences x_1, x_2 of a variable x in the rules labeling g_1 and g_2, respectively, are called connected if every rule on the path from g_1 to g_2 (except possibly the one at their lowest common ancestor g_0) has an occurrence of x in its head. An occurrence of a variable x in a proof tree τ is a distinguished occurrence if it is connected to an occurrence of x in the head of the root of τ.
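
The connectedness condition can be checked directly from parent pointers. This sketch assumes a proof tree given as a parent map plus, for each node, the set of variables in the head of its rule; both encodings are illustrative. The example is a proof tree for the transitive-closure program:
- n0: T(x1,x2) :- e(x1,x3), T(x3,x2)
- n1: T(x3,x2) :- e(x3,x1), T(x1,x2)
- n2: T(x1,x2) :- e(x1,x2)

Variable x1 is reused below n1, whose head does not mention x1. The x1 at n2 is therefore not connected to the x1 at the root.

```python
from typing import Optional

Parent = dict[str, Optional[str]]   # node -> parent (None for the root)
HeadVars = dict[str, set[str]]      # node -> variables in the head of its rule


def path_to_root(g: str, parent: Parent) -> list[str]:
    path = [g]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def connected(x: str, g1: str, g2: str, parent: Parent, head_vars: HeadVars) -> bool:
    """Occurrences of x at g1 and g2 are connected: every node on the g1-g2 path,
    except possibly their lowest common ancestor, has x in its head."""
    up1, up2 = path_to_root(g1, parent), path_to_root(g2, parent)
    lca = next(n for n in up1 if n in up2)
    on_path = up1[:up1.index(lca)] + up2[:up2.index(lca)]   # path nodes minus the LCA
    return all(x in head_vars[n] for n in on_path)


def distinguished(x: str, g: str, parent: Parent, head_vars: HeadVars) -> bool:
    """x at g is connected to an occurrence of x in the head of the root."""
    root = path_to_root(g, parent)[-1]
    return x in head_vars[root] and connected(x, g, root, parent, head_vars)


parent = {"n0": None, "n1": "n0", "n2": "n1"}
head_vars = {"n0": {"x1", "x2"}, "n1": {"x3", "x2"}, "n2": {"x1", "x2"}}
print(distinguished("x2", "n2", parent, head_vars))  # True
print(distinguished("x1", "n2", parent, head_vars))  # False
```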

12 Containment of Datalog in Unions of Conjunctive Queries: A strong containment mapping from a conjunctive query ϕ to a proof tree τ is a containment mapping h from ϕ to τ such that:
– h maps distinguished occurrences in ϕ to distinguished occurrences in τ, and
– if x1 and x2 are two occurrences of a variable x in ϕ, then the occurrences h(x1) and h(x2) in τ are connected.
Then: Π is contained in a union of conjunctive queries iff for every proof tree τ for Q there is a strong containment mapping from some disjunct ϕ of the union to τ.

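13 This can be exploited similarly for C2RPQs. An expansion of a C2RPQ is a CQ obtained by replacing each atom $y\,E\,z$ with a chain of binary atoms $y\,r_1\,v_1, v_1\,r_2\,v_2, \ldots, v_{k-1}\,r_k\,z$ over fresh variables $v_i$, for some word $r_1 \cdots r_k$ in the language of E. An inverse symbol $p^-$ yields an atom $p$ traversed backwards, and if the empty word is in the language of E, y and z may be identified.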
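
Given one chosen word per atom, building the corresponding expansion is mechanical. The sketch below uses a hypothetical helper, `expand_c2rpq`. It does not check that each word belongs to L(E), and it rejects the empty word rather than identifying y and z. Inverse symbols are written `p-`, which is an assumed encoding.

```python
from itertools import count

# A C2RPQ body atom (y, E, z); E is kept only as a descriptive label here.
C2Atom = tuple[str, str, str]


def expand_c2rpq(atoms: list[C2Atom], words: list[list[str]]) -> list[tuple[str, str, str]]:
    """CQ body of (predicate, arg1, arg2) atoms obtained by replacing each atom (y, E, z)
    with a chain from y to z spelling the chosen word; "p-" is p traversed backwards.

    The caller must ensure each word is in L(E): membership is not checked.
    """
    fresh = count()
    body = []
    for (y, _expr, z), word in zip(atoms, words):
        if not word:
            raise ValueError("the empty word (which identifies y and z) is not handled")
        prev = y
        for i, symbol in enumerate(word):
            nxt = z if i == len(word) - 1 else f"_v{next(fresh)}"   # fresh inner variables
            if symbol.endswith("-"):
                body.append((symbol[:-1], nxt, prev))   # inverse: the stored edge points back
            else:
                body.append((symbol, prev, nxt))
            prev = nxt
    return body


atoms = [("x", "knows+", "y"), ("y", "worksFor-", "w")]
words = [["knows", "knows"], ["worksFor-"]]
print(expand_c2rpq(atoms, words))
```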

14 In the rest of the paper… The authors show how to check this condition using tree automata. Idea: the set of proof trees for a Datalog program Π with goal predicate Q can be described by a nondeterministic tree automaton (doubly exponential in the size of Π) that accepts exactly the proof trees. … concluding:
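
The acceptance condition of a bottom-up nondeterministic tree automaton can be sketched as follows. This toy version is a deliberate simplification. Node labels are bare rule names: r1 stands for T(X,Y) :- e(X,Y) and r2 for T(X,Y) :- e(X,Z), T(Z,Y). Variables are ignored entirely, whereas the automaton described in the paper reads rule instances over the bounded variable set. The names `run` and `tree_accepts` are illustrative.

```python
from itertools import product

Tree = tuple[str, list]                                  # (node label, children)
Transitions = dict[tuple[str, tuple[str, ...]], set[str]]
# (node label, tuple of child states) -> states the node may take


def run(tree: Tree, delta: Transitions) -> set[str]:
    """All states the automaton can reach at the root of `tree` (bottom-up)."""
    label, children = tree
    child_states = [run(c, delta) for c in children]
    reachable = set()
    for combo in product(*child_states):                 # one state choice per child
        reachable |= delta.get((label, combo), set())
    return reachable


def tree_accepts(tree: Tree, delta: Transitions, finals: set[str]) -> bool:
    return bool(run(tree, delta) & finals)


# State "T": the subtree is a valid proof of a T-atom.
delta = {("r1", ()): {"T"}, ("r2", ("T",)): {"T"}}
finals = {"T"}
print(tree_accepts(("r2", [("r2", [("r1", [])])]), delta, finals))  # True
print(tree_accepts(("r2", []), delta, finals))                      # False
```

A leaf has no children, so `product()` yields a single empty tuple and the lookup uses `(label, ())`.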

15 Conclusions: Adding transitive closure to CQs does not increase the upper bound for containment of Datalog: 2EXPTIME matches the upper bound for containment in unions of conjunctive queries [25]. Whether this upper bound is tight is not clear, but the authors conjecture that it is (an EXPSPACE lower bound follows from containment of UC2RPQs in UC2RPQs [34]). Observe: containment in the other direction is already undecidable for RPQs [22].

16 Questions/Interesting for WSMO/L: How do the proof obligations we need relate to RPQs/2RPQs/UC2RPQs? How do RPQs/2RPQs/UC2RPQs relate to OWL DL/Lite/Flight and rule extensions thereof? Decidable, yes, but (hardly) scalable? Not necessarily a problem if queries/programs are of moderate size. We need more use cases to show what kinds of containment we need!

17 Important references from the paper