On Answering Queries in the Presence of Limited Access Patterns. Chen Li, Stanford University; joint work with Edward Chang, UC Santa Barbara. ICDT'2001, London, UK.

1 On Answering Queries in the Presence of Limited Access Patterns
Chen Li, Stanford University
Joint work with Edward Chang, UC Santa Barbara
ICDT'2001, London, UK

2 A movie database
r(Star, Movie)
  Harrison Ford | Air Force One
  Henry Fonda   | On Golden Pond
  Kevin Spacey  | American Beauty
  …
s(Movie, Award)
  On Golden Pond  | Oscar, Best Actor
  On Golden Pond  | Oscar, Best Actress
  American Beauty | Oscar, Best Picture
  …
Q(Award) :- r(henry fonda, Movie), s(Movie, Award)

3 Limited access patterns
Database: r(Star, Movie) and s(Movie, Award), as on slide 2.
r(Star, Movie): should provide a star.
s(Movie, Award): should provide a movie.

4 Answering Q given the restrictions
Database: r(Star, Movie) and s(Movie, Award), as on slide 2.
Q(Award) :- r(henry fonda, Movie), s(Movie, Award)

5 The answer is complete
Database: r(Star, Movie) and s(Movie, Award), as on slide 2.
Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
We did not retrieve all the tuples from the relations. Still, we computed all tuples in the answer to the query.
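The evaluation on slides 4 and 5 can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the in-memory lists and the helper names `access_r`, `access_s`, and `answer_q` are hypothetical stand-ins for sources that accept a lookup only when their first attribute is supplied.

```python
# Hypothetical in-memory copies of the two sources from slide 2.
R = [("Harrison Ford", "Air Force One"),
     ("Henry Fonda", "On Golden Pond"),
     ("Kevin Spacey", "American Beauty")]
S = [("On Golden Pond", "Oscar, Best Actor"),
     ("On Golden Pond", "Oscar, Best Actress"),
     ("American Beauty", "Oscar, Best Picture")]


def access_r(star):
    """Only legal access to r: a star must be provided."""
    return [movie for (s, movie) in R if s == star]


def access_s(movie):
    """Only legal access to s: a movie must be provided."""
    return [award for (m, award) in S if m == movie]


def answer_q(star):
    """Q(Award) :- r(star, Movie), s(Movie, Award), using legal accesses only."""
    awards = set()
    for movie in access_r(star):          # the constant in Q binds Star
        awards.update(access_s(movie))    # each retrieved movie binds Movie
    return awards


print(sorted(answer_q("Henry Fonda")))
```

Only one tuple of r and two tuples of s are ever retrieved, yet no answer tuple can be missed, because every tuple that could join with the constant in Q is reachable through a legal access.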

6 Change the restriction
Database: r(Star, Movie) and s(Movie, Award), as on slide 2.
Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
We cannot compute the complete answer to Q. There can always be some tuples that are not retrievable.

7 General questions
Given a query on relations with limited access patterns, can we compute its complete answer by accessing the relations with legal patterns?
  – Stable queries
Different classes of queries
Another problem studied: testing query containment in the presence of binding patterns.

8 Rest of the talk
Binding patterns, query stability
Testing stability of queries:
  – Conjunctive queries
  – Unions of conjunctive queries
  – Conjunctive queries with arithmetic comparisons
  – Datalog queries
Dynamic computability of complete answers to conjunctive queries
Conclusion and related work

9 (I) Binding patterns
Attributes with adornments:
  – b: bound
  – f: free
Example: r(Star^b, Movie^f), s(Movie^b, Award^f)
A relation can have multiple binding patterns.

10 Reasons for the restrictions:
  – Web search forms
  – Legacy databases
  – Security concerns
Observation: if a relation does not have an "all-free" binding pattern, then no matter which queries are sent to this relation, there can always be some tuples that have not been retrieved.

11 Query stability
A query Q on relations with binding patterns is stable if, for any database, we can compute Q's complete answer by accessing the relations with legal patterns.
The complete answer is the answer we would compute if we could retrieve all the tuples from the relations.
Deriving the complete answer from only part of the tuples requires reasoning.

12 Assumptions about bindings
Use values from Q and results from the relations as bindings:
  – The definition says "for any database"
  – Relations not in the query can be assumed to be empty
Not allowed: trying arbitrary strings as bindings to access the relations
  – Does not terminate
  – Impractical

13 (II) Testing stability of queries
Conjunctive query: q(X) :- g_1(X_1), …, g_n(X_n)
Feasible order of some subgoals of a CQ Q:
  – Each subgoal in the order is executable
  – That is, we have enough bound variables to satisfy one binding pattern of the relation
Example: Q(Award) :- r(henry fonda, Movie), s(Movie, Award)

14 Feasible CQs
A CQ is feasible if it has a feasible order of all its subgoals.
Lemma: A feasible CQ is stable.
Testing feasibility of a CQ:
  – A greedy algorithm: Inflationary
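The slide names the greedy algorithm but does not spell it out, so the following Python is a minimal sketch of one plausible reading: keep executing any subgoal that has a satisfiable binding pattern and add its variables to the bound set. The encoding is an assumption made for illustration (subgoals as `(relation, args)` pairs, capitalized strings as variables, binding patterns as strings of `b`/`f`), and `executable_order` is a hypothetical name.

```python
def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def can_execute(goal, bound, patterns):
    """True if some binding pattern has every 'b' position filled by a
    constant or by an already-bound variable."""
    rel, args = goal
    return any(
        all(ad == "f" or not is_var(t) or t in bound for ad, t in zip(pat, args))
        for pat in patterns[rel]
    )


def executable_order(subgoals, patterns):
    """Greedy pass: returns (order of executed subgoals, subgoals left over)."""
    bound, order, remaining = set(), [], list(subgoals)
    while True:
        ready = next((g for g in remaining if can_execute(g, bound, patterns)), None)
        if ready is None:
            return order, remaining
        order.append(ready)
        remaining.remove(ready)
        bound.update(t for t in ready[1] if is_var(t))  # its variables are now bound


PATTERNS = {"r": ["bf"], "s": ["bf"]}
Q = [("s", ("Movie", "Award")), ("r", ("henry_fonda", "Movie"))]
Q_PRIME = Q + [("r", ("Star", "Movie"))]

print(executable_order(Q, PATTERNS))
print(executable_order(Q_PRIME, PATTERNS))
```

A CQ is feasible under this sketch exactly when the second component of the result is empty. The bound set only grows, so the order in which ready subgoals are picked does not change which subgoals end up executable.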

15 What if Q is not feasible?
Q'(Award) :- r(henry fonda, Movie), s(Movie, Award), r(Star, Movie)
Not feasible: variable Star cannot be bound.
Equivalent to the old query: Q(Award) :- r(henry fonda, Movie), s(Movie, Award)
The new query Q' is stable!

16 Testing stability of a CQ
Theorem: A CQ Q is stable iff its minimal equivalent Q_m is feasible.
Minimal equivalent query Q_m: Q_m is unique (up to variable renaming).

17 Main idea of the proof
Construct two databases of the relations.
They have the same observable tuples, but yield different answers to the query.
Thus, we cannot tell whether the computed answer is complete or not.
[Figure: Database D1 and Database D2 share the same observable tuples but give different answers to Q.]

18 Two algorithms for CQs
Algorithm CQStable:
  – Minimize Q, get its minimal equivalent Q_m
  – Test feasibility of Q_m by calling Inflationary
Algorithm CQStable*:
  – Compute all executable subgoals of Q
  – If all subgoals become executable, then Q is stable
  – Otherwise, test equivalence between Q and the new query with the executable subgoals
CQStable* is more efficient than CQStable.
Testing stability of a CQ is NP-complete.
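The first algorithm, CQStable, can be sketched as follows. This is an illustrative reading under assumptions rather than the authors' implementation: minimization is done with the textbook containment-mapping test (drop a subgoal whenever the body maps into what remains while fixing the head), feasibility uses the same greedy idea as Inflationary, and all function names and the term encoding (capitalized strings are variables) are hypothetical. The backtracking search is exponential in the worst case, in line with the NP-completeness noted on the slide.

```python
def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def find_mapping(src, dst, mapping):
    """Backtracking search for a containment mapping sending every subgoal
    of src onto some subgoal of dst; constants must map to themselves."""
    if not src:
        return True
    (rel, args), rest = src[0], src[1:]
    for rel2, args2 in dst:
        if rel2 != rel or len(args2) != len(args):
            continue
        new = dict(mapping)
        if all((new.setdefault(a, b) if is_var(a) else a) == b
               for a, b in zip(args, args2)) and find_mapping(rest, dst, new):
            return True
    return False


def minimize(head, body):
    """Remove subgoals one at a time while the query stays equivalent."""
    body, changed = list(body), True
    while changed:
        changed = False
        for i in range(len(body)):
            smaller = body[:i] + body[i + 1:]
            if find_mapping(body, smaller, {v: v for v in head if is_var(v)}):
                body, changed = smaller, True
                break
    return body


def feasible(body, patterns):
    """Greedy test: can all subgoals be executed in some order?"""
    bound, todo = set(), list(body)
    while todo:
        ready = next((g for g in todo if any(
            all(ad == "f" or not is_var(t) or t in bound for ad, t in zip(pat, g[1]))
            for pat in patterns[g[0]])), None)
        if ready is None:
            return False
        todo.remove(ready)
        bound.update(t for t in ready[1] if is_var(t))
    return True


def is_stable(head, body, patterns):
    return feasible(minimize(head, body), patterns)


Q_PRIME = [("r", ("henry_fonda", "Movie")), ("s", ("Movie", "Award")),
           ("r", ("Star", "Movie"))]
print(is_stable(("Award",), Q_PRIME, {"r": ["bf"], "s": ["bf"]}))
Q2 = [("r", ("a", "B", "C")), ("s", ("C", "D"))]
print(is_stable(("D",), Q2, {"r": ["bff"], "s": ["fb"]}))
```

On Q' from slide 15 the redundant subgoal r(Star, Movie) is removed by mapping Star to the constant, and the remaining two subgoals are feasible, so the sketch reports Q' as stable. The second query is already minimal and s(C, D) can never be executed, so it is reported as not stable.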

19 Other classes of queries
Unions of CQs: two algorithms
CQs with arithmetic comparisons:
  – An algorithm for testing stability
Datalog queries:
  – Undecidable
  – Give a sufficient condition for stability of Datalog queries

20 (III) Dynamic computability of complete answers to CQs
For a nonstable CQ Q, the complete answer might still be computable for certain databases.

21 An example
Q1: ans(B) :- r(a,B,C), s(C,D)
Not stable.
For the following database, we can still compute Q1's complete answer: {b1, b2}.
r(A^b, B^f, C^f)
  a | b1 | c1
  a | b2 | c2
  a | b2 | c3
  …
s(C^f, D^b)
  c1 | d1
  c2 | d2
  …
p(D^f)
  d1
  d2
  …

22 Change the head argument
Q2: ans(D) :- r(a,B,C), s(C,D)
Still not stable.
For the database (relations r(A^b, B^f, C^f), s(C^f, D^b), and p(D^f), as on slide 21), we cannot compute Q2's complete answer.

23 Difference between Q1 and Q2
Binding patterns: r(A^b, B^f, C^f), s(C^f, D^b)
Q1: ans(B) :- r(a,B,C), s(C,D)
Q2: ans(D) :- r(a,B,C), s(C,D)
Q1's head argument B is bound by the executable subgoal r(a,B,C).
Q2's head argument D is not bound by the executable subgoal r(a,B,C).

24 Generalization
q(X) :- g_1(X_1), …, g_k(X_k), g_k+1(X_k+1), …, g_n(X_n)
Executable subgoals: E = g_1(X_1), …, g_k(X_k)
If all arguments in X are bound in E:
  – We might compute its complete answer.
  – The computability is database dependent.
If some arguments in X are not bound in E:
  – We can never compute its complete answer.
  – Unless the relation computed from the subgoals in E is empty.
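The case split above can be checked mechanically. The sketch below is an illustration under assumptions, not the talk's algorithm: it computes the variables bound by the executable subgoals E with a greedy pass and then tests whether every head variable is among them. The names `bound_by_executable` and `may_be_computable` and the term encoding (capitalized strings are variables) are hypothetical.

```python
def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def bound_by_executable(body, patterns):
    """Variables bound by the executable subgoals E of the body."""
    bound, todo, progress = set(), list(body), True
    while progress:
        progress = False
        for rel, args in todo:
            if any(all(ad == "f" or not is_var(t) or t in bound
                       for ad, t in zip(pat, args)) for pat in patterns[rel]):
                todo.remove((rel, args))
                bound.update(t for t in args if is_var(t))
                progress = True
                break  # restart the scan with the enlarged bound set
    return bound


def may_be_computable(head, body, patterns):
    """True: the complete answer might be computable (database dependent).
    False: it is not computable unless the result of E is empty."""
    bound = bound_by_executable(body, patterns)
    return all(t in bound for t in head if is_var(t))


PATTERNS = {"r": ["bff"], "s": ["fb"]}
BODY = [("r", ("a", "B", "C")), ("s", ("C", "D"))]
print(may_be_computable(("B",), BODY, PATTERNS))  # Q1
print(may_be_computable(("D",), BODY, PATTERNS))  # Q2
```

For Q1 and Q2 from slide 23 only r(a,B,C) is executable, so B and C are bound while D is not, which matches the difference the slides draw between the two queries.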

25 A decision tree
It guides the planning process of computing the complete answer to a query.
Two approaches while traversing the tree:
  – Optimistic
  – Pessimistic

26 Conclusion
Stability of queries with binding patterns
Various classes of queries:
  – CQs (two algorithms)
  – Unions of CQs (two algorithms)
  – CQs with arithmetic comparisons (one algorithm)
  – Datalog (undecidable)
Dynamic computability of a CQ's complete answer
Another contribution: a decidability result for testing relative query containment with binding restrictions

27 Related work
Answering queries using views with binding patterns [RSU95]
Query optimization [YLUGM99, FLMS99]
Computing the maximal answer to queries [DL97, LC00]
Our work considers whether the complete answer to a query is computable.