Recursive query plans for Data Integration Oliver Michael By Rajesh Kanisetti.

Introduction
Information integration systems
- Users interact with a uniform interface.
- Queries are formulated over a set of virtual relation names.
- The actual data resides in external sources.
- A mapping relates the virtual relations to the source relations, e.g.
  db1(P,A) :- paper(P), author(P,A), ai(A)
  Here paper, author and ai are virtual relations, and db1 is a source relation.
- Query rewriting: translating a query over the virtual relations into a query that mentions only the source relations.

Query Rewriting Problem
An equivalent rewriting of the query is not always available.
- Available sources may not contain all the information needed to answer a query.
- Instead, look for maximally-contained rewritings.
  e.g. Ask for all papers by Computer Science researchers.
  q(P,Y,A) :- db(P,Y,A)
Limitations on the binding patterns
- A name server of an institution will provide the address for a given name.
Functional dependencies
- A conference together with its year functionally determines the location.

Datalog
- Main expressive advantage: recursive queries.
- More convenient for analysis: papers look better.
- Without recursion but with negation, it is equivalent in power to relational algebra.
- Has affected real practice (e.g., recursion in SQL3, magic sets transformations).

Datalog Concepts
- Atoms
- Datalog rules, datalog programs
- EDB predicates, IDB predicates
- Conjunctive queries
- Recursion
- Built-in predicates
- Negated atoms, stratified programs
- Semantics: least fixpoint

Predicates and Atoms
- Relations are represented by predicates.
- Tuples are represented by atoms: Purchase(“joe”, “bob”, “Nike Town”, “Nike Air”, 2/2/98)
- Arithmetic, built-in atoms: X Z/2
- Negated atoms: NOT Product(“Brooklyn Bridge”, $100, “Microsoft”)

Datalog Rules and Queries
A pure datalog rule has the following form:
  head :- atom1, atom2, …, atomn
where all the atoms are non-negated and relational.
  BritishProduct(X) :- Product(X,Y,P) & Company(P, “UK”, SP)
- A datalog program is a set of datalog rules.
- A program with a single rule is a conjunctive query.
- We distinguish EDB predicates and IDB predicates:
  - EDBs are stored in the database and appear only in the bodies.
  - IDBs are intensionally defined and appear in both bodies and heads.

The Meaning of Datalog Rules
Start with the facts in the EDB and iteratively derive facts for the IDBs. Repeat the following until you cannot derive any new facts:
- Consider every assignment from the variables in the body to the constants in the database.
- If each of the atoms in the body is made true by the assignment, then add the tuple for the head into the relation of the head.

Transitive Closure
Suppose we are representing a graph by a relation Edge(X,Y):
  Edge(a,b), Edge(a,c), Edge(b,d), Edge(c,d), Edge(d,e)
[Figure: the graph on nodes a, b, c, d, e]
I want to express the query: Find all nodes reachable from a.

Recursion in Datalog
  Path(X, Y) :- Edge(X, Y)
  Path(X, Y) :- Path(X, Z), Path(Z, Y)
Semantics: evaluate the rules until a fixpoint is reached:
- Iteration #0: Edge: {(a,b), (a,c), (b,d), (c,d), (d,e)}, Path: {}
- Iteration #1: Path: {(a,b), (a,c), (b,d), (c,d), (d,e)}
- Iteration #2: Path gets the new tuples: (a,d), (b,e), (c,e)
- Iteration #3: Path gets the new tuple: (a,e)
- Iteration #4: Nothing changes -> We stop.
Note: the number of iterations depends on the data. It cannot be anticipated by only looking at the query!
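The iteration trace above can be reproduced with a short Python sketch of naive bottom-up evaluation. The function name naive_path and the rounds list are illustrative choices, not part of the slides; the loop simply re-applies both Path rules until nothing new is derived.

```python
# Naive bottom-up evaluation of the two Path rules (illustrative sketch).
edge = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")}

def naive_path(edge):
    """Apply both rules repeatedly until no new Path tuple is derived."""
    path = set()
    rounds = []  # the new tuples derived in each iteration
    while True:
        derived = set(edge)  # Path(X,Y) :- Edge(X,Y)
        # Path(X,Y) :- Path(X,Z), Path(Z,Y)
        derived |= {(x, y) for (x, z) in path for (z2, y) in path if z == z2}
        new = derived - path
        if not new:
            return path, rounds
        path |= new
        rounds.append(new)

path, rounds = naive_path(edge)
reachable_from_a = sorted(y for (x, y) in path if x == "a")
print(reachable_from_a)
```

Each entry of rounds corresponds to one of the iterations #1 to #3 listed above; the loop exits on the first pass that derives nothing new.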

Built-in Predicates
Rules may include atoms with built-in predicates:
  ExpensiveProduct(X) :- Product(X,Y,P) & P > $100
But we need to restrict the use of built-in atoms in rules:
  P(X) :- R(X) & X<Y
What does this mean? We could use active domain semantics, but that’s problematic. Hence, we require that every variable that appears in a built-in atom also appears in a relational atom.

Negated Subgoals
Rules may include negated subgoals, but in restricted forms:
  P(X,Y) :- Between(X,Y,Z) & NOT Direct(X,Z)
Bad:
  P(X, Y) :- R(X) & NOT S(Y)
Bad but ok:
  P(X) :- R(X) & NOT S(X,Y)
We’ll rewrite as:
  S’(X) :- S(X,Y)
  P(X) :- R(X) & NOT S’(X)
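As a small illustration of the rewrite, the sketch below evaluates the two rewritten rules over made-up relations. The contents of R and S are hypothetical sample data; the point is only that projecting S onto its first column makes the negation a plain set difference.

```python
# Hypothetical sample data for R(X) and S(X,Y).
R = {1, 2, 3}
S = {(1, "u"), (3, "v"), (3, "w")}

# S'(X) :- S(X,Y)   (project away the variable Y that occurs only under NOT)
s_prime = {x for (x, _y) in S}

# P(X) :- R(X) & NOT S'(X)
P = {x for x in R if x not in s_prime}
print(P)
```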

Relations and Queries
A function-free Horn rule has the form
  p(X) :- p1(X1) & p2(X2) & … & pn(Xn)
- p and p1, …, pn are relation names, and X, X1, …, Xn are tuples of variables and constants.
- Any variable appearing in X also appears in X1, …, Xn.
- p(X) is the head of the rule; p1(X1), …, pn(Xn) is the body of the rule.
- The base relations appear only in the bodies, not in the heads, of the rules.
- Dependency graph: the nodes are the relations appearing in the rules. There is an arc from the node of relation pi to the node of relation p if pi appears in the body of a rule whose head is p.
- A set of rules is recursive if its dependency graph contains a cycle.
- A query is a set of function-free Horn rules.
- A conjunctive query is a single non-recursive Horn rule.

Query Containment
Given two queries q1 and q2, q1 is contained in q2 if for every database D, q1(D) ⊆ q2(D), where q(D) is the result of evaluating query q on D.

Functional Dependencies
A relation p satisfies the functional dependency A1, …, An -> B if for every two tuples t and u in p with t.Ai = u.Ai for i = 1, …, n, also t.B = u.B.
Relative containment
- Σ: a set of functional dependencies.
- Query q1 is contained in query q2 relative to Σ, denoted q1 ⊆_Σ q2, if for each database D satisfying the functional dependencies in Σ, q1(D) ⊆ q2(D).
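The definition of satisfaction can be checked directly on a finite relation. The following is a minimal sketch; the function name satisfies_fd and the dictionary-per-tuple representation are assumptions made for illustration, and the second row of the sample data is hypothetical.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if no two rows agree on all lhs attributes but differ on rhs."""
    seen = {}  # lhs values -> the rhs value first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

# Sample relation over (Conference, Year, Location); the second row is made up.
location = [
    {"Conference": "ijcai", "Year": 1991, "Location": "Sydney"},
    {"Conference": "conf_x", "Year": 2000, "Location": "city_a"},
]
ok = satisfies_fd(location, ["Conference", "Year"], "Location")

# Adding a second location for the same conference and year violates the dependency.
clash = location + [{"Conference": "ijcai", "Year": 1991, "Location": "city_b"}]
violated = not satisfies_fd(clash, ["Conference", "Year"], "Location")
print(ok, violated)
```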

Modeling Information Sources and Query Plans
- Domain model: a set of virtual relations.
- Source relations: the contents of the external information sources.
- Source descriptions
  - A set of conjunctive queries.
  - Their bodies contain only virtual relations, and their heads are source relations.
- Query plan
  - Given a query q from the user, the agent formulates a query plan over the source relations.
  - A set of Horn rules whose only base relations are the source relations.

Example
Consider a domain model where parent, male and female are virtual relations, and v1 and v2 are the source relations:
  v1(X,Y) :- parent(X,Y), male(X)
  v2(X,Y) :- parent(X,Y), female(X)
Query plan for all grandparents of ann from the available sources:
  answer(X) :- parent(X,Z), parent(Z,ann)
  parent(X,Y) :- v1(X,Y)
  parent(X,Y) :- v2(X,Y)
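To see the plan at work, the sketch below evaluates it over made-up source contents. The tuples in v1 and v2 are hypothetical; only the three plan rules come from the example.

```python
# Hypothetical source contents: v1 holds (father, child), v2 holds (mother, child).
v1 = {("bob", "carol"), ("dave", "bob")}
v2 = {("carol", "ann"), ("eve", "carol")}

# parent(X,Y) :- v1(X,Y)   and   parent(X,Y) :- v2(X,Y)
parent = v1 | v2

# answer(X) :- parent(X,Z), parent(Z,ann)
answer = {x for (x, z) in parent for (z2, c) in parent if z == z2 and c == "ann"}
print(sorted(answer))
```

The plan touches only v1 and v2; the virtual relation parent is rebuilt from them before the query itself is evaluated.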

Functional Dependencies
Suppose the virtual relations are:
  conference(Paper, Conference), year(Paper, Year), location(Conference, Year, Location)
Functional dependencies:
  conference: Paper -> Conference
  year: Paper -> Year
  location: Conference, Year -> Location
Information sources:
  v1(P,C,Y) :- conference(P,C), year(P,Y)
  v2(P,L) :- conference(P,C), year(P,Y), location(C,Y,L)
Query:
  q(L) :- location(ijcai, 1991, L)
Answer:
  answer(L) :- v1(P, ijcai, 1991), v2(P, L)

Inverse Rule
Definition (inverse rule): Let v be a source description
  v(X̄) :- p1(X̄1), …, pn(X̄n)
Then for j = 1, …, n,
  pj(X̄j’) :- v(X̄)
is an inverse rule of v.
- X̄j is modified to obtain X̄j’ as follows: if X is a constant or is a variable in X̄, then X is unchanged in X̄j’. Otherwise, X is one of the variables Xi appearing in the body of v but not in X̄, and X is replaced by the function term fi(X̄) in X̄j’.
- The purpose is to recover tuples of the virtual relations from the source relations.

Source relations:
  v1(P,C,Y) :- conference(P,C), year(P,Y)
  v2(P,L) :- conference(P,C), year(P,Y), location(C,Y,L)
The inverse rules:
  r1: conference(P,C) :- v1(P,C,Y)
  r2: year(P,Y) :- v1(P,C,Y)
  r3: conference(P, f1(P,L)) :- v2(P,L)
  r4: year(P, f2(P,L)) :- v2(P,L)
  r5: location(f1(P,L), f2(P,L), L) :- v2(P,L)
Information sources: v1(“Fuzzy”, “IJCAI”, 1991), v2(“Fuzzy”, “Sydney”)
Derived facts:
  conference: (“Fuzzy”, “IJCAI”) with r1; (“Fuzzy”, f1(“Fuzzy”, “Sydney”)) with r3
  year: (“Fuzzy”, 1991) with r2; (“Fuzzy”, f2(“Fuzzy”, “Sydney”)) with r4
  location: (f1(“Fuzzy”, “Sydney”), f2(“Fuzzy”, “Sydney”), “Sydney”) with r5
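The rules r1 to r5 can be applied mechanically to the two source facts. The sketch below is one possible encoding, not the presenters' implementation: the helper skolem and the nested-tuple representation of the function terms f1(P,L) and f2(P,L) are assumptions made for illustration.

```python
def skolem(name, *args):
    """Represent the function term name(args) as a nested tuple."""
    return (name,) + args

def apply_inverse_rules(v1, v2):
    """Rebuild the virtual relations from the source relations v1 and v2."""
    conference, year, location = set(), set(), set()
    for (p, c, y) in v1:
        conference.add((p, c))   # r1: conference(P,C) :- v1(P,C,Y)
        year.add((p, y))         # r2: year(P,Y) :- v1(P,C,Y)
    for (p, l) in v2:
        c = skolem("f1", p, l)   # unknown conference of paper p
        y = skolem("f2", p, l)   # unknown year of paper p
        conference.add((p, c))   # r3
        year.add((p, y))         # r4
        location.add((c, y, l))  # r5
    return conference, year, location

v1 = {("Fuzzy", "IJCAI", 1991)}
v2 = {("Fuzzy", "Sydney")}
conference, year, location = apply_inverse_rules(v1, v2)
print(sorted(conference, key=str))
```

The function terms act as placeholders for values the source v2 does not expose; nothing at this stage says that f1(“Fuzzy”, “Sydney”) is the same conference as “IJCAI”.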

Definition (chase rules): Let Ā -> B be a functional dependency satisfied by a virtual relation p. Let C̄ be the attributes of p that are not in Ā, B. The chase rule corresponding to Ā -> B, denoted chase(Ā -> B), is the following rule:
  e(B,B’) :- p(Ā,B,C̄), p(Ā’,B’,C̄’), e(Ā,Ā’)
where e(Ā,Ā’) abbreviates e(A1,A1’), …, e(An,An’).
- Functional dependencies:
  conference: Paper -> Conference
  year: Paper -> Year
  location: Conference, Year -> Location
- In our example, the chase rules are:
  e(C,C’) :- conference(P,C), conference(P’,C’), e(P,P’)
  e(Y,Y’) :- year(P,Y), year(P’,Y’), e(P,P’)
  e(L,L’) :- location(C,Y,L), location(C’,Y’,L’), e(C,C’), e(Y,Y’)
- Derived facts: e

Query Rewriting
Define q’ by modifying q iteratively as follows:
- If c is a constant in one of the subgoals of q, replace it by a new variable Z, and add the subgoal e(Z,c).
- If X is a variable in the head of q, replace X in the body of q by a new variable X’, and add the subgoal e(X,X’).
- If a variable Y that is not in the head of q appears in two subgoals of q, replace one of its occurrences by Y’, and add the subgoal e(Y,Y’).
In our example:
  q(L) :- location(ijcai, 1991, L)
  q’(L) :- location(C,Y,L’), e(C,ijcai), e(Y,1991), e(L,L’)
Evaluating query q’ on the reconstructed virtual relations and the derived equivalence relation e yields: IJCAI ’91 was held in Sydney.
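The final evaluation step can be traced in a few lines of Python. This is a sketch under stated assumptions rather than the authors' algorithm as published: function terms are encoded as nested tuples, e is seeded with e(t,t) for every term t that occurs in the reconstructed relations, and e is closed under symmetry and transitivity because the slides call it an equivalence relation. The constant is written "IJCAI" throughout to match the source facts.

```python
from itertools import product

F1 = ("f1", "Fuzzy", "Sydney")   # function term f1("Fuzzy","Sydney")
F2 = ("f2", "Fuzzy", "Sydney")   # function term f2("Fuzzy","Sydney")

# Virtual relations as reconstructed from the two source facts.
conference = {("Fuzzy", "IJCAI"), ("Fuzzy", F1)}
year = {("Fuzzy", 1991), ("Fuzzy", F2)}
location = {(F1, F2, "Sydney")}

def chase(conference, year, location):
    """Compute e by applying the three chase rules until a fixpoint."""
    terms = {t for rel in (conference, year, location) for row in rel for t in row}
    e = {(t, t) for t in terms}  # assumption: e starts out reflexive
    while True:
        new = set(e)
        # e(C,C') :- conference(P,C), conference(P',C'), e(P,P')
        new |= {(c, c2) for (p, c), (p2, c2) in product(conference, repeat=2)
                if (p, p2) in e}
        # e(Y,Y') :- year(P,Y), year(P',Y'), e(P,P')
        new |= {(y, y2) for (p, y), (p2, y2) in product(year, repeat=2)
                if (p, p2) in e}
        # e(L,L') :- location(C,Y,L), location(C',Y',L'), e(C,C'), e(Y,Y')
        new |= {(l, l2) for (c, y, l), (c2, y2, l2) in product(location, repeat=2)
                if (c, c2) in e and (y, y2) in e}
        # assumption: keep e symmetric and transitive
        new |= {(b, a) for (a, b) in new}
        new |= {(a, d) for (a, b) in new for (c, d) in new if b == c}
        if new == e:
            return e
        e = new

def q_prime(location, e):
    """q'(L) :- location(C,Y,L'), e(C,IJCAI), e(Y,1991), e(L,L')"""
    return {l for (c, y, l2) in location for (l, l3) in e
            if (c, "IJCAI") in e and (y, 1991) in e and l3 == l2}

e = chase(conference, year, location)
print(q_prime(location, e))
```

The chase equates f1(“Fuzzy”, “Sydney”) with “IJCAI” and f2(“Fuzzy”, “Sydney”) with 1991, which is what lets the location tuple obtained through v2 answer a query about IJCAI 1991.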

Limitations on Binding Patterns
To model source capabilities, attach an adornment to each source relation.
- An adornment of a source relation v is a string of b’s and f’s of length n, where n is the arity of v.
- v^bf: the first argument of v must be bound.
Definition (executable Horn rule)
- Let V be a set of relations with binding adornments, and let r be the following Horn rule whose body relations are in V:
  q(X̄) :- v1(X̄1), …, vn(X̄n)
- The rule r is executable if the following holds for i = 1, …, n: let j be a bound argument position of vi, and let a be the j’th element of X̄i. Then either a is a constant, or a appears in X̄1, …, X̄i-1.
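The executability condition is a left-to-right scan over the body, which the following sketch makes explicit. The function name is_executable, the (relation, arguments) encoding of subgoals, and the convention that uppercase strings are variables are all assumptions of this sketch; the adornments in the usage lines are illustrative.

```python
def is_executable(body, adornments):
    """Check a rule body, given as a list of (relation, args) subgoals.

    Convention of this sketch: strings starting with an uppercase letter
    are variables; everything else is treated as a constant.
    """
    def is_var(a):
        return isinstance(a, str) and a[:1].isupper()

    seen = set()  # variables that appeared in earlier subgoals
    for rel, args in body:
        for a, mode in zip(args, adornments[rel]):
            # A bound position needs a constant or an already-seen variable.
            if mode == "b" and is_var(a) and a not in seen:
                return False
        seen.update(a for a in args if is_var(a))
    return True

# Illustrative adornments: v1 is free, v2 needs its first argument, v3 needs its only one.
adornments = {"v1": "f", "v2": "bf", "v3": "b"}
good = [("v1", ["Z0"]), ("v2", ["Z0", "Z1"]), ("v3", ["Z1"])]
bad = [("v3", ["Z1"]), ("v1", ["Z0"]), ("v2", ["Z0", "Z1"])]
print(is_executable(good, adornments), is_executable(bad, adornments))
```

Reordering the same subgoals can turn an executable rule into a non-executable one, since the test only looks at subgoals to the left.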

Example
- Sources:
  v1^f(X) :- ijcaiPapers(X)
  v2^bf(X,Y) :- cites(X,Y)
  v3^b(X) :- awardPaper(X)
- Query: q(X) :- awardPaper(X)
- Executable conjunctive query plans, one for each n:
  qn(Zn) :- v1(Z0), v2(Z0, Z1), …, v2(Zn-1, Zn), v3(Zn)
- By allowing recursive plans, we can produce a maximally-contained plan:
  papers(X) :- v1(X)
  papers(X) :- papers(Y), v2(Y,X)
  q(X) :- papers(X), v3(X)
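A recursive plan of this kind can be simulated with sources modelled as functions that accept input only in their bound positions. The sketch assumes, as the conjunctive plans suggest, that v1 needs no input, v2 needs the citing paper, and v3 can only test a given paper; the paper identifiers and source contents are hypothetical.

```python
# Hypothetical contents hidden behind the three sources.
IJCAI_PAPERS = {"p1"}
CITES = {("p1", "p2"), ("p2", "p3"), ("p4", "p5")}
AWARD_PAPERS = {"p3", "p5"}

def v1():
    """Adornment f: returns all IJCAI papers without any input."""
    return set(IJCAI_PAPERS)

def v2(x):
    """Adornment bf: given a paper x, returns the papers that x cites."""
    return {y for (x2, y) in CITES if x2 == x}

def v3(x):
    """Adornment b: can only test whether a given paper won an award."""
    return x in AWARD_PAPERS

def run_plan():
    papers = v1()                        # papers(X) :- v1(X)
    frontier = set(papers)
    while frontier:                      # papers(X) :- papers(Y), v2(Y,X)
        found = set()
        for y in frontier:
            found |= v2(y)
        frontier = found - papers        # only follow newly discovered papers
        papers |= frontier
    return {x for x in papers if v3(x)}  # q(X) :- papers(X), v3(X)

print(run_plan())
```

The award paper p5 is not returned because no chain of citations starting from an IJCAI paper reaches it, so there is no way to obtain a binding with which to call v3 on it. No fixed n would cover citation chains of every length, which is why the plan is recursive.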

Domain Rules
Definition (domain rules): Let v be a source relation of arity n. Suppose the adornment of v says that the arguments in positions 1, …, l need to be bound, and the arguments in positions l+1, …, n are free. Then for i = l+1, …, n, the following rule is a domain rule:
  dom(Xi) :- dom(X1), …, dom(Xl), v(X1, …, Xn)
- All domain rules are executable, and the relation dom has adornment f.
- Every query plan P can be transformed into an executable query plan by inserting dom(X) before each subgoal g in P that has a bound variable X.
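Domain rules follow a fixed template, so they can be generated from a source name and its adornment. The helper below is a hypothetical illustration that emits the rules as text; it assumes, like the definition, that all bound positions come before the free ones.

```python
def domain_rules(name, adornment):
    """Return the domain rules, as strings, for a source with the given adornment."""
    n = len(adornment)
    l = adornment.count("b")  # number of leading bound positions
    if adornment != "b" * l + "f" * (n - l):
        raise ValueError("this sketch assumes bound positions precede free ones")
    xs = [f"X{i}" for i in range(1, n + 1)]
    # Body: dom(X1), ..., dom(Xl), v(X1, ..., Xn)
    body = [f"dom({x})" for x in xs[:l]] + [f"{name}({', '.join(xs)})"]
    # One rule per free position l+1, ..., n
    return [f"dom({x}) :- {', '.join(body)}" for x in xs[l:]]

for rule in domain_rules("v2", "bf"):
    print(rule)
```

A source with no free positions contributes no domain rule, since it can never supply a new constant.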

Summary
Given a query:
- Construct “inverse rules”
- Construct “chase rules”
- Construct “domain rules”
The above rules comprise the “query plan”.