Recursive query plans for Data Integration Oliver Michael By Rajesh Kanisetti.


1 Recursive query plans for Data Integration Oliver Michael By Rajesh Kanisetti

2 Introduction
Information integration systems
– Users interact with a uniform interface.
– Queries are formulated over a set of virtual relation names.
– The actual data resides in external sources.
– A mapping relates the virtual relations to the source relations, e.g. db1(P,A) :- paper(P), author(P,A), ai(A)
– paper, author and ai are virtual relations.
– db1 is a source relation.
– Query rewriting: translating a query over the virtual relations into a query that mentions only the source relations.

3 Query Rewriting Problem
An equivalent rewriting of the query
– Available sources may not contain all the information needed to answer a query.
– Maximally-contained rewritings, e.g. ask for all papers by Computer Science researchers: q(P,Y,A) :- db(P,Y,A)
Limitations on the binding patterns
– A name server of an institution will provide the address for a given name.
Functional dependencies
– A conference and its year functionally determine the location.

4 Datalog
– Main expressive advantage: recursive queries.
– More convenient for analysis: papers look better.
– Without recursion but with negation, it is equivalent in power to relational algebra.
– Has affected real practice (e.g., recursion in SQL3, magic sets transformations).

5 Datalog Concepts
– Atoms
– Datalog rules, datalog programs
– EDB predicates, IDB predicates
– Conjunctive queries
– Recursion
– Built-in predicates
– Negated atoms, stratified programs
– Semantics: least fixpoint

6 Predicates and Atoms
- Relations are represented by predicates.
- Tuples are represented by atoms: Purchase(“joe”, “bob”, “Nike Town”, “Nike Air”, 2/2/98)
- Arithmetic (built-in) atoms: X < 100, X+Y+5 > Z/2
- Negated atoms: NOT Product(“Brooklyn Bridge”, $100, “Microsoft”)

7 Datalog Rules and Queries
A pure datalog rule has the following form:
head :- atom1, atom2, …, atomN
where all the atoms are non-negated and relational.
BritishProduct(X) :- Product(X,Y,P) & Company(P, “UK”, SP)
– A datalog program is a set of datalog rules.
– A program with a single rule is a conjunctive query.
– We distinguish EDB predicates and IDB predicates.
– EDBs are stored in the database and appear only in the bodies of rules.
– IDBs are intensionally defined and appear in both bodies and heads.

8 The Meaning of Datalog Rules Repeat the following until you cannot derive any new facts: Consider every assignment from the variables in the body to the constants in the database. If each of the atoms in the body is made true by the assignment, then add the tuple for the head into the relation of the head. Start with the facts in the EDB and iteratively derive facts for IDBs.

9 Transitive Closure
Suppose we are representing a graph by a relation Edge(X,Y):
Edge(a,b), Edge(a,c), Edge(b,d), Edge(c,d), Edge(d,e)
[Figure: the graph on nodes a, b, c, d, e]
I want to express the query: Find all nodes reachable from a.

10 Recursion in Datalog
Path(X,Y) :- Edge(X,Y)
Path(X,Y) :- Path(X,Z), Path(Z,Y)
Semantics: evaluate the rules until a fixpoint is reached:
Iteration #0: Edge: {(a,b), (a,c), (b,d), (c,d), (d,e)} Path: {}
Iteration #1: Path: {(a,b), (a,c), (b,d), (c,d), (d,e)}
Iteration #2: Path gets the new tuples: (a,d), (b,e), (c,e)
Iteration #3: Path gets the new tuple: (a,e)
Iteration #4: Nothing changes -> We stop.
Note: the number of iterations depends on the data. It cannot be anticipated by only looking at the query!
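The iteration trace on this slide can be reproduced with a naive bottom-up evaluator. The following is a minimal Python sketch, not taken from the slides; the function name `naive_path` and the iteration counter are assumptions made for illustration.

```python
def naive_path(edges):
    """Naively evaluate
         Path(X,Y) :- Edge(X,Y)
         Path(X,Y) :- Path(X,Z), Path(Z,Y)
    until no new facts appear. Returns (path, iterations)."""
    path = set()
    iterations = 0
    while True:
        iterations += 1
        # Apply both rules to the facts derived so far.
        derived = set(edges)
        derived |= {(x, y) for (x, z) in path for (z2, y) in path if z == z2}
        if derived <= path:  # nothing new: the fixpoint is reached
            return path, iterations
        path |= derived


edges = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")}
path, iterations = naive_path(edges)
reachable_from_a = sorted(y for (x, y) in path if x == "a")
```

The loop recomputes every rule on every pass, which is simple but wasteful; semi-naive evaluation would join only against the facts that were new in the previous pass.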

11 Built-in Predicates
Rules may include atoms with built-in predicates:
ExpensiveProduct(X) :- Product(X,Y,P) & P > $100
But we need to restrict the use of built-in atoms in rules:
P(X) :- R(X) & X<Y
What does this mean? Y is not bound by any relational atom. We could use active domain semantics, but that’s problematic. Hence, we require that every variable that appears in a built-in atom also appears in a relational atom.

12 Negated Subgoals
Rules may include negated subgoals, but in restricted forms:
P(X,Y) :- Between(X,Y,Z) & NOT Direct(X,Z)
Bad: P(X,Y) :- R(X) & NOT S(Y)
Bad but ok: P(X) :- R(X) & NOT S(X,Y)
We’ll rewrite the latter as:
S’(X) :- S(X,Y)
P(X) :- R(X) & NOT S’(X)

13 Relations and Queries
A function-free Horn rule: p(X̄) :- p1(X̄1) & p2(X̄2) & … & pn(X̄n)
– p and p1, …, pn are relation names, and X̄, X̄1, …, X̄n are tuples of variables and constants.
– Any variable appearing in X̄ also appears in X̄1, …, X̄n.
– p(X̄) is the head of the rule; p1(X̄1), …, pn(X̄n) is the body of the rule.
– The base relations appear only in the bodies of the rules, not in the heads.
– Dependency graph: the nodes are the relations appearing in the rules. There is an arc from the node of relation pi to the node of relation p if pi appears in the body of a rule whose head is p.
– A set of rules is recursive if its dependency graph contains a cycle.
– A query is a set of function-free Horn rules.
– A conjunctive query is a single non-recursive Horn rule.

14 Query Containment
Given two queries q1 and q2, q1 is contained in q2 if for every database D, q1(D) ⊆ q2(D), where q(D) is the result of evaluating query q on D.
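For conjunctive queries without built-in atoms, containment can be tested with the standard canonical-database technique: freeze the variables of q1 into fresh constants, treat its body as a tiny database, and check whether q2 returns the frozen head. This technique is not on the slide. The sketch below is a hedged illustration; the query encoding, the upper-case-means-variable convention, and all function names are assumptions.

```python
def is_var(term):
    """Assumption: variables are strings that start with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def evaluate(query, db):
    """Evaluate a conjunctive query (head_args, body) on db: {pred: set of tuples}."""
    head, body = query
    results = set()

    def extend(i, binding):
        if i == len(body):
            results.add(tuple(binding[t] if is_var(t) else t for t in head))
            return
        pred, args = body[i]
        for fact in db.get(pred, ()):
            new_binding = dict(binding)
            if len(fact) == len(args) and all(
                new_binding.setdefault(a, c) == c if is_var(a) else a == c
                for a, c in zip(args, fact)
            ):
                extend(i + 1, new_binding)

    extend(0, {})
    return results


def contained_in(q1, q2):
    """True if q1 is contained in q2 (conjunctive queries, no built-ins)."""
    def freeze(t):
        return ("frozen", t) if is_var(t) else t

    head1, body1 = q1
    canonical_db = {}
    for pred, args in body1:
        canonical_db.setdefault(pred, set()).add(tuple(freeze(a) for a in args))
    return tuple(freeze(t) for t in head1) in evaluate(q2, canonical_db)


# q_male(X) :- parent(X,Z), parent(Z,ann), male(X)
q_male = (("X",), [("parent", ("X", "Z")), ("parent", ("Z", "ann")), ("male", ("X",))])
# q_all(X) :- parent(X,Z), parent(Z,ann)
q_all = (("X",), [("parent", ("X", "Z")), ("parent", ("Z", "ann"))])

male_in_all = contained_in(q_male, q_all)
all_in_male = contained_in(q_all, q_male)
```

Adding a subgoal can only shrink the answer, so the query restricted to male grandparents is contained in the unrestricted one, but not the other way round.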

15 Functional Dependencies
A relation p satisfies the functional dependency A1, …, An -> B if for every two tuples t and u in p with t.Ai = u.Ai for i = 1, …, n, also t.B = u.B.
Relative containment
– Σ: a set of functional dependencies.
– Query q1 is contained in query q2 relative to Σ, denoted q1 ⊆_Σ q2, if for each database D satisfying the functional dependencies in Σ, q1(D) ⊆ q2(D).
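The definition of a functional dependency translates directly into a check over the tuples of a relation. A minimal Python sketch follows; attributes are identified by position, and the function name and the sample tuples (apart from the Sydney fact used later in the slides) are hypothetical.

```python
def satisfies_fd(relation, lhs, rhs):
    """Check A1, ..., An -> B, where lhs lists the positions of A1..An
    and rhs is the position of B."""
    seen = {}
    for t in relation:
        key = tuple(t[i] for i in lhs)
        # Two tuples that agree on the left-hand side must agree on B.
        if seen.setdefault(key, t[rhs]) != t[rhs]:
            return False
    return True


# location(Conference, Year, Location) with Conference, Year -> Location
location_ok = {("ijcai", 1991, "Sydney"), ("ijcai", 1993, "SomeCity")}
location_bad = location_ok | {("ijcai", 1991, "OtherCity")}

ok = satisfies_fd(location_ok, lhs=(0, 1), rhs=2)
bad = satisfies_fd(location_bad, lhs=(0, 1), rhs=2)
```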

16 Modeling Information Sources and Query Plans
Domain model: a set of virtual relations
Source relations
– The contents of the external information sources.
Source descriptions
– A set of conjunctive queries.
– Their bodies contain only virtual relations, and their heads are source relations.
Query plan
– Given a query q from the user, the agent formulates a query plan from the source relations.
– A set of Horn rules that includes only the source relations.

17 Example
Consider a domain model where parent, male and female are virtual relations, and v1 and v2 are the source relations:
v1(X,Y) :- parent(X,Y), male(X)
v2(X,Y) :- parent(X,Y), female(X)
Query plan for all grandparents of ann from the available sources:
answer(X) :- parent(X,Z), parent(Z,ann)
parent(X,Y) :- v1(X,Y)
parent(X,Y) :- v2(X,Y)
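To make the plan concrete, it can be run over small source extensions. The sketch below is only an illustration: the function name and the contents of `v1` and `v2` are invented sample data, not from the slides.

```python
def grandparents_of(v1, v2, person):
    """Evaluate the plan
         answer(X)   :- parent(X,Z), parent(Z,person)
         parent(X,Y) :- v1(X,Y)
         parent(X,Y) :- v2(X,Y)
    """
    parent = set(v1) | set(v2)  # the two parent rules
    return {x for (x, z) in parent for (z2, y) in parent if z == z2 and y == person}


# Hypothetical source contents: v1 holds fathers, v2 holds mothers.
v1 = {("tom", "bob"), ("bob", "ann")}
v2 = {("mary", "bob"), ("sue", "ann")}

answer = grandparents_of(v1, v2, "ann")
```

Only grandparents reachable through the two sources are returned, which is why the plan is contained in the query rather than guaranteed to be equivalent to it.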

18 Functional Dependencies
Suppose the virtual relations are: conference(Paper, Conference), year(Paper, Year), location(Conference, Year, Location)
Functional dependencies
– conference: Paper -> Conference
– year: Paper -> Year
– location: Conference, Year -> Location
Information sources
v1(P,C,Y) :- conference(P,C), year(P,Y)
v2(P,L) :- conference(P,C), year(P,Y), location(C,Y,L)
Query: q(L) :- location(ijcai, 1991, L)
Answer: answer(L) :- v1(P, ijcai, 1991), v2(P, L)

19 Inverse Rule
Definition (inverse rule): Let v be a source description v(X̄) :- p1(X̄1), …, pn(X̄n). Then for j = 1, …, n, pj(X̄j’) :- v(X̄) is an inverse rule of v.
– X̄j is modified to obtain X̄j’ as follows: if X is a constant or is a variable in X̄, then X is unchanged in X̄j’. Otherwise, X is one of the variables Xi appearing in the body of v but not in X̄, and X is replaced by the function term fi(X̄) in X̄j’.
– The purpose is to recover tuples of the virtual relations from the source relations.

20 Source relations:
v1(P,C,Y) :- conference(P,C), year(P,Y)
v2(P,L) :- conference(P,C), year(P,Y), location(C,Y,L)
The inverse rules:
r1: conference(P,C) :- v1(P,C,Y)
r2: year(P,Y) :- v1(P,C,Y)
r3: conference(P, f1(P,L)) :- v2(P,L)
r4: year(P, f2(P,L)) :- v2(P,L)
r5: location(f1(P,L), f2(P,L), L) :- v2(P,L)
Information sources: v1(“Fuzzy”, “IJCAI”, 1991), v2(“Fuzzy”, “Sydney”)
Derived facts:
– conference(“Fuzzy”, “IJCAI”) (with r1), conference(“Fuzzy”, f1(“Fuzzy”,“Sydney”)) (r3)
– year(“Fuzzy”, 1991) (r2), year(“Fuzzy”, f2(“Fuzzy”,“Sydney”)) (r4)
– location(f1(“Fuzzy”,“Sydney”), f2(“Fuzzy”,“Sydney”), “Sydney”) (r5)
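The five inverse rules can be applied mechanically to the source facts. In the Python sketch below, a function term such as f1(P,L) is represented as the tuple `("f1", P, L)`; that encoding and the function name `apply_inverse_rules` are assumptions made for illustration.

```python
def apply_inverse_rules(v1, v2):
    """Apply r1..r5 to the source facts and return the reconstructed
    virtual relations (conference, year, location)."""
    conference, year, location = set(), set(), set()
    for (p, c, y) in v1:
        conference.add((p, c))            # r1
        year.add((p, y))                  # r2
    for (p, l) in v2:
        f1 = ("f1", p, l)                 # unknown conference of paper p
        f2 = ("f2", p, l)                 # unknown year of paper p
        conference.add((p, f1))           # r3
        year.add((p, f2))                 # r4
        location.add((f1, f2, l))         # r5
    return conference, year, location


v1 = {("Fuzzy", "IJCAI", 1991)}
v2 = {("Fuzzy", "Sydney")}
conference, year, location = apply_inverse_rules(v1, v2)
```

The function terms act as placeholders for values the source did not export; the functional dependencies are what later allow them to be identified with real constants.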

21 Definition (chase rules): Let Ā -> B be a functional dependency satisfied by a virtual relation p. Let C̄ be the attributes of p that are not in Ā, B. The chase rule corresponding to Ā -> B, denoted chase(Ā -> B), is the following rule:
e(B,B’) :- p(Ā,B,C̄), p(Ā’,B’,C̄’), e(Ā,Ā’)
where e(Ā,Ā’) abbreviates e(A1,A1’), …, e(An,An’).
– Functional dependencies
conference: Paper -> Conference
year: Paper -> Year
location: Conference, Year -> Location
– In our example, the chase rules are:
e(C,C’) :- conference(P,C), conference(P’,C’), e(P,P’)
e(Y,Y’) :- year(P,Y), year(P’,Y’), e(P,P’)
e(L,L’) :- location(C,Y,L), location(C’,Y’,L’), e(C,C’), e(Y,Y’)
– Derived facts: e(f1(“Fuzzy”,“Sydney”), “IJCAI”), e(f2(“Fuzzy”,“Sydney”), 1991)

22 Query Rewriting
Define q’ by modifying q iteratively as follows:
– If c is a constant in one of the subgoals of q, replace it by a new variable Z, and add the subgoal e(Z,c).
– If X is a variable in the head of q, replace X in the body of q by a new variable X’, and add the subgoal e(X,X’).
– If a variable Y that is not in the head of q appears in two subgoals of q, replace one of its occurrences by Y’, and add the subgoal e(Y,Y’).
In our example:
q(L) :- location(ijcai, 1991, L)
q’(L) :- location(C,Y,L’), e(C,ijcai), e(Y,1991), e(L,L’)
Evaluating query q’ on the reconstructed virtual relations and the derived equivalence relation e yields: IJCAI ’91 was held in Sydney.
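The effect of the chase rules and the rewritten query q’ can be sketched in Python. Instead of evaluating the chase rules as Datalog, this sketch swaps in a union-find structure to maintain e as an equivalence relation and merges terms whenever a functional dependency forces them to be equal. The tuple encoding of function terms, the hard-coded facts, and all function names are assumptions for illustration.

```python
def chase_equivalence(fds):
    """fds: list of (relation, lhs_positions, rhs_position).
    Returns find(term), which maps a term to the representative of its e-class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
        return True

    changed = True
    while changed:  # repeat until no dependency merges anything new
        changed = False
        for rel, lhs, rhs in fds:
            for t in rel:
                for u in rel:
                    if all(find(t[i]) == find(u[i]) for i in lhs):
                        changed |= union(t[rhs], u[rhs])
    return find


F1, F2 = ("f1", "Fuzzy", "Sydney"), ("f2", "Fuzzy", "Sydney")
conference = {("Fuzzy", "IJCAI"), ("Fuzzy", F1)}
year = {("Fuzzy", 1991), ("Fuzzy", F2)}
location = {(F1, F2, "Sydney")}

find = chase_equivalence([(conference, (0,), 1), (year, (0,), 1), (location, (0, 1), 2)])

# q'(L) :- location(C,Y,L'), e(C,ijcai), e(Y,1991), e(L,L')
answers = {l for (c, y, l) in location
           if find(c) == find("IJCAI") and find(y) == find(1991)}
```

The sketch returns the L’ value itself; a complete evaluation of q’ would return every term equivalent to L’ and then discard answers that still contain function terms.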

23 Limitations on Binding Patterns
To model source capabilities, attach an adornment to each source relation.
– An adornment of a source relation v is a string of b’s and f’s of length n, where n is the arity of v.
– v^bf: the first argument of v must be bound; the second may be free.
Definition (executable Horn rule)
– Let V be a set of relations with binding adornments, and let r be the following Horn rule whose body relations are in V: q(X̄) :- v1(X̄1), …, vn(X̄n)
– The rule r is executable if the following holds for i = 1, …, n: let j be a bound argument position of vi and let a be the j’th element in X̄i. Then either a is a constant, or a appears in X̄1 ∪ … ∪ X̄i−1.

24 Example
– Sources:
v1^f(X) :- ijcaiPapers(X)
v2^bf(X,Y) :- cites(X,Y)
v3^b(X) :- awardPaper(X)
– Query: q(X) :- awardPaper(X)
– Executable conjunctive query plan: qn(Zn) :- v1(Z0), v2(Z0,Z1), …, v2(Zn-1,Zn), v3(Zn)
– Allowing recursive plans yields a maximally-contained plan:
papers(X) :- v1(X)
papers(X) :- papers(Y), v2(Y,X)
q(X) :- papers(X), v3(X)
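The recursive plan amounts to a graph search that only ever calls a source with its bound arguments supplied. The Python sketch below simulates the three sources as functions that respect their adornments; the function names and the sample paper identifiers are hypothetical.

```python
def award_papers(ijcai_papers, cites, award):
    """Evaluate the recursive plan
         papers(X) :- v1(X)
         papers(X) :- papers(Y), v2(Y,X)
         q(X)      :- papers(X), v3(X)
    calling each source only with its bound arguments supplied."""
    def v1():                  # adornment f: returns all IJCAI papers
        return set(ijcai_papers)

    def v2(x):                 # adornment bf: papers cited by the given paper x
        return {y for (a, y) in cites if a == x}

    def v3(x):                 # adornment b: membership test for a given paper
        return x in award

    papers = set()
    frontier = v1()
    while frontier:            # follow citations until nothing new is found
        papers |= frontier
        frontier = {y for x in frontier for y in v2(x)} - papers
    return {x for x in papers if v3(x)}


# Hypothetical data: p8 won an award but cannot be reached from an IJCAI paper.
ijcai_papers = {"p1"}
cites = {("p1", "p2"), ("p2", "p3"), ("p9", "p8")}
award = {"p3", "p8"}

result = award_papers(ijcai_papers, cites, award)
```

No fixed-length chain of v2 calls reaches every award paper for every input, which is why the slide resorts to a recursive plan.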

25 Domain Rules
Definition (domain rules): Let v be a source relation of arity n. Suppose the adornment of v says that the arguments in positions 1, …, l need to be bound, and the arguments in positions l+1, …, n are free. Then for i = l+1, …, n, the following rule is a domain rule:
dom(Xi) :- dom(X1), …, dom(Xl), v(X1, …, Xn)
– All domain rules are executable, and the relation dom has adornment f.
– Every query plan P can be transformed into an executable query plan by inserting dom(X) before each subgoal g in P that has the variable X in a bound position.
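Domain rules can be generated from an adornment string. The Python sketch below emits them as text; it generalizes the definition slightly by allowing bound positions anywhere in the adornment rather than only in the first l positions. The function name and the X1, …, Xn variable naming are assumptions.

```python
def domain_rules(name, adornment):
    """Return the domain rules of source `name` as strings, one per free position."""
    args = [f"X{i + 1}" for i in range(len(adornment))]
    bound = [a for a, ad in zip(args, adornment) if ad == "b"]
    free = [a for a, ad in zip(args, adornment) if ad == "f"]
    # Every bound argument must first be drawn from dom before v is called.
    body = [f"dom({b})" for b in bound] + [f"{name}({', '.join(args)})"]
    return [f"dom({x}) :- {', '.join(body)}" for x in free]


rules_v1 = domain_rules("v1", "f")
rules_v2 = domain_rules("v2", "bf")
rules_v3 = domain_rules("v3", "b")
```

A source with no free positions, such as one with adornment b, contributes no domain rules because it never produces new values.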

26 Summary
Given a query:
– Construct “inverse rules”
– Construct “chase rules”
– Construct “domain rules”
The above rules comprise the “query plan”.

