Query Planning with Limited Source Capabilities
Chen Li, Stanford University
Edward Y. Chang, University of California, Santa Barbara

Slide 2: Motivation
- Heterogeneous information sources on the WWW
- Information-integration systems
- Limited query capabilities
  - Music stores: amazon.com, cdnow.com.
  - A value for Artist or Title must be specified.
  - The sources do not answer queries such as "Give me all your information about CDs."

Slide 3: Example
Source   View schema              Must bind
1        v1(Song, CD)             Song
2        v2(CD, Artist, Price)    CD
3        v3(CD, Artist, Price)    Artist

Query: "Find the prices of CDs containing a song titled Friends."
Candidate joins:
- v1(Friends, CD) ⋈ v2(CD, Artist, Price)
- v1(Friends, CD) ⋈ v3(CD, Artist, Price)

Slide 4: Source tuples
[Figure: the tuples stored in v1(Song, CD), v2(CD, Artist, Price), and v3(CD, Artist, Price)]
Not all of the tuples can be retrieved from the sources, because of the binding restrictions.

Slide 5: Traditional approach
Consider one join at a time:
- v1 ⋈ v2: {$15}
- v1 ⋈ v3: empty, since there is no binding for Artist.
[Figure: the source tables v1(Song, CD), v2(CD, Artist, Price), and v3(CD, Artist, Price)]

Slide 6: Our approach
Retrieve as many tuples as possible:
- v1 ⋈ v2: {$15}
- v1 ⋈ v3: {$10}
This approach could save the user $15 - $10 = $5!
[Figure: the source tables v1(Song, CD), v2(CD, Artist, Price), and v3(CD, Artist, Price), with some tuples marked X]

Slide 7: Observations
- Access views that are not in a join to retrieve bindings.
- The process is recursive.
- Some tuples in the answer still cannot be retrieved.
[Figure: the source tables v1(Song, CD), v2(CD, Artist, Price), and v3(CD, Artist, Price), with some tuples marked X]

Slide 8: Questions
1. How can we compute the maximal answer?
2. When should we access sources that are not in a query?
3. Which sources should be accessed?

Slide 9: Source views
A set of source views V with binding patterns:
- b: a value must be specified for the attribute
- f: the attribute is free
Each view schema uses a set of global attributes (here Song, CD, Artist, Price).
Hypergraph representation (attributes are nodes, each view is a hyperedge):
- v1(Song, CD), binding pattern bf
- v2(CD, Artist, Price), binding pattern bff
- v3(CD, Artist, Price), binding pattern fbf
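A minimal Python sketch of this representation, assuming each view is stored as its attribute tuple plus a b/f pattern string (the names `VIEWS`, `bound_attrs`, `free_attrs`, and `can_query` are illustrative, not from the talk). It checks which views can be queried when only some attributes have bindings:

```python
from typing import Dict, Set, Tuple

# A view is (attributes, pattern); pattern[i] is 'b' (must be bound) or 'f' (free).
View = Tuple[Tuple[str, ...], str]

# The music-store views from the example, with their binding patterns.
VIEWS: Dict[str, View] = {
    "v1": (("Song", "CD"), "bf"),
    "v2": (("CD", "Artist", "Price"), "bff"),
    "v3": (("CD", "Artist", "Price"), "fbf"),
}


def bound_attrs(view: View) -> Set[str]:
    """Attributes that must be given a value before the source answers."""
    attrs, pattern = view
    return {a for a, p in zip(attrs, pattern) if p == "b"}


def free_attrs(view: View) -> Set[str]:
    """Attributes whose values the source returns."""
    attrs, pattern = view
    return {a for a, p in zip(attrs, pattern) if p == "f"}


def can_query(view: View, bound: Set[str]) -> bool:
    """A view is queryable once every 'b' attribute has at least one binding."""
    return bound_attrs(view) <= bound


if __name__ == "__main__":
    # With only the query's Song value known, only v1 can be queried.
    for name, view in VIEWS.items():
        print(name, can_query(view, {"Song"}))
```

With only a Song value known, v1 is the only queryable view; v2 first needs a CD value and v3 first needs an Artist value.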

Slide 10: Queries
A query Q includes:
- input attributes I;
- output attributes O.
Example over v1(Song, CD), v2(CD, Artist, Price), v3(CD, Artist, Price): input attributes {Song}, output attributes {Price}.

Slide 11: Connections
- A connection is a set of views that connects I and O in Q.
- Meaning: the natural join of the views.
- The assumptions resemble those of the universal-relation model, but connections can be generated in various ways.
Example: T1 = {v1, v2}, T2 = {v1, v3}.

Slide 12: Question 1: Computing the maximal answer
- Translate a query and the source views into a Datalog program.
- The idea is borrowed from Duschka and Levy [IJCAI-97]; in addition, we eliminate useless source accesses.
- Why Datalog programs? Because they support recursion.

Slide 13: Constructing the program for (Q, V)
Connection rules:
  ans(P) :- V1(s1, C) & V2(C, A, P)
  ans(P) :- V1(s1, C) & V3(C, A, P)
Fact rule:
  song(s1) :-
Rules for v1(Song, CD):
  V1(S, C) :- song(S) & v1(S, C)
  Domain rule: cd(C) :- song(S) & v1(S, C)
Rules for v2(CD, Artist, Price):
  V2(C, A, P) :- cd(C) & v2(C, A, P)
  artist(A) :- cd(C) & v2(C, A, P)
  price(P) :- cd(C) & v2(C, A, P)
Rules for v3(CD, Artist, Price):
  V3(C, A, P) :- artist(A) & v3(C, A, P)
  cd(C) :- artist(A) & v3(C, A, P)
  price(P) :- artist(A) & v3(C, A, P)
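The rules follow a regular pattern, so they can be generated mechanically. The sketch below is one way to do it in Python, not the authors' code; `VAR`, `dom`, `view_rules`, and `build_program` are hypothetical names, and the attribute-to-variable mapping simply mirrors the slide's S, C, A, P. It emits the eleven rules above as text, with the query's Song value written as the constant s1.

```python
from typing import Dict, List, Sequence, Tuple

View = Tuple[Tuple[str, ...], str]  # (attributes, 'b'/'f' pattern)

VIEWS: Dict[str, View] = {
    "v1": (("Song", "CD"), "bf"),
    "v2": (("CD", "Artist", "Price"), "bff"),
    "v3": (("CD", "Artist", "Price"), "fbf"),
}
# Datalog variable for each attribute (assumed mapping, matching the slide).
VAR = {"Song": "S", "CD": "C", "Artist": "A", "Price": "P"}


def dom(attr: str, term: str) -> str:
    """Domain-predicate atom, e.g. dom('CD', 'C') -> 'cd(C)'."""
    return f"{attr.lower()}({term})"


def view_rules(name: str, view: View) -> List[str]:
    """The rule defining V_i plus one domain rule per free attribute of v_i."""
    attrs, pattern = view
    args = ", ".join(VAR[a] for a in attrs)
    body = [dom(a, VAR[a]) for a, p in zip(attrs, pattern) if p == "b"]
    body.append(f"{name}({args})")
    body_text = " & ".join(body)
    rules = [f"{name.upper()}({args}) :- {body_text}"]
    rules += [f"{dom(a, VAR[a])} :- {body_text}"
              for a, p in zip(attrs, pattern) if p == "f"]
    return rules


def build_program(connections: Sequence[Sequence[str]],
                  inputs: Dict[str, str], outputs: Sequence[str]) -> List[str]:
    head = "ans(" + ", ".join(VAR[o] for o in outputs) + ")"
    program: List[str] = []
    # Connection rules: one per connection; input attributes become constants.
    for conn in connections:
        atoms = []
        for name in conn:
            attrs, _ = VIEWS[name]
            args = ", ".join(inputs.get(a, VAR[a]) for a in attrs)
            atoms.append(f"{name.upper()}({args})")
        program.append(f"{head} :- " + " & ".join(atoms))
    # Fact rules: the query's input values seed the domain predicates.
    program += [f"{dom(a, c)} :-" for a, c in inputs.items()]
    # V_i rules and domain rules for every source view.
    for name, view in VIEWS.items():
        program += view_rules(name, view)
    return program


if __name__ == "__main__":
    for rule in build_program([["v1", "v2"], ["v1", "v3"]],
                              {"Song": "s1"}, ["Price"]):
        print(rule)
```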

Slide 14: Binding assumptions
- A binding for an attribute comes from the attribute's domain.
- The "strategy" of trying all possible strings to "test" a source is not allowed (it may not terminate).
- Every binding is obtained either from the query or from a tuple returned by a source query.
Under these assumptions, the program for (Q, V) computes the maximal answer.

Slide 15: Question 2: When should off-query sources be accessed?
Views over attributes A, B, C, D, E, F:
- v1(A, C), binding pattern bf
- v2(A, B, C), binding pattern ffb
- v3(C, D), binding pattern bf
- v4(C, E), binding pattern ff
- v5(E, F), binding pattern bf
Query: input A = a1; output D = ?
Connections: T1 = {v1, v3}, T2 = {v2, v3}
Not all the views need to be accessed.

Slide 16: Question 2 (continued)
- T1: accessing sources outside T1 is NOT necessary. [Figure: A, via v1(A, C), gives C; C, via v3(C, D), gives D]
- T2: accessing sources outside T2 is necessary to get bindings for C. [Figure: v2(A, B, C) and v3(C, D) over attributes A, B, C, D]

Slide 17: Independent connections
A connection T is independent if all the views in T can be queried starting from the input attributes as the initial bindings and using only the views in T.
- T1 = {v1, v3} is independent.
- T2 = {v2, v3} is not independent: it needs C bindings.
Theorem: off-connection source accesses are only necessary for nonindependent connections.
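Independence can be tested at the schema level by repeatedly querying any view of T whose bound attributes already have bindings, and checking whether every view of T is eventually reached. A minimal Python sketch, assuming the example views from the Question 2 slide and that every source query returns some values (`f_closure` and `is_independent` are hypothetical helper names):

```python
from typing import Dict, Iterable, Set, Tuple

View = Tuple[Tuple[str, ...], str]  # (attributes, 'b'/'f' pattern)

# The example views over attributes A..F from the Question 2 slide.
VIEWS: Dict[str, View] = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "B", "C"), "ffb"),
    "v3": (("C", "D"), "bf"),
    "v4": (("C", "E"), "ff"),
    "v5": (("E", "F"), "bf"),
}


def f_closure(start: Iterable[str], names: Iterable[str]) -> Set[str]:
    """Views among `names` that can eventually be queried from bindings `start`."""
    bound, pool, reached = set(start), set(names), set()
    progress = True
    while progress:
        progress = False
        for name in sorted(pool - reached):
            attrs, pattern = VIEWS[name]
            if all(a in bound for a, p in zip(attrs, pattern) if p == "b"):
                reached.add(name)
                bound.update(attrs)  # a queried view yields bindings for its attributes
                progress = True
    return reached


def is_independent(connection: Set[str], inputs: Set[str]) -> bool:
    """True if the connection's own views suffice, starting from the inputs."""
    return f_closure(inputs, connection) == connection


if __name__ == "__main__":
    print(is_independent({"v1", "v3"}, {"A"}))  # True
    print(is_independent({"v2", "v3"}, {"A"}))  # False
```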

Slide 18: Question 3: Which sources should be accessed?
A view v is relevant to a connection T if we may miss some answers to T when v is not used.
Example (views v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), v5(E, F)): the relevant views of T2 are v1, v2, v3, and v4.
How can we find all the relevant views of a nonindependent connection?

Slide 19: Kernel
A kernel of a connection is a minimal set of attributes that need to be initially bound, in addition to the input attributes, to query the full connection. A connection may have multiple kernels.
- T1 = {v1, v3} has one kernel: {}.
- T2 = {v2, v3} has one kernel: {C}.
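A brute-force way to list all kernels is to try extra attribute sets in order of increasing size, keeping each set that makes the whole connection queryable and contains no smaller kernel. The Python sketch below does exactly that. It is exponential in the number of attributes and meant only for small examples like this one; it works at the schema level (it only tracks which attributes have bindings), and the helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

View = Tuple[Tuple[str, ...], str]  # (attributes, 'b'/'f' pattern)

VIEWS: Dict[str, View] = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "B", "C"), "ffb"),
    "v3": (("C", "D"), "bf"),
    "v4": (("C", "E"), "ff"),
    "v5": (("E", "F"), "bf"),
}


def f_closure(start: Iterable[str], names: Iterable[str]) -> Set[str]:
    """Views among `names` that can eventually be queried from bindings `start`."""
    bound, pool, reached = set(start), set(names), set()
    progress = True
    while progress:
        progress = False
        for name in sorted(pool - reached):
            attrs, pattern = VIEWS[name]
            if all(a in bound for a, p in zip(attrs, pattern) if p == "b"):
                reached.add(name)
                bound.update(attrs)
                progress = True
    return reached


def kernels(connection: Set[str], inputs: Set[str]) -> List[Set[str]]:
    """All minimal extra attribute sets that let the whole connection be queried."""
    candidates = sorted({a for n in connection for a in VIEWS[n][0]} - inputs)
    found: List[Set[str]] = []
    for size in range(len(candidates) + 1):  # smallest sets first
        for combo in combinations(candidates, size):
            extra = set(combo)
            if any(k <= extra for k in found):
                continue  # contains a smaller kernel, so it is not minimal
            if f_closure(inputs | extra, connection) == connection:
                found.append(extra)
    return found


if __name__ == "__main__":
    print(kernels({"v1", "v3"}, {"A"}))  # [set()]
    print(kernels({"v2", "v3"}, {"A"}))  # [{'C'}]
```

Note that a connection is independent exactly when {} is its (only) kernel, which ties this notion back to the previous slide.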

Slide 20: Algorithm FIND_REL: finding the relevant views of a connection
Find all the relevant views of connection T2 = {v2, v3} (views v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), v5(E, F)):
1. Compute the queryable views: {v1, v2, v3, v4, v5}.
2. Find a kernel K of T2: K = {C}.
3. Compute all the views that can help produce bindings for the attributes in K: R = {v1, v2, v4}.
4. Return R_T2 = R ∪ T2 = {v1, v2, v3, v4}.
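The four steps can be sketched in Python as follows. This is an illustration rather than the paper's implementation: step 3 assumes one reading of "views that can help produce bindings", namely that a view helps if it returns a needed attribute, after which its own bound attributes become needed as well; step 2 uses any kernel (here a smallest one found by brute force). All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

View = Tuple[Tuple[str, ...], str]  # (attributes, 'b'/'f' pattern)

VIEWS: Dict[str, View] = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "B", "C"), "ffb"),
    "v3": (("C", "D"), "bf"),
    "v4": (("C", "E"), "ff"),
    "v5": (("E", "F"), "bf"),
}


def attrs_marked(name: str, flag: str) -> Set[str]:
    """Attributes of a view whose adornment is `flag` ('b' or 'f')."""
    attrs, pattern = VIEWS[name]
    return {a for a, p in zip(attrs, pattern) if p == flag}


def f_closure(start: Iterable[str], names: Iterable[str]) -> Set[str]:
    """Views among `names` that become queryable starting from bindings `start`."""
    bound, pool, reached = set(start), set(names), set()
    progress = True
    while progress:
        progress = False
        for name in sorted(pool - reached):
            if attrs_marked(name, "b") <= bound:
                reached.add(name)
                bound |= set(VIEWS[name][0])
                progress = True
    return reached


def b_closure(targets: Iterable[str], names: Iterable[str]) -> Set[str]:
    """Views that can help produce bindings for `targets` (assumed reading)."""
    needed, pool, result = set(targets), set(names), set()
    progress = True
    while progress:
        progress = False
        for name in sorted(pool - result):
            if attrs_marked(name, "f") & needed:  # it returns a needed attribute
                result.add(name)
                needed |= attrs_marked(name, "b")  # its inputs are now needed too
                progress = True
    return result


def some_kernel(conn: Set[str], inputs: Set[str]) -> Set[str]:
    """A smallest extra attribute set that makes the whole connection queryable."""
    candidates = sorted({a for n in conn for a in VIEWS[n][0]} - inputs)
    for size in range(len(candidates) + 1):
        for combo in combinations(candidates, size):
            if f_closure(inputs | set(combo), conn) == conn:
                return set(combo)
    return set(candidates)  # not reached: binding every attribute always suffices


def find_rel(conn: Set[str], inputs: Set[str]) -> Set[str]:
    queryable = f_closure(inputs, VIEWS)    # (1) queryable views
    kernel = some_kernel(conn, inputs)      # (2) a kernel K of the connection
    helpers = b_closure(kernel, queryable)  # (3) views producing bindings for K
    return helpers | conn                   # (4) R plus the connection itself


if __name__ == "__main__":
    print(sorted(find_rel({"v2", "v3"}, {"A"})))  # ['v1', 'v2', 'v3', 'v4']
```

For the independent connection T1 the kernel is {}, so nothing outside T1 is added.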

Slide 21: Constructing an efficient program
1. Compute the relevant views for each connection.
2. Take the union of all these relevant source views.
3. Use these views to construct a new program.
4. Remove useless rules.
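The slide does not spell out which rules count as useless. The Python sketch below assumes the standard Datalog clean-up: drop every rule whose head predicate the answer predicate ans cannot reach through rule bodies. The relevant-view sets are hard-coded (R_T1 = T1 because T1 is independent; R_T2 is the FIND_REL result), the domain predicates use the domA naming that appears on the evaluation slide, and `build_rules` and `remove_useless` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

View = Tuple[Tuple[str, ...], str]  # (attributes, 'b'/'f' pattern)
Rule = Tuple[str, Set[str], str]    # (head predicate, IDB predicates in body, text)

VIEWS: Dict[str, View] = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "B", "C"), "ffb"),
    "v3": (("C", "D"), "bf"),
    "v4": (("C", "E"), "ff"),
    "v5": (("E", "F"), "bf"),
}
CONNECTIONS = [["v1", "v3"], ["v2", "v3"]]  # T1, T2
INPUTS = {"A": "a1"}                        # input A = a1
OUTPUT = "D"
# Relevant views per connection (hard-coded, see the lead-in above).
RELEVANT = [{"v1", "v3"}, {"v1", "v2", "v3", "v4"}]


def build_rules(views_used: Set[str]) -> List[Rule]:
    rules: List[Rule] = []
    for conn in CONNECTIONS:  # connection rules
        atoms = [f"{n.upper()}(" + ", ".join(INPUTS.get(a, a) for a in VIEWS[n][0]) + ")"
                 for n in conn]
        rules.append(("ans", {n.upper() for n in conn},
                      f"ans({OUTPUT}) :- " + " & ".join(atoms)))
    for a, c in INPUTS.items():  # fact rules
        rules.append((f"dom{a}", set(), f"dom{a}({c}) :-"))
    for n in sorted(views_used):  # Vi rules and domain rules
        attrs, pattern = VIEWS[n]
        bound = [a for a, p in zip(attrs, pattern) if p == "b"]
        deps = {f"dom{a}" for a in bound}
        body = " & ".join([f"dom{a}({a})" for a in bound] + [f"{n}({', '.join(attrs)})"])
        rules.append((n.upper(), deps, f"{n.upper()}({', '.join(attrs)}) :- {body}"))
        rules += [(f"dom{a}", deps, f"dom{a}({a}) :- {body}")
                  for a, p in zip(attrs, pattern) if p == "f"]
    return rules


def remove_useless(rules: List[Rule]) -> List[str]:
    """Keep the rules whose head predicate ans can reach through rule bodies."""
    reachable, frontier = {"ans"}, ["ans"]
    while frontier:
        pred = frontier.pop()
        for head, deps, _ in rules:
            if head == pred:
                for d in deps - reachable:
                    reachable.add(d)
                    frontier.append(d)
    return [text for head, _, text in rules if head in reachable]


if __name__ == "__main__":
    used = set().union(*RELEVANT)  # v5 is in no relevant set, so it is never accessed
    for rule in remove_useless(build_rules(used)):
        print(rule)
```

Under this criterion, 9 of the 13 generated rules survive: the rule defining V4 and the domain rules for B, D, and E are dropped, while v4 is still accessed through the kept rule domC(C) :- v4(C, E), which supplies C bindings for T2.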

Slide 22: Conclusions
- A query-planning framework to compute the maximal answer to a query (building on Duschka and Levy [IJCAI-97]).
- Techniques for deciding when to access off-query views.
- Algorithms for:
  - finding all the relevant sources for a query;
  - constructing an efficient program.

Slide 23: Other related work
- Rajaraman, Sagiv, and Ullman [PODS-95]:
  - show how to find an equivalent query rewriting using views with binding restrictions;
  - we give the maximal rewriting of a query.
- Optimizing conjunctive queries with binding restrictions:
  - Yerneni, Li, Garcia-Molina, and Ullman [ICDT-99];
  - Florescu et al. [SIGMOD-99].
- Testing connection containment:
  - Li [Stanford-CS-TR 2000], which uses results on monadic programs to prove that the problem is decidable.

Slide 24: Predicates
EDB predicates: v1(S, C), v2(C, A, P), v3(C, A, P)
IDB predicates:
- V1(S, C), V2(C, A, P), V3(C, A, P) (one per source view)
- domain predicates: cd(C), song(S), artist(A), price(P)
- ans(P)

Slide 25: Evaluating the program for (Q, V)
- Assume the right side of a rule defining some Vi, or of a domain rule, is
  domA1(A1), ..., domAp(Ap), vi(A1, ..., Am),
  where A1, ..., Ap are the attributes that vi requires to be bound.
- Once we have bindings for domA1(A1), ..., domAp(Ap), evaluate the rule by querying vi with them, and populate the domain predicates and the Vi predicate.
- Repeat until no more facts can be derived.
- The result is the maximal answer to the query.
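This loop is a naive fixpoint computation. The sketch below simulates it directly in Python instead of running a Datalog engine, for the Friends query with connections {v1, v2} and {v1, v3}. Assumptions: sources are simulated by `query_source`, which only answers once values are supplied for the 'b' attributes; the tuples in `SOURCE_DATA` are hypothetical, invented for illustration (the slides' actual tuples are not recoverable); all function names are illustrative.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

View = Tuple[Tuple[str, ...], str]  # (attributes, 'b'/'f' pattern)
Row = Tuple[object, ...]

VIEWS: Dict[str, View] = {
    "v1": (("Song", "CD"), "bf"),
    "v2": (("CD", "Artist", "Price"), "bff"),
    "v3": (("CD", "Artist", "Price"), "fbf"),
}
# HYPOTHETICAL source contents, invented for illustration only.
SOURCE_DATA: Dict[str, List[Row]] = {
    "v1": [("Friends", "cd1"), ("Friends", "cd2"), ("Jump", "cd3")],
    "v2": [("cd1", "Joe", 15)],
    "v3": [("cd2", "Joe", 10), ("cd3", "Ann", 12)],
}


def query_source(name: str, binding: Dict[str, object]) -> List[Row]:
    """Simulated source: returns only the tuples matching the given bindings."""
    attrs, _ = VIEWS[name]
    return [row for row in SOURCE_DATA[name]
            if all(row[attrs.index(a)] == v for a, v in binding.items())]


def evaluate(inputs: Dict[str, object]) -> Dict[str, Set[Row]]:
    """Fixpoint: fill the domain predicates and the V_i relations."""
    domains: Dict[str, Set[object]] = {a: {v} for a, v in inputs.items()}
    retrieved: Dict[str, Set[Row]] = {name: set() for name in VIEWS}
    asked: Set[Tuple[str, Tuple[object, ...]]] = set()  # source queries already sent
    changed = True
    while changed:  # repeat until no new bindings can be derived
        changed = False
        for name, (attrs, pattern) in VIEWS.items():
            bound = [a for a, p in zip(attrs, pattern) if p == "b"]
            choices = [sorted(domains.get(a, set()), key=repr) for a in bound]
            for values in product(*choices):
                if (name, values) in asked:
                    continue
                asked.add((name, values))
                for row in query_source(name, dict(zip(bound, values))):
                    retrieved[name].add(row)  # populate V_i
                    for a, p, v in zip(attrs, pattern, row):
                        if p == "f" and v not in domains.setdefault(a, set()):
                            domains[a].add(v)  # a domain rule derives a new binding
                            changed = True
    return retrieved


def answer(retrieved: Dict[str, Set[Row]], connections: Sequence[Sequence[str]],
           inputs: Dict[str, object], output: str) -> Set[object]:
    """Union over connections of the natural join of the retrieved V_i tuples."""
    results: Set[object] = set()
    for conn in connections:
        partial = [dict(inputs)]
        for name in conn:
            attrs, _ = VIEWS[name]
            partial = [{**asg, **dict(zip(attrs, row))}
                       for asg in partial for row in retrieved[name]
                       if all(asg.get(a, v) == v for a, v in zip(attrs, row))]
        results |= {asg[output] for asg in partial}
    return results


if __name__ == "__main__":
    q = {"Song": "Friends"}
    print(sorted(answer(evaluate(q), [["v1", "v2"], ["v1", "v3"]], q, "Price")))
    # [10, 15] with the hypothetical data above
```

With these made-up tuples, the Artist value Joe obtained through v2 is what makes v3 queryable, so v1 ⋈ v3 contributes a price that the one-join-at-a-time approach misses; ("Jump", "cd3") and ("cd3", "Ann", 12) are never retrieved, matching the observation that some tuples stay out of reach.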

Slide 26: Forward-closure
Given views W ⊆ V and attributes X, the forward-closure of X given W, denoted f-closure(X, W), is the set of views in W that can eventually be queried using the views in W, starting from the initial bindings X.
Examples (views v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), v5(E, F)):
- f-closure({A}, {v1, v2, v3}) = {v1, v2, v3}
- f-closure({D}, {v1, v2, v3}) = {}
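The definition translates directly into a round-based loop: in each round, add every view of W whose bound attributes already have bindings, then treat all attributes of the newly added views as bound. A Python sketch with a hypothetical function name; it works at the schema level, i.e., it assumes every source query returns some values. It reproduces both examples:

```python
from typing import Dict, Iterable, Set, Tuple

View = Tuple[Tuple[str, ...], str]  # (attributes, 'b'/'f' pattern)

VIEWS: Dict[str, View] = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "B", "C"), "ffb"),
    "v3": (("C", "D"), "bf"),
    "v4": (("C", "E"), "ff"),
    "v5": (("E", "F"), "bf"),
}


def f_closure(x: Iterable[str], w: Iterable[str]) -> Set[str]:
    """f-closure(X, W): views in W queryable from bindings X using only W."""
    bound: Set[str] = set(x)
    closure: Set[str] = set()
    while True:
        # Views not yet reached whose 'b' attributes all have bindings now.
        newly = {name for name in set(w) - closure
                 if all(a in bound for a, p in zip(*VIEWS[name]) if p == "b")}
        if not newly:
            return closure
        closure |= newly
        for name in newly:
            bound.update(VIEWS[name][0])  # their attributes become bound


if __name__ == "__main__":
    print(sorted(f_closure({"A"}, {"v1", "v2", "v3"})))  # ['v1', 'v2', 'v3']
    print(sorted(f_closure({"D"}, {"v1", "v2", "v3"})))  # []
```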

Slide 27: Backward-closure
The backward-closure of a set of attributes X, denoted b-closure(X), is the set of views that can help retrieve bindings for X.
Example (views v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), v5(E, F)): b-closure(C) = {v1, v2, v4}.
Lemma: All backward-closures of a connection are the same.
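The slides define the backward-closure only informally. Below is a Python sketch under one assumed reading: a view helps if one of its free attributes is needed, and then its bound attributes are needed as well, so the search follows free-to-bound links backward from X. This reading reproduces b-closure(C) = {v1, v2, v4}; v2 qualifies because it can return new A values, which v1 turns into new C values. The function name is hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

View = Tuple[Tuple[str, ...], str]  # (attributes, 'b'/'f' pattern)

VIEWS: Dict[str, View] = {
    "v1": (("A", "C"), "bf"),
    "v2": (("A", "B", "C"), "ffb"),
    "v3": (("C", "D"), "bf"),
    "v4": (("C", "E"), "ff"),
    "v5": (("E", "F"), "bf"),
}


def b_closure(x: Iterable[str], w: Optional[Iterable[str]] = None) -> Set[str]:
    """Views that can help produce bindings for attributes in x (one reading)."""
    names = set(VIEWS) if w is None else set(w)
    needed: Set[str] = set(x)  # attributes we want (more) bindings for
    closure: Set[str] = set()
    while True:
        # A view helps if it returns (has free) an attribute we need ...
        newly = {n for n in names - closure
                 if any(p == "f" and a in needed for a, p in zip(*VIEWS[n]))}
        if not newly:
            return closure
        closure |= newly
        # ... and then its bound attributes need bindings too.
        for n in newly:
            needed |= {a for a, p in zip(*VIEWS[n]) if p == "b"}


if __name__ == "__main__":
    print(sorted(b_closure({"C"})))  # ['v1', 'v2', 'v4']
```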

Slide 28: BF-chain and backward-closure
[Figure: a BF-chain over the views v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), v5(E, F), with attributes along the chain labeled free, bound, free]
b-closure(C) = {v1, v2, v4}

Slide 29: Other possibilities of obtaining bindings
- Cached data: for a cached tuple ti(a1, a2) of view vi(A1, A2), add the following rules to the program for (Q, V):
  vi(a1, a2) :-
  domA1(a1) :-
  domA2(a2) :-
- Domain knowledge: for example, for student(name, dept, GPA), the possible values of dept are CS, Physics, Chemistry, etc.
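A small Python sketch that produces these fact rules for a list of cached tuples, emitted as plain text in the slide's notation (the function name `cached_fact_rules` is hypothetical):

```python
from typing import List, Sequence, Tuple


def cached_fact_rules(view: str, attrs: Sequence[str],
                      cached: Sequence[Tuple[str, ...]]) -> List[str]:
    """Fact rules to add to the program for each cached tuple of a view."""
    rules: List[str] = []
    for row in cached:
        rules.append(f"{view}({', '.join(row)}) :-")  # the cached tuple itself
        # Each value of the tuple also becomes a binding in its attribute's domain.
        rules += [f"dom{a}({v}) :-" for a, v in zip(attrs, row)]
    return rules


if __name__ == "__main__":
    for rule in cached_fact_rules("vi", ["A1", "A2"], [("a1", "a2")]):
        print(rule)
```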

Slide 30: Computing a partial answer
- Independent connections: complete answers are computable.
- Nonindependent connections: access some relevant views.
- We may terminate the evaluation of the program after some results are computed.