2005lav-ii1 Local as View: Some refinements
- IM: Filtering irrelevant sources
- Views with restricted access patterns
- A summary of IM
2005lav-ii2 IM: Filtering irrelevant sources
When there are many sources, it is important to weed out those that are irrelevant to a query.
Comparison constraints can help (e.g., a constraint such as y >= 1990 in a source description, as in v1 below).
What more can be done? The IM system introduces classes, organized in a class hierarchy, into source descriptions.
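2005lav-ii3 Example: disjoint classes
Additionally, the global schema contains a relation details(car, year, mileage, price, sellerContact), with attributes abbreviated as c, y, mi, p, s. Class names are abbreviated as well (e.g., cFSale for carForSale, uCar for usedCar, nCar for newCar, EurCar for EuropeanCar, JCar for JapaneseCar).
[Figure: class hierarchy over car, carForSale, usedCar, newCar, AmericanCar, EuropeanCar, JapaneseCar, GermanCar, ItalianCar, FrenchCar, containing disjoint classes such as EuropeanCar and JapaneseCar]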
2005lav-ii4 The views:
v1(c, y, mi, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y >= 1990
v2(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), EurCar(c)
v3(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), p >= $25000 // luxury cars
v4(c, y, p) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y <= 1980 // vintage cars
v5(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), nCar(c), c = Toyota
Assume a query:
Q: q(c, y, mi, p, s) :- details(c, y, mi, p, s), cFSale(c), JCar(c), y >= 1992, p <= $12000
Some candidate rewritings will be rejected, since they are inconsistent with Q.
2005lav-ii5 When a view is considered for consistency with Q:
- v4 is discarded: y <= 1980 and y >= 1992 are inconsistent
- v3 is discarded: p >= $25000 and p <= $12000 are inconsistent
- v2 is discarded: EurCar(c) and JCar(c) are inconsistent, since the two classes are disjoint
- v5 depends on what is known about the relationship between Toyota and the various car classes
- v1 is not ruled out by these checks: y >= 1990 is compatible with y >= 1992
Reasoning about disjointness of classes (given a hierarchy as above) is easy and efficient.
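The consistency checks above can be sketched in Python. This is a minimal sketch, not IM's actual algorithm: it assumes the view variables are already aligned with Q's (c, y, p), models comparisons only as >= / <= bounds on single variables, and uses a hypothetical encoding of the hierarchy (PARENT, DISJOINT_GROUPS) in which uCar/nCar and AmerCar/EurCar/JCar are pairwise disjoint. v5 is left out because its outcome depends on how Toyota relates to the classes.

```python
import math

# Hypothetical encoding of part of the class hierarchy (child -> parent).
PARENT = {"GermanCar": "EurCar", "ItalianCar": "EurCar", "FrenchCar": "EurCar"}

# Hypothetical groups of pairwise-disjoint sibling classes.
DISJOINT_GROUPS = [
    {"uCar", "nCar"},
    {"AmerCar", "EurCar", "JCar"},
    {"GermanCar", "ItalianCar", "FrenchCar"},
]


def ancestors(cls):
    """Return cls together with all of its superclasses in PARENT."""
    result = {cls}
    while cls in PARENT:
        cls = PARENT[cls]
        result.add(cls)
    return result


def disjoint(a, b):
    """a and b are disjoint if a superclass of each lies in the same disjoint group."""
    return any(
        x != y and any(x in g and y in g for g in DISJOINT_GROUPS)
        for x in ancestors(a)
        for y in ancestors(b)
    )


def bounds_consistent(comparisons):
    """comparisons: (var, op, const) triples, op in {'>=', '<='}; True if all can hold at once."""
    bounds = {}
    for var, op, const in comparisons:
        lo, hi = bounds.get(var, (-math.inf, math.inf))
        if op == ">=":
            lo = max(lo, const)
        else:
            hi = min(hi, const)
        bounds[var] = (lo, hi)
    return all(lo <= hi for lo, hi in bounds.values())


def consistent(view, query):
    """Keep a view only if its classes and comparisons are jointly satisfiable with the query's."""
    classes = view["classes"] | query["classes"]
    if any(disjoint(a, b) for a in classes for b in classes):
        return False
    return bounds_consistent(view["cmp"] + query["cmp"])


Q = {"classes": {"cFSale", "JCar"}, "cmp": [("y", ">=", 1992), ("p", "<=", 12000)]}
CANDIDATES = {
    "v1": {"classes": {"cFSale", "uCar"}, "cmp": [("y", ">=", 1990)]},
    "v2": {"classes": {"cFSale", "EurCar"}, "cmp": []},
    "v3": {"classes": {"cFSale", "uCar"}, "cmp": [("p", ">=", 25000)]},
    "v4": {"classes": {"cFSale", "uCar"}, "cmp": [("y", "<=", 1980)]},
}

print([name for name, v in CANDIDATES.items() if consistent(v, Q)])  # prints ['v1']
```

Both checks are cheap: disjointness walks up a small hierarchy, and comparisons reduce to intersecting intervals per variable.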
2005lav-ii6 The true story (a side trip): IM uses a (PTIME) Description Logic (DL) for source descriptions.
A DL is a formalism that describes classes and binary relationships intensionally. For example, a class can be given by a name (e.g., JCar) or by an expression that describes its properties:
cheapJCar :- uCar and JCar and price < $9000
A DL also contains containment and disjointness axioms for class expressions (containment is called subsumption in DL jargon).
To be useful, a DL needs to support containment and disjointness queries on classes and membership queries on individuals; this is an inference problem.
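To make "subsumption is an inference problem" concrete, here is a toy structural subsumption check for class expressions shaped like cheapJCar: a conjunction of atomic classes plus an upper bound on price. It is an illustration under stated assumptions (a hypothetical PARENT hierarchy and a hypothetical Concept type), not a real DL reasoner and not the DL used by IM; it is sound but incomplete (for instance, it does not detect unsatisfiable expressions).

```python
import math
from dataclasses import dataclass

# Hypothetical fragment of the class hierarchy (child -> parent).
PARENT = {"GermanCar": "EurCar", "ItalianCar": "EurCar", "FrenchCar": "EurCar"}


def closure(classes):
    """All atomic classes implied by the given ones, following PARENT upward."""
    result = set()
    for cls in classes:
        result.add(cls)
        while cls in PARENT:
            cls = PARENT[cls]
            result.add(cls)
    return result


@dataclass(frozen=True)
class Concept:
    """A conjunction of atomic classes plus an optional upper bound on price."""
    classes: frozenset
    price_max: float = math.inf  # no bound by default
    strict: bool = False  # True: price < price_max; False: price <= price_max


def bound_implies(c, d):
    """Does c's price bound imply d's price bound?"""
    if c.price_max < d.price_max:
        return True
    return c.price_max == d.price_max and (c.strict or not d.strict)


def subsumed_by(c, d):
    """Structural test for c contained in d: sound, but incomplete (ignores unsatisfiable c)."""
    return d.classes <= closure(c.classes) and bound_implies(c, d)


cheapJCar = Concept(frozenset({"uCar", "JCar"}), 9000, strict=True)
jCarUnder12k = Concept(frozenset({"JCar"}), 12000)  # hypothetical: JCar and price <= $12000

print(subsumed_by(cheapJCar, jCarUnder12k))  # True
print(subsumed_by(jCarUnder12k, cheapJCar))  # False
```

Real DLs allow far richer expressions (roles, negation, quantifiers), which is where the complexity range mentioned on the next slide comes from.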
2005lav-ii7 Many DLs are known.
Complexity (for subsumption) ranges from polynomial (rare), to NP-complete, to EXPTIME-complete, to undecidable.
Recent interest focuses on using DLs for the Semantic Web: the W3C OWL standard is essentially a DL (this use is essentially the same as in IM).
That is all on DLs.
2005lav-ii8 Views with restricted access patterns
Many sources do not support full SQL:
- They are legacy systems, e.g.:
  - finger on UNIX accepts email and returns the other attributes
  - a bibliography source requires author, or title, or … as input, but does not accept a year as input
- They do not want to disclose all their data, e.g.:
  - a carSale source will not present all the cars it has for sale
  - an airline requires origin and destination as input for flight info
The questions: How do we describe such sources? What are good rewritings, and how do we find them?
2005lav-ii9 Restricted sources can be described by binding patterns.
Two equivalent styles (there are more sophisticated schemes).
Example: assume global relations email(F, L, E), office(F, L, O), phone(O, P) (F: first name, L: last name, E: email, O: office, P: phone).
The views are finger and userId, described as follows:
1. Adding $ to attributes that must be supplied as input:
finger(F, L, $E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
userId($O, E) :- office(F, L, O), email(F, L, E)
2. Using b/f strings on predicates, where b means bound (i.e., input) and f means free (i.e., output):
finger^ffbff(F, L, E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
userId^bf(O, E) :- office(F, L, O), email(F, L, E)
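The two notations carry the same information, which a short sketch can show by converting a $-annotated view head into its b/f string. The function adornment is hypothetical and only handles heads whose arguments are plain variables.

```python
import re


def adornment(head):
    """Turn a $-annotated head such as 'finger(F, L, $E, O, P)' into (name, b/f string, variables)."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", head)
    if match is None:
        raise ValueError(f"not a view head: {head!r}")
    name = match.group(1)
    args = [a.strip() for a in match.group(2).split(",")]
    pattern = "".join("b" if a.startswith("$") else "f" for a in args)  # b = bound (input)
    variables = [a.lstrip("$") for a in args]
    return name, pattern, variables


for head in ["finger(F, L, $E, O, P)", "userId($O, E)"]:
    name, pattern, variables = adornment(head)
    print(f"{name}^{pattern}({', '.join(variables)})")
# prints finger^ffbff(F, L, E, O, P) and then userId^bf(O, E)
```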
2005lav-ii10 Example, cont’d:
Q: q^bf(O, F) :- office(F, L, O) (or q($O, F) :- office(F, L, O))
Q cannot be answered by using finger alone: it requires E as input.
Q cannot be answered by using userId alone: it does not return F.
The following is a good rewriting:
q’(O, F) :- userId(O, E), finger(F, L, E, O, P)
For two reasons:
- It is executable with respect to the sources: executing the body left-to-right respects the access restrictions (O for userId comes from the query, E for finger comes from userId)
- Its expansion is contained in the query (check!)
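The "(check!)" can be carried out mechanically: expand each view atom of q’ by its definition, renaming the view's existential variables apart, and search for a containment mapping from Q into the expansion that sends Q's head onto the head of q’. The sketch assumes all terms are variables (no constants or comparisons) and uses email as the name of the (F, L, E) relation; expand, find_homomorphism and contained_in_query are hypothetical helpers.

```python
# View definitions: name -> (head variables, body atoms). The relation name "email" is assumed.
VIEW_DEFS = {
    "finger": (["F", "L", "E", "O", "P"],
               [("email", ["F", "L", "E"]), ("office", ["F", "L", "O"]), ("phone", ["O", "P"])]),
    "userId": (["O", "E"],
               [("office", ["F", "L", "O"]), ("email", ["F", "L", "E"])]),
}


def expand(rewriting_body):
    """Replace each view atom by its definition; existential view variables get fresh names."""
    atoms = []
    for i, (view, args) in enumerate(rewriting_body):
        head, body = VIEW_DEFS[view]
        rename = dict(zip(head, args))
        for pred, terms in body:
            atoms.append((pred, [rename.get(t, f"{t}_{i}") for t in terms]))
    return atoms


def find_homomorphism(query_atoms, target_atoms, mapping):
    """Backtracking search extending mapping so every query atom maps onto some target atom."""
    if not query_atoms:
        return mapping
    (pred, args), rest = query_atoms[0], query_atoms[1:]
    for tpred, targs in target_atoms:
        if tpred != pred or len(targs) != len(args):
            continue
        extended = dict(mapping)
        if all(extended.setdefault(a, t) == t for a, t in zip(args, targs)):
            result = find_homomorphism(rest, target_atoms, extended)
            if result is not None:
                return result
    return None


def contained_in_query(rw_head, rw_body, q_head, q_body):
    """True if the expansion of the rewriting is contained in the query (a containment mapping exists)."""
    mapping = {}
    for qv, rv in zip(q_head, rw_head):
        if mapping.setdefault(qv, rv) != rv:
            return False
    return find_homomorphism(q_body, expand(rw_body), mapping) is not None


q_head, q_body = ["O", "F"], [("office", ["F", "L", "O"])]
rw_head = ["O", "F"]
rw_body = [("userId", ["O", "E"]), ("finger", ["F", "L", "E", "O", "P"])]
print(contained_in_query(rw_head, rw_body, q_head, q_body))  # True
```

Here the expansion of q’ is office(F_0, L_0, O), email(F_0, L_0, E), email(F, L, E), office(F, L, O), phone(O, P), and Q's single atom office(F, L, O) maps onto the fourth atom.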
2005lav-ii11 These two reasons characterize a good rewriting:
- It is executable with respect to the sources: executing the body left-to-right respects the access restrictions
- Its expansion is contained in the query
Indeed:
- If it is not a contained rewriting, then being executable is no good (it may return tuples that are not answers to the query)
- Being contained but not executable is also no good (it cannot be evaluated against the sources)
2005lav-ii12 The IM approach: after a rewriting is found to be consistent and contained, it is checked for being executable: can the subgoals in the body be ordered so that the input required by each is supplied by the query or by the subgoals to its left?
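The executability test can be sketched as a greedy ordering: repeatedly pick any remaining subgoal whose bound ("b") arguments are already available, then treat all of its arguments as available. Because binding more variables never makes a subgoal harder to call, the greedy choice finds an order whenever one exists. The PATTERNS table and executable_order are illustrative assumptions, using the b/f strings of finger and userId from the earlier example.

```python
# b/f binding patterns of the views (from the finger/userId example).
PATTERNS = {"finger": "ffbff", "userId": "bf"}


def executable_order(subgoals, bound_inputs):
    """Order subgoals so each one's 'b' arguments are bound when it is called; None if impossible."""
    bound = set(bound_inputs)
    remaining = list(subgoals)
    order = []
    while remaining:
        for goal in remaining:
            view, args = goal
            needed = [a for a, flag in zip(args, PATTERNS[view]) if flag == "b"]
            if all(a in bound for a in needed):
                order.append(goal)
                bound.update(args)  # after the call, all of its arguments are known
                remaining.remove(goal)
                break
        else:  # no remaining subgoal can be called yet
            return None
    return order


body = [("finger", ["F", "L", "E", "O", "P"]), ("userId", ["O", "E"])]
print([view for view, _ in executable_order(body, {"O"})])  # ['userId', 'finger']
print(executable_order(body, set()))  # None
```

Even when the body is written in the "wrong" order, the check reorders it; without O supplied by the query, no subgoal can be called first.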
2005lav-ii13 A summary of IM
- Introduced (together with other concurrent systems) the notion of LAV and query rewriting using views
- Also, detailed source descriptions using DLs
- An efficient algorithm for finding contained and executable rewritings
- Worked well for about 100 sources
2005lav-ii14 [Figure: a graph from the paper]
2005lav-ii15 But:
The fact that a contained rewriting needs at most as many views as there are atoms in the query has been proved only for CQs without comparisons, access restrictions, or constraints on the global database.
Does it hold for these cases? (See the example on p. 10, where a one-atom query needs a rewriting with two views.)
For access-restricted sources, it has been proved that equivalent rewritings need at most n + m views, where n is the number of atoms in the query and m is the number of distinct variables in it.
The proof does not hold for contained rewritings.
2005lav-ii16 Even for “pure” CQs, is the bucket algorithm guaranteed to find all rewritings?
The answers to all these questions are negative!
- The bucket algorithm does not find all rewritings
- For the more general cases, longer rewritings are needed; in fact, there may be infinitely many of them, with no bound on length
There is a need for another approach.