2005lav-ii1 Local as View: Some refinements  IM: Filtering irrelevant sources  Views with restricted access patterns  A summary of IM.


1 Local as View: Some refinements
IM: Filtering irrelevant sources
Views with restricted access patterns
A summary of IM

2 IM: Filtering irrelevant sources
When there are many sources, it is important to weed out those that are irrelevant to a query.
Comparison constraints can help (e.g., qu >= w98).
What more can be done?
The IM system suggests introducing classes, organized in a class hierarchy, into source descriptions.

3 Example:
[Figure: class hierarchy over the classes car, carForSale, usedCar, newCar, AmericanCar, EuropeanCar, JapaneseCar, GermanCar, ItalianCar, FrenchCar; dashed lines (--) mark disjoint classes]
Additionally, the global schema contains a relation
details(car, year, mileage, price, sellerContact), abbreviated [c, y, mi, p, s]
(we will also abbreviate class names).

4 The views:
v1(c, y, mi, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y >= 1990
v2(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), EurCar(c)
v3(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), p >= $25000 // luxury cars
v4(c, y, p) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y <= 1980 // vintage cars
v5(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), nCar(c), c = Toyota
Assume a query:
Q: q(c, mi, y, p, s) :- details(c, y, mi, p, s), cFSale(c), JCar(c), y >= 1992, p <= $12000
Some candidate rewritings will be rejected, since they are inconsistent with Q.

5 When a view is considered for consistency with Q:
v4 will be discarded: y <= 1980, y >= 1992 is inconsistent
v3 will be discarded: p >= $25000, p <= $12000 is inconsistent
v2 will be discarded: EurCar(c), JCar(c) is inconsistent
v5: depends on what is known about the relationship between Toyota and the various car classes
Reasoning about disjointness of classes (given a hierarchy as above) is easy and efficient.
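The consistency test on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the IM implementation: the dictionary layout, the names DISJOINT and consistent, and the use of closed intervals for the comparison constraints are all assumptions made for the example. v5 is left out because, as the slide says, its status depends on what is known about Toyota.

```python
import math

# Disjointness facts. Only the EurCar/JCar pair is taken from the slides; a real
# system would derive such pairs from the class hierarchy.
DISJOINT = {frozenset({"EurCar", "JCar"})}

UNBOUNDED = (-math.inf, math.inf)


def consistent(view, query):
    """Return True if the view's constraints can hold together with the query's."""
    # Class check: no class used by the view may be disjoint from a query class.
    for a in view["classes"]:
        for b in query["classes"]:
            if frozenset({a, b}) in DISJOINT:
                return False
    # Comparison check: the closed intervals allowed for each variable must overlap.
    for var in set(view["ranges"]) | set(query["ranges"]):
        v_lo, v_hi = view["ranges"].get(var, UNBOUNDED)
        q_lo, q_hi = query["ranges"].get(var, UNBOUNDED)
        if max(v_lo, q_lo) > min(v_hi, q_hi):
            return False
    return True


# Q: cFSale(c), JCar(c), y >= 1992, p <= 12000
query = {"classes": {"cFSale", "JCar"},
         "ranges": {"y": (1992, math.inf), "p": (-math.inf, 12000)}}

views = {
    "v1": {"classes": {"cFSale", "uCar"}, "ranges": {"y": (1990, math.inf)}},
    "v2": {"classes": {"cFSale", "EurCar"}, "ranges": {}},
    "v3": {"classes": {"cFSale", "uCar"}, "ranges": {"p": (25000, math.inf)}},
    "v4": {"classes": {"cFSale", "uCar"}, "ranges": {"y": (-math.inf, 1980)}},
}

surviving = [name for name, view in views.items() if consistent(view, query)]
print(surviving)  # ['v1']
```

Only v1 survives, matching the three rejections listed on the slide.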

6 The true story (a side trip):
IM uses a (PTIME) Description Logic (DL) for source descriptions.
A DL is a formalism that describes classes and binary relationships intensionally.
For example, a class can be given by a name (e.g., JCar) or by an expression that describes its properties:
cheapJCar :- uCar and JCar and price < $9000
A DL also contains containment and disjointness axioms for class expressions (containment is called subsumption in DL jargon).
To be useful, a DL needs to support containment and disjointness queries on classes and membership queries on individuals; this is an inference problem.

7 Many DL’s are known.
Complexity (for subsumption) ranges from polynomial (rare), to NP-complete, to EXPTIME-complete, to undecidable.
Recent interest focuses on using DL’s for the Semantic Web.
The W3C OWL standard is essentially a DL (this use is essentially the same as in IM).
That is it on DL’s.

8 Views with restricted access patterns
Many sources do not support full SQL:
They are legacy systems, e.g.
 - finger on UNIX accepts an email address and returns other attributes
 - a bibliography source requires author, or title, or but does not accept a year as input
They do not want to disclose all their data, e.g.
 - a carSale source will not present all the cars it has for sale
 - an airline requires from and destination as input for flight info
The questions:
How do we describe such sources?
What are good rewritings, and how do we find them?

9 Restricted sources can be described by binding patterns.
Two equivalent styles (there are more sophisticated schemes).
Example: assume global relations
email(F, L, E), office(F, L, O), phone(O, P)
(F: first, L: last, E: email, O: office, P: phone)
The views are finger and userId, described as follows.
Style 1: adding $ to attributes that can be given as input
finger(F, L, $E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
userId($O, E) :- office(F, L, O), email(F, L, E)
Style 2: using b, f strings on predicates, where b means bound (i.e., input) and f means free
finger^ffbff(F, L, E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
userId^bf(O, E) :- office(F, L, O), email(F, L, E)
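To see that the two styles carry the same information, here is a small Python sketch that turns a $-annotated argument list into the corresponding b/f string. The function name adornment and the list-of-strings encoding are assumptions made for illustration only.

```python
def adornment(args):
    """Map a '$'-annotated argument list to a b/f string.

    An argument written '$X' is an input (bound, 'b'); a plain 'X' is free ('f').
    """
    return "".join("b" if arg.startswith("$") else "f" for arg in args)


finger_pattern = adornment(["F", "L", "$E", "O", "P"])
user_id_pattern = adornment(["$O", "E"])
print(finger_pattern)   # ffbff
print(user_id_pattern)  # bf
```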

10 Example, cont’d:
Q: q^bf(O, F) :- office(F, L, O) (or q($O, F) :- office(F, L, O))
Cannot be answered by using finger alone: it requires E as input
Cannot be answered by using userId alone: it does not return F
The following is a good rewriting:
q’(O, F) :- userId(O, E), finger(F, L, E, O, P)
For two reasons:
It is executable with respect to the sources: executing the body left-to-right respects the access restrictions
 - O for userId comes from the query
 - E for finger comes from userId
Its expansion is contained in the query (check!)
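The containment check left to the reader can be carried out by looking for a containment mapping from the query into the expansion of the rewriting. The Python sketch below does this by brute force for conjunctive queries whose terms are all variables (no constants, no comparisons). The tuple encoding of atoms, the function name contained_in, and the fresh variable names F1 and L1 used when unfolding userId are assumptions made for illustration.

```python
from itertools import product


def contained_in(head_e, body_e, head_q, body_q):
    """True if some containment mapping sends query (head_q, body_q) into (head_e, body_e).

    Atoms are (predicate, args) pairs; every term is treated as a variable.
    """
    q_vars = sorted({v for _, args in body_q for v in args} | set(head_q))
    e_terms = sorted({v for _, args in body_e for v in args} | set(head_e))
    e_atoms = {(pred, tuple(args)) for pred, args in body_e}
    # Try every assignment of query variables to terms of the expansion.
    for image in product(e_terms, repeat=len(q_vars)):
        h = dict(zip(q_vars, image))
        if tuple(h[v] for v in head_q) != tuple(head_e):
            continue  # the mapping must send the query head onto the expansion head
        if all((pred, tuple(h[v] for v in args)) in e_atoms for pred, args in body_q):
            return True
    return False


# Q: q(O, F) :- office(F, L, O)
query_body = [("office", ("F", "L", "O"))]

# Expansion of q'(O, F) :- userId(O, E), finger(F, L, E, O, P).
# userId's hidden variables are renamed to F1, L1 so they do not clash with finger's.
expansion = [
    ("office", ("F1", "L1", "O")), ("email", ("F1", "L1", "E")),  # from userId
    ("email", ("F", "L", "E")), ("office", ("F", "L", "O")),      # from finger
    ("phone", ("O", "P")),                                         # from finger
]

result = contained_in(("O", "F"), expansion, ("O", "F"), query_body)
print(result)  # True
```

The mapping that succeeds is the identity on O, F, and L: it sends the single query atom office(F, L, O) to the office atom contributed by finger.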

11 These two reasons characterize a good rewriting:
It is executable with respect to the sources: executing the body left-to-right respects the access restrictions
Its expansion is contained in the query (check!)
Indeed:
If it is not a contained rewriting, then being executable is no good
Being contained but not executable is also no good

12 The IM approach:
After a rewriting is found to be consistent and contained, it is checked for being executable: can the sub-goals in the body be ordered so that the input required for each is supplied from the query or from the sub-goals to its left?
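One way to carry out this executability check is a greedy pass: repeatedly pick any sub-goal whose bound positions are already supplied, then treat all of its variables as available. The Python sketch below is an illustration under assumptions, not the IM code; the (name, adornment, args) encoding and the function name executable_order are hypothetical.

```python
def executable_order(subgoals, bound_vars):
    """Greedily order sub-goals so every 'b' position is bound before the sub-goal runs.

    Each sub-goal is a (name, adornment, args) triple. Returns the ordered list of
    sub-goals, or None if no executable ordering exists.
    """
    bound = set(bound_vars)
    remaining = list(subgoals)
    ordered = []
    while remaining:
        for goal in remaining:
            _, adorn, args = goal
            # A sub-goal is ready when each argument in a 'b' position is already bound.
            if all(arg in bound for mode, arg in zip(adorn, args) if mode == "b"):
                ordered.append(goal)
                bound.update(args)  # its outputs become available to later sub-goals
                remaining.remove(goal)
                break
        else:
            return None  # no remaining sub-goal can run
    return ordered


finger = ("finger", "ffbff", ("F", "L", "E", "O", "P"))
user_id = ("userId", "bf", ("O", "E"))

# The query q^bf(O, F) supplies O as input.
order = executable_order([finger, user_id], {"O"})
print([name for name, _, _ in order])  # ['userId', 'finger']
```

The greedy choice is safe because running a sub-goal only ever adds bound variables, so a sub-goal that is ready now stays ready later. With no bound variables at all, neither source can be called and the function returns None.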

13 A summary of IM
Introduced (with other concurrent systems) the notion of LAV and query rewriting using views
Also, detailed source descriptions using DL’s
An efficient algorithm for finding contained and executable rewritings
Worked well, for about 100 sources

14 Here is a graph from the paper

15 But:
The fact that a contained rewriting needs a number of views at most the number of atoms in the query has been proved only for CQ’s, without comparisons, access restrictions, or constraints on the global db
Does it hold for these cases? (see the example on p. 10)
For access-restricted sources, it has been proved that for equivalent rewritings one needs at most n+m views, where n is the number of atoms in the query and m is the number of different variables in it
The proof does not hold for contained rewritings

16 Even for “pure” CQ’s, is the bucket algorithm guaranteed to find all rewritings?
The answers to all these questions are negative!
The bucket algorithm does not find all rewritings
For the more general cases, longer rewritings are needed; actually, there may be an infinite number of them, with no bound on length
There is a need for another approach

