INFORMATION INTEGRATION Shengyu Li CS-257 ID-211.


1 INFORMATION INTEGRATION Shengyu Li CS-257 ID-211

2 Outline
- Basic
- Capability-Based Optimization
- Optimizing Mediator Queries

3 Basic
Movie:
title               year  length  type
Gone With the Wind  1939  231     drama
Star Wars           1977  124     sciFi
Wayne's World       1992  95      comedy
Attributes: appear at the tops of the columns and describe the meaning of the entries in the column below.
Example: title, year, length, type

4 Basic
Movie: (same relation as on slide 3)
Schemas: the name of the relation plus the set of its attributes.
Example: Movies (title, year, length, type)

5 Basic
Movie: (same relation as on slide 3)
Tuples: the rows of a relation, other than the header row containing the attribute names. A tuple has one component for each attribute of the relation.
Example: tuple 1 has 4 components: Gone With the Wind, 1939, 231, drama for attributes title, year, length, and type.

6 Basic
Movie: (same relation as on slide 3)
Projection: the projection operator is used to produce from a relation R a new relation that has only some of R's columns.
Example: π title, year, length (Movies)
The resulting relation is:
title               year  length
Gone With the Wind  1939  231
Star Wars           1977  124
Wayne's World       1992  95
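A minimal Python sketch of the projection above, assuming each tuple is stored as a dictionary keyed by attribute name. The function name `project` and the list-of-dictionaries layout are illustrative choices, not part of the slides.

```python
# The Movie relation from the slides, one dictionary per tuple.
movies = [
    {"title": "Gone With the Wind", "year": 1939, "length": 231, "type": "drama"},
    {"title": "Star Wars", "year": 1977, "length": 124, "type": "sciFi"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "type": "comedy"},
]


def project(relation, attributes):
    """Keep only the named attributes and drop duplicate result tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # a relation is a set, so duplicates are removed
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


for row in project(movies, ["title", "year", "length"]):
    print(row)
```

Duplicates are removed because a relation is a set of tuples; projecting onto fewer columns can make two distinct tuples identical.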

7 Basic
R:
A  B
1  2
3  4
S:
B  C   D
2  5   6
4  7   8
9  10  11
R X S:
A  R.B  S.B  C   D
1  2    2    5   6
1  2    4    7   8
1  2    9    10  11
3  4    2    5   6
3  4    4    7   8
3  4    9    10  11
Natural join of R and S:
A  B  C  D
1  2  5  6
3  4  7  8
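The natural join on this slide can be sketched in Python as follows, using the relations R(A, B) and S(B, C, D) with the tuples shown. The function name `natural_join` and the dictionary representation are assumptions made for illustration.

```python
# Relations from the slide, one dictionary per tuple.
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [
    {"B": 2, "C": 5, "D": 6},
    {"B": 4, "C": 7, "D": 8},
    {"B": 9, "C": 10, "D": 11},
]


def natural_join(r, s):
    """Pair tuples of r and s that agree on every attribute they share."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]  # shared attribute names
    joined = []
    for tr in r:
        for ts in s:
            if all(tr[a] == ts[a] for a in common):
                # Shared attributes appear once in the merged tuple.
                joined.append({**tr, **ts})
    return joined


for row in natural_join(R, S):
    print(row)
```

The nested loop enumerates the product R X S and keeps only the pairs whose B values agree, which is why the S tuple (9, 10, 11) contributes nothing to the result.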

8 Basic
Datalog Rules and Queries: a rule consists of
1. A relational atom called the head, followed by
2. The symbol <-, which we often read "if", followed by
3. A body consisting of one or more atoms, called subgoals, which may be either relational or arithmetic. Subgoals are connected by AND, and any subgoal may optionally be preceded by the logical operator NOT.
Ex: LongMovie(title, year) <- Movies(title, year, length, type) AND length >= 100
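A small Python sketch of evaluating a rule of this shape, with one relational subgoal over Movies and one arithmetic subgoal on length. The comparison `length >= 100` is an assumption here, and the function name `long_movie` is hypothetical.

```python
# Movies(title, year, length, type) as plain tuples.
MOVIES = [
    ("Gone With the Wind", 1939, 231, "drama"),
    ("Star Wars", 1977, 124, "sciFi"),
    ("Wayne's World", 1992, 95, "comedy"),
]


def long_movie(movies, min_length=100):
    """Head LongMovie(title, year): keep tuples whose length passes the test."""
    return [
        (title, year)
        for (title, year, length, _type) in movies  # relational subgoal
        if length >= min_length                      # arithmetic subgoal (assumed)
    ]


print(long_movie(MOVIES))
```

Only the variables that appear in the head (title and year) survive into the result, which mirrors the projection step that a Datalog head performs.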

9 Capability-Based Optimization
Limited Source Capabilities
- Web-based interfaces. Examples: "The top 20 sellers?"; SELECT * FROM Books

10 Why?
- Legacy sources: archaic/unique system
- Security: "tell me about all your books"; medical database
- Indexes on large databases: Books database; infeasible queries

11 Source Capabilities
Notation (adornments): sequences of codes that represent the requirements for the attributes of the relation (for relational data).
- f (free): the attribute can be specified or not, as we choose
- b (bound): a value must be specified for the attribute
- u (unspecified): we are not permitted to specify a value for the attribute

12
- c[S] (choice from set S): a value must be specified, and that value must be one of the values in the finite set S.
- o[S] (optional, from set S): we either do not specify a value, or we specify one of the values in the finite set S.
- Place a prime on a code to indicate that the attribute is not part of the output of the query.

13 Capabilities Specification
- A set of adornments.
- To query the source successfully, the query must match one of the adornments in its capabilities specification.
- If an adornment contains f (free) or o[S] codes, queries that specify different sets of attributes may match that adornment.

14 Example: Cars (serialNo, model, color, autoTrans, navi)
Dealer 1 might allow this data to be queried in two ways:
1. The user specifies a serial number. All the information about the car with that serial number is produced as output.
2. The user specifies a model and color, and perhaps whether or not automatic transmission and a navigation system are wanted. All five attributes are printed for all matching cars.
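A sketch of how a query could be checked against Dealer 1's capabilities specification. The two adornments used here, buuuu and ubbo[yes,no]o[yes,no], are one plausible encoding of the two access modes described above and are an assumption, as are all names in the code.

```python
ATTRS = ["serialNo", "model", "color", "autoTrans", "navi"]
YES_NO = {"yes", "no"}

# Assumed capabilities specification: each adornment is a list of (code, set).
DEALER1 = [
    [("b", None), ("u", None), ("u", None), ("u", None), ("u", None)],
    [("u", None), ("b", None), ("b", None), ("o", YES_NO), ("o", YES_NO)],
]


def code_accepts(code, allowed, value):
    """Check one attribute of a query against one adornment code."""
    specified = value is not None
    if code == "b":  # bound: a value must be given
        return specified
    if code == "f":  # free: either way is fine
        return True
    if code == "u":  # unspecified: a value must not be given
        return not specified
    if code == "c":  # choice: a value from the set must be given
        return specified and value in allowed
    if code == "o":  # optional: no value, or a value from the set
        return (not specified) or value in allowed
    raise ValueError(f"unknown code {code!r}")


def query_matches(query, adornment):
    """query maps attribute names to the values the user specifies."""
    return all(
        code_accepts(code, allowed, query.get(attr))
        for attr, (code, allowed) in zip(ATTRS, adornment)
    )


def answerable(query, specification):
    """A query succeeds if it matches at least one adornment."""
    return any(query_matches(query, a) for a in specification)


print(answerable({"serialNo": "X1"}, DEALER1))
print(answerable({"model": "Gobi", "color": "blue", "navi": "yes"}, DEALER1))
print(answerable({"model": "Gobi"}, DEALER1))
```

The last query fails because the second adornment requires a color as well as a model, and the first adornment does not allow a model to be specified at all.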

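15 Capability-Based Query-Plan Selection
- A capability-based query optimizer first considers which queries it can ask at the sources to help answer the mediator query.
- Once those queries are answered, we have bindings for some more attributes, and these bindings may make some more queries at the sources possible.
- This process repeats until either:
  - Feasible: enough source queries have been asked to answer the mediator query
  - Impossible query: no more valid forms of source query can be constructed, yet the mediator query still cannot be answered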

16 Capability-Based Query-Plan Selection
- The simplest form of mediator query for which we need to apply this strategy is a join of relations, each of which is available, with certain adornments, at one or more sources.
- In that case, the search strategy is to try to get tuples for each relation in the join by providing enough argument bindings that some source allows a query about that relation to be asked and answered.

17 Capability-Based Query-Plan Selection
Example:
Autos (serial, model, color)
Options (serial, option)
Adornments:
- Autos: ubf
- Options: two adornments, bu and uc[autoTrans, navi]
Query: "find the serial numbers and colors of Gobi models with a navigation system"

18 Adding Cost-Based Optimization
- Cost-based optimization requires the mediator to know the cost of the queries involved.
- Since the sources are usually independent of the mediator, it is difficult to estimate that cost.

19 Optimizing Mediator Queries
- Chain algorithm
  - a greedy algorithm
  - sends a sequence of requests to its sources
  - always finds a way to answer the query, provided at least one solution exists
- The class of queries that can be handled
  - involves joins of relations that come from the sources, followed by an optional selection
  - can be expressed as Datalog rules, which describe the relational algebra

20 Simplified Adornment Notation
- Subgoals carry only b (bound) and f (free) adornments
- Use the c[S] adornment as soon as we know all possible values of interest for that attribute
- o[S] and u are treated as free

21 Example:
Autos^buu (serial, model, color)
Options^uc[autoTrans, navi] (serial, option)
- "find the serial numbers and colors of Gobi models with a navigation system"
- Answer (s, c) <- Autos^fbf (s, "Gobi", c) AND Options^fb (s, "navi")

22 Obtaining Answers for Subgoals
Suppose we have a subgoal R^x1x2…xn (a1, a2, …, an), where:
- each xi is b or f
- R is a relation that can be queried at some source
- y1y2…yn is one of the adornments for R at its source
It is possible to obtain a relation for the subgoal provided that, for each i = 1, 2, …, n:
- If yi is b or of the form c[S], then xi = b
- If xi = f, then yi is not output restricted (i.e. not primed)
We say that the adornment on the subgoal matches the adornment at the source.

23 Example:
Subgoal: R^bbff (p, q, r, s)
Adornments for R at its sources are:
- α1 = fc[S1]uo[S2]: matches, provided the value of q is a member of S1
- α2 = c[S3]bfc[S4]: does not match, because c[S4] requires the fourth argument to be bound but s is free
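The matching rule for subgoal and source adornments can be sketched in Python as below. The string syntax for adornments and the function names are assumptions made for this sketch; the set names S1 to S4 are kept symbolic, so the membership condition on q is not checked here.

```python
import re

# One code per attribute: a letter, an optional prime, an optional [set].
CODE = re.compile(r"([bfuco])('?)(?:\[([^\]]*)\])?")


def parse_adornment(text):
    """Split e.g. "fc[S1]uo[S2]" into (code, primed, set_text) triples."""
    return [
        (m.group(1), m.group(2) == "'", m.group(3))
        for m in CODE.finditer(text)
    ]


def matches(subgoal, source):
    """subgoal is a string of b/f; source is a source adornment string."""
    codes = parse_adornment(source)
    if len(codes) != len(subgoal):
        return False
    for x, (code, primed, _set) in zip(subgoal, codes):
        # The source demands a value here, so the subgoal must be bound.
        if code in ("b", "c") and x != "b":
            return False
        # A free argument must come back in the output, so no prime allowed.
        if x == "f" and primed:
            return False
    return True


print(matches("bbff", "fc[S1]uo[S2]"))   # the first adornment in the example
print(matches("bbff", "c[S3]bfc[S4]"))   # the second adornment in the example
```

A bound subgoal argument facing an f, u, or o[S] code is accepted under this rule, since the two conditions only constrain b, c[S], and primed positions.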

24 The Chain Algorithm
- Greedy approach to select an order in which we obtain relations for each of the subgoals of a Datalog rule.
- Not guaranteed to provide the most efficient solution, but it will provide a solution whenever one exists.
- In practice, it is very likely to obtain the most efficient solution.

25 The chain algorithm maintains 2 kinds of information:
- An adornment is maintained for each subgoal. Initially, the adornment for a subgoal has b if and only if the mediator query provides a constant binding for the corresponding argument of that subgoal, as for instance:
  Answer (s, c) <- Autos^fbf (s, "Gobi", c) AND Options^fb (s, "navi")

26
Consider the mediator query Q:
Answer(c) <- R^bf (1, a) AND S^ff (a, b) AND T^ff (b, c)
There are three sources that provide answers to queries about R, S, and T, respectively:
Relation    R       S             T
Data        w  x    x  y          y  z
            1  2    2  4          4  6
            1  3    3  5          5  7
            1  4                  5  8
Adornment   bf      c'[2,3,5]f    bu
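A minimal sketch of running the chain algorithm on query Q, assuming the source adornments are bf for R, c'[2,3,5]f for S, and bu for T. Source queries are simulated by filtering in-memory tuples, and every name in the code is hypothetical.

```python
# relation -> (adornment codes, tuples); each code is (kind, allowed, primed).
SOURCES = {
    "R": ([("b", None, False), ("f", None, False)], [(1, 2), (1, 3), (1, 4)]),
    "S": ([("c", {2, 3, 5}, True), ("f", None, False)], [(2, 4), (3, 5)]),
    "T": ([("b", None, False), ("u", None, False)], [(4, 6), (5, 7), (5, 8)]),
}


def matches(subgoal, codes):
    """Subgoal adornment (string of b/f) against one source adornment."""
    for x, (kind, _allowed, primed) in zip(subgoal, codes):
        if kind in ("b", "c") and x != "b":
            return False
        if x == "f" and primed:
            return False
    return True


def chain(subgoals, head_vars):
    """subgoals: (relation, args) pairs; strings are variables, ints constants."""
    envs = [{}]          # variable assignments consistent with resolved subgoals
    bound = set()        # variables bound so far
    remaining = list(subgoals)
    while remaining:
        # Greedy step: pick any subgoal whose adornment matches its source.
        for sg in remaining:
            rel, args = sg
            codes, tuples = SOURCES[rel]
            adornment = "".join(
                "b" if not isinstance(a, str) or a in bound else "f" for a in args
            )
            if matches(adornment, codes):
                break
        else:
            return None  # no subgoal can be resolved: no feasible plan
        new_envs = []
        for env in envs:
            wanted = [a if not isinstance(a, str) else env.get(a) for a in args]
            # A c[S] source cannot be asked about values outside S.
            if any(k == "c" and w not in s for w, (k, s, _p) in zip(wanted, codes)):
                continue
            for t in tuples:  # stands in for the query sent to the source
                if all(w is None or w == v for w, v in zip(wanted, t)):
                    extended = dict(env)
                    extended.update(
                        {a: v for a, v in zip(args, t) if isinstance(a, str)}
                    )
                    new_envs.append(extended)
        envs = new_envs
        bound.update(a for a in args if isinstance(a, str))
        remaining.remove(sg)
    return sorted({tuple(env[v] for v in head_vars) for env in envs})


Q = [("R", (1, "a")), ("S", ("a", "b")), ("T", ("b", "c"))]
print(chain(Q, ["c"]))
```

Under these assumptions R is resolved first and binds a to 2, 3, and 4. S can then be asked only about 2 and 3, because 4 is outside the set [2,3,5], which binds b to 4 and 5. T is resolved last and yields the values 6, 7, and 8 for c.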

27 Thank You

