Published by Natalie Sheena French. Modified over 8 years ago.
1
INFORMATION INTEGRATION
Shengyu Li
CS-257 ID-211
2
Outline
- Basic
- Capability-Based Optimization
- Optimizing Mediator Queries
3
Basic
Movies:
title               year  length  type
Gone With the Wind  1939  231     drama
Star Wars           1977  124     sciFi
Wayne's World       1992  95      comedy
Attributes: appear at the tops of the columns and describe the meaning of the entries in the column below. Example: title, year, length, type.
4
Basic
Schemas: the name of a relation together with its set of attributes.
Example: Movies(title, year, length, type)
5
Basic
Tuples: the rows of a relation, other than the header row containing the attribute names. A tuple has one component for each attribute of the relation.
Example: the first tuple of Movies has 4 components, Gone With the Wind, 1939, 231, and drama, for the attributes title, year, length, and type.
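A minimal Python sketch of attributes, schema, and tuples, assuming a relation is modelled as a named-tuple type (its schema) plus a list of rows; the names Movie and movies are illustrative only.

```python
from typing import NamedTuple


class Movie(NamedTuple):
    """Schema Movies(title, year, length, type): one field per attribute."""
    title: str
    year: int
    length: int
    type: str


# The relation instance: each row (tuple) has one component per attribute.
movies = [
    Movie("Gone With the Wind", 1939, 231, "drama"),
    Movie("Star Wars", 1977, 124, "sciFi"),
    Movie("Wayne's World", 1992, 95, "comedy"),
]

print(Movie._fields)           # ('title', 'year', 'length', 'type')
first = movies[0]
print(len(first), first.year)  # 4 1939
```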
6
Basic
Projection: the projection operator produces from a relation R a new relation that has only some of R's columns.
Example: π_{title, year, length}(Movies). The resulting relation is:
title               year  length
Gone With the Wind  1939  231
Star Wars           1977  124
Wayne's World       1992  95
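Projection can be sketched in a few lines of Python, assuming a relation is a list of tuples plus a list of attribute names; the function name project is hypothetical. Returning a set follows the set semantics of relational algebra, so duplicate result tuples collapse.

```python
# Movies as a schema (list of attribute names) plus a list of tuples.
SCHEMA = ["title", "year", "length", "type"]
MOVIES = [
    ("Gone With the Wind", 1939, 231, "drama"),
    ("Star Wars", 1977, 124, "sciFi"),
    ("Wayne's World", 1992, 95, "comedy"),
]


def project(relation, schema, attrs):
    """pi_attrs(relation): keep only the listed columns, in the listed order.

    Returns a set, so duplicate tuples collapse as in set-based relational algebra.
    """
    positions = [schema.index(a) for a in attrs]
    return {tuple(row[p] for p in positions) for row in relation}


for row in sorted(project(MOVIES, SCHEMA, ["title", "year", "length"])):
    print(row)
```

A bag-based variant (as SQL uses) would return a list instead of a set.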
7
Basic
R:
A  B
1  2
3  4

S:
B  C   D
2  5   6
4  7   8
9  10  11

R × S (Cartesian product):
A  R.B  S.B  C   D
1  2    2    5   6
1  2    4    7   8
1  2    9    10  11
3  4    2    5   6
3  4    4    7   8
3  4    9    10  11

Natural join of R and S:
A  B  C  D
1  2  5  6
3  4  7  8
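The product and the natural join on this slide can be reproduced with the sketch below, assuming relations are lists of tuples with a separate list of attribute names; product and natural_join are hypothetical helper names. The join keeps only pairs of tuples that agree on every common attribute (here B) and keeps that attribute once.

```python
def product(r, r_schema, s, s_schema):
    """R x S: every tuple of R paired with every tuple of S.

    Attributes that appear in both schemas are disambiguated as R.A / S.A.
    """
    schema = [f"R.{a}" if a in s_schema else a for a in r_schema] + [
        f"S.{a}" if a in r_schema else a for a in s_schema
    ]
    return schema, [rt + st for rt in r for st in s]


def natural_join(r, r_schema, s, s_schema):
    """R join S: pair tuples that agree on all common attributes; keep each common attribute once."""
    common = [a for a in r_schema if a in s_schema]
    rest = [a for a in s_schema if a not in common]
    rows = []
    for rt in r:
        for st in s:
            if all(rt[r_schema.index(a)] == st[s_schema.index(a)] for a in common):
                rows.append(rt + tuple(st[s_schema.index(a)] for a in rest))
    return list(r_schema) + rest, rows


R, R_SCHEMA = [(1, 2), (3, 4)], ["A", "B"]
S, S_SCHEMA = [(2, 5, 6), (4, 7, 8), (9, 10, 11)], ["B", "C", "D"]

print(product(R, R_SCHEMA, S, S_SCHEMA))
print(natural_join(R, R_SCHEMA, S, S_SCHEMA))
```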
8
Basic
Datalog Rules and Queries: a rule consists of
1. a relational atom called the head, followed by
2. the symbol <-, which we often read “if”, followed by
3. a body consisting of one or more atoms, called subgoals, which may be either relational or arithmetic. Subgoals are connected by AND, and any subgoal may optionally be preceded by the logical operator NOT.
Ex: LongMovie(title, year) <- Movies(title, year, length, type) AND length >= 100
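A Datalog rule with one relational and one arithmetic subgoal can be evaluated by a comprehension, assuming the example rule reads LongMovie(title, year) <- Movies(title, year, length, type) AND length >= 100. The function name long_movie is hypothetical.

```python
MOVIES = [
    ("Gone With the Wind", 1939, 231, "drama"),
    ("Star Wars", 1977, 124, "sciFi"),
    ("Wayne's World", 1992, 95, "comedy"),
]


def long_movie(movies):
    """Assumed rule: LongMovie(title, year) <- Movies(title, year, length, type) AND length >= 100.

    Each Movies tuple is a candidate binding for the relational subgoal; the
    arithmetic subgoal filters the bindings; the head keeps (title, year).
    """
    return {
        (title, year)
        for (title, year, length, _type) in movies  # relational subgoal
        if length >= 100                            # arithmetic subgoal
    }


print(sorted(long_movie(MOVIES)))
```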
9
Capability-Based Optimization
Limited Source Capabilities
- Web-based interfaces: a source may answer a query such as “the top 20 sellers?”, yet not accept an arbitrary query such as SELECT * FROM Books.
10
Why?
- Legacy sources: archaic or unique systems
- Security: e.g., a medical database
- Indexes on large databases: e.g., a books database, where a request such as “tell me about all your books” is an infeasible query
11
Source Capabilities
Notation: adornments, i.e., sequences of codes that represent the requirements on the attributes of a relation, one code per attribute:
- f (free): the attribute can be specified or not, as we choose
- b (bound): a value for the attribute must be specified
- u (unspecified): we are not permitted to specify a value for the attribute
12
- c[S] (choice from set S): a value must be specified, and that value must be one of the values in the finite set S
- o[S] (optional, from set S): we either do not specify a value, or we specify one of the values in the finite set S
Placing a prime on a code (e.g., c'[S]) indicates that the attribute is not part of the output of the query.
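To make the notation concrete, here is a small parser that turns an adornment string into a list of codes. It is a sketch under stated assumptions: each code is one letter (b, f, u, c, o), an optional prime follows the letter, and c and o carry a comma-separated set in brackets; parse_adornment and the (kind, allowed, primed) representation are hypothetical.

```python
import re

# One code: a letter, an optional prime, and an optional [v1,v2,...] value set.
TOKEN = re.compile(r"([bfuco])(')?(?:\[([^\]]*)\])?")


def parse_adornment(text):
    """Parse e.g. "c'[2,3,5]f" into [(kind, allowed_values, primed), ...]."""
    codes, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"bad adornment at {pos}: {text[pos:]!r}")
        kind, prime, values = m.groups()
        # c and o must carry a value set; b, f, u must not.
        if (kind in "co") != (values is not None):
            raise ValueError(f"code {kind!r} at {pos} has a wrong value set")
        allowed = None if values is None else {v.strip() for v in values.split(",")}
        codes.append((kind, allowed, prime is not None))
        pos = m.end()
    return codes


print(parse_adornment("c'[2,3,5]f"))
print(parse_adornment("uc[autoTrans, navi]"))
```

Values are kept as strings; a real mediator would convert them to the attribute's type.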
13
Capabilities Specification: a set of adornments for a source. To query the source successfully, the query must match one of the adornments in its capabilities specification. Because an attribute adorned f (free) or o[S] may be either specified or left out, queries that specify different sets of attributes may match the same adornment.
14
Example: Cars(serialNo, model, color, autoTrans, navi)
Dealer 1 might allow this data to be queried in two ways:
1. The user specifies a serial number. All the information about the car with that serial number is produced as output.
2. The user specifies a model and color, and optionally whether or not the car has an automatic transmission and a navigation system. All five attributes are printed for all matching cars.
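The two query forms of Dealer 1 can be written as a capabilities specification and checked against incoming queries. This sketch assumes the encoding buuuu and ubbo[yes,no]o[yes,no] (derived from the two forms described above, with "yes"/"no" as assumed values for autoTrans and navi), and represents a query as a dict of specified attribute values; source_accepts, query_matches, and the sample values are hypothetical.

```python
# A code is (kind, allowed_values, primed); kind is one of b, f, u, c, o.
def code(kind, allowed=None, primed=False):
    return (kind, allowed, primed)


CARS = ["serialNo", "model", "color", "autoTrans", "navi"]
YES_NO = {"yes", "no"}  # assumed values for autoTrans and navi

# Assumed encoding of Dealer 1's two query forms.
DEALER1 = [
    [code("b"), code("u"), code("u"), code("u"), code("u")],                  # buuuu
    [code("u"), code("b"), code("b"), code("o", YES_NO), code("o", YES_NO)],  # ubbo[yes,no]o[yes,no]
]


def query_matches(query, attrs, adornment):
    """Does a query (dict attribute -> specified value) fit one adornment?"""
    if any(a not in attrs for a in query):
        return False
    for attr, (kind, allowed, _primed) in zip(attrs, adornment):
        given = attr in query
        if kind == "b" and not given:
            return False
        if kind == "u" and given:
            return False
        if kind == "c" and (not given or query[attr] not in allowed):
            return False
        if kind == "o" and given and query[attr] not in allowed:
            return False
        # kind == "f": may be given or not
    return True


def source_accepts(query, attrs, capabilities):
    """The source answers iff the query matches some adornment in its specification."""
    return any(query_matches(query, attrs, ad) for ad in capabilities)


print(source_accepts({"serialNo": "123"}, CARS, DEALER1))                               # True
print(source_accepts({"model": "Gobi", "color": "blue", "navi": "yes"}, CARS, DEALER1))  # True
print(source_accepts({"model": "Gobi"}, CARS, DEALER1))                                 # False
```

The second form matches queries with or without autoTrans and navi, which is the "different sets of attributes" effect of o[S].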
15
Capability-Based Query-Plan Selection
A capability-based query optimizer first considers which queries it can ask at the sources that will help answer the query. Their answers supply bindings for more attributes, which may make further queries at the sources possible. This process repeats until either:
- the query is feasible: we can answer it, or
- the query is impossible: no more valid query forms can be constructed.
16
Capability-Based Query-Plan Selection
The simplest form of mediator query for which we need to apply this strategy is a join of relations, each of which is available, with certain adornments, at one or more sources. In that case, the search strategy is to try to get tuples for each relation in the join by providing enough argument bindings that some source allows a query about that relation to be asked and answered.
17
Capability-Based Query-Plan Selection
Example: relations Autos(serial, model, color) and Options(serial, option), with adornments
- Autos: ubf
- Options: two adornments, bu and uc[autoTrans, navi]
Query: “find the serial numbers and colors of Gobi models with a navigation system”
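Two plans that these adornments permit can be sketched as follows, with each source simulated by a function that only accepts the bindings its adornment allows. The source contents are hypothetical, as are the function names. Plan 1 asks Autos once and Options once per Gobi car; plan 2 asks each source once and joins at the mediator. A plan that starts with Options and then asks Autos by serial number is not possible, because serial is u in Autos' adornment.

```python
# Hypothetical source contents, for illustration only.
AUTOS = [("s1", "Gobi", "blue"), ("s2", "Gobi", "red"), ("s3", "Zebra", "green")]
OPTIONS = [("s1", "navi"), ("s1", "autoTrans"), ("s2", "autoTrans"), ("s3", "navi")]


def autos_ubf(model):
    """Autos^ubf: serial cannot be specified, model must be, color left free."""
    return [t for t in AUTOS if t[1] == model]


def options_bu(serial):
    """Options^bu: serial must be specified, option cannot be."""
    return [t for t in OPTIONS if t[0] == serial]


def options_uc(option):
    """Options^uc[autoTrans, navi]: option must be one of the two listed values."""
    if option not in {"autoTrans", "navi"}:
        raise ValueError("query does not match the adornment")
    return [t for t in OPTIONS if t[1] == option]


def plan_1():
    # Ask Autos for Gobi cars, then ask Options once per returned serial number.
    return {
        (serial, color)
        for (serial, _model, color) in autos_ubf("Gobi")
        if any(opt == "navi" for (_s, opt) in options_bu(serial))
    }


def plan_2():
    # Ask both sources once each, then join on serial at the mediator.
    with_navi = {serial for (serial, _opt) in options_uc("navi")}
    return {(serial, color) for (serial, _m, color) in autos_ubf("Gobi") if serial in with_navi}


print(plan_1(), plan_2())
```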
18
Adding Cost-Based Optimization
Cost-based optimization requires that the mediator know the cost of the queries involved. Since the sources are usually independent of the mediator, this cost is difficult to estimate.
19
Optimizing Mediator Queries
Chain algorithm
- a greedy algorithm that sends a sequence of requests to its sources
- always finds a way to answer the query, provided at least one solution exists
- handles queries that involve joins of relations that come from the sources, followed by an optional selection
- such queries can be expressed as Datalog rules, which we use to describe the relational-algebra query
20
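Simplified Adornment Notation
- Subgoals of the mediator query use only b (bound) and f (free) adornments.
- A source attribute adorned c[S] can be used as soon as we know all possible values of interest for that attribute.
- o[S] and u are treated like f (free): they place no binding requirement on the subgoal.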
21
Example:
Autos^buu(serial, model, color)
Options^uc[autoTrans, navi](serial, option)
Query: “find the serial numbers and colors of Gobi models with a navigation system”
Answer(s, c) <- Autos^fbf(s, “Gobi”, c) AND Options^fb(s, “navi”)
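With the adornments on this slide, Autos^buu needs a serial number, so the subgoal Autos^fbf cannot be answered first, while Options^fb(s, "navi") matches uc[autoTrans, navi] because the constant "navi" is in the set. Once Options has supplied serial numbers, the Autos subgoal becomes bbf, which matches buu; since model is u at the source, the "Gobi" condition is applied at the mediator. The sketch below assumes hypothetical source contents and function names.

```python
# Hypothetical source contents, for illustration only.
AUTOS = [("s1", "Gobi", "blue"), ("s2", "Gobi", "red"), ("s3", "Zebra", "green")]
OPTIONS = [("s1", "navi"), ("s2", "autoTrans"), ("s3", "navi")]


def autos_buu(serial):
    """Autos^buu: serial must be specified; model and color cannot be."""
    return [t for t in AUTOS if t[0] == serial]


def options_uc(option):
    """Options^uc[autoTrans, navi]: serial cannot be specified; option must be one of two values."""
    if option not in {"autoTrans", "navi"}:
        raise ValueError("query does not match the adornment")
    return [t for t in OPTIONS if t[1] == option]


def answer():
    """Answer(s, c) <- Autos(s, "Gobi", c) AND Options(s, "navi")."""
    result = set()
    # Step 1: only Options can be asked first ("navi" is a constant; s is still free).
    for serial, _opt in options_uc("navi"):
        # Step 2: s is now bound, so Autos^buu can be asked once per serial.
        for _s, model, color in autos_buu(serial):
            # model is u at the source, so the "Gobi" condition is checked here.
            if model == "Gobi":
                result.add((serial, color))
    return result


print(answer())
```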
22
Obtaining Answers for Subgoals
Suppose we have a subgoal R^{x1x2…xn}(a1, a2, …, an), where
- each xi is b or f,
- R is a relation that can be queried at some source, and
- y1y2…yn is one of the adornments for R at its source.
It is possible to obtain a relation for the subgoal provided that, for each i = 1, 2, …, n:
- if yi is b or of the form c[S], then xi = b;
- if xi = f, then yi is not output-restricted (i.e., not primed).
We then say that the adornment on the subgoal matches the adornment at the source.
23
Example: subgoal R^bbff(p, q, r, s). The adornments for R at its sources are:
- α1 = f c[S1] u o[S2]: matches, provided the value supplied for q is a member of S1
- α2 = c[S3] b f c[S4]: does not match, because c[S4] requires a value for the fourth argument, but s is free
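The matching rule translates directly into a short function. This is a sketch assuming a subgoal adornment is a string of b/f and a source adornment is a list of (kind, allowed_values, primed) codes; matches is a hypothetical name, and S1 to S4 are placeholder sets, since their contents do not affect whether the adornments match.

```python
# A source code is (kind, allowed_values, primed); kind is one of b, f, u, c, o.
def matches(sub_adorn, src_adorn):
    """True if a subgoal adornment (string of 'b'/'f') matches a source adornment."""
    if len(sub_adorn) != len(src_adorn):
        return False
    for x, (kind, _allowed, primed) in zip(sub_adorn, src_adorn):
        if kind in ("b", "c") and x != "b":
            return False  # the source needs a value the mediator does not have yet
        if x == "f" and primed:
            return False  # the mediator needs this value, but the source will not output it
    return True


# Placeholder sets: their contents do not affect matching itself.
S1, S2, S3, S4 = (frozenset({i}) for i in range(4))

alpha1 = [("f", None, False), ("c", S1, False), ("u", None, False), ("o", S2, False)]
alpha2 = [("c", S3, False), ("b", None, False), ("f", None, False), ("c", S4, False)]

print(matches("bbff", alpha1))  # True
print(matches("bbff", alpha2))  # False: s is free but c[S4] needs a value
```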
24
The Chain Algorithm
A greedy approach to selecting the order in which we obtain relations for the subgoals of a Datalog rule. It is not guaranteed to provide the most efficient solution, but it provides a solution whenever one exists, and in practice it is very likely to find the most efficient one.
25
The Chain Algorithm maintains two kinds of information:
1. An adornment for each subgoal. Initially, the adornment for a subgoal has b in a position if and only if the mediator query provides a constant binding for the corresponding argument of that subgoal, as in:
   Answer(s, c) <- Autos^fbf(s, “Gobi”, c) AND Options^fb(s, “navi”)
2. A relation X that is the join of the relations for all the subgoals resolved so far. Initially, since no subgoal has been resolved, X contains only the empty tuple.
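Computing a subgoal's adornment is a one-liner once constants and variables can be told apart. This sketch assumes a hypothetical convention in which variables are strings starting with "?" and everything else is a constant; adornment is a hypothetical name. Passing the set of already-bound variables shows how the adornment improves as subgoals are resolved.

```python
def is_var(arg):
    """Assumed convention for this sketch: variables are strings starting with '?'."""
    return isinstance(arg, str) and arg.startswith("?")


def adornment(args, bound_vars=frozenset()):
    """b where the argument is a constant or an already-bound variable, f elsewhere."""
    return "".join("b" if not is_var(a) or a in bound_vars else "f" for a in args)


autos_args = ("?s", "Gobi", "?c")
options_args = ("?s", "navi")

print(adornment(autos_args), adornment(options_args))  # fbf fb
# After a subgoal that binds s has been resolved, the Autos adornment improves:
print(adornment(autos_args, {"?s"}))                     # bbf
```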
26
Consider the mediator query
Q: Answer(c) <- R^bf(1, a) AND S^ff(a, b) AND T^ff(b, c)
There are three sources that provide answers to queries about R, S, and T, respectively:
Relation    R        S             T
Data        w  x     x  y          y  z
            1  2     2  4          4  6
            1  3     3  5          5  7
            1  4                   5  8
Adornment   bf       c'[2,3,5]f    bu
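The Chain Algorithm on this example can be sketched in Python as below. Assumptions: each source is simulated by an in-memory list of tuples plus its adornments, variables are strings starting with "?", and chain, ask_source, and SOURCES are hypothetical names. The sketch keeps X as a list of variable bindings, greedily resolves the first subgoal whose adornment matches some source, and asks once per row of X (a real mediator would ask once per distinct binding). Here R is resolved first (a ∈ {2, 3, 4}); then S, where the binding a = 4 cannot be asked because 4 is not in {2, 3, 5}; then T, giving the answer {6, 7, 8}.

```python
# Source adornment code: (kind, allowed_values, primed); kind in b, f, u, c, o.
def is_var(arg):
    # Assumed convention for this sketch: variables are strings starting with "?".
    return isinstance(arg, str) and arg.startswith("?")


def matches(sub_adorn, src_adorn):
    for x, (kind, _allowed, primed) in zip(sub_adorn, src_adorn):
        if (kind in ("b", "c") and x != "b") or (x == "f" and primed):
            return False
    return True


def ask_source(data, src_adorn, values):
    """Simulate one request to a source; values[j] is a known value or None."""
    spec = {}
    for j, (kind, allowed, _primed) in enumerate(src_adorn):
        v = values[j]
        if kind == "c" and v not in allowed:
            return []  # this binding may not be asked about, so it yields no tuples
        if kind in ("b", "c") or (kind == "f" and v is not None) or (
            kind == "o" and v in allowed
        ):
            spec[j] = v
        # u, or o with an unusable value: left unspecified, filtered by the caller
    return [t for t in data if all(t[j] == v for j, v in spec.items())]


def chain(head_vars, subgoals, sources):
    """Greedy Chain Algorithm sketch. Returns the answer set, or None if no plan exists."""
    X = [{}]            # join of resolved subgoals; initially just the empty tuple
    bound = set()       # variables bound by X
    unresolved = list(range(len(subgoals)))
    while unresolved:
        choice = None
        for i in unresolved:  # greedy: take the first subgoal some source can answer
            rel, args = subgoals[i]
            sub = "".join("b" if not is_var(a) or a in bound else "f" for a in args)
            src = next((ad for ad in sources[rel]["adornments"] if matches(sub, ad)), None)
            if src is not None:
                choice = (i, src)
                break
        if choice is None:
            return None  # remaining subgoals cannot be resolved: no solution
        i, src = choice
        rel, args = subgoals[i]
        new_X = []
        for row in X:
            values = [row.get(a) if is_var(a) else a for a in args]
            for t in ask_source(sources[rel]["data"], src, values):
                if any(v is not None and t[j] != v for j, v in enumerate(values)):
                    continue  # disagrees with a constant or an earlier binding
                ext = dict(row)
                if all(ext.setdefault(a, val) == val for a, val in zip(args, t) if is_var(a)):
                    if ext not in new_X:
                        new_X.append(ext)
        X = new_X
        bound |= {a for a in args if is_var(a)}
        unresolved.remove(i)
    return {tuple(row[v] for v in head_vars) for row in X}


SOURCES = {
    "R": {"adornments": [[("b", None, False), ("f", None, False)]],     # bf
          "data": [(1, 2), (1, 3), (1, 4)]},
    "S": {"adornments": [[("c", {2, 3, 5}, True), ("f", None, False)]],  # c'[2,3,5]f
          "data": [(2, 4), (3, 5)]},
    "T": {"adornments": [[("b", None, False), ("u", None, False)]],     # bu
          "data": [(4, 6), (5, 7), (5, 8)]},
}
Q = [("R", (1, "?a")), ("S", ("?a", "?b")), ("T", ("?b", "?c"))]
print(sorted(chain(["?c"], Q, SOURCES)))  # [(6,), (7,), (8,)]
```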
27
Thank You