1
SECTIONS 21.4 – 21.5 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin INFORMATION INTEGRATION
2
Presentation Outline
21.4 Capability Based Optimization
21.4.1 The Problem of Limited Source Capabilities
21.4.2 A Notation for Describing Source Capabilities
21.4.3 Capability-Based Query-Plan Selection
21.4.4 Adding Cost-Based Optimization
21.5 Optimizing Mediator Queries
21.5.1 Simplified Adornment Notation
21.5.2 Obtaining Answers for Subgoals
21.5.3 The Chain Algorithm
21.5.4 Incorporating Union Views at the Mediator
3
21.4 Capability Based Optimization Introduction A typical DBMS estimates the cost of each query plan and picks what it believes to be the best. A mediator, by contrast, typically has little knowledge of how long its sources will take to answer. Optimization of mediator queries therefore cannot rely on a cost measure alone to select a query plan. Instead, optimization by the mediator follows capability-based optimization: it first asks which plans the sources are able to execute at all.
4
21.4.1 The Problem of Limited Source Capabilities Many sources have only web-based interfaces. Web sources usually allow querying only through a query form. E.g. the Amazon.com interface allows us to query about books in many different ways, but we cannot ask questions that are too general, e.g. SELECT * FROM books;
5
21.4.1 The Problem of Limited Source Capabilities (cont'd) Reasons why a source may limit the ways in which queries can be asked:
The earliest databases did not use a relational DBMS that supports SQL queries.
Indexes on a large database may make certain queries feasible, while others are too expensive to execute.
Security reasons. E.g. a medical database may answer queries about averages, but will not disclose details of a particular patient's information.
6
21.4.2 A Notation for Describing Source Capabilities For relational data, the legal forms of queries are described by adornments. Adornments are sequences of codes that represent the requirements for the attributes of the relation, in their standard order:
f (free): a value for the attribute can be specified or not
b (bound): a value must be specified for the attribute, but any value is allowed
u (unspecified): a value may not be specified for the attribute
7
21.4.2 A Notation for Describing Source Capabilities (cont'd)
c[S] (choice from set S): a value must be specified, and it must come from the finite set S
o[S] (optional from set S): either no value is specified, or the specified value comes from the finite set S
A prime (e.g. f') indicates that the attribute is not part of the output of the query.
A capabilities specification is a set of adornments. A query must match one of the adornments in the source's capabilities specification.
8
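21.4.2 A Notation for Describing Source Capabilities (cont'd) E.g. Dealer 1 is a source of data in the form: Cars(serialNo, model, color, autoTrans, navi). A query form with the adornment b'uuuu requires the user to specify serialNo, which is not part of the output; the other four attributes may not be specified and are returned in the output.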
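The adornment codes above can be checked mechanically. The following is a minimal sketch, assuming a query is represented as a list with one entry per attribute (None where no value is specified); the function names `parse_adornment`, `query_matches` and `matches_capabilities` are illustrative and do not come from the slides.

```python
import re

# One component: a code letter, an optional prime, and a bracketed set for c/o.
_COMPONENT = re.compile(r"([bfuco])('?)(?:\[([^\]]*)\])?")

def parse_adornment(adornment):
    """Split e.g. "c'[2,3,5]f" into (code, primed, allowed_set) triples."""
    parts, pos = [], 0
    while pos < len(adornment):
        m = _COMPONENT.match(adornment, pos)
        if m is None:
            raise ValueError(f"bad adornment at position {pos}: {adornment!r}")
        code, primed, members = m.group(1), m.group(2) == "'", m.group(3)
        if (code in "co") != (members is not None):
            raise ValueError(f"only c and o take a set: {adornment!r}")
        allowed = None
        if members is not None:
            allowed = {v.strip() for v in members.split(",")}
        parts.append((code, primed, allowed))
        pos = m.end()
    return parts

def query_matches(adornment, values):
    """values[i] is the string specified for attribute i, or None if unspecified."""
    parts = parse_adornment(adornment)
    if len(parts) != len(values):
        return False
    for (code, _primed, allowed), value in zip(parts, values):
        if code == "b" and value is None:
            return False          # bound: a value is required
        if code == "u" and value is not None:
            return False          # unspecified: no value may be given
        if code == "c" and (value is None or value not in allowed):
            return False          # choice: required, and must come from S
        if code == "o" and value is not None and value not in allowed:
            return False          # optional: if given, must come from S
    return True                   # f (free) accepts anything

def matches_capabilities(spec, values):
    """A query is legal if it matches at least one adornment in the specification."""
    return any(query_matches(a, values) for a in spec)
```

For the Cars source with adornment b'uuuu, a query that specifies only serialNo is accepted, while one that specifies only model is rejected.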
9
21.4.3 Capability-Based Query-Plan Selection Given a query at the mediator, a capability-based query optimizer first considers what queries it can ask at the sources to help answer the query. The bindings obtained from those answers may make further source queries legal, so the process is repeated until either:
Enough queries have been asked at the sources to resolve all the conditions of the mediator query, and the query is therefore answered. Such a plan is called feasible.
No more valid forms of source queries can be constructed, yet the mediator query still cannot be answered. In that case the mediator query is impossible.
10
21.4.3 Capability-Based Query-Plan Selection (cont'd) The simplest form of mediator query for which we need to apply the above strategy is a join of relations. E.g. we have two sources for dealer 2: Autos(serial, model, color) and Options(serial, option). Suppose that ubf is the sole adornment for Autos, and Options has two adornments, bu and uc[autoTrans, navi]. The query is: find the serial numbers and colors of Gobi models with a navigation system.
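To make the dealer 2 example concrete, here is a minimal sketch of two plans that respect the stated adornments. The tuples in `AUTOS` and `OPTIONS` are hypothetical toy data (the slides give only the schemas), and the function names are illustrative. A third idea, querying Options for navi first and then asking Autos about each serial number, is not available because the adornment ubf forbids specifying serial at Autos.

```python
# Hypothetical toy data; only the schemas come from the slides.
AUTOS = [("s1", "Gobi", "red"), ("s2", "Gobi", "blue"), ("s3", "Dune", "red")]
OPTIONS = [("s1", "navi"), ("s2", "autoTrans"), ("s3", "navi")]

def autos_ubf(model, color=None):
    """Autos, adornment ubf: serial may not be given, model must be, color may be."""
    return [t for t in AUTOS if t[1] == model and (color is None or t[2] == color)]

def options_bu(serial):
    """Options, adornment bu: serial must be given, option may not be."""
    return [t for t in OPTIONS if t[0] == serial]

def options_uc(option):
    """Options, adornment uc[autoTrans, navi]: option must come from the set."""
    if option not in {"autoTrans", "navi"}:
        raise ValueError("option must be autoTrans or navi")
    return [t for t in OPTIONS if t[1] == option]

def plan_join():
    """Ask each source once, then join on serial at the mediator."""
    gobis = autos_ubf("Gobi")
    with_navi = {serial for serial, _ in options_uc("navi")}
    return {(s, c) for s, _, c in gobis if s in with_navi}

def plan_probe():
    """Ask Autos once, then probe Options (form bu) once per Gobi serial."""
    result = set()
    for s, _, c in autos_ubf("Gobi"):
        if any(opt == "navi" for _, opt in options_bu(s)):
            result.add((s, c))
    return result
```

Both plans are feasible and return the same answer; which one is cheaper depends on how many source queries each issues and how large the answers are.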
11
21.4.4 Adding Cost-Based Optimization The mediator's query optimizer is not done once the capabilities of the sources have been examined. Having found the feasible plans, it must choose among them. Intelligent, cost-based query optimization requires that the mediator know a great deal about the costs of the queries involved. Because the sources are independent of the mediator, these costs are difficult to estimate.
12
21.5 Optimizing Mediator Queries Chain algorithm: a greedy algorithm that finds a way to answer the query by sending a sequence of requests to its sources. It will always find a solution, assuming at least one solution exists. The solution may not be optimal.
13
21.5.1 Simplified Adornment Notation A query at the mediator is limited to b (bound) and f (free) adornments. We use the following convention for describing adornments: name^adornments(attributes), e.g. R^bf(1,a), where:
name is the name of the relation
the number of adornments = the number of attributes
14
21.5.2 Obtaining Answers for Subgoals Rules for subgoals and sources: Suppose we have the subgoal R^(x1 x2 … xn)(a1, a2, …, an), and the source adornment for R is y1 y2 … yn. The adornment on the subgoal matches the adornment at the source if, for every i:
If yi is b or c[S], then xi = b.
If xi = f, then yi is not output restricted (not primed).
Note that if yi is f, u, or o[S], then xi can be either b or f.
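The matching rule can be written as a short predicate. This is a minimal sketch, assuming the subgoal adornment is a string over b and f and the source adornment is a list of component strings such as "c'[2,3,5]"; the name `subgoal_matches` is illustrative.

```python
def subgoal_matches(subgoal, source):
    """Return True if subgoal adornment x1..xn matches source adornment y1..yn.

    subgoal: string of 'b'/'f', e.g. "bf"
    source:  list of components, e.g. ["c'[2,3,5]", "f"]
    """
    if len(subgoal) != len(source):
        return False
    for x, y in zip(subgoal, source):
        code = y[0]               # b, f, u, c or o
        primed = y[1:2] == "'"    # a prime directly follows the code letter
        if code in "bc" and x != "b":
            return False          # source needs a value the mediator cannot supply
        if x == "f" and primed:
            return False          # mediator needs a value the source will not output
    return True
```

This check works only at the level of adornments. For a c[S] component, whether a particular bound value actually lies in S is decided later, when the source query is issued.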
15
21.5.3 The Chain Algorithm Maintains 2 types of information:
An adornment for each subgoal.
A relation X that is the join of the relations for all the subgoals that have been resolved.
Initially, a position in a subgoal's adornment is b if and only if the mediator query provides a constant binding for the corresponding argument of that subgoal; all other positions are f. Initially, X is a relation over no attributes, containing just an empty tuple.
16
21.5.3 The Chain Algorithm (cont'd) First, initialize the adornments of the subgoals and X. Then, repeatedly select a subgoal that can be resolved. Let R^α(a1, a2, …, an) be the subgoal:
1. Wherever α has a b, the corresponding argument of R is either a constant or a variable in the schema of X. Project X onto those of its variables that appear in R.
17
21.5.3 The Chain Algorithm (cont'd)
2. For each tuple t in the projection of X, issue a query to the source as follows (β is a source adornment):
If a component of β is b, then the corresponding component of α is b, and we can use the corresponding component of t (or the constant in the subgoal) for the source query.
If a component of β is c[S], and the corresponding component of t is in S, then the corresponding component of α is b, and we can use the corresponding component of t for the source query.
If a component of β is f, and the corresponding component of α is b, provide the known value for this component in the source query.
18
21.5.3 The Chain Algorithm (cont'd)
If a component of β is u, then provide no binding for this component in the source query.
If a component of β is o[S], and the corresponding component of α is f, then treat it as if it were f.
If a component of β is o[S], and the corresponding component of α is b, then treat it as if it were c[S].
3. Every variable among a1, a2, …, an is now bound. For each remaining unresolved subgoal, change its adornment so that any position holding one of these variables is b.
19
21.5.3 The Chain Algorithm (cont'd)
4. Replace X with X ⋈ π_S(R), where S is the set of all variables among a1, a2, …, an.
5. Project out of X all components that correspond to variables that do not appear in the head or in any unresolved subgoal.
If every subgoal is resolved, then X is the answer. If some subgoal remains unresolved and no subgoal can be resolved, then the algorithm fails.
20
21.5.3 The Chain Algorithm Example
Mediator query: Q: Answer(c) ← R^bf(1,a) AND S^ff(a,b) AND T^ff(b,c)
Sources:
Relation  Attributes  Data                   Adornment
R         (w, x)      (1,2), (1,3), (1,4)    bf
S         (x, y)      (2,4), (3,5)           c'[2,3,5]f
T         (y, z)      (4,6), (5,7), (5,8)    bu
21
21.5.3 The Chain Algorithm Example (cont'd) Initially, the adornments on the subgoals are the same as in Q, and X contains just an empty tuple. S and T cannot be resolved because each has the adornment ff, while their sources require the first argument to be bound (c'[2,3,5] for S, b for T). R^bf(1,a) can be resolved because its adornment is matched by the source's adornment bf. Send R(w,x) with w=1 to get the tuples of R listed on the previous slide.
22
21.5.3 The Chain Algorithm Example (cont'd) Project the subgoal's relation onto its second component, since only the second component of R(1,a) is a variable. This is joined with X, resulting in X equaling this relation. Change the adornment on S from ff to bf.
X:
a
2
3
4
23
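21.5.3 The Chain Algorithm Example (cont'd) Now we resolve S^bf(a,b): Project X onto a, resulting in X itself. Now query S for tuples whose first attribute equals a value of a in X. The value a = 4 is not in the choice set [2,3,5], so no source query can be issued for it. The result is:
a b
2 4
3 5
Join this relation with X, and remove a because it appears neither in the head nor in any unresolved subgoal. Change the adornment on T from ff to bf.
X:
b
4
5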
24
21.5.3 The Chain Algorithm Example (cont'd) Now we resolve T^bf(b,c). Querying T with b = 4 and b = 5 gives:
b c
4 6
5 7
5 8
Join this relation with X and project onto the c attribute to get the relation for the head. The solution is {(6), (7), (8)}.
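The example can be replayed in code. The following is a minimal sketch of the chain algorithm, not a definitive implementation, under several stated assumptions: source adornments are restricted to b, f, u and c[S]; each adornment component is a (code, primed, set) triple; variables are strings and constants are integers; each subgoal uses distinct variables; and the simulated source returns whole tuples even for primed attributes, since the mediator already knows the value it supplied. All names (`chain`, `matches`, `ask`) are illustrative.

```python
def is_var(arg):
    return isinstance(arg, str)   # assumption: variables are strings, constants ints

def matches(alpha, beta):
    """Subgoal adornment alpha (list of 'b'/'f') against source adornment beta."""
    return all(not (code in "bc" and x != "b") and not (x == "f" and primed)
               for x, (code, primed, _s) in zip(alpha, beta))

def ask(data, sent):
    """Simulated source query: sent[i] is a value, or None for no binding."""
    return [t for t in data if all(s is None or s == v for s, v in zip(sent, t))]

def chain(head, subgoals, sources):
    X = [{}]                      # relation over no attributes: one empty tuple
    bound, todo = set(), list(subgoals)
    while todo:
        pick = None
        for name, args in todo:   # greedy: take the first resolvable subgoal
            alpha = ["f" if is_var(a) and a not in bound else "b" for a in args]
            if matches(alpha, sources[name][0]):
                pick = (name, args)
                break
        if pick is None:
            return None           # unresolved subgoals remain: the algorithm fails
        todo.remove(pick)
        name, args = pick
        beta, data = sources[name]
        shared = [a for a in args if is_var(a) and a in bound]
        rows = []
        # Step 1-2: one source query per tuple t of the projection of X.
        for t in {tuple(row[v] for v in shared) for row in X}:
            known = dict(zip(shared, t))
            want = [known.get(a) if is_var(a) else a for a in args]
            if any(code == "c" and w not in s for (code, _p, s), w in zip(beta, want)):
                continue          # value outside the choice set: no query possible
            sent = [None if code == "u" else w for (code, _p, _s), w in zip(beta, want)]
            for tup in ask(data, sent):
                # Filter at the mediator for values we could not send (code u).
                if all(w is None or w == v for w, v in zip(want, tup)):
                    rows.append({a: v for a, v in zip(args, tup) if is_var(a)})
        # Step 3-4: bind the subgoal's variables and join with X.
        X = [{**row, **r} for row in X for r in rows
             if all(row[v] == r[v] for v in shared)]
        bound |= {a for a in args if is_var(a)}
        # Step 5: keep only variables still needed.
        keep = set(head) | {a for _n, aa in todo for a in aa if is_var(a)}
        X = [dict(items) for items in
             {tuple(sorted((v, row[v]) for v in row if v in keep)) for row in X}]
    return {tuple(row[v] for v in head) for row in X}

# Sources and query from the example slides.
SOURCES = {
    "R": ([("b", False, None), ("f", False, None)], [(1, 2), (1, 3), (1, 4)]),
    "S": ([("c", True, {2, 3, 5}), ("f", False, None)], [(2, 4), (3, 5)]),
    "T": ([("b", False, None), ("u", False, None)], [(4, 6), (5, 7), (5, 8)]),
}
QUERY = [("R", (1, "a")), ("S", ("a", "b")), ("T", ("b", "c"))]
answer = chain(("c",), QUERY, SOURCES)
```

With the example data, `answer` is the set {(6,), (7,), (8,)}, matching the solution on the slide. Dropping the R subgoal leaves S and T with adornment ff, neither can be resolved, and `chain` returns None.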
25
21.5.4 Incorporating Union Views at the Mediator The Chain Algorithm as presented does not consider that several sources can contribute tuples to the same relation. Complexity arises when some sources have tuples to contribute that other sources do not have. To handle this, we can either consult all sources, or make a best effort to return all the answers.
26
21.5.4 Incorporating Union Views at the Mediator (cont'd)
Consulting All Sources: We can resolve a subgoal only when each source for its relation has an adornment matched by the current adornment of the subgoal. This is less practical because it makes queries harder to answer, and impossible to answer if any source is down.
Best Efforts: We need only 1 source with a matching adornment to resolve a subgoal. The chain algorithm must then be modified to revisit each subgoal when more of that subgoal's arguments become bound, since additional sources may match at that point.
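The difference between the two policies comes down to "all" versus "any" over the sources of a relation. This is a minimal sketch, assuming each source adornment is a string over the single-letter codes b, f and u only (no primes, c[S] or o[S]); the two source adornments in the usage note are hypothetical, and the names `matches` and `can_resolve` are illustrative.

```python
def matches(alpha, beta):
    """alpha: subgoal adornment over b/f; beta: source adornment over b/f/u."""
    if len(alpha) != len(beta):
        return False
    # The only conflict for these codes: the source requires a bound value
    # in a position where the mediator has none.
    return all(not (y == "b" and x != "b") for x, y in zip(alpha, beta))

def can_resolve(alpha, source_adornments, policy):
    """policy "all": every source must match; policy "best": one match suffices."""
    hits = [matches(alpha, beta) for beta in source_adornments]
    return all(hits) if policy == "all" else any(hits)
```

For a relation with two hypothetical sources adorned bf and ff, a subgoal with adornment ff is resolvable under best efforts but not under consulting all sources. Once its first argument becomes bound, the adornment becomes bf and both sources match, which is why best efforts needs to revisit the subgoal.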
27
Questions