SECTIONS 21.4 – 21.5 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin INFORMATION INTEGRATION.


1 SECTIONS 21.4 – 21.5 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin INFORMATION INTEGRATION

2 Presentation Outline
- 21.4 Capability Based Optimization
- 21.4.1 The Problem of Limited Source Capabilities
- 21.4.2 A Notation for Describing Source Capabilities
- 21.4.3 Capability-Based Query-Plan Selection
- 21.4.4 Adding Cost-Based Optimization
- 21.5 Optimizing Mediator Queries
- 21.5.1 Simplified Adornment Notation
- 21.5.2 Obtaining Answers for Subgoals
- 21.5.3 The Chain Algorithm
- 21.5.4 Incorporating Union Views at the Mediator

3 21.4 Capability Based Optimization
- Introduction
- A typical DBMS estimates the cost of each query plan and picks the one it believes to be best.
- A mediator, in contrast, typically has little knowledge of how long its sources will take to answer.
- Optimization of mediator queries therefore cannot rely on cost measures alone to select a query plan.
- Instead, optimization by a mediator follows capability-based optimization.

4 21.4.1 The Problem of Limited Source Capabilities
- Many sources have only Web-based interfaces.
- Web sources usually allow querying only through a query form.
- E.g., the Amazon.com interface allows us to query about books in many different ways.
- But we cannot ask questions that are too general.
- E.g., SELECT * FROM books;

5 21.4.1 The Problem of Limited Source Capabilities (con’t)
- Reasons why a source may limit the ways in which queries can be asked:
- The earliest databases did not use a relational DBMS that supports SQL queries.
- Indexes on a large database may make certain queries feasible, while others are too expensive to execute.
- Security reasons. E.g., a medical database may answer queries about averages, but won't disclose the details of a particular patient's information.

6 21.4.2 A Notation for Describing Source Capabilities
- For relational data, the legal forms of queries are described by adornments.
- Adornments: sequences of codes that represent the requirements for the attributes of the relation, in their standard order.
- f (free): the attribute can be specified or not.
- b (bound): we must specify a value for the attribute, but any value is allowed.
- u (unspecified): we are not permitted to specify a value for the attribute.

7 21.4.2 A Notation for Describing Source Capabilities (cont’d)
- c[S] (choice from set S): a value must be specified, and that value must be from the finite set S.
- o[S] (optional from set S): we either do not specify a value or we specify a value from the finite set S.
- A prime (e.g., f') specifies that an attribute is not part of the output of the query.
- A capabilities specification is a set of adornments.
- A query must match one of the adornments in the source's capabilities specification.

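8 21.4.2 A Notation for Describing Source Capabilities (cont’d)
- E.g., Dealer 1 is a source of data in the form Cars(serialNo, model, color, autoTrans, navi).
- For a query form in which the user specifies a serial number and receives the other four attributes as output, the adornment is b'uuuu: serialNo must be bound and is not part of the output, while the other attributes may not be specified.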
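
The adornment codes above can be made concrete with a small checker. The following Python sketch is illustrative only: the names `parse_adornment` and `query_is_legal`, and the convention that `None` means "no value specified", are assumptions introduced here and are not part of the presentation.

```python
import re

def parse_adornment(text):
    """Split an adornment such as "b'uuuu" or "uc[autoTrans, navi]" into codes.

    Each code is a tuple (kind, allowed, primed): kind is one of f, b, u, c, o;
    allowed is the finite set S for c[S] / o[S], otherwise None.
    """
    codes = []
    for kind, prime, values in re.findall(r"([fbuco])('?)(?:\[([^\]]*)\])?", text):
        allowed = {v.strip() for v in values.split(",")} if values else None
        codes.append((kind, allowed, prime == "'"))
    return codes

def query_is_legal(adornment, bindings):
    """Check one query form against one adornment.

    bindings has one entry per attribute, in the relation's standard order;
    None means the query does not specify a value for that attribute.
    """
    codes = parse_adornment(adornment)
    if len(codes) != len(bindings):
        return False
    for (kind, allowed, _primed), value in zip(codes, bindings):
        if kind == "b" and value is None:
            return False            # bound: a value is required
        if kind == "u" and value is not None:
            return False            # unspecified: no value may be given
        if kind == "c" and (value is None or value not in allowed):
            return False            # choice: a value from S is required
        if kind == "o" and value is not None and value not in allowed:
            return False            # optional: if given, it must come from S
    return True

# Dealer 1's form b'uuuu: only the serial number may (and must) be given.
print(query_is_legal("b'uuuu", ["1234", None, None, None, None]))
print(query_is_legal("b'uuuu", [None, "Gobi", None, None, None]))
```

A full capabilities specification is a set of adornments, so a query would be accepted if `query_is_legal` holds for at least one adornment in that set.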

9 21.4.3 Capability-Based Query-Plan Selection
- Given a query at the mediator, a capability-based query optimizer first considers what queries it can ask at the sources to help answer the query.
- The bindings obtained from those answers may make further source queries legal, so the process is repeated until either:
- Enough queries have been asked at the sources to resolve all the conditions of the mediator query, so the query can be answered. Such a plan is called feasible.
- No more valid forms of source queries can be constructed, yet the mediator query still cannot be answered. The query is then impossible.

10 21.4.3 Capability-Based Query-Plan Selection (cont’d)
- The simplest form of mediator query for which we need to apply the above strategy is a join of relations.
- E.g., we have two sources for dealer 2:
- Autos(serial, model, color)
- Options(serial, option)
- Suppose that ubf is the sole adornment for Autos, and Options has two adornments, bu and uc[autoTrans, navi].
- The query is: find the serial numbers and colors of Gobi models with a navigation system.
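
To illustrate how the adornments constrain the plan for this query, here is a minimal Python sketch of two feasible plans. The sample tuples, the model name "Atlas", and the function names are hypothetical; each simulated source function accepts only the bindings its adornment allows.

```python
# Hypothetical sample data for dealer 2's two sources.
AUTOS = [("s1", "Gobi", "red"), ("s2", "Gobi", "blue"), ("s3", "Atlas", "red")]
OPTIONS = [("s1", "navi"), ("s2", "autoTrans"), ("s3", "navi")]

def autos_by_model(model):
    """Autos with adornment ubf: serial may not be given, model must be given."""
    return [row for row in AUTOS if row[1] == model]

def options_by_serial(serial):
    """Options with adornment bu: serial must be given, option may not be."""
    return [row for row in OPTIONS if row[0] == serial]

def options_by_option(option):
    """Options with adornment uc[autoTrans, navi]: option must come from the set."""
    if option not in {"autoTrans", "navi"}:
        raise ValueError("option must be autoTrans or navi")
    return [row for row in OPTIONS if row[1] == option]

def plan_autos_first():
    """Ask Autos for Gobis, then ask Options (bu) once per returned serial."""
    answer = set()
    for serial, _model, color in autos_by_model("Gobi"):
        if (serial, "navi") in options_by_serial(serial):
            answer.add((serial, color))
    return answer

def plan_join_at_mediator():
    """Ask Options (uc) for navi cars and Autos for Gobis; join at the mediator."""
    navi_serials = {serial for serial, _opt in options_by_option("navi")}
    return {(serial, color)
            for serial, _model, color in autos_by_model("Gobi")
            if serial in navi_serials}

print(plan_autos_first())
print(plan_join_at_mediator())
```

A third idea, fetching the navi serial numbers first and then querying Autos by serial, is not feasible here, because the only adornment for Autos (ubf) forbids binding the serial number.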

11 21.4.4 Adding Cost-Based Optimization
- The mediator's query optimizer is not done when the capabilities of the sources have been examined.
- Having found the feasible plans, it must choose among them.
- Making an intelligent, cost-based choice requires that the mediator know a great deal about the costs of the queries involved.
- Sources are independent of the mediator, so it is difficult to estimate these costs.

12 21.5 Optimizing Mediator Queries
- Chain algorithm: a greedy algorithm that finds a way to answer the query by sending a sequence of requests to its sources.
- It will always find a solution, assuming at least one solution exists.
- The solution may not be optimal.

13 21.5.1 Simplified Adornment Notation
- A query at the mediator is limited to b (bound) and f (free) adornments.
- We use the following convention for describing adornments: name^adornments(attributes)
- where name is the name of the relation, the adornments are written as a superscript on the name, and the number of adornments equals the number of attributes.

14 21.5.2 Obtaining Answers for Subgoals
- Rules for subgoals and sources:
- Suppose we have the subgoal R^(x_1 x_2 … x_n)(a_1, a_2, …, a_n), and a source adornment for R is y_1 y_2 … y_n.
- The adornment on the subgoal matches the adornment at the source if, for every position i:
- If y_i is b or c[S], then x_i = b.
- If x_i = f, then y_i is not output restricted (not primed).
- Note that if y_i is f, u, or o[S], then x_i can be either b or f.
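
The matching rules above can be sketched as a short Python function. The encoding is an assumption made for illustration: the subgoal adornment is a string over b and f, and the source adornment is a list of code strings such as "b", "f'", or "c'[2,3,5]".

```python
def matches(subgoal, source):
    """Return True if a subgoal adornment matches a source adornment.

    subgoal: string over 'b' and 'f', e.g. "bf"
    source:  list of source codes, e.g. ["c'[2,3,5]", "f"]
    """
    if len(subgoal) != len(source):
        return False
    for x, y in zip(subgoal, source):
        kind = y[0]                  # one of f, b, u, c, o
        primed = y[1:2] == "'"       # a prime marks "not part of the output"
        if kind in ("b", "c") and x != "b":
            return False             # the source demands a value here
        if x == "f" and primed:
            return False             # the mediator needs this output
    return True

print(matches("bf", ["b", "f"]))
print(matches("ff", ["c'[2,3,5]", "f"]))
print(matches("bf", ["c'[2,3,5]", "f"]))
```

Positions where the source code is f, u, or o[S] never cause a mismatch on their own, which is why only the two checks in the loop are needed.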

15 21.5.3 The Chain Algorithm
- Maintains 2 types of information:
- An adornment for each subgoal.
- A relation X that is the join of the relations for all the subgoals that have been resolved.
- Initially, a position in a subgoal's adornment is b iff the mediator query provides a constant binding for the corresponding argument of that subgoal; otherwise it is f.
- Initially, X is a relation over no attributes, containing just the empty tuple.

16 21.5.3 The Chain Algorithm (con’t)
- First, initialize the adornments of the subgoals and X.
- Then, repeatedly select a subgoal that can be resolved. Let R^α(a_1, a_2, …, a_n) be the subgoal:
1. Wherever α has a b, the corresponding argument of R is either a constant or a variable in the schema of X.
- Project X onto those of its variables that appear in R.

17 21.5.3 The Chain Algorithm (con’t)
2. For each tuple t in the projection of X, issue a query to the source as follows (β is a source adornment matched by α):
- If a component of β is b, then the corresponding component of α is b, and we can use the corresponding component of t (or the constant in the subgoal) for the source query.
- If a component of β is c[S], then the corresponding component of α is b; provided the corresponding component of t is in S, we can use it for the source query.
- If a component of β is f and the corresponding component of α is b, provide the bound value as a constant in the source query.

18 21.5.3 The Chain Algorithm (con’t)
- If a component of β is u, then provide no binding for this component in the source query.
- If a component of β is o[S] and the corresponding component of α is f, then treat it as if it were f.
- If a component of β is o[S] and the corresponding component of α is b, then treat it as if it were c[S].
3. Every variable among a_1, a_2, …, a_n is now bound. For each remaining unresolved subgoal, change its adornment so that any position holding one of these variables is b.

19 21.5.3 The Chain Algorithm (con’t)
4. Replace X with X ⋈ π_S(R), where S is the set of all variables among a_1, a_2, …, a_n.
5. Project out of X all components that correspond to variables that appear neither in the head nor in any unresolved subgoal.
- If every subgoal is resolved, then X is the answer.
- If some subgoal remains and none of the remaining subgoals can be resolved, then the algorithm fails.

20 21.5.3 The Chain Algorithm Example
- Mediator query:
- Q: Answer(c) ← R^bf(1,a) AND S^ff(a,b) AND T^ff(b,c)
- Example data and source adornments:

Relation   Adornment     Data
R(w,x)     bf            (1,2), (1,3), (1,4)
S(x,y)     c'[2,3,5]f    (2,4), (3,5)
T(y,z)     bu            (4,6), (5,7), (5,8)

21 21.5.3 The Chain Algorithm Example (con’t)
- Initially, the adornments on the subgoals are the same as in Q, and X contains just the empty tuple.
- S and T cannot be resolved yet because they each have the adornment ff, but their sources require a b or c in the first position.
- R(1,a) can be resolved because its adornment bf is matched by the source's adornment.
- Send the source query R(w,x) with w = 1; it returns the tuples (1,2), (1,3), (1,4) listed in the data on the previous slide.

22 21.5.3 The Chain Algorithm Example (con’t)
- Project the subgoal's relation onto its second component, since only the second component of R(1,a) is a variable.
- This is joined with X, resulting in X equaling this relation: X(a) = {(2), (3), (4)}.
- Change the adornment on S from ff to bf.

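23 21.5.3 The Chain Algorithm Example (con’t)
- Now we resolve S^bf(a,b):
- Project X onto a, resulting in X itself.
- Query S for tuples whose first attribute equals a value of a in X. The values 2 and 3 are in the choice set [2,3,5]; the value 4 is not, so no query is sent for it. The source returns (a,b) = {(2,4), (3,5)}.
- Join this relation with X, and remove a because it appears neither in the head nor in any unresolved subgoal: X(b) = {(4), (5)}.
- Change the adornment on T from ff to bf.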

24 21.5.3 The Chain Algorithm Example (con’t)
- Now we resolve T^bf(b,c):
- Querying T with b = 4 and b = 5 returns (b,c) = {(4,6), (5,7), (5,8)}.
- Join this relation with X and project onto the c attribute to get the relation for the head.
- Solution is {(6), (7), (8)}.
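
The worked example can be replayed with a compact Python sketch of the Chain Algorithm. This is a simplified illustration under stated assumptions, not a definitive implementation: sources are simulated by in-memory tuple lists, source codes are encoded as (kind, allowed set, primed) tuples, variables are strings and constants are integers, and all names (`chain`, `ask_source`, `SOURCES`) are hypothetical.

```python
def is_var(arg):
    return isinstance(arg, str)          # assumption: variables are strings

def matches(adorn, codes):
    for x, (kind, _allowed, primed) in zip(adorn, codes):
        if kind in ("b", "c") and x != "b":
            return False
        if x == "f" and primed:
            return False
    return True

def ask_source(data, query):
    """Simulated source: tuples agreeing with every non-None query component."""
    return [row for row in data
            if all(q is None or q == v for q, v in zip(query, row))]

def chain(head, subgoals, sources):
    bound, X, unresolved = set(), [{}], list(subgoals)   # X starts as one empty tuple
    while unresolved:
        for goal in unresolved:                          # greedy: first resolvable subgoal
            name, args = goal
            adorn = ["f" if is_var(a) and a not in bound else "b" for a in args]
            codes, data = sources[name]
            if matches(adorn, codes):
                break
        else:
            return None                                  # nothing can be resolved: fail
        unresolved.remove(goal)
        joined = []
        for t in X:
            known = [None if x == "f" else (t[a] if is_var(a) else a)
                     for a, x in zip(args, adorn)]
            askable = all(kind not in ("c", "o") or v is None or v in allowed
                          for v, (kind, allowed, _p) in zip(known, codes))
            if not askable:
                continue                                 # value outside the set S
            query = [None if kind == "u" else v          # u: send no binding
                     for v, (kind, _s, _p) in zip(known, codes)]
            for row in ask_source(data, query):
                new_t, ok = dict(t), True
                for a, v, k in zip(args, row, known):
                    if k is not None and k != v:
                        ok = False                       # filter at the mediator
                    elif is_var(a) and new_t.setdefault(a, v) != v:
                        ok = False                       # repeated variable disagrees
                if ok:
                    joined.append(new_t)
        bound.update(a for a in args if is_var(a))
        keep = set(head) | {a for _n, g in unresolved for a in g if is_var(a)}
        X = [dict(p) for p in {tuple(sorted((k, v) for k, v in t.items() if k in keep))
                               for t in joined}]
    return {tuple(t[v] for v in head) for t in X}

SOURCES = {
    "R": ([("b", None, False), ("f", None, False)], [(1, 2), (1, 3), (1, 4)]),
    "S": ([("c", {2, 3, 5}, True), ("f", None, False)], [(2, 4), (3, 5)]),
    "T": ([("b", None, False), ("u", None, False)], [(4, 6), (5, 7), (5, 8)]),
}
QUERY = [("R", (1, "a")), ("S", ("a", "b")), ("T", ("b", "c"))]
print(chain(["c"], QUERY, SOURCES))
```

The sketch assumes a single source adornment per relation; it returns `None` when no remaining subgoal can be resolved, which corresponds to the algorithm failing.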

25 21.5.4 Incorporating Union Views at the Mediator
- The Chain Algorithm as described does not consider that several sources can contribute tuples to one relation.
- If specific sources have tuples to contribute that other sources may not have, this adds complexity.
- To resolve this, we can either consult all sources or make a best effort to return all the answers.

26 21.5.4 Incorporating Union Views at the Mediator (con’t)
- Consulting All Sources
- We can resolve a subgoal only when each source for its relation has an adornment matched by the current adornment of the subgoal.
- Less practical, because it makes queries harder to answer, and impossible if any source is down.
- Best Efforts
- We need only 1 source with a matching adornment to resolve a subgoal.
- The Chain Algorithm must be modified to revisit each subgoal whenever more of its arguments become bound, since additional sources may then match.
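
The difference between the two policies comes down to "every source matches" versus "some source matches". The Python sketch below illustrates this under assumptions: the source names `dealerA` and `dealerB`, their adornments, and the function names are hypothetical, and each source is given a single adornment for brevity.

```python
def matches(subgoal, source):
    """Subgoal adornment (string over b/f) against a list of source codes."""
    for x, y in zip(subgoal, source):
        if y[0] in ("b", "c") and x != "b":
            return False
        if x == "f" and y[1:2] == "'":
            return False
    return len(subgoal) == len(source)

def usable_sources(subgoal, sources):
    """Names of the sources whose adornment is matched by the subgoal adornment."""
    return [name for name, adornment in sources.items()
            if matches(subgoal, adornment)]

def can_resolve(subgoal, sources, policy):
    usable = usable_sources(subgoal, sources)
    if policy == "all":                 # Consulting All Sources
        return len(usable) == len(sources)
    if policy == "best_effort":         # Best Efforts
        return len(usable) >= 1
    raise ValueError("unknown policy: " + policy)

# Two hypothetical sources contributing tuples to the same relation.
SOURCES = {"dealerA": ["b", "f"], "dealerB": ["f", "f"]}

print(can_resolve("ff", SOURCES, "all"))
print(can_resolve("ff", SOURCES, "best_effort"))
print(usable_sources("bf", SOURCES))
```

Under best efforts, a subgoal resolved early with adornment ff would use only `dealerB`; once its first argument becomes bound, `dealerA` also matches, which is why the subgoal has to be revisited.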

27 Questions

