SECTIONS 21.4 – 21.5 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin INFORMATION INTEGRATION.



Presentation Outline
- 21.4 Capability-Based Optimization
  - The Problem of Limited Source Capabilities
  - A Notation for Describing Source Capabilities
  - Capability-Based Query-Plan Selection
  - Adding Cost-Based Optimization
- 21.5 Optimizing Mediator Queries
  - Simplified Adornment Notation
  - Obtaining Answers for Subgoals
  - The Chain Algorithm
  - Incorporating Union Views at the Mediator

21.4 Capability-Based Optimization
- Introduction
  - A typical DBMS estimates the cost of each query plan and picks the one it believes to be best.
  - A mediator usually has little knowledge of how long its sources will take to answer.
  - Optimization of mediator queries therefore cannot rely on cost measures alone to select a query plan.
  - Instead, the mediator follows capability-based optimization: it first determines which plans its sources are able to execute at all.

The Problem of Limited Source Capabilities
- Many sources have only Web-based interfaces.
- Web sources usually allow querying only through a query form.
- E.g., the Amazon.com interface allows us to query about books in many different ways.
- But we cannot ask questions that are too general, e.g., SELECT * FROM books;

The Problem of Limited Source Capabilities (cont’d)
- Reasons why a source may limit the ways in which queries can be asked:
  - The earliest databases did not use a relational DBMS that supports SQL queries.
  - Indexes on a large database may make certain queries feasible, while others are too expensive to execute.
  - Security reasons, e.g., a medical database may answer queries about averages but won't disclose the details of a particular patient's information.

A Notation for Describing Source Capabilities
- For relational data, the legal forms of queries are described by adornments.
- Adornments are sequences of codes that represent the requirements for the attributes of the relation, in their standard order:
  - f (free): the attribute can be specified or not.
  - b (bound): a value must be specified for the attribute, but any value is allowed.
  - u (unspecified): a value may not be specified for the attribute.

A Notation for Describing Source Capabilities (cont’d)
- c[S] (choice from set S): a value must be specified, and it must come from the finite set S.
- o[S] (optional from set S): either no value is specified, or a value from the finite set S is specified.
- A prime (e.g., f’) specifies that an attribute is not part of the output of the query.
- A capabilities specification is a set of adornments.
- A query must match one of the adornments in the source's capabilities specification.

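A Notation for Describing Source Capabilities (cont’d)
- E.g., Dealer 1 is a source of data in the form Cars(serialNo, model, color, autoTrans, navi).
- The adornment for a query form that looks up a car by its serial number is b’uuuu: the serial number must be specified and is not part of the output, while no value may be specified for the other four attributes.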
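The adornment codes can be read as per-attribute checks on a query. The sketch below is one possible encoding, not part of the original slides: the `Code` class, the function names, and the serial number "12345" are all hypothetical, and a query is modeled simply as a list with `None` for every attribute left unspecified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Code:
    """One adornment code (hypothetical encoding for illustration)."""
    kind: str                          # "f", "b", "u", "c" or "o"
    choices: frozenset = frozenset()   # the set S for c[S] and o[S]
    primed: bool = False               # True if the attribute is not output


def accepts(code: Code, value: Optional[str]) -> bool:
    """True if `value` (None means "not specified") is legal for this code."""
    if code.kind == "f":
        return True
    if code.kind == "b":
        return value is not None
    if code.kind == "u":
        return value is None
    if code.kind == "c":
        return value is not None and value in code.choices
    if code.kind == "o":
        return value is None or value in code.choices
    raise ValueError(f"unknown adornment code: {code.kind}")


def query_matches(adornment, values) -> bool:
    """A query matches an adornment if every attribute satisfies its code."""
    return len(adornment) == len(values) and all(
        accepts(code, value) for code, value in zip(adornment, values))


def source_accepts(spec, values) -> bool:
    """A capabilities specification is a set of adornments; one must match."""
    return any(query_matches(adornment, values) for adornment in spec)


# Dealer 1's form b'uuuu over Cars(serialNo, model, color, autoTrans, navi).
by_serial = [Code("b", primed=True)] + [Code("u")] * 4
dealer1_spec = [by_serial]

print(source_accepts(dealer1_spec, ["12345", None, None, None, None]))  # True
print(source_accepts(dealer1_spec, [None, "Gobi", None, None, None]))   # False
```

The second query is rejected twice over: it leaves the serial number unbound and it specifies a model, which the u code forbids.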

Capability-Based Query-Plan Selection
- Given a query at the mediator, a capability-based query optimizer first considers what queries it can ask at the sources to help answer the query.
- The answers to those queries bind more variables, which may make further source queries possible.
- The process is repeated until either:
  - Enough queries have been asked at the sources to resolve all the conditions of the mediator query, so the query can be answered. Such a plan is called feasible.
  - No more valid source queries can be constructed, yet the mediator query still cannot be answered. In that case the mediator has been given an impossible query.

Capability-Based Query-Plan Selection (cont’d)
- The simplest form of mediator query for which we need the above strategy is a join of relations.
- E.g., we have two sources at dealer 2:
  - Autos(serial, model, color)
  - Options(serial, option)
- Suppose that ubf is the sole adornment for Autos, and Options has two adornments, bu and uc[autoTrans, navi].
- Query: find the serial numbers and colors of Gobi models with a navigation system.
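As a rough illustration of what feasible plans for this query could look like, the sketch below simulates the two dealer 2 sources in memory. The sample tuples, the function names, and the two plans are assumptions made for illustration. Each source function exposes only the access pattern its adornment allows.

```python
# Hypothetical sample data standing in for dealer 2's two sources.
AUTOS = [("s1", "Gobi", "red"), ("s2", "Gobi", "blue"), ("s3", "Atlas", "red")]
OPTIONS = [("s1", "navi"), ("s1", "autoTrans"), ("s2", "autoTrans"), ("s3", "navi")]


def autos(model, color=None):
    """Adornment ubf: serial may not be given, model must be, color may be."""
    return [(s, m, c) for (s, m, c) in AUTOS
            if m == model and (color is None or c == color)]


def options_by_serial(serial):
    """Adornment bu: serial must be given, option may not be."""
    return [(s, o) for (s, o) in OPTIONS if s == serial]


def options_by_option(option):
    """Adornment uc[autoTrans, navi]: option must come from the listed set."""
    if option not in {"autoTrans", "navi"}:
        raise ValueError("option must be autoTrans or navi")
    return [(s, o) for (s, o) in OPTIONS if o == option]


def plan_a():
    """Query Autos first, then Options once per serial number (adornment bu)."""
    result = []
    for serial, _model, color in autos("Gobi"):
        if any(o == "navi" for _s, o in options_by_serial(serial)):
            result.append((serial, color))
    return result


def plan_b():
    """Query both sources independently and join at the mediator."""
    with_navi = {s for s, _o in options_by_option("navi")}
    return [(s, c) for s, _m, c in autos("Gobi") if s in with_navi]


print(plan_a())  # [('s1', 'red')]
print(plan_b())  # [('s1', 'red')]
```

A third ordering, querying Options for navi first and then passing the serial numbers to Autos, is not feasible here because the only Autos adornment, ubf, forbids specifying a serial number.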

Adding Cost-Based Optimization
- The mediator's query optimizer is not done once the capabilities of the sources have been examined.
- Having found the feasible plans, it must choose among them.
- Making an intelligent, cost-based choice requires that the mediator know a great deal about the costs of the queries involved.
- Sources are independent of the mediator, so these costs are difficult to estimate.

21.5 Optimizing Mediator Queries
- Chain algorithm: a greedy algorithm that finds a way to answer the query by sending a sequence of requests to its sources.
- It will always find a solution, assuming at least one solution exists.
- The solution may not be optimal.

Simplified Adornment Notation
- A query at the mediator is limited to b (bound) and f (free) adornments.
- We use the following convention for describing adornments: name^{adornments}(attributes), where
  - name is the name of the relation, and
  - the number of adornments equals the number of attributes.

Obtaining Answers for Subgoals
- Rules for subgoals and sources:
  - Suppose we have the subgoal R^{x_1 x_2 … x_n}(a_1, a_2, …, a_n), and a source adornment for R is y_1 y_2 … y_n.
  - The adornment on the subgoal matches the adornment at the source if, for every i:
    - if y_i is b or c[S], then x_i = b;
    - if x_i = f, then y_i is not output-restricted (not primed).
  - If y_i is f, u, or o[S], then x_i may be either b or f.
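The matching rule can be written as a short predicate. This is a minimal sketch under an assumed encoding: the subgoal adornment is a string of b and f, and each source code is a hypothetical `(kind, primed)` pair. The set S of c[S] and o[S] is left out because the matching test itself does not depend on it.

```python
def matches(subgoal_adornment, source_adornment):
    """Return True if the subgoal adornment matches the source adornment.

    subgoal_adornment: string over {"b", "f"}, e.g. "bf"
    source_adornment:  list of (kind, primed) pairs, kind in "f", "b", "u", "c", "o"
    """
    if len(subgoal_adornment) != len(source_adornment):
        return False
    for x, (kind, primed) in zip(subgoal_adornment, source_adornment):
        # Rule 1: a source position that is b or c[S] needs a bound argument.
        if kind in ("b", "c") and x != "b":
            return False
        # Rule 2: a free argument must be returned, so it cannot be primed.
        if x == "f" and primed:
            return False
    return True


# Source adornments taken from the chain algorithm example in these slides.
R_SOURCE = [("b", False), ("f", False)]   # bf
S_SOURCE = [("c", True), ("f", False)]    # c'[2,3,5]f
T_SOURCE = [("b", False), ("u", False)]   # bu

print(matches("bf", R_SOURCE))  # True
print(matches("ff", S_SOURCE))  # False
print(matches("bf", S_SOURCE))  # True
print(matches("ff", T_SOURCE))  # False
print(matches("bf", T_SOURCE))  # True
```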

The Chain Algorithm
- Maintains two types of information:
  - An adornment for each subgoal.
  - A relation X that is the join of the relations for all the subgoals that have been resolved.
- Initially, a position in a subgoal's adornment is b iff the mediator query provides a constant binding for the corresponding argument of that subgoal.
- Initially, X is a relation over no attributes, containing just the empty tuple.

The Chain Algorithm (cont’d)
- First, initialize the adornments of the subgoals and X.
- Then, repeatedly select a subgoal that can be resolved. Let R^α(a_1, a_2, …, a_n) be the subgoal:
1. Wherever α has a b, the corresponding argument of R is either a constant or a variable in the schema of X. Project X onto those of its variables that appear in R.

The Chain Algorithm (cont’d)
2. For each tuple t in the projection of X, issue a query to the source as follows (β is a source adornment matched by α):
   - If a component of β is b, then the corresponding component of α is b, and we can use the corresponding component of t for the source query.
   - If a component of β is c[S], and the corresponding component of t is in S, then the corresponding component of α is b, and we can use the corresponding component of t for the source query.
   - If a component of β is f, and the corresponding component of α is b, provide the constant value for the source query.

The Chain Algorithm (cont’d)
   - If a component of β is u, then provide no binding for this component in the source query.
   - If a component of β is o[S], and the corresponding component of α is f, then treat it as if it were f.
   - If a component of β is o[S], and the corresponding component of α is b, then treat it as if it were c[S].
3. Every variable among a_1, a_2, …, a_n is now bound. For each remaining unresolved subgoal, change its adornment so that any position holding one of these variables is b.

The Chain Algorithm (cont’d)
4. Replace X with X ⋈ π_S(R), where S is the set of all variables among a_1, a_2, …, a_n.
5. Project out of X all components that correspond to variables that appear neither in the head nor in any unresolved subgoal.
- If every subgoal is resolved, then X is the answer.
- If subgoals remain but none of them can be resolved, then the algorithm fails.
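The five steps can be sketched compactly in code. This is an illustrative simulation under stated assumptions, not the textbook's implementation: sources are in-memory lists, variables are Python strings and constants are non-strings, each relation has a single source adornment encoded as hypothetical `(kind, S, primed)` triples, and one source query is issued per tuple of X rather than per distinct projected tuple. The sample tuples are made up, chosen only to be consistent with the intermediate results shown in the example slides.

```python
def matches(alpha, beta):
    """Subgoal adornment alpha (list of 'b'/'f') matches source adornment beta."""
    return all(not (kind in "bc" and x != "b") and not (x == "f" and primed)
               for x, (kind, _s, primed) in zip(alpha, beta))


def source_bindings(alpha, beta, values):
    """Step 2: bindings to send to the source, or None if no query is sent."""
    sent = {}
    for i, (x, (kind, allowed, _p)) in enumerate(zip(alpha, beta)):
        if x != "b" or kind == "u":
            continue              # nothing to bind (f), or binding forbidden (u)
        if kind in "co" and values[i] not in allowed:
            return None           # value outside S: the source cannot match t
        sent[i] = values[i]
    return sent


def chain(head, subgoals, sources):
    """head: variable names; subgoals: (relation, args); sources: name -> (beta, rows)."""
    def is_var(a):
        return isinstance(a, str)

    X, bound, todo = [{}], set(), list(subgoals)
    while todo:
        for name, args in todo:   # greedily pick the first resolvable subgoal
            alpha = ["f" if is_var(a) and a not in bound else "b" for a in args]
            beta, rows = sources[name]
            if matches(alpha, beta):
                break
        else:
            return None           # subgoals remain but none can be resolved
        todo.remove((name, args))
        new_X = []
        for t in X:
            values = [t.get(a) if is_var(a) else a for a in args]
            sent = source_bindings(alpha, beta, values)
            if sent is None:
                continue
            for row in rows:      # simulated source query
                if any(row[i] != v for i, v in sent.items()):
                    continue
                # Mediator-side filter for bound values that could not be sent (u).
                if any(x == "b" and row[i] != values[i] for i, x in enumerate(alpha)):
                    continue
                joined = dict(t)  # step 4: join the source tuple with t
                joined.update({a: row[i] for i, a in enumerate(args) if is_var(a)})
                new_X.append(joined)
        bound |= {a for a in args if is_var(a)}   # step 3: new bindings
        keep = set(head) | {a for _n, rest in todo for a in rest if is_var(a)}
        # Step 5: project away variables that are no longer needed (with dedup).
        X = [dict(p) for p in {tuple(sorted((k, v) for k, v in t.items() if k in keep))
                               for t in new_X}]
    return {tuple(t[v] for v in head) for t in X}


# Illustrative data (assumed), with the source adornments from the example slides.
SOURCES = {
    "R": ([("b", None, False), ("f", None, False)], [(1, 2), (1, 3), (1, 4)]),
    "S": ([("c", {2, 3, 5}, True), ("f", None, False)], [(2, 4), (3, 5)]),
    "T": ([("b", None, False), ("u", None, False)], [(4, 6), (5, 7), (5, 8)]),
}
QUERY = [("R", (1, "a")), ("S", ("a", "b")), ("T", ("b", "c"))]

print(sorted(chain(["c"], QUERY, SOURCES)))  # [(6,), (7,), (8,)]
```

Because the subgoal choice is greedy, the order of source requests is whatever order becomes resolvable first; the sketch makes no attempt to pick the cheapest order.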

The Chain Algorithm Example
- Mediator query:
  - Q: Answer(c) ← R^{bf}(1,a) AND S^{ff}(a,b) AND T^{ff}(b,c)
- Sources:

  Relation   Data (columns)   Adornment
  R          w, x             bf
  S          x, y             c’[2,3,5]f
  T          y, z             bu

The Chain Algorithm Example (cont’d)
- Initially, the adornments on the subgoals are the same as in Q, and X contains just the empty tuple.
- S and T cannot be resolved yet: each has adornment ff, but their sources require the first argument to be bound (c’[2,3,5] for S, b for T).
- R(1,a) can be resolved because its adornment bf is matched by the source's adornment.
- Send R(w,x) with w=1 to the source to get the matching tuples of R from the table on the previous slide.

The Chain Algorithm Example (cont’d)
- Project the subgoal's relation onto its second component, since only the second component of R(1,a) is a variable.
- This is joined with X, resulting in X equaling this relation:

  a
  2
  3
  4

- Change the adornment on S from ff to bf.

The Chain Algorithm Example (cont’d)
- Now we resolve S^{bf}(a,b):
  - Project X onto a, which leaves X unchanged.
  - Query S for the tuples whose first attribute equals one of the values of a in X.
  - Join the resulting relation over (a, b) with X, and remove a because it appears neither in the head nor in any unresolved subgoal:

  b
  4
  5

The Chain Algorithm Example (cont’d)
- Now we resolve T^{bf}(b,c):
  - Join the relation over (b, c) returned by the source with X, and project onto the c attribute to get the relation for the head.
  - The solution is {(6), (7), (8)}.

Incorporating Union Views at the Mediator
- The Chain Algorithm as presented does not consider that several sources can contribute tuples to one relation.
- If some sources have tuples to contribute that other sources do not have, resolving a subgoal becomes more complex.
- To handle this, we can either consult all sources or make a best effort to return all the answers.

Incorporating Union Views at the Mediator (cont’d)
- Consulting All Sources
  - We can resolve a subgoal only when each source for its relation has an adornment matched by the current adornment of the subgoal.
  - Less practical, because it makes queries harder to answer, and impossible if any source is down.
- Best Efforts
  - We need only one source with a matching adornment to resolve a subgoal.
  - The chain algorithm must be modified to revisit each subgoal whenever more of that subgoal's arguments become bound.
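The difference between the two policies comes down to "all" versus "any" over the sources of a relation. The sketch below assumes a hypothetical encoding in which each source is a list of adornments and each adornment is a list of `(kind, primed)` pairs. The two sample sources and the policy names are invented for illustration.

```python
def matches(alpha, beta):
    """alpha: string of 'b'/'f'; beta: list of (kind, primed) pairs."""
    return len(alpha) == len(beta) and all(
        not (kind in ("b", "c") and x != "b") and not (x == "f" and primed)
        for x, (kind, primed) in zip(alpha, beta))


def can_resolve(alpha, sources, policy):
    """sources: one capabilities specification (list of adornments) per source.

    policy "consult_all": every source needs an adornment matched by alpha.
    policy "best_effort": a single source with a matching adornment is enough.
    """
    usable = [any(matches(alpha, beta) for beta in spec) for spec in sources]
    return all(usable) if policy == "consult_all" else any(usable)


# Two hypothetical sources contributing tuples to the same binary relation.
source_1 = [[("f", False), ("f", False)]]   # adornment ff
source_2 = [[("b", False), ("f", False)]]   # adornment bf
both = [source_1, source_2]

print(can_resolve("ff", both, "best_effort"))   # True  (source_1 alone)
print(can_resolve("ff", both, "consult_all"))   # False (source_2 needs a binding)
print(can_resolve("bf", both, "consult_all"))   # True
```

Under the best-efforts policy, a subgoal resolved with adornment ff would be worth revisiting once its first argument becomes bound, since source_2 could then contribute additional tuples.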

Questions