INFORMATION INTEGRATION Shengyu Li CS-257 ID-211.


Outline
- Basic
- Capability-Based Optimization
- Optimizing Mediator Queries

Basic

Movies:
title               year  length  type
Gone With the Wind  1939  231     drama
Star Wars                         sciFi
Wayne's World       1992  95      comedy

Attributes: Appear at the tops of the columns and describe the meaning of the entries in the column below. Example: title, year, length, type.

Schemas: The name of the relation plus the set of its attributes. Example: Movies (title, year, length, type).

Tuples: The rows of a relation, other than the header row containing the attribute names. A tuple has one component for each attribute of the relation. Example: Tuple 1 has 4 components: Gone With the Wind, 1939, 231, drama, for attributes title, year, length, and type.

Projection: The projection operator produces from a relation R a new relation that has only some of R's columns. Example: π title, year, length (Movies). The resulting relation is:

title               year  length
Gone With the Wind  1939  231
Star Wars
Wayne's World       1992  95
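As a rough illustration of projection, the sketch below models a relation as a list of Python dictionaries. That representation and the helper name `project` are assumptions made here for convenience, not something the slides prescribe. Only the two rows whose values survive in full in the transcript are used.

```python
def project(relation, attributes):
    """Return the projection of `relation` onto `attributes` (set semantics)."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # drop duplicate tuples, as in set-based relational algebra
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


movies = [
    {"title": "Gone With the Wind", "year": 1939, "length": 231, "type": "drama"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "type": "comedy"},
]

projected = project(movies, ["title", "year", "length"])
for row in projected:
    print(row)
```

Each output row keeps only the title, year, and length components; the type column is gone.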

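Product and natural join: For relations R(A, B) and S(B, C, D), the product R × S has the attributes A, R.B, S.B, C, D, and the natural join of R and S has the attributes A, B, C, D.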
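The natural join can be sketched in a few lines, again assuming relations are lists of dictionaries. The helper name `natural_join` is hypothetical, and the tuple values below are made up for illustration, since the values on the original slide are not available.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    result = []
    for t in r:
        for u in s:
            shared = set(t) & set(u)  # attributes common to both tuples
            if all(t[a] == u[a] for a in shared):
                result.append({**t, **u})  # shared attributes appear once
    return result


# Illustrative data only: R(A, B) and S(B, C, D).
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [{"B": 2, "C": 5, "D": 6}, {"B": 4, "C": 7, "D": 8}, {"B": 9, "C": 10, "D": 11}]

joined = natural_join(R, S)
for row in joined:
    print(row)
```

The S tuple with B = 9 pairs with no tuple of R, so it does not appear in the result.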

Datalog Rules and Queries: A Datalog rule consists of:
1. A relational atom called the head, followed by
2. The symbol <-, which we often read “if”, followed by
3. A body consisting of one or more atoms, called subgoals, which may be either relational or arithmetic. Subgoals are connected by AND, and any subgoal may optionally be preceded by the logical operator NOT.
Example: LongMovie(title, year) <- Movies(title, year, length, type) AND length >= 100
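To make the head/body reading concrete, here is a minimal sketch that evaluates a LongMovie rule over a tiny Movies relation. It assumes the rule body is the relational subgoal Movies(title, year, length, type) together with the arithmetic subgoal length >= 100. Both that body and the function name `long_movie` are assumptions of this sketch.

```python
# Movies(title, year, length, type), using only rows whose values appear in the transcript.
movies = [
    ("Gone With the Wind", 1939, 231, "drama"),
    ("Wayne's World", 1992, 95, "comedy"),
]


def long_movie(movies, min_length=100):
    """Evaluate the (assumed) rule
    LongMovie(title, year) <- Movies(title, year, length, type) AND length >= min_length.
    """
    # The comprehension binds the variables of the relational subgoal;
    # the `if` clause plays the role of the arithmetic subgoal.
    return {(title, year) for (title, year, length, _type) in movies if length >= min_length}


result = long_movie(movies)
print(result)
```

The head keeps only title and year, so the rule combines a selection on length with a projection.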

Capability-Based Optimization

Limited Source Capabilities
- Web-based interfaces: a Web form may answer a request such as “The top 20 sellers?” but will not accept an arbitrary query such as SELECT * FROM Books

Why?
- Legacy sources: archaic or unique systems
- Security: “tell me about all your books”; a medical database
- Indexes on large databases: a Books database; infeasible queries

Source Capabilities
Notation (adornments): sequences of codes that represent the requirements on the attributes of a relation, for relational data:
- f (free): the attribute can be specified or not, as we choose
- b (bound): a value must be specified for the attribute
- u (unspecified): a value is not permitted to be specified for the attribute

- c[S] (choice from set S): a value must be specified, and that value must be one of the values in the finite set S
- o[S] (optional, from set S): we either do not specify a value, or we specify one of the values in the finite set S
- A prime on a code (e.g., f') indicates that the attribute is not part of the output of the query

Capabilities Specification:
- A set of adornments
- To query the source successfully, the query must match one of the adornments in its capabilities specification
- For f (free) or o[S], queries with different sets of specified attributes may match the same adornment

Example: Cars (serialNo, model, color, autoTrans, navi)
Dealer 1 might allow this data to be queried in two ways:
1. The user specifies a serial number. All the information about the car with that serial number is produced as output.
2. The user specifies a model and color, and perhaps whether or not automatic transmission and a navigation system are wanted. All five attributes are printed for all matching cars.
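The sketch below checks whether a query is acceptable to a source under the adornment codes f, b, u, c[S], and o[S]. The `Code` class, the function names, and the use of `None` for "not specified" are choices made for this illustration. The two adornments used for Dealer 1, b'uuuu and ubbo[yes,no]o[yes,no], are one plausible encoding of the two access forms described above and should be read as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    kind: str                        # one of "f", "b", "u", "c", "o"
    values: frozenset = frozenset()  # the finite set S for c[S] and o[S]
    primed: bool = False             # True if the attribute is not part of the output


def accepts(code, value):
    """Does one adornment code accept `value`? None means 'not specified'."""
    specified = value is not None
    if code.kind == "f":
        return True
    if code.kind == "b":
        return specified
    if code.kind == "u":
        return not specified
    if code.kind == "c":
        return specified and value in code.values
    if code.kind == "o":
        return (not specified) or value in code.values
    raise ValueError(f"unknown code {code.kind!r}")


def matches(adornment, query):
    """A query matches an adornment if every position is acceptable."""
    return len(adornment) == len(query) and all(
        accepts(c, v) for c, v in zip(adornment, query)
    )


def source_can_answer(capabilities, query):
    """The query must match at least one adornment in the capabilities specification."""
    return any(matches(a, query) for a in capabilities)


YES_NO = frozenset({"yes", "no"})
# Assumed encoding of Dealer 1 over Cars(serialNo, model, color, autoTrans, navi).
dealer1 = [
    [Code("b", primed=True), Code("u"), Code("u"), Code("u"), Code("u")],
    [Code("u"), Code("b"), Code("b"), Code("o", YES_NO), Code("o", YES_NO)],
]

print(source_can_answer(dealer1, ("S123", None, None, None, None)))   # serial number only
print(source_can_answer(dealer1, (None, "Gobi", "red", None, "yes"))) # model, color, navi
print(source_can_answer(dealer1, (None, "Gobi", None, None, None)))   # model only
```

The first two queries fit one of the two forms; the third does not, because the second form requires a color as well as a model. The `primed` flag affects only what the source returns, so it plays no part in this acceptance check.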

Capability-Based Query-Plan Selection
- A capability-based query optimizer first considers which queries it can ask at the sources that will help answer the mediator query. The bindings obtained for some attributes may make further queries at the sources possible.
- This process repeats until either:
  - the query is feasible: enough source queries have been asked to answer it, or
  - the query is impossible: no more valid forms of source query can be constructed

- The simplest form of mediator query for which we need to apply this strategy is a join of relations, each of which is available, with certain adornments, at one or more sources. The search strategy is then to try to get tuples for each relation in the join, by providing enough argument bindings that some source allows a query about that relation to be asked and answered.

Example:
- Autos (serial, model, color)
- Options (serial, option)
Adornments:
- Autos: ubf
- Options: two adornments, bu and uc[autoTrans, navi]
Query: “find the serial numbers and colors of Gobi models with a navigation system”
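Two ways a mediator might answer this query can be sketched as follows. The source data is made up, and the function names and the way each function enforces its adornment are assumptions of this illustration rather than part of the slides.

```python
# Made-up source data, for illustration only.
AUTOS = [("s1", "Gobi", "red"), ("s2", "Gobi", "blue"), ("s3", "Sahara", "red")]
OPTIONS = [("s1", "navi"), ("s2", "autoTrans"), ("s3", "navi")]


def autos_source(model):
    """Adornment ubf: model must be given; serial may not be; color is free (unspecified here)."""
    if model is None:
        raise ValueError("Autos requires a binding for model")
    return [t for t in AUTOS if t[1] == model]


def options_by_serial(serial):
    """Adornment bu: serial must be given; option may not be specified."""
    if serial is None:
        raise ValueError("Options (bu) requires a binding for serial")
    return [t for t in OPTIONS if t[0] == serial]


def options_by_option(option):
    """Adornment uc[autoTrans, navi]: option must be one of the two allowed values."""
    if option not in {"autoTrans", "navi"}:
        raise ValueError("option must be autoTrans or navi")
    return [t for t in OPTIONS if t[1] == option]


def plan_a():
    # Ask Autos for Gobi cars, then use each serial as the binding for Options (bu),
    # filtering for the navi option at the mediator.
    return sorted(
        (s, c)
        for (s, _m, c) in autos_source("Gobi")
        if any(opt == "navi" for (_s, opt) in options_by_serial(s))
    )


def plan_b():
    # Ask both sources independently and join the results at the mediator.
    navi_serials = {s for (s, _opt) in options_by_option("navi")}
    return sorted((s, c) for (s, _m, c) in autos_source("Gobi") if s in navi_serials)


print(plan_a())
print(plan_b())
```

Both plans return the same answer. They differ in how many source requests they issue: plan_a sends one Options request per Gobi serial number, while plan_b sends a single Options request.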

Adding Cost-Based Optimization
- Cost-based optimization requires the mediator to know the cost of the queries involved.
- Since the sources are usually independent of the mediator, it is difficult to estimate this cost.

Optimizing Mediator Queries
- Chain algorithm:
  - a greedy algorithm
  - sends a sequence of requests to its sources
  - always finds a way to answer the query, provided at least one solution exists
- The class of queries that can be handled:
  - involves joins of relations that come from the sources, followed by an optional selection
  - can be expressed as Datalog rules
- Datalog rules are used to describe these relational-algebra queries

Simplified Adornment Notation
- At the mediator, use only b (bound) and f (free) adornments
- A c[S] adornment at a source can be used as soon as we know all possible values of interest for that attribute
- o[S] and u at a source can be treated as free

Example:
- Sources: Autos^buu(serial, model, color) and Options^uc[autoTrans, navi](serial, option)
- Query: “find the serial numbers and colors of Gobi models with a navigation system”
- Answer(s, c) <- Autos^fbf(s, “Gobi”, c) AND Options^fb(s, “navi”)

Obtaining Answers for Subgoals
- Suppose we have a subgoal R^x1x2…xn(a1, a2, …, an), where:
  - each xi is b or f
  - R is a relation that can be queried at some source
  - y1y2…yn is one of the adornments for R at its source
- It is possible to obtain a relation for the subgoal provided that, for each i = 1, 2, …, n:
  - if yi is b or of the form c[S], then xi = b
  - if xi = f, then yi is not output-restricted (i.e., not primed)
- We then say that the adornment on the subgoal matches the adornment at the source.

Example:
- Subgoal: R^bbff(p, q, r, s)
- Adornments for R at its sources:
  - α1 = fc[S1]uo[S2]: matches, provided the value bound for q is a member of S1
  - α2 = c[S3]bfc[S4]: does not match, since s is free but c[S4] requires a bound value
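The two matching conditions can be checked mechanically. The sketch below assumes a subgoal adornment is a string over b and f, and a source adornment is a list of code strings such as "b", "f'", "c[S1]", or "c'[2,3,5]". The function name `subgoal_matches` and this string encoding are assumptions of the illustration. It does not check that a bound value actually lies in the set S.

```python
def subgoal_matches(subgoal_adornment, source_adornment):
    """Return True if the subgoal adornment matches the source adornment.

    subgoal_adornment: string over {'b', 'f'}, e.g. "bbff"
    source_adornment:  list of codes, e.g. ["f", "c[S1]", "u", "o[S2]"];
                       a prime directly after the code letter marks an
                       attribute that is not part of the output.
    """
    if len(subgoal_adornment) != len(source_adornment):
        return False
    for x, y in zip(subgoal_adornment, source_adornment):
        needs_binding = y[0] in ("b", "c")  # b or c[S] at the source
        primed = y[1:2] == "'"              # output-restricted position
        if needs_binding and x != "b":
            return False
        if x == "f" and primed:
            return False
    return True


alpha1 = ["f", "c[S1]", "u", "o[S2]"]
alpha2 = ["c[S3]", "b", "f", "c[S4]"]

print(subgoal_matches("bbff", alpha1))  # True
print(subgoal_matches("bbff", alpha2))  # False
```

The second call fails at the fourth position, where the subgoal is free but the source demands a choice from S4.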

The Chain Algorithm
- A greedy approach to selecting the order in which we obtain relations for each of the subgoals of a Datalog rule.
- It is not guaranteed to provide the most efficient solution, but it will provide a solution whenever one exists.
- In practice, it is very likely to obtain the most efficient solution.

The Chain Algorithm maintains two kinds of information:
- An adornment for each subgoal. Initially, the adornment for a subgoal has b if and only if the mediator query provides a constant binding for the corresponding argument of that subgoal, as for instance in:
  Answer(s, c) <- Autos^fbf(s, “Gobi”, c) AND Options^fb(s, “navi”)
- A relation X that is (a projection of) the join of the relations for all the subgoals that have been resolved so far.

Example: Consider the mediator query
Q: Answer(c) <- R^bf(1, a) AND S^ff(a, b) AND T^ff(b, c)
There are three sources that provide answers to queries about R, S, and T, respectively:

Relation  Data (columns)  Adornment
R         w, x            bf
S         x, y            c'[2,3,5]f
T         y, z            bu
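The ordering part of the Chain Algorithm can be sketched for this query as follows. This is a simplified illustration under stated assumptions: it tracks only which variables are bound, does not fetch or join any tuples, and does not check that bound values fall inside a set such as [2,3,5]. The names `Const`, `matches`, and `chain_order`, the string encoding of adornments, and the single bu adornment assumed for T are all choices of this sketch.

```python
from collections import namedtuple

Const = namedtuple("Const", "value")  # wraps a constant argument; variables are plain strings


def matches(subgoal_adornment, source_adornment):
    """Subgoal adornment (string over b/f) against a source adornment (list of codes)."""
    if len(subgoal_adornment) != len(source_adornment):
        return False
    for x, y in zip(subgoal_adornment, source_adornment):
        if y[0] in ("b", "c") and x != "b":  # source needs a binding we do not have
            return False
        if x == "f" and y[1:2] == "'":       # we need the value, but it is not output
            return False
    return True


def chain_order(subgoals, sources):
    """Greedily pick an order in which subgoals can be resolved.

    subgoals: list of (relation_name, args)
    sources:  dict mapping relation_name to a list of source adornments
    Returns the list of relation names in resolution order, or None if stuck.
    """
    bound = set()               # variables bound by previously resolved subgoals
    remaining = list(subgoals)
    order = []
    while remaining:
        for goal in remaining:
            rel, args = goal
            adornment = "".join(
                "b" if isinstance(a, Const) or a in bound else "f" for a in args
            )
            if any(matches(adornment, src) for src in sources[rel]):
                order.append(rel)
                bound.update(a for a in args if not isinstance(a, Const))
                remaining.remove(goal)
                break           # restart the scan with the new bindings
        else:
            return None         # no unresolved subgoal matches any source adornment
    return order


# Q: Answer(c) <- R(1, a) AND S(a, b) AND T(b, c)
query = [("R", (Const(1), "a")), ("S", ("a", "b")), ("T", ("b", "c"))]
sources = {"R": [["b", "f"]], "S": [["c'[2,3,5]", "f"]], "T": [["b", "u"]]}

print(chain_order(query, sources))
```

With these sources, R is resolved first because only it has a constant binding; resolving R binds a, which lets S be asked, and resolving S binds b, which lets T be asked.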

Thank You