The role of a Mediator in R-GMA Manfred Oevers IBM Andrew Cooke Heriot Watt Laurence Field RAL Steve Fisher RAL James Magowan IBM Werner Nutt Heriot Watt.

Presentation transcript:

The role of a Mediator in R-GMA
Manfred Oevers (IBM), Andrew Cooke (Heriot Watt), Laurence Field (RAL), Steve Fisher (RAL), James Magowan (IBM), Werner Nutt (Heriot Watt), Howard Williams (Heriot Watt)

Schema & Contributions
CPULoad (Global Schema)
  Country  Site  Facility  Load  Timestamp
  UK       RAL   CDF
  UK       RAL   ATLAS
  UK       GLA   CDF
  UK       GLA   ALICE
  CH       CERN  ALICE
  CH       CERN  CDF
CPULoad (Producer 1)
  UK       RAL   CDF
  UK       RAL   ATLAS
CPULoad (Producer 2)
  UK       GLA   CDF
  UK       GLA   ALICE
CPULoad (Producer 3)
  CH       CERN  ATLAS
  CH       CERN  CDF

Contributions are Views
CPULoad (Producer 1)
  UK       RAL   CDF
  UK       RAL   ATLAS
  SELECT * FROM CPULoad WHERE Country = 'UK' AND Site = 'RAL'
CPULoad (Producer 2)
  UK       GLA   CDF
  UK       GLA   ALICE
  SELECT * FROM CPULoad WHERE Country = 'UK' AND Site = 'GLA'
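A minimal sketch of "contributions are views", assuming each producer's view is a conjunction of attribute = value tests. The helper names `view` and `contribution` are illustrative and not part of R-GMA; the rows are the ones listed for the global schema, with Load and Timestamp left out.

```python
# Rows of the global CPULoad relation as listed on the slide
# (Load and Timestamp are omitted because no values are given).
GLOBAL_CPULOAD = [
    {"Country": "UK", "Site": "RAL", "Facility": "CDF"},
    {"Country": "UK", "Site": "RAL", "Facility": "ATLAS"},
    {"Country": "UK", "Site": "GLA", "Facility": "CDF"},
    {"Country": "UK", "Site": "GLA", "Facility": "ALICE"},
    {"Country": "CH", "Site": "CERN", "Facility": "ALICE"},
    {"Country": "CH", "Site": "CERN", "Facility": "CDF"},
]


def view(**equalities):
    """Hypothetical helper: SELECT * FROM CPULoad WHERE attr = value AND ..."""
    def matches(row):
        return all(row[attr] == value for attr, value in equalities.items())
    return matches


# The two views shown on the slide.
producer1 = view(Country="UK", Site="RAL")
producer2 = view(Country="UK", Site="GLA")


def contribution(producer):
    """Rows of the global relation that fall inside a producer's view."""
    return [row for row in GLOBAL_CPULOAD if producer(row)]


if __name__ == "__main__":
    for name, producer in (("Producer 1", producer1), ("Producer 2", producer2)):
        print(name, [row["Facility"] for row in contribution(producer)])
```

Treating a view as a predicate over the global relation keeps the producer's actual data separate from the description the Mediator reasons about.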

The Scenario
G: a relational schema (for a virtual database)
q: queries posed against G
p: producers, associated with views on G
Currently views have the form: SELECT * FROM r WHERE ...
The Mediator: how to match q with the p's

A Concise Notation
Table: CREATE TABLE cpuLoad(Loc, M, L)
SQL: SELECT Loc, M FROM cpuLoad WHERE Loc = 'RAL' AND L >= 70
Concise: (Loc, M) | cpuLoad(RAL, M, L) & L >= 70

Satisfiability
The Problem: “For all locations give me all machines with a cpu load L >= 70”
q: (Loc, M) | cpuLoad(Loc, M, L) & L >= 70
p1: (ral, M, L) | cpuLoad(ral, M, L) & L >= 80
p2: (hw, M, L) | cpuLoad(hw, M, L) & L >= 50
p3: (gla, M, L) | cpuLoad(gla, M, L) & L <= 20
The Query Plan: (Loc, M) | p1(Loc, M, L) U (Loc, M) | p2(Loc, M, L) & L >= 70
p3 is left out because L <= 20 cannot hold together with L >= 70; p1 needs no extra selection because L >= 80 already implies L >= 70.
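The relevance test behind this plan can be sketched as an interval check on the load attribute L. This is an illustration under stated assumptions, not the R-GMA implementation: each constraint on L is modelled as a closed interval, and the function names are hypothetical. The location constraint is ignored because the query asks for all locations.

```python
import math

# Constraints on L as closed intervals (low, high).
QUERY = (70, math.inf)              # q:  L >= 70
PRODUCERS = {
    "p1": (80, math.inf),           # p1: L >= 80
    "p2": (50, math.inf),           # p2: L >= 50
    "p3": (-math.inf, 20),          # p3: L <= 20
}


def intersects(a, b):
    """True if some value satisfies both constraints (joint satisfiability)."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def contained(inner, outer):
    """True if every value allowed by `inner` is also allowed by `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def plan(query, producers):
    """Return (producer, needs_extra_selection) for every relevant producer."""
    steps = []
    for name, interval in producers.items():
        if not intersects(interval, query):
            continue  # view and query are jointly unsatisfiable: skip
        steps.append((name, not contained(interval, query)))
    return steps


if __name__ == "__main__":
    print(plan(QUERY, PRODUCERS))
```

With the values from the slide, `plan` keeps p1 without a further selection, keeps p2 with the selection L >= 70 still to be applied, and drops p3.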

Satisfiability (issues)
Implementation: What are suitable sources?
This involves checking satisfiability of constraints - a task for the Registry?
Who computes “load L >= 70”?
- The Mediator? Or the Producer?
- What are the capabilities of a Producer?
- Which are relevant?
- Where are these recorded?

Completeness
The Problem: “Find all machines that are not in USA and have diskspace S > 100”
q: M | DiskSpace(M, S) & S > 100 & NOT InUSA(M)
p1: (M, S) | DiskSpace(M, S)
p2: M | InUSA(M)
The Query Plan: M | p1(M, S) & S > 100 & NOT p2(M)
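A small sketch of why completeness of the producers matters for this plan. The machine names and disk-space values are invented for illustration, and `run_plan` is a hypothetical helper that reads disk space from p1 and US membership from p2.

```python
# Invented ground truth: disk space per machine and the set of US machines.
TRUE_DISKSPACE = {"m1": 150, "m2": 200, "m3": 80}
TRUE_IN_USA = {"m2"}


def run_plan(p1, p2):
    """Machines that p1 reports with S > 100 and that p2 does not list as in the USA."""
    return {m for m, s in p1.items() if s > 100 and m not in p2}


# Both producers complete: exactly the intended answer.
complete = run_plan(TRUE_DISKSPACE, TRUE_IN_USA)

# p1 does not know about m1: an answer is lost (incompleteness).
p1_partial = {m: s for m, s in TRUE_DISKSPACE.items() if m != "m1"}
missing = run_plan(p1_partial, TRUE_IN_USA)

# p2 does not know that m2 is in the USA: a wrong answer appears.
wrong = run_plan(TRUE_DISKSPACE, set())

if __name__ == "__main__":
    print(complete, missing, wrong)
```

An incomplete producer under negation is the more dangerous case: it does not just lose answers, it adds machines that do not satisfy the query.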

Completeness (issues)
Implementation:
What if p1 doesn’t know about all machines?
- We might not get all answers for our query (“incompleteness”).
What if p2 doesn’t know about all US machines?
- We might get answers that don’t satisfy our query (“incorrect” answers).
- What is the yardstick for completeness?

Projection Views (1)
Popular queries stored by an Archiver ar may involve projection, e.g. “all machines with disk space S >= 50”
ar: M | DiskSpace(M, S) & S >= 50
The Problem: “get all machines with S >= 30”
q: M | DiskSpace(M, S) & S >= 30
Can we compute answers for q, even though no diskspace values are stored?

Projection Views (2)
Query Plan:
In all possible instances of this database, machines stored in ar have diskspace S >= 50.
Thus, ar provides certain answers to query q.
What if the values 50/30 are swapped?

Projection Views (3)
“all machines with disk space S >= 30”
ar: M | DiskSpace(M, S) & S >= 30
The Problem: “get all machines with S >= 50”
q: M | DiskSpace(M, S) & S >= 50
In some instances, all machines in ar will be correct answers to q … in others, not.
Thus, ar would not provide certain answers.
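The two cases can be sketched as a one-line implication test. The function name is hypothetical, and the sketch assumes both the archived view and the query have the form S >= threshold.

```python
def provides_certain_answers(view_threshold, query_threshold):
    """Does a view 'S >= view_threshold' guarantee the query 'S >= query_threshold'?

    Every machine stored by the view has some S >= view_threshold, even though
    the value itself was projected away. That implies S >= query_threshold in
    every possible instance exactly when view_threshold >= query_threshold.
    """
    return view_threshold >= query_threshold


if __name__ == "__main__":
    # ar: S >= 50, q: S >= 30
    print(provides_certain_answers(50, 30))
    # ar: S >= 30, q: S >= 50
    print(provides_certain_answers(30, 50))
```

In the first case every machine in ar is a certain answer, although ar may still miss machines with S between 30 and 50. In the second case a machine in ar might have S = 40, so nothing can be concluded.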

Computing certain answers can be costly (1)
[Figure: machines ral007, ibm747, hw666 and gla999 connected by Link(x,y) edges, with diskspace labels 24, 90, ? and 10]
The Problem:
q: X | Link(X,Y) & DiskSpace(Y, S1) & S1 >= 50 & Link(Y,Z) & DiskSpace(Z, S2) & S2 < 50
Is ral007 a certain answer?

Computing certain answers can be costly (2)
The Problem: “Find all machines that are linked to another with a diskspace >= 50, which is in turn linked to one with a diskspace < 50.”
q: X | Link(X,Y) & DiskSpace(Y, S1) & S1 >= 50 & Link(Y,Z) & DiskSpace(Z, S2) & S2 < 50
Is ral007 a certain answer?
The Answer: It is! But we have to reason about all cases...
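The case analysis can be sketched by evaluating q once per possible case of the unknown value and intersecting the results. The link topology and the assignment of diskspace values to machines below are assumptions made for illustration, since the transcript does not preserve the figure's edges. Only the comparison with 50 matters, so one representative value on each side of 50 covers all cases.

```python
# ASSUMED topology and values (the original figure is not recoverable).
LINKS = {
    ("ral007", "ibm747"),
    ("ral007", "hw666"),
    ("ibm747", "hw666"),
    ("hw666", "gla999"),
}
KNOWN = {"ral007": 24, "ibm747": 90, "gla999": 10}
UNKNOWN = "hw666"

# One representative per case: diskspace < 50 and diskspace >= 50.
REPRESENTATIVES = [49, 50]


def answers(disk):
    """Evaluate q: X | Link(X,Y) & S(Y) >= 50 & Link(Y,Z) & S(Z) < 50."""
    return {
        x
        for (x, y) in LINKS
        for (y2, z) in LINKS
        if y == y2 and disk[y] >= 50 and disk[z] < 50
    }


def certain_answers():
    """Answers that hold in every case for the unknown diskspace."""
    per_case = [answers({**KNOWN, UNKNOWN: value}) for value in REPRESENTATIVES]
    return set.intersection(*per_case)


if __name__ == "__main__":
    print(certain_answers())
```

Under these assumptions ral007 qualifies in both cases, but through a different witness each time: via ibm747 and hw666 when hw666 has less than 50, and via hw666 and gla999 otherwise. With k unknown values the number of cases grows as 2^k, which is where the cost comes from.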

Early Conclusions (1)
First Problem: Semantics
What are the answers we expect from our queries? Certain answers? A subset of these?
So far we have not looked at time, which will raise further questions.
We need to clarify what producer views mean. (Completeness? To what degree?)
Semantics are not too difficult when there are no projection views (or aggregation).
Query planning techniques exist for special cases, e.g. select/project/join views and queries without comparisons.

Early Conclusions (2)
The Mediator needs Helpers
Who decides which sources are relevant for a query?
- The Registry?
- The Mediator? (but higher network load)
Can Producers do:
- selections?
- joins? (several producers may be attached to one DBMS)

Early Conclusions (3)
What will the Mediator do?
Construct a set of logical plans (a logical plan = a query over some producers)
Identify logical plans that are feasible (e.g. input bindings: “no phone no. without a name”)
Construct an execution plan
- which concrete operations, when (e.g. selection, sort-merge join, ...)
- joining becomes complex!
Choose the best/cheapest plan
Execute the plan
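The selection steps in this list can be sketched as a filter followed by a minimum. Everything here is hypothetical: the `Plan` record, its `feasible` flag and its `cost` estimate stand in for whatever the Mediator would actually compute.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Plan:
    """Hypothetical logical plan: a query over some producers."""
    producers: tuple
    feasible: bool   # e.g. all required input bindings can be supplied
    cost: float      # assumed cost estimate for the execution plan


def choose_plan(plans: Sequence[Plan]) -> Optional[Plan]:
    """Keep the feasible plans and return the cheapest, or None if there is none."""
    feasible = [plan for plan in plans if plan.feasible]
    if not feasible:
        return None
    return min(feasible, key=lambda plan: plan.cost)


if __name__ == "__main__":
    candidates = [
        Plan(producers=("p1", "p2"), feasible=True, cost=12.0),
        Plan(producers=("p1",), feasible=False, cost=3.0),
        Plan(producers=("p2", "p3"), feasible=True, cost=7.5),
    ]
    print(choose_plan(candidates))
```

Separating feasibility from cost mirrors the order of the steps: an infeasible plan is discarded even when it would be the cheapest.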