The role of a Mediator in R-GMA Manfred Oevers IBM Andrew Cooke Heriot Watt Laurence Field RAL Steve Fisher RAL James Magowan IBM Werner Nutt Heriot Watt.


1 The role of a Mediator in R-GMA
Manfred Oevers (IBM), Andrew Cooke (Heriot Watt), Laurence Field (RAL), Steve Fisher (RAL), James Magowan (IBM), Werner Nutt (Heriot Watt), Howard Williams (Heriot Watt)

2 Schema & Contributions

CPULoad (Global Schema)
Country  Site  Facility  Load  Timestamp
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
CH       CERN  ALICE     0.9   19055611022002
CH       CERN  CDF       0.6   19055511022002

CPULoad (Producer 1)
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002

CPULoad (Producer 2)
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002

CPULoad (Producer 3)
CH       CERN  ATLAS     1.6   19055611022002
CH       CERN  CDF       0.6   19055511022002

3 Contributions are Views

CPULoad (Producer 1)
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
SELECT * FROM CPULoad WHERE Country = 'UK' AND Site = 'RAL'

CPULoad (Producer 2)
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
SELECT * FROM CPULoad WHERE Country = 'UK' AND Site = 'GLA'
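The idea that each producer's contribution is a selection view over the global CPULoad relation can be tried out with a small sketch. It assumes, purely for illustration, that the global table is materialised in an in-memory SQLite database; in R-GMA the global relation is virtual, and the column types and the helper name `contribution` are assumptions made here.

```python
import sqlite3

# Rows of the global CPULoad relation as listed on the schema slide.
ROWS = [
    ("UK", "RAL", "CDF", 0.3, "19055711022002"),
    ("UK", "RAL", "ATLAS", 1.6, "19055611022002"),
    ("UK", "GLA", "CDF", 0.4, "19055811022002"),
    ("UK", "GLA", "ALICE", 0.5, "19055611022002"),
    ("CH", "CERN", "ALICE", 0.9, "19055611022002"),
    ("CH", "CERN", "CDF", 0.6, "19055511022002"),
]


def contribution(conn, country, site):
    """Evaluate a producer view: SELECT * FROM CPULoad WHERE Country = ... AND Site = ..."""
    cur = conn.execute(
        "SELECT * FROM CPULoad WHERE Country = ? AND Site = ?", (country, site)
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
# Column types are an assumption; the slides only give the column names.
conn.execute(
    "CREATE TABLE CPULoad "
    "(Country TEXT, Site TEXT, Facility TEXT, Load REAL, Timestamp TEXT)"
)
conn.executemany("INSERT INTO CPULoad VALUES (?, ?, ?, ?, ?)", ROWS)

producer1 = contribution(conn, "UK", "RAL")  # the view of Producer 1
producer2 = contribution(conn, "UK", "GLA")  # the view of Producer 2
```

Each view returns exactly the two rows shown for the corresponding producer, which is what "contributions are views" means here.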

4 The Scenario
G: a relational schema (for a virtual database)
q: queries posed against G
p: producers, associated with views on G
Currently views have the form: SELECT * FROM r WHERE
The Mediator: how to match q with the p's

5 A Concise Notation
CREATE TABLE cpuLoad(Loc, M, L)
SELECT Loc, M FROM cpuLoad WHERE Loc = 'RAL' AND L >= 70
(Loc, M) | cpuLoad(RAL, M, L) & L >= 70

6 Satisfiability
The Problem: “For all locations give me all machines with a cpu load L >= 70”
q: (Loc, M) | cpuLoad(Loc, M, L) & L >= 70
p1: (ral, M, L) | cpuLoad(ral, M, L) & L >= 80
p2: (hw, M, L) | cpuLoad(hw, M, L) & L >= 50
p3: (gla, M, L) | cpuLoad(gla, M, L) & L <= 20
The Query Plan: (Loc, M) | p1(Loc, M, L) U (Loc, M) | p2(Loc, M, L) & L >= 70
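The reasoning behind this query plan can be sketched as an interval check: a producer is relevant if its condition on L and the query's condition can hold at the same time, and it needs a residual selection if its range is not already inside the query's range. The sketch below is a minimal illustration under the assumption that every condition is a single closed interval on L plus an optional fixed location; the names `View` and `plan` are hypothetical and not part of R-GMA.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class View:
    """A selection on cpuLoad: optional fixed location, and lo <= L <= hi."""
    name: str
    loc: Optional[str]  # None means "any location"
    lo: float = -math.inf
    hi: float = math.inf


def plan(query: View, producers: List[View]) -> List[Tuple[str, bool]]:
    """Return (producer name, needs residual filter) for every relevant producer."""
    steps = []
    for p in producers:
        # Different fixed locations can never be satisfied together.
        if query.loc is not None and p.loc is not None and query.loc != p.loc:
            continue
        # The conjunction of both range conditions is satisfiable
        # only if the intersection of the intervals is non-empty.
        if max(query.lo, p.lo) > min(query.hi, p.hi):
            continue
        # If the producer's range lies inside the query's range,
        # every tuple it publishes already satisfies the query.
        contained = query.lo <= p.lo and p.hi <= query.hi
        needs_loc_filter = query.loc is not None and p.loc is None
        steps.append((p.name, (not contained) or needs_loc_filter))
    return steps


q = View("q", None, lo=70)
p1 = View("p1", "ral", lo=80)
p2 = View("p2", "hw", lo=50)
p3 = View("p3", "gla", hi=20)
result = plan(q, [p1, p2, p3])
```

For the example on the slide this yields p1 without a filter, p2 with the residual selection L >= 70, and drops p3, which agrees with the query plan given above.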

7 Satisfiability (issues)
Implementation:
What are suitable sources? This involves checking satisfiability of constraints - a task for the Registry?
Who computes “load L >= 70”?
–The Mediator? Or the Producer?
–What are the capabilities of a Producer?
–Which are relevant?
–Where are these recorded?

8 Completeness
The Problem: “Find all machines that are not in the USA and have diskspace S > 100”
q: M | DiskSpace(M, S) & S > 100 & NOT InUSA(M)
p1: (M, S) | DiskSpace(M, S)
p2: M | InUSA(M)
The Query Plan: M | p1(M, S) & S > 100 & NOT InUSA(M)

9 Completeness (issues)
Implementation:
What if p1 doesn’t know about all machines?
–We might not get all answers for our query (“incompleteness”).
What if p2 doesn’t know about all US machines?
–We might get answers that don’t satisfy our query (“incorrect” answers).
–What is the yardstick for completeness?
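Both failure modes can be reproduced with a few lines of set arithmetic. The machine names and disk space values below are invented for illustration only; `run_plan` is a hypothetical helper that evaluates the query plan from the Completeness slide over whatever p1 and p2 happen to publish.

```python
# Hypothetical ground truth (not from the slides).
DISK = {"m1": 200, "m2": 150, "m3": 50}  # machine -> diskspace
IN_USA = {"m2"}                           # machines located in the USA


def run_plan(p1_rows, p2_machines):
    """Evaluate M | p1(M, S) & S > 100 & NOT InUSA(M) over the published data."""
    return {m for m, s in p1_rows.items() if s > 100 and m not in p2_machines}


# Both producers complete: the expected answer.
expected = run_plan(DISK, IN_USA)

# p1 does not know about m1: an answer is lost (incompleteness).
incomplete = run_plan({"m2": 150, "m3": 50}, IN_USA)

# p2 does not know that m2 is in the USA: a wrong answer appears.
incorrect = run_plan(DISK, set())
```

The asymmetry comes from the negation: missing data in a positively used producer loses answers, while missing data in a negated producer adds answers that should not be there.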

10 Projection Views (1)
Popular queries stored by an Archiver ar may involve projection, e.g. “all machines with disk space S >= 50”
ar: M | DiskSpace(M, S) & S >= 50
The Problem: “get all machines with S >= 30”
q: M | DiskSpace(M, S) & S >= 30
Can we compute answers for q, even though no diskspace values are stored?

11 Projection Views (2)
Query Plan:
In all possible instances of this database, machines stored in ar have diskspace S >= 50.
Since S >= 50 implies S >= 30, ar provides certain answers to query q.
What if the values 50/30 are swapped?

12 Projection Views (3)
“all machines with disk space S >= 30”
ar: M | DiskSpace(M, S) & S >= 30
The Problem: “get all machines with S >= 50”
q: M | DiskSpace(M, S) & S >= 50
In some instances, all machines in ar will be correct answers to q … in others, not.
Thus, ar would not provide certain answers.
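The two projection cases differ only in whether the view's condition implies the query's condition. A minimal sketch of that test follows; it assumes, for illustration, that disk space is an integer in a small finite range so that "all possible instances" can be enumerated directly. The function name and the range bound are hypothetical.

```python
def provides_certain_answers(view_min, query_min, max_value=200):
    """Does every machine stored by 'ar: S >= view_min' satisfy 'q: S >= query_min'?

    The actual S values are projected away, so we must consider every value
    that a stored machine could have. The domain 0..max_value is an assumption.
    """
    possible_values = [s for s in range(max_value + 1) if s >= view_min]
    return all(s >= query_min for s in possible_values)


case_slide_10 = provides_certain_answers(view_min=50, query_min=30)
case_slide_12 = provides_certain_answers(view_min=30, query_min=50)
```

With the view at 50 and the query at 30 the check succeeds; with the values swapped it fails, because a stored machine might have, say, S = 30. For simple lower bounds the enumeration reduces to the comparison view_min >= query_min.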

13 Computing certain answers can be costly (1)
[Figure: machines ral007, ibm747, hw666 and gla999 connected by Link(x,y) edges; the diskspace labels shown are 24, 90, ? (unknown) and 10.]
The Problem:
q: X | Link(X,Y) & DiskSpace(Y, S1) & S1 >= 50 & Link(Y,Z) & DiskSpace(Z, S2) & S2 < 50
Is ral007 a certain answer?

14 Computing certain answers can be costly (2)
The Problem: “Find all machines that are linked to another with a diskspace >= 50, which is in turn linked to one with a diskspace < 50.”
q: X | Link(X,Y) & DiskSpace(Y, S1) & S1 >= 50 & Link(Y,Z) & DiskSpace(Z, S2) & S2 < 50
Is ral007 a certain answer?
The Answer: It is! But we have to reason about all cases...
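The case analysis can be sketched by evaluating q once for each way the unknown disk space could fall relative to the threshold 50, and keeping only the answers common to all cases. The link topology and the assignment of the values 24, 90, ? and 10 to machines are not recoverable from this transcript, so the ones below are assumptions chosen to be consistent with the stated answer.

```python
# ASSUMED topology and value assignment (the original figure did not survive).
LINKS = {
    ("ral007", "ibm747"),
    ("ral007", "hw666"),
    ("ibm747", "hw666"),
    ("hw666", "gla999"),
}
KNOWN = {"ral007": 24, "ibm747": 90, "gla999": 10}
UNKNOWN = "hw666"  # the machine whose diskspace is "?"


def answers(disk):
    """Evaluate q: X | Link(X,Y) & S1 >= 50 & Link(Y,Z) & S2 < 50."""
    return {
        x
        for (x, y) in LINKS
        for (y2, z) in LINKS
        if y == y2 and disk[y] >= 50 and disk[z] < 50
    }


def certain_answers():
    """Answers that hold whatever the unknown value turns out to be."""
    # q only compares against 50, so one value below and one value at or
    # above the threshold represent all possible cases.
    cases = [49, 50]
    results = [answers({**KNOWN, UNKNOWN: value}) for value in cases]
    return set.intersection(*results)


certain = certain_answers()
```

Under these assumptions ral007 is an answer in both cases, but for a different reason each time (via ibm747 when the unknown value is below 50, via hw666 otherwise). With several unknown values the number of cases grows multiplicatively, which is why computing certain answers can be costly.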

15 Early Conclusions (1)
First Problem: Semantics
What are the answers we expect from our queries? Certain answers? A subset of these?
So far we have not looked at time, which will raise further questions.
We need to clarify what producer views mean. (Completeness? To what degree?)
Semantics are not too difficult when there are no projection views (or aggregation).
Query planning techniques exist for special cases, e.g. select/project/join views and queries without comparisons (, …).

16 Early Conclusions (2)
The Mediator needs Helpers
Who decides which sources are relevant for a query?
–The Registry?
–The Mediator? (but higher network load).
Can Producers do:
–selections?
–joins? (several producers may be attached to one DBMS)

17 Early Conclusions (3)
What will the Mediator do?
Construct a set of logical plans (a logical plan = a query over some producers)
Identify logical plans that are feasible (e.g. input bindings: “no phone no. without a name”)
Construct an execution plan
–which concrete operations, when (e.g. selection, sort-merge join...)
–joining becomes complex!
Choose the best/cheapest plan
Execute the plan
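The feasibility and plan-selection steps can be sketched as a filter followed by a minimum over cost. Everything in the sketch is hypothetical: the `Plan` record, the plan names and the cost figures are invented, and a real mediator would derive them from producer capabilities rather than hard-code them.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Plan:
    """A candidate plan with its input-binding requirements and estimated cost."""
    name: str
    needs_bound: FrozenSet[str]  # attributes that must be supplied as input
    cost: float                  # hypothetical cost estimate


def choose_plan(plans: List[Plan], bound: Set[str]) -> Optional[Plan]:
    """Keep the feasible plans, then pick the cheapest one (None if none is feasible)."""
    feasible = [p for p in plans if p.needs_bound <= bound]
    return min(feasible, key=lambda p: p.cost) if feasible else None


# "No phone no. without a name": the lookup is only feasible if 'name' is bound.
plans = [
    Plan("phone_lookup_by_name", frozenset({"name"}), cost=1.0),
    Plan("scan_all_sources", frozenset(), cost=10.0),
]

with_name = choose_plan(plans, {"name"})
without_name = choose_plan(plans, set())
```

When a name is supplied the cheap lookup is chosen; without it the lookup is infeasible and only the expensive scan remains.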

