Scaling Heterogeneous Databases and Design of DISCO. Anthony Tomasic, Louiqa Raschid, Patrick Valduriez. Presented by: Nazia Khatir, Texas A&M University.


1 Scaling Heterogeneous Databases and Design of DISCO. Anthony Tomasic, Louiqa Raschid, Patrick Valduriez. Presented by: Nazia Khatir, Texas A&M University.

2 Distributed Information Search COmponent (DISCO). Topics: the distributed mediator architecture of DISCO; query processing semantics; data models; the interface to underlying data sources.

3 Introduction. Access to a large number of heterogeneous, distributed data sources introduces new problems. For end users and application programmers: unavailable data sources. To answer a query involving n databases, all n databases must be available; otherwise either no answer or only a partial answer is returned. The availability of answers in the system therefore declines as the number of databases rises.
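
The decline in availability can be made concrete with a small calculation. The sketch below assumes that every source is available independently with the same probability p; that assumption and the function name are illustrative and do not come from the slides.

```python
def answer_availability(p: float, n: int) -> float:
    """Probability that all n sources respond to a query.

    Assumes (for illustration only) that each source is available
    independently with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # A complete answer needs every one of the n sources to respond.
    return p ** n


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, answer_availability(0.99, n))
```

Under this assumption, even sources that are each available 99% of the time give a complete answer only about 37% of the time once a query touches 100 of them.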

4 Introduction (Cont.). Access to a large number of heterogeneous, distributed data sources introduces new problems. For database administrators (DBAs): incorporating new sources into the model. Schemas must be changed, catalogs must be updated, and new definitions must be added.

5 Introduction (Cont.). Access to a large number of heterogeneous, distributed data sources introduces new problems. For database implementors (DBIs): queries must be translated between query languages and schemas, so new code must be written.

6 DISCO Architecture. Legend: A = Application, M = Mediator, C = Catalog, W = Wrapper, D = Data Source. Arcs represent the exchange of queries and answers.

7 Applications (A). Written by application programmers. They access a uniform representation of the underlying sources through a uniform query language.

8 Mediators (M). Permit a collection of databases to be accessed in a uniform way. Accept queries and transform them into sub-queries. Maintain summary information about their associated databases.

9 Catalogs (C). Special mediators. Keep track of the collection of databases, wrappers, and mediators. Provide an overview of the entire system.

10 Wrappers (W). Deal with the heterogeneous nature of databases. Transform sub-queries: map from the general query language used by mediators to the source query language. Reformat answers (data) into the form appropriate to each mediator.

11 Features of DISCO. For application programmers: provides new query processing semantics that make it easier to deal with unavailable data sources.

12 Features of DISCO. For DBAs: models data sources as objects, which permits powerful modeling capabilities; supports type transformations to ease the incorporation of new data sources into a mediator.

13 Features of DISCO. For DBIs: provides a flexible wrapper interface to ease the construction of wrappers.

14 Data Model. [Figure: the type person has an extent person made up of person0, person1, and person2, one per data source r0, r1, and r2; r0 holds (Mary, 200) and r1 holds (Sam, 150).]
Query:
   select x.name from x in person where x.salary > 10
The answer is Bag("Mary", "Sam"), of type Bag (programmer viewpoint). The same query would access the third data source as well (DBA viewpoint). The model supports dissimilar structures.

15 Wrapper Interface. DISCO provides a flexible wrapper interface for the DBI. The interface to wrappers is at the level of an abstract algebraic machine (AM) of logical operators. The DBI implements the logical operators and a call in the wrapper interface that returns the grammar. During query processing, the mediator generates a logical expression. The mediator calls the interface to get the grammar and checks that the logical expression matches it. [Figure: mediator, wrapper interface (algebraic machine).]
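
The grammar check described above can be sketched as follows. This is a deliberately simplified illustration: the "grammar" is reduced to a set of supported operator names, and the names LogicalOp, ToyWrapper, matches_grammar, and can_accept are hypothetical, not part of DISCO's actual interface.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class LogicalOp:
    """A node in a logical expression tree generated by the mediator."""
    name: str
    inputs: Tuple["LogicalOp", ...] = ()


class ToyWrapper:
    """Hypothetical wrapper that advertises the operators it implements."""

    def __init__(self, supported: Iterable[str]) -> None:
        self._supported = frozenset(supported)

    def grammar(self) -> FrozenSet[str]:
        # Stand-in for the wrapper-interface call that returns the grammar.
        return self._supported


def matches_grammar(expr: LogicalOp, grammar: FrozenSet[str]) -> bool:
    """True if every operator in the expression tree is in the grammar."""
    return expr.name in grammar and all(
        matches_grammar(child, grammar) for child in expr.inputs
    )


def can_accept(wrapper: ToyWrapper, expr: LogicalOp) -> bool:
    """Mediator side: fetch the grammar, then check the expression against it."""
    return matches_grammar(expr, wrapper.grammar())


if __name__ == "__main__":
    scan = LogicalOp("scan")
    select = LogicalOp("select", (scan,))
    wrapper = ToyWrapper(["scan", "select"])
    print(can_accept(wrapper, select))
```

A real grammar would also constrain how operators may be nested; the set-membership test here only illustrates the division of labour between the DBI (who supplies the grammar) and the mediator (who checks against it).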

16 Mediator Data Model: extensions to the ODMG standard. The ODMG (Object Data Management Group) standard comprises: the object data model; the Object Definition Language (ODL); the Object Query Language (OQL); language bindings.

17 Mediator Data Model: extensions to the ODMG standard. Object data model:
interface: defines a type signature for an object.
extent: automatically maintains the collection of objects of the interface, i.e. an extent is a named variable whose value is the collection of all objects of the associated interface. When objects are created or destroyed, the extent is updated automatically.

18 Mediator Data Model: extensions to the ODMG standard. Object Definition Language (ODL):
wrapper: models wrappers.
repository: models the address of a database or some other type of repository. A repository can contain several data sources. Each data source in a repository is associated with an extent.

19 Mediator Data Model: extensions to the ODMG standard. Defining access to a data source:
1. Create an instance of the repository type:
   r0 := Repository (host = "rodin.inria.fr", name = "db", address = "123.45.6.7")
2. Locate the wrapper (written by a database implementor):
   w0 := WrapperPostgres ( );
3. Define the interface (type) in the mediator that corresponds to the data source objects, e.g. the Person type corresponds to the objects in data sources r0 and r1:
   interface Person { attribute String name; attribute Short salary; }
4. Specify the extent of this mediator type, which accesses r0 through the wrapper w0:
   extent person0 of Person wrapper w0 repository r0;
Each DISCO extent represents a collection of data in one data source.

20 Mediator Data Model: extensions to the ODMG standard. Accessing data from the data source:
The query
   select x.name from x in person0 where x.salary > 10
returns the answer Bag("Mary").
Adding a new extent of type Person:
   extent person1 of Person wrapper w0 repository r1;
To access objects in both data sources, the query
   select x.name from x in union (person0, person1) where x.salary > 10
returns the answer Bag("Mary", "Sam").
Advantage: the query refers to the extents explicitly.
Disadvantage: queries are difficult to express when the extents are not explicitly specified.
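
The two queries on this slide can be mimicked with a small in-memory model. This is only a sketch: the dictionaries SOURCES and EXTENTS and the helper functions are hypothetical stand-ins for repositories, wrappers, and OQL evaluation, and the rows are the ones shown on the data model slide.

```python
from typing import Dict, List

Row = Dict[str, object]

# In-memory stand-ins for the data sources r0 and r1.
SOURCES: Dict[str, List[Row]] = {
    "r0": [{"name": "Mary", "salary": 200}],
    "r1": [{"name": "Sam", "salary": 150}],
}

# Each extent names exactly one data source, as in
# "extent person0 of Person wrapper w0 repository r0".
EXTENTS: Dict[str, str] = {"person0": "r0", "person1": "r1"}


def scan(extent: str) -> List[Row]:
    """Return the rows of the single data source behind an extent."""
    return list(SOURCES[EXTENTS[extent]])


def union(*extents: str) -> List[Row]:
    """Bag union of several explicitly named extents."""
    rows: List[Row] = []
    for extent in extents:
        rows.extend(scan(extent))
    return rows


def names_with_salary_above(rows: List[Row], threshold: int) -> List[str]:
    """select x.name from x in rows where x.salary > threshold"""
    return [row["name"] for row in rows if row["salary"] > threshold]


if __name__ == "__main__":
    print(names_with_salary_above(scan("person0"), 10))
    print(names_with_salary_above(union("person0", "person1"), 10))
```

Note that the caller has to list person0 and person1 by name in the union, which is exactly the disadvantage the slide points out.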

21 Mediator Data Model: extensions to the ODMG standard. Solution: MetaExtent, which keeps the details of the extents of all the mediator types.
General format of the MetaExtent type, which is created automatically:
   interface MetaExtent (extent metaextent) {
      attribute String name;
      attribute Extent e;
      attribute Type interface;
      attribute Wrapper wrapper;
      attribute Repository repository;
      attribute Map map; }
Declaration of the extent person on the interface:
   interface Person (extent person) { attribute String name; attribute Short salary; }
The extent person is defined by the query
   define person as Flatten( select x.e from x in metaextent where x.interface = Person)
Thus, a query over person dynamically accesses all the extents defined for the type Person.
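
The Flatten query over metaextent can be sketched in the same spirit. The class MetaExtentEntry below keeps only three of the MetaExtent attributes, and its rows field is a hypothetical stand-in for the attribute e; none of this is DISCO's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class MetaExtentEntry:
    """Simplified MetaExtent object: one entry per declared extent."""
    name: str
    interface: str
    rows: List[Row] = field(default_factory=list)  # stands in for attribute e


# The mediator's metaextent; rows are the ones shown on the slides.
metaextent: List[MetaExtentEntry] = [
    MetaExtentEntry("person0", "Person", [{"name": "Mary", "salary": 200}]),
    MetaExtentEntry("person1", "Person", [{"name": "Sam", "salary": 150}]),
]


def implicit_extent(interface: str) -> List[Row]:
    """Flatten(select x.e from x in metaextent where x.interface = interface)"""
    return [row for x in metaextent if x.interface == interface for row in x.rows]


if __name__ == "__main__":
    person = implicit_extent("Person")
    print([row["name"] for row in person if row["salary"] > 10])
```

Because implicit_extent reads metaextent each time it is called, appending a further entry for the Person interface changes the result of later queries without any change to the query text.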

22 Mediator Data Model: matching similar and dissimilar structures or substructures. The DBA defines how data from multiple data sources is aggregated: similar substructures are matched with subtypes; similar structures are matched with maps; dissimilar structures are matched with views.

23 Mediator Data Model: matching similar substructures. Subtyping follows the ODMG standard. Example: the Student interface is a subtype of Person, and the DBA defines two extents as follows:
   interface Student: Person { }
   extent student0 of Student wrapper w0 repository r2
   extent student1 of Student wrapper w0 repository r3
The person extent still contains only person0 and person1. It does not automatically reference the extents of its subtypes in the subtype hierarchy. DISCO solution: the special syntax person*.
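
The difference between person and person* can be illustrated by resolving extent names through the subtype hierarchy. The tables SUPERTYPE and EXTENTS_OF and the include_subtypes flag are hypothetical; they only model the extent names that appear on the slides.

```python
from typing import Dict, List, Optional

# Subtype hierarchy from the slide: Student is a subtype of Person.
SUPERTYPE: Dict[str, str] = {"Student": "Person"}

# Extents declared by the DBA for each interface.
EXTENTS_OF: Dict[str, List[str]] = {
    "Person": ["person0", "person1"],
    "Student": ["student0", "student1"],
}


def is_subtype(interface: str, ancestor: str) -> bool:
    """True if interface equals ancestor or derives from it."""
    current: Optional[str] = interface
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERTYPE.get(current)
    return False


def extents(interface: str, include_subtypes: bool = False) -> List[str]:
    """Extent names for an interface; include_subtypes models the * syntax."""
    if not include_subtypes:
        return list(EXTENTS_OF.get(interface, []))
    result: List[str] = []
    for candidate, names in EXTENTS_OF.items():
        if is_subtype(candidate, interface):
            result.extend(names)
    return result


if __name__ == "__main__":
    print(extents("Person"))                          # models person
    print(extents("Person", include_subtypes=True))   # models person*
```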

24 Mediator Data Model: matching similar structures. Mapping example:
   interface PersonPrime { attribute String n; attribute Short s; }
   extent personprime0 of PersonPrime wrapper w0 repository r0;
Since the objects returned from r0 are of type Person, the extent personprime0 has a type conflict with the objects returned. To avoid a run-time error, DISCO allows the DBA to resolve this type conflict.

25 Mediator Data Model: matching similar structures. Mapping example (Cont.): the type conflict is resolved by specifying a mapping between a mediator type and a data source type. The mapping function is called the local transformation map.
Extent declaration with the map:
   extent personprime0 of PersonPrime wrapper w0 repository r0 map ((person0=personprime0), (name = n), (salary = s));
Extent declaration without the map, as on the previous slide:
   extent personprime0 of PersonPrime wrapper w0 repository r0;
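
The effect of a local transformation map on one object can be sketched as an attribute renaming. The function apply_map and the dictionary form of the map are illustrative assumptions; the slide only shows the ODL syntax.

```python
from typing import Dict

Row = Dict[str, object]

# Attribute pairs from the slide's map clause: (name = n), (salary = s).
ATTRIBUTE_MAP: Dict[str, str] = {"name": "n", "salary": "s"}


def apply_map(source_row: Row, attribute_map: Dict[str, str]) -> Row:
    """Rename data-source attributes to the mediator type's attributes.

    Attributes that the map does not mention keep their original name
    (a choice made for this sketch, not something the slides specify).
    """
    return {attribute_map.get(key, key): value for key, value in source_row.items()}


if __name__ == "__main__":
    # A Person object as returned from r0, viewed as a PersonPrime object.
    print(apply_map({"name": "Mary", "salary": 200}, ATTRIBUTE_MAP))
```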

26 Mediator Data Model: matching dissimilar structures. Views in DISCO. Example:
   interface PersonTwo { attribute String name; attribute Short regular; attribute Short consult; }
   extent persontwo0 of PersonTwo wrapper w0 repository r5;
View definition to aggregate over the data sources:
   define personnew as bag (
      select struct (name : x.name, salary : x.salary) from x in person,
      select struct (name : x.name, salary : x.regular + x.consult) from x in persontwo0)
Views can reference other views but are not updatable.
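
The view personnew reconciles the two structures by computing salary as regular + consult for PersonTwo objects. A minimal sketch follows; the function name mirrors the view, and the PersonTwo row used in the demonstration is made up for illustration because the slides give no data for r5.

```python
from typing import Dict, List

Row = Dict[str, object]


def personnew(person_rows: List[Row], persontwo_rows: List[Row]) -> List[Row]:
    """Combine Person and PersonTwo objects under one (name, salary) structure."""
    view: List[Row] = [
        {"name": x["name"], "salary": x["salary"]} for x in person_rows
    ]
    view += [
        {"name": x["name"], "salary": x["regular"] + x["consult"]}
        for x in persontwo_rows
    ]
    return view


if __name__ == "__main__":
    person = [{"name": "Mary", "salary": 200}, {"name": "Sam", "salary": 150}]
    # Hypothetical PersonTwo object; not taken from the slides.
    persontwo0 = [{"name": "Example", "regular": 100, "consult": 20}]
    for row in personnew(person, persontwo0):
        print(row)
```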

27 Mediator Query Processing

28 Query Processing With Unavailable Data. There are three possibilities if a data source does not respond:
1. The system waits.
2. The system assumes that the unavailable source does not exist, or considers it to have no matching tuples.
3. The system returns a partial answer.
DISCO applies partial evaluation semantics to queries: it processes as much of the query as possible from the information that is available. Thus, the answer to a query may be another query.

29 Query Processing With Unavailable Data (Cont.). Assume r0 does not respond. Query:
   select x.name from x in person where x.salary > 10

30 Query Processing With Unavailable Data (Cont.). Assume r0 does not respond. Query:
   select x.name from x in person where x.salary > 10
Answer:
   union (select y.name from y in person0 where y.salary > 10, Bag("Sam"))

31 Query Processing With Unavailable Data (Cont.). Assume r0 does not respond. Query:
   select x.name from x in person where x.salary > 10
Answer:
   union (select y.name from y in person0 where y.salary > 10, Bag("Sam"))
The select over person0 is the partial answer (query); Bag("Sam") is the partial answer (data).
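
The partial answer above can be reproduced with a toy evaluator that returns data for the sources that respond and a residual query for those that do not. This is a sketch of the idea only: the function partial_eval, the convention that an unavailable source returns None, and the string form of the residual query are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
# A fetch function returns the extent's rows, or None if the source does not respond.
Fetch = Callable[[], Optional[List[Row]]]


def partial_eval(extents: Dict[str, Fetch], threshold: int) -> Tuple[List[str], List[str]]:
    """Evaluate 'select x.name from x in person where x.salary > threshold'.

    Returns (residual_queries, data): one residual sub-query per extent whose
    source did not respond, plus the names found in the extents that did.
    """
    residual: List[str] = []
    data: List[str] = []
    for extent_name, fetch in extents.items():
        rows = fetch()
        if rows is None:
            # Source unavailable: keep this part of the answer as a query.
            residual.append(
                f"select y.name from y in {extent_name} where y.salary > {threshold}"
            )
        else:
            data.extend(row["name"] for row in rows if row["salary"] > threshold)
    return residual, data


if __name__ == "__main__":
    person_extents: Dict[str, Fetch] = {
        "person0": lambda: None,  # r0 does not respond
        "person1": lambda: [{"name": "Sam", "salary": 150}],
    }
    residual_queries, names = partial_eval(person_extents, 10)
    print(residual_queries)
    print(names)
```

The pair returned here corresponds to the two arguments of the union on the slide: the residual query over person0 and the bag of data already obtained from person1.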

32 Conclusion. The design of DISCO provides solutions to some of the problems encountered when scaling the number of data sources in heterogeneous distributed databases: partial evaluation query semantics for application programmers; data modeling tools for DBAs; a flexible wrapper interface for DBIs.

