Presentation is loading. Please wait.

Presentation is loading. Please wait.

2005Integration-intro1 Data Integration Systems overview The architecture of a data integration system:  Components and their interaction  Tasks  Concepts.

Similar presentations


Presentation on theme: "2005Integration-intro1 Data Integration Systems overview The architecture of a data integration system:  Components and their interaction  Tasks  Concepts."— Presentation transcript:

1 2005Integration-intro1 Data Integration Systems overview The architecture of a data integration system:  Components and their interaction  Tasks  Concepts

2 2005Integration-intro2 Main components of a DI system (I) Mediator מתווך Supports in its user interface : The global data model The integrated / global / mediated schema / world view A query language Manages the interaction with sources Posing queries Receiving answers, transforming and showing them Is responsible for query execution strategies planning carrying out

3 2005Integration-intro3 (II) Wrapper עוטף Serves as the interface to a source Receive queries from a mediator Plan and execute how to retrieve the data from its source Transform data to global data model Send to mediator For an SQL source, these are rather easy For a restricted capability source, may require A series of queries on the source, or A program to be executed (on a non-db source) Filtering results obtained from the source

4 2005Integration-intro4 A simple architecture: Arrows represent query and data flow source wrapper mediator

5 2005Integration-intro5 A more complex architecture: Mediators can serves as wrapped sources for other mediators source wrapper mediator source wrapper

6 2005Integration-intro6 Important: The global database is virtual – contain no data The data reside in the sources The users pose queries as if the data resides in the global db Users may/may not be aware that the data actually comes from the sources

7 2005Integration-intro7 Main tasks & activities: At mediator: Query reformulation & decomposition –express queries in terms of the sources’ schemas –decompose into queries on sources Planning query execution, including optimization –a declarative query may be executed in various ways (even in a single centralized db) –different sources may provide same data at different costs (money, communication time, response time, delays, …) –If data is associated with user priorities, we may want to retrieve some answers before others When answers arrive – fuse them – a full answer is not a simple union of partial answers; data on an entity must be combined (fused) into a single record

8 2005Integration-intro8 Requirements: (from mediator, wrapper, system) Ability to handle Incomplete information (data may be missing from available sources) Heterogeneity – in data model, schema, contents Both data and meta-data ability to describe sources: Capabilities –what queries can a source answer –What mechanisms does it offer for data retrieval Coverage – allows to know –Where can data relevant to a query be found –Is there overlap between sources?

9 2005Integration-intro9  The relationship between source and global data The global data is virtual  the mediated schema describes data that resides in the sources is described by source schemas The relationship between the mediated and source data determines how queries are answered Two main approaches (a combination of the two – later)

10 2005Integration-intro10 Global as View – GAV The global db is defined as a view on the sources In relational model: Each global relation defined as a view, by a query on sources Obvious advantage: simplicity of query answering Given Q on global relations, expand it: replace each atom R(x) by an expression on sources, using the definition of R Then send appropriate sub-queries to sources

11 2005Integration-intro11 Simple example: a university database Source A: Dept(D, C) – departments and their courses Teaches(C,T) – teachers of courses Source B: Enroll(S, C) – student enrollment to courses Integrated schema & its definition: Stud(S, D, T) :- Dept(D, C), Teach(C, T), Enroll(S, C) Query Q: Stud(S, ‘CS’, ‘Beeri’) Expand body to Dept(‘CS’, C), Teach(C, ‘Beeri’), Enroll(S, C) Then use one of (at least) two execution strategies on sources A,B

12 2005Integration-intro12 Local as View – LAV The global database is viewed as the “real world” Each source is defined as a view on it Example (revisited): Global schema: Univ(D, C, T, S) Source A: Dept(D, C) :- Univ(D, C, T, S) Teaches(C, T) :- Univ(D, C, T, S) Source B: Enroll(S, C) :- Univ(D, C, T, S)

13 2005Integration-intro13 Possible assumptions on sources: A source contains all data in its defining view A source contains some of the data in its view, usually not all 2 nd is more realistic Example: Global database describes cars for sale A source may contain : only some of the attributes of cars present in the global schema (e.g., it may not contain history, or owner-contact) Only some of the cars for sale full view / contained view Obviously, the more sources we have, the more cars

14 2005Integration-intro14 Query answering in LAV: Expansion is not possible An approach: answering queries using views Practically: rewriting queries using views (differences explored later) Only the views have data  rewrite query to an expression over the views expression must be (explained in more detail later) Full views: equivalent to query Contained views: contained in query  Solution may/may not exist (contrast to expansion)  Finding it is more difficult This problem was explored in many contexts, e.g.: Query optimization using views/previous answers

15 2005Integration-intro15 Why prefer LAV to GAV? Ease of expanding a system: –In GAV, adding a source may require re-definition of global schema – makes it difficult to add sources –In LAV, just define the new source as a view given an algorithm for using views to answer queries, it automatically uses the new source As for expanding queries vs. using views: Even in GAV, when sources have restricted capabilities, query answering requires using views

16 2005Integration-intro16 Typically, a global schema reflects a real ‘world’, as we know it; each source materializes only a fragment –Horizontal – not all entity types or attributes are present –Vertical – not all entities of a type are present Thus, it is natural to define the sources as (contained) views Examples: Cars for sale: –global db reflects our understanding and requirements –A source provides only some info, only on the cars it has Looking for personal information using UNIX facilities –we know about: name, office, phone, email, … –Each facility may offer only some of the above

17 2005Integration-intro17 LAV is a natural approach in the presence of www and its diversity & dynamicity of source Legacy systems Most research efforts & systems are LAV

18 2005Integration-intro18 On rewriting queries using views: It is not clear (now) how to obtain a rewriting, given Q But, given v1(..), v2(..), …, vn(..) as a candidate, we may expand each vi using its definition in terms of the global schema Check whether the resulting expression is equivalent to or contained in Q (both Q and the expansion are in terms of global schema relations)  Equivalence and containment of queries are fundamental problems for data integration

19 2005Integration-intro19 Example (our LAV example): Q: ans(S,’CS’,’Beeri’) :- Univ(’CS, C, ’Beeri’,S) Guess an answer in terms of views: ans`(S, ’CS’, ‘Beeri’ ) :- Dept(‘CS’,C), Teach(C, ‘Beeri’), Enroll(S,C) (Note: must use distinct variables in different expansions for all non-join variables) Is the query equivalent to this expansion? Is the expansion contained in the query? Univ(‘CS’,C, T1, S1)Univ(D2,C, ‘Beeri’, S2)Univ(D3,C, T3, S)


Download ppt "2005Integration-intro1 Data Integration Systems overview The architecture of a data integration system:  Components and their interaction  Tasks  Concepts."

Similar presentations


Ads by Google