Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Lecture 13: Database Heterogeneity Debriefing Project Phase 2.

Similar presentations


Presentation on theme: "1 Lecture 13: Database Heterogeneity Debriefing Project Phase 2."— Presentation transcript:

1 1 Lecture 13: Database Heterogeneity Debriefing Project Phase 2

2 2 Outline Database Integration Wrappers Mediators Schema Integration Book Section

3 3 Database Integration How to build applications using multiple DBs? Ebay DVD orders IMDB amazon Oracle PointBase MySQL IBM DB2 movie DB order movieorder status

4 4 Problem Dimensions Distribution Autonomy Heterogeneity

5 5 How to Deal with Distribution? Problems Solutions

6 6 How to Deal with Autonomy? Problems Solutions

7 7 How to Deal with Heterogeneity? Problems Solutions

8 8 Solution Variants General issues –Bottom-up vs. top-down engineering –Virtual vs. materialized integration –Read-only vs. read-write access –Transparency: language, schema, location What did you do?

9 9 A Generic System Architecture Wrapper-Mediator architecture DB1 DB2DB3 DB4 Oracle PointBase MySQL IBM DB2 wrapper mediator application 1 application 2application 3 mediators integrate the data from the DBs wrappers convert to a common representation

10 10 A Closer Look at Data Models Data model used by sources –relational? HTML? XML? Text? Data model used by integrated DB –canonical data model (e.g. relational, XML) Query models –Structured queries, retrieval queries, data mining (statistics)

11 11 A Generic Wrapper Architecture request/queryresult/data Compensation for missing processing capabilities Transformation of data model Communication interface Source data Metadata integrity constraints

12 12 Wrapper Tasks Data Model consists of –Data types –Integrity constraints –Operations (e.g. query language) Translate among different data models Overcome other "syntactic" heterogeneity Which was the task? How was it implemented?

13 13 Example: Wrapping Relational Data in XML/HTML Data types –trivial Integrity Constraints (e.g. primary keys) –requires XML Schema Operations –none in HTML Where did this play a role?

14 14 Example: Wrapping XML/HTML into Relational Data Types –which difficulties? Integrity Constraints –none in HTML Operations –requires generally XQuery –form fields can be considered as hard-coded queries

15 15 A Closer Look at Schemas Tight vs. loose integration –Is there a global schema? Support for semantic integration –collection, fusion, abstraction

16 16 Schema Architecture for Federated DBMS View 1View 2View 3 Integrated Schema Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema... Relational. DBMS Objectorient. DBMS File System Web Server accepted model for integrated database systems with integrated schema 5-level architecture data independence

17 17 Export Schema provided by data source source DB can change w/o changing export schema which was the export schema? View 1View 2View 3 Integrated Schema Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema... Relational. DBMS Objectorient. DBMS File System Web Server

18 18 Import Schema provided by wrapper export schema can change w/o changing import schema which was the import schema? View 1View 2View 3 Integrated Schema Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema... Relational. DBMS Objectorient. DBMS File System Web Server

19 19 Integrated Schema provided by mediator import schemas can change w/o changing integrated schema which was the integrated schema? View 1View 2View 3 Integrated Schema Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema... Relational. DBMS Objectorient. DBMS File System Web Server

20 20 Application View provided by application integrated DB can change w/o changing application (code) which were application views? View 1View 2View 3 Integrated Schema Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema... Relational. DBMS Objectorient. DBMS File System Web Server

21 21 Mediator Tasks Integrate data with same "real-world meaning", but different representation –integration mapping  schema integration –can be implemented, e.g., as database view Decompose queries against the integrated schema to queries against source DBs –only for virtual integration

22 22 Schema Integration Standard Methodology Schema translation (wrapper) Correspondence investigation Conflict resolution and schema integration

23 23 Identifying Schema Correspondences Sources of information –source schema –source database –source application –database administrator, developer, user Which were your information sources?

24 24 Identifying Schema Correspondences Semantic correspondences –e.g. related names Structural correspondences –reachability by paths Data analysis –distribution of values Can you give examples?

25 25 Conflicts What types of problems did you encounter integrating corresponding data?

26 26 Types of Conflicts Schema level –Naming conflicts –Structural conflicts –Classification conflicts –Constraint and behavioral conflicts Data level –Identification conflicts –Representational conflicts –Data errors

27 27 Conflict Resolution Depends on type of conflict Requires construction of mappings Mappings might be complex, e.g. not expressible as SQL views

28 28 Naming Conflicts Homonyms (give example) –same name used for different concepts –Resolution: introduce prefixes to distinguish the names Synonyms (give example) –different names for the same concepts –Resolution: introduce a mapping to a common name

29 29 Structural Conflicts Different, non-corresponding attributes –Resolution: create a relation with the union of the attributes Different datatypes –Resolution: build a mapping function Different data model constructs –e.g. attribute vs. relation –Resolution: requires higher order mappings

30 30 Classification Conflicts Relations can have different coverage (inclusion, non-empty intersection) –Resolution: build generalization hierarchies Additional problem –Identification of corresponding data instances –"real world" correspondence is application dependent

31 31 Data Correspondences Corresponding data instances –similar to naming conflicts at schema level –Resolution: mapping tables and functions –Similarity functions Corresponding data values, data conflicts –of corresponding data instances –Resolution: mapping tables and functions –Prefer data from more trusted data source

32 32 Constraint and Behavioral Conflicts Cardinality conflicts –different types of cardinality constraints on relationships –Resolution: use the more general constraint Behavioral conflicts for relation update –E.g. cascading delete vs. non-cascading –Resolution: add missing behavior at global level

33 33 More? Security –protecting data Data Quality –actively managing data quality Integration as Agreement Process –"emergent semantics"


Download ppt "1 Lecture 13: Database Heterogeneity Debriefing Project Phase 2."

Similar presentations


Ads by Google