1 Lecture 13: Database Heterogeneity. 2 Outline Database Integration Wrappers Mediators Integration Conflicts.

Slides:



Advertisements
Similar presentations
Chapter 4 Database Processing. Agenda Purpose of Database Terminology Components of Database System Multi-user Processing Database Design Entity-relationship.
Advertisements

Introduction to Databases
Slide 1 Web-Base Management Systems Aaron Brown and David Oppenheimer CS294-7 February 11, 1999.
Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/1 Outline Introduction Background Distributed Database Design Database Integration ➡ Schema Matching ➡
ICS (072)Database Systems: A Review1 Database Systems: A Review Dr. Muhammad Shafique.
--What is a Database--1 What is a database What is a Database.
Information Integration. Modes of Information Integration Applications involved more than one database source Three different modes –Federated Databases.
File Systems and Databases
Ch1: File Systems and Databases Hachim Haddouti
1 Lecture 13: Database Heterogeneity Debriefing Project Phase 2.
Distributed Database Management Systems. Reading Textbook: Ch. 4 Textbook: Ch. 4 FarkasCSCE Spring
M1G Introduction to Database Development 1. Databases and Database Design.
Automatic Data Ramon Lawrence University of Manitoba
1 Information Integration and Source Wrapping Jose Luis Ambite, USC/ISI.
Dr. Kalpakis CMSC 461, Database Management Systems Introduction.
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
IMS 4212: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida Distributed Databases Business needs.
Information storage: Introduction of database 10/7/2004 Xiangming Mu.
Week 1 Lecture MSCD 600 Database Architecture Samuel ConnSamuel Conn, Asst. Professor Suggestions for using the Lecture Slides.
Module Title? DBMS Introduction to Database Management System.
Research Topics in Computing Data Modelling for Data Schema Integration 1 March 2005 David George.
The Relational Model. Review Why use a DBMS? OS provides RAM and disk.
Information Systems: Modelling Complexity with Categories Four lectures given by Nick Rossiter at Universidad de Las Palmas de Gran Canaria, 15th-19th.
Chapter 2 CIS Sungchul Hong
TU/e eindhoven university of technology / faculty of mathematics and informatics Technologie van Informatiesystemen TIS college 3.
Database Technical Session By: Prof. Adarsh Patel.
CST203-2 Database Management Systems Lecture 2. One Tier Architecture Eg: In this scenario, a workgroup database is stored in a shared location on a single.
Database System Concepts and Architecture Lecture # 2 21 June 2012 National University of Computer and Emerging Sciences.
Information Systems: Databases Define the role of general information systems Describe the elements of a database management system (DBMS) Describe the.
2. Database System Concepts and Architecture
1 Technologies for distributed systems Andrew Jones School of Computer Science Cardiff University.
CSE 636 Data Integration Overview Fall What is Data Integration? The problem of providing uniform (sources transparent to user) access to (query,
XML & Mediators Thitima Sirikangwalkul Wai Sum Mong April 10, 2003.
Chapter 1 : Introduction §Purpose of Database Systems §View of Data §Data Models §Data Definition Language §Data Manipulation Language §Transaction Management.
Lecture2: Database Environment Prepared by L. Nouf Almujally & Aisha AlArfaj 1 Ref. Chapter2 College of Computer and Information Sciences - Information.
©Silberschatz, Korth and Sudarshan1.1Database System Concepts Chapter 1: Introduction Purpose of Database Systems View of Data Data Models Data Definition.
1 Chapter 1 Introduction. 2 Introduction n Definition A database management system (DBMS) is a general-purpose software system that facilitates the process.
Announcements. Data Management Chapter 12 Traditional File Approach  Structure Field  Record  File  Fixed All records have common fields, and a field.
COMU114: Introduction to Database Development 1. Databases and Database Design.
1 CS 502: Computing Methods for Digital Libraries Lecture 19 Interoperability Z39.50.
5 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved.
Object Oriented Multi-Database Systems An Overview of Chapters 4 and 5.
Distributed DBMSs- Concept and Design Jing Luo CS 157B Dr. Lee Fall, 2003.
Kjell Orsborn UU - DIS - UDBL DATABASE SYSTEMS - 10p Course No. 2AD235 Spring 2002 A second course on development of database systems Kjell.
Distributed Databases
Database Environment Chapter 2. Data Independence Sometimes the way data are physically organized depends on the requirements of the application. Result:
Scaling Heterogeneous Databases and Design of DISCO Anthony Tomasic Louiqa Raschid Patrick Valduriez Presented by: Nazia Khatir Texas A&M University.
1 Database Management Systems (DBMS). 2 Database Management Systems (DBMS) n Overview of: ä Database Management Components ä Database Systems Architecture.
Management Information Systems, 4 th Edition 1 Chapter 8 Data and Knowledge Management.
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
1 Lecture 9: Distributed Databases – Principles and Architectures Advanced Databases CG096 Nick Rossiter.
NeuroLOG ANR-06-TLOG-024 Software technologies for integration of process and data in medical imaging A transitional.
1 Lecture 7 Distributed Data Bases: Principles and Architectures.
Web Technologies Lecture 10 Web services. From W3C – A software system designed to support interoperable machine-to-machine interaction over a network.
Object storage and object interoperability
Distributed Database Management Systems. Reading Textbook: Ch. 1, Ch. 3 Textbook: Ch. 1, Ch. 3 For next class: Ch. 4 For next class: Ch. 4 FarkasCSCE.
1 Chapter 2 Database Environment Pearson Education © 2009.
The Relational Model Lecture #2 Monday 21 st October 2001.
Data Models. 2 The Importance of Data Models Data models –Relatively simple representations, usually graphical, of complex real-world data structures.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 The Relational Model Chapter 3.
Introduction: Databases and Database Systems Lecture # 1 June 19,2012 National University of Computer and Emerging Sciences.
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
Database Management:.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
Chapter 2 Database Environment.
File Systems and Databases
Database Architecture
Database System Concepts and Architecture
CMPE/SE 131 Software Engineering March 7 Class Meeting
Presentation transcript:

1 Lecture 13: Database Heterogeneity

2 Outline Database Integration Wrappers Mediators Integration Conflicts

3 1. Database Integration Goal: providing a uniform access to multiple heterogeneous information sources More than data exchange (e.g., ASCII, EDI, XML) Old problem, difficult, well-know (partial) solutions Ebay DVD orders IMDB amazon Oracle PointBase MySQL IBM DB2 movie DB order movieorder status

4 …still A Big Problem in Practice

5 Manually merge multiple databases into a new global database Time consuming and error prone Local autonomy lost Static solution Does not scale with number of databases Book(ISBN, Title, Price, Author) Author(Name, ISBN ) Livre(ISBN, Prix, Titre) Auteur(Prenom, Nom, ISBN ) Book(ISBN, Title) Author(Name, ISBN ) Old-School Approach (1) Manual, Global Integration

6 Old-School Approach (2) Multidatabase Language Approach No attempt at integrating schemas Language (e.g., MSQL) used to integrate information sources at run-time Simple example: Not transparent (you need to know all databases!) Heavy burden on (expert) users Global queries subject to local changes Use S1, S2 Select Titre From S1.Book, S2.Livre Where S1.Book.ISBN = S2.Livre.ISBN

7 Problem Dimensions Distribution Autonomy Heterogeneity

8 How to Deal with Distribution? Problems –data access over the network –inconsistent replicated data Solutions –solved by using Web access (over HTTP) –Java RMI –publishing using JSP –JDBC to access remote databases –Etc.

9 How to Deal with Autonomy? Problems –changing structure of Web page –different coverage of Web sites –availability of services Solutions –manually adapt to changes –replication, materialization (availability) –contacts, agreements… standards

10 How to Deal with Heterogeneity? Problems –Data models –Schemas –Data Solutions –Mappings, schema integration –Standards

11 Solution Variants General issues –Bottom-up vs. top-down engineering –Virtual vs. materialized integration –Read-only vs. read-write access –Transparency: language, schema, location What did you do?

12 A Generic System Architecture One solution: the Wrapper-Mediator architecture DB1 DB2DB3 DB4 Oracle PointBase MySQL IBM DB2 wrapper mediator application 1 application 2application 3 mediators integrate the data from the DBs wrappers convert to a common representation

13 2. Wrappers request/queryresult/data Compensation for missing processing capabilities Transformation of data model Communication interface Source data Metadata integrity constraints

14 Wrapper Tasks Translate among different data models Data Model consists of –Data types –Integrity constraints –Operations (e.g. query language) Overcome other "syntactic" heterogeneity

15 A Closer Look at Data Models Data model used by sources –relational? HTML? XML? RDF? Custom? Text? Data model used by integrated DB –canonical data model (e.g. relational, XML) Query models –Structured queries, retrieval queries, data mining (statistics)

16 Example: Wrapping Relational Data in XML/HTML Data types –trivial Integrity Constraints (e.g. primary keys) –requires XML Schema Operations –none in HTML

17 Example: Wrapping XML/HTML into Relational Data Types –which difficulties? Integrity Constraints –none in HTML Operations –requires generally XQuery –form fields can be considered as hard-coded queries

18 3. The Mediator Integrate data with same "real-world meaning", but different representations –integration mapping  schema integration –can be implemented, e.g., as database views Decompose queries against the integrated schema to queries against source DBs –only for virtual integration

19 An Example: LAV Local As View approach Each local database is defined as a view on the integrated schema A simple Example: Source A: R1(prof, course, university) Source B: R2(title, prof, course) Definition of the global, integrated schema: Global(prof,course,title,university) Source A defined as: Create view R1 as SELECT prof, course, university FROM Global Source B defined: SELECT title, prof, course FROM Global

20 Schema Integration Standard Methodology Schema translation (wrapper) Correspondence investigation Conflict resolution and schema integration

21 Identifying Schema Correspondences Sources of information –source schema –source database –source application –database administrator, developer, user

22 Identifying Schema Correspondences Semantic correspondences –e.g. related names Structural correspondences –reachability by paths Data analysis –distribution of values

23 A Closer Look at Schemas Tight vs. loose integration –Is there a global schema? Support for semantic integration –collection, fusion, abstraction

24 A Widely Used Architecture: Federated DBMS View 1View 2View 3 Integrated Schema Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema... Relational. DBMS Objectorient. DBMS File System Web Server accepted model for integrated database systems with integrated schema 5-level architecture data independence

25 Export Schema provided by data source source DB can change w/o changing export schema View 1View 2View 3 Integrated Schema Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema... Relational. DBMS Objectorient. DBMS File System Web Server

26 Import Schema provided by wrapper export schema can change w/o changing import schema View 1View 2View 3 Integrated Schema Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema... Relational. DBMS Objectorient. DBMS File System Web Server

27 Integrated Schema provided by mediator import schemas can change w/o changing integrated schema View 1View 2View 3 Integrated Schema Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema... Relational. DBMS Objectorient. DBMS File System Web Server

28 Application View provided by application integrated DB can change w/o changing application (code) View 1View 2View 3 Integrated Schema Export Schema Export Schema Export Schema Export Schema Import Schema Import Schema Import Schema Import Schema... Relational. DBMS Objectorient. DBMS File System Web Server

29 4. Handling Integration Conflicts What types of problems did you encounter integrating corresponding data? different structural representation (e.g. attribute vs. table) different naming schemes

30 Types of Conflicts Schema level –Naming conflicts –Structural conflicts –Classification conflicts –Constraint and behavioral conflicts Data level –Identification conflicts –Representational conflicts –Data errors

31 Conflict Resolution Depends on type of conflict Requires construction of mappings Mappings might be complex, e.g. not expressible as SQL views

32 Naming Conflicts Homonyms –same name used for different concepts –Resolution: introduce prefixes to distinguish the names Synonyms –different names for the same concepts –Resolution: introduce a mapping to a common name

33 Structural Conflicts Different, non-corresponding attributes –Resolution: create a relation with the union of the attributes Different datatypes –Resolution: build a mapping function Different data model constructs –e.g. attribute vs. relation –Resolution: requires higher order mappings

34 Classification Conflicts Relations can have different coverage (inclusion, non-empty intersection) –Resolution: build generalization hierarchies Additional problem –Identification of corresponding data instances –"real world" correspondence is application dependent

35 Data Correspondences Corresponding data instances –similar to naming conflicts at schema level –Resolution: mapping tables and functions –Similarity functions Corresponding data values, data conflicts –of corresponding data instances –Resolution: mapping tables and functions –Prefer data from more trusted data source

36 Constraint and Behavioral Conflicts Cardinality conflicts –different types of cardinality constraints on relationships –Resolution: use the more general constraint Behavioral conflicts for relation update –E.g. cascading delete vs. non-cascading –Resolution: add missing behavior at global level

37 More? Security –protecting data Data Quality –actively managing data quality Integration as Agreement Process –"emergent semantics"