TU/e Eindhoven University of Technology / Faculty of Mathematics and Informatics. Technologie van Informatiesystemen (Technology of Information Systems, TIS), lecture 3.


Contents
– Introduction, 30/11
– Web engineering & Web information systems, 7/12
– Data transformation & data integration, Philippe Thiran, 14/12
– ERP, Smulders (Deloitte), 21/ /1
– Flower, Berens (Pallas Athena), 25/1 + 1/2
– Biztalk, van den Boom (Microsoft), 15+22/2

Data Transformation & Data Integration
Philippe Thiran, Computer Science Department, Technische Universiteit Eindhoven, The Netherlands

Data Transformation & Integration: Agenda
– Problem Statement
  • Existing database systems
  • Heterogeneity, distribution, autonomy
– Data Transformation
  • Schema conversion
  • Query conversion: Wrapper
– Data Integration
  • Schema integration
  • Query processing: Multidatabase and Federation

Problem Statement
  • Existing database systems
  • Heterogeneity, distribution, autonomy

Problem Statement: Existing Database Systems
– Data are recorded in existing database systems
– Existing database systems are:
  • Mission critical (essential to the organization's business)
  • Required to be operational at all times
  • Inflexible
– Typically, existing database systems are:
  • Very large (millions of lines of code)
  • Old (often more than 10 years old)
  • Written in old programming languages such as COBOL, PL/1, SQL!
  • Built around an old DBMS

Problem Statement: Existing Database Systems
– Data are recorded in existing database systems
– These systems answer old requirements, but now face:
  • New functions and services
  • New user requirements
  • New technology (Web)
  • Communication among them?

Problem Statement: Existing Database Systems
Existing Systems: New Services
– How to deal with existing database systems?
  • Abandon the existing systems: migration to a new system
  • Keep and modify the existing systems
  • Keep the existing systems and wrap them: autonomy
Existing Systems: Communication
– How to integrate existing database systems?

Problem Statement: Data Integration Problems
– Integrating database systems is very hard and costly
– Three main dimensions of the problem:
  • Distribution
  • Autonomy
  • Heterogeneity
[Figure: three axes (Distribution, Autonomy, Heterogeneity) on which centralized DBMS and distributed databases are positioned.]

Problem Statement: Data Integration, Autonomy
– Autonomy refers to the distribution of control
– Four dimensions of autonomy:
  • Design: each system has its own data models and its own transaction management technique
  • Communication: a system has neither knowledge of the existence of other systems nor of how to communicate with them
  • Execution: each system executes independently of the other systems
  • Association: each system decides how much of its data and processing capabilities it will share with the other systems

Problem Statement: Data Integration, Heterogeneity
– Heterogeneity may exist at three basic levels:
  • DBMS level. Data is managed by a variety of DBMSs based on different data models and data languages
    – Data models: relational model, hierarchical model and file model
    – Data languages: SQL, DL/1, COBOL programs
  • Platform level. Different hardware, different network protocols
  • Semantic level. Different designer viewpoints in modelling the same objects of the application domain; incompatible design specifications, which lead to different naming, types or integrity constraints

Data Integration: Generic Integration Architecture, Schema Hierarchy
[Figure: schema hierarchy. Three sources (DB1 on a relational DBMS, DB2 on an OO DBMS, and a file system) are described by Database Schema 1, Database Schema 2 and Data Schema 3 in their local models. Each has an export schema (Export Schema 1 to 3) and an import schema (Import Schema 1 to 3) in the common model; the import schemas feed a single integrated schema.]
– Export schema: unifies data models
– Import schema: view on export schema available for non-local access
– Integrated schema: homogenizes and unions import schemas

Data Integration: Generic Integration Architecture, Schema Hierarchy
[Figure: the schema hierarchy again (database schemas in the local models; export schemas, import schemas and the integrated schema in the common model), now annotated with two steps.]
– Data and Schema Transformation: from the local models to the common model
– Data and Schema Integration: from the import schemas to the integrated schema

Data Transformation
  • Schema Conversion
  • Query Conversion: Wrapper

Data Transformation: Schema Conversion, Introduction
– Schema conversion
– Query/Data conversion
[Figure: two data sources. Database Schema 1 and 2 (local data models) are converted into Export Schema 1 and 2 (common data model). Query1 and Query2 on the export schemas are converted into Query1' and Query2' on the data sources; the returned Data1' and Data2' are converted into Data1 and Data2.]

Data Transformation: Schema Conversion
– Schema transformation
  • Transformation of a schema expressed in a data model (Ms) into an equivalent schema expressed in another data model (Mt)
  • Examples
    – ER model → Relational model (lecture ISO)
    – Relational model → XML Schema (see later)
  • Schema transformation operators
  • Schema conversion consists in applying the relevant transformations to the relevant constructs of the schema expressed in Ms, in such a way that the final result complies with Mt

Data Transformation: Schema Conversion
– Schema transformation
  • A (schema) transformation is basically an operator by which a source data structure C is replaced with a target structure C'.
  • Example of a semantics-preserving transformation: transforming a relationship type into an attribute
[Figure: RT-FK, transforming a binary relationship type into a foreign key. Source: entity types A (A1) and B (B1, B2, id: B1) linked by relationship type R (cardinality N). Target: B (B1, B2, id: B1) and A (A1, B1, ref: B), without R.]

Data Transformation: Schema Conversion
– Two main schema transformations for ER model → Relational model
  • RT-ET: transforming a relationship type into an entity type. Inverse: ET-RT
  • RT-FK: transforming a binary relationship type into a foreign key. Inverse: FK-RT
[Figure: the RT-FK example, with A (A1), B (B1, B2, id: B1) and relationship type R (cardinality N) replaced by A (A1, B1, ref: B) and B (B1, B2, id: B1).]
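
The RT-FK transformation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EntityType` and `RelationshipType` classes and the `rt_fk` function are hypothetical names, not part of the lecture, and the relationship type is assumed to link each instance of the "many" side to at most one instance of the "one" side.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    name: str
    attributes: list
    identifier: list
    # Maps a foreign-key attribute to the entity type it references.
    foreign_keys: dict = field(default_factory=dict)


@dataclass
class RelationshipType:
    name: str
    many_side: str  # entity type that receives the foreign key
    one_side: str   # entity type whose identifier is copied


def rt_fk(entities, rel):
    """Replace a binary relationship type by a foreign key (RT-FK).

    The identifier of the one-side entity type is copied into the
    many-side entity type and declared as a reference to it.
    Returns a new schema; the input dictionary is left untouched.
    """
    source = entities[rel.many_side]
    target = entities[rel.one_side]
    attributes = list(source.attributes)
    foreign_keys = dict(source.foreign_keys)
    for attr in target.identifier:
        if attr in attributes:
            raise ValueError(f"attribute {attr!r} already exists in {source.name}")
        attributes.append(attr)
        foreign_keys[attr] = target.name
    result = dict(entities)
    result[source.name] = EntityType(
        source.name, attributes, list(source.identifier), foreign_keys
    )
    return result


# The schema of the slide: A (A1), B (B1, B2, id: B1), relationship type R.
schema = {
    "A": EntityType("A", ["A1"], []),
    "B": EntityType("B", ["B1", "B2"], ["B1"]),
}
converted = rt_fk(schema, RelationshipType("R", many_side="A", one_side="B"))
```

After the call, `converted["A"]` carries the extra attribute B1 referencing B, and R no longer needs to be represented. The inverse FK-RT would remove the foreign-key attribute and reintroduce the relationship type.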

Data Transformation: Schema Conversion
– Exercise: from ER model → Relational model


Data Transformation: Wrappers, Definition
– A wrapper controls a (legacy) data source
– Basically, a wrapper is a software component that offers a homogeneous query interface based on a common data model (XML for the Web)
– It converts data and queries from the common data model to a local data model
⇒ It offers an adequate way of solving the DBMS heterogeneity that appears when one wants to integrate existing, heterogeneous data systems
[Figure: a wrapper sits between a data source (database schema, local data models) and its clients (export schema, common data model, common query language).]

Data Transformation: Wrappers, Definition (ctd)
– A data wrapper is basically defined as a converter of data and queries
– That is, a wrapper:
  • Offers an export schema in the common data model
  • Accepts queries against the export schema
  • Translates them into queries understandable by the data system
  • Transforms the results of the local queries into a format understood by the application
[Figure: queries and data flow through the wrapper, between the export schema (common data model, common query language) and the database schema of the data source (local data model, local query language).]
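
The four wrapper duties listed above can be sketched as follows. This is a minimal illustration, not a real wrapper product: the class `RelationalWrapper`, its `query` method, the `<result>` root element and the helper `build_demo_wrapper` are assumptions made for the example, with an in-memory SQLite database standing in for the legacy data source and XML as the common data model.

```python
import sqlite3
import xml.etree.ElementTree as ET


class RelationalWrapper:
    """Two-way wrapper: translates queries to SQL and results to XML."""

    def __init__(self, connection, export_schema):
        self.connection = connection
        # Export schema (hypothetical format): element name -> (table, columns).
        self.export_schema = export_schema

    def query(self, element, field=None, value=None):
        """Accept a query against the export schema and return XML text."""
        table, columns = self.export_schema[element]
        column_list = ", ".join(f'"{c}"' for c in columns)
        sql = f'SELECT {column_list} FROM "{table}"'
        params = ()
        if field is not None:
            if field not in columns:
                raise KeyError(f"{field!r} is not exported for {element!r}")
            sql += f' WHERE "{field}" = ?'
            params = (value,)
        # Transform the local result rows into the common data model.
        root = ET.Element("result")
        for row in self.connection.execute(sql, params):
            item = ET.SubElement(root, element)
            for column, cell in zip(columns, row):
                ET.SubElement(item, column).text = str(cell)
        return ET.tostring(root, encoding="unicode")


def build_demo_wrapper():
    """Create a wrapper over a sample Order table (data from the slides)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "Order" (Id INTEGER, Custname TEXT, Custnum INTEGER)')
    conn.executemany(
        'INSERT INTO "Order" VALUES (?, ?, ?)',
        [(10, "Philips", 7734), (9, "Unilever", 7725)],
    )
    return RelationalWrapper(conn, {"Order": ("Order", ["Id", "Custname", "Custnum"])})
```

A call such as `build_demo_wrapper().query("Order", "Custname", "Philips")` is translated into a parameterized SQL query, and the matching row comes back as an `<Order>` element. A one-way wrapper would keep only the result transformation.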

Data Transformation: Wrappers, Categories of Wrappers
– There exists no standard approach to building wrappers
– Functionality
  • One-way: only transformation of data (e.g., for data warehouses)
  • Two-way: transformation of requests and data
– Development
  • Hard-wired wrappers, for specific data sources
  • Semi-automated generation: wrapper development tools
  • Automatically generated wrappers
– Availability
  • Standalone programs (data conversion, data migration)
  • Components of a federation (see later)
  • Database interface for foreign data

Data Transformation: Wrappers, Wrappers and the Web
– Wrapper interface
  • Data format: XML
  • Common data model: XML DTD and Schema
  • Common query language: XPath, XQuery, none
– Wrapper mapping
  • Generally between relational data and XML
  • Two translation types
    – Automated
    – Defined by the user
  • XML- or SQL-oriented query language

Data Transformation: Wrappers, XML Views of Relational Databases
– Automated translation
Order
  Id | Custname | Custnum
  10 | Philips  | 7734
  9  | Unilever | 7725
Item
  Oid | Desc           | Cost
  10  | Ship Generator | 8000
Payement
  Oid | Due | Amt
  (rows truncated in the source: "101/10/ /10/")
[XML document: the markup was lost in extraction. The surviving values are 10, Philips, Unilever, Ship Generator and 8000, followed by the remark "similar to … and …".]
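
An automated translation of this kind can be sketched in Python. The slide's own XML output did not survive, so the layout used here (a `<database>` root, one element per table, one `<row>` element per tuple, one child element per column) is an assumption chosen for illustration, as are the function names.

```python
import xml.etree.ElementTree as ET


def table_to_xml(table_name, columns, rows):
    """Map one relational table to an XML element, one <row> per tuple."""
    table_element = ET.Element(table_name)
    for row in rows:
        row_element = ET.SubElement(table_element, "row")
        for column, cell in zip(columns, row):
            ET.SubElement(row_element, column).text = str(cell)
    return table_element


def database_to_xml(tables):
    """Map a whole database, given as {table: (columns, rows)}, to XML text."""
    root = ET.Element("database")
    for name, (columns, rows) in tables.items():
        root.append(table_to_xml(name, columns, rows))
    return ET.tostring(root, encoding="unicode")


# The Order and Item tables of the slide.
sample = {
    "Order": (["Id", "Custname", "Custnum"], [(10, "Philips", 7734), (9, "Unilever", 7725)]),
    "Item": (["Oid", "Desc", "Cost"], [(10, "Ship Generator", 8000)]),
}
document = database_to_xml(sample)
```

Because the mapping depends only on table and column names, it needs no user input; a user-defined translation would instead let the designer choose the nesting, for example items inside their order.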

Data Transformation: Wrappers, XML Views of Relational Databases
– User-defined translation
[Same Order, Item and Payement tables as on the automated-translation slide.]
[XML document: the markup was lost in extraction; only "Philips …" survives.]

Data Transformation: Wrappers, XML Views of Relational Databases
– Exercises
  • What is the XML document of this relational database?
      <!ATTLIST Order OrderID ID #REQUIRED>
      <!ATTLIST Detail Product IDREF #REQUIRED>
      <!ATTLIST Product Reference ID #REQUIRED Label CDATA #IMPLIED UnitPrice CDATA #REQUIRED>

Data Transformation: Wrappers, XML Views of Existing Relational Databases
– Mapping definition
  • SQL-oriented query language
      For $b in SQL("select * from Order where Custname='" + $x + "'")
      return {$b/Id} {$x}
    (the XML element constructors around {$b/Id} and {$x} were lost in extraction)
[Figure: source table Order (Id, Custname, Custnum) with rows (10, Philips, 7734) and (9, Unilever, 7725); resulting view Order (Id, Custname).]

Data Transformation: Wrappers, XML Views of Existing Relational Databases
– XML view definition
  • Bottom-up (from the relational schema)
  • Top-down (from a given XML schema)
– Mappings between XML views and relational schemas
  • Automated (algorithm)
  • Manual (defined by the user)

Data Transformation: Wrappers, XML Views of Existing Relational Databases
– Examples (cells are listed in the order in which they survive in the source; the Oracle9i row is incomplete)
  Product name           | SQL-written mapping  | XML-written mapping     | XML Schema   | Query over views
  Xperanto (IBM)         | no                   | yes (XQuery)            | XML Schema   | yes (XQuery), update
  Microsoft's SQL Server | yes (FOR XML clause) | no                      | XDR Schema   | yes (XPath)
  DB2 (IBM)              | no                   | yes (subset of XQuery)  | yes (XQuery) | no
  Oracle9i               | yes                  | no                      |              |
  SilkRoute (AT&T)       | no                   | yes (XQuery)            | XML Schema   | yes (XQuery), update

Data Integration
  • Generic Integration Architecture
  • Schema Integration
  • Query Processing: multidatabase and federation

Data Integration: Generic Integration Architecture, Component Architecture
[Figure: Applications 1 to 3 access a mediator through a common DDL/DML and the integrated schema. The mediator works on import schemas (Import Schema 1, …) provided through wrappers. Each wrapper offers an export schema (Export Schema 1, …) and accesses its DBMS (DBMS 1 to 3, over DB1 to DB3) through the local DDL/DML and database schema (Database Schema 1, …).]
– Wrapper
  • Controls a local data source
  • Offers a homogeneous query interface based on a common data model
– Mediator
  • Offers an abstract integrated view of sources
  • Reconciles independent data structures to yield a unique, coherent view of the data

Data Integration: Generic Integration Architecture, Aspects to Consider for Integration
– General issues
  • Bottom-up vs. top-down engineering
    – From existing schemas to an integrated schema, or vice versa
    – Schema integration vs. schema matching
  • Virtual vs. materialized integration
  • Read-only vs. read-write access
  • Transparency
    – Language, schema, location
– Data-model-related issues
  • Types of sources
    – Structured, semi-structured, unstructured
  • Common data model of the integrated system
  • Tight vs. loose integration
    – Use of a global schema
  • Query model

Data Integration: Schema Integration, Methodology
– Bottom-up process
– Four main steps
  • Preparing the local schemas
  • Detecting what is common between the components of the local schemas
    – Correspondence (what is common)
  • Solving the conflicts
    – Conflict (what is incompatible)
  • Integrating the different schemas according to the correspondences and conflicts detected in the previous steps

Data Integration: Schema Integration, Concept of Correspondence
– Two complementary views of correspondence:
  • Structural correspondence (schema level: concepts)
  • Instance correspondence (instance level: data)
– Structural correspondence
  • Five types of structural correspondence:
    – Identity
    – Independence
    – Complementarity
    – Subtyping
    – Common supertype

Data Integration: Schema Integration, Concept of Correspondence
– Instance correspondence
  • Four types of instance correspondence:
    – Disjoint: the classes have no instance in common
    – Inclusion: the instances of one class are included in those of the other class
    – Equivalence: the classes contain the same instances
    – Overlapping: the classes share some instances but not all
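
The four instance correspondences map directly onto set comparisons. The sketch below assumes that the instances of each class can be compared through a shared key (here plain Python values); the function name `instance_correspondence` is hypothetical.

```python
def instance_correspondence(a, b):
    """Classify the correspondence between two sets of instances."""
    a, b = set(a), set(b)
    if a == b:
        return "equivalence"   # same instances on both sides
    if a.isdisjoint(b):
        return "disjoint"      # no instance in common
    if a < b or b < a:
        return "inclusion"     # one set strictly contains the other
    return "overlapping"       # some shared instances, but not all


customers_site1 = {"Philips", "Unilever"}
customers_site2 = {"Unilever", "Shell"}
kind = instance_correspondence(customers_site1, customers_site2)
```

In practice the hard part is deciding when two records denote the same real-world instance; the comparison itself is the easy step.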

Data Integration: Schema Integration, Concept of Conflict
– Conflicts occur in four possible ways: syntactic (naming conflicts), structural, semantic or instance
– Syntactic conflicts (resolution: use of an ontology)
  • Synonyms. Two identical objects (entities, attributes, relationships) that have different names are synonyms
  • Homonyms. Two different objects that have identical names are homonyms
– Structural conflicts (resolution: mapping function or transformation)
  • Domain. Two identical objects have different domains (differences in dimension, units and scales)
  • Structure. The same concept is represented by different data structures (e.g., different attributes)
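
The resolutions named above can be illustrated with a small sketch. Everything in it is an assumption made for the example: a plain dictionary stands in for the ontology, the names `Client`, `Buyer` and `Customer` are invented, and the domain conflict is reduced to a unit conversion.

```python
# Synonyms: different names for the same concept are mapped to one
# canonical name (a dictionary standing in for an ontology).
SYNONYMS = {"Client": "Customer", "Buyer": "Customer"}


def canonical_name(name):
    """Resolve a synonym to its canonical name; other names pass through."""
    return SYNONYMS.get(name, name)


def qualified_name(source, name):
    """Resolve a homonym by prefixing the name with its source."""
    return f"{source}.{name}"


# Domain conflict: the same quantity recorded in different units is
# resolved by a mapping function to one common unit.
MILLIMETRES_PER_UNIT = {"mm": 1, "cm": 10, "m": 1000}


def to_millimetres(value, unit):
    """Convert a length to millimetres."""
    return value * MILLIMETRES_PER_UNIT[unit]
```

For example, `canonical_name("Client")` and `canonical_name("Buyer")` both yield `Customer`, while `qualified_name("Site1", "Address")` and `qualified_name("Site2", "Address")` keep two homonymous objects apart.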

Data Integration: Schema Integration, Concept of Conflict
– Structural conflict
  • In the left-hand schema (Site 1), Address is a compound attribute, whereas in the right-hand one (Site 2), Address is represented by an entity type
  • Resolution: transformation

Data Integration: Schema Integration, Concept of Conflict
– Semantic conflicts
  • A semantic conflict arises when there is a contradiction between two representations A and B of the same application domain concept, or between two integrity constraints (resolution?)
  • Example
    – In the left-hand schema (Site 1), Customer is identified by CustId, whereas in the right-hand one (Site 2), it is identified by Name

Data Integration: Schema Integration, Concept of Conflict
– Instance conflicts
  • Instance conflicts are specific to existing data
  • Modelling constructs A and B that are recognized as corresponding can cover sets with different scopes
  • Examples
    – ZIP codes of addresses can be written like “NL-5600 MB” or “56oo MB” or “5600”
    – Different ZIP codes can be recorded for the same address (encoding errors)
    – Resolution: data transformation… cleaning?
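
A data-cleaning step for the ZIP code example might look like the sketch below. It assumes Dutch postal codes of the form four digits plus an optional two-letter suffix, with an optional "NL" country prefix; the function name `normalize_zip` and the choice to return `None` for values that do not fit are assumptions of this example.

```python
import re

# Optional "NL" or "NL-" prefix, four digits, optional two-letter suffix.
_ZIP_PATTERN = re.compile(r"^(?:NL-?\s*)?(\d{4})\s*([A-Za-z]{2})?$")


def normalize_zip(raw):
    """Return a canonical ZIP code, or None if the value cannot be parsed."""
    match = _ZIP_PATTERN.match(raw.strip())
    if match is None:
        return None
    digits, letters = match.groups()
    if letters is None:
        return digits
    return f"{digits} {letters.upper()}"
```

Values that return `None` are candidates for manual inspection rather than silent correction, since a transformation cannot tell which of two conflicting ZIP codes for the same address is the right one.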

Data Integration
  • Query Processing: multidatabase and federation

Data Integration: Integration Architecture, Three Classical Architectures
– Multidatabases
  • No integrated schema
  • Integrated access to different relational DBMSs
– Federated databases
  • Integrated schema
  • Integrated access to different DBMSs
  • Integrated access to different data sources (on the Web)
– Data warehouses (not covered here)
  • Materialized integrated data sources

Data Integration: Query Processing, Classical Architecture: Multidatabase
– Enables transparent access to multiple (relational) databases
  • Hides distribution and different SQL variants
  • Processes queries and updates against multiple databases (two-phase commit)
  • Does not provide any type of global schema (does not hide the different database schemas)
  • Example: IBM DataJoiner
[Figure: DataJoiner connects through Sybase Open Client and Oracle SQL*Net, over a TCP/IP network, to a Sybase server and an Oracle server.]

Data Integration: Query Processing, Classical Architecture: Multidatabase
– Multidatabase schema
[Figure: the multidatabase schema sits on top of Source 1 (Sybase) and Source 2 (Oracle).]

Data Integration: Query Processing, Classical Architecture: Multidatabase
– Query processing
  • Query against the multidatabase schema:
      SELECT p2.title FROM Sybase.PUBLICATIONS p1, Oracle.PAPERS p2 WHERE p1.title = p2.title
  • Subquery sent to Source 1 (Sybase), which returns the Sybase data:
      SELECT title FROM PUBLICATIONS
  • Subquery sent to Source 2 (Oracle), which returns the Oracle data:
      SELECT title FROM PAPERS
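
The flavour of a multidatabase query can be reproduced with SQLite, which lets one connection attach several databases and address their tables by a database prefix. This is only an analogy under stated assumptions: two in-memory SQLite databases named `sybase` and `oracle` stand in for the real servers, the titles are made-up sample data, and SQLite evaluates the join itself instead of shipping subqueries over a network.

```python
import sqlite3


def build_multidatabase():
    """Attach two independent databases to one connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS sybase")
    conn.execute("ATTACH DATABASE ':memory:' AS oracle")
    # Each source keeps its own schema; nothing is integrated.
    conn.execute("CREATE TABLE sybase.PUBLICATIONS (title TEXT)")
    conn.execute("CREATE TABLE oracle.PAPERS (title TEXT)")
    conn.executemany(
        "INSERT INTO sybase.PUBLICATIONS VALUES (?)",
        [("Wrappers",), ("Mediators",)],
    )
    conn.executemany(
        "INSERT INTO oracle.PAPERS VALUES (?)",
        [("Mediators",), ("Federations",)],
    )
    return conn


def common_titles(conn):
    """Run the cross-database join of the slide and return the titles."""
    rows = conn.execute(
        "SELECT p2.title "
        "FROM sybase.PUBLICATIONS p1, oracle.PAPERS p2 "
        "WHERE p1.title = p2.title"
    )
    return [row[0] for row in rows]
```

Note that the query author still has to know that publications are called PUBLICATIONS in one source and PAPERS in the other, which is the low transparency typical of the multidatabase approach.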

Data Integration: Query Processing, Classical Architecture: Multidatabase
– Main properties
  • Transparency
    – Low level of transparency provided to the user (the user is responsible for finding the relevant information, understanding each database schema, detecting and resolving the semantic conflicts and, finally, building the required view of the data in the sources)
  • Autonomy
    – Not intrusive against the autonomy of the data sources
    – Suitable when component systems are strongly autonomous
  • Methodology
    – Simplicity, since there is no schema integration
  • Maintenance and evolution
    – No integrated schema maintenance

Data Integration: Query Processing, Classical Architecture: Federation
– Integrated schema(s) and unique interface
  • Hides the semantic and location heterogeneity
  • Wrapper/Mediator hierarchy
    – Wrapper
      » Controls a local data source
      » Offers a homogeneous query interface based on a common data model
    – Mediator
      » Offers an abstract integrated view of several sources
      » Reconciles independent data structures to yield a unique, coherent view of the data
– Research projects
  • Tsimmis (Stanford)
  • Garlic (IBM)
  • Oasis (Dublin University)

Data Integration: Query Processing, Classical Architecture: Federation
– Typical example
[Figure: views are defined on the integrated schema of a mediator; the mediator uses import schemas obtained from wrappers (each wrapper provides an export schema) over an Oracle SQL DBMS and an XML DBMS. Two entity types are shown: Authors (ANR, Title, FirstName, Surname, Affiliation, id: ANR) and Publication (PNR, Title, Authors, Journal, Pages, id: PNR).]

Data Integration: Query Processing, Classical Architecture: Federation
– Typical example
[Figure: views defined on the integrated schema, which is built from Import schema DB1 and Import schema DB2.]

Data Integration: Query Processing, Federation
– A query Q is submitted to the mediator:
    Q = FOR $b IN //Book RETURN $b/author
– The mediator rewrites Q into one query per wrapper:
    Q1 = FOR $b IN //Book RETURN $b/authors
    Q2 = FOR $b IN //book RETURN $b/author
– Each wrapper translates its query for its local source:
    Q1' = SELECT a.name FROM AUTHORS A    (Oracle SQL DBMS)
    Q2' = //book/author                   (XML DBMS)
– The Oracle source returns A1 = { … }, which its wrapper converts into A1' = { … }; the XML source returns A2 = { … }
– The mediator returns the result A = A1' ∪ A2
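
The flow above can be sketched with a toy mediator over two wrappers. This is a simplified illustration, not the architecture of any of the cited systems: the class names, the fixed `authors()` interface (standing in for query rewriting), the helper `build_demo_mediator` and the sample data are all assumptions, with SQLite playing the role of the SQL DBMS and an in-memory XML document that of the XML DBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET


class SqlAuthorWrapper:
    """Wrapper over a relational source."""

    def __init__(self, connection):
        self.connection = connection

    def authors(self):
        # Local query, as Q1' on the slide.
        rows = self.connection.execute("SELECT a.name FROM AUTHORS a")
        return {row[0] for row in rows}


class XmlAuthorWrapper:
    """Wrapper over an XML source."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def authors(self):
        # Local query, as Q2' on the slide: //book/author.
        return {element.text for element in self.root.findall(".//book/author")}


class Mediator:
    """Sends the query to every wrapper and unions the answers."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def authors(self):
        result = set()
        for wrapper in self.wrappers:
            result |= wrapper.authors()
        return result


def build_demo_mediator():
    """Build a mediator over two small sample sources."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AUTHORS (name TEXT)")
    conn.executemany("INSERT INTO AUTHORS VALUES (?)", [("Author A",), ("Author B",)])
    xml_source = "<library><book><author>Author B</author></book><book><author>Author C</author></book></library>"
    return Mediator([SqlAuthorWrapper(conn), XmlAuthorWrapper(xml_source)])
```

Using a set for the union removes the author that appears in both sources; a real mediator would also have to decide when two differently written names denote the same author.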

Data Integration: Query Processing, Classical Architecture: Federation
– Main properties
  • Transparency
    – High level of transparency provided to the user. The user is not aware of the distribution and the heterogeneity of the integrated data sources
  • Autonomy
    – Each local data source has control over its sharable information
  • Methodology
    – Problems of defining an integrated schema
– Web as a loosely coupled federation
  • Many different, widely distributed information systems
  • Heterogeneity
    – Structurally homogeneous: XML
    – Semantically heterogeneous: no explicit schemas (ontology?)
  • Autonomy
    – Runtime autonomy: pages change on average every 4 weeks, dangling links
  • Distribution
    – Replication (proxies) and caching frequently used