Information Architectures course (Corso di Architetture della Info), A.Y. 2009-2010, Carlo Batini
5.1.2 Data Integration systems: architectural elements

Data Integration (or mediator) systems

Data Integration definition
Data integration is a major research and business area whose main purpose is to give users uniform access to multiple, autonomous, heterogeneous data sources by presenting a unified view of these data. Building this unified view requires an agreement among the source schemas, which is complex to reach: one has to identify the differences and similarities among the schemas in order to conform them.

Advantages of data integration architectures with respect to federated architectures
Data integration architectures manage:
– schema-level heterogeneities that are more complex than those handled in federated databases;
– (to some extent) instance-level heterogeneities due to quality errors in the data (accuracy, currency, incompleteness, inconsistencies, etc.).

Data integration: several approaches
Data integration stands for several approaches for combining data from different data sources [Hull, 1997]:
– Integrated read-only views (mediation): support an integrated, read-only view of data that reside in multiple databases. This is the approach of the majority of academic and commercial systems.
– Integrated read-write views (mediation with update): an extension of the mediation architecture that supports updates against an integrated view.
Initially, we will deal only with the first approach.

Schema level heterogeneities

NB: in the following, "heterogeneity" and "conflict" are used as synonyms.
Schema-level heterogeneities are of two types:
– name heterogeneities
– type heterogeneities

Name heterogeneities
– Synonyms: different names for the same concept, e.g. employee/clerk, exam/course, code/num.
– Homonyms: the same name for different concepts, e.g. Employee denotes an employee in one schema and a vendor in another.

Examples of name heterogeneities (figure)
– Homonyms: the attribute price of Product denotes the production price in one schema and the sale price in another.
– Synonyms: Department in one schema and Division in another denote the same concept.
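A minimal Python sketch of how a design-time correspondence table could resolve the name conflicts above: synonyms are mapped onto one global name, homonyms are disambiguated by the source they come from. The table contents, the source identifiers `s1`/`s2` and the global names `production_price`/`sale_price` are hypothetical.

```python
from typing import Any

# Hypothetical correspondence table, built at design time:
# (source id, local element name) -> global element name.
CORRESPONDENCES: dict[tuple[str, str], str] = {
    # Synonyms: different local names for the same concept.
    ("s1", "Department"): "Department",
    ("s2", "Division"): "Department",
    # Homonyms: the same local name for different concepts.
    ("s1", "price"): "production_price",
    ("s2", "price"): "sale_price",
}


def global_name(source: str, local_name: str) -> str:
    """Return the global name of a local schema element.

    Assumption: names missing from the table need no renaming.
    """
    return CORRESPONDENCES.get((source, local_name), local_name)


def rename_record(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename every attribute of a local record to its global name."""
    return {global_name(source, name): value for name, value in record.items()}


if __name__ == "__main__":
    print(global_name("s2", "Division"))                  # Department
    print(rename_record("s1", {"code": 7, "price": 10}))  # {'code': 7, 'production_price': 10}
    print(rename_record("s2", {"code": 7, "price": 15}))  # {'code': 7, 'sale_price': 15}
```

Homonyms can only be told apart by knowing which source a name comes from, which is why the key of the table includes the source identifier.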

Type conflicts
The same concept is represented with different conceptual structures in the two schemas. Typical cases:
– different definition domains for the same attribute;
– an attribute in one schema and a derived value in the other;
– an attribute in one schema and an entity in the other;
– an attribute in one schema and a generalization hierarchy in the other;
– an entity in one schema and a relationship in the other;
– different abstraction levels for the same concept, e.g. two entities with homonym names related by an IS-A hierarchy across the two schemas;
– different granularities in the definition domains;
– different cardinalities in the same relationships;
– key conflicts.
See the following slides for examples.

Examples of type conflicts - 1
Type conflicts in a single attribute (e.g. NUMERIC vs. ALPHANUMERIC domains). For example, the attribute “gender” may be encoded as:
– Male/Female
– M/F
– 0/1
– in Italy, it is implicit in the “codice fiscale” (SSN).
Similarly, Year may have a four-digit domain in one schema and a two-digit domain in another.
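A minimal Python sketch of how such single-attribute conflicts could be resolved by translating every local value into one global domain. The three source encodings, the meaning of 0/1 and the pivot year used to expand two-digit years are assumptions; the codice fiscale rule (the two day-of-birth digits, positions 10-11, carry +40 for women) is the standard Italian one, and the sample codes are made up.

```python
# Hypothetical encodings of the attribute "gender" in three sources,
# each mapped onto one global domain {"M", "F"}.
GENDER_ENCODINGS = {
    "s1": {"Male": "M", "Female": "F"},
    "s2": {"M": "M", "F": "F"},
    "s3": {0: "M", 1: "F"},  # assumption: the meaning of 0/1 is source-specific
}


def normalize_gender(source: str, value: object) -> str:
    """Translate a local gender value into the global domain."""
    return GENDER_ENCODINGS[source][value]


def gender_from_codice_fiscale(cf: str) -> str:
    """Derive gender from an Italian codice fiscale.

    Characters 10-11 encode the day of birth; for women, 40 is added to it.
    Assumption: those two characters are digits (no 'omocodia' substitutions).
    """
    day = int(cf[9:11])
    return "F" if day > 40 else "M"


def normalize_year(year: int, pivot: int = 30) -> int:
    """Map a two-digit year onto a four-digit domain.

    Assumption: two-digit years below `pivot` belong to the 2000s,
    the others to the 1900s; four-digit years are returned unchanged.
    """
    if year >= 100:
        return year
    return 2000 + year if year < pivot else 1900 + year


if __name__ == "__main__":
    print(normalize_gender("s3", 1))                       # F
    print(gender_from_codice_fiscale("RSSMRA85T50A562S"))  # F (day 50 = 10 + 40)
    print(normalize_year(85), normalize_year(7))           # 1985 2007
```

The pivot is a design choice that cannot be derived from the data: a two-digit year is intrinsically ambiguous, so the integrated schema should adopt the four-digit domain.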

Examples of type conflicts - 2
– different currencies (euros, US dollars, etc.);
– different measurement systems (kilograms vs. pounds, Celsius vs. Fahrenheit);
– different granularities (grams, kilograms, etc.).
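A minimal Python sketch of how a mediator could convert such values into one global unit (kilograms, degrees Celsius, euros). The function names are hypothetical; the pound-to-kilogram factor and the Fahrenheit formula are the standard definitions, while exchange rates are deliberately left as a caller-supplied table because they change over time.

```python
POUND_IN_KG = 0.45359237  # the international pound is defined as exactly this many kg

# Conversion factors from each local unit to the global unit (kilograms).
MASS_TO_KG = {"kg": 1.0, "g": 0.001, "lb": POUND_IN_KG}


def to_kilograms(value: float, unit: str) -> float:
    """Convert a mass expressed in a local unit (or granularity) to kilograms."""
    return value * MASS_TO_KG[unit]


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature expressed in a local scale to degrees Celsius."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown temperature unit: {unit}")


def to_euros(amount: float, currency: str, rates: dict[str, float]) -> float:
    """Convert an amount to euros.

    `rates` (euros per unit of each currency) must be supplied by the caller;
    no fixed exchange rates are assumed here.
    """
    return amount * rates[currency]


if __name__ == "__main__":
    print(to_kilograms(2, "lb"))  # 0.90718474
    print(to_celsius(212, "F"))   # 100.0
```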

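Examples of type conflicts - 3: structure conflicts (figure)
– Person with a GENDER attribute in one schema vs. Person specialized into WOMAN and MAN in the other;
– PUBLISHER as an attribute of BOOK in one schema vs. PUBLISHER as a separate entity in the other;
– EMPLOYEE, DEPARTMENT and PROJECT in one schema vs. only EMPLOYEE and PROJECT in the other.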

Examples of type conflicts - 4: dependency (or cardinality) conflicts (figure)
The relationships among EMPLOYEE, DEPARTMENT and PROJECT have different cardinalities (1:1 vs. 1:n) in the two schemas.

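Examples of type conflicts - 5: key conflicts (figure)
PRODUCT with attributes CODE and LINE in one schema, PRODUCT with attributes CODE and DESCRIPTION in the other; the two schemas identify products with different keys.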

Data integration
The research community has been investigating data integration for about 20 years. Different research communities (databases, artificial intelligence, semantic web) have been developing and addressing issues related to data integration:
– definitions, architectures, and classifications of the problems to be addressed;
– data integration problems have been analyzed from different perspectives, and different approaches have been proposed;
– benchmarks have been developed to evaluate and compare the approaches (e.g. the THALIA benchmark);
– several commercial software suites have been released and are being tested in real environments.

Integration of heterogeneous and distributed data sources
“Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data” (the Global Virtual Schema, GS) [Lenzerini, 2002].
(Figure: a query is posed on the global schema (GS), which is connected through mappings to the local schemas of the sources: databases, files, XML documents.)

Main elements of a DI architecture
Three main elements of the architecture of a schema integration system can be distinguished:
– a global schema;
– one or more source (local) schemas;
– mappings between the global schema and the source (local) schemas.
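A minimal Python data structure for the three elements listed above, with a consistency check on the mappings. Mappings are reduced here to attribute-to-attribute correspondences, a deliberate simplification (in real systems they are usually queries); the class and field names are hypothetical. The example uses a global relation Full_professor(name, mail, area) and two source relations, Professor(first_name, last_name, area) and Faculty_member(name, mail, research_topic).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mapping:
    """One correspondence between a global attribute and a local attribute."""
    global_attr: tuple[str, str]  # (global relation, attribute)
    source: str                   # source identifier
    local_attr: tuple[str, str]   # (local relation, attribute)


@dataclass
class DISystem:
    """A data integration system: global schema, local schemas, mappings."""
    global_schema: dict[str, list[str]]             # relation -> attributes
    local_schemas: dict[str, dict[str, list[str]]]  # source -> relation -> attributes
    mappings: list[Mapping] = field(default_factory=list)

    def check(self) -> list[str]:
        """Report mappings that refer to elements missing from the schemas."""
        errors = []
        for m in self.mappings:
            g_rel, g_att = m.global_attr
            if g_att not in self.global_schema.get(g_rel, []):
                errors.append(f"unknown global attribute {g_rel}.{g_att}")
            l_rel, l_att = m.local_attr
            if l_att not in self.local_schemas.get(m.source, {}).get(l_rel, []):
                errors.append(f"unknown local attribute {m.source}:{l_rel}.{l_att}")
        return errors

    def sources_for(self, relation: str) -> set[str]:
        """Sources that contribute at least one attribute to a global relation."""
        return {m.source for m in self.mappings if m.global_attr[0] == relation}


if __name__ == "__main__":
    dis = DISystem(
        global_schema={"Full_professor": ["name", "mail", "area"]},
        local_schemas={
            "source1": {"Professor": ["first_name", "last_name", "area"]},
            "source2": {"Faculty_member": ["name", "mail", "research_topic"]},
        },
        mappings=[
            Mapping(("Full_professor", "area"), "source1", ("Professor", "area")),
            Mapping(("Full_professor", "name"), "source2", ("Faculty_member", "name")),
            Mapping(("Full_professor", "mail"), "source2", ("Faculty_member", "mail")),
            Mapping(("Full_professor", "area"), "source2", ("Faculty_member", "research_topic")),
        ],
    )
    print(dis.check())                                # []
    print(sorted(dis.sources_for("Full_professor")))  # ['source1', 'source2']
```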

Typical architecture of a data integration system (figure)
The user query is posed on the global schema. The mediator uses the mapping between the global schema and local schemas 1, 2, ..., n, and accesses sources 1, 2, ..., n, each through its own wrapper.

Definitions of global schema and mappings
The global schema describes the structure of the whole universe of discourse. The mappings (or connections) describe how each element of the local schemas relates to the global schema. Remark: mappings can be expressed in either direction.

Typical architecture of a data integration system, with mappings in the two directions (figure)
(a) from local schemas to the global schema;
(b) from the global schema to local schemas.

Definitions of global schema and mappings (continued)
Mappings can be expressed in both directions. In summary, the essence of integration is to combine information in a logical way, so that it can be queried as one through a common interface. To enable querying, the schema of each information source must be connected through a mapping to the global schema of the common interface.

Wise 2009 – Poznan (PL), Università di Modena e Reggio Emilia & Milano Bicocca
Mediators (1)
(Figure: a query interface over the global schema, connected by view mappings to the local schemata of the local sources.)
SOURCE 1: Professor (first_name, last_name, , area)
SOURCE 2: Faculty_member (name, mail, research_topic)
GLOBAL SCHEMA: Full_professor (name, mail, area)
Query: search the mail of professors whose research activities are in the “Database” area.
Sub-query on source 1: Select From Professor Where area = “Database”
Sub-query on source 2: Select mail From Faculty_member Where research_topic = “Database”
The results of the sub-queries form the result set.

Mediators (2)
– The mediator builds a unified schema of several (heterogeneous) information sources and allows a user to formulate queries on it.
– The user query is transformed into a set of sub-queries, one for each data source involved in the query.
– The results are collected by the mediator, merged, and shown to the user.
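A minimal sketch of these three steps, using Python's built-in `sqlite3` with two in-memory databases standing in for the two sources of the Mediators (1) example. The sample rows are made up, and the mail column of source 1 is given the hypothetical name `mail_addr`, because the corresponding attribute of Professor is not legible in the slide. Merging is reduced to duplicate elimination here; a real mediator also performs object fusion.

```python
import sqlite3


def make_source1() -> sqlite3.Connection:
    """Source 1 (made-up data): Professor(first_name, last_name, mail_addr, area)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Professor (first_name TEXT, last_name TEXT, mail_addr TEXT, area TEXT)")
    db.executemany("INSERT INTO Professor VALUES (?, ?, ?, ?)", [
        ("Ada", "Rossi", "ada@uni1.example", "Database"),
        ("Bruno", "Verdi", "bruno@uni1.example", "Networks"),
    ])
    return db


def make_source2() -> sqlite3.Connection:
    """Source 2 (made-up data): Faculty_member(name, mail, research_topic)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Faculty_member (name TEXT, mail TEXT, research_topic TEXT)")
    db.executemany("INSERT INTO Faculty_member VALUES (?, ?, ?)", [
        ("Carla Bianchi", "carla@uni2.example", "Database"),
        ("Ada Rossi", "ada@uni1.example", "Database"),  # same person as in source 1
    ])
    return db


# One sub-query per source, each computing "mail of Full_professor where area = ?"
# over that source's local schema.
SUBQUERIES = {
    "source1": "SELECT mail_addr FROM Professor WHERE area = ?",
    "source2": "SELECT mail FROM Faculty_member WHERE research_topic = ?",
}


def mediator_mails_by_area(sources: dict[str, sqlite3.Connection], area: str) -> list[str]:
    """Send one sub-query to each source, then collect and merge the answers."""
    merged: set[str] = set()
    for name, conn in sources.items():
        for (mail,) in conn.execute(SUBQUERIES[name], (area,)):
            merged.add(mail)  # merging = duplicate elimination in this sketch
    return sorted(merged)


if __name__ == "__main__":
    sources = {"source1": make_source1(), "source2": make_source2()}
    print(mediator_mails_by_area(sources, "Database"))
    # ['ada@uni1.example', 'carla@uni2.example']
```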

Functional architecture of a data integration system (figure: MultiDBMS client, mediator, wrappers, DBMSs and databases)
The mediator:
– provides users with a single virtual representation of the sources, given by the global schema;
– translates queries into fragments, which are sent to the wrappers;
– recomposes the results returned by the wrappers;
– performs data fusion and resolves heterogeneities among values.

Instance level heterogeneities

Mediators: object fusion and reconciliation
A mediator's main functionality is object fusion:
– group together information about the same real-world entity;
– remove redundancy among the various data sources;
– resolve inconsistencies among the various data sources;
– achieve accuracy, completeness, currency (and other data quality dimensions) among data from different data sources.
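A minimal Python sketch of object fusion under one possible reconciliation policy, chosen here as an assumption: records are grouped by an entity identifier already shared across sources, missing values are filled from other records (completeness), and conflicting values are resolved in favour of the most recently updated record (currency). The field names `key` and `updated` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Any


def fuse(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Fuse records from different sources that describe the same real-world entity.

    Assumptions of this sketch:
    - every record carries an entity identifier "key", already reconciled
      across sources, and a last-update date "updated";
    - a value missing (None) in one record is taken from another (completeness);
    - conflicting values are resolved in favour of the most recently
      updated record (currency).
    """
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for record in records:  # group information about the same entity
        groups[record["key"]].append(record)

    fused: dict[str, dict[str, Any]] = {}
    for key, group in groups.items():
        merged: dict[str, Any] = {}
        # Oldest first, so that newer non-null values overwrite older ones.
        for record in sorted(group, key=lambda r: r["updated"]):
            for attr, value in record.items():
                if attr in ("key", "updated"):
                    continue
                if value is not None:
                    merged[attr] = value
                else:
                    merged.setdefault(attr, None)
        fused[key] = merged  # one object per entity: redundancy removed
    return fused


if __name__ == "__main__":
    records = [  # made-up records about one person from three sources
        {"key": "P1", "updated": date(2007, 3, 5), "name": "A. Rossi", "mail": None, "phone": "555-0100"},
        {"key": "P1", "updated": date(2008, 1, 10), "name": "Ada Rossi", "mail": "ada@old.example", "phone": None},
        {"key": "P1", "updated": date(2009, 6, 1), "name": "Ada Rossi", "mail": "ada@new.example", "phone": None},
    ]
    print(fuse(records)["P1"])
    # {'name': 'Ada Rossi', 'mail': 'ada@new.example', 'phone': '555-0100'}
```

Preferring the most recent value is only one policy; accuracy-based policies (e.g. trusting the more reliable source) fit the same structure by changing the sort key.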

Functional architecture of a data integration system (figure: DI system client, mediator, wrappers, DBMSs and databases)
The wrapper:
– translates the request coming from the mediator into the terms of the logical and physical representation of the underlying local schema.
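A minimal Python sketch of a wrapper for a source stored as a structured file (CSV): it receives a simple selection request expressed in global attribute names, translates it into the local column names, evaluates it, and returns the answer in global terms. The class, the request format (a select list plus one equality condition) and the sample file are hypothetical.

```python
import csv
import io


class CsvWrapper:
    """Hypothetical wrapper for a source stored as a structured (CSV) file.

    It receives requests expressed in global attribute names and translates
    them into the column names of the local file.
    """

    def __init__(self, csv_text: str, attr_map: dict[str, str]) -> None:
        self._csv_text = csv_text
        self._attr_map = attr_map  # global attribute -> local column

    def execute(self, select: list[str], where: tuple[str, str]) -> list[dict[str, str]]:
        """Answer 'SELECT <select> WHERE <attr> = <value>' given in global names."""
        attr, value = where
        local_attr = self._attr_map[attr]  # translate the condition
        answer = []
        for row in csv.DictReader(io.StringIO(self._csv_text)):
            if row[local_attr] == value:
                # Translate the result back into global attribute names.
                answer.append({g: row[self._attr_map[g]] for g in select})
        return answer


if __name__ == "__main__":
    source_csv = (
        "name,mail,research_topic\n"
        "Carla Bianchi,carla@uni2.example,Database\n"
        "Dario Neri,dario@uni2.example,Networks\n"
    )
    wrapper = CsvWrapper(source_csv, {"name": "name", "mail": "mail", "area": "research_topic"})
    print(wrapper.execute(["mail"], ("area", "Database")))
    # [{'mail': 'carla@uni2.example'}]
```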

Mediators (3)
We may divide the interactions with a mediator into two phases:
1. the creation of the unified representation (publishing phase, at design time);
2. the formulation and execution of a query on the unified representation (querying phase).

Functional architecture of an MDBS in our example (figure: MultiDBMS client, mediator, wrappers, DBMSs and databases)
Global schema: Studente (Student), Corso (Course), Professore (Professor).

Functional architecture of a mediator system: example (figure: MultiDBMS client, mediator, wrappers, DBMSs and databases)
Local schema: Studente (Student), Corso (Course), Professore (Professor), Modulo (Module).

Virtual integration architecture including optimization functionality (figure)
User queries are posed on the mediated schema. Components of the mediator: mediated schema, data source catalog, reformulator, optimizer, execution engine. Each data source is accessed through its own wrapper. Sources can be relational, hierarchical (IMS), structured files, or web sites.
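A toy Python sketch of how the mediator components above could cooperate on a query over one mediated relation: the reformulator consults the data source catalog to select the relevant sources, the optimizer orders them (here simply by an estimated cost, a stand-in for real query optimization), and the execution engine calls each wrapper and unions the answers. The catalog fields, the costs and the sample wrappers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Row = tuple[str, ...]


@dataclass
class CatalogEntry:
    """Data source catalog entry (hypothetical fields)."""
    name: str
    relations: set[str]                # mediated-schema relations the source can answer
    cost: float                        # estimated access cost (made-up unit)
    fetch: Callable[[str], list[Row]]  # call into the source's wrapper


def reformulate(relation: str, catalog: list[CatalogEntry]) -> list[CatalogEntry]:
    """Reformulator: select the sources relevant to a query on `relation`."""
    return [entry for entry in catalog if relation in entry.relations]


def optimize(plan: list[CatalogEntry]) -> list[CatalogEntry]:
    """Optimizer (toy): access the cheapest sources first."""
    return sorted(plan, key=lambda entry: entry.cost)


def execute(plan: list[CatalogEntry], relation: str) -> list[Row]:
    """Execution engine: call each wrapper and union the answers without duplicates."""
    seen: set[Row] = set()
    answer: list[Row] = []
    for entry in plan:
        for row in entry.fetch(relation):
            if row not in seen:
                seen.add(row)
                answer.append(row)
    return answer


def answer_query(relation: str, catalog: list[CatalogEntry]) -> list[Row]:
    """Run a user query on one mediated relation through the whole pipeline."""
    return execute(optimize(reformulate(relation, catalog)), relation)


if __name__ == "__main__":
    catalog = [
        CatalogEntry("A", {"Course"}, 5.0, lambda rel: [("DB",)]),
        CatalogEntry("B", {"Course", "Student"}, 1.0,
                     lambda rel: {"Course": [("AI",), ("DB",)], "Student": [("Ada",)]}[rel]),
        CatalogEntry("C", {"Student"}, 2.0, lambda rel: [("Bruno",)]),
    ]
    print(answer_query("Course", catalog))  # [('AI',), ('DB',)]
```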

DI systems: design-time vs. run-time issues
Publishing phase (or design time):
– the global schema and the mappings must be defined from the source schemas.
Run time:
– queries are executed;
– the global schema, the local schemas and the mappings are maintained.

Mediators: relevant challenges (figure: user interface, mediator, data sources)
Publishing phase:
– schema extraction;
– building the unified schema;
– matching and mapping the unified schema and the local sources;
– model and language for representing the unified schema;
– visualizing the unified schema;
– managing updates.
Querying phase:
– model and language for formulating queries;
– model and language for querying the schema;
– query unfolding/rewriting;
– query transformation and execution;
– data fusion and cleaning.

Design time vs. run time (figure)
Design time: mediated schema, semantic mappings, wrappers.
Run time: query reformulation, optimization and execution.

Basic properties of a DI system
A system providing:
– Uniform (same query interface to all sources)
– Access to (queries; possibly updates too)
– Multiple (we want many, but 2 is hard too)
– Autonomous (the DBA doesn't report to you)
– Heterogeneous (data models are different)
– Structured (or at least semi-structured)
– Data sources (not only databases).