Data integration mediation system. “… The mountain is a mountain, The mountain is not a mountain, The mountain is a mountain.” Presented by Taras Mahlin.

Presentation transcript:

Heterogeneous reasoning and mediator system

Problem: Mountain is not a mountain
The past few decades have witnessed a spectacular explosion in the quantity of data available in one electronic form or another. This vast quantity of data has been gathered, organized, and stored by a small army of individuals, working for different organizations on varied problems.

Solution: Mountain is a mountain
Synergetic approach: the whole is much more than the sum of its components.
Integration of disparate data sources by pooling fragmented data together, resolving data conflicts, and transforming the data into information objects.
All this while users continue to use existing systems for the routine functions of add, change, and delete.

Integrating Heterogeneous Data Sources
Advantages:
– Integration alleviates the burden of duplicating data-gathering efforts.
– Synergetic effect: it enables the extraction of information that would otherwise be impossible to obtain.
For example:
– Law enforcement agencies (Interpol)
– Insurance companies
– Medical researchers and epidemiologists

Integrating Heterogeneous Reasoning Paradigms
Alongside the ability to integrate a variety of data sources comes the need to integrate diverse forms of reasoning. Access to such reasoning systems gives mediators sophisticated abilities to extract and produce new information from existing data.
For example:
– Problem of terrain reasoning: determining where resources can be physically situated, integrating multiple forms of reasoning that may include logical inference, numerical optimization, planning, pattern recognition, scheduling, and learning.

Mediator technology
Seamless integration of information located across multiple, heterogeneous computer platforms and recorded in multiple, heterogeneous electronic formats:
– relational database management systems,
– other, non-relational database management systems,
– flat files, text files, etc.
Mediator technology defines a structure and architecture that allows software applications to be independent of the underlying data resources.

Mediator technology cont.
Mediators provide:
– Intelligence for understanding, selecting, accessing, merging, and manipulating data.
– A new level of knowledge.
– Consistent responses to questions regardless of who asks the question.
– Seamless integration of information from multiple existing sources without having to redesign existing databases (i.e., legacy data) or change existing operational systems.

Mediator technology - summary
Mediators perform “mediation” between applications and databases.
Mediators are software modules that occupy an explicit, active layer between an end-user application and the data sources the application is accessing. In this way, the mediator forms a distinct middle layer, making user applications independent of data sources.
They capture knowledge from the data experts so that the common user can find the information.
Mediators do not create a new database. A mediator creates a “virtual” database that supplies data contained in the existing database(s).
Mediators use existing databases and require no redesign or changes in these databases or in existing operational systems.
Mediators provide easy access to information.
They support a heterogeneous computing environment (i.e., multiple hardware, software, and databases).
They provide a cost-effective means to integrate data from heterogeneous information systems.

Mediators - goals and implementation
The aim of the system is to develop a principled methodology for
– integrating multiple data sources
– and reasoning systems,
– and to propose a mediator language within which access to the data sources and reasoning systems can be expressed uniformly.
There are two important aspects to constructing a mediator: domain integration and semantic integration.
– Domain integration is the physical linking of the data sources and reasoning systems.
– Semantic integration is the coherent extraction and combination of the information provided by the data and reasoning sources, serving a given purpose.

Domain integration
Goal: to add a new data source or reasoning system to an existing mediated system (or one under development).
Requirements:
– Resources provided by the new system, whether new data, new representations of data, or a corpus of new reasoning algorithms, can be accessed by the various mediators.
– No recompilation of the whole system is needed.
– The integrity of the system is preserved.
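One way to meet these requirements is a registry that mediators consult at run time, so that a new source is added by registration and not by rebuilding the system. The sketch below is illustrative only. `SourceRegistry`, `register`, `lookup`, and the callable `Source` signature are hypothetical names chosen for this example, not part of the system described in the slides.

```python
from typing import Callable, Dict, List

# A source is modelled as a callable that answers a query with a list of rows.
# This signature is an assumption made for illustration.
Source = Callable[[str], List[dict]]


class SourceRegistry:
    """Hypothetical run-time registry of data sources and reasoning systems."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        # Refuse to overwrite an existing entry so that a new source cannot
        # silently replace one that mediators already depend on.
        if name in self._sources:
            raise ValueError(f"source {name!r} is already registered")
        self._sources[name] = source

    def lookup(self, name: str) -> Source:
        return self._sources[name]

    def names(self) -> List[str]:
        return sorted(self._sources)


registry = SourceRegistry()
registry.register("personnel", lambda query: [{"source": "personnel", "query": query}])
# Adding a source later needs no change to the registry or to existing mediators.
registry.register("payroll", lambda query: [{"source": "payroll", "query": query}])
```

Rejecting duplicate names is one simple way to approximate the "integrity is preserved" requirement. A real system would likely need stronger checks.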

Semantic integration
Semantic integration is the process of specifying methods
– to resolve conflicts,
– to pool information together,
– and to define new, compositional operations based on existing operations in the individual data sources.

Data Integration and Mediation System
DIMS is an implementation of “intelligent middleware” that resides between user applications and independent data sources.
Data sources can reside on multiple, heterogeneous computer platforms and may be recorded in a variety of formats.
DIMS creates a “virtual object database” so that the user application sees the data retrieved from the various sources as though it were returned from a single, integrated database.

System major functions
DIMS performs five major functions:
– query decomposition/routing
– object unification and fusion
– removal of data redundancies
– identification/resolution of data inconsistencies
– advanced data integration techniques
Although DIMS performs query decomposition/routing to multiple, heterogeneous data sources, its main advantage is its data instance integration functionality.

Query processing example
Query: retrieve information about Employees and their associated Dependents.
We assume that the Employee and Dependent information is spread across three disparate data sources:
– Personnel database
– Payroll database
– Benefits database
The Employee information is distributed across the Personnel and Payroll databases. The Dependent information is contained in the Benefits database.

Query processing example cont.
Initially, a single query for Employees and their associated Dependents is sent from a user application to DIMS.

Query processing example cont.
DIMS first retrieves the Employee objects that meet the specified constraint. Based upon domain-specific knowledge, it knows that the Personnel database can supply the Employee name and title information, whereas the Payroll database can supply the Employee name and salary information. In both cases, DIMS automatically “knows” to also retrieve the Employee ID, which will be needed for later data integration functions.
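The decomposition step described above can be sketched as a lookup against a catalogue of which source supplies which attribute. This is a minimal illustration, not DIMS's actual mechanism. The `SOURCE_ATTRIBUTES` table, the `decompose` function, and the attribute names (`employee_id`, `name`, `title`, `salary`) are assumptions introduced for the example.

```python
from typing import Dict, List

# Hypothetical catalogue of which source supplies which Employee attributes,
# standing in for the "domain-specific knowledge" mentioned in the slide.
SOURCE_ATTRIBUTES: Dict[str, List[str]] = {
    "personnel": ["name", "title"],
    "payroll": ["name", "salary"],
}

JOIN_KEY = "employee_id"  # assumed attribute name for the Employee ID


def decompose(requested: List[str]) -> Dict[str, List[str]]:
    """Split one request into per-source attribute lists."""
    plan: Dict[str, List[str]] = {}
    for source, available in SOURCE_ATTRIBUTES.items():
        wanted = [attr for attr in requested if attr in available]
        if wanted:
            # Always fetch the join key so the results can be integrated later.
            plan[source] = [JOIN_KEY] + wanted
    return plan


plan = decompose(["name", "title", "salary"])
```

Here `plan` routes `name` and `title` to the Personnel source and `name` and `salary` to the Payroll source, each prefixed with the join key.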

Query processing example cont.
Tabular results are returned from the Personnel and Payroll databases to DIMS. Note that Mark Smith is returned only from the Personnel database and Jane Peterson is returned only from the Payroll database.

Query processing - data retrieving
DIMS performs object unification based on the data returned from the data sources. Object unification is the combining of the data into object instances.
Notice that the “Mark Smith” and the “Jane Peterson” objects have empty attributes, since their information was returned from single sources with only partial information.
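Object unification as described here amounts to grouping rows by Employee ID and collecting each attribute's values. The following is a rough sketch under stated assumptions. The `unify` function, the attribute names, the employee IDs, and the title and salary values are all invented for illustration. Only the two employee names and their originating sources come from the slides.

```python
from typing import Dict, List

ATTRIBUTES = ["name", "title", "salary"]  # assumed Employee attributes


def unify(results: Dict[str, List[dict]]) -> Dict[int, Dict[str, list]]:
    """Combine rows from several sources into one object per employee ID."""
    objects: Dict[int, Dict[str, list]] = {}
    for source, rows in results.items():
        for row in rows:
            obj = objects.setdefault(row["employee_id"], {a: [] for a in ATTRIBUTES})
            for attr in ATTRIBUTES:
                if attr in row:
                    # Keep the originating source alongside each value.
                    obj[attr].append((row[attr], source))
    return objects


# Illustrative rows: IDs, title, and salary are made-up placeholder values.
results = {
    "personnel": [{"employee_id": 1, "name": "Mark Smith", "title": "Engineer"}],
    "payroll": [{"employee_id": 2, "name": "Jane Peterson", "salary": 50000}],
}
employees = unify(results)
```

An attribute that no source supplied stays an empty list, which corresponds to the empty attributes of the “Mark Smith” and “Jane Peterson” objects.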

Query processing - redundancy elimination
Once the object instances have been created, DIMS removes any extraneous data redundancies.
In this example, the “Tim Andrews” object has the same name listed twice. Assume that the domain object model specifies that each Employee object should have only one name attribute. The “Tim Andrews” object therefore has an extraneous, redundant name attribute that should not exist.

Query processing - redundancy elimination
The system will automatically remove the second, redundant occurrence of the name for the “Tim Andrews” object.
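Removing a redundant value can be sketched as order-preserving de-duplication of any attribute the object model declares single-valued. This is an assumption about one possible approach. `SINGLE_VALUED`, `remove_redundancies`, and the `title` value are hypothetical.

```python
from typing import Dict, List

# Assumed object model: each Employee may carry only one name.
SINGLE_VALUED = {"name"}


def remove_redundancies(obj: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Drop repeated values of single-valued attributes, keeping the first."""
    cleaned: Dict[str, List[str]] = {}
    for attr, values in obj.items():
        if attr in SINGLE_VALUED:
            # dict.fromkeys preserves order while discarding exact duplicates.
            cleaned[attr] = list(dict.fromkeys(values))
        else:
            cleaned[attr] = list(values)
    return cleaned


# The title is a made-up placeholder value.
tim = {"name": ["Tim Andrews", "Tim Andrews"], "title": ["Analyst"]}
cleaned = remove_redundancies(tim)
```

Only exact duplicates are removed. Two different names for the same employee are left in place, because that is an inconsistency and not a redundancy.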

Inconsistency resolving
DIMS will then identify data inconsistencies within the objects. It can also provide resolutions to these data inconsistencies.
In this example, there is a data inconsistency in the “Sarah Jones/Kaiser” object because
– the Personnel database returned the name as “Sarah Jones”,
– whereas the Payroll database returned the name as “Sarah Kaiser” for the same Employee ID.

Inconsistency resolving cont.
DIMS identifies the data inconsistency and flags the identified inconsistencies within an object. DIMS can also provide the source information associated with each data inconsistency, to allow further automated and/or manual inconsistency handling.
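Flagging an inconsistency together with its source information can be sketched by storing each value as a (value, source) pair and reporting any single-valued attribute that holds more than one distinct value. The function name `find_inconsistencies` and the pair representation are assumptions for illustration. The slides do not say how DIMS represents this internally.

```python
from typing import Dict, List, Set, Tuple

Value = Tuple[str, str]  # (value, source): an assumed representation


def find_inconsistencies(
    obj: Dict[str, List[Value]], single_valued: Set[str]
) -> Dict[str, List[Value]]:
    """Return the attributes whose sources disagree, with their provenance."""
    flagged: Dict[str, List[Value]] = {}
    for attr in single_valued:
        values = obj.get(attr, [])
        if len({value for value, _ in values}) > 1:
            # Keep every (value, source) pair so a rule or a person can decide.
            flagged[attr] = list(values)
    return flagged


sarah = {"name": [("Sarah Jones", "personnel"), ("Sarah Kaiser", "payroll")]}
flags = find_inconsistencies(sarah, {"name"})
```

Because each flagged value carries its source, a later step can resolve the conflict automatically or hand it to a person.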

Data inconsistency rules
Data inconsistency rules can be defined in DIMS for a specific domain. DIMS uses a rules-based expert system to apply the rules to the data.
In this example, assume a rule is defined that says to use the data from the Payroll database whenever there is an inconsistency in an employee’s name attribute.

Data inconsistency rules cont.
Based upon the example’s rule, DIMS will remove the “Sarah Jones” name that came from the Personnel database from the “Sarah Jones/Kaiser” object.
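The example rule can be approximated with a simple preference table. This is a deliberate simplification: the slides state that DIMS uses a rules-based expert system, whereas the sketch below is only a dictionary lookup. `PREFERRED_SOURCE` and `resolve` are hypothetical names.

```python
from typing import Dict, List, Tuple

Value = Tuple[str, str]  # (value, source): an assumed representation

# Hypothetical rule table: attribute -> source whose value wins on conflict.
PREFERRED_SOURCE = {"name": "payroll"}


def resolve(obj: Dict[str, List[Value]]) -> Dict[str, List[Value]]:
    """Apply the preference rules to attributes whose sources disagree."""
    resolved: Dict[str, List[Value]] = {}
    for attr, values in obj.items():
        distinct = {value for value, _ in values}
        preferred = PREFERRED_SOURCE.get(attr)
        if len(distinct) > 1 and preferred is not None:
            kept = [(v, s) for v, s in values if s == preferred]
            # Fall back to the original values if the preferred source gave none.
            resolved[attr] = kept or list(values)
        else:
            resolved[attr] = list(values)
    return resolved


sarah = {"name": [("Sarah Jones", "personnel"), ("Sarah Kaiser", "payroll")]}
fixed = resolve(sarah)
```

After resolution only the Payroll value remains, which matches the outcome described in the slide.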

Getting dependent information
After the Employee objects have been integrated, DIMS sends another query for the Dependent information associated with each of these Employees. This example assumes that only the Benefits database contains Dependent information.
Based on the domain-specific knowledge, DIMS “knows” that each Employee object is associated with its Dependent object(s) via the Employee ID attribute. DIMS therefore uses this information to constrain the new query.

Getting dependent information cont.
The Dependent information is returned from the Benefits database.

Object unification
DIMS again performs object unification on the new results. However, instead of making totally independent objects for the Dependents, DIMS integrates the Dependent objects with the appropriate Employee objects.
Since the Dependent objects contain neither redundant data nor data inconsistencies, no further processing is needed on the Dependent information.
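Integrating Dependents with their Employees can be sketched as nesting each Dependent row under the Employee that shares its Employee ID. The function `attach_dependents`, the `dependents` key, the IDs, and the dependent names are placeholders invented for this example.

```python
from typing import Dict, List


def attach_dependents(
    employees: Dict[int, dict], dependent_rows: List[dict]
) -> Dict[int, dict]:
    """Nest each Dependent under the Employee with the matching employee ID."""
    for employee in employees.values():
        employee.setdefault("dependents", [])
    for row in dependent_rows:
        employee = employees.get(row["employee_id"])
        if employee is not None:
            # Dependents become part of the Employee object, not separate objects.
            employee["dependents"].append({"name": row["name"]})
    return employees


# IDs and dependent names are made-up placeholder values.
employees = {1: {"name": "Tim Andrews"}, 2: {"name": "Sarah Kaiser"}}
dependent_rows = [
    {"employee_id": 1, "name": "Dependent A"},
    {"employee_id": 1, "name": "Dependent B"},
]
attach_dependents(employees, dependent_rows)
```

The sketch modifies the Employee objects in place, and an Employee with no matching rows ends up with an empty `dependents` list.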

Composing the result
Finally, DIMS returns all the Employee objects and their associated Dependent objects to the user application as a single, packaged, integrated response.
The user application never had to “know” anything about all the extra processing that DIMS performed; it simply sent one query to DIMS and received one “clean”, integrated response.

Advanced integration
– Units conversion
– Data abstraction
– Data aggregations
– Expert rules