Data Integration https://store.theartofservice.com/the-data-integration-toolkit.html.

Data Integration

Data fusion - Data integration: Outside the geospatial domain, the terms data integration and data fusion are used differently. In areas such as business intelligence, for example, data integration describes the combining of data, whereas data fusion is integration followed by reduction or replacement. Data integration can be viewed as set combination in which the larger set is retained, whereas fusion is a set reduction technique with improved confidence.
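The distinction between combining and reducing can be illustrated with a small sketch. The record layout, the source names (crm, billing) and the "first non-null value wins" rule are assumptions made for illustration only; they are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical customer records from two sources; field names are assumptions.
crm = [{"id": 1, "email": "ann@example.com", "city": "Oslo"},
       {"id": 2, "email": "bob@example.com", "city": None}]
billing = [{"id": 2, "email": "bob@example.com", "city": "Bergen"},
           {"id": 3, "email": "cy@example.com", "city": "Turku"}]

def integrate(*sources):
    """Integration as set combination: keep every record, tagged with its origin."""
    return [dict(rec, source=name) for name, recs in sources for rec in recs]

def fuse(records, key="id"):
    """Fusion as reduction: collapse records that share a key.

    Assumed conflict rule: the first non-null value seen for a field wins.
    """
    merged = defaultdict(dict)
    for rec in records:
        for field, value in rec.items():
            if field != "source" and value is not None:
                merged[rec[key]].setdefault(field, value)
    return [merged[k] for k in sorted(merged)]

combined = integrate(("crm", crm), ("billing", billing))  # larger set retained
fused = fuse(combined)                                    # one record per customer
```

Here `combined` keeps all four source records, while `fused` reduces them to three, filling the city missing from one source with the value from the other.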

Data integration: In management circles, data integration is frequently called "Enterprise Information Integration" (EII).

Data integration - History: As of 2009, the trend in data integration has favored loosening the coupling between data and providing a unified query interface to access real-time data over a mediated schema (see figure 2), which allows information to be retrieved directly from the original databases.

Data integration - History: This approach represents ontology-based data integration.

Data integration - Theory of data integration: The theory of data integration forms a subset of database theory and formalizes the underlying concepts of the problem in first-order logic. Applying the theories gives indications as to the feasibility and difficulty of data integration. While its definitions may appear abstract, they have sufficient generality to accommodate all manner of integration systems.

Data integration - Definitions: When users pose queries over the data integration system, they pose queries over the global schema G, and the mapping then asserts connections between the elements in the global schema and the source schemas.

Data integration - Definitions: The burden of complexity falls on implementing mediator code that instructs the data integration system exactly how to retrieve elements from the source databases.

Data integration - Definitions: In a GAV approach to the example data integration system above, the system designer would first develop mediators for each of the city information sources and then design the global schema around these mediators.

Data integration - Definitions: In an LAV approach to the example data integration system above, the system designer designs the global schema first and then simply inputs the schemas of the respective city information sources.
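The difference between the two mapping directions can be sketched as follows. The source relations (weather_src, crime_src), the global relation CityInfo and all values are hypothetical stand-ins for the city information sources mentioned above, not the actual example system.

```python
# Hypothetical source relations for a city-information system (assumed names).
weather_src = [("Oslo", 4), ("Turku", 2)]        # (city, temperature)
crime_src = [("Oslo", 0.8), ("Turku", 0.5)]      # (city, crime_rate)

# GAV: the global relation is defined as a view (mediator) over the sources.
def global_city_info():
    """Mediator for CityInfo(city, temp, crime): join the two sources on city."""
    crime = dict(crime_src)
    return [(city, temp, crime[city]) for city, temp in weather_src if city in crime]

def cities_colder_than(limit):
    """A user query over the global schema, answered by unfolding the mediator."""
    return [city for city, temp, _ in global_city_info() if temp < limit]

# LAV: each source is described as a view over the global schema instead.
# These are declarative descriptions only; answering a query needs a rewriting step.
LAV_MAPPINGS = {
    "weather_src(city, temp)": "CityInfo(city, temp, _)",
    "crime_src(city, crime)": "CityInfo(city, _, crime)",
}

print(cities_colder_than(3))
```

In the GAV half the effort sits in the mediator function, so adding a source means editing it. In the LAV half a new source only adds one mapping entry, but queries can no longer be answered by simple expansion.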

Data integration - Query processing: The theory of query processing in data integration systems is commonly expressed using conjunctive queries and Datalog, a purely declarative logic programming language.

Data integration - Query processing: In terms of data integration, "query containment" represents an important property of conjunctive queries.
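Query containment for conjunctive queries can be tested by searching for a containment mapping: a substitution of the second query's variables that maps its head onto the first query's head and every one of its body atoms onto a body atom of the first query. The sketch below is a brute-force illustration under an assumed encoding (a query is a pair of head terms and body atoms, and variables are strings starting with an uppercase letter). It is exponential in the number of variables and is not meant as a production algorithm.

```python
from itertools import product

def is_var(term):
    """Assumed convention: variables are strings starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()

def contained_in(q1, q2):
    """Return True if q1 is contained in q2 (every answer to q1 is an answer to q2).

    A query is (head, body): head is a tuple of terms, body is a list of
    (predicate, args) atoms with args given as a tuple of terms.
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    vars2 = sorted({t for _, args in body2 for t in args if is_var(t)}
                   | {t for t in head2 if is_var(t)})
    terms1 = sorted({t for _, args in body1 for t in args} | set(head1), key=str)
    atoms1 = set(body1)
    # Try every assignment of q2's variables to terms of q1.
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if tuple(h.get(t, t) for t in head2) != tuple(head1):
            continue
        if all((pred, tuple(h.get(t, t) for t in args)) in atoms1
               for pred, args in body2):
            return True
    return False

# Q_cycle(X) :- edge(X, Y), edge(Y, X)    X lies on a two-step cycle
q_cycle = (("X",), [("edge", ("X", "Y")), ("edge", ("Y", "X"))])
# Q_out(X) :- edge(X, Z)                  X has some outgoing edge
q_out = (("X",), [("edge", ("X", "Z"))])
```

With these two queries, `contained_in(q_cycle, q_out)` holds because mapping Z to Y sends the single atom of `q_out` onto `edge(X, Y)`, while the reverse direction fails.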

Data integration - Query processing: In LAV systems, queries undergo a more radical process of rewriting because no mediator exists to align the user's query with a simple expansion strategy. The integration system must search the space of possible queries to find the best rewrite. The resulting rewrite may not be an equivalent query but a maximally contained one, and the resulting tuples may be incomplete. As of 2009, the MiniCon algorithm is the leading query rewriting algorithm for LAV data integration systems.

Data integration - Data Integration in the Life Sciences: National Science Foundation initiatives such as Datanet are intended to make data integration easier for scientists by providing cyberinfrastructure and setting standards.

Customer data integration: In data processing, customer data integration (CDI) combines the technology, processes and services needed to set up and maintain an accurate, timely, complete and comprehensive representation of a customer across multiple channels, business lines, and enterprises, typically from multiple sources of associated data in multiple application systems and databases. It applies data integration techniques in this specific area.

Customer data integration - Techniques for managing complexity: Management: data integration, governance, stewardship, operations and distribution all combine to make or break data value.

Customer data integration - History of customer data integration: In the late 1990s Acxiom and GartnerGroup coined the term customer data integration (CDI). The process of CDI, as Acxiom and Gartner described it, includes:

Customer data integration - History of customer data integration: Service providers deliver CDI as a hosted solution in batch volumes, on demand using a software as a service (SaaS) model, or on-site as licensed software in companies and organizations with the resources to drive their own data integration processing.

Pentaho Data Integration: Pentaho offers a suite of open source business intelligence (BI) products called Pentaho Business Analytics, providing data integration, OLAP services, reporting, dashboarding, data mining and ETL (extract, transform, load) capabilities. Pentaho is headquartered in Orlando, FL, USA.

Pentaho Data Integration - Social Media Communication: Matt Casters, founder and developer of Pentaho Data Integration (PDI/Kettle) and author of the book Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration (Casters, Bouman, Dongen; Wiley).

Data integration: Data integration involves combining data residing in different sources and providing users with a unified view of these data.

Data integration - Example: These adapters simply transform the local query results (those returned by the respective websites or databases) into an easily processed form for the data integration solution (see figure 2).
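A minimal sketch of such adapters follows, assuming one source returns JSON and another returns CSV. The payloads, field names and the common (city, attribute, value) form are invented for illustration and do not come from the slides.

```python
import csv
import io
import json

# Hypothetical raw results from two sources; formats and fields are assumptions.
WEATHER_JSON = '[{"name": "Oslo", "temp_c": 4}, {"name": "Turku", "temp_c": 2}]'
CRIME_CSV = "city,rate\nOslo,0.8\nTurku,0.5\n"

def weather_adapter(raw):
    """Turn the JSON source's result into (city, attribute, value) triples."""
    return [(row["name"], "temp_c", float(row["temp_c"])) for row in json.loads(raw)]

def crime_adapter(raw):
    """Turn the CSV source's result into the same triple form."""
    reader = csv.DictReader(io.StringIO(raw))
    return [(row["city"], "crime_rate", float(row["rate"])) for row in reader]

def unified_view(*triple_lists):
    """Merge adapter output into one record per city."""
    view = {}
    for triples in triple_lists:
        for city, attr, value in triples:
            view.setdefault(city, {})[attr] = value
    return view

view = unified_view(weather_adapter(WEATHER_JSON), crime_adapter(CRIME_CSV))
```

Each adapter hides one source's format, so the integration layer only ever sees the shared triple form.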

Ontology-based data integration: Ontology-based data integration involves the use of one or more ontologies to effectively combine data or information from multiple heterogeneous sources. It is one of several data integration approaches and may be classified as Global-As-View (GAV). The effectiveness of ontology-based data integration is closely tied to the consistency and expressivity of the ontology used in the integration process.

Ontology-based data integration - Background: Data from multiple sources are characterized by multiple types of heterogeneity. The following hierarchy is often used (AHM02 Tutorial 5: Data Integration and Mediation; contributors: B. Ludaescher, I. Altintas, A. Gupta, M. Martone, R. Marciano, X. Qian):

Ontology-based data integration - Background: In domains like bioinformatics and biomedicine, the rapid development, adoption and public availability of ontologies have made it possible for the data integration community to leverage them for semantic integration of data and information.

Ontology-based data integration - Approaches using ontologies for data integration: There are three main architectures implemented in ontology-based data integration applications.

Core data integration: Core data integration is the use of data integration technology for a significant, centrally planned and managed IT initiative within a company. Examples of core data integration initiatives could include:

Core data integration: Core data integrations are often designed to be enterprise-wide integration solutions. They may be designed to provide a data abstraction layer, which in turn will be used by individual core data integration implementations, such as ETL servers or applications integrated through EAI.

Core data integration: Because it is difficult to promptly roll out a centrally managed data integration solution that anticipates and meets all data integration requirements across an organization, IT engineers and even business users create edge data integrations, using technology that may be incompatible with that used at the core. In contrast to a core data integration, an edge data integration is not centrally planned and is generally completed with a smaller budget and a tighter deadline.

Edge data integration: Many edge integrations, and in fact the vast majority of all data integrations, involve hand-coded scripts.

Edge data integration: It has been claimed that edge data integrations do not typically require large budgets or centrally managed technologies, in contrast to core data integration.

For more information, visit: https://store.theartofservice.com/the-data-integration-toolkit.html The Art of Service