Integrating data sources on the World-Wide Web Ramon Lawrence and Ken Barker U. of Manitoba, U. of Calgary


Introduction
Integration of data is required when accessing multiple databases within an organization or on the WWW.
Our focus is automatically combining database schemas using schema integration.
Schema integration requires knowledge of data semantics and the use of metadata.

Motivation
Organizations have several database systems which must interoperate.
Users often access multiple Web databases whose knowledge must be integrated and presented in a useful form.
Data warehouses and OLAP systems require data semantics to be understood and data to be cleansed and summarized.

Background
Schema integration involves combining diverse database schemas into an integrated view by resolving conflicts.
Schema conflicts include naming, structural, and semantic conflicts.
Schema integration is required for database interoperability, but it is currently a manual process.

Previous Work
Research systems:
–integrating systems by logical rules (Sheth)
–defining global dictionaries (Castano)
–Carnot Project using the Cyc knowledge base
Industrial systems and standards:
–Metadata Interchange Specification (MDIS)
–XML, BizTalk, E-commerce portals

Architecture Components: The Global Dictionary
A global dictionary (GD) provides standardized terms to capture data semantics.
–Hierarchy of terms related by IS-A or Has-A links
–Contains a base set of common database concepts, but new concepts can be added
A GD term is a single, unambiguous semantic definition.
–Several GD entries are required for a single English word if the word has multiple definitions.
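A minimal sketch of how such a dictionary might be held in memory. The class names, the `key`/`word` split, and every sample term and definition are illustrative assumptions, not the authors' implementation; the direction of the claimant/customer IS-A link is also assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """One unambiguous GD entry. `key` is unique; `word` may repeat."""
    key: str
    word: str
    definition: str
    is_a: Optional[str] = None                      # key of the more general term
    has_a: List[str] = field(default_factory=list)  # keys of component terms


class GlobalDictionary:
    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}

    def add(self, term: Term) -> None:
        """New concepts can be added, but each key must stay unique."""
        if term.key in self.terms:
            raise ValueError(f"duplicate term key: {term.key}")
        self.terms[term.key] = term

    def senses(self, word: str) -> List[Term]:
        """All entries for one English word (one entry per definition)."""
        return [t for t in self.terms.values() if t.word.lower() == word.lower()]

    def is_kind_of(self, key: str, ancestor: str) -> bool:
        """True if `key` equals `ancestor` or reaches it through IS-A links."""
        seen = set()
        current: Optional[str] = key
        while current is not None and current not in seen:
            if current == ancestor:
                return True
            seen.add(current)
            term = self.terms.get(current)
            current = term.is_a if term else None
        return False


# Hypothetical sample entries.
gd = GlobalDictionary()
gd.add(Term("customer", "customer", "a party served by the organization"))
gd.add(Term("claimant", "claimant", "a customer who files a claim", is_a="customer"))
gd.add(Term("claim#1", "claim", "a request for payment under a policy", has_a=["claimant"]))
gd.add(Term("claim#2", "claim", "an assertion that something is true"))
```

Giving the two senses of "claim" separate keys is one way to keep each entry a single, unambiguous definition.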

Architecture Components: Using the Global Dictionary
GD terms are used to build semantic names that describe the semantics of schema elements.
Semantic names have the form:
–semantic name = "[" CT [[;CT] | [,CT]] "]" CN
–CT = context term, CN = concept name
–each CT and CN is a single term from the GD
Semantic names are included in RIM specifications describing a data source.
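The form above can be read by a small parser. This is a sketch under assumptions: the slides do not say how ";" and "," differ, so the separators are recorded without interpretation, and the concept name is treated as optional because the integrated view later lists bare entries such as [Claim]. The names `SemanticName` and `parse_semantic_name` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# "[" context terms "]" followed by an optional concept name.
_SEMANTIC_NAME = re.compile(r"^\s*\[([^\[\]]+)\]\s*(.*?)\s*$")


@dataclass(frozen=True)
class SemanticName:
    contexts: Tuple[str, ...]    # the CT terms, in order
    separators: Tuple[str, ...]  # ";" or "," between consecutive CTs
    concept: Optional[str]       # the CN term, or None if absent


def parse_semantic_name(text: str) -> SemanticName:
    match = _SEMANTIC_NAME.match(text)
    if match is None:
        raise ValueError(f"not a semantic name: {text!r}")
    inside, concept = match.group(1), match.group(2)
    # Splitting on a captured group keeps the separators in the result.
    parts = re.split(r"([;,])", inside)
    contexts = tuple(p.strip() for p in parts[0::2])
    separators = tuple(parts[1::2])
    if any(not c for c in contexts):
        raise ValueError(f"empty context term in: {text!r}")
    return SemanticName(contexts, separators, concept or None)


name = parse_semantic_name("[Claim;Payment] Amount")
```

A fuller version would also check that every CT and CN is an entry in the global dictionary.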

Architecture Components: The Relational Integration Model
Database metadata and semantic names are combined into Relational Integration Model (RIM) specifications (RIM specs). A RIM spec:
–contains information on a relational schema
–is organized into database, table, and field levels
–stores semantic names to describe and integrate schema elements
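The three levels can be sketched as nested records. The class and attribute names are hypothetical. The table and field names come from the ABC Company schema used in the integration example, but the semantic names assigned to them here are assumptions, since the original RIM spec figures are not reproduced in this transcript.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FieldSpec:
    system_name: str    # physical column name
    semantic_name: str  # semantic name built from GD terms


@dataclass
class TableSpec:
    system_name: str
    semantic_name: str
    fields: List[FieldSpec] = field(default_factory=list)


@dataclass
class RimSpec:
    database: str
    tables: List[TableSpec] = field(default_factory=list)

    def semantic_index(self) -> Dict[str, List[str]]:
        """Map each semantic name to the physical elements that carry it."""
        index: Dict[str, List[str]] = {}
        for table in self.tables:
            index.setdefault(table.semantic_name, []).append(table.system_name)
            for f in table.fields:
                path = f"{table.system_name}.{f.system_name}"
                index.setdefault(f.semantic_name, []).append(path)
        return index


# Assumed semantic names for the ABC schema.
abc = RimSpec("ABC", [
    TableSpec("Claims_tb", "[Claim]", [
        FieldSpec("claim_id", "[Claim] Id"),
        FieldSpec("claimant", "[Claimant] Name"),
        FieldSpec("net_amount", "[Claim] Net amount"),
        FieldSpec("paid_amount", "[Payment] Amount"),
    ]),
])
index = abc.semantic_index()
```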

Architecture Components: Integrating RIM Specs
Each database to be integrated is described using a RIM specification.
Identical concepts in different databases are identified by similar semantic names.
Concepts with identical (or hierarchically related) semantic names are combined regardless of their physical representation in the individual databases.
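One way to picture the matching rule, as a sketch only: two terms combine when they are identical or when one reaches the other through IS-A links. The sample links are hypothetical, and returning the more general term as the combined concept is an assumption, not something the slides state.

```python
from typing import Dict, List, Optional

# Hypothetical IS-A links: child term -> parent term.
IS_A: Dict[str, str] = {"Claimant": "Customer", "Customer": "Person"}


def ancestors(term: str, is_a: Dict[str, str]) -> List[str]:
    """Return the term followed by its IS-A ancestors, nearest first."""
    chain = [term]
    seen = {term}
    while chain[-1] in is_a and is_a[chain[-1]] not in seen:
        parent = is_a[chain[-1]]
        chain.append(parent)
        seen.add(parent)
    return chain


def combine(term_a: str, term_b: str, is_a: Dict[str, str]) -> Optional[str]:
    """Return the term under which two terms integrate, or None.

    Identical terms combine as themselves; hierarchically related terms
    combine under the more general one (an assumed policy).
    """
    if term_a == term_b:
        return term_a
    if term_b in ancestors(term_a, is_a):
        return term_b
    if term_a in ancestors(term_b, is_a):
        return term_a
    return None


combined = combine("Claimant", "Customer", IS_A)
```

Nothing in this check looks at table or field structure, which is the sense in which concepts are combined regardless of physical representation.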

Integration Architecture
Our integration architecture consists of two separate phases:
–capture process: RIM specs are constructed for each data source independently
–integration process: RIM specs are combined using the integration algorithm, which matches semantic names using the global dictionary

Integration Architecture: The Capture Process
The capture process involves:
–automatically extracting the schema information and metadata using a specification editor
–assigning semantic names to each schema element (tables and fields) to capture their semantics

Integration Architecture: The Capture Process
[Figure: capture process. Components: Relational Schema, Global Dictionary, Specification Editor, RIM Spec, DBA. Labels: Automatic Extraction, Lookup of terms.]

Integration Architecture: The Integration Process
The integration process involves:
–automatically identifying identical concepts by matching semantic names
–constructing a global view of database concepts consisting of a hierarchy of concept terms
–resolving structural differences during query generation and submission (e.g. a concept may be represented as a table in one database and a field (attribute) in another)
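The first two steps can be sketched by grouping physical elements under their semantic names. This is not the authors' integration algorithm: it matches names by exact equality only, and the semantic names attached to the ABC and XYZ columns are assumptions. The column names themselves come from the claims example in this talk.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Spec = List[Tuple[str, str]]  # (semantic name, physical location) pairs

# Assumed semantic names for a few columns of each database.
ABC: Spec = [
    ("[Claim] Id", "ABC.Claims_tb.claim_id"),
    ("[Claim] Net amount", "ABC.Claims_tb.net_amount"),
    ("[Payment] Amount", "ABC.Claims_tb.paid_amount"),
]
XYZ: Spec = [
    ("[Claim] Id", "XYZ.T_claims.id"),
    ("[Payment] Id", "XYZ.T_payments.pid"),
    ("[Payment] Amount", "XYZ.T_payments.amount"),
]


def split_name(name: str) -> Tuple[str, str]:
    """Split "[Context] Concept" into ("Context", "Concept")."""
    context, _, concept = name.partition("]")
    return context.strip().lstrip("[").strip(), concept.strip()


def build_global_view(*specs: Spec) -> Dict[str, Dict[str, List[str]]]:
    """Group locations as context -> concept -> [physical locations]."""
    view: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for spec in specs:
        for name, location in spec:
            context, concept = split_name(name)
            view[context][concept].append(location)
    return {ctx: dict(concepts) for ctx, concepts in view.items()}


view = build_global_view(ABC, XYZ)
```

Here the payment amount is a column of the claims table in ABC and a column of a separate payments table in XYZ, yet both land under the same entry; a query generator would then translate that entry into the right physical access for each source.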

Integration Architecture: The Integration Process
[Figure: integration process. Components: Clients, Integration Site, RIM specs, RDBMSs. Label: Subtransactions.]

Integration Architecture Benefits
The benefits of the two-phase architecture are:
–Dynamic integration: schemas are integrated as needed
–RIM specs are constructed only once and independently of each other
–Automatic conflict resolution by integrating based on semantic name rather than physical structure
–Users are isolated from system names and organization by querying through a global view using semantic names for concepts

Integration Example
Two claims databases are to be integrated:
–ABC Company: Claims_tb(claim_id, claimant, net_amount, paid_amount)
–XYZ Company: T_claims(id, customer, claim_amt), T_payments(cid, pid, amount)
The first step is to construct RIM specs for each database.

Integration Example: ABC Database RIM Spec

Integration Example: XYZ Database RIM Spec

Integration Example: Integrated View
Global view after integration:
[Claim]
–Id
–Net amount
[Customer]
–name
[Payment]
–id
–amount

Integration Example: Discussion
Important points:
–system and field names are not presented to the user, who queries using semantic names
–database structure is not shown to the user
–different physical representations of the same concept are combined (e.g. payment as an attribute in the ABC database with the payment table in the XYZ database)
–hierarchically related concepts (customer vs. claimant) are combined based on their IS-A relationship in the global dictionary

Applications to the WWW
Constructing a data warehouse and other operational systems involves integrating diverse data sources.
The WWW is a diverse collection of databases which users access.
Automatic integration of web data sources by a browser or portal reduces query complexity and the effort of integrating results for the user.

Conclusions
Automatic integration of database schemas is possible by using a global dictionary of terms and constructing semantic names for schema elements.
Integration of data sources has applications to the WWW and to the construction of data warehouses.

Important Changes
The integration architecture is constantly being refined. Some notable differences between this presentation and the paper:
–Our integration system uses XML to represent a RIM spec, which has been renamed an X-Spec.
–An integration site is used as a central portal for integration and management.
–Semantic distance calculations between terms are no longer used.
–The format of the semantic name has been simplified.
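To illustrate what representing a spec in XML could look like, here is a sketch that serializes database, table, and field levels with Python's standard `xml.etree.ElementTree`. The element and attribute names (`xspec`, `table`, `field`, `semanticName`) are invented for illustration and are not the actual X-Spec format; the semantic names are likewise assumed.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (table name, table semantic name, [(field name, field semantic name), ...])
Table = Tuple[str, str, List[Tuple[str, str]]]


def to_xspec(database: str, tables: List[Table]) -> str:
    """Serialize a spec to XML using a hypothetical element layout."""
    root = ET.Element("xspec", {"database": database})
    for table_name, table_sem, fields in tables:
        table_el = ET.SubElement(
            root, "table", {"name": table_name, "semanticName": table_sem}
        )
        for field_name, field_sem in fields:
            ET.SubElement(
                table_el, "field", {"name": field_name, "semanticName": field_sem}
            )
    return ET.tostring(root, encoding="unicode")


xml_text = to_xspec("ABC", [
    ("Claims_tb", "[Claim]", [
        ("claim_id", "[Claim] Id"),
        ("net_amount", "[Claim] Net amount"),
    ]),
])
```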

Future Work
The integration architecture is evolving with XML standards and now captures metadata in XML documents.
The system is being tested on sample problems, and a query mechanism is a work in progress.
We are refining a prototype of the system called Unity.