INTEGRATION Ramon Lawrence, University of Iowa



USING UNITY Ken Barker, University of Calgary

Summary
- The Unity prototype tackles the schema integration problem by constructing an integrated, global view in a bottom-up approach.
  - Constructing a global view in this manner requires describing data source semantics using a dictionary and an XML-based language.
- The extraction process, which is semi-automatic in nature, is separated from the integration process.
  - Thus, the integration process is automatic, and there is no requirement for a global human integrator.
- Systematic naming using a dictionary allows global queries to be graphically constructed without specifying joins between global relations.
  - The global view produced demonstrates properties similar to a dynamically constructed Universal Relation.

Benefits and Contributions
- The architecture automatically integrates relational schemas into a global view for querying.
- Unique contributions:
  - Synthesizing a global view from the bottom up instead of top down improves integration scalability.
  - Organizing the global view as a hierarchy of concepts instead of relations or predicates simplifies querying, as the user does not have to specify specific relations or join conditions. This is called Querying by Context (QBC).
  - Query processing is achieved by dynamically discovering extraction rules based on the naming of fields and tables.
    - The discovered rules are similar to the extraction rules of global-as-view (GAV) systems.

Unity Overview
- Unity is a software package that performs bottom-up integration with a GUI.
  - Developed using Microsoft Visual C++ 6 and Microsoft Foundation Classes (MFC).
- Unity allows the user to:
  - Construct and modify standard dictionaries.
  - Build X-Specs to describe data sources, including extraction of metadata using ODBC and mapping system names to dictionary terms.
  - Integrate X-Specs into an integrated view.
  - Transparently query integrated systems using ODBC and automatically generate SQL queries.

Architecture Components
- The architecture consists of four components:
  - A standard dictionary (SD) to capture data semantics
    - SD terms are used to build semantic names describing the semantics of schema elements.
  - X-Specs for storing data source descriptions
    - Relational database information is stored and transmitted using XML.
    - Stores semantic names to describe schema elements.
  - Integration Algorithm
    - Identical concepts in different databases are identified by similar semantic names.
    - Produces an integrated view of all database concepts.
  - Query Processor
    - Allows the user to formulate queries on the view.
    - Translates from semantic names in the integrated view to SQL queries, then integrates and formats the results.
      - Involves determining correct field and table mappings and discovering join conditions and join paths.
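The integration step can be illustrated with a minimal sketch. It assumes each X-Spec has already been reduced to a mapping from semantic names to physical fields. The dotted naming scheme, the sample sources, and the function names are hypothetical and are not taken from Unity; the sketch only shows how identical semantic names from different databases could end up on the same node of a concept hierarchy.

```python
# Hypothetical X-Spec contents: semantic name -> (table, field).
# The dotted "Context.Concept" notation is an assumption for this sketch.
XSPECS = {
    "OrderDB": {
        "Order.Id": ("Orders", "ord_id"),
        "Order.Customer.Name": ("Orders", "cust_name"),
    },
    "InvoiceDB": {
        "Invoice.Id": ("Inv", "inv_no"),
        "Order.Id": ("Inv", "order_ref"),
    },
}


def new_node():
    """A concept node: physical mappings plus child concepts."""
    return {"mappings": [], "children": {}}


def integrate(xspecs):
    """Merge all X-Specs into one hierarchy keyed by dictionary terms."""
    root = new_node()
    for source, names in xspecs.items():
        for semantic_name, (table, field) in names.items():
            node = root
            for term in semantic_name.split("."):
                node = node["children"].setdefault(term, new_node())
            # Identical semantic names from different sources share a node.
            node["mappings"].append((source, table, field))
    return root


def show(node, depth=0):
    """Render the hierarchy as indented text lines."""
    lines = []
    for term, child in sorted(node["children"].items()):
        sources = ", ".join(f"{s}:{t}.{f}" for s, t, f in child["mappings"])
        suffix = f"  <- {sources}" if sources else ""
        lines.append("  " * depth + term + suffix)
        lines.extend(show(child, depth + 1))
    return lines


view = integrate(XSPECS)
print("\n".join(show(view)))
```

In this sketch matching is exact string equality on the semantic name; the slides describe matching on "similar" names, which a real implementation would have to define.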

Querying by Context (QBC)
- Querying by context (QBC) is a methodology for querying relational databases by semantics.
  - Querying is performed by selecting semantic names that represent query concepts from the integrated view.
  - The integrated context view contains all concepts present in the databases, referenced by semantic names.
- Query by Context performs dynamic closure, relating concepts for the user as they browse the integrated view.
  - This allows a limited form of recursive queries and eliminates the need for the user to specify joins.
- The query processor maps the user’s selections and criteria to an actual SQL query.
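A rough sketch of how selected semantic names might be turned into SQL without the user naming any joins is given below. The field mappings, the join graph, and the use of a breadth-first search for the shortest join path are all assumptions made for illustration; the slides do not specify how Unity discovers join paths.

```python
from collections import deque

# Hypothetical mappings from semantic names to physical (table, field).
FIELD_MAP = {
    "Customer.Name": ("customer", "name"),
    "Order.Date": ("orders", "order_date"),
    "Product.Name": ("product", "pname"),
}

# Hypothetical join graph: each edge carries its join condition.
JOINS = {
    ("customer", "orders"): "customer.id = orders.cust_id",
    ("orders", "order_item"): "orders.id = order_item.order_id",
    ("order_item", "product"): "order_item.prod_id = product.id",
}


def neighbours(table):
    """Yield (other_table, condition) for every join touching `table`."""
    for (a, b), cond in JOINS.items():
        if a == table:
            yield b, cond
        elif b == table:
            yield a, cond


def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, cond in neighbours(table):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, cond)]))
    raise ValueError(f"no join path from {start} to {goal}")


def build_sql(selected):
    """Map selected semantic names to one SQL SELECT with discovered joins."""
    fields = [FIELD_MAP[name] for name in selected]
    root = fields[0][0]
    joined = [root]
    clauses = []
    for table, _ in fields[1:]:
        for nxt, cond in join_path(root, table):
            if nxt not in joined:
                joined.append(nxt)
                clauses.append(f"JOIN {nxt} ON {cond}")
    columns = ", ".join(f"{t}.{f}" for t, f in fields)
    return " ".join([f"SELECT {columns} FROM {root}"] + clauses)


print(build_sql(["Customer.Name", "Product.Name"]))
```

The user supplies only the two semantic names; the intermediate `orders` and `order_item` tables appear in the generated query because the search passes through them.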

References
- Publications:
  - Unity - A Database Integration Tool, R. Lawrence and K. Barker, TRLabs Emerging Technology Bulletin, Jan
  - Multidatabase Querying by Context, R. Lawrence and K. Barker, DataSem2000, pages , Oct
  - Integrating Relational Database Schemas using a Standardized Dictionary, SAC’ ACM Symposium on Applied Computing, pages , March
  - Querying Relational Databases without Explicit Joins, DASWIS International Workshop on Data Semantics in Web Information Systems (with ER'2001), Nov
- Further Information:

Integration Example

BodyWorks Systems
[Figure labels: Web Server; Custom Accounting Package; Shipment Tracking Software; Customer Order Database; Invoice Database; Shipment Database]
BodyWorks is a fictional company with 3 legacy databases that must be integrated for management reporting.

Query-Driven Data Extraction
[Figure labels: Invoice Database; Order Database; Shipment Database; Unity Software; ODBC Querying; Integrated Context View; Query Processor and ODBC Manager; X-Spec Editor; Standard Dictionary; Integration Algorithm]

Integration Processes
- Integration is performed with 3 separate processes:
  - Capture process: independently extracts database schema information into an XML document called an X-Spec.
    - This process is a semi-automatic description using a dictionary.
  - Integration process: combines X-Specs into a structurally neutral hierarchy of database concepts called an integrated context view.
    - This process performs automatic name matching, but imprecision may occur.
  - Query process: allows the user to formulate queries on the integrated view. The query processor maps them to structural queries (SQL), executes them using ODBC, and combines the results using global keys.
    - Users do not have to specify joins when querying the global view.
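The capture process can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in for ODBC metadata extraction, and the XML element and attribute names (`xspec`, `table`, `field`, `semantic_name`) are hypothetical, since the slides do not show the actual X-Spec format. The semantic names are supplied by hand, which reflects the semi-automatic nature of this step.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical hand-supplied mapping from system names to semantic names.
SEMANTIC_NAMES = {
    ("orders", "id"): "Order.Id",
    ("orders", "cust_name"): "Order.Customer.Name",
}


def capture(conn, db_name):
    """Extract table and column metadata into an X-Spec-like XML element."""
    spec = ET.Element("xspec", {"database": db_name})
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        t_el = ET.SubElement(spec, "table", {"name": table})
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
        # Table names come from the catalog itself; simple names are assumed.
        for _cid, column, col_type, _nn, _dflt, _pk in conn.execute(
            f"PRAGMA table_info({table})"
        ):
            ET.SubElement(t_el, "field", {
                "name": column,
                "type": col_type,
                # Empty when nobody has described this field yet.
                "semantic_name": SEMANTIC_NAMES.get((table, column), ""),
            })
    return spec


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_name TEXT)")
xspec = capture(conn, "OrderDB")
print(ET.tostring(xspec, encoding="unicode"))
```

Because each source is described on its own, the capture step needs no knowledge of the other databases; only the shared dictionary ties the descriptions together.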

The Unity Prototype

What is the open problem?
- The global-as-view (GAV) and local-as-view (LAV) approaches are both viable methods for solving data integration.
- However, the open problem is that neither approach performs schema integration, that is, the construction of the global view (GV) itself.
  - GAV: the GV is constructed (schema integration is performed) by a global designer when specifying extraction rules.
  - LAV: the GV is pre-defined using some previous integration process (most likely manual in nature).
  - Both methods rely on the concept of a global user to create the global schema.

How Unity is Different
- Our integration architecture, called Unity, is different because it approaches the integration problem from a different perspective:
  - How can we automate, or semi-automate, the construction of the global view by extracting information from the local data sources?
- Thus, the integration problem is tackled from a different set of starting assumptions:
  - Do not assume a pre-existing or manually created GV.
  - However, assume we have a dictionary and a language for describing schema and data element semantics.
  - Attempt to automatically build a GV from the descriptions of each data source.

The Unity Approach
- Given a set of data sources and a dictionary and a language to describe data semantics:
  - 1) Semi-automatically extract and represent data source semantics in the language using the dictionary.
  - 2) Automatically match concepts across data sources by using the dictionary to determine related concepts.
    - This process effectively builds the global-level relations or objects initially assumed or created in other approaches.
    - However, since there is no manual intervention, the precision of global view construction is affected by inconsistencies in the descriptions of the data sources and matching concepts.
  - 3) Automatically generate queries specified by the user using dictionary terms (not structures) and map the user's query to appropriate data elements in the local sources.

What is wrong with SQL?
- There is nothing wrong with SQL.
- However, SQL is not a simple query language for many reasons:
  - Querying by structure does not hide complexities introduced by database normalization.
  - Structures (fields and tables) may be assigned poor names that do not adequately describe their semantics.
  - The notion of a “join” is confusing for beginner users, especially when multiple joins are present.
  - SQL forces structural access, which does not provide logical query transparency and restricts logical schema evolution.
  - Querying multiple databases (without a global view) using SQL variants is complex because naming and structural conflicts must be resolved during query formulation.
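The normalization point can be made concrete with a small example. The schema and data below are invented for illustration and use Python's built-in `sqlite3` module; they are not the BodyWorks databases from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, order_date TEXT);
CREATE TABLE order_item (order_id INTEGER, prod_id INTEGER, qty INTEGER);
CREATE TABLE product (id INTEGER PRIMARY KEY, pname TEXT);
INSERT INTO customer VALUES (1, 'Ada');
INSERT INTO orders VALUES (10, 1, '2001-11-05');
INSERT INTO product VALUES (7, 'Treadmill');
INSERT INTO order_item VALUES (10, 7, 2);
""")

# The question "which customers ordered which products?" mentions two
# concepts, but the structural query must name four tables and spell out
# three join conditions that exist only because of normalization.
rows = conn.execute("""
    SELECT customer.name, product.pname
    FROM customer
    JOIN orders     ON customer.id = orders.cust_id
    JOIN order_item ON orders.id = order_item.order_id
    JOIN product    ON order_item.prod_id = product.id
""").fetchall()
print(rows)
```

The column names `cust_id`, `prod_id`, and `pname` also hint at the naming problem: a user has to learn each designer's abbreviations before a query can be written at all.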