Research Topics in Computing: Data Modelling for Data Schema Integration
1 March 2005
David George

Modelling & Data Integration: Key Elements of Today's Presentation
- Key drivers for data integration
- Dimensions and issues in integration
- Three integration approaches

Drivers for Data Integration

Drivers for Data Integration (1)
- Organisations are evolving into global entities with distributed data.
- Their systems are characterised by a mix of legacy and new databases and applications.
- Organisational change:
  - organic growth in size and diversity;
  - business re-engineering;
  - corporate mergers and acquisitions.

Drivers for Data Integration (2)
- Many organisations evolved as collections of distinct, autonomous departments with disconnected systems, e.g. in financial services.
- Trends in Business Intelligence initiatives:
  - decision-making support;
  - customer segmentation;
  - marketing strategies.
- The development of distributed or multidatabase systems.

Dimensions and Issues in Integration

Architecture & Design Issues
Multidatabase systems can be classified in two ways:
- Homogeneous systems: the local databases use the same techniques and query language.
- Heterogeneous systems: the local databases use diverse data models and query languages.
Key dimensions of heterogeneity:
- System heterogeneity: hardware, operating system, DBMS.
- Semantic heterogeneity: data models and the data itself.

Why Heterogeneity/Conflict?
Conflicts arise when different designers translate their own conceptualisations of the real world into database representations.

Research Work Conceptualised
- Books model (a): the data of interest is about books, their publishers and the universities that adopt them.
- Publications model (b): the data of interest is about publications and their types.

[Figure: ER diagrams of the Books model and the Publications model. Labels include the entities Book, Publisher, University, Topics, Publication and Keywords; the relationships Published by, Adopted by, Contains and Refer to; and attributes such as Title, Name, Address, City, Code, Research Area and Word.]

[Figure: models A (Books) and B (Publications) placed side by side for comparison, with labels including Book, Publication, Publisher, University, Topics and Keywords.]

[Figure: Books and Publications Integrated. The integrated diagram shows Book, Publisher, University, Topics and Publication, linked by Published by, Adopted by, Refer to and Contains.]

Semantic Heterogeneity/Conflict
- Structural conflicts:
  - generalisation versus specialisation conflicts;
  - entity versus attribute conflicts;
  - naming conflicts.
- Attribute (domain) conflicts:
  - data type conflicts;
  - measure and scale conflicts;
  - integrity, presence and absence conflicts.
- Data value conflicts.
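
One common way to resolve an entity-versus-attribute conflict is to promote the attribute to an entity in the integrated schema. The sketch below assumes, hypothetically, that one source records the publisher only as a text attribute of each publication while the other models Publisher as an entity; the class names, the function promote_publisher and the sample rows are illustrative, not taken from the presentation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Publisher:
    name: str
    address: Optional[str] = None  # unknown when the source only held a name

@dataclass(frozen=True)
class Publication:
    title: str
    publisher_name: str  # now a reference to a Publisher entity, not free text

def promote_publisher(rows: List[Dict[str, str]]) -> Tuple[List[Publisher], List[Publication]]:
    """Resolve an entity-versus-attribute conflict by turning a 'publisher'
    attribute into Publisher entities, one per distinct (trimmed) name."""
    publishers: Dict[str, Publisher] = {}
    publications: List[Publication] = []
    for row in rows:
        name = row["publisher"].strip()
        publishers.setdefault(name, Publisher(name=name))
        publications.append(Publication(title=row["title"], publisher_name=name))
    return list(publishers.values()), publications

# Hypothetical rows from a source that stores the publisher as an attribute.
rows = [
    {"title": "Title A", "publisher": "Publisher X"},
    {"title": "Title B", "publisher": " Publisher X "},
    {"title": "Title C", "publisher": "Publisher Y"},
]
publishers, publications = promote_publisher(rows)
print([p.name for p in publishers])  # ['Publisher X', 'Publisher Y']
```

Matching on the trimmed name is a deliberately naive choice; a real integration would need a better way of deciding when two publisher strings denote the same publisher.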

Semantic Heterogeneity/Conflict
- Generalisation/specialisation conflicts (i.e. structural).
- Naming conflicts:
  - synonyms, i.e. different names for the same concept, e.g. Customer vs Client;
  - homonyms, i.e. the same name for different concepts, e.g. Market (of products) vs Market (of customers).
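
Both kinds of naming conflict can be handled by mapping each (source, local name) pair to a name in the global schema: synonyms map to the same global name, homonyms to different ones. This is a minimal sketch; the source names and global names such as ProductMarket are hypothetical, and in practice the mapping table would come from schema analysis rather than being written by hand.

```python
from typing import Dict, Tuple

# Hypothetical mappings from (source, local name) to a name in the global schema.
NAME_MAP: Dict[Tuple[str, str], str] = {
    # Synonyms: different local names for the same concept map to one global name.
    ("source_a", "Customer"): "Customer",
    ("source_b", "Client"): "Customer",
    # Homonyms: the same local name for different concepts maps to distinct global names.
    ("products_db", "Market"): "ProductMarket",
    ("customers_db", "Market"): "CustomerMarket",
}

def global_name(source: str, local_name: str) -> str:
    """Return the global-schema name for a source element. Unmapped names raise,
    so an unresolved naming conflict cannot slip through silently."""
    try:
        return NAME_MAP[(source, local_name)]
    except KeyError:
        raise KeyError(f"no global mapping for {local_name!r} in {source!r}") from None

print(global_name("source_b", "Client"))      # Customer
print(global_name("customers_db", "Market"))  # CustomerMarket
```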

Semantic Heterogeneity/Conflict
- Data type (representation) conflicts, e.g. a student number held as an integer in one source and as a string in another, or students identified by Student-No in one source and by Name in another.
- Measure and scale conflicts:
  - dimension: volume vs weight;
  - measure: light years vs miles;
  - scale: miles vs kilometres;
  - precision: a 1 to 100 scale vs an A to E scale;
  - date format: dd/mm/yyyy vs mm-dd-yy.
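
Scale and date-format conflicts are usually resolved by converting every source value into one agreed representation before integration. The sketch below assumes, for illustration only, that the global schema uses kilometres and Python date objects, and that two hypothetical sources write dates as dd/mm/yyyy and mm-dd-yy respectively.

```python
from datetime import date, datetime

KM_PER_MILE = 1.609344  # exact: the international mile is defined as 1609.344 m

# Hypothetical per-source date formats (strptime directives).
SOURCE_DATE_FORMATS = {
    "source_a": "%d/%m/%Y",  # dd/mm/yyyy
    "source_b": "%m-%d-%y",  # mm-dd-yy; %y maps 00-68 to 20xx and 69-99 to 19xx
}

def miles_to_km(miles: float) -> float:
    """Scale conflict: convert a distance held in miles into the global unit, kilometres."""
    return miles * KM_PER_MILE

def normalise_date(source: str, text: str) -> date:
    """Date-format conflict: parse a source's local date string into one date type."""
    return datetime.strptime(text, SOURCE_DATE_FORMATS[source]).date()

print(normalise_date("source_a", "01/03/2005") == normalise_date("source_b", "03-01-05"))  # True
print(round(miles_to_km(10), 3))  # 16.093
```

The two-digit year in mm-dd-yy also shows a precision conflict: the century has to be inferred by a rule, which is an assumption the integrator must accept or override.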

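Semantic Heterogeneity/Conflict
- Integrity constraint conflicts, e.g. on an age range (18).
- Referential conflicts: 1:1 vs 1:M relationships, e.g. one invoice per order vs one invoice covering many orders.
- Presence/absence conflicts:
  - an attribute that is NOT NULL in one source but nullable (optional) in another;
  - an attribute with no corresponding attribute in the other source.
- Data value conflicts: the same item held with different values in different sources.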
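
Presence/absence and data value conflicts show up when two records for the same item are merged. The sketch below assumes the records have already been matched to the same real-world item, and uses a hypothetical policy: fill missing or null attributes from the other source, and when both sources hold different non-null values, keep the first source's value but report the conflict instead of hiding it.

```python
from typing import Any, Dict, Tuple

Record = Dict[str, Any]

def merge_records(a: Record, b: Record) -> Tuple[Record, Dict[str, Tuple[Any, Any]]]:
    """Merge two source records already known to describe the same item.

    Presence/absence: an attribute missing or None in one source is taken from
    the other. Data values: differing non-null values keep `a`'s value (a
    hypothetical policy) and are reported in the returned conflicts dict.
    """
    merged: Record = {}
    conflicts: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(a.keys() | b.keys()):
        va, vb = a.get(key), b.get(key)
        if va is None:
            merged[key] = vb
        elif vb is None or va == vb:
            merged[key] = va
        else:
            merged[key] = va
            conflicts[key] = (va, vb)
    return merged, conflicts

# Hypothetical records for the same customer from two sources.
crm = {"id": 7, "name": "J. Smith", "age": 34, "phone": None}
billing = {"id": 7, "name": "John Smith", "phone": "0123"}
merged, conflicts = merge_records(crm, billing)
print(conflicts)  # {'name': ('J. Smith', 'John Smith')}
```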

Integration Approaches

Integration Approaches
- Federated database (multidatabase) systems.
- Data warehouse (materialised, in-house) systems.
- Mediator (virtual integration) systems.

Federated Database Systems

Federated Databases (1)

Federated Databases (2)
A class of heterogeneous databases that:
- consist of both new and old systems;
- previously existed in their own stand-alone (autonomous) environments.
Integration is a consequence of distribution. An organisation can adopt different architectures, i.e. different ways of mapping the databases together:
- loosely coupled integrations;
- tightly coupled integrations.

Federated Databases (3)
Tightly coupled federations:
- The federation administrator determines the schema view for all component systems in the federation.
- The administrator negotiates export schemas (tables and attributes) with the federation participants, who control what they export from their local schemas.
- The exported local schemas are integrated into a federated schema.
- Federation users have less autonomy in creating views.

Federated Databases (4)
Loosely coupled federations:
- The federated component databases have a greater degree of autonomy.
- No central schema view is imposed on users.
- The federated user is effectively an administrator who creates views.
- The user employs a multidatabase (MDB) query language, rather than relying on the schema integration used in tightly coupled federations.

Federated Databases (5)
- Sharing is made explicit: each local (component) database exports a schema describing what it is willing to share.
- The export schemas are imported into the federation to represent the shareable federated database.
- Each source can call on the others for information.
- FDBMSs differ from homogeneous distributed DBMSs (DDBMSs), whose sites all use the same data model and DBMS; in a DDBMS, sharing is therefore implicit.
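
The export/import cycle of a tightly coupled federation can be sketched with schemas represented as plain table-to-columns dictionaries. Everything here is hypothetical: the component names, the tables, and the table_map through which the federation administrator assigns each exported table to a federated table. Note that a column the component does not export (internal_rating) never reaches the federated schema.

```python
from typing import Dict, List, Tuple

Schema = Dict[str, List[str]]  # table name -> column names

def export_schema(local: Schema, shared: Schema) -> Schema:
    """Component side: expose only the tables and columns the local DBA chooses to share."""
    exported: Schema = {}
    for table, cols in shared.items():
        unknown = set(cols) - set(local.get(table, []))
        if unknown:
            raise ValueError(f"{table}: cannot export unknown columns {sorted(unknown)}")
        exported[table] = list(cols)
    return exported

def federate(exports: Dict[str, Schema],
             table_map: Dict[Tuple[str, str], str]) -> Dict[str, Dict[str, List[str]]]:
    """Federation side: integrate the imported export schemas into one federated schema.
    table_map says which federated table each (component, exported table) contributes to."""
    federated: Dict[str, Dict[str, List[str]]] = {}
    for component, schema in exports.items():
        for table, cols in schema.items():
            federated.setdefault(table_map[(component, table)], {})[component] = cols
    return federated

# Hypothetical component databases and their negotiated exports.
finance_local = {"Customer": ["cust_id", "name", "credit_limit", "internal_rating"]}
sales_local = {"Client": ["client_no", "name", "region"]}
exports = {
    "finance_db": export_schema(finance_local, {"Customer": ["cust_id", "name", "credit_limit"]}),
    "sales_db": export_schema(sales_local, {"Client": ["client_no", "name"]}),
}
fed = federate(exports, {("finance_db", "Customer"): "Customer",
                         ("sales_db", "Client"): "Customer"})
print(fed)
# {'Customer': {'finance_db': ['cust_id', 'name', 'credit_limit'], 'sales_db': ['client_no', 'name']}}
```

In a loosely coupled federation there would be no single administrator-maintained table_map; each user would build such mappings (views) for themselves.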

Data Warehousing Systems

Data Warehousing (1)
[Figure: warehouse architecture. Local operational sources, connected over a network/the Internet, feed an integration and storage layer (the warehouse), which in turn serves decision support and data mining.]

Data Warehousing (2)
- Warehousing represents the physical separation of the operational and decision support environments.
- Operational data provides the raw material for:
  - decision support systems;
  - data mining (DM), e.g. identifying trends or characteristics.
- DM is the process of “non-trivial extraction of implicit, previously unknown, and potentially useful information”.

Data Warehousing (3)
- The warehouse integrates multiple heterogeneous data sources, e.g. relational DBs and flat files.
- Data is pre-fetched into a central or intermediate warehouse repository by a mediation process.
- Data is “cleaned” and data integration techniques are applied, e.g. filtering, joining or aggregating.
- Data may be transformed to conform to the warehouse schema.
- This provides consistency in naming conventions, data structures, attributes, etc.
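
The extract, clean, transform and load steps above can be sketched end to end with Python's standard csv and sqlite3 modules. The two sources, their column names and the pence-to-pounds conversion are hypothetical; the sketch aggregates before loading and uses an in-memory SQLite database as a stand-in for the warehouse repository.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, Union

Row = Dict[str, Union[str, float]]

# Hypothetical sources: rows from a relational DB and a flat-file (CSV) export.
RELATIONAL_ROWS = [("C1", "North ", 120.0), ("C2", "south", 80.0)]
FLAT_FILE = "client,area,amount_pence\nC3,North,5000\nC1,NORTH,2500\n"

def extract() -> Iterator[Row]:
    for customer, region, amount in RELATIONAL_ROWS:
        yield {"customer": customer, "region": region, "amount_gbp": amount}
    for rec in csv.DictReader(io.StringIO(FLAT_FILE)):
        # Transform to the warehouse schema: rename client/area, convert pence to pounds.
        yield {"customer": rec["client"], "region": rec["area"],
               "amount_gbp": int(rec["amount_pence"]) / 100}

def clean(rows: Iterator[Row]) -> Iterator[Row]:
    for row in rows:
        row["region"] = str(row["region"]).strip().title()  # one naming convention
        yield row

def load(rows: Iterator[Row], conn: sqlite3.Connection) -> None:
    totals: Dict[str, float] = {}
    for row in rows:  # aggregate before materialising
        region = str(row["region"])
        totals[region] = totals.get(region, 0.0) + float(row["amount_gbp"])
    conn.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total_gbp REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items()))
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(clean(extract()), warehouse)
# Decision-support queries now run against the materialised copy, not the sources.
print(warehouse.execute("SELECT region, total_gbp FROM sales_by_region ORDER BY region").fetchall())
# [('North', 195.0), ('South', 80.0)]
```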

Data Warehousing (4)
- The data is then stored (materialised) in the warehouse repository, possibly in separate data marts.
- The result is a repository of synthesised data for management decision-making.
- Queries are made over the repository's global schema.
- The stored information is independent of the source data: queries do not touch the sources.
- Data extraction tends to be periodic.

Mediator (+Wrapper) Systems

Mediator Systems (1)
[Figure: a mediator performing query translation, with the sources reached over a network/the Internet.]

Mediator Systems (2)
- A global schema is created and mapped to the source schemas.
- The user makes queries over the global, mediated schema.
- Mappings can be either:
  - Global-as-view (GAV): each global relation is defined as a view over the sources;
  - Local-as-view (LAV): each source is defined as a view over the global schema.
- The mediator translates a query over the global schema and reformulates it into sub-queries over the local schemas.
- Wrappers execute the sub-queries and return the results.
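
A GAV mediator is the simpler of the two to sketch, because the global relation is defined directly as a union over the sources, so reformulating a query just means sending one sub-query per source. The sketch below is hypothetical throughout: in-memory lists stand in for source databases, the Wrapper and Mediator classes are illustrative, and the global relation is assumed to be Book(title, publisher). It does not attempt LAV, which requires rewriting queries using views.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]

class Wrapper:
    """Adapts one (hypothetical, in-memory) source to the global vocabulary."""

    def __init__(self, rows: List[Row], rename: Dict[str, str]) -> None:
        self.rows = rows
        self.rename = rename  # local attribute name -> global attribute name

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Translate each local row into global terms, then apply the selection.
        translated = ({self.rename[k]: v for k, v in r.items() if k in self.rename}
                      for r in self.rows)
        return [row for row in translated if predicate(row)]

class Mediator:
    """GAV-style mediator: global Book(title, publisher) is the union of the
    wrapped sources, so reformulation is one sub-query per source."""

    def __init__(self, wrappers: Iterable[Wrapper]) -> None:
        self.wrappers = list(wrappers)

    def books_by_publisher(self, publisher: str) -> List[Row]:
        answer: List[Row] = []
        for wrapper in self.wrappers:  # sources are interrogated at query time
            answer.extend(wrapper.query(lambda row: row["publisher"] == publisher))
        return answer

books_db = Wrapper([{"title": "T1", "publisher_name": "P1"},
                    {"title": "T2", "publisher_name": "P2"}],
                   {"title": "title", "publisher_name": "publisher"})
pubs_db = Wrapper([{"pub_title": "T3", "publisher": "P1"}],
                  {"pub_title": "title", "publisher": "publisher"})
mediator = Mediator([books_db, pubs_db])
print(mediator.books_by_publisher("P1"))
# [{'title': 'T1', 'publisher': 'P1'}, {'title': 'T3', 'publisher': 'P1'}]
```

Because nothing is stored by the mediator, a row added to a source is visible to the very next query, which is the "always up to date" property described for mediator systems.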

Mediator Systems (3)
- Wrappers standardise how source information is described and accessed (i.e. they translate or adapt).
- Query answers are returned to the user on demand, after the sources have been interrogated.
- The data is therefore always up to date (in contrast to warehousing).
- Mediators integrate the view of the information without integrating the source data itself.

Mediator Systems (4)
- The result is a homogeneous information source, presented through views based on the mediated (global) schema.
- Integration is virtual: data is retrieved by the mediator but not stored in any central repository.
- This differs from warehousing, where queries are made against materialised data.
- In short, mediation provides virtual source schema integration via schema mapping and an integrated view.

Comparisons

Federation versus Warehousing & Mediation
- Federation represents a more “static” approach, using agreed couplings to allow view creation.
- Warehousing and mediation address integration in a more “dynamic” way, using extraction, transformation and integration processes.

Warehousing vs. Mediation
- Warehouse: update-driven, i.e. heterogeneous data is integrated in advance and stored in-house in the warehouse repository for direct query and analysis.
- Mediation: a wrapper and mediator layer sits on top of the source DBs. It is query-driven: a query to the mediated schema is translated into queries appropriate to each source, and the results are integrated into a global answer set.
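
The practical consequence of update-driven versus query-driven integration is freshness. This toy sketch uses a hypothetical dictionary as the operational source: the warehouse answers from the copy made at its last periodic load, while the mediator consults the source on every query.

```python
# Hypothetical operational source.
source = {"open_orders": 10}

# Update-driven: the warehouse holds a materialised copy made by the last periodic load.
warehouse = {}

def refresh_warehouse() -> None:
    warehouse.update(source)  # periodic extraction: copy the current source state

def warehouse_query() -> int:
    return warehouse["open_orders"]  # answered from the stored copy; the source is not touched

# Query-driven: the mediator keeps no copy and asks the source on every query.
def mediated_query() -> int:
    return source["open_orders"]

refresh_warehouse()
source["open_orders"] = 12                  # operational update after the last load
print(warehouse_query(), mediated_query())  # 10 12
refresh_warehouse()                         # next scheduled load
print(warehouse_query(), mediated_query())  # 12 12
```

The trade-off runs the other way for cost: the warehouse answers without touching the sources at all, whereas every mediated query puts load on them.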

Summary

Summary
- Drivers for data integration:
  - organisational change;
  - Business Intelligence and strategies.
- Integration issues:
  - different conceptual model representations;
  - the resulting semantic heterogeneities.
- Integration approaches:
  - federated systems;
  - data warehousing and mediator systems.

Next step …

Research Resources
Reference material: journals, books, presentation slides, UCLAN website.
Internal:
External: