1 Lecture 7 Distributed Data Bases: Principles and Architectures.


2 Distributed Data Base (DDB) – Definition A logically interrelated collection of shared data, physically distributed over a computer network. Implies data description at two levels: Global (the view of the whole); Local (where data is actually held).

3 DDBMS (Distributed DBMS) The software system that permits the management of the distributed data base and makes the distribution transparent to users. Transparent: users are unaware of the underlying local structure. Data requests do not specify distribution sites, but users may notice performance differences (e.g. if local data has to be moved to another site along a slow line).
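Distribution transparency can be pictured with a small sketch. The site names, the `CATALOGUE` mapping and the `global_select` function are hypothetical and only illustrate the idea that a request names a relation, never a site.

```python
# Hypothetical local stores, one per site.
SITES = {
    "london": {"customer": [{"id": 1, "name": "Ada"}]},
    "paris": {"customer": [{"id": 2, "name": "Bob"}]},
}

# Hypothetical global catalogue: relation name -> sites holding a fragment.
CATALOGUE = {"customer": ["london", "paris"]}


def global_select(relation, predicate=lambda row: True):
    """Answer a request that names only the relation, not the sites."""
    rows = []
    for site in CATALOGUE[relation]:
        # The DDBMS, not the user, decides which sites to visit.
        rows.extend(r for r in SITES[site][relation] if predicate(r))
    return rows


everyone = global_select("customer")
```

The caller never mentions `london` or `paris`; only the time taken to visit a remote site would reveal the distribution.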

4 Characteristics of a DDB Collection of logically related shared data. Data is split into a number of fragments (horizontal or vertical – restrict or project). Fragments may be replicated. Fragments / replicas are allocated to sites. Fragments are in effect views; Replicas are duplicates – only acceptable if redundancy is controlled.
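As a minimal sketch of the two kinds of fragment, the relation below is split by restrict (horizontal) and by project (vertical) and then rebuilt. The `employees` relation and all function names are assumptions made for illustration.

```python
# Hypothetical relation: rows are dicts keyed by attribute name.
employees = [
    {"emp_id": 1, "name": "Ada", "site": "London", "salary": 50000},
    {"emp_id": 2, "name": "Bob", "site": "Paris", "salary": 42000},
    {"emp_id": 3, "name": "Cy", "site": "London", "salary": 38000},
]


def restrict(rows, predicate):
    """Horizontal fragment: keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attributes):
    """Vertical fragment: keep only the named attributes of every row."""
    return [{a: row[a] for a in attributes} for row in rows]


# Horizontal fragments, one per site.
london = restrict(employees, lambda r: r["site"] == "London")
paris = restrict(employees, lambda r: r["site"] == "Paris")

# Vertical fragments; both repeat the key so they can be rejoined.
personal = project(employees, ["emp_id", "name", "site"])
payroll = project(employees, ["emp_id", "salary"])


def union(*fragments):
    """Rebuild a relation from horizontal fragments."""
    return sorted((row for f in fragments for row in f), key=lambda r: r["emp_id"])


def join_on_key(left, right, key):
    """Rebuild a relation from vertical fragments by joining on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]
```

Horizontal fragments are recombined by union and vertical fragments by a join on the shared key, which is why each vertical fragment has to carry the key.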

5 Why distribute? Natural match of data with location Can have each division, department or office hold its own data with some degree of autonomy. Autonomy – to have control (self-determination, self-rule) Users can decide policies locally (devolved). Still need global DBA to ensure entire system works.

6 Why distribute? (cont.) More flexible operation. Improved availability: one node failure does not bring the whole system down. Improved reliability: replication ensures that copies of data are still available if a node fails. Improved performance: accessing most data locally reduces network overheads. Readily handles expansion: new nodes can be added with a local schema, followed by simple adjustments to the global definition.
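The availability and reliability points can be sketched as a read that falls back to another replica when a site is down. The `Site` class, `SiteDown` exception and `read_with_failover` function are hypothetical names, not part of any DDBMS product.

```python
class SiteDown(Exception):
    """Raised when a site cannot be reached."""


class Site:
    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.up = True

    def get(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data[key]


def read_with_failover(key, replicas):
    """Try each replica in turn; list the local site first for performance."""
    for site in replicas:
        try:
            return site.get(key), site.name
        except SiteDown:
            continue  # this copy is unavailable, try the next one
    raise SiteDown("no replica of %r is reachable" % (key,))


local = Site("local", {"x": 1})
remote = Site("remote", {"x": 1})
```

With both sites up the read is served locally; after `local.up = False` the same call is served by `remote`.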

7 Problems with Distribution Complexity: Global and local schemas must be integrated. Design techniques involve more stages. Replication must be rigorously handled. The network needs to be robust. Costs: More staff effort is needed to handle the complexity, although it is cheaper to buy computing power as several smaller machines than as one larger one.

8 Problems with Distribution (cont.) Security: Many more potential access points for would-be violators. Integrity: Need to ensure that the combination of local and global constraints gives the required effect. Experience: Fairly immature technology, not yet translated into standards.

9 Homogeneous and Heterogeneous DDBMS A homogeneous DDBMS uses the same data base product at all sites. A heterogeneous DDBMS uses different data base products at various sites; this may arise from corporate mergers.

10 Degrees of Heterogeneity Same software, different hardware: can be handled fairly easily. Oracle 9i : Oracle 8i: differences slight. Oracle 9i : SQL Server: same underlying relational model, different syntax in places. Oracle 9i : Objectivity: object-relational (SQL-1999) and ODMG respectively, different underlying model.

11 Interoperability Ability to work with each other. In a loosely coupled environment, full details of each system are not needed, but interfaces are needed for reliably exchanging messages without error or misunderstanding. Solutions: Standardized specifications; Mediation. Differences in implementation may still lead to breakdowns in communication.

12 Simple Problem in Interoperability - 1 Two schemas in SQL-1999:
Schema A: author varchar2(50), title varchar2(300), keyword1 varchar2(30), keyword2 varchar2(30);
Schema B: author_surname varchar2(40), author_initials varchar2(10), title varchar2(200), keywd keywordarr;
CREATE TYPE keywordarr AS VARRAY(8) OF varchar2(30);
Note: homogeneous model (both SQL-1999), but difficulties remain.

13 Different Standards e.g. Names: Person (surname, first_name, …) or Person (first_name, surname, …) or Person (name, …) First two may easily be made equivalent but convention in third needs to be understood. Note also possibilities of A.N.Other, AN Other, A N Other, etc.
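The variants A.N.Other, AN Other and A N Other can be brought to one form with a small normalising function. This is a sketch under two stated assumptions: initials come before the surname, and the surname is the last token. `normalise_name` is a hypothetical helper.

```python
def normalise_name(raw):
    """Return (initials, surname) for strings such as 'A.N.Other'.

    Assumes initials precede the surname and the surname is the last token;
    a full first name such as 'Ann Other' would need a different rule.
    """
    # Treat full stops as separators, then split on whitespace.
    tokens = raw.replace(".", " ").split()
    surname = tokens[-1]
    initials = "".join(tokens[:-1]).upper()
    return initials, surname


variants = ["A.N.Other", "AN Other", "A N Other"]
normalised = {normalise_name(v) for v in variants}
```

All three spellings collapse to the same pair, so they can be compared across data bases; the convention behind a single `name` field still has to be understood by a person first.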

14 Possible Solutions In schema B, define a function which amalgamates the two parts of author into one value. Will need to look manually at format of author in schema A. If format inconsistent, need some pre-processing. Other inconsistencies require decisions: Fixed two entries for keyword vs. array dimension 8. Different name for keyword attribute. Different size for title fields (presumably adopt higher). In a heterogeneous environment, we need also to relate schema constructions, e.g. is CLASS the same as TABLE?
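The decisions listed above can be sketched as two mapping functions that bring rows from either schema into one common record. The common record layout and the "Surname, Initials" author format are assumptions; the real format of author in schema A would have to be inspected manually, as the slide says.

```python
MAX_KEYWORDS = 8  # array dimension of keywordarr in schema B


def from_schema_a(row):
    """Map a schema A row to the common record (author assumed 'Surname, Initials')."""
    keywords = [k for k in (row["keyword1"], row["keyword2"]) if k]
    return {"author": row["author"], "title": row["title"], "keywords": keywords}


def from_schema_b(row):
    """Map a schema B row: amalgamate the two author parts into one value."""
    author = "%s, %s" % (row["author_surname"], row["author_initials"])
    keywords = list(row["keywd"])[:MAX_KEYWORDS]
    return {"author": author, "title": row["title"], "keywords": keywords}


row_a = {"author": "Other, AN", "title": "Views", "keyword1": "sql", "keyword2": None}
row_b = {"author_surname": "Other", "author_initials": "AN",
         "title": "Views", "keywd": ["sql", "federation"]}
```

The common record keeps keywords as a list, so the fixed two entries of schema A and the array of schema B fit the same shape, and the differently named keyword attributes disappear behind the mapping.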

15 Simple Problem in Interoperability - 2 Homogeneous Models The same information may be held as an attribute name, a relation name or a value in different data bases. e.g. library fines could be held in a dedicated relation Fine (amount, borrower_id), or as an attribute of Loan (id, isbn, date_out, fine), or as a value Charge (1.25, 'fine').
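To see why this matters, the sketch below extracts fine amounts from each of the three representations. Each needs its own access path even though the data model is the same. The attribute names `amount` and `kind` for Charge are assumptions, since the slide shows only values for that relation.

```python
def fines_from_relation(fine_rows):
    """Fine(amount, borrower_id): the relation name carries the meaning."""
    return [row["amount"] for row in fine_rows]


def fines_from_attribute(loan_rows):
    """Loan(id, isbn, date_out, fine): the attribute name carries the meaning."""
    return [row["fine"] for row in loan_rows if row["fine"]]


def fines_from_value(charge_rows):
    """Charge(amount, kind): a stored value ('fine') carries the meaning."""
    return [row["amount"] for row in charge_rows if row["kind"] == "fine"]


fine_rows = [{"amount": 1.25, "borrower_id": 7}]
loan_rows = [{"id": 1, "isbn": "0-00-000000-0", "date_out": "2005-01-10", "fine": 1.25},
             {"id": 2, "isbn": "0-00-000000-1", "date_out": "2005-01-11", "fine": None}]
charge_rows = [{"amount": 1.25, "kind": "fine"}, {"amount": 3.00, "kind": "reservation"}]
```

In the third case the query must test data, not schema, which is what makes automatic integration hard.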

16 Architectures for Interoperability 1 1. Global schema integration Produces a single new schema (C) for the different information systems with schemas (A, B). [Diagram: schemas A and B integrated into schema C.]

17 Global Schema Integration Advantages Transparent to end users – appears as a single information system. Disadvantages Difficult to perform integration – needs human understanding. Local autonomy lost. Static – does not evolve automatically. Tightly-coupled.

18 Architectures for Interoperability 2 2. Federated Data Base Systems Less tightly coupled schema than in 1. Each service specifies sharable objects through an export schema. Common data model. Internal command language. Decentralised control (local autonomy). 5-level architecture for federated system. e.g. Objectivity as Federated OODBMS

19 FDBMS Terminology IS – Internal Schema defining layout on disk of a conceptual schema CS – Conceptual Schema defining logical data base (e.g. relational – tables, attributes, domains). ES – External Schema defining views on conceptual schema.

20 Federated DBMS – 5-level Architecture [Diagram labels: Global ES, Global CS, Local ES, Local CS, Local IS, DB.]

21 Federated Data Base: Loosely coupled Created by users. A_E, B_E are export schemas. V is a view. A, B are base schemas, retaining autonomy over those parts not exported. [Diagram: view V over export schemas A_E and B_E of base schemas A and B.]
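A rough sketch of the loosely coupled arrangement: each site publishes an export schema, and a user builds a view over the exports only. The relations, attributes and the `export` and `view_v` functions are hypothetical.

```python
# Hypothetical base schemas held autonomously at two sites.
BASE_A = {"member": [{"id": 1, "name": "Ada", "phone": "555-0101"}]}
BASE_B = {"author": [{"ref": 9, "name": "Bob", "royalty": 12.5}]}

# Export schemas: the relations and attributes each site agrees to share.
EXPORT_A = {"member": ["id", "name"]}
EXPORT_B = {"author": ["ref", "name"]}


def export(base, export_schema):
    """Return only the exported relations and attributes of a base schema."""
    return {
        relation: [{a: row[a] for a in attributes} for row in base[relation]]
        for relation, attributes in export_schema.items()
    }


def view_v(a_e, b_e):
    """A user-defined view over the two export schemas: all known names."""
    names = [row["name"] for row in a_e["member"]]
    names += [row["name"] for row in b_e["author"]]
    return sorted(names)


A_E = export(BASE_A, EXPORT_A)
B_E = export(BASE_B, EXPORT_B)
```

The view can see names from both sites, but `phone` and `royalty` never leave their base schemas, which is the autonomy the slide refers to.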

22 Federated Data Bases: Tightly coupled Created by administrators. Global schema integration on all export schemas. More formal than loosely-coupled. Much effort to resolve semantic inconsistencies.

23 Federated Data Base Systems – General Advantages Local autonomy preserved. Not all data needs to be integrated. Provide meta-data structures for views (external and export schema, data dictionary).

24 Federated Database Systems - Disadvantages by Approach Loosely coupled Duplication by different users in building views. Updating data defined in views can be difficult. Tightly coupled Similar to global schema integration: Complex, difficult to make changes dynamically. Much effort needed to resolve semantic inconsistencies.

25 Multi-Data-Base Language Approach No attempt at schema integration. All sites maintain complete autonomy. The various schemas can be heterogeneous, inconsistent w.r.t. services provided, and duplicate information in different ways. Language (e.g. MSQL) is used to integrate data bases at run time. Relational model used as Common Data Model.

26 Multi-Data-Base Language Approach A, B are schemas. MSQL is the run-time language. [Diagram: MSQL operating directly on schemas A and B.]

27 Multi-Data-Base Language Approach – Advantages No preparatory work to understand semantics of schema. Dynamic – access latest versions. Very skilled users can succeed in reaching their goals. Interesting work on multi-data-base dependencies.

28 An Example Multi-Data-Base Language MSQL (Multidatabase SQL) Biased towards the relational model. Illustrates problems. Consider 2 data bases: Each on publications of a computing society; and query: “What is the name, address, title for each publication of an author appearing in both of the society’s data bases?”

29 MSQL Schema Schema 1 (for AIIA Database): Contacts (PersonID, Name, , …) Conference (Name, Type, …) Attendees (ID, Conf_ID, Speaker, …) Publ_Papers (P_ID, Title, Author_ID, …) Schema 2 (for IFIP Database): Member_Socs (Soc_Name, …) Conf (Conf_ID, …) Publ_Papers (P_Ref, Title, Conf_Ref, …) Authors (Name, , Paper_ID, …) Underlined attributes are primary keys. Attributes in italics are foreign keys.

30 MSQL for Query
USE AIIA, IFIP
SELECT Name, , Title
FROM Authors, IFIP.Publ_Papers IFIP_Paper, Contacts, AIIA.Publ_Papers AIIA_Paper
WHERE Authors.Name = Contacts.Name
AND Contacts.PersonID = AIIA_Paper.Author_ID
AND Authors.Paper_ID = IFIP_Paper.P_Ref;
The USE clause declares the data bases taking part in the query; their names are used as qualifiers in the FROM clause to distinguish tables with the same name (thereafter distinguished by aliasing). Retrieves Name, address and Title from both data bases.
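MSQL itself is not widely available, but the flavour of the query can be tried with SQLite, where ATTACH DATABASE plays a role similar to USE and the database name qualifies tables of the same name. This is only an approximation: both data bases live in one process, the sample rows are invented, and the column `Address` is an assumed name for the attribute that is missing from the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two attached in-memory databases stand in for the AIIA and IFIP sites.
conn.execute("ATTACH DATABASE ':memory:' AS AIIA")
conn.execute("ATTACH DATABASE ':memory:' AS IFIP")
conn.executescript("""
CREATE TABLE AIIA.Contacts (PersonID INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
CREATE TABLE AIIA.Publ_Papers (P_ID INTEGER PRIMARY KEY, Title TEXT, Author_ID INTEGER);
CREATE TABLE IFIP.Publ_Papers (P_Ref INTEGER PRIMARY KEY, Title TEXT, Conf_Ref INTEGER);
CREATE TABLE IFIP.Authors (Name TEXT, Address TEXT, Paper_ID INTEGER);
INSERT INTO AIIA.Contacts VALUES (1, 'A N Other', '1 High St'), (2, 'B Solo', '2 Low Rd');
INSERT INTO AIIA.Publ_Papers VALUES (10, 'Federated Views', 1), (11, 'Local Only', 2);
INSERT INTO IFIP.Publ_Papers VALUES (20, 'Schema Mediation', 7);
INSERT INTO IFIP.Authors VALUES ('A N Other', '1 High St', 20);
""")

# Same joins as the MSQL query; columns are qualified because plain SQL
# rejects the ambiguous names Name and Title.
QUERY = """
SELECT Contacts.Name, Contacts.Address, AIIA_Paper.Title, IFIP_Paper.Title
FROM IFIP.Authors AS Authors
JOIN IFIP.Publ_Papers AS IFIP_Paper ON Authors.Paper_ID = IFIP_Paper.P_Ref
JOIN AIIA.Contacts AS Contacts ON Authors.Name = Contacts.Name
JOIN AIIA.Publ_Papers AS AIIA_Paper ON Contacts.PersonID = AIIA_Paper.Author_ID
"""
rows = conn.execute(QUERY).fetchall()
```

Only the author present in both data bases is returned. The join on `Name` works here because both sample rows spell the name identically, which is exactly the kind of agreement that cannot be taken for granted across autonomous sites.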

31 Potential Problems with MSQL Are names and domains of corresponding attributes the same? Can use LET command to create equivalences of names, but this does not solve domain incompatibility. What if one schema is not relational? The E-R model is often used as a neutral schema for translation and comparison of heterogeneous features.
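The limit of name equivalences can be shown in a few lines. The sketch does not use MSQL's LET syntax; `rename` and `convert` are hypothetical helpers, and the two rows (with an identifier stored as an integer in one data base and as a zero-padded string in the other) are invented to illustrate a domain mismatch.

```python
# Hypothetical rows describing the same paper in each society's data base.
aiia_row = {"P_ID": 10, "Title": "Federated Views"}
ifip_row = {"P_Ref": "0010", "Title": "Federated Views"}

# A name equivalence of the kind LET is described as providing.
NAME_EQUIVALENCES = {"P_Ref": "P_ID"}


def rename(row, equivalences):
    """Apply name equivalences; values are left untouched."""
    return {equivalences.get(name, name): value for name, value in row.items()}


def convert(row, converters):
    """Apply per-attribute domain conversions, a separate step from renaming."""
    return {name: converters.get(name, lambda v: v)(value)
            for name, value in row.items()}


renamed = rename(ifip_row, NAME_EQUIVALENCES)
converted = convert(renamed, {"P_ID": int})
```

After renaming, the attribute names agree but the rows still differ, because `"0010"` and `10` belong to different domains; only the extra conversion step makes them comparable.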

32 Multi-data-base Language – Disadvantages in General Distribution is not transparent. Users must resolve inconsistencies themselves. Common language may restrict scope of heterogeneity (relational bias). Local autonomous systems may change their schema freely (so existing queries fail).

33 Comparison of Approaches By coupling: How tightly is the interoperable system connected to its underlying systems? By adaptability: How freely can the interoperable system evolve in line with the underlying schema? By transparency: How much understanding of the interoperable system do end-users need to have?

34 Comparison of Approaches
Approach                     Coupling   Adaptability   Transparency
Global Schema Integration    Tight      Low            High
Federated Data Bases         Medium     Medium         Medium
Multi-data-base Languages    Loose      High           Low

35 Summary Trend: from Global Schema Integration, through Federated Data Bases, to Multi-data-base Languages; that is, towards looser coupling, higher adaptability, and lower transparency.

36 Further Reading Ahmed Elmagarmid, Marek Rusinkiewicz, Amit Sheth, Management of Heterogeneous and Autonomous Database Systems, Morgan Kaufmann (1999).