Distributed Database System



Definition
A distributed database system consists of a collection of sites, connected together via some kind of communication network, in which:
- Each site is a full database system site in its own right, but
- The sites have agreed to work together so that a user at any site can access data anywhere in the network exactly as if the data were all stored at the user's own site.

[Figure: database servers and workstations at London and New York, each with its own database, connected by a communication network]

Each site is a database system site in its own right
Each site has:
- Its own local "real" database
- Its own local users
- Its own local DBMS and transaction management software (including its own local locking, logging, recovery, etc.)
- Its own local data communication manager (DC manager)

The overall distributed system can thus be seen as a kind of partnership among the individual local DBMSs at the individual local sites. A new software component at each site, logically an extension of the local DBMS, provides the necessary partnership functionality. The combination of these new components together with the existing DBMSs constitutes what is usually called the "distributed database management system".

Note: in the overall distributed system, each site (that is, each database management system) is a partner of the others.

Distributed Database (DDB)
A DDB is a collection of multiple logically interrelated databases distributed over a computer network. A distributed database management system (DDBMS) is a software system that manages a distributed database while making the distribution transparent to the user.

Types of Distributed Database System
- Heterogeneous DDBMS
- Homogeneous DDBMS

Advantages of Distributed Databases
Why are they desirable?
- Enterprises are usually distributed, so their data is usually distributed too.
- Each organizational unit maintains the data that is relevant to its own operation.
- The total information asset of the enterprise is thus splintered into what are sometimes called "islands of information".
- A distributed system provides the necessary "bridges" to connect those islands together.
- It enables the structure of the database to mirror the structure of the enterprise.
- Local data can be kept locally, where it most logically belongs, while remote data can still be accessed when necessary.

Vocabulary: relevant = related, pertinent; splinter = (1) a fragment or sliver, (2) to break into pieces.

Banking system
SF accounts are kept in SF, NY accounts are kept in NY, and so on.
The advantages are surely obvious: "The distributed arrangement combines efficiency of processing (the data is kept close to the point where it is most frequently used) with increased accessibility (it is possible to access an LA account from SF via the communication network)."
[Figure: New York, Atlanta, San Francisco, and Los Angeles connected by a communication network]

[Figure: New York, Atlanta, Chicago (headquarters), San Francisco, and Los Angeles connected by a communication network. Data stored at each site:]
- Chicago (headquarters): Employees (all), Projects (all), Works_on (all)
- New York: Employees (New York), Works_on (New York employees)
- Atlanta: Employees (Atlanta), Projects (Atlanta), Works_on (Atlanta employees)
- Los Angeles: Employees (Los Angeles), Projects (Los Angeles and San Francisco), Works_on (Los Angeles employees)
- San Francisco: Employees (San Francisco and Los Angeles), Projects (San Francisco), Works_on (San Francisco employees)

Functions of a distributed database
- Keeping track of data
- Distributed query processing
- Distributed transaction management
- Replicated data management
- Distributed data management
- Distributed data recovery
- Security
- Distributed directory (catalog) management

Fundamental Principle (Rule Zero)
The fundamental principle of distributed databases: "To the user, a distributed system should look exactly like a non-distributed system."
The objectives that follow from it:
- Local autonomy
- No reliance on a central site
- Continuous operation
- Location independence
- Fragmentation independence
- Replication independence
- Distributed query processing
- Distributed transaction management
- Hardware independence
- Network independence
- DBMS independence

Local autonomy
The sites in a distributed system should be autonomous. Local autonomy means that all operations at a given site are controlled by that site:
- No site X should depend on some other site Y for its successful operation.
- Local data is locally owned and managed, with local accountability. All data "really" belongs to some local database, even if it is accessible from other sites.
- Integrity, security, and physical storage representation of local data remain under the control and jurisdiction of the local site.

Vocabulary: autonomous = independent.

No reliance on a central site
All sites must be treated as equals. There must not be any reliance on a central "master" site for some central service (for example, central query processing, central transaction management, or centralized naming services) such that the entire system is dependent on that central site.
Why is reliance on a central site undesirable?
- First, the central site might be a bottleneck.
- Second, the system would be vulnerable: if the central site went down, the whole system would be down (the "single point of failure" problem).

Vocabulary: reliance = dependence, trust; bottleneck = point of congestion; vulnerable = able to be wounded or physically injured.

Continuous operation
An advantage of distributed systems is that they provide greater reliability and greater availability.
- Reliability is the probability that the system is up and running at any given moment.
- Availability is the probability that the system is up and running continuously (available) during a time interval.
- Unplanned shutdowns are undesirable.
- Planned shutdowns should never be required. That is, it should never be necessary to shut the system down in order to perform a task such as adding a new site or upgrading the DBMS at an existing site to a new release.

Vocabulary: reliability = dependability; availability = readiness for use.

Location independence / transparency
Basic idea: users should not have to know where data is physically stored, but should be able to behave, at least from a logical standpoint, as if the data were all stored at their own local site.
It is desirable because:
- It simplifies application programs and user activities.
- It allows data to migrate from site to site without invalidating any of those programs and activities. (Migratability is desirable because it allows data to be moved around the network in response to changing performance requirements.)

Fragmentation independence / transparency
The system supports data fragmentation: a base table can be divided into pieces, or fragments, for physical storage purposes, and distinct fragments can be stored at different sites.
Fragmentation is desirable for performance reasons: data can be stored at the location where it is most frequently used, so that most operations are local and network traffic is reduced.

User perception: Emp

Emp#  Dept#  Salary
E1    D1     40000
E2    D1     42000
E3    D2     30000
E4    D2     35000
E5    D3     48000

Site A: S1_Emp

Emp#  Dept#  Salary
E1    D1     40000
E2    D1     42000
E5    D3     48000

Site B: S2_Emp

Emp#  Dept#  Salary
E3    D2     30000
E4    D2     35000

Fragment definition:

FRAGMENT EMP AS
  S1_EMP AT SITE 'SITE_A' WHERE DEPT# = DEPT#('D1') OR DEPT# = DEPT#('D3'),
  S2_EMP AT SITE 'SITE_B' WHERE DEPT# = DEPT#('D2')

Fragmentation
Fragmentation types:
- Horizontal fragmentation
- Vertical fragmentation

Reconstructing the original base relvar from its fragments is done via suitable join operations (for vertical fragments) and union operations (for horizontal fragments).
Fragmentation independence implies that users are presented with a view of the data in which the fragments are logically recombined by means of suitable joins and unions, as if there were no fragmentation.
The optimizer is responsible for determining which fragments need to be physically accessed in order to satisfy any given user request. For example, given

EMP WHERE SALARY > 40000 AND DEPT# = DEPT#('D1')

the optimizer will know from the fragment definitions (in the catalog) that the entire result can be obtained from SITE_A.
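The EMP example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table is modelled as a list of dictionaries, and the names CATALOG, fragment, reconstruct, sites_needed, and query are hypothetical helpers, not part of any DBMS. It shows horizontal fragmentation by DEPT#, reconstruction by union, and how a catalog lookup lets the optimizer skip fragments that cannot contribute to a result.

```python
# Hypothetical in-memory model of the EMP example; all names are illustrative.
EMP = [
    {"emp": "E1", "dept": "D1", "salary": 40000},
    {"emp": "E2", "dept": "D1", "salary": 42000},
    {"emp": "E3", "dept": "D2", "salary": 30000},
    {"emp": "E4", "dept": "D2", "salary": 35000},
    {"emp": "E5", "dept": "D3", "salary": 48000},
]

# Fragment definitions as they might be recorded in the catalog:
# fragment name -> (site, set of DEPT# values held by that fragment).
CATALOG = {
    "S1_EMP": ("SITE_A", {"D1", "D3"}),
    "S2_EMP": ("SITE_B", {"D2"}),
}


def fragment(rows):
    """Split the rows horizontally according to the catalog predicates."""
    return {
        name: [r for r in rows if r["dept"] in depts]
        for name, (_site, depts) in CATALOG.items()
    }


def reconstruct(fragments):
    """Rebuild the base table as the union of its horizontal fragments."""
    rows = [r for frag in fragments.values() for r in frag]
    return sorted(rows, key=lambda r: r["emp"])


def sites_needed(dept):
    """Return the sites whose fragments can hold rows for the given DEPT#."""
    return sorted({site for site, depts in CATALOG.values() if dept in depts})


def query(fragments, dept, min_salary):
    """Evaluate EMP WHERE SALARY > min_salary AND DEPT# = dept,
    touching only the fragments the catalog says are relevant."""
    result = []
    for name, (_site, depts) in CATALOG.items():
        if dept in depts:  # fragment pruning based on the catalog
            result += [
                r for r in fragments[name]
                if r["dept"] == dept and r["salary"] > min_salary
            ]
    return result


FRAGMENTS = fragment(EMP)
```

Calling sites_needed("D1") reports only SITE_A, which mirrors the slide's point that the whole result for department D1 can be obtained from one site.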

Replication independence
The system supports data replication.
It is desirable because:
- First, it can mean better performance: applications can operate on local copies instead of having to communicate with remote sites.
- Second, it can mean better availability: a given replicated object remains available for processing, at least for retrieval, as long as at least one copy remains available.
Disadvantage:
- When a given replicated object is updated, all copies of that object must be updated (the update propagation problem).

Replication transparency
Replication should be transparent to the user: users should be able to behave, at least from a logical standpoint, as if the data were in fact not replicated at all.
It is desirable because:
- It simplifies application programs and end-user activities.
- It allows replicas to be created and destroyed at any time in response to changing requirements, without invalidating any of those programs or activities.
Replication independence implies that it is the responsibility of the optimizer to determine which replicas physically need to be accessed in order to satisfy any given user request.
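The trade-off between availability for retrieval and the cost of update propagation can be illustrated with a small Python sketch. It assumes the simplest possible policy, read from any available copy and write to every copy, which is an assumption made here for illustration and not something the slides prescribe. The class ReplicatedObject and its methods are hypothetical names.

```python
class ReplicatedObject:
    """Toy model of one replicated data item (read-one, write-all policy)."""

    def __init__(self, sites, value):
        # One copy of the value per site.
        self.copies = {site: value for site in sites}
        # Sites that are currently reachable.
        self.up = set(sites)

    def fail(self, site):
        """Mark a site as unavailable."""
        self.up.discard(site)

    def read(self, preferred_site):
        """Read the local copy if its site is up, else any surviving copy."""
        if preferred_site in self.up:
            return self.copies[preferred_site]
        for site in sorted(self.up):
            return self.copies[site]
        raise RuntimeError("no copy available")

    def write(self, value):
        """Propagate an update to all copies; refuse if any site is down.

        Refusing the write is an assumption of this sketch: it keeps the
        copies identical at the price of lower availability for updates.
        """
        if self.up != set(self.copies):
            raise RuntimeError("update propagation needs every copy")
        for site in self.copies:
            self.copies[site] = value


salary = ReplicatedObject(["SITE_A", "SITE_B"], 40000)
salary.write(41000)    # both copies are updated
salary.fail("SITE_A")  # SITE_A goes down
value_after_failure = salary.read("SITE_A")  # served from SITE_B
```

The caller of read never names the copy that is actually used, which is the sense in which replication stays invisible to the user.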

Site_A: S1_Emp

Emp#  Dept#  Salary
E1    D1     40000
E2    D1     42000
E5    D3     48000

Site_A: S12_Emp (S2_EMP replica)

Emp#  Dept#  Salary
E3    D2     30000
E4    D2     35000

Site_B: S2_Emp

Emp#  Dept#  Salary
E3    D2     30000
E4    D2     35000

Site_B: S21_Emp (S1_EMP replica)

Emp#  Dept#  Salary
E1    D1     40000
E2    D1     42000
E5    D3     48000

Distributed query processing
In a distributed system, data is stored at many sites and may be replicated.
Optimization is even more important in a distributed system than it is in a centralized one, because a query may involve several sites.
[Figure: sites A and B connected by a communication network (CN)]

Database:
- S {S#, CITY}: 10,000 stored tuples at site A
- P {P#, COLOR}: 100,000 stored tuples at site B
- SP {S#, P#}: 1,000,000 stored tuples at site A
Assume every stored tuple is 25 bytes (200 bits).

Query ("Get supplier numbers for LD suppliers of red parts"):

((S JOIN SP JOIN P) WHERE CITY = 'LD' AND COLOR = 'Red') {S#}

Estimated cardinalities of certain intermediate results:
- Number of red parts = 10
- Number of shipments by LD suppliers = 100,000

Communication assumptions:
- Data rate = 50,000 bits per second
- Access delay = 0.1 second

There are six strategies for processing this query. For each strategy i, calculate the total communication time Ti from the formula

Ti = (total access delay) + (total data volume / data rate)

which, in seconds, becomes

Ti = (number of messages / 10) + (number of bits / 50000)

1. Move parts to site A and process the query at A.
   T1 = 0.1 + (100000 * 200) / 50000 = 400 seconds approx.
2. Move suppliers and shipments to site B and process the query at B.
   T2 = 0.2 + ((10000 + 1000000) * 200) / 50000 = 4040 seconds approx.
3. Join suppliers and shipments at site A, restrict the result to LD suppliers, and then, for each of those suppliers in turn, check site B to see whether the corresponding part is red. Each of these checks involves 2 messages, a query and a response. The transmission time for these messages is small compared with the access delay.
   T3 = 20000 seconds approx.

4. Restrict parts at site B to those that are red, and then, for each of those parts in turn, check site A to see whether there exists a shipment relating the part to an LD supplier. Each of these checks involves 2 messages; the transmission time for these messages is small compared with the access delay.
   T4 = 2 seconds approx.
5. Join suppliers and shipments at site A, restrict the result to LD suppliers, project the result over S# and P#, and move the result to site B. Complete the processing at site B.
   T5 = 0.1 + (100000 * 200) / 50000 = 400 seconds approx.
6. Restrict parts at site B to those that are red and move the result to site A. Complete the processing at site A.
   T6 = 0.1 + (10 * 200) / 50000 = 0.1 seconds approx.
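The six communication times can be checked with a short Python calculation. The sketch below takes its inputs from the database slide (10,000 suppliers, 100,000 parts, 1,000,000 shipments, 200 bits per tuple, 10 red parts, 100,000 shipments by LD suppliers, 50,000 bits per second, 0.1 second access delay). The function name comm_time and the constant names are hypothetical, and strategies 3 and 4 count only access delay, since the slides treat the transmission time of the small check messages as negligible.

```python
DATA_RATE = 50_000      # bits per second
ACCESS_DELAY = 0.1      # seconds per message
TUPLE_BITS = 200        # 25 bytes per stored tuple

SUPPLIERS = 10_000      # cardinality of S (site A)
PARTS = 100_000         # cardinality of P (site B)
SHIPMENTS = 1_000_000   # cardinality of SP (site A)
RED_PARTS = 10          # estimated number of red parts
LD_SHIPMENTS = 100_000  # estimated shipments by LD suppliers


def comm_time(messages, bits):
    """Total access delay plus total data volume divided by data rate."""
    return messages * ACCESS_DELAY + bits / DATA_RATE


T = {
    # 1. Move parts to site A.
    1: comm_time(1, PARTS * TUPLE_BITS),
    # 2. Move suppliers and shipments to site B (two transfers).
    2: comm_time(2, (SUPPLIERS + SHIPMENTS) * TUPLE_BITS),
    # 3. Two messages per shipment by an LD supplier; data volume ignored.
    3: comm_time(2 * LD_SHIPMENTS, 0),
    # 4. Two messages per red part; data volume ignored.
    4: comm_time(2 * RED_PARTS, 0),
    # 5. Move the restricted and projected join result to site B.
    5: comm_time(1, LD_SHIPMENTS * TUPLE_BITS),
    # 6. Move the red parts to site A.
    6: comm_time(1, RED_PARTS * TUPLE_BITS),
}

best = min(T, key=T.get)  # strategy with the smallest communication time

for i in sorted(T):
    print(f"T{i} = {T[i]:.2f} seconds")
```

Under these assumptions the slowest and fastest strategies differ by several orders of magnitude, which is why optimization matters so much more in the distributed case.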

Distributed transaction management
Related to transaction management:
- Recovery and concurrency
- Two-phase commit
  - Prepare phase
  - Commit phase
(See the previous slide.)

Vocabulary: on behalf of = in the interest of.
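The two phases can be sketched as follows. This is a minimal illustration of the textbook two-phase commit idea, not a description of any particular DBMS: the Participant class and the two_phase_commit function are hypothetical names, and the sketch leaves out logging, timeouts, and recovery from coordinator failure, all of which a real implementation needs.

```python
class Participant:
    """A site taking part in a distributed transaction (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # whether this site will vote yes
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether the local work can be committed."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        """Phase 2, commit decision: make the local changes permanent."""
        self.state = "committed"

    def rollback(self):
        """Phase 2, rollback decision: undo the local changes."""
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Prepare phase: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "rollback"
    # Commit phase: broadcast the single global decision to all sites.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.rollback()
    return decision


ok_sites = [Participant("SITE_A"), Participant("SITE_B")]
ok_decision = two_phase_commit(ok_sites)

mixed_sites = [Participant("SITE_A"), Participant("SITE_B", can_commit=False)]
mixed_decision = two_phase_commit(mixed_sites)
```

The point of the protocol is that every site ends in the same state: a single "no" vote in the prepare phase forces a rollback everywhere.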

Hardware / Network / DBMS independence