Distributed Databases


Outline
Evolution of data processing
What is a DDBMS?
Motivation behind DDBMS
Types of Distributed Databases
Distributed Data Storage: Replication, Fragmentation
Transparencies in a DDBMS

Evolution of data processing
File-based system: each application defined and maintained its own data.
Database technology (DBMS): data is defined and administered centrally. A single logical database located at one site is managed/controlled by a single DBMS.
Distributed database technology (DDBMS): decentralization. Users can access not only data at their own site but also data stored at remote sites.

File-based system
Prepared by: RdDB
A file-based system is a collection of application programs that perform services for the end users, such as the production of reports.
[Figure: at the Registrar, separate programs handle data entry of courses, data entry of enrollment and generation of the EAF, all working on the Student, Courses and Faculty files.]

File-based system
Each program in the system defines and manages its own data. [CBS98] For example, a program might declare its course records as:
    struct course {
        char code[5];    /* course code */
        char desc[20];   /* course description */
        int  units;      /* number of units */
    };

File-based systems
Each user (with the assistance of DP staff) defines and implements (including storage and control) the files needed for a specific application. [CBS98, EN94]
[Figure: each department keeps its own files. Registrar (EAF enrollment): Student, Courses, Faculty. Accounting (OR payment of fees): Student, Fees, course cards. Department (processing of grades): Student, Courses, Grades, Faculty.]

Student System (File-based)
What can be observed? Data redundancy: Student data is stored in the Registrar, Accounting and Department files, and Courses and Faculty data are stored in both the Registrar and Department files.

What can be observed? Separation and isolation of data: each department's data is kept in its own files, so a program cannot easily use data held in another department's files.

What can be observed? Program-data dependence: the file structure is declared inside each program, so changing the structure means changing every program that uses the file. For example:
    struct person {
        char first[20];
        char middle[3];
        char last[30];
    } employees, managers;

What can be observed? Incompatibility of files: the Registrar's programs are written in COBOL and the Accounting programs in C, so their file structures are not directly compatible.

What can be observed? Fixed queries; proliferation of application programs: every new report or query needs its own program.

Limitations of File-Based Systems
Separation and isolation of data
Duplication of data
Program-data dependence
Incompatibility of files (e.g., C vs. COBOL)
Fixed queries / proliferation of application programs

Factors that limit file-based systems
The definition of data is embedded in the application programs, rather than being stored separately and independently. [CBS98]
There is no control over the access and manipulation of data beyond that imposed by the application programs. [CBS98]

Database Approach

What is a database?
It is a shared collection of logically coherent data with some inherent meaning.
It is designed, built and populated with data for a specific purpose, such as meeting the information needs of an organization.
Examples: student database, employee database, library database, air flights database, hospital database, etc.

What is a DBMS? Database Management System
It is a software system that enables users to:
define, create and maintain the database (DDL, DML);
provide controlled access to this database: security, integrity, concurrency control, recovery control and a user-accessible catalogue (description of data).
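To make the DDL/DML distinction and the user-accessible catalogue concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS. SQLite is an assumption chosen for illustration, and the course table and its values are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for "the database" (illustration only).
conn = sqlite3.connect(":memory:")

# DDL: the structure of the data is defined once, inside the DBMS,
# instead of being embedded in every application program.
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, descr TEXT, units INTEGER)")

# DML: create and maintain the data.
conn.execute("INSERT INTO course VALUES (?, ?, ?)", ("CS21", "Data Structures", 3))
conn.execute("UPDATE course SET units = ? WHERE code = ?", (4, "CS21"))
rows = conn.execute("SELECT code, descr, units FROM course").fetchall()
print(rows)

# User-accessible catalogue: SQLite keeps the description of its tables
# in the sqlite_master system table, which can be queried like any other table.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(catalog[0][0])
conn.close()
```

In this design the program never declares the record layout itself, which is the difference from the struct-based file programs shown earlier.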

A Simple Database System Environment
[Figure: in a database system, application programs/queries are handled by the DBMS software (software to process programs/queries and software to access stored data), which uses the data definition to reach the database.]

University System: Database Environment
[Figure: the Registrar (enrollment), Accounting (payment) and Department (grade processing) applications all use one university database managed by MySQL.]

What is a Distributed database?
A distributed database is a logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.

What is a Distributed DBMS?
Distributed DBMS is the software system that permits the management of the distributed database and makes the distribution transparent to users.

Topology of DDBMS
[Figure: sites 1 to 4 connected by a computer network; sites 1 and 3 each hold a database.]

Topology of Distributed Processing
[Figure: sites 1 to 4 connected by a computer network; a centralized database at site 3 is accessed over the network by the other sites.]

Motivation behind DDBMS
To integrate the operational data and provide controlled access to the data. This may imply centralization, but that is not the intention.
To adopt a decentralized approach to data that mirrors the organizational structure of many companies: each unit maintains its own data.
Data is stored close to the location where it is used, which improves the shareability of the data and the efficiency of data access.

Motivation behind DDBMS
To help resolve the "islands of information" problem:
geographical separation
incompatible computer architectures
incompatible communication protocols

(example from Stanford University)
Why distribute?
Example: X Corp. has offices in London, New York and Hong Kong.
Employee data: EMP(ENUM, NAME, TITLE, SALARY, …)
Where should the employee data table reside?

X Corp. Data Access Pattern
Mostly, employee data is managed at the office where the employee works, e.g., payroll, benefits, hiring and firing.
Periodically, X Corp. needs consolidated access to employee data, e.g.:
X Corp. changes benefit plans, and that affects all employees.
The annual bonus depends on global net profit.

[Figure: a single EMP table stored in London; the London, New York and Hong Kong payroll apps all access it over the Internet. The NY and HK payroll apps run very slowly!]

[Figure: EMP split into London Emp, NY Emp and HK Emp, each stored at the office whose payroll app uses it. Much better!]

Fundamental principle of DDBMS
The DDBMS is expected to make the distribution transparent (invisible) to the user. The objective of transparency: To make the distributed system appear like a centralized system.

Motivation behind DDBMS
What is a distributed database system?
Types of Distributed Databases
Distributed Data Storage: Replication, Fragmentation
Transparencies in a DDBMS

Distributed Database System

Data is spread over multiple machines (also referred to as sites or nodes).
The computers may vary in size and function.

A network interconnects the machines.

The database is stored on several sites.
Data is shared by users on multiple machines.

[Figure: a bank network linking the Main, Makati, Manila and Q.C. sites; the sites hold Account(actno, br-name, balance) and Branch(br-name, br-city, assets) relations.]
Local transaction: "Add 100 to account Aki-01" is a local transaction if it is initiated at the Makati branch, where Aki-01 is stored.

Global transaction: the same update is a global transaction if it is initiated elsewhere.
Example: transfer 100 pesos from account Aki-01 (Makati) to account Aki-100 (Manila).

Local and Global Transactions
A local transaction accesses data only at the single site at which the transaction was initiated.
A global transaction either:
accesses data at a site different from the one at which the transaction was initiated, or
accesses data at several different sites (e.g., a funds transfer between branches, or a summarization of deposits in all branches).
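The two definitions can be expressed as a small classification rule. This sketch is illustrative only: the function name classify and the account-to-site mapping (taken from the bank example, with Aki-01 at Makati and Aki-100 at Manila) are hypothetical.

```python
def classify(initiating_site: str, sites_accessed: set) -> str:
    """Classify a transaction as 'local' or 'global'.

    Local: it touches data only at the site where it was initiated.
    Global: it touches data at some other site, or at several sites.
    """
    if not sites_accessed:
        raise ValueError("a transaction must access at least one site")
    return "local" if sites_accessed == {initiating_site} else "global"


# Hypothetical placement of accounts, following the bank example.
ACCOUNT_SITE = {"Aki-01": "Makati", "Aki-100": "Manila"}

deposit_at_makati = classify("Makati", {ACCOUNT_SITE["Aki-01"]})
deposit_from_qc = classify("Q.C.", {ACCOUNT_SITE["Aki-01"]})
transfer = classify("Makati", {ACCOUNT_SITE["Aki-01"], ACCOUNT_SITE["Aki-100"]})
print(deposit_at_makati, deposit_from_qc, transfer)  # local global global
```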

Distributed Database System
A distributed database system consists of loosely coupled sites that share no physical component.
The database systems that run on each site are independent of each other.
Transactions may access data at one or more sites.
[Figure: the Main, Makati, Manila and Q.C. sites connected by a network.]

Why Distributed Database Systems?
Sharing data: users at one site are able to access the data residing at other sites.
Autonomy: each site is able to retain a degree of control over data stored locally.
Higher system availability through redundancy: data can be replicated at remote sites, and the system can function even if a site fails.

Trade-offs in Distributed Systems
Disadvantage: the added complexity required to ensure proper coordination among sites, which brings:
software development cost;
greater potential for bugs;
increased processing overhead.

Homogeneous Distributed Database
All sites have identical database management system software.
Sites are aware of each other and agree to cooperate in processing user requests.
Each site surrenders part of its autonomy in terms of the right to change schemas or software.
Appears to the user as a single system.
[Figure: the Main, Makati, Manila and Q.C. sites, all running Oracle.]

Heterogeneous Distributed Database
Different sites may use different schemas and software (DBMS).
Difference in schema is a major problem for query processing.
Difference in software is a major problem for transaction processing.
Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing.
[Figure: the Main, Makati, Manila and Q.C. sites running different DBMSs: SQL Server, Oracle, Sybase and DB2.]

Approaches: Distributed Data Storage
Data replication: the system maintains multiple copies of the relation, stored at different sites, for faster retrieval and fault tolerance.
Data fragmentation: the relation is partitioned into several fragments stored at distinct sites.
Note: replication and fragmentation can be combined: the relation is partitioned into several fragments, and the system maintains several identical replicas of each fragment.
Allocation: each fragment is stored at the site(s) that give an optimal distribution.

Data Replication
A relation or fragment of a relation is replicated if it is stored redundantly at two or more sites.
[Figure: Account(actno, br-name, balance) stored at several of the Main, Makati, Manila and Q.C. sites.]

Full replication of a relation is the case where the relation is stored at all sites.
[Figure: Account(actno, br-name, balance) stored at all four sites: Main, Makati, Manila and Q.C.]

Fully redundant databases are those in which every site contains a copy of the entire database.
[Figure: a copy of the whole bank database at each of the Main, Makati, Manila and Q.C. sites.]

Advantages of Data Replication
Availability: failure of a site containing relation r does not result in unavailability of r if replicas exist.
Parallelism: queries on r may be processed by several nodes in parallel.
Reduced data transfer: relation r is available locally at each site containing a replica of r.

Disadvantages of Data Replication
Increased cost of updates: each replica of relation r must be updated.
Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
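The trade-off between more available reads and costlier updates can be sketched with a simple read-any/write-all policy over replicas. This is a minimal sketch under stated assumptions: the class ReplicatedRelation, the site names and the balance value are hypothetical, and real DDBMSs use more elaborate replica-control protocols.

```python
class ReplicatedRelation:
    """Copies of one relation (key -> tuple) stored at several sites."""

    def __init__(self, sites):
        self.replicas = {site: {} for site in sites}
        self.up = set(sites)  # sites that are currently reachable

    def read(self, key):
        # Availability: any reachable replica can answer a read.
        for site in sorted(self.up):
            if key in self.replicas[site]:
                return site, self.replicas[site][key]
        raise LookupError(f"no reachable replica holds {key!r}")

    def update(self, key, value):
        # Cost of updates: every replica must be written. This sketch simply
        # refuses the update if any replica is unreachable, so copies never diverge.
        if self.up != set(self.replicas):
            raise RuntimeError("cannot update: some replicas are unreachable")
        for copy in self.replicas.values():
            copy[key] = value
        return len(self.replicas)  # number of site writes performed


account = ReplicatedRelation(["Main", "Makati", "Manila", "Q.C."])
writes = account.update("Aki-01", ("Aki-01", "Makati", 1000))  # one write per site
account.up.discard("Makati")          # simulate failure of the Makati site
site, row = account.read("Aki-01")    # still readable from another replica
print(writes, site, row)
```

Reads survive the loss of Makati, but the same failure blocks every update. That is the availability-versus-update-cost tension listed in the advantages and disadvantages above.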

Data Fragmentation
A relation may be divided into a number of sub-relations, called fragments, which are then distributed.
Relation r is divided into fragments r1, r2, …, rn, which contain sufficient information to reconstruct relation r.

[Figure: relation r(A, B, C, D, E) divided into fragments r1, r2, r3.]

Why fragment? : Advantages
Usage: applications work with views rather than entire relations.
Efficiency: data is stored close to where it is most frequently used.
Parallelism: a transaction can be divided into subqueries that operate on fragments.
Security: data not required by local applications is not stored at that site and consequently is not available to unauthorized users there.

Disadvantages of fragmentation
Performance: applications that require data from several fragments located at different sites may be slower.
Integrity: integrity control may be more difficult to implement.

Types of Data Fragmentation
Horizontal fragmentation
Vertical fragmentation
Mixed fragmentation

Horizontal fragmentation
A horizontal fragment of a relation consists of a subset of the tuples of the relation.
It is produced by specifying a predicate that performs a restriction on the tuples in the relation.
It is defined using the selection operation ($\sigma$) of relational algebra.
[Figure: relation R split by rows into fragments r1, r2, r3.]

Given a relation R, a horizontal fragment is defined as:
$r_1 = \sigma_p(R)$
where p is a predicate based on one or more attributes of the relation.

Example: relation r(A, B, C, D, E)

A    B   C    D    E
a1   b1  c1   d1   e1
a2   b1  c2   d2   e2
a3   b1  c3   d3   e3
a4   b1  c4   d4   e4
a5   b2  c5   d5   e5
a6   b2  c6   d6   e6
a7   b2  c7   d7   e7
a8   b2  c8   d8   e8
a9   b3  c9   d9   e9
a10  b3  c10  d10  e10
a11  b3  c11  d11  e11
a12  b3  c12  d12  e12

$r_1 = \sigma_{B='b1'}(r)$ contains the tuples a1 to a4.
$r_2 = \sigma_{B='b2'}(r)$ contains the tuples a5 to a8.
$r_3 = \sigma_{B='b3'}(r)$ contains the tuples a9 to a12.
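The horizontal fragmentation above can be reproduced with plain Python lists. Each fragment is a selection, and the union of the fragments gives back r. This is a minimal sketch; the helper name select is ours and not part of any library.

```python
# Relation r(A, B, C, D, E) from the example: B is b1 for a1-a4,
# b2 for a5-a8 and b3 for a9-a12.
r = [(f"a{i}", f"b{(i - 1) // 4 + 1}", f"c{i}", f"d{i}", f"e{i}") for i in range(1, 13)]


def select(relation, predicate):
    """Relational selection (sigma): keep the tuples that satisfy predicate."""
    return [t for t in relation if predicate(t)]


B = 1  # position of attribute B in each tuple
r1 = select(r, lambda t: t[B] == "b1")
r2 = select(r, lambda t: t[B] == "b2")
r3 = select(r, lambda t: t[B] == "b3")

print([t[0] for t in r1])  # ['a1', 'a2', 'a3', 'a4']

# Reconstruction: the union of the horizontal fragments is the original relation.
assert set(r1) | set(r2) | set(r3) == set(r)
```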

Given: PROPERTY_FOR_RENT(pno, street, area, city, pcode, type, rooms, rent, ono)
Horizontal fragmentation of PROPERTY_FOR_RENT by property type:
P1: $\sigma_{type='House'}(\mathrm{PROPERTY\_FOR\_RENT})$
P2: $\sigma_{type='Flat'}(\mathrm{PROPERTY\_FOR\_RENT})$

Vertical Data Fragmentation
A vertical fragment of a relation consists of a subset of the attributes of the relation.
Vertical fragmentation groups together the attributes that are used by some applications.
It is defined using the projection operation ($\Pi$) of relational algebra.
[Figure: relation R split by columns into fragments r1, r2, r3.]

Vertical Fragmentation
Given R, a vertical fragment is defined as:
$r_1 = \Pi_{a_1, \ldots, a_n}(R)$
where $a_1, \ldots, a_n$ are attributes of relation R.

Example, using the same relation r(A, B, C, D, E):
$r_1 = \Pi_{A,B,C}(r)$ contains columns A, B, C of all twelve tuples: (a1, b1, c1), (a2, b1, c2), …, (a12, b3, c12).
$r_2 = \Pi_{A,D}(r)$ contains columns A, D: (a1, d1), (a2, d2), …, (a12, d12).
$r_3 = \Pi_{A,E}(r)$ contains columns A, E: (a1, e1), (a2, e2), …, (a12, e12).
The key A is repeated in every fragment.
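Vertical fragmentation works the same way with projection, and reconstruction uses a natural join on the repeated key A. This is a minimal, quadratic-time sketch for illustration; project and natural_join are hypothetical helper names.

```python
ATTRS = ("A", "B", "C", "D", "E")
# Relation r from the example, one dict per tuple.
r = [dict(zip(ATTRS, (f"a{i}", f"b{(i - 1) // 4 + 1}", f"c{i}", f"d{i}", f"e{i}")))
     for i in range(1, 13)]


def project(relation, attrs):
    """Relational projection (pi): keep attrs and drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result


def natural_join(left, right):
    """Natural join: match tuples on every attribute the two relations share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**x, **y} for x in left for y in right
            if all(x[a] == y[a] for a in common)]


r1 = project(r, ("A", "B", "C"))
r2 = project(r, ("A", "D"))
r3 = project(r, ("A", "E"))

# Reconstruction: joining the vertical fragments on the repeated key A gives back r.
rebuilt = natural_join(natural_join(r1, r2), r3)
assert rebuilt == r
print(r2[0])  # {'A': 'a1', 'D': 'd1'}
```

Without A in every fragment there would be nothing to join on, which is why the key is repeated.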

Vertical Fragmentation
Given: STAFF(Sno, Fname, Lname, Address, Telno, Position, Sex, DOB, Salary, NIN, Bno)
Vertical fragmentation of Staff:
S1: $\Pi_{Sno, Position, Sex, DOB, Salary, NIN}(\mathrm{Staff})$
S2: $\Pi_{Sno, Fname, Lname, Address, Telno, Bno}(\mathrm{Staff})$
The primary key Sno is repeated in both fragments.

Mixed Fragmentation
A mixed fragment of a relation consists of either:
a horizontal fragment that is subsequently vertically fragmented, or
a vertical fragment that is then horizontally fragmented.
It is defined using the selection and projection operations of relational algebra.

Given a relation R, a mixed fragment is defined as:
Vertical fragment that is then horizontally fragmented: $\sigma_p(\Pi_{a_1, \ldots, a_n}(R))$
Horizontal fragment that is then vertically fragmented: $\Pi_{a_1, \ldots, a_n}(\sigma_p(R))$

Data Fragmentation
Mixed fragmentation: vertical fragments, horizontally fragmented
Mixed fragmentation: horizontal fragments, vertically fragmented

Example: relation r is first horizontally fragmented into $r_1 = \sigma_{B='b1'}(r)$, $r_2 = \sigma_{B='b2'}(r)$ and $r_3 = \sigma_{B='b3'}(r)$. Fragment r3 (tuples a9 to a12) is then vertically fragmented:
$r_{3.1} = \Pi_{A,B,C}(r_3)$, containing (a9, b3, c9), …, (a12, b3, c12)
$r_{3.2} = \Pi_{A,D}(r_3)$, containing (a9, d9), …, (a12, d12)
$r_{3.3} = \Pi_{A,E}(r_3)$, containing (a9, e9), …, (a12, e12)
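Mixed fragmentation composes the two operations. This sketch applies the example: select first, then project r3, and rebuild r with joins followed by a union. The helper names are hypothetical. The join assumes A is the key shared by all vertical fragments, as in the example.

```python
ATTRS = ("A", "B", "C", "D", "E")
r = [dict(zip(ATTRS, (f"a{i}", f"b{(i - 1) // 4 + 1}", f"c{i}", f"d{i}", f"e{i}")))
     for i in range(1, 13)]


def select(rel, pred):
    return [t for t in rel if pred(t)]


def project(rel, attrs):
    # A is a key in this example, so projection creates no duplicates to remove.
    return [{a: t[a] for a in attrs} for t in rel]


def join_on_a(left, right):
    # Natural join; A is the only attribute shared by the vertical fragments.
    index = {t["A"]: t for t in right}
    return [{**t, **index[t["A"]]} for t in left if t["A"] in index]


# Step 1: horizontal fragmentation.
r1 = select(r, lambda t: t["B"] == "b1")
r2 = select(r, lambda t: t["B"] == "b2")
r3 = select(r, lambda t: t["B"] == "b3")

# Step 2: vertical fragmentation of r3 only.
r3_1 = project(r3, ("A", "B", "C"))
r3_2 = project(r3, ("A", "D"))
r3_3 = project(r3, ("A", "E"))

# Reconstruction: join the vertical pieces of r3, then take the union with r1 and r2.
# The horizontal fragments are disjoint, so list concatenation acts as the union here.
rebuilt = r1 + r2 + join_on_a(join_on_a(r3_1, r3_2), r3_3)
assert rebuilt == r
```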

Mixed Fragmentation
Given: STAFF(Sno, Fname, Lname, Address, Telno, Position, Sex, DOB, Salary, NIN, Bno)
Vertically fragment Staff into:
S1: $\Pi_{Sno, Position, Sex, DOB, Salary, NIN}(\mathrm{Staff})$
S2: $\Pi_{Sno, Fname, Lname, Address, Telno, Bno}(\mathrm{Staff})$

Then horizontally fragment S2 according to branch number:
S21: $\sigma_{Bno='B3'}(S2)$
S22: $\sigma_{Bno='B5'}(S2)$
S23: $\sigma_{Bno='B7'}(S2)$

Rules: Correctness of fragmentation
Rule 1: Completeness
Rule 2: Reconstruction
Rule 3: Disjointness

Rule 1: Completeness
If a relation instance R is decomposed into fragments R1, R2, …, Rn, each data item that can be found in R must appear in at least one fragment.
This is necessary to ensure that there is no loss of data during fragmentation.


Rule 2: Reconstruction
It must be possible to define a relational operation that will reconstruct the relation R from the fragments.
This rule ensures that functional dependencies are preserved.

Reconstruction operations:
Horizontal fragmentation: union, $r_1 \cup r_2 = r$
Vertical fragmentation: natural join, $r_1 \bowtie r_2 = r$ (e.g., $S1 \bowtie S2 = \mathrm{Staff}$)


Rule 3: Disjointness
If a data item di appears in fragment Ri, then it should not appear in any other fragment.
This rule ensures minimal data redundancy.
Vertical fragmentation is the exception to this rule: the primary key attributes must be repeated to allow reconstruction.


Horizontal example: r1 (tuples a1 to a4), r2 (a5 to a8) and r3 (a9 to a12), with $r = r_1 \cup r_2 \cup r_3$. No data item appears in more than one fragment.
Vertical example: no data item appears in more than one fragment except for the primary key.

Check Correctness: Horizontal Fragmentation
Given: PROPERTY_FOR_RENT(pno, street, area, city, pcode, type, rooms, rent, ono)
Horizontal fragmentation of PROPERTY_FOR_RENT by property type:
P1: $\sigma_{type='House'}(\mathrm{PROPERTY\_FOR\_RENT})$
P2: $\sigma_{type='Flat'}(\mathrm{PROPERTY\_FOR\_RENT})$

Check Correctness: Horizontal Fragmentation
P1: $\sigma_{type='House'}(\mathrm{PROPERTY\_FOR\_RENT})$
P2: $\sigma_{type='Flat'}(\mathrm{PROPERTY\_FOR\_RENT})$
Completeness: each tuple in the relation appears in either fragment P1 or P2 (this assumes every property's type is either 'House' or 'Flat').
Reconstruction: the PROPERTY_FOR_RENT relation can be reconstructed from the fragments using the union operation: $P1 \cup P2 = \mathrm{PROPERTY\_FOR\_RENT}$.
Disjointness: the fragments are disjoint; there can be no property that is both a house and a flat.
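The three rules can be checked mechanically on a sample instance. This is a minimal sketch: check_horizontal is a hypothetical helper, and the property rows are invented sample data reduced to three attributes. Checking one instance can reveal a violation, but it cannot prove the fragmentation correct for every possible instance.

```python
def check_horizontal(relation, fragments):
    """Check the three correctness rules for a horizontal fragmentation.

    Returns (complete, reconstructible, disjoint) for tuples given as hashable rows.
    """
    rows = set(relation)
    pieces = [set(f) for f in fragments]
    complete = all(any(t in p for p in pieces) for t in rows)
    reconstructible = set().union(*pieces) == rows      # union gives back R
    disjoint = all(not (pieces[i] & pieces[j])
                   for i in range(len(pieces)) for j in range(i + 1, len(pieces)))
    return complete, reconstructible, disjoint


# Hypothetical rows of PROPERTY_FOR_RENT, reduced to (pno, type, rent).
properties = [("PR01", "House", 650), ("PR02", "Flat", 400),
              ("PR03", "Flat", 350), ("PR04", "House", 600)]
p1 = [t for t in properties if t[1] == "House"]
p2 = [t for t in properties if t[1] == "Flat"]
print(check_horizontal(properties, [p1, p2]))  # (True, True, True)

# If some property had another type, P1 and P2 would no longer be complete.
with_other = properties + [("PR05", "Bungalow", 500)]
print(check_horizontal(with_other, [p1, p2]))  # (False, False, True)
```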

Check Correctness: Vertical Fragmentation
Given: STAFF(Sno, Fname, Lname, Address, Telno, Position, Sex, DOB, Salary, NIN, Bno)
Vertical fragmentation of Staff:
S1: $\Pi_{Sno, Position, Sex, DOB, Salary, NIN}(\mathrm{Staff})$
S2: $\Pi_{Sno, Fname, Lname, Address, Telno, Bno}(\mathrm{Staff})$

Check Correctness: Vertical Fragmentation
S1: $\Pi_{Sno, Position, Sex, DOB, Salary, NIN}(\mathrm{Staff})$
S2: $\Pi_{Sno, Fname, Lname, Address, Telno, Bno}(\mathrm{Staff})$
Completeness: each attribute in the Staff relation appears in either fragment S1 or S2.
Reconstruction: the Staff relation can be reconstructed from the fragments using the natural join operation: $S1 \bowtie S2 = \mathrm{Staff}$.
Disjointness: S1 and S2 are disjoint except for the necessary duplication of the primary key Sno.
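For vertical fragmentation, completeness and disjointness are properties of the attribute lists, and reconstruction can be tried on sample data. This is a minimal sketch: check_vertical is a hypothetical helper, and the single Staff tuple is invented placeholder data.

```python
STAFF_ATTRS = ("Sno", "Fname", "Lname", "Address", "Telno", "Position",
               "Sex", "DOB", "Salary", "NIN", "Bno")
S1_ATTRS = ("Sno", "Position", "Sex", "DOB", "Salary", "NIN")
S2_ATTRS = ("Sno", "Fname", "Lname", "Address", "Telno", "Bno")
KEY = {"Sno"}


def check_vertical(all_attrs, fragment_attrs, key):
    """Schema-level checks: (complete, disjoint except key, key in every fragment)."""
    sets = [set(f) for f in fragment_attrs]
    complete = set().union(*sets) == set(all_attrs)
    # Disjoint except for the key, which every fragment must carry for the join.
    disjoint = all((sets[i] & sets[j]) <= key
                   for i in range(len(sets)) for j in range(i + 1, len(sets)))
    keyed = all(key <= s for s in sets)
    return complete, disjoint, keyed


print(check_vertical(STAFF_ATTRS, [S1_ATTRS, S2_ATTRS], KEY))  # (True, True, True)

# Reconstruction on one hypothetical Staff tuple: project, then join on Sno.
staff = [dict(zip(STAFF_ATTRS, ("S001", "Ana", "Cruz", "addr-1", "tel-1", "Manager",
                                "F", "1980-01-01", 30000, "nin-1", "B3")))]
s1 = [{a: t[a] for a in S1_ATTRS} for t in staff]
s2 = [{a: t[a] for a in S2_ATTRS} for t in staff]
rebuilt = [{**x, **y} for x in s1 for y in s2 if x["Sno"] == y["Sno"]]
assert rebuilt == staff
```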


Check Correctness: Mixed Fragmentation
S1: $\Pi_{Sno, Position, Sex, DOB, Salary, NIN}(\mathrm{Staff})$
S2: $\Pi_{Sno, Fname, Lname, Address, Telno, Bno}(\mathrm{Staff})$
S21: $\sigma_{Bno='B3'}(S2)$
S22: $\sigma_{Bno='B5'}(S2)$
S23: $\sigma_{Bno='B7'}(S2)$
Completeness:
Each attribute in the Staff relation appears in either fragment S1 or S2.
Each tuple (or part of it) appears in fragment S1 and in one of the fragments S21, S22 or S23 (this assumes every staff member works at branch B3, B5 or B7).

Check for correctness: Mixed Fragmentation
Reconstruction: the Staff relation can be reconstructed from the fragments using the union and natural join operations: $S1 \bowtie (S21 \cup S22 \cup S23) = \mathrm{Staff}$.

Check for correctness: Mixed Fragmentation
Disjointness:
The fragments S21, S22 and S23 are disjoint: there can be no staff member who works in more than one branch.
S1 and S2 are disjoint except for the necessary duplication of the primary key Sno.

Advantages of Horizontal Fragmentation
Allows parallel processing on fragments of a relation.
Allows a relation to be split so that tuples are located where they are most frequently accessed.

Advantages of Vertical Fragmentation
Allows tuples to be split so that each part of the tuple is stored where it is most frequently accessed.

Basis of design: definition and allocation of fragments
Analyze the applications, concentrating on the most important ones.
The 80/20 rule may be used as a guideline: the most active 20% of user queries account for 80% of the total data access.
Quantitative information (used in allocation):
frequency with which an application is run;
site from which an application is run;
performance criteria for transactions and applications.

Basis of design: definition and allocation of fragments
Qualitative information (used in fragmentation):
transactions executed by the application;
type of access (read or write);
predicates of read operations.

Objectives of fragment definition and allocation
Locality of reference: data should be stored close to where it is used. If a fragment is used at several sites, it may be beneficial to store copies of the fragment at these sites.
Improved reliability and availability: made possible through replication; if one site fails, another copy is available at another site.
Acceptable performance: bad allocation may result in bottlenecks, or in underutilization of resources.

Objectives of fragment definition and allocation
Balanced storage capacities and costs: consider the availability and cost of storage at each site.
Minimal communication costs: consider the cost of remote requests. Retrieval costs are minimized when locality of reference is maximized or when each site has its own copy of the data; however, updating replicated data is costly.
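Locality of reference and communication cost can be illustrated with a deliberately naive allocation heuristic. It puts each fragment at the single site that accesses it most, then counts the remaining remote accesses. The functions and the access frequencies (which reuse the X Corp. example) are hypothetical. Real allocation also weighs storage, update cost and replication.

```python
def allocate(access_freq):
    """Place each fragment at the site that accesses it most.

    access_freq maps fragment -> {site: accesses per day}.
    Returns (allocation, remote_accesses), where remote_accesses is a crude
    proxy for communication cost.
    """
    allocation, remote = {}, 0
    for fragment, per_site in access_freq.items():
        best = max(per_site, key=per_site.get)
        allocation[fragment] = best
        remote += sum(n for site, n in per_site.items() if site != best)
    return allocation, remote


def remote_if_centralized(access_freq, site):
    """Remote accesses if every fragment were stored at one central site."""
    return sum(n for per_site in access_freq.values()
               for s, n in per_site.items() if s != site)


# Hypothetical daily access counts for the X Corp. employee fragments.
freq = {
    "London Emp": {"London": 500, "New York": 20, "Hong Kong": 5},
    "NY Emp": {"London": 10, "New York": 400, "Hong Kong": 5},
    "HK Emp": {"London": 10, "New York": 10, "Hong Kong": 300},
}
allocation, remote = allocate(freq)
print(allocation)                                     # each fragment at its own office
print(remote, remote_if_centralized(freq, "London"))  # 60 740
```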

Strategies for Data allocation
Centralized
Partitioned (or fragmented)
Complete replication
Selective replication

Centralized
Single database, single DBMS; users are distributed across the network.
Reliability and availability are low: failure of the central site results in loss of the entire database.
Communication costs are high.
[Figure: workstations 1 to 3 on a LAN accessing a centralized server that runs the DBMS and holds the database.]

Partitioned
The database is partitioned into disjoint fragments, and each fragment is assigned to one site.
Storage costs are low, since there is no replication.
Availability and reliability are low, but higher than in the centralized approach.
Locality of reference is high if data access frequently occurs at the site where the data is located.
Performance should be good.
Communication costs are low if the distribution is designed properly.
[Figure: sites 1 to 4 connected by a computer network, each holding a fragment of the database.]

Complete Replication
A complete copy of the database is maintained at each site.
Locality of reference, reliability, performance and availability are high.
Storage costs are high.
Communication costs for updates are high.
[Figure: sites 1 to 4 connected by a computer network, each holding a full copy of the database.]

Selective Replication
Selective replication is a combination of replication, partitioning and centralization.
[Figure: sites 1 to 4 connected by a computer network; some sites hold databases.]

Transparencies in a DDBMS
Transparency: hide implementation details from the user.
Types of DDBMS transparencies:
Distribution transparency
Transaction transparency
Performance transparency
DBMS transparency


Distribution transparency
Allows the user to perceive the database as a single logical entity.
Types of distribution transparency:
Fragmentation transparency: the user does not need to know that the data is fragmented.
Location transparency: the user does not need to know the location of data items.
Replication transparency: the user is unaware of the replication of fragments.
Local mapping transparency: the user needs to know about the fragmented data and the location of the fragments.
Naming transparency: the DDBMS must ensure that no two sites create a database object with the same name.
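Fragmentation and location transparency are usually achieved through a global catalogue that maps the names users see onto fragments and sites. This is a minimal sketch that assumes a hypothetical catalogue and hypothetical local stores for the bank example. The fragment names, account Aki-200 and all balances are invented, and fetch_fragment stands in for a remote request to the DBMS at a site.

```python
# Hypothetical global catalogue: global relation -> [(fragment, site)].
CATALOG = {
    "Account": [("Account_Makati", "Makati"),
                ("Account_Manila", "Manila"),
                ("Account_QC", "Q.C.")],
}

# Hypothetical local databases: site -> fragment -> rows (actno, br-name, balance).
LOCAL_DB = {
    "Makati": {"Account_Makati": [("Aki-01", "Makati", 1000)]},
    "Manila": {"Account_Manila": [("Aki-100", "Manila", 500)]},
    "Q.C.": {"Account_QC": [("Aki-200", "Q.C.", 750)]},
}


def fetch_fragment(site, fragment):
    """Stand-in for a remote request to the DBMS at `site`."""
    return LOCAL_DB[site][fragment]


def query(relation, predicate=lambda t: True):
    """The user names only the global relation; the catalogue supplies the
    fragments and their sites (fragmentation and location transparency)."""
    result = []
    for fragment, site in CATALOG[relation]:
        result.extend(t for t in fetch_fragment(site, fragment) if predicate(t))
    return result


large = query("Account", lambda t: t[2] >= 750)
print(large)  # [('Aki-01', 'Makati', 1000), ('Aki-200', 'Q.C.', 750)]
```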

Transaction transparency
Ensures that all distributed transactions maintain the distributed database's integrity and consistency.
What is a distributed transaction? A transaction that accesses data stored at more than one location.
Each transaction is divided into a number of subtransactions, one for each site that has to be accessed.

Example. Fragmented schema: S1, S2, S21, S22, S23, allocated as follows:
S1: $\Pi_{Sno, Position, Sex, DOB, Salary, NIN}(\mathrm{Staff})$ at site 5
S2: $\Pi_{Sno, Fname, Lname, Address, Telno, Bno}(\mathrm{Staff})$
S21: $\sigma_{Bno='B3'}(S2)$ at site 3
S22: $\sigma_{Bno='B5'}(S2)$ at site 5
S23: $\sigma_{Bno='B7'}(S2)$ at site 7
A transaction T that prints out the names of all staff is divided into the subtransactions Ts3 (at site 3), Ts5 (at site 5) and Ts7 (at site 7).

Distributed transaction (the subtransactions can execute concurrently; there is inherent parallelism):

Time  Ts3                  Ts5                  Ts7
t1    begin                begin                begin
t2    read(fname, lname)   read(fname, lname)   read(fname, lname)
t3    print(fname, lname)  print(fname, lname)  print(fname, lname)
t4    end                  end                  end
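The concurrent execution of Ts3, Ts5 and Ts7 can be mimicked with a thread pool. Each subtransaction reads its local fragment of S2 independently, and T combines the results. This is a minimal sketch: the staff names are hypothetical and threads stand in for remote sites. The sketch collects the names instead of printing inside each thread, so the output order is deterministic.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (fname, lname) rows of each horizontal fragment of S2.
FRAGMENTS = {
    "site3": [("Ana", "Cruz")],                          # S21: Bno = 'B3'
    "site5": [("Ben", "Reyes"), ("Carla", "Santos")],    # S22: Bno = 'B5'
    "site7": [("Dan", "Lim")],                           # S23: Bno = 'B7'
}


def subtransaction(site):
    """Ts3, Ts5 or Ts7: begin, read(fname, lname) at one site, end."""
    return [f"{fname} {lname}" for fname, lname in FRAGMENTS[site]]


def global_transaction():
    """T: the names of all staff, gathered from concurrently running subtransactions."""
    with ThreadPoolExecutor(max_workers=len(FRAGMENTS)) as pool:
        # pool.map returns results in the order of its inputs (site3, site5, site7).
        parts = list(pool.map(subtransaction, FRAGMENTS))
    return [name for part in parts for name in part]


names = global_transaction()
for name in names:
    print(name)
```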

The DBMS must ensure:
the indivisibility of each subtransaction;
the synchronization of subtransactions with other local transactions that are executing concurrently at a site;
the synchronization of subtransactions with global transactions running simultaneously at the same or different sites.
Note: transaction transparency in a DDBMS is complicated by the fragmentation, allocation and replication schemas.

Aspects of transaction transparency:
Concurrency transparency
Failure transparency

Transaction transparency: Concurrency transparency
All concurrent transactions (distributed and non-distributed) execute independently, and their results are logically consistent with the results that would be obtained if the transactions were executed one at a time, in some arbitrary serial order.
Note: there is added complexity because the DDBMS must ensure:
that local and global transactions do not interfere with each other;
the consistency of all subtransactions of the global transaction.

Transaction transparency: Concurrency transparency
Strategies for replicated data (replication makes concurrency control more complex):
Propagate the changes to all copies: if one site holding a copy is not reachable, the transaction is delayed until the site is reachable.
Limit the update propagation to currently available sites; the remaining sites are updated when they become available.
Allow the updates to the copies to happen asynchronously, some time after the original update.
NOTE: there may be a delay in regaining consistency, which may range from a few seconds to several hours.
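The three update strategies differ in when the other copies see a change. This is a minimal single-process sketch that assumes a hypothetical ReplicaSet class with one method per strategy. Site reachability is simulated with a set, and no real messaging or failure detection is modelled.

```python
class ReplicaSet:
    """Copies of the same data (key -> value) at several sites."""

    def __init__(self, sites):
        self.copies = {s: {} for s in sites}
        self.reachable = set(sites)
        self.pending = {s: [] for s in sites}  # queued updates per site

    def update_all_or_wait(self, key, value):
        """Strategy 1: update every copy; if any site is unreachable, change
        nothing and return False so the caller can delay and retry."""
        if self.reachable != set(self.copies):
            return False
        for copy in self.copies.values():
            copy[key] = value
        return True

    def update_available(self, key, value):
        """Strategy 2: update the reachable copies now; queue the rest."""
        for site, copy in self.copies.items():
            if site in self.reachable:
                copy[key] = value
            else:
                self.pending[site].append((key, value))

    def update_async(self, origin, key, value):
        """Strategy 3: update the origin copy now; all other copies later."""
        self.copies[origin][key] = value
        for site in self.copies:
            if site != origin:
                self.pending[site].append((key, value))

    def propagate(self, site):
        """Apply queued updates to a reachable site, in the order issued."""
        if site in self.reachable:
            for key, value in self.pending[site]:
                self.copies[site][key] = value
            self.pending[site].clear()


rs = ReplicaSet(["Makati", "Manila", "Q.C."])
rs.reachable.discard("Q.C.")                   # the Q.C. site is down
delayed = not rs.update_all_or_wait("Aki-01", 1000)
rs.update_available("Aki-01", 1000)
stale = rs.copies["Q.C."].get("Aki-01")        # None: Q.C. has not seen it yet
rs.reachable.add("Q.C.")                       # Q.C. comes back
rs.propagate("Q.C.")
print(delayed, stale, rs.copies["Q.C."]["Aki-01"])  # True None 1000
```

The gap between `stale` and the final value is the temporary inconsistency that the NOTE above warns about.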

Transaction transparency: Failure transparency
The DDBMS must provide a recovery mechanism that:
ensures the subtransactions of a global transaction are atomic, that is, they all commit or all abort;
before recording a final commit for the global transaction, ensures that all subtransactions completed successfully.
In addition, it must cater for:
loss of a message;
failure of a communication link;
failure of a site;
network partitioning.

Transaction transparency: Failure transparency
Example: a global transaction has to update data at two sites, S1 and S2.
The subtransaction at S1 completes successfully and commits.
The subtransaction at S2 is unable to commit and rolls back its changes to ensure local consistency.
Problem: the distributed database is now in an inconsistent state. We are unable to uncommit the data at site S1 because of the durability of the committed subtransaction at S1.
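This problem is what atomic commitment protocols address. The slides do not name a protocol; the standard technique is two-phase commit (2PC), in which no site commits until every site has voted that it can. This is a minimal sketch: the Participant class and its can_commit flag are hypothetical stand-ins for each site's local work. Timeouts, logging and coordinator failure are not modelled.

```python
class Participant:
    """One site's subtransaction in a global transaction."""

    def __init__(self, site, can_commit):
        self.site = site
        self.can_commit = can_commit  # hypothetical: did the local work succeed?
        self.state = "active"

    def prepare(self):
        # Phase 1 vote: YES means "I can commit and will wait for the decision".
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect every vote before anyone commits.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: send the same global decision to every site.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return "commit" if decision else "abort"


s1, s2 = Participant("S1", True), Participant("S2", False)
outcome = two_phase_commit([s1, s2])
print(outcome, s1.state, s2.state)  # abort aborted aborted
```

Because S1 only reaches the prepared state in phase 1, it can still abort when S2 votes no, which avoids the inconsistency in the example.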

Performance transparency
Requires the DDBMS to perform as if it were a centralized DBMS.
Requires the DDBMS to determine the most cost-effective strategy to execute a request.

DBMS transparency
It hides the knowledge that the local DBMSs may be different.
It is applicable to heterogeneous DDBMSs.
It is one of the most difficult transparencies to provide.

Date’s Twelve Rules for a DDBMS
(0) Fundamental principle: to the user, a distributed system should look exactly like a non-distributed system.
(1) Local autonomy: local data is locally owned and managed; local operations remain purely local; all operations at a given site are controlled by that site.
(2) No reliance on a central site.
(3) Continuous operation: no shutdown is required when adding or removing a site from the system, or for the dynamic creation and deletion of fragments at one or more sites.

Date’s Twelve Rules for a DDBMS
(4) Location independence
(5) Fragmentation independence
(6) Replication independence
(7) Distributed query processing
(8) Distributed transaction processing
(9) Hardware independence
(10) Operating system independence
(11) Network independence
(12) Database independence

Summary: Characteristics of a DDBMS
A collection of logically related data.
The data is split into a number of fragments.
Fragments may be replicated.
Fragments/replicas are allocated to sites.
The sites are linked by a communications network.
The data at each site is under the control of a DBMS.
The DBMS at each site can handle local applications autonomously.
Each DBMS participates in at least one global application.

The End
Thank you!
