Distributed Databases
Outline Evolution of data processing What is a DDBMS? Motivation behind DDBMS Types of Distributed Databases Distributed Data Storage Replication Fragmentation Transparencies in a DDBMS
Outline Evolution of data processing What is a DDBMS? Motivation behind DDBMS Types of Distributed Databases Distributed Data Storage Replication Fragmentation Transparencies in a DDBMS
Evolution of data processing File-based system Each application defined and maintained its data Database technology (DBMS) Data is defined and administered centrally. A single logical database located at one site managed/controlled by a single DBMS Distributed database technology (DDBMS) Decentralization Allow users to access not only data at their own site but also data stored at remote sites.
File-based system prepared by:RdDB a collection of application programs that perform services for the end users such as the production of reports. EAF enrollment File Registrar Student Courses Faculty Data entry of courses Data entry of enrollment Generation of EAF
EAF enrollment File Registrar Student Courses Faculty Data entry of courses Data entry of enrollment File-based system prepared by:RdDB Each program in the system defines and manages its own data. [CBS98] struct course { char code[5]; char desc[20]; int units [3]; }
File-based systems File Registrar File Accounting File Department Each user (with the assistance of DP staff) defines and implements (including storage and control) the files needed for a specific application. [CBS98, EN94] File-based systems prepared by:RdDB EAF enrollment File Registrar Student Courses Faculty OR payment of fees File Accounting Student Fees course cards Processing of grades File Department Student Courses Grades Faculty
Student System (File-based) What can be observed? Student System (File-based) Data redundancy prepared by:RdDB EAF enrollment File Registrar Student Courses Faculty OR payment of fees File Accounting Student Fees course cards Processing of grades File Department Student Courses Grades Faculty
Student System (File-based) What can be observed? Separation and isolation of data prepared by:RdDB EAF enrollment File Registrar Student Courses Faculty OR payment of fees File Accounting Student Fees course cards Processing of grades File Department Student Courses Grades Faculty
Student System (File-based) What can be observed? Program-data dependence Student System (File-based) prepared by:RdDB struct person { char first[20]; char middle[3]; char last[30]; } employees, managers; EAF enrollment File Registrar Student Courses Faculty OR payment of fees File Accounting Student Fees course cards Processing of grades File Department Student Courses Grades Faculty
Student System (File-based) What can be observed? Incompatibility of files Student System (File-based) prepared by:RdDB COBOL EAF enrollment File Registrar Student Courses Faculty C OR payment of fees File Accounting Student Fees course cards Processing of grades File Department Student Courses Grades Faculty
Student System (File-based) What can be observed? Fixed queries; proliferation of application programs Student System (File-based) prepared by:RdDB EAF enrollment File Registrar Student Courses Faculty OR payment of fees File Accounting Student Fees course cards Processing of grades File Department Student Courses Grades Faculty
Limitations of File-Based Systems prepared by:RdDB Separation and isolation of data Duplication of data Program-data dependence Incompatibility of files (e.g C vs. COBOL) Fixed queries / proliferation of application programs
Factors that limit File-Based System prepared by:RdDB The definition of data is embedded in the application programs, rather than being stored separately and independently. [CBS98] There is no control over the access and manipulation of data beyond that imposed by the application programs. [CBS98]
Database Approach
What is a database? prepared by:RdDB It is a shared collection of logically coherent data with some inherent meaning; It is designed , built and populated with data for specific purpose such as meeting the information needs of an organization; Ex: student database, employee database, library database, air flights database, hospital database, etc.
What is a DBMS? Database Management System It is a software system that enables users to : define, create and maintain the database DDL DML provide controlled access to this database Security Integrity Concurrency control Recovery control User-accessible catalogue (description of data)
A Simple Database System Environment prepared by:RdDB DATABASE SYSTEM DBMS Software Software to process programs/ queries access stored data Data Definition Application Programs/ Queries Database
University System: Database Environment prepared by:RdDB Registrar DATABASE SYSTEM MySQL Software to process programs/ queries access stored data Data Definition Accounting Enrollment Payment Grade processing University database Department
Outline Evolution of data processing What is a DDBMS? Motivation behind DDBMS Types of Distributed Databases Distributed Data Storage Replication Fragmentation Transparencies in a DDBMS
What is a Distributed database? A distributed database is a logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
What is a Distributed DBMS? Distributed DBMS is the software system that permits the management of the distributed database and makes the distribution transparent to users.
Topology of DDBMS Site 1 DB Site 4 Site 2 Computer Network Site 3 DB
Topology of Distributed Processing Site 1 Site 4 Site 2 Computer Network A centralized database that can be accessed over a computer network Site 3 DB
Outline Evolution of data processing What is a DDBMS? Motivation behind DDBMS Types of Distributed Databases Distributed Data Storage Replication Fragmentation Transparencies in a DDBMS
Motivation behind DDBMS To integrate the operational data and provide controlled access to the data. This may imply centralization but it is not the intention To adopt a decentralize approach to data which mirrors the organizational structure of many companies. Each unit maintains its own data. Data is stored proximate to the location which would improve: Shareability of the data Efficiency of data access
Motivation behind DDBMS To help resolve the islands of information problem: Geographical separation Incompatible computer architectures Incompatible communication protocols
(example from Stanford University) Why Distribute?? Example: X Corp. has offices in London, New York, and Hong Kong. Employee data: EMP(ENUM, NAME, TITLE, SALARY, …) Where should the employee data table reside? (example from Stanford University)
X Corp. Data Access Pattern Mostly, employee data is managed at the office where the employee works E.g., payroll, benefits, hire and fire Periodically, X Corp needs consolidated access to employee data E.g., X Corp. changes benefit plans and that affects all employees. E.g., Annual bonus depends on global net profit. (example from Stanford University)
(example from Stanford University) London Payroll app New York Payroll app EMP London New York Internet Hong Kong Payroll app NY and HK payroll apps run very slowly! Hong Kong (example from Stanford University)
(example from Stanford University) London Payroll app New York Payroll app London Emp NY Emp London New York Internet Hong Kong Payroll app Much better!! Hong Kong HK Emp (example from Stanford University)
Fundamental principle of DDBMS The DDBMS is expected to make the distribution transparent (invisible) to the user. The objective of transparency: To make the distributed system appear like a centralized system.
Outline Evolution of data processing What is a DDBMS? Motivation behind DDBMS What is a distributed database system? Types of Distributed Databases Distributed Data Storage Replication Fragmentation Transparencies in a DDBMS
Distributed Database System
Data is spread over multiple machines (also referred to as sites or nodes). The computers may vary in size and function A Distributed System
A Distributed System Network interconnects the machines
A Distributed System Database is stored on several sites. Data is shared by users on multiple machines A Distributed System
Account(actno, br-name, balance) Add 100 to account Aki-01 Branch(br-name, br-city,assets) Makati Main network Local transaction: If transaction is initiated at Makati branch Manila Q.C. Account(actno, br-name, balance) Account(actno, br-name, balance)
Account(actno, br-name, balance) Branch(br-name, br-city,assets) Makati Main network Global: if transaction is initiated elsewhere Global Transaction Example: transfer 100 pesos from account Aki-01 (Makati) to account Aki-100 (Manila) Manila Q.C. Account(actno, br-name, balance) Account(actno, br-name, balance)
Local and Global Transactions A local transaction accesses data in the single site at which the transaction was initiated. A global transaction either: accesses data in a site different from the one at which the transaction was initiated or (e.g. funds transfer) accesses data in several different sites. (e.g. summarization of deposits in all branches)
Distributed Database System Makati Main network A distributed database system consists of loosely coupled sites that share no physical component Database systems that run on each site are independent of each other Transactions may access data at one or more sites Manila Q.C.
Why Distributed Database Systems? Sharing data – users at one site are able to access the data residing at some other sites. Autonomy – each site is able to retain a degree of control over data stored locally. Higher system availability through redundancy — data can be replicated at remote sites, and system can function even if a site fails.
Trade-offs in Distributed Systems Disadvantage: added complexity required to ensure proper coordination among sites. Software development cost. Greater potential for bugs. Increased processing overhead.
Outline Evolution of data processing What is a DDBMS? Motivation behind DDBMS Types of Distributed Databases Distributed Data Storage Replication Fragmentation Transparencies in a DDBMS
Homogeneous Distributed Database oracle oracle All sites have identical database management system software Sites are aware of each other and agree to cooperate in processing user requests. Each site surrenders part of its autonomy in terms of right to change schemas or software Appears to user as a single system Makati Manila Q.C. Main network oracle oracle
Heterogeneous Distributed Database SQl server oracle Different sites may use different schemas and software (DBMS) Difference in schema is a major problem for query processing Difference in software is a major problem for transaction processing Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing Makati Manila Q.C. Main network Sybase DB2
Outline Evolution of data processing What is a DDBMS? Motivation behind DDBMS Types of Distributed Databases Distributed Data Storage Replication Fragmentation Transparencies in a DDBMS
Approaches: Distributed Data Storage Data Replication System maintains multiple copies of the relation, stored in different sites, for faster retrieval and fault tolerance. Data Fragmentation Relation is partitioned into several fragments stored in distinct sites Note: Replication and fragmentation can be combined Relation is partitioned into several fragments: system maintains several identical replicas of each fragment. Allocation Each fragment is stored at the site with optimal distribution
Account(actno, br-name, balance) Data Replication Account(actno, br-name, balance) Main Makati network A relation or fragment of a relation is replicated if it is stored redundantly in two or more sites. Manila Account(actno, br-name, balance) Q.C. Account(actno, br-name, balance)
Data Replication Account(actno, br-name, balance) Account(actno, br-name, balance) Makati Main network Full replication of a relation is the case where the relation is stored at all sites. Manila Q.C. Account(actno, br-name, balance) Account(actno, br-name, balance)
Data Replication Bank database Makati Main network Fully redundant databases are those in which every site contains a copy of the entire database. Manila Q.C.
Advantages of Data Replication Availability: failure of site containing relation r does not result in unavailability of r if replicas exist. Parallelism: queries on r may be processed by several nodes in parallel. Reduced data transfer: relation r is available locally at each site containing a replica of r.
Disadvantages Data Replication Increased cost of updates: each replica of relation r must be updated. Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
Data Fragmentation A relation may be divided into a number of sub-relations called fragments, which are then distributed. Division of relation r into fragments r1, r2, …, rn which contain sufficient information to reconstruct relation r.
relation r r1 r2 … rn
} } } Relation r r1 r2 r3 A B C D E a1 b1 c1 d1 e1 a2 c2 d2 e2 a3 c3
Why fragment? : Advantages Usage: Applications work with views rather than entire relations Efficiency: Data is stored to where it is most frequently used Parallelism: A transaction can be divided into subqueries that operate on fragments Security: Data not required by local applications is not stored and consequently not available to unauthorized users
Disadvantages of fragmentation Performance: Performance of applications that require data form several fragments located at different sites may be slower Integrity :May be more difficult to implement
Types of Data Fragmentation Horizontal fragmentation Vertical fragmentation Mixed fragmentation
Horizontal fragmentation A horizontal fragment of a relation consists of a subset of the tuples of a relation. It is produced by specifying a predicate that performs a restriction on the tuples in the relation It is defined using the selection operation of relational algebra r1 R r2 r3
Horizontal fragmentation Given a relation R, a horizontal fragment is defined as: r1 = p (R ) Where p is the predicate based on one or more attributes of the relation
r1 = B=‘b1’ (r ) r2 = B=‘b2’ (r ) r3 = B=‘b3’ (r ) Relation r A B C D E a1 b1 c1 d1 e1 a2 c2 d2 e2 a3 c3 d3 e3 a4 c4 d4 e4 a5 b2 c5 d5 e5 a6 c6 d6 e6 a7 c7 d7 e7 a8 c8 d8 e8 a9 b3 c9 d9 e9 a10 c10 d10 e10 a11 c11 d11 e11 a12 c12 d12 e12 r1 = B=‘b1’ (r ) r2 = B=‘b2’ (r ) r3 = B=‘b3’ (r )
r1 = B=‘b1’ (r ) Relation r A B C D E a1 b1 c1 d1 e1 a2 c2 d2 e2 a3
r2 = B=‘b2’ (r ) Relation r A B C D E a5 b2 c5 d5 e5 a6 c6 d6 e6 a7
r3 = B=‘b3’ (r ) Relation r A B C D E a9 b3 c9 d9 e9 a10 c10 d10 e10
Horizontal fragmentation Given: PROPERTY_FOR_RENT(pno, street, area, city, pcode, type, rooms, rent, ono) Horizontal fragmentation by property type of PROPERTY_FOR_RENT P1: type=‘House’ (PROPERTY_FOR_RENT ) P2: type=‘Flat’ (PROPERTY_FOR_RENT )
Vertical Data Fragmentation A vertical fragment of a relation consists of a subset of the attributes of a relation Vertical fragmentation groups together attributes that are used by some applications It is defined using the projection operation of relational algebra. R r1 r2 r3
Vertical Fragmentation Given R, a vertical fragment is defined as: r1 = a1, …., an (R ) Where a1, ….an are attributes of relation R
r1 = A, B,C (r) r2 = A, D (r) r3 = A, E (r) Relation r A B C D E a1
r1 = A, B,C (r) Relation r A B C a1 b1 c1 a2 c2 a3 c3 a4 c4 a5 b2 c5
r2 = A, D (r) Relation r A D a1 d1 a2 d2 a3 d3 a4 d4 a5 d5 a6 d6 a7
r3 = A, E (r) Relation r A E a1 e1 a2 e2 a3 e3 a4 e4 a5 e5 a6 e6 a7
Vertical Fragmentation Given: STAFF(Sno, Fname, Lname, Address, Telno, Position, Sex, DOB, Salary, NIN,Bno) Vertical Fragmentation of staff S1: Sno, position,sex, dob, salary,nin (Staff) S2: Sno, fname, lname,address,telno,bno (Staff) repeated
a horizontal fragment that is subsequently vertically fragmented or Mixed Fragmentation A mixed fragment of a relation consists of either: a horizontal fragment that is subsequently vertically fragmented or vertical fragment that is then horizontally fragmented It is defined using the selection and projection operations of relational algebra
Mixed Fragmentation Given a relation R, a mixed fragment is defined as: Vertical fragment that is then horizontally fragmented p(a1, …., an (R )) Horizontal fragment that is then vertically fragmented a1, …., an (p(R ))
Data Fragmentation Mixed fragmentation: Mixed fragmentation: Vertical fragments, horizontally fragmented Mixed fragmentation: Horizontal fragments, vertically fragmented
r1 = B=‘b1’ (r ) r2 = B=‘b2’ (r ) r3 = B=‘b3’ (r ) Relation r r1 = B=‘b1’ (r ) A B C D E a1 b1 c1 d1 e1 a2 c2 d2 e2 a3 c3 d3 e3 a4 c4 d4 e4 a5 b2 c5 d5 e5 a6 c6 d6 e6 a7 c7 d7 e7 a8 c8 d8 e8 a9 b3 c9 d9 e9 a10 c10 d10 e10 a11 c11 d11 e11 a12 c12 d12 e12 r2 = B=‘b2’ (r ) r3 = B=‘b3’ (r ) r3.1 = A, B,C (r) r3.2 = A, D (r) r3.3 = A, E (r)
r1 = B=‘b1’ (r ) r2 = B=‘b2’ (r ) r3 = B=‘b3’ (r ) A B C D E a1 b1
r3.2 = A, D (r3) r3.3 = A, E (r3) r3.1 = A, B,C (r3) A B C a9 b3 c9
Mixed Fragmentation Given: STAFF(Sno, Fname, Lname, Address, Telno, Position, Sex, DOB, Salary, NIN,Bno) Vertically fragment Staff into: S1: Sno, position,sex, dob, salary,nin (Staff) S2: Sno, fname, lname,address,telno,bno (Staff)
Mixed Fragmentation Horizontally fragment S2 according to branch number. S2: Sno, fname, lname,address,telno,bno (Staff) S21: bno=‘B3’ (S2) S22: bno=‘B5’ (S2) S22: bno=‘B7’ (S2)
Rules: Correctness of fragmentation Rule1: Completeness Rule2: Reconstruction Rule3: Disjointness
Rules: Correctness of fragmentation Rule1: Completeness If a relation instance R is decomposed into fragments R1, R2, … Rn , each data item that can be found in R must appear in at least one fragment. This is necessary to ensure that there is no loss of data during fragmentation
Rule1: Completeness r1 R r2 If a relation instance R is decomposed into fragment R1, R2, … Rn , each data item that can be found in R must appear in at least one fragment.
Relation r(A,B,C,D,E) A B C D E a1 b1 c1 d1 e1 a2 c2 d2 e2 a3 c3 d3 e3
Rules: Correctness of fragmentation Rule 2: Reconstruction It must be possible to define a relational operation that will reconstruct the relation R from the fragments. This rule ensures that functional dependencies are preserved.
Rule2: Reconstruction r1 R Relational operation r2 It must be possible to define a relational operation that will reconstruct the relation R from the fragments. Horizontal: Union r1 r2 = r Vertical: Natural Join operation: S1 S2
R= r1 r2 r2 A B C D E a1 b1 c1 d1 e1 a2 c2 d2 e2 a3 c3 d3 e3 a4 c4
R= r1 r2 r2 A B C D E a1 b1 c1 d1 e1 a2 c2 d2 e2 a3 c3 d3 e3 a4 c4 d4
Rules: Correctness of fragmentation Rule 3: Disjointness If a data item di appears in fragment Ri, then it should not appear in any other fragment. This rule ensures minimal data redundancy Vertical fragmentation is the exception to this rule, where the primary key attributes must be repeated to allow reconstruction.
Rule3: Disjointness r1 R r2 If a data item di appears in fragment Ri, then it should not appear in any other fragment, except for the primary key in the case of vertical fragmentation
No data item appears in more than one fragment B C D E a1 b1 c1 d1 e1 a2 c2 d2 e2 a3 c3 d3 e3 a4 c4 d4 e4 A B C D E a1 b1 c1 d1 e1 a2 c2 d2 e2 a3 c3 d3 e3 a4 c4 d4 e4 a5 b2 c5 d5 e5 a6 c6 d6 e6 a7 c7 d7 e7 a8 c8 d8 e8 a9 b3 c9 d9 e9 a10 c10 d10 e10 a11 c11 d11 e11 a12 c12 d12 e12 R= r1 r2 r2 A B C D E a5 b2 c5 d5 e5 a6 c6 d6 e6 a7 c7 d7 e7 a8 c8 d8 e8 No data item appears in more than one fragment A B C D E a9 b3 c9 d9 e9 a10 c10 d10 e10 a11 c11 d11 e11 a12 c12 d12 e12
R= r1 r2 r2 A B C D E a1 b1 c1 d1 e1 a2 c2 d2 e2 a3 c3 d3 e3 a4 c4 d4 No data item appears in more than one fragment except for the primary key
Check Correctness: Horizontal Fragmentation Given: PROPERTY_FOR_RENT(pno, street, area, city, pcode, type, rooms, rent, ono) Horizontal fragmentation by property type of PROPERTY_FOR_RENT P1: type=‘House’ (PROPERTY_FOR_RENT ) P2: type=‘Flat’ (PROPERTY_FOR_RENT )
Check Correctness: Horizontal Fragmentation P1: type=‘House’ (PROPERTY_FOR_RENT ) P2: type=‘Flat’ (PROPERTY_FOR_RENT ) Completeness: Each tuple in the relation appears in either fragment P1 or P2. Reconstruction: The Property_For_Rent relation can be reconstructed from the fragments using Union operation: S1 S2 = Property_For_Rent Disjointness: The fragments are disjoint; There can be no property type that is both house and flat
Check Correctness: Vertical Fragmentation Given: STAFF(Sno, Fname, Lname, Address, Telno, Position, Sex, DOB, Salary, NIN,Bno) Vertical Fragmentation of staff S1: Sno, position,sex, dob, salary,nin (Staff) S2: Sno, fname, lname,address,telno,bno (Staff)
Check Correctness: Vertical Fragmentation S1: Sno, position,sex, dob, salary,nin (Staff) S2: Sno, fname, lname,address,telno,bno (Staff) Completeness: Each attribute in Staff relation appears in either fragment S1 or S2. Reconstruction: The Staff relation can be reconstructed from the fragments using Natural Join operation: S1 S2 Disjointness: S1 and S2 are disjoint except for the necessary duplication of the primary key.
Mixed Fragmentation Given: STAFF(Sno, Fname, Lname, Address, Telno, Position, Sex, DOB, Salary, NIN,Bno) Vertically fragment Staff into: S1: Sno, position,sex, dob, salary,nin (Staff) S2: Sno, fname, lname,address,telno,bno (Staff)
Mixed Fragmentation Horizontally fragment S2 according to branch number. S2: Sno, fname, lname,address,telno,bno (Staff) S21: bno=‘B3’ (S2) S22: bno=‘B5’ (S2) S22: bno=‘B7’ (S2)
Check Correctness: Mixed Fragmentation S1: Sno, position,sex, dob, salary,nin (Staff) S2: Sno, fname, lname,address,telno,bno (Staff) S21: bno=‘B3’ (S2) S22: bno=‘B5’ (S2) S22: bno=‘B7’ (S2) Completeness Each attribute in Staff relation appears either in fragment S1 or S2 Each tuple (part) appears in fragment S1 and either fragment S21, S22, or S23.
Check for correctness: Mixed Fragmentation S1: Sno, position,sex, dob, salary,nin (Staff) S2: Sno, fname, lname,address,telno,bno (Staff) S21: bno=‘B3’ (S2) S22: bno=‘B5’ (S2) S22: bno=‘B7’ (S2) Reconstruction The Staff relation can be reconstructed from the fragments using the Union and Natural Join operations: S1 ( S21 S22 S23 ) = Staff
Check for correctness: Mixed Fragmentation S1: Sno, position,sex, dob, salary,nin (Staff) S2: Sno, fname, lname,address,telno,bno (Staff) S21: bno=‘B3’ (S2) S22: bno=‘B5’ (S2) S22: bno=‘B7’ (S2) Disjointness The fragments are disjoint There can be no staff member that works in more than one branch S1 and S2 are disjoint except for the necessary duplication of the primary key.
Advantages of Horizontal Fragmentation allows parallel processing on fragments of a relation allows a relation to be split so that tuples are located where they are most frequently accessed
Advantages of Vertical Fragmentation allows tuples to be split so that each part of the tuple is stored where it is most frequently accessed
Basis of design: definition and allocation of fragments Analyze applications Concentrate on the most important ones 80/20 rule may be used as a guideline most active 20% of user queries account for 80% of the total data access Quantitative information (used in allocation) Frequency with which an application is run Site from which an application is run Performance criteria for transactions and applications
Basis of design: definition and allocation of fragments Qualitative information (used in fragmentation) Transactions executed by the application Type of access (read or write) Predicates of read operations
Objectives of fragment definition and allocation Locality of reference Data should be stored close to where it is used If a fragment is used at several sites, it may be beneficial to store copies of the fragments at these sites Improved reliability and availability Made possible through replication If one site fails, another copy is available at another site Acceptable performance Bad allocation may result in occurrence of bottlenecks; Bad allocation may also result in under utilization of resources.
Objectives of fragment definition and allocation Balanced storage capacities and costs Availability and cost of storage at each site Minimal communication costs Consider cost of remote requests Retrieval costs are minimized when locality of reference is maximized or when each site has its own copy of the data Updating replicated data is costly
Strategies for Data allocation Centralized Partitioned (or fragmented) Complete replication Selective replication
Users distributed across the network Single database Single DBMS Users distributed across the network Reliability and availability are low – failure of central site results in loss of the entire database Communication costs are high LAN Database Workstation 1 Workstation 2 Workstation 3 Centralized Server with DBMS
Partitioned Site 1 DB Site 4 Site 2 Computer Network Site 3 DB DB
Partitioned Site 1 DB Site 4 Site 2 Computer Network Site 3 DB DB DB
Partitioned Database is partitioned into disjoint fragments Each fragment is assigned to one site Storage costs are low since there are no replications Availability and reliability are low but higher than centralized Locality of reference is high if data access frequently occurs in the site where data is located Performance should be good Communication costs low if distribution is designed properly Computer Network DB Site 1 Site 4 Site 2 Site 3
Complete Replication Site 1 DB Site 4 Site 2 Computer Network Site 3
Complete Replication Site 1 DB Site 4 Site 2 Computer Network Site 3
Complete Replication Maintaining a complete copy of the database at each site Locality of reference, reliability, performance and availability are high Storage costs are high Communication costs for updates are high Computer Network DB Site 1 Site 4 Site 2 Site 3
Selective Replication Selective replication is a combination of replication, partitioning and centralization. Site 1 DB Site 4 Site 2 Computer Network Site 3 DB DB
Outline Evolution of data processing What is a DDBMS? Motivation behind DDBMS Types of Distributed Databases Distributed Data Storage Replication Fragmentation Transparencies in a DDBMS
Transparencies in a DDBMS Transparency hide implementation details from the user Types of DDBMS transparencies Distribution transparency Transaction transparency Performance transparency DBMS transparency
Transparencies in a DDBMS Distribution transparency Transaction transparency Performance transparency DBMS transparency
Distribution transparency Allows the user to perceive the database as a single logical entity. Types of distribution transparency The user does not need to know : that the data is fragmented (fragmentation transparency) location of data items (location transparency) The user is unaware of the replication of fragments (replication transparency) If the user needs to know about fragmented data and location of fragments, then there is local mapping transparency The DBMS must ensure that no two sites create a database object with the same name (naming transparency)
Transparencies in a DDBMS Distribution transparency Transaction transparency Performance transparency DBMS transparency
Transaction transparency Ensures that all distributed transactions maintain the distributed database’s integrity and consistency. What is a distributed transaction? Accesses data stored at more than one location Each transaction is divided into a number of subtransactions, one for each site that has to be accessed
Transaction transparency Fragmented schema: S1, S2, S21, S22 , S23 , a transaction T that prints out the names of all staff; Subtransactions: Ts3 : at site 3 Ts5 : at site 5 Ts7 : at site 7 S1: Sno, position,sex, dob, salary,nin (Staff) site5 S2: Sno, fname, lname,address,telno,bno (Staff) S21: bno=‘B3’ (S2) site 3 S22: bno=‘B5’ (S2) site 5 S22: bno=‘B7’ (S2) site 7
Transaction transparency Distributed transaction (transactions can execute concurrently; inherent parallelism) Time Ts3 Ts5 Ts7 t1 begin begin begin t2 read(fname,lname) read(fname,lname) read(fname,lname) t3 print(fname,lname) print(fname,lname) print(fname,lname) t4 end end end
Transaction transparency The DBMS must ensure the indivisibility of each subtransaction It must ensure the synchronization of subtransactions with other local transactions that are executing concurrently at a site It must ensure the synchronization of subtransactions with global transactions that are running simultaneously at the same or different sites Note: Transaction transparency in a DDBMS is complicated by the fragmentation, allocation and replication schemas
Transaction transparency Aspects of transaction transparency Concurrency transparency Failure transparency
Transaction transparency: Concurrency transparency Results of all concurrent transactions (distributed and non-distributed) : execute independently Logically consistent with the results that are obtained if transactions are executed one at a time, in some arbitrary serial order Note: There is added complexity because the DDBMS must ensure: that both local and global transactions do not interfere with each other The consistency of all subtransactions of the global transaction
Transaction transparency: Concurrency transparency Strategies (for replication, which makes concurrency more complex): Propagate the changes If one site holding a copy is not reachable, the transaction is delayed until the site is reachable Limit the update propagation to currently available sites; remaining sites are updated when they become available Allow the updates to copies to happen asynchronously sometime after the original update NOTE: there may be a delay in regaining consistency and this may range from a few seconds to several hours
Transaction transparency: Failure transparency The DDBMS must provide for a recovery mechanism: Ensure subtransactions of a global transaction are atomic, that is, all commit or all abort Before recording a final commit for the global transaction, ensure that all subtransactions completed successfully In addition to the above, it must cater for: Loss of a message Failure of a communication link Failure of a site Network partitioning
Transaction transparency: Failure transparency Example: Given a global transaction that has to update data at two sites, S1 and S2 Site 1 DB Site 4 Subtransaction at S1 completes successfully and COMMIT Site 2 Computer Network Subtransaction at S2 is unable to commit and rolls back the changes to ensure local consistency Site 3 Problem: The distributed database is now in an inconsistent state. We are unable to uncommit the data at site S1 due to the durability of the subtransaction at S1 DB DB
Transparencies in a DDBMS Distribution transparency Transaction transparency Performance transparency DBMS transparency
Performance transparency Requires a DDBMS to perform as if it were a centralized DBMS. Requires DDBMS to determine the most cost-effective strategy to execute a request.
Transparencies in a DDBMS Distribution transparency Transaction transparency Performance transparency DBMS transparency
It hides the knowledge that the local DBMSs may be different DBMS transparency It hides the knowledge that the local DBMSs may be different It is applicable to heterogeneous DDBMSs. One of the most difficult to provide.
Date’s Twelve Rules for a DDBMS (0) Fundamental principle To the user, a distributed system should look exactly like a distributed system. Local autonomy Local data is locally owned and managed Local operations remain purely local All operations at a given site are owned by that site No reliance on a central site Continuous operation (no shutdown) in the case of: Adding or removing a site from the system Dynamic creation and deletion of fragments at one or more site
Date’s Twelve Rules for a DDBMS (4) Location independence (5) Fragmentation independence (6) Replication independence (7) Distributed query processing (8) Distributed transaction processing (9) Hardware independence (10) Operating system independence (11) Network independence (12) Database independence
Summary: Characteristics of a DDBMS A collection of logically related data. The data is split into a number of fragments. Fragments may be replicated. Fragments/replicas are allocated to sites. The sites are linked by a communications network. The data at each cite is under the control of a DBMS. The DBMS at each site can handle local applications autonomously. Each DBMS participates in at least one global application.
The End Thank You!!!