Distributed Databases

Outline: evolution of data processing; what is a DDBMS; motivation behind DDBMS; types of distributed databases; distributed data storage; replication; fragmentation; transparencies in a DDBMS.

Evolution of data processing. File-based system: each application defined and maintained its own data. Database technology (DBMS): data is defined and administered centrally, in a single logical database located at one site and managed by a single DBMS. Distributed database technology (DDBMS): decentralization; users can access not only data at their own site but also data stored at remote sites.

File-based system (prepared by: RdDB). A collection of application programs that perform services for the end users, such as the production of reports. [Figure: the Registrar's file (Student, Courses, Faculty) serves EAF enrollment: data entry of courses, data entry of enrollment and generation of the EAF.]

File-based system. Each program in the system defines and manages its own data [CBS98], for example: struct course { char code[5]; char desc[20]; int units[3]; };

File-based systems. Each user (with the assistance of DP staff) defines and implements (including storage and control) the files needed for a specific application [CBS98, EN94]. [Figure: three separate files. Registrar file (Student, Courses, Faculty) for EAF enrollment; Accounting file (Student, Fees) for OR payment of fees; Department file (Student, Courses, Grades, Faculty) for course cards and processing of grades.]

Student System (File-based). What can be observed? Data redundancy: Student, Courses and Faculty data are held in more than one of the Registrar, Accounting and Department files.

Student System (File-based). What can be observed? Separation and isolation of data.

Student System (File-based). What can be observed? Program-data dependence, for example: struct person { char first[20]; char middle[3]; char last[30]; } employees, managers;

Student System (File-based). What can be observed? Incompatibility of files (e.g. one application written in COBOL, another in C).

Student System (File-based). What can be observed? Fixed queries; proliferation of application programs.

Limitations of File-Based Systems: separation and isolation of data; duplication of data; program-data dependence; incompatibility of files (e.g. C vs. COBOL); fixed queries and proliferation of application programs.

Factors that limit file-based systems. The definition of data is embedded in the application programs, rather than being stored separately and independently [CBS98]. There is no control over the access and manipulation of data beyond that imposed by the application programs [CBS98].

Database Approach

What is a database? It is a shared collection of logically coherent data with some inherent meaning. It is designed, built and populated with data for a specific purpose, such as meeting the information needs of an organization. Examples: student database, employee database, library database, air flights database, hospital database.

What is a DBMS? A Database Management System is a software system that enables users to define, create and maintain the database (through a DDL and a DML) and that provides controlled access to this database: security, integrity, concurrency control, recovery control, and a user-accessible catalogue (description of data).

A Simple Database System Environment. [Figure: application programs and queries are submitted to the database system; its DBMS software comprises software to process programs and queries and software to access stored data; the stored data consists of the data definition and the database.]

University System: Database Environment. [Figure: the Registrar, Accounting and Department applications (enrollment, payment, grade processing) share a single university database through one DBMS (MySQL), which processes programs and queries and accesses the stored data and its data definition.]

What is a Distributed database? A distributed database is a logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.

What is a Distributed DBMS? Distributed DBMS is the software system that permits the management of the distributed database and makes the distribution transparent to users.

Topology of DDBMS. [Figure: Sites 1 to 4 connected by a computer network; more than one site holds a database (DB).]

Topology of Distributed Processing: a centralized database that can be accessed over a computer network. [Figure: Sites 1 to 4 connected by a computer network; a single DB is held at one site.]

Motivation behind DDBMS. To integrate the operational data and provide controlled access to the data; this may imply centralization, but that is not the intention. To adopt a decentralized approach to data that mirrors the organizational structure of many companies: each unit maintains its own data, and data is stored close to the location where it is used, which improves the shareability of the data and the efficiency of data access.

Motivation behind DDBMS. To help resolve the "islands of information" problem caused by geographical separation, incompatible computer architectures and incompatible communication protocols.

Why Distribute? (example from Stanford University) X Corp. has offices in London, New York, and Hong Kong. Employee data: EMP(ENUM, NAME, TITLE, SALARY, …). Where should the employee data table reside?

X Corp. Data Access Pattern. Mostly, employee data is managed at the office where the employee works (e.g. payroll, benefits, hire and fire). Periodically, X Corp. needs consolidated access to employee data (e.g. X Corp. changes benefit plans and that affects all employees; the annual bonus depends on global net profit).

[Figure: a single EMP table is stored in London; the London, New York and Hong Kong payroll apps all reach it over the Internet. The NY and HK payroll apps run very slowly!]

[Figure: each office stores its own employee data (London Emp, NY Emp, HK Emp) next to its own payroll app; the offices are connected over the Internet. Much better!]

Fundamental principle of DDBMS The DDBMS is expected to make the distribution transparent (invisible) to the user. The objective of transparency: To make the distributed system appear like a centralized system.

Distributed Database System

A Distributed System. Data is spread over multiple machines (also referred to as sites or nodes). The computers may vary in size and function.

A Distributed System. A network interconnects the machines.

A Distributed System. The database is stored on several sites. Data is shared by users on multiple machines.

Local transaction: the transaction is initiated at the site that holds the data. Example: add 100 to account Aki-01, initiated at the Makati branch. [Figure: sites Main, Makati, Manila and Q.C. connected by a network, holding the relations Account(actno, br-name, balance) and Branch(br-name, br-city, assets).]

Global transaction: the transaction is initiated elsewhere, or touches several sites. Example: transfer 100 pesos from account Aki-01 (Makati) to account Aki-100 (Manila). [Figure: the same four sites and relations as for the local transaction.]

Local and Global Transactions. A local transaction accesses data only at the single site at which the transaction was initiated. A global transaction either accesses data at a site different from the one at which the transaction was initiated, or accesses data at several different sites (e.g. a funds transfer, or a summarization of deposits in all branches).
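The distinction can be sketched in a few lines of Python. The function name `classify_transaction` and the way a transaction is represented (an initiating site plus a list of accessed sites) are illustrative assumptions, not part of the original slides.

```python
def classify_transaction(initiating_site, accessed_sites):
    """Return 'local' if only the initiating site's data is accessed, else 'global'."""
    if set(accessed_sites) == {initiating_site}:
        return "local"
    return "global"


# Add 100 to account Aki-01, initiated at Makati, where the account is held.
print(classify_transaction("Makati", ["Makati"]))
# Transfer from Aki-01 (Makati) to Aki-100 (Manila), initiated at Makati.
print(classify_transaction("Makati", ["Makati", "Manila"]))
# Update of a Makati account requested from the Q.C. branch.
print(classify_transaction("Q.C.", ["Makati"]))
```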

Distributed Database System. A distributed database system consists of loosely coupled sites that share no physical component. The database systems that run on each site are independent of each other. Transactions may access data at one or more sites. [Figure: sites Main, Makati, Manila and Q.C. connected by a network.]

Why Distributed Database Systems? Sharing data – users at one site are able to access the data residing at some other sites. Autonomy – each site is able to retain a degree of control over data stored locally. Higher system availability through redundancy — data can be replicated at remote sites, and system can function even if a site fails.

Trade-offs in Distributed Systems. Disadvantage: added complexity is required to ensure proper coordination among sites. This brings higher software development cost, greater potential for bugs and increased processing overhead.

Homogeneous Distributed Database. All sites have identical database management system software. Sites are aware of each other and agree to cooperate in processing user requests. Each site surrenders part of its autonomy in terms of the right to change schemas or software. The system appears to the user as a single system. [Figure: sites Main, Makati, Manila and Q.C. on a network, each running Oracle.]

Heterogeneous Distributed Database. Different sites may use different schemas and software (DBMS). Difference in schema is a major problem for query processing. Difference in software is a major problem for transaction processing. Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing. [Figure: sites Main, Makati, Manila and Q.C. on a network, running SQL Server, Oracle, Sybase and DB2.]

Approaches: Distributed Data Storage. Data replication: the system maintains multiple copies of the relation, stored at different sites, for faster retrieval and fault tolerance. Data fragmentation: the relation is partitioned into several fragments stored at distinct sites. Replication and fragmentation can be combined: the relation is partitioned into several fragments, and the system maintains several identical replicas of each fragment. Allocation: each fragment is stored at the site that gives the best distribution.

Data Replication. A relation or fragment of a relation is replicated if it is stored redundantly at two or more sites. [Figure: Account(actno, br-name, balance) stored at the Main, Makati, Manila and Q.C. sites.]

Data Replication. Full replication of a relation is the case where the relation is stored at all sites. [Figure: Account(actno, br-name, balance) stored at every one of the four sites.]

Data Replication. Fully redundant databases are those in which every site contains a copy of the entire database. [Figure: the whole bank database stored at each of the Main, Makati, Manila and Q.C. sites.]

Advantages of Data Replication Availability: failure of site containing relation r does not result in unavailability of r if replicas exist. Parallelism: queries on r may be processed by several nodes in parallel. Reduced data transfer: relation r is available locally at each site containing a replica of r.

Disadvantages of Data Replication. Increased cost of updates: each replica of relation r must be updated. Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
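The availability gain and the update cost can both be seen in a small sketch. It assumes a simple "read any one copy, write every copy" policy, which the slides do not prescribe; the class `ReplicatedRelation` and the account values are hypothetical, and concurrency control is deliberately left out.

```python
class ReplicatedRelation:
    """A relation kept as identical copies at several sites (hypothetical helper)."""

    def __init__(self, sites):
        # One copy of the relation per site: site name -> {key: tuple}.
        self.replicas = {site: {} for site in sites}

    def read(self, key, local_site, failed=()):
        # Prefer the local replica; fall back to any other replica that is up.
        order = [local_site] + [s for s in self.replicas if s != local_site]
        for site in order:
            if site in self.replicas and site not in failed:
                return self.replicas[site].get(key), site
        raise RuntimeError("no replica available")

    def write(self, key, value):
        # Every replica must be updated: this is the increased cost of updates.
        for copy in self.replicas.values():
            copy[key] = dict(value)
        return len(self.replicas)  # number of copies touched


account = ReplicatedRelation(["Makati", "Manila", "Q.C."])
copies_updated = account.write("Aki-01", {"br_name": "Makati", "balance": 100})
# The Manila site is down, yet a Manila user can still read the tuple.
value, served_by = account.read("Aki-01", local_site="Manila", failed={"Manila"})
print(copies_updated, served_by, value["balance"])
```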

Data Fragmentation A relation may be divided into a number of sub-relations called fragments, which are then distributed. Division of relation r into fragments r1, r2, …, rn which contain sufficient information to reconstruct relation r.

[Figure: relation r divided into fragments r1, r2, …, rn.]

[Figure: relation r with attributes A, B, C, D, E, its tuples grouped into three fragments r1, r2 and r3.]

Why fragment? Advantages. Usage: applications work with views rather than entire relations. Efficiency: data is stored close to where it is most frequently used. Parallelism: a transaction can be divided into subqueries that operate on fragments. Security: data not required by local applications is not stored and consequently not available to unauthorized users.

Disadvantages of fragmentation. Performance: applications that require data from several fragments located at different sites may be slower. Integrity: integrity control may be more difficult to implement.

Types of Data Fragmentation Horizontal fragmentation Vertical fragmentation Mixed fragmentation

Horizontal fragmentation. A horizontal fragment of a relation consists of a subset of the tuples of a relation. It is produced by specifying a predicate that performs a restriction on the tuples in the relation. It is defined using the selection operation of relational algebra. [Figure: relation R split row-wise into fragments r1, r2 and r3.]

Horizontal fragmentation. Given a relation R, a horizontal fragment is defined as $r_1 = \sigma_{p}(R)$, where $p$ is a predicate based on one or more attributes of the relation.

Relation r and its horizontal fragments $r_1 = \sigma_{B=\text{'b1'}}(r)$, $r_2 = \sigma_{B=\text{'b2'}}(r)$, $r_3 = \sigma_{B=\text{'b3'}}(r)$:
A     B     C     D     E
a1    b1    c1    d1    e1
a2    b1    c2    d2    e2
a3    b1    c3    d3    e3
a4    b1    c4    d4    e4
a5    b2    c5    d5    e5
a6    b2    c6    d6    e6
a7    b2    c7    d7    e7
a8    b2    c8    d8    e8
a9    b3    c9    d9    e9
a10   b3    c10   d10   e10
a11   b3    c11   d11   e11
a12   b3    c12   d12   e12

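Fragment $r_1 = \sigma_{B=\text{'b1'}}(r)$: the tuples a1 to a4 of relation r, with all attributes A, B, C, D, E.

Fragment $r_2 = \sigma_{B=\text{'b2'}}(r)$: the tuples a5 to a8 of relation r, with all attributes A, B, C, D, E.

Fragment $r_3 = \sigma_{B=\text{'b3'}}(r)$: the tuples a9 to a12 of relation r, with all attributes A, B, C, D, E.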

Horizontal fragmentation. Given: PROPERTY_FOR_RENT(pno, street, area, city, pcode, type, rooms, rent, ono). Horizontal fragmentation of PROPERTY_FOR_RENT by property type: $P_1 = \sigma_{\text{type}=\text{'House'}}(\text{PROPERTY\_FOR\_RENT})$ and $P_2 = \sigma_{\text{type}=\text{'Flat'}}(\text{PROPERTY\_FOR\_RENT})$.
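As a rough illustration, the same fragmentation by property type can be written in Python, with a relation modelled as a list of dictionaries. The helper `select` and the three sample tuples are made up for this sketch, and only a few of the PROPERTY_FOR_RENT attributes are shown.

```python
def select(relation, predicate):
    """Relational selection: keep the tuples for which the predicate holds."""
    return [row for row in relation if predicate(row)]


# Made-up sample tuples (not data from the slides).
property_for_rent = [
    {"pno": "PR1", "city": "London", "type": "House", "rent": 650},
    {"pno": "PR2", "city": "Glasgow", "type": "Flat", "rent": 400},
    {"pno": "PR3", "city": "Glasgow", "type": "Flat", "rent": 350},
]

# One horizontal fragment per property type.
houses = select(property_for_rent, lambda row: row["type"] == "House")
flats = select(property_for_rent, lambda row: row["type"] == "Flat")

print([row["pno"] for row in houses])
print([row["pno"] for row in flats])
```

Each fragment keeps every attribute of the original relation; only the set of tuples shrinks.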

Vertical Data Fragmentation. A vertical fragment of a relation consists of a subset of the attributes of a relation. Vertical fragmentation groups together attributes that are used by some applications. It is defined using the projection operation of relational algebra. [Figure: relation R split column-wise into fragments r1, r2 and r3.]

Vertical Fragmentation. Given R, a vertical fragment is defined as $r_1 = \pi_{a_1, \ldots, a_n}(R)$, where $a_1, \ldots, a_n$ are attributes of relation R.

Relation r(A, B, C, D, E) and its vertical fragments $r_1 = \pi_{A,B,C}(r)$, $r_2 = \pi_{A,D}(r)$, $r_3 = \pi_{A,E}(r)$.

Fragment $r_1 = \pi_{A,B,C}(r)$: columns A, B and C of every tuple of r.

Fragment $r_2 = \pi_{A,D}(r)$: columns A and D of every tuple of r.

Fragment $r_3 = \pi_{A,E}(r)$: columns A and E of every tuple of r.

Vertical Fragmentation. Given: STAFF(Sno, Fname, Lname, Address, Telno, Position, Sex, DOB, Salary, NIN, Bno). Vertical fragmentation of Staff: $S_1 = \pi_{\text{Sno, position, sex, dob, salary, nin}}(\text{Staff})$ and $S_2 = \pi_{\text{Sno, fname, lname, address, telno, bno}}(\text{Staff})$. The key Sno is repeated in both fragments.
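A minimal Python sketch of this projection follows. The helper `project` and the two sample staff tuples are invented for illustration, and only a subset of the STAFF attributes is used.

```python
def project(relation, attributes):
    """Relational projection: keep only the named attributes, without duplicates."""
    result = []
    for row in relation:
        part = {a: row[a] for a in attributes}
        if part not in result:
            result.append(part)
    return result


# Made-up sample tuples with a subset of the STAFF attributes.
staff = [
    {"sno": "SN1", "fname": "Ana", "lname": "Cruz",
     "position": "Manager", "salary": 30000, "bno": "B5"},
    {"sno": "SN2", "fname": "Ben", "lname": "Reyes",
     "position": "Assistant", "salary": 12000, "bno": "B3"},
]

# The key sno is kept in both fragments so the tuples can be re-joined later.
staff_pay = project(staff, ["sno", "position", "salary"])
staff_contact = project(staff, ["sno", "fname", "lname", "bno"])

print(staff_pay[0])
print(staff_contact[0])
```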

Mixed Fragmentation. A mixed fragment of a relation consists of either a horizontal fragment that is subsequently vertically fragmented, or a vertical fragment that is then horizontally fragmented. It is defined using the selection and projection operations of relational algebra.

Mixed Fragmentation. Given a relation R, a mixed fragment is defined as $\sigma_{p}(\pi_{a_1, \ldots, a_n}(R))$ for a vertical fragment that is then horizontally fragmented, or as $\pi_{a_1, \ldots, a_n}(\sigma_{p}(R))$ for a horizontal fragment that is then vertically fragmented.

Data Fragmentation. [Figures: mixed fragmentation as vertical fragments that are horizontally fragmented, and as horizontal fragments that are vertically fragmented.]

Mixed fragmentation of relation r(A, B, C, D, E). First, r is fragmented horizontally into $r_1 = \sigma_{B=\text{'b1'}}(r)$ (tuples a1 to a4), $r_2 = \sigma_{B=\text{'b2'}}(r)$ (tuples a5 to a8) and $r_3 = \sigma_{B=\text{'b3'}}(r)$ (tuples a9 to a12).

Then the horizontal fragment $r_3$ is fragmented vertically into $r_{3.1} = \pi_{A,B,C}(r_3)$, $r_{3.2} = \pi_{A,D}(r_3)$ and $r_{3.3} = \pi_{A,E}(r_3)$.

Mixed Fragmentation. Given: STAFF(Sno, Fname, Lname, Address, Telno, Position, Sex, DOB, Salary, NIN, Bno). Vertically fragment Staff into $S_1 = \pi_{\text{Sno, position, sex, dob, salary, nin}}(\text{Staff})$ and $S_2 = \pi_{\text{Sno, fname, lname, address, telno, bno}}(\text{Staff})$.

Mixed Fragmentation. Horizontally fragment $S_2$ according to branch number: $S_{21} = \sigma_{\text{bno}=\text{'B3'}}(S_2)$, $S_{22} = \sigma_{\text{bno}=\text{'B5'}}(S_2)$, $S_{23} = \sigma_{\text{bno}=\text{'B7'}}(S_2)$.
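The two steps compose directly. The sketch below first projects a made-up Staff relation into two vertical fragments and then splits the second one by branch number; the helpers `select` and `project`, the variable names and the sample tuples are assumptions made for illustration.

```python
def select(relation, predicate):
    """Relational selection: keep the tuples for which the predicate holds."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Relational projection: keep only the named attributes, without duplicates."""
    result = []
    for row in relation:
        part = {a: row[a] for a in attributes}
        if part not in result:
            result.append(part)
    return result


# Made-up sample tuples with a subset of the STAFF attributes.
staff = [
    {"sno": "SN1", "fname": "Ana", "position": "Manager", "bno": "B3"},
    {"sno": "SN2", "fname": "Ben", "position": "Assistant", "bno": "B5"},
    {"sno": "SN3", "fname": "Carla", "position": "Assistant", "bno": "B7"},
    {"sno": "SN4", "fname": "Dan", "position": "Supervisor", "bno": "B3"},
]

# Step 1: vertical fragmentation (projection), key sno kept in both parts.
staff_job = project(staff, ["sno", "position"])
staff_contact = project(staff, ["sno", "fname", "bno"])

# Step 2: horizontal fragmentation (selection) of the second part by branch.
contact_by_branch = {
    bno: select(staff_contact, lambda row, bno=bno: row["bno"] == bno)
    for bno in ("B3", "B5", "B7")
}

print({bno: [row["sno"] for row in rows] for bno, rows in contact_by_branch.items()})
```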

Rules: Correctness of fragmentation Rule1: Completeness Rule2: Reconstruction Rule3: Disjointness

Rules: Correctness of fragmentation Rule1: Completeness If a relation instance R is decomposed into fragments R1, R2, … Rn , each data item that can be found in R must appear in at least one fragment. This is necessary to ensure that there is no loss of data during fragmentation
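A completeness check can be sketched as follows, assuming relations are lists of dictionaries and vertical fragments are described by their attribute lists. The function names and the tiny sample relation are hypothetical.

```python
def is_complete_horizontal(relation, fragments):
    """Every tuple of the relation appears in at least one fragment."""
    return all(any(row in frag for frag in fragments) for row in relation)


def is_complete_vertical(attributes, fragment_attribute_lists):
    """Every attribute of the relation appears in at least one fragment."""
    covered = set().union(*fragment_attribute_lists)
    return set(attributes) <= covered


r = [
    {"A": "a1", "B": "b1"},
    {"A": "a5", "B": "b2"},
    {"A": "a9", "B": "b3"},
]
r1, r2, r3 = [r[0]], [r[1]], [r[2]]

print(is_complete_horizontal(r, [r1, r2, r3]))   # all tuples covered
print(is_complete_horizontal(r, [r1, r2]))       # the b3 tuple is lost
print(is_complete_vertical("ABCDE", ["ABC", "AD", "AE"]))
print(is_complete_vertical("ABCDE", ["ABC", "AD"]))  # attribute E is lost
```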

Rule 1: Completeness. If a relation instance R is decomposed into fragments R1, R2, … Rn, each data item that can be found in R must appear in at least one fragment. [Figure: relation R covered by fragments r1 and r2.]

[Figure: relation r(A, B, C, D, E) with every tuple assigned to a fragment.]

Rules: Correctness of fragmentation Rule 2: Reconstruction It must be possible to define a relational operation that will reconstruct the relation R from the fragments. This rule ensures that functional dependencies are preserved.

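Rule 2: Reconstruction. It must be possible to define a relational operation that will reconstruct the relation R from the fragments. For horizontal fragmentation the operation is union: $r_1 \cup r_2 = r$. For vertical fragmentation it is the natural join: $S_1 \bowtie S_2$. [Figure: fragments r1 and r2 combined by a relational operation to give R.]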
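Both reconstruction operations are easy to try out on a toy relation. The helpers `union` and `natural_join` below are a simple sketch written for this example (a nested-loop join on the shared attributes), not an optimized implementation; the sample relation is made up.

```python
def union(*fragments):
    """Union of horizontal fragments, without duplicate tuples."""
    result = []
    for frag in fragments:
        for row in frag:
            if row not in result:
                result.append(row)
    return result


def natural_join(left, right):
    """Natural join on whatever attributes the two fragments have in common."""
    result = []
    for left_row in left:
        for right_row in right:
            common = set(left_row) & set(right_row)
            if all(left_row[a] == right_row[a] for a in common):
                result.append({**left_row, **right_row})
    return result


r = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a5", "B": "b2", "C": "c5"},
]

# Horizontal fragments are put back together with a union.
h1, h2 = [r[0]], [r[1]]
print(union(h1, h2) == r)

# Vertical fragments share the key A and are put back together with a join.
v1 = [{"A": row["A"], "B": row["B"]} for row in r]
v2 = [{"A": row["A"], "C": row["C"]} for row in r]
print(natural_join(v1, v2) == r)
```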

[Figure: relation R(A, B, C, D, E) reconstructed from its fragments; the table is truncated in the source.]

Rules: Correctness of fragmentation Rule 3: Disjointness If a data item di appears in fragment Ri, then it should not appear in any other fragment. This rule ensures minimal data redundancy Vertical fragmentation is the exception to this rule, where the primary key attributes must be repeated to allow reconstruction.
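The disjointness rule, including the primary-key exception for vertical fragments, can be checked along these lines. The function names and sample fragments are illustrative assumptions; vertical fragments are described by their attribute lists.

```python
def horizontally_disjoint(fragments):
    """No tuple appears in more than one horizontal fragment."""
    seen = []
    for frag in fragments:
        for row in frag:
            if row in seen:
                return False
            seen.append(row)
    return True


def vertically_disjoint(fragment_attribute_lists, primary_key):
    """No non-key attribute appears in more than one vertical fragment."""
    seen = set()
    for attrs in fragment_attribute_lists:
        non_key = set(attrs) - set(primary_key)
        if non_key & seen:
            return False
        seen |= non_key
    return True


t1 = {"A": "a1", "B": "b1"}
t2 = {"A": "a5", "B": "b2"}

print(horizontally_disjoint([[t1], [t2]]))       # disjoint
print(horizontally_disjoint([[t1, t2], [t2]]))   # t2 stored twice
print(vertically_disjoint([["A", "B", "C"], ["A", "D"], ["A", "E"]], ["A"]))
print(vertically_disjoint([["A", "B", "C"], ["A", "C"]], ["A"]))  # C repeated
```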

Rule 3: Disjointness. If a data item di appears in fragment Ri, then it should not appear in any other fragment, except for the primary key in the case of vertical fragmentation. [Figure: relation R split into non-overlapping fragments r1 and r2.]

[Figure: relation r(A, B, C, D, E) with its horizontal fragments r1 (tuples a1 to a4, B = b1), r2 (tuples a5 to a8, B = b2) and r3 (tuples a9 to a12, B = b3). No data item appears in more than one fragment.]

[Figure: relation R(A, B, C, D, E) with its vertical fragments. No data item appears in more than one fragment except for the primary key.]

Check Correctness: Horizontal Fragmentation. Given: PROPERTY_FOR_RENT(pno, street, area, city, pcode, type, rooms, rent, ono), fragmented horizontally by property type into $P_1 = \sigma_{\text{type}=\text{'House'}}(\text{PROPERTY\_FOR\_RENT})$ and $P_2 = \sigma_{\text{type}=\text{'Flat'}}(\text{PROPERTY\_FOR\_RENT})$.

Check Correctness: Horizontal Fragmentation. Completeness: each tuple in the relation appears in either fragment P1 or P2. Reconstruction: the Property_For_Rent relation can be reconstructed from the fragments using the union operation, $P_1 \cup P_2 = \text{Property\_For\_Rent}$. Disjointness: the fragments are disjoint; there can be no property type that is both house and flat.

Check Correctness: Vertical Fragmentation. Given: STAFF(Sno, Fname, Lname, Address, Telno, Position, Sex, DOB, Salary, NIN, Bno), fragmented vertically into $S_1 = \pi_{\text{Sno, position, sex, dob, salary, nin}}(\text{Staff})$ and $S_2 = \pi_{\text{Sno, fname, lname, address, telno, bno}}(\text{Staff})$.

Check Correctness: Vertical Fragmentation. Completeness: each attribute in the Staff relation appears in either fragment S1 or S2. Reconstruction: the Staff relation can be reconstructed from the fragments using the natural join operation, $S_1 \bowtie S_2 = \text{Staff}$. Disjointness: S1 and S2 are disjoint except for the necessary duplication of the primary key.

Check Correctness: Mixed Fragmentation. Fragments: $S_1 = \pi_{\text{Sno, position, sex, dob, salary, nin}}(\text{Staff})$, $S_2 = \pi_{\text{Sno, fname, lname, address, telno, bno}}(\text{Staff})$, $S_{21} = \sigma_{\text{bno}=\text{'B3'}}(S_2)$, $S_{22} = \sigma_{\text{bno}=\text{'B5'}}(S_2)$, $S_{23} = \sigma_{\text{bno}=\text{'B7'}}(S_2)$. Completeness: each attribute in the Staff relation appears in either fragment S1 or S2, and each (part) tuple appears in fragment S1 and in one of the fragments S21, S22 or S23.

Check Correctness: Mixed Fragmentation. Reconstruction: the Staff relation can be reconstructed from the fragments using the union and natural join operations, $S_1 \bowtie (S_{21} \cup S_{22} \cup S_{23}) = \text{Staff}$.

Check Correctness: Mixed Fragmentation. Disjointness: the fragments are disjoint. S21, S22 and S23 are disjoint because no staff member works in more than one branch, and S1 and S2 are disjoint except for the necessary duplication of the primary key.

Advantages of Horizontal Fragmentation: it allows parallel processing on fragments of a relation, and it allows a relation to be split so that tuples are located where they are most frequently accessed.

Advantages of Vertical Fragmentation: it allows tuples to be split so that each part of the tuple is stored where it is most frequently accessed.

Basis of design: definition and allocation of fragments. Analyze the applications and concentrate on the most important ones. The 80/20 rule may be used as a guideline: the most active 20% of user queries account for 80% of the total data access. Quantitative information (used in allocation): the frequency with which an application is run, the site from which an application is run, and the performance criteria for transactions and applications.

Basis of design: definition and allocation of fragments. Qualitative information (used in fragmentation): the transactions executed by the application, the type of access (read or write), and the predicates of read operations.
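The 80/20 guideline can be applied mechanically once access counts are known. In the sketch below the function `most_active` and the access counts are invented for illustration; the 80% threshold is a parameter, since the rule is only a guideline.

```python
def most_active(access_counts, share=0.8):
    """Smallest set of the busiest queries that covers `share` of all accesses."""
    total = sum(access_counts.values())
    chosen, covered = [], 0
    ranked = sorted(access_counts.items(), key=lambda item: item[1], reverse=True)
    for name, count in ranked:
        if covered >= share * total:
            break
        chosen.append(name)
        covered += count
    return chosen


# Made-up data accesses per query: two busy queries and eight minor ones.
access_counts = {"q1": 500, "q2": 320}
access_counts.update({f"q{i}": 25 for i in range(3, 11)})

focus = most_active(access_counts)
print(focus, f"{len(focus)} of {len(access_counts)} queries")
```

With these numbers the two busiest queries out of ten already cover more than 80% of the accesses, so the fragmentation design would concentrate on them.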

Objectives of fragment definition and allocation. Locality of reference: data should be stored close to where it is used; if a fragment is used at several sites, it may be beneficial to store copies of the fragment at these sites. Improved reliability and availability: made possible through replication; if one site fails, another copy is available at another site. Acceptable performance: bad allocation may result in bottlenecks, and it may also result in underutilization of resources.

Objectives of fragment definition and allocation. Balanced storage capacities and costs: consider the availability and cost of storage at each site. Minimal communication costs: consider the cost of remote requests. Retrieval costs are minimized when locality of reference is maximized or when each site has its own copy of the data; updating replicated data, however, is costly.
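A very simple allocation heuristic follows from locality of reference: place a fragment at the site that accesses it most, and count what remains as remote requests. The function names and access frequencies below are hypothetical, and storage and update costs are ignored in this sketch.

```python
def best_site(access_frequency):
    """Pick the site that issues the most accesses to the fragment."""
    return max(access_frequency, key=access_frequency.get)


def remote_accesses(access_frequency, host):
    """Accesses that must cross the network if the fragment is stored at `host`."""
    return sum(n for site, n in access_frequency.items() if site != host)


# Made-up accesses per day to one fragment, by originating site.
access_frequency = {"Makati": 120, "Manila": 30, "Q.C.": 10}

host = best_site(access_frequency)
print(host, remote_accesses(access_frequency, host))
print("Q.C.", remote_accesses(access_frequency, "Q.C."))
```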

Strategies for Data allocation Centralized Partitioned (or fragmented) Complete replication Selective replication

Centralized. A single database and a single DBMS, with users distributed across the network. Reliability and availability are low: failure of the central site results in loss of the entire database. Communication costs are high. [Figure: Workstations 1 to 3 connected over a LAN to a centralized server with the DBMS and the database.]

Partitioned. [Figure: Sites 1 to 4 connected by a computer network, each site holding its own part of the database (DB).]

Partitioned. The database is partitioned into disjoint fragments, and each fragment is assigned to one site. Storage costs are low since there is no replication. Availability and reliability are low, but higher than in the centralized case. Locality of reference is high if data access frequently occurs at the site where the data is located. Performance should be good, and communication costs are low, if the distribution is designed properly.

Complete Replication Site 1 DB Site 4 Site 2 Computer Network Site 3

Complete Replication Site 1 DB Site 4 Site 2 Computer Network Site 3

Complete Replication Maintaining a complete copy of the database at each site Locality of reference, reliability, performance and availability are high Storage costs are high Communication costs for updates are high Computer Network DB Site 1 Site 4 Site 2 Site 3

Selective replication: a combination of replication, partitioning and centralization.
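The storage and availability trade-offs of the four allocation strategies can be compared with a minimal sketch. It models an allocation as a mapping from fragment to the set of sites holding a copy; the fragment names, site names, and the particular selective allocation are hypothetical.

```python
SITES = ["site1", "site2", "site3", "site4"]
FRAGMENTS = ["F1", "F2", "F3"]


def centralized():
    # Everything lives at one central site.
    return {f: {"site1"} for f in FRAGMENTS}


def partitioned():
    # Disjoint fragments, each stored at exactly one site.
    return {"F1": {"site1"}, "F2": {"site2"}, "F3": {"site3"}}


def complete_replication():
    # Every site holds a copy of every fragment.
    return {f: set(SITES) for f in FRAGMENTS}


def selective_replication():
    # Some fragments replicated, others not (an arbitrary illustrative choice).
    return {"F1": {"site1", "site2"}, "F2": {"site2"}, "F3": {"site3", "site4"}}


def storage_copies(allocation):
    """Total number of stored fragment copies, a rough proxy for storage cost."""
    return sum(len(sites) for sites in allocation.values())


def available_after_failure(allocation, failed_site):
    """Fragments that still have at least one copy when `failed_site` is down."""
    return sorted(f for f, sites in allocation.items() if sites - {failed_site})
```

For example, if `site1` fails, the centralized allocation loses every fragment, the partitioned one loses only `F1`, and complete replication loses nothing, at the price of 12 stored copies instead of 3.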

Outline: Evolution of data processing; What is a DDBMS?; Motivation behind DDBMS; Types of Distributed Databases; Distributed Data Storage (Replication, Fragmentation); Transparencies in a DDBMS.

Transparencies in a DDBMS. Transparency hides implementation details from the user. Types of DDBMS transparency: distribution transparency; transaction transparency; performance transparency; DBMS transparency.

Distribution transparency: allows the user to perceive the database as a single logical entity. Types of distribution transparency: the user does not need to know that the data is fragmented (fragmentation transparency) or where data items are located (location transparency); the user is unaware of the replication of fragments (replication transparency); if the user needs to know about fragmented data and the location of fragments, then there is only local mapping transparency; the DBMS must ensure that no two sites create a database object with the same name (naming transparency).
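As a rough illustration of fragmentation, location and replication transparency, the sketch below lets a caller ask for a relation by its logical name while a catalog resolves fragments and replica sites behind the scenes. The catalog layout, fragment names, site names and rows are all hypothetical.

```python
# Hypothetical global catalog: logical relation -> fragment -> sites holding a copy.
CATALOG = {"Staff": {"S_B3": ["site3"], "S_B5": ["site5", "site2"]}}

# Hypothetical per-site storage: (site, fragment) -> rows.
STORAGE = {
    ("site3", "S_B3"): [("SG37", "Ann")],
    ("site5", "S_B5"): [("SL21", "John")],
    ("site2", "S_B5"): [("SL21", "John")],
}


def select_all(relation, unavailable=()):
    """Return all rows of `relation`; the caller never names a fragment or a site."""
    rows = []
    for fragment, sites in CATALOG[relation].items():
        reachable = [s for s in sites if s not in unavailable]
        if not reachable:
            raise RuntimeError(f"no reachable copy of fragment {fragment}")
        # Any one replica is enough for a read.
        rows.extend(STORAGE[(reachable[0], fragment)])
    return rows
```

A call such as `select_all("Staff")` returns the same rows whether or not `site5` is reachable, because the replica at `site2` is used instead.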

Transaction transparency: ensures that all distributed transactions maintain the distributed database’s integrity and consistency. What is a distributed transaction? It accesses data stored at more than one location. Each transaction is divided into a number of subtransactions, one for each site that has to be accessed.

Transaction transparency. Example: a transaction T prints out the names of all staff, using the fragmented schema S1, S2, S21, S22, S23. S1 = Π_{Sno, position, sex, dob, salary, nin}(Staff), at site 5. S2 = Π_{Sno, fname, lname, address, telno, bno}(Staff). S21 = σ_{bno=‘B3’}(S2), at site 3. S22 = σ_{bno=‘B5’}(S2), at site 5. S23 = σ_{bno=‘B7’}(S2), at site 7. Subtransactions: Ts3 at site 3, Ts5 at site 5, Ts7 at site 7.

Transaction transparency: distributed transaction (subtransactions can execute concurrently; inherent parallelism)

Time  Ts3                   Ts5                   Ts7
t1    begin                 begin                 begin
t2    read(fname,lname)     read(fname,lname)     read(fname,lname)
t3    print(fname,lname)    print(fname,lname)    print(fname,lname)
t4    end                   end                   end
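The staff-names example can be sketched in code: a vertical fragment is a projection, the horizontal fragments are selections on the branch number, and one subtransaction per site runs concurrently. The sample rows are invented for illustration, and a thread pool merely stands in for real remote sites.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented sample rows for the Staff relation (illustrative only).
STAFF = [
    {"sno": "S1", "fname": "John", "lname": "White", "bno": "B5", "position": "Manager"},
    {"sno": "S2", "fname": "Ann", "lname": "Beech", "bno": "B3", "position": "Assistant"},
    {"sno": "S3", "fname": "Mary", "lname": "Howe", "bno": "B7", "position": "Assistant"},
]


def project(rows, attrs):
    """Vertical fragmentation: keep only the listed attributes."""
    return [{a: r[a] for a in attrs} for r in rows]


def select(rows, bno):
    """Horizontal fragmentation: keep only rows for one branch."""
    return [r for r in rows if r["bno"] == bno]


NAMES = project(STAFF, ["sno", "fname", "lname", "bno"])
# Site number -> horizontal fragment stored there (sites 3, 5 and 7).
FRAGMENTS = {3: select(NAMES, "B3"), 5: select(NAMES, "B5"), 7: select(NAMES, "B7")}


def subtransaction(site):
    """Read the names held in the fragment at one site."""
    return [(r["fname"], r["lname"]) for r in FRAGMENTS[site]]


def all_staff_names():
    """Global transaction: one subtransaction per site, executed concurrently."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(subtransaction, sorted(FRAGMENTS)))
    return [name for part in parts for name in part]
```

Because `pool.map` yields results in the order of its inputs, the combined list is ordered by site number even though the subtransactions may finish in any order.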

Transaction transparency. The DBMS must ensure the indivisibility of each subtransaction. It must ensure the synchronization of subtransactions with other local transactions that are executing concurrently at a site. It must ensure the synchronization of subtransactions with global transactions that are running simultaneously at the same or different sites. Note: transaction transparency in a DDBMS is complicated by the fragmentation, allocation and replication schemas.

Transaction transparency. Aspects of transaction transparency: concurrency transparency; failure transparency.

Transaction transparency: Concurrency transparency. All concurrent transactions (distributed and non-distributed) execute independently, and their results are logically consistent with the results that would be obtained if the transactions were executed one at a time, in some arbitrary serial order. Note: there is added complexity because the DDBMS must ensure that local and global transactions do not interfere with each other, and must ensure the consistency of all subtransactions of each global transaction.

Transaction transparency: Concurrency transparency. Replication makes concurrency more complex. Strategies for propagating an update to the copies: (1) Propagate the change to every copy; if one site holding a copy is not reachable, the transaction is delayed until the site is reachable. (2) Limit the update propagation to currently available sites; remaining sites are updated when they become available. (3) Allow the updates to copies to happen asynchronously, sometime after the original update. Note: with asynchronous updates there may be a delay in regaining consistency, which may range from a few seconds to several hours.
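The first two propagation strategies can be contrasted in a minimal in-memory sketch. The `Replica` class and the function names are hypothetical, and real systems would also have to order and log the queued updates.

```python
class Replica:
    """One copy of the data at one site (hypothetical in-memory stand-in)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.pending = []  # updates queued while the site was unreachable


def update_all_or_delay(replicas, key, value):
    """Update every copy, or do nothing (the caller retries later) if any site is down."""
    if not all(r.up for r in replicas):
        return False
    for r in replicas:
        r.data[key] = value
    return True


def update_available_only(replicas, key, value):
    """Update reachable copies now and queue the update for the others."""
    for r in replicas:
        if r.up:
            r.data[key] = value
        else:
            r.pending.append((key, value))


def recover(replica):
    """Bring a site back and apply the updates it missed, in order."""
    replica.up = True
    for key, value in replica.pending:
        replica.data[key] = value
    replica.pending.clear()
```

Under the second strategy the copies disagree between the update and the call to `recover`, which is the consistency delay the note above refers to.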

Transaction transparency: Failure transparency. The DDBMS must provide a recovery mechanism that: ensures the subtransactions of a global transaction are atomic, that is, all commit or all abort; and, before recording a final commit for the global transaction, ensures that all subtransactions completed successfully. In addition, it must cater for: loss of a message; failure of a communication link; failure of a site; network partitioning.

Transaction transparency: Failure transparency. Example: a global transaction has to update data at two sites, S1 and S2. The subtransaction at S1 completes successfully and commits. The subtransaction at S2 is unable to commit and rolls back its changes to ensure local consistency. Problem: the distributed database is now in an inconsistent state. We are unable to uncommit the data at site S1, due to the durability of the subtransaction at S1.
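One well-known way to avoid the situation in this example is a two-phase commit: no site commits until every site has voted that it is able to. The slides do not name a protocol, so the sketch below is an illustration under that assumption, with hypothetical class and function names. It ignores message loss, timeouts and coordinator failure.

```python
class Participant:
    """A site taking part in a global transaction (hypothetical stand-in)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote. A site that cannot commit aborts locally and votes no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_global_transaction(participants):
    """Commit everywhere only if every participant votes yes; otherwise abort everywhere."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:  # Phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:  # Phase 2: global abort
        p.abort()
    return "aborted"
```

In the two-site example, S1 would only reach the prepared state, not the committed state, before S2 reports its failure, so S1 can still be rolled back.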

Performance transparency Requires a DDBMS to perform as if it were a centralized DBMS. Requires DDBMS to determine the most cost-effective strategy to execute a request.
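Choosing the most cost-effective strategy can be pictured as comparing estimated costs of alternative execution plans. The sketch below considers only data transfer time for a join between two relations stored at different sites; the sizes, bandwidth, latency and strategy labels are invented, and a real optimizer would also account for local processing and I/O.

```python
def transfer_time(nbytes, bandwidth_bytes_per_s, latency_s):
    """Estimated seconds to ship `nbytes` over the network in one message."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def cheapest(strategies):
    """Name of the strategy with the lowest estimated cost."""
    return min(strategies, key=strategies.get)


# Invented figures for illustration only.
BANDWIDTH = 1_000_000   # bytes per second
LATENCY = 0.05          # seconds per message
STAFF_BYTES = 5_000_000
BRANCH_BYTES = 20_000

STRATEGIES = {
    "ship Staff to Branch's site": transfer_time(STAFF_BYTES, BANDWIDTH, LATENCY),
    "ship Branch to Staff's site": transfer_time(BRANCH_BYTES, BANDWIDTH, LATENCY),
}
```

With these figures, shipping the small Branch relation to the site holding Staff is far cheaper than the reverse, so that plan would be selected.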

DBMS transparency: hides the knowledge that the local DBMSs may be different. It is applicable to heterogeneous DDBMSs. It is one of the most difficult transparencies to provide.

Date’s Twelve Rules for a DDBMS. (0) Fundamental principle: to the user, a distributed system should look exactly like a non-distributed system. (1) Local autonomy: local data is locally owned and managed; local operations remain purely local; all operations at a given site are controlled by that site. (2) No reliance on a central site. (3) Continuous operation (no shutdown) in the case of: adding or removing a site from the system; dynamic creation and deletion of fragments at one or more sites.

Date’s Twelve Rules for a DDBMS (4) Location independence (5) Fragmentation independence (6) Replication independence (7) Distributed query processing (8) Distributed transaction processing (9) Hardware independence (10) Operating system independence (11) Network independence (12) Database independence

Summary: Characteristics of a DDBMS. A collection of logically related data. The data is split into a number of fragments. Fragments may be replicated. Fragments/replicas are allocated to sites. The sites are linked by a communications network. The data at each site is under the control of a DBMS. The DBMS at each site can handle local applications autonomously. Each DBMS participates in at least one global application.

The End Thank You!!!