Database Replication techniques: a Three Parameter Classification Authors : Database Replication techniques: a Three Parameter Classification Authors :

Slides:



Advertisements
Similar presentations
Types of Distributed Database Systems
Advertisements

Replication. Topics r Why Replication? r System Model r Consistency Models r One approach to consistency management and dealing with failures.
Consistency and Replication Chapter 7 Part II Replica Management & Consistency Protocols.
Consistency and Replication (3). Topics Consistency protocols.
1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1,
1 ICS 214B: Transaction Processing and Distributed Data Management Lecture 12: Three-Phase Commits (3PC) Professor Chen Li.
Consensus Algorithms Willem Visser RW334. Why do we need consensus? Distributed Databases – Need to know others committed/aborted a transaction to avoid.
Computer Science Lecture 18, page 1 CS677: Distributed OS Last Class: Fault Tolerance Basic concepts and failure models Failure masking using redundancy.
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
E-Transactions: End-to-End Reliability for Three-Tier Architectures Svend Frølund and Rachid Guerraoui.
ICS 421 Spring 2010 Distributed Transactions Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/16/20101Lipyeow.
Distributed Databases Logical next step in geographically dispersed organisations goal is to provide location transparency starting point = a set of decentralised.
Transaction Management and Concurrency Control
CS 582 / CMPE 481 Distributed Systems
CMPT 431 Dr. Alexandra Fedorova Lecture XII: Replication.
Non-blocking Atomic Commitment Aaron Kaminsky Presenting Chapter 6 of Distributed Systems, 2nd edition, 1993, ed. Mullender.
The Atomic Commit Problem. 2 The Problem Reaching a decision in a distributed environment Every participant: has an opinion can veto.
1 ICS 214B: Transaction Processing and Distributed Data Management Replication Techniques.
Overview Distributed vs. decentralized Why distributed databases
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 5: Synchronous Uniform.
Understanding Replication in Database & Distributed Systems SRDS’ Database Replication Techniques: A Three Parameter Classification M. Wiesmann F.
Manajemen Basis Data Pertemuan 10 Matakuliah: M0264/Manajemen Basis Data Tahun: 2008.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
1 ICS 214B: Transaction Processing and Distributed Data Management Distributed Database Systems.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Transaction Management and Concurrency Control.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
Distributed Databases
6.4 Data and File Replication Gang Shen. Why replicate  Performance  Reliability  Resource sharing  Network resource saving.
Database Design – Lecture 16
Replication and Consistency. Reference The Dangers of Replication and a Solution, Jim Gray, Pat Helland, Patrick O'Neil, and Dennis Shasha. In Proceedings.
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
Distributed Algorithms – 2g1513 Lecture 9 – by Ali Ghodsi Fault-Tolerance in Distributed Systems.
Chapter 19 Recovery and Fault Tolerance Copyright © 2008.
BIS Database Systems School of Management, Business Information Systems, Assumption University A.Thanop Somprasong Chapter # 10 Transaction Management.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Data Versioning Lecturer.
Consistent and Efficient Database Replication based on Group Communication Bettina Kemme School of Computer Science McGill University, Montreal.
Distributed Database Systems Overview
Oracle's Distributed Database Bora Yasa. Definition A Distributed Database is a set of databases stored on multiple computers at different locations and.
IM NTU Distributed Information Systems 2004 Replication Management -- 1 Replication Management Yih-Kuen Tsay Dept. of Information Management National Taiwan.
Databases Illuminated
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
Commit Algorithms Hamid Al-Hamadi CS 5204 November 17, 2009.
Fault Tolerance and Replication
Chap 7: Consistency and Replication
Replication (1). Topics r Why Replication? r System Model r Consistency Models r One approach to consistency management and dealing with failures.
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT By Jyothsna Natarajan Instructor: Prof. Yanqing Zhang Course: Advanced Operating Systems.
Revisiting failure detectors Some of you asked questions about implementing consensus using S - how does it differ from reaching consensus using P. Here.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Replication Steve Ko Computer Sciences and Engineering University at Buffalo.
Antidio Viguria Ann Krueger A Nonblocking Quorum Consensus Protocol for Replicated Data Divyakant Agrawal and Arthur J. Bernstein Paper Presentation: Dependable.
Highly Available Services and Transactions with Replicated Data Jason Lenthe.
Topics in Distributed Databases Database System Implementation CSE 507 Some slides adapted from Navathe et. Al and Silberchatz et. Al.
Don’t be lazy, be consistent: Postgres-R, A new way to implement Database Replication Paper by Bettina Kemme and Gustavo Alonso, VLDB 2000 Presentation.
Distributed Databases – Advanced Concepts Chapter 25 in Textbook.
Lecturer : Dr. Pavle Mogin
6.4 Data and File Replication
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT -Sumanth Kandagatla Instructor: Prof. Yanqing Zhang Advanced Operating Systems (CSC 8320)
RELIABILITY.
Outline Announcements Fault Tolerance.
7.1. CONSISTENCY AND REPLICATION INTRODUCTION
Chapter 10 Transaction Management and Concurrency Control
CSE 486/586 Distributed Systems Consistency --- 1
Active replication for fault tolerance
Replication and Recovery in Distributed Systems
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Distributed Transactions
Database System Architectures
EEC 688/788 Secure and Dependable Computing
Presentation transcript:

Database Replication techniques: a Three Parameter Classification Authors : Database Replication techniques: a Three Parameter Classification Authors : Matthias Wiesmann, Fernando Pedonet, Andr Schiper, Bettina Kemmet, Gustavo Alonso As summarized by : Preethi Vishwanath San Jose State Computer Science

2 Data Replication The copying of data to and from sites to improve local service response times and availability Challenges Introduce replication without severely affecting performance Introduce replication without severely affecting performance Technique used Currently –Lazy replication *1 AdvantageEfficientDisadvantage Compromises consistency –Eager replication *2 Advantage Guarantees consistency Disadvantage Most existing protocols have a prohibitive cost Distributed System –Cost effective way to increase availability Database System –Performance –Fault-tolerant purposes Lazy replication is very commonly used, however this is not consistent enough Consistency can be observed by performing eager replication, but t is not easy to design efficient eager replication protocols until recent. *1 *1 Lazy replication: asynchronously propagates replica updates to other nodes after replicating transaction comments. *2 *2 Eager replication: all replicas are kept synchronized by updating all replicas in single transaction.

System Model and Architecture 1. Consider set of database clients and a set S = {s 1,s 2,….,s n } of database servers. 2. Database is fully replicated on every server s i 3. Client connects to one of the database servers, to execute a transaction *1. 4. In case of interactive transaction (transaction submitted operation by operation), after submitting an operation, the client waits for an answer from the server, after which it send the next operation. 4. In case of interactive transaction (transaction submitted operation by operation), after submitting an operation, the client waits for an answer from the server, after which it send the next operation. Sending the transaction as a single message is called a service request, that is, a call to a procedure stored on the database servers Once the transaction is completed, the server si sends the transaction outcome to the client and the connection between the client and si is closed If the transaction was submitted operation by operation, the outcome is a commit or abort confirmation. In case of a service request, the transaction outcome also includes the results of the request If si fails during the execution of the transaction, the transaction aborts In this case, it is up to the client to retry the execution, either by connecting to a different database server sj, or to the same server si later (after si recovers). Correctness Criteria One copy serialization ie One copy serialization ie It ensures that any interleaved execution of transactions is equivalent to a serial execution of these transactions on a single copy of the database. Reliable broadcast Ensures that a message sent by a correct database server, or delivered by a correct database server, is eventually delivered to all correct database servers. Uniform Reliable broadcast Ensures that a message delivered by any server (i.e., it can be correct or fail directly after delivering the message) is eventually delivered to all correct database servers. *1 : Sequence of read and/or write operation followed by a commit or abort. Queries : Only read present in Transaction Update transaction : Contain read and write requests

Server Architecture Where transaction is executed in the 1 st place Primary Copy replication –Requires specific site (Primary copy) associated with each data item. –Update should be 1 st send to primary copy. –Primary copy then propagates to other sites. –Introduces single point failure and bottleneck. –If Primary crashes, one of the other servers takes over the role of primary which requires an election protocol. –To avoid, bottlenecks, don’t make a single site the primary, instead data is partitioned and different sites become the primary for different data subsets. Update everywhere replication –Allows updates to a data item to be performed anywhere in the system. –Update everywhere approaches are more graceful when dealing with failures since no election protocol is necessary to continue processing. –Introduces no performance bottlenecks. –May require that instead of one site doing the work (the primary copy) all sites do the same work. –If not careful with design, update everywhere may affect performance much more than primary copy approaches. Server Interaction Degree of communication among database servers during the execution of a transaction. Constant Interaction – –Corresponds to techniques where a constant number of messages is used to synchronize the servers for a given transaction, independently of the number of operations in the transaction. –Protocols in this category exchange a single message per transaction by grouping all operations of the transaction in a single message. Linear Interaction –Corresponds to techniques where a database server propagates each operation of a transaction on a per operation basis. –The operations can be sent either as SQL statements or as log records containing the results of having executed the operation in a particular server. Transaction Termination How atomicity is guaranteed Voting termination –requires an extra round of messages to coordinate the different replicas. –Round could either be complex (2PC) or as simple as a single confirmation send by a given site. Non-voting termination Non-voting termination –sites can decide on their own whether to commit or abort a transaction. –Non-voting techniques require replicas to behave deterministically –Transactions or operations that do not conflict can be executed in different orders at different sites.

Plethora of Replication Techniques – Update Everywhere Replication Update Everywhere - Constant Interaction -Non-Voting Techniques 1. 1.The transaction starts on the delegate server The transaction is processed in a non- deterministic way The determinism point is reached The transaction is sent to all servers using an atomic 5. 5.Processing continues on all replicas in a deterministic 6. 6.Each replica terminates the transaction in the same way Update Everywhere - Constant Interaction - Voting Techniques 1. 1.The transaction starts on the delegate server The transaction is processed in a non- deterministic way The transaction is broadcast to all servers Processing continues on all replicas A voting termination phase takes place Each replica terminates the transaction according to the voting protocol Update everywhere - Linear Interaction - Non- Voting Techniques 1.The transaction starts on the delegate server. 2.The first operation is sent to all servers using an atomic broadcast. 3.The first operation is executed on all servers. 4.Items (2) and (3) are repeated until the transaction ends. 5.The delegate sends a termination message to indicate the end of the transaction Update everywhere - Linear Interaction – Voting Techniques 1.The transaction starts on the delegate server. 2.Each operation is broadcast to a quorum of sites. 3.Each operation is executed on its quorum. 4.Items (2) and (3) are repeated until the transaction ends. 5.A voting protocol is executed. 6.Each replica terminates the transaction according to the voting protocol Determinism point The point in the execution of a transaction after which the processing will be deterministic.

Plethora of Replication Techniques – Primary Copy Replication Primary Copy - Constant Interaction - Non-Voting 1. 1.The transaction is executed at the primary When the transaction terminates, the corresponding log 3. 3.The primary commits the transaction without waiting for 4. 4.The backups eventually install the changes Primary Copy - Constant Interaction – Voting 1. 1.The transaction is executed at the primary When the transaction terminates, the corresponding log 3. 3.records are broadcast to all backups The primary initiates a 2PC protocol The transaction is installed and committed at all sites Primary Copy - Linear Interaction - Non-voting 1.The transaction starts at the primary. 2.Read operations are executed locally. 3.The results of write operations are broadcast to the backups 4.A termination message indicates the end of the transaction Primary Copy - Linear Interaction - Voting 1.The transaction starts at the primary. 2.Read operations are executed locally. 3.The results of write operations are broadcast to the backups. 4.The primary starts a 2PC protocol. 5.The transaction is installed and committed at all sites. Backup and Replication in database Database primary copy techniques are classified along two dimensions: atomicity of transactions and recovery time

Discussions – Comparisons and Classifications between different approaches Server Architecture: Primary Copy vs. Update Everywhere – –Update everywhere does not cause bottlenecks – –Most replication protocols today are primary copy techniques. – –Update everywhere does not necessarily distribute the load among sites, Since data is replicated, all sites need to perform the updates. – –Unless there is a significant amount of read operations in the overall load, the system might not scale up as more server nodes are added. – –Performance can be improved by preprocessing operations at one site and sending the results to the other sites. That way, the processing does not need to be done everywhere. Server Interaction: Constant vs. Linear –rule of thumb : less messages exchanged per transaction, the better. –Protocols based on linear interaction in combination with update everywhere are largely infeasible in databases. –In the primary copy case, consequences of how many messages are exchanged are not negligible. –sending all updates in one message at the end of the transaction can help to propagate the changes of those transactions that actually commit. –Exchanging one message per transaction Works especially well for service requests where the data to be accessed is known in advance; In this case implementation is straightforward and abort rates are small. For ordinary transactions, some form of optimism must be used to first execute the transaction at the delegate server and then determine the serialization order. If the conflict rates are high, this optimism might result in high abort rates.

Transaction Termination Voting 2 possibilities – –Based on 2PC – –Based on a confirmation message sent by the delegate or primary copy to indicate whether the transaction can be committed or must be aborted. – –confirmation message is needed when only the delegate server (or the primary copy) of a transaction can unilaterally decide on the outcome of the transaction. Non-Voting techniques – –More demanding in terms of determinism requirements than voting techniques – –Each server must independently guarantee the same serialization as that of other servers; performed using total order as a guideline.

Conclusions 1. 1.Update everywhere has a good potential of distributing the load among the sites Since linear interactions seem to be a significant source of overhead, realistically speaking, the only options left are approaches based on constant interactions A judicious choice of the determinism point helps in designing protocols but, with minimal additional cost, there is always the possibility of using voting strategies where the determinism requirements are greatly reduced.