CMPT 401 Summer 2007 Dr. Alexandra Fedorova Lecture XII: Replication.


CMPT 401 Summer 2007 Dr. Alexandra Fedorova Lecture XII: Replication

2 CMPT 401 Summer 2007 © A. Fedorova Replication

3 CMPT 401 Summer 2007 © A. Fedorova Why Replicate? (I) Fault tolerance / high availability –As long as one replica is up, the service is available –Assume each of the n replicas fails independently with probability p –Availability = 1 - p^n [Diagram: fault tolerance by take-over]
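As a quick illustration of the availability formula, here is a minimal sketch; the failure probability and replica count below are made-up example numbers:

```python
# Availability of a service with n replicas, assuming each replica fails
# independently with probability p (the formula from the slide).
def availability(p: float, n: int) -> float:
    return 1 - p ** n

# e.g., three replicas that are each unavailable 1% of the time:
print(availability(0.01, 3))   # ~0.999999
```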

4 CMPT 401 Summer 2007 © A. Fedorova Why Replicate? (II) Fast local access (WAN replication) –Client can always send requests to the closest replica –Goal: no communication to remote replicas necessary during request execution –Goal: client experiences location transparency since all access is fast local access [Diagram: clients get fast local access to replicas in Toronto, Montreal, and Rome]

5 CMPT 401 Summer 2007 © A. Fedorova Why Replicate? Scalability and load distribution (LAN replication) –Requests can be distributed among replicas –Handle increasing load by adding new replicas to the system (a cluster instead of a bigger server)

6 CMPT 401 Summer 2007 © A. Fedorova Challenges: Data Consistency We will study systems that use data replication It is hard, because data must be kept consistent Users submit operations against the logical copies of data These operations must be translated into operations against one, some, or all physical copies of data Nearly all existing approaches follow a ROWA(A) approach: –Read-one-write-all-(available) –Update has to be (eventually) executed at all replicas to keep them consistent –Read can be performed at any replica
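A minimal sketch of the ROWA(A) idea, using a toy in-memory store; the class and method names are illustrative, not any particular system's API:

```python
# A toy ROWA(A) ("read one, write all available") replicated store.

class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.up = True

class RowaService:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        # A logical update is executed at every available physical copy.
        for r in self.replicas:
            if r.up:
                r.store[key] = value

    def read(self, key):
        # A logical read can be served by any single available copy.
        for r in self.replicas:
            if r.up:
                return r.store.get(key)
        raise RuntimeError("no replica available")

service = RowaService([Replica("r1"), Replica("r2"), Replica("r3")])
service.write("x", 5)
print(service.read("x"))   # 5, from whichever available replica is asked
```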

7 CMPT 401 Summer 2007 © A. Fedorova Challenges: Fault Tolerance The goal is to have data available despite failures If one site fails, the others should continue providing service How many replicas should we have? It depends on: –How many faults we want to tolerate –The types of faults we expect –How much we are willing to pay

8 CMPT 401 Summer 2007 © A. Fedorova Roadmap Replication architectures –Active replication –Primary-backup (passive, master-slave) replication Design considerations for replicated services Surviving failures

9 CMPT 401 Summer 2007 © A. Fedorova Active Replication [Diagram: a client sends request A to all of the replicated servers, including B and C]

10 CMPT 401 Summer 2007 © A. Fedorova Active Replication

11 CMPT 401 Summer 2007 © A. Fedorova Active Replication 1. The client sends the request to the servers using totally ordered reliable multicast (logical clocks or vector clocks) 2. Server coordination is given by the total order property (assumption: synchronous system) 3. All replicas execute the request in the order in which requests are delivered 4. No additional coordination is necessary (assumption: determinism) –All replicas produce the same result 5. All replicas send the result to the client; the client waits for the first answer
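A rough sketch of this flow, assuming the totally ordered reliable multicast is provided elsewhere and that operations are deterministic; the replica class and function names are illustrative:

```python
# Each replica is a deterministic state machine that applies requests in the
# order the (assumed) totally ordered multicast delivers them.

class ActiveReplica:
    def __init__(self):
        self.state = {}

    def deliver(self, request):
        # Operations must be deterministic so every replica returns the same result.
        op, key, value = request
        if op == "write":
            self.state[key] = value
            return "ok"
        return self.state.get(key)

def client_call(replicas, request):
    # The client (conceptually) multicasts the request to all replicas and
    # keeps the first answer; with fail-stop replicas all answers are identical.
    replies = [r.deliver(request) for r in replicas]
    return replies[0]

replicas = [ActiveReplica(), ActiveReplica(), ActiveReplica()]
client_call(replicas, ("write", "x", 5))
print(client_call(replicas, ("read", "x", None)))   # 5
```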

12 CMPT 401 Summer 2007 © A. Fedorova Fault Tolerance: Failstop Failures As long as at least one replica survives, the client will continue receiving service Assuming there are no partitions! Suppose B and C are partitioned, so they cannot communicate They cannot agree on how to order the client’s requests [Diagram: a client sends request A to replicated servers B and C]

13 CMPT 401 Summer 2007 © A. Fedorova Fault Tolerance: Byzantine Failures Can survive Byzantine failures (assuming no partitions) The system must have n ≥ 2f + 1 replicas (f is the number of failures) The client compares the results of all replicas and chooses the result returned by the majority: with at least f + 1 non-faulty replicas, the correct result always wins the vote This is the idea used in LOCKSS (Lots of Copies Keep Stuff Safe)
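A sketch of the client-side majority vote, assuming the replies have already been collected from all replicas; the helper name and example values are made up:

```python
from collections import Counter

# With n >= 2f + 1 replicas, at least f + 1 correct replicas return the same
# value, so the correct value always has a majority among the replies.

def majority_result(replies, f):
    value, votes = Counter(replies).most_common(1)[0]
    if votes >= f + 1:
        return value
    raise RuntimeError("no value was returned by f + 1 replicas")

# e.g., one Byzantine replica out of three (f = 1):
print(majority_result(["5", "5", "garbage"], f=1))   # "5"
```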

14 CMPT 401 Summer 2007 © A. Fedorova Primary-Backup Replication (PB) Also known as passive replication If the primary fails, a backup takes over and becomes the primary [Diagram: the client sends request A to the primary, which propagates it to backups B and C]

15 CMPT 401 Summer 2007 © A. Fedorova System Requirements How do we want the system to behave? Just like a single-server system? –Must ensure that there is only one primary at a time Data is kept consistent: –If a client received a response from an update operation and then the system crashed, the client should find the data reflecting that update –Results of operations should be the same as they would be if executed on a single-server system Can we tolerate loose data consistency? –The client eventually gets the consistent data, but not right away

16 CMPT 401 Summer 2007 © A. Fedorova Example of Data Inconsistency Client operations: write(x = 5); read(x) // should return 5 on a single-server system On a replicated system: write(x = 5); the primary responds to the client; the primary crashes before propagating the update to the other replicas; a new primary is selected; read(x) // may return x ≠ 5, because the new primary does not know about the update to x
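A toy replay of this scenario, with made-up variable names, showing how an acknowledged write can be lost after failover:

```python
# The primary acknowledges write(x = 5), crashes before propagating it,
# and a stale backup is promoted; the client's next read misses the update.

primary = {"x": 0}
backup = {"x": 0}

def write(key, value):
    primary[key] = value
    return "ok"              # acknowledged to the client...
                             # ...but the primary crashes before copying to the backup

print(write("x", 5))         # client sees "ok"
new_primary = backup         # failover: the backup becomes the new primary
print(new_primary["x"])      # 0, not 5 -- the acknowledged update was lost
```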

17 CMPT 401 Summer 2007 © A. Fedorova Design Considerations for Replicated Services Where to submit updates? –A designated server or any server? When to propagate updates? –Eager or lazy? How many replicas to install?

18 CMPT 401 Summer 2007 © A. Fedorova Where to Submit Updates? Primary Copy: - Each object has a primary copy - Often there is a designated primary - it holds primary copies for all objects - Updates on object x have to be submitted to the primary copy of x - Primary propagates changes on x to secondary copies - Secondary copies are read-only - Also called master/slave approach

19 CMPT 401 Summer 2007 © A. Fedorova Where to Submit Updates? Update Everywhere: –Both read and write operations can be submitted to any server –This server takes care of the execution of the operation and the propagation of updates to the other copies [Diagram: transactions T1: r(x) w(y) and T2: r(y) w(y) submitted to different servers]

20 CMPT 401 Summer 2007 © A. Fedorova When to Propagate Updates? Eager: –Within the boundaries of the transaction for replicated databases –Before the response is sent to the client for non-transactional services Lazy: –After the commit of the transaction for replicated databases –After the response is sent to the client for non-transactional services

21 CMPT 401 Summer 2007 © A. Fedorova PB Replication with Eager Updates 1.The client sends the request to the primary 2.There is no initial coordination 3.The primary executes the request 4.The primary coordinates with the other replicas by sending the update information to the backups 5.The primary (or another replica) sends the answer to the client Updates are propagated eagerly, before we respond to client
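A minimal sketch of primary-backup replication with eager propagation, following the steps above; the class and method names are illustrative assumptions:

```python
# The primary executes the request, propagates the update to every backup,
# and responds to the client only after the backups have acknowledged it.

class Backup:
    def __init__(self):
        self.state = {}

    def apply(self, key, value):
        self.state[key] = value
        return "ack"

class Primary:
    def __init__(self, backups):
        self.state = {}
        self.backups = backups

    def handle_write(self, key, value):
        self.state[key] = value                             # execute the request
        acks = [b.apply(key, value) for b in self.backups]  # replica coordination (eager)
        assert all(a == "ack" for a in acks)
        return "ok"                                         # client response comes last

backups = [Backup(), Backup()]
primary = Primary(backups)
print(primary.handle_write("x", 5))   # "ok" only after both backups applied x = 5
```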

22 CMPT 401 Summer 2007 © A. Fedorova Eager Update Propagation

23 CMPT 401 Summer 2007 © A. Fedorova Eager Update Propagation for Transactional Services [Diagram: updates propagated either on every update or once at the end of the transaction]

24 CMPT 401 Summer 2007 © A. Fedorova When Can a Failure Occur? F1: Primary fails before replica coordination –Client receives no response. It will retry. Eventually it will get data from the new primary. F2: Primary fails during replica coordination –Replicas may or may not have reached agreement w.r.t. the client’s transaction. The client may receive a response after the system recovers. The system may fail to recover (if the agreement protocol blocks). F3: Primary fails after replica coordination –A new primary responds [Diagram: request timeline with Phase 1: client request, Phase 3: execution, Phase 4: replica coordination, Phase 5: client response; F1, F2, and F3 mark the possible failure points]

25 CMPT 401 Summer 2007 © A. Fedorova Lazy Update Propagation (Transactional Services) Primary Copy: –Upon read: read locally and return to user –Upon write: write locally and return to user –Upon commit/abort: terminate locally –Sometime after commit: multicast changed objects in a single message to other sites (in FIFO) Secondary copy: –Upon read: read locally –Upon message from primary copy: install all changes (FIFO) –Upon write from client: refuse (writing clients must submit to primary copy) –Upon commit/abort request (only for read-only txn): local commit Note: existing systems allow different objects to have different primary copies –A transaction that wants to write X (primary copy is site S1) and Y (primary copy on site S2) is usually disallowed
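A sketch of lazy primary-copy propagation, assuming a simple FIFO queue of committed changes; the names are illustrative and not any real system's API:

```python
from collections import deque

# The primary commits locally and answers the client right away; changed
# objects are shipped to the secondaries later, in FIFO order.

class LazyPrimary:
    def __init__(self, secondaries):
        self.state = {}
        self.secondaries = secondaries
        self.pending = deque()          # FIFO queue of committed changes

    def commit_write(self, key, value):
        self.state[key] = value
        self.pending.append((key, value))
        return "ok"                     # respond before propagation

    def propagate(self):
        # Runs sometime after commit (e.g., on a timer or background thread).
        while self.pending:
            change = self.pending.popleft()
            for s in self.secondaries:
                s.install(change)       # secondaries apply changes in FIFO order

class Secondary:
    def __init__(self):
        self.state = {}

    def install(self, change):
        key, value = change
        self.state[key] = value
```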

26 CMPT 401 Summer 2007 © A. Fedorova Lazy Update Propagation A client may end up with an inconsistent view of the system

27 CMPT 401 Summer 2007 © A. Fedorova Lazy Propagation: Discussion Lazy replication does no server/agreement coordination before responding to the client –Faster –Transactions might be lost if the primary crashes Weak data consistency –Simple to achieve –Secondary copies only need to apply updates in FIFO order –Data at secondary copies might be stale Multiple primaries possible (multi-master replication) –More locality

28 CMPT 401 Summer 2007 © A. Fedorova How Many Replicas? Properties of a correct PB protocol –Property 1: There is at most one primary at any time –Property 2: Each client maintains the identity of the primary and sends its requests only to the primary –Property 3: If a client update arrives at a backup, it is not processed When a primary fails, we must elect a new one Network partitions may cause election of more than one primary We can avoid network partitions by choosing the right number of replicas (under certain failure assumptions) How many replicas do we need to tolerate failures?

29 CMPT 401 Summer 2007 © A. Fedorova System Model Synchronous system (useful for deriving theoretical results) Fully connected network (exactly one FIFO link between any two processes) Failure model: –Crash failures: also known as failstop failures –Crash+Link failures: A server may crash or a link may lose messages (but links do not delay, duplicate or corrupt messages) –Receive-Omission failures: A server may crash and may also omit to receive some of the messages sent over a non-faulty link –Send-Omission failures: A server may fail not only by crashing but also by omitting to send some messages over a non-faulty link –General-Omission failures: A server may exhibit both send-omission and receive-omission failures

30 CMPT 401 Summer 2007 © A. Fedorova Lower Bounds on Replication How many replicas n do you need to tolerate f failures? –Crash: n > f –Crash+link: n > f+1 –Receive-omission: n > 3f/2 –Send-omission: n > f –General-omission: n > 2f

31 CMPT 401 Summer 2007 © A. Fedorova Crash Failures, Send-Omission Failures: n > f Replicas [Diagram: f replicas fail (crashed or failing to send); one of the remaining correct replicas becomes primary]

32 CMPT 401 Summer 2007 © A. Fedorova Other Failure Models The rest of the failure models may create partitions Partitions: Servers are divided into mutually non-communicating partitions A primary may emerge in each partition, so we’ll have more than one primary – against the rules To avoid partitions, we use more replication

33 CMPT 401 Summer 2007 © A. Fedorova Crash+Link Failures: n > f+1 Replicas [Diagram: Scenario 1: f servers fail and a surviving server becomes primary; Scenario 2: f links fail, one server is unreachable but alive, and both sides elect a primary. Problem: two primaries!]

34 CMPT 401 Summer 2007 © A. Fedorova Crash+Link Failures: n > f+1 Replicas [Diagram: two servers each become primary; one is unreachable but alive] We need another correct node that would serve as a link between the two partitions We can assume that its links will be correct, because we allow no more than f failures

35 CMPT 401 Summer 2007 © A. Fedorova Omission Failures Precise definitions of omission failures [Perry-Toueg86] Notation: sent(Pj, Pi) – a message sent from Pj to Pi; received(Pi, Pj) – a message received by Pi from Pj Receive-omission failure of Pi with respect to Pj: sent(Pj, Pi) ≠ received(Pi, Pj) Send-omission failure of Pi with respect to Pj: Pi fails to send a message prescribed by the protocol to Pj General-omission failure of Pi w.r.t. Pj: Pi commits both receive-omission and send-omission failures w.r.t. Pj

36 CMPT 401 Summer 2007 © A. Fedorova Receive-Omission Failures: n > 3f/2 Replicas [Diagram: servers divided into groups A, B, and C; f servers fail, f/2 each in B and C; a server in A becomes primary]

37 CMPT 401 Summer 2007 © A. Fedorova Receive-Omission Failures: n > 3f/2 Replicas [Diagram: f servers fail, f/2 each in A and C; a server in B becomes primary]

38 CMPT 401 Summer 2007 © A. Fedorova Receive-Omission Failures: n > 3f/2 Replicas Servers in A commit receive-omission failures w.r.t. processes outside their partition From the A servers’ perspective, everyone else has crashed: a partition! A server in A becomes primary Servers in B commit receive-omission failures, so a server in B also becomes primary Problem: two primaries! We need another non-failed server that links the partitions [Diagram: groups A, B, and C]

39 CMPT 401 Summer 2007 © A. Fedorova General-Omission Failures: n > 2f Replicas [Diagram: two groups A and B of f servers each; each group sees the other as failed, and each elects its own primary]

40 CMPT 401 Summer 2007 © A. Fedorova General-Omission Failures: n > 2f Replicas A commits general-omission failures w.r.t. servers in B A’s servers think all servers in B failed – one of them becomes primary B’s servers think all servers in A failed – one of them becomes primary A server in A becomes a primary and a server in B becomes a primary: we have two primaries To fix this, we need another non-faulty server that will link the two partitions

41 CMPT 401 Summer 2007 © A. Fedorova How Many Replicas? Summary We showed how many replicas are needed to prevent partitions in the face of f failures However, partitions do happen, due to router failures for example So having extra replicas won’t help, because they will also be on one of the sides of the faulty router Next we’ll talk about surviving failures despite network partitions

42 CMPT 401 Summer 2007 © A. Fedorova Surviving Network Partitions Most systems operate under the assumption that a partition will eventually be repaired Optimistic approach: –Allow updates in all partitions –When the partition is repaired, eventually synchronize the data –OK for a distributed file system (think about your laptop in disconnected mode) Pessimistic approach: –Allow updates only in a single partition – used where strong consistency is required (e.g., a flight reservation system) –Which partition? This is usually decided by quorum consensus –After the partition is repaired, update the copies of data in the other partition

43 CMPT 401 Summer 2007 © A. Fedorova Quorum Consensus A quorum is a sub-group of servers whose size gives it the right to carry out the operation Usually the majority gets the quorum Design/implementation challenges: –Replicas must agree that they are behind a partition – must rely on timeouts, failure detectors (special devices?) –If the quorum set does not contain the primary, the replicas must elect a new primary –Cost consideration: to tolerate one partition, must have at least three servers. Implement one as a simple witness? [Diagram: a majority quorum of servers]
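A minimal sketch of a majority-quorum check, assuming the replicas already agree on the total membership; the function name and numbers are illustrative:

```python
# Decide which partition may keep accepting updates after a network split:
# only a partition holding a strict majority of the servers has the quorum.

def has_quorum(reachable: int, total: int) -> bool:
    return reachable > total // 2

# e.g., 5 replicas split 3 / 2: only the side with 3 may process updates.
print(has_quorum(3, 5))   # True
print(has_quorum(2, 5))   # False
```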

44 CMPT 401 Summer 2007 © A. Fedorova Bringing Replicas Up-to-Date Version numbers: –Each copy has a version number (or a timestamp) –Only copies that are up-to-date have the current version number –Operations should be applied only to copies with the current version number How does a failed server find out that it is not up-to-date? –Periodically compare all version numbers? Log sequence numbers: –Each operation is written to a log (like a transactional log) –Each log record has a log sequence number (LSN) –Replica managers compare LSNs to find out if they are not up-to-date –Used by the Berkeley DB replication system
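A sketch of LSN-based catch-up, assuming a simple append-only log; the log layout and function names are illustrative and not Berkeley DB's actual API:

```python
# A recovering replica compares its last LSN with the primary's log and
# applies the missing records in order.

class ReplicatedLog:
    def __init__(self):
        self.records = []                 # records[i] holds the operation with LSN i + 1

    def append(self, op):
        self.records.append(op)
        return len(self.records)          # the new LSN

    def records_after(self, lsn):
        return list(enumerate(self.records[lsn:], start=lsn + 1))

def catch_up(replica_lsn, primary_log, apply_op):
    # Apply every record the replica is missing, in LSN order.
    for lsn, op in primary_log.records_after(replica_lsn):
        apply_op(op)
        replica_lsn = lsn
    return replica_lsn

log = ReplicatedLog()
log.append(("write", "x", 5))
log.append(("write", "y", 7))
state = {}
print(catch_up(0, log, lambda op: state.update({op[1]: op[2]})))   # 2
print(state)                                                       # {'x': 5, 'y': 7}
```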

45 CMPT 401 Summer 2007 © A. Fedorova Summary Discussed replication –Used for performance, high availability Active replication –Client sends updates to all replicas –Replicas co-ordinate amongst themselves, apply updates in order Passive replication (primary copy, primary-backup) –Eager/lazy update propagation –Number of replicas to prevent partitions Handling partitions –Optimistic –Pessimistic (quorum consensus) Next time we will look at real systems that use replication