Distributed Systems Fall 2010 Replication
Fall 2010, 5DV020
Outline
  Group communication
  Fault-tolerant services
    – Passive and active replication
  Highly available services
Group communication
  Static vs. dynamic groups
  Primary partition vs. partitionable groups
  Group management
    – Interface for membership changes
    – Failure detection
    – Notification upon membership changes
    – Group address expansion
Group views
  Views contain the set of members at a given point in time
    – Processes identified as failed are not in the view
  Events occur in views
  View-synchronous group communication
    – Based on view delivery, we can know which messages must have been delivered to other members
View-synchronous group communication
  Correct processes deliver the same set of messages in any given view
  Messages are delivered at most once
  Correct processes always deliver the messages they send
    – If delivering to q fails, the next view excludes q
Why replication?
  Many algorithms require a working server node
  Performance (load balancing)
  Increased availability
    – Availability = 1 - p(all replicas crashed) = 1 - p^n
    – p: probability that a single replica has crashed, n: number of replicas (failures assumed independent)
  Fault tolerance
    – Correct servers in majority
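The availability formula above can be tried out numerically. This is a minimal sketch, assuming independent replica failures; the function name `availability` and the value p = 0.05 are made up for illustration.

```python
def availability(p: float, n: int) -> float:
    """Probability that at least one of n replicas is up.

    p is the probability that a single replica has crashed;
    failures are assumed to be independent (an assumption of this sketch).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - p ** n


if __name__ == "__main__":
    # Illustrative value only: each replica is down 5% of the time.
    for n in (1, 2, 3):
        print(n, round(availability(0.05, n), 6))
```

Each added replica multiplies the probability of total failure by p, so availability approaches 1 quickly when p is small.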
Replication
  Replication transparency
    – Client is unaware of replication
  Problem with more than one client
    – Concurrent access, rather than exclusive
    – Operations are interleaved
  How do we ensure correctness?
Correctness of interleavings
  Always
    – The interleaved sequence of operations must meet the specification of a single correct copy of the object(s)
  Sequential consistency property
    – The order of operations is consistent with the program order in which each individual process executed them
  Linearizability property
    – The order of operations is consistent with the real times at which the operations occurred during execution
Example (interleaved operations)
  C1: A, B, C
  C2: d, e, f
  Order during execution: A, B, d, C, e, f
  An interleaving with sequential consistency: A, B, d, e, f, C
  Interleaving with linearizability: A, B, d, C, e, f
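The ordering conditions in this example can be checked mechanically. The sketch below is a simplification: it only tests the ordering part of each property (not whether the results meet the specification of a single correct copy), and it assumes operations do not overlap in time, so the real-time order is a total order. The function names are hypothetical.

```python
def respects_program_order(interleaving, programs):
    """True if every client's operations appear in the interleaving
    in the same order in which that client issued them."""
    for program in programs:
        ops = set(program)
        projected = [op for op in interleaving if op in ops]
        if projected != list(program):
            return False
    return True


def respects_real_time_order(interleaving, execution_order):
    """True if the interleaving matches the real-time order of execution.

    Assumes non-overlapping operations, so real time gives a total order.
    """
    return list(interleaving) == list(execution_order)


if __name__ == "__main__":
    c1 = ["A", "B", "C"]
    c2 = ["d", "e", "f"]
    executed = ["A", "B", "d", "C", "e", "f"]
    candidate = ["A", "B", "d", "e", "f", "C"]

    # Ordering required for sequential consistency: program order only.
    print(respects_program_order(candidate, [c1, c2]))
    # Ordering required for linearizability: real-time order as well.
    print(respects_real_time_order(candidate, executed))
    print(respects_real_time_order(executed, executed))
```

The candidate interleaving keeps each client's program order but moves C after e and f, which is why it satisfies the first check and not the second.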
Generalized replication
  1. Request: client makes a request
  2. Coordination: replica managers decide on the order of requests
  3. Execution: the request is executed
  4. Agreement: replica managers agree on the result of the execution
  5. Response: the response is sent back to the client
Passive replication
  One primary replica manager, many backups
  If the primary fails, a backup can take its place (election!)
  Implements linearizability if:
    – A failing primary is replaced by a unique backup
    – Backups agree on which operations had been performed when the primary crashed
  View-synchronous group communication!
Passive replication
  1. Request: front end issues the request with a unique ID
  2. Coordination: primary checks whether the request has already been carried out; if so, it returns the cached response
  3. Execution: primary performs the operation and caches the result
  4. Agreement: primary sends the updated state to the backups
  5. Response: primary sends the result to the front end, which forwards it to the client
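The five passive-replication steps can be sketched in a single process as follows. This is an illustration under stated assumptions, not a real implementation: the classes `Primary` and `Backup`, the key/value write operation, and the response format are hypothetical; there is no network, failure detection, or election; and the primary also forwards the cached response to the backups (an assumption here, so that a backup taking over could answer retransmitted requests).

```python
class Backup:
    """Backup replica manager: only stores what the primary sends."""

    def __init__(self):
        self.state = {}
        self.responses = {}

    def apply_update(self, state, request_id, response):
        self.state = dict(state)
        self.responses[request_id] = response


class Primary:
    """Primary replica manager for a simple key/value store."""

    def __init__(self, backups):
        self.state = {}
        self.responses = {}  # request ID -> cached response
        self.backups = backups

    def handle(self, request_id, key, value):
        # 2. Coordination: a repeated request gets the cached response.
        if request_id in self.responses:
            return self.responses[request_id]
        # 3. Execution: perform the operation and cache the result.
        self.state[key] = value
        response = f"ok:{key}={value}"
        self.responses[request_id] = response
        # 4. Agreement: send the updated state to all backups.
        for backup in self.backups:
            backup.apply_update(self.state, request_id, response)
        # 5. Response: returned to the front end, which forwards it.
        return response


if __name__ == "__main__":
    backups = [Backup(), Backup()]
    primary = Primary(backups)
    # 1. Request: the front end attaches a unique ID.
    print(primary.handle("req-1", "x", 1))
    # A retransmission with the same ID is not executed again.
    print(primary.handle("req-1", "x", 2))
    print(primary.state, backups[0].state)
```

The request ID is what makes retransmissions safe: the second call with `req-1` returns the cached response and leaves the state untouched.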
Active replication
  More distributed
  All replica managers (RMs) carry out all operations
  Requests to the RMs are totally ordered
  Front ends issue one request at a time (FIFO)
  Implements sequential consistency
Active replication
  1. Request: front end adds a unique identifier to the request and multicasts it to the RMs
  2. Coordination: totally ordered delivery of requests to the RMs
  3. Execution: each RM executes the request
  4. Agreement: not needed
  5. Response: all RMs respond to the front end, which interprets the responses and forwards its interpretation to the client
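A single-process sketch of the active-replication steps is shown below. Several things are assumptions made for illustration: the class names, the running-total operation, and the use of a majority vote as the front end's way of interpreting responses. A plain loop over the replicas stands in for totally ordered multicast, which a real system would need a proper protocol for.

```python
from collections import Counter
from itertools import count


class ReplicaManager:
    """Deterministic state machine: keeps a running total."""

    def __init__(self):
        self.total = 0
        self.seen = {}  # request ID -> response already computed

    def deliver(self, request_id, amount):
        # 3. Execution: every RM executes every request exactly once.
        if request_id not in self.seen:
            self.total += amount
            self.seen[request_id] = self.total
        return self.seen[request_id]


class FaultyReplicaManager(ReplicaManager):
    """Executes correctly but reports a wrong result (arbitrary failure)."""

    def deliver(self, request_id, amount):
        super().deliver(request_id, amount)
        return -1


class FrontEnd:
    def __init__(self, name, replicas):
        self.name = name
        self.replicas = replicas
        self._ids = count(1)

    def request(self, amount):
        # 1. Request: attach a unique identifier.
        request_id = (self.name, next(self._ids))
        # 2. Coordination: this loop finishes one request at all RMs
        #    before the next starts, so all RMs see the same order.
        responses = [rm.deliver(request_id, amount) for rm in self.replicas]
        # 4. Agreement: not needed.
        # 5. Response: interpret the replies, here by majority vote.
        value, votes = Counter(responses).most_common(1)[0]
        if votes <= len(self.replicas) // 2:
            raise RuntimeError("no majority among replica responses")
        return value


if __name__ == "__main__":
    rms = [ReplicaManager(), ReplicaManager(), FaultyReplicaManager()]
    front_end = FrontEnd("fe1", rms)
    print(front_end.request(5))
    print(front_end.request(3))
```

With three replicas the vote masks one wrong reply, which is the sense in which active replication can tolerate arbitrary failures while passive replication cannot.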
Comparison (active/passive)
  Handling of crash failures?
    – Both: yes (but differently)
  Handling of arbitrary failures?
    – Active: yes, passive: no
  Complexity?
  Optimizations?
    – Send “reads” to backups in passive
        Lose the linearizability property!
    – Send “reads” to a single replica manager in active
        Lose fault tolerance
Highly available services
  Goal is to allow clients to use the service for as long as possible
    – Even if network connections are lost
    – Even if results may be inconsistent
Gossip
  Guarantees by gossip
    – Each client gets a consistent service over time
        Replicas provide data that is at least as fresh as what the client has seen so far
    – Relaxed consistency between replicas
        Generally weaker than sequential consistency
        Eventually, all updates are applied (in order), but clients may observe stale data
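One way to picture the first guarantee is with vector timestamps: the client remembers a timestamp for what it has seen, and a replica answers a query only if its own timestamp covers it. The slides do not specify a mechanism, so everything below (the `Replica` class, the update log, the `None` reply meaning "not yet fresh enough") is a hypothetical sketch rather than the actual gossip protocol.

```python
def dominates(a, b):
    """True if timestamp a reflects at least every update reflected in b."""
    return all(x >= y for x, y in zip(a, b))


def merge(a, b):
    """Component-wise maximum of two vector timestamps."""
    return [max(x, y) for x, y in zip(a, b)]


class Replica:
    def __init__(self, index, n_replicas):
        self.index = index
        self.timestamp = [0] * n_replicas  # updates applied, per origin
        self.log = {}  # (origin, sequence number) -> value

    def update(self, value):
        """Accept a client update; return the timestamp covering it."""
        self.timestamp[self.index] += 1
        self.log[(self.index, self.timestamp[self.index])] = value
        return list(self.timestamp)

    def gossip_from(self, other):
        """Apply the other replica's updates that are missing here,
        in sequence order per originating replica."""
        for origin, seq in sorted(other.log):
            if seq == self.timestamp[origin] + 1:
                self.log[(origin, seq)] = other.log[(origin, seq)]
                self.timestamp[origin] = seq

    def query(self, client_prev):
        """Answer only if this replica is at least as fresh as the client."""
        if not dominates(self.timestamp, client_prev):
            return None  # a real replica would wait for gossip instead
        values = [self.log[key] for key in sorted(self.log)]
        return values, list(self.timestamp)


if __name__ == "__main__":
    r0, r1 = Replica(0, 2), Replica(1, 2)
    prev = [0, 0]                         # what the client has seen so far
    prev = merge(prev, r0.update("hello"))
    print(r1.query(prev))                 # r1 is stale: no answer yet
    r1.gossip_from(r0)
    print(r1.query(prev))                 # now fresh enough
```

Between gossip rounds the two replicas disagree, which is the relaxed consistency the slide refers to; the timestamp check only prevents a client from going backwards in time.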
Gossip contd.
  Covered in more depth later by Daniel
  Highly relevant for today’s distributed systems
  Used by e.g. Facebook for Cassandra (source)
Summary
  Group communication
    – Views
    – View-synchronous group communication
  Replication
    – Correctness
        Linearizability: time
        Sequential consistency: program order
    – Passive and active replication schemes
Next lecture
  Transactions
    – Nested transactions
  Concurrency control
    – Locks
    – Optimistic concurrency control