1 CS 582 / CMPE 481 Distributed Systems
Replication (cont.)

2 Class Overview
- Group model
- Replication model
- Request ordering
- Case study
  - lazy replication
  - transactions with replicated data

3 Lazy Replication
Design principle
- provide a method that supports causal ordering for consistency while offering better performance
- a request is executed at one replica; the other replicas are updated afterwards through the exchange of gossip messages (hence "lazy" replication)
Supported orderings
- causal ordering
- forced ordering (causal + total ordering)
- immediate ordering (sync ordering)
Characteristics
- it is a replication method, not a replication framework
- client replication is not considered
- claimed to be better suited for providing replicated services

4 Lazy Replication (cont.)
Operational model
- FE (front end)
  - a client's request always goes through an FE
  - an FE usually communicates with one replica (the closest available one); if that replica fails or is heavily loaded, the FE contacts another
  - each FE keeps a vector timestamp (see the sketch below)
    - attaches it to every request
    - updates it with the new timestamp obtained from each result, or from contact with other FEs
- operation types
  - query: read-only operation
  - update: modification without reading the state
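
A minimal sketch, in Python, of the FE bookkeeping just described. The names `FrontEnd`, `handle_query`, and `handle_update` are hypothetical, since the slides do not fix an interface; only the vector-timestamp merging follows the text above.

```python
# Sketch only: FrontEnd, handle_query, and handle_update are assumed names.

def merge(a, b):
    """Component-wise maximum of two vector timestamps."""
    return [max(x, y) for x, y in zip(a, b)]

class FrontEnd:
    def __init__(self, num_replicas):
        self.prev = [0] * num_replicas   # the FE's vector timestamp

    def query(self, replica, q):
        # Attach prev so the replica can delay the query until it is at
        # least as up to date as everything this FE has already observed.
        value, new_ts = replica.handle_query(q, self.prev)
        self.prev = merge(self.prev, new_ts)
        return value

    def update(self, replica, op, op_id):
        # op_id uniquely identifies the request, so a retried update is
        # applied at most once.
        ts = replica.handle_update(op, self.prev, op_id)
        self.prev = merge(self.prev, ts)
```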

5 Lazy Replication (cont.)
Replica state (kept so that the data stays consistent)
- value: the application state itself
- value timestamp: identifies the updates reflected in the current value, used to validate it
- update log: kept for safety and for consistency among replicas
- replica timestamp: identifies the updates this replica has received and logged
- request id: a unique id per update request
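
The state listed above can be pictured as the following Python sketch; the class and field names (`LogRecord`, `ReplicaState`) are illustrative, not taken from the original paper.

```python
# Sketch only: names are illustrative.
from dataclasses import dataclass, field

@dataclass
class LogRecord:
    replica_id: int   # replica that accepted the update
    ts: list          # timestamp assigned to this update
    op: object        # the update operation itself
    prev: list        # FE timestamp: updates that must precede this one
    op_id: str        # unique request id, used to filter duplicates

@dataclass
class ReplicaState:
    value: object     # the application state
    value_ts: list    # updates reflected in value
    update_log: list = field(default_factory=list)  # kept update records
    replica_ts: list = field(default_factory=list)  # updates received/logged
    executed: set = field(default_factory=set)      # op_ids already applied
```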

6 Lazy Replication (cont.)
Causal operation
- FEs inform replicas of the causal ordering of requests
  - every reply to an update request contains a unique id (uid)
  - every request carries a set of uids (its label) listing all the updates that should have happened before it
  - a query returns the value together with a label listing all the updates that should have happened before it
- the FE guarantees causality
  - the FE sends its label in every request and merges the uid or label in each reply into its own label (set union)
  - when FEs communicate with other clients, the labels carried in messages from other FEs are merged into the FE's label as well
- gossip message handling at a replica (sketched below)
  - a gossip message carries the sending replica's log and replica timestamp
  - tasks on receipt:
    - add the new information in the message to the replica's log
    - merge the replica's timestamp with the message's timestamp
    - apply any updates that have become stable and have not been executed before
    - discard update records from the log once they are known to have been received by all replicas
    - discard redundant entries from the log and the executed-request (cid) list
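
A sketch of the gossip-handling steps, reusing the `ReplicaState` fields from the previous sketch. `apply_update` is an assumed helper, and the pruning steps (4-5) are only noted in a comment.

```python
# Sketch only: apply_update is an assumed helper; duplicate log entries
# and log pruning are handled only superficially.

def leq(a, b):
    """True iff vector timestamp a <= b component-wise."""
    return all(x <= y for x, y in zip(a, b))

def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]

def receive_gossip(state, msg_log, msg_ts):
    # 1. Add records from the message that this replica has not seen yet.
    for r in msg_log:
        if not leq(r.ts, state.replica_ts):
            state.update_log.append(r)
    # 2. Merge the replica timestamp with the message's timestamp.
    state.replica_ts = merge(state.replica_ts, msg_ts)
    # 3. Apply updates that have become stable, i.e. every causally
    #    preceding update is already reflected in the value.
    progress = True
    while progress:
        progress = False
        for r in state.update_log:
            if r.op_id not in state.executed and leq(r.prev, state.value_ts):
                state.value = apply_update(state.value, r.op)
                state.value_ts = merge(state.value_ts, r.ts)
                state.executed.add(r.op_id)
                progress = True
    # 4-5. Discarding records known at all replicas and redundant entries
    #      from the log and cid list is omitted in this sketch.
```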

7 Lazy Replication (cont.)
Forced ordering
- provided only for updates
- implementation: causal and total ordering
  - same as causal ordering, except that uids are totally ordered with respect to one another
  - one replica becomes the primary sequencer; any replica can take over the role when the primary fails
- forced update operation (sketched below)
  - phase 1: prepare
    - assign a uid to the update
    - create a log record for the update and send it to the backup replicas, which in turn acknowledge receipt
  - phase 2: commit
    - when a majority of backups have acknowledged, the primary adds the record to its log, applies the update to the value, and replies to the FE
    - backups are informed of the commit in the next gossip message
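
A sketch of the primary's two phases for a forced update. `store_record` and `apply_update` are assumed interfaces, and message transport is abstracted into plain method calls.

```python
# Sketch only: store_record and apply_update are assumed names.

def forced_update(primary, op, op_id):
    # Phase 1: prepare. Assign a uid from a counter, so forced uids are
    # totally ordered with respect to one another.
    uid = primary.next_seq
    primary.next_seq += 1
    record = {"uid": uid, "op": op, "op_id": op_id}
    acks = sum(1 for b in primary.backups if b.store_record(record))
    # Phase 2: commit. A majority of the group (backups + primary) must
    # hold the record before the primary applies it and replies to the FE.
    if acks + 1 > (len(primary.backups) + 1) // 2:
        primary.log.append(record)
        primary.value = apply_update(primary.value, op)
        return uid   # backups learn of the commit in the next gossip message
    raise RuntimeError("no majority; retry or elect a new primary")
```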

8 Lazy Replication (cont.)
Immediate ordering
- provided only for updates
- implementation
  - one replica becomes the primary
  - the primary coordinates with all replicas to determine which updates precede the sync update, and generates its label
  - a 3-phase operation is used so that a failure of the primary cannot leave queries blocked (sketched below)
    - phase 1: pre-prepare
      - ask every backup to send its log and timestamp
      - once a reply from every backup has arrived, stop responding to queries
    - phase 2: prepare
      - create a log record for the immediate update and send it to the backups
    - phase 3: commit
      - if a majority of backups acknowledge receipt, the primary adds the record to its log, applies the update to the value, and replies to the FE
      - if a failure occurs, the new primary carries out phase 2 again
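
A sketch of the three phases at the primary. `send_log_and_ts`, `store_record`, `compute_label`, and `apply_update` are all assumed names; the phase structure follows the slide.

```python
# Sketch only: all interfaces here are assumed.

def immediate_update(primary, op, op_id):
    # Phase 1: pre-prepare. Collect every backup's log and timestamp;
    # once all have replied, stop answering queries so that no query can
    # be ordered inconsistently with this sync update.
    states = [b.send_log_and_ts() for b in primary.backups]
    primary.queries_blocked = True
    label = compute_label(states)   # updates preceding the sync update
    # Phase 2: prepare. Replicate the record; if the primary fails from
    # here on, the new primary repeats this phase.
    record = {"op": op, "op_id": op_id, "label": label}
    acks = sum(1 for b in primary.backups if b.store_record(record))
    # Phase 3: commit.
    if acks + 1 > (len(primary.backups) + 1) // 2:
        primary.log.append(record)
        primary.value = apply_update(primary.value, op)
        primary.queries_blocked = False
        return "committed"
    raise RuntimeError("no majority; a new primary must redo phase 2")
```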

9 Transactions with Replicated Data
Replicated transactions
- transactions in which each logical data item is physically replicated at a group of servers (replicas)
One-copy serializability
- the effects of transactions performed by various clients on replicated data items are the same as if they had been performed one at a time on a single data item
- to achieve this, concurrency control mechanisms are applied at all replicas
- the 2PC protocol becomes a two-level nested 2PC protocol (sketched below)
  - phase 1: a worker forwards the "ready?" message to its replicas and collects their answers
  - phase 2: a worker forwards the "commit" message to its replicas
- primary copy replication: concurrency control is applied only at the primary
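
A sketch of the worker's side of the two-level nested 2PC: the top-level 2PC between coordinator and workers is unchanged, and each worker runs a second 2PC round with the replicas of its data item. `prepare`, `commit`, and `abort` on a replica are assumed interfaces.

```python
# Sketch only: replica interfaces are assumed.

def worker_phase1(replicas):
    # Forward "ready?" to every replica and collect the answers; the
    # worker votes ready at the top level only if all replicas are ready.
    return all(r.prepare() for r in replicas)

def worker_phase2(replicas, decision):
    # Forward the coordinator's decision to every replica.
    for r in replicas:
        r.commit() if decision == "commit" else r.abort()
```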

10 Transactions with Replicated Data (cont.)
Available copies replication
- designed to tolerate some replicas being unavailable
- a client's Read operation is performed on any one available copy, but a Write operation is performed on all available copies (sketched below)
- failures and recoveries of replicas must be serialized to support one-copy serializability
- local validation: a concurrency control mechanism that ensures no failure or recovery event appears to happen during the progress of a transaction
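
A sketch of read-one/write-all-available under this scheme, assuming each copy exposes `available`, `read`, and `write`.

```python
# Sketch only: the copy interface is assumed.

def read(copies, item):
    # Read from any one available copy.
    for c in copies:
        if c.available:
            return c.read(item)
    raise RuntimeError("no available copy")

def write(copies, item, value):
    # Write to every available copy; copies that are down must catch up
    # on recovery before serving reads again.
    targets = [c for c in copies if c.available]
    if not targets:
        raise RuntimeError("no available copy")
    for c in targets:
        c.write(item, value)
```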

11 Transactions with Replicated Data (cont.)
Network partition
- can separate a group of replicas into subgroups between which communication is not possible
- it is assumed that the partition will eventually be repaired
Resolutions
- available copies with validation
- quorum consensus
- virtual partition

12 Transactions with Replicated Data (cont.)
Available copies with validation
- the available copies algorithm is applied within each partition
- after the partition is repaired, possibly conflicting transactions are validated
  - version vectors can be used to check the validity of separately committed data items (sketched below)
  - precedence graphs can be used to detect conflicts between Read and Write operations across partitions
- only feasible for applications where compensation is allowed
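
A sketch of the version-vector validity check: two separately committed versions of an item are compatible only if one vector dominates the other.

```python
# Sketch only: a version vector holds one update counter per partition.

def dominates(a, b):
    """True iff version vector a includes every update that b does."""
    return all(x >= y for x, y in zip(a, b))

def validate(v1, v2):
    # Compatible iff one vector dominates the other; otherwise the item
    # was updated on both sides of the partition and needs reconciliation
    # (or compensation).
    if dominates(v1, v2) or dominates(v2, v1):
        return "ok"
    return "conflict"

# Example: validate([2, 0], [2, 1]) -> "ok"
#          validate([2, 0], [1, 1]) -> "conflict"
```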

13 Transactions with Replicated Data (cont.)
Quorum consensus [Gifford 1979]
- operations are allowed only when a certain number of replicas (a quorum) is available in the partition
- only one partition can commit operations, which prevents transactions in different partitions from producing inconsistent results
- other subgroups are left with out-of-date copies, detected via version numbers or timestamps
- quorum constraints (checked in the sketch below)
  - W > half the total votes
  - R + W > total number of votes for the group
  - any read quorum and any write quorum therefore share at least one copy, so conflicting operations cannot miss each other
- read operations
  - check that enough copies are reachable to gather ≥ R votes (some copies may be out of date)
  - perform the operation on an up-to-date copy
- write operations
  - check that enough up-to-date copies are reachable to gather ≥ W votes
  - perform the operation on all replicas
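
A sketch of Gifford-style weighted voting, assuming each copy exposes `votes`, `version`, `value`, and `available`.

```python
# Sketch only: the copy interface is assumed.

def quorums_valid(copies, R, W):
    total = sum(c.votes for c in copies)
    # W > total/2 rules out two concurrent write quorums; R + W > total
    # forces every read quorum to overlap every write quorum.
    return W > total / 2 and R + W > total

def read(copies, R):
    group, votes = [], 0
    for c in copies:
        if c.available:
            group.append(c)
            votes += c.votes
            if votes >= R:
                # The overlap property guarantees at least one copy in
                # the quorum is current, even if others are out of date.
                return max(group, key=lambda c: c.version).value
    raise RuntimeError("no read quorum")

def write(copies, W, value):
    current = max(c.version for c in copies if c.available)
    up_to_date = [c for c in copies if c.available and c.version == current]
    if sum(c.votes for c in up_to_date) < W:
        raise RuntimeError("no write quorum of up-to-date copies")
    for c in copies:
        if c.available:              # apply to every reachable replica
            c.value, c.version = value, current + 1
```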

14 Transactions with Replicated Data (cont.)
Quorum consensus example: three configurations of the voting algorithm
- a correct choice of read and write sets
- a choice that may lead to write-write conflicts
- a correct choice, known as ROWA (read one, write all)
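
A worked check of three such configurations. The concrete numbers (12 replicas, one vote each) are an assumption for illustration, since the transcript does not reproduce the slide's figure.

```python
# Illustrative only: 12 one-vote replicas are assumed.

N = 12   # total votes
for name, R, W in [("(a)", 3, 10), ("(b)", 7, 6), ("(c) ROWA", 1, 12)]:
    ok = W > N / 2 and R + W > N
    verdict = "correct" if ok else "write-write conflicts possible"
    print(f"{name} R={R}, W={W}: {verdict}")

# (a) R=3,  W=10: correct
# (b) R=7,  W=6:  write-write conflicts possible (W is not a majority)
# (c) R=1,  W=12: correct (read one, write all)
```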

15 Transactions with Replicated Data (cont.)
Virtual partition
- a combination of quorum consensus (to cope with partitions) and the available copies algorithm (for inexpensive Read operations)
- to support one-copy serializability, a transaction aborts if a replica fails and the virtual partition changes during its progress
- when a virtual partition is formed, all its replicas must be brought up to date by copying from other replicas
- virtual partition creation (sketched below)
  - phase 1
    - the initiator sends a Join request with a logical timestamp to each potential member replica
    - each replica compares the timestamp with that of its current virtual partition: if the proposed timestamp is greater than its local one, it replies Yes; otherwise No
  - phase 2
    - if the initiator receives enough Yes replies to form both read and write quora, it sends a confirmation message with the list of members
    - each member records the timestamp and the membership
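
A sketch of the two-phase creation protocol, assuming one vote per replica and hypothetical `new_logical_timestamp`, `join_request`, and `confirm` calls.

```python
# Sketch only: all interfaces are assumed; one vote per replica.

def create_virtual_partition(initiator, candidates, R, W):
    ts = initiator.new_logical_timestamp()
    # Phase 1: a replica answers Yes iff ts is greater than the timestamp
    # of the virtual partition it currently belongs to.
    members = [r for r in candidates if r.join_request(ts)]
    # Phase 2: confirm only if the Yes set is large enough to form both
    # a read quorum and a write quorum.
    if len(members) >= R and len(members) >= W:
        ids = [r.id for r in members]
        for r in members:
            r.confirm(ts, ids)   # each member records ts and membership
        return ids
    return None   # retry later with a larger timestamp
```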

