
1 Distributed Systems CS 15-440
Replication – Part II Lecture 20, November 22, 2017 Mohammad Hammoud

2 Today…
Last Session: Replication – Part I; Data-Centric Consistency Models
Today’s Session: Replication – Part II; Client-Centric Consistency Models; Consistency Protocols
Announcements:
Quiz II and PS5 grades are out
P4 is due on Monday, Nov 27 by midnight
PS6 is due on Tuesday, Nov 28 by midnight
The final exam is on Thursday, Nov 30 from 4:00 to 7:00 PM in the classroom. It is open book, open notes.

3 Overview
Motivation (last lecture)
Consistency Models:
Data-Centric Consistency Models (last lecture)
Client-Centric Consistency Models (today’s lecture)
Consistency Protocols (today’s lecture)

4 Overview
Motivation
Consistency Models:
Data-Centric Consistency Models
Client-Centric Consistency Models
Consistency Protocols

5 Client Consistency Guarantees
Client-centric consistency provides guarantees for a single client for its accesses to a data-store
Example: providing consistency guarantees to a client process for a data item x replicated on two servers. Let xi denote the local copy of x at server Li
[Figure: the client performs x += 2, x -= 1, and x *= 5 on x1 at L1 (writes W(x1)2, W(x1)1, W(x1)5) and later x -= 2 on x2 at L2 (write W(x2)3), after the write set WS(x1) has been propagated to L2]
Notation:
Li = replica i
R(xi)b = read variable x at replica i; the result is b
W(xi)b = write variable x at replica i; the value written is b
WS(xi) = write set for xi = the series of operations done at some replica that reflects how xi was updated at Li up to this time
WS(xi;xj) = write set for xi and xj = the series of operations that reflects how xi was updated at Li and, later on, how xj was updated at Lj

6 Client Consistency Guarantees
We will study four types of client-centric consistency models:
Monotonic Reads
Monotonic Writes
Read Your Writes
Writes Follow Reads

7 Client Consistency Guarantees
Overview: Consistency Models (Data-centric, Client-centric)
Client Consistency Guarantees: Monotonic Reads, Monotonic Writes, Read Your Writes, Writes Follow Reads

8 Monotonic Reads
This model provides guarantees on successive reads
If a client process reads the value of a data item x, then any successive read operation by that process should return the same or a more recent value for x
[Figure: in the order in which the client carries out its operations, it performs R(x1) at L1 (after WS(x1)) and later R(x2) at L2 (after WS(x1;x2)); the result of R(x2) should be at least as recent as that of R(x1)]
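As a rough illustration, a client-side session library can enforce monotonic reads by remembering the version of the last value it read and skipping any replica whose copy is older. This is a minimal sketch in Python under assumed interfaces (the replica API and version scheme are illustrative, not from the lecture):

    class MonotonicReadClient:
        """Session wrapper that enforces monotonic reads for one client.

        Assumes each replica exposes read(item) -> (value, version),
        where versions increase with newer writes (an assumed scheme).
        """

        def __init__(self, replicas):
            self.replicas = replicas      # list of replica stubs (assumed API)
            self.last_version = {}        # item -> highest version read so far

        def read(self, item):
            floor = self.last_version.get(item, -1)
            for replica in self.replicas:
                value, version = replica.read(item)
                # Accept only values at least as recent as any prior read.
                if version >= floor:
                    self.last_version[item] = version
                    return value
            raise RuntimeError("no contacted replica is recent enough; retry later")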

9 Monotonic Reads – Puzzle
Recognize which of the following data-stores provide monotonic-read guarantees
[Figure 1: at L1, R(x1)5 after WS(x1); at L2, W(x2)6 then R(x2)6 after WS(x1;x2)]
[Figure 2: at L1, R(x1)5 after WS(x1); at L2, W(x2)6 then R(x2)6 after only WS(x2)]
[Figure 3: at L1, R(x1)5 after WS(x1) and later R(x1)7 after WS(x2;x1); at L2, W(x2)6, W(x2)7, and R(x2)6 around WS(x1;x2)]

10 Client Consistency Guarantees
Overview: Consistency Models (Data-centric, Client-centric)
Client Consistency Guarantees: Monotonic Reads, Monotonic Writes, Read Your Writes, Writes Follow Reads

11 Monotonic Writes
This consistency model ensures that writes are monotonic
A write operation by a client process on a data item x is completed before any successive write operation on x by the same process
In other words, a new write at a replica should wait until all of the same client’s earlier writes, performed at any replica, have been applied
[Figure: in the first data-store, W(x2) is performed at L2 without first applying W(x1) from L1, so monotonic-write consistency is not provided; in the second, the W(x2) operation is performed at L2 only after the result of W(x1) (i.e., WS(x1)) has been propagated to L2]
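A replica can enforce this ordering by buffering a client’s write until all of that client’s earlier writes have been applied locally. A minimal sketch, assuming each client numbers its writes 0, 1, 2, ... (the per-client sequence numbers are our own bookkeeping device, not from the lecture):

    class MonotonicWriteReplica:
        """Applies a client's write only after all of that client's
        earlier writes (possibly performed at other replicas) have
        been applied here."""

        def __init__(self):
            self.applied = {}     # client_id -> highest write seq applied
            self.pending = []     # writes that arrived out of order

        def write(self, client_id, seq, op):
            self.pending.append((client_id, seq, op))
            self._drain()

        def _drain(self):
            # Keep applying buffered writes whose turn has come.
            progress = True
            while progress:
                progress = False
                for entry in list(self.pending):
                    client_id, seq, op = entry
                    if seq == self.applied.get(client_id, -1) + 1:
                        self.pending.remove(entry)
                        op()                       # apply the write locally
                        self.applied[client_id] = seq
                        progress = True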

12 Client Consistency Guarantees
Overview: Consistency Models (Data-centric, Client-centric)
Client Consistency Guarantees: Monotonic Reads, Monotonic Writes, Read Your Writes, Writes Follow Reads

13 Read Your Writes
The effect of a write operation on a data item x by a process will always be seen by a successive read operation on x by the same process
Example scenario: in systems where passwords are stored in a replicated database, a password change should be propagated to all replicas
[Figure: in the first data-store, the client writes W(x1) at L1 but then reads R(x2) at L2 after only WS(x2), so it may miss its own write; in the second, the R(x2) operation is performed at L2 only after WS(x1) has been propagated to L2 (i.e., after WS(x1;x2))]

14 Client Consistency Guarantees
Overview: Consistency Models (Data-centric, Client-centric)
Client Consistency Guarantees: Monotonic Reads, Monotonic Writes, Read Your Writes, Writes Follow Reads

15 Writes Follow Reads
A write operation by a process on a data item x, following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x than the one that was read
Example scenario: users of a newsgroup should post their comments only after they have read the article and (all) previous comments
[Figure: in the first data-store, the client reads R(x1) at L1 (after WS(x1)) but W(x2) is performed at L2 after only WS(x2), so writes-follow-reads consistency is not guaranteed; in the second, the W(x2) operation is performed at L2 only after all previous writes (WS(x1)) have been propagated]

16 Overview
Motivation
Consistency Models:
Data-Centric Consistency Models
Client-Centric Consistency Models
Consistency Protocols

17 Consistency Protocols
A consistency protocol describes the implementation of a specific consistency model (e.g., strict consistency)
We will study two types of consistency protocols:
Primary-Based Protocols: one primary coordinator is elected to control replication across multiple replicas
Replicated-Write Protocols: multiple replicas coordinate to provide consistency guarantees

18 Consistency Protocols
Replica control protocols fall into two families: Primary-Based Protocols and Replicated-Write Protocols

19 Primary-Based Protocols
In primary-based protocols, a simple centralized design is used to implement consistency models
Each data item x has an associated “primary replica”
The primary replica is responsible for coordinating write operations on x
We will study one example of primary-based protocols that implements the strict consistency model: the Remote-Write Protocol
When consistency models become complex, designing distributed consistency protocols is difficult; for ease of development, simple protocols are often widely used

20 Remote-Write Protocol
Two rules:
All write operations are forwarded to the primary replica
Read operations are carried out locally at each replica
Approach for write operations:
The client connects to some replica RC
If the client issues a write operation to RC:
RC forwards the request to the primary replica RP, which updates its local value and then forwards the update to the other replicas Ri
The other replicas Ri perform the update and send ACKs back to RP
After RP receives all ACKs, it informs RC that the write operation was successful
RC acknowledges the client, stating that the write operation was successful
[Figure: Client 1 issues x += 5 at replica R1; the write is forwarded to the primary and propagated to R1, R2, and R3, whose copies x1, x2, and x3 each go from 0 to 5]
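The write path above fits in a few lines of code. A minimal, single-process sketch, with network calls modeled as direct method calls (the class and method names are illustrative, not from the lecture):

    class Replica:
        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.store = {}

        def read(self, item):
            # Rule 2: reads are served locally, with no coordination.
            return self.store.get(item)

        def apply_update(self, item, value):
            self.store[item] = value
            return "ACK"

    class PrimaryReplica(Replica):
        def __init__(self, replica_id, backups):
            super().__init__(replica_id)
            self.backups = backups

        def write(self, item, value):
            # Rule 1: all writes go through the primary.
            self.apply_update(item, value)    # update the local value first
            acks = [b.apply_update(item, value) for b in self.backups]
            return all(a == "ACK" for a in acks)   # success once all ACKs arrive

    # A client connected to any replica forwards its writes to the primary.
    backups = [Replica(2), Replica(3)]
    primary = PrimaryReplica(1, backups)
    assert primary.write("x", 5)
    assert backups[0].read("x") == 5    # reads at any replica now see x = 5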

21 Remote-Write Protocol – Discussion
The Remote-Write Protocol:
Provides a simple way to implement strict consistency
Guarantees that clients always see the most recent values
However, latency is high in the Remote-Write Protocol: the client blocks until all the replicas are updated
In what scenarios would you use the Remote-Write Protocol?
Typically, for distributed databases and file systems in data centers (i.e., in LAN settings); replicas are placed on the same LAN to reduce latency

22 Consistency Protocols
Primary-Based Protocols Remote-Write Protocol Replicated-Write Protocols

23 Replicated-Write Protocols
In replicated-write protocols, updates can be carried out at multiple replicas
We will study two examples of replicated-write protocols:
Active Replication Protocol: clients write at any replica (there are no primary replicas); the altered replica propagates the update to the other replicas
Quorum-Based Protocols: a voting scheme is used

24 Active Replication Protocol
Protocol: when a client writes at a replica, that replica sends the update to all other replicas
Challenge with active replication: the ordering of operations can differ across replicas, leading to conflicts/inconsistencies
So how can we maintain a consistent ordering?
[Figure: Client 1 issues x += 2 at R1 while Client 2 issues x *= 3 at R3; the replicas apply the two updates in different orders, so one copy ends at x = 6 (0 + 2, then × 3) while another ends at x = 2 (0 × 3, then + 2), and reads return diverging values]

25 Centralized Active Replication Protocol
A possible approach: elect a centralized coordinator, called a sequencer (Seq)
When a client connects to a replica RC and issues a write operation:
RC forwards the update to Seq
Seq assigns a sequence number to the update operation (and returns it to RC)
RC propagates the sequence number and the operation to the other replicas
Operations are carried out at all replicas in the order of their sequence numbers
[Figure: Client 1’s x += 5 is assigned sequence number 10 and Client 2’s x -= 2 sequence number 11; every replica applies operation 10 before operation 11]
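A minimal sketch of the sequencer idea, assuming reliable delivery: replicas buffer operations that arrive early and apply everything strictly in sequence-number order (class names and interfaces are illustrative):

    import itertools

    class Sequencer:
        """Hands out globally unique, increasing sequence numbers."""
        def __init__(self):
            self._counter = itertools.count()

        def next_seq(self):
            return next(self._counter)

    class OrderedReplica:
        """Applies operations strictly in sequence-number order."""
        def __init__(self):
            self.x = 0
            self.next_expected = 0
            self.buffer = {}   # seq -> op, for operations that arrive early

        def deliver(self, seq, op):
            self.buffer[seq] = op
            # Apply every buffered operation whose turn has come.
            while self.next_expected in self.buffer:
                self.x = self.buffer.pop(self.next_expected)(self.x)
                self.next_expected += 1

    seq = Sequencer()
    replicas = [OrderedReplica() for _ in range(3)]
    for op in (lambda x: x + 5, lambda x: x - 2):    # x += 5, then x -= 2
        n = seq.next_seq()
        for r in replicas:
            r.deliver(n, op)
    assert all(r.x == 3 for r in replicas)   # every replica converges to 3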

26 Replicated-Write Protocols
In replicated-write protocols, updates can be carried out at multiple replicas
We will study two examples of replicated-write protocols:
Active Replication Protocol: clients write at any replica (there are no primary replicas); the replica propagates the update to the other replicas
Quorum-Based Protocols: a voting scheme is used

27 Quorum-Based Protocols
Replicated writes can also be accomplished using a voting scheme, originally proposed by Thomas (1979) and then generalized by Gifford (1979)
Basic idea (recap):
Clients are required to request and acquire the permission of multiple servers before either reading or writing a replicated data item
Rules on reads and writes should be established
Each replica is assigned a version number, which is incremented on each write

28 Quorum-Based Protocols
Working example: consider a distributed file system and suppose that a file is replicated on N servers
Write rule: a client must first contact N/2 + 1 servers (a majority) and obtain their agreement before updating a file
Once the majority of votes is attained, the file is updated and its version number is incremented; this is done at each replica site in the majority

29 Quorum-Based Protocols
Working example: consider a distributed file system and suppose that a file is replicated on N servers
Read rule: a client must contact N/2 + 1 servers, asking them to send the version numbers they hold for the requested file
If all the version numbers are equal, this must be the most recent version of the file
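A sketch of the read side, assuming each contacted server returns a (version, content) pair. Because the write rule updated a majority, any majority read quorum contains at least one server with the latest version, so taking the highest version number is safe even when the numbers disagree (the server API below is an assumption):

    import random

    def quorum_read(servers, filename):
        """Read a file from a majority of servers and return the
        content carried by the highest version number."""
        majority = len(servers) // 2 + 1
        quorum = random.sample(servers, majority)   # any N/2 + 1 servers
        replies = [s.get_version_and_content(filename) for s in quorum]
        # At least one quorum member holds the most recent version.
        return max(replies, key=lambda vc: vc[0])[1]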

30 Quorum-Based Protocols
Gifford’s scheme generalizes Thomas’s:
Read rule: a client needs to assemble a read quorum, which is an arbitrary collection of any NR servers, or more
Write rule: to modify a file, a write quorum of at least NW servers is required

31 Quorum-Based Protocols
The values of NR and NW are subject to the following two constraints:
Constraint 1 (or C1): NR + NW > N
Constraint 2 (or C2): NW > N/2
Claim: C1 prevents read-write (RW) conflicts, since every read quorum must overlap every write quorum in at least one server; C2 prevents write-write (WW) conflicts, since no two write quorums can be disjoint
Another protocol was proposed by Lamport in 1998 and is referred to as Paxos
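Before turning to Paxos, here is a quick numeric check of the two constraints; the function is our own illustration, not part of the lecture:

    def valid_quorums(n, n_r, n_w):
        """Check Gifford's quorum constraints for N servers."""
        c1 = n_r + n_w > n    # C1: every read quorum meets every write quorum
        c2 = n_w > n / 2      # C2: no two write quorums are disjoint
        return c1 and c2

    # N = 12: several legal trade-offs between read and write cost.
    assert valid_quorums(12, n_r=6, n_w=7)       # balanced quorums
    assert valid_quorums(12, n_r=1, n_w=12)      # read one, write all (ROWA)
    assert not valid_quorums(12, n_r=3, n_w=8)   # violates C1: 3 + 8 <= 12
    assert not valid_quorums(12, n_r=7, n_w=6)   # violates C2: 6 <= 12/2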

32 Assumptions in Paxos
Paxos assumes an asynchronous, non-Byzantine model (more on this under fault tolerance), in which:
Processes:
Operate at arbitrary speeds
May fail by stopping, but may restart
Since any process may fail after a value is chosen and then restart, a solution is impossible unless some information can be remembered (e.g., through logging) by a process that has failed and restarted
Messages:
May be lost, duplicated, or delayed (and thus reordered), but not corrupted

33 Roles in Paxos
Processes can take different roles:
Client: issues a request (e.g., a write on a replicated file) to the distributed system and waits for a response
Proposer (a process bidding to become a coordinator/leader): advocates for a client and suggests values for consideration by Acceptors
Acceptor (a voter): considers the values proposed by Proposers and renders an accept/reject decision
Learner: once a client’s request has been agreed upon by the Acceptors, the Learner can take action (e.g., execute the request and send a response to the client)

34 Quorums in Paxos
Any message sent to an Acceptor must be sent to a quorum of Acceptors consisting of more than half of all Acceptors (i.e., a majority, not unanimity)
Any two quorums therefore have a nonempty intersection, and a common node acts as a “tie-breaker”
This helps avoid the “split-brain” problem (a situation in which Acceptors’ decisions are not in agreement)
In a system with 2m + 1 Acceptors, m Acceptors can fail and consensus can still be reached

35 Paxos Algorithm: Phase I
Step 1: Prepare
The Proposer selects a unique sequence (or round) number n and sends a prepare(n) request to a quorum of Acceptors
Step 2: Promise
Each Acceptor does the following:
If n > the sequence number of any of its previous promises or acceptances:
It writes n to stable storage, promising that it will never accept any future proposal numbered less than n
It sends a promise(n, (N, U)) response, where N and U are the last sequence number and value it has accepted so far (if any)
Note that multiple processes can bid to become coordinators; hence, how can each coordinator select a unique sequence number?
Every process P can be assigned a unique ID, IDP, between 0 and k - 1, assuming a total of k processes
P then selects the smallest sequence number s that is larger than all sequence numbers seen thus far and satisfies s % k = IDP
E.g., P will pick a sequence number of 23 for its next bid if IDP = 3, k = 5, and the largest number seen so far is 20 (see the sketch below)
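A direct translation of this round-number rule into code (the function name is ours); because every process draws from its own residue class modulo k, no two processes can ever pick the same number:

    def next_round_number(process_id, num_processes, largest_seen):
        """Smallest s > largest_seen with s % num_processes == process_id."""
        base = largest_seen + 1
        offset = (process_id - base) % num_processes
        return base + offset

    # The slide's example: ID = 3, k = 5, largest number seen = 20 -> 23.
    assert next_round_number(3, 5, 20) == 23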

37 Example
Quorum size = 3, which is decided by the Proposer
[Figure: the Client sends a request to the Proposer; the Proposer sends prepare(n) to all three Acceptors, and each of the three replies promise(n, NULL)]

38 Example
Quorum size = 2, which is the minimum acceptable quorum size in this example (a majority of the three Acceptors)
[Figure: the Client sends a request to the Proposer; the Proposer sends prepare(n) to two of the three Acceptors, and both reply promise(n, NULL)]

39 Paxos Algorithm: Phase II
Step 1: Accept
If the Proposer receives promise responses from a quorum of Acceptors, it sends an accept(n, v) request to those Acceptors, where v is the value of the highest-numbered proposal among the promise responses, or any value of the Proposer’s choosing if no promise contained a proposal
Step 2: Accepted
Each Acceptor does the following:
If n >= the number of any previous promise:
It writes (n, v) to stable storage, indicating that it accepts the proposal
It sends an accepted(n, v) response
Else:
It does not accept (it sends a NACK)
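Putting the two phases together, the Acceptor side is roughly the following state machine. This is a minimal sketch of the rules as stated above, not a production Paxos implementation; in particular, promised_n and accepted must be written to stable storage before replying, which is elided here:

    class Acceptor:
        """Paxos Acceptor: the promise and accept rules only."""

        def __init__(self):
            self.promised_n = -1    # highest round number promised so far
            self.accepted = None    # (n, v) of the last accepted proposal

        def on_prepare(self, n):
            # Phase I: promise never to accept proposals numbered below n.
            if n > self.promised_n:
                self.promised_n = n
                return ("promise", n, self.accepted)
            return ("nack", n)

        def on_accept(self, n, v):
            # Phase II: accept unless a higher-numbered promise was made.
            if n >= self.promised_n:
                self.promised_n = n
                self.accepted = (n, v)
                return ("accepted", n, v)
            return ("nack", n)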

41 Example
[Figure: the Client sends a request to the Proposer; the Proposer sends prepare(n) to two Acceptors, receives promise(n, NULL) from both, sends them accept(n, v), and receives accepted(n, v) from both]
But an Acceptor can accept multiple concurrent proposals!

42 Example
[Figure: two Proposers race. The green Proposer sends prepare(1) and collects promise(1, NULL) from two Acceptors; the blue Proposer then sends prepare(2) and collects promise(2, NULL). When the green Proposer sends accept(1, A), one Acceptor replies accepted(1, A) but another replies NAK(1), having already promised round 2; the blue Proposer’s accept(2, B) then gathers accepted(2, B) from a quorum]
But what if, before the blue Proposer sends its accept message, another Proposer (it could be the green one again) submits a new proposal with a higher sequence number?

43 Example
[Figure: the same message trace as on the previous slide, with a new higher-numbered prepare arriving before the blue Proposer’s accept(2, B)]
The blue round will fail also!

44 Example
[Figure: the same message trace as on the previous slide]
What if this keeps happening?

45 Example
[Figure: the same message trace as on the previous slide]
Paxos will not commit until this scenario stops!

46 A Note on Liveness
If two Proposers keep concurrently issuing proposals with increasing sequence numbers, none of them will succeed
Hence, Paxos cannot guarantee liveness (i.e., it cannot guarantee that a proposed value will be chosen within a finite time)
Is there a way liveness can be guaranteed in Basic Paxos?
Short answer: no
But: we can apply an optimization to potentially expedite (not guarantee) liveness in the presence of multiple concurrent Proposers

47 A Note on Liveness
To expedite liveness:
A distinguished Proposer can be selected as the only entity that tries to submit proposals
If this distinguished Proposer:
Can communicate successfully with a majority of Acceptors
And uses a sequence number that is greater than any number already used
Then it will succeed in issuing a proposal that can be accepted, assuming enough of the system (the Proposer, the Acceptors, and the network) is working properly
Clearly, liveness remains impossible to guarantee in finite time, since any component in the system could fail (e.g., a network partition can arise)

48 Network Partitions
[Figure: a simple, dedicated LAN connecting servers S1, S2, and S3, with a network partition between them]
The failure of a communication medium/device (e.g., a router) between two networks is known as a network partition
Over simple LANs, processes in different partitions may get fully disconnected
E.g., S3 and S2 may get fully disconnected, but S1 and S2 can still communicate

49 Network Partitions
[Figure: a complex WAN connecting servers S1, S2, and S3, with a network partition]
The failure of a communication medium/device (e.g., a router) between two networks is known as a network partition
Over a network with a complex topology and independent routing choices, connectivity may become:
Asymmetric: S1 can communicate with S3, but not vice versa
Intransitive: S2 can communicate with S1, and S1 can communicate with S3, but S2 cannot communicate with S3

50 Possible Failures in Paxos
Would a network partition impact Paxos’s correctness (NOT its liveness)?
No, due to the quorum mechanism
What if an Acceptor fails?
Case 1: The Acceptor is not a member of the Proposer’s quorum
No recovery is needed
Case 2: The Acceptor is a member of the Proposer’s quorum, but the quorum size > the majority of Acceptors
No recovery is needed, since the surviving quorum members still form a majority

51 Possible Failures in Paxos
Would a network partition impact Paxos’s correctness?
No, because of the quorum mechanism, which entails that at most one partition will be able to construct a majority
What if an Acceptor fails?
Case 3: The Acceptor is a member of the Proposer’s quorum, and the quorum size equals the majority of Acceptors
Sub-case 3.1: The Acceptor fails after accepting the proposal
No recovery is needed, assuming the Proposer will receive (or has already received) its acceptance message
Sub-case 3.2: The Acceptor fails before accepting the proposal
Worst case: a new quorum and a new round can be established

52 Possible Failures in Paxos
What if a Proposer fails?
Case 1: The Proposer fails after proposing a value, but before a consensus is reached
A new Proposer can take over
Case 2: The Proposer fails after a consensus is reached, but before it gets to know about it
Either its failure gets detected and a new round is launched
Or it recovers and starts a new round itself
Case 3: The Proposer fails after a consensus is reached and after it gets to know about it (but before letting the Learner know)
Either its failure gets detected and a new round is launched
Or it recovers and learns again from its stable storage that it has succeeded in its bid

53 Next Lecture
Fault tolerance

