Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Client-Centric Consistency Models Figure The principle of a mobile user accessing different replicas of a distributed database. Eventual Consistency
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Client-Centric Consistency Client-centric consistency provides guarantees for a single client concerning the consistency of accesses to a data store by that client. Monotonic reads Monotonic writes Read-your-writes Write-follows-reads Notation: WS(x i [t]) is the set of write operations (at L i ) that lead to version x i of x (at time t); WS(x i [t 1 ]; x j [t 2 ]) where t 1 < t 2 indicates that it is known that WS(x i [t 1 ]) is part of WS(x j [t 2 ]).
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Monotonic Reads A data store is said to provide monotonic-read consistency: If a process reads the value of a data item x any successive read operation on x by that process will always return that same value or a more recent value. Figure The read operations performed by a single process P at two different local copies of the same data store. (a) A monotonic- read consistent data store. (b) A data store that does not provide monotonic reads.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Monotonic Writes In a monotonic-write consistent store, the following condition holds: A write operation by a process on a data item x is completed before any successive write operation on x by the same process. Figure The write operations performed by a single process P at two different local copies of the same data store. (a) A monotonic-write consistent data store. (b) A data store that does not provide monotonic-write consistency.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Read Your Writes A data store is said to provide read-your-writes consistency, if : The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process. Figure (a) A data store that provides read-your-writes consistency. (b) A data store that does not.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Writes Follow Reads A data store is said to provide writes-follow-reads consistency, if : A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x that was read. Figure (a) A writes-follow-reads consistent data store. (b) A data store that does not.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Replica Management Replica server placement Content replication and placement Content distribution
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Replica-Server Placement Essence: Figure out what the best K places are out of N possible locations. Select best location out of N −K for which the average distance to clients is minimal. Then choose the next best server. (Note: The first chosen location minimizes the average distance to all clients.) Computationally expensive. Select the K-th largest autonomous system and place a server at the best-connected host. Computationally expensive. Position nodes in a d-dimensional geometric space, where distance reflects latency. Identify the K regions with highest density and place a server in every one. Computationally cheap.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Replica-Server Placement Figure Choosing a proper cell size for server placement.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Content Replication and Placement Model: We consider objects (and don’t worry whether they contain just data or code, or both) Distinguish different processes: A process is capable of hosting a replica of an object or data: Permanent replicas: Process/machine always having a replica Server-initiated replica: Process that can dynamically host a replica on request of another server in the data store Client-initiated replica: Process that can dynamically host a replica on request of a client (client cache)
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Content Replication and Placement Figure The logical organization of different kinds of copies of a data store into three concentric rings.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Server-Initiated Replicas Keep track of access counts per file, aggregated by considering server closest to requesting clients Number of accesses drops below threshold D ⇒ drop file Number of accesses exceeds threshold R ⇒ replicate file Number of access between D and R ⇒ migrate file
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Content Distribution Model: Consider only a client-server combination : Possibilities for what is to be propagated: 1.Propagate only a notification of an update. 2.Transfer data from one copy to another. 3.Propagate the update operation to other copies. Observation: No single approach is the best, but depends highly on available bandwidth and read- to-write ratio at replicas.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Pull versus Push Protocols Figure A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems. Pushing updates: server-initiated approach, in which update is propagated regardless whether target asked for it. Pulling updates: client-initiated approach, in which client requests to be updated.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Pull versus Push Protocols Issue: Make lease expiration time dependent on system’s behavior (adaptive leases): Age-based leases: An object that hasn’t changed for a long time, will not change in the near future, so provide a long-lasting lease Renewal-frequency based leases: The more often a client requests a specific object, the longer the expiration time for that client (for that object) will be State-based leases: The more loaded a server is, the shorter the expiration times become
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Consistency Protocols Consistency protocol: describes the implementation of a specific consistency model. Continuous consistency Primary-based protocols Replicated-write protocols
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Continuous Consistency Principle: consider a data item x and let weight(W) denote the numerical change in its value after a write operation W. Assume that ∀ W : weight(W) > 0. W is initially forwarded to one of the N replicas, denoted as origin(W). TW[i, j] are the writes executed by server S i that originated from S j :
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Continuous Consistency Problem: We need to ensure that v(t) − v i < δ i for every server S i. Approach: Let every server S k maintain a view TW k [i, j] of what it believes is the value of TW[i, j]. This information can be gossiped when an update is propagated. Note: 0 ≤ TW k [i, j] ≤ TW[i, j] ≤ TW[j, j] Solution: S k sends operations from its log to S i when it sees that TW k [i, k] is getting too far from TW[k, k], in particular, when TW[k, k] − TW k [i, k] > δ i /(N − 1). Note: Staleness can be done analogously, by essentially keeping track of what has been seen last from S i (see book). The ordering deviations could be bounded by specifying the maximal length of the queue of tentative writes.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Primary-Based Protocols Example: Traditionally applied in distributed databases and file systems that require a high degree of fault tolerance. Replicas are often placed on same LAN. Primary-backup Protocol
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Primary-Based Protocols Local-Write Protocols Example: Mobile computing in disconnected mode (ship all relevant files to user before disconnecting, and update later on).
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Replicated-Write Protocols Figure Three examples of the voting algorithm. (a) A correct choice of read and write set. (b) A choice that may lead to write-write conflicts. (c) A correct choice, known as ROWA (read one, write all). Quorum-based protocols: Ensure that each operation is carried out in such a way that a majority vote is established: distinguish read quorum and write quorum:
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Cache Coherence Protocols Coherence Detection Strategy –Distributed Database Pessimistic Optimistic When the transaction commits Coherence Enforcement Strategy –Shared data –When a process modifies cached data Update only by servers Write-through Write-back
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Implementing Client-Centric Consistency A naïve Implementation –Assign a globally unique identifier to each write operation Improved Efficiency –Vector timestamps