Replication and Consistency
- Data and Object Replication
- Data Consistency Model
- Implementation Issues
Replication

What to replicate?
- Data, files, processes, objects, services, etc.

Why replicate?
- Enhance performance
  - Caching data in clients and servers to reduce latency
  - Server farms to share workload
- Increase availability
  - Reduce downtime due to individual server failures through server replication
  - Support network partitions and disconnected operation through caching (e.g. the email cache in POP3)
- Fault tolerance
  - Highly available data is not necessarily current data. A fault-tolerant service should guarantee correct behavior despite a certain number and type of faults.
  - Correctness concerns: freshness, timeliness
  - If up to f servers can exhibit Byzantine failures, at least 2f+1 servers are needed.
Challenges

Replication transparency
- Clients should not be aware of the presence of multiple physical copies; data are organized as logical objects.
- Each invocation should return exactly one set of results.
- "We use replication techniques ubiquitously to guarantee consistent performance and high availability. Although replication brings us closer to our goals, it cannot achieve them in a perfectly transparent manner." [Werner Vogels, CTO of Amazon, Jan 2009]

Consistency
- Whether the operations on a collection of replicated objects produce results that meet the specification of correctness for those objects
- A read should return the value of the last write operation on the data.
- Global synchronization makes update operations (transactions) atomic, but it is too costly.
- Relax the atomic-update requirement so as to avoid instantaneous global synchronization:
  - To what extent can consistency be loosened?
  - Without a global clock, how do we define the "last" write?
  - What is the overhead of keeping copies up to date?
Replication of Shared Objects

Replicating a shared remote object without correctly handling concurrent invocations leads to consistency problems.
- For example, one site may observe W(x) before R(x) while another site observes R(x) before W(x).

Replicas need additional synchronization to ensure that concurrent invocations are performed in a correct order at each replica.
- Need support for concurrent access to shared objects
- Need support for coordination among local and remote accesses
Consistency Models

A consistency model is a contract between processes and the data store. For replicated objects in the data store that are updated by multiple processes, the model specifies the consistency guarantees that the data store makes about the return values of read operations.
- E.g. x=1; x=2; print(x), all in the same process P1: what does the print show?
- E.g. x=1 (at P1 @ 5:00 pm); x=2 (at P2, 1 ns after 5:00 pm); print(x) (at P1): what does the print show?
Overview of Consistency Models

From strongest to weakest (easier to program but less efficient at the top; harder to program but more efficient at the bottom):
- Strict Consistency
- Linearizable Consistency
- Sequential Consistency
- Causal Consistency
- FIFO Consistency
- Weak Consistency
- Release Consistency
- Entry Consistency
Strict Consistency

A read of data item x returns the value of the most recent write on x.
- "Most recent" in the sense of absolute global time
- E.g. x=1; x=2; print(x), all in the same process P1: what does the print show?
- E.g. x=1 (at P1 @ 5:00 pm); x=2 (at P2, 1 ns after 5:00 pm); print(x) (at P1): what does the print show?

A strictly consistent store requires that all writes are instantaneously visible to all processes.

[Figure: (a) a strictly consistent store; (b) a store that is not strictly consistent]
Linearizability

Operations are assumed to receive a timestamp from a globally available clock, but one with only finite precision. A data store is linearizable when each operation is timestamped and:
i) The result of any execution is the same as if the operations by all processes on the data store were executed in some sequential order.
ii) The operations of each individual process appear in this sequence in the order specified by its program.
iii) If t_op1(x) < t_op2(x), then op1(x) should precede op2(x) in this sequence.
Sequential Consistency

A data store is sequentially consistent if:
i) The result of any execution is the same as if the operations by all processes on the data store were executed in some sequential order.
ii) The operations of each process appear in this sequence in the order specified by its program.

[Figure: (a) a sequentially consistent data store; (b) a data store that is not sequentially consistent]
Possible Executions in the SC Model

The program (assume initially x = y = z = 0):
  Process P1: x = 1; print(y, z);
  Process P2: y = 1; print(x, z);
  Process P3: z = 1; print(x, y);

Four possible interleavings (the "signature" concatenates P1's output, then P2's, then P3's):
(a) x = 1; print(y, z); y = 1; print(x, z); z = 1; print(x, y);
    Prints: 001011   Signature: 001011
(b) x = 1; y = 1; print(x, z); print(y, z); z = 1; print(x, y);
    Prints: 101011   Signature: 101011
(c) y = 1; z = 1; print(x, y); print(x, z); x = 1; print(y, z);
    Prints: 010111   Signature: 110101
(d) y = 1; x = 1; z = 1; print(x, z); print(y, z); print(x, y);
    Prints: 111111   Signature: 111111

Question: is signature 001001 valid under the SC model?
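A minimal Java sketch of this three-process program (not from the slides): the fields are declared volatile, so the Java Memory Model guarantees sequentially consistent ordering for their accesses, and any signature the program produces must correspond to one legal interleaving.

public class SCDemo {
    static volatile int x = 0, y = 0, z = 0;
    static volatile String out1 = "", out2 = "", out3 = "";

    public static void main(String[] args) throws InterruptedException {
        Thread p1 = new Thread(() -> { x = 1; out1 = "" + y + z; });  // P1: x = 1; print(y, z)
        Thread p2 = new Thread(() -> { y = 1; out2 = "" + x + z; });  // P2: y = 1; print(x, z)
        Thread p3 = new Thread(() -> { z = 1; out3 = "" + x + y; });  // P3: z = 1; print(x, y)
        p1.start(); p2.start(); p3.start();
        p1.join();  p2.join();  p3.join();
        // Signature = P1's output, then P2's, then P3's, as on the slide.
        // Possible values include 001011, 101011, 111111; 000000 is impossible,
        // because whichever thread writes last must then read 1 for both other variables.
        System.out.println(out1 + out2 + out3);
    }
}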
Remarks on the SC Model

The SC model was conceived by Lamport in 1979 for the shared memory of multiprocessors.
- Maintain the program order of each thread
- Provide a coherent view of any data item

Low efficiency in implementation:
- Each read/write operation must be atomic.
- Atomic reads/writes on distributed shared objects cost too much.
Causal Consistency

In SC, two writes w(x) and w(y) must present a consistent view to all processes. In causal consistency (CC), the requirement is relaxed if w(x) and w(y) have no causal relationship (Hutto and Ahamad, 1990).

That is, writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines.

[Figure: an event sequence that is valid in CC but invalid in SC]
Causal Consistency (cont.)

[Figure: (a) a violation of a causally-consistent store; (b) a correct sequence of events in a causally-consistent store]
Remarks on Causal Consistency

The causal consistency model allows concurrent writes to be seen in a different order on different machines, so concurrent writes can be executed in parallel (overlapped) to improve efficiency.

It relies on compiler/run-time support for constructing and maintaining a dependency graph that captures the causality between operations.
FIFO Consistency

In causal consistency, causally-related writes must be seen in the same order by all machines. The FIFO consistency model relaxes this requirement:
- Writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.
- In FIFO consistency, there are no guarantees about the order of writes issued by different processes (all writes from different processes are treated as concurrent).
- Also called Pipelined RAM (PRAM) consistency, because writes by a single process can be pipelined.
FIFO vs Causal Consistency vs SC

[Figure: a sequence of events that is valid under FIFO consistency but invalid under causal consistency]

In SC, the order of operations is non-deterministic, but all processes agree on what it is. In FIFO, different processes can see the operations in different orders.
FIFO vs Sequential Consistency

The program (assume initially x = y = 0):
  Process P1: x = 1; if (y == 0) kill(P2);
  Process P2: y = 1; if (x == 0) kill(P1);

Possible executions:
- P1 kills P2
- P2 kills P1
- Neither is killed

Question: is it possible for both processes to be killed under the SC and FIFO models?
FIFO Consistency

The same program as before (assume initially x = y = z = 0):
  Process P1: x = 1; print(y, z);
  Process P2: y = 1; print(x, z);
  Process P3: z = 1; print(x, y);

Question: is signature 001001 valid in the FIFO consistency model?
Weak Consistency

Weak consistency relaxes the ordering requirement by distinguishing synchronization operations from normal operations (Dubois et al., 1988):
- Accesses to synchronization variables associated with a data store are sequentially consistent.
- No operation on a synchronization variable is allowed to be performed until all previous writes have completed everywhere.
- No read or write operation is allowed to be performed until all previous operations on synchronization variables have been performed.
Weak Consistency (cont.)

[Figure: an invalid sequence of events for WC; a valid sequence under WC]

WC enforces consistency on a group of operations, not on individual reads/writes. WC guarantees the time when consistency holds, not the form of consistency.
Release Consistency

A synchronization operation is performed when a process wants to enter or exit a critical section. In weak consistency, a sync operation must ensure that all locally initiated writes have been propagated and that writes from all other processes have been received.

Release consistency distinguishes between Acquire and Release synchronization operations (Gharachorloo et al., 1990):
- When an Acquire is performed, bring all protected local data up to date with the remote copies.
- When a Release is performed, propagate all updated protected data to the other copies.

[Figure: a valid event sequence for release consistency]
A Simple Implementation of RC

Assume a central synchronization manager for a replicated database (see the sketch below):
- To access a replica, a process first sends a message to the manager to acquire a particular lock.
- Once the lock is granted, the process performs an arbitrary sequence of reads and writes locally.
- When the release is done, the modified data are sent to the other replicas.
- After each replica has acknowledged receipt of the data, the sync manager is informed of the release.
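Below is a minimal Java sketch of this protocol from one replica's point of view, assuming a hypothetical ReplicaTransport messaging interface (acquire blocks on the central manager, writes stay local, release pushes the dirty set and waits for acknowledgements). It illustrates the steps above; it is not a production implementation.

import java.util.HashMap;
import java.util.Map;

// Hypothetical messaging layer; these methods stand in for real network calls.
interface ReplicaTransport {
    void sendLockRequest(String lockName);          // ask the central manager for the lock (blocks until granted)
    void sendUpdates(Map<String, Object> dirty);    // push modified items to all other replicas
    void awaitAllAcks();                            // block until every replica has acknowledged
    void notifyManagerReleased(String lockName);    // tell the manager the lock is free again
}

class ReleaseConsistentReplica {
    private final ReplicaTransport transport;
    private final Map<String, Object> localStore;               // this replica's copy of the data
    private final Map<String, Object> dirty = new HashMap<>();  // items modified since the last release

    ReleaseConsistentReplica(ReplicaTransport transport, Map<String, Object> localStore) {
        this.transport = transport;
        this.localStore = localStore;
    }

    void acquire(String lockName) {
        transport.sendLockRequest(lockName);   // step 1: obtain the lock from the central manager
    }

    Object read(String key) { return localStore.get(key); }     // step 2: reads and writes are local

    void write(String key, Object value) {
        localStore.put(key, value);
        dirty.put(key, value);
    }

    void release(String lockName) {
        transport.sendUpdates(dirty);                  // step 3: send modified data to the other replicas
        transport.awaitAllAcks();                      // step 4: wait until every replica has acknowledged
        transport.notifyManagerReleased(lockName);     // step 5: inform the sync manager of the release
        dirty.clear();
    }
}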
Sufficient Conditions for RC

Three rules in general:
- Before a read or write on shared data is performed, all previous acquires done by the process must have completed successfully.
- Before a release is allowed to be performed, all previous reads and writes done by the process must have completed.
- Accesses to synchronization variables are FIFO consistent.

If these conditions are met, the results of any execution will be the same as under SC.
Eager Release vs Lazy Release

Eager release (normal release consistency):
- On Release, all updated protected data are propagated to the other replicas.
- Problem: the new data may not be needed by the others.

Lazy release (Keleher et al., 1992):
- On Release, nothing is sent. Instead,
- when an Acquire is done, the process gets the most recent values of the data from the processes that hold them (based on timestamps).

Example:
  P1: while (a < 100) { acquire(lock); a = a + 1; b = b + 1; release(lock); }
  P2: acquire(lock); print(b); release(lock);
Entry Consistency

Unlike RC, where Acquire and Release protect a collection of shared items, EC requires each ordinary shared item to be associated with some synchronization variable such as a lock or barrier.
- Associating each shared data item with a sync variable reduces acquire/release overhead, because only the few items involved need to be synchronized.
- Multiple critical sections involving disjoint shared data can run concurrently.
- But: the association is complex and the programming is error-prone.

EC is fine-grained. Lazy RC must determine empirically, at acquire time, which variables it needs.

[Figure: a valid event sequence for entry consistency]
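A small sketch of the entry-consistency idea in Java: each shared item carries its own lock, so critical sections over disjoint items do not serialize against each other. The comments mark where a distributed implementation would fetch and publish the item's latest value; those steps are only hinted at here.

import java.util.concurrent.locks.ReentrantLock;

// Every shared item carries its own synchronization variable.
class GuardedItem<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private T value;

    GuardedItem(T initial) { this.value = initial; }

    void acquire() {
        lock.lock();
        // In a distributed store: pull the most recent value of *this item only*
        // from its current owner before entering the critical section.
    }

    T get()        { return value; }
    void set(T v)  { value = v; }

    void release() {
        // In a distributed store: make the local update visible to the next acquirer of this item.
        lock.unlock();
    }
}

// Two threads updating GuardedItem<Integer> x and GuardedItem<Integer> y never block
// each other, unlike critical sections protected by one coarse lock over the whole store.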
Summary of Consistency Models

(a) Consistency models not using synchronization operations:
- Strict: absolute time ordering of all shared accesses matters.
- Linearizability: all processes must see all shared accesses in the same order; accesses are furthermore ordered according to a (non-unique) global timestamp.
- Sequential: all processes see all shared accesses in the same order; accesses are not ordered in time.
- Causal: all processes see causally-related shared accesses in the same order.
- FIFO: all processes see writes from each single process in the order they were issued; writes from different processes may not always be seen in that order.

(b) Models with synchronization operations:
- Weak: shared data can be counted on to be consistent only after a synchronization is done.
- Release: shared data are made consistent when a critical region is exited.
- Entry: shared data pertaining to a critical region are made consistent when that critical region is entered.
Double-Checked Locking (DCL) for Lazy Initialization: A Motivating Example

public class Singleton {
    private static Singleton resource = null;
    private int a;
    private int b;

    private Singleton() { a = 1; b = 2; }

    public static Singleton getInstance() {
        if (resource == null) {
            resource = new Singleton();   // lazily create the instance on first use
        }
        return resource;
    }
}

DCL was designed to support lazy initialization: initialization of an owned object is deferred until it is actually needed.
Lazy Initialization in Multithreaded Code

public class Singleton {
    private static Singleton resource = null;
    private int a;
    private int b;

    private Singleton() { a = 1; b = 2; }

    public static synchronized Singleton getInstance() {
        if (resource == null) {
            resource = new Singleton();
        }
        return resource;
    }
}

A synchronized getInstance is about 100 times slower than an ordinary unsynchronized method. Lazy initialization is meant to improve efficiency, but synchronizing the method slows down every call!
DCL for Lazy Initialization

public class Singleton {
    private static Singleton resource = null;
    private int a;
    private int b;

    private Singleton() { a = 1; b = 2; }

    public static Singleton getInstance() {
        if (resource == null) {                   // first check, without locking
            synchronized (Singleton.class) {
                if (resource == null) {           // second check, under the lock
                    resource = new Singleton();
                }
            }
        }
        return resource;
    }
}

Is it correct?
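The slide leaves the question open. The widely documented answer is: no, not as written. Under the Java Memory Model, the write to resource may become visible to another thread before the constructor's writes to a and b, so a reader can observe a non-null but partially constructed object. Since Java 5, declaring the field volatile makes the pattern correct, as in the sketch below.

public class Singleton {
    private static volatile Singleton resource = null;   // volatile is the key change
    private int a;
    private int b;

    private Singleton() { a = 1; b = 2; }

    public static Singleton getInstance() {
        if (resource == null) {                     // first check, without locking
            synchronized (Singleton.class) {
                if (resource == null) {             // second check, under the lock
                    resource = new Singleton();     // the volatile write publishes a and b as well
                }
            }
        }
        return resource;
    }
}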
Eventually Consistent Distributed Data Stores

In such data stores, all replicas gradually become consistent if there are no updates for a long time.

Examples:
- On distributed web servers, web pages are usually updated by a single authority.
- A web cache may not return the latest content, but this inconsistency is acceptable in many situations.
- DNS implements eventual consistency.

Target applications:
- Most operations involve reading data.
- There is a lack of simultaneous updates.
- A relatively high degree of inconsistency can be tolerated.
Client-Centric Eventual Consistency

EC stores require only that updates are guaranteed to propagate to all replicas. Conflicts due to concurrent writes are often easy to resolve, so the implementation is cheap.

Eventually consistent data stores work fine as long as clients always access the same replica. Problems arise when different replicas are accessed by the same process at different times.

Client-centric consistency provides, in effect, causal consistency for a single client.
Variations of Client-Centric EC

Monotonic reads
- If a process reads the value of a data item x, any successive read on x by that process will never return an older value of x.
- E.g. mail read from a mailbox in San Francisco will also be seen in the mailbox copy in New York.

Monotonic writes
- A write by a process on a data item x is completed before any successive write on x by the same process.
- Similar to FIFO consistency, but the monotonic-writes model is about the behavior of a single process.

Read your writes
- A write on data item x by a process will always be seen by a successive read on x by the same process.
- E.g. an integrated editor and browser; integrated dvips and ghostview.

Writes follow reads
- A write on x by a process following its previous read of x is guaranteed to take place on the same or a more recent value of x than the one that was read.
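As an illustration (not from the slides), a client library can enforce monotonic reads on top of an eventually consistent store by remembering the newest version it has seen for each item and refusing to accept anything older. The Replica interface and version numbers below are assumed placeholders for a real store's API.

import java.util.HashMap;
import java.util.Map;

class MonotonicReadClient {
    interface Replica { Versioned read(String key); }       // assumed replica API
    record Versioned(Object value, long version) {}         // value plus a monotonically growing version

    private final Map<String, Long> lastSeen = new HashMap<>();   // newest version this client has read, per key

    Object read(String key, Replica replica) {
        long floor = lastSeen.getOrDefault(key, 0L);
        Versioned v = replica.read(key);
        while (v.version() < floor) {     // the replica is behind what this client already saw
            v = replica.read(key);        // retry; a real client might switch to a fresher replica instead
        }
        lastSeen.put(key, v.version());
        return v.value();
    }
}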
Outline

- Object Replication
- Data Consistency Model (client-side)
- Implementation Issues (server-side)
  - Update propagation: distributing updates to replicas, independent of the consistency model
  - Other consistency-model-specific issues
- Case Studies
Replica Placement

A major design issue is to decide where, when, and by whom copies of the data store are to be placed.

Permanent replicas
- The initial set of replicas that constitute a distributed store
- Examples: distributed web servers (server mirrors), distributed database systems (shared-nothing architecture vs. federated DB)
Replica Placement: Server-Initiated

Dynamic placement of replicas is a key to content delivery networks (CDNs):
- A CDN company installs hundreds of CDN servers throughout the Internet.
- The CDN replicates its customers' content in the CDN servers. When a provider updates content, the CDN updates its servers.

Key issue: when and where should replicas be created or deleted?

[Figure: an origin server in North America pushing content through a CDN distribution node to CDN servers in South America, Europe, and Asia]
CDN Content Placement

An algorithm (Rabinovich et al., 1999):
- Each server keeps track of access counts per file, and of where the access requests come from.
- Assume that, given a client C, each server can determine which of the servers is closest to C.
Server-Initiated Replication (cont.)

The algorithm (Rabinovich et al., 1999) continues:
- Initial placement
- Migration or replication of objects to servers in the proximity of clients that issue many requests for those objects
- Count access requests to file F at server S from different clients.
- Use a deletion threshold del(S,F) and a replication threshold rep(S,F), as in the sketch below:
  - If #access(S,F) <= del(S,F), remove the file, unless it is the last copy.
  - If #access(S,F) >= rep(S,F), duplicate the file somewhere else.
  - If del(S,F) < #access(S,F) < rep(S,F), consider migrating the file.
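The per-file decision rule above can be summarized in a few lines of code. This is a sketch of the threshold logic only; maintaining the access counts, choosing target servers, and actually deleting, replicating, or migrating files are assumed to happen elsewhere.

enum PlacementAction { DELETE, REPLICATE, MIGRATE, KEEP }

class ReplicaPlacement {
    // accessCount = #access(S,F); deletionThreshold = del(S,F); replicationThreshold = rep(S,F)
    static PlacementAction decide(long accessCount, long deletionThreshold,
                                  long replicationThreshold, boolean isLastCopy) {
        if (accessCount <= deletionThreshold) {
            return isLastCopy ? PlacementAction.KEEP          // never drop the last copy
                              : PlacementAction.DELETE;
        }
        if (accessCount >= replicationThreshold) {
            return PlacementAction.REPLICATE;                 // popular file: copy it closer to its clients
        }
        return PlacementAction.MIGRATE;                       // in between: consider migrating it toward the clients
    }
}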
Replica Placement: Client-Initiated

Cache: local storage that temporarily holds a copy of recently accessed data, mainly to reduce request time and traffic on the network.
- Client-side cache (the forward cache in the Web)
- Proxy cache
- Proxy cache network

[Figure: clients sending HTTP requests through a proxy server, which forwards them to the origin servers and relays the HTTP responses back]
Update Propagation

Invalidation vs update protocols
- In an invalidation protocol, replicas are invalidated by a small message.
- In an update protocol, replicas are brought up to date by sending the modified data, or the specific update operations (active replication).

Pull vs push protocols
- Push-based (server-based) suits applications with high read-to-update ratios (why?).
- Pull-based (client-based) is often used by client caches.

Unicasting versus multicasting

Push vs pull in a multiple-clients, single-server system:
  Issue                    | Push-based                               | Pull-based
  State of server          | List of client replicas and caches       | None
  Messages sent            | Update (and possibly fetch update later) | Poll and update
  Response time at client  | Immediate (or fetch-update time)         | Fetch-update time
Lease-Based Update Propagation

[Gray & Cheriton '89]: A lease is a server promise that updates will be pushed to the client for a specific period of time. When a lease expires, the client is forced to either
- pull the modified data from the server, if any, or
- request a new lease for pushing updates.

[Duvvuri et al.]: A flexible lease system in which the lease period can be adapted dynamically:
- Age-based leases, based on the last time the item was modified: long-lasting leases for rarely modified data items (see the sketch below).
- Renewal-frequency-based leases: long-term leases are granted to clients whose caches need to be refreshed often.
- Leases based on server-side state-space overhead: an overloaded server should shorten the lease period so that it has to keep track of fewer clients as leases expire more quickly.

[Yin et al. '99]: Volume leases on objects as well as on volumes (i.e. collections of related objects).
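A minimal sketch of an age-based lease policy: the longer an item has gone unmodified, the longer the lease it gets. The scale factor and bounds are illustrative assumptions, not values from the cited papers.

class AgeBasedLease {
    static final long MIN_LEASE_MS = 1_000;       // assumed lower bound
    static final long MAX_LEASE_MS = 600_000;     // assumed upper bound

    // Items that have not changed for a long time get long leases; recently
    // modified items get short ones, so stale copies cannot live long.
    static long leaseDurationMs(long nowMs, long lastModifiedMs) {
        long age = nowMs - lastModifiedMs;        // time since the item was last modified
        long lease = age / 10;                    // e.g. grant 10% of the observed age (illustrative choice)
        return Math.max(MIN_LEASE_MS, Math.min(MAX_LEASE_MS, lease));
    }
}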
Volume Lease-Based Updates

Leases provide good performance when the cost of lease renewal is amortized over many reads. If the renewal cost is high, the lease time must be long; but a long lease implies a high chance of reading stale data.

A volume lease uses a combination of a long object lease and a short volume lease:
- The long object lease saves renewal time on reads.
- The short volume lease keeps objects up to date.
- It exploits the spatial locality among objects in a volume.
Other Implementation Issues of Consistency Models

Passive replication (primary-backup organization)
- A single primary replica manager at any time, plus one or more secondary replica managers
- Writes can be carried out only on the primary copy.
- Two write strategies: remote-write protocols and local-write protocols

Active replication
- There are multiple replica managers and writes can be carried out at any replica.

Cache coherence protocols
Passive Replication: Remote Write

A write is handled only by the remote primary server, and the backup servers are updated accordingly; a read is performed locally.

E.g. Sun Network Information Service (NIS, formerly Yellow Pages)
Consistency Model of Passive Replication

Blocking update: wait until the backups are updated.
- A blocking update of the backup servers must be atomic in order to implement sequential consistency: the primary can then sequence all incoming writes, and all processes see all writes in the same order from any backup server (see the sketch below).

Nonblocking update: return as soon as the primary is updated.
- What happens if a backup fails after the update has been acknowledged?
- What consistency model does a nonblocking update give?

Atomic multicasting in the presence of failures:
- Primary replica failure
- Group membership change
- Virtual synchrony implementation
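A minimal sketch of the blocking primary-backup write path described above: the primary sequences each write, pushes it to every backup, and acknowledges the client only after all backups confirm. The Backup interface stands in for real RPC stubs, and failure handling is omitted.

import java.util.List;

class PrimaryReplica {
    interface Backup { boolean update(String key, Object value, long seqNo); }   // assumed RPC stub

    private final List<Backup> backups;
    private long nextSeqNo = 0;

    PrimaryReplica(List<Backup> backups) { this.backups = backups; }

    // Blocking remote write: returns to the client only after every backup has applied the update.
    synchronized void write(String key, Object value) {
        long seqNo = nextSeqNo++;                        // the primary sequences all incoming writes
        applyLocally(key, value);
        for (Backup backup : backups) {
            while (!backup.update(key, value, seqNo)) {  // wait for this backup to apply and acknowledge
                // retry; a real system would also handle backup failure and view changes here
            }
        }
        // Only now may the client's write call return: every backup serves the same write order.
    }

    private void applyLocally(String key, Object value) { /* update the primary's own store */ }
}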
Passive Replication: Local-Write

Based on object migration:
- Multiple successive writes can be carried out locally.

How to locate the data item?
- Broadcast, home-based schemes, forwarding pointers, or a hierarchical location service
Remarks

Without backup servers:
- Implements linearizability.

With backup servers:
- Reads and writes can be carried out simultaneously.
- What consistency model results?

Applicable to mobile computing:
- A mobile host serves as the primary server before disconnecting.
- While disconnected, all update operations are carried out locally; other processes can still perform read operations.
- When it connects again, updates are propagated from the primary to the backups.
Active Replication

In active replication, each replica has an associated process that carries out update operations.
- A client makes a request (via a front end) and the request is multicast to the group of replica managers.
- Totally-ordered reliable multicast, based on Lamport timestamps
- Implements sequential consistency

An alternative is based on a central coordinator (a sequencer):
- First forward each operation to the sequencer to obtain a unique sequence number.
- Then forward the operation, together with the number, to all replicas (see the sketch below).

Another problem is replicated invocations:
- Object A invokes object B, which in turn invokes object C. If object B is replicated, each replica of B will, in principle, invoke C independently.
- This problem can occur in any client-server setting.
- There are no fully satisfactory solutions.
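A minimal sketch of the sequencer-based alternative: a central sequencer stamps each operation with the next sequence number, and every replica applies operations strictly in that order, so all replicas execute the same sequence. The multicast itself is only indicated by a comment.

import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

class Sequencer {
    private final AtomicLong next = new AtomicLong(0);

    long assign(String operation) {
        long seq = next.getAndIncrement();
        // multicast (operation, seq) to all replica managers here
        return seq;
    }
}

class ActiveReplica {
    private long expected = 0;                                   // next sequence number to apply
    private final Map<Long, String> pending = new TreeMap<>();   // out-of-order arrivals wait here

    synchronized void deliver(String operation, long seq) {
        pending.put(seq, operation);
        while (pending.containsKey(expected)) {                  // apply operations in total order, with no gaps
            apply(pending.remove(expected));
            expected++;
        }
    }

    private void apply(String operation) { /* execute the update against the local copy */ }
}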
Replication-Aware Solution to Replicated Invocations

[Figure: (a) forwarding an invocation request from a replicated object; (b) returning a reply to a replicated object]
In Summary

Replication
- Data, file (web pages), and object replication
- For reliability and performance

Client-perceived consistency models
- Strong consistency
- Weak consistency
- Eventual consistency and its client-centric variations

Server implementation (update propagation)

CAP theorem [Brewer '00; Gilbert & Lynch '02]
- Of the three properties of shared-data systems: data consistency, service availability, and tolerance to network partitions, only two can be achieved at any given time.
CAP Theorem

Three core systemic requirements when designing and deploying applications in a distributed environment:
- Consistency, Availability, and Partition tolerance
- Partition tolerance: no set of failures less than a total network failure is allowed to cause the system to respond incorrectly.
- Availability (minimal latency):
  - Amazon claims that an extra one tenth of a second on their response times costs them 1% in sales.
  - Google reported that a half-second increase in latency caused traffic to drop by a fifth.
CAP

If we want A and B to be highly available (i.e. working with minimal latency) and we want our nodes N1 to Nn (where n could be hundreds or even thousands) to remain tolerant of network partitions (lost messages, undeliverable messages, hardware outages, process failures), then sometimes we are going to get cases where some nodes think that V is V0 and other nodes think that V is V1.
CAP Theorem: Impossibility Proof

If M is an asynchronous message (no clocks at all), N1 has no way of knowing whether N2 gets the message. Even with guaranteed delivery of M, N1 has no way of knowing whether a message is delayed by a partition event or by something failing in N2.

Making M synchronous does not help, because that treats the write by A on N1 and the update event from N1 to N2 as one atomic operation, which gives us the same latency issues.

Even in a partially synchronous model, atomicity cannot be guaranteed. In a partially asynchronous network, each site has a local clock and all clocks increment at the same rate, but the clocks are not synchronized to display the same value at the same instant.
CAP Theorem Proof (cont.)

Consider a transaction (i.e. a unit of work based around the persistent data item V) called α, where α1 is the write operation from before and α2 is the read.
- On a single machine this would easily be handled by a database with some simple locking, isolating any attempt to read in α2 until α1 completes safely.
- In the distributed model, with nodes N1 and N2 to worry about, the intermediate synchronizing message also has to complete.

Unless we can control when α2 happens, we can never guarantee that it will see the same data values that α1 writes. All methods of adding that control (blocking, isolation, centralized management, etc.) will impact either partition tolerance or the availability of α1 (A) and/or α2 (B).
Dealing with CAP

Drop partition tolerance
- Partition tolerance limits scaling.

Drop availability
- On encountering a partition event, the affected services simply wait until the data are consistent and therefore remain unavailable during that time. Controlling this can be complex in large-scale systems.

Drop consistency
- Lots of inconsistencies don't actually require much work to resolve.
- The eventually consistent model works for most applications.
- BASE: Basically Available, Soft-state, Eventually consistent (the logical opposite of ACID in databases: atomicity, consistency, isolation, durability)