CS 603 Data Replication February 25, 2002
Data Replication: Why?
  Fault tolerance
    – Hot backup
    – Catastrophic failure
  Performance
    – Parallelism
    – Decreased reliance on network
  This is a two-edged sword
Data Replication: What?
  Correctness criterion: replication invisible
    – Results indistinguishable from one-copy database
    – One-copy serializability (1SR)
  Alternatives
    – Bounded inconsistency
    – User selection of real/copy
  More discussion Friday
Data Replication: How?
  Goal: Ensure one-copy serializability
  Write-all solution: All copies identical
    – Write goes to every site
    – Read from any site
    – Standard single-copy concurrency control
    – Guarantees 1SR
        Single-copy concurrency control gives serializable execution
        Equivalent to serial execution where all writes happen in one transaction
Write All Approach
  [Figure: a writer and a reader with replicated copies holding the values 3 and 5; the reader issues a read]
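The write-all rule can be sketched in a few lines. This is a minimal illustration only: the class `WriteAllStore`, the exception `SiteDown`, and the site names are hypothetical, and locking and the commit protocol are left out. A write goes to every copy, a read is served by any single copy, and a failed site makes the write fail here (standing in for blocking).

```python
class SiteDown(Exception):
    """Raised when an operation cannot reach a copy (stands in for blocking)."""


class WriteAllStore:
    """Toy replicated store: every write goes to all copies, reads use one copy."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}  # site -> {item: value}
        self.up = set(sites)                        # sites currently available

    def write(self, item, value):
        # Write-all: refuse (i.e. block) if any copy is unreachable.
        down = [s for s in self.copies if s not in self.up]
        if down:
            raise SiteDown(f"write({item}) blocked by failed sites: {down}")
        for site in self.copies:
            self.copies[site][item] = value

    def read(self, item, site):
        # Read from any single available copy; all copies are identical.
        if site not in self.up:
            raise SiteDown(f"site {site} is down")
        return self.copies[site][item]


store = WriteAllStore(["A", "B", "C"])
store.write("x", 3)
store.write("x", 5)
print(store.read("x", "B"))
```

Because every copy is updated by every write, the choice of copy in `read` does not matter while all sites are up; marking one site as down is enough to make every later write fail.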
Problem: Site Failure
  Failure causes write to block
    – Must maintain locks
    – Clogs up entire system
  Is this fault tolerance?
  What about “write all available”?
    – $T_0$: $w_0[x_A]\ w_0[x_B]\ w_0[y_C]\ c_0$
    – B fails
    – $T_1$: $r_1[y_C]\ w_1[x_A]\ c_1$
    – B recovers
    – $T_2$: $r_2[x_B]\ w_2[y_C]\ c_2$
  What is the equivalent serial order?
  [Figure: a writer and a reader with replicated copies holding the values 5 and 3]
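One way to explore the question about the "write all available" history is to compare its reads-from relationships with those of every serial one-copy order of T0, T1, T2. The sketch below is an illustration under stated assumptions: the names `TXNS`, `OBSERVED`, `serial_reads_from`, and `equivalent_serial_orders` are hypothetical, and the observed reads-from pairs are read off the history by hand (T1 reads y from T0; T2 reads the copy of x at B, which T1 never wrote because B was down, so T2 reads x from T0).

```python
from itertools import permutations

# One-copy (logical) view of each transaction: ordered (kind, item) operations.
TXNS = {
    "T0": [("w", "x"), ("w", "y")],
    "T1": [("r", "y"), ("w", "x")],
    "T2": [("r", "x"), ("w", "y")],
}

# Reads-from observed in the replicated history: (reader, item) -> writer.
OBSERVED = {("T1", "y"): "T0", ("T2", "x"): "T0"}


def serial_reads_from(order):
    """Reads-from relation produced by running the transactions serially."""
    last_writer = {}
    reads_from = {}
    for txn in order:
        for kind, item in TXNS[txn]:
            if kind == "r":
                reads_from[(txn, item)] = last_writer.get(item)
            else:
                last_writer[item] = txn
    return reads_from


def equivalent_serial_orders():
    """All serial orders whose reads-from relation matches the observed one."""
    return [o for o in permutations(TXNS) if serial_reads_from(o) == OBSERVED]


print(equivalent_serial_orders())
```

No serial order matches: placing T1 before T2 makes T2 read x from T1, and placing T2 before T1 makes T1 read y from T2, so under this reading the history is not one-copy serializable.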
Model for Replicated Data
  Data and Transaction Managers at each site
    – Data manager: local concurrency control to guarantee local serializability
    – Transaction manager: distributed actions
        Turns reads/writes into multi-site reads/writes
        Runs commit protocol
  Directory to get sites of each copy
Failure Assumptions
  Communications failure: Site A does not receive reads/writes on $x_A$ issued by B
  Site failure: Site A is unable to process reads/writes on $x_A$ issued by B
  Communications failure: Site A processes but does not acknowledge reads/writes on $x_A$ issued by B
  Fail-stop model, detectable by timeout
Types of Write
  Write(x): All copies of x will eventually be written
  Immediate write
    – Send write to all sites on request
    – Quick detection of conflict
  Delayed write
    – Delays non-local writes until commit
    – Minimizes message traffic
    – Abort is cheap
  Primary copy write
    – Quick detection of conflict
    – Lower message traffic than immediate write
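The message-traffic argument for delayed writes can be made concrete with a small sketch. Everything here is an assumption for illustration: the class `DelayedWriteTxn`, the helper `immediate_write_messages`, and the cost model (one message per remote site per send, acknowledgements and the commit protocol ignored). For simplicity the sketch buffers all writes, including the local one, until commit.

```python
def immediate_write_messages(num_writes, num_sites):
    """Messages sent if every write is forwarded to all remote sites at once."""
    return num_writes * (num_sites - 1)


class DelayedWriteTxn:
    """Toy transaction that buffers its writes and ships them at commit."""

    def __init__(self, local_site, sites):
        self.local_site = local_site
        self.remote_sites = [s for s in sites if s != local_site]
        self.buffer = {}          # item -> latest value written by this txn
        self.messages_sent = 0

    def write(self, item, value):
        self.buffer[item] = value  # no message is sent yet

    def commit(self, copies):
        # One message per remote site carries all buffered writes.
        for site in self.remote_sites:
            copies[site].update(self.buffer)
            self.messages_sent += 1
        copies[self.local_site].update(self.buffer)

    def abort(self):
        # Nothing was sent, so abort only discards the local buffer.
        self.buffer.clear()


copies = {"A": {}, "B": {}, "C": {}}
txn = DelayedWriteTxn("A", ["A", "B", "C"])
txn.write("x", 1)
txn.write("y", 2)
txn.write("x", 3)
txn.commit(copies)
print(txn.messages_sent, immediate_write_messages(3, 3))
```

The price of the saving is that a conflict at a remote site is not noticed until commit time, which is the "quick detection of conflict" advantage listed for immediate writes.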
Distributed Serializability
  A complete replicated data (RD) history $H$ over $T = \{T_0, \ldots, T_n\}$ is a partial order with ordering relation $<$ where
    – $H = h\left(\bigcup_{i=0}^{n} T_i\right)$ for some translation function $h$
    – for each $T_i$ and all operations $p_i, q_i$ in $T_i$, if $p_i <_i q_i$, then every operation in $h(p_i)$ is related by $<$ to every operation in $h(q_i)$
    – for every $r_j[x_A]$, there is at least one $w_i[x_A] < r_j[x_A]$
    – if $w_i[x] \in H$ and $r_j[x] \in H$, then $w_i[x] < r_j[x]$ or $r_j[x] < w_i[x]$
    – if $w_i[x] <_i r_i[x]$ and $h(r_i[x]) = r_i[x_A]$, then $w_i[x_A] \in h(w_i[x])$
  Theorem: If the reads-from relationships are the same as in a serial history, the RD history is one-copy serializable
Write All Available Fails Even if no recovery!
Solutions
  Validate availability on commit
    – Check if any failed writes now available
    – Check that all sites read or written still available
    – Enforces serializability for site failures
  Doesn’t work with communication failures!
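The two commit-time checks can be sketched as a single predicate. This is an illustration under assumptions, not a full protocol: the function name `validate_on_commit` and its argument shapes are hypothetical, and the set of available sites is taken as given (in the fail-stop model it would come from timeouts).

```python
def validate_on_commit(missed_writes, accessed, available):
    """Return True if the transaction may commit.

    missed_writes: set of (item, site) copies the transaction failed to write
    accessed:      set of (item, site) copies the transaction read or wrote
    available:     set of sites believed to be up at commit time
    """
    # Check 1: every copy we could not write must still be unavailable.
    missed_ok = all(site not in available for _, site in missed_writes)
    # Check 2: every copy we read or wrote must still be available.
    access_ok = all(site in available for _, site in accessed)
    return missed_ok and access_ok


# A transaction that read y at C, wrote x at A, and missed the copy of x at B.
missed = {("x", "B")}
touched = {("y", "C"), ("x", "A")}
print(validate_on_commit(missed, touched, {"A", "C"}))       # B still down
print(validate_on_commit(missed, touched, {"A", "B", "C"}))  # B came back
```

In the second call the transaction must abort, because B has recovered holding a copy of x that the transaction never updated.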
Communication Failures
  Available copies fails on network partition
    – Each side succeeds in validation
  Write all blocks
  Write n-k, read k+1
    – Generalization of the “write all” approach (k = 0 gives write all, read one)
    – Writes tolerate up to k failed sites, reads up to n-k-1; both continue with up to min(k, n-k-1) failures
    – Tradeoff read vs. write performance
    – Partition effect based on size of partition:
        Smaller partition has fewer than k+1 sites: it acts as if all its sites failed, the larger partition continues
        Otherwise the entire system becomes read-only
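The "write n-k, read k+1" rule and its partition behaviour can be checked by brute force for small n. The sketch below is illustrative only; the function names are hypothetical, sites are numbered 0 to n-1, and the partition analysis simply compares each partition's size with the two quorum sizes.

```python
from itertools import combinations


def quorums(n, k):
    """Return (write quorum size, read quorum size) for n copies."""
    return n - k, k + 1


def quorums_always_intersect(n, k):
    """True if every write quorum shares a site with every read quorum."""
    w, r = quorums(n, k)
    sites = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(sites, w)
        for rs in combinations(sites, r)
    )


def partition_behaviour(n, k, sizes):
    """For each partition size, report (size, can_read, can_write)."""
    w, r = quorums(n, k)
    return [(size, size >= r, size >= w) for size in sizes]


print(quorums(5, 1))
print(quorums_always_intersect(5, 1))
print(partition_behaviour(5, 1, [4, 1]))
print(partition_behaviour(5, 1, [3, 2]))
```

With n = 5 and k = 1, a 4/1 split leaves the larger side fully operational and the smaller side unable to read or write, while a 3/2 split lets both sides read but neither write. The intersection holds because the two quorum sizes sum to n+1, so a read always sees at least one copy from the latest write.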
Other approaches: Don’t enforce Serializability!
  Master copy
    – Writes must update master copy
    – Reads can be consistent or inconsistent
  Bounded inconsistency
    – Time bound on update of copies
    – Value bound: write all if difference too great
  Dumps consistency on the application
    – Added complexity
    – Better performance
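A master copy combined with a value bound might look like the following sketch. It is one possible reading, with assumptions labeled: the class `ValueBoundedCopies`, its parameters, and the numeric values are hypothetical, and the data item is a single number. Every write updates the master; the other copies are refreshed ("write all") only when they would otherwise differ from the master by more than the bound.

```python
class ValueBoundedCopies:
    """Toy master-copy scheme with a value bound on replica staleness."""

    def __init__(self, sites, master, initial, bound):
        self.master = master
        self.values = {site: initial for site in sites}
        self.bound = bound

    def write(self, value):
        """Update the master; return True if the write went to all copies."""
        self.values[self.master] = value
        too_stale = any(
            abs(value - v) > self.bound
            for site, v in self.values.items()
            if site != self.master
        )
        if too_stale:
            for site in self.values:
                self.values[site] = value
        return too_stale

    def read(self, site, consistent=False):
        # A consistent read goes to the master; otherwise use the local copy.
        return self.values[self.master if consistent else site]


copies = ValueBoundedCopies(["A", "B", "C"], master="A", initial=100, bound=10)
copies.write(105)   # within the bound: only the master changes
print(copies.read("B"), copies.read("B", consistent=True))
copies.write(115)   # B and C would be off by 15 > 10: write all
print(copies.read("B"))
```

The application has to decide for each read whether a value that may be off by up to the bound is acceptable, which is the added complexity the slide mentions.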