EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha
Primary Backup Replication
[Diagram: a view service, a client, a primary, and backups]
September 27, 2017 EECS 498 – Lecture 7

View service
- Monitors primary and backups to detect when to change the view
- Can change the view only after the primary has ACKed the current view
- Primary ACKs only after syncing with backups
- Clients cache the view for scalability
- To address split brain, the primary must check with the backup before serving a client

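The ACK rule above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the class `ViewService`, its methods, and the server names are all hypothetical, and failure detection (pings, timeouts) is left out.

```python
class ViewService:
    """Toy view service: it changes the view only after the primary ACKs the current one."""

    def __init__(self, primary, backup):
        self.viewnum = 1
        self.primary = primary
        self.backup = backup
        self.acked = False  # has the primary ACKed view `viewnum`?

    def ack(self, server, viewnum):
        # Only an ACK from the current primary for the current view counts.
        if server == self.primary and viewnum == self.viewnum:
            self.acked = True

    def primary_failed(self):
        """Promote the backup if allowed; return True if the view changed."""
        if not self.acked or self.backup is None:
            return False  # stuck: the current view was never ACKed
        self.viewnum += 1
        self.primary, self.backup = self.backup, None
        self.acked = False
        return True


vs = ViewService("S1", "S2")
print(vs.primary_failed())  # False: S1 has not ACKed view 1 yet
vs.ack("S1", 1)
print(vs.primary_failed())  # True: S2 becomes primary in view 2
```

Because the primary ACKs only after syncing with its backup, refusing to change an un-ACKed view keeps the service from promoting a backup that has not yet received the primary's state.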
Replicating Bank Database
- One copy in SF (primary), one in NY (backup)
[Diagram: requests “Deposit $100” and “Pay 1% interest”; both copies start at $1,000]

Primary-Backup Sync
[Diagram: clients C1 and C2 issue “Deposit $100” and “Pay 1% interest”; primary P goes from $1,000 to $1,111, while backup B goes from $1,000 to $1,110]

Ordering of Updates
- All updates must be applied in the same order at all replicas
- External view: total ordering of writes
- Primary effectively serializes all writes

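A small worked example shows why order matters for the bank database. The function names are illustrative only, and balances are whole dollars so the arithmetic stays exact.

```python
def deposit(balance, amount):
    return balance + amount


def pay_interest(balance, percent):
    return balance + balance * percent // 100


def apply_log(balance, log):
    """Apply updates in log order, as a replica would."""
    for op, arg in log:
        balance = op(balance, arg)
    return balance


# Same two updates, applied in different orders at two replicas:
deposit_first = apply_log(1000, [(deposit, 100), (pay_interest, 1)])
interest_first = apply_log(1000, [(pay_interest, 1), (deposit, 100)])
print(deposit_first, interest_first)  # 1111 1110

# If the primary fixes one order and every replica follows it, they agree:
serialized = [(deposit, 100), (pay_interest, 1)]
replicas = [apply_log(1000, serialized) for _ in range(2)]
print(replicas)  # [1111, 1111]
```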
Serving Reads
- Can backups serve reads?
  - Assume no split brain
- What if the primary's state is ahead of the backup's?
  - Updates to the primary are not yet externally visible
  - Effect of the read is the same as if the primary had failed at this point
- What if the backup's state is ahead of the primary's?
  - Different backups may not be in sync
  - Primary may get replaced before it applies the update

Reads: Primary vs. Backup
[Diagram: primary P, backups B1 and B2, and client C2; a “Deposit $100” moves the balance from $1000 to $1100]

Desired Properties
- All writes are totally ordered
- Once a read returns a particular value, all later reads should return that value or the value of a later write
- Once a write completes, all later reads should return the value of that write or the value of a later write

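The second property can be checked mechanically for a simple history. The sketch below is hypothetical: it assumes the total order of writes is known, that written values are distinct, and that the reads are listed in real-time order without overlap.

```python
def reads_are_monotonic(write_order, reads):
    """True if no read returns a value older than one an earlier read returned."""
    position = {value: i for i, value in enumerate(write_order)}
    last = -1
    for value in reads:
        if position[value] < last:
            return False  # this read went "back in time"
        last = position[value]
    return True


writes = [1000, 1100, 1111]  # balance after each write, in write order
print(reads_are_monotonic(writes, [1000, 1100, 1100, 1111]))  # True
print(reads_are_monotonic(writes, [1100, 1000]))              # False
```

The second history is what a client could observe if one read were served by an up-to-date replica and the next by a lagging one.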
Reads relative to Writes
[Diagram: client C1 issues “Pay 1% interest” and “Deposit $100” to primary P and backup B; client C2 reads, with balances $1100 and $1111 shown]

Linearizability
- Total ordering of writes
- Read returns the last completed write
- Single copy semantics
  - Externally visible effects of writes and reads are the same as if there existed a single copy
  - Users are oblivious to replication

Consistency Spectrum
- Consistency: What are the properties of externally visible effects?
[Diagram: consistency spectrum with points Read-after-write, Eventual, Causal, Sequential, and Linearizability; arrows labeled Consistency and Ease of programming]

Why weaken consistency?
- Shouldn't we always strive for single copy semantics?
- Single copy semantics come at the cost of performance
- Latency vs. consistency tradeoff

Consistency Spectrum
[Diagram: consistency spectrum with points Read-after-write, Eventual, Causal, Sequential, and Linearizability; arrows labeled Consistency, Ease of programming, and Latency]

Causal Consistency
- Order of causally related writes must be preserved in values returned to reads
  - If W1 → W2, then if a read sees the effect of W2, it must see the effect of W1
- Example: Facebook News Feed
  - Okay to not see all completed posts
  - But if you see a comment, you must see the post on which the comment is made
- Main utility: lazy sync between replicas

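One way to picture lazy sync under causal consistency: a replica buffers an incoming write until every write it depends on is visible. This is a minimal sketch with explicit dependency lists; the `Replica` class and the write ids are hypothetical, and real systems track dependencies differently (for example with vector clocks).

```python
class Replica:
    def __init__(self):
        self.visible = {}  # write id -> value, in the order writes became visible
        self.pending = []  # (write id, value, deps) still waiting on dependencies

    def receive(self, wid, value, deps=()):
        self.pending.append((wid, value, tuple(deps)))
        self._drain()

    def _drain(self):
        # Keep applying pending writes until no more become applicable.
        progress = True
        while progress:
            progress = False
            for item in list(self.pending):
                wid, value, deps = item
                if all(d in self.visible for d in deps):
                    self.visible[wid] = value
                    self.pending.remove(item)
                    progress = True


r = Replica()
r.receive("comment", "Nice photo!", deps=["post"])  # arrives before its post
print(list(r.visible))  # []
r.receive("post", "My vacation photo")
print(list(r.visible))  # ['post', 'comment']
```

The comment arrives first but stays hidden until the post shows up, which matches the News Feed example: readers never see a comment without its post.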
Linearizability with Locks
[Diagram: a lock service, a client, and Replicas 1, 2, and 3]
- Problems? Client failures!

Lease
- Lock with timeout
- If the lease holder fails, not a problem because the lease will expire
- How to pick the lease timeout value?
  - Short timeout: client needs to renew the lease frequently
  - Long timeout: operations are blocked unnecessarily after the holder fails

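A lease can be sketched as a lock that records an expiry time. The `LeaseService` class below is hypothetical, and the clock is passed in as a function so the example can step time forward by hand instead of sleeping.

```python
class LeaseService:
    def __init__(self, duration, clock):
        self.duration = duration  # lease length in seconds
        self.clock = clock        # function returning the current time
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, client):
        """Grant or renew the lease; return True on success."""
        now = self.clock()
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == client:
            self.holder = client
            self.expires_at = now + self.duration
            return True
        return False


t = [0.0]  # fake clock, advanced manually
svc = LeaseService(duration=10, clock=lambda: t[0])
print(svc.acquire("A"))  # True
print(svc.acquire("B"))  # False: A holds the lease
t[0] = 10.0              # A failed and never renewed
print(svc.acquire("B"))  # True: A's lease has expired
```

With a lock, B would wait forever after A fails; with a lease, B waits at most one lease duration.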
Discrepancy in Lease Validity
[Diagram: a lease service, a client, and Replicas 1, 2, and 3]
- Scenario in which lease server and client differ about lease validity?

Discrepancy in Lease Validity
- Message that grants the lease may have high delay
- Clocks at the lease holder and the lease service may be skewed relative to each other
- How to account for potential discrepancy?

Discrepancy in Lease Validity
[Diagram: a lease service, a client, and Replicas 1, 2, and 3]
- Replica must check with the lease service to confirm lease validity

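A small numeric sketch of the delayed-grant scenario. The durations are made up, and all times are measured on one reference clock purely for illustration; with clock skew the gap could be larger or smaller.

```python
LEASE = 10.0  # lease duration in seconds (illustrative)
DELAY = 3.0   # one-way delay of the grant message (illustrative)

granted_at = 0.0                            # service starts the lease when it sends the grant
service_expiry = granted_at + LEASE         # the service's view of expiry
client_expiry = granted_at + DELAY + LEASE  # client starts counting when the grant arrives


def service_says_valid(now):
    return now < service_expiry


def client_thinks_valid(now):
    return now < client_expiry


def replica_accepts(now):
    # The replica asks the lease service rather than trusting the client's belief.
    return service_says_valid(now)


now = 11.0
print(client_thinks_valid(now))  # True
print(replica_accepts(now))      # False
```

Between the two expiry times the client still believes it holds the lease, but a replica that confirms with the lease service rejects the operation.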
Case study: GFS
- Google File System
- Distributed storage system tailored to Google's workload
- Workload characteristics and setting:
  - Multi-GB files
  - Files are mostly appended to
  - Failures are extremely common

High-level Design
- Files are split into 64 MB chunks
- Every chunk is replicated on three randomly selected machines
- A central chunkmaster server picks, and keeps track of, where every replica of every chunk is stored

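The chunking and placement described above can be sketched as follows. Only the 64 MB chunk size and the three random replicas come from the slide; the function names and machine names are hypothetical, and a real placement policy would also weigh factors such as load and rack layout.

```python
import random

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB
REPLICAS = 3


def num_chunks(file_size):
    """Number of chunks needed for a file of `file_size` bytes (ceiling division)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


def place_chunks(file_size, machines, rng=random):
    """Chunkmaster-style table: chunk index -> three distinct machines."""
    return {i: rng.sample(machines, REPLICAS) for i in range(num_chunks(file_size))}


machines = [f"m{i}" for i in range(10)]
table = place_chunks(200 * 1024 * 1024, machines, random.Random(0))
print(len(table))  # 4 chunks for a 200 MB file
for chunk, replicas in table.items():
    print(chunk, replicas)
```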
GFS Overview

Replication in GFS
[Diagram: chunkmaster, client, primary, and two backups]

Replication in GFS
[Diagram: chunkmaster, client, primary, and two backups]
- Challenge introduced due to large writes:
  - High latency when writing to a distant primary
  - How to optimize write performance?

Data flow vs. Control flow

Handling Server Failures
- Chunkmaster grants 60 sec lease to primary
  - Utility of lease?
  - What if lease expires in the midst of a write?
- New version number upon lease renewal
  - Replicas locally log version number
  - Helps detect stale replicas
- Store checksums to detect corrupted data

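Both detection mechanisms reduce to simple comparisons. In this sketch the function names and replica names are hypothetical, and CRC-32 from Python's `zlib` stands in for whatever checksum the system uses.

```python
import zlib


def stale_replicas(master_version, replica_versions):
    """Replicas whose logged version is older than the chunkmaster's."""
    return sorted(r for r, v in replica_versions.items() if v < master_version)


def is_corrupted(data, stored_checksum):
    """True if the data no longer matches the checksum stored with it."""
    return zlib.crc32(data) != stored_checksum


# Replica B was down while the version number advanced from 4 to 5.
print(stale_replicas(5, {"A": 5, "B": 4, "C": 5}))  # ['B']

data = b"chunk contents"
checksum = zlib.crc32(data)
print(is_corrupted(data, checksum))                # False
print(is_corrupted(b"chunk c0ntents", checksum))   # True
```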
Handling Master Failures
- Replicate chunkmaster
  - Any update to state is logged to local disk and propagated to replicas
- Shadow masters only serve reads
  - Potentially out of date
- What if all replicas of the master are wiped out?

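A toy version of log-based master replication, to show why a shadow can be out of date. The classes are hypothetical, an in-memory list stands in for the on-disk log, and propagation is modeled as the shadow pulling the log.

```python
class Master:
    def __init__(self):
        self.state = {}  # chunk id -> replica locations
        self.log = []    # stands in for the log on local disk

    def update(self, key, value):
        self.log.append((key, value))  # log the update first, then apply it
        self.state[key] = value


class ShadowMaster:
    """Read-only copy that replays the master's log and may lag behind."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries replayed so far

    def catch_up(self, log):
        for key, value in log[self.applied:]:
            self.state[key] = value
        self.applied = len(log)


m, shadow = Master(), ShadowMaster()
m.update("chunk1", ["m1", "m2", "m3"])
shadow.catch_up(m.log)
m.update("chunk2", ["m4", "m5", "m6"])
print("chunk2" in shadow.state)  # False: the shadow is out of date
shadow.catch_up(m.log)
print(shadow.state == m.state)   # True
```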
GFS Performance Benchmark