EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha
Primary Backup Replication Client Primary Backup Backup September 25, 2017 EECS 498 – Lecture 6
Primary Backup Replication Promote one of the backups if primary fails Replace any failed backup When should primary sync with backups? Before making state change externally visible Primary and backups must be externally consistent What to sync? Entire state when bootstrapping new backup Thereafter, all sources of non-determinism September 25, 2017 EECS 498 – Lecture 6
RSM with Primary Backup Application Application Server1 Operating System Operating System Server2 Virtual Machine Monitor Virtual Machine Monitor Hardware Hardware September 25, 2017 EECS 498 – Lecture 6
Example: Bank Server Deposit(user, amount) { balance[user] += amount show new balance } AddInterest() { while(currtime != 12am) { sleep(1 hour) add 1% interest for all users September 25, 2017 EECS 498 – Lecture 6
Example Execution Primary Backup Timer interrupt fires Wake up AddInterest Read current time T1 Go to sleep Receive deposit request Show new balance Read current time T2 Add interest Receive deposit request Wait for timer interrupt Wake up AddInterest Return current time as T1 Go to sleep Timer interrupt fires Show new balance Return current time as T2 Add interest September 25, 2017 EECS 498 – Lecture 6
Primary Backup Replication Client Primary Backup Backup September 25, 2017 EECS 498 – Lecture 6
Client perspective What does a worker need to know in order to register itself with replicated master? Needs to know which machine is primary September 25, 2017 EECS 498 – Lecture 6
Primary Backup Replication Client Primary Backup Backup September 25, 2017 EECS 498 – Lecture 6
Client perspective What does a worker need to know in order to register itself with replicated master? Needs to know which machine is primary Can primary be hard coded into client code? No, primary gets replaced when it fails How does client discover current primary? September 25, 2017 EECS 498 – Lecture 6
Primary Backup Replication View service Backup Client Primary Backup Backup September 25, 2017 EECS 498 – Lecture 6
View service Maintains current membership of primary-backup service (called view) View number, primary, backup When does view service change view? When primary or any backup fails Periodically exchange heartbeat messages to detect failures What if view service is down or not reachable? September 25, 2017 EECS 498 – Lecture 6
Transitioning between views How to ensure new primary is up-to-date? Only promote a previous backup This is why view service needs to pick backups How does view service know if a backup is up-to-date? Two scenarios for ill-timed primary failure: Primary applies operation but fails before syncing with backup Primary fails before new backup is initialized September 25, 2017 EECS 498 – Lecture 6
Transitioning between views View service broadcasts view change to all Primary must ACK new view once backup is up-to-date Two implications: Liveness detection timeout > State transfer time Cannot change view if primary fails during sync September 25, 2017 EECS 498 – Lecture 6
View service View change has three steps: View service announces new view Primary syncs with new backup if there is one Primary acknowledges new view Stuck if primary fails in the midst of this process September 25, 2017 EECS 498 – Lecture 6
Scalability of View service Does every client need to contact view service before any operation? Clients can cache view across operations When to invalidate cached view? Client invalidates cache when no response or negative response from primary September 25, 2017 EECS 498 – Lecture 6
Split Brain View service Client S2 S1 (1,S1, _) (2,S1,S2) (3,S2, _) September 25, 2017 EECS 498 – Lecture 6
Avoiding Split Brain Primary must forward all operations to backups Goal: Get ACKs from backups that they too recognize primary Why can’t backups be mistaken about who is primary? Only a backup can be promoted as primary September 25, 2017 EECS 498 – Lecture 6
View service Valid sequence of views: (1, S1, _) (2, S1, S2) (3, S1, S3) (4, S3, S4) (5, S4, _) Examples of invalid transitions between views? (1, S1, S2) (2, S3, S4) (1, S1, S2) (2, _, S2) (1, S1, _) (2, S2, S1) September 25, 2017 EECS 498 – Lecture 6