Bayou: Replication with Weak Inter-Node Connectivity. Brad Karp, UCL Computer Science, CS GZ03 / 4030, 12th November, 2007.



2 Context: Availability vs. Consistency
NFS, Ivy, 2PC all had single points of failure; not available under failures
Paxos allows view-change to elect primary, thus state machine replication
  – Strong consistency model: all operations in same order at all replicas, always; appearance of single system-wide order for all operations
  – Strong reachability requirement: majority of nodes must be reachable by leader
If reachability weaker, can we provide any consistency when we replicate?

3 Bayou: Calendar Application Case Study
Today’s lecture:
  – Bayou’s office calendar application as case study in ordering and conflicts in a distributed system with poor connectivity
Each calendar entry: room, time, and set of participants
Want everyone to see same set of entries (eventually)
  – else, users may double-book room, avoid using unoccupied room, &c.

4 Traditional Calendar Application: One Central Server
Ordering of users’ requests: only one copy, server picks order
Conflict resolution: server checks for conflicts (i.e., “is this room already booked during this period?”) before accepting updates
  – Returns error to user if conflict; user decides what to do

5 What’s Wrong with Central Server?
Want my calendar on my PDA
  – i.e., each user wants database replicated on his PDA or laptop
  – No master copy
PDA has only intermittent connectivity
  – GPRS/EDGE/3G expensive, WiFi not everywhere; nothing on Tube
  – Bluetooth useful for direct contact with other calendar users’ PDAs, but very short range

6 Simple Proposal: Swap Complete DBs
Suppose two users in Bluetooth range
Each sends entire calendar DB to other, as with Palm sync
Possibly lots of network bandwidth
What if conflict, i.e., two concurrent meetings?
  – Palm sync just keeps both meetings!
  – Want to do better: automatic conflict resolution

7 Automatic Conflict Resolution
Can’t just view DB items as bits: too little information to resolve conflicts!
  – “Both files have changed” can falsely conclude entire DBs conflict
  – “Distinct record in each DB changed” can falsely conclude no conflict
Want to build intelligent DB app that knows how to resolve conflicts
  – More like users’ updates: read DB, think, change request to eliminate conflict
  – Must ensure all nodes resolve conflicts in same way to keep replicas consistent

8 Insight: Ordering of Updates
Maintain ordered list of updates at each node
Make sure every node holds same updates
Make sure every node applies updates in same order
Make sure updates are deterministic function of DB contents
If we obey above, “sync” really just a simple merge of two ordered lists!
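The "simple merge of two ordered lists" can be sketched as follows. This is a minimal illustration, not Bayou's actual code: the write representation (a tuple whose first two fields are a timestamp and a node name) and the function name `merge_logs` are assumptions made for the example.

```python
import heapq

def merge_logs(log_a, log_b):
    """Merge two write logs, each already sorted by write ID, dropping duplicates."""
    merged = []
    for write in heapq.merge(log_a, log_b):
        # Identical writes end up adjacent after the merge, so skip repeats.
        if not merged or merged[-1] != write:
            merged.append(write)
    return merged

# Hypothetical writes: (timestamp, node, payload), compared as tuples.
log_a = [(10, "A", "M1"), (30, "B", "M2")]
log_b = [(10, "A", "M1"), (20, "C", "M3")]
print(merge_logs(log_a, log_b))
# [(10, 'A', 'M1'), (20, 'C', 'M3'), (30, 'B', 'M2')]
```

The result does not depend on which node initiates the sync, which is what lets two nodes converge on the same log.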

9 What’s in a Write?
Each node’s ordered list of writes: write log
Suppose calendar update takes form:
  – “10 AM meeting, Room 6.12, Mark and Brad”
  – Sufficient for our goal?
Better: “1-hour meeting, Room 6.12, Mark and Brad, at 9, else 10, else 11”
  – Also include unique ID: …
Instructions for write more than data to write
Write log really an “instruction” for calendar program
Want all nodes to execute same instructions in same order, eventually
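A write that carries instructions rather than plain data might look like the sketch below. The class `MeetingWrite`, its fields, the shape of the write ID, and the rule "book the first free hour in the list" are all assumptions for illustration; the slides only state that a write lists alternatives ("at 9, else 10, else 11") and has a unique ID.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MeetingWrite:
    write_id: tuple   # hypothetical unique ID, e.g. (timestamp, node)
    name: str
    room: str
    hours: tuple      # acceptable start hours, in order of preference

def apply_write(calendar, write):
    """Book the first free slot; calendar maps (room, hour) -> meeting name."""
    for hour in write.hours:
        if (write.room, hour) not in calendar:
            calendar[(write.room, hour)] = write.name
            return hour
    return None  # every alternative conflicts, so nothing is booked

calendar = {}
apply_write(calendar, MeetingWrite((0, "B"), "Seminar", "6.12", (9,)))
w = MeetingWrite((1, "A"), "Mark and Brad", "6.12", (9, 10, 11))
print(apply_write(calendar, w))  # 9 is taken, so the meeting lands at 10
```

Because `apply_write` depends only on the calendar contents and the write itself, any node that applies the same writes in the same order reaches the same calendar.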

10 Write Log Example
…: Node A asks for meeting M1 to occur at 10 AM, else 11 AM
…: Node B asks for meeting M2 to occur at 10 AM, else 11 AM
Let’s agree to sort by write ID (e.g., …)
As “writes” spread from node to node, nodes may initially apply updates in different orders

11 Write Log Example (2)
Each newly seen write merged into log
Log replayed
  – May cause calendar displayed to user to change!
  – i.e., all entries really “tentative,” nothing stable
After everyone has seen all writes, everyone will agree (contain same state)
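Replaying the log after each merge can be sketched as below, using the M1/M2 requests from the example. The write IDs `(10, "A")` and `(20, "B")` and the single-room calendar are made-up values for illustration, not the ones on the original slide.

```python
def replay(log):
    """Rebuild the calendar from scratch, applying writes in write-ID order."""
    calendar = {}
    for write_id, meeting, hours in sorted(log):
        for hour in hours:
            if hour not in calendar:   # first free alternative wins
                calendar[hour] = meeting
                break
    return calendar

# Hypothetical writes: (write ID, meeting, acceptable hours).
m1 = ((10, "A"), "M1", (10, 11))
m2 = ((20, "B"), "M2", (10, 11))

node_b = [m2]
print(replay(node_b))   # {10: 'M2'}
node_b.append(m1)       # M1 arrives later but sorts earlier
print(replay(node_b))   # {10: 'M1', 11: 'M2'}
```

Node B first shows M2 at 10 AM, then moves it to 11 AM once M1 arrives: the displayed calendar changes, which is why every entry is only tentative at this stage.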

12 Global Time Synchronization Impossible
Does this mean that globally ordering writes by local timestamps is impossible?
No: timestamps just allow agreement on order
  – Nodes may have wrong clocks
  – OK, so long as users don’t expect writes to reach calendar in real-time order made

13 Timestamps for Write Ordering: Limitations
Ordering by write ID arbitrarily constrains order
  – Never know if some write from past hasn’t yet reached your node
  – So all entries in log must be tentative forever
  – And you must store entire log forever
Problem: how can we allow committing a tentative entry?
  – So we can have meetings and trim logs

14 Criteria for Committing Writes
For log entry X to be committed, everyone must agree on:
  – Total order of all previous committed entries
  – Fact that X is next in total order
  – Fact that all uncommitted entries are “after” X

15 How Bayou Agrees on Total Order of Committed Writes
One node designated “primary replica”
Primary marks each write it receives with permanent CSN (commit sequence number)
  – That write is committed
  – Complete timestamp is …
Nodes exchange CSNs
CSNs define total order for committed writes
  – All nodes eventually agree on total order
  – Uncommitted writes come after all committed writes
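One way to express "committed writes in CSN order, uncommitted writes after all of them" is a sort key, sketched below. The dictionary fields (`csn`, `ts`, `node`, `op`) are assumptions for the example, and treating a missing CSN as infinity is just one convenient encoding.

```python
import math

def order_key(write):
    """Committed writes sort by CSN; tentative writes follow, by timestamp and node."""
    csn = write["csn"] if write["csn"] is not None else math.inf
    return (csn, write["ts"], write["node"])

log = [
    {"csn": None, "ts": 5, "node": "A", "op": "M1"},  # tentative
    {"csn": 1,    "ts": 9, "node": "B", "op": "M2"},  # committed second
    {"csn": None, "ts": 7, "node": "C", "op": "M3"},  # tentative
    {"csn": 0,    "ts": 8, "node": "C", "op": "M4"},  # committed first
]
print([w["op"] for w in sorted(log, key=order_key)])
# ['M4', 'M2', 'M1', 'M3']
```

Note that M1 has the earliest timestamp yet sorts after both committed writes: the primary's CSNs, not the timestamps, decide the final order.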

16 Showing Users that Writes Have Committed
Still not safe to show users that an appointment request has committed
Entire log up to newly committed entry must be committed
  – else there might be earlier committed write a node doesn’t know about!
  – …and upon learning about it, would have to re-run conflict resolution
Result: committed write not stable unless node has seen all prior committed writes
Bayou propagates writes between nodes to enforce this invariant
  – i.e., Bayou propagates writes in order
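The stability condition can be illustrated with a small check. This sketch assumes CSNs are consecutive integers starting at 0, which the slides do not state; the function name `highest_stable_csn` is likewise hypothetical.

```python
def highest_stable_csn(known_csns):
    """Return the largest n such that CSNs 0..n are all known, or -1 if none."""
    seen = set(known_csns)
    n = -1
    while n + 1 in seen:
        n += 1
    return n

# A node that somehow learned of CSN 3 without CSN 2 could only
# treat writes up to CSN 1 as stable.
print(highest_stable_csn([0, 1, 3]))  # 1
```

In-order propagation means a node never ends up with such a gap in the first place, so every committed write it holds is stable.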

17 Committed vs. Tentative Writes
Can now show user if a write has committed
  – When node has seen every CSN up to that point, as guaranteed by propagation protocol
Slow or disconnected node cannot prevent commits!
  – Primary replica allocates CSNs; global order of writes may not reflect real-time write times
What about tentative writes, though: how do they behave, as seen by users?

18 Tentative Writes
Two nodes may disagree on meaning of tentative (uncommitted) writes
  – Even if those two nodes have synced with each other!
  – Only CSNs from primary replica can resolve these disagreements permanently

19–26 Example: Disagreement on Tentative Writes
[Figure, built up over eight slides: nodes A, B, C; axis “time”; label “logs”; write markers “W”; events “sync (3)” and “sync (4)”]

27 Trimming the Log
When nodes receive new CSNs, can discard all committed log entries seen up to that point
  – Update protocol guarantees CSNs received in order
Instead, keep copy of whole database as of highest CSN
  – By definition, official committed database
  – Everyone does (or will) agree on contents
  – Entries never need go through conflict resolution
Result: no need to keep years of log data!
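Trimming can be sketched as folding committed entries into a stable database and keeping only the tentative ones in the log. The field names and the conflict rule (first booking of an hour wins) are assumptions for the example, not details from the slides.

```python
def trim_log(stable_db, highest_csn, log):
    """Fold committed writes into stable_db in CSN order; return what remains."""
    committed = sorted((w for w in log if w["csn"] is not None),
                       key=lambda w: w["csn"])
    for write in committed:
        # Hypothetical conflict rule: the first booking of an hour wins.
        stable_db.setdefault(write["hour"], write["meeting"])
        highest_csn = write["csn"]
    tentative = [w for w in log if w["csn"] is None]
    return highest_csn, tentative

db = {}
log = [
    {"csn": 1,    "hour": 10, "meeting": "M2"},
    {"csn": None, "hour": 11, "meeting": "M3"},
    {"csn": 0,    "hour": 10, "meeting": "M1"},
]
highest, remaining = trim_log(db, -1, log)
print(db, highest, len(remaining))  # {10: 'M1'} 1 1
```

After trimming, the node stores one database snapshot, one number (the highest CSN), and a short tentative log, rather than the full history.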

28 Ordering of Commits by Primary Replica
Can primary commit writes in any order it pleases?
  – Suppose user creates appointment, then decides to delete it, or change attendee list
  – What order must these ops take in CSN order?
Create first, then delete or modify
Must be true in every node’s view of tentative log entries, too!
Total order of writes must preserve order of writes made at each node
  – Not necessarily order among different nodes’ writes

29 How Does Primary Replica Commit Each Node’s Writes in Order?
Nodes don’t quite use real-time clocks for timestamps; use Lamport logical clocks
  – Anytime see message with later timestamp than current time, set clock to after that timestamp
All nodes send updates in order
So primary receives updates in per-node causal order, and commits them in that order
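The clock rule on this slide can be sketched as a small class. The class and method names are illustrative, and this follows the rule as worded on the slide (jump past a later timestamp when one is seen) rather than any particular implementation.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance for a local event (e.g. a new write) and return its timestamp."""
        self.time += 1
        return self.time

    def observe(self, remote_time):
        """On seeing a message with a later timestamp, set the clock after it."""
        if remote_time > self.time:
            self.time = remote_time + 1

a, b = LamportClock(), LamportClock()
a.tick(); a.tick()
t = a.tick()       # node A's third write carries timestamp 3
b.observe(t)       # node B learns of that write
print(b.tick())    # B's next write is stamped after A's: 5
```

Any write a node creates after seeing another write therefore sorts later than it, so timestamp order respects causal order even when wall clocks are wrong.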

30 Syncing with Trimmed Logs
Suppose nodes discard all writes in log with CSNs
  – Just keep copy of “stable” DB, reflecting discarded entries
Cannot receive writes that conflict with DB
  – Only could be if write has CSN less than a discarded CSN
  – Already saw all writes with lower CSNs in right order; if see them again, can discard!

31 Syncing with Trimmed Logs (2)
To propagate to node X
If node X’s highest CSN less than mine:
  – Send X full stable DB
  – X uses that DB as starting point
  – X can discard all his CSN log entries
  – X can play his tentative writes into that DB
If node X’s highest CSN greater than mine:
  – X can ignore my DB!
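The two cases above can be sketched as a one-way propagation function. The node representation (a dictionary with `highest_csn`, `stable_db`, `committed_log`, and `tentative`) and the first-booking-wins replay rule are assumptions for the example. The sketch also ignores the case where one of X's tentative writes has since been committed elsewhere.

```python
import copy

def propagate(me, peer):
    """One-way sync of committed state from `me` to `peer`; returns peer's view."""
    if peer["highest_csn"] < me["highest_csn"]:
        # Peer is behind: it adopts my stable DB as its starting point and
        # drops its own committed log entries, which that DB already reflects.
        peer["stable_db"] = copy.deepcopy(me["stable_db"])
        peer["highest_csn"] = me["highest_csn"]
        peer["committed_log"] = []
    # Otherwise the peer is at least as current and ignores my DB.
    # Either way, the peer replays its tentative writes on top of its stable DB.
    view = dict(peer["stable_db"])
    for hour, meeting in peer["tentative"]:
        view.setdefault(hour, meeting)  # hypothetical rule: first booking wins
    return view

me = {"highest_csn": 5, "stable_db": {10: "M1"}, "committed_log": [], "tentative": []}
x = {"highest_csn": 2, "stable_db": {}, "committed_log": ["old entry"],
     "tentative": [(10, "M9"), (11, "M2")]}
print(propagate(me, x))  # {10: 'M1', 11: 'M2'}
```

X's tentative request for 10 AM loses to the committed M1, while its 11 AM request survives in X's local view until the primary commits or reorders it.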

32 Bayou: Summary
Seems more useful than my Palm’s calendar!
  – Almost always disconnected when I make appointments
  – Automatic conflict resolution convenient
Not at all transparent to applications!
  – Very strange programming practices
  – Writes are code, not just bits!
  – Check for conflicts, resolve conflicts
Doesn’t work for all apps
  – Bank account may be OK
  – But hard to imagine for source code repository!