Replication and Consistency

References
- The Case for Non-transparent Replication: Examples from Bayou. Douglas B. Terry, Karin Petersen, Mike J. Spreitzer, and Marvin M. Theimer. IEEE Data Engineering, December 1998.
- Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System. Douglas B. Terry, Marvin M. Theimer, Karin Petersen, Alan J. Demers, Mike J. Spreitzer, and Carl H. Hauser. In ACM Symposium on Operating Systems Principles (SOSP ’95).

ACID Transaction
- A transaction has to show up everywhere at the same time or not at all.
  - e.g., when you withdraw cash from an ATM, the balance should reflect the actual money left. If it doesn’t, you could go to a store, use your ATM card, and withdraw cash that you do not have.
  - e.g., when you make an airline reservation and the system assigns you a seat, you expect the seat to be available to you (of course, airlines overbook).

Replication and Availability
- Replication is a powerful tool that allows us to trade off availability against consistency.
- Applications need different levels of consistency.
- Applications know best how to deal with inconsistency.

Application 1: Meeting room scheduler
- Suppose we have two conference rooms of the same capacity. I want to schedule my meeting in one of the conference rooms. I don’t care which exact room it is.
- If two people reserve the same room at the same time, there is a conflict, but if they reserve the same room at different times or reserve different rooms at the same time, there is no conflict.
[Figure: rooms Rm1 and Rm2 plotted against time, showing reservations with no conflict]

Application 1: Meeting room scheduler
[Figure: rooms Rm1 and Rm2 plotted against time, one schedule with no conflict and one with a conflict]

Application 1: Meeting room scheduler
- We can lock the entire database.
  - Not needed when there is no conflict.
  - In the case of conflicts, there is an application-specific way to deal with the conflict: we move one reservation to the other room.
  - If the other room is reserved, we ask the user; they can easily move the reservation to another acceptable time.

Application 2: Shared mailbox
- Shared mailbox folders, shared between me and my two TAs.
- We all replicate the mailbox.
  - OP1: I see a mail from the class, respond to it, and delete it.
  - OP2: The TA sees the same mail and files it in CS402.
  - OP3: I see an email from a friend and file it as important.
[Figure: mailbox with folders CS402, Important, Recruiting, Chocolate]

Application 2: Shared Mailbox
- All of us operate on the same mailbox.
- You can lock the entire mailbox before someone operates on it.
  - Can’t work when disconnected.
  - Clearly not necessary when only one operation is being performed.
  - For operations OP1 and OP2, it is not clear who should win: should the mail be deleted or should it be filed in CS402?

Two Approaches to Building Replicated Services
- Transparent replication system:
  - Allow systems that were developed assuming a central file system or database to run unchanged on top of a strongly consistent replicated storage system (as seen in OceanStore).
- Non-transparent replication system:
  - Relaxed consistency model: access-update-anywhere.
  - Applications are involved in conflict detection and resolution, so applications need to be modified (e.g., Bayou, the Coda file system).

Hypothesis
- Applications know best how to resolve conflicts.
- The challenge is providing the right interface to support cooperation between applications and their data managers.
  - Programmers do not want to deal with propagating updates or ensuring eventual consistency.
    - Anyone who has synchronized project files between school, work, and home can feel the pain.
  - Programmers want to set replication schedules and control how conflicts are detected and resolved.
    - Record-level conflict detection rather than file-level.

Bayou
- Update-anywhere replication model
  - Bayou manages databases that can be fully replicated at any number of sites.
  - Applications can read and write to any single replica of the database (lazy group update).
  - Once a replica accepts a write operation, the write is performed locally and propagated to all other replicas via a pairwise reconciliation protocol.

Conflict Detection: Dependency Checks
- Each Write operation includes a dependency check consisting of an application-supplied query and its expected result.
- If the check fails, the requested update is not performed and the server invokes a procedure to resolve the detected conflict.

Example of Bayou Write
- A Bayou Write is a 3-tuple: Update, Dependency check, Mergeproc.
- Sometimes users like conflicts.
  - A different merge procedure altogether could search for the next available time slot to schedule the meeting, which is an option a user might choose if any time would be satisfactory.

Conflict Resolution: Merge Procedure
- In practice, Bayou merge procedures are written by application programmers in the form of templates that are instantiated with the appropriate details filled in for each Write.
- In the case where automatic resolution is not possible, the merge procedure will still run to completion, but is expected to produce a revised update that logs the detected conflict in some fashion that will enable a person to resolve the conflict later.
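
The update, dependency check, and merge procedure can be illustrated with the meeting room scheduler. The following is a minimal sketch, not Bayou's actual interface: the names `Database`, `BayouWrite`, `apply_write`, and `reserve` are hypothetical, and the merge policy (try the other room, otherwise log the conflict for a person) is an assumption chosen to match the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Database:
    # Hypothetical schema: (room, hour) -> owner, plus a log of unresolved conflicts.
    reservations: Dict[Tuple[str, int], str] = field(default_factory=dict)
    conflicts: List[str] = field(default_factory=list)


@dataclass
class BayouWrite:
    update: Callable[[Database], None]     # the intended change
    dep_check: Callable[[Database], bool]  # True when the expected result holds
    mergeproc: Callable[[Database], None]  # runs only when dep_check fails


def apply_write(db: Database, w: BayouWrite) -> None:
    """Perform the update if the dependency check passes, else run the mergeproc."""
    if w.dep_check(db):
        w.update(db)
    else:
        w.mergeproc(db)


def reserve(room: str, other_room: str, hour: int, owner: str) -> BayouWrite:
    """Template that is instantiated with the details of one reservation."""
    def update(db: Database) -> None:
        db.reservations[(room, hour)] = owner

    def dep_check(db: Database) -> bool:
        # Query: is the slot taken? Expected result: no.
        return (room, hour) not in db.reservations

    def mergeproc(db: Database) -> None:
        if (other_room, hour) not in db.reservations:
            # Application-specific resolution: move to the other room.
            db.reservations[(other_room, hour)] = owner
        else:
            # No automatic resolution: log the conflict for a person to fix.
            db.conflicts.append(f"{owner} wanted {room} at {hour}")

    return BayouWrite(update, dep_check, mergeproc)
```

Because the check and the merge procedure travel with the write, every replica that applies the same writes in the same order reaches the same decision without consulting the user again.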

Replica Management
- Replicas held by two servers at any time may vary in their contents because they have received and processed different Writes. However, this fundamental property is satisfied:
  - The Bayou system guarantees that all servers eventually receive all Writes via the pairwise anti-entropy process and that two servers holding the same set of Writes will have the same data contents.
  - It cannot enforce strict bounds on Write propagation delays since these depend on network connectivity factors that are outside of Bayou’s control.

Replica Consistency
- Bayou has two features that allow servers to achieve eventual consistency:
  - Writes are performed in the same, well-defined order at all servers (global ordering).
  - Conflict detection and merge procedures are deterministic.

Replica Consistency
- When a Write is accepted by a Bayou server from a client, it is deemed tentative.
- Tentative Writes are ordered according to timestamps assigned to them by their accepting servers.
- Eventually, each Write is committed by the anti-entropy process that will be described shortly.
- Timestamps for tentative Writes must monotonically increase at each server.
- Servers do not have to have synchronized clocks.
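
One common way to obtain timestamps that increase monotonically at each server without synchronized clocks is a Lamport-style logical clock. The sketch below assumes that approach; the slides do not say how Bayou generates its timestamps, and the `LogicalClock` class with its `stamp` and `observe` methods is hypothetical.

```python
from typing import Tuple


class LogicalClock:
    """Per-server counter; ties between servers are broken by server id."""

    def __init__(self, server_id: str) -> None:
        self.server_id = server_id
        self.time = 0

    def stamp(self) -> Tuple[int, str]:
        """Return a fresh timestamp for a newly accepted tentative Write."""
        self.time += 1
        return (self.time, self.server_id)

    def observe(self, remote_time: int) -> None:
        """Advance past a timestamp seen on a Write received from another server."""
        self.time = max(self.time, remote_time)
```

Comparing the `(time, server_id)` pairs gives a total order on tentative Writes that every server computes identically, which is all the ordering rule above requires.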

Replica Consistency
- Consistency is potentially an issue since servers may receive Writes from clients and from other servers in an order that differs from the required execution order, and because servers immediately apply all known Writes to their replicas.
- This implies that there must be support for undoing Writes (use of write logs) and reapplying them.
- Each server maintains a log of all Write operations that it has received, sorted by their committed or tentative timestamps, with committed Writes at the head of the log.
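
The effect of the sorted write log can be shown with a small sketch. It assumes deterministic operations and, for brevity, replays the whole log from an empty state whenever a Write arrives; that replay is a stand-in for the undo and reapply mechanism, not how a real server would do it. The `Write` and `Replica` classes and their fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]


@dataclass
class Write:
    accept_stamp: int
    server_id: str
    op: Callable[[State], None]  # deterministic update applied to the state
    csn: Optional[int] = None    # commit sequence number; None while tentative

    def sort_key(self) -> Tuple[int, int, str]:
        # Committed Writes come first, ordered by CSN; tentative Writes follow,
        # ordered by accept stamp with server id as the tie-breaker.
        if self.csn is not None:
            return (0, self.csn, "")
        return (1, self.accept_stamp, self.server_id)


class Replica:
    def __init__(self) -> None:
        self.log: List[Write] = []
        self.state: State = {}

    def receive(self, w: Write) -> None:
        """Insert a Write at its place in the log, then rebuild the state."""
        self.log.append(w)
        self.log.sort(key=Write.sort_key)
        self._replay()

    def _replay(self) -> None:
        # Simplification: undo everything and reapply the log in order.
        self.state = {}
        for w in self.log:
            w.op(self.state)
```

Two replicas that receive the same Writes in different arrival orders end up with identical logs and therefore identical states, which is the property the global ordering and determinism requirements are meant to provide.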

Anti-Entropy
- Entropy: a process of degradation or running down, or a trend to disorder.
- Anti-entropy brings two replicas up to date.
- Three major design decisions:
  - Pairwise communication between replicas
  - Exchange of update operations
  - Ordered propagation of operations

[Figure: pairwise reconciliation between replicas leads to eventual consistency; the global commit order is assigned by the primary server]

Example
- Suppose a user keeps the primary copy of his calendar with him on his laptop and allows others, such as a spouse or secretary, to keep secondary (mostly read) copies.
- The user updates his own calendar; these updates are committed immediately.
- Updates by the spouse/secretary are tentative until anti-entropy takes place with the user. At this point, the user can commit them and propagate the commit order to the spouse/secretary during anti-entropy.

Basic Anti-Entropy
- Protocol:
  - Runs between pairs of servers.
  - The propagation of writes is constrained by the accept order.
- Prefix property: a server R that holds a stamped write W_i that was initially accepted by another server X will also hold all writes accepted by X prior to W_i.

Basic Anti-Entropy
- Protocol
  - R.V denotes R’s version vector; it is used to determine which writes are unknown to the receiving server R.

    anti-entropy(S,R) {
        Get R.V from receiving server R
        # now send all the writes unknown to R
        w = first write in S.write-log
        while (w) do
            if R.V(w.server-id) < w.accept-stamp then
                # w is new for R
                SendWrite(R,w)
            w = next write in S.write-log
        end
    }
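
The pseudocode above can be sketched in runnable form as follows. This is an in-memory illustration under simplifying assumptions: both servers live in one process, `SendWrite` is modeled as a direct call to `accept`, and the `Write`, `Server`, and `anti_entropy` names and fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Write:
    server_id: str     # server that originally accepted the write
    accept_stamp: int  # timestamp assigned by that server
    payload: str


@dataclass
class Server:
    write_log: List[Write] = field(default_factory=list)
    version: Dict[str, int] = field(default_factory=dict)  # the version vector R.V

    def accept(self, w: Write) -> None:
        """Record a write and advance the version vector entry for its origin."""
        self.write_log.append(w)
        seen = self.version.get(w.server_id, 0)
        self.version[w.server_id] = max(seen, w.accept_stamp)


def anti_entropy(sender: Server, receiver: Server) -> int:
    """Send every write in the sender's log that the receiver has not seen."""
    rv = dict(receiver.version)  # "Get R.V from receiving server R"
    sent = 0
    for w in sender.write_log:
        if rv.get(w.server_id, 0) < w.accept_stamp:
            receiver.accept(w)   # stands in for SendWrite(R, w)
            sent += 1
    return sent
```

Walking the sender's log in order is what preserves the prefix property: a write from server X is never delivered before the earlier writes from X that the receiver is still missing.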

Basic Anti-Entropy
- Anti-entropy is incremental.
- When a new write arrives at the receiver, it can be immediately included in the receiver's write log because the sending replica ensures that the receiving server will hold all writes necessary to satisfy the prefix property.
- Reconciliation between two replicas can make progress independently of where the protocol may get interrupted due to network failures or voluntary disconnections.
- The protocol does not address the issue of the growing size of write logs.

Effective Write-Log Management
- Storage is a concern.
- We want to be able to prune a prefix of the write log.
- A protocol is needed to stabilize writes (we look at a primary-commit protocol).
- The primary replica commits each write and assigns it a monotonically increasing commit sequence number (CSN).
- Committed writes are totally ordered.
- Propagation:
  - First send the committed writes.
  - Second send the tentative writes.

Anti-Entropy with Support for Committed Writes

    anti-entropy(S,R) {
        Get R.V and R.CSN from receiving server R
        # first send all the committed writes that R does not know about
        if R.CSN < S.CSN then
            w = first committed write that R does not know about
            while (w) do
                if w.accept-stamp <= R.V(w.server-id) then
                    # R has the write, but does not know it is committed
                    SendCommitNotification(R, w.accept-stamp, w.server-id, w.CSN)
                else
                    SendWrite(R,w)
                end
                w = next committed write in S.write-log
            end
        end
        # now send all the tentative writes
        w = first tentative write in S.write-log
        while (w) do
            if R.V(w.server-id) < w.accept-stamp then
                SendWrite(R,w)
            w = next write in S.write-log
        end
    }
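
A runnable sketch of the committed-write variant follows. It is an in-memory illustration, not Bayou's implementation: messages are modeled as direct method calls, the `Write` and `Server` classes and their methods are hypothetical, and the function returns the kinds of messages it would have sent so the behavior is easy to inspect.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Write:
    server_id: str
    accept_stamp: int
    csn: Optional[int] = None  # None while the write is tentative


@dataclass
class Server:
    log: List[Write] = field(default_factory=list)
    version: Dict[str, int] = field(default_factory=dict)  # version vector V
    csn: int = 0                                           # highest CSN known

    def _sort(self) -> None:
        # Committed writes first (by CSN), then tentative writes (by accept stamp).
        self.log.sort(key=lambda w: (0, w.csn, "") if w.csn is not None
                      else (1, w.accept_stamp, w.server_id))

    def receive_write(self, w: Write) -> None:
        self.log.append(Write(w.server_id, w.accept_stamp, w.csn))  # private copy
        self.version[w.server_id] = max(self.version.get(w.server_id, 0),
                                        w.accept_stamp)
        if w.csn is not None:
            self.csn = max(self.csn, w.csn)
        self._sort()

    def receive_commit(self, server_id: str, accept_stamp: int, csn: int) -> None:
        """Mark an already-held tentative write as committed."""
        for w in self.log:
            if (w.server_id, w.accept_stamp) == (server_id, accept_stamp):
                w.csn = csn
        self.csn = max(self.csn, csn)
        self._sort()


def anti_entropy(s: Server, r: Server) -> List[str]:
    rv, rcsn = dict(r.version), r.csn  # snapshot of R.V and R.CSN
    sent: List[str] = []
    committed = sorted((w for w in s.log if w.csn is not None and w.csn > rcsn),
                       key=lambda w: w.csn)
    for w in committed:
        if w.accept_stamp <= rv.get(w.server_id, 0):
            # R holds the write but does not know that it is committed.
            r.receive_commit(w.server_id, w.accept_stamp, w.csn)
            sent.append("commit")
        else:
            r.receive_write(w)
            sent.append("write")
    for w in [w for w in s.log if w.csn is None]:
        if rv.get(w.server_id, 0) < w.accept_stamp:
            r.receive_write(w)
            sent.append("write")
    return sent
```

A commit notification carries only the write's identity and its CSN, so a receiver that already holds the tentative write does not need the full write sent again.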

Effective Write-Log Management
- It is necessary to allow replicas to truncate any prefix of the committed (stable) part of the write log when there is a need.
- Implication: a write log may not hold enough writes to allow incremental reconciliation with another replica.
- A commit sequence number is maintained for the omitted part of the log.
- A vector characterizing the omitted prefix of the server’s write log is also maintained.
- If the commit sequence number of the receiver is less than the omitted sequence number of the sender, then a (perhaps full) database transfer occurs.
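
The truncation rule can be sketched as below. This is a simplified illustration that tracks only committed writes and the omitted sequence number; the omitted version vector mentioned above is left out, and the `TruncatableLog` class with its `commit`, `truncate`, and `plan_sync` methods is hypothetical.

```python
from typing import List, Tuple


class TruncatableLog:
    """Committed part of a write log whose stable prefix can be discarded."""

    def __init__(self) -> None:
        self.committed: List[Tuple[int, str]] = []  # (CSN, payload), in CSN order
        self.last_csn = 0     # highest CSN assigned so far
        self.omitted_csn = 0  # highest CSN that has been truncated away

    def commit(self, payload: str) -> int:
        self.last_csn += 1
        self.committed.append((self.last_csn, payload))
        return self.last_csn

    def truncate(self, up_to_csn: int) -> None:
        """Discard committed writes with CSN <= up_to_csn."""
        up_to_csn = min(up_to_csn, self.last_csn)
        self.committed = [(c, p) for c, p in self.committed if c > up_to_csn]
        self.omitted_csn = max(self.omitted_csn, up_to_csn)

    def plan_sync(self, receiver_csn: int) -> str:
        """Decide how to bring a receiver with the given CSN up to date."""
        if receiver_csn < self.omitted_csn:
            # The writes the receiver needs are no longer in the log.
            return "full-transfer"
        return "incremental"
```

The design choice is a storage-versus-bandwidth trade: truncating aggressively saves space on the sender but makes it more likely that a lagging receiver needs a database transfer instead of an incremental exchange.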

Access Control
- Certificates: grant, delegate, and revoke.
- No assumptions about trust.
- Mutual authentication and access control are based on public-key cryptography.
- Every user possesses a public/private key pair and a set of digitally signed access control certificates granting the user access to various data collections.

Performance
- Size is acceptable.
- Write performance is acceptable.

Future
- Partial databases
  - Carry part of the database instead of the entire database (mobile clients do not have enough storage space).
    - The problem is that, if a client does not have a particular record, is it because it didn’t replicate that part or because it didn’t know about it?

Technology Impact
- TrueSync: end-to-end synchronization software and infrastructure solutions for the wireless Internet.
- SyncML: the common language for synchronizing all devices and applications over any network.
  - Ericsson, IBM, Lotus, Motorola, Nokia, Palm Inc., Psion, Starfish Software, etc. (614 companies)

Conclusions
- Differences from other replicated systems:
  - Non-transparency
  - Application-specific conflict detection
  - Per-write conflict resolvers
  - Partial and multi-object updates
  - Tentative and stable resolutions
  - Security
- Future goals:
  - Partial replication, policies for choosing servers for anti-entropy, building servers with conventional database managers, alternate data models, and finer-grain access control.