Practical Replication

Purposes of Replication
- Improve availability: replicated databases can be accessed even if several replicas are unavailable.
- Improve performance: replicas can be geographically diverse, with the closest replica serving each client.

Problems with Replication
- Consistency of the replicated data: many applications require consistent results regardless of which replica is read from or written to.
- Consistency is expensive:
  - Some replication schemes reduce update availability.
  - Others require reconciliation after inconsistencies occur.
  - Performance may suffer because agreement across replicas may be necessary.

The Costs and Limits of Availability for Replicated Services
- Consistency vs. availability:
  - Many applications don't need strong consistency; they can specify a maximum deviation from it.
  - Consistency need not be sacrificed during normal operation; the tradeoff is only made when a failure occurs.
- Typically there are two choices of consistency:
  - Strong consistency: low availability, high data accuracy.
  - Weak consistency: high availability, low accuracy (many conflicts and stale accesses).
- Continuous consistency model:
  - A spectrum of consistency levels between strong and weak.
  - Consistency bounds can be adapted dynamically in response to environmental changes.
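To make the spectrum concrete, here is a minimal sketch of how an application might declare per-replica consistency bounds in the spirit of the continuous consistency model; the class and field names are illustrative and not taken from any particular system.

```python
from dataclasses import dataclass

@dataclass
class ConsistencyBounds:
    """Per-replica bounds in the continuous consistency model (illustrative names)."""
    max_numerical_error: int      # unseen writes tolerated
    max_staleness_seconds: float  # age of the oldest unseen write tolerated
    max_order_error: int          # writes with undecided commit order tolerated

# Strong consistency is the degenerate case where every bound is zero;
# weak consistency corresponds to unbounded values.
STRONG = ConsistencyBounds(0, 0.0, 0)
WEAK   = ConsistencyBounds(10**9, float("inf"), 10**9)

# An application can sit anywhere in between, e.g. tolerate up to 5 unseen
# writes, 30 seconds of staleness, and 3 tentatively ordered writes.
relaxed = ConsistencyBounds(max_numerical_error=5,
                            max_staleness_seconds=30.0,
                            max_order_error=3)
```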

Continuous Consistency Model

Metrics of Consistency
Three categories of consistency error at a replica:
- Numerical error: the total number of writes accepted by the system but not yet seen by the replica.
- Staleness: the difference between the current time and the acceptance time of the oldest write not yet seen locally.
- Order error: the number of writes that have not yet established their commit order at the local replica.
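A minimal sketch, under assumed data structures, of how a replica could compute these three metrics from the writes it has seen; nothing here is from the paper's implementation.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Write:
    wid: int            # globally unique write id
    accepted_at: float  # wall-clock time the system accepted the write

@dataclass
class ReplicaView:
    seen: set = field(default_factory=set)       # ids of writes applied locally
    committed: set = field(default_factory=set)  # ids whose commit order is fixed locally

def numerical_error(system_writes, view):
    """Writes accepted somewhere in the system but not seen by this replica."""
    return sum(1 for w in system_writes if w.wid not in view.seen)

def staleness(system_writes, view, now=None):
    """Age of the oldest write this replica has not seen (0 if fully caught up)."""
    now = time.time() if now is None else now
    unseen = [w.accepted_at for w in system_writes if w.wid not in view.seen]
    return (now - min(unseen)) if unseen else 0.0

def order_error(view):
    """Writes seen locally whose final commit position is still undecided."""
    return len(view.seen - view.committed)
```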

Example of Numerical and Order Error

Deriving a Tight Upper Bound on Availability
- Goal: derive a tight upper bound on service availability for a given level of consistency, workload, and faultload:
  Avail_service ≤ F(consistency, workload, faultload)
- The upper bound helps evaluate existing consistency protocols:
  - It reveals the inherent impact of consistency on availability.
  - It can be used to optimize existing consistency protocols.
- Questions the derivation must answer:
  - Which writes to accept or reject? Accepting every write that does not violate consistency may preclude accepting a larger number of writes in the future.
  - When and where to propagate writes? Write propagation decreases numerical error but can increase order error.
  - What serialization order to choose? The choice affects the order error.

Upper Bound as a Function of Numerical Error and Staleness
- Write propagation: when and where to propagate writes?
  - Simply propagate writes to all replicas whenever possible (aggressive write propagation).
  - This always helps reduce both numerical error and staleness.
- Write acceptance:
  - An exhaustive search over all possible sets of accepted writes is needed to maximize availability while ensuring the numerical-error and staleness bounds are not violated.
  - The search space can be reduced by collapsing all writes in an interval into a single logical write, which is possible because of aggressive write propagation.

Upper Bound as a Function of Order Error
- To commit a write, a replica must see all preceding writes in the global serialization order, so the global serialization order must be determined.
- There is a factorial number of possible serialization orders, but the search space can be reduced:
  - Causal order: consider only serialization orders compatible with causal order.
  - Cluster order: writes accepted by the same partition during a particular interval are kept together.

Serialization Order
Example: suppose Replica 1 receives writes W1 and W2 and Replica 2 receives W3 and W4.
- Causal: S = W1 W2 W3 W4 is at least as good as S' = W2 W1 W3 W4. Whenever W2 can be committed under S', the replica must have already seen W1 and thus can also commit W2 under S. The same argument applies to W1, W3, and W4.
- Cluster: only two clustered orders are possible, S = W1 W2 W3 W4 and W3 W4 W1 W2. The intuition is that splitting the writes accepted by the same partition during a particular interval into multiple sections of the serialization order never expedites write commitment at any replica.
- Cluster order has the smallest search space.
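The reduction in search space can be checked mechanically. The short sketch below enumerates the serialization orders for the four writes in the example: 24 arbitrary orders shrink to 6 causally compatible ones, and to only 2 clustered ones. The contiguity test is written for this specific four-write example.

```python
from itertools import permutations

writes = ["W1", "W2", "W3", "W4"]

# Causal constraints from the example: Replica 1 accepted W1 before W2,
# Replica 2 accepted W3 before W4.
def causal_ok(order):
    return order.index("W1") < order.index("W2") and order.index("W3") < order.index("W4")

# Cluster constraint: writes accepted by the same partition in one interval
# must stay contiguous (and causally ordered) in the serialization order.
def clustered(order):
    s = "".join(w[1] for w in order)  # e.g. "1234"
    return ("12" in s or "21" in s) and ("34" in s or "43" in s) and causal_ok(order)

all_orders     = list(permutations(writes))
causal_orders  = [o for o in all_orders if causal_ok(o)]
cluster_orders = [o for o in all_orders if clustered(o)]
print(len(all_orders), len(causal_orders), len(cluster_orders))  # 24 6 2
```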

What Can We Get from This?
- Modify an existing protocol with ideas from the proof:
  - Each replica ensures that the error bounds of the other replicas are not violated.
  - A replica may push writes to other replicas before accepting a new write.
  - This adds aggressive write propagation.
- Analyze other protocols for order error:
  - Primary-copy protocol: a write is committed when it reaches the primary replica; the serialization order is the write order as seen by the primary.
  - Golding's algorithm: each write is assigned a logical timestamp that determines the serialization order; each replica maintains a version vector to decide whether it has seen all writes with timestamp less than t, and pulls writes from other replicas to advance its version vector.
  - Voting: the order is determined by voting among the members.
- Is there anything other than aggressive write propagation that we can get from this proof?
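A rough sketch of the version-vector check used in Golding-style algorithms, with invented data structures; it assumes each replica's writes carry increasing logical timestamps and are delivered in order, and uses a deliberately conservative test.

```python
def seen_all_before(version_vector, t):
    """version_vector maps replica_id -> highest logical time received from it.

    If every replica has already sent us something at time >= t, there can be
    no missing write with timestamp < t (assuming in-order delivery per replica)."""
    return all(latest >= t for latest in version_vector.values())

def try_commit(write_time, version_vector):
    if seen_all_before(version_vector, write_time):
        return "commit"
    # Otherwise the replica pulls missing writes from the peers that are still
    # behind, advancing its version vector until the check passes.
    lagging = [r for r, latest in version_vector.items() if latest < write_time]
    return f"pull from {lagging}"

# Example: replicas A and B are caught up to time 7, C is behind at time 3,
# so a write timestamped 5 cannot commit yet.
vv = {"A": 7, "B": 7, "C": 3}
print(try_commit(5, vv))   # -> pull from ['C']
```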

Numerical Error Bound
- As predicted, aggressive write propagation improves availability.
- It also increases the number of messages required:
  - It removes the optimization of combining multiple updates to amortize communication costs.
  - The extra costs include per-packet header overhead, packet boundary effects, and the time to ramp up to the bottleneck bandwidth in TCP.

Order Error Bound
- Aggressive voting also performs well; base (lazy) voting performs poorly.
  - With lazy replication, each replica may cast its vote for a different uncommitted write, so every replica must collect votes from all replicas to determine the winner, and any unknown vote could be the deciding one.
  - Aggressive propagation ensures that most votes go to the same uncommitted write, so only a subset of nodes needs to be contacted.

Effects of Replication Scale
Adding more replicas:
- Reduces the network failure rate, since a client is more likely to be able to reach some replica.
- Increases the replica rejection rate, since requests must be rejected more often to keep the consistency bounds.
Availability = (1 - Network Failure Rate) * (1 - Rejection Rate)
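A toy calculation of the availability formula with made-up rates, just to show how the two effects can pull availability in opposite directions as replicas are added.

```python
# Toy numbers only: Availability = (1 - net_fail) * (1 - reject).
scenarios = [
    # (replicas, network failure rate, rejection rate) -- hypothetical values
    (1, 0.10, 0.00),
    (3, 0.03, 0.04),
    (9, 0.01, 0.16),
]
for n, net_fail, reject in scenarios:
    avail = (1 - net_fail) * (1 - reject)
    print(f"{n} replicas: availability = {avail:.3f}")
# 1 replica : ~0.90
# 3 replicas: ~0.93  (fewer network failures dominate)
# 9 replicas: ~0.83  (rejections to preserve consistency dominate)
```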

The Dangers of Replication and a Solution
- Replication works well with a few nodes: deadlocks are limited and little reconciliation is needed.
- It does not scale well, nor does it handle mobile nodes that are normally disconnected.
  - The paper predicts cubic growth of deadlock and reconciliation rates. Is this a fundamental limitation?
  - The system eventually reaches "system delusion": the database is inconsistent and there is no obvious way to repair it.
- What about mobile nodes? Does replication currently work well with nodes that can be disconnected?

How Replication Models Affect Deadlock and Reconciliation Rates
Models for propagating updates to replicas:
- Eager replication: updates are applied to all replicas of an object as part of the original transaction.
- Lazy replication: one replica is updated by the original transaction; updates to the other replicas propagate asynchronously as separate transactions.
Models for regulating replica updates:
- Group: any node with a copy of the data can update it.
- Master: each object has a master node, and only the master can update the primary copy.

Eager Replication
- Updates all replicas in the same transaction: no serialization anomalies and no need for reconciliation.
- Not an option for mobile systems; updates may fail even if all nodes are connected all the time.
- With replication, the deadlock rate grows as the cube of the number of nodes:
  - Each node must do its own work and also apply the updates generated by other nodes.
  - The probability of a wait also increases.
- Deadlocks can be avoided by combining eager replication with an object-master approach, at the cost of lower throughput due to synchronous updates.

Lazy Group Replication
- Any node can update its local copy of the data; updates are propagated asynchronously in separate transactions.
- Timestamps are used to detect and reconcile conflicting updates:
  - Each object carries the timestamp of its most recent update.
  - Each replicated update carries the new value and is tagged with the old object timestamp.
  - The receiving replica tests whether its local timestamp equals the update's old timestamp.
    - If so, the update is safe, and the local timestamp advances to the new transaction's timestamp.
    - Otherwise, the update may be dangerous and the transaction requires reconciliation.
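A minimal sketch of the timestamp test just described; the object layout and the reconcile hook are assumptions made for illustration, not the paper's mechanism.

```python
from dataclasses import dataclass

@dataclass
class ReplicatedObject:
    value: object
    timestamp: int   # timestamp of the most recent update applied locally

@dataclass
class Update:
    new_value: object
    old_timestamp: int   # object timestamp observed at the originating replica
    new_timestamp: int   # timestamp of the originating transaction

def apply_lazy_group_update(obj: ReplicatedObject, upd: Update, reconcile):
    """Lazy-group rule: the update is safe only if nothing changed since it was issued."""
    if obj.timestamp == upd.old_timestamp:
        obj.value = upd.new_value          # safe: apply and advance the timestamp
        obj.timestamp = upd.new_timestamp
    else:
        reconcile(obj, upd)                # dangerous: a concurrent update slipped in
```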

Lazy Group Replication (continued)
- Transactions that would wait in an eager replication system face reconciliation in a lazy-group system, and waits are much more frequent than deadlocks.
- Lazy group replication can be used for mobile systems.

Lazy Master Replication
- Updates are propagated asynchronously in separate transactions, but only the object's master can update the object.
- No reconciliation is required, though deadlocks are still possible.
- Not appropriate for mobile applications, since it requires an atomic transaction with the object's owner.

Non-Transactional Schemes
- Let's be less ambitious and reduce the domain: abandon serializability for convergence.
- Add a timestamp to each update (the Lotus Notes approach):
  - If the update has a greater timestamp than the current value, replace the current value.
  - Otherwise, discard the update.
- The system works if updates are commutative:
  - The value is completely replaced, or
  - Constants are added or subtracted, in which case a timestamp may not even be needed.
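A minimal sketch of timestamp-based convergence in this last-writer-wins style (illustrative types, not the actual Lotus Notes implementation), plus a commutative update that converges without any timestamp.

```python
from dataclasses import dataclass

@dataclass
class Versioned:
    value: object
    timestamp: float   # e.g. wall-clock or logical time of the last update

def lww_merge(current: Versioned, incoming: Versioned) -> Versioned:
    """Last-writer-wins: keep whichever version carries the greater timestamp.

    All replicas converge to the same value once they have exchanged updates,
    but intermediate updates can be silently lost."""
    return incoming if incoming.timestamp > current.timestamp else current

# Commutative updates such as "add or subtract a constant" converge without
# timestamps: applying +5 and -2 in either order yields the same balance.
def apply_increment(balance: int, delta: int) -> int:
    return balance + delta
```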

Two-Tier System
Two node types:
- Mobile nodes: often disconnected; may originate tentative transactions.
- Base nodes: always connected.
Two version types:
- Master version: the most recent value received from the object's master.
- Tentative version: the local version, which may be updated by tentative transactions.

Two-Tier System: Transactions
- Base transaction: works on master data and produces new master data; involves at most one mobile node and several base nodes.
- Tentative transaction: works on local tentative data; produces a tentative version and a base transaction to be run later on the base nodes.
  - The base transaction generated by a tentative transaction may fail or produce different results, as judged by a user-specified acceptance criterion, e.g. "the bank balance must not go negative" (sketched below).
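A minimal sketch of a tentative transaction carrying an acceptance criterion, using the bank-balance example from the slide; the structure and names are invented for illustration, not the paper's API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TentativeTransaction:
    """A debit recorded against the mobile node's tentative version, to be
    re-run as a base transaction when the node reconnects."""
    amount: int
    # Acceptance criterion, evaluated against the *master* data at the base node.
    acceptance: Callable[[int], bool]

def run_as_base_transaction(master_balance: int, tx: TentativeTransaction):
    new_balance = master_balance - tx.amount
    if tx.acceptance(new_balance):
        return True, new_balance          # commit: master data updated
    return False, master_balance          # reject: originating node is notified

# Tentatively spend 80 while disconnected; the criterion "balance must not
# go negative" is checked against the real master balance on reconnect.
tx = TentativeTransaction(amount=80, acceptance=lambda b: b >= 0)
print(run_as_base_transaction(master_balance=50, tx=tx))   # (False, 50)
print(run_as_base_transaction(master_balance=200, tx=tx))  # (True, 120)
```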

Two-Tier System: Failure Handling
- If a tentative transaction fails, the originating node is informed of the failure.
- Similar to lazy-group replication, except:
  - The master database always converges.
  - The originating node only needs to contact a base node to discover whether a tentative transaction is acceptable.

Example: When a Mobile Node Connects
- Discard tentative object versions, since they will soon be refreshed.
- Send updates for the objects that the mobile node masters.
- Send all tentative transactions.
- Accept replica updates from the base node.
- Accept notice of the success or failure of each tentative transaction.

Example: On the Base Node (Host)
- Send delayed replica updates to the mobile node.
- Accept delayed updates to mobile-mastered objects.
- Accept the list of tentative transactions along with their acceptance criteria.
- After the base node commits, propagate the update to the other replicas.
- Converge the mobile node's state with the base state.

Two-Tier System: Discussion
- Does the two-tier system solve the scalability problem of replication? Yes, but only if we can restrict the domain.
- Can we do better, or is this a fundamental problem that can't be solved entirely?