VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Data Versioning Lecturer : Dr. Pavle Mogin

Advanced Database Design and Implementation 2015 Data Versioning

Plan for Data Versioning
–Influence of replication on reads and writes
–Models of writing and reading as a function of the client consistency model
–Master Slave model
–Two phase commit
–No Master model
–Quorum based two phase commit
–Version vector
–Gossip state transfer model
–Readings: Have a look at Useful Links on the Home Page

R / W Operations on Replicated Data
Introducing replication into a partitioning scheme provides the following benefits:
–Enhanced reliability, and
–Work load balancing
But replication also has a negative influence on read and write operations:
–To speed up writing, a client application may update only one replica, with the update then propagated asynchronously to all other replicas in the background
–Different replicas of the same data object may be updated concurrently by different clients
–As a consequence, different replicas may hold different (even conflicting) versions of the same data object

A Motivating Example (1)
Assume two client applications simultaneously fetch the same data object from two different replicas:
–[client 1] get(159) => {"name": "pavle", "email": …}
–[client 2] get(159) => {"name": "pavle", "email": …}
Then client 1 modifies the name and does a put:
[client 1] put(159, {"name": "pavle mogin", "email": …})
Client 2 modifies the e-mail and does a put:
[client 2] put(159, {"name": "pavle", "email": …})
Now, after a concurrent update, we have two conflicting versions of the same data object

A Motivating Example (2)
In this example:
–The two concurrent writes made the original data object value irrelevant (since they both happened after the original)
–But there was no time to propagate either update to the other replica before the other update happened
–So we have two conflicting versions of the same data object, and there is no rule telling the CDBS that it can throw away either the change to the e-mail or the change to the name
So, we need either:
–A model of operation that will not allow conflicting writes to happen, or
–A versioning system that allows detecting overwrites (so old versions can be thrown away) and detecting conflicts (so the client can reconcile them)
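The divergence in this example can be sketched in a few lines; this is a minimal illustration, assuming one in-memory copy of object 159 per replica (no real network or storage, and "new-address" is just a placeholder value):

```python
# Two replicas of object 159 start identical.
replica_1 = {159: {"name": "pavle"}}
replica_2 = {159: {"name": "pavle"}}

# Client 1 updates the name against replica 1.
obj1 = dict(replica_1[159])
obj1["name"] = "pavle mogin"
replica_1[159] = obj1

# Client 2 concurrently updates the e-mail against replica 2.
obj2 = dict(replica_2[159])
obj2["email"] = "new-address"
replica_2[159] = obj2

# The replicas now hold conflicting versions: neither update can
# safely overwrite the other without losing a change.
assert replica_1[159] != replica_2[159]
```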

Client Consistency Models
The partitioning and replication of data raise the following two questions:
–How to dispatch a client request to a replica, and
–How to propagate and apply updates to replicas
The answers to these questions depend on the client’s consistency model:
–Strict consistency,
–Strong consistency,
–Read your writes consistency (RYOW),
–Session consistency,
–Monotonic read consistency, and
–Eventual consistency

Master Slave Model of Operation (1)
All replicas of each data partition are stored on a single master and multiple slave nodes
–The master and slave nodes are virtual nodes of different physical nodes
All update requests go to the master
–The master stores the update in its log file on disk
–Updates are asynchronously propagated to slaves
–The master is a single point of failure (SPOF): if it crashes before at least one slave is written, the update will be lost
–The acknowledgement of the write to the client may be delayed until at least one slave has been updated
–If the master crashes, the most up-to-date slave takes over its role

Master Slave Model of Operation (2)
If the client can tolerate some degree of data staleness, read requests can go to any replica
–Good for load balancing
If stale data are unacceptable, reads have to go to the master as well
Since all writes go to the master, there is a potential for achieving isolation and atomicity
–Transactional properties may be guaranteed at least for writes (and updates) involving a single database object
–Multiple objects most probably belong to different shards
Since the master is the single point of truth, versioning is considered unnecessary

MS Model with Consistent Hashing
[Figure: a consistent hashing ring with virtual nodes A, B, C, D, E, F, G, H and replication factor n = 3. One virtual node is the master for the range AB, followed by slaves for the range AB; another virtual node of physical node B has a slave role for the range HA.]

Master Slave Model of Operation (3)
A master may propagate updates to slaves in one of two ways:
–State transfer, and
–Operation transfer
In state transfer, the master sends its latest state to the slaves, which replace their current state with the received one
–It is robust: if a replica misses an update, it will be brought to a proper state by the next update
In operation transfer, the master sends a sequence of update operations to the slaves, which then execute them
–Less traffic due to shorter messages, but the order of operations is crucial
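The two propagation styles can be contrasted in a small sketch; the function names and the key–value states are illustrative, not from any real system:

```python
def state_transfer(master_state, slave_state):
    """Slave discards its own state and adopts the master's latest state."""
    return dict(master_state)

def operation_transfer(slave_state, operations):
    """Slave replays the master's update operations in the given order."""
    state = dict(slave_state)
    for key, value in operations:   # each op: set key -> value
        state[key] = value
    return state

master = {"x": 3, "y": 2}
stale_slave = {"x": 1}

# Both approaches converge to the master's state here, but operation
# transfer sends smaller messages and depends on applying the ops in
# the right order.
assert state_transfer(master, stale_slave) == {"x": 3, "y": 2}
assert operation_transfer(stale_slave, [("x", 3), ("y", 2)]) == {"x": 3, "y": 2}
```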

No Master Model of Operation
The master slave model of operation works very well when:
–There is a high read/write ratio, and
–The updates are evenly distributed over the key range
If either of these conditions is not satisfied, updates to any replica are allowed
This model of operation requires new techniques for replica synchronization:
–Data versioning, and
–Two phase commit
These techniques have variants and can be applied either individually or in combination

Data Versioning
If clients can tolerate a relaxed data consistency model:
–Updates can be propagated asynchronously by gossiping
–Eventual consistency can be achieved by an anti-entropy protocol
The anti-entropy protocol relies on data versioning
Data versioning is achieved by appending to data objects information that allows reasoning about the precedence of updates applied to an object’s versions
There are two characteristic versioning mechanisms: timestamps and the version vector

Timestamps
Timestamps are fields of a time data type appended to each independently updatable data item
–When a data item is updated, its timestamp field is set to the corresponding real (or logical) clock value
–A CDBMS may keep several versions of the same data item, each with its own timestamp value
Timestamps seem an obvious way to establish a chronological update order
–But timestamps rely on synchronized clocks on all nodes
–That may be a problem if replicas reside on nodes scattered around the world
If a CDBMS keeps only the last update of an item, timestamps do not capture causality (the update history)
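The causality loss can be shown with a minimal last-write-wins sketch; this is illustrative only — a shared logical counter stands in for synchronized clocks, and the stores and values are made up:

```python
import itertools

clock = itertools.count(1)   # shared logical clock (an assumption)

def put(store, key, value):
    # Keep only the latest (timestamp, value) pair per key.
    store[key] = (next(clock), value)

def merge(local, remote, key):
    """Last-write-wins: the version with the higher timestamp survives."""
    if remote[key][0] > local[key][0]:
        local[key] = remote[key]

a, b = {}, {}
put(a, 159, {"name": "pavle"})
b[159] = a[159]                                  # b replicates a's version
put(a, 159, {"name": "pavle mogin"})             # update at replica a
put(b, 159, {"name": "pavle", "email": "new"})   # concurrent update at b
merge(a, b, 159)
# b's write happened to get the later stamp, so the name change is
# silently discarded: a single timestamp cannot capture causality.
assert a[159][1] == {"name": "pavle", "email": "new"}
```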

Version Vector (Intro)
The version vector is a mechanism for tracking changes to data in a distributed system where multiple clients may update data objects at different times
The version vector allows determining whether one update to a data object preceded another, followed it, or the two updates happened concurrently
–Concurrent updates may be conflicting
Version vectors enable causality tracking and are a basic mechanism of optimistic replication
–In optimistic replication, replicas are allowed to diverge
The version vector is a special case of the vector clock mechanism
–The two mechanisms have the same structure, but their update rules differ slightly
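A minimal version vector can be sketched as a dict mapping replica id to an update counter; the function names below are illustrative:

```python
def compare(v1, v2):
    """Report the causal relation between two version vectors."""
    keys = set(v1) | set(v2)
    le = all(v1.get(k, 0) <= v2.get(k, 0) for k in keys)
    ge = all(v1.get(k, 0) >= v2.get(k, 0) for k in keys)
    if le and ge:
        return "equal"
    if le:
        return "before"       # v1 causally precedes v2
    if ge:
        return "after"        # v1 causally follows v2
    return "concurrent"       # neither dominates: possible conflict

def bump(vector, replica):
    """A replica increments its own entry on each local update."""
    v = dict(vector)
    v[replica] = v.get(replica, 0) + 1
    return v

v1 = bump({}, "A")            # update at replica A
v2 = bump(v1, "A")            # later update at A: v1 precedes v2
v3 = bump(v1, "B")            # update at B starting from v1
assert compare(v1, v2) == "before"
assert compare(v2, v3) == "concurrent"   # conflicting versions detected
```

A store can use `compare` to discard a version only when it is "before" another, and to hand "concurrent" versions back to the client for reconciliation.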

Traditional Two Phase Commit Protocol
The traditional two phase commit protocol provides strict consistency by bringing all replicas to the same state at every update
Each update is coordinated by a replica node
In the first (preparatory) phase, the coordinator sends the update to all replicas
–Each replica writes the data to a log file and, if successful, responds to the coordinator
After gathering positive responses from all replicas, the coordinator initiates the second (commit) phase
–Each replica writes another log entry and confirms success to the coordinator
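The two-phase message flow can be sketched as a toy, assuming in-process "replicas" with a list standing in for the on-disk log (no real network, logging, or crash recovery):

```python
class Replica:
    def __init__(self):
        self.log, self.value = [], None

    def prepare(self, value):
        # Phase 1: write the update to the log and vote yes.
        self.log.append(("prepare", value))
        return True

    def commit(self):
        # Phase 2: apply the logged update and record the commit.
        self.value = self.log[-1][1]
        self.log.append(("commit", self.value))

def two_phase_commit(replicas, value):
    # Coordinator: gather votes from ALL replicas, then commit.
    if all(r.prepare(value) for r in replicas):
        for r in replicas:
            r.commit()
        return True
    return False   # any "no" vote aborts the update (all or nothing)

replicas = [Replica() for _ in range(3)]
assert two_phase_commit(replicas, "v1")
assert all(r.value == "v1" for r in replicas)   # all replicas agree
```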

Quorum Based Two Phase Commit
The traditional two phase commit protocol is not a popular choice for high throughput distributed database systems since it impairs availability:
–The coordinator has to wait for quite a number of network round trips and disk I/Os to complete
–If a replica node crashes, the update is unsuccessful (all or nothing behaviour)
A more efficient way is to use a quorum based two phase commit protocol with data versioning:
–The coordinator still issues an update request to all n replicas, but waits for positive acknowledgements from only w (< n) replicas
–A read has to gather responses from r (> 1) replicas, and the CDBMS returns the latest object version to the client
–All conflicting objects have to be reconciled
–The condition r + w > n guarantees a (relaxed) “strict consistency”, called strong consistency
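The intersection argument behind r + w > n can be checked by brute force over all possible quorums; this is a verification sketch, not part of any protocol implementation:

```python
from itertools import combinations

def quorums_intersect(n, w, r):
    """True iff every write quorum of size w overlaps every read
    quorum of size r among n replicas."""
    replicas = range(n)
    return all(set(wq) & set(rq)
               for wq in combinations(replicas, w)
               for rq in combinations(replicas, r))

# r + w > n: every read quorum overlaps every write quorum, so a
# read always contacts at least one replica with the latest version.
assert quorums_intersect(n=3, w=2, r=2)
# r + w <= n: some read quorum can miss the write entirely.
assert not quorums_intersect(n=3, w=1, r=2)
```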

Summary (1)
Replication has a negative impact on read and write operations:
–Different replicas may have conflicting versions of the same data object
Read and write modes of replicated data depend on the client’s consistency model, ranging from strict to eventual consistency
The Master Slave mode of operation guarantees a (relaxed) strict consistency
The No Master mode of operation relies on gossiping and uses one of, or a combination of:
–Data versioning, and
–Two phase commit

Summary (2)
Except in the case of two phase commit, using a gossip protocol unavoidably leads to eventual consistency, even if there were no hardware outages or concurrent updates, since propagating an update to all n replica nodes requires time proportional to log2 n
Two characteristic data versioning techniques are: timestamping and the version vector
The quorum based two phase commit also uses data versioning
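The logarithmic spreading can be simulated with a simple push-gossip sketch, assuming each informed node forwards the update to one random peer per round (a fixed seed keeps the run deterministic):

```python
import random

def gossip_rounds(n, seed=0):
    """Rounds of push gossip until all n nodes hold the update."""
    rng = random.Random(seed)
    informed = {0}            # node 0 starts with the update
    rounds = 0
    while len(informed) < n:
        for _ in range(len(informed)):   # each informed node pushes once
            informed.add(rng.randrange(n))
        rounds += 1
    return rounds

# With 1024 nodes, propagation takes on the order of log2(1024) = 10
# rounds (plus a tail for stragglers), not on the order of n rounds.
assert gossip_rounds(1024) < 40
```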