Consistency Guarantees and Snapshot Isolation
Marcos Aguilera, Mahesh Balakrishnan, Rama Kotla, Vijayan Prabhakaran, Doug Terry
MSR Silicon Valley



Goals
Develop a cloud storage system featuring:
1. multiple consistency levels
   - requires one API to learn, one system to administer
   - handles diversity of requirements within and across applications
2. read-write transactions
   - with snapshot isolation
   - on replicated and partitioned data
3. consistency-based SLAs

Geo-Replication
[Figure labels: primary, secondaries, datacenter; remote secondaries, remote datacenter; Read, Write]

Client API
- Get (key)
- Put (key, object)
- BeginTx (consistency)
- EndTx ()
- BeginSession (consistency)
- EndSession ()
[Figure labels: Puts/Gets, Transaction, Session]

Transaction Properties
- Conventional transaction model
  - BeginTx … EndTx
- Atomic updates to multiple objects
- Multi-object reads from snapshots
- Even across partitions

Partitioned Data for Scalability
- Data partitioned by key range
- Each partition has its own primary and secondary servers

Key range   Primary   Secondaries
A-F         S1        S2, S4
G-P         S2        S4, S5
Q-Z         S3        S1, S4, S5
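The key-range table above can be read as a routing table. The following is a minimal sketch, assuming keys are routed by their first letter; the slide gives only the ranges, so that routing rule and the names `PARTITIONS` and `lookup` are illustrative and not part of the system described.

```python
from bisect import bisect_right

# Partition table from the slide: (first letter of range, primary, secondaries).
PARTITIONS = [
    ("A", "S1", ["S2", "S4"]),        # A-F
    ("G", "S2", ["S4", "S5"]),        # G-P
    ("Q", "S3", ["S1", "S4", "S5"]),  # Q-Z
]
RANGE_STARTS = [start for start, _, _ in PARTITIONS]


def lookup(key):
    """Return (primary, secondaries) for the partition that holds `key`.

    Assumption: a key belongs to the range containing its first letter.
    """
    first = key[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"key {key!r} does not start with a letter A-Z")
    # Find the last range whose starting letter is <= the key's first letter.
    index = bisect_right(RANGE_STARTS, first) - 1
    _, primary, secondaries = PARTITIONS[index]
    return primary, secondaries
```

With this table a write to "grape" would go to primary S2, while a read of the same key could be served by S2, S4, or S5, depending on the consistency requested.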

Write Operations
- Writes performed at primary server(s)
  - May have different primaries for different objects
- Propagate to secondary servers eventually
  - Any gossip or anti-entropy protocol will do
- Have a commit timestamp, i.e. global order
  - And deterministic outcomes
- No write conflicts => All replicas converge towards a mutually consistent state

Versioned Data Store
- Store version history for each object
- Can perform writes as soon as commit timestamp is known
  - need not perform writes in commit order
- Can eventually prune old versions
[Figure: time axis with versions V1, V2, V3, V4 of Object A and Object B]

Per-Replica State
- Datastore = set of
- High-time = timestamp of latest received write transaction
  - Assumes transactions are received in order
  - May receive periodic null transactions
- Low-time = timestamp of most recent discarded object version

Read Operations
- Single-key Gets go to one server
- Multi-partition transactions may read from multiple servers
- Server(s) selected based on desired consistency
  - E.g. read from nearby server when possible
- Alternative: Broadcast operation to all servers
  - Take first response that is consistent enough

Read-Only Transactions
- Transaction assigned a read timestamp
- Read from snapshot at that time
  - See all write transactions committed before this time, and only those writes
- Consistency guarantee places constraints on read timestamp

Reads on Versioned Data Store
- Allows reads at any timestamp
  - Without placing constraints on write propagation
- Assuming no future transaction could be assigned a commit timestamp before the read timestamp
[Figure: time axis with versions V1, V2, V3, V4 of Object A and Object B, and a read timestamp]
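A snapshot read on a versioned store can be sketched as follows. This is an illustrative in-memory model, not the system's implementation: the class `VersionedStore` and its methods are hypothetical, timestamps are plain integers, and a version is assumed visible when its commit timestamp is less than or equal to the read timestamp (the slides say "before" without fixing the boundary case).

```python
from bisect import bisect_right


class VersionedStore:
    """Keeps every version of every key, ordered by commit timestamp."""

    def __init__(self):
        self._timestamps = {}  # key -> sorted list of commit timestamps
        self._values = {}      # key -> values, parallel to _timestamps[key]

    def put(self, key, commit_ts, value):
        """Record a version. Writes may arrive out of commit order."""
        ts_list = self._timestamps.setdefault(key, [])
        val_list = self._values.setdefault(key, [])
        pos = bisect_right(ts_list, commit_ts)
        ts_list.insert(pos, commit_ts)
        val_list.insert(pos, value)

    def get(self, key, read_ts):
        """Return the newest value with commit_ts <= read_ts, else None."""
        ts_list = self._timestamps.get(key, [])
        pos = bisect_right(ts_list, read_ts)
        if pos == 0:
            return None  # no version existed yet at read_ts
        return self._values[key][pos - 1]
```

Because each version carries its own commit timestamp, the store can apply writes in any arrival order and still answer a read at timestamp 25 with the version committed at 20, even if the version committed at 30 arrived first.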

Selecting Read Timestamp

Guarantee              Read timestamp
Strong Consistency     now (or time of last committed write)
Eventual Consistency   any time
Consistent Prefix      any time
Bounded Staleness      any time within bound
Monotonic Reads        any time later or equal to that of previous read transaction in this session
Read My Writes         any time later or equal to that of previous write transaction in this session

assuming in-order delivery of writes
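The table can be turned into a function that maps a guarantee to a range of acceptable read timestamps. The sketch below is only one reading of the table: the `Session` class, the guarantee names as strings, integer timestamps, and the use of `now` as the upper limit of every range are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-session state needed by the session guarantees."""
    last_read_ts: int = 0   # read timestamp of the previous read transaction
    last_write_ts: int = 0  # commit timestamp of the previous write transaction


def acceptable_read_range(guarantee, now, session=None, bound=0):
    """Return (min_ts, max_ts), the inclusive range of allowed read timestamps."""
    session = session or Session()
    if guarantee == "strong":
        return (now, now)
    if guarantee in ("eventual", "consistent_prefix"):
        return (0, now)  # any time
    if guarantee == "bounded_staleness":
        return (max(0, now - bound), now)  # any time within the bound
    if guarantee == "monotonic_reads":
        return (session.last_read_ts, now)
    if guarantee == "read_my_writes":
        return (session.last_write_ts, now)
    raise ValueError(f"unknown guarantee: {guarantee}")
```

Weaker guarantees give wider ranges, which leaves more freedom to pick a nearby replica that happens to hold a suitable snapshot.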

Acceptable Read Timestamps
[Figure: time axis from 0 to BeginTx showing the range of acceptable read timestamps for each guarantee: strong, read-my-writes, monotonic, bounded, causal, eventual]

Selecting Read Timestamp
[Figure: time axis with the low-time and high-time of nodes A, B, and C, and a read timestamp]
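One way to combine a range of acceptable read timestamps with per-replica low-time and high-time is sketched below. The slides do not spell out the selection rule, so the following are assumptions: a replica can serve a read timestamp t when low_time < t <= high_time, timestamps are integers, replicas are passed in order of preference (for example nearest first), and the names `Replica` and `choose_replica` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    low_time: int   # timestamp of the most recent discarded object version
    high_time: int  # timestamp of the latest received write transaction


def choose_replica(replicas, min_ts, max_ts):
    """Pick the first replica able to serve a timestamp in [min_ts, max_ts].

    Returns (replica, read_ts), or None when no replica qualifies.
    """
    for replica in replicas:
        earliest = max(min_ts, replica.low_time + 1)  # older versions were pruned
        latest = min(max_ts, replica.high_time)       # newer writes not yet received
        if earliest <= latest:
            # Prefer the freshest servable snapshot on this replica.
            return replica, latest
    return None
```

Choosing the latest servable timestamp is a design choice of this sketch: for read-write transactions, a fresher read timestamp leaves a shorter window in which a conflicting version can appear before commit.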

Read-Write Transactions
- Transaction assigned a read timestamp and a commit timestamp
- Use optimistic concurrency control
  - Old read timestamps increase the chance of abort
- Read from snapshot at read timestamp
  - With selected consistency guarantee
- Batch writes until commit
  - No undo needed
- Validate transaction at commit timestamp

Transaction Lifetime
[Figure: time axis for a Transaction inside a Session, running Get(x) … Put(x, value)]
- Select read timestamp and perform Get
- Buffer Put
- Get commit timestamp, validate, and perform Puts

Committing Write Transactions
- Snapshot isolation => Check that no object being written has a version between the transaction’s read timestamp and commit timestamp
- Serializability => Check that no object being read or written has a version between the transaction’s read timestamp and commit timestamp
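The two commit checks differ only in which keys they examine. A minimal sketch follows, assuming the committed versions of each key are available as a list of integer commit timestamps and that "between" means strictly after the read timestamp and strictly before the commit timestamp; the function name `validate` and its parameters are illustrative.

```python
def validate(versions, read_ts, commit_ts, read_set, write_set, isolation):
    """Return True if the transaction may commit, False if it must abort.

    versions: dict mapping key -> iterable of committed version timestamps.
    isolation: "snapshot" checks written keys only;
               "serializable" checks keys read or written.
    """
    if isolation == "snapshot":
        checked = set(write_set)
    elif isolation == "serializable":
        checked = set(read_set) | set(write_set)
    else:
        raise ValueError(f"unknown isolation level: {isolation}")

    for key in checked:
        for version_ts in versions.get(key, ()):
            if read_ts < version_ts < commit_ts:
                return False  # a conflicting version appeared after the snapshot
    return True
```

For example, a transaction that read x and wrote y with read timestamp 10 and commit timestamp 20 passes the snapshot-isolation check if only x gained a version at 15, but fails the serializability check, because x is in its read set.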