Transactional storage for geo-replicated systems Yair Sovran, Russell Power, Marcos K. Aguilera, Jinyang Li NYU and MSR SVC.


Life in a web startup

Web apps need geo-replicated storage. Our focus: geo-replicated transactional storage.

Consistency vs. performance: existing tradeoffs
At one extreme, eventual consistency: less coordination, more anomalies.
At the other, serializability: more coordination, fewer anomalies.
In between, snapshot isolation.
Goal: maximize multi-site performance while having few anomalies.

Our contribution
1. New semantics: Parallel Snapshot Isolation (PSI)
2. Walter: implementing PSI efficiently
   – Preferred site
   – Counting set
3. Application experience

Snapshot isolation
Timeline of storage state: T1 does Read-X, Write-X, Commit; T2 does Read-Y, Write-Y, Commit.
Snapshot isolation guarantees:
1. Read snapshots from a global timeline
2. Prohibit write-write conflicts
3. Preserve causality
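A minimal sketch (not Walter's implementation) of how a snapshot-isolated store can enforce guarantees 1 and 2: each transaction reads from the snapshot taken at its start, and commit aborts if another transaction wrote one of its keys after that snapshot.

```python
# Toy multiversion store illustrating snapshot-isolation commit checks.
class SIStore:
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), ts ascending
        self.clock = 0       # global commit timestamp

    def begin(self):
        # A transaction reads from the snapshot taken at start.
        return {"snap": self.clock, "writes": {}}

    def read(self, tx, key):
        # Latest version visible at the transaction's snapshot.
        for ts, val in reversed(self.versions.get(key, [])):
            if ts <= tx["snap"]:
                return val
        return None

    def write(self, tx, key, val):
        tx["writes"][key] = val  # buffered until commit

    def commit(self, tx):
        # Write-write conflict: someone committed to our keys after our snapshot.
        for key in tx["writes"]:
            for ts, _ in self.versions.get(key, []):
                if ts > tx["snap"]:
                    return False  # abort
        self.clock += 1
        for key, val in tx["writes"].items():
            self.versions.setdefault(key, []).append((self.clock, val))
        return True
```

For example, two concurrent transactions that both write `x` from the same snapshot: the first commit succeeds, the second aborts.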

PSI avoids global transaction ordering
Each site has its own timeline: T1 does Read-X, Write-X, Commit at site 1; T2 does Read-Y, Write-Y, Commit at site 2.
A transaction commits locally first, then propagates to remote sites.
PSI guarantees:
1. Read snapshots from a per-site timeline (parallel, not global)
2. Prohibit write-write conflicts
3. Preserve causality
Walter achieves this efficiently.
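A toy illustration of the per-site timelines (an assumed model, not Walter's actual protocol): each transaction becomes visible in its local site's timeline immediately and reaches the remote site later, so the two sites may observe committed transactions in different orders.

```python
# Each site keeps its own commit timeline; remote commits arrive asynchronously.
class Site:
    def __init__(self, name):
        self.name, self.log = name, []

    def commit_local(self, tx):
        self.log.append(tx)   # visible locally right away

    def receive(self, tx):
        self.log.append(tx)   # remote transaction arrives after propagation

site1, site2 = Site("site1"), Site("site2")
site1.commit_local("T1")      # T1 commits at site1
site2.commit_local("T2")      # T2 commits at site2 concurrently
site1.receive("T2")           # asynchronous propagation
site2.receive("T1")
# Each site saw its own transaction first: there is no single global order.
```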

PSI has few anomalies

Anomaly             | Serializability | Snapshot Isolation | PSI | Eventual
dirty read          | No              | No                 | No  | Yes
non-repeatable read | No              | No                 | No  | Yes
lost update         | No              | No                 | No  | Yes
short fork          | No              | Yes                | Yes | Yes
long fork           | No              | No                 | Yes | Yes
conflicting fork    | No              | No                 | No  | Yes

PSI's anomaly
Short fork (allowed by snapshot isolation): T1 and T2 commit concurrently from the same snapshot.
Long fork (disallowed by snapshot isolation, allowed by PSI): T1 and T2 commit at different sites, then propagate to both sites.

Walter overview
Clients at each site issue Start_TX, Read, Write, and Commit_TX; sites replicate data and coordinate for PSI.
Main challenge: avoid write-write conflicts across sites.
Walter's solution:
1. Preferred site
2. Counting set

Technique #1: preferred site
Associate each user's data with a preferred site.
Common case: write at the preferred site → fast (local) commit.
Rare case: write at a non-preferred site → cross-site two-phase (slow) commit.
Example: Bob's photos are preferred at site 1, Alice's at site 2. Bob writing his photos at site 1 gets a fast commit; Alice writing Bob's photos from site 2 goes through slow commit.
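A hypothetical sketch of the commit-path decision (the key names and mapping are illustrative, not Walter's API): a transaction whose writes all touch objects preferred at the local site can commit locally; anything else falls back to cross-site two-phase commit.

```python
# Assumed object -> preferred-site mapping, e.g. per-user data.
PREFERRED = {"bob:photos": "site1", "alice:photos": "site2"}

def commit_path(local_site, written_keys):
    # Fast path: every written object is preferred at the committing site.
    if all(PREFERRED[k] == local_site for k in written_keys):
        return "fast"   # local commit, replicate asynchronously
    return "slow"       # cross-site two-phase commit
```

Note that a single write to a remotely preferred object forces the whole transaction onto the slow path.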

Technique #2: counting set
Problem: some objects are modified from many sites, e.g. Alice at site 1 and Bob at site 2 both befriend Eve, writing Eve's friendlist.
Counting set: a data type free of write-write conflicts.

Technique #2: counting set
Site 1 applies add(Alice) and site 2 applies add(Bob) to Eve's friendlist; the sites then exchange these operations.
Add/del operations commute → no need to check for write-write conflicts.
Caveat: application developers must deal with counts.
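A sketch of a counting set (cset) under the semantics described above: each element maps to a count, add and remove increment and decrement it, and merging another site's operations is just adding counts, so updates from different sites commute and never conflict. Membership is "count > 0", which is the caveat applications must handle.

```python
# Counting set: a conflict-free set where each element carries a count.
class CSet:
    def __init__(self):
        self.counts = {}  # element -> integer count

    def add(self, elem):
        self.counts[elem] = self.counts.get(elem, 0) + 1

    def remove(self, elem):
        # May drive the count to zero or below; that is visible to the app.
        self.counts[elem] = self.counts.get(elem, 0) - 1

    def merge(self, other):
        # Apply another replica's operations; order does not matter.
        for elem, n in other.counts.items():
            self.counts[elem] = self.counts.get(elem, 0) + n

    def members(self):
        return {e for e, n in self.counts.items() if n > 0}
```

For example, site 1's add(Alice) and site 2's add(Bob) merge to the same friendlist regardless of delivery order.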

Site failure
Two options to handle a site failure:
– Conservative: block writes whose preferred site failed
– Aggressive: re-assign the preferred site elsewhere
  Warning: committed but not-yet-replicated transactions may be lost

Application #1: WaltSocial
Wall and friendlist are counting sets.
Befriend transaction:
  A = read Alice's profile
  B = read Bob's profile
  add A.uid to B.friendlist
  add B.uid to A.friendlist
  add "Alice is now friends with Bob" to A.wall
  add "Bob is now friends with Alice" to B.wall
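The befriend transaction can be sketched as runnable code, using plain dicts of counts as stand-ins for Walter csets (the data layout and names are illustrative, not WaltSocial's actual schema):

```python
# Befriend transaction sketch: all four cset adds happen in one transaction.
def befriend(db, a, b):
    a_prof, b_prof = db[a], db[b]     # read both profiles
    updates = [
        (b_prof["friendlist"], a),                         # add a to b's friends
        (a_prof["friendlist"], b),                         # add b to a's friends
        (a_prof["wall"], f"{a} is now friends with {b}"),  # post on a's wall
        (b_prof["wall"], f"{b} is now friends with {a}"),  # post on b's wall
    ]
    for cset, item in updates:
        cset[item] = cset.get(item, 0) + 1                 # cset add
```

Because every update is a cset add, the transaction never write-write conflicts with a concurrent befriend at another site.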

Application #2: Twitter clone (ReTwis)
Third-party app in PHP; our port switches the storage backend from Redis to Walter.
Each user's timeline is a counting set.
Post-status transaction:
  write status to new object O
  foreach f in user's followers:
    add O to f's timeline_cset
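The pseudocode above can be sketched in runnable form, again using dicts of counts as stand-ins for csets (the store layout and function name are illustrative):

```python
import uuid

# Post-status transaction: write the status to a fresh object, then
# cset-add its id to every follower's timeline. The adds commute, so
# concurrent posts from different sites never conflict.
def post_status(objects, followers, timelines, user, text):
    oid = str(uuid.uuid4())                # new object O
    objects[oid] = {"user": user, "text": text}
    for f in followers[user]:
        tl = timelines.setdefault(f, {})   # follower's timeline cset
        tl[oid] = tl.get(oid, 0) + 1       # cset add of O
    return oid
```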

Evaluation
Walter prototype:
– Implemented in C++ with PHP binding
– Custom RPC library with Protocol Buffers
Testbed: Amazon EC2
– Extra-large instances
– Up to 4 sites (Virginia, California, Ireland, Singapore)
– Full replication across sites

Walter scales
Workload: read/write a 100-byte object; the read working set fits in memory.
[Chart: read and write throughput]

WaltSocial achieves low latency
A post-on-wall transaction reads 2 objects, writes 2 objects, and updates 2 counting sets.

Walter lets ReTwis scale beyond one site
[Chart: throughput of Read Timeline, Post Status, and Follow User under Redis, Walter (1 site), and Walter (2 sites)]

Related work
Cloud storage systems:
– Single-site: Bigtable, Sinfonia, Percolator
– No/limited transactions: Dynamo, COPS, PNUTS
– Synchronous replication: Megastore, Scatter
Replicated database systems:
– Eager vs. lazy replication
– Escrow transactions for numeric data
Conflict-free replicated data types: inspired counting sets

Conclusion
PSI is a good tradeoff for geo-replicated storage:
– Allows fast commit with asynchronous replication
– Prohibits write-write conflicts and preserves causality
Walter realizes PSI efficiently:
– Preferred site
– Conflict-free counting set