Eiger: Stronger Semantics for Low-Latency Geo-Replicated Storage Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡ * Princeton, † Intel Labs, ‡ CMU




Presentation transcript:

Eiger: Stronger Semantics for Low-Latency Geo-Replicated Storage Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡ * Princeton, † Intel Labs, ‡ CMU

Geo-Replicated Storage is the backend of massive websites. (Figure: example social-network data such as a Status, a FriendOf edge, and a Like of "Halting is Undecidable".)

Storage Dimensions: shard data across many nodes (key ranges A-F, G-L, M-R, S-Z).

Storage Dimensions: shard data across many nodes, and geo-replicate the data in multiple datacenters (each datacenter holds the full A-F, G-L, M-R, S-Z shards).

Sharded, Geo-Replicated Storage. (Figure: three datacenters, each holding the A-F, G-L, M-R, S-Z shards.)
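
The range-based sharding pictured on these slides can be sketched in a few lines. This is a minimal illustration assuming sharding by the first letter of the key; Eiger itself reuses Cassandra's partitioning, and the class and method names here are hypothetical.

import java.util.NavigableMap;
import java.util.TreeMap;

public class RangeSharder {
    // Maps the lowest key of each range to the node responsible for it.
    private final NavigableMap<Character, String> ranges = new TreeMap<>();

    public RangeSharder() {
        ranges.put('A', "node-1"); // A-F
        ranges.put('G', "node-2"); // G-L
        ranges.put('M', "node-3"); // M-R
        ranges.put('S', "node-4"); // S-Z
    }

    // Returns the node that owns the shard containing this key.
    public String nodeFor(String key) {
        char first = Character.toUpperCase(key.charAt(0));
        return ranges.floorEntry(first).getValue();
    }

    public static void main(String[] args) {
        RangeSharder sharder = new RangeSharder();
        System.out.println(sharder.nodeFor("Lovelace")); // node-2 (G-L)
        System.out.println(sharder.nodeFor("Turing"));   // node-4 (S-Z)
    }
}

Geo-replication then simply runs one such sharded cluster per datacenter, with writes replicated asynchronously between them.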

Strong Consistency or Low Latency?
Low Latency – improves user experience; correlates with revenue.
Strong Consistency – obeys user expectations; easier for programmers.
The two are fundamentally in conflict [LiptonSandberg88, AttiyaWelch94].

Existing systems span this trade-off: Megastore [SIGMOD '08], Spanner [OSDI '12], Gemini [OSDI '12], Walter [SOSP '11], Dynamo [SOSP '07], COPS [SOSP '11], and now Eiger. Eiger provides Causal+ Consistency (which obeys user expectations and is easier for programmers), a Rich Data Model, Read-only Txns, and Write-only Txns.

Eiger Ensures Low Latency: keep all operations local to the datacenter. (Figure: three geo-replicated datacenters, each holding the A-F, G-L, M-R, S-Z shards.)

Causal+ Consistency Across DCs: if A happens before B, everyone sees A before B. This obeys user expectations and simplifies programming. (Example: remove Boss from Friends, then post "New Job!" – everyone sees the removal before the post.)
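
To make the idea on this slide concrete, here is a hedged sketch of the general dependency-check pattern: a replicated write carries the earlier writes it causally depends on, and a remote datacenter applies it only once those dependencies are locally visible. This illustrates the concept only; it is not Eiger's actual dependency-tracking protocol, and all names are illustrative.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CausalApply {
    record Dep(String key, long version) {}
    record Write(String key, String value, long version, List<Dep> deps) {}

    // key -> highest version already applied in this datacenter
    private final Map<String, Long> applied = new ConcurrentHashMap<>();

    // Apply a replicated write only once everything it causally depends on
    // (e.g. "remove Boss from Friends") is already visible here.
    public boolean tryApply(Write w) {
        for (Dep d : w.deps()) {
            if (applied.getOrDefault(d.key(), 0L) < d.version()) {
                return false; // dependency missing: buffer and retry later
            }
        }
        applied.merge(w.key(), w.version(), Math::max);
        return true;
    }
}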

Causal For Column Families: operations update/read many columns; range queries over columns concurrent with deletes; counter columns. See the paper for details. (Figure: example column families – a Friends count column and a Profile family with Age and Town columns for keys such as Lovelace, Turing, and Church.)

Viewing Data Consistently Is Hard: asynchronous requests plus distributed data. (Figure: a client reads A, B, and C while Update A, Update B, and Update C race with the reads – what view does it get?)

Read-Only Transactions: logical time gives a global view of the data store – clocks on all nodes, carried with all messages. Insight: the store is consistent at all logical times. (Figure: logical-time diagram with versions A1/A2, B1/B2, C1/C2.)
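
The "clocks on all nodes, carried with all messages" on this slide are Lamport-style logical clocks. A minimal sketch of that mechanism, with illustrative names rather than Eiger's code:

import java.util.concurrent.atomic.AtomicLong;

public class LogicalClock {
    private final AtomicLong time = new AtomicLong(0);

    // Called for a local event (e.g. a write); returns its logical timestamp.
    public long tick() {
        return time.incrementAndGet();
    }

    // Called when a message arrives carrying the sender's logical time:
    // advance past both the local clock and the sender's clock.
    public long onReceive(long senderTime) {
        return time.updateAndGet(local -> Math.max(local, senderTime) + 1);
    }
}

Because every message advances the receiver's clock, causally related events are ordered by their timestamps, which is what lets the store be viewed as consistent at any single logical time.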

Read-Only Transactions extract a consistent, up-to-date view of data across many servers. Challenges: scalability (a decentralized algorithm); guaranteed low latency (at most 2 parallel rounds of local reads, no locks, no blocking); high performance (normal case: 1 round of reads).

Read-Only Transactions, round by round: Round 1 – optimistic parallel reads; then calculate the effective time; Round 2 – parallel read_at_time requests for any values not known to be valid at that time. (Figure: the client first reads A1, B2, and C2, then re-reads A at the effective time and gets A2.)
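
Below is a hedged sketch of this two-round idea, under the assumption that each round-1 result carries the logical interval over which its value is known to be valid. The Result/ReadClient types, the read and readAtTime methods, and the effective-time rule shown here are illustrative, not Eiger's actual API.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadOnlyTxnSketch {
    // Value plus the logical-time interval [earliestValid, latestValid]
    // over which the server knows it was current.
    record Result(String value, long earliestValid, long latestValid) {}

    interface ReadClient {
        Result read(String key);                  // round-1 optimistic read
        Result readAtTime(String key, long time); // round-2 read at a fixed logical time
    }

    public static Map<String, String> readOnlyTxn(ReadClient client, List<String> keys) {
        // Round 1: optimistic reads (issued in parallel in the real system).
        Map<String, Result> firstRound = new HashMap<>();
        long effectiveTime = 0;
        for (String key : keys) {
            Result r = client.read(key);
            firstRound.put(key, r);
            effectiveTime = Math.max(effectiveTime, r.earliestValid());
        }

        // Round 2: re-read only keys whose round-1 value is not known to be
        // valid at the effective time; in the common case this round is empty.
        Map<String, String> view = new HashMap<>();
        for (String key : keys) {
            Result r = firstRound.get(key);
            if (r.latestValid() >= effectiveTime) {
                view.put(key, r.value());
            } else {
                view.put(key, client.readAtTime(key, effectiveTime).value());
            }
        }
        return view;
    }
}

This is why the normal case needs only one round: if no key was concurrently updated, every round-1 value is already valid at the chosen effective time.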

Transaction Intuition: read-only transactions read from a single logical time; write-only transactions appear at a single logical time. Bonus: this also works for linearizability. (Figure: versions A2/B2/C2 and A3/B3/C3 on the logical-time axis.)
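
The write-only side of this intuition can be sketched as stamping every write in a batch with one commit time drawn from the logical clock, so the whole batch becomes visible at a single logical instant. The real system must coordinate the servers that own the keys; this single-node sketch with hypothetical names only shows the timestamping idea.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class WriteOnlyTxnSketch {
    record Versioned(String value, long commitTime) {}

    private final AtomicLong logicalClock = new AtomicLong(0);
    private final Map<String, Versioned> store = new ConcurrentHashMap<>();

    // Apply every write in the batch at the same logical commit time, so the
    // whole transaction appears at a single logical instant.
    public long commit(Map<String, String> writes) {
        long commitTime = logicalClock.incrementAndGet();
        writes.forEach((key, value) -> store.put(key, new Versioned(value, commitTime)));
        return commitTime;
    }
}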

Eiger Provides: √ low latency, √ rich data model, √ causal+ consistency, √ read-only transactions, √ write-only transactions. But what does all this cost? Does it scale?

Eiger Implementation: a fork of open-source Cassandra, adding +5K lines of Java to Cassandra's 75K. Code available: –

Evaluation: the cost of stronger consistency and semantics – vs. eventually-consistent Cassandra, overhead for a real (Facebook) workload, and overhead across a state space of workloads – plus scalability.

Experimental Setup. (Figure: an 8-machine local datacenter at Stanford replicating to an 8-machine remote DC at UW, each holding the A-F, G-L, M-R, S-Z shards.)

Facebook Workload Results. (Chart: percent overhead relative to eventually-consistent Cassandra.)

Eiger Scales: the Facebook workload scales out to 384 machines.

Improving Low-Latency Storage (COPS → Eiger):
– Data model: Key-Value → Column-Family
– Read-only Txns: Causal stores → All stores
– Write-only Txns: None → Yes
– Performance: Good → Great
– DC Failure: Throughput degradation → Resilient

Eiger: low-latency geo-replicated storage – causal+ consistency for column families, read-only transactions, and write-only transactions. Demonstrated in a working system: competitive with eventual consistency and scales to large clusters.

Eiger: Stronger Semantics for Low-Latency Geo-Replicated Storage Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky † David G. Andersen ‡ * Princeton, † Intel Labs, ‡ CMU