CSE 486/586 Distributed Systems
Case Study: Amazon Dynamo
Steve Ko, Computer Sciences and Engineering, University at Buffalo

Presentation transcript:


Recap
CAP Theorem?
  – Consistency, Availability, Partition Tolerance
  – P then C? A?
Eventual consistency?
  – Availability and partition tolerance over consistency
Lazy replication?
  – Replicate lazily in the background
Gossiping?
  – Contact random targets, infect, and repeat in the next round

Amazon Dynamo
Distributed key-value storage
  – Only accessible with the primary key
  – put(key, value) & get(key)
Used for many Amazon services (“applications”)
  – Shopping cart, best seller lists, customer preferences, product catalog, etc.
  – Now in AWS as well (DynamoDB) (if interested, read dynamodb.html)
Along with Google's systems (GFS & Bigtable), Dynamo is one of the first non-relational storage systems (a.k.a. NoSQL)

Amazon Dynamo
A synthesis of techniques we discuss in class
  – Very good example of developing a principled distributed system
  – Comprehensive picture of what it means to design a distributed storage system
Main motivation: shopping cart service
  – 3 million checkouts in a single day
  – Hundreds of thousands of concurrent active sessions
Properties (in the CAP theorem sense)
  – Eventual consistency
  – Partition tolerance
  – Availability (“always-on” experience)

Necessary Pieces?
We want to design a storage service on a cluster of servers
What do we need?
  – Membership maintenance
  – Object insert/lookup/delete
  – (Some) Consistency with replication
  – Partition tolerance
Dynamo is a good example as a working system.

Overview of Key Design Techniques
Gossiping for membership and failure detection
  – Eventually-consistent membership
Consistent hashing for node & key distribution
  – Similar to Chord
  – But there’s no ring-based routing; everyone knows everyone else
Object versioning for eventually-consistent data objects
  – A vector clock associated with each object
Quorums for partition/failure tolerance
  – “Sloppy” quorum similar to the available copies replication strategy
Merkle tree for resynchronization after failures/partitions
  – (This was not covered in class yet)

Membership
Nodes are organized as a ring, just like Chord, using consistent hashing
But everyone knows everyone else.
Node join/leave
  – Manually done
  – An operator uses a console to add/delete a node
  – Reason: it’s a well-maintained system; nodes come back pretty quickly and don’t depart permanently most of the time
Membership change propagation
  – Each node maintains its own view of the membership & the history of the membership changes
  – Propagated using gossiping (every second, pick random targets)
Eventually-consistent membership protocol
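The gossip-based propagation on this slide can be illustrated with a small simulation. This is a minimal sketch, not Dynamo's actual protocol: the view format (a version number and a status per node), the push-pull exchange, and all function names are assumptions made for illustration. One node learns of a membership change from the operator, and repeated random exchanges spread it until every view agrees.

```python
import random

def initial_views(nodes):
    """Every node starts with the same view: all nodes at version 1, status 'up'."""
    return {n: {m: (1, "up") for m in nodes} for n in nodes}

def merge(view_a, view_b):
    """Merge two views, keeping the entry with the higher version per node."""
    merged = dict(view_a)
    for node, (version, status) in view_b.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged

def gossip_round(views, rng):
    """Each node contacts one random peer; both adopt the merged view (push-pull)."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = merge(views[node], views[peer])
        views[node] = merged
        views[peer] = dict(merged)

def run_until_converged(views, rng, max_rounds=1000):
    """Gossip until all views are identical; return the number of rounds used."""
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        first = next(iter(views.values()))
        if all(view == first for view in views.values()):
            return round_no
    return None

rng = random.Random(42)                    # fixed seed, only for repeatability
views = initial_views(list("ABCDEFGH"))
views["A"]["I"] = (1, "up")                # operator tells node A that node I joined
rounds = run_until_converged(views, rng)   # node I itself does not gossip in this sketch
print("converged after", rounds, "rounds")
```

Because merging never discards the newest entry, the views can only move toward agreement, which is the sense in which the membership is eventually consistent.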

Failure Detection
Does not use a separate protocol; each request serves as a ping
  – Dynamo has enough requests at any moment anyway
If a node doesn’t respond to a request, it is considered failed.

Node & Key Distribution
Original consistent hashing
Load becomes uneven

Node & Key Distribution
Consistent hashing with “virtual nodes” for better load balancing
Start with a static number of virtual nodes uniformly distributed over the ring

Node & Key Distribution
One node joins and gets all virtual nodes
[Figure labels: Node 1]

Node & Key Distribution
One more node joins and gets 1/2 of the virtual nodes
[Figure labels: Node 1, Node 2]

Node & Key Distribution
One more node joins and gets 1/3 (roughly) from the other two
[Figure labels: Node 1, Node 2, Node 3]
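The join sequence on the last few slides can be sketched in code. This is an illustrative model under stated assumptions, not Dynamo's implementation: the class name `Ring`, the choice of 12 virtual nodes, the use of MD5 to place keys, and the "take from the most loaded node" rule are all hypothetical. The ring is split into a fixed number of equal-sized virtual nodes, and each joining physical node takes over its fair share.

```python
import hashlib

class Ring:
    def __init__(self, num_virtual=12):
        self.num_virtual = num_virtual
        # owner[i] is the physical node that currently owns virtual node i
        self.owner = [None] * num_virtual

    def load(self):
        """Count how many virtual nodes each physical node owns."""
        counts = {}
        for node in self.owner:
            counts[node] = counts.get(node, 0) + 1
        return counts

    def join(self, node):
        """The first node takes everything; later nodes take a fair share."""
        if all(o is None for o in self.owner):
            self.owner = [node] * self.num_virtual
            return
        members = set(self.owner) | {node}
        fair_share = self.num_virtual // len(members)
        for _ in range(fair_share):
            counts = self.load()
            # Take one virtual node from whichever other node owns the most.
            donor = max((n for n in counts if n != node), key=lambda n: counts[n])
            self.owner[self.owner.index(donor)] = node

    def lookup(self, key):
        """Hash the key onto the ring and return the owning physical node."""
        position = int.from_bytes(hashlib.md5(key.encode()).digest(), "big")
        virtual_index = position * self.num_virtual // 2 ** 128
        return self.owner[virtual_index]

ring = Ring()
ring.join("Node 1")   # owns all 12 virtual nodes
ring.join("Node 2")   # takes 6 of them
ring.join("Node 3")   # takes 4, two from each of the others
print(ring.load())
print(ring.lookup("shopping-cart:alice"))
```

Since keys map to virtual nodes rather than directly to physical nodes, a join only reassigns whole virtual nodes, and the load stays roughly even after each step.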

Administrivia
Survey!

Replication
N: # of replicas; configurable
The first replica is stored regularly with consistent hashing
The other N-1 replicas are stored in the N-1 (physical) successor nodes (together these nodes are called the preference list)

Replication
Any server in the preference list can handle a read/write, but the request walks over the ring in order
  – E.g., try B first, then C, then D, etc.
Update propagation: by the server that handled the request
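A preference list can be computed by walking the ring clockwise and collecting the first N distinct physical nodes. The sketch below assumes the ring is given as a list of physical owners, one per position; the function name `preference_list` and the sample ring are hypothetical.

```python
def preference_list(ring, start, n):
    """Walk clockwise from position `start` and collect n distinct physical nodes.

    `ring` lists the physical owner of each ring position in clockwise order.
    Positions owned by an already-selected node are skipped, so the replicas
    end up on different physical machines.
    """
    result = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)]
        if node not in result:
            result.append(node)
        if len(result) == n:
            break
    return result

ring = ["A", "A", "B", "C", "B", "D"]
print(preference_list(ring, 0, 3))   # ['A', 'B', 'C']
print(preference_list(ring, 4, 3))   # ['B', 'D', 'A']
```

A request for a key whose position is 0 would try A first, then B, then C, matching the "walk over the ring" order described on the slide.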

Replication
Dynamo’s replication is lazy.
  – A put() request returns “right away” (more on this later); it does not wait until the update is propagated to the replicas.
This leads to inconsistency, which is handled by object versioning

Object Versioning
Writes should succeed all the time
  – E.g., “Add to Cart”
  – As long as there’s at least one reachable server
Used to reconcile inconsistent data
Each object has a vector clock
  – E.g., D1 ([Sx, 1], [Sy, 1]): Object D (version 1) has been written once each by servers Sx and Sy.
  – Each node keeps all versions until the data becomes consistent
  – I.e., no overwrite, almost like each write creates a new object
Causally concurrent versions: inconsistency
If inconsistent, reconcile later.
  – E.g., deleted items might reappear in the shopping cart.

Object Versioning Example
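To make the vector-clock comparison concrete, here is a minimal sketch. The clock representation (a dict from server name to counter), the function names, and the write history with servers Sx, Sy, and Sz are assumptions chosen for illustration; they are not taken from the slide's figure.

```python
def increment(clock, server):
    """Return a copy of the clock with this server's counter advanced by one."""
    new_clock = dict(clock)
    new_clock[server] = new_clock.get(server, 0) + 1
    return new_clock

def descends(a, b):
    """True if clock a has seen every write that clock b has seen."""
    return all(a.get(server, 0) >= count for server, count in b.items())

def relation(a, b):
    """Classify two versions as equal, ordered, or causally concurrent."""
    a_sees_b, b_sees_a = descends(a, b), descends(b, a)
    if a_sees_b and b_sees_a:
        return "equal"
    if a_sees_b:
        return "after"
    if b_sees_a:
        return "before"
    return "concurrent"

def merge(a, b):
    """Element-wise maximum: the clock of a reconciled version."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in set(a) | set(b)}

d1 = increment({}, "Sx")            # first write, handled by Sx
d2 = increment(d1, "Sx")            # second write, also handled by Sx
d3 = increment(d2, "Sy")            # one client updates d2 through Sy
d4 = increment(d2, "Sz")            # another client updates d2 through Sz
print(relation(d2, d1))             # after: d2 replaces d1
print(relation(d3, d4))             # concurrent: both must be kept
d5 = increment(merge(d3, d4), "Sx") # reconciled version written through Sx
print(d5)
```

Only the "concurrent" case needs reconciliation; in every other case the older version can be discarded safely.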

Object Versioning
Consistency revisited
  – Linearizability: any read operation reads the latest write.
  – Sequential consistency: per client, any read operation reads the latest write.
  – Eventual consistency: a read operation might not read the latest write & sometimes inconsistent versions need to be reconciled.
Conflict detection & resolution required
Dynamo uses vector clocks to detect conflicts
Simple resolution done by the system (last-write-wins policy)
Complex resolution done by each application
  – System presents all conflicting versions of data
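The two resolution styles can be contrasted with a short sketch. The version format (a dict with a `timestamp` and a `value`) and both function names are hypothetical; the application-level merge shown here is a simple set union for a shopping cart, which is one possible policy rather than the only one.

```python
def last_write_wins(versions):
    """System-level resolution: keep only the version with the newest timestamp."""
    return max(versions, key=lambda v: v["timestamp"])["value"]

def merge_carts(versions):
    """Application-level resolution: union the items of all conflicting carts."""
    merged = set()
    for version in versions:
        merged |= set(version["value"])
    return sorted(merged)

conflicting = [
    {"timestamp": 10, "value": ["book"]},
    {"timestamp": 12, "value": ["pen"]},
]
print(last_write_wins(conflicting))  # ['pen']: the "book" update is lost
print(merge_carts(conflicting))      # ['book', 'pen']: no addition is lost
```

The union never drops an added item, but it cannot tell a deletion from an item that was never added, which is why deleted items might reappear in the cart.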

Object Versioning Experience
Over a 24-hour period
  – 99.94% of requests saw exactly one version
  – % saw 2 versions
  – % saw 3 versions
  – % saw 4 versions
Usually triggered by many concurrent requests issued by robots, not human clients

Quorums
Parameters
  – N replicas
  – R readers
  – W writers
Static quorum approach: R + W > N
Typical Dynamo configuration: (N, R, W) == (3, 2, 2)
But it depends
  – High performance read (e.g., write-once, read-many): R==1, W==N
  – Low R & W might lead to more inconsistency
Dealing with failures
  – Another node in the preference list handles the requests temporarily
  – Delivers the replicas to the original node upon recovery
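The condition R + W > N guarantees that every read set shares at least one replica with every write set. The sketch below checks this by brute force for small configurations; the function names are hypothetical, and the code models only the static quorum rule, not Dynamo's "sloppy" variant.

```python
from itertools import combinations

def is_strict_quorum(n, r, w):
    """The static quorum condition from the slide."""
    return r + w > n

def always_overlap(n, r, w):
    """Brute force: does every set of r readers intersect every set of w writers?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )

for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1)]:
    print((n, r, w), is_strict_quorum(n, r, w), always_overlap(n, r, w))
```

With (3, 1, 1) a read may contact a replica that the latest write never reached, which is why low R and W lead to more inconsistency.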

Replica Synchronization
Key ranges are replicated.
Say a node fails and recovers; it needs to quickly determine whether it needs to resynchronize or not.
  – Transferring entire (key, value) pairs for comparison is not an option
Merkle trees
  – Leaves are hashes of values of individual keys
  – Parents are hashes of (immediate) children
  – Comparison of parents at the same level tells the difference in children
  – Does not require transferring entire (key, value) pairs

Replica Synchronization
Comparing two nodes that are synchronized
  – Two (key, value) pairs: (k0, v0) & (k1, v1)
Node0: h0 = hash(v0), h1 = hash(v1), h2 = hash(h0 + h1)
Node1: h0 = hash(v0), h1 = hash(v1), h2 = hash(h0 + h1)
Roots: h2 on both nodes, so they are equal

Replica Synchronization
Comparing two nodes that are not synchronized
  – One: (k0, v2) & (k1, v1)
  – The other: (k0, v0) & (k1, v1)
Node0: h3 = hash(v2), h1 = hash(v1), h4 = hash(h3 + h1)
Node1: h0 = hash(v0), h1 = hash(v1), h2 = hash(h0 + h1)
Roots: h4 on Node0 vs. h2 on Node1, so they are not equal
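The comparison on these two slides can be sketched as follows. This is a simplified model under stated assumptions: both replicas hold the same keys in the same order, SHA-256 stands in for the unspecified hash function, and the function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(values):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [h(v.encode()) for v in values]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its (up to two) concatenated children.
        level = [h(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def differing_leaves(a_levels, b_levels):
    """Descend from the root, following only subtrees whose hashes differ."""
    suspects = [0]
    for depth in range(len(a_levels) - 1, 0, -1):
        next_suspects = []
        for idx in suspects:
            if a_levels[depth][idx] != b_levels[depth][idx]:
                for child in (2 * idx, 2 * idx + 1):
                    if child < len(a_levels[depth - 1]):
                        next_suspects.append(child)
        suspects = next_suspects
    return [i for i in suspects if a_levels[0][i] != b_levels[0][i]]

node0 = build_levels(["v2", "v1"])   # values for (k0, k1) on one replica
node1 = build_levels(["v0", "v1"])   # values for (k0, k1) on the other
print(node0[-1] == node1[-1])        # False: the roots differ
print(differing_leaves(node0, node1))  # [0]: only k0 needs to be transferred
```

When the roots match, a single hash comparison shows that the replicas are in sync; when they differ, only the mismatching subtrees are explored, so most (key, value) pairs are never transferred.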

Summary
Amazon Dynamo
  – Distributed key-value storage with eventual consistency
Techniques
  – Gossiping for membership and failure detection
  – Consistent hashing for node & key distribution
  – Object versioning for eventually-consistent data objects
  – Quorums for partition/failure tolerance
  – Merkle tree for resynchronization after failures/partitions
Very good example of developing a principled distributed system

Acknowledgements
These slides contain material developed and copyrighted by Indranil Gupta (UIUC).