VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Amazon’s Dynamo Lecturer.

Presentation transcript:

VICTORIA UNIVERSITY OF WELLINGTON
Te Whare Wananga o te Upoko o te Ika a Maui
SWEN 432 Advanced Database Design and Implementation
Amazon’s Dynamo
Lecturer: Dr. Pavle Mogin

Advanced Database Design and Implementation 2016 Amazon’s Dynamo

Plan for Amazon’s Dynamo
Prologue
Data Model
Partitioning and Replication
Data Versioning
Executing get() and put()
Membership changes
Replica Synchronization and Anti-Entropy Algorithm
  – Readings: Have a look at Readings on the Home Page

Prologue
Dynamo is one of the CDBMSs used at Amazon
  – The others are SimpleDB and S3 (Simple Storage Service)
  – Dynamo is used for simple services requiring data access via the primary key, like the Shopping Cart application
At Amazon, Dynamo is used to manage services that:
  – Have very high reliability requirements, and
  – Need tight control over the trade-offs between availability, consistency, cost-effectiveness, and performance
Dynamo has been in use since 2006 and has influenced the design of a number of other NoSQL CDBMSs

Design Requirements
Technical context:
  – The infrastructure is made up of tens of thousands of servers and network components located in many data centres around the world
  – Commodity hardware is used
  – Component failure is a “standard mode of operation”
  – Amazon uses a highly decentralized, loosely coupled, service-oriented architecture consisting of hundreds of services
Business considerations:
  – A strict internal service level agreement (SLA) has to be met for practically all customers, regardless of the amount of processing their requests need
      A simple SLA: a response time of 300 ms for 99.9% of requests at a peak client load of 500 requests per second
  – High reliability, since even the slightest outage has significant financial consequences and impacts users’ trust
  – High scalability to support continuous growth

System Design (Data Model and API)
Data model: key/value
  – Most services at Amazon only need to store and retrieve data by primary key and do not require complex querying or data management functionality
  – The value part is a BLOB
  – Writes are limited to one key/value pair with no references
Operations:
  – get(key), returning a list of object versions and a context
  – put(key, context, value)
  – The context is system metadata containing a version vector
  – The get() operation may return more than one value if there is a conflict between objects with the given key
  – Dynamo treats key and value as opaque arrays of bytes
  – The key is hashed by the MD5 algorithm to determine the storage node

Design (Partitioning and Replication)
To provide incremental scalability, Dynamo uses consistent hashing to dynamically partition data across the storage hosts currently present
  – Each physical node hosts a number of virtual nodes according to its performance
Dynamo uses optimistic replication to ensure availability and durability in an environment where machine crashes are a standard mode of operation
  – Each data object is replicated n times
      A typical value for n at Amazon is 3
  – Each node holds a list of nodes, called the preference list, for each key range
      A node from the top of the preference list becomes responsible for storing and replicating a new object with the key k (belonging to a certain range)
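The partitioning scheme above can be sketched as follows. This is a minimal illustration, not Dynamo's implementation: the class name Ring, the helper md5_token, and the "host#i" token naming are assumptions made for the example. It places virtual nodes on an MD5 ring and builds a preference list by walking clockwise from the key's position until n distinct physical hosts have been collected.

```python
import bisect
import hashlib
from typing import Dict, List


def md5_token(data: bytes) -> int:
    """Map bytes to a 128-bit position on the ring using MD5."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


class Ring:
    """Hypothetical consistent hashing ring with virtual nodes."""

    def __init__(self, vnodes_per_host: Dict[str, int]):
        # A more capable host is given more virtual nodes (tokens).
        self.tokens = sorted(
            (md5_token(f"{host}#{i}".encode()), host)
            for host, count in vnodes_per_host.items()
            for i in range(count)
        )
        self.positions = [position for position, _ in self.tokens]

    def preference_list(self, key: bytes, n: int) -> List[str]:
        """Walk clockwise from the key, collecting n distinct physical hosts."""
        start = bisect.bisect_right(self.positions, md5_token(key))
        hosts: List[str] = []
        for step in range(len(self.tokens)):
            host = self.tokens[(start + step) % len(self.tokens)][1]
            if host not in hosts:
                hosts.append(host)
            if len(hosts) == n:
                break
        return hosts


ring = Ring({"A": 4, "B": 4, "C": 2, "D": 4})
replicas = ring.preference_list(b"cart:42", n=3)
```

Skipping virtual nodes that belong to an already chosen host keeps the n replicas on n different physical machines.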

System Design (Data Versioning)
Dynamo is designed to be an eventually consistent system that is always available for updates:
  – An update operation returns before all replica nodes have received and applied the update
  – Also, an update is accepted from a client even if it is apparent that the client is not aware of the latest version of the object
To handle multiple versions of an object:
  – Dynamo uses a version vector (called a vector clock), and
  – Always creates a new and immutable version of the updated object
Many of the object versions are reconciled syntactically by Dynamo itself
  – This is possible whenever two replicas have ordered version vectors
But some reads may return a set of conflicting object versions that have to be reconciled semantically by a client that knows the schema and business logic
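Syntactic reconciliation can be illustrated with a small sketch. The function names descends and reconcile and the node names Sx, Sy, Sz are assumptions made for the example. A version is dropped only when another version's vector clock strictly dominates it; versions with unordered clocks are both kept and left for the client to reconcile semantically.

```python
from typing import Dict, List, Tuple

Clock = Dict[str, int]          # node name -> update counter
Version = Tuple[str, Clock]     # (value, vector clock)


def descends(a: Clock, b: Clock) -> bool:
    """True if clock a is at least as new as clock b on every node in b."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def reconcile(versions: List[Version]) -> List[Version]:
    """Keep only versions that no other version strictly dominates."""
    survivors: List[Version] = []
    for value, clock in versions:
        dominated = any(
            descends(other, clock) and not descends(clock, other)
            for _, other in versions
        )
        if not dominated and (value, clock) not in survivors:
            survivors.append((value, clock))
    return survivors


old = ("cart-1", {"Sx": 1})
new = ("cart-2", {"Sx": 2})
left = ("cart-3", {"Sx": 2, "Sy": 1})
right = ("cart-4", {"Sx": 2, "Sz": 1})

ordered = reconcile([old, new])             # only the newer version survives
conflict = reconcile([new, left, right])    # two concurrent versions survive
```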

Design (get() and put())
Dynamo allows any storage node to receive a get() or put() request for any key
  – The node then uses the preference list to forward the request to a healthy, prioritized storage host (the coordinator)
To provide a consistent view to clients, Dynamo applies a quorum consistency protocol
  – Values of r = 2, w = 2, and n = 3 satisfy Amazon’s SLA, where r and w are the minimum numbers of storage hosts that must take part in a successful read or write, respectively
  – Parameters r, w, and n are configurable by the application
  – Applications needing the highest level of availability may set w = 1
      Then, a write request is rejected only if all nodes in the system are unavailable
  – To achieve a higher level of durability, w should be greater than 1
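The choice r = 2, w = 2, n = 3 satisfies r + w > n, the usual condition under which every read set shares at least one replica with every write set. The sketch below checks that condition by brute force; the function name is an assumption made for the example. It models a strict quorum over a fixed set of n replicas, whereas Dynamo's sloppy quorum may pick different healthy nodes during failures.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Check every read set of size r against every write set of size w."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


typical = quorums_always_overlap(n=3, r=2, w=2)          # 2 + 2 > 3
high_availability = quorums_always_overlap(n=3, r=2, w=1)  # 2 + 1 = 3
```

With w = 1 a write succeeds as soon as one replica accepts it, but a read with r = 2 can then miss that replica, which is the availability versus consistency trade-off named on the slide.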

Design (Execution of put())
Upon receiving a write request put(k, o, V_c), the answering node i (called the coordinator):
  – Creates a version vector V by augmenting the vector V_c,
  – Stores the new object (k, o, V) locally,
  – Sends (k, o, V) to all other n – 1 storage nodes responsible for storing replicas of the new object,
  – Waits for at least w – 1 nodes to acknowledge, and then returns to the client

Design (Execution of get())
When a storage host (the coordinator) receives a read request get(k):
  – It asks the top n – 1 other nodes from the preference list for the key k to read the requested data,
  – Waits for r – 1 nodes to respond,
  – If these nodes reply with conflicting versions of the object, the coordinator:
      Constructs a summary version vector V
      Returns ({o_1, ..., o_r}, V), where o_i, i = 1, ..., r, are the conflicting objects
  – If no conflicting versions of the object were returned:
      The coordinator returns the latest version of the object and its version vector to the client
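The put() and get() flows can be sketched with an in-memory simulation. Everything here is an assumption made for illustration: the Replica class, the synchronous calls in place of network messages, and reading from all reachable replicas rather than stopping after r – 1 replies. The coordinator increments its own counter in the context vector on a write, and on a read it drops dominated versions and returns the survivors with a summary vector.

```python
from typing import Dict, List, Optional, Tuple

Clock = Dict[str, int]
Version = Tuple[bytes, Clock]


def dominates(a: Clock, b: Clock) -> bool:
    """True if a is at least as new as b for every counter in b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


class Replica:
    def __init__(self, name: str):
        self.name, self.up = name, True
        self.store: Dict[bytes, List[Version]] = {}

    def write(self, key: bytes, value: bytes, clock: Clock) -> bool:
        if not self.up:
            return False
        kept = [v for v in self.store.get(key, []) if not dominates(clock, v[1])]
        self.store[key] = kept + [(value, clock)]
        return True

    def read(self, key: bytes) -> Optional[List[Version]]:
        return list(self.store.get(key, [])) if self.up else None


def put(coord: Replica, others: List[Replica], key, value, context: Clock, w: int) -> bool:
    clock = dict(context)
    clock[coord.name] = clock.get(coord.name, 0) + 1   # augment the context vector
    acks = sum(node.write(key, value, clock) for node in [coord] + others)
    return acks >= w                                   # coordinator plus w - 1 others


def get(coord: Replica, others: List[Replica], key, r: int):
    replies = [rep for rep in (n.read(key) for n in [coord] + others) if rep is not None]
    if len(replies) < r:
        raise RuntimeError("read quorum not reached")
    versions = [v for reply in replies for v in reply]
    frontier: List[Version] = []
    for value, clock in versions:
        stale = any(dominates(o, clock) and not dominates(clock, o) for _, o in versions)
        if not stale and (value, clock) not in frontier:
            frontier.append((value, clock))
    summary: Clock = {}
    for _, clock in frontier:                          # element-wise maximum
        for node, count in clock.items():
            summary[node] = max(summary.get(node, 0), count)
    return [value for value, _ in frontier], summary


a, b, c = Replica("A"), Replica("B"), Replica("C")
put(a, [b, c], b"k", b"v1", {}, w=2)
values1, ctx1 = get(a, [b, c], b"k", r=2)
put(a, [b, c], b"k", b"v2", ctx1, w=2)       # update based on the latest context
put(b, [a, c], b"k", b"v3", ctx1, w=2)       # concurrent update from a stale context
values2, ctx2 = get(a, [b, c], b"k", r=2)
```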

Design (Membership Changes)
Dynamo uses a gossip network communication protocol to transfer messages between nodes
Node outages (due to a failure or maintenance) are often transient, although they may last for extended intervals
A node outage rarely signifies a permanent departure and therefore should not result in rebalancing of the partition assignment
For these reasons, Dynamo uses an explicit mechanism for adding nodes to and removing nodes from a Dynamo consistent hashing ring

Design (Node Addition and Removal) (1)
An administrator uses a command line tool to connect to a node and issues a command for a node addition or removal
The node stores the membership change
The gossip protocol is used to propagate membership changes
  – Each second, a node chooses a random peer with which to exchange information about membership changes
When a new node joins the consistent hashing ring, a token is chosen for each of its virtual nodes and stored permanently
  – Tokens are spread to other nodes by gossip, together with the membership change information
  – With this information, nodes are able to send a request to a node responsible for the key range
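Gossip-based propagation of a membership change can be sketched as a small simulation. The representation is an assumption made for the example: each node's view maps a member name to the version number of the latest change it knows about, and one call to gossip_round stands in for the one-second interval mentioned above. Each node picks a random peer and the two reconcile their views by keeping the newer entry for every member.

```python
import random
from typing import Dict

View = Dict[str, int]   # member name -> version of its latest membership change


def gossip_round(views: Dict[str, View], rng: random.Random) -> None:
    """Every node contacts one random peer; both adopt the merged view."""
    names = list(views)
    for name in names:
        peer = rng.choice([other for other in names if other != name])
        merged = dict(views[name])
        for member, version in views[peer].items():
            merged[member] = max(merged.get(member, 0), version)
        views[name] = dict(merged)
        views[peer] = dict(merged)


def rounds_until_converged(views: Dict[str, View], rng: random.Random, limit: int = 1000) -> int:
    """Gossip until all nodes hold the same view; return the number of rounds."""
    for rounds in range(1, limit + 1):
        gossip_round(views, rng)
        reference = next(iter(views.values()))
        if all(view == reference for view in views.values()):
            return rounds
    raise RuntimeError("views did not converge within the round limit")


views = {name: {} for name in "ABCDEFGH"}
views["A"]["I"] = 1     # the administrator registers new node I at node A
rounds = rounds_until_converged(views, random.Random(0))
```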

Design (Node Addition and Removal) (2)
Adding nodes to the system changes the ownership of key ranges on the ring
When a node determines that it is no longer responsible for a key range, it transfers the objects to the new node
When a node is removed, database objects are relocated in the reverse process
Temporary failure detection is performed during gossiping
  – To avoid failed attempts during get() and put() operations and data transfers, a node A considers a node B temporarily inaccessible if B does not respond to A’s gossip message

Design (Handling of Failures)
Hinted handoff is a technique used to compensate for not relocating the database objects of temporarily failed nodes
Dynamo’s quorum is a sloppy one, since the first n healthy nodes from the preference list for the key are used when executing a read or write operation
  – Some of these n nodes may not even be responsible for the key
Hence, a new object may be written to a node j that is not responsible for the key, instead of to the node i that is the intended recipient of the object’s replica
The new object is stored along with a hint about the intended recipient node i of the replica
When the node i recovers, the node j sends the object to it and deletes its own copy

Hinted Handoff (Example)
[Figure: consistent hashing ring with nodes A, B, C, D, E, F, G, H and replication factor n = 3. The preference list for the key k is C, D, E, F, G, ... Labels: coordinator; a node that is temporarily down; the object (k, o) stored by hinted handoff on a node that is not responsible for range BC.]
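Hinted handoff can be sketched as follows, using the preference list C, D, E, F, G from the example. The Node class, the function names, and the choice of D as the temporarily failed node are assumptions made for illustration. A write goes to the healthy nodes among the top n; for each failed one, the next healthy node further down the list stores the object together with a hint naming the intended recipient, and hands it back once that node is up again.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str):
        self.name, self.up = name, True
        self.data: Dict[str, str] = {}
        self.hinted: List[Tuple[str, str, str]] = []   # (intended node, key, value)


def sloppy_write(preference_list: List[Node], key: str, value: str, n: int) -> List[str]:
    """Write to the first n healthy nodes; stand-ins keep a hint."""
    intended = preference_list[:n]
    written = []
    for node in intended:
        if node.up:
            node.data[key] = value
            written.append(node.name)
    down = [node for node in intended if not node.up]
    fallbacks = (node for node in preference_list[n:] if node.up)
    for missing, stand_in in zip(down, fallbacks):
        stand_in.hinted.append((missing.name, key, value))
        written.append(stand_in.name)
    return written


def hand_back(stand_in: Node, nodes: Dict[str, Node]) -> None:
    """Deliver hinted objects to recovered nodes and delete the local copies."""
    remaining = []
    for target, key, value in stand_in.hinted:
        if nodes[target].up:
            nodes[target].data[key] = value
        else:
            remaining.append((target, key, value))
    stand_in.hinted = remaining


nodes = {name: Node(name) for name in "CDEFG"}
preference = [nodes[name] for name in "CDEFG"]
nodes["D"].up = False                          # assumed: D is temporarily down
written = sloppy_write(preference, "k", "o", n=3)
hints_before = list(nodes["F"].hinted)
nodes["D"].up = True
hand_back(nodes["F"], nodes)
```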

Design (Replica Synchronization)
Hinted handoff works well if system membership changes are infrequent and failures are transient
There are scenarios in which hinted replicas may become unavailable before they can be returned to the original replica node
To detect inconsistencies between replicas faster and to minimize the amount of data transferred between nodes, Dynamo uses Merkle trees
Merkle trees are used to discover differences in the key sets of the same key range held on different nodes, in the same way as in Cassandra

Design (Anti-Entropy Algorithm)
Each physical node maintains a separate Merkle tree for each key range hosted by one of its virtual nodes
Two nodes exchange the roots of the Merkle trees for the key ranges they have in common
By applying the same tree traversal scheme as in Cassandra, the nodes determine whether they have any differences
  – If a difference exists, the nodes apply a corresponding corrective action by copying the missing object
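The root exchange and traversal can be sketched as follows. The details are assumptions made for the example, not Dynamo's or Cassandra's actual layout: the key range is split into a fixed power-of-two number of buckets, each leaf hashes the sorted contents of one bucket, and SHA-256 is used for the tree. Two replicas compare roots first and descend only into subtrees whose hashes differ, so matching subtrees cost no further data transfer.

```python
import hashlib
from typing import Dict, List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str, buckets: int) -> int:
    """Assign a key to one of the leaf buckets of the key range."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big") % buckets


def build_tree(store: Dict[str, str], buckets: int = 8) -> List[List[bytes]]:
    """Return tree levels from the leaves (index 0) up to the root (last)."""
    groups: List[List[bytes]] = [[] for _ in range(buckets)]
    for key in sorted(store):
        groups[bucket_of(key, buckets)].append(f"{key}={store[key]}".encode())
    levels = [[h(b"\n".join(group)) for group in groups]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_buckets(a: List[List[bytes]], b: List[List[bytes]]) -> List[int]:
    """Compare roots, then follow only the children whose hashes differ."""
    top = len(a) - 1
    if a[top][0] == b[top][0]:
        return []
    frontier = [0]
    for depth in range(top, 0, -1):
        frontier = [
            child
            for index in frontier
            for child in (2 * index, 2 * index + 1)
            if a[depth - 1][child] != b[depth - 1][child]
        ]
    return frontier


replica_a = {f"k{i}": "v" for i in range(20)}
replica_b = dict(replica_a)
same = differing_buckets(build_tree(replica_a), build_tree(replica_b))
replica_b["k7"] = "changed"
del replica_b["k3"]
diff = differing_buckets(build_tree(replica_a), build_tree(replica_b))
```

Only the objects in the reported buckets then need to be compared and copied.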

Failure of a Whole Data Centre
A highly available storage system should be able to handle the failure of an entire data centre
  – Data centre failures happen due to power outages, cooling failures, network failures, and natural disasters
Dynamo is configured in such a way that each object is replicated across multiple data centres
  – Nodes in the preference list for a key range belong to multiple data centres

Summary (1)
Dynamo is one of Amazon’s CDBMSs
It has been in use since 2006 and has influenced a number of other CDBMSs, including Cassandra
Data model: key-value with a very simple API
Data partitioning and replication: consistent hash ring with optimistic replication
Data versioning: vector clocks

Summary (2)
Network communication: gossip protocol
Handling of failures: hinted handoff is used to compensate for not relocating the database objects of temporarily failed nodes
Replica synchronization: Merkle trees are used to detect inconsistencies
Anti-entropy algorithm: two nodes exchange Merkle tree roots for the key ranges they have in common, find differences in those key ranges (if any), and apply corrective actions