
1 VICTORIA UNIVERSITY OF WELLINGTON
Te Whare Wananga o te Upoko o te Ika a Maui
SWEN 432 Advanced Database Design and Implementation
Amazon’s Dynamo
Lecturer: Dr. Pavle Mogin

2 Plan for Amazon’s Dynamo
Advanced Database Design and Implementation 2016, Amazon’s Dynamo
Prologue
Data Model
Partitioning and Replication
Data Versioning
Executing get() and put()
Membership Changes
Replica Synchronization and Anti-Entropy Algorithm
  – Readings: have a look at Readings on the Home Page

3 Prologue
Dynamo is one of the CDBMSs used at Amazon
  – The others are SimpleDB and S3 (Simple Storage Service)
  – Dynamo is used for simple services that access data via the primary key, like the Shopping Cart application
At Amazon, Dynamo is used to manage services that:
  – Have very high reliability requirements, and
  – Need tight control over the tradeoffs between availability, consistency, cost-effectiveness, and performance
Dynamo has been in use since 2006 and has influenced the design of a number of other NoSQL CDBMSs

4 Design Requirements
Technical context:
  – The infrastructure is made up of tens of thousands of servers and network components located in many data centres around the world
  – Commodity hardware is used
  – Component failure is a “standard mode of operation”
  – Amazon uses a highly decentralized, loosely coupled, service-oriented architecture consisting of hundreds of services
Business considerations:
  – A strict internal service level agreement (SLA) has to be met for practically all customers, regardless of the amount of processing their requests need
      A simple SLA: a response time of 300 ms for 99.9% of requests at a peak client load of 500 requests per second
  – High reliability, since even the slightest outage has significant financial consequences and impacts users’ trust
  – High scalability to support continuous growth
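The SLA on this slide is a percentile target rather than an average. A minimal sketch of how such a check could be written follows; the function name `meets_sla` and its parameters are hypothetical and not part of Dynamo.

```python
def meets_sla(latencies_ms, limit_ms=300, required_per_thousand=999):
    """Return True if at least 99.9% of requests finished within limit_ms.

    Integer arithmetic is used so the 99.9% threshold is compared exactly.
    """
    within = sum(1 for t in latencies_ms if t <= limit_ms)
    return within * 1000 >= required_per_thousand * len(latencies_ms)


# 999 fast requests and one slow request still meet the target.
sample = [20] * 999 + [850]
print(meets_sla(sample))
```

A mean-based check would hide the slow tail that a 99.9% target is meant to expose.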

5 System Design (Data Model and API)
Data model: key/value
  – Most services at Amazon only need to store and retrieve data by primary key and do not require complex querying and data management functionality
  – The value part is a BLOB
  – Writes are limited to one key/value pair with no references
Operations:
  – get(key), returning a list of object versions and a context
  – put(key, context, value)
  – The context is system metadata containing a version vector
  – The get() operation may return more than one value if there is a conflict between objects with the given key
  – Dynamo treats key and value as opaque arrays of bytes
  – The key is hashed by the MD5 algorithm to determine the storage node
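As an illustration of the last point, the sketch below maps an opaque key to a 128-bit position using MD5 from Python's standard library. The function name `key_position` is hypothetical; how Dynamo turns the digest into a node choice is covered by the partitioning slide, not by this snippet.

```python
import hashlib


def key_position(key: bytes) -> int:
    """Hash an opaque byte-string key to a 128-bit integer with MD5."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


# The same key always maps to the same position in the range [0, 2**128).
print(hex(key_position(b"cart:42")))
```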

6 Design (Partitioning and Replication)
To provide for incremental scalability, Dynamo uses consistent hashing to dynamically partition data across the storage hosts currently present
  – Each physical node contains a number of virtual nodes according to its performance
Dynamo uses optimistic replication to ensure availability and durability in an environment where machine crashes are a standard mode of operation
  – Each data object is replicated n times
      A typical value for n at Amazon is 3
  – Each node holds a list of nodes, called the preference list, for each key range
      A node from the top of the preference list becomes responsible for storing and replicating a new object with the key k (belonging to a certain range)
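The following is a minimal sketch of a consistent hashing ring with virtual nodes and a preference list, under stated assumptions: the class `Ring`, the token naming scheme `host#i`, and the rule "walk clockwise and collect n distinct hosts" are illustrative choices, not Dynamo's actual implementation.

```python
import bisect
import hashlib


def ring_hash(data: bytes) -> int:
    """Position on the ring: the MD5 digest read as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


class Ring:
    def __init__(self, vnodes_per_host: dict):
        # One token per virtual node; stronger hosts are given more tokens.
        self.tokens = sorted(
            (ring_hash(f"{host}#{i}".encode()), host)
            for host, count in vnodes_per_host.items()
            for i in range(count)
        )
        self.positions = [position for position, _ in self.tokens]

    def preference_list(self, key: bytes, n: int) -> list:
        """Walk clockwise from the key's position, collecting n distinct hosts."""
        start = bisect.bisect_right(self.positions, ring_hash(key))
        hosts = []
        for step in range(len(self.tokens)):
            host = self.tokens[(start + step) % len(self.tokens)][1]
            if host not in hosts:
                hosts.append(host)
            if len(hosts) == n:
                break
        return hosts


ring = Ring({"A": 4, "B": 4, "C": 8, "D": 4})  # C is assumed to be the stronger host
print(ring.preference_list(b"cart:42", n=3))
```

Skipping hosts that already appear in the list keeps the n replicas on distinct physical machines even when neighbouring tokens belong to the same host.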

7 System Design (Data Versioning)
Dynamo is designed to be an eventually consistent system that is always available for updates:
  – An update operation returns before all replica nodes have received and applied the update
  – Also, an update is accepted from a client even if it is apparent that the client is not aware of the latest version of the object
To handle multiple versions of an object:
  – Dynamo uses a version vector (called a vector clock), and
  – Always creates a new and immutable version of the updated object
Many object versions are reconciled syntactically by Dynamo itself
  – This happens whenever two replicas have ordered version vectors
Some reads, however, may return a set of conflicting object versions that have to be reconciled semantically by a client that knows the schema and business logic
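Syntactic reconciliation can be sketched as follows, assuming a version vector is represented as a dictionary from node name to counter. The helper names `descends` and `reconcile` are hypothetical. A version is dropped when another version's vector dominates it; versions with unordered vectors are all kept as conflicts for the client.

```python
def descends(a: dict, b: dict) -> bool:
    """True if vector a has seen every update recorded in vector b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(versions: list) -> list:
    """Keep only versions that are not ancestors of another version.

    versions is a list of (value, vector) pairs.
    """
    survivors = []
    for i, (value, clock) in enumerate(versions):
        dominated = any(
            j != i and other != clock and descends(other, clock)
            for j, (_, other) in enumerate(versions)
        )
        if not dominated and (value, clock) not in survivors:
            survivors.append((value, clock))
    return survivors


old = ("cart-v1", {"X": 1})
new = ("cart-v2", {"X": 2})
fork = ("cart-v3", {"X": 1, "Y": 1})
print(reconcile([old, new]))   # ordered vectors: only the newer version remains
print(reconcile([new, fork]))  # unordered vectors: both are returned as conflicts
```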

8 Design (get() and put())
Dynamo allows any storage node to receive a get() or put() request for any key
  – The node then uses the preference list to forward the request to a healthy, prioritized storage host (the coordinator)
To provide a consistent view to clients, Dynamo applies a quorum consistency protocol
  – Values of r = 2, w = 2, and n = 3 satisfy Amazon’s SLA, where r and w are the minimum numbers of storage hosts that must take part in a successful read or write, respectively
  – Parameters r, w, and n are configurable by the application
  – Applications needing the highest level of availability may set w = 1
      Then, a write request is rejected only if all nodes in the system are unavailable
  – To achieve a higher level of durability, w should be greater than 1
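The usual reasoning behind such settings is that a read quorum and a write quorum must share at least one replica, which holds when r + w > n. The slide does not state this condition, so treat it as background. The sketch below verifies it by brute force over all possible replica subsets; the function names are hypothetical.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Arithmetic condition for read and write quorums to intersect."""
    return r + w > n


def every_read_meets_every_write(n: int, r: int, w: int) -> bool:
    """Brute force: does every r-subset of replicas intersect every w-subset?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


for r, w in [(2, 2), (1, 1), (3, 1)]:
    print(r, w, quorums_overlap(3, r, w), every_read_meets_every_write(3, r, w))
```

With a sloppy quorum, where stand-in nodes may take part, this overlap is weaker than the strict arithmetic suggests.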

9 Design (Execution of put())
Upon receiving a write request put(k, o, V_c), the answering node i (called the coordinator):
  – Creates a new version vector V by augmenting the vector V_c from the context
  – Stores the new object (k, o, V) locally
  – Sends (k, o, V) to all other n – 1 storage nodes responsible for storing replicas of the new object
  – Waits for at least w – 1 nodes to acknowledge and then returns to the client
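The put() steps can be sketched as below. This is a simplified, sequential model: the class `Replica` and the function `coordinate_put` are hypothetical, "augmenting the vector" is assumed to mean incrementing the coordinator's own counter, and a real system would contact replicas in parallel and return as soon as w – 1 acknowledgements arrive.

```python
class Replica:
    def __init__(self, name: str, up: bool = True):
        self.name, self.up, self.store = name, up, {}

    def write(self, key, value, clock) -> bool:
        """Store a new immutable version; return False if the node is down."""
        if not self.up:
            return False
        self.store.setdefault(key, []).append((value, dict(clock)))
        return True


def coordinate_put(coordinator, others, key, value, context_clock, w):
    # Augment the context's vector: bump the coordinator's own counter.
    clock = dict(context_clock)
    clock[coordinator.name] = clock.get(coordinator.name, 0) + 1
    coordinator.write(key, value, clock)  # the local write counts as one ack
    acks = 1
    for replica in others:                # the other n - 1 replicas
        if replica.write(key, value, clock):
            acks += 1
    return acks >= w, clock               # success needs w acks in total


a, b, c = Replica("A"), Replica("B"), Replica("C", up=False)
print(coordinate_put(a, [b, c], "k", "o1", {}, w=2))
```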

10 Design (Execution of get())
When a storage host (the coordinator) receives a read request get(k):
  – It asks the top n – 1 other nodes from the preference list for the key k to read the requested data
  – It waits for r – 1 nodes to respond
  – If these nodes reply with conflicting versions of the object, the coordinator:
      Constructs a summary version vector V
      Returns ({o_1, ..., o_r}, V), where o_i, i = 1, ..., r, are the conflicting objects
  – If no conflicting versions of the object were returned:
      The coordinator returns the latest version of the object and its version vector to the client
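A sketch of the coordinator's read logic follows. The function `coordinate_get` is hypothetical, version vectors are modelled as dictionaries, and the summary vector is assumed to be the element-wise maximum of the surviving vectors, which the slide does not spell out.

```python
def descends(a: dict, b: dict) -> bool:
    """True if vector a has seen every update recorded in vector b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def coordinate_get(replies: list):
    """replies holds (value, vector) pairs from the r responding replicas."""
    unique = []
    for version in replies:
        if version not in unique:
            unique.append(version)
    # Discard versions whose vector is dominated by another reply.
    leaves = [
        (value, clock)
        for value, clock in unique
        if not any(other != clock and descends(other, clock) for _, other in unique)
    ]
    # Summary vector: element-wise maximum over the surviving vectors.
    summary = {}
    for _, clock in leaves:
        for node, count in clock.items():
            summary[node] = max(summary.get(node, 0), count)
    return [value for value, _ in leaves], summary


print(coordinate_get([("o1", {"A": 1}), ("o2", {"A": 2})]))
print(coordinate_get([("o2", {"A": 2}), ("o3", {"A": 1, "B": 1})]))
```

The first call returns a single latest version; the second returns both objects together with a summary vector that the client passes back as context in its next put().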

11 Design (Membership Changes)
Dynamo uses a gossip network communication protocol to transfer messages between nodes
Node outages (due to a failure or maintenance) are often transient, although they may last for extended intervals
A node outage rarely signifies a permanent departure and therefore should not result in rebalancing of the partition assignment
For these reasons, Dynamo uses an explicit mechanism for adding nodes to and removing nodes from a Dynamo consistent hashing ring

12 Design (Node Addition and Removal) (1)
An administrator uses a command line tool to connect to a node and issues a command for a node addition or removal
The node stores the membership change
The gossip protocol is used to propagate membership changes
  – Each second, a node chooses a random peer with which to exchange information about membership changes
When a new node joins the consistent hashing ring, a token is chosen for each of its virtual nodes and stored permanently
  – Tokens are spread to other nodes by gossip, together with the membership change information
  – With this information, nodes are able to send a request to a node responsible for the key range
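The spread of a membership change by gossip can be simulated with a small sketch. Everything here is an assumption made for illustration: each node's knowledge is a set of opaque change records, one round stands for the once-per-second exchange, and both partners end a exchange with the union of what they knew.

```python
import random


def gossip_round(views: dict, rng: random.Random) -> None:
    """Each node picks one random peer and the two merge their knowledge."""
    for node in list(views):
        peer = rng.choice([p for p in views if p != node])
        merged = views[node] | views[peer]
        views[node] = set(merged)
        views[peer] = set(merged)


def rounds_until_converged(views: dict, rng: random.Random, limit: int = 1000):
    """Run rounds until every node knows every change; None if limit is hit."""
    everything = set().union(*views.values())
    for round_no in range(1, limit + 1):
        gossip_round(views, rng)
        if all(view == everything for view in views.values()):
            return round_no
    return None


views = {f"n{i}": set() for i in range(8)}
views["n0"].add(("add", "n8"))  # the administrator told n0 about a new node
print(rounds_until_converged(views, random.Random(1)))
```

Because every informed node can inform others in the same round, the number of informed nodes grows quickly, which is why a single administrative command on one node is enough.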

13 Design (Node Addition and Removal) (2)
Adding nodes to the system changes the ownership of key ranges on the ring
When a node determines that it is no longer responsible for a key range, it transfers the objects to the new node
When a node is removed, database objects are relocated in the reverse process
Temporary failure detection is performed during gossiping
  – To avoid failed attempts during get() and put() operations and data transfers, a node A considers a node B temporarily inaccessible if B does not respond to A’s gossip message

14 Design (Handling of Failures)
Hinted handoff is a technique used to compensate for not relocating the database objects of temporarily failed nodes
Dynamo’s quorum is a sloppy one, since the first n healthy nodes from the preference list for the key are used when executing a read or write operation
  – Some of these n nodes may not even be responsible for the key
Hence, a new object may be written to a node j that is not responsible for the key, instead of to the node i that is the intended recipient of the object’s replica
The new object is stored along with a hint about the intended recipient node i of the replica
When node i revives, node j sends the object to it and then deletes its own copy
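Hinted handoff can be sketched as below. The names `Node`, `sloppy_write`, and `deliver_hints` are hypothetical, and the choice of which node is down in the usage example is an assumption made only to exercise the code.

```python
class Node:
    def __init__(self, name: str):
        self.name, self.up = name, True
        self.store = {}   # replicas this node is responsible for
        self.hinted = []  # (intended_node, key, value) held on behalf of others


def sloppy_write(preference_list: list, n: int, key, value) -> None:
    """Write to the first n healthy nodes; stand-ins keep a hint instead."""
    intended = preference_list[:n]
    missing = [node for node in intended if not node.up]
    for node in intended:
        if node.up:
            node.store[key] = value
    standins = [node for node in preference_list[n:] if node.up]
    for owner, standin in zip(missing, standins):
        standin.hinted.append((owner, key, value))


def deliver_hints(standin: Node) -> None:
    """Hand hinted objects back to revived owners and delete the local copy."""
    remaining = []
    for owner, key, value in standin.hinted:
        if owner.up:
            owner.store[key] = value
        else:
            remaining.append((owner, key, value))
    standin.hinted = remaining


c, d, e, f = Node("C"), Node("D"), Node("E"), Node("F")
d.up = False                            # assumed: D is temporarily down
sloppy_write([c, d, e, f], 3, "k", "o")
print([(owner.name, key) for owner, key, _ in f.hinted])
d.up = True
deliver_hints(f)
print(d.store, f.hinted)
```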

15 Hinted Handoff (Example)
[Figure: a consistent hashing ring with nodes A to H and replication factor n = 3. The preference list for the key k is C, D, E, F, G, ... Labels in the figure mark the coordinator, a temporarily down node, the node where the object (k, o) is stored as a hinted handoff, and a node that is not responsible for the range BC.]

16 Design (Replica Synchronization)
Hinted handoff works well if system membership changes are infrequent and failures are transient
There are scenarios in which hinted replicas may become unavailable before they can be returned to the original replica node
To detect inconsistencies between replicas faster and to minimize the amount of data transferred between nodes, Dynamo uses Merkle trees
Merkle trees are used to discover differences between the key sets of the same key range held on different nodes, in the same way as in Cassandra

17 Design (Anti-Entropy Algorithm)
Each physical node maintains a separate Merkle tree for each key range hosted by one of its virtual nodes
Two nodes exchange the roots of the Merkle trees for the key ranges they have in common
By applying the same tree traversal scheme as in Cassandra, the nodes determine whether they have any differences
  – If a difference exists, the nodes apply a corresponding corrective action by copying the missing object
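The root exchange and traversal can be sketched as follows. This is an illustration under assumptions: SHA-256 is used as the tree hash, a key range is split into a power-of-two number of buckets, and the helper names are hypothetical. Only subtrees whose hashes differ are descended into, so identical replicas are detected by comparing a single root hash.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_hashes(objects: dict, buckets: int = 4) -> list:
    """Leaf hashes: one per bucket of the key range (buckets is a power of two)."""
    contents = [[] for _ in range(buckets)]
    for key, value in sorted(objects.items()):
        contents[h(key)[0] % buckets].append(key + b"=" + value)
    return [h(b"|".join(items)) for items in contents]


def build(leaves: list) -> list:
    """Return the tree as a list of levels: leaves first, root level last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_buckets(a: list, b: list) -> list:
    """Compare roots, then descend only into subtrees whose hashes differ."""
    top = len(a) - 1
    if a[top][0] == b[top][0]:
        return []
    suspects = [0]
    for level in range(top - 1, -1, -1):
        suspects = [
            child
            for parent in suspects
            for child in (2 * parent, 2 * parent + 1)
            if a[level][child] != b[level][child]
        ]
    return suspects


replica_1 = {b"k1": b"v1", b"k2": b"v2", b"k3": b"v3"}
replica_2 = dict(replica_1, **{})
replica_2[b"k2"] = b"stale"
print(differing_buckets(build(bucket_hashes(replica_1)), build(bucket_hashes(replica_2))))
```

The returned bucket indexes identify which parts of the key range need their objects compared and copied.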

18 Failure of a Whole Data Centre
A highly available storage system should be able to handle the failure of an entire data centre
  – Data centre failures happen due to power outages, cooling failures, network failures, and natural disasters
Dynamo is configured in such a way that each object is replicated across multiple data centres
  – Nodes in the preference list for a key range belong to multiple data centres

19 Summary (1)
Dynamo is one of Amazon’s CDBMSs
It has been in use since 2006 and has influenced a number of other CDBMSs, including Cassandra
Data model: key-value with a very simple API
Data partitioning and replication: consistent hash ring with optimistic replication
Data versioning: vector clocks

20 Summary (2)
Network communication: gossip protocol
Handling of failures: hinted handoff is used to compensate for not relocating the database objects of temporarily failed nodes
Replica synchronization: Merkle trees are used to detect inconsistencies
Anti-entropy algorithm: two nodes exchange Merkle tree roots for the key ranges they have in common, find differences in the key ranges (if any), and apply corrective actions

