DynamoDB

Dynamo
Amazon runs a worldwide web store that serves tens of millions of customers at peak times, using tens of thousands of servers located around the world. As such, reliability and scalability are arguably the two most important features of its data management.

Dynamo
Amazon uses a service-oriented architecture consisting of hundreds of services. This creates a need for storage technologies that are always available: customers should be able to view and add items to their shopping carts even if hardware is failing on the backend. The cart service must therefore always be able to write to and read from its data store, and this data must be available from multiple locations.
Dynamo is designed as an eventually consistent, "always writeable" data store: complexity is handled at read time rather than write time, so a customer can always write to their shopping cart. The application handles conflict resolution, since it is aware of the data schema, instead of leaving it to the data store.
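
A toy sketch of the kind of application-level reconciliation described here. It mirrors the shopping-cart case with a simple "add wins" merge (the union of the item IDs seen in all divergent versions); the class and merge rule are illustrative, not Amazon's actual code.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative application-side reconciliation for an "always writeable" cart:
// when a read returns several divergent versions, merge them with a rule the
// application understands. Here the rule is "add wins": keep the union of the
// item IDs, so no add is ever lost (a deleted item may occasionally reappear).
final class CartReconciler {
    static Set<String> merge(List<Set<String>> divergentVersions) {
        Set<String> merged = new HashSet<>();
        for (Set<String> version : divergentVersions) {
            merged.addAll(version);
        }
        return merged;
    }
}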

System Requirements
Query model: the ability to read and write data that is uniquely identified by a key.
ACID (guarantees that transactions are processed reliably): Atomicity, Consistency, Isolation, Durability.
Dynamo is used only by internal services, so there are no security-related requirements such as authentication and authorization. This also helps efficiency, since it means one less request per operation.

Key Design Principles
Incremental scalability: scale out one node at a time with minimal impact on the system.
Symmetry: every node in Dynamo has the same set of responsibilities as its peers.
Decentralization: favors peer-to-peer techniques over centralized control, so one node's problems will not take the entire system down with it.
Heterogeneity: work distribution is proportional to the capabilities of the individual servers, so no additional tuning is needed when nodes with higher capacity or capabilities are added.

DynamoDB

DynamoDB
Not based on Dynamo, but uses similar design principles.
DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
You can use DynamoDB to create a database table that can store and retrieve any amount of data and serve any level of request traffic.
DynamoDB automatically spreads the data and traffic for the table over a sufficient number of servers to handle the request capacity specified by the customer and the amount of data stored, while maintaining consistent and fast performance.

DynamoDB
Automatic scaling.
Consistency: writes are always consistent; reads are eventually consistent by default.
Durability: written to disk and not just memory (hence SSD).
Availability.

Provisioned Throughput
Allows for predictable performance.
Reserve IOPS per table; set at creation and adjustable through the API at any time (e.g., 500 reads per second and 1,000 writes per second).
Pricing per 1 KB item: $0.01 per hour per 10 writes per second; $0.01 per hour per 50 strongly consistent reads per second.
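
A worked example using only the figures on this slide (the prices are historical and purely illustrative): provisioning 1,000 writes per second on 1 KB items costs about 1,000 / 10 × $0.01 = $1.00 per hour, and 500 strongly consistent reads per second about 500 / 50 × $0.01 = $0.10 per hour, for roughly $1.10 per hour in throughput charges.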

DynamoDB

DynamoDB Data Model
A NoSQL database: a collection of tables, items, and attributes.
Attributes are name-value pairs and can be single-valued or multi-valued.
Schemas are not fixed: only the key must be specified, and each item may have a different number of attributes.

Example Items

{ Id = 202
  ProductName = "21-Bicycle 202"
  Description = "202 description"
  BicycleType = "Road"
  Brand = "Brand-Company A"
  Price = 200
  Gender = "M"
  Color = ["Green", "Black"]
  ProductCategory = "Bike" }

{ Id = 201
  ProductName = "18-Bicycle 201"
  Description = "201 description"
  BicycleType = "Road"
  Brand = "Brand-Company A"
  Price = 100
  Gender = "M"
  Color = ["Red", "Black"]
  ProductCategory = "Bike" }

perguide/DataModel.html

Example – Basic Amazon

Creating a Table
Log into Amazon Web Services (AWS), then go to Services -> DynamoDB.

Specifying a Primary Key
Hash key type: an unordered hash index on the primary key attribute.
Hash-and-range key type: a primary key made of two attributes, with an unordered hash index on the hash attribute and a sorted range index on the range attribute.
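
As an illustration, the two key types might be declared like this with the AWS SDK for Java 1.x; the attribute names (Id, CustomerId, OrderDate) are hypothetical.

import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import java.util.Arrays;
import java.util.List;

public class PrimaryKeyExamples {
    // Hash key type: a single partition (hash) attribute.
    static List<KeySchemaElement> hashOnly() {
        return Arrays.asList(
                new KeySchemaElement().withAttributeName("Id").withKeyType(KeyType.HASH));
    }

    // Hash-and-range key type: a hash attribute plus a sorted range attribute.
    static List<KeySchemaElement> hashAndRange() {
        return Arrays.asList(
                new KeySchemaElement().withAttributeName("CustomerId").withKeyType(KeyType.HASH),
                new KeySchemaElement().withAttributeName("OrderDate").withKeyType(KeyType.RANGE));
    }
}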

Secondary Indexes
Local secondary indexes: indexes on non-primary-key attributes for quickly retrieving records within a hash partition (items that share the same hash value in their primary key).
Global secondary indexes: allow querying over the whole table on any attributes, not just within a partition as local secondary indexes do.
Important for horizontal scaling.
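
A sketch of how the two index kinds might be declared with the AWS SDK for Java 1.x; the index names, attribute names, and throughput numbers are illustrative, and these objects would be attached to the CreateTableRequest when the table is created.

import com.amazonaws.services.dynamodbv2.model.GlobalSecondaryIndex;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.LocalSecondaryIndex;
import com.amazonaws.services.dynamodbv2.model.Projection;
import com.amazonaws.services.dynamodbv2.model.ProjectionType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;

public class SecondaryIndexExamples {
    // Global secondary index on a non-key attribute, so the whole table can be
    // queried by ProductCategory instead of only by the primary key.
    static GlobalSecondaryIndex categoryIndex() {
        return new GlobalSecondaryIndex()
                .withIndexName("ProductCategory-index")
                .withKeySchema(new KeySchemaElement()
                        .withAttributeName("ProductCategory").withKeyType(KeyType.HASH))
                .withProjection(new Projection().withProjectionType(ProjectionType.ALL))
                .withProvisionedThroughput(new ProvisionedThroughput()
                        .withReadCapacityUnits(5L).withWriteCapacityUnits(5L));
    }

    // Local secondary index: shares the table's hash key and adds an alternate
    // range key, so items in one hash partition can be retrieved sorted by Price.
    static LocalSecondaryIndex priceIndex() {
        return new LocalSecondaryIndex()
                .withIndexName("Price-index")
                .withKeySchema(
                        new KeySchemaElement().withAttributeName("Id").withKeyType(KeyType.HASH),
                        new KeySchemaElement().withAttributeName("Price").withKeyType(KeyType.RANGE))
                .withProjection(new Projection().withProjectionType(ProjectionType.KEYS_ONLY));
    }
}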

Throughput Capacity
Throughput capacity can change with your needs.
Free tier: 100 MB of free storage, plus up to 5 writes/second and 10 reads/second of throughput capacity (432,000 writes / 864,000 reads for free every day).
Capacity units required:
Reads: number of item reads per second × item size (counted in 4 KB units). If you use eventually consistent reads, you get twice as many reads per second.
Writes: number of item writes per second × item size (counted in 1 KB units).
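
A rough worked example, assuming item sizes are rounded up to the nearest 4 KB block for reads and 1 KB block for writes (the usual DynamoDB accounting): reading 80 items per second at 3 KB each needs 80 × 1 = 80 read capacity units with strongly consistent reads (about half that with eventually consistent reads), while writing 80 items per second at 3 KB each needs 80 × 3 = 240 write capacity units.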

Connecting to DynamoDB
Create an Amazon AWS account.
Use the AWS SDK (e.g., via the Eclipse plugin).
Create a client that is used to connect to the database.
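
A minimal sketch of the last step with the AWS SDK for Java 1.x; the region is an arbitrary choice, and credentials come from the SDK's default provider chain.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;

public class DynamoDBClientFactory {
    // Builds a client using the default credentials chain (environment variables,
    // ~/.aws/credentials, or an instance profile) and an explicitly chosen region.
    public static AmazonDynamoDB newClient() {
        return AmazonDynamoDBClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)   // illustrative region
                .build();
    }
}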

Creating Tables
Connect to the client.
Create a CreateTableRequest with the key type and key schema.
Specify throughput requirements.
Execute createTable on the client.
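
Putting those steps together, a sketch with the AWS SDK for Java 1.x; the "ProductCatalog" table name, the numeric Id hash key, and the throughput numbers are assumptions for illustration.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;

public class CreateProductTable {
    // Creates a table keyed by a numeric hash attribute "Id" with fixed provisioned throughput.
    public static void create(AmazonDynamoDB client) {
        CreateTableRequest request = new CreateTableRequest()
                .withTableName("ProductCatalog")
                .withKeySchema(new KeySchemaElement()
                        .withAttributeName("Id").withKeyType(KeyType.HASH))
                .withAttributeDefinitions(new AttributeDefinition()
                        .withAttributeName("Id").withAttributeType(ScalarAttributeType.N))
                .withProvisionedThroughput(new ProvisionedThroughput()
                        .withReadCapacityUnits(10L).withWriteCapacityUnits(5L));
        client.createTable(request);
    }
}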

Adding an Item
Use the put operation:
Connect to the database.
Create a PutItemRequest.
Execute putItem using the request.
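
A sketch of the put operation with the AWS SDK for Java 1.x, reusing the bicycle item from the earlier example slide and the assumed "ProductCatalog" table.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;
import java.util.HashMap;
import java.util.Map;

public class PutBicycleItem {
    // Writes a subset of the attributes from the Id = 202 example item.
    public static void put(AmazonDynamoDB client) {
        Map<String, AttributeValue> item = new HashMap<>();
        item.put("Id", new AttributeValue().withN("202"));
        item.put("ProductName", new AttributeValue("21-Bicycle 202"));
        item.put("Price", new AttributeValue().withN("200"));
        item.put("Color", new AttributeValue().withSS("Green", "Black")); // multi-valued attribute
        client.putItem(new PutItemRequest()
                .withTableName("ProductCatalog")
                .withItem(item));
    }
}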

Getting Data – Queries in DynamoDB
Query vs. scan.
Count and limit.
Read consistency: the default is eventually consistent reads.
To perform a query or scan: specify the condition, create the request object, and run query or scan on the client.
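
A sketch of a query with the AWS SDK for Java 1.x (a reasonably recent 1.x release that supports key condition expressions); table and attribute names follow the earlier assumed examples.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.QueryResult;
import java.util.Collections;

public class QueryById {
    // Queries by the hash key and explicitly asks for a strongly consistent read;
    // leaving ConsistentRead unset gives the eventually consistent default.
    public static QueryResult query(AmazonDynamoDB client) {
        QueryRequest request = new QueryRequest()
                .withTableName("ProductCatalog")
                .withKeyConditionExpression("Id = :id")
                .withExpressionAttributeValues(
                        Collections.singletonMap(":id", new AttributeValue().withN("202")))
                .withConsistentRead(true)
                .withLimit(10);
        return client.query(request);   // the result exposes getItems() and getCount()
    }
}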

Delete Operations
Connect to the client.
Create a DeleteItemRequest, specifying the table name and key.
Execute client.deleteItem(deleteItemRequest).
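
The same pattern for a delete, again assuming the illustrative "ProductCatalog" table and numeric Id key.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.DeleteItemRequest;
import java.util.Collections;

public class DeleteById {
    // Deletes the item whose primary key is Id = 202.
    public static void delete(AmazonDynamoDB client) {
        client.deleteItem(new DeleteItemRequest()
                .withTableName("ProductCatalog")
                .withKey(Collections.singletonMap("Id", new AttributeValue().withN("202"))));
    }
}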

Other Operations
Conditional writes: update only if an item meets a certain condition. This helps with concurrency, since it is safe to rerun the operation if a response is not received. Supported by putItem, updateItem, and deleteItem.
Atomic counters: a numeric attribute can be incremented or decremented using updateItem.
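
A sketch of both operations with the AWS SDK for Java 1.x (update and condition expressions require a reasonably recent 1.x release); attribute names such as Price and ViewCount are illustrative.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.UpdateItemRequest;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ConditionalAndAtomicUpdates {
    // Conditional write: only change the price if it still has the value we read.
    public static void conditionalPriceUpdate(AmazonDynamoDB client) {
        Map<String, AttributeValue> values = new HashMap<>();
        values.put(":newPrice", new AttributeValue().withN("180"));
        values.put(":expected", new AttributeValue().withN("200"));
        try {
            client.updateItem(new UpdateItemRequest()
                    .withTableName("ProductCatalog")
                    .withKey(Collections.singletonMap("Id", new AttributeValue().withN("202")))
                    .withUpdateExpression("SET Price = :newPrice")
                    .withConditionExpression("Price = :expected")
                    .withExpressionAttributeValues(values));
        } catch (ConditionalCheckFailedException e) {
            // Another writer changed the price first; safe to re-read and retry.
        }
    }

    // Atomic counter: increment a numeric attribute without a read-modify-write cycle.
    public static void incrementViewCount(AmazonDynamoDB client) {
        client.updateItem(new UpdateItemRequest()
                .withTableName("ProductCatalog")
                .withKey(Collections.singletonMap("Id", new AttributeValue().withN("202")))
                .withUpdateExpression("ADD ViewCount :inc")
                .withExpressionAttributeValues(
                        Collections.singletonMap(":inc", new AttributeValue().withN("1"))));
    }
}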

Dynamo Design System Architecture & Logic

Review Partitioning Algorithm

Consistent Hashing

Preference List

Virtual Nodes

Virtual Nodes (Dilution)

Replication

Data Versioning (Eventual Consistency)

Data Versioning (Vector Clock)

Data Versioning

Reconciliation

Sloppy Quorum (Coordinator)

Sloppy Quorum (N)

Sloppy Quorum (R, W, N)

Membership and Failure Detection

99.9%
Amazon measures its services at the 99.9th percentile. Because of the nature of its business, its tools are targeted at controlling performance at this 99.9th percentile.

Gossip Protocol
Amazon uses a gossip-based protocol for membership and failure detection. In a gossip-based protocol, each node contacts another node at random and "gossips" information to it about other nodes. As nodes gossip with each other, information about membership changes gets propagated throughout the system.

Gossip Protocol
The Gossip-Enabled Monitoring Service (GEMS), published in 2006, goes into detail about the implementation of a gossip protocol. The system was designed for failure detection. Each node independently maintains three structures: a gossip list, a suspect list, and a suspect matrix.

Gossip Protocol
Three parameters are used for failure detection:
Gossip time: the time interval between two consecutive gossip messages sent out by a node.
Cleanup time: the interval between the time liveness information was last received for a particular node and the time it is suspected to have failed.
Consensus time: the time interval after which consensus is reached about a failed node.
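
A minimal sketch of the basic mechanism described on these slides, using the gossip-time and cleanup-time parameters; it omits the suspect matrix and the consensus step, and the class is illustrative rather than GEMS or Dynamo code.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

class GossipNode {
    private final String self;
    private final long cleanupTimeMs;   // silence threshold before suspecting a peer
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();  // node -> counter
    private final Map<String, Long> lastUpdated = new ConcurrentHashMap<>(); // node -> local time
    private final Random rng = new Random();

    GossipNode(String self, long cleanupTimeMs) {
        this.self = self;
        this.cleanupTimeMs = cleanupTimeMs;
    }

    // One gossip round (run every "gossip time"): bump our own heartbeat and
    // send the whole table to one randomly chosen peer.
    void gossipOnce(List<GossipNode> peers) {
        heartbeats.merge(self, 1L, Long::sum);
        lastUpdated.put(self, System.currentTimeMillis());
        GossipNode peer = peers.get(rng.nextInt(peers.size()));
        peer.receive(new HashMap<>(heartbeats));
    }

    // Merge a received table, keeping the larger heartbeat for each node.
    void receive(Map<String, Long> remote) {
        long now = System.currentTimeMillis();
        remote.forEach((node, hb) -> {
            if (hb > heartbeats.getOrDefault(node, 0L)) {
                heartbeats.put(node, hb);
                lastUpdated.put(node, now);
            }
        });
    }

    // Peers whose heartbeat has not advanced within the cleanup time are suspected.
    Set<String> suspects() {
        long now = System.currentTimeMillis();
        return lastUpdated.entrySet().stream()
                .filter(e -> !e.getKey().equals(self) && now - e.getValue() > cleanupTimeMs)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }
}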

Gossip Protocol
For a node to be declared failed, all other nodes have to reach a consensus on its failure; the opinions of suspected nodes are discarded when trying to reach that consensus. Once consensus has been reached, the information is broadcast to all other nodes in the system. Each node also maintains its own list of live nodes, which changes only once it has received information about a consensus. In addition to propagating changes through a broadcast, nodes can propagate their live lists to other nodes as well.

Gossip Protocol
Two other variations:
Layered gossiping: similar to the gossip protocol described above, but nodes are grouped in a layered pattern; consensus is reached within a group, while liveness is broadcast to all nodes.
Biased gossiping: nodes are more likely to gossip to nodes that are closer to themselves (in terms of network delay).

Dynamo Failure Detection
In Dynamo, every node keeps a membership list of all nodes in the system. A node is added through a command-line tool on an existing Dynamo node; that node updates its membership list to include the new node and propagates the membership change through the gossip protocol. A new node maps itself onto the hash space, and this mapping information is reconciled at the same time as membership information.

Dynamo Failure Detection
Dynamo also has a local notion of failure detection: node A considers node B failed if node B does not respond to node A's messages. As soon as node A considers node B failed, it starts sending requests along an alternate route, while periodically checking B for responsiveness. The gossip protocol is then used to propagate only explicit join and leave events, while individual nodes detect communication failures locally.
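
A rough sketch of this local failure detection, under the assumption that request timeouts and probe scheduling are handled elsewhere; the class and method names are invented for illustration.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LocalFailureDetector {
    private final Map<String, Long> failedSince = new ConcurrentHashMap<>(); // node -> when it stopped answering
    private final Map<String, Long> lastProbe = new ConcurrentHashMap<>();
    private final long probeIntervalMs;

    LocalFailureDetector(long probeIntervalMs) {
        this.probeIntervalMs = probeIntervalMs;
    }

    // Route requests only to nodes we currently consider reachable.
    boolean isReachable(String node) {
        return !failedSince.containsKey(node);
    }

    // Called when a request to the node times out: mark it failed locally.
    void onTimeout(String node) {
        failedSince.putIfAbsent(node, System.currentTimeMillis());
    }

    // Called when the node answers again: it is no longer considered failed.
    void onResponse(String node) {
        failedSince.remove(node);
        lastProbe.remove(node);
    }

    // Periodically re-probe a node we consider failed to see whether it recovered.
    boolean dueForProbe(String node) {
        if (isReachable(node)) return false;
        long now = System.currentTimeMillis();
        if (now - lastProbe.getOrDefault(node, 0L) >= probeIntervalMs) {
            lastProbe.put(node, now);
            return true;
        }
        return false;
    }
}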

Conclusion / Insight

Strengths of DynamoDB
Well suited for dynamic and large data models.
Scalability allows it to shrink and grow incrementally and seamlessly as needed, by easily adding and removing nodes from the network.
Heterogeneity allows it to efficiently utilize many disparate server bases to the best of their ability, without requiring user-side manual adjustment of each node's workload.

Strengths of DynamoDB Cont'd
At the forefront of cloud computing: decentralization, small entry size, consistency, and stability all lend themselves to this, and with little precedent for cloud database systems yet, it has a chance to carve out a large niche for itself.
Favors simplicity in its implementation, revolving around key-value pairs and hashing.
Its SSD-backed implementation gives it a level of reliability that other cloud database services don't offer.

Strengths of DynamoDB
DynamoDB's main selling point is its adaptive throughput; the N, W, and R value "knobs" allow you to control latency. This is particularly useful for applications with highly variable demand.
Useful for:
Companies experiencing large growth.
Applications whose demand varies widely during the day.
Companies with little room for failure in reads/writes.
Applications with easily resolvable versioning merges.

Example

Example User – Stock Trading App
Characteristics:
Users' portfolios grow the longer they use the application, so it needs to scale to a large degree while managing load; automatic partitioning handles this hurdle.
It must handle heavy traffic during trading hours and can scale back significantly in the evening.
Versioning is easily reconcilable: a stock was either bought/sold or it wasn't, so branch merges can simply merge-sort all the different transactions.

Stock Trading App Cont'd
Needs:
Reliable reads and writes.
Consistency AND availability: every transaction, since it deals with money, must succeed and be processed in an orderly fashion.
Must be able to weather the failure of a node, so synchronization is essential.
DynamoDB can satisfy all of these.

Stock Trading App Cont'd
Additional benefits: atomicity and strong consistency allow the user to always get the current prices and up-to-date information on their portfolios and trades.

Flaws with the Example
A relational database may suit stock trading better: the amount of information needed for each stock is fairly consistent (name, trading name, price, trend, etc.).
Limitations on searching could hurt here, if we want to search on more than the indexes or the hash key.

Weaknesses of DynamoDB
64 KB limit on item size, so it is more suited to many small entries. For instance, large BLOBs or long text entries would not work well with DynamoDB, particularly where replication for decentralization is concerned; you would want to go with Bigtable or HBase in such a case.
It also doesn't accept binary data (only strings and numbers), which can lead to some inefficiencies.

Weaknesses Cont'd
Slight delay between table creation and usability (a product of gossip-based distribution).
Sacrifices flexibility and cost-efficiency for performance.
Best suited for straightforward, high-traffic workloads that put a premium on low latency and high performance.