
DynamoDB

Dynamo
- Amazon runs a worldwide web store that serves tens of millions of customers at peak times, using tens of thousands of servers located around the world.
- As such, reliability and scalability are arguably the two most important features of its data management.
http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf

Dynamo
- Amazon uses a service-oriented architecture consisting of hundreds of services.
- These services need storage technologies that are always available.
  - Customers should be able to view and add items to their shopping carts even if hardware is failing on the back end.
  - Therefore, the cart service must always be able to read and write its data store, and this data must always be available from multiple locations.
- Dynamo is designed to be an eventually consistent data store.
- Dynamo is an “always writeable” data store.
  - Complexity is handled at read time, not at write time.
  - Customers should always be able to write to their shopping cart.
  - The application handles conflict resolution, since it is aware of the data schema (instead of letting the data store handle it).
http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
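Application-side conflict resolution can be illustrated with a small sketch. It assumes a hypothetical cart schema (item id mapped to quantity) and a hypothetical merge rule (keep every item seen in any version, with the largest quantity); neither is taken from the slides.

```python
from typing import Dict, Iterable

# Hypothetical cart schema: item id -> quantity.
Cart = Dict[str, int]


def merge_carts(versions: Iterable[Cart]) -> Cart:
    """Merge divergent cart versions at read time.

    Every item that appears in any version is kept; when an item appears
    in several versions, the largest quantity wins.
    """
    merged: Cart = {}
    for cart in versions:
        for item, quantity in cart.items():
            merged[item] = max(merged.get(item, 0), quantity)
    return merged


# Two replicas accepted writes independently while they could not talk to each other.
replica_a = {"book": 1, "lamp": 2}
replica_b = {"book": 1, "pen": 3}
print(merge_carts([replica_a, replica_b]))  # {'book': 1, 'lamp': 2, 'pen': 3}
```

A merge of this kind never loses an added item, which is why it suits an "always writeable" store; the trade-off noted in the Dynamo paper is that a deleted item can resurface.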

System Requirements
- Query model: read and write operations on data that is uniquely identified by a key
- ACID (properties that guarantee transactions are processed reliably): atomicity, consistency, isolation, durability
  - Dynamo targets applications that accept weaker consistency in exchange for high availability; it provides no isolation guarantees and permits only single-key updates
- Dynamo is only used by internal services, so there are no security-related requirements such as authentication and authorization
  - This also helps efficiency, since it means one less request per operation
http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf

Key Design Principles
- Incremental scalability
  - Scale out one node at a time with minimal impact on the system
- Symmetry
  - Every node in Dynamo should have the same set of responsibilities as its peers
- Decentralization
  - Favors peer-to-peer techniques over centralized control
  - One node's failure will not cause an outage or take the entire system down with it
- Heterogeneity
  - Work distribution is proportional to the capabilities of the individual servers
  - No additional work is needed when adding new nodes with higher capacity or capabilities
http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf

DynamoDB http://aws.amazon.com/dynamodb/

DynamoDB
- Not based on Dynamo, but uses similar design principles
- DynamoDB is a fully managed NoSQL database service
- Provides fast and predictable performance with seamless scalability
- You can use DynamoDB to create a database table that can store and retrieve any amount of data and serve any level of request traffic
- DynamoDB automatically spreads the data and traffic for the table over a sufficient number of servers to handle the request capacity specified by the customer and the amount of data stored, while maintaining consistent and fast performance
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html

DynamoDB
- Automatic scaling
- Consistency
  - Writes are always consistent
  - Reads are eventually consistent by default
- Durability
  - Written to disk rather than kept only in memory (hence SSD)
- Availability

Provisioned Throughput
- Allows for predictable performance
- Reserve IOPS (per table)
- Set at creation; rescale through the API whenever you want
  - e.g. 500 reads per second, 1,000 writes per second
- Per 1 KB item:
  - $0.01 per hour per 10 writes per second
  - $0.01 per hour per 50 strongly consistent reads per second
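The pricing arithmetic can be worked through for the example figures on the slide. This is a sketch using only the rates quoted above; the function name is hypothetical, and billing partial blocks as whole blocks is an assumption.

```python
import math

# Rates quoted on the slide (per 1 KB item), expressed in cents per hour.
CENTS_PER_WRITE_BLOCK = 1  # $0.01 per hour per 10 writes per second
CENTS_PER_READ_BLOCK = 1   # $0.01 per hour per 50 strongly consistent reads per second
WRITES_PER_BLOCK = 10
READS_PER_BLOCK = 50


def hourly_cost_cents(reads_per_second: int, writes_per_second: int) -> int:
    """Hourly cost in cents for the reserved throughput.

    Assumption: a partial block is billed as a whole block.
    """
    write_blocks = math.ceil(writes_per_second / WRITES_PER_BLOCK)
    read_blocks = math.ceil(reads_per_second / READS_PER_BLOCK)
    return write_blocks * CENTS_PER_WRITE_BLOCK + read_blocks * CENTS_PER_READ_BLOCK


# 500 reads/second and 1,000 writes/second: 10 read blocks + 100 write blocks.
print(hourly_cost_cents(500, 1000))  # 110 cents, i.e. $1.10 per hour
```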

DynamoDB

DynamoDB Data Model
- NoSQL database
- A collection of tables, items, and attributes
- Each attribute is a name-value pair
  - Single-valued
  - Multi-valued
- Schema: only the key needs to be specified
  - Schemas are not fixed; each item may have a different number of attributes

Example Items

{
  Id = 202
  ProductName = "21-Bicycle 202"
  Description = "202 description"
  BicycleType = "Road"
  Brand = "Brand-Company A"
  Price = 200
  Gender = "M"
  Color = [ "Green", "Black" ]
  ProductCategory = "Bike"
}

{
  Id = 201
  ProductName = "18-Bicycle 201"
  Description = "201 description"
  BicycleType = "Road"
  Brand = "Brand-Company A"
  Price = 100
  Gender = "M"
  Color = [ "Red", "Black" ]
  ProductCategory = "Bike"
}

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataModel.html

Example – Basic Amazon

Creating a Table
- Log in to Amazon Web Services (AWS)
- Services -> DynamoDB

Specifying a Primary Key
- Hash key type
  - Unordered hash index on the primary key
- Hash and range key type
  - Primary key made of two attributes
  - Unordered hash index on the hash key
  - Sorted range index on the range key

Secondary Indexes
- Local secondary indexes: indexes on non-primary-key attributes for quickly retrieving records within a hash partition (items that share the same hash value in their primary key)
- Global secondary indexes: allow querying over the whole table on any attribute, not just within a partition as local secondary indexes do
  - Important for horizontal scaling

Throughput Capacity
- Throughput capacity can change with your needs
- Standard: 100 MB of free storage, up to 5 writes/second and 10 reads/second of throughput capacity (432,000 writes and 864,000 reads for free every day)

Capacity units required for    How to calculate
Reads                          Number of item reads per second × 4 KB item size (with eventually consistent reads, you get twice as many reads per second)
Writes                         Number of item writes per second × 1 KB item size
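The table above can be turned into a small calculator. This is a sketch: the function names are hypothetical, and rounding each item up to a whole 4 KB (read) or 1 KB (write) block is an assumption added to the slide's formula.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def read_capacity_units(reads_per_second: int, item_size_kb: float,
                        eventually_consistent: bool = False) -> int:
    """Read units: reads per second x item size in 4 KB blocks.

    Assumption: item size is rounded up to the next whole 4 KB block.
    Eventually consistent reads need half as many units.
    """
    units = reads_per_second * math.ceil(item_size_kb / 4)
    if eventually_consistent:
        units = math.ceil(units / 2)
    return units


def write_capacity_units(writes_per_second: int, item_size_kb: float) -> int:
    """Write units: writes per second x item size in 1 KB blocks (rounded up)."""
    return writes_per_second * math.ceil(item_size_kb)


print(read_capacity_units(10, 6))                              # 20
print(read_capacity_units(10, 6, eventually_consistent=True))  # 10
print(write_capacity_units(5, 2.5))                            # 15
# The free allowance on the slide: 5 writes/s and 10 reads/s over a full day.
print(5 * SECONDS_PER_DAY, 10 * SECONDS_PER_DAY)               # 432000 864000
```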

Connecting to DynamoDB
- Create an Amazon AWS account
- Use the AWS SDK Eclipse plugin
- Create a client that is used to connect to the database

Creating Tables
- Connect to the client
- Create a CreateTableRequest
  - Key type and key schema
  - Throughput requirements
- Execute createTable on the client
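The steps above can be sketched without an AWS connection. The slides use the AWS SDK for Java; this Python sketch only builds the request body, using the parameter names of the DynamoDB CreateTable operation. The helper name, the table name "ProductCatalog", and the default throughput values are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def build_create_table_request(table: str, hash_key: str, hash_type: str,
                               range_key: Optional[str] = None,
                               range_type: str = "S",
                               read_units: int = 10,
                               write_units: int = 5) -> Dict[str, Any]:
    """Build a CreateTable request: key schema plus throughput requirements."""
    attribute_definitions = [{"AttributeName": hash_key, "AttributeType": hash_type}]
    key_schema = [{"AttributeName": hash_key, "KeyType": "HASH"}]
    if range_key is not None:
        # Hash and range key type: the primary key is made of two attributes.
        attribute_definitions.append(
            {"AttributeName": range_key, "AttributeType": range_type})
        key_schema.append({"AttributeName": range_key, "KeyType": "RANGE"})
    return {
        "TableName": table,
        "AttributeDefinitions": attribute_definitions,
        "KeySchema": key_schema,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


request = build_create_table_request("ProductCatalog", "Id", "N")
# With boto3 installed and credentials configured, the request could be sent with:
#   boto3.client("dynamodb").create_table(**request)
print(request["KeySchema"])
```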

Adding an Item
- Use the Put operation
- Connect to the database
- Create a PutItemRequest
- Execute PutItem using the put item request
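A PutItem request for one of the example bicycle items might be built as follows. This is a sketch in Python rather than the Java SDK used in the slides; the helper names are hypothetical, and only numbers, strings, and lists of strings are handled. The "N", "S", and "SS" type tags are the DynamoDB wire encodings for number, string, and string set.

```python
from typing import Any, Dict


def to_attribute_value(value: Any) -> Dict[str, Any]:
    """Encode a Python value as a DynamoDB attribute value (partial sketch)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, (int, float)):
        return {"N": str(value)}   # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list) and value and all(isinstance(v, str) for v in value):
        return {"SS": list(value)}  # multi-valued attribute as a string set
    raise TypeError(f"unsupported attribute value: {value!r}")


def build_put_item_request(table: str, item: Dict[str, Any]) -> Dict[str, Any]:
    """Build the body of a PutItem request for the given table and item."""
    return {
        "TableName": table,
        "Item": {name: to_attribute_value(value) for name, value in item.items()},
    }


request = build_put_item_request("ProductCatalog", {
    "Id": 202,
    "ProductName": "21-Bicycle 202",
    "Price": 200,
    "Color": ["Green", "Black"],
})
# With boto3 available: boto3.client("dynamodb").put_item(**request)
print(request["Item"]["Color"])  # {'SS': ['Green', 'Black']}
```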

Getting Data: Queries in DynamoDB
- Query vs. Scan
- Count and Limit
- Read consistency
  - The default is eventually consistent reads
- To perform a query or scan:
  - Specify the condition
  - Create the request object
  - Run the query or scan on the client
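The difference between a query and a scan shows up in the request itself: a query names a key condition, while a scan reads the whole table and filters afterward. The sketch below builds both request bodies using parameter names from the DynamoDB Query and Scan operations; the helper names and the sample table and attributes are assumptions.

```python
from typing import Any, Dict, Optional


def build_query_request(table: str, key_name: str, key_value: Dict[str, str], *,
                        consistent_read: bool = False,
                        limit: Optional[int] = None,
                        count_only: bool = False) -> Dict[str, Any]:
    """Query: look up items by key. Reads are eventually consistent by default."""
    request: Dict[str, Any] = {
        "TableName": table,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": key_name},
        "ExpressionAttributeValues": {":v": key_value},
        "ConsistentRead": consistent_read,
    }
    if limit is not None:
        request["Limit"] = limit        # cap the number of items evaluated
    if count_only:
        request["Select"] = "COUNT"     # return only the number of matches
    return request


def build_scan_request(table: str, filter_expression: str,
                       values: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Scan: read every item in the table, then apply the filter."""
    return {
        "TableName": table,
        "FilterExpression": filter_expression,
        "ExpressionAttributeValues": values,
    }


query = build_query_request("ProductCatalog", "Id", {"N": "202"},
                            consistent_read=True, limit=1)
scan = build_scan_request("ProductCatalog", "Price < :max", {":max": {"N": "150"}})
# With boto3 available: client.query(**query) and client.scan(**scan)
print(query["ConsistentRead"], scan["FilterExpression"])
```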

Delete Operations
- Connect to the client
- Create a DeleteItemRequest
- Specify TableName and Key
- Execute client.deleteItem(DeleteItemRequest)

Other Operations
- Conditional writes
  - Update only if an item meets a certain condition
  - Helps with concurrency support
  - Safe to run the operation again if a response is not received
  - Supported by putItem, updateItem, and deleteItem
- Atomic counter
  - A numeric attribute can be incremented or decremented
  - Uses UpdateItem
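Why a conditional write is safe to retry while a counter update is not can be seen in a tiny in-memory stand-in. This is a sketch only: `TinyTable` and its methods are hypothetical and are not the DynamoDB API.

```python
from typing import Any, Dict


class ConditionFailed(Exception):
    """Raised when the expected value does not match the stored value."""


class TinyTable:
    """In-memory stand-in for a table, used only to show the semantics."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}

    def conditional_update(self, key: str, attribute: str,
                           expected: Any, new_value: Any) -> None:
        """Write new_value only if the attribute currently equals expected."""
        current = self.items.get(key, {}).get(attribute)
        if current != expected:
            raise ConditionFailed(f"{attribute} is {current!r}, not {expected!r}")
        self.items.setdefault(key, {})[attribute] = new_value

    def add(self, key: str, attribute: str, delta: int) -> int:
        """Atomic counter: increment (or decrement) a numeric attribute."""
        item = self.items.setdefault(key, {})
        item[attribute] = item.get(attribute, 0) + delta
        return item[attribute]


table = TinyTable()
table.items["202"] = {"Price": 200, "Views": 0}

table.conditional_update("202", "Price", expected=200, new_value=180)
try:
    # Retrying after a lost response: the condition no longer holds,
    # so the second attempt is rejected and the price stays at 180.
    table.conditional_update("202", "Price", expected=200, new_value=180)
except ConditionFailed:
    pass

table.add("202", "Views", 1)
table.add("202", "Views", 1)  # a blind retry of a counter update counts twice
print(table.items["202"])     # {'Price': 180, 'Views': 2}
```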

Dynamo Design System Architecture & Logic

Review Partitioning Algorithm

Consistent Hashing

Preference List

Virtual Nodes

Virtual Nodes (Dilution)
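The partitioning slides (consistent hashing, preference list, virtual nodes) carry no text in this transcript, so the following is a minimal sketch of the general technique rather than a reconstruction of the slides. The class and function names, the token label format, and the number of virtual nodes per physical node are assumptions; MD5 is the hash the Dynamo paper applies to keys.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def ring_position(label: str) -> int:
    """Map a label to a position on the ring using MD5."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Consistent-hash ring in which each physical node owns several virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes_per_node: int = 8) -> None:
        # Each (position, node) pair is one virtual node placed on the ring.
        self.tokens: List[Tuple[int, str]] = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self.positions = [position for position, _ in self.tokens]

    def preference_list(self, key: str, n: int) -> List[str]:
        """Walk clockwise from the key and collect the first n distinct physical nodes."""
        start = bisect.bisect_right(self.positions, ring_position(key))
        chosen: List[str] = []
        for offset in range(len(self.tokens)):
            node = self.tokens[(start + offset) % len(self.tokens)][1]
            if node not in chosen:  # skip further virtual nodes of a chosen node
                chosen.append(node)
                if len(chosen) == n:
                    break
        return chosen


ring = Ring(["A", "B", "C", "D"])
print(ring.preference_list("cart:42", 3))  # three distinct nodes that store this key
```

Spreading each physical node over several ring positions dilutes the load a departing or joining node shifts onto any single neighbor.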

Replication

Data Versioning (Eventual Consistency)

Data Versioning (Vector Clock)
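Since the versioning slides carry no text here, a minimal vector clock sketch may help. The function names are hypothetical; the node names Sx, Sy, Sz and the sequence of versions follow the worked example in the Dynamo paper.

```python
from typing import Dict

# A vector clock: node name -> counter of updates coordinated by that node.
Clock = Dict[str, int]


def increment(clock: Clock, node: str) -> Clock:
    """Return a new clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: Clock, b: Clock) -> bool:
    """True if a has seen every update recorded in b (a is equal to or newer than b)."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def in_conflict(a: Clock, b: Clock) -> bool:
    """Two versions conflict when neither descends from the other."""
    return not descends(a, b) and not descends(b, a)


def merge(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum, used once the versions have been reconciled."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


d1 = increment({}, "Sx")   # Sx: 1
d2 = increment(d1, "Sx")   # Sx: 2, supersedes d1
d3 = increment(d2, "Sy")   # Sx: 2, Sy: 1
d4 = increment(d2, "Sz")   # Sx: 2, Sz: 1, written concurrently with d3
print(descends(d2, d1), in_conflict(d3, d4))  # True True

# The client reconciles d3 and d4, and Sx coordinates the resulting write.
d5 = increment(merge(d3, d4), "Sx")
print(d5 == {"Sx": 3, "Sy": 1, "Sz": 1})  # True
```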

Data Versioning

Reconciliation

Sloppy Quorum (Coordinator)

Sloppy Quorum (N)

Sloppy Quorum (R, W, N)
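The quorum slides carry no text here, so this is a minimal sketch of the two ideas the titles name: the R + W > N overlap condition, and choosing the first N healthy nodes for a sloppy quorum. The function names and the sample node names are assumptions.

```python
from typing import List, Set


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """With R + W > N, every read quorum shares at least one replica with every write quorum."""
    return r + w > n


def pick_replicas(preference_order: List[str], healthy: Set[str], n: int) -> List[str]:
    """Sloppy quorum: use the first n healthy nodes met while walking the preference order.

    An unhealthy node is skipped and the next node in line stands in for it.
    """
    return [node for node in preference_order if node in healthy][:n]


print(quorums_overlap(3, 2, 2))  # True
print(quorums_overlap(3, 1, 1))  # False: a read may miss the latest write

# B is down, so D stands in for it.
print(pick_replicas(["A", "B", "C", "D", "E"], {"A", "C", "D", "E"}, 3))  # ['A', 'C', 'D']
```

Lowering R or W reduces the latency of reads or writes respectively, at the cost of a weaker overlap guarantee.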

Membership and Failure Detection

99.9%
- Amazon measures its services at the 99.9th percentile.
- Given the nature of its business, its systems are targeted at controlling performance at this 99.9th percentile.
http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf

Gossip Protocol
- Amazon uses a gossip-based protocol for membership and failure detection.
- In a gossip-based protocol, each node contacts another node at random and “gossips” information to it about other nodes.
- As nodes gossip with each other, information about membership changes propagates throughout the system.
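How a membership change spreads through random pairwise exchanges can be simulated in a few lines. This is a sketch under simplifying assumptions: every node can reach every other node, each exchange merges the two nodes' views in both directions, and the function names are hypothetical.

```python
import random
from typing import Dict, Set


def gossip_round(views: Dict[str, Set[str]], rng: random.Random) -> None:
    """Each node contacts one random peer and the pair merge their membership views."""
    nodes = sorted(views)
    for node in nodes:
        peer = rng.choice([other for other in nodes if other != node])
        merged = views[node] | views[peer]
        views[node] = set(merged)
        views[peer] = set(merged)


def rounds_until_converged(views: Dict[str, Set[str]], rng: random.Random,
                           max_rounds: int = 1000) -> int:
    """Run gossip rounds until every node holds the full membership view."""
    everyone = set().union(*views.values())
    for round_number in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(view == everyone for view in views.values()):
            return round_number
    raise RuntimeError("views did not converge")


members = list("ABCDEFGH")
views = {name: set(members) for name in members}
views["A"].add("Z")  # only A has learned that node Z joined
rounds = rounds_until_converged(views, random.Random(0))
print(rounds, all("Z" in view for view in views.values()))
```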

Gossip Protocol
- Gossip-Enabled Monitoring Service (GEMS), published in 2006, goes into detail about the implementation of a gossip protocol.
- The system was designed to be used for failure detection.
- Each node maintains three things independently:
  - Gossip list
  - Suspect list
  - Suspect matrix

Gossip Protocol
- Three parameters are used in failure detection:
  - Gossip time: the interval between two consecutive gossip messages sent out by a node
  - Cleanup time: the interval between the time liveness information was last received for a particular node and the time it is suspected to have failed
  - Consensus time: the interval after which consensus is reached about a failed node

Gossip Protocol
- For a node to be declared failed, all other nodes have to reach a consensus on its failure.
- The opinions of suspected nodes are discarded when trying to reach a consensus.
- Once a consensus has been reached, the information is broadcast to all other nodes in the system.
- Each node also maintains its own list of live nodes, which changes only once it has received information on a consensus.
- In addition to propagating changes through a broadcast, nodes can propagate their live lists to other nodes as well.
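The suspicion and consensus rules can be sketched from one observer's point of view. This is a simplified illustration, not the GEMS implementation: the function names and the dictionary-of-sets form of the suspect matrix are assumptions.

```python
from typing import Dict, Set


def suspects(last_heard: Dict[str, float], now: float, cleanup_time: float) -> Set[str]:
    """Nodes whose liveness information is older than the cleanup time."""
    return {node for node, heard_at in last_heard.items() if now - heard_at > cleanup_time}


def has_consensus(observer: str, target: str,
                  suspect_matrix: Dict[str, Set[str]]) -> bool:
    """True if every node whose opinion counts also suspects the target.

    suspect_matrix maps each node to the set of nodes it suspects.
    Opinions of the target itself and of nodes the observer suspects are discarded.
    """
    voters = set(suspect_matrix) - {target} - suspect_matrix[observer]
    return all(target in suspect_matrix[voter] for voter in voters)


# A last heard from B 2 s ago, from C 9 s ago, from D 30 s ago; cleanup time is 5 s.
print(sorted(suspects({"B": 98.0, "C": 91.0, "D": 70.0}, now=100.0, cleanup_time=5.0)))
# ['C', 'D']

matrix = {"A": {"C", "D"}, "B": {"D"}, "C": set(), "D": set()}
print(has_consensus("A", "D", matrix))  # True: A and B both suspect D; C's opinion is discarded
print(has_consensus("A", "C", matrix))  # False: B still considers C alive
```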

Gossip Protocol
- Two other variations:
  - Layered gossiping: similar to the gossip protocol described above, but nodes are grouped in a layered pattern. Consensus is reached within a group, but liveness is broadcast to all nodes.
  - Biased gossiping: nodes are more likely to gossip with nodes that are closer to themselves (in terms of network delay).

Dynamo Failure Detection
- In Dynamo, every node keeps a membership list of all nodes in the system.
- A node must be added through a command line on an existing Dynamo node.
- The node that handled the addition updates its membership list to include the new node and propagates the membership change through the gossip protocol.
- A new node maps itself onto the hash space.
- Mapping information is reconciled at the same time as membership information.

Dynamo Failure Detection
- Dynamo also has a local mode of failure detection, in which node A considers node B failed if node B does not respond to node A’s messages.
- As soon as node A considers node B failed, it starts sending requests along an alternate route, while periodically checking whether B has become responsive again.
- The gossip protocol is then used to propagate only explicit join and leave methods, while individual nodes detect communication failures themselves.

Conclusion / Insight

Strengths of DynamoDB
- Dynamo is well suited for dynamic and large data models
  - Scalability allows it to shrink and grow incrementally and seamlessly as needed, by easily adding nodes to and removing nodes from the network
  - Heterogeneity allows it to efficiently utilize many disparate servers to the best of their ability
    - It can do this without requiring manual user-side adjustment of each node's workload
- At the forefront of cloud computing
  - Decentralization, small entry size, consistency, and stability all lend themselves to this
  - There is not yet much precedent for cloud database systems, so it has a chance to carve out a large niche for itself
- Favors simplicity in its implementation

Strengths of DynamoDB Cont’d
- Favors simplicity in its implementation
  - Revolves around key-values and hashing
- Using an SSD implementation gives it a reliability that other cloud database services don’t offer

Strengths of DynamoDB
- DynamoDB’s main selling point is its adaptive throughput
  - N, W, and R value “knobs” allow you to control latency
  - This is particularly useful for applications with highly variable demand
- Useful for:
  - Companies experiencing large growth
  - Applications whose demand is highly variable during the day
  - Companies with little room for failure in R/W
  - Applications with easily resolvable versioning merges

Example

Example User: Stock Trading App
- Characteristics:
  - Users’ portfolios will grow the longer they use the application, so it needs to be scalable to a large degree while managing load
    - Automatic partitioning handles this hurdle
  - It will need to handle heavy traffic during trading hours and can then scale back significantly in the evening
  - Easily reconcilable versioning (a stock was either bought/sold or it wasn’t, so branch merges can simply mergesort all the different transactions)

Stock Trading App Cont’d
- Needs:
  - Reliability of R/W
  - Consistency AND availability
    - Every transaction, as it deals with money, must succeed and be processed in an orderly fashion
  - Must be able to weather the failure of a node, so synchronization is essential
- DynamoDB can satisfy all of these

Stock Trading App Cont’d
- Additional benefits:
  - Atomicity and strong consistency allow the user to always get the absolute current prices and information on their portfolios and trading prices

Flaws with the Example
- A relational database may suit stock trading better: the information needed for each stock is fairly uniform (name, trading name, price, trend, etc.)
- Limitations on searching could be a problem here (if we want to search on more than the indexes or the hash key)

Weaknesses of DynamoDB
- 64 KB limit on row size
  - More suited to many small entries
  - For instance, large BLOBs or long text entries would not work well with DynamoDB, particularly where replication for decentralization is concerned
  - You would want to go with Bigtable or HBase in such a case
- Conversely, doesn’t accept binary data (only strings and numbers), which can lead to some inefficiencies

Weaknesses Cont’d
- Slight delay between table creation and usability (a product of gossip-based distribution)
- Sacrifices flexibility and cost-efficiency for performance
  - Best suited for straightforward, high-traffic databases that put a premium on low latency and high performance
