Christian Stark and Odbayar Badamjav

1 NoSQL: Dynamic DB
Christian Stark and Odbayar Badamjav

2 History of NoSQL
Eric Evans of Rackspace, a committer on the Cassandra project, popularized the term NoSQL in 2009.
Amazon released its research paper on Amazon Dynamo in 2007.
MongoDB started in 2007 as part of an open source cloud platform.
Facebook open-sourced its Cassandra project (now maintained by Apache) in 2008.
The advent of distributed and parallel computing gave rise to alternatives to relational databases.
As cloud computing became more affordable and mainstream, home-grown NoSQL databases went open source.

3 What is NoSQL?

4 What is NoSQL?
Not to be confused with the NoSQL database system (an RDBMS).
"NoSQL" or "not only SQL" is a class of databases that is broader and more encompassing than SQL-based databases.
Most relational databases offer a subset of the functionality that "NoSQL" databases can offer.

5 More about NoSQL
It doesn't have a fixed data model or predefined schema.
It is not necessarily a "one-size-fits-all" solution.
It leaves room for more tailored solutions on a per-application basis.
There are many different types of implementations.
Terminology: SQL tables correspond to NoSQL collections, rows to documents, and columns to fields.
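
To make the "no predefined schema" point concrete, here is a minimal sketch in plain Python, with no database involved. The collection is a list and each document is a dict; the names `users`, `find`, and all field names are made up for illustration and are not any product's API.

```python
# A "collection" is modeled as a list, and each "document" as a dict.
# All names here are hypothetical; this is not a real database API.
users = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Grace", "tags": ["admin", "ops"]},  # different fields, same collection
]


def find(collection, field, value):
    """Return documents that have `field` set to `value`."""
    return [doc for doc in collection if field in doc and doc[field] == value]


# Adding a new field needs no schema change: only this document carries "age".
users.append({"name": "Linus", "email": "linus@example.com", "age": 30})

# Documents without the field are simply skipped by the query.
with_email = [doc["name"] for doc in users if "email" in doc]
grace = find(users, "name", "Grace")
```

In a relational table the same change would typically require altering the table so that every row has the new column.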

6 Why use NoSQL?
Usually favorable in distributed, large data settings
High availability
High fault tolerance
Open source
Easy to implement
Wide variety of solutions, both as a service and as different implementations
The appeal of NoSQL is that it handles large quantities of data quickly across a cluster of servers that share resources, making it both fast and reliable. Because many implementations are open source, costs stay down, and for these workloads it is often easier to use than a conventional database.

7 Brewer's CAP Theorem
Consistency
Availability
Partition Tolerance
The theorem states that a distributed database system cannot guarantee all three of these properties at once. At most two can be fully provided, at the cost of compromising the third. For NoSQL, the usual trade-off is to give up strict consistency in exchange for partition tolerance and availability. Most solutions lie somewhere on a continuum between ACID and BASE.
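
The trade-off can be sketched with a toy two-replica model. This is an illustrative simulation only: the class names, the "CP"/"AP" mode flag, and the single-value store are assumptions made for the example, not how any real system is built.

```python
class Replica:
    """One copy of a single value, tagged with a version number."""

    def __init__(self):
        self.value, self.version = None, 0


class TinyCluster:
    """Two replicas. mode "CP" refuses requests during a partition; "AP" always answers."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, value):
        # In this toy model every write arrives at replica A.
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot reach replica B")
        self.a.value, self.a.version = value, self.a.version + 1
        if not self.partitioned:
            self.b.value, self.b.version = self.a.value, self.a.version

    def read_from_b(self):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot confirm the latest value")
        return self.b.value  # may be stale in AP mode

    def heal(self):
        # When the partition ends, replica B catches up with replica A.
        self.partitioned = False
        if self.a.version > self.b.version:
            self.b.value, self.b.version = self.a.value, self.a.version


ap = TinyCluster("AP")
ap.write("v1")
ap.partitioned = True
ap.write("v2")              # accepted, but replica B cannot see it yet
stale = ap.read_from_b()    # answered from the old copy
ap.heal()
fresh = ap.read_from_b()    # replicas agree again after healing
```

The AP cluster stays available during the partition but returns an old value; a `TinyCluster("CP")` would raise instead of answering, trading availability for consistency.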

8 Brewer CAP Theorem

9 ACID or BASE?
BASE (NoSQL)
Basically Available -- Be able to expect a timely, or quick, response.
Soft state -- What the database replies is good enough for now.
Eventual consistency -- Replicas converge over time; code is written to handle each type of inconsistency as it is discovered.
ACID (RDBMS)
Atomicity -- All or nothing, per transaction.
Consistency -- Transactions leave the database in a consistent state.
Isolation -- One transaction does not interfere with another.
Durability -- Committed transactions persist through restarts and other interruptions in the database engine.
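
As a small illustration of atomicity on the ACID side, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` helper, and the balances are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer credits bob first and then fails the balance check on alice; the rollback undoes the credit, so no partial transfer is visible.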

10 Amazon Stock Quantity Example
Suppose two users place the same item in their carts and purchase it within a short period of time, when only one of the item is left. What should happen? An RDBMS would enforce consistency, a process that can take longer than a customer will tolerate. If it returned an answer in time, it would report success to one user and failure to the other, possibly with an offer to backorder the item.

11 Continued Amazon Example
A NoSQL/eventual consistency approach would most likely accept both purchases. When the system discovers the inconsistency in the data, it alerts the customer who ordered the item last that it has been placed on backorder. Companies have found that delays in these types of transactions can carry severe penalties in future traffic.
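
The "accept now, reconcile later" flow described here can be sketched as follows. This is a single-process toy, assuming orders can be sequenced by arrival; the `Inventory` class and the user names are hypothetical and not Amazon's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    stock: int
    # (sequence number, customer) pairs in arrival order
    orders: list = field(default_factory=list)

    def accept(self, customer):
        """Accept immediately without checking stock, so the customer never waits."""
        self.orders.append((len(self.orders) + 1, customer))
        return "accepted"

    def reconcile(self):
        """Later, off the request path: fill the earliest orders, backorder the rest."""
        ordered = sorted(self.orders)
        filled = [customer for _, customer in ordered[: self.stock]]
        backordered = [customer for _, customer in ordered[self.stock:]]
        return filled, backordered


item = Inventory(stock=1)
replies = [item.accept("user_a"), item.accept("user_b")]  # both succeed right away
filled, backordered = item.reconcile()  # the later order is moved to backorder
```

Both customers get a fast reply; the inconsistency (two orders, one item) is detected and resolved afterwards rather than blocking the purchase.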

12 Real world usage
What companies or organizations use NoSQL?
Google (BigTable)
eBay (Hadoop)
Amazon (Dynamo)
Twitter (FlockDB, a graph-type db, and Cassandra)
Yahoo (Hadoop)
Facebook (Hadoop)
Craigslist (MongoDB)
Netflix (Apache's Cassandra)
Many companies use NoSQL and RDBMS together for different parts of applications.

13 How it works: A look at MongoDB
Features:
Dynamic schemas, JSON-style documents
Full indexing for all fields/attributes
Scales horizontally
Fast, in-place updates to data
GridFS: stores files of any size, distributed
Ad-hoc queries allow for dynamic queries that are similar to those of an RDBMS
"Sharding" -- Auto scaling for balancing and fault tolerance
Official drivers exist for Java, Ruby, PHP, Perl, C, C++, Erlang, Haskell, JavaScript, Python, and Scala. Many community-supported drivers are available.
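
To give a feel for what sharding does, here is a toy hash-based router in plain Python. It is only a sketch of the general idea (spreading documents across nodes by a key); it is not MongoDB's implementation, it does no rebalancing, and the `ShardedCollection` class and `shard_for` helper are hypothetical names.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a document key to a shard index using a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedCollection:
    """Each shard is a dict standing in for a separate server."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def insert(self, doc):
        shard = self.shards[shard_for(doc["_id"], len(self.shards))]
        shard[doc["_id"]] = doc

    def get(self, doc_id):
        # The same hash routes the lookup to the shard that holds the document.
        return self.shards[shard_for(doc_id, len(self.shards))].get(doc_id)


coll = ShardedCollection(num_shards=3)
for i in range(100):
    coll.insert({"_id": i, "payload": f"doc-{i}"})

found = coll.get(42)
total = sum(len(shard) for shard in coll.shards)
```

Because the hash is stable, reads and writes for a given key always land on the same shard, which is what lets the data set grow across machines.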

14
Database Name | CouchDB | MongoDB | MySQL
Data Model | Document-Oriented (JSON) | Document-Oriented (BSON) | Relational
Data Types | string, number, boolean, array, object | string, int, double, boolean, date, bytearray, object, array, others | link
Large Objects (Files) | Yes (attachments) | Yes (GridFS) | Blobs
Horizontal partitioning scheme | CouchDB Lounge | Auto-sharding | Partitioning
Replication | Master-master (with developer supplied conflict resolution) | Master-slave and replica sets | Master-slave, multi-master, and circular replication
Object (row) Storage | One large repository | Collection-based | Table-based
Query Method | Map/reduce of JavaScript functions to lazily build an index per query | Dynamic; object-based query language | Dynamic; SQL
Secondary Indexes | Yes

15
Database Name | CouchDB | MongoDB | MySQL
Interface | REST | Native drivers; REST add-on | Native drivers
Server-side batch data manipulation | ? | Map/Reduce, server-side JavaScript | Yes (SQL)
Written in | Erlang | C++
Concurrency Control | MVCC | Update in Place
Geospatial Indexes | GeoCouch | Yes | Spatial extensions
Distributed Consistency Model | Eventually consistent (master-master replication with versioning and version reconciliation) | Strong consistency. Eventually consistent reads from secondaries are available.
Atomicity | Single document | Single document | Yes - advanced
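
The comparison lists map/reduce as a query method. The sketch below shows the general pattern in plain Python rather than the JavaScript functions those databases actually use; the documents and the function names `map_fn`, `reduce_fn`, and `map_reduce` are made up for illustration.

```python
from collections import defaultdict

docs = [
    {"type": "order", "customer": "ada", "total": 20},
    {"type": "order", "customer": "bob", "total": 5},
    {"type": "order", "customer": "ada", "total": 15},
    {"type": "note", "text": "not an order"},
]


def map_fn(doc):
    """Emit (key, value) pairs; documents that are not orders emit nothing."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def reduce_fn(values):
    """Combine all values emitted for one key."""
    return sum(values)


def map_reduce(documents, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)  # group emitted values by key
    return {key: reduce_fn(values) for key, values in grouped.items()}


totals = map_reduce(docs, map_fn, reduce_fn)
```

The map step decides which documents contribute and under which key; the reduce step collapses each group, here into a per-customer order total.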

16 Available Hosted Services
Amazon's DynamoDB - pay for what you use
Amazon's SimpleDB - pay for what you use
MongoLab (MongoDB) - free plan
IrisCouch (CouchDB) - free for modest use
Cloudant (CouchDB) - free plan
Many more are available with free starter plans as well. Amazon and most others let you pay only for what you use, which saves costs.

17 NoSQL Projects
Apache's CouchDB -- Document-store type, incremental replication with "bi-directional conflict detection and resolution"
Apache's Cassandra -- Linear scalability and high availability
MongoDB -- "scalable, high performance open source, NoSQL database"
Apache's HBase -- Sits on Hadoop/HDFS
Redis -- In-memory, distributed key/value store, with optional persistence
Google's BigTable -- Available as Google App Engine Datastore; tabular (3-dimensional mapping)

18 Conclusion
NoSQL can be a valuable tool for large, distributed data sets that need to scale and sustain high read/write throughput.
NoSQL is not a replacement for an RDBMS, but a supplement to it. Most applications use both for different use cases.
NoSQL can be a simple, resilient database that is easy to deploy.
More information is available. See Wikipedia.

