Intro to NoSQL Databases Tony Hannan November 2011
RDBMS problem 1: Expensive joins ● Joins are not simple/fast (on single server) ● Distributed joins do not scale well (horizontally)
NoSQL solution 1: Non-relational (no joins) ● Queries are simpler/faster ● Simpler to distribute data
RDBMS problem 2: Expensive transactions ● Locking and logging are not simple/fast (single server) ● Distributed transactions do not scale well
NoSQL solution 2: No transactions ● Updates are simpler/faster ● Simpler to distribute data
RDBMS problem 3: Doesn't handle network partitions ● So can only distribute across a LAN
NoSQL solution 3: Network partition tolerant ● Can distribute across a WAN (Cloud, Internet)
RDBMS problem 4: Schema duplication ● Schema defined in two places: application (types) and database (schema) ● Schema hard to evolve
NoSQL solution 4: Schemaless ● Database is schemaless, i.e. dynamically typed ● Schema defined in application
RDBMS problem 5: Handling non-relational data ● Hard to store/query non-relational data, e.g. trees
NoSQL solution 5: Alternative data models ● Graph ● Document ● Wide-column ● Key-value
RDBMS problem 6: Language mismatch ● Need mapping layer between programming language and SQL (overhead)
NoSQL solution 6: API ● Query language part of client programming language
Summary of NoSQL advantages ● Scalable: 1. No joins 2. No transactions 3. WAN (partition) tolerant ● Programmable: 4. Schemaless 5. Alternative data models 6. API
NoSQL problem 1: Non-relational (no joins)
Solutions 1: 1. Embed 2. Denormalize 3. Client-side joins 4. Graph (hyper-joins)
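The first three options above can be sketched with plain Python dictionaries. This is only an illustration: the `authors` and `posts` collections, their field names, and the `client_side_join` helper are all hypothetical stand-ins for whatever a real store would hold.

```python
# Hypothetical in-memory "collections"; a real database would replace these.
authors = {1: {"_id": 1, "name": "Ann"}}
posts = [
    {"_id": 10, "title": "Hello", "author_id": 1},
    {"_id": 11, "title": "World", "author_id": 1},
]

# 1. Embed: store the child records inside the parent document,
#    so one lookup returns everything and no join is needed.
author_embedded = {
    "_id": 1,
    "name": "Ann",
    "posts": [{"title": "Hello"}, {"title": "World"}],
}

# 2. Denormalize: copy the author's name into every post.
#    Reads need no join, but a rename must now touch every copy.
posts_denormalized = [
    dict(post, author_name=authors[post["author_id"]]["name"])
    for post in posts
]


# 3. Client-side join: fetch both collections and combine them
#    in application code instead of in the database.
def client_side_join(posts, authors):
    return [
        {"title": post["title"], "author": authors[post["author_id"]]["name"]}
        for post in posts
    ]


joined = client_side_join(posts, authors)
```

Embedding suits data that is always read together, denormalizing trades extra write work for faster reads, and a client-side join keeps the data normalized at the cost of extra round trips.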
NoSQL problem 2: No transactions
Solutions 2: single object transaction plus... 1. Embed so transaction hits single object 2. Relax transaction requirements 3. Compensating (single object) transactions 4. Application level transaction using single object transaction as primitive for locking
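Option 4 above might look roughly like the following sketch, which builds a two-object transfer out of a single-object compare-and-set. The `Store` class, the `locked_by` field, and the `transfer` function are hypothetical names invented for this example; the internal mutex only stands in for the per-object atomicity a real database would provide.

```python
import threading


class Store:
    """Toy key-value store whose only atomic update is a
    single-object compare-and-set (assumed primitive)."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()  # simulates per-object atomicity

    def put(self, key, doc):
        with self._mutex:
            self._data[key] = dict(doc)

    def get(self, key):
        with self._mutex:
            return dict(self._data[key])

    def compare_and_set(self, key, field, expected, new):
        """Set doc[field] = new only if it currently equals expected."""
        with self._mutex:
            doc = self._data[key]
            if doc.get(field) != expected:
                return False
            doc[field] = new
            return True


def transfer(store, src, dst, amount, owner):
    """Move `amount` from src to dst, using per-object locks taken
    with compare_and_set. Returns False if a lock is busy or funds
    are insufficient."""
    acquired = []
    try:
        # Lock in sorted key order so two transfers cannot deadlock.
        for key in sorted([src, dst]):
            if not store.compare_and_set(key, "locked_by", None, owner):
                return False
            acquired.append(key)
        src_balance = store.get(src)["balance"]
        if src_balance < amount:
            return False
        dst_balance = store.get(dst)["balance"]
        store.compare_and_set(src, "balance", src_balance, src_balance - amount)
        store.compare_and_set(dst, "balance", dst_balance, dst_balance + amount)
        return True
    finally:
        # Release only the locks this call actually acquired.
        for key in acquired:
            store.compare_and_set(key, "locked_by", owner, None)
```

This sketch ignores client crashes: a client that dies while holding a lock leaves the object locked, so a real design would need lock timeouts or a recovery step, which is where compensating transactions (option 3) tend to come in.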
NoSQL problem 3: Weaker consistency or availability ● When distributed across a WAN (network partitions), CAP theorem states you must give up consistency or availability: 1. Eventual consistency, or 2. One half of network partition can't write (but can still read)
No Solution 3: Must live with... 1. Eventual consistency, or 2. Unavailable writes in one half of partition, or 3. Distribute across LAN only (no network partitions)
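To make option 1 concrete, here is a minimal simulation of two replicas that both accept writes during a partition and reconcile afterwards. It assumes a last-write-wins rule based on caller-supplied timestamps, which is only one of several possible reconciliation strategies and is not taken from the slides; the `Replica` class and its methods are hypothetical.

```python
class Replica:
    """Toy replica holding key -> (timestamp, value).
    Conflicts are resolved by last-write-wins (an assumption)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Keep the incoming write only if it is newer than what we hold.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other):
        """Pull the other replica's entries once the network heals."""
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


east, west = Replica("east"), Replica("west")
east.write("cart", ["book"], timestamp=1)
west.merge_from(east)  # replicas agree before the partition

# Partition: each side keeps accepting writes independently.
east.write("cart", ["book", "pen"], timestamp=2)
west.write("cart", ["book", "mug"], timestamp=3)
read_during_partition = east.read("cart")  # east cannot see west's write

# Partition heals: replicas exchange state and converge.
east.merge_from(west)
west.merge_from(east)
```

After the merge both replicas hold the same value, but the write made on `east` during the partition is silently discarded. That lost update is the price of staying writable on both sides.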
NoSQL problem 4: Schemaless lacks integrity constraints
Solution 4: Application ensures integrity
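As a rough sketch of what "application ensures integrity" can mean in practice, the code below checks types, ranges, and uniqueness before a document is stored. The field names (`name`, `age`, `email`), the rules, and both function names are hypothetical examples, not requirements from the slides.

```python
def validate_user(doc, existing_emails):
    """Return a list of integrity violations (empty if the doc is valid).
    These rules live in the application because the database enforces none."""
    errors = []
    name = doc.get("name")
    if not isinstance(name, str) or not name:
        errors.append("name must be a non-empty string")
    age = doc.get("age")
    if not isinstance(age, int) or age < 0:
        errors.append("age must be a non-negative integer")
    email = doc.get("email")
    if not isinstance(email, str) or "@" not in email:
        errors.append("email must contain '@'")
    elif email in existing_emails:
        errors.append("email must be unique")
    return errors


def insert_user(collection, doc):
    """Append doc to the collection only if it passes validation."""
    existing_emails = {user["email"] for user in collection}
    errors = validate_user(doc, existing_emails)
    if errors:
        raise ValueError("; ".join(errors))
    collection.append(doc)
```

Every code path that writes to the collection has to go through `insert_user` for this to work, and the uniqueness check is not safe against concurrent writers without a further atomic primitive.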
NoSQL problem 5: Non-relational model
Solution 5: Adapt to alternative model ● Many alternative data models map nicely to programming language data types
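For instance, a tree, which is awkward to store in relational tables, maps directly onto nested dictionaries and lists in a document model. The sketch below uses JSON from the standard library as a stand-in for a document store's serialization; the `tree` contents and the `names` helper are hypothetical.

```python
import json

# A tree is simply a nested document: each node embeds its children.
tree = {
    "name": "root",
    "children": [
        {"name": "a", "children": [{"name": "a1", "children": []}]},
        {"name": "b", "children": []},
    ],
}


def names(node):
    """Return node names in depth-first order."""
    result = [node["name"]]
    for child in node["children"]:
        result.extend(names(child))
    return result


# Round trip: what the application stores is what it gets back,
# with no mapping layer between the two representations.
stored = json.dumps(tree)
loaded = json.loads(stored)
```

The same structure would need a self-referencing table and recursive queries, or several round trips, in a typical relational schema.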
NoSQL problem 6: No query language
Solution 6: Learn API ● API fits with programming language better
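One way to picture a query API that lives inside the programming language is a query-by-example function, where the query is an ordinary dictionary. The `find` function below is a hypothetical toy, not the API of any particular database.

```python
def find(collection, query):
    """Return the documents whose fields equal every key/value pair in query."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in query.items())
    ]


# Hypothetical sample data.
users = [
    {"name": "Ann", "city": "Oslo", "active": True},
    {"name": "Bob", "city": "Oslo", "active": False},
    {"name": "Cy", "city": "Rome", "active": True},
]

# The query is ordinary data, so it can be built, passed around,
# and combined with normal language features.
base_query = {"city": "Oslo"}
active_in_oslo = find(users, dict(base_query, active=True))
```

Because no query string has to be assembled and parsed, there is no separate language to embed, but there is also nothing an end user can type directly, which is the disadvantage the slide names.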
Summary of NoSQL disadvantages 1. Non-relational 2. No transactions 3. Eventual consistency or unwritable half when partitioned 4. No data integrity checking 5. No end-user query language
Some NoSQL Databases