70s: database access is hard and depends on the app; 80s: relational databases come on the scene; 90s: object-oriented programming and databases; 00s: interpreted languages and Agile.
Means an app that supports millions of users, represents relationships, and sees variable usage (viral apps); its data matters in aggregate, not record by record. Trade-offs: time-to-market vs. proper design; uptime (availability) vs. correctness; ease of management vs. customization.
Rejection of the RDBMS as one-size-fits-all; minimal functions and minimal administration; BASE (Basically Available, Soft state, Eventually consistent) versus ACID (Atomicity, Consistency, Isolation, Durability).
Redis: written in C; open-source and free (no royalties) under the New BSD license; created by Salvatore Sanfilippo for a Web analytics project; sponsored by VMware; used by projects such as GitHub, Craigslist, Stack Overflow, and Digg.
Key-value dictionary whose values can also be sets and lists; single-threaded; delayed (asynchronous) writes to disk; the whole data set must fit in memory; simple protocol; no tables, no schemas, and only numbered databases; very basic security.
Session store: one (or more) sessions per user; many reads, few writes; throw-away data; timeouts.
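The session-store pattern can be illustrated with a toy in-memory model in Python. It is not Redis or a client library, and the class and method names (`SessionStore`, `create`, `get`, `sessions_for`) are hypothetical. Time is passed in explicitly so that the timeout behaviour is easy to follow. The model shows one user holding several sessions, cheap reads, and sessions that simply vanish after their timeout.

```python
import secrets
from typing import Dict, List, Optional, Set, Tuple


class SessionStore:
    """Toy session store: session id -> (user, data, expiry), plus user -> session ids."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.sessions: Dict[str, Tuple[str, dict, float]] = {}
        self.by_user: Dict[str, Set[str]] = {}

    def create(self, user: str, data: dict, now: float) -> str:
        # Few writes: one per login. A user may hold several sessions at once.
        sid = secrets.token_hex(16)
        self.sessions[sid] = (user, data, now + self.ttl)
        self.by_user.setdefault(user, set()).add(sid)
        return sid

    def get(self, sid: str, now: float) -> Optional[dict]:
        # Many reads: an expired session is discarded lazily when it is touched.
        entry = self.sessions.get(sid)
        if entry is None:
            return None
        user, data, expires_at = entry
        if now >= expires_at:
            del self.sessions[sid]
            self.by_user[user].discard(sid)
            return None
        return data

    def sessions_for(self, user: str, now: float) -> List[str]:
        # Copy the id set first, because get() may remove expired ids from it.
        return [sid for sid in list(self.by_user.get(user, ())) if self.get(sid, now) is not None]


store = SessionStore(ttl_seconds=1800)
first = store.create("alice", {"cart": []}, now=0)
second = store.create("alice", {"cart": ["book"]}, now=100)
print(len(store.sessions_for("alice", now=200)))  # 2
print(store.get(first, now=1800))  # None: the first session timed out at t=1800
```

In Redis itself, the timeout would normally come from the server (for example, by writing the session with SETEX) rather than from application code.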
Logging: rapid, low-latency writes; data you don’t care much about; not much data (it must fit in memory).
Low-latency writes; many reads throughout a transaction; short-lived (less than a day); think of a shopping cart or a file upload.
Data that you don’t mind losing; records that can be accessed by a single primary key; a schema that is either a single value or a serialized object.
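The "single primary key, serialized object" shape can be sketched with a plain Python dict standing in for the store. The sketch assumes JSON as the serialization format, and it uses a hypothetical `Upload` record with a hypothetical `upload:<id>` key naming scheme. None of these choices come from the slides. The whole record is stored as one string value and can be reached only through its key.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class Upload:
    # Hypothetical record type, used only for illustration.
    upload_id: str
    filename: str
    bytes_received: int


class KeyValueRecords:
    """Toy key-value table: one primary key -> one serialized value."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, record: Upload) -> None:
        # The entire object is serialized into a single string value.
        self._data[f"upload:{record.upload_id}"] = json.dumps(asdict(record))

    def get(self, upload_id: str) -> Optional[Upload]:
        # The only access path is the primary key.
        raw = self._data.get(f"upload:{upload_id}")
        return None if raw is None else Upload(**json.loads(raw))


records = KeyValueRecords()
records.put(Upload("42", "report.pdf", 1024))
print(records.get("42"))  # Upload(upload_id='42', filename='report.pdf', bytes_received=1024)
```

This sketch cannot look up an upload by `filename` without deserializing and scanning every value. That limitation is the trade-off the slide describes.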
Client libraries: Java: Jedis (GitHub); .NET: ServiceStack (Google Code); Ruby: redis-rb (GitHub).
Strings: SET k v; GET k; MSET k v [k2 v2 …]; MGET k [k2 …]; GETSET k v – sets the new value and returns the value from before the set; SETNX k v – sets only if k does not exist; SETEX k n v – sets k and expires it after n seconds.
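The string commands above can be modelled in a few lines of Python. This is a toy in-memory model of their semantics, not Redis or a client library. The method names mirror the commands, the clock is simulated with a `now` attribute, and SETNX returns a bool here where Redis replies with 1 or 0.

```python
from typing import Dict, List, Optional, Tuple


class StringStore:
    """Toy model of Redis string-command semantics (not Redis itself)."""

    def __init__(self) -> None:
        # key -> (value, expiry time or None)
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}
        self.now = 0.0  # simulated clock, in seconds

    def _live(self, k: str) -> Optional[str]:
        entry = self._data.get(k)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self.now >= expires_at:
            del self._data[k]  # an expired key behaves as if it never existed
            return None
        return value

    def set(self, k: str, v: str) -> None:
        self._data[k] = (v, None)  # a plain SET also clears any previous timeout

    def get(self, k: str) -> Optional[str]:
        return self._live(k)

    def mset(self, *pairs: str) -> None:
        if len(pairs) % 2:
            raise ValueError("MSET needs key/value pairs")
        for k, v in zip(pairs[::2], pairs[1::2]):
            self.set(k, v)

    def mget(self, *keys: str) -> List[Optional[str]]:
        return [self._live(k) for k in keys]

    def getset(self, k: str, v: str) -> Optional[str]:
        old = self._live(k)  # the value from before the set
        self.set(k, v)
        return old

    def setnx(self, k: str, v: str) -> bool:
        if self._live(k) is not None:
            return False  # key already exists: leave it untouched
        self.set(k, v)
        return True

    def setex(self, k: str, n: float, v: str) -> None:
        self._data[k] = (v, self.now + n)  # key disappears n seconds from now


s = StringStore()
s.set("hits", "1")
print(s.getset("hits", "0"))  # 1
print(s.setnx("hits", "5"))  # False
```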
Sets (unordered groupings of values): SADD k v; SCARD k – counts the members of the set; SISMEMBER k v – checks whether v is in the set; SUNION k [k2 …] – union of the sets; SINTER k [k2 …] – intersection of the sets; SDIFF k [k2 …] – members of the first set that are in none of the others.
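The set commands map closely onto Python's built-in `set`. The sketch below is a toy model of their behaviour, not Redis. Its method names simply mirror the commands, and a missing key is treated as an empty set, as Redis does for these read commands.

```python
from typing import Dict, Set


class SetStore:
    """Toy model of Redis set commands on top of Python sets."""

    def __init__(self) -> None:
        self._sets: Dict[str, Set[str]] = {}

    def _get(self, k: str) -> Set[str]:
        # A missing key behaves like an empty set.
        return self._sets.get(k, set())

    def sadd(self, k: str, *members: str) -> int:
        s = self._sets.setdefault(k, set())
        before = len(s)
        s.update(members)
        return len(s) - before  # how many members were actually new

    def scard(self, k: str) -> int:
        return len(self._get(k))

    def sismember(self, k: str, v: str) -> bool:
        return v in self._get(k)

    def sunion(self, *keys: str) -> Set[str]:
        return set().union(*(self._get(k) for k in keys))

    def sinter(self, first: str, *others: str) -> Set[str]:
        # intersection() always returns a new set, so stored sets are never exposed.
        return self._get(first).intersection(*(self._get(k) for k in others))

    def sdiff(self, first: str, *others: str) -> Set[str]:
        # Members of the first set that appear in none of the others.
        return self._get(first).difference(*(self._get(k) for k in others))
```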
Lists (ordered groups): LPUSH k v – prepends; LPOP k – removes and returns the first element; LINSERT k BEFORE|AFTER pivot v – inserts v before or after the first element equal to pivot; RPUSH k v – appends; RPOP k – removes and returns the last element; LLEN k – number of elements; LRANGE k n m – elements n through m, inclusive (zero-based; negative indices count from the end).
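Two details of the list commands are easy to get wrong: LINSERT locates its pivot by value, and LRANGE includes its stop index. The toy Python model below captures both. It is not Redis, and a Python list has O(n) cost at the head where Redis lists are O(1), so only the semantics carry over, not the performance.

```python
from typing import Dict, List, Optional


class ListStore:
    """Toy model of Redis list commands on top of Python lists."""

    def __init__(self) -> None:
        self._lists: Dict[str, List[str]] = {}

    def lpush(self, k: str, v: str) -> int:
        lst = self._lists.setdefault(k, [])
        lst.insert(0, v)  # prepend (O(n) for a Python list)
        return len(lst)

    def rpush(self, k: str, v: str) -> int:
        lst = self._lists.setdefault(k, [])
        lst.append(v)  # append
        return len(lst)

    def lpop(self, k: str) -> Optional[str]:
        lst = self._lists.get(k)
        return lst.pop(0) if lst else None  # remove and return the first element

    def rpop(self, k: str) -> Optional[str]:
        lst = self._lists.get(k)
        return lst.pop() if lst else None  # remove and return the last element

    def llen(self, k: str) -> int:
        return len(self._lists.get(k, []))

    def linsert(self, k: str, where: str, pivot: str, v: str) -> int:
        lst = self._lists.get(k)
        if not lst:
            return 0  # no such list
        if pivot not in lst:
            return -1  # pivot value not found
        i = lst.index(pivot)  # first element equal to pivot
        lst.insert(i if where == "BEFORE" else i + 1, v)
        return len(lst)

    def lrange(self, k: str, start: int, stop: int) -> List[str]:
        lst = self._lists.get(k, [])
        n = len(lst)
        # Zero-based indices; negative ones count from the end; stop is inclusive.
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        if stop < 0 or start > stop:
            return []
        return lst[start:stop + 1]
```

Note that `lrange(k, 0, -1)` returns the whole list, while the Python slice `lst[0:-1]` would drop the last element.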
Replication: SLAVEOF host port; replication is asynchronous; replicas can be chained (master -> slave -> slave); two masters cannot be chained together (no master -> master replication).
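Asynchronous, chained replication means a write is acknowledged before any replica has it, and each hop in the chain can lag behind the one above it. The toy Python model below illustrates only that ordering. It is not how Redis ships data internally, and the `Node`, `replicate_to`, and `flush` names are hypothetical. `replicate_to` is only a rough analogue of running SLAVEOF on the replica, and in this sketch only the master receives client writes.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Node:
    """Toy node: applies writes locally, then forwards them asynchronously."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.replicas: List["Node"] = []
        self.outbox: Deque[Tuple[str, str]] = deque()  # writes not yet shipped

    def replicate_to(self, replica: "Node") -> None:
        # Rough analogue of running SLAVEOF on the replica.
        self.replicas.append(replica)

    def set(self, k: str, v: str) -> None:
        # Acknowledged as soon as it is applied locally; replicas are not awaited.
        self._apply(k, v)

    def _apply(self, k: str, v: str) -> None:
        self.data[k] = v
        if self.replicas:
            self.outbox.append((k, v))  # queue for the next hop in the chain

    def flush(self) -> None:
        # Ship queued writes exactly one hop down the chain.
        while self.outbox:
            k, v = self.outbox.popleft()
            for r in self.replicas:
                r._apply(k, v)


master, r1, r2 = Node("master"), Node("r1"), Node("r2")
master.replicate_to(r1)
r1.replicate_to(r2)  # chain: master -> r1 -> r2
master.set("k", "v")
print(r1.data, r2.data)  # {} {}: the replicas have not caught up yet
```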
Other features: sorted sets (members ordered by a score, with set operations but higher big-O cost); hashes (many field/value pairs under one key): HSET k field v – sets field to v in the hash at k, HGET k field; transactions: MULTI / EXEC / DISCARD / WATCH; publish/subscribe messaging (Pub/Sub).
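The transaction commands combine naturally with hashes, for example when updating a shopping-cart hash. In the toy Python model below, MULTI queues commands, EXEC runs them all together, DISCARD drops them, and EXEC aborts with no result (Redis replies nil) when a WATCHed key changed in the meantime. This is not Redis or a client library. The classes, the per-key version counter used to detect changes, and the `cart:1` key are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


class Server:
    """Toy server holding hashes, plus a version counter per key for WATCH."""

    def __init__(self) -> None:
        self.hashes: Dict[str, Dict[str, str]] = {}
        self.versions: Dict[str, int] = {}

    def hset(self, k: str, field: str, v: str) -> int:
        h = self.hashes.setdefault(k, {})
        added = field not in h
        h[field] = v
        self.versions[k] = self.versions.get(k, 0) + 1  # every write bumps the version
        return int(added)  # 1 if the field is new, 0 if it was overwritten

    def hget(self, k: str, field: str) -> Optional[str]:
        return self.hashes.get(k, {}).get(field)


class Client:
    """Toy client modelling WATCH / MULTI / EXEC / DISCARD."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.watched: Dict[str, int] = {}  # key -> version seen at WATCH time
        self.queue: Optional[List[Callable[[], int]]] = None  # None = not in MULTI

    def watch(self, k: str) -> None:
        self.watched[k] = self.server.versions.get(k, 0)

    def multi(self) -> None:
        self.queue = []

    def hset(self, k: str, field: str, v: str) -> Optional[int]:
        if self.queue is not None:
            # Inside MULTI the command is only queued, not executed.
            self.queue.append(lambda: self.server.hset(k, field, v))
            return None
        return self.server.hset(k, field, v)

    def discard(self) -> None:
        # Drop the queued commands and forget all watched keys.
        self.queue = None
        self.watched = {}

    def exec(self) -> Optional[List[int]]:
        if self.queue is None:
            raise RuntimeError("EXEC without MULTI")
        queue, watched = self.queue, self.watched
        self.queue, self.watched = None, {}
        # A watched key changed since WATCH: abort and run nothing.
        if any(self.server.versions.get(k, 0) != ver for k, ver in watched.items()):
            return None
        # Single-threaded toy: nothing can interleave with this loop.
        return [cmd() for cmd in queue]
```

WATCH provides optimistic locking: instead of blocking other clients, a transaction notices the conflict at EXEC time, and the caller retries.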
Startup info; client logins; databases and number of keys; background saves and their timing; replication.
The bottleneck is memory; CPU usage is low; the disk is touched only on flush.
Complexity (amortized): SET, GET, etc. – O(1); KEYS – O(N); ZADD, ZREM, etc. – O(log(N)).
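The gap between GET and KEYS is easiest to see with a hash table. GET is a single lookup, while KEYS must test every key against its pattern. The Python sketch below uses a `dict` as a stand-in for the key space and counts the keys that KEYS inspects. Python's `fnmatch.fnmatchcase` only approximates Redis's glob matching, and the two differ in details such as escaping.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple


def get(db: Dict[str, str], k: str) -> Optional[str]:
    # A single hash-table lookup: cost does not grow with len(db) (amortized O(1)).
    return db.get(k)


def keys(db: Dict[str, str], pattern: str) -> Tuple[List[str], int]:
    # Every key must be tested against the glob pattern, so the work is O(N).
    checked = 0
    matches: List[str] = []
    for k in db:
        checked += 1
        if fnmatchcase(k, pattern):
            matches.append(k)
    return matches, checked


db = {f"user:{i}": str(i) for i in range(10_000)}
db["session:abc"] = "alice"
print(get(db, "session:abc"))  # alice
print(keys(db, "session:*"))  # (['session:abc'], 10001)
```

Even when a single key matches, all 10,001 keys are inspected. That full scan is why KEYS is expensive on a large data set.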
General types of databases: relational databases – Oracle, Postgres, MySQL; object stores – Objectivity, Caché, db4o; key-value stores – Berkeley DB, Riak, Cassandra; document stores – MongoDB, Lotus, CouchDB; graph databases – Neo4j, InfoGrid. Redis is a key-value store.
Cassandra: intentionally built for clustering; replicas are not always consistent (eventual consistency); written in Java; much richer data organization; often called a column store, though this is a misnomer; modeled on Amazon's Dynamo.
Riak: written in Erlang; robust data organization; REST API; built for clustering, similar to Cassandra; modeled on Amazon's Dynamo.
MongoDB: a document store that can represent objects natively and understands its data; can query by field values, not only by key; much more advanced architecture; auto-sharding.