Databases with Scalable Capabilities
Presented by Mike Trischetta
Scalability
◦ Vertical: improve the resources of one machine (faster CPU, more memory or storage)
◦ Horizontal: add more machines; flexible, based on needs
NoSQL and Scaling
◦ Easier to partition
◦ More loosely specified
◦ Reduced overhead
Desirable Properties
Transaction Processing
Database Scalability
Desirable Properties
◦ Data consistency
◦ Availability
◦ Predictable performance
◦ Scalable, high-performance storage
Data Structures
◦ Must handle large, mixed structures
ACID Properties
◦ Atomicity
◦ Consistency
◦ Isolation
◦ Durability
Together these give strong consistency, which is not essential for all use cases.
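Atomicity is the easiest ACID property to see in action. The sketch below uses Python's built-in sqlite3 module as a stand-in for an RDBMS; the accounts table and the transfer helper are illustrative only. A transfer that would overdraw an account is aborted partway through, and the debit that already ran is rolled back.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Debit src and credit dst as one transaction: both updates commit, or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes it
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers rollback of the debit above
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        return True
    except ValueError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False: the partial debit is rolled back
print(balances(conn))                       # {'alice': 70, 'bob': 30}
```

Without the transaction, the failed transfer would have left alice at -430 with nothing credited to bob.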
CAP Properties
◦ Consistency: once data is written, every subsequent read from any node returns that latest version
◦ Availability: every request to a non-failed node receives a response
◦ Partition tolerance: the database keeps operating even when network failures leave some nodes unable to reach others
Weak or Eventual Consistency
CAP Theorem
◦ A distributed system that shares data cannot guarantee all three CAP properties at the same time
◦ The designer must choose which two to prioritize; because network partitions cannot be ruled out, the practical trade-off during a partition is consistency versus availability
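The consistency-versus-availability choice can be sketched with a toy model of two replicas. This is an illustrative assumption, not how any particular DBMS works: the TwoReplicaStore class and its "CP"/"AP" modes are hypothetical. During a simulated partition, the CP store refuses writes so the replicas never disagree, while the AP store accepts them and lets the replicas diverge.

```python
class TwoReplicaStore:
    """Toy model (hypothetical): two replicas of one key-value map that normally copy every write."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False  # True: the two replicas cannot reach each other

    def write(self, node: int, key: str, value) -> bool:
        if self.partitioned and self.mode == "CP":
            return False  # CP: refuse the write (lose availability) so replicas never disagree
        self.replicas[node][key] = value
        if not self.partitioned:
            self.replicas[1 - node][key] = value  # normal case: replicate to the other node
        return True  # accepted; under AP during a partition the replicas may now diverge

    def read(self, node: int, key: str):
        return self.replicas[node].get(key)


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write(0, "x", 1)
    store.partitioned = True
    accepted = store.write(0, "x", 2)
    print(mode, accepted, store.read(0, "x"), store.read(1, "x"))
# CP False 1 1
# AP True 2 1
```

The AP result, where node 1 still returns the stale value 1, is exactly the situation that weak or eventual consistency has to resolve later.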
RDBMS
◦ Data replication for consistency
◦ Grids
NoSQL
◦ Data partitioning
◦ Concurrent computation
ACID
◦ Generally not guaranteed by NoSQL systems (a few do offer it)
◦ Provided by RDBMS
RDBMS can and do still scale vertically
Concurrency is still possible
DBMS choice depends on the use case
Data Models
Querying
Transactions
Physical Data Storage
"Not only SQL" / post-relational
Data Model
◦ Distributed hash tables (DHT)
◦ Key-value pairs hashed into buckets
◦ Horizontal (row) partitioning
No join, aggregation, ordering, or nesting operations: these must be done client-side
Allows parallel operations
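A minimal sketch of the hash-partitioned model, assuming a hypothetical four-node cluster where each node is just a Python dict. Each key is hashed to pick its bucket (node), and because the store offers no JOIN, the client combines orders with users itself.

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a key to a bucket (node) by hashing it; modulo hashing is the simplest scheme."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class PartitionedStore:
    """Key-value pairs spread horizontally over several nodes, each node a plain dict."""

    def __init__(self, num_nodes: int = NUM_NODES) -> None:
        self.nodes = [{} for _ in range(num_nodes)]

    def put(self, key: str, value) -> None:
        self.nodes[node_for(key, len(self.nodes))][key] = value

    def get(self, key: str):
        return self.nodes[node_for(key, len(self.nodes))].get(key)


store = PartitionedStore()
store.put("user:1", {"name": "Ada"})
store.put("user:2", {"name": "Lin"})
store.put("order:10", {"user": "user:1", "total": 25})
store.put("order:11", {"user": "user:2", "total": 40})


def orders_with_names(order_keys):
    """Client-side 'join': the store has no JOIN, so the client follows each reference itself."""
    rows = []
    for key in order_keys:
        order = store.get(key)
        user = store.get(order["user"])
        rows.append((key, user["name"], order["total"]))
    return rows


print(orders_with_names(["order:10", "order:11"]))
# [('order:10', 'Ada', 25), ('order:11', 'Lin', 40)]
```

Modulo hashing is chosen here only for brevity; it remaps most keys whenever the node count changes, which is why real DHTs commonly use consistent hashing instead. Since every lookup goes to exactly one node, independent gets can run in parallel.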
JSON format
◦ Key ~ attributes
◦ Document ~ tuples
Big Hash Tables
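To make the key ~ attribute and document ~ tuple correspondence concrete, this sketch uses Python's json module with a made-up user record. It shows the same record as a relational tuple and as a JSON document, plus a second document with a different, nested shape.

```python
import json

# Relational view: a fixed schema, and each row is a tuple of attribute values.
columns = ("id", "name", "email")
row = (1, "Ada", "ada@example.com")

# Document view: the same record as JSON; each key names an attribute.
doc = dict(zip(columns, row))
print(json.dumps(doc))  # {"id": 1, "name": "Ada", "email": "ada@example.com"}

# Documents in one collection need not share a schema, and values may nest.
other = json.loads('{"id": 2, "name": "Lin", "tags": ["admin"], "address": {"city": "Oslo"}}')
print(sorted(other))  # ['address', 'id', 'name', 'tags']

# Flattening a document back into a tuple means picking columns; absent keys become None.
print(tuple(other.get(c) for c in columns))  # (2, 'Lin', None)
```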
Executed via key hashing
Restricted SQL commands
Additional operations
◦ get(key)
◦ put(key, value)
◦ execute(key, operations, parameters)
◦ Complexity varies by DBMS
◦ Usually return tuples
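The three operations listed above can be sketched as a toy in-memory store. Only the names and argument lists of get, put, and execute come from the slide. The operation set ("incr", "append") and the rule that execute returns a (key, value) tuple are assumptions made for illustration, because each DBMS defines these differently.

```python
from typing import Any, Callable, Dict, List, Tuple


class KeyValueStore:
    """Toy store exposing get, put, and execute; the operation set and semantics are assumptions."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        # Hypothetical named operations the store can run against a stored value.
        self._ops: Dict[str, Callable[..., Any]] = {
            "incr": lambda value, n=1: (value or 0) + n,
            "append": lambda value, item: (value or []) + [item],
        }

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def execute(self, key: str, operations: List[str], parameters: List[Tuple]) -> Tuple[str, Any]:
        """Apply each named operation, with its parameters, to the value under key; return (key, value)."""
        if len(operations) != len(parameters):
            raise ValueError("one parameter tuple is needed per operation")
        value = self._data.get(key)
        for op, params in zip(operations, parameters):
            value = self._ops[op](value, *params)
        self._data[key] = value
        return (key, value)


kv = KeyValueStore()
kv.put("hits:home", 0)
print(kv.execute("hits:home", ["incr", "incr"], [(), (5,)]))  # ('hits:home', 6)
print(kv.execute("log:ada", ["append"], [("login",)]))         # ('log:ada', ['login'])
print(kv.get("hits:home"))                                     # 6
```

Running operations on the server through execute saves a get/modify/put round trip per update, which matters when the data sits on a remote node.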
Full ACID transactions generally not supported
◦ Caveat: some systems do provide them
Consistency is reached over time
◦ Weak/eventual consistency
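One common way replicas "reach consistency over time" is periodic anti-entropy with a last-writer-wins rule. The sketch below assumes that rule and uses hypothetical Node and anti_entropy names. Two replicas accept different writes, disagree for a while, and then converge after they exchange state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    # key -> (timestamp, value); the higher timestamp wins when replicas disagree
    store: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        self.store[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return entry[1] if entry else None


def anti_entropy(a: Node, b: Node) -> None:
    """Sync two replicas so each keeps the newest version of every key (last-writer-wins)."""
    for key in set(a.store) | set(b.store):
        # ties on timestamp fall back to comparing values, so both sides still agree
        newest = max(e for e in (a.store.get(key), b.store.get(key)) if e is not None)
        a.store[key] = newest
        b.store[key] = newest


n1, n2 = Node(), Node()
n1.write("cart", "book", ts=1)      # accepted by one replica
n2.write("cart", "book+pen", ts=2)  # a later write accepted by the other replica
print(n1.read("cart"), n2.read("cart"))  # book book+pen
anti_entropy(n1, n2)
print(n1.read("cart"), n2.read("cart"))  # book+pen book+pen
```

Last-writer-wins is only one policy and silently discards the losing write; systems that must keep concurrent updates track causality instead (for example with vector clocks).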
Spread over multiple nodes
◦ Tablets
◦ Rows split between nodes
◦ Tablet = table name + end key
Hierarchical
File-based
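Because a tablet is identified by its table name and end key, finding the tablet for a row reduces to a binary search over sorted end keys. In this sketch the TabletMap class, the "users" table, and the node names are hypothetical. Each tablet holds keys greater than the previous end key and up to and including its own, and a final tablet with no end key catches everything beyond the last boundary.

```python
import bisect
from typing import List, Optional, Tuple

TabletId = Tuple[str, Optional[str]]  # (table name, end key); None means "no upper bound"


class TabletMap:
    """Toy metadata mapping contiguous row-key ranges of one table to the nodes serving them."""

    def __init__(self, table: str, end_keys: List[str], nodes: List[str]) -> None:
        # Sorted end keys; one extra node serves the final, unbounded tablet.
        if end_keys != sorted(end_keys) or len(nodes) != len(end_keys) + 1:
            raise ValueError("end_keys must be sorted and len(nodes) == len(end_keys) + 1")
        self.table = table
        self.end_keys = end_keys
        self.nodes = nodes

    def locate(self, row_key: str) -> Tuple[TabletId, str]:
        """Find the first tablet whose end key >= row_key, i.e. the tablet that holds the row."""
        i = bisect.bisect_left(self.end_keys, row_key)
        end_key = self.end_keys[i] if i < len(self.end_keys) else None
        return (self.table, end_key), self.nodes[i]


tablets = TabletMap("users", ["g", "p"], ["node-a", "node-b", "node-c"])
print(tablets.locate("alice"))  # (('users', 'g'), 'node-a')
print(tablets.locate("mike"))   # (('users', 'p'), 'node-b')
print(tablets.locate("zoe"))    # (('users', None), 'node-c')
```

This flattens the lookup into a single in-memory list. Bigtable-style systems instead keep the tablet locations in a hierarchy of metadata tablets, so a client can find any row in a few steps even when the table has very many tablets.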
DBMS choice depends on the use case
NoSQL
◦ Increased speed for large networks
◦ Flexible horizontal scaling
◦ Cheaper than legacy systems
RDBMS
◦ Retains ACID properties
◦ Flexible vertical scaling
◦ Can become expensive to upgrade/maintain
Questions?