Published by Natalie Goodman. Modified over 6 years ago.
1
NoSQL: Know Your Enemy
Shelly Noll, SRT Solutions, Ann Arbor, MI
@shellynoll
2
Disclaimer: There is a lot of disagreement about this topic.
Everything I say could be wrong depending on who you ask. Even if it’s right today, it will probably be wrong soon.
3
What is NoSQL? A database management system with the following features:
Queries do not use SQL.
ACID properties are not guaranteed.
Fault-tolerant, distributed architecture.
The term was coined by Carlo Strozzi in 1998 to describe a database he created that did not expose a SQL interface. It was co-opted in 2009 when Eric Evans from Rackspace and Johan Oskarsson from Last.fm organized an event to discuss the growing trend of open-source, distributed databases.
4
CAP Theorem
Consistency: all nodes see the same data at the same time. Availability: every request receives a success/failure response. Partition tolerance: the system keeps operating despite the failure of part of the system. A distributed system can satisfy any two of these guarantees at the same time, but not all three; in practice, when a network partition occurs, the system has to give up either consistency or availability. These are a couple of basic theories we need to cover to understand the difference between relational and NoSQL databases.
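To make the trade-off concrete, here is a minimal sketch of two replicas and a network partition. The `TwoReplicaStore` class, its `"CP"`/`"AP"` modes, and the `partitioned` flag are hypothetical, chosen only to illustrate the choice a real system faces: refuse writes during a partition (stay consistent) or accept them and let replicas diverge (stay available).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class TwoReplicaStore:
    """Toy two-node store (hypothetical); mode 'CP' or 'AP' decides partition behavior."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True simulates the link between a and b being down

    def write(self, replica: Replica, key, value) -> bool:
        if not self.partitioned:
            # Healthy network: replicate synchronously so both nodes agree.
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            # Choose consistency: refuse a write that cannot reach the other node.
            return False
        # Choose availability: accept the write locally; replicas may now disagree.
        replica.data[key] = value
        return True
```

With `TwoReplicaStore("CP")` a write during a partition returns `False` and both replicas still match; with `"AP"` the write succeeds but `a.data` and `b.data` differ until something reconciles them.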
5
ACID vs BASE
ACID: Atomicity, Consistency, Isolation, Durability
BASE: Basically Available, Soft state, Eventual consistency
Instead of the ACID properties found in relational databases, NoSQL databases have something different. What is the opposite of an acid? A base: NoSQL databases exhibit BASE properties.
Atomicity: all or nothing. Consistency: data must adhere to the schema and rules. Isolation: no transaction interferes with another. Durability: permanence.
Basically available: an application works basically all the time. Soft state: it does not have to be consistent all the time. Eventual consistency: it will reach some known state eventually.
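Atomicity ("all or nothing") is easiest to see in a relational database. The sketch below uses Python's built-in `sqlite3` module; the `accounts` table and the `transfer` helper are hypothetical examples. The connection's context manager commits on success and rolls back on an exception, so a failed transfer leaves no half-applied debit behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between hypothetical accounts; both updates commit or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # undoes the debit above
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
```

Calling `transfer(conn, "alice", "bob", 500)` returns `False` and both balances are unchanged, because the debit was rolled back together with the rest of the transaction.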
6
ACID vs BASE
ACID: strong consistency; isolation; focus on “commit”; nested transactions; conservative (pessimistic); difficult to change schema.
BASE: weak consistency; best effort; approximate answer OK; aggressive (optimistic); simpler; faster; easier to change.
Consistency: adheres to the rules. Isolation: transactions do not interfere. Source: Dr. Eric A. Brewer (2000).
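"Weak consistency" and "eventual consistency" can be illustrated with replicas that accept writes independently and reconcile later. This is a minimal sketch using a last-writer-wins rule, one common reconciliation strategy but not the only one; the `LWWReplica` class is hypothetical, and it assumes callers supply unique, comparable timestamps (real systems need clocks or version vectors for that).

```python
from dataclasses import dataclass, field


@dataclass
class LWWReplica:
    """Hypothetical replica storing key -> (timestamp, value); newest timestamp wins."""
    store: dict = field(default_factory=dict)

    def put(self, key, value, ts):
        self._apply(key, ts, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other: "LWWReplica"):
        # Anti-entropy step: pull the other replica's entries, keeping the newest per key.
        for key, (ts, value) in other.store.items():
            self._apply(key, ts, value)

    def _apply(self, key, ts, value):
        # Assumes timestamps are unique; an equal timestamp keeps the current value.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)
```

Before a merge, two replicas can return different answers for the same key (soft state); once each has merged from the other, they agree (eventual consistency).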
7
Why Did This Happen?
Data-related reasons: avoidance of unneeded complexity, avoidance of object-relational mapping, and avoidance of making schema changes.
Performance-related reasons: higher throughput, horizontal scalability and running on commodity hardware, and the complexity and cost of setting up database clusters.
Complexity: consider Twitter. You have users, status updates, relationships between users, direct messages, and not much else.
Object-relational mapping: object-oriented programmers have to create a layer in their applications that takes the data from the database and transforms it into objects the application can use. That layer also creates overhead in syncing the state of the objects in memory with the entities in the database, which is expensive and time-consuming. NoSQL APIs look more like the objects programmers already use.
NoSQL compromises reliability for better performance.
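The object-relational mapping overhead can be sketched side by side. Below, a hypothetical `User` object with a list of followed users is saved once through a hand-written mapping layer over two normalized `sqlite3` tables, and once as a single JSON document in a plain dict standing in for a document store. All table, class, and function names are illustrative assumptions.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class User:
    name: str
    follows: list  # names of users this user follows


def make_schema(conn):
    conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE follows (follower TEXT REFERENCES users(name), followee TEXT)")


def save_relational(conn, user):
    # Mapping layer: one object is split across two normalized tables.
    with conn:
        conn.execute("INSERT INTO users (name) VALUES (?)", (user.name,))
        conn.executemany("INSERT INTO follows (follower, followee) VALUES (?, ?)",
                         [(user.name, f) for f in user.follows])


def load_relational(conn, name):
    # ...and reassembled from query results on the way back out.
    rows = conn.execute("SELECT followee FROM follows WHERE follower = ? ORDER BY rowid",
                        (name,)).fetchall()
    return User(name, [followee for (followee,) in rows])


def save_document(store, user):
    # Document style: the whole object becomes one JSON value under one key.
    store[user.name] = json.dumps(asdict(user))


def load_document(store, name):
    return User(**json.loads(store[name]))
```

Both paths round-trip the same object, but the relational one needs a schema, a transaction, and a reassembly query, while the document one is a single serialize/deserialize step.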
8
Database Types: Key-Value, Graph, Document Store, Column Store
9
Database type disagreement
The same products are classified differently by Stephen Yen, Ken North, Rick Cattell, Jonathan Ellis, and Wikipedia.
Products: Amazon SimpleDB, Apache Hadoop, Cassandra, Google Bigtable, HBase, HyperTable, Redis
Labels used: Entity-Attribute-Value Data Store, Document Store, Tabular, Wide Columnar Store, Extensible Record Store, ColumnFamily, Eventually-Consistent Key-Value Store, Key-Value Store, Data-Structures Server, Collection, Key-Value Cache
10
Key-Value
Data is stored in a schema-less way, as a key and a value. Querying capability is limited. Values can usually be of any data type, or can be a serialized object.
Variations: eventually consistent, hierarchical, ordered, key-value cache (in RAM or on disk).
Examples: Memcached, Redis, Riak (Basho), Voldemort
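The key-value model fits in a few lines. This sketch of an in-memory key-value cache is hypothetical (it is not the API of Memcached, Redis, or any other product): values are stored as serialized blobs, entries can expire, and the only query is lookup by exact key.

```python
import pickle
import time


class KeyValueCache:
    """Hypothetical in-memory key-value cache with optional per-key expiry."""

    def __init__(self):
        self._data = {}  # key -> (serialized value, expiry time or None)

    def set(self, key, value, ttl=None):
        # Values are opaque to the store; pickle is only safe for trusted data.
        expires = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (pickle.dumps(value), expires)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        blob, expires = entry
        if expires is not None and time.monotonic() >= expires:
            del self._data[key]  # lazily evict the expired entry
            return default
        return pickle.loads(blob)

    def delete(self, key):
        self._data.pop(key, None)
```

There is no way to ask "which users live in Ann Arbor?" here without reading every value, which is the "limited querying capability" trade-off.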
11
Popular Key-Value stores
Memcached: vendor Danga; language C; used by LiveJournal, YouTube, Reddit, Zynga, Facebook, Twitter
Redis: vendor VMware; language ANSI C; used by GitHub, Craigslist, Blizzard, Digg, Twitter, Flickr, Stack Overflow
Riak: vendor Basho; languages Erlang, C, C++, JavaScript; used by Comcast, Mozilla, AOL, Ask.com
Voldemort: vendor LinkedIn; language Java
12
Graph
Based on graph theory. Data is stored as nodes (entities), properties, and edges (relationships). Allows for calculations between nodes, such as the shortest distance between two nodes and analysis of relationships.
Examples: AllegroGraph, FlockDB, GraphDB, InfiniteGraph, Neo4j, OrientDB
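"Shortest distance between nodes" is a classic graph calculation. For an unweighted graph it is breadth-first search; the sketch below is a plain-Python illustration with hypothetical data, not the query language of any graph database listed here.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an unweighted, undirected graph; returns a node list or None."""
    graph = {}
    for a, b in edges:  # treat each relationship as undirected
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    previous = {start: None}  # visited nodes and the node we reached them from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # goal not reachable from start


friends = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
           ("alice", "eve"), ("eve", "dave")]
```

Here `shortest_path(friends, "alice", "dave")` finds the two-hop route through `eve` rather than the three-hop route through `bob` and `carol`, because BFS explores nodes in order of distance.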
13
Popular graph databases
AllegroGraph: vendor Franz, Inc.; language Lisp; used by Pfizer, Ford, Kodak, NASA, DoD
FlockDB: vendor Twitter
GraphDB: vendor Sones; language .NET
InfiniteGraph: vendor Objectivity; used by CIA, DoD
Neo4j: vendor Neo Technology; language Java; used by Adobe, Cisco
OrientDB: Apache license; used by a bunch of small companies no one’s heard of
14
Document Store
Stores document-oriented or semi-structured data. Documents may be encoded as XML, YAML, JSON, BSON, PDF, MS Word, MS Excel, etc. Documents are not required to adhere to a standard schema. Offers a query language to retrieve documents based on their content.
Examples: Amazon SimpleDB, Apache CouchDB, Lotus Notes, MongoDB
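Querying documents by content, with no fixed schema, can be sketched as matching field paths inside nested dicts. The `find` and `lookup` functions and the dotted-path convention are hypothetical, loosely in the spirit of query-by-example filters; they are not any product's actual API.

```python
_MISSING = object()  # sentinel: the document has no value at this path


def lookup(doc, path):
    """Follow a dotted path such as "author.name" through nested dicts."""
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


def find(collection, query):
    """Return documents where every query path equals the expected value."""
    return [doc for doc in collection
            if all(lookup(doc, path) == expected for path, expected in query.items())]


books = [
    {"_id": 1, "title": "Dune", "author": {"name": "Frank Herbert"}, "tags": ["sci-fi"]},
    {"_id": 2, "title": "Emma", "author": {"name": "Jane Austen"}},
    {"_id": 3, "title": "Notes", "format": "PDF"},  # no author field: there is no schema to violate
]
```

`find(books, {"author.name": "Jane Austen"})` returns only the second document; documents that lack a field simply do not match, instead of being rejected at write time.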
15
Popular Document stores
CouchDB: vendor Apache; language Erlang; used by various Facebook applications
MongoDB: vendor 10gen; language C++; used by MTV Networks, Craigslist, Foursquare
SimpleDB: vendor Amazon
16
Column store
Stores data in a tabular format. There are different names for the same thing: Wide Columnar Store, ColumnFamily, Tabular, Entity-Attribute-Value Data Store, Extensible Record Store, Multivalue, BigTable.
Examples: Apache Hadoop, Cassandra, Google Bigtable, HBase, HyperTable
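The column-family data model can be pictured as nested maps: row key, then column family, then column, then value. This `ColumnFamilyTable` class is a hypothetical toy, not the API of Bigtable, HBase, or Cassandra. It shows two characteristic traits: families are declared up front while columns are not, so rows can be sparse, and scans walk rows in row-key order.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Hypothetical wide-column table: row key -> column family -> column -> value."""

    def __init__(self, families):
        # Column families are declared up front; columns inside them are not.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get(self, row_key, family):
        """Return the (possibly sparse) columns one row holds in one family."""
        row = self.rows.get(row_key)
        if row is None:
            return {}
        return dict(row.get(family, {}))

    def scan(self, family, column):
        """Yield (row_key, value) in row-key order for rows that have the column."""
        for row_key in sorted(self.rows):
            cells = self.rows[row_key].get(family, {})
            if column in cells:
                yield row_key, cells[column]


t = ColumnFamilyTable(["profile"])
t.put("user:2", "profile", "name", "Bo")
t.put("user:1", "profile", "name", "Al")
t.put("user:1", "profile", "email", "al@example.com")  # only this row has an email column
```

`t.get("user:2", "profile")` returns just `{"name": "Bo"}`, and writing to an undeclared family raises `KeyError`.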
17
Popular column stores
Bigtable: vendor Google; runs on the Google File System
Cassandra: vendor Apache; language Java; used by Netflix, Twitter, Constant Contact, Reddit, Digg
Hadoop: vendor Yahoo!
HBase: used by Facebook's messaging platform
HyperTable: vendor Zvents; language C++; used by Baidu
18
MapReduce
A programming model for dividing work across a distributed system. It breaks a big task into smaller tasks that can be done in parallel.
Map query: transforms each input record into intermediate key/value pairs.
Reduce query: operates over the set of results for each key to produce the final output.
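The canonical MapReduce illustration is word count. This single-process sketch uses hypothetical function names; a real framework would run the map calls on many machines and perform the shuffle (grouping by key) over the network, but the data flow is the same.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Map: emit (word, 1) for every word in one input document.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group intermediate values by key (done by the framework between phases).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a final result.
    return key, sum(values)


def map_reduce(documents):
    # Each map_phase call is independent, so these could run in parallel;
    # here they run sequentially for simplicity.
    mapped = chain.from_iterable(map(map_phase, documents))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

`map_reduce(["the cat sat", "The dog sat"])` yields counts of 2 for `"the"` and `"sat"` and 1 for `"cat"` and `"dog"`.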
19
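Comparisons (performance, scalability, flexibility, complexity)
Key-value stores: high, high, high, none
Column stores: high, high, moderate, low
Document stores: high, variable (high), high, low
Graph databases: variable, variable, high, high
Relational databases: variable, variable, low, moderate
Source: Ben Scofield (2010)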
20
MongoDB example
21
Where wouldn’t you use NoSQL?
When data is critical to the function of the business or application, when the data has a strong and/or slowly changing schema, when you need true transactional capabilities, when you need data mining capabilities, or when you need set-based updates.
Examples: banking apps, healthcare apps, enterprise apps
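"Set-based updates" means one declarative statement that changes every matching row. The sketch below contrasts an SQL `UPDATE` run through Python's `sqlite3` with the read, filter, and rewrite loop a client has to write against a plain key-value store. The `accounts` table, the `kind` column, and both helper functions are hypothetical.

```python
import sqlite3


def set_based_bonus(conn, amount):
    # One statement; the database finds and updates every matching row atomically.
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE kind = 'savings'", (amount,))
        updated = cur.rowcount
    return updated


def record_at_a_time_bonus(kv, amount):
    # Key-value contrast: the client reads, filters, and rewrites each value itself.
    updated = 0
    for key, record in list(kv.items()):
        if record["kind"] == "savings":
            kv[key] = {**record, "balance": record["balance"] + amount}
            updated += 1
    return updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, kind TEXT, balance INTEGER)")
with conn:
    conn.executemany("INSERT INTO accounts (kind, balance) VALUES (?, ?)",
                     [("savings", 100), ("checking", 50), ("savings", 200)])

kv = {1: {"kind": "savings", "balance": 100},
      2: {"kind": "checking", "balance": 50},
      3: {"kind": "savings", "balance": 200}}
```

Both functions end with the same balances, but only the SQL version is a single atomic operation; the loop could be interrupted halfway with some records updated and others not.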
22
Where would you use NoSQL?
Heavy read/write workloads, single-user data, simple, non-structured data, a lack of interconnected data, cases where it doesn’t matter if it takes a while for the data to become consistent, and data that is not critical.
Examples: social networking apps, mobile apps
23
Future of NoSQL
UnQL: a query language for NoSQL databases. It does not have a data definition language.
Acquisition of NoSQL databases by larger companies, similar to what happened in the BI space, where IBM, Microsoft, and HP acquired smaller players.
24
Shelly Noll, SRT Solutions, Ann Arbor, MI
Twitter: @shellynoll