NoSQL & Document Stores

BCHB697

Outline
  NoSQL
  Document Stores
  XML, JSON
  Partition, Replication, Availability
  Map / Reduce
BCHB697 - Edwards

NoSQL
  Not Only SQL
  Blanket term for non-traditional databases
  Minor and/or radical departures from tables, relational data modeling
    Column Stores, Document Stores, Triple Stores
  Typical rationale (at some cost)
    Scale/performance, Data model flexibility, Deployment

NoSQL vs Relational Databases
  Relational Databases: ACID
    Atomicity, Consistency, Isolation, Durability
    Ensures data is always self-consistent
    SQL query language w/ joins (relational)
  NoSQL: BASE
    Basically Available, Soft State, Eventually Consistent
    Give up on guarantees to achieve performance, scalability
    Simple queries / manual "joins"
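A manual "join" means the application, not the database, matches records across collections. The sketch below is only an illustration: the two in-memory collections, the `_id` and `artist_id` keys, and the `manual_join` helper are hypothetical names, not part of the course materials.

```python
# Hypothetical collections; in a document store each would come from a query.
artists = [
    {"_id": "a1", "artistName": "Iron Maiden"},
]
albums = [
    {"_id": "b1", "artist_id": "a1", "albumname": "Killers", "datereleased": 1981},
    {"_id": "b2", "artist_id": "a1", "albumname": "Powerslave", "datereleased": 1984},
    {"_id": "b3", "artist_id": "zz", "albumname": "Orphan Album", "datereleased": 2000},
]


def manual_join(artists, albums):
    """Pair each album with its artist's name, as an SQL inner join would."""
    # Index artists by id so each album lookup is a single dictionary access.
    by_id = {artist["_id"]: artist for artist in artists}
    joined = []
    for album in albums:
        artist = by_id.get(album["artist_id"])
        if artist is not None:  # inner-join semantics: skip unmatched albums
            joined.append((artist["artistName"], album["albumname"]))
    return joined


rows = manual_join(artists, albums)
```

The album with no matching artist is silently dropped, which is exactly the kind of consistency decision that ACID databases make for you and that BASE systems leave to application logic.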

NoSQL: Key Features
  Spread database over many nodes
    Partition, replication, query execution
  Simple queries, indexes
  Flexible data-model:
    Data may be inconsistent, duplicated
    Data attributes may be dynamically added
    Data attributes may be inconsistent
    Data attributes may be complex structures

NoSQL: Column Store
  Column Stores:
    Organize table by columns rather than rows
    Better compression of values
    Every column becomes an index
    Easy to add new columns to a table
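To make the row-versus-column distinction concrete, here is a minimal sketch that pivots a few rows into per-column lists and compresses one column. The sample values, the helper names, and the use of run-length encoding are assumptions for illustration; real column stores use their own storage and compression schemes.

```python
from itertools import groupby

# Hypothetical row-oriented table (one dictionary per row).
rows = [
    {"albumname": "Killers", "datereleased": 1981, "genre": "Hard Rock"},
    {"albumname": "Powerslave", "datereleased": 1984, "genre": "Hard Rock"},
    {"albumname": "Somewhere in Time", "datereleased": 1986, "genre": "Hard Rock"},
]


def to_columns(rows):
    """Pivot a list of row dictionaries into one list of values per column."""
    keys = {}
    for row in rows:
        for key in row:
            keys.setdefault(key, None)  # remember column names in first-seen order
    return {key: [row.get(key) for row in rows] for key in keys}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
compressed_genre = run_length_encode(columns["genre"])

# Adding a column touches only the new list, not the existing ones.
columns["label"] = ["EMI", "EMI", "EMI"]  # hypothetical values
```

Because all values of one attribute sit together, repeated values compress well and a scan over one attribute never reads the others.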

NoSQL: Document Store
  Database is a collection of "documents"
    Structured format: JSON, XML
    Collection ↔ Table
    Document ↔ Row of Table
  Every document can have its own structure
    Different attributes, complex values
  Usually documents in a collection have (somewhat) consistent keys
  Query for documents using their keys
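As a rough sketch of "query for documents using their keys", a collection can be modeled as a list of dictionaries whose keys need not match. The `find` helper and the `_id` values are hypothetical; a real document store would run this on the server, usually with an index.

```python
# A collection whose documents share some keys but not all of them.
collection = [
    {"_id": "1", "albumname": "The Book of Souls", "datereleased": 2015, "genre": "Hard Rock"},
    {"_id": "2", "albumname": "Killers", "datereleased": 1981},
    {"_id": "3", "albumname": "Powerslave", "datereleased": 1984},
]


def find(collection, **criteria):
    """Return documents that have every requested key with the requested value."""
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


hard_rock = find(collection, genre="Hard Rock")
from_1981 = find(collection, datereleased=1981)
```

Documents that lack the `genre` key are simply not matched, which shows why (somewhat) consistent keys keep queries predictable.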

XML Document
  <artist>
    <artistname>Iron Maiden</artistname>
    <albums>
      <album>
        <albumname>The Book of Souls</albumname>
        <datereleased>2015</datereleased>
        <genre>Hard Rock</genre>
      </album>
      <album>
        <albumname>Killers</albumname>
        <datereleased>1981</datereleased>
      </album>
      <album>
        <albumname>Powerslave</albumname>
        <datereleased>1984</datereleased>
      </album>
      <album>
        <albumname>Somewhere in Time</albumname>
        <datereleased>1986</datereleased>
      </album>
    </albums>
  </artist>
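One way to read such a document from Python is the standard-library `xml.etree.ElementTree` module. The sketch below uses an abridged copy of the artist document and assumes each album sits in its own `<album>` element.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the artist document, one <album> element per album.
xml_text = """
<artist>
  <artistname>Iron Maiden</artistname>
  <albums>
    <album>
      <albumname>The Book of Souls</albumname>
      <datereleased>2015</datereleased>
      <genre>Hard Rock</genre>
    </album>
    <album>
      <albumname>Killers</albumname>
      <datereleased>1981</datereleased>
    </album>
  </albums>
</artist>
"""

root = ET.fromstring(xml_text.strip())
artist_name = root.findtext("artistname")

# Element text is always a string in XML, so numbers must be converted by hand.
albums = [
    (album.findtext("albumname"), int(album.findtext("datereleased")))
    for album in root.iter("album")
]

# findtext returns the default when an optional element is absent.
genres = [album.findtext("genre", default="unknown") for album in root.iter("album")]
```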

JSON Document
  {
    "artistName" : "Iron Maiden",
    "albums" : [
      {
        "albumname" : "The Book of Souls",
        "datereleased" : 2015,
        "genre" : "Hard Rock"
      },
      {
        "albumname" : "Killers",
        "datereleased" : 1981
      },
      {
        "albumname" : "Powerslave",
        "datereleased" : 1984
      },
      {
        "albumname" : "Somewhere in Time",
        "datereleased" : 1986
      }
    ]
  }

JSON Syntax
  Dictionaries
    { <key1>: <value1>, …, <keyn>: <valuen> }
  Lists
    [ <value1>, …, <valuen> ]
  Strings, Numbers, Boolean, Null
    "string", 1, 5.6, true, null
  White-space between tokens is ignored
    Newlines, spaces, tabs
  Maps directly to modern programming languages
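The mapping to programming-language types can be seen with Python's standard `json` module. This sketch parses an abridged artist document with deliberately uneven white-space; the `active` and `label` keys are hypothetical additions, included only to show Boolean and null values.

```python
import json

# Abridged artist document; white-space between tokens varies on purpose.
json_text = """
{ "artistName" : "Iron Maiden",
  "albums" : [
    { "albumname" : "The Book of Souls", "datereleased" : 2015, "genre" : "Hard Rock" },
    {"albumname":"Killers","datereleased":1981}
  ],
  "active" : true,
  "label" : null
}
"""

artist = json.loads(json_text)

# JSON dictionaries, lists, strings, numbers, booleans, and null become
# dict, list, str, int/float, bool, and None respectively.
type_names = {key: type(value).__name__ for key, value in artist.items()}

titles = [album["albumname"] for album in artist["albums"]]

# Serializing and parsing again yields an equal structure.
round_trip_ok = json.loads(json.dumps(artist)) == artist
```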

CouchDB
  Document Store for JSON documents
  Apache Software Foundation project
  Can act as web-application back-end server
  Interactive browsing using Fauxton
  EdwardsLab: CouchDB, Fauxton
    UniProt database
  See also: MongoDB, Couchbase, …
  curl -X POST -H 'Content-Type: application/json' -u admin:admin '

Why Document Stores
  Documents can be partitioned across many commodity compute nodes
  Query requests sent to each compute node, executed in parallel on data partition
  Writes can be executed against any convenient node and in parallel
  Data can be replicated for performance and robustness reasons
  Flexible attributes can be determined later

Why Not Document Stores
  No complex relational queries
  Complex values can't be readily indexed
  Inconsistent keys can make the application logic convoluted
  Flexible data-model can lead to ad-hoc and on-the-fly modeling decisions
  Logical data-model is still needed, even if only "on paper," for application success

Partition, Replication, Availability
  Documents spread across (many) commodity servers (sharding):
    Cheaper, more fault tolerant than one massive server
    Replicate documents for availability
    Inserts, retrievals can operate in parallel
  All documents must be self-contained
    Query by id can be sent to a single server
    Query by key value is executed by all servers in parallel and results merged
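The routing rules above can be sketched with a toy in-memory store. Everything here is an assumption for illustration: the `ShardedStore` class, hash-based placement, and storing replicas on the next server in order are one simple scheme, not how CouchDB or any particular product does it.

```python
import hashlib


class ShardedStore:
    """Toy document store: each document lives on `replicas` of the servers."""

    def __init__(self, num_servers, replicas=2):
        # Assumes replicas <= num_servers; each server is a dict of id -> document.
        self.servers = [dict() for _ in range(num_servers)]
        self.replicas = replicas

    def _home(self, doc_id):
        # A stable hash of the id picks the primary server.
        digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.servers)

    def _owners(self, doc_id):
        # Replicas go on the servers that follow the primary, wrapping around.
        start = self._home(doc_id)
        return [(start + i) % len(self.servers) for i in range(self.replicas)]

    def put(self, doc):
        for index in self._owners(doc["_id"]):
            self.servers[index][doc["_id"]] = doc

    def get(self, doc_id, down=()):
        # Query by id: ask only the owning servers, skipping any that are down.
        for index in self._owners(doc_id):
            if index not in down:
                return self.servers[index].get(doc_id)
        return None

    def find(self, key, value):
        # Query by key value: every server scans its own documents,
        # then results are merged and replica duplicates removed.
        merged = {}
        for server in self.servers:
            for doc_id, doc in server.items():
                if doc.get(key) == value:
                    merged[doc_id] = doc
        return sorted(merged.values(), key=lambda doc: doc["_id"])


store = ShardedStore(num_servers=3, replicas=2)
for doc in [
    {"_id": "P1", "organism": "human"},
    {"_id": "P2", "organism": "mouse"},
    {"_id": "P3", "organism": "human"},
]:
    store.put(doc)
```

Because each document is self-contained, a lookup by id never needs a second server, and a replica can answer when the primary is unavailable.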

Map Reduce / Hadoop
  Simple computational model for large scale parallel data processing
  Esp. good for partitioned document store queries
  Map: Each server executes on its portion of the document collection independently
  Reduce: Results from each server are merged in batches as they become available
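The map and reduce steps can be illustrated in a few lines of plain Python. This is a sequential sketch, not Hadoop's API: the partitions, the sample genres, and the function names are hypothetical, and on a real cluster each map call would run on its own server.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: each inner list is the documents held by one server.
partitions = [
    [{"genre": "Hard Rock"}, {"genre": "Metal"}],
    [{"genre": "Hard Rock"}],
    [{"genre": "Metal"}, {"genre": "Metal"}],
]


def map_partition(docs):
    """Map: count genres using only the documents on one server."""
    return Counter(doc["genre"] for doc in docs)


def reduce_counts(left, right):
    """Reduce: merge two partial results into one."""
    return left + right


# Each partition is processed independently of the others.
partials = [map_partition(docs) for docs in partitions]

# Partial results can be merged in any order because addition is associative.
totals = reduce(reduce_counts, partials, Counter())
```

The reduce step only ever sees small partial counts, never the documents themselves, which is what lets the work scale with the number of servers.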

Document Stores
  Flexible data-model: rapid prototyping
  Simple queries, clear retrieval priorities
  Push application logic to the middle-tier or client: avoid complex queries in RDBMS
  Scalability, sharding, partitioning esp. for writes, updates

Exercise
  Explore Fauxton interface, URL interface to CouchDB:
    CouchDB, Fauxton
  Explore JSON web-services here:
  Python script to interact with CouchDB:
    urllib, json modules
    Extract documents from uniprot database

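Exercise
  import json
  from urllib.request import urlopen

  # '…' stands for the server URL, which is elided here
  base = '…' + \
         'couchdb/uniprot/'
  # _all_docs lists the id of every document in the database
  data = json.loads(urlopen(base + '_all_docs').read())
  for r in data['rows']:
      doc_id = r['id']
      # fetch and decode each document by its id
      entry = json.loads(urlopen(base + doc_id).read())
      print(entry)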
