NoSQL & Document Stores


1 NoSQL & Document Stores
BCHB697

2 Outline
- NoSQL
- Document Stores
  - XML, JSON
- Partition, Replication, Availability
- Map / Reduce
BCHB697 - Edwards

3 NoSQL
- Not Only SQL
- Blanket term for non-traditional databases
- Minor and/or radical departures from tables, relational data modeling
  - Column Stores, Document Stores, Triple Stores
- Typical rationale (at some cost):
  - Scale/performance, data model flexibility, deployment

4 NoSQL vs Relational Databases
Relational Databases: ACID
- Atomicity, Consistency, Isolation, Durability
- Ensures data is always self-consistent
- SQL query language w/ joins (relational)
NoSQL: BASE
- Basically Available, Soft State, Eventually Consistent
- Gives up some guarantees to achieve performance and scalability
- Simple queries / manual "joins"
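A toy sketch of what "eventually consistent" means in practice, not how any particular database replicates data: two in-memory replicas accept writes independently, a read from the lagging replica returns stale data, and a hypothetical `sync` step (last-writer-wins on a timestamp, one simple conflict rule among many) makes them converge. The `Replica` class and `sync` function are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: maps key -> (timestamp, value)."""
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        """Accept a write locally; other replicas do not see it yet."""
        self.data[key] = (ts, value)

    def read(self, key):
        """Return this replica's current value, which may be stale."""
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Hypothetical anti-entropy step: for every key, keep the write with the
    newest timestamp (last-writer-wins) on both replicas."""
    for key in set(a.data) | set(b.data):
        candidates = [r.data[key] for r in (a, b) if key in r.data]
        newest = max(candidates, key=lambda entry: entry[0])
        a.data[key] = newest
        b.data[key] = newest


r1, r2 = Replica(), Replica()
r1.write("doc1", "draft", ts=1)  # the write reaches replica 1 only
print(r2.read("doc1"))           # None: replica 2 is stale
sync(r1, r2)
print(r2.read("doc1"))           # draft: the replicas have converged
```

Between the write and the sync, the system stays available but the two replicas disagree; that window is the "soft state" in BASE.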

5 NoSQL: Key Features
- Spread database over many nodes
  - Partition, replication, query execution
- Simple queries, indexes
- Flexible data-model:
  - Data may be inconsistent, duplicated
  - Data attributes may be dynamically added
  - Data attributes may be inconsistent
  - Data attributes may be complex structures

6 NoSQL: Column Store
Column Stores:
- Organize table by column rather than by row
- Better compression of values
- Every column becomes an index
- Easy to add new columns to a table
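A rough sketch of the row-versus-column idea, assuming a tiny made-up table held in memory: the records are pivoted into one list per column, and run-length encoding (one common compression scheme, chosen here only for illustration) shrinks a column whose values repeat. Because a column holds values of one kind side by side, repeats are more likely than within a row.

```python
from itertools import groupby

# Hypothetical row-oriented table: one dict per record.
rows = [
    {"id": "doc1", "organism": "Human", "length": 393},
    {"id": "doc2", "organism": "Human", "length": 120},
    {"id": "doc3", "organism": "Mouse", "length": 393},
]


def to_columns(rows):
    """Pivot rows into one list per column (assumes every row has the same keys)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress a column into (value, repeat_count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
print(columns["length"])                       # [393, 120, 393]
print(run_length_encode(columns["organism"]))  # [('Human', 2), ('Mouse', 1)]
columns["reviewed"] = [True, False, True]      # adding a column is one more list
```

Reading `columns["length"]` touches only that column's values, which is why scans over a few columns are cheap in this layout.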

7 NoSQL: Document Store
- Database is a collection of "documents"
- Structured format: JSON, XML
- Collection ↔ Table
- Document ↔ Row of Table
- Every document can have its own structure
  - Different attributes, complex values
- Usually documents in a collection have (somewhat) consistent keys
- Query for documents using their keys
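A minimal in-memory sketch of the collection/document analogy, assuming a Python list stands in for a collection and dicts stand in for JSON documents: documents in the same collection carry different attributes, and a query selects documents by the value of a key. The `find` helper and the sample documents are hypothetical, not any database's API.

```python
# Hypothetical collection: a list of JSON-like documents with differing structure.
collection = [
    {"_id": "a1", "artistname": "Iron Maiden",
     "albums": [{"albumname": "Killers", "datereleased": 1981}]},
    {"_id": "a2", "artistname": "Example Band", "formed": 1990},  # extra attribute
    {"_id": "a3", "artistname": "Another Band"},                  # no albums at all
]


def find(collection, key, value):
    """Return documents whose `key` equals `value`; documents without the key never match."""
    return [doc for doc in collection if key in doc and doc[key] == value]


print([doc["_id"] for doc in find(collection, "artistname", "Iron Maiden")])  # ['a1']
print([doc["_id"] for doc in find(collection, "formed", 1990)])               # ['a2']
```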

8 XML Document
<artist>
  <artistname>Iron Maiden</artistname>
  <albums>
    <album>
      <albumname>The Book of Souls</albumname>
      <datereleased>2015</datereleased>
      <genre>Hard Rock</genre>
    </album>
    <album>
      <albumname>Killers</albumname>
      <datereleased>1981</datereleased>
    </album>
    <album>
      <albumname>Powerslave</albumname>
      <datereleased>1984</datereleased>
    </album>
    <album>
      <albumname>Somewhere in Time</albumname>
      <datereleased>1986</datereleased>
    </album>
  </albums>
</artist>
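For comparison with the JSON version, an XML document like this one can be read with Python's standard `xml.etree.ElementTree` module. This sketch hard-codes a shortened copy of the document and walks the `<album>` elements; note that every XML value arrives as text, so the year needs an explicit conversion.

```python
import xml.etree.ElementTree as ET

xml_text = """
<artist>
  <artistname>Iron Maiden</artistname>
  <albums>
    <album>
      <albumname>The Book of Souls</albumname>
      <datereleased>2015</datereleased>
      <genre>Hard Rock</genre>
    </album>
    <album>
      <albumname>Killers</albumname>
      <datereleased>1981</datereleased>
    </album>
  </albums>
</artist>
"""

root = ET.fromstring(xml_text.strip())
print(root.findtext("artistname"))  # Iron Maiden
for album in root.iter("album"):
    year = int(album.findtext("datereleased"))  # text -> int, done by hand
    genre = album.findtext("genre")             # None when the element is absent
    print(album.findtext("albumname"), year, genre)
```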

9 JSON Document
{
  "artistName" : "Iron Maiden",
  "albums" : [
    { "albumname" : "The Book of Souls", "datereleased" : 2015, "genre" : "Hard Rock" },
    { "albumname" : "Killers", "datereleased" : 1981 },
    { "albumname" : "Powerslave", "datereleased" : 1984 },
    { "albumname" : "Somewhere in Time", "datereleased" : 1986 }
  ]
}
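The same kind of document parsed with Python's standard `json` module. The text below is a shortened copy of the example; unlike the XML version, `datereleased` arrives as an `int` without any conversion. The last lines show that strict JSON rejects a trailing comma, a common mistake in hand-written documents.

```python
import json

doc_text = """
{
  "artistName": "Iron Maiden",
  "albums": [
    {"albumname": "The Book of Souls", "datereleased": 2015, "genre": "Hard Rock"},
    {"albumname": "Killers", "datereleased": 1981},
    {"albumname": "Powerslave", "datereleased": 1984}
  ]
}
"""

doc = json.loads(doc_text)
print(doc["artistName"])  # Iron Maiden
# Albums released before 1990; no int() needed.
early = [a["albumname"] for a in doc["albums"] if a["datereleased"] < 1990]
print(early)  # ['Killers', 'Powerslave']

try:
    json.loads('{"a": 1,}')  # trailing comma
except json.JSONDecodeError as err:
    print("invalid JSON:", err.msg)
```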

10 JSON Syntax
- Dictionaries: { <key1>: <value1>, …, <keyn>: <valuen> } (keys are strings)
- Lists: [ <value1>, …, <valuen> ]
- Strings, Numbers, Boolean, Null: "string", 1, 5.6, true, null
- White-space between tokens is ignored: newlines, spaces, tabs
- Maps directly to data structures in modern programming languages
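The "maps directly" point, illustrated with Python's standard `json` module: each JSON value type decodes to a built-in Python type, white-space between tokens is ignored, and `json.dumps` converts in the other direction.

```python
import json

text = '{"name": "string", "numbers": [1, 5.6], "flag": true, "missing": null}'
value = json.loads(text)

# object -> dict, array -> list, string -> str, number -> int or float,
# true/false -> True/False, null -> None
for key, item in value.items():
    print(key, type(item).__name__)
print([type(n).__name__ for n in value["numbers"]])  # ['int', 'float']

print(json.loads(' [ 1,\n\t2 ] '))                # [1, 2]: white-space ignored
print(json.dumps({"ok": True, "none": None}))   # {"ok": true, "none": null}
```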

11 CouchDB
- Document Store for JSON documents
- Apache Software Foundation project
- Can act as web-application back-end server
- Interactive browsing using Fauxton
- EdwardsLab: CouchDB, Fauxton
  - UniProt database
- See also: MongoDB, Couchbase, …
curl -X POST -H 'Content-Type: application/json' -u admin:admin '
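A Python counterpart to the `curl` call, using only the standard library. Since the URL in the `curl` command is elided, the server address `http://localhost:5984` (CouchDB's default port) and the database name `mydb` are assumptions; `POST /{db}` is the CouchDB endpoint that creates a document with a server-generated id. The sketch only builds the request, so it runs without a server; the commented lines show how it would be sent.

```python
import base64
import json
import urllib.request

# Assumptions: a local CouchDB on its default port and a database named "mydb".
COUCHDB_URL = "http://localhost:5984"
DB_NAME = "mydb"


def build_create_request(doc, user, password):
    """Build (but do not send) a POST that would create `doc` in DB_NAME."""
    body = json.dumps(doc).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{COUCHDB_URL}/{DB_NAME}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",  # same header as curl's -H
            "Authorization": f"Basic {token}",   # what curl's -u user:pass sends
        },
    )


req = build_create_request({"artistName": "Iron Maiden"}, "admin", "admin")
print(req.get_method(), req.full_url)  # POST http://localhost:5984/mydb
# Against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```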

12 Why Document Stores
- Documents can be partitioned across many commodity compute nodes
- Query requests are sent to each compute node and executed in parallel on its data partition
- Writes can be executed against any convenient node and in parallel
- Data can be replicated for performance and robustness reasons
- Flexible attributes can be determined later
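A sketch of sending one query to every node and running it in parallel, assuming each node's partition is just an in-memory list and threads stand in for separate machines. The partitions and helper names are hypothetical; the point is the scatter (same query on every partition) followed by the gather (concatenate partial results).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each list plays the role of one node's share of the data.
partitions = [
    [{"_id": "d1", "organism": "Human"}, {"_id": "d2", "organism": "Mouse"}],
    [{"_id": "d3", "organism": "Human"}],
    [{"_id": "d4", "organism": "Yeast"}, {"_id": "d5", "organism": "Human"}],
]


def query_partition(partition, key, value):
    """What each node would do: run the query on its local documents only."""
    return [doc for doc in partition if doc.get(key) == value]


def scatter_gather(partitions, key, value):
    """Send the query to every partition in parallel, then combine the results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [pool.submit(query_partition, p, key, value) for p in partitions]
        return [doc for f in futures for doc in f.result()]


print([d["_id"] for d in scatter_gather(partitions, "organism", "Human")])  # ['d1', 'd3', 'd5']
```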

13 Why Not Document Stores
- No complex relational queries
- Complex values can't be readily indexed
- Inconsistent keys can make the application logic convoluted
- Flexible data-model can lead to ad-hoc and on-the-fly modeling decisions
- Logical data-model is still needed, even if only "on paper," for application success
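To illustrate how inconsistent keys leak into application logic: if some documents use `artistName` and others `artistname` (as the JSON and XML examples in this deck do), every reader needs normalization code like the hypothetical helpers below. The documents and the list of accepted key spellings are made up.

```python
# Hypothetical documents written by different versions of an application.
docs = [
    {"artistName": "Iron Maiden", "albums": [{"albumname": "Killers"}]},
    {"artistname": "Another Band"},         # different spelling, no albums
    {"name": "Third Band", "albums": []},   # yet another key
]

NAME_KEYS = ("artistName", "artistname", "name")


def artist_name(doc):
    """Return the artist's name under whichever key this document happens to use."""
    for key in NAME_KEYS:
        if key in doc:
            return doc[key]
    return None


def album_count(doc):
    """Treat a missing 'albums' attribute as no albums instead of raising KeyError."""
    return len(doc.get("albums", []))


for doc in docs:
    print(artist_name(doc), album_count(doc))
```

Every new spelling means another entry in `NAME_KEYS`, which is the convolution the slide warns about.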

14 Partition, Replication, Availability
- Documents spread across (many) commodity servers (sharding):
  - Cheaper, more fault tolerant than one massive server
- Replicate documents for availability
- Inserts, retrievals can operate in parallel
- All documents must be self-contained
- Query by id can be sent to a single server
- Query by key value is executed by all servers in parallel and the results merged
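A sketch of the routing rule on this slide, assuming documents are assigned to servers by hashing their id (one common sharding scheme; real systems differ): a lookup by id touches exactly one shard, while a lookup by any other key must scan every shard and merge. The `Cluster` class is hypothetical.

```python
import zlib


class Cluster:
    """Toy sharded store: each shard is a dict of _id -> document."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, doc_id):
        # crc32 is stable across runs, unlike Python's randomized str hash().
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def insert(self, doc):
        self.shards[self.shard_for(doc["_id"])][doc["_id"]] = doc

    def get_by_id(self, doc_id):
        # Query by id: routed to a single shard.
        return self.shards[self.shard_for(doc_id)].get(doc_id)

    def find(self, key, value):
        # Query by key value: every shard is scanned and the results merged.
        return [doc for shard in self.shards
                for doc in shard.values() if doc.get(key) == value]


cluster = Cluster(num_shards=3)
for i in range(10):
    cluster.insert({"_id": f"doc{i}", "even": i % 2 == 0})

print(cluster.get_by_id("doc7"))  # {'_id': 'doc7', 'even': False}
print(sorted(d["_id"] for d in cluster.find("even", True)))
```

Documents must be self-contained for this to work: a document's shard is chosen from its own id, so nothing guarantees that related documents end up on the same server.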

15 Map Reduce / Hadoop
- Simple computational model for large-scale parallel data processing
- Esp. good for partitioned document store queries
- Map: Each server executes on its portion of the document collection independently
- Reduce: Results from each server are merged in batches as they become available
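A single-process sketch of the map/reduce pattern (not Hadoop's or CouchDB's API): a map function runs independently on each partition and emits partial counts, and a reduce step merges partial results pairwise. The split of the slide's albums into partitions and the "albums per decade" task are made-up examples.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions of an album collection, one list per server.
partitions = [
    [{"albumname": "Killers", "datereleased": 1981},
     {"albumname": "Powerslave", "datereleased": 1984}],
    [{"albumname": "Somewhere in Time", "datereleased": 1986}],
    [{"albumname": "The Book of Souls", "datereleased": 2015}],
]


def map_partition(docs):
    """Map: count albums per decade within one partition only."""
    return Counter(f"{doc['datereleased'] // 10 * 10}s" for doc in docs)


def reduce_counts(left, right):
    """Reduce: merge two partial results; the merge order does not matter."""
    return left + right


partials = [map_partition(p) for p in partitions]  # could run on each server in parallel
totals = reduce(reduce_counts, partials, Counter())
print(dict(totals))  # {'1980s': 3, '2010s': 1}
```

Because `reduce_counts` is associative and commutative, partial results can be merged in whatever batches and order they arrive.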

16 Document Stores
- Flexible data-model: rapid prototyping
- Simple queries, clear retrieval priorities
- Push application logic to the middle tier or client; avoid complex queries as in an RDBMS
- Scalability, sharding, partitioning, esp. for writes, updates
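Pushing logic to the middle tier often means doing a "join" in application code. A minimal sketch, assuming two hypothetical collections where each album document stores its artist's `_id` in an `artist_id` field: index one collection by id in a dict, then look up each document of the other (a hash join, behaving like an SQL inner join).

```python
# Hypothetical collections; artist_id links an album to its artist.
artists = [
    {"_id": "artist1", "artistname": "Iron Maiden"},
    {"_id": "artist2", "artistname": "Another Band"},
]
albums = [
    {"_id": "album1", "albumname": "Killers", "artist_id": "artist1"},
    {"_id": "album2", "albumname": "Powerslave", "artist_id": "artist1"},
    {"_id": "album3", "albumname": "Orphan Album", "artist_id": "artist9"},
]


def manual_join(albums, artists):
    """Like an SQL inner join on albums.artist_id = artists._id."""
    by_id = {artist["_id"]: artist for artist in artists}  # one pass to index artists
    joined = []
    for album in albums:
        artist = by_id.get(album["artist_id"])
        if artist is not None:  # inner join: skip albums whose artist is missing
            joined.append({"albumname": album["albumname"],
                           "artistname": artist["artistname"]})
    return joined


for row in manual_join(albums, artists):
    print(row)
```

In a real deployment the two collections would come from separate queries, so the application also pays for two round trips instead of one.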

17 Exercise
- Explore Fauxton interface, URL interface to CouchDB:
  - CouchDB, Fauxton
- Explore JSON web-services here:
- Python script to interact with CouchDB:
  - urllib, json modules
  - Extract documents from uniprot database

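18 Exercise
import json
from urllib.request import urlopen

# The server prefix of this URL is elided in the slide; supply it before running.
base = '…' + \
       'couchdb/uniprot/'
# List every document id, then fetch each full document by its id
data = json.loads(urlopen(base + '_all_docs').read())
for r in data['rows']:
    doc_id = r['id']
    entry = json.loads(urlopen(base + doc_id).read())
    print(entry)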
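The exercise loop makes one HTTP request per document. CouchDB's `_all_docs` endpoint also accepts `include_docs=true` (and `limit`), which return each full document inline in its row's `doc` field, so a single request can suffice. The sketch below only builds such a URL and parses a response of that shape; the sample payload, its ids, and the base URL `http://localhost:5984/uniprot/` are assumptions, and the two function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Made-up response in the shape CouchDB returns for _all_docs?include_docs=true.
sample_response = """
{"total_rows": 2, "offset": 0, "rows": [
  {"id": "doc1", "key": "doc1", "value": {"rev": "1-abc"},
   "doc": {"_id": "doc1", "_rev": "1-abc", "name": "example protein"}},
  {"id": "doc2", "key": "doc2", "value": {"rev": "1-def"},
   "doc": {"_id": "doc2", "_rev": "1-def", "name": "another protein"}}
]}
"""


def all_docs_url(base, limit=None):
    """Hypothetical helper: build an _all_docs URL that includes documents inline."""
    params = {"include_docs": "true"}
    if limit is not None:
        params["limit"] = limit
    return base + "_all_docs?" + urlencode(params)


def docs_from_all_docs(payload):
    """Extract the full documents from an _all_docs?include_docs=true response."""
    data = json.loads(payload)
    return [row["doc"] for row in data["rows"]]


print(all_docs_url("http://localhost:5984/uniprot/", limit=10))
for entry in docs_from_all_docs(sample_response):
    print(entry["_id"], entry["name"])
```

For a large database, `limit` keeps each response small; the trade-off against the per-document loop is fewer round trips for larger responses.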

