
NoSQL Databases - CouchDB By Tom Sausner. Agenda Introduction Review of NoSQL storage options  CAP Theorem  Review categories of storage options CouchDB.




1 NoSQL Databases - CouchDB By Tom Sausner

2 Agenda
- Introduction
- Review of NoSQL storage options
  - CAP Theorem
  - Review categories of storage options
- CouchDB
  - Overview
  - Interacting with data
  - Examples
- Technologies applying CouchDB

3 What does it mean?
- Not Only SQL, or NO! SQL
- A more general definition: a datastore that does not follow the relational model, including not using SQL to interact with the data
- Why?
  - One size does not fit all
  - The relational model has scaling issues
  - Freedom from the tyranny of the DBA?

4 CAP Theorem
- Conjectured by Eric Brewer of U.C. Berkeley; proved by Seth Gilbert and Nancy Lynch of MIT
- Relates to distributed systems
- Consistency, Availability, Partition tolerance… pick 2
- A distributed system is built of "nodes" (computers), which can (attempt to) send messages to each other over a network…

5 Consistency
- "is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time."
  - Not the same as the "C" in "ACID"
  - Linearizability ~ operations behave as if there were no concurrency
  - Does not mention transactions

6 Available
- "every request received by a non-failing node in the system must result in a response."
- Says nothing about the content of the response: it could be anything; it need not be "successful" or "correct".

7 Partition Tolerant
- Any guarantee of consistency or availability still holds even if there is a partition.
- If a system is not partition-tolerant, then whenever the network can lose messages or any node can fail, any guarantee of atomicity or consistency is voided.

8 Implications of CAP
- How best to scale your application?
- The world falls broadly into two ideological camps: the database crowd and the non-database crowd.
- The database crowd, unsurprisingly, likes database technology and tends to address scale by talking of things like optimistic locking and sharding.
- The non-database crowd tends to address scale by managing data outside of the database environment (avoiding the relational world) for as long as possible.

9 Types of NoSQL datastores
- Key-value stores
- Column stores
- Document stores
- Graph and object stores

10 Key-Value Stores
- Memcached / Membase (Membase just merged with CouchOne, the company behind CouchDB)
- Redis
- Riak

11 Column Stores
- Bigtable (Google)
- Dynamo (Amazon; strictly a key-value store)
- Cassandra
- Hadoop/HBase

12 Document Stores
- CouchDB
- MongoDB

13 Graph, Object Stores
- Neo4j
- db4o

14 CouchDB - relax (taken from the website)
- An Apache project created by Damien Katz
- A document database server, accessible via a RESTful JSON API
- Ad-hoc and schema-free with a flat address space
- Distributed, featuring robust, incremental replication with bi-directional conflict detection and management
- CouchOne, the company behind CouchDB, recently merged with Membase

15 More on CouchDB
- The CouchDB file layout and commitment system features all Atomic, Consistent, Isolated, Durable (ACID) properties.
- Document updates (add, edit, delete) are serialized, except for binary blobs, which are written concurrently.
- CouchDB read operations use a Multi-Version Concurrency Control (MVCC) model where each client sees a consistent snapshot of the database from the beginning to the end of the read operation.
- Eventually consistent (across replicated databases)

16 CouchDB Access via curl
curl http://127.0.0.1:5984/
curl -X GET http://127.0.0.1:5984/_all_dbs
curl -X PUT http://127.0.0.1:5984/baseball
curl -X PUT http://127.0.0.1:5984/baseball      # error... database already exists
curl -X DELETE http://127.0.0.1:5984/baseball
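The same calls can be made from JavaScript. This is a minimal sketch, assuming Node 18+ (which provides a global `fetch`) and a local CouchDB at 127.0.0.1:5984. `createDatabase` is a hypothetical helper, and the fetch function is a parameter so the sketch can be exercised without a server. It relies on CouchDB answering `201 Created` for a new database and `412 Precondition Failed` (`file_exists`) when the database already exists, which is the error the second PUT above runs into.

```javascript
// Hypothetical helper mirroring `curl -X PUT http://127.0.0.1:5984/baseball`.
// Assumes Node 18+ for the global fetch; pass fetchFn to use something else.

// CouchDB database names: a lowercase letter first, then lowercase letters,
// digits, or any of _ $ ( ) + - /
const DB_NAME = /^[a-z][a-z0-9_$()+\/-]*$/;

function createDatabase(baseUrl, name, fetchFn = globalThis.fetch) {
  if (!DB_NAME.test(name)) {
    return Promise.reject(new Error(`illegal database name: ${name}`));
  }
  // A "/" inside a database name must be sent URL-encoded as %2F.
  const url = `${baseUrl}/${encodeURIComponent(name)}`;
  return fetchFn(url, { method: "PUT" }).then((res) => {
    if (res.status === 201) return "created";
    if (res.status === 412) return "exists"; // {"error":"file_exists", ...}
    throw new Error(`unexpected status ${res.status} for PUT ${url}`);
  });
}

// Usage against a running server (not executed here):
// createDatabase("http://127.0.0.1:5984", "baseball").then(console.log);
```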

17 Adding Docs via curl
curl -X PUT http://127.0.0.1:5984/albums
curl -X PUT http://127.0.0.1:5984/albums/1000 -d '{"title":"Abbey Road","artist":"The Beatles"}'
- UUIDs (ask CouchDB for IDs to use for new documents)
curl -X GET http://127.0.0.1:5984/_uuids
curl -X GET http://127.0.0.1:5984/albums/1000
- _rev: if you want to update or delete a document, CouchDB expects you to include the _rev field of the revision you wish to change
curl -X PUT http://127.0.0.1:5984/albums/1000 -d '{"_rev":"1-42c7396a84eaf1728cdbf08415a09a41","title":"Abbey Road","artist":"The Beatles","year":"1969"}'
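To see why the `_rev` is required, here is a toy, in-memory model of the check (a sketch, not CouchDB's code): an update to an existing document is accepted only if it names the current revision; otherwise it is rejected with the `409 Conflict` status CouchDB returns. Real revision IDs are `N-` followed by an MD5-based hash; `fakeHash` below is a hypothetical stand-in.

```javascript
// Toy model of CouchDB's optimistic concurrency check on _rev.
class ToyDocStore {
  constructor() {
    this.docs = new Map(); // id -> latest stored document (with _id and _rev)
  }

  put(id, doc) {
    const current = this.docs.get(id);
    // Updating an existing doc requires the _rev you last read.
    if (current && doc._rev !== current._rev) {
      return { status: 409, error: "conflict" };
    }
    // The revision number N goes up by one on every successful write.
    const n = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    const rev = `${n}-${fakeHash(JSON.stringify(doc))}`;
    this.docs.set(id, { ...doc, _id: id, _rev: rev });
    return { status: 201, ok: true, id, rev };
  }

  get(id) {
    return this.docs.get(id);
  }
}

// Hypothetical stand-in for CouchDB's MD5-based revision hash.
function fakeHash(text) {
  let h = 0;
  for (const ch of text) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h.toString(16);
}
```

The usual way out of a 409 is to re-read the document (`GET /albums/1000`), pick up its current `_rev`, and retry the update.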

18 Futon… CouchDB Maintenance
- http://127.0.0.1:5984/_utils/index.html
- Albums database review
  - Add another document
- Tools
  - Database, document, and view creation
  - Security, compact & cleanup
  - Create and delete

19 Demo Setup
- Examples implemented in Groovy
- Use HTTPBuilder to interact with the database
  - Groovy RESTClient
- Use Google Gson to move objects between JSON and Java/Groovy
- Use the federal campaign contribution database for our dataset
- Eclipse

20 Data Loading Review
- Limited input to NY candidates, and only year 2010: contributions.fec.2010.csv
- Groovy bean for input data
- Readfile.groovy
// contribDB is a groovyx.net.http.RESTClient; JSON is statically imported from groovyx.net.http.ContentType
contribDB.put(path: "fed_contrib_test/${contrib.transactionId}",
              contentType: JSON, requestContentType: JSON,
              body: json)
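The loading step boils down to: turn each CSV record into a JSON document and PUT it at a path built from its transaction ID. A JavaScript sketch of that transformation follows. The column order in `COLUMNS` is hypothetical (the real FEC file has its own layout), and the naive `split` assumes no quoted fields containing commas. Amounts are kept as strings, just as they come out of the file.

```javascript
// Hypothetical column layout; check it against the actual FEC file header.
const COLUMNS = ["transactionId", "recipientName", "recipientType", "contributorZipCode", "amount"];

// Naive CSV parsing: assumes no quoted fields with embedded commas.
function rowToDoc(line) {
  const cells = line.split(",");
  const doc = {};
  COLUMNS.forEach((col, i) => {
    doc[col] = i < cells.length ? cells[i].trim() : null;
  });
  return doc;
}

// Same path shape as the Groovy call: fed_contrib_test/<transactionId>
function docPath(db, doc) {
  return `${db}/${encodeURIComponent(doc.transactionId)}`;
}
```

Because the document ID is the transaction ID, re-running the loader PUTs onto existing IDs without a `_rev` and gets a 409 instead of creating duplicates.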

21 CouchDB Design Documents
- CouchDB is designed to work best when there is a one-to-one correspondence between applications and design documents.
- _design/design_doc_name
- Design documents are applications
  - i.e., a CouchDB database can itself be an application.

22 Design Document Contents
- Update handlers
  updates: {"hello": function(doc, req) {…}}
- Views (more on this later)
- Validation
- Shows
- Lists
- Filters
- Libs
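A design document is itself just a JSON document whose functions are stored as source strings. Below is a sketch of one with a `hello` update handler and a single view; the design document name `contrib`, the view `by_recipient`, and the `greeted` field are hypothetical. An update handler returns a two-element array: the document to save (or `null` to save nothing) and the response to send back.

```javascript
// Update handler, invoked via PUT /<db>/_design/contrib/_update/hello/<docid>
const hello = function (doc, req) {
  if (!doc) {
    // No existing document at that id: save nothing, just respond.
    return [null, "hello, nobody"];
  }
  doc.greeted = true; // hypothetical field
  return [doc, "hello " + doc._id];
};

// Map function; emit() is supplied by CouchDB's JavaScript view server.
const byRecipient = function (doc) {
  if (doc.recipientName) emit(doc.recipientName, null);
};

// Functions are stored as source text inside the design document.
const designDoc = {
  _id: "_design/contrib",
  language: "javascript",
  updates: { hello: hello.toString() },
  views: { by_recipient: { map: byRecipient.toString() } },
};

// To install, PUT this JSON to http://127.0.0.1:5984/<db>/_design/contrib
const body = JSON.stringify(designDoc);
```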

23 Updates
- If you have multiple design documents, each with a validate_doc_update function, all of those functions are called on each incoming write request.
- If any of the validate functions fails, the document is not added to the database.

24 Validation
- Validation functions are a powerful tool to ensure that only documents you expect end up in your databases.
- validate_doc_update section of the design document:
  function(newDoc, oldDoc, userCtx) {}
  - throw({forbidden: message});
  - throw({unauthorized: message});
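Here is a sketch of two validation functions plus a toy model of the write path described on the last two slides: every `validate_doc_update` runs, and the first one that throws rejects the write. CouchDB maps `forbidden` to HTTP 403 and `unauthorized` to HTTP 401. The validator names, the `title` rule, and `applyValidators` are hypothetical.

```javascript
// Two hypothetical validate_doc_update functions, as if from two design docs.
const requireTitle = function (newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) return; // let deletions through
  if (!newDoc.title) {
    throw ({ forbidden: "Albums must have a title." });
  }
};

const requireLogin = function (newDoc, oldDoc, userCtx) {
  if (!userCtx.name) {
    throw ({ unauthorized: "Please log in first." });
  }
};

// Toy model of the write path: every validator runs, and the first one
// that throws rejects the write with the matching HTTP status.
function applyValidators(validators, newDoc, oldDoc, userCtx) {
  for (const validate of validators) {
    try {
      validate(newDoc, oldDoc, userCtx);
    } catch (err) {
      if (err && err.forbidden) return { status: 403, reason: err.forbidden };
      if (err && err.unauthorized) return { status: 401, reason: err.unauthorized };
      throw err; // not a validation failure: propagate
    }
  }
  return { status: 201 }; // accepted
}
```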

25 OK, how can I see my data?
- CouchDB design documents can contain a "views" section
- Views contain map/reduce functions
- Map/reduce functions are implemented in JavaScript
  - However, query servers for other languages are also available

26 Views
- Filtering the documents in your database to find those relevant to a particular process
- Building efficient indexes to find documents by any value or structure that resides in them
- Extracting data from your documents and presenting it in a specific order
- Using these indexes to represent relationships among documents

27 Map/Reduce dialog
Bob: So, how do I query the database?
IT guy: It's not a database. It's a key-value store.
Bob: OK, it's not a database. How do I query it?
IT guy: You write a distributed map-reduce function in Erlang.
Bob: Did you just tell me to go screw myself?
IT guy: I believe I did, Bob.

28 Map/Reduce in CouchDB
- Map functions take a single parameter, a document, and emit a list of key/value pairs of JSON values
  - CouchDB allows arbitrary JSON structures to be used as keys
- Map is called for every document in the database
  - Efficiency? Only when the view is first built; after that, only new or changed documents are processed
- emit() can be called multiple times in the map function
- View results are stored in B-trees
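The map side can be modelled in a few lines: call the map function once per document, collect every `emit()`, and sort the rows by key. This toy harness (hypothetical, not CouchDB's view engine) provides a global `emit` the way the view server does. Its key ordering is a simplification of CouchDB's collation: null, booleans, numbers, strings, arrays, with strings compared by code unit rather than by ICU rules.

```javascript
let sink = null; // collects emits for the document currently being mapped

// Plays the role of the view server's global emit().
function emit(key, value) {
  sink.push({ key, value });
}

// Simplified type order used by the comparator below.
function rank(v) {
  if (v === null) return 0;
  if (typeof v === "boolean") return 1;
  if (typeof v === "number") return 2;
  if (typeof v === "string") return 3;
  if (Array.isArray(v)) return 4;
  return 5; // objects
}

function compareKeys(a, b) {
  const ra = rank(a), rb = rank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 4) {
    // Arrays compare element by element; a shorter prefix sorts first.
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = compareKeys(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  if (ra === 5) return 0; // objects are left unordered in this toy
  return a < b ? -1 : a > b ? 1 : 0;
}

function buildView(docs, map) {
  const rows = [];
  for (const doc of docs) {
    sink = [];
    map(doc); // may call emit() zero, one, or many times
    for (const { key, value } of sink) rows.push({ id: doc._id, key, value });
  }
  sink = null;
  // Rows with equal keys are ordered by document id.
  return rows.sort((x, y) => compareKeys(x.key, y.key) || compareKeys(x.id, y.id));
}
```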

29 Reduce/Rereduce
- The reduce function is optional; it is used to produce aggregate results for a view
- Reduce functions must accept, as input, results emitted by the corresponding map function as well as results returned by the reduce function itself (rereduce)
  - On rereduce, keys = null
- On a large database, the objects to be reduced are sent to your reduce function in batches. These batches are broken up on B-tree boundaries, which may occur in arbitrary places.
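A reduce that survives rereduce has to work on `values` alone, because `keys` is `null` on the rereduce pass. The sketch below sums amounts that may arrive as strings (first pass) or as numbers (rereduce). A toy batching function (hypothetical, standing in for CouchDB's B-tree-driven batching) shows that the total does not depend on where the batches are split. For plain numeric sums, CouchDB also ships a built-in `_sum` reduce.

```javascript
// keys: [[key, docid], ...] on the first pass, null on rereduce.
// values: emitted amounts (possibly strings) or, on rereduce, earlier sums.
const sumAmounts = function (keys, values, rereduce) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += parseFloat(values[i]);
  }
  return total;
};

// Toy model of batching: reduce arbitrary slices of rows, then rereduce
// the partial results. CouchDB picks the slices from its B-tree layout.
function reduceInBatches(rows, reduceFn, batchSize) {
  const partials = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    const batch = rows.slice(i, i + batchSize);
    partials.push(reduceFn(batch.map((r) => [r.key, r.id]), batch.map((r) => r.value), false));
  }
  return reduceFn(null, partials, true);
}
```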

30 More on Map/Reduce
- Linked documents: if you emit an object value that has {'_id': XXX}, then include_docs=true will fetch the document with id XXX rather than the document that was processed to emit the key/value pair.
- Complex keys
  emit([lastName, firstName, zipcode], doc)
- Grouping
  - Grouping levels
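Here is a sketch of the linked-documents rule: the map function emits `{_id: ...}` as its value, and a toy resolver (hypothetical, standing in for what `include_docs=true` does on the server) attaches the linked document instead of the emitting one. The `type` and `artist_id` fields and the document IDs are hypothetical.

```javascript
// Map (runs in CouchDB): index albums by title, linking each row to its artist doc.
const albumsWithArtist = function (doc) {
  if (doc.type === "album") emit(doc.title, { _id: doc.artist_id });
};

// Toy stand-in for include_docs=true: a value with an _id wins over row.id.
function includeDocs(rows, docsById) {
  return rows.map((row) => {
    const v = row.value;
    const targetId = v && typeof v === "object" && v._id ? v._id : row.id;
    return { ...row, doc: docsById.get(targetId) || null };
  });
}
```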

31 Restrictions on Map/Reduce
- Map functions must be referentially transparent: given the same doc, they always emit the same key/value pairs
  - Allows for incremental update
- Reduce functions must be able to reduce their own output
  - This requirement allows CouchDB to store intermediate reductions directly in the inner nodes of B-tree indexes, so view index updates and retrievals have logarithmic cost
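Referential transparency can be spot-checked before a map function goes into a design document: run it twice on the same document and compare what it emits. Below is a toy check with hypothetical helper names. The "bad" map depends on state outside the document, so the same document produces different rows, which would break incremental index updates.

```javascript
let emitted = null;
function emit(key, value) {
  emitted.push([key, value]);
}

// Hypothetical helper: true if two runs on the same doc emit identical rows.
function isDeterministicFor(map, doc) {
  const run = () => {
    emitted = [];
    map(JSON.parse(JSON.stringify(doc))); // fresh copy for each run
    return JSON.stringify(emitted);
  };
  return run() === run();
}

// OK: output depends only on the document.
const goodMap = function (doc) {
  emit([doc.recipientType, doc.contributorZipCode], doc.amount);
};

// Not OK: output depends on a counter outside the document.
let calls = 0;
const badMap = function (doc) {
  calls += 1;
  emit(doc.recipientType, calls);
};
```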

32 List Donors
Map:
function(doc) {
  if (doc.recipientName) {
    emit(doc.recipientName, doc);
  } else if (doc.recipientType) {
    emit(doc.recipientType, doc);
  }
}
No reduce function

33 List of Query Parameters
- key
- startkey, endkey
- startkey_docid, endkey_docid
- limit, skip, stale, descending
- group, group_level
- reduce
- include_docs, inclusive_end
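When these parameters are sent on a URL, the key-type ones (`key`, `startkey`, `endkey`) must be JSON-encoded, so a string key keeps its quotes and a complex key is a JSON array. Below is a sketch of a query-string builder; `viewUrl`, the design document `contrib`, and the view `by_zip` are hypothetical names.

```javascript
// Parameters whose values CouchDB parses as JSON.
const JSON_PARAMS = new Set(["key", "startkey", "endkey"]);

function viewUrl(base, db, ddoc, view, params = {}) {
  const query = Object.entries(params)
    .map(([name, value]) => {
      const text = JSON_PARAMS.has(name) ? JSON.stringify(value) : String(value);
      return `${encodeURIComponent(name)}=${encodeURIComponent(text)}`;
    })
    .join("&");
  const path = `${base}/${db}/_design/${ddoc}/_view/${view}`;
  return query ? `${path}?${query}` : path;
}
```

`endkey: ["SMITH", {}]` is the usual idiom for "every zip code under SMITH", since objects collate after strings. With `descending=true` the index is read backwards, so `startkey` and `endkey` have to be swapped.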

34 List all NY candidates
- Want a list of all of the unique candidates in the database
- Map:
  function(doc) { emit(doc.recipientType, null); }
- Reduce:
  function(keys, values) { return true; }
- Must set group=true

35 Total Candidate Donations
- List the total campaign contributions for each candidate
- Map:
  function(doc) { emit(doc.recipientType, doc.amount); }
- Reduce:
  function(keys, values, rereduce) {
    var sum = 0;
    // iterate over values, not keys: on rereduce, keys is null
    for (var i = 0; i < values.length; i++) {
      sum = sum + parseFloat(values[i]);
    }
    return sum;
  }
- Must set group=true

36 Donation Totals by Zip
- Complex keys
- In the map function:
  emit([doc.recipientType, doc.contributorZipCode], doc.amount);
- Reduce:
  function(keys, values, rereduce) {
    var sum = 0;
    // iterate over values, not keys: on rereduce, keys is null
    for (var i = 0; i < values.length; i++) {
      sum = sum + parseFloat(values[i]);
    }
    return sum;
  }
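Here is a sketch of what `group=true` and `group_level` do to the rows of a view like this one: bucket the (already sorted) rows by the whole key, or by its first N array elements, and reduce each bucket. `groupRows` is a hypothetical toy, not CouchDB's implementation, and the rows in the check are made-up sample data.

```javascript
// Sum reduce that ignores keys, so it is also safe on rereduce.
const sumAmounts = function (keys, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += parseFloat(values[i]);
  return total;
};

// groupLevel undefined: like group=true (group on the exact key).
// groupLevel N: group array keys on their first N elements.
function groupRows(rows, reduceFn, groupLevel) {
  const groups = new Map();
  for (const row of rows) {
    const key = Array.isArray(row.key) && groupLevel !== undefined
      ? row.key.slice(0, groupLevel)
      : row.key;
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, keys: [], values: [] });
    const g = groups.get(id);
    g.keys.push([row.key, row.id]);
    g.values.push(row.value);
  }
  // Map keeps insertion order, so sorted input gives sorted groups.
  return [...groups.values()].map((g) => ({ key: g.key, value: reduceFn(g.keys, g.values, false) }));
}
```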

37 Referencing other documents

38 Conflict Management
- Multi-Version Concurrency Control (MVCC)
- CouchDB does not attempt to merge conflicting revisions; that is the application's job
- If there is a conflict in revisions between nodes:
  - The app is ultimately responsible for resolving the conflict
  - All revisions are saved
  - One revision is deterministically selected as the winner
  - The _conflicts property lists the losing revisions (request it with ?conflicts=true)
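Every node must pick the same winner without talking to the others, so the choice is deterministic. As described in CouchDB's documentation, non-deleted leaf revisions beat deleted ones, then the revision with the highest revision number wins, and any remaining tie goes to the revision ID that sorts highest. Below is a simplified sketch of that rule; `pickWinner`, `conflictsOf`, and the revision hashes are hypothetical.

```javascript
// leaves: [{ rev: "N-hash", deleted?: boolean, doc }]
function compareLeaves(a, b) {
  const [posA, hashA] = a.rev.split("-");
  const [posB, hashB] = b.rev.split("-");
  const liveA = a.deleted ? 0 : 1;
  const liveB = b.deleted ? 0 : 1;
  if (liveA !== liveB) return liveB - liveA;             // live before deleted
  if (posA !== posB) return Number(posB) - Number(posA); // higher N first
  return hashA < hashB ? 1 : hashA > hashB ? -1 : 0;     // higher hash first
}

function pickWinner(leaves) {
  return [...leaves].sort(compareLeaves)[0];
}

// Revisions an application would see in _conflicts (the other live leaves).
function conflictsOf(leaves) {
  const winner = pickWinner(leaves);
  return leaves.filter((l) => l !== winner && !l.deleted).map((l) => l.rev);
}
```

Which edit "wins" is therefore arbitrary from the application's point of view. That is why the app is left in charge of resolution: read the losers via `?conflicts=true`, merge, save the result, and delete the losing revisions.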

39 Database Replication
- "CouchDB has built-in conflict detection and management and the replication process is incremental and fast, copying only documents and individual fields changed since the previous replication."
- Replication is a unidirectional process.
- Databases in CouchDB have a sequence number that gets incremented every time the database is changed.
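The sequence number is what makes replication incremental: the replicator remembers the last source sequence it processed (a checkpoint) and next time asks only for changes after it. Below is a toy, one-way model of that loop, using hypothetical `ToyDb` and `replicate`. It ignores revisions and conflicts, which real replication carries along.

```javascript
class ToyDb {
  constructor() {
    this.seq = 0;              // incremented on every change
    this.docs = new Map();     // id -> body
    this.changes = new Map();  // id -> seq of its latest change
  }

  write(id, body) {
    this.seq += 1;
    this.docs.set(id, body);
    this.changes.set(id, this.seq); // only the latest change per doc is kept
  }

  changesSince(since) {
    return [...this.changes]
      .filter(([, seq]) => seq > since)
      .sort((x, y) => x[1] - y[1])
      .map(([id, seq]) => ({ id, seq }));
  }
}

// One-way: copies source -> target and returns the new checkpoint.
function replicate(source, target, since) {
  let last = since;
  for (const { id, seq } of source.changesSince(since)) {
    target.write(id, source.docs.get(id));
    last = seq;
  }
  return last;
}
```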

40 Replication Continued
- "continuous": true
  - Automatically replicates new docs to the target as they come into the source… there's a complex algorithm determining the ideal moment to replicate for maximum performance.
- Create albums_backup using the Futon replicator
curl -X PUT http://127.0.0.1:5984/albums/1010 -d '{"title":"Let It Be","artist":"The Beatles"}'

41 Replication & Conflict
- Replicate albums db via Futon
curl -X PUT http://127.0.0.1:5984/albums/1050 -d '{"title":"RJUG Roundup","artist":"Rob","year":"2010"}'
- Replicate again
curl -X PUT http://127.0.0.1:5984/albums_backup/1050 -d '{"title":"RJUG Roundup","artist":"Rob","year":"2011"}'
- Replicate, review

42 Notifications
- Polling and long polling via _changes
- If not executing from a browser, you can request continuous changes
- Filters can be applied to changes
  - e.g., only notify when level = error
  filterName: function(doc, req)
  - req contains the query parameters
  - It also contains userCtx
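A filter is just a function in the design document's `filters` section that returns `true` for the changes the client should see. Below is a sketch of the "only notify when level = error" case; the design document `logs`, the filter name `by_level`, and the `level` field are hypothetical. A client would select it with something like `GET /db/_changes?feed=longpoll&filter=logs/by_level&level=error`, and CouchDB passes the extra `level` parameter through in `req.query`. `filterChanges` is a toy stand-in for the server side.

```javascript
// Stored (as source text) in _design/logs under filters.by_level.
const byLevel = function (doc, req) {
  // Default to "error" when the client does not pass ?level=...
  return doc.level === (req.query.level || "error");
};

// Toy stand-in for the server applying the filter to the _changes feed.
function filterChanges(changes, filter, req) {
  return changes.filter((change) => filter(change.doc, req));
}
```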

43 Security
- Ships with OAuth and cookie authentication handlers; the default is standard HTTP (basic) authentication
- Authorizations
  - Reader: read/write documents
  - Database admin: compact, add/edit views
  - Server admin: create and remove databases
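In CouchDB 1.x, the per-database part of this lives in the database's `_security` object, with `admins` and `readers` entries that each list user `names` and `roles`. If `readers` is empty, the database is readable by anyone, and server admins (role `_admin`) pass every check. Below is a toy sketch of that read check; `canRead`, `matches`, and the sample users and roles are hypothetical.

```javascript
// Shape of a CouchDB 1.x _security object (the values here are made up).
const security = {
  admins: { names: ["tom"], roles: [] },
  readers: { names: [], roles: ["rjug"] },
};

function matches(section, userCtx) {
  return section.names.includes(userCtx.name) ||
    section.roles.some((role) => userCtx.roles.includes(role));
}

// Toy read check: server admins, database admins, and readers may read;
// an empty readers section means the database is public.
function canRead(sec, userCtx) {
  if (userCtx.roles.includes("_admin")) return true;
  if (matches(sec.admins, userCtx)) return true;
  const readers = sec.readers;
  if (readers.names.length === 0 && readers.roles.length === 0) return true;
  return matches(readers, userCtx);
}
```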

44 CouchDB Applied
- CouchOne
  - Hosting services
  - CouchDB on Android
- CouchApp
  - HTML5 applications
- jCouchDB
  - Java layer for CouchDB access
- CouchDB Lounge
  - Clustering support

45 Links
- http://couchdb.apache.org/
- http://wiki.apache.org/couchdb/FrontPage
- http://guide.couchdb.org/editions/1/en/index.html

46 Questions? Thanks!

