MongoDB First Light
MongoDB Basics
MongoDB is a document-based NoSQL database.
– A document is just a JSON object.
– A collection is just a (large) set of documents.
– A database is just a set of collections.
JSON = JavaScript Object Notation
– Actually BSON = binary-encoded JSON
The Mongo shell is a JavaScript interpreter!
– (And I have never coded JavaScript, ahem...)
B Christensen 2
An In-between NoSQL
The ends of the spectrum
– Key-value stores: know the key to access an opaque blob of anything; fire-and-forget (write-and-forget)
– RDB: elaborate ad-hoc queries over highly structured data (schema); normalized, meaning ‘lots’ of tables; transactions
MongoDB sits somewhere in the middle
– Documents have an elaborate (OO) structure (but not a fixed one!)
– Rather powerful query language (no joins, though)
– From fire-and-forget to ‘acknowledge write on all replicas’
JSON
Get used to key/value pairs!
{ course: "SAiP", semester: "E12", teacher: "hbc" }
Basically close to the fields of OO languages
– The architectural mismatch between programming-language and DB concepts is lessened!
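This is a small Node.js sketch of the "close to fields of OO languages" point. It is not MongoDB code. The `Talk` class is hypothetical, and its instances serialize to exactly the document on the slide and back again.

```javascript
// Hypothetical class whose fields mirror the keys of the slide's example document.
class Talk {
  constructor(course, semester, teacher) {
    this.course = course;
    this.semester = semester;
    this.teacher = teacher;
  }
}

const talk = new Talk("SAiP", "E12", "hbc");

// Serializing the object gives a JSON document with the same key/value pairs.
// (MongoDB itself stores the BSON encoding of such a document.)
const json = JSON.stringify(talk);
console.log(json); // {"course":"SAiP","semester":"E12","teacher":"hbc"}

// Going back: copy the parsed document's fields onto a fresh Talk object.
const restored = Object.assign(new Talk(), JSON.parse(json));
console.log(restored instanceof Talk, restored.teacher); // true hbc
```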
Basic commands…
MongoDB creates objects and collections on the fly…
No schema enforced...
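In the mongo shell, a first `db.talks.insert({...})` implicitly creates the `talks` collection, and no schema is checked. The sketch below models both behaviours with a hypothetical in-memory `makeDb()` helper. It is a stand-in for illustration, not the MongoDB API.

```javascript
// Hypothetical in-memory stand-in for a database: collections are created the
// first time they are used and nothing checks the shape of inserted documents.
function makeDb() {
  const collections = new Map();
  return {
    collection(name) {
      if (!collections.has(name)) collections.set(name, []); // created on the fly
      return collections.get(name);
    },
    collectionNames() {
      return [...collections.keys()];
    },
  };
}

const db = makeDb();
const talks = db.collection("talks"); // no "CREATE TABLE" step

// Two documents with different fields end up in the same collection.
talks.push({ course: "SAiP", semester: "E12", teacher: "hbc" });
talks.push({ title: "MongoDB First Light", tags: ["nosql", "intro"] });

console.log(db.collectionNames()); // [ 'talks' ]
console.log(talks.map((doc) => Object.keys(doc).join(",")));
```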
Schema: Pro and Con
A schema can provide a lot of data safety
– Validating data, avoiding hard-to-find bugs in clients, ...
However, schemas are also costly to migrate
MongoDB is pretty handy in agile and early development, when the ‘schema’ changes often...
find()
You can formulate simple queries using find() on a collection.
Of course, the parameter of find() is
– A JSON object!
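A shell query such as `db.talks.find({ semester: "E12" })` returns the documents whose fields equal every key/value pair in the filter object. The sketch below mimics only that top-level equality case with a hypothetical `find` helper over an in-memory array. Real MongoDB also handles operators, nested paths, and arrays.

```javascript
// Hypothetical helper: keep documents whose fields equal every pair in `filter`.
// An empty filter matches everything, like find() with no argument.
function find(collection, filter = {}) {
  return collection.filter((doc) =>
    Object.entries(filter).every(([key, value]) => doc[key] === value)
  );
}

// Hypothetical sample data.
const talks = [
  { course: "SAiP", semester: "E12", teacher: "hbc" },
  { course: "SAiP", semester: "E13", teacher: "hbc" },
  { course: "dDB", semester: "E12", teacher: "someone" },
];

console.log(find(talks, { semester: "E12" }).length);          // 2
console.log(find(talks, { course: "SAiP", semester: "E13" })); // one document
console.log(find(talks).length);                               // 3
```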
More complex queries
Regular expressions, and, or...
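The operator names below follow MongoDB's query syntax, for example `db.talks.find({ $or: [ { semester: "E12" }, { teacher: "someone" } ] })` or a regex value such as `{ course: /^SA/ }`. The `matches` function is a hypothetical, simplified evaluator. It is meant only to show how such a filter object is interpreted.

```javascript
// Hypothetical, simplified matcher for equality, RegExp values, $and and $or.
// (Avoid the /g flag on the regexes: it makes RegExp.test() stateful.)
function matches(doc, filter) {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$and") return cond.every((sub) => matches(doc, sub));
    if (key === "$or") return cond.some((sub) => matches(doc, sub));
    if (cond instanceof RegExp) {
      return typeof doc[key] === "string" && cond.test(doc[key]);
    }
    return doc[key] === cond; // plain equality
  });
}

// Hypothetical sample data.
const talks = [
  { course: "SAiP", semester: "E12", teacher: "hbc" },
  { course: "SAiP", semester: "E13", teacher: "hbc" },
  { course: "dDB", semester: "E13", teacher: "someone" },
];

const byRegex = talks.filter((d) => matches(d, { course: /^SA/ }));
const byOr = talks.filter((d) =>
  matches(d, { $or: [{ semester: "E12" }, { teacher: "someone" }] })
);
const byAnd = talks.filter((d) =>
  matches(d, { $and: [{ semester: "E13" }, { teacher: /^h/ }] })
);

console.log(byRegex.length, byOr.length, byAnd.length); // 2 2 1
```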
Hey – what about updates?
Update
– 1st argument: the document(s) to find
– 2nd argument: the values to add/set/update
MongoDB 3 has updated the API a bit!
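The two-argument shape maps onto a shell call such as `db.talks.updateOne({ teacher: "hbc" }, { $set: { semester: "E13" } })`. The sketch is an in-memory stand-in: `updateOne` here is a hypothetical helper that handles only `$set` and top-level equality filters, not the driver's API.

```javascript
// Hypothetical in-memory version of "update the first matching document".
// Only the $set operator is handled; filters are top-level equality only.
function updateOne(collection, filter, update) {
  const doc = collection.find((d) =>
    Object.entries(filter).every(([key, value]) => d[key] === value)
  );
  if (doc === undefined) return false; // nothing matched
  Object.assign(doc, update.$set);     // add or overwrite just these fields
  return true;
}

const talks = [{ course: "SAiP", semester: "E12", teacher: "hbc" }];

// Argument 1 finds the document, argument 2 says which values to set.
updateOne(talks, { teacher: "hbc" }, { $set: { semester: "E13", slides: 32 } });
console.log(talks[0]);
```

The `$set` operator matters here. The older `update()` treats a second argument without `$`-operators as a replacement for the whole matched document (apart from `_id`). `updateOne()`, added to the shell in MongoDB 3.2, expects update operators instead, and full replacement is what `replaceOne()` is for.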
Adding more structure
Now, after I go home, you decide to give my talk grades.
– No new tables, schema, etc.
– We just add more structure, similar to OO
Ahh – one late grade arrives – just $push it
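The grades story corresponds roughly to `{ $set: { grades: [10, 12] } }` followed by `{ $push: { grades: 7 } }` in the shell. The grade values are hypothetical. The `applyUpdate` helper is also hypothetical and covers only these two operators. It follows MongoDB's documented behaviour that `$push` creates a missing array field and fails on a non-array one.

```javascript
// Hypothetical helper applying $set and $push to a single document.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set ?? {})) {
    doc[field] = value;
  }
  for (const [field, value] of Object.entries(update.$push ?? {})) {
    if (doc[field] === undefined) doc[field] = []; // $push creates a missing array
    if (!Array.isArray(doc[field])) {
      throw new TypeError(`${field} is not an array`);
    }
    doc[field].push(value);
  }
  return doc;
}

const talk = { course: "SAiP", semester: "E12", teacher: "hbc" };
applyUpdate(talk, { $set: { grades: [10, 12] } }); // add structure, no new table
applyUpdate(talk, { $push: { grades: 7 } });       // the late grade
console.log(talk.grades); // [ 10, 12, 7 ]
```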
Or – using SkyCave
RoomRecord-like stuff
pretty() is pretty nice
RegExps
Sorting on fields
Bounded result: limit()
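Sorting and limiting chain in the shell, as in `db.players.find().sort({ name: 1 }).limit(2)`, where `1` means ascending and `-1` descending. The in-memory sketch below uses a hypothetical `sortAndLimit` helper that handles a single sort field. The player names other than "Mikkel" are made up.

```javascript
// Hypothetical helper: sort on one field (1 = ascending, -1 = descending),
// then keep the first n documents. The input array is not modified.
function sortAndLimit(docs, field, direction, n) {
  return [...docs]
    .sort((a, b) =>
      a[field] < b[field] ? -direction : a[field] > b[field] ? direction : 0
    )
    .slice(0, n);
}

const players = [{ name: "Mikkel" }, { name: "Magnus" }, { name: "Mathilde" }];

console.log(sortAndLimit(players, "name", 1, 2).map((p) => p.name));  // [ 'Magnus', 'Mathilde' ]
console.log(sortAndLimit(players, "name", -1, 1).map((p) => p.name)); // [ 'Mikkel' ]
```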
Wall exercise?
Adding msg
Players
Now…
How do we compose getShortRoomDesc()?
SELECT r.desc FROM room r, player p WHERE p.name = 'Mikkel' AND p.pos = r.pos
???
The NoSQL answer
The NoSQL answer: manual references!
– It is the client's responsibility to join: find p.pos using query 1, then find r.desc using query 2
– (§4.4.2 in the MongoDB manual 3.0.6)
Exercise
– Why is this the right answer in a NoSQL world? Hint: think clients, think CPU cycles – where?
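The client-side join is two lookups. In the shell that is roughly `var p = db.players.findOne({ name: "Mikkel" })` followed by `db.rooms.findOne({ pos: p.pos }).desc`. The sketch below does the same over in-memory arrays. The collection contents, the position format, and this `getShortRoomDesc(playerName)` signature are hypothetical, not SkyCave's actual code.

```javascript
// Hypothetical collections.
const players = [
  { name: "Mikkel", pos: "(0,1,0)" },
  { name: "Magnus", pos: "(0,0,0)" },
];
const rooms = [
  { pos: "(0,0,0)", desc: "You are standing at the end of a road." },
  { pos: "(0,1,0)", desc: "You are in open forest." },
];

// Stand-in for findOne(): first document matching all pairs, or null.
function findOne(collection, filter) {
  return (
    collection.find((doc) =>
      Object.entries(filter).every(([k, v]) => doc[k] === v)
    ) ?? null
  );
}

// Manual reference: the client performs the "join" itself.
function getShortRoomDesc(playerName) {
  const player = findOne(players, { name: playerName }); // query 1
  if (player === null) return null;
  const room = findOne(rooms, { pos: player.pos });       // query 2
  return room === null ? null : room.desc;
}

console.log(getShortRoomDesc("Mikkel")); // You are in open forest.
```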
Alternatives
Solution 2:
– Denormalize / embedded documents
– But not always possible for complex data structures
– And it may actually slow queries down, depending on search patterns: searching inside documents is more tedious
Solution 3:
– DBRefs: a special MongoDB feature that makes it even more SQL-like
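This sketch contrasts the two alternatives on the same hypothetical data. A DBRef is an ordinary sub-document of the form `{ $ref: <collection>, $id: <value> }`, with an optional `$db`. Resolving it still costs a second lookup, done here by a hypothetical `resolveRef` helper.

```javascript
// Hypothetical data.
const rooms = [{ _id: "(0,1,0)", desc: "You are in open forest." }];

// Solution 2: embed (copy) the room into the player document.
// One lookup answers the question, but the description now exists twice
// and both copies must be kept in sync.
const embeddedPlayer = {
  name: "Mikkel",
  room: { pos: "(0,1,0)", desc: "You are in open forest." },
};

// Solution 3: a DBRef-shaped sub-document pointing into the rooms collection.
const referencingPlayer = {
  name: "Mikkel",
  room: { $ref: "rooms", $id: "(0,1,0)" },
};

// Hypothetical resolver: follow the reference with a second lookup.
function resolveRef(ref, collections) {
  return collections[ref.$ref].find((doc) => doc._id === ref.$id) ?? null;
}

console.log(embeddedPlayer.room.desc);                           // one lookup
console.log(resolveRef(referencingPlayer.room, { rooms }).desc); // two lookups
```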
MongoDB modeling
Comparing Documents to Tables
Entry on a social network site: Schema
As RDB Schema
The RDB version
Discussion
Thus Mongo has less need for joins, because its data model is richer
– Arrays of complex objects
– Sub-objects
This avoids the RDB idioms for modeling OneToMany relations
ManyToMany is handled by manual references
– Two find() calls instead of one SELECT
And
– It replaces many random reads with fewer sequential ones
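The OneToMany point can be made concrete with one hypothetical social-network entry stored both ways. In the RDB version, comments live in their own table with a foreign key and must be joined back. In the document version, they are an embedded array read together with the entry. Table names, fields, and data are made up for illustration.

```javascript
// RDB style: two "tables" linked by postId.
const postsTable = [{ id: 1, author: "hbc", text: "MongoDB first light" }];
const commentsTable = [
  { id: 10, postId: 1, author: "student1", text: "Nice talk" },
  { id: 11, postId: 1, author: "student2", text: "What about joins?" },
];

// Reassembling one entry needs a join on postId.
function loadPostRelational(postId) {
  const post = postsTable.find((p) => p.id === postId);
  const comments = commentsTable
    .filter((c) => c.postId === postId)
    .map(({ author, text }) => ({ author, text }));
  return { author: post.author, text: post.text, comments };
}

// Document style: the same entry is a single document with an embedded array.
const postDocument = {
  _id: 1,
  author: "hbc",
  text: "MongoDB first light",
  comments: [
    { author: "student1", text: "Nice talk" },
    { author: "student2", text: "What about joins?" },
  ],
};

console.log(
  JSON.stringify(loadPostRelational(1).comments) ===
    JSON.stringify(postDocument.comments)
); // true
```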
Going Large
Durability, Scaling, Replication and Sharding
Durability
RDBs guarantee durability
– Once a data update is acknowledged, the data is stored
MongoDB is configurable (write concern)
– Unacknowledged: fire-and-forget
– Acknowledged: the write operation is acknowledged
– Journaled: the write is acknowledged only after it has been written to the on-disk journal
– Replica acknowledged: at least N replicas have received the write operation
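MongoDB expresses these levels as write concern documents. `w` is the number of members that must acknowledge the write (`w: "majority"` also exists), and `j: true` waits for the journal. Those field names are MongoDB's. The `writeConcerns` map and its keys are hypothetical, and `w: 2` is just one example of "N replicas".

```javascript
// The slide's four levels as write concern documents (map and keys hypothetical).
const writeConcerns = {
  unacknowledged: { w: 0 },      // fire-and-forget: no acknowledgment at all
  acknowledged: { w: 1 },        // the primary acknowledges the write
  journaled: { w: 1, j: true },  // ...only after writing it to its on-disk journal
  replicaAcknowledged: { w: 2 }, // primary plus at least one secondary
};

// In the mongo shell a concern can be passed per operation, e.g.
//   db.talks.insert({ course: "SAiP" }, { writeConcern: { w: 2 } })
for (const [level, concern] of Object.entries(writeConcerns)) {
  console.log(level.padEnd(20), JSON.stringify(concern));
}
```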
Scaling out
To get more power/space – just add more machines...
Replication
Replica sets
– Primary (handles writes and reads)
– N secondaries (reads only)
– Eventual consistency!
Failover is automatic
– Secondaries vote
– A new primary is elected
Experience: Easy!
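The toy model below shows why reading from a secondary is only eventually consistent. Writes go to the primary and are appended to a log, and the secondaries replay that log later. The `Member` and `ReplicaSet` classes are hypothetical and greatly simplified. Real MongoDB replicates via the oplog continuously and handles elections, which this sketch leaves out.

```javascript
// Hypothetical, greatly simplified replica set with asynchronous replication.
class Member {
  constructor(name) {
    this.name = name;
    this.data = new Map();
  }
  apply([key, value]) {
    this.data.set(key, value);
  }
  read(key) {
    return this.data.get(key);
  }
}

class ReplicaSet {
  constructor(secondaryCount) {
    this.primary = new Member("primary");
    this.secondaries = Array.from(
      { length: secondaryCount },
      (_, i) => new Member(`secondary${i + 1}`)
    );
    this.oplog = [];  // every write accepted by the primary, in order
    this.applied = 0; // how much of the log the secondaries have replayed
  }
  write(key, value) {
    // Writes are accepted by the primary only.
    this.primary.apply([key, value]);
    this.oplog.push([key, value]);
  }
  replicate() {
    // Later, the secondaries catch up by replaying the log in order.
    for (; this.applied < this.oplog.length; this.applied++) {
      for (const s of this.secondaries) s.apply(this.oplog[this.applied]);
    }
  }
}

const rs = new ReplicaSet(2);
rs.write("Mikkel.pos", "(0,1,0)");
console.log(rs.primary.read("Mikkel.pos"));        // (0,1,0)
console.log(rs.secondaries[0].read("Mikkel.pos")); // undefined: not replicated yet
rs.replicate();
console.log(rs.secondaries[0].read("Mikkel.pos")); // (0,1,0): eventually consistent
```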
Sharding
Key goals
– No change in the client-side API! When our EcoSense data outgrows its boxes, we do not have to change our client programs!
– Auto-sharding: you configure your shard key as ranges on your document keys
– Shard balancing: data is migrated automatically if one shard grows too large
Experience: Nope
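Range-based sharding can be pictured as a routing table. Each chunk owns a half-open range of shard-key values, and the router (mongos, in MongoDB) sends each operation to the shard owning the key. In this sketch the numeric key (say, a sensor id), the chunk boundaries, the shard names, and the `route`/`moveChunk` helpers are all hypothetical. MongoDB uses MinKey/MaxKey for the open ends, which `Infinity` stands in for here.

```javascript
// Hypothetical routing table: each chunk owns [min, max) of the shard key.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 2000, shard: "shard1" },
  { min: 2000, max: Infinity, shard: "shard2" },
];

// The router picks the shard whose chunk range contains the key.
function route(key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// Balancing: migrating a chunk only changes the routing table, so code that
// calls route() (the "client side") stays the same.
function moveChunk(min, toShard) {
  chunks.find((c) => c.min === min).shard = toShard;
}

console.log(route(42), route(1500), route(5000)); // shard0 shard1 shard2
moveChunk(1000, "shard2");
console.log(route(1500));                         // shard2
```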