MongoDB in 10 minutes
Mohannad El Dafrawy, Sara Rodriguez, Lino Valdivia Jr.
What is MongoDB?
● Document database
  o Data is structured as schema-less JSON documents
● One of the most popular NoSQL solutions
● Cross-platform and open source
  o Written in C++
  o Supports Windows, Linux, Mac OS X, and Solaris
Features (I)
● Document-based storage and querying
  o Queries themselves are JSON documents
● Full index support
  o Any attribute can be indexed, just as in a traditional SQL database
● Replication and high availability
  o Data is mirrored across multiple servers for availability and scalability
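The benefit of indexing an attribute can be sketched in plain JavaScript. This is not MongoDB code: the hypothetical helpers `buildIndex` and `findGreaterThan` keep `[value, position]` pairs sorted by one attribute, so a range query such as `{ rating: { $gt: 2 } }` becomes a binary search plus a slice instead of a scan over every document. MongoDB's real indexes are B-trees; a sorted array is a simplifying assumption made here.

```javascript
// Hypothetical sketch: a secondary index as a sorted array of [value, docPosition].
// MongoDB actually uses B-tree indexes; this only illustrates the idea.
function buildIndex(docs, field) {
  const entries = [];
  docs.forEach((doc, pos) => {
    if (doc[field] !== undefined) entries.push([doc[field], pos]);
  });
  entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  return entries;
}

// Binary search for the first index entry whose value is strictly greater than x.
function firstGreaterThan(entries, x) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid][0] <= x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Answers the equivalent of find({ field: { $gt: x } }) from the index alone.
function findGreaterThan(docs, index, x) {
  return index.slice(firstGreaterThan(index, x)).map(([, pos]) => docs[pos]);
}

const posts = [
  { _id: 1, rating: 2.2 },
  { _id: 2, rating: 1.5 },
  { _id: 3, rating: 4.0 },
  { _id: 4 }, // no rating: skipped by this sketch's index
];
const ratingIndex = buildIndex(posts, "rating");
console.log(findGreaterThan(posts, ratingIndex, 2).map((p) => p._id)); // [ 1, 3 ]
```

The sort happens once when the index is built; each later range query then costs a logarithmic search plus the size of the result.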
Features (II)
● Auto-sharding (horizontal scaling)
  o Large data sets are partitioned and distributed across multiple shards
● Fast in-place updates
  o Atomic update operations modify documents in place, for contention-free performance
● Integrated map/reduce framework
  o Map/reduce operations run directly on the stored data
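What the map/reduce framework does can be imitated in memory. The helper `runMapReduce` and the sample posts are hypothetical; the sketch only mirrors the shape of the contract (a map function that emits key/value pairs for each document, and a reduce function that folds all values emitted for one key), not how the server distributes that work across shards.

```javascript
// Hypothetical in-memory imitation of the map/reduce contract, not MongoDB's
// server code. map() runs with `this` bound to each document and calls
// emit(key, value); reduce(key, values) folds all values emitted for one key.
// (In the mongo shell, emit is a global; here it is passed in as an argument.)
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);

  const results = [];
  for (const [key, values] of groups) {
    // MongoDB does not call reduce for a key with only a single emitted value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Count how many posts carry each tag.
const posts = [
  { _id: 1, tags: ["databases", "mongo"] },
  { _id: 2, tags: ["mongo", "nosql"] },
  { _id: 3, tags: ["mongo"] },
];
const mapTags = function (emit) {
  (this.tags || []).forEach((tag) => emit(tag, 1));
};
const sumCounts = (key, values) => values.reduce((a, b) => a + b, 0);

const counts = runMapReduce(posts, mapTags, sumCounts);
console.log(counts.map((r) => `${r._id}: ${r.value}`).join(", "));
// databases: 1, mongo: 3, nosql: 1
```

Because reduce may be skipped or applied to partial groups, it should return a value of the same shape as the emitted values, as `sumCounts` does.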
History
● First developed by 10gen (later MongoDB, Inc.) in 2007
● The name comes from "humongous"
● Became open source in 2009
● Latest stable release at the time of writing: 2.4.9 (January 2014)
Basic Ideas
{
  _id: 1234,
  author: { name: "Bob Jones", … },
  post: "In these troubled times I like to...",
  date: { $date: "… :23UTC" },
  location: [ …, … ],
  rating: 2.2,
  comments: [
    { user: …, upVotes: 22, downVotes: 14, text: "Great point! I agree" },
    { user: …, upVotes: 421, downVotes: 22, text: "You are a..." }
  ],
  tags: [ "databases", "mongo" ]
}
● Collections of JSON objects
● Objects can be embedded within a single document
● Flexible schema
● References between documents
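To contrast embedding with references, assume (hypothetically) that the author were kept in a separate `users` collection and the post stored only an `authorId`. Plain JavaScript arrays stand in for collections in this sketch; `users` and `authorId` are illustrative names, not part of the example document above.

```javascript
// Hypothetical in-memory "collection" holding the referenced author.
const users = [{ _id: 7, name: "Bob Jones" }];

// Comments are embedded: they live inside the post and come back with it.
// The author is referenced: the post stores only the user's _id.
const post = {
  _id: 1234,
  authorId: 7,
  comments: [
    { upVotes: 22, downVotes: 14, text: "Great point! I agree" },
    { upVotes: 421, downVotes: 22, text: "You are a..." },
  ],
};

// Embedded data needs no second lookup.
const totalUpVotes = post.comments.reduce((sum, c) => sum + c.upVotes, 0);

// A reference must be resolved with a second query, done by the application
// (MongoDB 2.x has no server-side join).
const author = users.find((u) => u._id === post.authorId);

console.log(totalUpVotes, author.name); // 443 Bob Jones
```

As a rule of thumb, embedding suits data that is read together with its parent (comments on a post), while references suit data shared by many documents (one user who writes many posts).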
Query Example
db.posts.find({ "author.name": "mike" })
db.posts.find({ rating: { $gt: 2 } })
db.posts.find({ tags: "software" })
db.posts.find().sort({ date: -1 }).limit(10)
// SELECT * FROM posts WHERE 'economy' IN tags ORDER BY ts DESC
db.posts.find({ tags: "economy" }).sort({ ts: -1 }).limit(10);
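To show how such query documents are interpreted, here is a heavily simplified in-memory matcher in plain JavaScript. The functions `find` and `sortLimit` are hypothetical stand-ins, not the driver or shell API, and cover only the cases used in the queries above: equality, dotted paths, array membership, and the `$gt`/`$gte`/`$lt`/`$lte` operators.

```javascript
// Hypothetical, heavily simplified matcher for MongoDB-style query documents.
// Resolves a dotted path such as "author.name" inside a document.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

const OPS = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};
const isOp = (k) => Object.prototype.hasOwnProperty.call(OPS, k);

function matchValue(value, cond) {
  // An array field matches if any element matches, e.g. { tags: "software" }.
  if (Array.isArray(value)) return value.some((v) => matchValue(v, cond));
  const keys = cond !== null && typeof cond === "object" ? Object.keys(cond) : [];
  if (keys.length > 0 && keys.every(isOp)) {
    // Operator document such as { $gt: 2 }: every operator must hold.
    return value !== undefined && keys.every((op) => OPS[op](value, cond[op]));
  }
  return value === cond; // plain equality
}

// Rough equivalent of find(query): every field condition must match.
function find(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([path, cond]) => matchValue(getPath(doc, path), cond)));
}

// Rough equivalent of .sort({ [field]: direction }).limit(n) for one numeric key.
function sortLimit(docs, field, direction, n) {
  return [...docs].sort((a, b) => direction * (a[field] - b[field])).slice(0, n);
}

const posts = [
  { _id: 1, author: { name: "mike" }, rating: 2.2, tags: ["software", "economy"], ts: 3 },
  { _id: 2, author: { name: "anna" }, rating: 1.5, tags: ["economy"], ts: 5 },
  { _id: 3, author: { name: "mike" }, rating: 4.0, tags: ["databases"], ts: 1 },
];

console.log(find(posts, { "author.name": "mike" }).map((p) => p._id)); // [ 1, 3 ]
console.log(find(posts, { rating: { $gt: 2 } }).map((p) => p._id)); // [ 1, 3 ]
console.log(find(posts, { tags: "software" }).map((p) => p._id)); // [ 1 ]
console.log(sortLimit(find(posts, { tags: "economy" }), "ts", -1, 10).map((p) => p._id)); // [ 2, 1 ]
```

The dotted key must be quoted (`"author.name"`) because it is not a valid bare identifier in a JavaScript object literal; the same rule applies in the mongo shell.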
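Note on internals
● Documents are stored as BSON (Binary JSON)
● Data files are memory-mapped
● Indexes are B-trees
Example: {_id: ObjectId(XXXXXXXXX), hello: "world"} is stored as
\x27\x00\x00\x00 (total length: 39 bytes, little-endian)
\x07 _ i d \x00 X X X X X X X X X X X X (ObjectId element, 12-byte value)
\x02 h e l l o \x00 \x06\x00\x00\x00 w o r l d \x00 (string element, length 6 including the terminator)
\x00 (end of document)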
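The string part of this byte layout can be reproduced with a small encoder. This sketch (Node.js `Buffer`, hypothetical `encodeBson`) follows the BSON specification but handles only string fields, so it encodes `{ hello: "world" }` without the `_id`; the ObjectId element (type `\x07`) and every other BSON type are deliberately left out.

```javascript
// Minimal BSON encoder sketch: handles only string values (element type 0x02).
function encodeBson(doc) {
  const elements = [];
  for (const [key, value] of Object.entries(doc)) {
    if (typeof value !== "string") throw new TypeError("sketch supports strings only");
    const name = Buffer.from(key + "\0", "utf8"); // field name as a C string
    const str = Buffer.from(value + "\0", "utf8"); // string bytes plus terminator
    const len = Buffer.alloc(4);
    len.writeInt32LE(str.length, 0); // int32 byte count, including the \0
    elements.push(Buffer.from([0x02]), name, len, str);
  }
  const body = Buffer.concat([...elements, Buffer.from([0x00])]); // trailing \x00
  const total = Buffer.alloc(4);
  total.writeInt32LE(body.length + 4, 0); // document length counts itself
  return Buffer.concat([total, body]);
}

console.log(encodeBson({ hello: "world" }));
// <Buffer 16 00 00 00 02 68 65 6c 6c 6f 00 06 00 00 00 77 6f 72 6c 64 00 00>
```

Without the 17-byte `_id` element the total drops from 39 (0x27) to 22 (0x16) bytes, while the string element itself is byte-for-byte the one shown above.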
Cassandra (1.2)
● Best used:
  o When you write more than you read (logging)
  o If every component of the system must be in Java
  o If you require Availability + Partition Tolerance
● For example: banking and the financial industry (though not necessarily for financial transactions; these industries are much bigger than that). Writes are faster than reads, so one natural niche is data analysis.
MongoDB (2.2)
● Best used:
  o If you need dynamic queries
  o If you prefer to define indexes, not map/reduce functions
  o If you need good performance on a big DB
  o If you require Consistency + Partition Tolerance
● For example: most things you would do with MySQL or PostgreSQL, but where having predefined columns really holds you back
source: Cassandra vs. MongoDB comparison (see References)
Why (and why not) MongoDB?
Why:
● If you need dynamic queries
● If you prefer to define indexes, not map/reduce functions
● If you need good performance on a big DB
● If you wanted CouchDB, but your data changes too much, filling up disks
Why not:
● It lacks multi-document transactions, so if you're a bank, don't use it
● It doesn't support SQL
● It doesn't have any built-in revisioning like CouchDB
● It doesn't have real full-text search features
Production Users
● Archiving: Craigslist
● Content management: MTV Networks
● E-commerce: CustomInk
● Real-time analytics: Intuit
● Social networking: Foursquare
Long-term goals for MongoDB
Adding new features such as:
● Natural language processing
● A full-text search engine
● More real-time search over data
Personal conclusion
● Getting up to speed: document-oriented and schema-free
● Advanced usage: tons of features, including indexing and aggregation
● Administration: easy to administer, with replication and sharding
● Internals: BSON and memory-mapped files
● Trade-off: MongoDB is CP (Consistency and Partition Tolerance), so there are times when not all clients can read or write
References
● MongoDB.org
● Wikipedia: MongoDB
● DB-Engines Ranking
● Interview about the future of MongoDB
● MongoDB Inside and Outside, by Kyle Banker
● How This Web Site Uses MongoDB
● Cassandra and MongoDB comparison